This repository has been archived by the owner on Nov 8, 2018. It is now read-only.

read spark data frame #1

Closed
elenacuoco opened this issue Sep 28, 2016 · 1 comment
Comments

@elenacuoco

Why not use the DataFrame API to read the data in your example? Have you tried these lines?

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set("spark.executor.memory", "4G")
conf.set("spark.driver.memory", "2G")
conf.set("spark.executor.cores", "7")
conf.set("spark.python.worker.memory", "4G")
conf.set("spark.driver.maxResultSize", "0")
conf.set("spark.sql.crossJoin.enabled", "true")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.default.parallelism", "4")
spark = SparkSession \
    .builder.config(conf=conf) \
    .appName("test-spark").getOrCreate()
df = spark.read.csv("../input/train_numeric.csv", header=True, inferSchema=True, mode="DROPMALFORMED")
@JoeriHermans
Collaborator

This works as well, but the Databricks CSV package lets you indicate null values. For example, in this dataset they are denoted by -999. But anyway, you are right, you can do it like this. :)

JoeriHermans pushed a commit that referenced this issue Sep 10, 2017
Add temporary logging to help debugging