# Reading and Writing Data with Spark

When running this notebook in Databricks, you do not need to create your own Spark session; the cluster provides one as the variable spark.

First, let's see what the SparkContext from our cluster looks like and how Spark is configured for us:

In [None]:
spark.sparkContext.getConf().getAll()

In [None]:
spark
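
getConf().getAll() returns a list of (key, value) string pairs, and on a managed cluster that list can be long. Below is a minimal sketch of narrowing it down by key prefix. The helper name filter_conf and the sample pairs are hypothetical stand-ins; in the notebook you would pass spark.sparkContext.getConf().getAll() instead.

```python
def filter_conf(pairs, prefix):
    """Return the pairs whose key starts with prefix, as a dict ordered by key."""
    return {key: value for key, value in sorted(pairs) if key.startswith(prefix)}


# Hypothetical sample in the same shape that getConf().getAll() returns.
sample_pairs = [
    ("spark.sql.warehouse.dir", "/user/hive/warehouse"),
    ("spark.app.name", "Databricks Shell"),
    ("spark.master", "local[8]"),
    ("spark.sql.shuffle.partitions", "200"),
]

sql_settings = filter_conf(sample_pairs, "spark.sql.")
for key, value in sql_settings.items():
    print(f"{key} = {value}")

# In the notebook:
# filter_conf(spark.sparkContext.getConf().getAll(), "spark.sql.")
```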

Now, let's create our first DataFrame from a fairly small sample data set. We have been working with a log file data set that describes user interactions with a music streaming service. The records describe events such as logging in to the site, visiting a page, listening to the next song, or seeing an ad.

Here, we will read in just one of the JSON files:

In [None]:
path = "/FileStore/tables/sparkify_log_small.json"
user_log = spark.read.json(path)
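
By default, spark.read.json expects JSON Lines input: one complete JSON object per line, not a single top-level array. The sketch below uses only the Python standard library to show that layout. The two records are made-up values that borrow a few field names from this data set; they are not real rows from the log.

```python
import json
import os
import tempfile

# Hypothetical records; only the field names come from the data set.
records = [
    {"userId": "1", "page": "Home", "song": None, "ts": 1000},
    {"userId": "1", "page": "NextSong", "song": "Example Song", "ts": 2000},
]

# Write one JSON object per line (the JSON Lines layout).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_log.json")
with open(sample_path, "w", encoding="utf-8") as handle:
    for record in records:
        handle.write(json.dumps(record) + "\n")

# Each line parses on its own, so a reader can split the file by line.
with open(sample_path, encoding="utf-8") as handle:
    parsed = [json.loads(line) for line in handle if line.strip()]

print(len(parsed), "records; pages:", [row["page"] for row in parsed])
```

A file that holds a single object or array spread over several lines would instead need the multiLine option of the JSON reader.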

In [None]:
user_log.printSchema()

In [None]:
# describe() returns a new DataFrame of summary statistics; show() prints it
user_log.describe().show()

In [None]:
user_log.show(n=1)

In [None]:
user_log.take(5)

In [None]:
# A Delta table is stored as a directory; the ".csv" suffix is only part of its name
out_path = "/delta/sparkify_log_small.csv"

In [None]:
user_log.write.format("delta").mode("overwrite").save(out_path)

In [None]:
user_log_2 = spark.read.format("delta").load(out_path)

In [None]:
user_log_2.printSchema()

In [None]:
user_log_2.take(2)
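
One way to confirm that the round trip through Delta preserved the data is to compare column types and row counts. This is a sketch and not part of the original notebook: the helper name same_shape is hypothetical, and it assumes both arguments are Spark DataFrames such as user_log and user_log_2.

```python
def same_shape(left, right):
    """Compare two DataFrames by (column name, type) pairs and by row count."""
    return {
        "same_columns": left.dtypes == right.dtypes,
        "same_row_count": left.count() == right.count(),
    }


# In the notebook:
# same_shape(user_log, user_log_2)
```

Note that count() triggers a full scan of each DataFrame, which is cheap for this small sample but worth keeping in mind for larger data.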

In [None]:
user_log_2.select("userId").show()

In [None]:
user_log_2.take(1)

In [None]:
display(user_log_2)

The displayed table has these columns: artist, auth, firstName, gender, itemInSession, lastName, length, level, location, method, page, registration, sessionId, song, status, ts, userAgent, userId

Let's save our data as a Delta table at a path in DBFS:

In [None]:
user_log.write.format("delta").mode("overwrite").save("/delta/data")

Let's save our data as a table in the default database:

In [None]:
user_log.write.format("delta").mode("overwrite").saveAsTable("dataTable")
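
Once the table exists, it can be read back by name rather than by path. Below is a minimal sketch, assuming the spark session provided by the cluster and the dataTable name used above; the helper name page_counts is hypothetical.

```python
def page_counts(spark, table_name="dataTable"):
    """Load a saved table by name and count events per page, most frequent first."""
    events = spark.table(table_name)
    return events.groupBy("page").count().orderBy("count", ascending=False)


# In the notebook:
# page_counts(spark).show()
```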