# **Learning Spark Chapter 5, Loading and Saving your Data, Examples in Python**


Many of the examples in Chapter 5 require access to specific storage systems; those examples have been left out. On Databricks Cloud, consider using the table creation mechanism.

## S3 setup

Note: If your secret access key contains '/' characters, each one must be escaped as '%2F'.

In [3]:
ACCESS_KEY = "YOUR ACCESS KEY GOES HERE"
SECRET_KEY = "YOUR SECRET KEY GOES HERE"
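The escaping described in the note can be done programmatically rather than by hand. A minimal sketch, assuming Python 3's standard `urllib.parse.quote`; the key value is a made-up placeholder, and `example_key` and `encoded_key` are hypothetical names not used elsewhere in this notebook.

```python
from urllib.parse import quote

# Placeholder value for illustration only; not a real credential.
example_key = "abc/def/ghi"

# safe="" makes quote() escape '/' as well (by default '/' is left as-is).
encoded_key = quote(example_key, safe="")

print(encoded_key)
```

Note that `quote` also percent-encodes other reserved characters (such as '+'), not only '/'.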

## Example 5-1. Python load text file example

Read all text files from the directory and do a word count on them. From the resulting (k, v) pair RDD, build an RDD of Rows and create a DataFrame from it.
Register it as a temp table and issue some example queries.

In [6]:
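from pyspark.sql import Row

# Read every .txt file in the directory as one RDD of lines
lines = sc.textFile("file:///dbfs/learning-spark-master/files/*.txt")
# Split each line on whitespace and lowercase every word
inputWords = lines.flatMap(lambda l: l.split()).map(lambda w: w.lower())
# Pair each word with 1, then sum the counts per word
pairs = inputWords.map(lambda w: (w, 1))
wordCount = pairs.reduceByKey(lambda x, y: x + y)
# Wrap each (word, count) pair in a Row so Spark can infer the schema
wordRowRDD = wordCount.map(lambda p: Row(word=p[0], value=p[1]))
wordDF = sqlContext.createDataFrame(wordRowRDD)
display(wordDF)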
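To see what each transformation in this cell produces without a cluster, the same steps can be sketched in plain Python. This is only an illustration of the logic, not Spark code: the sample lines are made up, and a `Counter` stands in for `reduceByKey`.

```python
from collections import Counter

# Made-up sample input standing in for the lines read by sc.textFile
sample_lines = ["Spark is fast", "spark is fun"]

# flatMap(split) followed by map(lower): one lowercase word per element
words = [w.lower() for line in sample_lines for w in line.split()]

# Map to (word, 1) pairs, then sum the 1s per key as reduceByKey does
pairs = [(w, 1) for w in words]
word_count = Counter()
for word, n in pairs:
    word_count[word] += n

print(sorted(word_count.items()))
```

Lowercasing before counting is what makes "Spark" and "spark" fall under the same key.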

Register the DataFrame as a temporary table

In [8]:
wordDF.registerTempTable("wordcount")

Issue some SQL queries

In [10]:
%sql select word, value from wordcount where value >= 5

In [11]:
display(wordDF)

## Example 5-6. Python load unstructured JSON example

In [13]:
import json

# Load the file into an RDD; each line is treated as one JSON record
jsonInput = sc.textFile("file:///dbfs/learning-spark-master/files/testweet.json")
jsonInput.collect()
# Parse each line into a Python object
data = jsonInput.map(lambda x: json.loads(x))
# Collect the parsed results back to the driver
data.collect()
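`json.loads` raises an exception on a malformed line, which would fail the whole job when the results are collected. One possible way to handle this is to skip lines that do not parse. The sketch below is plain Python so it runs without Spark; `parse_line` is a hypothetical helper and the sample records are made up.

```python
import json

def parse_line(line):
    """Return [parsed_record] for valid JSON, or [] if the line does not parse."""
    try:
        return [json.loads(line)]
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []

# Made-up sample lines: two valid records and one malformed line
sample_lines = ['{"user": "a", "text": "hi"}', 'not json', '{"user": "b"}']

# Flatten the per-line lists, which drops the lines that failed to parse
records = [rec for line in sample_lines for rec in parse_line(line)]
print(records)
```

Because the helper returns a list, on an RDD it could be applied as `jsonInput.flatMap(parse_line)`, so bad lines simply contribute no elements.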