# **Save a Join as a Table in Python**
This notebook shows how to register the result of a join between two tables as a new Spark SQL table. It illustrates how DataFrames provide an interface to SQL: you can create SQL tables from DataFrames and then issue queries against them. Powerful stuff!

### Registering a join as its own table can make your SQL statements easier to read.
Or perhaps you want to query the joined result many times and would rather refer to it as an independent table.

### **Setup:** Create two temporary test tables to join.

```python
from pyspark.sql import Row

# Build a DataFrame from these rows.
# `sqlContext` is predefined in the notebook environment.
array = [Row(key="a", group="vowels", value=1),
         Row(key="b", group="consonants", value=2),
         Row(key="c", group="consonants", value=3),
         Row(key="d", group="consonants", value=4),
         Row(key="e", group="vowels", value=5)]
dataFrame = sqlContext.createDataFrame(array)
# Register it as the first temporary table.
# (In Spark 2.0+, createOrReplaceTempView is the non-deprecated equivalent.)
dataFrame.registerTempTable("table1")
```

```python
from pyspark.sql import Row

# Build the second DataFrame from these rows.
array = [Row(key="a", word="apple"),
         Row(key="a", word="arrow"),
         Row(key="b", word="bat"),
         Row(key="b", word="barn"),
         Row(key="c", word="cat"),
         Row(key="d", word="dog"),
         Row(key="e", word="elephant")]
dataFrame = sqlContext.createDataFrame(array)
# Register it as the second temporary table.
dataFrame.registerTempTable("table2")
```

```sql
%sql
SELECT table1.*, table2.word
FROM table1
JOIN table2 ON table1.key = table2.key
```
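To make the result of this query concrete, here is a plain-Python sketch of the inner join it performs, using the same rows as the two setup cells. It only illustrates join semantics, not how Spark executes a join, and the helper name `inner_join` is hypothetical.

```python
# Rows from the two setup cells, as plain tuples: (key, group, value) and (key, word).
table1 = [("a", "vowels", 1), ("b", "consonants", 2), ("c", "consonants", 3),
          ("d", "consonants", 4), ("e", "vowels", 5)]
table2 = [("a", "apple"), ("a", "arrow"), ("b", "bat"), ("b", "barn"),
          ("c", "cat"), ("d", "dog"), ("e", "elephant")]


def inner_join(left, right):
    """Hypothetical helper: emit (key, group, value, word) for every matching key pair."""
    # Index the right side by key so each left row can find all of its matches.
    words_by_key = {}
    for key, word in right:
        words_by_key.setdefault(key, []).append(word)
    joined = []
    for key, group, value in left:
        # A left row with no match produces nothing (inner-join semantics).
        for word in words_by_key.get(key, []):
            joined.append((key, group, value, word))
    return joined


rows = inner_join(table1, table2)
for row in rows:
    print(row)
```

Keys `a` and `b` each match two words, so the join returns seven rows rather than five. Spark does not guarantee row order, so the `%sql` output may list the same rows in a different order.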

### **Step 1:** Output the results of the join as a DataFrame.

```python
# Run the same join through sqlContext.sql(), which returns a DataFrame
# (called a SchemaRDD before Spark 1.3; the variable name is historical).
# Once we have a DataFrame, we can apply further transformations to it.
# Compare display(joinedRDD) with the output of the %sql cell above: the rows are the same.
joinedRDD = sqlContext.sql("select table1.*, table2.word from table1 join table2 on table1.key = table2.key")
display(joinedRDD)
```
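Because the query result is a DataFrame, it can be transformed further, for example with `joinedRDD.groupBy("group").count()` to count joined rows per group. The plain-Python sketch below computes what that aggregation should return for this data, using `collections.Counter` as a stand-in; it is an illustration of the result, not Spark code.

```python
from collections import Counter

# The seven joined rows as (key, group, value, word), copied from the setup data.
joined = [
    ("a", "vowels", 1, "apple"), ("a", "vowels", 1, "arrow"),
    ("b", "consonants", 2, "bat"), ("b", "consonants", 2, "barn"),
    ("c", "consonants", 3, "cat"), ("d", "consonants", 4, "dog"),
    ("e", "vowels", 5, "elephant"),
]

# Count joined rows per group, mirroring groupBy("group").count() in the DataFrame API.
counts = Counter(group for _key, group, _value, _word in joined)
print(dict(counts))  # -> {'vowels': 3, 'consonants': 4}
```

Note that the counts are per joined row, not per key: `a` and `b` each contribute two rows because they matched two words.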

### **Step 2:** The joined DataFrame can be registered as a Spark SQL table, just like any other DataFrame.

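```python
# Register the joined DataFrame under its own name.
# (In Spark 2.0+, createOrReplaceTempView is the non-deprecated equivalent.)
joinedRDD.registerTempTable("pythonJoinTable")
```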
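Registering the joined DataFrame gives the query a name. In Spark, a temporary table like this is a named query plan rather than a copy of the data, so each query against it re-runs the join unless the data is cached (for example with `sqlContext.cacheTable("pythonJoinTable")`). As a rough analogy outside Spark, the sketch below uses Python's built-in `sqlite3` module, where `CREATE TEMP VIEW` names the join much as registering the DataFrame does. The SQLite setup is an assumption for illustration only; `"group"` is quoted because it is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recreate the two setup tables in an in-memory SQLite database.
conn.execute('CREATE TABLE table1 (key TEXT, "group" TEXT, value INTEGER)')
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("a", "vowels", 1), ("b", "consonants", 2), ("c", "consonants", 3),
    ("d", "consonants", 4), ("e", "vowels", 5)])
conn.execute("CREATE TABLE table2 (key TEXT, word TEXT)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    ("a", "apple"), ("a", "arrow"), ("b", "bat"), ("b", "barn"),
    ("c", "cat"), ("d", "dog"), ("e", "elephant")])

# Name the join, analogous to joinedRDD.registerTempTable("pythonJoinTable").
conn.execute(
    "CREATE TEMP VIEW pythonJoinTable AS "
    "SELECT table1.*, table2.word FROM table1 JOIN table2 ON table1.key = table2.key"
)

# Query the named join like any other table.
total = conn.execute("SELECT COUNT(*) FROM pythonJoinTable").fetchone()[0]
rows = conn.execute(
    "SELECT word FROM pythonJoinTable WHERE key = 'a' ORDER BY word"
).fetchall()
print(total, rows)  # -> 7 [('apple',), ('arrow',)]
```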

### **Step 3:** Now the join is a new table that you can query like any other table.

```sql
%sql
DESCRIBE pythonJoinTable
```

```sql
%sql
SELECT * FROM pythonJoinTable WHERE key = 'a'
```
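Because a registered temporary table is only a name for the join, it always reflects the current contents of `table1` and `table2`. If you instead want an independent copy of the joined rows, Spark's `DataFrameWriter` offers `joinedRDD.write.saveAsTable(...)` (Spark 1.4+; any table name you pass is your choice). The sketch below contrasts the two behaviours using SQLite as a stand-in: a view versus `CREATE TABLE ... AS SELECT`. It uses a trimmed copy of the setup rows, and the names `join_view` and `join_snapshot` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A trimmed copy of the setup tables.
conn.execute('CREATE TABLE table1 (key TEXT, "group" TEXT, value INTEGER)')
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [("a", "vowels", 1), ("b", "consonants", 2)])
conn.execute("CREATE TABLE table2 (key TEXT, word TEXT)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [("a", "apple"), ("a", "arrow"), ("b", "bat")])

join_sql = ("SELECT table1.*, table2.word FROM table1 "
            "JOIN table2 ON table1.key = table2.key")
conn.execute("CREATE TEMP VIEW join_view AS " + join_sql)  # named query, re-run on every read
conn.execute("CREATE TABLE join_snapshot AS " + join_sql)  # independent copy of the rows

# Change a base table after both objects exist.
conn.execute("INSERT INTO table2 VALUES ('a', 'ant')")

view_count = conn.execute(
    "SELECT COUNT(*) FROM join_view WHERE key = 'a'").fetchone()[0]
snapshot_count = conn.execute(
    "SELECT COUNT(*) FROM join_snapshot WHERE key = 'a'").fetchone()[0]
print(view_count, snapshot_count)  # -> 3 2

# Column names of the snapshot, roughly what DESCRIBE lists in Spark SQL.
columns = [col[1] for col in conn.execute("PRAGMA table_info(join_snapshot)")]
print(columns)
```

The view picks up the new `ant` row while the snapshot does not, which is the practical difference between naming a join and saving it.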