Creating multiple SparkSessions and SparkContexts can cause issues, so it's best practice to use the ``SparkSession.builder.getOrCreate()`` method. This returns an existing SparkSession if there's already one in the environment, or creates a new one if necessary!

In [None]:
# Import SparkSession from pyspark.sql
from pyspark.sql import SparkSession

# Create my_spark
my_spark = SparkSession.builder.getOrCreate()

# The cells below refer to the session as spark, so bind that name too
spark = my_spark

# Print my_spark
print(my_spark)

``SparkSession`` has an attribute called ``catalog`` which lists all the data inside the cluster. This attribute has a few methods for extracting different pieces of information.

For example, the ``.listTables()`` method returns the tables in your cluster as a list. Each entry in the list describes one table, including its name.

In [None]:
# Print the tables in the catalog
print(spark.catalog.listTables())

One of the advantages of the DataFrame interface is that you can run SQL queries on the tables in your Spark cluster.

In [None]:
# Don't change this query
query = "FROM flights SELECT * LIMIT 10"

# Get the first 10 rows of flights
flights10 = spark.sql(query)

# Show the results
flights10.show()

# Convert the Spark DataFrame to a pandas DataFrame
flights10.toPandas()

The ``.createDataFrame()`` method takes a pandas DataFrame and returns a Spark DataFrame. The output of this method is stored locally, not in the ``SparkSession`` catalog. This means that you can use all the Spark DataFrame methods on it, but you can't access the data in other contexts.

For example, a SQL query (using the ``.sql()`` method) that references your DataFrame will throw an error. To access the data in this way, you have to save it as a temporary table.

You can do this using the ``.createTempView()`` Spark DataFrame method, which takes as its only argument the name of the temporary table you'd like to register. This method registers the DataFrame as a table in the catalog, but as this table is temporary, it can only be accessed from the specific ``SparkSession`` used to create the Spark DataFrame.

There is also the ``.createOrReplaceTempView()`` method. This safely creates a new temporary table if nothing was there before, or updates an existing table if one was already defined. You'll use this method to avoid running into problems with duplicate tables.

``.createOrReplaceTempView()`` registers a temporary view of the DataFrame. The view is not persistent, but you can run SQL queries against it. If you want to keep the data, you can either persist the DataFrame or save it as a table with ``saveAsTable``.

In [None]:
# Import NumPy and pandas
import numpy as np
import pandas as pd

# Create pd_temp
pd_temp = pd.DataFrame(np.random.random(10))

# Create spark_temp from pd_temp
spark_temp = spark.createDataFrame(pd_temp)

# Examine the tables in the catalog
print(spark.catalog.listTables())

# Add spark_temp to the catalog and name it as temp
spark_temp.createOrReplaceTempView("temp")

# Examine the tables in the catalog again
print(spark.catalog.listTables())

In [None]:
# Read the data directly from a file into Spark,
# without going through pandas

# Don't change this file path
file_path = "path_to_csv_file"

# Read in the airports data
airports = spark.read.csv(file_path, header=True)

# Show the data
airports.show()
