# Day 11 - Managing Metadata: Catalog, Tables and Views

## Class: pyspark.sql.session.<a href="https://spark.apache.org/docs/2.4.5/api/python/pyspark.sql.html#pyspark.sql.SparkSession">SparkSession</a>
The SparkSession object is usually assigned to a variable named *spark*.

### Object Properties:
* **catalog** - to access the `Catalog` interface for maintaining metadata regarding databases, tables, functions, etc.

### Object Methods:
* **table(** *name* **)** - Returns the specified table as a DataFrame

## Class: pyspark.sql.dataframe.<a href="https://spark.apache.org/docs/2.4.5/api/python/pyspark.sql.html#pyspark.sql.DataFrame">DataFrame</a>

### Object Properties:
* **write** - to access the `DataFrameWriter` interface for writing from a DataFrame to a data sink
* **writeStream** - to access the `DataStreamWriter` object for writing Stream data to external storage.

### Object Methods:
* **createGlobalTempView(** *name* **)** - Creates a global temporary view with this DataFrame. The lifetime of this temporary view is tied to this Spark application. Throws TempTableAlreadyExistsException if the view name already exists in the catalog.
* **createOrReplaceGlobalTempView(** *name* **)** - Creates or replaces a global temporary view using the given name. The lifetime of this temporary view is tied to this Spark application.
* **createOrReplaceTempView(** *name* **)** - Creates or replaces a local temporary view with this DataFrame. The lifetime of this temporary view is tied to the SparkSession that was used to create this DataFrame.
* **createTempView(** *name* **)** - Creates a local temporary view with this DataFrame. The lifetime of this temporary view is tied to the SparkSession that was used to create this DataFrame. Throws TempTableAlreadyExistsException if the view name already exists in the catalog.
* **registerTempTable(** *name* **)** - Registers this DataFrame as a temporary table using the given name. The lifetime of this temporary table is tied to the SparkSession that was used to create this DataFrame. Deprecated since Spark 2.0; use `createOrReplaceTempView` instead.

## Class: pyspark.sql.readwriter.<a href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter">DataFrameWriter</a>

Interface used to write a `DataFrame` to external storage systems (e.g. file systems, key-value stores, etc). Accessed through the `DataFrame.write` property.

### Object Methods:
* **bucketBy(** *numBuckets, cols* **)** - Buckets the output by the given columns. If specified, the output is laid out on the file system similar to Hive’s bucketing scheme.
* **insertInto(** *tableName, overwrite=False* **)** - Inserts the content of the DataFrame into the specified table. It requires that the schema of the DataFrame is the same as the schema of the table.
* **saveAsTable(** *name, format=None, mode=None, partitionBy=None, options* **)** - Saves the content of the DataFrame as the specified table. If the table already exists, the behavior depends on the save mode given by `mode` (the default is to throw an exception). When the mode is Overwrite, the schema of the DataFrame does not need to be the same as that of the existing table.

## Class: pyspark.sql.catalog.<a href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Catalog">Catalog</a> [abstract]
Wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog, which is an abstract class. Accessed through the `SparkSession.catalog` property.
### Class Functions:
* **createExternalTable(** *tableName, path=None, source=None, schema=None, options* **)** - Creates a table based on the dataset in a data source. It returns the DataFrame associated with the external table. The data source is specified by the source and a set of options. If source is not specified, the default data source configured by spark.sql.sources.default will be used. Optionally, a schema can be provided as the schema of the returned DataFrame and created external table. Deprecated since Spark 2.2; use `createTable` instead.
* **createTable(** *tableName, path=None, source=None, schema=None, options* **)** - Creates a table based on the dataset in a data source. It returns the DataFrame associated with the table. The data source is specified by the source and a set of options. If source is not specified, the default data source configured by spark.sql.sources.default will be used. When path is specified, an external table is created from the data at the given path. Otherwise a managed table is created. Optionally, a schema can be provided as the schema of the returned DataFrame and created table.
* **dropGlobalTempView(** *viewName* **)** - Drops the global temporary view with the given view name in the catalog. If the view has been cached before, then it will also be uncached. Returns true if this view is dropped successfully, false otherwise.
* **dropTempView(** *viewName* **)** - Drops the local temporary view with the given view name in the catalog. If the view has been cached before, then it will also be uncached. Returns true if this view is dropped successfully, false otherwise.