# Spark SQL Getting Started - Practice Notebook

This notebook covers the fundamentals of Spark SQL based on the [official Spark SQL Getting Started Guide](https://spark.apache.org/docs/latest/sql-getting-started.html).

## Learning Objectives
- Understand SparkSession as the entry point to Spark functionality
- Create DataFrames from various sources (lists, files)
- Perform basic DataFrame operations
- Understand the difference between transformations and actions

## Sections
1. **SparkSession Initialization**
2. **Creating DataFrames from Python Data**
3. **Creating DataFrames from Files**
4. **Basic DataFrame Operations**
5. **Practice Exercises**

---


## 1. SparkSession Initialization

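The **SparkSession** is the entry point to all Spark functionality. It provides a unified interface for working with Spark SQL and DataFrames. The typed Dataset API that the guide also mentions is available only in Scala and Java, so in PySpark you work with DataFrames.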

### Key Points:
- Since Spark 2.0, SparkSession replaces SQLContext and HiveContext as the entry point; the underlying SparkContext is still available as `spark.sparkContext`
- Use `SparkSession.builder` to create a session
- Configure application name and options during creation
- Built-in support for Hive features (writing HiveQL queries, using Hive UDFs, reading Hive tables) without needing an existing Hive setup


## 2. Creating DataFrames from Python Data

You can create a DataFrame from in-memory Python data, such as a list of tuples, dictionaries, or `Row` objects, by passing it to `spark.createDataFrame()`.


## 3. Creating DataFrames from Files

Spark can read data from various file formats including JSON, CSV, and Parquet. Let's create sample files and read them.


## 4. Basic DataFrame Operations

Now let's explore fundamental DataFrame operations including selections, filtering, and transformations.


## 5. Practice Exercises

Now it's your turn! Complete these exercises to practice what you've learned.

### Exercise 1: Create Your Own DataFrame
Create a DataFrame with information about your favorite books including: title, author, year_published, and rating.


### Exercise 2: DataFrame Operations
Using the books DataFrame you created, practice the operations from Section 4, for example selecting columns, filtering rows, sorting, and grouping.


### Exercise 3: File Operations
Create a CSV file with employee data and read it back into a DataFrame.


## Summary

In this notebook, you learned:

1. **SparkSession** - The entry point to Spark functionality
2. **Creating DataFrames** - From Python data structures and files
3. **Basic Operations** - Select, filter, group, sort, and transform data
4. **Schema Inspection** - Understanding DataFrame structure

## Next Steps

Continue to the next notebook: `02_dataframe_operations.ipynb` to dive deeper into DataFrame transformations and operations.

## References

- [Spark SQL Getting Started Guide](https://spark.apache.org/docs/latest/sql-getting-started.html)
- [PySpark SQL Module Documentation](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html)
