# Objectives
- Querying data files
- Writing to tables
- Performing advanced ETL operations
- Exploring higher-order functions and user-defined functions (UDFs) in Spark SQL

# Querying Data Files
To query a file directly, we use the `SELECT * FROM` syntax, followed by the file format and the path to the file:
```sql
SELECT * FROM file_format.`/path/to/file`
```
The file path is enclosed in **backticks** so that the parser reads it as a single unit, which prevents syntax errors and ensures the path is interpreted correctly.

A file path in this context can refer to:
- A single file
- A wildcard pattern that matches multiple files, which are then read together
- An entire directory, provided that all files in that directory share the same format and schema
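
The three path forms listed above can be sketched as query strings built in Python. This is a minimal illustration rather than part of the original notebook: the helper `build_file_query` and the paths under `/data/school` are hypothetical, and in a Databricks notebook the resulting string would typically be passed to `spark.sql(...)`.

```python
def build_file_query(file_format: str, path: str) -> str:
    """Return a SELECT * query that reads `path` directly as `file_format`.

    Hypothetical helper for illustration; the path is wrapped in backticks
    as the syntax above requires.
    """
    if "`" in path:
        # An unescaped backtick inside the path would end the quoted part early.
        raise ValueError("path must not contain a backtick")
    return f"SELECT * FROM {file_format}.`{path}`"


# Hypothetical locations, one per path form described above.
single_file = build_file_query("json", "/data/school/students-json/export_001.json")
wildcard = build_file_query("json", "/data/school/students-json/export_*.json")
directory = build_file_query("json", "/data/school/students-json")

print(single_file)
print(wildcard)
print(directory)
```

Only the path changes between the three queries; the file format prefix and the backtick quoting stay the same.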

We can now demonstrate extracting data directly from files using a real-world dataset representing an online school environment. This dataset consists of three tables:
- Students
- Enrollments
- Courses

We begin by running a helper notebook, "School-Setup", which can be found in the `Includes` subfolder. This notebook downloads the dataset to the Databricks file system and prepares the working environment:

```
%run ./Includes/School-Setup
```

## Querying JSON Format
The student data in this dataset is stored in JSON format. The placeholder `dataset_school`, referenced in the code that follows, is a variable defined in our "School-Setup" notebook. It points to the location where the dataset files are stored on the file system.

```python
%python
files = dbutils.fs.ls(f"{dataset_school}/students-json")
display(files)
```

The output of this cell shows that the `students-json` folder contains six JSON files.
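
The file count can also be checked in code rather than read off the displayed listing. The sketch below assumes that each entry returned by `dbutils.fs.ls` exposes a `name` attribute. To stay runnable outside Databricks, it uses a stand-in `FileEntry` tuple and made-up file names, both of which are hypothetical.

```python
from collections import namedtuple

# Stand-in for the entries returned by dbutils.fs.ls (assumption: each
# entry has `name` and `size` attributes).
FileEntry = namedtuple("FileEntry", ["name", "size"])


def count_json_files(entries) -> int:
    """Count entries whose name ends with .json (case-insensitive)."""
    return sum(1 for entry in entries if entry.name.lower().endswith(".json"))


# Made-up listing for illustration only; not the real dataset contents.
sample_listing = [
    FileEntry("export_001.json", 1024),
    FileEntry("export_002.json", 2048),
    FileEntry("_SUCCESS", 0),
]

print(count_json_files(sample_listing))  # prints 2
```

In the notebook itself, the same function could be called as `count_json_files(files)` on the `files` list from the cell above.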