# ELT (Extract, Load, Transform) - Practical Exercises
In this notebook, we explore the ELT process using Python with pandas and SQLAlchemy. Unlike ETL, where data is transformed before it reaches the destination, ELT loads the raw data into the destination first and transforms it there.

### Objectives
- Understand the differences between ETL and ELT.
- Perform basic data extraction, loading, and transformation using Python.
- Work with a sample dataset to gain hands-on experience with ELT operations.

## 1. Extract
The extraction step involves fetching data from a source, which can be an API, a database, or a file. Here, we'll work with a CSV file as the data source.

**Exercise**: Load a sample CSV file into a pandas DataFrame.

In [None]:
# Import necessary libraries
import pandas as pd

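# Load a sample CSV file (replace with a file path or URL if necessary).
# This file's header has a space after each comma; skipinitialspace=True strips it,
# so the columns are named 'Month', '1958', ... rather than ' "1958"'.
url = 'https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv'
df = pd.read_csv(url, skipinitialspace=True)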
df.head()
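Header formatting can change the column names pandas produces, which matters later when columns are selected by name. The sketch below uses a small inline CSV with made-up values (not the real dataset) whose header has a space after each comma, and compares the default parse with `skipinitialspace=True`. The inline data and variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical inline data for illustration only; the values are made up.
raw_csv = '"Month", "1958", "1959"\n"JAN", 100, 110\n"FEB", 200, 210\n'

# Default parsing keeps the leading space (and the quotes after it) in the column names.
default_df = pd.read_csv(io.StringIO(raw_csv))

# skipinitialspace=True drops whitespace after each delimiter,
# so the quoted header fields parse into clean names.
clean_df = pd.read_csv(io.StringIO(raw_csv), skipinitialspace=True)

print(list(default_df.columns))
print(list(clean_df.columns))
```

Checking `df.columns` right after extraction is a cheap way to catch this kind of problem before the load and transform steps.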

## 2. Load
Once the data is extracted, it needs to be loaded into a destination, such as a database or a data warehouse. For this exercise, we'll simulate this by loading the DataFrame into an in-memory SQLite database using SQLAlchemy.

**Exercise**: Load the DataFrame into an SQLite database.

In [None]:
# Import necessary libraries
from sqlalchemy import create_engine

# Create an in-memory SQLite database and load the DataFrame into a table
engine = create_engine('sqlite:///:memory:')
df.to_sql('airtravel', con=engine, index=False, if_exists='replace')

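# Display the tables in the database to confirm the load operation
# (Engine.table_names() was removed in SQLAlchemy 2.0; use the inspector instead)
from sqlalchemy import inspect
inspect(engine).get_table_names()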
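Listing the tables shows that the table exists, but not that every row arrived. A minimal sketch of a stronger check, comparing the row count in the database with the length of the DataFrame, might look like the following. The `sample_df` data is a hypothetical stand-in for the extracted DataFrame, and the example assumes a pandas release that is compatible with the installed SQLAlchemy version.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# Hypothetical stand-in for the extracted DataFrame; the values are made up.
sample_df = pd.DataFrame({"Month": ["JAN", "FEB", "MAR"], "1958": [100, 200, 300]})

# Load into an in-memory SQLite database, as in the exercise above.
engine = create_engine("sqlite:///:memory:")
sample_df.to_sql("airtravel", con=engine, index=False, if_exists="replace")

# List the tables through the inspector.
tables = inspect(engine).get_table_names()

# Compare the number of rows in the table with the number of rows in the DataFrame.
with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM airtravel")).scalar()

print(tables)
print(row_count == len(sample_df))
```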

## 3. Transform
The transform step involves cleaning, aggregating, or otherwise modifying the data to fit the desired format. This step can include operations such as filtering, grouping, or joining multiple datasets.

**Exercise**: Transform the data by keeping only the records where the `1958` value is greater than 300, then calculate the average number of passengers across those records.

In [None]:
# Transform the data
transformed_df = df[df['1958'] > 300]
average_passengers = transformed_df['1958'].mean()

# Display the transformed data and the average
transformed_df, average_passengers
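In an ELT workflow the transformation usually runs inside the destination, after the data has been loaded. As a sketch of that idea, the same filter and average can be written as SQL against SQLite. This example swaps in a standard-library `sqlite3` connection instead of the SQLAlchemy engine to keep it short, and `sample_df` is hypothetical data rather than the real dataset.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the values are made up.
sample_df = pd.DataFrame(
    {"Month": ["JAN", "FEB", "MAR", "APR"], "1958": [250, 310, 330, 290]}
)

# Load into an in-memory SQLite database (sqlite3 used here in place of SQLAlchemy).
conn = sqlite3.connect(":memory:")
sample_df.to_sql("airtravel", con=conn, index=False, if_exists="replace")

# The column name starts with a digit, so it is double-quoted as an SQL identifier.
filtered = pd.read_sql_query('SELECT * FROM airtravel WHERE "1958" > 300', conn)
average = pd.read_sql_query(
    'SELECT AVG("1958") AS avg_passengers FROM airtravel WHERE "1958" > 300', conn
)["avg_passengers"].iloc[0]

conn.close()

print(filtered)
print(average)
```

Running the filter in the database means only the matching rows are pulled back into pandas, which becomes more relevant as tables grow.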

## 4. Putting it All Together
Now that you've seen each of the individual steps, try putting them together in a single function.

**Exercise**: Create a function `elt_pipeline` that:
1. Extracts data from a CSV file.
2. Loads it into an SQLite database.
3. Transforms the data by keeping the records where the specified year's passenger count exceeds a threshold, then calculating the average number of passengers for those records.

In [None]:
def elt_pipeline(csv_url, year_column, passenger_threshold):
    import pandas as pd
    from sqlalchemy import create_engine

    # Step 1: Extract (skipinitialspace=True keeps column names free of stray spaces and quotes)
    df = pd.read_csv(csv_url, skipinitialspace=True)

    # Step 2: Load
    engine = create_engine('sqlite:///:memory:')
    df.to_sql('airtravel', con=engine, index=False, if_exists='replace')

    # Step 3: Transform (done on the DataFrame here for simplicity; the loaded table is not queried)
    transformed_df = df[df[year_column] > passenger_threshold]
    average_passengers = transformed_df[year_column].mean()

    return transformed_df, average_passengers

# Run the ELT pipeline with the sample CSV
elt_pipeline(url, '1958', 300)
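The function above filters the DataFrame directly, so the loaded table is never queried. A possible variant that runs the filter inside SQLite is sketched below. The name `elt_pipeline_sql` and the inline sample CSV are hypothetical, and a standard-library `sqlite3` connection stands in for the SQLAlchemy engine.

```python
import io
import sqlite3

import pandas as pd


def elt_pipeline_sql(csv_source, year_column, passenger_threshold):
    """Hypothetical variant of elt_pipeline whose filter runs as SQL in SQLite."""
    # Step 1: Extract
    df = pd.read_csv(csv_source, skipinitialspace=True)
    if year_column not in df.columns:
        raise KeyError(f"Column {year_column!r} not found in {list(df.columns)}")

    # Step 2: Load
    conn = sqlite3.connect(":memory:")
    try:
        df.to_sql("airtravel", con=conn, index=False, if_exists="replace")

        # Step 3: Transform. The column name was validated above and is quoted
        # as an identifier; the threshold is bound as a query parameter.
        column = '"' + year_column.replace('"', '""') + '"'
        query = f"SELECT * FROM airtravel WHERE {column} > ?"
        transformed_df = pd.read_sql_query(query, conn, params=(passenger_threshold,))
    finally:
        conn.close()

    # The filter ran in SQLite; the mean is computed on the returned rows.
    average_passengers = transformed_df[year_column].mean()
    return transformed_df, average_passengers


# Usage with a small inline CSV (made-up values) so no network access is needed.
sample_csv = io.StringIO("Month,1958,1959\nJAN,250,260\nFEB,310,320\nMAR,330,340\n")
result_df, result_avg = elt_pipeline_sql(sample_csv, "1958", 300)
print(result_df)
print(result_avg)
```

Identifiers such as column names cannot be bound as query parameters, which is why the column is checked against the DataFrame and quoted separately while the threshold goes through `params`.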

## 5. Conclusion
You've now gone through a basic ELT pipeline using Python. This notebook demonstrated how to extract data from a source, load it into a database, and perform transformations. You can expand upon this by connecting to external databases, adding more complex transformations, and integrating with BI tools.