# FRE 521D: Data Analytics in Climate, Food and Environment
## Lab 1: Loading Data into MySQL

**Program:** UBC Master of Food and Resource Economics  
**Instructor:** Asif Ahmed Neloy

---

<div style="background-color: #FFF3CD; border-left: 4px solid #E6A23C; padding: 15px; margin: 15px 0;">
    <h3 style="margin-top: 0; color: #856404;">Submission Deadline</h3>
    <p style="margin-bottom: 0; font-size: 1.2em;"><strong>End of Day: Sunday, January 12, 2026</strong></p>
</div>

---

## Lab Objectives

In this lab, you will:

1. Load a CSV file containing food nutrition data
2. Clean and prepare the data for database storage
3. Create a MySQL table with the correct schema
4. Insert the data into your database
5. Run basic SQL queries to verify your work

---

## Before You Start

Make sure you have:

1. Docker Desktop running
2. MySQL container started (`docker compose up -d` in your course folder)
3. The `FOOD-DATA-GROUP1.csv` file downloaded to your working directory
4. Your conda environment activated

---

## Part 1: Setup and Connect to Database

### Step 1.1: Import Required Libraries

In [None]:
!pip install mysql-connector-python --quiet


In [None]:
# Import libraries
import pandas as pd
import mysql.connector
from mysql.connector import Error

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Libraries imported successfully!")

### Step 1.2: Connect to MySQL

Use the connection details from your `docker-compose.yml` file.

In [None]:
# Database connection settings
# These match your docker-compose.yml configuration

DB_CONFIG = {
    'host': '127.0.0.1',
    'port': 3306,
    'database': 'mfre521d',  # change if you have a different database name
    'user': 'mfre521d_user',  # change if you have a different user
    'password': 'mfre521d_user_pw'  # change if you have a different password
}

# Test the connection
try:
    connection = mysql.connector.connect(**DB_CONFIG)
    if connection.is_connected():
        print("Successfully connected to MySQL!")
        print(f"Database: {DB_CONFIG['database']}")
        connection.close()
except Error as e:
    print(f"Error connecting to MySQL: {e}")
    print("\nTroubleshooting:")
    print("1. Is Docker Desktop running?")
    print("2. Run 'docker ps' to check if the container is running")
    print("3. Run 'docker compose up -d' in your course folder")

---
## Part 2: Load and Explore the CSV File

### Step 2.1: Load the CSV File

The file `FOOD-DATA-GROUP1.csv` contains nutrition information for various food items.
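
As a rough sketch of the loading step, the cell below reads a small in-memory sample with `pd.read_csv`. The sample rows, their values, and the two `Unnamed` column names are hypothetical stand-ins for the real file's layout; with the actual data you would pass the path to `FOOD-DATA-GROUP1.csv` instead.

```python
import io
import pandas as pd

# Hypothetical sample (made-up values) that mimics the assumed layout:
# two leftover index columns followed by the nutrition columns.
sample_csv = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,food,Caloric Value,Fat\n"
    "0,0,apple,52,0.2\n"
    "1,1,banana,89,0.3\n"
    "2,2,carrot,41,0.2\n"
)

# With the real file this would be: pd.read_csv("FOOD-DATA-GROUP1.csv")
df_sample = pd.read_csv(sample_csv)

# Report the shape to confirm the load worked
print(f"Rows: {df_sample.shape[0]}, Columns: {df_sample.shape[1]}")
```

Printing the shape right after loading is a quick way to confirm that the row count matches what you expect before any cleaning.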

In [None]:
# Load the CSV file
# Update the path if your file is in a different location



### Step 2.2: Explore the Data

Let's look at what we have.

In [None]:
# View the first few rows
df.head()

In [None]:
# View column names
print("Column names:")
for i, col in enumerate(df.columns):
    print(f"  {i}: {col}")

In [None]:
# Check data types
print("Data types:")
print(df.dtypes)

### Step 2.3: Understanding the Dataset

This dataset contains:
- **food**: Name of the food item
- **Caloric Value**: Calories per serving
- **Macronutrients**: Fat, Carbohydrates, Protein, etc.
- **Vitamins**: A, B1, B2, B3, B5, B6, B11, B12, C, D, E, K
- **Minerals**: Calcium, Iron, Magnesium, Zinc, etc.
- **Nutrition Density**: Overall nutritional score

Notice there are some unnecessary columns at the beginning that we need to remove.

---
## Part 3: Clean the Data

### Step 3.1: Remove Unnecessary Columns

The first two columns are just index columns that we don't need.
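
One possible way to drop leading columns by position is `DataFrame.iloc`. The frame below is a hypothetical two-row example, and the `Unnamed` column names are assumptions about how the index columns appear in the lab file.

```python
import pandas as pd

# Hypothetical frame with two leftover index columns (made-up values)
df_demo = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "food": ["apple", "banana"],
    "Caloric Value": [52, 89],
})

# Inspect the two leading columns before removing them
print(df_demo.iloc[:, :2])

# Keep everything from the third column onward; .copy() gives an independent frame
df_demo_clean = df_demo.iloc[:, 2:].copy()
print(list(df_demo_clean.columns))
```

Selecting by position avoids depending on the exact names of the index columns, which can vary with how the CSV was exported.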

In [None]:
# Look at the first two columns


In [None]:
# Remove the first two columns (they are just row indices)
# We keep everything from 'food' onwards



### Step 3.2: Clean Column Names for MySQL

MySQL column names work best when they:
- Are lowercase
- Use underscores instead of spaces
- Don't have special characters
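
The three rules above could be applied with a small helper like the one below. `clean_column_name` is a hypothetical name introduced for illustration, and the example column names are taken from the dataset description rather than from the file itself.

```python
import re

def clean_column_name(name):
    """Lowercase a name and replace runs of non-alphanumeric characters with '_'."""
    cleaned = name.strip().lower()
    cleaned = re.sub(r"[^a-z0-9]+", "_", cleaned)
    # Remove underscores left over at either end
    return cleaned.strip("_")

examples = ["food", "Caloric Value", "Vitamin B12", "Nutrition Density"]
cleaned_examples = [clean_column_name(c) for c in examples]
print(cleaned_examples)
```

To apply it to a DataFrame, assign a new list of names, for example `df_clean.columns = [clean_column_name(c) for c in df_clean.columns]`.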

In [None]:
# Clean column names


### Step 3.3: Add a Primary Key Column

Every table should have a primary key. We'll add a `food_id` column.
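
A minimal sketch of adding a sequential key with `DataFrame.insert`, which places the new column at a chosen position. The three-row frame is a hypothetical example.

```python
import pandas as pd

# Hypothetical frame standing in for the cleaned data
df_demo = pd.DataFrame({"food": ["apple", "banana", "carrot"]})

# Insert food_id at position 0, numbering rows 1..N
df_demo.insert(0, "food_id", range(1, len(df_demo) + 1))
print(df_demo)
```

Note that `insert` modifies the frame in place and raises an error if the column already exists, so re-running the cell requires reloading the data first.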

In [None]:
# Add food_id as the first column (starting from 1)


### Step 3.4: Check for Missing Values
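
One common way to count missing values is `isnull().sum()`, which returns the number of missing entries per column. The frame below is a hypothetical example with two gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing name and one missing number
df_demo = pd.DataFrame({
    "food": ["apple", "banana", None],
    "fat": [0.2, np.nan, 0.1],
})

# Missing values per column, then the overall total
missing_per_column = df_demo.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(f"Total missing values: {total_missing}")
```

If the total is greater than zero, decide how to handle those cells before inserting, since a pandas `NaN` does not map to SQL `NULL` automatically.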

In [None]:
# Check for missing values


### Step 3.5: View the Cleaned Data

In [None]:
# Final check of the cleaned data


In [None]:
# Check data types


---
## Part 4: Create MySQL Table

### Step 4.1: Generate CREATE TABLE Statement

We need to create a table with the correct column types.

In [None]:
# Generate CREATE TABLE statement

def get_mysql_type(pandas_dtype, column_name):
    """
    Convert pandas dtype to MySQL type.
    """
    if column_name == 'food_id':
        return 'INT PRIMARY KEY'
    elif column_name == 'food':
        return 'VARCHAR(255)'
    elif 'int' in str(pandas_dtype):
        return 'INT'
    elif 'float' in str(pandas_dtype):
        return 'DECIMAL(10, 3)'
    else:
        return 'VARCHAR(255)'

# Build the CREATE TABLE statement
table_name = 'food_nutrition'
columns_sql = []

for col in df_clean.columns:
    mysql_type = get_mysql_type(df_clean[col].dtype, col)
    columns_sql.append(f"    {col} {mysql_type}")

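# Join the column definitions first: a backslash inside an f-string
# expression is a SyntaxError before Python 3.12
columns_block = ',\n'.join(columns_sql)

create_table_sql = f"""CREATE TABLE IF NOT EXISTS {table_name} (
{columns_block}
);"""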

print("CREATE TABLE Statement:")
print("=" * 60)
print(create_table_sql)

### Step 4.2: Create the Table in MySQL

In [None]:
# Connect and create the table

try:
    connection = mysql.connector.connect(**DB_CONFIG)
    cursor = connection.cursor()
    
    # Drop table if it exists (for clean re-runs)
    cursor.execute(f"DROP TABLE IF EXISTS {table_name}")
    print(f"Dropped existing table '{table_name}' if it existed.")
    
    # Create the table
    cursor.execute(create_table_sql)
    print(f"Table '{table_name}' created successfully!")
    
    connection.commit()
    cursor.close()
    connection.close()
    
except Error as e:
    print(f"Error: {e}")

---
## Part 5: Insert Data into MySQL

### Step 5.1: Prepare the INSERT Statement

In [None]:
# Create INSERT statement template
columns_list = ', '.join(df_clean.columns)
placeholders = ', '.join(['%s'] * len(df_clean.columns))

insert_sql = f"INSERT INTO {table_name} ({columns_list}) VALUES ({placeholders})"

print("INSERT statement template:")
print(insert_sql[:100] + "...")

### Step 5.2: Insert All Rows
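
A sketch of one way to insert the rows, assuming a DB-API connection such as the one returned by `mysql.connector.connect`. The helpers `dataframe_to_rows` and `insert_rows`, and the `batch_size` parameter, are hypothetical names introduced here. The first helper converts the DataFrame to plain Python tuples and maps missing values to `None`, which the connector sends as SQL `NULL`; the second sends the rows in batches with `cursor.executemany`.

```python
import numpy as np
import pandas as pd

def dataframe_to_rows(frame):
    """Convert a DataFrame to a list of tuples, mapping missing values to None."""
    rows = []
    for record in frame.itertuples(index=False, name=None):
        converted = []
        for value in record:
            if pd.isna(value):
                converted.append(None)
            elif hasattr(value, "item"):
                # Unwrap NumPy scalars into plain Python numbers
                converted.append(value.item())
            else:
                converted.append(value)
        rows.append(tuple(converted))
    return rows

def insert_rows(connection, insert_sql, rows, batch_size=500):
    """Insert rows in batches with executemany and commit once at the end."""
    cursor = connection.cursor()
    try:
        for start in range(0, len(rows), batch_size):
            cursor.executemany(insert_sql, rows[start:start + batch_size])
        connection.commit()
    finally:
        cursor.close()
    return len(rows)

# Hypothetical two-row frame to show the conversion
demo = pd.DataFrame({
    "food_id": [1, 2],
    "food": ["apple", "banana"],
    "fat": [0.2, np.nan],
})
demo_rows = dataframe_to_rows(demo)
print(demo_rows)
```

In the lab this might be called as `insert_rows(connection, insert_sql, dataframe_to_rows(df_clean))` with an open connection, followed by closing the connection.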

In [None]:
# Insert data into MySQL



---
## Part 6: Verify Your Data with SQL Queries

### Step 6.1: Create a Helper Function for Queries

In [None]:
def run_query(sql, fetch=True):
    """
    Run a SQL query and return results as a DataFrame.
    
    Parameters:
    -----------
    sql : str
        SQL query to execute
    fetch : bool
        Whether to fetch results (False for INSERT/UPDATE/DELETE)
    
    Returns:
    --------
    pd.DataFrame or None
    """
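    connection = None
    try:
        connection = mysql.connector.connect(**DB_CONFIG)

        if fetch:
            # pandas may warn that it only officially supports SQLAlchemy
            # connections; the query still runs with mysql.connector
            return pd.read_sql(sql, connection)

        cursor = connection.cursor()
        cursor.execute(sql)
        connection.commit()
        cursor.close()
        return None

    except Error as e:
        print(f"Error: {e}")
        return None

    finally:
        # Close the connection even when the query raises an error
        if connection is not None and connection.is_connected():
            connection.close()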

print("Query helper function created!")

### Step 6.2: Count Total Rows

In [None]:
# Query 1: Count total rows
query = "SELECT COUNT(*) AS total_foods FROM food_nutrition"

result = run_query(query)
print("Total number of food items:")
print(result)

### Step 6.3: View Sample Data

In [None]:
# Query 2: View first 10 rows


---
## Submission Checklist

Before submitting, make sure you have:

- [ ] Successfully connected to MySQL
- [ ] Loaded and cleaned the CSV data
- [ ] Created the `food_nutrition` table
- [ ] Inserted all 551 rows
- [ ] Verified data with the provided queries

### How to Submit

1. Save this notebook
2. Export as PDF or HTML
3. Submit via Canvas by end of day January 12, 2026

---
