# FRE 521D: Data Analytics in Climate, Food and Environment
## Lab 1: Loading Data into MySQL

**Program:** UBC Master of Food and Resource Economics  
**Instructor:** Asif Ahmed Neloy

---

<div style="background-color: #FFF3CD; border-left: 4px solid #E6A23C; padding: 15px; margin: 15px 0;">
    <h3 style="margin-top: 0; color: #856404;">Submission Deadline</h3>
    <p style="margin-bottom: 0; font-size: 1.2em;"><strong>End of Day: Sunday, January 12, 2026</strong></p>
</div>

---

## Lab Objectives

In this lab, you will:

1. Load a CSV file containing food nutrition data
2. Clean and prepare the data for database storage
3. Create a MySQL table with the correct schema
4. Insert the data into your database
5. Run basic SQL queries to verify your work

---

## Before You Start

Make sure you have:

1. Docker Desktop running
2. MySQL container started (`docker compose up -d` in your course folder)
3. The `FOOD-DATA-GROUP1.csv` file downloaded to your working directory
4. Your conda environment activated

---

## Part 1: Setup and Connect to Database

### Step 1.1: Import Required Libraries

In [5]:
!pip install mysql-connector-python --quiet





In [1]:
# Import libraries
import pandas as pd
import mysql.connector
from mysql.connector import Error

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Libraries imported successfully!")

Libraries imported successfully!


### Step 1.2: Connect to MySQL

Use the connection details from your `docker-compose.yml` file.

In [2]:
# Database connection settings
# These match your docker-compose.yml configuration

DB_CONFIG = {
    'host': '127.0.0.1',
    'port': 3306,
    'database': 'mfre521d',
    'user': 'mfre521d_user',
    'password': 'mfre521d_user_pw'
}

# Test the connection
try:
    connection = mysql.connector.connect(**DB_CONFIG)
    if connection.is_connected():
        print("Successfully connected to MySQL!")
        print(f"Database: {DB_CONFIG['database']}")
        connection.close()
except Error as e:
    print(f"Error connecting to MySQL: {e}")
    print("\nTroubleshooting:")
    print("1. Is Docker Desktop running?")
    print("2. Run 'docker ps' to check if the container is running")
    print("3. Run 'docker compose up -d' in your course folder")

Successfully connected to MySQL!
Database: mfre521d
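
The connection settings above are hard-coded to match the course `docker-compose.yml`. If you ever point the notebook at a different database, one option is to let environment variables override those defaults. The following is a minimal sketch under that assumption: the `MFRE_DB_*` variable names and the `load_db_config` helper are hypothetical and not part of the lab setup.

```python
import os

# Defaults copied from the lab's DB_CONFIG. The MFRE_DB_* variable names
# are hypothetical and chosen only for this sketch.
DEFAULTS = {
    'host': '127.0.0.1',
    'port': '3306',
    'database': 'mfre521d',
    'user': 'mfre521d_user',
    'password': 'mfre521d_user_pw',
}

def load_db_config(env=None):
    """Build a connection dict, letting environment variables override defaults."""
    env = os.environ if env is None else env
    config = {
        key: env.get(f"MFRE_DB_{key.upper()}", default)
        for key, default in DEFAULTS.items()
    }
    config['port'] = int(config['port'])  # keep the port an integer, as in DB_CONFIG
    return config

# Print everything except the password
config_from_env = load_db_config()
print({k: v for k, v in config_from_env.items() if k != 'password'})
```

With no `MFRE_DB_*` variables set, the result is the same dictionary as `DB_CONFIG`, so it could be passed to `mysql.connector.connect(**config_from_env)` in the same way.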


---
## Part 2: Load and Explore the CSV File

### Step 2.1: Load the CSV File

The file `FOOD-DATA-GROUP1.csv` contains nutrition information for various food items.

In [3]:
# Load the CSV file
# Update the path if your file is in a different location

df = pd.read_csv('FOOD-DATA-GROUP1.csv')

print("Data loaded successfully!")
print(f"Shape: {df.shape[0]} rows, {df.shape[1]} columns")

Data loaded successfully!
Shape: 551 rows, 37 columns


### Step 2.2: Explore the Data

Let's look at what we have.

In [4]:
# View the first few rows
df.head()

,Unnamed: 0.1,Unnamed: 0,food,Caloric Value,Fat,Saturated Fats,Monounsaturated Fats,Polyunsaturated Fats,Carbohydrates,Sugars,Protein,Dietary Fiber,Cholesterol,Sodium,Water,Vitamin A,Vitamin B1,Vitamin B11,Vitamin B12,Vitamin B2,Vitamin B3,Vitamin B5,Vitamin B6,Vitamin C,Vitamin D,Vitamin E,Vitamin K,Calcium,Copper,Iron,Magnesium,Manganese,Phosphorus,Potassium,Selenium,Zinc,Nutrition Density
0,0,0,cream cheese,51,5.0,2.9,1.3,0.2,0.8,0.5,0.9,0.0,14.6,0.016,7.6,0.2,0.033,0.064,0.092,0.097,0.084,0.052,0.096,0.004,0.0,0.0,0.1,0.008,14.1,0.082,0.027,1.3,0.091,15.5,19.1,0.039,7.07
1,1,1,neufchatel cheese,215,19.4,10.9,4.9,0.8,3.1,2.7,7.8,0.0,62.9,0.3,53.6,0.2,0.099,0.079,0.09,0.1,0.2,0.5,0.078,0.0,0.0,0.3,0.045,99.5,0.034,0.1,8.5,0.088,117.3,129.2,0.054,0.7,130.1
2,2,2,requeijao cremoso light catupiry,49,3.6,2.3,0.9,0.0,0.9,3.4,0.8,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.4
3,3,3,ricotta cheese,30,2.0,1.3,0.5,0.002,1.5,0.091,1.5,0.0,9.8,0.017,14.7,0.075,0.019,0.079,0.091,0.027,0.041,0.016,0.007,0.006,0.0,0.001,0.011,0.097,41.2,0.097,0.096,4.0,0.024,30.8,43.8,0.035,5.196
4,4,4,cream cheese low fat,30,2.3,1.4,0.6,0.042,1.2,0.9,1.2,0.0,8.1,0.046,10.0,0.016,0.08,0.062,0.049,0.026,0.08,0.1,0.003,0.0,0.036,0.009,0.019,22.2,0.072,0.008,1.2,0.098,22.8,37.1,0.034,0.053,27.007


In [5]:
# View column names
print("Column names:")
for i, col in enumerate(df.columns):
    print(f"  {i}: {col}")

Column names:
  0: Unnamed: 0.1
  1: Unnamed: 0
  2: food
  3: Caloric Value
  4: Fat
  5: Saturated Fats
  6: Monounsaturated Fats
  7: Polyunsaturated Fats
  8: Carbohydrates
  9: Sugars
  10: Protein
  11: Dietary Fiber
  12: Cholesterol
  13: Sodium
  14: Water
  15: Vitamin A
  16: Vitamin B1
  17: Vitamin B11
  18: Vitamin B12
  19: Vitamin B2
  20: Vitamin B3
  21: Vitamin B5
  22: Vitamin B6
  23: Vitamin C
  24: Vitamin D
  25: Vitamin E
  26: Vitamin K
  27: Calcium
  28: Copper
  29: Iron
  30: Magnesium
  31: Manganese
  32: Phosphorus
  33: Potassium
  34: Selenium
  35: Zinc
  36: Nutrition Density


In [6]:
# Check data types
print("Data types:")
print(df.dtypes)

Data types:
Unnamed: 0.1              int64
Unnamed: 0                int64
food                     object
Caloric Value             int64
Fat                     float64
Saturated Fats          float64
Monounsaturated Fats    float64
Polyunsaturated Fats    float64
Carbohydrates           float64
Sugars                  float64
Protein                 float64
Dietary Fiber           float64
Cholesterol             float64
Sodium                  float64
Water                   float64
Vitamin A               float64
Vitamin B1              float64
Vitamin B11             float64
Vitamin B12             float64
Vitamin B2              float64
Vitamin B3              float64
Vitamin B5              float64
Vitamin B6              float64
Vitamin C               float64
Vitamin D               float64
Vitamin E               float64
Vitamin K               float64
Calcium                 float64
Copper                  float64
Iron                    float64
Magnesium               floa

### Step 2.3: Understanding the Dataset

This dataset contains:
- **food**: Name of the food item
- **Caloric Value**: Calories per serving
- **Macronutrients**: Fat, Carbohydrates, Protein, etc.
- **Vitamins**: A, B1, B2, B3, B5, B6, B11, B12, C, D, E, K
- **Minerals**: Calcium, Iron, Magnesium, Zinc, etc.
- **Nutrition Density**: Overall nutritional score

Notice that the first two columns (`Unnamed: 0.1` and `Unnamed: 0`) only repeat the row index. They carry no nutrition information, so we will remove them.

---
## Part 3: Clean the Data

### Step 3.1: Remove Unnecessary Columns

The first two columns are just index columns that we don't need.

In [7]:
# Look at the first two columns, shown next to 'food' for context
print("First two columns:")
print(df.iloc[:5, :3])

First two columns:
   Unnamed: 0.1  Unnamed: 0                              food
0             0           0                      cream cheese
1             1           1                 neufchatel cheese
2             2           2  requeijao cremoso light catupiry
3             3           3                    ricotta cheese
4             4           4              cream cheese low fat


In [8]:
# Remove the first two columns (they are just row indices)
# We keep everything from 'food' onwards

df_clean = df.drop(columns=[df.columns[0], df.columns[1]])

print(f"Columns before cleaning: {df.shape[1]}")
print(f"Columns after cleaning: {df_clean.shape[1]}")
print(f"\nRemaining columns:")
print(df_clean.columns.tolist())

Columns before cleaning: 37
Columns after cleaning: 35

Remaining columns:
['food', 'Caloric Value', 'Fat', 'Saturated Fats', 'Monounsaturated Fats', 'Polyunsaturated Fats', 'Carbohydrates', 'Sugars', 'Protein', 'Dietary Fiber', 'Cholesterol', 'Sodium', 'Water', 'Vitamin A', 'Vitamin B1', 'Vitamin B11', 'Vitamin B12', 'Vitamin B2', 'Vitamin B3', 'Vitamin B5', 'Vitamin B6', 'Vitamin C', 'Vitamin D', 'Vitamin E', 'Vitamin K', 'Calcium', 'Copper', 'Iron', 'Magnesium', 'Manganese', 'Phosphorus', 'Potassium', 'Selenium', 'Zinc', 'Nutrition Density']


### Step 3.2: Clean Column Names for MySQL

MySQL column names work best when they:
- Are lowercase
- Use underscores instead of spaces
- Don't have special characters

In [9]:
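# Clean column names
import re

def clean_column_name(name):
    """
    Convert column name to MySQL-friendly format:
    - lowercase
    - spaces to underscores
    - remove special characters
    """
    clean = name.strip().lower()
    clean = clean.replace(' ', '_')
    # Drop anything that is not a lowercase letter, digit, or underscore
    clean = re.sub(r'[^a-z0-9_]', '', clean)
    return clean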

# Apply to all columns
df_clean.columns = [clean_column_name(col) for col in df_clean.columns]

print("Cleaned column names:")
for col in df_clean.columns:
    print(f"  {col}")

Cleaned column names:
  food
  caloric_value
  fat
  saturated_fats
  monounsaturated_fats
  polyunsaturated_fats
  carbohydrates
  sugars
  protein
  dietary_fiber
  cholesterol
  sodium
  water
  vitamin_a
  vitamin_b1
  vitamin_b11
  vitamin_b12
  vitamin_b2
  vitamin_b3
  vitamin_b5
  vitamin_b6
  vitamin_c
  vitamin_d
  vitamin_e
  vitamin_k
  calcium
  copper
  iron
  magnesium
  manganese
  phosphorus
  potassium
  selenium
  zinc
  nutrition_density
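
Because the cleaned names are later pasted directly into the `CREATE TABLE` and `INSERT` statements, it can help to confirm that every name is safe and unique before building any SQL. The sketch below uses a deliberately conservative pattern, stricter than what MySQL itself accepts. The `find_unsafe_names` helper and the sample list are illustrative assumptions, not part of the lab.

```python
import re

# Conservative rule: a lowercase letter or underscore first, then lowercase
# letters, digits, or underscores. MySQL permits more than this.
SAFE_IDENTIFIER = re.compile(r'[a-z_][a-z0-9_]*')

def find_unsafe_names(names):
    """Return names that fail the conservative pattern or repeat an earlier name."""
    seen = set()
    problems = []
    for name in names:
        if not SAFE_IDENTIFIER.fullmatch(name) or name in seen:
            problems.append(name)
        seen.add(name)
    return problems

# Made-up sample: three good names, two badly formatted ones, one duplicate
sample = ['food', 'caloric_value', 'vitamin_b12', 'Vitamin C', 'zinc (mg)', 'food']
print(find_unsafe_names(sample))  # ['Vitamin C', 'zinc (mg)', 'food']
```

In the notebook, calling `find_unsafe_names(df_clean.columns)` after the cleaning step should return an empty list for this dataset.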


### Step 3.3: Add a Primary Key Column

Every table should have a primary key. We'll add a `food_id` column.

In [10]:
# Add food_id as the first column (starting from 1)
df_clean.insert(0, 'food_id', range(1, len(df_clean) + 1))

print("Added food_id column:")
df_clean[['food_id', 'food', 'caloric_value', 'protein']].head()

Added food_id column:


|   | food_id | food | caloric_value | protein |
|---|---|---|---|---|
| 0 | 1 | cream cheese | 51 | 0.9 |
| 1 | 2 | neufchatel cheese | 215 | 7.8 |
| 2 | 3 | requeijao cremoso light catupiry | 49 | 0.8 |
| 3 | 4 | ricotta cheese | 30 | 1.5 |
| 4 | 5 | cream cheese low fat | 30 | 1.2 |
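
A primary key must be unique and must not be empty, otherwise MySQL rejects the offending rows at insert time. The IDs generated with `range` satisfy both conditions by construction, but if the IDs came from the file itself, a quick check like the sketch below could catch problems early. The `primary_key_problems` helper is a hypothetical name used only for illustration.

```python
from collections import Counter

def primary_key_problems(ids):
    """Count missing IDs and list any ID values that appear more than once."""
    ids = list(ids)
    missing = sum(1 for value in ids if value is None)
    counts = Counter(value for value in ids if value is not None)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return {'missing': missing, 'duplicates': duplicates}

# IDs built the same way as in this lab: 1 through 551
print(primary_key_problems(range(1, 552)))      # {'missing': 0, 'duplicates': []}

# A made-up bad example with one repeated ID and one missing ID
print(primary_key_problems([1, 2, 2, None, 5]))  # {'missing': 1, 'duplicates': [2]}
```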


### Step 3.4: Check for Missing Values

In [11]:
# Check for missing values
missing = df_clean.isnull().sum()
missing_cols = missing[missing > 0]

if len(missing_cols) > 0:
    print("Columns with missing values:")
    print(missing_cols)
else:
    print("No missing values found!")

No missing values found!
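
This dataset has no missing values, so no further action is needed here. If a future file did contain them, pandas would store them as `NaN`, which a `DECIMAL` column cannot hold. Python's `None`, by contrast, is sent to MySQL as `NULL`. The sketch below shows one way to convert a row before inserting it. The `nan_to_none` helper and the "mystery cheese" row are made up for illustration.

```python
import math

def nan_to_none(row):
    """Return a copy of the row with float NaN values replaced by None."""
    return tuple(
        None if isinstance(value, float) and math.isnan(value) else value
        for value in row
    )

rows = [
    (1, 'cream cheese', 51, 5.0),
    (2, 'mystery cheese', 40, float('nan')),  # invented row with a missing fat value
]
cleaned = [nan_to_none(row) for row in rows]
print(cleaned)
# [(1, 'cream cheese', 51, 5.0), (2, 'mystery cheese', 40, None)]
```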


### Step 3.5: View the Cleaned Data

In [12]:
# Final check of the cleaned data
print(f"Final dataset shape: {df_clean.shape}")
print(f"\nFirst 5 rows:")
df_clean.head()

Final dataset shape: (551, 36)

First 5 rows:


,food_id,food,caloric_value,fat,saturated_fats,monounsaturated_fats,polyunsaturated_fats,carbohydrates,sugars,protein,dietary_fiber,cholesterol,sodium,water,vitamin_a,vitamin_b1,vitamin_b11,vitamin_b12,vitamin_b2,vitamin_b3,vitamin_b5,vitamin_b6,vitamin_c,vitamin_d,vitamin_e,vitamin_k,calcium,copper,iron,magnesium,manganese,phosphorus,potassium,selenium,zinc,nutrition_density
0,1,cream cheese,51,5.0,2.9,1.3,0.2,0.8,0.5,0.9,0.0,14.6,0.016,7.6,0.2,0.033,0.064,0.092,0.097,0.084,0.052,0.096,0.004,0.0,0.0,0.1,0.008,14.1,0.082,0.027,1.3,0.091,15.5,19.1,0.039,7.07
1,2,neufchatel cheese,215,19.4,10.9,4.9,0.8,3.1,2.7,7.8,0.0,62.9,0.3,53.6,0.2,0.099,0.079,0.09,0.1,0.2,0.5,0.078,0.0,0.0,0.3,0.045,99.5,0.034,0.1,8.5,0.088,117.3,129.2,0.054,0.7,130.1
2,3,requeijao cremoso light catupiry,49,3.6,2.3,0.9,0.0,0.9,3.4,0.8,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.4
3,4,ricotta cheese,30,2.0,1.3,0.5,0.002,1.5,0.091,1.5,0.0,9.8,0.017,14.7,0.075,0.019,0.079,0.091,0.027,0.041,0.016,0.007,0.006,0.0,0.001,0.011,0.097,41.2,0.097,0.096,4.0,0.024,30.8,43.8,0.035,5.196
4,5,cream cheese low fat,30,2.3,1.4,0.6,0.042,1.2,0.9,1.2,0.0,8.1,0.046,10.0,0.016,0.08,0.062,0.049,0.026,0.08,0.1,0.003,0.0,0.036,0.009,0.019,22.2,0.072,0.008,1.2,0.098,22.8,37.1,0.034,0.053,27.007


In [13]:
# Check data types
print("Data types:")
print(df_clean.dtypes)

Data types:
food_id                   int64
food                     object
caloric_value             int64
fat                     float64
saturated_fats          float64
monounsaturated_fats    float64
polyunsaturated_fats    float64
carbohydrates           float64
sugars                  float64
protein                 float64
dietary_fiber           float64
cholesterol             float64
sodium                  float64
water                   float64
vitamin_a               float64
vitamin_b1              float64
vitamin_b11             float64
vitamin_b12             float64
vitamin_b2              float64
vitamin_b3              float64
vitamin_b5              float64
vitamin_b6              float64
vitamin_c               float64
vitamin_d               float64
vitamin_e               float64
vitamin_k               float64
calcium                 float64
copper                  float64
iron                    float64
magnesium               float64
manganese               floa

---
## Part 4: Create MySQL Table

### Step 4.1: Generate CREATE TABLE Statement

We need to create a table with the correct column types.

In [16]:
# Generate CREATE TABLE statement

def get_mysql_type(pandas_dtype, column_name):
    """
    Convert pandas dtype to MySQL type.
    """
    if column_name == 'food_id':
        return 'INT PRIMARY KEY'
    elif column_name == 'food':
        return 'VARCHAR(255)'
    elif 'int' in str(pandas_dtype):
        return 'INT'
    elif 'float' in str(pandas_dtype):
        return 'DECIMAL(10, 3)'
    else:
        return 'VARCHAR(255)'

# Build the CREATE TABLE statement
table_name = "food_nutrition"
columns_sql = []

for col in df_clean.columns:
    mysql_type = get_mysql_type(df_clean[col].dtype, col)
    columns_sql.append(f"    {col} {mysql_type}")

columns_block = ",\n".join(columns_sql)

create_table_sql = f"""CREATE TABLE IF NOT EXISTS {table_name} (
{columns_block}
);"""

print("CREATE TABLE Statement:")
print("=" * 60)
print(create_table_sql)


CREATE TABLE Statement:
CREATE TABLE IF NOT EXISTS food_nutrition (
    food_id INT PRIMARY KEY,
    food VARCHAR(255),
    caloric_value INT,
    fat DECIMAL(10, 3),
    saturated_fats DECIMAL(10, 3),
    monounsaturated_fats DECIMAL(10, 3),
    polyunsaturated_fats DECIMAL(10, 3),
    carbohydrates DECIMAL(10, 3),
    sugars DECIMAL(10, 3),
    protein DECIMAL(10, 3),
    dietary_fiber DECIMAL(10, 3),
    cholesterol DECIMAL(10, 3),
    sodium DECIMAL(10, 3),
    water DECIMAL(10, 3),
    vitamin_a DECIMAL(10, 3),
    vitamin_b1 DECIMAL(10, 3),
    vitamin_b11 DECIMAL(10, 3),
    vitamin_b12 DECIMAL(10, 3),
    vitamin_b2 DECIMAL(10, 3),
    vitamin_b3 DECIMAL(10, 3),
    vitamin_b5 DECIMAL(10, 3),
    vitamin_b6 DECIMAL(10, 3),
    vitamin_c DECIMAL(10, 3),
    vitamin_d DECIMAL(10, 3),
    vitamin_e DECIMAL(10, 3),
    vitamin_k DECIMAL(10, 3),
    calcium DECIMAL(10, 3),
    copper DECIMAL(10, 3),
    iron DECIMAL(10, 3),
    magnesium DECIMAL(10, 3),
    manganese DECIMAL(10, 3),
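
The `DECIMAL(10, 3)` type chosen for the float columns stores at most ten digits in total, three of them after the decimal point. Values with more than three decimal places are therefore rounded when stored, and values of 10,000,000 or more do not fit. The sketch below previews that effect in Python. It assumes MySQL rounds halves away from zero for exact decimal values, and the `preview_decimal_10_3` helper is a hypothetical name. The last two test values are invented to trigger rounding and overflow.

```python
from decimal import Decimal, ROUND_HALF_UP

MAX_ABS = Decimal('9999999.999')  # largest magnitude DECIMAL(10, 3) can hold

def preview_decimal_10_3(value):
    """Return (rounded_value, was_changed, in_range) for one number."""
    exact = Decimal(str(value))
    rounded = exact.quantize(Decimal('0.001'), rounding=ROUND_HALF_UP)
    return rounded, rounded != exact, abs(rounded) <= MAX_ABS

# Two values from the dataset preview, then two invented edge cases
for value in [0.091, 27.007, 0.0915, 12345678.9]:
    print(value, preview_decimal_10_3(value))
```

The values shown in the data preview have at most three decimal places, so they should be stored unchanged.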

### Step 4.2: Create the Table in MySQL

In [17]:
# Connect and create the table

try:
    connection = mysql.connector.connect(**DB_CONFIG)
    cursor = connection.cursor()
    
    # Drop table if it exists (for clean re-runs)
    cursor.execute(f"DROP TABLE IF EXISTS {table_name}")
    print(f"Dropped existing table '{table_name}' if it existed.")
    
    # Create the table
    cursor.execute(create_table_sql)
    print(f"Table '{table_name}' created successfully!")
    
    connection.commit()
    cursor.close()
    connection.close()
    
except Error as e:
    print(f"Error: {e}")

Dropped existing table 'food_nutrition' if it existed.
Table 'food_nutrition' created successfully!


---
## Part 5: Insert Data into MySQL

### Step 5.1: Prepare the INSERT Statement

In [18]:
# Create INSERT statement template
columns_list = ', '.join(df_clean.columns)
placeholders = ', '.join(['%s'] * len(df_clean.columns))

insert_sql = f"INSERT INTO {table_name} ({columns_list}) VALUES ({placeholders})"

print("INSERT statement template:")
print(insert_sql[:100] + "...")

INSERT statement template:
INSERT INTO food_nutrition (food_id, food, caloric_value, fat, saturated_fats, monounsaturated_fats,...


### Step 5.2: Insert All Rows

In [19]:
# Insert data into MySQL

try:
    connection = mysql.connector.connect(**DB_CONFIG)
    cursor = connection.cursor()
    
    # Convert DataFrame to list of tuples
    data_tuples = [tuple(row) for row in df_clean.values]
    
    # Insert all rows
    cursor.executemany(insert_sql, data_tuples)
    
    # Commit the transaction
    connection.commit()
    
    print(f"Successfully inserted {cursor.rowcount} rows into '{table_name}'!")
    
    cursor.close()
    connection.close()
    
except Error as e:
    print(f"Error: {e}")

Successfully inserted 551 rows into 'food_nutrition'!
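
A single `executemany` call is fine for 551 rows. For much larger files, a common approach is to send the rows in batches so that one failure does not discard everything and memory use stays bounded. The sketch below shows only the batching logic. The `chunked` helper, the stand-in rows, and the batch size of 200 are assumptions for illustration.

```python
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Stand-in for data_tuples; 551 matches the row count in this lab
fake_rows = [(i, f"food {i}") for i in range(1, 552)]
batch_sizes = [len(batch) for batch in chunked(fake_rows, 200)]
print(batch_sizes)  # [200, 200, 151]
```

In the notebook, each batch could be passed to `cursor.executemany(insert_sql, batch)` inside a loop over `chunked(data_tuples, 200)`, followed by one `connection.commit()` at the end.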


---
## Part 6: Verify Your Data with SQL Queries

### Step 6.1: Create a Helper Function for Queries

In [20]:
def run_query(sql, fetch=True):
    """
    Run a SQL query and return results as a DataFrame.
    
    Parameters:
    -----------
    sql : str
        SQL query to execute
    fetch : bool
        Whether to fetch results (False for INSERT/UPDATE/DELETE)
    
    Returns:
    --------
    pd.DataFrame or None
    """
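    connection = None
    try:
        connection = mysql.connector.connect(**DB_CONFIG)
        
        if fetch:
            # pandas may warn that raw DBAPI connections are not officially tested
            return pd.read_sql(sql, connection)
        
        cursor = connection.cursor()
        cursor.execute(sql)
        connection.commit()
        cursor.close()
        return None
            
    except Error as e:
        print(f"Error: {e}")
        return None
    
    finally:
        # Always release the connection, even if the query failed
        if connection is not None and connection.is_connected():
            connection.close()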

print("Query helper function created!")

Query helper function created!


### Step 6.2: Count Total Rows

In [21]:
# Query 1: Count total rows
query = "SELECT COUNT(*) AS total_foods FROM food_nutrition"

result = run_query(query)
print("Total number of food items:")
print(result)

Total number of food items:
   total_foods
0          551




### Step 6.3: View Sample Data

In [22]:
# Query 2: View first 10 rows
query = """
SELECT food_id, food, caloric_value, protein, fat, carbohydrates
FROM food_nutrition
LIMIT 10
"""

result = run_query(query)
print("First 10 food items:")
result

First 10 food items:




|   | food_id | food | caloric_value | protein | fat | carbohydrates |
|---|---|---|---|---|---|---|
| 0 | 1 | cream cheese | 51 | 0.9 | 5.0 | 0.8 |
| 1 | 2 | neufchatel cheese | 215 | 7.8 | 19.4 | 3.1 |
| 2 | 3 | requeijao cremoso light catupiry | 49 | 0.8 | 3.6 | 0.9 |
| 3 | 4 | ricotta cheese | 30 | 1.5 | 2.0 | 1.5 |
| 4 | 5 | cream cheese low fat | 30 | 1.2 | 2.3 | 1.2 |
| 5 | 6 | cream cheese fat free | 19 | 2.8 | 0.2 | 1.4 |
| 6 | 7 | gruyere cheese | 116 | 8.3 | 9.1 | 0.1 |
| 7 | 8 | cheddar cheese | 113 | 6.4 | 9.3 | 0.9 |
| 8 | 9 | parmesan cheese | 71 | 6.4 | 4.5 | 0.6 |
| 9 | 10 | romano cheese | 19 | 1.6 | 1.3 | 0.2 |


---
## Submission Checklist

Before submitting, make sure you have:

- [ ] Successfully connected to MySQL
- [ ] Loaded and cleaned the CSV data
- [ ] Created the `food_nutrition` table
- [ ] Inserted all 551 rows
- [ ] Verified data with the provided queries

### How to Submit

1. Save this notebook
2. Export as PDF or HTML
3. Submit via Canvas by end of day January 12, 2026

---
