# Module 0 Lab: Environment Setup and First Database

Welcome to your first hands-on lab! This lab will verify your setup and create your first SQLite database using Python.

## Lab Objectives
By the end of this lab, you will have:
1. Verified all required packages are working
2. Created your first SQLite database
3. Performed basic database operations
4. Visualized simple data from your database

**Estimated Time:** 30-45 minutes

## Section 1: Environment Verification

Let's start by verifying that all our required packages are installed and working correctly.

In [None]:
# Test 1: Import all required packages
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
from datetime import datetime

print("✅ All packages imported successfully!")
print(f"Python version: {sys.version}")
print(f"SQLite version: {sqlite3.sqlite_version}")
print(f"Pandas version: {pd.__version__}")
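The cell above stops at the first failed import, so one missing package hides the status of all the others. Here is a minimal sketch that reports on each package separately. `check_packages` is a hypothetical helper, not part of any library:

```python
import importlib


def check_packages(names):
    """Import each module by name and return {name: version or error message}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            report[name] = f"MISSING ({exc})"
        else:
            # Not every module defines __version__, so fall back to a plain marker.
            report[name] = getattr(module, "__version__", "installed")
    return report


for name, status in check_packages(["sqlite3", "pandas", "matplotlib", "seaborn"]).items():
    print(f"{name:<12} {status}")
```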

In [None]:
# Test 2: Basic SQLite functionality
try:
    # Create an in-memory database for testing
    test_conn = sqlite3.connect(':memory:')
    test_cursor = test_conn.cursor()
    
    # Create a simple table
    test_cursor.execute('''CREATE TABLE test (id INTEGER, message TEXT)''')
    
    # Insert test data
    test_cursor.execute('''INSERT INTO test VALUES (1, 'Hello SQLite!')''')
    
    # Retrieve data
    test_cursor.execute('''SELECT * FROM test''')
    result = test_cursor.fetchone()
    
    test_conn.close()
    
    print("✅ SQLite basic operations working!")
    print(f"Test query result: {result}")
    
except Exception as e:
    print(f"❌ SQLite test failed: {e}")
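In Test 2, `test_conn.close()` only runs if every statement before it succeeds. The sketch below runs the same check but always closes the connection. `contextlib.closing` calls `close()` on exit. By contrast, `with conn:` on its own only commits (or rolls back on error) and leaves the connection open. `run_smoke_test` is a hypothetical name:

```python
import sqlite3
from contextlib import closing


def run_smoke_test():
    # closing() guarantees conn.close() even if a statement raises.
    with closing(sqlite3.connect(":memory:")) as conn:
        # "with conn" wraps the writes in a transaction: commit on success,
        # rollback on error. It does NOT close the connection by itself.
        with conn:
            conn.execute("CREATE TABLE test (id INTEGER, message TEXT)")
            conn.execute("INSERT INTO test VALUES (?, ?)", (1, "Hello SQLite!"))
        return conn.execute("SELECT * FROM test").fetchone()


print(run_smoke_test())  # (1, 'Hello SQLite!')
```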

In [None]:
# Test 3: Pandas-SQLite integration
try:
    # Create test data
    test_data = pd.DataFrame({
        'id': [1, 2, 3],
        'name': ['Alice', 'Bob', 'Charlie'],
        'score': [85, 92, 78]
    })
    
    # Create in-memory database
    conn = sqlite3.connect(':memory:')
    
    # Write DataFrame to SQLite
    test_data.to_sql('students', conn, index=False)
    
    # Read back from SQLite
    result_df = pd.read_sql('SELECT * FROM students ORDER BY score DESC', conn)
    
    conn.close()
    
    print("✅ Pandas-SQLite integration working!")
    print("\nTest data retrieved from database:")
    print(result_df)
    
except Exception as e:
    print(f"❌ Pandas-SQLite integration test failed: {e}")
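Test 3 works because it always starts from a fresh in-memory database. With a persistent file, a second `to_sql` call on an existing table raises `ValueError` under the default `if_exists="fail"`. The sketch below reuses the test data to show the `if_exists` option of `to_sql` and the `params` option of `pd.read_sql`:

```python
import sqlite3
from contextlib import closing

import pandas as pd

scores = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [85, 92, 78],
})

with closing(sqlite3.connect(":memory:")) as conn:
    scores.to_sql("students", conn, index=False)
    # Writing again to the same table: "append" adds rows,
    # "replace" would drop and recreate the table instead.
    scores.to_sql("students", conn, index=False, if_exists="append")
    # Pass values through params (bound to the ? placeholder), not string formatting.
    high = pd.read_sql(
        "SELECT name, score FROM students WHERE score >= ?", conn, params=(80,)
    )
    total_rows = pd.read_sql("SELECT COUNT(*) AS n FROM students", conn)["n"].iloc[0]

print(total_rows)  # 6
print(high)
```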

In [None]:
# Test 4: Matplotlib and Seaborn
try:
    # Create a simple test plot
    plt.figure(figsize=(8, 5))
    
    # Test data
    x = [1, 2, 3, 4, 5]
    y = [2, 5, 3, 8, 7]
    
    # Create subplot
    plt.subplot(1, 2, 1)
    plt.plot(x, y, 'bo-', linewidth=2, markersize=8)
    plt.title('Matplotlib Test', fontsize=12)
    plt.xlabel('X Values')
    plt.ylabel('Y Values')
    plt.grid(True, alpha=0.3)
    
    # Seaborn test (set the style before creating the axes so it applies to them)
    sns.set_style("whitegrid")
    plt.subplot(1, 2, 2)
    test_df = pd.DataFrame({'x': x, 'y': y})
    sns.scatterplot(data=test_df, x='x', y='y', s=100, color='red')
    plt.title('Seaborn Test', fontsize=12)
    
    plt.tight_layout()
    plt.show()
    
    print("✅ Matplotlib and Seaborn working!")
    
except Exception as e:
    print(f"❌ Plotting test failed: {e}")

## Section 2: Your First SQLite Database

Now let's create your first real SQLite database file and perform some operations with it.

In [None]:
# Create your first database file
database_name = 'my_first_database.db'

# Connect to the database (this creates the file if it doesn't exist)
conn = sqlite3.connect(database_name)
cursor = conn.cursor()

print(f"✅ Connected to database: {database_name}")
print(f"Connected at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
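The sketch below shows the "creates the file if it doesn't exist" behavior directly. It also shows how to guard against a typo in the path silently creating a new, empty database. It works in a temporary directory, so it does not touch `my_first_database.db`. It uses SQLite's URI `mode=ro` option, which opens an existing file read-only and fails if the file is missing:

```python
import os
import sqlite3
import tempfile
from pathlib import Path

workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "my_first_database.db")
print(os.path.exists(db_path))  # False

# Default mode: connecting to a missing file creates it.
conn = sqlite3.connect(db_path)
conn.close()
print(os.path.exists(db_path))  # True

# URI mode=ro: read-only, and no new file is created if the path is wrong.
missing = Path(workdir, "my_frist_database.db")  # deliberate typo
try:
    sqlite3.connect(missing.as_uri() + "?mode=ro", uri=True)
except sqlite3.OperationalError as exc:
    print("Refused to open:", exc)
print(missing.exists())  # False
```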

In [None]:
# Create your first table - a simple personal expense tracker
create_table_sql = '''
CREATE TABLE IF NOT EXISTS expenses (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    date TEXT NOT NULL,
    category TEXT NOT NULL,
    description TEXT,
    amount REAL NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
'''

cursor.execute(create_table_sql)
conn.commit()

print("✅ Created 'expenses' table successfully!")
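The schema above relies on two defaults that are easy to miss. An `INTEGER PRIMARY KEY` column receives an id automatically, and `CURRENT_TIMESTAMP` stores the current UTC time as text. The sketch below builds an in-memory copy of the same schema, inspects it with `PRAGMA table_info`, and shows what an insert that omits `id` and `created_at` produces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE IF NOT EXISTS expenses (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    date TEXT NOT NULL,
    category TEXT NOT NULL,
    description TEXT,
    amount REAL NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(expenses)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<12} {col_type:<10} notnull={notnull} pk={pk} default={default}")

# Columns left out of the INSERT get their defaults: SQLite assigns id, and
# created_at becomes the current UTC time as 'YYYY-MM-DD HH:MM:SS' text.
conn.execute(
    "INSERT INTO expenses (date, category, amount) VALUES (?, ?, ?)",
    ("2024-01-04", "Food", 9.99),
)
row = conn.execute("SELECT id, created_at, typeof(created_at) FROM expenses").fetchone()
print(row)
conn.close()
```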

In [None]:
# Insert some sample data (re-running this cell inserts the same rows again)
sample_expenses = [
    ('2024-01-01', 'Food', 'Breakfast at cafe', 12.50),
    ('2024-01-01', 'Transportation', 'Bus fare', 2.75),
    ('2024-01-01', 'Food', 'Lunch', 18.00),
    ('2024-01-02', 'Entertainment', 'Movie tickets', 25.00),
    ('2024-01-02', 'Food', 'Grocery shopping', 67.30),
    ('2024-01-03', 'Transportation', 'Gas', 45.00),
    ('2024-01-03', 'Food', 'Coffee', 4.50),
    ('2024-01-03', 'Entertainment', 'Book purchase', 15.99)
]

# Insert the data
insert_sql = 'INSERT INTO expenses (date, category, description, amount) VALUES (?, ?, ?, ?)'

cursor.executemany(insert_sql, sample_expenses)
conn.commit()

print(f"✅ Inserted {len(sample_expenses)} expense records!")
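Because the table is created with `IF NOT EXISTS`, re-running the insert cell against the same file adds all eight rows a second time. If you only want to seed an empty table, one option is to check the row count first. `seed_if_empty` is a hypothetical helper:

```python
import sqlite3


def seed_if_empty(conn, rows):
    """Insert rows into expenses only if the table is empty; return the number inserted."""
    (existing,) = conn.execute("SELECT COUNT(*) FROM expenses").fetchone()
    if existing:
        return 0
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT INTO expenses (date, category, description, amount) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses (id INTEGER PRIMARY KEY, date TEXT, "
    "category TEXT, description TEXT, amount REAL)"
)
rows = [
    ("2024-01-01", "Food", "Breakfast at cafe", 12.50),
    ("2024-01-01", "Transportation", "Bus fare", 2.75),
]
first = seed_if_empty(conn, rows)   # 2: table was empty
second = seed_if_empty(conn, rows)  # 0: rows already present
count = conn.execute("SELECT COUNT(*) FROM expenses").fetchone()[0]
print(first, second, count)  # 2 0 2
conn.close()
```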

In [None]:
# Query the data to verify it was inserted correctly
cursor.execute('SELECT * FROM expenses ORDER BY date, id')
all_expenses = cursor.fetchall()

print("📊 All expenses in the database:")
print(f"{'ID':<3} {'Date':<12} {'Category':<15} {'Description':<20} {'Amount':<8} {'Created At':<20}")
print("-" * 85)

for expense in all_expenses:
    # description is nullable; print NULL (None) as an empty string instead of crashing
    print(f"{expense[0]:<3} {expense[1]:<12} {expense[2]:<15} {expense[3] or '':<20} ${expense[4]:<7.2f} {expense[5]:<20}")
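Indexing rows by position (`expense[0]`, `expense[3]`) silently returns the wrong column if the column order ever changes. The standard library's `sqlite3.Row` row factory allows access by column name instead. The sketch below uses an in-memory table that includes a row whose nullable `description` is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be indexed by column name as well as position
conn.execute(
    "CREATE TABLE expenses (id INTEGER PRIMARY KEY, date TEXT, "
    "category TEXT, description TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO expenses (date, category, description, amount) VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "Food", "Breakfast at cafe", 12.50),
        ("2024-01-01", "Transportation", None, 2.75),  # NULL description
    ],
)

rows = conn.execute("SELECT * FROM expenses ORDER BY date, id").fetchall()
for row in rows:
    description = row["description"] or "-"  # NULL arrives as None
    print(f"{row['id']:<3} {row['date']:<12} {row['category']:<15} "
          f"{description:<20} ${row['amount']:.2f}")
conn.close()
```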

## Section 3: Basic Data Analysis

Let's perform some basic analysis on our expense data using SQL queries.

In [None]:
# Analysis 1: Total expenses
cursor.execute('SELECT SUM(amount) as total_expenses FROM expenses')
total = cursor.fetchone()[0]

print(f"💰 Total Expenses: ${total:.2f}")
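`SUM()` returns NULL, not 0, when there are no rows. On an empty table, `total` would therefore be `None`, and `f"{total:.2f}"` would raise a `TypeError`. The sketch below compares `SUM`, `COALESCE(SUM(...), 0)`, and SQLite's `TOTAL()` aggregate on an empty table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (amount REAL NOT NULL)")  # deliberately empty

sum_result = conn.execute("SELECT SUM(amount) FROM expenses").fetchone()[0]
coalesced = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM expenses").fetchone()[0]
total_result = conn.execute("SELECT TOTAL(amount) FROM expenses").fetchone()[0]
conn.close()

print(sum_result)    # None: SUM() over no rows is NULL
print(coalesced)     # 0: COALESCE swaps NULL for 0
print(total_result)  # 0.0: TOTAL() always returns a float
```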

In [None]:
# Analysis 2: Expenses by category
cursor.execute('''
SELECT category, 
       COUNT(*) as num_transactions,
       SUM(amount) as total_amount,
       AVG(amount) as avg_amount
FROM expenses 
GROUP BY category 
ORDER BY total_amount DESC
''')

category_analysis = cursor.fetchall()

print("📈 Expenses by Category:")
print(f"{'Category':<15} {'Count':<6} {'Total':<10} {'Average':<10}")
print("-" * 45)

for cat, count, total, avg in category_analysis:
    print(f"{cat:<15} {count:<6} ${total:<9.2f} ${avg:<9.2f}")
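To keep only some groups, such as categories whose total exceeds a threshold, put the filter in a `HAVING` clause rather than `WHERE`. `WHERE` filters individual rows before grouping, while `HAVING` filters on the aggregated value. The sketch below runs on the same eight sample rows, with an arbitrary threshold of 45:

```python
import sqlite3

sample_expenses = [
    ("2024-01-01", "Food", "Breakfast at cafe", 12.50),
    ("2024-01-01", "Transportation", "Bus fare", 2.75),
    ("2024-01-01", "Food", "Lunch", 18.00),
    ("2024-01-02", "Entertainment", "Movie tickets", 25.00),
    ("2024-01-02", "Food", "Grocery shopping", 67.30),
    ("2024-01-03", "Transportation", "Gas", 45.00),
    ("2024-01-03", "Food", "Coffee", 4.50),
    ("2024-01-03", "Entertainment", "Book purchase", 15.99),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (date TEXT, category TEXT, description TEXT, amount REAL)")
conn.executemany("INSERT INTO expenses VALUES (?, ?, ?, ?)", sample_expenses)

# WHERE filters rows before grouping; HAVING filters the groups after aggregation.
big_categories = conn.execute("""
    SELECT category, SUM(amount) AS total_amount
    FROM expenses
    GROUP BY category
    HAVING SUM(amount) > ?
    ORDER BY total_amount DESC
""", (45,)).fetchall()
conn.close()

for category, total_amount in big_categories:
    print(f"{category:<15} ${total_amount:.2f}")
```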

In [None]:
# Analysis 3: Daily expenses
cursor.execute('''
SELECT date, 
       COUNT(*) as num_transactions,
       SUM(amount) as daily_total
FROM expenses 
GROUP BY date 
ORDER BY date
''')

daily_analysis = cursor.fetchall()

print("📅 Daily Expense Summary:")
print(f"{'Date':<12} {'Transactions':<12} {'Total':<10}")
print("-" * 35)

for date, count, total in daily_analysis:
    print(f"{date:<12} {count:<12} ${total:<9.2f}")

## Section 4: Pandas Integration

Now let's use Pandas to work with our database data more effectively.

In [None]:
# Load data into a Pandas DataFrame
df = pd.read_sql_query('SELECT * FROM expenses', conn)

print("📊 Expense Data in Pandas DataFrame:")
print(df.head(10))
print(f"\nDataFrame shape: {df.shape}")
print(f"\nData types:")
print(df.dtypes)
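In the `dtypes` output, `date` comes back as a text column rather than a datetime, because SQLite has no dedicated date type. `pd.read_sql_query` can convert it while loading and bind query values safely. The sketch below uses the `params` and `parse_dates` arguments on a small in-memory table:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (date TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        ("2024-01-01", "Food", 12.50),
        ("2024-01-02", "Food", 67.30),
        ("2024-01-03", "Transportation", 45.00),
    ],
)

food = pd.read_sql_query(
    "SELECT date, amount FROM expenses WHERE category = ?",
    conn,
    params=("Food",),      # bound to the ? placeholder by sqlite3
    parse_dates=["date"],  # convert the TEXT dates to datetime64 on load
)
conn.close()

print(food.dtypes)
```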

In [None]:
# Pandas analysis: one-line summary statistics (quartiles, standard deviation) that take more work in plain SQL
print("📈 Pandas Statistical Summary:")
print(df['amount'].describe())

print("\n🏷️ Category Value Counts:")
print(df['category'].value_counts())

print("\n📊 Expense Summary by Category:")
category_summary = df.groupby('category')['amount'].agg(['count', 'sum', 'mean', 'std']).round(2)
print(category_summary)

## Section 5: Data Visualization

Finally, let's create some visualizations of our expense data.

In [None]:
# Create visualizations
plt.figure(figsize=(15, 10))

# 1. Expenses by Category (Bar Plot)
plt.subplot(2, 3, 1)
category_totals = df.groupby('category')['amount'].sum().sort_values(ascending=True)
category_totals.plot(kind='barh', color='steelblue')
plt.title('Total Expenses by Category', fontsize=12, fontweight='bold')
plt.xlabel('Amount ($)')
plt.grid(axis='x', alpha=0.3)

# 2. Daily Expenses (Line Plot)
plt.subplot(2, 3, 2)
daily_totals = df.groupby('date')['amount'].sum()
plt.plot(daily_totals.index, daily_totals.values, 'o-', linewidth=2, markersize=8, color='green')
plt.title('Daily Expense Trends', fontsize=12, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Amount ($)')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)

# 3. Expense Distribution (Histogram)
plt.subplot(2, 3, 3)
plt.hist(df['amount'], bins=8, color='orange', alpha=0.7, edgecolor='black')
plt.title('Expense Amount Distribution', fontsize=12, fontweight='bold')
plt.xlabel('Amount ($)')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)

# 4. Category Pie Chart
plt.subplot(2, 3, 4)
category_totals.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title('Expense Distribution by Category', fontsize=12, fontweight='bold')
plt.ylabel('')  # Remove default ylabel

# 5. Box Plot by Category
plt.subplot(2, 3, 5)
df.boxplot(column='amount', by='category', ax=plt.gca())
plt.title('Expense Amount by Category\n(Box Plot)', fontsize=12, fontweight='bold')
plt.suptitle('')  # Remove automatic suptitle
plt.xlabel('Category')
plt.ylabel('Amount ($)')
plt.xticks(rotation=45)

# 6. Seaborn Enhanced Scatter Plot
plt.subplot(2, 3, 6)
# Convert the text dates to datetimes so the x-axis is a real time axis
df['date_dt'] = pd.to_datetime(df['date'])
sns.scatterplot(data=df, x='date_dt', y='amount', hue='category', s=100, alpha=0.8)
plt.xticks(rotation=45)
plt.title('Expenses Over Time by Category', fontsize=12, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Amount ($)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

plt.tight_layout()
plt.show()

print("✅ Created comprehensive expense analysis dashboard!")
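The dashboard uses pyplot's `plt.subplot(2, 3, n)` state machine and only displays the result. If you run the code as a plain script, or want to keep a copy of the figure, the object-oriented `plt.subplots` interface plus `fig.savefig` is a common alternative. The two-panel sketch below hard-codes the category totals computed from the eight sample rows. It also selects the `Agg` backend so it runs without a display; omit that line inside Jupyter, where the inline backend is already active:

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # file-only backend: works without a display (scripts, SSH, CI)
import matplotlib.pyplot as plt

# Totals from the sample data: Food 12.50+18.00+67.30+4.50, and so on.
category_totals = {"Food": 102.30, "Transportation": 47.75, "Entertainment": 40.99}

# plt.subplots returns the figure plus an array of Axes, one per panel,
# so each panel is addressed explicitly instead of via plt.subplot(rows, cols, n).
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].barh(list(category_totals), list(category_totals.values()), color="steelblue")
axes[0].set_title("Total Expenses by Category")
axes[0].set_xlabel("Amount ($)")
axes[1].pie(list(category_totals.values()), labels=list(category_totals),
            autopct="%1.1f%%", startangle=90)
axes[1].set_title("Share by Category")
fig.tight_layout()

out_path = os.path.join(tempfile.mkdtemp(), "expense_dashboard.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)  # free the figure's memory once it is saved
print(out_path)
```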

In [None]:
# Create a more sophisticated seaborn visualization
plt.figure(figsize=(12, 8))

# Set seaborn style
sns.set_style("whitegrid")
sns.set_palette("husl")

# 1. Category analysis with seaborn
plt.subplot(2, 2, 1)
sns.barplot(data=df, x='category', y='amount', estimator=sum, errorbar=None)  # errorbar replaced ci in seaborn 0.12
plt.title('Total Expenses by Category', fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Total Amount ($)')

# 2. Average expenses by category
plt.subplot(2, 2, 2)
sns.barplot(data=df, x='category', y='amount', errorbar='sd')  # error bars show ±1 standard deviation
plt.title('Average Expenses by Category (with Std Dev)', fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Average Amount ($)')

# 3. Strip plot showing individual expenses
plt.subplot(2, 2, 3)
sns.stripplot(data=df, x='category', y='amount', size=8, alpha=0.7)
plt.title('Individual Expenses by Category', fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Amount ($)')

# 4. Violin plot for distribution shape
plt.subplot(2, 2, 4)
sns.violinplot(data=df, x='category', y='amount')
plt.title('Expense Distribution Shape by Category', fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Amount ($)')

plt.tight_layout()
plt.show()

print("✅ Created advanced Seaborn visualizations!")

## Section 6: Cleanup and Summary

In [None]:
# Let's verify our database schema one more time
cursor.execute("SELECT sql FROM sqlite_master WHERE type='table' AND name='expenses'")
table_schema = cursor.fetchone()[0]

print("🗄️ Final Database Schema:")
print(table_schema)

# Get final row count
cursor.execute("SELECT COUNT(*) FROM expenses")
row_count = cursor.fetchone()[0]

print(f"\n📊 Total records in database: {row_count}")

# Close the database connection
conn.close()
print(f"\n✅ Database connection closed successfully!")
print(f"📁 Database file '{database_name}' saved to disk.")
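Every write in this lab is followed by `conn.commit()` before the final `conn.close()`. That order matters: `close()` does not commit, so any changes still pending are discarded. The sketch below demonstrates this with a throwaway file in a temporary directory:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "commit_demo.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE expenses (amount REAL)")
conn.commit()
conn.execute("INSERT INTO expenses VALUES (1.0)")
conn.commit()  # this row is now saved to disk
conn.execute("INSERT INTO expenses VALUES (2.0)")
conn.close()   # no commit first: this pending insert is discarded

conn = sqlite3.connect(db_path)
count = conn.execute("SELECT COUNT(*) FROM expenses").fetchone()[0]
conn.close()
print(count)  # 1
```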

## Lab Summary

🎉 **Congratulations!** You have successfully completed the Module 0 Lab!

### What You Accomplished:

1. ✅ **Environment Verification**: Confirmed all required packages are working
2. ✅ **Database Creation**: Created your first SQLite database file
3. ✅ **Table Creation**: Designed and created an expenses table with proper schema
4. ✅ **Data Insertion**: Added sample expense data to the database
5. ✅ **SQL Queries**: Performed basic data analysis using SQL
6. ✅ **Pandas Integration**: Loaded database data into DataFrames for analysis
7. ✅ **Data Visualization**: Created multiple charts and graphs using Matplotlib and Seaborn
8. ✅ **Best Practices**: Used parameterized queries (`?` placeholders), committed changes, and closed connections when finished

### Key Skills Developed:
- SQLite database creation and management
- Python sqlite3 module usage
- Basic SQL queries (SELECT, INSERT, GROUP BY)
- Pandas DataFrame operations
- Data visualization with multiple libraries
- Error handling and debugging

### Files Created:
- `my_first_database.db` - Your first SQLite database with expense tracking data

### What's Next?
In **Module 1**, we'll dive deeper into:
- SQLite architecture and advanced features
- Connection management patterns
- Error handling strategies
- Performance optimization
- Database design principles

### Troubleshooting Tips:
If you encountered any issues:
1. Make sure your virtual environment is activated
2. Verify all packages are installed: `pip list`
3. Check Python version: `python --version`
4. Restart the Jupyter kernel if imports fail (for example, right after installing a package)
5. Ensure you have write permissions in the current directory

**Great job and welcome to the world of Python and SQLite integration!** 🚀