# Lesson 03: Introduction to Databases
## Walkthrough - Student Version

## Sub-Lesson 03a: Database Concepts (Brief Review)

**Data vs. Information:**
- **Data** is raw, unprocessed facts (e.g., "25", "female", "yes")
- **Information** is processed data with context and meaning (e.g., "25-year-old female passenger survived")

**What is a Database?**
A database is an organized collection of related data, structured to allow efficient storage, retrieval, and manipulation. Instead of scattered CSV files, a database provides:
- **Structure**: Tables with clearly defined columns and data types
- **Relationships**: Links between tables (one table can reference another)
- **Efficiency**: Fast queries on large datasets
- **Integrity**: Rules to ensure data quality and consistency
- **Security**: Access control and audit trails
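
As a rough sketch of what "structure" and "integrity" mean in practice, the snippet below builds a throwaway in-memory SQLite database. The table name `people`, its columns, and the sample values are made up for illustration and are not part of the Titanic data.

```python
import sqlite3

# In-memory database: nothing is written to disk (illustration only)
demo = sqlite3.connect(":memory:")

# Structure: every column has a declared type
# Integrity: NOT NULL and CHECK describe which rows are acceptable
demo.execute("""
    CREATE TABLE people (
        name TEXT NOT NULL,
        age  INTEGER CHECK (age >= 0)
    )
""")

demo.execute("INSERT INTO people VALUES (?, ?)", ("Ada", 36))

try:
    # A negative age breaks the CHECK rule, so SQLite refuses the row
    demo.execute("INSERT INTO people VALUES (?, ?)", ("Bob", -5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = demo.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(f"Bad row rejected: {rejected}; rows stored: {stored}")
demo.close()
```

A CSV file would have accepted the negative age without complaint; here the rule lives in the database itself.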

**Why Databases Instead of CSVs?**
- CSVs are simple but limited: they're flat text files with no enforced schema and no stored data types, and searching a large file means scanning it line by line
- Databases enforce data types, allow complex queries, and scale to millions of rows
- In Lesson 02, you filtered a small CSV in memory with pandas. That approach breaks down once the data no longer fits comfortably in memory; databases are designed for that scale
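
To see the "no data types" point concretely, here is a small sketch using Python's built-in `csv` module on a made-up two-row CSV held in memory (not the Titanic file). pandas guesses types when it reads a CSV, but that is inference on the reader's side, not something the file records.

```python
import csv
import io

# A tiny CSV held in memory (made-up values)
text = "name,age\nAda,36\nBob,41\n"

rows = list(csv.DictReader(io.StringIO(text)))

# Every value comes back as a string; the file itself stores no types
age = rows[0]["age"]
print(repr(age), type(age).__name__)
```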

**What is SQLite?**
SQLite is a lightweight, file-based database engine. It's perfect for learning because:
- No separate server to install
- Ships with Python as the standard-library `sqlite3` module (most other languages have bindings too)
- Stores data in a single `.db` file
- Uses SQL (the standard language for relational databases)
- Fast for small-to-medium datasets
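
A quick way to confirm that nothing extra needs installing is sketched below. It uses the special name `":memory:"`, which asks SQLite for a temporary database that lives only in RAM; the variable names are illustrative.

```python
import sqlite3

# Version of the SQLite library bundled with this Python install
print("SQLite version:", sqlite3.sqlite_version)

# ":memory:" creates a temporary database that disappears on close
demo = sqlite3.connect(":memory:")
answer = demo.execute("SELECT 1 + 1").fetchone()[0]
print("1 + 1 =", answer)
demo.close()
```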

## Sub-Lesson 03b: Converting CSV to SQLite

### Section 1: Setup & Import

In [None]:
# Import the libraries we need
import pandas as pd     # For DataFrames (you already know this)
import sqlite3          # For working with SQLite databases (built into Python)

print("Libraries imported successfully!")

### Section 2: Load CSV (Review)

In [None]:
# Load the Titanic CSV into a DataFrame — just like Lessons 01-02
titanic = pd.read_csv("Titanic Dataset.csv")

# Quick check
print(f"Loaded {titanic.shape[0]} rows and {titanic.shape[1]} columns")
titanic.head()

### Section 3: Create SQLite Database

**What is `sqlite3.connect()`?**

`sqlite3.connect()` is your entry point to SQLite. It:
- Creates a new `.db` file (if it doesn't exist) or opens an existing one
- Returns a **connection object** — your handle to the database
- Lets you execute queries and manage transactions through that connection

Once you have a connection, you can:
- Create tables
- Insert data
- Run queries
- Commit (save) or roll back (undo) changes
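
The difference between committing and rolling back can be sketched with a throwaway in-memory database. The `notes` table and its contents are hypothetical and exist only for this illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE notes (body TEXT)")

demo.execute("INSERT INTO notes VALUES ('keep me')")
demo.commit()      # make the first row permanent

demo.execute("INSERT INTO notes VALUES ('discard me')")
demo.rollback()    # undo everything since the last commit

# Only the committed row is still in the table
kept = [row[0] for row in demo.execute("SELECT body FROM notes")]
print(kept)
demo.close()
```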

In [None]:
# Create (or open) a SQLite database file called titanic.db
# If the file doesn't exist, SQLite creates it automatically
conn = sqlite3.connect("titanic.db")

print("Connected to titanic.db!")

### Section 4: Save DataFrame to Database

**What is `to_sql()`?**

The `to_sql()` method (from pandas) bridges DataFrames and databases:
- **name** (first argument): The name of the table to create/update in the database
- **con** (second argument): The connection object from `sqlite3.connect()`
- **if_exists**: What to do if the table already exists:
  - `"replace"`: overwrite the table (useful for testing)
  - `"append"`: add rows to the existing table (useful in production)
  - `"fail"`: raise an error if the table exists (the default; safe, prevents accidents)
- **index**: Whether to save the DataFrame's index (its row labels) as a column:
  - `False`: don't save the index (usually what you want)
  - `True`: save the index as an extra column (the default; rarely needed)
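
The three `if_exists` modes can be compared side by side on a tiny made-up DataFrame. This is a sketch against an in-memory database; the `people` table and its values are hypothetical.

```python
import sqlite3
import pandas as pd

demo = sqlite3.connect(":memory:")
small = pd.DataFrame({"name": ["Ada", "Bob"], "age": [36, 41]})

small.to_sql("people", demo, if_exists="replace", index=False)  # creates the table
small.to_sql("people", demo, if_exists="append", index=False)   # adds 2 more rows

total = demo.execute("SELECT COUNT(*) FROM people").fetchone()[0]

try:
    # The table already exists, so "fail" raises instead of writing
    small.to_sql("people", demo, if_exists="fail", index=False)
    failed = False
except ValueError:
    failed = True

print(f"Rows after replace + append: {total}; 'fail' raised an error: {failed}")
demo.close()
```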

In [None]:
# Save our DataFrame as a table called "passengers" in the database
# if_exists="replace" means: if the table already exists, overwrite it
# index=False means: don't save the row numbers as a column
titanic.to_sql("passengers", conn, if_exists="replace", index=False)

print("Data saved to 'passengers' table in titanic.db!")

### Section 5: Your First SQL Query

**What is `pd.read_sql()`?**

`pd.read_sql()` runs a SQL query and returns the results as a DataFrame:
- **SQL string** (first argument): Your query (e.g., `"SELECT * FROM passengers"`)
- **connection** (second argument): The connection to the database
- **Returns**: A DataFrame with the query results

This means you can use SQL for complex filtering, then switch back to pandas for analysis — best of both worlds!
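
A minimal sketch of that hand-off, using an in-memory database and a made-up `people` table rather than the Titanic data: SQL picks the columns, and the result is an ordinary DataFrame that pandas methods work on.

```python
import sqlite3
import pandas as pd

demo = sqlite3.connect(":memory:")
pd.DataFrame({"name": ["Ada", "Bob", "Cy"], "age": [36, 41, 29]}).to_sql(
    "people", demo, index=False
)

# SQL step: pull two columns out of the database
result = pd.read_sql("SELECT name, age FROM people", demo)

# pandas step: the result behaves like any other DataFrame
print(type(result).__name__)
print("Mean age:", result["age"].mean())
demo.close()
```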

In [None]:
# Run a SQL query and get results as a DataFrame
# SELECT * FROM passengers means "get all columns from the passengers table"
# LIMIT 5 means "only return the first 5 rows"
result = pd.read_sql("SELECT * FROM passengers LIMIT 5", conn)
result

In [None]:
# Select specific columns — just like titanic[["name", "age"]] in pandas
# This query says: "Get name, age, and survived columns, first 10 rows"
result = pd.read_sql("SELECT name, age, survived FROM passengers LIMIT 10", conn)
result

**Try This:**

Write a query to select `name`, `sex`, and `fare` from the passengers table (first 10 rows). 

Hint: Follow the pattern above!

In [None]:
# Write your query here


### Section 6: Exploring the Database Structure

Just like `.info()` tells you about a DataFrame, SQLite has queries that show you the database structure.

In [None]:
# List all tables in the database
# sqlite_master is a special table that SQLite maintains
# It tracks all objects in the database (tables, indexes, views, etc.)
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)
print("Tables in the database:")
print(tables)

In [None]:
# See column info for the passengers table
# This is the database equivalent of .info()
# PRAGMA table_info() is SQLite-specific and returns column details
columnInfo = pd.read_sql("PRAGMA table_info(passengers)", conn)
columnInfo
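
To help read the `PRAGMA table_info()` output, here is a sketch that runs it through a plain `sqlite3` connection on a hypothetical two-column table. Each returned row describes one column as `(cid, name, type, notnull, dflt_value, pk)`.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# One row per column: (cid, name, type, notnull, dflt_value, pk)
info = demo.execute("PRAGMA table_info(people)").fetchall()
for cid, name, kind, notnull, default, pk in info:
    print(cid, name, kind)
demo.close()
```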

**Try This:**

How many tables are in the database? What columns does the passengers table have? Use the queries above to find out!

In [None]:
# Write your exploration here


### Section 7: Pandas vs SQL Side-by-Side

You already know how to do this in pandas. Let's see the SQL equivalent.

In [None]:
# PANDAS way (from Lesson 02):
# Load the CSV and filter columns
titanicPandas = pd.read_csv("Titanic Dataset.csv")
pandasResult = titanicPandas[["name", "age"]].head(5)
print("Pandas result:")
print(pandasResult)

print("\n" + "="*50 + "\n")

# SQL way (new!):
# Query the database for the same columns
sqlResult = pd.read_sql("SELECT name, age FROM passengers LIMIT 5", conn)
print("SQL result:")
print(sqlResult)

print("\nSame data, different approach!")
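
Rather than comparing the two printouts by eye, the match can be checked in code. The sketch below does this on a small made-up table; it relies on SQLite returning rows in insertion order for a simple table like this one, which SQL itself does not promise without an explicit ordering.

```python
import sqlite3
import pandas as pd

demo = sqlite3.connect(":memory:")
people = pd.DataFrame({"name": ["Ada", "Bob", "Cy"], "age": [36, 41, 29]})
people.to_sql("people", demo, index=False)

# Same request, two ways
pandasRows = people[["name", "age"]].head(2).to_dict("records")
sqlRows = pd.read_sql("SELECT name, age FROM people LIMIT 2", demo).to_dict("records")

same = pandasRows == sqlRows
print("Results match:", same)
demo.close()
```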

### Section 8: Closing the Connection

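Always close your connection when you're done. Closing frees the database file and other resources. It does not commit pending changes on its own, so any writes made directly through the connection need `conn.commit()` first; the `to_sql()` call in Section 4 handles its own commit.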

In [None]:
# Always close the connection when you're done
# Closing frees the file and other resources; it does not commit pending changes by itself
conn.close()

print("Connection closed.")

**Note:** If you close the connection and then try to run a query, you'll get an error. You'll need to reconnect with `sqlite3.connect("titanic.db")` first.
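
The error can be seen safely on a throwaway in-memory connection. This sketch uses the plain `sqlite3` module, where a closed connection raises `sqlite3.ProgrammingError`.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.close()

try:
    # The connection is closed, so this query cannot run
    demo.execute("SELECT 1")
    message = "query ran"
except sqlite3.ProgrammingError as err:
    message = f"error: {err}"

print(message)
```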

## Summary

### Key Commands Reference

| Task | Code | Purpose |
|------|------|----------|
| Import SQLite | `import sqlite3` | Add database functionality to Python |
| Connect to database | `conn = sqlite3.connect("filename.db")` | Open or create a SQLite database file |
| Save DataFrame to DB | `df.to_sql("table_name", conn, if_exists="replace", index=False)` | Create a table from a DataFrame |
| Run a query | `result = pd.read_sql("SELECT ...", conn)` | Execute SQL and get results as DataFrame |
| List tables | `pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)` | See all tables in the database |
| See columns | `pd.read_sql("PRAGMA table_info(table_name)", conn)` | Inspect a table's structure |
| Close connection | `conn.close()` | Close the connection and free resources (does not commit by itself) |

### Key Concepts

- **Database**: Organized, structured collection of data (better than CSV for large/complex datasets)
- **SQLite**: Lightweight, file-based database engine, available in Python through the built-in `sqlite3` module
- **SQL**: Standard language for relational databases (SELECT, INSERT, UPDATE, DELETE, etc.)
- **Connection**: Your link to the database; you need it to run queries
- **Table**: Like a DataFrame or CSV—rows and columns of structured data
- **PRAGMA**: SQLite-specific commands for system info (other databases use different commands)

### What's Next?

In upcoming lessons, you'll learn:
- More complex SQL queries (WHERE, GROUP BY, ORDER BY, JOIN)
- How to design databases with multiple related tables
- How databases enforce data integrity
- How to work with other database systems (MySQL, PostgreSQL)