# dbApps04: SQL Queries - SELECT, WHERE, AND/OR, ORDER BY

This notebook walks through the fundamentals of SQL querying using the Titanic dataset. We'll learn how to filter, sort, and retrieve data using SQL SELECT statements.

## Sub-Lesson 04a: SELECT, FROM, WHERE

In [None]:
# Import required libraries
import pandas as pd
import sqlite3
import os

In [None]:
# Set up database connection and create titanic.db if it doesn't exist
dbPath = '/sessions/sweet-lucid-archimedes/mnt/databaseApplicationsForGitHub/dbApps04/'
dbFile = 'titanic.db'
fullPath = os.path.join(dbPath, dbFile)

# Check if database exists; if not, create from CSV
if not os.path.exists(fullPath):
    csvFile = os.path.join(dbPath, 'passengers.csv')
    df = pd.read_csv(csvFile)
    conn = sqlite3.connect(fullPath)
    df.to_sql('passengers', conn, if_exists='replace', index=False)
    conn.close()
    print("Database created from CSV")
else:
    print("Database already exists")

# Connect to the database
conn = sqlite3.connect(fullPath)

In [None]:
# Query: SELECT all columns from passengers table, limit to first 5 rows
query = "SELECT * FROM passengers LIMIT 5"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: SELECT specific columns (name and age), limit to 10 rows
query = "SELECT name, age FROM passengers LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: SELECT with WHERE clause - show survivors only (survived = 1), limit to 10 rows
query = "SELECT name, age, survived FROM passengers WHERE survived = 1 LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: WHERE with comparison operators - show passengers NOT in third class
# Using != (not equal) operator
query = "SELECT name, pclass FROM passengers WHERE pclass != 3 LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: WHERE with > operator - show passengers older than 50
query = "SELECT name, age FROM passengers WHERE age > 50 LIMIT 10"
result = pd.read_sql(query, conn)
result
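
A comparison such as `age > 50` is never true for a row whose age is missing (NULL), so those rows drop out of the result without any warning. The sketch below illustrates this on a small in-memory table; the `people` table and its three rows are hypothetical and are not taken from the Titanic data. It uses plain `sqlite3` cursors rather than `pd.read_sql` so that it runs on its own.

```python
import sqlite3

# Hypothetical three-row table, for illustration only (not the Titanic data)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, age REAL)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 62.0), ("Bob", 30.0), ("Cy", None)],  # Cy has no recorded age
)

# Neither comparison matches the NULL row; only IS NULL finds it
older = demo.execute("SELECT name FROM people WHERE age > 50").fetchall()
not_older = demo.execute("SELECT name FROM people WHERE age <= 50").fetchall()
missing = demo.execute("SELECT name FROM people WHERE age IS NULL").fetchall()
demo.close()

print(older, not_older, missing)
```

The two comparisons together return only two of the three rows, which is why a count of "over 50" plus "50 or under" can fall short of the table's total.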

In [None]:
# Query: WHERE with text values in quotes - show only female passengers
query = "SELECT name, sex FROM passengers WHERE sex = 'female' LIMIT 10"
result = pd.read_sql(query, conn)
result

### Try This 04a

Write a SQL query to find all passengers under age 18. Display their name and age. Limit to 15 rows.

In [None]:
# Try This 04a: Replace the empty string with your query before running this cell
query = ""
result = pd.read_sql(query, conn)
result

## Sub-Lesson 04b: AND/OR/NOT, LIKE, BETWEEN, IN

In [None]:
# Query: AND operator - find female passengers who survived
query = "SELECT name, sex, survived FROM passengers WHERE sex = 'female' AND survived = 1 LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: OR operator - find passengers in first OR second class
query = "SELECT name, pclass FROM passengers WHERE pclass = 1 OR pclass = 2 LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: NOT operator - find passengers NOT in third class
query = "SELECT name, pclass FROM passengers WHERE NOT pclass = 3 LIMIT 10"
result = pd.read_sql(query, conn)
result
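
When AND and OR appear in the same WHERE clause, SQL evaluates AND first, so parentheses are needed whenever the OR should be grouped. The following is a minimal sketch on a hypothetical four-row `people` table (not the Titanic data), using plain `sqlite3` so it is self-contained.

```python
import sqlite3

# Hypothetical table, for illustration only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, sex TEXT, pclass INTEGER)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Ann", "female", 1), ("Bea", "female", 3), ("Carl", "male", 2), ("Dan", "male", 1)],
)

# Without parentheses this reads as: (female AND first class) OR second class
no_parens = demo.execute(
    "SELECT name FROM people "
    "WHERE sex = 'female' AND pclass = 1 OR pclass = 2 ORDER BY name"
).fetchall()

# With parentheses: female AND (first class OR second class)
with_parens = demo.execute(
    "SELECT name FROM people "
    "WHERE sex = 'female' AND (pclass = 1 OR pclass = 2) ORDER BY name"
).fetchall()
demo.close()

print(no_parens)
print(with_parens)
```

The first query also returns Carl, a male passenger in second class, because the `OR pclass = 2` branch is not restricted by the `sex` condition.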

In [None]:
# Query: LIKE with % wildcard - find names containing 'Mrs.'
query = "SELECT name FROM passengers WHERE name LIKE '%Mrs.%' LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: LIKE with % at the end of the pattern - find names starting with 'A'
query = "SELECT name FROM passengers WHERE name LIKE 'A%' LIMIT 10"
result = pd.read_sql(query, conn)
result
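
Two details of LIKE are easy to miss: in SQLite's default configuration LIKE ignores case for ASCII letters, and `_` is a second wildcard that matches exactly one character. The sketch below shows both on a hypothetical `people` table whose names are made up for illustration.

```python
import sqlite3

# Hypothetical names, for illustration only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?)",
    [("Allen",), ("andersson",), ("Brown",), ("Ellen",)],
)

# 'A%' also matches the lowercase 'andersson' under SQLite's default LIKE behavior
starts_with_a = demo.execute(
    "SELECT name FROM people WHERE name LIKE 'A%' ORDER BY name"
).fetchall()

# '_' stands for exactly one character, so '_llen' matches 'Allen' and 'Ellen'
one_char = demo.execute(
    "SELECT name FROM people WHERE name LIKE '_llen' ORDER BY name"
).fetchall()
demo.close()

print(starts_with_a)
print(one_char)
```

Other database systems may treat case differently in LIKE, so this behavior should be checked before porting a query.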

In [None]:
# Query: BETWEEN operator - find passengers aged 20 to 40 (both endpoints are included)
query = "SELECT name, age FROM passengers WHERE age BETWEEN 20 AND 40 LIMIT 10"
result = pd.read_sql(query, conn)
result
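
BETWEEN includes both endpoints, so `age BETWEEN 20 AND 40` behaves like `age >= 20 AND age <= 40`. A minimal sketch on a hypothetical table of five ages (not the Titanic data) confirms the equivalence:

```python
import sqlite3

# Hypothetical ages chosen to sit on and just outside the boundaries
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (age INTEGER)")
demo.executemany("INSERT INTO people VALUES (?)", [(19,), (20,), (30,), (40,), (41,)])

between = demo.execute(
    "SELECT age FROM people WHERE age BETWEEN 20 AND 40 ORDER BY age"
).fetchall()
explicit = demo.execute(
    "SELECT age FROM people WHERE age >= 20 AND age <= 40 ORDER BY age"
).fetchall()
demo.close()

print(between)
print(between == explicit)
```

The boundary values 20 and 40 are returned, while 19 and 41 are not.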

In [None]:
# Query: IN operator - find passengers who embarked in Cherbourg or Queenstown
query = "SELECT name, embarked FROM passengers WHERE embarked IN ('C', 'Q') LIMIT 10"
result = pd.read_sql(query, conn)
result
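
IN is shorthand for a chain of OR comparisons on the same column, and NOT IN selects the complement among rows that have a value. The sketch below compares the forms on a hypothetical three-row `people` table; the port codes mirror the `embarked` column, but the rows are invented.

```python
import sqlite3

# Hypothetical rows, for illustration only
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, embarked TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "C"), ("Bob", "Q"), ("Cy", "S")],
)

using_in = demo.execute(
    "SELECT name FROM people WHERE embarked IN ('C', 'Q') ORDER BY name"
).fetchall()
using_or = demo.execute(
    "SELECT name FROM people WHERE embarked = 'C' OR embarked = 'Q' ORDER BY name"
).fetchall()
not_in = demo.execute(
    "SELECT name FROM people WHERE embarked NOT IN ('C', 'Q') ORDER BY name"
).fetchall()
demo.close()

print(using_in, using_or, not_in)
```

The IN form stays readable as the list of values grows, which is the main reason to prefer it over repeated OR conditions.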

### Try This 04b

Write a SQL query to find all male passengers in first class who survived. Display their name, sex, pclass, and survived columns. Limit to 15 rows.

In [None]:
# Try This 04b: Replace the empty string with your query before running this cell
query = ""
result = pd.read_sql(query, conn)
result

## Sub-Lesson 04c: ORDER BY, LIMIT, Combining

In [None]:
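# Query: ORDER BY with ASC (ascending) - sort passengers by age from youngest to oldest
# SQLite sorts NULL first in ascending order, so rows with a missing age are filtered out
query = "SELECT name, age FROM passengers WHERE age IS NOT NULL ORDER BY age ASC LIMIT 10"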
result = pd.read_sql(query, conn)
result
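
SQLite treats NULL as smaller than any other value when sorting, so an ascending sort lists rows with missing values before the smallest real value. The sketch below shows the effect, and the `IS NOT NULL` filter that avoids it, on a hypothetical three-row `people` table (not the Titanic data).

```python
import sqlite3

# Hypothetical rows, for illustration only; Bob has no recorded age
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (name TEXT, age REAL)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 30.0), ("Bob", None), ("Cy", 2.0)],
)

# Ascending sort: the NULL row comes first in SQLite
unfiltered = demo.execute("SELECT name, age FROM people ORDER BY age ASC").fetchall()

# Filtering out NULL leaves only rows with a real age
filtered = demo.execute(
    "SELECT name, age FROM people WHERE age IS NOT NULL ORDER BY age ASC"
).fetchall()
demo.close()

print(unfiltered)
print(filtered)
```

With a small LIMIT, the unfiltered form can return nothing but missing values, which is worth keeping in mind for any "youngest" or "lowest" query.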

In [None]:
# Query: ORDER BY with DESC (descending) - sort passengers by fare from highest to lowest
query = "SELECT name, fare FROM passengers ORDER BY fare DESC LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: ORDER BY combined with LIMIT - find the 10 highest fares (IS NOT NULL skips rows with a missing fare)
query = "SELECT name, fare FROM passengers WHERE fare IS NOT NULL ORDER BY fare DESC LIMIT 10"
result = pd.read_sql(query, conn)
result

In [None]:
# Query: Full combined query - SELECT columns FROM table WHERE conditions ORDER BY column DESC LIMIT N
# Find: female survivors, show name/age/fare, order by fare descending, top 10
query = "SELECT name, age, fare FROM passengers WHERE sex = 'female' AND survived = 1 ORDER BY fare DESC LIMIT 10"
result = pd.read_sql(query, conn)
result

### Try This 04c

Write a SQL query to find the 5 youngest female survivors. Display their name, age, and fare. Order by age ascending to show youngest first.

In [None]:
# Try This 04c: Replace the empty string with your query before running this cell
query = ""
result = pd.read_sql(query, conn)
result

In [None]:
# Close the database connection when finished
conn.close()