# dbApps04 DIY Task: SQL Query Challenge

**Lesson:** 04 - SQL Fundamentals

In this task, you will design and write your own SQL queries to extract meaningful insights from the Titanic dataset. Combine everything you've learned about SELECT, WHERE, AND/OR/NOT, LIKE, BETWEEN, and IN.

## Setup

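Connect to the titanic.db database. If the file does not exist yet, the setup cell builds it from Titanic Dataset.csv, so that CSV must be in the notebook's working directory on the first run.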

In [None]:
# Import pandas and sqlite3, plus os to check whether the database file exists
import pandas as pd
import sqlite3
import os

In [None]:
# Connect to titanic.db, creating it from the CSV on the first run
db_path = 'titanic.db'
csv_path = 'Titanic Dataset.csv'

if not os.path.exists(db_path):
    df = pd.read_csv(csv_path)
    conn = sqlite3.connect(db_path)
    df.to_sql('passengers', conn, index=False, if_exists='replace')
    print("Database created from CSV.")
else:
    conn = sqlite3.connect(db_path)
    print("Connected to existing titanic.db.")
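Before writing queries, it can help to confirm which tables and columns the database actually contains. The sketch below is an illustration only: it builds a small in-memory database with a made-up passengers table (the rows and the demo_conn name are assumptions, not the real dataset) and inspects it with sqlite_master and PRAGMA table_info. The same two inspection queries can be run against conn.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for titanic.db so this snippet runs on its own
demo_conn = sqlite3.connect(":memory:")
demo_df = pd.DataFrame(
    {
        "name": ["Ada", "Ben"],
        "age": [36.0, None],
        "pclass": [1, 3],
    }
)
demo_df.to_sql("passengers", demo_conn, index=False, if_exists="replace")

# List the tables stored in the database
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table'", demo_conn
)

# List the column names and declared types of the passengers table
columns = pd.read_sql("PRAGMA table_info(passengers)", demo_conn)

print(tables)
print(columns[["name", "type"]])

demo_conn.close()
```

In the real database the column names come from the CSV header, so their capitalization may differ from the task text. SQLite matches column names case-insensitively, so the queries should still work.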

## Part 1: Targeted Queries

Write SQL queries to answer specific questions. Use ORDER BY and LIMIT to refine results.
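As a reminder of how ORDER BY and LIMIT work together, here is a minimal sketch on a made-up products table (the table, its rows, and the demo_conn name are assumptions for illustration and are unrelated to the Titanic data). ORDER BY sorts the rows first, and LIMIT then keeps only the first rows of the sorted result.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table, kept in memory so nothing touches titanic.db
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products (item TEXT, price REAL)")
demo_conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.5), ("book", 12.0), ("lamp", 30.0), ("mug", 8.0)],
)

# Sort by price from highest to lowest, then keep the first two rows
top_two = pd.read_sql(
    """
    SELECT item, price
    FROM products
    ORDER BY price DESC
    LIMIT 2
    """,
    demo_conn,
)
print(top_two)

demo_conn.close()
```

Without DESC the sort is ascending, so the same query would return the two cheapest items instead.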

### Task 1.1

Find the 10 oldest passengers. Show name and age, sorted by age DESC.

In [None]:
# Query to find the 10 oldest passengers, sorted by age in descending order
query_1_1 = """

"""

result_1_1 = pd.read_sql(query_1_1, conn)
result_1_1

### Task 1.2

Find the 10 most expensive tickets. Show name, fare, pclass, sorted by fare DESC.

In [None]:
# Query to find the 10 passengers with the highest fares
query_1_2 = """

"""

result_1_2 = pd.read_sql(query_1_2, conn)
result_1_2

### Task 1.3

Find all children (age <= 12) who survived. Show name, age, survived. Sort by age.

In [None]:
# Query to find all children who survived, sorted by age ascending
query_1_3 = """

"""

result_1_3 = pd.read_sql(query_1_3, conn)
result_1_3

### Task 1.4

Find all passengers whose name contains 'Dr.' and show their name, survived, and pclass. Sort by pclass.

In [None]:
# Query to find passengers with 'Dr.' in their name, sorted by passenger class
query_1_4 = """

"""

result_1_4 = pd.read_sql(query_1_4, conn)
result_1_4

### Task 1.5

Find passengers aged 18 to 25 (inclusive) from first class. Show name, age, pclass, fare. Sort by fare DESC.

In [None]:
# Query to find young first-class passengers, sorted by fare descending
query_1_5 = """

"""

result_1_5 = pd.read_sql(query_1_5, conn)
result_1_5

## Part 2: Analysis Queries

Write queries to answer analytical questions. Provide markdown answers explaining your findings.

### Task 2.1

How many female passengers survived vs how many male passengers survived? Write two queries and compare. Answer in markdown.

In [None]:
# Query 1: Count female survivors
query_2_1a = """

"""

result_2_1a = pd.read_sql(query_2_1a, conn)
print("Female survivors:")
print(result_2_1a)

In [None]:
# Query 2: Count male survivors
query_2_1b = """

"""

result_2_1b = pd.read_sql(query_2_1b, conn)
print("Male survivors:")
print(result_2_1b)

**Analysis:** (Write your comparison here. What patterns do you notice?)



### Task 2.2

Find the top 5 most expensive tickets for passengers who did NOT survive. Show name, fare, pclass. What do you notice?

In [None]:
# Query to find the most expensive tickets for non-survivors
query_2_2 = """

"""

result_2_2 = pd.read_sql(query_2_2, conn)
result_2_2

**Observation:** (What patterns do you see? Did an expensive ticket guarantee survival?)



### Task 2.3

How many passengers embarked from each port? Write three queries, one per port (C, Q, S), each filtering with WHERE, and count the rows in each result with len(). Answer in markdown.
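One thing to keep in mind when counting with len(): it counts the rows a query returned, and a row whose value is missing (NULL) matches none of the equality filters. The sketch below shows this on a made-up orders table (the table, its rows, and the demo_conn name are assumptions for illustration). Whether the Titanic data has missing port values depends on the copy of the dataset you are using.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table with one missing region value
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
demo_conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "N"), (2, "S"), (3, "N"), (4, None)],
)

# One filtered query per region, counted with len()
north_rows = pd.read_sql("SELECT id FROM orders WHERE region = 'N'", demo_conn)
south_rows = pd.read_sql("SELECT id FROM orders WHERE region = 'S'", demo_conn)

# Unfiltered query for comparison
all_rows = pd.read_sql("SELECT id FROM orders", demo_conn)

per_region_total = len(north_rows) + len(south_rows)
print(f"North: {len(north_rows)}, South: {len(south_rows)}")
print(f"Sum of regions: {per_region_total}, all rows: {len(all_rows)}")

demo_conn.close()
```

If your per-port counts add up to less than the total number of passengers, missing values are a likely explanation worth mentioning in your summary.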

In [None]:
# Query for Cherbourg (C)
query_2_3a = """

"""

result_2_3a = pd.read_sql(query_2_3a, conn)
print(f"Passengers from Cherbourg: {len(result_2_3a)}")

In [None]:
# Query for Queenstown (Q)
query_2_3b = """

"""

result_2_3b = pd.read_sql(query_2_3b, conn)
print(f"Passengers from Queenstown: {len(result_2_3b)}")

In [None]:
# Query for Southampton (S)
query_2_3c = """

"""

result_2_3c = pd.read_sql(query_2_3c, conn)
print(f"Passengers from Southampton: {len(result_2_3c)}")

**Summary:** (Create a markdown table or written summary of embarkation port distribution)



### Task 2.4

Write a single query that combines ALL of these: specific columns, a WHERE with AND, ORDER BY DESC, and LIMIT. Describe what your query finds.

In [None]:
# Write a complex query combining multiple SQL features
query_2_4 = """

"""

result_2_4 = pd.read_sql(query_2_4, conn)
result_2_4

**Query Description:** (Explain what your query searches for and why. What insight does it provide?)



## Part 3: Vocabulary

Demonstrate your understanding of SQL concepts through written definitions.

### Task 3.1

In your own words, define: Boolean logic, relational operator, logical operator.

**Boolean Logic:** 


**Relational Operator:** 


**Logical Operator:** 



### Task 3.2

Write the truth table for AND (all 4 combinations of True/False).

| Condition 1 | Condition 2 | AND Result |
|---|---|---|
| | | |
| | | |
| | | |
| | | |

### Task 3.3

What is the difference between LIKE, BETWEEN, and IN? Give a one-sentence description of each.

**LIKE:** 


**BETWEEN:** 


**IN:** 



## Close Connection and Push to GitHub

Always close the database connection when finished. Then save the notebook and push your work to GitHub.

In [None]:
# Close the database connection
conn.close()
print("Database connection closed.")
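As an alternative to calling close() by hand, the connection can be wrapped in contextlib.closing so it is closed even if a query raises an error. This is only a sketch on a made-up in-memory table (the notes table and the demo_conn name are assumptions), not a required part of the task. Note that using a sqlite3 connection directly in a with statement manages transactions but does not close the connection, which is why closing is used here.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls demo_conn.close() when the with block ends
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute("CREATE TABLE notes (body TEXT)")
    demo_conn.execute("INSERT INTO notes VALUES ('hello')")
    notes = pd.read_sql("SELECT body FROM notes", demo_conn)

# The connection is closed now, so any further use raises ProgrammingError
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(notes)
print("Connection still open:", still_open)
```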