# Analysis of the Cyberchase Episode Database

In this project, we solve a series of query problems against the database `cyberchase.db`, which comes from the problem set at [https://cs50.harvard.edu/sql/2024/psets/0/cyberchase/](https://cs50.harvard.edu/sql/2024/psets/0/cyberchase/). The database contains information about episodes of the educational television series *Cyberchase*.

## Database Schema: `cyberchase.db` - Table: `episodes`

The `cyberchase.db` database includes a single table named `episodes`. This table stores key details for each *Cyberchase* episode, organized into the following columns:

| Column Name         | Data Type | Description                                                              |
|---------------------|-----------|--------------------------------------------------------------------------|
| `id`                | INTEGER   | A unique identifier for each episode record.                             |
| `season`            | INTEGER   | The season number of the episode's broadcast.                           |
| `episode_in_season` | INTEGER   | The episode's number within its specific season.                        |
| `title`             | TEXT      | The title of the *Cyberchase* episode.                                  |
| `topic`             | TEXT      | A brief description of the educational concept taught in the episode.    |
| `air_date`          | TEXT      | The date the episode was first aired (YYYY-MM-DD).                       |
| `production_code`   | TEXT      | A unique internal code for the episode.                                |

With this schema in hand, we can write SQL queries that answer the questions posed in the problem set. The sections below run each query against the *Cyberchase* episode data and show its result.
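
Before writing queries, it can help to confirm the column names and declared types directly from SQLite. The following is a minimal sketch using `PRAGMA table_info`. It runs against an in-memory stand-in table built from the column list above (an assumption for illustration; the real file may declare constraints that are not shown here).

```python
import sqlite3

# In-memory stand-in for cyberchase.db, built from the column table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "episodes" (
        "id" INTEGER,
        "season" INTEGER,
        "episode_in_season" INTEGER,
        "title" TEXT,
        "topic" TEXT,
        "air_date" TEXT,
        "production_code" TEXT
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(episodes)")]
for name, declared_type in columns:
    print(f"{name}: {declared_type}")

conn.close()
```

Running the same `PRAGMA` against a connection to the real file should list the seven columns described in the table.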

In [50]:
import sqlite3
import pandas as pd

### Connecting to the Database

In [51]:
connection = sqlite3.connect("data_bases/cyberchase.db")
print("Connected to the database successfully!")

Connected to the database successfully!


### Previewing the First 5 Rows of the Data

In [52]:
query = """
SELECT *
FROM "episodes"
LIMIT 5
;
"""

df_0 = pd.read_sql_query(query, connection)

df_0

Unnamed: 0,id,season,episode_in_season,title,topic,air_date,production_code
0,1,1,1,Lost My Marbles,Navigation,2002-01-21,CYB001
1,2,1,2,Castleblanca,Data Collection and Analysis,2002-01-22,CYB010
2,3,1,3,R-Fair City,Probability and chance,2002-01-23,CYB004
3,4,1,4,Snow Day to be Exact,Estimation,2002-01-24,CYB007
4,5,1,5,Sensible Flats,Area,2002-01-25,CYB005


### 1. Write a SQL query to list the titles of all episodes in Cyberchase’s original season, Season 1.

In [53]:
query = """
SELECT "title"
FROM "episodes"
WHERE "season" = 1
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,title
0,Lost My Marbles
1,Castleblanca
2,R-Fair City
3,Snow Day to be Exact
4,Sensible Flats
5,Zeus on the Loose
6,The Poddleville Case
7,And They Counted Happily Ever After
8,Clock Like An Egyptian
9,Secrets of Symmetria


### 2. List the season number of, and title of, the first episode of every season.

In [54]:
query = """
SELECT "season", "title"
FROM "episodes"
WHERE "episode_in_season" = 1
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,season,title
0,1,Lost My Marbles
1,2,Hugs & Witches
2,3,EcoHaven CSE
3,4,Balancing Act
4,5,The Halloween Howl
5,6,Digit's B-Day Surprise
6,7,Weather Watchers Gone With The Fog
7,8,The Hacker's Challenge
8,9,An Urchin Matter
9,10,Fit to be Heroes


### 3. Find the production code for the episode “Hackerized!”.

In [55]:
query = """
SELECT "production_code"
FROM "episodes"
WHERE "title" = 'Hackerized!'
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,production_code
0,CYB091
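
When the title comes from a variable rather than being typed into the SQL text, a parameterized query avoids quoting problems, for example with titles that contain an apostrophe such as "Digit's B-Day Surprise". The sketch below uses the `?` placeholder of the `sqlite3` module on a two-row stand-in table whose values are copied from outputs in this notebook. The function name `production_code_for` is hypothetical.

```python
import sqlite3

# Stand-in table with two rows taken from outputs shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "episodes" ("title" TEXT, "production_code" TEXT)')
conn.executemany(
    'INSERT INTO "episodes" VALUES (?, ?)',
    [("Lost My Marbles", "CYB001"), ("Hackerized!", "CYB091")],
)


def production_code_for(connection, title):
    """Return the production code for an exact title, or None if absent."""
    row = connection.execute(
        'SELECT "production_code" FROM "episodes" WHERE "title" = ?',
        (title,),  # the value is bound by SQLite, not pasted into the SQL text
    ).fetchone()
    return row[0] if row else None


found = production_code_for(conn, "Hackerized!")
not_found = production_code_for(conn, "Digit's B-Day Surprise")
print(found)      # CYB091
print(not_found)  # None (title is not in the stand-in table)
conn.close()
```

`pd.read_sql_query` accepts a `params` argument that serves the same purpose when the result should be a DataFrame.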


### 4. Write a query to find the titles of episodes that do not yet have a listed topic.

In [56]:
query = """
SELECT "title"
FROM "episodes"
WHERE "topic" IS NULL
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,title
0,Space Waste Odyssey
1,Space Waste Odyssey
2,Giving Thanks Day
3,A Garden Grows in Botlyn
4,Missing Bats in Sensible Flats
5,Water Woes
6,Soil Turmoil
7,Hacker Hugs a Tree
8,Pursuit of the Prism of Power
9,Composting in the Clutch
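
The query uses `IS NULL` rather than `= NULL` because any comparison with `NULL` evaluates to `NULL`, which a `WHERE` clause treats as not true. A small sketch on a two-row stand-in table (titles and the topic taken from outputs above) shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "episodes" ("title" TEXT, "topic" TEXT)')
conn.executemany(
    'INSERT INTO "episodes" VALUES (?, ?)',
    [("Lost My Marbles", "Navigation"), ("Giving Thanks Day", None)],
)

# IS NULL matches rows whose topic is missing.
with_is_null = conn.execute(
    'SELECT "title" FROM "episodes" WHERE "topic" IS NULL'
).fetchall()

# "= NULL" never evaluates to true, so no row is returned.
with_equals = conn.execute(
    'SELECT "title" FROM "episodes" WHERE "topic" = NULL'
).fetchall()

print(with_is_null)  # [('Giving Thanks Day',)]
print(with_equals)   # []
conn.close()
```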


### 5. Find the title of the holiday episode that aired on December 31st, 2004.

In [57]:
query = """
SELECT "title"
FROM "episodes"
WHERE "air_date" = '2004-12-31'
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,title
0,Starlight Night


### 6. List the titles of episodes from season 6 (2008) that were released early, in 2007.

In [58]:
query = """
SELECT "title"
FROM "episodes"
WHERE "season" = 6
    AND "air_date" BETWEEN '2007-01-01'
        AND '2007-12-31'
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,title
0,Digit's B-Day Surprise
1,When Penguins Fly
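
The `BETWEEN` filter works on a `TEXT` column because dates stored as `YYYY-MM-DD` sort alphabetically in the same order as chronologically. As a sketch of an alternative, SQLite's `strftime('%Y', ...)` can extract the year instead. The sample dates below are made up for illustration and are not taken from `cyberchase.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "episodes" ("air_date" TEXT)')
# Hypothetical dates, chosen to sit on both sides of the 2007 boundaries.
sample_dates = ["2006-12-31", "2007-11-05", "2007-12-31", "2008-01-01"]
conn.executemany('INSERT INTO "episodes" VALUES (?)', [(d,) for d in sample_dates])

# Text comparison: both bounds are inclusive.
between = [
    row[0]
    for row in conn.execute(
        """
        SELECT "air_date" FROM "episodes"
        WHERE "air_date" BETWEEN '2007-01-01' AND '2007-12-31'
        ORDER BY "air_date"
        """
    )
]

# Year extraction: same rows, without spelling out the bounds.
by_year = [
    row[0]
    for row in conn.execute(
        """
        SELECT "air_date" FROM "episodes"
        WHERE strftime('%Y', "air_date") = '2007'
        ORDER BY "air_date"
        """
    )
]

print(between)  # ['2007-11-05', '2007-12-31']
print(by_year)  # ['2007-11-05', '2007-12-31']
conn.close()
```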


### 7. Write a SQL query to list the titles and topics of all episodes teaching fractions.

In [59]:
query = """
SELECT "title", "topic"
FROM "episodes"
WHERE "topic" LIKE '%fraction%'
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,title,topic
0,Zeus on the Loose,Fractions
1,Harriet Hippo & the Mean Green,Equivalent Fractions
2,Shari Spotter and the Cosmic Crumpets,Mixed-Number Fractions
3,A Fraction of a Chance,Fractions 101
4,"Peace, Love, and Hackerness",Measuring with Mixed Number Fractions
5,Trash Creep,"Fractions, Effects of Trash, and Recycling"
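
The lowercase pattern matches topics such as "Equivalent Fractions" because SQLite's `LIKE` is case-insensitive for ASCII characters by default. A short sketch contrasts it with `GLOB`, which is case-sensitive and uses `*` as its wildcard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# LIKE ignores ASCII case by default, so 'fraction' matches 'Fractions'.
like_match = conn.execute(
    "SELECT 'Equivalent Fractions' LIKE '%fraction%'"
).fetchone()[0]

# GLOB is case-sensitive, so the lowercase pattern does not match.
glob_match = conn.execute(
    "SELECT 'Equivalent Fractions' GLOB '*fraction*'"
).fetchone()[0]

print(like_match)  # 1
print(glob_match)  # 0
conn.close()
```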


### 8. Write a query that counts the number of episodes released in the last 6 years, from 2018 to 2023, inclusive.

In [60]:
query = """
SELECT COUNT(*) AS "Num_eps_2018_to_2023"
FROM "episodes"
WHERE "air_date" BETWEEN '2018-01-01'
    AND '2023-12-31'
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,Num_eps_2018_to_2023
0,31


### 9. Write a query that counts the number of episodes released in Cyberchase’s first 6 years, from 2002 to 2007, inclusive.

In [61]:
query = """
SELECT COUNT(*) AS "Num_eps_2002_to_2007"
FROM "episodes"
WHERE "air_date" BETWEEN '2002-01-01'
    AND '2007-12-31'
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,Num_eps_2002_to_2007
0,74
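
Questions 8 and 9 differ only in their year bounds, so the count could be wrapped in a small function that takes the bounds as parameters. This is a sketch under assumptions: the function name `count_between_years` is hypothetical, and the sample dates are made up rather than taken from `cyberchase.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "episodes" ("air_date" TEXT)')
# Hypothetical air dates for illustration only.
sample_dates = [
    "2002-01-21", "2005-06-01", "2007-12-31", "2008-01-01",
    "2018-02-02", "2023-12-31", "2024-01-01",
]
conn.executemany('INSERT INTO "episodes" VALUES (?)', [(d,) for d in sample_dates])


def count_between_years(connection, first_year, last_year):
    """Count episodes aired from Jan 1 of first_year to Dec 31 of last_year."""
    row = connection.execute(
        'SELECT COUNT(*) FROM "episodes" WHERE "air_date" BETWEEN ? AND ?',
        (f"{first_year}-01-01", f"{last_year}-12-31"),
    ).fetchone()
    return row[0]


early = count_between_years(conn, 2002, 2007)
recent = count_between_years(conn, 2018, 2023)
print(early)   # 3 of the sample dates fall in 2002-2007
print(recent)  # 2 of the sample dates fall in 2018-2023
conn.close()
```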


### 10. Write a SQL query to list the ids, titles, and production codes of all episodes. Order the results by production code, from earliest to latest.

In [62]:
query = """
SELECT "id", "title", "production_code"
FROM "episodes"
ORDER BY "production_code", "air_date"
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,id,title,production_code
0,1,Lost My Marbles,CYB001
1,6,Zeus on the Loose,CYB002
2,8,And They Counted Happily Ever After,CYB003
3,3,R-Fair City,CYB004
4,5,Sensible Flats,CYB005
...,...,...,...
135,136,A Garden is Born,CYB137
136,137,Clean-Up on Isle 8,CYB138
137,140,"Trees, Please",CYB139
138,138,Weather or Not,CYB140
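
Sorting the `TEXT` column `production_code` gives the expected numeric order only because the codes are zero-padded to the same width. The sketch below sorts three codes from the output above, then sorts hypothetical unpadded versions of the same codes to show how the order would break.

```python
import sqlite3


def sort_in_sqlite(codes):
    """Return the codes ordered by SQLite's default text comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (code TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(c,) for c in codes])
    ordered = [row[0] for row in conn.execute("SELECT code FROM t ORDER BY code")]
    conn.close()
    return ordered


# Zero-padded codes, as they appear in this notebook.
padded = sort_in_sqlite(["CYB140", "CYB002", "CYB010"])

# Hypothetical unpadded variants: text order no longer follows numeric order.
unpadded = sort_in_sqlite(["CYB140", "CYB2", "CYB10"])

print(padded)    # ['CYB002', 'CYB010', 'CYB140']
print(unpadded)  # ['CYB10', 'CYB140', 'CYB2']
```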


### 11. List the titles of episodes from season 5, in reverse alphabetical order.

In [63]:
query = """
SELECT "title"
FROM "episodes"
WHERE "season" = 5
ORDER BY "title" DESC
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,title
0,The Halloween Howl
1,The Flying Parallinis
2,The Fairy Borg Father
3,On the Line
4,Inside Hacker
5,EcoHaven Ooze
6,Designing Mr. Perfect
7,Crystal Clear
8,A Fraction of a Chance
9,A Clean Sweep


### 12. Count the number of unique episode titles.

In [64]:
query = """
SELECT COUNT(DISTINCT "title") AS "NUM_unique_titles"
FROM "episodes"
;
"""

df = pd.read_sql_query(query, connection)

df

Unnamed: 0,NUM_unique_titles
0,136
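
A distinct count lower than the number of rows means that some titles repeat; the output of question 4, for instance, lists "Space Waste Odyssey" twice. One way to find such titles, sketched here on a small stand-in table with titles taken from that output, is `GROUP BY` with `HAVING COUNT(*) > 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "episodes" ("title" TEXT)')
# Titles taken from the question 4 output; the table itself is a stand-in.
titles = ["Space Waste Odyssey", "Space Waste Odyssey", "Giving Thanks Day", "Water Woes"]
conn.executemany('INSERT INTO "episodes" VALUES (?)', [(t,) for t in titles])

# Keep only the titles that occur on more than one row.
repeated = conn.execute(
    """
    SELECT "title", COUNT(*) AS "occurrences"
    FROM "episodes"
    GROUP BY "title"
    HAVING COUNT(*) > 1
    """
).fetchall()

print(repeated)  # [('Space Waste Odyssey', 2)]
conn.close()
```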


### Disconnecting from the Database

In [65]:
connection.close()
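
An explicit `close()` is skipped if an earlier cell raises an exception. As a sketch of an alternative, `contextlib.closing` closes the connection when the block exits, whether normally or through an error. Using a `sqlite3` connection directly in a `with` statement manages transactions only and does not close it. The in-memory database here stands in for `cyberchase.db`.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the with-block ends, even on error.
with closing(sqlite3.connect(":memory:")) as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]

# Using the connection after the block raises ProgrammingError.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value)       # 2
print(still_open)  # False
```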