<a href="https://colab.research.google.com/github/diogo-costa-silva/sql-murder-mystery/blob/main/SQL_gabriel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Notebook from this Medium [article](https://gab-code.medium.com/cracking-the-sql-murder-mystery-a-step-by-step-solution-4daef268fa90).

In [None]:
# getting the database from my github
!wget https://github.com/diogo-costa-silva/assets/raw/main/databases/sql-murder-mystery.db -O sql-murder-mystery.db

--2024-02-21 12:14:44--  https://github.com/diogo-costa-silva/assets/raw/main/databases/sql-murder-mystery.db
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/diogo-costa-silva/assets/main/databases/sql-murder-mystery.db [following]
--2024-02-21 12:14:44--  https://raw.githubusercontent.com/diogo-costa-silva/assets/main/databases/sql-murder-mystery.db
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3661824 (3.5M) [application/octet-stream]
Saving to: ‘sql-murder-mystery.db’


2024-02-21 12:14:45 (14.9 MB/s) - ‘sql-murder-mystery.db’ saved [3661824/3661824]



In [None]:
# Importing the necessary libraries
import pandas as pd
import sqlite3 as sql

In [None]:
# Setting up a connection to the sqlite database
con = sql.connect('sql-murder-mystery.db')

# setting the maximum column width to unlimited so long text is not truncated
pd.set_option('display.max_colwidth', None)

## The Brief
“A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a ​murder​ that occurred sometime on ​Jan.15, 2018,​ and that it took place in ​SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.”

## Step 1: Exploring the Police Department’s Database

To kick off our investigation, I familiarized myself with the police department’s database. I executed the following query to retrieve the names of tables in the database:

In [None]:
step_1 = '''
  -- Find the table names in the Murder Mystery database
  SELECT name
  FROM sqlite_master
  WHERE type = "table";
'''

#running the query
pd.read_sql_query(step_1, con)

Unnamed: 0,name
0,crime_scene_report
1,drivers_license
2,facebook_event_checkin
3,interview
4,get_fit_now_member
5,get_fit_now_check_in
6,solution
7,income
8,person


To gain a deeper understanding of how the tables are connected, I took the time to study the schema. Check out the image below for a visual representation of the database schema:

![Schema](https://raw.githubusercontent.com/NUKnightLab/sql-mysteries/master/schema.png)

This initial exploration laid the groundwork for navigating through the database in our quest to solve the mystery.
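
Beyond the table names, the columns of each table can be listed straight from SQLite, which helps when the schema image is not at hand. The sketch below is a minimal illustration, not part of the original article: it builds a throwaway in-memory database with a simplified stand-in for the `person` table (the column names follow the queries used in this notebook, the column types are assumptions) and inspects it with `PRAGMA table_info`.

```python
import sqlite3

# Throwaway in-memory database; the notebook itself would use `con` instead.
demo = sqlite3.connect(":memory:")

# Simplified stand-in for the `person` table (column types are assumptions).
demo.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        license_id INTEGER,
        address_number INTEGER,
        address_street_name TEXT,
        ssn INTEGER
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = demo.execute("PRAGMA table_info(person)").fetchall()
column_names = [row[1] for row in columns]

for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<20} {col_type:<8} pk={pk}")

demo.close()
```

Against the downloaded file, the same `PRAGMA table_info(person)` statement can be passed to `pd.read_sql_query` with the existing `con`.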

## Step 2: Unraveling the Crime Scene Details

With crucial information about the crime type, date, and location in hand, I executed the following query to extract detailed insights from the crime_scene_report table:

In [None]:
step_2 = '''
  -- Check details of the crime
  SELECT *
  FROM crime_scene_report
  WHERE type = "murder" AND city LIKE "%SQL%" AND date = 20180115;
'''

#running the query
pd.read_sql_query(step_2, con)

Unnamed: 0,date,type,description,city
0,20180115,murder,"Security footage shows that there were 2 witnesses. The first witness lives at the last house on ""Northwestern Dr"". The second witness, named Annabel, lives somewhere on ""Franklin Ave"".",SQL City


The investigation reveals that **two witnesses** were present during the crime. However, the details of the witnesses are sparse at this point — **Witness 1** resides at the **last house** on **Northwestern Dr**, while **Witness 2**, **Annabel**, lives on **Franklin Avenue**.

Our next move? Dive deeper into the lives of these witnesses to propel our investigation forward.
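
The three facts from the brief (crime type, city, date) can also be passed as query parameters rather than written into the SQL text. The sketch below assumes a toy in-memory copy of `crime_scene_report`: one row is an abbreviated version of the report returned above, and the other two are made-up decoys added only to show the filter at work.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE crime_scene_report (date INTEGER, type TEXT, description TEXT, city TEXT)"
)
demo.executemany(
    "INSERT INTO crime_scene_report VALUES (?, ?, ?, ?)",
    [
        # Abbreviated version of the row returned above.
        (20180115, "murder", "Security footage shows that there were 2 witnesses.", "SQL City"),
        # Made-up decoy rows, for illustration only.
        (20180115, "theft", "Hypothetical decoy report.", "SQL City"),
        (20180115, "murder", "Hypothetical decoy report.", "Other City"),
    ],
)

query = """
    SELECT date, type, city, description
    FROM crime_scene_report
    WHERE type = ? AND city = ? AND date = ?
"""

# Values are bound, in order, to the ? placeholders.
reports = demo.execute(query, ("murder", "SQL City", 20180115)).fetchall()
print(reports)

demo.close()
```

With pandas, the same tuple can be supplied through the `params` argument of `pd.read_sql_query`.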

## Step 3: Unveiling Witness Insights

In our quest for valuable information about our two key witnesses, I devised **two main conditions** utilizing the power of subqueries.

In **Condition 1**, the aim was to fetch records related to **Witness 2** or any closely associated records. This condition hinges on two criteria: the name column containing “**Annabel**” and the street name aligning with a subquery. The subquery extracts street names from the “**person**” table where the street name resembles “**Frank%**.”

Meanwhile, **Condition 2** focuses on retrieving records related to **Witness 1**. It keeps rows whose street name matches “**%North%**” and whose house number equals the **last house number** on that street, which a subquery computes with the MAX() function over addresses matching “**Northwest%**.”

Let’s take a peek at the SQL magic and the result:

In [None]:
step_3 = '''
  -- Querying the details of the 2 witnesses
  SELECT id, name, address_number, address_street_name
  FROM person
  WHERE (name LIKE 'Annabel%'
        AND address_street_name IN
          (SELECT address_street_name
           FROM person
           WHERE address_street_name LIKE 'Frank%'))
    OR (address_street_name LIKE '%North%'
        AND address_number =
          (SELECT MAX(address_number)
           FROM person
           WHERE address_street_name LIKE 'Northwest%'));
'''

#running the query
pd.read_sql_query(step_3, con)

Unnamed: 0,id,name,address_number,address_street_name
0,14887,Morty Schapiro,4919,Northwestern Dr
1,16371,Annabel Miller,103,Franklin Ave


Our witnesses are unveiled! **Witness 1** is **Morty Schapiro** and **Witness 2** is **Annabel Miller**, with IDs **14887** and **16371**, respectively.

Now, for those new to the world of SQL, I understand the potential head-spinning effect of subqueries. Fear not! I’ve got you covered with simpler optional queries below to ease your journey into witness details:

In [None]:
step_3_1 = '''
  -- optional query A to get Witness 1 details (Morty)
  SELECT id, name, MAX(address_number) AS House_Number, address_street_name
  FROM person
  WHERE address_street_name LIKE ('%Northwestern%');
'''

#running the query
pd.read_sql_query(step_3_1, con)

Unnamed: 0,id,name,House_Number,address_street_name
0,14887,Morty Schapiro,4919,Northwestern Dr
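
Optional query A works because of a SQLite-specific rule: when an aggregate query uses `MAX()`, the bare columns (`id`, `name`, `address_street_name`) take their values from the row that holds the maximum. Other database engines typically reject such a query or do not guarantee which row the bare columns come from. The sketch below, on a toy table where only Morty's row comes from the output above and the neighbours are invented, compares that form with a more portable subquery form.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE person (id INTEGER, name TEXT, address_number INTEGER, address_street_name TEXT)"
)
demo.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (14887, "Morty Schapiro", 4919, "Northwestern Dr"),        # row from the output above
        (1, "Hypothetical Neighbour A", 120, "Northwestern Dr"),   # made up
        (2, "Hypothetical Neighbour B", 3050, "Northwestern Dr"),  # made up
    ],
)

# SQLite-specific: bare columns follow the row that holds MAX(address_number).
bare = demo.execute("""
    SELECT id, name, MAX(address_number), address_street_name
    FROM person
    WHERE address_street_name LIKE '%Northwestern%'
""").fetchone()

# More portable: compare each row against the maximum computed in a subquery.
portable = demo.execute("""
    SELECT id, name, address_number, address_street_name
    FROM person
    WHERE address_street_name LIKE '%Northwestern%'
      AND address_number = (SELECT MAX(address_number)
                            FROM person
                            WHERE address_street_name LIKE '%Northwestern%')
""").fetchone()

print(bare)
print(portable)

demo.close()
```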


In [None]:
step_3_2 = '''
  -- optional query B to get Witness 1 details (Morty), using ORDER BY instead of MAX()
  SELECT id, name, address_number AS House_Number, address_street_name
  FROM person
  WHERE address_street_name LIKE '%Northwestern%'
  ORDER BY address_number DESC
  LIMIT 1;
'''

#running the query
pd.read_sql_query(step_3_2, con)

Unnamed: 0,id,name,House_Number,address_street_name
0,14887,Morty Schapiro,4919,Northwestern Dr
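
An `ORDER BY ... DESC LIMIT 1` query returns the largest house number among whatever rows survive the `WHERE` clause, so the street filter is what ties the result to Northwestern Dr. The sketch below shows, on a two-row toy table (Morty's row from the output above plus one invented resident on another street), how the answer changes when that filter is dropped.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE person (id INTEGER, name TEXT, address_number INTEGER, address_street_name TEXT)"
)
demo.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (14887, "Morty Schapiro", 4919, "Northwestern Dr"),   # row from the output above
        (1, "Hypothetical Resident", 9000, "Some Other St"),  # made up, larger house number
    ],
)

order_limit = "ORDER BY address_number DESC LIMIT 1"

# Without a street filter: the largest house number in the whole table.
unfiltered = demo.execute(
    f"SELECT name, address_street_name FROM person {order_limit}"
).fetchone()

# With the filter: the largest house number on Northwestern Dr only.
filtered = demo.execute(
    "SELECT name, address_street_name FROM person "
    f"WHERE address_street_name LIKE '%Northwestern%' {order_limit}"
).fetchone()

print(unfiltered)  # ('Hypothetical Resident', 'Some Other St')
print(filtered)    # ('Morty Schapiro', 'Northwestern Dr')

demo.close()
```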


In [None]:
step_3_3 = '''
  -- optional query C to get Witness 2 details (Annabel)
  SELECT id, name, address_number AS House_Number, address_street_name
  FROM person
  WHERE name LIKE '%Anna%' AND address_street_name LIKE '%Frank%';
'''

#running the query
pd.read_sql_query(step_3_3, con)

Unnamed: 0,id,name,House_Number,address_street_name
0,16371,Annabel Miller,103,Franklin Ave


Now embark on this investigative adventure with confidence! 😊

## Step 4: Decrypting Witness Testimonies

Let’s delve into the minds of our witnesses by retrieving and analyzing their interview **transcripts**. Using the IDs **14887** and **16371**, we aim to extract insights from the “**interview**” table.

This query amalgamates information from the “**person**” and “**interview**” tables, connecting them based on the common column **id** in the “**person**” table and **person_id** in the “**interview**” table. The result provides us with a comprehensive understanding of our witnesses’ accounts.

In [None]:
step_4 = '''
  -- Query the database for the 1st (Morty) and 2nd (Annabel) witnesses' interview transcripts
  SELECT p.name, i.transcript
  FROM person p
  JOIN interview i ON p.id = i.person_id
  WHERE p.id = 14887 OR p.id = 16371;
'''

#running the query
pd.read_sql_query(step_4, con)

Unnamed: 0,name,transcript
0,Morty Schapiro,"I heard a gunshot and then saw a man run out. He had a ""Get Fit Now Gym"" bag. The membership number on the bag started with ""48Z"". Only gold members have those bags. The man got into a car with a plate that included ""H42W""."
1,Annabel Miller,"I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th."


Excitingly, our pool of information deepens! **Annabel** witnessed the murder and recognized the killer as a fellow **gym-goer** she had seen while working out on **January 9, 2018**, the week before the crime.

On the flip side, **Morty Schapiro’s** testimony is equally compelling. He heard a gunshot and then saw a man run out carrying a distinctive “**Get Fit Now Gym**” bag, a bag that, by his account, only **gold members** have. What’s more, the membership number on the bag started with “**48Z**”, and the man fled in a car with a license plate that included “**H42W**”.

Fingers crossed as the pieces of the puzzle start falling into place! 🕵️‍♂️
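
The searchable clues in Morty's statement all sit inside double quotes, so they can be pulled out mechanically before writing the next queries. This is only a convenience sketch using Python's `re` module on the transcript text returned above, not a step from the original article.

```python
import re

# Morty's transcript as returned by the query above.
transcript = (
    'I heard a gunshot and then saw a man run out. He had a "Get Fit Now Gym" bag. '
    'The membership number on the bag started with "48Z". Only gold members have those bags. '
    'The man got into a car with a plate that included "H42W".'
)

# Capture whatever sits between each pair of double quotes.
clues = re.findall(r'"([^"]+)"', transcript)
print(clues)  # ['Get Fit Now Gym', '48Z', 'H42W']
```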

## Step 5: Navigating the Crime Trail — Suspect Identification

Armed with crucial details from the witnesses, including the crime date, killer’s **gender**, **gym membership**, and **car plate number**, we embark on a targeted query to track down the elusive suspect. The common thread in the witness statements is the gym, making it a focal point for our investigation.

In [None]:
step_5 = '''
  -- Confirm suspect's gym detail based on witness statements
  SELECT id, person_id, name, membership_status
  FROM get_fit_now_member
  WHERE id LIKE "%48Z%" AND membership_status LIKE "%Gold%";
'''

#running the query
pd.read_sql_query(step_5, con)

Unnamed: 0,id,person_id,name,membership_status
0,48Z7A,28819,Joe Germuska,gold
1,48Z55,67318,Jeremy Bowers,gold


Voilà! We now have two prime suspects in our sights: **Joe Germuska** and **Jeremy Bowers**, both holding the prestigious status of **gold members**. The plot thickens as we zero in on these individuals, scrutinizing their every move.
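
Morty said the membership number *started with* “48Z”, which corresponds to the prefix pattern `48Z%`. A pattern with a leading wildcard, `%48Z%`, also matches IDs that merely contain those characters. The sketch below illustrates the difference on a toy table: the two gold members come from the output above, while the other two rows are invented for the comparison.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE get_fit_now_member (id TEXT, name TEXT, membership_status TEXT)")
demo.executemany(
    "INSERT INTO get_fit_now_member VALUES (?, ?, ?)",
    [
        ("48Z7A", "Joe Germuska", "gold"),           # from the output above
        ("48Z55", "Jeremy Bowers", "gold"),          # from the output above
        ("X48Z1", "Hypothetical Member", "gold"),    # made up: contains 48Z, does not start with it
        ("48Z00", "Hypothetical Silver", "silver"),  # made up: right prefix, wrong status
    ],
)


def member_ids(pattern):
    """Return the IDs of gold members whose ID matches the LIKE pattern."""
    rows = demo.execute(
        "SELECT id FROM get_fit_now_member "
        "WHERE id LIKE ? AND membership_status = 'gold' ORDER BY id",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


contains = member_ids("%48Z%")    # 48Z anywhere in the ID
starts_with = member_ids("48Z%")  # 48Z only at the start

print(contains)
print(starts_with)

demo.close()
```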

## Step 6: Gym Alibis Under Scrutiny

In our pursuit of truth, we turn our attention to the gym records to discern the whereabouts of our suspects on the fateful day of the crime. The query below aims to retrieve crucial details about **check-ins** at the fitness centre, encompassing **membership_id**, **membership status**, **member name**, **check-in date**, **check-in time**, and **check-out time** on the date the crime occurred.

In [None]:
step_6 = '''
  SELECT chk.membership_id,
         g_fit.membership_status,
         g_fit.name,
         chk.check_in_date,
         chk.check_in_time,
         chk.check_out_time
  FROM get_fit_now_check_in chk
  JOIN get_fit_now_member g_fit ON chk.membership_id = g_fit.id
  WHERE (chk.membership_id LIKE '%48Z7A%' OR chk.membership_id LIKE '%48Z55%')
    AND chk.check_in_date = 20180115;
'''

#running the query
pd.read_sql_query(step_6, con)

Unnamed: 0,membership_id,membership_status,name,check_in_date,check_in_time,check_out_time


The outcome, unfortunately, yields **no data**. This does not clear anyone: Annabel saw the killer at the gym on January 9th, not on the day of the murder, so an empty result for January 15th tells us only that neither suspect checked in that day. Our investigative instincts tell us there’s more to explore. Let’s plunge deeper into the digital breadcrumbs and unveil the final piece of this enigmatic puzzle.
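
A variant of the same join can take the date as a parameter, so it can be run for the day named in Annabel's statement (20180109) as well as for the day of the murder. The sketch below runs on a toy in-memory database: the two members come from Step 5, but the check-in rows and their times are invented, so the printed rows say nothing about what the real database contains.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE get_fit_now_member (id TEXT, name TEXT, membership_status TEXT);
    CREATE TABLE get_fit_now_check_in (
        membership_id TEXT, check_in_date INTEGER,
        check_in_time INTEGER, check_out_time INTEGER
    );
    -- Members as returned in Step 5.
    INSERT INTO get_fit_now_member VALUES ('48Z7A', 'Joe Germuska', 'gold');
    INSERT INTO get_fit_now_member VALUES ('48Z55', 'Jeremy Bowers', 'gold');
    -- Invented check-ins, for illustration only.
    INSERT INTO get_fit_now_check_in VALUES ('48Z7A', 20180109, 900, 1000);
    INSERT INTO get_fit_now_check_in VALUES ('48Z55', 20180109, 1000, 1100);
""")

query = """
    SELECT chk.membership_id, m.name, chk.check_in_date,
           chk.check_in_time, chk.check_out_time
    FROM get_fit_now_check_in chk
    JOIN get_fit_now_member m ON chk.membership_id = m.id
    WHERE chk.membership_id IN ('48Z7A', '48Z55')
      AND chk.check_in_date = ?
    ORDER BY chk.membership_id
"""

on_murder_day = demo.execute(query, (20180115,)).fetchall()
on_annabel_day = demo.execute(query, (20180109,)).fetchall()

print(on_murder_day)   # empty with this toy data
print(on_annabel_day)

demo.close()
```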

## Step 7: Connecting the Dots — Suspect Showdown

In this pivotal step, we amalgamate comprehensive details about our suspects, **Jeremy** and **Joe**, leveraging their **membership IDs**, **names**, and associated **license plate numbers** to unravel the truth.

In [None]:
step_7 = '''
  -- Let's use other info to track down the suspect
  SELECT g.id AS membership_id,
         g.person_id,
         g.name,
         g.membership_status,
         p.license_id,
         d.plate_number
  FROM get_fit_now_member g
  JOIN person p ON g.person_id = p.id
  JOIN drivers_license d ON p.license_id = d.id
  WHERE g.id LIKE "%48Z%" AND (p.name LIKE "%Joe%" OR p.name LIKE "%JER%")
                          AND d.plate_number LIKE "%H42W%";
'''

#running the query
pd.read_sql_query(step_7, con)

Unnamed: 0,membership_id,person_id,name,membership_status,license_id,plate_number
0,48Z55,67318,Jeremy Bowers,gold,423327,0H42W2


Gotcha! **Jeremy Bowers**, it seems the jig is up.

The result points to **Jeremy Bowers**, with a plate number that aligns seamlessly with the one disclosed in **Morty’s** testimony.

So, Jeremy, care to explain why your plate number matches the getaway vehicle described in Morty’s testimony? The game is afoot, and we’re closing in on the elusive killer.
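
The same three-table join can be driven by Morty's clues alone (ID prefix, gold status, plate fragment), without typing the suspects' names into the query. The sketch below assumes a toy in-memory database: Jeremy's values come from the outputs above, whereas Joe's license ID and plate are invented so that there is a second candidate to filter out.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE get_fit_now_member (id TEXT, person_id INTEGER, name TEXT, membership_status TEXT);
    CREATE TABLE person (id INTEGER, name TEXT, license_id INTEGER);
    CREATE TABLE drivers_license (id INTEGER, plate_number TEXT);

    -- Jeremy's values come from the outputs above.
    INSERT INTO get_fit_now_member VALUES ('48Z55', 67318, 'Jeremy Bowers', 'gold');
    INSERT INTO person VALUES (67318, 'Jeremy Bowers', 423327);
    INSERT INTO drivers_license VALUES (423327, '0H42W2');

    -- Joe's license id and plate are invented for illustration.
    INSERT INTO get_fit_now_member VALUES ('48Z7A', 28819, 'Joe Germuska', 'gold');
    INSERT INTO person VALUES (28819, 'Joe Germuska', 111111);
    INSERT INTO drivers_license VALUES (111111, 'ABC123');
""")

suspects = demo.execute("""
    SELECT g.id, g.name, d.plate_number
    FROM get_fit_now_member g
    JOIN person p ON g.person_id = p.id
    JOIN drivers_license d ON p.license_id = d.id
    WHERE g.id LIKE '48Z%'
      AND g.membership_status = 'gold'
      AND d.plate_number LIKE '%H42W%'
""").fetchall()

print(suspects)  # [('48Z55', 'Jeremy Bowers', '0H42W2')]

demo.close()
```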

Let’s confirm the solution 👇

![Check-your-solution](https://raw.githubusercontent.com/diogo-costa-silva/sql-murder-mystery/main/images/check_your_solution_1.png)


# Unmasking the Mastermind — Jeremy’s Revelation

## Step 8: Unveiling the Puppet Master — Insights from Jeremy’s Testimony

Only **Jeremy** can lead us to the brain behind the murder! In our relentless pursuit of the truth, we now shift our focus to Jeremy Bowers, the prime suspect. A detailed review of Jeremy’s interview **transcript** unfolds a trove of information that could lead us to the orchestrator of this intricate crime.

In [None]:
step_8 = '''
  -- Querying Jeremy Bowers's interview transcript
  SELECT p.id, p.name, i.transcript
  FROM person p
  JOIN interview i ON p.id = i.person_id
  WHERE i.person_id = 67318;
'''

#running the query
pd.read_sql_query(step_8, con)

Unnamed: 0,id,name,transcript
0,67318,Jeremy Bowers,"I was hired by a woman with a lot of money. I don't know her name but I know she's around 5'5"" (65"") or 5'7"" (67""). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.\n"


Hold onto your seats! We’ve struck gold in Jeremy’s revelations:

- Jeremy discloses that he was hired by a **woman** of substantial financial means.

- Though unaware of her name, Jeremy provides a vivid physical description. The mysterious woman stands at around **5'5" (65")** or **5'7" (67")**, boasts **red hair**, and drives a **Tesla Model S**.

- Adding another layer to the mystery, the woman attended the **SQL Symphony Concert** **three** times in **December 2017**.


This revelation transforms our investigation, providing vital insights into the potential orchestrator of this intricate crime. The detailed physical description and specific details about her attendance at the SQL Symphony Concert now stand as key elements in unmasking and locating this enigmatic woman. The final pieces of the puzzle are falling into place, and the climax of our investigation draws near.

## Step 9: A Confluence of Data — Miranda Priestly Emerges

In a masterful stroke of cross-referencing, we combine key information from Jeremy’s confession across multiple tables. The query orchestrates a symphony involving the “**person**”, “**drivers_license**”, “**income**”, and “**facebook_event_checkin**” tables, unveiling detailed insights about individuals meeting specific criteria laid out in Jeremy’s confession.

In [None]:
step_9 = '''
  -- Use details of Jeremy's confession to track down the mastermind of the murder
  SELECT p.id,
        p.name,
        i.annual_income,
        f.event_name,
        f.date AS event_date,
        d.age,
        d.gender,
        d.height,
        d.hair_color,
        d.car_make,
        d.car_model
  FROM person p
  JOIN drivers_license d ON p.license_id = d.id
  JOIN income i ON p.ssn = i.ssn
  JOIN facebook_event_checkin f ON p.id = f.person_id
  WHERE d.hair_color = 'red'
    AND f.event_name LIKE '%Symphon%'
    AND d.gender = 'female'
    AND d.car_make LIKE '%Tesl%'
    AND d.car_model LIKE '%s%';
'''

#running the query
pd.read_sql_query(step_9, con)

Unnamed: 0,id,name,annual_income,event_name,event_date,age,gender,height,hair_color,car_make,car_model
0,99716,Miranda Priestly,310000,SQL Symphony Concert,20171206,68,female,66,red,Tesla,Model S
1,99716,Miranda Priestly,310000,SQL Symphony Concert,20171212,68,female,66,red,Tesla,Model S
2,99716,Miranda Priestly,310000,SQL Symphony Concert,20171229,68,female,66,red,Tesla,Model S


Behold the convergence!

We’ve zeroed in on a match, and the spotlight shines on Miranda Priestly.

The alignment of Miranda’s attendance at the SQL Symphony Concert and the detailed description provided by Jeremy Bowers creates a compelling narrative. Finally, the puzzle pieces interlock, and Miranda Priestly emerges as a key figure in our investigation.
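
Three of Jeremy's clues are not part of the `WHERE` clause in the Step 9 query: the height range, the December 2017 window, and the count of three concert visits. The output happens to satisfy all of them, but they can also be enforced in SQL. The sketch below does so on a toy in-memory database: Miranda's attributes and concert dates come from the output above, while her license ID and the decoy person are invented.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT, license_id INTEGER);
    CREATE TABLE drivers_license (
        id INTEGER, height INTEGER, hair_color TEXT, gender TEXT,
        car_make TEXT, car_model TEXT
    );
    CREATE TABLE facebook_event_checkin (person_id INTEGER, event_name TEXT, date INTEGER);

    -- Miranda's attributes and dates come from the output above; her license id is invented.
    INSERT INTO person VALUES (99716, 'Miranda Priestly', 1);
    INSERT INTO drivers_license VALUES (1, 66, 'red', 'female', 'Tesla', 'Model S');
    INSERT INTO facebook_event_checkin VALUES (99716, 'SQL Symphony Concert', 20171206);
    INSERT INTO facebook_event_checkin VALUES (99716, 'SQL Symphony Concert', 20171212);
    INSERT INTO facebook_event_checkin VALUES (99716, 'SQL Symphony Concert', 20171229);

    -- Invented decoy: same description, but only one concert visit.
    INSERT INTO person VALUES (2, 'Hypothetical Decoy', 2);
    INSERT INTO drivers_license VALUES (2, 65, 'red', 'female', 'Tesla', 'Model S');
    INSERT INTO facebook_event_checkin VALUES (2, 'SQL Symphony Concert', 20171224);
""")

mastermind = demo.execute("""
    SELECT p.id, p.name, COUNT(*) AS concert_visits
    FROM person p
    JOIN drivers_license d ON p.license_id = d.id
    JOIN facebook_event_checkin f ON p.id = f.person_id
    WHERE d.gender = 'female'
      AND d.hair_color = 'red'
      AND d.height BETWEEN 65 AND 67               -- 5'5" to 5'7" in inches
      AND d.car_make = 'Tesla'
      AND d.car_model = 'Model S'
      AND f.event_name = 'SQL Symphony Concert'
      AND f.date BETWEEN 20171201 AND 20171231     -- December 2017
    GROUP BY p.id, p.name
    HAVING COUNT(*) = 3                            -- attended three times
""").fetchall()

print(mastermind)

demo.close()
```

Grouping by person also collapses the three check-in rows into a single result row per candidate.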

Let’s confirm that Miranda Priestly is the right answer👇

![Check-your-solution](https://raw.githubusercontent.com/diogo-costa-silva/sql-murder-mystery/main/images/check_your_solution_2.png)

Mission accomplished, we found the mastermind!