# Assignment #8 - Data Gathering and Warehousing - DSSA-5102

Instructor: Melissa Laurino<br>
Spring 2025<br>

Name: Justin Davis
<br>
Date: 04/10/25
<br>
<br>
**At this time in the semester:** <br>
- We have explored a dataset. <br>
- We have cleaned our dataset. <br>
- We created a GitHub account with a repository for this class and included a metadata README file about our data. <br>
- We introduced general SQL syntax, queries, and applications in Python.<br>
- We created our own databases from scratch using MySQL Workbench and Python with SQLAlchemy, both on our local server and locally on our machine.
- We populated our databases with the data we cleaned at the start of the semester.
<br>

Now we will **JOIN** our knowledge and tables to answer more complex questions about our dataset! We will practice joining tables and understand the importance of using different commands.<br>

JOIN statements are used to combine results from two or more tables based on a related column between them.<br>
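
As a minimal sketch of that idea, the block below builds two toy tables in an in-memory SQLite database and joins them on their related column. The `member` and `check_in` tables and their rows are hypothetical and are not part of the assignment database.

```python
import sqlite3

# Hypothetical toy tables, not part of the sql-murder-mystery database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id TEXT, name TEXT);
    CREATE TABLE check_in (membership_id TEXT, check_in_date INTEGER);
    INSERT INTO member VALUES ('A1', 'Ana'), ('B2', 'Ben'), ('C3', 'Cy');
    INSERT INTO check_in VALUES ('A1', 20180101), ('A1', 20180102), ('B2', 20180103);
""")

# member.id and check_in.membership_id are the related columns
rows = conn.execute("""
    SELECT member.name, check_in.check_in_date
    FROM member
    JOIN check_in ON member.id = check_in.membership_id
    ORDER BY check_in.check_in_date
""").fetchall()
conn.close()

# Cy has no check-in, so a plain JOIN leaves Cy out of the result
for name, check_in_date in rows:
    print(name, check_in_date)
```

Each row of the result pairs one member with one of that member's check-ins, which is why Ana appears twice.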

Review the PowerPoint and readings specified on Blackboard.<br>

In the event your database does not meet the requirements below to answer the question, please use the database provided in Assignment #4 and #5. Remember to credit your data source. <br>

Follow the instructions below to complete the assignment. Write the question you are answering with each query, and visualize your results in a way that fits your data. <br>
Be sure to comment **all** code and answer **all** questions in markdown for full credit.<br>

**Data origin:** This database was created by @NUKnightLab on GitHub and can be found here: https://github.com/NUKnightLab/sql-mysteries

In [1]:
# Load necessary libraries
from sqlalchemy import create_engine, inspect, text # Database navigation
import sqlite3 # A second option for working with databases
import pandas as pd # Python data manipulation
import matplotlib.pyplot as plt # Used for creating a bar plot

In [2]:
# Connect to our .db file
db_file = "sql-murder-mystery.db"
engine = create_engine(f"sqlite:///{db_file}") #using create_engine function for connecting to database

In [3]:
# For a quick reference for tables and columns, refer to schema on Blackboard, or list the tables and fields below:
inspector = inspect(engine)

#getting list of table names
tables = inspector.get_table_names()

#loop through all table and column names
for table in tables:
    print(f"\nColumns in '{table}' table:")
 
    columns = inspector.get_columns(table)
    for col in columns:
        print(f" {col['name']} - {col['type']}")


Columns in 'crime_scene_report' table:
 date - INTEGER
 type - TEXT
 description - TEXT
 city - TEXT

Columns in 'drivers_license' table:
 id - INTEGER
 age - INTEGER
 height - INTEGER
 eye_color - TEXT
 hair_color - TEXT
 gender - TEXT
 plate_number - TEXT
 car_make - TEXT
 car_model - TEXT

Columns in 'facebook_event_checkin' table:
 person_id - INTEGER
 event_id - INTEGER
 event_name - TEXT
 date - INTEGER

Columns in 'get_fit_now_check_in' table:
 membership_id - TEXT
 check_in_date - INTEGER
 check_in_time - INTEGER
 check_out_time - INTEGER

Columns in 'get_fit_now_member' table:
 id - TEXT
 person_id - INTEGER
 name - TEXT
 membership_start_date - INTEGER
 membership_status - TEXT

Columns in 'income' table:
 ssn - INTEGER
 annual_income - INTEGER

Columns in 'interview' table:
 person_id - INTEGER
 transcript - TEXT

Columns in 'person' table:
 id - INTEGER
 name - TEXT
 license_id - INTEGER
 address_number - INTEGER
 address_street_name - TEXT
 ssn - INTEGER

Columns in 'solu

#### INNER JOIN (or JOIN)
Display matching records from TWO tables! Choose to combine two tables using inner join. <br>
Write the question you are answering with your query and visualize your results. <br>
<br>
**Example Question:** Which gym members have actually checked into the gym?
<br>
**What tables are we joining?** get_fit_now_member and get_fit_now_check_in

In [14]:
#establishing a connection to database
with engine.connect() as connection:  
    #defining the SQL query for inner join 
    query_inner = text("""
        -- selecting specific columns from both tables and renaming them 
        SELECT get_fit_now_member.id AS membership_id, get_fit_now_member.name, 
               get_fit_now_check_in.check_in_date, get_fit_now_check_in.check_in_time
        -- selecting information from the first table
        FROM get_fit_now_member
        -- using inner join to connect get_fit_now_member and get_fit_now_check_in
        INNER JOIN get_fit_now_check_in
        -- using on to match id from get_fit_now_member with membership_id from get_fit_now_check_in
        ON get_fit_now_member.id = get_fit_now_check_in.membership_id
    """) 
    #execute query 
    result_inner = pd.read_sql(query_inner, connection)  # Execute the query and fetch results
#printing results 
result_inner

Unnamed: 0,membership_id,name,check_in_date,check_in_time
0,NL318,Everette Koepke,20180212,329
1,NL318,Everette Koepke,20170811,469
2,NL318,Everette Koepke,20180429,506
3,NL318,Everette Koepke,20180128,124
4,NL318,Everette Koepke,20171027,418
...,...,...,...,...
2698,4KB72,Emile Hege,20170422,1016
2699,4KB72,Emile Hege,20170630,408
2700,48Z7A,Joe Germuska,20180109,1600
2701,48Z55,Jeremy Bowers,20180109,1530
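
The joined result above has one row per check-in, so the same member appears many times. A possible way to get one row per member is to count check-ins with `GROUP BY`. The sketch below assumes an in-memory stand-in for the two gym tables, filled with a few rows copied from the output above, so it runs without the `.db` file.

```python
import sqlite3

# In-memory stand-in for the two gym tables; the rows are copied from the output above
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE get_fit_now_member (id TEXT, name TEXT);
    CREATE TABLE get_fit_now_check_in (membership_id TEXT, check_in_date INTEGER);
    INSERT INTO get_fit_now_member VALUES
        ('NL318', 'Everette Koepke'), ('48Z7A', 'Joe Germuska'), ('48Z55', 'Jeremy Bowers');
    INSERT INTO get_fit_now_check_in VALUES
        ('NL318', 20180212), ('NL318', 20170811), ('NL318', 20180429),
        ('48Z7A', 20180109), ('48Z55', 20180109);
""")

# GROUP BY collapses the repeated check-in rows into one count per member
check_in_counts = conn.execute("""
    SELECT m.id, m.name, COUNT(*) AS n_check_ins
    FROM get_fit_now_member AS m
    INNER JOIN get_fit_now_check_in AS c ON m.id = c.membership_id
    GROUP BY m.id, m.name
    ORDER BY n_check_ins DESC, m.name
""").fetchall()
conn.close()

for member_id, name, n_check_ins in check_in_counts:
    print(member_id, name, n_check_ins)
```

Counts in this shape (one name, one number) are a natural input for a bar plot, for example with `plt.bar`.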


#### LEFT JOIN
<br>
Returns ALL records from the left table and matching records from the right table. Write the question you are answering with your query and visualize your results.
<br><br>
**Question:** What are the incomes of all people, even if their income record is missing?
<br>
**What tables are we joining?** person and income

In [15]:
#establish a connection to the database 
with engine.connect() as connection: 
    #defining the SQL query for left join 
    query_left = text("""
        -- selecting specific columns from both tables and renaming them 
        SELECT person.id AS person_id, person.name, income.annual_income
        -- selecting information from the first table
        FROM person
        -- using left join to perform the action between the first table (person) and the second table (income)
        LEFT JOIN income
        -- using on to match rows by ssn; people without an income record are kept with a missing (NULL) income
        ON person.ssn = income.ssn
    """)
    #execute query 
    result_left = pd.read_sql(query_left, connection)  # Execute the query and fetch results
#print results
result_left

Unnamed: 0,person_id,name,annual_income
0,10000,Christoper Peteuil,31000.0
1,10007,Kourtney Calderwood,24000.0
2,10010,Muoi Cary,14800.0
3,10016,Era Moselle,47400.0
4,10025,Trena Hornby,
...,...,...,...
10006,99936,Luba Benser,35100.0
10007,99941,Roxana Mckimley,80100.0
10008,99965,Cherie Zeimantz,70200.0
10009,99982,Allen Cruse,78500.0
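
Rows such as Trena Hornby's, where `annual_income` is empty, are the people the LEFT JOIN kept without a match. One way to list only those people is to filter the joined result for a NULL on the right-hand side. This is a sketch on an in-memory stand-in: the ids, names, and income follow the output above, while the `ssn` values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT, ssn INTEGER);
    CREATE TABLE income (ssn INTEGER, annual_income INTEGER);
    -- ids, names, and income follow the output above; the ssn values are made up
    INSERT INTO person VALUES (10000, 'Christoper Peteuil', 111), (10025, 'Trena Hornby', 222);
    INSERT INTO income VALUES (111, 31000);
""")

# Keep only the left-table rows that found no match in income
missing_income = conn.execute("""
    SELECT person.id, person.name
    FROM person
    LEFT JOIN income ON person.ssn = income.ssn
    WHERE income.ssn IS NULL
""").fetchall()
conn.close()

print(missing_income)
```

The filter uses `income.ssn` rather than `income.annual_income` so that it tests for a missing match, not for an income value that happens to be NULL.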


#### RIGHT JOIN
<br>
Returns ALL records from the right table and matching records from the left table. Write the question you are answering with your query and visualize your results. <br>
<br>
**Question:** Who checked into Facebook events, even if we don't have their name on file? <br>
**What tables are we joining?** person and facebook_event_checkin

In [16]:
#establish a connection to the database 
with engine.connect() as connection: 
    #defining the SQL query for right join 
    query_right = text("""
        -- selecting specific columns from both tables and renaming them 
        SELECT person.id AS person_id, person.name, facebook_event_checkin.event_name, facebook_event_checkin.date
        -- selecting information from the first table
        FROM person
        -- using right join to connect the person table to the facebook_event_checkin table
        RIGHT JOIN facebook_event_checkin
        -- using on to match each check-in to a person; check-ins with no matching person are kept with a NULL name
        ON person.id = facebook_event_checkin.person_id
    """)
    #execute query 
    result_right = pd.read_sql(query_right, connection)  # Execute the query and fetch results
#print results
result_right

Unnamed: 0,person_id,name,event_name,date
0,10000,Christoper Peteuil,Steinbach's Guideline for Systems Programming\n,20170306
1,10000,Christoper Peteuil,The Universe is laughing behind your back\n,20171130
2,10007,Kourtney Calderwood,Green light in A.M. for new projects. Red lig...,20170925
3,10007,Kourtney Calderwood,Modern man is the missing link between apes an...,20171017
4,10010,Muoi Cary,upon to act in accordance with the dictates of...,20180319
...,...,...,...,...
20006,99965,Cherie Zeimantz,"""That must be wonderful! I don't understand i...",20171202
20007,99965,Cherie Zeimantz,pedestrians.\n,20170705
20008,99982,Allen Cruse,Don't cook tonight -- starve a rat today!\n,20170609
20009,99982,Allen Cruse,Do not drink coffee in early A.M. It will kee...,20180404
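
SQLite only added `RIGHT JOIN` in version 3.39.0, so the query above can fail on older installations. A RIGHT JOIN can be rewritten as a LEFT JOIN with the two tables swapped. The sketch below shows this on an in-memory stand-in whose rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT);
    CREATE TABLE facebook_event_checkin (person_id INTEGER, event_name TEXT, date INTEGER);
    -- Hypothetical rows: person 2 checked in but has no row in person
    INSERT INTO person VALUES (1, 'Ana');
    INSERT INTO facebook_event_checkin VALUES (1, 'Concert', 20180101), (2, 'Lecture', 20180102);
""")

# Same result as person RIGHT JOIN facebook_event_checkin, with the tables swapped
rows = conn.execute("""
    SELECT person.id AS person_id, person.name, facebook_event_checkin.event_name
    FROM facebook_event_checkin
    LEFT JOIN person ON person.id = facebook_event_checkin.person_id
    ORDER BY facebook_event_checkin.date
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Because `person_id` is selected from `person.id`, it is NULL for the unmatched check-in. Selecting `facebook_event_checkin.person_id` instead would keep the id even when no name is on file.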


#### FULL JOIN or UNION of RIGHT JOIN and LEFT JOIN
<br>
Returns ALL records from both tables, matching rows where possible, so it can answer multiple objectives at the same time! Not recommended for large databases: results may slow your machine, or the query may quit before finishing. Write the question you are answering with your query and visualize your results. <br>
<br>
**Question:** List all individuals and all income records.
<br>
**What tables are we joining?** person and income 

In [17]:
#establish a connection to the database 
with engine.connect() as connection: 
    #defining query for full join 
    query_full_union = text("""
        -- selecting specific columns from both tables 
        SELECT person.id AS person_id, person.name, income.annual_income
        -- selecting the data from the first table (person)
        FROM person
        -- using left join to include all rows from person table
        -- and match it to the income table
        LEFT JOIN income
        ON person.ssn = income.ssn

        -- union will combine both the left and right join 
        UNION

        -- selecting specific columns from both tables for the right join 
        SELECT person.id AS person_id, person.name, income.annual_income
        -- selecting data from the first table of the right join (person)
        FROM person
        -- using right join to include all rows from the income table and match them to the person table
        RIGHT JOIN income
        ON person.ssn = income.ssn
    """)
    #execute query 
    result_full_union = pd.read_sql(query_full_union, connection)  # Execute the query and fetch results
#print results
result_full_union

Unnamed: 0,person_id,name,annual_income
0,10000,Christoper Peteuil,31000.0
1,10007,Kourtney Calderwood,24000.0
2,10010,Muoi Cary,14800.0
3,10016,Era Moselle,47400.0
4,10025,Trena Hornby,
...,...,...,...
10006,99936,Luba Benser,35100.0
10007,99941,Roxana Mckimley,80100.0
10008,99965,Cherie Zeimantz,70200.0
10009,99982,Allen Cruse,78500.0
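
To see what the UNION approach adds over a plain LEFT JOIN, it helps to have unmatched rows on both sides. The sketch below uses hypothetical in-memory tables where one person has no income record and one income record has no person. The right-join half is written as a LEFT JOIN with the tables swapped, an assumption made so the example also runs on SQLite versions without `RIGHT JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT, ssn INTEGER);
    CREATE TABLE income (ssn INTEGER, annual_income INTEGER);
    -- Hypothetical rows: Ben has no income record, and ssn 333 has no person
    INSERT INTO person VALUES (1, 'Ana', 111), (2, 'Ben', 222);
    INSERT INTO income VALUES (111, 31000), (333, 50000);
""")

full_rows = conn.execute("""
    -- every person, with income where available
    SELECT person.id AS person_id, person.name, income.annual_income
    FROM person
    LEFT JOIN income ON person.ssn = income.ssn
    UNION
    -- every income record, with the person where available
    SELECT person.id AS person_id, person.name, income.annual_income
    FROM income
    LEFT JOIN person ON person.ssn = income.ssn
""").fetchall()
conn.close()

for row in full_rows:
    print(row)
```

`UNION` removes duplicate rows, which is what keeps the matched pair (Ana) from appearing twice. It would also collapse any two rows that are genuinely identical in all three selected columns.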


In [20]:
# Each `with` block above already closed its connection :)
# Dispose of the engine to release any pooled connections
engine.dispose()