Create and Insert Assignment
This repo contains a file named `Titanic`. Your task is to load the data from the provided file (use `pandas`).

Using the `psycopg2` and `pandas` libraries:

* Read in the `titanic.csv` file to a DataFrame object.
* Use `df.to_sql()` or create a `Base` class to insert the data into a new table named `titanic` in a PostgreSQL database.
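
The two steps above can be sketched without a PostgreSQL server. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for PostgreSQL and a small hand-built sample in place of `titanic.csv`. Passing `index=False` is a choice made here, not part of the assignment: it keeps the DataFrame index from being written as an extra column.

```python
import sqlite3

import pandas as pd

# A few sample rows in the shape of the Titanic data (stand-in for pd.read_csv).
sample = pd.DataFrame(
    {
        "survived": [0, 1, 1, 1, 0],
        "pclass": [3, 1, 3, 1, 3],
        "sex": ["male", "female", "female", "female", "male"],
        "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    }
)

# Assumption: SQLite in memory stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")

# Create (or replace) the table and insert every row of the DataFrame.
sample.to_sql("titanic", con=conn, if_exists="replace", index=False)

# Confirm the insert by counting rows and listing the column names.
row_count = conn.execute("select count(*) from titanic").fetchone()[0]
column_names = [row[1] for row in conn.execute("pragma table_info(titanic)")]
print(row_count, column_names)
```

Against PostgreSQL the `to_sql` call has the same shape, with `con` set to an SQLAlchemy engine or connection URL.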

Then, in SQL, write the following queries to test:

* Count how many rows you have.
* How many people survived?
* What passenger class has the largest population?

Save these queries to a `.sql` file and upload it along with the pipeline that creates the database and table.
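
One possible way to produce that `.sql` file from Python is sketched below. The file name `titanic_queries.sql`, the helper `save_queries`, and the temporary output directory are all hypothetical choices for illustration. The third query adds `limit 1` so that it returns only the largest class.

```python
import tempfile
from pathlib import Path

# The three test queries, keyed by the question each one answers.
QUERIES = {
    "Count how many rows you have.": "select count(*) from titanic;",
    "How many people survived?": "select count(*) from titanic where survived = 1;",
    "What passenger class has the largest population?": (
        "select pclass, count(*) as passengers\n"
        "from titanic\n"
        "group by pclass\n"
        "order by passengers desc\n"
        "limit 1;"
    ),
}


def save_queries(path):
    """Write each query to `path`, preceded by its question as a SQL comment."""
    blocks = [f"-- {question}\n{sql}" for question, sql in QUERIES.items()]
    target = Path(path)
    target.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    return target


# Hypothetical location: a temporary directory, so the sketch leaves no files behind.
sql_file = save_queries(Path(tempfile.mkdtemp()) / "titanic_queries.sql")
sql_text = sql_file.read_text(encoding="utf-8")
print(sql_text)
```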

In [11]:
import pandas as pd
import psycopg2   # not called directly below; SQLAlchemy uses it as its default PostgreSQL driver
from sqlalchemy import create_engine, text

# the VS Code PostgreSQL extension is not involved here: SQLAlchemy talks to the database through psycopg2

In [7]:
# this URL was the toughest part
sql_url = 'removed at GitHub request'
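
Since the real URL had to be removed, one option is to keep credentials out of the notebook and assemble the URL at run time. The sketch below is an assumption about how that could look, not the URL used here: `build_pg_url` is a hypothetical helper, the variable names follow the libpq convention (`PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`, `PGDATABASE`), and the example values are placeholders.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(env=os.environ):
    """Assemble an SQLAlchemy-style PostgreSQL URL from environment-like settings."""
    # Percent-encode the user and password so characters such as "@" stay unambiguous.
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])
    host = env["PGHOST"]
    port = env.get("PGPORT", "5432")  # 5432 is the default PostgreSQL port
    database = env["PGDATABASE"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"


# Placeholder settings, not real credentials.
example_url = build_pg_url(
    {
        "PGUSER": "demo_user",
        "PGPASSWORD": "p@ss",
        "PGHOST": "db.example.com",
        "PGDATABASE": "demo_db",
    }
)
print(example_url)
```

In the notebook, `sql_url = build_pg_url()` would then read the values from the real environment.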

In [3]:
titanic = pd.read_csv('data/titanic.csv')
titanic.columns = (titanic.columns.str.lower().str.replace(" ", "_").str.replace("/", "_"))
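
To see what the column cleanup does in isolation, here is a small sketch on an empty DataFrame. The raw header names are assumptions inferred from the cleaned names in this notebook; `regex=False` is added so that the patterns are always treated as literal characters.

```python
import pandas as pd

# Assumed raw headers, inferred from the cleaned column names.
raw = pd.DataFrame(
    columns=["Survived", "Pclass", "Siblings/Spouses Aboard", "Parents/Children Aboard"]
)

# Lowercase everything, then turn spaces and slashes into underscores.
raw.columns = (
    raw.columns.str.lower()
    .str.replace(" ", "_", regex=False)
    .str.replace("/", "_", regex=False)
)

cleaned = list(raw.columns)
print(cleaned)
```

The resulting names need no quoting when used as SQL column identifiers.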

In [5]:
titanic.head()

   survived  pclass                                               name     sex   age  siblings_spouses_aboard  parents_children_aboard     fare
0         0       3                             Mr. Owen Harris Braund    male  22.0                        1                        0     7.25
1         1       1  Mrs. John Bradley (Florence Briggs Thayer) Cum...  female  38.0                        1                        0  71.2833
2         1       3                              Miss. Laina Heikkinen  female  26.0                        0                        0    7.925
3         1       1        Mrs. Jacques Heath (Lily May Peel) Futrelle  female  35.0                        1                        0     53.1
4         0       3                            Mr. William Henry Allen    male  35.0                        0                        0     8.05


In [8]:
# write to ElephantSQL
titanic.to_sql('titanic', con=sql_url, if_exists='replace')

887

In [9]:
# create SQLAlchemy engine connection
engine = create_engine(sql_url)

In [14]:
# count how many rows in titanic table
sql_query = text("""select count(*) from titanic""")
with engine.connect() as conn:
    for row in conn.execute(sql_query):
        print(f'The number of rows in the Titanic dataset is: {row[0]}')

The number of rows in the Titanic dataset is: 887


In [15]:
# how many people survived?
sql_query = text("""select count(*) from titanic where survived=1""")
with engine.connect() as conn:
    for row in conn.execute(sql_query):
        print(f'The number of people who survived the sinking is: {row[0]}')

The number of people who survived the sinking is: 342


In [19]:
# what passenger class had the largest population?
sql_query = text("""select pclass as passenger_class,
                           count(pclass) as count
                    from titanic
                    group by pclass
                    order by count desc""")
with engine.connect() as conn:
    result_proxy = conn.execute(sql_query)

In [20]:
test_db_query = pd.DataFrame(result_proxy.fetchall(), columns=result_proxy.keys())
test_db_query

   passenger_class  count
0                3    487
1                1    216
2                2    184
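
As an alternative to building the DataFrame from a result object, `pd.read_sql_query` can run the query and return a DataFrame in one call, while the connection is still open. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL and rebuilds only the `pclass` column from the counts shown above. The alias `passenger_count` is an illustrative name.

```python
import sqlite3

import pandas as pd

# Rebuild the pclass column from the counts in the output above (887 rows in total).
passengers = pd.DataFrame({"pclass": [3] * 487 + [1] * 216 + [2] * 184})

# Assumption: SQLite in memory stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
passengers.to_sql("titanic", con=conn, index=False)

query = """
select pclass as passenger_class,
       count(*) as passenger_count
from titanic
group by pclass
order by passenger_count desc
"""

# Run the query and load the result while the connection is open.
by_class = pd.read_sql_query(query, conn)
conn.close()

largest_class = int(by_class.loc[0, "passenger_class"])
print(by_class)
print(largest_class)
```

With the notebook's PostgreSQL setup, the same call accepts an SQLAlchemy engine or connection in place of `conn`.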


In [21]:
# just for yucks now that I got the tables loaded
sql_query = text("""select  c.first_name, 
                            c.last_name, 
                            b.brand_name, 
                            p.product_name, 
                            p.amount as prod_amount,
                            o.order_date,
                            o.sub_total,
                            o.total_cost
                        from customer c
                        join cart ct on c.customer_id = ct.customer_id
                        join orders o on ct.order_id = o.order_id
                        join product p on p.upc = o.upc
                        join brand b on b.seller_id = p.seller_id""")
with engine.connect() as conn:
    result_proxy = conn.execute(sql_query)

In [22]:
test_db_query = pd.DataFrame(result_proxy.fetchall(), columns=result_proxy.keys())
test_db_query

   first_name  last_name       brand_name    product_name  prod_amount  order_date  sub_total  total_cost
0        Joel     Carter    Coding Temple      Python 101         20.0  2023-10-10       40.0       43.35
1         Sam      Snead  Flatiron School  Javascript 220         25.0  2023-10-09       45.0       48.32
