# Lesson 2 Exercise 3: Creating Fact and Dimension Tables with Star Schema

![](../images/postgresSQLlogo.png)

Walk through the basics of modeling data with Fact and Dimension tables. You will create both kinds of table and see how they form the basic building blocks of the Star Schema. 
Where you see ##### you will need to fill in code. 
This exercise is more challenging than the last one. Use the information provided to create the tables and write the insert statements. 

In [None]:
import psycopg2
from src.database import (
    get_pg_connection,
    get_pg_cursor,
    create_pg_table,
    drop_pg_table,
    insert_pg_rows,
    close_pg_connection,
)

tables = set()

conn = get_pg_connection()
cur = get_pg_cursor(conn)
conn.set_session(autocommit=True)

Imagine you work at an online Music Store. There will be many tables in our database, but let's just focus on 4 tables around customer purchases. 

![](images/starSchema.png)

From this representation you can start to see the makings of a "STAR". You will have one fact table (the center of the star) and 3 dimension tables that branch out from it.

### TO-DO: Create the Fact table and insert the data into the table

In [None]:
table_name = "customer_transactions"
tables.add(table_name)
column_names = ["customer_id", "store_id", "spent"]
column_types = ["int", "int", "float4"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [(1, 1, 20.5), (2, 1, 35.21)],
)
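
The helpers imported from `src.database` are not shown in this notebook, so the following is only a sketch of the kind of SQL that calls like `create_pg_table(conn, table_name, columns)` and `insert_pg_rows(...)` plausibly produce. The function names `build_create_table_sql` and `build_insert_sql` are hypothetical and are not part of `src.database`; the `%s` placeholders follow psycopg2's parameter style.

```python
def build_create_table_sql(table_name, columns):
    """Build a CREATE TABLE statement from a {column_name: column_type} dict."""
    column_defs = ", ".join(
        f"{name} {col_type}" for name, col_type in columns.items()
    )
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({column_defs});"


def build_insert_sql(table_name, column_names):
    """Build a parameterized INSERT with one %s placeholder per column."""
    column_list = ", ".join(column_names)
    placeholders = ", ".join(["%s"] * len(column_names))
    return f"INSERT INTO {table_name} ({column_list}) VALUES ({placeholders});"


# Same column layout as the fact table in this exercise.
fact_columns = {"customer_id": "int", "store_id": "int", "spent": "float4"}
create_sql = build_create_table_sql("customer_transactions", fact_columns)
insert_sql = build_insert_sql("customer_transactions", list(fact_columns))
print(create_sql)
print(insert_sql)
```

The sketch interpolates table and column names directly into the SQL string, which is only acceptable for trusted, hard-coded names like the ones here. Row values should always travel through the `%s` placeholders, and for untrusted identifiers psycopg2 provides the `psycopg2.sql` module.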

### TO-DO: Create the Dimension tables and insert data into those tables.

In [None]:
# Customer
table_name = "customer"
tables.add(table_name)
column_names = ["customer_id", "customer_name", "rewards"]
column_types = ["int", "varchar", "varchar(1)"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [(1, "Amanda", "Y"), (2, "Toby", "N")],
)

# Items purchased
table_name = "items_purchased"
tables.add(table_name)
column_names = ["customer_id", "item_number", "item_name"]
column_types = ["int", "int", "varchar"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [(1, 1, "Rubber Soul"), (2, 3, "Let It Be")],
)

# Store
table_name = "store"
tables.add(table_name)
column_names = ["store_id", "state"]
column_types = ["int", "varchar"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [(1, "CA"), (2, "WA")],
)

Because the data follows the Fact/Dimension pattern of a Star Schema, the following queries are easy to write:
 
- **Query 1**: Find all the customers who spent more than 30 dollars: who they are, which store they bought from, where that store is located, what they bought, and whether they are a rewards member.

- **Query 2**: How much did Customer 2 spend?

### Query 1:

In [None]:
try:
    # Join the fact table to each dimension table and keep
    # only the transactions worth more than 30 dollars.
    cur.execute(
        """
        SELECT
            ct.customer_id, c.customer_name, s.store_id, s.state, it.item_name, c.rewards
        FROM customer_transactions ct
            JOIN customer c ON ct.customer_id = c.customer_id
            JOIN store s ON ct.store_id = s.store_id
            JOIN items_purchased it ON ct.customer_id = it.customer_id
        WHERE ct.spent > 30;
        """
    )
except psycopg2.Error as e:
    print("Error: Query 1 failed")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

### Your output from the above cell should look like this:
(2, 'Toby', 1, 'CA', 'Let It Be', 'N')
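
To experiment with the star join without a running PostgreSQL server, the same four tables can be rebuilt in an in-memory SQLite database. This is a stand-in sketch, not the exercise's method: it assumes Python's built-in `sqlite3` module in place of psycopg2, uses SQLite type names (`real`, `text`), and the connection name `demo` is arbitrary.

```python
import sqlite3

# In-memory database that disappears when the connection closes.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE customer_transactions (customer_id int, store_id int, spent real);
    CREATE TABLE customer (customer_id int, customer_name text, rewards text);
    CREATE TABLE items_purchased (customer_id int, item_number int, item_name text);
    CREATE TABLE store (store_id int, state text);
    INSERT INTO customer_transactions VALUES (1, 1, 20.5), (2, 1, 35.21);
    INSERT INTO customer VALUES (1, 'Amanda', 'Y'), (2, 'Toby', 'N');
    INSERT INTO items_purchased VALUES (1, 1, 'Rubber Soul'), (2, 3, 'Let It Be');
    INSERT INTO store VALUES (1, 'CA'), (2, 'WA');
    """
)

# Fact table in the middle, one JOIN per dimension table.
query_1_rows = demo.execute(
    """
    SELECT ct.customer_id, c.customer_name, s.store_id, s.state, it.item_name, c.rewards
    FROM customer_transactions ct
    JOIN customer c ON ct.customer_id = c.customer_id
    JOIN store s ON ct.store_id = s.store_id
    JOIN items_purchased it ON ct.customer_id = it.customer_id
    WHERE ct.spent > 30;
    """
).fetchall()
print(query_1_rows)
demo.close()
```

With the sample rows from this exercise, only customer 2's transaction exceeds 30 dollars, so `query_1_rows` holds a single row.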

### Query 2: 

In [None]:
try:
    # Total spending for customer 2, with the name pulled from the customer dimension.
    cur.execute(
        """
        SELECT
            ct.customer_id, c.customer_name, SUM(ct.spent)
        FROM customer_transactions ct
            JOIN customer c ON ct.customer_id = c.customer_id
        WHERE ct.customer_id = 2
        GROUP BY ct.customer_id, c.customer_name;
        """
    )
except psycopg2.Error as e:
    print("Error: Query 2 failed")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

Your output from the above cell should include Customer 2, the customer's name, and the amount: 
(2, 'Toby', 35.21)

## Summary
You can see from this schema that we were able to: 1) get "facts/metrics" from our fact table (such as how much a customer spent or how much each store sold), and 2) pull in information about our customers from the dimension tables, which supports more in-depth analytics for answering business questions.
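
The per-store metric mentioned in the summary is not computed in the cells above. A minimal sketch of it follows, again assuming an in-memory SQLite database (via Python's built-in `sqlite3`) as a stand-in for PostgreSQL; the names `demo` and `store_totals` are illustrative only.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE customer_transactions (customer_id int, store_id int, spent real);
    CREATE TABLE store (store_id int, state text);
    INSERT INTO customer_transactions VALUES (1, 1, 20.5), (2, 1, 35.21);
    INSERT INTO store VALUES (1, 'CA'), (2, 'WA');
    """
)

# Aggregate the fact table's metric, grouped by a dimension attribute.
store_totals = demo.execute(
    """
    SELECT s.store_id, s.state, ROUND(SUM(ct.spent), 2) AS total_spent
    FROM customer_transactions ct
    JOIN store s ON ct.store_id = s.store_id
    GROUP BY s.store_id, s.state
    ORDER BY s.store_id;
    """
).fetchall()
print(store_totals)
demo.close()
```

Because this is an inner join, a store with no transactions (store 2 in the sample data) does not appear in the result. A `LEFT JOIN` starting from `store` would be one way to list such stores as well.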

### TO-DO: Drop the tables

In [None]:
for table in tables:
    drop_pg_table(conn, table)

# close connection
close_pg_connection(conn)