### Creating tables

I divided the *"final_festival_dataset"* into:

1 fact table: *"fact_attendees"*  
2 dim tables: *"dim_ratings"* and *"dim_expenses"*

This creates a star schema that links all 3 tables through *"ticket_id"*.
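
As a minimal sketch of what linking through *"ticket_id"* means in practice, the toy frames below are joined back together with `merge`. The values are hypothetical and only a subset of the real columns is used; `validate="many_to_one"` is an optional check I add here, on the assumption that each dim table holds at most one row per ticket.

```python
import pandas as pd

# Toy data (hypothetical values) using a subset of the real column names
fact_attendees = pd.DataFrame(
    {"ticket_id": [1, 2, 3], "ticket_type": ["VIP", "General", "General"]}
)
dim_ratings = pd.DataFrame({"ticket_id": [1, 2, 3], "mean_rating": [4.5, 3.0, 5.0]})
dim_expenses = pd.DataFrame({"ticket_id": [1, 2, 3], "total_spent": [120.0, 45.5, 80.0]})

# Join each dim table onto the fact table through the shared key.
# validate="many_to_one" raises MergeError if a dim table repeats a ticket_id.
joined = (
    fact_attendees
    .merge(dim_ratings, on="ticket_id", how="left", validate="many_to_one")
    .merge(dim_expenses, on="ticket_id", how="left", validate="many_to_one")
)
print(joined)
```

A left join keeps every row of the fact table, so a ticket missing from a dim table would show up as `NaN` in that table's columns rather than disappear.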

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv(r"C:\Festival Purchase Behavior Analysis\Datasets\final_festival_dataset.csv")

#-------------------------------
# Creating fact table
#-------------------------------

fact_columns = [
    "ticket_id",
    "attendance_date",
    "gender",
    "age",
    "age_group",
    "ticket_type",
    "favourite_genre",
    "group_size",
    "group_type",
    "is_multiday",
    "recommend_to_friend"
]

# Builds the fact table from only the columns it needs
# (.copy() keeps it independent of the original df)
fact_attendees = df[fact_columns].copy()

#-------------------------------
# Creating dim tables
#-------------------------------

dim_rating_columns = [
    "ticket_id",
    "satisfaction_rating",
    "cleanliness_rating",
    "security_rating",
    "satisfaction_level",
    "cleanliness_level",
    "security_level",
    "mean_rating"
]

dim_expenses_columns = [
    "ticket_id",
    "payment_method",
    "food_expense",
    "drink_expense",
    "merch_expense",
    "total_spent"
]

# Maps each dim table name to the columns it should contain
dim_tables_info = {"dim_ratings": dim_rating_columns, "dim_expenses": dim_expenses_columns}
# Stores the resulting dim tables, keyed by table name
dim_tables = {}
# .items() yields each key (table name) together with its value (column list)
for table, cols in dim_tables_info.items():
    # Selects those columns from df; .copy() makes an independent DataFrame
    dim_tables[table] = df[cols].copy()

Once the tables are created, I remove duplicates from both dim tables.

In [None]:
# Drops duplicates in the dim tables based on ticket_id
for table in dim_tables:
    dim_tables[table] = dim_tables[table].drop_duplicates("ticket_id")
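
To illustrate what this step does, here is a small sketch on toy data (hypothetical values). `drop_duplicates("ticket_id")` keeps the first row for each *"ticket_id"* by default, even when the other columns differ, so it may be worth counting such conflicting rows first. The `n_conflicting` check is my own addition, not part of the original workflow.

```python
import pandas as pd

# Toy dim table (hypothetical values): ticket 101 appears twice with different totals
toy = pd.DataFrame(
    {"ticket_id": [101, 101, 102], "total_spent": [50.0, 75.0, 20.0]}
)

# Counts ticket_ids whose rows disagree in at least one other column
n_conflicting = toy.groupby("ticket_id").nunique().gt(1).any(axis=1).sum()
print(n_conflicting)  # 1

# keep="first" is the default: the 75.0 row for ticket 101 is dropped
deduped = toy.drop_duplicates("ticket_id")
print(deduped["total_spent"].tolist())  # [50.0, 20.0]
```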

Once all 3 tables are ready, I save them.

In [None]:
# Base directory for the output CSV files
directory = r"C:\Festival Purchase Behavior Analysis\Tables"
# Save the fact table to a CSV file
fact_attendees.to_csv(fr"{directory}\fact_table\fact_attendees.csv", index=False)
# Save each dimension table to a separate CSV file in a shared directory
for table in dim_tables:
    dim_tables[table].to_csv(fr"{directory}\dim_tables\{table}.csv", index=False)
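
One caveat: `to_csv` raises an `OSError` if the target folder does not exist yet. The sketch below shows one possible way to create the folders first, using `pathlib`. It writes toy tables (hypothetical values) into a temporary directory instead of the real `Tables` folder, so it can be run anywhere.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for the real output folder, so the sketch runs anywhere
directory = Path(tempfile.mkdtemp()) / "Tables"

# Toy tables (hypothetical values), keyed by their path relative to `directory`
tables = {
    "fact_table/fact_attendees.csv": pd.DataFrame({"ticket_id": [1, 2]}),
    "dim_tables/dim_ratings.csv": pd.DataFrame(
        {"ticket_id": [1, 2], "mean_rating": [4.0, 3.5]}
    ),
}

for relative_path, table in tables.items():
    target = directory / relative_path
    # Creates the parent folders if they are missing; does nothing if they exist
    target.parent.mkdir(parents=True, exist_ok=True)
    table.to_csv(target, index=False)
```

Building paths with `/` on `Path` objects also avoids hard-coding backslashes, which keeps the same code usable outside Windows.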