<h1 style="color: #8b5e3c;">Merging AccountLevel, GameLevel & SeatLevel</h1>
In this Jupyter Notebook, we merge the `AccountLevel`, `GameLevel` and `SeatLevel` datasets. Combining them into a single table lets us look for relationships between features that originally lived in separate datasets.

<h3 style="color: #8b5e3c">Converting the CSV File to a Pandas Dataframe</h3>
In this section, we import pandas and use `pd.read_csv` to load each `.csv` file into a pandas DataFrame. We then split the `AccountLevel` data by season and display the resulting DataFrames.

In [3]:
# importing the pandas library
import pandas as pd

# importing ipython display
from IPython.display import display

# folder that holds the hackathon datasets
DATA_DIR = "C:/Users/galvanm/python/BucksHackathon25/BucksDatasets"

# importing the .csv files as dataframes and dropping the leftover index column
game_seat_df_2024 = pd.read_csv(f"{DATA_DIR}/GLSL_2024.csv").drop(columns="Unnamed: 0")
game_seat_df_2023 = pd.read_csv(f"{DATA_DIR}/GLSL_2023.csv").drop(columns="Unnamed: 0")
game_seat_df = pd.read_csv(f"{DATA_DIR}/GLSL.csv").drop(columns="Unnamed: 0")
account_df = pd.read_csv(f"{DATA_DIR}/AccountLevel.csv")

# splitting the AccountLevel dataset by Season
account_df_2024 = account_df[account_df['Season'] == 2024].sort_values('AccountNumber', ascending=True)
account_df_2023 = account_df[account_df['Season'] == 2023].sort_values('AccountNumber', ascending=True)

# displaying the data frames
display(game_seat_df_2024)
display(account_df_2024)


Unnamed: 0,Unnamed: 0_x,Season,AccountNumber,Game,GameDate,GameTier,Unnamed: 0_y,Giveaway,GiveawayLabel
0,94012,2024,2,2025-02-20 Los Angeles Clippers,2025-02-20,C,67,,0
1,94013,2024,2,2025-02-20 Los Angeles Clippers,2025-02-20,C,67,,0
2,119608,2024,34,2024-11-22 Indiana Pacers,2024-11-22,B,49,,0
3,119609,2024,34,2024-11-22 Indiana Pacers,2024-11-22,B,49,,0
4,125861,2024,60,2024-11-30 Washington Wizards,2024-11-30,D,51,,0
...,...,...,...,...,...,...,...,...,...
419109,493714,2024,43027,2025-01-19 Philadelphia 76ers,2025-01-19,A,62,,0
419110,493715,2024,43027,2025-01-19 Philadelphia 76ers,2025-01-19,A,62,,0
419111,493716,2024,43027,2025-01-19 Philadelphia 76ers,2025-01-19,A,62,,0
419112,493719,2024,43028,2024-12-10 Orlando Magic,2024-12-10,D,53,,0


Unnamed: 0.1,Unnamed: 0,Season,AccountNumber,SingleGameTickets,PartialPlanTickets,GroupTickets,STM,AvgSpend,GamesAttended,FanSegment,DistanceToArena,BasketballPropensity,SocialMediaEngagement
15835,15835,2024,2,2,0,0,0,95.00,0,A,47.0,916.0,High
16274,16274,2024,34,0,0,2,0,4.32,0,B,4.0,502.0,Medium
16472,16472,2024,60,1,0,0,0,6.00,1,D,17.0,936.0,Medium
16580,16580,2024,73,0,0,3,0,49.00,1,D,9.0,719.0,Low
16612,16612,2024,76,3,0,0,0,82.00,1,A,34.0,914.0,Medium
...,...,...,...,...,...,...,...,...,...,...,...,...,...
44205,44205,2024,43024,3,0,0,0,365.00,0,A,58.0,914.0,Medium
44206,44206,2024,43025,2,0,0,0,2.00,1,A,26.0,290.0,High
44207,44207,2024,43026,0,0,3,0,6.34,1,D,6.0,266.0,Medium
44208,44208,2024,43027,0,0,6,0,41.00,1,A,9.0,392.0,High
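
The `Unnamed: 0` columns that are dropped above typically appear when a CSV was saved together with its row index. As a minimal sketch (the in-memory CSV text is a made-up stand-in, not one of the hackathon files), passing `index_col=0` to `pd.read_csv` reads that first column back as the index instead of as a data column:

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV that was saved with its row index.
csv_text = "\n".join([
    ",Season,AccountNumber,GameTier",
    "0,2024,2,C",
    "1,2024,34,B",
])

# Default read: the unnamed first column becomes "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# Treating the first column as the index avoids the extra column.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

extra_columns = list(with_extra.columns)
clean_columns = list(clean.columns)
print(extra_columns)
print(clean_columns)
```

Either approach works; `index_col=0` simply saves the later `.drop(columns="Unnamed: 0")` call.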


<h3 style="color: #8b5e3c">Extracting a Row for Validation</h3>
Next, we pick a row from each dataset that we expect the merge to join. Because the merge is keyed on `AccountNumber`, we look up the same account in both datasets, starting with the `AccountLevel` row shown below.

In [5]:
# the row for account number
account_row = account_df.iloc[44210]
print(account_row)

Unnamed: 0                44210
Season                     2024
AccountNumber             15667
SingleGameTickets             0
PartialPlanTickets            0
GroupTickets                  0
STM                           1
AvgSpend                  144.0
GamesAttended                 0
FanSegment                    G
DistanceToArena            10.0
BasketballPropensity      385.0
SocialMediaEngagement    Medium
Name: 44210, dtype: object


In [6]:
# the row for game & seat level
#game_seat_row = game_seat_df.loc[game_seat_df.apply(lambda row: row.astype(str).str.contains('15667').any(), axis=1)]
#print(game_seat_row)
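
A row-wise substring search over every column is slow on a table of this size, and it can also match unrelated values that merely contain the digits `15667`. As a sketch under the assumption that `AccountNumber` is stored as an integer column, a boolean filter on that one column is a more direct way to pull the rows for a single account. The toy DataFrame below is hypothetical and only mimics the shape of the game-and-seat data:

```python
import pandas as pd

# Hypothetical miniature of the game-and-seat table.
toy_game_seat_df = pd.DataFrame({
    "AccountNumber": [15667, 15667, 115667, 34],
    "GameTier": ["A", "B", "C", "D"],
})

# Exact match on the key column; 115667 is not picked up,
# although it contains "15667" as a substring.
target_account = 15667
matching_rows = toy_game_seat_df[toy_game_seat_df["AccountNumber"] == target_account]
print(matching_rows)
```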

<h3 style="color: #8b5e3c">Merging the Two Datasets</h3>
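Now that the datasets are loaded as pandas DataFrames, we create a new DataFrame by merging the two tables. The column they share for matching is `AccountNumber`, which serves as the join key (a single-column key, not a composite key). We use a left merge, so every row of the game-and-seat data is kept and the matching account columns are attached to it. Because both tables also contain a `Season` column, pandas suffixes those columns as `Season_x` and `Season_y` in the merged result.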

In [8]:
# merging the two datasets together
account_game_seat_df = pd.merge(game_seat_df, account_df, on='AccountNumber', how='left')
account_game_seat_df_2024 = pd.merge(game_seat_df_2024, account_df_2024, on='AccountNumber', how='left')
account_game_seat_df_2023 = pd.merge(game_seat_df_2023, account_df_2023, on='AccountNumber', how='left')

display(account_game_seat_df.head(3))

Unnamed: 0.1,Unnamed: 0_x,Season_x,AccountNumber,Game,GameDate,GameTier,Unnamed: 0_y,Giveaway,GiveawayLabel,Unnamed: 0,...,SingleGameTickets,PartialPlanTickets,GroupTickets,STM,AvgSpend,GamesAttended,FanSegment,DistanceToArena,BasketballPropensity,SocialMediaEngagement
0,0,2023,1,2024-01-24 Cleveland Cavaliers,2024-01-24,D,22,Bucket Cap,1,0,...,0,0,0,0,467.0,0,F,12.0,872.0,Low
1,1,2023,1,2024-01-24 Cleveland Cavaliers,2024-01-24,D,22,Bucket Cap,1,0,...,0,0,0,0,467.0,0,F,12.0,872.0,Low
2,2,2023,1,2024-01-24 Cleveland Cavaliers,2024-01-24,D,22,Bucket Cap,1,0,...,0,0,0,0,467.0,0,F,12.0,872.0,Low
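
Since the combined `AccountLevel` table holds one row per account and season, joining the all-season tables on `AccountNumber` alone can pair a 2023 game row with a 2024 account row whenever an account appears in both seasons. The sketch below uses hypothetical toy data to compare that single-key merge with a composite-key merge on `Season` and `AccountNumber`. Whether accounts actually repeat across seasons in the real data is an assumption that would need checking:

```python
import pandas as pd

# Hypothetical toy tables; account 1 appears in both seasons.
toy_seats = pd.DataFrame({
    "Season": [2023, 2023, 2024],
    "AccountNumber": [1, 1, 1],
    "GameTier": ["D", "D", "A"],
})
toy_accounts = pd.DataFrame({
    "Season": [2023, 2024],
    "AccountNumber": [1, 1],
    "AvgSpend": [467.0, 95.0],
})

# Single key: every seat row matches both account rows.
single_key = pd.merge(toy_seats, toy_accounts, on="AccountNumber", how="left")

# Composite key: each seat row matches only its own season.
# validate="many_to_one" raises if the right-hand keys are not unique.
composite_key = pd.merge(
    toy_seats,
    toy_accounts,
    on=["Season", "AccountNumber"],
    how="left",
    validate="many_to_one",
)

print(len(single_key), len(composite_key))
```

The per-season merges in the cell above sidestep this issue because each side is already restricted to one season.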


<h3 style="color: #8b5e3c">Validating the Merge</h3>
Next, we validate the merge by checking that the rows we inspected before merging carry the same values afterwards. To do so, we display the merged rows whose `AccountNumber` is 15667.

In [10]:
#account_game_seat_row = account_game_seat_df.loc[account_game_seat_df.apply(lambda row: row.astype(str).str.contains('15667').any(), axis=1)]
#print(account_game_seat_row)

In [11]:
# printing out the details of the new data frame
#display(account_game_seat_df.info())
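
Beyond spot-checking one account, two cheap structural checks can support the validation: a left merge against unique right-hand keys should preserve the number of left rows, and `indicator=True` marks rows that found no match. This is a sketch on hypothetical toy data, not the notebook's actual result:

```python
import pandas as pd

# Hypothetical toy tables; account 99 has no AccountLevel row.
toy_seats = pd.DataFrame({
    "AccountNumber": [2, 2, 34, 99],
    "GameTier": ["C", "C", "B", "D"],
})
toy_accounts = pd.DataFrame({
    "AccountNumber": [2, 34],
    "FanSegment": ["A", "B"],
})

# indicator=True adds a "_merge" column with the match status of each row.
merged = pd.merge(toy_seats, toy_accounts, on="AccountNumber", how="left", indicator=True)

# Check 1: the left merge kept exactly the rows of the left table.
rows_preserved = len(merged) == len(toy_seats)

# Check 2: list the accounts that had no match on the right.
unmatched_accounts = merged.loc[merged["_merge"] == "left_only", "AccountNumber"].tolist()

print(rows_preserved, unmatched_accounts)
```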


<h3 style="color: #8b5e3c">Converting to a <code>.csv</code> File</h3>
Now that the merge is complete and we are confident in the result, we write the merged datasets to `.csv` files so that they can be used for data visualization.

In [13]:
# converting the data frames to .csv files
# index must be the boolean False; the string 'False' is truthy, so the index would still be written
account_game_seat_df.to_csv('C:/Users/galvanm/python/BucksHackathon25/BucksDatasets/ALGLSL.csv', index=False)
account_game_seat_df_2024.to_csv('C:/Users/galvanm/python/BucksHackathon25/BucksDatasets/ALGLSL_2024.csv', index=False)
account_game_seat_df_2023.to_csv('C:/Users/galvanm/python/BucksHackathon25/BucksDatasets/ALGLSL_2023.csv', index=False)
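
A quick round trip through an in-memory buffer illustrates the effect of the `index` argument of `to_csv`. With the boolean `index=False`, the row index is not written, so no `Unnamed: 0` column shows up when the file is read back. The toy DataFrame here is a hypothetical stand-in for the merged data:

```python
import io
import pandas as pd

# Hypothetical stand-in for the merged data frame.
toy_df = pd.DataFrame({
    "AccountNumber": [2, 34],
    "AvgSpend": [95.0, 4.32],
})

# Write without the row index, then read the text back.
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(list(round_trip.columns))
```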

