<h1 style="color: #8b5e3c;">Bucks Model Development</h1>
Next, we develop a predictive model that assigns each user account to one of the ticket plans the Milwaukee Bucks are interested in. As a reminder, the plans are:

- **Value Plan:** focuses on affordable tickets for weekday games
- **Marquee Opponent Plan:** features games against high-profile opponents
- **Weekend Plan:** highlights weekend games for fans looking for weekend entertainment
- **Promotional Giveaway Inclusive Plan:** centers on games with promotional giveaways

<h2 style="color: #8b5e3c;">KMeans Clustering</h2>
We start with KMeans clustering. The goal is to group accounts into clusters that can each potentially be matched to one of the plans above.

In [None]:
# importing all the necessary libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv("C:/GitHub/BucksHackathon25/BucksDatasets/ALGLSL_2023.csv")
df.info()

In [None]:
# retrieving the numerical and categorical features
numerical_features = ['BasketballPropensity', 'AvgSpend', 'GamesAttended', 'DistanceToArena']
categorical_features = ['STM', 'FanSegment', 'SocialMediaEngagement', 'GameTier', 'GiveawayLabel']

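# scaling the numerical features and one-hot encoding the categorical ones;
# sparse_threshold=0 forces a dense array, which PCA accepts on all scikit-learn versions
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numerical_features),
    ('cat', OneHotEncoder(), categorical_features)
], sparse_threshold=0)

# fitting the preprocessor, then reducing the result to 2 principal components
X_preprocessed = preprocessor.fit_transform(df)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_preprocessed)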

# running k-means on the PCA-reduced data, with 4 clusters to match the number of plans
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(X_pca)

# retrieving cluster labels
df['Cluster'] = kmeans.labels_
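
The choice of `n_clusters=4` matches the number of plans, but it can be worth checking whether the data itself supports four groups. A minimal sketch, assuming synthetic two-dimensional data from `make_blobs` as a stand-in for `X_pca` (the data, the candidate range of k, and `n_init=10` are illustrative assumptions, not values from the Bucks dataset), compares silhouette scores across several values of k:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# synthetic stand-in for X_pca: four well-separated blobs (assumption for illustration)
centers = np.array([[-5, -5], [-5, 5], [5, -5], [5, 5]])
X_demo, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=42)

# fitting k-means for each candidate k and recording the silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

# the k with the highest silhouette score has the best-separated clusters
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k by silhouette score:", best_k)
```

On the real data, `X_demo` would be replaced by `X_pca`. A best k other than 4 does not rule out four plans, but it suggests some clusters may not map cleanly to a single plan.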

In [None]:
# importing seaborn library
import seaborn as sns

# plotting out the clustering
plt.figure(figsize=(8,6), facecolor='#EEE1C6')
ax = plt.gca()
ax.set_facecolor('#EEE1C6')

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=df['Cluster'], cmap='Set1', edgecolor='k')
plt.title('KMeans Clusters with PCA')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.grid(True)
plt.show()

In [None]:
print("Explained variance by each component:", pca.explained_variance_ratio_)
print("Total variance explained:", pca.explained_variance_ratio_.sum())
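
If two components turn out to explain only a small share of the variance, one option is to let PCA pick the number of components needed to reach a target share. A minimal sketch, assuming random synthetic data in place of `X_preprocessed` and an illustrative 90% target (both are assumptions): passing a float between 0 and 1 as `n_components` asks scikit-learn to keep the fewest components whose cumulative explained variance reaches that fraction.

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic stand-in for X_preprocessed (assumption for illustration)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 10))
X_demo[:, 1] = X_demo[:, 0] + 0.1 * rng.normal(size=200)  # adding a correlated column

# a float n_components keeps enough components to reach that variance share
pca_demo = PCA(n_components=0.90, svd_solver='full')
X_reduced = pca_demo.fit_transform(X_demo)

cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
print("Components kept:", pca_demo.n_components_)
print("Cumulative variance explained:", cumulative.round(3))
```

Clustering could then be run on the higher-dimensional `X_reduced`, with the two-component projection kept only for the scatter plot.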

In [None]:
numerical_summary = df.groupby('Cluster').mean(numeric_only=True).round(2)
print("Numerical Summary by Cluster:")
print(numerical_summary)

In [None]:
# selecting every text (object-dtype) column, not only the ones used for clustering
categorical_cols = df.select_dtypes(include='object').columns

for col in categorical_cols:
    print(f"\nCategory distribution for {col} by Cluster:")
    summary = df.groupby('Cluster')[col].value_counts(normalize=True).unstack(fill_value=0).round(2)
    print(summary)

In [None]:
# box plots of each numerical feature by cluster
# (seaborn, matplotlib, and numerical_features come from the earlier cells)

fig, axes = plt.subplots(2, 2, figsize=(12, 8), facecolor='#EEE1C6') 
ax = axes.flatten()

for i, feature in enumerate(numerical_features):
    sns.boxplot(x='Cluster', y=feature, data=df, ax=ax[i])
    ax[i].set_title(f'{feature} by Cluster')
    ax[i].set_xlabel('Cluster')
    ax[i].set_ylabel(feature)
    ax[i].grid(True)

plt.tight_layout()
plt.show()

In [None]:
# heat maps of category proportions within each cluster (a 3x2 grid for 5 features)
fig, axes = plt.subplots(3, 2, figsize=(10, 8), facecolor='#EEE1C6')
ax = axes.flatten()

for i, feature in enumerate(categorical_features):
    crosstab = pd.crosstab(df['Cluster'], df[feature], normalize='index')
    
    sns.heatmap(crosstab, annot=True, cmap='Blues', ax=ax[i], fmt=".2f")
    ax[i].set_title(f'{feature} Proportion by Cluster')
    ax[i].set_xlabel(feature)
    ax[i].set_ylabel('Cluster')

# removing any unused subplot panels
for j in range(len(categorical_features), len(ax)):
    fig.delaxes(ax[j])

plt.tight_layout()
plt.show()
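
Once the summaries and plots suggest which plan each cluster resembles, the cluster labels can be translated into plan names. A minimal sketch, assuming a toy data frame with a made-up `AccountID` column and a placeholder `cluster_to_plan` mapping (the pairing of cluster numbers to plans below is hypothetical and has to be decided from the cluster profiles):

```python
import pandas as pd

# toy stand-in for df with a 'Cluster' column (assumption for illustration)
df_demo = pd.DataFrame({'AccountID': [101, 102, 103, 104, 105],
                        'Cluster': [0, 1, 2, 3, 1]})

# hypothetical mapping; replace with the pairing supported by the cluster profiles
cluster_to_plan = {
    0: 'Value Plan',
    1: 'Marquee Opponent Plan',
    2: 'Weekend Plan',
    3: 'Promotional Giveaway Inclusive Plan',
}

# translating each cluster label into its plan name
df_demo['Plan'] = df_demo['Cluster'].map(cluster_to_plan)
print(df_demo)
```

Any cluster label missing from the mapping would come back as `NaN`, so checking `df_demo['Plan'].isna().sum()` is a quick way to confirm every account received a plan.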


In [None]:
# saving the data frame as a .csv file (index=False keeps the row index out of the file)
df.to_csv("C:/GitHub/BucksHackathon25/BucksDatasets/CustomerPlans_2023.csv", index=False)

df.info()