# **Pandas Project: Super Smash Bros Ultimate Data Analysis**

In this project, I demonstrated several creative ways to visualize data using **pandas**, **ipywidgets**, **matplotlib**, and **numpy**. The data set comes from Kaggle and contains statistics for Super Smash Bros Ultimate characters. Before starting, I cleaned the data in Excel (I'm sorry) so that it only contains the columns I need for this project. The project was inspired by the Intro to Pandas lecture, where we explored the basic functions of Pandas with Pokemon character statistics. I aimed to achieve something similar, but with different types of visualizations.

Here are the types of visualizations I used:

1. **Bar Chart and Slider:** displayed each character's attributes in a bar chart and created a slider for interaction

2. **Bar Chart (Zoom In and Out) and Column Drop Down:** bar chart to display specific attributes of characters for comparison; a zoom in/out function to see more/less data; drop down to select the specific attribute to visualize

3. **Character Dropdown and Stats:** dropdown to select character and view their stats

4. **Colorful DataFrame:** applied a color gradient to the DataFrame to make the table more appealing and meaningful

With that, let's look at our data.

### Data Check

First, I imported the libraries for this project:

**NumPy**: to manipulate numerical arrays and perform mathematics

**Pandas**: to manipulate and analyze data

**Matplotlib**: to visualize data

**Ipywidgets**: to interact with the application

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
import numpy as np
from ipywidgets import interactive, interact 

Next, we read in the data.

In [2]:
smash= pd.read_csv('cleaned_smash_bros_stats.csv')
smash.head()

,Character,Full Hop,Weight,Run Speed,Attack Range
0,Bayonetta,39.0,81,1.76,B-
1,Bowser,33.0,135,1.971,SS
2,Bowser Jr.,34.4,108,1.566,D
3,Captain Falcon,37.31,104,2.552,F
4,Charizard,32.0,116,2.2,S


The data looks good; however, I had to make changes for the sake of visualization. For instance, notice that the 'Run Speed' values are significantly smaller than the 'Full Hop' and 'Weight' values. When I plotted the statistics in a bar chart, the 'Run Speed' bar for every character was tiny compared to the 'Full Hop' and 'Weight' bars, which were much taller. Because this is visually deceiving and exaggerates the differences, I divided the 'Full Hop' and 'Weight' values by 100 to bring them to the same order of magnitude as 'Run Speed'.

The other thing to notice is that the 'Attack Range' data are not numerical. Trying to visualize 'Attack Range' on a bar chart alongside the numerical columns would not work, since you can't plot strings as bar heights. Instead, I made a dictionary that assigns a numerical value to each rank in 'Attack Range'. When Dominick reviewed my code, he suggested that I also clarify the attack range values: 'H-' is the lowest rank for 'Attack Range' and 'SS' is the highest. I converted the ranks starting from 0 for 'H-' and adding 0.25 per rank, so the higher the number, the higher the rank and the better the attack range. I chose steps of 0.25 to keep the values on the same scale as 'Run Speed' and the rest of the data. Then, I mapped the 'Attack Range' column through this dictionary and stored the result in a new column called 'Attack Range (Numeric)'. After that, I dropped the original 'Attack Range' column.

In [3]:
# load data
smash= pd.read_csv('cleaned_smash_bros_stats.csv')

# overwrite the 'Full Hop' and 'Weight' columns by dividing by 100
smash['Full Hop'] = smash['Full Hop'] / 100
smash['Weight'] = smash['Weight'] / 100

# dictionary for 'Attack Range' ranks
attack_range_mapping = {
    'SS': 4.5, 'S+': 4.25, 'S': 4, 'S-': 3.75, 
    'A': 3.5, 'B': 3.25, 'B-': 3, 
    'C': 2.75, 'C-': 2.5, 'D': 2.25, 
    'D-': 2, 'E': 1.75, 'E-':1.5 , 
    'F': 1.25, 'F-': 1, 'G': 0.75, 
    'G-': 0.5, 'H': 0.25, 'H-': 0  
}

# convert 'Attack Range' using the dictionary
smash['Attack Range (Numeric)'] = smash['Attack Range'].map(attack_range_mapping)

# drop the Attack Range column
smash = smash.drop(columns=['Attack Range'])

# preview the transformed data
smash.head()

,Character,Full Hop,Weight,Run Speed,Attack Range (Numeric)
0,Bayonetta,0.39,0.81,1.76,3.0
1,Bowser,0.33,1.35,1.971,4.5
2,Bowser Jr.,0.344,1.08,1.566,2.25
3,Captain Falcon,0.3731,1.04,2.552,1.25
4,Charizard,0.32,1.16,2.2,4.0
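
The rank-to-number conversion described above can also be generated from an ordered list instead of being typed out by hand. The following is only a sketch: the names `ranks`, `rank_to_score`, and `toy` are made up for illustration, and the toy column deliberately includes a rank ('A+') that is not in the dictionary to show what `.map` does with it.

```python
import pandas as pd

# ranks ordered from lowest to highest, copied from the dictionary above
ranks = ['H-', 'H', 'G-', 'G', 'F-', 'F', 'E-', 'E', 'D-', 'D',
         'C-', 'C', 'B-', 'B', 'A', 'S-', 'S', 'S+', 'SS']

# each rank is worth 0.25 more than the one below it, starting at 0 for 'H-'
rank_to_score = {rank: i * 0.25 for i, rank in enumerate(ranks)}

# toy column (hypothetical values), including one rank with no entry
toy = pd.Series(['B-', 'SS', 'D', 'A+'])
scores = toy.map(rank_to_score)

# .map leaves NaN wherever a rank is missing from the dictionary,
# so it is worth listing those rows before plotting
unmapped = toy[scores.isna()]

print(rank_to_score['SS'])  # 4.5
print(list(unmapped))       # ['A+']
```

Checking for unmapped ranks this way would catch any value in the real column that the dictionary does not cover, since those would silently become NaN.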


Now, let's start with our first visualization.

### Bar Chart and Slider

For this code, I used **pandas** for data manipulation, **matplotlib** for plotting, and **ipywidgets** for interactive widgets. Then, I selected the columns from the **smash** DataFrame to plot.

Next, I defined a range slider widget to move horizontally through the chart. The slider is configured to show the first 10 characters by default, with a range from zero to the number of rows in the data minus one. The slider updates the plot in real time as it moves.

Then, I created the **update_bar_chart** function, which updates the chart based on the selected range of characters. It clears the previous output, creates a new figure and axes for the plot, subsets the data based on the selected range, and plots the data as a bar chart. The plot is customized with a title, axis labels, and rotated x-tick labels for better readability.

The function is initially called to display the plot with the default range. The slider is linked to the update function to refresh the plot when the slider value changes. Finally, I displayed the slider and the output plot.

In [4]:
# select columns to plot
plot_data = smash[['Character', 'Full Hop', 'Weight', 'Run Speed', 'Attack Range (Numeric)']]

# define the range slider to move horizontally through the chart
character_range = widgets.IntRangeSlider(
    value=[0, 10],  # by default, it will show the first 10 characters
    min=0,
    max=len(plot_data) - 1, # upper bound: index of the last row in the data
    step=1, # bars are separated in intervals of 1
    description='Characters', # we're describing the attributes of the characters
    continuous_update=True # simultaneously update the plot as we play with slider
)

# create an output widget to display the plot
output = widgets.Output()

# function to update the bar chart
def update_bar_chart(range_values):
    start, end = range_values # unpack the start and end positions from the slider
    
    # clear the previous output
    output.clear_output(wait=True)
    
    # create the bar chart with the selected character range
    with output:
        fig, ax = plt.subplots(figsize=(10, 5))  # adjust figure size
        sub_data = plot_data.iloc[start:end].set_index('Character') # subset and set index
        
        # plot the data as a bar plot
        sub_data.plot(kind='bar', ax=ax, width=0.8)
        
        # customize the plot
        ax.set_title('Super Smash Bros Character Attributes') # set the title
        ax.set_xlabel('Character') # label the x-axis
        ax.set_ylabel('Attribute Values') # label the y-axis
        ax.set_xticks(range(len(sub_data)))  # set x-ticks
        ax.set_xticklabels(sub_data.index, rotation=45, ha='right') # rotate x-tick labels
        
        plt.tight_layout() # adjust layout for better fit, Chat GPT recommended this
        plt.show() # display the plot

# initial call to the function to display the plot
update_bar_chart(character_range.value)

# link the slider to the update function to refresh the plot on slider change
character_range.observe(lambda change: update_bar_chart(change['new']), names='value')

# display the slider and the output plot
display(character_range)
display(output)
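
One detail of the slider worth spelling out is how its `(start, end)` value turns into rows. Here is a small sketch using a toy DataFrame; the helper name `rows_for_range` and the toy values are hypothetical. Because `iloc` slicing excludes the end position, a slider whose maximum is `len(data) - 1` cannot reach the last row. Raising the maximum to `len(data)` would be one way around that.

```python
import pandas as pd

# toy stand-in for plot_data (hypothetical characters and values)
toy = pd.DataFrame({
    'Character': ['A', 'B', 'C', 'D', 'E'],
    'Run Speed': [1.0, 1.5, 2.0, 2.5, 3.0],
})

def rows_for_range(df, start, end):
    """Return the rows a (start, end) slider value selects; end is exclusive."""
    return df.iloc[start:end]

print(len(rows_for_range(toy, 0, 3)))             # 3 rows: positions 0, 1, 2
print(len(rows_for_range(toy, 0, len(toy) - 1)))  # 4 rows: the last one is left out
print(len(rows_for_range(toy, 0, len(toy))))      # 5 rows: every character
```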



### Bar Chart (Zoom In and Out) and Column Drop Down

This code creates a bar chart to display specific attributes of characters, allowing for comparison. It includes a zoom in/out function (like the one we learned in the lecture about ipywidgets) to see more or less data and a dropdown to select the specific attribute to visualize.

First, I defined a function, **plotVar** that takes two parameters: **var** (the variable to plot) and **N** (the number of top characters to display). This function resets the index of the **smash** DataFrame so that 'Character' becomes a regular column again. It then selects the specified column, sorts the data by the variable in descending order, and selects the top N rows. 

Next, I created a new figure and axis for the plot with a specified size. The function then plots a bar chart of the selected variable for the top **N** characters. I customized the plot by setting the x-tick labels to the character names, rotating them for better readability, and aligning them to the right. I also set the y-axis label to the selected variable and set the title of the plot. Finally, I adjusted the layout for a better fit and displayed the plot.

To allow for interactive selection of the attribute and the number of characters to display, I created a dropdown widget **colSelector** to select a column (excluding 'Character') to plot and a slider widget **nSlider** to filter the number of characters displayed in the plot. The dropdown and slider widgets are linked to the **plotVar** function using **interact**, so the plot updates based on the selected column and number of characters.

In [5]:
# function to plot the selected variable for the top N characters
def plotVar(var, N):
    # reset the index so 'Character' becomes a regular column again
    # before, it was treating 'Character' as an index
    # Chat GPT fixed this for me
    characters = smash.reset_index()[['Character', var]].sort_values(by=var, ascending=False).head(N)
    
    # create a new figure and axis for the plot
    fig, ax = plt.subplots(figsize=(13, 3))
    
    # plot the selected variable for the top N characters
    ax.bar(characters['Character'], characters[var])
    
    # set labels and adjust the x-tick labels for better readability
    ax.set_xticklabels(characters['Character'], rotation=45, ha='right')
    ax.set_ylabel(var)
    plt.title(f'Top {N} Characters by {var}')
    
    # display the plot
    # tight_layout() adjusts the padding so the labels are not cut off (recommended by Chat GPT)
    plt.tight_layout()
    plt.show()
    
# update the options to exclude the 'Character' column
# Chat GPT helped me exclude the 'Character' column for the dropdown
colSelector = widgets.Dropdown(
    options = [col for col in smash.columns if col != 'Character'],  # list of columns excluding 'Character'
    description = 'Select Column:', # label for the dropdown
    value = smash.columns[1] #set a default value from the remaining columns
)

# create a slider to filter the number of characters displayed
nSlider = widgets.IntSlider(value=30, # default value
                            min=2, # minimum value
                            max=len(smash), # maximum value: every character in the data
                            step=1, # step size
                            description='Filter') # label for the slider
# link the dropdown and slider to the plotVar function
interact(plotVar, var=colSelector, N=nSlider)
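
The "top N" selection inside **plotVar** can be tried on its own, without any widgets. Below is a minimal sketch on a toy DataFrame; the function names `top_n` and `top_n_alt` and the toy values are hypothetical. The second function uses pandas' `nlargest`, which gives the same rows when there are no ties in the chosen column.

```python
import pandas as pd

# toy data (hypothetical characters and values)
toy = pd.DataFrame({
    'Character': ['A', 'B', 'C', 'D'],
    'Weight': [0.81, 1.35, 1.08, 1.04],
})

def top_n(df, var, n):
    # same approach as plotVar: sort descending, then keep the first n rows
    return df[['Character', var]].sort_values(by=var, ascending=False).head(n)

def top_n_alt(df, var, n):
    # alternative: let pandas pick the n largest values directly
    return df.nlargest(n, var)[['Character', var]]

print(top_n(toy, 'Weight', 2)['Character'].tolist())      # ['B', 'C']
print(top_n_alt(toy, 'Weight', 2)['Character'].tolist())  # ['B', 'C']
```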



### Character Dropdown and Stats

In this code, I created a dropdown to select a character and view their stats. First, I set the 'Character' column as the index for the **smash** DataFrame to make it easier to access each character's stats.

Next, I created a dropdown widget **char_dropdown** for character selection. The options for the dropdown are set to the character names, which are now the index of the DataFrame. I also set a label for the dropdown and set the default value to the first character in the list.

I then defined a function **display_stats** that takes a character name as input. The function retrieves the stats for the selected character from the **smash** DataFrame and converts the Series to a DataFrame for better display. I renamed the column to 'Value' for clarity and displayed the stats as a table.

In [6]:
# set 'Character' as the index for the DataFrame
smash.set_index('Character', inplace=True)

# create the dropdown widget for character selection
char_dropdown = widgets.Dropdown(
    options=smash.index, # set the options to the character names (index of the DataFrame)
    description='Character:',  # label for the dropdown
    value=smash.index[0]  # set default to the first character
)

# function to display the stats of the selected character
def display_stats(character):
    # get the stats for the selected character
    char_stats = smash.loc[character].to_frame()  # convert Series to DataFrame

    # display the stats as a table
    char_stats.columns = ['Value']  # rename the column for clarity
    display(char_stats)

# set up the interaction
widgets.interact(display_stats, character=char_dropdown)
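
The lookup inside **display_stats** can be checked without the dropdown. In this sketch the toy DataFrame reuses two rows from the table above, and the variable names `toy`, `row`, and `table` are only for illustration. Passing `name='Value'` to `to_frame` is an optional shortcut that names the column in one step instead of renaming it afterwards.

```python
import pandas as pd

# two rows from the table above, indexed by character name
toy = pd.DataFrame(
    {'Weight': [0.81, 1.35], 'Run Speed': [1.76, 1.971]},
    index=pd.Index(['Bayonetta', 'Bowser'], name='Character'),
)

# .loc with a single label returns a Series whose index is the column names
row = toy.loc['Bowser']

# convert the Series to a one-column DataFrame and name that column 'Value'
table = row.to_frame(name='Value')
print(table)
```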



### Colorful DataFrame

In this code, I added a color gradient to the DataFrame to make it more appealing and the numbers more meaningful. I got inspiration from a blog post about how to make your DataFrames look more lively. The brighter colors signify more strength and the darker colors less strength. Just to let you know, I had ChatGPT help me write the functions in this code.

First, I specified the columns to which I want to apply the color gradient: 'Full Hop', 'Weight', 'Run Speed', and 'Attack Range (Numeric)'.

Next, I defined a function **color_gradient** that takes a Series **x** as an input. Here's a detailed breakdown of the function:

**Normalization:** norm = (x - x.min()) / (x.max() - x.min())

This line normalizes the values in the Series x between 0 and 1. It subtracts the minimum value from each element and then divides by the range (max - min).

To normalize means to rescale the values in a dataset to a common scale, usually between 0 and 1. By normalizing each column, I ensure that its values fall within the same range, making it easier to apply a consistent color gradient.
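
As a quick worked example of the normalization, here is a sketch on three 'Weight' values from the table above. The function name `min_max` is hypothetical. The guard for a column where every value is the same is my own addition, since the formula would otherwise divide by zero.

```python
import pandas as pd

def min_max(x):
    """Rescale a Series to the 0-1 range."""
    span = x.max() - x.min()
    if span == 0:
        # every value is identical, so there is no range to spread over;
        # returning zeros is an assumption made for this sketch
        return pd.Series(0.0, index=x.index)
    return (x - x.min()) / span

# scaled 'Weight' values for Bayonetta, Bowser, and Bowser Jr.
weights = pd.Series([0.81, 1.35, 1.08])
print(min_max(weights).round(2).tolist())  # [0.0, 1.0, 0.5]
```

The lightest character maps to 0, the heaviest to 1, and Bowser Jr. lands exactly halfway between them.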

**Colormap Values:** colors = plt.cm.viridis(norm)

This line uses the **viridis** colormap from **matplotlib** to get the corresponding colors for the normalized values. The **viridis** colormap is perceptually uniform, which means equal steps in the data look like equal steps in color, so it is easy to interpret as well as visually appealing.

**CSS Background Colors:** return ['background-color: rgba({}, {}, {}, {})'.format(
    int(c[0]*255), int(c[1]*255), int(c[2]*255), c[3]) for c in colors]

RGBA represents colors with red, green, blue, and alpha. Alpha is the opacity of the colors. CSS stands for Cascading Style Sheets. It’s a stylesheet language used to describe the presentation of a document written in HTML or XML. CSS controls the layout, colors, fonts, and overall visual appearance of web pages.

This line creates a list of CSS background-color strings in RGBA format for each value in **x**. The colors array contains RGBA values between 0 and 1, so I multiply the RGB values by 255 to convert them to the 0-255 range used in CSS. The alpha value (c[3]) is already in the correct format.
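
To make the conversion concrete, here is a small sketch that turns one RGBA tuple into a CSS string. The helper name `rgba_to_css` is hypothetical, and the tuple is approximately the dark purple at the low end of viridis. Treat the exact numbers as an assumption rather than values read from matplotlib.

```python
def rgba_to_css(rgba):
    """Convert an RGBA tuple with 0-1 channels into a CSS background-color rule."""
    r, g, b, a = rgba
    # CSS expects red, green, and blue in 0-255; alpha stays in 0-1
    return 'background-color: rgba({}, {}, {}, {})'.format(
        int(r * 255), int(g * 255), int(b * 255), a)

# roughly the color at the low end of viridis (assumed values)
dark_purple = (0.267004, 0.004874, 0.329415, 1.0)
print(rgba_to_css(dark_purple))  # background-color: rgba(68, 1, 84, 1.0)
```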

To apply the color gradient to the specified columns, I used the style.apply method on the smash DataFrame:

**Lambda Function:** lambda x: pd.Series(color_gradient(x), index=x.index)

Lambda functions are small anonymous functions that you can define on the fly, right where they are used. They're great for small operations.

This lambda function applies the **color_gradient** function to each column in the DataFrame. It converts the list of background-color strings returned by **color_gradient** into a pd.Series with the same index as **x**.

**Subset:** subset=columns_to_style

This parameter ensures that the styling is only applied to the specified columns.

Finally, I display the styled DataFrame, which now has a color gradient applied to the specified columns, making the table more visually appealing and the data more meaningful.

In [7]:
# columns to apply the color gradient
columns_to_style = ['Full Hop', 'Weight', 'Run Speed', 'Attack Range (Numeric)']

# define the styling function
def color_gradient(x):
    # normalize the values in the Series x between 0 and 1
    norm = (x - x.min()) / (x.max() - x.min())  
    # get colormap values from the 'viridis' colormap in matplotlib
    colors = plt.cm.viridis(norm) 
    # create a list of CSS background-color strings in RGBA format for each value in x
    return ['background-color: rgba({}, {}, {}, {})'.format( 
        int(c[0]*255), int(c[1]*255), int(c[2]*255), c[3]) for c in colors]

# apply the color gradient to the specified columns
styled_df = smash.style.apply(
    # apply the color_gradient function to each column
    lambda x: pd.Series(color_gradient(x), index=x.index), subset=columns_to_style # only apply to the specified columns
)

# display styled DataFrame
styled_df

Character,Full Hop,Weight,Run Speed,Attack Range (Numeric)
Bayonetta,0.39,0.81,1.76,3.0
Bowser,0.33,1.35,1.971,4.5
Bowser Jr.,0.344,1.08,1.566,2.25
Captain Falcon,0.3731,1.04,2.552,1.25
Charizard,0.32,1.16,2.2,4.0
Chrom,0.3097,0.95,2.145,3.25
Cloud,0.325,1.0,2.167,3.75
Corrin,0.33,0.98,1.595,3.0
Daisy,0.3003,0.89,1.595,2.0
Dark Pit,0.31,0.96,1.828,3.0


### Conclusion

In sum, this project was a success for what it was. I believe that it shows my understanding of data analysis, manipulation, and visualization. Though this project was less math heavy than the NumPy project, it carried its own challenges. The main challenge that I faced was dealing with uncertainty. To me, Pandas feels straightforward and easy to understand. However, if you only know a few basics, it may be hard to put them to use in practical situations. Because of that, I tried to go outside the box and learn new skills. For sure, I am not the best at the skills and visualization techniques for this project - I had to use ChatGPT to help me out. Although I was uncertain, I didn't let that stop me. I kept trying new things.

Speaking of trying new things, I know that there are a few ways to make my application better. First, I should have cleaned the data with Python in the first place. The data file contained a lot of sheets and columns. Manually cleaning the data was a mess and I'm not sure why I didn't think of using Pandas right away - I literally learned Pandas for this reason! In addition, I could have kept some more columns. There were columns in the data that I thought were not important for what I was trying to do. However, since I changed the objective of my project, I could have included more columns of data, which would have meant more attributes for the characters.

The second thing is that I don't think it's necessary to make a scroller that changes the range of bars you want to see (I'm referring to the first visualization). If you set the scroller to cover many indices, the bars become smaller and the chart is overwhelming to look at. I think I should just stick to a regular horizontal scroller next time.

Thirdly, I could have added a color gradient for the character dropdown and stats. I was trying this earlier but I never got it to work, which is why it's plain. However, if I added a color gradient, so the cells for the character match the cell colors and values in the styled DataFrame, that would be even more appealing.

Fourthly, I could have added a legend to show users what the colors represent. If you were to use this notebook and disregard the explanations, you would be lost as to how the color mapping works. I also wish that I had been able to make the last visualization by myself. I don't know much about the pandas Styler, but based on what ChatGPT did and what the post said, it worked very well and it looks fantastic.

Lastly, if I had more time, I would have built a more compelling and user-friendly setup, like Brenda suggested when she reviewed my code. I feel that my widgets are only somewhat user-friendly, so I can improve on that for sure.

However, though I have a lot to critique about my project, I find that the project was insightful and unique. Moving forward, I will do more research on how certain functions work in Pandas and on other ways to visualize data.