<h1>📚 CSMODEL - PROJECT [PHASE 2] 📚</h1>

by <b>Group 1:</b>
- DAYON, Elijah
- ERMITANO, Kate Justine
- LLAMADO, Jon

<b>Instructor:</b> Sir Arren Matthew C. Antioquia

<h2>🚩 Review of PHASE 1 🚩</h2>

This notebook continues the project, submitted on 21 November 2022, that uses the **`Complete Pokemon Dataset (Updated 16.04.21)`** dataset.

The dataset, created by Mario Tormo Romero, is available on Kaggle: https://www.kaggle.com/datasets/mariotormo/complete-pokemon-dataset-updated-090420?resource=download

The first notebook covered the following processes: <b>📝 Dataset Description 📝, 🧽 Dataset Cleaning 🧽, 📊 Exploratory Data Analysis 📊, and 🤔 Research Question 🤔</b>. It can be reviewed through this Google Drive link: https://drive.google.com/drive/folders/1fZw-4M6GmT9ROLmb5XPm6uyI65HDtOnq?usp=share_link
<figure>
    <img src="https://i.pinimg.com/564x/24/d1/01/24d101849738df521f17db9701ec63d2.jpg" style="width: 80%">
    <figcaption style="text-align: center">Pokemon Starters from Generation 1 to 8</figcaption>
</figure>

<h2>🤔 Research Question 🤔</h2>

<h3 style="color: #ff8d8c">What is the relation between pokemon generation and type 1 element to health points, defense, special attack, special defense, and speed?</h3>

This research question aims to find which generation and type 1 element are best to use based on the Pokemon's statistics: total points, health points, defense, special attack, special defense, and speed.

It can be difficult for players to choose which Pokemon to use, because the dataset contains over 1000 Pokemon. Relying on the single Pokemon with the strongest statistics is not the best solution either, since many factors determine which Pokemon is best in a given situation. Instead, this project examines which generation and type 1 element perform best based on the statistics in the Pokemon dataframe, mainly health points, defense, special attack, special defense, and speed.

<h2>📤 Reading the Dataset 📤</h2>

To retrieve the dataset cleaned in the previous phase of the project, we run the following code to read the `.csv` file.

In [None]:
# import pandas and numpy
import pandas as pd
import numpy as np

In [None]:
# read the cleaned .csv file and assign it to poke_df
poke_df = pd.read_csv("cleaned-pokemon-dataset.csv")

To ensure that the correct file has been read, use the `.info()` method to review the columns, their non-null counts, and their data types.

In [None]:
poke_df.info()

Once done, we can continue with the process of modifying our data to suit the needs of our chosen data modelling technique.

<h2>📦 Data Modelling 📦</h2>

<h3>🔧 Data Preprocessing 🔨</h3>

Before we start modelling the data, it is wise to perform data preprocessing to transform the data into the appropriate representations. Techniques include: <b>(1) Querying, (2) Imputation, (3) Binning, (4) Outlier Detection, (5) One Hot Encoding, (6) Log Transformation, (7) Aggregation, (8) Column Transformation, (9) Feature Scaling,</b> and <b>(10) Feature Engineering</b>.

<h3>Querying</h3>

Querying selects the observations that satisfy a condition (e.g. select all the Pokemon with an attack stat greater than 100). Multiple conditions can be combined to filter the data further.
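As a minimal sketch of querying, the cell below filters a small hypothetical dataframe (`toy`, standing in for `poke_df`; the names and values are made up) with boolean masks, and shows the same filter written with `DataFrame.query()`.

```python
import pandas as pd

# Hypothetical stand-in for poke_df; names and values are illustrative only
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type_1": ["Water", "Fire", "Water", "Grass"],
    "attack": [120, 80, 95, 130],
})

# One condition: keep the rows whose attack is greater than 100
strong = toy[toy["attack"] > 100]

# Several conditions: combine boolean masks with & (each mask in parentheses)
strong_water = toy[(toy["attack"] > 90) & (toy["type_1"] == "Water")]

# The same two-condition filter written as a query string
strong_water_q = toy.query("attack > 90 and type_1 == 'Water'")

print(strong_water)
```

Boolean masks keep the original index, so the filtered rows can still be matched back to the full dataframe.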

However, since we are interested in modelling our data using all of the Pokemon in the dataset, this step can be skipped. We can now move on to imputation.

<h3>Imputation</h3>

Imputation replaces missing values (<b>NaN</b> or <b>Null</b>) with a substitute such as the mean or mode of the column the missing value belongs to. To check whether our dataset contains missing values, combine `.isnull()` with `.any()`: this returns each column name along with `True` if the column contains at least one null value and `False` otherwise.
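For reference, a minimal sketch of mean and mode imputation on a hypothetical two-column frame (the `hp` and `egg_type_1` values are made up; the real dataset does not need this step):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing number and one missing category
toy = pd.DataFrame({
    "hp": [45.0, np.nan, 80.0, 60.0],
    "egg_type_1": ["Monster", "Field", None, "Field"],
})

# Check which columns contain missing values before imputing
print(toy.isnull().any())

# Numerical column: replace NaN with the column mean (mean of 45, 80, 60)
toy["hp"] = toy["hp"].fillna(toy["hp"].mean())

# Categorical column: replace None with the most frequent value (the mode)
toy["egg_type_1"] = toy["egg_type_1"].fillna(toy["egg_type_1"].mode()[0])

print(toy)
```

The mean suits numerical columns, while the mode is the usual choice for categorical columns, where a mean is undefined.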

In [None]:
poke_df.isnull().any()

As mentioned in the previous phase of the project, only <b>type_2, ability_2, ability_hidden</b>, and <b>egg_type_2</b> can contain missing values, since some Pokemon have a single type, only one ability, no hidden ability, or a single egg type.

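These missing values are meaningful (they mark an absent second type, second ability, hidden ability, or second egg type), so we leave them as they are. Since the rest of the columns do not have null values, no imputation is needed.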

<h3>Binning</h3>

Binning is grouping specific data together into general categories. For example, all the water starters from generations 1-8 (Squirtle, Totodile, Mudkip, Piplup, Oshawott, Froakie, Popplio, and Sobble) can be grouped as water starters in general.
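A minimal sketch of both kinds of binning: the water-starter grouping from the example above (categorical), and cutting a numerical column into tiers with `pd.cut`. The total-point values and the cut-offs at 400 and 600 are hypothetical.

```python
import numpy as np
import pandas as pd

# Categorical binning: group the eight water starters named above under one label
water_starters = ["Squirtle", "Totodile", "Mudkip", "Piplup",
                  "Oshawott", "Froakie", "Popplio", "Sobble"]
names = pd.Series(["Squirtle", "Charmander", "Piplup", "Rowlet"])
starter_group = pd.Series(np.where(names.isin(water_starters), "water starter", "other"))

# Numerical binning: cut total points into tiers (hypothetical cut-offs)
totals = pd.Series([309, 405, 534, 680, 1125])
tiers = pd.cut(totals, bins=[0, 400, 600, float("inf")],
               labels=["low", "medium", "high"])  # intervals (0, 400], (400, 600], (600, inf]

print(starter_group.tolist())
print(tiers.tolist())
```

In both cases the individual values are replaced by a coarser label, which is exactly the loss of detail discussed below.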

In [None]:
poke_df

Upon inspecting the dataset, there is no need to perform binning, since we are interested in every single Pokemon and its individual statistics (HP, attack, defense, sp. attack, sp. defense, and speed). We want to determine which Pokemon belong to the cluster with the strongest overall statistics, and binning would discard the detail needed to find them.
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse1.mm.bing.net%2Fth%3Fid%3DOIP.oYFYbDjFYnalbDmsV7EntAHaFr%26pid%3DApi&f=1&ipt=60210ccb25e0429e5eb69f6ac25c36d20b321f680f91f11496d6225d6e9fef7d&ipo=images" style="width: 40%">
    <figcaption style="text-align: center">Who's that Pokemon?</figcaption>
</figure>

<h3>Outlier Detection</h3>

Values such as 9999 or negative values do not make sense in the Pokemon world, so we can inspect the numerical columns for outliers. Using boxplots, we can check the height_m and weight_kg columns, grouped by primary type, for unusual numbers. The plots are drawn with the `matplotlib` library.
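A boxplot flags outliers with the 1.5 × IQR rule: matplotlib's default whiskers reach 1.5 times the interquartile range beyond the quartiles, and points outside are drawn individually. A minimal sketch applying the same rule directly, plus a check for impossible values, on a hypothetical `heights` series:

```python
import pandas as pd

# Hypothetical heights in metres; the last value is far above the rest
heights = pd.Series([0.3, 0.5, 0.6, 0.7, 0.8, 1.0, 14.5])

# Quartiles and interquartile range
q1, q3 = heights.quantile([0.25, 0.75])
iqr = q3 - q1

# The 1.5 * IQR rule used for boxplot whiskers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = heights[(heights < lower) | (heights > upper)]

# Values that are impossible rather than merely extreme
impossible = heights[(heights <= 0) | (heights == 9999)]

print(outliers.tolist())
print(impossible.empty)
```

The distinction matters: an impossible value is an error to fix, while a statistical outlier may be a legitimate extreme.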

In [None]:
%matplotlib inline
# import pyplot, matplotlib's plotting interface
import matplotlib.pyplot as plt
# set up a theme for all visuals
plt.style.use('fivethirtyeight')

In [None]:
poke_df.boxplot("height_m", by="type_1", figsize=(12,6), color='#fc67a5')

In [None]:
poke_df.boxplot("weight_kg", by="type_1", figsize=(12,6), color='#fc67a5')

Based on the boxplots above, we notice several outliers. However, since the Pokemon dataset is entirely fictional, extreme heights and weights are legitimate values rather than errors. Thus, there is no need to remove outliers.

In [None]:
copy_poke_df = poke_df.copy()

<h3>One Hot Encoding</h3>

This method converts a categorical column into binary indicator columns: one column per category, holding 1 if the observation belongs to that category and 0 otherwise. This is useful when a model needs categorical data in numerical form, such as in recommender systems. Upon inspection, however, the categorical columns (e.g. type_1) are not among the features we will feed into our model, since the clustering relies only on the numerical statistics.
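For illustration, a minimal sketch of what one hot encoding produces for a primary-type column, using `pd.get_dummies` on a three-row frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle"],
    "type_1": ["Grass", "Fire", "Water"],
})

# One indicator column per category of type_1; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(toy, columns=["type_1"], dtype=int)
print(encoded)
```

Each row has exactly one 1 across the new `type_1_*` columns; applied to the full dataset, type_1 would expand into one column per type.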

In [None]:
poke_df.info()

In [None]:
poke_df["type_number"]

In [None]:
poke_df["type_number"].unique()

We can take a look at the `type_number` column, as it only contains the values 1 or 2. However, it is already numeric and is not one of the statistics used for clustering. Thus, there is no need to employ one hot encoding to further prepare our data.

<h3><strike>Log Transformation</strike> Box-Cox Transformation</h3>

The data collected will not always be as perfect as the examples printed in textbooks. Transforming skewed values reduces the effect of very large values on our data while preserving their order. Before we decide to perform a log transformation, let us first look at the distribution of the attack series to determine whether log transformation is the right method to use.

First, import the following libraries to be used to determine if our data has a normal distribution:
- `matplotlib.pyplot` - a library specializing in visualizing data.
- `scipy.stats` - a library specializing in statistics.

Moreover, we import the `shapiro` function to use the <b>Shapiro-Wilk</b> test, which tests whether a sample comes from a normal distribution. It outputs the <b>test statistic</b> and the <b>p-value</b>. We set the significance level to <b>α = 0.05</b> for comparison.

In [None]:
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import shapiro 

Next, create a histogram and Q-Q plot for the attack series.

In [None]:
plt.hist(poke_df["attack"],bins=25, color="#ffed8b")

In [None]:
stats.probplot(poke_df["attack"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["attack"])
print("{:2f}".format(pval))

Since the p-value is less than α = 0.05, we reject the null hypothesis, so the data does not follow a normal distribution. To bring the data closer to a normal distribution, we apply a log transformation using NumPy's `np.log()` function, which calculates the natural logarithm of each value in the series.

In [None]:
log_attack = np.log(poke_df["attack"])

To check whether the transformed data has a normal distribution, we apply the Shapiro-Wilk test again and compare its p-value with α. The test uses the following null and alternative hypotheses:

- <b>H<sub>0</sub></b> - the data has a normal distribution.
- <b>H<sub>A</sub></b> - the data does not have a normal distribution.

In [None]:
stat, pval = shapiro(log_attack)
print("{:2f}".format(pval))

In [None]:
stats.probplot(log_attack, dist="norm", plot=plt)

However, the Q-Q plot of the log-transformed data still does not follow the diagonal line, so the log transformation fails to effectively normalize the data. To fix this, we use the <b>Box-Cox transformation</b> through `stats.boxcox()`, which returns the transformed series and the fitted lambda (λ) value. Note that Box-Cox requires strictly positive values.
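The Box-Cox transformation is defined as y = (x^λ − 1) / λ for λ ≠ 0 and y = ln(x) for λ = 0, so the log transformation is the special case λ = 0; `stats.boxcox` picks λ by maximum likelihood. A minimal sketch on synthetic right-skewed data (not the Pokemon data) that checks the formula and compares Shapiro-Wilk p-values:

```python
import numpy as np
from scipy import stats
from scipy.stats import shapiro

# Synthetic right-skewed, strictly positive data (Box-Cox needs x > 0)
rng = np.random.default_rng(0)
x = rng.exponential(scale=50.0, size=200) + 1.0

# Box-Cox with lambda fitted by maximum likelihood
x_bc, lam = stats.boxcox(x)

# The same transform written out by hand for the fitted lambda
manual = (x ** lam - 1) / lam if lam != 0 else np.log(x)

# With lambda fixed at 0, Box-Cox reduces to the natural log
x_log = stats.boxcox(x, lmbda=0)

# Compare Shapiro-Wilk p-values before and after the transformation
_, p_raw = shapiro(x)
_, p_bc = shapiro(x_bc)
print(f"lambda = {lam:.3f}, p before = {p_raw:.2e}, p after = {p_bc:.2e}")
```

Because λ is fitted to the data instead of being fixed at 0, Box-Cox can succeed where a plain log transformation over- or under-corrects the skew.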

In [None]:
norm_attack, norm_lambda = stats.boxcox(poke_df["attack"])

We repeat the process, from the histogram and Q-Q plot to the Shapiro-Wilk test, to check whether the data has been normalized.

In [None]:
plt.hist(norm_attack,bins=25, color="#ffed8b")

In [None]:
stats.probplot(norm_attack, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_attack)
print("{:2f}".format(pval))

Now that our p-value is greater than α = 0.05, we fail to reject the null hypothesis: the transformed attack data is consistent with a normal distribution. We can repeat the process with the rest of the columns we are interested in.
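The per-column steps that follow could also be wrapped in a loop. A minimal sketch, assuming a hypothetical helper `boxcox_columns` (not part of any library) demonstrated on synthetic columns; in this notebook one would pass `poke_df` and the stat column names:

```python
import numpy as np
import pandas as pd
from scipy import stats

def boxcox_columns(df, columns, alpha=0.05):
    """Hypothetical helper: Box-Cox each column and report Shapiro-Wilk p-values."""
    transformed = {}
    rows = []
    for col in columns:
        _, p_before = stats.shapiro(df[col])
        values, lam = stats.boxcox(df[col])  # requires strictly positive values
        _, p_after = stats.shapiro(values)
        transformed[col] = values
        rows.append({"column": col, "lambda": lam, "p_before": p_before,
                     "p_after": p_after, "fail_to_reject_h0": p_after > alpha})
    return pd.DataFrame(transformed, index=df.index), pd.DataFrame(rows)

# Synthetic positive, right-skewed stand-in for the stat columns
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "hp": rng.gamma(shape=4.0, scale=15.0, size=300) + 1,
    "speed": rng.exponential(scale=60.0, size=300) + 1,
})

norm_demo, report = boxcox_columns(demo, ["hp", "speed"])
print(report)
```

Keeping each λ in the report also makes it possible to invert the transformation later if needed.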

<h4>Transform Total Points</h4>

In [None]:
plt.hist(poke_df["total_points"],bins=25, color="#9ef078")

In [None]:
stats.probplot(poke_df["total_points"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["total_points"])
print("{:2f}".format(pval))

In [None]:
norm_total, norm_lambda_2 = stats.boxcox(poke_df["total_points"])

In [None]:
plt.hist(norm_total,bins=25, color="#9ef078")

In [None]:
stats.probplot(norm_total, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_total)
print("{:2f}".format(pval))

<h4>Transform HP</h4>

In [None]:
plt.hist(poke_df["hp"],bins=25, color="#ff8d8c")

In [None]:
stats.probplot(poke_df["hp"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["hp"])
print("{:2f}".format(pval))

In [None]:
norm_hp, norm_lambda_3 = stats.boxcox(poke_df["hp"])

In [None]:
plt.hist(norm_hp,bins=25, color="#ff8d8c")

In [None]:
stats.probplot(norm_hp, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_hp)
print("{:2f}".format(pval))

<h4>Transform Defense</h4>

In [None]:
plt.hist(poke_df["defense"],bins=25, color="#feb58c")

In [None]:
stats.probplot(poke_df["defense"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["defense"])
print("{:2f}".format(pval))

In [None]:
norm_defense, norm_lambda_4 = stats.boxcox(poke_df["defense"])

In [None]:
plt.hist(norm_defense,bins=25, color="#feb58c")

In [None]:
stats.probplot(norm_defense, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_defense)
print("{:2f}".format(pval))

<h4>Transform Special Attack</h4>

In [None]:
plt.hist(poke_df["sp_attack"],bins=25, color="#8bffff")

In [None]:
stats.probplot(poke_df["sp_attack"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["sp_attack"])
print("{:2f}".format(pval))

In [None]:
norm_sp_attack, norm_lambda_5 = stats.boxcox(poke_df["sp_attack"])

In [None]:
plt.hist(norm_sp_attack,bins=25, color="#8bffff")

In [None]:
stats.probplot(norm_sp_attack, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_sp_attack)
print("{:2f}".format(pval))

<h4>Transform Special Defense</h4>

In [None]:
plt.hist(poke_df["sp_defense"],bins=25, color="#8aa1ff")

In [None]:
stats.probplot(poke_df["sp_defense"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["sp_defense"])
print("{:2f}".format(pval))

In [None]:
norm_sp_defense, norm_lambda_6 = stats.boxcox(poke_df["sp_defense"])

In [None]:
plt.hist(norm_sp_defense,bins=25, color="#8aa1ff")

In [None]:
stats.probplot(norm_sp_defense, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_sp_defense)
print("{:2f}".format(pval))

<h4>Transform Speed</h4>

In [None]:
plt.hist(poke_df["speed"],bins=15, color="#fc8bff")

In [None]:
stats.probplot(poke_df["speed"], dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(poke_df["speed"])
print("{:2f}".format(pval))

In [None]:
norm_speed, norm_lambda_7 = stats.boxcox(poke_df["speed"])

In [None]:
plt.hist(norm_speed,bins=25, color="#fc8bff")

In [None]:
stats.probplot(norm_speed, dist="norm", plot=plt)

In [None]:
stat, pval = shapiro(norm_speed)
print("{:2f}".format(pval))

Although some p-values may still fall below α = 0.05, the Q-Q plots suggest that the Box-Cox method works reasonably well, since the data points lie close to the straight diagonal line. Since we have introduced changes into our data, it would be wise to copy the dataframe and use the copy to replace the old columns with the newly transformed data. Use the `.copy()` method to copy the dataframe.

In [None]:
pokemon_df = poke_df.copy()

Next, drop the old columns using the `.drop()` method.

In [None]:
pokemon_df.drop(["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"], axis = 1, inplace = True)

In [None]:
pokemon_df.columns

Then, add the normalized series into the dataframe.

In [None]:
pokemon_df["total_points"] = norm_total
pokemon_df["hp"] = norm_hp
pokemon_df["attack"] = norm_attack
pokemon_df["defense"] = norm_defense
pokemon_df["sp_attack"] = norm_sp_attack
pokemon_df["sp_defense"] = norm_sp_defense
pokemon_df["speed"] = norm_speed

In [None]:
pokemon_df

Now that the transformed columns are in the dataframe, we can move on to aggregation.

<h3>Aggregation</h3>

Aggregation summarizes data belonging to the same group or category. Numerical data and categorical data can be summarized using aggregation. This method can be used to get the overall mean, median, or mode of a series.
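A minimal sketch of aggregation with `groupby`, on a hypothetical five-row frame: numerical columns are summarized with the mean and median, and a categorical column with its mode.

```python
import pandas as pd

# Hypothetical rows: two generation-1 and three generation-2 Pokemon
toy = pd.DataFrame({
    "generation": [1, 1, 2, 2, 2],
    "type_1": ["Grass", "Grass", "Water", "Fire", "Water"],
    "hp": [45, 39, 60, 50, 70],
})

# Numerical aggregation: mean and median HP per generation
hp_summary = toy.groupby("generation")["hp"].agg(["mean", "median"])

# Categorical aggregation: most frequent primary type (mode) per generation
type_mode = toy.groupby("generation")["type_1"].agg(lambda s: s.mode()[0])

print(hp_summary)
print(type_mode)
```

Each group collapses into a single row, which is why aggregation loses the per-Pokemon detail discussed next.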

Since the modelling technique relies on the statistics of each individual Pokemon, aggregating them into group summaries would collapse the very observations we want to cluster. Thus, we are skipping this process.

<h3>Column Transformation</h3>

If we want to split data from one column into two columns (e.g. full name -> last name, given name), we can use column transformation. In our dataset, however, no column needs to be split: splitting the name, species, ability 1, ability 2, or hidden ability values would not make sense.
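For illustration, a minimal sketch of the full name example above using `str.split`; the trainer names are hypothetical and not part of the dataset.

```python
import pandas as pd

toy = pd.DataFrame({"full_name": ["Ash Ketchum", "Gary Oak"]})

# Split on the first space only; expand=True returns one column per part
toy[["given_name", "last_name"]] = toy["full_name"].str.split(" ", n=1, expand=True)
print(toy)
```

Limiting the split with `n=1` keeps any remaining spaces in the last part, so a multi-word surname stays in one column.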

<h3>Feature Scaling</h3>

Feature scaling rescales features so that they share a comparable range. The method we use here, standardization, converts each value to a z-score: it subtracts the mean and divides by the standard deviation. This differs from min-max normalization, which rescales values to the range 0 to 1, and from log transformation, which changes the shape of a distribution rather than its scale.

Since K-Means computes the Euclidean distance between each point and the cluster centroids, we perform feature scaling and use the standardized values in our computation. Without it, a feature with a larger range (such as total_points) would dominate the distance, while features with smaller ranges would barely affect which cluster a Pokemon is assigned to.
<figure>
    <img src="https://media.geeksforgeeks.org/wp-content/uploads/standard.png" style="width: 20%">
    <figcaption style="text-align: center">Formula for the Z-Score</figcaption>
</figure>
The formula above is used to compute the z-score, where <b>x</b> is the data point, <b>μ</b> is the mean, and <b>σ</b> is the standard deviation. The z-score tells us how many standard deviations the data point is away from the mean.
<br>
To perform feature scaling on our data, install the package by running `pip install scikit-learn` or `conda install scikit-learn` in the command line. Next, import the class `StandardScaler` from the `preprocessing` module of the library `sklearn`, also known as <b>scikit-learn</b>. This library specializes in machine learning for Python users and contains various algorithms used in the learning process.
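To connect the formula to the library, a minimal sketch computing z = (x − μ) / σ by hand on two hypothetical features with very different scales and checking it against `StandardScaler`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales
X = np.array([[300.0, 45.0],
              [500.0, 80.0],
              [700.0, 115.0]])

# z = (x - mu) / sigma, computed per column (feature)
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses
Z_manual = (X - mu) / sigma

# The same result from scikit-learn
Z_sklearn = StandardScaler().fit_transform(X)
print(Z_sklearn)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates a Euclidean distance just because its raw values are larger.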

In [None]:
!pip install scikit-learn

In [None]:
from sklearn.preprocessing import StandardScaler

After importing the library, use the `.fit_transform()` method to convert the data into its standardized version. It calculates the mean and standard deviation of each column (feature) and then replaces every value with its z-score.

In [None]:
pokemon_df[["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]]

In [None]:
sd_scaler = StandardScaler()
standardized_features = sd_scaler.fit_transform(pokemon_df[["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]])

Since `.fit_transform()` returns a NumPy array of the standardized features, convert it to a dataframe. Then, drop the old columns and concatenate this dataframe with the `pokemon_df` dataframe.

In [None]:
sd_feat_df = pd.DataFrame(standardized_features, columns=["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"])
sd_feat_df

In [None]:
pokemon_df.drop(["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"], axis = 1, inplace = True)

In [None]:
pokemon_df = pd.concat([pokemon_df, sd_feat_df], axis=1, join='outer')
pokemon_df

<h3>Feature Engineering</h3>

Finally, if we want to introduce new variables into our dataset, we can use feature engineering, which derives new features from existing ones (e.g. calculating BMI from weight_kg and height_m).
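A minimal sketch of the BMI example, on a two-row frame with illustrative heights and weights in the same units as the height_m and weight_kg columns:

```python
import pandas as pd

# Illustrative heights (m) and weights (kg) for two Pokemon
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Charizard"],
    "height_m": [0.7, 1.7],
    "weight_kg": [6.9, 90.5],
})

# New feature derived from two existing columns: BMI = weight / height^2
toy["bmi"] = toy["weight_kg"] / toy["height_m"] ** 2
print(toy.round(2))
```

The new column is computed row by row from existing data, so no outside information is added.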

However, our chosen data modelling technique and research question rely on the Pokemon's statistics and nothing more, and all the other features are already given in the dataset. Hence, there is no need to derive new features.

<p style="color: #f0789f; font-weight: bold">🍚🍜 With all that, we can now conclude the data preprocessing chapter of the notebook. We can now proceed with modelling our finally-cleaned dataset.🍣🍙</p>

<h3>🍱 K-Means Clustering 🍱</h3>

<p style="font-weight: bold">🤔 Research Question 🤔: <span style="color: #8aa1ff;">What is the relation between pokemon generation and type 1 element to health points, defense, special attack, special defense, and speed?</span></p>

The research question asks how a Pokemon's generation and primary type relate to its statistics. This suggests grouping the Pokemon by their statistics and then examining which generations and types end up in each group. Among the data modelling algorithms discussed in class, <b>clustering</b> is the most appropriate for putting this intuition into action.

Two clustering methods were discussed in class: <b>(1) Hierarchical</b> and <b>(2) K-Means</b>. Hierarchical clustering needs the pairwise distances between all observations, which becomes expensive as the dataset grows, while K-Means only computes the distances from each point to k centroids. Since K-Means works faster on larger datasets, we will use the <b>K-Means clustering</b> method to divide the data into groups.

<h4>Elbow Method</h4>

To get started with clustering, we need to determine how many clusters to use to group all the Pokemon. We could use 8 clusters based on the number of generations, or 18 clusters based on the number of types. However, we need a more principled way of determining the number of clusters the algorithm will use.

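The <b>elbow method</b> helps us choose the number of clusters, denoted k (hence the name <i>K-Means</i>). It runs K-Means for a range of k values and records each run's <b>inertia</b>, the sum of squared distances between each point and its nearest centroid. Inertia generally shrinks as k grows, so we look for the "elbow": the k after which adding clusters no longer reduces the inertia by much. To apply the elbow method in the notebook, we use the `KMeans` class from sklearn and test k from 1 to 17 (Python's `range(1, 18)` excludes 18).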
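To see what an elbow looks like when the answer is known, a minimal sketch on synthetic data with four well-separated groups. The "largest relative drop" rule at the end is a simple illustrative heuristic, not the algorithm used by `kneed`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 4 well-separated groups at the corners of a square
X, _ = make_blobs(n_samples=400, centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=0.6, random_state=0)

ks = list(range(1, 9))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to the nearest centroid

# Illustrative heuristic: the k whose extra cluster removes the largest
# share of the remaining inertia
rel_drop = [1 - inertias[i] / inertias[i - 1] for i in range(1, len(ks))]
best_k = ks[1 + int(np.argmax(rel_drop))]
print(best_k)
```

Past the true number of groups, each extra cluster only splits an existing group, so the inertia curve flattens; that bend is what the elbow method looks for.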

In [None]:
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

In [None]:
# declare a list to keep the inertia for each number of clusters from 1 to 17
inertia_list = []
inertia_mappings = {}

# use a for loop for the elbow method
for each in range(1, 18):
    # set up K-Means with 'each' clusters (fixed seed for reproducibility)
    k_means = KMeans(n_clusters = each, n_init = 10, random_state = 1)

    # run the clustering on the standardized features
    k_means.fit(standardized_features)

    # append the inertia (sum of squared distances to the nearest centroid)
    inertia_list.append(k_means.inertia_)

    # store the distortion: the average Euclidean distance from each point
    # (Pokemon) to its nearest centroid
    inertia_mappings[each] = np.min(
        cdist(standardized_features, k_means.cluster_centers_, 'euclidean'), axis=1
    ).mean()

Next, plot the inertias on a line graph to find where the curve bends; that point indicates the optimal number of clusters. Use `plt.plot()`.

In [None]:
# plot the line of the inertias
plt.plot(range(1, 18), inertia_list)

# plot the point of the inertias
plt.scatter(range(1, 18), inertia_list)

# describe the graph
plt.title("Graph of Inertias")
plt.xlabel("No. of Clusters")
plt.ylabel("Inertia")

# reveal
plt.show() 

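We can also print the distortion for each number of clusters (the average distance from each Pokemon to its nearest centroid, stored in `inertia_mappings`) and observe where the values stop dropping sharply.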

In [None]:
for x, y in inertia_mappings.items():
    print(f"{x} : {y}")

If we want to make sure we have the right number of clusters to use, use the `KneeLocator` from the `kneed` library. The kneed library can detect where the elbow is on the graph. Install the package first into the notebook.

In [None]:
!pip install kneed==0.8.1

In [None]:
from kneed import KneeLocator

In [None]:
ideal_no = KneeLocator(range(1, 18), inertia_list, curve="convex", direction="decreasing")
ideal_no.elbow

Based on the graph and the list of inertias, the curve bends at <b>4</b> clusters; beyond that, the inertia decreases more gradually. This suggests that the Pokemon's individual statistics can reasonably be summarized by 4 groups. Now that we have found the optimal number of clusters, we can start clustering our dataset.

<h4>Clustering</h4>

To begin clustering, use the `KMeans` class and set `n_clusters` to <b>4</b>. Create another copy of the Pokemon dataframe as a checkpoint in case mistakes occur.

In [None]:
sets = ['total_points', 'hp', 'attack','defense','sp_attack','sp_defense','speed']

In [None]:
clustered_poke_df = pokemon_df.copy()

In [None]:
kmeans = KMeans(n_clusters = 4,
                init = 'random',    # pick the initial centroids at random from the data
                max_iter = 300,     # maximum number of iterations for a single run
                n_init = 10,        # number of runs with different initial centroids; the best run is kept
                random_state = 1)   # fix the random state for reproducibility

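Run the clustering with the `.fit()` method, which repeatedly assigns each Pokemon to its nearest centroid and recomputes the centroids until they stop changing (or `max_iter` is reached). Then, create a column in the dataframe listing the cluster each Pokemon belongs to, taken from `kmeans.labels_`. With that, we can view the dataframe and see the cluster each Pokemon is assigned to.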
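A minimal sketch, on synthetic data rather than the Pokemon features, confirming what the fitted model holds: one label per observation, and each label is the index of that point's nearest centroid.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

kmeans_demo = KMeans(n_clusters=3, init="random", max_iter=300, n_init=10, random_state=1)
kmeans_demo.fit(X)

# After fitting, every point's label is the index of its nearest centroid
nearest = np.argmin(cdist(X, kmeans_demo.cluster_centers_, "euclidean"), axis=1)
print(np.array_equal(nearest, kmeans_demo.labels_))

# Number of points assigned to each cluster
print(np.bincount(kmeans_demo.labels_))
```

This is why `kmeans.labels_` can be copied straight into a `cluster` column: it is aligned row by row with the input features.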

In [None]:
kmeans.fit(standardized_features)

In [None]:
clustered_poke_df["cluster"] = kmeans.labels_

In [None]:
clustered_poke_df

<h3>🔎 Data Exploration 🔍</h3>

Now that we have modelled our data, we can explore it using visuals. We also want to discover how distinct each cluster is, or what sets it apart from the other clusters.

Before that, import the `seaborn` library. Seaborn offers different color palettes for visuals; the cell below previews the `pastel` palette, which we pass to each plot through the `palette` parameter.

In [None]:
import seaborn as sns

In [None]:
sns.color_palette("pastel")

To see which cluster is the most powerful, we can use the `sns.scatterplot()` function from the seaborn library.

In [None]:
sns.scatterplot(x = clustered_poke_df["attack"], y = clustered_poke_df["defense"], hue = clustered_poke_df["cluster"], palette = "pastel");

Additionally, to save time, we can use `sns.pairplot()`, which pairs every statistic (e.g. attack, defense, etc.) with every other statistic to generate a scatter plot for each combination.

In [None]:
stats_set = clustered_poke_df[["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "cluster"]]

In [None]:
sns.pairplot(stats_set, hue = "cluster", palette = "pastel");

Observing each of the scatterplots above, we can infer that <span><b style="color: limegreen">cluster 2</b></span> is the group containing the strongest Pokemon. It dominates in almost every statistic, while <span><b style="color: skyblue">cluster 0</b></span> and <span><b style="color: orange">cluster 1</b></span> have average performance. Meanwhile, <span><b style="color: tomato">cluster 3</b></span> is the group with the weakest Pokemon.

Now, if we want to know how well each cluster performs on average per category, we can use a bar graph. Use the `.barplot()` method from the seaborn package.

In [None]:
stats_mean_df = stats_set.groupby("cluster").mean()
stats_mean_df["cluster"] = pd.Series([0, 1, 2, 3])

<h4>Overall Average for Total Points</h4>

In [None]:
sns.barplot(x = "cluster",y = "total_points",data = stats_mean_df, palette = "pastel")
plt.show()

Based on the total points, <span><b style="color: limegreen">cluster 2</b></span> has the highest mean, around 1.4 in standardized units, while <span><b style="color: tomato">cluster 3</b></span> has the lowest mean.

<h4>Top 10 Pokemon per cluster</h4>

After looking at the means, we want to know the top 10 Pokemon per cluster. To filter observations by cluster, use a condition that checks for the target cluster number. Then, sort the values in descending order using `.sort_values()` with `ascending=False`. Reset the index using `.reset_index()` and, lastly, call `.head(10)` to view the top 10 Pokemon in that cluster.
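As an alternative to filtering each cluster separately, a minimal sketch that gets the top rows of every cluster in one pass by sorting once and then taking `groupby(...).head(n)`. The rows below are hypothetical, and `top_n` is set to 1 to keep the example small (the notebook uses 10).

```python
import pandas as pd

# Hypothetical clustered rows (names and totals are illustrative)
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "total_points": [600, 534, 630, 580, 385, 365],
    "cluster": [0, 0, 1, 1, 3, 3],
})

top_n = 1  # the notebook uses 10

# Sort once, then keep the first top_n rows of every cluster
top_per_cluster = (
    toy.sort_values("total_points", ascending=False)
       .groupby("cluster")
       .head(top_n)
       .sort_values("cluster")
       .reset_index(drop=True)
)
print(top_per_cluster)
```

`groupby(...).head()` keeps rows in their current order, so sorting first is what makes the kept rows the highest-scoring ones.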

<h5><span><b style="color: skyblue">Cluster 0</b></span></h5>

The list below reveals the top Pokemon of this cluster, with <b>Deoxys Attack Forme</b> taking the #1 spot with 600.0 points while <b>Delphox</b> takes the last spot with 534.0.

In [None]:
cluster_0_members = clustered_poke_df[clustered_poke_df["cluster"] == 0].sort_values("total_points", axis = 0, ascending = False).reset_index()
cluster_0_members[["name", "total_points"]].head(10)

<h5><span><b style="color: orange">Cluster 1</b></span></h5>

The list below reveals the top Pokemon of this cluster, with <b>Mega Aggron</b> taking the #1 spot with 630.0 points while <b>Regice</b> takes the last spot with 580.0.

In [None]:
cluster_1_members = clustered_poke_df[clustered_poke_df["cluster"] == 1].sort_values("total_points", axis = 0, ascending = False).reset_index()
cluster_1_members[["name", "total_points"]].head(10)

<h5><span><b style="color: limegreen">Cluster 2</b></span></h5>

In the cluster with the most powerful Pokemon, <b>Eternatus Eternamax</b> takes the #1 spot with a whopping 1125.0 points, while <b>Zacian Crowned Sword</b> takes the last spot with 720.0.

In [None]:
cluster_2_members = clustered_poke_df[clustered_poke_df["cluster"] == 2].sort_values("total_points", axis = 0, ascending = False).reset_index()
cluster_2_members[["name", "total_points"]].head(10)

<h5><span><b style="color: tomato">Cluster 3</b></span></h5>

Lastly, in the cluster of the weakest Pokemon, <b>Onix</b> takes the #1 spot with 385.0 points while <b>Nidorina</b> takes the last spot with 365.0.

In [None]:
cluster_3_members = clustered_poke_df[clustered_poke_df["cluster"] == 3].sort_values("total_points", axis = 0, ascending = False).reset_index()
cluster_3_members[["name", "total_points"]].head(10)

<h4>Starters</h4>

Starter Pokemon are a trainer's best friend. They are the first Pokemon you receive when you begin your journey to becoming a Pokemon Trainer. Still, we sometimes wonder, <i>"Who is the best starter of them all?"</i>. Even though starter selection depends on a person's interests or biases, this notebook uses the starters' statistics to compare them objectively.

Upon inspection, we notice that all the base forms of the starters belong to <span><b style="color: tomato">Cluster 3</b></span>. Their statistics are so low that they are challenging to use in competitive battles.

If we rank the base starters by total points, three starters tie for the #1 spot with 320.0 total points: <b>Popplio, Litten</b>, and <b>Rowlet</b>, the generation 7 starters.
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2F66.media.tumblr.com%2F5644587e63c22b05ab209a933d48ef20%2Ftumblr_ocutvsLXeu1v68t0mo1_500.gif&f=1&nofb=1&ipt=7eb95e2abad2699ed717174b5af40729d033a37f8dc32d6edf962ec760438f91&ipo=images" style="width: 50%">
    <figcaption style="text-align: center">Generation 7 Starters from left to right: Rowlet, Litten, Popplio</figcaption>
</figure>

In [None]:
# row positions (iloc) of the 24 base-form starters, three per generation from 1 to 8
clustered_poke_df[["name", "total_points", "cluster"]].iloc[[0, 4, 9, 192, 195, 198, 299, 303,
    307, 464, 467, 470, 586, 589, 592, 756, 759, 762, 841, 844, 847, 940, 943, 946]].sort_values("total_points", axis = 0, ascending = False)
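Row positions only point at the right Pokemon as long as the row order never changes. A minimal sketch of selecting by name with `isin` instead, assuming the `name` column uses these exact spellings; it uses a small frame with the base stat totals of six base-form starters plus Onix, which the filter excludes:

```python
import pandas as pd

# Small frame: six base-form starters plus one non-starter
toy = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle", "Onix", "Rowlet", "Litten", "Popplio"],
    "total_points": [318, 309, 314, 385, 320, 320, 320],
})

# Select rows by name instead of by row position
base_starters = ["Bulbasaur", "Charmander", "Squirtle", "Rowlet", "Litten", "Popplio"]
ranked = (
    toy[toy["name"].isin(base_starters)]
    .sort_values("total_points", ascending=False, kind="stable")
    .copy()
)

# method="min" gives tied starters the same rank
ranked["rank"] = ranked["total_points"].rank(method="min", ascending=False).astype(int)
print(ranked)
```

A stable sort keeps tied starters in their original order, and `rank(method="min")` makes the three-way tie at the top explicit.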

However, the fully evolved starters no longer belong to the weakest of the four clusters; instead, they are spread across the other clusters. <b>Swampert</b> ranks first of the 24 fully evolved starters with 535.0 total points.
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fcdn.playbuzz.com%2Fcdn%2Fb9ec3838-45db-4439-97bc-3f1c535c0e3c%2F6cd30f1a-0011-40a9-8936-383cdccffa2e.gif&f=1&nofb=1&ipt=4b39cde4d105d3af61f3b95c1746ddf3461a1746cfe6c3c6add35b168c43199b&ipo=images" style="width: 25%">
    <figcaption style="text-align: center">Swampert from Generation 3</figcaption>
</figure>

In [None]:
# row positions (iloc) of the 24 fully evolved starters, three per generation from 1 to 8
evolved_starters = clustered_poke_df[["name", "total_points", "cluster"]].iloc[[2, 6, 11, 194, 197, 200, 301, 305,
    309, 466, 469, 472, 588, 591, 594, 758, 761, 764, 843, 846, 849, 942, 945, 948]].sort_values("total_points", axis = 0, ascending = False)
evolved_starters

To find the most powerful starters of them all, we look at those belonging to <span><b style="color: limegreen">Cluster 2</b></span>. The top 3 starters from this cluster are: <b>(1) Typhlosion, (2) Charizard</b> and <b>(3) Primarina</b>.
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fprofessorlotus.com%2FSprites%2FTyphlosion.gif&f=1&nofb=1&ipt=dfc82b78a29f70e3fc00df1d99cf8a847bf09108ba519cb83aa78506479868f1&ipo=images" style="width: 15%">
    <figcaption style="text-align: center">Typhlosion from Generation 2</figcaption>
</figure>
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fi.pinimg.com%2Foriginals%2F13%2F57%2F33%2F135733b07291badc8cb3c083d4fd90b0.gif&f=1&nofb=1&ipt=7019c080da34f170d3fa876b77013bee201c176e5d4a91ced9fe2506a25966e6&ipo=images" style="width: 30%">
    <figcaption style="text-align: center">Charizard from Generation 1</figcaption>
</figure>
<figure>
    <img src="https://datadex.talzz.com/images/sprites/33_updated/730-primarina.gif" style="width: 20%">
    <figcaption style="text-align: center">Primarina from Generation 7</figcaption>
</figure>

In [None]:
evolved_starters[evolved_starters["cluster"] == 2]

From <span><b style="color: skyblue">Cluster 0</b></span>, the Top 3 starters from this cluster are: <b>(1) Infernape, (2) Delphox</b> and <b>(3) Inteleon</b>.
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fprofessorlotus.com%2FSprites%2FInfernape.gif&f=1&nofb=1&ipt=70e61312f6079525e1b2b50a56f7c6a28a371a762acb683ed6ff7cf8ba43e3e3&ipo=images" style="width: 25%">
    <figcaption style="text-align: center">Infernape from Generation 4</figcaption>
</figure>
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fimg4.wikia.nocookie.net%2F__cb20140319093018%2Fpokemon%2Fimages%2F3%2F31%2FDelphox_XY.gif&f=1&nofb=1&ipt=4efe53c5708d929d94bfbe4f53a3c482117516fc6bdc498df53e8a74afa9b85c&ipo=images" style="width: 20%">
    <figcaption style="text-align: center">Delphox from Generation 6</figcaption>
</figure>
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fimages.wikidexcdn.net%2Fmwuploads%2Fwikidex%2F8%2F81%2Flatest%2F20191201052530%2FInteleon_EpEc.gif&f=1&nofb=1&ipt=729fc1311983122eb0bb354c839b92b3aaaf51d159936eb2718d5bccd6eb6cc5&ipo=images" style="width: 13%">
    <figcaption style="text-align: center">Inteleon from Generation 8</figcaption>
</figure>

In [None]:
evolved_starters[evolved_starters["cluster"] == 0]

Finally, from <span><b style="color: orange">Cluster 1</b></span>, the top 3 starters are: <b>(1) Swampert, (2) Chesnaught</b> and <b>(3) Incineroar</b>.
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fcdn.playbuzz.com%2Fcdn%2Fb9ec3838-45db-4439-97bc-3f1c535c0e3c%2F6cd30f1a-0011-40a9-8936-383cdccffa2e.gif&f=1&nofb=1&ipt=4b39cde4d105d3af61f3b95c1746ddf3461a1746cfe6c3c6add35b168c43199b&ipo=images" style="width: 25%">
    <figcaption style="text-align: center">Swampert from Generation 3</figcaption>
</figure>
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fimg1.wikia.nocookie.net%2F__cb20140319091349%2Fpokemon%2Fimages%2Fa%2Fad%2FChesnaught_XY.gif&f=1&nofb=1&ipt=6490d8344367ef915dead8b70f4fa667cb801fcb0b7a588c28eaf9ea05bf5b40&ipo=images" style="width: 30%">
    <figcaption style="text-align: center">Chesnaught from Generation 6</figcaption>
</figure>
<figure>
    <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fprojectpokemon.org%2Fimages%2Fnormal-sprite%2Fincineroar.gif&f=1&nofb=1&ipt=a3b284ac590647535c2ac053570a5c436d92a8846b189255fbe0bbab28d1a709&ipo=images" style="width: 30%">
    <figcaption style="text-align: center">Incineroar from Generation 7</figcaption>
</figure>

In [None]:
evolved_starters[evolved_starters["cluster"] == 1]

Based on the rankings above, we now know the top starters in each cluster.
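The three per-cluster cells above can also be condensed into a single expression that keeps the top 3 rows of every cluster. This is a minimal sketch, assuming a DataFrame shaped like `evolved_starters` (columns `name`, `total_points`, `cluster`); the toy rows and names below are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `evolved_starters`: names and numbers are made up.
starters = pd.DataFrame({
    "name": ["starter_a", "starter_b", "starter_c", "starter_d", "starter_e", "starter_f", "starter_g"],
    "total_points": [534, 530, 525, 500, 510, 520, 505],
    "cluster": [2, 2, 2, 2, 0, 0, 1],
})

# Sort by total points (highest first), keep the first three rows of each cluster,
# then order the result by cluster for easier reading.
top3 = (
    starters.sort_values("total_points", ascending=False)
    .groupby("cluster")
    .head(3)
    .sort_values(["cluster", "total_points"], ascending=[True, False])
)
print(top3)
```

Sorting once before `groupby().head(3)` works because `head` keeps rows in their existing (already sorted) order within each group.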

Now that we are done exploring our data, we can proceed with statistical inference through hypothesis testing.

<h2>🔬 Statistical Inference 🔬</h2>

Now that we have modelled our data using K-Means clustering and explored the clusters, we will perform hypothesis testing to answer our research question. <b>Hypothesis testing</b> checks whether the data provide enough evidence to reject a null hypothesis at a chosen threshold (the significance level).

<p style="font-weight: bold">🤔 Research Question 🤔: <span style="color: #8aa1ff;">What is the relation between pokemon generation and type 1 element to health points, defense, special attack, special defense, and speed?</span></p>

To start, install `statsmodels`, then import `ols` from the `statsmodels.formula.api` package.

In [None]:
!pip install statsmodels==0.13.5

In [None]:
from statsmodels.formula.api import ols

Since our research question revolves around whether the clusters differ from one another, a <b>one-way F-test (ANOVA)</b> will be employed to determine how dissimilar the clusters are. This test checks whether the means of all groups are equal; here it is run separately on each stat. The following hypotheses will be used:
- <b>H<sub>0</sub>:</b> The clusters are similar (the mean of the stat is the same in every cluster)
- <b>H<sub>A</sub>:</b> At least one of the clusters is dissimilar (at least one cluster's mean differs)
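For intuition, the F-statistic compares the spread *between* cluster means with the spread *within* clusters: $F = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)}$, where $k$ is the number of groups and $N$ the number of observations. The sketch below computes it by hand on made-up HP values and checks it against `scipy.stats.f_oneway`; the helper `one_way_anova_f` is hypothetical, written only for this illustration.

```python
import numpy as np
from scipy import stats

def one_way_anova_f(*groups):
    """Return the one-way ANOVA F-statistic for 1-D groups of observations."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                  # number of groups (clusters)
    n = sum(len(g) for g in groups)                  # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical HP values for three clusters (made-up numbers for illustration).
a = [45, 50, 55, 60]
b = [70, 75, 80, 85]
c = [40, 42, 44, 46]

f_manual = one_way_anova_f(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f_manual, f_scipy, p_scipy)
```

A large F means the cluster means are far apart relative to the variation inside each cluster, which is what drives the p-value down.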

To begin, group the Pokemon by cluster using `.groupby()`. Then use a `for` loop to print the first 10 Pokemon belonging to each cluster.

In [None]:
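# Group the Pokemon by their K-Means cluster label
# (grouping by a plain column name lets get_group() take a scalar key)
new_clustered_poke_df = clustered_poke_df.groupby("cluster")
sets = ['name', 'status', 'generation', 'total_points', 'hp', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed', 'cluster']

# Show the first 10 Pokemon of each of the 4 clusters
for i in range(4):
    print("Cluster", i)
    print(new_clustered_poke_df[sets].get_group(i).head(10))
    print("\n")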

Next, perform the test using `stats.f_oneway()` from SciPy to compute the F-statistic and the p-value for each stat. Set the <b>significance level</b> to <b>α = 0.05</b> as the threshold for determining the significance of the resulting p-values.
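When every group is a 2-D table (rows = Pokemon, columns = stats), `scipy.stats.f_oneway` tests each column separately along `axis=0` and returns one F-statistic and one p-value per column. A minimal sketch on hypothetical toy clusters with two stat columns (all values are made up):

```python
import pandas as pd
from scipy import stats

ALPHA = 0.05  # significance level used in the notebook

# Hypothetical toy clusters with two stat columns; values are made up.
stat_cols = ["hp", "speed"]
cluster_a = pd.DataFrame({"hp": [40, 45, 50, 55], "speed": [90, 95, 100, 105]})
cluster_b = pd.DataFrame({"hp": [80, 85, 90, 95], "speed": [30, 35, 40, 45]})
cluster_c = pd.DataFrame({"hp": [60, 62, 64, 66], "speed": [60, 62, 64, 66]})

# Each column is tested on its own, so F and p are arrays with one entry per stat.
F, p = stats.f_oneway(
    cluster_a[stat_cols].to_numpy(),
    cluster_b[stat_cols].to_numpy(),
    cluster_c[stat_cols].to_numpy(),
    axis=0,
)

for col, f_stat, p_value in zip(stat_cols, F, p):
    decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{col}: F = {f_stat:.2f}, p = {p_value:.3g} -> {decision}")
```

Pairing each p-value with its column name makes it clear which stat each decision refers to.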

In [None]:
from scipy import stats

# Put the stat values stored in copy_poke_df back into clustered_poke_df before testing
sets = ["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]
clustered_poke_df[sets] = copy_poke_df[sets]

new_clustered_poke_df = clustered_poke_df.groupby("cluster")
sets = ['hp', 'attack', 'defense', 'sp_attack', 'sp_defense', 'speed']

# One DataFrame of stats per cluster
cluster0 = new_clustered_poke_df[sets].get_group(0)
cluster1 = new_clustered_poke_df[sets].get_group(1)
cluster2 = new_clustered_poke_df[sets].get_group(2)
cluster3 = new_clustered_poke_df[sets].get_group(3)

# Test whether the mean of each stat differs across the clusters
# (f_oneway tests each column separately, so F and p hold one value per stat)
F, p = stats.f_oneway(cluster0, cluster1, cluster2, cluster3)
print('F-Score: ' + str(F))
print('P-value: ' + str(p))

for stat, p_value in zip(sets, p):
    if p_value < 0.05:
        print(stat + ': Reject null hypothesis. Clusters are not similar')
    else:
        print(stat + ': Fail to reject null hypothesis. Clusters may be similar')

After testing, the resulting <b>F-statistics</b> and <b>p-values</b> for each stat are:
- <b>hp:</b> F = 191.50148188, p = 1.43892319e-098
- <b>attack:</b> F = 265.28037858, p = 2.31624906e-127
- <b>defense:</b> F = 305.77092542, p = 1.03220488e-141
- <b>sp_attack:</b> F = 326.16211164, p = 1.32527294e-148
- <b>sp_defense:</b> F = 295.52398644, p = 3.62316869e-138
- <b>speed:</b> F = 423.59848505, p = 7.11334581e-179

Since all of these p-values are less than the <b>significance level of 0.05</b>, we <b>reject the null hypothesis</b> for every stat.

Therefore, we conclude that <b>the clusters are dissimilar (not similar)</b>.
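One caveat worth keeping in mind: since K-Means presumably formed the clusters from these same stats, the clusters were chosen to differ on them, so very small p-values are expected almost by construction. The sketch below illustrates this on pure random noise, using quartile splits as a simple stand-in for clustering (it is not the K-Means model used in this notebook); the seed and sample size are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
values = rng.normal(loc=70, scale=15, size=400)  # pure noise: no real groups exist

# Form "clusters" from the same variable that will be tested, by splitting at its quartiles.
edges = np.quantile(values, [0.25, 0.5, 0.75])
labels = np.digitize(values, edges)  # group label 0..3 for each value

groups = [values[labels == k] for k in range(4)]
F, p = stats.f_oneway(*groups)
print(f"F = {F:.1f}, p = {p:.3g}")
```

Even with no real structure, the test reports a tiny p-value, because the grouping was derived from the tested values themselves.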

<h3>Classifying Clusters</h3>

In [None]:
to_csv_poke_df = clustered_poke_df.copy()
length = len(to_csv_poke_df)
to_csv_poke_df["cluster_name"] ="none"
data = []

Looking at the features of each cluster, the differences between them are noticeable. To put them simply:
- <span><b style="color: skyblue">Cluster 0</b>:</span> Fragile sweeper
- <span><b style="color: orange">Cluster 1</b>:</span> Slow yet bulky
- <span><b style="color: limegreen">Cluster 2</b>:</span> Best overall
- <span><b style="color: tomato">Cluster 3</b>:</span> No potential for competitive

In [None]:
# Map each numeric cluster label to its descriptive name
cluster_names = {
    0: "Fragile sweeper",
    1: "Slow yet bulky",
    2: "Best overall",
    3: "No potential for competitive",
}
to_csv_poke_df["cluster_name"] = to_csv_poke_df["cluster"].map(cluster_names)

# Keep only the columns needed for export, then restore the stat columns from copy_poke_df
export_cols = ["name", "generation", "status", "type_1", "total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed", "cluster", "cluster_name"]
stat_cols = ["total_points", "hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]
to_csv_poke_df = to_csv_poke_df[export_cols].copy()
to_csv_poke_df[stat_cols] = copy_poke_df[stat_cols]
to_csv_poke_df.head(4)

Next, we compute the median and mean of the health points, attack, defense, special attack, special defense, and speed of each cluster.
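Both summaries can also be produced by a single `groupby().agg()` call that takes a list of function names. A minimal sketch on hypothetical toy data (the notebook's real data lives in `to_csv_poke_df`; only three stats are used here for brevity):

```python
import pandas as pd

stat_cols = ["hp", "attack", "speed"]  # subset of stats for brevity
# Hypothetical toy data; numbers are made up.
df = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, 1],
    "hp": [40, 50, 90, 100, 110, 120],
    "attack": [90, 95, 100, 60, 70, 80],
    "speed": [100, 110, 120, 30, 40, 50],
})

# One groupby call computes both statistics; columns become a (stat, function) MultiIndex.
summary = df.groupby("cluster")[stat_cols].agg(["median", "mean"])
print(summary)

# Pull out one table per statistic, shaped like to_csv_radar (medians) and to_csv_radar2 (means).
medians = summary.xs("median", axis=1, level=1)
means = summary.xs("mean", axis=1, level=1)
```

Comparing the two tables is useful here because the median is less sensitive than the mean to a few extreme Pokemon in a cluster.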

In [None]:
stat_cols = ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]

# Median of each stat per cluster (printed, since only the last expression of a cell is displayed)
to_csv_radar = to_csv_poke_df.groupby("cluster")[stat_cols].median()
print(to_csv_radar)

# Mean of each stat per cluster
to_csv_radar2 = to_csv_poke_df.groupby("cluster")[stat_cols].mean()
to_csv_radar2

To support the cluster descriptions above, we save the data to CSV files and then graph a radar chart of the stats of the Pokemon in each cluster.

In [None]:
to_csv_poke_df.to_csv("hierachy_poke.csv")
to_csv_radar.to_csv("radar_poke.csv")
to_csv_radar2.to_csv("radar_poke2.csv")

In [None]:
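# Select the rows of poke_df that belong to each cluster, matched by Pokemon name,
# so the selection does not depend on cluster0..cluster3's row labels being pokedex numbers
df_from_cluster0 = poke_df[poke_df['name'].isin(clustered_poke_df.loc[cluster0.index, 'name'])]
df_from_cluster1 = poke_df[poke_df['name'].isin(clustered_poke_df.loc[cluster1.index, 'name'])]
df_from_cluster2 = poke_df[poke_df['name'].isin(clustered_poke_df.loc[cluster2.index, 'name'])]
df_from_cluster3 = poke_df[poke_df['name'].isin(clustered_poke_df.loc[cluster3.index, 'name'])]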

<h4><span><b style="color: skyblue">Cluster 0</b>:</span> Fragile sweeper</h4>

In [None]:
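import plotly.graph_objects as go

# Columns at positions 16-21 of poke_df become the radar axes
cols = poke_df.columns[16:22].tolist()
fig = go.Figure()

# One filled radar trace per Pokemon in Cluster 0
for index, row in df_from_cluster0.iterrows():
    fig.add_trace(go.Scatterpolar(
        r=row[cols].values.tolist(),
        theta=cols,
        fill='toself',
        name=row.iloc[2]  # legend label: value at column position 2 of poke_df
    ))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True
    ),
  ),
  showlegend=True
)

fig.show()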

If we look at the radar chart of this cluster, its highest stats are <b>speed, attack</b>, and <b>special attack</b>, while its defense and special defense are low. We can use the `.mean()` function to get the average of each stat for this cluster.

In [None]:
cluster_0_df = clustered_poke_df[clustered_poke_df["cluster"] == 0]
cluster_0_df[["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]].mean().sort_values(ascending = False)

Based on the results, we can assume that <span><b style="color: skyblue">Cluster 0</b></span> houses Pokemon that are mixed attackers with quick reflexes yet as fragile as glass.

<h4><span><b style="color: orange">Cluster 1</b>:</span> Slow yet bulky</h4>

In [None]:
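# Start a new figure so Cluster 0's traces are not carried over
fig = go.Figure()

for index, row in df_from_cluster1.iterrows():
    fig.add_trace(go.Scatterpolar(
        r=row[cols].values.tolist(),
        theta=cols,
        fill='toself',
        name=row.iloc[2]  # legend label: value at column position 2 of poke_df
    ))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True
    ),
  ),
  showlegend=True
)

fig.show()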

If we look at the radar chart of this cluster, its highest stat is <b>defense</b>. In contrast to the fragile sweepers of Cluster 0, this cluster has a low average speed, so its members tend to be the last to move in a battle.

In [None]:
cluster_1_df = clustered_poke_df[clustered_poke_df["cluster"] == 1]
cluster_1_df[["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]].mean().sort_values(ascending = False)

Based on the results, we can assume that <span><b style="color: orange">Cluster 1</b></span> houses the slow yet ultra-defensive walls made to stall the opponent's Pokemon.

<h4><span><b style="color: limegreen">Cluster 2</b>:</span> Best overall</h4>

In [None]:
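# Start a new figure so earlier clusters' traces are not carried over
fig = go.Figure()

for index, row in df_from_cluster2.iterrows():
    fig.add_trace(go.Scatterpolar(
        r=row[cols].values.tolist(),
        theta=cols,
        fill='toself',
        name=row.iloc[2]  # legend label: value at column position 2 of poke_df
    ))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True
    ),
  ),
  showlegend=True
)

fig.show()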

If we look at the radar chart of this cluster, its highest stats are <b>attack</b> and <b>special attack</b>. Its special defense and speed are also good. Its lowest stat, however, is HP, which means these Pokemon can be worn down more quickly in battle.

In [None]:
cluster_2_df = clustered_poke_df[clustered_poke_df["cluster"] == 2]
cluster_2_df[["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]].mean().sort_values(ascending = False)

Based on the results, we can assume that <span><b style="color: limegreen">Cluster 2</b></span> houses the Pokemon that have the potential for competitive battles.

<h4><span><b style="color: tomato">Cluster 3</b>:</span> No potential for competitive</h4>

In [None]:
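# Start a new figure so earlier clusters' traces are not carried over
fig = go.Figure()

for index, row in df_from_cluster3.iterrows():
    fig.add_trace(go.Scatterpolar(
        r=row[cols].values.tolist(),
        theta=cols,
        fill='toself',
        name=row.iloc[2]  # legend label: value at column position 2 of poke_df
    ))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True
    ),
  ),
  showlegend=True
)

fig.show()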

If we look at the radar chart of this cluster, its highest stat is <b>attack</b>. However, even this is lower than the attack of the previous clusters. Since all of its stats are the lowest among the clusters, these Pokemon have little use in competitive battling unless you are willing to put up with their high-maintenance nature.

In [None]:
cluster_3_df = clustered_poke_df[clustered_poke_df["cluster"] == 3]
cluster_3_df[["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]].mean().sort_values(ascending = False)

Based on the results, we can assume that <span><b style="color: tomato">Cluster 3</b></span> houses the Pokemon that should be used at your own risk unless you only want to keep them as pets at home.

<h3>Median and Mean</h3>

To gather better information from the individual data of each Pokemon, the median and mean radar charts are shown to identify the best and the worst cluster in each stat. To embed the radar charts, import `display` and `HTML` from the `IPython.display` package.

In [None]:
from IPython.display import display, HTML

<h4>Median</h4>

The radar chart below shows the median of each stat per cluster.

In [None]:
display(HTML('<div class="flourish-embed flourish-radar" data-src="visualisation/12108136"><script src="https://public.flourish.studio/resources/embed.js"></script></div>'))


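As with the individual radar charts, <span><b style="color: limegreen">Cluster 2</b></span> leads in every stat except defense, where <span><b style="color: orange">Cluster 1</b></span> is highest, while <span><b style="color: tomato">Cluster 3</b></span> is the weakest cluster.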
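Besides reading the charts, the best and worst cluster for each stat can be checked numerically from a per-cluster table such as `to_csv_radar` (medians) or `to_csv_radar2` (means) with `idxmax()` and `idxmin()`. A minimal sketch, assuming a table with clusters as rows and stats as columns; the numbers below are made up:

```python
import pandas as pd

# Hypothetical per-cluster means (rows = clusters, columns = stats); made-up numbers.
means = pd.DataFrame(
    {"hp": [60, 90, 85, 50], "defense": [55, 110, 95, 45], "speed": [100, 40, 95, 45]},
    index=pd.Index([0, 1, 2, 3], name="cluster"),
)

best = means.idxmax()   # for each stat (column), the cluster with the highest value
worst = means.idxmin()  # for each stat, the cluster with the lowest value
print(pd.DataFrame({"best_cluster": best, "worst_cluster": worst}))
```

This gives an exact answer even when two clusters look close on the radar chart.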

<h4>Mean</h4>

The radar chart below shows the mean of each stat per cluster.

In [None]:
display(HTML('<div class="flourish-embed flourish-radar" data-src="visualisation/12108618"><script src="https://public.flourish.studio/resources/embed.js"></script></div>'))


As with the medians, <span><b style="color: limegreen">Cluster 2</b></span> leads in every stat except defense, where <span><b style="color: orange">Cluster 1</b></span> is highest, while <span><b style="color: tomato">Cluster 3</b></span> is the weakest cluster.

<h3>Clusters of Generations and Elements</h3>

The interactive chart below groups the Pokemon in each cluster by generation first, then by primary type (type 1). Click on a cluster to view its subgroups of generations and types.
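The same cluster, generation, and type breakdown can be counted directly with pandas. A minimal sketch on hypothetical toy rows (the notebook's real data lives in `to_csv_poke_df`); it assumes the column names `cluster`, `generation`, and `type_1` used elsewhere in this notebook:

```python
import pandas as pd

# Hypothetical toy rows; values are made up.
df = pd.DataFrame({
    "cluster":    [0, 0, 0, 1, 1, 2, 2, 2],
    "generation": [1, 1, 5, 3, 3, 3, 4, 3],
    "type_1":     ["Water", "Normal", "Water", "Rock", "Rock", "Dragon", "Psychic", "Psychic"],
})

# Count Pokemon at every level of the hierarchy: cluster -> generation -> type_1.
counts = df.groupby(["cluster", "generation", "type_1"]).size()
print(counts)

# Most common generation and primary type within each cluster.
top_generation = df.groupby("cluster")["generation"].agg(lambda s: s.value_counts().idxmax())
top_type = df.groupby("cluster")["type_1"].agg(lambda s: s.value_counts().idxmax())
print(top_generation)
print(top_type)
```

If two values tie for most common, `idxmax()` simply returns the first one in `value_counts()` order, so ties should be checked by eye.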

In [None]:
display(HTML('<div class="flourish-embed flourish-hierarchy" data-src="visualisation/12105634"><script src="https://public.flourish.studio/resources/embed.js"></script></div>'))

In <span><b style="color: purple">Cluster 0 (Fragile sweeper)</b></span>, no type 1 element stands out strongly, although <b>Normal</b> and <b>Water</b> are the most visible. By generation, <b>Generation 1</b> dominates this cluster with 59 Pokémon, followed by Generation 5 with 44 Pokémon.

In <span><b style="color: orange">Cluster 1 (Slow yet bulky)</b></span>, whose members can tank and take hits very well, <b>Rock</b> and <b>Steel</b> dominate with larger counts than the other elements. By generation, there is no notable difference in frequency: <b>Generation 1</b> has 40 Pokémon, Generation 3 has 39, and Generation 5 has 38.

In <span><b style="color: hotpink">Cluster 2 (Best overall)</b></span>, <b>Dragon</b> and <b>Psychic</b> have larger groups than the other elements. This is expected because both types are known to be very strong. <b>Generation 3</b> has 30 Pokémon, making it the most frequent generation in this cluster. However, its lead is small: Generations 5 and 4 follow with 27 and 26 Pokémon respectively, so these three generations are the most represented in this cluster.

Lastly, in <span><b style="color: blue">Cluster 3 (No potential for competitive)</b></span>, <b>Water</b> stands out the most, with a larger group than any other type 1 element. By generation, <b>Generation 1</b> and Generation 5 dominate this cluster with 68 and 62 Pokémon respectively.
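The counts above are raw totals, so generations that contain more Pokémon overall will naturally appear more often in every cluster. One way to account for this (a suggestion, not part of the original analysis) is a row-normalized cross-tabulation, which shows what share of each generation lands in each cluster. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: generation 1 is larger than generation 2 overall.
df = pd.DataFrame({
    "generation": [1] * 6 + [2] * 3,
    "cluster":    [3, 3, 3, 0, 0, 2, 3, 2, 2],
})

raw = pd.crosstab(df["generation"], df["cluster"])                       # raw counts
share = pd.crosstab(df["generation"], df["cluster"], normalize="index")  # row proportions
print(raw)
print(share.round(2))
```

Here generation 1 has more Pokémon in cluster 3 by raw count, but in relative terms generation 2 places a larger share (two thirds) of its Pokémon in cluster 2.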

<p style="color: skyblue; font-weight: bold">📖 Now that we have gone through data preprocessing, data modelling, exploratory data analysis, and statistical inference, we can finally draw our conclusions from the results. 📚</p>

<h2>💡 Insights and Conclusion 💡</h2>

<h3>Which generation is best to use?</h3>

Generation-wise, there is no clear difference that makes one generation the "strongest" or the "best" to use. The generations are fairly evenly represented, even though Generation 3 has the most Pokémon in the overpowered Cluster 2.

Each generation offers many Pokémon to choose from, so there is little need to worry about which generation to use to get stronger Pokémon. It all depends on the region/game the trainer/user is in: some games limit you to Pokémon exclusive to that region, while others allow Pokémon from other regions to be transferred in.

In short, <b>there is no "best" generation</b>. 

<h3>Which type 1 element is best to use?</h3>

As seen in Cluster 2, <b>Psychic</b> and <b>Dragon</b> are the best type 1 elements to use. Since Cluster 2 outperforms the other clusters in every stat except defense, it is best to find Pokémon of these elements to dominate in battle.

However, if you want Pokémon that are good on defense, it is best to use <b>Rock</b> and <b>Steel</b> Pokémon, as seen in Cluster 1, because on average their defense is higher than even that of the overpowered cluster.

<h2>💌 The End ✉️</h2>

You have reached the end of the notebook. This project was done by <b>DAYON, Elijah, ERMITANO, Kate Justine,</b> and <b>LLAMADO, Jon</b> as a requirement to pass CSMODEL under the instruction of <b>Sir Arren Matthew C. Antioquia</b>, and was submitted on December 9, 2022.

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=c26f6b62-b819-4c99-9d10-2b13381c9f87' target="_blank">
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>