<center><h1>Pokemon Analysis and Clustering</h1></center>

![Pokemon_image](https://www.technosamrat.com/wp-content/uploads/2012/05/Pokemon-Wallpapers-Picture.jpg)

<a id="top"></a>
<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" role="tab" aria-controls="home">Table of Contents</h3>

* [1. Introduction](#1)
* [2. What is a Pokemon?](#2)
* [3. The Data](#3)
* [4. Importing Packages](#4)    
* [5. Import Dataset](#5)
* [6. Explore the Dataset and Calculate the Weakness Scores](#6)
* [7. Cluster Analysis](#7)
* [8. Analyzing Clusters](#8)
* [9. Conclusion](#9)


<a id="1"></a>
# 1. Introduction

In this notebook, we analyze which Pokemon are the strongest and weakest based only on their assigned types. Using K-Means clustering, we also group Pokemon that are alike based on their type combinations, and we explore several methods for finding the optimal number of clusters. This will allow a trainer to organize their starting Pokemon so that the team has a strong attack and defence no matter what Pokemon the opponent plays.

<a id="2"></a>
# 2. What is a Pokemon?

Pokemon are creatures in a video game that are used to battle against other Pokemon. A person who owns Pokemon for battle is called a trainer; typically, trainers duel each other in a Pokemon battle with a set of 6 Pokemon each. Pokemon are assigned types along with different attacks, and each attack is also assigned a type. This is important because each Pokemon type comes with strengths and weaknesses. For example, Charmander is a fire type Pokemon, so he is weak to water moves: a water attack deals extra damage to Charmander. Charmander also has a set of attacks; he will of course have fire attacks, but he might have an array of other attack types too. His fire attacks are strong against grass type Pokemon and deal extra damage to them. One last thing: Pokemon can be assigned up to two types. For example, Charizard is both fire and flying type, which comes with both weaknesses and advantages. Along with the characteristics of the fire type that we discussed, Charizard also adopts the strengths and weaknesses of flying type Pokemon. For example, Charizard is strong against fighting type Pokemon along with grass types, and weak against electric types along with water.

<a id="3"></a>
# 3. The Data

The data set used in this notebook contains the weakness multiplier for about 500 Pokemon for each type of attack they can encounter. If the attack is super effective against the Pokemon, it is assigned a 2. If the attack is weak or very weak against the Pokemon, it is assigned a .5 or .25. If the Pokemon is unaffected by the attack, it is assigned a 0. Lastly, if the attack has no special effect, it is assigned a 1.
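To make the encoding concrete, here is a minimal sketch of how a dual-type Pokemon's multipliers could arise, assuming the data set follows the games' usual rule that the multipliers of the two types are multiplied together. The `DEFENSE` table is a small hand-written excerpt of the standard type chart (not the data set itself), and `combined_multiplier` is a hypothetical helper introduced only for this illustration.

```python
# Hand-written excerpt of a defensive type chart (illustrative, not the dataset).
# DEFENSE[defender_type][attack_type] -> damage multiplier; missing entries mean 1.0.
DEFENSE = {
    "Fire": {"Water": 2.0, "Grass": 0.5, "Rock": 2.0, "Ground": 2.0},
    "Flying": {"Grass": 0.5, "Electric": 2.0, "Rock": 2.0, "Ground": 0.0},
}


def combined_multiplier(attack_type, defender_types):
    """Multiply the per-type multipliers of every type the defender has."""
    multiplier = 1.0
    for defender_type in defender_types:
        multiplier *= DEFENSE[defender_type].get(attack_type, 1.0)
    return multiplier


charizard_types = ("Fire", "Flying")
attacks = ["Water", "Grass", "Electric", "Rock", "Ground"]
multipliers = {atk: combined_multiplier(atk, charizard_types) for atk in attacks}

for attack, value in multipliers.items():
    print(f"{attack:>8}: {value}")
```

Under this rule a weakness shared by both types multiplies to 4 (Rock in this example), a shared resistance multiplies to .25 (Grass), and an immunity from either type gives 0 (Ground).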

<a id="4"></a>
# 4. Importing Packages

In [None]:
!pip install kneed

In [None]:
#import libraries
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from kneed import KneeLocator
from sklearn.metrics import silhouette_score

<a id="5"></a>
# 5. Import Dataset

In [None]:
#import dataset
poke = pd.read_csv("/kaggle/input/pokemon-type-matchup-data/PokeTypeMatchupData.csv")
print(poke.head())

<a id="6"></a>
# 6. Explore the Dataset and Calculate the Weakness Scores

In [None]:
#define variables
n=len(poke)
unique_type_combos = 154
numeric_columns = poke.columns.drop(['Name'])
poke_type_columns = numeric_columns.drop(['Number'])

In [None]:
print(numeric_columns)
print(type(poke["Normal"][0]))

In [None]:
#strip the first character from each value of each field
#and convert the whole column to a float type
#(vectorized; avoids chained assignment such as poke[name][i] = ...)
for name in numeric_columns:
    poke[name] = poke[name].str[1:].astype(float)

In [None]:
print(poke.head())
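The cleaning step above can be tried on a toy frame. This is only a sketch: the `*` prefix, the column names, and the `ExampleA`/`ExampleB` rows are assumptions made for illustration, since the raw file contents are not shown here.

```python
import pandas as pd

# Hypothetical raw rows: every value carries one leading character ("*" is assumed).
raw = pd.DataFrame({
    "Name": ["ExampleA", "ExampleB"],
    "Normal": ["*1", "*0.5"],
    "Fire": ["*2", "*0.25"],
})

value_columns = raw.columns.drop("Name")

clean = raw.copy()
for name in value_columns:
    # .str[1:] drops the first character of every string in the column,
    # and .astype(float) converts the remaining text to numbers.
    clean[name] = clean[name].str[1:].astype(float)

print(clean)
print(clean.dtypes)
```

Assigning a whole column at once avoids element-by-element writes, which pandas may apply to a temporary copy rather than to the frame itself.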

In [None]:
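#let's rank each pokemon based on how well it defends against each type
#(a lower score means less damage taken overall; the starting value of 1
#shifts every score equally, so the ranking is unaffected)
poke["score"] = 1
for i in poke_type_columns:
    poke["score"] = poke["score"]+poke[i]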

In [None]:
#check the 20 strongest (lowest score) and 20 weakest (highest score) pokemon
print(poke.sort_values("score")[["Name","score"]].head(20))
print(poke.sort_values("score")[["Name","score"]].tail(20))

In [None]:
x = poke["score"]
plt.hist(x)
print("Overall our best score is", str(min(x)), " and our worst score is ", str(max(x)))

In [None]:
#list the pokemon with the best (lowest) score
print(poke[poke["score"] == 14.25]["Name"])

All of the above Pokemon have the type combination of steel and fairy. This is an impressive combination because, defensively, steel is strong against normal, grass, ice, flying, psychic, bug, rock, dragon, steel, and fairy. When fairy is added to this type combination, it negates steel's weakness against fighting and increases its strength against bug, dark, and dragon. This leaves only weaknesses to ground and fire.

In [None]:
#list the pokemon with the worst (highest) score
print(poke[poke["score"] == 26]["Name"])

The above two Pokemon have the type combination of grass and ice. Although grass is defensively strong against water, grass, electric, and ground, it is weak against fire, ice, poison, flying, and bug. Additionally, when ice is added to the equation, the weaknesses extend to fighting, rock, and steel. The combination also has a double weakness to fire.
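The two extreme scores, 14.25 and 26, can be checked by hand. The sketch below assumes the data set matches the standard 18-type chart and counts a double weakness as 4; the dictionaries list only the non-neutral multipliers, and `defensive_score` is a hypothetical helper that mirrors the score cell (a starting value of 1 plus the sum over all attack types).

```python
TYPES = [
    "Normal", "Fire", "Water", "Electric", "Grass", "Ice",
    "Fighting", "Poison", "Ground", "Flying", "Psychic", "Bug",
    "Rock", "Ghost", "Dragon", "Dark", "Steel", "Fairy",
]

# Non-neutral multipliers only; every attack type not listed counts as 1.0.
STEEL_FAIRY = {
    "Fire": 2.0, "Ground": 2.0,        # the two remaining weaknesses
    "Poison": 0.0, "Dragon": 0.0,      # immunities
    "Bug": 0.25,                       # resisted by both types
    "Normal": 0.5, "Grass": 0.5, "Ice": 0.5, "Flying": 0.5,
    "Psychic": 0.5, "Rock": 0.5, "Dark": 0.5, "Fairy": 0.5,
}
GRASS_ICE = {
    "Fire": 4.0,                       # double weakness
    "Flying": 2.0, "Bug": 2.0, "Poison": 2.0,
    "Fighting": 2.0, "Rock": 2.0, "Steel": 2.0,
    "Water": 0.5, "Grass": 0.5, "Electric": 0.5, "Ground": 0.5,
}


def defensive_score(non_neutral):
    """Mirror the notebook's score: start at 1, then add every multiplier."""
    return 1 + sum(non_neutral.get(attack, 1.0) for attack in TYPES)


print(defensive_score(STEEL_FAIRY))  # 14.25
print(defensive_score(GRASS_ICE))    # 26.0
```

Fighting against steel and fairy, and ice against grass and ice, are absent from the dictionaries because a weakness from one type and a resistance from the other cancel to 1.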

<a id="7"></a>
# 7. Cluster Analysis

In the above analysis we found which Pokemon types are the strongest and weakest. Let's go a bit further and see which type combinations are most similar to each other by using a cluster analysis. This way we can better understand which Pokemon we want to choose when forming our team.

## K-Means Clustering

In [None]:
#let's do a cluster analysis on each pokemon

kmeans = KMeans(
    init="random",
    n_clusters=8,
    n_init=10,
    max_iter=300,
    random_state=42
)
kmeans.fit(poke[poke_type_columns])

In [None]:
#analyze the output
# The lowest SSE value
print(kmeans.inertia_)

# Final locations of the centroids
print(kmeans.cluster_centers_)

# The number of iterations required to converge
print(kmeans.n_iter_)

# first 5 predicted labels
print(kmeans.labels_[:5])
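The SSE printed above (`inertia_`) is the sum of squared distances from each point to the centroid of its assigned cluster. A minimal sketch on synthetic data (two made-up blobs, not the Pokemon data) recomputes it by hand from `cluster_centers_` and `labels_`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: two well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.2, size=(20, 2)),
    rng.normal(3.0, 0.2, size=(20, 2)),
])

km = KMeans(n_clusters=2, init="random", n_init=10, max_iter=300, random_state=42)
km.fit(X)

# Look up the centroid assigned to every point, then sum the squared distances.
assigned_centers = km.cluster_centers_[km.labels_]
manual_sse = float(((X - assigned_centers) ** 2).sum())

print(manual_sse, km.inertia_)
```

The two printed numbers should agree up to floating-point rounding, which is why a lower `inertia_` indicates tighter clusters.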

Below I'll use the Elbow Method to find the optimal number of clusters.

In [None]:
# here we can use the elbow method to find the optimal number of clusters
kmeans_kwargs = {
    "init": "random",
    "n_init": 10,
    "max_iter": 300,
    "random_state": 42,
}

# A list holds the SSE values for each k
# (** is Python's dictionary unpacking operator; it passes kmeans_kwargs as keyword arguments)
sse = []
for k in range(1, 20):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(poke[poke_type_columns])
    sse.append(kmeans.inertia_)

In [None]:
plt.style.use("fivethirtyeight")
plt.plot(range(1, 20), sse)
plt.xticks(range(1, 20))
plt.xlabel("Number of Clusters")
plt.ylabel("SSE")
plt.show()

Below I will compare the Elbow Method to the Silhouette Method for finding the optimal number of clusters.

In [None]:
# A list holds the silhouette coefficients for each k
silhouette_coefficients = []
# no rescaling is applied: every column holds the same kind of damage multiplier
scaled_features = poke[poke_type_columns]

# Notice you start at 2 clusters for silhouette coefficient
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_features)
    score = silhouette_score(scaled_features, kmeans.labels_)
    silhouette_coefficients.append(score)

In [None]:
plt.style.use("fivethirtyeight")
plt.plot(range(2, 11), silhouette_coefficients)
plt.xticks(range(2, 11))
plt.xlabel("Number of Clusters")
plt.ylabel("Silhouette Coefficient")
plt.show()
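For reference, the silhouette coefficient of a point is `(b - a) / max(a, b)`, where `a` is its mean distance to the other points in its own cluster and `b` is its mean distance to the points of the nearest other cluster. The following pure-Python sketch applies that definition to four made-up 1-D points; `mean_silhouette` is a hypothetical helper, not the scikit-learn implementation.

```python
def mean_silhouette(points, labels):
    """Mean silhouette of 1-D points; assumes every cluster has at least two points."""
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # Collect distances from this point to all others, grouped by cluster.
        distances = {}
        for j, (other, other_label) in enumerate(zip(points, labels)):
            if j != i:
                distances.setdefault(other_label, []).append(abs(point - other))
        a = sum(distances[label]) / len(distances[label])
        b = min(sum(d) / len(d) for lab, d in distances.items() if lab != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [0.0, 1.0, 10.0, 11.0]
good = mean_silhouette(points, [0, 0, 1, 1])  # neighbours grouped together
bad = mean_silhouette(points, [0, 1, 0, 1])   # neighbours split apart

print(round(good, 3), round(bad, 3))
```

The sensible grouping scores close to 1 and the scrambled one scores below 0, which is why the number of clusters with the highest coefficient is preferred.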

## Hierarchical clustering

In [None]:
import numpy as np
import pandas as pd

import scipy
from scipy.cluster.hierarchy import dendrogram,linkage
from scipy.cluster.hierarchy import fcluster
from scipy.cluster.hierarchy import cophenet
from scipy.spatial.distance import pdist

import matplotlib.pyplot as plt
from pylab import rcParams

import sklearn
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
import sklearn.metrics as sm
from sklearn.preprocessing import scale

In [None]:
#Configure the output
np.set_printoptions(precision=4,suppress=True)
%matplotlib inline
rcParams["figure.figsize"] =20,10

In [None]:
z = linkage(scaled_features,"ward")

#generate dendrogram
dendrogram(z,truncate_mode= "lastp", p =10, leaf_rotation=45,leaf_font_size=15, show_contracted=True)
plt.title("Truncated Hierarchical Clustering Dendrogram")
plt.xlabel("Cluster Size")
plt.ylabel("Distance")
#draw a horizontal line at the distance where the tree is cut
plt.axhline(y=20)
plt.show()
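The horizontal line only marks the cut visually. To turn a cut into cluster labels, SciPy's `fcluster` can be applied to the linkage matrix. Here is a minimal sketch on synthetic data (two made-up blobs, with a threshold of 5.0 chosen for this toy example rather than the 20 used above):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in data: two tight, well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])

z_example = linkage(X, "ward")

# criterion="distance" keeps together every merge that happens below height t.
labels = fcluster(z_example, t=5.0, criterion="distance")

print(np.unique(labels, return_counts=True))
```

Passing `criterion="maxclust"` with `t` set to the desired number of clusters is an alternative when the count matters more than the cut height.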

<a id="8"></a>
# 8. Analyzing Clusters

Going further, I went with the hierarchical clustering results and chose 5 clusters. I did this because a trainer is able to choose 6 Pokemon for their team. The cluster analysis can therefore help us choose 5 Pokemon with disjoint type combinations, with one extra to be chosen by the trainer.

In [None]:
#Plotly libraries
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots

In [None]:
#refit K-Means with the 5 clusters chosen above

kmeans = KMeans(
    init="random",
    n_clusters=5,
    n_init=10,
    max_iter=300,
    random_state=42
)
kmeans.fit(poke[poke_type_columns])

# attach predicted labels to our dataset
poke["clusters"] = kmeans.labels_

#histogram of clusters
fig = px.histogram(poke, x="clusters")
fig.update_layout(
    title_text='Number of Pokemon in Each Cluster', # title of plot
    xaxis_title_text='Cluster', # xaxis label
    yaxis_title_text='Number of Pokemon', # yaxis label
    bargap=0.2, # gap between bars of adjacent location coordinates
)
fig.show()
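One possible way to use the labels for team building, sketched under assumptions: take the Pokemon with the lowest (best) score from each cluster. The frame below is made up (`team_pool`, the names `A` to `F`, and their scores are hypothetical); with the real data, the same `groupby` call would run on `poke`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's frame after scoring and clustering.
team_pool = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "score": [18.0, 16.5, 20.0, 19.0, 17.0, 21.0],
    "clusters": [0, 0, 1, 1, 2, 2],
})

# idxmin returns, per cluster, the row label of the lowest (best) score.
best_rows = team_pool.groupby("clusters")["score"].idxmin()
picks = team_pool.loc[best_rows, ["clusters", "Name", "score"]]

print(picks)
```

This is only one selection rule; a trainer might instead prefer a favourite Pokemon from each cluster, since the clusters already keep the type combinations apart.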

Now that we've assigned each Pokemon to a cluster, let's find the weaknesses and strengths of each cluster.

In [None]:
grid = []
for j in poke_type_columns:
    typ = []
    for i in range(0,5):
        typ.append(poke[poke["clusters"] == i][j].mean())
    grid.append(typ)

In [None]:
grid1 = pd.DataFrame(grid)
grid2 = grid1.transpose()
grid2.columns = poke_type_columns
print(grid2)
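The nested loops and the transpose above compute the mean multiplier per cluster and type. As a sketch of an equivalent, shorter route, pandas `groupby` gives the same table in one call; the toy frame and its two type columns are made up for illustration.

```python
import pandas as pd

# Toy frame with two type columns and a cluster label per row (illustrative only).
toy = pd.DataFrame({
    "Fire": [2.0, 1.0, 0.5, 0.5],
    "Water": [0.5, 0.5, 2.0, 1.0],
    "clusters": [0, 0, 1, 1],
})

# One row per cluster, one column per attack type, each cell a mean multiplier.
cluster_means = toy.groupby("clusters")[["Fire", "Water"]].mean()

print(cluster_means)
```

On the notebook's data the corresponding call would be `poke.groupby("clusters")[poke_type_columns].mean()`, assuming the type columns are numeric at that point.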

In [None]:
pts = []
for i in range(0,5):
    df = pd.DataFrame(grid2)
    pts.append(df.iloc[i, :])

### Cluster 1

In [None]:
fig_country = go.Figure()
fig_country.add_trace(go.Bar(x=poke_type_columns,y=pts[0], marker=dict(color=list(range(20)), colorscale="Sunsetdark")))

fig_country.update_layout(title="Cluster 1 Weakness Chart",
                             xaxis_title="Poke Type", yaxis_title="Damage Multiplier",title_x=0.5, paper_bgcolor="mintcream",
                             title_font_size=20)
fig_country.show()

From the above output we see that this cluster is weak against fire, ice, flying, and rock and strong against grass, fighting, and ground. This cluster consists of Pokemon that are grass, flying, fairy, or some combination of the three. Some Pokemon in this cluster include Butterfree, Exeggutor, Mr. Mime, Pinsir, and even Charizard.

In [None]:
#run to see 30 pokemon in this cluster
print(poke[poke["clusters"]==0]["Name"].head(30))

### Cluster 2

In [None]:
fig_country = go.Figure()
fig_country.add_trace(go.Bar(x=poke_type_columns,y=pts[1], marker=dict(color=list(range(20)), colorscale="Sunsetdark")))

fig_country.update_layout(title="Cluster 2 Weakness Chart",
                             xaxis_title="Poke Type", yaxis_title="Damage Multiplier",title_x=0.5, paper_bgcolor="mintcream",
                             title_font_size=20)
fig_country.show()

From the above output we see that this cluster is moderately weak against fire, water, and fighting and very weak against ground. This cluster is strong against grass, poison, and bug. It consists of Pokemon that are fire, electric, poison, or steel types, which explains the strong weakness to ground type attacks. Some Pokemon in this cluster include Ninetales, Jolteon, Gastly, and Magnemite.

In [None]:
#run to see 30 pokemon in this cluster
print(poke[poke["clusters"]==1]["Name"].head(30))

### Cluster 3

In [None]:
fig_country = go.Figure()
fig_country.add_trace(go.Bar(x=poke_type_columns,y=pts[2], marker=dict(color=list(range(20)), colorscale="Sunsetdark")))

fig_country.update_layout(title="Cluster 3 Weakness Chart",
                             xaxis_title="Poke Type", yaxis_title="Damage Multiplier",title_x=0.5, paper_bgcolor="mintcream",
                             title_font_size=20)
fig_country.show()

From the above output we see that this cluster is moderately weak against ice, ground, and fighting, very weak against water, and extremely weak against grass. This cluster is strong against poison and rock and extremely strong against electric. It consists mostly of Pokemon that are ground and rock types, which explains the strong weakness to grass type attacks and the strong defense against electric. Some Pokemon in this cluster include Diglett, Cubone, Tyranitar, and Rhydon.

In [None]:
#run to see 30 pokemon in this cluster
print(poke[poke["clusters"]==2]["Name"].head(30))

### Cluster 4

In [None]:
fig_country = go.Figure()
fig_country.add_trace(go.Bar(x=poke_type_columns,y=pts[3], marker=dict(color=list(range(20)), colorscale="Sunsetdark")))

fig_country.update_layout(title="Cluster 4 Weakness Chart",
                             xaxis_title="Poke Type", yaxis_title="Damage Multiplier",title_x=0.5, paper_bgcolor="mintcream",
                             title_font_size=20)
fig_country.show()

From the above output we see that this cluster is weak against electric, grass, ghost, and dark. This cluster is strong against fire, water, ice, and steel. It consists of Pokemon that are water and psychic types. Some Pokemon in this cluster include Abra, Wartortle, Slowpoke, Mew, and Kingler.

In [None]:
#run to see 30 pokemon in this cluster
print(poke[poke["clusters"]==3]["Name"].head(30))

### Cluster 5

In [None]:
fig_country = go.Figure()
fig_country.add_trace(go.Bar(x=poke_type_columns,y=pts[4], marker=dict(color=list(range(20)), colorscale="Sunsetdark")))

fig_country.update_layout(title="Cluster 5 Weakness Chart",
                             xaxis_title="Poke Type", yaxis_title="Damage Multiplier",title_x=0.5, paper_bgcolor="mintcream",
                             title_font_size=20)
fig_country.show()

The chart above can be read the same way as the previous ones: bars above 1 mark attack types this cluster is weak against, and bars below 1 mark attack types it resists. Run the cell below to see which Pokemon fall in this cluster.

In [None]:
#run to see 30 pokemon in this cluster
print(poke[poke["clusters"]==4]["Name"].head(30))

<a id="9"></a>
# 9. Conclusion

In conclusion, we used the Pokemon dataset to get a better understanding of Pokemon types and their combinations. I first calculated an aggregate score from the type matchup scores to find the weakest and strongest Pokemon type combinations. Then I used a K-Means cluster analysis to group like-type combinations, to better understand which types are most similar and to help us choose a Pokemon team.

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go to TOC</a>