https://github.com/RichardKCho/info2950final.git

#### Data analysis questions:
Which stats (HP, Attack, Defense, Sp. Attack, Sp. Defense, Speed), types, and generations are correlated with each tier (OU, UU, RU, NU, PU)? Once a correlation is identified, how can we use specific stats to predict a new Pokemon's tier ranking?

#### What are the observations (rows) and the attributes (columns)?
Rows: pokemon names

Columns: type1, type2, hp, attack, defense, sp.attack, sp.defense, speed, total, average, generation, usage rate, tier
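
Judging from the sample rows shown later in this notebook, `Total` appears to be the sum of the six base stats and `Average` that sum divided by six, rounded to two decimals. The sketch below checks this on three rows copied from the output; the helper name `add_total_and_average` is made up for illustration and is not part of the notebook.

```python
import pandas as pd

STAT_COLUMNS = ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]

def add_total_and_average(df):
    """Return a copy with Total and Average recomputed from the six base stats."""
    out = df.copy()
    out["Total"] = out[STAT_COLUMNS].sum(axis=1)
    out["Average"] = (out["Total"] / len(STAT_COLUMNS)).round(2)
    return out

# Three rows copied from the stats table displayed in this notebook.
sample = pd.DataFrame(
    [
        ["Bulbasaur", 45, 49, 49, 65, 65, 45],
        ["Ivysaur", 60, 62, 63, 80, 80, 60],
        ["Venusaur (Mega Venusaur)", 80, 100, 123, 122, 120, 80],
    ],
    columns=["Name"] + STAT_COLUMNS,
)
checked = add_total_and_average(sample)
print(checked[["Name", "Total", "Average"]])
```

If the recomputed columns ever disagree with the ones in the CSV, that would suggest the assumption about how they were derived is wrong.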

#### Why was this dataset created?
This dataset was created to keep track of how often (in percentage) certain pokemon were used in competitive pokemon battles. The dataset was organized so that users could easily see which pokemon were overused (good), poor-used (bad), and in between.

#### Who funded the creation of the dataset?
Smogon, a competitive pokemon site featuring analyses, articles, and popular forums. They would want this dataset created in order to have more pokemon players use the website for their needs in competitive play.

#### What processes might have influenced what data was observed and recorded and what was not? 
This dataset measures how frequently each Pokemon is used, so the collection recorded raw usage and percent usage along with the Pokemon names. Although Pokemon stats and moves are highly relevant in competitive play, they were not needed to compute usage percentages and were therefore not recorded. Moreover, the data was collected from the top players in competitive Pokemon play, so it is most representative of what the best players use in each tier.

#### What preprocessing was done, and how did the data come to be in the form that you are using? 
First, Pokemon in each tier with a usage rate below 0.00001% were eliminated, since they would be recorded as 0.00000%. Some Pokemon were listed as "Type: Null" and were also removed from the dataset. Finally, the stats dataset covering all Pokemon was merged with the tiered dataset to attach stat information to each Pokemon; any Pokemon that was not in the original dataset or that showed up more than once was removed.
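
The steps described above could be sketched roughly as follows. This is a minimal illustration on made-up miniature tables, not the exact procedure used for the project: the function name `preprocess`, the `min_usage` parameter, and the choice to keep the first of any repeated stats row are all assumptions.

```python
import pandas as pd

# Made-up miniature inputs; the real notebook reads these from CSV files.
stats = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Venusaur"],
    "HP": [45, 60, 80, 80],
})
usage = pd.DataFrame({
    "Name": ["Venusaur", "Ivysaur", "Type: Null", "Bulbasaur", "Pidove"],
    "Usage Rate": ["1.85513%", "0.00020%", "0.30000%", "0.00000%", "0.00001%"],
})

def preprocess(stats, usage, tier, min_usage=0.00001):
    """Clean one tier's usage table and attach stats (hypothetical helper)."""
    # Turn text such as "1.85513%" into the float 1.85513.
    rate = usage["Usage Rate"].str.rstrip("%").astype(float)
    # Drop near-zero usage and the "Type: Null" entries.
    keep = (rate >= min_usage) & (usage["Name"] != "Type: Null")
    cleaned = usage.loc[keep, ["Name"]].assign(**{"Usage Rate": rate[keep], "Tier": tier})
    # Assumption: when a name repeats in the stats table, keep its first row.
    unique_stats = stats.drop_duplicates(subset="Name", keep="first")
    # An inner merge drops any Pokemon missing from either table.
    return unique_stats.merge(cleaned, on="Name", how="inner")

result = preprocess(stats, usage, tier="OU")
print(result)
```

Here Pidove passes the usage threshold but has no stats row, so the inner merge drops it.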

#### Potential problems with the dataset.
One potential problem with the dataset is that Pokemon are listed in multiple tiers, because Smogon allows competitive players to use lower-tiered Pokemon in higher tiers. For example, Venusaur exists in the UU tier but is sometimes used in the OU tier as a surprise pick. Moreover, less evolved and weaker Pokemon (e.g., Bulbasaur) are listed in several tiers of gameplay. We have currently left them in to gauge their impact as outliers and to understand the extent to which stats factor into a Pokemon's usage rate in competitive play.
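
One way to see how widespread the multi-tier issue is would be to count the distinct tiers each Pokemon appears in. The sketch below does this on a handful of (Name, Tier) pairs copied from the merged table shown later in this notebook; the variable names are illustrative only.

```python
import pandas as pd

# (Name, Tier) pairs copied from the first rows of the merged table.
tier_rows = pd.DataFrame({
    "Name": ["Bulbasaur"] * 3 + ["Ivysaur"] * 5 + ["Venusaur"] * 2,
    "Tier": ["OU", "RU", "NU",
             "OU", "UU", "RU", "NU", "PU",
             "OU", "UU"],
})

# Number of distinct tiers in which each Pokemon has recorded usage.
tiers_per_pokemon = tier_rows.groupby("Name")["Tier"].nunique()

# Pokemon that show up in more than one tier.
multi_tier = tiers_per_pokemon[tiers_per_pokemon > 1]
print(multi_tier.sort_values(ascending=False))
```

Run on the full merged DataFrame, the same `groupby` would show how many Pokemon are affected before deciding whether to keep one row per Pokemon.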

#### If people are involved, were they aware of the data collection and if so, what purpose did they expect the data to be used for? 
N/A

#### Where can your raw source data be found, if applicable? Provide a link to the raw data (hosted in a Cornell Google Drive or Cornell Box).
https://www.smogon.com/stats/2020-04/

In [27]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from scipy.stats import spearmanr

In [16]:
data = pd.read_csv("pokemon_gen_1_to_8.csv")
pok = pd.DataFrame(data)
pok.head()

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.0,1
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.5,1
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,525,87.5,1
3,3,Venusaur (Mega Venusaur),Grass,Poison,80,100,123,122,120,80,625,104.17,1
4,4,Charmander,Fire,,39,52,43,60,50,65,309,51.5,1


In [17]:
#creating a DataFrame of OU Pokemon with their usage rate and tier
ou = pd.read_csv("gen8_ou.csv")
ou.columns = ["Name", "Usage Rate"]
tier = "OU"
ou["Tier"] = tier
#for col in ou.columns:
 #   print(col)
ou_merge = pok.merge(ou)
ou_merge

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation,Usage Rate,Tier
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.00,1,0.00850%,OU
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.50,1,0.00020%,OU
2,3,Venusaur,Grass,Poison,80,82,83,100,100,80,525,87.50,1,1.85513%,OU
3,4,Charmander,Fire,,39,52,43,60,50,65,309,51.50,1,0.00001%,OU
4,5,Charmeleon,Fire,,58,64,58,80,65,80,405,67.50,1,0.00013%,OU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
324,882,Dracovish,Water,Dragon,90,90,100,70,80,75,505,84.17,8,15.58086%,OU
325,883,Arctovish,Water,Ice,90,90,100,80,90,55,505,84.17,8,0.01585%,OU
326,884,Duraludon,Steel,Dragon,70,95,115,120,50,85,535,89.17,8,0.09644%,OU
327,886,Drakloak,Dragon,Ghost,68,80,50,60,50,102,410,68.33,8,0.00001%,OU


In [18]:
#creating a DataFrame of UU Pokemon with their usage rate and tier
uu = pd.read_csv("gen8_uu.csv")
uu.columns = ["Name", "Usage Rate"]
tier = "UU"
uu["Tier"] = tier
uu_merge = pok.merge(uu)
uu_merge

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation,Usage Rate,Tier
0,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.50,1,0.04553%,UU
1,3,Venusaur,Grass,Poison,80,82,83,100,100,80,525,87.50,1,6.07078%,UU
2,6,Charizard,Fire,Flying,78,84,78,109,85,100,534,89.00,1,0.57949%,UU
3,8,Wartortle,Water,,59,63,80,65,80,58,405,67.50,1,0.00622%,UU
4,9,Blastoise,Water,,79,83,100,85,105,78,530,88.33,1,3.01063%,UU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
239,879,Copperajah,Steel,,122,130,69,80,69,30,500,83.33,8,2.50127%,UU
240,881,Arctozolt,Electric,Ice,90,100,90,90,80,55,505,84.17,8,0.01599%,UU
241,883,Arctovish,Water,Ice,90,90,100,80,90,55,505,84.17,8,0.00819%,UU
242,884,Duraludon,Steel,Dragon,70,95,115,120,50,85,535,89.17,8,3.09772%,UU


In [19]:
#creating a DataFrame of RU Pokemon with their usage rate and tier
ru = pd.read_csv("gen8_ru.csv")
ru.columns = ["Name", "Usage Rate"]
tier = "RU"
ru["Tier"] = tier
ru_merge = pok.merge(ru)
ru_merge

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation,Usage Rate,Tier
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.00,1,0.00025%,RU
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.50,1,0.15707%,RU
2,6,Charizard,Fire,Flying,78,84,78,109,85,100,534,89.00,1,16.40012%,RU
3,8,Wartortle,Water,,59,63,80,65,80,58,405,67.50,1,1.47330%,RU
4,12,Butterfree,Bug,Flying,60,45,50,90,80,70,395,65.83,1,0.10416%,RU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
218,877,Morpeko,Electric,Dark,58,95,58,70,58,97,436,72.67,8,6.00128%,RU
219,879,Copperajah,Steel,,122,130,69,80,69,30,500,83.33,8,15.06932%,RU
220,881,Arctozolt,Electric,Ice,90,100,90,90,80,55,505,84.17,8,0.49840%,RU
221,883,Arctovish,Water,Ice,90,90,100,80,90,55,505,84.17,8,0.82586%,RU


In [20]:
#creating a DataFrame of NU Pokemon with their usage rate and tier
nu = pd.read_csv("gen8_nu.csv")
nu.columns = ["Name", "Usage Rate"]
tier = "NU"
nu["Tier"] = tier
nu_merge = pok.merge(nu)
nu_merge

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation,Usage Rate,Tier
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.00,1,0.00025%,NU
1,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.50,1,1.87055%,NU
2,5,Charmeleon,Fire,,58,64,58,80,65,80,405,67.50,1,0.00283%,NU
3,8,Wartortle,Water,,59,63,80,65,80,58,405,67.50,1,0.12936%,NU
4,12,Butterfree,Bug,Flying,60,45,50,90,80,70,395,65.83,1,3.84170%,NU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
217,871,Pincurchin,Electric,,48,101,95,91,85,15,435,72.50,8,0.10565%,NU
218,874,Stonjourner,Rock,,100,125,135,20,20,70,470,78.33,8,1.02128%,NU
219,881,Arctozolt,Electric,Ice,90,100,90,90,80,55,505,84.17,8,0.44822%,NU
220,883,Arctovish,Water,Ice,90,90,100,80,90,55,505,84.17,8,5.29169%,NU


In [21]:
#creating a DataFrame of PU Pokemon with their usage rate and tier
pu = pd.read_csv("gen8_pu.csv")
pu.columns = ["Name", "Usage Rate"]
tier = "PU"
pu["Tier"] = tier
pu_merge = pok.merge(pu)
pu_merge

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation,Usage Rate,Tier
0,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.50,1,1.44871%,PU
1,5,Charmeleon,Fire,,58,64,58,80,65,80,405,67.50,1,0.14623%,PU
2,7,Squirtle,Water,,44,48,65,50,64,43,314,52.33,1,0.00001%,PU
3,8,Wartortle,Water,,59,63,80,65,80,58,405,67.50,1,1.70883%,PU
4,25,Pikachu,Electric,,35,55,40,50,50,90,320,53.33,1,6.07391%,PU
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,874,Stonjourner,Rock,,100,125,135,20,20,70,470,78.33,8,16.52269%,PU
174,878,Cufant,Steel,,72,80,49,40,49,40,330,55.00,8,0.11456%,PU
175,881,Arctozolt,Electric,Ice,90,100,90,90,80,55,505,84.17,8,0.36365%,PU
176,883,Arctovish,Water,Ice,90,90,100,80,90,55,505,84.17,8,0.60533%,PU


In [22]:
#creating a singular DataFrame with all tiers
new_df = pd.concat([ou, uu, ru, nu, pu], axis=0)
#print(len(new_df))

#filtering out the "Type: Null" Pokemon (the result is assigned back so the filter is actually kept)
new_df = new_df[~new_df.Name.str.contains("Type: Null")]
#print(len(new_df))
new_df

Unnamed: 0,Name,Usage Rate,Tier
0,Clefable,56.53422%,OU
1,Corviknight,31.22480%,OU
2,Toxapex,28.25718%,OU
3,Hippowdon,25.68515%,OU
4,Ferrothorn,25.17680%,OU
...,...,...,...
202,Wailmer,0.00001%,PU
203,Pidove,0.00001%,PU
204,Squirtle,0.00001%,PU
205,Wooloo,0.00001%,PU


In [23]:
#merging the Pokemon stats data with their usage rate and tier
merged = pok.merge(new_df, on="Name")
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(merged)
  #  print(len(merged))

Unnamed: 0,#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Attack,Sp. Defense,Speed,Total,Average,Generation,Usage Rate,Tier
0,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.0,1,0.00850%,OU
1,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.0,1,0.00025%,RU
2,1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,318,53.0,1,0.00025%,NU
3,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.5,1,0.00020%,OU
4,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.5,1,0.04553%,UU
5,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.5,1,0.15707%,RU
6,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.5,1,1.87055%,NU
7,2,Ivysaur,Grass,Poison,60,62,63,80,80,60,405,67.5,1,1.44871%,PU
8,3,Venusaur,Grass,Poison,80,82,83,100,100,80,525,87.5,1,1.85513%,OU
9,3,Venusaur,Grass,Poison,80,82,83,100,100,80,525,87.5,1,6.07078%,UU


In [28]:
#OU Graphs: Log of Usage Rates are taken because the values are very small
ou_hp = ou_merge["HP"]
ou_atk = ou_merge["Attack"]
ou_def = ou_merge["Defense"]
ou_spatk = ou_merge["Sp. Attack"]
ou_spdef = ou_merge["Sp. Defense"]
ou_spd = ou_merge["Speed"]

#Usage Rate is read in as text such as "0.00850%", so strip the "%" and convert to float before taking the log
ou_usage = ou_merge["Usage Rate"].str.rstrip("%").astype(float)
log_use = np.log(ou_usage)

ou_stats = {"HP": ou_hp, "Attack": ou_atk, "Defense": ou_def,
            "Sp. Attack": ou_spatk, "Sp. Defense": ou_spdef, "Speed": ou_spd}

#one scatter plot per stat against the log of the usage rate
for stat, values in ou_stats.items():
    plt.scatter(values, log_use)
    plt.title("OU: " + stat + " vs. Log of Usage Rate")
    plt.xlabel(stat + " Stat")
    plt.ylabel("Log of Usage Rate")
    plt.xlim(0,)
    plt.show()

In [14]:
#UU Graphs: Log of Usage Rates are taken because the values are very small
uu_hp = uu_merge["HP"]
uu_atk = uu_merge["Attack"]
uu_def = uu_merge["Defense"]
uu_spatk = uu_merge["Sp. Attack"]
uu_spdef = uu_merge["Sp. Defense"]
uu_spd = uu_merge["Speed"]

#Usage Rate is read in as text such as "0.04553%", so strip the "%" and convert to float before taking the log
uu_usage = uu_merge["Usage Rate"].str.rstrip("%").astype(float)
log_use = np.log(uu_usage)

uu_stats = {"HP": uu_hp, "Attack": uu_atk, "Defense": uu_def,
            "Sp. Attack": uu_spatk, "Sp. Defense": uu_spdef, "Speed": uu_spd}

#one scatter plot per stat against the log of the usage rate
for stat, values in uu_stats.items():
    plt.scatter(values, log_use)
    plt.title("UU: " + stat + " vs. Log of Usage Rate")
    plt.xlabel(stat + " Stat")
    plt.ylabel("Log of Usage Rate")
    plt.xlim(0,)
    plt.show()

In [26]:
#RU Graphs: Log of Usage Rates are taken because the values are very small
ru_hp = ru_merge["HP"]
ru_atk = ru_merge["Attack"]
ru_def = ru_merge["Defense"]
ru_spatk = ru_merge["Sp. Attack"]
ru_spdef = ru_merge["Sp. Defense"]
ru_spd = ru_merge["Speed"]

#Usage Rate is read in as text such as "0.00025%", so strip the "%" and convert to float before taking the log
ru_usage = ru_merge["Usage Rate"].str.rstrip("%").astype(float)
log_use = np.log(ru_usage)

ru_stats = {"HP": ru_hp, "Attack": ru_atk, "Defense": ru_def,
            "Sp. Attack": ru_spatk, "Sp. Defense": ru_spdef, "Speed": ru_spd}

#one scatter plot per stat against the log of the usage rate
for stat, values in ru_stats.items():
    plt.scatter(values, log_use)
    plt.title("RU: " + stat + " vs. Log of Usage Rate")
    plt.xlabel(stat + " Stat")
    plt.ylabel("Log of Usage Rate")
    plt.xlim(0,)
    plt.show()

In [29]:
#NU Graphs: Log of Usage Rates are taken because the values are very small
nu_hp = nu_merge["HP"]
nu_atk = nu_merge["Attack"]
nu_def = nu_merge["Defense"]
nu_spatk = nu_merge["Sp. Attack"]
nu_spdef = nu_merge["Sp. Defense"]
nu_spd = nu_merge["Speed"]

#Usage Rate is read in as text such as "0.00025%", so strip the "%" and convert to float before taking the log
nu_usage = nu_merge["Usage Rate"].str.rstrip("%").astype(float)
log_use = np.log(nu_usage)

nu_stats = {"HP": nu_hp, "Attack": nu_atk, "Defense": nu_def,
            "Sp. Attack": nu_spatk, "Sp. Defense": nu_spdef, "Speed": nu_spd}

#one scatter plot per stat against the log of the usage rate
for stat, values in nu_stats.items():
    plt.scatter(values, log_use)
    plt.title("NU: " + stat + " vs. Log of Usage Rate")
    plt.xlabel(stat + " Stat")
    plt.ylabel("Log of Usage Rate")
    plt.xlim(0,)
    plt.show()

In [30]:
#PU Graphs: Log of Usage Rates are taken because the values are very small
pu_hp = pu_merge["HP"]
pu_atk = pu_merge["Attack"]
pu_def = pu_merge["Defense"]
pu_spatk = pu_merge["Sp. Attack"]
pu_spdef = pu_merge["Sp. Defense"]
pu_spd = pu_merge["Speed"]

#Usage Rate is read in as text such as "1.44871%", so strip the "%" and convert to float before taking the log
pu_usage = pu_merge["Usage Rate"].str.rstrip("%").astype(float)
log_use = np.log(pu_usage)

pu_stats = {"HP": pu_hp, "Attack": pu_atk, "Defense": pu_def,
            "Sp. Attack": pu_spatk, "Sp. Defense": pu_spdef, "Speed": pu_spd}

#one scatter plot per stat against the log of the usage rate
for stat, values in pu_stats.items():
    plt.scatter(values, log_use)
    plt.title("PU: " + stat + " vs. Log of Usage Rate")
    plt.xlabel(stat + " Stat")
    plt.ylabel("Log of Usage Rate")
    plt.xlim(0,)
    plt.show()

In [31]:
#Regression analysis to see the correlation between specific stats and usage rate within the OU tier
#Usage Rate is stored as text such as "0.00850%", so convert it to a float percentage first
ou_usage = ou_merge["Usage Rate"].str.rstrip("%").astype(float)

for stat in ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]:
    #fitting against a 1-D target makes coef_ a 1-D array with a single slope
    reg = LinearRegression().fit(ou_merge[[stat]], ou_usage)
    print("The correlation coefficient of", stat, "and OU Usage Rate is", float(np.corrcoef(ou_merge[stat], ou_usage)[0][1]))
    print("The regression slope of", stat, "and the OU Usage Rate is", float(reg.coef_[0]))
    print("The Spearman Coefficient of", stat, "and the OU Usage Rate is", spearmanr(ou_merge[stat], ou_usage).correlation)
    print()
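
The regression cells report both a Pearson correlation coefficient and a Spearman coefficient. As a rough illustration of why the two can differ, the sketch below uses made-up numbers in which usage grows monotonically but not linearly with a stat. It computes the Spearman coefficient as the Pearson coefficient of the ranks, which is its definition when there are no ties; the `toy` table and its values are invented for this example.

```python
import numpy as np
import pandas as pd

# Made-up numbers: usage rises tenfold for every 20 points of Speed.
toy = pd.DataFrame({
    "Speed": [30, 50, 70, 90, 110],
    "Usage Rate": [0.001, 0.01, 0.1, 1.0, 10.0],
})

# Pearson measures how close the relationship is to a straight line.
pearson = np.corrcoef(toy["Speed"], toy["Usage Rate"])[0][1]

# Spearman is the Pearson coefficient computed on the ranks of each variable.
spearman = np.corrcoef(toy["Speed"].rank(), toy["Usage Rate"].rank())[0][1]

# Taking the log of usage makes this particular relationship linear.
pearson_log = np.corrcoef(toy["Speed"], np.log(toy["Usage Rate"]))[0][1]

print("Pearson on raw usage:", round(pearson, 3))
print("Spearman:", round(spearman, 3))
print("Pearson on log usage:", round(pearson_log, 3))
```

This is one reason the scatter plots in this notebook use the log of the usage rate: heavily skewed usage values can hide a monotonic trend from a linear measure.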

In [32]:
#Regression analysis to see the correlation between specific stats and usage rate within the UU tier
#Usage Rate is stored as text such as "0.04553%", so convert it to a float percentage first
uu_usage = uu_merge["Usage Rate"].str.rstrip("%").astype(float)

for stat in ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]:
    #fitting against a 1-D target makes coef_ a 1-D array with a single slope
    reg = LinearRegression().fit(uu_merge[[stat]], uu_usage)
    print("The correlation coefficient of", stat, "and UU Usage Rate is", float(np.corrcoef(uu_merge[stat], uu_usage)[0][1]))
    print("The regression slope of", stat, "and the UU Usage Rate is", float(reg.coef_[0]))
    print("The Spearman Coefficient of", stat, "and the UU Usage Rate is", spearmanr(uu_merge[stat], uu_usage).correlation)
    print()

In [33]:
#Regression analysis to see the correlation between specific stats and usage rate within the RU tier
#Usage Rate is stored as text such as "0.00025%", so convert it to a float percentage first
ru_usage = ru_merge["Usage Rate"].str.rstrip("%").astype(float)

for stat in ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]:
    #fitting against a 1-D target makes coef_ a 1-D array with a single slope
    reg = LinearRegression().fit(ru_merge[[stat]], ru_usage)
    print("The correlation coefficient of", stat, "and RU Usage Rate is", float(np.corrcoef(ru_merge[stat], ru_usage)[0][1]))
    print("The regression slope of", stat, "and the RU Usage Rate is", float(reg.coef_[0]))
    print("The Spearman Coefficient of", stat, "and the RU Usage Rate is", spearmanr(ru_merge[stat], ru_usage).correlation)
    print()

In [34]:
#Regression analysis to see the correlation between specific stats and usage rate within the NU tier
#Usage Rate is stored as text such as "0.00025%", so convert it to a float percentage first
nu_usage = nu_merge["Usage Rate"].str.rstrip("%").astype(float)

for stat in ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]:
    #fitting against a 1-D target makes coef_ a 1-D array with a single slope
    reg = LinearRegression().fit(nu_merge[[stat]], nu_usage)
    print("The correlation coefficient of", stat, "and NU Usage Rate is", float(np.corrcoef(nu_merge[stat], nu_usage)[0][1]))
    print("The regression slope of", stat, "and the NU Usage Rate is", float(reg.coef_[0]))
    print("The Spearman Coefficient of", stat, "and the NU Usage Rate is", spearmanr(nu_merge[stat], nu_usage).correlation)
    print()

In [35]:
#Regression analysis to see the correlation between specific stats and usage rate within the PU tier
#Usage Rate is stored as text such as "1.44871%", so convert it to a float percentage first
pu_usage = pu_merge["Usage Rate"].str.rstrip("%").astype(float)

for stat in ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]:
    #fitting against a 1-D target makes coef_ a 1-D array with a single slope
    reg = LinearRegression().fit(pu_merge[[stat]], pu_usage)
    print("The correlation coefficient of", stat, "and PU Usage Rate is", float(np.corrcoef(pu_merge[stat], pu_usage)[0][1]))
    print("The regression slope of", stat, "and the PU Usage Rate is", float(reg.coef_[0]))
    print("The Spearman Coefficient of", stat, "and the PU Usage Rate is", spearmanr(pu_merge[stat], pu_usage).correlation)
    print()