# Import Data

<br>
<br>

Below are brief explanations of the data we are importing and what these variables represent:
* `ALL_DATA` _(data frame)_ - our main data set.
* `poke_types` _(data frame)_ - a separate data set that maps Pokemon IDs to Pokemon names and types. This is important because our main dependent variable is **Pokemon type**.
* `pokemonId_ALL` _(set)_ - all unique Pokemon IDs that exist in `ALL_DATA`. A set is unordered, so sort it if the IDs are needed from smallest to biggest.

<br>
<br>
<br>
<br>


In [16]:
import pandas as pd # data frames
import numpy as np # numerical arrays and random number generation
import statsmodels.formula.api as smf # for linear modeling
import matplotlib.pyplot as plt # plotting

In [17]:
ALL_DATA = pd.read_csv('data/300k.csv')



In [3]:
poke_types = pd.read_csv('data/pokeId.csv')
poke_types = poke_types[['#', "Name", "Type 1", "Type 2"]]

pokemonId_ALL = set(ALL_DATA.pokemonId)
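A minimal sketch of getting the unique IDs in ascending order, using a small made-up stand-in for `ALL_DATA` (the name `toy_data` and its values are hypothetical and only serve as illustration):

```python
import pandas as pd

# Hypothetical stand-in for ALL_DATA; only the pokemonId column matters here.
toy_data = pd.DataFrame({"pokemonId": [16, 1, 16, 4, 1, 133]})

# unique() drops duplicates, sorted() orders the IDs from smallest to largest.
pokemon_ids_sorted = sorted(int(i) for i in toy_data["pokemonId"].unique())

print(pokemon_ids_sorted)  # [1, 4, 16, 133]
```

A sorted list keeps membership checks possible (`4 in pokemon_ids_sorted`), although a set stays faster for lookups on large collections.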

<br>
<br>
<br>

## Small Helper Functions

In [7]:
def printAllFeatures():
    cols = ALL_DATA.columns
    for col in cols:
        print(col)

<br>
<br>
<br>

## Prepare Data

In [9]:
# Create a mapping of IDs to type. We need to do this in order to add the
# appropriate type to each pokemon in our main data set.

pokeId_toType = {}
for index, row in poke_types.iterrows():
    if row["#"] in pokemonId_ALL: #only add poke that are in main data set
        pokeId_toType[row["#"]] = [row["Name"] ,row["Type 1"], row["Type 2"]]
        
        
pokeId_toType

{1: ['Bulbasaur', 'Grass', 'Poison'],
 2: ['Ivysaur', 'Grass', 'Poison'],
 3: ['VenusaurMega Venusaur', 'Grass', 'Poison'],
 4: ['Charmander', 'Fire', nan],
 5: ['Charmeleon', 'Fire', nan],
 6: ['CharizardMega Charizard Y', 'Fire', 'Flying'],
 7: ['Squirtle', 'Water', nan],
 8: ['Wartortle', 'Water', nan],
 9: ['BlastoiseMega Blastoise', 'Water', nan],
 10: ['Caterpie', 'Bug', nan],
 11: ['Metapod', 'Bug', nan],
 12: ['Butterfree', 'Bug', 'Flying'],
 13: ['Weedle', 'Bug', 'Poison'],
 14: ['Kakuna', 'Bug', 'Poison'],
 15: ['BeedrillMega Beedrill', 'Bug', 'Poison'],
 16: ['Pidgey', 'Normal', 'Flying'],
 17: ['Pidgeotto', 'Normal', 'Flying'],
 18: ['PidgeotMega Pidgeot', 'Normal', 'Flying'],
 19: ['Rattata', 'Normal', nan],
 20: ['Raticate', 'Normal', nan],
 21: ['Spearow', 'Normal', 'Flying'],
 22: ['Fearow', 'Normal', 'Flying'],
 23: ['Ekans', 'Poison', nan],
 24: ['Arbok', 'Poison', nan],
 25: ['Pikachu', 'Electric', nan],
 26: ['Raichu', 'Electric', nan],
 27: ['Sandshrew', 'Ground', 
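The same kind of ID-to-type mapping can probably be built without `iterrows`, which tends to be slow on larger frames. The sketch below is one possible alternative, not the notebook's own method; `toy_types`, `toy_ids`, and `id_to_type` are hypothetical names, and the rows are a tiny made-up sample with the same column names as `poke_types`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for poke_types, using the same column names.
toy_types = pd.DataFrame({
    "#": [1, 4, 7, 25],
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    "Type 1": ["Grass", "Fire", "Water", "Electric"],
    "Type 2": ["Poison", np.nan, np.nan, np.nan],
})
toy_ids = {1, 4, 25}  # hypothetical IDs present in the main data set

# Keep only the rows whose ID appears in the main data set.
kept = toy_types[toy_types["#"].isin(toy_ids)]

# Build {id: [name, type 1, type 2]} by walking the columns in parallel.
id_to_type = {
    poke_id: [name, type1, type2]
    for poke_id, name, type1, type2 in zip(
        kept["#"], kept["Name"], kept["Type 1"], kept["Type 2"]
    )
}

print(id_to_type[4][:2])  # ['Charmander', 'Fire']
```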

In [29]:
# Check if you have the merged data set. If you do not then build and save it locally.
# The print statements do a good job of informing what's going on.

import os.path
if not os.path.exists("data/merged.csv"): 
    
    print("Hey looks like you're missing the merged data, let me build that for you, it'll take 2-5 minutes probably.\n")
    
    # Add the new columns
    ALL_DATA["Name"] = ""
    ALL_DATA["Type"] = "" 

    def update_row(row):
        tempTypes = pokeId_toType[row["pokemonId"]]   
        listy = [tempTypes[0], tempTypes[1]]
        return pd.Series(listy)

    print("Merging data...")
    ALL_DATA[['Name', 'Type']] = ALL_DATA.apply(update_row, axis=1)
    print("Done merging data.\n")

    print("Saving merge data set to ./data/merged.csv")
    ALL_DATA.to_csv("data/merged.csv",index=False)
    print("\n...Done saving data. Move on now.\n")
    ALL_DATA.head()
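else:
    print("Good job, you already have the merged data. Move on.")
    # Load the saved merge so the Name and Type columns are available below.
    ALL_DATA = pd.read_csv("data/merged.csv")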

Good job, you already have the merged data.
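The row-wise `apply` above is likely where most of the 2-5 minutes go. A possible faster variant is to look up each ID with `Series.map`. The sketch below assumes a mapping shaped like `pokeId_toType`; `toy_data`, `toy_id_to_type`, `id_to_name`, and `id_to_primary` are hypothetical names with made-up values.

```python
import pandas as pd

# Hypothetical stand-ins for ALL_DATA and pokeId_toType.
toy_data = pd.DataFrame({"pokemonId": [4, 1, 4, 25]})
toy_id_to_type = {
    1: ["Bulbasaur", "Grass", "Poison"],
    4: ["Charmander", "Fire", None],
    25: ["Pikachu", "Electric", None],
}

# Split the mapping into one dictionary per new column.
id_to_name = {k: v[0] for k, v in toy_id_to_type.items()}
id_to_primary = {k: v[1] for k, v in toy_id_to_type.items()}

# Series.map looks up every ID in the dictionary in a single call.
toy_data["Name"] = toy_data["pokemonId"].map(id_to_name)
toy_data["Type"] = toy_data["pokemonId"].map(id_to_primary)

print(toy_data["Type"].tolist())  # ['Fire', 'Grass', 'Fire', 'Electric']
```

As in the cell above, only the first type is kept. IDs missing from the dictionary would come back as `NaN` rather than raising a `KeyError`.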


<br>
<br>
<br>

## Correlations
Visualize the correlations of features to our dependent variable: Pokemon type.

In [50]:
printAllFeatures()

pokemonId
latitude
longitude
appearedLocalTime
_id
cellId_90m
cellId_180m
cellId_370m
cellId_730m
cellId_1460m
cellId_2920m
cellId_5850m
appearedTimeOfDay
appearedHour
appearedMinute
appearedDayOfWeek
appearedDay
appearedMonth
appearedYear
terrainType
closeToWater
city
continent
weather
temperature
windSpeed
windBearing
pressure
weatherIcon
sunriseMinutesMidnight
sunriseHour
sunriseMinute
sunriseMinutesSince
sunsetMinutesMidnight
sunsetHour
sunsetMinute
sunsetMinutesBefore
population_density
urban
suburban
midurban
rural
gymDistanceKm
gymIn100m
gymIn250m
gymIn500m
gymIn1000m
gymIn2500m
gymIn5000m
pokestopDistanceKm
pokestopIn100m
pokestopIn250m
pokestopIn500m
pokestopIn1000m
pokestopIn2500m
pokestopIn5000m
cooc_1
cooc_2
cooc_3
cooc_4
cooc_5
cooc_6
cooc_7
cooc_8
cooc_9
cooc_10
cooc_11
cooc_12
cooc_13
cooc_14
cooc_15
cooc_16
cooc_17
cooc_18
cooc_19
cooc_20
cooc_21
cooc_22
cooc_23
cooc_24
cooc_25
cooc_26
cooc_27
cooc_28
cooc_29
cooc_30
cooc_31
cooc_32
cooc_33
cooc_34
cooc_35
cooc_36
cooc_

In [40]:
from matplotlib.pyplot import figure

correlation = ALL_DATA.corr() # .drop(['Type'], axis=1)

In [49]:
# "Name" and "Type" are text columns, so they never appear in the correlation
# matrix. Use the numeric pokemonId column as a rough stand-in for the Pokemon.
corr_sorted = correlation["pokemonId"].drop("pokemonId").sort_values()
features = list(corr_sorted.index)
corr_nums = list(corr_sorted.values)

figure(figsize=(10,10))
plt.barh(features, corr_nums, align='center', alpha=0.5)
plt.xlabel('Correlation')
plt.title('Correlation of numeric features to pokemonId')

plt.show()
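Because Pokemon type is a categorical label, `corr()` cannot relate it to the numeric features directly. One possible workaround is to one-hot encode the type and correlate each numeric feature with each type indicator. The sketch below uses a hypothetical frame `toy` with made-up values. The column names `temperature`, `windSpeed`, and `Type` mirror the merged data, but nothing here is computed from the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the merged data: two numeric features plus Type.
toy = pd.DataFrame({
    "temperature": [30.0, 28.0, 12.0, 10.0, 29.0, 11.0],
    "windSpeed": [2.0, 3.0, 9.0, 8.0, 2.5, 9.5],
    "Type": ["Fire", "Fire", "Water", "Water", "Fire", "Water"],
})

# One 0/1 indicator column per type, here Type_Fire and Type_Water.
type_dummies = pd.get_dummies(toy["Type"], prefix="Type").astype(float)

# Correlate every numeric feature with every type indicator.
numeric = toy[["temperature", "windSpeed"]]
corr_to_type = pd.DataFrame({
    type_col: numeric.corrwith(type_dummies[type_col])
    for type_col in type_dummies.columns
})

print(corr_to_type.round(2))
```

Each column of `corr_to_type` could then be passed to `plt.barh` to get one bar chart per type.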

<br>
<br>
<br>

### Pokemon Types based on weather features

In [None]:
col_pokeID = ["pokemonId"]
cols_time = ["appearedTimeOfDay", "appearedHour", "appearedMinute", "appearedDayOfWeek", "appearedDay", "appearedYear"]
cols_weather = ["weather", "windSpeed", "windBearing", "pressure", "weatherIcon", "sunriseMinutesMidnight", "sunriseHour", "sunriseMinute", "sunriseMinutesSince", "sunsetMinutesMidnight", "sunsetHour", "sunsetMinute", "sunsetMinutesBefore"]

cols = col_pokeID + cols_weather
geo_weather_data = ALL_DATA[cols]
geo_weather_data
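Once a `Type` column is available next to the weather columns, a cross-tabulation is one simple way to see how types are distributed per weather condition. This is a sketch under assumptions: `toy` and `type_share` are hypothetical names, and the weather labels and counts are invented for illustration.

```python
import pandas as pd

# Hypothetical sample: one weather label and one type per sighting.
toy = pd.DataFrame({
    "weather": ["Clear", "Clear", "Rain", "Rain", "Rain", "Clear"],
    "Type": ["Fire", "Normal", "Water", "Water", "Normal", "Fire"],
})

# Rows are weather values, columns are types.
# normalize="index" converts the counts in each row into shares that sum to 1.
type_share = pd.crosstab(toy["weather"], toy["Type"], normalize="index")

print(type_share.round(2))
```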