# Pokémon GO Battle Rankings Vectorization

This notebook converts the text-based columns of the Pokémon GO battle rankings data (move names and Pokémon types) into numeric IDs using dictionary mapping files. The resulting integer-encoded table can be passed to machine learning models and numeric analysis tools that cannot work with raw strings.

## 1. Import Required Libraries

First, we'll import pandas for reading and transforming the tabular data, and os for building file paths:

In [38]:
# Import the required libraries
import pandas as pd
import os

## 2. Define File Paths

Now we'll set up the paths for our data files and directories:

In [39]:
# Define paths
data_dir = os.path.dirname(os.path.abspath("__file__"))  # Resolves to the current working directory, because "__file__" is a quoted string
dict_dir = os.path.join(data_dir, 'dictionarie')
processed_dir = os.path.join(data_dir, 'processed_data')
vectorized_dir = os.path.join(data_dir, 'vectorized_data')

# Print paths to verify
print(f"Data directory: {data_dir}")
print(f"Dictionary directory: {dict_dir}")
print(f"Processed data directory: {processed_dir}")
print(f"Vectorized data directory: {vectorized_dir}")

Data directory: c:\Users\maxva\Documents\GitHub\PokemonGOBattleAssistant\data_acquisition
Dictionary directory: c:\Users\maxva\Documents\GitHub\PokemonGOBattleAssistant\data_acquisition\dictionarie
Processed data directory: c:\Users\maxva\Documents\GitHub\PokemonGOBattleAssistant\data_acquisition\processed_data
Vectorized data directory: c:\Users\maxva\Documents\GitHub\PokemonGOBattleAssistant\data_acquisition\vectorized_data
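
A note on the first path: because `"__file__"` is written in quotes, it is an ordinary string rather than the special module attribute, so `os.path.abspath` resolves it against the current working directory. The short sketch below uses only the standard library and illustrates that the expression should therefore give the same result as `os.getcwd()`. In other words, all four paths depend on the directory the kernel was started in.

```python
import os

# "__file__" in quotes is just a file name, not the special module attribute,
# so abspath() joins it onto the current working directory.
from_quoted_name = os.path.dirname(os.path.abspath("__file__"))
from_cwd = os.path.normpath(os.getcwd())

print(from_quoted_name == from_cwd)
```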


## 3. Create Output Directory

We'll check if the vectorized_data directory exists and create it if it doesn't:

In [40]:
# Create vectorized_data directory if it doesn't exist
if not os.path.exists(vectorized_dir):
    os.makedirs(vectorized_dir)
    print(f"Created directory: {vectorized_dir}")
else:
    print(f"Directory already exists: {vectorized_dir}")

Directory already exists: c:\Users\maxva\Documents\GitHub\PokemonGOBattleAssistant\data_acquisition\vectorized_data
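
If the status messages are not needed, the same effect can be had in a single call. The sketch below is an alternative rather than what the cell above does; it uses a temporary directory as a stand-in for the project folder (an assumption made so the example can run anywhere).

```python
import os
import tempfile

# A throwaway directory stands in for the project folder (assumption for this sketch).
with tempfile.TemporaryDirectory() as base_dir:
    target_dir = os.path.join(base_dir, "vectorized_data")

    # exist_ok=True makes the call safe to repeat: no error if the directory is already there.
    os.makedirs(target_dir, exist_ok=True)
    os.makedirs(target_dir, exist_ok=True)

    created = os.path.isdir(target_dir)

print(created)
```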


## 4. Load Mapping Dictionaries

Now we'll load the CSV files that contain mappings from text values to numeric IDs:

In [41]:
# Load dictionaries
fast_move_dict = pd.read_csv(os.path.join(dict_dir, 'fast_move_to_number.csv'))
charged_move_dict = pd.read_csv(os.path.join(dict_dir, 'charged_move_to_number.csv'))
type_dict = pd.read_csv(os.path.join(dict_dir, 'type_to_number.csv'))

# Display the first few rows of each dictionary
print("Fast Move Dictionary:")
display(fast_move_dict.head())

print("\nCharged Move Dictionary:")
display(charged_move_dict.head())

print("\nType Dictionary:")
display(type_dict.head())

Fast Move Dictionary:


,Fast_Move,Number
0,none,0
1,Acid,1
2,Acid†,2
3,Air Slash,3
4,Astonish,4



Charged Move Dictionary:


,Charged_Move,Number
0,none,0
1,Acid Spray,1
2,Acrobatics,2
3,Acrobatics*,3
4,Aerial Ace,4



Type Dictionary:


,Type,Number
0,none,0
1,bug,1
2,dark,2
3,dragon,3
4,electric,4
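
Since near-identical names such as `Acid` and `Acid†` are separate entries, it may be worth confirming that each name and each ID occurs only once before relying on the mapping. The following is a minimal sketch of such a check; `toy_dict` is a hypothetical stand-in built from the five fast-move rows displayed above, not the full CSV.

```python
import pandas as pd

# Toy stand-in for fast_move_to_number.csv, using the rows displayed above.
toy_dict = pd.DataFrame({
    "Fast_Move": ["none", "Acid", "Acid†", "Air Slash", "Astonish"],
    "Number": [0, 1, 2, 3, 4],
})

# Each name should appear once, and each ID should belong to exactly one name.
names_unique = toy_dict["Fast_Move"].is_unique
ids_unique = toy_dict["Number"].is_unique

print(names_unique, ids_unique)
```

A duplicated name would silently keep only its last ID once the table is turned into a Python dictionary, so this check is cheapest to run at load time.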


## 5. Create Lookup Tables

We'll convert the DataFrame dictionaries into plain Python dictionaries, keyed by name, so that each text value can be translated to its ID with a direct lookup:

In [42]:
# Convert dictionaries to lookup tables
fast_move_lookup = dict(zip(fast_move_dict['Fast_Move'], fast_move_dict['Number']))
charged_move_lookup = dict(zip(charged_move_dict['Charged_Move'], charged_move_dict['Number']))
type_lookup = dict(zip(type_dict['Type'], type_dict['Number']))

# Show a few examples from each lookup table
print("Fast Move Lookup Examples:")
for move, number in list(fast_move_lookup.items())[:5]:
    print(f"  {move} -> {number}")

print("\nCharged Move Lookup Examples:")
for move, number in list(charged_move_lookup.items())[:5]:
    print(f"  {move} -> {number}")

print("\nType Lookup Examples:")
for type_name, number in list(type_lookup.items())[:5]:
    print(f"  {type_name} -> {number}")

Fast Move Lookup Examples:
  none -> 0
  Acid -> 1
  Acid† -> 2
  Air Slash -> 3
  Astonish -> 4

Charged Move Lookup Examples:
  none -> 0
  Acid Spray -> 1
  Acrobatics -> 2
  Acrobatics* -> 3
  Aerial Ace -> 4

Type Lookup Examples:
  none -> 0
  bug -> 1
  dark -> 2
  dragon -> 3
  electric -> 4
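
The same structure can be inverted when IDs need to be turned back into names, for example when reading the vectorized output. This is a small sketch under the assumption that every ID is unique; `toy_type_lookup` and `toy_reverse_lookup` are hypothetical names, and the entries are the five type examples printed above.

```python
# Toy lookup containing the type examples printed above.
toy_type_lookup = {"none": 0, "bug": 1, "dark": 2, "dragon": 3, "electric": 4}

# Invert the mapping; this is only safe when every ID is unique.
toy_reverse_lookup = {number: name for name, number in toy_type_lookup.items()}
assert len(toy_reverse_lookup) == len(toy_type_lookup)

print(toy_reverse_lookup[3])  # dragon
```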


## 6. Load Rankings Data

Now we'll load the Pokémon GO battle rankings data that needs to be vectorized:

In [43]:
# Load the rankings data
rankings_data = pd.read_csv(os.path.join(processed_dir, 'all_overall_rankings.csv'))

# Display basic information about the dataset
print(f"Dataset shape: {rankings_data.shape}")
print("\nData columns:")
for col in rankings_data.columns:
    print(f"  {col}")

# Display the first few rows
print("\nFirst few rows of the rankings data:")
display(rankings_data.head())

Dataset shape: (1046, 18)

Data columns:
  Pokemon
  Score
  Dex
  Type 1
  Type 2
  Attack
  Defense
  Stamina
  Stat Product
  Level
  CP
  Fast Move
  Charged Move 1
  Charged Move 2
  Charged Move 1 Count
  Charged Move 2 Count
  Buddy Distance
  Charged Move Cost

First few rows of the rankings data:


,Pokemon,Score,Dex,Type 1,Type 2,Attack,Defense,Stamina,Stat Product,Level,CP,Fast Move,Charged Move 1,Charged Move 2,Charged Move 1 Count,Charged Move 2 Count,Buddy Distance,Charged Move Cost
0,Clodsire,94.4,980,poison,ground,94.2,119.4,209,2352152,29.0,1490,Poison Sting,Sludge Bomb,Earthquake,6,8,3,50000
1,Diggersby,93.6,660,normal,ground,96.3,141.2,171,2324733,48.0,1496,Quick Attack,Fire Punch,Scorching Sands,5,7,1,10000
2,Forretress,93.6,205,bug,steel,110.2,141.6,130,2028758,25.0,1496,Bug Bite,Sand Tomb,Rock Tomb,14,17,5,75000
3,Lapras,92.9,131,water,ice,104.7,114.0,179,2135530,21.5,1497,Psywave,Sparkling Aria,Ice Beam,12,14,5,75000
4,Jellicent,92.7,593,water,ghost,106.7,124.3,157,2082451,24.0,1490,Hex,Surf,Shadow Ball,4,5,3,50000
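
Before vectorizing, one optional step is to list any names in the data that the dictionaries do not cover. The sketch below shows the idea with a set difference on toy data; `toy_lookup`, `toy_rankings`, and the move `Made Up Move` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Toy stand-ins; "Made Up Move" is a hypothetical name absent from the lookup.
toy_lookup = {"none": 0, "Acid Spray": 1, "Acrobatics": 2}
toy_rankings = pd.DataFrame(
    {"Charged Move 1": ["Acrobatics", "Made Up Move", "Acid Spray"]}
)

# Names present in the data but absent from the lookup would become NaN when mapped.
unmapped = set(toy_rankings["Charged Move 1"].dropna()) - set(toy_lookup)

print(unmapped)
```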


## 7. Vectorize Categorical Columns

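Now we'll replace the text values with their corresponding numeric IDs. Any value that has no entry in its lookup table becomes NaN, which the check after this cell reports: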

In [44]:
# Create a copy for vectorized data
vectorized_data = rankings_data.copy()

# Replace Fast Move with corresponding number
vectorized_data['Fast Move'] = vectorized_data['Fast Move'].map(fast_move_lookup)

# Replace Charged Moves with corresponding numbers
vectorized_data['Charged Move 1'] = vectorized_data['Charged Move 1'].map(charged_move_lookup)
vectorized_data['Charged Move 2'] = vectorized_data['Charged Move 2'].map(charged_move_lookup)

# Replace Type 1 and Type 2 with corresponding numbers
vectorized_data['Type 1'] = vectorized_data['Type 1'].map(type_lookup)
vectorized_data['Type 2'] = vectorized_data['Type 2'].map(type_lookup)

# Display the first few rows of the vectorized data
print("First few rows of the vectorized data:")
display(vectorized_data.head())

First few rows of the vectorized data:


,Pokemon,Score,Dex,Type 1,Type 2,Attack,Defense,Stamina,Stat Product,Level,CP,Fast Move,Charged Move 1,Charged Move 2,Charged Move 1 Count,Charged Move 2 Count,Buddy Distance,Charged Move Cost
0,Clodsire,94.4,980,14,11,94.2,119.4,209,2352152,29.0,1490,99,245,84.0,6,8,3,50000
1,Diggersby,93.6,660,13,11,96.3,141.2,171,2324733,48.0,1496,106,92,226.0,5,7,1,10000
2,Forretress,93.6,205,1,17,110.2,141.6,130,2028758,25.0,1496,8,223,218.0,14,17,5,75000
3,Lapras,92.9,131,18,12,104.7,114.0,179,2135530,21.5,1497,105,249,131.0,12,14,5,75000
4,Jellicent,92.7,593,18,9,106.7,124.3,157,2082451,24.0,1490,40,260,230.0,4,5,3,50000
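
Note that `Charged Move 2` is displayed as `84.0`, `226.0`, and so on, while the other mapped columns show integers. A likely explanation is that the column contains at least one NaN after mapping, and pandas stores NaN in a float column. The sketch below reproduces the effect on toy data and shows one possible remedy, the nullable `Int64` dtype; `toy_lookup` and `toy_column` are hypothetical, with IDs taken from the rows above.

```python
import pandas as pd

toy_lookup = {"none": 0, "Earthquake": 84, "Ice Beam": 131}
# The None entry mimics a Pokémon with no second charged move in the source data.
toy_column = pd.Series(["Earthquake", None, "Ice Beam"])

mapped = toy_column.map(toy_lookup)   # the NaN forces a float64 column
nullable = mapped.astype("Int64")     # nullable integers keep the gap as <NA>

print(mapped.dtype, nullable.dtype)
```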


In [45]:
# Check for any missing mappings (NaN values)
print("Checking for missing mappings:")
for column in ['Fast Move', 'Charged Move 1', 'Charged Move 2', 'Type 1', 'Type 2']:
    missing = vectorized_data[column].isna().sum()
    if missing > 0:
        print(f"  {column}: {missing} missing values")
        # Show examples of missing values
        missing_values = rankings_data.loc[vectorized_data[column].isna(), column].unique()
        print(f"  Examples of missing values: {missing_values[:5]}")
    else:
        print(f"  {column}: No missing values")

Checking for missing mappings:
  Fast Move: No missing values
  Charged Move 1: No missing values
  Charged Move 2: 1 missing values
  Examples of missing values: [nan]
  Type 1: No missing values
  Type 2: No missing values
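
The single missing value in `Charged Move 2` is reported as `nan`, which means the cell was already empty in the source data rather than holding a name the dictionary lacks. Because the charged move dictionary has a `none` entry with ID 0, one option is to fill empty cells with `"none"` before mapping. Whether ID 0 is the intended encoding for an absent move is an assumption here; the sketch uses hypothetical toy data.

```python
import pandas as pd

toy_lookup = {"none": 0, "Earthquake": 84, "Ice Beam": 131}
toy_column = pd.Series(["Earthquake", None, "Ice Beam"])

# Treat an absent move as the dictionary's "none" entry before mapping.
filled = toy_column.fillna("none").map(toy_lookup)

print(filled.tolist())  # [84, 0, 131]
```

With no NaN left in the column, the mapped values stay integers, and any NaN that remains after this step points to a name that is truly missing from the dictionary.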


## 8. Save Vectorized Data

Finally, we'll save the vectorized data to a CSV file:

In [46]:
# Save the vectorized data
output_path = os.path.join(vectorized_dir, 'all_overall_rankings_vectorized.csv')
vectorized_data.to_csv(output_path, index=False)

print(f"Vectorized data saved to {output_path}")

# Report the size of the saved file and the number of rows written
print(f"File size: {os.path.getsize(output_path) / 1024:.2f} KB")
print(f"Number of rows: {len(vectorized_data)}")

Vectorized data saved to c:\Users\maxva\Documents\GitHub\PokemonGOBattleAssistant\data_acquisition\vectorized_data\all_overall_rankings_vectorized.csv
File size: 87.70 KB
Number of rows: 1046
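
The file size and row count above describe what was written, but they do not read the file back. A stricter check, sketched here as an optional extra, reloads the CSV and compares it with the in-memory frame. The example uses a temporary file and a two-row toy frame (`toy_frame`, a hypothetical stand-in with values taken from the rows shown earlier) so that it runs on its own.

```python
import os
import tempfile

import pandas as pd

toy_frame = pd.DataFrame({"Pokemon": ["Clodsire", "Lapras"], "Type 1": [14, 18]})

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_vectorized.csv")
    toy_frame.to_csv(toy_path, index=False)

    # Read the file back and compare it with the in-memory frame.
    reloaded = pd.read_csv(toy_path)

round_trip_ok = reloaded.equals(toy_frame)
print(round_trip_ok)
```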
