# Pokémon GO Data Vectorization

This notebook builds lookup dictionaries that map Pokémon attributes (types, Fast Moves, and Charged Moves) to integer IDs. The data sources are:
- `all_overall_rankings.csv` (for Pokémon types)
- `pokemon_stats_moves.json` (for Fast and Charged Moves)

The goal is to assign each attribute value a unique integer ID that machine learning models can consume.

## 1. Import Required Libraries

In [28]:
import pandas as pd
import json
import numpy as np

## 2. Load and Inspect Data

Let's load the data from both files and examine their structure.

In [29]:
# Load CSV data
csv_path = 'processed_data/all_overall_rankings.csv'
rankings_df = pd.read_csv(csv_path)

print("CSV Data Shape:", rankings_df.shape)

CSV Data Shape: (1046, 18)


In [30]:
# Load JSON data
json_path = 'Poke_stats_moves/raw_data/pokemon_stats_moves.json'
with open(json_path, 'r', encoding='utf-8') as f:
    moves_data = json.load(f)

print(f"JSON Data: {len(moves_data)} entries")

JSON Data: 1046 entries
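
The cells above print only the size of each source. As a minimal sketch of how the structure of the JSON entries could be examined, the hypothetical helper `summarize_records` below counts how often each key occurs. The sample records are invented for illustration and only borrow the key names this notebook reads later (`fast_moves`, `charged_moves`, `recommended_fast_move`).

```python
from collections import Counter


def summarize_records(records):
    """Count how often each key appears across a list of dict records."""
    key_counts = Counter()
    for record in records:
        if isinstance(record, dict):  # skip empty or malformed entries
            key_counts.update(record.keys())
    return key_counts


# Hypothetical sample shaped like the entries this notebook iterates over
sample_records = [
    {"fast_moves": ["Bite"], "charged_moves": ["Crunch"], "recommended_fast_move": "Bite"},
    {"fast_moves": ["Ember"], "charged_moves": []},
    None,  # non-dict entries are ignored
]

for key, count in summarize_records(sample_records).most_common():
    print(f"{key}: present in {count} record(s)")
```

In the notebook itself the call would be `summarize_records(moves_data)`. For the CSV, `rankings_df.columns` and `rankings_df.head()` give the equivalent overview.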


## 3. Extract Unique Types from CSV

Let's extract all unique Pokémon types from the columns 'Type 1' and 'Type 2'.

In [31]:
# Extract unique types from CSV
type1_set = set(rankings_df['Type 1'].dropna().unique())
type2_set = set(rankings_df['Type 2'].dropna().unique())

# Combine all types and remove 'none'
all_types = type1_set.union(type2_set)
all_types.discard('none')  # Remove 'none' as it's not a real type

unique_types = sorted(all_types)

print(f"Total unique types found: {len(unique_types)}")

Total unique types found: 18


## 4. Extract Unique Fast and Charged Moves from JSON

Now let's extract all unique Fast Moves and Charged Moves from the JSON data.

In [32]:
# Initialize sets to collect unique moves
unique_fast_moves = set()
unique_charged_moves = set()

# Iterate through JSON data to collect all moves
for pokemon in moves_data:
    if pokemon and isinstance(pokemon, dict):
        # Collect fast moves
        if 'fast_moves' in pokemon and pokemon['fast_moves']:
            for move in pokemon['fast_moves']:
                if move and move.strip():  # Only add non-empty moves
                    unique_fast_moves.add(move.strip())
        
        # Collect charged moves
        if 'charged_moves' in pokemon and pokemon['charged_moves']:
            for move in pokemon['charged_moves']:
                if move and move.strip():  # Only add non-empty moves
                    unique_charged_moves.add(move.strip())
        
        # Collect recommended fast moves
        if 'recommended_fast_move' in pokemon and pokemon['recommended_fast_move']:
            move = pokemon['recommended_fast_move'].strip()
            if move and move != 'None':
                unique_fast_moves.add(move)
        
        # Collect recommended charged moves
        if 'recommended_charged_moves' in pokemon and pokemon['recommended_charged_moves']:
            for move in pokemon['recommended_charged_moves']:
                if move and move.strip():
                    unique_charged_moves.add(move.strip())

# Convert to sorted lists (sorted() accepts a set directly)
unique_fast_moves = sorted(unique_fast_moves)
unique_charged_moves = sorted(unique_charged_moves)

print(f"Total unique fast moves found: {len(unique_fast_moves)}")
print(f"Total unique charged moves found: {len(unique_charged_moves)}")

Total unique fast moves found: 148
Total unique charged moves found: 271
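
The JSON move names can carry trailing markers: the dictionary output in section 5 lists both 'Bite' and 'Bite*', and both 'Acid' and 'Acid†', so each variant receives its own ID. If those markers are annotations rather than distinct moves (an assumption; the source does not say what `*` and `†` mean), a sketch for collapsing them before building the sets could look like this. `normalize_move_name` is a hypothetical helper, not part of the original notebook.

```python
def normalize_move_name(name, markers="*†"):
    """Strip surrounding whitespace and any trailing marker characters."""
    return name.strip().rstrip(markers).strip()


# Names taken from the dictionary output in section 5, plus one padded variant
raw_moves = ["Bite", "Bite*", "Acid", "Acid†", " Counter* "]
normalized = sorted({normalize_move_name(m) for m in raw_moves})
print(normalized)  # ['Acid', 'Bite', 'Counter']
```

Whether to merge the variants depends on the downstream model: keeping them separate preserves the marker as information, while merging reduces the vocabulary size.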


In [33]:
# Also extract moves from CSV data to ensure we have all moves
csv_fast_moves = set(rankings_df['Fast Move'].dropna().unique())
csv_charged_moves_1 = set(rankings_df['Charged Move 1'].dropna().unique())
csv_charged_moves_2 = set(rankings_df['Charged Move 2'].dropna().unique())

# Combine with existing moves
unique_fast_moves_set = set(unique_fast_moves).union(csv_fast_moves)
unique_charged_moves_set = set(unique_charged_moves).union(csv_charged_moves_1).union(csv_charged_moves_2)

# Update the lists
unique_fast_moves = sorted(unique_fast_moves_set)
unique_charged_moves = sorted(unique_charged_moves_set)

print("After adding CSV moves:")
print(f"Total unique fast moves: {len(unique_fast_moves)}")
print(f"Total unique charged moves: {len(unique_charged_moves)}")


After adding CSV moves:
Total unique fast moves: 148
Total unique charged moves: 298


In [34]:
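# Detailed analysis: which moves were added from the CSV file?
# Note: unique_fast_moves already contains the CSV moves (merged in the previous cell),
# so csv_only_fast_moves below is empty by construction.

# Fast moves in the merged list that do not appear in the CSV (JSON-only)
json_fast_moves_set = set(unique_fast_moves) - csv_fast_moves
csv_only_fast_moves = csv_fast_moves - set(unique_fast_moves)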

print("=== FAST MOVES ANALYSIS ===")
print(f"Number of Fast Moves only from JSON: {len(json_fast_moves_set)}")
print(f"Number of Fast Moves only from CSV: {len(csv_only_fast_moves)}")
print(f"Total number after combination: {len(unique_fast_moves_set)}")

if csv_only_fast_moves:
    print(f"\nFast Moves that ONLY appear in the CSV file ({len(csv_only_fast_moves)}):")
    for move in sorted(csv_only_fast_moves):
        print(f"  - {move}")
else:
    print("\nNo Fast Moves that only appear in the CSV file.")

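# Charged moves in the merged list that do not appear in the CSV (JSON-only).
# As above, unique_charged_moves is already merged, so csv_only_charged_moves is empty by construction.
json_charged_moves_set = set(unique_charged_moves) - csv_charged_moves_1.union(csv_charged_moves_2)
csv_only_charged_moves = csv_charged_moves_1.union(csv_charged_moves_2) - set(unique_charged_moves)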

print("\n=== CHARGED MOVES ANALYSIS ===")
print(f"Number of Charged Moves only from JSON: {len(json_charged_moves_set)}")
print(f"Number of Charged Moves only from CSV: {len(csv_only_charged_moves)}")
print(f"Total number after combination: {len(unique_charged_moves_set)}")

if csv_only_charged_moves:
    print(f"\nCharged Moves that ONLY appear in the CSV file ({len(csv_only_charged_moves)}):")
    for move in sorted(csv_only_charged_moves):
        print(f"  - {move}")
else:
    print("\nNo Charged Moves that only appear in the CSV file.")

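# Overlap between the CSV moves and the merged lists
# (equals the number of CSV moves, because the merged lists contain all of them)
fast_moves_overlap = csv_fast_moves.intersection(set(unique_fast_moves))
charged_moves_overlap = csv_charged_moves_1.union(csv_charged_moves_2).intersection(set(unique_charged_moves))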

print(f"\n=== OVERLAPS ===")
print(f"Fast Moves in both sources: {len(fast_moves_overlap)}")
print(f"Charged Moves in both sources: {len(charged_moves_overlap)}")

=== FAST MOVES ANALYSIS ===
Number of Fast Moves only from JSON: 76
Number of Fast Moves only from CSV: 0
Total number after combination: 148

No Fast Moves that only appear in the CSV file.

=== CHARGED MOVES ANALYSIS ===
Number of Charged Moves only from JSON: 122
Number of Charged Moves only from CSV: 0
Total number after combination: 298

No Charged Moves that only appear in the CSV file.

=== OVERLAPS ===
Fast Moves in both sources: 72
Charged Moves in both sources: 176
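
These numbers need careful reading. Because `unique_fast_moves` and `unique_charged_moves` were overwritten with the merged lists in cell `In [33]`, subtracting them from the CSV sets always yields an empty set, and the reported overlap is simply the size of the CSV set. The charged-move total grew from 271 to 298 during the merge, so 27 charged moves came from the CSV alone, and the true overlap is 271 + 176 - 298 = 149. The following is a minimal sketch of a comparison that keeps the two sources separate. `compare_sources` is a hypothetical helper and the inputs are toy data, not the notebook's real sets.

```python
def compare_sources(json_moves, csv_moves):
    """Return the moves unique to each source and the moves shared by both."""
    json_moves, csv_moves = set(json_moves), set(csv_moves)
    return {
        "json_only": json_moves - csv_moves,
        "csv_only": csv_moves - json_moves,
        "both": json_moves & csv_moves,
    }


# Hypothetical toy inputs. In the notebook these would be the JSON-derived set
# captured before the merge and the CSV-derived set.
json_charged = {"Crunch", "Hydro Pump", "Surf"}
csv_charged = {"Crunch", "Surf", "Body Slam"}

result = compare_sources(json_charged, csv_charged)
for label, moves in result.items():
    print(f"{label}: {sorted(moves)}")
```

To use this in the notebook, the JSON-only sets would have to be saved under separate names before the merge cell reassigns `unique_fast_moves` and `unique_charged_moves`.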


## 5. Create Dictionaries for Types, Fast Moves, and Charged Moves

Now let's create dictionaries that assign each unique attribute value a unique integer. IDs start at 1 and follow alphabetical order; 0 is reserved for 'none' or missing values.

In [35]:
# Create dictionaries mapping each unique value to a number

# Types dictionary (starting from 1, 0 reserved for 'none' or missing values)
type_to_number = {}
type_to_number['none'] = 0  # Reserve 0 for missing/none types
for i, poke_type in enumerate(unique_types):
    type_to_number[poke_type] = i + 1

# Fast moves dictionary (starting from 1, 0 reserved for missing values)
fast_move_to_number = {}
fast_move_to_number['none'] = 0  # Reserve 0 for missing moves
for i, move in enumerate(unique_fast_moves):
    fast_move_to_number[move] = i + 1

# Charged moves dictionary (starting from 1, 0 reserved for missing values)
charged_move_to_number = {}
charged_move_to_number['none'] = 0  # Reserve 0 for missing moves
for i, move in enumerate(unique_charged_moves):
    charged_move_to_number[move] = i + 1

print("=== VECTORIZATION DICTIONARIES ===")
print(f"\n1. TYPES DICTIONARY ({len(type_to_number)} entries):")
print("Format: 'type_name': number")
for poke_type, number in sorted(type_to_number.items(), key=lambda x: x[1]):
    print(f"'{poke_type}': {number}")

print(f"\n2. FAST MOVES DICTIONARY ({len(fast_move_to_number)} entries):")
print("Format: 'move_name': number")
for move, number in sorted(fast_move_to_number.items(), key=lambda x: x[1]):
    print(f"'{move}': {number}")

print(f"\n3. CHARGED MOVES DICTIONARY ({len(charged_move_to_number)} entries):")
print("Format: 'move_name': number")
for move, number in sorted(charged_move_to_number.items(), key=lambda x: x[1]):
    print(f"'{move}': {number}")

=== VECTORIZATION DICTIONARIES ===

1. TYPES DICTIONARY (19 entries):
Format: 'type_name': number
'none': 0
'bug': 1
'dark': 2
'dragon': 3
'electric': 4
'fairy': 5
'fighting': 6
'fire': 7
'flying': 8
'ghost': 9
'grass': 10
'ground': 11
'ice': 12
'normal': 13
'poison': 14
'psychic': 15
'rock': 16
'steel': 17
'water': 18

2. FAST MOVES DICTIONARY (149 entries):
Format: 'move_name': number
'none': 0
'Acid': 1
'Acid†': 2
'Air Slash': 3
'Astonish': 4
'Bite': 5
'Bite*': 6
'Bubble': 7
'Bug Bite': 8
'Bug Bite*': 9
'Bullet Punch': 10
'Bullet Seed': 11
'Bullet Seed*': 12
'Charge Beam': 13
'Charm': 14
'Confusion': 15
'Counter': 16
'Counter*': 17
'Cut': 18
'Cut*': 19
'Double Kick': 20
'Dragon Breath': 21
'Dragon Breath*': 22
'Dragon Tail': 23
'Dragon Tail*': 24
'Ember': 25
'Ember*': 26
'Extrasensory': 27
'Fairy Wind': 28
'Feint Attack': 29
'Fire Fang': 30
'Fire Spin': 31
'Force Palm': 32
'Force Palm*': 33
'Frost Breath': 34
'Frost Breath*': 35
'Fury Cutter': 36
'Fury Cutter*': 37
'Gust': 38
[output truncated]
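
Once the dictionaries exist, they can be applied to the rankings table to turn its text columns into integer columns. The sketch below illustrates this with miniature, hypothetical versions of the dictionaries and the DataFrame; the column names (`Type 1`, `Type 2`, `Fast Move`) and the example IDs match the ones shown above, while `encode_column` is an assumed helper name.

```python
import pandas as pd


def encode_column(series, mapping, default=0):
    """Map each value to its integer ID; missing or unknown values become `default`."""
    return series.map(mapping).fillna(default).astype(int)


# Hypothetical miniature versions of the dictionaries and the rankings table
type_to_number = {"none": 0, "bug": 1, "dark": 2, "dragon": 3}
fast_move_to_number = {"none": 0, "Bite": 5, "Bug Bite": 8}

rankings_sample = pd.DataFrame({
    "Type 1": ["dark", "bug", "dragon"],
    "Type 2": ["none", "dark", None],
    "Fast Move": ["Bite", "Bug Bite", "Unknown Move"],
})

encoded = pd.DataFrame({
    "type1_id": encode_column(rankings_sample["Type 1"], type_to_number),
    "type2_id": encode_column(rankings_sample["Type 2"], type_to_number),
    "fast_move_id": encode_column(rankings_sample["Fast Move"], fast_move_to_number),
})
print(encoded)
```

Note that this sketch sends both missing values and names absent from the dictionary to 0. If unknown names should be detected rather than silently merged with 'none', check `series.map(mapping).isna()` before filling.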

## 6. Export Dictionaries to CSV Files

Let's save the dictionaries as CSV files for easy use in other tools.

In [36]:
# Save dictionaries as CSV files
import csv

# Function to save a dictionary as a CSV file
def save_dict_to_csv(dictionary, filename, key_header="Name", value_header="Number"):
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([key_header, value_header])
        for key, value in dictionary.items():
            writer.writerow([key, value])

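# Save the dictionaries (create the output directory first if it does not exist)
import os
os.makedirs('dictionarie', exist_ok=True)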
save_dict_to_csv(type_to_number, 'dictionarie/type_to_number.csv', 'Type', 'Number')
save_dict_to_csv(fast_move_to_number, 'dictionarie/fast_move_to_number.csv', 'Fast_Move', 'Number')
save_dict_to_csv(charged_move_to_number, 'dictionarie/charged_move_to_number.csv', 'Charged_Move', 'Number')
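
To reuse the exported files in another script, the CSVs have to be read back into dictionaries with integer values. The following is a minimal sketch of that reverse step, assuming the two-column layout written by `save_dict_to_csv`. `load_dict_from_csv` is a hypothetical counterpart that is not part of the original notebook, and the round trip runs on a temporary file with sample data.

```python
import csv
import os
import tempfile


def load_dict_from_csv(filename, key_header="Name", value_header="Number"):
    """Read a two-column CSV back into a {name: int} dictionary."""
    with open(filename, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        return {row[key_header]: int(row[value_header]) for row in reader}


# Round-trip check on a temporary file (hypothetical sample data)
sample = {"none": 0, "bug": 1, "dark": 2}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "type_to_number.csv")
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Type", "Number"])
        writer.writerows(sample.items())
    restored = load_dict_from_csv(path, "Type", "Number")

print(restored == sample)  # True
```

The `int(...)` conversion matters because the `csv` module returns every field as a string.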