In [1]:
!pip install --upgrade mlxtend

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
from google.colab import drive
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd

In [3]:
# Load cleaned data
drive.mount('/content/drive')
path = "/content/drive/MyDrive/Colab Notebooks/Pokémon Data Mining/clean_pokemon_data.csv"
pokemon_data = pd.read_csv(path, index_col=0)



Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
print(pokemon_data.columns)

Index(['name', 'german_name', 'japanese_name', 'generation', 'status',
       'species', 'type_number', 'type_1', 'type_2', 'height_m', 'weight_kg',
       'abilities_number', 'ability_1', 'ability_2', 'ability_hidden',
       'total_points', 'hp', 'attack', 'defense', 'sp_attack', 'sp_defense',
       'speed', 'catch_rate', 'base_friendship', 'base_experience',
       'growth_rate', 'egg_type_number', 'egg_type_1', 'egg_type_2',
       'percentage_male', 'egg_cycles', 'against_normal', 'against_fire',
       'against_water', 'against_electric', 'against_grass', 'against_ice',
       'against_fight', 'against_poison', 'against_ground', 'against_flying',
       'against_psychic', 'against_bug', 'against_rock', 'against_ghost',
       'against_dragon', 'against_dark', 'against_steel', 'against_fairy',
       'total_points_bins', 'individual_points_sum'],
      dtype='object')




In [5]:
# Concatenate abilities from different columns into a list
pokemon_data['abilities'] = pokemon_data[['ability_1', 'ability_2', 'ability_hidden']].values.tolist()

# Drop original ability columns
pokemon_data = pokemon_data.drop(columns=['ability_1', 'ability_2', 'ability_hidden'])

# Convert DataFrame to a list of lists for Apriori
abilities_data = pokemon_data['abilities'].tolist()

# One-hot encode the transactions (one boolean column per ability) with TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(abilities_data).transform(abilities_data)



In [6]:
# Turn encoded data into DataFrame
abilities_df = pd.DataFrame(te_ary, columns=te.columns_)

# Run Apriori algorithm
frequent_itemsets = apriori(abilities_df, min_support=0.01, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)

# Display rules sorted by lift
rules = rules.sort_values(by='lift', ascending=False)

print(rules)


      antecedents    consequents  antecedent support  consequent support  \
11    (Rock Head)       (Sturdy)            0.020096            0.039234   
10       (Sturdy)    (Rock Head)            0.039234            0.020096   
0   (Beast Boost)      (Unknown)            0.010526            0.508134   
1       (Unknown)  (Beast Boost)            0.508134            0.010526   
2         (Blaze)      (Unknown)            0.024880            0.508134   
3       (Unknown)        (Blaze)            0.508134            0.024880   
6      (Overgrow)      (Unknown)            0.024880            0.508134   
7       (Unknown)     (Overgrow)            0.508134            0.024880   
14      (Torrent)      (Unknown)            0.024880            0.508134   
15      (Unknown)      (Torrent)            0.508134            0.024880   
4      (Levitate)      (Unknown)            0.039234            0.508134   
5       (Unknown)     (Levitate)            0.508134            0.039234   
8      (Pres



The output above lists association rules, which are implications of the form "if antecedents then consequents". For each rule, the table provides several statistics:

1. Rule `(Rock Head) => (Sturdy)`: This rule has a support of 0.010526, meaning that the combination of Rock Head and Sturdy occurs in about 1.05% of all Pokémon. The confidence is 0.523810, indicating that out of all Pokémon that have Rock Head, 52.38% also have Sturdy. The lift is 13.35, so a Pokémon with Rock Head is about 13 times more likely to have Sturdy than a Pokémon picked at random (52.38% versus the 3.92% baseline). This is a strong rule since the lift value is far above 1.

2. Rule `(Beast Boost) => (Unknown)`: The support is 0.010526 (1.05% of all Pokémon have both Beast Boost and an Unknown ability), the confidence is 1 (every Pokémon with Beast Boost also has an Unknown ability), and the lift is 1.967985 (Pokémon with Beast Boost are about twice as likely as the average Pokémon to have an Unknown ability). The confidence is as high as it can be, but the lift is modest because Unknown is so common overall.

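3. Rule `(Blaze) => (Unknown)`: Blaze has an antecedent support of 0.024880, so this rule's support cannot equal the 0.010526 of the previous rule. Its position in the lift-sorted table, between the Beast Boost and Levitate rules, places its lift between 1.727987 and 1.967985, which corresponds to a confidence between 0.878049 and 1. The Overgrow and Torrent rules share the same antecedent support and the same position in the table.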

4. Rule `(Levitate) => (Unknown)`: This rule has a higher support (0.034450) than the rules above, but a lower confidence (0.878049) and a lower lift (1.727987). Levitate is more common than Blaze or Beast Boost (antecedent support 0.039234), but not every Pokémon with Levitate also has an Unknown ability.

In general, the rules with 'Unknown' as the consequent have high confidence (between about 0.88 and 1.0 for the rules discussed above) but much lower lift than `(Rock Head) => (Sturdy)`. This is because the 'Unknown' category is very common in the data: with a consequent support of 0.508134, lift can never exceed 1 / 0.508134 ≈ 1.97. The rule `(Rock Head) => (Sturdy)` is the more interesting finding, because even though neither ability is common, they appear together far more often than would be expected by chance.

These rules do not imply causation. For instance, having the Rock Head ability doesn't cause a Pokémon to have the Sturdy ability. Instead, these rules highlight patterns in the data: Pokémon that have one of these abilities often have the other.
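As a rough check on the figures quoted for `(Rock Head) => (Sturdy)`, the confidence and lift can be recomputed from the three support values. This is a minimal sketch: the numbers are copied from the printed table and the prose above, and the helper names `confidence` and `lift` are illustrative, not part of mlxtend.

```python
def confidence(support_xy: float, support_x: float) -> float:
    """conf(X -> Y) = supp(X and Y) / supp(X)."""
    return support_xy / support_x


def lift(support_xy: float, support_x: float, support_y: float) -> float:
    """lift(X -> Y) = conf(X -> Y) / supp(Y)."""
    return confidence(support_xy, support_x) / support_y


# Values copied from the rule table and the prose above.
SUPPORT_ROCK_HEAD = 0.020096   # antecedent support
SUPPORT_STURDY = 0.039234      # consequent support
SUPPORT_BOTH = 0.010526        # support of {Rock Head, Sturdy}

rock_head_confidence = confidence(SUPPORT_BOTH, SUPPORT_ROCK_HEAD)
rock_head_lift = lift(SUPPORT_BOTH, SUPPORT_ROCK_HEAD, SUPPORT_STURDY)

print(f"confidence = {rock_head_confidence:.4f}")
print(f"lift       = {rock_head_lift:.2f}")
```

Small differences in the last digits are expected, because the inputs are the rounded values shown in the output rather than the exact counts.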

The 'conviction' column is the ratio of how often X would occur without Y if X and Y were independent (the expected frequency of incorrect predictions) to how often X actually occurs without Y (the observed frequency of incorrect predictions). Here, 'inf' values arise when the confidence is 1: the rule never makes an incorrect prediction, so the denominator is zero.
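The conviction definition can be written as (1 - supp(Y)) / (1 - conf(X -> Y)). The sketch below applies it to two rules from the table, using the rounded values quoted above; the function name `conviction` is illustrative, and returning `math.inf` for a confidence of 1 mirrors the 'inf' entries in the output.

```python
import math


def conviction(consequent_support: float, rule_confidence: float) -> float:
    """conviction(X -> Y) = (1 - supp(Y)) / (1 - conf(X -> Y))."""
    if rule_confidence >= 1.0:
        # No observed incorrect predictions: the ratio is unbounded.
        return math.inf
    return (1.0 - consequent_support) / (1.0 - rule_confidence)


# (Rock Head) => (Sturdy): consequent support 0.039234, confidence 0.523810
rock_head_sturdy = conviction(0.039234, 0.523810)

# (Beast Boost) => (Unknown): consequent support 0.508134, confidence 1
beast_boost_unknown = conviction(0.508134, 1.0)

print(f"Rock Head => Sturdy:    {rock_head_sturdy:.3f}")
print(f"Beast Boost => Unknown: {beast_boost_unknown}")
```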

The 'leverage' column is the difference between the observed frequency of X and Y appearing together and the frequency that would be expected if X and Y were independent. A leverage value of 0 indicates independence; positive values mean the items co-occur more often than expected.
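Leverage can be reproduced the same way, as supp(X and Y) - supp(X) * supp(Y). The following is a small sketch using the rounded supports from the table; the function name `leverage` is illustrative.

```python
def leverage(support_xy: float, support_x: float, support_y: float) -> float:
    """leverage(X -> Y) = supp(X and Y) - supp(X) * supp(Y)."""
    return support_xy - support_x * support_y


# (Rock Head) => (Sturdy), values from the rule table and prose above
rock_head_sturdy = leverage(0.010526, 0.020096, 0.039234)

# (Beast Boost) => (Unknown)
beast_boost_unknown = leverage(0.010526, 0.010526, 0.508134)

# Two independent items: the joint support equals the product of the supports
independent = leverage(0.2 * 0.5, 0.2, 0.5)

print(f"Rock Head => Sturdy:    {rock_head_sturdy:.6f}")
print(f"Beast Boost => Unknown: {beast_boost_unknown:.6f}")
print(f"Independent items:      {independent:.6f}")
```

Unlike lift, leverage is an absolute difference in support, so rules over rare abilities have small leverage values even when their lift is large.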

In [7]:
# Drop every Pokémon whose 'abilities' list contains 'Unknown'
# (this removes the whole row, not just the 'Unknown' entry)
pokemon_data = pokemon_data[~pokemon_data['abilities'].apply(lambda x: 'Unknown' in x)]

# Convert DataFrame to list of lists for Apriori
abilities_data = pokemon_data['abilities'].tolist()

# One-hot encode the transactions (one boolean column per ability) with TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(abilities_data).transform(abilities_data)

# Turn encoded data into DataFrame
abilities_df = pd.DataFrame(te_ary, columns=te.columns_)

# Run Apriori algorithm
frequent_itemsets = apriori(abilities_df, min_support=0.01, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)

# Display the rules sorted by lift
rules = rules.sort_values(by='lift', ascending=False)

print(rules)



           antecedents         consequents  antecedent support  \
9         (Flame Body)        (Flash Fire)            0.021401   
8         (Flash Fire)        (Flame Body)            0.029183   
42     (Frisk, Pickup)          (Insomnia)            0.015564   
43          (Insomnia)     (Frisk, Pickup)            0.036965   
38       (Synchronize)         (Telepathy)            0.027237   
39         (Telepathy)       (Synchronize)            0.025292   
41  (Insomnia, Pickup)             (Frisk)            0.015564   
44             (Frisk)  (Insomnia, Pickup)            0.052529   
21       (Magnet Pull)            (Sturdy)            0.015564   
20            (Sturdy)       (Magnet Pull)            0.062257   
40   (Insomnia, Frisk)            (Pickup)            0.019455   
45            (Pickup)   (Insomnia, Frisk)            0.050584   
3         (Leaf Guard)       (Chlorophyll)            0.025292   
2        (Chlorophyll)        (Leaf Guard)            0.040856   
7         
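The filter in cell 7 removes every Pokémon that has an 'Unknown' entry, which, given the consequent support of 0.508134 seen earlier, is roughly half of the data. One possible alternative, sketched below, is to keep those Pokémon and strip only the 'Unknown' item from each transaction. This is an assumption about what might be useful, not what the notebook does; the helper `strip_placeholder` and the toy rows are illustrative.

```python
PLACEHOLDER = "Unknown"


def strip_placeholder(transactions, placeholder=PLACEHOLDER):
    """Remove the placeholder item from each transaction.

    Transactions that end up empty are dropped, since they carry
    no information for itemset mining.
    """
    cleaned = [[item for item in t if item != placeholder] for t in transactions]
    return [t for t in cleaned if t]


# Toy rows in the same shape as pokemon_data['abilities'].tolist()
toy_abilities = [
    ["Blaze", "Unknown", "Solar Power"],
    ["Rock Head", "Sturdy", "Sand Veil"],
    ["Beast Boost", "Unknown", "Unknown"],
]

cleaned_abilities = strip_placeholder(toy_abilities)
print(cleaned_abilities)
```

In the notebook, the result of `strip_placeholder(pokemon_data['abilities'].tolist())` could be passed to `TransactionEncoder` in place of `abilities_data`. The supports would then be computed over all Pokémon, so the resulting rules would differ from those printed above.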

In [8]:
# Run FP-Growth algorithm
frequent_itemsets_fp = fpgrowth(abilities_df, min_support=0.01, use_colnames=True)

# Generate association rules
rules_fp = association_rules(frequent_itemsets_fp, metric='lift', min_threshold=1)

# Display rules sorted by lift
rules_fp = rules_fp.sort_values(by='lift', ascending=False)

print(rules_fp)

           antecedents         consequents  antecedent support  \
21        (Flame Body)        (Flash Fire)            0.021401   
20        (Flash Fire)        (Flame Body)            0.029183   
34     (Frisk, Pickup)          (Insomnia)            0.015564   
35          (Insomnia)     (Frisk, Pickup)            0.036965   
44       (Synchronize)         (Telepathy)            0.027237   
45         (Telepathy)       (Synchronize)            0.025292   
33  (Insomnia, Pickup)             (Frisk)            0.015564   
36             (Frisk)  (Insomnia, Pickup)            0.052529   
19       (Magnet Pull)            (Sturdy)            0.015564   
18            (Sturdy)       (Magnet Pull)            0.062257   
32   (Insomnia, Frisk)            (Pickup)            0.019455   
37            (Pickup)   (Insomnia, Frisk)            0.050584   
38       (Chlorophyll)        (Leaf Guard)            0.040856   
39        (Leaf Guard)       (Chlorophyll)            0.025292   
11        
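Apriori and FP-Growth find the same frequent itemsets for a given `min_support`, so the two rule tables should differ only in row order and index labels, as the outputs above suggest. A minimal way to check this is to compare the rules as order-independent sets. The helper `rule_keys` is illustrative, and the toy inputs stand in for the `antecedents` and `consequents` columns.

```python
def rule_keys(antecedents, consequents):
    """Build an order-independent set of (antecedent, consequent) pairs."""
    return {(frozenset(a), frozenset(c)) for a, c in zip(antecedents, consequents)}


# Toy stand-ins for the two rule tables, listed in different row orders
apriori_rules = rule_keys(
    [{"Flame Body"}, {"Flash Fire"}],
    [{"Flash Fire"}, {"Flame Body"}],
)
fpgrowth_rules = rule_keys(
    [{"Flash Fire"}, {"Flame Body"}],
    [{"Flame Body"}, {"Flash Fire"}],
)

same_rules = apriori_rules == fpgrowth_rules
print(same_rules)
```

With the notebook's DataFrames, the comparison would be `rule_keys(rules['antecedents'], rules['consequents']) == rule_keys(rules_fp['antecedents'], rules_fp['consequents'])`.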

