# Removing Columns
After unzipping the battle dataset in the first `ipynb` file, we can now start preprocessing the data.

We're going to remove the columns that are identifiers or unneeded stats.
These columns have no effect on the final product.

<sub><sup>Please note that this notebook was made with the help of ChatGPT</sup></sub>

In [1]:
%pip install pandas
import pandas as pd

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [4]:
file_path = "data/out/battle_dataset/battlesStaging_12282020_WL_tagged.csv"

We are going to load the dataset into `pandas` so we can easily preprocess it.

In [5]:
data = pd.read_csv(file_path)
print("We have successfully loaded the data!")

The following code builds the `columns_to_remove` list: a few dataset-wide columns first, then the per-team columns for both the loser and the winner.

In [6]:
columns_to_remove = [
    "battleTime", # This is a timestamp, which is not useful for the model
    "tournamentTag", # All battles in this dataset are ladder battles
    "arena.id", # All arenas in this dataset are ranked arenas
    "gameMode.id", # All gamemodes in this dataset are ladder battles
]

# Go through each team and add the columns to the columns_to_remove list
teams = ["loser", "winner"]
for team in teams:
    columns = [
        "{}.tag".format(team), # This is a unique identifier for each player (not needed)
        
        "{}.startingTrophies".format(team), # This is not needed since we have the average.startingTrophies
        "{}.trophyChange".format(team),
        "{}.clan.tag".format(team),
        "{}.clan.badgeId".format(team),

        "{}.kingTowerHitPoints".format(team),
        "{}.princessTowersHitPoints".format(team),

        "{}.cards.list".format(team),

        "{}.totalcard.level".format(team),
        "{}.troop.count".format(team),
        "{}.structure.count".format(team),
        "{}.spell.count".format(team),
        "{}.common.count".format(team),
        "{}.rare.count".format(team),
        "{}.epic.count".format(team),
        "{}.legendary.count".format(team),
        "{}.elixir.average".format(team),
    ]
    # extend (not append) keeps columns_to_remove a flat list of column names
    columns_to_remove.extend(columns)
print("We have successfully created the columns_to_remove list!")
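As a small illustration of how the per-team names are generated, the sketch below expands a couple of name templates for each team. The helper `expand_team_columns` is hypothetical and not part of the original notebook; it uses `list.extend` so the result stays a flat list of strings.

```python
# Hypothetical helper (not in the original notebook): expand name templates per team.
def expand_team_columns(templates, teams=("loser", "winner")):
    expanded = []
    for team in teams:
        # extend adds each formatted string individually, keeping the list flat
        expanded.extend(template.format(team) for template in templates)
    return expanded


example = expand_team_columns(["{}.tag", "{}.trophyChange"])
print(example)
# ['loser.tag', 'loser.trophyChange', 'winner.tag', 'winner.trophyChange']
```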

Afterwards, we drop the unnamed index column, followed by all the columns in `columns_to_remove`.

In [7]:
data.rename({"Unnamed: 0":"index"}, axis="columns", inplace=True) # There is an unnamed column that is the index
data.drop(["index"], axis=1, inplace=True)

In [8]:
for column in columns_to_remove:
    data.drop(column, inplace=True, axis=1)
print("We have successfully removed all the columns in `columns_to_remove`!")


We have successfully removed all the columns in `columns_to_remove`!
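Dropping a label that is not present raises a `KeyError` by default, so it can help to check the names first. The following is a minimal sketch on a made-up toy frame (the values and the deliberately absent `loser.tag` entry are assumptions for illustration, not data from the battle dataset). It reports missing names and then drops the rest with `errors="ignore"`.

```python
import pandas as pd

# Toy frame standing in for the battle dataset (values are made up)
toy = pd.DataFrame({
    "battleTime": ["t0", "t1"],
    "winner.tag": ["#A", "#B"],
    "winner.crowns": [3, 1],
})

# "loser.tag" is absent from the toy frame on purpose
to_remove = ["battleTime", "winner.tag", "loser.tag"]

# Report names that are not present before dropping
missing = [name for name in to_remove if name not in toy.columns]
print("Not found:", missing)

# errors="ignore" skips absent labels instead of raising KeyError
trimmed = toy.drop(columns=to_remove, errors="ignore")
print(list(trimmed.columns))
```

On the toy frame, `missing` contains only `loser.tag`, and `winner.crowns` is the only column left.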


We are going to overwrite the original battle dataset file with the trimmed version to save disk space.

In [9]:
data.to_csv(file_path, index=False)
print("We have successfully saved the data!")

The remaining columns are:
- average.startingTrophies
- winner.crowns
- loser.crowns
- winner.card#.id
- winner.card#.level
- loser.card#.id
- loser.card#.level

(`#` ranges from 1 to 8)
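The list above can also be generated programmatically, which makes it easy to compare against the DataFrame after the drop. This is a sketch under the assumption that the card columns are named exactly as listed (for example `winner.card1.id`); the helper `expected_remaining_columns` is hypothetical.

```python
# Hypothetical helper: build the column names listed above.
def expected_remaining_columns(cards_per_deck=8):
    columns = ["average.startingTrophies", "winner.crowns", "loser.crowns"]
    for team in ("winner", "loser"):
        for slot in range(1, cards_per_deck + 1):
            columns.append("{}.card{}.id".format(team, slot))
            columns.append("{}.card{}.level".format(team, slot))
    return columns


expected = expected_remaining_columns()
print(len(expected))  # 3 + 2 teams * 8 cards * 2 fields = 35
```

With the notebook's `data` loaded, a check such as `set(data.columns) == set(expected)` would then flag any leftover or unexpectedly missing column.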

We can now proceed to the next `ipynb` file (3).