# Cleaning Up Card Data
### Audrey Lai

Before game development can begin, the data needs to be cleaned and standardized so that the game can actually use it. Many games, especially those written when memory was tight, store all related game data in the same file (e.g. a character data file might cover the main characters, non-playable characters, and enemies). In this game, our data were descriptions of cards (think Hearthstone). We had two types of cards: "character" cards and "skill" cards. The character cards were further divided into "boss" and "minion" cards: a "boss" card must be destroyed to end the game, while "minion" cards are expendable. Additionally, each card has an "alignment" of Hero, Enemy, or Neutral.

In [1]:
# Import statements
import pandas as pd  # to read in the csv
import numpy as np   # to find the unique values

In [2]:
# Read in the card data
original_card_data = pd.read_csv("Cards - All_Cards.csv")

# First 5 rows of the data
original_card_data.head()

Unnamed: 0,Name,Alignment,Img Name,Description,:,Type,::,Rank,Cost to Use,Cost to Move,Attack/CP,Health,Defense,:::,Effect
0,Knight,Hero,knight.png,A trained soldier destined for greatness.,:,0,:,1,0,1,8,50,3,:,
1,Troll,enemy,troll.png,Notorious for leading groups of monsters on ra...,:,0,:,1,0,1,11,40,1,:,
2,Alp,Enemy,alp.png,A demon that sits on sleepers with a crushing ...,:,Minion,:,1,1,1,4,20,0,:,
3,Angel,hero,angel.png,"The symbol of goodness itself, it grants the b...",:,1,:,1,1,1,5,15,0,:,
4,Brag,Enemy,brag.png,"Beware this false mount, for it finds delight ...",:,1,:,1,1,1,5,10,0,:,


## Dealing with redundant categories

There is a lot of information here, but when several people work on the same dataset, it becomes difficult to keep everything in consistent categories. For example, we want to see how many cards of each alignment (Hero, Enemy, Neutral) we have in order to ensure each side is balanced. If we simply count the text in the column without any processing, we get this:

In [3]:
alignments = list(original_card_data["Alignment"])
for align in np.unique(alignments):
    print("{}:   \t{} cards".format(align, alignments.count(align)))
#original_card_data.groupby("Alignment")["Name"].nunique()

Enemy:   	11 cards
Hero:   	11 cards
Neutral:   	9 cards
enemy:   	3 cards
hero:   	3 cards
neutral:   	3 cards


We aren't getting the full count for each category. Because people typed the values differently, the same alignment shows up as several different categories! This is where cleaning up comes in handy. If we convert all the words to the same case, we'll be able to count them correctly.

In [4]:
# bind a second name to the same DataFrame; this does not copy it, so the edits below also show up in original_card_data
changed_card_data = original_card_data

# set all of the entries in the Alignment column to title case (i.e. Hero instead of HERO or hero)
changed_card_data["Alignment"] = changed_card_data["Alignment"].str.title()

# recalculate the counts
changed_card_data.groupby("Alignment")["Name"].nunique()

Alignment
Enemy      14
Hero       14
Neutral    12
Name: Name, dtype: int64
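
Case is not the only way labels can drift apart; stray whitespace is another common culprit. The following is a minimal sketch, using a small made-up Series rather than the real card data, of normalizing both and then checking the result against the three alignments described earlier. The `EXPECTED_ALIGNMENTS` name and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample values; the real data lives in the Alignment column.
alignments = pd.Series(["Hero", "hero", " Enemy", "NEUTRAL", "enemy ", "Neutral"])

# Assumed set of valid labels, taken from the three alignments described above.
EXPECTED_ALIGNMENTS = {"Hero", "Enemy", "Neutral"}

# Strip surrounding whitespace, then convert to title case.
cleaned = alignments.str.strip().str.title()

# Collect any labels that still fall outside the expected set.
unexpected = set(cleaned.unique()) - EXPECTED_ALIGNMENTS

# One count per label after cleaning.
counts = cleaned.value_counts().to_dict()
print(counts)
print("Unexpected labels:", unexpected)
```

A non-empty `unexpected` set would point at values, such as misspellings, that a case change alone cannot fix.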

In [5]:
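# the original variable shows the title-cased values too, since both names refer to the same DataFrame
original_card_data.head()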

Unnamed: 0,Name,Alignment,Img Name,Description,:,Type,::,Rank,Cost to Use,Cost to Move,Attack/CP,Health,Defense,:::,Effect
0,Knight,Hero,knight.png,A trained soldier destined for greatness.,:,0,:,1,0,1,8,50,3,:,
1,Troll,Enemy,troll.png,Notorious for leading groups of monsters on ra...,:,0,:,1,0,1,11,40,1,:,
2,Alp,Enemy,alp.png,A demon that sits on sleepers with a crushing ...,:,Minion,:,1,1,1,4,20,0,:,
3,Angel,Hero,angel.png,"The symbol of goodness itself, it grants the b...",:,1,:,1,1,1,5,15,0,:,
4,Brag,Enemy,brag.png,"Beware this false mount, for it finds delight ...",:,1,:,1,1,1,5,10,0,:,
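
The output above shows title-cased alignments even though only `changed_card_data` was edited, because plain assignment gives the same DataFrame a second name. If an untouched original is wanted, `DataFrame.copy()` is one way to get it. Here is a minimal sketch on a made-up two-row frame; the variable names `alias` and `independent` are hypothetical.

```python
import pandas as pd

# Hypothetical two-row stand-in for the card data.
original = pd.DataFrame({"Name": ["Troll", "Angel"], "Alignment": ["enemy", "hero"]})

alias = original               # a second name for the same DataFrame
independent = original.copy()  # a separate DataFrame with its own data

# Modify only the independent copy.
independent["Alignment"] = independent["Alignment"].str.title()

print(alias is original)               # both names point at one object
print(list(original["Alignment"]))     # still lower case
print(list(independent["Alignment"]))  # title-cased
```

Whether to copy is a design choice: this notebook relies on the shared reference so that each cleaning step builds on the previous one.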


## Replacing inconsistent values with consistent ones

You'll also notice that the Type column holds a mix of values: 0s, 1s, Minion, etc. If we look at the column as a whole, we'll see that there's little consistency:

In [6]:
original_card_data.groupby("Type")["Name"].nunique()

Type
0          2
1         10
2         15
Minion     5
Spell      8
Name: Name, dtype: int64

These values all refer to the same information, which is the Type (Boss, Minion, or Spell) of the card, but our team members worked on these cards separately and each chose a different way to express it. The game code itself uses the numbers to determine the type of card: 0 for Boss, 1 for Minion, and 2 for Spell.

Since we already know what each value maps to, we can fix this with the replace method in pandas.

In [7]:
# bind a second name to the same DataFrame again; no copy is made, so the Alignment fix from earlier is still in place
changed_card_data = original_card_data

# use the replace function to replace all words with their respective numbers
changed_card_data["Type"] = changed_card_data["Type"].replace({"Boss": "0", "Minion": "1", "Spell": "2"})

# recalculate the counts
changed_card_data.groupby("Type")["Name"].nunique()

Type
0     2
1    15
2    23
Name: Name, dtype: int64
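
The replacement above leaves the codes as strings. Since the game code is described as using numbers, it may be worth converting the column to integers once the labels are unified. The following is a minimal sketch on a made-up Series; the `TYPE_CODES` name is hypothetical and simply mirrors the 0/1/2 scheme above. It also raises an error if any value is neither a known label nor a known code.

```python
import pandas as pd

# Hypothetical sample mixing labels and string codes, as in the Type column.
types = pd.Series(["0", "Minion", "2", "Spell", "1", "Boss"])

# Assumed mapping, mirroring the scheme described above.
TYPE_CODES = {"Boss": "0", "Minion": "1", "Spell": "2"}

# Replace labels with their codes; existing codes pass through unchanged.
codes = types.replace(TYPE_CODES)

# Flag anything that is not a valid code before converting.
invalid = codes[~codes.isin(list(TYPE_CODES.values()))]
if not invalid.empty:
    raise ValueError(f"Unrecognized Type values: {sorted(invalid.unique())}")

# Convert the string codes to integers.
codes = codes.astype(int)
print(codes.tolist())
```

Checking before converting means a typo such as "Minon" is reported by name rather than surfacing later as a conversion error.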

## Conclusion

While this is a very simple showcase of data cleaning, it is a helpful step that keeps the end result self-consistent. This means the game can run smoothly even if team members had different ideas about how the information should be recorded. In other contexts, this kind of data processing is useful for preventing redundant or malformed data from affecting the analysis!

In [8]:
changed_card_data

Unnamed: 0,Name,Alignment,Img Name,Description,:,Type,::,Rank,Cost to Use,Cost to Move,Attack/CP,Health,Defense,:::,Effect
0,Knight,Hero,knight.png,A trained soldier destined for greatness.,:,0,:,1,0,1,8,50,3,:,
1,Troll,Enemy,troll.png,Notorious for leading groups of monsters on ra...,:,0,:,1,0,1,11,40,1,:,
2,Alp,Enemy,alp.png,A demon that sits on sleepers with a crushing ...,:,1,:,1,1,1,4,20,0,:,
3,Angel,Hero,angel.png,"The symbol of goodness itself, it grants the b...",:,1,:,1,1,1,5,15,0,:,
4,Brag,Enemy,brag.png,"Beware this false mount, for it finds delight ...",:,1,:,1,1,1,5,10,0,:,
5,Div,Enemy,div.png,Legend has is that whomever can capture this d...,:,1,:,1,1,1,6,10,0,:,
6,Dwarf,Hero,dwarf.png,Mines in the deep underground for rare materia...,:,1,:,1,1,1,3,15,0,:,
7,Griffin,Neutral,griffin.png,"Many have tried to ride this creature, but onl...",:,1,:,1,1,1,5,25,0,:,
8,Lamia,Neutral,lamia.png,Even despite the horrible rumors surrounding i...,:,1,:,1,1,1,4,10,0,:,
9,Leshak,Enemy,leshak.png,Its friendly appearance masks its guise as it ...,:,1,:,1,1,1,5,10,0,:,
