# Checkpoint Three: Cleaning Data

Now you are ready to clean your data. Before starting coding, provide the link to your dataset below.

My dataset: https://www.kaggle.com/datasets/maximebonnin/dnd-characters-test

Import the necessary libraries and create your dataframe(s).

In [462]:
# Same intro as checkpoint 2
# Standard imports of the libraries we have been using in class:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# The following was done after running the first .head(), which did not show all columns initially:

pd.set_option('display.max_columns', None)

In [463]:
# csv can be found here: https://www.kaggle.com/datasets/maximebonnin/dnd-characters-test

# Initial import of full .csv that will be chopped down (because checkpoint 3 is in a subfolder in checkpoint 2 we directly reference full file path):
fullSheet = pd.read_csv("over_one_mil_chars.csv")

In [464]:
# Also pre-emptively adding the stat names dict
statNames = {'stats_1': 'Strength',
    'stats_2': 'Dexterity',
    'stats_3': 'Constitution',
    'stats_4': 'Intelligence',
    'stats_5': 'Wisdom',
    'stats_6': 'Charisma'
}

## Missing Data

Test your dataset for missing data and handle it as needed. Make notes in the form of code comments as to your thought process.

In [465]:
# Once again copying the code from data manip studio to see what the percentages of null values per column are

for col in fullSheet.columns:
    pct_missing = np.mean(fullSheet[col].isnull())
    print('{} - {}%'.format(col, round(pct_missing*100)))

# As noted in checkpoint 2, not a ton of missing data that is unexpected, but the 0 values for stats have to be addressed. Turning them into null values will make the series float types, but that's fine as they can be converted back to ints later, 
    # as they are all whole numbers. It is also possible to just put >= 1 when doing work on them, that might also be the play.

Unnamed: 0 - 0%
char_id - 0%
name - 0%
base_hp - 0%
stats_1 - 0%
stats_2 - 0%
stats_3 - 0%
stats_4 - 0%
stats_5 - 0%
stats_6 - 0%
background - 25%
race - 0%
class_starting - 0%
class_starting_level - 0%
subclass_starting - 50%
class_other - 93%
subclass_other - 96%
total_level - 0%
feats - 82%
inventory - 31%
date_modified - 0%
notes_len - 0%
gold - 0%
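
The per-column loop above can also be written as one vectorized expression. Here is a minimal sketch on a small made-up frame (`toy` and its values are invented for illustration, not taken from the dataset). It also counts zeros separately, since a zero stat is not a null and will not show up in the percentages.

```python
import pandas as pd

# Toy stand-in for fullSheet; the values are invented for illustration.
toy = pd.DataFrame({
    "stats_1": [15, 0, 8, 12],
    "background": ["Sailor", None, "Noble", None],
})

# isnull() gives a boolean frame; the mean of booleans is the fraction that is True.
pct_missing = (toy.isnull().mean() * 100).round()

# Zeros are not nulls, so they need their own check.
pct_zero = round((toy["stats_1"] == 0).mean() * 100)

print(pct_missing)
print(pct_zero)
```

On the real dataframe the same idea would be `fullSheet.isnull().mean()`, assuming the frame is loaded as above.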


In [466]:
# Now to check and see value counts of duplicated to see if there are any duplicated rows

fullSheet.duplicated().value_counts()

# No, no duplicated rows. Nice. 

False    1204252
Name: count, dtype: int64
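
One caveat worth sketching: `duplicated()` compares entire rows, so if identifier columns such as `Unnamed: 0` and `char_id` are unique on every row (an assumption, not something verified here), the check can never report a duplicate. The toy frame below is invented for illustration and shows how passing `subset` ignores the identifier.

```python
import pandas as pd

# Toy frame: the first two rows describe the same character but have different ids.
toy = pd.DataFrame({
    "char_id": [1, 2, 3],
    "name": ["Test", "Test", "Bob"],
    "race": ["Human", "Human", "Elf"],
})

# Comparing whole rows finds nothing, because char_id differs on every row.
full_row_dupes = toy.duplicated().sum()

# Ignoring the identifier column exposes the repeated content.
content_cols = [c for c in toy.columns if c != "char_id"]
content_dupes = toy.duplicated(subset=content_cols).sum()

print(full_row_dupes, content_dupes)
```

Whether content-level duplicates should count as duplicates here is a judgment call, since two different players can plausibly build identical characters.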

In [467]:
# While there are other things to do (and this does fall outside the scope of cleaning data), I want to first look at feats since we couldn't in checkpoint two, 
    # due to the necessity of splitting them up first. As such we will work with it in this order by making a separate dataframe of only characters with listed feats:

charsWithFeats = fullSheet[fullSheet['feats'].notnull()]


# To now completely isolate the feats: 

rawFeats = charsWithFeats['feats']

# head to see how it looks:
print(rawFeats.head(15))

# Oh those indexes will be a problem, need to clean that up:

rawFeats = rawFeats.reset_index(drop=True)

# Now for a quick check to make sure the split is working correctly:

tempFeats = [rawFeats[0].split('/')]
print(tempFeats)

# Perfect

# Now to split every row. The .str accessor applies the split across the whole Series at once, which gives the same result as looping over each row:

rawFeats = rawFeats.str.split('/')


# Now running head again to see if split worked properly

rawFeats.head(15)

# It did, joy

1     Spell Sniper (Bard, Sorcerer, Warlock)/Inferna...
6                                          Dual Wielder
7                              Resilient (Constitution)
8          Resilient (Dexterity)/Ritual Caster (Wizard)
14                             Resilient (Intelligence)
16                                            Observant
20                                         Dual Wielder
26                                         Dual Wielder
29                                             Grappler
30                                           War Caster
34                                             Grappler
35                                                Actor
63                                             Grappler
67                                        Shield Master
80                           Grappler/Svirfneblin Magic
Name: feats, dtype: object
[['Spell Sniper (Bard, Sorcerer, Warlock)', 'Infernal Constitution']]


0     [Spell Sniper (Bard, Sorcerer, Warlock), Infer...
1                                        [Dual Wielder]
2                            [Resilient (Constitution)]
3       [Resilient (Dexterity), Ritual Caster (Wizard)]
4                            [Resilient (Intelligence)]
5                                           [Observant]
6                                        [Dual Wielder]
7                                        [Dual Wielder]
8                                            [Grappler]
9                                          [War Caster]
10                                           [Grappler]
11                                              [Actor]
12                                           [Grappler]
13                                      [Shield Master]
14                        [Grappler, Svirfneblin Magic]
Name: feats, dtype: object

In [468]:
# Let's see which has the most feats

mostFeats = max(rawFeats, key=len)
print(len(mostFeats), mostFeats)

# 103 feats is crazy

103 ['Grappler', 'Alert', 'Dungeon Delver', 'Durable', 'Healer', 'Heavily Armored', 'Inspiring Leader', 'Keen Mind', 'Linguist', 'Mage Slayer', 'Magic Initiate', 'Moderately Armored', 'Observant', 'Sentinel', 'Shield Master', 'Svirfneblin Magic', 'War Caster', 'Weapon Master', 'Savage Attacker', 'Elemental Adept (Lightning)', 'Bountiful Luck', 'Wood Elf Magic', 'Second Chance', 'Dwarven Fortitude', 'Prodigy', 'Fey Teleportation', 'Flames of Phlegethos', 'Master Artificer', 'Eldritch Infused Weapon.', 'Arcane Channeling', 'Volcanic Fury (Druid)', 'Natural Leader', 'Arcane Infusion', 'Powerful Presence', 'Adaptable Dragons Breath', 'Look me in the eyes', 'Berserker', 'Feral Rage', 'Experience Fighter', 'Warp spell', 'Unyielding Fighter', 'Greater Planeswalker Spark', 'Runeforging', 'War druid', 'Magic Imbued Weaponry', 'Indecisive (Cleric)', 'Book of Ancient Secrets (Sorcerer', 'Warlock)', 'Spirit Archetypes', "Ancient One's Savagery", 'Magic Blast', 'PTSD', 'Goodberry Potion', 'Arcane C

In [469]:
# Now need to break apart the list further

# First, initializing a list that will hold every individual feat, then a for loop to iterate through all the rows in rawFeats

totalFeats = []

for individualFeatsList in rawFeats:

    # A nested for loop to iterate through all the lists within the series:

    for eachIndividualFeat in individualFeatsList:
        totalFeats.append(eachIndividualFeat)

        # NOTE: Originally the code was: 

        # for i in range(len(rawFeats)):

            #for e in range(len(rawFeats[i])):
            #totalFeats.append(rawFeats[i][e])
        
        # But after looking at notation for a ton of for loops that didn't explicitly specify range, I wanted to try it just using variables. And after much trial and error, figured out the above. 

# Initializing a Series to store totalFeats and .head without printing the master list:

totalFeatsSeries = pd.Series(totalFeats)

# Now, just in case, title-casing everything so the same feat typed with different capitalization is counted as one value:

totalFeatsSeries = totalFeatsSeries.str.title()

totalFeatsSeries.head(15)

# Oh my goodness it works

0     Spell Sniper (Bard, Sorcerer, Warlock)
1                      Infernal Constitution
2                               Dual Wielder
3                   Resilient (Constitution)
4                      Resilient (Dexterity)
5                     Ritual Caster (Wizard)
6                   Resilient (Intelligence)
7                                  Observant
8                               Dual Wielder
9                               Dual Wielder
10                                  Grappler
11                                War Caster
12                                  Grappler
13                                     Actor
14                                  Grappler
dtype: object
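
The split, flatten, and title-case steps above can also be chained with `Series.explode`, which gives each list element its own row. This is only a sketch on an invented three-row Series, not a replacement for the loops above.

```python
import pandas as pd

# Toy Series in the same '/'-separated shape as the feats column.
raw = pd.Series(["Grappler/Svirfneblin Magic", "dual wielder", "Grappler"])

# Split each string into a list, then give every list element its own row.
flat = raw.str.split("/").explode().str.title().reset_index(drop=True)

print(flat.value_counts())
```

`reset_index(drop=True)` is there because `explode` repeats the original row label for every element that came from the same row.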

In [470]:
# Now all that's left is to run value counts of the feats series

totalFeatsSeries.value_counts()

# Whoa, grappler sure is popular

Grappler                  68520
War Caster                18200
Svirfneblin Magic         15423
Lucky                     13536
Tough                     10945
                          ...  
Great Weapon Expertise        1
Wood Elf Paragon              1
Chaotic Magic                 1
Blessing Of Bane              1
Son Of Thor                   1
Name: count, Length: 3847, dtype: int64

In [471]:
# Now let's see what the 15 most popular are without the formatting cutting it off with ...

totalFeatsSeries.value_counts().head(15)

# All of these make sense!

Grappler                    68520
War Caster                  18200
Svirfneblin Magic           15423
Lucky                       13536
Tough                       10945
Sharpshooter                10321
Alert                        9866
Observant                    9260
Great Weapon Master          8872
Dual Wielder                 8369
Sentinel                     7807
Mobile                       7049
Shield Master                5589
Resilient (Constitution)     5276
Heavy Armor Master           4534
Name: count, dtype: int64

In [472]:
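# I do gotta go back and see what that character with 103 feats was, though. First, splitting the feats column within the overall dataframe rather than just in rawFeats.
# NOTE: charsWithFeats is a filtered slice of fullSheet, so this assignment raises the SettingWithCopyWarning shown after the output. The split still works; creating charsWithFeats with .copy() would avoid the warning.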

charsWithFeats['feats'] = charsWithFeats['feats'].str.split('/')

print(charsWithFeats['feats'].head(15))

print(max(charsWithFeats['feats'], key=len))


1     [Spell Sniper (Bard, Sorcerer, Warlock), Infer...
6                                        [Dual Wielder]
7                            [Resilient (Constitution)]
8       [Resilient (Dexterity), Ritual Caster (Wizard)]
14                           [Resilient (Intelligence)]
16                                          [Observant]
20                                       [Dual Wielder]
26                                       [Dual Wielder]
29                                           [Grappler]
30                                         [War Caster]
34                                           [Grappler]
35                                              [Actor]
63                                           [Grappler]
67                                      [Shield Master]
80                        [Grappler, Svirfneblin Magic]
Name: feats, dtype: object
['Grappler', 'Alert', 'Dungeon Delver', 'Durable', 'Healer', 'Heavily Armored', 'Inspiring Leader', 'Keen Mind', 'Linguist', 'Mage Sl

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  charsWithFeats['feats'] = charsWithFeats['feats'].str.split('/')
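
The warning above appears because the assignment targets a frame that was produced by filtering another frame. A minimal sketch of the usual way around it, on an invented toy frame, is to take an explicit `.copy()` when filtering:

```python
import pandas as pd

toy = pd.DataFrame({"name": ["A", "B", "C"], "feats": ["Alert/Lucky", None, "Tough"]})

# .copy() makes the filtered frame independent of toy, so assigning into it
# is unambiguous and pandas has no reason to warn.
with_feats = toy[toy["feats"].notnull()].copy()
with_feats["feats"] = with_feats["feats"].str.split("/")

print(with_feats)
print(toy)  # the original strings are untouched
```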


In [473]:
# Calculating the length of each feat list

featLengths = charsWithFeats['feats'].apply(len)
print(featLengths)

# And this kept our indexes, nice

maxFeatCountIndex = featLengths.idxmax()
print(maxFeatCountIndex)


charWithMostFeats = pd.DataFrame(charsWithFeats.loc[maxFeatCountIndex])
charWithMostFeats

#Kyvue Greystorm, what a chad (this doesn't seem like a possible list of feats at level 12 but go off)

1          2
6          1
7          1
8          2
14         1
          ..
1204236    1
1204243    1
1204245    1
1204246    1
1204250    1
Name: feats, Length: 219008, dtype: int64
1112025


Unnamed: 0,1112025
Unnamed: 0,1112025
char_id,1806785
name,Kyvue Greystorm
base_hp,63
stats_1,15
stats_2,15
stats_3,16
stats_4,13
stats_5,11
stats_6,15


In [474]:
# class_other having so many null values is fine, as it's common for characters to not multiclass (especially low-level characters, which we know there are a ton of). The same goes for subclass_other.
# Inventory having a bunch of null values is something we could work with like the feat list above, but as that is beyond the scope of this project for right now, it will most likely just be dropped. 
# The only real question is what to do with the gold values of 0.0, because there are so many. But as stated before it IS plausible that half the characters have a gold value of 0 because of low level players + one-shot campaigns. 

## Irregular Data

Detect outliers in your dataset and handle them as needed. Use code comments to make notes about your thought process.

In [475]:
# Just in case this comes back to bite me, we will initialize a new dataframe that is a copy of fullSheet, with the intent of replacing zero stats with NaN. Using .copy() so that changes made here don't also modify fullSheet.

fullSheetNoZeroStats = fullSheet.copy()
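
Plain assignment binds a second name to the same DataFrame rather than duplicating it, which is why `.copy()` matters when the original should stay untouched. A minimal sketch on an invented one-column frame:

```python
import pandas as pd

original = pd.DataFrame({"stats_1": [0, 15]})

alias = original               # same object under a second name
independent = original.copy()  # separate object with its own data

print(alias is original)
print(independent is original)

# Changing the copy leaves the original alone.
independent.loc[0, "stats_1"] = 99
print(original.loc[0, "stats_1"])
```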

In [476]:
# Now to make the NoZeroStats sheet actually live up to its name

for i in range(len(statNames)):
    fullSheetNoZeroStats[f'stats_{i+1}'] = fullSheetNoZeroStats[f'stats_{i+1}'].replace(0, np.nan)
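
As a sketch of the "convert back to ints later" idea mentioned earlier: pandas has a nullable `Int64` dtype (capital I) that holds whole numbers alongside missing values. The toy frame and the `stat_cols` name below are invented for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "stats_1": [0, 15, 8],
    "stats_2": [12, 0, 10],
    "gold": [0.0, 5.0, 0.0],
})
stat_cols = ["stats_1", "stats_2"]

# Replace zeros with NaN in the stat columns only; gold keeps its zeros.
toy[stat_cols] = toy[stat_cols].replace(0, np.nan)

# Convert the now-float stat columns to nullable integers.
toy = toy.astype({col: "Int64" for col in stat_cols})

print(toy.dtypes)
print(toy)
```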


In [477]:
# Now to verify the loop worked:

fullSheetNoZeroStats['stats_1'].value_counts(dropna=False)

# Perfect, the 0's were replaced with NaN so they will no longer mess with the numbers 

# Regarding gold, I don't think I'll touch it, instead just add in sanity checks excluding 1st and 99th percentiles when calculating anything off of it 

stats_1
8.0     200679
15.0    177714
10.0    149672
NaN     145239
12.0    120819
14.0    105169
13.0     93383
11.0     48666
16.0     39418
9.0      38224
18.0     38151
17.0     25821
7.0       9059
6.0       5565
5.0       2709
3.0       2073
4.0       1696
20.0       102
19.0        44
30.0        11
2.0          6
22.0         6
1.0          5
21.0         4
25.0         4
23.0         4
26.0         4
28.0         3
27.0         1
24.0         1
Name: count, dtype: int64
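
The gold sanity check mentioned in the comment above could look something like this. It is a sketch on invented values, assuming the goal is to filter at calculation time and leave the column itself unchanged.

```python
import pandas as pd

# Invented gold values with one extreme outlier at the end.
gold = pd.Series([0.0, 0.0, 5.0, 10.0, 73.42, 305.0, 522.84, 4951.06, 10305.75, 9_999_999.0])

# quantile() with a list returns both cut-offs at once.
low, high = gold.quantile([0.01, 0.99])

# Keep only values inside the 1st-99th percentile band; gold itself is unchanged.
trimmed = gold[gold.between(low, high)]

print(gold.mean(), trimmed.mean())
```

If a large share of characters really do have 0.0 gold, the 1st percentile will itself be 0, so in practice only the upper cut-off would remove anything.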

In [478]:
# Now to finally rename the stats columns to their proper values. Held off on that until now because it was fun to use {i+1} within the f string for the columns in a for loop

fullSheetNoZeroStats = fullSheetNoZeroStats.rename(columns=statNames)

fullSheetNoZeroStats.head(15)

# Wonderful, they're properly renamed. Going forward, NoZeroStats will be the one worked with

Unnamed: 0.1,Unnamed: 0,char_id,name,base_hp,Strength,Dexterity,Constitution,Intelligence,Wisdom,Charisma,background,race,class_starting,class_starting_level,subclass_starting,class_other,subclass_other,total_level,feats,inventory,date_modified,notes_len,gold
0,0,1,Molster,8,11.0,22.0,11.0,17.0,20.0,12.0,Urban Bounty Hunter,Aarakocra,Artificer,4,,,,4,,Adamantine Chain Shirt/Cloak of Protection/Gog...,2022-09-12T19:13:03.29Z,17,305.0
1,1,3,Prailak,103,12.0,17.0,18.0,14.0,16.0,16.0,Charlatan,Tiefling,Warlock,20,The Great Old One,,,20,"Spell Sniper (Bard, Sorcerer, Warlock)/Inferna...","Bag of Holding/Iron Flask/Studded Leather, +3/...",2021-12-07T21:25:36.9Z,60,0.0
2,2,8,Aurilanax,76,18.0,10.0,16.0,13.0,13.0,16.0,City Watch / Investigator,Bugbear,Paladin,7,Oath of the Crown,,,7,,Shield,2021-01-21T16:50:35Z,0,0.0
3,3,10,Gamndell Banglebon,127,10.0,14.0,16.0,16.0,15.0,18.0,Clan Crafter,Gnome,Warlock,2,,Bard/Cleric/Wizard,School of Illusion,16,,Give/Studded Leather/Dagger/Light Hammer/Ink (...,2020-08-06T16:05:07Z,65,4951.06
4,4,19,Bellek Bouncer,52,10.0,10.0,15.0,12.0,14.0,10.0,Sailor,Half-Orc,Cleric,6,Life Domain,,,6,,Shield/Plate/Mace/Potion of Healing,2020-05-15T04:47:59Z,0,0.0
5,5,44,Yehudi,41,15.0,8.0,14.0,16.0,13.0,16.0,Far Traveler,Firbolg,Monk,5,Way of the Long Death,,,5,,Shield,2022-11-08T02:58:57.507Z,157,5.0
6,6,45,Rockhand,82,18.0,14.0,17.0,7.0,8.0,13.0,Outlander,Goliath,Barbarian,11,Path of the Wild Soul (archived),,,11,Dual Wielder,Amulet of Proof against Detection and Location...,2020-12-03T21:38:23.06Z,354,10305.75
7,7,48,Atherinnyia 'Rinn' Teshurr,62,8.0,14.0,13.0,17.0,12.0,15.0,Noble,Elf,Wizard,13,School of Evocation,Sorcerer,Draconic Bloodline,15,Resilient (Constitution),Circlet of Blasting/Robe of Stars/Potion of He...,2022-10-31T07:40:18.41Z,2765,73.42
8,8,54,Althovion,58,12.0,15.0,15.0,13.0,15.0,16.0,Charlatan,Human,Sorcerer,14,Shadow Magic,,,14,Resilient (Dexterity)/Ritual Caster (Wizard),Bag of Holding/Circlet of Blasting/Cloak of th...,2021-09-21T06:33:28Z,3336,522.84
9,9,59,Veelan Pheer'ii,12,10.0,14.0,14.0,10.0,12.0,13.0,Far Traveler,Genasi,Fighter,5,Champion,Rogue,Thief,8,,"Censer of Controlling Air Elementals/Shield, +...",2018-06-29T15:37:47Z,13,10.0


## Unnecessary Data

Look for the different types of unnecessary data in your dataset and address it as needed. Make sure to use code comments to illustrate your thought process.

In [479]:
# As mentioned before, several columns are irrelevant so it's finally time to drop them. 

fullSheetNoZeroStats = fullSheetNoZeroStats.drop(columns=['Unnamed: 0', 'char_id', 'notes_len', 'date_modified'])

fullSheetNoZeroStats.head()

# I remain undecided on the gold/background columns, so those will wait as well. It's always much easier to simply not include them in future calcs than to drop them now and have to go back several iterations of the dataframe to have them included again

Unnamed: 0,name,base_hp,Strength,Dexterity,Constitution,Intelligence,Wisdom,Charisma,background,race,class_starting,class_starting_level,subclass_starting,class_other,subclass_other,total_level,feats,inventory,gold
0,Molster,8,11.0,22.0,11.0,17.0,20.0,12.0,Urban Bounty Hunter,Aarakocra,Artificer,4,,,,4,,Adamantine Chain Shirt/Cloak of Protection/Gog...,305.0
1,Prailak,103,12.0,17.0,18.0,14.0,16.0,16.0,Charlatan,Tiefling,Warlock,20,The Great Old One,,,20,"Spell Sniper (Bard, Sorcerer, Warlock)/Inferna...","Bag of Holding/Iron Flask/Studded Leather, +3/...",0.0
2,Aurilanax,76,18.0,10.0,16.0,13.0,13.0,16.0,City Watch / Investigator,Bugbear,Paladin,7,Oath of the Crown,,,7,,Shield,0.0
3,Gamndell Banglebon,127,10.0,14.0,16.0,16.0,15.0,18.0,Clan Crafter,Gnome,Warlock,2,,Bard/Cleric/Wizard,School of Illusion,16,,Give/Studded Leather/Dagger/Light Hammer/Ink (...,4951.06
4,Bellek Bouncer,52,10.0,10.0,15.0,12.0,14.0,10.0,Sailor,Half-Orc,Cleric,6,Life Domain,,,6,,Shield/Plate/Mace/Potion of Healing,0.0


## Inconsistent Data

Check for inconsistent data and address any that arises. As always, use code comments to illustrate your thought process.

In [480]:
# The big one here is inconsistent capitalization. While checking names in checkpoint 2, some of the same names showed up with different cases. So, to make sure everything is as uniform as it can be, it's time to run the name strings through .title()

fullSheetNoZeroStats['name'] = fullSheetNoZeroStats['name'].str.title()

# If this took properly, then just like in checkpoint 2 it should not have Test and test both

fullSheetNoZeroStats['name'].value_counts()

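# Perfect, they match. One side effect visible in the output: .title() also capitalizes the letter after an apostrophe (Pheer'ii became Pheer'Ii).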

name
Test                          1792
Bob                           1262
Varis                          487
Rhogar                         465
Steve                          361
                              ... 
Alexander Stonecutter            1
Griv Ungart                      1
Veelan Pheer'Ii                  1
Atherinnyia 'Rinn' Teshurr       1
Rockhand                         1
Name: count, Length: 388581, dtype: int64
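
`str.title()` starts a new capital after any non-letter character, which is why a name with an apostrophe gets an extra capital in the output above. If that matters, `string.capwords` from the standard library is one alternative; the sketch below compares the two on a few lowercase sample strings. It has its own trade-off, as the quoted nickname shows.

```python
import string

names = ["veelan pheer'ii", "TEST", "atherinnyia 'rinn' teshurr"]

# str.title() capitalizes after any non-letter, including apostrophes.
titled = [n.title() for n in names]

# string.capwords() only touches the first character of each whitespace-separated word.
capworded = [string.capwords(n) for n in names]

print(titled)
print(capworded)
```

For matching `Test` against `test`, either approach works; the difference only shows up in names containing punctuation.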

In [481]:
# Splitting background:

fullSheetNoZeroStats['background'] = fullSheetNoZeroStats['background'].str.split('/')

# Running head to check the hypothesis that the split leaves extra spaces, as in City Watch / Investigator:
fullSheetNoZeroStats['background'].head(10)




0           [Urban Bounty Hunter]
1                     [Charlatan]
2    [City Watch ,  Investigator]
3                  [Clan Crafter]
4                        [Sailor]
5                  [Far Traveler]
6                     [Outlander]
7                         [Noble]
8                     [Charlatan]
9                  [Far Traveler]
Name: background, dtype: object

In [None]:
# Quick info to double check amount of non-null -- with 1.2m this still means 300k empty rows or so

fullSheetNoZeroStats['background'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 1204252 entries, 0 to 1204251
Series name: background
Non-Null Count   Dtype 
--------------   ----- 
898870 non-null  object
dtypes: object(1)
memory usage: 9.2+ MB


In [506]:
# Space verified, time to strip. However, since there are null values, we have to account for them. The easiest way to do that is to use isinstance to check if the value is a list

for i in range(len(fullSheetNoZeroStats['background'])):
    if isinstance(fullSheetNoZeroStats['background'][i], list):
        for x in range(len(fullSheetNoZeroStats['background'][i])):
            fullSheetNoZeroStats['background'][i][x] = fullSheetNoZeroStats['background'][i][x].strip().title()
            
# (it took me unironically like two and a half hours of trying things before I realized that isinstance is something that could be used. Was doing things like running loc on notnull, etc, and kept running into all kinds of syntax errors)
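
The same `isinstance` idea can be wrapped in a small helper and handed to `Series.apply`, which avoids indexing into the dataframe on every iteration. This is a sketch only: `clean_list` is a hypothetical name, and the toy Series is invented to include one null.

```python
import numpy as np
import pandas as pd

def clean_list(value):
    """Strip and title-case each string in a list; pass anything else through unchanged."""
    if isinstance(value, list):
        return [item.strip().title() for item in value]
    return value

# Toy column with one multi-value entry, one null, and one lowercase entry.
toy = pd.Series(["City Watch / Investigator", np.nan, "sailor"]).str.split("/")

cleaned = toy.apply(clean_list)
print(cleaned)
```

Because the helper takes a single value, the same function could be reused for every list-style column instead of repeating the loop.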


In [507]:
# Running head to verify. Specifically, the spaces around City Watch and Investigator should be gone, while the NaN in index 13 should remain

fullSheetNoZeroStats['background'].head(15)

# Thank F***, that took so long

0          [Urban Bounty Hunter]
1                    [Charlatan]
2     [City Watch, Investigator]
3                 [Clan Crafter]
4                       [Sailor]
5                 [Far Traveler]
6                    [Outlander]
7                        [Noble]
8                    [Charlatan]
9                 [Far Traveler]
10                [Clan Crafter]
11                   [Charlatan]
12                   [Folk Hero]
13                           NaN
14                   [Gladiator]
Name: background, dtype: object

In [495]:
# Thankfully, race by identity cannot be a list (at least, in the context of dnd because half races are "half-elf", etc.). So at least there's that. 

fullSheetNoZeroStats['race'] = fullSheetNoZeroStats['race'].str.strip().str.title()

In [None]:
# Quick head to check

fullSheetNoZeroStats['race'].value_counts().head(25)

race
Human                260468
Elf                  194312
Half-Elf             115163
Dwarf                 89961
Tiefling              89939
Dragonborn            87207
Halfling              60419
Half-Orc              54909
Genasi                48171
Gnome                 44800
Aasimar               43746
Goliath               43105
Aarakocra             33111
Tabaxi                 7092
Tortle                 4155
Firbolg                3145
Kenku                  3012
Lizardfolk             2493
Goblin                 2357
Yuan-Ti Pureblood      1901
Bugbear                1526
Feral Tiefling         1483
Kobold (Archived)      1431
Triton                 1402
Orc (Archived)         1049
Name: count, dtype: int64

In [504]:
# We know from running the unique in checkpoint 2 that all class_starting are correct, so let's move on to subclass_starting. 

fullSheetNoZeroStats['subclass_starting'] = fullSheetNoZeroStats['subclass_starting'].str.split('/')




In [508]:
# Now to run the same loop, stripping unnecessary spaces and setting text case

for i in range(len(fullSheetNoZeroStats['subclass_starting'])):
    if isinstance(fullSheetNoZeroStats['subclass_starting'][i], list):
        for x in range(len(fullSheetNoZeroStats['subclass_starting'][i])):
            fullSheetNoZeroStats['subclass_starting'][i][x] = fullSheetNoZeroStats['subclass_starting'][i][x].strip().title()

In [509]:
# Now class_other

fullSheetNoZeroStats['class_other'] = fullSheetNoZeroStats['class_other'].str.split('/')

In [510]:
for i in range(len(fullSheetNoZeroStats['class_other'])):
    if isinstance(fullSheetNoZeroStats['class_other'][i], list):
        for x in range(len(fullSheetNoZeroStats['class_other'][i])):
            fullSheetNoZeroStats['class_other'][i][x] = fullSheetNoZeroStats['class_other'][i][x].strip().title()

In [511]:
# subclass_other

fullSheetNoZeroStats['subclass_other'] = fullSheetNoZeroStats['subclass_other'].str.split('/')

In [None]:
# subclass_other strip/title

for i in range(len(fullSheetNoZeroStats['subclass_other'])):
    if isinstance(fullSheetNoZeroStats['subclass_other'][i], list):
        for x in range(len(fullSheetNoZeroStats['subclass_other'][i])):
            fullSheetNoZeroStats['subclass_other'][i][x] = fullSheetNoZeroStats['subclass_other'][i][x].strip().title()

In [513]:
# Because feats haven't had this done in this dataframe, it's time to do it with them as well

fullSheetNoZeroStats['feats'] = fullSheetNoZeroStats['feats'].str.split('/')

In [None]:
# feats strip/title

for i in range(len(fullSheetNoZeroStats['feats'])):
    if isinstance(fullSheetNoZeroStats['feats'][i], list):
        for x in range(len(fullSheetNoZeroStats['feats'][i])):
            fullSheetNoZeroStats['feats'][i][x] = fullSheetNoZeroStats['feats'][i][x].strip().title()

In [None]:
# And finally inventory. This will probably be a monster one to separate but we might as well finish it. 

fullSheetNoZeroStats['inventory'] = fullSheetNoZeroStats['inventory'].str.split('/')

In [None]:
# Now the final strip/title

# for i in range(len(fullSheetNoZeroStats['inventory'])):
#    if isinstance(fullSheetNoZeroStats['inventory'][i], list):
#        for x in range(len(fullSheetNoZeroStats['inventory'][i])):
#            fullSheetNoZeroStats['inventory'][i][x] = fullSheetNoZeroStats['inventory'][i][x].strip().title()

# NOTE: THIS CELL COMMENTED OUT BECAUSE EVEN ON MY PC IT TOOK TWO AND A HALF MINUTES TO RUN. SO IT'S TO SAVE THE GRADER'S PC
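
One possible way to cut the runtime is to clean the strings before splitting, so everything stays in vectorized `.str` calls and nulls pass through on their own. This is a sketch on invented values, not timed against the real column. The `regex=True` argument assumes pandas 1.4 or newer.

```python
import numpy as np
import pandas as pd

# Toy inventory column with stray spaces, mixed case, and one null.
inv = pd.Series([" shield / plate/Mace ", np.nan, "Bag of Holding"])

# Trim and title-case the whole string first, then split on '/' together with
# any whitespace around it, so no per-item strip is needed afterwards.
cleaned = inv.str.strip().str.title().str.split(r"\s*/\s*", regex=True)

print(cleaned)
```

Title-casing before the split gives the same result as title-casing each piece, because `.title()` already restarts capitalization after spaces and slashes.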

## Summarize Your Results

Make note of your answers to the following questions.

1. Did you find all four types of dirty data in your dataset?
# I did, yes. There were plenty of missing values that I had to account for in some of the columns, there were crazy outliers in the gold count, several of the columns were not relevant for what I needed, and then at the end some of the strings (in particular names) did not have the same capitalization. Additionally, there were a ton of strings that held '/'-separated lists of values, and they needed to be split properly into actual lists. 
2. Did the process of cleaning your data give you new insights into your dataset?
# Yes, in particular playing around with the feats was very enlightening in terms of how the dataset functions as a whole, and just how *many* data points there can be for one character.
3. Is there anything you would like to make note of when it comes to manipulating the data and making visualizations?
# Nothing that I haven't already noted in the comments, to be honest. The big one is that now all the columns that contain lists of data are properly split and formatted as lists within the series rows, so at any point I can put them into a single list of all values to get counts, like totalFeats above. From there it will be very easy to get counts to visualize things with, and I'd imagine it also makes groupby easier to work with if need be. 
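
As a sketch of how the list columns could feed a groupby later, `DataFrame.explode` turns each list element into its own row while repeating the other columns. The toy frame below is invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "class_starting": ["Fighter", "Wizard", "Fighter"],
    "feats": [["Grappler", "Alert"], ["War Caster"], ["Grappler"]],
})

# explode() repeats the other columns for every element of the list column.
long_form = toy.explode("feats")

# Count how often each feat appears within each starting class.
feat_counts = long_form.groupby("class_starting")["feats"].value_counts()

print(feat_counts)
```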