# 3 - Consistent Strings
In this third step I'll show you how to fix inconsistent strings in the <code>category</code> column.

Each step has three parts: a comment describing the action, the Python code, and the results.

In [7]:
import pandas as pd 

df = pd.read_json("customer_data.json", convert_dates=False)
df.head()

Unnamed: 0,amount,category,city,customer_id,date,frequently_bought_together,lat_lon,purchase,related_items,state,zip_code
0,24.64,household,Chicago,100191,1-Jan-14,towels,"41.86,-87.619",soap,towels,IL,60605
1,35.0,clothing,Dallas,100199,2-Jan-14,sandals,"32.924,-96.547",shorts,belts,TX,75089
2,89.72,outdoor,Philadelphia,100170,3-Jan-14,lawn bags,"40.002,-75.118",lawn_mower,shovels,PA,19019
3,51.32,electronics,Chicago,100124,4-Jan-14,headphones,"41.88,-87.63",laptop,headphones,IL,60603
4,81.75,outdoor,Philadelphia,100173,5-Jan-14,sponge,"39.953,-75.166",car wash,sponge,PA,19102


Let's take a quick look at the distinct values in the <code>category</code> column using the <code>.unique()</code> method.

In [8]:
print(df["category"].unique())

['household' 'clothing' 'outdoor' 'electronics' 'appliances' 'house'
 'elect^ronics' '^electro$nics' 'outdo&or' 'household_' '?out$door' 'elec'
 'app' 'house_hold' '%appliances' '\\appliances' 'electronic']


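Not all of the strings have consistent formats. Some contain stray characters (<code>elect^ronics</code>, <code>%appliances</code>), some are abbreviated (<code>elec</code>, <code>app</code>), and some have extra underscores (<code>house_hold</code>).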
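Before cleaning, it can help to see how often each variant occurs, since <code>.unique()</code> only lists the distinct values. The sketch below uses <code>.value_counts()</code> on a small hypothetical Series named <code>sample</code>. The values are taken from the output above, but the counts are invented for illustration and are not from <code>customer_data.json</code>.

```python
import pandas as pd

# Hypothetical stand-in for df["category"]: the values appear in the output
# above, but how often each one occurs here is made up for illustration.
sample = pd.Series(
    ["household", "house", "household_", "household", "elec", "electronics"]
)

# Count the occurrences of each distinct string, most frequent first
counts = sample.value_counts()
print(counts)
```

On the real data, the same call would be <code>df["category"].value_counts()</code>.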

The <code>household</code> category appears in three inconsistent forms: <code>["household_", "house", "house_hold"]</code>.

In [9]:
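inconsistent_format = ["household_", "house", "house_hold"]

# Boolean mask: True for every row whose category is one of the inconsistent forms
mask = df["category"].isin(inconsistent_format)

# Overwrite only the matching rows; unlike a manual row counter, this works for any index
df.loc[mask, "category"] = "household"

print(df["category"].unique())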

['household' 'clothing' 'outdoor' 'electronics' 'appliances'
 'elect^ronics' '^electro$nics' 'outdo&or' '?out$door' 'elec' 'app'
 '%appliances' '\\appliances' 'electronic']


We now see that we have one consistent format for <code>household</code>.

Use this same process to clean the other inconsistent strings.
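As one possible way to handle the remaining values, here is a minimal sketch that works in two passes: first strip every character that is not a lowercase letter, then map the leftover abbreviations to their full names. The <code>sample</code> DataFrame and the <code>fixes</code> dictionary are hypothetical names introduced for this example. The sketch also assumes that valid category names contain only lowercase letters, which holds for the values shown above but may not hold for other data.

```python
import pandas as pd

# Hypothetical sample holding the remaining inconsistent values from the
# output above ("\\appliances" is a single backslash followed by "appliances").
sample = pd.DataFrame({"category": [
    "electronics", "elect^ronics", "^electro$nics", "elec", "electronic",
    "outdoor", "outdo&or", "?out$door",
    "appliances", "app", "%appliances", "\\appliances",
    "household", "clothing",
]})

# Pass 1: remove every character that is not a lowercase letter.
# Assumption: no valid category contains digits, spaces, or punctuation.
stripped = sample["category"].str.replace(r"[^a-z]", "", regex=True)

# Pass 2: map abbreviations and near-misses to the full category name.
fixes = {
    "elec": "electronics",
    "electronic": "electronics",
    "app": "appliances",
}
sample["category"] = stripped.replace(fixes)

print(sample["category"].unique())
```

The first pass would also turn <code>household_</code> and <code>house_hold</code> into <code>household</code>, so with this approach only <code>house</code> would need its own entry in <code>fixes</code>.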