# Inspect r/loseit Challenge Data

Now we can read in the already cleaned file. If you don't have the cleaned data, you will need to run [Find and Clean Loseit Data](clean_loseit_challenge_data.ipynb).

In [43]:
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import re
import matplotlib.pyplot as plt
import seaborn as sns
import os
color = sns.color_palette()
from IPython.display import display, HTML

%matplotlib inline

We begin by loading the dataset and checking how many entries it contains.

In [57]:
big_df = pd.read_csv('./data/cleaned_and_combined_loseit_challenge_data.csv', index_col=0)
big_df['NSV Text'] = big_df['NSV Text'].astype(str).replace('nan', '')
len(big_df)

8873

Now we want to start looking at some of the statistics for the participants.

In [58]:
display(big_df.sort_values(by='Age').head())
display(big_df.sort_values(by='Age', ascending=False).head())

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
7050,2016-07,spartan117g,Sunshine,The Summer Challenge,1.0,Male,31.0,165.0,181.0,171.0,132.41,0,0,0,,10.0,178.0,3.0,1.657459,30.0
2123,2017-06,fattyteen12,Deadpool,Super Hero Summer Challenge,13.0,Unknown,58.0,125.0,90.3,90.0,19.43,1,1,1,To begin a healthy life style,0.3,88.0,2.3,2.547065,766.666667
7770,2017-08,thehealthymt,Terminator,Scifi Movies Challenge,14.0,Female,67.0,249.9,238.0,228.0,37.3,1,0,0,Fit in an XL shirt comfortably,10.0,230.0,8.0,3.361345,80.0
4751,2017-01,cantthinkofanything3,Snake,Rebirth Challenge,14.0,Male,65.0,215.0,207.5,200.0,34.53,1,0,0,Be able to control the amount of food I eat wi...,7.5,200.4,7.1,3.421687,94.666667
277,2018-10,thehealthymt,Waluigi,Super Mario Brothers Super Challenge,14.0,Female,67.0,249.9,189.1,185.0,30.7,1,0,0,Fit into size 14 jeans,4.1,176.5,12.6,6.663141,307.317073


Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
1956,2018-03,amiathrowawaytotally,Yeti,Mythical Creatures Spring Challenge,100.0,Other,67.0,299.0,299.0,290.0,46.82,0,0,0,,9.0,290.0,9.0,3.010033,100.0
6380,2018-07,amiathrowawaytotally,Shadowfax,Lord Of The Rings Summer Challenge,100.0,Other,67.0,200.0,200.0,190.0,31.32,1,0,0,"Drink more water, take 7000+ steps per day.",10.0,196.5,3.5,1.75,35.0
6025,2018-07,mcculloughronnie75,Shadowfax,Lord Of The Rings Summer Challenge,76.0,Female,61.0,175.0,166.0,155.0,31.36,1,0,1,So I can fit into smaller and cuter clothes,11.0,162.0,4.0,2.409639,36.363636
2063,2018-03,patchet44,Cerberus,Mythical Creatures Spring Challenge,76.0,Female,68.0,208.0,163.0,157.0,24.78,1,0,0,better health,6.0,159.9,3.1,1.90184,51.666667
575,2018-10,mcculloughronnie75,Yoshi,Super Mario Brothers Super Challenge,75.0,Female,61.0,175.0,162.0,145.0,29.7,1,0,1,Fit into salsa dress,17.0,156.0,6.0,3.703704,35.294118


Here we can see outliers at both the young and old ends of the age range. I will check whether these users ever entered a different age in another challenge; if not, I will exclude them from any age analysis.

In [59]:
display(big_df[big_df.Username == 'amiathrowawaytotally'])
display(big_df[big_df.Username == 'spartan117g'])

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
1956,2018-03,amiathrowawaytotally,Yeti,Mythical Creatures Spring Challenge,100.0,Other,67.0,299.0,299.0,290.0,46.82,0,0,0,,9.0,290.0,9.0,3.010033,100.0
6380,2018-07,amiathrowawaytotally,Shadowfax,Lord Of The Rings Summer Challenge,100.0,Other,67.0,200.0,200.0,190.0,31.32,1,0,0,"Drink more water, take 7000+ steps per day.",10.0,196.5,3.5,1.75,35.0


Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
480,2018-10,spartan117g,Mario,Super Mario Brothers Super Challenge,21.0,Male,67.0,196.0,170.4,163.0,25.9,1,0,0,Be more fit,7.4,161.0,9.4,5.516432,127.027027
1841,2018-03,spartan117g,Phoenix,Mythical Creatures Spring Challenge,21.0,Male,67.0,196.0,163.6,160.0,25.62,1,0,0,Feel better,3.6,162.0,1.6,0.977995,44.444444
2624,2017-06,spartan117g,Batman,Super Hero Summer Challenge,20.0,Unknown,66.5,196.0,186.0,175.0,29.41,1,0,0,"Better fitness, I'm training to go to a road t...",11.0,182.0,4.0,2.150538,36.363636
3203,2018-01,spartan117g,Kākāriki,New New Year New Goals Challenge,21.0,Male,67.0,196.0,169.0,163.0,26.47,1,0,0,To be fit,6.0,164.0,5.0,2.95858,83.333333
7050,2016-07,spartan117g,Sunshine,The Summer Challenge,1.0,Male,31.0,165.0,181.0,171.0,132.41,0,0,0,,10.0,178.0,3.0,1.657459,30.0
9412,2017-10,spartan117g,Shark,Autumn Animal Challenge,21.0,Male,67.0,176.0,176.0,168.0,27.56,1,1,0,Get fit,8.0,167.7,8.3,4.715909,103.75
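
The same check can be done for every flagged user at once rather than one username at a time. The following is a minimal sketch, not part of the original notebook: the helper name `other_ages_for_flagged_users` and the age bounds of 10 and 99 are assumptions chosen for illustration, and the sample frame only reuses a few values from the tables above.

```python
import pandas as pd

def other_ages_for_flagged_users(df, min_age=10, max_age=99):
    """Return every age reported by users who have at least one implausible age."""
    # Flag rows whose age falls outside the (assumed) plausible range.
    implausible = (df['Age'] < min_age) | (df['Age'] > max_age)
    flagged_users = df.loc[implausible, 'Username'].unique()
    # Collect all entries from those users, including their plausible ones.
    subset = df[df['Username'].isin(flagged_users)]
    return subset.groupby('Username')['Age'].apply(lambda s: sorted(s.unique()))

# Small illustrative sample built from values shown in the tables above.
sample = pd.DataFrame({
    'Username': ['spartan117g', 'spartan117g', 'spartan117g',
                 'amiathrowawaytotally', 'amiathrowawaytotally', 'patchet44'],
    'Age': [1.0, 20.0, 21.0, 100.0, 100.0, 76.0],
})
ages = other_ages_for_flagged_users(sample)
print(ages)
```

A user whose list contains plausible ages alongside the outlier (like spartan117g) is a candidate for correction, while a user with only implausible ages has to be excluded from the age analysis.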


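OK, so we will need to ignore the age 100 entries, but we can correct the age 1 entry, as well as its height of 31 inches. spartan117g was 20 in the 2017-06 challenge and 67 inches tall in most other entries, so for the 2016-07 entry we use an age of 19 and a height of 67 inches.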

In [60]:
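# Assign the result back rather than using replace(inplace=True) on a column
# attribute, which newer pandas versions may not propagate to big_df.
big_df['Age'] = big_df['Age'].replace(1.0, 19.0)
big_df['Height'] = big_df['Height'].replace(31.0, 67.0)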
display(big_df[big_df.Username == 'spartan117g'])
age_df = big_df[big_df.Age < 100]

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
480,2018-10,spartan117g,Mario,Super Mario Brothers Super Challenge,21.0,Male,67.0,196.0,170.4,163.0,25.9,1,0,0,Be more fit,7.4,161.0,9.4,5.516432,127.027027
1841,2018-03,spartan117g,Phoenix,Mythical Creatures Spring Challenge,21.0,Male,67.0,196.0,163.6,160.0,25.62,1,0,0,Feel better,3.6,162.0,1.6,0.977995,44.444444
2624,2017-06,spartan117g,Batman,Super Hero Summer Challenge,20.0,Unknown,66.5,196.0,186.0,175.0,29.41,1,0,0,"Better fitness, I'm training to go to a road t...",11.0,182.0,4.0,2.150538,36.363636
3203,2018-01,spartan117g,Kākāriki,New New Year New Goals Challenge,21.0,Male,67.0,196.0,169.0,163.0,26.47,1,0,0,To be fit,6.0,164.0,5.0,2.95858,83.333333
7050,2016-07,spartan117g,Sunshine,The Summer Challenge,19.0,Male,67.0,165.0,181.0,171.0,132.41,0,0,0,,10.0,178.0,3.0,1.657459,30.0
9412,2017-10,spartan117g,Shark,Autumn Animal Challenge,21.0,Male,67.0,176.0,176.0,168.0,27.56,1,1,0,Get fit,8.0,167.7,8.3,4.715909,103.75
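
Note that the corrected row still shows a Starting BMI of 132.41, which corresponds to the old 31-inch height. A minimal sketch of recomputing it follows, assuming the dataset uses the standard imperial formula, BMI = 703 × weight (lb) / height (in)², which does reproduce the 132.41 value. The helper name `imperial_bmi` is hypothetical.

```python
def imperial_bmi(weight_lb, height_in):
    """Body mass index from pounds and inches (standard imperial formula)."""
    return 703.0 * weight_lb / height_in ** 2

# Starting weight of 181 lbs with the original 31-inch height
# and with the corrected 67-inch height.
old_bmi = imperial_bmi(181.0, 31.0)
new_bmi = imperial_bmi(181.0, 67.0)
print(round(old_bmi, 2), round(new_bmi, 2))
```

Under that assumption, the corrected value could be written back to the `Starting BMI` column for this row in the same way as the age and height.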


The next component we will look at is height.

In [61]:
display(big_df.sort_values(by='Height').head())
display(big_df.sort_values(by='Height', ascending=False).head())

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
5719,2018-07,myfamilyworriessome,Frodo And Sam,Lord Of The Rings Summer Challenge,27.0,Female,52.0,164.2,150.6,140.0,39.15,1,0,0,Fit into my little blue dress for a wedding in...,10.6,144.2,6.4,4.249668,60.377358
4346,2017-01,quibaba,Phoenix,Rebirth Challenge,36.0,Female,52.0,242.0,221.0,205.0,57.46,1,0,0,Size large scrub pants!,16.0,217.0,4.0,1.809955,25.0
6764,2016-07,kaciedietrich,Junebug,The Summer Challenge,31.0,Female,52.0,168.0,158.0,145.0,41.08,1,1,1,Lower pant/dress size by 1 full size,13.0,153.0,5.0,3.164557,38.461538
6317,2018-07,mossy-pants,2nd Breakfast,Lord Of The Rings Summer Challenge,21.0,Female,52.0,166.8,157.2,152.0,40.87,1,0,1,increase cardio fitness score on fitbit,5.2,152.4,4.8,3.053435,92.307692
4739,2017-01,g-rain,Snake,Rebirth Challenge,25.0,Female,52.0,222.0,186.0,176.0,48.36,1,1,1,Improving strength and stamina,10.0,174.0,12.0,6.451613,120.0


Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
9324,2017-10,mercer022,Lynx,Autumn Animal Challenge,22.0,Male,189.0,216.0,216.0,200.0,4.25,1,1,0,To start jogging,16.0,209.0,7.0,3.240741,43.75
9692,2017-10,skydweller-entropist,Panda,Autumn Animal Challenge,21.0,Male,82.6,216.0,216.0,200.0,22.26,0,1,0,,16.0,220.0,-4.0,-1.851852,-25.0
8258,2016-04,sportsfreaktony,Duckling,Spring Into Summer Challenge,25.0,Male,82.0,250.0,230.0,200.0,24.05,0,0,0,,30.0,219.0,11.0,4.782609,36.666667
4570,2017-01,nfaber06,Phoenix,Rebirth Challenge,28.0,Male,82.0,430.0,398.0,375.0,41.61,1,0,0,Fit in old clothes,23.0,384.0,14.0,3.517588,60.869565
1487,2018-03,jeffles2,Dragon,Mythical Creatures Spring Challenge,42.0,Male,81.0,275.0,245.0,240.0,26.25,1,0,0,Stick to elimination diet,5.0,224.6,20.4,8.326531,408.0


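The 189-inch height for mercer022 is clearly an input error, so we look at that user's other entries.

In [63]: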
big_df[big_df.Username == 'mercer022']

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
815,2018-10,mercer022,Mario,Super Mario Brothers Super Challenge,23.0,Male,74.0,249.1,229.2,216.0,28.8,1,0,0,To start jogging,13.2,220.4,8.8,3.839442,66.666667
2487,2017-06,mercer022,Deadpool,Super Hero Summer Challenge,22.0,Unknown,74.4,260.0,229.2,220.0,29.97,1,0,0,To start jogging,9.2,224.8,4.4,1.919721,47.826087
8889,2016-04,mercer022,Seedling,Spring Into Summer Challenge,21.0,Male,74.0,286.6,260.0,242.0,33.38,0,0,0,,18.0,250.0,10.0,3.846154,55.555556
9324,2017-10,mercer022,Lynx,Autumn Animal Challenge,22.0,Male,189.0,216.0,216.0,200.0,4.25,1,1,0,To start jogging,16.0,209.0,7.0,3.240741,43.75


In [64]:
big_df['Height'] = big_df['Height'].replace(189.0, 74.0)
# We also need to fix the BMI: for 74 inches and 216 lbs, BMI = 703 * 216 / 74**2, about 27.7
big_df['Starting BMI'] = big_df['Starting BMI'].replace(4.25, 27.7)
big_df[big_df.Username == 'mercer022']

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
815,2018-10,mercer022,Mario,Super Mario Brothers Super Challenge,23.0,Male,74.0,249.1,229.2,216.0,28.8,1,0,0,To start jogging,13.2,220.4,8.8,3.839442,66.666667
2487,2017-06,mercer022,Deadpool,Super Hero Summer Challenge,22.0,Unknown,74.4,260.0,229.2,220.0,29.97,1,0,0,To start jogging,9.2,224.8,4.4,1.919721,47.826087
8889,2016-04,mercer022,Seedling,Spring Into Summer Challenge,21.0,Male,74.0,286.6,260.0,242.0,33.38,0,0,0,,18.0,250.0,10.0,3.846154,55.555556
9324,2017-10,mercer022,Lynx,Autumn Animal Challenge,22.0,Male,74.0,216.0,216.0,200.0,27.7,1,1,0,To start jogging,16.0,209.0,7.0,3.240741,43.75
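
One plausible explanation for the 189 value is that the height was entered in centimeters: 189 cm is about 74.4 inches, which matches this user's 2017-06 entry. The sketch below shows how such values could be converted in bulk instead of being replaced one at a time. It is an alternative to the manual fix above, not what the notebook does, and both the helper name `convert_cm_heights` and the 96-inch threshold are assumptions.

```python
import pandas as pd

CM_PER_INCH = 2.54

def convert_cm_heights(heights, max_plausible_inches=96.0):
    """Treat heights above the threshold as centimeters and convert them to inches."""
    # Series.where keeps values where the condition holds and swaps in the
    # converted value everywhere else.
    return heights.where(heights <= max_plausible_inches, heights / CM_PER_INCH)

# Heights taken from the tables above; only 189.0 exceeds the threshold.
heights = pd.Series([74.0, 74.4, 189.0, 82.6])
fixed = convert_cm_heights(heights)
print(fixed.round(1).tolist())
```

The threshold matters: set too low, it would also convert very tall but genuine entries such as the 82-inch heights in the table.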


The next stat we look at is the total weight loss during the challenge.

In [65]:
display(big_df.sort_values(by='Total Challenge Loss').head(15))
display(big_df.sort_values(by='Total Challenge Loss', ascending=False).head(5))

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
3726,2018-01,lemasterofswag,Teacup Pig,New New Year New Goals Challenge,21.0,Female,59.8,124.6,104.1,100.3,20.46,1,1,1,See my shoulder bones,3.8,219.97,-115.87,-111.306436,-3049.210526
1789,2018-03,kivotheginger,Yeti,Mythical Creatures Spring Challenge,21.0,Male,72.0,270.0,164.0,245.0,22.24,1,1,0,Workout 4 times per week,-81.0,268.0,-104.0,-63.414634,128.395062
231,2018-10,monkey_doodoo,Yoshi,Super Mario Brothers Super Challenge,42.0,Female,63.0,230.0,197.1,188.0,35.1,1,1,0,"for jeans to be comfy, feel better",9.1,296.5,-99.4,-50.431253,-1092.307692
3914,2018-01,avoidsummer,Turtle,New New Year New Goals Challenge,33.0,Female,65.0,314.0,289.0,280.0,48.09,1,0,0,To visit the gym on a regular basis and do a m...,9.0,384.0,-95.0,-32.871972,-1055.555556
1596,2018-03,raelinxovern,Chupacabra,Mythical Creatures Spring Challenge,23.0,Male,74.0,420.0,386.8,365.0,49.66,1,1,0,Not have to rest hands on gut while on the phone,21.8,469.0,-82.2,-21.251293,-377.06422
6606,2016-07,rtriv85,Butterfly,The Summer Challenge,31.0,Female,65.0,164.0,158.5,139.0,26.37,1,0,1,Start wearing my size 8 & size 10 dresses agai...,19.5,240.0,-81.5,-51.419558,-417.948718
8313,2016-04,jennyy1,Fawn,Spring Into Summer Challenge,26.0,Female,59.0,189.0,159.4,150.0,32.19,1,1,1,Loss of belly fat (post preg),9.4,185.5,-26.1,-16.373902,-277.659574
7419,2017-08,bonsai1001,Alien,Scifi Movies Challenge,29.0,Female,67.0,180.0,139.0,129.0,21.8,1,1,1,Losing a jeans size,10.0,158.0,-19.0,-13.669065,-190.0
4335,2017-01,cattipotato,Phoenix,Rebirth Challenge,22.0,Female,63.0,250.0,220.2,215.0,39.0,1,0,0,Gym 4x/week,5.2,238.0,-17.8,-8.08356,-342.307692
8222,2016-04,inputzero,Duckling,Spring Into Summer Challenge,26.0,Male,69.0,280.0,188.0,175.0,27.76,1,1,1,To fit my old clothea,13.0,205.0,-17.0,-9.042553,-130.769231


Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
1480,2018-03,fralance,Chupacabra,Mythical Creatures Spring Challenge,18.0,Male,71.0,250.0,248.4,230.0,34.64,1,1,0,To get back on track,18.4,111.3,137.1,55.193237,745.108696
1676,2018-03,rainishamy,Unicorn,Mythical Creatures Spring Challenge,44.0,Female,69.0,340.0,309.8,292.0,45.74,1,1,0,need smaller pants,17.8,197.6,112.2,36.216914,630.337079
1322,2018-03,mandatech,Phoenix,Mythical Creatures Spring Challenge,34.0,Female,66.0,235.0,221.3,215.0,35.71,1,0,0,Zip the boots again,6.3,115.9,105.4,47.627655,1673.015873
360,2018-10,katamaja,Waluigi,Super Mario Brothers Super Challenge,31.0,Female,65.0,187.0,165.3,159.0,28.2,1,1,0,"run 10k, log every day,",6.3,72.4,92.9,56.200847,1474.603175
1301,2018-03,alakazam1111,Dragon,Mythical Creatures Spring Challenge,17.0,Female,59.0,167.0,141.5,135.0,28.58,0,0,0,,6.5,64.5,77.0,54.416961,1184.615385


So we need to ignore the entries that report losing more than 50 lbs (at least the 5 shown above) and the 7 entries that report gaining more than 20 lbs during the challenge.

In [66]:
loss_df = big_df[big_df['Total Challenge Loss'] < 50]
loss_df = loss_df[loss_df['Total Challenge Loss'] > -20]
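
Before dropping rows, it can help to see how many entries the thresholds exclude. This is a minimal sketch using the same bounds as the cell above; the helper name `split_by_loss_bounds` is hypothetical, and the sample frame reuses a few values from the tables.

```python
import pandas as pd

def split_by_loss_bounds(df, max_loss=50.0, max_gain=20.0):
    """Split rows into plausible and implausible total challenge losses."""
    loss = df['Total Challenge Loss']
    # Same conditions as the filter above, combined into a single mask.
    plausible = (loss < max_loss) & (loss > -max_gain)
    return df[plausible], df[~plausible]

# Small illustrative sample built from values shown in the tables above.
sample = pd.DataFrame({
    'Username': ['lemasterofswag', 'bonsai1001', 'spartan117g',
                 'fralance', 'alakazam1111'],
    'Total Challenge Loss': [-115.87, -19.0, 9.4, 137.1, 77.0],
})
kept, dropped = split_by_loss_bounds(sample)
print(len(kept), len(dropped))
```

Rows with a missing loss value end up in the dropped group, which matches the behavior of the two-step filter.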

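Next we look at the challenge goal loss, the difference between each participant's starting weight and challenge goal weight.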

In [67]:
display(big_df.sort_values(by='Challenge Goal Loss').head(5))
display(big_df.sort_values(by='Challenge Goal Loss', ascending=False).head(5))

Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
5677,2018-07,schrodinger_dog,Rivendell,Lord Of The Rings Summer Challenge,27.0,Male,70.0,318.0,254.0,345.0,36.44,1,0,0,Go to the gym 5 times a week,-91.0,243.2,10.8,4.251969,-11.868132
1789,2018-03,kivotheginger,Yeti,Mythical Creatures Spring Challenge,21.0,Male,72.0,270.0,164.0,245.0,22.24,1,1,0,Workout 4 times per week,-81.0,268.0,-104.0,-63.414634,128.395062
8101,2016-04,dubmevertigo,Daffodil,Spring Into Summer Challenge,21.0,Female,65.7,240.0,175.0,191.0,28.5,1,1,1,To fit comfortably in size 14,-16.0,175.0,0.0,0.0,-0.0
8912,2016-04,shcamannon,Seedling,Spring Into Summer Challenge,25.0,Female,67.0,185.0,150.0,166.0,23.49,1,1,0,Less back pain,-16.0,154.0,-4.0,-2.666667,25.0
8460,2016-04,thats_ridiculous,Hayfever,Spring Into Summer Challenge,28.0,Female,64.0,255.0,210.0,224.0,36.04,1,0,1,"Fit comfortably into my ""skinny"" jeans",-14.0,218.0,-8.0,-3.809524,57.142857


Unnamed: 0,Timestamp,Username,Team,Challenge,Age,Gender,Height,Highest Weight,Starting Weight,Challenge Goal Weight,Starting BMI,Has NSV,Has Food Tracker,Has Activity Tracker,NSV Text,Challenge Goal Loss,Final Weight,Total Challenge Loss,Challenge Percentage Lost,Percent of Challenge Goal
4133,2017-01,chelle1976,Monarch,Rebirth Challenge,40.0,Female,65.0,215.0,204.0,15.0,33.94,1,1,0,To be able to run longer and with greater ease,189.0,201.0,3.0,1.470588,1.587302
8137,2016-04,misskateykates,Daffodil,Spring Into Summer Challenge,28.0,Female,66.0,199.9,198.0,20.0,31.95,1,1,1,to feel comfortable in shorts,178.0,182.2,15.8,7.979798,8.876404
8598,2016-04,stikki_lawndart,Ladybug,Spring Into Summer Challenge,25.0,Male,67.0,323.0,320.0,180.0,50.11,1,0,1,Stick to the habits I that help me get healthy.,140.0,302.0,18.0,5.625,12.857143
8543,2016-04,labirynthgrl,Ladybug,Spring Into Summer Challenge,25.0,Female,67.0,281.0,279.0,150.0,43.69,1,1,0,Being able to do an invert on pole,129.0,262.6,16.4,5.878136,12.713178
6900,2016-07,arcadia_lynch,Sunflower,The Summer Challenge,28.0,Female,67.0,306.0,283.0,175.0,44.32,1,0,0,Be down one clothing size please god.,108.0,278.6,4.4,1.55477,4.074074


Looking at these values, we can see that there seem to be a lot of outliers due to input errors, such as goal weights above the starting weight or implausibly low goal weights of 15 or 20 lbs. Removing these outliers would be fairly subjective, so I won't try to remove them. Hopefully something like a bar plot will be useful for seeing what kind of goals most people have for the challenge.
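
As a rough sketch of what the input to such a bar plot could look like, the goals can be binned into ranges and counted, so that the extreme values land in catch-all bins instead of stretching the axis. The bin edges and labels below are assumptions chosen for illustration, and the sample values are taken from the tables above.

```python
import pandas as pd

# Goal losses (lbs) taken from rows shown in the tables above.
goal_loss = pd.Series([10.0, 0.3, 7.5, 4.1, 9.0, 189.0, -91.0, 16.0])

# Assumed bin edges; each bin includes its right edge.
bins = [float('-inf'), 0, 5, 10, 20, float('inf')]
labels = ['<=0', '0-5', '5-10', '10-20', '>20']

binned = pd.cut(goal_loss, bins=bins, labels=labels)
counts = binned.value_counts().sort_index()
print(counts)
# counts.plot(kind='bar') would then draw one bar per range.
```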

Inspecting the gender/sex column, we find that one challenge is completely missing this column, so its entries are marked 'Unknown'. For users who have done more than one challenge, I want to replace the unknown gender with the gender they reported in a different challenge.

In [68]:
unique_users = big_df.Username.unique()

In [69]:
for name in unique_users:
    # Select this user's rows once and reuse the mask.
    user_mask = big_df.Username == name
    genders = big_df.loc[user_mask, 'Gender'].values
    # Prefer any reported gender over 'Unknown', checking in a fixed order.
    if 'Male' in genders:
        big_df.loc[user_mask, 'Gender'] = 'Male'
    elif 'Female' in genders:
        big_df.loc[user_mask, 'Gender'] = 'Female'
    elif 'Other' in genders:
        big_df.loc[user_mask, 'Gender'] = 'Other'
    elif 'Unknown' in genders:
        big_df.loc[user_mask, 'Gender'] = 'Unknown'
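
The loop scans the whole frame once per user, which can be slow with several thousand entries. A possible alternative is to compute one preferred label per user with `groupby` and map it back. This is only a sketch with the same priority order; the names `GENDER_PRIORITY` and `preferred_gender` are hypothetical, and the sample frame reuses usernames from the tables above.

```python
import pandas as pd

GENDER_PRIORITY = ['Male', 'Female', 'Other', 'Unknown']

def preferred_gender(values):
    """Pick the first gender in GENDER_PRIORITY that the user ever reported."""
    reported = set(values)
    for gender in GENDER_PRIORITY:
        if gender in reported:
            return gender
    return values.iloc[0]  # no recognized label: keep the first value

# Small illustrative sample; 'Unknown' marks the challenge without a gender column.
sample = pd.DataFrame({
    'Username': ['spartan117g', 'spartan117g', 'mercer022', 'mercer022', 'fattyteen12'],
    'Gender': ['Male', 'Unknown', 'Unknown', 'Male', 'Unknown'],
})
# One label per user, then broadcast back onto every row of that user.
per_user = sample.groupby('Username')['Gender'].agg(preferred_gender)
sample['Gender'] = sample['Username'].map(per_user)
print(sample)
```

As with the loop, a user who only ever appears as 'Unknown' stays 'Unknown'.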

Now that we have fixed some of the input errors, we can save the data and begin the data analysis and visualization in the [next notebook](analyze_loseit_challenge_data.ipynb).

In [70]:
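# Re-derive the filtered subsets so they include the height, BMI, and gender fixes made above.
age_df = big_df[big_df.Age < 100]
loss_df = big_df[(big_df['Total Challenge Loss'] < 50) & (big_df['Total Challenge Loss'] > -20)]
big_df.to_csv('./data/outlier_fized_loseit_challenge_data.csv')
age_df.to_csv('./data/age_outlier_loseit_challenge_data.csv')
loss_df.to_csv('./data/weight_loss_outlier_loseit_challenge_data.csv')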