# Describing Data - Drill

Greg was 14, Marcia was 12, Peter was 11, Jan was 10, Bobby was 8, and Cindy was 6 when they started playing the Brady kids on The Brady Bunch. Cousin Oliver was 8 years old when he joined the show. What are the mean, median, and mode of the kids' ages when they first appeared on the show? What are the variance, standard deviation, and standard error?

Using these estimates, if you had to choose only one estimate of central tendency and one estimate of variance to describe the data, which would you pick and why?

Next, Cindy has a birthday. Update your estimates: what changed, and what didn't?

Nobody likes Cousin Oliver. Maybe the network should have used an even younger actor. Replace Cousin Oliver with 1-year-old Jessica, then recalculate again. Does this change your choice of central tendency or variance estimation methods?

On the 50th anniversary of The Brady Bunch, four different magazines asked their readers whether they were fans of the show. The answers were:

- TV Guide: 20% fans
- Entertainment Weekly: 23% fans
- Pop Culture Today: 17% fans
- SciPhi Phanatic: 5% fans

Based on these numbers, what percentage of adult Americans would you estimate were Brady Bunch fans on the 50th anniversary of the show?

```python
import pandas as pd

# dictionary of cast members' names and starting ages
brbunch = {'Greg': 14, 'Marcia': 12, 'Peter': 11, 'Jan': 10, 'Bobby': 8, 'Cindy': 6, 'Oliver': 8}

# create a DataFrame of cast members, indexed by name
bunch_df = pd.DataFrame.from_dict(brbunch, orient='index')
bunch_df.columns = ['Age']
bunch_df.index.name = 'Names'
bunch_df
```

| Names | Age |
|-------|-----|
| Greg | 14 |
| Marcia | 12 |
| Peter | 11 |
| Jan | 10 |
| Bobby | 8 |
| Cindy | 6 |
| Oliver | 8 |


```python
# find central tendencies of the Age column
bunch_mean = bunch_df['Age'].mean()
bunch_median = bunch_df['Age'].median()
bunch_mode = bunch_df['Age'].mode()  # a Series; holds several values if there is a tie

print('Bunch mean is', bunch_mean)
print('Bunch mode is', bunch_mode.iloc[0])
print('Bunch median is', bunch_median)
```

```
Bunch mean is 9.857142857142858
Bunch mode is 8
Bunch median is 10.0
```
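To make the three definitions concrete, here is a plain-Python sketch (no pandas) that recomputes them from the list of starting ages. The helper names `mean`, `median`, and `modes` are hypothetical, written only for this illustration; `modes` returns every most-frequent value so that ties stay visible instead of being hidden behind a single answer.

```python
from collections import Counter

# starting ages of the seven Brady kids, including Cousin Oliver
ages = [14, 12, 11, 10, 8, 6, 8]


def mean(values):
    """Arithmetic mean: sum divided by count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values for even n)."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(values):
    """All values that share the highest count, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(mean(ages))    # 9.857142857142858
print(median(ages))  # 10
print(modes(ages))   # [8]
```

With Jessica in place of Oliver, `modes([14, 12, 11, 10, 8, 7, 1])` returns all seven ages, which is why no single mode can be reported for that cast.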


```python
# find variance, standard deviation and standard error of the mean
# ddof=0 divides by n rather than n - 1, treating the seven kids as the whole population, not a sample
bunch_var = bunch_df['Age'].var(ddof=0)
bunch_std = bunch_df['Age'].std(ddof=0)
bunch_se = bunch_df['Age'].sem(ddof=0)

print('Bunch variance is', bunch_var)
print('Bunch std is', bunch_std)
print('Bunch standard error is', bunch_se)
```


```
Bunch variance is 6.408163265306122
Bunch std is 2.531435020952764
Bunch standard error is 0.9567925036515135
```
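The three spread measures can be recomputed by hand to show what `ddof` changes. This sketch assumes the same seven ages and follows the population formulas (divide by n), which is what `ddof=0` selects; it also shows the sample version (divide by n - 1), which is the pandas default for `var`, `std`, and `sem`. The standard error is the standard deviation divided by the square root of n, matching `sem(ddof=0)`.

```python
import math

ages = [14, 12, 11, 10, 8, 6, 8]
n = len(ages)
mu = sum(ages) / n

# sum of squared deviations from the mean
ss = sum((x - mu) ** 2 for x in ages)

pop_var = ss / n            # ddof=0: population variance
sample_var = ss / (n - 1)   # ddof=1: sample variance (pandas' default)
pop_std = math.sqrt(pop_var)
std_error = pop_std / math.sqrt(n)  # standard error of the mean with ddof=0

print(round(pop_var, 4), round(sample_var, 4))   # 6.4082 7.4762
print(round(pop_std, 4), round(std_error, 4))    # 2.5314 0.9568
```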


### Using these estimates, if you had to choose only one estimate of central tendency and one estimate of variance to describe the data, which would you pick and why?

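If I had to choose one estimate of central tendency and one estimate of variance, I would choose the mean and the variance. The ages (6 to 14) contain no extreme values, so the mean of about 9.9 years is a fair summary of a typical age, and the variance of about 6.4 years² (a standard deviation of about 2.5 years) shows how widely the ages are spread around it.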
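Because variance is measured in squared years, its square root is easier to read against the ages themselves. This sketch (same seven ages, population formulas assumed) converts the variance to a standard deviation and lists which kids fall within one standard deviation of the mean.

```python
import math

ages = {'Greg': 14, 'Marcia': 12, 'Peter': 11, 'Jan': 10, 'Bobby': 8, 'Cindy': 6, 'Oliver': 8}
values = list(ages.values())
n = len(values)
mu = sum(values) / n
variance = sum((v - mu) ** 2 for v in values) / n  # units: years squared
std = math.sqrt(variance)                           # units: years

# kids whose age lies within one standard deviation of the mean
low, high = mu - std, mu + std
within = [name for name, age in ages.items() if low <= age <= high]

print(f'mean {mu:.2f} years, variance {variance:.2f} years^2, std {std:.2f} years')
print(f'within one std ({low:.1f} to {high:.1f}):', within)
```

Five of the seven kids (everyone except Greg and Cindy) land inside that band.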

### Next, Cindy has a birthday. Update your estimates: what changed, and what didn't?

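```python
# Cindy's birthday has passed, so update her age in the DataFrame
# .loc writes to bunch_df directly; chained assignment such as bunch_df.Age.Cindy += 1 may only modify a copy
bunch_df.loc['Cindy', 'Age'] += 1
bunch_df
```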

| Names | Age |
|-------|-----|
| Greg | 14 |
| Marcia | 12 |
| Peter | 11 |
| Jan | 10 |
| Bobby | 8 |
| Cindy | 7 |
| Oliver | 8 |


```python
# recalculate central tendency and spread after Cindy's birthday

# find central tendencies of the Age column
bunch_mean = bunch_df['Age'].mean()
bunch_median = bunch_df['Age'].median()
bunch_mode = bunch_df['Age'].mode()  # a Series; holds several values if there is a tie

print('Bunch mean is', bunch_mean)
print('Bunch mode is', bunch_mode.iloc[0])
print('Bunch median is', bunch_median)

# find variance, standard deviation and standard error of the mean
# ddof=0 divides by n rather than n - 1 (population, not sample)
bunch_var = bunch_df['Age'].var(ddof=0)
bunch_std = bunch_df['Age'].std(ddof=0)
bunch_se = bunch_df['Age'].sem(ddof=0)

print('Bunch variance is', bunch_var)
print('Bunch std is', bunch_std)
print('Bunch standard error is', bunch_se)
```

```
Bunch mean is 10.0
Bunch mode is 8
Bunch median is 10.0
Bunch variance is 5.428571428571429
Bunch std is 2.32992949004287
Bunch standard error is 0.8806305718527109
```


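After updating the DataFrame for Cindy's birthday, the mean (9.86 → 10.0), variance (6.41 → 5.43), standard deviation (2.53 → 2.33), and standard error (0.96 → 0.88) all changed. The median (10) and mode (8) did not: Cindy is still the youngest, so the middle value is unchanged, and 8 is still the only age that appears twice. The spread measures shrank mainly because the youngest age moved closer to the rest of the group.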
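A quick check with Python's standard-library `statistics` module shows the same pattern: raising one age by one year moves the mean by exactly 1/n (1/7 of a year here) and, because Cindy stays the youngest, leaves the median and mode alone. `pvariance` is the population (divide-by-n) variance, matching `ddof=0`.

```python
import statistics

before = [14, 12, 11, 10, 8, 6, 8]  # Cindy is 6
after = [14, 12, 11, 10, 8, 7, 8]   # Cindy is 7

for label, ages in (('before', before), ('after', after)):
    print(label,
          'mean', round(statistics.mean(ages), 3),
          'median', statistics.median(ages),
          'mode', statistics.mode(ages),
          'pvariance', round(statistics.pvariance(ages), 3))

# the mean shifts by exactly 1/n when a single value rises by 1
shift = statistics.mean(after) - statistics.mean(before)
print(round(shift, 6), round(1 / len(after), 6))  # 0.142857 0.142857
```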

### Nobody likes Cousin Oliver. Maybe the network should have used an even younger actor. Replace Cousin Oliver with 1-year-old Jessica, then recalculate again. Does this change your choice of central tendency or variance estimation methods?

```python
# update the DataFrame to reflect the cast change: Oliver is replaced by 1-year-old Jessica
bunch_df = bunch_df.rename(index={'Oliver': 'Jessica'})
bunch_df.loc['Jessica', 'Age'] = 1
bunch_df
```

| Names | Age |
|-------|-----|
| Greg | 14 |
| Marcia | 12 |
| Peter | 11 |
| Jan | 10 |
| Bobby | 8 |
| Cindy | 7 |
| Jessica | 1 |


```python
# recalculate central tendency and spread after the cast change

# find central tendencies of the Age column
bunch_mean = bunch_df['Age'].mean()
bunch_median = bunch_df['Age'].median()
bunch_mode = bunch_df['Age'].mode()  # every age is now unique, so mode() returns all seven ages

print('Bunch mean is', bunch_mean)
# report a mode only when exactly one value is the most frequent
print('Bunch mode is', bunch_mode.iloc[0] if len(bunch_mode) == 1 else 'n/a')
print('Bunch median is', bunch_median)

# find variance, standard deviation and standard error of the mean
# ddof=0 divides by n rather than n - 1 (population, not sample)
bunch_var = bunch_df['Age'].var(ddof=0)
bunch_std = bunch_df['Age'].std(ddof=0)
bunch_se = bunch_df['Age'].sem(ddof=0)

print('Bunch variance is', bunch_var)
print('Bunch std is', bunch_std)
print('Bunch standard error is', bunch_se)
```

```
Bunch mean is 9.0
Bunch mode is n/a
Bunch median is 10.0
Bunch variance is 15.428571428571429
Bunch std is 3.927922024247863
Bunch standard error is 1.4846149779161806
```


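After reviewing the new values, I would change my choice of estimates. Jessica's age pulls the mean down from 10.0 to 9.0 while the median stays at 10, so the median is the better measure of central tendency now that the data contain an outlier. To show that the kids' ages on The Brady Bunch now vary much more, I would report the standard deviation (about 3.9 years, up from 2.3), which is in the same units as age. The standard error also rose (from 0.88 to 1.48), but it measures how precisely the mean is estimated rather than how spread out the ages are.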
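The difference between the two spread measures can be sketched with the standard library: the standard deviation describes how far the ages sit from their mean, while the standard error (standard deviation divided by the square root of n) describes how precisely the mean is pinned down, so it shrinks as n grows even when the spread does not. The four-times-repeated cast below is purely hypothetical, used only to show that effect.

```python
import math
import statistics

oliver_cast = [14, 12, 11, 10, 8, 7, 8]
jessica_cast = [14, 12, 11, 10, 8, 7, 1]


def summary(ages):
    """Mean, median, population std, and standard error (std / sqrt(n))."""
    std = statistics.pstdev(ages)
    return {
        'mean': statistics.mean(ages),
        'median': statistics.median(ages),
        'std': std,
        'se': std / math.sqrt(len(ages)),
    }


print(summary(oliver_cast))   # mean 10, median 10
print(summary(jessica_cast))  # mean 9, median still 10

# hypothetical: the same seven ages repeated four times (n = 28)
repeated = jessica_cast * 4
print(summary(repeated))      # same std, but the standard error is halved
```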

### Based on magazine polls, what percentage of adult Americans would you estimate were Brady Bunch fans on the 50th anniversary of the show?

On the 50th anniversary of The Brady Bunch, four different magazines asked their readers whether they were fans of the show. The answers were:

- TV Guide: 20% fans
- Entertainment Weekly: 23% fans
- Pop Culture Today: 17% fans
- SciPhi Phanatic: 5% fans

I would take the average of the percentages reported by TV Guide, Entertainment Weekly, and Pop Culture Today. I would not use SciPhi Phanatic for two reasons: 1) its readers are a niche science-fiction audience (people who probably watch shows like Star Trek), so they are unlikely to represent adult Americans in general; 2) its 5% is an outlier compared with the other three results.

```python
# average the three general-audience polls, leaving out SciPhi Phanatic
avg_adult_fan = (20 + 23 + 17) / 3
avg_adult_fan
```

```
20.0
```

I would estimate that about 20% of adult Americans were Brady Bunch fans on the 50th anniversary of the show.
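For comparison, the alternatives can be computed side by side. This sketch uses only the four published percentages (no reader counts are given, so every poll is weighted equally): the mean of all four polls is pulled down by the SciPhi Phanatic result, the mean without it is the 20% estimate, and the median of all four sits between them without discarding any poll.

```python
import statistics

# percentage of readers who said they were fans, by magazine
polls = {
    'TV Guide': 20,
    'Entertainment Weekly': 23,
    'Pop Culture Today': 17,
    'SciPhi Phanatic': 5,
}

all_polls = list(polls.values())
general_audience = [pct for name, pct in polls.items() if name != 'SciPhi Phanatic']

print('mean of all four:', statistics.mean(all_polls))                     # 16.25
print('mean without SciPhi Phanatic:', statistics.mean(general_audience))  # 20
print('median of all four:', statistics.median(all_polls))                 # 18.5
```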