In [2]:
import numpy as np
import pandas as pd
from collections import OrderedDict
from scipy import stats
%matplotlib inline

In [19]:
data = {"names": ["Greg", "Marcia", "Peter", "Jan", "Bobby", "Cindy", "Cousin Oliver"], "ages": [14, 12, 11, 10, 8, 6, 8]}
df = pd.DataFrame(data)

In [20]:
print(df)

   ages          names
0    14           Greg
1    12         Marcia
2    11          Peter
3    10            Jan
4     8          Bobby
5     6          Cindy
6     8  Cousin Oliver


In [21]:
print("Ages Mean: " + str(df.ages.mean()))
print("Median: " + str(df.ages.median()))
print("Mode: " + str(df.ages.mode()))

Ages Mean: 9.85714285714
Median: 10.0
Mode: 0    8
dtype: int64


In [22]:
print("Ages Variance: " + str(df.ages.var()))
print("Standard Dev: " + str(df.ages.std()))
print("Standard Error: " + str(stats.sem(df["ages"])))

Ages Variance: 7.47619047619
Standard Dev: 2.73426232761
Standard Error: 1.03345401972


Given these estimates, if I had to choose only one measure of central tendency and one measure of variance, I would pick the mean as the measure of central tendency. The variance, standard deviation, and standard error are all defined relative to the mean, and here the mean (9.86) is close to the median (10.0).
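To make the link between the variance measures and the mean concrete, here is a minimal sketch that recomputes them by hand for the original ages. It assumes the sample (n - 1) denominator, which is the default for pandas `var`/`std` and for `scipy.stats.sem`; the variable names are my own.

```python
import math

ages = [14, 12, 11, 10, 8, 6, 8]
n = len(ages)
mean = sum(ages) / n

# Sample variance: squared deviations from the mean, divided by n - 1
variance = sum((a - mean) ** 2 for a in ages) / (n - 1)
# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)
# Standard error of the mean: standard deviation divided by sqrt(n)
std_err = std_dev / math.sqrt(n)

print("Variance:", round(variance, 4))
print("Standard Dev:", round(std_dev, 4))
print("Standard Error:", round(std_err, 4))
```

Every quantity starts from the deviations `a - mean`, which is why these measures pair naturally with the mean rather than the median.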

In [32]:
df["ages"] = [14, 12, 11, 10, 8, 7, 8]
print(df)

   ages          names
0    14           Greg
1    12         Marcia
2    11          Peter
3    10            Jan
4     8          Bobby
5     7          Cindy
6     8  Cousin Oliver


In [33]:
print("Ages Mean: " + str(df.ages.mean()))
print("Median: " + str(df.ages.median()))
print("Mode: " + str(df.ages.mode()))

Ages Mean: 10.0
Median: 10.0
Mode: 0    8
dtype: int64


In [34]:
print("Ages Variance: " + str(df.ages.var()))
print("Standard Dev: " + str(df.ages.std()))
print("Standard Error: " + str(stats.sem(df["ages"])))

Ages Variance: 6.33333333333
Standard Dev: 2.51661147842
Standard Error: 0.951189731211


After Cindy's birthday (6 to 7), the mean rises to 10.0 and now equals the median, and all three measures of variance decrease. The median and mode are unchanged.

In [35]:
df["ages"] = [14, 12, 11, 10, 8, 7, 1]
print(df)

   ages          names
0    14           Greg
1    12         Marcia
2    11          Peter
3    10            Jan
4     8          Bobby
5     7          Cindy
6     1  Cousin Oliver


In [36]:
print("Ages Mean: " + str(df.ages.mean()))
print("Median: " + str(df.ages.median()))
print("There is no mode: every age occurs exactly once, so the mode call used above has nothing meaningful to report.")

Ages Mean: 9.0
Median: 10.0
There is no mode: every age occurs exactly once, so the mode call used above has nothing meaningful to report.
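One way to handle the "no mode" case explicitly, instead of relying on what a library returns, is to count occurrences and report a mode only when some value repeats. This is a sketch using the standard library; the helper `modes` is a hypothetical name of my own, not a pandas or scipy function.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

with_repeat = [14, 12, 11, 10, 8, 7, 8]
all_unique = [14, 12, 11, 10, 8, 7, 1]

print("Mode (ages with two 8s):", modes(with_repeat))
print("Mode (all ages unique):", modes(all_unique))
```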


In [37]:
print("Ages Variance: " + str(df.ages.var()))
print("Standard Dev: " + str(df.ages.std()))
print("Standard Error: " + str(stats.sem(df["ages"])))

Ages Variance: 18.0
Standard Dev: 4.24264068712
Standard Error: 1.60356745147


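The mean (9.0) and median (10.0) have now diverged, because Oliver's age of 1 is an outlier that pulls the mean down. I therefore prefer the median as the measure of central tendency: three of the ages lie above 10 and three lie below it.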
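The difference in sensitivity can be checked directly by comparing the two age lists. The sketch below uses the standard library `statistics` module rather than pandas; the variable names are my own.

```python
from statistics import mean, median

ages_before = [14, 12, 11, 10, 8, 7, 8]  # Oliver is 8
ages_after = [14, 12, 11, 10, 8, 7, 1]   # Oliver replaced by a 1-year-old

# How far each measure moves when one value becomes an outlier
mean_shift = mean(ages_after) - mean(ages_before)
median_shift = median(ages_after) - median(ages_before)

print("Mean shift:", mean_shift)
print("Median shift:", median_shift)
```

The mean drops by a full year while the median does not move, which is the behavior that makes the median the safer summary here.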

In [38]:
data2 = {"mags": ["TVGuide", "Ent Wkly", "Pop Culture", "SciPhi"], "scores": [20, 23, 17, 5]}
df2 = pd.DataFrame(data2)

In [39]:
print(df2)

          mags  scores
0      TVGuide      20
1     Ent Wkly      23
2  Pop Culture      17
3       SciPhi       5


In [40]:
df2.scores.mean()

16.25

I have no information about how many subscribers each magazine has. Unless I can assume that each magazine has the same number of subscribers, that every adult American subscribes to one of them, and that each respondent group is a representative sample of its magazine's subscribers, I don't really have a way to estimate what percentage of adult Americans were fans of the Brady Bunch. I could guess with a crude unweighted average, 16.25%, though I suspect that SciPhi has fewer subscribers than the others and that the percentages should be weighted by subscriber count. The other three magazines are probably more representative of the general population, but they still likely exclude sci-fi fanatics and certainly exclude people who don't subscribe to an entertainment magazine.
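If subscriber counts were available, the weighting could look like the sketch below. The subscriber numbers are HYPOTHETICAL, invented only to show the mechanics; they are not real data and the resulting figure is not an estimate of anything.

```python
scores = {"TVGuide": 20, "Ent Wkly": 23, "Pop Culture": 17, "SciPhi": 5}

# HYPOTHETICAL subscriber counts, made up for illustration only
subscribers = {"TVGuide": 1000, "Ent Wkly": 800, "Pop Culture": 600, "SciPhi": 100}

total_subscribers = sum(subscribers.values())

# Weighted mean: each magazine's percentage counts in proportion to its readership
weighted = sum(scores[m] * subscribers[m] for m in scores) / total_subscribers
# Unweighted mean: every magazine counts equally, as in df2.scores.mean()
unweighted = sum(scores.values()) / len(scores)

print("Unweighted mean:", unweighted)
print("Weighted mean:", round(weighted, 2))
```

With these made-up weights SciPhi contributes little, so the weighted figure sits above the crude 16.25%. Different weights would give a different answer, which is the reason the subscriber counts matter.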