# Central Tendency Measures

- Measures of center are statistics that give us a sense of the "middle" of a numeric variable.
- They indicate a typical value you'd expect to see.
- Common measures of center include the mean, median, and mode.

# Mean

- To calculate the mean (average) of a set of observations, add their values and divide by the number of observations.
- In pandas, we use df.mean() to get the mean of each column in a DataFrame.
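
As a quick check of the definition above, the following sketch computes a mean by hand and compares it with pandas. The `values` list is made-up illustration data, not taken from any dataset in this notebook.

```python
import pandas as pd

# Hypothetical observations, chosen only for illustration
values = [2, 4, 4, 5, 10]

# Mean by definition: sum of the values divided by the number of values
manual_mean = sum(values) / len(values)

# The same quantity computed by pandas
pandas_mean = pd.Series(values).mean()

print(manual_mean)   # 5.0
print(pandas_mean)   # 5.0
```

Both approaches should agree; the pandas call is simply more convenient once the data lives in a Series or DataFrame.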

<img src='./images/mean.png'>

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'A' : range(10), 'B' : np.random.randn(10)})
    df
    df.mean()
    df['A'].mean()
    df['B'].mean()

In [2]:
#code here


import numpy as np
import pandas as pd
df = pd.DataFrame({'A' : range(10), 'B' : np.random.randn(10)})
df
df.mean()
df['A'].mean()
df['B'].mean()

-0.34538728263023255

In [3]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ggplot import mtcars



In [4]:
mtcars.head()

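| | name | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |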


In [5]:
mtcars.index = mtcars["name"]
mtcars.mean(numeric_only=True)   # Get the mean of each numeric column (skips the text column "name")

mpg      20.090625
cyl       6.187500
disp    230.721875
hp      146.687500
drat      3.596563
wt        3.217250
qsec     17.848750
vs        0.437500
am        0.406250
gear      3.687500
carb      2.812500
dtype: float64

We can also get the mean of each row by supplying an axis argument:

In [None]:
# mtcars.mean(axis=1)           # Get the mean of each row
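
To make the effect of the axis argument concrete, here is a minimal sketch on a small made-up DataFrame (the `scores` frame and its column names are hypothetical, not the mtcars data).

```python
import pandas as pd

# Small made-up DataFrame with numeric columns only
scores = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [3.0, 4.0, 5.0]},
    index=["a", "b", "c"],
)

col_means = scores.mean()         # axis=0 (default): one mean per column
row_means = scores.mean(axis=1)   # axis=1: one mean per row

print(col_means)   # x -> 2.0, y -> 4.0
print(row_means)   # a -> 2.0, b -> 3.0, c -> 4.0
```

Row means are only meaningful when the columns share a scale, which is not the case for most of the mtcars columns.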

In [None]:
#code here



# Median

The median of a distribution is the value where 50% of the data lies below it and 50% lies above it. In other words, the median splits the data in half.

#### Calculation:

After sorting the observations:

- If there is an odd number of observations, find the middle value.
- If there is an even number of observations, find the middle two values and average them.

The median generally gives a better sense of the typical value when a distribution is skewed or contains outliers.


Example:

- Age of participants: 17, 19, 21, 22, 23, 23, 23, 38
- Median = (22 + 23) / 2 = 22.5
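
The calculation rule can be sketched directly in Python using the ages from the example above. The variable names are illustrative, and `np.median` is included only as a cross-check.

```python
import numpy as np

# Ages from the example above (even number of observations)
ages = [17, 19, 21, 22, 23, 23, 23, 38]
ordered = sorted(ages)
n = len(ordered)

# Even n: average the two middle values; odd n: take the single middle value
if n % 2 == 0:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    manual_median = ordered[n // 2]

print(manual_median)     # 22.5
print(np.median(ages))   # 22.5
```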

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'A' : range(10), 'B' : np.random.randn(10)})
    df
    df.median()
    df['A'].median()
    df['B'].median()

In [6]:
# code here

import numpy as np
import pandas as pd
df = pd.DataFrame({'A' : range(10), 'B' : np.random.randn(10)})
df
df.median()
df['A'].median()
df['B'].median()


-0.5046690244701907

In [None]:
# mtcars.median()                 # Get the median of each column

In [None]:
# code here




The mean and median both give us some sense of the center of a distribution, but they aren't always the same. The median always splits the data into two halves, while the mean is a numeric average, so extreme values can have a significant impact on it. In a symmetric distribution, the mean and median will be the same.

In other words, the mean is heavily influenced by outliers, while the median resists their influence.
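
A small sketch of the outlier effect described above, reusing the ages from the median example (38 is treated here as the outlier, which is an assumption made for illustration). A made-up symmetric series is included for comparison.

```python
import pandas as pd

# Symmetric made-up data: mean and median coincide
symmetric = pd.Series([1, 2, 3, 4, 5])
print(symmetric.mean(), symmetric.median())   # 3.0 3.0

# Ages from the median example; 38 sits far from the other values
ages = pd.Series([17, 19, 21, 22, 23, 23, 23, 38])
without_outlier = ages[ages < 38]

print(ages.mean(), ages.median())             # 23.25 22.5

# Dropping the single outlier moves the mean by about 2.1 years,
# but the median by only 0.5
mean_shift = ages.mean() - without_outlier.mean()
median_shift = ages.median() - without_outlier.median()
print(mean_shift, median_shift)
```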

### Which Location Measure Is Best?

- Mean is best for symmetric distributions without outliers.
- Median is useful for skewed distributions or data with outliers.


<img src='./images/whichmeasureisbest.png'>

# Mode

- The mode of a variable is the value that appears most frequently.
- Unlike the mean and median, the mode can be taken of a categorical variable, and a variable can have multiple modes.

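Columns with multiple modes (multiple values tied for the highest count) return multiple values as the mode. Because the result is a single DataFrame, columns with fewer modes than the longest column are padded with NaN. In a column where no value repeats, such as name, every value is tied and all of them are returned.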
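
The behavior of `mode()` can be illustrated on small made-up data. The `colors`, `counts`, and `frame` objects below are hypothetical examples, not part of mtcars.

```python
import pandas as pd

# Categorical data: "red" and "blue" both appear twice, so there are two modes
colors = pd.Series(["red", "blue", "red", "green", "blue"])
print(colors.mode().tolist())    # ['blue', 'red']

# Numeric data with a single most frequent value
counts = pd.Series([1, 2, 2, 3])
print(counts.mode().tolist())    # [2]

# In a DataFrame, columns with fewer modes are padded with NaN
frame = pd.DataFrame({"color": ["red", "blue", "red", "blue"], "n": [1, 2, 2, 3]})
modes = frame.mode()
print(modes)
```

Note that `mode()` returns its values sorted, which is why "blue" is listed before "red".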

In [7]:
mtcars.mode()

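| | name | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AMC Javelin | 10.4 | 8.0 | 275.8 | 110.0 | 3.07 | 3.44 | 17.02 | 0.0 | 0.0 | 3.0 | 2.0 |
| 1 | Cadillac Fleetwood | 15.2 | NaN | NaN | 175.0 | 3.92 | NaN | 18.9 | NaN | NaN | NaN | 4.0 |
| 2 | Camaro Z28 | 19.2 | NaN | NaN | 180.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Chrysler Imperial | 21.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Datsun 710 | 21.4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | Dodge Challenger | 22.8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | Duster 360 | 30.4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | Ferrari Dino | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | Fiat 128 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | Fiat X1-9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |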
