In [32]:
import pandas as pd
import numpy as np

### Pandas Mean

Pandas will take the average of your data down each column or across each row. You pick the direction with the axis parameter.

Let's run through 3 examples:
1. Mean across columns
2. Mean across rows
3. Skipping NAs

But first, let's create our DataFrame.

In [33]:
np.random.seed(seed=42)

df = pd.DataFrame(data=np.random.randint(0, 100, (4, 3)),
                  columns=('Monday', 'Tuesday', 'Wednesday'),
                  index=('Bob', 'Sally', 'Frank', 'Claire'))
df

        Monday  Tuesday  Wednesday
Bob         51       92         14
Sally       71       60         20
Frank       82       86         74
Claire      74       87         99


### 1. Mean across columns

Unfortunately, referring to 'rows' and 'columns' in pandas can get confusing. The way I think about it is: 'which axis do you want to cross to take the mean?'

Meaning, if you want to cross over rows and take the column average, you need to set axis='index' or axis=0. This means you jump down across rows and take the average of each column. This is also the default, so df.mean() with no axis argument gives the same result.

Notice here how axis='index' gives me the column averages.

In [35]:
df.mean(axis='index')

Monday       69.50
Tuesday      81.25
Wednesday    51.75
dtype: float64
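
To make the axis naming concrete, here is a small sketch that rebuilds the same DataFrame by hand (values copied from the output above) and checks that axis='index' and axis=0 give the same column averages. The variable names by_name and by_number are only for illustration.

```python
import pandas as pd

# Same values as the DataFrame above, typed in by hand
df = pd.DataFrame(
    {'Monday': [51, 71, 82, 74],
     'Tuesday': [92, 60, 86, 87],
     'Wednesday': [14, 20, 74, 99]},
    index=['Bob', 'Sally', 'Frank', 'Claire'],
)

by_name = df.mean(axis='index')   # axis given as a label
by_number = df.mean(axis=0)       # axis given as an integer

# Both spellings refer to the same axis, so the results match
print(by_name.equals(by_number))

# Checking one column by hand: (51 + 71 + 82 + 74) / 4
print(by_name['Monday'], (51 + 71 + 82 + 74) / 4)
```

Either spelling works. The label version is a bit easier to read back later.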

### 2. Mean across rows

On the flip side, if you want to jump to the right across *columns*, you need to set axis='columns' or axis=1. This means you're taking the row averages.

Here axis='columns', so I get the row averages.

In [37]:
df.mean(axis='columns')

Bob       52.333333
Sally     50.333333
Frank     80.666667
Claire    86.666667
dtype: float64
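
A common next step is to keep the row averages next to the data they came from. The sketch below assumes the same hand-typed DataFrame as the output above. The column name Average and the variable names are my own choices for illustration.

```python
import pandas as pd

# Same values as the DataFrame above, typed in by hand
df = pd.DataFrame(
    {'Monday': [51, 71, 82, 74],
     'Tuesday': [92, 60, 86, 87],
     'Wednesday': [14, 20, 74, 99]},
    index=['Bob', 'Sally', 'Frank', 'Claire'],
)

# One average per person: cross the columns, keep the rows
row_means = df.mean(axis='columns')

# assign() returns a new DataFrame and leaves df untouched
with_avg = df.assign(Average=row_means)
print(with_avg)

# Checking one row by hand: (51 + 92 + 14) / 3
print(row_means['Bob'], (51 + 92 + 14) / 3)
```

Computing row_means before adding the new column matters here. If Average were already in df, it would be included in the next row mean.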

### 3. Skipping NAs
Finally, let's take a look at how to handle NAs in .mean(). By default pandas skips them for you. But say you wanted a sensitive .mean() function, one that tells you when a value is missing. Then set skipna=False. Note that pandas won't throw an error here: any column that contains an NA simply returns NaN as its mean.

In [39]:
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
                   ('Liho Liho', 'Restaurant', 224.0),
                   ('500 Club', 'bar', 80.5),
                   ('The Square', 'bar', np.nan)],
                  columns=('name', 'type', 'AvgBill'))
df

             name        type  AvgBill
0  Foreign Cinema  Restaurant    289.0
1       Liho Liho  Restaurant    224.0
2        500 Club         bar     80.5
3      The Square         bar      NaN


In [40]:
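# numeric_only=True leaves out the text columns (name, type).
# Recent pandas versions raise a TypeError without it.
df.mean(numeric_only=True)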

AvgBill    197.833333
dtype: float64

In [41]:
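# numeric_only=True leaves out the text columns; skipna=False lets the NaN through
df.mean(numeric_only=True, skipna=False)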

AvgBill   NaN
dtype: float64
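
With skipna=False the result is NaN, not an exception. If you really do want an error when a value is missing, one option is a small wrapper like the sketch below. strict_mean is a hypothetical helper written for this example, not part of pandas, and the bills Series just repeats the AvgBill values from above.

```python
import numpy as np
import pandas as pd


def strict_mean(series):
    """Hypothetical helper: return the mean, but raise if any value is missing."""
    if series.isna().any():
        raise ValueError(f"{series.name!r} contains missing values")
    return series.mean()


bills = pd.Series([289.0, 224.0, 80.5, np.nan], name='AvgBill')

print(bills.mean())               # NaN is skipped by default
print(bills.mean(skipna=False))   # NaN propagates, no error is raised

try:
    strict_mean(bills)
except ValueError as err:
    print('Error:', err)

# With the missing value dropped, the helper behaves like a plain mean
print(strict_mean(bills.dropna()))
```

Whether to raise, return NaN, or skip silently depends on how much a missing value should worry you in your data.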

### 4. Bonus: You can call .mean() on a Series too

In [42]:
df['AvgBill'].mean()

197.83333333333334
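
To see where that number comes from, here is a short sketch that rebuilds the AvgBill values as a standalone Series and computes the mean by hand. The names bills, total and n_valid are only for illustration. The point is that the skipped NaN is left out of both the sum and the count, so the mean divides by 3, not 4.

```python
import numpy as np
import pandas as pd

bills = pd.Series([289.0, 224.0, 80.5, np.nan], name='AvgBill')

total = bills.sum()      # NaN is skipped by default
n_valid = bills.count()  # counts non-missing values only

print(total, n_valid)
print(total / n_valid)   # same value as bills.mean()
print(bills.mean())
```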