In [1]:
data = '''household,dorm,appliance,energy_kWh
A,tuscany,phone_energy,10
B,sauv,phone_energy,30
C,tuscany,phone_energy,12
D,sauv,phone_energy,20
A,tuscany,laptop_energy,50
B,sauv,laptop_energy,60
C,tuscany,laptop_energy,45
D,sauv,laptop_energy,50
'''

%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from io import StringIO

df = pd.read_csv(StringIO(data))
df

,household,dorm,appliance,energy_kWh
0,A,tuscany,phone_energy,10
1,B,sauv,phone_energy,30
2,C,tuscany,phone_energy,12
3,D,sauv,phone_energy,20
4,A,tuscany,laptop_energy,50
5,B,sauv,laptop_energy,60
6,C,tuscany,laptop_energy,45
7,D,sauv,laptop_energy,50


In this format, it is much easier to use plotting programs to understand the variations in the data.

We can use a tool called a groupby, which does what its name implies: it groups data by a shared attribute, such as the type of energy use or the name of the dorm.

If we recall our filtering operations, we can get all the observations of phone energy.

In [2]:
df[df['appliance']=='phone_energy']

,household,dorm,appliance,energy_kWh
0,A,tuscany,phone_energy,10
1,B,sauv,phone_energy,30
2,C,tuscany,phone_energy,12
3,D,sauv,phone_energy,20


From this we can also get the energy column.

In [3]:
df[df['appliance']=='phone_energy']['energy_kWh']

0    10
1    30
2    12
3    20
Name: energy_kWh, dtype: int64

We can also get the sum of these energy values and figure out how much energy was used for phones in the dorms.

In [4]:
df[df['appliance']=='phone_energy']['energy_kWh'].sum()

72
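
The filter, column selection, and sum above can also be written as a single `.loc` step. The following is a minimal sketch that rebuilds the same data frame so it runs on its own; the names `is_phone`, `phone_energy`, and `phone_total` are introduced here for illustration only.

```python
import pandas as pd
from io import StringIO

data = '''household,dorm,appliance,energy_kWh
A,tuscany,phone_energy,10
B,sauv,phone_energy,30
C,tuscany,phone_energy,12
D,sauv,phone_energy,20
A,tuscany,laptop_energy,50
B,sauv,laptop_energy,60
C,tuscany,laptop_energy,45
D,sauv,laptop_energy,50
'''
df = pd.read_csv(StringIO(data))

# Boolean mask: True for every row that is a phone observation
is_phone = df['appliance'] == 'phone_energy'

# Select rows with the mask and the energy column in one .loc call
phone_energy = df.loc[is_phone, 'energy_kWh']

# Total phone energy across all households
phone_total = phone_energy.sum()
print(phone_total)
```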

We could repeat this for each appliance in our data set of observations.
However, there is a more compact way to do this, called a groupby.
In the command below, we instruct pandas to do the above calculation for each unique type of appliance in the data set.
Notice that the phone_energy row matches the sum of 72 we computed above.

In [5]:
# Group the rows by appliance and sum the energy column within each group
df.groupby('appliance')[['energy_kWh']].sum()

appliance,energy_kWh
laptop_energy,205
phone_energy,72
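
To see that a groupby is the per-appliance calculation repeated for each group, the sketch below computes the totals twice: once with an explicit loop over the unique appliances and once with a groupby. It rebuilds the same data frame so it runs on its own; `manual_totals` and `grouped_totals` are names introduced here for illustration only.

```python
import pandas as pd
from io import StringIO

data = '''household,dorm,appliance,energy_kWh
A,tuscany,phone_energy,10
B,sauv,phone_energy,30
C,tuscany,phone_energy,12
D,sauv,phone_energy,20
A,tuscany,laptop_energy,50
B,sauv,laptop_energy,60
C,tuscany,laptop_energy,45
D,sauv,laptop_energy,50
'''
df = pd.read_csv(StringIO(data))

# Manual version: filter and sum once per unique appliance
manual_totals = {}
for appliance in df['appliance'].unique():
    rows = df['appliance'] == appliance
    manual_totals[appliance] = df.loc[rows, 'energy_kWh'].sum()

# Groupby version: the same calculation in one expression
grouped_totals = df.groupby('appliance')['energy_kWh'].sum()

print(manual_totals)
print(grouped_totals)
```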


We can also group by the dorm, or by any other column of the data.

In [6]:
# Group the rows by dorm and sum the energy column within each group
df.groupby('dorm')[['energy_kWh']].sum()

dorm,energy_kWh
sauv,160
tuscany,117


In [7]:
# Group the rows by household and sum the energy column within each group
df.groupby('household')[['energy_kWh']].sum()

household,energy_kWh
A,60
B,90
C,57
D,70


This provides a very powerful tool for understanding the contributions from different parts of your data.
In addition to sums, you can also compute other statistics for each group, such as medians, maximums, and minimums.
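
As one way to compute several of these statistics at once, the sketch below passes a list of function names to `agg` on the grouped energy column. It rebuilds the same data frame so it runs on its own; the name `summary` is introduced here for illustration only.

```python
import pandas as pd
from io import StringIO

data = '''household,dorm,appliance,energy_kWh
A,tuscany,phone_energy,10
B,sauv,phone_energy,30
C,tuscany,phone_energy,12
D,sauv,phone_energy,20
A,tuscany,laptop_energy,50
B,sauv,laptop_energy,60
C,tuscany,laptop_energy,45
D,sauv,laptop_energy,50
'''
df = pd.read_csv(StringIO(data))

# One row per appliance, one column per statistic
summary = df.groupby('appliance')['energy_kWh'].agg(['median', 'max', 'min'])
print(summary)
```

Each statistic is also available as its own method, for example `df.groupby('appliance')['energy_kWh'].median()`.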