# Chapter 10 Data Aggregation and Group Operations

- Split a data frame into pieces using one or more keys.
- Calculate group summary statistics such as count, mean, standard deviation, or a user-defined function.
- Apply within-group transformations such as normalization.
- Compute pivot tables and cross-tabulations.
- Perform statistical group analysis.

## I. GroupBy Mechanics

Many data processing tasks follow a **split-apply-combine** process. For example, to analyze a sales dataset you may want to answer questions such as:
1. What is the total revenue for each day?
2. What are the total sales of each product?
3. How much has each client purchased in total?

Each of these questions requires that you split the data into groups, apply a calculation to each group, and finally combine the results into a new table. In pandas, this is mostly done with the `groupby()` method.
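
To make the three steps concrete, here is a minimal sketch on a small made-up sales table (the column names `product` and `revenue` and all values are assumptions for illustration only). It performs the split, apply, and combine steps by hand, and then lets `groupby()` do the same work in one call.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration
sales = pd.DataFrame({
    'product': ['pen', 'book', 'pen', 'book', 'pen'],
    'revenue': [2.0, 15.0, 3.0, 12.0, 2.5],
})

# 1. Split: one sub-table per product
pieces = {name: sales[sales['product'] == name]
          for name in sales['product'].unique()}

# 2. Apply: total revenue within each piece
totals = {name: piece['revenue'].sum() for name, piece in pieces.items()}

# 3. Combine: collect the per-group results into one Series
manual = pd.Series(totals, name='revenue').sort_index()

# groupby() performs all three steps in one call
grouped = sales.groupby('product')['revenue'].sum()

print(manual)
print(grouped)
```

Both routes should give the same totals per product; the `groupby()` version is shorter and usually faster.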

In [1]:
import numpy as np
import pandas as pd

In [2]:
# An example:
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                   'key2' : ['one', 'two', 'one', 'two', 'one'],
                   'data1' : np.random.randn(5),
                   'data2' : np.random.randn(5)})
df

Unnamed: 0,key1,key2,data1,data2
0,a,one,1.533264,0.788511
1,a,two,0.070856,-0.193073
2,b,one,0.598159,1.251546
3,b,two,-0.671643,-0.619285
4,a,one,0.331091,0.642477


In [3]:
df['key1']

0    a
1    a
2    b
3    b
4    a
Name: key1, dtype: object

In [4]:
df['data1']

0    1.533264
1    0.070856
2    0.598159
3   -0.671643
4    0.331091
Name: data1, dtype: float64

In [5]:
# Split data1 values according to key1:
groups = df['data1'].groupby(df['key1'])
groups

<pandas.core.groupby.generic.SeriesGroupBy object at 0x7ffa660eca90>

In [6]:
# Apply mean() function to find the average value for each group
means = groups.mean()
means

key1
a    0.645070
b   -0.036742
Name: data1, dtype: float64

In [7]:
# Convert it to a data frame
df_means = means.to_frame(name='data1_mean')
df_means

Unnamed: 0_level_0,data1_mean
key1,Unnamed: 1_level_1
a,0.64507
b,-0.036742


In [8]:
# Put all operations in one statement
df_means = df['data1'].groupby(df['key1']).mean().to_frame(name='data1_mean')
df_means

Unnamed: 0_level_0,data1_mean
key1,Unnamed: 1_level_1
a,0.64507
b,-0.036742


In [9]:
# Exercise: split data2 according to key2, and calculate the sum.

# 1. split
groups2 = df['data2'].groupby(df['key2'])
# 2. apply
results = groups2.sum()
# 3. convert the result to a data frame
results.to_frame(name="data2_sum")

Unnamed: 0_level_0,data2_sum
key2,Unnamed: 1_level_1
one,2.682535
two,-0.812358


We can use more than one column as keys.

In [10]:
# Split the data according to both key1 and key2
groups = df['data1'].groupby([df['key1'], df['key2']])

In [11]:
# Calculate the mean
means = groups.mean()
means

key1  key2
a     one     0.932177
      two     0.070856
b     one     0.598159
      two    -0.671643
Name: data1, dtype: float64

In [12]:
means.to_frame(name="Value")

Unnamed: 0_level_0,Unnamed: 1_level_0,Value
key1,key2,Unnamed: 2_level_1
a,one,0.932177
a,two,0.070856
b,one,0.598159
b,two,-0.671643


We obtain a pandas Series with **hierarchical indexing**. It can be converted to a data frame using `unstack()`.

In [13]:
# Convert it to a data frame
means.unstack()

key2,one,two
key1,Unnamed: 1_level_1,Unnamed: 2_level_1
a,0.932177,0.070856
b,0.598159,-0.671643


In [14]:
means.unstack(level=0)

key1,a,b
key2,Unnamed: 1_level_1,Unnamed: 2_level_1
one,0.932177,0.598159
two,0.070856,-0.671643
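
`unstack()` moves one index level into the columns: by default the innermost level, or the level named in the `level` argument. The sketch below uses fixed, made-up numbers (not the random data above) so that the result is reproducible.

```python
import pandas as pd

# A Series with a two-level index, using made-up values
idx = pd.MultiIndex.from_tuples(
    [('a', 'one'), ('a', 'two'), ('b', 'one'), ('b', 'two')],
    names=['key1', 'key2'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

wide = s.unstack()                 # innermost level (key2) becomes the columns
wide_by_key1 = s.unstack(level=0)  # key1 becomes the columns instead

print(wide)
print(wide_by_key1)
```

With complete data like this, the two tables are transposes of each other.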


In [15]:
# Put all operations in one statement

# df['data1'].groupby([df['key1'], df['key2']]).mean()
df['data1'].groupby([df['key1'], df['key2']]).mean().to_frame(name="Value").unstack()

Unnamed: 0_level_0,Value,Value
key2,one,two
key1,Unnamed: 1_level_2,Unnamed: 2_level_2
a,0.932177,0.070856
b,0.598159,-0.671643


In [16]:
# Split the entire data frame
df.groupby([df['key1'], df['key2']]).mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,data1,data2
key1,key2,Unnamed: 2_level_1,Unnamed: 3_level_1
a,one,0.932177,0.715494
a,two,0.070856,-0.193073
b,one,0.598159,1.251546
b,two,-0.671643,-0.619285


In [17]:
# Frequently the grouping information is found in the same data frame as the data 
# you want to work on. In that case, simply put column names as the keys:
df.groupby(['key1', 'key2']).mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,data1,data2
key1,key2,Unnamed: 2_level_1,Unnamed: 3_level_1
a,one,0.932177,0.715494
a,two,0.070856,-0.193073
b,one,0.598159,1.251546
b,two,-0.671643,-0.619285


In [18]:
# Find the number of instances in each subgroup
df.groupby(['key1', 'key2']).size().to_frame(name='size')

Unnamed: 0_level_0,Unnamed: 1_level_0,size
key1,key2,Unnamed: 2_level_1
a,one,2
a,two,1
b,one,1
b,two,1
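
Note that `size()` counts rows, while `count()` counts the non-missing values of a column. The difference only shows up when the data contain NaN. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in group 'a'
toy = pd.DataFrame({'key': ['a', 'a', 'b'],
                    'value': [1.0, np.nan, 3.0]})

sizes = toy.groupby('key').size()             # rows per group
counts = toy.groupby('key')['value'].count()  # non-missing values per group

print(sizes)
print(counts)
```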


**Iterating Over Groups**

The GroupBy object supports iteration, yielding a sequence of 2-tuples that each contain the group name along with that group's data.

In [19]:
# Show the content of each group.
groups = df.groupby(['key1', 'key2'])
for name, group in groups:
    # Each iteration yields a pair such as (('a', 'one'), <sub-DataFrame>)
    print("Name:", name)
    print(group)

Name: ('a', 'one')
  key1 key2     data1     data2
0    a  one  1.533264  0.788511
4    a  one  0.331091  0.642477
Name: ('a', 'two')
  key1 key2     data1     data2
1    a  two  0.070856 -0.193073
Name: ('b', 'one')
  key1 key2     data1     data2
2    b  one  0.598159  1.251546
Name: ('b', 'two')
  key1 key2     data1     data2
3    b  two -0.671643 -0.619285
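
Because iteration yields (name, group) pairs, the groups can be collected into a dictionary, and a single group can be fetched with `get_group()`. A sketch with fixed, made-up values:

```python
import pandas as pd

toy = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                    'data1': [1, 2, 3, 4, 5]})

# Collect the (name, group) pairs into a dictionary of sub-frames
pieces = {name: group for name, group in toy.groupby('key1')}
print(pieces['b'])

# Fetch one group directly by its name
group_a = toy.groupby('key1').get_group('a')
print(group_a)
```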


**Syntactic sugar**: selecting columns for groupby()

In [20]:
df.groupby('key1')['data1'].min()

key1
a    0.070856
b   -0.671643
Name: data1, dtype: float64

In [21]:
df['data1'].groupby(df['key1']).min()

key1
a    0.070856
b   -0.671643
Name: data1, dtype: float64

In [65]:
# The following statement raises a KeyError: df['data1'] is a Series, so it has no 'key1' column to group by
# df['data1'].groupby('key1').min()

In [23]:
df.groupby('key1')[['data2']].min()

Unnamed: 0_level_0,data2
key1,Unnamed: 1_level_1
a,-0.193073
b,-0.619285


In [24]:
df[['data2']].groupby(df['key1']).min()

Unnamed: 0_level_0,data2
key1,Unnamed: 1_level_1
a,-0.193073
b,-0.619285


**Grouping with dictionary**

In [25]:
values = np.array([
    [100, 80, 95],
    [55, 60, 45],
    [70, 75, 90],
    [75, 70, 60],
    [60, 73, 75],
    [72, 63, 70]
])
data = pd.DataFrame(values,
                   columns=['Midterm', 'Project', 'Final'],
                   index=['Alice', 'Bob', 'Chris', 'Doug', 'Eva', "Frank"])
data

Unnamed: 0,Midterm,Project,Final
Alice,100,80,95
Bob,55,60,45
Chris,70,75,90
Doug,75,70,60
Eva,60,73,75
Frank,72,63,70


In [26]:
gender = {
    'Alice': 'F',
    'Bob': 'M',
    'Chris': 'M',
    'Doug': 'M',
    'Eva': 'F',
    'Frank': 'M'
}

In [27]:
# split the rows according to gender
data.groupby(gender).size()

F    2
M    4
dtype: int64

In [28]:
data.groupby(['F', 'M', 'M', 'M', 'F', 'M']).size() # not recommended

F    2
M    4
dtype: int64

In [29]:
data.groupby(gender).mean()

Unnamed: 0,Midterm,Project,Final
F,80.0,76.5,85.0
M,68.0,67.0,66.25


**Grouping with functions**

Any function passed as a group key will be called once per index value, with the returned values being used as the group names.

In [30]:
def get_initial(name):
    return name[0]

In [31]:
data.groupby(get_initial).mean()

Unnamed: 0,Midterm,Project,Final
A,100,80,95
B,55,60,45
C,70,75,90
D,75,70,60
E,60,73,75
F,72,63,70


In [32]:
data.groupby(lambda x: x[0]).mean()

Unnamed: 0,Midterm,Project,Final
A,100,80,95
B,55,60,45
C,70,75,90
D,75,70,60
E,60,73,75
F,72,63,70


In [33]:
data.groupby(len).mean()

Unnamed: 0,Midterm,Project,Final
3,57.5,66.5,60.0
4,75.0,70.0,60.0
5,80.666667,72.666667,85.0


In [34]:
len("Alice")

5

In [35]:
len("Bob")

3
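
One way to see what a function key does is to apply it to the index yourself: the resulting labels are what `groupby()` uses as group names. The sketch below rebuilds the grade table from above under the hypothetical name `grades`.

```python
import pandas as pd

grades = pd.DataFrame(
    [[100, 80, 95], [55, 60, 45], [70, 75, 90],
     [75, 70, 60], [60, 73, 75], [72, 63, 70]],
    columns=['Midterm', 'Project', 'Final'],
    index=['Alice', 'Bob', 'Chris', 'Doug', 'Eva', 'Frank'])

# The labels that len() produces, one per index value
labels = grades.index.map(len)
print(list(labels))

# Grouping by the function and grouping by the precomputed labels agree
by_function = grades.groupby(len).mean()
by_labels = grades.groupby(labels).mean()
print(by_function)
```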

**Example: Filling Missing Values with Group-Specific Values**

In [36]:
states = ['Ohio', 'New York', 'Vermont', 'Florida',
          'Oregon', 'Nevada', 'California', 'Idaho']
group_key = ['East'] * 4 + ['West'] * 4
data = pd.DataFrame(np.random.randn(8), index=states, columns=['Value'])
data.loc[['Vermont', 'Nevada', 'Idaho']] = np.nan
data['group_key'] = group_key
data

Unnamed: 0,Value,group_key
Ohio,0.952939,East
New York,0.729977,East
Vermont,,East
Florida,0.57472,East
Oregon,0.737876,West
Nevada,,West
California,0.52578,West
Idaho,,West


In [37]:
# Fill the missing values with the overall mean.
# numeric_only=True skips the text column group_key.

data.fillna(data.mean(numeric_only=True))

Unnamed: 0,Value,group_key
Ohio,0.952939,East
New York,0.729977,East
Vermont,0.704258,East
Florida,0.57472,East
Oregon,0.737876,West
Nevada,0.704258,West
California,0.52578,West
Idaho,0.704258,West


In [38]:
# Find the average value of eastern states and western states
means = data.groupby("group_key").mean()

# Fill missing values with the group-specific average
# data.groupby("group_key").apply(lambda x: x.fillna(x.mean(numeric_only=True)))
def fill_group(group):
    # numeric_only=True skips the text column group_key
    return group.fillna(group.mean(numeric_only=True))
data.groupby("group_key").apply(fill_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,Value,group_key
group_key,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
East,Ohio,0.952939,East
East,New York,0.729977,East
East,Vermont,0.752545,East
East,Florida,0.57472,East
West,Oregon,0.737876,West
West,Nevada,0.631828,West
West,California,0.52578,West
West,Idaho,0.631828,West


In [39]:
# Fill missing values with the following rule:
# East: 0.5
# West: -0.5
values = {'East': 0.5,
          'West': -0.5}
# data.groupby("group_key").apply(lambda x: x.fillna(values[x.name]))
def fill_group2(group):
    value = values[group.name]
    return group.fillna(value)
data.groupby("group_key").apply(fill_group2)

Unnamed: 0,Value,group_key
Ohio,0.952939,East
New York,0.729977,East
Vermont,0.5,East
Florida,0.57472,East
Oregon,0.737876,West
Nevada,-0.5,West
California,0.52578,West
Idaho,-0.5,West
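
An alternative worth knowing is `transform()`, which returns a result aligned with the original rows, so the filled values keep the original index. This is a sketch with fixed, made-up values (not the random data above), and it is a different technique from the `apply()` calls used in the cells above.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Value': [1.0, np.nan, 3.0, 10.0, np.nan, 20.0],
    'group_key': ['East', 'East', 'East', 'West', 'West', 'West'],
}, index=['Ohio', 'Vermont', 'Florida', 'Oregon', 'Nevada', 'Idaho'])

# Mean of each group, repeated for every row of that group
group_means = toy.groupby('group_key')['Value'].transform('mean')

# Fill each missing value with the mean of its own group
filled = toy['Value'].fillna(group_means)
print(filled)
```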


**Example: Random Sampling and Permutation**

In [40]:
# Hearts, Spades, Clubs, Diamonds
suits = ['H', 'S', 'C', 'D']
card_val = (list(range(1, 11)) + [10] * 3) * 4
base_names = ['A'] + list(range(2, 11)) + ['J', 'Q', 'K']
cards = []
for suit in suits:
    cards.extend(str(num) + suit for num in base_names)

deck = pd.Series(card_val, index=cards)
deck

AH      1
2H      2
3H      3
4H      4
5H      5
6H      6
7H      7
8H      8
9H      9
10H    10
JH     10
QH     10
KH     10
AS      1
2S      2
3S      3
4S      4
5S      5
6S      6
7S      7
8S      8
9S      9
10S    10
JS     10
QS     10
KS     10
AC      1
2C      2
3C      3
4C      4
5C      5
6C      6
7C      7
8C      8
9C      9
10C    10
JC     10
QC     10
KC     10
AD      1
2D      2
3D      3
4D      4
5D      5
6D      6
7D      7
8D      8
9D      9
10D    10
JD     10
QD     10
KD     10
dtype: int64

In [41]:
# Randomly sample 5 rows

deck.sample(5)

4D     4
KS    10
9H     9
AH     1
AS     1
dtype: int64

In [42]:
# Randomly sample 2 cards from each suit
groups = deck.groupby(lambda x: x[-1])
# for name, group in groups:
#     print(name)
#     print(group)
groups.apply(lambda x: x.sample(2))

C  JC    10
   6C     6
D  KD    10
   QD    10
H  9H     9
   5H     5
S  4S     4
   QS    10
dtype: int64
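
Sampling gives different cards on every run. For reproducible results, `sample()` accepts a `random_state` argument. The sketch below rebuilds a deck like the one above and draws two cards per suit with a fixed seed (the seed value 0 and the helper name `draw_two` are arbitrary choices for illustration).

```python
import pandas as pd

suits = ['H', 'S', 'C', 'D']
base_names = ['A'] + list(range(2, 11)) + ['J', 'Q', 'K']
card_val = (list(range(1, 11)) + [10] * 3) * 4
cards = [str(name) + suit for suit in suits for name in base_names]
deck = pd.Series(card_val, index=cards)

def draw_two(group):
    # Fixed seed so the same cards are drawn on every run
    return group.sample(2, random_state=0)

# The last character of each card name is its suit
hand1 = deck.groupby(lambda card: card[-1]).apply(draw_two)
hand2 = deck.groupby(lambda card: card[-1]).apply(draw_two)
print(hand1)
```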

**Example: Analyzing Cell Phone History**

In [43]:
# Load data
# https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/
url = "https://shanelynnwebsite-mid9n9g1q9y8tt.netdna-ssl.com/wp-content/uploads/2015/06/phone_data.csv"
# data = pd.read_csv(url, delimiter=",")
data = pd.read_csv(url, delimiter=",", index_col='index')
print(data.shape)
data.head(20)

(830, 6)


Unnamed: 0_level_0,date,duration,item,month,network,network_type
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,15/10/14 06:58,34.429,data,2014-11,data,data
1,15/10/14 06:58,13.0,call,2014-11,Vodafone,mobile
2,15/10/14 14:46,23.0,call,2014-11,Meteor,mobile
3,15/10/14 14:48,4.0,call,2014-11,Tesco,mobile
4,15/10/14 17:27,4.0,call,2014-11,Tesco,mobile
5,15/10/14 18:55,4.0,call,2014-11,Tesco,mobile
6,16/10/14 06:58,34.429,data,2014-11,data,data
7,16/10/14 15:01,602.0,call,2014-11,Three,mobile
8,16/10/14 15:12,1050.0,call,2014-11,Three,mobile
9,16/10/14 15:30,19.0,call,2014-11,voicemail,voicemail


1. **date**: The date and time of the entry
2. **duration**: The duration (in seconds) for each call, the amount of data (in MB) for each data entry, and the number of texts sent (usually 1) for each sms entry.
3. **item**: A description of the event occurring – can be one of call, sms, or data.
4. **month**: The billing month that each entry belongs to – of form ‘YYYY-MM’.
5. **network**: The mobile network that was called/texted for each entry.
6. **network_type**: Whether the number being called was a mobile, international (‘world’), voicemail, landline, or other (‘special’) number.

In [44]:
data.dtypes

date             object
duration        float64
item             object
month            object
network          object
network_type     object
dtype: object

In [45]:
# Convert date column from string to datetime objects
from dateutil.parser import parse
data['date'] = data['date'].apply(parse, dayfirst=True)
data.head(3)

Unnamed: 0_level_0,date,duration,item,month,network,network_type
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,2014-10-15 06:58:00,34.429,data,2014-11,data,data
1,2014-10-15 06:58:00,13.0,call,2014-11,Vodafone,mobile
2,2014-10-15 14:46:00,23.0,call,2014-11,Meteor,mobile
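
A vectorized alternative is `pd.to_datetime()`, which can parse the whole column at once when given an explicit format. The sketch below runs on three sample strings in the same day-first layout as the `date` column; the format string is an assumption based on the values displayed above.

```python
import pandas as pd

raw = pd.Series(['15/10/14 06:58', '15/10/14 14:46', '16/10/14 15:01'])

# %d/%m/%y %H:%M = day/month/two-digit year, then hour:minute
parsed = pd.to_datetime(raw, format='%d/%m/%y %H:%M')
print(parsed)
```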


In [46]:
# Check data types

data.dtypes

date            datetime64[ns]
duration               float64
item                    object
month                   object
network                 object
network_type            object
dtype: object

In [47]:
# Check missing values

data.isnull().sum()

date            0
duration        0
item            0
month           0
network         0
network_type    0
dtype: int64

**Apply GroupBy actions**

In [48]:
# Which months are covered in this data set?
# data['month'].unique()
# set(data['month'])

data.groupby(['month']).groups.keys()

dict_keys(['2014-11', '2014-12', '2015-01', '2015-02', '2015-03'])

In [49]:
# Find the first entry for each month
data.groupby(['month']).first()

# without using groupby
# months = data['month'].unique()
# result = pd.DataFrame(columns=data.columns)
# for month in months:
#     subdata = data[data['month'] == month]
#     instance = subdata.loc[[subdata.index[0]], :]
# #     print(instance)
#     result = pd.concat([result, instance])
# result  

Unnamed: 0_level_0,date,duration,item,network,network_type
month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2014-11,2014-10-15 06:58:00,34.429,data,data,data
2014-12,2014-11-13 06:58:00,34.429,data,data,data
2015-01,2014-12-13 06:58:00,34.429,data,data,data
2015-02,2015-01-13 06:58:00,34.429,data,data,data
2015-03,2015-02-12 20:15:00,69.0,call,landline,landline


In [50]:
# Get the number of instances in each month

# data.groupby('month').size()
data.groupby('month')['date'].count()

month
2014-11    230
2014-12    157
2015-01    205
2015-02    137
2015-03    101
Name: date, dtype: int64

In [51]:
# What is the sum of call durations for each month?

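# Select the duration column first: summing the date or text columns is not meaningful
# data[data['item'] == 'call'].groupby('month')['duration'].sum()
data.groupby(['month', 'item'])[['duration']].sum().unstack()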

Unnamed: 0_level_0,duration,duration,duration
item,call,data,sms
month,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
2014-11,25547.0,998.441,94.0
2014-12,13561.0,1032.87,48.0
2015-01,17070.0,1067.299,86.0
2015-02,14416.0,1067.299,39.0
2015-03,21727.0,998.441,25.0


**Group by more than one variable**

In [52]:
# How many calls, messages, and data entries are there in each month?

# data.groupby(['month', 'item'])['duration'].count()

data.groupby(['month', 'item'])['duration'].count().to_frame(name='frequency')\
    .unstack()

Unnamed: 0_level_0,frequency,frequency,frequency
item,call,data,sms
month,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
2014-11,107,29,94
2014-12,79,30,48
2015-01,88,31,86
2015-02,67,31,39
2015-03,47,29,25


In [53]:
# How many instances are there per month, split by network_type?

# data.groupby(['network_type', 'month']).size()
data.groupby(['network_type', 'month']).size().to_frame("Frequency")\
.unstack(level=0)

Unnamed: 0_level_0,Frequency,Frequency,Frequency,Frequency,Frequency,Frequency
network_type,data,landline,mobile,special,voicemail,world
month,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
2014-11,29.0,5.0,189.0,1.0,6.0,
2014-12,30.0,7.0,108.0,,8.0,4.0
2015-01,31.0,11.0,160.0,,3.0,
2015-02,31.0,8.0,90.0,2.0,6.0,
2015-03,29.0,11.0,54.0,,4.0,3.0


## II. Data Aggregation
Aggregation refers to any data transformation that produces scalar values from arrays. The preceding examples have used several such functions, including `mean()`, `count()`, `first()`, `min()`, and `sum()`. User-defined functions can also be applied to create a custom summary.

In [54]:
# Define a function get_range() that returns (max - min)
def get_range(array):
    return array.max() - array.min()

In [55]:
# Apply agg() to find the range of each type of cell phone use.
data.groupby(['item'])['duration'].agg(get_range)

item
call    10527.0
data        0.0
sms         0.0
Name: duration, dtype: float64

In [56]:
subdata1 = data[data['item'] == 'data']
subdata1.head()

Unnamed: 0_level_0,date,duration,item,month,network,network_type
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,2014-10-15 06:58:00,34.429,data,2014-11,data,data
6,2014-10-16 06:58:00,34.429,data,2014-11,data,data
13,2014-10-17 06:58:00,34.429,data,2014-11,data,data
26,2014-10-18 06:58:00,34.429,data,2014-11,data,data
39,2014-10-19 06:58:00,34.429,data,2014-11,data,data


In [57]:
subdata1['duration'].value_counts()

34.429    150
Name: duration, dtype: int64

In [58]:
# For a single reducing function such as get_range, agg() and apply() return the same result
data.groupby(['item'])['duration'].apply(get_range)

item
call    10527.0
data        0.0
sms         0.0
Name: duration, dtype: float64
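
The two methods differ once the function returns more than one value per group: `agg()` expects a single value for each group, while `apply()` also accepts functions that return a Series. A sketch with made-up durations:

```python
import pandas as pd

toy = pd.DataFrame({'item': ['call', 'call', 'call', 'sms', 'sms'],
                    'duration': [30.0, 5.0, 120.0, 1.0, 1.0]})
grouped = toy.groupby('item')['duration']

# A reducing function: one value per group
spread = grouped.agg(lambda x: x.max() - x.min())

# A non-reducing function: the two largest values of each group
top2 = grouped.apply(lambda x: x.nlargest(2))

print(spread)
print(top2)
```

Here `spread` has one row per item, while `top2` has two rows per item and a two-level index (the item and the original row label).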

In [59]:
# Apply multiple aggregation functions
data.groupby(['item'])['duration'].agg([get_range, np.max, np.min])

Unnamed: 0_level_0,get_range,amax,amin
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
call,10527.0,10528.0,1.0
data,0.0,34.429,34.429
sms,0.0,1.0,1.0


In [60]:
# Specify the output column names as (name, function) pairs
data.groupby(['item'])['duration'].agg([('range', get_range),
                                        ('maximum', np.max),
                                        ('minimum', np.min)])

Unnamed: 0_level_0,range,maximum,minimum
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
call,10527.0,10528.0,1.0
data,0.0,34.429,34.429
sms,0.0,1.0,1.0


In [61]:
# Apply a different function to each column
functions = {
    'duration': sum,
    'network_type': 'count',
    'date': 'first'
}
data.groupby(['month', 'item']).agg(functions)

Unnamed: 0_level_0,Unnamed: 1_level_0,duration,network_type,date
month,item,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2014-11,call,25547.0,107,2014-10-15 06:58:00
2014-11,data,998.441,29,2014-10-15 06:58:00
2014-11,sms,94.0,94,2014-10-16 22:18:00
2014-12,call,13561.0,79,2014-11-14 17:24:00
2014-12,data,1032.87,30,2014-11-13 06:58:00
2014-12,sms,48.0,48,2014-11-14 17:28:00
2015-01,call,17070.0,88,2014-12-15 20:03:00
2015-01,data,1067.299,31,2014-12-13 06:58:00
2015-01,sms,86.0,86,2014-12-15 19:56:00
2015-02,call,14416.0,67,2015-01-15 10:36:00


In [62]:
# Named aggregation: output_name=(column, function)
data[data['item'] == 'call'].groupby('month').agg(
    # Get max of the duration column for each group
    max_duration=('duration', max),
    # Get min of the duration column for each group
    min_duration=('duration', min),
    # Get sum of the duration column for each group
    total_duration=('duration', sum),
    # Apply a lambda to date column
    num_days=("date", lambda x: (max(x) - min(x)).days)   
)

Unnamed: 0_level_0,max_duration,min_duration,total_duration,num_days
month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2014-11,1940.0,1.0,25547.0,28
2014-12,2120.0,2.0,13561.0,30
2015-01,1859.0,2.0,17070.0,30
2015-02,1863.0,1.0,14416.0,25
2015-03,10528.0,2.0,21727.0,19


## III. Pivot Table
A pivot table summarizes the data along two sets of keys: one set defines the rows and the other defines the columns.

In [63]:
# Create a pivot table with counts for each month and network type
data.pivot_table('date', index='month', columns='network_type', aggfunc=len)

network_type,data,landline,mobile,special,voicemail,world
month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2014-11,29.0,5.0,189.0,1.0,6.0,
2014-12,30.0,7.0,108.0,,8.0,4.0
2015-01,31.0,11.0,160.0,,3.0,
2015-02,31.0,8.0,90.0,2.0,6.0,
2015-03,29.0,11.0,54.0,,4.0,3.0
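
A pivot table can be thought of as a `groupby()` on the row and column keys followed by `unstack()`. The sketch below uses made-up records (the values are invented; only the column names mirror the phone data) and builds the same table both ways.

```python
import pandas as pd

toy = pd.DataFrame({
    'month': ['2014-11', '2014-11', '2014-11', '2014-12', '2014-12'],
    'network_type': ['mobile', 'mobile', 'landline', 'mobile', 'landline'],
    'duration': [10.0, 20.0, 5.0, 40.0, 8.0],
})

# Route 1: pivot_table with an explicit aggregation function
pivoted = toy.pivot_table('duration', index='month',
                          columns='network_type', aggfunc='sum')

# Route 2: groupby on both keys, then move network_type into the columns
grouped = toy.groupby(['month', 'network_type'])['duration'].sum().unstack()

print(pivoted)
print(grouped)
```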


## IV. Cross Tabulation

In [64]:
pd.crosstab(index=data['month'], columns=data['network_type'])

network_type,data,landline,mobile,special,voicemail,world
month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2014-11,29,5,189,1,6,0
2014-12,30,7,108,0,8,4
2015-01,31,11,160,0,3,0
2015-02,31,8,90,2,6,0
2015-03,29,11,54,0,4,3
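
`pd.crosstab()` counts how often each combination of the two keys occurs and fills absent combinations with 0. The same table can be built from `groupby().size()` followed by `unstack(fill_value=0)`. A sketch with made-up records:

```python
import pandas as pd

toy = pd.DataFrame({
    'month': ['2014-11', '2014-11', '2014-12', '2014-12', '2014-12'],
    'network_type': ['mobile', 'world', 'mobile', 'mobile', 'landline'],
})

# Frequency table of month versus network_type
table = pd.crosstab(index=toy['month'], columns=toy['network_type'])

# Equivalent construction with groupby
by_size = toy.groupby(['month', 'network_type']).size().unstack(fill_value=0)

print(table)
print(by_size)
```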


# Homework:
Use the cell phone usage data in this exercise.
1. Find the network names that belong to network_type "mobile".
2. How many messages were sent to each mobile network every month?
3. What is the total call duration to each mobile network every month?