# Lecture 15 - Pandas Datetime, Grouping and Aggregating Data

Wednesday 2021/03/17

## Lecture Notes and in-class exercises

▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder.

In [1]:
import unittest
tc = unittest.TestCase()

#### 👇 Tasks

- ✔️ Import the following Python packages.
    1. `pandas`: Use alias `pd`.
    2. `numpy`: Use alias `np`.

In [2]:
# YOUR CODE BEGINS
import pandas as pd
import numpy as np
# YOUR CODE ENDS

#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [3]:
import sys
tc.assertTrue('pd' in globals(), 'Check whether you have correctly imported Pandas with an alias.')
tc.assertTrue('np' in globals(), 'Check whether you have correctly imported NumPy with an alias.')

---

## 🗓️ Working with Datetime Values

You will often see date-looking strings in your data. A few examples are:

- `20210315`
- `Mar 15, 2021`
- `2020-03-15`
- `2020/3/15`

In the first part of today's lecture, we'll discuss how we can *parse* and utilize datetime values.
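
As a minimal sketch, each of the example strings above can be parsed with `pd.to_datetime()` by passing an explicit `format` string. The format codes below are assumptions chosen to match these four examples (and `%b` assumes English month abbreviations); other layouts would need different codes.

```python
import pandas as pd

# Pair each example string with a strptime-style format code that matches it.
examples = {
    '20210315': '%Y%m%d',
    'Mar 15, 2021': '%b %d, %Y',
    '2020-03-15': '%Y-%m-%d',
    '2020/3/15': '%Y/%m/%d',
}

# Parse every string into a pandas Timestamp.
parsed = {text: pd.to_datetime(text, format=fmt) for text, fmt in examples.items()}

for text, timestamp in parsed.items():
    print(f'{text!r:>16} -> {timestamp.date()}')
```

Once parsed, all four strings become the same kind of object (a `Timestamp`), regardless of how they were originally written.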

---

### 📌 Load employees data

▶️ Run the code cell below to create a new `DataFrame` named `df_emp`.

In [4]:
# DO NOT CHANGE THE CODE IN THIS CELL
df_emp = pd.DataFrame({
    'emp_id': [30, 40, 10, 20],
    'name': ['Joe', 'Marissa', 'James', 'Victoria'],
    'dept': ['Finance', 'Purchase', 'Finance', 'Purchase'],
    'office_phone': ['(217)123-4500', np.nan, np.nan, '(217)987-6600'],
    'start_date': ['2017-05-01', '2018-02-01', '2020-08-01', '2019-12-01'],
    'salary': [202000, 185000, 240000, 160500]
})

# Used for intermediate checks
df_emp_backup = df_emp.copy()

df_emp

Unnamed: 0,emp_id,name,dept,office_phone,start_date,salary
0,30,Joe,Finance,(217)123-4500,2017-05-01,202000
1,40,Marissa,Purchase,,2018-02-01,185000
2,10,James,Finance,,2020-08-01,240000
3,20,Victoria,Purchase,(217)987-6600,2019-12-01,160500


---

### 📌 Concise summary of a `DataFrame`

▶️ Run `df_emp.info()` below to see a concise summary of the `DataFrame`.

In [5]:
# YOUR CODE BEGINS
df_emp.info()
# YOUR CODE ENDS

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   emp_id        4 non-null      int64 
 1   name          4 non-null      object
 2   dept          4 non-null      object
 3   office_phone  2 non-null      object
 4   start_date    4 non-null      object
 5   salary        4 non-null      int64 
dtypes: int64(2), object(4)
memory usage: 320.0+ bytes


**Question**: What is the data type of the `start_date` column?

▶️ Run `str(df_emp['start_date'].dtype)` below to see the data type of the `start_date` column.

In [6]:
# YOUR CODE BEGINS
str(df_emp['start_date'].dtype)
# YOUR CODE ENDS

'object'

While `object` can refer to many different types, you can safely assume that all `object` data types you see in this course refer to strings.
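
To illustrate why `object` is only a catch-all label, the sketch below builds a hypothetical `Series` (not part of the lecture data) that mixes several Python types and inspects the type of each element.

```python
import pandas as pd

# Hypothetical mixed-type Series; dtype=object is set explicitly for illustration.
mixed = pd.Series(['2017-05-01', 42, 3.14], dtype=object)

# The column-level dtype says nothing about the individual values.
print(mixed.dtype)

# Mapping type() over the values reveals what each element really is.
element_types = mixed.map(type)
print(element_types)
```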

---

### 🎯 Mini-exercise: Parse a string column as datetime

#### 👇 Tasks

- ✔️ Parse `start_date` to a `datetime` data type.
- ✔️ Store the result to a new column named `start_date_parsed`.

#### 🚀 Hints

The code below converts the `date_str` column to a `datetime`-typed column and stores the result in a new column named `date_parsed`.

```python
my_dataframe['date_parsed'] = pd.to_datetime(my_dataframe['date_str'])
```
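
Real-world columns sometimes contain values that cannot be parsed. As a hedged sketch (the `raw` data below is hypothetical), passing `errors='coerce'` to `pd.to_datetime()` turns unparseable entries into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical column with one unparseable entry.
raw = pd.Series(['2021-03-15', 'not a date', '2021-03-17'])

# errors='coerce' replaces unparseable strings with NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

print(parsed)
print(parsed.isna().sum(), 'value(s) could not be parsed')
```

The employee data in this lecture is clean, so the plain `pd.to_datetime(...)` call from the hint is all the exercise needs.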

In [7]:
# YOUR CODE BEGINS
df_emp['start_date_parsed'] = pd.to_datetime(df_emp['start_date'])
# YOUR CODE ENDS

df_emp

Unnamed: 0,emp_id,name,dept,office_phone,start_date,salary,start_date_parsed
0,30,Joe,Finance,(217)123-4500,2017-05-01,202000,2017-05-01
1,40,Marissa,Purchase,,2018-02-01,185000,2018-02-01
2,10,James,Finance,,2020-08-01,240000,2020-08-01
3,20,Victoria,Purchase,(217)987-6600,2019-12-01,160500,2019-12-01


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [8]:
# Check result
tc.assertEqual(set(df_emp.columns), set(df_emp_backup.columns.tolist() + ['start_date_parsed']))
pd.testing.assert_series_equal(df_emp['start_date_parsed'].reset_index(drop=True),
                               pd.to_datetime(df_emp_backup['_'.join(['sTarT', 'DaTe']).lower()])
                                  .reset_index(drop=True),
                               check_names=False)

---

▶️ Run `str(df_emp['start_date_parsed'].dtype)` below to see the data type of the `start_date_parsed` column.

In [9]:
# YOUR CODE BEGINS
str(df_emp['start_date_parsed'].dtype)
# YOUR CODE ENDS

'datetime64[ns]'
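
One practical payoff of a datetime dtype is that comparisons and date arithmetic work directly on the column. The sketch below rebuilds the four start dates as a standalone `Series` (the variable names are chosen for illustration only).

```python
import pandas as pd

# Standalone copy of the four start dates used in this lecture.
start = pd.to_datetime(pd.Series(['2017-05-01', '2018-02-01', '2020-08-01', '2019-12-01']))

# Comparing against a Timestamp gives a boolean mask.
started_2019_or_later = start >= pd.Timestamp('2019-01-01')

# Subtracting datetimes gives timedeltas; .dt.days converts them to whole days.
days_since_first_hire = (start - start.min()).dt.days

print(started_2019_or_later.tolist())
print(days_since_first_hire.tolist())
```

Neither operation would behave this way on the original string column, where `>=` compares text and subtraction is not defined.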

---

### 🎯 Mini-exercise: Drop `start_date` column *in-place*

We no longer need the `start_date` column. We'll work with the new `start_date_parsed` column from this point on.

#### 👇 Tasks

- ✔️ Drop `start_date` column from `df_emp` *in-place*.

#### 🚀 Hints

The code below drops `col1` from `my_dataframe` *in-place* without creating a new variable.

```python
my_dataframe.drop(columns=['col1'], inplace=True)
```
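
For comparison, here is a small sketch of what happens without `inplace=True`, using a hypothetical two-column `DataFrame`: `drop()` returns a new object and the original stays unchanged.

```python
import pandas as pd

# Hypothetical two-column frame for illustration.
my_dataframe = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})

# Without inplace=True, drop() returns a new DataFrame.
dropped = my_dataframe.drop(columns=['col1'])

print(my_dataframe.columns.tolist())  # the original still has both columns
print(dropped.columns.tolist())       # only the returned copy lost col1
```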

In [10]:
# YOUR CODE BEGINS
df_emp.drop(columns=['start_date'], inplace=True)
# YOUR CODE ENDS

df_emp

Unnamed: 0,emp_id,name,dept,office_phone,salary,start_date_parsed
0,30,Joe,Finance,(217)123-4500,202000,2017-05-01
1,40,Marissa,Purchase,,185000,2018-02-01
2,10,James,Finance,,240000,2020-08-01
3,20,Victoria,Purchase,(217)987-6600,160500,2019-12-01


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [11]:
df_check = df_emp_backup.copy()
df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])
df_check = df_check.drop(columns=['start_date'])

# Check result
tc.assertEqual(set(df_emp.columns), set(['start_date_parsed', 'salary', 'office_phone', 'dept', 'name', 'emp_id']))
pd.testing.assert_frame_equal(df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Mini-exercise: Rename `start_date_parsed` to `start_date`

#### 👇 Tasks

- ✔️ Rename `start_date_parsed` to `start_date` in `df_emp` *in-place*.

#### 🚀 Hints

The code below renames the `name_before` column to `name_after` in `my_dataframe` *in-place* without creating a new variable.

```python
my_dataframe.rename(columns={'name_before': 'name_after'}, inplace=True)
```

In [12]:
# YOUR CODE BEGINS
df_emp.rename(columns={'start_date_parsed': 'start_date'}, inplace=True)
# YOUR CODE ENDS

df_emp

Unnamed: 0,emp_id,name,dept,office_phone,salary,start_date
0,30,Joe,Finance,(217)123-4500,202000,2017-05-01
1,40,Marissa,Purchase,,185000,2018-02-01
2,10,James,Finance,,240000,2020-08-01
3,20,Victoria,Purchase,(217)987-6600,160500,2019-12-01


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [13]:
df_check = df_emp_backup.copy()
df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])
df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})

# Check result
pd.testing.assert_frame_equal(df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Mini-exercise: Extract year from a datetime column

#### 👇 Tasks

- ✔️ Create a new column named `start_year` in `df_emp` that contains the starting years in integers (e.g., `2017`, `2018`).
- ✔️ Extract values from `df_emp['start_date']`.

#### 🚀 Hints

The code below extracts the year from a datetime column `my_date` and stores it in a new column named `my_year`.

```python
my_dataframe['my_year'] = my_dataframe['my_date'].dt.year
```

In [14]:
# YOUR CODE BEGINS
df_emp['start_year'] = df_emp['start_date'].dt.year
# YOUR CODE ENDS

df_emp

Unnamed: 0,emp_id,name,dept,office_phone,salary,start_date,start_year
0,30,Joe,Finance,(217)123-4500,202000,2017-05-01,2017
1,40,Marissa,Purchase,,185000,2018-02-01,2018
2,10,James,Finance,,240000,2020-08-01,2020
3,20,Victoria,Purchase,(217)987-6600,160500,2019-12-01,2019


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [15]:
df_check = df_emp_backup.copy()
df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])
df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})
df_check['_'.join(['sTarT', 'yEaR']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.year

# Check result
pd.testing.assert_frame_equal(df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Mini-exercise: Extract month, day from a datetime column

#### 👇 Tasks

- ✔️ Create new columns named `start_month` and `start_day` in `df_emp` that contain the starting months and days in integers.
- ✔️ Extract values from `df_emp['start_date']`.

#### 🚀 Hints

The code below extracts the months and days from a datetime column `my_date` and stores them in two new columns.

```python
my_dataframe['my_month'] = my_dataframe['my_date'].dt.month
my_dataframe['my_day'] = my_dataframe['my_date'].dt.day
```

In [16]:
# YOUR CODE BEGINS
df_emp['start_month'] = df_emp['start_date'].dt.month
df_emp['start_day'] = df_emp['start_date'].dt.day
# YOUR CODE ENDS

df_emp

Unnamed: 0,emp_id,name,dept,office_phone,salary,start_date,start_year,start_month,start_day
0,30,Joe,Finance,(217)123-4500,202000,2017-05-01,2017,5,1
1,40,Marissa,Purchase,,185000,2018-02-01,2018,2,1
2,10,James,Finance,,240000,2020-08-01,2020,8,1
3,20,Victoria,Purchase,(217)987-6600,160500,2019-12-01,2019,12,1


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [17]:
df_check = df_emp_backup.copy()
df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])
df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})
df_check['_'.join(['sTarT', 'yEaR']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.year
df_check['_'.join(['sTarT', 'mOnTh']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.month
df_check['_'.join(['sTarT', 'dAy']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.day

# Check result
pd.testing.assert_frame_equal(df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

▶️ Run `df_emp['start_date'].dt.weekday` below.

In [18]:
# YOUR CODE BEGINS
df_emp['start_date'].dt.weekday
# YOUR CODE ENDS

0    0
1    3
2    5
3    6
Name: start_date, dtype: int64

Note that Monday is `0` and Sunday is `6`.
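
If the numeric codes are hard to read, `.dt.day_name()` returns the day names as strings. The sketch below places both side by side for the same four start dates (the `calendar` frame is created only for this illustration).

```python
import pandas as pd

# Standalone copy of the four start dates used in this lecture.
start = pd.to_datetime(pd.Series(['2017-05-01', '2018-02-01', '2020-08-01', '2019-12-01']))

calendar = pd.DataFrame({
    'start_date': start,
    'weekday': start.dt.weekday,      # numeric code, Monday = 0
    'day_name': start.dt.day_name(),  # English day names by default
})

print(calendar)
```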

---

▶️ Run `df_emp['start_date'].dt.quarter` below.

In [19]:
# YOUR CODE BEGINS
df_emp['start_date'].dt.quarter
# YOUR CODE ENDS

0    2
1    1
2    3
3    4
Name: start_date, dtype: int64

This returns the quarter of a year (e.g., Q1, Q2, Q3, Q4).
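
To make the mapping concrete, the quarter can also be derived from the month with integer arithmetic. This is only a sketch to show what `.dt.quarter` computes; in practice the accessor is the simpler choice.

```python
import pandas as pd

# Standalone copy of the four start dates used in this lecture.
start = pd.to_datetime(pd.Series(['2017-05-01', '2018-02-01', '2020-08-01', '2019-12-01']))

# Months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on.
quarter_from_month = (start.dt.month - 1) // 3 + 1

print(quarter_from_month.tolist())
print(start.dt.quarter.tolist())
```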

---

## 🔬 Grouping and Aggregating Data

---

### 📌 Load employees data (a simpler one)

▶️ Run the code cell below to create a new `DataFrame` named `df`.

In [20]:
df = pd.DataFrame({
    'name': ['Mary', 'Roy', 'John', 'Joe', 'Paul', 'Erin'],
    'dept': ['Finance', 'Purchase', 'Finance', 'Purchase', 'Finance', 'Purchase'],
    'salary': [240000, 160000, 250000, 170000, 260000, 180000]}
)

df

Unnamed: 0,name,dept,salary
0,Mary,Finance,240000
1,Roy,Purchase,160000
2,John,Finance,250000
3,Joe,Purchase,170000
4,Paul,Finance,260000
5,Erin,Purchase,180000


👉 A very common task in working with data is to *group* your data by one or more criteria. For example, you may want to see the average salary of the employees in `df` **by department**.
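
Before reaching for a dedicated grouping method, it can help to see what grouping means by doing it by hand. The sketch below filters one department at a time with a boolean mask and averages its salaries; `manual_means` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Mary', 'Roy', 'John', 'Joe', 'Paul', 'Erin'],
    'dept': ['Finance', 'Purchase', 'Finance', 'Purchase', 'Finance', 'Purchase'],
    'salary': [240000, 160000, 250000, 170000, 260000, 180000],
})

# Split the rows by department, then average the salary within each split.
manual_means = {}
for dept in df['dept'].unique():
    in_dept = df['dept'] == dept
    manual_means[dept] = df.loc[in_dept, 'salary'].mean()

print(manual_means)
```

This works, but it becomes tedious with many groups or many measures, which is the gap that `groupby()` fills.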

---

### 📌 Creating a `DataFrameGroupBy` object

▶️ Run `df.groupby('dept')` below.

In [21]:
# YOUR CODE BEGINS
df.groupby('dept')
# YOUR CODE ENDS

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001ED8B5200D0>

![groupby object](https://github.com/bdi475/images/blob/main/pandas/df-groupby-object-01.png?raw=true)
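
A `DataFrameGroupBy` object does not display any data by itself, but it can be inspected. As a small sketch, the code below counts the groups, loops over them, and pulls out a single group.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Mary', 'Roy', 'John', 'Joe', 'Paul', 'Erin'],
    'dept': ['Finance', 'Purchase', 'Finance', 'Purchase', 'Finance', 'Purchase'],
    'salary': [240000, 160000, 250000, 170000, 260000, 180000],
})

grouped = df.groupby('dept')

# Number of distinct groups and the number of rows in each.
print(grouped.ngroups)
group_sizes = grouped.size()
print(group_sizes)

# Iterating yields one (group key, sub-DataFrame) pair per group.
for dept, group_df in grouped:
    print(dept)
    print(group_df)

# get_group() returns the rows that belong to one group.
finance = grouped.get_group('Finance')
```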

---

### 📌 Aggregating a `DataFrameGroupBy` object

▶️ Run `df.groupby('dept').agg({'salary': 'mean'})` below.

In [22]:
# YOUR CODE BEGINS
df.groupby('dept').agg({'salary': 'mean'})
# YOUR CODE ENDS

Unnamed: 0_level_0,salary
dept,Unnamed: 1_level_1
Finance,250000
Purchase,170000


👉 Your resulting `DataFrame` now displays average salary by `dept`.

```python
df_salary_by_dept = df.groupby('dept').agg({'salary': 'mean'})

display(df_salary_by_dept)
print(df_salary_by_dept.columns)
```

▶️ Copy the provided code above to the code cell below and run it.

In [23]:
# YOUR CODE BEGINS
df_salary_by_dept = df.groupby('dept').agg({'salary': 'mean'})

display(df_salary_by_dept)
print(df_salary_by_dept.columns)
# YOUR CODE ENDS

Unnamed: 0_level_0,salary
dept,Unnamed: 1_level_1
Finance,250000
Purchase,170000


Index(['salary'], dtype='object')


👉 There is only one column shown when you print out `df_salary_by_dept.columns`! 🙀

This is because the column(s) you use to create groups are used as **index** by default.

![groupby agg result](https://github.com/bdi475/images/blob/main/pandas/df-groupby-agg-as-index-true-01.png?raw=true)
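
Because `dept` is the index of the aggregated result, rows can be looked up by department name with `.loc`. The sketch below also shows that `reset_index()` moves `dept` back into a regular column; `df_flat` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Mary', 'Roy', 'John', 'Joe', 'Paul', 'Erin'],
    'dept': ['Finance', 'Purchase', 'Finance', 'Purchase', 'Finance', 'Purchase'],
    'salary': [240000, 160000, 250000, 170000, 260000, 180000],
})

df_salary_by_dept = df.groupby('dept').agg({'salary': 'mean'})

# The grouping column is the index, so it can be used as a row label.
finance_mean = df_salary_by_dept.loc['Finance', 'salary']
print(finance_mean)

# reset_index() turns the index back into a regular column.
df_flat = df_salary_by_dept.reset_index()
print(df_flat.columns.tolist())
```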

---

### 📌 Aggregating a `DataFrameGroupBy` object with optional `as_index=False`

```python
df_salary_by_dept2 = df.groupby('dept', as_index=False).agg({'salary': 'mean'})

display(df_salary_by_dept2)
print(df_salary_by_dept2.columns)
```

▶️ Copy the provided code to the code cell below and run it.

In [24]:
# YOUR CODE BEGINS
df_salary_by_dept2 = df.groupby('dept', as_index=False).agg({'salary': 'mean'})

display(df_salary_by_dept2)
print(df_salary_by_dept2.columns)
# YOUR CODE ENDS

Unnamed: 0,dept,salary
0,Finance,250000
1,Purchase,170000


Index(['dept', 'salary'], dtype='object')


👉 Now, printing out the columns shows both `dept` and `salary`. Supplying `as_index=False` to `groupby()` keeps the columns you group by as regular columns.

![groupby agg result](https://github.com/bdi475/images/blob/main/pandas/df-groupby-agg-as-index-false-01.png?raw=true)

---

### 📌 Creating multiple aggregation measures

```python
df_salary_by_dept3 = df.groupby('dept', as_index=False).agg({'salary': ['min', 'max', 'mean', 'sum', 'count', 'std']})

display(df_salary_by_dept3)
print(df_salary_by_dept3.columns)
```

▶️ Copy the provided code to the code cell below and run it.

In [25]:
df_salary_by_dept3 = df.groupby('dept', as_index=False).agg({'salary': ['min', 'max', 'mean', 'sum', 'count', 'std']})

display(df_salary_by_dept3)
print(df_salary_by_dept3.columns)

Unnamed: 0_level_0,dept,salary,salary,salary,salary,salary,salary
Unnamed: 0_level_1,Unnamed: 1_level_1,min,max,mean,sum,count,std
0,Finance,240000,260000,250000,750000,3,10000.0
1,Purchase,160000,180000,170000,510000,3,10000.0


MultiIndex([(  'dept',      ''),
            ('salary',   'min'),
            ('salary',   'max'),
            ('salary',  'mean'),
            ('salary',   'sum'),
            ('salary', 'count'),
            ('salary',   'std')],
           )
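
Passing a list of measures produces two-level (`MultiIndex`) column labels, which can be awkward to select from. One alternative, sketched below, is pandas' named aggregation syntax, where each output column gets a flat name of your choosing. The output names (`salary_min` and so on) are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Mary', 'Roy', 'John', 'Joe', 'Paul', 'Erin'],
    'dept': ['Finance', 'Purchase', 'Finance', 'Purchase', 'Finance', 'Purchase'],
    'salary': [240000, 160000, 250000, 170000, 260000, 180000],
})

# Each keyword is an output column name; each value is (input column, measure).
df_named = df.groupby('dept', as_index=False).agg(
    salary_min=('salary', 'min'),
    salary_max=('salary', 'max'),
    salary_mean=('salary', 'mean'),
)

print(df_named)
print(df_named.columns.tolist())
```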


---

## 📞 Exercises Using Bank Marketing Calls Data

For the remainder of this lecture, you'll work with a dataset related to the direct marketing campaigns (phone calls) of a banking institution.

**Data Source**: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

| Column Name     | Type        | Description                                                                                                                                                     |
|-----------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `age`           | Numeric     | Age                                                                                                                                                             |
| `job`           | Categorical | 'admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown' |
| `marital`       | Categorical | 'single', 'married', 'divorced', 'unknown' |
| `education`     | Categorical | 'basic.4y', 'basic.6y', 'basic.9y', 'high.school', 'illiterate', 'professional.course', 'university.degree', 'unknown' |
| `contact_type`  | Categorical | 'cellular', 'telephone' |
| `num_contacts`  | Numeric     | Number of contacts performed during this campaign for this client |
| `prev_outcome`  | Categorical | Outcome of the previous marketing campaign: 'failure', 'nonexistent', 'success' |
| `place_deposit` | Boolean     | Did the client subscribe to a term deposit? This column indicates whether the campaign was successful for each client. |

Your goal is to analyze the dataset to discover relationships between personal factors and marketing campaign result of each individual.

**`place_deposit`** column indicates whether a marketing campaign was successful.

- ✅ If `True`, the individual has placed a deposit within the bank. This is considered a **successful campaign**.
- 🚫 If `False`, the individual has not placed a deposit within the bank. This is considered an **unsuccessful campaign**.

---

### 📌 Load data

▶️ Run the code cell below to create a new `DataFrame` named `df_m`.

In [26]:
df_m = pd.read_csv('https://github.com/bdi475/datasets/raw/main/bank-direct-marketing.csv')
df_m_backup = df_m.copy()
df_m

Unnamed: 0,age,job,marital,education,contact_type,num_contacts,prev_outcome,place_deposit
0,56,housemaid,married,basic.4y,telephone,1,nonexistent,False
1,57,services,married,high.school,telephone,1,nonexistent,False
2,37,services,married,high.school,telephone,1,nonexistent,False
3,40,admin.,married,basic.6y,telephone,1,nonexistent,False
4,56,services,married,high.school,telephone,1,nonexistent,False
...,...,...,...,...,...,...,...,...
41183,73,retired,married,professional.course,cellular,1,nonexistent,True
41184,46,blue-collar,married,professional.course,cellular,1,nonexistent,False
41185,56,retired,married,university.degree,cellular,2,nonexistent,False
41186,44,technician,married,professional.course,cellular,1,nonexistent,True


👉 **Can you add boolean values?** Yes. The exercises below sum and average the boolean `place_deposit` column. In an arithmetic operation, `True` is treated as `1` and `False` as `0`, a convention shared by many programming languages.
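
A quick sketch of boolean arithmetic, using a hypothetical `outcomes` series rather than the marketing data: the sum counts the `True` values, and the mean gives the share of `True` values.

```python
import pandas as pd

# In plain Python, bool is a subclass of int, so True counts as 1.
print(True + True)  # 2

# Hypothetical campaign outcomes for four clients.
outcomes = pd.Series([True, False, True, False])

num_true = outcomes.sum()     # how many True values
share_true = outcomes.mean()  # fraction of values that are True

print(num_true, share_true)
```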

---

### 🎯 Mini-exercise: Summing a boolean column

#### 👇 Tasks

- ✔️ Using `df_m`, sum up all values in `place_deposit` column.
- ✔️ Store the result to a new variable named `num_success`.

#### 🚀 Hints

`my_series.sum()` sums up all values in a `Series`.

To sum up `my_column` of `my_dataframe`, use `my_dataframe['my_column'].sum()`.

In [27]:
# YOUR CODE BEGINS
num_success = df_m['place_deposit'].sum()
# YOUR CODE ENDS

num_success

4640

In [28]:
tc.assertEqual(num_success, np.sum(df_m['_'.join(['pLaCe', 'dePoSIt']).lower()]))

---

### 🎯 Mini-exercise: Marketing success by marital status

#### 👇 Tasks

- ✔️ Using `df_m`, create an aggregated table named `df_by_marital` to display whether there is a difference in average success rate in direct marketing campaigns by marital status.
    - Use the `as_index=False` option.
- ✔️ We will give you the code for this one.

![code](https://github.com/bdi475/images/blob/main/lecture-notes/df_m-groupby-marital-status-average-success.png?raw=true)

In [29]:
# YOUR CODE BEGINS
df_by_marital = df_m.groupby('marital', as_index=False).agg({'place_deposit': 'mean'})
# YOUR CODE ENDS

df_by_marital

Unnamed: 0,marital,place_deposit
0,divorced,0.103209
1,married,0.101573
2,single,0.140041
3,unknown,0.15


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [30]:
df_check = df_m_backup.groupby('marital').agg({'place_deposit': 'mean'}).reset_index()
df_check = df_check[['place_deposit', 'marital'][::-1]].copy()

# Check result
pd.testing.assert_frame_equal(df_by_marital.sort_values(df_by_marital.columns.tolist()).reset_index(drop=True),
                              df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))

---

### 🎯 Mini-exercise: Marketing success by job

#### 👇 Tasks

- ✔️ Using `df_m`, create an aggregated table named `df_by_job` to display whether there is a difference in average success rate in direct marketing campaigns by job.
    - Use the `as_index=False` option.
- ✔️ `df_by_job` should only have the following two columns in the same order.
    - `job`: Job category (e.g., admin., retired, student, unknown)
    - `place_deposit`: Average success ratio (between 0 and 1)
- ✔️ Neither column should be used as the index.
    - Printing `df_by_job.columns.tolist()` should print out `['job', 'place_deposit']`.
- ✔️ Sort `df_by_job` by `place_deposit` in descending order *in-place*.

In [31]:
# YOUR CODE BEGINS
df_by_job = df_m.groupby('job', as_index=False).agg({'place_deposit': 'mean'})
df_by_job.sort_values('place_deposit', ascending=False, inplace=True)
# YOUR CODE ENDS

df_by_job

Unnamed: 0,job,place_deposit
8,student,0.314286
5,retired,0.252326
10,unemployed,0.142012
0,admin.,0.129726
4,management,0.112175
11,unknown,0.112121
9,technician,0.10826
6,self-employed,0.104856
3,housemaid,0.1
2,entrepreneur,0.085165


#### 🧭 Check Your Work

- Once you're done, run the code cell below to test correctness.
- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix incorrect parts.

In [32]:
df_check = df_m_backup.groupby('job').agg({'place_deposit': 'mean'}).reset_index()
df_check = df_check.sort_values('place_deposit', ascending=False)
df_check = df_check[['place_deposit', 'job'][::-1]].copy()

# Check result
pd.testing.assert_frame_equal(df_by_job.reset_index(drop=True),
                              df_check.reset_index(drop=True))

---

### 📌 Marketing success by job with more details

![code](https://github.com/bdi475/images/blob/main/lecture-notes/df_m-groupby-job-success-details.png?raw=true)

▶️ Copy the code above to the code cell below and run your code.

In [33]:
# YOUR CODE BEGINS
df_by_job_details = df_m.groupby('job', as_index=False).agg({'place_deposit': ['count', 'sum', 'mean']})
df_by_job_details
# YOUR CODE ENDS

Unnamed: 0_level_0,job,place_deposit,place_deposit,place_deposit
Unnamed: 0_level_1,Unnamed: 1_level_1,count,sum,mean
0,admin.,10422,1352,0.129726
1,blue-collar,9254,638,0.068943
2,entrepreneur,1456,124,0.085165
3,housemaid,1060,106,0.1
4,management,2924,328,0.112175
5,retired,1720,434,0.252326
6,self-employed,1421,149,0.104856
7,services,3969,323,0.081381
8,student,875,275,0.314286
9,technician,6743,730,0.10826
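
The three measures are related: for a boolean column, `mean` equals `sum` divided by `count`. The sketch below checks this on a tiny hypothetical dataset (the `toy` rows are made up for illustration and are not taken from the bank data).

```python
import pandas as pd

# Hypothetical outcomes for five clients in two job categories.
toy = pd.DataFrame({
    'job': ['student', 'student', 'retired', 'retired', 'retired'],
    'place_deposit': [True, False, True, True, False],
})

details = toy.groupby('job').agg({'place_deposit': ['count', 'sum', 'mean']})

# Flatten the two-level column labels to make the columns easier to select.
details.columns = ['count', 'sum', 'mean']

# Successes divided by calls should match the reported mean.
ratio = details['sum'] / details['count']

print(details)
print(ratio)
```

Reporting `count` alongside `mean` is a useful habit: a high success rate backed by only a handful of calls deserves less confidence than one backed by thousands.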
