<img src="img/01.png">

#### According to [REF1](../README.md) :

Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.
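Both views can be sketched on a toy table. The column names are borrowed from the dataset used later in this chapter, while the values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Dictionary view: every key is a column name, every value is a column.
as_dict = pd.DataFrame({"Age": [32, 33, 31], "Pay Rate": [28.5, 23.0, 29.0]})

# Array view: a 2-D NumPy array plus labels for its columns.
values = np.array([[32, 28.5], [33, 23.0], [31, 29.0]])
as_array = pd.DataFrame(values, columns=["Age", "Pay Rate"])

# Like a dict, indexing by key returns a whole column (a Series).
age_column = as_dict["Age"]

# Like an array, the frame has a shape and can be converted back to an ndarray.
n_rows, n_cols = as_array.shape
raw_values = as_array.to_numpy()
```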

# 02. Pandas - Data Frames - chapter 1

## 02.01 Data Frames - basics

A Pandas Series is a one-dimensional array of indexed data.
* Create a `Series` object

<img src="img/02.png">

## 02.02 Data Frames - preprocessing


In [1]:
import numpy as np
import pandas as pd
from IPython.display import display, Markdown

In [2]:
# Dataset comes from:
# https://www.kaggle.com/rhuebner/human-resources-data-set#core_dataset.csv
# reading a CSV file is one of the most common ways of creating a DataFrame
df = pd.read_csv("../92_data/emplyees.csv")

In [3]:
# to keep the screen clean we display only the head of dataframe (default is 5 rows)
df.head()

Unnamed: 0,Employee Name,Employee Number,State,Zip,DOB,Age,Sex,MaritalDesc,CitizenDesc,Hispanic/Latino,...,Date of Hire,Date of Termination,Reason For Term,Employment Status,Department,Position,Pay Rate,Manager Name,Employee Source,Performance Score
0,"Brown, Mia",1103024456,MA,1450,11/24/1985,32,Female,Married,US Citizen,No,...,10/27/2008,,N/A - still employed,Active,Admin Offices,Accountant I,28.5,Brandon R. LeBlanc,Diversity Job Fair,Fully Meets
1,"LaRotonda, William",1106026572,MA,1460,4/26/1984,33,Male,Divorced,US Citizen,No,...,1/6/2014,,N/A - still employed,Active,Admin Offices,Accountant I,23.0,Brandon R. LeBlanc,Website Banner Ads,Fully Meets
2,"Steans, Tyrone",1302053333,MA,2703,9/1/1986,31,Male,Single,US Citizen,No,...,9/29/2014,,N/A - still employed,Active,Admin Offices,Accountant I,29.0,Brandon R. LeBlanc,Internet Search,Fully Meets
3,"Howard, Estelle",1211050782,MA,2170,9/16/1985,32,Female,Married,US Citizen,No,...,2/16/2015,4/15/2015,N/A - still employed,Active,Admin Offices,Administrative Assistant,21.5,Brandon R. LeBlanc,Pay Per Click - Google,N/A- too early to review
4,"Singh, Nan",1307059817,MA,2330,5/19/1988,29,Female,Single,US Citizen,No,...,5/1/2015,,N/A - still employed,Active,Admin Offices,Administrative Assistant,16.56,Brandon R. LeBlanc,Website Banner Ads,N/A- too early to review
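If the CSV file is not at hand, the same call can be tried on an in-memory string. This is only a sketch: `csv_text` is a hypothetical two-row extract with three of the columns, not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row extract; the real file has more columns.
csv_text = """Employee Name,Age,Pay Rate
"Brown, Mia",32,28.5
"LaRotonda, William",33,23.0
"""

# read_csv accepts a path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

# Quoted fields keep their embedded comma, so we still get 3 columns.
first_rows = toy.head(2)
```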


In [4]:
# investigate shape of dataframe
df.shape
print("Number of rows (employees) = {}".format(df.shape[0]))
print("Number of columns          = {}".format(df.shape[1]))

Number of rows (employees) = 301
Number of columns          = 21


In [5]:
# columns names
df.columns

Index(['Employee Name', 'Employee Number', 'State', 'Zip', 'DOB', 'Age', 'Sex',
       'MaritalDesc', 'CitizenDesc', 'Hispanic/Latino', 'RaceDesc',
       'Date of Hire', 'Date of Termination', 'Reason For Term',
       'Employment Status', 'Department', 'Position', 'Pay Rate',
       'Manager Name', 'Employee Source', 'Performance Score'],
      dtype='object')

In [6]:
# display one employee
df.loc[1]

Employee Name               LaRotonda, William  
Employee Number                       1106026572
State                                         MA
Zip                                         1460
DOB                                    4/26/1984
Age                                           33
Sex                                         Male
MaritalDesc                             Divorced
CitizenDesc                           US Citizen
Hispanic/Latino                               No
RaceDesc               Black or African American
Date of Hire                            1/6/2014
Date of Termination                          NaN
Reason For Term             N/A - still employed
Employment Status                         Active
Department                         Admin Offices
Position                            Accountant I
Pay Rate                                      23
Manager Name                  Brandon R. LeBlanc
Employee Source               Website Banner Ads
Performance Score   

In [7]:
# DataFrame.describe() is very useful to get basic intuition about the numerical data
df.describe().round(2)

Unnamed: 0,Employee Number,Zip,Age,Pay Rate
count,301.0,301.0,301.0,301.0
mean,1205421000.0,6705.2,38.55,30.72
std,182661600.0,17167.53,8.94,15.22
min,602000300.0,1013.0,25.0,14.0
25%,1102024000.0,1901.0,31.0,20.0
50%,1204033000.0,2132.0,37.0,24.0
75%,1401065000.0,2421.0,44.0,43.0
max,1988300000.0,98052.0,67.0,80.0


In [8]:
# count values of categorical columns
for c_name in df.columns:
    series = df[c_name]
    if series.dtype.kind == 'O': # strings are recognized as (O)bjects in pandas
        display(series.value_counts())

Wolk, Hang  T         1
Steans, Tyrone        1
Bacong, Alejandro     1
Lydon, Allison        1
Hutter, Rosalie       1
                     ..
Houlihan, Debra       1
Billis, Helen         1
Rarrick, Quinn        1
Hendrickson, Trina    1
Dunn, Amy             1
Name: Employee Name, Length: 301, dtype: int64

MA    266
CT      6
TX      3
VT      2
IN      1
NC      1
PA      1
TN      1
RI      1
AL      1
VA      1
NH      1
KY      1
AZ      1
ME      1
ND      1
NV      1
WA      1
MT      1
GA      1
FL      1
UT      1
CA      1
OR      1
CO      1
NY      1
ID      1
OH      1
Name: State, dtype: int64

7/7/1984      2
9/22/1976     2
9/9/1965      2
8/27/1972     1
2/11/1952     1
             ..
10/18/1981    1
8/12/1979     1
5/7/1992      1
9/30/1980     1
10/1/1990     1
Name: DOB, Length: 298, dtype: int64

Female    174
Male      126
male        1
Name: Sex, dtype: int64

Single       127
Married      119
Divorced      30
Separated     14
widowed       11
Name: MaritalDesc, dtype: int64

US Citizen             285
Eligible NonCitizen     12
Non-Citizen              4
Name: CitizenDesc, dtype: int64

No     271
Yes     27
no       2
yes      1
Name: Hispanic/Latino, dtype: int64

White                               190
Black or African American            54
Asian                                31
Two or more races                    18
American Indian or Alaska Native      4
Hispanic                              4
Name: RaceDesc, dtype: int64

1/10/2011    14
3/30/2015    12
1/5/2015     11
9/29/2014    11
5/16/2011    10
             ..
6/2/2015      1
1/21/2011     1
12/1/2014     1
4/27/2009     1
7/9/2012      1
Name: Date of Hire, Length: 93, dtype: int64

4/4/2014      2
9/24/2012     2
9/26/2011     2
5/1/2016      2
1/9/2012      2
             ..
4/1/2016      1
2/5/2016      1
10/22/2011    1
2/8/2016      1
6/15/2013     1
Name: Date of Termination, Length: 93, dtype: int64

N/A - still employed                188
Another position                     20
unhappy                              14
N/A - Has not started yet            11
more money                           11
hours                                 9
career change                         9
attendance                            7
return to school                      5
relocation out of area                5
military                              4
retiring                              4
performance                           4
maternity leave - did not return      3
no-call, no-show                      3
medical issues                        3
gross misconduct                      1
Name: Reason For Term, dtype: int64

Active                    174
Voluntarily Terminated     88
Terminated for Cause       14
Leave of Absence           14
Future Start               11
Name: Employment Status, dtype: int64

Production              208
IT/IS                    41
Sales                    31
Software Engineering     10
Admin Offices            10
Executive Office          1
Name: Department, dtype: int64

Production Technician I         136
Production Technician II         57
Area Sales Manager               27
Production Manager               14
Database Administrator           13
Network Engineer                  9
Software Engineer                 9
Sr. Network Engineer              5
IT Support                        4
Sr. DBA                           4
Administrative Assistant          3
Accountant I                      3
Sales Manager                     3
IT Manager - DB                   2
Sr. Accountant                    2
Shared Services Manager           2
Software Engineering Manager      1
Director of Operations            1
IT Manager - Support              1
IT Director                       1
Director of Sales                 1
President & CEO                   1
CIO                               1
IT Manager - Infra                1
Name: Position, dtype: int64

Kissy Sullivan        22
Kelley Spirea         22
Elijiah Gray          22
Michael Albert        22
Webster Butler        21
David Stanley         21
Ketsia Liebig         21
Brannon Miller        21
Amy Dunn              21
Janet King            19
Simon Roup            17
Peter Monroe          14
John Smith            14
Lynn Daneault         13
Alex Sweetwater        9
Brandon R. LeBlanc     7
Jennifer Zamora        6
Eric Dougall           4
Debra Houlihan         3
Board of Directors     2
Name: Manager Name, dtype: int64

Employee Referral                         31
Diversity Job Fair                        29
Search Engine - Google Bing Yahoo         25
Monster.com                               24
Pay Per Click - Google                    21
Professional Society                      19
Newspager/Magazine                        18
MBTA ads                                  17
Billboard                                 16
Vendor Referral                           15
Glassdoor                                 14
Website Banner Ads                        13
Word of Mouth                             13
On-campus Recruiting                      12
Social Networks - Facebook Twitter etc    11
Other                                      9
Internet Search                            6
Information Session                        4
Careerbuilder                              1
On-line Web application                    1
Pay Per Click                              1
Company Intranet - Partner                 1
Name: Empl

Fully Meets                 172
N/A- too early to review     37
90-day meets                 31
Exceeds                      28
Needs Improvement            15
PIP                           9
Exceptional                   9
Name: Performance Score, dtype: int64
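The loop above checks `dtype.kind` column by column. A possible alternative, sketched here on a made-up three-row frame, is to let `select_dtypes` pick the non-numeric columns first. Whether this reads better is a matter of taste.

```python
import pandas as pd

# Made-up frame with two text columns and one numeric column.
toy = pd.DataFrame({
    "Sex": ["Female", "Male", "male"],
    "Age": [32, 33, 31],
    "Department": ["Sales", "IT/IS", "Sales"],
})

# Keep only the columns that are not numeric (the text columns here).
text_columns = toy.select_dtypes(exclude="number")

# One value_counts() result per text column, stored by column name.
counts = {name: text_columns[name].value_counts() for name in text_columns.columns}
```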

## 02.03 Data Frames - add / select data 

In [9]:
# It is possible to select columns like properties, but personally I do not recommend it:
# imagine you have a column named `mean`; what will happen if you call `DataFrame.mean`?
df.Age

0      32
1      33
2      31
3      32
4      29
       ..
296    38
297    31
298    34
299    34
300    51
Name: Age, Length: 301, dtype: int64

In [10]:
# Selecting a column as in a Python dict is much more intuitive for me
# as you may notice it returns a `Series` object
df['Age']

0      32
1      33
2      31
3      32
4      29
       ..
296    38
297    31
298    34
299    34
300    51
Name: Age, Length: 301, dtype: int64
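The name clash mentioned above can be reproduced on a small frame. The frame and its `mean` column are hypothetical and exist only to show the difference between the two access styles.

```python
import pandas as pd

# Hypothetical frame with a column that shares its name with a method.
toy = pd.DataFrame({"mean": [1.0, 2.0, 3.0], "Age": [32, 33, 31]})

# Attribute access finds the DataFrame method, not the column.
attribute_result = toy.mean

# Bracket access always returns the column as a Series.
bracket_result = toy["mean"]

is_method = callable(attribute_result)
is_series = isinstance(bracket_result, pd.Series)
```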

In [11]:
# of course all the tricks from the previous tutorial work as well
df.loc[2:10:3, ['Age', 'Position']]

Unnamed: 0,Age,Position
2,31,Accountant I
5,30,Administrative Assistant
8,30,Sr. Accountant
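One detail worth remembering about `start:stop:step` slices: `.loc` works on labels and includes the end label, while `.iloc` works on positions and excludes the end. A minimal sketch on a made-up frame with the default integer index:

```python
import pandas as pd

toy = pd.DataFrame({"Age": range(20, 32)})  # default index 0..11

# .loc slices by label and includes the end label.
by_label = toy.loc[2:10:4]

# .iloc slices by position and excludes the end position.
by_position = toy.iloc[2:10:4]
```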


In [12]:
# now we select the data we would like to work with
selected_columns = ['Employee Name', 'Age', 'Sex', 'MaritalDesc', 'Department', 'Position', 'Pay Rate', 'Manager Name']
df = df[selected_columns]
df.head()

Unnamed: 0,Employee Name,Age,Sex,MaritalDesc,Department,Position,Pay Rate,Manager Name
0,"Brown, Mia",32,Female,Married,Admin Offices,Accountant I,28.5,Brandon R. LeBlanc
1,"LaRotonda, William",33,Male,Divorced,Admin Offices,Accountant I,23.0,Brandon R. LeBlanc
2,"Steans, Tyrone",31,Male,Single,Admin Offices,Accountant I,29.0,Brandon R. LeBlanc
3,"Howard, Estelle",32,Female,Married,Admin Offices,Administrative Assistant,21.5,Brandon R. LeBlanc
4,"Singh, Nan",29,Female,Single,Admin Offices,Administrative Assistant,16.56,Brandon R. LeBlanc


`pd.Series` has `str` methods for working with strings efficiently: they apply a string method to every element of the series. Moreover, several `str` methods (for example `str.contains`) treat the pattern as a regexp by default.

* change "male" to "Male"

In [13]:
df['Sex'].value_counts()

Female    174
Male      126
male        1
Name: Sex, dtype: int64

In [14]:
# let's fix the values that are not capitalized
df['Sex']= df['Sex'].str.capitalize()

In [15]:
# Now everything seems to be OK
df['Sex'].value_counts()

Female    174
Male      127
Name: Sex, dtype: int64
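The same idea works for any text column. The sketch below uses a made-up series modelled on the `Hispanic/Latino` counts shown earlier and also shows the regexp behaviour of `str.contains`.

```python
import pandas as pd

# Values modelled on the `Hispanic/Latino` counts shown earlier.
answers = pd.Series(["No", "Yes", "no", "yes", "No"])

# str.capitalize() is applied to every element of the Series.
normalized = answers.str.capitalize()

# str.contains() treats the pattern as a regexp by default.
starts_with_y = answers.str.contains("^[Yy]")
```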

* Split column `Employee Name` into `First_Name` and `Last_Name`

In [16]:
# first we need to check if every row contains `,` to make sure we can split using this pattern
mask_comma = df['Employee Name'].str.contains(',')

# this is the correct way
df_view = df.loc[~mask_comma, 'Employee Name']
df_view

272    Jeremy Prater
Name: Employee Name, dtype: object

In [17]:
# all we need to do is:
df_view.str.replace(pat = ' ', repl = ', ', n=1)

272    Jeremy, Prater
Name: Employee Name, dtype: object

In [18]:
# Now be careful, this is the tricky part!!!

# ####### !IMPORTANT! #######
# chained indexing with '][' is a BAD habit, do not use it when you assign values in pandas!!! see more here: 
# https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

# this is the WRONG WAY!! 
# df[~mask_comma]['Employee Name'] = df_view.str.replace(pat = ' ', repl = ', ', n=1)

# this is the correct way
df.loc[~mask_comma, 'Employee Name'] = df_view.str.replace(pat = ' ', repl = ', ', n=1)
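The correct pattern can be tried in isolation. This is a minimal sketch on a made-up two-row frame; `toy`, `mask` and `fixed` are illustrative names, and `regex=False` is passed explicitly so the space is treated as a literal.

```python
import pandas as pd

toy = pd.DataFrame({"Employee Name": ["Brown, Mia", "Jeremy Prater"]})
mask = toy["Employee Name"].str.contains(",")

# Build the repaired values only for rows without a comma.
fixed = toy.loc[~mask, "Employee Name"].str.replace(" ", ", ", n=1, regex=False)

# A single .loc call selects rows and column together, so the assignment
# is written into `toy` itself and not into a temporary copy.
toy.loc[~mask, "Employee Name"] = fixed
```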

In [19]:
# check if everything went OK
df.loc[~mask_comma, 'Employee Name']

272    Jeremy, Prater
Name: Employee Name, dtype: object

In [20]:
# now we can split the 'Employee Name'
additional_cols = df['Employee Name'].str.split(pat=',', n=1, expand=True)

In [21]:
additional_cols_dict = {'last_name': additional_cols[0], 'first_name': additional_cols[1]}

In [22]:
df['Last_Name']  = additional_cols[0]
df['First_Name'] = additional_cols[1]
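The effect of `expand=True` is easiest to see next to the default. A small sketch on a made-up series of names:

```python
import pandas as pd

names = pd.Series(["Brown, Mia", "Smith, Leigh Ann"])

# Default: each element becomes a list of pieces (a Series of lists).
as_lists = names.str.split(",", n=1)

# expand=True: the pieces are spread over columns 0 and 1 of a DataFrame.
as_frame = names.str.split(",", n=1, expand=True)
```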

In [23]:
df.head()

Unnamed: 0,Employee Name,Age,Sex,MaritalDesc,Department,Position,Pay Rate,Manager Name,Last_Name,First_Name
0,"Brown, Mia",32,Female,Married,Admin Offices,Accountant I,28.5,Brandon R. LeBlanc,Brown,Mia
1,"LaRotonda, William",33,Male,Divorced,Admin Offices,Accountant I,23.0,Brandon R. LeBlanc,LaRotonda,William
2,"Steans, Tyrone",31,Male,Single,Admin Offices,Accountant I,29.0,Brandon R. LeBlanc,Steans,Tyrone
3,"Howard, Estelle",32,Female,Married,Admin Offices,Administrative Assistant,21.5,Brandon R. LeBlanc,Howard,Estelle
4,"Singh, Nan",29,Female,Single,Admin Offices,Administrative Assistant,16.56,Brandon R. LeBlanc,Singh,Nan


In [24]:
# the last thing we need to do is to reorder the columns and get rid of the old `Employee Name` column
selected_columns_fln = ['First_Name', 'Last_Name'] + selected_columns[1:]
df = df[selected_columns_fln]
df.head()

Unnamed: 0,First_Name,Last_Name,Age,Sex,MaritalDesc,Department,Position,Pay Rate,Manager Name
0,Mia,Brown,32,Female,Married,Admin Offices,Accountant I,28.5,Brandon R. LeBlanc
1,William,LaRotonda,33,Male,Divorced,Admin Offices,Accountant I,23.0,Brandon R. LeBlanc
2,Tyrone,Steans,31,Male,Single,Admin Offices,Accountant I,29.0,Brandon R. LeBlanc
3,Estelle,Howard,32,Female,Married,Admin Offices,Administrative Assistant,21.5,Brandon R. LeBlanc
4,Nan,Singh,29,Female,Single,Admin Offices,Administrative Assistant,16.56,Brandon R. LeBlanc


In [25]:
# an interesting part is to check whether, during the splitting,
# whitespace was removed from the beginning and end of `First_Name` and `Last_Name`
# to check this we will need to use mapping
# btw `.apply` is one of the MOST important concepts in this tutorial

whitespace_check = df[['First_Name', 'Last_Name']].apply(lambda x: x.str.contains(pat=r'\s'))
whitespace_check.head()

Unnamed: 0,First_Name,Last_Name
0,True,False
1,True,False
2,True,False
3,True,False
4,True,False


In [26]:
# Let's check the scale of the phenomenon
whitespace_check.sum()

First_Name    298
Last_Name       3
dtype: int64

In [27]:
# to make sure, let's check what went wrong
df.loc[0, 'First_Name']

' Mia'

In [28]:
# let's apply the update function
df[['First_Name', 'Last_Name']] = df[['First_Name', 'Last_Name']].apply(lambda x:x.str.strip())

In [29]:
# much better!
df.loc[0, 'First_Name']

'Mia'

In [30]:
# let's check once again
whitespace_check = df[['First_Name', 'Last_Name']].apply(lambda x: x.str.contains(pat=r'\s'))
df[whitespace_check.any(axis=1)]

Unnamed: 0,First_Name,Last_Name,Age,Sex,MaritalDesc,Department,Position,Pay Rate,Manager Name
5,Leigh Ann,Smith,30,Female,Married,Admin Offices,Administrative Assistant,20.5,Brandon R. LeBlanc
6,Brandon R,LeBlanc,33,Male,Married,Admin Offices,Shared Services Manager,55.0,Janet King
43,Karthikeyan,Ait Sidi,42,Male,Married,IT/IS,Sr. DBA,62.0,Simon Roup
44,Claudia N,Carr,31,Female,Single,IT/IS,Sr. DBA,61.3,Simon Roup
55,Webster L,Butler,34,Male,Single,Production,Production Manager,55.0,Janet King
66,Courtney E,Wallace,62,Female,Married,Production,Production Manager,33.5,Janet King
67,Wilson K,Adinolfi,34,Male,Single,Production,Production Technician I,20.0,Michael Albert
75,Francesco A,Barone,34,Male,Single,Production,Production Technician I,16.76,Kelley Spirea
80,Lowan M,Biden,59,Female,Divorced,Production,Production Technician I,22.0,Ketsia Liebig
87,Donovan E,Chang,34,Male,Single,Production,Production Technician I,22.0,Webster Butler


Now everything looks much better! The remaining whitespace in `First_Name` and `Last_Name` separates middle initials and multi-part names, so it is reasonable data. We can move forward!
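Since `.apply` was used several times above, here is a short sketch of its two directions on a made-up frame: by default the function receives each column, with `axis=1` it receives each row.

```python
import pandas as pd

toy = pd.DataFrame({"First_Name": [" Mia", "Leigh Ann"],
                    "Last_Name": ["Brown ", "Smith"]})

# Default axis=0: the function receives each column as a Series.
stripped = toy.apply(lambda column: column.str.strip())

# axis=1: the function receives each row as a Series.
full_name = stripped.apply(
    lambda row: row["First_Name"] + " " + row["Last_Name"], axis=1
)
```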

## 02.04 Data Frames - Grouping  (<-- meet your best friend!)

`pd.DataFrame.groupby`

Groupby according to [REF1](../README.md) :

<img src="img/03.png">

In [31]:
# most basic `groupby` use
# let's check what is the mean `Age` and `Pay Rate` for every `Position`
# numeric_only=True skips the text columns, which have no mean
simple1_group = df.groupby(['Position']).mean(numeric_only=True).round(2)
simple1_group.sort_values('Age')

Position,Age,Pay Rate
Sales Manager,29.33,56.75
Administrative Assistant,30.33,19.52
IT Manager - Infra,31.0,63.0
Accountant I,32.0,26.83
Shared Services Manager,33.0,55.0
Software Engineer,33.67,51.07
Network Engineer,33.78,39.68
Sr. Accountant,34.0,34.95
Director of Operations,34.0,60.0
Database Administrator,34.54,39.48


In [32]:
# this is the full groupby call:

# 1) First we pass the columns to group by
# 2) Next we pass which columns we wish to select (as a list, hence the double brackets)
# 3) Finally we pass a dictionary with the aggregation functions

simple2_group = df.groupby(['Sex','Department'])[['Age', 'Pay Rate']] \
                .agg({'Age':['mean', 'count'], 'Pay Rate': ['std']}).round(2)
simple2_group

Sex,Department,Age mean,Age count,Pay Rate std
Female,Admin Offices,31.83,6,7.82
Female,Executive Office,63.0,1,
Female,IT/IS,38.26,19,11.98
Female,Production,39.34,127,7.84
Female,Sales,35.73,15,1.92
Female,Software Engineering,32.83,6,4.31
Male,Admin Offices,32.5,4,16.92
Male,IT/IS,37.41,22,12.99
Male,Production,38.53,81,9.99
Male,Sales,41.38,16,0.47


In [33]:
# sometimes it is useful to play with the index/column using `stack` and `unstack` methods
simple2_group.unstack(0)

Department,Age mean (Female),Age mean (Male),Age count (Female),Age count (Male),Pay Rate std (Female),Pay Rate std (Male)
Admin Offices,31.83,32.5,6.0,4.0,7.82,16.92
Executive Office,63.0,,1.0,,,
IT/IS,38.26,37.41,19.0,22.0,11.98,12.99
Production,39.34,38.53,127.0,81.0,7.84,9.99
Sales,35.73,41.38,15.0,16.0,1.92,0.47
Software Engineering,32.83,39.25,6.0,4.0,4.31,11.14


In [34]:
# nothing left in columns, so we got the `Series` :)
simple2_group.stack([0,1])

Sex     Department                           
Female  Admin Offices         Age       count      6.00
                                        mean      31.83
                              Pay Rate  std        7.82
        Executive Office      Age       count      1.00
                                        mean      63.00
        IT/IS                 Age       count     19.00
                                        mean      38.26
                              Pay Rate  std       11.98
        Production            Age       count    127.00
                                        mean      39.34
                              Pay Rate  std        7.84
        Sales                 Age       count     15.00
                                        mean      35.73
                              Pay Rate  std        1.92
        Software Engineering  Age       count      6.00
                                        mean      32.83
                              Pay Rate  std        4.31
Ma
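The two methods are easier to follow on a tiny example. The frame below is made up; `unstack` moves an index level into the columns and `stack` moves it back.

```python
import pandas as pd

toy = pd.DataFrame({
    "Sex": ["Female", "Female", "Male", "Male"],
    "Department": ["IT/IS", "Sales", "IT/IS", "Sales"],
    "Age": [38, 36, 37, 41],
})
long_form = toy.set_index(["Sex", "Department"])["Age"]

# unstack moves an index level into the columns (long -> wide).
wide = long_form.unstack("Sex")

# stack moves the column level back into the index (wide -> long).
back = wide.stack()
```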

To gain some more intuition about how `groupby` actually works, we will do one less typical example
* Let's find the `First_Name`s and `Last_Name`s of all the employees assigned to every manager


In [35]:
df.head()

Unnamed: 0,First_Name,Last_Name,Age,Sex,MaritalDesc,Department,Position,Pay Rate,Manager Name
0,Mia,Brown,32,Female,Married,Admin Offices,Accountant I,28.5,Brandon R. LeBlanc
1,William,LaRotonda,33,Male,Divorced,Admin Offices,Accountant I,23.0,Brandon R. LeBlanc
2,Tyrone,Steans,31,Male,Single,Admin Offices,Accountant I,29.0,Brandon R. LeBlanc
3,Estelle,Howard,32,Female,Married,Admin Offices,Administrative Assistant,21.5,Brandon R. LeBlanc
4,Nan,Singh,29,Female,Single,Admin Offices,Administrative Assistant,16.56,Brandon R. LeBlanc


In [36]:
# we will pass our own aggregation function
df.groupby('Manager Name')[['First_Name','Last_Name']].agg(lambda x: "|".join(x))

Manager Name,First_Name,Last_Name
Alex Sweetwater,Colby|Judith|Keyla|Susan|Sandra|Luke|Adell|And...,Andreola|Carabbio|Del Bosque|Exantus|Martin|Pa...
Amy Dunn,Linda|Sean|Enola|Carl|Nilson|Evelyn|Kara|Alexa...,Anderson|Bernstein|Chivukula|Desimone|Fernande...
Board of Directors,Amy|Janet,Foster-Baker|King
Brandon R. LeBlanc,Mia|William|Tyrone|Estelle|Nan|Leigh Ann|Bonalyn,Brown|LaRotonda|Steans|Howard|Singh|Smith|Bout...
Brannon Miller,Linda|Helen|Elijian|Lily|Libby|Cayo|Rose|John|...,Bachiochi|Billis|Clukey|DiNocco|Fidelia|Gonzal...
David Stanley,Rachael|Donna|James|Denisa S|Raul|David|Marye...,Baczenski|Brill|Cockel|Dobrin|Garcia|Gordon|Ja...
Debra Houlihan,Lynn|Donysha|John,Daneault|Kampew|Smith
Elijiah Gray,Trina|Courtney|Lin|Jene'ya|April|Melisa|Ludwic...,Alagbe|Beatrice|Chan|Darson|Evensen|Gerke|Harr...
Eric Dougall,Rick|Lisa|Leonara|Julia,Clayton|Galia|Lindsay|Soto
Janet King,Brandon R|Sean|Jennifer|Elisa|Michael|Charles...,LeBlanc|Quinn|Zamora|Bramante|Albert|Bozzi|But...


**IMPORTANT**
`pd.DataFrame.groupby` can be combined with many other functions besides `agg`.

Honorable mentions:  
- `filter`
- `transform`
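A minimal sketch of both methods on a made-up frame: `transform` returns a value for every original row, while `filter` keeps or drops whole groups.

```python
import pandas as pd

toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT/IS"],
    "Pay Rate": [50.0, 60.0, 40.0],
})
groups = toy.groupby("Department")

# transform returns a result aligned with the original rows,
# here the mean pay rate of each employee's department.
department_mean = groups["Pay Rate"].transform("mean")

# filter keeps or drops whole groups based on a True/False test,
# here departments with at least two employees.
big_departments = groups.filter(lambda group: len(group) >= 2)
```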

## 02.05 Pivot Tables - when `groupby` is not enough
`pd.DataFrame.pivot_table`

When you are looking for a more powerful tool than `groupby`, then `pivot_table` is probably the thing you need.

In [37]:
df.pivot_table(values=['Pay Rate', 'Age'], index=['Department'], columns=['Sex'], aggfunc='mean').round(2)

Department,Age (Female),Age (Male),Pay Rate (Female),Pay Rate (Male)
Admin Offices,31.83,32.5,26.16,40.5
Executive Office,63.0,,80.0,
IT/IS,38.26,37.41,42.54,46.74
Production,39.34,38.53,22.61,23.83
Sales,35.73,41.38,55.68,55.38
Software Engineering,32.83,39.25,52.33,43.17
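For a simple case like this one, the pivot table gives the same numbers as a `groupby` followed by `unstack`. The sketch below checks that on a made-up frame with a single value column.

```python
import pandas as pd

toy = pd.DataFrame({
    "Department": ["IT/IS", "IT/IS", "Sales", "Sales"],
    "Sex": ["Female", "Male", "Female", "Male"],
    "Age": [38, 37, 36, 41],
})

pivoted = toy.pivot_table(values="Age", index="Department",
                          columns="Sex", aggfunc="mean")

# Same numbers: group by both keys, then move `Sex` into the columns.
grouped = toy.groupby(["Department", "Sex"])["Age"].mean().unstack("Sex")

same_values = bool((pivoted.to_numpy() == grouped.to_numpy()).all())
```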


**EXERCISE 04.02**

1. Count how many workers are assigned to different `Position`s in the company using 3 different methods:
  - using `value_counts`
  - using `groupby`
  - using `pivot_table`*

  `*` - additional (not intuitive, hacky way)

In [38]:
### YOUR CODE HERE:
pass
### END YOUR CODE

In [39]:
### TO SHOW THE SOLUTION USE LINE BELOW ###
# %load ../91_solutions/ex4_2.py

**EXERCISE 04.03**

1. Count how many workers (grouped by `Department`) are assigned to every `Manager Name`?

**IMPORTANT** - keep in mind how `count` interacts with `NaN`: `count` skips missing values!
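The interaction can be seen on a made-up frame with one missing manager. This is not the exercise solution, only an illustration of how `count` and `size` differ.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT/IS"],
    "Manager Name": ["John Smith", np.nan, "Simon Roup"],
})
groups = toy.groupby("Department")["Manager Name"]

# count() skips missing values.
non_missing = groups.count()

# size() counts every row in the group, missing or not.
all_rows = groups.size()
```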

In [40]:
### YOUR CODE HERE:
pass
### END YOUR CODE

In [41]:
### TO SHOW THE SOLUTION USE LINE BELOW ###
# %load ../91_solutions/ex4_3.py