# Challenge Questions - Employees Dataset

# Instructions:
• Please ensure you don't overwrite any existing cells. Add new cells below by pressing ALT+ENTER

• Attempt all of the questions

• You are encouraged to look online for help should you need it

# Dataset overview:
There are three CSV files stored in the same directory as this notebook. Each contains one table, and the tables are related to each other:

• **employees.csv**: contains information about employees in a company. It contains their unique employee number (emp_no), their department number (dept_no), their hire date (hire_date) and their leaving date (leaving_date). The leaving date is blank if the employee is still employed by the company

• **departments.csv**: contains information about the departments in a company. It contains the department number (dept_no), the department name (dept_name) and the location (location).

• **salaries.csv**: This file contains the salaries of the employees. It contains a unique employee department key (emp_dept_key) and the salary. The emp_dept_key is in the format 'emp_id-dept_id'



## Import pandas, numpy and datetime

In [344]:
import pandas as pd
import numpy as np
import datetime as dt

## Load the files:
• "employees.csv" should be assigned to the variable **emp**

• "departments.csv" should be assigned to the variable **dept**

• "salaries.csv" should be assigned to the variable **sal**

In [345]:
emp=pd.read_csv("employees.csv")
dept=pd.read_csv("departments.csv")
sal=pd.read_csv("salaries.csv")

## Check the head of all three DataFrames

In [346]:
emp.head()

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date
0,10001,D5,26/06/2006,
1,10002,D6,21/11/2005,
2,10003,D4,28/08/2006,
3,10004,D4,01/12/2006,
4,10005,D3,12/09/2009,


In [347]:
dept.head()

Unnamed: 0,dept_no,dept_name,location
0,D1,Accounting and Finance,Chicago
1,D2,Human Resources,New York
2,D3,Supply Chain Operations,Chicago
3,D4,Marketing,New York
4,D5,Technology,Chicago


In [348]:
sal.head()

Unnamed: 0,emp_dept_key,salary
0,10001-D5,30546
1,10002-D6,36536
2,10003-D4,38323
3,10004-D4,31851
4,10005-D3,53435


## Check the data types of all three DataFrames

In [349]:
emp.dtypes

emp_no           int64
dept_no         object
hire_date       object
leaving_date    object
dtype: object

In [350]:
dept.dtypes

dept_no      object
dept_name    object
location     object
dtype: object

In [351]:
sal.dtypes

emp_dept_key    object
salary           int64
dtype: object

## Change the data types accordingly. 

• emp_no, dept_no, dept_name, location, emp_dept_key should all be string data types

• hire_date and leaving_date should be datetime64

• salary should be int64

In [352]:
emp[['emp_no', 'dept_no']] = emp[['emp_no', 'dept_no']].astype('string')

In [353]:
# dept_no is also in the dept Dataframe
dept[['dept_name','location','dept_no']]=dept[['dept_name','location','dept_no']].astype('string')

In [354]:
sal['emp_dept_key']=sal['emp_dept_key'].astype('string')

In [355]:
# pd.to_datetime is called on one column (Series) at a time here, so each date column gets its own step
emp['hire_date']=pd.to_datetime(emp['hire_date'],format='%d/%m/%Y')

In [356]:
emp['leaving_date']=pd.to_datetime(emp['leaving_date'],format='%d/%m/%Y')
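
If converting each date column separately feels repetitive, one option is to pass `pd.to_datetime` to `DataFrame.apply`, which runs it once per column. The sketch below uses a small made-up frame (`demo` is a hypothetical name, not part of the exercise) with the same day/month/year format.

```python
import pandas as pd

# Hypothetical two-row sample in the same dd/mm/YYYY format as employees.csv
demo = pd.DataFrame({
    "hire_date": ["26/06/2006", "28/02/2005"],
    "leaving_date": [None, "07/03/2006"],  # None stands for a blank leaving date
})

date_cols = ["hire_date", "leaving_date"]
# apply() calls pd.to_datetime on each column; extra keywords are passed through
demo[date_cols] = demo[date_cols].apply(pd.to_datetime, format="%d/%m/%Y")

print(demo.dtypes)
```

The blank leaving date becomes `NaT`, so `isnull()` still identifies current employees afterwards.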

In [357]:
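# Note: astype(int) uses the platform's default integer size, which can be int32 (see the sal.dtypes output below);
# astype('int64') would request 64 bits explicitly
sal['salary']=sal['salary'].astype(int)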

In [358]:
emp.dtypes

emp_no          string[python]
dept_no         string[python]
hire_date       datetime64[ns]
leaving_date    datetime64[ns]
dtype: object

In [359]:
dept.dtypes

dept_no      string[python]
dept_name    string[python]
location     string[python]
dtype: object

In [360]:
sal.dtypes

emp_dept_key    string[python]
salary                   int32
dtype: object
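
The `int32` in the output above is what `astype(int)` can produce on platforms where NumPy's default integer is 32-bit (for example Windows with NumPy 1.x). A minimal sketch of requesting 64 bits explicitly, using a made-up Series (the values are illustrative, not read from salaries.csv):

```python
import pandas as pd

# Illustrative salaries, not taken from salaries.csv
salary = pd.Series([30546, 36536, 38323], name="salary")

# Naming the width explicitly gives the same dtype on every platform
salary64 = salary.astype("int64")
print(salary64.dtype)  # int64
```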

## How many employees are currently working at the company?

The employees still employed do not have a leaving date value. You can use the isnull() method to identify nulls or NaN values.

isnull(): https://pandas.pydata.org/docs/reference/api/pandas.isnull.html

In [361]:
emp['leaving_date'].isnull()

0       True
1       True
2       True
3       True
4       True
       ...  
995     True
996     True
997    False
998    False
999     True
Name: leaving_date, Length: 1000, dtype: bool

In [362]:
# You MUST filter the DataFrame with the boolean mask before counting; count() on the mask itself counts every non-null entry, regardless of whether it is True or False
emp[emp['leaving_date'].isnull()].count()

emp_no          741
dept_no         741
hire_date       741
leaving_date      0
dtype: int64

In [363]:
# We can remove unnecessary columns by selecting only the column we want to keep
emp[emp['leaving_date'].isnull()].count()['emp_no']

741

In [364]:
# Alternatively
len(emp[emp['leaving_date'].isnull()])

741
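
Because `True` counts as 1 when summed, the boolean mask can also be summed directly, with no filtering step. A small sketch on made-up data (`demo` is a hypothetical frame standing in for `emp`):

```python
import pandas as pd

demo = pd.DataFrame({
    "emp_no": ["10001", "10002", "10003"],
    "leaving_date": pd.to_datetime([None, "2006-03-07", None]),
})

# isnull() gives True for missing leaving dates; sum() counts the True values
current = demo["leaving_date"].isnull().sum()
print(current)  # 2
```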

## How many currently employed people are there per department?

In [365]:
emp[emp['leaving_date'].isnull()].groupby(by='dept_no').count()

Unnamed: 0_level_0,emp_no,hire_date,leaving_date
dept_no,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
D1,84,84,0
D2,36,36,0
D3,94,94,0
D4,171,171,0
D5,188,188,0
D6,168,168,0


In [368]:
# Hiding unnecessary columns, this time using drop
emp[emp['leaving_date'].isnull()].groupby(by='dept_no').count().drop(columns=['hire_date','leaving_date'])

Unnamed: 0_level_0,emp_no
dept_no,Unnamed: 1_level_1
D1,84
D2,36
D3,94
D4,171
D5,188
D6,168


In [367]:
# Using a pivot table
# Passing the same column to both the values and columns arguments raises an error,
# so leave out the columns argument when that column only needs to supply the cell values
emp[emp['leaving_date'].isnull()].pivot_table(values='emp_no',aggfunc='count',index='dept_no',columns='emp_no')

ValueError: Grouper for 'emp_no' not 1-dimensional

In [369]:
emp[emp['leaving_date'].isnull()].pivot_table(values='emp_no',aggfunc='count',index='dept_no')

Unnamed: 0_level_0,emp_no
dept_no,Unnamed: 1_level_1
D1,84
D2,36
D3,94
D4,171
D5,188
D6,168
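
`groupby(...).size()` is another option here: it counts rows per group and returns a single Series, so there are no extra columns to drop. A sketch with a hypothetical `demo` frame standing in for `emp`:

```python
import pandas as pd

demo = pd.DataFrame({
    "emp_no": ["10001", "10002", "10003", "10004"],
    "dept_no": ["D5", "D4", "D4", "D5"],
    "leaving_date": pd.to_datetime([None, None, "2006-03-07", None]),
})

still_employed = demo[demo["leaving_date"].isnull()]
# size() counts the rows in each group, whatever the columns contain
per_dept = still_employed.groupby("dept_no").size()
print(per_dept)
```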


## Perform a left join on the emp and dept DataFrames (with emp as the left DF). Assign the result of this to the variable emp_dept

In [370]:
emp_dept=pd.merge(left=emp, right=dept, on='dept_no', how='left')

In [371]:
emp_dept

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location
0,10001,D5,2006-06-26,NaT,Technology,Chicago
1,10002,D6,2005-11-21,NaT,Sales,Chicago
2,10003,D4,2006-08-28,NaT,Marketing,New York
3,10004,D4,2006-12-01,NaT,Marketing,New York
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago
...,...,...,...,...,...,...
995,10903,D1,2009-02-14,NaT,Accounting and Finance,Chicago
996,10904,D5,2013-04-16,NaT,Technology,Chicago
997,10905,D4,2005-02-28,2006-03-07,Marketing,New York
998,10906,D2,2014-01-20,2021-04-25,Human Resources,New York
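
When joining to a lookup table such as `dept`, `pd.merge` can also check the relationship between the keys and report where each row came from. The sketch below uses made-up frames: `validate` and `indicator` are existing `merge` parameters, while the sample rows and the department `D9` are invented for illustration.

```python
import pandas as pd

emp_demo = pd.DataFrame({"emp_no": ["10001", "10003", "10999"],
                         "dept_no": ["D5", "D4", "D9"]})  # D9 has no match
dept_demo = pd.DataFrame({"dept_no": ["D4", "D5"],
                          "dept_name": ["Marketing", "Technology"]})

merged = pd.merge(
    emp_demo, dept_demo,
    on="dept_no",            # same column name on both sides
    how="left",
    validate="many_to_one",  # raises MergeError if dept_no repeats in dept_demo
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
print(merged)
```

Rows marked `left_only` are employees whose department was not found in the lookup table.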


## Perform a left join on the newly created "emp_dept" DataFrame and the "sal" DataFrame. 
## Assign this resulting DataFrame to the variable "emp_dept_sal"

• You will need to think about how to join the two tables. Note the emp_dept_key on the sal DataFrame is in the format 'emp_id-dept_id'

In [372]:
# The key in the sal DataFrame has the format 'emp_id-dept_id', so we build a matching column in emp_dept to join on
emp_dept['new_key']=emp_dept['emp_no']+'-'+emp_dept['dept_no']

In [373]:
emp_dept

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3
...,...,...,...,...,...,...,...
995,10903,D1,2009-02-14,NaT,Accounting and Finance,Chicago,10903-D1
996,10904,D5,2013-04-16,NaT,Technology,Chicago,10904-D5
997,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4
998,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2


In [374]:
emp_dept_sal=pd.merge(left=emp_dept,right=sal,left_on='new_key',right_on='emp_dept_key',how='left')

In [375]:
emp_dept_sal

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,emp_dept_key,salary
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5,10001-D5,30546
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6,10002-D6,36536
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4,10003-D4,38323
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4,10004-D4,31851
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3,10005-D3,53435
...,...,...,...,...,...,...,...,...,...
997,10903,D1,2009-02-14,NaT,Accounting and Finance,Chicago,10903-D1,10903-D1,42815
998,10904,D5,2013-04-16,NaT,Technology,Chicago,10904-D5,10904-D5,90778
999,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,10905-D4,32735
1000,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,10906-D2,29095


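<b>Note: the result has 1002 rows, two more than the 1000 rows in emp, which suggests that some keys are duplicated. The counts below check this.</b>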

In [377]:
emp_dept_sal['new_key'].value_counts()

new_key
10259-D1    4
10013-D3    1
10029-D4    1
10028-D5    1
10003-D4    1
           ..
10902-D5    1
10903-D1    1
10904-D5    1
10905-D4    1
10906-D6    1
Name: count, Length: 999, dtype: Int64

<b>We could have gone the other way, by breaking up emp_dept_key in the sal DataFrame</b>

In [378]:
# We can use lambda and apply to slice the emp_dept_key column values
sal['emp_no1']=sal['emp_dept_key'].apply(lambda x :x[:5])

In [379]:
sal

Unnamed: 0,emp_dept_key,salary,emp_no1
0,10001-D5,30546,10001
1,10002-D6,36536,10002
2,10003-D4,38323,10003
3,10004-D4,31851,10004
4,10005-D3,53435,10005
...,...,...,...
995,10903-D1,42815,10903
996,10904-D5,90778,10904
997,10905-D4,32735,10905
998,10906-D2,29095,10906


In [380]:
# A more robust solution is to use the find method to locate the index of the '-' separator
sal['emp_no1']=sal['emp_dept_key'].apply(lambda x: x[:x.find('-')])  # same as x[0:x.find('-')]
# i.e. start at index 0 and stop just before the first '-'
# Recall: find is a string method

# This is the go-to method when the ids are longer or vary in length
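
pandas also offers vectorised string methods, so the same split can be done without `apply`. A sketch, assuming every key contains exactly one `-` (the frame `sal_demo` and the column `dept_no1` are illustrative names only):

```python
import pandas as pd

sal_demo = pd.DataFrame({"emp_dept_key": ["10001-D5", "10002-D6"],
                         "salary": [30546, 36536]})

# Split once on '-' and expand the two pieces into separate columns
parts = sal_demo["emp_dept_key"].str.split("-", n=1, expand=True)
sal_demo["emp_no1"] = parts[0]
sal_demo["dept_no1"] = parts[1]
print(sal_demo)
```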

In [381]:
sal

Unnamed: 0,emp_dept_key,salary,emp_no1
0,10001-D5,30546,10001
1,10002-D6,36536,10002
2,10003-D4,38323,10003
3,10004-D4,31851,10004
4,10005-D3,53435,10005
...,...,...,...
995,10903-D1,42815,10903
996,10904-D5,90778,10904
997,10905-D4,32735,10905
998,10906-D2,29095,10906


In [382]:
emp_dept_sal=pd.merge(left=emp_dept,right=sal,left_on='emp_no', right_on = 'emp_no1', how='left')

In [383]:
emp_dept_sal

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,emp_dept_key,salary,emp_no1
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5,10001-D5,30546,10001
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6,10002-D6,36536,10002
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4,10003-D4,38323,10003
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4,10004-D4,31851,10004
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3,10005-D3,53435,10005
...,...,...,...,...,...,...,...,...,...,...
1183,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,10905-D4,32735,10905
1184,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,10906-D2,29095,10906
1185,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,10906-D6,97330,10906
1186,10906,D6,2014-01-20,NaT,Sales,Chicago,10906-D6,10906-D2,29095,10906


<b>Note: we have 1188 rows</b>

In [385]:
emp_dept_sal['emp_dept_key'].value_counts()

emp_dept_key
10259-D1    4
10440-D3    2
10088-D1    2
10836-D1    2
10855-D6    2
           ..
10713-D3    1
10714-D5    1
10715-D4    1
10716-D5    1
10454-D6    1
Name: count, Length: 999, dtype: Int64

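As you can see, the two methods give different results. Joining on emp_no alone ignores the department, so an employee who appears under more than one department is paired with every one of their salary rows. Employee 10906 above, for example, has the D2 row matched with the 10906-D6 salary and the D6 row matched with the 10906-D2 salary. Joining on the combined key avoids these extra pairings.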
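
One way to investigate the differing row counts is to reproduce both joins on a tiny made-up example. The sketch below assumes an employee who appears in two departments, as 10906 does in the output above; the frames are shortened stand-ins, not the real files.

```python
import pandas as pd

emp_demo = pd.DataFrame({"emp_no": ["10906", "10906"],
                         "dept_no": ["D2", "D6"]})
emp_demo["new_key"] = emp_demo["emp_no"] + "-" + emp_demo["dept_no"]

sal_demo = pd.DataFrame({"emp_dept_key": ["10906-D2", "10906-D6"],
                         "salary": [29095, 97330]})
sal_demo["emp_no1"] = sal_demo["emp_dept_key"].str.split("-").str[0]

# Composite key: each employee/department row meets exactly one salary row
by_key = emp_demo.merge(sal_demo, left_on="new_key",
                        right_on="emp_dept_key", how="left")

# emp_no only: every row for 10906 pairs with every salary for 10906
by_emp = emp_demo.merge(sal_demo, left_on="emp_no",
                        right_on="emp_no1", how="left")

print(len(by_key), len(by_emp))  # 2 4
```

The four rows in the second result mix D2 rows with D6 salaries and vice versa, which is the same pattern seen in rows 1184 to 1186 above.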

## Drop the column "emp_dept_key"

In [387]:
emp_dept_sal.drop(columns='emp_dept_key',inplace=True)

In [388]:
emp_dept_sal

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5,30546,10001
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6,36536,10002
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4,38323,10003
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4,31851,10004
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3,53435,10005
...,...,...,...,...,...,...,...,...,...
1183,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,32735,10905
1184,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,29095,10906
1185,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,97330,10906
1186,10906,D6,2014-01-20,NaT,Sales,Chicago,10906-D6,29095,10906


## What is the average salary per department?

In [389]:
emp_dept_sal[['dept_name','salary']].groupby(by='dept_name').mean()

Unnamed: 0_level_0,salary
dept_name,Unnamed: 1_level_1
Accounting and Finance,52148.887324
Human Resources,44849.60274
Marketing,41540.041985
Sales,59192.05364
Supply Chain Operations,45183.788462
Technology,58975.979592


In [390]:
# Using a pivot table
emp_dept_sal.pivot_table(values='salary',index='dept_name',aggfunc='mean')

Unnamed: 0_level_0,salary
dept_name,Unnamed: 1_level_1
Accounting and Finance,52148.887324
Human Resources,44849.60274
Marketing,41540.041985
Sales,59192.05364
Supply Chain Operations,45183.788462
Technology,58975.979592


<b>These values come from the second merge method (the join on emp_no) in the previous question, so they include the extra rows that join produces.</b>

## What is the average salary by location?

In [391]:
emp_dept_sal

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5,30546,10001
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6,36536,10002
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4,38323,10003
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4,31851,10004
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3,53435,10005
...,...,...,...,...,...,...,...,...,...
1183,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,32735,10905
1184,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,29095,10906
1185,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,97330,10906
1186,10906,D6,2014-01-20,NaT,Sales,Chicago,10906-D6,29095,10906


In [392]:
emp_dept_sal[['location','salary']].groupby(by='location').mean()

Unnamed: 0_level_0,salary
location,Unnamed: 1_level_1
Chicago,55383.208675
New York,42261.229851


In [393]:
# Using a pivot table
emp_dept_sal.pivot_table(values='salary',index='location',aggfunc='mean')

Unnamed: 0_level_0,salary
location,Unnamed: 1_level_1
Chicago,55383.208675
New York,42261.229851


## How many people were hired each year in each of the last 10 years?

In [394]:
emp_dept_sal

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5,30546,10001
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6,36536,10002
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4,38323,10003
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4,31851,10004
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3,53435,10005
...,...,...,...,...,...,...,...,...,...
1183,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,32735,10905
1184,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,29095,10906
1185,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,97330,10906
1186,10906,D6,2014-01-20,NaT,Sales,Chicago,10906-D6,29095,10906


In [396]:
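# Note: sorted() reorders the dates on their own, so ordered_hire_date no longer belongs to the employee on the same row.
# The per-year counts of emp_no below do not depend on row order, but they are taken from the merged table, extra join rows included
emp_dept_sal['ordered_hire_date']=sorted(emp_dept_sal['hire_date'])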

In [397]:
# Extracting the year from each date, to see which years are the last 10
emp_dept_sal['ordered_years']=emp_dept_sal['ordered_hire_date'].dt.year

In [398]:
emp_dept_sal

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1,ordered_hire_date,ordered_years
0,10001,D5,2006-06-26,NaT,Technology,Chicago,10001-D5,30546,10001,2005-02-03,2005
1,10002,D6,2005-11-21,NaT,Sales,Chicago,10002-D6,36536,10002,2005-02-14,2005
2,10003,D4,2006-08-28,NaT,Marketing,New York,10003-D4,38323,10003,2005-02-15,2005
3,10004,D4,2006-12-01,NaT,Marketing,New York,10004-D4,31851,10004,2005-02-18,2005
4,10005,D3,2009-09-12,NaT,Supply Chain Operations,Chicago,10005-D3,53435,10005,2005-02-18,2005
...,...,...,...,...,...,...,...,...,...,...,...
1183,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,32735,10905,2019-03-30,2019
1184,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,29095,10906,2019-03-30,2019
1185,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,97330,10906,2019-03-30,2019
1186,10906,D6,2014-01-20,NaT,Sales,Chicago,10906-D6,29095,10906,2019-04-30,2019


In [399]:
# Viewing a list of all the years 
emp_dept_sal['ordered_years'].unique()

array([2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
       2016, 2017, 2018, 2019])

In [400]:
# Viewing a list of the last 10 years 
emp_dept_sal['ordered_years'].unique()[-10:]

array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])

In [401]:
# filtering the dataset so it only contains values from these years
emp_dept_sal[(emp_dept_sal['ordered_years'] >=2010) & (emp_dept_sal['ordered_years'] <=2019)]

Unnamed: 0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1,ordered_hire_date,ordered_years
624,10478,D5,2012-06-09,NaT,Technology,Chicago,10478-D5,61674,10478,2010-01-03,2010
625,10478,D5,2012-06-09,NaT,Technology,Chicago,10478-D5,38519,10478,2010-01-05,2010
626,10478,D1,2012-06-09,2013-02-10,Accounting and Finance,Chicago,10478-D1,61674,10478,2010-01-05,2010
627,10478,D1,2012-06-09,2013-02-10,Accounting and Finance,Chicago,10478-D1,38519,10478,2010-01-05,2010
628,10479,D4,2007-02-28,NaT,Marketing,New York,10479-D4,46793,10479,2010-01-05,2010
...,...,...,...,...,...,...,...,...,...,...,...
1183,10905,D4,2005-02-28,2006-03-07,Marketing,New York,10905-D4,32735,10905,2019-03-30,2019
1184,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,29095,10906,2019-03-30,2019
1185,10906,D2,2014-01-20,2021-04-25,Human Resources,New York,10906-D2,97330,10906,2019-03-30,2019
1186,10906,D6,2014-01-20,NaT,Sales,Chicago,10906-D6,29095,10906,2019-04-30,2019


In [402]:
# using a pivot table
emp_dept_sal[(emp_dept_sal['ordered_years'] >=2010) & (emp_dept_sal['ordered_years'] <=2019)].pivot_table(
    index='ordered_years',values='emp_no',aggfunc='count')

Unnamed: 0_level_0,emp_no
ordered_years,Unnamed: 1_level_1
2010,146
2011,99
2012,81
2013,61
2014,69
2015,39
2016,40
2017,14
2018,8
2019,7


In [404]:
# using group by
emp_dept_sal[(emp_dept_sal['ordered_years'] >=2010) & (emp_dept_sal['ordered_years'] <=2019)].groupby(by='ordered_years').count()

Unnamed: 0_level_0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1,ordered_hire_date
ordered_years,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2010,146,146,146,43,146,146,146,146,146,146
2011,99,99,99,27,99,99,99,99,99,99
2012,81,81,81,23,81,81,81,81,81,81
2013,61,61,61,24,61,61,61,61,61,61
2014,69,69,69,18,69,69,69,69,69,69
2015,39,39,39,11,39,39,39,39,39,39
2016,40,40,40,12,40,40,40,40,40,40
2017,14,14,14,6,14,14,14,14,14,14
2018,8,8,8,2,8,8,8,8,8,8
2019,7,7,7,3,7,7,7,7,7,7


In [405]:
# dropping unnecessary columns
emp_dept_sal[(emp_dept_sal['ordered_years'] >=2010) & (emp_dept_sal['ordered_years'] <=2019)].groupby(
    by='ordered_years').count()['emp_no']

ordered_years
2010    146
2011     99
2012     81
2013     61
2014     69
2015     39
2016     40
2017     14
2018      8
2019      7
Name: emp_no, dtype: int64

<b>A much shorter alternative to filtering is to call tail(10) on the grouped result 😅😅</b>

In [410]:
emp_dept_sal.groupby(by='ordered_years').count().tail(10)

Unnamed: 0_level_0,emp_no,dept_no,hire_date,leaving_date,dept_name,location,new_key,salary,emp_no1,ordered_hire_date
ordered_years,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2010,146,146,146,43,146,146,146,146,146,146
2011,99,99,99,27,99,99,99,99,99,99
2012,81,81,81,23,81,81,81,81,81,81
2013,61,61,61,24,61,61,61,61,61,61
2014,69,69,69,18,69,69,69,69,69,69
2015,39,39,39,11,39,39,39,39,39,39
2016,40,40,40,12,40,40,40,40,40,40
2017,14,14,14,6,14,14,14,14,14,14
2018,8,8,8,2,8,8,8,8,8,8
2019,7,7,7,3,7,7,7,7,7,7


In [407]:
emp_dept_sal.groupby(by='ordered_years').count().tail(10)['emp_no']

ordered_years
2010    146
2011     99
2012     81
2013     61
2014     69
2015     39
2016     40
2017     14
2018      8
2019      7
Name: emp_no, dtype: int64

In [409]:
emp_dept_sal.pivot_table(index='ordered_years', aggfunc='count', values='emp_no').tail(10)

Unnamed: 0_level_0,emp_no
ordered_years,Unnamed: 1_level_1
2010,146
2011,99
2012,81
2013,61
2014,69
2015,39
2016,40
2017,14
2018,8
2019,7
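
Since the question is about hires, a variation is to take the year straight from `hire_date` with `.dt.year` and count one row per employee, which avoids the sorting step and any rows duplicated by the merge. A sketch on made-up dates (`hires` is a hypothetical frame; on the real data the starting point would presumably be `emp`):

```python
import pandas as pd

hires = pd.DataFrame({
    "emp_no": ["1", "2", "3", "4"],
    "hire_date": pd.to_datetime(["2018-05-01", "2019-03-30",
                                 "2019-04-30", "2005-02-03"]),
})

per_year = (
    hires["hire_date"].dt.year   # year of each hire
    .value_counts()              # hires per year
    .sort_index()                # oldest year first
    .tail(10)                    # keep the 10 most recent years present
)
print(per_year)
```

Note that `tail(10)` keeps the last 10 years that appear in the data, which equals the last 10 calendar years only when no year in that range is missing.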
