Number of Workers by Department Starting in April or Later

Find the number of workers by department who joined in or after April.


Output the department name along with the corresponding number of workers.


Sort records based on the number of workers in descending order.

In [2]:
import pandas as pd
import numpy as np
import datetime as dt



In [5]:
worker = pd.read_csv("CSV/worker.csv")
worker = worker.iloc[:, :6]
worker.head()

   worker_id  first_name  last_name  salary  joining_date  department
0          1      Monika      Arora  100000    2014-02-20          HR
1          2    Niharika      Verma   80000    2014-06-11       Admin
2          3      Vishal    Singhal  300000    2014-02-20          HR
3          4      Amitah      Singh  500000    2014-02-20       Admin
4          5       Vivek      Bhati  500000    2014-06-11       Admin


In [6]:
worker["joining_date"] = pd.to_datetime(worker["joining_date"])
worker.head()

   worker_id  first_name  last_name  salary  joining_date  department
0          1      Monika      Arora  100000    2014-02-20          HR
1          2    Niharika      Verma   80000    2014-06-11       Admin
2          3      Vishal    Singhal  300000    2014-02-20          HR
3          4      Amitah      Singh  500000    2014-02-20       Admin
4          5       Vivek      Bhati  500000    2014-06-11       Admin


In [8]:
worker["month"] = worker["joining_date"].dt.month
worker.head()

   worker_id  first_name  last_name  salary  joining_date  department  month
0          1      Monika      Arora  100000    2014-02-20          HR      2
1          2    Niharika      Verma   80000    2014-06-11       Admin      6
2          3      Vishal    Singhal  300000    2014-02-20          HR      2
3          4      Amitah      Singh  500000    2014-02-20       Admin      2
4          5       Vivek      Bhati  500000    2014-06-11       Admin      6


In [9]:
april_df = worker.loc[worker["month"] >= 4]
april_df.head()

   worker_id  first_name  last_name  salary  joining_date  department  month
1          2    Niharika      Verma   80000    2014-06-11       Admin      6
4          5       Vivek      Bhati  500000    2014-06-11       Admin      6
5          6       Vipul      Diwan  200000    2014-06-11     Account      6
7          8     Geetika    Chauhan   90000    2014-04-11       Admin      4
8          9       Agepi      Argon   90000    2015-04-10       Admin      4


In [10]:
result = (
    april_df.groupby(["department"])
    .size()
    .to_frame("num_workers")
    .sort_values(by="num_workers", ascending=False)
    .reset_index()
)

In [11]:
result

   department  num_workers
0       Admin            4
1     Account            1
2          HR            1


Solution Walkthrough
In this problem, we are given a dataset of workers and their joining dates. We need to find the number of workers by department who joined in or after April and sort the records based on the number of workers in descending order.

To solve it, we use the pandas library in Python. We first convert the joining dates to datetime format and extract the month. Then we filter the dataset to the workers who joined in or after April. Finally, we group the filtered rows by department, count the workers in each department, sort by that count in descending order, and output each department name with its number of workers.

Let's break down the solution step by step.

Understanding The Data
The dataset contains one row per worker, with the columns "worker_id", "first_name", "last_name", "salary", "joining_date", and "department". This problem only needs "joining_date" and "department".

The Problem Statement
We need to find the number of workers by department who joined in or after April and sort the records based on the number of workers in descending order.

Breaking Down The Code
Here is the code to solve the problem:

import pandas as pd

# Load the worker table and keep its first six columns, as in the notebook
worker = pd.read_csv("CSV/worker.csv")
worker = worker.iloc[:, :6]

# Convert the joining_date column from strings to datetime values
worker["joining_date"] = pd.to_datetime(worker["joining_date"])

# Extract the month (1-12) from joining_date into a new "month" column
worker["month"] = worker["joining_date"].dt.month

# Keep only the workers who joined in April (month 4) or later in the year
april_df = worker.loc[worker["month"] >= 4]

# Group the filtered rows by department, count the rows in each group,
# name the count column, sort it in descending order, and turn
# department back into a regular column
result = (
    april_df.groupby(["department"])
    .size()
    .to_frame("num_workers")
    .sort_values(by="num_workers", ascending=False)
    .reset_index()
)

Let's understand each step in detail:

First, we import pandas, which handles all of the data manipulation. The notebook version also imports numpy and datetime, but neither is used: the date handling relies on pandas' own pd.to_datetime() function and .dt accessor.

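We convert the "joining_date" column to datetime format with pd.to_datetime(). After reading a CSV, the dates are plain strings; converting them to datetime values enables date-specific operations such as the .dt accessor used in the next step.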
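
A minimal sketch of what the conversion changes, using three joining dates from the notebook output as a hypothetical sample. Before conversion the values are plain strings and the .dt accessor is unavailable; after pd.to_datetime() it works.

```python
import pandas as pd

# Hypothetical sample: joining dates as they arrive from a CSV, i.e. strings
dates = pd.Series(["2014-02-20", "2014-06-11", "2015-04-10"])

# Strings have no date semantics, so the .dt accessor raises AttributeError
try:
    dates.dt.year
except AttributeError as err:
    print("before conversion:", err)

# pd.to_datetime parses each string into a timestamp
parsed = pd.to_datetime(dates)

# The parsed Series supports date operations through .dt
print(parsed.dt.year.tolist())
```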

Next, we extract the month from the "joining_date" column with the .dt.month accessor, which returns the month as an integer from 1 (January) to 12 (December). Assigning the result to worker["month"] adds a new "month" column holding each worker's joining month.
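
A small sketch of the month extraction, assuming a hypothetical three-row DataFrame built from names and dates shown in the notebook output:

```python
import pandas as pd

# Hypothetical sample built from rows shown in the notebook output
worker = pd.DataFrame({
    "first_name": ["Monika", "Niharika", "Agepi"],
    "joining_date": pd.to_datetime(["2014-02-20", "2014-06-11", "2015-04-10"]),
})

# .dt.month gives the month number (1 = January ... 12 = December) per row
worker["month"] = worker["joining_date"].dt.month
print(worker)
```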

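We filter the dataset to the workers who joined in or after April using boolean indexing with the .loc indexer. The condition worker["month"] >= 4 is True for every row whose month is April (4) or later, and .loc keeps only those rows. Because the condition compares the month alone, it matches joining dates from April through December of any year: 2015-04-10 is kept, while a date in February of any year is dropped.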
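
The sketch below applies the same filter to a hypothetical four-row sample. The first three dates come from the notebook output; worker 99 and the date 2015-02-01 are invented to show that the condition looks only at the month, not the year.

```python
import pandas as pd

# Hypothetical sample; worker 99 / 2015-02-01 is invented for illustration
worker = pd.DataFrame({
    "worker_id": [1, 2, 9, 99],
    "joining_date": pd.to_datetime(
        ["2014-02-20", "2014-06-11", "2015-04-10", "2015-02-01"]
    ),
})
worker["month"] = worker["joining_date"].dt.month

# Boolean mask: True where the joining month is April (4) or later
mask = worker["month"] >= 4

# .loc keeps only the rows where the mask is True
april_df = worker.loc[mask]

# 2015-02-01 is excluded even though it is later in time than 2014-06-11,
# because only the month number is compared
print(april_df["worker_id"].tolist())
```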

We group the filtered dataset by the "department" column using the groupby() method, which forms one group for each unique department value.
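
To see what the grouping produces, here is a sketch on a hypothetical sample holding the worker IDs and departments of the five filtered rows visible in the notebook output. It groups by the string "department" rather than the one-element list used in the solution; for counting with size(), both forms give a result indexed by department.

```python
import pandas as pd

# Hypothetical sample: the filtered workers visible in the notebook output
april_df = pd.DataFrame({
    "worker_id": [2, 5, 6, 8, 9],
    "department": ["Admin", "Admin", "Account", "Admin", "Admin"],
})

# groupby returns a GroupBy object; nothing is aggregated yet
grouped = april_df.groupby("department")

# Iterating yields (department, sub-DataFrame) pairs, one per unique value
for dept, group in grouped:
    print(dept, group["worker_id"].tolist())
```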

Next, we count the workers in each department by calling size() on the GroupBy object that groupby() returns. size() gives the number of rows in each group as a Series indexed by department.
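
A short sketch contrasting size() with count(), on a hypothetical sample in which one last_name is deliberately missing (the NaN is invented for illustration). size() counts rows, so it is unaffected by missing values; count() counts non-missing values in a column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the missing last_name is invented for illustration
april_df = pd.DataFrame({
    "department": ["Admin", "Admin", "Account"],
    "last_name": ["Verma", np.nan, "Diwan"],
})

# size(): number of rows in each group, missing values included
sizes = april_df.groupby("department").size()

# count(): number of non-missing last_name values in each group
counts = april_df.groupby("department")["last_name"].count()

print(sizes.to_dict())
print(counts.to_dict())
```

Since the task is "number of workers", size() is the natural fit: every row is a worker, whether or not other fields are filled in.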

We convert that Series of counts to a DataFrame with to_frame("num_workers"). This gives the count column an explicit name, "num_workers", which the next step sorts by and which appears as a column header in the final output.
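
A minimal sketch of the Series-to-DataFrame step, assuming a hypothetical sample with the departments of the five filtered workers visible in the notebook output:

```python
import pandas as pd

# Hypothetical sample: departments of the visible filtered workers
april_df = pd.DataFrame({"department": ["Admin", "Admin", "Account", "Admin", "Admin"]})

# groupby().size() returns a Series indexed by department
counts = april_df.groupby("department").size()

# to_frame turns it into a one-column DataFrame with the given column name
counts_df = counts.to_frame("num_workers")
print(counts_df)
```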

We sort the DataFrame based on the "num_workers" column in descending order using the sort_values() function. The ascending=False parameter ensures that the sorting is done in descending order.
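
A sketch of the sort on hypothetical per-department counts that match the final result shown in the notebook (Admin 4, Account 1, HR 1), listed in the alphabetical order groupby produces. Account and HR tie at 1; the pandas documentation lists only "mergesort" and "stable" as stable algorithms, so passing kind="stable" is one way to keep tied departments in their existing order.

```python
import pandas as pd

# Hypothetical counts matching the notebook's final result,
# in the alphabetical order that groupby produces
counts_df = pd.DataFrame(
    {"num_workers": [1, 4, 1]},
    index=pd.Index(["Account", "Admin", "HR"], name="department"),
)

# ascending=False puts the largest count first; kind="stable" requests a
# stable algorithm so the tied departments keep their relative order
sorted_df = counts_df.sort_values(by="num_workers", ascending=False, kind="stable")
print(sorted_df)
```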

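Finally, we call reset_index(). It moves the department labels out of the index and into a regular "department" column, and replaces them with a fresh integer index (0, 1, 2, ...) that follows the sorted order. Without this step, department would remain the index rather than an output column.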
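
A minimal sketch of what reset_index() changes, assuming hypothetical sorted counts that match the notebook's final result:

```python
import pandas as pd

# Hypothetical sorted counts, still indexed by department
sorted_df = pd.DataFrame(
    {"num_workers": [4, 1, 1]},
    index=pd.Index(["Admin", "Account", "HR"], name="department"),
)

# reset_index moves "department" from the index into a regular column
# and assigns a fresh 0..n-1 integer index in the current row order
result = sorted_df.reset_index()
print(result)
```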

Bringing It All Together
The code above performs all of these steps. After it runs, the variable "result" is a DataFrame with two columns, "department" and "num_workers", sorted by "num_workers" in descending order.
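
As an end-to-end check without the CSV file, the sketch below rebuilds only the rows visible in the notebook output (workers 1 to 6, 8 and 9) as a hypothetical sample. The HR worker counted in the full result is not among the visible rows, so HR does not appear here. It also shows an alternative that is not part of the original solution: Series.value_counts(), which counts and sorts in descending order in one call.

```python
import pandas as pd

# Hypothetical sample: only the rows visible in the notebook output
worker = pd.DataFrame({
    "worker_id": [1, 2, 3, 4, 5, 6, 8, 9],
    "joining_date": ["2014-02-20", "2014-06-11", "2014-02-20", "2014-02-20",
                     "2014-06-11", "2014-06-11", "2014-04-11", "2015-04-10"],
    "department": ["HR", "Admin", "HR", "Admin", "Admin", "Account", "Admin", "Admin"],
})

# Same filtering as the walkthrough: joining month April (4) or later
worker["joining_date"] = pd.to_datetime(worker["joining_date"])
april_df = worker.loc[worker["joining_date"].dt.month >= 4]

# Pipeline from the walkthrough
result = (
    april_df.groupby("department")
    .size()
    .to_frame("num_workers")
    .sort_values(by="num_workers", ascending=False)
    .reset_index()
)

# Alternative (not from the original): value_counts sorts descending by default;
# rename_axis/reset_index give the same two column names as the pipeline
alt = (
    april_df["department"]
    .value_counts()
    .rename_axis("department")
    .reset_index(name="num_workers")
)

print(result)
print(alt)
```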

Conclusion
In this walkthrough, we learned how to find the number of workers by department who joined in or after April and sort the records based on the number of workers in descending order. We used the pandas library in Python to manipulate the dataset, convert dates to datetime format, filter rows based on conditions, group rows by a column, calculate counts, sort a DataFrame, and reset the index.