# Identify the best source of recruitment for a tech startup, based on previous data of candidate sources and recruitment strategies

### Importing required libraries

In [108]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Reading your data into a Dataframe in Python

In [109]:
df = pd.read_csv('Recruitment_Data.csv')
df.head(10)

Unnamed: 0,attrition,performance_rating,sales_quota_pct,recruiting_source
0,1,3,1.08819,Applied Online
1,0,3,2.394173,
2,1,2,0.49753,Campus
3,0,2,2.513958,
4,0,3,1.424789,Applied Online
5,1,3,0.548123,Referral
6,1,3,0.794213,Applied Online
7,0,2,1.006524,Referral
8,0,3,1.519917,Campus
9,0,3,2.073528,


### Observing Data in the Dataframe

In [110]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 446 entries, 0 to 445
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   attrition           446 non-null    int64  
 1   performance_rating  446 non-null    int64  
 2   sales_quota_pct     446 non-null    float64
 3   recruiting_source   241 non-null    object 
dtypes: float64(1), int64(2), object(1)
memory usage: 14.1+ KB
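The `df.info()` output shows only 241 non-null `recruiting_source` values out of 446 rows, so 205 employees have no recorded source and would be silently dropped by a plain `groupby`. The sketch below shows one way to make that gap visible. It retypes the ten rows printed by `df.head(10)` so that it runs without the CSV file; the `"Unknown"` label is an assumption introduced here, not a category from the original data.

```python
import numpy as np
import pandas as pd

# The ten rows shown by df.head(10), retyped so this sketch is self-contained
sample = pd.DataFrame({
    "attrition": [1, 0, 1, 0, 0, 1, 1, 0, 0, 0],
    "performance_rating": [3, 3, 2, 2, 3, 3, 3, 2, 3, 3],
    "sales_quota_pct": [1.08819, 2.394173, 0.49753, 2.513958, 1.424789,
                        0.548123, 0.794213, 1.006524, 1.519917, 2.073528],
    "recruiting_source": ["Applied Online", np.nan, "Campus", np.nan,
                          "Applied Online", "Referral", "Applied Online",
                          "Referral", "Campus", np.nan],
})

# Count rows with no recorded recruiting source
missing_count = sample["recruiting_source"].isna().sum()

# Frequency of each source, keeping the missing values as their own entry
source_counts = sample["recruiting_source"].value_counts(dropna=False)

# Hypothetical label for missing sources so they form their own group
labelled = sample.assign(
    recruiting_source=sample["recruiting_source"].fillna("Unknown")
)
attrition_by_source = labelled.groupby("recruiting_source")["attrition"].mean()

print(missing_count)
print(source_counts)
print(attrition_by_source)
```

Running the same steps on the full `df` would show whether employees with an unrecorded source behave differently from the four named sources.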


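Attrition is the voluntary departure of employees from a company for personal, professional, or any other reasons. The analysis below treats the mean of the `attrition` column as the share of employees who left.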

### Identifying Groups in the Dataframe

##### Grouping variable 'recruiting_source' and 'attrition'

In [111]:
# Group by 'recruiting_source' and calculate the mean of 'attrition', then reset the index
df_avg_attrition = df.groupby('recruiting_source')['attrition'].mean().reset_index()

# Displaying the dataframe 'df_avg_attrition'
df_avg_attrition

Unnamed: 0,recruiting_source,attrition
0,Applied Online,0.246154
1,Campus,0.285714
2,Referral,0.333333
3,Search Firm,0.5


The attrition rate for employees recruited through 'Applied Online' is about 24.62%, the lowest among the sources listed. This suggests that employees sourced from online applications are less likely to leave the company than those from other sources.

'Campus' recruits have an attrition rate of approximately 28.57%, indicating that nearly 29% of employees sourced from campus recruitment leave the company. This rate is higher than that of 'Applied Online' recruits but still below one-third.

The 'Referral' source has an attrition rate of approximately 33.33%, which suggests that one-third of the employees sourced through referrals end up leaving the company. This is higher than both 'Applied Online' and 'Campus' sources.

Employees hired through a 'Search Firm' have the highest attrition rate at 50%. Half of the employees sourced from search firms leave the company, which is notably higher than for the other sources.
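A group mean says little without the group size: a 50% rate from a handful of hires is much weaker evidence than the same rate from a hundred. A minimal sketch of reporting both together, using only the rows with a recorded source from `df.head(10)` so that it runs on its own; the column names `attrition_rate` and `n_employees` are chosen here for illustration.

```python
import pandas as pd

# Rows from df.head(10) that have a recruiting source, retyped for this sketch
sample = pd.DataFrame({
    "attrition": [1, 1, 0, 1, 1, 0, 0],
    "recruiting_source": ["Applied Online", "Campus", "Applied Online",
                          "Referral", "Applied Online", "Referral", "Campus"],
})

# Named aggregation: attrition rate and head count per source in one table
attrition_summary = (
    sample.groupby("recruiting_source")["attrition"]
    .agg(attrition_rate="mean", n_employees="size")
    .reset_index()
)

print(attrition_summary)
```

Applied to the full `df`, the `n_employees` column would show how many hires stand behind each rate in the table above.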

##### Grouping variable 'recruiting_source' and 'sales_quota_pct'

In [113]:
# Group by 'recruiting_source' and calculate the mean of 'sales_quota_pct', then reset the index
df_avg_sales_quota_pct = df.groupby('recruiting_source')['sales_quota_pct'].mean().reset_index()

# Displaying the dataframe 'df_avg_sales_quota_pct'
df_avg_sales_quota_pct

Unnamed: 0,recruiting_source,sales_quota_pct
0,Applied Online,1.05859
1,Campus,0.908035
2,Referral,1.023198
3,Search Firm,0.88696


Employees recruited through 'Applied Online' have the highest average sales quota achievement at approximately 1.059, which suggests that this group exceeds their sales targets on average, assuming that a sales quota percentage of 1.000 indicates 100% achievement.

'Campus' recruits average a sales quota percentage of about 0.908, roughly 9.2% below target. This could suggest that recent graduates or those sourced from campus recruitment events are, on average, falling somewhat short of the set sales targets.

Those referred by employees or other associates ('Referral') also exceed their sales targets on average, with a sales quota percentage around 1.023.

The 'Search Firm' sourced employees have the lowest average sales quota achievement, at roughly 0.887. This suggests that employees from this source, on average, do not meet the sales targets, with performance around 11.3% below the expected quota.
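The percentages quoted above follow from subtracting the target of 1.0 from each group mean. A short sketch of that arithmetic, using the four means from the output table and the same assumption that 1.0 corresponds to 100% of quota:

```python
import pandas as pd

# Group means copied from the df_avg_sales_quota_pct output above
avg_quota = pd.Series({
    "Applied Online": 1.05859,
    "Campus": 0.908035,
    "Referral": 1.023198,
    "Search Firm": 0.88696,
})

# Percentage above (+) or below (-) target, assuming 1.0 means 100% of quota
pct_vs_target = ((avg_quota - 1.0) * 100).round(1)

print(pct_vs_target)
```

Positive values mark sources whose hires exceed quota on average; negative values mark those that fall short.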

##### Grouping variable 'recruiting_source' and 'performance_rating'

In [115]:
# Group by 'recruiting_source' and calculate the mean of 'performance_rating', then reset the index
df_avg_performance_rating = df.groupby('recruiting_source')['performance_rating'].mean().reset_index()

# Displaying the dataframe 'df_avg_performance_rating'
df_avg_performance_rating

Unnamed: 0,recruiting_source,performance_rating
0,Applied Online,2.930769
1,Campus,2.928571
2,Referral,2.844444
3,Search Firm,2.7


Employees recruited through 'Applied Online' have the highest average performance rating at approximately 2.93, close to a rating of 3. The dataset does not state the maximum of the rating scale (it could be 5, for example), so whether a rating near 3 means satisfactory or better remains an assumption.

'Campus' recruits have a marginally lower average performance rating, which also rounds to 2.93 (2.9286 versus 2.9308 for 'Applied Online'), indicating an essentially identical performance level.

The 'Referral' group has an average performance rating of about 2.84, which is slightly lower than the 'Applied Online' and 'Campus' recruits. Employees brought in through referrals perform a bit below those two sources but still above the 'Search Firm' average of 2.70.

'Search Firm' recruits have the lowest average performance rating at 2.70. While not drastically below the others, it is the lowest average score among the four sources listed, which could indicate a need to assess the quality of candidates sourced from search firms or possibly the support and development provided to these employees.
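To compare the sources across all three metrics at once, the group means reported above can be placed in one table and ranked. The sketch below is one possible way to do that, not a definitive scoring method: it assumes the three metrics matter equally, and the `mean_rank` column is introduced here for illustration.

```python
import pandas as pd

# Group means copied from the three output tables above
summary = pd.DataFrame({
    "recruiting_source": ["Applied Online", "Campus", "Referral", "Search Firm"],
    "attrition": [0.246154, 0.285714, 0.333333, 0.5],
    "sales_quota_pct": [1.05859, 0.908035, 1.023198, 0.88696],
    "performance_rating": [2.930769, 2.928571, 2.844444, 2.7],
}).set_index("recruiting_source")

# Rank 1 is best: lower attrition is better, higher quota and rating are better
ranks = pd.DataFrame({
    "attrition": summary["attrition"].rank(ascending=True),
    "sales_quota_pct": summary["sales_quota_pct"].rank(ascending=False),
    "performance_rating": summary["performance_rating"].rank(ascending=False),
})

# Equal-weight average of the three ranks (an assumption of this sketch)
ranks["mean_rank"] = ranks.mean(axis=1)
ranks = ranks.sort_values("mean_rank")

print(ranks)
```

With these means, 'Applied Online' ranks first on every metric and 'Search Firm' last, while 'Campus' and 'Referral' swap places depending on the metric.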