## Capstone 1 - Part 2

In the first part of the capstone, we focused on Data Retrieval, Data Preprocessing, Feature Engineering, and Exploratory Data Analysis using Python and its libraries. Now we shift gears and use SQL to gain insights into our HR Analytics dataset.

## TODO: Make use of SQL to do the following:

### Create a SQLite3 DB using the CSV file (2 pts). Please refer to this [link](https://www.linkedin.com/pulse/accessing-sqlite3-database-from-jupyter-notebook-using-varun-lobo/) and this [link](https://www.geeksforgeeks.org/how-to-import-csv-file-in-sqlite-database-using-python/) to learn more.
### Calculate the Attrition Rate and summarize attrition (3 pts) by:
1. Gender
2. Department
3. Age
4. Average monthly income by job level
5. Years at company

### Continue using SQL to explore the main reasons for attrition (3 pts). For example:
1. Why do more people over 50 years old leave the company than people aged 40-50?
2. Why do people with higher pay still leave the company?
3. Which factors drive employees who have worked at the company for less than 5 years to leave?

### Effective Communication (2 pts)
1. Use markdown cells to communicate your thought process: why you chose to perform each step, what you observed from each query, and so on.
2. Comment the code so that it is readable for the reviewer.

### Grading and Important Instructions
1. Each of the above steps is mandatory and should be completed in good faith.
2. Before submitting, make sure the code is in fully working condition.
3. It is fine to use ChatGPT, Stack Overflow, and similar resources; just provide reference links to the sources you used.
4. Debugging is an art. If you find yourself stuck on an error, use Stack Overflow and ChatGPT to resolve it, and if it is still unresolved, reach out to me for help.
5. You need to score at least 7/10 to pass the project. Anything less will be marked Required and will need resubmission.
6. Feedback will be provided on 3 levels (Awesome, Suggestion, and Required). Required changes are mandatory.
7. For submission, please upload the project to GitHub and share the link to the file with us through the LMS.

In [1]:
import sqlite3
import pandas as pd

# Load the HR attrition CSV into a DataFrame
df = pd.read_csv('./dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv')

In [2]:
# Open the SQLite database file (it is created if it does not exist yet)
conn = sqlite3.connect('mytable.db')

In [3]:
# Write the DataFrame to an "employee" table, replacing the table if it already exists
df.to_sql(name="employee", con=conn, if_exists='replace', index=False)
conn.commit()
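
For reference, the same kind of load can be done without pandas. The sketch below is a minimal, self-contained illustration only: it assumes a tiny in-memory CSV with three columns and three rows taken from this dataset, and an in-memory database. The names `sample_csv`, `demo`, and `employee_demo` are hypothetical and are not part of the notebook.

```python
import csv
import io
import sqlite3

# Hypothetical three-row extract; the notebook itself reads the full CSV file.
sample_csv = io.StringIO(
    "Age,Attrition,Department\n"
    "41,Yes,Sales\n"
    "49,No,Research & Development\n"
    "37,Yes,Research & Development\n"
)

demo = sqlite3.connect(":memory:")  # in-memory DB, nothing is written to disk
demo.execute(
    "CREATE TABLE employee_demo (Age INTEGER, Attrition TEXT, Department TEXT)"
)

# DictReader uses the header line as keys, so columns are picked by name.
reader = csv.DictReader(sample_csv)
rows = [(int(r["Age"]), r["Attrition"], r["Department"]) for r in reader]
demo.executemany("INSERT INTO employee_demo VALUES (?, ?, ?)", rows)
demo.commit()

# Sanity check: the row count should match the number of data lines in the CSV.
row_count = demo.execute("SELECT COUNT(*) FROM employee_demo").fetchone()[0]
print(row_count)
```

A row-count check like the last query is a cheap way to confirm that a load completed before running any analysis on the table.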

In [4]:
%%capture
%load_ext sql
%sql sqlite:///mytable.db

In [5]:
%%sql

select *
from employee
LIMIT 5

 * sqlite:///mytable.db
Done.


Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,EnvironmentSatisfaction,Gender,HourlyRate,JobInvolvement,JobLevel,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,Over18,OverTime,PercentSalaryHike,PerformanceRating,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,2,Female,94,3,2,Sales Executive,4,Single,5993,19479,8,Y,Yes,11,3,1,80,0,8,0,1,6,4,0,5
49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,3,Male,61,2,2,Research Scientist,2,Married,5130,24907,1,Y,No,23,4,4,80,1,10,3,3,10,7,1,7
37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,4,Male,92,2,1,Laboratory Technician,3,Single,2090,2396,6,Y,Yes,15,3,2,80,0,7,3,3,0,0,0,0
33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,4,Female,56,3,1,Research Scientist,3,Married,2909,23159,1,Y,Yes,11,3,3,80,0,8,3,3,8,7,3,0
27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,1,Male,40,3,1,Laboratory Technician,2,Married,3468,16632,9,Y,No,12,3,4,80,1,6,3,3,2,2,2,2


In [6]:
%%sql

select distinct EducationField
from employee

 * sqlite:///mytable.db
Done.


EducationField
Life Sciences
Other
Medical
Marketing
Technical Degree
Human Resources


In [24]:
%%sql

select CASE
        WHEN attrition = 'Yes' THEN 'True'
        WHEN attrition = 'No' THEN 'False'
        END as Attrition, 
        Round(cast(count(attrition) as float)* 100 /(select count(attrition)from employee),1) as attrition_rate
from employee
group by attrition

 * sqlite:///mytable.db
Done.


Attrition,attrition_rate
False,83.9
True,16.1


In [29]:
%%sql

select CASE
        WHEN attrition = 'Yes' THEN 'True'
        WHEN attrition = 'No' THEN 'False'
        END as Attrition,
        Gender,
        Count(attrition),
        Round(cast(count(attrition) as float)* 100/(select count(attrition)from employee) ,1) as attrition_rate
from employee
group by attrition, gender
order by attrition desc, gender

 * sqlite:///mytable.db
Done.


Attrition,Gender,Count(attrition),attrition_rate
True,Female,87,5.9
True,Male,150,10.2
False,Female,501,34.1
False,Male,732,49.8


In [32]:
%%sql

select Department,Attrition, count(attrition)
from employee
group by attrition, department
order by department, attrition

 * sqlite:///mytable.db
Done.


Department,Attrition,count(attrition)
Human Resources,No,51
Human Resources,Yes,12
Research & Development,No,828
Research & Development,Yes,133
Sales,No,354
Sales,Yes,92


In [39]:
%%sql

select 
    CASE
        WHEN attrition = 'Yes' THEN 'True'
        WHEN attrition = 'No' THEN 'False'
        END as Attrition,
        CASE
        WHEN Age < 30 THEN 'Under 30'
        WHEN Age >= 30 AND Age < 40 THEN '30 - 40'
        WHEN Age >= 40 AND Age < 50 THEN '40 - 50'
        WHEN Age >= 50 THEN '50 and over'
        END as Age_Group,
        Count(attrition),
        Round(cast(count(attrition) as float)* 100/(select count(attrition)from employee) ,1) as attrition_rate
from employee
group by age_group, attrition
order by attrition, attrition_rate desc

 * sqlite:///mytable.db
Done.


Attrition,Age_Group,Count(attrition),attrition_rate
False,30 - 40,533,36.3
False,40 - 50,315,21.4
False,Under 30,235,16.0
False,50 and over,150,10.2
True,Under 30,91,6.2
True,30 - 40,89,6.1
True,40 - 50,34,2.3
True,50 and over,23,1.6
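
A `CASE` expression returns NULL when no `WHEN` branch matches, and a NULL label appears as a blank group in grouped output. The sketch below uses hypothetical ages in an in-memory table (`people`, `demo`, and `buckets` are illustrative names). It relies on `CASE` evaluating branches from top to bottom, so each branch needs only an upper bound, and it adds an `ELSE` branch so that no row falls through.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (Age INTEGER)")
# Hypothetical ages chosen to sit on or near every bucket boundary.
demo.executemany("INSERT INTO people VALUES (?)", [(29,), (30,), (49,), (50,), (58,)])

bucket_sql = """
SELECT CASE
           WHEN Age < 30 THEN 'Under 30'
           WHEN Age < 40 THEN '30 - 39'   -- only reached when Age >= 30
           WHEN Age < 50 THEN '40 - 49'   -- only reached when Age >= 40
           ELSE '50 and over'             -- catches every remaining row (NULL ages too)
       END AS Age_Group,
       COUNT(*) AS n
FROM people
GROUP BY Age_Group
ORDER BY MIN(Age)
"""
buckets = demo.execute(bucket_sql).fetchall()
print(buckets)
```

Because `ELSE` also catches NULL ages, a separate `WHEN Age IS NULL` branch placed first may be preferable if missing values need their own label.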


In [42]:
%%sql

select
    Department,
    JobLevel,
    -- average for everyone in the group
    AVG(HourlyRate * StandardHours) as Avg_Income,
    -- average for leavers only: CASE yields NULL for stayers, and AVG ignores NULLs
    AVG(CASE WHEN Attrition = 'Yes' THEN HourlyRate * StandardHours END) as Attrition_Average_Income,
    -- aliases cannot be reused within the same select list, so repeat the expressions
    AVG(HourlyRate * StandardHours)
        - AVG(CASE WHEN Attrition = 'Yes' THEN HourlyRate * StandardHours END) as Difference
from employee
group by department, joblevel
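
SQLite has no `IF ... THEN` expression, and an alias defined in a select list cannot be referenced by another expression in that same select list. One common alternative is conditional aggregation, where a `CASE` expression sits inside `AVG`. The sketch below is a minimal illustration on the five sample rows shown earlier, loaded into an in-memory table. It uses the `MonthlyIncome` column, since the task statement asks for average monthly income by job level; whether that or `HourlyRate * StandardHours` is the better income measure is left as an open choice. The names `demo`, `sample_rows`, and `income_by_level` are illustrative.

```python
import sqlite3

# (JobLevel, Attrition, MonthlyIncome) copied from the five sample rows shown earlier.
sample_rows = [
    (2, "Yes", 5993),
    (2, "No", 5130),
    (1, "Yes", 2090),
    (1, "No", 2909),
    (1, "No", 3468),
]

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE employee (JobLevel INTEGER, Attrition TEXT, MonthlyIncome INTEGER)"
)
demo.executemany("INSERT INTO employee VALUES (?, ?, ?)", sample_rows)

income_sql = """
SELECT JobLevel,
       AVG(MonthlyIncome) AS avg_income,
       -- CASE yields NULL for stayers, and AVG skips NULLs
       AVG(CASE WHEN Attrition = 'Yes' THEN MonthlyIncome END) AS leaver_avg_income
FROM employee
GROUP BY JobLevel
ORDER BY JobLevel
"""
income_by_level = demo.execute(income_sql).fetchall()
for level, avg_income, leaver_avg in income_by_level:
    # The difference is computed in Python here; in SQL it would require
    # repeating both AVG expressions or wrapping the query in a subquery.
    print(level, round(avg_income, 1), leaver_avg, round(avg_income - leaver_avg, 1))
```

With only five rows the numbers carry no analytical weight; the point is the query shape, which can be pointed at the full `employee` table unchanged.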
