<h1 style="font-size:40px; background-color:green;color:white;font-weight:800;padding:12px;border-radius:8px;text-align:center">Predicting Hiring Decisions Using Regression Analysis </h1>

------------------------------------------------------------------------------------------
<div style="background-color:green; color:white; padding:12px;border-radius:8px; font-size:20px; font-weight:bold">
<h3 style="font-size:20px; font-weight:bold"> About Dataset</h3>

**Introduction**

This dataset provides insights into factors influencing hiring decisions. Each record represents a candidate with various attributes considered during the hiring process. The goal is to predict whether a candidate will be hired based on these attributes.

</div>

------------------------------------------------------------------------------------------

<div style="background-color:green; color:white; padding:12px;border-radius:8px; font-size:20px; font-weight:bold">
Variables Description

------------------------------------------------------------------------------------------

**Age**

- **Description:** Age of the candidate.
- **Data Range:** 20 to 50 years.
- **Data Type:** Integer.

------------------------------------------------------------------------------------------

**Gender**
- **Description:** Gender of the candidate.
- **Categories:** Male (0) or Female (1).
- **Data Type:** Binary.

------------------------------------------------------------------------------------------

**Education Level**
- **Description:** Highest level of education attained by the candidate.
- **Categories:**
    - Bachelor's (Type 1)
    - Bachelor's (Type 2)
    - Master's
    - PhD
- **Data Type:** Categorical.

------------------------------------------------------------------------------------------

**Experience Years**
- **Description:** Number of years of professional experience.
- **Data Range:** 0 to 15 years.
- **Data Type:** Integer.

------------------------------------------------------------------------------------------

**Previous Companies Worked**
- **Description:** Number of previous companies where the candidate has worked.
- **Data Range:** 1 to 5 companies.
- **Data Type:** Integer.

------------------------------------------------------------------------------------------

**Distance From Company**
- **Description:** Distance in kilometers from the candidate's residence to the hiring company.
- **Data Range:** 1 to 50 kilometers.
- **Data Type:** Float (continuous).

------------------------------------------------------------------------------------------

**Interview Score**
- **Description:** Score achieved by the candidate in the interview process.
- **Data Range:** 0 to 100.
- **Data Type:** Integer.

------------------------------------------------------------------------------------------

**Skill Score**
- **Description:** Assessment score of the candidate's technical skills.
- **Data Range:** 0 to 100.
- **Data Type:** Integer.

------------------------------------------------------------------------------------------

**Personality Score**
- **Description:** Evaluation score of the candidate's personality traits.
- **Data Range:** 0 to 100.
- **Data Type:** Integer.

------------------------------------------------------------------------------------------

**Recruitment Strategy**
- **Description:** Strategy adopted by the hiring team for recruitment.
- **Categories:**
    - Aggressive
    - Moderate
    - Conservative
- **Data Type:** Categorical.

------------------------------------------------------------------------------------------

**Hiring Decision (Target Variable)**
- **Description:** Outcome of the hiring decision.
- **Categories:**
    - 0: Not hired
    - 1: Hired
- **Data Type:** Binary (Integer).

------------------------------------------------------------------------------------------

**Dataset Information**
- **Records:** 1500
- **Features:** 10
- **Target Variable:** HiringDecision (Binary)

------------------------------------------------------------------------------------------

**Conclusion**
This dataset offers a comprehensive view of candidate attributes and recruitment factors crucial for predicting hiring decisions. It serves as a valuable resource for exploring machine learning models and strategies aimed at optimizing recruitment processes in various organizational contexts.

------------------------------------------------------------------------------------------

**Dataset Usage and Attribution Notice**
This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

</div>
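
The documented ranges above can be turned into a quick sanity check once the data is loaded. The following is a minimal sketch, not part of the original notebook: the helper name `find_range_violations` and the dictionary `EXPECTED_RANGES` are hypothetical, and the column names are assumed to match the CSV header shown later in this notebook. It only reports values outside the documented bounds; it does not modify the data.

```python
import pandas as pd

# Inclusive bounds copied from the variable descriptions above.
# Column names are assumed to match the CSV header of recruitment_data.csv.
EXPECTED_RANGES = {
    "Age": (20, 50),
    "ExperienceYears": (0, 15),
    "PreviousCompanies": (1, 5),
    "DistanceFromCompany": (1, 50),
    "InterviewScore": (0, 100),
    "SkillScore": (0, 100),
    "PersonalityScore": (0, 100),
}


def find_range_violations(frame, expected=EXPECTED_RANGES):
    """Return {column: number of values outside the documented range}.

    Missing values (NaN) are counted as violations because
    Series.between() evaluates to False for them.
    """
    violations = {}
    for column, (low, high) in expected.items():
        if column not in frame.columns:
            continue  # skip columns that are absent from this frame
        outside = ~frame[column].between(low, high)
        if outside.any():
            violations[column] = int(outside.sum())
    return violations


# Toy frame for illustration only; these are not rows from the dataset.
toy = pd.DataFrame({"Age": [26, 55], "InterviewScore": [48, 101]})
toy_violations = find_range_violations(toy)
```

On the real data, `find_range_violations(df)` returning an empty dictionary would indicate that every numeric column stays within the range stated in the description.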

----------------------------

<a id="contents_tabel"></a>

## Contents
<br>

----------------------------

<span style="font-size: 1.2em;line-height:1.3em">
    

- **[1 : Introduction](#1)**
- **[2 : Purpose](#2)**
- **[3 : Import Libraries](#3)** 
- **[4 : Dataset Preparation](#4)** 
- **[5 : Exploratory Data Analysis (EDA)](#5)** 
    - **[5.1 : ](#5.1)** 
    - **[5.2 : ](#5.2)** 
    - **[5.3 : ](#5.3)** 
- **[6 : Model Training and Evaluation](#6)** 
- **[7 : ](#7)** 
    - **[7.1 : ](#7.1)** 
    - **[7.2 : ](#7.2)** 
    - **[7.3 : ](#7.3)** 
    - **[7.4 : ](#7.4)** 
    - **[7.5 : ](#7.5)** 
    - **[7.6 : ](#7.6)** 
    - **[7.7 : ](#7.7)** 
    - **[7.8 : ](#7.8)** 
    - **[7.9 : ](#7.9)** 
    - **[7.10 : ](#7.10)** 
    - **[7.11 : ](#7.11)** 
    - **[7.12 : ](#7.12)** 
    - **[7.13 : ](#7.13)** 
- **[8 : ](#8)** 
    - **[8.1 : T](#8.1)** 
- **[9 : ](#9)** 
    - **[9.1 : ](#9.1)** 
    - **[9.2 : ](#9.2)** 
    - **[9.3 : ](#9.3)** 
- **[10 : ](#10)** 
    - **[10.1 : ](#10.1)** 
    - **[10.2 : ](#10.2)** 
    - **[10.3 : ](#10.3)** 
    - **[10.4 : ](#10.4)** 
    - **[10.5 : ](#10.5)** 
- **[11 : ](#11)** 
    - **[11.1 : ](#11.1)** 
    - **[11.2 : ](#11.2)**   
    - **[11.3 : ](#11.3)** 
- **[12 : Conclusion](#12)** 


----------------------------

<a id="1"></a>
# <p style="background-color:green; color:white; font-family:calibri; font-size:130%; color:white; text-align:left; border-radius:8px; padding:10px">1 : Introduction</p>

This dataset provides insights into factors influencing hiring decisions. Each record represents a candidate with various attributes considered during the hiring process. The goal is to predict whether a candidate will be hired based on these attributes.

⬆️ [Go to Contents](#contents_tabel)

----------------------------

<a id="2"></a>
# <p style="background-color:green; color:white; font-family:calibri; font-size:130%; color:white; text-align:left; border-radius:8px; padding:10px">2 : Purpose</p>

The purpose of this project is to explore and understand the factors that contribute to hiring decisions in recruitment data, and to apply machine learning models to predict hiring outcomes.

⬆️ [Go to Contents](#contents_tabel)

----------------------------

<a id="3"></a>
# <p style="background-color:green; color:white; font-family:calibri; font-size:130%; color:white; text-align:left; border-radius:8px; padding:10px">3 : Import Libraries</p>

⬆️ [Go to Contents](#contents_tabel)

----------------------------

In [7]:
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from scipy.stats import shapiro
#import optuna
#from optuna.samplers import TPESampler
import warnings
warnings.filterwarnings('ignore')

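# Custom settings for visualizations (left disabled).
# Note: matplotlib 3.6 renamed the bundled seaborn styles, so on recent
# versions the style below is called 'seaborn-v0_8-darkgrid'.
#plt.style.use('seaborn-darkgrid')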


----------------------------
<a id="4"></a>
# <p style="background-color:green; color:white; font-family:calibri; font-size:130%; color:white; text-align:left; border-radius:8px; padding:10px">4 : Dataset Preparation</p>

⬆️ [Go to Contents](#contents_tabel)

----------------------------

In [10]:
# Section: Data Loading
file_path = 'recruitment_data.csv'
df = pd.read_csv(file_path)
print("Data loaded successfully. Here's a preview:")
df.head()

Data loaded successfully. Here's a preview:


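| | Age | Gender | EducationLevel | ExperienceYears | PreviousCompanies | DistanceFromCompany | InterviewScore | SkillScore | PersonalityScore | RecruitmentStrategy | HiringDecision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | 1 | 2 | 0 | 3 | 26.783828 | 48 | 78 | 91 | 1 | 1 |
| 1 | 39 | 1 | 4 | 12 | 3 | 25.862694 | 35 | 68 | 80 | 2 | 1 |
| 2 | 48 | 0 | 2 | 3 | 2 | 9.920805 | 20 | 67 | 13 | 2 | 0 |
| 3 | 34 | 1 | 2 | 5 | 2 | 6.407751 | 36 | 27 | 70 | 3 | 0 |
| 4 | 30 | 0 | 1 | 6 | 1 | 43.105343 | 23 | 52 | 85 | 2 | 0 |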
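
The preview shows that the categorical columns are stored as integer codes. For inspection or plotting it can help to attach readable labels. The sketch below is an illustration under stated assumptions: it covers only `Gender` and `HiringDecision`, whose codes are spelled out in the variable description, and the helper name `add_readable_labels` and the new `...Label` columns are hypothetical. The codes for `EducationLevel` and `RecruitmentStrategy` are not given explicitly in the description, so they are left untouched.

```python
import pandas as pd

# Code-to-label mappings taken from the variable description.
GENDER_LABELS = {0: "Male", 1: "Female"}
HIRING_LABELS = {0: "Not hired", 1: "Hired"}


def add_readable_labels(frame):
    """Return a copy of the frame with two extra, human-readable columns."""
    labeled = frame.copy()
    labeled["GenderLabel"] = labeled["Gender"].map(GENDER_LABELS)
    labeled["HiringDecisionLabel"] = labeled["HiringDecision"].map(HIRING_LABELS)
    return labeled


# Gender and HiringDecision values of the first three preview rows.
preview = pd.DataFrame({"Gender": [1, 1, 0], "HiringDecision": [1, 1, 0]})
labeled_preview = add_readable_labels(preview)
```

The original integer columns are kept as they are, so the labeled copy can be used for charts while the numeric codes remain available for modeling.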


In [11]:
df.isnull().sum()

Age                    0
Gender                 0
EducationLevel         0
ExperienceYears        0
PreviousCompanies      0
DistanceFromCompany    0
InterviewScore         0
SkillScore             0
PersonalityScore       0
RecruitmentStrategy    0
HiringDecision         0
dtype: int64

In [12]:
df.duplicated().sum()

0
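
The two checks above (missing values and duplicate rows) can be gathered into a single report. This is a minimal sketch, assuming a hypothetical helper named `summarize_quality`; it uses only the same pandas calls as the cells above and is demonstrated on a small made-up frame rather than on the recruitment data.

```python
import pandas as pd


def summarize_quality(frame):
    """Collect missing-value and duplicate-row counts in one dictionary."""
    missing = frame.isnull().sum()
    return {
        "missing_per_column": missing.to_dict(),
        "total_missing": int(missing.sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


# Made-up frame: one missing Age and one fully duplicated row.
demo = pd.DataFrame({"Age": [26, 39, 39, None], "Gender": [1, 1, 1, 0]})
quality_report = summarize_quality(demo)
```

For the recruitment data, the outputs above imply that `summarize_quality(df)` would report zero missing values and zero duplicate rows.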

In [13]:
df.shape

(1500, 11)

<div style="background-color:#0cbfae; color:white; padding:12px;border-radius:8px">
    <h2>Shape of the Dataset:</h2>
    <ul>
    <li>The <code>df.shape</code> attribute in pandas returns a tuple with the dimensions of a DataFrame: the first element is the number of rows and the second is the number of columns.</li>
    <li>Here <code>df.shape</code> returns (1500, 11):
        <ul>
        <li>1500 is the number of rows; each row represents one candidate.</li>
        <li>11 is the number of columns: the 10 features plus the target variable, HiringDecision.</li>
        </ul>
    </li>
    </ul>
 </div>
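
The relationship between the shape tuple and the feature count listed in the dataset information can be made explicit in code. The following is a small sketch with a hypothetical helper, `describe_shape`, which assumes the target column is named `HiringDecision` as in the CSV header.

```python
import pandas as pd


def describe_shape(frame, target="HiringDecision"):
    """Return (rows, columns, features), where features excludes the target."""
    n_rows, n_cols = frame.shape
    n_features = n_cols - (1 if target in frame.columns else 0)
    return n_rows, n_cols, n_features


# Toy frame with two features and the target column, for illustration only.
toy_frame = pd.DataFrame(
    {"Age": [26, 39, 48], "SkillScore": [78, 68, 67], "HiringDecision": [1, 1, 0]}
)
toy_shape = describe_shape(toy_frame)
```

Applied to a frame of shape (1500, 11) that contains `HiringDecision`, the helper would return 10 as the feature count, which matches the dataset information.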