# **Project Name :- Student's Placement Record Analysis**
- Project Type :- Data Science
- Project Contributor :- Aditya Dhumal

## **Objective**
- Analyze which features (CGPA, internships, coding skills, communication skills, etc.) have the strongest influence on placement outcomes.
- Help students understand what really matters for getting placed.

# **Hypothesis**
- A student's academic performance, technical skills, internships, aptitude, and soft skills significantly influence their campus placement status.
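One way to get a first, rough read on this hypothesis is to correlate each numeric feature with a 0/1 encoding of the target. The sketch below is only illustrative: it uses the five rows shown later by `df.head()`, so the numbers it prints say nothing about the full dataset, and the names `sample`, `placed`, and `influence` are hypothetical helpers, not part of the original notebook.

```python
import pandas as pd

# Five rows copied from the df.head() output; far too few for real conclusions.
sample = pd.DataFrame({
    "CGPA": [6.29, 6.05, 7.22, 7.78, 7.63],
    "Internships": [0, 1, 1, 2, 1],
    "Aptitude_Test_Score": [51, 59, 58, 90, 79],
    "Backlogs": [3, 1, 2, 0, 0],
    "Placement_Status": ["Not Placed", "Not Placed", "Not Placed", "Placed", "Placed"],
})

# Encode the target as 1 (Placed) / 0 (Not Placed).
placed = (sample["Placement_Status"] == "Placed").astype(int)

# Pearson correlation against a binary variable is the point-biserial correlation.
feature_cols = ["CGPA", "Internships", "Aptitude_Test_Score", "Backlogs"]
influence = sample[feature_cols].corrwith(placed).sort_values(ascending=False)
print(influence)
```

On the real data the same two lines (`placed = ...` and `influence = ...`) could be applied to `df` directly. Correlation only captures linear association, so it is a starting point rather than a test of the hypothesis.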

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

# Data Loading

In [8]:
df = pd.read_csv("Student_placement_status_ML/train.csv")

# Data Exploration

In [10]:
df.head()

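|   | Student_ID | Age | Gender | Degree | Branch | CGPA | Internships | Projects | Coding_Skills | Communication_Skills | Aptitude_Test_Score | Soft_Skills_Rating | Certifications | Backlogs | Placement_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1048 | 22 | Female | B.Tech | ECE | 6.29 | 0 | 3 | 4 | 6 | 51 | 5 | 1 | 3 | Not Placed |
| 1 | 37820 | 20 | Female | BCA | ECE | 6.05 | 1 | 4 | 6 | 8 | 59 | 8 | 2 | 1 | Not Placed |
| 2 | 49668 | 22 | Male | MCA | ME | 7.22 | 1 | 4 | 6 | 6 | 58 | 6 | 2 | 2 | Not Placed |
| 3 | 19467 | 22 | Male | MCA | ME | 7.78 | 2 | 4 | 6 | 6 | 90 | 4 | 2 | 0 | Placed |
| 4 | 23094 | 20 | Female | B.Tech | ME | 7.63 | 1 | 4 | 6 | 5 | 79 | 6 | 2 | 0 | Placed |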


In [11]:
df.columns

Index(['Student_ID', 'Age', 'Gender', 'Degree', 'Branch', 'CGPA',
       'Internships', 'Projects', 'Coding_Skills', 'Communication_Skills',
       'Aptitude_Test_Score', 'Soft_Skills_Rating', 'Certifications',
       'Backlogs', 'Placement_Status'],
      dtype='object')

In [21]:
df.shape

(45000, 15)

In [12]:
df.describe()

|   | Student_ID | Age | CGPA | Internships | Projects | Coding_Skills | Communication_Skills | Aptitude_Test_Score | Soft_Skills_Rating | Certifications | Backlogs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 | 45000.0 |
| mean | 24977.9626 | 20.999333 | 7.00229 | 0.774089 | 3.734222 | 5.6918 | 5.501644 | 69.385356 | 5.501644 | 1.800956 | 0.888133 |
| std | 14425.605704 | 1.995071 | 0.993855 | 0.84475 | 0.923738 | 1.994674 | 1.515374 | 13.90971 | 1.238722 | 0.650104 | 0.970954 |
| min | 1.0 | 18.0 | 4.5 | 0.0 | 1.0 | 1.0 | 1.0 | 35.0 | 1.0 | 0.0 | 0.0 |
| 25% | 12509.75 | 19.0 | 6.32 | 0.0 | 3.0 | 4.0 | 4.0 | 60.0 | 5.0 | 1.0 | 0.0 |
| 50% | 24957.5 | 21.0 | 7.0 | 1.0 | 4.0 | 6.0 | 6.0 | 69.0 | 5.0 | 2.0 | 1.0 |
| 75% | 37475.25 | 23.0 | 7.67 | 1.0 | 4.0 | 7.0 | 7.0 | 79.0 | 6.0 | 2.0 | 2.0 |
| max | 50000.0 | 24.0 | 9.8 | 3.0 | 6.0 | 10.0 | 10.0 | 100.0 | 10.0 | 3.0 | 3.0 |
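By default `describe()` summarizes only the numeric columns, so the target and the other text columns need a separate look. A minimal sketch of checking the class balance of `Placement_Status` follows; it reuses the five rows from the `df.head()` output purely as stand-in data, and `sample`, `counts`, and `balance` are hypothetical names.

```python
import pandas as pd

# Stand-in data: the Placement_Status values of the five rows shown by df.head().
sample = pd.DataFrame({
    "Placement_Status": ["Not Placed", "Not Placed", "Not Placed", "Placed", "Placed"],
})

# Absolute counts and relative shares of each class.
counts = sample["Placement_Status"].value_counts()
balance = sample["Placement_Status"].value_counts(normalize=True)

print(counts)
print(balance.round(2))
```

Running the same calls on `df` would show whether the two classes are roughly balanced, which matters when choosing evaluation metrics later.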


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45000 entries, 0 to 44999
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Student_ID            45000 non-null  int64  
 1   Age                   45000 non-null  int64  
 2   Gender                45000 non-null  object 
 3   Degree                45000 non-null  object 
 4   Branch                45000 non-null  object 
 5   CGPA                  45000 non-null  float64
 6   Internships           45000 non-null  int64  
 7   Projects              45000 non-null  int64  
 8   Coding_Skills         45000 non-null  int64  
 9   Communication_Skills  45000 non-null  int64  
 10  Aptitude_Test_Score   45000 non-null  int64  
 11  Soft_Skills_Rating    45000 non-null  int64  
 12  Certifications        45000 non-null  int64  
 13  Backlogs              45000 non-null  int64  
 14  Placement_Status      45000 non-null  object 
dtypes: float64(1), int64(10), object(4)

In [17]:
df.isnull().sum()

Student_ID              0
Age                     0
Gender                  0
Degree                  0
Branch                  0
CGPA                    0
Internships             0
Projects                0
Coding_Skills           0
Communication_Skills    0
Aptitude_Test_Score     0
Soft_Skills_Rating      0
Certifications          0
Backlogs                0
Placement_Status        0
dtype: int64

- Null values per column = 0
- Numerical fields [Age, CGPA, Internships, Projects, Coding_Skills, Communication_Skills, Aptitude_Test_Score, Soft_Skills_Rating, Certifications, Backlogs]
- Categorical fields [Gender, Degree, Branch]
- Identifier field [Student_ID]
- Target field [Placement_Status]
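Instead of listing the field types by hand, they can be derived from the column dtypes. The sketch below assumes the identifier and target columns are set aside first; it runs on a three-row excerpt of the `df.head()` output, and `sample`, `features`, `numerical`, and `categorical` are hypothetical names.

```python
import pandas as pd

# Three rows and a subset of columns copied from the df.head() output.
sample = pd.DataFrame({
    "Student_ID": [1048, 37820, 49668],
    "Gender": ["Female", "Female", "Male"],
    "Degree": ["B.Tech", "BCA", "MCA"],
    "CGPA": [6.29, 6.05, 7.22],
    "Backlogs": [3, 1, 2],
    "Placement_Status": ["Not Placed", "Not Placed", "Not Placed"],
})

id_col = "Student_ID"            # identifier, not a feature
target_col = "Placement_Status"  # prediction target

# Drop the identifier and target, then split the remaining features by dtype.
features = sample.drop(columns=[id_col, target_col])
numerical = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numerical)
print("Categorical:", categorical)
```

Note that a dtype-based split treats every numeric column as numerical; a column such as `Age` could still be binned or treated as categorical if the analysis calls for it.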

### Dataset information
- Student_ID - Unique identifier for each student (45,000 students)
- Age - Student's age
- Gender - Gender of each student (Male/Female)
- Degree - Type of degree the student is pursuing or has completed (e.g., B.E/B.Tech)
- Branch - Field of specialization
- CGPA - Average CGPA of each student throughout their academic journey
- Internships - Number of internships the student has completed as a fresher
- Projects - Number of projects the student has built during their career journey
- Coding_Skills - Coding skills on a scale of 1-10 (1 = 'Poor', 10 = 'Skilled')
- Communication_Skills - Communication skills of each student on a scale of 1-10 (1 = 'Poor', 10 = 'Excellent')
- Aptitude_Test_Score - Aptitude test score of each student (out of 100)
- Soft_Skills_Rating - Soft skills rating of each student (1-10)
- Certifications - Number of certifications each student has completed
- Backlogs - Number of backlogs each student has
- Placement_Status - Target column indicating whether the student was placed ('Placed' or 'Not Placed')
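The documented scales can be turned into a quick sanity check, and the target can be encoded as 0/1 for later modelling. This is a minimal sketch on the five rows from the `df.head()` output; the dictionary `expected_ranges`, the new `Placed` column, and the lower bound of 0 for the aptitude score are assumptions made for illustration.

```python
import pandas as pd

# Five rows and a subset of columns copied from the df.head() output.
sample = pd.DataFrame({
    "Coding_Skills": [4, 6, 6, 6, 6],
    "Communication_Skills": [6, 8, 6, 6, 5],
    "Aptitude_Test_Score": [51, 59, 58, 90, 79],
    "Soft_Skills_Rating": [5, 8, 6, 4, 6],
    "Placement_Status": ["Not Placed", "Not Placed", "Not Placed", "Placed", "Placed"],
})

# Ranges taken from the field descriptions (the 0 lower bound is an assumption).
expected_ranges = {
    "Coding_Skills": (1, 10),
    "Communication_Skills": (1, 10),
    "Soft_Skills_Rating": (1, 10),
    "Aptitude_Test_Score": (0, 100),
}

# Count values that fall outside the documented range (bounds are inclusive).
out_of_range = {
    col: int((~sample[col].between(low, high)).sum())
    for col, (low, high) in expected_ranges.items()
}
print(out_of_range)

# Encode the target: 1 = Placed, 0 = Not Placed.
sample["Placed"] = sample["Placement_Status"].map({"Not Placed": 0, "Placed": 1})
print(sample[["Placement_Status", "Placed"]])
```

Any non-zero count in `out_of_range` on the full `df` would point to rows worth inspecting before modelling.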