## Placement Data Analysis using Random Forest Classifier

## 1. Introduction

In this project, we analyze a placement dataset and use a Random Forest Classifier to predict the number of internships a student has completed. The goal is to understand the dataset, preprocess it, build a predictive model, evaluate its performance, and derive meaningful insights.

## 2. Libraries Used

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

## 3. Dataset Overview  

The dataset consists of various attributes related to students' academic and extracurricular backgrounds. The target variable is **Internships**, which we aim to predict using other features.  

### 3.1 Columns in the Dataset  

- **ExtracurricularActivities** - Whether the student participated in extracurricular activities (Yes/No)  
- **PlacementTraining** - Whether the student attended placement training (Yes/No)  
- **PlacementStatus** - Final placement status (Placed/Not Placed)  
- **Internships** - Number of internships done (Target Variable)  

### 3.2 Relationships Between Features  

- Placement training and extracurricular activities may have a direct impact on internship opportunities.  
- Students who secured final placements might have completed internships during their academic years.  
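
These relationships are stated as hypotheses, and one way to check them before modeling is to summarize the target by group. The following is a minimal sketch on a hypothetical six-row frame (`sample` is invented for illustration; only the column names come from the dataset). The same calls could be applied to `data` once it is loaded.

```python
import pandas as pd

# Hypothetical miniature frame that reuses the dataset's column names.
sample = pd.DataFrame({
    "PlacementTraining": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "PlacementStatus": ["Placed", "NotPlaced", "Placed",
                        "NotPlaced", "NotPlaced", "Placed"],
    "Internships": [2, 0, 1, 1, 1, 0],
})

# Mean internship count for students with and without placement training.
mean_by_training = sample.groupby("PlacementTraining")["Internships"].mean()

# Counts of each internship value against the final placement status.
status_table = pd.crosstab(sample["Internships"], sample["PlacementStatus"])

print(mean_by_training)
print(status_table)
```

A clear gap between the group means, or an uneven cross-tabulation, would support the hypotheses; near-identical groups would suggest these features carry little signal about internships.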

## 4. Basic Analysis  


In [2]:
data = pd.read_csv(r"C:\Users\devad\Downloads\placementdata.csv")
data

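| | StudentID | CGPA | Internships | Projects | Workshops/Certifications | AptitudeTestScore | SoftSkillsRating | ExtracurricularActivities | PlacementTraining | SSC_Marks | HSC_Marks | PlacementStatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 7.5 | 1 | 1 | 1 | 65 | 4.4 | No | No | 61 | 79 | NotPlaced |
| 1 | 2 | 8.9 | 0 | 3 | 2 | 90 | 4.0 | Yes | Yes | 78 | 82 | Placed |
| 2 | 3 | 7.3 | 1 | 2 | 2 | 82 | 4.8 | Yes | No | 79 | 80 | NotPlaced |
| 3 | 4 | 7.5 | 1 | 1 | 2 | 85 | 4.4 | Yes | Yes | 81 | 80 | Placed |
| 4 | 5 | 8.3 | 1 | 2 | 2 | 86 | 4.5 | Yes | Yes | 74 | 88 | Placed |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | 7.5 | 1 | 1 | 2 | 72 | 3.9 | Yes | No | 85 | 66 | NotPlaced |
| 9996 | 9997 | 7.4 | 0 | 1 | 0 | 90 | 4.8 | No | No | 84 | 67 | Placed |
| 9997 | 9998 | 8.4 | 1 | 3 | 0 | 70 | 4.8 | Yes | Yes | 79 | 81 | Placed |
| 9998 | 9999 | 8.9 | 0 | 3 | 2 | 87 | 4.8 | Yes | Yes | 71 | 85 | Placed |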


### 4.1 Checking Null Values

In [3]:
data.isnull().sum()

StudentID                    0
CGPA                         0
Internships                  0
Projects                     0
Workshops/Certifications     0
AptitudeTestScore            0
SoftSkillsRating             0
ExtracurricularActivities    0
PlacementTraining            0
SSC_Marks                    0
HSC_Marks                    0
PlacementStatus              0
dtype: int64

No missing values were found in the dataset.

### 4.2 Encoding Categorical Variables

In [4]:
le = LabelEncoder()
data['ExtracurricularActivities'] = le.fit_transform(data['ExtracurricularActivities'])
data['PlacementTraining'] = le.fit_transform(data['PlacementTraining'])
data['PlacementStatus'] = le.fit_transform(data['PlacementStatus'])
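
Reusing one `LabelEncoder` works for encoding, but each `fit_transform` call overwrites the previous mapping, so only the last column can be decoded afterwards. The sketch below shows one possible alternative that keeps a separate encoder per column. The three-row `sample` frame and the `encoders` dictionary are hypothetical names introduced for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical three-row frame with the same categorical columns.
sample = pd.DataFrame({
    "ExtracurricularActivities": ["No", "Yes", "Yes"],
    "PlacementTraining": ["No", "Yes", "No"],
    "PlacementStatus": ["NotPlaced", "Placed", "NotPlaced"],
})

# Keep one fitted encoder per column so each mapping stays recoverable.
encoders = {}
for column in ["ExtracurricularActivities", "PlacementTraining", "PlacementStatus"]:
    encoder = LabelEncoder()
    sample[column] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

# classes_ is sorted, so position 0 is "NotPlaced" and position 1 is "Placed".
print(list(encoders["PlacementStatus"].classes_))

# The stored encoder can map the integer codes back to the original labels.
decoded = encoders["PlacementStatus"].inverse_transform(sample["PlacementStatus"])
print(list(decoded))
```

Because `LabelEncoder` assigns codes in sorted order, `No`/`NotPlaced` become 0 and `Yes`/`Placed` become 1 in both approaches; the difference is only in whether the mapping can be inspected later.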

## 5. Model Building

### 5.1 Splitting Data

In [5]:
x = data.drop(['Internships'], axis=1)
y = data['Internships']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=60)
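
Note that the split above keeps `StudentID` among the features, although it is a row identifier rather than a student attribute. A possible variant, sketched here on synthetic data, drops the identifier and passes `stratify` so that the training and test sets keep the same class proportions. The `sample` frame and its class counts are invented for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the placement data (values are made up).
rng = np.random.default_rng(0)
n = 200
sample = pd.DataFrame({
    "StudentID": np.arange(1, n + 1),
    "CGPA": rng.uniform(6.5, 9.1, n).round(1),
    "Internships": np.repeat([0, 1, 2], [60, 100, 40]),
})

# Drop the identifier together with the target before splitting.
features = sample.drop(columns=["StudentID", "Internships"])
target = sample["Internships"]

# stratify=target preserves the class proportions in both partitions.
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.3, random_state=60, stratify=target
)

print(target.value_counts(normalize=True).sort_index())
print(t_test.value_counts(normalize=True).sort_index())
```

Whether dropping `StudentID` changes the accuracy on the real data is not something this sketch can show; it would need to be rerun on `data`.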

### 5.2 Training Random Forest Classifier

In [6]:
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(x_train, y_train)
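
A fitted random forest exposes `feature_importances_`, which can give a rough indication of which columns the trees relied on. The sketch below uses a synthetic two-column frame (`toy`, with hypothetical columns `informative` and `noise`) in which the target is fully determined by one column. On the real data, the same pattern would be `pd.Series(rf_classifier.feature_importances_, index=x_train.columns)`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the target equals the "informative" column exactly.
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "informative": rng.integers(0, 3, n),
    "noise": rng.normal(size=n),
})
toy_target = toy["informative"]

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(toy, toy_target)

# Impurity-based importances, labeled by column and sorted descending.
importances = pd.Series(
    forest.feature_importances_, index=toy.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances tend to favor features with many distinct values, so a unique identifier such as `StudentID` may rank higher than it deserves; the values should be read as a hint rather than as proof of impact.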

### 5.3 Predictions

In [7]:
y_pred = rf_classifier.predict(x_test)

## 6. Model Evaluation

### 6.1 Accuracy Score

In [8]:
accuracy = accuracy_score(y_test, y_pred) * 100
print("Accuracy Score:", accuracy)

Accuracy Score: 60.699999999999996


The model achieved an accuracy of 60.70% on the test set.
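
An accuracy figure is easier to interpret next to a baseline. The sketch below compares a set of predictions against a majority-class `DummyClassifier` and builds a confusion matrix. All label arrays here are hypothetical stand-ins for `y_train`, `y_test`, and `y_pred`, chosen only to keep the example small.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels standing in for y_train, y_test and y_pred.
train_labels = np.array([0, 1, 1, 1, 2, 1, 0, 1])
test_labels = np.array([1, 1, 0, 2, 1])
model_predictions = np.array([1, 1, 1, 2, 0])

# The baseline always predicts the most frequent training class
# and ignores the features, so placeholder features are enough.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(np.zeros((len(train_labels), 1)), train_labels)
baseline_predictions = baseline.predict(np.zeros((len(test_labels), 1)))

baseline_accuracy = accuracy_score(test_labels, baseline_predictions)
model_accuracy = accuracy_score(test_labels, model_predictions)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(test_labels, model_predictions, labels=[0, 1, 2])

print("Baseline accuracy:", baseline_accuracy)
print("Model accuracy:", model_accuracy)
print(matrix)
```

In this toy case the model and the baseline reach the same accuracy, which illustrates why the comparison matters: a model that only matches the majority-class baseline has not learned anything useful from the features.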

## 7. Conclusion

- The dataset does not contain missing values, making it easier to preprocess.

- Categorical variables were converted to numerical form using Label Encoding.

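- The model reached 60.70% accuracy on the test set. Without a baseline for comparison it is unclear how strong this result is, and accuracy alone does not show which features drive the predictions, so it does not by itself establish that extracurricular activities, placement training, or placement status have a significant impact on internship counts.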

Future improvements could include feature engineering or hyperparameter tuning to improve model performance.
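
As one illustration of the hyperparameter tuning mentioned above, the following sketch runs a small cross-validated grid search. The data comes from `make_classification` rather than the placement dataset, and the grid values are arbitrary assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic three-class problem used only to keep the example runnable.
toy_features, toy_target = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_classes=3, random_state=0
)

# Arbitrary illustrative grid: two forest sizes and two depth limits.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training data
    scoring="accuracy",
)
search.fit(toy_features, toy_target)

print(search.best_params_)
print(search.best_score_)
```

On the real data, the search would be fitted on `x_train` and `y_train` only, with the test set held back for a final check of the selected model.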

