📄 Dataset Description
This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.

The dataset is ideal for:

Predictive modeling of placement outcomes
Educational exercises in classification
Feature importance analysis
End-to-end machine learning projects
📊 Columns Description
Column Name	Description
College_ID	Unique ID of the college (e.g., CLG0001 to CLG0100)
IQ	Student’s IQ score (normally distributed around 100)
Prev_Sem_Result	GPA from the previous semester (range: 5.0 to 10.0)
CGPA	Cumulative Grade Point Average (range: ~5.0 to 10.0)
Academic_Performance	Annual academic rating (scale: 1 to 10)
Internship_Experience	Whether the student has completed any internship (Yes/No)
Extra_Curricular_Score	Involvement in extracurriculars (score from 0 to 10)
Communication_Skills	Soft skill rating (scale: 1 to 10)
Projects_Completed	Number of academic/technical projects completed (0 to 5)
Placement	Final placement result (Yes = Placed, No = Not Placed)
🎯 Target Variable
Placement: This is the binary classification target (Yes/No) that you can try to predict based on the other features.
🧠 Use Cases
📈 Classification Modeling (Logistic Regression, Decision Trees, Random Forest, etc.)
🔍 Exploratory Data Analysis (EDA)
🎯 Feature Engineering and Selection
🧪 Model Evaluation Practice
👩‍🏫 Academic Projects & Capstone Use
📦 Dataset Size
Rows: 10,000
Columns: 10
File Format: .csv
📚 Context
This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.

📜 License
MIT

🔗 Source
Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.

The dataset is available on Kaggle: https://www.kaggle.com/datasets/sahilislam007/college-student-placement-factors-dataset

In [1]:
# Data cleaning and data preparation for the deep learning model
import pandas as pd
dataframe=pd.read_csv("college_student_placement_dataset.csv")
print(f"The shape of the dataset : {dataframe.shape}")
# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"\033[1;36m{'='*130}\033[0m")
print("Description about the dataset ")
print(dataframe.describe())
print(f"\033[1;36m{'='*130}\033[0m")
print("Info about the dataset ....")
# info() prints its report itself and returns None, so it is not wrapped in print()
dataframe.info()
print(f"\033[1;36m{'='*130}\033[0m")

The shape of the dataset : (10000, 10)
Description about the dataset 
                 IQ  Prev_Sem_Result          CGPA  Academic_Performance  \
count  10000.000000     10000.000000  10000.000000          10000.000000   
mean      99.471800         7.535673      7.532379              5.546400   
std       15.053101         1.447519      1.470141              2.873477   
min       41.000000         5.000000      4.540000              1.000000   
25%       89.000000         6.290000      6.290000              3.000000   
50%       99.000000         7.560000      7.550000              6.000000   
75%      110.000000         8.790000      8.770000              8.000000   
max      158.000000        10.000000     10.460000             10.000000   

       Extra_Curricular_Score  Communication_Skills  Projects_Completed  
count            10000.000000          10000.000000        10000.000000  
mean                 4.970900              5.561800            2.513400  
std                  3.

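The `describe()` output shows CGPA running from 4.54 to 10.46, slightly outside the "~5.0 to 10.0" range given in the column table, so the documented ranges look approximate. A minimal sketch for counting out-of-range values per column; `DOCUMENTED_RANGES` and `out_of_range_counts` are hypothetical names, and the bounds are copied from the column table and assumed to be inclusive:

```python
import pandas as pd

# Bounds copied from the column table in the dataset description (assumed inclusive).
DOCUMENTED_RANGES = {
    "Prev_Sem_Result": (5.0, 10.0),
    "CGPA": (5.0, 10.0),
    "Academic_Performance": (1, 10),
    "Extra_Curricular_Score": (0, 10),
    "Communication_Skills": (1, 10),
    "Projects_Completed": (0, 5),
}

def out_of_range_counts(df: pd.DataFrame, ranges: dict) -> pd.Series:
    """Count rows per column that fall outside the documented [low, high] bounds."""
    counts = {}
    for col, (low, high) in ranges.items():
        if col in df.columns:
            # between() is inclusive on both ends by default
            counts[col] = int((~df[col].between(low, high)).sum())
    return pd.Series(counts, dtype="int64")

# Usage on the notebook's frame:
# print(out_of_range_counts(dataframe, DOCUMENTED_RANGES))
```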
In [2]:
print("checking for missing values .....")
print(dataframe.isna().sum())
print("Checking for the duplicated values ....")
print(dataframe.duplicated().sum())

checking for missing values .....
College_ID                0
IQ                        0
Prev_Sem_Result           0
CGPA                      0
Academic_Performance      0
Internship_Experience     0
Extra_Curricular_Score    0
Communication_Skills      0
Projects_Completed        0
Placement                 0
dtype: int64
Checking for the duplicated values ....
0


In [3]:
print("There are no missing or duplicated values")
print("Reading column names and removing unnecessary columns")
print(dataframe.columns)

There are no missing or duplicated values
Reading column names and removing unnecessary columns
Index(['College_ID', 'IQ', 'Prev_Sem_Result', 'CGPA', 'Academic_Performance',
       'Internship_Experience', 'Extra_Curricular_Score',
       'Communication_Skills', 'Projects_Completed', 'Placement'],
      dtype='object')


In [4]:
# College_ID identifies the college, not the student's profile, so drop it before modeling
dataframe.drop(columns=["College_ID"],inplace=True)
print(dataframe.columns)
df_columns=dataframe.columns

Index(['IQ', 'Prev_Sem_Result', 'CGPA', 'Academic_Performance',
       'Internship_Experience', 'Extra_Curricular_Score',
       'Communication_Skills', 'Projects_Completed', 'Placement'],
      dtype='object')


In [5]:
# Iterate through every column and print the counts of its values
for i in df_columns:
        print(i)
        # value_counts must be called; without () it prints the bound method instead
        print(dataframe[i].value_counts())
        print("="*130)

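Printing full value counts for every column is noisy for continuous features such as IQ and CGPA, which have many distinct values. A sketch that reports the number of distinct values for every column and full counts only for low-cardinality ones; `summarize_columns` is a hypothetical helper and the `max_unique=15` cutoff is an arbitrary assumption:

```python
import pandas as pd

def summarize_columns(df: pd.DataFrame, max_unique: int = 15) -> dict:
    """Distinct-value counts per column, plus value_counts() for low-cardinality columns."""
    summary = {"nunique": df.nunique()}
    low_card = [c for c in df.columns if df[c].nunique() <= max_unique]
    summary["value_counts"] = {c: df[c].value_counts() for c in low_card}
    return summary

# Usage: s = summarize_columns(dataframe); print(s["nunique"])
```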
In [6]:
#getting object datatypes to convert into integer or float dtype
list_objects=[]
list_numbers=[]
for i in df_columns:
        if(str(dataframe[i].dtype)=="object"):
                list_objects.append(i)
        else:
                list_numbers.append(i)
print(list_objects)
print(list_numbers)

['Internship_Experience', 'Placement']
['IQ', 'Prev_Sem_Result', 'CGPA', 'Academic_Performance', 'Extra_Curricular_Score', 'Communication_Skills', 'Projects_Completed']

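The same split can be done with pandas' `select_dtypes`, which selects numeric columns directly and, unlike comparing `str(dtype)` to `"object"`, does not depend on how a particular pandas version stores text columns. A sketch; `split_columns_by_dtype` is a hypothetical helper:

```python
import pandas as pd

def split_columns_by_dtype(df: pd.DataFrame) -> tuple[list, list]:
    """Split column names into non-numeric (e.g. Yes/No text) and numeric lists."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    non_numeric = [c for c in df.columns if c not in numeric]
    return non_numeric, numeric

# Usage: list_objects, list_numbers = split_columns_by_dtype(dataframe)
```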

In [7]:
#checking out the object type classes
for i in list_objects:

        print(dataframe[i].value_counts())
        print("="*80)

Internship_Experience
No     6036
Yes    3964
Name: count, dtype: int64
Placement
No     8341
Yes    1659
Name: count, dtype: int64

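Placement is imbalanced: 1,659 of 10,000 students (about 16.6%) are placed. Two numbers follow directly from these counts: the accuracy of always predicting "No", a baseline any classifier should beat, and the negative-to-positive ratio, which the PyTorch documentation uses as the example value for the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`. A sketch of the arithmetic:

```python
# Class counts copied from the value_counts() output above.
n_no, n_yes = 8341, 1659
n_total = n_no + n_yes

# Accuracy of a trivial model that always predicts the majority class ("No").
majority_baseline = n_no / n_total

# Ratio of negatives to positives, a common starting value for BCEWithLogitsLoss(pos_weight=...).
pos_weight = n_no / n_yes

print(f"Majority-class baseline accuracy: {majority_baseline:.4f}")  # 0.8341
print(f"Suggested pos_weight: {pos_weight:.3f}")                     # 5.028
```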

In [8]:
# Label-encode the Yes/No columns in place; LabelEncoder sorts the classes, so No -> 0 and Yes -> 1
from sklearn.preprocessing import LabelEncoder
encoder=LabelEncoder()
list_encoders={}
for i in list_objects:
        dataframe[i]=encoder.fit_transform(dataframe[i])
        list_encoders[i]=encoder.classes_
for i in list_objects:
        print(dataframe[i].value_counts())

Internship_Experience
0    6036
1    3964
Name: count, dtype: int64
Placement
0    8341
1    1659
Name: count, dtype: int64


In [9]:
for i in list_objects:
        print(dataframe[i].dtype)
        print(dataframe[i].value_counts())
print(list_encoders)
dict_encoded={
        "Yes":1,
        "No":0
}

int32
Internship_Experience
0    6036
1    3964
Name: count, dtype: int64
int32
Placement
0    8341
1    1659
Name: count, dtype: int64
{'Internship_Experience': array(['No', 'Yes'], dtype=object), 'Placement': array(['No', 'Yes'], dtype=object)}

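`dict_encoded` is defined here but never applied; the LabelEncoder classes (`['No', 'Yes']`) already give No → 0 and Yes → 1. If an explicit mapping that does not depend on encoder internals is preferred, a sketch using `Series.map`, assuming it runs on the raw Yes/No columns in place of the LabelEncoder cell (`encode_yes_no` is a hypothetical helper):

```python
import pandas as pd

dict_encoded = {"Yes": 1, "No": 0}

def encode_yes_no(df: pd.DataFrame, columns: list, mapping: dict = dict_encoded) -> pd.DataFrame:
    """Return a copy of df with Yes/No columns mapped to integers via an explicit dict."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(mapping)
        # Labels missing from the mapping become NaN; fail loudly instead of training on them.
        if out[col].isna().any():
            raise ValueError(f"Unmapped values in column {col!r}")
        out[col] = out[col].astype("int64")
    return out

# Usage: dataframe = encode_yes_no(dataframe, ["Internship_Experience", "Placement"])
```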

In [10]:
for i in df_columns:
        print(i)
        print(dataframe[i].dtype)

IQ
int64
Prev_Sem_Result
float64
CGPA
float64
Academic_Performance
int64
Internship_Experience
int32
Extra_Curricular_Score
int64
Communication_Skills
int64
Projects_Completed
int64
Placement
int32


In [11]:
X=dataframe.drop(columns=["Placement"],inplace=False)
y=dataframe["Placement"]
print(X.shape)
print(y.shape)
print(X[:2])

(10000, 8)
(10000,)
    IQ  Prev_Sem_Result  CGPA  Academic_Performance  Internship_Experience  \
0  107             6.61  6.28                     8                      0   
1   97             5.52  5.37                     8                      0   

   Extra_Curricular_Score  Communication_Skills  Projects_Completed  
0                       8                     8                   4  
1                       7                     8                   0  


In [12]:
from sklearn.model_selection import train_test_split
x_train_df,x_test_df,y_train_df,y_test_df=train_test_split(X,y,train_size=0.8,random_state=42)
print(x_train_df.shape)
print(x_test_df.shape)
print(y_train_df.shape)
print(y_test_df.shape)

(8000, 8)
(2000, 8)
(8000,)
(2000,)

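With only about 16.6% positives, a purely random split can leave the training and test sets with slightly different placement rates. Passing `stratify=y` to scikit-learn's `train_test_split` keeps the class ratio the same in both. A sketch on synthetic data with the notebook's 8341/1659 class counts (the features here are random, not the real dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10,000 rows, 8 features, labels with the notebook's class counts.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10_000, 8))
y_demo = np.array([0] * 8341 + [1] * 1659)

x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, random_state=42, stratify=y_demo
)

# Both splits keep roughly the same ~16.6% positive rate.
print(y_tr.mean(), y_te.mean())
```

In the notebook this would mean adding `stratify=y` to the existing `train_test_split(X,y,train_size=0.8,random_state=42)` call.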

In [13]:
x_train=x_train_df.to_numpy()
x_test=x_test_df.to_numpy()
y_train=y_train_df.to_numpy()
y_test=y_test_df.to_numpy()
print(type(x_train))
print(type(x_test))
print(type(y_train))
print(type(y_test))


<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>

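The features sit on very different scales (IQ averages about 99 with a standard deviation of about 15, while most other columns lie between 0 and 10), which can slow or destabilize neural-network training. A common remedy, not used in this notebook, is to standardize each feature with statistics learned from the training set only, so nothing from the test set leaks in. A sketch with scikit-learn's `StandardScaler` on made-up arrays shaped like `x_train`/`x_test`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up arrays standing in for x_train / x_test (8 features, IQ-like first column).
rng = np.random.default_rng(0)
x_train_demo = np.column_stack([rng.normal(100, 15, 800), rng.uniform(0, 10, (800, 7))])
x_test_demo = np.column_stack([rng.normal(100, 15, 200), rng.uniform(0, 10, (200, 7))])

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train_demo)  # learn mean/std on training rows only
x_test_scaled = scaler.transform(x_test_demo)        # reuse the training statistics

print(x_train_scaled.mean(axis=0).round(3))  # approximately 0 for every column
```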

In [14]:
# Convert the NumPy arrays to PyTorch tensors
import torch as t

x_train=t.from_numpy(x_train)
x_test=t.from_numpy(x_test)
y_train=t.from_numpy(y_train)
y_test=t.from_numpy(y_test)


print(type(x_train))
print(type(x_test))
print(type(y_train))
print(type(y_test))


<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>


In [15]:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

torch.Size([8000, 8])
torch.Size([2000, 8])
torch.Size([8000])
torch.Size([2000])

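`to_numpy()` on a frame that mixes integer and float columns returns a float64 array, so `t.from_numpy` yields float64 tensors, and the training loop later casts each batch to float32. Converting once, right after `from_numpy`, is a simpler option; the labels also need to be floating point for `BCEWithLogitsLoss`. A sketch on a small stand-in frame:

```python
import pandas as pd
import torch as t

# Small stand-in frame mixing int and float columns, like X in the notebook.
demo = pd.DataFrame({"IQ": [107, 97], "CGPA": [6.28, 5.37]})
labels = pd.Series([0, 1])

arr = demo.to_numpy()
print(arr.dtype)  # float64: mixed int/float columns are upcast

x_tensor = t.from_numpy(arr).float()                # float32 features for nn.Linear
y_tensor = t.from_numpy(labels.to_numpy()).float()  # float32 targets for BCEWithLogitsLoss
print(x_tensor.dtype, y_tensor.dtype)  # torch.float32 torch.float32
```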

In [16]:
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import torch.nn as nn
class Dataset_creator(Dataset):
        def __init__(self,X_val,y_val):
                
                self.x=X_val
                self.y=y_val
        def __len__(self):
                return len(self.y)
        def __getitem__(self, index):
                return self.x[index],self.y[index]

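`Dataset_creator` indexes two tensors in parallel, which is also what PyTorch's built-in `torch.utils.data.TensorDataset` does. An equivalent sketch with random stand-in tensors:

```python
import torch as t
from torch.utils.data import TensorDataset, DataLoader

# Random stand-ins shaped like x_train (N x 8) and y_train (N,).
features = t.randn(100, 8)
targets = t.randint(0, 2, (100,)).float()

dataset = TensorDataset(features, targets)  # dataset[i] -> (features[i], targets[i])
loader = DataLoader(dataset, batch_size=32, shuffle=True)

xb, yb = dataset[0]
print(len(dataset), xb.shape, yb.shape)  # 100 torch.Size([8]) torch.Size([])
```

A custom class remains useful when `__getitem__` needs to do more than index, for example apply per-sample transforms.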


In [17]:
train_dataset=Dataset_creator(X_val=x_train,y_val=y_train)
test_dataset=Dataset_creator(X_val=x_test,y_val=y_test)
print(len(train_dataset))
print(len(test_dataset))


8000
2000


In [18]:
train_dataloader=DataLoader(dataset=train_dataset,shuffle=True,drop_last=True,batch_size=32,num_workers=0)
test_dataloader=DataLoader(dataset=test_dataset,shuffle=True,drop_last=True,batch_size=32,num_workers=0)

In [19]:
print(len(train_dataloader))
print(len(test_dataloader))

250
62

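`len()` of a DataLoader is `N // batch_size` with `drop_last=True` and the rounded-up quotient otherwise. That explains the 250 and 62 above, and shows that the test loader, which also uses `drop_last=True`, skips 2000 - 62 × 32 = 16 test rows on every pass, while the training loop later divides test accuracy by all 2,000 rows. Setting `shuffle=False, drop_last=False` on the test loader avoids both issues. A sketch of the arithmetic (`num_batches` is a hypothetical helper, not a PyTorch function):

```python
def num_batches(n_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a DataLoader yields for a dataset of n_samples."""
    if drop_last:
        return n_samples // batch_size
    # Integer ceiling division: the last, smaller batch is kept.
    return (n_samples + batch_size - 1) // batch_size

print(num_batches(8000, 32, drop_last=True))   # 250
print(num_batches(2000, 32, drop_last=True))   # 62 (16 rows skipped each pass)
print(num_batches(2000, 32, drop_last=False))  # 63 (last batch has 16 rows)
```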

In [20]:
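# Separate names for the batch tensors avoid overwriting the X/y DataFrame variables
for idx,(X_batch,y_batch) in enumerate(train_dataloader):
        print(f"Batch:{idx}")
        print(X_batch.shape)
        print(y_batch.shape)
        print(f"\033[1;36m{'='*150}\033[0m")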
        



Batch:0
torch.Size([32, 8])
torch.Size([32])
Batch:1
torch.Size([32, 8])
torch.Size([32])
Batch:2
torch.Size([32, 8])
torch.Size([32])
Batch:3
torch.Size([32, 8])
torch.Size([32])
Batch:4
torch.Size([32, 8])
torch.Size([32])
Batch:5
torch.Size([32, 8])
torch.Size([32])
Batch:6
torch.Size([32, 8])
torch.Size([32])
Batch:7
torch.Size([32, 8])
torch.Size([32])
Batch:8
torch.Size([32, 8])
torch.Size([32])
Batch:9
torch.Size([32, 8])
torch.Size([32])
Batch:10
torch.Size([32, 8])
torch.Size([32])
Batch:11
torch.Size([32, 8])
torch.Size([32])
Batch:12
torch.Size([32, 8])
torch.Size([32])
Batch:13
torch.Size([32, 8])
torch.Size([32])
Batch:14
torch.Size([32, 8])
torch.Size([32])
Batch:15
torch.Size([32, 8])
torch.Size([32])
Batch:16
torch.Size([32, 8])
torch.Size([32])
Batch:17
torch.Size([32, 8])
torch.Size([32])
Batch:18
torch.Size([32, 8])
torch.Size([32])
Batch:19
torch.Size([32, 8])
torch.Size([32])
Batch:20
torch.Size([32, 8])
torch.Size([32])
Batch:21
torch.Size([32, 8])
torch.Size([32]

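Looping over the whole loader prints all 250 batch shapes; inspecting a single batch is usually enough to confirm the shapes. A sketch using `next(iter(...))` on a throwaway loader with random data shaped like the training set:

```python
import torch as t
from torch.utils.data import TensorDataset, DataLoader

# Throwaway loader with random data shaped like the notebook's training set.
loader = DataLoader(TensorDataset(t.randn(8000, 8), t.randint(0, 2, (8000,))),
                    batch_size=32, shuffle=True, drop_last=True)

xb, yb = next(iter(loader))  # fetch one batch without iterating the whole epoch
print(xb.shape, yb.shape)    # torch.Size([32, 8]) torch.Size([32])
```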
In [56]:
class Model(nn.Module):
        def __init__(self,input_size,hidden_size,output_size):
                super().__init__()
                self.Layer1=nn.Sequential(
                        nn.Linear(input_size,hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size,hidden_size),
                        nn.ReLU(),
                        nn.Dropout(p=0.2),
                        nn.BatchNorm1d(hidden_size),
                        nn.Linear(hidden_size,hidden_size),
                        nn.Dropout(p=0.1),
                        nn.ReLU(),
                        nn.Linear(hidden_size,hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size,hidden_size),
                        nn.ReLU(),
                        nn.Linear(hidden_size,output_size)

                )
        def forward(self,x):
                return self.Layer1(x)
        


In [57]:
Model_classifier=Model(8,hidden_size=200,output_size=1)

In [58]:
from torchinfo import summary
summary(
        model=Model_classifier,
        input_size=(32,8),
        col_names=["input_size","output_size","num_params","trainable"])


Layer (type:depth-idx)                   Input Shape               Output Shape              Param #                   Trainable
Model                                    [32, 8]                   [32, 1]                   --                        True
├─Sequential: 1-1                        [32, 8]                   [32, 1]                   --                        True
│    └─Linear: 2-1                       [32, 8]                   [32, 200]                 1,800                     True
│    └─ReLU: 2-2                         [32, 200]                 [32, 200]                 --                        --
│    └─Linear: 2-3                       [32, 200]                 [32, 200]                 40,200                    True
│    └─ReLU: 2-4                         [32, 200]                 [32, 200]                 --                        --
│    └─Dropout: 2-5                      [32, 200]                 [32, 200]                 --                        --
│    └─Ba

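The summary above is cut off, but the total can be checked by hand: `nn.Linear(n_in, n_out)` has `n_in*n_out + n_out` parameters (1,800 for 8 → 200 and 40,200 for 200 → 200, matching the rows shown), and `BatchNorm1d(200)` has 2 × 200 learnable ones. With one 8 → 200 layer, four 200 → 200 layers, the batch norm, and a 200 → 1 output, that is 1,800 + 4 × 40,200 + 400 + 201 = 163,201 trainable parameters. A sketch that rebuilds the same layer stack and counts them:

```python
import torch.nn as nn

def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of nn.Linear(n_in, n_out)."""
    return n_in * n_out + n_out

# Same layer stack as Model(8, hidden_size=200, output_size=1); ReLU and Dropout have no parameters.
h = 200
stack = nn.Sequential(
    nn.Linear(8, h), nn.ReLU(),
    nn.Linear(h, h), nn.ReLU(), nn.Dropout(p=0.2), nn.BatchNorm1d(h),
    nn.Linear(h, h), nn.Dropout(p=0.1), nn.ReLU(),
    nn.Linear(h, h), nn.ReLU(),
    nn.Linear(h, h), nn.ReLU(),
    nn.Linear(h, 1),
)

by_hand = linear_params(8, h) + 4 * linear_params(h, h) + 2 * h + linear_params(h, 1)
counted = sum(p.numel() for p in stack.parameters() if p.requires_grad)
print(by_hand, counted)  # 163201 163201
```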
In [63]:
optimizer=t.optim.Adam(params=Model_classifier.parameters(),lr=1e-2)
lossfn=t.nn.BCEWithLogitsLoss()

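`BCEWithLogitsLoss` applies the sigmoid internally, so the model should output raw logits, as this one does by ending in a plain `Linear`. Its `pos_weight` argument scales the loss on positive examples, which is one way to counter the roughly 5:1 No/Yes imbalance in this dataset. A sketch with made-up logits showing both points; the 8341/1659 ratio comes from the class counts printed earlier:

```python
import torch as t

logits = t.tensor([2.0, -1.0, 0.5, -3.0])  # made-up raw model outputs
targets = t.tensor([1.0, 0.0, 0.0, 1.0])   # made-up 0/1 labels

# Unweighted loss matches BCELoss applied to sigmoid(logits).
plain = t.nn.BCEWithLogitsLoss()(logits, targets)
manual = t.nn.BCELoss()(t.sigmoid(logits), targets)
print(t.allclose(plain, manual))  # True

# Weighted loss: terms for positive labels are multiplied by negatives / positives.
pos_weight = t.tensor([8341 / 1659])
weighted = t.nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)
print(weighted.item() > plain.item())  # True, since positive-label terms are upweighted
```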

In [64]:
#training
# import pyttsx3  # only needed for the commented-out text-to-speech notification below
Model_classifier.to(t.float32)
t.manual_seed(42)
train_loss_list=[]
test_loss_list=[]
train_acc_list=[]
test_acc_list=[]
epochs=40
for epoch in range(epochs):
        train_acc=0
        train_loss=0
        test_acc=0
        test_loss=0
        Model_classifier.train()
        for (X,y) in train_dataloader:
                X=X.to(t.float32)
                y_preds=Model_classifier(X)
                y_preds=y_preds.squeeze(dim=1)
                y=y.to(t.float32)
                loss=lossfn(y_preds,y)
                train_loss+=loss.item()*X.size(0) # weight the mean batch loss by the actual batch size
                train_acc+=(t.round(y_preds)==y).sum().item()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        Model_classifier.eval()
        with t.inference_mode():
                for (X,y) in test_dataloader:
                        X=X.to(t.float32)
                        y_preds=Model_classifier(X)
                        y_preds=y_preds.squeeze(dim=1)
                        y=y.to(t.float32)
                        loss=lossfn(y_preds,y)
                        test_loss+=loss.item()*X.size(0)
                        test_acc+=(t.round(y_preds)==y).sum().item()
        train_acc_list.append(train_acc/len(train_dataloader.dataset))
        train_loss_list.append(train_loss/len(train_dataloader.dataset))
        test_acc_list.append(test_acc/len(test_dataloader.dataset))
        test_loss_list.append(test_loss/len(test_dataloader.dataset))
        print(f" \033[1;36m Epoch : {epoch} completed \033[0m")
        print(f"Training accuracy : {train_acc_list[epoch]}")
        print(f"Testing accuracy : {test_acc_list[epoch]}")
        # engine=pyttsx3.init()
        # engine.say(f"Epoch {epoch} completed ")
        
        # engine.runAndWait()
        # del engine
        

        

Epoch : 0 completed
Training accuracy : 0.058875
Testing accuracy : 0.0015
Epoch : 1 completed
Training accuracy : 0.066625
Testing accuracy : 0.0
Epoch : 2 completed
Training accuracy : 0.090875
Testing accuracy : 0.004
Epoch : 3 completed
Training accuracy : 0.08425
Testing accuracy : 0.0
Epoch : 4 completed
Training accuracy : 0.08375
Testing accuracy : 0.0
Epoch : 5 completed
Training accuracy : 0.088875
Testing accuracy : 0.0005
Epoch : 6 completed
Training accuracy : 0.107
Testing accuracy : 0.001
Epoch : 7 completed
Training accuracy : 0.071375
Testing accuracy : 0.0
Epoch : 8 completed
Training accuracy : 0.097875
Testing accuracy : 0.001
Epoch : 9 completed
Training accuracy : 0.087375
Testing accuracy : 0.0
Epoch : 10 completed
Training accuracy : 0.0795
Testing accuracy : 0.0
Epoch : 11 completed
Training accu

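The model performed poorly: across the epochs shown, the reported training accuracy stayed between about 0.06 and 0.11, and the test accuracy stayed near 0.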

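One likely contributor to these near-zero accuracies: the model outputs raw logits (no sigmoid, because `BCEWithLogitsLoss` is used), but the loop compares `t.round(y_preds)` with the 0/1 labels. Rounding a logit such as -2.7 gives -3, which can never equal 0 or 1, so only logits that happen to round to exactly 0 or 1 can be counted as correct. For reference, always predicting "No" would already score 8341 / 10000 = 0.8341. Thresholding the sigmoid at 0.5 (equivalently, the logit at 0) turns logits into 0/1 predictions. A sketch of a per-batch helper; `count_correct` is a hypothetical name, and the logits are made up:

```python
import torch as t

def count_correct(logits: t.Tensor, targets: t.Tensor) -> int:
    """Number of correct 0/1 predictions from raw logits (sigmoid thresholded at 0.5)."""
    preds = (t.sigmoid(logits) >= 0.5).float()  # same as (logits >= 0).float()
    return int((preds == targets).sum().item())

logits = t.tensor([-2.7, 3.1, 0.2, -1.6])  # made-up raw outputs
targets = t.tensor([0.0, 1.0, 0.0, 0.0])

print(int((t.round(logits) == targets).sum().item()))  # 1: only 0.2 rounds to a matching label
print(count_correct(logits, targets))                  # 3
```

Inside the training and test loops this would replace `(t.round(y_preds)==y).sum().item()`, e.g. `train_acc+=count_correct(y_preds,y)`.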