 # Graduate Admission Rating Model

**Importing required libraries**

In [85]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

**Reading the dataset**

In [7]:
dataset=pd.read_csv('/kaggle/input/admissions/Admission.csv')

**How many rows and columns are there?**

In [12]:
dataset.shape

(1000, 8)

The dataset has 1000 rows and 8 columns.

**Printing the first 10 rows**

In [18]:
dataset.head(10)

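| | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|
| 0 | 337.0 | 118.0 | 4.0 | 4.5 | 4.5 | 9.65 | 1.0 | 0.92 |
| 1 | 324.0 | 107.0 | 4.0 | 4.0 | 4.5 | 8.87 | 1.0 | 0.76 |
| 2 | 316.0 | 104.0 | 3.0 | 3.0 | 3.5 | 8.0 | 1.0 | 0.72 |
| 3 | 322.0 | 110.0 | 3.0 | 3.5 | 2.5 | 8.67 | 1.0 | 0.8 |
| 4 | 314.0 | 103.0 | 2.0 | 2.0 | 3.0 | 8.21 | 0.0 | 0.65 |
| 5 | 330.0 | 115.0 | 5.0 | 4.5 | 3.0 | 9.34 | 1.0 | 0.9 |
| 6 | 321.0 | 109.0 | 3.0 | 3.0 | 4.0 | 8.2 | 1.0 | 0.75 |
| 7 | 308.0 | 101.0 | 2.0 | 3.0 | 4.0 | 7.9 | 0.0 | 0.68 |
| 8 | 302.0 | 102.0 | 1.0 | 2.0 | 1.5 | 8.0 | 0.0 | 0.5 |
| 9 | 323.0 | 108.0 | 3.0 | 3.5 | 3.0 | 8.6 | 0.0 | 0.45 |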


**Printing the last 10 rows**

In [17]:
dataset.tail(10)

| | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|
| 990 | 312.0 | 104.0 | 2.0 | 2.4 | 2.9 | 8.13 | 1.0 | 0.539438 |
| 991 | 310.0 | 106.0 | 2.0 | 2.3 | 3.9 | 8.11 | 1.0 | 0.666709 |
| 992 | 307.0 | 102.0 | 2.0 | 1.8 | 2.8 | 8.74 | 1.0 | 0.732535 |
| 993 | 318.0 | 107.0 | 3.0 | 4.5 | 4.9 | 8.52 | 1.0 | 0.77004 |
| 994 | 305.0 | 99.0 | 2.0 | 1.9 | 1.9 | 8.09 | -0.0 | 0.592842 |
| 995 | 314.0 | 109.0 | 1.0 | 2.5 | 4.1 | 8.78 | 1.0 | 0.788787 |
| 996 | 322.0 | 102.0 | 3.0 | 3.5 | 2.7 | 8.17 | 1.0 | 0.683076 |
| 997 | 324.0 | 109.0 | 3.0 | 3.2 | 3.1 | 8.51 | 1.0 | 0.725821 |
| 998 | 298.0 | 95.0 | 2.0 | 3.1 | 2.5 | 8.11 | -0.0 | 0.56029 |
| 999 | 333.0 | 114.0 | 5.0 | 4.2 | 3.9 | 9.23 | 1.0 | 0.831325 |


**Checking if any of the columns have empty/null values**

In [14]:
dataset.isnull().any()

GRE Score            False
TOEFL Score          False
University Rating    False
SOP                  False
LOR                  False
CGPA                 False
Research             False
Chance of Admit      False
dtype: bool

This tells us that none of the columns contain null values.
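`isnull().any()` only reports whether a column has at least one missing value. As a minimal sketch, assuming a small hypothetical frame (`toy`, not the admissions data), `isnull().sum()` gives the count per column, which is more useful once something is actually missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing GRE score (not the admissions data)
toy = pd.DataFrame({
    "GRE Score": [337.0, np.nan, 316.0],
    "CGPA": [9.65, 8.87, 8.0],
})

has_null = toy.isnull().any()      # one boolean per column
null_counts = toy.isnull().sum()   # number of missing cells per column
total_missing = int(null_counts.sum())

print(has_null)
print(null_counts)
print("Total missing cells:", total_missing)
```

On the admissions dataset, the same `dataset.isnull().sum()` call would be expected to show zero for every column, given the output above.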

**Stripping leading and trailing whitespace from column names**

In [30]:
dataset.columns=dataset.columns.str.strip()

The column heading 'LOR' had leading or trailing whitespace.
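To see which headings are affected before cleaning them, the names can be compared with their stripped versions. The sketch below uses a hypothetical set of raw headings (the trailing spaces are an assumption for illustration, since the raw CSV header is not shown here):

```python
import pandas as pd

# Hypothetical raw headings; the trailing spaces are assumed for illustration
toy = pd.DataFrame(columns=["GRE Score", "LOR ", "Chance of Admit "])

before = list(toy.columns)

# Headings whose stripped form differs from the raw form
changed = [name for name in before if name != name.strip()]

# Same cleaning step as in the notebook
toy.columns = toy.columns.str.strip()
after = list(toy.columns)

print("Headings with stray whitespace:", changed)
print("Cleaned headings:", after)
```

Stripping the names first means later lookups such as `dataset['LOR']` do not fail with a `KeyError` because of an invisible space.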

**Getting the range of values in each column**

In [34]:
print("GRE Score range: ",dataset['GRE Score'].min(),"-",dataset['GRE Score'].max())
print("TOEFL Score range: ",dataset['TOEFL Score'].min(),"-",dataset['TOEFL Score'].max())
print("University Rating range: ",dataset['University Rating'].min(),"-",dataset['University Rating'].max())
print("SOP range: ",dataset['SOP'].min(),"-",dataset['SOP'].max())
print("LOR range: ",dataset['LOR'].min(),"-",dataset['LOR'].max())
print("CGPA range: ",dataset['CGPA'].min(),"-",dataset['CGPA'].max())
print("Research range: ",dataset['Research'].min(),"-",dataset['Research'].max())
print("Chances of Admit range: ",dataset['Chance of Admit'].min(),"-",dataset['Chance of Admit'].max())

GRE Score range:  290.0 - 340.0
TOEFL Score range:  92.0 - 120.0
University Rating range:  1.0 - 5.0
SOP range:  1.0 - 5.0
LOR range:  1.0 - 5.0
CGPA range:  6.8 - 9.92
Research range:  -0.0 - 1.0
Chances of Admit range:  0.34 - 0.97
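The eight `print` calls can be collapsed into a single aggregation. A minimal sketch, using three rows copied from the `head()` output as a stand-in frame (`toy` is a hypothetical name):

```python
import pandas as pd

# First three rows of a few columns, copied from the head() output above
toy = pd.DataFrame({
    "GRE Score": [337.0, 324.0, 316.0],
    "TOEFL Score": [118.0, 107.0, 104.0],
    "CGPA": [9.65, 8.87, 8.0],
})

# One row per column, with its minimum and maximum side by side
ranges = toy.agg(["min", "max"]).T
print(ranges)
```

Applied to the full frame, `dataset.agg(["min", "max"]).T` should report the same ranges as the printed lines above, in tabular form.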


**Getting some more statistical insights**

In [35]:
dataset.describe()

| | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 316.432 | 107.227 | 3.095 | 3.3448 | 3.4755 | 8.57098 | 0.556 | 0.720568 |
| std | 11.324028 | 6.132461 | 1.166763 | 1.022712 | 0.956151 | 0.609056 | 0.497103 | 0.14318 |
| min | 290.0 | 92.0 | 1.0 | 1.0 | 1.0 | 6.8 | -0.0 | 0.34 |
| 25% | 308.0 | 103.0 | 2.0 | 2.5 | 2.9 | 8.1275 | 0.0 | 0.629591 |
| 50% | 317.0 | 107.0 | 3.0 | 3.5 | 3.5 | 8.56 | 1.0 | 0.724833 |
| 75% | 325.0 | 112.0 | 4.0 | 4.0 | 4.2 | 9.02 | 1.0 | 0.82 |
| max | 340.0 | 120.0 | 5.0 | 5.0 | 5.0 | 9.92 | 1.0 | 0.97 |


**Separating the target column from the features**

In [52]:
x=dataset.drop('Chance of Admit',axis=1)
y=dataset['Chance of Admit']

**Separating training and testing sets**

In [55]:
x_train, x_test, y_train, y_test=train_test_split(x,y, test_size=0.18, random_state=3)
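With `test_size=0.18` and 1000 rows, the split should leave 820 rows for training and 180 for testing. The sketch below checks this on placeholder arrays of the same shape (the array contents are made up; only the sizes matter here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 1000  # same row count as the admissions dataset

# Placeholder features and target, used only to inspect the split sizes
X = np.arange(n_rows * 7).reshape(n_rows, 7)
y = np.arange(n_rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.18, random_state=3
)
print(X_train.shape, X_test.shape)

# A fixed random_state makes the split repeatable across runs
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.18, random_state=3)
same_split = np.array_equal(X_train, X_train_again)
```

Fixing `random_state` is what makes the reported test error comparable from one run of the notebook to the next.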

**Using RandomForestRegressor**

In [59]:
model = RandomForestRegressor(n_estimators=100, max_depth=5)

The chosen model is a random forest of 100 trees, each limited to a depth of 5.
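The cell above does not set `random_state` on the forest, so repeated runs may give slightly different predictions even though the train/test split is fixed. As a sketch on synthetic data (the arrays and the `fit_and_predict` helper are hypothetical), passing the same seed makes two fits agree:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data with 7 features, not the admissions data
rng = np.random.default_rng(0)
X_toy = rng.random((60, 7))
y_toy = X_toy[:, 5] * 0.5 + 0.25


def fit_and_predict(seed):
    """Fit a small forest with the given seed and predict a few rows."""
    forest = RandomForestRegressor(n_estimators=20, max_depth=5, random_state=seed)
    forest.fit(X_toy, y_toy)
    return forest.predict(X_toy[:5])


# Two independent fits with the same seed give the same predictions
same_seed_match = np.allclose(fit_and_predict(42), fit_and_predict(42))
print("Same seed, same predictions:", same_seed_match)
```

Whether to fix the seed is a design choice; it helps when the exact numbers in the notebook need to be reproduced.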

**Fitting the model on the training set**

In [60]:
model.fit(x_train,y_train)

The model has now been trained on the training split of our dataset.

**Predicting for arbitrary values**

In [84]:
p=model.predict([[329,110,5,4.2,4,8.78,1.00]])
p[0]



0.8198670039137983

* **Candidate details given**
* GRE Score: 329
* TOEFL Score: 110
* University Rating: 5
* SOP: 4.2
* LOR: 4
* CGPA: 8.78
* Research: 1.00

**According to our model, this candidate has a predicted chance of admission of about 0.82 (82%).**
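The values passed to `predict` must follow the training column order: GRE Score, TOEFL Score, University Rating, SOP, LOR, CGPA, Research. Because the model was fitted on a DataFrame, recent scikit-learn versions may also warn that a plain list has no feature names. One way to make the order explicit is to wrap the candidate in a one-row DataFrame. The sketch below trains a small stand-in model on synthetic data (`toy_model`, `x_toy`, and `y_toy` are hypothetical), so its prediction is not meaningful; only the input construction is the point:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

feature_names = [
    "GRE Score", "TOEFL Score", "University Rating",
    "SOP", "LOR", "CGPA", "Research",
]

# Synthetic stand-in training data, not the admissions data
rng = np.random.default_rng(0)
x_toy = pd.DataFrame(rng.random((50, 7)), columns=feature_names)
y_toy = x_toy["CGPA"] * 0.5 + 0.25

toy_model = RandomForestRegressor(n_estimators=10, max_depth=5, random_state=0)
toy_model.fit(x_toy, y_toy)

# One-row frame with named columns, in the same order as the training data
candidate = pd.DataFrame(
    [[329, 110, 5, 4.2, 4, 8.78, 1.0]], columns=feature_names
)
prediction = float(toy_model.predict(candidate)[0])
print(prediction)
```

In the notebook itself, the same pattern would be `model.predict(pd.DataFrame([[...]], columns=x.columns))`.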

**Assessing performance using RMSE**

In [86]:
y_test_pred = model.predict(x_test)
mse = mean_squared_error(y_test, y_test_pred)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")


Root Mean Squared Error (RMSE): 0.07
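RMSE is the square root of the mean squared difference between the true and predicted values, so it is expressed in the same units as `Chance of Admit`. For scale, the `describe()` output gives a standard deviation of about 0.14 for that column over the whole dataset, so an RMSE of 0.07 is roughly half of that spread. The sketch below computes RMSE by hand and compares it with a baseline that always predicts the mean; the true values are the first five from `head()`, while the predictions are made up for illustration:

```python
import numpy as np

# True values from the first five rows of head(); predictions are hypothetical
y_true = np.array([0.92, 0.76, 0.72, 0.80, 0.65])
y_pred = np.array([0.90, 0.80, 0.70, 0.75, 0.70])

# RMSE computed directly from its definition
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Baseline: always predict the mean of the true values
baseline = np.full_like(y_true, y_true.mean())
baseline_rmse = np.sqrt(np.mean((y_true - baseline) ** 2))

print(f"Model RMSE:    {rmse:.4f}")
print(f"Baseline RMSE: {baseline_rmse:.4f}")
```

A model is only useful if its RMSE is clearly below that of such a constant baseline on the same test rows.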
