## **Predicting admissions to the university**
### Problem Statement:
The goal of this project is to build a logistic regression model that predicts whether a
student will be admitted to a university based on their academic profile: GRE score,
TOEFL score, undergraduate CGPA, and the other factors in the dataset (university rating,
SOP, LOR, and research experience). Such a model could help universities identify
promising candidates for admission.

In [2]:
import pandas as pd
import numpy as np

In [3]:
# Load the admissions dataset (update the path to match your machine)
admin = pd.read_csv(r"C:\Users\hp\Desktop\5.1\sankyana\Assignment\ML project 1\Admission_Predict_Ver1.1.csv")

In [4]:
admin.head()

   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0           1        337          118                  4  4.5  4.5  9.65         1             0.92
1           2        324          107                  4  4.0  4.5  8.87         1             0.76
2           3        316          104                  3  3.0  3.5   8.0         1             0.72
3           4        322          110                  3  3.5  2.5  8.67         1              0.8
4           5        314          103                  2  2.0  3.0  8.21         0             0.65


In [5]:
# Checking for missing values
admin.isnull().sum()

Serial No.           0
GRE Score            0
TOEFL Score          0
University Rating    0
SOP                  0
LOR                  0
CGPA                 0
Research             0
Chance of Admit      0
dtype: int64

In [6]:
admin.describe().T

                   count     mean         std    min     25%    50%     75%    max
Serial No.         500.0    250.5  144.481833    1.0  125.75  250.5  375.25  500.0
GRE Score          500.0  316.472   11.295148  290.0   308.0  317.0   325.0  340.0
TOEFL Score        500.0  107.192    6.081868   92.0   103.0  107.0   112.0  120.0
University Rating  500.0    3.114    1.143512    1.0     2.0    3.0     4.0    5.0
SOP                500.0    3.374    0.991004    1.0     2.5    3.5     4.0    5.0
LOR                500.0    3.484     0.92545    1.0     3.0    3.5     4.0    5.0
CGPA               500.0  8.57644    0.604813    6.8  8.1275   8.56    9.04   9.92
Research           500.0     0.56    0.496884    0.0     0.0    1.0     1.0    1.0
Chance of Admit    500.0  0.72174     0.14114   0.34    0.63   0.72    0.82   0.97


In [7]:
admin.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Serial No.         500 non-null    int64  
 1   GRE Score          500 non-null    int64  
 2   TOEFL Score        500 non-null    int64  
 3   University Rating  500 non-null    int64  
 4   SOP                500 non-null    float64
 5   LOR                500 non-null    float64
 6   CGPA               500 non-null    float64
 7   Research           500 non-null    int64  
 8   Chance of Admit    500 non-null    float64
dtypes: float64(4), int64(5)
memory usage: 35.3 KB


In [8]:
# Dropping the serial number column
admin = admin.drop(['Serial No.'],axis = 1)

In [9]:
# Split the data into independent variables (X) and the dependent variable (Y).
# Note: the target column name in this CSV ends with a trailing space.
X = admin.drop(['Chance of Admit '], axis=1)
Y = admin['Chance of Admit ']

In [10]:
#Looking at their shapes
print(X.shape,Y.shape)

(500, 7) (500,)


### Building and training the model

In [24]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

In [13]:
x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size =0.2,random_state = 0)
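
For intuition, with `test_size=0.2` the 500 rows are divided into 400 training rows and 100 test rows after a seeded shuffle. The block below is a minimal standard-library sketch of that idea. `simple_split` is a hypothetical helper written for this illustration, and its shuffle order will not match the rows that scikit-learn actually selects.

```python
import random

def simple_split(n_rows, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) for a shuffled hold-out split.

    Hypothetical helper for illustration only; it does not reproduce
    the exact row order chosen by sklearn's train_test_split.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # seeded, so the split is repeatable
    n_test = round(n_rows * test_size)     # 500 * 0.2 gives 100 held-out rows
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = simple_split(500)
print(len(train_idx), len(test_idx))       # 400 100
```

Fixing the seed (here `seed=0`, mirroring `random_state = 0`) keeps the split identical across runs, so later metric changes come from the model rather than from a different split.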

In [28]:
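# Convert the continuous target to a binary label: 1 if Chance of Admit >= 0.5, else 0.
# The 25th percentile of the target is 0.63, so at least 75% of all rows fall in class 1
# and the two classes are imbalanced.
y_train_binary = np.where(y_train >= 0.5, 1, 0)
y_test_binary = np.where(y_test >= 0.5, 1, 0)

# Standardize the features. The scaler is fitted on the training set only,
# then the same transformation is applied to the test set.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)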
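
The reason for calling `fit_transform` on the training set and only `transform` on the test set can be seen in a small standard-library sketch. It standardizes a single feature using the mean and population standard deviation learned from training values alone, which is the kind of scaling `StandardScaler` performs per column. The GRE scores come from the `admin.head()` output; which of them count as "train" or "test" here is an arbitrary choice for the demo.

```python
from statistics import fmean, pstdev

# GRE scores from the first rows of the dataset; the train/test
# assignment below is an arbitrary choice for this illustration.
train_gre = [337, 324, 316, 322]
test_gre = [314]

# "fit": learn the mean and population standard deviation from training data only
mu = fmean(train_gre)
sigma = pstdev(train_gre)

# "transform": apply the same training statistics to both sets
train_scaled = [(x - mu) / sigma for x in train_gre]
test_scaled = [(x - mu) / sigma for x in test_gre]

# The training values end up with mean 0 and standard deviation 1 (up to rounding)
print(round(fmean(train_scaled), 6), round(pstdev(train_scaled), 6))
print(test_scaled)
```

Because the test value is scaled with training statistics, no information about the test set influences the preprocessing.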

In [26]:
#Creating an instance of the model
admin_model = LogisticRegression(max_iter = 1000,solver= 'lbfgs')

In [27]:
admin_model.fit(x_train_scaled,y_train_binary)

In [None]:
#Predicting the values
y_pred = admin_model.predict(x_test_scaled)
y_pred
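
As a rough picture of what the fitted model does at prediction time: a binary logistic regression computes a weighted sum of the standardized features, passes it through the logistic (sigmoid) function to get a probability for class 1, and `predict` returns 1 when that probability exceeds 0.5. The sketch below uses only the standard library. The weights, bias, and feature values are hypothetical and are not the coefficients learned in this notebook.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(features, weights, bias):
    """Probability of class 1 for one standardized feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and inputs, NOT the values learned above
weights = [0.8, 0.5, 1.2]
bias = 2.0
student = [-1.0, -0.5, -1.5]   # standardized features, below average on all three

p = predict_proba_one(student, weights, bias)
label = 1 if p > 0.5 else 0    # same 0.5 cut-off that predict() applies
print(round(p, 3), label)
```

In the notebook itself, the learned values are available as `admin_model.coef_` and `admin_model.intercept_`, and the probabilities via `admin_model.predict_proba(x_test_scaled)`.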

### Evaluating accuracy

In [33]:
from sklearn.metrics import r2_score, root_mean_squared_error as rmse, accuracy_score, confusion_matrix

In [36]:
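# r2_score and RMSE are regression metrics. On 0/1 labels every mistake has squared error 1,
# so RMSE is just sqrt(1 - accuracy); accuracy is the meaningful number for this classifier.
print("%.3f" % r2_score(y_test_binary, y_pred))   # R^2
print(rmse(y_test_binary, y_pred))                 # RMSE
print(accuracy_score(y_test_binary, y_pred))       # accuracy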

0.291
0.2
0.96
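
To put these three numbers in context, the sketch below rebuilds one label configuration that is consistent with them, assuming a 100-row test set (20% of 500): 6 rows in class 0, 94 rows in class 1, and 4 misclassified rows. This is a reconstruction from the printed metrics, not the notebook's actual predictions. In particular, the three metrics do not reveal how the 4 mistakes split between false positives and false negatives, so that split is an arbitrary choice here.

```python
import math

# Assumed reconstruction: 100 test rows, 6 in class 0 and 94 in class 1
y_true = [0] * 6 + [1] * 94
# Arbitrary choice: all 4 mistakes are class-0 rows predicted as class 1
y_pred = [1] * 4 + [0] * 2 + [1] * 94

n = len(y_true)
pairs = list(zip(y_true, y_pred))
errors = sum(t != p for t, p in pairs)

accuracy = 1 - errors / n
rmse = math.sqrt(errors / n)     # each mistake on 0/1 labels has squared error 1

mean_true = sum(y_true) / n
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - errors / ss_tot         # residual sum of squares equals the error count

# Accuracy of always predicting the majority class: the baseline to beat
baseline_accuracy = max(sum(y_true), n - sum(y_true)) / n

# Confusion counts (actual class vs predicted class)
tn = sum(t == 0 and p == 0 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)
tp = sum(t == 1 and p == 1 for t, p in pairs)

print(f"{accuracy:.2f} {rmse:.2f} {r2:.3f} {baseline_accuracy:.2f}")  # 0.96 0.20 0.291 0.94
print(tn, fp, fn, tp)                                                 # 2 4 0 94
```

Under these assumptions, always predicting "admitted" would already score 0.94, so the 0.96 accuracy is a modest gain. Running `confusion_matrix(y_test_binary, y_pred)`, which is already imported above, would show the real split of errors.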
