<a href="https://colab.research.google.com/github/PankajJoshi0202/Engineering-College-Admission-Prediction/blob/main/Final_code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [75]:
import pandas as pd

In [76]:
# Load your data into a DataFrame
data = pd.read_csv('kaggle_sw_raw.csv')  # If you're using a CSV file

# Understanding the dataset and its attributes
1. View the first few rows of the dataset to get a sense of the data structure.
2. Check the column names to understand which features (attributes) are present.
3. Check for missing values to identify if there are any NaN or null values that need to be handled.
4. Check the data types to understand the types of data you’re working with (numerical, categorical, etc.).
5. Get descriptive statistics of numerical features, like mean, standard deviation, min, and max.
6. Check for unique values in categorical columns to understand their distribution.
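
The six checks above can be bundled into one helper, as a minimal sketch. The function name `summarize` and the toy DataFrame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the six exploratory checks into one dictionary (hypothetical helper)."""
    # Non-numeric columns stand in for the categorical features here.
    non_numeric = df.select_dtypes(exclude=["number"])
    return {
        "head": df.head(),                      # 1. first rows
        "columns": list(df.columns),            # 2. column names
        "missing": df.isnull().sum(),           # 3. missing values per column
        "dtypes": df.dtypes,                    # 4. data types
        "describe": df.describe(),              # 5. descriptive statistics
        "unique_counts": non_numeric.nunique()  # 6. distinct values per non-numeric column
    }


# Toy frame standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "rank": [1, 2, 3],
    "percentile": [99.5, 98.0, None],
    "gender": ["M", "F", "M"],
})
summary = summarize(toy)
print(summary["missing"])
print(summary["unique_counts"])
```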

In [77]:
data.head() # Get head of data

Unnamed: 0,rank,percentile,branch,gender,category,fulfillment,seat_type,primary_seat_type,secondary_seat_type,score_type,college_name,enrollment_no,branch_code
0,18388,90.121473,Civil Engineering,M,NT 2 (NT-C),^,GOPENS,State Level Seats,State Level Seats,MHT-CET,"Government College of Engineering, Amravati",EN22169138,100219110
1,18898,89.889223,Civil Engineering,F,SC,^,LOPENS,State Level Seats,State Level Seats,MHT-CET,"Government College of Engineering, Amravati",EN22182921,100219110
2,19374,89.540152,Civil Engineering,M,OBC,^,GOPENS,State Level Seats,State Level Seats,MHT-CET,"Government College of Engineering, Amravati",EN22164339,100219110
3,21857,88.241971,Civil Engineering,M,OBC,^,GOPENS,State Level Seats,State Level Seats,MHT-CET,"Government College of Engineering, Amravati",EN22169336,100219110
4,22128,88.091617,Civil Engineering,M,DT/VJ,~,GOPENS,State Level Seats,State Level Seats,MHT-CET,"Government College of Engineering, Amravati",EN22135944,100219110


In [78]:
data.columns  # List all column names

Index(['rank', 'percentile', 'branch', 'gender', 'category', 'fulfillment',
       'seat_type', 'primary_seat_type', 'secondary_seat_type', 'score_type',
       'college_name', 'enrollment_no', 'branch_code'],
      dtype='object')

In [79]:
data.isnull().sum() # Checking if we have any null value in data

Unnamed: 0,0
rank,0
percentile,0
branch,0
gender,0
category,0
fulfillment,0
seat_type,0
primary_seat_type,0
secondary_seat_type,0
score_type,0


In [80]:
data.info()  # Summarize column dtypes and non-null counts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104345 entries, 0 to 104344
Data columns (total 13 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   rank                 104345 non-null  int64  
 1   percentile           104345 non-null  float64
 2   branch               104345 non-null  object 
 3   gender               104345 non-null  object 
 4   category             104345 non-null  object 
 5   fulfillment          104345 non-null  object 
 6   seat_type            104345 non-null  object 
 7   primary_seat_type    104345 non-null  object 
 8   secondary_seat_type  104345 non-null  object 
 9   score_type           104345 non-null  object 
 10  college_name         104345 non-null  object 
 11  enrollment_no        104345 non-null  object 
 12  branch_code          104345 non-null  object 
dtypes: float64(1), int64(1), object(11)
memory usage: 10.3+ MB


In [81]:
data.describe() # Describing the data

Unnamed: 0,rank,percentile
count,104345.0,104345.0
mean,53628.903819,63.394483
std,35253.231789,26.21945
min,0.0,0.004739
25%,22737.0,45.271811
50%,49145.0,69.286946
75%,82395.0,85.040326
max,129286.0,100.0


In [82]:
data.select_dtypes(include=['object']).nunique()  # Count distinct values in each object-typed column

Unnamed: 0,0
branch,95
gender,2
category,73
fulfillment,5
seat_type,77
primary_seat_type,4
secondary_seat_type,12
score_type,3
college_name,326
enrollment_no,104345


In [83]:
# Check class distribution
data['college_name'].value_counts(normalize=True)

Unnamed: 0_level_0,proportion
college_name,Unnamed: 1_level_1
"Bansilal Ramnath Agarawal Charitable Trust's Vishwakarma Institute of Technology, Bibwewadi, Pune",0.012622
"Lokmanya Tilak Jankalyan Shikshan Sanstha, Priyadarshani College of Engineering, Nagpur",0.012162
"Yeshwantrao Chavan College of Engineering,Wanadongri, Nagpur",0.011941
"B.R.A.C.T's Vishwakarma Institute of Information Technology, Kondhwa (Bk.), Pune",0.011500
"Dr. D. Y. Patil Unitech Society's Dr. D. Y. Patil Institute of Technology, Pimpri, Pune",0.010906
...,...
"VPM's Maharshi Parshuram College of Engineering, Velneshwar, Ratnagiri.",0.000230
"University Department of Chemical Technology, Aurangabad",0.000220
"Universal College of Engineering & Research, Sasewadi",0.000192
"Jamia Institute Of Engineering And Management Studies, Akkalkuwa",0.000173

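Among the proportions shown above, the most frequent college (about 0.0126) is roughly 70 times more common than the least frequent one listed (about 0.00017). A small helper can put a number on that imbalance. This is a sketch only: `imbalance_summary` and the toy labels are assumptions made for illustration.

```python
import pandas as pd


def imbalance_summary(labels: pd.Series) -> dict:
    """Summarize how unevenly the classes are represented (hypothetical helper)."""
    counts = labels.value_counts()
    return {
        "n_classes": int(counts.size),
        "largest": int(counts.max()),
        "smallest": int(counts.min()),
        "ratio": float(counts.max() / counts.min()),
    }


# Toy labels; with the notebook's data this would be data['college_name'].
toy_labels = pd.Series(["A"] * 6 + ["B"] * 3 + ["C"] * 1)
info = imbalance_summary(toy_labels)
print(info)
```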

In [84]:
# Keep only the first run of digits in branch_code
data['branch_code'] = data['branch_code'].str.extract(r'(\d+)', expand=False)
# Inspect the distinct branch_code values after cleaning
print(data['branch_code'].unique())

# Inspect the distinct enrollment numbers
print(data['enrollment_no'].unique())


['100219110' '100219111' '100224210' ... '693829350' '693837250'
 '693892550']
['EN22169138' 'EN22182921' 'EN22164339' ... 'EN22121674' 'EN22123672'
 'EN22246442']
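
One caveat worth illustrating: `str.extract` with a single capture group keeps only the first run of digits, which is not the same as stripping every non-digit character. The sample values below are made up for the comparison; the `branch_code` values printed above appear to be purely numeric already, in which case both approaches give the same result.

```python
import pandas as pd

# Hypothetical values chosen to show where the two approaches differ.
codes = pd.Series(["100219110", "1002-19111T", "abc"])

# Keeps the first run of digits; rows with no digits become NaN.
first_run = codes.str.extract(r"(\d+)", expand=False)

# Removes every non-digit character; rows with no digits become "".
digits_only = codes.str.replace(r"\D", "", regex=True)

print(first_run.tolist())
print(digits_only.tolist())
```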


In [85]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split

# Clean branch_code (idempotent; repeats the step from the previous cell)
data['branch_code'] = data['branch_code'].str.extract(r'(\d+)', expand=False)

# Columns to one-hot encode; scaling is applied in a later cell
categorical_cols = ['branch', 'gender', 'category', 'fulfillment', 'seat_type', 'primary_seat_type',
                    'secondary_seat_type', 'score_type']

# One-hot encode the categorical columns
preprocessor = ColumnTransformer(
    transformers=[('cat', OneHotEncoder(), categorical_cols)],
    remainder='passthrough')  # pass rank, percentile and branch_code through unchanged

X = data.drop(columns=['college_name', 'enrollment_no'])  # Drop the target and the per-student identifier
y = data['college_name']

X_encoded = preprocessor.fit_transform(X)



In [86]:
from sklearn.preprocessing import StandardScaler

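# with_mean=False because StandardScaler cannot center a sparse matrix.
# The passthrough columns come last, in the order rank, percentile, branch_code,
# so this slice scales percentile and branch_code; rank is left unscaled.
scaler = StandardScaler(with_mean=False)
X_encoded[:, -2:] = scaler.fit_transform(X_encoded[:, -2:])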
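
The scaler above is fitted separately from `preprocessor`, so it is not part of the `preprocessor.pkl` file saved later, and inputs in `app.py` reach the model unscaled. One possible alternative, sketched here on toy data, is to declare the scaler inside the `ColumnTransformer` so that encoding and scaling travel together. The column choice and the `sparse_threshold=0.0` setting are assumptions made to keep the example easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data standing in for the real features.
toy = pd.DataFrame({
    "rank": [10.0, 20.0, 30.0],
    "percentile": [99.0, 95.0, 91.0],
    "gender": ["M", "F", "M"],
})

ct = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"]),
        ("num", StandardScaler(), ["rank", "percentile"]),
    ],
    sparse_threshold=0.0,  # always return a dense array (toy-sized data only)
)
scaled = ct.fit_transform(toy)

# Columns: gender_F, gender_M, scaled rank, scaled percentile.
print(scaled.round(3))
```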



In [87]:
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)
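
Because the target classes are unevenly sized, a stratified split is one option for keeping class proportions similar in the train and test sets. The sketch below uses made-up labels; `stratify` requires every class to have at least two samples, which would need to be checked on the real data first.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 80 samples of class "A" and 20 of class "B".
X = np.arange(100).reshape(-1, 1)
y = np.array(["A"] * 80 + ["B"] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The 80/20 class ratio is preserved in the 20-sample test set.
test_counts = Counter(y_te.tolist())
print(test_counts)
```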


In [88]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Add class_weight parameter to handle imbalance
model = RandomForestClassifier(n_estimators=50, random_state=42, class_weight='balanced')
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")


Accuracy: 48.31%
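
With several hundred candidate colleges, plain accuracy only counts an exact match. A complementary view is top-k accuracy, which counts a prediction as correct when the true class is among the k highest-scored classes. The sketch below uses hand-written scores (an assumption for illustration) and scikit-learn's `top_k_accuracy_score`.

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score

# Four samples, three classes; each row holds the scores for classes 0, 1, 2.
y_true = np.array([0, 1, 2, 2])
y_score = np.array([
    [0.6, 0.3, 0.1],  # true class 0 is ranked first
    [0.5, 0.4, 0.1],  # true class 1 is ranked second
    [0.2, 0.1, 0.7],  # true class 2 is ranked first
    [0.5, 0.3, 0.2],  # true class 2 is ranked third
])

top1 = top_k_accuracy_score(y_true, y_score, k=1)
top2 = top_k_accuracy_score(y_true, y_score, k=2)
print(top1, top2)
```

With the notebook's objects, the equivalent call would presumably be `top_k_accuracy_score(y_test, model.predict_proba(X_test), k=5, labels=model.classes_)`; the choice of `k=5` is arbitrary here.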


In [89]:
import joblib
joblib.dump(model, 'model.pkl')
joblib.dump(preprocessor, 'preprocessor.pkl')

['preprocessor.pkl']
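
Saving the model and the preprocessor as two files works, but the two must then be kept in sync by hand. A possible alternative is to wrap both in a single scikit-learn `Pipeline` and persist that one object. The following is a sketch on toy data; the file name `pipeline.pkl` and the step names `prep` and `clf` are illustrative assumptions.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy features and target standing in for the real dataset.
toy = pd.DataFrame({
    "rank": [10, 20, 30, 40],
    "gender": ["M", "F", "M", "F"],
})
target = pd.Series(["College A", "College A", "College B", "College B"])

pipeline = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"])],
        remainder="passthrough",
    )),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(toy, target)

# Round-trip through joblib and confirm the restored pipeline behaves the same.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.pkl")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

same = list(restored.predict(toy)) == list(pipeline.predict(toy))
print(same)
```

A loaded pipeline accepts the raw DataFrame directly, so the serving code would call `predict` once instead of transforming and predicting in two steps.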

In [90]:
!pip install streamlit -q
!pip install pyngrok



In [91]:
%%writefile app.py
import streamlit as st
import pandas as pd
import joblib

# Load the trained model and preprocessor
model = joblib.load('model.pkl')  # Ensure the model is saved and loaded using joblib
preprocessor = joblib.load('preprocessor.pkl')  # Load the preprocessor if necessary

# Define a function to make predictions
def predict_college(input_data):
    # Convert input data into DataFrame
    input_df = pd.DataFrame([input_data])

    # Preprocess the input data
    input_processed = preprocessor.transform(input_df)

    # Make prediction
    prediction = model.predict(input_processed)

    return prediction[0]

# Fulfillment mapping
fulfillment_mapping = {
    '*': 'Betterment in Choice Code',
    '@': 'Betterment in Seat Type',
    '~': 'No Change',
    '^': 'Admitted to Institute',
    '&': 'Newly Allotted'
}

# Reverse mapping for selection purposes
reverse_fulfillment_mapping = {v: k for k, v in fulfillment_mapping.items()}

# Streamlit UI
st.title('College Predictor')

# Input fields for each feature
rank = st.number_input('Rank', min_value=0, max_value=130000, value=20000)
percentile = st.slider('Percentile', 0.0, 100.0, 85.12)
branch = st.selectbox(
    'Branch',
    [
        'Civil Engineering', 'Computer Science and Engineering', 'Information Technology',
        'Electrical Engineering', 'Electronics and Telecommunication Engg',
        'Instrumentation Engineering', 'Mechanical Engineering', 'Food Technology',
        'Oil and Paints Technology', 'Paper and Pulp Technology',
        'Petro Chemical Engineering', 'Computer Engineering',
        'Electrical Engg[Electronics and Power]', 'Artificial Intelligence (AI) and Data Science',
        'Industrial IoT', 'Artificial Intelligence and Data Science', 'Chemical Engineering',
        'Textile Engineering / Technology',
        'Computer Science and Engineering(Data Science)', 'Production Engineering',
        'Textile Technology', 'Pharmaceutical and Fine Chemical Technology',
        'Electronics and Computer Engineering', 'Agricultural Engineering',
        'Computer Science and Design', 'Plastic and Polymer Engineering',
        'Computer Science and Engineering(Artificial Intelligence and Machine Learning)',
        'Electrical and Electronics Engineering',
        'Electrical Engg [Electrical and Power]', 'Electronics Engineering',
        'Mechanical & Automation Engineering',
        'Artificial Intelligence and Machine Learning', 'Safety and Fire Engineering',
        'Production Engineering[Sandwich]',
        'Electronics Engineering ( VLSI Design and Technology)', 'Computer Science and Technology',
        'Electronics and Communication Engineering', 'Data Science', 'Dyestuff Technology',
        'Oil,Oleochemicals and Surfactants Technology',
        'Pharmaceuticals Chemistry and Technology',
        'Fibres and Textile Processing Technology', 'Polymer Engineering and Technology',
        'Food Engineering and Technology', 'Surface Coating Technology',
        'Food Technology And Management', 'Mechatronics Engineering',
        'Civil and Infrastructure Engineering', 'Bio Medical Engineering',
        'Electronics and Computer Science',
        'Computer Science and Engineering (Internet of Things and Cyber Security Including Block Chain Technology)',
        'Cyber Security', 'Automobile Engineering', 'Internet of Things (IoT)',
        'Mechanical and Mechatronics Engineering (Additive Manufacturing)',
        'Computer Science and Engineering(Cyber Security)', 'Automation and Robotics',
        'Data Engineering', 'Plastic and Polymer Technology', 'Petro Chemical Technology',
        'Oil Technology', 'Computer Technology',
        'Computer Science and Engineering (Cyber Security)',
        'Computer Science and Engineering (IoT)',
        'Computer Science and Engineering (Artificial Intelligence)', 'Artificial Intelligence',
        'Aeronautical Engineering', 'Bio Technology',
        'Robotics and Artificial Intelligence', 'Mining Engineering',
        'Computer Science and Business Systems', 'Plastic Technology', 'Paints Technology',
        'Instrumentation and Control Engineering', 'Robotics and Automation',
        'Structural Engineering',
        'Electronics and Telecommunication Engg University, Jalgaon',
        'Computer Science and Engineering University, Jalgaon', 'Robotics',
        'Artificial Intelligence and Data Science University, Jalgaon',
        'Civil and Environmental Engineering', 'Manufacturing Science and Engineering',
        'Metallurgy and Material Technology', 'Computer Engineering (Regional Language)',
        'Automotive Technology', 'Computer Science and Information Technology',
        'Fashion Technology', 'Man Made Textile Technology', 'Textile Chemistry',
        'Textile Plant Engineering',
        'Computer Science and Engineering (Artificial Intelligence and Data Science)',
        'Electrical and Computer Engineering', 'Printing Technology',
        'Mechanical Engineering[Sandwich]', 'Agriculture Engineering'
    ]
)
gender = st.selectbox('Gender', ['M', 'F'])
category = st.selectbox(
    'Category',
    [
        'NT 2 (NT-C)', 'SC', 'OBC', 'DT/VJ', 'NT 1 (NT-B)', 'SC$', 'NT 1 (NT-B)$#',
        'SBC', 'OBC$', 'OBC#', 'ST', 'NT 3 (NT-D)', 'ST/DEF2', 'ST$', 'OBC/DEF1',
        'OPEN', 'Open/DEF3', 'SC/DEF1', 'SC/DEF2', 'OBC/PH1', 'Open/PH1', 'Open/DEF1',
        'OBC/DEF2', 'NT 3 (NT-D)#', 'SC$/DEF1', 'SBC$', 'DT/VJ$#', 'NT 1 (NT-B)/DEF1',
        'OBC$#', 'NT 1 (NT-B)$', 'NT 3 (NT-D)$#', 'DT/VJ$', 'ST/PH1', 'NT 3 (NT-D)$',
        'SBC/DEF1', 'NT 1 (NT-B)#', 'NT 2 (NT-C)$', 'NT 2 (NT-C)#', 'DT/VJ#',
        'NT 2 (NT-C)$#', 'DT/VJ/PH1', 'OBC$/DEF1', 'NT 2 (NT-C)/PH1', 'SC/PH1',
        'NT 3 (NT-D)/DEF1', 'Open/DEF2', 'SBC#', 'NT 2 (NT-C)/DEF1', 'NT 2 (NT-C)/DEF2',
        'SBC$#', 'OBC$/PH1', 'DT/VJ/DEF1', 'OBC$#/DEF1', 'OBC/DEF3', 'NT 1 (NT-B)/DEF2',
        'NT 1 (NT-B)/PH1', 'SBC/PH1', 'OBC#/DEF1', 'OBC#/PH1', 'ST/DEF1', 'SC/DEF3',
        'OBC$#/PH1', 'NT 2 (NT-C)$/PH1', 'NT 3 (NT-D)/PH1', 'SBC$/PH1', 'NT 3 (NT-D)/DEF2',
        'SBC#/DEF1', 'DT/VJ$/DEF1', 'OBC$#/DEF2', 'OBC$/DEF3', 'OBC$/DEF2', 'DT/VJ/DEF2',
        'SBC/DEF2'
    ]
)
fulfillment = st.selectbox('Fulfillment', list(fulfillment_mapping.values()))
seat_type = st.selectbox(
    'Seat Type',
    [
        'GOPENS', 'LOPENS', 'LSCS', 'LOBCS', 'GOBCS', 'GVJS', 'PWDOPENS', 'GSCS',
        'DEFOPENS', 'GNT2S', 'ORPHAN', 'GNT1S', 'GSTS', 'GNT3S', 'LVJS', 'LNT2S',
        'LSTS', 'DEFROBCS', 'EWS', 'TFWS', 'LNT1S', 'DEFRSCS', 'PWDROBCS', 'GOPENH',
        'LOPENH', 'LOBCH', 'GOBCH', 'GNT1H', 'GSTH', 'GSCH', 'GVJH', 'GSCO', 'GOBCO',
        'GOPENO', 'PWDOPENH', 'LSCH', 'GNT2H', 'LSTH', 'GNT3H', 'LOPENO', 'GVJO',
        'LOBCO', 'GSTO', 'GNT2O', 'LSCO', 'GNT1O', 'GNT3O', 'AI', 'LNT2H', 'LVJH',
        'LNT1H', 'LSTO', 'DEFOBCS', 'PWDOBCH', 'MI-MH', 'MI', 'MI-AI', 'LNT3S',
        'PWDRSCS', 'PWDOBCS', 'PWDRVJS', 'LNT2O', 'DEFRNT2S', 'PWDROBCH', 'PWDRSCH',
        'LNT3H', 'LNT3O', 'LVJO', 'DEFRNT1S', 'PWDRNT2S', 'PWDSCH', 'DEFSCS',
        'PWDSCS', 'DEFRNT3S', 'PWDRNT1S', 'LNT1O', 'PWDRNT3S'
    ]
)
primary_seat_type = st.selectbox('Primary Seat Type', ['State Level Seats', 'Maharashtra State Seats', 'All India Seats',
       'Maharashtra State Seats Allotted to All India Candidature Candidates'])
secondary_seat_type = st.selectbox('Secondary Seat Type', ['State Level Seats',
       'Home University Seats Allotted to Home University Candidates',
       'Home University Seats Allotted to Other Than Home University Candidates',
       'Other Than Home University Seats Allotted to Other Than Home University Candidates',
       'Economically Weaker Section Seats',
       'Other Than Home University Seats Allotted to Home University Candidates',
       'ORPHAN Seats',
       'All India Seats Allotted to All India Candidature Candidates with JEE(Main) Score',
       'All India Seats Allotted to All India Candidature Candidates with MHT-CET Score',
       'Maharashtra State Seats Allotted to All India Candidature Candidates with JEE(Main) Score',
       'Diploma Candidates',
       'Maharashtra State Seats Allotted to All India Candidature Candidates with MHT-CET Score'])

score_type = st.selectbox('Score Type', ['MHT-CET', 'JEE(Main)', 'Merit'])
# Button to trigger prediction
if st.button('Predict College'):
    # Create a dictionary of input values
    input_data = {
        'rank': rank,
        'percentile': percentile,
        'branch': branch,
        'gender': gender,
        'category': category,
        'fulfillment': reverse_fulfillment_mapping[fulfillment],  # Map back to key
        'seat_type': seat_type,
        'primary_seat_type': primary_seat_type,
        'secondary_seat_type': secondary_seat_type,
        'score_type': score_type,
        'branch_code': '100219110'  # Placeholder: hardcoded branch_code, not taken from user input
    }

    # Predict the college
    predicted_college = predict_college(input_data)

    # Display the result
    st.success(f"The predicted college is: {predicted_college}")


Overwriting app.py
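
The option lists in `app.py` are typed out by hand. As a hedged alternative, they could be read from the fitted encoder so the UI stays in step with the categories seen during training. The sketch below uses a toy frame and assumes, as in the notebook, a `ColumnTransformer` whose `'cat'` step is a `OneHotEncoder`; the `options` dictionary is an illustrative name.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data with two of the notebook's categorical columns.
toy = pd.DataFrame({
    "gender": ["M", "F", "M"],
    "score_type": ["MHT-CET", "JEE(Main)", "Merit"],
    "rank": [1, 2, 3],
})
categorical_cols = ["gender", "score_type"]

ct = ColumnTransformer(
    [("cat", OneHotEncoder(), categorical_cols)],
    remainder="passthrough",
).fit(toy)

# categories_ holds one sorted array per encoded column, in the same order.
encoder = ct.named_transformers_["cat"]
options = {
    col: list(cats) for col, cats in zip(categorical_cols, encoder.categories_)
}
print(options)
```

In the app, a select box could then be built with something like `st.selectbox('Gender', options['gender'])` after loading the saved preprocessor.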


In [92]:
!wget -q -O - ipv4.icanhazip.com

34.170.172.136


In [None]:
! streamlit run app.py & npx localtunnel --port 8501


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.170.172.136:8501

your url is: https://fifty-nights-follow.loca.lt


In [93]:
data['branch'].unique()

array(['Civil Engineering', 'Computer Science and Engineering',
       'Information Technology', 'Electrical Engineering',
       'Electronics and Telecommunication Engg',
       'Instrumentation Engineering', 'Mechanical Engineering',
       'Food Technology', 'Oil and Paints Technology',
       'Paper and Pulp Technology', 'Petro Chemical Engineering',
       'Computer Engineering', 'Electrical Engg[Electronics and Power]',
       'Artificial Intelligence (AI) and Data Science', 'Industrial IoT',
       'Artificial Intelligence and Data Science', 'Chemical Engineering',
       'Textile Engineering / Technology',
       'Computer Science and Engineering(Data Science)',
       'Production Engineering', 'Textile Technology',
       'Pharmaceutical and Fine Chemical Technology',
       'Electronics and Computer Engineering', 'Agricultural Engineering',
       'Computer Science and Design', 'Plastic and Polymer Engineering',
       'Computer Science and Engineering(Artificial Intelligence