This Jupyter Notebook provides a comprehensive analysis of the admission data for various engineering colleges in India, including IITs, NITs, IIITs, and GFTIs. The data spans multiple rounds of admissions, capturing opening and closing ranks for different categories and seat types. The goal is to understand trends, preferences, and patterns in the admission process.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
import warnings

# Ignore all warnings
warnings.filterwarnings("ignore")

"We focused on rounds 3, 4, and 5, as these final rounds provide more detailed information about which students are likely to secure a college seat based on their respective ranks."

In [None]:
round_5 = pd.read_csv("2024_Round_5.csv")
round_4 = pd.read_csv("2024_Round_4.csv")
round_3 = pd.read_csv("2024_Round_3.csv")
    

In [None]:
# Let's check the last round
round_5.head()

In [None]:
round_5.info()

In [None]:
round_4.info()

In [None]:
round_3.info()

"Converting the Opening Rank and Closing Rank data types from object (string) to integer values."








In [None]:
# The ranks are stored as strings (object dtype), so convert them to integers.
# Some ranks carry a trailing 'P', which is stripped before the conversion.
pd.set_option('display.float_format', '{:.2f}'.format)
for round_df in (round_5, round_4, round_3):
    for col in ['Opening Rank', 'Closing Rank']:
        round_df[col] = round_df[col].str.replace('P', '', regex=False).astype(float).astype(int)
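
Stripping the 'P' makes the column numeric, but it also hides which rows carried the suffix. The sketch below keeps that information in a separate flag. It assumes the suffix marks a rank taken from a separate (preparatory) rank list, which is not stated in this notebook; `split_rank` and the toy values are hypothetical.

```python
import pandas as pd

def split_rank(raw):
    """Return (rank, had_suffix) for a raw rank string such as '1234' or '56P'."""
    text = str(raw).strip()
    had_suffix = text.endswith('P')
    # Go through float first so strings like '789.0' also convert cleanly.
    return int(float(text.rstrip('P'))), had_suffix

# Toy values for illustration only; they are not taken from the dataset.
toy = pd.DataFrame({'Closing Rank': ['1234', '56P', '789.0']})
parsed = [split_rank(value) for value in toy['Closing Rank']]
toy['Rank'] = [rank for rank, _ in parsed]
toy['Had P Suffix'] = [flag for _, flag in parsed]
print(toy)
```

With the flag in place, suffixed rows can be filtered out or analysed separately instead of being mixed with the regular ranks.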

"Organizing the Opening and Closing Ranks for rounds 5, 4, and 3 into multi-level columns."








In [None]:
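# Step 1: Select the two rank columns (copy them so the rename below does not act on a view)
multi_columns = round_5[['Opening Rank', 'Closing Rank']].copy()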

# Step 2: Create the MultiIndex for these two columns
multi_columns.columns = pd.MultiIndex.from_tuples([('ROUND-5', 'Opening Rank'),('ROUND-5', 'Closing Rank')])

# Step 3: Drop the original columns and assign the multi-indexed ones
round_5.drop(['Opening Rank', 'Closing Rank'], axis=1, inplace=True)
round_5 = pd.concat([round_5, multi_columns], axis=1)


In [None]:
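# Step 1: Select the two rank columns (copy them so the rename below does not act on a view)
multi_columns = round_3[['Opening Rank', 'Closing Rank']].copy()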

# Step 2: Create the MultiIndex for these two columns
multi_columns.columns = pd.MultiIndex.from_tuples([('ROUND-3', 'Opening Rank'),('ROUND-3', 'Closing Rank')])

# Step 3: Drop the original columns and assign the multi-indexed ones
round_3.drop(['Opening Rank', 'Closing Rank'], axis=1, inplace=True)
round_3 = pd.concat([round_3, multi_columns], axis=1)

In [None]:
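# Step 1: Select the two rank columns (copy them so the rename below does not act on a view)
multi_columns = round_4[['Opening Rank', 'Closing Rank']].copy()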

# Step 2: Create the MultiIndex for these two columns
multi_columns.columns = pd.MultiIndex.from_tuples([('ROUND-4', 'Opening Rank'),('ROUND-4', 'Closing Rank')])

# Step 3: Drop the original columns and assign the multi-indexed ones
round_4.drop(['Opening Rank', 'Closing Rank'], axis=1, inplace=True)
round_4 = pd.concat([round_4, multi_columns], axis=1)

Creating a new DataFrame for cleaned data 

In [None]:
df = pd.DataFrame(columns = ['Institute','Course','Quota','Gender','Seat Type'])

In [None]:
df['Institute'] = round_5['Institute']
df['Course'] = round_5['Academic Program Name']
df['Quota'] = round_5['Quota']
df['Gender'] = round_5['Gender']
df['Seat Type'] = round_5['Seat Type']

In [None]:
df.head()

In [None]:
round_5.columns

In [None]:
round_4.columns

In [None]:
round_3.columns

In [None]:
df[('Round-5','Opening Rank')] = round_5[('ROUND-5', 'Opening Rank')]
df[('Round-5','Closing Rank')] = round_5[('ROUND-5', 'Closing Rank')]
df[('Round-4','Opening Rank')] = round_4[('ROUND-4', 'Opening Rank')]
df[('Round-4','Closing Rank')] = round_4[('ROUND-4', 'Closing Rank')]
df[('Round-3','Opening Rank')] = round_3[('ROUND-3', 'Opening Rank')]
df[('Round-3','Closing Rank')] = round_3[('ROUND-3', 'Closing Rank')]
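
The assignments above line the three rounds up by row index, which is only correct if every round file lists the same seats in the same order. A more defensive option is to merge on the identifying columns. This is a minimal sketch under assumptions: `combine_rounds` is a hypothetical helper, the toy ranks are made up, and flat column names such as `Round-5 Closing Rank` are used instead of the tuple names for simplicity.

```python
import pandas as pd

KEYS = ['Institute', 'Academic Program Name', 'Quota', 'Seat Type', 'Gender']

def combine_rounds(frames):
    """Merge per-round frames on the key columns, prefixing rank columns with the round label."""
    merged = None
    for label, frame in frames.items():
        renamed = frame.rename(columns={
            'Opening Rank': f'{label} Opening Rank',
            'Closing Rank': f'{label} Closing Rank',
        })
        merged = renamed if merged is None else merged.merge(renamed, on=KEYS, how='inner')
    return merged

# Toy data with made-up ranks; note that the second frame lists the rows in reverse order.
r5 = pd.DataFrame({
    'Institute': ['A', 'B'], 'Academic Program Name': ['X', 'Y'],
    'Quota': ['AI', 'AI'], 'Seat Type': ['OPEN', 'OPEN'],
    'Gender': ['Gender-Neutral', 'Gender-Neutral'],
    'Opening Rank': [1, 10], 'Closing Rank': [5, 20],
})
r4 = pd.DataFrame({
    'Institute': ['B', 'A'], 'Academic Program Name': ['Y', 'X'],
    'Quota': ['AI', 'AI'], 'Seat Type': ['OPEN', 'OPEN'],
    'Gender': ['Gender-Neutral', 'Gender-Neutral'],
    'Opening Rank': [11, 2], 'Closing Rank': [19, 4],
})
merged = combine_rounds({'Round-5': r5, 'Round-4': r4})
print(merged)
```

An inner merge silently drops seats that are missing from any round, so comparing `len(merged)` with the length of each input is a quick sanity check.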

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.isnull()

In [None]:
# Count the missing values in each column (the output shows there are none)
missing_values_per_column = df.isnull().sum()
print(missing_values_per_column)


"We included only gender-neutral data and removed female-only categories, as this provides a clearer picture of the overall competition."

In [None]:
df = df[df['Gender'] == 'Gender-Neutral']

In [None]:
df

In [None]:
df.info()


In [None]:
# Shorten the course names to avoid confusion
course = df['Course']

In [None]:
def clean_program_name(name):
    features = []
    
    if '4 Years' in name:
        features.append('4 Years')
    if '5 Years' in name:
        features.append('5 Years')
    if 'Dual Degree' in name:
        features.append('Dual Degree')

    base_name = name.split(' (')[0] 
    if features:
        return f"{base_name} ({', '.join(features)})"  
    else:
        return base_name  
    

In [None]:
course = course.apply(clean_program_name)

In [None]:
df['Course'] = course
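
To see what the shortening does, here is a small usage sketch. The function is repeated from the cell above so that the snippet runs on its own; the sample program names are illustrative and may not match the dataset's exact spellings.

```python
def clean_program_name(name):
    # Same logic as the notebook cell above, repeated so this snippet is self-contained.
    features = []
    if '4 Years' in name:
        features.append('4 Years')
    if '5 Years' in name:
        features.append('5 Years')
    if 'Dual Degree' in name:
        features.append('Dual Degree')
    base_name = name.split(' (')[0]
    if features:
        return f"{base_name} ({', '.join(features)})"
    return base_name

# Illustrative inputs (assumed format: "<branch> (<duration>, <degree>)").
samples = [
    'Computer Science and Engineering (4 Years, Bachelor of Technology)',
    'Mathematics and Computing (5 Years, Bachelor and Master of Technology (Dual Degree))',
    'Architecture',
]
cleaned = [clean_program_name(s) for s in samples]
for before, after in zip(samples, cleaned):
    print(f'{before!r} -> {after!r}')
```

Names without a parenthesised part pass through unchanged, and the degree title is dropped while the duration and the dual-degree marker are kept.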

## From here we separate the colleges into IITs, NITs, IIITs and GFTIs

In [None]:
IIT_df = df[:1702]
IIT_df

In [None]:
NIT_df = df[1702:5327]
NIT_df

In [None]:
IIIT_df = df[5327:5977]
IIIT_df

In [None]:
GFTI_df = df[5977:]
GFTI_df
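
The slices above rely on fixed row positions (1702, 5327, 5977), which shift as soon as the input file or the earlier filters change. A possible alternative is to classify each row by its institute name. This is a sketch under assumptions: `institute_group` is a hypothetical helper, the name patterns are guesses about how the institutes are spelled, and the sample names are illustrative.

```python
import pandas as pd

def institute_group(name, nit_like=()):
    """Classify an institute name as 'IIT', 'NIT', 'IIIT' or 'GFTI' (patterns are assumptions)."""
    if name in nit_like:
        return 'NIT'  # institutes treated as NITs by choice, e.g. IIEST Shibpur
    if 'Institute of Information Technology' in name:
        return 'IIIT'
    if 'Indian Institute of Technology' in name:
        return 'IIT'
    if 'National Institute of Technology' in name:
        return 'NIT'
    return 'GFTI'

# Illustrative names only; check df['Institute'].unique() for the real spellings.
names = pd.Series([
    'Indian Institute of Technology Bombay',
    'National Institute of Technology, Tiruchirappalli',
    'Indian Institute of Information Technology(IIIT) Dharwad',
    'Indian Institute of Engineering Science and Technology, Shibpur',
])
NIT_LIKE = {'Indian Institute of Engineering Science and Technology, Shibpur'}
groups = names.map(lambda n: institute_group(n, nit_like=NIT_LIKE))
print(groups.tolist())
```

On the real data this could be used as `df['Group'] = df['Institute'].map(...)` followed by `df[df['Group'] == 'IIT']`, with `value_counts()` on the new column as a check against the hard-coded boundaries.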

## This completes our data preprocessing: only the necessary details are kept for better understanding.

## IITs


In [None]:
# IITs admit on JEE Advanced ranks, so they cannot be compared directly with the other colleges

In [None]:
IIT_df.head()

In [None]:
# As there are no CRL ranks, we separate the data by 'Seat Type' and drop the PwD categories to keep the calculations simple


In [None]:
IIT_df['Seat Type'].value_counts()

In [None]:
IIT_df = IIT_df[IIT_df['Seat Type'].isin(['OPEN', 'EWS', 'OBC-NCL', 'ST', 'SC'])]

In [None]:
IIT_open_df = IIT_df[IIT_df['Seat Type'] == 'OPEN']
IIT_ews_df = IIT_df[IIT_df['Seat Type'] == 'EWS']
IIT_obc_df = IIT_df[IIT_df['Seat Type'] == 'OBC-NCL']
IIT_sc_df = IIT_df[IIT_df['Seat Type'] == 'SC']
IIT_st_df = IIT_df[IIT_df['Seat Type'] == 'ST']


In [None]:
IIT_open_df['Course'].value_counts()

In [None]:
IIT_open_df

## Here we calculate the average of the closing ranks over the last 3 rounds to get an idea of the rank range in which a specific branch of a college is attainable.

In [None]:
IIT_open_df['Average'] = (IIT_open_df[('Round-5', 'Closing Rank')] + 
                      IIT_open_df[('Round-4', 'Closing Rank')] + 
                      IIT_open_df[('Round-3', 'Closing Rank')]) / 3

Some Round-5 closing ranks are lower than the Round-4 and Round-3 ranks, so we use the average of the three rounds.
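
The same average can be computed with `mean(axis=1)`, and the rows where the closing rank went down in the final round can be flagged at the same time. This is a minimal sketch: the toy ranks are made up, and flat column names such as `R5 Closing` stand in for the tuple names used in this notebook.

```python
import pandas as pd

# Toy data with made-up ranks.
toy = pd.DataFrame({
    'Course': ['A', 'B'],
    'R3 Closing': [100, 500],
    'R4 Closing': [110, 520],
    'R5 Closing': [120, 480],
})

closing_cols = ['R3 Closing', 'R4 Closing', 'R5 Closing']
toy['Average'] = toy[closing_cols].mean(axis=1)        # row-wise mean of the three rounds
toy['R5 Dropped'] = toy['R5 Closing'] < toy['R4 Closing']  # True where Round 5 closed lower
print(toy)
```

Keeping the flag next to the average makes it easy to see how many programs are affected before deciding whether averaging is an acceptable way to smooth them out.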

In [None]:
IIT_open_df

In [None]:
top_10_courses = IIT_open_df.sort_values('Average').head(10)
top_10_courses

## Sorting by this average gives the top 10 courses that students prefer.

In [None]:
top_10_courses = IIT_open_df.sort_values('Average')['Course'].head(10)

# Display the top 10 courses
print(top_10_courses)

In [None]:
# It looks like Mathematics and Computing and CSE are the most preferred branches


In [None]:
# Build the mask from IIT_open_df itself so that it lines up with the frame being filtered
top_10_courses_df = IIT_open_df[IIT_open_df['Course'].isin(top_10_courses)]


In [None]:
print(top_10_courses_df.columns)

In [None]:
df_melted_top10 = top_10_courses_df.melt(id_vars=['Course'], 
                                         value_vars=[('Round-5', 'Closing Rank'), 
                                                     ('Round-4', 'Closing Rank'), 
                                                     ('Round-3', 'Closing Rank')], var_name='Round', value_name='Closing Rank')

plt.figure(figsize=(12, 6))
sns.barplot(data=df_melted_top10, x='Course', y='Closing Rank', hue='Round')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Course')
plt.ylabel('Closing Rank')
plt.title('Comparison of Closing Ranks across Rounds for Top 10 Courses')
plt.legend(title='Round')
plt.tight_layout()
plt.show()
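
The plot above depends on `melt`, which turns one column per round into one row per (course, round) pair so that the round can be used as the hue. A small sketch of that reshaping, using made-up ranks and flat column names instead of the tuple names:

```python
import pandas as pd

# Toy wide-format data with made-up ranks.
wide = pd.DataFrame({
    'Course': ['CSE', 'EE'],
    'Round-3': [100, 900],
    'Round-4': [105, 950],
    'Round-5': [110, 940],
})

long = wide.melt(
    id_vars=['Course'],
    value_vars=['Round-3', 'Round-4', 'Round-5'],
    var_name='Round',
    value_name='Closing Rank',
)
print(long)
```

Each course now appears once per round, which is the shape `sns.barplot(..., hue='Round')` expects.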


## Here is the bar plot showing the closing ranks of the top 10 branches and how their cutoffs vary across the 3 rounds.

# Surprisingly, the Round-5 cutoff is lower than the Round-4 and Round-3 cutoffs

## Possible Reasons

# Additional Seats Released:
Sometimes, institutions may release additional seats in later rounds due to various reasons such as higher-than-expected demand in earlier rounds or changes in seat availability.

# Dropouts:
If students who accepted seats in earlier rounds do not join, their seats may be reallocated in subsequent rounds, potentially leading to a lower closing rank.

# Reallocation:
If students withdraw from the program or accept other offers, institutions may readjust their intake, which could affect closing ranks.

# Changes in Eligibility:
There may be changes in eligibility criteria or reservation policies that can affect the distribution of seats in later rounds.


## CSE, Mathematics and Computing, and AI are the most preferred branches


In [None]:
program_counts = IIT_open_df.groupby('Institute').size().reset_index(name='Program Count')

plt.figure(figsize=(12, 6))
plt.barh(program_counts['Institute'], program_counts['Program Count'], color='lightgreen')
plt.xlabel('Number of Programs Offered')
plt.title('Number of Programs Offered by Each Institute')
plt.gca().invert_yaxis() 
plt.show()

In [None]:
IIT_df.columns

In [None]:
IIT_open_closing_df = IIT_open_df.sort_values(('Round-5', 'Closing Rank'),ascending=False)
plot_IIT_df =IIT_open_closing_df.drop_duplicates(subset='Institute')
plot_IIT_df
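
Sorting in descending order and then dropping duplicate institutes keeps, for each institute, the row with the largest closing rank. The sketch below shows the idiom on made-up data and cross-checks it against `groupby(...).idxmax()`; `R5 Closing` is a flat stand-in for the tuple column name.

```python
import pandas as pd

# Toy data with made-up ranks.
toy = pd.DataFrame({
    'Institute': ['X', 'X', 'Y', 'Y'],
    'Course': ['CSE', 'Civil', 'CSE', 'Arch'],
    'R5 Closing': [100, 9000, 300, 15000],
})

# Idiom used in the notebook: the first row per institute after a descending sort.
by_sort = toy.sort_values('R5 Closing', ascending=False).drop_duplicates(subset='Institute')

# Equivalent formulation: the index label of the maximum within each group.
by_group = toy.loc[toy.groupby('Institute')['R5 Closing'].idxmax()]

print(by_sort)
print(by_group)
```

Both return one row per institute; they differ only in row order, and possibly in which row is kept when two programs tie.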

In [None]:
plt.figure(figsize=(12, 6))
plt.barh(plot_IIT_df['Institute'], plot_IIT_df[('Round-5', 'Closing Rank')], color='skyblue')
plt.title('Closing Ranks by Institute')
plt.xlabel('Closing Rank')
plt.grid(axis='x')
plt.show()

## Since Round 5 is the last round for getting into an IIT, we analysed how the closing rank for the last seat differs across institutes.

In [None]:
plt.figure(figsize=(12, 6))
plt.barh(plot_IIT_df['Institute']+ ' - ' + plot_IIT_df['Course'], 
         plot_IIT_df[('Round-5', 'Closing Rank')], color='lightblue')
plt.xlabel('Closing Rank')
plt.title('Least Preferred Branches in Each College')
plt.gca().invert_yaxis()  
plt.show()

Here we show the least preferred branch in each institute.

Observation:-
    
    1) In some colleges, students prefer the college over the branch.
    2) Architecture and Materials/Metallurgical Engineering were the least preferred branches.
    
    

In [None]:
# Now let's get the most preferred branch

In [None]:
IIT_open_closing_df = IIT_open_df.sort_values(('Round-5', 'Closing Rank'),ascending=True)
plot_IIT_df =IIT_open_closing_df.drop_duplicates(subset='Institute')
plot_IIT_df

In [None]:
# All of them are Computer Science

In [None]:
IIT_open_closing_df = IIT_open_df.sort_values(('Round-5', 'Closing Rank'),ascending=True)
plot_IIT_df =IIT_open_closing_df.drop_duplicates(subset='Course').head(10)
plot_IIT_df

In [None]:
plt.figure(figsize=(12, 6))
plt.barh(plot_IIT_df['Institute']+ ' - ' + plot_IIT_df['Course'], 
         plot_IIT_df[('Round-5', 'Closing Rank')], color='lightblue')
plt.xlabel('Closing Rank')
plt.title('Top 10 Most Preferred Branches in Each College')
plt.gca().invert_yaxis()  
plt.show()

Here is the graph showing the closing ranks of the top 10 branches most preferred by students.

We observed that:-
    
    1) IIT-B CSE is highly competitive and closes first, followed by IIT-D CSE.
    2) All the closing ranks are under 1000, which tells us that the best college and branch combinations need a rank under 1000.
    3) Software-related branches top the list, which suggests that students are more inclined towards software development and reflects how the industry is growing.

## NITs vs IIIT vs some good GFTIs

In [None]:
# Why "IIIT" and not "IIITs"? You will see further down


In [None]:
# Only the good GFTIs that can compete with the NITs and IIITs

In [None]:
NIT_df.head()

In [None]:
# Removing the Home State (HS) quota and keeping Other State (OS), where the competition is higher

In [None]:
NIT_df = NIT_df[NIT_df['Quota'] == 'OS']

In [None]:
NIT_df = NIT_df[NIT_df['Seat Type'].isin(['OPEN', 'EWS', 'OBC-NCL', 'ST', 'SC'])]

In [None]:
NIT_df

In [None]:
NIT_open_df = NIT_df[NIT_df['Seat Type'] == 'OPEN']

In [None]:
NIT_open_df

In [None]:
# Most competitive course (lowest Round-5 closing rank) in each NIT
NIT_open_closing_df = NIT_open_df.sort_values(('Round-5', 'Closing Rank'),ascending=True)
plot_NIT_df = NIT_open_closing_df.drop_duplicates(subset='Institute')
plot_NIT_df
                                              

## Here we found something surprising: the Round-5 closing rank for Architecture drops drastically, from lakhs to a 3-digit number.

In [None]:
# As you can see, there are some Architecture courses where the Round-5 rank is lower than in Rounds 4 and 3
plt.figure(figsize=(12, 6))
plt.barh(plot_NIT_df['Institute'], plot_NIT_df[('Round-5', 'Closing Rank')], color='skyblue')
plt.title('Closing Ranks by Institute')
plt.xlabel('Closing Rank')
plt.grid(axis='x')
plt.show()

The graph tells us that NIT Trichy is the most competitive of all the NITs and NIT Mizoram the least.

Geographically, the NITs in the North East are less competitive, as their closing ranks are much higher than the average.


In [None]:
NIT_open_df

In [None]:
program_counts = NIT_open_df.groupby('Institute').size().reset_index(name='Program Count')

plt.figure(figsize=(12, 6))
plt.barh(program_counts['Institute'], program_counts['Program Count'], color='blue')
plt.xlabel('Number of Programs Offered')
plt.title('Number of Programs Offered by Each Institute')
plt.gca().invert_yaxis() 
plt.show()

NIT Rourkela offers many courses and still has one of the lowest closing ranks, which tells us that it is one of the best colleges.

## Let's compare the NITs to IIIT Dharwad


In [None]:
IIIT_df

In [None]:
IIIT_df = IIIT_df[IIIT_df['Seat Type'].isin(['OPEN', 'EWS', 'OBC-NCL', 'ST', 'SC'])]

In [None]:
IIIT_dwd = IIIT_df[IIIT_df['Institute'] == 'Indian Institute of Information Technology(IIIT) Dharwad']

In [None]:
IIIT_dwd_open = IIIT_dwd[IIIT_dwd['Seat Type'] == 'OPEN']

In [None]:
IIIT_dwd_open

## The closing rank of the institute is around 40K. Let's now compare it with the NITs.

In [None]:
# Check this against the NITs

In [None]:
df.columns

In [None]:
NIT_open_df

In [None]:
NIT_open_df.info()

In [None]:
IIIT_dwd_open.info()

In [None]:
# Keep the NIT programs whose Round-5 closing rank is below 40K,
# i.e. the ones that are more competitive than IIIT Dharwad.
NIT_greater_dwd = NIT_open_df[NIT_open_df[('Round-5', 'Closing Rank')] < 40000]
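
The 40K cutoff is read off the IIIT Dharwad table by eye. It could instead be derived from the data, so that it follows the table if the input changes. This is a sketch with made-up ranks; `R5 Closing` is a flat stand-in for the tuple column name, and using the largest closing rank as the reference is one possible choice, not necessarily the one intended here.

```python
import pandas as pd

# Toy frames with made-up ranks.
iiit_toy = pd.DataFrame({'Course': ['CSE', 'ECE'], 'R5 Closing': [30000, 40000]})
nit_toy = pd.DataFrame({
    'Institute': ['NIT A', 'NIT A', 'NIT B'],
    'Course': ['CSE', 'Civil', 'CSE'],
    'R5 Closing': [5000, 45000, 38000],
})

# Use the reference institute's largest (least competitive) closing rank as the cutoff.
threshold = int(iiit_toy['R5 Closing'].max())
more_competitive = nit_toy[nit_toy['R5 Closing'] < threshold]
print(threshold)
print(more_competitive)
```

Swapping `max()` for `min()` or `median()` gives a stricter comparison, depending on which IIIT Dharwad branch should serve as the benchmark.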

In [None]:
NIT_greater_dwd

In [None]:
# Extract unique values from the 'Institute' column in NIT_greater_dwd
better_colleges = NIT_greater_dwd['Institute'].drop_duplicates().reset_index(drop=True)

better_colleges


We treated IIEST Shibpur as an NIT because its closing ranks are close to NIT level.

## So here we got an amazing insight: every NIT has at least one program with a closing rank under 40K, which means that someone who is here with a rank around 40K could have got a seat in an NIT.

In [None]:
NIT_open_df['Institute'].nunique()
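
Comparing two counts by eye works, but the claim can also be checked directly by listing the institutes that have no program under the cutoff. A minimal sketch with made-up ranks and a flat `R5 Closing` column in place of the tuple name:

```python
import pandas as pd

# Toy data with made-up ranks; 'NIT C' has no program under the cutoff.
nit_toy = pd.DataFrame({
    'Institute': ['NIT A', 'NIT A', 'NIT B', 'NIT C'],
    'R5 Closing': [5000, 45000, 38000, 52000],
})
threshold = 40000

all_institutes = set(nit_toy['Institute'])
under_cutoff = set(nit_toy.loc[nit_toy['R5 Closing'] < threshold, 'Institute'])
missing = sorted(all_institutes - under_cutoff)
print(missing)
```

An empty `missing` list on the real data would confirm that every institute in the NIT group has at least one program closing below the cutoff.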

In [None]:
# Let's check with CSE, as it is the most prioritized branch (Computer Science and Engineering (4 Years))

In [None]:
NIT_cse_df = NIT_open_df[NIT_open_df['Course'] == 'Computer Science and Engineering (4 Years)']

In [None]:
NIT_cse_df

In [None]:
# CSE at any NIT is preferred over our college

In [None]:
# If you are getting a CSE seat in an NIT, I can't understand why you wouldn't take it; it's a golden opportunity! Also, there's no mention of caste here.

## Since all the NITs are more competitive than IIIT Dharwad, we will now check the GFTIs

In [None]:
GFTI_df

In [None]:
GFTI_df = GFTI_df[GFTI_df['Seat Type'].isin(['OPEN', 'EWS', 'OBC-NCL', 'ST', 'SC'])]

In [None]:
GFTI_open_df = GFTI_df[GFTI_df['Seat Type'] == 'OPEN']

In [None]:
GFTI_open_df

In [None]:
# Since these are GFTIs, let's disregard the Home State, Other State, and All India quotas and focus on the remaining criteria.

In [None]:
GFTI_greater_dwd = GFTI_open_df[GFTI_open_df[('Round-5', 'Closing Rank')] < 40000]

In [None]:
GFTI_greater_dwd

In [None]:
# Some Round-5 closing ranks are lower than the Round-3 and Round-4 ranks, so I will remove those rows

In [None]:
GFTI_df.columns

In [None]:
# Keep only the rows whose closing rank rises strictly from Round 3 to Round 4 to Round 5
cleaned_GFTI_greater_dwd = GFTI_greater_dwd[
    (GFTI_greater_dwd[('Round-5', 'Closing Rank')] > GFTI_greater_dwd[('Round-4', 'Closing Rank')]) &
     (GFTI_greater_dwd[('Round-5', 'Closing Rank')] > GFTI_greater_dwd[('Round-3', 'Closing Rank')]) &
     (GFTI_greater_dwd[('Round-4', 'Closing Rank')] > GFTI_greater_dwd[('Round-3', 'Closing Rank')])
]


cleaned_GFTI_greater_dwd
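
The filter above uses strict `>` comparisons, so it also drops programs whose closing rank stayed the same between two rounds. Whether that is intended is not stated here. The sketch below, on made-up data with flat column names, contrasts the strict filter with a non-decreasing one that keeps ties.

```python
import pandas as pd

# Toy data with made-up ranks: one rising row, one row with a tie, one row that drops in Round 5.
toy = pd.DataFrame({
    'Course': ['Rising', 'Tie', 'Drop'],
    'R3 Closing': [100, 200, 300],
    'R4 Closing': [110, 200, 310],
    'R5 Closing': [120, 210, 290],
})

# Strict version (same effect as the three conditions above; R5 > R3 follows from the other two).
strict = toy[(toy['R5 Closing'] > toy['R4 Closing']) & (toy['R4 Closing'] > toy['R3 Closing'])]

# Non-decreasing version: removes only the rows where the rank actually went down.
non_decreasing = toy[(toy['R3 Closing'] <= toy['R4 Closing']) & (toy['R4 Closing'] <= toy['R5 Closing'])]

print(strict['Course'].tolist())
print(non_decreasing['Course'].tolist())
```

If the goal is only to remove the anomalous rows, the non-decreasing form loses fewer programs.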


In [None]:
cleaned_GFTI_greater_dwd['Institute'].unique()

In [None]:
# If you are getting a seat in any of these colleges, you should definitely take it. It's a golden opportunity!

"This is entirely based on students' preferences, so we can't definitively say which option 
is the best or not—though in some cases, we might have an idea 😄."

"IIIT Dharwad is an excellent college, and we have amazing seniors who are always ready to guide you if you need anything. So, please don’t feel discouraged by this—hard work in college is essential to achieving success."