# **About this notebook**

A simple analysis of the IIT-NIT category-wise cut-off data.
The data comes from https://cutoffs.iitr.ac.in/.
Dataset citation: Rajdeep Ghosh. (2022). *IIT-NIT category-wise cutoff data* [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/3950564
    
**Note**: According to the source website, only the final-round opening and closing ranks are shown for previous years, whereas every available round is shown for the current year. An analysis that aggregates the entire dataset across years (e.g. an average across years) might therefore not reflect the actual cut-offs.
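
One way to respect this caveat before aggregating is to keep only the last available round of each year. The sketch below uses a small made-up frame: the column names `year`, `round_no` and `closing_rank` are the ones used later in this notebook, while the values and the name `final_round_only` are illustrative assumptions. For the current year, the highest round present may not yet be the true final round.

```python
import pandas as pd

# Toy data (made up): 2020 has only its final round, 2021 has several rounds.
toy = pd.DataFrame({
    "year":         [2020, 2021, 2021, 2021],
    "round_no":     [6,    1,    2,    3],
    "closing_rank": [500,  400,  450,  480],
})

# Highest round number seen in each year, aligned with the original rows.
last_round = toy.groupby("year")["round_no"].transform("max")

# Keep only the rows that belong to that last round.
final_round_only = toy[toy["round_no"] == last_round]

# Averages are now computed over comparable (last-round) rows only.
print(final_round_only.groupby("year")["closing_rank"].mean())
```

The same filter could be applied to `cutoffs` before any cross-year average is taken.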

### Let's begin by importing the required libraries

In [1]:
# This notebook runs in the kaggle/python Docker image:
# https://github.com/kaggle/docker-python

import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px  # interactive plots

# Input data files are available in the read-only "../input/" directory.
# List every file under the input directory.
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

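**Note**: When this cell was run, it failed with `ModuleNotFoundError: No module named 'plotly'`. If you hit the same error, install plotly (for example with `pip install plotly`) and re-run the cell.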

### 1. Create the Data Frame

In [None]:
cutoffs = pd.read_csv('/kaggle/input/iit-nit-data/data.csv')
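
Before going further, it can help to confirm that the loaded file has the columns the rest of the notebook relies on. The following is a minimal sketch under stated assumptions: the column names are the ones referenced in later cells, but the one-row CSV (including the `Gender-Neutral` pool value and the institute and program names) is made up so that the example runs without the Kaggle dataset.

```python
import io

import pandas as pd

# Columns referenced later in this notebook.
expected_columns = {
    "year", "round_no", "institute_type", "institute_short",
    "degree_short", "program_name", "pool", "is_preparatory",
    "opening_rank", "closing_rank",
}

# Stand-in for the real CSV: a single made-up row.
sample_csv = io.StringIO(
    "year,round_no,institute_type,institute_short,degree_short,program_name,"
    "pool,is_preparatory,opening_rank,closing_rank\n"
    "2021,6,IIT,IIT-Example,B.Tech,Example Engineering,Gender-Neutral,0,10,99\n"
)
sample = pd.read_csv(sample_csv)

# Any expected column that the file does not provide.
missing_columns = expected_columns - set(sample.columns)
print("Missing columns:", sorted(missing_columns))
```

Running the same set difference against `cutoffs.columns` would flag a renamed or missing column early.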

### 2. Details of the data frame 


In [None]:
cutoffs.info()

### 3. Begin the Analysis!
#### The years for which data is available:

In [None]:
list(cutoffs['year'].unique())

#### The Institutes for which data is available

In [None]:
cutoffs['institute_short'].unique()

#### The number of colleges (IITs and NITs) that have offered a particular program


In [None]:
(
    cutoffs[['degree_short', 'program_name', 'institute_short']]
    .drop_duplicates()
    .groupby(['degree_short', 'program_name'])
    .size()
    .reset_index(name='no_of_institutes')
)
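
The de-duplicate-then-count pattern above can also be written with `nunique()`. The sketch below compares the two on a made-up frame; the program and institute names are placeholders, not values from the dataset.

```python
import pandas as pd

# Toy data (made up): Inst-1 appears twice for Program A.
toy = pd.DataFrame({
    "degree_short":    ["B.Tech", "B.Tech", "B.Tech", "B.Tech"],
    "program_name":    ["Program A", "Program A", "Program A", "Program B"],
    "institute_short": ["Inst-1", "Inst-1", "Inst-2", "Inst-1"],
})

# Approach used above: de-duplicate, then count rows per program.
by_size = (
    toy[["degree_short", "program_name", "institute_short"]]
    .drop_duplicates()
    .groupby(["degree_short", "program_name"])
    .size()
    .reset_index(name="no_of_institutes")
)

# Alternative: count distinct institutes per program directly.
by_nunique = (
    toy.groupby(["degree_short", "program_name"])["institute_short"]
    .nunique()
    .reset_index(name="no_of_institutes")
)

print(by_size)
print(by_nunique)
```

Both approaches give the same counts; which one reads better is largely a matter of taste.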

#### No. of Colleges offering a particular program year-wise based on available data


In [None]:
(
    cutoffs[['degree_short', 'program_name', 'institute_short', 'year']]
    .drop_duplicates()
    .groupby(['degree_short', 'program_name', 'year'])
    .size()
    .reset_index(name='no_of_institutes')
)

#### Verifying the counts above: a spot check on the Int Msc. Mathematics program for the year 2021

In [None]:
# Boolean-mask form of the same filter, kept for reference:
# cutoffs[(cutoffs['program_name']=='Mathematics') & (cutoffs['degree_short']== 'Int Msc.') & (cutoffs['year']==2021)]
cutoffs.query('program_name == "Mathematics" & degree_short == "Int Msc." & year == 2021')['institute_short'].unique()

The query returns only one institute name, which agrees with the count in the previous result.
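
The same spot check can be expressed as a comparison between the grouped count and an independent `query()` lookup. This is a sketch on made-up rows: the column names and the `Int Msc.` / `Mathematics` labels come from the cells above, while the institute names are placeholders.

```python
import pandas as pd

# Toy data (made up): one institute offers Int Msc. Mathematics in 2021.
toy = pd.DataFrame({
    "degree_short":    ["Int Msc.", "Int Msc.", "B.Tech"],
    "program_name":    ["Mathematics", "Mathematics", "Mathematics"],
    "institute_short": ["Inst-1", "Inst-1", "Inst-2"],
    "year":            [2021, 2021, 2021],
})

# Institutes per (degree, program, year), as in the year-wise table.
counts = (
    toy[["degree_short", "program_name", "institute_short", "year"]]
    .drop_duplicates()
    .groupby(["degree_short", "program_name", "year"])
    .size()
)
grouped_count = counts.loc[("Int Msc.", "Mathematics", 2021)]

# The same number obtained independently through query().
institutes = toy.query(
    'program_name == "Mathematics" and degree_short == "Int Msc." and year == 2021'
)["institute_short"].unique()

print(grouped_count, len(institutes))
```

If the two numbers ever disagree, the grouping keys or the de-duplication step are the first things to inspect.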

#### Cut-offs for preparatory courses are available for these years

In [None]:
cutoffs[cutoffs['is_preparatory']==1]['year'].unique()

#### The different rounds for which cutoffs are available - per year

In [None]:
cutoffs[['round_no','year']].groupby(['year','round_no']).size().reset_index(name = 'counts')

This shows that only the final round is available for previous years.
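
A wide year-by-round table can make this pattern easier to see at a glance. The sketch below uses `pd.crosstab` on made-up rows; only the column names `year` and `round_no` are taken from the notebook.

```python
import pandas as pd

# Toy data (made up): 2020 has one round, 2021 has two.
toy = pd.DataFrame({
    "year":     [2020, 2020, 2021, 2021, 2021],
    "round_no": [6,    6,    1,    2,    2],
})

# Rows per (year, round) laid out as a table; absent combinations become 0.
round_table = pd.crosstab(toy["year"], toy["round_no"])
print(round_table)

# Number of distinct rounds present in each year.
rounds_per_year = toy.groupby("year")["round_no"].nunique()
print(rounds_per_year)
```

A year with a single non-zero column in `round_table` has only one round in the data.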

### 4. Let's look at some interesting information!

**Reservation for girls**: Since 2018, a reservation for female candidates has been implemented at the IITs to increase the admission of women to the premier engineering colleges in the country. This doesn't affect the number of seats already available to boys.

Based on this, let's look at how admissions to different programs in this category have increased over the years across institutes.

#### For ease of analysis, let's create another dataframe with 'Female-Only' pool data


In [None]:
GirlsOnlySeats = cutoffs[cutoffs['pool']=='Female-Only'].groupby(['institute_short','year']).size().reset_index(name='seat_type_counts')
GirlsOnlySeats.head(5)
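
Note that `seat_type_counts` counts rows in the Female-Only pool for each institute and year, not individual seats. To compare years side by side, the long table can be pivoted into one column per year. The sketch below does this on made-up counts; the frame `toy_counts` and its values are illustrative assumptions with the same columns as `GirlsOnlySeats`.

```python
import pandas as pd

# Toy data (made up) shaped like GirlsOnlySeats.
toy_counts = pd.DataFrame({
    "institute_short":  ["Inst-1", "Inst-1", "Inst-2"],
    "year":             [2018, 2019, 2019],
    "seat_type_counts": [4, 6, 5],
})

# One row per institute, one column per year; missing years become 0.
wide = (
    toy_counts
    .pivot(index="institute_short", columns="year", values="seat_type_counts")
    .fillna(0)
    .astype(int)
)
print(wide)
```

A zero in a year column then marks an institute with no Female-Only rows for that year.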

#### Let's plot the Female-Only pool admission type counts over the years
Click the entries in the *Year* legend to show all years or only selected ones.

In [None]:
fig = px.line(GirlsOnlySeats, x="institute_short", y="seat_type_counts", color='year',markers=True, labels={'institute_short':'Institute Name', 'seat_type_counts':'Female Only Admission Type#', 'year' :'Year'})
fig.show()

Selecting only the year 2018 shows that the NITs have no seats in this category. From 2019 onwards, such seats exist across all colleges.
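
This observation can also be checked without the plot by tabulating Female-Only rows per institute type and year. The following is a sketch on made-up rows: `institute_type`, `pool`, `year` and the `Female-Only` label come from the notebook, whereas the `Gender-Neutral` label and all values are assumptions for illustration.

```python
import pandas as pd

# Toy data (made up): no Female-Only row for NITs in 2018.
toy = pd.DataFrame({
    "institute_type": ["IIT", "IIT", "NIT", "IIT", "NIT", "NIT"],
    "pool":           ["Female-Only", "Gender-Neutral", "Gender-Neutral",
                       "Female-Only", "Female-Only", "Gender-Neutral"],
    "year":           [2018, 2018, 2018, 2019, 2019, 2019],
})

female_only = toy[toy["pool"] == "Female-Only"]

# Female-Only rows per institute type and year; absent combinations become 0.
rows_by_type_year = pd.crosstab(female_only["institute_type"], female_only["year"])
print(rows_by_type_year)
```

On the real data, a zero in the NIT row for 2018 would confirm what the plot suggests.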

### 5. Analyse the cut-offs for 2021!

Analysing the cut-offs of NITs and IITs together, for example by taking the average opening or closing rank per year, might give a misleading picture, because the range of ranks (i.e. the min/max values) is very wide. So let's look at them separately.
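
A small numeric illustration of why a pooled average can mislead: with made-up closing ranks on very different scales, the pooled mean lands far from either group. Only the column names `institute_type` and `closing_rank` are taken from the notebook.

```python
import pandas as pd

# Toy data (made up): two groups on very different scales.
toy = pd.DataFrame({
    "institute_type": ["IIT", "IIT", "NIT", "NIT"],
    "closing_rank":   [100, 300, 20000, 60000],
})

# One average over everything ...
pooled_mean = toy["closing_rank"].mean()

# ... versus one summary per institute type.
per_type = toy.groupby("institute_type")["closing_rank"].agg(["mean", "median"])

print("Pooled mean:", pooled_mean)
print(per_type)
```

The pooled mean describes neither group well, which is the motivation for splitting by institute type.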

#### Let's look at the cut-offs for B.Tech programs from IITs for the year 2021

##### Creating a new dataframe for ease of analysis with just the required data

In [None]:
iit_cutoffs_2021 = cutoffs[(cutoffs["year"] == 2021)  & (cutoffs["institute_type"]=='IIT') & (cutoffs['degree_short']=='B.Tech')]


##### Let's look at the range of opening ranks

In [None]:
print('Min:',iit_cutoffs_2021['opening_rank'].min(),'Max:',iit_cutoffs_2021['opening_rank'].max())

##### Box plot of opening ranks for each IIT across B.Tech programs (irrespective of category)

In [None]:
fig = px.box(iit_cutoffs_2021, x="opening_rank", y="institute_short",hover_data=['degree_short','program_name'], labels={'institute_short':'Institute Name', 'opening_rank':'Opening Rank'})
fig.show()

We can hover over the box plot to get the range, median, IQR and outlier opening rank details.
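
The quantities shown on hover can also be computed as a table. The sketch below derives quartiles, the IQR and the conventional 1.5 × IQR fences per institute on made-up ranks. It uses pandas' default quantile interpolation, so the numbers may differ slightly from the ones Plotly displays, since quartile conventions vary between libraries.

```python
import pandas as pd

# Toy data (made up): Inst-1 contains one extreme value.
toy = pd.DataFrame({
    "institute_short": ["Inst-1"] * 5 + ["Inst-2"] * 5,
    "opening_rank":    [1, 2, 3, 4, 100, 10, 20, 30, 40, 50],
})

# Quartiles per institute, one column per quantile.
quartiles = (
    toy.groupby("institute_short")["opening_rank"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ["q1", "median", "q3"]

# Interquartile range and the usual 1.5 * IQR outlier fences.
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
quartiles["lower_fence"] = quartiles["q1"] - 1.5 * quartiles["iqr"]
quartiles["upper_fence"] = quartiles["q3"] + 1.5 * quartiles["iqr"]
print(quartiles)

# Rows that fall outside their institute's fences.
with_fences = toy.join(quartiles[["lower_fence", "upper_fence"]], on="institute_short")
outliers = with_fences[
    (with_fences["opening_rank"] < with_fences["lower_fence"])
    | (with_fences["opening_rank"] > with_fences["upper_fence"])
]
print(outliers[["institute_short", "opening_rank"]])
```

Swapping `opening_rank` for `closing_rank` gives the corresponding table for closing ranks.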

##### Box plot of closing ranks for each IIT across B.Tech programs (irrespective of category)

In [None]:
fig = px.box(iit_cutoffs_2021, y="institute_short", x="closing_rank", hover_data=['degree_short','program_name'],labels={'institute_short':'Institute Name', 'closing_rank':'Closing Rank'})
fig.show()

We can hover over the box plot to get the range, median, IQR and outlier closing rank details.


#### Let's look at the cut-offs for B.Tech programs from NITs for the year 2021
##### Creating a new dataframe for ease of analysis with just the required data

In [None]:
nit_cutoffs_2021 = cutoffs[(cutoffs["year"] == 2021) & (cutoffs["institute_type"]=='NIT') & (cutoffs['degree_short']=='B.Tech')]

##### Let's look at the range of opening ranks

In [None]:
print('Min:',nit_cutoffs_2021['opening_rank'].min(),'Max:',nit_cutoffs_2021['opening_rank'].max())

Comparing this range with that of the IITs shows that an aggregate analysis that ignores the institute type might not be appropriate in all cases.
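
The two ranges can be placed next to each other in a single table by grouping on `institute_type`. This is a sketch on made-up opening ranks; the `spread` column is a name introduced here for illustration.

```python
import pandas as pd

# Toy data (made up).
toy = pd.DataFrame({
    "institute_type": ["IIT", "IIT", "NIT", "NIT"],
    "opening_rank":   [1, 900, 50, 40000],
})

# Minimum and maximum opening rank per institute type.
rank_range = toy.groupby("institute_type")["opening_rank"].agg(["min", "max"])

# Width of each range, for a direct comparison.
rank_range["spread"] = rank_range["max"] - rank_range["min"]
print(rank_range)
```

On the real data this would replace the two separate `print` calls with one comparison table.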

##### Box plot of opening ranks for each NIT across B.Tech programs (irrespective of category)
As the range of opening_rank values is huge, the y-axis uses a logarithmic scale.

In [None]:
fig = px.box(nit_cutoffs_2021, y="opening_rank", x="institute_short",hover_data=['degree_short','program_name'],labels={'institute_short':'Institute Name', 'opening_rank':'Opening Rank'} )
fig.update_yaxes(type='log',tick0=0, dtick=2)
fig.show()
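
On a Plotly log axis, `tick0=0` places the first tick at 10^0 = 1 and `dtick=2` places further ticks every two powers of ten (1, 100, 10000, ...). The sketch below illustrates, with made-up ranks, why a log scale helps here: it compares where each value would sit on a linear axis and on a log axis, both normalised to the span 0 to 1.

```python
import numpy as np

# Made-up ranks spanning several orders of magnitude.
ranks = np.array([10, 1_000, 100_000])

# Relative position on a linear axis running from 0 to the maximum.
linear_position = ranks / ranks.max()

# Relative position on a log10 axis running from the minimum to the maximum.
log_ranks = np.log10(ranks)
log_position = (log_ranks - log_ranks.min()) / (log_ranks.max() - log_ranks.min())

print("linear:", linear_position)
print("log:   ", log_position)
```

On the linear axis the two smaller values are squeezed against zero, whereas on the log axis the three values are evenly spaced.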

We can hover over the box plot to get the range, median, IQR and outlier opening rank details.

##### Box plot of closing ranks for each NIT across B.Tech programs (irrespective of category)
As the range of closing_rank values is wide, the y-axis uses a logarithmic scale.

In [None]:
fig = px.box(nit_cutoffs_2021, y="closing_rank", x="institute_short",hover_data=['degree_short','program_name'],labels={'institute_short':'Institute Name', 'closing_rank':'Closing Rank'} )
fig.update_yaxes(type='log',tick0=0, dtick=2)
fig.show()

We can hover over the box plot to get the range, median, IQR and outlier closing rank details.

### 6. Conclusion

From this simple analysis, we could see the following:
1. Admissions in the Female-Only pool are increasing with each passing year, thereby helping to address the gender gap in premier institutes and promoting higher education for women.
2. The range of cut-offs differs greatly between IITs and NITs. E.g. the maximum closing rank for B.Tech programs in the year 2021, without considering any quotas/categories, was 22715 across IITs and 802997 across NITs.
3. Older institutes of each type (e.g. IIT Bombay, IIT Madras, IIT Kharagpur; NIT Surathkal, Rourkela, Allahabad) have a narrower range of ranks and higher cut-offs (i.e. numerically lower ranks) than newer institutes of the same type.
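
To see which institute and program produce an extreme value such as the maximum closing ranks quoted above, `idxmax()` can be used to pull out the full row for each institute type. The following is a sketch on made-up rows; the institute and program names are placeholders.

```python
import pandas as pd

# Toy data (made up).
toy = pd.DataFrame({
    "institute_type":  ["IIT", "IIT", "NIT", "NIT"],
    "institute_short": ["Inst-A", "Inst-B", "Inst-C", "Inst-D"],
    "program_name":    ["Program X", "Program Y", "Program X", "Program Z"],
    "closing_rank":    [120, 900, 15000, 42000],
})

# Index label of the largest closing rank within each institute type.
max_index = toy.groupby("institute_type")["closing_rank"].idxmax()

# Full rows behind those maxima.
max_rows = toy.loc[max_index]
print(max_rows)
```

Inspecting these rows on the real data shows whether an extreme maximum comes from a single program or category.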