Write a Python script that takes these inputs:
1. Rank 
2. Seat Type 
3. Gender 

The output of the script should be the list of college branches that the rank will get selected into using the dataset.

BONUS question - Suppose you are given 2 such files. How would your script change in that case? Explain how you would find the chances of selection for a rank in a branch in that case.

<br>
<br>
<br>

In [None]:
# dependencies
! pip install pandas

In [1]:
import pandas as pd

<br>
<br>
<br>

In [2]:
# loading data
data = pd.read_csv("../data/round1_josaa_22.csv")
data.head()

Unnamed: 0,Institute,Academic Program Name,Quota,Seat Type,Gender,Opening Rank,Closing Rank,Round
0,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN,Gender-Neutral,9193,11771,1
1,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN,Female-only (including Supernumerary),16138,20164,1
2,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,EWS,Gender-Neutral,1605,1744,1
3,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,EWS,Female-only (including Supernumerary),3159,3159,1
4,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OBC-NCL,Gender-Neutral,3997,4297,1


In [3]:
# checking shape
data.shape

(10022, 8)

In [4]:
# checking datatypes
data.dtypes

Institute                object
Academic Program Name    object
Quota                    object
Seat Type                object
Gender                   object
Opening Rank             object
Closing Rank             object
Round                     int64
dtype: object

In [5]:
# checking duplicate values
data.duplicated().any()

False

In [6]:
# checking null values
data.isna().any().any()

False

<br>
<br>
<br>

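Assumption:
- The rank columns are stored as strings because some values end with the letter P; strip that letter so the ranks can be converted to integers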

In [7]:
# clean rank feature: drop a trailing 'P' (if any) and convert to int
def rank_to_int(x):
    return int(str(x).rstrip('P'))

In [8]:
# convert to int

data['Opening Rank'] = data['Opening Rank'].apply(rank_to_int)
data['Closing Rank'] = data['Closing Rank'].apply(rank_to_int)

In [9]:
data.head(2)

Unnamed: 0,Institute,Academic Program Name,Quota,Seat Type,Gender,Opening Rank,Closing Rank,Round
0,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN,Gender-Neutral,9193,11771,1
1,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN,Female-only (including Supernumerary),16138,20164,1


<br>
<br>

In [10]:
# records with opening rank greater than closing rank
data[data['Opening Rank'] > data['Closing Rank']]

Unnamed: 0,Institute,Academic Program Name,Quota,Seat Type,Gender,Opening Rank,Closing Rank,Round
299,Indian Institute of Technology Bombay,Metallurgical Engineering and Materials Scienc...,AI,OPEN (PwD),Gender-Neutral,169,60,1
914,Indian Institute of Technology Kharagpur,"Mechanical Engineering (4 Years, Bachelor of T...",AI,OPEN (PwD),Gender-Neutral,113,87,1
1284,Indian Institute of Technology Kanpur,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN (PwD),Gender-Neutral,102,58,1
1368,Indian Institute of Technology Kanpur,"Mechanical Engineering (4 Years, Bachelor of T...",AI,OPEN (PwD),Gender-Neutral,160,57,1
1862,Indian Institute of Technology Roorkee,"Mechanical Engineering (4 Years, Bachelor of T...",AI,OPEN (PwD),Gender-Neutral,157,88,1
1963,Indian Institute of Technology (ISM) Dhanbad,Electronics and Communication Engineering (4 Y...,AI,OPEN (PwD),Gender-Neutral,176,90,1


Removing the above records, where the opening rank is greater than the closing rank

In [11]:
data = data[~(data['Opening Rank'] > data['Closing Rank'])]
data.head(2)

Unnamed: 0,Institute,Academic Program Name,Quota,Seat Type,Gender,Opening Rank,Closing Rank,Round
0,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN,Gender-Neutral,9193,11771,1
1,Indian Institute of Technology Bhubaneswar,"Civil Engineering (4 Years, Bachelor of Techno...",AI,OPEN,Female-only (including Supernumerary),16138,20164,1


In [12]:
data.shape

(10016, 8)

<br>
<br>

Assumptions:
- The rank of the candidate should be between Opening Rank and Closing Rank (both included)
- If gender is male, show 'Gender-Neutral' seats only
- If gender is female, show 'Gender-Neutral' + 'Female-only (including Supernumerary)' seats
- Returning Institute Name + Branch Name for clarity

<br>

In [13]:
def get_branch_list(rank: int, seat_type: str, gender: str) -> list:
    # rows for this seat type whose rank range contains the candidate's rank
    mask = (
        (data['Seat Type'] == seat_type)
        & (data['Opening Rank'] <= rank)
        & (data['Closing Rank'] >= rank)
    )

    if gender == 'male':
        # male candidates are eligible for Gender-Neutral seats only
        mask &= data['Gender'] == 'Gender-Neutral'
    elif gender != 'female':
        # unrecognised gender value: nothing to return
        return []

    # female candidates keep every matching row (Gender-Neutral + Female-only)
    required_data = data[mask]

    # one dict per matching institute/program pair
    return [
        {'institute': institute, 'program': program}
        for institute, program in zip(required_data['Institute'], required_data['Academic Program Name'])
    ]

In [14]:
# calling our function
branch_list = get_branch_list(rank=10_000, seat_type='OPEN', gender='male')

print(len(branch_list))
branch_list[:2]

117


[{'institute': 'Indian Institute of Technology Bhubaneswar',
  'program': 'Civil Engineering (4 Years, Bachelor of Technology)'},
 {'institute': 'Indian Institute of Technology Bhubaneswar',
  'program': 'Metallurgical and Materials Engineering (4 Years, Bachelor of Technology)'}]
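
For the BONUS question, one possible approach is sketched below; it is not part of the solution above. The idea is to tag each file with an identifier, stack the files, apply the same rank, seat type and gender filter, and report for each institute/program the fraction of files in which the rank falls inside the opening-closing range. The function name `selection_chances` and the helper column `source` are hypothetical. The sketch assumes every file has already been cleaned as above, so both rank columns are integers. Reading "chance" as the share of files that admit the rank is itself an assumption, and with only 2 files it is a very rough estimate.

```python
import pandas as pd


def selection_chances(files: list, rank: int, seat_type: str, gender: str) -> pd.DataFrame:
    """Share of the given files in which `rank` qualifies for each institute/program."""
    # tag every file with a hypothetical 'source' id so matches can be counted per file
    tagged = [df.assign(source=i) for i, df in enumerate(files)]
    combined = pd.concat(tagged, ignore_index=True)

    # same eligibility rule as get_branch_list
    mask = (
        (combined['Seat Type'] == seat_type)
        & (combined['Opening Rank'] <= rank)
        & (combined['Closing Rank'] >= rank)
    )
    if gender == 'male':
        mask &= combined['Gender'] == 'Gender-Neutral'

    matched = combined[mask]

    # count distinct files per program (not rows), then divide by the number of files
    return (
        matched.groupby(['Institute', 'Academic Program Name'])['source']
        .nunique()
        .div(len(files))
        .rename('chance')
        .reset_index()
    )
```

Counting distinct files rather than rows avoids double counting when a female candidate matches both the Gender-Neutral and the Female-only row of the same program in one file. Programs that never admit the rank do not appear in the result.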

<br>