9.	Perform the following operations in Python on the given datasets
[applicants.csv: students who applied; exam_scores.csv: standardized test results]
a.	Clean up inconsistent formatting in names and missing test scores.
b.	Join on ApplicationID to combine personal data with scores.
c.	Normalize test scores.
d.	Convert Admission_Status to binary labels (1 = admitted).
e.	Remove duplicate applications and fix invalid test score entries.

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

In [3]:
applicants_df = pd.read_csv('applicants.csv')
scores_df = pd.read_csv('exam_scores.csv')

In [5]:
applicants_df

Unnamed: 0,ApplicationID,Name,GPA,Admission_Status
0,501,Rita,3.6,Admitted
1,502,Sam,3.8,Rejected
2,503,Liam,3.0,Admitted
3,504,Nina,3.5,Rejected
4,505,Tom,,Admitted


In [7]:
scores_df

Unnamed: 0,ApplicationID,SAT,ACT
0,501,1350.0,29
1,502,1450.0,31
2,503,1250.0,27
3,504,,30
4,506,1100.0,25


# a. Clean inconsistent formatting in names and drop rows with missing test scores

In [15]:
applicants_df['Name'] = applicants_df['Name'].str.strip().str.title()

In [17]:
scores_df.dropna(subset=['SAT', 'ACT'], inplace=True)
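Dropping rows is only one way to handle missing scores. The sketch below compares it with median imputation, which keeps the applicant in the data. The frame scores_demo is a stand-in built from the rows shown above, and the choice of the median is an assumption, not part of the original solution.

```python
import pandas as pd

# Stand-in for exam_scores.csv, using the rows displayed above
scores_demo = pd.DataFrame({
    "ApplicationID": [501, 502, 503, 504, 506],
    "SAT": [1350.0, 1450.0, 1250.0, None, 1100.0],
    "ACT": [29, 31, 27, 30, 25],
})

# Approach used in the cell above: drop any row missing SAT or ACT
dropped = scores_demo.dropna(subset=["SAT", "ACT"])

# Alternative (assumption): fill each gap with the column median
imputed = scores_demo.copy()
for col in ["SAT", "ACT"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())

print(len(dropped), len(imputed))
print(imputed[imputed["ApplicationID"] == 504])
```

Which option is appropriate depends on the task: imputation keeps application 504 but replaces an unknown score with a guess.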

# b. Join on ApplicationID to combine personal data with scores

In [22]:
merged_df = pd.merge(applicants_df, scores_df, on='ApplicationID', how='inner')
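An inner join silently discards IDs that appear in only one table. As a quick audit, an outer join with indicator=True shows which ApplicationID values would be lost. The demo frames below are stand-ins built from the rows shown above (with 504 already dropped from the scores for its missing SAT); the names applicants_demo, scores_demo, and audit are illustrative only.

```python
import pandas as pd

applicants_demo = pd.DataFrame({
    "ApplicationID": [501, 502, 503, 504, 505],
    "Name": ["Rita", "Sam", "Liam", "Nina", "Tom"],
})
# Scores as they stand after the dropna step
scores_demo = pd.DataFrame({
    "ApplicationID": [501, 502, 503, 506],
    "SAT": [1350.0, 1450.0, 1250.0, 1100.0],
})

# Outer join keeps every ID; the _merge column records where each row came from
audit = pd.merge(applicants_demo, scores_demo, on="ApplicationID",
                 how="outer", indicator=True)

# Rows that an inner join would discard
unmatched = audit.loc[audit["_merge"] != "both", ["ApplicationID", "_merge"]]
print(unmatched)
```

Here 504 and 505 exist only among the applicants and 506 only among the scores, which is why the merged result has three rows.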

In [39]:
# This cell was run after steps c-e, so the normalized and label columns already appear
merged_df

Unnamed: 0,ApplicationID,Name,GPA,Admission_Status,SAT,ACT,SAT_Norm,ACT_Norm,Admission_Label
0,501,Rita,3.6,Admitted,1350.0,29,0.5,0.5,1
1,502,Sam,3.8,Rejected,1450.0,31,1.0,1.0,0
2,503,Liam,3.0,Admitted,1250.0,27,0.0,0.0,1


# c. Normalize test scores

In [25]:
scaler = MinMaxScaler()
merged_df[['SAT_Norm', 'ACT_Norm']] = scaler.fit_transform(merged_df[['SAT', 'ACT']])
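Min-max scaling maps each value x to (x - min) / (max - min), so the smallest score becomes 0 and the largest becomes 1. The sketch below applies that formula by hand to the three merged rows to show where the 0.0, 0.5, and 1.0 values come from. The helper min_max is a hypothetical name introduced for illustration.

```python
import pandas as pd

# The three rows that survive the merge
demo = pd.DataFrame({"SAT": [1350.0, 1450.0, 1250.0], "ACT": [29, 31, 27]})

def min_max(col: pd.Series) -> pd.Series:
    """Rescale a column to [0, 1] using (x - min) / (max - min)."""
    span = col.max() - col.min()
    return (col - col.min()) / span

demo["SAT_Norm"] = min_max(demo["SAT"])
demo["ACT_Norm"] = min_max(demo["ACT"])
print(demo)
```

This sketch divides by zero if a column is constant, so it would need a guard for that case.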

# d. Convert Admission_Status to binary labels (1 = Admitted, 0 = Rejected)

In [28]:
merged_df['Admission_Label'] = merged_df['Admission_Status'].str.strip().str.lower().map({'admitted': 1, 'rejected': 0})
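Series.map returns NaN for any value missing from the dictionary, so an unexpected status would pass through silently. A minimal check, using made-up status values (including a hypothetical "Waitlisted") purely for illustration:

```python
import pandas as pd

# Hypothetical status values with inconsistent case and spacing
status = pd.Series(["Admitted", " rejected ", "ADMITTED", "Waitlisted"])

labels = status.str.strip().str.lower().map({"admitted": 1, "rejected": 0})

# Any status that did not match a dictionary key shows up as NaN
unmapped = status[labels.isna()]
print(unmapped.tolist())
```

If unmapped is non-empty, the label column contains NaN and its dtype becomes float rather than int.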

# e. Remove duplicate applications and filter out invalid score entries

In [31]:
merged_df.drop_duplicates(subset='ApplicationID', keep='first', inplace=True)

In [33]:
merged_df = merged_df[(merged_df['SAT'] >= 400) & (merged_df['SAT'] <= 1600)]
merged_df = merged_df[(merged_df['ACT'] >= 1) & (merged_df['ACT'] <= 36)]
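In the cells above, normalization (In [25]) ran before this range filter (In [33]). With this data no row was out of range, so the result is unaffected, but an invalid score present at normalization time would distort the min and max. The sketch below filters first and normalizes afterwards; the 2400 entry and ApplicationID 507 are made up to illustrate the point.

```python
import pandas as pd

demo = pd.DataFrame({
    "ApplicationID": [501, 502, 503, 507],
    "SAT": [1350.0, 1450.0, 1250.0, 2400.0],  # 2400 is a made-up invalid entry
    "ACT": [29, 31, 27, 30],
})

# Keep only rows whose scores fall in the valid ranges (bounds inclusive)
valid = demo["SAT"].between(400, 1600) & demo["ACT"].between(1, 36)
clean = demo.loc[valid].copy()

# Normalize after filtering so the invalid score cannot stretch the range
sat_min, sat_max = clean["SAT"].min(), clean["SAT"].max()
clean["SAT_Norm"] = (clean["SAT"] - sat_min) / (sat_max - sat_min)
print(clean)
```

Had the 2400 row been included when normalizing, the three valid scores would have been compressed toward 0.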

In [35]:
print(merged_df[['ApplicationID', 'Name', 'GPA', 'SAT', 'ACT', 'SAT_Norm', 'ACT_Norm', 'Admission_Label']])

   ApplicationID  Name  GPA     SAT  ACT  SAT_Norm  ACT_Norm  Admission_Label
0            501  Rita  3.6  1350.0   29       0.5       0.5                1
1            502   Sam  3.8  1450.0   31       1.0       1.0                0
2            503  Liam  3.0  1250.0   27       0.0       0.0                1
