# Pandas Basics - Dataframes

- **In this project, we will perform basic Exploratory Data Analysis (EDA) on the University Admissions Dataset**
- **Using the "university_admission.csv" file included in the course package, write a Python script to perform the following tasks:**
    - **1. Import the "university_admission.csv" file using Pandas** 
    - **2. Display the first and last 8 rows in the DataFrame**
    - **3. Obtain the shape of the DataFrame**
    - **4. Calculate the average, min and max values for the LOR and SOP Columns**
    - **5. Use the GRE Score as the pandas dataframe index**
- **Column definitions are listed below:**
    - GRE Scores (out of 340)
    - TOEFL Scores (out of 120)
    - University Rating (out of 5)
    - Statement of Purpose (SOP) 
    - Letter of Recommendation (LOR) Strength (out of 5)
    - Undergraduate GPA (out of 10)
    - Research Experience (either 0 or 1)
    - Chance of admission (ranging from 0 to 1)

In [2]:
import pandas as pd

In [3]:
# Import the "university_admission.csv" file using Pandas
university_admission_df = pd.read_csv('data/university_admission.csv')

In [4]:
# Display the first 8 rows in the DataFrame (use .tail(8) for the last 8 rows)
university_admission_df.head(8)

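|   | GRE_Score | TOEFL_Score | University_Rating | SOP | LOR | CGPA | Research | Chance_of_Admission |
|---|-----------|-------------|-------------------|-----|-----|------|----------|---------------------|
| 0 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 316 | 104 | 3 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |
| 5 | 330 | 115 | 5 | 4.5 | 3.0 | 9.34 | 1 | 0.9 |
| 6 | 321 | 109 | 3 | 3.0 | 4.0 | 8.2 | 1 | 0.75 |
| 7 | 308 | 101 | 2 | 3.0 | 4.0 | 7.9 | 0 | 0.68 |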
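
The task asks for both the first and the last rows, while the cell above only calls `head`. By default, Jupyter displays only the value of the last expression in a cell, so one option is to print both slices explicitly. The following is a minimal sketch on a toy DataFrame (`toy_df` is a hypothetical name; its values are copied from a few rows of the output above), not the course's reference solution:

```python
import pandas as pd

# Toy stand-in for the admissions data (hypothetical name, values taken
# from the first six rows shown in the head(8) output)
toy_df = pd.DataFrame({
    "GRE_Score": [337, 324, 316, 322, 314, 330],
    "LOR": [4.5, 4.5, 3.5, 2.5, 3.0, 3.0],
})

first_rows = toy_df.head(2)  # first 2 rows
last_rows = toy_df.tail(2)   # last 2 rows

# print() shows both results, even though neither is the cell's last expression
print(first_rows)
print(last_rows)
```

On the real data, the same pattern with `head(8)` and `tail(8)` would cover both halves of the task.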


In [5]:
# Obtain the shape of the DataFrame
university_admission_df.shape

(1000, 8)
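
`shape` is an attribute (no parentheses) that returns a `(rows, columns)` tuple; the row index is not counted as a column. A small sketch of unpacking it, using a hypothetical toy DataFrame rather than the course file:

```python
import pandas as pd

# Toy DataFrame (hypothetical values) with 3 rows and 2 columns
toy_df = pd.DataFrame({
    "SOP": [4.5, 4.0, 3.0],
    "LOR": [4.5, 4.5, 3.5],
})

# shape is a plain tuple, so it can be unpacked directly
n_rows, n_cols = toy_df.shape
print(f"{n_rows} rows x {n_cols} columns")
```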

In [6]:
# Check whether any values are missing in the DataFrame
university_admission_df.isnull().sum()

GRE_Score              0
TOEFL_Score            0
University_Rating      0
SOP                    0
LOR                    0
CGPA                   0
Research               0
Chance_of_Admission    0
dtype: int64
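
Since the real dataset has no missing values, the check above returns all zeros. To illustrate what `isnull().sum()` reports when a value is missing, here is a sketch on a toy DataFrame with one deliberately inserted `NaN` (the data and the name `toy_df` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy data with one missing SOP value (hypothetical, not from the course file)
toy_df = pd.DataFrame({
    "SOP": [4.5, np.nan, 3.0],
    "LOR": [4.5, 4.5, 3.5],
})

# isnull() marks each cell True/False; sum() counts the True values per column
missing_per_column = toy_df.isnull().sum()

# Summing the per-column counts gives the total number of missing cells
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```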

In [7]:
# Calculate the average, min and max values for the LOR Column
lor_avg = university_admission_df['LOR'].mean()
lor_min = university_admission_df['LOR'].min()
lor_max = university_admission_df['LOR'].max()

print(f'LOR avg:{lor_avg}, min:{lor_min}, max:{lor_max}')

LOR avg:3.484, min:1.0, max:5.0


In [8]:
# Calculate the average, min and max values for the SOP Column
sop_avg = university_admission_df['SOP'].mean()
sop_min = university_admission_df['SOP'].min()
sop_max = university_admission_df['SOP'].max()

print(f'SOP avg:{sop_avg}, min:{sop_min}, max:{sop_max}')

SOP avg:3.374, min:1.0, max:5.0
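
The two cells above compute each statistic separately. As an alternative, `DataFrame.agg` can compute several statistics for several columns in one call. A minimal sketch, assuming a toy DataFrame (`toy_df`, hypothetical) whose values are the first four SOP and LOR rows from the output shown earlier:

```python
import pandas as pd

# First four SOP/LOR values from the head(8) output (toy subset, not the full data)
toy_df = pd.DataFrame({
    "SOP": [4.5, 4.0, 3.0, 3.5],
    "LOR": [4.5, 4.5, 3.5, 2.5],
})

# One row per statistic, one column per input column
summary = toy_df[["LOR", "SOP"]].agg(["mean", "min", "max"])
print(summary)

# Individual values can be read back with .loc[statistic, column]
lor_avg = summary.loc["mean", "LOR"]
```

The result is itself a DataFrame, which keeps the statistics for both columns side by side instead of in six separate variables.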


In [9]:
# Use the GRE Score as the pandas dataframe index
university_admission_df.set_index("GRE_Score", inplace=True)
university_admission_df

| GRE_Score | TOEFL_Score | University_Rating | SOP | LOR | CGPA | Research | Chance_of_Admission |
|-----------|-------------|-------------------|-----|-----|------|----------|---------------------|
| 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 316 | 104 | 3 | 3.0 | 3.5 | 8.00 | 1 | 0.72 |
| 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 0.80 |
| 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 332 | 108 | 5 | 4.5 | 4.0 | 9.02 | 1 | 0.87 |
| 337 | 117 | 5 | 5.0 | 5.0 | 9.87 | 1 | 0.96 |
| 330 | 120 | 5 | 4.5 | 5.0 | 9.56 | 1 | 0.93 |
| 312 | 103 | 4 | 4.0 | 5.0 | 8.43 | 0 | 0.73 |
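
Note that GRE scores repeat in the output (337 appears more than once), so the resulting index is not unique. The sketch below shows what that means for label lookups, and uses the non-mutating form of `set_index`, which returns a new DataFrame instead of modifying the original. The toy DataFrame and variable names are hypothetical; the values are taken from three rows of the output:

```python
import pandas as pd

# Three rows from the output, including the repeated GRE score 337
toy_df = pd.DataFrame({
    "GRE_Score": [337, 324, 337],
    "TOEFL_Score": [118, 107, 117],
    "Chance_of_Admission": [0.92, 0.76, 0.96],
})

# Without inplace=True, set_index returns a new DataFrame and leaves toy_df unchanged
indexed_df = toy_df.set_index("GRE_Score")

# With duplicate labels, .loc returns every matching row
rows_337 = indexed_df.loc[337]
print(rows_337)

# reset_index() moves GRE_Score back into a regular column
restored_df = indexed_df.reset_index()
```

Whether to mutate in place or reassign is a style choice; reassigning keeps the original DataFrame available if a cell is re-run.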
