## An institution wants to assess its students' ability in maths, reading and writing. It wants an exploratory study that answers the following questions.

1. Find out how many males and females participated in the test.

2. What do you think about the students' parental level of education?

3. Who scores the most on average for math, reading and writing based on

    ● Gender

    ● Test preparation course
    

4. What do you think about the scoring variation for math, reading and writing based on

    ● Gender

    ● Test preparation course
    

5. The management needs your help to give bonus points to the top 25% of students based on their maths score. How will you help the management achieve this?

#### Importing required libraries

In [1]:
import numpy as np
import pandas as pd

#### Reading the dataset into Python

Note: The dataset and the Python notebook are in the same directory.

In [2]:
data=pd.read_csv("StudentsPerformance.csv")

Checking a random sample of the dataset

In [3]:
data.sample(5)

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 517 | female | group E | associate's degree | standard | none | 100 | 100 | 100 |
| 583 | male | group C | some college | standard | none | 53 | 44 | 42 |
| 429 | female | group E | some high school | standard | none | 77 | 79 | 80 |
| 865 | male | group E | associate's degree | free/reduced | completed | 78 | 74 | 72 |
| 251 | female | group D | some high school | standard | completed | 64 | 60 | 74 |


## 1. Find out how many males and females participated in the test.

In [4]:
data["gender"].value_counts()

female    518
male      482
Name: gender, dtype: int64

`value_counts()` returns a Series holding the count of each unique value in the column, here each gender, sorted in descending order by default.

##### Insights: There are more female participants (518) than male participants (482).
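The counts above can also be read as shares: 518 of 1000 participants (51.8%) are female. A minimal sketch of how `value_counts(normalize=True)` produces such shares, using a small toy column whose values are made up for illustration and are not taken from `StudentsPerformance.csv`:

```python
import pandas as pd

# Toy stand-in for the real dataset (hypothetical values)
toy = pd.DataFrame({"gender": ["female", "male", "female", "female", "male"]})

counts = toy["gender"].value_counts()                # absolute count per category
shares = toy["gender"].value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```

On the real data, the same call would be `data["gender"].value_counts(normalize=True)`.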

## 2. What do you think about the students' parental level of education?

In [5]:
data.groupby("parental level of education", as_index=False).size()

| | parental level of education | size |
|---|---|---|
| 0 | associate's degree | 222 |
| 1 | bachelor's degree | 118 |
| 2 | high school | 196 |
| 3 | master's degree | 59 |
| 4 | some college | 226 |
| 5 | some high school | 179 |


`size()` on a grouped object returns the number of rows in each group, here the number of students for each value of "parental level of education".

##### Insights: The most common parental education levels are "some college" (226) and "associate's degree" (222). A master's degree is the least common (59).
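The grouped output is sorted alphabetically, which hides the progression between education levels. The sketch below reuses the counts printed above and reorders them from lowest to highest attainment. The ordering in `order` is an assumption made for illustration; the dataset itself does not define one.

```python
import pandas as pd

# Counts copied from the groupby output above
counts = pd.Series({
    "associate's degree": 222,
    "bachelor's degree": 118,
    "high school": 196,
    "master's degree": 59,
    "some college": 226,
    "some high school": 179,
})

# Assumed ordering from lowest to highest attainment (not defined by the dataset)
order = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]

ordered = counts.reindex(order)
share = (ordered / ordered.sum() * 100).round(1)  # percentage of all students

print(pd.DataFrame({"count": ordered, "share (%)": share}))
```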


## 3. Who scores the most on average for math, reading and writing based on

### ● Gender

In [6]:
x=data.groupby('gender')[['math score']].mean()
a=x.sort_values(by=['math score'],ascending=False)
a.head(1)

| gender | math score |
|---|---|
| male | 68.821577 |


In [7]:
x=data.groupby('gender')[['reading score']].mean()
a=x.sort_values(by=['reading score'],ascending=False)
a.head(1)

| gender | reading score |
|---|---|
| female | 72.590734 |


In [8]:
x=data.groupby('gender')[['writing score']].mean()
a=x.sort_values(by=['writing score'],ascending=False)
a.head(1)

| gender | writing score |
|---|---|
| female | 72.467181 |


##### Insights: Males scored higher than females on average in math (68.82), while females had the higher averages in reading (72.59) and writing (72.47).
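The three cells above repeat the same group, sort and pick pattern once per subject. As a sketch of a more compact alternative, the means for all three subjects can be computed in one `groupby` call, and `idxmax()` then names the group with the highest mean in each column. The toy scores below are invented for illustration only.

```python
import pandas as pd

# Toy data with hypothetical scores, not taken from the real dataset
toy = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "math score": [60, 70, 64, 74],
    "reading score": [80, 65, 76, 61],
    "writing score": [78, 60, 82, 62],
})

score_cols = ["math score", "reading score", "writing score"]

means = toy.groupby("gender")[score_cols].mean()  # one row per gender
best = means.idxmax()                             # gender with the highest mean per subject

print(means)
print(best)
```

The same two lines work for the test preparation course by grouping on `"test preparation course"` instead of `"gender"`.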

### ● Test preparation course

In [9]:
x=data.groupby('test preparation course')[['math score']].mean()
a=x.sort_values(by=['math score'],ascending=False)
a.head(1)

| test preparation course | math score |
|---|---|
| completed | 69.96648 |


In [10]:
x=data.groupby('test preparation course')[['reading score']].mean()
a=x.sort_values(by=['reading score'],ascending=False)
a.head(1)

| test preparation course | reading score |
|---|---|
| completed | 74.175978 |


In [11]:
x=data.groupby('test preparation course')[['writing score']].mean()
a=x.sort_values(by=['writing score'],ascending=False)
a.head(1)

| test preparation course | writing score |
|---|---|
| completed | 74.684358 |


##### Insights: Students who completed the test preparation course scored higher on average in math, reading and writing.

## 4. What do you think about the scoring variation for math, reading and writing based on


### ● Gender



In [12]:
data.groupby('gender')[['math score']].std()

| gender | math score |
|---|---|
| female | 16.029928 |
| male | 14.556411 |


In [13]:
data.groupby('gender')[['reading score']].std()

| gender | reading score |
|---|---|
| female | 14.411018 |
| male | 14.149594 |


In [14]:
data.groupby('gender')[['writing score']].std()

| gender | writing score |
|---|---|
| female | 14.844842 |
| male | 14.227225 |


##### Insights: For math, male scores deviate less from the mean (std 14.56) than female scores (std 16.03). For reading and writing, the two genders show similar deviation from the mean.
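One detail worth knowing when reading these numbers: pandas `std()` computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below shows the difference on a tiny made-up series; with around 500 students per group the two values are nearly identical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores chosen so the arithmetic is easy to follow
scores = pd.Series([50, 60, 70, 80])

sample_std = scores.std()            # pandas default: divides by n - 1
population_std = scores.std(ddof=0)  # divides by n
numpy_std = np.std(scores.to_numpy())  # NumPy default matches ddof=0

print(sample_std, population_std, numpy_std)
```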

### ● Test preparation course

In [15]:
data.groupby('test preparation course')[['math score']].std()

| test preparation course | math score |
|---|---|
| completed | 14.521847 |
| none | 15.705689 |


In [16]:
data.groupby('test preparation course')[['reading score']].std()

| test preparation course | reading score |
|---|---|
| completed | 13.537572 |
| none | 14.608896 |


In [17]:
data.groupby('test preparation course')[['writing score']].std()

| test preparation course | writing score |
|---|---|
| completed | 13.236412 |
| none | 15.041667 |


##### Insights: Students who completed the test preparation course show lower deviation from the mean in math, reading and writing than those who did not.
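Mean and spread are easier to compare when they sit side by side. A possible way to get both in one table is `agg(["mean", "std"])` on the grouped columns, sketched here on invented scores (two subjects only, to keep the toy frame small):

```python
import pandas as pd

# Toy data with hypothetical scores, not taken from the real dataset
toy = pd.DataFrame({
    "test preparation course": ["completed", "none", "completed", "none", "completed", "none"],
    "math score": [70, 40, 74, 60, 78, 80],
    "reading score": [72, 50, 75, 65, 78, 86],
})

# Columns become a two-level index: (subject, statistic)
summary = (
    toy.groupby("test preparation course")[["math score", "reading score"]]
    .agg(["mean", "std"])
)

print(summary)
```

A single cell is addressed with a tuple, for example `summary.loc["completed", ("math score", "std")]`.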

## 5. The management needs your help to give bonus points to the top 25% of students based on their maths score. How will you help the management achieve this?

In [18]:
data.describe()

| | math score | reading score | writing score |
|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 |
| mean | 66.001 | 69.195 | 68.119 |
| std | 15.569567 | 14.706436 | 15.226381 |
| min | 0.0 | 17.0 | 10.0 |
| 25% | 57.0 | 59.0 | 58.0 |
| 50% | 66.0 | 70.0 | 69.0 |
| 75% | 77.0 | 79.25 | 79.0 |
| max | 100.0 | 100.0 | 100.0 |


The `describe()` method computes summary statistics such as the count, mean, standard deviation and percentiles for the numeric columns of a Series or DataFrame. The 75th percentile of the math score is 77, so students in the top 25% are those with a math score above 77.
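If only the cutoff is needed, `quantile(0.75)` returns the same value as the `75%` row of `describe()` without computing the other statistics. A minimal sketch on made-up scores:

```python
import pandas as pd

# Hypothetical math scores, not taken from the real dataset
math = pd.Series([40, 55, 57, 66, 70, 77, 77, 90, 100])

cutoff = math.quantile(0.75)           # 75th percentile only
from_describe = math.describe()["75%"]  # same value, read from the summary

print(cutoff, from_describe)
```

On the real data the equivalent call would be `data["math score"].quantile(0.75)`.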

In [19]:
pd.qcut(data['math score'], q=4).value_counts()

(66.0, 77.0]      266
(-0.001, 57.0]    265
(57.0, 66.0]      238
(77.0, 100.0]     231
Name: math score, dtype: int64

`qcut` divides the data into bins that hold roughly equal numbers of observations. The bin edges are computed from the sample quantiles of the data rather than set as fixed numeric values. Here the math scores are divided into 4 quartiles. The counts are not exactly 250 each because students with the same score always fall in the same bin.

Note: In interval notation, a round bracket ‘(‘ means the endpoint is excluded and a square bracket ‘]’ means it is included.
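The bracket convention can be checked directly with `pd.Interval`, and `qcut` accepts a `labels` argument when named quartiles are easier to read than interval edges. The sketch below uses invented values, and the labels `Q1` to `Q4` are chosen here for illustration.

```python
import pandas as pd

# A right-closed interval like the (57.0, 66.0] bin shown above
iv = pd.Interval(57, 66, closed="right")
print(57 in iv)  # left edge is excluded
print(66 in iv)  # right edge is included

# Hypothetical scores with no ties, so every quartile gets the same count
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])
quartile = pd.qcut(s, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(quartile.value_counts().sort_index())
```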

##### Insights: The 75th percentile of the math score is 77, and 231 students scored above it. That is 23.1% of the 1000 students rather than exactly 25%, because students who scored exactly 77 fall in the lower bin. Management should give bonus points to those 231 students.
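To act on this, the qualifying students need to be flagged in the table. The sketch below is one possible approach on made-up scores: it marks everyone strictly above the 75th percentile and adds a bonus. The bonus size (`BONUS = 5`) and the column names `top_quartile` and `math with bonus` are hypothetical; the source does not specify them.

```python
import pandas as pd

# Hypothetical math scores, not taken from the real dataset
toy = pd.DataFrame({"math score": [40, 55, 57, 66, 70, 77, 77, 90, 100]})

BONUS = 5  # assumed bonus size, for illustration only

cutoff = toy["math score"].quantile(0.75)

# Strictly above the cutoff, matching the (77.0, 100.0] bin convention
toy["top_quartile"] = toy["math score"] > cutoff

# Boolean flags act as 0/1, so only flagged rows receive the bonus
toy["math with bonus"] = toy["math score"] + BONUS * toy["top_quartile"]

n_above = int(toy["top_quartile"].sum())
n_at_or_above = int((toy["math score"] >= cutoff).sum())

print(toy)
print(n_above, n_at_or_above)
```

Whether students who score exactly at the cutoff should qualify is a policy choice: using `>=` instead of `>` includes them, and in this toy frame it doubles the number of recipients.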