# Case Study on Measures of Central Tendency and Dispersion
An institution wishes to assess its students' abilities in math, reading and
writing. It wants to run an exploratory study to answer the following
questions.
1. Find out how many males and females participated in the test.
2. What do you think about the students' parental level of education?
3. Who scores the most on average for math, reading and writing based on

● Gender

● Test preparation course

4. What do you think about the scoring variation for math, reading and writing
based on

● Gender

● Test preparation course

5. The management needs your help to give bonus points to the top 25% of
students based on their maths score, so how will you help the management
to achieve this.

In [80]:
import pandas as pd
import numpy as np


In [61]:
# Load the dataset; each row is one student
s = pd.read_csv("StudentsPerformance.csv")

In [62]:
s

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group C,some high school,free/reduced,none,0,17,10
1,female,group B,high school,free/reduced,none,8,24,23
2,female,group B,some high school,free/reduced,none,18,32,28
3,female,group B,some college,standard,none,11,38,32
4,female,group C,some college,free/reduced,none,22,39,33
...,...,...,...,...,...,...,...,...
995,male,group E,some college,standard,completed,99,87,81
996,male,group A,some college,standard,completed,100,96,86
997,male,group D,some college,standard,completed,100,97,99
998,male,group E,associate's degree,free/reduced,completed,100,100,93


In [63]:
s.isnull().sum()

gender                         0
race/ethnicity                 0
parental level of education    0
lunch                          0
test preparation course        0
math score                     0
reading score                  0
writing score                  0
dtype: int64

# 1. Find out how many males and females participated in the test.

In [64]:
s["gender"].value_counts()

female    518
male      482
Name: gender, dtype: int64
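Raw counts are easier to compare as percentages. The sketch below rebuilds the two counts reported above as a small Series (the name `counts` is only illustrative) and converts them to shares. On the full DataFrame, `s["gender"].value_counts(normalize=True)` should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({"female": 518, "male": 482}, name="gender")

# Convert each count to a percentage of all participants.
share = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"count": counts, "percent": share})
print(summary)
```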

# 2. What do you think about the students' parental level of education?

In [65]:
s["parental level of education"].value_counts()

some college          226
associate's degree    222
high school           196
some high school      179
bachelor's degree     118
master's degree        59
Name: parental level of education, dtype: int64

In [66]:
s.groupby('race/ethnicity')['parental level of education'].value_counts()

race/ethnicity  parental level of education
group A         some high school               24
                high school                    18
                some college                   18
                associate's degree             14
                bachelor's degree              12
                master's degree                 3
group B         high school                    48
                associate's degree             41
                some high school               38
                some college                   37
                bachelor's degree              20
                master's degree                 6
group C         associate's degree             78
                some college                   69
                high school                    64
                some high school               49
                bachelor's degree              40
                master's degree                19
group D         some college                   67
      

# Insights
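* Overall, most parents fall in the 'some college' category.

* Within individual ethnic groups, however, the most common category differs: 'some high school' in group A, 'high school' in group B and 'associate's degree' in group C.

* Counting bachelor's and master's degrees together, group C has the most degree-holding parents (40 + 19 = 59), but it also has the most parents whose education stopped at high school (64).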
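Because the ethnic groups differ in size, comparing raw counts across groups can mislead. A minimal sketch of a row-normalised cross-tabulation follows. The `toy` frame is hypothetical data that only mimics the column names, and `education_share_by_group` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rows using the same column names as the dataset.
toy = pd.DataFrame({
    "race/ethnicity": ["group A", "group A",
                       "group B", "group B", "group B", "group B"],
    "parental level of education": ["high school", "some college",
                                    "high school", "high school",
                                    "some college", "master's degree"],
})

def education_share_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each education level within each ethnic group."""
    table = pd.crosstab(
        df["race/ethnicity"],
        df["parental level of education"],
        normalize="index",  # each row sums to 1 before scaling
    )
    return table.mul(100).round(1)

shares = education_share_by_group(toy)
print(shares)
```

Calling `education_share_by_group(s)` on the full data would give one row per group, each summing to roughly 100.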

# 3. Who scores the most on average for math, reading and writing based on

# a) Gender

In [7]:
s["math score"].mean()

66.001

In [8]:
s.groupby('gender')['math score'].mean()

gender
female    63.376448
male      68.821577
Name: math score, dtype: float64

In [73]:
s['reading score'].mean()

69.195

In [9]:
s.groupby('gender')['reading score'].mean()

gender
female    72.590734
male      65.545643
Name: reading score, dtype: float64

In [74]:
s['writing score'].mean()

68.119

In [10]:
s.groupby('gender')['writing score'].mean()

gender
female    72.467181
male      63.446058
Name: writing score, dtype: float64

# Insights
* On average, females score higher than males in both reading and writing.

* The male average math score (68.82) is higher than both the female average (63.38) and the overall average (66.00).
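The per-subject cells above can be collapsed into a single grouped call. This is a sketch on hypothetical toy scores. `SCORE_COLS` and `mean_scores_by` are illustrative names, and the same helper would work with `"test preparation course"` as the grouping column.

```python
import pandas as pd

SCORE_COLS = ["math score", "reading score", "writing score"]

# Hypothetical toy scores; the real notebook would pass `s` instead.
toy = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "math score": [60, 70, 80, 70],
    "reading score": [80, 70, 60, 70],
    "writing score": [90, 70, 60, 60],
})

def mean_scores_by(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Average of every score column for each value of `by`."""
    return df.groupby(by)[SCORE_COLS].mean()

means = mean_scores_by(toy, "gender")
# For each subject, the group with the highest average.
top_group = means.idxmax()
print(means)
print(top_group)
```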

# b) Test preparation course

In [11]:
s.groupby('test preparation course')['writing score'].mean()

test preparation course
completed    74.684358
none         64.457944
Name: writing score, dtype: float64

In [12]:
s.groupby('test preparation course')['math score'].mean()

test preparation course
completed    69.96648
none         63.78972
Name: math score, dtype: float64

In [13]:
s.groupby('test preparation course')['reading score'].mean()

test preparation course
completed    74.175978
none         66.417445
Name: reading score, dtype: float64

# Insights
* Students who completed the test preparation course have higher average scores in all three subjects than students who did not.
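To see where the course appears to matter most, the group means reported above can be subtracted. The sketch below re-enters those means, rounded to two decimals, rather than reloading the CSV. The variable names are illustrative only.

```python
import pandas as pd

# Group means copied from the outputs above, rounded to two decimals.
means = pd.DataFrame(
    {
        "math score": [69.97, 63.79],
        "reading score": [74.18, 66.42],
        "writing score": [74.68, 64.46],
    },
    index=pd.Index(["completed", "none"], name="test preparation course"),
)

# Average points gained by students who completed the course.
gap = (means.loc["completed"] - means.loc["none"]).round(2)
print(gap.sort_values(ascending=False))
```

On these figures the gap is largest for writing and smallest for math. It describes an association in this dataset and does not show that the course causes the higher scores.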

# 4. What do you think about the scoring variation for math, reading and writing based on

In [20]:
# numeric_only=True restricts the calculation to the three score columns
s.var(numeric_only=True)

math score       242.411410
reading score    216.279254
writing score    231.842682
dtype: float64

# a) Gender

In [17]:
s.groupby('gender')['math score'].var()

gender
female    256.958593
male      211.889097
Name: math score, dtype: float64

In [18]:
s.groupby('gender')['reading score'].var()

gender
female    207.677438
male      200.211010
Name: reading score, dtype: float64

In [87]:
s.groupby('gender')['writing score'].var()

gender
female    220.369327
male      202.413924
Name: writing score, dtype: float64

# Insights
* Variance is higher for females in all three subjects, which means female scores are more spread out around their mean and may include more extreme values or outliers.
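A higher variance hints at outliers but does not count them. One common way to check is the 1.5 × IQR rule, sketched below on hypothetical toy scores. `count_iqr_outliers` is an illustrative helper, and the rule is an assumption of this sketch, not something the original analysis used.

```python
import pandas as pd

def count_iqr_outliers(scores: pd.Series) -> int:
    """Count values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(((scores < lower) | (scores > upper)).sum())

# Hypothetical toy scores: one extreme value among the female rows.
toy = pd.DataFrame({
    "gender": ["female"] * 6 + ["male"] * 6,
    "math score": [0, 60, 62, 64, 66, 68,
                   58, 60, 62, 64, 66, 68],
})

outliers = toy.groupby("gender")["math score"].apply(count_iqr_outliers)
print(outliers)
```

Running the same `groupby(...).apply(count_iqr_outliers)` on `s` for each score column would show whether the female groups really contain more flagged values.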

# b) Test preparation course

In [85]:
s.groupby('test preparation course')['writing score'].var()

test preparation course
completed    175.202612
none         226.251739
Name: writing score, dtype: float64

In [82]:
s.groupby('test preparation course')['math score'].var()

test preparation course
completed    210.884027
none         246.668662
Name: math score, dtype: float64

In [16]:
s.groupby('test preparation course')['reading score'].var()

test preparation course
completed    183.265864
none         213.419851
Name: reading score, dtype: float64

# Insights
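* Students who completed the test preparation course show less variance in all three subjects.

* Their scores are therefore more consistent as well as higher on average, which suggests that students who want to secure average scores or above should complete the test preparation course.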
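Since the two groups also differ in their means, spread can be compared on a relative scale with the coefficient of variation (standard deviation divided by mean). The sketch below re-enters the group means and variances reported above. The coefficient of variation is an addition of this sketch, not a measure used in the original cells.

```python
import numpy as np
import pandas as pd

index = pd.Index(["completed", "none"], name="test preparation course")

# Group means and variances copied from the outputs above.
means = pd.DataFrame({
    "math score": [69.96648, 63.78972],
    "reading score": [74.175978, 66.417445],
    "writing score": [74.684358, 64.457944],
}, index=index)
variances = pd.DataFrame({
    "math score": [210.884027, 246.668662],
    "reading score": [183.265864, 213.419851],
    "writing score": [175.202612, 226.251739],
}, index=index)

# Standard deviation is the square root of the variance.
std = np.sqrt(variances)
# Coefficient of variation: spread relative to the group mean.
cv = (std / means).round(3)
print(cv)
```

A lower value in the `completed` row than in the `none` row for every subject would support the consistency claim.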

In [22]:
s['math score'].describe()

count    1000.000000
mean       66.001000
std        15.569567
min         0.000000
25%        57.000000
50%        66.000000
75%        77.000000
max       100.000000
Name: math score, dtype: float64

# 5. The management needs your help to give bonus points to the top 25% of students based on their maths score, so how will you help the management to achieve this.

In [101]:
# 25% of the 1,000 students is 250 rows
t = s.nlargest(250, 'math score')
print('The list of top 25% students based on their math score:')
t[['gender', 'math score']]


The list of top 25% students based on their math score:


Unnamed: 0,gender,math score
515,female,100
516,female,100
517,female,100
996,male,100
997,male,100
...,...,...
428,female,77
429,female,77
849,male,77
850,male,77
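A fixed row count can split students who share the cutoff score: `nlargest(250, ...)` keeps exactly 250 rows, so some students scoring 77 might be left out while others with the same score are included. The sketch below shows two tie-safe alternatives on hypothetical toy data. `top_quarter_by_math` is an illustrative helper, not the notebook's method.

```python
import pandas as pd

def top_quarter_by_math(df: pd.DataFrame) -> pd.DataFrame:
    """Select every student at or above the 75th percentile math score."""
    cutoff = df["math score"].quantile(0.75)
    selected = df[df["math score"] >= cutoff]
    return selected.sort_values("math score", ascending=False)

# Hypothetical toy data: two students tie at the cutoff score of 77.
toy = pd.DataFrame({
    "gender": ["female", "male", "female", "male",
               "female", "male", "female", "male"],
    "math score": [100, 77, 77, 75, 70, 60, 50, 40],
})

selected = top_quarter_by_math(toy)

# 25% of 8 rows is 2; the default drops one of the tied students,
# while keep="all" retains every row tied with the last kept value.
strict = toy.nlargest(2, "math score")
with_ties = toy.nlargest(2, "math score", keep="all")

print(len(selected), len(strict), len(with_ties))
```

Both tie-safe versions may return slightly more than 25% of the students, so management would need to decide whether an exact head count or equal treatment of equal scores matters more.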
