Purpose: Examine the factors associated with New York City exam scores in core subject areas (math, reading, and writing).

Step 1: Import/Examination of Data

In [7]:
import pandas as pd
data = pd.read_csv("StudentsPerformance.csv")

In [11]:
data.head()

,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75


In [50]:
data.describe()

,math score,reading score,writing score,combined_score
count,1000.0,1000.0,1000.0,1000.0
mean,66.089,69.169,68.054,67.770667
std,15.16308,14.600192,15.195657,14.257326
min,0.0,17.0,10.0,9.0
25%,57.0,59.0,57.75,58.333333
50%,66.0,70.0,69.0,68.333333
75%,77.0,79.0,79.0,77.666667
max,100.0,100.0,100.0,100.0


In [51]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   gender                       1000 non-null   object 
 1   race/ethnicity               1000 non-null   object 
 2   parental level of education  1000 non-null   object 
 3   lunch                        1000 non-null   object 
 4   test preparation course      1000 non-null   object 
 5   math score                   1000 non-null   int64  
 6   reading score                1000 non-null   int64  
 7   writing score                1000 non-null   int64  
 8   combined_score               1000 non-null   float64
dtypes: float64(1), int64(3), object(5)
memory usage: 70.4+ KB


In [42]:
data.gender.value_counts()

gender
female    518
male      482
Name: count, dtype: int64

In [41]:
for col in data.columns:
    print(data[col].value_counts())

gender
female    518
male      482
Name: count, dtype: int64
race/ethnicity
group C    319
group D    262
group B    190
group E    140
group A     89
Name: count, dtype: int64
parental level of education
some college          226
associate's degree    222
high school           196
some high school      179
bachelor's degree     118
master's degree        59
Name: count, dtype: int64
lunch
standard        645
free/reduced    355
Name: count, dtype: int64
test preparation course
none         642
completed    358
Name: count, dtype: int64
math score
65    36
62    35
69    32
59    32
61    27
      ..
24     1
28     1
33     1
18     1
8      1
Name: count, Length: 81, dtype: int64
reading score
72    34
74    33
64    32
67    30
73    30
      ..
28     1
26     1
17     1
32     1
40     1
Name: count, Length: 72, dtype: int64
writing score
74    35
70    33
68    31
73    28
80    27
      ..
28     1
35     1
19     1
22     1
23     1
Name: count, Length: 77, dtype: int64
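
The loop above also prints value counts for the numeric score columns, which produces long and mostly uninformative output. A minimal sketch that restricts the counts to non-numeric columns follows. It uses a few columns from the five rows shown by data.head() as a stand-in for the full `data` frame; the names `sample`, `categorical_cols`, and `counts` are illustrative and not part of the notebook.

```python
import pandas as pd

# Stand-in for `data`: the five rows shown by data.head() (subset of columns).
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "math score": [72, 69, 90, 47, 76],
})

# Keep only the non-numeric (categorical) columns.
categorical_cols = sample.select_dtypes(exclude="number").columns

# One value_counts Series per categorical column.
counts = {col: sample[col].value_counts() for col in categorical_cols}

for col, vc in counts.items():
    print(vc)
```

In the notebook itself, the same selection would be applied to `data` instead of `sample`.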


# --------------QUESTIONS-------------#
1. Do certain ethnic groups score higher on average?
2. Do certain groups score higher in particular subjects?

In [65]:
# Examining scores by race/ethnicity
# Creating a combined score: the mean of the three subject scores
data["combined_score"] = (data['reading score'] + data['writing score'] + data['math score'])/3

gender_ethnicity_scores = pd.pivot_table(data, values=['math score', 'reading score', 'writing score', 'combined_score'], index=['gender'],
                       columns=['race/ethnicity'], aggfunc="mean")

ethnicity_scores = pd.pivot_table(data, values=['math score', 'reading score', 'writing score', 'combined_score'], index=None,
                       columns=['race/ethnicity'], aggfunc="mean")

print(ethnicity_scores)
print(gender_ethnicity_scores)

race/ethnicity    group A    group B    group C    group D    group E
combined_score  62.992509  65.468421  67.131661  69.179389  72.752381
math score      61.629213  63.452632  64.463950  67.362595  73.821429
reading score   64.674157  67.352632  69.103448  70.030534  73.028571
writing score   62.674157  65.600000  67.827586  70.145038  71.407143
               combined_score                                              \
race/ethnicity        group A    group B    group C    group D    group E   
gender                                                                      
female              65.129630  67.509615  68.585185  71.439276  74.062802   
male                61.540881  63.000000  65.249400  66.987469  71.478873   

               math score                                              \
race/ethnicity    group A    group B    group C    group D    group E   
gender                                                                  
female          58.527778  61.403846  62.0333
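
For question 2, one possible approach is to compute per-subject means by group and then look up the top group for each subject. The sketch below assumes a groupby-based alternative to the pivot tables above. It runs on the five rows shown by data.head() only, so its numbers are not the full-dataset results; `sample`, `subject_means`, and `top_group` are illustrative names.

```python
import pandas as pd

# Stand-in for `data`: the five rows shown by data.head().
sample = pd.DataFrame({
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

subjects = ["math score", "reading score", "writing score"]

# Mean score per subject for each group (rows: groups, columns: subjects).
subject_means = sample.groupby("race/ethnicity")[subjects].mean()

# For each subject, the group label with the highest mean.
top_group = subject_means.idxmax()

print(subject_means)
print(top_group)
```

With only five rows the group means are not meaningful on their own; the point is the shape of the computation, which would carry over to the full `data` frame.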

In [60]:
scores_by_education = pd.pivot_table(data, values=['combined_score'], index='parental level of education',
                       columns=None, aggfunc="mean")

print(scores_by_education.sort_values(by='combined_score', ascending=False))

                             combined_score
parental level of education                
master's degree                   73.598870
bachelor's degree                 71.923729
associate's degree                69.569069
some college                      68.476401
some high school                  65.108007
high school                       63.096939
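
Sorting by score hides the natural order of the education levels. A small sketch that reorders the same means by education level instead, using the values printed above. The ordering in `education_order` is an assumption about how the levels rank and is not defined anywhere in the dataset.

```python
import pandas as pd

# Mean combined scores copied from the output above.
scores_by_education = pd.DataFrame(
    {"combined_score": [73.598870, 71.923729, 69.569069,
                        68.476401, 65.108007, 63.096939]},
    index=pd.Index(
        ["master's degree", "bachelor's degree", "associate's degree",
         "some college", "some high school", "high school"],
        name="parental level of education",
    ),
)

# Assumed ranking of education levels, lowest to highest.
education_order = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]

# Reorder the rows by education level rather than by score.
ordered = scores_by_education.reindex(education_order)
print(ordered)

# Check whether the mean score rises at every step of the assumed ordering.
is_monotonic = bool(ordered["combined_score"].is_monotonic_increasing)
print(is_monotonic)
```

Under this assumed ordering the trend is not strictly increasing, because "some high school" averages higher than "high school" in the output above.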


# --------------QUESTIONS-------------#
1. How many students fall into each parental education level within each ethnic group?

In [69]:
scores_by_education_ethnicity = pd.pivot_table(data, values=['combined_score'], index=['parental level of education'],
                       columns=['race/ethnicity'], aggfunc="mean")

scores_by_education_ethnicity_count = pd.pivot_table(data, values=['combined_score'], index=['parental level of education'],
                       columns=['race/ethnicity'], aggfunc="count")

print(scores_by_education_ethnicity)
print(scores_by_education_ethnicity_count)



                            combined_score                                   \
race/ethnicity                     group A    group B    group C    group D   
parental level of education                                                   
associate's degree               63.880952  67.975610  69.376068  69.333333   
bachelor's degree                67.861111  71.300000  73.241667  69.869048   
high school                      61.277778  61.500000  62.328125  63.477273   
master's degree                  63.333333  74.833333  69.035088  76.478261   
some college                     64.888889  64.378378  67.806763  70.437811   
some high school                 59.861111  64.289474  63.156463  68.613333   

                                        
race/ethnicity                 group E  
parental level of education             
associate's degree           73.974359  
bachelor's degree            75.592593  
high school                  69.545455  
master's degree              79.083333  
s
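
The count question can also be answered without going through combined_score. A minimal sketch using pd.crosstab, offered as an alternative to the count pivot table above and not as the notebook's own method. It runs on the five rows shown by data.head(), so the counts are illustrative only; `sample` and `counts` are hypothetical names.

```python
import pandas as pd

# Stand-in for `data`: the five rows shown by data.head().
sample = pd.DataFrame({
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "some college",
    ],
})

# Rows: education level, columns: group, cells: number of students.
# margins=True adds an "All" row and column with the totals.
counts = pd.crosstab(
    sample["parental level of education"],
    sample["race/ethnicity"],
    margins=True,
)
print(counts)
```

Unlike a count pivot table, crosstab fills empty combinations with 0 instead of NaN, which keeps the table as integers.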

Step 2: Partitioning of Data

Step 3: Analysis of Data