### Prepping Data Challenge: Setting Grades (Week 5)

This week's challenge takes the numeric scores our students have received and turns them into:

 - A letter grade (our students' parents prefer this) 
 - A score that goes towards their High School applications

The aim of the challenge is to understand how many points, on average, a student who receives an A gets. This will show us how many students score higher than the average A student without ever receiving an A themselves.

### Input

One file (the same Grades as 2022 Week 3) of grades by student and subject.

### Requirements
 - Input the data
 - Divide the students' grades into 6 evenly distributed groups
   - Evenly distributed means the same number of students receive each grade within a subject
 - Convert the groups to two different metrics:
   - The top-scoring group should get an A, the second group a B, and so on through to the sixth group, who receive an F
   - An A is worth 10 points for their high school application, B gets 8, C gets 6, D gets 4, E gets 2 and F gets 1.
 - Determine how many high school application points each Student has received across all their subjects 
 - Work out the average total points per student by grade 
   - i.e. for all the students who got an A, how many points did they get across all their subjects?
 - Take the average total score you get for students who have received at least one A and remove anyone who scored less than this. 
 - Remove students who received an A
 - How many students scored more than the average and never received an A?
 - Output the data

In [1]:
import pandas as pd

In [2]:
# Input the data.
df = pd.read_csv('WK3- Grades input.csv')

In [3]:
df.head()

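| | Student ID | Maths | English | Spanish | Science | Art | History | Geography |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 66 | 97 | 85 | 75 | 76 | 94 | 76 |
| 1 | 2 | 84 | 85 | 62 | 87 | 68 | 75 | 74 |
| 2 | 3 | 88 | 68 | 69 | 81 | 92 | 89 | 75 |
| 3 | 4 | 65 | 97 | 96 | 89 | 98 | 77 | 62 |
| 4 | 5 | 86 | 97 | 94 | 98 | 67 | 77 | 97 |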


In [4]:
# Divide the students' grades into 6 evenly distributed groups.
# Evenly distributed means the same number of students gain each grade within a subject.
# First reshape to long format (one row per student and subject) so scores can be grouped by subject.
df2 = pd.melt(df, id_vars=['Student ID'], var_name='Subject', value_name='Score')
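The reshape can be easier to follow on a tiny frame. The sketch below uses a hypothetical two-student, two-subject table (not the challenge data) to show how `pd.melt` turns one column per subject into one row per student and subject.

```python
import pandas as pd

# Hypothetical stand-in for the grades file: two students, two subjects.
wide = pd.DataFrame({
    'Student ID': [1, 2],
    'Maths': [66, 84],
    'English': [97, 85],
})

# Each subject column becomes a set of rows; 'Student ID' is repeated for each subject.
long_df = pd.melt(wide, id_vars=['Student ID'], var_name='Subject', value_name='Score')
print(long_df)
```

The long frame has one row per student per subject, which is the shape the later `groupby('Subject')` step relies on.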

In [5]:
df2.head()

| | Student ID | Subject | Score |
|---|---|---|---|
| 0 | 1 | Maths | 66 |
| 1 | 2 | Maths | 84 |
| 2 | 3 | Maths | 88 |
| 3 | 4 | Maths | 65 |
| 4 | 5 | Maths | 86 |


In [6]:
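# Labels run from lowest to highest because qcut assigns them to bins in ascending score order.
label = ['F', 'E', 'D', 'C', 'B', 'A']
# pd.cut(df2['Score'], bins=6, labels=label) would give equal-width score ranges rather than
# equal-sized groups, so pd.qcut is applied within each subject instead.
df2['Grade'] = df2.groupby('Subject')['Score']\
                  .transform(lambda x: pd.qcut(x, q=6, labels=label))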
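To illustrate why `pd.qcut` fits the "same number of students per grade" requirement, here is a minimal sketch on twelve made-up, skewed scores (an assumption for illustration, not the challenge data). It compares quantile-based bins with equal-width bins from `pd.cut`.

```python
import pandas as pd

labels = ['F', 'E', 'D', 'C', 'B', 'A']
# Hypothetical skewed scores: most values are low, a few are high.
scores = pd.Series([10, 12, 15, 18, 20, 22, 25, 30, 40, 60, 90, 100])

by_quantile = pd.qcut(scores, q=6, labels=labels)  # equal-sized groups
by_width = pd.cut(scores, bins=6, labels=labels)   # equal-width score ranges

# Count how many scores fall under each label, in F..A order.
quantile_counts = by_quantile.value_counts().to_dict()
width_counts = by_width.value_counts().to_dict()
quantile_per_grade = [quantile_counts.get(g, 0) for g in labels]
width_per_grade = [width_counts.get(g, 0) for g in labels]
print(quantile_per_grade)
print(width_per_grade)
```

With distinct scores, `qcut` places two scores in every grade, while `cut` piles most of the low scores into the bottom bin and leaves some grades empty.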

In [7]:
# An A is worth 10 points for their high school application, B gets 8, C gets 6, D gets 4, E gets 2 and F gets 1.
points = {'A': 10, 'B': 8, 'C': 6, 'D': 4, 'E': 2, 'F': 1}
df2['Points'] = df2['Grade'].map(points)

In [8]:
df2['Points'].dtypes

CategoricalDtype(categories=[1, 2, 4, 6, 8, 10], ordered=True)

In [9]:
# Mapping a categorical column yields a categorical result, so cast to integers before summing.
df2['Points'] = df2['Points'].astype('int64')
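The cast matters because the `Grade` column produced by `qcut` is categorical, and the notebook output above shows that mapping it returned a categorical `Points` column. A small sketch with three hypothetical grades shows the same map-then-cast pattern in isolation.

```python
import pandas as pd

points = {'A': 10, 'B': 8, 'C': 6, 'D': 4, 'E': 2, 'F': 1}

# Hypothetical ordered categorical grades, mirroring what qcut with labels returns.
grades = pd.Series(pd.Categorical(['A', 'F', 'C'],
                                  categories=['F', 'E', 'D', 'C', 'B', 'A'],
                                  ordered=True))

mapped = grades.map(points)
print(mapped.dtype)  # inspect the dtype; the exact repr may vary by pandas version

# Cast to plain integers so arithmetic such as sum() is available.
as_int = mapped.astype('int64')
total = as_int.sum()
print(total)
```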

In [10]:
#Determine how many high school application points each Student has received across all their subjects 
df2['Total Points per Student'] = df2.groupby(['Student ID'])['Points'].transform('sum')

In [11]:
#Work out the average total points per student by grade 
#ie for all the students who got an A, how many points did they get across all their subjects
df2['Avg student total points per grade'] = df2.groupby('Grade')['Total Points per Student'].transform('mean').round(2)
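One caveat worth checking: the mean above is taken over rows, so a student with several As contributes their total once per A. If the intent is for each student to count once per grade, one possible approach (an assumption about the intent, not the notebook's method) is to drop duplicate student and grade pairs before averaging. The toy data below is hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: student 1 has two As, student 2 has one.
long_df = pd.DataFrame({
    'Student ID': [1, 1, 2, 2, 3, 3],
    'Grade':      ['A', 'A', 'A', 'F', 'B', 'F'],
    'Points':     [10, 10, 10, 1, 8, 1],
})
long_df['Total'] = long_df.groupby('Student ID')['Points'].transform('sum')

# Row-level mean: student 1's total is counted twice under grade A.
row_level = long_df.groupby('Grade')['Total'].mean()

# Student-level mean: each student counts once per grade they received.
student_level = (long_df.drop_duplicates(['Student ID', 'Grade'])
                        .groupby('Grade')['Total'].mean())

print(row_level['A'], student_level['A'])
```

In this toy case the row-level average for grade A is 17.0 and the student-level average is 15.5, so the choice can move the threshold used in the next step.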

In [12]:
# Take the average total score you get for students who have received at least one A
# and remove anyone who scored less than this.
# The average is constant across all 'A' rows, so .min() simply extracts that single value.
atleast_a = df2.loc[df2['Grade'] == 'A', 'Avg student total points per grade'].min()
df2 = df2.loc[df2['Total Points per Student'] >= atleast_a]

In [13]:
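# Remove students who received an A.
# Note: this drops the rows graded A; a student's other rows remain even if they got an A elsewhere.
# .copy() avoids a SettingWithCopyWarning when a column is added in the next cell.
df2 = df2[df2['Grade'] != 'A'].copy()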
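Filtering on `Grade != 'A'` works at row level. If the goal is to exclude every row belonging to a student who received an A in any subject, one option is to collect those student IDs first and filter with `isin`. The sketch below uses hypothetical data and is an alternative under that assumption, not the notebook's own step.

```python
import pandas as pd

# Hypothetical data: student 1 has an A in Maths, student 2 has no A.
long_df = pd.DataFrame({
    'Student ID': [1, 1, 2, 2],
    'Subject':    ['Maths', 'Art', 'Maths', 'Art'],
    'Grade':      ['A', 'B', 'B', 'C'],
})

# Row-level filter: only the A row disappears; student 1's Art row survives.
row_filtered = long_df[long_df['Grade'] != 'A']

# Student-level filter: every row for any student with at least one A is removed.
students_with_a = long_df.loc[long_df['Grade'] == 'A', 'Student ID'].unique()
student_filtered = long_df[~long_df['Student ID'].isin(students_with_a)]

print(row_filtered['Student ID'].tolist())
print(student_filtered['Student ID'].tolist())
```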

In [14]:
#How many students scored more than the average and never received an A?
df2['students without A'] = df2.groupby('Student ID')['Points'].transform('sum')
without_a = df2[df2['students without A'] > atleast_a]['Student ID'].nunique()
without_a

15
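As a cross-check on the count, the same question can be asked of a one-row-per-student summary built from the unfiltered long data. This is a minimal sketch on hypothetical data with a made-up threshold; the names `summary`, `has_a` and `threshold` are illustrative, and no claim is made about what it returns on the challenge file.

```python
import pandas as pd

# Hypothetical long-format data for three students.
long_df = pd.DataFrame({
    'Student ID': [1, 1, 2, 2, 3, 3],
    'Grade':      ['A', 'B', 'B', 'B', 'C', 'F'],
    'Points':     [10, 8, 8, 8, 6, 1],
})
threshold = 15  # made-up stand-in for the average total of students with an A

# One row per student: total points and whether they ever received an A.
summary = (long_df.assign(is_a=long_df['Grade'].eq('A'))
                  .groupby('Student ID')
                  .agg(total_points=('Points', 'sum'), has_a=('is_a', 'any')))

# Students above the threshold who never received an A.
qualifying = summary[(summary['total_points'] > threshold) & ~summary['has_a']]
count = len(qualifying)
print(count)
```

Here student 1 scores above the threshold but has an A, student 3 falls below it, and only student 2 qualifies.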

In [15]:
df2 = df2[['Avg student total points per grade', 'Total Points per Student','Grade','Points','Subject','Score','Student ID']]

In [16]:
df2.head(10)

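| | Avg student total points per grade | Total Points per Student | Grade | Points | Subject | Score | Student ID |
|---|---|---|---|---|---|---|---|
| 3 | 31.09 | 44 | F | 1 | Maths | 65 | 4 |
| 4 | 38.34 | 52 | B | 8 | Maths | 86 | 5 |
| 7 | 36.19 | 42 | C | 6 | Maths | 82 | 8 |
| 10 | 31.09 | 45 | F | 1 | Maths | 61 | 11 |
| 13 | 31.09 | 45 | F | 1 | Maths | 63 | 14 |
| 16 | 32.72 | 42 | E | 2 | Maths | 70 | 17 |
| 21 | 38.34 | 43 | B | 8 | Maths | 91 | 22 |
| 22 | 34.24 | 52 | D | 4 | Maths | 78 | 23 |
| 24 | 38.34 | 47 | B | 8 | Maths | 90 | 25 |
| 41 | 38.34 | 51 | B | 8 | Maths | 90 | 42 |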


In [17]:
df2.to_csv('wk5-output.csv', index=False)