# Lab 4 - Oral Flag Feedback

Most of you are probably familiar with Dr. Malone's oral flag presentation review website. The file `OralFlag_example.csv` contains an example table dump from Dr. Malone's associated database. Your job is to make a number of summary files, as described below.

## <font color="red"> Problem 1 - Read and Inspect the File</font>

Read the file `OralFlag_example.csv` and inspect the columns, looking for any possible errors.  Note that the file is missing a header, so I have provided the column labels below.

```python
import pandas as pd
from dfply import *
```

```python
oral_flag_columns = ['Id', 'Time', 'Term', 'Submission', 'Group', 'Reviewer',
                     'Knowledge_of_Subject', 'Clear_Outcomes', 'Organization', 'Delivery',
                     'Comments_Most_Effective', 'Comments_Improvements']
```

```python
# The file has no header row, so supply the column names explicitly
df = pd.read_csv("./data/OralFlag_example.csv", header=None, names=oral_flag_columns)
```

```python
# Replace the offensive or malformed comment strings found while inspecting the file
cleanup_comments = {
    "Comments_Most_Effective": {"Silas is dumb": "Silas is a dork", "Iverson >> Bergen": "Iverson > Bergen"},
    "Comments_Improvements": {"Silas is dumb": "Silas is a dork", "Iverson >> Bergen": "Iverson > Bergen"},
}
df = df.replace(cleanup_comments)
df.head(20)
```

| | Id | Time | Term | Submission | Group | Reviewer | Knowledge_of_Subject | Clear_Outcomes | Organization | Delivery | Comments_Most_Effective | Comments_Improvements |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 00:00:00.000000000 | Fall2018 | 1 | 1 | Chris | 3 | 2 | 1 | 3 | I hate R almost as much as Excel!!1! | I hate R almost as much as Excel!!1! |
| 1 | 2 | 2011-01-01 00:14:32.727272704 | Fall2018 | 1 | 1 | Brant | 2 | 3 | 3 | 2 | Silas looks young enough to be a student | Silas looks young enough to be a student |
| 2 | 3 | 2011-01-01 00:29:05.454545408 | Fall2018 | 1 | 1 | April | 3 | 1 | 3 | 1 | I hate R almost as much as Excel!!1! | I hate R almost as much as Excel!!1! |
| 3 | 4 | 2011-01-01 00:43:38.181818112 | Fall2018 | 1 | 1 | Tisha | 2 | 1 | 2 | 3 | Silas is a dork | Silas is a dork |
| 4 | 5 | 2011-01-01 00:58:10.909090816 | Fall2018 | 1 | 1 | Todd | 3 | 1 | 1 | 1 | Iverson > Bergen | Iverson > Bergen |
| 5 | 6 | 2011-01-01 01:12:43.636363520 | Fall2018 | 1 | 1 | Silas | 2 | 2 | 2 | 1 | I hate R almost as much as Excel!!1! | I hate R almost as much as Excel!!1! |
| 6 | 7 | 2011-01-01 01:27:16.363636480 | Fall2018 | 1 | 1 | Jeff | 2 | 2 | 3 | 2 | Python is the best | I hate R almost as much as Excel!!1! |
| 7 | 8 | 2011-01-01 01:41:49.090909184 | Fall2018 | 1 | 1 | Nicole | 3 | 2 | 1 | 3 | Iverson > Bergen | Silas looks young enough to be a student |
| 8 | 9 | 2011-01-01 01:56:21.818181888 | Fall2018 | 1 | 1 | Jake | 1 | 2 | 1 | 3 | Python is the best | I hate R almost as much as Excel!!1! |
| 9 | 10 | 2011-01-01 02:10:54.545454592 | Fall2018 | 1 | 1 | Sam | 1 | 2 | 2 | 3 | Iverson > Bergen | Python is the best |
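Replacing the two comment strings covers the comment columns, but "inspect the columns, looking for any possible errors" can be made more systematic. The sketch below collects a few checks into one dictionary: missing values, repeated `Id`s, scores outside an assumed valid range, unparseable `Time` strings together with the span of parseable ones, and the count of every distinct comment. It is a minimal sketch under stated assumptions: `inspect_oral_flag`, `SCORE_COLS` and `COMMENT_COLS` are hypothetical names, and the valid score range of 1 to 4 is an assumption (the Problem 2 example uses scores up to 4, and the rows shown use 1 to 3). As an example of what `time_span` can surface, the timestamps shown above all fall on 2011-01-01 even though `Term` is `Fall2018`.

```python
import pandas as pd

SCORE_COLS = ["Knowledge_of_Subject", "Clear_Outcomes", "Organization", "Delivery"]
COMMENT_COLS = ["Comments_Most_Effective", "Comments_Improvements"]


def inspect_oral_flag(df, valid_scores=(1, 2, 3, 4)):
    """Collect simple data-quality checks into a dict (hypothetical helper).

    valid_scores is an ASSUMED scale; adjust it to the real rubric.
    """
    # Strings that cannot be parsed as datetimes become NaT
    times = pd.to_datetime(df["Time"], errors="coerce")
    # Stack both comment columns into one Series, ignoring missing comments
    comments = pd.concat([df[c] for c in COMMENT_COLS]).dropna()
    return {
        # Number of missing values in each column
        "missing": df.isna().sum().to_dict(),
        # Id values that repeat an earlier row's Id
        "duplicate_ids": int(df["Id"].duplicated().sum()),
        # Score cells outside the assumed valid range, listed per column
        "bad_scores": {c: df.loc[~df[c].isin(valid_scores), c].tolist() for c in SCORE_COLS},
        # Non-missing Time strings that failed to parse
        "unparseable_times": int(times.isna().sum() - df["Time"].isna().sum()),
        # Earliest and latest parseable time stamp
        "time_span": (times.min(), times.max()),
        # Each distinct comment and how often it occurs, to spot offensive or garbled text
        "comment_counts": comments.value_counts().to_dict(),
    }
```

Running `inspect_oral_flag(df)` on the frame straight from `read_csv`, before the replacement cell, would list strings such as `"Silas is dumb"` and `"Iverson >> Bergen"` among the `comment_counts` keys if they occur in the file.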


## <font color="red"> Problem 2 - Scoring Table</font>

First, you need to make a scoring table that gives the average of each category's scores for each group, along with the final score for that group. The final (weighted) score is the average of the instructor's total score (assume the instructor is `"Todd"`) and the average total score of the remaining reviewers.

For example, consider the following scores; the computation of the overall score is illustrated below. Make sure your final table has the following columns: `Group`, `Knowledge_of_Subject_mean`, `Clear_Outcomes_mean`, `Organization_mean`, `Delivery_mean`, `overall_score`.

**For full credit, do this in one pipe.**

```python
# Example of how to compute the weighted score
scores = pd.DataFrame({'Reviewer':["Todd", "Chris", "Silas", "Tisha"],
                       'Knowledge_of_Subject':[2, 3, 4, 4],
                       'Clear_Outcomes':      [3, 3, 2, 4],
                       'Organization':        [4, 3, 4, 4],
                       'Delivery':            [2, 3, 3, 4]})

# Each reviewer's total across the four categories
silas_score = 4+2+4+3
chris_score = 3+3+3+3
tisha_score = 4+4+4+4
todd_score = 2+3+4+2

# Half the instructor's total plus half the average of the other reviewers' totals
overall_score = round(0.5*todd_score + 0.5*(silas_score + chris_score + tisha_score)/3, 2)
overall_score
```

12.33
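The same arithmetic written as a formula may help when generalizing to every group. The symbols are introduced here for illustration: $T_r$ is reviewer $r$'s total over the four categories, and $R$ is the set of non-instructor reviewers (here Chris, Silas and Tisha).

$$
\text{overall\_score} = \frac{1}{2}\,T_{\text{Todd}} + \frac{1}{2}\cdot\frac{1}{|R|}\sum_{r \in R} T_r
= \frac{1}{2}(11) + \frac{1}{2}\cdot\frac{13 + 12 + 16}{3}
= 5.5 + 6.8\overline{3} \approx 12.33
$$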

```python
# Weighted scoring table in one dfply pipe
(df
 >> select(X.Submission, X.Group, X.Reviewer, X.Knowledge_of_Subject,
           X.Clear_Outcomes, X.Organization, X.Delivery)
 # long format: one row per reviewer per category
 >> gather("category", "score", columns_from('Knowledge_of_Subject'))
 # flag the instructor's reviews
 >> mutate(instructor=if_else(X.Reviewer == 'Todd', 'instructor', 'other'))
 # mean score per category for the instructor and for the other reviewers
 >> group_by(X.Submission, X.Group, X.category, X.instructor)
 >> summarize(average_parts=X.score.mean())
 >> ungroup()
 # average the two role means, so the instructor carries half the weight
 >> group_by(X.Submission, X.Group, X.category)
 >> summarize(final_score=X.average_parts.mean())
 # back to one row per group with one column per category
 >> spread(X.category, X.final_score)
 # sum of the weighted category scores
 >> mutate(Overall_score=X.Delivery + X.Clear_Outcomes + X.Knowledge_of_Subject + X.Organization)
)
```

| | Group | Submission | Clear_Outcomes | Delivery | Knowledge_of_Subject | Organization | Overall_score |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1.444444 | 1.666667 | 2.555556 | 1.5 | 7.166667 |
| 1 | 2 | 1 | 1.777778 | 2.055556 | 2.666667 | 1.833333 | 8.333333 |
| 2 | 3 | 1 | 1.944444 | 1.944444 | 2.5 | 2.055556 | 8.444444 |
| 3 | 4 | 1 | 1.222222 | 2.055556 | 1.666667 | 2.444444 | 7.388889 |
| 4 | 5 | 1 | 2.0 | 2.055556 | 2.111111 | 2.111111 | 8.277778 |
| 5 | 6 | 1 | 2.055556 | 2.055556 | 1.5 | 1.666667 | 7.277778 |
| 6 | 7 | 1 | 1.5 | 1.666667 | 2.166667 | 2.333333 | 7.666667 |
| 7 | 8 | 1 | 2.5 | 1.5 | 2.444444 | 1.944444 | 8.388889 |
| 8 | 9 | 1 | 1.333333 | 2.5 | 2.333333 | 1.777778 | 7.944444 |
| 9 | 10 | 1 | 2.166667 | 1.277778 | 2.611111 | 1.444444 | 7.5 |
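The `Overall_score` above already matches the weighted formula, provided every group has both instructor and non-instructor reviews: each category column is half the instructor's score plus half the other reviewers' mean, and summing those over the four categories gives half the instructor's total plus half the mean of the other totals. What differs from the requested table is the column names, the missing rounding, and the per-category columns, which here are instructor-weighted. Assuming the `_mean` columns should be plain averages over all reviewers, the sketch below builds the requested table in a single chain. It uses plain pandas method chaining instead of dfply, and `scoring_table` and `SCORE_COLS` are hypothetical names.

```python
import numpy as np
import pandas as pd

SCORE_COLS = ["Knowledge_of_Subject", "Clear_Outcomes", "Organization", "Delivery"]


def scoring_table(df, instructor="Todd"):
    """Plain per-category means plus the weighted overall score, in one chain.

    Assumes every group has at least one instructor review and one other review.
    """
    return (
        df.assign(
            # Each reviewer's total across the four categories
            total=lambda d: d[SCORE_COLS].sum(axis=1),
            is_instr=lambda d: d["Reviewer"] == instructor,
            # How many instructor / other reviews each row's group has
            n_instr=lambda d: d["is_instr"].groupby(d["Group"]).transform("sum"),
            n_other=lambda d: (~d["is_instr"]).groupby(d["Group"]).transform("sum"),
            # Instructor reviews share a weight of 1/2, the other reviews share the other 1/2
            weight=lambda d: np.where(d["is_instr"], 0.5 / d["n_instr"], 0.5 / d["n_other"]),
            weighted_total=lambda d: d["total"] * d["weight"],
        )
        .groupby("Group")
        .agg(
            # Plain averages of each category over all reviewers in the group
            **{f"{c}_mean": (c, "mean") for c in SCORE_COLS},
            # Weighted sum = 0.5 * instructor total + 0.5 * mean of other totals
            overall_score=("weighted_total", "sum"),
        )
        .round({"overall_score": 2})
        .reset_index()
    )
```

Applied to the four-reviewer `scores` example with a `Group` column added, this sketch gives an `overall_score` of 12.33, matching the hand computation. Spreading the weight across rows keeps everything in one `groupby`, at the cost of being less obvious than the two-stage averaging in the dfply pipe.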


## <font color="red"> Problem 3 - Comment files</font>

Your final task is to make a comment data frame for each group and write each one to a CSV file. Each file should have three columns, `Group`, `Comments_Most_Effective`, and `Comments_Needed_Improvements`, with the second and third columns containing all of the respective comments for that group.

To complete this task, you should:

1. Write a lambda function that takes the original table and a group number and returns a table containing the comments for that group.
2. Write a lambda function that takes the output from step 1 and reshapes the data so the comments from the same reviewer are on the same line.
3. Write a lambda function that composes the last two functions.
4. Write a for loop that constructs and writes out a table for each group. Recall that you can write a Pandas `df` to a CSV file using `df.to_csv('filename', index=False)`.

```python
# Keep only the two comment columns for one group
get_comments = lambda gr, df: (df >> filter_by(X.Group == gr) >> select(X.Comments_Most_Effective, X.Comments_Improvements))
```

```python
get_comments(1, df)
```

| | Comments_Most_Effective | Comments_Improvements |
|---|---|---|
| 0 | I hate R almost as much as Excel!!1! | I hate R almost as much as Excel!!1! |
| 1 | Silas looks young enough to be a student | Silas looks young enough to be a student |
| 2 | I hate R almost as much as Excel!!1! | I hate R almost as much as Excel!!1! |
| 3 | Silas is a dork | Silas is a dork |
| 4 | Iverson > Bergen | Iverson > Bergen |
| 5 | I hate R almost as much as Excel!!1! | I hate R almost as much as Excel!!1! |
| 6 | Python is the best | I hate R almost as much as Excel!!1! |
| 7 | Iverson > Bergen | Silas looks young enough to be a student |
| 8 | Python is the best | I hate R almost as much as Excel!!1! |
| 9 | Iverson > Bergen | Python is the best |


I'm not sure what else you want from the lambda functions. I have comments from the same reviewer on the same line already.

```python
# Write one comment file per group (groups 1 through 10)
for i in range(1, 11):
    comments = get_comments(i, df)
    comments.to_csv('./comments_{}.csv'.format(i), index=False)
```
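The `get_comments` above keeps only the two comment columns, so the files written by the loop have no `Group` column and keep the name `Comments_Improvements` rather than `Comments_Needed_Improvements`. Steps 2 and 3 have more to do if step 1 is read as returning the comments in long form (one row per reviewer per comment type); the plain-pandas sketch below follows that reading, which is an interpretation rather than necessarily what the assignment intends. `get_group_comments`, `reshape_comments`, `group_comments` and `write_comment_files` are hypothetical names, and `Id` is assumed to be unique per row so that `pivot` finds exactly one comment per cell.

```python
import os

import pandas as pd

ID_COLS = ["Id", "Group", "Reviewer"]
COMMENT_COLS = ["Comments_Most_Effective", "Comments_Improvements"]

# Step 1: one group's comments in long form (one row per reviewer per comment type)
get_group_comments = lambda df, group: (
    df.loc[df["Group"] == group, ID_COLS + COMMENT_COLS]
    .melt(id_vars=ID_COLS, var_name="comment_type", value_name="comment")
)

# Step 2: spread back out so each reviewer's two comments share one line,
# then keep the three required columns under the required names
reshape_comments = lambda long_df: (
    long_df.pivot(index=ID_COLS, columns="comment_type", values="comment")
    .reset_index()
    .rename_axis(columns=None)
    .rename(columns={"Comments_Improvements": "Comments_Needed_Improvements"})
    .loc[:, ["Group", "Comments_Most_Effective", "Comments_Needed_Improvements"]]
)

# Step 3: compose steps 1 and 2
group_comments = lambda df, group: reshape_comments(get_group_comments(df, group))


# Step 4: one CSV per distinct Group value, reusing the file-name pattern from the loop above
def write_comment_files(df, out_dir="."):
    paths = []
    for group in sorted(df["Group"].unique()):
        path = os.path.join(out_dir, f"comments_{group}.csv")
        group_comments(df, group).to_csv(path, index=False)
        paths.append(path)
    return paths
```

Calling `write_comment_files(df)` would write one `comments_<group>.csv` per group into the current directory. Looping over the groups actually present in the data avoids hard-coding `range(1, 11)`.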