# Lab 4 - Oral Flag Feedback

Most of you are probably familiar with Dr. Malone's oral flag presentation review website.  The file `OralFlag_example.csv` contains an example table dump from Dr. Malone's associated database.  Your job is to make a number of summary files, as described below.

## <font color="red"> Problem 1 - Read and Inspect the File</font>

Read the file `OralFlag_example.csv` and inspect the columns, looking for any possible errors.  Note that the file is missing a header, so I have provided the column labels below.

In [1]:
%reset -f
import pandas as pd
from dfply import *
oral_flag_columns = ['Id', 'Time', 'Term', 'Submission', 'Group', 'Reviewer',
                     'Knowledge_of_Subject', 'Clear_Outcomes', 'Organization', 'Delivery',
                     'Comments_Most_Effective', 'Comments_Improvements']

In [2]:
oralFlag = pd.read_csv("Data/OralFlag_example.csv",names = oral_flag_columns)
oralFlag.head()

Unnamed: 0,Id,Time,Term,Submission,Group,Reviewer,Knowledge_of_Subject,Clear_Outcomes,Organization,Delivery,Comments_Most_Effective,Comments_Improvements
0,1,2011-01-01 00:00:00.000000000,Fall2018,1,1,Chris,3,2,1,3,I hate R almost as much as Excel!!1!,I hate R almost as much as Excel!!1!
1,2,2011-01-01 00:14:32.727272704,Fall2018,1,1,Brant,2,3,3,2,Silas looks young enough to be a student,Silas looks young enough to be a student
2,3,2011-01-01 00:29:05.454545408,Fall2018,1,1,April,3,1,3,1,I hate R almost as much as Excel!!1!,I hate R almost as much as Excel!!1!
3,4,2011-01-01 00:43:38.181818112,Fall2018,1,1,Tisha,2,1,2,3,Silas is dumb,Silas is dumb
4,5,2011-01-01 00:58:10.909090816,Fall2018,1,1,Todd,3,1,1,1,Iverson >> Bergen,Iverson >> Bergen


In [3]:
oralFlag["Comments_Most_Effective"] = oralFlag["Comments_Most_Effective"].replace("Silas is dumb", "Silas is Dork")


In [4]:
oralFlag["Comments_Most_Effective"].head()

0        I hate R almost as much as Excel!!1!
1    Silas looks young enough to be a student
2        I hate R almost as much as Excel!!1!
3                               Silas is Dork
4                           Iverson >> Bergen
Name: Comments_Most_Effective, dtype: object
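
Beyond `head()`, a few quick checks can help when looking for possible errors in the columns. The following is only a minimal sketch: the `sample` frame and its rows are invented for illustration (they are not taken from `OralFlag_example.csv`), and only a subset of the column names is reused.

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data come from OralFlag_example.csv.
sample = pd.DataFrame({
    "Group": [1, 1, 2, 2],
    "Reviewer": ["Todd", "Chris", "Todd", "chris"],
    "Knowledge_of_Subject": [3, 2, 4, 7],
    "Delivery": [1, 3, None, 2],
})

score_cols = ["Knowledge_of_Subject", "Delivery"]

# Column types: a score column read as object usually signals stray text.
print(sample.dtypes)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Smallest and largest value per score column; anything outside the
# scale you expect stands out here.
print(sample[score_cols].agg(["min", "max"]))

# Distinct reviewer names; inconsistent spellings show up as extra entries.
reviewer_counts = sample["Reviewer"].value_counts()
print(reviewer_counts)
```

The same calls can be run on `oralFlag` once it has been read in.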

## <font color="red"> Problem 2 - Scoring Table</font>

First, you need to make a scoring table, which provides an average of the scores for each group, along with the overall score for that group.  The overall score is a weighted score: the average of the instructor's total score (assume the instructor is `"Todd"`) and the average total score of the rest of the reviewers.

For example, consider the following scores.  The computation of the overall score is illustrated below.  Make sure your final table has the following columns: `Group`, `Knowledge_of_Subject_mean`, `Clear_Outcomes_mean`,  `Organization_mean`, `Delivery_mean`, `overall_score`

**For full credit, do this in one pipe.**

In [5]:
# Example of how to compute the weighted score
import pandas as pd
scores = pd.DataFrame({'Reviewer':["Todd", "Chris", "Silas", "Tisha"],
                       'Knowledge_of_Subject':[2,3,4, 4],
                       'Clear_Outcomes':      [3,3,2, 4],  
                       'Organization':        [4,3,4, 4], 
                       'Delivery':            [2,3,3, 4]})

silas_score = 4+2+4+3
chris_score = 3+3+3+3
tisha_score = 4+4+4+4
todd_score = 2+3+4+2

overall_score = round(0.5*todd_score + 0.5*(silas_score + chris_score + tisha_score)/3, 2)
scores

Unnamed: 0,Reviewer,Knowledge_of_Subject,Clear_Outcomes,Organization,Delivery
0,Todd,2,3,4,2
1,Chris,3,3,3,3
2,Silas,4,2,4,3
3,Tisha,4,4,4,4
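
The hand computation above can also be written without typing out each reviewer's total. This is a sketch in plain pandas (not dfply), reusing the example `scores` table; the names `criteria`, `totals`, `instructor_total`, and `peer_mean` are introduced here for illustration.

```python
import pandas as pd

scores = pd.DataFrame({'Reviewer': ["Todd", "Chris", "Silas", "Tisha"],
                       'Knowledge_of_Subject': [2, 3, 4, 4],
                       'Clear_Outcomes':       [3, 3, 2, 4],
                       'Organization':         [4, 3, 4, 4],
                       'Delivery':             [2, 3, 3, 4]})

criteria = ['Knowledge_of_Subject', 'Clear_Outcomes', 'Organization', 'Delivery']

# Total score per reviewer (sum across the four criteria).
totals = scores.set_index('Reviewer')[criteria].sum(axis=1)

# Instructor total and the mean total of everyone else.
instructor_total = totals['Todd']
peer_mean = totals.drop('Todd').mean()

# Equal weight on the instructor and on the peer average.
overall_score = round(0.5 * instructor_total + 0.5 * peer_mean, 2)
print(overall_score)
```

For these four reviewers the totals are 11 (Todd), 12 (Chris), 13 (Silas), and 16 (Tisha), so the result agrees with the manual calculation.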


In [6]:
cols_to_gather = ["Group","Knowledge_of_Subject","Clear_Outcomes","Organization","Delivery"]
oralFlag_stacked = (oralFlag 
                 >> select("Group","Knowledge_of_Subject","Clear_Outcomes","Organization","Delivery","Reviewer"))
oralFlag_stacked >> head

Unnamed: 0,Group,Knowledge_of_Subject,Clear_Outcomes,Organization,Delivery,Reviewer
0,1,3,2,1,3,Chris
1,1,2,3,3,2,Brant
2,1,3,1,3,1,April
3,1,2,1,2,3,Tisha
4,1,3,1,1,1,Todd


In [10]:
(oralFlag_stacked
 >> gather('Criteria', 'Scores', columns_between('Knowledge_of_Subject', 'Delivery'))
 >> group_by(X.Group, X.Reviewer, X.Criteria)
 >> summarize(Scores=X.Scores.sum())
 >> spread(X.Criteria, X.Scores))
#                     >> mutate(Score = X.Clear_Outcomes + X.Delivery + X.Knowledge_of_Subject + X.Organization)
#                     >> drop(X.Clear_Outcomes , X.Delivery , X.Knowledge_of_Subject , X.Organization)
#                         >>spread(X.Reviewer,X.Score)
#                     >> mutate(average_score = (X.April + X.Brant + X.Chris + X.Jake + X.Jeff + 
#                                                    X.Nicole + X.Sam + X.Silas + X.Tisha + X.Todd)/10,
#                                          final_score = 0.5*X.Todd + 0.5*(X.April + X.Brant + X.Chris + X.Jeff+
#                                                         X.Jake + X.Nicole + X.Sam + X.Silas + X.Tisha)/9)
#                            >> gather('Reviewer','Scores',columns_between('April','Todd'))
#                              >> drop(X.Scores,X.Reviewer)
#                             >> group_by(X.Group)
#                             >>summarize(average_score = X.average_score.mean(), final_score = X.final_score.mean())
#                     >> mutate(final_score = X.final_score.round(2))
# )



Unnamed: 0,Reviewer,Group,Clear_Outcomes,Delivery,Knowledge_of_Subject,Organization
0,April,1,1,1,3,3
1,April,2,2,1,3,1
2,April,3,3,1,1,3
3,April,4,2,2,2,1
4,April,5,3,2,3,2
5,April,6,1,1,2,3
6,April,7,1,3,2,3
7,April,8,2,3,3,1
8,April,9,2,3,1,2
9,April,10,1,1,2,2
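
The table above is still one row per reviewer and group, so it does not yet have the columns Problem 2 asks for. One possible route to the full scoring table is sketched below in plain pandas rather than dfply. The `reviews` data are invented, and `scoring_table` and the `role` column are hypothetical helper names, so treat this as an outline under those assumptions rather than the expected solution.

```python
import pandas as pd

criteria = ["Knowledge_of_Subject", "Clear_Outcomes", "Organization", "Delivery"]

# Hypothetical reviews for two groups (invented numbers, same column names as oralFlag).
reviews = pd.DataFrame({
    "Group":    [1, 1, 1, 2, 2, 2],
    "Reviewer": ["Todd", "Chris", "Silas", "Todd", "Chris", "Silas"],
    "Knowledge_of_Subject": [2, 3, 4, 4, 2, 2],
    "Clear_Outcomes":       [3, 3, 2, 4, 3, 1],
    "Organization":         [4, 3, 4, 4, 2, 2],
    "Delivery":             [2, 3, 3, 4, 1, 3],
})

def scoring_table(df, instructor="Todd"):
    """Per-group criterion means plus the weighted overall score."""
    # Total per review, and a label separating the instructor from the peers.
    with_totals = df.assign(
        total=df[criteria].sum(axis=1),
        role=df["Reviewer"].eq(instructor).map({True: "instructor", False: "peers"}),
    )
    # Mean of each criterion within each group, with the required suffix.
    means = with_totals.groupby("Group")[criteria].mean().add_suffix("_mean")
    # Mean total per group and role, one column per role.
    by_role = with_totals.groupby(["Group", "role"])["total"].mean().unstack()
    overall = (0.5 * by_role["instructor"] + 0.5 * by_role["peers"]).round(2)
    return means.assign(overall_score=overall).reset_index()

table = scoring_table(reviews)
print(table)
```

Splitting the totals by role before averaging keeps the instructor's weight at one half no matter how many peer reviewers a group has.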


## <font color="red"> Problem 3 - Comment Files</font>

Your final task is to make a comment data frame for each group and write each one to a CSV file.  Each file should have three columns: `Group`, `Comments_Most_Effective`, and `Comments_Needed_Improvements`, with the second and third columns containing all the respective comments for that group.

To complete this task, you should

1. Write a lambda function that takes the original table and a group number and returns a table containing the comments for that group.
2. Write a lambda function that takes the output from 1. and reshapes the data so the comments from the same reviewer are on the same line.
3. Write a lambda function that composes the last two functions.
4. Write a for loop that constructs and writes out a table for each group.  Recall that you can write a Pandas `df` to a csv file using `df.to_csv('filename', index=False)`.
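
The four steps can be sketched as follows, again in plain pandas instead of dfply. Everything here is illustrative: the `reviews` rows are invented, the names `group_rows`, `comment_table`, `group_comments`, and the `comments_group_{}.csv` file pattern are hypothetical, and step 2 assumes the table already holds one row per reviewer, so it only selects and renames columns.

```python
import os
import tempfile
import pandas as pd

# Hypothetical rows for illustration only.
reviews = pd.DataFrame({
    "Group": [1, 1, 2],
    "Reviewer": ["Todd", "Chris", "Todd"],
    "Comments_Most_Effective": ["Clear slides", "Good pacing", "Strong opening"],
    "Comments_Improvements": ["Speak louder", "Fewer equations", "Label the axes"],
})

# Step 1: keep only the rows for one group.
group_rows = lambda df, group: df[df["Group"] == group]

# Step 2: keep the three output columns, one row per reviewer, and rename
# the improvements column to the name the task asks for.
comment_table = lambda df: (
    df[["Group", "Comments_Most_Effective", "Comments_Improvements"]]
    .rename(columns={"Comments_Improvements": "Comments_Needed_Improvements"})
    .reset_index(drop=True)
)

# Step 3: compose the two functions.
group_comments = lambda df, group: comment_table(group_rows(df, group))

# Step 4: write one CSV per group (into a temporary directory for this demo).
out_dir = tempfile.mkdtemp()
written = []
for group in sorted(reviews["Group"].unique()):
    path = os.path.join(out_dir, "comments_group_{}.csv".format(group))
    group_comments(reviews, group).to_csv(path, index=False)
    written.append(path)

print(written)
```

Looping over the group values found in the data, rather than a fixed range, avoids writing empty files when a group number is missing.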

In [8]:
# Select the comment columns, then keep only the rows for the requested group.
comments = lambda df, group: (df
                              >> select(X.Group, X.Comments_Most_Effective, X.Comments_Improvements, X.Reviewer)
                              >> filter_by(X.Group == group))
comments(oralFlag,2)

Unnamed: 0,Group,Comments_Most_Effective,Comments_Improvements,Reviewer
10,2,Iverson >> Bergen,Silas looks young enough to be a student,Chris
11,2,I hate R almost as much as Excel!!1!,I hate R almost as much as Excel!!1!,Brant
12,2,Iverson >> Bergen,I hate R almost as much as Excel!!1!,April
13,2,Silas is Dork,Silas looks young enough to be a student,Tisha
14,2,I hate R almost as much as Excel!!1!,Iverson >> Bergen,Todd
15,2,I hate R almost as much as Excel!!1!,Silas is dumb,Silas
16,2,Iverson >> Bergen,Python is the best,Jeff
17,2,Silas is Dork,I hate R almost as much as Excel!!1!,Nicole
18,2,Silas looks young enough to be a student,Silas is dumb,Jake
19,2,Python is the best,Silas looks young enough to be a student,Sam


In [9]:
# to_csv returns None when given a file name, so there is nothing to collect;
# a plain for loop writes one file per group.
for num in range(1, 11):
    comments(oralFlag, num).to_csv('Dataframe{}.csv'.format(num), index=False)