# Lab 7 - Summarizing a Health Survey

## Background

The file `health_survey.csv` contains the responses to a series of health-related questions. Dr. Bergen, Director of the Statistical Consulting Center at WSU, needs you to prepare the attached data for analysis. Please perform the following steps to prepare the required CSV file.

After a follow-up meeting between Dr. Bergen and his client, it was determined that we need to redo the file construction from an earlier assignment. We need to code the responses as 1-5 using the definitions below. Some of the columns need reverse coding (see the *Needs Reverse Coding?* column in `ReverseCodingItems.csv`).

The following table describes the coding that should be used for both types of questions.

|Old Label                     |New Coded Value  |Reverse Coding
|------------------------------|-----------------|----------------
|"Strongly Disagree"           |1                |5
|"Somewhat Disagree"           |2                |4
|"Neither Agree nor Disagree"  |3                |3
|"Somewhat Agree"              |4                |2
|"Strongly Agree"              |5                |1
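
The table amounts to a lookup plus one arithmetic rule: on a 1-5 scale, the reverse-coded value is 6 minus the coded value. A minimal sketch in plain Python follows; the names `LIKERT_CODES` and `code_response` are illustrative and do not come from the assignment files.

```python
# Hypothetical helper names; the labels and values come from the table above.
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Somewhat Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Somewhat Agree": 4,
    "Strongly Agree": 5,
}


def code_response(label, reverse=False):
    """Return the 1-5 code for a label, mirrored around 3 when reverse is True."""
    value = LIKERT_CODES[label]
    return 6 - value if reverse else value


print(code_response("Somewhat Agree"))                # 4
print(code_response("Somewhat Agree", reverse=True))  # 2
```

Because the rule is `6 - value`, the neutral response stays at 3 under both codings, which matches the middle row of the table.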




## Tasks 

#### Task 1  

Look at the questions that need reverse coding and explain why it makes sense to reverse the coding on these items.

In [1]:
import pandas as pd
from dfply import *

In [2]:
survey = pd.read_csv("./data/health_survey.csv")
survey.head()

Unnamed: 0.1,Unnamed: 0,F1,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,...,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
0,1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree
2,3,Strongly Agree,Neither Agree nor Disagree,Somewhat Agree,Strongly Agree,Strongly Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Strongly Disagree,Somewhat Agree
3,4,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Agree,Strongly Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Disagree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Disagree,Somewhat Agree
4,5,Strongly Agree,Strongly Disagree,Neither Agree nor Disagree,Strongly Agree,Somewhat Agree,Strongly Disagree,Strongly Agree,Somewhat Agree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Disagree,Somewhat Agree


In [3]:
rc = pd.read_csv("./data/ReverseCodingItems.csv")
rc.head()

Unnamed: 0,Question,Construct,Question # on Qualtrics Survey,Needs Reverse Coding?,Column Name
0,"In the future, I plan to participate in a comm...",1,1,No,F1
1,Individuals are responsible for their own misf...,5,2,Yes,F5
2,When tryng to understand the position of other...,2,3,No,F2
3,I plan to become involved in my community,1,4,No,F1.1
4,I can communicate well with others,2,5,No,F2.1


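Reverse coding is needed when an item is worded negatively, so that "Strongly Agree" marks the undesirable end of the construct and "Strongly Disagree" the desirable end. Reversing those items (for example `F5`, which is flagged `Yes` above) puts every question on the same scale: a higher coded value always means more of the construct, so the items within a construct can be averaged together.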

#### Task 2 

Redo the file construction, this time taking the reverse coding into account. **For each step, paste a screenshot of the JMP dialog or formula associated with the outcome.**

1.  *Stack* the columns.

![](img/survey/image1.png)

In [4]:
survey_stack = (survey >>
               gather("Column Name","Response",columns_from(X.F1)))
survey_stack.head()

Unnamed: 0.1,Unnamed: 0,Column Name,Response
0,1,F1,Somewhat Agree
1,2,F1,Somewhat Agree
2,3,F1,Strongly Agree
3,4,F1,Somewhat Agree
4,5,F1,Strongly Agree


2.  Read in and join `ReverseCodingItems.csv` to add a new column called `NeedsReverse` to the health survey dataframe.

![](img/survey/image2.png)

In [18]:
survey_join = (survey_stack 
               >> left_join(rc, by='Column Name')
               >> select("Column Name", X.Response, "Needs Reverse Coding?","Unnamed: 0",X.Construct)
              )
survey_join.columns = ["Question","Response","Needs_reverse_coding","Index","Construct"]
survey_join.head()

Unnamed: 0,Question,Response,Needs_reverse_coding,Index,Construct
0,F1,Somewhat Agree,No,1,1
1,F1,Somewhat Agree,No,2,1
2,F1,Strongly Agree,No,3,1
3,F1,Somewhat Agree,No,4,1
4,F1,Strongly Agree,No,5,1
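
For readers without `dfply`, the stack and join steps above can be sketched in plain pandas, where `melt` plays the role of `gather` and `merge(how="left")` the role of `left_join`. The two small frames below are made-up stand-ins for the real files, not their actual contents.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (three participants, two items).
survey = pd.DataFrame({
    "Unnamed: 0": [1, 2, 3],
    "F1": ["Somewhat Agree", "Strongly Agree", "Somewhat Disagree"],
    "F5": ["Somewhat Disagree", "Strongly Disagree", "Somewhat Agree"],
})
rc = pd.DataFrame({
    "Column Name": ["F1", "F5"],
    "Construct": [1, 5],
    "Needs Reverse Coding?": ["No", "Yes"],
})

# Step 1: stack every item column into (Column Name, Response) pairs,
# keeping the participant identifier on each row.
stacked = survey.melt(
    id_vars=["Unnamed: 0"], var_name="Column Name", value_name="Response"
)

# Step 2: left join the lookup table. validate="many_to_one" raises an
# error if a column name appears more than once in rc.
joined = stacked.merge(rc, on="Column Name", how="left", validate="many_to_one")
print(joined)
```

The `validate` argument is optional; it is included here as a guard against a duplicated row in the lookup table silently duplicating responses.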


3.  Make a new column called `TempCodedValue` by recoding the `Question`s column.

![](img/survey/image3.png)

In [26]:

survey_join1 = (survey_join
               >> mutate(Coded = if_else(X.Response == "Strongly Agree",5,
                                         if_else(X.Response == "Somewhat Agree",4,
                                                if_else(X.Response == "Neither Agree nor Disagree",3,
                                                       if_else(X.Response == "Somewhat Disagree",2,
                                                              if_else(X.Response == "Strongly Disagree",1,0)
                                                              )
                                                       )
                                                )
                                        )
                        )
               >> mutate(Final_code = if_else(X.Needs_reverse_coding=="Yes",6-X.Coded,X.Coded))
               >> select(X.Question,X.Final_code,X.Index,X.Construct)
               >> group_by(X.Construct,X.Index)
               >> summarize(Mean_score = X.Final_code.mean())
              )
              
survey_join1


Unnamed: 0,Index,Construct,Mean_score
0,1,1,3.875
1,2,1,3.875
2,3,1,4.500
3,4,1,4.000
4,5,1,4.625
5,6,1,4.500
6,7,1,4.500
7,8,1,4.875
8,9,1,4.500
9,10,1,4.375
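
One caveat about the nested `if_else` above: a response that matches none of the five labels (a blank, for instance) is coded 0, and a reverse-coded item would then become 6. As a sketch of an alternative, assuming plain pandas rather than `dfply`, `Series.map` leaves unmatched responses as `NaN`, which `mean()` skips by default. The toy frame is invented for illustration.

```python
import pandas as pd

codes = {
    "Strongly Disagree": 1,
    "Somewhat Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Somewhat Agree": 4,
    "Strongly Agree": 5,
}

# Made-up rows, including one missing response.
long = pd.DataFrame({
    "Response": ["Strongly Agree", "Somewhat Disagree", None, "Strongly Agree"],
    "Needs_reverse_coding": ["No", "Yes", "Yes", "Yes"],
})

# Labels not found in the dict (including None) become NaN instead of 0.
long["Coded"] = long["Response"].map(codes)

# Series.where keeps Coded where the condition holds and uses 6 - Coded elsewhere.
long["Final_code"] = long["Coded"].where(
    long["Needs_reverse_coding"] != "Yes", 6 - long["Coded"]
)
print(long)
```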


4.  Make a new column called `TempCodedValue` by recoding the `Question`s column.

![](img/survey/image4.png)

5.  Make a new column called `RecodedValue` that holds the correct
    value for each question based on the value in `NeedsReverse`.

![](img/survey/image5.png)

In [None]:
# Your code here

6.  Make a new column by *Recoding* the Question Types to *F1, F2, ..., F6*.

![](img/survey/image6.png)

In [None]:
# Your code here
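
The notebook leaves this cell empty. One possible approach, offered only as a sketch in plain pandas, derives the question type either from the `Construct` number or from the column name itself, since pandas appends `.1`, `.2`, ... to duplicate column names when reading a CSV. The `Question_type` column name and the toy rows are assumptions made for illustration.

```python
import pandas as pd

# Made-up rows shaped like the joined data.
items = pd.DataFrame({
    "Column Name": ["F1", "F5", "F1.1", "F2.11"],
    "Construct": [1, 5, 1, 2],
})

# Option A: build the label from the construct number.
items["Question_type"] = "F" + items["Construct"].astype(str)

# Option B: drop the ".n" suffix from the column name.
from_name = items["Column Name"].str.split(".").str[0]

print(items)
print(from_name.tolist())
```

In this toy data the two options agree, which is a useful consistency check to run on the real lookup table as well.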

7.  *Aggregate* and *Unstack*.

![](img/survey/image7.png)
![](img/survey/image8.png)

In [27]:
survey_unstack = (survey_join1 
                  >> spread(X.Construct,X.Mean_score))
survey_unstack.head()

Unnamed: 0,Index,1,2,3,4,5,6
0,1,3.875,4.0,3.333333,3.4,3.5,3.6
1,2,3.875,3.916667,3.166667,3.4,3.375,4.0
2,3,4.5,3.833333,3.166667,3.6,4.0,3.4
3,4,4.0,4.5,2.0,3.0,3.75,3.2
4,5,4.625,3.916667,3.666667,3.8,4.5,3.8
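
The aggregate and unstack step can also be written in plain pandas, with `groupby(...).mean()` standing in for `group_by` plus `summarize` and `unstack` standing in for `spread`. This is a sketch on invented numbers, not the survey data.

```python
import pandas as pd

# Made-up long-format scores: two participants, constructs 1 and 5.
long = pd.DataFrame({
    "Index": [1, 1, 1, 2, 2, 2],
    "Construct": [1, 1, 5, 1, 1, 5],
    "Final_code": [4, 5, 2, 3, 4, 4],
})

wide = (
    long.groupby(["Index", "Construct"])["Final_code"]
    .mean()                   # one mean per participant and construct
    .unstack("Construct")     # constructs become columns
    .add_prefix("F")          # 1 -> "F1", 5 -> "F5"
    .reset_index()
)
print(wide)
```

Using `add_prefix` names the columns from the construct numbers themselves, so the labels do not depend on the column order.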


#### Task 3

Repackage all of your code into one pipe, then write the final output to `health_survey_summary.csv`.


In [28]:
survey.columns = ["Participants",'F1', 'F5', 'F2', 'F1.1', 'F2.1', 'F6', 'F4', 'F3',
       'F5.1', 'F1.2', 'F2.2', 'F6.1', 'F2.3', 'F4.1', 'F2.4', 'F5.2', 'F2.5',
       'F6.2', 'F1.3', 'F2.6', 'F5.3', 'F4.2', 'F2.7', 'F3.1', 'F2.8', 'F5.4',
       'F3.2', 'F1.4', 'F3.3', 'F1.5', 'F5.5', 'F6.3', 'F1.6', 'F5.6', 'F2.9',
       'F3.4', 'F4.3', 'F2.10', 'F1.7', 'F6.4', 'F4.4', 'F5.7', 'F3.5',
       'F2.11']
rc.columns = ['Question', 'Construct', 'Question # on Qualtrics Survey',
       'Needs_reverse_coding', 'Column_name']

In [33]:
health_survey_summary = (survey 
                         >> gather("Column_name","Response",columns_from(X.F1))
                         >> left_join(rc, by='Column_name')
                         >> select(X.Column_name, X.Response, X.Needs_reverse_coding, X.Participants, X.Construct)
                         >> mutate(Coded = if_else(X.Response == "Strongly Agree",5,
                                         if_else(X.Response == "Somewhat Agree",4,
                                                if_else(X.Response == "Neither Agree nor Disagree",3,
                                                       if_else(X.Response == "Somewhat Disagree",2,
                                                              if_else(X.Response == "Strongly Disagree",1,0))))))
                         >> mutate(Final_code = if_else(X.Needs_reverse_coding=="Yes",6-X.Coded,X.Coded))
                         >> select(X.Column_name,X.Final_code,X.Participants, X.Construct)
                         >> group_by(X.Construct,X.Participants)
                         >> summarize(Mean_score = X.Final_code.mean())
                         >> spread(X.Construct,X.Mean_score)
                        )
health_survey_summary.columns = ["Participants",'F1', 'F2', 'F3', 'F4', 'F5', 'F6']
health_survey_summary.head()

Unnamed: 0,Participants,F1,F2,F3,F4,F5,F6
0,1,3.875,4.0,3.333333,3.4,3.5,3.6
1,2,3.875,3.916667,3.166667,3.4,3.375,4.0
2,3,4.5,3.833333,3.166667,3.6,4.0,3.4
3,4,4.0,4.5,2.0,3.0,3.75,3.2
4,5,4.625,3.916667,3.666667,3.8,4.5,3.8


In [34]:
health_survey_summary.to_csv('./health_survey_summary.csv', index=False)
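
The `index=False` argument matters here: without it, pandas writes the row index as an unnamed first column, which comes back as `Unnamed: 0` on the next read. A small round-trip sketch, using a temporary directory and made-up values, shows the difference.

```python
import os
import tempfile

import pandas as pd

# Made-up summary rows for illustration.
summary = pd.DataFrame({"Participants": [1, 2], "F1": [4.0, 3.5]})

with tempfile.TemporaryDirectory() as tmp:
    # Written without the index: the file has only the real columns.
    clean_path = os.path.join(tmp, "health_survey_summary.csv")
    summary.to_csv(clean_path, index=False)
    round_trip = pd.read_csv(clean_path)

    # Written with the default index=True: an extra unnamed column appears.
    indexed_path = os.path.join(tmp, "with_index.csv")
    summary.to_csv(indexed_path)
    with_index = pd.read_csv(indexed_path)

print(list(round_trip.columns))  # ['Participants', 'F1']
print(list(with_index.columns))  # ['Unnamed: 0', 'Participants', 'F1']
```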