## Open up the scrubbed data file
This notebeook opens up the file named 'responses_scrubbed_tagged.csv' and asks the user to tag each respondent's free text responses in order to be able to later quantify how many respondent's mentioned certain common complaints or suggestions.

If starting this tagging process from scratch for the first time, make sure to run the 'cleanup.ipnyb' script first, and then copy the 'responses_scrubbed.csv' file created by that script to a duplicate file named 'responses_scrubbed_tagged.csv' file.  This notebook will modify that latter file.

If you have already tagged responses previously, and wish to add newly gathered responses to the 'responses_scrubbed.csv' file and tag those, manually copy any new lines from the 'responses_scrubbed.csv' file into the existing 'responses_scrubbed_tagged.csv' file in order not to lose previous tagging.

In [1]:
import numpy as np
import pandas as pd
from IPython.display import display, Markdown, Latex, clear_output

# open the file... this should be a duplicate of the responses_scrubbed.csv generated by the cleanup script to start with
df = pd.read_csv('responses_scrubbed_tagged.csv', index_col=None, parse_dates=['date'])

#don't truncate results
pd.set_option('display.max_rows', None)

## Categorize every free text response from every respondent

In [2]:
# fields containing free text responses
fields = ['feelings', 'problem_areas', 'suggested_improvements', 'additional_comments', 'business_additional_comments', 'final_comments']

# what row index number to start from (useful if interrputed)
start_from = 555 #default to zero if starting from beginning

def get_unique_values_from_csv_field(column, df):
    combo_vals = df[column].tolist() # contains comma-separated sets of tags
    single_vals = [] # will contain single tags
    for combo in combo_vals:
        list_of_vals = str(combo).split(',') # break up comma-seprated values into single values
        list_of_vals = [val.strip() for val in list_of_vals] # remove whitespace
        single_vals.extend(list_of_vals) # add single values to the ultimate list
        
    # return the value counts
    return single_vals
    
# start with a blank list of tags
problem_tags = []
suggestion_tags = []

# count rows we loop through
counter = 0

# loop through every row in the data
for index, row in df.iterrows():
    
    #skip to starting row
    if counter < start_from:
        counter += 1
        continue

    # determine whether we have already tagged this respondent's free text responses
    my_problems = ''
    my_suggestions = ''
    try:
        my_problems = df.loc[index]['problem_tags']
        my_suggestions = df.loc[index]['suggestion_tags']
        problem_tags = get_unique_values_from_csv_field('problem_tags', df)
        suggestion_tags = get_unique_values_from_csv_field('suggestion_tags', df)
        #display('...got em')        
    except:
        # these columns must not exist yet, so create them
        df['problem_tags'] = ''
        df['suggestion_tags'] = ''
        df['problem_tags'] = df['problem_tags'].astype(str)
        df['suggestion_tags'] = df['suggestion_tags'].astype(str)
        #display('...created em')
    
    clear_output() # wipe the output clear    
    
    display(Markdown('## Response #{}'.format(index)))
    
    # display each row's free text fields:
    for field in fields:
        # display this person's response to this question if they gave one
        value = row[field]
        if not pd.isnull(row[field]):
            display(Markdown('### {} comment\n{}'.format(field, value)))
        
    # ask the user for tags
    display(Markdown('## Tag this response'))
    display(Markdown("Enter tags that represent this person's complaints and suggestions.  Re-use tags used in others' responses, where applicable."))
    
    # display commonly used tags
    display(Markdown('### Common complaints'))
    display(pd.Series(problem_tags).value_counts())

    display(Markdown('### Common suggestions'))
    display(pd.Series(suggestion_tags).value_counts())
    
    # allow editor to tag this response
    my_problems = input("Complaints (or hit Enter to skip): ").strip()
    my_suggestions = input("Suggestions (or hit Enter to skip): ").strip()
    
    # store the tags in this record
    df.at[index,'problem_tags'] = my_problems
    df.at[index,'suggestion_tags'] = my_suggestions
    
    # update lists of tags that have already been used
    problem_tags = get_unique_values_from_csv_field('problem_tags', df)
    suggestion_tags = get_unique_values_from_csv_field('suggestion_tags', df)

    # save these tags to show later
    #p = [val.strip() for val in my_problems.split(',')]
    #s = [val.strip() for val in my_suggestions.split(',')]
    #problem_tags.extend(p)
    #suggestion_tags.extend(s)
    #problem_tags.sort()
    #suggestion_tags.sort()
    
    # increment counter
    counter += 1
    
    # save to file immediately just in case we have to interrupt the script mid-way
    df.to_csv('responses_scrubbed_tagged.csv', index=False)
    

## Response #555

### problem_areas comment
The area around Croton Commons and the Mobil station -- there is no pedestrian-accessible way to get to Croton Commons from the sidewalk on the opposite side of the street, especially when pushing a stroller. You have to go in the car entrance. The Mobil station also interrupts the sidewalk in between the Croton Landing area and Harmon.

### suggested_improvements comment
Improve the sidewalk accessibility around the Mobil station and Croton Commons

## Tag this response

Enter tags that represent this person's complaints and suggestions.  Re-use tags used in others' responses, where applicable.

### Common complaints

speeding                                123
missing sidewalks                       121
nan                                     118
cpa                                      89
sidewalk condition                       82
aggressive driving                       75
s riverside                              73
driver awareness                         72
129                                      53
maple                                    49
cleveland                                46
road surface                             46
cyclists breaking rules                  45
municipal place                          38
missing crosswalks                       32
bushes on sidewalk                       29
mt airy                                  26
shoprite                                 26
grand                                    26
narrow roads                             26
benedict                                 25
five corners                             22
van wyck                        

### Common suggestions

nan                                                      148
bike lanes                                               104
more sidewalks                                            98
enforce speed                                             67
maintain sidewalks                                        63
reduce speed                                              45
educate cyclists                                          38
more crosswalks                                           37
speed bumps                                               25
no bike lanes                                             17
maintain crosswalks                                       16
sidewalk on mt airy s                                     13
enforce crosswalks                                        13
reduce on-street parking                                  13
more stop signs                                           12
better lighting                                           12
more traffic lights     

Complaints (or hit Enter to skip): missing crosswalks, croton commons
Suggestions (or hit Enter to skip): more crosswalks


## Save changes to a new CSV file

In [3]:
df.to_csv('responses_scrubbed_tagged.csv', index=False)