## Open up the scrubbed data file
This notebeook opens up the file named `responses_scrubbed.csv` and asks the user to tag each respondent's free text responses in order to be able to later quantify how many respondent's mentioned certain common complaints or suggestions.

If starting this tagging process from scratch for the first time, make sure to run the `cleanup.ipnyb` script first.  

If you have already tagged responses previously, make a backup of the file named `responses_scrubbed_tagged.csv`.  Run the `cleanup.ipynb` notebook, then run this notebook and make sure to set the `start_from` variable to the appropriate new row number in the code.  Then use a spreadsheet program to copy the tags in the `problem_tags` and `suggestion_tags` from your backup file into the newly-generated `responses_scrubbed_tagged.csv` file.


In [1]:
import numpy as np
import pandas as pd
from IPython.display import display, Markdown, Latex, clear_output

# open the file... this should be a responses_scrubbed.csv generated by the cleanup script to start with
df = pd.read_csv('responses_scrubbed_tagged.csv', index_col=None, parse_dates=['date'])

#don't truncate results
pd.set_option('display.max_rows', None)

## Which row to start tagging?
If this is the first time running this notebook, set it to zero.  If you have previously tagged rows, set this to the first row number you have not yet tagged.

In [None]:
# what row index number to start from (useful if interrputed)
start_from = 555 #default to zero if starting from beginning


## Categorize every free text response from every respondent

In [None]:
# fields containing free text responses
fields = ['feelings', 'problem_areas', 'suggested_improvements', 'additional_comments', 'business_additional_comments', 'final_comments']

def get_unique_values_from_csv_field(column, df):
    combo_vals = df[column].tolist() # contains comma-separated sets of tags
    single_vals = [] # will contain single tags
    for combo in combo_vals:
        list_of_vals = str(combo).split(',') # break up comma-seprated values into single values
        list_of_vals = [val.strip() for val in list_of_vals] # remove whitespace
        single_vals.extend(list_of_vals) # add single values to the ultimate list
        
    # return the value counts
    return single_vals
    
# start with a blank list of tags
problem_tags = []
suggestion_tags = []

# count rows we loop through
counter = 0

# loop through every row in the data
for index, row in df.iterrows():
    
    #skip to starting row
    if counter < start_from:
        counter += 1
        continue

    # determine whether we have already tagged this respondent's free text responses
    my_problems = ''
    my_suggestions = ''
    try:
        my_problems = df.loc[index]['problem_tags']
        my_suggestions = df.loc[index]['suggestion_tags']
        problem_tags = get_unique_values_from_csv_field('problem_tags', df)
        suggestion_tags = get_unique_values_from_csv_field('suggestion_tags', df)
        #display('...got em')        
    except:
        # these columns must not exist yet, so create them
        df['problem_tags'] = ''
        df['suggestion_tags'] = ''
        df['problem_tags'] = df['problem_tags'].astype(str)
        df['suggestion_tags'] = df['suggestion_tags'].astype(str)
        #display('...created em')
    
    clear_output() # wipe the output clear    
    
    display(Markdown('## Response #{}'.format(index)))
    
    # display each row's free text fields:
    for field in fields:
        # display this person's response to this question if they gave one
        value = row[field]
        if not pd.isnull(row[field]):
            display(Markdown('### {} comment\n{}'.format(field, value)))
        
    # ask the user for tags
    display(Markdown('## Tag this response'))
    display(Markdown("Enter tags that represent this person's complaints and suggestions.  Re-use tags used in others' responses, where applicable."))
    
    # display commonly used tags
    display(Markdown('### Common complaints (out of {} total)'.format(pd.Series(problem_tags).value_counts().sum())))
    display(pd.Series(problem_tags).value_counts())

    display(Markdown('### Common suggestions (out of {} total)'.format(pd.Series(suggestion_tags).value_counts().sum())))
    display(pd.Series(suggestion_tags).value_counts())
    
    # allow editor to tag this response
    my_problems = input("Complaints (or hit Enter to skip): ").strip()
    my_suggestions = input("Suggestions (or hit Enter to skip): ").strip()
    
    # store the tags in this record
    df.at[index,'problem_tags'] = my_problems
    df.at[index,'suggestion_tags'] = my_suggestions
    
    # update lists of tags that have already been used
    problem_tags = get_unique_values_from_csv_field('problem_tags', df)
    suggestion_tags = get_unique_values_from_csv_field('suggestion_tags', df)

    # save these tags to show later
    #p = [val.strip() for val in my_problems.split(',')]
    #s = [val.strip() for val in my_suggestions.split(',')]
    #problem_tags.extend(p)
    #suggestion_tags.extend(s)
    #problem_tags.sort()
    #suggestion_tags.sort()
    
    # increment counter
    counter += 1
    
    # save to file immediately just in case we have to interrupt the script mid-way
    df.to_csv('responses_scrubbed_tagged.csv', index=False)
    

## Response #555

### problem_areas comment
The area around Croton Commons and the Mobil station -- there is no pedestrian-accessible way to get to Croton Commons from the sidewalk on the opposite side of the street, especially when pushing a stroller. You have to go in the car entrance. The Mobil station also interrupts the sidewalk in between the Croton Landing area and Harmon.

### suggested_improvements comment
Improve the sidewalk accessibility around the Mobil station and Croton Commons

## Tag this response

Enter tags that represent this person's complaints and suggestions.  Re-use tags used in others' responses, where applicable.

### Common complaints (out of 1661 total)

speeding                                125
missing sidewalks                       122
nan                                     121
cpa                                      88
sidewalk condition                       83
aggressive driving                       75
driver awareness                         74
s riverside                              72
129                                      52
maple                                    48
cyclists breaking rules                  46
cleveland                                46
road surface                             46
municipal place                          39
missing crosswalks                       33
bushes on sidewalk                       29
grand                                    26
narrow roads                             26
mt airy                                  26
shoprite                                 26
benedict                                 25
five corners                             22
van wyck                        

### Common suggestions (out of 972 total)

nan                                                      154
bike lanes                                               104
more sidewalks                                            99
enforce speed                                             67
maintain sidewalks                                        64
reduce speed                                              46
educate cyclists                                          39
more crosswalks                                           38
speed bumps                                               24
no bike lanes                                             17
maintain crosswalks                                       16
enforce crosswalks                                        14
sidewalk on mt airy s                                     13
reduce on-street parking                                  13
more traffic lights                                       12
better lighting                                           12
more stop signs         

## Save changes to a new CSV file

In [None]:
df.to_csv('responses_scrubbed_tagged.csv', index=False)