## Geo Mapping

This notebook is a guide to mapping incidents of racial violence at the state and county levels. To ensure accuracy, check the quality of the data before proceeding. In particular, remove leading and trailing whitespace and unnecessary punctuation, and resolve inconsistencies in data entry (e.g. "Mob" and "mob" should be identical, since they represent the same information).
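The clean-up described above can be scripted. The sketch below uses a hypothetical helper, `normalize_text_column` (not part of the original notebook), for text columns such as `Tactic`. It trims whitespace and punctuation from both ends, collapses repeated spaces, and applies one capitalization rule so that "Mob" and "mob" end up identical. Note that `str.capitalize` lowercases everything after the first character, which would turn "African American" into "African american" and "GA" into "Ga". Choose a rule that fits each column, and do not apply this one to `State`.

```python
import string

import pandas as pd


def normalize_text_column(series: pd.Series) -> pd.Series:
    """Return a cleaned copy of a text column (hypothetical helper).

    - strips whitespace and punctuation (e.g. '?', '.') from both ends
    - collapses internal runs of whitespace to a single space
    - capitalizes the first letter and lowercases the rest ('mob' -> 'Mob')
    Missing values are left missing.
    """
    cleaned = series.str.strip(string.punctuation + string.whitespace)
    cleaned = cleaned.str.replace(r'\s+', ' ', regex=True)
    return cleaned.str.capitalize()


# Toy values, not taken from the BRV data set.
tactics = pd.Series([' Mob', 'mob?', 'LYNCH   MOB', None])
print(normalize_text_column(tactics).tolist())
```

Review the cleaned values before overwriting a column, since stripping punctuation also removes meaningful trailing periods such as the one in "Jr.".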

## Setup

Import the necessary packages. If any of them are not yet available in your Python environment, you may need to [install](https://packaging.python.org/en/latest/tutorials/installing-packages/) them first.
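One way to find out which packages still need installing is to ask Python whether each import name can be found, without actually importing it. This is a sketch: the package list mirrors the imports below, plus `openpyxl`, which pandas needs to read `.xlsx` files with `pd.read_excel`. The install commands printed are suggestions, not the notebook's own instructions.

```python
import importlib.util

# Import names of the packages this notebook uses. openpyxl is not imported directly,
# but pandas relies on it to read .xlsx files.
REQUIRED_PACKAGES = ['pandas', 'nltk', 'spacy', 'plotly', 'requests', 'openpyxl']

# spaCy's small English model is installed as its own package, usually with:
#   python -m spacy download en_core_web_sm
SPACY_MODEL = 'en_core_web_sm'

# find_spec returns None when a top-level module cannot be located.
missing = [name for name in REQUIRED_PACKAGES if importlib.util.find_spec(name) is None]
model_missing = importlib.util.find_spec(SPACY_MODEL) is None

if missing:
    print('Missing packages:', ', '.join(missing))
    print('In a notebook cell you can run: %pip install ' + ' '.join(missing))
if model_missing:
    print(f'spaCy model {SPACY_MODEL!r} not found; run: python -m spacy download {SPACY_MODEL}')
if not missing and not model_missing:
    print('All required packages are installed.')
```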

```python
import pandas as pd
import nltk
import spacy
import plotly.express as px
import requests
import json

# Download the NLTK resources used for tokenizing, tagging and named-entity chunking.
for resource in ['maxent_ne_chunker', 'words', 'treebank',
                 'maxent_treebank_pos_tagger', 'punkt', 'averaged_perceptron_tagger']:
    nltk.download(resource)

# Load spaCy's small English model (install it first with: python -m spacy download en_core_web_sm).
nlp = spacy.load('en_core_web_sm')
```


Load the data set. Change the argument to the path of the file on your own computer. For example, if the file is in your Downloads folder on a Mac, the path may look like `/Users/firstnamelastname/Downloads/Bridging Racial Violence Compiled Data.xlsx`.
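Instead of typing the full path by hand, it can be built from your home directory with `pathlib`, which also works on Windows and Linux. This sketch assumes the file was saved to your Downloads folder under its original name; adjust both parts to match your machine.

```python
from pathlib import Path

# Assumption: the spreadsheet sits in ~/Downloads under its original file name.
data_path = Path.home() / 'Downloads' / 'Bridging Racial Violence Compiled Data.xlsx'

# Check the file is there before trying to load it.
if data_path.exists():
    print(f'Found data file: {data_path}')
else:
    print(f'No file at {data_path}; update data_path before calling pd.read_excel.')
```

With this in place, the load cell can read `brv = pd.read_excel(data_path)`, since `pd.read_excel` accepts `Path` objects.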

```python
brv = pd.read_excel('/Users/clairefenton/Desktop/Emory/BRV Research/Data/Bridging Racial Violence Compiled Data.xlsx')
```

## Formatting and Data Cleaning

Perform basic data cleaning on the `State` column. We drop all observations that have no value for `State` or have the value 'Multiple', since those cannot be mapped to a state, and we remove leading and trailing whitespace and question marks. Finally, for incidents that occurred in multiple states and therefore have a comma-separated list as their value, we create a separate observation for each state.

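```python
# Characters to trim from both ends of each value: whitespace and question marks.
strip_chars = ' \t\r\n?'

# Drop incidents with no recorded state; .copy() avoids SettingWithCopyWarning below.
brv = brv.dropna(subset=['State']).copy()
brv['State'] = brv['State'].astype(str).str.strip(strip_chars)

# 'Multiple' cannot be placed on a state map.
brv = brv[brv['State'] != 'Multiple'].copy()

# Split comma-separated values ('MS, TN' -> ['MS', 'TN']); single states become one-item lists.
brv['State'] = brv['State'].str.split(',').apply(lambda parts: [p.strip(strip_chars) for p in parts])

# One row per (incident, state) pair; drop empty pieces left by stray commas such as 'GA,'.
brv_ploded = brv.explode('State')
brv_ploded = brv_ploded[brv_ploded['State'] != '']
```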
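The cleaning steps described above can be tried on a small DataFrame first, so the effect of each step is visible. The rows below are made up to mimic common data-entry problems; they are not BRV records, and `toy` is a hypothetical name used only here.

```python
import pandas as pd

# Hypothetical rows, not real BRV records.
toy = pd.DataFrame({
    'Incident': [1, 2, 3, 4, 5],
    'State': [' GA', 'AL?', 'MS, TN', 'Multiple', None],
})

toy = toy.dropna(subset=['State']).copy()           # removes incident 5
toy['State'] = toy['State'].str.strip(' ?')         # ' GA' -> 'GA', 'AL?' -> 'AL'
toy = toy[toy['State'] != 'Multiple'].copy()        # removes incident 4
toy['State'] = toy['State'].str.split(',').apply(lambda parts: [p.strip(' ?') for p in parts])
toy_ploded = toy.explode('State')                   # incident 3 becomes two rows

print(toy_ploded)
```

After `explode`, incident 3 appears once for `MS` and once for `TN` with the same index label, so it is counted in both states on the map.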

## Creating Maps

This block of code creates a map that displays the number of incidents that occurred in each state. Because `locationmode='USA-states'` matches two-letter state abbreviations, the `State` column should contain codes such as `GA`; values such as full state names will not appear on the map. Aesthetic modifications are optional. Some of the basics are:

* `color_continuous_scale` can be changed if different [colors](https://plotly.com/python/builtin-colorscales/) are desired
* `labels` is a dictionary that renames fields in the hover label and color bar; here it displays `color` as "Incidents"
* `title` should describe what the map shows

```python
# Count incidents per state. rename_axis/reset_index(name=...) produce the columns
# 'State' and 'count' in both pandas 1.x and 2.x.
state_count = (
    brv_ploded['State']
    .value_counts()
    .rename_axis('State')
    .reset_index(name='count')
)

fig = px.choropleth(locations=state_count['State'],
                    locationmode='USA-states',
                    color=state_count['count'],
                    scope='usa',
                    color_continuous_scale="OrRd",
                    labels={'color': 'Incidents'},
                    title='Racial Violence Incidents Reported by Atlanta Daily World')
fig.show()
```
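Since the map only draws values it recognizes as state codes, it can help to list any `State` values that are not standard abbreviations before plotting. This is a sketch: `unmappable_states` is a hypothetical helper, and the code list below assumes the USPS abbreviations for the 50 states plus DC are what the map should recognize.

```python
import pandas as pd

# Assumption: USPS two-letter codes for the 50 states plus DC.
US_STATE_CODES = {
    'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA', 'HI', 'ID',
    'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO',
    'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA',
    'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY',
}


def unmappable_states(states: pd.Series) -> list:
    """Return the distinct non-missing values that are not in US_STATE_CODES."""
    return sorted(set(states.dropna()) - US_STATE_CODES)


# Toy values; in the notebook you would pass brv_ploded['State'].
print(unmappable_states(pd.Series(['GA', 'Georgia', 'TN', 'Tenn.'])))
```

Any values it reports can be corrected in the spreadsheet or mapped to their codes before rerunning the cleaning cell.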

Use the `filtered_graph` function to create a map restricted to rows with a specific value in one column. For example, you may want to look only at incidents that involve lynchings, or only at incidents with white perpetrators. The comparison is an exact, case-sensitive match, so `value` must be spelled exactly as it appears in the data. Fill in the three arguments accordingly:

* `column`: a string value representing the name of the column you want to filter
* `value`: a string value representing the value to filter the data by
* `title`: a string value representing the title of the graph

```python
def filtered_graph(column, value, title):
    """Map incident counts per state for rows where `column` exactly equals `value`."""
    state_count = (
        brv_ploded[brv_ploded[column] == value]['State']
        .value_counts()
        .rename_axis('State')
        .reset_index(name='count')
    )

    fig = px.choropleth(locations=state_count['State'],
                        locationmode='USA-states',
                        color=state_count['count'],
                        scope='usa',
                        color_continuous_scale="OrRd",
                        labels={'color': 'Incidents'},
                        title=title)
    fig.show()
```
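The counting step inside `filtered_graph` can be checked without drawing a map. The sketch below uses a hypothetical helper, `count_by_state` (not part of the original notebook), that repeats the same filter and count on toy data. It also shows that the equality test is case-sensitive: the row recorded as 'black' is not counted.

```python
import pandas as pd


def count_by_state(df: pd.DataFrame, column: str, value: str) -> pd.DataFrame:
    """Hypothetical helper: count rows per state where df[column] == value."""
    matches = df[df[column] == value]
    return (
        matches['State']
        .value_counts()
        .rename_axis('State')
        .reset_index(name='count')
    )


# Toy data, not real BRV records.
toy = pd.DataFrame({
    'State': ['GA', 'GA', 'AL', 'TN'],
    'Victim Race': ['Black', 'Black', 'black', 'White'],
})
print(count_by_state(toy, 'Victim Race', 'Black'))
```

Only the two `GA` rows match, which is why the capitalization clean-up mentioned at the top of the notebook matters for these filters.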

Below is an example of a graph examining only incidents involving Black victims. 

```python
filtered_graph('Victim Race', 'Black', 'Racial Violence Incidents Reported by Atlanta Daily World (Black Victims)')
```

There may be instances where filtering by multiple values within a column is helpful. In that case, use the `filtered_graph_multiple_values` function. The arguments are as follows:

* `column`: a string value representing the name of the column you want to filter
* `values`: a list of string values representing the values to filter the data by
* `title`: a string value representing the title of the graph

```python
def filtered_graph_multiple_values(column, values, title):
    """Map incident counts per state for rows where `column` matches any entry in `values`."""
    state_count = (
        brv_ploded[brv_ploded[column].isin(values)]['State']
        .value_counts()
        .rename_axis('State')
        .reset_index(name='count')
    )

    fig = px.choropleth(locations=state_count['State'],
                        locationmode='USA-states',
                        color=state_count['count'],
                        scope='usa',
                        color_continuous_scale="OrRd",
                        labels={'color': 'Incidents'},
                        title=title)
    fig.show()
```
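Because `isin` also uses exact matches, it helps to list the distinct values of a column before choosing `values`. This sketch does that on toy data (not BRV records) and then repeats the `isin` filter and per-state count that `filtered_graph_multiple_values` performs.

```python
import pandas as pd

# Toy data, not real BRV records.
toy = pd.DataFrame({
    'State': ['GA', 'AL', 'AL', 'MS', 'TN'],
    'Tactic': ['Lynching', 'Mob', 'Shooting', 'Lynch mob', None],
})

# Step 1: list every distinct value (including missing ones) to copy spellings exactly.
print(toy['Tactic'].value_counts(dropna=False))

# Step 2: keep rows whose Tactic is any of the chosen values, then count by state.
chosen = ['Lynching', 'Lynch mob', 'Mob']
counts = toy[toy['Tactic'].isin(chosen)]['State'].value_counts()
print(counts.to_dict())
```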

Below is an example of a graph examining only incidents marked as having a lynching, lynch mob or mob tactic. 

```python
filtered_graph_multiple_values('Tactic', ['Lynching', 'Lynch mob', 'Mob'], 'Racial Violence Incidents Reported by Atlanta Daily World (Lynchings)')
```
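If the `Tactic` column has not been fully standardized, an exact list can miss variants such as a lowercase 'lynch mob' or a value with a trailing space. One alternative, sketched below on toy data, is a case-insensitive pattern match with pandas' `str.contains`. The pattern `\blynch|\bmob\b` is an assumption chosen for this example; broader patterns can pick up unintended values, so review the matched rows before mapping them.

```python
import pandas as pd

# Toy data, not real BRV records.
toy = pd.DataFrame({
    'State': ['GA', 'AL', 'MS', 'TN', 'SC'],
    'Tactic': ['Lynching', 'lynch mob', 'Mob ', 'Shooting', None],
})

# Words starting with 'lynch', or the whole word 'mob', ignoring case.
# na=False treats missing values as non-matches.
mask = toy['Tactic'].str.contains(r'\blynch|\bmob\b', case=False, na=False)
print(toy.loc[mask, 'State'].tolist())
```

The resulting mask can replace the `isin(values)` condition inside a copy of `filtered_graph_multiple_values` if pattern matching suits your analysis better than an exact list.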