# Monuments of the Confederacy, By the Numbers
In this notebook we'll look at data on monuments and other symbols of the Confederate States of America (CSA).

## Part 1: Getting the Data
First we need to import data and relevant packages.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import matplotlib.image as mpimg
import geopandas as gpd
import geoplot as gplt
from shapely.geometry import Point



In [None]:
df = pd.read_csv('confederate_symbols.csv')
print(f'Dataset dimensions: {df.shape}')
df.head()

The dataset contains information on 1481 monuments across the nation, with varying amounts of missing data. Overall, though, we've got good material to work with.

We want to avoid rows with serious input errors that would interfere with our analysis later on, so let's drop rows whose latitude or longitude falls outside the possible ranges (-90 to 90 and -180 to 180, respectively).

In [None]:
df = df[df['latitude'].between(-90, 90)]     # keep valid latitudes (endpoints included)
df = df[df['longitude'].between(-180, 180)]  # keep valid longitudes (endpoints included)
df.shape

Fortunately, this cleaning step dropped only two rows.
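
Because every comparison with `NaN` evaluates to `False`, a range mask like this one also discards rows whose latitude or longitude is missing, so the drop count can mix out-of-range and missing coordinates. A minimal library-free sketch of the same check, assuming coordinates arrive as floats with `float("nan")` for missing values; `is_valid_coordinate` is a hypothetical helper and the sample rows are made up:

```python
def is_valid_coordinate(lat, lon):
    """Return True if lat/lon are present and inside the valid ranges (endpoints included)."""
    # Any comparison involving NaN is False, so a missing value fails the check.
    return -90 <= lat <= 90 and -180 <= lon <= 180

# Hypothetical sample rows, not taken from the dataset.
rows = [
    (33.75, -84.39),        # plausible point
    (95.0, -84.39),         # latitude out of range
    (33.75, float("nan")),  # missing longitude
    (90.0, 180.0),          # boundary values count as valid
]
kept = [r for r in rows if is_valid_coordinate(*r)]
print(kept)  # [(33.75, -84.39), (90.0, 180.0)]
```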

## Part 2: Visualizing the Data
### Symbols Established Across the Years
When were Confederate symbols established? We look at how dedications are distributed over time and compare that with major historical events of the same periods.
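
A histogram with `bins=40` derives its bin edges from the data range, so the bins generally don't line up with calendar years. One alternative is to count dedications per decade. A plain-Python sketch, assuming `year_dedicated` holds numeric years with `None` or `NaN` for missing values; `decade_counts` is a hypothetical helper and the years are placeholders, not values from the dataset:

```python
import math
from collections import Counter

def decade_counts(years):
    """Count dedications per decade, skipping missing years (None or NaN)."""
    counts = Counter()
    for y in years:
        if y is None or (isinstance(y, float) and math.isnan(y)):
            continue  # mirror dropna(): ignore missing years
        decade = int(y) // 10 * 10  # e.g. 1911 -> 1910
        counts[decade] += 1
    return dict(sorted(counts.items()))

sample_years = [1866, 1869, 1903, 1907, 1911, float("nan"), None, 1962]
print(decade_counts(sample_years))
# {1860: 2, 1900: 2, 1910: 1, 1960: 1}
```

In pandas the same grouping can be written as `(df['year_dedicated'] // 10 * 10).value_counts().sort_index()`, which skips missing years by default.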

In [None]:
sns.set_style('whitegrid')

plt.figure(figsize=(14,8))
plt.hist(df['year_dedicated'].dropna(), bins=40)  # drop missing years before binning
plt.xlabel("Year of Symbol Dedication")
plt.ylabel("Number of Symbols Dedicated")
plt.show()

Taxpayer funding for Confederate monuments has exceeded $40 million (Smithsonian).

In [None]:
plt.figure(figsize=(14,8))
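# sns.distplot is deprecated; histplot with stat='density' and a KDE overlay replaces it
ax = sns.histplot(df['year_dedicated'].dropna(), bins=40, stat='density', kde=True)
ax.set(xlabel="Year of Symbol Dedication", ylabel="Density")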
plt.show()

In [None]:
df.shape

### Across the Country
Where are these symbols? What types are there, and where do certain types of symbols tend to be?

In [None]:
# get the bounding box of all symbol locations
lat_range = (df['latitude'].min() - 5, df['latitude'].max() + 5)  # pad each edge by 5 degrees
lon_range = (df['longitude'].min() - 5, df['longitude'].max() + 5)
print(lat_range, lon_range)
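
Padding by a fixed number of degrees can push the box past the valid coordinate limits when points sit near an edge. A small sketch that clamps the padded range, assuming the same 5-degree pad; `padded_range` is a hypothetical helper and the coordinates are made up:

```python
def padded_range(values, pad, lower, upper):
    """Return (min - pad, max + pad), clamped to the interval [lower, upper]."""
    lo = max(min(values) - pad, lower)
    hi = min(max(values) + pad, upper)
    return lo, hi

lats = [25.0, 47.0, 88.0]  # made-up latitudes
lons = [-178.0, -80.0]     # made-up longitudes
print(padded_range(lats, 5, -90.0, 90.0))    # (20.0, 90.0)
print(padded_range(lons, 5, -180.0, 180.0))  # (-180.0, -75.0)
```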

In [None]:
df['category'].value_counts()
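
To compare the most common symbol types against everything else, the remaining categories can be lumped into a single `'Other'` bucket. A plain-Python sketch; `lump_categories` is a hypothetical helper and the category names are placeholders, not the dataset's actual labels:

```python
from collections import Counter

def lump_categories(categories, top_n, other="Other"):
    """Keep the top_n most frequent labels and map every remaining label to `other`."""
    top = {label for label, _ in Counter(categories).most_common(top_n)}
    return [c if c in top else other for c in categories]

sample = ["Monument", "Monument", "School", "Road", "Road", "Road", "Park"]
print(lump_categories(sample, top_n=2))
# ['Monument', 'Monument', 'Other', 'Road', 'Road', 'Road', 'Other']
```

With pandas, `df['category'].where(df['category'].isin(top), 'Other')` performs the same relabeling on the column, where `top` is a list of the most common categories.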

In [None]:
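top_types = df['category'].value_counts().index[:5].tolist()  # the 5 most common symbol types

plt.figure(figsize=(14,8))
for given_type in top_types:
    subset = df[df['category'] == given_type]
    # longitude on the x-axis and latitude on the y-axis, so the scatter reads like a map
    plt.plot(subset['longitude'], subset['latitude'], 'o', label=given_type)
# every category outside the top 5 is plotted together as 'Other'
other = df[~df['category'].isin(top_types)]
plt.plot(other['longitude'], other['latitude'], 'o', label='Other')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend()
plt.show()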

In [4]:
# geopandas is imported as gpd; note that geopandas.datasets was removed in geopandas 1.0
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world

In [None]:
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
cities

In [None]:
earth = world.plot(figsize = (14,8))

In [None]:
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]
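
`Point(xy)` takes x first, so longitude must come before latitude; GeoJSON (RFC 7946) uses the same longitude-then-latitude order for coordinates. A library-free sketch of turning records into GeoJSON point features, assuming each record is a dict with `latitude`, `longitude`, and `category` keys; `to_feature_collection` is a hypothetical helper and the sample coordinates are made up:

```python
import json

def to_feature_collection(records):
    """Build a GeoJSON FeatureCollection; coordinates are [longitude, latitude]."""
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # x (longitude) first, then y (latitude), the same order as Point(xy)
                "coordinates": [rec["longitude"], rec["latitude"]],
            },
            "properties": {"category": rec.get("category")},
        })
    return {"type": "FeatureCollection", "features": features}

records = [{"latitude": 33.75, "longitude": -84.39, "category": "Monument"}]
fc = to_feature_collection(records)
print(json.dumps(fc["features"][0]["geometry"]))
# {"type": "Point", "coordinates": [-84.39, 33.75]}
```

With geopandas available, `gpd.points_from_xy(df['longitude'], df['latitude'])` builds the geometry column directly, using the same x-then-y order.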

In [None]:
geo_df = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geometry)  # WGS84 lon/lat; the {'init': ...} form is deprecated
geo_df.plot()

In [None]:
fig, ax = plt.subplots(figsize=(14,8))
world.plot(ax=ax)
geo_df.plot(ax=ax, color='red')

plt.show()

US geodata source: https://eric.clst.org/tech/usgeojson/

In [None]:
usmap = gpd.read_file('/Users/Me/Documents/Datascience/Rowe/EPluribus/confederates/geodata/usmap.shp')

In [None]:
usa = gpd.read_file(gplt.datasets.get_path('contiguous_usa'))
usa.plot()