# About this notebook:

* Sandbox notebook with Børge's trials
* Updated to load from MongoDB

## Content:
* Currently mapping the country names in the ACLED dataset to those in the geodataset (shapefile) so that results can be plotted on a map


## Deleted
* Deleted one-hot encoding material (moved to separate module + notebook with example)

# Imports

In [2]:
import pandas as pd
import numpy as np
import datetime

%matplotlib inline

## Importing ACLED data from MongoDB

In [3]:
import sys
sys.path.insert(0, '../')
import modules.datasets

In [4]:
acled = modules.datasets.ACLED()
acled.mongodb_update_database()

In [5]:
# Load the ACLED data into a pandas.DataFrame:
df = acled.mongodb_get_entire_database()

# Mini-dataset to play with

In [6]:
df_f = df[['event_date', 'country', 'event_type', 'fatalities']].copy()

# Matching 'country' in ACLED with 'name' in the geo-frame:

In [7]:
# Loading shapefile:
sys.path.insert(0, '../modules/')
from ImportShapefile import ImportShapefile
# Update the link to where you have stored the shapefiles:
link = '../data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
df_geo_shapefile = ImportShapefile(link).get_df()

mask = df_geo_shapefile['continent']=='Africa'
df_geo_africa = df_geo_shapefile.loc[mask,:].reset_index(drop=True)



We now have two lists containing the names of the countries as they are written in the two datasets:

In [8]:
cn_acled = df['country'].unique()
cn_geo =   df_geo_africa['name'].unique()

country_names_acled = pd.DataFrame({"names": cn_acled, "acled": cn_acled})
country_names_geo =   pd.DataFrame({"names": cn_geo, "geo": cn_geo})

In [9]:
column_names = pd.merge(country_names_acled, country_names_geo, on="names", how='outer').drop(columns="names")
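The outer merge above leaves NaN in whichever column has no match. A minimal sketch of the same check using the `indicator=True` option of `pd.merge`, which adds a `_merge` column labelling each row `both`, `left_only` or `right_only`. The name lists here are toy values for illustration, not the real datasets.

```python
import pandas as pd

# Toy name lists (illustrative only, not the real ACLED / shapefile names)
acled_names = pd.DataFrame({"names": ["Ivory Coast", "Kenya", "South Sudan"]})
geo_names = pd.DataFrame({"names": ["Côte d'Ivoire", "Kenya", "S. Sudan"]})

# Outer merge on the shared 'names' column; indicator=True adds a '_merge' column
matched = pd.merge(acled_names, geo_names, on="names", how="outer", indicator=True)

# Names that only appear in ACLED (these still need a manual mapping)
only_in_acled = matched.loc[matched["_merge"] == "left_only", "names"].tolist()
# Names that only appear in the shapefile
only_in_geo = matched.loc[matched["_merge"] == "right_only", "names"].tolist()

print(sorted(only_in_acled))
print(sorted(only_in_geo))
```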

In [10]:
not_in_acled = column_names['acled'].isnull()
not_in_geo  =  column_names['geo'].isnull()

In [11]:
column_names.loc[not_in_geo]

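|    | acled                        | geo |
|----|------------------------------|-----|
| 2  | Ivory Coast                  | NaN |
| 4  | Democratic Republic of Congo | NaN |
| 9  | South Sudan                  | NaN |
| 23 | Central African Republic     | NaN |
| 40 | Republic of Congo            | NaN |
| 42 | Mozambique                   | NaN |
| 49 | Equatorial Guinea            | NaN |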


In [13]:
column_names.loc[2 ,'geo'] = "Côte d'Ivoire"
column_names.loc[4 ,'geo'] = "Dem. Rep. Congo"
column_names.loc[9 ,'geo'] = "S. Sudan"
column_names.loc[23,'geo'] = "Central African Rep."
column_names.loc[40,'geo'] = "Congo"
column_names.loc[42,'geo'] = None  # TODO: no matching shapefile name found yet for 'Mozambique'; check the exact spelling in both datasets
column_names.loc[49,'geo'] = "Eq. Guinea"


In [None]:
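# Geo names in their own frame, with the column called 'name' as in the shapefile
country_names_geo = pd.DataFrame({"name": cn_geo})

In [None]:
# Check that every mapped name exists in the shapefile (unmatched rows get NaN in 'name')
column_names.merge(country_names_geo, left_on='geo', right_on='name', how='left')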

### TODO for this section
Create a pandas.DataFrame with two columns:
- 'ACLED country'
- 'Shapefile country'

Use pandas functions to align the countries whose names are spelled the same in both datasets, then map the rest manually.
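The two steps in the TODO can be sketched as below. This is a minimal sketch under assumptions: the name lists are toy values, and `manual_map` is a hypothetical dictionary holding the manual corrections (here two of the pairs from the manual mapping cell above). Keying the corrections on the ACLED name rather than on row positions keeps them valid if the merge order changes.

```python
import pandas as pd

# Hypothetical inputs: unique country names from each dataset (toy values)
acled_countries = ["Ivory Coast", "Kenya", "Republic of Congo", "Somalia"]
shapefile_countries = ["Côte d'Ivoire", "Kenya", "Congo", "Somalia"]

# Manual corrections for names that are spelled differently
manual_map = {
    "Ivory Coast": "Côte d'Ivoire",
    "Republic of Congo": "Congo",
}

mapping = pd.DataFrame({"ACLED country": acled_countries})
acled_col = mapping["ACLED country"]

# Step 1: exact matches, i.e. names written identically in both datasets (NaN otherwise)
exact = acled_col.where(acled_col.isin(shapefile_countries))

# Step 2: fill the remaining gaps from the manual dictionary (still NaN if unmapped)
mapping["Shapefile country"] = exact.fillna(acled_col.map(manual_map))

print(mapping)
```

Any row that is still NaN after step 2 is a country that needs a new entry in the manual dictionary.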

#### Result
As a result, we can pass per-country statistics to the mapping functions (e.g. results for 'Ivory Coast' are shown correctly on the map under the name 'Côte d'Ivoire').
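A minimal sketch of that last step, assuming the per-country statistic is a Series indexed by the ACLED spelling and that the geo-frame has a `name` column as in the shapefile. The numbers and the small `geo` frame are made up; `acled_to_geo` is a hypothetical dictionary built from the manual corrections above.

```python
import pandas as pd

# Hypothetical per-country totals, indexed by the ACLED spelling (toy numbers)
fatalities_by_country = pd.Series({"Ivory Coast": 12, "Kenya": 30, "South Sudan": 7})

# ACLED -> shapefile names for the countries that are spelled differently
acled_to_geo = {"Ivory Coast": "Côte d'Ivoire", "South Sudan": "S. Sudan"}

# rename() with a dict only changes the listed labels; 'Kenya' is left as-is
fatalities_geo = fatalities_by_country.rename(index=acled_to_geo)

# Attach the statistic to a toy stand-in for the geo-frame via its 'name' column
geo = pd.DataFrame({"name": ["Côte d'Ivoire", "Kenya", "S. Sudan", "Uganda"]})
geo["fatalities"] = geo["name"].map(fatalities_geo)  # countries without data get NaN
print(geo)
```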

# WARNING: Work in progress, nothing below is finished

# Plotting fatalities
Inspired by Dirk's examples in lecture 13.02.

### Creating pivot table
* It is important to set `aggfunc` to sum (the default is mean)
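Why the default matters, on a toy frame with made-up values: when two events fall on the same date, the default `aggfunc` averages their fatalities instead of adding them.

```python
import pandas as pd

# Toy events: two events in Kenya on the same date
events = pd.DataFrame({
    "event_date": ["2018-01-01", "2018-01-01", "2018-01-02"],
    "country": ["Kenya", "Kenya", "Kenya"],
    "fatalities": [2, 4, 1],
})

# Default aggfunc is the mean: (2 + 4) / 2 = 3 for 2018-01-01
mean_piv = events.pivot_table(index="event_date", columns="country", values="fatalities")

# aggfunc="sum" gives the total: 2 + 4 = 6 for 2018-01-01
sum_piv = events.pivot_table(index="event_date", columns="country",
                             values="fatalities", aggfunc="sum")

print(mean_piv.loc["2018-01-01", "Kenya"], sum_piv.loc["2018-01-01", "Kenya"])
```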

In [None]:
# aggfunc='sum' (string) avoids the FutureWarning newer pandas emits for np.sum
df_piv = df_f.pivot_table(index='event_date',
                          columns='country',
                          values='fatalities',
                          aggfunc='sum')

### Resampling pivot to monthly

In [None]:
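# resample() needs a DatetimeIndex; convert in case event_date is stored as strings
df_piv.index = pd.to_datetime(df_piv.index)
df_piv = df_piv.resample('1M').sum()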
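A toy illustration of the monthly resampling (made-up daily values). Note that pandas 2.2 deprecated the month-end alias `'M'` in favour of `'ME'`, which older versions do not accept; this sketch uses `'MS'` (bins labelled by the first day of the month), which works in both, at the cost of different labels than the `'1M'` cell above.

```python
import pandas as pd

# Toy daily fatalities for one country (made-up numbers)
daily = pd.DataFrame(
    {"Kenya": [1, 2, 3, 4]},
    index=pd.to_datetime(["2018-01-05", "2018-01-20", "2018-02-03", "2018-02-28"]),
)

# Sum all rows that fall in the same calendar month; "MS" labels each bin by its first day
monthly = daily.resample("MS").sum()
print(monthly)
```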

### Extract countries
Extract the 5 countries with the highest total fatalities (for plotting)

In [None]:
most_fat = list(df_piv.sum().sort_values(
                     ascending=False)[0:5].index)
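The same selection can be written with `Series.nlargest`, which returns the largest values in descending order. A sketch on a toy monthly pivot with made-up country columns `A` to `F`:

```python
import pandas as pd

# Toy monthly pivot: rows are months, columns are countries (made-up values)
monthly = pd.DataFrame(
    {"A": [1, 2], "B": [10, 0], "C": [3, 3], "D": [0, 0], "E": [5, 4], "F": [7, 1]},
    index=pd.to_datetime(["2018-01-31", "2018-02-28"]),
)

# Total fatalities per country, then the 5 countries with the largest totals
totals = monthly.sum()
most_fat = list(totals.nlargest(5).index)
print(most_fat)
```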

In [None]:
print("Total fatalities (top 5 countries):")
df_piv.sum().sort_values(
                     ascending=False)[0:5]

In [None]:
df_fat = df_piv[most_fat]

In [None]:
ax = df_fat.plot(figsize=(10,8))
ax.set_ylabel("Fatalities")
ax.set_xlabel("Month")

## Plotting using pandas
Pandas is more flexible and allows zooming, among other things.

In [None]:
# TO BE FIXED

# Bokeh geo-plotting
Based on:
http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/tutorial/11%20-%20geo.ipynb

In [None]:
from bokeh.io import output_notebook, show
output_notebook()

In [None]:
from bokeh.plotting import figure
from bokeh.models import WMTSTileSource

# NOTE: This is a little off Africa, but can easily be moved:
dist = 6000000
x_min = -30000
y_min = -8000000

Africa = x_range,y_range = ((-x_min,x_min+dist), (-y_min,y_min+dist))

fig = figure(tools='pan, wheel_zoom', x_range=x_range, y_range=y_range)
fig.axis.visible = False
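The hand-tuned `x_min`/`y_min` values above are in Web Mercator metres, which makes them hard to adjust; note also that `(-y_min, y_min+dist)` evaluates to `(8000000, -2000000)`, so the y range starts above where it ends. A minimal sketch that derives both ranges from a longitude/latitude bounding box instead, using the same formula as the tutorial's `wgs84_to_web_mercator`. The bounding-box values are rough assumptions for Africa, not taken from the data.

```python
import math

EARTH_RADIUS = 6378137  # metres, same constant as in the tutorial's conversion

def to_web_mercator(lon, lat):
    """Convert one longitude/latitude pair (degrees) to Web Mercator metres."""
    x = lon * EARTH_RADIUS * math.pi / 180.0
    y = math.log(math.tan((90 + lat) * math.pi / 360.0)) * EARTH_RADIUS
    return x, y

# Approximate bounding box of Africa in degrees (assumed values, adjust as needed)
LON_MIN, LON_MAX = -20.0, 55.0
LAT_MIN, LAT_MAX = -36.0, 38.0

x0, y0 = to_web_mercator(LON_MIN, LAT_MIN)
x1, y1 = to_web_mercator(LON_MAX, LAT_MAX)

# (start, end) pairs with start < end on both axes
africa_x_range = (x0, x1)
africa_y_range = (y0, y1)
print(africa_x_range, africa_y_range)
```

The two tuples could then be passed as `x_range=africa_x_range, y_range=africa_y_range` to `figure(...)`.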

In [None]:
url = 'http://a.basemaps.cartocdn.com/dark_all/{Z}/{X}/{Y}.png'
attribution = "Map tiles by Carto, under CC BY 3.0. Data by OpenStreetMap, under ODbL"

fig.add_tile(WMTSTileSource(url=url, attribution=attribution))

In [None]:
show(fig)

## Adding some points:

In [None]:
# Function comes from tutorial (see section header):
def wgs84_to_web_mercator(df, lon="LONGITUDE", lat="LATITUDE"):
    """Converts decimal longitude/latitude to Web Mercator format"""
    k = 6378137
    df["x"] = df[lon] * (k * np.pi/180.0)
    df["y"] = np.log(np.tan((90 + df[lat]) * np.pi/360.0)) * k
    return df
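The function above writes the new columns into the frame it is given, which can trigger pandas' `SettingWithCopyWarning` when that frame is a slice of another one. A sketch of a variant that returns a new frame via `DataFrame.assign` instead; the name `wgs84_to_web_mercator_copy` and the lowercase default column names are assumptions (chosen to match the lowercase columns of the MongoDB data). As a sanity check, (0, 0) should map to the origin and longitude 180 to x = πR ≈ 20,037,508 m.

```python
import numpy as np
import pandas as pd

K = 6378137  # Earth radius in metres used by Web Mercator

def wgs84_to_web_mercator_copy(df, lon="longitude", lat="latitude"):
    """Return a copy of df with Web Mercator 'x'/'y' columns; df itself is not modified."""
    return df.assign(
        x=df[lon] * (K * np.pi / 180.0),
        y=np.log(np.tan((90 + df[lat]) * np.pi / 360.0)) * K,
    )

points = pd.DataFrame({"longitude": [0.0, 180.0], "latitude": [0.0, 0.0]})
converted = wgs84_to_web_mercator_copy(points)
print(converted)
```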

Selecting a sample of events to plot (the first `N_points` rows, so not a random sample in the statistical sense):

In [None]:
N_points = 10000

# Column names assumed lowercase, as in the MongoDB data used above; .copy() avoids SettingWithCopyWarning
test_points = df.iloc[0:N_points][['country', 'latitude', 'longitude', 'fatalities']].copy()

In [None]:
# Assigning the result also suppresses the cell output
test_points = wgs84_to_web_mercator(test_points, lon='longitude', lat='latitude')

#### Plotting the points from above
**Note**: Marker size grows with fatalities as 4·log(1 + fatalities); this is a rough choice for now.

In [None]:
fig.circle(x=test_points['x'], y=test_points['y'], fill_color='blue', size=4*np.log(1+test_points['fatalities']))
show(fig)
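The points above are the first `N_points` rows, which depends on how the data happens to be ordered. A sketch of drawing a reproducible random sample with `DataFrame.sample` instead, on made-up toy events. It also reuses the marker-size formula from the plot above; since 4·log(1 + 0) = 0, events without fatalities get size 0 and disappear, so the sketch adds a small offset `MIN_SIZE` (an assumed value).

```python
import numpy as np
import pandas as pd

# Toy events standing in for the ACLED frame (made-up values)
events = pd.DataFrame({
    "country": ["Kenya", "Mali", "Chad", "Niger", "Togo"],
    "latitude": [-1.3, 12.6, 12.1, 13.5, 6.1],
    "longitude": [36.8, -8.0, 15.0, 2.1, 1.2],
    "fatalities": [0, 3, 10, 1, 0],
})

# Reproducible random sample instead of the first N rows
n_points = 3
sample = events.sample(n=min(n_points, len(events)), random_state=42)

MIN_SIZE = 2  # assumed minimum marker size so zero-fatality events stay visible
sizes = MIN_SIZE + 4 * np.log1p(sample["fatalities"])  # log1p(f) == log(1 + f)
print(sample.assign(size=sizes))
```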