## Visualizing Spatial Data with Pandas and Folium

In [None]:
import os
DATADIR = os.path.join(os.path.expanduser("~"),"DATA",
                       "Misc")
print(os.path.exists(DATADIR))
import pandas as pd
import numpy as np

In [None]:
%pip install folium  # %pip installs into the environment of the running kernel
import folium


`Accidents7904.csv`, located in `~/DATA/Misc`, is a record of all the automobile accidents in the UK between 1979 and 2004. This is quite a large data set, but nothing that Pandas can't handle in principle. However, since we don't want to overtax our system, we will read in only part of the data.

The original data contains 6224198 rows. However, because GPS positioning was not in routine use until the late 1990s, the early accidents do not have latitude and longitude values and so are not of interest to us. The first longitude/latitude value occurs at row 4883216.

We can skip the early rows with the [`skiprows`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_table.html) keyword of `pd.read_csv`.

`skiprows` accepts:
* An integer: the number of lines to skip at the start of the file
* A sequence (e.g. a list) of 0-indexed line numbers to skip (the header is line 0)
* A function that is called with each line number and returns `True` if that line should be skipped and `False` otherwise
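To see the three forms side by side, here is a minimal sketch on a made-up CSV held in memory (`io.StringIO` stands in for the real file; the data is purely illustrative). Note that `skiprows` counts lines of the file, so the header is line 0 and an integer value will swallow it too.

```python
from io import StringIO

import pandas as pd

# Toy CSV: a header line (line 0) followed by four data lines (lines 1-4)
csv_text = "a,b\n1,x\n2,y\n3,z\n4,w\n"

# Integer: skip the first 2 lines of the file. This also skips the header,
# so the line "2,y" is promoted to column names.
by_int = pd.read_csv(StringIO(csv_text), skiprows=2)

# Sequence: skip exactly lines 1 and 3; the header (line 0) is kept
by_list = pd.read_csv(StringIO(csv_text), skiprows=[1, 3])

# Callable: called with each line number, returns True to skip that line
by_func = pd.read_csv(StringIO(csv_text), skiprows=lambda i: i % 2 == 1)

print(list(by_int.columns))   # ['2', 'y']
print(by_list["a"].tolist())  # [2, 4]
```

Here the list and the callable skip the same lines, so `by_list` and `by_func` hold identical data.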

### Read in the data

We'll use a `lambda` function that keeps the header (row 0) and skips every row from 1 up to and including 4883216.

In [None]:
# Keep the header (row 0) and skip rows 1..4883216, which have no coordinates
data = pd.read_csv(os.path.join(DATADIR, "Accidents7904.csv"),
                   skiprows=lambda index: 0 < index <= 4883216)
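The condition `0 < index <= N` keeps the header line and skips the N lines after it. The sketch below checks this on a toy CSV, with a hypothetical `N = 2` standing in for 4883216, compares it with passing `range(1, N + 1)`, and adds `nrows` (a standard `read_csv` argument) to cap how many rows are parsed, which can be handy when first trying the call on the full file.

```python
from io import StringIO

import pandas as pd

# Toy CSV: header plus five data lines
csv_text = "a,b\n1,x\n2,y\n3,z\n4,w\n5,v\n"
N = 2  # hypothetical stand-in for 4883216

# Keep line 0 (header), skip lines 1..N
via_lambda = pd.read_csv(StringIO(csv_text), skiprows=lambda i: 0 < i <= N)

# The same rows expressed as an explicit sequence of line numbers
via_range = pd.read_csv(StringIO(csv_text), skiprows=range(1, N + 1))

# nrows limits how many data rows are parsed after the skipped ones
first_two = pd.read_csv(StringIO(csv_text),
                        skiprows=lambda i: 0 < i <= N, nrows=2)
```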

### What are our columns?

In [None]:
data.columns

### What are the values in these columns?

In [None]:
data['Accident_Severity'].unique()

In [None]:
data["Number_of_Casualties"].unique()

In [None]:
data["Light_Conditions"].unique()

### Let's limit ourselves to the following columns:

* `Longitude`
* `Latitude`
* `Date`
* `Time`
* `Number_of_Casualties`
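One detail worth knowing before selecting columns by position: `usecols` only chooses *which* columns are read; the resulting DataFrame keeps the order in which those columns appear in the file, not the order of the `usecols` list. A minimal sketch on a made-up one-row CSV (the column order here is an assumption, not necessarily that of `Accidents7904.csv`):

```python
from io import StringIO

import pandas as pd

csv_text = (
    "Longitude,Latitude,Number_of_Casualties,Date,Time\n"
    "-0.19,51.49,1,05/01/2004,17:42\n"
)
wanted = ["Time", "Date", "Longitude"]

df = pd.read_csv(StringIO(csv_text), usecols=wanted)
print(list(df.columns))  # ['Longitude', 'Date', 'Time'] (file order)

# Index with the same list afterwards to get the columns in the requested order
df_ordered = df[wanted]
```

Selecting columns by name (`df["Date"]`) rather than by position avoids depending on the file's layout at all.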

In [None]:
data = pd.read_csv(os.path.join(DATADIR, "Accidents7904.csv"),
                   usecols=["Longitude", "Latitude",
                            "Date", "Time", "Number_of_Casualties"],
                   skiprows=lambda index: 0 < index <= 4883216)

In [None]:
data.head()

In [None]:
data.shape

### We can drop missing values

In [None]:
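# .copy() gives an independent DataFrame, so adding or replacing columns
# later does not raise SettingWithCopyWarning
data2 = data.dropna().copy()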

In [None]:
data2.shape
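`dropna()` with no arguments drops a row if *any* column is missing. If only the coordinates matter for the map, the `subset` argument is more forgiving. A sketch on made-up values: `.copy()` gives an independent frame so later column assignments don't trigger `SettingWithCopyWarning`, and `reset_index(drop=True)` restores consecutive row labels, since `dropna` keeps the original ones.

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 lacks a longitude, row 2 lacks only a time
df = pd.DataFrame({
    "Longitude": [-0.19, np.nan, -2.24],
    "Latitude": [51.49, 52.00, 53.48],
    "Time": ["17:42", "09:30", np.nan],
})

# Default: drop a row if any column is missing
complete = df.dropna()

# subset=: only require the map coordinates to be present
with_coords = df.dropna(subset=["Longitude", "Latitude"]).copy()

# The surviving rows keep labels 0 and 2; renumber them 0..n-1
with_coords = with_coords.reset_index(drop=True)
```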

In [None]:
# Select by name: the position of "Date" depends on the column order in the file
type(data2["Date"].iloc[0])

In [None]:
# .iloc[0] is the first remaining row; the label 0 may have been removed by dropna
type(data2["Time"].iloc[0])

#### Dates and times are not recognized as such, so they are left as strings

* We could have `read_csv` parse them while reading (its `parse_dates` and `dayfirst` arguments)
* Or we can convert them afterwards
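Parsing at read time can be sketched on a made-up CSV as follows, assuming the dates are written day first (`dd/mm/yyyy`), as the `format="%d/%m/%Y"` used in the next cell implies. Without `dayfirst=True`, an ambiguous string such as `05/01/2004` would be read month first.

```python
from io import StringIO

import pandas as pd

csv_text = "Date,Time\n05/01/2004,17:42\n13/01/2004,09:30\n"

# parse_dates converts the column while reading; dayfirst=True reads
# "05/01/2004" as 5 January 2004 rather than 1 May 2004
df = pd.read_csv(StringIO(csv_text), parse_dates=["Date"], dayfirst=True)
```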

In [None]:
# Unparseable dates become NaT; errors="ignore" is deprecated in recent pandas
data2["Date"] = pd.to_datetime(data2["Date"], format="%d/%m/%Y",
                               errors="coerce")
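Because `errors="coerce"` replaces values that don't match the format with `NaT` instead of raising, it is worth counting how many dates failed after the conversion. A small sketch on made-up strings:

```python
import pandas as pd

s = pd.Series(["05/01/2004", "not a date", None])

# errors="coerce" turns anything that does not match the format into NaT
parsed = pd.to_datetime(s, format="%d/%m/%Y", errors="coerce")

# Number of entries that could not be parsed (missing inputs count too)
n_bad = int(parsed.isna().sum())
print(n_bad)  # 2
```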

In [None]:
from datetime import datetime
# strptime parses "HH:MM" into a full datetime; .time() keeps only the time of day
tmp = datetime.strptime("09:30", "%H:%M")
print(tmp.time())

In [None]:
# Vectorized parse of the "HH:MM" strings, keeping only the time-of-day part
data2["Time"] = pd.to_datetime(data2["Time"], format="%H:%M").dt.time
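A column of plain `datetime.time` objects has `object` dtype, so the `.dt` accessor is not available on it. If you later want to filter or group by hour or weekday, one alternative (not what the cells above do) is to combine `Date` and `Time` into a single datetime column. A sketch on made-up rows, assuming the raw `Date` strings are still in `dd/mm/yyyy` form; the new column names are hypothetical.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"Date": ["05/01/2004", "13/01/2004"],
                   "Time": ["17:42", "09:30"]})

# One datetime64 column carrying both the day and the clock time
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                 format="%d/%m/%Y %H:%M")

# The .dt accessor then works for grouping or filtering by hour or weekday
df["Hour"] = df["Timestamp"].dt.hour
df["Weekday"] = df["Timestamp"].dt.day_name()

# Time-of-day objects can still be recovered from the combined column
df["TimeOnly"] = df["Timestamp"].dt.time
```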

In [None]:
data2.head()

#### We can use the [`sample`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html) method to draw a random subset of the DataFrame

In [None]:
subdata = data2.sample(2000)
mean_long = np.mean(subdata['Longitude'])
mean_lat  = np.mean(subdata['Latitude'])
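`sample` draws different rows on every run. Passing `random_state` makes the draw repeatable, which helps when comparing maps, and `frac` requests a fraction of the rows instead of a fixed count. A sketch on a made-up frame standing in for `data2`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for data2 with 50 evenly spaced positions
df = pd.DataFrame({"Longitude": np.linspace(-5.0, 1.0, 50),
                   "Latitude": np.linspace(50.0, 58.0, 50)})

# random_state fixes the random draw, so re-running gives the same rows
a = df.sample(n=10, random_state=42)
b = df.sample(n=10, random_state=42)

# frac= samples a fraction of the rows (here 10% of 50, i.e. 5 rows)
tenth = df.sample(frac=0.1, random_state=0)

# Map centre from the sample, as in the cell above
center = [a["Latitude"].mean(), a["Longitude"].mean()]
```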


In [None]:
help(folium.Map)

In [None]:
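# The Stamen tiles moved to Stadia Maps and now need an API key, so use OpenStreetMap
accident_map = folium.Map(location=[mean_lat, mean_long],
                          tiles="OpenStreetMap", zoom_start=5.5)
for _, s in subdata.iterrows():
    # Popups are rendered as HTML, so <br> (not \n) starts a new line
    folium.Marker([s["Latitude"], s["Longitude"]],
                  popup="%s<br>%s<br># Casualties: %d" % (s["Date"],
                                                         s["Time"],
                                                         s["Number_of_Casualties"]),
                  icon=folium.Icon(icon="cloud")).add_to(accident_map)
accident_map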
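`iterrows` yields each row as a `Series`, so it does not preserve dtypes across columns (an integer count can come back as a float) and it is comparatively slow. `itertuples` is a common alternative. The sketch below builds marker positions and HTML popup strings from made-up rows; the `folium.Marker` call appears only as a comment so the snippet runs without folium, and `markers` is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Latitude": [51.49, 53.48],
    "Longitude": [-0.19, -2.24],
    "Date": ["05/01/2004", "13/01/2004"],
    "Time": ["17:42", "09:30"],
    "Number_of_Casualties": [1, 3],
})

# iterrows upcasts: mixing a float and an int column gives a float64 row Series
_, first_row = next(df[["Latitude", "Number_of_Casualties"]].iterrows())
print(first_row.dtype)  # float64

markers = []
for row in df.itertuples(index=False):
    # Fields are accessed by column name and keep their column's dtype
    popup = f"{row.Date}<br>{row.Time}<br># Casualties: {row.Number_of_Casualties}"
    markers.append(([row.Latitude, row.Longitude], popup))
    # With folium: folium.Marker([row.Latitude, row.Longitude], popup=popup).add_to(m)
```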

### Example 2

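* Filter the Pandas DataFrame on the number of casualties
* Select a different icon: folium's default glyphs are the [Bootstrap icons](https://www.w3schools.com/icons/bootstrap_icons_glyphicons.asp), and passing `prefix="fa"` switches to Font Awesome names such as `fa-ambulance`
* Set a different color
* Pick the tile set and the casualty threshold interactively with `ipywidgets`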
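The filtering step is an ordinary boolean mask, and the colour need not be fixed: it can be derived from the data. A sketch on made-up counts, with a hypothetical `threshold` standing in for the slider value and a hypothetical `color` column; `"green"`, `"orange"` and `"red"` are among the colour names `folium.Icon` accepts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Number_of_Casualties": [1, 4, 2, 7]})
threshold = 2  # hypothetical slider value

# Boolean mask, as in the interactive cell
severe = df[df["Number_of_Casualties"] >= threshold]

# Equivalent query string; @ refers to a local Python variable
severe_q = df.query("Number_of_Casualties >= @threshold")

# Bin casualty counts into marker colours:
# (0, 1] -> green, (1, 3] -> orange, (3, inf) -> red
df["color"] = pd.cut(df["Number_of_Casualties"],
                     bins=[0, 1, 3, np.inf],
                     labels=["green", "orange", "red"])
```

Each marker could then use `folium.Icon(color=row.color, ...)` instead of a single fixed colour.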

In [None]:
from ipywidgets import interact, interactive, fixed, interact_manual, IntSlider
import ipywidgets as widgets
from IPython.display import display

In [None]:
subdata = data2.sample(100)
mean_long = np.mean(subdata['Longitude'])
mean_lat  = np.mean(subdata['Latitude'])
# Tile sets folium can load without an API key
# (the Mapbox and Stamen presets now require one)
tiles = ["OpenStreetMap", "CartoDB positron", "CartoDB dark_matter"]
# Largest casualty count in the sample, as a plain Python int for the slider
max_cas = int(subdata["Number_of_Casualties"].max())

@interact(num_cas=IntSlider(min=1, max=max_cas, value=max_cas),
          df=fixed(subdata),
          loclat=fixed(mean_lat),
          loclon=fixed(mean_long),
          tile=tiles)
def plot_accidents(df, num_cas, loclat, loclon, tile):
    map2 = folium.Map(location=[loclat, loclon],
                      tiles=tile, zoom_start=5.5)
    # Keep only accidents with at least num_cas casualties
    for _, s in df[df["Number_of_Casualties"] >= num_cas].iterrows():
        folium.Marker([s["Latitude"], s["Longitude"]],
                      popup="%s<br>%s<br># Casualties: %d" % (s["Date"],
                                                             s["Time"],
                                                             s["Number_of_Casualties"]),
                      icon=folium.Icon(icon="fa-ambulance", color="red", prefix="fa"),
                      tooltip="Click for accident details").add_to(map2)
    display(map2)
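With a fixed `zoom_start`, a filtered selection can end up partly off-screen. A folium map also has a `fit_bounds` method that takes `[[south, west], [north, east]]`. The sketch below only computes those bounds with pandas from made-up coordinates, using a hypothetical helper `bounding_box` that returns `None` when the filter leaves no points; the folium call is left as a comment.

```python
import pandas as pd


def bounding_box(points):
    """Return [[south, west], [north, east]] for the points, or None if empty."""
    if points.empty:
        return None
    return [[points["Latitude"].min(), points["Longitude"].min()],
            [points["Latitude"].max(), points["Longitude"].max()]]


df = pd.DataFrame({"Latitude": [51.49, 53.48, 55.95],
                   "Longitude": [-0.19, -2.24, -3.19]})

bounds = bounding_box(df)
# Inside plot_accidents one could call: map2.fit_bounds(bounds)
```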