<a href="https://colab.research.google.com/github/aguinaldoabbj/minicourse_open_data_natal_2019/blob/master/4_geo_plots.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Geospatial Data Visualization 

## Introduction

Geospatial coordinates are central to much real-world data. Geospatial data sets consist mainly of the latitude and longitude coordinates of the entities under study, so a natural first step is to plot these coordinates on a map.


## Our case study

In this stage we are going to stick with data from [Dados Abertos Natal](http://dados.natal.br/group/09132b97-d39f-4316-a859-240c31e98ef8?res_format=KML), but this time our target data will be a series of [KML (Keyhole Markup Language)](https://developers.google.com/kml/documentation) files available in this repository, regarding bus routes and stops.

We are going to start with the "Paradas" dataset, which refers to the mapped bus stops in the city. 

## Obtaining, Loading and Preparing Data

We can use 'wget' again to download data to our workspace in Colab.

In [2]:
!wget -c https://raw.githubusercontent.com/aguinaldoabbj/minicourse_open_data_natal_2019/master/data/paradas-unificadas.kml

--2019-03-18 13:06:22--  https://raw.githubusercontent.com/aguinaldoabbj/minicourse_open_data_natal_2019/master/data/paradas-unificadas.kml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1444962 (1.4M) [text/plain]
Saving to: ‘paradas-unificadas.kml’


2019-03-18 13:06:22 (30.4 MB/s) - ‘paradas-unificadas.kml’ saved [1444962/1444962]
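
Outside Colab, where `wget` may not be available, the same download can be sketched with Python's standard library. This is a minimal sketch, not part of the original notebook: the helper name `download_if_missing` is hypothetical, and unlike `wget -c` it does not resume a partial download, it only skips the request when the file already exists.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the wget cell above
KML_URL = ("https://raw.githubusercontent.com/aguinaldoabbj/"
           "minicourse_open_data_natal_2019/master/data/paradas-unificadas.kml")


def download_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless `dest` already exists; return the path."""
    path = Path(dest)
    if path.exists():
        return path  # skip the request when the notebook is rerun
    # Stream the response to disk instead of holding it all in memory
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Calling `download_if_missing(KML_URL, "paradas-unificadas.kml")` would then leave the file in the working directory, just as the `wget` cell does.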



We need two new libraries to deal with KML data: fastKML and BeautifulSoup. Here, the former decodes the KML file into a tree of features, and the latter extracts individual fields from each feature's markup. As they are not native in Colab, we have to download and install them with the 'pip' utility.

In [7]:
!pip install fastkml
!pip install bs4



Now we can load the installed libraries.

In [0]:
from fastkml import kml
from bs4 import BeautifulSoup


KML is an XML-based format, so it has to be parsed rather than read as plain text. First, we load the KML file into a string with Python's built-in *open()* function.

In [0]:
#Read the file into a string, decoding it as UTF-8
with open("paradas-unificadas.kml", 'rt', encoding="utf-8") as myfile:
    kml_doc=myfile.read()

Then we use fastKML to decode the KML content, now stored in the variable *kml_doc*.

In [0]:
#start fast kml engine
k = kml.KML()
# Read in the KML string
k.from_string(kml_doc)

By browsing the nested KML features level by level (using fastKML's *features()* method), we can locate where the data is:

In [24]:
# browsing the feature levels
l1 = list(k.features())
print(len(l1))

1


In [27]:
l2 = list(l1[0].features())
print(len(l2))

1


In [28]:
l3 = list(l2[0].features())
print(len(l3))

1958
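
The level-by-level browsing above can also be sketched as a small recursive helper that counts the features found at each depth. This is an illustration only: `count_features_by_depth` is a hypothetical name, and it assumes that containers expose a `features()` method, as the fastKML objects in the cells above do. The `FakeContainer` and `FakeLeaf` classes are stand-ins so the sketch runs without the KML file.

```python
def count_features_by_depth(node, depth=0, counts=None):
    """Return {depth: number of features} for everything nested under `node`."""
    if counts is None:
        counts = {}
    features = getattr(node, "features", None)
    if not callable(features):
        return counts  # leaf (e.g. a placemark): nothing nested below it
    for child in features():
        counts[depth + 1] = counts.get(depth + 1, 0) + 1
        count_features_by_depth(child, depth + 1, counts)
    return counts


class FakeContainer:
    """Hypothetical stand-in for a KML container (document or folder)."""

    def __init__(self, children):
        self._children = children

    def features(self):
        return iter(self._children)


class FakeLeaf:
    """Hypothetical stand-in for a placemark, which has no nested features."""


# One document holding one folder holding three placemarks
tree = FakeContainer([FakeContainer([FakeContainer([FakeLeaf(), FakeLeaf(), FakeLeaf()])])])
depth_counts = count_features_by_depth(tree)
print(depth_counts)
```

Applied to `k`, a helper like this would report one feature at each of the first two levels and the placemarks at the third, matching the lengths of `l1`, `l2` and `l3`.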


In [32]:
l3[0].to_string() #getting data from the first placemark

'<kml:Placemark xmlns:kml="http://www.opengis.net/kml/2.2">\n  <kml:name>Parada - 16 de Dezembro - O088</kml:name>\n  <kml:description>&gt; Linhas de ônibus:&lt;br&gt;* 59 - Guarapes/Brasília Teimosa, via Bom Pastor;&lt;br&gt;* 599 - Guarapes/Mirassol, via Av. Cap.-Mor Gouveia;</kml:description>\n  <kml:visibility>1</kml:visibility>\n  <kml:Style>\n    <kml:IconStyle>\n      <kml:scale>0.9</kml:scale>\n      <kml:Icon>\n        <kml:href>http://maps.google.com/mapfiles/kml/pushpin/blue-pushpin.png</kml:href>\n      </kml:Icon>\n    </kml:IconStyle>\n  </kml:Style>\n  <kml:Point>\n    <kml:coordinates>-35.278631,-5.833069,0.000000</kml:coordinates>\n  </kml:Point>\n</kml:Placemark>\n'

Gotcha! The data sits at the third level of the KML hierarchy, which holds 1958 placemarks. Now we can parse this XML data with BeautifulSoup, a powerful library for parsing several kinds of markup.

In [0]:
# Use bs4's XML parser to parse the KML data of each placemark
bus_stop_coords = [] #a list to store the retrieved coordinates
bus_stop_names = [] #a list to store the retrieved names
for p in l3:
    soup = BeautifulSoup(p.to_string(), "lxml-xml") #parsing each placemark
    coord = soup.find('kml:coordinates') #searching coordinates
    coord = coord.get_text().split(",")[0:2] #keeping only [longitude, latitude] (KML order), dropping altitude
    name = soup.find('kml:name') #searching names
    name = name.get_text()
    bus_stop_coords.append(coord) #appending coordinates to the list
    bus_stop_names.append(name)
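
As a cross-check, the same two fields can be pulled out of a placemark with Python's standard `xml.etree.ElementTree` module. The sketch below is an alternative to the BeautifulSoup cell, not the notebook's method. It uses a shortened copy of the first placemark shown earlier, and `parse_placemark` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by the placemarks in this file (see the to_string() output)
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Shortened copy of the first placemark printed earlier
SAMPLE_PLACEMARK = """<kml:Placemark xmlns:kml="http://www.opengis.net/kml/2.2">
  <kml:name>Parada - 16 de Dezembro - O088</kml:name>
  <kml:Point>
    <kml:coordinates>-35.278631,-5.833069,0.000000</kml:coordinates>
  </kml:Point>
</kml:Placemark>"""


def parse_placemark(xml_text):
    """Return (name, latitude, longitude) for one placemark string."""
    root = ET.fromstring(xml_text)
    name = root.findtext("kml:name", namespaces=KML_NS)
    coords = root.findtext(".//kml:coordinates", namespaces=KML_NS)
    # KML stores coordinates as longitude,latitude,altitude
    lon, lat = (float(value) for value in coords.strip().split(",")[:2])
    return name, lat, lon


name, lat, lon = parse_placemark(SAMPLE_PLACEMARK)
print(name, lat, lon)
```

Returning latitude before longitude here is a deliberate choice, since that is the order most mapping libraries expect, while KML itself stores longitude first.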

The previous stage resulted in two lists, *bus_stop_coords* and *bus_stop_names*, holding the bus stop coordinates and names, respectively. From these lists we can build a Pandas dataframe to make data manipulation easier.

In [41]:
#loading pandas
import pandas as pd

#building new dataframe
bus_stops_df = pd.DataFrame({'Name':bus_stop_names,'Coordinate':bus_stop_coords}) #lists are mapped into dataframe columns

#checking if things are OK
bus_stops_df.head()

| | Coordinate | Name |
|---|---|---|
| 0 | [-35.278631, -5.833069] | Parada - 16 de Dezembro - O088 |
| 1 | [-35.246364, -5.843083] | Parada - 17 de Dezembro - O068 |
| 2 | [-35.280042, -5.838146] | Parada - 28 de Fevereiro -O082 |
| 3 | [-35.182530, -5.799994] | Parada - L001 |
| 4 | [-35.182753, -5.799874] | Parada - L002 |


In [96]:
bus_stops_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1958 entries, 0 to 1957
Data columns (total 2 columns):
Coordinate    1958 non-null object
Name          1958 non-null object
dtypes: object(2)
memory usage: 30.7+ KB


## Spatial visualization and Analysis

Now that we have a brand new dataframe containing the bus stop coordinates, the next step is to plot them on a map. To do so, I recommend *folium*, a very promising library for spatial visualization and analysis. Let's install *folium* with pip (pinned here to version 0.8.3) and load it:

In [45]:
!pip install folium==0.8.3
import folium



Once folium is loaded, we can instantiate a simple map of our city:

In [105]:
# building a map with folium
natal = folium.Map(
    location=[-5.823728, -35.222590], #a central coordinate to start zooming the map
    width=600,height=700,
    zoom_start=12, # zoom: the higher the zoom, the closer it gets to ground
    tiles='Stamen Terrain' #map types
)

natal #show the map
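
The central coordinate passed as `location` could also be derived from the data itself. The sketch below averages the latitudes and longitudes of the first five bus stops shown in the dataframe preview. It assumes a plain arithmetic mean is good enough at city scale, and `map_center` is a hypothetical helper name.

```python
from statistics import mean


def map_center(coords):
    """Return [latitude, longitude] averaged over KML-ordered [lon, lat] pairs."""
    lats = [float(lat) for _lon, lat in coords]
    lons = [float(lon) for lon, _lat in coords]
    return [mean(lats), mean(lons)]


# First five coordinates from the dataframe preview, stored as [lon, lat] strings
sample_coords = [
    ["-35.278631", "-5.833069"],
    ["-35.246364", "-5.843083"],
    ["-35.280042", "-5.838146"],
    ["-35.182530", "-5.799994"],
    ["-35.182753", "-5.799874"],
]

center = map_center(sample_coords)
print(center)
```

With the full dataframe, `map_center(bus_stops_df.Coordinate)` would give a value that can be passed directly as `location` to `folium.Map`.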

The next stage is to plot each bus stop coordinate on the map by iterating over the dataframe rows with the *iterrows()* method.

In [106]:
for nrow, row in bus_stops_df.iterrows(): #iterating over dataframe rows
  lat=float(row.Coordinate[1]) #latitude is the second element of the coordinate (str -> float)
  long=float(row.Coordinate[0]) #longitude is the first element of the coordinate (str -> float)
  folium.Marker(
        location=[lat,long] #folium expects [latitude, longitude]
               ).add_to(natal)
    
    
natal
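
The markers above carry no label. One possible extension is to attach each stop's name as a popup. The sketch below only prepares the marker arguments in plain Python, so it runs without folium. `marker_specs` is a hypothetical helper name, and the sample data is taken from the dataframe preview.

```python
def marker_specs(names, coords):
    """Yield marker arguments from names and KML-ordered [lon, lat] string pairs."""
    for name, (lon, lat) in zip(names, coords):
        # Swap to [latitude, longitude] and convert the strings to floats
        yield {"location": [float(lat), float(lon)], "popup": name}


sample_names = ["Parada - 16 de Dezembro - O088", "Parada - L001"]
sample_coords = [["-35.278631", "-5.833069"], ["-35.182530", "-5.799994"]]

specs = list(marker_specs(sample_names, sample_coords))
for spec in specs:
    print(spec)
```

Each dictionary can then be unpacked into a marker, for example `folium.Marker(**spec).add_to(natal)`, since `location` and `popup` are both parameters of `folium.Marker`.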

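A quick check on the first rows confirms the coordinate order: for each stop, the latitude is printed first, followed by the longitude.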

In [86]:
for nrow, row in bus_stops_df.head().iterrows():
  print(row.Coordinate[1])
  print(row.Coordinate[0])

-5.833069
-35.278631
-5.843083
-35.246364
-5.838146
-35.280042
-5.799994
-35.182530
-5.799874
-35.182753
