![Practicum AI Logo image](https://github.com/PracticumAI/practicumai.github.io/blob/main/images/logo/PracticumAI_logo_250x50.png?raw=true)  <img src='images/05/data_visualization.png' align='right' width=50>

# *Practicum AI Data*: Data Visualization - Geospatial Data Plotting 

This exercise is inspired by Mario Döbler & Tim Großmann (2020) <i>The Data Visualization Workshop</i> from <a href="https://www.packtpub.com/product/the-data-visualization-workshop/9781800568846">Packt Publishing</a> and its companion <a href="https://github.com/PacktWorkshops/The-Data-Visualization-Workshop">GitHub repository</a>.
***

In this notebook, we will start to explore the power of geoplotlib, a versatile Python library, by combining it with various real-world datasets. Through practical applications, you'll gain hands-on experience in geospatial data visualization and develop the skills to draw meaningful insights from location-based data.


## Objectives

By the end of this notebook, you will be able to:

1. Use geoplotlib to make beautiful maps and identify various types of geospatial charts. 
2. Create intricate visualizations using geographical data and customize them with different map styles and layers.

## 1. Introduction

Geospatial data plotting is all about visualizing location-based information on maps, helping us understand geographical patterns and make informed decisions. Python provides various powerful libraries like `geopandas`, `folium`, and `geoplotlib` for creating beautiful geographical visualizations. In this exploration, we will focus solely on `geoplotlib` for our geospatial data plotting journey.

### 1.1 Geoplotlib

Geoplotlib is a Python library for creating visualizations of geographical data. It handles large datasets efficiently and supports interactive and animated visualizations. Geoplotlib draws its layers on top of map tiles, so your data appears on a seamless, zoomable map of the world. It offers a simple interface for creating various geospatial visualizations, such as histograms, point-based plots, and choropleth plots.
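Map tiles are usually served in the Web Mercator "slippy map" scheme, where zoom level `z` splits the world into a `2**z` by `2**z` grid of square images. The sketch below shows how a longitude/latitude pair maps to a tile index under that scheme. It is an illustration of the standard tiling formula, assuming geoplotlib's tile providers follow it; it is not geoplotlib's internal code, and `lonlat_to_tile` is a hypothetical helper.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing (lon, lat).

    Assumes the common "slippy map" scheme: at zoom level z the world is a
    2**z by 2**z grid of tiles; x grows eastward from -180 degrees and
    y grows southward from roughly 85.05 degrees north.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = (lon + 180.0) / 360.0 * n
    # Mercator stretches latitude: asinh(tan(lat)) runs from -pi to pi
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    # Clamp so lon=180 or extreme latitudes still land on a valid tile
    x_idx = min(max(math.floor(x), 0), n - 1)
    y_idx = min(max(math.floor(y), 0), n - 1)
    return x_idx, y_idx


# First poaching incident from the dataset used in Section 2.1
print(lonlat_to_tile(34.84144, -7.049359, zoom=6))
```

Higher zoom levels give more, smaller tiles, which is why a map can stay sharp as you zoom in.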

The inputs to geoplotlib are data sources and map tiles. It can render images in Jupyter Notebooks or in interactive windows that support zooming and panning. To use geoplotlib, you need NumPy and SciPy for numerical computations, and pyglet for graphical rendering. Optional requirements include Matplotlib for colormaps and pyshp for reading `.shp` files. The architecture of geoplotlib is as follows:

<div style="text-align:center">
    <img src="images/05/geoplotlib.jpg" width="400">
    <p style="font-style:italic; font-size:12px; text-align:center">Image Source: <a href="https://www.researchgate.net/publication/305983877_Geoplotlib_a_Python_Toolbox_for_Visualizing_Geographical_Data" style="text-decoration:none; color:inherit;">Cuttone et al. 2016</a></p>
</div>
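Before running the geoplotlib cells, it can help to check which of the dependencies listed above are installed. This is a small sketch using only the standard library; the `check_modules` helper and the package-to-module mapping are our own (note that pyshp is imported under the module name `shapefile`).

```python
import importlib.util

# Package names from the text, mapped to the module name each is imported as
REQUIRED = {"numpy": "numpy", "scipy": "scipy", "pyglet": "pyglet"}
OPTIONAL = {"matplotlib": "matplotlib", "pyshp": "shapefile"}


def check_modules(packages: dict[str, str]) -> dict[str, bool]:
    """Return {package name: True if its module can be found, else False}."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


for label, group in (("required", REQUIRED), ("optional", OPTIONAL)):
    for name, found in check_modules(group).items():
        print(f"{label:>8}  {name:<10} {'installed' if found else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.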

### 1.2 Design Principles of geoplotlib

Geoplotlib, a Python open-source toolbox for geographical data visualization, is designed around three main principles:
* Simplicity: Geoplotlib offers built-in tools for tasks like density visualization, spatial graphs, and shapefiles, making it easy to use and understand.
* Integration: It works with the wider Python data analysis ecosystem, including scientific computing, machine learning, and numerical packages. Visualizations can be built and refined interactively, which supports an iterative design process.
* Performance: Geoplotlib renders with hardware-accelerated OpenGL (through pyglet), so visualizations can handle datasets with millions of data points.

### 1.3 Power of geoplotlib

Geoplotlib offers a wide range of powerful features:
* It includes dot maps, kernel density estimation, spatial graphs, Voronoi tessellation, shapefiles, and more.
* Geoplotlib supports various map types, such as area maps, heat maps, and point density maps.
* Data points are given by latitude and longitude: geoplotlib expects columns named `lat` and `lon`.
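Because geoplotlib looks for `lat` and `lon` columns, data stored under other column names has to be renamed first. The sketch below collects coordinates from a list of records into that column layout and rejects out-of-range values. `to_latlon_columns` is a hypothetical helper for illustration, not part of geoplotlib.

```python
def to_latlon_columns(records, lat_key="lat", lon_key="lon"):
    """Collect coordinates from a list of dicts into {'lat': [...], 'lon': [...]}.

    Raises ValueError for a latitude outside [-90, 90] or a longitude outside
    [-180, 180], which can also reveal swapped columns.
    """
    columns = {"lat": [], "lon": []}
    for i, rec in enumerate(records):
        lat, lon = float(rec[lat_key]), float(rec[lon_key])
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"record {i}: latitude {lat} outside [-90, 90]")
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"record {i}: longitude {lon} outside [-180, 180]")
        columns["lat"].append(lat)
        columns["lon"].append(lon)
    return columns


# Two records whose coordinates are stored under hypothetical keys "y" and "x"
rows = [{"y": -7.049359, "x": 34.84144}, {"y": -7.65084, "x": 34.48001}]
print(to_latlon_columns(rows, lat_key="y", lon_key="x"))
```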

The library consists of several modules:
* geoplotlib module
* geoplotlib.layers module
* geoplotlib.utils module
* geoplotlib.core module
* geoplotlib.colors module

## 2. Geospatial Visualizations

### 2.1 Dot Map

A dot density map is a type of geospatial visualization that uses dots to represent individual data points on a map. Here's an example of how to create a dot density map using geoplotlib in Python:

For the dataset, we will use `poaching_points_cleaned.csv` from the book's repository. You can download it [here](https://github.com/PacktWorkshops/The-Data-Visualization-Workshop/blob/master/Datasets/poaching_points_cleaned.csv).

In [1]:
# csv import with pandas
import pandas as pd
pd_dataset = pd.read_csv('data/poaching_points_cleaned.csv')
pd_dataset.head()

| Unnamed: 0 | id_report | date_report | description | created_date | lat | lon |
|---|---|---|---|---|---|---|
| 0 | 138 | 01/01/2005 12:00:00 AM | Poaching incident | 2005/01/01 12:00:00 AM | -7.049359 | 34.84144 |
| 1 | 4 | 01/20/2005 12:00:00 AM | Poaching incident | 2005/01/20 12:00:00 AM | -7.65084 | 34.48001 |
| 2 | 43 | 01/20/2005 12:00:00 AM | Poaching incident | 2005/02/20 12:00:00 AM | -7.843202 | 34.005704 |
| 3 | 98 | 01/20/2005 12:00:00 AM | Poaching incident | 2005/02/21 12:00:00 AM | -7.745846 | 33.948526 |
| 4 | 141 | 01/20/2005 12:00:00 AM | Poaching incident | 2005/02/22 12:00:00 AM | -7.876673 | 33.690167 |


In [None]:
!pip install geoplotlib

In [None]:
import geoplotlib
from geoplotlib.utils import read_csv

# Load the same CSV with geoplotlib's own reader (it has 'lat' and 'lon' columns)
dataset = read_csv('data/poaching_points_cleaned.csv')

# Plot our dataset with points
geoplotlib.dot(dataset)
geoplotlib.show()

Looking only at the `lat` and `lon` values in the dataset won't give us a good idea of where our elements are on the map or how far apart they are. We can't draw conclusions or understand the patterns without visualizing the data on a map. However, once we see the map, we instantly notice areas with more incidents than others. This insight is not readily apparent from just the dataset's numbers. Data visualization is essential for understanding spatial information.
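To see why raw coordinates are hard to read, consider how far apart two incidents are. Degrees of latitude and longitude are not a fixed distance, so a great-circle formula is needed. The sketch below uses the standard haversine formula with a spherical Earth of mean radius 6371 km (an approximation); `haversine_km` is our own helper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# First two incidents from pd_dataset.head() above
d = haversine_km(-7.049359, 34.84144, -7.65084, 34.48001)
print(f"{d:.1f} km")
```

The two incidents are under a degree apart in each coordinate, yet the computed distance is on the order of 80 km, something the numbers alone do not make obvious.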

### 2.2 2D Histograms

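Unlike a dot map, which draws every point individually, a 2D histogram divides the map into a grid of square bins and counts how many points fall into each one. Each bin's count is shown with a color scale, giving a simple approximation of density. The `binsize` parameter sets the size of each bin in screen pixels, and `colorscale` controls how counts are mapped to colors (`'sqrt'` compresses large counts so sparse bins remain visible).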
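The binning idea can be sketched in plain Python. Note one simplification: geoplotlib bins in screen pixels after projecting the points, whereas this sketch bins directly in degrees of latitude and longitude. The square-root intensity only mirrors the spirit of `colorscale='sqrt'`; geoplotlib's exact color mapping may differ. `hist2d` and `sqrt_intensity` are hypothetical helpers.

```python
import math
from collections import Counter


def hist2d(lats, lons, binsize_deg):
    """Count points per square bin of side `binsize_deg` degrees.

    Returns a Counter mapping (row, col) bin indices to point counts.
    """
    counts = Counter()
    for lat, lon in zip(lats, lons):
        key = (math.floor(lat / binsize_deg), math.floor(lon / binsize_deg))
        counts[key] += 1
    return counts


def sqrt_intensity(count, max_count):
    """Map a bin count to [0, 1] on a square-root scale."""
    return math.sqrt(count) / math.sqrt(max_count) if max_count else 0.0


# Coordinates of the five incidents shown by pd_dataset.head()
lats = [-7.049359, -7.65084, -7.843202, -7.745846, -7.876673]
lons = [34.84144, 34.48001, 34.005704, 33.948526, 33.690167]
bins = hist2d(lats, lons, binsize_deg=0.5)
peak = max(bins.values())
for (row, col), n in sorted(bins.items()):
    print(row, col, n, round(sqrt_intensity(n, peak), 2))
```

A larger bin size merges more points per bin and gives a smoother but coarser picture; a smaller one preserves detail but leaves many bins with one point or none.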

In [None]:
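import geoplotlib
from geoplotlib.utils import read_csv, BoundingBox

dataset = read_csv('data/poaching_points_cleaned.csv')

# Plot our dataset as a histogram with a square-root color scale and 8-pixel bins
geoplotlib.hist(dataset, colorscale='sqrt', binsize=8)
# Fit the initial view to the extent of the data (BoundingBox.DK would zoom to Denmark)
geoplotlib.set_bbox(BoundingBox.from_points(lons=dataset['lon'], lats=dataset['lat']))
geoplotlib.show()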

Histogram plots provide a clearer view of the density distribution in our dataset. Analyzing the final plot, we can identify hotspots for poaching and observe areas without any poaching incidents.

### 2.3 Voronoi Tessellation

A Voronoi tessellation divides the map into cells, one per data point: each cell contains every location that is closer to its data point than to any other. The boundary between two neighboring cells is made of locations equidistant from both points. Data points that are close together therefore produce smaller cells, so small cells mark dense areas.
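The definition above can be checked directly: a location belongs to the cell of whichever data point is nearest to it. This sketch assumes flat (x, y) coordinates with Euclidean distance, which is only an approximation for latitude and longitude; the sites and `nearest_site` are hypothetical.

```python
import math


def nearest_site(point, sites):
    """Return the index of the site closest to `point` (Euclidean distance).

    Every location that returns index i lies in the Voronoi cell of sites[i].
    On a boundary (a tie), min() returns the lowest index.
    """
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


# Three hypothetical sites on a flat plane
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
for p in [(1.0, 1.0), (3.0, 0.5), (0.5, 3.5), (2.0, 0.0)]:
    print(p, "-> cell of site", nearest_site(p, sites))
```

The point `(2.0, 0.0)` is equidistant from sites 0 and 1, so it lies exactly on the boundary between their cells.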

In [None]:
# Plot a Voronoi map with a reversed Blues colormap
geoplotlib.voronoi(dataset, cmap='Blues_r', max_area=1e5, alpha=255)
geoplotlib.show()

Voronoi plots help visualize data point density. The `voronoi` function accepts additional options such as `cmap`, `max_area`, and `alpha`: `cmap` sets the colormap, `alpha` sets the opacity (0 to 255), and `max_area` is a scaling constant that determines how each cell's area is mapped to a color.

Comparing the Voronoi plot to the histogram, we can easily spot areas that draw more attention. The center-right edge shows a large dark blue area with an even darker center, something that might have been missed using only the histogram.
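To make the role of `max_area` concrete, here is a sketch of the two steps involved: measuring a cell's area and scaling it by a constant. The area uses the standard shoelace formula for polygons. The linear, clamped scaling in `area_to_intensity` is our own assumption for illustration; geoplotlib's exact scaling may differ. With a reversed colormap such as `'Blues_r'`, low values map to the darkest color, so small (dense) cells would stand out.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def area_to_intensity(area, max_area):
    """Assumed linear mapping: area / max_area, clamped to at most 1."""
    return min(area / max_area, 1.0)


small_cell = [(0, 0), (100, 0), (100, 100), (0, 100)]    # area 1e4
large_cell = [(0, 0), (1000, 0), (1000, 500), (0, 500)]  # area 5e5
for cell in (small_cell, large_cell):
    a = polygon_area(cell)
    print(a, area_to_intensity(a, max_area=1e5))
```

Every cell larger than `max_area` saturates at the same value, so the constant effectively decides which cells still count as "dense".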

### 2.4 Delaunay Triangulation

Delaunay triangulation is closely related to Voronoi tessellation: it connects two data points with an edge whenever their Voronoi cells share a boundary, which divides the plane into triangles. Equivalently, no data point lies inside the circumcircle of any triangle. Smaller triangles appear where data points are closer together, revealing density patterns in specific regions. Combined with a color gradient, it highlights points of interest in a way similar to a heatmap.
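The empty-circumcircle property gives a direct, if slow, way to compute a Delaunay triangulation: keep every triangle whose circumcircle contains no other point. The sketch below does exactly that using the standard in-circle determinant test. It runs in O(n^4) time and assumes no four points lie on a common circle, so it is only for understanding on a handful of points; geoplotlib itself relies on a proper triangulation routine. All names here are our own.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc; positive if a, b, c are counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    if orient(a, b, c) < 0:  # the determinant test assumes counter-clockwise order
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def delaunay_bruteforce(points):
    """Return index triples (i, j, k) whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) == 0:  # skip collinear triples
            continue
        if not any(in_circumcircle(a, b, c, points[m])
                   for m in range(len(points)) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles


pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 4.0)]
print(delaunay_bruteforce(pts))
```

For these four points, the two kept triangles share the edge between points 1 and 2, which is the diagonal that avoids a long, thin triangle.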

In [None]:
# Plot our dataset as a delaunay
geoplotlib.delaunay(dataset, cmap='hot_r')
geoplotlib.set_smoothing(True)
geoplotlib.show()

Using the `hot_r` color map allows us to get a clear and visually striking representation, making the areas of interest stand out effectively.

### 2.5 Choropleth Plot

A choropleth plot shows areas, such as the states or counties of a country, shaded or colored according to a statistic computed for each area. It provides an abstract view of the geographical area and makes it easy to compare regions.

In the code and visual example below, the shade of each US county is determined by its unemployment rate, with darker shades indicating higher rates. You can download the unemployment dataset [here](https://github.com/PacktWorkshops/The-Data-Visualization-Workshop/blob/master/Datasets/unemployment.json).

In [None]:
import geoplotlib
from geoplotlib.utils import BoundingBox
from geoplotlib.colors import ColorMap
import json

# Find the unemployment rate for the selected county and convert it to a color
def get_color(properties):
    # Key = state code without leading zeros + county code, matching the JSON keys
    key = str(int(properties['STATE'])) + properties['COUNTY']
    if key in unemployment_rates:
        # Scale linearly, treating a rate of 0.15 (15%) as the top of the colormap
        return cmap.to_color(unemployment_rates.get(key), .15, 'lin')
    else:
        # Fully transparent for counties with no data
        return [0, 0, 0, 0]


# Get unemployment data
with open('data/unemployment.json') as fin:
    unemployment_rates = json.load(fin)

# Fill each shape with its unemployment color, then draw thin white outlines on top
cmap = ColorMap('Blues', alpha=255, levels=10)
geoplotlib.geojson('data/us_states_shapes.json',
                   fill=True, color=get_color,
                   f_tooltip=lambda properties: properties['NAME'])
geoplotlib.geojson('data/us_states_shapes.json',
                   fill=False, color=[255, 255, 255, 64])
geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()

Let's break down what's happening in the code for a better understanding.

* First, we import the necessary libraries: geoplotlib for plotting and `json` to read the unemployment data, which is stored as a JSON file.
* Next, the function `get_color` looks up the unemployment rate for each county (using its `STATE` and `COUNTY` properties) and converts it to a color: higher rates map to darker blues, and counties without data are drawn fully transparent (`[0, 0, 0, 0]`).
* The rest of the code reads the dataset and calls `geoplotlib.geojson` twice on the shapes file: the first call fills each polygon using `get_color`, and the second draws white outlines. The `f_tooltip` argument displays each area's name when you hover over it.

The `BoundingBox` object helps define the initial focus of the visualization, making it easier for users to grasp the context without panning or zooming around first.
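The logic inside `get_color` can be sketched without geoplotlib: build the lookup key, then map the rate linearly onto one of 10 color levels, with 0.15 as the top of the scale. We assume here that a `ColorMap` with `levels=10` quantizes roughly like `rate_to_level` below; geoplotlib's exact rounding may differ. The helpers and the sample properties are hypothetical.

```python
def county_key(properties):
    """Build the key used by get_color: state code without leading zeros + county code."""
    return str(int(properties["STATE"])) + properties["COUNTY"]


def rate_to_level(rate, maxvalue=0.15, levels=10):
    """Linearly map a rate in [0, maxvalue] to a discrete level 0..levels-1 (clamped)."""
    fraction = min(max(rate / maxvalue, 0.0), 1.0)
    return min(int(fraction * levels), levels - 1)


# Hypothetical county properties and rate, for illustration only
props = {"STATE": "01", "COUNTY": "001", "NAME": "Example County"}
rates = {"1001": 0.051}
key = county_key(props)
print(key, rate_to_level(rates[key]))
```

Every rate at or above 0.15 lands in the darkest level, so the choice of maximum controls how much contrast the map shows between high-unemployment counties.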

### 2.6 Moving Points

Another application of geoplotlib is visualizing the movements of objects. For instance, you can use the code below to animate taxi movements and see the output yourself.

This part of the exercise is adapted from the article [Geoplotlib: Exploring the World with Python](https://medium.com/@HeCanThink/geoplotlib-exploring-the-world-with-python-%EF%B8%8F-bbd2bd583760). You can get the taxi dataset from [here](https://github.com/andrea-cuttone/geoplotlib/blob/master/examples/data/taxi.csv).

In [None]:
from geoplotlib.layers import BaseLayer
from geoplotlib.core import BatchPainter
import geoplotlib
from geoplotlib.colors import colorbrewer
from geoplotlib.utils import epoch_to_str, BoundingBox, read_csv


class TrailsLayer(BaseLayer):
    """Custom layer that animates taxi positions over time."""

    def __init__(self):
        self.data = read_csv('data/taxi.csv')
        # Assign one color per taxi_id
        self.cmap = colorbrewer(self.data['taxi_id'], alpha=220)
        # Start the animation clock at the earliest timestamp
        self.t = self.data['timestamp'].min()
        self.painter = BatchPainter()

    def draw(self, proj, mouse_x, mouse_y, ui_manager):
        # Called on every frame: start from an empty painter
        self.painter = BatchPainter()
        # Keep only the points recorded in the 15 minutes after self.t
        df = self.data.where((self.data['timestamp'] > self.t) & (self.data['timestamp'] <= self.t + 15*60))

        for taxi_id in set(df['taxi_id']):
            grp = df.where(df['taxi_id'] == taxi_id)
            self.painter.set_color(self.cmap[taxi_id])
            # Convert lon/lat to screen pixels for the current map view
            x, y = proj.lonlat_to_screen(grp['lon'], grp['lat'])
            self.painter.points(x, y, 10)

        # Advance the clock by 2 minutes per frame
        self.t += 2*60

        # Loop back to the start once we pass the last timestamp
        if self.t > self.data['timestamp'].max():
            self.t = self.data['timestamp'].min()

        self.painter.batch_draw()
        # Show the current time in the UI
        ui_manager.info(epoch_to_str(self.t))

    def bbox(self):
        # Initial view: the Beijing area covered by the taxi data
        return BoundingBox(north=40.110222, west=115.924463, south=39.705711, east=116.803369)


geoplotlib.add_layer(TrailsLayer())
geoplotlib.show()
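The animation in `TrailsLayer.draw` comes down to a sliding time window: each frame keeps the points with `t < timestamp <= t + 15*60`, then advances `t` by two minutes. The sketch below reproduces only that windowing in plain Python, on hypothetical `(timestamp, taxi_id)` records, so you can inspect which points each frame would draw. Unlike the layer, it stops at the last timestamp instead of looping back; `frames` is our own helper.

```python
def frames(records, window_s=15 * 60, step_s=2 * 60):
    """Yield (t, visible) pairs using the same time window as TrailsLayer.draw.

    records: list of (timestamp, taxi_id) tuples, timestamps in seconds.
    `visible` holds the records with t < timestamp <= t + window_s.
    """
    timestamps = [ts for ts, _ in records]
    t, t_max = min(timestamps), max(timestamps)
    while t <= t_max:
        visible = [r for r in records if t < r[0] <= t + window_s]
        yield t, visible
        t += step_s


# Hypothetical records: two taxis, timestamps in seconds
records = [(0, "a"), (60, "a"), (600, "b"), (1000, "b")]
for t, visible in frames(records):
    print(t, sorted({taxi for _, taxi in visible}))
```

Because the window (15 minutes) is much longer than the step (2 minutes), consecutive frames overlap heavily, which makes each taxi's recent positions appear as a trail.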

## 3. Conclusion

In this notebook, we explored geoplotlib, from its design principles to its built-in plot types: dot maps, 2D histograms, Voronoi tessellations, Delaunay triangulations, and choropleth plots. The built-in plots cover most needs, and we also saw how to write a custom layer for animated visualizations. You now have the tools to start using geoplotlib to visualize your own geographical data.

***

## Bonus Exercises

We use the `%load` magic command to load the contents of a specified file into a cell; here, it loads the solution once you have finished each exercise.

### E: Dot Map

In this exercise, we are going to create a DataFrame for the dataset and use `geoplotlib` to plot it on a dot map. You can follow these steps:

*Step 1: Import the necessary libraries.*

In [None]:
import pandas as pd
import geoplotlib
from geoplotlib.utils import read_csv

*Step 2: Define a dataset as a list of dictionaries and create a DataFrame from it.*

In [None]:
data = [
    {"Farm_Name": "Farm_A", "Latitude": 34.1234, "Longitude": -118.5678, "Crop": "Wheat"},
    {"Farm_Name": "Farm_B", "Latitude": 35.4321, "Longitude": -120.8765, "Crop": "Corn"},
    {"Farm_Name": "Farm_C", "Latitude": 33.9876, "Longitude": -117.5432, "Crop": "Soybean"},
    {"Farm_Name": "Farm_D", "Latitude": 34.5678, "Longitude": -119.4321, "Crop": "Potato"},
    {"Farm_Name": "Farm_E", "Latitude": 35.8765, "Longitude": -121.3210, "Crop": "Cotton"},
    {"Farm_Name": "Farm_F", "Latitude": 34.7654, "Longitude": -120.1234, "Crop": "Barley"},
    {"Farm_Name": "Farm_G", "Latitude": 34.9876, "Longitude": -118.8765, "Crop": "Rice"},
    {"Farm_Name": "Farm_H", "Latitude": 35.6543, "Longitude": -121.4321, "Crop": "Sunflower"},
    {"Farm_Name": "Farm_I", "Latitude": 35.8765, "Longitude": -119.5432, "Crop": "Tomato"},
    {"Farm_Name": "Farm_J", "Latitude": 34.6543, "Longitude": -119.9876, "Crop": "Strawberry"}
]

df = pd.DataFrame(data)

*Step 3: Create a dot map visualization using geoplotlib for the DataFrame.*

In [None]:
# Code it

**Solution**

In [None]:
%load solutions/05.3_dot_map