# Plotting Maps: Visualizing Haiti Earthquake Crisis Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series

In [2]:
data = pd.read_csv('../../../CSV Files/O_Reilly/ch08/Haiti.csv')

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3593 entries, 0 to 3592
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Serial          3593 non-null   int64  
 1   INCIDENT TITLE  3593 non-null   object 
 2   INCIDENT DATE   3593 non-null   object 
 3   LOCATION        3592 non-null   object 
 4   DESCRIPTION     3593 non-null   object 
 5   CATEGORY        3587 non-null   object 
 6   LATITUDE        3593 non-null   float64
 7   LONGITUDE       3593 non-null   float64
 8   APPROVED        3593 non-null   object 
 9   VERIFIED        3593 non-null   object 
dtypes: float64(2), int64(1), object(7)
memory usage: 280.8+ KB


It’s easy now to tinker with this data set to see what kinds of things we might want to do with it. Each row represents a report sent from someone’s mobile phone indicating an emergency or some other problem. Each has an associated timestamp and a location as latitude and longitude:

In [4]:
data[['INCIDENT DATE', 'LATITUDE', 'LONGITUDE']][:10]

      INCIDENT DATE   LATITUDE   LONGITUDE
0  05/07/2010 17:26  18.233333  -72.533333
1  28/06/2010 23:06  50.226029    5.729886
2  24/06/2010 16:21  22.278381  114.174287
3  20/06/2010 21:59  44.407062    8.933989
4  18/05/2010 16:26  18.571084  -72.334671
5  26/04/2010 13:14  18.593707  -72.310079
6  26/04/2010 14:19    18.4828    -73.6388
7  26/04/2010 14:27     18.415     -73.195
8  15/03/2010 10:58  18.517443  -72.236841
9  15/03/2010 11:00   18.54779   -72.41001
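
The INCIDENT DATE column is loaded as plain strings (dtype object). As a minimal sketch, and assuming the format is day/month/year followed by hour:minute (which values such as 28/06/2010 suggest), the timestamps could be parsed like this. The `sample` frame is a hypothetical stand-in built from three of the rows shown above, not the full data set:

```python
import pandas as pd

# Hypothetical toy frame holding three of the rows displayed above
sample = pd.DataFrame({
    'INCIDENT DATE': ['05/07/2010 17:26', '28/06/2010 23:06', '18/05/2010 16:26'],
    'LATITUDE': [18.233333, 50.226029, 18.571084],
    'LONGITUDE': [-72.533333, 5.729886, -72.334671],
})

# Assumption: dates are written day/month/year hour:minute
parsed = pd.to_datetime(sample['INCIDENT DATE'], format='%d/%m/%Y %H:%M')

# With real datetimes, components such as the month are easy to extract
months = parsed.dt.month.tolist()
print(months)
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates such as 05/07/2010.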


The CATEGORY field contains a comma-separated list of codes indicating the type of message:

In [5]:
data['CATEGORY'][:5]

0          1. Urgences | Emergency, 3. Public Health, 
1    1. Urgences | Emergency, 2. Urgences logistiqu...
2    2. Urgences logistiques | Vital Lines, 8. Autr...
3                            1. Urgences | Emergency, 
4                            1. Urgences | Emergency, 
Name: CATEGORY, dtype: object

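As the data summary above shows, CATEGORY has 6 missing values (3587 non-null out of 3593 rows), so we might want to drop those data points. Additionally, calling describe shows that there are some aberrant locations: the maximum latitude (50.226029) and longitude (114.174287) lie far outside Haiti: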

In [6]:
data.describe()

            Serial     LATITUDE    LONGITUDE
count       3593.0       3593.0       3593.0
mean   2080.277484    18.611495    -72.32268
std     1171.10036     0.738572     3.650776
min            4.0    18.041313   -74.452757
25%         1074.0     18.52407     -72.4175
50%         2163.0    18.539269      -72.335
75%         3088.0     18.56182    -72.29357
max         4052.0    50.226029   114.174287
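
Before dropping anything, it can help to look at which rows are affected. The following is only a sketch on a hypothetical `toy` frame: the coordinates are copied from rows shown earlier, while the CATEGORY values (including the missing one) are invented for illustration. The bounding box of latitude 18 to 20 and longitude -75 to -70 is an assumed rough outline of Haiti:

```python
import pandas as pd

# Hypothetical toy data: coordinates from the rows above, categories invented
toy = pd.DataFrame({
    'LATITUDE': [18.233333, 50.226029, 22.278381, 18.571084],
    'LONGITUDE': [-72.533333, 5.729886, 114.174287, -72.334671],
    'CATEGORY': ['1. Urgences | Emergency, ', None,
                 '2. Urgences logistiques | Vital Lines, ',
                 '1. Urgences | Emergency, '],
})

# Count missing values per column
missing_per_column = toy.isnull().sum()

# Flag rows that fall inside the assumed bounding box around Haiti
in_box = ((toy.LATITUDE > 18) & (toy.LATITUDE < 20)
          & (toy.LONGITUDE > -75) & (toy.LONGITUDE < -70))

# Inspect the rows that fall outside it
aberrant = toy[~in_box]
print(missing_per_column)
print(aberrant)
```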


Cleaning the bad locations and removing the missing categories is now fairly simple:

In [7]:
data = data[(data.LATITUDE > 18) & (data.LATITUDE < 20)
            & (data.LONGITUDE > -75) & (data.LONGITUDE < -70)
            & (data.CATEGORY.notnull())]

Now we might want to do some analysis or visualization of this data by category, but each category field may have multiple categories. Additionally, each category is given as a code plus an English and possibly also a French code name. Thus, a little bit of wrangling is required to get the data into a more agreeable form. First, I wrote these three functions to split a category string into a list, to get a list of all the categories, and to split each category into a code and an English name:

In [8]:
def to_cat_list(catstr):
    stripped = (x.strip() for x in catstr.split(','))
    return [x for x in stripped if x]

def get_all_categories(cat_series):
    cat_sets = (set(to_cat_list(x)) for x in cat_series)
    return sorted(set.union(*cat_sets))

def get_english(cat):
    code, names = cat.split('.')
    if '|' in names:
        names = names.split(' | ')[1]
    # Return for every category, including English-only names without ' | '
    return code, names.strip()

You can check that the get_english function does what you expect:

In [9]:
get_english('2. Urgences logistiques | Vital Lines')

('2', 'Vital Lines')

Now, I make a dict mapping code to name because we’ll use the codes for analysis. We’ll use this later when adorning plots (note the use of a generator expression in lieu of a list comprehension):

In [10]:
all_cats = get_all_categories(data.CATEGORY)

In [11]:
# Generator expression
english_mapping = dict(get_english(x) for x in all_cats)

In [None]:
english_mapping['2a']

In [None]:
english_mapping['6c']
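
To see how the category helpers fit together, here is a small self-contained sketch. The helpers are repeated so the snippet runs on its own, with get_english returning a (code, name) pair for every category, including English-only names that contain no ' | '. The `toy_categories` list is a hypothetical stand-in for data.CATEGORY, built from category strings shown earlier:

```python
def to_cat_list(catstr):
    # Split on commas and drop the empty piece left by the trailing comma
    stripped = (x.strip() for x in catstr.split(','))
    return [x for x in stripped if x]

def get_all_categories(cat_series):
    # Union of the categories found in every row, sorted for stable order
    cat_sets = (set(to_cat_list(x)) for x in cat_series)
    return sorted(set.union(*cat_sets))

def get_english(cat):
    code, names = cat.split('.')
    if '|' in names:
        # Keep only the English half of 'French | English'
        names = names.split(' | ')[1]
    return code, names.strip()

# Hypothetical stand-in for data.CATEGORY
toy_categories = [
    '1. Urgences | Emergency, 3. Public Health, ',
    '2. Urgences logistiques | Vital Lines, 3. Public Health, ',
]

all_cats = get_all_categories(toy_categories)

# dict() consumes the generator of (code, name) pairs directly
english_mapping = dict(get_english(x) for x in all_cats)
print(english_mapping)
```

Because dict() needs a two-element sequence for every item, get_english has to return a pair for each category; a function that falls through and returns None for some inputs would make the dict() call raise a TypeError.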