<em><sub>This page is available as an executable or viewable <strong>Jupyter Notebook</strong>:</sub></em>
<br/><br/>
<a href="https://mybinder.org/v2/gh/JetBrains/lets-plot/v1.5.2demos1?filepath=docs%2Fexamples%2Fjupyter-notebooks%2Fmap_titanic.ipynb"
   target="_parent">
   <img align="left"
        src="https://mybinder.org/badge_logo.svg">
</a>
<a href="https://nbviewer.jupyter.org/github/JetBrains/lets-plot/blob/master/docs/examples/jupyter-notebooks/map_titanic.ipynb"
   target="_parent">
   <img align="right"
        src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.png"
        width="109" height="20">
</a>
<br/>
<br/>

## Visualization of the Titanic's voyage using the Lets-Plot geocoding package.

Geocoding is the process of converting names of places into geographic coordinates.

In this notebook we will geocode the Titanic's ports of embarkation in order to place markers on a map.

In [1]:
import pandas as pd
import geopandas as gpd
from lets_plot import *

LetsPlot.setup_html()

## Titanic data

The Titanic dataset used in this demo is the ["Titanic: cleaned data" dataset](https://www.kaggle.com/jamesleslie/titanic-cleaned-data?select=train_clean.csv) (`train_clean.csv` only), available at [Kaggle](https://www.kaggle.com).

In [2]:
df = pd.read_csv("../data/titanic.csv")
df.head()

| Unnamed: 0 | Age | Cabin | Embarked | Fare | Name | Parch | PassengerId | Pclass | Sex | SibSp | Survived | Ticket | Title | Family_Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.0 | | S | 7.25 | Braund, Mr. Owen Harris | 0 | 1 | 3 | male | 1 | 0.0 | A/5 21171 | Mr | 1 |
| 1 | 38.0 | C85 | C | 71.2833 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 2 | 1 | female | 1 | 1.0 | PC 17599 | Mrs | 1 |
| 2 | 26.0 | | S | 7.925 | Heikkinen, Miss. Laina | 0 | 3 | 3 | female | 0 | 1.0 | STON/O2. 3101282 | Miss | 0 |
| 3 | 35.0 | C123 | S | 53.1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 0 | 4 | 1 | female | 1 | 1.0 | 113803 | Mrs | 1 |
| 4 | 35.0 | | S | 8.05 | Allen, Mr. William Henry | 0 | 5 | 3 | male | 0 | 0.0 | 373450 | Mr | 0 |


## Finding coordinates of the ports

The `Embarked` column in this dataset contains single-letter codes for the Titanic's ports of embarkation:
- S: Southampton (UK)
- C: Cherbourg (France)
- Q: Cobh (Ireland)
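
The codes listed above can be decoded with a plain lookup table. The sketch below is only an illustration: `PORT_NAMES` and `decode_ports` are hypothetical names and are not part of the dataset or of Lets-Plot.

```python
# Hypothetical helper: translate `Embarked` codes into port names.
PORT_NAMES = {'S': 'Southampton', 'C': 'Cherbourg', 'Q': 'Cobh'}


def decode_ports(codes):
    """Return the port name for each code, or None for unknown or missing codes."""
    return [PORT_NAMES.get(code) for code in codes]


# `Embarked` values of the five rows shown by df.head() above.
print(decode_ports(['S', 'C', 'S', 'S', 'S']))
```

With pandas, the same lookup could be applied to the whole column as `df['Embarked'].map(PORT_NAMES)`.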

Now let's try to find coordinates of these ports.

In [3]:
from lets_plot.geo_data import *

ports_of_embarkation = ['Southampton', 'Cherbourg', 'Cobh']

The geodata is provided by © OpenStreetMap contributors and is made available here under the Open Database License (ODbL).


### 1. Using the `regions` function.

To geocode our port cities we can try to call the `regions` function like this:

    regions(level='city', request=ports_of_embarkation)
or its equivalent:

    regions_city(request=ports_of_embarkation)

Unfortunately, this call results in a `ValueError`:

>Multiple objects (6) were found for Southampton:
>- Southampton (United Kingdom, England, South East)
>- Southampton (United States of America, New York, Suffolk County)
>- Southampton (United States of America, Massachusetts)
>- Southampton Township (United States of America, New Jersey, Burlington County)
>- Lower Southampton Township (United States of America, Pennsylvania, Bucks County)
>- Upper Southampton Township (United States of America, Pennsylvania, Bucks County)
>Multiple objects (2) were found for Cherbourg:
>- Saint-Jean-de-Cherbourg (Canada, Québec, Bas-Saint-Laurent, La Matanie)
>- Cherbourg-en-Cotentin (France, France métropolitaine, Normandie, Manche)


In [4]:
#
# This call will fail with an error shown above.
#
#regions_city(ports_of_embarkation)

### 2. Resolving geocoding ambiguity using the `within` parameter.

We can try to resolve the ambiguity of the name "Southampton" (found in the United Kingdom and in the US)
and the name "Cherbourg" (found in Canada and France) by narrowing the search scope with
the `within` parameter and the `regions_country` function like this:

    regions_city(ports_of_embarkation, within=regions_country(['France', 'UK']))

But this call results in another `ValueError`, because Cobh is in Ireland, which lies outside the given search scope:

>No objects were found for Cobh.

In [5]:
#
# This call will fail with "No objects were found for Cobh." error.
#
#regions_city(ports_of_embarkation, within=regions_country(['France', 'UK']))
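
Both errors above can be reproduced with a toy model. The sketch below is not how Lets-Plot geocodes: `GAZETTEER` and `resolve` are hypothetical, the candidate lists are reduced to the countries named in the error message, and only the observable behaviour (too many matches, or none inside the scope) is mimicked.

```python
# Toy gazetteer: place name -> countries in which a match exists (assumed, simplified).
GAZETTEER = {
    'Southampton': ['United Kingdom', 'United States of America'],
    'Cherbourg': ['Canada', 'France'],
    'Cobh': ['Ireland'],
}


def resolve(name, within=None):
    """Return the single country matching `name`, optionally limited to `within`."""
    candidates = GAZETTEER.get(name, [])
    if within is not None:
        # Narrow the search scope to the given countries.
        candidates = [country for country in candidates if country in within]
    if not candidates:
        raise ValueError(f'No objects were found for {name}.')
    if len(candidates) > 1:
        raise ValueError(f'Multiple objects ({len(candidates)}) were found for {name}.')
    return candidates[0]


scope = ['France', 'United Kingdom']
print(resolve('Southampton', within=scope))  # the scope leaves a single match
print(resolve('Cherbourg', within=scope))
try:
    resolve('Cobh', within=scope)  # Cobh is in Ireland, outside the scope
except ValueError as err:
    print(err)
```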

An alternative way of using the `within` parameter is to pass
a list of territory names, one for each city.

The territory names must be in the same order
as the names of the geocoded cities:

In [6]:
regions_city(ports_of_embarkation, within=['UK', 'France', 'Ireland'])

       request        id             found name
0  Southampton    255729            Southampton
1    Cherbourg  11624125  Cherbourg-en-Cotentin
2         Cobh  14066915                   Cobh
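
The ordering rule can be pictured as pairing each city with the territory at the same index. The helper below is a hypothetical illustration and not Lets-Plot code; the length check is an assumption about sensible behaviour.

```python
def pair_with_scopes(names, scopes):
    """Pair each place name with the territory at the same position."""
    if len(names) != len(scopes):
        raise ValueError('Expected one territory per place name.')
    return dict(zip(names, scopes))


ports = ['Southampton', 'Cherbourg', 'Cobh']
print(pair_with_scopes(ports, ['UK', 'France', 'Ireland']))
```

Swapping two territories in the list would pair a city with the wrong country, for example Southampton with France.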

### 3. Using `regions_builder` for advanced geocoding.

There are many situations where a simple call of the function `regions` 
will not resolve all geocoding ambiguities.

In other cases, we might want to retrieve all objects matching a name
rather than treat name ambiguity as an error.

The `regions_builder` object provides advanced capabilities for fine-tuning geocoding queries.

Let's resolve the ambiguity of the names "Southampton" and "Cherbourg" with the help of `regions_builder`.

In [7]:
ports_of_embarkation_geocoded = regions_builder(level='city', request=ports_of_embarkation) \
        .where('Cherbourg', within='France') \
        .where('Southampton', within='England') \
        .build()
ports_of_embarkation_geocoded

       request        id             found name
0  Southampton    255729            Southampton
1    Cherbourg  11624125  Cherbourg-en-Cotentin
2         Cobh  14066915                   Cobh
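
The chained calls above follow the builder pattern: each `where` call records a per-name setting and returns the builder itself. The class below is a hypothetical stand-in written only to show that pattern; it performs no geocoding and does not reflect the internals of `regions_builder`.

```python
class ToyRegionsBuilder:
    """Hypothetical stand-in that only records a search scope per name."""

    def __init__(self, level, request):
        self.level = level
        self.request = list(request)
        self.scopes = {}

    def where(self, name, within):
        if name not in self.request:
            raise ValueError(f'{name} is not part of the request.')
        self.scopes[name] = within
        return self  # returning self is what makes the calls chainable

    def build(self):
        # Names without a `where` clause keep an unrestricted (None) scope.
        return [(name, self.scopes.get(name)) for name in self.request]


queries = (ToyRegionsBuilder('city', ['Southampton', 'Cherbourg', 'Cobh'])
           .where('Cherbourg', within='France')
           .where('Southampton', within='England')
           .build())
print(queries)
```

Only the ambiguous names need a `where` clause; "Cobh" is left unrestricted, as in the cell above.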

## Adding geocoded markers to map

Simple markers (points) can be added to a map either via a `geom_point` layer
or directly on the `livemap` base layer.

In this demo we will mark the ports of embarkation right on the `livemap` base layer
by passing the geocoded names to the `map` parameter.

In [8]:
# For our map let's use raster tiles provided by Wikimedia Foundation.
LetsPlot.set(maptiles_zxy(url='https://maps.wikimedia.org/osm-intl/{z}/{x}/{y}@2x.png'))

basemap = ggplot() + ggsize(800, 300) \
    + geom_livemap(map=ports_of_embarkation_geocoded,
                   size=7, 
                   shape=21, color='black', fill='yellow')

basemap

## The 'Titanic site' marker

In [9]:
from shapely.geometry import Point, LineString

# Point takes (x, y), i.e. (longitude, latitude).
titanic_site = Point(-38.056641, 46.920255)

titanic_site_marker = geom_point(x=titanic_site.x, y=titanic_site.y, size=10, shape=9, color='red')
basemap + titanic_site_marker

## Adding the Titanic's path through the geocoded ports to the map

The "geocoded ports" variable holds an object of type `Regions`.

If necessary, a `Regions` object can be transformed into a `GeoDataFrame`
by calling its `centroids()`, `boundaries()` or `limits()` method.

In [10]:
from geopandas import GeoSeries

# Transform geocoded ports to GeoDataFrame and retrieve a collection of points.
embarkation_points = ports_of_embarkation_geocoded.centroids().geometry

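# Append the Titanic site to the ports and create a new GeoDataFrame containing a `LineString` geometry.
# `pd.concat` is used because `Series.append` is not available in pandas 2.0 and later.
titanic_journey_points = pd.concat([embarkation_points, GeoSeries(titanic_site)], ignore_index=True)
titanic_journey_gdf = gpd.GeoDataFrame(dict(geometry=[LineString(titanic_journey_points)]))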

titanic_path = geom_path(map=titanic_journey_gdf, color='dark-blue', linetype='dotted', size=1.2)

basemap + titanic_path + titanic_site_marker

## The last segment that the Titanic didn't make.

In [11]:
# Geocoding New York City is a trivial task.
NYC = regions_city(['New York']).centroids().geometry[0]

map_layers = titanic_path \
  + geom_segment(x=titanic_site.x, y=titanic_site.y, 
                 xend=NYC.x, yend=NYC.y, 
                 color='white', linetype='dotted', size=1.2) \
  + geom_point(x=NYC.x, y = NYC.y, size=7, shape=21, color='black', fill='white') \
  + titanic_site_marker

basemap + map_layers

## Survival figures by the port of embarkation

In [12]:
from lets_plot.mapping import as_discrete

bars = ggplot(df) \
    + geom_bar(aes('Embarked', fill=as_discrete('Survived')), position='dodge') \
    + scale_fill_discrete(labels=['No', 'Yes']) \
    + scale_x_discrete(labels=['Southampton', 'Cherbourg', 'Cobh'], limits=['S', 'C', 'Q'])

bars + ggsize(800, 250)
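
By default `geom_bar` counts the rows in each group, so every bar in the chart is the number of passengers for one combination of port and outcome. The sketch below reproduces that counting in plain Python for the five rows shown by `df.head()` earlier; it is an illustration of the idea and not what Lets-Plot runs internally.

```python
from collections import Counter

# (Embarked, Survived) pairs of the five rows shown by df.head() above.
rows = [('S', 0), ('C', 1), ('S', 1), ('S', 1), ('S', 0)]

# One counter entry per (port, outcome) pair corresponds to one bar.
counts = Counter(rows)
for (port, survived), n in sorted(counts.items()):
    print(port, 'Yes' if survived else 'No', n)
```

On the full dataset the same table could be obtained with `df.groupby(['Embarked', 'Survived']).size()`.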

In [13]:
bars_settings = theme(axis_title='blank', 
                   axis_line='blank', 
                   axis_ticks_y='blank',
                   axis_text_y='blank',
                   legend_position=[1.12, 1.07],
                   legend_justification=[1, 1]) + scale_x_discrete(expand=[0, 0.05])


# Named `voyage_map` to avoid shadowing Python's built-in `map`.
voyage_map = ggplot() + ggsize(800, 300) \
    + geom_livemap(map=ports_of_embarkation_geocoded.centroids(), 
                    size=8, 
                    shape=21, color='black', fill='yellow',
                    zoom=4, location=[-12, 48])

fig = GGBunch()
fig.add_plot(voyage_map + map_layers, 0, 0)
fig.add_plot(bars + bars_settings, 535, 135, 250, 150)
fig
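
The numbers passed to `add_plot` for the bar chart can be checked with simple arithmetic. The sketch assumes they are x, y, width and height in pixels, measured from the top-left corner of the bunch; `fits_inside` is a hypothetical helper, not a Lets-Plot function.

```python
def fits_inside(outer_size, x, y, width, height):
    """Check that a box placed at (x, y) stays within an outer (width, height) area."""
    outer_width, outer_height = outer_size
    return (x >= 0 and y >= 0
            and x + width <= outer_width
            and y + height <= outer_height)


# The map is 800x300 (see ggsize above); the bar chart is placed at (535, 135) with size 250x150.
print(fits_inside((800, 300), 535, 135, 250, 150))
```

Under that assumption the inset ends at x = 785 and y = 285, leaving a 15-pixel margin to the right and bottom edges of the map.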