<em><sub>This page is available as an executable or viewable <strong>Jupyter Notebook</strong>:</sub></em>
<br/><br/>
<a href="https://mybinder.org/v2/gh/JetBrains/lets-plot/v2.0.0demos1?filepath=docs%2Fexamples%2Fjupyter-notebooks%2Fmap_titanic.ipynb"
   target="_parent">
   <img align="left"
        src="https://mybinder.org/badge_logo.svg">
</a>
<a href="https://nbviewer.jupyter.org/github/JetBrains/lets-plot/blob/master/docs/examples/jupyter-notebooks/map_titanic.ipynb"
   target="_parent">
   <img align="right"
        src="https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.png"
        width="109" height="20">
</a>
<br/>
<br/>

## Visualization of the Titanic's voyage.

The tasks completed in this notebook:
- Load an interactive basemap layer.
- Geocode the Titanic's ports of embarkation and show them as markers on the map.
- Show the "Titanic's site" on the map.
- Geocode the Titanic's destination port and show it on the map.
- Connect all markers on the map with dashed lines.
- Compute a simple statistic related to the ports of embarkation and show the plot and the map on the same figure.

We will use the [Lets-Plot for Python](https://github.com/JetBrains/lets-plot#lets-plot-for-python) library for all charting and geocoding tasks in this notebook.

The Titanic dataset for this demo was downloaded from ["Titanic: cleaned data" dataset](https://www.kaggle.com/jamesleslie/titanic-cleaned-data?select=train_clean.csv) (train_clean.csv) available at [kaggle](https://www.kaggle.com).

In [1]:
from lets_plot import *

LetsPlot.setup_html()

The geodata is provided by © OpenStreetMap contributors and is made available here under the Open Database License (ODbL).


### The ports of embarkation.

The Titanic's ports of embarkation were:
- Southampton (UK)
- Cherbourg (France)
- Cobh (Ireland)

Let's find the geographical coordinates of these cities using the `Lets-Plot` geocoding package.

In [2]:
from lets_plot.geo_data import *

ports_of_embarkation = ['Southampton', 'Cherbourg', 'Cobh']

#### 1. Using the `geocode()` function.

To geocode our port cities we can try to call the `geocode()` function:

    geocode(level='city', names=ports_of_embarkation)
or its equivalent:

    geocode_cities(names=ports_of_embarkation)

Unfortunately, this call results in a `ValueError`:

>Multiple objects (6) were found for Southampton:
>- Southampton (United Kingdom, England, South East)
>- Southampton (United States of America, New York, Suffolk County)
>- Southampton (United States of America, Massachusetts)
>- Southampton Township (United States of America, New Jersey, Burlington County)
>- Lower Southampton Township (United States of America, Pennsylvania, Bucks County)
>- Upper Southampton Township (United States of America, Pennsylvania, Bucks County)
>Multiple objects (2) were found for Cherbourg:
>- Saint-Jean-de-Cherbourg (Canada, Québec, Bas-Saint-Laurent, La Matanie)
>- Cherbourg-en-Cotentin (France, France métropolitaine, Normandie, Manche)


In [3]:
#
# This call will fail with an error shown above.
#
#geocode_cities(ports_of_embarkation)

#### 2. Resolving geocoding ambiguity using the `scope()` method.

We can try to resolve the ambiguity of the name "Southampton" (found in the United Kingdom and in the US)
and the name "Cherbourg" (found in Canada and France) by narrowing the scope of the search using
the `scope()` method:

    geocode_cities(ports_of_embarkation).scope(geocode_countries(['France', 'UK']))

But this call results in another `ValueError`:

>No objects were found for Cobh.

In [4]:
#
# This call will fail with "No objects were found for Cobh." error.
#
#geocode_cities(ports_of_embarkation).scope(geocode_countries(['France', 'UK']))
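
The two failures above can be reproduced with a toy model. This is only a sketch: `TOY_GAZETTEER`, `toy_geocode` and `outcome` are hypothetical names, the gazetteer is hand-written from the error messages quoted above, and nothing here calls the Lets-Plot geocoding service or mirrors its implementation. Unlike the real service, the toy version stops at the first problem instead of reporting all of them.

```python
# Hand-written stand-in for a geocoding service (hypothetical, NOT Lets-Plot data).
# Each name maps to the countries in which the quoted error messages report a match.
TOY_GAZETTEER = {
    "Southampton": ["United Kingdom", "United States of America"],
    "Cherbourg": ["Canada", "France"],
    "Cobh": ["Ireland"],
}


def toy_geocode(names, scope=None):
    """Resolve every name to exactly one country or raise ValueError."""
    resolved = {}
    for name in names:
        candidates = TOY_GAZETTEER.get(name, [])
        if scope is not None:
            # Narrowing the scope discards every candidate outside it.
            candidates = [c for c in candidates if c in scope]
        if not candidates:
            raise ValueError(f"No objects were found for {name}.")
        if len(candidates) > 1:
            raise ValueError(f"Multiple objects ({len(candidates)}) were found for {name}.")
        resolved[name] = candidates[0]
    return resolved


def outcome(names, scope=None):
    """Return the resolved mapping, or the error message if resolution fails."""
    try:
        return toy_geocode(names, scope)
    except ValueError as err:
        return str(err)


ports = ["Southampton", "Cherbourg", "Cobh"]

unscoped = outcome(ports)
scoped_two = outcome(ports, scope={"France", "United Kingdom"})
scoped_three = outcome(ports, scope={"France", "United Kingdom", "Ireland"})

print(unscoped)      # ambiguity: more than one candidate for a name
print(scoped_two)    # the scope is too narrow: Cobh is in Ireland
print(scoped_three)  # every name has exactly one candidate left
```

In the toy model, a scope removes ambiguity only if it still contains the country of every requested city, which is exactly what goes wrong with Cobh when the scope is limited to France and the UK.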

An alternative way to resolve these geocoding issues is to specify
the name of the "parent" country for each city.

The "parent" names must be listed in the same order
as the names of the geocoded cities:

In [5]:
cities_gcoder = geocode_cities(ports_of_embarkation).countries(['UK', 'France', 'Ireland'])
cities_gcoder.get_geocodes()

| | id | city | found name | country |
|---|---|---|---|---|
| 0 | 255729 | Southampton | Southampton | UK |
| 1 | 11624125 | Cherbourg | Cherbourg-en-Cotentin | France |
| 2 | 14066915 | Cobh | Cobh Municipal District | Ireland |
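
The ordering requirement amounts to pairing the two lists by index. The plain-Python sketch below illustrates that; `pair_with_parents` is a hypothetical helper, not part of Lets-Plot, and it says nothing about how the library validates its input.

```python
ports_of_embarkation = ["Southampton", "Cherbourg", "Cobh"]
parent_countries = ["UK", "France", "Ireland"]


def pair_with_parents(names, parents):
    """Pair each name with the parent at the same index; the lengths must match."""
    if len(names) != len(parents):
        raise ValueError(f"expected {len(names)} parents, got {len(parents)}")
    return list(zip(names, parents))


pairs = pair_with_parents(ports_of_embarkation, parent_countries)
for city, country in pairs:
    print(f"{city} -> {country}")

# Swapping two parents changes the pairing without any error being raised.
swapped = pair_with_parents(ports_of_embarkation, ["France", "UK", "Ireland"])
print(swapped[0])
```

Because a wrong order is not an error in itself, it is worth comparing the `city` and `country` columns of the geocoding result, as in the table above.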


#### 3. Using `where()` qualifiers for advanced geocoding.

There are situations when neither the `scope()` method nor the "parents" approach
resolves all geocoding ambiguities.

Let's resolve the ambiguity of the names "Southampton" and "Cherbourg" with the help of the `where()` qualifier.

In [6]:
ports_of_embarkation_gcoder = geocode_cities(ports_of_embarkation) \
        .where('Cherbourg', scope='France') \
        .where('Southampton', scope='England')
ports_of_embarkation_gcoder.get_geocodes()

| | id | city | found name |
|---|---|---|---|
| 0 | 255729 | Southampton | Southampton |
| 1 | 11624125 | Cherbourg | Cherbourg-en-Cotentin |
| 2 | 14066915 | Cobh | Cobh Municipal District |


### Markers on an interactive basemap.

The `Lets-Plot` API makes it easy to create an interactive basemap layer using either its own vector tile service or
a third-party ZXY raster tile provider.

In this notebook we will use the beautiful *CARTO Antique* raster tiles by [CARTO](https://carto.com/attribution/) as our basemap.

Simple markers (points) can be added to the basemap either via the `geom_point` layer
or directly on the `livemap` base layer.

In this demo we will add the ports of embarkation markers directly to the `livemap` base layer (using the `map` parameter)
and, later, add the other markers and shapes via additional `geom` layers.

In [7]:
LetsPlot.set(
    maptiles_zxy(
        url='https://cartocdn_c.global.ssl.fastly.net/base-antique/{z}/{x}/{y}@2x.png',
        attribution='<a href="https://www.openstreetmap.org/copyright">© OpenStreetMap contributors</a> <a href="https://carto.com/attributions#basemaps">© CARTO</a>, <a href="https://carto.com/attributions">© CARTO</a>'
    )
)
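
The `{z}/{x}/{y}` placeholders in the URL identify one tile: a zoom level plus a column and a row. The sketch below fills them in by hand, assuming the provider follows the common Web Mercator XYZ ("slippy map") tiling scheme; `lonlat_to_tile` is a hypothetical helper and is not part of Lets-Plot, which handles this internally. The sample point and zoom are the `location` and `zoom` values used for the final figure in this notebook.

```python
import math

# URL template copied from the `maptiles_zxy` call above.
TILE_URL = "https://cartocdn_c.global.ssl.fastly.net/base-antique/{z}/{x}/{y}@2x.png"


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the tile containing a point (Web Mercator XYZ scheme)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


zoom = 4
x, y = lonlat_to_tile(-12, 48, zoom)
tile_url = TILE_URL.format(z=zoom, x=x, y=y)
print(tile_url)
```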

In [8]:
basemap = (ggplot() + ggsize(800, 300) +
           geom_livemap(map=ports_of_embarkation_gcoder,
                        size=7, 
                        shape=21, color='black', fill='yellow'))

basemap

### The 'Titanic's site' marker

In [9]:
from shapely.geometry import Point, LineString
titanic_site = Point(-38.056641, 46.920255)

# Add marker using `geom_point` geometry layer.
titanic_site_marker = geom_point(x=titanic_site.x, y=titanic_site.y, size=10, shape=9, color='red')
basemap + titanic_site_marker

### Connecting markers on map.

The `ports_of_embarkation_gcoder` variable in this demo is an object of the type `Geocoder`. 

A `Geocoder` object can, if necessary, be transformed into a `GeoDataFrame`
by calling its `get_centroids()`, `get_boundaries()` or `get_limits()` method.

To create the Titanic's path we will use the `get_centroids()` method to obtain the points of embarkation and then append the "Titanic's site" point to complete the polyline.

In [10]:
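import pandas as pd
from geopandas import GeoSeries
from geopandas import GeoDataFrame

# The points of embarkation
embarkation_points = ports_of_embarkation_gcoder.get_centroids().geometry
# `Series.append` was removed in pandas 2.0, so concatenate the two series instead.
site_series = GeoSeries([titanic_site], crs=embarkation_points.crs)
titanic_journey_points = pd.concat([embarkation_points, site_series], ignore_index=True)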

# Create a new GeoDataFrame containing a `LineString` geometry.
titanic_journey_gdf = GeoDataFrame(dict(geometry=[LineString(titanic_journey_points)]))

# Add polyline to the plot using the `geom_path` layer.
titanic_path = geom_path(map=titanic_journey_gdf, color='dark-blue', linetype='dotted', size=1.2)
basemap + titanic_path + titanic_site_marker

### The last segment that Titanic didn't make.

In [11]:
# Geocoding New York City is a trivial task.
NYC = geocode_cities(['New York']).get_centroids().geometry[0]

map_layers = (titanic_path 
  + geom_segment(x=titanic_site.x, y=titanic_site.y, 
                 xend=NYC.x, yend=NYC.y, 
                 color='gray', linetype='dotted', size=1.2)
  + geom_point(x=NYC.x, y=NYC.y, size=7, shape=21, color='black', fill='white')
  + titanic_site_marker)

basemap + map_layers

### Titanic's survival rates by the port of embarkation.

In [12]:
import pandas as pd

In [13]:
df = pd.read_csv("../data/titanic.csv")
df.head(3)

| | Age | Cabin | Embarked | Fare | Name | Parch | PassengerId | Pclass | Sex | SibSp | Survived | Ticket | Title | Family_Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22.0 | | S | 7.25 | Braund, Mr. Owen Harris | 0 | 1 | 3 | male | 1 | 0.0 | A/5 21171 | Mr | 1 |
| 1 | 38.0 | C85 | C | 71.2833 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 2 | 1 | female | 1 | 1.0 | PC 17599 | Mrs | 1 |
| 2 | 26.0 | | S | 7.925 | Heikkinen, Miss. Laina | 0 | 3 | 3 | female | 0 | 1.0 | STON/O2. 3101282 | Miss | 0 |


In this Titanic dataset, the `Embarked` column contains single-letter codes for the Titanic's ports of embarkation:
- S: Southampton (UK)
- C: Cherbourg (France)
- Q: Cobh (Ireland)

Let's visualize the "Survived" counts by port of embarkation:

In [14]:
from lets_plot.mapping import as_discrete

bars = ggplot(df) \
    + geom_bar(aes('Embarked', fill=as_discrete('Survived')), position='dodge') \
    + scale_fill_discrete(labels=['No', 'Yes']) \
    + scale_x_discrete(labels=['Southampton', 'Cherbourg', 'Cobh'], limits=['S', 'C', 'Q'])

bars + ggsize(800, 250)
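
The bar chart shows counts, while the section title speaks of rates. A rate is simply the mean of the `Survived` flag within each port. The plain-Python sketch below shows the arithmetic on the three rows printed by `df.head(3)` above; `survival_rates` and `PORT_NAMES` are hypothetical names, and three rows are of course far too few to say anything about the real rates.

```python
from collections import defaultdict

# (Embarked, Survived) pairs copied from the three rows shown by `df.head(3)`.
sample_rows = [("S", 0.0), ("C", 1.0), ("S", 1.0)]

# Port codes as listed above.
PORT_NAMES = {"S": "Southampton", "C": "Cherbourg", "Q": "Cobh"}


def survival_rates(rows):
    """Return the share of survivors for each port that appears in `rows`."""
    totals = defaultdict(int)
    survivors = defaultdict(float)
    for port, survived in rows:
        totals[port] += 1
        survivors[port] += survived
    return {PORT_NAMES[port]: survivors[port] / totals[port] for port in totals}


rates = survival_rates(sample_rows)
print(rates)
```

On the full DataFrame, the same quantity should be obtainable with `df.groupby('Embarked')['Survived'].mean()`, which skips missing `Survived` values by default.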

### The final figure.

In [15]:
bars_settings = theme(axis_title='blank', 
                   axis_line='blank', 
                   axis_ticks_y='blank',
                   axis_text_y='blank',
                   legend_position=[1.12, 1.07],
                   legend_justification=[1, 1]) + scale_x_discrete(expand=[0, 0.05])


# Named `titanic_map` to avoid shadowing the built-in `map`.
titanic_map = ggplot() + ggsize(800, 300) \
    + geom_livemap(map=ports_of_embarkation_gcoder, 
                    size=8, 
                    shape=21, color='black', fill='yellow',
                    zoom=4, location=[-12, 48])

fig = GGBunch()
fig.add_plot(titanic_map + map_layers, 0, 0)
fig.add_plot(bars + bars_settings, 535, 135, 250, 150)
fig
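
The numbers passed to `add_plot` are an offset followed by a size, all in pixels. The sketch below checks that the bar chart inset stays inside the 800x300 map. It assumes that the offset refers to the top-left corner of the inset and that the map fills the whole figure; `inset_margins` is a hypothetical helper, not a Lets-Plot function.

```python
def inset_margins(canvas_size, x, y, width, height):
    """Return the (right, bottom) gaps in pixels between an inset and the canvas edges."""
    canvas_width, canvas_height = canvas_size
    return canvas_width - (x + width), canvas_height - (y + height)


# Values from the notebook: ggsize(800, 300) and add_plot(..., 535, 135, 250, 150).
right_gap, bottom_gap = inset_margins((800, 300), 535, 135, 250, 150)
print(right_gap, bottom_gap)
```

Under these assumptions the inset sits in the lower-right corner of the map with the same small gap on both sides; a negative gap would mean the inset extends past the map.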