# Geo API Example

## Setup
Install the Signal Ocean SDK:
```
pip install signal-ocean
```
Set your subscription key, which you can get from your profile page: https://apis.signalocean.com/profile

In [1]:
signal_ocean_api_key = '' #replace with your subscription key
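
To avoid pasting the key into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the helper `load_api_key` and the variable name `SIGNAL_OCEAN_API_KEY` are assumptions made for this example, not part of the Signal Ocean SDK.

```python
import os


def load_api_key(env_var="SIGNAL_OCEAN_API_KEY"):
    """Return the subscription key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    pick whatever name fits your own setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable set in your shell, `signal_ocean_api_key = load_api_key()` can replace the hard-coded assignment above.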

In [2]:
from signal_ocean import Connection
from signal_ocean.geo import GeoAPI
import numpy as np
import pandas as pd
import requests
import json

In [3]:
connection = Connection(signal_ocean_api_key)
api = GeoAPI(connection)

## Scope

In this notebook we show how to retrieve each kind of Geo data and how the different kinds are interconnected. We provide an example for each kind and build dataframes that demonstrate the connections between them.

There are three kinds of Geo data:
* Areas
* Countries
* Ports

## Areas

### Get all available areas

In [4]:
all_areas = api.get_areas()

In [5]:
df_areas = pd.DataFrame([a.__dict__ for a in all_areas])

In [6]:
df_areas.head(10)

Unnamed: 0,id,name,area_type_id,parent_area_id
0,2,Arabian Gulf,1,89.0
1,3,Arctic Ocean & Barents Sea,1,25021.0
2,7,Black Sea / Sea Of Marmara,1,93.0
3,9,Caribs,1,25019.0
4,10,East Coast Central America,1,25019.0
5,12,East Coast Canada,1,25019.0
6,13,East Coast Mexico,1,25019.0
7,15,East Coast South America,1,25019.0
8,16,Australia / New Zealand,1,25020.0
9,17,China / Taiwan,1,99.0


#### Explore different area types and how they are related.

First we need to see what area types exist.

In [7]:
df_areas.area_type_id.unique()

array([1, 3, 2, 0, 4], dtype=int64)

Get a sample for each area type

In [8]:
df_areas[df_areas['area_type_id'] == 0].head()

Unnamed: 0,id,name,area_type_id,parent_area_id
31,24583,Black Sea,0,7.0
32,24584,Sea of Marmara,0,7.0
33,24594,Baltic Sea Upper,0,25008.0
34,24598,West Coast Central America,0,38.0
35,24600,East Coast Central America,0,10.0


In [9]:
df_areas[df_areas['area_type_id'] == 1].head()

Unnamed: 0,id,name,area_type_id,parent_area_id
0,2,Arabian Gulf,1,89.0
1,3,Arctic Ocean & Barents Sea,1,25021.0
2,7,Black Sea / Sea Of Marmara,1,93.0
3,9,Caribs,1,25019.0
4,10,East Coast Central America,1,25019.0


In [10]:
df_areas[df_areas['area_type_id'] == 2].head()

Unnamed: 0,id,name,area_type_id,parent_area_id
24,89,Arabian Gulf,2,84.0
25,93,Black Sea / Sea Of Marmara,2,25028.0
26,99,Far East,2,84.0
27,103,Red Sea,2,84.0
28,111,West Coast Central America,2,85.0


In [11]:
df_areas[df_areas['area_type_id'] == 3]

Unnamed: 0,id,name,area_type_id,parent_area_id
22,84,East,3,
23,85,Pacific America,3,
209,25027,Africa,3,
210,25028,West,3,


In [12]:
df_areas[df_areas['area_type_id'] == 4].head()

Unnamed: 0,id,name,area_type_id,parent_area_id
84,24894,SAM_URUG,4,
85,24895,BRAZIL,4,
86,24896,SAF_URUG_2,4,
87,24897,URUG,4,
88,24898,SAF,4,


Areas of types 0 through 3 are connected by a parent-child relationship, established through the **parent_area_id** column. The **parent_area_id** of an area corresponds to the **id** of an area of the next type up: type 0 areas point to type 1 areas, type 1 to type 2, and type 2 to type 3. Notice that areas of types 3 and 4 have no **parent_area_id**.

This creates a tree-like structure.

**Example** 
* West
    * Black Sea / Sea Of Marmara
    * Atlantic America
    * Baltic / North Sea
    * Mediterranean / UK Continent 
        * UK Continent
        * Mediterranean
            * West Mediterranean
            * East Mediterranean
            * Central Mediterranean
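
The parent-child links described above can be followed programmatically. The sketch below uses a four-row sample copied from the outputs in this notebook and a hypothetical helper, `area_path`, that walks from an area up to its type 3 root. It is a minimal illustration that assumes each `id` is unique and that only root areas have an empty `parent_area_id`.

```python
import pandas as pd

# Sample rows taken from the outputs shown in this notebook.
df_areas_sample = pd.DataFrame(
    [
        (25028, "West", 3, float("nan")),
        (93, "Black Sea / Sea Of Marmara", 2, 25028),
        (7, "Black Sea / Sea Of Marmara", 1, 93),
        (24583, "Black Sea", 0, 7),
    ],
    columns=["id", "name", "area_type_id", "parent_area_id"],
)


def area_path(df, area_id):
    """Return the area names from the root down to `area_id` (hypothetical helper)."""
    by_id = df.set_index("id")
    names = []
    current = area_id
    # Follow parent_area_id upwards until an area without a parent is reached.
    while not pd.isna(current):
        row = by_id.loc[int(current)]
        names.append(row["name"])
        current = row["parent_area_id"]
    return names[::-1]


path = area_path(df_areas_sample, 24583)
print(" > ".join(path))
```

Against the full `df_areas` dataframe, the same helper could be called with any type 0 area id.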

#### Get a dataframe with all area level names

We can get a dataframe with the complete area tree by gradually merging an area type with the one below it.

In [13]:
# Start from the top-level (type 3) areas and attach their type 2 children.
df_areas_all = (
    df_areas[df_areas['area_type_id'] == 3]
    .merge(df_areas, how='left', left_on='id', right_on='parent_area_id',
           suffixes=('_type3', '_type2'))
    [['id_type3', 'name_type3', 'id_type2', 'name_type2']]
    # Attach the type 1 children; 'id' and 'name' do not clash here, so they keep their names.
    .merge(df_areas, how='left', left_on='id_type2', right_on='parent_area_id')
    [['id_type3', 'name_type3', 'id_type2', 'name_type2', 'id', 'name']]
    # Attach the type 0 children.
    .merge(df_areas, how='left', left_on='id', right_on='parent_area_id',
           suffixes=('_type1', '_type0'))
    [['id_type3', 'name_type3', 'id_type2', 'name_type2',
      'id_type1', 'name_type1', 'id_type0', 'name_type0']]
)


In [14]:
df_areas_all.head(10)

Unnamed: 0,id_type3,name_type3,id_type2,name_type2,id_type1,name_type1,id_type0,name_type0
0,84,East,89,Arabian Gulf,2,Arabian Gulf,24777,Arabian Gulf
1,84,East,99,Far East,17,China / Taiwan,24666,North China
2,84,East,99,Far East,17,China / Taiwan,24725,South China
3,84,East,99,Far East,17,China / Taiwan,24726,Central China
4,84,East,99,Far East,17,China / Taiwan,24729,Taiwan
5,84,East,99,Far East,19,Korea / Japan,24715,Korea
6,84,East,99,Far East,19,Korea / Japan,24730,Japan Island
7,84,East,99,Far East,20,Pacific Islands,24782,Pacific Islands
8,84,East,99,Far East,22,Russian Pacific,24714,Russian Pacific
9,84,East,99,Far East,23,South East Asia,24655,Singapore / Malaysia
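
The same level-by-level merge can also be written as a loop, which may be easier to read and adapt. The sketch below is one possible formulation, not the notebook's own method: `flatten_area_tree` is a hypothetical helper, the sample rows are copied from the outputs in this notebook, and, unlike the chained merge, it filters each child level by `area_type_id` before merging.

```python
import pandas as pd

# Sample rows taken from the outputs shown in this notebook.
df_areas_sample = pd.DataFrame(
    [
        (25028, "West", 3, float("nan")),
        (93, "Black Sea / Sea Of Marmara", 2, 25028),
        (7, "Black Sea / Sea Of Marmara", 1, 93),
        (24583, "Black Sea", 0, 7),
        (24584, "Sea of Marmara", 0, 7),
    ],
    columns=["id", "name", "area_type_id", "parent_area_id"],
)


def flatten_area_tree(df, top_type=3, bottom_type=0):
    """Build one row per lowest-level area with its ancestors (hypothetical helper)."""
    top = df.loc[df["area_type_id"] == top_type, ["id", "name"]]
    result = top.rename(
        columns={"id": f"id_type{top_type}", "name": f"name_type{top_type}"}
    )
    # Walk down one area type at a time, joining children to their parents.
    for t in range(top_type - 1, bottom_type - 1, -1):
        children = df.loc[
            df["area_type_id"] == t, ["id", "name", "parent_area_id"]
        ].rename(columns={"id": f"id_type{t}", "name": f"name_type{t}"})
        result = result.merge(
            children,
            how="left",
            left_on=f"id_type{t + 1}",
            right_on="parent_area_id",
        ).drop(columns="parent_area_id")
    return result


df_tree = flatten_area_tree(df_areas_sample)
print(df_tree[["name_type3", "name_type2", "name_type1", "name_type0"]])
```

Because the join is a left join, a parent without children at some level is kept with empty values in the lower-level columns.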


From there we can recreate the area tree example from above by filtering on a specific type 3 area.

In [15]:
df_areas_all[df_areas_all['name_type3'] == 'West']

Unnamed: 0,id_type3,name_type3,id_type2,name_type2,id_type1,name_type1,id_type0,name_type0
32,25028,West,93,Black Sea / Sea Of Marmara,7,Black Sea / Sea Of Marmara,24583,Black Sea
33,25028,West,93,Black Sea / Sea Of Marmara,7,Black Sea / Sea Of Marmara,24584,Sea of Marmara
34,25028,West,25019,Atlantic America,9,Caribs,24746,Caribs
35,25028,West,25019,Atlantic America,10,East Coast Central America,24600,East Coast Central America
36,25028,West,25019,Atlantic America,12,East Coast Canada,24732,Great Lakes
37,25028,West,25019,Atlantic America,12,East Coast Canada,24740,Canada Atlantic Coast
38,25028,West,25019,Atlantic America,13,East Coast Mexico,24671,East Coast Mexico
39,25028,West,25019,Atlantic America,15,East Coast South America,24767,Brazil
40,25028,West,25019,Atlantic America,15,East Coast South America,24769,Argentina & Uruguay
41,25028,West,25019,Atlantic America,32,US Atlantic Coast,24747,US Atlantic Coast


## Countries

### Get available countries

In [16]:
all_countries = api.get_countries()

In [17]:
df_countries = pd.DataFrame([a.__dict__ for a in all_countries])

In [18]:
df_countries.head(10)

Unnamed: 0,id,name,country_code,country_code_numeric,country_code_iso3
0,3,Azores,1A,999,AZO
1,4,Czechoslovakia,1C,999,CSK
2,5,Madeira,1M,999,PMD
3,6,Neutral Zone (between Saudi Arabia & Iraq),1N,999,NTZ
4,7,Canary Islands,2C,999,CNI
5,8,Andorra,AD,20,AND
6,9,United Arab Emirates,AE,784,ARE
7,10,Afghanistan,AF,4,AFG
8,11,Antigua and Barbuda,AG,28,ATG
9,12,Anguilla,AI,660,AIA


## Ports

### Get available ports

In [19]:
all_ports = api.get_ports()

In [20]:
df_ports = pd.DataFrame([a.__dict__ for a in all_ports])

In [21]:
df_ports.head(10)

Unnamed: 0,id,country_id,area_id,name,latitude,longitude
0,3153,9,24777,Fujairah,25.1975,56.3839
1,3154,9,24777,Das Island,25.1334,52.92
2,3155,9,24777,Fateh Terminal,25.5771,54.4685
3,3156,9,24777,Mina Saqr,25.9747,55.9403
4,3157,9,24777,Jebel Ali,25.0071,55.0475
5,3158,9,24777,Jebel Dhanna,24.2157,52.6739
6,3159,9,24777,Mubarras Island,24.435,53.5271
7,3160,9,24777,Ruwais,24.1731,52.7152
8,3161,9,24777,Sharjah,25.364,55.3678
9,3162,9,24777,Zirku Island,25.0147,52.9926


## Using all of the data above

### Get all ports in a specific type 0 area

In [22]:
area_name = 'US Gulf'

In [23]:
# Select the type 0 area by name, then attach every port whose area_id matches it.
df_Ports_In_Area = (
    df_areas[(df_areas['name'] == area_name) & (df_areas['area_type_id'] == 0)]
    .merge(df_ports, how='left', left_on='id', right_on='area_id',
           suffixes=('_area', '_port'))
    [['id_port', 'name_port', 'id_area', 'name_area']]
)

In [24]:
df_Ports_In_Area.head(10)

Unnamed: 0,id_port,name_port,id_area,name_area
0,3838,Baton Rouge,24676,US Gulf
1,3841,Brownsville,24676,US Gulf
2,3845,Corpus Christi,24676,US Gulf
3,3849,Freeport (Texas),24676,US Gulf
4,3850,Galveston,24676,US Gulf
5,3851,Garyville,24676,US Gulf
6,3853,Houston,24676,US Gulf
7,3856,Lake Charles,24676,US Gulf
8,3858,Loop,24676,US Gulf
9,3861,Mobile,24676,US Gulf


### Get all ports in a specific country

In [25]:
country_name = 'United Arab Emirates'

In [26]:
# Select the country by name, then attach every port whose country_id matches it.
df_Ports_In_Country = (
    df_countries[df_countries['name'] == country_name]
    .merge(df_ports, how='left', left_on='id', right_on='country_id',
           suffixes=('_country', '_port'))
    [['id_port', 'name_port', 'id_country', 'name_country']]
)

In [27]:
df_Ports_In_Country.head(10)

Unnamed: 0,id_port,name_port,id_country,name_country
0,3153,Fujairah,9,United Arab Emirates
1,3154,Das Island,9,United Arab Emirates
2,3155,Fateh Terminal,9,United Arab Emirates
3,3156,Mina Saqr,9,United Arab Emirates
4,3157,Jebel Ali,9,United Arab Emirates
5,3158,Jebel Dhanna,9,United Arab Emirates
6,3159,Mubarras Island,9,United Arab Emirates
7,3160,Ruwais,9,United Arab Emirates
8,3161,Sharjah,9,United Arab Emirates
9,3162,Zirku Island,9,United Arab Emirates


### Get all areas in a country

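A country can span multiple type 0 areas. Countries and areas are not linked to each other directly, but every port carries both a **country_id** and an **area_id**, so we can make the connection through **ports**.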

In [28]:
country_name = 'United States'

In [29]:
(
    df_countries[df_countries['name'] == country_name]
    # Country -> ports: only the country's 'id' and 'name' get the '_country' suffix.
    .merge(df_ports, how='left', left_on='id', right_on='country_id',
           suffixes=('_country', None))
    [['id', 'name', 'id_country', 'name_country', 'area_id']]
    # Ports -> areas: only the area's 'id' and 'name' get the '_area' suffix.
    .merge(df_areas, how='left', left_on='area_id', right_on='id',
           suffixes=(None, '_area'))
    [['id_country', 'name_country', 'id_area', 'name_area']]
    .name_area.unique()
)

array(['US Atlantic Coast', 'US North Pacific', 'Alaska', 'US Gulf',
       'US West Coast', 'Pacific Islands', 'Great Lakes', 'US Mainland'],
      dtype=object)
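
The country-to-area link through ports can also be expressed with plain filters instead of merges. The sketch below is an alternative formulation on a small sample copied from the outputs in this notebook; the helper name `areas_for_country` is made up for the example.

```python
import pandas as pd

# Sample rows taken from the outputs shown in this notebook.
df_countries_sample = pd.DataFrame(
    [(9, "United Arab Emirates")], columns=["id", "name"]
)
df_ports_sample = pd.DataFrame(
    [(3153, 9, 24777, "Fujairah"), (3157, 9, 24777, "Jebel Ali")],
    columns=["id", "country_id", "area_id", "name"],
)
df_areas_sample = pd.DataFrame(
    [(24777, "Arabian Gulf", 0, 2)],
    columns=["id", "name", "area_type_id", "parent_area_id"],
)


def areas_for_country(country_name, countries, ports, areas):
    """Return the sorted names of areas that contain a port of the country (hypothetical helper)."""
    # Step 1: country name -> country id(s).
    country_ids = countries.loc[countries["name"] == country_name, "id"]
    # Step 2: country id(s) -> distinct area ids of that country's ports.
    area_ids = ports.loc[ports["country_id"].isin(country_ids), "area_id"].unique()
    # Step 3: area ids -> area names.
    return sorted(areas.loc[areas["id"].isin(area_ids), "name"])


uae_areas = areas_for_country(
    "United Arab Emirates", df_countries_sample, df_ports_sample, df_areas_sample
)
print(uae_areas)
```

Note that this only finds areas that contain at least one port of the country, which is the same limitation the merge-based version has.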

### Find the top 10 countries with the most ports

In [30]:
# Attach the country of each port, so that ports can be counted per country name.
df_Ports_country_count = (
    df_ports
    .merge(df_countries, how='left', left_on='country_id', right_on='id',
           suffixes=('_port', '_country'))
    [['id_port', 'name_port', 'id_country', 'name_country']]
)

In [31]:
df_Ports_country_count.name_country.value_counts().head(10)

United States     195
Japan             158
Indonesia         142
China             116
Canada             89
United Kingdom     85
Australia          75
Norway             72
Brazil             58
Philippines        57
Name: name_country, dtype: int64