# Working with geo data and locations

## Date: 2023-06-13
## Host: Rene Meissner
* **Group Data Platform Lead** in Group Data & Analytics
* UNIQA for 7 years (first EA Data, since 2022 at GDA) 
* Working with big data since the very beginning (1997)
* Interested in geo and location data (GPS, OSM, Ingress) and high-performance computing, both in projects and personally

## Sources: [GeoDataScience@github](https://github.com/artconscious/GeoDataScience)

## Agenda
* <a href="#/2"> Geo in Insurance </a>
* <a href="#/3"> Maps </a>
* <a href="#/4"> Geographic Information Systems </a>
* <a href="#/5"> Projections </a>
* <a href="#/6"> Satellites </a>
* <a href="#/7"> Coordinates </a>
* <a href="#/8"> Typical tasks</a>
* <a href="#/9"> Dive into location with Python </a>
* <a href="#/10"> Geohash </a>
* <a href="#/12"> Maps: folium </a>
* <a href="#/13"> Heatmaps </a>
* <a href="#/14"> GeoJSON </a>
* <a href="#/15"> GeoPandas </a>


Slide 2

## A Perfect Match:
# Geo Reference Data and Insurance Business

* Policing and underwriting
* Market research, penetration 
* Risk calculation
* Claims handling
* Market projections
* …

Slide 3

## First Things First: ALL THE MAPPPS!!!

In [5]:
import folium
folium.Map(location=[48.208174, 16.373819], zoom_start = 12)

Slide 4

### Basics:
## Geographical Information System(s): GIS
This is where you manage all your geo data, design your maps, and run your analyses (worthwhile if you work with geo data at least half of your day)

### Two main players:
#### ArcGIS: powerful and expensive
#### QGIS: also powerful, and open source

Slide 5

## Projections: How spherical locations are mapped onto a 2D coordinate system
### * The right projection depends on your country and its north/south and east/west extent
### * In the EU, Gauss-Krüger is often used, in different flavours
### * ALWAYS use WGS84 (EPSG:4326) as the common reference for storing and exchanging coordinates
### * Transformation libraries exist for all major programming languages
### * There is continental drift in Europe: about 0.025 m per year
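To make the idea of a projection concrete, here is a minimal sketch of spherical Web Mercator (EPSG:3857), the projection most browser tile maps use. It is a swapped-in illustration, not Gauss-Krüger, and the function names are hypothetical; for real transformations between national systems and WGS84, a library such as pyproj is the safer choice.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857), in metres
R = 6378137.0
# Mercator cannot show the poles; tile maps cut off at about +/-85.0511 degrees
MAX_LAT = 85.05112878


def wgs84_to_web_mercator(lat, lon):
    """Project WGS84 degrees to Web Mercator metres (x, y)."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse projection: Web Mercator metres back to WGS84 degrees (lat, lon)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lat, lon


x, y = wgs84_to_web_mercator(48.208174, 16.373819)  # Vienna
print(round(x), round(y))
print(web_mercator_to_wgs84(x, y))
```

The y value grows without bound towards the poles, which is why areas and distances read off such maps are increasingly distorted at high latitudes.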

Slide 6

## Satellites 
### GNSS: Global Positioning System (GPS), Galileo, BeiDou, GLONASS, …
### * From at least four satellites a receiver calculates its position on the ground (three for the position, one more to resolve the receiver's clock error)
### * A pure GNSS cold start (without assistance data) can take around 20 min
### * The more satellites in view, the higher the accuracy
![title](img/ConstellationGPS.gif)
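The principle behind position fixing can be illustrated in a simplified, flat 2D form: given known anchor positions and measured distances, subtracting the circle equations pairwise removes the quadratic terms and leaves a linear system. This is a hedged sketch only, with a hypothetical function name; a real GNSS receiver works in 3D with pseudoranges and must also solve for its clock error, which is why it needs the extra satellite.

```python
import math


def trilaterate_2d(anchors, distances):
    """Estimate a 2D position from three anchor points and exact distances.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system a*x + b*y = c, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a1, b1 = 2 * (x1 - x2), 2 * (y1 - y2)
    c1 = r2**2 - r1**2 + x1**2 - x2**2 + y1**2 - y2**2
    a2, b2 = 2 * (x1 - x3), 2 * (y1 - y3)
    c2 = r3**2 - r1**2 + x1**2 - x3**2 + y1**2 - y3**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("anchors must not lie on one line")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
distances = [math.dist(a, true_pos) for a in anchors]
print(trilaterate_2d(anchors, distances))
```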

Slide 7

## Coordinates: Latitude, Longitude, Altitude (WGS84)
### *  You can imagine the earth as a spherical X/Y coordinate system
### *  X is the Longitude (Greenwich = 0.0, East is positive, West is negative, max is 180.0/-180.0)
### *  Y is the Latitude (Equator = 0.0, North is positive, South is negative, max is 90.0/-90.0)
### *  Notations can be Degrees, Minutes, Seconds (DMS: 48° 12' 29.426" N 16° 22' 25.748" E) vs. Decimal Degrees (DD, e.g. 48.208174, 16.373819)
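Converting between the two notations is plain arithmetic: one degree has 60 minutes, one minute has 60 seconds, and the hemisphere letter sets the sign. A minimal sketch with hypothetical helper names, using the Vienna coordinates from the slide:

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    dd = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative
    return -dd if hemisphere.upper() in ("S", "W") else dd


def dd_to_dms(dd):
    """Split decimal degrees (sign ignored) into (degrees, minutes, seconds)."""
    dd = abs(dd)
    degrees = int(dd)
    minutes_float = (dd - degrees) * 60
    minutes = int(minutes_float)
    seconds = (minutes_float - minutes) * 60
    return degrees, minutes, seconds


lat = dms_to_dd(48, 12, 29.426, "N")
lon = dms_to_dd(16, 22, 25.748, "E")
print(round(lat, 6), round(lon, 6))  # → 48.208174 16.373819
```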

Slide 8

## Working with geo data (using Python)
### What are the typical tasks you have to deal with when it comes to geo data?
#### * Get an address from a location
#### * Get a location from an address
#### * Give me all clients/prospects in a specific area
#### * Based on this client/prospect list, give me geo reference data (e.g. flood risk, demographics, etc.)
#### * Find a route between these addresses (travelling salesman problem)
#### * What is the area / type of surface at these coordinates?
#### * …
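For "give me all clients/prospects in a specific area", the simplest version of "area" is a radius around a point. A minimal sketch using the haversine great-circle distance on a spherical Earth (an approximation, typically well under 1 % off); the helper names and the three clients are hypothetical, with two locations borrowed from the raster sample and one from the Berlin routing example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def clients_within(clients, center_lat, center_lon, radius_km):
    """Return the IDs of clients located within radius_km of the center."""
    return [
        client_id
        for client_id, (lat, lon) in clients.items()
        if haversine_km(center_lat, center_lon, lat, lon) <= radius_km
    ]


# Hypothetical clients: two from the raster sample (Vienna), one in Berlin
clients = {
    "A": (48.167946, 16.349755),
    "B": (48.172039, 16.357017),
    "C": (52.51375, 13.42462),
}
print(clients_within(clients, 48.208174, 16.373819, 10))  # → ['A', 'B']
```

For large client lists, this brute-force loop is exactly the kind of expensive comparison that geohash prefixes or a spatial index help to avoid.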


Slide 9

## Working with geo data (using Python)
### * In many cases, locations can simply be represented as a list of two (or three) floats

In [6]:
import pandas as pd
df = pd.read_csv('../data/raster250-sample.csv')
df[['ID','LAT','LON','EURO_PK','IDX_AT_PK']][0:10]

|   | ID | LAT | LON | EURO_PK | IDX_AT_PK |
|---|----|-----|-----|---------|-----------|
| 0 | 250mN280400E479275 | 48.167946 | 16.349755 | 1660.190476 | 100.69629 |
| 1 | 250mN280425E479275 | 48.170186 | 16.350036 | 1574.405494 | 95.493134 |
| 2 | 250mN280375E479300 | 48.165513 | 16.352823 | 1704.590909 | 103.389329 |
| 3 | 250mN280400E479300 | 48.167753 | 16.353104 | 1697.57826 | 102.963988 |
| 4 | 250mN280375E479325 | 48.165319 | 16.356172 | 1544.832599 | 93.699436 |
| 5 | 250mN280400E479325 | 48.167559 | 16.356453 | 1463.046403 | 88.73882 |
| 6 | 250mN280425E479300 | 48.169992 | 16.353386 | 1510.522561 | 91.618413 |
| 7 | 250mN280450E479300 | 48.172232 | 16.353667 | 1553.287899 | 94.212279 |
| 8 | 250mN280425E479325 | 48.169799 | 16.356735 | 1543.557863 | 93.622119 |
| 9 | 250mN280450E479325 | 48.172039 | 16.357017 | 1562.110684 | 94.747411 |


Slide 10

## But!
### * In a tabular or relational representation, two columns of floats are hard to handle
### * "Find neighbors", "compare locations", "do these two areas overlap?" are expensive both in development and in computation
## Introducing Geohash

In [7]:
import geohash2
print ('Geohash for 48.078218,16.290442 ',
       '\n 4 chars:', geohash2.encode(48.078218,16.290442,4),
       '\n 6 chars:', geohash2.encode(48.078218,16.290442,6))

Geohash for 48.078218,16.290442  
 4 chars: u2e9 
 6 chars: u2e9d8
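How does a geohash come about? A minimal pure-Python sketch of the standard encoding, with a hypothetical function name: the longitude and latitude ranges are halved repeatedly, each halving contributes one bit (alternating longitude and latitude), and every 5 bits become one base32 character. It is meant to reproduce the `geohash2.encode` results above, not to replace the library.

```python
# Base32 alphabet used by geohash (it omits a, i, l and o)
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a WGS84 point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, value, n_bits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # point is in the upper half -> bit 1
            rng[0] = mid
        else:
            value <<= 1  # point is in the lower half -> bit 0
            rng[1] = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[value])
            value, n_bits = 0, 0
    return "".join(chars)


print(geohash_encode(48.078218, 16.290442, 4))  # → u2e9
print(geohash_encode(48.078218, 16.290442, 6))  # → u2e9d8
```

Because each extra character only refines the cell, a longer hash always starts with the shorter one, which is what makes string prefixes usable as spatial groups.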


Slide 11

## Think of it like grouping
### * Sorting, matching, and grouping based on strings

In [3]:
import pandas as pd
df = pd.read_csv('../data/raster250-sample.csv')
df[['ID','LAT','LON','GEOH6','IDX_AT_PK']].sort_values(by=['GEOH6'])[0:10]

|    | ID | LAT | LON | GEOH6 | IDX_AT_PK |
|----|----|-----|-----|-------|-----------|
| 0  | 250mN280400E479275 | 48.167946 | 16.349755 | u2edh0 | 100.69629 |
| 2  | 250mN280375E479300 | 48.165513 | 16.352823 | u2edh0 | 103.389329 |
| 3  | 250mN280400E479300 | 48.167753 | 16.353104 | u2edh0 | 102.963988 |
| 4  | 250mN280375E479325 | 48.165319 | 16.356172 | u2edh0 | 93.699436 |
| 5  | 250mN280400E479325 | 48.167559 | 16.356453 | u2edh0 | 88.73882 |
| 13 | 250mN280475E479325 | 48.174279 | 16.357298 | u2edh1 | 97.696048 |
| 12 | 250mN280475E479300 | 48.174472 | 16.353949 | u2edh1 | 94.824149 |
| 10 | 250mN280475E479275 | 48.174666 | 16.350599 | u2edh1 | 95.221514 |
| 9  | 250mN280450E479325 | 48.172039 | 16.357017 | u2edh1 | 94.747411 |
| 7  | 250mN280450E479300 | 48.172232 | 16.353667 | u2edh1 | 94.212279 |
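Since a geohash cell is just a string prefix, grouping needs nothing more than a dictionary keyed on the first n characters. A minimal sketch using the ten sample rows above (ID, GEOH6 and IDX_AT_PK copied from the table); the helper name is hypothetical. One caveat: two points that are close but lie on opposite sides of a cell border can have different prefixes, so neighbour searches usually also look at the adjacent cells.

```python
from collections import defaultdict

# (ID, GEOH6, IDX_AT_PK) rows from the sorted sample above
rows = [
    ("250mN280400E479275", "u2edh0", 100.69629),
    ("250mN280375E479300", "u2edh0", 103.389329),
    ("250mN280400E479300", "u2edh0", 102.963988),
    ("250mN280375E479325", "u2edh0", 93.699436),
    ("250mN280400E479325", "u2edh0", 88.73882),
    ("250mN280475E479325", "u2edh1", 97.696048),
    ("250mN280475E479300", "u2edh1", 94.824149),
    ("250mN280475E479275", "u2edh1", 95.221514),
    ("250mN280450E479325", "u2edh1", 94.747411),
    ("250mN280450E479300", "u2edh1", 94.212279),
]


def group_by_geohash_prefix(rows, length):
    """Group IDX_AT_PK values by the first `length` characters of the geohash."""
    groups = defaultdict(list)
    for _row_id, geohash, value in rows:
        groups[geohash[:length]].append(value)
    return dict(groups)


# Coarser prefixes merge neighbouring cells into bigger groups
for length in (6, 5):
    for cell, values in sorted(group_by_geohash_prefix(rows, length).items()):
        print(length, cell, len(values), round(sum(values) / len(values), 2))
```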


Slide 12

## Back to the maps
* Maps are great, especially if they are interactive, but hard to manage if you need printable versions ;)
* Browser-based maps embed tiles served by an online service called a "tile map server"
* A framework will do the work for you

In [9]:
import folium
# Create the map, centered on the sample area
map_osm = folium.Map(location=[48.167559, 16.356453], zoom_start=13, height=300)
# Add a marker with a population popup for each of the first 10 grid cells
for _, row in df[0:10].iterrows():
    folium.Marker(
        location=[row["LAT"], row["LON"]],
        popup="Population: " + str(row["POP"]),
        icon=folium.Icon(color="blue"),
    ).add_to(map_osm)
map_osm
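Under the hood, an OSM-style tile map server splits the Web Mercator world into 2^zoom × 2^zoom square tiles addressed as zoom/x/y, and the map framework requests whichever tiles cover the current view. A minimal sketch of the common "slippy map" tile-number formula; the function names are hypothetical, and the OSM URL pattern is shown only as an example (OSM's tile usage policy does not allow bulk downloading).

```python
import math

MAX_LAT = 85.05112878  # Web Mercator latitude limit


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) index of the tile containing a WGS84 point at a zoom level."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom level
    lat_rad = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon = 180, latitude at the limit) to valid indices
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def osm_tile_url(lat, lon, zoom):
    """Build an OSM-style tile URL (zoom/x/y) for the tile containing the point."""
    x, y = latlon_to_tile(lat, lon, zoom)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"


print(osm_tile_url(48.208174, 16.373819, 12))  # tile covering central Vienna
```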

Slide 13

## Heatmaps

In [4]:
import folium
from folium.plugins import HeatMap
map_obj = folium.Map(location = [48.167559,16.356453], zoom_start = 14, height = 300)
# HeatMap takes rows of [lat, lon, weight]; here the population (POP) is the weight
lats_longs = df[0:10][['LAT','LON','POP']].to_numpy()
HeatMap(lats_longs).add_to(map_obj)
map_obj

Slide 14

## Map services
### Different styles, different availability, different costs
### Vector & image maps (like satellite images)

In [11]:
import folium
import pandas as pd
from folium.plugins import HeatMap
map_obj = folium.Map(
    tiles='Stamen Toner', 
    location = [48.167559,16.356453], zoom_start = 14, height = 300)
lats_longs = df[0:10][['LAT','LON','IDX_ZS_SUM']].to_numpy()
HeatMap(lats_longs).add_to(map_obj)
map_obj

In [12]:
import os

from here_location_services import LS
from here_location_services.config.routing_config import ROUTING_RETURN
from here_map_widget import Map, Marker, GeoJSON

LS_API_KEY = os.environ.get("LS_API_KEY")  # Read the API key from the environment; never hard-code it
ls = LS(api_key=LS_API_KEY)

result = ls.car_route(
    origin=[52.51375, 13.42462],
    destination=[52.52332, 13.42800],
    return_results=[
        ROUTING_RETURN.polyline,
        ROUTING_RETURN.elevation,
        ROUTING_RETURN.instructions,
        ROUTING_RETURN.actions,
    ],
)
geo_json = result.to_geojson()
data = geo_json
geo_layer = GeoJSON(data=data, style={"lineWidth": 5})

m = Map(api_key=LS_API_KEY, center=[52.5207, 13.4283], zoom=14)
origin_marker = Marker(lat=52.51375, lng=13.42462)
dest_marker = Marker(lat=52.52332, lng=13.42800)
m.add_layer(geo_layer)
m.add_object(origin_marker)
m.add_object(dest_marker)
m


Slide 15

## GeoJSON

In [13]:
import json
# Pretty-print the route GeoJSON returned by the HERE routing call above
print(json.dumps(data, indent=2))

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [
            13.424624,
            52.513748,
            76.0
          ],
          [
            13.42473,
            52.51383,
            76.0
          ],
          [
            13.42499,
            52.51402,
            76.0
          ],
          [
            13.42522,
            52.51422,
            76.0
          ],
          [
            13.42552,
            52.51448,
            76.0
          ],
          [
            13.42568,
            52.51466,
            76.0
          ],
          [
            13.42641,
            52.51546,
            76.0
          ],
          [
            13.42675,
            52.51584,
            76.0
          ],
          [
            13.4269,
            52.51602,
            77.0
          ],
          [
            13.42704,
            52.51619,
            77
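The output shows an easy-to-miss detail: GeoJSON (RFC 7946) orders each position as [longitude, latitude, optional altitude] (13.42… before 52.51…), the opposite of the [lat, lon] order folium takes. A minimal sketch that builds a GeoJSON LineString Feature from (lat, lon) pairs and swaps the order; the helper name is hypothetical, and the two points are the origin and destination from the HERE example.

```python
import json


def to_geojson_linestring(points_lat_lon, properties=None):
    """Build a GeoJSON Feature from (lat, lon) pairs, as used by folium."""
    # GeoJSON expects [longitude, latitude], so swap each pair
    coordinates = [[lon, lat] for lat, lon in points_lat_lon]
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": coordinates},
        "properties": properties or {},
    }


route = [(52.51375, 13.42462), (52.52332, 13.42800)]
feature = to_geojson_linestring(route, {"name": "example route"})
print(json.dumps({"type": "FeatureCollection", "features": [feature]}, indent=2))
```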

Slide 16

## GeoPandas
### pandas and NumPy backend -> fast
### Polygons to represent shaped areas
### Built-in functions for distances, overlaps, and similar spatial operations
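In GeoPandas, predicates such as `GeoSeries.contains` or `GeoSeries.within` (backed by Shapely) do this work on whole columns at once. To show what such a predicate computes, here is a from-scratch ray-casting point-in-polygon check. It is an illustration only, with a hypothetical function name: it treats lon/lat as planar coordinates (acceptable for small areas, not across the antimeridian) and ignores holes.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a ray from (x, y) to the right crosses the boundary.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    An odd number of crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A concave L-shaped area in planar coordinates
l_shape = [(0, 0), (10, 0), (10, 5), (5, 5), (5, 10), (0, 10)]
print(point_in_polygon(2, 7, l_shape))  # → True
print(point_in_polygon(7, 7, l_shape))  # → False
```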