## Import data from the source files

Data from https://github.com/krlawrence/graph/raw/master/sample-data/ is available under the Apache 2.0 license, courtesy of Kelvin Lawrence.

In [None]:
import pandas as pd

df_nodes=pd.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-nodes.csv')
df_edges=pd.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-edges.csv')

Preview the first rows of the imported data:

In [None]:
df_nodes.head(3)

### Create a Pandas DataFrame (`df_ports`) with airports only

In [None]:
df_nodes.dtypes

The DataFrame contains different types of records, distinguished by their `~label` values:

In [None]:
df_nodes.groupby('~label').size()

1. Keep only records with `airport` labels.
2. Remove unnecessary columns.
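The two steps above can also be sketched without pandas. The snippet below is a minimal illustration that uses only the standard library's `csv` module on a tiny, hypothetical in-memory sample. The rows and the reduced column set are made up for illustration and are not taken from the real file.

```python
import csv
import io

# Hypothetical, heavily reduced sample in the same shape as the nodes file:
# the "~label" column distinguishes airports from other record types.
SAMPLE = """~id,~label,type:string,code:string,runways:int
1,airport,airport,AAA,2
2,airport,airport,BBB,1
3,other,other,XX,
"""

def keep_airports(text, drop_columns):
    """Keep only rows labelled 'airport' and drop the listed columns."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {key: value for key, value in row.items() if key not in drop_columns}
        for row in rows
        if row["~label"] == "airport"
    ]

ports = keep_airports(SAMPLE, drop_columns={"~label", "type:string"})
print(ports)
```

Unlike the pandas version, every value stays a string here; `convert_dtypes()` is what gives the pandas columns proper types.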

In [None]:
df_ports=(
           df_nodes[df_nodes['~label'].isin(['airport'])]
           .drop(['~label','type:string','author:string','date:string'], axis=1)
           .convert_dtypes()
          )

In [None]:
print(df_ports.dtypes)

Clean up the column names: remove the `~` prefix, drop the `:type` suffix, and convert them to uppercase.

In [None]:
df_ports.columns=(df_ports.columns
                   .str.replace('~','')
                   .str.split(':').str[0]
                   .str.upper()
                  )
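The pandas `.str` chain above applies the same string transformation to every column name. A plain-Python equivalent for a single name looks like this (the function name `clean_column_name` is hypothetical, used only for this illustration):

```python
def clean_column_name(name: str) -> str:
    """Drop '~', keep the part before ':', and uppercase the result."""
    return name.replace("~", "").split(":")[0].upper()

columns = ["~id", "code:string", "runways:int", "lat:double"]
print([clean_column_name(c) for c in columns])
```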

In [None]:
df_ports.dtypes

### Create a Pandas DataFrame (`df_routes`) with connections between the airports only

In [None]:
df_edges.dtypes

In [None]:
df_edges.groupby('~label').size()

1. Keep only records with `route` labels.
2. Remove unnecessary column `~label`.
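For intuition, `groupby('~label').size()` is a count of records per label, and the filter keeps only one label. A minimal stdlib sketch with `collections.Counter` on hypothetical edge tuples (the second label `other` is made up) could look like this:

```python
from collections import Counter

# Hypothetical edge records: (id, from, to, label, dist)
edges = [
    (1, 1, 2, "route", 100),
    (2, 2, 1, "route", 100),
    (3, 10, 1, "other", None),
]

# Equivalent of groupby('~label').size(): number of records per label
label_counts = Counter(label for _, _, _, label, _ in edges)
print(label_counts)

# Keep only 'route' records and drop the label field
routes = [(eid, src, dst, dist) for eid, src, dst, label, dist in edges if label == "route"]
print(routes)
```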

In [None]:
df_routes=df_edges[df_edges['~label'].isin(['route'])].drop(['~label'], axis=1).convert_dtypes()

Clean up the column names the same way as for `df_ports`:

In [None]:
df_routes.columns=df_routes.columns.str.replace('~','').str.split(':').str[0].str.upper()

In [None]:
df_routes.dtypes

## Upload the data into your SAP HANA database

In [None]:
import os, hana_ml
print(hana_ml.__version__)

In [None]:
os.environ["HDB_USE_IDENT"]=os.getenv("WORKSPACE_ID")
print(os.getenv("HDB_USE_IDENT"))

In [None]:
from hana_ml import dataframe as hdf

In [None]:
myconn=hdf.ConnectionContext(userkey='myDevChallenger')
print("SAP HANA DB version: ", myconn.hana_version())

Upload the data from each Pandas DataFrame into an SAP HANA database table and get back an SAP HANA DataFrame `hdf_*` that references it; `force=True` drops the table first if it already exists: https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.create_dataframe_from_pandas

In [None]:
hdf_ports=hdf.create_dataframe_from_pandas(
    connection_context=myconn,
    pandas_df=df_ports,
    table_name="PORTS",
    force=True
)

In [None]:
hdf_routes=hdf.create_dataframe_from_pandas(
    connection_context=myconn,
    pandas_df=df_routes,
    table_name='ROUTES',
    force=True
)

### Data exploration using HANA DataFrames

Return the structure of the table behind the HANA DataFrame as a dictionary: https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.get_table_structure

In [None]:
hdf_ports.get_table_structure()

**What is the airport with the longest runway?**

Note the use of:
- [select()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.select)
- [sort()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.sort)
- [head()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.head)
- [collect()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.collect)

In [None]:
(
    hdf_ports
    .select("CODE", "DESC", "LONGEST", "COUNTRY", "CITY")
    .sort("LONGEST", desc=True)
    .head().collect()
)
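HANA DataFrames are evaluated lazily: each method builds up a SQL statement, and only `collect()` runs it in the database and returns a Pandas DataFrame (the generated SQL can be inspected through the DataFrame's `select_statement` property). As a rough analogue, the query above has the shape `SELECT ... ORDER BY "LONGEST" DESC` limited to one row. The sketch below runs that SQL shape against an in-memory SQLite table with made-up rows; SQLite only stands in for SAP HANA here, and the SQL that hana_ml actually generates may look different.

```python
import sqlite3

# In-memory SQLite stands in for SAP HANA; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE PORTS ("CODE" TEXT, "DESC" TEXT, "LONGEST" INTEGER, "COUNTRY" TEXT, "CITY" TEXT)')
conn.executemany(
    "INSERT INTO PORTS VALUES (?, ?, ?, ?, ?)",
    [
        ("AAA", "Airport A", 12000, "XA", "City A"),
        ("BBB", "Airport B", 15000, "XB", "City B"),
        ("CCC", "Airport C", 9000, "XA", "City C"),
    ],
)

# select(...) -> column list, sort(..., desc=True) -> ORDER BY ... DESC,
# head() -> first row only
longest = conn.execute(
    'SELECT "CODE", "DESC", "LONGEST", "COUNTRY", "CITY" '
    'FROM PORTS ORDER BY "LONGEST" DESC LIMIT 1'
).fetchall()
print(longest)
```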

**What country has an airport with the highest number of runways?**

Note the use of:
- [agg()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.agg)

In [None]:
(
    hdf_ports
    .agg(
        agg_list=[("max", "RUNWAYS", "MAXRUNWAYS")], 
        group_by="COUNTRY"
    )
    .sort("MAXRUNWAYS", desc=True)
    .head().collect()
)
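The `agg_list` tuple `("max", "RUNWAYS", "MAXRUNWAYS")` reads as aggregate function, source column, and output alias, grouped by `COUNTRY`. In SQL terms that is a `GROUP BY` with `MAX`. The sketch below shows that shape on an in-memory SQLite table with hypothetical rows; it illustrates the idea and is not the SQL hana_ml emits.

```python
import sqlite3

# In-memory SQLite stands in for SAP HANA; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE PORTS ("CODE" TEXT, "COUNTRY" TEXT, "RUNWAYS" INTEGER)')
conn.executemany(
    "INSERT INTO PORTS VALUES (?, ?, ?)",
    [("AAA", "XA", 3), ("BBB", "XA", 6), ("CCC", "XB", 4)],
)

# agg([("max", "RUNWAYS", "MAXRUNWAYS")], group_by="COUNTRY"), then sort and take the top row
rows = conn.execute(
    'SELECT "COUNTRY", MAX("RUNWAYS") AS "MAXRUNWAYS" FROM PORTS '
    'GROUP BY "COUNTRY" ORDER BY "MAXRUNWAYS" DESC LIMIT 1'
).fetchall()
print(rows)
```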

**What is the airport closest to either the North or South Pole?**

Note the use of the calculated column `ABSOLUTE_LATITUDE` in `select()`: it is passed as an `(expression, alias)` tuple.

In [None]:
(
    hdf_ports
    .select(
        "CODE", "DESC", "COUNTRY", "CITY", "LAT", "LON",
        ('ABS("LAT")', "ABSOLUTE_LATITUDE")
    )
    .sort("ABSOLUTE_LATITUDE", desc=True).head()
    .collect()
)
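Sorting by the absolute latitude works because, on a sphere, a point's angular distance to the nearer pole is `90 - |LAT|` degrees, so the largest `|LAT|` is the closest to a pole. A plain-Python sketch of the same ranking on hypothetical `(code, lat, lon)` tuples:

```python
# Hypothetical (code, lat, lon) tuples
ports = [("AAA", 80.0, 10.0), ("BBB", -75.0, 160.0), ("CCC", 10.0, 20.0)]

# Sort by |LAT| descending: the first element is the airport nearest to either pole
nearest_pole = sorted(ports, key=lambda p: abs(p[1]), reverse=True)[0]
print(nearest_pole)
```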

**How far are the 3 southernmost airports from the South Pole?**

Note the use of:
- constructor [`ST_Point()`](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-spatial-reference/st-point-double-double-integer-constructor?version=2024_2_QRC&locale=en-US) (note that it requires the keyword `NEW` in front of it, as when instantiating an object in object-oriented languages)
- method [`ST_Distance()`](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-spatial-reference/st-distance-method)
- [Spatial Reference Identifier (SRID)](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-spatial-reference/spatial-reference-systems-srs-and-spatial-reference-identifiers-srid) `4326`, so that points are created and distances are calculated on a round Earth rather than on a flat 2D projection
- [unit of measure](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-spatial-reference/units-of-measure) for the distance result

In [None]:
SRID=4326
UNIT_OF_MEASURE='kilometer'

(
    hdf_ports
    .select(
        "CODE", "DESC", "COUNTRY", "CITY", "LAT", "LON", 
        (f'''NEW ST_Point("LON", -90, {SRID}).ST_Distance(NEW ST_Point("LON", "LAT", {SRID}), '{UNIT_OF_MEASURE}')''', f"DISTANCE_FROM_SOUTHPOLE_IN_{UNIT_OF_MEASURE}")
    )
    .sort(f"DISTANCE_FROM_SOUTHPOLE_IN_{UNIT_OF_MEASURE}", desc=False).head(3)
    .collect()
)
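To sanity-check these numbers outside the database, the distance to the South Pole can be approximated with the haversine formula. This sketch assumes a perfectly spherical Earth with a mean radius of 6371 km; SAP HANA's round-Earth calculation for SRID 4326 may use a more precise Earth model, so expect small differences rather than identical values. It reuses the same trick as the SQL above: the pole is placed at `(LON, -90)`, so the longitude does not matter.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoots above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def distance_from_south_pole_km(lon, lat):
    """Distance from (lon, lat) to the South Pole, placed at (lon, -90)."""
    return haversine_km(lon, -90.0, lon, lat)

# A point on the equator is a quarter of a great circle away from the pole
print(round(distance_from_south_pole_km(0.0, 0.0), 1))
```

On a sphere this reduces to `R * radians(LAT + 90)`, which is why the southernmost airports also have the smallest distances.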

👉 **Spatial units of measure** that can be used in queries are listed in the [system view `ST_UNITS_OF_MEASURE`](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-sql-reference-guide/st-units-of-measure-system-view?version=2024_2_QRC&locale=en-US).

In the query above you used `kilometer`, but you can try other units as well.

In [None]:
myconn.table("ST_UNITS_OF_MEASURE", schema="PUBLIC").collect()
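When comparing results computed in different units, it helps to remember the exact international definitions of the common length units. The dictionary keys below are labels chosen for this sketch only; the unit names SAP HANA actually accepts should be taken from the `ST_UNITS_OF_MEASURE` view.

```python
# Exact international definitions (not read from ST_UNITS_OF_MEASURE)
METERS_PER_UNIT = {
    "meter": 1.0,
    "kilometer": 1000.0,
    "mile": 1609.344,         # international statute mile
    "nautical mile": 1852.0,  # international nautical mile
}

def convert(value, from_unit, to_unit):
    """Convert a length between the units listed above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]

print(convert(1852.0, "kilometer", "nautical mile"))
```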

👉 **Spatial reference systems** that can be used in queries are listed in the [system view `ST_SPATIAL_REFERENCE_SYSTEMS`](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-sql-reference-guide/st-spatial-reference-systems-system-view?version=2024_2_QRC&locale=en-US).

In the query above you used `4326`, the spatial reference system identifier (**SRID**) of WGS 84, the standard used in cartography, geodesy, and satellite navigation, including GPS: https://epsg.io/4326.

In [None]:
myconn.table("ST_SPATIAL_REFERENCE_SYSTEMS", schema="PUBLIC").collect()

**Which two airports connected by a route are closest to each other?**

Note the use of:
- [join()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.dataframe.html#hana_ml.dataframe.DataFrame.join)

In [None]:
(
    hdf_routes
    .sort("DIST", desc=False)
    .head()
    .collect()
)

In [None]:
(
    hdf_routes.sort("DIST", desc=False).head()
    .alias('L1').join(hdf_ports.select(("ID", "FROM_ID"), "ICAO", "DESC").alias('R1'), 'L1."FROM" = R1."FROM_ID"')
    .alias('L2').join(hdf_ports.select(("ID", "TO_ID"), "ICAO", "DESC").alias('R2'), 'L2."TO" = R2."TO_ID"')
    .collect().iloc[:, 3:]
)
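The two joins resolve each route's `FROM` and `TO` ids against the same ports table, once per end of the route. The sketch below expresses that idea with plain dictionary lookups on hypothetical data; skipping unknown ids mimics the inner join, and the output columns mirror `DIST, FROM_ID, ICAO, DESC, TO_ID, ICAO, DESC`. The helper `shortest_routes` is hypothetical and only illustrates the logic.

```python
# Hypothetical PORTS rows: ID -> (ICAO, DESC)
ports = {1: ("AAAA", "Airport A"), 2: ("BBBB", "Airport B"), 3: ("CCCC", "Airport C")}
# Hypothetical ROUTES rows: (ID, FROM, TO, DIST)
routes = [(10, 1, 2, 2), (11, 2, 1, 2), (12, 1, 3, 500)]

def shortest_routes(routes, ports, n):
    """Take the n shortest routes, then resolve FROM and TO ids to airport details."""
    result = []
    for _id, src, dst, dist in sorted(routes, key=lambda r: r[3])[:n]:
        # Inner-join semantics: drop routes whose endpoints are not in ports
        if src in ports and dst in ports:
            # The two lookups play the role of the joins on FROM (R1) and TO (R2)
            result.append((dist, src, *ports[src], dst, *ports[dst]))
    return result

for row in shortest_routes(routes, ports, 2):
    print(row)
```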

A closer look at the map explains the very short distance between these two airports: https://en.mapy.cz/zakladni?l=0&x=-2.9290799&y=59.3518237&z=14

## Create an SAP HANA graph workspace

Use [`hana_ml.graph`](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.graph.html#module-hana_ml.graph) from the Python Machine Learning Client for SAP HANA:
* [create_graph_from_dataframes()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.graph.html#hana_ml.graph.create_graph_from_dataframes) to model a [graph workspace](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-graph-reference/sap-hana-graph-data-model)
* [discover_graph_workspaces()](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.graph.html#hana_ml.graph.discover_graph_workspaces) to check the existing [graph workspace artifacts](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-graph-reference/graph-metadata-views) in the SAP HANA database

In [None]:
import hana_ml.graph

In [None]:
hgws_airroutes = (
    hana_ml.graph.create_graph_from_dataframes(
        connection_context=myconn, 
        workspace_name='AIRROUTES_DFH',
        
        vertices_df=hdf_ports,
        vertex_key_column="ID", 
        
        edges_df=hdf_routes, 
        edge_key_column="ID",
        edge_source_column="FROM", edge_target_column="TO"
    )
)
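Conceptually, the graph workspace points at the two tables: `ID` in `PORTS` identifies the vertices, while `FROM` and `TO` in `ROUTES` name each directed edge's source and target vertex. The sketch below builds outgoing and incoming adjacency maps from hypothetical edge rows in plain Python, only to illustrate that data model; it is not how SAP HANA stores or exposes the workspace.

```python
from collections import defaultdict

# Hypothetical ROUTES rows: (ID, FROM, TO)
edges = [(10, 1, 2), (11, 2, 1), (12, 1, 3), (13, 3, 2)]

outgoing = defaultdict(list)  # vertex key -> ids of edges where it is the source
incoming = defaultdict(list)  # vertex key -> ids of edges where it is the target
for edge_id, src, dst in edges:
    outgoing[src].append(edge_id)
    incoming[dst].append(edge_id)

print(dict(outgoing))
print(dict(incoming))
```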

In [None]:
hana_ml.graph.discover_graph_workspaces(myconn)

### Exploring the graph's...

...[vertices](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.graph.html#hana_ml.graph.Graph.vertices) (nodes):

In [None]:
hgws_airroutes.vertices(vertex_key=313)

...[edges](https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/2024_2_QRC/en-US/hana_ml.graph.html#hana_ml.graph.Graph.edges) (connections):

In [None]:
hgws_airroutes.edges(vertex_key=313, direction='incoming').head(5)
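With `direction='incoming'`, the edges returned are those whose target is the given vertex; outgoing edges are those whose source is that vertex. A plain-Python filter over hypothetical edge rows shows the distinction. The helper `edges_of` and its data are hypothetical and do not reproduce the hana_ml implementation.

```python
# Hypothetical ROUTES rows: (ID, FROM, TO, DIST)
edges = [(10, 313, 1, 500), (11, 1, 313, 500), (12, 2, 313, 700), (13, 313, 2, 700)]

def edges_of(vertex_key, direction="outgoing"):
    """Return edges leaving ('outgoing') or entering ('incoming') vertex_key."""
    if direction == "outgoing":
        return [e for e in edges if e[1] == vertex_key]
    if direction == "incoming":
        return [e for e in edges if e[2] == vertex_key]
    raise ValueError(f"unknown direction: {direction!r}")

# Same idea as .head(5): keep at most the first five matches
print(edges_of(313, direction="incoming")[:5])
```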