# Query SAP table and join with local data

A first example demonstrates how to join two SAP tables with an external table. We’ll be using the [ABAP Flight Reference Scenario](https://help.sap.com/docs/ABAP_PLATFORM_NEW/fc4c71aa50014fd1b43721701471913d/a9d7c7c140a0408dbc5966c52d156b49.html), specifically joining the `SFLIGHT` and `SPFLI` tables, which contain flight and flight schedule details respectively, with an external table `WEATHER` that holds weather information. The goal is to extract flight information together with the temperatures at the departure and arrival cities.

## Import DuckDB & load **ERPL** extension
In the next cells we import `duckdb`, then install the ERPL extension and load it into the current database session. A series of `SET` commands configures the connection to our SAP development system. In our case this is the Docker-based [ABAP Platform Trial](https://hub.docker.com/r/sapse/abap-platform-trial). The credentials are the image defaults; details can be found in the documentation of the Docker image.

In [1]:
import duckdb

In [2]:
con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con.install_extension("./erpl.duckdb_extension");
con.load_extension("erpl");
con.sql("""
SET sap_ashost = 'localhost';
SET sap_sysnr = '00';
SET sap_user = 'DEVELOPER';
SET sap_password = 'Htods70334';
SET sap_client = '001';
SET sap_lang = 'EN';
""");
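
For anything beyond the trial image you may not want the credentials in the notebook itself. The following is a minimal sketch that builds the same `SET` statements from environment variables. The setting names are the ones used above; the `SAP_*` environment variable names and the helper `build_sap_settings_sql` are assumptions made for this example, not part of ERPL.

```python
import os

# Setting names are the ones used in the SET commands above. The SAP_* environment
# variable names are hypothetical and chosen only for this sketch.
SETTING_TO_ENV = {
    "sap_ashost": "SAP_ASHOST",
    "sap_sysnr": "SAP_SYSNR",
    "sap_user": "SAP_USER",
    "sap_password": "SAP_PASSWORD",
    "sap_client": "SAP_CLIENT",
    "sap_lang": "SAP_LANG",
}


def build_sap_settings_sql(env=None):
    """Return one SET statement per connection setting, with values read from env."""
    env = os.environ if env is None else env
    missing = [name for name in SETTING_TO_ENV.values() if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    statements = []
    for setting, env_name in SETTING_TO_ENV.items():
        value = env[env_name].replace("'", "''")  # escape single quotes for SQL
        statements.append(f"SET {setting} = '{value}';")
    return "\n".join(statements)


# Usage with the connection from the cell above (not executed here):
#   con.sql(build_sap_settings_sql())
```

The helper fails early with a `KeyError` if a variable is missing, which is usually easier to diagnose than a failed SAP logon later on.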

If the extension loaded successfully, the exported functions appear in the output of `duckdb_functions()`:

In [3]:
con.sql("SELECT * FROM duckdb_functions() WHERE function_name LIKE '%sap%';")

┌───────────────┬─────────────┬──────────────────────┬───────────────┬───┬──────────┬──────────────┬─────────┐
│ database_name │ schema_name │    function_name     │ function_type │ … │ internal │ function_oid │ example │
│    varchar    │   varchar   │       varchar        │    varchar    │   │ boolean  │    int64     │ varchar │
├───────────────┼─────────────┼──────────────────────┼───────────────┼───┼──────────┼──────────────┼─────────┤
│ system        │ main        │ sap_rfc_invoke       │ table         │ … │ true     │         1394 │ NULL    │
│ system        │ main        │ sap_rfc_search_group │ table         │ … │ true     │         1396 │ NULL    │
│ system        │ main        │ sap_rfc_search_fun…  │ table         │ … │ true     │         1398 │ NULL    │
│ system        │ main        │ sap_show_tables      │ table         │ … │ true     │         1400 │ NULL    │
│ system        │ main        │ sap_describe_fields  │ table         │ … │ true     │         1402 │ NULL    │
│
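
Instead of checking the listing by eye, the same check can be done in code. This is a small sketch: the helper `missing_functions` and the constant `EXPECTED_FUNCTIONS` are names made up for this example, and the expected set contains only function names visible in the listing above.

```python
# Function names taken from the listing above; extend the set as needed.
EXPECTED_FUNCTIONS = {"sap_rfc_invoke", "sap_show_tables", "sap_describe_fields"}


def missing_functions(found_names, expected=EXPECTED_FUNCTIONS):
    """Return the expected function names that are absent from found_names, sorted."""
    return sorted(set(expected) - set(found_names))


# With a live connection the names could be collected like this (not executed here):
#   rows = con.sql("SELECT function_name FROM duckdb_functions() "
#                  "WHERE function_name LIKE '%sap%'").fetchall()
#   assert not missing_functions(row[0] for row in rows)
```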

## Explore the schema of the relevant tables

The ERPL extension provides the table function `sap_describe_fields` to explore the data dictionary schema of a given SAP table. For local data we can also use the `DESCRIBE` command to list the fields of, for example, a CSV file.

In [4]:
con.sql("SELECT * FROM sap_describe_fields('SFLIGHT');")

┌─────────┬─────────┬────────────┬──────────────────────┬───┬─────────────┬───────────┬───────────┬──────────┐
│   pos   │ is_key  │   field    │         text         │ … │ check_table │ ref_table │ ref_field │ language │
│ varchar │ varchar │  varchar   │       varchar        │   │   varchar   │  varchar  │  varchar  │ varchar  │
├─────────┼─────────┼────────────┼──────────────────────┼───┼─────────────┼───────────┼───────────┼──────────┤
│ 0001    │ X       │ MANDT      │ Client               │ … │ T000        │           │           │ E        │
│ 0002    │ X       │ CARRID     │ Airline Code         │ … │ SCARR       │           │           │ E        │
│ 0003    │ X       │ CONNID     │ Flight Connection …  │ … │ SPFLI       │           │           │ E        │
│ 0004    │ X       │ FLDATE     │ Flight date          │ … │             │           │           │ E        │
│ 0005    │         │ PRICE      │ Airfare              │ … │             │ SFLIGHT   │ CURRENCY  │ E        │
│

In [5]:
con.sql("SELECT * FROM sap_describe_fields('SPFLI');")

┌─────────┬─────────┬───────────┬──────────────────────┬───┬──────────┬─────────────┬───────────┬───────────┬──────────┐
│   pos   │ is_key  │   field   │         text         │ … │ decimals │ check_table │ ref_table │ ref_field │ language │
│ varchar │ varchar │  varchar  │       varchar        │   │ varchar  │   varchar   │  varchar  │  varchar  │ varchar  │
├─────────┼─────────┼───────────┼──────────────────────┼───┼──────────┼─────────────┼───────────┼───────────┼──────────┤
│ 0001    │ X       │ MANDT     │ Client               │ … │ 000000   │ T000        │           │           │ E        │
│ 0002    │ X       │ CARRID    │ Airline Code         │ … │ 000000   │ SCARR       │           │           │ E        │
│ 0003    │ X       │ CONNID    │ Flight Connection …  │ … │ 000000   │             │           │           │ E        │
│ 0004    │         │ COUNTRYFR │ Country Key          │ … │ 000000   │ SGEOCITY    │           │           │ E        │
│ 0005    │         │ CITYFROM  

In [6]:
con.sql("DESCRIBE SELECT * FROM 'WEATHER.csv'")

┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ FLDATE      │ DATE        │ YES     │ NULL    │ NULL    │ NULL    │
│ COUNTRY     │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ CITY        │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ TEMPERATURE │ DOUBLE      │ YES     │ NULL    │ NULL    │ NULL    │
│ CONDITION   │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

## Actual query

The actual SQL query joins the three tables and performs the following operations:
- Retrieves flight details from `SFLIGHT` using ERPL's `sap_read_table`, aliasing it as `f`.
- Joins `SPFLI` (aliased as `s`), again read via `sap_read_table`, on `MANDT`, `CARRID`, and `CONNID` to get each flight's city of origin and destination.
- Joins the external weather CSV file twice, as `w_from` and `w_to`, matching on flight date plus the country and city of departure and arrival respectively.
- Rounds the temperature data to one decimal place for readability.
- Orders the results by the first three output columns, `CARRID`, `CONNID`, and `FLDATE`.
- Limits the output to the first 25 rows for a concise view.

The output of this query lists each flight with its departure and arrival cities and the corresponding temperatures.

In [7]:
con.sql("""
SELECT 
  f.CARRID,
  f.CONNID,
  f.FLDATE,
  s.CITYFROM as CITY_FROM,
  ROUND(w_from.TEMPERATURE, 1) as TEMP_FROM,
  s.CITYTO as CITY_TO,
  ROUND(w_to.TEMPERATURE, 1) as TEMP_TO,
  FROM sap_read_table('SFLIGHT') AS f
  JOIN sap_read_table('SPFLI') AS s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
  JOIN 'WEATHER.csv' AS w_from
      ON (f.FLDATE = w_from.FLDATE AND s.COUNTRYFR = w_from.COUNTRY AND s.CITYFROM = w_from.CITY)
  JOIN 'WEATHER.csv' AS w_to
      ON (f.FLDATE = w_to.FLDATE AND s.COUNTRYTO = w_to.COUNTRY AND s.CITYTO = w_to.CITY)
  ORDER BY 1, 2, 3
  LIMIT 25
""")

┌─────────┬─────────┬────────────┬───────────┬───────────┬───────────────┬─────────┐
│ CARRID  │ CONNID  │   FLDATE   │ CITY_FROM │ TEMP_FROM │    CITY_TO    │ TEMP_TO │
│ varchar │ varchar │    date    │  varchar  │  double   │    varchar    │ double  │
├─────────┼─────────┼────────────┼───────────┼───────────┼───────────────┼─────────┤
│ AA      │ 0017    │ 2016-11-15 │ NEW YORK  │      28.3 │ SAN FRANCISCO │    19.8 │
│ AA      │ 0017    │ 2017-02-03 │ NEW YORK  │      18.9 │ SAN FRANCISCO │    17.2 │
│ AA      │ 0017    │ 2017-04-24 │ NEW YORK  │      14.7 │ SAN FRANCISCO │    16.2 │
│ AA      │ 0017    │ 2017-07-13 │ NEW YORK  │      16.8 │ SAN FRANCISCO │    22.5 │
│ AA      │ 0017    │ 2017-10-01 │ NEW YORK  │      14.6 │ SAN FRANCISCO │    28.3 │
│ AA      │ 0017    │ 2017-12-20 │ NEW YORK  │      13.0 │ SAN FRANCISCO │    21.4 │
│ AZ      │ 0555    │ 2016-11-15 │ ROME      │      20.6 │ FRANKFURT     │    24.2 │
│ AZ      │ 0555    │ 2017-02-03 │ ROME      │      13.1 │ FRANKF
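
To try the double join on the weather table without an SAP system, the pattern can be reproduced on a few made-up rows. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB and ERPL, with plain tables named `sflight`, `spfli`, and `weather` replacing `sap_read_table(...)` and the CSV file. All rows are invented for illustration. Since the joins are inner joins, the second flight, which has no weather rows, drops out of the result.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sflight (MANDT TEXT, CARRID TEXT, CONNID TEXT, FLDATE TEXT);
CREATE TABLE spfli (MANDT TEXT, CARRID TEXT, CONNID TEXT,
                    COUNTRYFR TEXT, CITYFROM TEXT, COUNTRYTO TEXT, CITYTO TEXT);
CREATE TABLE weather (FLDATE TEXT, COUNTRY TEXT, CITY TEXT, TEMPERATURE REAL);

-- Made-up rows for illustration only
INSERT INTO sflight VALUES ('001', 'AA', '0017', '2016-11-15');
INSERT INTO sflight VALUES ('001', 'AA', '0017', '2017-02-03');  -- no weather rows
INSERT INTO spfli VALUES ('001', 'AA', '0017', 'US', 'NEW YORK', 'US', 'SAN FRANCISCO');
INSERT INTO weather VALUES ('2016-11-15', 'US', 'NEW YORK', 21.26);
INSERT INTO weather VALUES ('2016-11-15', 'US', 'SAN FRANCISCO', 18.04);
""")

# Same join structure as the query above: the weather table appears twice,
# once per end of the flight, each time under its own alias.
rows = db.execute("""
SELECT f.CARRID, f.CONNID, f.FLDATE,
       s.CITYFROM AS CITY_FROM, ROUND(w_from.TEMPERATURE, 1) AS TEMP_FROM,
       s.CITYTO AS CITY_TO, ROUND(w_to.TEMPERATURE, 1) AS TEMP_TO
FROM sflight AS f
JOIN spfli AS s
  ON f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID
JOIN weather AS w_from
  ON f.FLDATE = w_from.FLDATE AND s.COUNTRYFR = w_from.COUNTRY AND s.CITYFROM = w_from.CITY
JOIN weather AS w_to
  ON f.FLDATE = w_to.FLDATE AND s.COUNTRYTO = w_to.COUNTRY AND s.CITYTO = w_to.CITY
ORDER BY 1, 2, 3
""").fetchall()

for row in rows:
    print(row)
```

If flights without weather data should stay in the result, the two weather joins would need to be `LEFT JOIN`s, leaving the temperature columns empty for those rows.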

## Creating artificial weather information
Of course we did not use real weather information (just in case you were wondering). We generated the CSV file with the following code.

In [8]:
import numpy.random as npr

df_weather = con.sql("""
SELECT DISTINCT
  f.FLDATE,
  s.COUNTRYFR as COUNTRY,
  s.CITYFROM as CITY,
  FROM sap_read_table('SFLIGHT') as f
  JOIN sap_read_table('SPFLI') as s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
UNION
SELECT DISTINCT
  f.FLDATE,
  s.COUNTRYTO as COUNTRY,
  s.CITYTO as CITY
  FROM sap_read_table('SFLIGHT') as f
  JOIN sap_read_table('SPFLI') as s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
""").to_df()

weather_descriptions = [
    "clear sky",
    "few clouds",
    "scattered clouds",
    "broken clouds",
    "shower rain",
    "rain",
    "thunderstorm",
    "snow",
    "mist",
    "thunderstorm with light rain",
    "thunderstorm with rain",
    "thunderstorm with heavy rain",
    "light thunderstorm",
    "thunderstorm",
    "heavy thunderstorm",
    "ragged thunderstorm",
    "thunderstorm with light drizzle",
    "thunderstorm with drizzle",
    "thunderstorm with heavy drizzle",
    "light intensity drizzle",
    "drizzle",
    "heavy intensity drizzle",
    "light intensity drizzle rain",
    "drizzle rain",
    "heavy intensity drizzle rain",
    "shower rain and drizzle",
    "heavy shower rain and drizzle",
    "shower drizzle",
    "light rain",
    "moderate rain",
    "heavy intensity rain",
    "very heavy rain",
    "extreme rain",
    "freezing rain",
    "light intensity shower rain",
    "shower rain",
    "heavy intensity shower rain",
    "ragged shower rain",
    "light snow",
    "snow",
    "heavy snow",
    "sleet",
    "light shower sleet",
    "shower sleet",
    "light rain and snow",
    "rain and snow",
    "light shower snow",
    "shower snow",
    "heavy shower snow"
]

df_weather["TEMPERATURE"] = npr.normal(loc=20., scale=5., size=len(df_weather))
df_weather["CONDITION"] = npr.choice(weather_descriptions, size=len(df_weather))

df_weather.to_csv("./WEATHER.csv", index=False)
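
The cell above draws fresh random numbers on every run, so the contents of `WEATHER.csv` change each time. A seeded variant makes the file reproducible. The sketch below swaps NumPy and pandas for the standard library (`random`, `csv`) so that it runs on its own; the helper names `make_weather_rows` and `rows_to_csv` and the sample keys are invented for this example.

```python
import csv
import io
import random


def make_weather_rows(keys, conditions, seed=42):
    """Attach a random temperature and condition to each (FLDATE, COUNTRY, CITY) key."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same rows
    rows = []
    for fldate, country, city in keys:
        rows.append({
            "FLDATE": fldate,
            "COUNTRY": country,
            "CITY": city,
            "TEMPERATURE": rng.gauss(20.0, 5.0),  # mean 20, std dev 5, as above
            "CONDITION": rng.choice(conditions),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text using the WEATHER.csv column layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["FLDATE", "COUNTRY", "CITY", "TEMPERATURE", "CONDITION"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical keys for illustration; in the notebook they come from the SAP query.
sample_keys = [("2016-11-15", "US", "NEW YORK"), ("2016-11-15", "US", "SAN FRANCISCO")]
sample_rows = make_weather_rows(sample_keys, ["clear sky", "rain", "snow"])
print(rows_to_csv(sample_rows))
```

Calling `make_weather_rows` again with the same keys and seed returns identical rows, so the join results shown earlier stay stable between notebook runs.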