# Query SAP tables and join with local data

This first example demonstrates how to join two SAP tables with an external table. We use the [ABAP Flight Reference Scenario](https://help.sap.com/docs/ABAP_PLATFORM_NEW/fc4c71aa50014fd1b43721701471913d/a9d7c7c140a0408dbc5966c52d156b49.html), joining the `SFLIGHT` and `SPFLI` tables, which contain flight and flight schedule details respectively, with an external table `WEATHER` (a local CSV file) that holds weather information. The goal is to extract flight information together with the temperatures at the departure and arrival cities.

## Import DuckDB & load **ERPL** extension
In the next cells we load the DBI package and open an in-memory DuckDB connection. The connection is opened with `allow_unsigned_extensions` enabled, which DuckDB requires before it loads an extension that is not signed. We then install the ERPL extension and load it into the current database session. A series of `SET` commands configures the connection to our SAP development system. In our case we use the Docker-based [ABAP Platform Trial](https://hub.docker.com/r/sapse/abap-platform-trial). The credentials shown are the image's defaults; details can be found in the documentation of the Docker image.

In [1]:
library("DBI")
con = dbConnect(duckdb::duckdb(dbdir = ":memory:", config = list(allow_unsigned_extensions="true")))

In [2]:
sql = "
INSTALL 'erpl.duckdb_extension';
LOAD 'erpl';

SET sap_ashost = 'localhost';
SET sap_sysnr = '00';
SET sap_user = 'DEVELOPER';
SET sap_password = 'Htods70334';
SET sap_client = '001';
SET sap_lang = 'EN';
"
sql

dbExecute(con, sql)
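
Hardcoding the password is acceptable for the trial container, but for other systems the same `SET` statements could be generated from environment variables instead. The following is a minimal sketch under stated assumptions: the environment variable names (`SAP_ASHOST`, `SAP_PASSWORD`, and so on) and the helper `quote_sql` are hypothetical and not part of ERPL; only the `sap_*` setting names come from the cell above.

```r
# Hypothetical environment variable names; the fallbacks mirror the trial defaults.
sap_settings <- c(
  sap_ashost   = Sys.getenv("SAP_ASHOST", "localhost"),
  sap_sysnr    = Sys.getenv("SAP_SYSNR", "00"),
  sap_user     = Sys.getenv("SAP_USER", "DEVELOPER"),
  sap_password = Sys.getenv("SAP_PASSWORD"),  # no fallback: keep secrets out of the notebook
  sap_client   = Sys.getenv("SAP_CLIENT", "001"),
  sap_lang     = Sys.getenv("SAP_LANG", "EN")
)

# Hypothetical helper: wrap a value in single quotes, doubling embedded quotes
quote_sql <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

# One SET statement per setting
set_statements <- sprintf("SET %s = %s;", names(sap_settings), quote_sql(sap_settings))

# With the connection `con` from the first cell, apply them one by one:
# for (stmt in set_statements) dbExecute(con, stmt)
```

Building one statement per setting keeps the password out of the saved notebook and makes it easy to switch between systems without editing the SQL.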

If the extension was loaded successfully, we can find the exported functions in the list returned by `duckdb_functions()`.

In [3]:
dbGetQuery(con, "SELECT * FROM duckdb_functions() WHERE function_name LIKE '%sap%';")

database_name,schema_name,function_name,function_type,description,return_type,parameters,parameter_types,varargs,macro_definition,has_side_effects,internal,function_oid,example
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<chr>,<chr>,<lgl>,<lgl>,<dbl>,<chr>
system,main,sap_rfc_search_group,table,,,"LANGUAGE , GROUPNAME","VARCHAR, VARCHAR",,,,True,1021,
system,main,sap_rfc_invoke,table,,,"col0, path","VARCHAR, VARCHAR",ANY,,,True,1019,
system,main,sap_read_table,table,,,"col0 , MAX_ROWS, FILTER , COLUMNS , THREADS","VARCHAR , UINTEGER , VARCHAR , VARCHAR[], UINTEGER",,,,True,1029,
system,main,sap_describe_fields,table,,,col0,VARCHAR,,,,True,1027,
system,main,sap_show_tables,table,,,"THREADS , TEXT , TABLENAME","UINTEGER, VARCHAR , VARCHAR",,,,True,1025,
system,main,sap_rfc_search_function,table,,,"LANGUAGE , GROUPNAME, FUNCNAME","VARCHAR, VARCHAR, VARCHAR",,,,True,1023,
system,main,sap_rfc_ping,pragma,,,,,,,,True,1017,
system,main,sap_rfc_function_desc,pragma,,,col0,VARCHAR,,,,True,1031,
system,main,sap_rfc_set_trace_level,pragma,,,col0,INTEGER,,,,True,1033,
system,main,sap_rfc_set_trace_dir,pragma,,,col0,VARCHAR,,,,True,1035,


## Explore the schema of the relevant tables

The ERPL extension provides the table function `sap_describe_fields` to explore the data dictionary schema of a given SAP table. For local data we can also use the `DESCRIBE` command to list the fields of, for example, a CSV file.

In [4]:
dbGetQuery(con, "SELECT * FROM sap_describe_fields('SFLIGHT');")

pos,is_key,field,text,sap_type,length,decimals,check_table,ref_table,ref_field,language
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,X,MANDT,Client,CLNT,3,0,T000,,,E
2,X,CARRID,Airline Code,CHAR,3,0,SCARR,,,E
3,X,CONNID,Flight Connection Number,NUMC,4,0,SPFLI,,,E
4,X,FLDATE,Flight date,DATS,8,0,,,,E
5,,PRICE,Airfare,CURR,15,2,,SFLIGHT,CURRENCY,E
6,,CURRENCY,Local currency of airline,CUKY,5,0,SCURX,,,E
7,,PLANETYPE,Aircraft Type,CHAR,10,0,SAPLANE,,,E
8,,SEATSMAX,Maximum Capacity in Economy Class,INT4,10,0,,,,E
9,,SEATSOCC,Occupied Seats in Economy Class,INT4,10,0,,,,E
10,,PAYMENTSUM,Total of current bookings,CURR,17,2,,SFLIGHT,CURRENCY,E


In [5]:
dbGetQuery(con, "SELECT * FROM sap_describe_fields('SPFLI');")

pos,is_key,field,text,sap_type,length,decimals,check_table,ref_table,ref_field,language
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,X,MANDT,Client,CLNT,3,0,T000,,,E
2,X,CARRID,Airline Code,CHAR,3,0,SCARR,,,E
3,X,CONNID,Flight Connection Number,NUMC,4,0,,,,E
4,,COUNTRYFR,Country Key,CHAR,3,0,SGEOCITY,,,E
5,,CITYFROM,Departure city,CHAR,20,0,SGEOCITY,,,E
6,,AIRPFROM,Departure airport,CHAR,3,0,SAIRPORT,,,E
7,,COUNTRYTO,Country Key,CHAR,3,0,SGEOCITY,,,E
8,,CITYTO,Arrival city,CHAR,20,0,SGEOCITY,,,E
9,,AIRPTO,Destination airport,CHAR,3,0,SAIRPORT,,,E
10,,FLTIME,Flight time,INT4,10,0,,,,E


In [6]:
dbGetQuery(con, "DESCRIBE SELECT * FROM 'WEATHER.csv';")

“NAs introduced by coercion”


column_name,column_type,null,key,default,extra
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
FLDATE,DATE,YES,,,
COUNTRY,VARCHAR,YES,,,
CITY,VARCHAR,YES,,,
TEMPERATURE,DOUBLE,YES,,,
CONDITION,VARCHAR,YES,,,


## Actual query

The actual SQL query joins the three tables and performs the following operations:
- Retrieves flight details from `SFLIGHT` using ERPL's `sap_read_table`, aliasing it as `f`.
- Joins `SPFLI` (aliased as `s`), again read via `sap_read_table`, on `MANDT`, `CARRID`, and `CONNID` to get each flight's departure and arrival city.
- Joins the external weather CSV file twice, as `w_from` and `w_to`, matching on flight date, country, and city for the departure and arrival locations respectively.
- Rounds the temperature data to one decimal place for readability.
- Orders the results by the first three output columns: `CARRID`, `CONNID`, and `FLDATE`.
- Limits the output to at most 25 rows for a concise view.

The result lists each flight with its departure and arrival cities and the temperature recorded at each city on the flight date.
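
The join pattern itself does not depend on SAP. As a rough, SAP-free sketch, the following uses two small data frames with made-up values to show how the same weather table is joined twice, once per city. The table names `flights` and `weather`, the connection `demo_con`, and all rows are hypothetical; `flights` stands in for the already combined `SFLIGHT`/`SPFLI` data.

```r
library("DBI")

# Separate in-memory connection so the sketch does not touch `con`
demo_con <- dbConnect(duckdb::duckdb(dbdir = ":memory:"))

# Made-up row standing in for the joined SFLIGHT/SPFLI data
flights <- data.frame(
  CARRID = "AA", CONNID = "0017", FLDATE = as.Date("2016-11-15"),
  COUNTRYFR = "US", CITYFROM = "NEW YORK",
  COUNTRYTO = "US", CITYTO = "SAN FRANCISCO",
  stringsAsFactors = FALSE
)

# Made-up rows standing in for WEATHER.csv
weather <- data.frame(
  FLDATE = as.Date(c("2016-11-15", "2016-11-15")),
  COUNTRY = c("US", "US"),
  CITY = c("NEW YORK", "SAN FRANCISCO"),
  TEMPERATURE = c(28.34, 19.81),
  stringsAsFactors = FALSE
)

dbWriteTable(demo_con, "flights", flights)
dbWriteTable(demo_con, "weather", weather)

# Join the weather table twice: once for the departure city, once for the arrival city
demo_result <- dbGetQuery(demo_con, "
SELECT
  f.CARRID, f.CONNID, f.FLDATE,
  f.CITYFROM AS CITY_FROM, ROUND(w_from.TEMPERATURE, 1) AS TEMP_FROM,
  f.CITYTO AS CITY_TO, ROUND(w_to.TEMPERATURE, 1) AS TEMP_TO
FROM flights AS f
JOIN weather AS w_from
  ON (f.FLDATE = w_from.FLDATE AND f.COUNTRYFR = w_from.COUNTRY AND f.CITYFROM = w_from.CITY)
JOIN weather AS w_to
  ON (f.FLDATE = w_to.FLDATE AND f.COUNTRYTO = w_to.COUNTRY AND f.CITYTO = w_to.CITY)
ORDER BY 1, 2, 3
")
demo_result

dbDisconnect(demo_con, shutdown = TRUE)
```

Because both joins are inner joins, a flight only appears in the result if weather rows exist for both its departure and its arrival city on the flight date. Using `LEFT JOIN` instead would keep such flights and leave the missing temperatures as `NA`.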

In [7]:
sql = "
SELECT 
  f.CARRID,
  f.CONNID,
  f.FLDATE,
  s.CITYFROM as CITY_FROM,
  ROUND(w_from.TEMPERATURE, 1) as TEMP_FROM,
  s.CITYTO as CITY_TO,
  ROUND(w_to.TEMPERATURE, 1) as TEMP_TO
  FROM sap_read_table('SFLIGHT') AS f
  JOIN sap_read_table('SPFLI') AS s 
      ON (f.MANDT = s.MANDT AND f.CARRID = s.CARRID AND f.CONNID = s.CONNID)
  JOIN 'WEATHER.csv' AS w_from
      ON (f.FLDATE = w_from.FLDATE AND s.COUNTRYFR = w_from.COUNTRY AND s.CITYFROM = w_from.CITY)
  JOIN 'WEATHER.csv' AS w_to
      ON (f.FLDATE = w_to.FLDATE AND s.COUNTRYTO = w_to.COUNTRY AND s.CITYTO = w_to.CITY)
  ORDER BY 1, 2, 3
  LIMIT 25
"
sql

dbGetQuery(con, sql)

CARRID,CONNID,FLDATE,CITY_FROM,TEMP_FROM,CITY_TO,TEMP_TO
<chr>,<chr>,<date>,<chr>,<dbl>,<chr>,<dbl>
AA,17,2016-11-15,NEW YORK,28.3,SAN FRANCISCO,19.8
AA,17,2017-02-03,NEW YORK,18.9,SAN FRANCISCO,17.2
AA,17,2017-04-24,NEW YORK,14.7,SAN FRANCISCO,16.2
AA,17,2017-07-13,NEW YORK,16.8,SAN FRANCISCO,22.5
AA,17,2017-10-01,NEW YORK,14.6,SAN FRANCISCO,28.3
AA,17,2017-12-20,NEW YORK,13.0,SAN FRANCISCO,21.4
AZ,555,2016-11-15,ROME,20.6,FRANKFURT,24.2
AZ,555,2017-02-03,ROME,13.1,FRANKFURT,24.1
AZ,555,2017-04-24,ROME,20.7,FRANKFURT,24.6
AZ,555,2017-07-13,ROME,14.0,FRANKFURT,16.3
