# Introduction
This example code requires that you have followed all the steps in the [README](../README.md).

If you open this file in a code editor that supports Jupyter notebooks, you can run the cells one by one and see the results.

## Define variables
Fill in the variables in the cell below. They are used by all the code cells that follow.

In [None]:
# --------------------------------------------------
# Maskinporten
# --------------------------------------------------
maskinporten_private_key_file_path = "./private_key.pem" # Path to the private key pem file
maskinporten_client_key_id = "" # The ID of the key the private key corresponds to
maskinporten_client_id = "" # The ID of the client you created for Maskinporten in Selvbetjeningsportalen
maskinporten_audience = "" # The audience for Skyporten/Maskinporten: https://sky.maskinporten.no or https://test.sky.maskinporten.no
maskinporten_scope = "" # The scope you received from DSB
maskinporten_resource = "" # The resource identifier / audience you received from DSB

# --------------------------------------------------
# Delta Sharing
# --------------------------------------------------
delta_sharing_endpoint = "" # The delta sharing endpoint you received from DSB
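
Most errors in the later cells come from a setting that was left empty. The cell below is a hypothetical convenience check, not part of the `lib` folder: it lists which settings are still blank. The dictionary uses the variable names from the cell above with placeholder values; in the notebook you would fill it with the real variables instead.

```python
# Hypothetical helper (not part of the lib folder): report which settings are still empty.
def find_missing_settings(settings: dict[str, str]) -> list[str]:
    """Return the names of settings whose value is empty or only whitespace."""
    return [name for name, value in settings.items() if not value.strip()]


# Example with placeholder values; replace them with the variables defined above.
settings = {
    "maskinporten_client_key_id": "",
    "maskinporten_client_id": "",
    "maskinporten_audience": "https://test.sky.maskinporten.no",
    "maskinporten_scope": "",
    "maskinporten_resource": "",
    "delta_sharing_endpoint": "",
}

missing = find_missing_settings(settings)
if missing:
    print("Please fill in:", ", ".join(missing))
```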

In [None]:
# --------------------------------------------------
# Jupyter Notebook setup
# This cell only enables automatic reloading of imported modules (such as those in the lib folder) when they change.
# --------------------------------------------------
%load_ext autoreload
%autoreload 2

## Acquire an access token from Skyporten / Maskinporten
In the code below we will request an access token from Skyporten / Maskinporten.

The access token is stored in the `access_token` variable and is used in the subsequent code blocks in this file.

The access token is also printed in the cell output, both as decoded JSON claims and in its encoded (base64url) JWT form, which is the value sent as the bearer token.
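
The helper `get_maskinporten_access_token` in `lib/maskinporten` is not shown in this notebook. Assuming it follows Maskinporten's standard JWT-grant flow, it builds a short-lived assertion with claims like the ones below, signs it with your private key (RS256, with the key ID in the `kid` header) and posts it to the token endpoint. The function name `build_assertion_claims`, the 120-second lifetime and the exact mapping of the settings to claims are assumptions for illustration only.

```python
import time
import uuid


def build_assertion_claims(client_id, audience, scope, resource=None, lifetime_s=120, now=None):
    """Build the (unsigned) claims of a JWT-grant assertion. Hypothetical helper."""
    issued_at = int(time.time() if now is None else now)
    claims = {
        "iss": client_id,           # the Maskinporten client ID
        "aud": audience,            # assumed: the audience setting from the first cell
        "scope": scope,             # the scope received from DSB
        "iat": issued_at,           # issued-at time (seconds since the epoch)
        "exp": issued_at + lifetime_s,
        "jti": str(uuid.uuid4()),   # unique ID so the assertion cannot be replayed
    }
    if resource:
        claims["resource"] = resource  # requested audience of the resulting access token
    return claims


# Signing step (not executed here; requires PyJWT with the cryptography extra):
#   assertion = jwt.encode(claims, private_key, algorithm="RS256", headers={"kid": key_id})
# The assertion would then be posted to the token endpoint with
#   grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer
```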

In [None]:
import json

import jwt

# Import the helper function in the lib folder to get an access token from Maskinporten
from lib.maskinporten import get_maskinporten_access_token

# Read the private key from a file (the with-block closes the file afterwards)
with open(maskinporten_private_key_file_path, "rb") as key_file:
    private_key = key_file.read()

# Request an access token from Maskinporten
access_token = get_maskinporten_access_token(
    key_id=maskinporten_client_key_id,
    client_id=maskinporten_client_id,
    audience=maskinporten_audience,
    scope=maskinporten_scope,
    resource=maskinporten_resource,
    private_key=private_key,
)

# Decode the access token for display only; the signature is NOT verified here
decoded = jwt.decode(
    access_token,
    options={"verify_signature": False},
    algorithms=["RS256"],
)

print("Decoded access token:")
print(json.dumps(decoded, indent=2))
print()

print("Access token in encoded (base64url) form:")
print(access_token)
print()

print(f"The sub-value '{decoded['sub']}' is what you need to send to DSB")

## Delta sharing
The code below uses the `access_token` variable from the previous cell to create a Delta Sharing profile JSON file.

The Delta Sharing client uses this profile file to authenticate against the Delta Sharing server (endpoint) and access the shared data.

In [None]:
# Import helper functions for Delta Sharing
from lib.deltasharing import create_sharing_profile, get_table_urls

# Create the Delta Sharing profile JSON file
profile = create_sharing_profile(
    profile_name="dsb_maskinporten_profile",
    bearer_token=access_token,
    endpoint=delta_sharing_endpoint
)

In [None]:
# --------------------------------------------------
# Testing access
# Using the created profile we will connect to the Delta Sharing server and list all available tables
# --------------------------------------------------
from delta_sharing import SharingClient
client = SharingClient(profile)

# List all tables using the get_table_urls helper function
print("Available tables:")
table_urls = sorted(get_table_urls(profile, client))
print("\n".join(table_urls))
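
`get_table_urls` lives in `lib/deltasharing` and is not shown here. The `delta_sharing` library addresses a table with a URL of the form `<profile-file-path>#<share-name>.<schema-name>.<table-name>`. A plausible version of such a helper, assuming it is built on `SharingClient.list_all_tables()` (whose table objects have `share`, `schema` and `name` attributes), is sketched below. `TableRef` is a local stand-in so the sketch runs offline, and the share, schema and table names are made up.

```python
from typing import Iterable, NamedTuple


class TableRef(NamedTuple):
    """Stand-in for the table objects returned by SharingClient.list_all_tables()."""
    share: str
    schema: str
    name: str


def build_table_urls(profile_path: str, tables: Iterable[TableRef]) -> list[str]:
    """Build sorted table URLs of the form <profile-file>#<share>.<schema>.<table>."""
    return sorted(f"{profile_path}#{t.share}.{t.schema}.{t.name}" for t in tables)


# Made-up tables for illustration; with a real client you would pass client.list_all_tables().
example_tables = [
    TableRef("dsb_share", "public", "stations"),
    TableRef("dsb_share", "public", "incidents"),
]
print("\n".join(build_table_urls("profile.json", example_tables)))
```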


## Consuming data
The `delta_sharing` package provides two ways to consume shared tables: as Pandas DataFrames (`load_as_pandas`) or as Spark DataFrames (`load_as_spark`).

### Pandas
[Pandas](https://pandas.pydata.org/) is a powerful data analysis and manipulation library for Python.
<br>It works well for moderate amounts of data that fit in the memory of a single machine and do not require parallel processing.
<br>The examples below show how to consume data using Pandas.

### Spark
[Apache Spark](https://spark.apache.org/) is an engine for large-scale data processing.
<br>It requires a Spark cluster of one or more machines to run.
<br>Spark is a good choice when your datasets do not fit in the memory of a single machine or when you want to process data in parallel.
<br>Due to the requirements of setting up a cluster, we will not provide examples for Spark.

## Consuming data with Pandas

In [None]:
# Import dependencies
from delta_sharing import load_as_pandas

# Load a specific table as a Pandas DataFrame
pandas_df = load_as_pandas(url=table_urls[0])

# Show first 5 rows
print("\nFirst 5 rows of the DataFrame:")
print(pandas_df.head(n=5))

# Show summary statistics of the DataFrame
print("\nSummary statistics of the DataFrame:")
print(pandas_df.describe())
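
`table_urls[0]` simply picks the first table in sorted order. To load a specific table, you can look its URL up by name instead. The helper below is a hypothetical sketch (not part of `lib`) that matches either the bare table name or the full `<share>.<schema>.<table>` part after the `#`, and fails loudly if the name is missing or ambiguous.

```python
def find_table_url(table_urls: list[str], table_name: str) -> str:
    """Return the single URL whose table part matches table_name.

    table_name may be a bare table name or a fully qualified
    "<share>.<schema>.<table>" name.
    """
    matches = []
    for url in table_urls:
        qualified = url.rsplit("#", 1)[1]          # "<share>.<schema>.<table>"
        bare = qualified.rsplit(".", 1)[-1]         # "<table>"
        if table_name in (qualified, bare):
            matches.append(url)
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one table named {table_name!r}, found {len(matches)}")
    return matches[0]
```

With a hypothetical table name, usage could be `load_as_pandas(url=find_table_url(table_urls, "stations"), limit=100)`; the `limit` argument of `load_as_pandas` loads only that many rows, which is handy for exploring a large table.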

In [None]:
# --------------------------------------------------
# Download all tables from Delta Sharing into local data folder
# This uses a helper function that takes in all the table URLs and downloads each table in Parquet format into the specified folder
#
# The local parquet files can then be loaded into other tools like Power BI, Pandas or Spark for further processing and analysis.
# --------------------------------------------------
from lib.deltasharing import pandas_dump_tables

pandas_dump_tables(
    table_urls=table_urls,
    data_folder="./data",
    format="parquet",
)
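
How `pandas_dump_tables` names its output files is not shown in this notebook. A common choice, and the assumption behind this sketch, is one file per table named after the `<share>.<schema>.<table>` part of its URL. The hypothetical helper below derives such a path; once a file exists, `pandas.read_parquet(path)` (which needs `pyarrow` or `fastparquet` installed) loads it back into a DataFrame.

```python
from pathlib import Path


def parquet_path_for(table_url: str, data_folder: str = "./data") -> Path:
    """Map '<profile>#<share>.<schema>.<table>' to '<data_folder>/<share>.<schema>.<table>.parquet'.

    Hypothetical naming scheme; check the files pandas_dump_tables actually writes.
    """
    qualified_name = table_url.rsplit("#", 1)[1]
    return Path(data_folder) / f"{qualified_name}.parquet"


print(parquet_path_for("profile.json#dsb_share.public.stations"))
```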