# Putting Data into FinSpace
This notebook demonstrates how to use the FinSpace public APIs to create a dataset of S&P 500 constituents and then create views of that data for use in FinSpace notebooks.

## What you need to use this notebook
- API Credentials
- Permission group that will have access to this dataset

## Outline
- Get API credentials from FinSpace
- Load the CSV data into a pandas DataFrame
- Create a dataset and add the data as a changeset
- Create views from the changeset

In [None]:
region_name  = 'us-east-1'

In [None]:
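# necessary imports
import datetime  # later cells call datetime.datetime.utcnow() and datetime.datetime.now()
import time      # used to timestamp the dataset description

import boto3     # used to create the session passed to the FinSpace helper class
import pandas as pd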

In [None]:
### ----------------------------------------------------------------
### Get Credentials from the "API Credentials" in FinSpace 
### ----------------------------------------------------------------
hab_access_key_id     = ''
hab_secret_access_key = ''
hab_session_token     = ''

### ----------------------------------------------------------------
### identify a permission group to use
### ----------------------------------------------------------------
basicPermissionGroupId = '' # Analyst
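Pasting keys directly into a notebook makes them easy to leak when the notebook is shared. A minimal alternative sketch, assuming you have exported the three values as environment variables beforehand; the variable names `FINSPACE_ACCESS_KEY_ID`, `FINSPACE_SECRET_ACCESS_KEY`, and `FINSPACE_SESSION_TOKEN` are hypothetical choices for this example and are not defined by FinSpace.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names; FinSpace does not define these.
CREDENTIAL_VARS = {
    "hab_access_key_id": "FINSPACE_ACCESS_KEY_ID",
    "hab_secret_access_key": "FINSPACE_SECRET_ACCESS_KEY",
    "hab_session_token": "FINSPACE_SESSION_TOKEN",
}


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the three credential values, failing fast if any is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing credential environment variables: {', '.join(missing)}")
    # Map the notebook's variable names to the values read from the environment.
    return {key: env[var] for key, var in CREDENTIAL_VARS.items()}
```

Usage: `creds = load_credentials()`, then `hab_access_key_id = creds["hab_access_key_id"]` and likewise for the other two values.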

# Python Helper Class

In [None]:
%load ../Utilities/finspace.py
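`%load` pastes the source of `finspace.py` into the cell, so everything it defines or imports lands in the notebook namespace. If you would rather import the helper as a module, the standard-library `importlib` machinery can load a file by path. This is a sketch of an alternative, not the notebook's method; the module name `finspace_helper` is a hypothetical choice.

```python
import importlib.util
import sys
from types import ModuleType


def load_module_from_path(path: str, module_name: str) -> ModuleType:
    """Import a Python source file as a module without copying its code into the cell."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it so imports that refer back to it resolve.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

Usage: `FinSpace = load_module_from_path("../Utilities/finspace.py", "finspace_helper").FinSpace`. Names imported inside the file (for example `boto3`) then stay inside the module, so the notebook must import them itself.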

# Initialize the FinSpace Helper Class

In [None]:
hab_session = boto3.session.Session(
    region_name           = region_name,
    aws_access_key_id     = hab_access_key_id,
    aws_secret_access_key = hab_secret_access_key,
    aws_session_token     = hab_session_token
)

finspace = FinSpace(
    boto_session = hab_session,
    dev_overrides = {
        "region_name" : region_name
    }
)

## Permission Groups and Dataset Ownership

In [None]:
# Constants for dataset creation

# permissions that will be given on the dataset
basicPermissions = [
    "ViewDatasetDetails" 
    ,"ReadDatasetData" 
    ,"AddDatasetData" 
    ,"CreateSnapshot" 
    ,"EditDatasetMetadata"
    ,"ManageDatasetPermissions"
    ,"DeleteDataset"
]

basicOwnerInfo = {
    "phoneNumber" : "12125551000",
    "email"       : "finspace_sampledata@amazon.com",
    "name"        : "Amazon Finspace"
}

# The Data
Read the CSV data into a pandas DataFrame.

In [None]:
sp500 = pd.read_csv("sp500.csv")
sp500

## Describe the Dataset
The helper class has a function, `get_schema_from_pandas`, that extracts a FinSpace (Habanero) schema definition from a pandas DataFrame.

In [None]:
# extract schema from the dataframe
schema = {
    "columns": finspace.get_schema_from_pandas(sp500),
    "primaryKeyColumns": [ ]  
}

schema
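To make the dtype-to-schema step concrete, here is a minimal sketch of how such a mapping could work. It is not the helper's implementation: the type names and the `columnName`/`dataType`/`columnDescription` keys are assumptions modeled loosely on FinSpace column definitions, and the function takes a plain `{column: dtype_name}` mapping so it does not depend on pandas.

```python
# Hypothetical mapping from pandas dtype names to column type names.
# The target type names are assumptions, not confirmed helper output.
DTYPE_TO_COLUMN_TYPE = {
    "object": "STRING",
    "string": "STRING",
    "bool": "BOOLEAN",
    "int32": "INTEGER",
    "int64": "BIGINT",
    "float32": "FLOAT",
    "float64": "DOUBLE",
    "datetime64[ns]": "DATETIME",
}


def columns_from_dtypes(dtypes: dict) -> list:
    """Build column definitions from a {column_name: dtype} mapping."""
    columns = []
    for name, dtype in dtypes.items():
        columns.append({
            "columnName": name,
            # Unknown dtypes (e.g. 'category') fall back to STRING.
            "dataType": DTYPE_TO_COLUMN_TYPE.get(str(dtype), "STRING"),
            "columnDescription": "",
        })
    return columns
```

Usage: `columns_from_dtypes(sp500.dtypes.to_dict())`; `str()` turns each NumPy dtype into a name such as `'int64'` or `'object'`.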

## Create the Dataset

In [None]:
d = time.strftime('%Y-%m-%d %-I:%M %p %Z')  # creation time, recorded in the description; %-I is not supported on Windows

name        = "SP500 Constituents"
description = f"Demonstration of using the FinSpace external APIs to load data into FinSpace, executed {d}"

print("creating dataset")

dataset_id = finspace.create_dataset(
    name = name, 
    description = description, 
    permission_group_id = basicPermissionGroupId,
    dataset_permissions = basicPermissions,
    kind = "TABULAR",
    owner_info = basicOwnerInfo,
    schema = schema
)

print(f"dataset_id  = {dataset_id}")
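The `%-I` directive (hour without zero padding) is a platform extension of `strftime` and fails on Windows. A portable sketch that produces the same style of timestamp by computing the 12-hour value directly; `portable_timestamp` is a hypothetical helper name.

```python
import datetime


def portable_timestamp(moment: datetime.datetime) -> str:
    """Format like '%Y-%m-%d %-I:%M %p %Z' without the non-portable %-I directive."""
    hour12 = moment.hour % 12 or 12  # 0 -> 12 (AM), 13 -> 1 (PM)
    # %Z is empty for naive datetimes, so strip the trailing space in that case.
    return f"{moment:%Y-%m-%d} {hour12}:{moment:%M %p %Z}".rstrip()
```

Usage: `d = portable_timestamp(datetime.datetime.now().astimezone())` gives local time with the local zone name.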

## Add Changeset to Dataset
Because the CSV contains the complete set of constituents, the change type is `REPLACE`. Another option is `APPEND`, which adds the new rows to the data already in the dataset.

In [None]:
# create the changeset
change_type = 'REPLACE'

changeset_id = finspace.ingest_pandas(
    data_frame = sp500,
    dataset_id = dataset_id,
    change_type = change_type,
    wait_for_completion = True
)

print(f"Created changeset_id = {changeset_id}")
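To see why `REPLACE` fits a full snapshot file, here is a toy in-memory model of the two change types. It only illustrates the semantics described above; it is not how FinSpace stores changesets, and `apply_changeset` is a hypothetical name.

```python
from typing import List, Tuple

Row = Tuple[str, ...]


def apply_changeset(current: List[Row], new_rows: List[Row], change_type: str) -> List[Row]:
    """Toy model: REPLACE discards existing rows; APPEND keeps them and adds the new rows."""
    if change_type == "REPLACE":
        return list(new_rows)
    if change_type == "APPEND":
        return list(current) + list(new_rows)
    raise ValueError(f"unsupported change_type in this model: {change_type!r}")
```

In this model, loading the full constituent list twice with `APPEND` would duplicate every row, while `REPLACE` leaves exactly one copy.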

### Auto-Update Views
An auto-update view is refreshed as new changesets arrive. A dataset can have only one auto-update view, so check whether one already exists before creating it.

In [None]:
# does an auto-update view already exist?
existing_views = finspace.list_views(dataset_id = dataset_id, max_results=100)

autoupdate_snapshot_id = None

for view in existing_views:
    if view['autoUpdate']:
        autoupdate_snapshot_id = view['id']
        break  # a dataset can have only one auto-update view

print(autoupdate_snapshot_id)

# Create an auto-update view if one does not exist
if autoupdate_snapshot_id is None:
    autoupdate_snapshot_id = finspace.create_auto_update_view(
        dataset_id = dataset_id, 
        destination_type = "GLUE_TABLE",
        partition_columns = [], 
        sort_columns = [], 
        wait_for_completion = True)
    print(f"Created: autoupdate_snapshot_id = {autoupdate_snapshot_id}")
else:
    print(f"Exists: autoupdate_snapshot_id = {autoupdate_snapshot_id}")
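The existence check can also be written as a small reusable function. This sketch uses the same `autoUpdate` and `id` keys the notebook reads from each view; `find_auto_update_view` is a hypothetical name, and like the loop in the cell it only sees the views that `list_views` returns (up to `max_results=100`).

```python
from typing import Iterable, Mapping, Optional


def find_auto_update_view(views: Iterable[Mapping]) -> Optional[str]:
    """Return the id of the first view flagged autoUpdate, or None if there is none."""
    return next((view["id"] for view in views if view.get("autoUpdate")), None)
```

Usage: `autoupdate_snapshot_id = find_auto_update_view(finspace.list_views(dataset_id=dataset_id, max_results=100))`.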

## Static View
Next, we create a static view of this changeset. Static views require a specific as-of time for creation.

In [None]:
# Create as-of snapshot, use UTC time
as_of = datetime.datetime.utcnow()

asof_snapshot_id = finspace.create_as_of_view(
    dataset_id = dataset_id, 
    as_of_date = as_of,
    destination_type = "GLUE_TABLE",
    partition_columns = [], 
    sort_columns = [], 
    wait_for_completion = True)
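The cell above uses the current time. To create a static view as of some other moment, the time first has to be expressed in UTC. A minimal sketch, assuming `create_as_of_view` expects the same naive UTC form that `datetime.datetime.utcnow()` produces; `to_naive_utc` is a hypothetical helper.

```python
import datetime


def to_naive_utc(moment: datetime.datetime) -> datetime.datetime:
    """Convert a timezone-aware datetime to naive UTC (the form utcnow() returns)."""
    if moment.tzinfo is None:
        # A naive value is ambiguous: we cannot know which zone it was meant in.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(datetime.timezone.utc).replace(tzinfo=None)
```

Usage (Python 3.9+): `as_of = to_naive_utc(datetime.datetime(2021, 3, 1, 16, 0, tzinfo=zoneinfo.ZoneInfo("America/New_York")))` after `import zoneinfo`.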

In [None]:
# the dataset details should now reflect both views (auto-update and as-of)
pd.DataFrame.from_dict( finspace.describe_dataset_details(dataset_id) )

In [None]:
# Complete
print(f"""
dataset_id             = '{dataset_id}'
changeset_id           = '{changeset_id}'
asof_snapshot_id       = '{asof_snapshot_id}'
autoupdate_snapshot_id = '{autoupdate_snapshot_id}'""")

In [None]:
import datetime
print( f"Last Run: {datetime.datetime.now()}" )