# Accessing Dataflows in Ascend from Python

This notebook demonstrates how to use the 
[Ascend SDK for Python](https://github.com/ascend-io/ascend-python-sdk) to access Dataflows, their components, and the data they produce.

## Configuration

> If you are not already an Ascend user, you can [Request a Free Trial](https://www.ascend.io/get-started/).

This notebook connects to your Ascend environment through a Web API, using a Service Account that you create in the Ascend UI.

If you are using the `trial.ascend.io` environment, you will not be able to create a Service Account for the 
Data Service **Getting Started with Ascend**, so change `DATA_SERVICE` to the `id` of your Data Service.
You may want to [import the `Tutorial` Dataflow](https://developer.ascend.io/docs/exporting-importing) into your Data Service first so that you can follow along with the examples here.

If you are not on `trial.ascend.io`, change `host` to the hostname of your Ascend environment,
and (optionally) modify the `profile` under which your credentials are stored.


In [1]:
host = "trial.ascend.io"
profile = "trial"
DATA_SERVICE = 'Getting_Started_with_Ascend'
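If you run this notebook against more than one environment, editing the cell above each time can get tedious. The following is a minimal sketch of one alternative: reading the same three settings from environment variables and falling back to the trial defaults. The variable names `ASCEND_HOST`, `ASCEND_PROFILE`, and `ASCEND_DATA_SERVICE` and the helper `load_settings` are hypothetical; they are not defined by the Ascend SDK.

```python
import os


def load_settings(env=None):
    """Return the notebook settings, letting environment variables override the defaults.

    The ASCEND_* variable names are assumptions made for this sketch only.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("ASCEND_HOST", "trial.ascend.io"),
        "profile": env.get("ASCEND_PROFILE", "trial"),
        "data_service": env.get("ASCEND_DATA_SERVICE", "Getting_Started_with_Ascend"),
    }


settings = load_settings()
host = settings["host"]
profile = settings["profile"]
DATA_SERVICE = settings["data_service"]
```

With none of the variables set, this yields the same values as the cell above.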

## Credentials

To run this notebook and access Data Feeds from Python, you will need:

  * a Service Account with `READ ONLY` permission
  * an Access Key ID and Secret for that Service Account

You can create these in the Ascend UI by going to **Data Service > Service Accounts**.
If you are using `trial.ascend.io`, create the Service Account in your own Data Service.
Otherwise, select the Data Service **Getting Started with Ascend** and create a Service Account there.

Access Keys should not be stored in a notebook. 
Instead, this notebook will look for them in `~/.ascend/credentials` on the machine where your Jupyter server is running.
Your `~/.ascend/credentials` file should look like this (substitute your Ascend Access Key ID and Secret Access Key):
```
[trial]
ascend_access_key_id=Y0URACC355K3Y1D
ascend_secret_access_key=yourSecret!AccessKeyisthelong1
```

Once you have a `credentials` file, you can read it with `configparser` and
create a `Client` to connect to the host using your credentials.

In [2]:
from ascend.client import Client
import configparser
import os


config = configparser.ConfigParser()
config.read(os.path.expanduser("~/.ascend/credentials"))

access_id = config.get(profile, "ascend_access_key_id")
secret_key = config.get(profile, "ascend_secret_access_key")

A = Client(host, access_id, secret_key)
A

<ascend.client.Client at 0x10eea0110>
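If the `credentials` file or the profile section is missing, `configparser` raises errors that can be hard to interpret. As a sketch only, the helper below (the name `load_credentials` is hypothetical, not part of the Ascend SDK) wraps the same `configparser` calls with clearer messages. It also passes `interpolation=None`, since the default interpolation in `configparser` treats `%` specially and a secret containing that character would otherwise fail to load.

```python
import configparser
import os


def load_credentials(profile, path="~/.ascend/credentials"):
    """Return (access_key_id, secret_access_key) for one profile, or raise a clear error."""
    # interpolation=None keeps '%' characters in a secret from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    # config.read returns the list of files it was able to parse.
    if not config.read(os.path.expanduser(path)):
        raise FileNotFoundError(f"No readable credentials file at {path}")
    if not config.has_section(profile):
        raise KeyError(f"Profile [{profile}] not found in {path}")
    try:
        return (
            config.get(profile, "ascend_access_key_id"),
            config.get(profile, "ascend_secret_access_key"),
        )
    except configparser.NoOptionError as exc:
        raise KeyError(f"Profile [{profile}] is missing {exc.option}") from exc


# Usage, with `host` and `profile` set as in the Configuration section:
# access_id, secret_key = load_credentials(profile)
# A = Client(host, access_id, secret_key)
```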

## Connect to a Data Service

Use `get_data_service` to connect to a Data Service.

You need to provide the `id` of the Data Service you wish to connect to. You can find this `id` in the Ascend UI.

The call fails if the Service Account used to create the `Client` does not have access to the requested Data Service.

In [3]:
ds = A.get_data_service(DATA_SERVICE)
ds

<ascend.model.DataService Getting_Started_with_Ascend>
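Because a wrong `id` and a missing permission both surface as a failed call, it can help to wrap the lookup and report what to check. The sketch below assumes only that `get_data_service` raises some exception on failure; the exact exception type is not documented here, so it catches `Exception` broadly. The helper name `get_data_service_or_explain` is hypothetical.

```python
def get_data_service_or_explain(client, data_service_id):
    """Call client.get_data_service and re-raise failures with a hint about likely causes."""
    try:
        return client.get_data_service(data_service_id)
    except Exception as exc:  # the SDK's specific exception type is an unknown here
        raise RuntimeError(
            f"Could not open Data Service {data_service_id!r}. Check that the id is "
            "correct and that the Service Account has access to it."
        ) from exc


# Usage, with `A` and `DATA_SERVICE` defined as above:
# ds = get_data_service_or_explain(A, DATA_SERVICE)
```

The original exception stays attached as `__cause__`, so the underlying error from the API is still visible in the traceback.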

## List the Dataflows in a Data Service

Once you have a Data Service, you can use `list_dataflows` to list the Dataflows that have been defined within the Data Service.

In [4]:
list(ds.list_dataflows())


[<ascend.model.Dataflow Getting_Started_with_Ascend.IoT_Device_and_Weather_Analysis>,
 <ascend.model.Dataflow Getting_Started_with_Ascend.Financial_Data_from_APIs>,
 <ascend.model.Dataflow Getting_Started_with_Ascend.Tutorial>]

## Get a Dataflow from a Data Service

You can use `get_dataflow` on a Data Service to access a Dataflow within that Data Service.
You will need the `id` of the Dataflow, which you can either locate in the UI or copy from the list of Dataflows above.

In [5]:
df = ds.get_dataflow('Tutorial')
df

<ascend.model.Dataflow Getting_Started_with_Ascend.Tutorial>

You can also access a Dataflow directly from the `Client` by providing the `id`s of both the Data Service and the Dataflow.

In [6]:
A.get_dataflow(DATA_SERVICE, 'Tutorial')

<ascend.model.Dataflow Getting_Started_with_Ascend.Tutorial>

## List Components in a Dataflow

You can use `list_components` on a Dataflow to list the components which have been defined in the Dataflow.

In [7]:
df.list_components()

[<ascend.model.Component Getting_Started_with_Ascend.Tutorial.NYC_Daily_Cab_Rides_per_Vendor type=DataFeed>,
 <ascend.model.Component Getting_Started_with_Ascend.Tutorial.Green_Cab type=ReadConnector>,
 <ascend.model.Component Getting_Started_with_Ascend.Tutorial.Green_Cab_Transform type=Transform>,
 <ascend.model.Component Getting_Started_with_Ascend.Tutorial.Green_Cab_GCS type=WriteConnector>]

## Get a Component from a Dataflow

You can access any component in the Dataflow by using `get_component`.
You will need to provide the `id` of the component, which can be found in the UI or copied from the list of components above.

In [8]:
comp = df.get_component('Green_Cab_Transform')
comp

<ascend.model.Component Getting_Started_with_Ascend.Tutorial.Green_Cab_Transform type=Transform>

You can also get to a Component directly from the `Client`, 
by providing the `id`s of the Data Service and the Dataflow as well as the component.

In [9]:
A.get_component(DATA_SERVICE, 'Tutorial', 'Green_Cab_Transform')


<ascend.model.Component Getting_Started_with_Ascend.Tutorial.Green_Cab_Transform type=Transform>
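The outputs above print each component as a dotted path of the form `DataService.Dataflow.Component`. If you copy such a path, a small helper can split it into the three `id`s that `get_component` on the `Client` expects. This is a sketch that assumes none of the `id`s contains a dot; `split_component_path` is a hypothetical name, not an SDK function.

```python
def split_component_path(path):
    """Split 'DataService.Dataflow.Component' into its three ids.

    Assumes that ids never contain a '.' themselves.
    """
    parts = path.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected 'DataService.Dataflow.Component', got {path!r}")
    return tuple(parts)


data_service_id, dataflow_id, component_id = split_component_path(
    "Getting_Started_with_Ascend.Tutorial.Green_Cab_Transform"
)

# Usage, with `A` defined as above:
# A.get_component(data_service_id, dataflow_id, component_id)
```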

## Read the Records from a Component

Once you have a Component, you can use `get_records` to read the records which the component produces.

The records can be iterated over as they come through the API,
or used to create a Pandas DataFrame.

In [10]:
for row in comp.get_records():
    print(row["pickup_date"], end=' ', flush=True)

2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05 2016-01-06 2016-01-07 2016-01-08 2016-01-09 2016-01-10 2016-01-11 2016-01-12 2016-01-13 2016-01-14 2016-01-15 2016-01-16 2016-01-17 2016-01-18 2016-01-19 2016-01-20 2016-01-21 2016-01-22 2016-01-23 2016-01-24 2016-01-25 2016-01-26 2016-01-27 2016-01-28 2016-01-29 2016-01-30 2016-01-31 2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05 2016-01-06 2016-01-07 2016-01-08 2016-01-09 2016-01-10 2016-01-11 2016-01-12 2016-01-13 2016-01-14 2016-01-15 2016-01-16 2016-01-17 2016-01-18 2016-01-19 2016-01-20 2016-01-21 2016-01-22 2016-01-23 2016-01-24 2016-01-25 2016-01-26 2016-01-27 2016-01-28 2016-01-29 2016-01-30 2016-01-31 
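For a large component you may only want to look at the first few records. Assuming `get_records` returns a lazy iterable of dict-like rows (as the loop above suggests), `itertools.islice` can take a preview without reading the whole stream. In this sketch, `fake_records` is a hypothetical stand-in for `comp.get_records()` so that the example runs without an Ascend connection.

```python
from itertools import islice


def preview(records, n=5):
    """Return the first n records without consuming the rest of the iterable."""
    return list(islice(records, n))


def fake_records():
    # Hypothetical stand-in for comp.get_records(): one row per day of January 2016.
    for day in range(1, 32):
        yield {"VendorID": 1, "pickup_date": f"2016-01-{day:02d}"}


first_rows = preview(fake_records(), 3)
print([row["pickup_date"] for row in first_rows])
# prints ['2016-01-01', '2016-01-02', '2016-01-03']
```

With a real component, the call would be `preview(comp.get_records())`.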

In [11]:
import pandas as pd

pd.DataFrame.from_records(comp.get_records())

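|   | VendorID | pickup_date | ride_count |
|---|----------|-------------|------------|
| 0 | 1 | 2016-01-01 | 14629 |
| 1 | 1 | 2016-01-02 | 10679 |
| 2 | 1 | 2016-01-03 | 9813 |
| 3 | 1 | 2016-01-04 | 9989 |
| 4 | 1 | 2016-01-05 | 9946 |
| 5 | 1 | 2016-01-06 | 9724 |
| 6 | 1 | 2016-01-07 | 9754 |
| 7 | 1 | 2016-01-08 | 11754 |
| 8 | 1 | 2016-01-09 | 12677 |
| 9 | 1 | 2016-01-10 | 11287 |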
