## Python API for DataMart

This notebook showcases how to use the Python API for the DataMart system. For the augmentation, we use the taxi demand example from MIT-LL, available here: https://gitlab.datadrivendiscovery.org/MIT-LL/phase_2/data_augmentation_track_seed/da_seed_ny_taxi_demand_prediction

The Python API is available on GitLab (https://gitlab.com/ViDA-NYU/datamart/datamart/tree/master/lib_client). To install it from source, run `python setup.py install` in that directory. The API is also available through pip (https://pypi.org/project/datamart/): run `pip install datamart`.

In [1]:
from d3m.container import Dataset
import datamart
import rest
from io import BytesIO
import os
import pandas as pd
from pprint import pprint

We start by loading the taxi demand data.

In [2]:
taxi_demand_uri = 'file://' + os.path.abspath('data/ny_taxi_demand_prediction.csv')
taxi_demand = Dataset.load(taxi_demand_uri)
taxi_demand['learningData'].head()

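|   | d3mIndex | tpep_pickup_datetime | num_pickups |
|---|----------|----------------------|-------------|
| 0 | 0 | 2018-01-01 00:00:00 | 67 |
| 1 | 1 | 2018-01-01 01:00:00 | 8 |
| 2 | 2 | 2018-01-01 02:00:00 | 0 |
| 3 | 3 | 2018-01-01 03:00:00 | 0 |
| 4 | 4 | 2018-01-01 04:00:00 | 7 |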
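
Concatenating `'file://'` with an absolute path works on POSIX systems, but it does not percent-encode special characters and produces a malformed URI on Windows. As a possible alternative (a sketch using only the standard library, not part of the DataMart API), `pathlib` can build the URI. Whether `Dataset.load` treats percent-encoded paths identically is an assumption that should be checked.

```python
from pathlib import Path

# Resolve the relative path against the current working directory, then
# convert it to a file:// URI (special characters are percent-encoded).
taxi_demand_path = Path('data/ny_taxi_demand_prediction.csv').resolve()
taxi_demand_uri = taxi_demand_path.as_uri()

print(taxi_demand_uri)
```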


### Searching for Datasets

Let's use DataMart to search for weather datasets that can be used to augment the taxi demand dataset.

In [3]:
dm = rest.RESTDatamart('http://localhost:8002')

In [4]:
cursor = dm.search_with_data(
    query=datamart.DatamartQuery(
        keywords=['weather'],
        variables=[],
    ),
    supplied_data=taxi_demand,
)

In [None]:
query_results = cursor.get_next_page()
while query_results is not None:
    for result in query_results:
        print('--------')
        print(result)
        print(result.get_metadata())
        print(result.get_augment_hint())
    print('========')
    query_results = cursor.get_next_page()
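
The paging loop above can be wrapped in a small generator so that callers iterate over results without handling pages themselves. This is a minimal sketch: `iter_results` is a helper name introduced here, and `FakeCursor` is a hypothetical stand-in that only mimics the `get_next_page()` behavior seen in this notebook (a list per page, then `None`).

```python
from typing import Any, Iterator, List, Optional


def iter_results(cursor: Any) -> Iterator[Any]:
    """Yield every result from a cursor, fetching pages until None is returned."""
    page = cursor.get_next_page()
    while page is not None:
        yield from page
        page = cursor.get_next_page()


class FakeCursor:
    """Hypothetical stand-in for a DataMart cursor, used only for this demo."""

    def __init__(self, pages: List[List[str]]) -> None:
        self._pages = list(pages)

    def get_next_page(self) -> Optional[List[str]]:
        # Return the next page, or None once all pages are consumed.
        return self._pages.pop(0) if self._pages else None


demo_cursor = FakeCursor([['weather-a', 'weather-b'], ['weather-c']])
all_results = list(iter_results(demo_cursor))
print(all_results)
```

With a real cursor, the same helper would be used as `for result in iter_results(cursor): print(result.get_metadata())`.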

The first dataset has a join score of 1.0 between the columns `tpep_pickup_datetime` (from the taxi demand dataset) and `DATE` (from the query result dataset).
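
This notebook does not say how DataMart computes the join score. Purely as an intuition, and assuming a containment-style measure (an assumption, not DataMart's documented method), a score of 1.0 would mean that every pickup timestamp finds a matching date in the weather data. The `weather_dates` values and the `containment_score` helper below are made up for illustration.

```python
from datetime import datetime

# Hourly pickup timestamps, as in the taxi demand sample above.
pickup_times = [
    '2018-01-01 00:00:00',
    '2018-01-01 01:00:00',
    '2018-01-01 02:00:00',
]
# Hypothetical daily DATE values from a weather dataset (invented for this demo).
weather_dates = {'2018-01-01', '2018-01-02'}


def containment_score(timestamps, dates):
    """Fraction of timestamps whose calendar date appears in `dates`."""
    if not timestamps:
        return 0.0
    matched = sum(
        datetime.strptime(ts, '%Y-%m-%d %H:%M:%S').date().isoformat() in dates
        for ts in timestamps
    )
    return matched / len(timestamps)


score = containment_score(pickup_times, weather_dates)
print(score)
```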

Alternatively, we can send the path of the dataset instead of the data itself...

In [None]:
cursor = dm.search_with_data(
    query=datamart.DatamartQuery(
        keywords=['weather'],
        variables=[],
    ),
    supplied_data=taxi_demand_uri,
)

... and we get the same results:

In [None]:
query_results = cursor.get_next_page()
while query_results is not None:
    for result in query_results:
        print('--------')
        print(result)
        print(result.get_metadata())
        print(result.get_augment_hint())
    print('========')
    query_results = cursor.get_next_page()

### Downloading a Dataset

Now let's materialize the weather dataset, so that users can inspect the data before augmenting, or perform the augmentation themselves.

In [None]:
cursor = dm.search_with_data(
    query=datamart.DatamartQuery(
        keywords=['weather'],
        variables=[],
    ),
    supplied_data=taxi_demand_uri,
)
result = cursor.get_next_page()[0]
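
Indexing with `[0]` raises an error if the search matched nothing, since `get_next_page()` returns `None` when there are no more pages. A defensive variant might look like the sketch below; `first_result` and the two demo cursor classes are hypothetical names introduced here, not part of the DataMart API.

```python
from typing import Any, Optional


def first_result(cursor: Any) -> Optional[Any]:
    """Return the first result of the first page, or None if there is none."""
    page = cursor.get_next_page()
    if not page:  # covers both None and an empty page
        return None
    return page[0]


class EmptyCursor:
    """Hypothetical cursor whose search matched nothing (demo only)."""

    def get_next_page(self):
        return None


class OnePageCursor:
    """Hypothetical cursor with a single page of results (demo only)."""

    def get_next_page(self):
        return ['weather-a', 'weather-b']


print(first_result(EmptyCursor()))
print(first_result(OnePageCursor()))
```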

You can inspect the metadata even before downloading:

In [None]:
result.get_metadata().pretty_print()

And download the dataset as a D3M object, suitable for primitives:

In [None]:
weather_data = result.download(taxi_demand_uri)

weather_data['learningData'].head()

In [None]:
weather_data.metadata.pretty_print()

### Augmenting a Dataset

Now let's augment the taxi demand data with the first query result.

In [None]:
augmented_data = result.augment(taxi_demand_uri)

augmented_data['learningData'].head()
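
Conceptually, an augmentation of this kind resembles a left join: every taxi row is kept, and columns from the matching weather row are appended. The sketch below illustrates that idea with plain Python, assuming (this is not confirmed by the notebook) that hourly pickups are matched to daily weather rows by calendar date. The `temperature` column, its value, and the third taxi row are invented for illustration, and `left_join_on_date` is a hypothetical helper, not what `result.augment` does internally.

```python
from datetime import datetime

taxi_rows = [
    {'d3mIndex': 0, 'tpep_pickup_datetime': '2018-01-01 00:00:00', 'num_pickups': 67},
    {'d3mIndex': 1, 'tpep_pickup_datetime': '2018-01-01 01:00:00', 'num_pickups': 8},
    # Invented row with a date missing from the weather data, to show a non-match.
    {'d3mIndex': 2, 'tpep_pickup_datetime': '2018-01-03 00:00:00', 'num_pickups': 5},
]
# Invented weather values, keyed by date.
weather_by_date = {'2018-01-01': {'temperature': -8.5}}


def left_join_on_date(rows, lookup, extra_columns):
    """Append `extra_columns` from `lookup` to each row, matching on pickup date."""
    joined = []
    for row in rows:
        stamp = datetime.strptime(row['tpep_pickup_datetime'], '%Y-%m-%d %H:%M:%S')
        match = lookup.get(stamp.date().isoformat(), {})
        # Unmatched rows are kept, with None in the added columns.
        extra = {col: match.get(col) for col in extra_columns}
        joined.append({**row, **extra})
    return joined


augmented_rows = left_join_on_date(taxi_rows, weather_by_date, ['temperature'])
for row in augmented_rows:
    print(row)
```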