# Tutorial - Extract

## Table of contents
1. [What is an Extract?](#whatis)
2. [How to create an Extract?](#howto)
    1. [Local backend](#howto_local)
    2. [S3 backend](#howto_s3)

## 0. Resources needed to run the code below

```python
import os
import sqlite3

import requests
from grizly import QFrame

HOME_DIR = os.path.expanduser("~")
GRIZLY_HOME = os.getenv("GRIZLY_HOME", os.path.join(HOME_DIR, "grizly"))
sqlite_dsn = "example.sqlite"

# download the Chinook sample database; a SQLite file is binary,
# so write the raw bytes (r.content) in binary mode
r = requests.get("https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite")
r.raise_for_status()
with open(sqlite_dsn, "wb") as f:
    f.write(r.content)

# QFrame definition: select five columns from the Track table
tracks = {
    "select": {
        "fields": {
            "TrackId": {"type": "dim"},
            "Name": {"type": "dim"},
            "AlbumId": {"type": "dim"},
            "Composer": {"type": "dim"},
            "UnitPrice": {"type": "num"},
        },
        "table": "Track",
    }
}
tracks_qf = QFrame(dsn=sqlite_dsn, db="sqlite", dialect="mysql").from_dict(tracks)
```
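
A SQLite database is a binary file, so saving it in text mode corrupts it. A quick way to confirm the download worked is to check the file header: every SQLite 3 database starts with the 16 bytes `SQLite format 3\0`. The helper `is_sqlite_file` below is a hypothetical sketch, not part of grizly.

```python
SQLITE_HEADER = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file


def is_sqlite_file(path: str) -> bool:
    """Return True if the file at `path` starts with the SQLite 3 header.

    A missing or unreadable file is reported as False rather than raising.
    """
    try:
        with open(path, "rb") as f:
            return f.read(len(SQLITE_HEADER)) == SQLITE_HEADER
    except OSError:
        return False
```

After the download cell, `is_sqlite_file(sqlite_dsn)` should return `True`.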

```python
print(tracks_qf)
```

```sql
SELECT "TrackId",
       "Name",
       "AlbumId",
       "Composer",
       "UnitPrice"
FROM Track
```
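
To check the generated SQL independently of grizly, you can run the same statement with Python's built-in `sqlite3` module. `TRACKS_SQL` is the statement printed above; `run_query` is a hypothetical helper that returns the column names and the first few rows.

```python
import sqlite3
from typing import List, Tuple

# the statement printed by print(tracks_qf)
TRACKS_SQL = '''SELECT "TrackId",
       "Name",
       "AlbumId",
       "Composer",
       "UnitPrice"
FROM Track'''


def run_query(db_path: str, sql: str, limit: int = 5) -> Tuple[List[str], List[tuple]]:
    """Execute `sql` against the SQLite file at `db_path`.

    Returns the result column names and at most `limit` rows.
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        rows = cur.fetchmany(limit)
    finally:
        conn.close()
    return columns, rows
```

For example, `run_query(sqlite_dsn, TRACKS_SQL)` previews five tracks from the downloaded database.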


## 1. What is an Extract? <a name="whatis"></a>

An Extract is a grizly tool for migrating data between different databases.
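
To make "migrating data between databases" concrete, here is a minimal sketch of the underlying idea using two standard-library SQLite connections: read rows from a source table and write them to a target table in batches. It only illustrates the concept and is not how grizly's Extract is implemented; `copy_table` and its parameters are hypothetical.

```python
import sqlite3
from typing import Sequence


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection,
               table: str, columns: Sequence[str], batch_size: int = 1000) -> int:
    """Copy `columns` of `table` from `src` to `dst` in batches.

    Assumes `dst` already has a table with the same name and columns.
    Returns the number of rows copied.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    read = src.execute(f'SELECT {col_list} FROM "{table}"')
    insert_sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})'
    copied = 0
    while True:
        batch = read.fetchmany(batch_size)  # bounded memory: one batch at a time
        if not batch:
            break
        dst.executemany(insert_sql, batch)
        copied += len(batch)
    dst.commit()
    return copied
```

Batching keeps memory use bounded regardless of table size, which is the main concern when moving large tables.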

## 2. How to create an Extract? <a name="howto"></a>

## 2.1 Local backend <a name="howto_local"></a>

### 2.1.1 Create QFrame store

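```python
import logging
import os

from grizly import QFrame

# name of your logger
logger = logging.getLogger("distributed.worker").getChild("dss_extract")

# your data source name; the S3 example below uses the same DSN
dsn = "DenodoPROD"
# GRIZLY_HOME is defined in section 0
qf_store_path = os.path.join(GRIZLY_HOME, "notebooks", "store.json")
# load the "example_subquery" entry of the store into a QFrame
qf = QFrame(dsn=dsn, logger=logger).from_json(qf_store_path, subquery="example_subquery")
```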
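
`from_json` reads the QFrame definition from a JSON file, and the `subquery` argument picks one entry of that file. Assuming the store is a JSON object keyed by subquery name, with each value laid out like the `tracks` dictionary from section 0 (an assumption; compare with a store generated by your grizly version), such a store could be written and read back with the standard `json` module. `save_subquery` and `load_subquery` are hypothetical helpers.

```python
import json
import os
from typing import Any, Dict


def save_subquery(store_path: str, name: str, data: Dict[str, Any]) -> None:
    """Add or overwrite the `name` entry of the JSON store (assumed layout)."""
    store: Dict[str, Any] = {}
    if os.path.exists(store_path):
        with open(store_path) as f:
            store = json.load(f)
    store[name] = data
    with open(store_path, "w") as f:
        json.dump(store, f, indent=4)


def load_subquery(store_path: str, name: str) -> Dict[str, Any]:
    """Return the `name` entry of the JSON store; raises KeyError if missing."""
    with open(store_path) as f:
        return json.load(f)[name]
```

For example, `save_subquery(qf_store_path, "example_subquery", tracks)` would store the section 0 definition under the key loaded above (the `notebooks` directory must already exist).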

### 2.1.2 Create Extract store
See [Extract store documentation](#extract_store) for more information.

## 2.2 S3 backend <a name="howto_s3"></a>

### 2.2.1 Create a folder for assets in S3
We'll be storing all of the assets on S3, so go ahead and create a folder named after your extract in `s3://your_bucket/extracts`.
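
S3 has no real directories: a "folder" is a key prefix, and the S3 console represents an empty folder as a zero-byte object whose key ends in `/`. A minimal sketch using boto3's `put_object`, assuming your AWS credentials are already configured and the `extracts/<extract name>/` layout described above; both helper names are hypothetical.

```python
def extract_folder_key(extract_name: str, root: str = "extracts") -> str:
    """Return the S3 key prefix for an extract's assets, e.g. 'extracts/example/'."""
    name = extract_name.strip("/")
    if not name:
        raise ValueError("extract_name must not be empty")
    return f"{root.strip('/')}/{name}/"


def create_extract_folder(bucket: str, extract_name: str) -> str:
    """Create the zero-byte folder marker in `bucket` and return its key."""
    import boto3  # imported here so extract_folder_key works without boto3 installed

    key = extract_folder_key(extract_name)
    boto3.client("s3").put_object(Bucket=bucket, Key=key)
    return key
```

For example, `create_extract_folder("your_bucket", "example")` creates `s3://your_bucket/extracts/example/`, the folder the store path in the next step points to.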

### 2.2.2 Prepare the driver
In this example we'll be using a regular QFrame:

```python
import logging

from grizly import QFrame, config

# name of your logger
logger = logging.getLogger("distributed.worker").getChild("dss_extract")

# load the driver
dsn = "DenodoPROD"
bucket = config.get_service("s3").get("bucket")
qf_store_path = f"s3://{bucket}/extracts/example/example_qf_store.json"
# load the QFrame definition from the JSON store on S3
qf = QFrame(dsn=dsn, logger=logger).from_json(qf_store_path, subquery="example_subquery")
```

### 2.2.3 Prepare Extract store

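```python
import os

from grizly import Extract  # import path assumed; it may differ between grizly versions

# address of the Dask scheduler that will run the extract
dask_client_str = os.getenv("GRIZLY_DASK_SCHEDULER_ADDRESS")
e = Extract("Direct Sales Summary CSR", qf, store_backend="s3", if_exists="replace", reload_data=False)
e.submit(output_table_type="base", client_str=dask_client_str, refresh_partitions_list=False)
```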
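
The `if_exists` argument presumably controls what happens when the output table already exists. By analogy with pandas `DataFrame.to_sql` (`"fail"`, `"replace"`, `"append"`), and not as a description of grizly's implementation, the three behaviors can be sketched with `sqlite3`; `prepare_target` is a hypothetical helper.

```python
import sqlite3
from typing import Sequence


def prepare_target(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                   if_exists: str = "fail") -> None:
    """Make `table` ready to receive rows, using pandas-style if_exists semantics (assumed)."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"invalid if_exists: {if_exists!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists:
        if if_exists == "fail":
            raise ValueError(f"table {table!r} already exists")
        if if_exists == "append":
            return  # keep the existing table and its rows
        conn.execute(f'DROP TABLE "{table}"')  # "replace": start from an empty table
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.commit()
```

Under this reading, `if_exists="replace"` in the cell above would rebuild the output table on every run instead of adding rows to it.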

## Extract store in depth <a name="extract_store"></a>