# Example: Store and Access Data from a Quilt Repository

* Data Repository: I want collaborators to be able to import my data sets easily.
* Notebook Server: I want collaborators to rerun my Jupyter Notebooks on a notebook server without having to download and save code or data locally on their machines.
* Preserve Data Types: I want to share validated data types with my collaborators (with support for Python type annotations).
* Data Science @Home: the requirements may evolve into a feature for an enterprise analytics platform.
* Quilt is intuitive and fun ...

## Imports 

In [1]:
import quilt
import pandas as pd

# Deleting Remote and Local Repositories

In [2]:
# list the packages in the local repository
quilt.ls()

/Users/stewarta/Library/Application Support/QuiltCli/quilt_packages


In [5]:
quilt.rm("avare/homecredit") # local

Remove avare/homecredit? (y/n) y


In [4]:
quilt.delete("avare/homecredit") # remote

Are you sure you want to delete this package and its entire history? Type 'avare/homecredit' to confirm: avare/homecredit


HTTPResponseException: Package avare/homecredit does not exist

In [6]:
quilt.ls()

/Users/stewarta/Library/Application Support/QuiltCli/quilt_packages


# Create Home Credit Data 

## Create an empty Quilt package

Persisting data in Quilt starts with a user's home directory ("avare"). The home directory is created when you create a Quilt account. We create a package "homecredit" in the user's home directory.

At this stage the package is created only in our local repository; it is not in the registry until it is pushed.

In [7]:
quilt.build("avare/homecredit") # create new, empty package
quilt.ls() # verify created

## Import package into Python

In [9]:
from quilt.data.avare import homecredit 
homecredit # verify

## Create Data Nodes

In [11]:
homecredit._set(['application_train'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/application_train.csv'))
homecredit._set(['bureau_balance'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/bureau_balance.csv'))
homecredit._set(['bureau'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/bureau.csv'))
homecredit._set(['credit_card_balance'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/credit_card_balance.csv'))
homecredit._set(['installments_payments'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/installments_payments.csv'))
homecredit._set(['POS_CASH_balance'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/POS_CASH_balance.csv'))
homecredit._set(['previous_application'], pd.read_csv('/Users/stewarta/Documents/DATA/Home Data/previous_application.csv'))
quilt.build('avare/homecredit', homecredit)
homecredit # verify
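
The seven `_set` calls above differ only in the file name, so they can be driven from a list. The following is a minimal sketch, not part of Quilt: `csv_paths` and `set_nodes` are hypothetical helper names, and the sketch assumes each node is named after its CSV file, as in the cell above. The package object and the reader (for example `pd.read_csv`) are passed in as arguments.

```python
from pathlib import Path

# Node names taken from the cell above; each maps to "<name>.csv" in the data directory.
NODE_NAMES = [
    "application_train",
    "bureau_balance",
    "bureau",
    "credit_card_balance",
    "installments_payments",
    "POS_CASH_balance",
    "previous_application",
]


def csv_paths(data_dir, names=NODE_NAMES):
    """Map each node name to the CSV file expected in data_dir (hypothetical helper)."""
    base = Path(data_dir)
    return {name: base / f"{name}.csv" for name in names}


def set_nodes(pkg, paths, reader):
    """Call pkg._set([name], reader(path)) for every entry and return the names set.

    pkg is the imported package object and reader is a loader such as
    pandas.read_csv; both are supplied by the caller.
    """
    for name, path in paths.items():
        pkg._set([name], reader(path))
    return list(paths)
```

With the objects from this notebook, the call would look like `set_nodes(homecredit, csv_paths('/Users/stewarta/Documents/DATA/Home Data'), pd.read_csv)`, followed by the same `quilt.build('avare/homecredit', homecredit)` as before.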

# Create Heat Temperature Data

In [41]:
quilt.build("avare/htsensor") # create new, empty package
quilt.ls() # verify created

/Users/stewarta/Library/Application Support/QuiltCli/quilt_packages
avare/homecredit               latest               1266d85da94de98efe04a61beb6b028db90ad703155b28b550fbfb79e9b8fb7e
avare/htsensor                 latest               3e659fce4c878d0ad00f6df85ca1aef2aa1bcef25137f030312f01f39d256a82


In [42]:
from quilt.data.avare import htsensor 
htsensor # verify

<GroupNode>
braunschweig
raw

In [43]:
htsensor._set(['raw'], 'data/raw.data')
htsensor._set(['braunschweig'], 'data/produkt_tu_stunde_19510101_20171231_00662.txt')
htsensor

<GroupNode>
braunschweig
raw

## Login to Quilt

In [44]:
quilt.login()

Launching a web browser...
If that didn't work, please visit the following URL: https://pkg.quiltdata.com/login

Enter the code from the webpage: eyJpZCI6ICJmZDhmNGZmZi0yNmE0LTQ5NzktYjcxZS1lZGYwNWQzMmM5ZDIiLCAiY29kZSI6ICJjYTA1NWJmNy1iN2YxLTQ1YjQtOWEzMS03ZGI3MmJkZDQ0YzUifQ==



## Push the new package to the registry

In [14]:
# Store the package in the registry as a public package
quilt.push("avare/homecredit", is_public=True)

Fetching upload URLs from the registry...


  0%|          | 0.00/707M [00:00<?, ?B/s]

Uploading 7 fragments (707098199 bytes)...


100%|██████████| 707M/707M [00:00<00:00, 1.22GB/s] 


Fragment 40cbc65c2981e527b69cbc5fb4f257611af325edac5bd7c02398d698fe6705d2 already uploaded; skipping.
Fragment 6d3d0e19395f145c9e3c67e6a6dd93af825d7373cbd6cbb78b1e48751fa709f7 already uploaded; skipping.
Fragment 2f6e16cadbf4569a6ce78c6efc80b50547a7f85473bbb2a14c6ec27a1cbb9fc5 already uploaded; skipping.
Fragment 387757fda555fa074f213247ea5dad44fbc21c7f36d012a8267f5b3dc7f4770b already uploaded; skipping.
Fragment ab6b04fb562f2fc5b4f4ac7cbb6d7f9595593012c6054d3e829bb58d4eac309f already uploaded; skipping.
Fragment 96bd8bd61b5e0c092a0250446fef02bed339292a849cf4d657960f8360cffc20 already uploaded; skipping.
Fragment 7c37cfd9209cba651804e30c1075f58c61378cace8328a1af3d29b12a44dc8a9 already uploaded; skipping.
Uploading package metadata...
Updating the 'latest' tag...
Push complete. avare/homecredit is live:
https://quiltdata.com/package/avare/homecredit


# Accessing Data from a Quilt Repository

In [38]:
homecredit

<GroupNode>
POS_CASH_balance
application_train
bureau
bureau_balance
credit_card_balance
installments_payments
previous_application

In [39]:
htsensor

<GroupNode>
braunschweig
raw

In [15]:
# Use log() to get the hash for the latest version.
quilt.log("avare/homecredit")

Hash                                                              Pushed               Author  Tags        Versions
1266d85da94de98efe04a61beb6b028db90ad703155b28b550fbfb79e9b8fb7e  2019-04-26 22:45:21  avare   ['latest']  None    


In [2]:
# Install a package or sub-package. `force=True` skips the interactive yes/no prompt if the package already exists.
quilt.install("avare/homecredit", hash="1266d85da94de98efe04a61beb6b028db90ad703155b28b550fbfb79e9b8fb7e", force=True)

Downloading package metadata...
Fragments already downloaded
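
Copying the hash by hand from the `quilt.log` output is error-prone. As a rough sketch, the text of that output can be parsed instead. This assumes the table has been captured as a string (for example, pasted from the cell output) and that it keeps the column layout shown above, which may differ between Quilt versions. `hash_for_tag` is a hypothetical helper, not a Quilt function.

```python
import re

# A data row starts with a long lowercase hexadecimal hash; the header row does not.
HEX_HASH = re.compile(r"[0-9a-f]{32,}")


def hash_for_tag(log_text, tag="latest"):
    """Return the hash of the first log row whose Tags column lists `tag`, else None."""
    for line in log_text.splitlines():
        fields = line.split()
        if not fields or not HEX_HASH.fullmatch(fields[0]):
            continue  # skip the header and blank lines
        if f"'{tag}'" in line:
            return fields[0]
    return None


# Log output copied from the cell above.
sample = """\
Hash                                                              Pushed               Author  Tags        Versions
1266d85da94de98efe04a61beb6b028db90ad703155b28b550fbfb79e9b8fb7e  2019-04-26 22:45:21  avare   ['latest']  None
"""
print(hash_for_tag(sample))
```

The returned string can then be passed as the `hash` argument of `quilt.install`, as in the cell above.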


## Import Data

In [16]:
from quilt.data.avare import homecredit

## Browse

In [18]:
homecredit

<GroupNode>
POS_CASH_balance
application_train
bureau
bureau_balance
credit_card_balance
installments_payments
previous_application

In [46]:
homecredit.application_train().head()

| Unnamed: 0 | SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | ... | FLAG_DOCUMENT_18 | FLAG_DOCUMENT_19 | FLAG_DOCUMENT_20 | FLAG_DOCUMENT_21 | AMT_REQ_CREDIT_BUREAU_HOUR | AMT_REQ_CREDIT_BUREAU_DAY | AMT_REQ_CREDIT_BUREAU_WEEK | AMT_REQ_CREDIT_BUREAU_MON | AMT_REQ_CREDIT_BUREAU_QRT | AMT_REQ_CREDIT_BUREAU_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100002 | 1 | Cash loans | M | N | Y | 0 | 202500.0 | 406597.5 | 24700.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 100003 | 0 | Cash loans | F | N | N | 0 | 270000.0 | 1293502.5 | 35698.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 100004 | 0 | Revolving loans | M | Y | Y | 0 | 67500.0 | 135000.0 | 6750.0 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 100006 | 0 | Cash loans | F | N | Y | 0 | 135000.0 | 312682.5 | 29686.5 | ... | 0 | 0 | 0 | 0 |  |  |  |  |  |  |
| 4 | 100007 | 0 | Cash loans | M | N | Y | 0 | 121500.0 | 513000.0 | 21865.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


## Some `DataNode`s aren't data frames

Some nodes hold data that is not columnar. In this case you get a path to the object (or fragment) on disk instead of a data frame.
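
Code that browses a package therefore has to handle both kinds of result. Here is a minimal sketch, assuming that calling a node (as in `homecredit.application_train()` above) yields either a path or a DataFrame-like object with a `head()` method. `peek` is a hypothetical helper name, not part of Quilt.

```python
import os
from itertools import islice


def peek(node, n=5):
    """Call a data node and return a small preview of whatever it yields."""
    value = node()
    if isinstance(value, (str, os.PathLike)):
        # File-backed node: the value is a path on disk, so read the first n lines as text.
        with open(value, "r", encoding="utf-8", errors="replace") as fh:
            return [line.rstrip("\n") for line in islice(fh, n)]
    # Otherwise assume a DataFrame-like object that provides head().
    return value.head(n)
```

For the packages in this notebook, `peek(htsensor.raw)` would be expected to take the path branch and `peek(homecredit.application_train)` the data frame branch. The text branch assumes the raw file is readable as UTF-8; undecodable bytes are replaced rather than raising an error.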

## Links:

There's more on editing packages [here, in the docs](https://docs.quiltdata.com/edit-a-package.html)