# Moving data to Data.World
Three steps for the maintainer:
* Create a `.env` file to hold the environment variables (including the API token)
* Merge the developer's pull request before running this notebook
* Run this notebook from top to bottom without errors

Background, if you're interested:
* The developer created the `maintainer-config.json` file during testing.
* `maintainer-config.json` cuts down on the maintainer's configuration tasks.
* `maintainer-config.json` describes the basics of the GitHub-to-data.world transfer.
* If `maintainer-config.json` gets corrupted, let a developer know and it will be replaced.
* CSV data is pulled from the `master` branch of the repo.
* The developer submits a pull request to the maintainer.
* The maintainer merges the pull request before running this notebook.
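The notebook reads four keys from `maintainer-config.json`: `gh_file_name`, `dw_table_name`, `dw_title` and `dw_desc`. If the file gets corrupted, a quick check can show what is missing. The sketch below is a hypothetical helper (`missing_config_keys` is not part of the repo) that reports absent or empty keys. The sample values mirror the configuration printed later in this notebook.

```python
import json

# Keys this notebook reads from maintainer-config.json
REQUIRED_KEYS = ("gh_file_name", "dw_table_name", "dw_title", "dw_desc")


def missing_config_keys(text):
    """Return the required keys that are absent or empty in a JSON config string."""
    config = json.loads(text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Example config; values mirror this notebook's printed output
sample_config = json.dumps({
    "gh_file_name": "grb_drains.csv",
    "dw_table_name": "grb_drains",
    "dw_title": "GRB Storm Drains",
    "dw_desc": "Storm Drains of the Grand River Basin, Michigan",
})

print(missing_config_keys(sample_config))                    # []
print(missing_config_keys('{"dw_title": "GRB Storm Drains"}'))
# ['gh_file_name', 'dw_table_name', 'dw_desc']
```

In the maintainer folder you would pass `open('maintainer-config.json').read()` instead of the sample string.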


<a id='env-setup'></a>
## Environment Variables
Create a `.env` file in the maintainer scripts folder (`data.world/scripts/adopt-a-drain/maintainer`).
### Maintainer Environment Variables
Use for production data updates.
* Your data.world read/write token works only if you are a member of data.world/citizenlabs with Manage privileges.

```
cd data.world/scripts/adopt-a-drain/maintainer
echo GH_OWNER=citizenlabsgr >> .env
echo DW_OWNER=citizenlabs >> .env
echo DW_AUTH_TOKEN=your-personal-data-world-read-write-token >> .env
```
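Each `echo ... >> .env` command appends one `KEY=value` line to the file. The notebook relies on python-dotenv to load these lines. The stdlib sketch below is only a simplified illustration of the format: `parse_env` is a hypothetical name, and unlike python-dotenv it ignores quoting, `export` prefixes and variable expansion. Because `>>` appends, running the commands twice leaves duplicate keys. In this sketch the last duplicate wins.

```python
def parse_env(text):
    """Parse simple KEY=value lines; a simplified stand-in for python-dotenv."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()  # later duplicates overwrite earlier ones
    return values


# What the three maintainer echo commands append to .env
env_text = """GH_OWNER=citizenlabsgr
DW_OWNER=citizenlabs
DW_AUTH_TOKEN=your-personal-data-world-read-write-token
"""
print(parse_env(env_text)["DW_OWNER"])  # citizenlabs
```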


### Developer Environment Variables
Use for script development and testing.
* Use your GitHub account by setting GH_OWNER to your GitHub user name
* Use your data.world account by setting DW_OWNER to your data.world owner ID

```
cd data.world/scripts/adopt-a-drain/maintainer
echo GH_OWNER=your-personal-github-user-name >> .env
echo DW_OWNER=your-personal-data-world-owner-id >> .env
echo DW_AUTH_TOKEN=your-personal-data-world-read-write-token >> .env
```
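Both setups define the same three variables, and the notebook's `ENV_ERROR` flag exists to report when they are missing. The following is a minimal sketch of such a check. `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the function accepts any mapping so it can be tried without touching the real environment.

```python
import os

REQUIRED_VARS = ("GH_OWNER", "DW_OWNER", "DW_AUTH_TOKEN")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Developer setup with the token still missing
dev_env = {"GH_OWNER": "your-personal-github-user-name",
           "DW_OWNER": "your-personal-data-world-owner-id"}
print(missing_env_vars(dev_env))  # ['DW_AUTH_TOKEN']
```

After the `.env` file has been loaded, call `missing_env_vars()` with no argument to check the real process environment.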


```python
import json
import os
from pprint import pprint

from IPython.display import Markdown

# project helper that collects the lines each cell renders as Markdown
from lib.p3_ProcessLogger import ProcessLogger
```

```python
# create the logger that collects each cell's Markdown output
cell_log = ProcessLogger()
```
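`lib.p3_ProcessLogger` is not shown in this notebook. Based only on the calls made here (`clear`, `collect`, `getMarkdown`), it behaves roughly like the hypothetical stand-in below, which gathers Markdown lines and joins them for `IPython.display.Markdown`. Treat this as a guess at the interface, not the project's implementation. The camelCase `getMarkdown` is kept to match the notebook's calls.

```python
class MarkdownLog:
    """Hypothetical stand-in for ProcessLogger: collects Markdown lines."""

    def __init__(self):
        self._lines = []

    def clear(self):
        """Forget everything collected so far."""
        self._lines = []

    def collect(self, line):
        """Queue one line of Markdown."""
        self._lines.append(str(line))

    def getMarkdown(self):
        # one item per line so headings and bullets render separately
        return "\n".join(self._lines)


demo_log = MarkdownLog()
demo_log.collect("## Notebook Config")
demo_log.collect("* python-dotenv")
print(demo_log.getMarkdown())
```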

```python
import sys

cell_log.clear()
cell_log.collect("<a id='notebook-config'></a>")
cell_log.collect("## Notebook Config")

# Install pip packages into the interpreter running this Jupyter kernel
# ------------ python-dotenv
cell_log.collect("* python-dotenv")
!{sys.executable} -m pip install python-dotenv
# ------------ data.world API
cell_log.collect("* datadotworld")
!{sys.executable} -m pip install datadotworld[pandas]

# ------------ load .env and check the required environment variables
from dotenv import load_dotenv
load_dotenv()  # in Jupyter, searches the working directory (then its parents) for .env
REQUIRED_ENV = ["GH_OWNER", "DW_OWNER", "DW_AUTH_TOKEN"]
ENV_ERROR = any(not os.getenv(name) for name in REQUIRED_ENV)

if ENV_ERROR:
    cell_log.collect("# Script Failure!!")
    cell_log.collect("# !!! Missing Environment Variables !!!")
    cell_log.collect("### see [Environment Variable Setup](#env-setup)")

Markdown(cell_log.getMarkdown())
```



<a id='notebook-config'></a>
## Notebook Config
* python-dotenv
* datadotworld
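The `!{sys.executable} -m pip install ...` lines run pip with the same Python interpreter as the kernel, so the packages are installed where the notebook can import them. Outside IPython, `!` shell escapes are not available. The following is a sketch of the same idea using `subprocess`. `pip_install` and its `dry_run` flag are hypothetical names.

```python
import subprocess
import sys


def pip_install(package, dry_run=False):
    """Install a package with the pip that belongs to the running interpreter."""
    cmd = [sys.executable, "-m", "pip", "install", package]
    if not dry_run:
        subprocess.check_call(cmd)  # raises CalledProcessError if pip fails
    return cmd


for pkg in ("python-dotenv", "datadotworld[pandas]"):
    print(pip_install(pkg, dry_run=True))
```

Passing the command as a list bypasses the shell. This means a requirement such as `datadotworld[pandas]` is not subject to shell globbing on its square brackets.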

```python
import settings
import helper
import datadotworld as dw  # data.world client library
from datadotworld.client.api import RestApiError

cell_log = ProcessLogger()

# maintainer-config.json holds the file, table, and dataset names for the transfer
with open('maintainer-config.json') as f:
    maintainer_data = json.load(f)

# ------------- data.world target
DW_OWNER = os.getenv("DW_OWNER")
DW_TABLE_NAME = maintainer_data['dw_table_name']
DW_DB_URL = "https://api.data.world/v0/datasets/{}/".format(DW_OWNER)

# ------------- GitHub source (raw CSV files on the master branch)
GH_OWNER = os.getenv("GH_OWNER")
GH_FILE_NAME = maintainer_data['gh_file_name']
GH_URL_CLEAN = "https://raw.githubusercontent.com/{}/data.world/master/clean-data/".format(GH_OWNER)
# e.g. https://raw.githubusercontent.com/citizenlabsgr/data.world/master/clean-data/adopt-a-drain/grb_drains.csv

# dataset id is "<owner>/<title slug>", e.g. "GRB Storm Drains" -> "grb-storm-drains"
dw_dataset_id = DW_OWNER + "/" + maintainer_data['dw_title'].lower().replace('_', '-').replace(' ', '-')

# ------------- configure source csv
app_name = helper.get_app_name()
gh_url = GH_URL_CLEAN + "{}/{}".format(app_name, GH_FILE_NAME)

tbl = {"owner_id": DW_OWNER,
       "app_name": app_name,
       "dw_title": maintainer_data['dw_title'],
       "dw_desc": maintainer_data['dw_desc'],
       "dw_table": DW_TABLE_NAME,
       "dw_dataset_id": dw_dataset_id,
       "dw_url": DW_DB_URL + GH_FILE_NAME,
       "gh_url": gh_url,
       "visibility": "OPEN",
       "license": "Public Domain",
       "files": {GH_FILE_NAME: {"url": gh_url}}}
pprint(tbl)

Markdown(cell_log.getMarkdown())
```

settings
{'app_name': 'adopt-a-drain',
 'dw_dataset_id': 'wilfongjt/grb-storm-drains',
 'dw_desc': 'Storm Drains of the Grand River Basin, Michigan',
 'dw_table': 'grb_drains',
 'dw_title': 'GRB Storm Drains',
 'dw_url': 'https://api.data.world/v0/datasets/wilfongjt/grb_drains.csv',
 'files': {'grb_drains.csv': {'url': 'https://raw.githubusercontent.com/citizenlabsgr/data.world/master/clean-data/adopt-a-drain/grb_drains.csv'}},
 'gh_url': 'https://raw.githubusercontent.com/citizenlabsgr/data.world/master/clean-data/adopt-a-drain/grb_drains.csv',
 'license': 'Public Domain',
 'owner_id': 'wilfongjt',
 'visibility': 'OPEN'}
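You can check the configuration without network access or the `helper` module by building the same dictionary with a pure function. This is a sketch: `build_tbl` is a hypothetical name, and `app_name` is passed in rather than read from `helper.get_app_name()`. Given the values in the output above, it reproduces every printed field.

```python
def build_tbl(config, dw_owner, gh_owner, app_name):
    """Build the transfer settings dict the notebook prints as `tbl`."""
    gh_file = config["gh_file_name"]
    gh_url = "https://raw.githubusercontent.com/{}/data.world/master/clean-data/{}/{}".format(
        gh_owner, app_name, gh_file)
    # dataset id = owner + title slug: lowercase, "_" and " " become "-"
    slug = config["dw_title"].lower().replace("_", "-").replace(" ", "-")
    return {
        "owner_id": dw_owner,
        "app_name": app_name,
        "dw_title": config["dw_title"],
        "dw_desc": config["dw_desc"],
        "dw_table": config["dw_table_name"],
        "dw_dataset_id": dw_owner + "/" + slug,
        "dw_url": "https://api.data.world/v0/datasets/{}/{}".format(dw_owner, gh_file),
        "gh_url": gh_url,
        "visibility": "OPEN",
        "license": "Public Domain",
        "files": {gh_file: {"url": gh_url}},
    }


sample_config = {"gh_file_name": "grb_drains.csv", "dw_table_name": "grb_drains",
                 "dw_title": "GRB Storm Drains",
                 "dw_desc": "Storm Drains of the Grand River Basin, Michigan"}
sample_tbl = build_tbl(sample_config, "wilfongjt", "citizenlabsgr", "adopt-a-drain")
print(sample_tbl["dw_dataset_id"])  # wilfongjt/grb-storm-drains
```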




```python
import interface

# move the cleaned CSV
#   from GitHub (GH_OWNER/data.world, master branch)
#   to data.world (DW_OWNER)
cell_log.clear()

if ENV_ERROR:
    cell_log.collect("# Script Failure!!")
    cell_log.collect("# !!! Missing Environment Variables !!!")
    cell_log.collect("### see [Environment Variable Setup](#env-setup)")
else:
    interface.data_world(tbl, cell_log)
    cell_log.collect("# OK - Done")

Markdown(cell_log.getMarkdown())
```


# Data.World Process
* input: https://raw.githubusercontent.com/citizenlabsgr/data.world/master/clean-data/adopt-a-drain/grb_drains.csv
* drop: wilfongjt/grb-storm-drains
* delay: 6 seconds to delete data
* load: complete
* output: https://api.data.world/v0/datasets/wilfongjt/grb_drains.csv
# OK - Done
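The log above shows the order of operations inside `interface.data_world`: it drops the existing dataset, waits 6 seconds, then loads the dataset again from the GitHub URL. The `interface` module itself is not shown, so the sketch below only mirrors that sequence. The `client` object and its `delete_dataset` / `create_dataset` methods are hypothetical placeholders, not a description of the datadotworld library's API. Passing in `client`, `sleep` and `log` lets you test the sequence with fakes instead of a live account.

```python
import time


def replace_dataset(client, tbl, delay_seconds=6, sleep=time.sleep, log=print):
    """Drop the data.world dataset, wait, then recreate it from the GitHub CSV.

    `client` is a hypothetical object exposing delete_dataset(dataset_id) and
    create_dataset(owner_id, **fields); it is an assumption, not datadotworld's API.
    """
    log("* input: " + tbl["gh_url"])
    client.delete_dataset(tbl["dw_dataset_id"])
    log("* drop: " + tbl["dw_dataset_id"])
    sleep(delay_seconds)  # give data.world time to finish deleting
    log("* delay: {} seconds to delete data".format(delay_seconds))
    client.create_dataset(tbl["owner_id"], title=tbl["dw_title"],
                          description=tbl["dw_desc"], visibility=tbl["visibility"],
                          license=tbl["license"], files=tbl["files"])
    log("* load: complete")
    log("* output: " + tbl["dw_url"])


class FakeClient:
    """Records calls instead of talking to data.world."""

    def __init__(self):
        self.calls = []

    def delete_dataset(self, dataset_id):
        self.calls.append(("delete", dataset_id))

    def create_dataset(self, owner_id, **fields):
        self.calls.append(("create", owner_id, fields["title"]))


csv_url = ("https://raw.githubusercontent.com/citizenlabsgr/data.world/"
           "master/clean-data/adopt-a-drain/grb_drains.csv")
demo_tbl = {"owner_id": "wilfongjt", "dw_dataset_id": "wilfongjt/grb-storm-drains",
            "dw_title": "GRB Storm Drains",
            "dw_desc": "Storm Drains of the Grand River Basin, Michigan",
            "visibility": "OPEN", "license": "Public Domain", "gh_url": csv_url,
            "dw_url": "https://api.data.world/v0/datasets/wilfongjt/grb_drains.csv",
            "files": {"grb_drains.csv": {"url": csv_url}}}
client, waits, lines = FakeClient(), [], []
replace_dataset(client, demo_tbl, sleep=waits.append, log=lines.append)
print(client.calls)
```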