# Exporting notebooks from ADB

This notebook does two things:
1. It recursively exports a folder as a DBC archive.
1. It exports every notebook in that folder, one by one, as a Jupyter notebook.

We start with the setup.

In [1]:
path = 'notebooks' # the path to folder in your ADB workspace

region = 'westus'
username = 'wopauli@microsoft.com' 

We load the personal access token we created in ADB. We read it from a file instead of hard-coding it here, to reduce the odds of accidentally exposing it.

In [2]:
with open('token.txt', 'r') as f:
    token = f.read().strip()
    
headers = {
    'Authorization': 'Bearer %s' % token
}
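As a variation on the cell above, the token could also come from an environment variable, with the file as a fallback. This is only a sketch: the variable name `DATABRICKS_TOKEN` and the helpers `load_token` and `make_headers` are hypothetical names chosen for illustration, not something ADB requires.

```python
import os


def load_token(env_var="DATABRICKS_TOKEN", token_file="token.txt"):
    """Return the access token from an environment variable, else from a file.

    Both the variable name and the file name are assumptions for this sketch.
    """
    token = os.environ.get(env_var)
    if token:
        return token.strip()
    with open(token_file, "r") as f:
        return f.read().strip()


def make_headers(token):
    """Build the same Authorization header used in the cell above."""
    return {"Authorization": "Bearer %s" % token}
```

With this in place, `headers = make_headers(load_token())` would replace the cell above, and `token.txt` is only read when the environment variable is not set.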

Next we download the entire DBC archive. This serves two purposes:
1. We have it exported.
1. We will list its contents so that we can export jupyter notebooks one by one.

In [3]:
import requests

url = 'https://%s.azuredatabricks.net/api/2.0/workspace/export?path=/Users/%s/%s&direct_download=true&format=DBC' % (region, username, path)

print("Starting export of DBC archive. This might take a while, depending on your connection.")
r = requests.get(url=url, headers=headers)
print("Done.")

if r.ok:
    print("Writing to file.")
    with open(path + '.dbc', 'wb') as f:
        f.write(r.content)
else:
    print("Downloading notebook archive failed (HTTP status %s)" % r.status_code)

Starting export of DBC archive. This might take a while, depending on your connection.
Done.
Writing to file.
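The export URL above is assembled by string formatting, so a folder name containing spaces or other special characters would not be escaped. A minimal sketch of one way to handle that is shown below. The helper names `export_url` and `download` are hypothetical; the endpoint and query parameters are the ones used in the cell above.

```python
from urllib.parse import urlencode


def export_url(region, username, path, fmt="DBC"):
    """Build the workspace export URL with properly escaped query parameters."""
    base = "https://%s.azuredatabricks.net/api/2.0/workspace/export" % region
    query = urlencode({
        "path": "/Users/%s/%s" % (username, path),
        "direct_download": "true",
        "format": fmt,
    })
    return base + "?" + query


def download(url, headers, timeout=60):
    """Fetch the export and return its bytes, raising on an HTTP error."""
    import requests  # third-party; imported lazily so export_url works without it

    r = requests.get(url=url, headers=headers, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return r.content
```

Raising on failure, rather than printing a message, stops a later cell from running against a missing or stale archive.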


We list the notebooks contained in the archive.

In [4]:
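import zipfile

path_to_zip_file = path + '.dbc'  # the archive written in the previous step

# The archive can be read as a zip file; list its entries and close it again.
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    files = zip_ref.namelist()

# Keep only the entries stored with a '.python' extension.
notebooks = [x for x in files if x.endswith('.python')]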
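The filter above keeps only entries ending in `.python`. To check whether the archive contains anything the filter skips, the entries can be grouped by extension first. The sketch below uses a hypothetical helper, `entries_by_extension`; which other extensions show up in a given archive is not something this notebook establishes.

```python
import os
import zipfile
from collections import defaultdict


def entries_by_extension(archive):
    """Group the file entries of a zip archive by their extension.

    `archive` may be a path or a file-like object, as accepted by ZipFile.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(archive, "r") as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            groups[os.path.splitext(name)[1]].append(name)
    return dict(groups)
```

Calling `entries_by_extension(path + '.dbc')` and printing the keys gives a quick overview before deciding which entries to export.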

We iterate through the notebooks and export each one as a Jupyter notebook.

In [5]:
import os

for notebook in notebooks:
    notebook = os.path.splitext(notebook)[0]
    print("Working on: %s" % notebook)
    url = 'https://%s.azuredatabricks.net/api/2.0/workspace/export?path=/Users/%s/%s&direct_download=true&format=JUPYTER' % (region, username, notebook)

    r = requests.get(url=url, headers=headers)
    if r.ok:
        notebook_path, ipynb_notebook = os.path.split(notebook + ".ipynb")
        
        # Recreate the archive's folder structure locally before writing.
        if notebook_path:
            os.makedirs(notebook_path, exist_ok=True)
            
        with open(os.path.join(notebook_path, ipynb_notebook), 'wb') as f:
            f.write(r.content)
    else:
        print("Failed: %s" % notebook)

Working on: notebooks/tests/run_notebooks
Working on: notebooks/day_1/04_hyperparameter_tuning
Working on: notebooks/day_1/03_sentiment_analysis
Working on: notebooks/day_1/05_structured_streaming
Working on: notebooks/day_1/02_feature_engineering
Working on: notebooks/day_1/01_introduction
Working on: notebooks/includes/mnt_blob
Working on: notebooks/includes/mnt_blob_rw
Working on: notebooks/day_2/06_deployment
Working on: notebooks/day_2/05_automated_ML
Working on: notebooks/day_2/03_aml_getting_started
Working on: notebooks/day_2/02_random_forests
Working on: notebooks/day_2/01_logistic_regression
Working on: notebooks/day_2/04_ml_experimentation
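The mapping from an archive entry to a local `.ipynb` file can also be pulled out of the loop into small functions, which makes it easier to export into a separate folder. This is a sketch only; `local_ipynb_path`, `write_notebook`, and the `out_dir` parameter are hypothetical names introduced for illustration.

```python
import os
import posixpath


def local_ipynb_path(entry, out_dir="."):
    """Map an archive entry such as 'a/b/nb.python' to '<out_dir>/a/b/nb.ipynb'."""
    stem = posixpath.splitext(entry)[0]  # archive entries use '/' separators
    return os.path.join(out_dir, *stem.split("/")) + ".ipynb"


def write_notebook(entry, content, out_dir="."):
    """Write downloaded bytes to the local path derived from the entry name."""
    target = local_ipynb_path(entry, out_dir)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(content)
    return target
```

Inside the loop, `write_notebook(entry, r.content)` would then replace the path handling and the `open` call, where `entry` is the original archive name including its `.python` extension.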


In [6]:
notebook_path

'notebooks/includes'

Consider using the following command to clear the output of an exported notebook (replace `Notebook.ipynb` with the notebook's path, and repeat for each notebook).

*Note:* this may require `git bash` or `bash`, and may not work in vania

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace Notebook.ipynb
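If running the command once per file is inconvenient, a rough equivalent can be written in Python with only the standard library. The sketch below assumes the exported files use the version 4 notebook layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `clear_outputs` and `clear_outputs_in_tree` are hypothetical helper names, and this is not a replacement for what `nbconvert` does in every case.

```python
import json
import os


def clear_outputs(nb):
    """Remove outputs and execution counts from the code cells of a notebook dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clear_outputs_in_tree(root):
    """Clear outputs in place for every .ipynb file under `root`."""
    cleared = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".ipynb"):
                continue
            nb_path = os.path.join(dirpath, name)
            with open(nb_path, "r", encoding="utf-8") as f:
                nb = json.load(f)
            clear_outputs(nb)
            with open(nb_path, "w", encoding="utf-8") as f:
                json.dump(nb, f, indent=1)
                f.write("\n")
            cleared.append(nb_path)
    return cleared
```

For example, `clear_outputs_in_tree('notebooks')` would process every exported notebook in the folder created above and return the list of files it rewrote.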