In [1]:
# hide
%load_ext autoreload
%autoreload 2
%load_ext nb_black
%load_ext lab_black

In [2]:
# hide
from nbdev.showdoc import *

# Google Cloud Storage (GCS)

All `Downloaders` and `Submittors` support Google Cloud Storage (GCS).

__Credentials are detected automatically, in the following order:__
1. The environment variable `GOOGLE_APPLICATION_CREDENTIALS` is set and points to a valid `.json` file.

2. (Fallback 1) You have a valid Cloud SDK installation.

3. (Fallback 2) The machine running the code is a GCP machine.
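The lookup order above follows Google's Application Default Credentials scheme. The sketch below only illustrates that order. `detect_credential_source` is a hypothetical helper, not part of numerblox. It assumes the Linux/macOS location of the Cloud SDK credentials file, and it takes the "GCP machine" check as a flag instead of querying the metadata server. In real code, `google.auth.default()` from the `google-auth` package performs this discovery for you.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def detect_credential_source(
    env: Mapping[str, str], home: Path, on_gcp_machine: bool = False
) -> Optional[str]:
    """Hypothetical helper: report which documented credential source would be picked."""
    # 1. Explicit key file referenced by GOOGLE_APPLICATION_CREDENTIALS.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and key_path.endswith(".json") and Path(key_path).is_file():
        return "environment_variable"
    # 2. Fallback 1: application-default credentials written by the Cloud SDK
    #    (`gcloud auth application-default login`). Linux/macOS path assumed.
    adc_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc_file.is_file():
        return "cloud_sdk"
    # 3. Fallback 2: a GCP machine. The real check queries the metadata server;
    #    here it is simply passed in as a flag.
    if on_gcp_machine:
        return "gcp_machine"
    return None


# Usage against the current process environment and home directory.
print(detect_credential_source(os.environ, Path.home()))
```

Note that an environment variable pointing to a missing file does not stop the search; the sketch simply moves on to the next source.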

In [3]:
from nbdev import show_doc
from numerblox.download import NumeraiClassicDownloader, BaseIO

## Example usage

To use GCS, you should:
1. Instantiate a `Downloader` or `Submittor`.

2a. For single files, call `.upload_file_to_gcs` or `.download_file_from_gcs`.

2b. For directories, call `.upload_directory_to_gcs` or `.download_directory_from_gcs`.
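For intuition, a single-file transfer with the official `google-cloud-storage` client comes down to one `Blob` call in each direction. The sketch below is an assumption about what such a call involves, not numerblox's implementation of `.upload_file_to_gcs` or `.download_file_from_gcs`. The helper names are hypothetical, and `bucket` is expected to be a `google.cloud.storage.Bucket` (or any object with the same `blob()` method).

```python
from pathlib import Path


def upload_file_to_bucket(bucket, local_path: str, blob_name: str) -> str:
    """Copy one local file to the object `blob_name` in a google.cloud.storage.Bucket."""
    bucket.blob(blob_name).upload_from_filename(local_path)
    return blob_name


def download_file_from_bucket(bucket, blob_name: str, local_path: str) -> Path:
    """Copy one object to `local_path`, creating missing parent folders first."""
    dest = Path(local_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    bucket.blob(blob_name).download_to_filename(str(dest))
    return dest


# Usage with a real bucket (requires google-cloud-storage and valid credentials):
# from google.cloud import storage
# bucket = storage.Client().bucket("test")
# upload_file_to_bucket(bucket, "round_n/inference/numerai_tournament_data.parquet",
#                       "round_n/inference/numerai_tournament_data.parquet")
```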

#### 1a. Downloading Numerai Classic inference data and uploading to GCS

In [4]:
# slow
# This should point to a valid GCS bucket within your Google Cloud environment.
bucket_name = "test"

# Get inference data for current round
downloader = NumeraiClassicDownloader("round_n")
downloader.download_inference_data("inference", version=2, int8=False)

2022-02-18 13:48:16,714 INFO numerapi.utils: starting download
round_n/inference/numerai_tournament_data.parquet: 582MB [02:18, 4.20MB/s]                             


All downloaded data can be uploaded to a GCS bucket with a single line of code. The call below is commented out; uncomment it once `bucket_name` points to a bucket you have access to.

In [5]:
# Upload inference data for most recent round to GCS
# downloader.upload_directory_to_gcs(bucket_name=bucket_name, gcs_path="round_n")
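GCS has a flat namespace: a "directory" is just a shared, `/`-separated prefix in object names. A directory upload therefore amounts to walking the local folder and uploading each file under `gcs_path/<relative path>`. The sketch below assumes that layout; it is not the numerblox source, and `plan_directory_upload` and `upload_directory` are hypothetical names built on the `google-cloud-storage` `Blob.upload_from_filename` call.

```python
from pathlib import Path
from typing import List, Tuple


def plan_directory_upload(local_dir: str, gcs_path: str) -> List[Tuple[Path, str]]:
    """Pair every file under `local_dir` with an object name under `gcs_path`."""
    root = Path(local_dir)
    prefix = gcs_path.strip("/")
    pairs = []
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        # "Folders" in GCS are only '/'-separated name prefixes.
        parts = ([prefix] if prefix else []) + list(file.relative_to(root).parts)
        pairs.append((file, "/".join(parts)))
    return pairs


def upload_directory(bucket, local_dir: str, gcs_path: str) -> List[str]:
    """Upload each planned file through the google-cloud-storage Blob API."""
    uploaded = []
    for file, blob_name in plan_directory_upload(local_dir, gcs_path):
        bucket.blob(blob_name).upload_from_filename(str(file))
        uploaded.append(blob_name)
    return uploaded


# Usage (requires google-cloud-storage and valid credentials):
# from google.cloud import storage
# upload_directory(storage.Client().bucket("test"), "round_n", "round_n")
```

Separating the planning step from the upload keeps the path mapping testable without network access.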

#### 1b. Downloading inference data from a GCS bucket

Conversely, a directory stored in a GCS bucket can be downloaded to your local machine. It will be stored in the base directory specified when you instantiated `downloader` (here `round_n`).

In [6]:
# Download data from bucket to local directory
# downloader.download_directory_from_gcs(bucket_name=bucket_name, gcs_path="round_n")
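The reverse direction lists every object under the prefix and mirrors it into the local base directory. The sketch below is a hypothetical `download_directory` helper using `Bucket.list_blobs(prefix=...)` and `Blob.download_to_filename` from `google-cloud-storage`. The local layout (object paths relative to `gcs_path`, placed directly under the base directory) is an assumption, not a statement about numerblox's behaviour.

```python
from pathlib import Path
from typing import List


def download_directory(bucket, gcs_path: str, base_dir: str) -> List[Path]:
    """Mirror every object under `gcs_path` into `base_dir`, keeping relative paths."""
    prefix = gcs_path.strip("/")
    # A trailing '/' keeps "round_n" from also matching "round_n2/...".
    prefix = prefix + "/" if prefix else ""
    written = []
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        dest = Path(base_dir) / blob.name[len(prefix):]
        dest.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(dest))
        written.append(dest)
    return written


# Usage (requires google-cloud-storage and valid credentials):
# from google.cloud import storage
# download_directory(storage.Client().bucket("test"), "round_n", "round_n")
```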

We hope you enjoyed this short example of working with GCS buckets from `Downloaders` and `Submittors` in this framework. Under the hood, all of this logic is handled by `BaseIO`.

In [7]:
#hide_input
show_doc(BaseIO)

<h2 id="BaseIO" class="doc_header"><code>class</code> <code>BaseIO</code><a href="https://github.com/crowdcent/numerai_blocks/tree/main/numerai_blocks/download.py#L22" class="source_link" style="float:right">[source]</a></h2>

> <code>BaseIO</code>(**`directory_path`**:`str`) :: `ABC`

Basic functionality for IO (downloading and uploading).
:param directory_path: Base folder for IO. Will be created if it does not exist.
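The docstring describes an abstract base class that owns a base folder and creates it if needed. A minimal sketch of that pattern follows, assuming only what the docstring states. `SketchIO`, `_dest_path` and `DummyDownloader` are hypothetical names, and the dummy subclass writes a placeholder file instead of calling any Numerai API.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchIO(ABC):
    """Hypothetical stand-in for BaseIO: every subclass owns one base folder."""

    def __init__(self, directory_path: str):
        self.dir = Path(directory_path)
        # Documented behaviour: the base folder is created if it does not exist.
        self.dir.mkdir(parents=True, exist_ok=True)

    def _dest_path(self, subfolder: str, filename: str) -> Path:
        """Resolve a file location inside the base folder, creating the subfolder."""
        folder = self.dir / subfolder
        folder.mkdir(parents=True, exist_ok=True)
        return folder / filename

    @abstractmethod
    def download_inference_data(self, subfolder: str) -> Path:
        """Each concrete downloader decides where its data comes from."""


class DummyDownloader(SketchIO):
    """Writes a placeholder file instead of calling any Numerai API."""

    def download_inference_data(self, subfolder: str) -> Path:
        path = self._dest_path(subfolder, "inference_placeholder.txt")
        path.write_text("placeholder")
        return path
```

Putting the folder handling in the base class means every subclass, and any GCS helper built on top of it, resolves paths against the same base directory.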

Your local environment can be cleaned up with a single line of code. This is convenient when you are done with inference and want to delete the downloaded inference data automatically.

In [8]:
# Clean up environment
downloader.remove_base_directory()
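A cleanup step like this most likely comes down to a recursive delete of the base folder. The sketch below assumes exactly that, using `shutil.rmtree` from the standard library. The standalone `remove_base_directory` function is a hypothetical stand-in for the method called above, not its actual source.

```python
import shutil
from pathlib import Path


def remove_base_directory(directory_path: str) -> bool:
    """Recursively delete the base folder; return False if there was nothing to delete."""
    path = Path(directory_path)
    if not path.is_dir():
        return False
    # Irreversible: removes every downloaded file and subfolder under the base folder.
    shutil.rmtree(path)
    return True
```

Returning a flag instead of raising on a missing folder makes repeated cleanup calls harmless.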

------------------------------------

In [9]:
# hide
# Run this cell to sync all changes with the library
from nbdev.export import notebook2script

notebook2script()

Converted 00_misc.ipynb.
Converted 01_download.ipynb.
Converted 02_numerframe.ipynb.
Converted 03_preprocessing.ipynb.
Converted 04_model.ipynb.
Converted 05_postprocessing.ipynb.
Converted 06_modelpipeline.ipynb.
Converted 07_evaluation.ipynb.
Converted 08_key.ipynb.
Converted 09_submission.ipynb.
Converted 10_staking.ipynb.
Converted index.ipynb.

