<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

In [None]:
#| include: false
from nbdev.showdoc import *

## 0. BaseIO

Downloaders and submitters need several common methods. `BaseIO` implements this shared functionality and serves as the foundation for the abstract base classes `BaseDownloader` and `BaseSubmitter` (the latter is implemented in the `submission` section).

In [1]:
#| echo: false
#| output: asis
show_doc(BaseIO)

---

### BaseIO

>      BaseIO (directory_path:str)

Basic functionality for IO (downloading and uploading).

:param directory_path: Base folder for IO. Will be created if it does not exist.
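
To illustrate the kind of shared functionality `BaseIO` provides, below is a minimal sketch of a hypothetical `MiniBaseIO` class. It only mimics the members exercised in the tests further down (`.dir`, `.is_empty`, `.get_all_files` and `.remove_base_directory()`). It is an assumption for illustration, not the numerblox source.

```python
import shutil
import tempfile
from pathlib import Path


class MiniBaseIO:
    """Hypothetical, simplified stand-in for BaseIO (not the numerblox implementation)."""

    def __init__(self, directory_path: str):
        self.dir = Path(directory_path)
        # Create the base folder (including missing parents) if it does not exist yet.
        self.dir.mkdir(parents=True, exist_ok=True)

    @property
    def get_all_files(self) -> list:
        # Recursively collect every file below the base folder.
        return [p for p in self.dir.rglob("*") if p.is_file()]

    @property
    def is_empty(self) -> bool:
        # True when the base folder contains no entries at all.
        return not any(self.dir.iterdir())

    def remove_base_directory(self) -> None:
        # Delete the base folder together with everything inside it.
        shutil.rmtree(self.dir)


# Try it out inside a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    base_io = MiniBaseIO(f"{tmp}/sketch_io")
    print(base_io.is_empty)  # True
    (base_io.dir / "test.txt").write_text("test")
    print(base_io.is_empty)  # False
    base_io.remove_base_directory()
    print(base_io.dir.exists())  # False
```

Exposing `is_empty` and `get_all_files` as properties matches how the tests below access them (without parentheses).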

## 1. BaseDownloader

`BaseDownloader` is an object which implements logic common to all downloaders.

To implement a new Downloader, you should inherit from `BaseDownloader` and be sure to implement at least methods for `.download_training_data` and `.download_inference_data`.
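
One way to enforce this contract in Python is the `abc` module. The sketch below uses a hypothetical `SketchDownloader` base (an assumption, not the actual `BaseDownloader` source) to show that a subclass which skips one of the two methods cannot be instantiated.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchDownloader(ABC):
    """Hypothetical stand-in for BaseDownloader (not the numerblox source)."""

    def __init__(self, directory_path: str):
        self.dir = Path(directory_path)
        self.dir.mkdir(parents=True, exist_ok=True)

    @abstractmethod
    def download_training_data(self, *args, **kwargs):
        """Download training data into the base folder."""

    @abstractmethod
    def download_inference_data(self, *args, **kwargs):
        """Download inference data into the base folder."""


class IncompleteDownloader(SketchDownloader):
    # Implements only one of the two required methods.
    def download_training_data(self, *args, **kwargs):
        return "training"


try:
    IncompleteDownloader("sketch_dir")
except TypeError as err:
    # Raised before __init__ runs, so no folder is created.
    print(f"Cannot instantiate: {err}")
```

The exact wording of the `TypeError` message differs between Python versions, but the failure happens at instantiation time, which surfaces a missing method early.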

In [2]:
#| echo: false
#| output: asis
show_doc(BaseDownloader)

---

### BaseDownloader

>      BaseDownloader (directory_path:str)

Abstract base class for downloaders.

:param directory_path: Base folder to download files to.

## 2. Numerai Classic

In [3]:
#| echo: false
#| output: asis
show_doc(NumeraiClassicDownloader)

---

### NumeraiClassicDownloader

>      NumeraiClassicDownloader (directory_path:str, *args, **kwargs)

WARNING: Versions 1 and 2 (legacy data) are deprecated. Only supporting version 3+.

Download Numerai Classic data through NumerAPI.

:param directory_path: Base folder to download files to.

All `*args` and `**kwargs` are passed to the NumerAPI initialization.

In [None]:
#| eval: false
import os
from pathlib import PosixPath

from rich import print as rich_print

test_dir_classic = "test_numclassic_general"
numer_classic_downloader = NumeraiClassicDownloader(test_dir_classic)

# Test building class
assert isinstance(numer_classic_downloader.dir, PosixPath)
assert numer_classic_downloader.dir.is_dir()

# Test is_empty
(numer_classic_downloader.dir / "test.txt").write_text("test")
rich_print(f"Directory contents:\n{numer_classic_downloader.get_all_files}")
assert not numer_classic_downloader.is_empty

# Downloading example data
numer_classic_downloader.download_example_data("test/", version=4, round_num=310)

# Features
feature_stats_test = numer_classic_downloader.get_classic_features()
assert isinstance(feature_stats_test, dict)
assert len(feature_stats_test["feature_sets"]["medium"]) == 472

# Remove contents
numer_classic_downloader.remove_base_directory()
assert not os.path.exists(test_dir_classic)

### 2.1. Example usage

This section explains how to get started quickly with `NumeraiClassicDownloader`.

The more advanced use case of working with GCS (Google Cloud Storage) is discussed in `edu_nbs/google_cloud_storage.ipynb`.

#### 2.1.1. Training data

Training and validation data for Numerai Classic can be downloaded with effectively two lines of code.
Feature statistics and an overview of the feature sets can be retrieved with `.get_classic_features()`.

In [None]:
# Initialization
train_base_directory = "test_numclassic_train"
numer_classic_downloader = NumeraiClassicDownloader(train_base_directory)

# Uncomment line below to download training and validation data
# numer_classic_downloader.download_training_data("train_val", int8=False)

# Get feature overview (dict)
numer_classic_downloader.get_classic_features()

# Remove contents (To clean up environment)
numer_classic_downloader.remove_base_directory()

__For the training example, the directory structure will be:__

In [None]:
#| echo: false
from rich.console import Console
from rich.tree import Tree

console = Console(record=True, width=100)

tree = Tree(
    f":file_folder: {train_base_directory} (base_directory)",
    guide_style="bold bright_black",
)
tree.add(":page_facing_up: features.json")
train_val_tree = tree.add(":file_folder: train_val")
train_val_tree.add(":page_facing_up: numerai_training_data.parquet")
train_val_tree.add(":page_facing_up: numerai_validation_data.parquet")

console.print(tree)

#### 2.1.2. Inference data

Inference data for the most recent round of Numerai Classic can be downloaded with effectively two lines of code.
When you are done with inference, the data can be deleted by calling `.remove_base_directory()`.

In [None]:
# Initialization
inference_base_dir = "test_numclassic_inference"
numer_classic_downloader = NumeraiClassicDownloader(directory_path=inference_base_dir)

# Download tournament (inference) data
numer_classic_downloader.download_inference_data("inference", version=4, int8=True)

# Remove folder when done with inference
numer_classic_downloader.remove_base_directory()

__For the inference example, the directory structure will be:__

In [None]:
#| echo: false
console = Console(record=True, width=100)

tree = Tree(
    f":file_folder: {inference_base_dir} (base_directory)",
    guide_style="bold bright_black",
)
inference_tree = tree.add(":file_folder: inference")
inference_tree.add(":page_facing_up: numerai_tournament_data.parquet")

console.print(tree)

## 3. KaggleDownloader (Numerai Signals)

The Numerai community maintains some excellent datasets on Kaggle for Numerai Signals.

For example, [Katsu1110](https://www.kaggle.com/code1110) maintains a [dataset with yfinance price data](https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals) on Kaggle that is updated daily. `KaggleDownloader` allows you to easily pull data through the Kaggle API. We will be using this dataset in an example below.

For `KaggleDownloader`, `download_inference_data` and `download_training_data` have the same functionality, because there is no way to distinguish training data from inference data in advance for an arbitrary Kaggle dataset.

In [4]:
#| echo: false
#| output: asis
show_doc(KaggleDownloader)

---

### KaggleDownloader

>      KaggleDownloader (directory_path:str)

Download awesome financial data from Kaggle.

For authentication, make sure you have a directory called `.kaggle` in your home directory
containing a `kaggle.json` file with the following structure:

`{"username": USERNAME, "key": KAGGLE_API_KEY}` 

More info on authentication: github.com/Kaggle/kaggle-api#api-credentials 

More info on the Kaggle Python API: kaggle.com/donkeys/kaggle-python-api 

:param directory_path: Base folder to download files to.
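
The credentials file can also be created from Python. Below is a minimal sketch using only the standard library; `write_kaggle_credentials` is a hypothetical helper, not part of numerblox or the Kaggle API. It also restricts the file to owner read/write, as the Kaggle API documentation recommends (`chmod 600`).

```python
import json
import os
import stat
from pathlib import Path
from typing import Optional


def write_kaggle_credentials(username: str, key: str, home: Optional[str] = None) -> Path:
    """Hypothetical helper: write <home>/.kaggle/kaggle.json in the format shown above."""
    home_dir = Path.home() if home is None else Path(home)
    kaggle_dir = home_dir / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    cred_path = kaggle_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    # Owner read/write only (the equivalent of `chmod 600`), since the file holds an API key.
    os.chmod(cred_path, stat.S_IRUSR | stat.S_IWUSR)
    return cred_path


# Example call (commented out so it does not touch your real home directory):
# write_kaggle_credentials("USERNAME", "KAGGLE_API_KEY")
```

The optional `home` argument exists only so the helper can be tried out in a temporary folder.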

The link to Katsu1110's yfinance price dataset is [https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals](https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals). `.download_training_data` takes the dataset slug, i.e. the part of the URL after kaggle.com (`code1110/yfinance-stock-price-data-for-numerai-signals`), as its argument. The full Kaggle dataset is then downloaded and unzipped.
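
If you start from a full URL, the slug can be extracted with the standard library. This is a small sketch with a hypothetical `kaggle_slug` helper. It assumes that dataset URLs either take the form above or include a leading `datasets/` segment (`kaggle.com/datasets/<owner>/<dataset>`), which it strips.

```python
from urllib.parse import urlparse


def kaggle_slug(url: str) -> str:
    """Hypothetical helper: turn a Kaggle dataset URL into an 'owner/dataset' slug."""
    path = urlparse(url).path.strip("/")
    # Some dataset URLs include a leading 'datasets/' segment; drop it if present.
    if path.startswith("datasets/"):
        path = path[len("datasets/"):]
    return path


slug = kaggle_slug("https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals")
print(slug)  # code1110/yfinance-stock-price-data-for-numerai-signals
```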

In [None]:
#| eval: false
home_directory = "test_kaggle_downloader"
kd = KaggleDownloader(home_directory)
kd.download_training_data("code1110/yfinance-stock-price-data-for-numerai-signals")

This Kaggle dataset contains one file called `"full_data.parquet"`.

In [None]:
#| eval: false
list(kd.dir.iterdir())

In [None]:
#| eval: false
import pandas as pd

df = pd.read_parquet(f"{home_directory}/full_data.parquet")
df.head(2)

The folder can be cleaned up when you are done with inference.

In [None]:
#| eval: false
kd.remove_base_directory()

## 4. Pandas Datareader

[pandas-datareader](https://pydata.github.io/pandas-datareader/stable/readers/index.html) is a library maintained by PyData. It offers several backends to retrieve data directly, including [Yahoo! Finance](https://finance.yahoo.com/) and the [FRED database](https://fred.stlouisfed.org/). Our `PandasDataReader` object simplifies pulling training, inference and live data for Numerai Signals pipelines.

In [5]:
#| echo: false
#| output: asis
show_doc(PandasDataReader)

---

### PandasDataReader

>      PandasDataReader (directory_path:str, tickers:list, backend:str='yahoo')

Download financial data using Pandas Datareader.

:param directory_path: Base folder to download files to. 

:param tickers: List of tickers used for downloading. 

:param backend: Data provider you want to use. Yahoo Finance by default. 

Check pydata.github.io/pandas-datareader/stable/readers/index.html to see all data readers.

In [None]:
pdr = PandasDataReader(directory_path="pandas_datareader_test", tickers=['AAPL', 'MSFT', 'NOTATICKER'])

- `.download_training_data` downloads all data from the given start date (a `datetime` object).
- `.download_inference_data` downloads one year of data.
- `.download_live_data` downloads one month of data.
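
The date windows and the filename checked in the next cell can be sketched as follows. Everything here is an assumption for illustration: `download_window` and `parquet_name` are hypothetical helpers, "one year" is taken as 365 days and "one month" as 30 days, and the real `PandasDataReader` may define these windows differently.

```python
from datetime import datetime, timedelta
from typing import Optional


def download_window(kind: str, end: datetime, start: Optional[datetime] = None):
    """Hypothetical helper: return (start, end) for each download type."""
    if kind == "training":
        if start is None:
            raise ValueError("Training downloads need an explicit start date.")
        return start, end
    if kind == "inference":
        return end - timedelta(days=365), end  # assumed: one year = 365 days
    if kind == "live":
        return end - timedelta(days=30), end  # assumed: one month = 30 days
    raise ValueError(f"Unknown download type: {kind!r}")


def parquet_name(backend: str, start: datetime, end: datetime) -> str:
    # Follows the '<backend>_<YYYYMMDD>_<YYYYMMDD>.parquet' filename pattern.
    return f"{backend}_{start:%Y%m%d}_{end:%Y%m%d}.parquet"


today = datetime(2023, 1, 31)
start, end = download_window("training", today, start=datetime(2008, 1, 1))
print(parquet_name("yahoo", start, end))  # yahoo_20080101_20230131.parquet
print(download_window("live", today)[0].date())  # 2023-01-01
```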

In [None]:
from datetime import datetime as dt
from pathlib import Path

pdr.download_training_data(start=dt(year=2008, month=1, day=1))
pdr.download_inference_data()
pdr.download_live_data()

assert Path(f"pandas_datareader_test/yahoo_20080101_{dt.now().strftime('%Y%m%d')}.parquet").is_file()

`.get_live_data()` returns one month of data directly as a `NumerFrame`.

In [None]:
dataf = pdr.get_live_data()

In [None]:
print(dataf.shape)
dataf.head(2)

In [None]:
dataf.tail(2)

In [None]:
import matplotlib.pyplot as plt

dataf[dataf['ticker']=="AAPL"].set_index("date")['Adj Close'].plot()
dataf[dataf['ticker']=="MSFT"].set_index("date")['Adj Close'].plot()
plt.legend(['AAPL', 'MSFT']);
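
Assuming `dataf` is in long format (one row per ticker per date, with `date`, `ticker` and `Adj Close` columns, as the plotting cell above suggests), pivoting it to one column per ticker lets a single `.plot()` call draw every ticker with a matching legend. The sketch below uses a small made-up frame with the same layout rather than real prices.

```python
import pandas as pd

# Toy long-format frame with the same column layout as `dataf` (values are made up).
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03", "2023-01-03"]),
    "ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "Adj Close": [1.0, 2.0, 1.5, 2.5],
})

# One column per ticker, indexed by date.
wide = long_df.pivot(index="date", columns="ticker", values="Adj Close")
print(wide)
# wide.plot() would draw one labelled line per ticker, legend included.
```

With the real data, the equivalent call would be `dataf.pivot(index="date", columns="ticker", values="Adj Close").plot()`. Note that `pivot` raises an error if a `(date, ticker)` pair occurs more than once.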

In [None]:
pdr.remove_base_directory()

## 5. FinnhubDownloader

[Finnhub](https://finnhub.io) is a professional RESTful stock API. Note that this is a paid service: you will need to pass a Finnhub key (string) to use this downloader.

WARNING: Finnhub has its own ticker format. You will need to create your own mapping from this format to other formats, such as Bloomberg tickers. See the [documentation for ticker symbol conventions](https://finnhub.io/docs/api/stock-symbols).
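
A mapping like this can start as a lookup from exchange suffix to Bloomberg exchange code. The sketch below is hypothetical: the two table entries (no suffix for US listings, `.L` for London) are illustrative assumptions that you should verify against Finnhub's symbol documentation before relying on them.

```python
# Hypothetical suffix table: Finnhub-style exchange suffix -> Bloomberg exchange code.
# Both entries are assumptions for illustration; verify them against Finnhub's docs.
FINNHUB_TO_BLOOMBERG_EXCHANGE = {
    "": "US",   # assumed: US listings carry no suffix (e.g. "AAPL")
    "L": "LN",  # assumed: London listings such as "VOD.L"
}


def finnhub_to_bloomberg(ticker: str, exchange_map: dict = FINNHUB_TO_BLOOMBERG_EXCHANGE) -> str:
    """Convert e.g. 'VOD.L' -> 'VOD LN'. Raises KeyError for unmapped suffixes."""
    symbol, _, suffix = ticker.rpartition(".")
    if not symbol:  # no '.' in the ticker at all
        symbol, suffix = ticker, ""
    return f"{symbol} {exchange_map[suffix]}"


print(finnhub_to_bloomberg("AAPL"))   # AAPL US
print(finnhub_to_bloomberg("VOD.L"))  # VOD LN
```

Raising `KeyError` for unknown suffixes makes unmapped exchanges fail loudly instead of silently producing a wrong ticker. Note that this naive suffix split would misread share-class tickers that contain a dot, so a real mapping needs to handle those separately.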

In [6]:
#| echo: false
#| output: asis
show_doc(FinnhubDownloader)

---

### FinnhubDownloader

>      FinnhubDownloader (directory_path:str, key:str, tickers:list,
>                         frequency:str='D')

Download financial data from Finnhub.

:param directory_path: Base folder to download files to. 

:param key: Valid Finnhub client key. 

:param tickers: List of valid Finnhub tickers. 

:param frequency: Choose from [1, 5, 15, 30, 60, D, W, M]. 

Daily data by default.

In [None]:
#| eval: false
key = BaseDownloader._load_json("test_assets/keys.json")['finnhub_key'] # YOUR_FINNHUB_KEY_HERE
fhd = FinnhubDownloader(directory_path="finnhub_test", key=key, tickers=['AA', 'AAPL', 'MSFT', 'COIN', 'NOT_A_TICKER'])

In [None]:
#| eval: false
fhd.download_inference_data()
fhd.download_training_data()

If no start date is passed to `download_training_data`, this downloader uses the earliest available date. That is why the start date in the filename is the Unix epoch (timestamp 0, January 1st, 1970).
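
As a quick check of where `19700101` comes from, Unix timestamp 0 converts to midnight UTC on January 1st, 1970. The filename reconstruction at the end is a sketch that assumes the `finnhub_<start>_<today>.parquet` pattern used in the next cell.

```python
from datetime import datetime, timezone

# Unix timestamp 0 is the Unix epoch: 1970-01-01 00:00:00 UTC.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.strftime("%Y%m%d"))  # 19700101

# Assumed filename pattern when no start date is given.
today = datetime.now().strftime("%Y%m%d")
filename = f"finnhub_{epoch:%Y%m%d}_{today}.parquet"
print(filename)
```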

In [None]:
#| eval: false
today = dt.now().strftime("%Y%m%d")
df = pd.read_parquet(f"finnhub_test/finnhub_19700101_{today}.parquet")
df.head(2)

Live data with a custom start date can be retrieved directly as a `NumerFrame` with `get_live_data`.

In [None]:
#| eval: false
live_dataf = fhd.get_live_data(start=pd.Timestamp(year=2021, month=1, day=1))
live_dataf.head(2)

In [None]:
#| eval: false
fhd.remove_base_directory()

## 6. EODDownloader

[EOD Historical data](https://eodhistoricaldata.com/) is an affordable financial data API that offers a large range of global stock tickers, which makes it very convenient for Numerai Signals modeling. We use a Python API built on top of EOD Historical data to download stock ticker data for training and inference.

In [7]:
#| echo: false
#| output: asis
show_doc(EODDownloader)

---

### EODDownloader

>      EODDownloader (directory_path:str, key:str, tickers:list,
>                     frequency:str='d')

Download data from EOD historical data. 

More info: https://eodhistoricaldata.com/

:param directory_path: Base folder to download files to. 

:param key: Valid EOD client key. 

:param tickers: List of valid EOD tickers (Bloomberg ticker format). 

:param frequency: Choose from [d, w, m]. 

Daily data by default.

In [None]:
#| eval: false
key = BaseDownloader._load_json("test_assets/keys.json")['eod_key'] # YOUR_EOD_KEY_HERE
eodd = EODDownloader(directory_path="eod_test", key=key, tickers=['AAPL.US', 'MSFT.US', 'COIN.US', 'NOT_A_TICKER'])

If no start date is passed to `download_training_data`, this downloader uses the earliest available date. That is why the start date in the filename is the Unix epoch (timestamp 0, January 1st, 1970).

In [None]:
#| eval: false
eodd.download_inference_data()
eodd.download_training_data()

In [None]:
#| eval: false
today = dt.now().strftime("%Y%m%d")
df = pd.read_parquet(f"eod_test/eod_19700101_{today}.parquet")
df.head(2)

Live data with a custom start date can be retrieved directly as a `NumerFrame` with `get_live_data`. The start date can be passed as a `datetime`, a `pd.Timestamp` or a string.
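
How `EODDownloader` handles these types internally is not shown here. One common approach, sketched below with a hypothetical `normalize_start` helper, is to pass every input through `pd.Timestamp`, which already accepts `datetime` objects, existing timestamps and date strings.

```python
from datetime import datetime

import pandas as pd


def normalize_start(start) -> pd.Timestamp:
    """Hypothetical helper: accept datetime, pd.Timestamp or string start dates."""
    # pd.Timestamp parses all three input types, so a single call normalizes them.
    return pd.Timestamp(start)


for start in (datetime(2021, 1, 1), pd.Timestamp(year=2021, month=1, day=1), "2021-01-01"):
    print(type(start).__name__, "->", normalize_start(start).strftime("%Y-%m-%d"))
```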

In [None]:
#| eval: false
live_dataf = eodd.get_live_data(start=pd.Timestamp(year=2021, month=1, day=1))
live_dataf.head(2)

In [None]:
#| eval: false
live_dataf[live_dataf['ticker']=="AAPL"].set_index("date")['close'].plot(figsize=(15, 6), title="AAPL from January 2021");

In [None]:
#| eval: false
eodd.remove_base_directory()

## 7. Custom Downloader

We invite the Numerai Community to implement new downloaders for this project using interesting APIs.

These are especially important for creating innovative Numerai Signals models.

A new Downloader can be created by inheriting from `BaseDownloader`. You should implement methods for `.download_inference_data` and `.download_training_data` so every downloader has a common interface. Below you will find a template for a new downloader.

In [8]:
#| echo: false
#| output: asis
show_doc(AwesomeCustomDownloader)

---

### AwesomeCustomDownloader

>      AwesomeCustomDownloader (directory_path:str)

TEMPLATE -
Download awesome financial data from who knows where.

:param directory_path: Base folder to download files to.
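
As a rough illustration of filling in such a template, here is a self-contained sketch. `MinimalBaseDownloader` is a hypothetical stand-in for `BaseDownloader`, redefined here only so the example runs on its own. `MyCSVDownloader` and its `fetch` callable are also hypothetical; in a real downloader, `fetch` would be replaced by calls to your data API.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class MinimalBaseDownloader(ABC):
    """Hypothetical stand-in for BaseDownloader so this sketch runs on its own."""

    def __init__(self, directory_path: str):
        self.dir = Path(directory_path)
        self.dir.mkdir(parents=True, exist_ok=True)

    @abstractmethod
    def download_training_data(self, *args, **kwargs):
        """Download training data."""

    @abstractmethod
    def download_inference_data(self, *args, **kwargs):
        """Download inference data."""


class MyCSVDownloader(MinimalBaseDownloader):
    """Hypothetical downloader that pulls rows from a user-supplied `fetch` callable."""

    def __init__(self, directory_path: str, fetch):
        super().__init__(directory_path)
        self.fetch = fetch

    def _save(self, subfolder: str, filename: str, rows: list) -> Path:
        # Shared helper: write rows as CSV lines into <base>/<subfolder>/<filename>.
        folder = self.dir / subfolder
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / filename
        path.write_text("\n".join(",".join(map(str, row)) for row in rows))
        return path

    def download_training_data(self, subfolder: str = "train") -> Path:
        return self._save(subfolder, "training.csv", self.fetch("training"))

    def download_inference_data(self, subfolder: str = "inference") -> Path:
        return self._save(subfolder, "inference.csv", self.fetch("inference"))


def fake_fetch(kind: str) -> list:
    # Placeholder data source: a header row plus one data row.
    return [("ticker", "split"), ("AAPL", kind)]


with tempfile.TemporaryDirectory() as tmp:
    downloader = MyCSVDownloader(f"{tmp}/custom", fetch=fake_fetch)
    print(downloader.download_inference_data().read_text())
```

Routing both public methods through one private `_save` helper keeps the on-disk layout consistent between training and inference downloads.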

------------------------------------------------------------