## G-Research Crypto Forecasting
### Introduction
In this competition you will forecast the prices of several cryptoassets. Once you make a prediction for one batch, you can move on to the next batch and receive additional data.

This competition is different from most Kaggle Competitions in that:
* You can only submit from Kaggle Notebooks
* You must use our custom `gresearch_crypto` Python module.  The purpose of this module is to control the flow of information to ensure that you are not using future data to make predictions.  If you do not use this module properly, your code may fail.

This starter notebook demonstrates how to use the `gresearch_crypto` module to get the test features and make predictions.

### TL;DR: End-to-End Usage Example
```
import pandas as pd
import gresearch_crypto

env = gresearch_crypto.make_env()  # can only be called once per session

# Training data is in the competition dataset as usual
train_df = pd.read_csv('/kaggle/input/g-research-crypto-forecasting/train.csv', low_memory=False)

# tgt_1_model and tgt_2_model are placeholders for models you define yourself
tgt_1_model.fit(train_df)
tgt_2_model.fit(train_df)

iter_test = env.iter_test()
for (test_df, sample_prediction_df) in iter_test:
    sample_prediction_df['Target'] = tgt_1_model.predict(test_df)
    env.predict(sample_prediction_df)
```
Note that `tgt_1_model` and `tgt_2_model` are placeholders: you need to define these models, including their `fit` and `predict` methods, for the above example to work.

### Importing the module
First let's import the module and create an environment. Adding the directory holding the module to the Python path with `sys.path.append` isn't strictly necessary within Kaggle notebooks, which handle that behind the scenes, but it will be necessary if you are testing your code offline.

In [1]:
import sys
!pwd
#sys.path.append(r'/mnt/e/ML/kaggle/gresearch_crypto')
import gresearch_crypto
import pandas as pd

/mnt/e/ML/kaggle


You can only call `make_env()` **once**, so don't lose the `env` object it returns!

In [2]:
env = gresearch_crypto.make_env()

In [3]:
#import os
#for dirname, _, filenames in os.walk('./'):
#    for filename in filenames:
#        print(os.path.join(dirname, filename))

### Training data is in the competition dataset as usual

In [None]:
train_df = pd.read_csv('train.csv', low_memory=False,
                       dtype={'Asset_ID': 'int8', 'Count': 'int32', 'row_id': 'int32',
                              'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64',
                              'Volume': 'float64', 'VWAP': 'float64'
                             }
                      )
train_df.head(3)

### `iter_test` function

This is a generator which loops through each timestamp in the test set. You have direct access to the example test rows for your convenience, but your code will only be able to get rows from the real test set via the API. Once you call `predict` you can continue on to the next batch.

Yields:
* While there are more batch(es) and `predict` was called successfully since the last yield, yields a tuple of:
    * `test_df`: DataFrame with the test features for the next batch, and user responses for the previous batch.
    * `sample_prediction_df`: DataFrame with an example prediction.  Intended to be filled in and passed back to the `predict` function.
* If `predict` has not been called successfully since the last yield, prints an error and yields `None`.
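
The yield rules above can be illustrated with a small stand-in. This is a minimal sketch, not the real `gresearch_crypto` implementation: the `MockEnv` class, its attributes, and its error messages are all hypothetical and only mimic the documented behavior (batches are handed out in order, and `None` is yielded if `predict` was skipped).

```python
class MockEnv:
    """Hypothetical stand-in that mimics the documented iter_test/predict protocol."""

    def __init__(self, batches):
        # Each batch stands for a (test_df, sample_prediction_df) pair.
        self._batches = list(batches)
        self._awaiting_predict = False  # True between a yield and the matching predict()
        self.predictions = []

    def iter_test(self):
        for batch in self._batches:
            # Refuse to advance until the previous batch has been answered.
            while self._awaiting_predict:
                print("Call predict() before asking for the next batch.")
                yield None
            self._awaiting_predict = True
            yield batch

    def predict(self, predictions):
        if not self._awaiting_predict:
            raise Exception("predict() called without a pending batch.")
        self.predictions.append(predictions)
        self._awaiting_predict = False
```

A stand-in like this may be handy for exercising your loop offline, since you can build a fresh `MockEnv` as often as you like.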

In [None]:
# You can only iterate through a result from `env.iter_test()` once
# so be careful not to lose it once you start iterating.
iter_test = env.iter_test()

Let's get the data for the first test batch and check it out.

In [None]:
(test_df, sample_prediction_df) = next(iter_test)
test_df.head(3)

Note the warning about the lack of optimization! The version of the API that will deliver the hidden test set is both more efficient and going to deliver substantially more data. It's highly recommended that you read the time series section of [the data page](https://www.kaggle.com/c/g-research-crypto-forecasting/data) for a discussion of how to plan for the impact of the API on your notebook's runtime and memory use.

In [None]:
sample_prediction_df.head(3)

We'll get an error if we try to continue on to the next batch without making our predictions for the current batch.

In [None]:
next(iter_test)

### **`predict`** function
Stores your predictions for the current batch.  Expects the same format as `sample_prediction_df`.

Args:
* `predictions_df`: DataFrame which must have the same format as `sample_prediction_df`.

This function will raise an Exception if not called after a successful iteration of the `iter_test` generator.
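
Since `predict` expects the same format as `sample_prediction_df`, it can help to compare the two frames yourself before submitting a batch. The sketch below is only an illustration: `check_prediction_format` is a hypothetical helper, the checks it performs are assumptions rather than the API's actual validation, and it assumes the frames carry `row_id` and `Target` columns.

```python
import pandas as pd

def check_prediction_format(predictions_df, sample_prediction_df):
    """Return a list of problems; an empty list means the formats match."""
    problems = []
    if list(predictions_df.columns) != list(sample_prediction_df.columns):
        problems.append("columns differ from sample_prediction_df")
    if len(predictions_df) != len(sample_prediction_df):
        problems.append("row count differs from sample_prediction_df")
    elif "row_id" in predictions_df.columns and "row_id" in sample_prediction_df.columns:
        # Compare by position so that differing index labels do not matter.
        same_ids = (predictions_df["row_id"].to_numpy()
                    == sample_prediction_df["row_id"].to_numpy()).all()
        if not same_ids:
            problems.append("row_id values differ from sample_prediction_df")
    if "Target" in predictions_df.columns and predictions_df["Target"].isna().any():
        problems.append("Target contains missing values")
    return problems
```

Calling it right before `env.predict(...)` and printing any problems is one way to catch format mistakes while developing.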

Let's make a dummy prediction using the sample provided by `iter_test`.

In [None]:
env.predict(sample_prediction_df)

### Main Loop
Let's loop through all the remaining batches in the test set generator and make the default prediction for each.  The `iter_test` generator will simply stop returning values once you've reached the end.

When writing your own notebooks, be sure to write robust code that makes as few assumptions about the `iter_test`/`predict` loop as possible.  For example, there may be large gaps between timestamps for one or more cryptoassets. In the unlikely event that a cryptoasset were dropped from enough exchanges, it might go missing from the dataset entirely.

You may assume that the structure of `sample_prediction_df` will not change in this competition.

In [None]:
for (test_df, sample_prediction_df) in iter_test:
    sample_prediction_df['Target'] = 0
    env.predict(sample_prediction_df)
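
The robustness advice above can be sketched as a helper that fills `Target` row by row and falls back to a default whenever something unexpected shows up. This is only one possible approach: `fill_predictions` and `per_asset_prediction` are hypothetical names, and the sketch assumes `test_df` carries `row_id` and `Asset_ID` columns and `sample_prediction_df` carries `row_id` and `Target`.

```python
import math
import pandas as pd

def fill_predictions(test_df, sample_prediction_df, per_asset_prediction, default=0.0):
    """Fill `Target` per row, using `default` for unknown rows, assets, or NaN values.

    `per_asset_prediction` is a dict mapping Asset_ID to a predicted value
    (hypothetical; produce it with whatever model you write).
    """
    # Look up which asset each row belongs to.
    asset_by_row = dict(zip(test_df["row_id"], test_df["Asset_ID"]))
    targets = []
    for row_id in sample_prediction_df["row_id"]:
        asset = asset_by_row.get(row_id)                   # None if the row is unknown
        value = per_asset_prediction.get(asset, default)   # default if the asset is unknown
        if value is None or math.isnan(value):
            value = default
        targets.append(float(value))
    out = sample_prediction_df.copy()  # leave the sample frame untouched
    out["Target"] = targets
    return out
```

Inside the loop, this would be used as `env.predict(fill_predictions(test_df, sample_prediction_df, preds))`, where `preds` is whatever per-asset output your model produces for the batch.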

### Restart the Notebook to run your code again
In order to combat cheating you are only allowed to call `make_env` or iterate through `iter_test` once per Notebook run.  However, while you're iterating on your model it's reasonable to try something out, change the model a bit, and try it again.  Unfortunately, if you try to simply re-run the code, or even refresh the browser page, you'll still be running on the same Notebook execution session you had been running before, and the `gresearch_crypto` module will still throw errors.  To get around this you need to explicitly restart your Notebook execution session.