Merge pull request #8 from numerai/master
Update of leaderboard function and example file
atreichel committed Aug 25, 2017
2 parents 24668fc + 5d10012 commit 5de3d61
Showing 8 changed files with 692 additions and 244 deletions.
22 changes: 22 additions & 0 deletions .circleci/config.yml
@@ -0,0 +1,22 @@
version: 2
jobs:
  build:
    working_directory: ~/repo
    docker:
      - image: circleci/python:3.5.3
      - image: circleci/mysql:5.6
      - image: circleci/mongo:3.0.14
    steps:
      - checkout
      - run:
          name: Install Python Dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            cd ~/repo && pip install -e . | tee
      - run:
          name: Lint Python Packages
          command: |
            . venv/bin/activate
            flake8 --config=./.flake8 ./
            find . -iname "*.py" ! -name "setup.py" ! -name "__init__.py" ! -path "./venv/*" | xargs pylint --rcfile=./.pylintrc
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
[flake8]
ignore = E501, E701
exclude = setup.py, __init__.py, venv
4 changes: 4 additions & 0 deletions .gitignore
@@ -96,4 +96,8 @@ ENV/
# mkdocs documentation
/site

# vim
.session.vim
.swp

numerai_datasets/
12 changes: 12 additions & 0 deletions .pylintrc
@@ -0,0 +1,12 @@
[MESSAGES CONTROL]
max-line-length=120
max-args=7
max-attributes=8
max-locals=17
disable=W0141,superfluous-parens,multiple-statements,C0111,C0103,E1101,logging-format-interpolation

[TYPECHECK]
ignored-modules = numpy, numpy.random, tensorflow

[MISCELLANEOUS]
notes=XXX
253 changes: 192 additions & 61 deletions README.md
@@ -1,76 +1,207 @@
# Numerai Python API
Automatically download and upload data for the Numerai machine learning
competition.

This library is a Python client to the Numerai API. The interface is programmed
in Python and allows downloading the training data, uploading predictions, and
accessing user and submission information. Some parts of the code were taken
from [numerflow](https://github.com/ChristianSch/numerflow) by ChristianSch.
Visit his
[wiki](https://github.com/ChristianSch/numerflow/wiki/API-Reverse-Engineering),
if you need further information on the reverse engineering process.

If you encounter a problem or have suggestions, feel free to open an issue.

# Installation
1. Obtain a copy of this API
   * If you do not plan on contributing to this repository, download a release.
     1. Navigate to [releases](https://github.com/numerai/NumerAPI/releases).
     2. Download the latest version.
     3. Extract with `unzip` or `tar` as necessary.
   * If you do plan on contributing, clone this repository instead.
2. `cd` into the API directory (defaults to `numerapi`, but make sure not to go
   into the sub-directory also named `numerapi`).
3. `pip install -e .`

# Usage
See `example.py`. You can run it as `./example.py`

# Documentation
## Layout
Parameters and return values are given with Python types. Dictionary keys are
given in quotes; other names to the left of colons are for reference
convenience only. In particular, the `dict`s inside `list`s are given
reference names; these names will not appear in the actual data, only the
`dict` contents themselves.

## `login`
### Parameters
* `email` (`str`, optional): email of user account
  * will prompt for this value if not supplied
* `password` (`str`, optional): password of user account
  * will prompt for this value if not supplied
  * prompting is recommended for security reasons
* `prompt_for_mfa` (`bool`, optional): indication of whether to prompt for MFA
  code
  * only necessary if MFA is enabled for user account
### Return Values
* `user_credentials` (`dict`): credentials for logged-in user
  * `"username"` (`str`)
  * `"access_token"` (`str`)
  * `"refresh_token"` (`str`)

## `download_current_dataset`
### Parameters
* `dest_path` (`str`, optional, default: `.`): destination folder for the
dataset
* `unzip` (`bool`, optional, default: `True`): indication of whether the
training data should be unzipped
### Return Values
* `success` (`bool`): indication of whether the current dataset was
successfully downloaded

## `get_all_competitions`
### Return Values
* `all_competitions` (`list`): information about all competitions
  * `competition` (`dict`)
    * `"_id"` (`int`)
    * `"dataset_id"` (`str`)
    * `"start_date"` (`str (datetime)`)
    * `"end_date"` (`str (datetime)`)
    * `"paid"` (`bool`)
    * `"leaderboard"` (`list`)
      * `submission` (`dict`)
        * `"concordant"` (`dict`)
          * `"pending"` (`bool`)
          * `"value"` (`bool`)
        * `"earnings"` (`dict`)
          * `"career"` (`dict`)
            * `"nmr"` (`str`)
            * `"usd"` (`str`)
          * `"competition"` (`dict`)
            * `"nmr"` (`str`)
            * `"usd"` (`str`)
        * `"logloss"` (`dict`)
          * `"consistency"` (`int`)
          * `"validation"` (`float`)
        * `"original"` (`dict`)
          * `"pending"` (`bool`)
          * `"value"` (`bool`)
        * `"submission_id"` (`str`)
        * `"username"` (`str`)

## `get_competition`
### Return Values
* `competition` (`dict`): information about requested competition
  * `"_id"` (`int`)
  * `"dataset_id"` (`str`)
  * `"start_date"` (`str (datetime)`)
  * `"end_date"` (`str (datetime)`)
  * `"paid"` (`bool`)
  * `"leaderboard"` (`list`)
    * `submission` (`dict`)
      * `"concordant"` (`dict`)
        * `"pending"` (`bool`)
        * `"value"` (`bool`)
      * `"earnings"` (`dict`)
        * `"career"` (`dict`)
          * `"nmr"` (`str`)
          * `"usd"` (`str`)
        * `"competition"` (`dict`)
          * `"nmr"` (`str`)
          * `"usd"` (`str`)
      * `"logloss"` (`dict`)
        * `"consistency"` (`int`)
        * `"validation"` (`float`)
      * `"original"` (`dict`)
        * `"pending"` (`bool`)
        * `"value"` (`bool`)
      * `"submission_id"` (`str`)
      * `"username"` (`str`)

## `get_earnings_per_round`
### Parameters
* `username` (`str`): user for which earnings are requested
### Return Values
* `round_ids` (`np.ndarray(int)`): IDs of each round for which there are
earnings
* `earnings` (`np.ndarray(float)`): earnings for each round
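The two return values are parallel arrays, so they can be zipped together. A minimal sketch with hypothetical values (plain lists stand in for the `np.ndarray`s to keep the example dependency-free):

```python
# Hypothetical values; the API returns two parallel np.ndarrays,
# plain lists are used here to keep the sketch dependency-free.
round_ids = [55, 56, 57]
earnings = [0.00, 10.25, 3.50]

per_round = dict(zip(round_ids, earnings))  # round id -> earnings
career_total = sum(earnings)
print(per_round[56], career_total)  # 10.25 13.75
```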

## `get_scores_for_user`
### Parameters
* `username` (`str`): user for which scores are being requested
### Return Values
* `validation_scores` (`np.ndarray(float)`): logloss validation scores
* `consistency_scores` (`np.ndarray(float)`): logloss consistency scores
* `round_ids` (`np.ndarray(int)`): IDs of the rounds for which there are scores
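Since the score arrays are parallel to `round_ids`, zipping them makes per-round lookups easy. A minimal sketch with hypothetical values (plain lists stand in for the `np.ndarray`s):

```python
# Hypothetical score arrays (plain lists instead of np.ndarray).
# Logloss: lower is better.
round_ids = [60, 61, 62]
validation_scores = [0.6930, 0.6912, 0.6925]

# Find the round with the best (lowest) validation logloss.
best_round, best_score = min(
    zip(round_ids, validation_scores), key=lambda pair: pair[1]
)
print(best_round, best_score)  # 61 0.6912
```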

## `get_user`
### Parameters
* `username` (`str`): name of requested user
### Return Values
* `user` (`dict`): information about the requested user
  * `"_id"` (`str`)
  * `"username"` (`str`)
  * `"assignedEthAddress"` (`str`)
  * `"created"` (`str (datetime)`)
  * `"earnings"` (`float`)
  * `"followers"` (`int`)
  * `"rewards"` (`list`)
    * `reward` (`dict`)
      * `"_id"` (`int`)
      * `"amount"` (`float`)
      * `"earned"` (`float`)
      * `"nmr_earned"` (`str`)
      * `"start_date"` (`str (datetime)`)
      * `"end_date"` (`str (datetime)`)
  * `"submissions"` (`dict`)
    * `"results"` (`list`)
      * `result` (`dict`)
        * `"_id"` (`str`)
        * `"competition"` (`dict`)
          * `"_id"` (`str`)
          * `"start_date"` (`str (datetime)`)
          * `"end_date"` (`str (datetime)`)
        * `"competition_id"` (`int`)
        * `"created"` (`str (datetime)`)
        * `"id"` (`str`)
        * `"username"` (`str`)

## `get_submission_for_round`
### Parameters
* `username` (`str`): user for which submission is requested
* `round_id` (`int`, optional): round for which submission is requested
* if no `round_id` is supplied, the submission for the current round will be
retrieved
### Return Values
* `username` (`str`): user for which submission is requested
* `submission_id` (`str`): ID of submission for which data was found
* `logloss_val` (`float`): amount of logloss for given submission
* `logloss_consistency` (`float`): consistency of given submission
* `career_usd` (`float`): amount of USD earned by given user
* `career_nmr` (`float`): amount of NMR earned by given user
* `concordant` (`bool` OR `dict` (see note)): whether given submission is
concordant
* for rounds before 64, this was only a boolean, but from 64 on, it is a dict
which indicates whether concordance is still being computed
* `original` (`bool` OR `dict` (see note)): whether given submission is
original
* for rounds before 64, this was only a boolean, but from 64 on, it is a dict
which indicates whether originality is still being computed
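Because `concordant` and `original` change shape at round 64, callers that span both eras need to handle a bool and a dict. A minimal sketch of a normalizing helper (`flag_value` is a hypothetical name, not part of this API):

```python
def flag_value(flag):
    """Normalize the `concordant`/`original` field: a plain bool before
    round 64, a dict with "pending"/"value" keys from round 64 on.
    Returns None while the check is still being computed."""
    if isinstance(flag, dict):
        return None if flag.get("pending") else flag.get("value")
    return flag

print(flag_value(True))                               # True
print(flag_value({"pending": True, "value": False}))  # None
print(flag_value({"pending": False, "value": True}))  # True
```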

## `upload_predictions`
### Parameters
* `file_path` (`str`): path to CSV of predictions
* should already contain the file name (e.g. `"path/to/file/prediction.csv"`)

### Return Values
* `success` (`bool`): indication of whether the upload succeeded

### Notes
* Uploading a prediction shortly before a new dataset is released may result in
a `400 Bad Request`. If this happens, wait for the new dataset and attempt to
upload again.
* Uploading too many predictions in a certain amount of time will result in a
`429 Too Many Requests`.
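Both failure modes above are transient, so a simple retry loop is often enough. A minimal sketch under the assumption that the upload is wrapped in a callable returning the `success` bool; `upload_with_retry` and `fake_upload` are hypothetical helpers, not part of this API:

```python
import time

def upload_with_retry(upload_fn, retries=3, wait=30.0):
    """Call `upload_fn` (a stand-in for a bound upload call returning a
    success bool) until it succeeds or retries run out, sleeping between
    attempts to stay under the rate limit."""
    for attempt in range(retries):
        if upload_fn():
            return True
        if attempt < retries - 1:
            time.sleep(wait)  # back off before trying again
    return False

# Stubbed upload that fails once, then succeeds:
calls = []
def fake_upload():
    calls.append(1)
    return len(calls) > 1

print(upload_with_retry(fake_upload, wait=0))  # True
```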
