
Numerai Python API

Automatically download and upload data for the Numerai machine learning competition.

This library is a Python client for the Numerai API. It supports downloading the training data, uploading predictions, and accessing user and submission information. Some parts of the code were taken from numerflow by ChristianSch.
Visit his wiki if you need further information on the reverse engineering process.

If you encounter a problem or have suggestions, feel free to open an issue.

Installation

  1. Obtain a copy of this API.
    • If you do not plan on contributing to this repository, download a release:
      1. Navigate to releases.
      2. Download the latest version.
      3. Extract with unzip or tar as necessary.
    • If you do plan on contributing, clone this repository instead.
  2. cd into the API directory (defaults to numerapi, but make sure not to go into the sub-directory also named numerapi).
  3. pip install -e .

Usage

See example.py. You can run it as ./example.py

Documentation

Layout

Parameters and return values are given with Python types. Dictionary keys are given in quotes; other names to the left of colons are for reference convenience only. In particular, the dicts inside a list are given a name here, but that name does not appear in the actual data; only the dict contents do.

login

Parameters

  • email (str, optional): email of user account
    • will prompt for this value if not supplied
  • password (str, optional): password of user account
    • will prompt for this value if not supplied
    • prompting is recommended for security reasons
  • prompt_for_mfa (bool, optional): indication of whether to prompt for MFA code
    • only necessary if MFA is enabled for user account

Return Values

  • user_credentials (dict): credentials for logged-in user
    • "username" (str)
    • "access_token" (str)
    • "refresh_token" (str)
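
As a minimal sketch of working with this return value, the helper below checks that a credentials dict contains the three documented keys. The helper name `missing_credential_keys` and the sample values are hypothetical and not part of this library; the dict is built by hand rather than obtained from a real login.

```python
# Keys documented above for the dict returned by login.
EXPECTED_KEYS = ("username", "access_token", "refresh_token")


def missing_credential_keys(user_credentials):
    """Return the documented keys that are absent from the given dict."""
    return [key for key in EXPECTED_KEYS if key not in user_credentials]


# Made-up data in the documented shape (not real credentials).
sample_credentials = {
    "username": "example_user",
    "access_token": "example-access-token",
    "refresh_token": "example-refresh-token",
}

missing = missing_credential_keys(sample_credentials)
print("missing keys:", missing)
```

An empty list means every documented key is present.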

download_current_dataset

Parameters

  • dest_path (str, optional, default: .): destination folder for the dataset
  • unzip (bool, optional, default: True): indication of whether the training data should be unzipped

Return Values

  • success (bool): indication of whether the current dataset was successfully downloaded

get_all_competitions

Return Values

  • all_competitions (list): information about all competitions
    • competition (dict)
      • "_id" (int)
      • "dataset_id" (str)
      • "start_date" (str (datetime))
      • "end_date" (str (datetime))
      • "paid" (bool)
      • "leaderboard" (list)
        • submission (dict)
          • "concordant" (dict)
            • "pending" (bool)
            • "value" (bool)
          • "earnings" (dict)
            • "career" (dict)
              • "nmr" (str)
              • "usd" (str)
            • "competition" (dict)
              • "nmr" (str)
              • "usd" (str)
          • "logloss" (dict)
            • "consistency" (int)
            • "validation" (float)
          • "original" (dict)
            • "pending" (bool)
            • "value" (bool)
          • "submission_id" (str)
          • "username" (str)
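
The nested structure above can be traversed with ordinary dict and list operations. The sketch below assumes data in the documented shape; the helpers `best_submission` and `paid_competition_ids`, the sample values, and the date format are all hypothetical, and the sample leaderboard entries omit some documented keys for brevity.

```python
def best_submission(competition):
    """Return the leaderboard entry with the lowest validation logloss, or None."""
    leaderboard = competition.get("leaderboard") or []
    if not leaderboard:
        return None
    return min(leaderboard, key=lambda entry: entry["logloss"]["validation"])


def paid_competition_ids(all_competitions):
    """Return the "_id" of every competition whose "paid" flag is set."""
    return [comp["_id"] for comp in all_competitions if comp.get("paid")]


# Made-up data; note that the names "competition" and "submission" from the
# layout above do not appear anywhere, only the dict contents do.
sample_competitions = [
    {
        "_id": 69,
        "dataset_id": "example-dataset-69",
        "start_date": "2017-08-12T00:00:00Z",  # date format is an assumption
        "end_date": "2017-08-19T00:00:00Z",
        "paid": True,
        "leaderboard": [
            {
                "username": "alice",
                "submission_id": "s1",
                "logloss": {"consistency": 75, "validation": 0.6921},
            },
            {
                "username": "bob",
                "submission_id": "s2",
                "logloss": {"consistency": 83, "validation": 0.6907},
            },
        ],
    },
    {
        "_id": 70,
        "dataset_id": "example-dataset-70",
        "start_date": "2017-08-19T00:00:00Z",
        "end_date": "2017-08-26T00:00:00Z",
        "paid": False,
        "leaderboard": [],
    },
]

top = best_submission(sample_competitions[0])
print(top["username"], top["logloss"]["validation"])
print(paid_competition_ids(sample_competitions))
```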

get_competition

Return Values

  • competition (dict): information about requested competition
    • "_id" (int)
    • "dataset_id" (str)
    • "start_date" (str (datetime))
    • "end_date" (str (datetime))
    • "paid" (bool)
    • "leaderboard" (list)
      • submission (dict)
        • "concordant" (dict)
          • "pending" (bool)
          • "value" (bool)
        • "earnings" (dict)
          • "career" (dict)
            • "nmr" (str)
            • "usd" (str)
          • "competition" (dict)
            • "nmr" (str)
            • "usd" (str)
        • "logloss" (dict)
          • "consistency" (int)
          • "validation" (float)
        • "original" (dict)
          • "pending" (bool)
          • "value" (bool)
        • "submission_id" (str)
        • "username" (str)

get_earnings_per_round

Parameters

  • username (str): user for which earnings are requested

Return Values

  • round_ids (np.ndarray(int)): IDs of each round for which there are earnings
  • earnings (np.ndarray(float)): earnings for each round
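
A short sketch of combining the two return values follows. It assumes the arrays are parallel (same length, same order), which the description suggests but does not state outright. The helper `summarize_earnings` is hypothetical, and plain lists stand in for the NumPy arrays so the example runs without extra dependencies; the same code works on one-dimensional arrays.

```python
def summarize_earnings(round_ids, earnings):
    """Return (total earnings, ID of the highest-earning round).

    Assumes round_ids[i] belongs to earnings[i].
    """
    pairs = list(zip(round_ids, earnings))
    if not pairs:
        return 0.0, None
    total = sum(amount for _, amount in pairs)
    best_round, _ = max(pairs, key=lambda pair: pair[1])
    return total, best_round


# Made-up values standing in for the arrays returned by get_earnings_per_round.
round_ids = [60, 61, 62]
earnings = [1.5, 0.0, 4.25]

total, best_round = summarize_earnings(round_ids, earnings)
print("total:", total, "best round:", best_round)
```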

get_scores_for_user

Parameters

  • username (str): user for which scores are being requested

Return Values

  • validation_scores (np.ndarray(float)): logloss validation scores
  • consistency_scores (np.ndarray(float)): logloss consistency scores
  • round_ids (np.ndarray(int)): IDs of the rounds for which there are scores
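
Because the three return values are separate arrays, it can be convenient to index the scores by round. The sketch below assumes the arrays are aligned element by element; the helper `scores_by_round` and the sample numbers are hypothetical, and lists are used in place of NumPy arrays so the snippet has no dependencies.

```python
def scores_by_round(validation_scores, consistency_scores, round_ids):
    """Build {round_id: {"validation": ..., "consistency": ...}} from parallel sequences."""
    return {
        int(round_id): {"validation": float(val), "consistency": float(cons)}
        for round_id, val, cons in zip(round_ids, validation_scores, consistency_scores)
    }


# Made-up values in the order documented above.
validation_scores = [0.6921, 0.6907]
consistency_scores = [75.0, 83.0]
round_ids = [68, 69]

by_round = scores_by_round(validation_scores, consistency_scores, round_ids)
# Lower logloss is better, so the best round has the smallest validation score.
best_round = min(by_round, key=lambda rid: by_round[rid]["validation"])
print(best_round, by_round[best_round])
```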

get_user

Parameters

  • username (str): name of requested user

Return Values

  • user (dict): information about the requested user
    • "_id" (str)
    • "username" (str)
    • "assignedEthAddress" (str)
    • "created" (str (datetime))
    • "earnings" (float)
    • "followers" (int)
    • "rewards" (list)
      • reward (dict)
        • "_id" (int)
        • "amount" (float)
        • "earned" (float)
        • "nmr_earned" (str)
        • "start_date" (str (datetime))
        • "end_date" (str (datetime))
    • "submissions" (dict)
      • "results" (list)
        • result (dict)
          • "_id" (str)
          • "competition" (dict)
            • "_id" (str)
            • "start_date" (str (datetime))
            • "end_date" (str (datetime))
          • "competition_id" (int)
          • "created" (str (datetime))
          • "id" (str)
          • "username" (str)

get_submission_for_round

Parameters

  • username (str): user for which submission is requested
  • round_id (int, optional): round for which submission is requested
    • if no round_id is supplied, the submission for the current round will be retrieved

Return Values

  • username (str): user for which submission is requested
  • submission_id (str): ID of submission for which data was found
  • logloss_val (float): amount of logloss for given submission
  • logloss_consistency (float): consistency of given submission
  • career_usd (float): amount of USD earned by given user
  • career_nmr (float): amount of NMR earned by given user
  • concordant (bool OR dict (see note)): whether given submission is concordant
    • for rounds before 64, this was only a boolean, but from 64 on, it is a dict which indicates whether concordance is still being computed
  • original (bool OR dict (see note)): whether given submission is original
    • for rounds before 64, this was only a boolean, but from 64 on, it is a dict which indicates whether originality is still being computed
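
Since concordant and original can be either a bool or a dict depending on the round, calling code may want to normalize them. The sketch below assumes the dict form uses the "pending" and "value" keys shown in the leaderboard layout; that is an assumption, as this section does not list the keys. The helper `flag_status` is hypothetical.

```python
def flag_status(flag):
    """Normalize a concordant/original value to a (value, pending) tuple.

    A plain bool (rounds before 64) is treated as final, so pending is False.
    A dict (round 64 onward) is assumed to carry "value" and "pending" keys.
    """
    if isinstance(flag, dict):
        return bool(flag.get("value")), bool(flag.get("pending"))
    return bool(flag), False


# Older rounds: a bare boolean.
print(flag_status(True))
# Newer rounds: a dict that can report the check is still running.
print(flag_status({"pending": True, "value": False}))
```

Callers can then treat a result as trustworthy only when pending is False.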

upload_predictions

Parameters

  • file_path (str): path to CSV of predictions
    • should already contain the file name (e.g. "path/to/file/prediction.csv")

Return Values

  • success: indicator of whether the upload succeeded

Notes

  • Uploading a prediction shortly before a new dataset is released may result in a 400 Bad Request. If this happens, wait for the new dataset and attempt to upload again.
  • Uploading too many predictions in a certain amount of time will result in a 429 Too Many Requests.
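
One way to cope with both situations is to retry the upload after a pause. The sketch below wraps any upload callable and assumes it returns a falsy value on failure, as the success return value above suggests; whether the library raises an exception instead is not covered here. The helper `upload_with_retry`, its parameters, and the delay values are hypothetical, and a fake upload function stands in for the real call.

```python
import time


def upload_with_retry(upload, file_path, attempts=3, base_delay=60.0, sleep=time.sleep):
    """Call upload(file_path) until it reports success or attempts run out.

    Waits base_delay, then twice that, and so on between attempts
    (exponential backoff), which also helps to stay under rate limits.
    """
    for attempt in range(attempts):
        if upload(file_path):
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False


# Stand-in for the real upload: fails twice, then succeeds.
calls = []


def fake_upload(path):
    calls.append(path)
    return len(calls) >= 3


# Record the delays instead of actually sleeping.
delays = []
result = upload_with_retry(fake_upload, "path/to/file/prediction.csv", sleep=delays.append)
print(result, len(calls), delays)
```

In real use the `sleep` argument can be left at its default so the function actually waits between attempts.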