Merge pull request #8 from numerai/master
Update of leaderboard function and example file
atreichel committed Aug 25, 2017
2 parents 24668fc + 5d10012 commit 5de3d61
Showing 8 changed files with 692 additions and 244 deletions.
22 changes: 22 additions & 0 deletions .circleci/config.yml
@@ -0,0 +1,22 @@
version: 2
jobs:
  build:
    working_directory: ~/repo
    docker:
      - image: circleci/python:3.5.3
      - image: circleci/mysql:5.6
      - image: circleci/mongo:3.0.14
    steps:
      - checkout
      - run:
          name: Install Python Dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            cd ~/repo && pip install -e . | tee
      - run:
          name: Lint Python Packages
          command: |
            . venv/bin/activate
            flake8 --config=./.flake8 ./
            find . -iname "*.py" ! -name "setup.py" ! -name "__init__.py" ! -path "./venv/*" | xargs pylint --rcfile=./.pylintrc
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
[flake8]
ignore = E501, E701
exclude = setup.py, __init__.py, venv
4 changes: 4 additions & 0 deletions .gitignore
@@ -96,4 +96,8 @@ ENV/
# mkdocs documentation
/site

# vim
.session.vim
.swp

numerai_datasets/
12 changes: 12 additions & 0 deletions .pylintrc
@@ -0,0 +1,12 @@
[MESSAGES CONTROL]
max-line-length=120
max-args=7
max-attributes=8
max-locals=17
disable=W0141,superfluous-parens,multiple-statements,C0111,C0103,E1101,logging-format-interpolation

[TYPECHECK]
ignored-modules = numpy, numpy.random, tensorflow

[MISCELLANEOUS]
notes=XXX
253 changes: 192 additions & 61 deletions README.md
@@ -1,76 +1,207 @@
# Numerai Python API
Automatically download and upload data for the Numerai machine learning
competition.

This library is a Python client to the Numerai API. The interface is programmed
in Python and allows downloading the training data, uploading predictions, and
accessing user and submission information. Some parts of the code were taken
from [numerflow](https://github.com/ChristianSch/numerflow) by ChristianSch.
Visit his
[wiki](https://github.com/ChristianSch/numerflow/wiki/API-Reverse-Engineering),
if you need further information on the reverse engineering process.

If you encounter a problem or have suggestions, feel free to open an issue.

# Installation
1. Obtain a copy of this API
   * If you do not plan on contributing to this repository, download a release.
     1. Navigate to [releases](https://github.com/numerai/NumerAPI/releases).
     2. Download the latest version.
     3. Extract with `unzip` or `tar` as necessary.
   * If you do plan on contributing, clone this repository instead.
2. `cd` into the API directory (defaults to `numerapi`, but make sure not to go
   into the sub-directory also named `numerapi`).
3. `pip install -e .`

# Usage
See `example.py`. You can run it as `./example.py`

# Documentation
## Layout
Parameters and return values are given with Python types. Dictionary keys are
given in quotes; other names to the left of colons are for reference
convenience only. In particular, the `dict`s inside `list`s are given
reference names; these names will not appear in the actual data, only the
`dict` contents themselves.

## `login`
### Parameters
* `email` (`str`, optional): email of user account
  * will prompt for this value if not supplied
* `password` (`str`, optional): password of user account
  * will prompt for this value if not supplied
  * prompting is recommended for security reasons
* `prompt_for_mfa` (`bool`, optional): indication of whether to prompt for MFA
  code
  * only necessary if MFA is enabled for user account
### Return Values
* `user_credentials` (`dict`): credentials for logged-in user
  * `"username"` (`str`)
  * `"access_token"` (`str`)
  * `"refresh_token"` (`str`)

## `download_current_dataset`
### Parameters
* `dest_path` (`str`, optional, default: `.`): destination folder for the
dataset
* `unzip` (`bool`, optional, default: `True`): indication of whether the
training data should be unzipped
### Return Values
* `success` (`bool`): indication of whether the current dataset was
successfully downloaded

## `get_all_competitions`
### Return Values
* `all_competitions` (`list`): information about all competitions
  * `competition` (`dict`)
    * `"_id"` (`int`)
    * `"dataset_id"` (`str`)
    * `"start_date"` (`str (datetime)`)
    * `"end_date"` (`str (datetime)`)
    * `"paid"` (`bool`)
    * `"leaderboard"` (`list`)
      * `submission` (`dict`)
        * `"concordant"` (`dict`)
          * `"pending"` (`bool`)
          * `"value"` (`bool`)
        * `"earnings"` (`dict`)
          * `"career"` (`dict`)
            * `"nmr"` (`str`)
            * `"usd"` (`str`)
          * `"competition"` (`dict`)
            * `"nmr"` (`str`)
            * `"usd"` (`str`)
        * `"logloss"` (`dict`)
          * `"consistency"` (`int`)
          * `"validation"` (`float`)
        * `"original"` (`dict`)
          * `"pending"` (`bool`)
          * `"value"` (`bool`)
        * `"submission_id"` (`str`)
        * `"username"` (`str`)

## `get_competition`
### Return Values
* `competition` (`dict`): information about requested competition
  * `"_id"` (`int`)
  * `"dataset_id"` (`str`)
  * `"start_date"` (`str (datetime)`)
  * `"end_date"` (`str (datetime)`)
  * `"paid"` (`bool`)
  * `"leaderboard"` (`list`)
    * `submission` (`dict`)
      * `"concordant"` (`dict`)
        * `"pending"` (`bool`)
        * `"value"` (`bool`)
      * `"earnings"` (`dict`)
        * `"career"` (`dict`)
          * `"nmr"` (`str`)
          * `"usd"` (`str`)
        * `"competition"` (`dict`)
          * `"nmr"` (`str`)
          * `"usd"` (`str`)
      * `"logloss"` (`dict`)
        * `"consistency"` (`int`)
        * `"validation"` (`float`)
      * `"original"` (`dict`)
        * `"pending"` (`bool`)
        * `"value"` (`bool`)
      * `"submission_id"` (`str`)
      * `"username"` (`str`)

## `get_earnings_per_round`
### Parameters
* `username` (`str`): user for which earnings are requested
### Return Values
* `round_ids` (`np.ndarray(int)`): IDs of each round for which there are
earnings
* `earnings` (`np.ndarray(float)`): earnings for each round
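The two return values are parallel arrays, so they can be zipped together. A minimal sketch with hypothetical values (plain lists stand in for the `np.ndarray`s to keep the example dependency-free):

```python
# Hypothetical values; the API returns two parallel np.ndarrays,
# plain lists are used here to keep the sketch dependency-free.
round_ids = [55, 56, 57]
earnings = [0.00, 10.25, 3.50]

per_round = dict(zip(round_ids, earnings))  # round id -> earnings
career_total = sum(earnings)
print(per_round[56], career_total)  # 10.25 13.75
```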

## `get_scores_for_user`
### Parameters
* `username` (`str`): user for which scores are being requested
### Return Values
* `validation_scores` (`np.ndarray(float)`): logloss validation scores
* `consistency_scores` (`np.ndarray(float)`): logloss consistency scores
* `round_ids` (`np.ndarray(int)`): IDs of the rounds for which there are scores
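Since the score arrays are parallel to `round_ids`, zipping them makes per-round lookups easy. A minimal sketch with hypothetical values (plain lists stand in for the `np.ndarray`s):

```python
# Hypothetical score arrays (plain lists instead of np.ndarray).
# Logloss: lower is better.
round_ids = [60, 61, 62]
validation_scores = [0.6930, 0.6912, 0.6925]

# Find the round with the best (lowest) validation logloss.
best_round, best_score = min(
    zip(round_ids, validation_scores), key=lambda pair: pair[1]
)
print(best_round, best_score)  # 61 0.6912
```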

## `get_user`
### Parameters
* `username` (`str`): name of requested user
### Return Values
* `user` (`dict`): information about the requested user
  * `"_id"` (`str`)
  * `"username"` (`str`)
  * `"assignedEthAddress"` (`str`)
  * `"created"` (`str (datetime)`)
  * `"earnings"` (`float`)
  * `"followers"` (`int`)
  * `"rewards"` (`list`)
    * `reward` (`dict`)
      * `"_id"` (`int`)
      * `"amount"` (`float`)
      * `"earned"` (`float`)
      * `"nmr_earned"` (`str`)
      * `"start_date"` (`str (datetime)`)
      * `"end_date"` (`str (datetime)`)
  * `"submissions"` (`dict`)
    * `"results"` (`list`)
      * `result` (`dict`)
        * `"_id"` (`str`)
        * `"competition"` (`dict`)
          * `"_id"` (`str`)
          * `"start_date"` (`str (datetime)`)
          * `"end_date"` (`str (datetime)`)
        * `"competition_id"` (`int`)
        * `"created"` (`str (datetime)`)
        * `"id"` (`str`)
        * `"username"` (`str`)

## `get_submission_for_round`
### Parameters
* `username` (`str`): user for which submission is requested
* `round_id` (`int`, optional): round for which submission is requested
* if no `round_id` is supplied, the submission for the current round will be
retrieved
### Return Values
* `username` (`str`): user for which submission is requested
* `submission_id` (`str`): ID of submission for which data was found
* `logloss_val` (`float`): amount of logloss for given submission
* `logloss_consistency` (`float`): consistency of given submission
* `career_usd` (`float`): amount of USD earned by given user
* `career_nmr` (`float`): amount of NMR earned by given user
* `concordant` (`bool` OR `dict` (see note)): whether given submission is
concordant
* for rounds before 64, this was only a boolean, but from 64 on, it is a dict
which indicates whether concordance is still being computed
* `original` (`bool` OR `dict` (see note)): whether given submission is
original
* for rounds before 64, this was only a boolean, but from 64 on, it is a dict
which indicates whether originality is still being computed
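Because `concordant` and `original` change shape at round 64, callers that span both eras need to handle a bool and a dict. A minimal sketch of a normalizing helper (`flag_value` is a hypothetical name, not part of this API):

```python
def flag_value(flag):
    """Normalize the `concordant`/`original` field: a plain bool before
    round 64, a dict with "pending"/"value" keys from round 64 on.
    Returns None while the check is still being computed."""
    if isinstance(flag, dict):
        return None if flag.get("pending") else flag.get("value")
    return flag

print(flag_value(True))                               # True
print(flag_value({"pending": True, "value": False}))  # None
print(flag_value({"pending": False, "value": True}))  # True
```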

## `upload_predictions`
### Parameters
* `file_path` (`str`): path to CSV of predictions
* should already contain the file name (e.g. `"path/to/file/prediction.csv"`)

### Return Values
* `success` (`bool`): indication of whether the upload succeeded

### Notes
* Uploading a prediction shortly before a new dataset is released may result in
a `400 Bad Request`. If this happens, wait for the new dataset and attempt to
upload again.
* Uploading too many predictions in a certain amount of time will result in a
`429 Too Many Requests`.
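Both failure modes above are transient, so a simple retry loop is often enough. A minimal sketch under the assumption that the upload is wrapped in a callable returning the `success` bool; `upload_with_retry` and `fake_upload` are hypothetical helpers, not part of this API:

```python
import time

def upload_with_retry(upload_fn, retries=3, wait=30.0):
    """Call `upload_fn` (a stand-in for a bound upload call returning a
    success bool) until it succeeds or retries run out, sleeping between
    attempts to stay under the rate limit."""
    for attempt in range(retries):
        if upload_fn():
            return True
        if attempt < retries - 1:
            time.sleep(wait)  # back off before trying again
    return False

# Stubbed upload that fails once, then succeeds:
calls = []
def fake_upload():
    calls.append(1)
    return len(calls) > 1

print(upload_with_retry(fake_upload, wait=0))  # True
```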
