![Judge Research logo](https://uploads-ssl.webflow.com/625391e03972c921373d60ba/6296d332b7e7cd998bf9035b_judge_logo_white.png)

&nbsp;



# Tutorial:  Towards a Sophisticated Research Methodology 

*Read time:  11 min.* 

&nbsp;

----

This tutorial demonstrates the ease-of-use and power of Judge Research.  If you complete it while referring back to our [wiki](https://judgeresearch.notion.site/The-Judge-Research-Wiki-37d2ae0159254928b483f01fec87b576) whenever a step's logic is not clear to you, you will be fully ready to use Judge Research.  

You can use Judge Research to (1) contribute to the decentralized systematic fund & be rewarded; (2) test competing operationalizations with extreme rigour; (3) compare your initial (validation set) findings with live findings as they come in; and (4) engineer features that represent something important about the market, and develop rich dashboards that communicate that information in real time.

***For a faster tutorial that relies on wrapper functions which turn 30-40 lines of code here into 1 or 2 lines, check out Feature_Tutorial_Short.  Those wrapper functions make it harder to understand what is going on under the hood in the SDK, but make getting down to work a matter of minutes.*** 

----

![](https://uploads-ssl.webflow.com/625391e03972c921373d60ba/626b3edab9b30b7f49b3f554_23519116918_1a87106387_k.jpeg)






----

## Introduction

**The Workflow:**  You now have the data environment of a world-class hedge fund at your fingertips.  You can use it in this notebook to engage in feature engineering and display your preliminary findings in a single cohesive document; then submit your features to Judge Research and embed the live, interactive data tools from your dashboard into this document.    

**What it Accomplishes:**  With a few clicks, you turn this document into a live research tool linked up to an AI run on massively parallel processes & coded out by a team of PhDs.  It evaluates your features across millions of modeling contexts, ranks and compares them to what is in the market's larger data environment, and rigorously tests them for overfit.    

----


### The Tech Stack & the Decentralized Systematic Fund

This tutorial is meant for those who want to be rewarded for contributing to the decentralized systematic fund, and to use the fund as a new *kind* of research tool.  

We will soon be launching a SaaS version of the software.  It will allow you to take advantage of the above functionality without participating in the decentralized fund.  Your features & algorithms will be sent to a version of the AI that only looks at your fund's features and algorithms but still ranks & evaluates them for overfit in real time. 

Likewise, you can use the SaaS version to collaborate with other funds without revealing IP to one another.  We believe this type of collaboration is novel and has significant implications for the industry - allowing a small set of funds to match the capacities of the largest systematic funds.       

----

# Outline:  The Basic Steps for Submitting Your Features

There are **three preliminary steps:**  

1.  Authenticate & Set Up the Workspace.
2.  Configure the parameters of the historical data you would like to call, and the parameters of the series you are studying - for instance, the BTC-USD volatility series at 45 minute intervals.
3.  Call the data.

Section four is where you will spend 95% of your time:  Here you will **do your data exploration and feature engineering**.  There are then **three final steps:** 

5.  Submit your historical data.  This historical data submission covers the period from a specified start date (e.g. the 4 hr alpha test series begins on July 1, 2019) up to the present moment.
6.  Schedule the cron job to submit live features.
7.  (Optional) Embed the data tools from your dashboard into the same notebook, writing up your findings to create live, interactive research tools and/or market signals. 

----



## 1.  Handle Your Authentications & Set Up the Workspace
Paste your keys and then import the Judge Research & Coinalytix packages, as well as any tools for charting, technical & statistical analysis you might want.

In [63]:
JR_API_KEY = 'xxx'
CA_API_KEY = 'xxx'


from historical_data import Coinalytix, HDParams
from judgeresearch import JudgeResearch, JRParams

# Import classes for data handling & visualization 
import json
import scipy 
import plotly.graph_objects as go
import pandas as pd
from datetime import date, datetime, timedelta
import pandas_ta as ta
import time
import math
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf 

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures
import sklearn
import requests #  json is already imported above

import JudgeHelperFuncs as jh
from watchlist import colors

----

## 2. Configure Assets, Call, Organize & Clean the Data

An easier way to call assets can be found in the Feature_Tutorial_Short.ipynb notebook.  That relies on a few wrapper functions, though, so this tutorial is an easier way to understand what is going on under the hood.  

Define the parameters for the historical data you want to call.  Typically, you want to line up the start date(s) of your historical data with the start date of the GA instance(s) to which you'll eventually submit features.  You can find those dates in our [wiki](https://judgeresearch.notion.site/The-Judge-Research-Wiki-37d2ae0159254928b483f01fec87b576).  The cell below calls data from `startDateString` to t-1.

In [64]:

startDateString = "2019-09-01 00:00:00" #     All times are in UTC
periodsPerDay = 6 #                           MAKE SURE periodsPerDay & intervalString align
intervalString = "4h" #                       Example arguments:  45m, 4h, 1d 
startDate = datetime.strptime(startDateString, "%Y-%m-%d %H:%M:%S")
nowUTC = pd.Timestamp.now(tz="UTC").tz_localize(None) # current UTC time, tz-naive like startDate
timeBack = nowUTC - startDate
nObs = timeBack.total_seconds() / ((60*60*24) / periodsPerDay) 
nObs = math.ceil(nObs) #                     number of discrete time blocs up to the present
print(nObs)


6395


In [65]:
print("The series begins at {}.".format(startDateString)) 
print("And extends {} observations.".format(nObs)) 

The series begins at 2019-09-01 00:00:00.
And extends 6395 observations.
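
The configuration cell warns that `periodsPerDay` and `intervalString` must align. One way to rule out a mismatch is to derive the count from the interval string itself. The following is a minimal sketch, assuming interval strings always take the form of an integer followed by `m`, `h` or `d`, as in the example arguments above. The helpers `interval_minutes`, `periods_per_day` and `count_blocs` are illustrative names, not part of the Judge Research or Coinalytix packages.

```python
import math
from datetime import datetime

# Minutes per unit suffix; assumes intervals look like "45m", "4h" or "1d".
_UNIT_MINUTES = {"m": 1, "h": 60, "d": 1440}


def interval_minutes(interval: str) -> int:
    """Convert an interval string such as '4h' into a number of minutes."""
    value, unit = int(interval[:-1]), interval[-1]
    if unit not in _UNIT_MINUTES or value <= 0:
        raise ValueError(f"unsupported interval: {interval!r}")
    return value * _UNIT_MINUTES[unit]


def periods_per_day(interval: str) -> int:
    """Number of whole time blocs in one day; rejects uneven splits."""
    minutes = interval_minutes(interval)
    if 1440 % minutes != 0:
        raise ValueError(f"{interval!r} does not divide a day evenly")
    return 1440 // minutes


def count_blocs(start: datetime, end: datetime, interval: str) -> int:
    """Number of time blocs from start to end, rounded up."""
    seconds = (end - start).total_seconds()
    return math.ceil(seconds / (interval_minutes(interval) * 60))


print(periods_per_day("4h"))   # 6
print(periods_per_day("45m"))  # 32
```

With a helper like this, `periodsPerDay` could be set from `intervalString` in one place instead of being typed twice.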


In [66]:
asset = HDParams() #                         Parameter object for the historical data call
asset.exchange = "BINANCE" #                 Set exchange, must be "BINANCE" or ...
asset.ticker = "BTC-USDT-SPOT" #             Set asset, currently supports "BTC-USD-SPOT", "ETH-USD-SPOT", ...
asset.set_start_date(startDateString) #      Set start of reporting period in form YYYY-MM-DD HH:MM:SS.  The 4h series for the alpha test starts on July 1, 2019. 
asset.interval = intervalString #            Example arguments:  45m, 4h, 1d                     
asset.num_periods = nObs #                   Set number of reporting periods

----

## 3.a.  Collect Data
Authenticate & fetch historical data. 


In [67]:
HD = Coinalytix() #                         Create the historical data client
HD.with_api_key(CA_API_KEY) #               Set api key
asset_data = HD.fetch_hd(asset) #           Fetch historical data

btc = pd.DataFrame.from_dict(asset_data) #  Create Pandas data frame from result
btc.set_index(pd.DatetimeIndex(btc["StartDate"]*1000000000), inplace=True) #  Adjust DatetimeIndex (StartDate is in seconds; pandas expects nanoseconds)
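
The `StartDate` column holds Unix timestamps in seconds (the sample output later in this tutorial pairs `1659312000.0` with `2022-08-01 00:00:00`), which is why the cell multiplies by 1,000,000,000 before building the index. The sketch below shows the same conversion with the standard library only. The two helper names are illustrative and are not part of either package.

```python
from datetime import datetime, timezone


def epoch_seconds_to_utc(seconds: float) -> datetime:
    """Interpret a Unix timestamp in seconds as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def epoch_seconds_to_ns(seconds: float) -> int:
    """Scale whole seconds to integer nanoseconds (fractions are dropped)."""
    return int(seconds) * 1_000_000_000


first = epoch_seconds_to_utc(1659312000.0)
second = epoch_seconds_to_utc(1659326400.0)
print(first.isoformat())  # 2022-08-01T00:00:00+00:00
print(second - first)     # 4:00:00
```

Checking the spacing between the first two rows this way is a quick confirmation that the data really arrived at the interval you asked for.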



## 3.b. Clean Your Data

It is common for exchange data at smaller time blocs to return occasional missing data.

In [68]:
missCount = btc.isna().sum() #      get a sense of how many obs you're missing 
missPerc = missCount / len(btc)
print(missPerc)

btc = btc.ffill() #     fill w/ previous observation (fillna(method='ffill') is deprecated in recent pandas)

StartDate    0.000000
Open         0.000156
High         0.000156
Low          0.000156
Close        0.000156
Volume       0.000000
dtype: float64
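
To see what the forward fill does, and where it stops helping, here is a dependency-free sketch on a toy series. The values are invented and `None` stands in for a missing observation. Note that a gap at the very start of the series has no previous value to copy, so it stays missing; pandas' `ffill` behaves the same way.

```python
from typing import List, Optional


def missing_fraction(values: List[Optional[float]]) -> float:
    """Share of observations that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value with the most recent observed one."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Invented closing prices, for illustration only.
closes = [None, 101.0, None, None, 103.5, 104.0]
print(missing_fraction(closes))  # 0.5
print(forward_fill(closes))      # [None, 101.0, 101.0, 101.0, 103.5, 104.0]
```

If the first rows of your series are missing, consider moving the start date forward rather than relying on the fill.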


Great!  Now that you have set your data environment up, you can carry on to your main job:  Research & Feature engineering.

----

<img src = "https://uploads-ssl.webflow.com/625391e03972c921373d60ba/6296d332b7e7cd998bf9035b_judge_logo_white.png" width=400>

## 4.  The Main Section:  Your Feature Engineering Sandbox

It is good practice to write your preliminary findings down and to frame explicitly the hypotheses you are investigating by sending your features into Judge Research.  Obviously, that is true of all research, but Judge Research is more than an AI:  It suggests an organization to your research that helps you confirm your initial (validation set) findings with live data *at scale.*

Confirming a small set of findings is easy for any professional researcher, but making research **truly cumulative over the long run** is one of the primary functions of Judge Research. 

In [69]:
y1 = (btc['Open'] - btc['Close']) / btc['Open'] #    open-to-close move as a fraction of the open

y1 = pd.concat(([btc['StartDate'], y1]), axis = 1)
y1.columns = ['StartDate', 'feature']
y1 = y1.astype(str) #                                convert every value to a string

In [70]:
y1.tail(5)

| StartDate (index) | StartDate | feature |
| --- | --- | --- |
| 2022-08-01 00:00:00 | 1659312000.0 | -0.0024982400714272 |
| 2022-08-01 04:00:00 | 1659326400.0 | 0.0020191131302241 |
| 2022-08-01 08:00:00 | 1659340800.0 | 0.0033957458850163 |
| 2022-08-01 12:00:00 | 1659355200.0 | -0.0050801195119641 |
| 2022-08-01 16:00:00 | 1659369600.0 | 0.006486546739149 |
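
The example feature is each candle's open-to-close move expressed as a fraction of the open, with the sign chosen so that a falling candle is positive. A small sketch on two made-up candles makes the sign convention easy to check. The prices are invented and `open_close_feature` is an illustrative name.

```python
from typing import Dict, List


def open_close_feature(candles: List[Dict[str, float]]) -> List[float]:
    """(Open - Close) / Open per candle: positive when price fell."""
    return [(c["Open"] - c["Close"]) / c["Open"] for c in candles]


# Invented prices, for illustration only.
candles = [
    {"Open": 100.0, "Close": 98.0},   # price fell 2%
    {"Open": 200.0, "Close": 205.0},  # price rose 2.5%
]
print(open_close_feature(candles))  # [0.02, -0.025]
```

If you would rather have rising candles be positive, flip the numerator to `Close - Open`; either convention works as long as you record which one you submitted.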


### Research, Chart & Analyze
State & chart your preliminary hypotheses and findings here, prior to submitting to Judge Research.  One function of these notebooks is to seamlessly organize your live data-derived findings as they come in with your initial findings.  That makes it all the more important to state your initial findings so that you understand what you were thinking weeks or months later.  
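
One way to make a preliminary finding concrete is to record a simple statistic next to the hypothesis it supports. As an example, the sketch below computes the lag-1 autocorrelation of a feature, which speaks to a hypothesis such as "moves tend to reverse in the next period". It is dependency-free and uses invented values; in the notebook itself, `pd.Series.autocorr` does the same job.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def autocorr(series: Sequence[float], lag: int = 1) -> float:
    """Correlation between the series and itself shifted by `lag` steps."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return pearson(series[:-lag], series[lag:])


# Invented feature values that flip sign every period.
feature = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
print(round(autocorr(feature), 6))  # -1.0
```

Writing the number down now gives you something specific to compare against once live results arrive.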


<img src = "https://uploads-ssl.webflow.com/625391e03972c921373d60ba/6296d332b7e7cd998bf9035b_judge_logo_white.png" width=400>

## 7.  Reference your live findings

Let's do things a bit out of order:  In steps 5 & 6, we submit our historical data & our live data.  

We put those steps at the end of the document so that the research itself reads as one cohesive flow.

Add charts from judgeresearch.co -> member portal -> my profile, and paste your iframes in this document.  The below is an example from a team member's profile.

Notice the '%%html' that immediately precedes the html object.  

In [44]:
%%html
<iframe src="http://ec2-3-131-96-30.us-east-2.compute.amazonaws.com:3838/ShinyBuild/ScatterPlot/54ckzfe5tj" width="800" height="500"></iframe>

In [45]:
%%html
<iframe src="http://ec2-3-131-96-30.us-east-2.compute.amazonaws.com:3838/ShinyBuild/LinePlot/54ckzfe5tj" width="800" height="500"></iframe>

## 5.  Submit Your Historical Data

To submit historical data to Judge Research's AI, you:

1.  Prepare the connection.
2.  Choose what information to tell the API: 
    - The instance of the GA you are submitting to (e.g. BTC-USD at a 4 hour time period), 
    - What you are labeling your feature (e.g. 'x1' so as to not reveal any IP), and 
    - The interpolation procedure for when you miss the occasional submission.
3.  Format the payload.
4.  And submit the feature!
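
The interpolation procedure is worth a moment's thought before you choose it. The sketch below contrasts the two options, `'last'` and `'zero'`, on a short run with one missed submission. It rests on an assumption about the semantics: that `'last'` repeats the most recent submitted value and `'zero'` substitutes 0. The function `apply_ipp` is purely illustrative and is not how the Judge Research API is implemented; see the wiki for the authoritative behaviour.

```python
from typing import List, Optional


def apply_ipp(values: List[Optional[float]], ipp: str) -> List[Optional[float]]:
    """Fill missed submissions (None) using the chosen procedure.

    Assumed semantics: 'last' repeats the previous value, 'zero' inserts 0.0.
    """
    if ipp not in ("last", "zero"):
        raise ValueError("ipp must be 'last' or 'zero'")
    filled: List[Optional[float]] = []
    for v in values:
        if v is None:
            v = 0.0 if ipp == "zero" else (filled[-1] if filled else None)
        filled.append(v)
    return filled


submitted = [0.004, None, -0.002]  # the middle submission was missed
print(apply_ipp(submitted, "last"))  # [0.004, 0.004, -0.002]
print(apply_ipp(submitted, "zero"))  # [0.004, 0.0, -0.002]
```

For a return-like feature centred near zero, `'zero'` amounts to "no move", while `'last'` repeats the previous move; for a slowly changing level, `'last'` is usually the more natural fill.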


In [71]:
JR = JudgeResearch()
JR.with_api_key(JR_API_KEY)

Format the Historical Features:  Below we specify which instance(s) of the AI are going to receive our feature(s).  Each instance is trained on a different dependent variable.   We communicate this to Judge Research by formatting our feature(s) as their JSON.  See the [API documentation in our wiki](https://judgeresearch.notion.site/Use-The-API-5143af17c10f407d91a8860a7c91936e) for information about each argument. 

For example, here we specify that this submission goes to the ETH-USD instance at the 240 minute (4 hour) interval, and we label it 'ETHt'.   

In [None]:
ft_params = JRParams()
ft_params.mbs= '240'  #               mbs: 'minute bloc size' - the size of the discrete time bloc.  In the alpha test, options are 45 & 240.  
ft_params.feature_name = "ETHt" #     feature_name: name it to represent the substance of the feature or generically so no one but you knows what it is; e.g. 'x1'
ft_params.dv = 'ETH-USD' #            the dependent variable you are attempting to help explain
ft_params.ipp = "last" #              ipp: interpolation procedure - fill-in value if you miss a submission.  'last' or 'zero'

features = JR.craft_features(ft_params, y1)
payload = JR.format_payload(features)
submit = JR.submit_feature(payload)
print(submit)
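
Before submitting, it can help to confirm that the timestamps you are about to send are spaced exactly one time bloc apart, since `mbs` tells the API which bloc size to expect. The check below is a hypothetical pre-submission helper, not part of the `judgeresearch` package. It assumes the timestamps are numeric Unix seconds, so in this notebook it would be run on `btc["StartDate"].tolist()` rather than on the string-converted `y1`.

```python
from typing import List, Sequence


def find_spacing_problems(start_dates: Sequence[float], mbs: int) -> List[int]:
    """Return positions whose gap from the previous timestamp is not one bloc.

    start_dates: Unix timestamps in seconds, in submission order.
    mbs: minute bloc size, e.g. 240 for the 4 hour series.
    """
    step = mbs * 60
    return [
        i for i in range(1, len(start_dates))
        if start_dates[i] - start_dates[i - 1] != step
    ]


# The first three timestamps come from the sample output above; the last is
# deliberately one bloc late to show a detected gap.
stamps = [1659312000.0, 1659326400.0, 1659340800.0, 1659369600.0]
print(find_spacing_problems(stamps, mbs=240))  # [3]
```

An empty list means every row follows the previous one by exactly `mbs` minutes.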


You can submit the same feature to multiple series of the GA.  Check out Feature_Tutorial_Short.ipynb for an example.  

## 6. Live Feature Calculation and Submission

Now it's time to schedule the live submissions.  You don't want to load a full notebook every time, so switch over to Feature_Tutorial_live.py to see the rest of the introductory tutorial.  You can find the functions that script calls in the JudgeHelperFuncs.py module, which we typically load into workspaces as 'jh'.  

There are a few idiosyncrasies that come from scheduling the cron job w/ Docker's build.  It just requires you to tweak the cron a bit.  So we put together a short [page of our wiki](https://judgeresearch.notion.site/Scheduling-Your-Live-Send-Scripts-fc64827cedf4469ab826e1df2c25867f) for your convenience.  