![Judge Research logo](https://uploads-ssl.webflow.com/625391e03972c921373d60ba/6296d332b7e7cd998bf9035b_judge_logo_white.png)

&nbsp;



# Tutorial:  Towards a Sophisticated Research Methodology 

*Read time:  11 min.* 

&nbsp;

----

This tutorial demonstrates the ease-of-use and power of Judge Research.  If you complete it while referring back to our [wiki](https://judgeresearch.notion.site/The-Judge-Research-Wiki-37d2ae0159254928b483f01fec87b576) whenever a step's logic is not clear to you, you will be fully ready to use Judge Research.  

You can use Judge Research to (1) contribute to the decentralized systematic fund & be rewarded; (2) test competing operationalizations with extreme rigour; (3) compare your initial (validation set) findings with live findings as they come in; and (4) engineer features that represent something important about the market, and develop rich dashboards that communicate that information in real time.

----

![](https://uploads-ssl.webflow.com/625391e03972c921373d60ba/626b3edab9b30b7f49b3f554_23519116918_1a87106387_k.jpeg)






----

## Introduction

**The Workflow:**  You now have the data environment of a world-class hedge fund at your fingertips.  You can use it in this notebook to engage in feature engineering and display your preliminary findings in a single cohesive document; then submit your features to Judge Research and embed the live, interactive data tools from your dashboard into this document.    

**What it Accomplishes:**  With a few clicks, you turn this document into a live research tool linked up to an AI run on massively parallel processes & coded out by a team of PhDs.  It evaluates your features across millions of modeling contexts, ranks and compares them to what is in the market's larger data environment, and rigorously tests them for overfit.    

----


### The Tech Stack & the Decentralized Systematic Fund

This tutorial is meant for those who want to be rewarded for contributing to the decentralized systematic fund, and to use the fund as a new *kind* of research tool.  

We will soon be launching a SaaS version of the software.  It will allow you to take advantage of the above functionality without participating in the decentralized fund.  Your features & algorithms will be sent to a version of the AI that only looks at your fund's features and algorithms but still ranks & evaluates them for overfit in real time. 

Likewise, you can use the SaaS version to collaborate with other funds without revealing IP to one another.  We believe this type of collaboration is novel and has significant implications for the industry - allowing a small set of funds to match the capacities of the largest systematic funds.       

----

# Outline:  The Basic Steps for Submitting Your Features

There are **three preliminary steps:**  

1.  Authenticate & Set Up the Workspace.
2.  Configure the parameters of the historical data you would like to call, and the parameters of the series you are studying - for instance, the BTC-USD volatility series at 45 minute intervals.
3.  Call, Organize & Clean the data.  

These three steps are broken out by their underlying functions in the longer tutorial.  Here, wrapper functions shorten about 30 lines of code to one.  

Step four is where you will spend 95% of your time:  Here you will **do your data exploration and feature engineering**.  There are then **three final steps:** 

5.  Submit your historical data.  This historical data submission covers the period from a specified start date (e.g. the 4 hr alpha test series begins in July 2019) up to the present moment.
6.  Schedule the cron job to submit live features.
7.  (Optional) Embed the data tools from your dashboard into the same notebook, writing up your findings to create live, interactive research tools and/or market signals. 

----



## 1.  Handle Your Authentications & Set Up the Workspace

You can run this code for real by pressing Shift + Enter in each code block.  But first, get your API keys from [Coinalytix](www.coinalytix.io) and [Judge Research](www.judgeresearch.co).

In [3]:
JR_API_KEY = "xxx" # Your Judge Research API key was given to you upon sign-up.  You can find it under your profile.
CA_API_KEY = "xxx" # Sign up at Coinalytix.io - 90 days free and no payment information need be entered.
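
Hardcoding keys in a notebook makes them easy to leak when the notebook is shared.  One alternative is sketched below, assuming you have exported environment variables named `JR_API_KEY` and `CA_API_KEY`.  Those variable names and the `load_key` helper are illustrative and are not part of either SDK.

```python
import os

def load_key(env_name, default="xxx"):
    """Return the key stored in the environment variable env_name, or a placeholder."""
    value = os.environ.get(env_name, default)
    if value == default:
        # Hypothetical convention: "xxx" marks a key that still needs to be filled in
        print(f"Warning: {env_name} is not set; using the placeholder value.")
    return value

JR_API_KEY = load_key("JR_API_KEY")
CA_API_KEY = load_key("CA_API_KEY")
```

If you prefer to paste the keys directly, the cell above works unchanged.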

Import the Judge Research & Coinalytix packages, as well as the Python tools you want:

In [4]:
# Import classes that handle API connections and parameters
from historical_data import Coinalytix, HDParams
from judgeresearch import JudgeResearch, JRParams

# Import classes for data handling & visualization 
import json
import scipy 
import plotly.graph_objects as go
import plotly
import pandas as pd
from datetime import date, datetime, timedelta
import pandas_ta as ta
import time
import math
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf 

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import PolynomialFeatures
import sklearn
import requests # json is already imported above

import JudgeHelperFuncs as jh
from watchlist import colors

----

## 2. Configure Assets, Call, Organize & Clean the Data
Define the parameters for the historical data you want to call.  Typically, you want to line up the start date(s) of your historical data with the start date of the GA instance(s) to which you'll eventually submit features.  You can find those dates in our [wiki](https://judgeresearch.notion.site/The-Judge-Research-Wiki-37d2ae0159254928b483f01fec87b576). The below calls data from `startDateString` to t-1. 

One useful convenience function from the JudgeHelperFuncs module combines a little regex with a query to Coinalytix to show which assets are available:

In [22]:
tc = jh.whichTickers(verbose = False, apiKey = CA_API_KEY, Which2 = "ETH", Which='[0-9]+|PERP', WhichExch ='') # tc: this call
print(tc)

    Exchange           Ticker
67   BINANCE   ETH-USD-200925
68   BINANCE   ETH-USD-201225
69   BINANCE   ETH-USD-210326
70   BINANCE   ETH-USD-210625
71   BINANCE   ETH-USD-210924
72   BINANCE   ETH-USD-211231
73   BINANCE   ETH-USD-220325
74   BINANCE   ETH-USD-220624
75   BINANCE   ETH-USD-220930
76   BINANCE     ETH-USD-PERP
156      FTX  ETH-USDT-220624
157      FTX  ETH-USDT-220930
158      FTX  ETH-USDT-221230
159      FTX    ETH-USDT-PERP
231   KRAKEN   ETH-USD-190329
232   KRAKEN   ETH-USD-190628
233   KRAKEN   ETH-USD-190927
234   KRAKEN   ETH-USD-191227
235   KRAKEN   ETH-USD-200327
236   KRAKEN   ETH-USD-200626
237   KRAKEN   ETH-USD-200925
238   KRAKEN   ETH-USD-201225
239   KRAKEN   ETH-USD-210326
240   KRAKEN   ETH-USD-210625
241   KRAKEN   ETH-USD-210924
242   KRAKEN   ETH-USD-211231
243   KRAKEN   ETH-USD-220325
244   KRAKEN   ETH-USD-220624
245   KRAKEN   ETH-USD-220729
246   KRAKEN   ETH-USD-220826
247   KRAKEN   ETH-USD-220930
248   KRAKEN   ETH-USD-221230
249   KRAK
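
The internals of `jh.whichTickers` are not shown in this tutorial.  As a rough illustration of the kind of regex filtering its arguments suggest, the sketch below filters a small ticker table with pandas.  The `filter_tickers` helper and the sample rows are assumptions made for the example, not the SDK's implementation.

```python
import pandas as pd

# Illustrative sample rows; a real call returns the full list from Coinalytix
tickers = pd.DataFrame({
    "Exchange": ["BINANCE", "BINANCE", "FTX", "KRAKEN", "BINANCE"],
    "Ticker": ["ETH-USD-220930", "ETH-USD-PERP", "ETH-USDT-PERP",
               "BTC-USD-220930", "ETH-USDT-SPOT"],
})

def filter_tickers(df, base, pattern, exchange=""):
    """Keep rows whose ticker contains `base` and matches the regex `pattern`."""
    mask = df["Ticker"].str.contains(base, regex=False)
    mask &= df["Ticker"].str.contains(pattern, regex=True)
    if exchange:
        # An empty string means "any exchange"
        mask &= df["Exchange"] == exchange
    return df[mask]

# ETH contracts that are dated futures (digits) or perpetuals (PERP)
eth_derivs = filter_tickers(tickers, base="ETH", pattern=r"[0-9]+|PERP")
print(eth_derivs)
```

The spot ticker is dropped because it contains neither digits nor `PERP`, and the BTC row is dropped by the `base` filter.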

Instead of pulling assets from this DataFrame, let's start with an easy call of a few spot prices:

In [None]:
tbs = '45m'                            # time block size:  1h, 4h, 1d, etc.
MBS = '45'                             # minute bloc size:  for submitting to JR's API.  should accord with tbs above
thisDateStart = "2022-01-01 00:00:00"  # always as a string denoted as so

XDict = jh.assetCallLoop(exchangeList = ['BINANCE', 'BINANCE', 'BINANCE'], assetList = ['ETH-USDT-SPOT', 'BTC-USDT-SPOT', 'SOL-USDT-SPOT'], startDateString = thisDateStart, perSize = tbs, APIKey = CA_API_KEY, verbose = True)
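
Because `tbs` and `MBS` describe the same interval in two formats, it can help to derive one from the other instead of typing both.  The sketch below is a hypothetical helper (`tbs_to_mbs` is not part of the SDK) that assumes `tbs` strings always look like `45m`, `4h`, or `1d`.

```python
import re

# Minutes per unit suffix used in time block strings such as '45m', '4h', '1d'
_UNIT_MINUTES = {"m": 1, "h": 60, "d": 1440}

def tbs_to_mbs(tbs):
    """Convert a time block string like '4h' into a minute count string like '240'."""
    match = re.fullmatch(r"(\d+)([mhd])", tbs)
    if match is None:
        raise ValueError(f"Unrecognized time block size: {tbs!r}")
    count, unit = match.groups()
    return str(int(count) * _UNIT_MINUTES[unit])

print(tbs_to_mbs("45m"))
print(tbs_to_mbs("4h"))
```

With this in place you could write `MBS = tbs_to_mbs(tbs)` so the two settings cannot drift apart.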

----

<img src = "https://uploads-ssl.webflow.com/625391e03972c921373d60ba/6296d332b7e7cd998bf9035b_judge_logo_white.png" width=400>

## 4.  The Main Section:  Your Feature Engineering Sandbox

Judge Research suggests an organization to your research that helps you confirm your initial (validation set) findings with live data *at scale.*

That is easy for any professional researcher to do for a small set of findings, but making research **truly cumulative over the long run** is one of the primary functions of our AI, and of your ability to plug its findings into these notebooks.

In [16]:
# Let's make our working example super simple for sake of focusing on the workflow:

eth = XDict['ETH-USDT-SPOT']
btc = XDict['BTC-USDT-SPOT']
sol = XDict['SOL-USDT-SPOT']

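def percDif(y):
    # Fractional change from open to close (positive when the close is below the open)
    x = (y['Open'] - y['Close']) / y['Open']
    # Formatting for the SDK: pair with the timestamps, name the column 'feature', cast to strings
    z = pd.concat([y['StartDate'], x], axis = 1)
    z.columns = ['StartDate', 'feature']
    z = z.applymap(str)
    return z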


y1 = percDif(eth)
y2 = percDif(btc)
y3 = percDif(sol)

Notice the three formatting lines at the end of the function.  That's the only overhead you'll need to make your features play well with the SDK's formatting & submission scripts.  
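
Before submitting, it can be worth checking that a frame really has that shape.  The sketch below is a hypothetical check (`check_feature_frame` is not part of the SDK) run on toy OHLC data.  It assumes, from the function above, that the expected shape is exactly two string-valued columns named `StartDate` and `feature`.

```python
import pandas as pd

def check_feature_frame(df):
    """Raise ValueError unless df has string columns named StartDate and feature."""
    if list(df.columns) != ["StartDate", "feature"]:
        raise ValueError(f"Unexpected columns: {list(df.columns)}")
    non_str = [c for c in df.columns
               if not df[c].map(lambda v: isinstance(v, str)).all()]
    if non_str:
        raise ValueError(f"Columns with non-string values: {non_str}")
    return True

# Toy data standing in for one asset's DataFrame
toy = pd.DataFrame({
    "StartDate": ["2022-01-01 00:00:00", "2022-01-01 00:45:00"],
    "Open": [100.0, 110.0],
    "Close": [110.0, 99.0],
})

# Same logic as percDif, written out so this block runs on its own
change = (toy["Open"] - toy["Close"]) / toy["Open"]
z = pd.concat([toy["StartDate"], change], axis=1)
z.columns = ["StartDate", "feature"]
z = z.astype(str)

print(check_feature_frame(z))
```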

In [None]:
btc.tail(3)

### A Slightly Busier Example 

Here are some simple moving average-related indicators housed in one function.  You can ignore the details.  The key substantive point is that our AI is really useful for comparing operationalizations of related concepts.  Normally, when one compares operationalizations, it is in a single research context, and that (n = 1) workflow leads to overfitting or fragile research. 

In [None]:
def feature_gen(df):
    ''' calculate macd, awesome oscillator, and bbands '''
    df.ta.macd(fast=8, slow=21, signal=9, min_periods=None, append=True) #                      MACD
    df.ta.ao(high=df["High"], low=df["Low"], window1=5, window2=34, fillna=True, append=True) # Awesome Oscillator 
    df["AO_5_34"] = pd.to_numeric(df["AO_5_34"])
    df.ta.bbands(close=df["Close"], append=True) #                                              Bollinger Bands
    df["feature"] = (df["AO_5_34"] - df["MACDs_8_21_9"]) * df["BBP_5_2.0"] #                    Calculate the feature: (AO minus the MACD signal line), multiplied by the Bollinger Band percent
    df = df.fillna(0)
    return df

otherExample = feature_gen(eth)

otherExample.tail(3)
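
To make the idea of competing operationalizations concrete, the sketch below builds two variants of one concept (price relative to its recent average) that differ only in the lookback window, each packaged as its own `feature` frame.  The `sma_gap` helper, the `smaGap2`/`smaGap3` labels, and the toy prices are all illustrative assumptions, and only plain pandas is used.

```python
import pandas as pd

def sma_gap(df, window):
    """Close relative to its simple moving average over `window` bars, as a feature frame."""
    sma = df["Close"].rolling(window).mean()
    # Early bars have no full window yet, so their gap is set to 0
    gap = ((df["Close"] - sma) / sma).fillna(0.0)
    out = pd.DataFrame({"StartDate": df["StartDate"], "feature": gap})
    return out.astype(str)

# Toy 45 minute bars standing in for one asset's DataFrame
toy = pd.DataFrame({
    "StartDate": pd.date_range("2022-01-01", periods=6, freq="45min").astype(str),
    "Close": [100.0, 101.0, 103.0, 102.0, 105.0, 107.0],
})

# One frame per operationalization, keyed by the label you would submit it under
variants = {f"smaGap{w}": sma_gap(toy, w) for w in (2, 3)}
for name, frame in variants.items():
    print(name)
    print(frame.tail(2))
```

Submitting each variant under its own label lets the rankings, rather than a single backtest, tell you which window carries information.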

The key data-wrangling point is that the final feature you want to submit must be labeled exactly 'feature'.  The SDK's formatting function takes in a time series DataFrame, grabs the column labeled 'feature' along with the row labels, and translates them into a JSON that Judge Research's API understands.  
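
As a loose illustration of that DataFrame-to-JSON translation, the sketch below serializes the two relevant columns as a list of records.  This is only a stand-in: the actual schema is produced by the SDK's own formatting functions and documented in the wiki, and the `frame_to_json` helper and record layout here are assumptions.

```python
import json
import pandas as pd

def frame_to_json(df):
    """Serialize the StartDate and feature columns as a JSON list of records."""
    records = df[["StartDate", "feature"]].to_dict(orient="records")
    return json.dumps(records)

# Any extra columns (here 'Open') are ignored; only 'feature' and the timestamps are kept
z = pd.DataFrame({
    "StartDate": ["2022-01-01 00:00:00"],
    "feature": ["-0.1"],
    "Open": ["100.0"],
})
print(frame_to_json(z))
```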

### Write Up Your Findings

Spend extra time writing up your initial findings so the live findings speak to those initial thoughts months or even years later.




----


<img src = "https://uploads-ssl.webflow.com/625391e03972c921373d60ba/6296d332b7e7cd998bf9035b_judge_logo_white.png" width=400>

## 7.  Reference Your Live Findings

Let's do things a bit out of order:  steps 5 & 6, where we submit our historical & live data, come at the end of this document so that the research itself reads as one cohesive flow.

Add charts from judgeresearch.co -> member portal -> my profile, and paste your iframes in this document.  The below is an example from a team member's profile.

Notice the '%%html' that immediately precedes the html object.  

In [16]:
%%html
<iframe src="http://ec2-3-131-96-30.us-east-2.compute.amazonaws.com:3838/ShinyBuild/ScatterPlot/54ckzfe5tj" width=800 height=500></iframe>

In [17]:
%%html
<iframe src="http://ec2-3-131-96-30.us-east-2.compute.amazonaws.com:3838/ShinyBuild/LinePlot/54ckzfe5tj" width=800 height=500></iframe>
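
If you embed several charts, you may prefer to build the iframe markup in Python instead of pasting it by hand.  The sketch below assembles the same tag as a string, and `iframe_html` is a hypothetical helper.  In a notebook, the string can be rendered with `IPython.display.HTML`, as noted in the final comment.

```python
from html import escape

def iframe_html(src, width=800, height=500):
    """Return an <iframe> tag for the given chart URL."""
    safe_src = escape(src, quote=True)
    return f'<iframe src="{safe_src}" width="{int(width)}" height="{int(height)}"></iframe>'

snippet = iframe_html(
    "http://ec2-3-131-96-30.us-east-2.compute.amazonaws.com:3838/ShinyBuild/ScatterPlot/54ckzfe5tj"
)
print(snippet)

# In a notebook cell:  from IPython.display import HTML; HTML(snippet)
```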

----
## 5.  Submit Your Feature's Historical Time Series to Judge Research

To submit historical data to Judge Research's AI, you:

1.  Prepare the connection.
2.  Choose the information to tell the API:
    - The instance of the GA you are submitting to (e.g. BTC-USD at a 4 hour time period),
    - What you are labeling your feature (e.g. 'x1' so as to not reveal any IP), and
    - The interpolation procedure for when you miss the occasional submission.
3.  Format the payload.
4.  Submit the feature!


In [19]:
JR = JudgeResearch()
JR.with_api_key(JR_API_KEY)

Format the Historical Features:  Below we specify which instance(s) of the AI are going to receive our feature(s).  Each instance is trained on a different dependent variable.   We communicate this to Judge Research by formatting our feature(s) as their JSON.  See the [API documentation in our wiki](https://judgeresearch.notion.site/Use-The-API-5143af17c10f407d91a8860a7c91936e) for information about each argument. 

For example, here we specify this submission is for ETH-USD at the 45 minute interval, and we label it 'ETHt' for ETH at time *t*.   

In [20]:
ft_params = JRParams()
ft_params.mbs= MBS  #                 mbs: 'minute bloc size' - the size of the discrete time bloc.  In the alpha test, options are 45 & 240.  
ft_params.feature_name = "ETHt" #     feature_name: name it to represent the substance of the feature or generically so no one but you knows what it is; e.g. 'x1'
ft_params.dv = 'ETH-USD' #            the dependent variable you are attempting to help explain
ft_params.ipp = "last" #              ipp: interpolation procedure - fill-in value if you miss a submission.  'last' or 'zero'

features = JR.craft_features(ft_params, y1)
payload = JR.format_payload(features)
submit = JR.submit_feature(payload)
#print(submit)
#print(payload)
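
To see what the two `ipp` options imply, the sketch below fills gaps in a toy series both ways with pandas.  It assumes 'last' means carrying the most recent submitted value forward and 'zero' means substituting 0.  The exact server-side behavior is described in the API documentation, so treat this as an illustration only.

```python
import numpy as np
import pandas as pd

# Toy feature series with two missed submissions (NaN)
submitted = pd.Series([0.2, np.nan, 0.5, np.nan], index=["t0", "t1", "t2", "t3"])

filled_last = submitted.ffill()      # assumed meaning of ipp = 'last'
filled_zero = submitted.fillna(0.0)  # assumed meaning of ipp = 'zero'

print(pd.DataFrame({"submitted": submitted, "last": filled_last, "zero": filled_zero}))
```

Under these assumptions, 'last' suits slow-moving level features, while 'zero' suits features such as returns, where 0 reads as "no signal".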

You can submit the same feature to multiple series of the GA.  Here we reuse the code above to submit three features to each of four alpha test series that make one step ahead forecasts, at the 45 minute interval set by `MBS`.

In [None]:
myVarNames = ('ETHt', 'BTCt', 'SOLt') #                    one label per asset in XDict, in the same order
myDVs = ('ETH-USD', 'V-ETH-USD', 'BTC-USD', 'V-BTC-USD') # the dependent variables (series) to submit to

for dv in myDVs:
    # Pair each label with its asset's DataFrame; relies on XDict keeping its insertion order
    for name, asset_df in zip(myVarNames, XDict.values()):
        JR = JudgeResearch()
        JR.with_api_key(JR_API_KEY)
        ft_params = JRParams()
        ft_params.mbs = MBS #              uses the 45 minute bloc size set above
        ft_params.feature_name = name
        ft_params.dv = dv
        ft_params.ipp = "last"
        features = JR.craft_features(ft_params, percDif(asset_df))
        payload = JR.format_payload(features)
        submit = JR.submit_feature(payload)
        print(submit)
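
Before sending anything, it can help to print the full set of submissions the loop will make.  The sketch below is a dry run that touches no API: it only enumerates the label and dependent-variable pairs with `itertools.product`.  The `submission_plan` helper and its dictionary layout are illustrative assumptions.

```python
from itertools import product

feature_names = ("ETHt", "BTCt", "SOLt")
dvs = ("ETH-USD", "V-ETH-USD", "BTC-USD", "V-BTC-USD")

def submission_plan(dvs, feature_names, mbs, ipp="last"):
    """List every (dependent variable, feature) pair a nested submission loop would cover."""
    return [
        {"dv": dv, "feature_name": name, "mbs": mbs, "ipp": ipp}
        for dv, name in product(dvs, feature_names)
    ]

plan = submission_plan(dvs, feature_names, mbs="45")
for row in plan:
    print(row)
print(len(plan), "submissions planned")
```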
    

----

## 6. Live Feature Calculation and Submission

Now it's time to schedule the live submissions.  You don't want to load a full notebook every time, so switch over to Feature_Tutorial_live.py to see the rest of the introductory tutorial.  You can find the functions that script calls in the JudgeHelperFuncs.py module, which we typically load into workspaces as 'jh'.  

There are a few idiosyncrasies that come from scheduling the cron job with Docker's build, which require you to tweak the cron a bit.  We put together a short [page of our wiki](https://judgeresearch.notion.site/Scheduling-Your-Live-Send-Scripts-fc64827cedf4469ab826e1df2c25867f) for your convenience.  
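
A 45 minute interval does not map neatly onto a single cron expression, so a live script may need to work out for itself when the next block begins.  The sketch below computes that time, assuming blocks are aligned to midnight UTC.  That alignment is an assumption not confirmed in this tutorial, and `next_block_start` is a hypothetical helper, so check the wiki page before relying on it.

```python
from datetime import datetime, timedelta, timezone

def next_block_start(now, mbs):
    """Return the start of the next `mbs`-minute block, counting blocks from midnight."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_seconds = (now - midnight).total_seconds()
    blocks_done = int(elapsed_seconds // (mbs * 60))
    return midnight + timedelta(minutes=mbs * (blocks_done + 1))

now = datetime.now(timezone.utc)
upcoming = next_block_start(now, 45)
print("Next 45 minute block starts at", upcoming.isoformat())
print("Seconds to wait:", (upcoming - now).total_seconds())
```

A cron job that fires every 15 minutes could use a check like this to decide whether the current run should submit.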