# Qlib

Qlib is an open-source platform from Microsoft for quantitative investment. It covers the full machine-learning pipeline (data processing, model training, and back-testing) and automates the quant investment workflow end to end. Other features include risk modelling, portfolio optimization, alpha seeking, and order execution. Its authors present it as the first open-source platform to cover the workflow of a modern quantitative researcher in the age of AI, with the aim of helping researchers realize the potential of machine learning in quantitative investment.

# Code Implementation

## Installation:  

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
!python -m pip install pyqlib --user -q

## Load and Prepare Data:

In [None]:
!git clone https://github.com/microsoft/qlib.git 

In [None]:
%cd qlib
!python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

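The workflow code below comes from the [Qlib GitHub repo](https://github.com/microsoft/qlib/blob/main/examples/workflow_by_code.ipynb). Qlib provides the `qrun` tool to automate this workflow (data loading, training, backtesting, and graphical analysis of the results); the cells below run the same steps in code.

Build Qlib's compiled extensions in place inside the cloned source tree, then import the libraries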

In [None]:
!python setup.py build_ext --inplace

In [None]:
import qlib
from qlib.config import REG_CN

from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
from qlib.contrib.evaluate import backtest as normal_backtest, risk_analysis
from qlib.utils import exists_qlib_data, init_instance_by_config, flatten_dict
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

Train model: the model is LightGBM, with hyperparameters tuned using Qlib's Hyperparameter Test Engine (HTE). The dataset is Alpha158 (Qlib also provides a second dataset, Alpha360).

In [None]:
provider_uri = "~/.qlib/qlib_data/cn_data"
qlib.init(provider_uri=provider_uri, region=REG_CN)

In [None]:
market = "csi300"
benchmark = "SH000300"

In [None]:
data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,
}
task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse",
            "colsample_bytree": 0.8879,
            "learning_rate": 0.0421,
            "subsample": 0.8789,
            "lambda_l1": 205.6999,
            "lambda_l2": 580.9768,
            "max_depth": 8,
            "num_leaves": 210,
            "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}
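
Each entry in `task` follows the same pattern: a `class` name, the `module_path` to import it from, and the `kwargs` to pass to its constructor. The sketch below shows how such a dictionary can be turned into an object with the standard library alone. `build_from_config` is a hypothetical helper written for illustration, not Qlib's `init_instance_by_config`, and a standard-library class stands in for the model so that the example runs without Qlib installed. It does not handle nested configs such as the `handler` entry.

```python
import importlib


def build_from_config(config):
    """Instantiate the class described by a {"class", "module_path", "kwargs"} dict.

    Hypothetical helper that mimics the config pattern; not Qlib's implementation.
    """
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# A standard-library class plays the role of the model here.
example_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}

obj = build_from_config(example_config)
print(type(obj).__name__, obj)  # prints "Fraction 3/4"
```

Keeping the class name and its arguments in plain data means the same experiment can be stored, logged, or swapped for another model without touching the training code.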

In [None]:
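# model and dataset initialization
model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])

# start an experiment to train the model
with R.start(experiment_name="train_model"):
    # log the task configuration as flat key/value parameters
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    # save the trained model so that later experiments can load it
    R.save_objects(trained_model=model)
    # keep the recorder id to retrieve the model afterwards
    rid = R.get_recorder().id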
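
The nested `task` dictionary is flattened before logging because experiment parameters are stored as flat key/value pairs. A minimal sketch of the idea follows, assuming nested keys are joined with a dot. The function `flatten` is a hypothetical stand-in written for this example; Qlib's `flatten_dict` may differ in its separator and in how it treats edge cases.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


example = {
    "model": {
        "class": "LGBModel",
        "kwargs": {"loss": "mse", "max_depth": 8},
    }
}

flat_params = flatten(example)
print(flat_params)
```

Each hyperparameter then appears under a key such as `model.kwargs.max_depth`, which makes runs easy to compare side by side.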

Prediction, backtest, and analysis: configure the strategy and the backtest settings

In [None]:
port_analysis_config = {
    "strategy": {
        "class": "TopkDropoutStrategy",
        "module_path": "qlib.contrib.strategy.strategy",
        "kwargs": {"topk": 50, "n_drop": 5},
    },
    "backtest": {
        "verbose": False,
        "limit_threshold": 0.095,
        "account": 100000000,
        "benchmark": benchmark,
        "deal_price": "close",
        "open_cost": 0.0005,
        "close_cost": 0.0015,
        "min_cost": 5,
    },
}
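
The `topk` and `n_drop` settings control how the portfolio is rebalanced: roughly, the strategy holds `topk` stocks and, on each trading day, replaces up to `n_drop` of them. The sketch below is a deliberately simplified reading of that idea for a portfolio that is already full. It always swaps the worst-scoring holdings for the best-scoring unheld names and ignores tradability, limits, and position sizing. `topk_dropout_step` and the sample scores are hypothetical and are not Qlib's `TopkDropoutStrategy` implementation.

```python
def topk_dropout_step(holdings, scores, n_drop):
    """One simplified rebalancing step.

    holdings: list of instrument ids currently held (its length plays the role of topk)
    scores:   dict mapping instrument id -> predicted score (higher is better)
    n_drop:   maximum number of holdings to replace in this step
    """
    held = set(holdings)
    ranked_held = sorted(holdings, key=lambda s: scores[s], reverse=True)
    ranked_unheld = sorted(
        (s for s in scores if s not in held), key=lambda s: scores[s], reverse=True
    )
    # Never sell more names than can be replaced.
    n = min(n_drop, len(ranked_held), len(ranked_unheld))
    to_sell = ranked_held[len(ranked_held) - n:]
    to_buy = ranked_unheld[:n]
    new_holdings = ranked_held[: len(ranked_held) - n] + to_buy
    return new_holdings, to_sell, to_buy


scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.3, "E": 0.8, "F": 0.2}
new_holdings, sold, bought = topk_dropout_step(["A", "B", "C", "D"], scores, n_drop=1)
print(new_holdings, sold, bought)
```

A small `n_drop` relative to `topk` keeps turnover, and therefore trading cost, low, at the price of reacting more slowly to new predictions.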

Run the backtest and the portfolio analysis

In [None]:
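with R.start(experiment_name="backtest_analysis"):
    # load the model saved by the "train_model" experiment
    recorder = R.get_recorder(rid, experiment_name="train_model")
    model = recorder.load_object("trained_model")
    # switch to the recorder of the current backtest experiment
    recorder = R.get_recorder()
    ba_rid = recorder.id
    # generate the prediction signals
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()
    # run the backtest and the portfolio analysis
    par = PortAnaRecord(recorder, port_analysis_config)
    par.generate()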
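
The `open_cost`, `close_cost`, and `min_cost` values in `port_analysis_config` determine what each simulated trade costs during the backtest. A common way to model this, assumed here, is a fee proportional to the traded value with a fixed minimum. `trade_cost` is a hypothetical helper and the trade sizes are made up for illustration; Qlib's exchange simulation may apply further rules.

```python
def trade_cost(trade_value, cost_rate, min_cost):
    """Cost of a single trade: a proportional fee with a fixed floor."""
    return max(trade_value * cost_rate, min_cost)


# Rates taken from the backtest configuration in this notebook.
OPEN_COST, CLOSE_COST, MIN_COST = 0.0005, 0.0015, 5

buy_cost = trade_cost(1_000_000, OPEN_COST, MIN_COST)    # proportional fee applies
small_buy_cost = trade_cost(2_000, OPEN_COST, MIN_COST)  # the minimum fee applies
sell_cost = trade_cost(1_000_000, CLOSE_COST, MIN_COST)  # selling uses the higher rate

print(buy_cost, small_buy_cost, sell_cost)
```

Because selling is three times as expensive as buying under these settings, strategies with high turnover lose a noticeable share of their return to costs.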

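Report: portfolio analysis, backtest return. This cell and the two after it assume that `report_normal_df`, `analysis_df`, `pred_df`, and `label_df` have already been loaded from the backtest recorder, as in the workflow notebook linked above.

In [None]:
from qlib.contrib.report import analysis_position

analysis_position.report_graph(report_normal_df)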

Risk analysis

In [None]:
analysis_position.risk_analysis_graph(analysis_df, report_normal_df)
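
Risk analysis of a backtest typically reports statistics such as annualized return, information ratio, and maximum drawdown of the daily return series. The sketch below computes these from a short list of made-up daily returns. It assumes 252 trading days per year and a compounded equity curve; `risk_summary` is a hypothetical helper, and Qlib's own `risk_analysis` may use a different annualization factor or accumulation mode.

```python
import math


def risk_summary(daily_returns, periods_per_year=252):
    """Basic risk statistics for a series of daily returns (at least two values)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    std = math.sqrt(variance)

    # Maximum drawdown: the largest fall from a running peak of the equity curve.
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_drawdown = min(max_drawdown, equity / peak - 1)

    ratio = mean / std * math.sqrt(periods_per_year) if std > 0 else float("nan")
    return {
        "annualized_return": mean * periods_per_year,
        "information_ratio": ratio,
        "max_drawdown": max_drawdown,
    }


# Made-up daily returns, for illustration only.
summary = risk_summary([0.01, -0.02, 0.015, 0.005, 0.01])
print(summary)
```

Applied to excess returns (portfolio return minus benchmark return), the same statistics describe how the strategy performs relative to the benchmark.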

Score IC: the information coefficient (IC) between predicted scores and labels

In [None]:
import pandas as pd

pred_label = pd.concat([label_df, pred_df], axis=1, sort=True).reindex(label_df.index)
analysis_position.score_ic_graph(pred_label)
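
The information coefficient measures, for each trading day, how well the predicted scores line up with the realized labels across stocks. A minimal sketch follows, assuming IC is the per-day Pearson correlation and Rank IC is the same correlation computed on ranks. The helper names and the sample rows are hypothetical and use only the standard library; they are not what `score_ic_graph` runs internally.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def daily_ic(rows):
    """rows: iterable of (date, score, label). Returns {date: {"ic", "rank_ic"}}."""
    by_date = defaultdict(lambda: ([], []))
    for day, score, label in rows:
        by_date[day][0].append(score)
        by_date[day][1].append(label)
    return {
        day: {
            "ic": pearson(scores, labels),
            "rank_ic": pearson(ranks(scores), ranks(labels)),
        }
        for day, (scores, labels) in by_date.items()
    }


# Made-up scores and labels for three stocks on two days.
rows = [
    ("2017-01-03", 0.3, 0.01), ("2017-01-03", 0.1, -0.02), ("2017-01-03", 0.2, 0.00),
    ("2017-01-04", 0.3, -0.01), ("2017-01-04", 0.1, 0.02), ("2017-01-04", 0.2, 0.00),
]
ic_by_day = daily_ic(rows)
print(ic_by_day)
```

On the first sample day the scores order the stocks exactly as the labels do, so the Rank IC is 1; on the second day the order is reversed, so it is -1.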