### greenflow Tutorial
First, import the necessary modules.

In [1]:
import sys; sys.path.insert(0, '..')
import os
import warnings
import ipywidgets as widgets
from greenflow.dataframe_flow import TaskGraph

warnings.simplefilter("ignore")

In [2]:
! if test -f './data/stock_price_hist.csv.gz' && test -f './data/security_master.csv.gz'; then \
    echo "Dataset is already present. No need to re-download it."; \
  else \
    (cd .. && bash download_data.sh); \
  fi

Dataset is already present. No need to re-download it.


In this tutorial, we use greenflow to run a simple quantitative-finance workflow. The task is fully described in a YAML file:

In [3]:
!head -n 31 ../taskgraphs/simple_trade.gq.yaml

- conf:
    file: notebooks/data/stock_price_hist.csv.gz
  id: stock_data
  inputs: {}
  module: greenflow_gquant_plugin.dataloader
  type: CsvStockLoader
- conf:
    file: notebooks/data/security_master.csv.gz
  id: stock_name
  inputs: {}
  module: greenflow_gquant_plugin.dataloader
  type: StockNameLoader
- conf:
    asset: 4330
  id: stock_selector
  inputs:
    name_map: stock_name.map_data
    stock_in: stock_data.cudf_out
  module: greenflow_gquant_plugin.transform
  type: AssetFilterNode
- conf: {}
  id: ''
  inputs:
    in1: stock_selector.stock_name
    in2: lineplot.lineplot
    in3: barplot.barplot
    in4: sharpe_ratio.sharpe_out
    in5: cumulative_return.cum_return
    in6: stock_data.cudf_out
  module: rapids_modules
  type: Output_Collector


The YAML file describes the computation as a graph, which we can visualize:

In [4]:
task_graph = TaskGraph.load_taskgraph('../taskgraphs/simple_trade.gq.yaml')
task_graph.draw()

GreenflowWidget(sub=HBox(), value=[OrderedDict([('id', 'stock_data'), ('type', 'CsvStockLoader'), ('conf', {'f…
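
The edges in this graph come from each node's `inputs` mapping, where a value such as `stock_name.map_data` names an upstream node and one of its output ports. As a rough sketch of that idea (not greenflow's own code), the snippet below copies three node specs from the YAML excerpt into plain Python dicts, derives the edges, and computes one valid execution order. The helper names `graph_edges` and `topological_order` are made up for this illustration.

```python
from collections import deque

# Node specs copied from the YAML excerpt (conf omitted for brevity).
nodes = [
    {"id": "stock_data", "inputs": {}},
    {"id": "stock_name", "inputs": {}},
    {"id": "stock_selector",
     "inputs": {"name_map": "stock_name.map_data",
                "stock_in": "stock_data.cudf_out"}},
]


def graph_edges(nodes):
    """Return (src_node, src_port, dst_node, dst_port) tuples."""
    edges = []
    for node in nodes:
        for dst_port, source in node["inputs"].items():
            # "stock_name.map_data" -> node "stock_name", port "map_data"
            src_node, src_port = source.split(".", 1)
            edges.append((src_node, src_port, node["id"], dst_port))
    return edges


def topological_order(nodes):
    """Order node ids so each node comes after its upstream nodes (Kahn's algorithm)."""
    ids = [n["id"] for n in nodes]
    indegree = {i: 0 for i in ids}
    downstream = {i: [] for i in ids}
    for src, _, dst, _ in graph_edges(nodes):
        indegree[dst] += 1
        downstream[src].append(dst)
    ready = deque(i for i in ids if indegree[i] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in downstream[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order


edges = graph_edges(nodes)
order = topological_order(nodes)
print(edges)
print(order)
```

Both loaders have no inputs, so they can run first; `stock_selector` runs only after both of its upstream ports are available.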

In [5]:
task_graph.run(formated=True)

Tab(children=(Output(), Output(), Output(), Output(), Output(), Output(), Output(layout=Layout(border='1px sol…

We define a helper function to arrange the output images:

In [6]:
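def fig2img(fig):
    """Render a Matplotlib figure as PNG into an in-memory BytesIO buffer."""
    import io
    buf = io.BytesIO()
    # Explicit format so the bytes match widgets.Image(format='png')
    fig.savefig(buf, format='png')
    buf.seek(0)  # rewind so the caller reads from the start
    return buf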

def plot_figures(result):
    """Stack the bar plot, cumulative-return plot, and signal plot vertically."""
    bar_figure = result['barplot.barplot']
    sharpe_number = result['sharpe_ratio.sharpe_out']

    # Cumulative-return figure, titled with the Sharpe ratio
    cum_return = result['cumulative_return.cum_return']
    cum_return.set_figwidth(10)
    cum_return.suptitle('P & L %.3f' % (sharpe_number), fontsize=16)
    img_cum = widgets.Image(
        value=fig2img(cum_return).read(),
        format='png',
        width=600,
        height=900,
    )

    # Signal figure
    signals = result['lineplot.lineplot']
    signals.set_figwidth(10)
    img_signals = widgets.Image(
        value=fig2img(signals).read(),
        format='png',
        width=600,
        height=900,
    )

    return widgets.VBox([bar_figure, img_cum, img_signals])
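
`fig2img` rewinds the buffer with `buf.seek(0)` before returning it. The short example below shows why, using only the standard library: after a write, the stream position sits at the end of the buffer, so a `read()` returns nothing until the position is reset. The byte string is a stand-in for what `fig.savefig(buf)` would write, not real image data.

```python
import io

buf = io.BytesIO()
buf.write(b"\x89PNG placeholder bytes")  # stand-in for fig.savefig(buf)

at_end = buf.read()       # position is at the end, so this is empty
buf.seek(0)               # rewind to the start of the buffer
from_start = buf.read()   # now the full contents are returned

print(at_end)
print(from_start)
```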

Rerun the graph and pass the result to the `plot_figures` function:

In [7]:
result = task_graph.run()
plot_figures(result)

VBox(children=(Image(value=b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x03 \x00\x00\x02?\x08\x06\x00\x00\x00…

You can change the TaskGraph node parameters interactively and hit the run button to get the updated result. This can also be done programmatically, e.g. by changing the mean-reversion parameters:

In [8]:
o = task_graph.run(
            outputs=(list(result.get_keys())[0:]),
            replace={'stock_data': {"load": {'cudf_out': result['stock_data.cudf_out']}},
                     'mean_reversion': {'conf': {'fast': 1, 'slow': 10}}})
figure_combo = plot_figures(o)
figure_combo

VBox(children=(Image(value=b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x03 \x00\x00\x02?\x08\x06\x00\x00\x00…
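
The `replace` argument maps a node id to the parts of its spec that should be overridden for a single run, leaving the loaded graph itself unchanged. The sketch below illustrates that idea on plain dicts. It is an assumption-laden illustration, not greenflow's implementation: `apply_replace` is a hypothetical helper, the starting `fast` and `slow` values are made up, and the choice to swap a whole top-level key such as `conf` (rather than merge it key by key) is a guess.

```python
import copy


def apply_replace(node_specs, replace):
    """Return copies of the node specs with per-node overrides applied.

    Illustrative only: each top-level key in an override (for example
    'conf') replaces the same key in the copied spec.
    """
    updated = []
    for spec in node_specs:
        new_spec = copy.deepcopy(spec)
        new_spec.update(replace.get(spec["id"], {}))
        updated.append(new_spec)
    return updated


# Hypothetical specs; the 'fast'/'slow' starting values are made up.
specs = [
    {"id": "stock_selector", "conf": {"asset": 4330}},
    {"id": "mean_reversion", "conf": {"fast": 5, "slow": 20}},
]

new_specs = apply_replace(
    specs, {"mean_reversion": {"conf": {"fast": 1, "slow": 10}}}
)
print(new_specs[1]["conf"])  # overridden copy
print(specs[1]["conf"])      # original spec is untouched
```

Working on copies keeps the original specs reusable, which matches how the notebook calls `task_graph.run` repeatedly with different `replace` values.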

Because the computation is GPU-accelerated, we can search hyperparameters interactively. Try changing the `fast` and `slow` moving-average windows with the slider and see if you can improve the result:

In [9]:
para_selector = widgets.IntRangeSlider(value=[10, 30],
                                       min=3,
                                       max=60,
                                       step=1,
                                       description="MA:",
                                       disabled=False,
                                       continuous_update=False,
                                       orientation='horizontal',
                                       readout=True)


def para_selection(change):  # `change` is the dict passed by observe(); unused here
    with out:
        print('run')
        para1 = para_selector.value[0]
        para2 = para_selector.value[1]
        o = task_graph.run(
            outputs=(list(result.get_keys())[0:]),
            replace={'stock_data': {"load": {'cudf_out': result['stock_data.cudf_out']}},
                     'mean_reversion': {'conf': {'fast': para1, 'slow': para2}}})
        figure_combo = plot_figures(o)
        w.children = (w.children[0], figure_combo,)


out = widgets.Output(layout={'border': '1px solid black'})
para_selector.observe(para_selection, 'value')
selectors = widgets.HBox([para_selector])
w = widgets.VBox([selectors])
w

VBox(children=(HBox(children=(IntRangeSlider(value=(10, 30), continuous_update=False, description='MA:', max=6…

In [10]:
out

Output(layout=Layout(border='1px solid black'))
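
The slider explores one `(fast, slow)` pair at a time. The same search could be automated with a plain grid search, sketched below. This is a minimal illustration under stated assumptions: `grid_search` is a hypothetical helper, and `toy_score` is a made-up stand-in for the score. In the notebook, the score would instead come from calling `task_graph.run` with a `replace` entry for `mean_reversion` and reading `'sharpe_ratio.sharpe_out'` from the result.

```python
from itertools import product


def grid_search(score_fn, fast_values, slow_values):
    """Return (best_score, fast, slow) over all pairs with fast < slow."""
    best = None
    for fast, slow in product(fast_values, slow_values):
        if fast >= slow:
            continue  # the fast window must be shorter than the slow one
        score = score_fn(fast, slow)
        if best is None or score > best[0]:
            best = (score, fast, slow)
    return best


def toy_score(fast, slow):
    """Made-up stand-in for a Sharpe ratio, peaked at fast=5, slow=20."""
    return -((fast - 5) ** 2 + (slow - 20) ** 2)


# Window ranges chosen to stay within the slider's 3..60 bounds.
best = grid_search(toy_score, range(3, 31), range(3, 61))
print(best)
```

An exhaustive grid is only practical when each evaluation is cheap, which is the point the tutorial makes about GPU acceleration.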

In [10]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}