Other Timeframes #8

Closed
AdrianP- opened this issue Aug 14, 2017 · 23 comments

Comments

@AdrianP-
Contributor

AdrianP- commented Aug 14, 2017

I know a current limitation is that only Forex 1-minute data is accepted (only Forex?), but my datasets use larger timeframes.

This is the stack trace I get when the timeframe is changed:

Process BTgymServer-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()

  File "/home/adrian/btgym/btgym/server.py", line 405, in run
    episode = cerebro.run(stdstats=True, preload=False, oldbuysell=True)[0]

  File "/usr/local/lib/python3.5/dist-packages/backtrader/cerebro.py", line 1142, in run
    runstrat = self.runstrategies(iterstrat)

  File "/usr/local/lib/python3.5/dist-packages/backtrader/cerebro.py", line 1327, in runstrategies
    self.stop_writers(runstrats)

  File "/usr/local/lib/python3.5/dist-packages/backtrader/cerebro.py", line 1352, in stop_writers
    datainfos['Data%d' % i] = data.getwriterinfo()

  File "/usr/local/lib/python3.5/dist-packages/backtrader/dataseries.py", line 101, in getwriterinfo
    info['Timeframe'] = TimeFrame.TName(self._timeframe)

  File "/usr/local/lib/python3.5/dist-packages/backtrader/dataseries.py", line 57, in TName
    return cls.Names[tframe]
IndexError: list index out of range

Any idea?
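
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the last frame is a plain list lookup, `cls.Names[tframe]`, so the error appears whenever the timeframe value is larger than the list of timeframe names. The sketch below mimics that lookup without importing backtrader; the name table is reproduced from memory and `timeframe_name` and `try_timeframe` are hypothetical helpers.

```python
# Stand-in for backtrader's TimeFrame name table, reproduced from memory
# (an assumption); the real class lives in backtrader/dataseries.py.
TIMEFRAME_NAMES = [
    '', 'Ticks', 'MicroSeconds', 'Seconds', 'Minutes',
    'Days', 'Weeks', 'Months', 'Years', 'NoTimeFrame',
]


def timeframe_name(tframe):
    """Mimic TimeFrame.TName: a plain list lookup by timeframe code."""
    return TIMEFRAME_NAMES[tframe]


def try_timeframe(tframe):
    """Return the name for a code, or the error text if the code is out of range."""
    try:
        return timeframe_name(tframe)
    except IndexError as exc:
        return 'IndexError: {}'.format(exc)


print(try_timeframe(4))   # a code inside the table
print(try_timeframe(60))  # e.g. a bar length in minutes passed as the code
```

If this is what is happening, the timeframe argument is a small unit code rather than a bar length; in backtrader the bar length is normally carried by a separate `compression` parameter.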

@Kismuz
Owner

Kismuz commented Aug 14, 2017

  1. ...it seems this happens when backtrader tries to run the strategy. The only way to find out what is wrong is to first run a dummy strategy manually in backtrader, the "traditional way", over your data to make sure data parsing works correctly, and only then plug it into btgym.
    What data are you feeding in?

  2. It actually does not matter whether the data is Forex or not; what matters is the input data format. The built-in data parser was made to accept data from one particular source. You can find the parsing configuration in
    https://github.com/Kismuz/btgym/blob/master/btgym/datafeed.py
    under the "CSV to Pandas Params" section.

@AdrianP-
Contributor Author

This also happens with DAT_ASCII_EURUSD_M1_2016.csv.

  1. In fact, I added a new function to use all the data:
    def sequential_dataset(self):

        episode = BTgymDataset(**self.params)
        episode.filename = self.filename
        self.log.debug('Episode filename: <{}>.'.format(episode.filename))
        episode.data = self.data
        return episode

However, there is a misconception. The value in algo-trading isn't in the OHLC data itself; it is in the indicators you can calculate from that data.
At first glance the code is not prepared for indicators. Have you worked on that?

@Kismuz
Owner

Kismuz commented Aug 15, 2017

You can use any indicators, or any value calculated over the raw OHLC data, by subclassing the base BTgymStrategy and defining your own set_datalines(), get_state() and get_reward() methods:

  1. Define all indicators (as backtrader indicators) in set_datalines(), which is invoked once before the episode runs;
  2. Define any function of those indicators in get_state(), which is invoked on every env.step(), and use its output as the state observation.
  3. Same for reward shaping.
    The main idea was to incorporate the backtrader workflow, with its whole host of indicators and any custom functions, to define the desired state observation. Refer to the backtrader docs on how to set up custom dataline feeds such as indicators.
    Defaults are (from strategy.py):
    def get_state(self):
        """
        One can override this method,
        defining necessary calculations and returning an arbitrarily shaped tensor.
        It's possible either to compute the entire featurized environment state
        or just pass raw price data to the RL algorithm featurizer module.
        Note1: 'data' refers to bt.strategy datafeeds and should be treated as such.
        Datafeed Lines that are not default to BTgymStrategy should be explicitly defined in
        define_datalines().
        NOTE: while iterating, ._get_raw_state() method is called just before this one,
        so variable `self.raw_state` is fresh and ready to use.
        """
        return self.raw_state

    def get_reward(self):
        """
        Default reward estimator.
        Computes reward as log utility of current to initial portfolio value ratio.
        Returns scalar <reward, type=float>.
        Same principles as for state composer apply. Can return raw portfolio
        performance statistics or enclose entire reward estimation algorithm.
        """
        return float(np.log(self.stats.broker.value[0] / self.env.broker.startingcash))
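
To make the default reward concrete, here is a minimal worked sketch of the same log-utility formula, detached from backtrader. The function name `log_utility_reward` and the account values are made up for illustration.

```python
import math


def log_utility_reward(current_value, starting_cash):
    """Log of the ratio of current portfolio value to starting cash."""
    return float(math.log(current_value / starting_cash))


# Hypothetical numbers: an account that started with 10000.
for value in (10000.0, 10500.0, 9500.0):
    print(value, log_utility_reward(value, 10000.0))
```

The reward is zero while the portfolio is at its starting value, positive above it and negative below it, and roughly equal to the fractional gain for small moves.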

Does that answer your question?

@Kismuz Kismuz closed this as completed Aug 19, 2017
@kfeeeeee

You can use any indicators, or any value calculated over the raw OHLC data, by subclassing the base BTgymStrategy and defining your own set_datalines(), get_state() and get_reward() methods:

Define all indicators (as backtrader indicators) in set_datalines(), which is invoked once before the episode runs;
Define any function of those indicators in get_state(), which is invoked on every env.step(), and use its output as the state observation.

Could you please explain this in more detail? I am not sure how I should actually set those datalines. In my case, the state consists of OHLC plus some precomputed indicators that are stored in a csv file. How would I set that data as a dataline and actually feed it (in addition to the raw price state) to the RL algorithm?

Any help is appreciated! Thanks.

@Kismuz Kismuz reopened this Sep 19, 2017
@Kismuz
Owner

Kismuz commented Sep 19, 2017

@kfeeeeee,
There are two parts:

  1. The comment you have cited is for the case when you only have OHLC data stored in a csv file and want to compute some extra statistics or features as the observation state. In this case (example):
import numpy as np
import backtrader.indicators as btind
from btgym import BTgymStrategy  # import path assumed


class MyStrategy(BTgymStrategy):
    """
    Example subclass of BT server inner computation strategy.
    """

    def __init__(self, **kwargs):
        super(MyStrategy, self).__init__(**kwargs)

        # Use backtrader indicators to add four moving averages
        # (computed on the feed's default line, which in backtrader is `close`):
        self.data.sma_4 = btind.SimpleMovingAverage(self.datas[0], period=4)
        self.data.sma_8 = btind.SimpleMovingAverage(self.datas[0], period=8)
        self.data.sma_16 = btind.SimpleMovingAverage(self.datas[0], period=16)
        self.data.sma_32 = btind.SimpleMovingAverage(self.datas[0], period=32)

        # Time-embedding dimension shortcut:
        self.dim_0 = self.p.state_shape['raw_state'].shape[0]

        # Service SMA, needed to correctly get the initial embedded values:
        self.data.dim_sma = btind.SimpleMovingAverage(
            self.datas[0],
            period=(32 + self.dim_0)
        )
        self.data.dim_sma.plotinfo.plot = False

    def get_state(self):
        """
        Overrides default method to compute state observation as a dictionary of two [dim_0, 4] arrays
        of OHLC prices and moving averages.
        """
        x = np.stack(
            [
                np.frombuffer(self.data.sma_4.get(size=self.dim_0)),
                np.frombuffer(self.data.sma_8.get(size=self.dim_0)),
                np.frombuffer(self.data.sma_16.get(size=self.dim_0)),
                np.frombuffer(self.data.sma_32.get(size=self.dim_0)),
            ],
            axis=-1
        )

        self.state['raw_state'] = self.raw_state
        self.state['model_input'] = x

        return self.state

Later, when instantiating environment (example):

import backtrader as bt
from gym import spaces

time_embed_dim = 16
state_shape = {
    'raw_state': spaces.Box(low=-100, high=100, shape=(time_embed_dim, 4)),
    'model_input': spaces.Box(low=-100, high=100, shape=(time_embed_dim, 4)),
}

MyCerebro = bt.Cerebro()

MyCerebro.addstrategy(
    MyStrategy,
    state_shape=state_shape,
    portfolio_actions=('hold', 'buy', 'sell', ),
    drawdown_call=5,  # max to lose, in percent of initial cash
    target_call=8,  # max to win, same
    skip_frame=10,
)

etc...
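
To show how the stacking in get_state() produces a [dim_0, 4] array, here is a small standalone sketch. It assumes that a backtrader line's .get(size=...) returns a buffer of doubles; `array('d')` stands in for that buffer here, and the indicator values are made up.

```python
from array import array

import numpy as np

dim_0 = 16  # time-embedding length, matching time_embed_dim above

# Four made-up indicator windows; array('d') stands in for what
# a backtrader line's .get(size=dim_0) is assumed to return.
windows = [
    array('d', [float(period + step) for step in range(dim_0)])
    for period in (4, 8, 16, 32)
]

# frombuffer views each window as a float64 vector of shape (dim_0,);
# stacking on the last axis puts one indicator per column.
x = np.stack([np.frombuffer(w) for w in windows], axis=-1)

print(x.shape)
print(x[0])  # first time step: one value per indicator
```

The resulting shape has to agree with the shape declared for 'model_input' in state_shape, so adding a fifth indicator means changing that declaration to (time_embed_dim, 5) as well.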




@Kismuz
Owner

Kismuz commented Sep 19, 2017

  2. When the data file already holds additional stats as separate columns, one needs to set the correct parsing configuration for the given data, found in datafeed.py. After that, all additional data will be available inside the strategy as standard backtrader datalines (self.datas by default) and can be retrieved as described above.

@Kismuz
Owner

Kismuz commented Sep 19, 2017

Part 2 example:
Suppose the file contains one-minute OHLC bars, volume and two custom indicators. Then:

from btgym import BTgymDataset

params = dict(
    # CSV to Pandas params.
    sep=';',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open', 'high', 'low', 'close', 'volume', 'my_indicator_1', 'my_indicator_2'],

    # Pandas to BT.feeds params:
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    my_indicator_1=6,
    my_indicator_2=7,

    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3, ],  # Only weekdays from the list will be used for episode start.
    start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
    episode_len_days=1,  # Maximum episode time duration in days, hours, minutes:
    episode_len_hours=23,
    episode_len_minutes=55,
    time_gap_days=0,  # Maximum data time gap allowed within sample in days, hours. Thereby,
    time_gap_hours=5,  # if set to be < 1 day, samples containing weekend and holiday gaps will be rejected.
)

MyDataset = BTgymDataset(
    filename='<your_filename.csv>',
    **params,
)
# Check:

MyDataset.read_csv()
MyDataset.describe()

# Pass it to environment ...

@kfeeeeee

Perfect, that is exactly what I needed. Keep up the good work!

@kfeeeeee

Sorry for asking again. You wrote:

After that all additional data will be available inside strategy as standard backtrader datalines (self.datas by default) and can be retrieved as described above.

However, when I try your code above, my_indicator_1 and my_indicator_2 are not accessible in the strategy at all: self.datas is a list of length 1, and self.datas[0] contains only the OHLC data.
Am I missing something?

@Kismuz
Owner

Kismuz commented Sep 19, 2017

@kfeeeeee,
I apologise for giving an incorrect answer; it turned out to be tricky.
In short:
I have no working solution for extended datafeeds yet.
In detail:
in principle, one can extend OHLC data as described here:
https://www.backtrader.com/docu/extending-a-datafeed.html

But as far as I understand, that applies to the generic CSV datafeed, while btgym uses btfeeds.PandasDirectData internally.
After making minor tweaks (you need to update btgym to make use of them) I am able to access custom data with code like this:

import backtrader.feeds as btfeeds
from btgym import BTgymDataset

class ExtraPandasDirectData(btfeeds.PandasDirectData):
    lines = ('my_id_1', 'my_id_2')  # extra datalines

class ExtraLinesDataset(BTgymDataset):

    def to_btfeed(self):
        """
        Overrides default method to add custom datalines.
        Performs BTgymDataset-->bt.feed conversion.
        Returns bt.datafeed instance.
        """
        try:
            assert not self.data.empty
            btfeed = ExtraPandasDirectData(
                dataname=self.data,
                timeframe=self.timeframe,
                datetime=self.datetime,
                open=self.open,
                high=self.high,
                low=self.low,
                close=self.close,
                volume=self.volume,
                my_id_1=6,  # Same lines
                my_id_2=7,
            )
            btfeed.numrecords = self.data.shape[0]
            return btfeed
    
        except:
            msg = 'BTgymDataset instance holds no data. Hint: forgot to call .read_csv()?'
            self.log.error(msg)
            raise AssertionError(msg)

params = dict(
    # CSV to Pandas params.
    sep=';',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open', 'high', 'low', 'close', 'volume', 'my_id_1', 'my_id_2'],

    # Pandas to BT.feeds params:
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    openinterest=-1,
    my_id_1=6,
    my_id_2=7,

    # Random-sampling params:
    # .....omitted for brevity....
)
            
MyDataset = ExtraLinesDataset(filename='my_file.csv', **params)

After that, the lines are accessible inside the strategy as self.data.my_id_1, but for reasons I can't yet understand they contain only nan values. Maybe it is a bug in my code, maybe it is related to the PandasDirectData structure.

@kfeeeeee

@Kismuz
Thanks for your reply. I was able to reproduce the bug with the nan values, and I assume it is due to PandasDirectData. For now (though the backtrader documentation is not clear on this, in my opinion) I think it is only possible to extend the loaded data using GenericCSVData.

@kfeeeeee

After reading this thread:
https://community.backtrader.com/topic/158/how-to-feed-backtrader-alternative-data/4

I found a working solution like this:

class ExtraPandasDirectData(btfeeds.PandasDirectData):
    lines = ('width',)
    params = (
        ('width',2),
    )

    datafields = btfeeds.PandasData.datafields + (['width'])

@Kismuz
Owner

Kismuz commented Sep 20, 2017

Aha! Fine and simple.

@joaosalvado10

joaosalvado10 commented Dec 21, 2017

Hi @kfeeeeee (@Kismuz),
I used your suggestion and was able to include a new feature that is already present in my csv file; however, I am not able to use it in the model. What needs to be done in order to use the new features in the model?
I added a new data channel and added this new data to the np.stack as well; finally, I changed the shape of external to (time_dim,1,4), which freezes the run.

Thank you

@Kismuz
Owner

Kismuz commented Dec 21, 2017

@joaosalvado10,

I added a new data channel and added this new data to the np.stack as well; finally, I changed the shape of external to (time_dim,1,4), which freezes the run.

  • this is the correct and simplest way, and it should work. Can you be more specific about the errors and terminal output you get? Setting verbose=2 for both the environment and the launcher can also hint at what's going wrong.

@Kismuz Kismuz reopened this Dec 21, 2017
@joaosalvado10

joaosalvado10 commented Dec 21, 2017

@Kismuz I am running the Unreal example.
Actually, there is no error, but the training never starts; it keeps creating master sessions forever.
Check out my log file, with verbose=2 on both the launcher and the env:

https://wetransfer.com/downloads/2748b17526fb4f1fc90603df59dafd9f20171221172751/6a1910e00d4a3f5e869700b6176c2e6b20171221172751/e5eb01

@Kismuz
Owner

Kismuz commented Dec 21, 2017

Does running your environment manually (doing reset() and step() before putting it in the AAC framework) work correctly?
If yes, the most probable cause is an error in TF graph execution by one of the workers. It can be silent during distributed work: no terminal output, just freezing. The behaviour is the same if the error occurs while defining the graph itself, but in this case I can see that this point has been passed.
The remedy is to add debug check strings all over the related files (aac.py/BaseAAC and train.py/env_runner) to see how far execution gets, which is what I usually do in such cases.

I can only help by replicating the error at my workplace with the full code in hand.
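
As a rough illustration of such debug check strings, here is a minimal sketch using only the standard library. The `checkpoint` helper and the stage functions `build_graph` and `open_session` are hypothetical stand-ins, not btgym code; the idea is that a stage that hangs leaves a START line with no matching END.

```python
import logging
from contextlib import contextmanager

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(processName)s %(message)s',
)
log = logging.getLogger('debug_trace')


@contextmanager
def checkpoint(name):
    """Log the start and outcome of a stage; a hang leaves START without END."""
    log.debug('START %s', name)
    try:
        yield
    except Exception:
        # Errors that would otherwise be swallowed are logged with a traceback.
        log.exception('FAILED %s', name)
        raise
    else:
        log.debug('END %s', name)


# Hypothetical stand-ins for the real stages of a worker.
def build_graph():
    return 'graph'


def open_session(graph):
    return 'session for ' + graph


with checkpoint('build_graph'):
    graph = build_graph()
with checkpoint('open_session'):
    session = open_session(graph)
```

Including the process name in the log format helps when several workers write to the same terminal.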

@joaosalvado10

joaosalvado10 commented Dec 21, 2017

Does running your environment manually (doing reset() and step() before putting it in the AAC framework) work correctly?

Yes, I performed the reset and the step manually and it went well.
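
For reference, a manual check of this kind can be sketched roughly as below. It assumes the classic gym interface, where step() returns (observation, reward, done, info); `DummyEnv` and `smoke_test` are hypothetical and only there so the snippet runs on its own, with the real environment taking the place of `DummyEnv`.

```python
class DummyEnv:
    """Hypothetical stand-in for a gym-style environment; replace with the real one."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return {'raw_state': [0.0] * 4}

    def step(self, action):
        self.t += 1
        obs = {'raw_state': [float(self.t)] * 4}
        done = self.t >= self.episode_len
        return obs, 0.0, done, {}


def smoke_test(env, action, max_steps=100):
    """Reset the environment and step it until the episode ends; return the step count."""
    env.reset()
    for n in range(1, max_steps + 1):
        obs, reward, done, info = env.step(action)
        if done:
            return n
    return max_steps


print(smoke_test(DummyEnv(), action=0))
```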

If yes, the most probable cause is an error in TF graph execution by one of the workers. It can be silent during distributed work: no terminal output, just freezing.

Maybe that is the cause, but I can't see any apparent reason for it to happen.
It seems to be freezing in worker.py on this line of code:

with sv.managed_session(server.target, config=config) as sess, sess.as_default():

@joaosalvado10

Hello @Kismuz, have you managed to fix this so that more than 3 features can be used?
Thank you

@Kismuz
Owner

Kismuz commented Dec 27, 2017

@joaosalvado10,
no, since I don't have your code. I have pushed one of my development branches here:
https://github.com/Kismuz/btgym/tree/reserach_dev_strat_4_11
Take a look at the code for strategy classes #4_7 ... 4_11 in reserarch/strategy_4.py: those use different numbers of features and work fine.
Here is a copy of my working notebook with a running setup:
https://github.com/Kismuz/btgym/tree/reserach_dev_strat_4_11/develop_notebook

  • please don't use this branch for regular work, since the stackedLSTM architecture is still under development and can be unstable.

@joaosalvado10

Hello, thanks for the help. Here is my code.
I am running test_btgym.py, which is the Unreal example.
If you have some time, please have a look.
https://github.com/joaosalvado10/btgym/tree/master/btgym

Thank you

@Kismuz
Owner

Kismuz commented Dec 29, 2017

@joaosalvado10,
to submit code for review, comments or local checking, please follow the general git guidelines:

https://help.github.com/categories/collaborating-with-issues-and-pull-requests/

https://gist.github.com/Chaser324/ce0505fbed06b947d962

@joaosalvado10

@Kismuz I was preparing everything for the pull request, and after merging everything I realized that the code works now. There was probably some update that I had not fetched.
Thank you for the help!

@Kismuz Kismuz closed this as completed Jan 2, 2018