Improved data module #127

Merged
merged 43 commits into from
Feb 15, 2024

enzbus (Collaborator) commented Jan 22, 2024

Better pull request (replaces #125)

  • refactor data.py into a data/ module (keeping git history!)
  • YahooFinance now inherits from the OHLCV base class
  • data cleaning factored out
  • better data cleaning: first use adjusted close prices, which are more accurate, then convert to open-to-open total returns
  • use filtering based on the absolute value of each log return versus the median absolute value in a window around it, both for close-to-close and open-to-close returns, ...
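The median-based filter in the last bullet can be sketched as follows. This is a minimal stdlib-only illustration, not the code merged in this PR: the function name `flag_outlier_returns`, the `window` size and the `threshold` multiplier are hypothetical, and it only handles close-to-close log returns computed from a plain list of prices.

```python
import math
from statistics import median


def flag_outlier_returns(prices, window=5, threshold=10.0):
    """Hypothetical sketch: flag close-to-close log returns whose absolute
    value exceeds `threshold` times the median absolute log return in a
    centered window around them (window and threshold are assumptions)."""
    logrets = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    abs_lr = [abs(r) for r in logrets]
    half = window // 2
    flags = []
    for i, a in enumerate(abs_lr):
        # centered window around return i (truncated at the series edges)
        local = abs_lr[max(0, i - half): i + half + 1]
        med = median(local)
        # a return is suspicious if it dwarfs the typical size of its neighbors
        flags.append(med > 0 and a > threshold * med)
    return flags


flags = flag_outlier_returns([100, 101, 100, 101, 100, 1000, 100, 101, 100])
```

A single bad print (the 1000 above) produces two outlier returns, the jump into it and the jump out of it, so both get flagged while the ordinary 1% moves around them do not. Using the median rather than the mean keeps the reference scale from being inflated by the outlier itself.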

enzbus merged commit 67e1917 into master on Feb 15, 2024
16 checks passed
enzbus added a commit that referenced this pull request Feb 17, 2024
A few changes to the SymbolData and YahooFinance classes following PR #127 (improved data module), plus many new test cases. Yesterday there was an incident with one of the larger example strategies: the online update of a symbol failed to forward-fill (ffill) correctly. This should now be fixed. Cleaning and filling on update in YahooFinance have been improved in various ways and tested much more thoroughly. Note that this is specific to *updating* already-downloaded data, not to downloading from scratch (which was already tested with #127).
enzbus added a commit that referenced this pull request Feb 21, 2024
This minor release contains some new features, as expected under semantic versioning. The two major pull requests merged are #126 and #127.

We significantly enhanced the built-in forecaster classes, such as HistoricalMeanReturn and HistoricalCovariance, which Cvxportfolio objects (ReturnsForecast and FullCovariance, for example) use as default forecasters. These now take new “rolling” and “half_life” parameters, which specify, respectively, the length of the historical period used for estimation (by default, all history available at each point in time) and the half-life of the exponential smoothing applied (by default, no smoothing). These parameters apply to all forecasters. Internally, these more complex estimations require minimal extra computation, because at each point in time the estimate from the previous step is updated rather than recomputed. The parameters can be specified by instantiating the forecasters explicitly before passing them to each Cvxportfolio object (as explained in the documentation).
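The incremental idea behind “rolling” and “half_life” can be illustrated with a small estimator that updates its previous state at each step instead of recomputing from scratch. This is a hypothetical sketch, not Cvxportfolio's API: the class name `HistoricalMeanSketch` is invented, time is measured in integer steps (an assumption made for simplicity), and `rolling` must be at least 1.

```python
import math
from collections import deque


class HistoricalMeanSketch:
    """Hypothetical mean estimator with an optional rolling window and
    optional exponential smoothing, updated incrementally at each step.
    Time is counted in integer steps (an assumption of this sketch)."""

    def __init__(self, rolling=math.inf, half_life=math.inf):
        self.rolling = rolling  # window length in steps (>= 1)
        # per-step decay factor; 1.0 means no smoothing
        self.decay = 1.0 if half_life == math.inf else 0.5 ** (1.0 / half_life)
        self.num = 0.0  # weighted sum of observations still in the window
        self.den = 0.0  # sum of their weights
        self.window = deque()  # (step, value) pairs still in the window
        self.step = -1

    def update(self, value):
        self.step += 1
        # age every existing weight by one step, then add the new observation
        self.num = self.num * self.decay + value
        self.den = self.den * self.decay + 1.0
        self.window.append((self.step, value))
        # subtract observations that just fell out of the rolling window,
        # using the weight they carry at the current step
        while self.window and self.step - self.window[0][0] >= self.rolling:
            old_step, old_value = self.window.popleft()
            weight = self.decay ** (self.step - old_step)
            self.num -= weight * old_value
            self.den -= weight
        return self.num / self.den


full = HistoricalMeanSketch()
for x in (1.0, 2.0, 3.0):
    full_mean = full.update(x)

windowed = HistoricalMeanSketch(rolling=2)
for x in (1.0, 2.0, 3.0):
    windowed_mean = windowed.update(x)
```

Each update costs a constant amount of work plus one subtraction per observation leaving the window, which is the kind of saving the paragraph refers to. A production implementation would also need to handle missing values and calendar time rather than step counts.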

Thanks to the improved forecasters, we enhanced the trading cost classes TransactionCost and HoldingCost, clearing up some minor discrepancies with their original (2016) implementation. For TransactionCost, most notably, it is now possible to provide all forecasts without relying on internal forecasters. This will make it easier to translate the code of the original examples to the stable API. The changes are backward compatible with the documented interfaces of the previous versions. One interesting change we introduced on the cost objects is that now the same code paths are used to compute the cost values in simulation and optimization, further improving safety and auditability of the library.

Finally, we rewrote the data cleaning and data quality check code that the YahooFinance default data interface applies to US and international stock market data. This has been partially factored out into a new base class for open-low-high-close-volume data, which we expect to use for other (future) data interfaces. The cleaning first removes observations that are impossible for stock market data (non-positive prices, …), then removes unlikely data (prices that imply 100x returns, …) with threshold-based tests (all thresholds can be modified by the user), and finally fills missing values while being careful to avoid look-ahead biases (missing open prices are filled with the close price from the day before, …). A new example, “data_cleaning.py”, shows exactly what is done for any given name. The logs are also highly informative. All this is thoroughly tested, both by the unit tests and by the example strategies that run every trading day, a few minutes after the market open, on both the US and now also international stock markets: we added a daily strategy that runs on the FTSE100 universe of the London stock market.
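The three cleaning stages described above (impossible values, unlikely values, look-ahead-safe filling) can be sketched on a list of daily rows. This is a simplified stdlib-only illustration, not the YahooFinance implementation: the function `clean_ohlc`, the row format (dicts with "open" and "close", `None` meaning missing), the `max_ratio` parameter and the rule for filling a missing close are all assumptions.

```python
def clean_ohlc(rows, max_ratio=100.0):
    """Hypothetical three-stage cleaning sketch. Each row is a dict with
    'open' and 'close' (None = missing); input rows are not modified."""
    cleaned = [dict(row) for row in rows]

    # 1) impossible observations: non-positive prices become missing
    for row in cleaned:
        for key in ("open", "close"):
            if row[key] is not None and row[key] <= 0:
                row[key] = None

    # 2) unlikely observations: drop a close implying a move larger than
    #    max_ratio (up or down) versus the last valid close
    last_close = None
    for row in cleaned:
        close = row["close"]
        if close is not None and last_close is not None:
            ratio = close / last_close
            if ratio > max_ratio or ratio < 1.0 / max_ratio:
                row["close"] = None
                close = None
        if close is not None:
            last_close = close

    # 3) fill missing values using only information available at that time
    prev_close = None
    for row in cleaned:
        if row["open"] is None and prev_close is not None:
            row["open"] = prev_close  # missing open <- previous day's close
        if row["close"] is None:
            row["close"] = row["open"]  # assumption: fall back to same-day open
        prev_close = row["close"]
    return cleaned


raw = [
    {"open": 10.0, "close": 10.5},
    {"open": None, "close": 11.0},
    {"open": -1.0, "close": 11.5},    # impossible open
    {"open": 11.4, "close": 2000.0},  # implausible ~174x jump in the close
    {"open": 11.6, "close": 12.0},
]
out = clean_ohlc(raw)
```

Filling a missing open with the previous close only uses prices observed before that open, so no future information leaks into the filled value.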

Other minor edits and bug fixes are also present.
enzbus deleted the improved_data_module branch on July 13, 2024, 19:06