Improved data module #127

enzbus · 2024-01-22T07:49:03Z

Better pull request (replaces #125)

refactored data.py into the data/ module (keeping git history!)
YahooFinance now inherits from an OLHCV base class
data cleaning factored out
better data cleaning: first use adjusted close prices, which are more accurate, then convert to open-to-open total returns
filtering based on the absolute value of each log-return versus the median absolute value in a window around it, both for close-to-close and open-to-close, ...
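
The windowed filter in the last item can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code merged in this PR: the function names, the window size and the threshold are hypothetical, and it uses plain Python lists rather than the pandas objects the library works with.

```python
import math
from statistics import median


def log_returns(prices):
    """Log-returns between consecutive prices; the first entry is None."""
    out = [None]
    for prev, cur in zip(prices, prices[1:]):
        out.append(math.log(cur / prev))
    return out


def flag_anomalies(prices, half_window=5, threshold=20.0):
    """Indices whose |log-return| exceeds `threshold` times the median
    |log-return| of the other observations in a centered window.

    `half_window` and `threshold` are hypothetical defaults.
    """
    rets = log_returns(prices)
    flagged = []
    for i in range(1, len(rets)):
        lo = max(1, i - half_window)
        hi = min(len(rets), i + half_window + 1)
        # The observation under test is excluded from its own window.
        neighbors = [abs(rets[j]) for j in range(lo, hi) if j != i]
        if not neighbors:
            continue
        scale = median(neighbors)
        if scale > 0 and abs(rets[i]) > threshold * scale:
            flagged.append(i)
    return flagged


# One mis-scaled price (index 4) produces two anomalous returns:
# the one into the bad price and the one out of it.
prices = [100.0, 101.0, 100.5, 101.5, 1.015, 102.0, 101.0, 102.5, 103.0]
print(flag_anomalies(prices, half_window=3, threshold=10.0))  # prints [4, 5]
```

Using the median rather than the mean as the local scale keeps the bad observations themselves from inflating the reference level, which is presumably why a median-based comparison was chosen.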

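used script git-split.sh

    #!/bin/sh
    # used https://stackoverflow.com/questions/3887736/keep-git-history-when-splitting-a-file
    # POSIX test ([ ]) is used because the shebang is /bin/sh, not bash
    if [ $# -ne 2 ] ; then
        echo "Usage: git-split.sh original copy"
        exit 0
    fi
    # rename original to copy, and remember that commit
    git mv "$1" "$2"
    git commit -n -m "Split history $1 to $2"
    REV=$(git rev-parse HEAD)
    # go back, rename original to a temporary name, then merge the two renames
    git reset --hard HEAD^
    git mv "$1" temp
    git commit -n -m "Split history $1 to $2"
    git merge "$REV"
    git commit -a -n -m "Split history $1 to $2"
    # restore the original name; both files now carry the full history
    git mv temp "$1"
    git commit -n -m "Split history $1 to $2"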

test cvxportfolio/tests/test_data.py TestData.test_yfinance_download became fragile, need to understand why

names that get historical data trimmed down are HUBB, JCI, NVR, and seem reasonable

A few changes to the SymbolData and YahooFinance classes following PR #127 (improved data module), and many new test cases. There was an incident yesterday with one of the larger example strategies: the online update of a symbol failed to forward-fill (ffill) correctly. This should now be fixed. Cleaning and filling on update of YahooFinance have been improved in various ways and are tested much more thoroughly. Note that this is specific to *updating* already downloaded data, not downloading from scratch (that was already tested with #127).

This minor release contains some new features, as expected under semantic versioning. The two major pull requests merged are #126 and #127.

We significantly enhanced the built-in forecaster classes, like HistoricalMeanReturn and HistoricalCovariance, which are used by Cvxportfolio objects (ReturnsForecast and FullCovariance, for example) as default forecasters. These now take new “rolling” and “half_life” parameters, which specify, respectively, the length of the historical period used for estimation (by default, all history available at each point in time) and the half-life of the exponential smoothing applied (by default, no smoothing). These apply to all forecasters. Internally, these more complex estimations are done with minimal extra computation (by updating, at each point in time, the estimate from the previous step). These parameters can be specified by instantiating the forecasters explicitly before passing them to each Cvxportfolio object (as is explained in the documentation).

Thanks to the improved forecasters, we enhanced the trading cost classes TransactionCost and HoldingCost, clearing up some minor discrepancies with their original (2016) implementation. For TransactionCost, most notably, it is now possible to provide all forecasts without relying on internal forecasters. This will make it easier to translate the code of the original examples to the stable API. The changes are backward compatible with the documented interfaces of the previous versions. One interesting change we introduced on the cost objects is that the same code paths are now used to compute the cost values in simulation and optimization, further improving safety and auditability of the library.

Finally, we re-wrote the data cleaning and data quality check code applied by the YahooFinance default data interface to US and international stock market data.
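
As a rough illustration of the incremental estimation with a half-life described above for the forecasters, the sketch below keeps a running exponentially weighted mean and updates it from the previous step. It is not the library's implementation: the class and function names are hypothetical, and the half-life is assumed here to be a number of periods, which may differ from the type the library expects.

```python
import math


class EwmMean:
    """Exponentially weighted running mean, updated one observation at a time."""

    def __init__(self, half_life):
        # The weight of an observation halves every `half_life` steps.
        self.decay = math.exp(-math.log(2.0) / half_life)
        self.numerator = 0.0
        self.denominator = 0.0

    def update(self, value):
        """Discount the previous sums, add the new observation, return the mean."""
        self.numerator = self.decay * self.numerator + value
        self.denominator = self.decay * self.denominator + 1.0
        return self.numerator / self.denominator


def ewm_mean_from_scratch(values, half_life):
    """Same estimate recomputed over the full history, for comparison."""
    decay = math.exp(-math.log(2.0) / half_life)
    n = len(values)
    weights = [decay ** (n - 1 - k) for k in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


returns = [0.01, -0.02, 0.015, 0.003, -0.007]
estimator = EwmMean(half_life=2)
incremental = [estimator.update(r) for r in returns]

# The O(1) update agrees with the full recomputation at the last step.
assert abs(incremental[-1] - ewm_mean_from_scratch(returns, 2)) < 1e-12
```

Each update costs a constant amount of work regardless of how much history has accumulated, which is the sense in which the extra computation stays minimal.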
The cleaning code has been partially factored out into a new base class for open-low-high-close-volume data, which we expect to use for other (future) data interfaces. The cleaning is done in three steps: first removing impossible observations for stock market data (non-positive prices, …), then removing unlikely data (prices that imply 100x returns, …) with threshold-based testing (where all thresholds can be modified by the user), and finally filling missing values, being careful to avoid look-ahead biases (open prices that are missing get filled with close prices from the day before, …). A new example, “data_cleaning.py”, can be used to see exactly what is done on each given name. Logs are also highly informative. All this is thoroughly tested both by the unit tests and by the example strategies that are run on every trading day a few minutes after the open time of both the US and now also international stock markets: we added a daily strategy that is run on the FTSE100 universe of the London stock market. Other minor edits and bug fixes are also present.
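
The three cleaning steps described above can be sketched as follows. This is a simplified illustration, not the code in the library: the function name, the row layout (dicts with "open" and "close" keys) and the `max_ratio` default are assumptions, and only the checks named in the text are covered.

```python
def clean_ohlc(rows, max_ratio=100.0):
    """Clean a list of {'open': ..., 'close': ...} rows, oldest first.

    Missing values are represented by None. `max_ratio` is a hypothetical,
    user-adjustable threshold on the ratio between consecutive valid closes.
    """
    # Step 1: remove impossible observations (non-positive prices).
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy, leave the input untouched
        for key in ("open", "close"):
            if row[key] is not None and row[key] <= 0:
                row[key] = None
        cleaned.append(row)

    # Step 2: remove unlikely observations, here closes that imply a
    # jump larger than `max_ratio` relative to the last valid close.
    last_close = None
    for row in cleaned:
        close = row["close"]
        if close is not None and last_close is not None:
            ratio = close / last_close
            if ratio > max_ratio or ratio < 1.0 / max_ratio:
                row["close"] = None
                close = None
        if close is not None:
            last_close = close

    # Step 3: fill missing opens with the previous day's close, which is
    # already known at the open, so no look-ahead is introduced.
    for prev, row in zip(cleaned, cleaned[1:]):
        if row["open"] is None and prev["close"] is not None:
            row["open"] = prev["close"]
    return cleaned


rows = [
    {"open": 10.0, "close": 10.5},
    {"open": None, "close": 10.4},     # missing open
    {"open": -1.0, "close": 2000.0},   # impossible open, unlikely close
    {"open": 10.6, "close": 10.7},
]
result = clean_ohlc(rows)
```

In this example the second open is filled with 10.5, the negative open is dropped and then filled with 10.4, and the 2000.0 close is removed because it implies a jump of more than 100x.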

enzbus added 2 commits January 4, 2024 11:52

started ftse100 strategy

d745b00

data quality check

1ac90d4

enzbus mentioned this pull request Jan 22, 2024

Add example strategies on foreign markets #125

Closed

enzbus added 11 commits January 22, 2024 11:51

Merge branch 'master' into improved_data_module

699035e

Split history cvxportfolio/data.py to cvxportfolio/market_data.py

b45b9d7

Split history cvxportfolio/data.py to cvxportfolio/market_data.py

7988b64

Split history cvxportfolio/data.py to cvxportfolio/market_data.py

e104387

Split history cvxportfolio/data.py to cvxportfolio/market_data.py

4f4fa68

moved both files

ade1db5

Applied changes of commit 564e1fe

87ba504

Applied changes of commit 13f119f

7d86c9c

Applied changes of commit b794c7d

60b1459

git merge master

dcd9863

enzbus mentioned this pull request Jan 29, 2024

Make test suite fail on warnings #130

Closed

enzbus added 5 commits February 7, 2024 11:05

trying different approach for timestamping last row in yahoofinance

24489eb

minor

66e7b90

git merge master

4daaca5

git merge master

78b89e4

symbol_data

788e72e

enzbus mentioned this pull request Feb 10, 2024

Example request - Margin in a different currency #135

Closed

enzbus added 9 commits February 12, 2024 11:25

refactoring _process of OLHCV

96b239a

refactoring

fddac3e

refactoring,

a598854

test cvxportfolio/tests/test_data.py TestData.test_yfinance_download became fragile, need to understand why

more, cleaning needed

af76cb9

some cleaning, adding read_only

f085125

testing

4072d54

basic anomalous cleaning

aea2f76

basic pipeline, needs improvement

b61eb91

better

f4f1f3e

enzbus added 16 commits February 13, 2024 16:28

testing

39e8939

minor

0488b7e

mostly done

40ff3b5

typo

c55ab7c

assertNoLogs not available on py < 3.10

d75b0ac

preload warning on RMS logreturn not abs mean

0aa09be

historical data cleaning

e81932a

removing phony adjcloses and data around them

db7477c

improving cleaning of bad adjcloses, more analysis needed

cde79b5

added more adjclose filtering

26f03cd

typo

7ccdd54

tested on current example universes;

5900968

names that get historical data trimmed down are HUBB, JCI, NVR, and seem reasonable

data cleaning example

572b4df

adjusted log level of cleaning to info

4a207c4

testcase typo

b9d5d01

data cleaning example docs

c9c11ff

enzbus merged commit 67e1917 into master Feb 15, 2024
16 checks passed

enzbus deleted the improved_data_module branch July 13, 2024 19:06
