More KeyStats #31

Open
Phishcook opened this issue Jan 15, 2020 · 1 comment
Comments

@Phishcook

Hi! I just cloned your project and am messing around with it. Though I am an experienced software engineer, I am new to machine learning, so feel free to tell me if my insights are incorrect!

After reading the code, I noticed that the prediction modeling relies heavily on the KeyStats; however, that data is extremely limited. Would it not be SUPER beneficial to backfill it with one record per quarter? The provided data is very erratic, yet most 'feature' data points are reported by companies every quarter.
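A minimal sketch of the "one record per quarter" idea, assuming the existing snapshots can be loaded as (date, stats) pairs: bucket each snapshot into its calendar quarter and keep only the latest one per quarter. The function names and the `pe` key are hypothetical, and using calendar quarters is an assumption (many companies report on fiscal quarters that don't line up with them).

```python
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Return the (year, quarter) calendar quarter containing date d."""
    return d.year, (d.month - 1) // 3 + 1


def one_record_per_quarter(records):
    """Collapse erratic (date, stats) snapshots into at most one per quarter.

    When several snapshots fall in the same quarter, the latest one wins.
    Returns a dict keyed by (year, quarter) in chronological order.
    """
    latest = {}
    for d, stats in sorted(records, key=lambda r: r[0]):
        latest[quarter_of(d)] = (d, stats)  # later dates overwrite earlier ones
    return {q: stats for q, (_, stats) in sorted(latest.items())}


# Hypothetical example: two snapshots in Q1 2019, none in Q2/Q3, one in Q4.
records = [
    (date(2019, 2, 3), {"pe": 10.1}),
    (date(2019, 3, 20), {"pe": 10.4}),
    (date(2019, 11, 5), {"pe": 12.0}),
]
print(one_record_per_quarter(records))
# {(2019, 1): {'pe': 10.4}, (2019, 4): {'pe': 12.0}}
```

The quarters with no entry (2019 Q2 and Q3 in the example) are exactly the gaps a backfill from another source would need to cover.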

In addition, a cron job or a simple get_missing_quartly_keystats.py script that can be invoked on demand to fill in new stats would help with the longevity and up-to-date accuracy of this project. It would make the modeling more accurate (more data) and also bring the project closer to being a practical live tool.
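The core of such an on-demand script could be a gap check: given the quarters already stored, list every quarter from the earliest one up to the last completed quarter that has no record, and fetch only those. This is a hypothetical sketch; loading the stored quarters and fetching each missing one are left out.

```python
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Return the (year, quarter) calendar quarter containing date d."""
    return d.year, (d.month - 1) // 3 + 1


def quarters_between(start, end):
    """Yield every (year, quarter) from start to end, inclusive."""
    year, q = start
    while (year, q) <= end:
        yield year, q
        year, q = (year + 1, 1) if q == 4 else (year, q + 1)


def missing_quarters(have, today=None):
    """Quarters with no record, from the earliest stored one to the last completed one."""
    if not have:
        return []
    today = today or date.today()
    year, q = quarter_of(today)
    # The current quarter cannot have been reported yet, so stop one before it.
    last_complete = (year - 1, 4) if q == 1 else (year, q - 1)
    return [qq for qq in quarters_between(min(have), last_complete) if qq not in have]


# Hypothetical example: only Q1 and Q4 of 2019 are stored.
have = {(2019, 1), (2019, 4)}
print(missing_quarters(have, today=date(2020, 5, 1)))
# [(2019, 2), (2019, 3), (2020, 1)]
```

Stopping at the last completed quarter is a design choice: even that quarter's figures may still be pending for a while after it ends, so a cron job running this repeatedly would simply pick it up on a later run.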

Most of the historical quarterly feature data points can be found on https://www.macrotrends.net/, either directly or through calculations. Example: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/financial-statements

There are many categories with subcategories that can most likely be scraped and parsed. For example, the full historical market cap chart served at https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/market-cap can be parsed, because its HTML contains a <script> tag that defines var chartData with all the values by date.
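Pulling `chartData` out of the page can be done without a full HTML parser: find the `var chartData =` assignment and decode the value that follows it. A minimal sketch, assuming the assigned array is valid JSON (quoted keys, no trailing commas); if it isn't, `json` raises `JSONDecodeError`. The `date`/`v1` keys in the sample are placeholders, so check the real page source for the actual field names. Fetching the page is not shown.

```python
import json
import re

# Matches the assignment up to (but not including) the value itself.
CHART_DATA_RE = re.compile(r"var\s+chartData\s*=\s*")


def extract_chart_data(html: str):
    """Return the value assigned to `var chartData` in a page's HTML."""
    match = CHART_DATA_RE.search(html)
    if match is None:
        raise ValueError("no 'var chartData' assignment found")
    # raw_decode parses one JSON value from the start of the string and ignores
    # whatever follows it (the trailing ';' and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return value


# Hypothetical page fragment with placeholder field names.
sample = (
    '<script>var chartData = [{"date": "2019-12-31", "v1": 1.75},'
    ' {"date": "2019-09-30", "v1": 2.01}];</script>'
)
for point in extract_chart_data(sample):
    print(point["date"], point["v1"])
```

Using `raw_decode` instead of a regex for the whole array means nested brackets inside the data don't cut the match short.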

Between the balance sheets and financial records they provide, you may even find other influential data points to add to the ML portion of this script.

Let me know what you think, or if my logic is simply way off. If you think it is a good idea, I can help out with refactoring!

@robertmartin8
Owner

Hi,

You have struck upon the core issue in financial data science: data availability. I fully agree that the current collection of keystats data is not great. This project is meant to be a starting point for people to see a complete machine learning pipeline applied to investing.

Good find regarding macrotrends; the data looks pretty good! If you submit a PR with a scraper, I'd be more than happy to merge it and credit you in the readme.

Best,
Robert
