Data sources aside from Yahoo #258

Open
cmjordan42 opened this issue Jan 5, 2024 · 10 comments

Comments

@cmjordan42

@dpguthrie and I briefly discussed this prospect with @ValueRaider a while back, and I find myself considering it again. The quality of Y!Finance data has really gone off a cliff in the past 6 months. Quarterly earnings data is now very often missing, and even tickers with billions in market cap are missing swaths of data. It seems that Yahoo's dedication to their finance product is waning.

Have you guys or anyone else considered expanding yahooquery (at the time I think we were talking about yfinance) to other sources such as MarketWatch? Their data seems to be far more consistent, which makes sense since it's owned by Dow Jones, which actually does this as a business.

I suppose the name would have to change from yahooquery ;)

@cmjordan42 cmjordan42 changed the title Data sources asides from Yahoo Data sources aside from Yahoo Jan 5, 2024
@RudyNL

RudyNL commented Jan 7, 2024

One of the advantages of Yahoo Finance is that it covers a major part of the European stock exchanges. I am also interested in Helsinki, Oslo, Stockholm, Copenhagen, Brussels, Amsterdam, London, Paris, Milan, Madrid, Zürich, Vienna, Frankfurt and DAX. An all-American solution isn't a solution for me. The main advantage of Yahoo Finance is its worldwide coverage of the financial markets.

@cmjordan42
Author

Good to know. I'm a bit confused though - are you saying that MarketWatch doesn't cover those domains? I'm not very active outside of the US exchanges so maybe I'm misunderstanding, but MW seems to cover a lot of the world.

@RudyNL

RudyNL commented Jan 9, 2024

On my first try I landed here, [MarketWatch](https://store.marketwatch.com), and didn't find a way out without payment. Today I found an alternative link, [MarketWatch](https://www.marketwatch.com/), which gives sufficient access. I checked a number of stocks on different exchanges and MarketWatch looks fine.

@ms82494

ms82494 commented Jan 15, 2024

I have a Yahoo Finance Plus subscription, but don't currently plan on renewing it. My number one dissatisfaction is the fragility of the community-supplied APIs, thanks to the seemingly random changes introduced by Yahoo's product management.

I have tried FinancialModelingPrep.com. They have a lot of data, at a very compelling price, but there's no quality assurance whatsoever. The poor data quality made it unusable.

I am currently testing tikr.com, which doesn't have a proprietary API, but there are a couple of GitHub projects that offer community-supplied APIs. So far, so good. It's paid, but they do offer a nice amount of data for the price: financials, estimates, earnings surprises, company guidance (if available), segment information, and earnings call transcripts.

There's also QuickFS (quickfs.net), which seems really promising as it offers an API with both a free and a paid service level and reasonable rate limits. But the founder, who used to promote the service in various subreddits, seems to have gone MIA. I've also reached out to their support and never received any answers. Since I don't want to deal with abandonware projects I decided against subscribing, but if anyone else has had a good experience with QuickFS I'd be open to giving that one a try.

All three of these services offer data from various international exchanges.

@cmjordan42
Author

cmjordan42 commented Jan 15, 2024

I have very minimal issues with yahooquery. I previously used yfinance which was a lot more susceptible to Yahoo's instability, but the standard distributed systems defensive measures I wrapped around yahooquery make it pretty rock solid. For example, I have not had to tweak it for over 6 months.

The issue that has me at my wit's end is agnostic to the API. If I pull data for a symbol from the API it shows me what's on Yahoo's website UI... which is just missing loads of data, particularly on quarterly/annual earnings/financial reports. I'm curious @ms82494 - with your YF Plus subscription, do you NOT see missing data for recent periods of loads of common symbols? Perhaps they're deliberately withholding data from the non-Plus folks.

https://finance.yahoo.com/quote/AAPL/financials?p=AAPL

@ms82494

ms82494 commented Jan 16, 2024

@cmjordan42: I'd really be curious what "defensive measures" have insulated you from the mayhem during last October/November. Even now there are a lot of users in Europe complaining about inability to access yahooquery. I was certainly affected at times, both with yahooquery and yfinance. I don't mean this as a criticism of @dpguthrie or @ValueRaider, to whom I am immensely grateful. I just think that the adversarial attitude that Yahoo management takes to programmatic users (whether they are paid or free users) is really impacting me.

That said, Yahoo Finance data quality is second to none, imo. The financials are sourced from Morningstar, and those guys do a good job. The financials arrive promptly, are highly detailed, and come with separate charts of accounts for banks and insurance companies. And they go back to the dawn of time (in Apple's case 1985). I have seen more issues with (expensive!) financial statements from Mergent (part of FTSE/Russell group) than with the Morningstar financials provided by Yahoo Finance. The tikr.com financials, while also good, don't have the level of detail that Yahoo provides.

To answer your question about missing data with Yahoo financials with the YF+ service: I haven't noticed any issues, unless you go VERY far back in time. So, in Apple's case, the Net Income line is missing for quarterly reports prior to 1989-12-31. But that's probably due to different line item labels being used in early reports that predate the creation of SEC Edgar. There's no missing data for more recent reports. See below:

Type 'copyright', 'credits' or 'license' for more information
IPython 8.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import yahooquery as yq

In [2]: import os, operator

In [3]: yq.__version__
Out[3]: '2.3.7'

In [4]: YUSER, YPASS = operator.itemgetter('YUSER','YPASS')(os.environ)

In [5]: yqclient = yq.Ticker('AAPL', username=YUSER, password=YPASS)

In [6]: yqclient.p_get_financial_data(types=['TotalRevenue', 'NetIncome'], frequency='q', trailing=False)
Out[6]: 
         asOfDate periodType currencyCode     NetIncome  TotalRevenue
symbol                                                               
AAPL   1985-09-30         3M          USD           NaN  4.097000e+08
AAPL   1985-12-31         3M          USD           NaN  5.339000e+08
AAPL   1986-03-31         3M          USD           NaN  4.089000e+08
AAPL   1986-06-30         3M          USD           NaN  4.483000e+08
AAPL   1986-09-30         3M          USD           NaN  5.108000e+08
...           ...        ...          ...           ...           ...
AAPL   2022-09-30         3M          USD  2.072100e+10  9.014600e+10
AAPL   2022-12-31         3M          USD  2.999800e+10  1.171540e+11
AAPL   2023-03-31         3M          USD  2.416000e+10  9.483600e+10
AAPL   2023-06-30         3M          USD  1.988100e+10  8.179700e+10
AAPL   2023-09-30         3M          USD  2.295600e+10  8.949800e+10

[153 rows x 5 columns]

@cmjordan42
Author

cmjordan42 commented Jan 16, 2024

Thanks. You made me dig into other API calls to see if they match. It turns out that Yahoo just has bugs in how it forms and interprets the JSON which their web UI pulls from - in some cases, that's my data source. For example,

yq.Ticker('AAPL').get_financial_data(frequency='q', types=['NetIncome','TotalRevenue'], trailing=False)
         asOfDate periodType currencyCode     NetIncome  TotalRevenue
symbol
AAPL   2022-12-31         3M          USD  2.999800e+10  1.171540e+11
AAPL   2023-03-31         3M          USD  2.416000e+10  9.483600e+10
AAPL   2023-06-30         3M          USD  1.988100e+10  8.179700e+10    <-
AAPL   2023-09-30         3M          USD  2.295600e+10  8.949800e+10

yq.Ticker('AAPL').earnings

{'AAPL': ... 'quarterly': [
{'date': '4Q2022', 'revenue': 117154000000, 'earnings': 29998000000}, 
{'date': '2Q2023', 'revenue': 94836000000, 'earnings': 24160000000}, 
{'date': '3Q2023', 'revenue': 81797000000, 'earnings': 19881000000},   <- 3Q2023???????
{'date': '3Q2023', 'revenue': 89498000000, 'earnings': 22956000000}]},  
'financialCurrency': 'USD'}}

In forming the JSON, they're listing 3Q2023 twice, and then their UI ignores the second (correct) entry and displays the first (incorrect) one, which is actually the prior quarter. That cascades and makes all of the data incorrect. When I reported the data quality issues to Yahoo they ignored me. What a joke that this bug exists and has presumably existed for months, since I've seen these missing quarterlies emerging for quite a while. I'll just switch off of all of their JSON format APIs in favor of the tabular DataFrame formats, which seem to be correct.
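
A check like the following could flag this kind of payload before it gets ingested. It's only a minimal sketch: `duplicated_quarter_labels` is a made-up helper, not part of yahooquery, and it assumes the `quarterly` list has the shape shown in the `earnings` output above.

```python
from collections import Counter


def duplicated_quarter_labels(quarterly):
    """Return the 'date' labels that appear more than once, in first-seen order."""
    counts = Counter(row["date"] for row in quarterly)
    return [label for label, n in counts.items() if n > 1]


# Rows copied from the `earnings` output above.
quarterly = [
    {"date": "4Q2022", "revenue": 117154000000, "earnings": 29998000000},
    {"date": "2Q2023", "revenue": 94836000000, "earnings": 24160000000},
    {"date": "3Q2023", "revenue": 81797000000, "earnings": 19881000000},
    {"date": "3Q2023", "revenue": 89498000000, "earnings": 22956000000},
]

print(duplicated_quarter_labels(quarterly))  # ['3Q2023']
```

A non-empty result means the period labels can't be trusted for that symbol, so the tabular output is probably the safer source for it.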

I'll again note that not only does MarketWatch have all of this data, but if you want to look at MarketWatch's UI it also is correct. YF seems to have a competent backend team and an incompetent frontend team; MW seems to be competent across the board.

@cmjordan42
Author

@ms82494 Regarding defensive measures... first, I wish there was GH messaging, since we're hijacking this thread talking about API stability.

There are several things that I do to improve stability and shield my side from the Yahoo side:

  • Federate my retrieval across multiple machines - yahooquery calls are distributed across multiple machines / IPs to reduce the likelihood of one of my agents getting stuck on bad DNS routing, affining to a bad YF host, being identified as a crawler, blocked etc.
  • Throttling - Within each agent, calls to yahooquery are throttled to ensure that an agent never exceeds X / 5 minute sliding time frame.
  • Symbol failure retry - Within each agent, responses to calls to yahooquery are analyzed in aggregate to see if they're blatantly erroneous; if so, wait Y (retry1) and then try it again. If it continues to fail that test, that request is shelved and will be attempted much later (retry2).
  • System failure retry - Within each agent, if it passes the above tests, prior to actually ingesting the data, every individual analytic is checked to see if it's within the bounds of what is possible for that analytic. If Z% fail, it gets sent back to retry2.
  • Monitoring - This doesn't help actively remediate, but I also do downstream analysis on error rate of analytics. This is how I initially noticed that a rising % of analytics from two quarters ago were bad. It helps to monitor very broad data quality outside of an individual symbol.
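
The throttling, retry and bounds-check ideas in the list above could be sketched roughly like this. Everything here is an assumption for illustration: the class and function names are invented, the numbers standing in for X, Y and Z are placeholders, and `fetch` is whatever callable wraps the actual yahooquery request. It is not the setup described above, just one way those pieces might fit together.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most max_calls within any window_seconds span."""

    def __init__(self, max_calls, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))


def fetch_with_retries(fetch, looks_valid, throttle,
                       short_wait=30, attempts=2, sleep=time.sleep):
    """Return (data, True) on success, or (None, False) if it should be shelved."""
    for attempt in range(attempts):
        throttle.wait()
        data = fetch()
        if looks_valid(data):
            return data, True
        if attempt < attempts - 1:
            sleep(short_wait)  # "retry1": brief pause, then try again
    return None, False  # caller queues the request for "retry2", much later


def fraction_out_of_bounds(values, bounds):
    """Share of analytics that are missing or outside their (low, high) range."""
    bad = 0
    for name, (low, high) in bounds.items():
        value = values.get(name)
        if value is None or not (low <= value <= high):
            bad += 1
    return bad / len(bounds) if bounds else 0.0


# Placeholder limits; the thread does not say what X, Y or Z actually are.
throttle = SlidingWindowThrottle(max_calls=100, window_seconds=300)
data, ok = fetch_with_retries(lambda: {"margin": 0.25}, bool, throttle)
bad_share = fraction_out_of_bounds(data, {"margin": (-1.0, 1.0)})
print(ok, bad_share)
```

The clock and sleep functions are injectable so the throttle can be exercised in tests without real waiting.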

@ValueRaider

ValueRaider commented Jan 16, 2024

> considered expanding yahooquery to other sources such as Marketwatch?

IMO these YF wrappers should stay focused on YF, just good software design. Modularity. Better to spin up another package for another source, or a "meta" package to combine multiple fetchers - something like OpenBB but without the GUI bloat would be neat.
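
To make the "meta" package idea a bit more concrete, here is a rough sketch of what combining fetchers could look like. All names are hypothetical, and the two stub fetchers return dummy values; in practice each one would wrap a separate single-source package.

```python
from typing import Callable, Dict, Optional, Tuple

# A fetcher takes a ticker symbol and returns a dict of fields,
# or None if the source has nothing for that symbol.
Fetcher = Callable[[str], Optional[dict]]


class MetaFetcher:
    """Try each registered source in order and return the first usable answer."""

    def __init__(self) -> None:
        self._sources: Dict[str, Fetcher] = {}

    def register(self, name: str, fetcher: Fetcher) -> None:
        self._sources[name] = fetcher

    def fetch(self, symbol: str) -> Tuple[Optional[str], Optional[dict]]:
        for name, fetcher in self._sources.items():
            try:
                data = fetcher(symbol)
            except Exception:
                continue  # a failing source should not block the others
            if data:
                return name, data
        return None, None


# Stub fetchers with dummy values, standing in for real single-source packages.
def primary_stub(symbol: str) -> Optional[dict]:
    return None  # pretend this source is missing the data


def fallback_stub(symbol: str) -> Optional[dict]:
    return {"symbol": symbol, "placeholder_field": 1}


meta = MetaFetcher()
meta.register("primary", primary_stub)
meta.register("fallback", fallback_stub)
print(meta.fetch("AAPL"))  # ('fallback', {'symbol': 'AAPL', 'placeholder_field': 1})
```

Each source stays in its own module behind the same small interface, which keeps the wrappers focused while still allowing fallback between them.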

@ms82494

ms82494 commented Jan 16, 2024

@cmjordan42 : Firstly, thank you for the detail on how you ensure robustness for your data gathering from YF. That's definitely much more elaborate than what I do and I appreciate the ideas.

Secondly, on the MW data: For financials, I only see them deliver the most recent five quarterly and annual statements. IMO that's not enough to really figure out seasonality (need more Qs), or cyclicality (need more years). Maybe they have a paid plan that offers more data, but otherwise it wouldn't replace YF+ for me. Detail seems good, though. And kudos for not trying to shoehorn financials into a C&I chart of accounts.
