Significant time delays in get_historical_data() #80

Closed
cianryan09 opened this issue Nov 23, 2023 · 9 comments

@cianryan09

Hi,

I am having significant speed issues when running the get_historical_data() function.
The Toolkit works fine, but running the above even for three stocks takes ~135 seconds. Running it for 100 takes 200 seconds. If I run it for a large number of stocks (say the ~9,500 large-, mid- and small-cap stocks in the financedatabase module), the code always crashes and I get a long list of exceptions raised in multiple threads.

I ran cProfile, and no specific line of code seems to cause the backlog (tottime is < 0.001 for every entry).

Does anyone have any idea what could be causing this, and how to stop the errors from occurring when running for large numbers of stocks? Code snippet below:

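```python
from financetoolkit import Toolkit

# ticker_list and date are defined earlier in the script
companies = Toolkit(
    tickers=ticker_list,
    start_date=date,
    api_key="xxxxxx",
)

hist_data = companies.get_historical_data()
```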

@JerBouma
Owner

JerBouma commented Nov 23, 2023

What FinancialModelingPrep plan are you on? I've built in a wait timer that kicks in when you reach your per-minute rate limit; e.g. the Starter plan has a limit of 250 calls per minute. Most likely you've hit that limit, and when you then run for a few companies straight afterwards, it waits again.
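
To illustrate what a per-minute wait timer does, here is a minimal sketch of a sliding-window limiter. It is not FinanceToolkit's actual implementation; the class name `PerMinuteLimiter` and its parameters are hypothetical, and the 250-per-minute figure is just the Starter plan limit mentioned above.

```python
import time
from collections import deque
from typing import Callable, Deque


class PerMinuteLimiter:
    """Hypothetical sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of calls inside the window

    def _purge(self, now: float) -> None:
        # Forget calls that are at least `window` seconds old.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def wait(self) -> float:
        """Block until one more call is allowed; return the seconds spent waiting."""
        waited = 0.0
        now = self.clock()
        self._purge(now)
        while len(self.calls) >= self.limit:
            # Sleep until the oldest call in the window expires, then re-check.
            pause = self.window - (now - self.calls[0])
            self.sleep(pause)
            waited += pause
            now = self.clock()
            self._purge(now)
        self.calls.append(now)
        return waited
```

Calling `limiter.wait()` before each request (e.g. with `limiter = PerMinuteLimiter(limit=250)`) keeps you under the limit, and any wait happens silently inside that call, which is why hitting the limit looks like a slow function. The `clock` and `sleep` parameters exist only so the behavior can be tested without real sleeping.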

The downside of collecting historical data from FMP is that I make two API calls per company, since I obtain both the market data and the dividends. I can build in an argument to exclude dividends if desired.

To overcome this, you'd sadly have to upgrade your plan. If this doesn't seem to be the issue, please let me know and share all the errors you get!

By the way, I am aware Yahoo Finance also has historical data, but you will get rate limited just as quickly there, and it requires a much longer wait once you are.

PS: I can make this more obvious by printing a message when the wait timer kicks in. Does that make sense to you?

EDIT: For thousands of companies, you are stretching the limits of my package. I suggest dividing the tickers into groups, but do share the errors you get!
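
A minimal sketch of the "divide it up in groups" suggestion, assuming you split `ticker_list` into fixed-size groups and build one `Toolkit` per group. `chunk_tickers` and the group size of 100 are illustrative choices, not part of FinanceToolkit; pick a size that fits your plan's rate limit.

```python
from typing import Iterator, List, Sequence


def chunk_tickers(tickers: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` tickers, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(tickers), size):
        yield list(tickers[start:start + size])


# Hypothetical usage with FinanceToolkit (not run here; needs a valid API key):
# from financetoolkit import Toolkit
# results = []
# for group in chunk_tickers(ticker_list, 100):
#     companies = Toolkit(tickers=group, start_date=date, api_key="xxxxxx")
#     results.append(companies.get_historical_data())
```

How best to merge the per-group results depends on the shape of the returned data, so that step is deliberately left out.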

@JerBouma
Owner

JerBouma commented Dec 4, 2023

Hi @cianryan09, this issue should now be resolved with the release of v1.6.3, which automatically disables the wait timer that was causing the delay in your case. See: https://github.com/JerBouma/FinanceToolkit/releases/tag/v1.6.3

Please let me know if this doesn't solve the issue.

JerBouma closed this as completed Dec 4, 2023
@cianryan09
Author

Thank you Jeroen for the reply, and apologies for the delay in getting back to your original response. I am currently on the free plan just to try it out. Does 1 stock = 1 API call, or do all stocks in the `tickers` variable count as one call? E.g. if I have 250 tickers in the ticker list and run the Toolkit, does that exhaust my calls for the day?

I also updated to 1.6.3, and it did not seem to improve the speed, but that may be purely due to the call limits described above.

@JerBouma
Owner

JerBouma commented Dec 4, 2023

Let's say you input TSLA, AAPL and MSFT into the Toolkit. Every time you call a function that collects data, it will cost about 3 API calls (one per ticker). This applies to balance sheets, income statements, cash flow statements, historical data and more. For example, if you call the balance sheet, income statement and cash flow statement functions, that's 9 API calls.
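
The arithmetic above can be written as a back-of-the-envelope estimate. It assumes every data function costs a fixed number of calls per ticker: one per ticker as described here, or two per ticker for historical data given the market data plus dividends calls mentioned earlier. Treat it as a rough guide rather than an exact count of what the package sends; the function names and the 250-call budget are illustrative.

```python
def estimate_api_calls(n_tickers: int, n_functions: int, calls_per_ticker: int = 1) -> int:
    """Rough estimate: each data function costs `calls_per_ticker` calls per ticker."""
    return n_tickers * n_functions * calls_per_ticker


def max_tickers(budget: int, n_functions: int, calls_per_ticker: int = 1) -> int:
    """Largest ticker count whose estimated calls stay within `budget`."""
    return budget // (n_functions * calls_per_ticker)


# Three tickers, three statements (balance, income, cash flow): 9 calls.
print(estimate_api_calls(3, 3))
# Historical data at two calls per ticker with a 250-call budget: 125 tickers.
print(max_tickers(250, 1, calls_per_ticker=2))
```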

I don't fully understand why it takes so long for you to collect data, as it should be almost instant. For example, I am using a free key here and it takes less than 3 seconds. Could you elaborate further?

@cianryan09
Author

cianryan09 commented Dec 4, 2023

Good to know, thanks. This is the code I ran, and I timed it:
[Screenshots: the code and its timing output]
The date is just today's date five years ago. So the same three stocks are taking me nearly 9 seconds.
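
For reference, a start date of "today five years ago" can be computed with the standard library. This sketch assumes `start_date` is passed as a `YYYY-MM-DD` string; `years_ago` is a hypothetical helper, and mapping Feb 29 to Feb 28 is one choice among several.

```python
from datetime import date


def years_ago(today: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 maps to Feb 28 in non-leap years."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # Only Feb 29 can fail here, when the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


# ISO "YYYY-MM-DD" string, e.g. "2018-12-04" when run on 2023-12-04.
start_date = years_ago(date.today(), 5).isoformat()
```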

Could it be something to do with how my environment is set up?
I set up a new conda environment with Python 3.10 and pip-installed financetoolkit. I am running the above as a script from a file saved locally in VS Code. I'm not really a coding expert, so excuse me if the question is basic or off-topic.

EDIT: It also seems the API limit on the free plan is 250 per day, not per minute, which probably explains the crashes when the ticker list gets into the hundreds.

@JerBouma
Copy link
Owner

JerBouma commented Dec 4, 2023

Does it change anything if you define start_time right before the hist_data line? I can't imagine it taking that long. The environment you are using is fine; it shouldn't be the issue.
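
Starting the timer immediately before the call separates the cost of `get_historical_data()` from imports and `Toolkit` construction. A minimal sketch using `time.perf_counter`, which is intended for measuring short intervals; `timed` is a hypothetical helper, not part of FinanceToolkit.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Call `fn` and return (result, elapsed wall-clock seconds)."""
    start_time = time.perf_counter()  # start the clock right before the call
    result = fn()
    return result, time.perf_counter() - start_time


# Hypothetical usage, assuming `companies` is an existing Toolkit instance:
# hist_data, seconds = timed(companies.get_historical_data)
# print(f"get_historical_data() took {seconds:.1f} s")
```

Timing the constructor the same way shows whether the delay sits in setup or in the data collection itself.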

The API limit should not be an issue, as I've made sure that once you hit the limit, it just tells you that no data could be collected instead of raising errors or making you wait.

@cianryan09
Author

cianryan09 commented Dec 4, 2023

I tried the above, but it made almost no difference. I also upgraded to one of the paid plans and still saw no speed improvement. And it's nothing else in my script causing the slowdown: if I comment out the .get_historical_data() line, it drops to 2 seconds.
Could it be a hardware issue? Maybe all the threading uses more memory/CPU than my older equipment can handle?

@JerBouma
Owner

JerBouma commented Dec 5, 2023

Hi! I expect this to be a hardware or networking issue. What you can try is Google Colab, which already cuts it down to 5 seconds. For other components this can be as little as 2-3 seconds in Google Colab. If it doesn't help for you, then it's 100% a network issue.

@cianryan09
Copy link
Author

Hi, using Colab does cut the time down by a lot: 1000 tickers take around 2 minutes. Thank you for the suggestion!
