
Set cookies and crumb #205

Merged
merged 18 commits into from
Oct 28, 2023

dpguthrie
Owner

  • Initialize session with required cookies
  • Create larger list of rotating user-agents with related headers
  • Added webdriver-manager for selenium functionality
  • Changed from flit to poetry
  • Added additional package files that will help with testing and development
  • Added function in base client class to retrieve a crumb and then add as a query parameter
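The cookie-and-crumb flow in the list above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code (which uses a requests session): the names `USER_AGENTS`, `build_opener_with_cookies`, `fetch_crumb`, and `with_crumb` are hypothetical, and the user-agent strings are placeholders. It assumes the crumb comes from `https://query2.finance.yahoo.com/v1/test/getcrumb` (the endpoint that appears later in this thread) and is passed as a query parameter named `crumb`.

```python
import random
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical pool of rotating user agents (placeholders, not the PR's list).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)",
]

HOME_URL = "https://finance.yahoo.com"
CRUMB_URL = "https://query2.finance.yahoo.com/v1/test/getcrumb"  # assumed endpoint


def build_opener_with_cookies(jar: CookieJar) -> urllib.request.OpenerDirector:
    """Opener that stores cookies set by the server in `jar` and sends a random UA."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", random.choice(USER_AGENTS))]
    return opener


def fetch_crumb(opener: urllib.request.OpenerDirector, timeout: float = 10.0) -> str:
    """Visit the home page to collect cookies, then request a crumb (needs network)."""
    with opener.open(HOME_URL, timeout=timeout) as resp:
        resp.read()  # the body is not needed; the point is the Set-Cookie headers
    with opener.open(CRUMB_URL, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def with_crumb(url: str, crumb: str) -> str:
    """Return `url` with `crumb` appended as a query parameter."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("crumb", crumb))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Usage would be `opener = build_opener_with_cookies(CookieJar())`, then `crumb = fetch_crumb(opener)`, with every API URL passed through `with_crumb` before being opened with the same opener so the cookies travel along. As the discussion below shows, the home-page step is the fragile part outside the U.S.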

@ValueRaider

ValueRaider commented Jul 17, 2023

I quickly tried this but get exception:
yq.Ticker('AMZN', username='redacted', password='redacted')

  File ".../yahooquery/yahooquery/login.py", line 27, in __init__
    self.driver = webdriver.Chrome(
                  ^^^^^^^^^^^^^^^^^
TypeError: WebDriver.__init__() got an unexpected keyword argument 'service'

selenium=3.141.0 webdriver_manager=3.8.6
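The `TypeError` is consistent with Selenium 3: the `service=` keyword on `webdriver.Chrome` was added in Selenium 4, while Selenium 3.141.0 takes `executable_path=` instead. A minimal sketch of a version-aware helper follows; `chrome_driver_kwargs` is a hypothetical name, not part of yahooquery, and the Selenium 4 branch imports `selenium.webdriver.chrome.service.Service` only when it is actually needed.

```python
def selenium_major_version(version: str) -> int:
    """Parse the major version from a string such as '3.141.0'."""
    return int(version.split(".", 1)[0])


def chrome_driver_kwargs(selenium_version: str, driver_path: str) -> dict:
    """Hypothetical helper: keyword arguments for webdriver.Chrome on Selenium 3 or 4."""
    if selenium_major_version(selenium_version) >= 4:
        # Selenium 4 wraps the chromedriver binary in a Service object.
        from selenium.webdriver.chrome.service import Service
        return {"service": Service(driver_path)}
    # Selenium 3 accepts the binary path directly.
    return {"executable_path": driver_path}
```

With webdriver-manager, `driver_path` would typically come from `ChromeDriverManager().install()`, and the call becomes `webdriver.Chrome(**chrome_driver_kwargs(selenium.__version__, driver_path), options=options)`.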

@dpguthrie
Owner Author

dpguthrie commented Jul 17, 2023

> selenium=3.141.0 webdriver_manager=3.8.6

@ValueRaider Can you try upgrading selenium?

@ValueRaider

ValueRaider commented Jul 17, 2023

Selenium 4 just generates a different exception:

requests.exceptions.RetryError: HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Max retries exceeded with url: /v1/test/getcrumb (Caused by ResponseError('too many 500 error responses'))

I prefer v3 because calls with v4 hang for 2 minutes; I've no idea why.

@dpguthrie
Owner Author

Hmm, this looks like it’s no longer a problem with Selenium but a problem with getting the crumb from that endpoint after you’ve logged in.

@dpguthrie
Owner Author

Do you get the same error when not passing in a username and password?

@ValueRaider

Without username & password, I get the same "max retries" error, now with either version of Selenium. Note: this does not happen with yahooquery 2.3.2 (the latest official release).

yahooquery/base.py
@dpguthrie
Owner Author

@ValueRaider I wasn't able to reproduce your error. I did change the code slightly though so it may be working for you now. I also deployed it temporarily to the streamlit app where it seems to be working again.

@ValueRaider

Same behaviour, "Max retries" errors. What are other people experiencing?

@dpguthrie
Owner Author

> Same behaviour, "Max retries" errors. What are other people experiencing?

That’s weird. I have it working locally and I have this code running in the streamlit app as well - https://yahooquery.streamlit.app/

@dpguthrie
Owner Author

@ValueRaider I’m wondering if it has something to do with where you’re based. I’m naively navigating to finance.yahoo.com to retrieve cookies - but I believe you’re based in the UK and would actually navigate to uk.finance.yahoo.com. Curious if you pulled this code down and just changed this if it would start working for you.
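To test the region theory, the hard-coded home page could be made configurable. A minimal sketch, assuming a hypothetical `HOME_PAGES` mapping keyed by country code; only `finance.yahoo.com` and `uk.finance.yahoo.com` come from this thread, and any further entries would need checking against Yahoo's actual regional sites.

```python
from typing import Optional

# Hypothetical mapping of country code -> Yahoo Finance home page used for cookies.
HOME_PAGES = {
    "US": "https://finance.yahoo.com",
    "GB": "https://uk.finance.yahoo.com",
}
DEFAULT_HOME_PAGE = HOME_PAGES["US"]


def home_page_for(country: Optional[str]) -> str:
    """Return the home page to visit for cookies, falling back to the U.S. site."""
    if not country:
        return DEFAULT_HOME_PAGE
    return HOME_PAGES.get(country.upper(), DEFAULT_HOME_PAGE)
```

Keeping the U.S. site as the fallback means unknown regions behave exactly as the current code does.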

@ValueRaider

ValueRaider commented Jul 20, 2023

I don't quite understand where I would make the change. But to check whether it's worth investigating, I logged in with the Firefox debugger active and don't see UK-specific URLs, just the same ones as you, e.g. https://query2.finance.yahoo.com/v1/test/getcrumb

EDIT: let me check the headers for anything UK-specific...
EDIT: Some progress. Copying over my headers from the browser fixes the exception, but the crumb is an empty string.

def setup_session_with_cookies_and_crumb(session: Session):
    headers = {**random.choice(HEADERS), **addl_headers}
    session.headers = headers
    response = session.get('https://finance.yahoo.com', hooks={'response': get_crumb})
Owner Author

@ValueRaider This is the place where you would change the URL to (I'm guessing) https://uk.finance.yahoo.com

Owner Author

I'd also be curious what headers you're seeing on your side when that request is being made.

Adding 'uk' made no difference.

But adding the key headers['Host'] = 'query1.finance.yahoo.com' did fix the exception. The crumb is an empty string, maybe because I'm not logging in? I can't log in because I updated Chrome yesterday and webdriver is complaining it's too new, but that's my problem.
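The `{**random.choice(HEADERS), **addl_headers}` line in `setup_session_with_cookies_and_crumb()` relies on Python's dict-unpacking rule that keys from the later mapping win, so an override such as the `Host` header added here only needs to go in the second mapping. A minimal sketch with hypothetical header values standing in for the PR's `HEADERS` and `addl_headers`:

```python
import random

# Hypothetical rotating base headers (stand-ins for the PR's HEADERS list).
HEADERS = [
    {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
     "Accept": "text/html"},
    {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
     "Accept": "text/html"},
]

# Overrides: any key here replaces the same key from the randomly chosen base.
addl_headers = {
    "Host": "query1.finance.yahoo.com",
    "Accept": "*/*",
}


def build_headers() -> dict:
    """Merge a random base header set with the overrides (later mapping wins)."""
    return {**random.choice(HEADERS), **addl_headers}
```

A plain dict is case-sensitive, so an override has to use the same spelling as the base key (`Host`, not `host`) to replace it rather than sit alongside it.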

Owner Author

I wonder if it has something to do with accepting their cookies. I tried navigating to the UK site and got their consent screen.


I’m guessing that without accepting it, YF is unable to set the appropriate cookies.

Owner Author

Also, this code doesn’t use Selenium. Selenium is only used to log in to YF (and I'm not even sure that's working right now, as it looks like they’ve added a CAPTCHA).

The getcrumb response.text contains the string "HTTP Status 403 - Forbidden". Is that typical of Yahoo expecting a cookie?

Owner Author

Yes, that makes sense if the session making that request isn't set up with the appropriate cookies. Those cookies are supposed to be set when making the initial request to the YF home page.
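Since a failed cookie setup shows up either as an empty crumb or as a body containing "HTTP Status 403 - Forbidden", a check before using the crumb would turn both into an explicit error instead of a confusing downstream failure. A minimal sketch; `CrumbError` and `validate_crumb` are hypothetical names, and the 403 marker is simply the string quoted above, so the check is a heuristic rather than a complete error taxonomy.

```python
class CrumbError(RuntimeError):
    """Raised when the crumb endpoint did not return a usable crumb."""


FORBIDDEN_MARKER = "HTTP Status 403"  # taken from the error body reported in this thread


def validate_crumb(body: str) -> str:
    """Return the stripped crumb, or raise if it is empty or looks like an error page."""
    crumb = body.strip()
    if not crumb:
        raise CrumbError("empty crumb: cookies were probably not set")
    if FORBIDDEN_MARKER in crumb:
        raise CrumbError("crumb request was forbidden (403): missing or rejected cookies")
    return crumb
```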

Sorry, wrong part of the code: that 403 happens in setup_session_with_cookies_and_crumb(), on the request to https://finance.yahoo.com

Owner Author

That's really surprising! I'm not sure what would make that a forbidden request just to the YF home page.

@cmjordan42

Hey y'all, interested in this one as I currently use all_modules in an occasional batch run that computes a ton of single security analytics. Sounds like we're still pretty far from having a solution we like here, and that maybe Yahoo is deliberately discouraging it. Given that, I'm probably gonna switch over to a bunch of individual targeted calls to get what's in all_modules on Sunday.

@dpguthrie
Owner Author

> Hey y'all, interested in this one as I currently use all_modules in an occasional batch run that computes a ton of single security analytics. Sounds like we're still pretty far from having a solution we like here, and that maybe Yahoo is deliberately discouraging it. Given that, I'm probably gonna switch over to a bunch of individual targeted calls to get what's in all_modules on Sunday.

@cmjordan42 Where are you based? My somewhat uneducated theory right now is that this would work for anyone based in the U.S., which is why it works for me and it works inside of the streamlit app. If you are U.S. based, you could take this branch for a spin and see if you're able to access the all_modules property in that way.

Another somewhat uneducated guess: I think the solution may be to have country/region-specific headers that match what a browser would send if that person navigated to the YF home page. I think what's happening right now is that, instead of going directly to the home page, the request first lands on consent.yahoo.com, which asks you to accept cookies, and you can't do that using requests alone.
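If the consent.yahoo.com theory is right, it should at least be detectable: after redirects are followed, the final URL of the home-page request would be on the consent host rather than finance.yahoo.com. A minimal sketch, assuming the consent host is exactly `consent.yahoo.com` as described above; `is_consent_redirect` is a hypothetical helper.

```python
from urllib.parse import urlsplit

CONSENT_HOST = "consent.yahoo.com"  # assumed from the discussion above


def is_consent_redirect(final_url: str) -> bool:
    """True if a request ended up on the consent page instead of the requested site."""
    host = (urlsplit(final_url).hostname or "").lower()
    return host == CONSENT_HOST
```

With requests, the final URL is `response.url` (and the redirect chain is in `response.history`); with urllib it is `response.geturl()`. Detecting the redirect does not accept the cookies, but it would let the client fail with a clear message, or try another strategy, instead of carrying on without cookies.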

@cmjordan42

I'm in US - EST. I'd be happy to test drive if it would be helpful.

@cmjordan42

Just tested with this branch; it appears to work fine. Definitely an improvement, even if some regions have more complicated issues. May as well publish it to get all_modules working for some people.

However, it occasionally fails with a response along the lines of:

"For input string: "-91000.0000000002""

@cmjordan42

And that appears to be transient; if you request the same security again, it may work. Not sure if that transience is on the yahooquery side or the Yahoo side.
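If these failures really are transient on Yahoo's side, a client-side retry with a short backoff would usually hide them. A minimal, generic sketch, not yahooquery's behaviour; `retry_transient` and its parameters are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on `retry_on` exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ... with the defaults
    raise AssertionError("unreachable")
```

In practice `fn` would fetch one symbol and raise if the payload contains an error message like the one above; only repeated runs can tell whether a given error is genuinely transient or will fail every time.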

@dpguthrie
Owner Author

@ValueRaider, @cmjordan42 Would you mind giving this another go? I made some changes last night that hopefully both handle errors and provide a fallback option with Selenium (if it's installed) to retrieve cookies/crumb.
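The "fallback with Selenium if it's installed" idea can be sketched as: try the plain-HTTP path first, and only if it fails and Selenium is importable, try the browser path. Both strategies are passed in as callables so the sketch stays independent of the PR's actual code; `get_cookies_and_crumb`, `via_requests`, and `via_selenium` are hypothetical names.

```python
import importlib.util
from typing import Callable, Dict, Optional, Tuple

CookiesAndCrumb = Tuple[Dict[str, str], str]


def selenium_available() -> bool:
    """True if the selenium package can be imported, without importing it."""
    return importlib.util.find_spec("selenium") is not None


def get_cookies_and_crumb(
    via_requests: Callable[[], CookiesAndCrumb],
    via_selenium: Optional[Callable[[], CookiesAndCrumb]] = None,
    has_selenium: Callable[[], bool] = selenium_available,
) -> CookiesAndCrumb:
    """Try the plain-HTTP strategy first; fall back to Selenium only if it is installed."""
    try:
        cookies, crumb = via_requests()
        if crumb:
            return cookies, crumb
        error: Exception = RuntimeError("empty crumb from plain HTTP strategy")
    except Exception as exc:  # network errors, 403s, consent redirects, etc.
        error = exc
    if via_selenium is not None and has_selenium():
        return via_selenium()
    raise RuntimeError("could not obtain cookies and crumb") from error
```

Checking with `importlib.util.find_spec` keeps Selenium an optional dependency: nothing is imported unless the fallback is actually used.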

@ValueRaider

No change, and I tried changing the URL and header parameters. These are my browser headers, btw:

{
    "Host": "query1.finance.yahoo.com",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "*/*",
    "Accept-Language": "en-GB,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Content-Type": "text/plain",
    "Origin": "https://finance.yahoo.com",
    "Referer": "https://finance.yahoo.com/"
}

@dpguthrie
Owner Author

dpguthrie commented Jul 24, 2023

@ValueRaider Is this where the error is happening (the explicit request to retrieve the crumb)?

@ValueRaider

That line returns an empty string. This line returns 403 Forbidden.

@ValueRaider

ValueRaider commented Aug 7, 2023

Does this work for you in the USA? https://stackoverflow.com/questions/76065035/yahoo-finance-v7-api-now-requiring-cookies-python
Works for me in Europe.

@dpguthrie dpguthrie merged commit 5da0b49 into master Oct 28, 2023
1 check passed
@dpguthrie dpguthrie deleted the feat/set-cookie-and-crumb branch October 28, 2023 17:31
4 participants