Fix for new Twitter API change #95

Merged
merged 15 commits into from
Jul 9, 2023

Conversation

mikelei8291
Contributor

Fixes #94.
I rewrote almost the entire networking part. I took some inspiration from HitomaruKonpaku/twspace-crawler (ISC license), but I implemented all of the code on my own.

Tested locally with two spaces, one recorded and one with master_url provided. Both worked as expected.

Things to note:

  1. Guest login is no longer supported (Twitter shut down the API).
  2. Login with user email and password has been removed (it would need a rework of the entire API, so I gave up on it). Supplying a cookies file should be easy, so I think it's OK to abandon this feature.
  3. Because the API is always accessed with user cookies, the metadata file will always include the user as a listener. (This can be turned off by setting the variable withListeners to false for the AudioSpaceById endpoint, but all other listeners would be removed too. Not sure if it's worth implementing a toggle.)
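
A toggle like the one mentioned in item 3 could look roughly like the sketch below. Only the variable name withListeners and the endpoint name AudioSpaceById come from this PR; the function names, the "id" key, and the default value are assumptions made for illustration, not the actual request format.

```python
import json
from typing import Any, Dict


def build_audio_space_variables(space_id: str, with_listeners: bool = True) -> Dict[str, Any]:
    """Build the variables for an AudioSpaceById-style request.

    Hypothetical helper: the "id" key is an assumption, only
    "withListeners" is named in the PR description.
    """
    return {"id": space_id, "withListeners": with_listeners}


def encode_variables(variables: Dict[str, Any]) -> str:
    """Serialize the variables compactly, as a query parameter would need."""
    return json.dumps(variables, separators=(",", ":"))
```

With with_listeners=False, the metadata would omit every listener, not only the account whose cookies are used, which is the trade-off described above.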

Other refactors and fixes:

  1. Some minor refactors to __main__.
  2. Added httpproxy to the protocol whitelist of ffmpeg to support downloading over a proxy.

@mikelei8291
Contributor Author

Also tested using a user screen_name, and it worked.

@Ryu1845
Collaborator

Ryu1845 commented Jul 2, 2023

Wow, that's great, thank you very much! I'll take a look at the code to make sure nothing is amiss, and I'll merge it if everything looks fine.

@mikelei8291
Contributor Author

Added a dummy API class to help display a clearer message when the APIs are called before they are initialized.

I'm also writing documentation for all the classes and methods I've added in this PR.
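
The idea of a dummy API class can be sketched as follows. This is a minimal illustration, not the code from this PR: the names UninitializedAPI, APIRegistry, DemoClient, and the graphql attribute are all hypothetical. Only the init_apis(cookies) entry point mirrors a name used elsewhere in this conversation.

```python
class UninitializedAPI:
    """Placeholder that fails with a clear message on any attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __getattr__(self, attr: str):
        # Only reached for attributes that do not exist on the placeholder,
        # i.e. every real API method a caller might try to use.
        raise RuntimeError(
            f"{self._name} API is not initialized; "
            f"call init_apis(cookies) before using '{attr}'"
        )


class DemoClient:
    """Hypothetical stand-in for a real API client."""

    def __init__(self, cookies: dict) -> None:
        self.cookies = cookies

    def fetch(self, resource: str) -> str:
        return f"fetched {resource}"


class APIRegistry:
    """Holds API clients, starting with placeholders until init_apis runs."""

    def __init__(self) -> None:
        self.graphql = UninitializedAPI("graphql")

    def init_apis(self, cookies: dict) -> None:
        # Replace the placeholder with a working client.
        self.graphql = DemoClient(cookies)
```

Calling a method on the placeholder raises a RuntimeError that names the missing initialization step, instead of an AttributeError on None.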

@BakungaBronson

BakungaBronson commented Jul 3, 2023

Probably a dumb ask, but could you show an example of the command you used with --output-cookie-file? I've tried with my Netscape cookie file and I just keep getting TypeError: 'type' object is not subscriptable. Thank you for the refactor.

@mikelei8291
Contributor Author

@BakungaBronson Could you post the full command and log with -v?

Also note that user login with email and password is removed, so you shouldn't be using --output-cookie-file. Instead, you should save your cookies from Twitter in Netscape format and use --input-cookie-file to load it.

That being said, thanks for reminding me to update the command line documentation. I'll update this PR soon.

@michael-borisov

Here is an example of how to use the fix from the code:

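from twspace_dl.api import API
from twspace_dl import Twspace, TwspaceDL, CookiesLoader

# Load the cookies exported from the browser in Netscape format.
cookies = CookiesLoader.load("twitter.cookies")
API.init_apis(cookies)

# Placeholder: replace with the URL of the space to download.
url = "https://twitter.com/i/spaces/<space_id>"
twspace = Twspace.from_space_url(url)
twspace_dl = TwspaceDL(twspace, "filename")
twspace_dl.download()

I created the twitter.cookies file with this extension: https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/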

@BakungaBronson

> @BakungaBronson Could you post the full command and log with -v?
>
> Also note that user login with email and password is removed, so you shouldn't be using --output-cookie-file. Instead, you should save your cookies from Twitter in Netscape format and use --input-cookie-file to load it.
>
> That being said, thanks for reminding me to update the command line documentation. I'll update this PR soon.

This helped me fix it actually. I was using --output-cookie-file instead (no idea why).
I installed your fix on my system using pip3 install git+https://github.com/mikelei8291/twspace-dl@twitter-api-fix, then ran twspace_dl -i https://twitter.com/i/spaces/1OyKAVDRjMzGb --input-cookie-file cookie.txt -v with the cookie file I exported using https://chrome.google.com/webstore/detail/cookie-editor/hlkenndednhfkekhgcdicdfddnkalmdm, and it worked. Thank you.

@alimirjahani7

alimirjahani7 commented Jul 4, 2023

Hi,
Is there any place where I can find the format of the cookie file and which headers to put in it?
How long does the cookie file last?
Should we update it every day, or only once in a while?

@BakungaBronson

> Hi is there any place I could find the format of cookie file and what headers to put in it? And how long does the cookie file last? should we update it every day or once in a while?

You need the Netscape format. You can save it as a .txt file.
If you use the extension I mentioned above, you can click the export icon and choose Netscape.

As for the rest of the questions, I'm not too sure.

@alimirjahani7

Thanks for your help

@mikelei8291
Contributor Author

@michael-borisov Thanks for the quick example and the suggestion of a browser extension to export cookies. Also, you can import the API constant directly from the twspace_dl module, since it is imported in the module's __init__.py. Like this:

from twspace_dl import API, Twspace, CookiesLoader, TwspaceDL

@alimirjahani7 You can find the spec of the Netscape cookies format here: https://curl.se/docs/http-cookies.html.

However, it is easier to just install a browser extension that exports the cookies in the Netscape format for you, like the one @michael-borisov suggested for Firefox or the one @BakungaBronson suggested for Chrome. There are plenty of extensions like these. Just search for "cookies" in the respective extension store, but be sure to check their background (for example, whether they are open source and what their reputation is), because the last thing you want when browsing the internet is someone accessing your cookies without your permission.

As for how long the cookies remain valid, you can check that yourself in the "storage" tab of the dev tools. Pressing F12 on the webpage you want to check usually opens it. On my end, both the auth_token and the ct0 cookies are valid for 5 years from their creation. Also, since you are using the same cookies to access Twitter as your browser, the exported cookies will remain valid as long as your account stays logged in in your browser. So being logged out in the browser is an indication that the exported cookies have become invalid.
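
The expiry check can also be done on the exported file itself. The sketch below uses Python's standard http.cookiejar.MozillaCookieJar, which reads the Netscape format; the function names are hypothetical, and treating auth_token and ct0 as the cookies that matter is an assumption based on the two cookies mentioned above. It is not the validation code from this PR.

```python
import time
from http.cookiejar import MozillaCookieJar
from typing import Dict, List, Optional, Sequence

# Assumption: these are the two cookies worth checking.
REQUIRED_COOKIES = ("auth_token", "ct0")


def cookie_expiry(path: str, names: Sequence[str] = REQUIRED_COOKIES) -> Dict[str, Optional[int]]:
    """Map each requested cookie found in the file to its expiry timestamp."""
    jar = MozillaCookieJar(path)
    # Keep session and already-expired cookies so they can be reported.
    jar.load(ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.expires for cookie in jar if cookie.name in names}


def missing_or_expired(
    expiry: Dict[str, Optional[int]],
    names: Sequence[str] = REQUIRED_COOKIES,
    now: Optional[float] = None,
) -> List[str]:
    """List the required cookies that are absent or past their expiry."""
    now = time.time() if now is None else now
    problems = []
    for name in names:
        if name not in expiry:
            problems.append(name)
            continue
        expires = expiry[name]
        # None or 0 marks a session cookie with no recorded expiry.
        if expires and expires <= now:
            problems.append(name)
    return problems
```

Note that a cookie that has not expired by date can still be invalidated server-side, for example after logging out, so this only catches the simple cases.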

@mikelei8291
Contributor Author

OK, I've just found out that Twitter has put a strict API rate limit on the UserByScreenName endpoint, which is used to retrieve numeric user IDs. Hitting this limit would prevent downloading Twitter Spaces.

However, I've also found a new endpoint that Twitter uses to retrieve user IDs, so I've implemented it as a backup for the UserByScreenName endpoint. It will be used if the primary endpoint returns the 429 TOO MANY REQUESTS status code. It is unclear, though, whether the backup endpoint also has a strict rate limit. I guess only time will tell.

Finally, I've removed unsupported command line options from the ArgumentParser so users won't get confused anymore if they use these options by accident. A short option -c has also been added for the long option --input-cookie-file to make it easier to type, given that it is very much a required option now.
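
The fallback logic described here can be sketched as below. This is an illustration under assumptions, not the code from this PR: the fetchers are passed in as plain callables returning a (status, user_id) pair, and resolve_user_id is a hypothetical name. No real endpoint is contacted.

```python
from typing import Callable, Optional, Tuple

TOO_MANY_REQUESTS = 429

# A fetcher takes a screen name and returns (HTTP status, user ID or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]


def resolve_user_id(screen_name: str, primary: Fetcher, backup: Fetcher) -> str:
    """Look up a numeric user ID, using the backup only on a 429 response."""
    status, user_id = primary(screen_name)
    if status == TOO_MANY_REQUESTS:
        # The primary endpoint is rate limited, so try the backup once.
        status, user_id = backup(screen_name)
    if status != 200 or user_id is None:
        raise RuntimeError(f"user ID lookup failed with HTTP status {status}")
    return user_id
```

If the backup is rate limited as well, the sketch raises instead of retrying, which leaves the retry policy to the caller.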
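
A minimal argparse sketch of the resulting command line might look like this. The -c and --input-cookie-file names come from this comment, and -i and -v appear in the command posted earlier in this conversation; the long names --input-url and --verbose, the required=True setting, and build_parser itself are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with only the options discussed here."""
    parser = argparse.ArgumentParser(prog="twspace_dl")
    # Assumption: --input-url is the long form of -i.
    parser.add_argument("-i", "--input-url", metavar="SPACE_URL")
    # Marked required in this sketch because cookies are now needed.
    parser.add_argument(
        "-c",
        "--input-cookie-file",
        metavar="COOKIE_FILE",
        required=True,
        help="cookies file in Netscape format",
    )
    # Assumption: --verbose is the long form of -v.
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser
```

For example, build_parser().parse_args(["-i", url, "-c", "cookie.txt"]) stores the cookie path in args.input_cookie_file.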

I'll update the README.md file to reflect the command line options change in the next commit.

@mikelei8291
Contributor Author

I've updated the README.md file and the ArgumentParser, and properly implemented the cookies validation method. Tested locally and all worked as expected.

This PR is now complete and ready to merge once you've finished your review, unless Twitter changes their API again, but that could be fixed in another PR.

@Ryu1845
Collaborator

Ryu1845 commented Jul 8, 2023

Alright! Thank you so much for your hard work.

Collaborator

Ryu1845 left a comment

Overall, this is high quality code and there's not much to say about it.

Collaborator

Ryu1845 left a comment

This should have been linked in the previous comment, oops.

twspace_dl/__main__.py (outdated, resolved)
twspace_dl/cookies.py (outdated, resolved)

@mikelei8291
Contributor Author

It seems that Twitter also has a strict rate limit on the /fleets/v1/avatar_content endpoint. That's another issue to fix. I need to properly implement a rate limit detection mechanism for these endpoints, but I'll open a new PR for it.
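
One possible shape for such a detection mechanism is sketched below. It is not part of this PR. It assumes the endpoints send the x-rate-limit-remaining and x-rate-limit-reset headers that Twitter's public API documents, which is unverified for these internal endpoints, and it assumes lowercase header keys. The function name and the 60 second fallback are hypothetical.

```python
import time
from typing import Mapping, Optional

TOO_MANY_REQUESTS = 429
DEFAULT_WAIT_SECONDS = 60.0  # assumption: fallback when no reset time is sent


def seconds_until_reset(
    status: int,
    headers: Mapping[str, str],
    now: Optional[float] = None,
) -> Optional[float]:
    """Return how long to wait if the response signals a rate limit, else None."""
    exhausted = headers.get("x-rate-limit-remaining") == "0"
    if status != TOO_MANY_REQUESTS and not exhausted:
        return None
    reset = headers.get("x-rate-limit-reset")
    if reset is None:
        return DEFAULT_WAIT_SECONDS
    now = time.time() if now is None else now
    # The reset header is a Unix timestamp; never return a negative wait.
    return max(0.0, float(reset) - now)
```

A caller could sleep for the returned number of seconds before retrying, or switch to a backup endpoint when the wait is too long.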

mikelei8291 requested a review from Ryu1845 on July 9, 2023 15:10
Collaborator

Ryu1845 left a comment

OK looks good to me!

@Ryu1845
Collaborator

Ryu1845 commented Jul 9, 2023

@mikelei8291 Is it okay for me to merge?

@mikelei8291
Contributor Author

Yes, please merge it. I'll address the rate limit issue in a new PR, but it may take some time.

Ryu1845 merged commit 1245fb9 into HoloArchivists:main on Jul 9, 2023
2 checks passed
@Ryu1845
Collaborator

Ryu1845 commented Jul 9, 2023

Thank you again

@mikelei8291
Contributor Author

Thank you too for the review and this awesome project!

Development

Successfully merging this pull request may close these issues.

Twitter API change again?
5 participants