[QUESTION] Scroll time on Docker #38
Comments
What version of the library are you using? Also, can you share the snippet of code you're using that doesn't seem to be working in Docker?
Hi, I am no longer going the Docker route, but I did run into the same problem on my local computer: only 30 videos are scraped even with a scroll time set. It was working fine, but now, no matter how high the scroll_time value is set, it only gets the first 30 videos. I am doing the JSON dump, and the extra videos are usually in the "extras" field. That field is now blank. I am using version 0.1.11. I had usually been using scroll times between 10 and 300 seconds, and it always seemed to return the extras pages with the full list of videos. Now it does not. Hmmm.
I currently have the same issue; I can't seem to load more than 30 videos no matter how I set it up.
Previously, this problem arose due to what seemed to be a bug in Playwright. The fix at that time was to switch the web driver to Firefox, but if you're both having issues, it might mean the issue is now presenting itself in Firefox too. I don't have a whole lot of time to address this, being a full-time master's student, but I'll try to take a look soon.
Hi! I have the same issue. I tried:
all these combinations. Nothing works out of the box :(
What can I use for scraping user video stats? I used LightVideo from the user model. Can I get the stats another way?
If you have a User object and want to grab data on that user's videos, use the
@emacollins @CarlCochet @vladisalv Please try again with version 0.1.12. I've added new parameters to the API constructors that you can try messing with:
I also suggest updating all dependencies. Use:
Explanation: Notably, TikTok provides browsers with an msToken cookie, and scrolling down doesn't work until this cookie is provided. If you scroll down too fast, you'll deadlock TikTok. Scrolling down further won't make any more API calls. The only way for this deadlock to be removed is to scroll back up and then back down. TikTokPy scrolls up a bit every other scroll-down, but if the iterative scroll-downs happen too fast, the deadlock might not let up. These two new parameters can alleviate these issues.
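The scroll pattern described above can be written out as a plain schedule of scroll deltas. This is only an illustrative sketch, not TikTokPy's actual implementation: the name `scroll_schedule`, the pixel distances, and the pause length are all assumptions made up for the example.

```python
from typing import Iterator, Tuple


def scroll_schedule(
    steps: int,
    down_px: int = 1000,   # assumed distance per scroll-down
    up_px: int = 200,      # assumed distance of the small scroll-up
    pause_s: float = 1.0,  # assumed wait after each scroll
) -> Iterator[Tuple[int, float]]:
    """Yield (delta_y, pause_seconds) pairs for a scroll session.

    Positive delta_y means scrolling down. After every second
    scroll-down, a short scroll-up is emitted as well.
    """
    for i in range(steps):
        yield (down_px, pause_s)
        if i % 2 == 1:
            # Every other scroll-down is followed by a small scroll-up.
            yield (-up_px, pause_s)


if __name__ == "__main__":
    for delta_y, pause in scroll_schedule(4, pause_s=0.5):
        print(delta_y, pause)
```

In a real browser session, each pair would be applied as a scroll followed by a sleep of `pause_s` seconds. A longer pause corresponds to slower iterative scrolling.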
Hi @Russell-Newton! I tested it outside Docker with a fast internet connection, but it still doesn't work. It scraped only 30 videos out of 300. I looked at the code and saw that you use evaluate for scrolling. Maybe use the mouse wheel instead?
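For reference, the two scrolling approaches mentioned here look roughly like this in Playwright's Python sync API. This is a sketch for comparison only: the helper names are made up for the example, and it does not show how TikTokPy scrolls internally.

```python
def scroll_with_evaluate(page, delta_y: int) -> None:
    """Scroll by running JavaScript inside the page.

    `page` is expected to be a Playwright Page (sync API).
    """
    page.evaluate("dy => window.scrollBy(0, dy)", delta_y)


def scroll_with_wheel(page, delta_y: int) -> None:
    """Scroll by dispatching a mouse wheel event, closer to real user input.

    `page` is expected to be a Playwright Page (sync API).
    """
    page.mouse.wheel(0, delta_y)
```

Whether the wheel variant triggers TikTok's pagination more reliably is untested here; it is only the alternative being proposed.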
What values for
As you suggested above I started with:
I increased it step by step and finished with these values:
But I still got only 30 videos out of more than 300. As far as I can tell, it did scroll down, because by default I got just 27 videos. So it scrolls the page but stops after the first round of pagination.
@vladisalv Please try again on version 0.1.13, if you aren't already using it. I made some changes that should hopefully fix an issue with collecting extra videos.
@Russell-Newton It still doesn't work. To clarify, here is how I use the code:
Output:
Also, with the new 0.1.13 version I got this exception:
* Add a print statement to indicate when scrolling down fails on user pages
@vladisalv could you try again on your system with the branch for this issue?
    pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@38-post-list-scroll-failure
And then you can try something simple like:
    from tiktokapipy.api import TikTokAPI

    with TikTokAPI(scroll_down_time=120) as api:
        api.user("tiktok")
If my hunch is correct, the message
Looking at the network logs, it seems like the API requests that attempt to grab the user posts sometimes return with a completely empty body. I'm able to recreate this locally, but it's inconsistent. I suspect I may have to do an overhaul like I suggest in #21 in order to completely fix this issue.
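One generic way to cope with intermittently empty response bodies is to retry the request. This is a hypothetical sketch, not the fix adopted in TikTokPy: `fetch_with_retry` and its `fetch` callback are placeholder names, and whether a retry actually helps against TikTok's API is an assumption.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 3,
    delay_s: float = 1.0,
) -> Optional[bytes]:
    """Call fetch() until it returns a non-empty body or attempts run out.

    Returns the first non-empty body, or None if every attempt was empty.
    """
    for attempt in range(attempts):
        body = fetch()
        if body:
            return body
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return None
```

A caller would wrap whatever function performs the request in a zero-argument callable and treat a `None` result as a failed page.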
* Create functions for executing API calls of 4 kinds:
  * comment/list/ - video comments
  * post/item_list/ - user posts
  * challenge/item_list/ - popular videos tagged with a challenge
  * related/item_list/ - videos related to this one
* Opens up potential future resolutions for #35, #38, #40, #43, and #44
I think the changes I've been working on with v0.2 might fix this issue. It could be worth checking out:
    pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@v0.2-overhaul
I removed the scrolling parameters, but it should (fingers crossed) work without any API constructor parameters. You should be able to get away with:
    from tiktokapipy.api import TikTokAPI

    with TikTokAPI() as api:
        user = api.user("tiktok")
        for video in user.videos:
            ...  # do something
This should iterate over all of a user's videos. You can limit this using the
@emacollins @CarlCochet @vladisalv If one or all of you could try with the WIP changes, that would be very helpful. It works for me, but it's worth verifying that it works for you.
Ask your question
I tried containerizing my script with this package in Docker (Dockerfile below). When it runs, I am able to get user information back, but it seems that the scroll time is not taken into account. When I set a high scroll time running locally on my host, it returns all of a user's videos, even if they have a lot. When running the same code in my container, it only returns a fraction of the data (the first 30 videos). I am using the data_dump_file, and I can see that the data file is much smaller when running through Docker. Any ideas?