
[QUESTION] Scroll time on Docker #38

Closed
emacollins opened this issue Mar 25, 2023 · 15 comments
Labels
question Further information is requested

Comments

@emacollins

emacollins commented Mar 25, 2023

Ask your question
I tried containerizing my script, which uses this package, in Docker (Dockerfile below). When it runs, I get user information back, but the scroll time does not seem to be taken into account. When I set a high scroll time on my host, it returns all of a user's videos, even if they have a lot. When I run the same code in my container, it returns only a fraction of the data (the first 30 videos). I am using data_dump_file, and I can see that the data file is much smaller when running through Docker. Any ideas?

# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Install system dependencies for Playwright
RUN playwright install-deps

# Install Playwright browser dependencies
RUN python -m playwright install

# Run run.py when the container launches
CMD ["python", "run.py"]
@emacollins emacollins added the question Further information is requested label Mar 25, 2023
@Russell-Newton
Owner

What version of the library are you using? Also can you share the snippet of code you're using that doesn't seem to be working in Docker?

@emacollins
Author

emacollins commented Mar 31, 2023

Hi, I am no longer going the Docker route, but I ran into the same problem (only 30 videos scraped even with a scroll time set) on my local computer. It was working fine, but now, no matter how high the scroll_time value is set, it only gets the first 30 videos. I am doing the JSON dump, and usually the extra videos are in the "extras" field. That field is now blank.

I am using version 0.1.11.

I had usually been using scroll times between 10 and 300 sec, and it always seemed to return the extras pages with the full list of videos. Now it is not? Hmmm.

with TikTokAPI(scroll_down_time=scroll_time, navigation_retries=5, navigation_timeout=0,
               data_dump_file=filename) as api:
    try:
        user_object = api.user(user, video_limit=0)
    except Exception:
        # Errors are deliberately ignored here; the data dump file is what gets inspected.
        pass



@CarlCochet

I currently have the same issue: I can't seem to load more than 30 videos no matter how I set scroll_down_time. I'm also using version 0.1.11.

@Russell-Newton
Owner

Previously, this problem arose due to what seemed to be a bug in Playwright. The fix at that time was to switch the web driver to Firefox, but if you're both having issues, it might mean the issue is presenting itself in Firefox now. I don't have a whole lot of time to address this, being a full-time master's student, but I'll try to take a look soon.

@vladisalv

Hi! Have the same issue. Tried:

  • driver: firefox/chromium
  • mobile emulate True/False
  • playwright 1.29/1.31/1.33

all these combinations. Nothing works out of the box :(

@vladisalv

What can I use for scraping user video stats? Used LightVideo from user model. Can I get it the other way?

@Russell-Newton
Owner

What can I use for scraping user video stats? Used LightVideo from user model. Can I get it the other way?

If you have a User object and want to grab data on that user's videos, use the user.videos iterator. Iterating through this will load each video on demand, getting accurate statistics, video info, etc. If all you need are loose statistics, the LightVideos are faster.

Russell-Newton added a commit that referenced this issue May 16, 2023
* #33 - Make sure going to a page only waits for #SIGI-STATE, not load
* #38 - Parameterize scroll down delays
* Rollover to 0.1.12 for future release
@Russell-Newton
Owner

Russell-Newton commented May 16, 2023

@emacollins @CarlCochet @vladisalv Please try again with version 0.1.12. I've added new parameters to the API constructors that you can try messing with:

  • scroll_down_delay sets the time (in seconds) before scrolling down is started. This is useful if your network is slow (e.g.: you're running TikTokPy in a Docker container)
  • scroll_down_iter_delay sets the time (in seconds) between scrolls. This can also be useful to tinker with if your network is slow.

I also suggest updating all dependencies.


Use:

scroll_down_delay now defaults to 1 second instead of an implicit 0 seconds. If this does not immediately fix your problems, my suggestions are as follows:

  • Try increasing scroll_down_iter_delay to 0.5 from the default 0.2. This will slow down the scrolling, which could help load the msToken cookie (see Explanation)
  • Try increasing scroll_down_delay to 3. This should also help load the msToken cookie.
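The suggestions above can be tried in order. Here is a minimal sketch, assuming a hypothetical helper named `candidate_settings` (not part of TikTokPy) that yields keyword arguments for the API constructor. It starts from the 0.1.12 defaults quoted above; combining both suggestions in the last step is an assumption, since they could also be tried separately.

```python
from typing import Dict, Iterator


def candidate_settings(scroll_down_time: float) -> Iterator[Dict[str, float]]:
    """Yield constructor keyword arguments to try, gentlest change first.

    Hypothetical helper, not part of TikTokPy. The values are the defaults
    and suggestions quoted in this thread for version 0.1.12.
    """
    base = {
        "scroll_down_time": scroll_down_time,
        "scroll_down_delay": 1,         # default wait before scrolling starts
        "scroll_down_iter_delay": 0.2,  # default wait between scrolls
    }
    yield dict(base)
    # First suggestion: slow down the individual scrolls.
    yield {**base, "scroll_down_iter_delay": 0.5}
    # Second suggestion (assumed to be combined with the first):
    # also wait longer before the first scroll.
    yield {**base, "scroll_down_iter_delay": 0.5, "scroll_down_delay": 3}


for kwargs in candidate_settings(scroll_down_time=10):
    print(kwargs)
```

Each dictionary could then be passed to the constructor as `TikTokAPI(**kwargs)`, stopping at the first combination that returns more than the initial batch of videos.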

Explanation:

Notably, TikTok provides browsers with an msToken cookie, and scrolling down doesn't work until this cookie is provided. If you scroll down too fast, you'll deadlock TikTok. Scrolling down further won't make any more API calls. The only way for this deadlock to be removed is to scroll back up and then back down. TikTokPy scrolls up a bit every other scroll-down, but if the iterative scroll-downs happen too fast, the deadlock might not let up. These two new parameters can alleviate these issues.
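To make the timing concrete, here is a rough model of how the three parameters could interact. It is an illustrative sketch only, not TikTokPy's implementation: the function name `plan_scrolls` is hypothetical, and it assumes that the initial delay is not counted against `scroll_down_time` and that the small scroll-up happens right after every second scroll-down.

```python
from typing import List, Tuple


def plan_scrolls(
    scroll_down_time: float,
    scroll_down_delay: float = 1.0,
    scroll_down_iter_delay: float = 0.2,
) -> List[Tuple[float, str]]:
    """Return (seconds_from_start, action) pairs for one scrolling session.

    Illustrative model only (see the assumptions in the text above).
    """
    if scroll_down_iter_delay <= 0:
        raise ValueError("scroll_down_iter_delay must be positive")
    schedule: List[Tuple[float, str]] = []
    # Number of scroll-downs that fit into the scrolling window; the small
    # epsilon guards against floating-point quotients like 4.999999.
    steps = int(scroll_down_time / scroll_down_iter_delay + 1e-9)
    for i in range(steps):
        now = round(scroll_down_delay + i * scroll_down_iter_delay, 3)
        schedule.append((now, "down"))
        if i % 2 == 1:
            # Every other scroll-down is followed by a small scroll-up.
            schedule.append((now, "up a bit"))
    return schedule


for when, action in plan_scrolls(2, scroll_down_delay=1, scroll_down_iter_delay=0.5):
    print(f"{when:>5}s  {action}")
```

Under this model, a larger `scroll_down_iter_delay` means fewer scroll-downs in the same window, and a larger `scroll_down_delay` pushes the first scroll later, which is the point at which the msToken cookie needs to be available.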

@vladisalv

Hi @Russell-Newton! I checked it outside Docker with a good internet connection, but it still doesn't work. It scraped only 30 videos out of 300.

I looked at the code and saw that you use evaluate to scroll. Maybe use the mouse wheel instead?

@Russell-Newton
Owner

Russell-Newton commented May 18, 2023

What values for scroll_down_time, scroll_down_delay, and scroll_down_iter_delay do you have set, @vladisalv?

@vladisalv

As you suggested above I started with:

  • scroll_down_time: 10
  • scroll_down_delay: 3
  • scroll_down_iter_delay: 0.5

I increased it step by step and finished with these values:

  • scroll_down_time: 120
  • scroll_down_delay: 60
  • scroll_down_iter_delay: 10

But I still get only 30 videos out of more than 300.

As I understand it, the page does scroll down, because by default I got just 27 videos. So it scrolls the page but stops at the first pagination request.

@Russell-Newton
Owner

@vladisalv Please try again on version 0.1.13, if you aren't already using it. I made some changes that should hopefully fix an issue with collecting extra videos.

@vladisalv

@Russell-Newton still doesn't work

To clarify how I use the code:

        with TikTokAPI(scroll_down_time=20, scroll_down_delay=5, scroll_down_iter_delay=5) as api:
            user_stat = api.user(self.username, video_limit=1)
            video_count = user_stat.stats.video_count

            videos = []
            scroll_time = 20
            while True:
                print("Scroll time:", scroll_time)
                user = api.user(self.username,
                    scroll_down_time=scroll_time,
                    scroll_down_delay=5,
                    scroll_down_iter_delay=5,
                )
                scroll_time *= 2
                videos.clear()

                for v in user.videos.light_models:
                    videos.append(v)

                print("len of videos:", len(videos))
                print("we are waiting for", video_count)

                if len(videos) == video_count:
                    break

Output:

Scroll time: 20
len of videos: 30                                                                                         
we are waiting for 302                                                                                    
...
Scroll time: 160
len of videos: 30                                                                                         
we are waiting for 302                                                                                    

Also, with the new 0.1.13 version I got this exception:

 File ".../.venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 96, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Protocol error (Network.getResponseBody): No data found for resource with given identifier

Russell-Newton added a commit that referenced this issue May 31, 2023
* Add a print statement to indicate when scrolling down fails on user pages
@Russell-Newton
Owner

Russell-Newton commented May 31, 2023

@vladisalv could you try again on your system with the 38-post-list-scroll-failure branch's code? Just to help with my debugging.

pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@38-post-list-scroll-failure

And then you can try something simple like:

with TikTokAPI(scroll_down_time=120) as api:
    api.user("tiktok")

If my hunch is correct, the message Something went wrong should get printed out if you only collect 30 or so videos. If this is the case, that'll give me some more information about what's going wrong so that I may be able to fix it. My hunch is that it's related to this Reddit post: https://www.reddit.com/r/Tiktokhelp/comments/wybfcg/something_went_wrong_error_on_tiktok_web_via/.

Looking at the network logs, it seems like the API requests that attempt to grab the user posts sometimes return with a completely empty body. I'm able to recreate this locally, but it's inconsistent. I suspect I may have to do an overhaul like I suggest in #21 in order to completely fix this issue.

Russell-Newton added a commit that referenced this issue Jun 15, 2023
* Create functions for executing API calls of 4 kinds:
    * comment/list/ - video comments
    * post/item_list/ - user posts
    * challenge/item_list/ - popular videos tagged with a challenge
    * related/item_list/ - videos related to this one

* Opens up potential future resolutions for #35, #38, #40, #43, and #44
@Russell-Newton
Owner

I think the changes I've been working on with v0.2 might fix this issue. It could be worth checking out:

pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@v0.2-overhaul

I removed the scrolling parameters, but it should (fingers crossed) work without any API constructor parameters. You should be able to get away with:

with TikTokAPI() as api:
    user = api.user("tiktok")
    for video in user.videos:
        pass  # do something with each video

This should iterate over all of a user's videos. You can limit this using the video_limit parameter in api.user or using the limit method attached to user.videos (for video in user.videos.limit(30)).
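How a limit on a lazy iterator behaves can be illustrated without TikTokPy. This is a minimal sketch, assuming a hypothetical `LazyItems` class as a stand-in for something like user.videos (it is not the library's implementation), where items are produced only when iterated:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


class LazyItems:
    """Stand-in for an on-demand iterator such as user.videos (hypothetical)."""

    def __init__(self, source: Callable[[], Iterable[str]]) -> None:
        self._source = source  # called again on every iteration

    def __iter__(self) -> Iterator[str]:
        return iter(self._source())

    def limit(self, n: int) -> "LazyItems":
        # Wrap the source so iteration stops after at most n items.
        return LazyItems(lambda: islice(self._source(), n))


loaded: List[str] = []


def fetch_videos() -> Iterator[str]:
    # Pretend each item costs a network request; record what was loaded.
    for i in range(300):
        loaded.append(f"video-{i}")
        yield f"video-{i}"


videos = LazyItems(fetch_videos)
first_three = list(videos.limit(3))
print(first_three, len(loaded))
```

The point of the sketch is that limiting a lazy iterator avoids loading the remaining items at all, which is why a limit is cheaper than iterating everything and discarding the rest.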

@emacollins @CarlCochet @vladisalv If one or all of you could try with the WIP changes, that would be very helpful. It works for me, but it's worth verifying that it works for you.


4 participants