[QUESTION] Scroll time on Docker #38

emacollins · 2023-03-25T16:18:15Z

Ask your question
I tried containerizing my script, which uses this package, with Docker (Dockerfile below). When it runs, I do get user information back, but the scroll time doesn't seem to be taken into account. When I set a high scroll time and run on my host locally, it returns all of a user's videos, even if they have a lot. When I run the same code in my container, it returns only a fraction of the data (the first 30 videos). I am using data_dump_file, and I can see that the dump file is much smaller when running through Docker. Any ideas?

# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Install system dependencies for Playwright
RUN playwright install-deps

# Install the Playwright browsers
RUN python -m playwright install

# Run run.py when the container launches
CMD ["python", "run.py"]

Russell-Newton · 2023-03-28T15:21:17Z

What version of the library are you using? Also can you share the snippet of code you're using that doesn't seem to be working in Docker?

emacollins · 2023-03-31T21:48:03Z

Hi, I am no longer going the Docker route. But I did run into the same problem (only 30 videos scraped even with a scroll time set) on my local computer. It was working fine, but now, no matter how high the scroll_time value is set, it only gets the first 30 videos. I am doing the JSON dump, and usually the extra videos are in the "extras" field. That is now blank.

I am using version 0.1.11.

I had usually been using scroll times between 10 and 300 sec, and it always seemed to return the extras pages with the full list of videos. Now it is not? Hmmm.

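from tiktokapipy.api import TikTokAPI

with TikTokAPI(scroll_down_time=scroll_time, navigation_retries=5, navigation_timeout=0,
               data_dump_file=filename) as api:
    try:
        user_object = api.user(user, video_limit=0)
    except Exception as exc:
        # Report the failure instead of silently swallowing it
        print(f"Failed to scrape {user}: {exc}")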

CarlCochet · 2023-04-06T13:23:54Z

I have the same issue currently; I can't seem to load more than 30 videos no matter how I set scroll_down_time. I'm also using version 0.1.11.

Russell-Newton · 2023-04-06T15:44:58Z

Previously, this problem arose due to what seemed to be a bug in Playwright. The fix at that time was to switch the web driver to Firefox, but if you're both having issues, it might mean the issue is presenting itself in Firefox now. I don't have a whole lot of time to address this, being a full-time master's student, but I'll try to take a look soon.

vladisalv · 2023-05-15T14:39:01Z

Hi! Have the same issue. Tried:

driver: firefox/chromium
mobile emulate True/False
playwright 1.29/1.31/1.33

all these combinations. Nothing works out of the box :(

vladisalv · 2023-05-15T14:42:26Z

What can I use to scrape a user's video stats? I used LightVideo from the user model. Can I get them another way?

Russell-Newton · 2023-05-15T17:29:08Z

> What can I use to scrape a user's video stats? I used LightVideo from the user model. Can I get them another way?

If you have a User object and want to grab data on that user's videos, use the user.videos iterator. Iterating through this will load each video on demand, getting accurate statistics, video info, etc. If all you need are loose statistics, the LightVideos are faster.

Russell-Newton · 2023-05-16T04:16:38Z

@emacollins @CarlCochet @vladisalv Please try again with version 0.1.12. I've added new parameters to the API constructors that you can try messing with:

* scroll_down_delay sets the time (in seconds) before scrolling down is started. This is useful if your network is slow (e.g., you're running TikTokPy in a Docker container).
* scroll_down_iter_delay sets the time (in seconds) between scrolls. This can also be useful to tinker with if your network is slow.

I also suggest updating all dependencies.

Use:

scroll_down_delay now defaults to 1 second instead of an implicit 0 seconds. If this does not immediately fix your problems, my suggestions are as follows:

* Try increasing scroll_down_iter_delay to 0.5 from the default 0.2. This will slow down the scrolling, which could help load the msToken cookie (see Explanation).
* Try increasing scroll_down_delay to 3. This should also help load the msToken cookie.

Explanation:

Notably, TikTok provides browsers with an msToken cookie, and scrolling down doesn't work until this cookie is provided. If you scroll down too fast, you'll deadlock TikTok. Scrolling down further won't make any more API calls. The only way for this deadlock to be removed is to scroll back up and then back down. TikTokPy scrolls up a bit every other scroll-down, but if the iterative scroll-downs happen too fast, the deadlock might not let up. These two new parameters can alleviate these issues.
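
To make the timing concrete, here is a minimal sketch of how the three parameters could interact. It is not TikTokPy's actual implementation: `scroll_schedule` and the action names are hypothetical, and the scroll-up on every other iteration simply mirrors the description above.

```python
def scroll_schedule(scroll_down_time, scroll_down_delay=1.0, scroll_down_iter_delay=0.2):
    """Return a list of (seconds_from_page_load, action) pairs.

    Hypothetical model only: wait `scroll_down_delay` before the first scroll,
    then scroll every `scroll_down_iter_delay` seconds until `scroll_down_time`
    seconds of scrolling have elapsed, nudging back up on every other iteration.
    """
    if scroll_down_iter_delay <= 0:
        raise ValueError("scroll_down_iter_delay must be positive")
    schedule = []
    iteration = 0
    while iteration * scroll_down_iter_delay < scroll_down_time:
        at = round(scroll_down_delay + iteration * scroll_down_iter_delay, 3)
        if iteration % 2 == 1:
            # Every other iteration, scroll up a bit before scrolling down again.
            schedule.append((at, "scroll_up_a_bit"))
        schedule.append((at, "scroll_down"))
        iteration += 1
    return schedule


for entry in scroll_schedule(1, scroll_down_delay=3, scroll_down_iter_delay=0.5):
    print(entry)
```

Under this model, a larger scroll_down_iter_delay means fewer scroll-downs within the same scroll_down_time, so scroll_down_time may need to grow along with it.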

vladisalv · 2023-05-18T12:15:17Z

Hi @Russell-Newton! I tested it outside Docker on a fast connection, but it still doesn't work: only 30 of about 300 videos were scraped.

I looked at the code and saw that you use evaluate. Maybe use the mouse wheel instead?

Russell-Newton · 2023-05-18T12:17:59Z

What values have you set for scroll_down_time, scroll_down_delay, and scroll_down_iter_delay, @vladisalv?

vladisalv · 2023-05-20T13:57:06Z

As you suggested above I started with:

scroll_down_time: 10
scroll_down_delay: 3
scroll_down_iter_delay: 0.5

I increased it step by step and finished with these values:

scroll_down_time: 120
scroll_down_delay: 60
scroll_down_iter_delay: 10

But I still get only 30 videos out of more than 300.

As far as I can tell, it does scroll down, because by default I get just 27 videos. So it scrolls the page, but stops after the first page of pagination.

Russell-Newton · 2023-05-20T14:20:17Z

@vladisalv Please try again on version 0.1.13, if you aren't already using it. I made some changes that should hopefully fix an issue with collecting extra videos.

vladisalv · 2023-05-23T16:40:14Z

@Russell-Newton It still doesn't work.

To clarify how I use the code:

        with TikTokAPI(scroll_down_time=20, scroll_down_delay=5, scroll_down_iter_delay=5) as api:
            user_stat = api.user(self.username, video_limit=1)
            video_count = user_stat.stats.video_count

            videos = []
            scroll_time = 20
            while True:
                print("Scroll time:", scroll_time)
                user = api.user(self.username,
                    scroll_down_time=scroll_time,
                    scroll_down_delay=5,
                    scroll_down_iter_delay=5,
                )
                scroll_time *= 2
                videos.clear()

                for v in user.videos.light_models:
                    videos.append(v)

                print("len of videos:", len(videos))
                print("we are waiting for", video_count)

                if len(videos) == video_count:
                    break

Output:

Scroll time: 20
len of videos: 30
we are waiting for 302
...
Scroll time: 160
len of videos: 30
we are waiting for 302

Also, with the new 0.1.13 version I got this exception:

  File ".../.venv/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 96, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Protocol error (Network.getResponseBody): No data found for resource with given identifier
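
A side note on the retry loop posted earlier in this comment: it only exits when the collected count equals video_count, so it runs forever when scrolling stalls at 30 videos. A bounded variant might look like the sketch below. `collect_until_stable` and its `fetch` callback are hypothetical names, not part of TikTokPy; `fetch(scroll_time)` stands in for whatever call returns the videos for a given scroll time.

```python
def collect_until_stable(fetch, expected, start_scroll_time=20, max_attempts=4):
    """Call fetch(scroll_time) with a doubling scroll time.

    Stops when `expected` items are collected, when doubling the scroll time
    yields no additional items, or after `max_attempts` calls.
    Returns (items, attempts).
    """
    scroll_time = start_scroll_time
    best = []
    for attempt in range(1, max_attempts + 1):
        items = list(fetch(scroll_time))
        if len(items) >= expected:
            return items, attempt
        if attempt > 1 and len(items) <= len(best):
            # More scrolling did not help, so give up instead of looping forever.
            return best, attempt
        best = items
        scroll_time *= 2
    return best, max_attempts


# Stand-in for a scraper that is stuck at 30 videos regardless of scroll time.
videos, attempts = collect_until_stable(lambda scroll_time: range(30), expected=302)
print(len(videos), attempts)
```

Stopping as soon as a longer scroll time returns nothing new also makes the failure visible quickly, rather than after many doubled waits.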

Russell-Newton · 2023-05-31T15:42:30Z

@vladisalv could you try again on your system with the 38-post-list-scroll-failure branch's code? Just to help with my debugging.

pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@38-post-list-scroll-failure

And then you can try something simple like:

with TikTokAPI(scroll_down_time=120) as api:
    api.user("tiktok")

If my hunch is correct, the message Something went wrong should get printed out if you only collect 30 or so videos. If this is the case, that'll give me some more information about what's going wrong so that I may be able to fix it. My hunch is that it's related to this Reddit post: https://www.reddit.com/r/Tiktokhelp/comments/wybfcg/something_went_wrong_error_on_tiktok_web_via/.

Looking at the network logs, it seems like the API requests that attempt to grab the user posts sometimes return with a completely empty body. I'm able to recreate this locally, but it's inconsistent. I suspect I may have to do an overhaul like I suggest in #21 in order to completely fix this issue.
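
One way to tolerate intermittent empty responses like these is to treat an empty body as a retryable failure. The sketch below is only an assumption about how that could be handled, not what TikTokPy does; `fetch_json_with_retry` and the `fetch_body` callable (which returns the raw response text) are hypothetical.

```python
import json
import time


def fetch_json_with_retry(fetch_body, retries=3, backoff=0.5):
    """Call fetch_body() until it returns a non-empty JSON body.

    Returns the parsed JSON, or None if every attempt came back empty.
    """
    for attempt in range(retries):
        body = fetch_body()
        if body and body.strip():
            return json.loads(body)
        if attempt < retries - 1:
            # Back off a little longer after each empty response.
            time.sleep(backoff * (attempt + 1))
    return None
```

Returning None rather than raising lets the caller decide whether a run of empty bodies should end pagination or abort the whole scrape.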

* Create functions for executing API calls of 4 kinds:
  * comment/list/ - video comments
  * post/item_list/ - user posts
  * challenge/item_list/ - popular videos tagged with a challenge
  * related/item_list/ - videos related to this one
* Opens up potential future resolutions for #35, #38, #40, #43, and #44

Russell-Newton · 2023-06-27T17:26:06Z

I think the changes I've been working on with v0.2 might fix this issue. It could be worth checking out:

pip install -U git+https://github.com/Russell-Newton/TikTokPy.git@v0.2-overhaul

I removed the scrolling parameters, but it should (fingers crossed) work without any API constructor parameters. You should be able to get away with:

from tiktokapipy.api import TikTokAPI

with TikTokAPI() as api:
    user = api.user("tiktok")
    for video in user.videos:
        # do something with each video, e.g. print it
        print(video)

This should iterate over all of a user's videos. You can limit this using the video_limit parameter in api.user or using the limit method attached to user.videos (for video in user.videos.limit(30)).

@emacollins @CarlCochet @vladisalv If one or all of you could try with the WIP changes, that would be very helpful. It works for me, but it's worth verifying that it works for you.

emacollins added the "question" label (Further information is requested) on Mar 25, 2023

Russell-Newton added a commit that referenced this issue May 16, 2023

#33, #38 - Tandem fixes (see desc.)

a29f5c9

* #33 - Make sure going to a page only waits for #SIGI-STATE, not load
* #38 - Parameterize scroll down delays
* Rollover to 0.1.12 for future release

Russell-Newton mentioned this issue May 16, 2023

[TODO] - Make more API calls, load less pages #21

Open

Russell-Newton added a commit that referenced this issue May 31, 2023

#38 - Debugging for headless use

f43d09b

* Add a print statement to indicate when scrolling down fails on user pages

Russell-Newton closed this as completed in a3e6cc9 Jul 5, 2023
