
update last item scraped more often #95

Open
anoduck opened this issue May 9, 2021 · 2 comments · May be fixed by #98
Labels
enhancement New feature or request

Comments


anoduck commented May 9, 2021

Is your feature request related to a problem? Please describe.
The interval between updates of the last scraped item is too long: if the script crashes for any reason, one has to start from the beginning and scrape all the items over again.

Describe the solution you'd like
Change the timing of the "update last item downloaded" function so that it runs every other item, or every five or so items.

Describe alternatives you've considered
Alternatively, this could be written in as an exception handler: when the script fails and crashes, it updates the last item scraped before exiting.

Additional context
This will take a small bit of coding, but it will save a lot of time.
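The periodic-checkpoint idea above can be sketched as follows. This is a minimal illustration, not the project's actual code: `update_config`, `scrape_all`, and the `last_read_message_id` key are hypothetical stand-ins for however the script persists its progress.

```python
# Sketch: checkpoint the last scraped message id every N items instead of
# only once per run, so a crash loses at most N items of progress.
# All names here are illustrative, not the project's real API.
from typing import Iterable

CHECKPOINT_EVERY = 5  # update the config every 5 items, per the suggestion


def update_config(config: dict, last_id: int) -> None:
    # Stand-in for writing the config file back to disk.
    config["last_read_message_id"] = last_id


def scrape_all(config: dict, message_ids: Iterable[int]) -> None:
    last_id = config.get("last_read_message_id", 0)
    for count, msg_id in enumerate(message_ids, start=1):
        # ... download the media for msg_id here ...
        last_id = msg_id
        if count % CHECKPOINT_EVERY == 0:
            update_config(config, last_id)
    update_config(config, last_id)  # final flush after the loop completes


config = {"last_read_message_id": 0}
scrape_all(config, range(1, 13))
print(config["last_read_message_id"])  # 12
```

The trade-off is one extra config write per `CHECKPOINT_EVERY` items, which is negligible next to downloading large media files.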

@Dineshkarthik Dineshkarthik added the enhancement New feature or request label May 10, 2021
@Dineshkarthik
Owner

The update of the config file is based on the batch of messages processed. Currently, a batch of 100 messages is processed concurrently, and once the batch is done the config is updated.

I like the graceful-exit solution, i.e., if the script crashes, update the config with the latest message_id before exiting. Will add it to the next set of features.
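A minimal sketch of that graceful-exit idea, assuming a helper like the hypothetical `update_config` below (the crash is simulated; the real failure could be any exception mid-scrape):

```python
# Sketch: persist the latest processed message_id even when the run crashes,
# by flushing the config in a finally block. Names are illustrative only.

def update_config(config: dict, last_id: int) -> None:
    # Stand-in for writing the config file back to disk.
    config["last_read_message_id"] = last_id


def run(config: dict, message_ids) -> None:
    last_id = config.get("last_read_message_id", 0)
    try:
        for msg_id in message_ids:
            if msg_id == 7:
                raise RuntimeError("simulated crash mid-scrape")
            # ... download the media for msg_id here ...
            last_id = msg_id
    finally:
        # Runs on success *and* on crash, so progress is never lost.
        update_config(config, last_id)


config = {"last_read_message_id": 0}
try:
    run(config, range(1, 13))
except RuntimeError:
    pass
print(config["last_read_message_id"])  # 6
```

On the next run the script can resume from message 7 instead of re-scraping everything from the start.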

Author

anoduck commented May 11, 2021

A batch of 100 messages? That is a lot, especially if you don't have the fastest network and you're downloading rather large files. I am assuming that is set in the following line:

begin_import(config, pagination_limit=100)

Regardless, the graceful stop will accomplish the same desired end result, although I am completely clueless about how to implement it.
That works. I will leave it for you to close out the issue when desired.

@Dineshkarthik Dineshkarthik linked a pull request May 19, 2021 that will close this issue