This repository has been archived by the owner on Apr 16, 2023. It is now read-only.
I store the audits for a user as they report them, which works as long as the bot is running. Eventually, I would like the bot to give the right information regardless of whether it's running or not.
One of the big things I can do is to scrape a user's profile to get a list of all close vote review items and determine if they are audits or not.
Sam was nice enough to write me a library to get that data, but I will need to make some modifications to it. The default of 10 pages makes SO block my IP for a few seconds because of the number of requests.
So this is my grand plan so far:
Store all scraped CV review items in my own database. This will make lookup and querying much faster. Also, review items don't change, so I don't have to worry about updating data once it has been inserted.
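The "insert once, never update" idea maps naturally onto a primary key plus an insert-or-ignore. A minimal Python/SQLite sketch (the schema and column names here are hypothetical, not taken from the actual bot, which is C#):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS cv_review_items (
        item_id  INTEGER PRIMARY KEY,  -- SO review item id (assumed stable)
        user_id  INTEGER NOT NULL,     -- reviewer's SO user id
        is_audit INTEGER NOT NULL,     -- 1 if the item was an audit
        tag      TEXT
    )
""")

def store_item(item_id, user_id, is_audit, tag):
    # INSERT OR IGNORE makes re-inserting a known item a no-op, matching
    # the assumption that stored items never need updating.
    conn.execute(
        "INSERT OR IGNORE INTO cv_review_items VALUES (?, ?, ?, ?)",
        (item_id, user_id, int(is_audit), tag),
    )
    conn.commit()

store_item(101, 42, True, "python")
store_item(101, 42, True, "python")  # duplicate, silently ignored
count = conn.execute("SELECT COUNT(*) FROM cv_review_items").fetchone()[0]
```

With the review item id as the primary key, a refresh can re-scrape overlapping pages without creating duplicate rows.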
Make a "refresh audits" command. This will initiate the search for new review items. The chat bot will say:
Searching for new close vote review items. This may take a while. I will reply when I have finished.
I have added 43 new close vote review items of yours into the database. Use the command audit stats to see a breakdown of your audits.
For the actual parsing process: start at the beginning and go until you hit the first review item that is already stored in the database. Because the order of items should not change on the website, it is safe to assume that any item past the first already-processed item will also have been processed. This will greatly shorten subsequent data refreshes.
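The stop-at-first-known-item scan above can be sketched as follows (a hedged illustration: `pages` and `known_ids` are hypothetical stand-ins for the scraped profile pages and the set of ids already in the database):

```python
def new_items(pages, known_ids):
    """Collect review item ids until the first already-stored one appears.

    `pages` yields lists of item ids in the order the website shows them
    (newest first); `known_ids` holds the ids already in the database.
    Correctness relies on that ordering never changing, as noted above.
    """
    fresh = []
    for page in pages:
        for item_id in page:
            if item_id in known_ids:
                return fresh  # everything past here was processed before
            fresh.append(item_id)
    return fresh

# Newest-first pages; item 5 and everything after it are already stored.
result = new_items([[9, 8, 7], [6, 5, 4]], known_ids={5, 4, 3})
# result == [9, 8, 7, 6]
```

On the first run `known_ids` is empty, so every page is walked; afterwards the scan typically stops within the first page.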
Page grabs will need to be time-delayed. I don't want to get 503'ed by SO because I'm calling the pages too often. I'm thinking of a 1-second delay between page grabs. Yes, this will be painfully slow on the first grab, but refreshes afterwards should take under 10 seconds or so.
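The throttling could look something like this (a sketch; `fetch` stands in for whatever HTTP call the bot's scraping library actually makes):

```python
import time

def fetch_pages(urls, fetch, delay=1.0):
    """Fetch each URL in turn, sleeping `delay` seconds between requests
    so SO doesn't return 503s for hammering the server."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # throttle between grabs, not before the first
        results.append(fetch(url))
    return results

# Dummy fetch for illustration only.
out = fetch_pages(["page1", "page2"], fetch=lambda u: u.upper(), delay=0.01)
```

Sleeping only *between* requests means a one-page refresh pays no delay at all, which keeps the common case fast.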
[...] Also, review items don't change, so I don't have to worry updating data once it has been inserted.
My own emphasis.
This is partially incorrect, as the review item itself may still be in the process of being reviewed (i.e., it hasn't been completed yet). So, depending on whether you need to access info about the other reviewers of a review item, you may in the future need to wait for the review to be completed before scraping.
We should make sure we use String.ToLower (or something similar) on them all so that we don't end up with duplicates from users who capitalize tags when manually reporting audits.
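In other words, normalize the tag before it becomes a lookup key. A Python sketch of the idea (the C# bot would use String.ToLower the same way; trimming whitespace is an extra assumption on my part):

```python
def normalize_tag(tag):
    # Lowercase and trim so "Python", "python " and "python" all map to
    # the same key, preventing duplicate entries for the same tag.
    return tag.strip().lower()

tags = {normalize_tag(t) for t in ["Python", "python", " PYTHON "]}
# tags == {"python"}
```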