
Backup of likes doesn't stop at expected count #217

Open

arete06 opened this issue Sep 20, 2020 · 17 comments

Labels: bug, has PR, regression (Something no longer works as intended)

arete06 commented Sep 20, 2020

I just installed tumblr-utils and tried to download all my likes, but the script keeps running even after it passes the expected post count. Example:

blogname: Getting posts 1050 to 1099 (of 1410 expected)
blogname: Getting posts 1100 to 1149 (of 1410 expected)
blogname: Getting posts 1150 to 1199 (of 1410 expected)
blogname: Getting posts 1200 to 1249 (of 1410 expected)
blogname: Getting posts 1250 to 1299 (of 1410 expected)
blogname: Getting posts 1300 to 1349 (of 1410 expected)
blogname: Getting posts 1350 to 1399 (of 1410 expected)
blogname: Getting posts 1400 to 1449 (of 1410 expected)
blogname: Getting posts 1450 to 1499 (of 1410 expected)
blogname: Getting posts 1500 to 1549 (of 1410 expected)
blogname: Getting posts 1550 to 1599 (of 1410 expected)
blogname: Getting posts 1600 to 1649 (of 1410 expected)
blogname: Getting posts 1650 to 1699 (of 1410 expected)

cebtenzzre (Collaborator) commented Sep 21, 2020

If it's caused by a regression, I suspect commit 4961a2f (script here) would work better for you. But chances are https://github.com/aggroskater/tumblr-utils still has better support for likes.

cebtenzzre added the bug label Sep 21, 2020
cebtenzzre (Collaborator) commented Sep 21, 2020

It's possible that this condition is failing:

return doc if doc.get('meta', {}).get('status', 0) == 200 else None

You could try replacing that line with this more verbose code:

if doc.get('meta', {}).get('status', 0) != 200:
    # Print the whole API document so we can see what the non-200 status is
    sys.stderr.write('API response has non-200 status:\n{}\n'.format(doc))
    return None
return doc
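
For context, every v2 API payload is wrapped in a meta/response envelope, so doc has roughly the shape below (field names as in Tumblr's public API docs; the values here are illustrative):

# Illustrative shape of a successful likes response; posts omitted.
doc = {
    'meta': {'status': 200, 'msg': 'OK'},
    'response': {'liked_count': 1410, 'liked_posts': []},
}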

bbolli (Owner) commented Sep 21, 2020

I don't get what the more verbose code does differently, except for printing the document if the response code is not 200. How would this help in this situation?

arete06 (Author) commented Sep 21, 2020

> If it's caused by a regression, I suspect commit 4961a2f (script here) would work better for you. But chances are https://github.com/aggroskater/tumblr-utils still has better support for likes.

I tried this commit and it did indeed finish and created the html file. However, it clearly did not save all my likes so it's still not working for me.

arete06 (Author) commented Sep 21, 2020

> Actually, it now seems more likely to me that this condition is failing:
>
> return doc if doc.get('meta', {}).get('status', 0) == 200 else None
>
> Chances are your likes are private (not yet supported, see #128), but maybe something else is going on. (edit: this would yield an HTTP Error 401, so it wouldn't explain this). You could try replacing that line with this more verbose code:
>
> if doc.get('meta', {}).get('status', 0) != 200:
>     sys.stderr.write('API response has non-200 status:\n{}\n'.format(doc))
>     return None
> return doc
>
> I actually have some code like that in my own version of the script.

This did not work either.

cebtenzzre (Collaborator) commented
@bbolli If that None return was causing the backup loop to "try the next batch", it would be good to know what the response status was for debugging reasons (even if HTTP status was 200). But I forgot that the loop skips single posts now, so that wouldn't cause this.

cebtenzzre (Collaborator) commented Sep 21, 2020

@sldx12 If the older commit worked for you, then try the latest version of tumblr-utils, which has a potential fix for this issue. If neither gets all your likes, then aggroskater's might -- it walks them by timestamp instead of offset.
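
To illustrate what "walks them by timestamp" means, here is a rough Python 2 sketch of the two pagination styles against the v2 likes endpoint. get_likes, API_KEY, and the blog name are placeholders for this sketch, not code from either fork:

import json
import urllib2

API_KEY = 'abcdef'  # hypothetical placeholder, not a real key

def get_likes(blog, **params):
    # Query the v2 likes endpoint and unwrap the meta/response envelope.
    params['api_key'] = API_KEY
    qs = '&'.join('{}={}'.format(k, v) for k, v in params.items())
    url = 'https://api.tumblr.com/v2/blog/{}/likes?{}'.format(blog, qs)
    return json.load(urllib2.urlopen(url))['response']

# Offset-based paging (what tumblr-utils does):
page = get_likes('example.tumblr.com', limit=50, offset=0)

# Timestamp-based paging (aggroskater's approach): ask for likes older than
# the oldest one seen so far, so there is no fixed offset to run past.
page = get_likes('example.tumblr.com', limit=50,
                 before=page['liked_posts'][-1]['liked_timestamp'])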

arete06 (Author) commented Sep 23, 2020

@cebtenzzre none of those worked. The older commit still has the same bug, and aggroskater's fork doesn't get all the likes.

cebtenzzre (Collaborator) commented

Does the latest version (download it fresh from GitHub or update your clone if you made one) still try to download more likes than expected? If that's fixed, we can close this issue and open a new one for not downloading all of the likes.

arete06 (Author) commented Sep 23, 2020

Yes, the latest version still tries to download more likes than expected.

cebtenzzre changed the title from "Post number exceeds expected number or script keeps running" to "Backup of likes doesn't stop at expected count" Sep 23, 2020
cebtenzzre added and then removed the cannot reproduce and verified labels Sep 23, 2020
cebtenzzre (Collaborator) commented Sep 23, 2020

I can't reproduce the issue on a test blog with ~30 likes - I thought I could at one point, but I realized I didn't have enough likes to prove my theory.

@bbolli I have a suspicion this is because of 29e4c84 being effectively reverted by dd40a88. Did you ever find that commit to be necessary? If fewer than MAX_POSTS likes are being backed up at a time (maybe they started selectively enforcing the 20-post limit?), then the backed-up count could read as high as 3525 before it stops.

@sldx12 You could test this by adding this line before i += MAX_POSTS on the latest version:

print len(posts)

cebtenzzre added the regression label Sep 23, 2020
arete06 (Author) commented Sep 24, 2020

@cebtenzzre I'm not sure if this is what you asked for, but here's the result I got:

blogname: Getting posts 0 to 49 (of 1410 expected) 41
blogname: Getting posts 50 to 99 (of 1410 expected) 35
blogname: Getting posts 100 to 149 (of 1410 expected) 36
blogname: Getting posts 150 to 199 (of 1410 expected) 39
blogname: Getting posts 200 to 249 (of 1410 expected) 42
blogname: Getting posts 250 to 299 (of 1410 expected) 37
blogname: Getting posts 300 to 349 (of 1410 expected) 38
blogname: Getting posts 350 to 399 (of 1410 expected) 39
blogname: Getting posts 400 to 449 (of 1410 expected) 37
blogname: Getting posts 450 to 499 (of 1410 expected) 38
blogname: Getting posts 500 to 549 (of 1410 expected) 39
blogname: Getting posts 550 to 599 (of 1410 expected) 39
blogname: Getting posts 600 to 649 (of 1410 expected) 30
blogname: Getting posts 650 to 699 (of 1410 expected) 36
blogname: Getting posts 700 to 749 (of 1410 expected) 36
blogname: Getting posts 750 to 799 (of 1410 expected) 42
blogname: Getting posts 800 to 849 (of 1410 expected) 40
blogname: Getting posts 850 to 899 (of 1410 expected) 41
blogname: Getting posts 900 to 949 (of 1410 expected) 46
blogname: Getting posts 950 to 999 (of 1410 expected) 31
blogname: Getting posts 1000 to 1049 (of 1410 expected) 43
blogname: Getting posts 1050 to 1099 (of 1410 expected) 43
blogname: Getting posts 1100 to 1149 (of 1410 expected) 43
blogname: Getting posts 1150 to 1199 (of 1410 expected) 43
blogname: Getting posts 1200 to 1249 (of 1410 expected) 43
blogname: Getting posts 1250 to 1299 (of 1410 expected) 43
blogname: Getting posts 1300 to 1349 (of 1410 expected) 43
blogname: Getting posts 1350 to 1399 (of 1410 expected) 43
blogname: Getting posts 1400 to 1449 (of 1410 expected) 43
blogname: Getting posts 1450 to 1499 (of 1410 expected) 43
blogname: Getting posts 1500 to 1549 (of 1410 expected) 43

cebtenzzre (Collaborator) commented Sep 24, 2020

Yeah, that's what I wanted to see. I see two problems:

  1. You are not retrieving all of your likes, either because Tumblr's API won't give them to you or because the script is skipping them. Fewer than 50 posts per response could explain either.
    • Potential fix (try this): Replace i += MAX_POSTS with i += len(posts) and see if more posts are backed up this way (count the files in the posts folder). This would match the older commit's behavior, but it is not how the API is supposed to work.
  2. len(posts) gets stuck at 43. Maybe this indicates no new posts? If you have only ~805 files in your posts folder and not ~1235, that's more evidence for this. The older commit compared the total len(posts) against the expected count, but if the API gets stuck, that limit can no longer break the cycle.
    • Potential fix (try this): Track (or even use) _links, since it works for the aggroskater fork. Put some code before posts = _get_content(soup) so it looks like this:

try:
    print '\nnext before is {}'.format(soup['response']['_links']['next']['query_params']['before'])
except KeyError:
    print '\nno next before, should probably stop'
posts = _get_content(soup)
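
The stop condition this points toward could look roughly like the helper below. next_before is a hypothetical name, and this is a sketch of the idea rather than the eventual PR's code:

def next_before(soup, last_before):
    # Return the next 'before' value to request, or None if we should stop.
    try:
        before = soup['response']['_links']['next']['query_params']['before']
    except KeyError:
        return None  # the API advertises no next page
    if before == last_before:
        return None  # the pointer didn't move: the API is repeating itself
    return before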

arete06 (Author) commented Sep 25, 2020

  1. The first potential fix gives the following: https://pastebin.com/izfAibgV
  2. The second potential fix gives the following: https://pastebin.com/FKkwcc7u

cebtenzzre added a commit to cebtenzzre/tumblr-utils that referenced this issue Sep 25, 2020
Sometimes, at least when backing up likes, the API can get stuck
endlessly returning the same set of posts instead of returning an empty
list. Inspect _links and stop if the offset/before fails to change.

Fixes bbolli#217
cebtenzzre self-assigned this Sep 25, 2020
cebtenzzre (Collaborator) commented

@sldx12 Try the script from PR #219 and see if it stops on its own. Also, sorry if I wasn't clear: the first potential fix exists because you said "it clearly did not save all my likes" - I want to know whether that change lets either the older commit or my PR save more (or all) of your likes.

arete06 (Author) commented Sep 25, 2020

@cebtenzzre Oh, sorry. I don't think the first potential fix saved all my likes. It's hard to tell because, since I had to stop the script manually, it didn't generate the html file. However, I looked at the media folder, and it didn't look like it had all the likes.

The script from PR #219 stopped on its own, gave the following output and did not save all my likes: https://pastebin.com/ktJPmL89

cebtenzzre (Collaborator) commented

Leave this issue open so it will be closed when (if?) the PR is merged. The issue of not all likes being downloaded even with the PR is probably issue #118, so discuss that there. According to that issue, the offset parameter is limited to 1,000 for likes, which explains why anything past offset=1000 is the same as offset=1000; I had forgotten about this, as I use aggroskater's fork for likes anyway.

If not even aggroskater's fork backs up all of your likes, feel free to open a new issue.
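
To make the cap concrete: paging on liked_timestamp via the before parameter avoids offsets entirely. A rough sketch of such a loop, reusing the hypothetical get_likes helper from the earlier sketch (an illustration, not the fork's actual code):

# Walk likes backwards by timestamp; there is no offset, so no 1,000 cap.
likes = []
before = None
while True:
    params = {'limit': 50}
    if before is not None:
        params['before'] = before
    page = get_likes('example.tumblr.com', **params)
    batch = page['liked_posts']
    if not batch:
        break  # nothing older than 'before' remains
    likes.extend(batch)
    before = batch[-1]['liked_timestamp']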

cebtenzzre added a commit to cebtenzzre/tumblr-utils that referenced this issue Sep 25, 2020
When backing up likes, the API repeats responses past offset=1000.
Inspect _links and stop if the "before" parameter fails to change.

Fixes bbolli#217