Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed 8muses ripper #5

Merged
merged 8 commits into from
Oct 7, 2017
Merged

Fixed 8muses ripper #5

merged 8 commits into from
Oct 7, 2017

Conversation

cyian-1756
Copy link
Collaborator

@cyian-1756 cyian-1756 commented Jun 25, 2017

Category

This change is exactly one of the following (please change [ ] to [x]) to indicate which:

Description

I updated the cdn url (So downloading images no longer throws 404s) and changed how the ripper tells if it is ripping a album or a series of sub-albums*

  • I used a regex to do this, but the regex will fail for comics who titles are all numbers, if anyone can think of a better way to do this please let me know

The regex I used /comix/[a-zA-Z0-9\\-_/]*/\\d+

Testing

Required verification:

  • I've verified that there are no regressions in mvn test (there are no new failures or errors).
  • I've verified that this change works as intended.
    • Downloads all relevant content.
    • Downloads content from multiple pages (as necessary or appropriate).
    • Saves content at reasonable file names (e.g. page titles or content IDs) to help easily browse downloaded content.
  • I've verified that this change did not break existing functionality (especially in the Ripper I modified).

Optional but recommended:

  • I've added a unit test to cover my change.

Test links:

Single album: https://www.8muses.com/comix/album/prismgirls-comics/bikini-space-police

Sub-albums: https://www.8muses.com/comix/album/the-foxxx-comics/the-anal-plumber

edit: I found an error, don't merge yet

Edit_2: I add a hackish workaround for the error I ran into

Edit_3: I removed the hack and added a much better solution

@metaprime
Copy link
Contributor

metaprime commented Jun 25, 2017

I used a regex to do this, but the regex will fail for comics who titles are all numbers, if anyone can think of a better way to do this please let me know

The regex I used /comix/[a-zA-Z0-9\\-_/]*/\\d+

Assuming you mean the part of the URL would look like this:

  • comix/album/prismgirls-comics/bikini-space-police
  • Or like this: comix/album/123456 (I think this would match your regex)
  • Or like this: comix/123456
    • You could use /comix/([a-zA-Z0-9\\-_/]*/)?\\d+

Can you give an example of a URL you think would fail?

@cyian-1756
Copy link
Collaborator Author

cyian-1756 commented Jun 25, 2017

comix/album/prismgirls-comics/bikini-space-police

Something like comix/album/prismgirls-comics/25675 would fail

Also do you have any idea how to have getURLsFromPage return nothing, but not throw an error?

Because when ripping sub-albums getURLsFromPage returns nothing (Which makes ripme error even tho everything downloaded) or I return "http://" which makes the rip never compete (It just stays at pending 1)

@cyian-1756
Copy link
Collaborator Author

cyian-1756 commented Jun 25, 2017

I implemented a workaround for the getURLsFromPage error I ran into, getURLsFromPage now returns "http://DONTDOWNLOAD" which ripme can't download (and for some reason doesn't come up as an error in the interface)

My work around is pretty hackish so I'd like to know if anyone has any better ideas

I removed the workaround and replaced it with a simple map that stores the url and album title

@metaprime
Copy link
Contributor

metaprime commented Jul 9, 2017

do you have any idea how to have getURLsFromPage return nothing, but not throw an error?

I ran into this recently (while working on #10). AFAICT, it crashes by design. We could remove the check that causes the error, but I'm not sure if that would cause trouble elsewhere.

I think some refactoring may be in order. The structure of the code has not really been that conducive to sites where you would rather queue up work one at a time and delay because of rate limiting. Rate limiting is a much bigger problem now so many of the rippers have to do funny acrobatics to make it work.

@rautamiekka
Copy link
Contributor

rautamiekka commented Jul 9, 2017

I found 2 viable options for testing Java RegEx online with multiple RegEx sentences and multiple target strings:

They're nowhere nearly as fancy as regex101.com (which tests only Perl, JavaScript (way different from Java), Python and Golang), but at least shows which parts your sentence matches; ocpsoft uses color-coding and underlining (lack of both = no match) and regexplanet uses HTML tables and code tags.

Also, you forgot the leading /, that's why they don't match.

Also note that you have to drop all escapes from the sentence since the sites do the escaping automatically, which with your escapes results in double escapes => no matches.

  • /comix/[a-zA-Z0-9\\-_/]*/\\d+ and /comix/([a-zA-Z0-9\\-_/]*/)?\\d+ matches nothing on the sites.
  • /comix/[a-zA-Z0-9-_/]*/\d+ and /comix/[a-zA-Z0-9\-_/]*/\d+ matches /comix/album/123456; notice that escaping the minus is redundant as it's as per the rules.
  • /comix/([a-zA-Z0-9-_/]*/)?\d+ matches
/comix/album/123456
/comix/123456

and creates a group from album/ on the first line.

@cyian-1756
Copy link
Collaborator Author

We could remove the check that causes the error, but I'm not sure if that would cause trouble elsewhere.

I think a better idea would be to create a new flag for getURLsFromPage that causes it to return without nothing without throwing errors (Just wrap the check in a if basically)

@cyian-1756
Copy link
Collaborator Author

@metaprime I changed the regex to your suggestion of '/comix/([a-zA-Z0-9\-_/]*/)?\d+' and removed my hackish work around for getURLSFromPage, the ripper should work fine now. You mind testing it and confirming?

@metaprime metaprime self-assigned this Jul 26, 2017
@metaprime
Copy link
Contributor

metaprime commented Aug 10, 2017

Single-album: https://www.8muses.com/comix/album/prismgirls-comics/bikini-space-police -- worked fine

Sub-albums: https://www.8muses.com/comix/album/the-foxxx-comics/the-anal-plumber -- kept saying it was downloading things but nothing ever showed a completion message or ended up on disk

I tried one of the links from 4pr0n/ripme#543:

(NSFW) https://www.8muses.com/comix/community/album/stormageddon/curies-curiosities-by-hizzacked:

Error: null

Copy link
Contributor

@metaprime metaprime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment

@cyian-1756
Copy link
Collaborator Author

It seems that 8muses is having some issues right now (Trying to connect to the site keeps either timing out or returning 500 errors) so I'm going to hold off on fixing this for a day or so (or until 8muses starts working again)

@cyian-1756
Copy link
Collaborator Author

It looks like 8muses is in the middle of another over haul right now so I'm going to hold off on fixing this until they're done

@cyian-1756
Copy link
Collaborator Author

I've fixed the ripper (Every thing seems to work minus missing pages when the site 502s/timeouts). No idea when we'll be able to test the ripper properly as 8muses is showing no signs of getting fixed

@cyian-1756 cyian-1756 mentioned this pull request Sep 8, 2017
11 tasks
@cyian-1756
Copy link
Collaborator Author

@metaprime it seems that 8muses is working again so would you mind testing and merging?

@cyian-1756
Copy link
Collaborator Author

@rautamiekka You mind testing this and confirming it works? I'd like to have another person check it before I merge it in

@rautamiekka
Copy link
Contributor

^ Would love to, but since I don't have Eclipse running properly I can't compile anything.

@cyian-1756
Copy link
Collaborator Author

@rautamiekka Well damn. I suppose it doesn't matter that much, I tested it pretty thoroughly.

I'll merge it in a day or so

@cyian-1756 cyian-1756 merged commit a73e558 into RipMeApp:master Oct 7, 2017
@cyian-1756 cyian-1756 deleted the 8muses branch October 22, 2017 00:24
cyian-1756 pushed a commit that referenced this pull request Jul 4, 2018
cyian-1756 pushed a commit that referenced this pull request Sep 14, 2018
cyian-1756 pushed a commit that referenced this pull request Dec 3, 2018
lbalmaceda pushed a commit to lbalmaceda/ripme that referenced this pull request Oct 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8muses ripper competely broken
3 participants