Fixed 8muses ripper #5
Assuming you mean the part of the URL that would look like this:
Can you give an example of a URL you think would fail?
Something like comix/album/prismgirls-comics/25675 would fail. Also, do you have any idea how to have getURLsFromPage return nothing without throwing an error? When ripping sub-albums, getURLsFromPage returns nothing (which makes ripme error even though everything downloaded), or I return "http://", which makes the rip never complete (it just stays at pending 1).
I removed the workaround and replaced it with a simple map that stores the URL and the album title.
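A minimal sketch of such a map, assuming it is keyed by sub-album URL and holds the album title that images should be saved under. The class and method names (`SubAlbumTitles`, `remember`, `titleFor`) and the example URL are hypothetical, not ripme's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not ripme code): remember which album title belongs to each
// sub-album URL so images ripped later can be filed under the right folder.
class SubAlbumTitles {
    // LinkedHashMap keeps entries in the order the sub-albums were discovered.
    private final Map<String, String> titleByUrl = new LinkedHashMap<>();

    void remember(String url, String albumTitle) {
        titleByUrl.put(url, albumTitle);
    }

    // Returns the stored title, or the fallback if this URL was never recorded.
    String titleFor(String url, String fallback) {
        return titleByUrl.getOrDefault(url, fallback);
    }

    int size() {
        return titleByUrl.size();
    }

    public static void main(String[] args) {
        SubAlbumTitles titles = new SubAlbumTitles();
        // Illustrative URL only.
        String url = "https://example.com/comix/album/some-series/part-1";
        titles.remember(url, "part-1");
        System.out.println(titles.titleFor(url, "unknown")); // part-1
    }
}
```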
I ran into this recently (while working on #10). AFAICT, it crashes by design. We could remove the check that causes the error, but I'm not sure whether that would cause trouble elsewhere. I think some refactoring may be in order: the structure of the code has not really been conducive to sites where you would rather queue up work one item at a time and add delays because of rate limiting. Rate limiting is a much bigger problem now, so many of the rippers have to do funny acrobatics to make it work.
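To make the "queue up work one at a time and delay" idea concrete, here is a minimal sketch of a queue that fetches one URL at a time with a fixed pause between requests. `PoliteQueue`, its methods, and the delay value are hypothetical illustrations, not part of ripme:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not ripme code): hand out one URL at a time and sleep a fixed
// delay between fetches so requests stay under a site's rate limit.
class PoliteQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final long delayMillis;

    PoliteQueue(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    void enqueue(String url) {
        pending.addLast(url);
    }

    // Processes URLs in FIFO order; URLs enqueued while draining are processed too.
    // Returns early (leaving the rest queued) if the thread is interrupted while waiting.
    void drain(Consumer<String> fetch) {
        boolean first = true;
        while (!pending.isEmpty()) {
            if (!first) {
                try {
                    Thread.sleep(delayMillis); // pause between requests, not before the first
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            first = false;
            fetch.accept(pending.pollFirst());
        }
    }

    int remaining() {
        return pending.size();
    }

    public static void main(String[] args) {
        PoliteQueue queue = new PoliteQueue(100);
        queue.enqueue("page-1");
        queue.enqueue("page-2");
        List<String> fetched = new ArrayList<>();
        queue.drain(fetched::add);
        System.out.println(fetched); // [page-1, page-2]
    }
}
```

Because the loop re-checks the queue after every fetch, a sub-album page could enqueue its children from inside the fetch callback and they would be rate-limited the same way.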
I found two viable options for testing Java regexes online with multiple patterns and multiple target strings:
They're nowhere near as fancy as regex101.com (which tests only Perl, JavaScript (way different from Java), Python and Golang), but at least they show which parts of the target string your pattern matches: ocpsoft uses color-coding and underlining (lack of both = no match), and regexplanet uses HTML tables and code tags. Also, you forgot the leading
Also note that you have to drop all escapes from the pattern, since the sites do the escaping automatically; with your escapes, that results in double escapes => no matches.
and creates a group from
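To illustrate the double-escaping point: a regex written in Java source has every backslash doubled, but the string the regex engine actually receives has single backslashes, and that single-backslash form is what belongs in an online tester. A small sketch using the regex from the PR description (`EscapeDemo` is a hypothetical name):

```java
import java.util.regex.Pattern;

// Sketch: in Java source each regex backslash is written twice ("\\d"), but the
// compiled pattern, and what an online tester expects, contains a single one (\d).
class EscapeDemo {
    // The PR's regex exactly as it would appear in Java source.
    static final String JAVA_LITERAL = "/comix/[a-zA-Z0-9\\-_/]*/\\d+";

    public static void main(String[] args) {
        // Prints the form to paste into a tester: /comix/[a-zA-Z0-9\-_/]*/\d+
        System.out.println(Pattern.compile(JAVA_LITERAL).pattern());
    }
}
```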
I think a better idea would be to add a new flag for getURLsFromPage that lets it return nothing without throwing errors (basically, just wrap the check in an if).
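A minimal sketch of that proposal, assuming the existing check simply rejects an empty URL list. The `allowEmpty` flag, the `checkUrls` helper, and the exception type are hypothetical stand-ins, not ripme's actual API:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the "no URLs found" check wrapped in an if, controlled by a flag.
class EmptyResultGuard {
    // allowEmpty is a hypothetical flag a ripper would set for pages that may
    // legitimately contain no images (e.g. a page that only lists sub-albums).
    static List<String> checkUrls(List<String> urls, boolean allowEmpty) {
        if (urls.isEmpty() && !allowEmpty) {
            // Stand-in for whatever error the real ripper raises.
            throw new IllegalStateException("No image URLs found on page");
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(checkUrls(none, true).size()); // 0, no exception
        try {
            checkUrls(none, false);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

With the flag left off, the behavior is unchanged, so rippers that rely on the existing error are not affected.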
@metaprime I changed the regex to your suggestion of '/comix/([a-zA-Z0-9\-_/]*/)?\d+' and removed my hackish workaround for getURLsFromPage; the ripper should work fine now. Would you mind testing it and confirming?
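One quick way to compare the regex from the PR description with the suggested one is to run both against sample paths. The paths below are illustrative, `ComixRegexCompare` is a hypothetical name, and `find()` is used, so a match anywhere in the string counts:

```java
import java.util.regex.Pattern;

// Sketch comparing the original and the suggested 8muses path regexes.
class ComixRegexCompare {
    // Regex from the PR description (Java source form).
    static final Pattern OLD = Pattern.compile("/comix/[a-zA-Z0-9\\-_/]*/\\d+");
    // Suggested regex: the path segment(s) between /comix/ and the number are optional.
    static final Pattern NEW = Pattern.compile("/comix/([a-zA-Z0-9\\-_/]*/)?\\d+");

    static boolean matches(Pattern p, String path) {
        return p.matcher(path).find();
    }

    public static void main(String[] args) {
        String[] paths = {
            "/comix/25675",
            "/comix/album/prismgirls-comics/25675",
            "/comix/album/prismgirls-comics/bikini-space-police"
        };
        for (String path : paths) {
            System.out.println(path + "  old=" + matches(OLD, path) + "  new=" + matches(NEW, path));
        }
    }
}
```

Only the first path differs: the suggested pattern also accepts a number directly after `/comix/`, because the middle group is optional.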
Single album: https://www.8muses.com/comix/album/prismgirls-comics/bikini-space-police -- worked fine.
Sub-albums: https://www.8muses.com/comix/album/the-foxxx-comics/the-anal-plumber -- kept saying it was downloading things, but nothing ever showed a completion message or ended up on disk.
I tried one of the links from 4pr0n/ripme#543 (NSFW): https://www.8muses.com/comix/community/album/stormageddon/curies-curiosities-by-hizzacked:
It seems that 8muses is having some issues right now (connecting to the site either times out or returns 500 errors), so I'm going to hold off on fixing this for a day or so (or until 8muses starts working again).
It looks like 8muses is in the middle of another overhaul right now, so I'm going to hold off on fixing this until they're done.
I've fixed the ripper (everything seems to work, minus missing pages when the site returns 502s or times out). No idea when we'll be able to test the ripper properly, as 8muses is showing no signs of getting fixed.
@metaprime It seems that 8muses is working again, so would you mind testing and merging?
@rautamiekka Would you mind testing this and confirming it works? I'd like another person to check it before I merge it in.
^ Would love to, but since I don't have Eclipse running properly, I can't compile anything.
@rautamiekka Well, damn. I suppose it doesn't matter that much; I tested it pretty thoroughly. I'll merge it in a day or so.
add gradle build, faster testing
Category
This change is exactly one of the following (please change [ ] to [x] to indicate which):
Description
I updated the CDN URL (so downloading images no longer throws 404s) and changed how the ripper tells whether it is ripping an album or a series of sub-albums.
The regex I used:
/comix/[a-zA-Z0-9\\-_/]*/\\d+
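One way such a regex could be used to tell an album of images from a page of sub-albums is sketched below. It assumes that image-page links end in a numeric segment and sub-album links do not (an assumption, not confirmed by the PR); `LinkClassifier`, `looksLikeImageAlbum`, and the example hrefs are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: decide whether a page's links point at image pages or sub-albums.
class LinkClassifier {
    // The regex from the PR description, in Java source form.
    static final Pattern IMAGE_PAGE = Pattern.compile("/comix/[a-zA-Z0-9\\-_/]*/\\d+");

    // True if any link looks like an image page; otherwise the page is treated
    // as a list of sub-albums (assumption).
    static boolean looksLikeImageAlbum(List<String> hrefs) {
        for (String href : hrefs) {
            if (IMAGE_PAGE.matcher(href).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> albumPage = Arrays.asList(
                "/comix/album/example-comics/example-album/1",
                "/comix/album/example-comics/example-album/2");
        List<String> seriesPage = Arrays.asList(
                "/comix/album/example-comics/part-one",
                "/comix/album/example-comics/part-two");
        System.out.println(looksLikeImageAlbum(albumPage));  // true
        System.out.println(looksLikeImageAlbum(seriesPage)); // false
    }
}
```

Because `find()` is unanchored, a sub-album slug that starts with digits (for example `/comix/album/2019-comics`) also matches; appending `$` to the pattern would restrict matches to paths that end in a number.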
Testing
Required verification:
mvn test (there are no new failures or errors).
Optional but recommended:
Test links:
Single album: https://www.8muses.com/comix/album/prismgirls-comics/bikini-space-police
Sub-albums: https://www.8muses.com/comix/album/the-foxxx-comics/the-anal-plumber
Edit: I found an error, don't merge yet.
Edit_2: I added a hackish workaround for the error I ran into.
Edit_3: I removed the hack and added a much better solution.