Fine tune link scraping #2

Closed
ItsNoted opened this issue Apr 15, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@ItsNoted
Collaborator

Some links are being blocklisted even though they work.

Example:
Moon Knight Vol. 1 – The Midnight Mission (TPB) (2022)

This has good links but none are being downloaded.

ItsNoted added the bug (Something isn't working) label Apr 15, 2023
@Casvt
Owner

Casvt commented Apr 18, 2023

In this case, working links are not being tested as broken. Instead, the whole page is blocklisted: the page itself matched the search, but none of the download links on it did. You can confirm this because the page is in the blocklist, while none of its individual download links are.
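The difference between blocklisting a page and blocklisting a link can be sketched roughly as below. This is a minimal illustration under assumptions, not Kapowarr's actual code: `SearchResult`, `evaluate_page`, and the precomputed boolean match flags are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    # Hypothetical model: a getcomics page plus the download links found on it.
    page_url: str
    page_matches: bool
    # Maps each download link to whether it matched the wanted volume/issue.
    link_matches: dict[str, bool] = field(default_factory=dict)


def evaluate_page(result: SearchResult, blocklist: set[str]) -> list[str]:
    """Return the matching download links; blocklist the page if there are none.

    Hypothetical helper for illustration, not Kapowarr's implementation.
    """
    if not result.page_matches:
        # The page itself did not match, so it is simply skipped.
        return []
    matching = [url for url, ok in result.link_matches.items() if ok]
    if not matching:
        # The page matched, but nothing on it did: the whole page is
        # blocklisted, while the individual links are left alone.
        blocklist.add(result.page_url)
    return matching
```

In this sketch a page with working links can still end up blocklisted, because the page URL (not the links) is what gets added when no link matches.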

The cause here is a volume number mismatch: according to comicvine, this is volume 7, but according to getcomics, it is volume 1. This mismatch leads to the page being blocklisted.
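A volume check of this kind might look roughly like the sketch below. The function `volume_matches` and its regex are hypothetical, not Kapowarr's implementation; the sketch assumes the volume appears in the page title as "Vol. N" or "Volume N", and it treats a title with no volume number as compatible.

```python
import re

# Hypothetical pattern: matches "Vol. 1", "Vol 1", "Volume 7", etc.
VOLUME_RE = re.compile(r"\bVol(?:ume)?\.?\s*(\d+)", re.IGNORECASE)


def volume_matches(expected_volume: int, page_title: str) -> bool:
    """Check whether the volume number in a page title equals the expected one.

    Hypothetical helper for illustration, not Kapowarr's implementation.
    """
    match = VOLUME_RE.search(page_title)
    if match is None:
        # Assumption: no volume number in the title means no conflict.
        return True
    return int(match.group(1)) == expected_volume


title = "Moon Knight Vol. 1 – The Midnight Mission (TPB) (2022)"
print(volume_matches(7, title))  # comicvine says volume 7, title says Vol. 1
```

With the title from this issue, expecting volume 7 (comicvine's number) fails against the "Vol. 1" in the getcomics title, which is the kind of mismatch described above.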

When manually searching for the volume in Kapowarr, the getcomics page doesn't come up in the results (which is expected, given the volume number mismatch), so I'm curious how you got Kapowarr to select that download.
