
Add more comic strips #86

Open
3 of 6 tasks
ArtskydJ opened this issue Aug 12, 2018 · 25 comments

Comments

@ArtskydJ
Owner

ArtskydJ commented Aug 12, 2018

Before you write a scraper for comicsrss, please know that I don't want comicsrss to have some types of comic strips.

I don't want comicsrss to have sexually-suggestive comics. For example, I've considered killing the RSS feed for 9 Chickweed Lane, and I still might kill it someday. I'm going to avoid adding anything to comicsrss that's more suggestive than that.

I might kill off political comics. I haven't yet, but I've been strongly considering it for a while now. Internet politics discussions tend to be tribal and echo-chambery, but political comics step that up a few notches.


List of comic strips/websites that folks have requested, and who requested them

Planned:

@ghost

ghost commented Nov 15, 2018

I'd love to see some Comics Kingdom strips added, if possible. (For me, personally, mainly Bizarro, Rhymes with Orange, and Darrin Bell.)

@ArtskydJ
Owner Author

Arcamax has Bizarro, Dilbert, Rhymes with Orange, and Darrin Bell.

Both Comics Kingdom and Arcamax look like they will be much more difficult to scrape than GoComics.

@ArtskydJ
Owner Author

Added Dilbert today.

@ArtskydJ
Owner Author

ArtskydJ commented Jan 4, 2019

I don't remember why I thought Arcamax would be particularly difficult. It doesn't look like it will be that hard...

```html
<a class="prev" href="/thefunnies/brilliantmindofedisonlee/s-2160999" title="Brilliant Mind of Edison Lee 1/3/2019"><span class="entypo-left-open"></span></a>
  <span class="cur">January  4</span>
<a class="next-off" href="#"><span class="entypo-right-open"></span></a>

<!-- ... -->

<figure class="comic">
  <img id="comic-zoom" data-zoom-image="/newspics/168/16885/1688589.gif" src="/newspics/168/16885/1688589.gif"  data-width="600" data-height="187" alt="" class="img-responsive the-comic" title="click or tap to zoom" />
  <cite class="comic-copyright">(c) 2019 John Hambrock.  Dist. by King Features Syndicate, Inc.</cite>
</figure>
```

Hopefully I'll get around to it within a few weeks.
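To show what "not that hard" could look like, here is a minimal sketch that pulls the comic image, the previous-strip link, and the copyright line out of markup shaped like the Arcamax snippet above. It is not the comicsrss scraper. The function name `parseArcamaxPage` and the `https://www.arcamax.com` origin used to resolve relative links are assumptions, and the regexes only match the attribute layout shown in that snippet.

```javascript
// Assumed site origin, used to turn relative hrefs/srcs into absolute URLs.
const ARCAMAX_ORIGIN = 'https://www.arcamax.com'

// Hypothetical helper: extract the strip image, the "previous strip" link,
// and the copyright line from one Arcamax strip page.
// Returns null for any piece that is not found.
function parseArcamaxPage(html) {
  // <img id="comic-zoom" ... src="...">: the comic image itself
  const img = /<img[^>]*\bid="comic-zoom"[^>]*\bsrc="([^"]+)"/.exec(html)
  // <a class="prev" href="...">: link to the previous day's strip
  const prev = /<a class="prev" href="([^"]+)"/.exec(html)
  // <cite class="comic-copyright">...</cite>: credit line, whitespace collapsed
  const cite = /<cite class="comic-copyright">([\s\S]*?)<\/cite>/.exec(html)
  return {
    comicImageUrl: img ? new URL(img[1], ARCAMAX_ORIGIN).href : null,
    prevUrl: prev ? new URL(prev[1], ARCAMAX_ORIGIN).href : null,
    copyright: cite ? cite[1].replace(/\s+/g, ' ').trim() : null,
  }
}
```

Following `prevUrl` repeatedly would let a scraper walk backward through older strips. A production scraper would probably use a real HTML parser instead of regexes, since any markup change on the site breaks these patterns.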

Repository owner deleted a comment May 22, 2019
@infinitytec

Could I request Sherman's Lagoon and Freefall (the latter is a webcomic found at freefall.purrsia.com)?

@ArtskydJ
Owner Author

Sherman's Lagoon is on Comics Kingdom. If/when I add Comics Kingdom, I can @ you in this thread.

I doubt I'll add Freefall unless it is part of a larger site like Comics Kingdom or Arcamax. If there's enough demand for it, I might add it.

Or you could look into adding it the same way Dilbert was added:
https://github.com/ArtskydJ/comicsrss.com/blob/gh-pages/_generator/scraper-dilbert/index.js
There isn't really an API for making a scraper... ☹️


This is what I did for Dilbert (and the process would be similar for Freefall):

  1. Grab a page that shows multiple comics, including the latest comic
    a. For Dilbert it was https://dilbert.com
    b. For Freefall it might be http://freefall.purrsia.com/lastthree.htm
  2. Parse the HTML to turn it into an array like this:
```json
[
    {
        "titleAuthorDate": "Freefall by Tugrik for Wednesday 6/12/2019",
        "url": "http://freefall.purrsia.com/ff3300/fc03290.htm",
        "date": "2019-06-12",
        "comicImageUrl": "http://freefall.purrsia.com/ff3300/fc03290.png"
    },
    ...
]
```
  3. Open the cached version of that array, and merge them together. (If I don't have the latest comic in the cached array, then I need to push it onto the array.)
  4. Write the cached file to disk.
  5. Integrate it with the rest of the system. (If you do everything else, I would be more than happy to integrate your scraper.)
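The merge-and-write steps (open the cached array, merge the freshly scraped entries into it, write it back to disk) can be sketched in a few lines of Node.js. This assumes the cache is a JSON file holding an array of objects in the format shown in the list, that there is at most one strip per date, and that newest-first ordering is wanted. `mergeStrips` and `updateCacheFile` are hypothetical names, not functions from the comicsrss repository.

```javascript
const fs = require('fs')

// Merge freshly scraped strips into the cached array.
// Assumption: at most one strip per date, so `date` works as a unique key.
// Entries already in the cache are kept as-is; only new dates are added.
function mergeStrips(cached, scraped) {
  const byDate = new Map(cached.map(strip => [strip.date, strip]))
  for (const strip of scraped) {
    if (!byDate.has(strip.date)) byDate.set(strip.date, strip)
  }
  // Newest first. ISO "YYYY-MM-DD" strings compare correctly as plain strings.
  return [...byDate.values()].sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0))
}

// Read the cache (if it exists), merge in the new strips, and write it back.
function updateCacheFile(cachePath, scraped) {
  const cached = fs.existsSync(cachePath)
    ? JSON.parse(fs.readFileSync(cachePath, 'utf8'))
    : []
  const merged = mergeStrips(cached, scraped)
  fs.writeFileSync(cachePath, JSON.stringify(merged, null, 2))
  return merged
}
```

Keying on `url` instead of `date` would also work, and would be the safer choice for a site that can publish more than one strip on the same day.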

@infinitytec

Thanks for the information! I'll look into it and see what I can do!

@ArtskydJ
Owner Author

I made an API and published it in the README.

@ArtskydJ ArtskydJ pinned this issue Jul 17, 2019
@ArtskydJ ArtskydJ changed the title Add other comic strips Add more comic strips Jul 17, 2019
@jgbishop

jgbishop commented Jan 1, 2020

Any progress on this? I've looked into scraping Comics Kingdom in the past year myself, and it's pretty difficult. Lots of the page gets loaded dynamically when first visited in a web browser. The publishers are clearly trying their best to prevent scraping, but my scraping knowledge is fairly limited when it comes to dynamic data. Maybe the arcamax website would be easier?

@ArtskydJ
Owner Author

ArtskydJ commented Jan 8, 2020

@jgbishop Very little progress. You can see in _generator/site-scrapers/ that there are 2 Work In Progress folders. I haven't done anything since then.

Getting a functional scraper working is probably around 2-10 hours of work (depending on how smoothly it goes, and whether you run into issues like rate-limiting). The reason I haven't made another site scraper isn't a technical issue blocking the way; I just haven't made it a priority.
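One common way to stay under a site's rate limits is to fetch pages one at a time with a pause between requests. A minimal sketch, assuming Node.js; `mapSequentially` and the delay value are illustrative and not part of comicsrss.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Run `task` on each item strictly one at a time, waiting `delayMs`
// between calls, and collect the results in input order.
async function mapSequentially(items, task, delayMs) {
  const results = []
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs)
    results.push(await task(items[i], i))
  }
  return results
}
```

For example, on Node.js 18 or later, where `fetch` is built in, `mapSequentially(urls, url => fetch(url).then(res => res.text()), 1000)` would download each page with roughly a one-second gap. The right delay depends on how strict the site is.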

And I personally don't have a ton of incentive to expand comicsrss, since it already does all that I need. Still, I do want to scrape more sites.

If you have a specific comic strip that you're wanting, you could try making a scraper just for it, instead of the entire arcamax/comics kingdom site. And that might be a nice starting point for me to expand it to the whole site.

One more thing to note is that if/when arcamax or comics kingdom is added, the site generator will have to avoid making two entries when a comic is in both gocomics.com and the added site.
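One way the generator could avoid duplicate entries is to key each strip by a normalized title and keep the first source that provides it. This is a hypothetical sketch: it assumes every entry has a `title` field and that the GoComics list is passed first so its entry wins. Matching on titles can still miss strips whose names differ between sites.

```javascript
// Reduce a title to lowercase letters and digits so that, for example,
// "Sherman's Lagoon" and "shermans-lagoon" produce the same key.
function normalizeTitle(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, '')
}

// Merge strip lists from several sites. Lists are scanned in the order given,
// so the first site that has a title wins (pass GoComics first to prefer it).
function dedupeAcrossSites(...siteLists) {
  const byKey = new Map()
  for (const list of siteLists) {
    for (const strip of list) {
      const key = normalizeTitle(strip.title)
      if (!byKey.has(key)) byKey.set(key, strip)
    }
  }
  return [...byKey.values()]
}
```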

@ArtskydJ
Owner Author

@jgbishop I finally added Arcamax comics.

@jgbishop

Woo-hoo! Thanks! 👏 🍰

@ghost

ghost commented Jul 5, 2020

Beetle Bailey and Hagar the Horrible, at last!

@infinitytec

Well, I may have figured out something for Comics Kingdom: https://jsfiddle.net/p0tojns1/1/

Not a full scraper, and only for Sherman's Lagoon, but it may help.

@ArtskydJ
Owner Author

Interesting...

Earlier, I'd decided not to write a scraper for Comics Kingdom, because I remembered Comics Kingdom being very dynamic. But it looks quite do-able to scrape that site now?

So I'm now planning to write a scraper for Comics Kingdom. I'm not promising anything. 😁 Difficulties might come up where I change my mind again, and abandon Comics Kingdom again. But I hope to get it working!

@jalberto

I would like to suggest https://workchronicles.com

@ArtskydJ
Owner Author

> I would like to suggest workchronicles.com

They already have an RSS feed: https://workchronicles.com/feed/

@jalberto

jalberto commented Nov 16, 2021 via email

@twizzayy

twizzayy commented Apr 7, 2022

Can't wait for The Far Side to be added. Thanks for this awesome resource. :)

@infinitytec

Hey, looks like Sherman's Lagoon is now on GoComics so it's being scraped!

@ArtskydJ
Owner Author

ArtskydJ commented Aug 9, 2022

I added Comics Kingdom strips to https://www.comicsrss.com/

@infinitytec

@tylerbenson

Would it be difficult to add support for https://tinyview.com/ and https://www.webtoons.com/ hosted comics?

Webtoons has an RSS feed, but it usually only shows the first panel of the comic.

Thanks!

@tylerbenson

I tried to add additional details for tinyview: #141.

@ArtskydJ
Owner Author

I just updated the original post.

Webtoons has some "mature"-rated comics, which I don't want on comicsrss. The "young adult"-rated comics varied a lot in their suggestiveness. Webtoons, by nature of its user-generated content, is difficult to categorize. If someone wrote a scraper for webtoons, even with the "mature"-rated comics filtered out, I'm not sure if I'd merge it into comicsrss.

I'd probably merge a scraper for tinyview. Most seemed fine. Maybe I'd filter out "Eggs n' Ben", IDK.
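If a tinyview or Webtoons scraper were written, the filtering described above might be a small post-processing step. This sketch assumes each scraped entry has a `title` and an optional `rating` string; `rating` is a hypothetical field, and neither site's actual data format is confirmed here. It drops anything rated "mature" plus any title on a hand-maintained denylist such as "Eggs n' Ben".

```javascript
// Ratings that should never be published (assumed field values).
const BLOCKED_RATINGS = new Set(['mature'])

// Keep only strips that are not rated "mature" and whose title is not on
// the denylist. Comparisons are case-insensitive.
function filterStrips(strips, denylist = []) {
  const denied = new Set(denylist.map(title => title.toLowerCase()))
  return strips.filter(strip => {
    const rating = String(strip.rating || '').toLowerCase()
    return !BLOCKED_RATINGS.has(rating) && !denied.has(strip.title.toLowerCase())
  })
}
```

A denylist keeps the final say with a human, which fits the case-by-case judgment described for user-generated sites better than relying on ratings alone.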

@tylerbenson

tylerbenson commented Oct 27, 2023

Makes sense... For the record, I was interested in some of the family friendly cartoons for each, and I totally respect your desire to keep things clean. (I've sent my teen son to your site to find comics to read.)


6 participants