Support fetching related text with more formatting #15

Open
jodal opened this issue May 23, 2011 · 2 comments

Comments

@jodal
Owner

jodal commented May 23, 2011

E.g. Darths & Droids has a long formatted text associated with each comic. Since these texts are often half the fun, comics should support fetching larger pieces of text and keep a sane amount of the formatting, e.g. headers and bullet lists.

@jodal
Owner Author

jodal commented Jun 7, 2012

I believe @xim has been looking a bit at this, ref. xim/comics@fdea722.

@xim
Contributor

xim commented Jun 11, 2012

I don't remember what we ended up with as a preferred approach. I made a tiny, general converter on my local computer. The idea was:

  1. Get the formatted HTML
  2. Use a dict that transforms elements, something like {'p': lambda data: ' '.join(data.split()) + '\n\n', ...}
  3. Allow the individual crawler to override any element type in this dict

I only tested this with rom.ac and QC, but it should enable good results on any comic. Further suggestions? =)
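
For anyone picking this up, here is a rough sketch of how the three steps above could look using only the standard library. It is not the code from the referenced commit. The names `DEFAULT_TRANSFORMS`, `ElementConverter` and `html_to_text` are made up for illustration, and it assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Hypothetical defaults: tag name -> function applied to the tag's inner text.
DEFAULT_TRANSFORMS = {
    'p': lambda data: ' '.join(data.split()) + '\n\n',
    'h1': lambda data: '# ' + ' '.join(data.split()) + '\n\n',
    'li': lambda data: '* ' + ' '.join(data.split()) + '\n',
    'ul': lambda data: data.strip() + '\n\n',
}

# Elements that never get a closing tag, so they are converted on open.
VOID_TAGS = {'br', 'hr', 'img', 'input', 'meta', 'link'}


class ElementConverter(HTMLParser):
    """Collects text per open element and transforms it when the element closes."""

    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms
        self.stack = [[]]  # one text buffer per open element; stack[0] is the root

    def _apply(self, tag, inner):
        # Tags without a transformer pass their inner text through unchanged.
        transform = self.transforms.get(tag, lambda data: data)
        self.stack[-1].append(transform(inner))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self._apply(tag, '')
        else:
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or len(self.stack) < 2:
            return  # already handled on open, or a stray closing tag
        self._apply(tag, ''.join(self.stack.pop()))

    def handle_data(self, data):
        self.stack[-1].append(data)


def html_to_text(html, overrides=None):
    """Convert HTML to text; `overrides` lets a crawler replace any transformer."""
    transforms = dict(DEFAULT_TRANSFORMS)
    transforms.update(overrides or {})
    parser = ElementConverter(transforms)
    parser.feed(html)
    parser.close()
    return ''.join(''.join(buf) for buf in parser.stack).strip()
```

A crawler that wants different paragraph handling would then pass something like `html_to_text(html, {'p': my_paragraph_handler})`, where `my_paragraph_handler` is whatever function that crawler defines. Mismatched tags are not handled here, so real-world pages would probably need more care.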
