Support fetching related text with more formatting #15

jodal · 2011-05-23T17:03:16Z

E.g. Darths & Droids has a large block of formatted text associated with each comic. Since these texts are often half the fun, comics should support fetching larger pieces of text and keep a sane amount of their formatting, e.g. headers and bullet lists.

jodal · 2012-06-07T13:17:38Z

I believe @xim has been looking into this a bit, ref. xim/comics@fdea722.

xim · 2012-06-11T13:45:18Z

I don't remember what we ended up with as a preferred approach. I made a tiny, general converter on my local computer. The idea was:

1. Get the formatted HTML
2. Use a dict that transforms elements, something like {'p': lambda data: ' '.join(data.split()) + '\n\n', ...}
3. Allow the individual crawler to override any element type in this dict
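
For illustration, the idea above could be sketched roughly as follows. This is not the converter from the referenced commit. It is a minimal sketch that assumes Python 3 and the standard library's html.parser. The names DEFAULT_TRANSFORMS, VOID_ELEMENTS, _Converter and html_to_text, and every default transform except the 'p' one quoted above, are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical defaults: each function receives the already-converted inner
# text of an element and returns its plain-text rendering. Only the 'p'
# entry comes from the discussion; the rest are assumptions.
DEFAULT_TRANSFORMS = {
    'p': lambda data: ' '.join(data.split()) + '\n\n',
    'h1': lambda data: '# ' + ' '.join(data.split()) + '\n\n',
    'li': lambda data: '* ' + ' '.join(data.split()) + '\n',
    'ul': lambda data: data + '\n',
    'br': lambda data: '\n',
}

# Elements that never have a closing tag (assumed subset).
VOID_ELEMENTS = {'br', 'hr', 'img'}


class _Converter(HTMLParser):
    def __init__(self, transforms):
        super().__init__(convert_charrefs=True)
        self.transforms = transforms
        # Stack of (tag, text chunks); the bottom entry collects the output.
        self.stack = [(None, [])]

    def _close_top(self):
        # Pop the innermost element, transform its text, hand it to the parent.
        tag, chunks = self.stack.pop()
        transform = self.transforms.get(tag, lambda data: data)
        self.stack[-1][1].append(transform(''.join(chunks)))

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, []))
        if tag in VOID_ELEMENTS:
            self._close_top()

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack[1:]):
            return  # stray end tag, ignore it
        # Close unclosed inner elements until the matching one is closed.
        while True:
            closing = self.stack[-1][0]
            self._close_top()
            if closing == tag:
                break

    def handle_data(self, data):
        self.stack[-1][1].append(data)

    def result(self):
        while len(self.stack) > 1:
            self._close_top()
        return ''.join(self.stack[0][1]).strip()


def html_to_text(html, overrides=None):
    """Convert HTML to text; a crawler passes overrides for any element."""
    transforms = dict(DEFAULT_TRANSFORMS)
    transforms.update(overrides or {})
    parser = _Converter(transforms)
    parser.feed(html)
    parser.close()
    return parser.result()
```

An individual crawler would then override only the elements it cares about, e.g. html_to_text(html, {'h1': lambda data: data.upper() + '\n'}), while unknown elements fall through with their text unchanged.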

I only tested this with rom.ac and QC, but it should enable good results on any comic. Further suggestions? =)
