Neaten up HTML formatting

Kindlefodder: strip leading and trailing whitespace from the contents of p and li elements.
Trader Joe's: clear ul sections (clear:both) in case they follow a floated image.
commit 7bdc46af22160651b380a1adb64bc8bd12b319eb 1 parent aacfd89
@danchoi authored
Showing with 12 additions and 3 deletions.
  1. +10 −2 lib/kindlefodder.rb
  2. +2 −1  recipes/trader_joes.rb
12 lib/kindlefodder.rb
@@ -83,10 +83,12 @@ def build_kindlerb_tree
# hacks to get the articles list to appear properly with summaries
out.sub!('html xmlns="http://www.w3.org/1999/xhtml"',
'\& xml:lang="en" lang="en"')
- # out.sub!(/<!DOCTYPE.*$/, '')
+ out.sub!(/<!DOCTYPE.*$/, '')
out.sub!('meta http-equiv="Content-Type" content="text/html; charset=UTF-8"',
'meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"')
out.strip!
+ # remove these useless characters:
+ out.gsub!('&#13;', '')
File.open(item_path, 'w:utf-8'){|f| f.puts out}
puts " #{item_path} -> #{article_title}"
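The string cleanup in this hunk can be tried in isolation. The following is a minimal sketch in plain Ruby, assuming the serialized article is an ordinary String; `clean_markup` and the sample input are hypothetical names used only for illustration and are not part of kindlefodder.

```ruby
# Hypothetical helper (not in kindlefodder): applies the DOCTYPE removal,
# strip, and '&#13;' removal from the hunk above to a copy of the input.
def clean_markup(html)
  out = html.dup
  # '.' does not match newlines and '$' matches at end of line, so only
  # the line containing the DOCTYPE declaration is affected.
  out.sub!(/<!DOCTYPE.*$/, '')
  out.strip!
  # '&#13;' is the numeric character reference for a carriage return.
  out.gsub!('&#13;', '')
  out
end

sample = "<!DOCTYPE html>\n<html><body><p>Hello&#13;</p></body></html>\n"
puts clean_markup(sample)
```

The bang methods return nil when nothing changes, so the sketch ignores their return values and returns `out` explicitly.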
@@ -164,10 +166,16 @@ def fixup_html! doc
p.swap p.children
p.remove
}
+ }
+
+ doc.search('li,p').each {|li|
# remove any leading spaces before elements inside any li tag
# THIS causes encoding problems!
#li.inner_html = li.inner_html.strip
- if (n = li.children.first).text?
+ if (n = li.children.first) && n.text?
+ n.content = n.content.strip
+ end
+ if (n = li.children.last) && n.text?
n.content = n.content.strip
end
}
3  recipes/trader_joes.rb
@@ -105,7 +105,8 @@ def save_article_and_return_path href, filename=nil
title_n = p.at(".text")
title_n.name = 'h3'
recipe_content.at("img").before title_n
- recipe_content.at("img").after Nokogiri::XML::Node.new("br", recipe_content)
+ recipe_content.search("ul").each {|ul| ul[:style] = "clear:both"}
+ # recipe_content.at("img").after Nokogiri::XML::Node.new("br", recipe_content)
p.remove
puts "Inlining recipe: #{title_n.inner_text}"
article_doc.at('hr').xpath("./following-sibling::*").each(&:remove)