Sniff H2 sections when ingesting documents
Documents may already have a NAME section that provides the name and
tagline, so we should try that before looking at the H1 heading.

The section may use the hyphenated form ('name - tagline'), which we
can split into name and tagline, or it may contain just the name of
the page.  We handle both.
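
The two NAME forms described above can be sketched on plain text as follows. This is only an illustration under stated assumptions: `split_name_line` is a hypothetical helper, not part of the commit, and the commit itself matches against rendered HTML rather than raw text.

```ruby
# Hypothetical helper (not in the commit): split the text of a NAME
# section into [name, tagline]. The tagline is nil when the line holds
# only the page name.
def split_name_line(line)
  # Split once on a run of one or more dashes surrounded by whitespace.
  name, tagline = line.strip.split(/\s+-+\s+/, 2)
  [name, tagline]
end

p split_name_line('foo - bar')
p split_name_line('foo -- bar')
p split_name_line('foo')
```

Requiring whitespace on both sides of the dashes keeps hyphenated page names such as `git-log` intact.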
adminspotter committed Jan 6, 2024
1 parent 8d2725c commit 1c085db
Showing 2 changed files with 31 additions and 1 deletion.
20 changes: 19 additions & 1 deletion lib/ronn/document.rb
@@ -214,7 +214,25 @@ def sniff
       html = Kramdown::Document.new(data[0, 512], auto_ids: false,
                                     smart_quotes: ['apos', 'apos', 'quot', 'quot'],
                                     typographic_symbols: { hellip: '...', ndash: '--', mdash: '--' }).to_html
-      sniff_h1_heading(html) or [nil, nil, nil]
+      sniff_h2_headings(html) or sniff_h1_heading(html) or [nil, nil, nil]
     end
+
+    # If the document has a '## NAME' heading, see if we can sniff out
+    # some of the document metadata.
+    def sniff_h2_headings(html)
+      html.split('<h2>').each do |section|
+        case section
+        when /^NAME<\/h2>\s*<p>([\w_.\/\[\]~+=@:<>-]+)\s+-+\s+([\w_.\/\[\]~+=@: -]*)<\/p>/m
+          # name -- description
+          description = $2
+          name = $1.gsub(/<[^>]+>/, '')
+          return [name, nil, description]
+        when /^NAME<\/h2>\s*<p>([\w_.\/\[\]~+=@:<>-]+)<\/p>/m
+          # name
+          return [$1.gsub(/<[^>]+>/, ''), nil, nil]
+        end
+      end
+      nil
+    end

     # If the document has a top-level '# <data>' type heading, see
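
To see how the two patterns in `sniff_h2_headings` behave, here is a minimal standalone sketch. The patterns are copied from the diff; everything else is an assumption for illustration: `sniff_name_section` is a hypothetical free-standing stand-in for the method, and the HTML strings are hand-written approximations of what the Markdown converter might emit, not captured output.

```ruby
# Patterns copied from sniff_h2_headings in the diff.
WITH_TAGLINE = /^NAME<\/h2>\s*<p>([\w_.\/\[\]~+=@:<>-]+)\s+-+\s+([\w_.\/\[\]~+=@: -]*)<\/p>/m
NAME_ONLY    = /^NAME<\/h2>\s*<p>([\w_.\/\[\]~+=@:<>-]+)<\/p>/m

# Hypothetical standalone version of the method. Returns
# [name, section, tagline], or nil when no NAME heading matches.
def sniff_name_section(html)
  html.split('<h2>').each do |section|
    if (m = WITH_TAGLINE.match(section))
      # The name capture may contain inline tags such as <code>; strip them.
      return [m[1].gsub(/<[^>]+>/, ''), nil, m[2]]
    elsif (m = NAME_ONLY.match(section))
      return [m[1].gsub(/<[^>]+>/, ''), nil, nil]
    end
  end
  nil
end

# Assumed input: hand-written HTML, not real converter output.
with_tagline = "<h1>whatever</h1>\n\n<h2>NAME</h2>\n\n<p><code>foo</code> -- bar</p>\n"
name_only    = "<h1>whatever</h1>\n\n<h2>NAME</h2>\n\n<p><code>foo</code></p>\n"
no_name      = "<h1>whatever</h1>\n\n<p>text</p>\n"

p sniff_name_section(with_tagline)
p sniff_name_section(name_only)
p sniff_name_section(no_name)
```

The sketch uses `MatchData` objects instead of `$1` and `$2`. In the committed code, `description = $2` is assigned before the `gsub` call, presumably because a regex `gsub` overwrites the match globals.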
12 changes: 12 additions & 0 deletions test/test_ronn_document.rb
@@ -74,6 +74,18 @@ def canonicalize(text)
       assert_equal '5', doc.section
       assert_equal 'wootderitis', doc.tagline
     end
+
+    test "new with NAME heading with #{i} dashes and description" do
+      doc = Ronn::Document.new { "# whatever\n\n## NAME\n\n`foo` #{dashes} bar" }
+      assert_equal 'foo', doc.name
+      assert_equal 'bar', doc.tagline
+    end
   end
+
+  test 'new with NAME heading without description' do
+    doc = Ronn::Document.new { "# whatever\n\n## NAME\n\n`foo`" }
+    assert_equal 'foo', doc.name
+    assert_equal nil, doc.tagline
+  end

   context 'simple conventionally named document' do