Description: Ruby on Rails
Homepage: http://rubyonrails.org
Clone URL: git://github.com/rails/rails.git
Added TextHelper#strip_tags for removing HTML tags from a string (using 
HTMLTokenizer) (closes #2229) [marcin@junkheap.net]

git-svn-id: http://svn-commit.rubyonrails.org/rails/trunk@2750 5ecf4fe2-1ee6-0310-87b1-e25e094e27de
dhh (author)
Wed Oct 26 06:26:04 -0700 2005
commit  4e9bc0f02ddd7d90740e289e565d7f4ebd6e2c1d
tree    b113d96d79d9dc8cf962833e976b9543073a8acc
parent  82f1e19e4c493920e692309d15f66677ac8063e5
@@ -1,5 +1,7 @@
 *SVN*
 
+* Added TextHelper#strip_tags for removing HTML tags from a string (using HTMLTokenizer) #2229 [marcin@junkheap.net]
+
 * Added a reader for flash.now, so it's possible to do stuff like flash.now[:alert] ||= 'New if not set' #2422 [Caio Chassot]
 
 
@@ -202,6 +202,28 @@
         html
       end
       
+      # Strips all HTML tags from the input, including comments. This uses the html-scanner
+      # tokenizer and so its HTML parsing ability is limited by that of html-scanner.
+      #
+      # Returns the tag free text.
+      def strip_tags(html)
+        if html.index("<")
+          text = ""
+          tokenizer = HTML::Tokenizer.new(html)
+
+          while token = tokenizer.next
+            node = HTML::Node.parse(nil, 0, 0, token, false)
+            # result is only the content of any Text nodes
+            text << node.to_s if node.class == HTML::Text
+          end
+          # strip any comments, and if they have a newline at the end (ie. line with
+          # only a comment) strip that too
+          text.gsub(/<!--(.*?)-->[\n]?/m, "")
+        else
+          html # already plain text
+        end
+      end
+
       # Returns a Cycle object whose to_s value cycles through items of an
       # array every time it is called. This can be used to alternate classes
       # for table rows:
@@ -268,5 +268,14 @@
     assert_equal(%w{Specialized Fuji Giant}, @cycles)
   end
 
+  def test_strip_tags
+    assert_equal("This is a test.", strip_tags("<p>This <u>is<u> a <a href='test.html'><strong>test</strong></a>.</p>"))
+    assert_equal("This is a test.", strip_tags("This is a test."))
+    assert_equal(
+      %{This is a test.\n\n\nIt no longer contains any HTML.\n}, strip_tags(
+      %{<title>This is <b>a <a href="" target="_blank">test</a></b>.</title>\n\n<!-- it has a comment -->\n\n<p>It no <b>longer <strong>contains <em>any <strike>HTML</strike></em>.</strong></b></p>\n}))
+    assert_equal("This has a  here.", strip_tags("This has a <!-- comment --> here."))
+  end
+
 end
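
For readers who want to try the approach outside of Rails, here is a minimal standalone sketch of the same idea: split the input into tokens, keep only the text tokens, and drop comments along with one trailing newline. It does not use html-scanner. The names `tokenize_html` and `strip_tags_sketch` are hypothetical and not part of this commit, and the `StringScanner`-based tokenizer is an assumed stand-in for `HTML::Tokenizer`, so its handling of malformed markup may differ from the real helper.

```ruby
require "strscan"

# Hypothetical stand-in for HTML::Tokenizer (not the html-scanner API).
# Returns an array of [type, text] pairs, where type is :comment, :tag or :text.
def tokenize_html(html)
  scanner = StringScanner.new(html)
  tokens = []
  until scanner.eos?
    if (comment = scanner.scan(/<!--.*?-->\n?/m))
      # A comment, plus one trailing newline if present
      tokens << [:comment, comment]
    elsif (tag = scanner.scan(/<[^>]*>/))
      tokens << [:tag, tag]
    elsif (text = scanner.scan(/[^<]+/))
      tokens << [:text, text]
    else
      # A stray "<" with no closing ">" is kept as literal text
      tokens << [:text, scanner.getch]
    end
  end
  tokens
end

# Hypothetical counterpart to strip_tags: keeps only the text tokens.
def strip_tags_sketch(html)
  return html unless html.include?("<") # already plain text
  tokenize_html(html).select { |type, _| type == :text }.map { |_, text| text }.join
end

puts strip_tags_sketch("<p>This <u>is<u> a <a href='test.html'><strong>test</strong></a>.</p>")
```

Like the helper in this commit, the sketch returns input without a `<` unchanged. Its tag pattern is deliberately naive: a `>` inside a quoted attribute value, or text such as `a < b > c`, will be misread as tag boundaries.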
