Actual released version of faq
ranguard committed Jan 8, 2012
1 parent f5b9c93 commit 1280d90
Showing 2 changed files with 5 additions and 32 deletions.
29 changes: 1 addition & 28 deletions docs/learn/faq/perlfaq6.html
@@ -168,34 +168,7 @@ <h2 id="How-can-I-pull-out-lines-between-two-patterns-that-are-themselves-on-dif

<h2 id="How-do-I-match-XML-HTML-or-other-nasty-ugly-things-with-a-regex-">How do I match XML, HTML, or other nasty, ugly things with a regex? </h2>

<p>(contributed by brian d foy)</p>

<p>If you just want to get work done, use a module and forget about the regular expressions. The <a href="https://metacpan.org/module/XML::Parser">XML::Parser</a> and <a href="https://metacpan.org/module/HTML::Parser">HTML::Parser</a> modules are good starts, although each namespace has other parsing modules specialized for certain tasks and different ways of doing it. Start at CPAN Search ( <a href="http://search.cpan.org/">http://search.cpan.org/</a> ) and wonder at all the work people have done for you already! :)</p>

<p>The problem with things such as XML is that they have balanced text containing multiple levels of balanced text, but sometimes it isn&#39;t balanced text, as in an empty tag (<code>&lt;br/&gt;</code>, for instance). Even then, things can occur out-of-order. Just when you think you&#39;ve got a pattern that matches your input, someone throws you a curveball.</p>

<p>If you&#39;d like to do it the hard way, scratching and clawing your way toward a right answer but constantly being disappointed, besieged by bug reports, and weary from the inordinate amount of time you have to spend reinventing a triangular wheel, then there are several things you can try before you give up in frustration:</p>

<ul>

<li><p>Solve the balanced text problem from another question in <a href="perlfaq6.html">perlfaq6</a></p>

</li>
<li><p>Try the recursive regex features in Perl 5.10 and later. See <a href="https://metacpan.org/module/perlre">perlre</a></p>

</li>
<li><p>Try defining a grammar using Perl 5.10&#39;s <code>(?DEFINE)</code> feature.</p>

</li>
<li><p>Break the problem down into sub-problems instead of trying to use a single regex</p>

</li>
<li><p>Convince everyone not to use XML or HTML in the first place</p>

</li>
</ul>

<p>Good luck!</p>
<p>Do not use regexes. Use a module and forget about the regular expressions. The <a href="https://metacpan.org/module/XML::LibXML">XML::LibXML</a>, <a href="https://metacpan.org/module/HTML::TokeParser">HTML::TokeParser</a> and <a href="https://metacpan.org/module/HTML::TreeBuilder">HTML::TreeBuilder</a> modules are good starts, although each namespace has other parsing modules specialized for certain tasks and different ways of doing it. Start at CPAN Search ( <a href="http://metacpan.org/">http://metacpan.org/</a> ) and wonder at all the work people have done for you already! :)</p>

<h2 id="I-put-a-regular-expression-into-but-it-didnt-work.-Whats-wrong-">I put a regular expression into $/ but it didn&#39;t work. What&#39;s wrong? </h2>

8 changes: 4 additions & 4 deletions docs/learn/faq/perlfaq9.html
@@ -53,7 +53,7 @@ <h2 id="Should-I-use-a-web-framework-">Should I use a web framework?</h2>

<p>Yes. If you are building a web site with any level of interactivity (forms / users / databases), you will want to use a framework to make handling requests and responses easier.</p>

<p>If there is no interactivity then you may still want to look at using something like <a href="https://metacpan.org/module/Template">Template Toolkit</a> or <a href="https://metacpan.org/module/Plack::Middleware::TemplateToolkit">Plack::Middleware::TemplateToolkit</a> so maintenence of your HTML files (and other assets) is easier.</p>
<p>If there is no interactivity then you may still want to look at using something like <a href="https://metacpan.org/module/Template">Template Toolkit</a> or <a href="https://metacpan.org/module/Plack::Middleware::TemplateToolkit">Plack::Middleware::TemplateToolkit</a> so maintenance of your HTML files (and other assets) is easier.</p>

<h2 id="Which-web-framework-should-I-use-">Which web framework should I use? </h2>

@@ -75,7 +75,7 @@ <h2 id="Which-web-framework-should-I-use-">Which web framework should I use?
</dd>
</dl>

<p>These are just a few of the more widley used ones. All of them interact with or use <a href="https://metacpan.org/module/Plack">Plack</a> which is worth understanding the basics of, as there is a lot of useful <a href="https://metacpan.org/search?q=plack%3A%3Amiddleware">Plack::Middleware</a>.</p>
<p>These are just a few of the more widely used ones. All of them interact with or use <a href="https://metacpan.org/module/Plack">Plack</a> which is worth understanding the basics of, as there is a lot of useful <a href="https://metacpan.org/search?q=plack%3A%3Amiddleware">Plack::Middleware</a>.</p>

<p>As to which one you should use, it depends on the complexity of the site you are trying to build. Review the three listed above and see which suits your needs best. Many of them share the same <a href="http://en.wikipedia.org/wiki/Model-view-controller">MVC</a> concepts so once you understand one it is easier to understand the others.</p>

@@ -89,11 +89,11 @@ <h2 id="What-is-Plack-and-PSGI-">What is Plack and PSGI?</h2>

<h2 id="How-do-I-remove-HTML-from-a-string-">How do I remove HTML from a string?</h2>

<p>The best way it to use <a href="https://metacpan.org/module/HTML::Parser">HTML::Parser</a>from CPAN. You could also use <a href="https://metacpan.org/module/HTML::FormatText">HTML::FormatText</a> which not only removes HTML but also attempts to do a little simple formatting of the resulting plain text.</p>
<p>Use <a href="https://metacpan.org/module/HTML::Strip">HTML::Strip</a>, or <a href="https://metacpan.org/module/HTML::FormatText">HTML::FormatText</a> which not only removes HTML but also attempts to do a little simple formatting of the resulting plain text.</p>

<h2 id="How-do-I-extract-URLs-">How do I extract URLs?</h2>

<p><a href="https://metacpan.org/module/HTML::SimpleLinkExtor">HTML::SimpleLinkExtor</a> will extract URLS from HTML, it handles anchors, images, objects, frames, and many other tags that can contain a URL. If you need anything more complex, you can create your own subclass of <a href="https://metacpan.org/module/HTML::LinkExtor">HTML::LinkExtor</a> or <a href="https://metacpan.org/module/HTML::Parser">HTML::Parser</a>. You might even use <a href="https://metacpan.org/module/HTML::SimpleLinkExtor">HTML::SimpleLinkExtor</a> as an example for something specifically suited to your needs.</p>
<p><a href="https://metacpan.org/module/HTML::SimpleLinkExtor">HTML::SimpleLinkExtor</a> will extract URLs from HTML, it handles anchors, images, objects, frames, and many other tags that can contain a URL. If you need anything more complex, you can create your own subclass of <a href="https://metacpan.org/module/HTML::LinkExtor">HTML::LinkExtor</a> or <a href="https://metacpan.org/module/HTML::Parser">HTML::Parser</a>. You might even use <a href="https://metacpan.org/module/HTML::SimpleLinkExtor">HTML::SimpleLinkExtor</a> as an example for something specifically suited to your needs.</p>

<p>You can use <a href="https://metacpan.org/module/URI::Find">URI::Find</a> to extract URLs from an arbitrary text document.</p>
