parisman edited this page Dec 1, 2010 · 4 revisions

Element Parser
A simple way to parse and use XML and HTML data in your Cocoa applications.

It doesn’t do everything. It aspires to do “just enough”.

Accessing and manipulating HTML and XML in Cocoa can be incredibly frustrating. There are two existing choices (NSXMLParser and libxml2), but neither copes well with HTML or with “real-world” XML documents, which are often not “perfect”. Their interfaces put all the work on you to map between the document and your program’s domain objects, forcing you to write code that is hard to write and maintain. Somehow, something that starts out looking straightforward ends up becoming a science project, or worse.
ElementParser is a lightweight framework that provides easy access to XML and HTML content. Rather than getting lost in the complexities of the HTML and XML specifications, it tries not to obscure their essential simplicity.

Let’s begin with some examples.

DocumentRoot* document = [Element parseHTML: source];

The returned document is a special element that holds the top-level element(s) of your document (such as <html>). You now have a tree of Element objects that you can walk using methods like firstChild, nextSybling and parent. You can also access the data each element contains with methods like tagName, attributes, contentsText and contentsOfChildren. That’s a nice start, and sometimes it’s enough. But suppose you don’t want to walk the tree to find the data you need. How about:

Element* linkElement = [document selectElement: @"div.nextLink a"];

Here we’re using a CSS-style selector to locate and return a matching element: an <a> element nested anywhere inside a <div> with the class nextLink. Now we can parse a document and conveniently find the elements we care about. (Yes, there is a corresponding selectElements: method that returns all matches.)
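As a rough sketch of selectElements:, the loop below logs the text and href of every link matched by the same selector. It assumes selectElements: returns an NSArray of Element objects and that attributes returns an NSDictionary keyed by attribute name; neither return type is spelled out on this page, so treat this as illustrative rather than the framework’s documented usage.

```objc
// Sketch: list every link matched by the selector used above.
// Assumes selectElements: returns an NSArray of Element* and that
// attributes returns an NSDictionary keyed by attribute name.
NSArray* links = [document selectElements: @"div.nextLink a"];
for (Element* link in links) {
  // Look up the href attribute and print it next to the link text
  NSString* href = [[link attributes] objectForKey: @"href"];
  NSLog(@"%@ -> %@", [link contentsText], href);
}
```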

Next, let’s bind your world of domain objects more closely to the world of elements. To do this, we’ll use the ElementParser class directly to register callbacks into your code that fire when a matching element is found (and its contents parsed).

ElementParser* parser = [[ElementParser alloc] initWithCallbacksDelegate: self];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
documentRoot = [parser parseXML: source];

Your code could then look like this:

-(FeedItem*) gotFeedElement:(Element*)element{
  // Map the matched element's children onto a domain object
  FeedItem* feedItem = [[[FeedItem alloc] init] autorelease];
  feedItem.title = [[element selectElement: @"title"] contentsText];
  feedItem.description = [[element selectElement: @"description"] contentsText];
  feedItem.enclosure = [[element selectElement: @"enclosure"] contentsText];
  [element setDomainObject: feedItem]; // optional: attach the domain object to the element
  return feedItem;
}

Finally, HTML and XML documents often live on the web. Wouldn’t it be nice if we could use the pattern above to process a document incrementally, handling elements as soon as they arrive rather than waiting for the whole download?

How about:

URLParser* parser = [[URLParser alloc] initWithCallbackDelegate: self];
[parser performSelector:@selector(gotChanElement:) forElementsMatching: @"channel"];
[parser performSelector:@selector(gotFeedElement:) forElementsMatching: @"feed"];
[parser parseURL: myURL];
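The snippet above registers gotChanElement: without showing it. A minimal sketch for an RSS <channel> element might look like the following, assuming it takes the same shape as gotFeedElement: (it receives the matched Element and returns an object) and that selectElements: returns an NSArray of Element objects. The method body and log message are illustrative only.

```objc
// Hypothetical callback for the "channel" selector registered above.
// Assumes the same shape as gotFeedElement: (receives the matched Element*,
// returns an object) and that selectElements: returns an NSArray of Element*.
-(id) gotChanElement:(Element*)element{
  // Callbacks fire once the element's contents are parsed, so its items are available
  NSString* channelTitle = [[element selectElement: @"title"] contentsText];
  NSArray* items = [element selectElements: @"item"];
  NSLog(@"Channel \"%@\" contains %lu items", channelTitle, (unsigned long)[items count]);
  return nil; // nothing to attach as a domain object in this sketch
}
```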

There is a lot more available under the covers, but this may be all you need. Hopefully it’s just enough. We’d love your feedback at feedback@touchtankapps.com.

Terms of Use

The ElementParser framework (and its source code) is free of charge for non-commercial use under a GPL license. For commercial use, the license fee is $100 per product. (That’s a couple of hours of your time, right?) Support plans are also available. Please contact sales@touchtankapps.com.