
Parsing gmane with factor

Marco Rimoldi edited this page Dec 14, 2015 · 34 revisions

This is an introductory tutorial to the Factor programming language. It starts off where Your first program ends and teaches you how to build a larger multi-vocabulary application in Factor. The tutorial is divided into four main parts, each of which guides you through creating one of the vocabularies that make up the program.

Why Factor?

Because learning a language that is unlike most mainstream languages is heaps of fun! Factor's syntax is not similar to that of Java, C#, Python, Haskell, Erlang, Lisp or any other popular language. Perhaps one could describe it as "lisp-like without thousands of parentheses." Its semantics are strongly and dynamically typed, meaning that it is easy to write generic code, but the runtime system prevents you from doing stupid things like adding a string to a number.

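Factor is stack-based like Forth, but that's where the similarities end. Factor is functional (although not purely functional) and high-level, whereas Forth is low-level and imperative. Factor's most important feature is being concatenative: a program is a sequence of words, and placing words next to each other composes them. J and APL offer a comparable tacit style.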

Thanks to its concatenative and postfix nature, it requires amazingly little code to express algorithms that would be much more verbose in other languages.

Postfix also means it lends itself very well to interactive experimentation. You can develop your code as a single line in the listener and then, when you're done, copy-paste it into your vocabulary and give it a name.
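As a minimal sketch of that workflow, here is how a one-liner such as 10 iota [ sq ] map might be promoted to a named word. The vocabulary name squares-demo and the word name squares are hypothetical, and iota is assumed to live in the sequences vocabulary and behave as it does in Factor 0.96.

```factor
! Hypothetical vocabulary and word names, for illustration only.
! Assumes Factor 0.96, where iota is found in the sequences vocabulary.
USING: math sequences ;
IN: squares-demo

! Step 1, in the listener: experiment until this works.
!   10 iota [ sq ] map
! Step 2, in the vocabulary: replace the literal 10 with an input
! taken from the stack and give the code a name and a stack effect.
: squares ( n -- seq ) iota [ sq ] map ;
```

With this definition loaded, typing 5 squares in the listener should leave the squares of 0 through 4 on the data stack.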

See also Why Concatenative Programming Matters

Why Not Factor?

Factor is more of an underdog than even Clojure. No O'Reilly Factor book exists yet, and your next salaried job is very unlikely to involve Factor.

Idiomatic Factor is written in point-free style, that is, without named variables. It is different and, initially, very difficult to do, much as it was hard to use functional constructs like map and reduce correctly before you got used to them. Expect a painful period that you have to get through.
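To give a feel for what point-free code looks like, here is a small sketch that computes the average of a sequence without naming any value. The vocabulary name point-free-demo and the word name average are hypothetical; bi, sum, length and / are standard words from the kernel, sequences and math vocabularies.

```factor
! Hypothetical vocabulary and word names, for illustration only.
USING: kernel math sequences ;
IN: point-free-demo

! Average of a sequence of numbers, with no named variables.
! Data stack after each step for the input { 1 2 3 6 }:
!   [ sum ] [ length ] bi   leaves 12 4
!   /                       leaves 3
: average ( seq -- x ) [ sum ] [ length ] bi / ;
```

Where another language would bind the sequence to a name and mention it twice, bi applies both quotations to the same value on the stack. Most of the initial difficulty lies in learning combinators like this one.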

The stable version of Factor is at 0.96, so there is not even a 1.0 release yet. It is therefore quite likely that code you write today will stop working in the future, because breaking changes may be introduced in Factor itself or in its standard library.

Typographical Conventions and Factor Specific Terms

Factor makes it easy to experiment interactively using its listener, which is the name of its interactive interpreter. In other languages this is known as the REPL. Snippets you are supposed to type into the listener look like this:

>> 10 iota [ sq ] map 

--- Data stack:
{ 0 1 4 9 16 25 36 49 64 81 }
IN: scratchpad 

>> represents Factor's listener prompt. Usually it is the string IN: scratchpad. The 10 iota [ sq ] map is the string to be entered, and the text after it is what the listener replies with. All other code examples are meant to be placed inside a vocabulary. A vocabulary is the same thing as a Factor module, which is just a source file with the .factor extension. I consider the terms vocabulary and module synonyms and will use them interchangeably, though "module" may be more familiar to programmers coming from other languages.

Likewise, I treat the terms word and function as synonyms.

About the Program

The program we will build is a simple scraper that scrapes the Gmane mailing list archive and stores the mails in a SQLite database. A user can then read the scraped emails from within Factor's listener.

Gmane already has a great web interface, so the program isn't very practical, but it demonstrates some important concepts with which one could write a scraper for any other site:

  • How to scrape HTML pages
  • How to query a database
  • How to format text

Finished Result

You can check out the finished program by cloning the repository below:

git clone https://github.com/bjourne/playground-factor

Then enter the playground-factor directory and run Factor like this:

$ cd playground-factor
$ FACTOR_ROOTS="." factor

In the listener window that pops up, issue the following commands:

IN: scratchpad USE: gmane.console
... stuff loads
IN: scratchpad init
Database created.
IN: scratchpad import
... stuff scrolls by

The database is now initialized with a bunch of mails from the comp.lang.factor.general newsgroup. Type recent to get a list of them:

IN: scratchpad recent

If everything works, you should get a table showing mails sent to the mailing list. Type 10 read-mail to read one of them:

IN: scratchpad 10 read-mail

Factor will ask you which read word you want to use; choose the one in the gmane.console vocabulary. You should see the same mail that Gmane has archived at http://article.gmane.org/gmane.comp.lang.factor.general/10.

See Missing sqlite3_open if Factor fails to load SQLite.

Subsections