how_do_i.xml

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">
<chapter>
  <header>
    <title>How do I...</title>
    <prepared>Matthias Lang</prepared>
    <docno></docno>
    <date>2007-09-12</date>
    <rev>1.0</rev>
  </header>


  <p>
	This section is intended for the impatient. Most of these
	questions would resolved by working through the (on-line)
	Erlang manuals, but sometimes we just want a quick answer...
	</p><p>

	Keep in mind that the program fragments are intended to 
	illustrate an idea, not serve as re-useable, robust modules!

</p>

  <section><title>...compare numbers?</title>
    <p>

	The operators for comparing numbers are <c> &gt;, &gt;=, &lt;, 
	=&lt;, == and =/= </c>. Some of these are a little different
	to C for historical reasons. Here are some examples:
	</p>

	<pre>
	Eshell V4.9.1  (abort with ^G)
	1> 13 > 2.
	true
	2> 18.2 >= 19. 
	false
	3> 3 == 3.
	true
	4> 4 =/= 4.
	false
	5> 3 = 4.
	** exited: {{badmatch,4},[{erl_eval,expr,3}]} **
	</pre>

    <p>
	The last example is a (failed) pattern match rather than a
	comparison.
    </p>

  </section>
  <section><title>...represent a text-string?</title>

    <p>
    As a list of characters. You can write
    </p>

	<pre>A = "hello world".</pre>

    <p>	which is exactly the same as writing</p>

	<pre>A = [104,101,108,108,111,32,119,111,114,108,100].
	</pre>

    <p>and also the same as writing</p>

	<pre>A = [$h,$e,$l,$l,$o,$ ,$w,$o,$r,$l,$d].</pre>

    <p>
	Each character consumes 8 bytes of memory on a 32 bit
        machine (a 32 bit integer and a 32 bit pointer) and twice
        as much on 64 bit machines. Access to the nth. element takes
        O(n) time. Access to the first element takes O(1) time, as
        does prepending a character.
	</p><p>

	The bit-syntax, which is described in <url 
	href="http://www.erlang.org/doc/reference_manual/expressions.html#6.16">
	the manual</url> provides an alternative, but somewhat
	limited, way to represent strings compactly, for instance:
</p>

    <pre><![CDATA[
	1> A = <<"this string consumes one byte per character">>.
	<<116,104,105,115,32,115,116,114,...>>
	2> size(A).
	43
	3> <<_:8/binary,Q:3/binary,R/binary>> = A.
	<<116,104,105,115,32,115,116,114,...>>
	4> Q.
	<<105,110,103>>
	5> binary_to_list(Q).
	"ing"
	]]></pre>
    <p>
	This style of pattern-matching manipulation of strings 
	provides O(1) equivalents of some string operations which are
	O(N) when using the list representation. Some other operations
	which are O(1) in the list representation become O(N) when
	using the bit-syntax.
	</p><p>

	There are general ways to
	<seealso marker="academic#string-performance">improve 
        string performance.</seealso>

</p></section>
  <section><title>...convert a string to lower case?</title>

    <pre>
        1> string:to_lower("Five Boxing WIZARDS").
	"five boxing wizards"
	2> string:to_upper("jump QuickLY").                           
	"JUMP QUICKLY"
        </pre>

    <p>
	If you have an older version of Erlang (prior to R11B-5), you can
        find these two functions in the <c>httpd_util</c> module.
        </p>
  </section>

  <section>
    <marker id="howto-string-to-term"/>
    <title>...convert a text string representation of a term to a term?</title>

    <pre>
        1> {ok,Tokens,_} = erl_scan:string("[{foo,bar},x,3].").
        {ok,[{'[',1}, {'{',1}, {atom,1,foo}, {',',1}, {atom,1,bar},...
        2> {ok,Term} = erl_parse:parse_term(Tokens).
        {ok,[{foo,bar},x,3]}
        </pre>

    <p>
        See also <c>file:consult/1</c>
    </p></section>
  
  <section><title>...use unicode/UTF-8?</title>
    <p>

	Representing strings as lists of integers also has advantages,
	one of which is that unicode (and any other character encodings)
	is automatically supported in the sense that you can pass around
	lists of unicode characters.
	</p><p>

	In practice, supporting unicode means a bit more, for instance
	being able to read unicode from a file and being able to sort 
	unicode	strings.
	</p><p>

	There is some undocumented code in the OTP XMERL application
        (the xmerl_ucs module) to help with some of that.
	</p><p>

	There's also <url href="http://12monkeys.co.uk/starling/">alpha
        code</url> to help with basic operations over
	unicode strings: comparison, concatention, substring extraction,
	substring search and case conversion.

</p></section>
  <section><title>...destructively update data, like in C</title>
    <p>
        Not being able to do this is considered a feature of Erlang. The
        <url href="http://www.erlang.org/download/erlang-book-part1.pdf">
        Erlang book</url> explains this in chapter 1.

<!-- REVISIT: I should read Joe's book and find a relevant passage there instead -->
</p><p>
        Having said that, there are common methods to achieve 
        effects similar to destructive update: store the data
        in a process (use messages to manipulate it), store the data
        in ETS or use a data structure designed
        for relatively cheap updates (the <c>dict</c> library 
        module is one common solution).

</p></section>
  <section>
    <marker id="noshell"/>
    <title>...run an Erlang program directly from the unix shell?</title>
    <p>
	As of Erlang/OTP R11B-4, you can run Erlang code without 
        compiling by using Escript. The <url href="http://www.erlang.org/doc/man/escript.html">manual page</url> has a
        complete example showing how to do it.
 
	</p><p>
	Escript is intended for short, "script-like" programs. A more
        general way to do it, which also works for older Erlang releases,
        is to start the Erlang VM without a shell and pass more switches
	to the Erlang virtual machine. Here's hello world again:
	</p>

	<code>
	-module(hello).
	-export([hello_world/0]).
	
	hello_world() -> 
		io:fwrite("hello, world\\n").
	</code>

        <p>
	Save this as <c>hello.erl</c>, compile it and 
	run it directly	from the unix (or msdos) command line:
	</p>

	<pre>
	matthias >erl -compile hello
	matthias >erl -noshell -s hello hello_world -s init stop
	hello, world
	</pre>
	
  </section>
  <section><title>...communicate with non-Erlang programs?</title>
    <p>

	Erlang has several mechanisms for communicating with programs
	written in other languages, each with different tradeoffs. The
	Erlang documentation includes a 
	<url href=
	"http://www.erlang.org/doc/tutorial/part_frame.html">
	guide to interfacing with other languages</url> which describes
	the most common mechanisms:
</p>

    <list>
      <item><p>Distributed Erlang</p></item>
      <item><p>Ports and linked-in drivers</p></item>
      <item><p>EI (C interface to Erlang, replaces the 
                    mostly deprecated erl_interface)</p></item>
      <item><p>Jinterface (Java interface to Erlang)</p></item>
      <item><p>IC (IDL compiler, used with C or Java)</p></item>
      <item><p>General TCP or UDP protocols, including ASN.1</p></item>
    </list>

    <p>
	Some of the above methods are easier to use than others. Loosely
	coupled approaches, such as using a custom protocol over a TCP socket,
	are easier to debug than the more tightly coupled options such as
	linked-in drivers. The Erlang-questions mailing list archives 
	contain many desperate cries for help with memory corruption problems 
	caused by using Erl Interface and linked-in drivers incorrectly...
	</p><p>
	There are a number of user-supported methods and tools, including
</p>
    <list>
      <item><p><url href="http://epi.sourceforge.net/">A set of C++ classes</url></p></item>
      <item><p><url href="http://www.lysator.liu.se/~tab/erlang/py_interface/">A Python implementation of an Erlang hidden node</url></p></item>
      <item><p><url href="http://www.snookles.com/erlang/edtk/">
	EDTK, a port-driver toolkit,</url> roughly equivalent
        to Java's SWIG.</p></item>
      <item><p><url href="http://dryverl.objectweb.org/">Dryverl,
        an Erlang-to-C binding</url>.</p></item>
    </list>

  </section>
  <section><title>...write a unix pipe program in Erlang?</title>
    <p>

	Lots of useful utilities in unix can be chained together 
	by piping one program's output to another program's input.
	Here's an example which rot-13 encodes standard input
	and sends it to standard output:

	</p>
    <code><![CDATA[
	-module(rot13).
	-export([rot13/0]).

	rot13() ->
	  case io:get_chars('', 8192) of
	    eof -> init:stop();
	  Text ->
	    Rotated = [rot13(C) || C &lt;- Text],
	    io:put_chars(Rotated),
	    rot13()
	end.
	
	rot13(C) when C >= $a, C =&lt; $z -> $a + (C - $a + 13) rem 26;
	rot13(C) when C >= $A, C =&lt; $Z -> $A + (C - $A + 13) rem 26;
	rot13(C) -> C.
        ]]></code>

    <p>
	After compiling this, you can run it from the command line:

	</p>
    <pre>
	matthias> cat ~/.bashrc | erl -noshell -s rot13 rot13 | wc
	</pre>
    
  </section>
  <section><title>...communicate with a DBMS?</title>
    <p>

	There are two completely different ways to communicate with
	a database. One way is for Erlang to act as a database for
	another system. The other way is for an Erlang program to
	access a database. The former is a "research" topic. The
	latter is easily accomplished by going via ODBC, which allows
	you to access almost any commercial DBMS. The 
	<url href=
	"http://www.erlang.org/doc/apps/odbc/index.html">
	OTP ODBC manual</url>	explains how to do this.

</p></section>
  <section><title>...decode binary Erlang terms</title>
    <p>

	Erlang terms can be converted to and from binary representations
	using bifs:

	</p>
    <code><![CDATA[
	1> term_to_binary({3, abc, "def"}).
	<<131,104,3,97,3,100,0,3,97,98,99,107,0,3,100,101,102>>
	2> binary_to_term(v(1)).
	{3,abc,"def"}
        ]]></code>

    <p>

	If you want to decode them using C programs, take a look
	at <url href="http://erlang.org/doc/man/ei.html">EI</url>.

</p></section>
  <section><title>...decode Erlang crash dumps</title>
    <p>

	Erlang crash dumps provide information about the state of the
	system when the emulator crashed. The 
	<url href="http://erlang.org/doc/apps/erts/crash_dump.html">manual explains</url> how to interpret them.


</p></section>
  <section><title>...estimate performance of an Erlang system?</title>
    <p>

	Mike Williams, one of Erlang's original developers, is fond 
	of saying
</p> <quote><p>"If you don't run experiments before
	you start designing a <em>new</em> system, your 
	<em>entire system</em> will be an experiment!"</p></quote>
    <p>

	This philosophy is widespread around Erlang projects, in part
	because the Erlang development environment encourages development
	by prototyping. Such prototyping will also allow sensible 
	performance estimates to be made.
	</p><p>

	For those of you who want to leverage experience with C
	and C++, some rough rules of thumb are:
	</p>
    
    <list>
      <item><p>Code which involves mainly number crunching and
	data processing will run about 10 times slower than
	an equivalent C program. This includes almost all 
	"micro benchmarks"</p></item>
      
      <item><p>Large systems which spent most of their time 
	communicating with other systems, recovering from faults and 
	making complex decisions run at least as fast as equivalent 
	C programs.</p></item>
    </list>

    <p>
	Like in any other language or system, experienced developers
	develop a sense of which operations are expensive and which 
	are cheap. Erlang newcomers accustomed to the relatively slow
	interprocess communication facilities in other languages 
	tend to over-estimate the cost of creating Erlang processes and
	passing messages between them.

</p></section>
  <section><title>...measure performance of an Erlang system?</title>
    <p>

	The <em>timer</em> module measures the wall clock time
	elapsed during execution of a function:

	</p>
    <code>
	7> timer:tc(lists, reverse, ["hello world"]).                        
	{27,"dlrow olleh"}
	8> timer:tc(lists, reverse, ["hello world this is a longer string"]).
	{34,"gnirts regnol a si siht dlrow olleh"}
	</code>
    
    <p>

	The <c>eperf</c> library provides a way to profile
	a system.

</p></section>
  <section><title>...measure memory consumption in an Erlang system?</title>
    <p>

	Memory consumption is a bit of a tricky issue in Erlang. Usually,
	you don't need to worry about it because the garbage collector
	looks after memory management for you. But, when things go 
	wrong, there are several sources of information. Starting 
	from the most general:
</p><p>
	Some operating systems provide detailed information
	about process memory use with tools like top, 
	ps or the linux /proc filesystem:
</p>

    <pre>
	cat /proc/5898/status

	VmSize:     7660 kB
	VmLck:         0 kB
	VmRSS:      5408 kB
	VmData:     4204 kB
	VmStk:        20 kB
	VmExe:       576 kB
	VmLib:      2032 kB
	</pre>

    <p>
	This gives you a rock-solid upper-bound on the amount of
	memory the entire Erlang system is using.
</p><p>
	<c>erlang:system_info</c> reports 
	interesting things about some globally allocated structures in bytes:
</p>

    <pre>
        3> erlang:system_info(allocated_areas).  
	[{static,390265},
	 {atom_space,65544,49097},
	 {binary,13866},
	 {atom_table,30885},
	 {module_table,944},
	 {export_table,16064},
	 {register_table,240},
	 {loaded_code,1456353},
	 {process_desc,16560,15732},
	 {table_desc,1120,1008},
	 {link_desc,6480,5688},
	 {atom_desc,107520,107064},
	 {export_desc,95200,95080},
	 {module_desc,4800,4520},
	 {preg_desc,640,608},
	 {mesg_desc,960,0},
	 {plist_desc,0,0},
	 {fixed_deletion_desc,0,0}]
	</pre>

    <p>
	Information about individual processes can be obtained
	from <c>erlang:process_info/1</c> or 
	<c>erlang:process_info/2</c>:
</p>
    <pre>
	2> erlang:process_info(self(), memory).
	{memory,1244}
	</pre>
    <p>
	The shell's <c>i()</c> and the <c>pman
	</c> tool also give useful overview information.
</p><p>
	Don't expect the sum of the results from <c>process_info
	</c> and <c>system_info</c> to add up to
	the total memory use reported by the operating system. The Erlang
	runtime also uses memory for other things.
</p><p>	
	A typical approach when you suspect you have memory problems
	is
</p><p>
	1. Confirm that there really is a memory problem by checking
	that memory use as reported by the operating system is unexpectedly
	high.
</p><p>
	2. Use <c>pman</c> or the shell's <c>
	i()</c> command to make sure there isn't an out-of-control
	erlang process on the system. Out-of-control processes often have
	enormous message queues. A common reason for Erlang processes to
	get unexpectedly large is an endlessly looping function which isn't
	tail recursive. 
</p><p>
	3. Check the amount of memory used for binaries (reported
	by <c>system_info</c>). Normal data in Erlang is
	put on the process heap, which is garbage collected. Large binaries,
	on the other hand, are reference counted. This has two interesting
	consequences. Firstly, binaries don't count towards a process'
	memory use. Secondly, a lot of memory can be allocated in binaries
	without causing a process' heap to grow much. If the heap doesn't
	grow, it's likely that there won't be a garbage collection, which
	may cause binaries to hang around longer than expected. A
	strategically-placed call to <c>erlang:garbage_collect()
	</c> will help.
</p><p>
	4. If all of the above have failed to find the problem,
	start the Erlang runtime system with the -instr switch.

</p></section>
  <section><title>...estimate productivity in an Erlang project?</title>
    <p>

	A rough rule of thumb is that about the same number of lines
	of code are produced per developer as in a C project.
	A reasonably complex problem involving distribution and
	fault tolerance will be roughly five times shorter in 
	<em>Erlang</em> than in C.
	</p><p>
	
	The traditional ways of slowing down projects, like adding
	armies of consultants halfway through, 
	spending a year writing detailed design specifications
	before any code is written, rigidly following a waterfall
	model, spreading development across several countries and
	holding team meetings to decide on the colour of the serviettes used
	at lunch work just as well for Erlang as for other languages.

</p></section>
  <section><title>...use multiple CPUs (or cores) in one server?</title>
    <p>
        As of OTP R12B, Erlang has SMP support on all the major
        platforms, and it's enabled by default. Some platforms
        (Solaris, Linux, MacOSX on PPC) had SMP support in earlier
        versions.
</p><p>
        Given that Erlang programs are naturally written in a concurrent 
        manner, most programs require no modification at all to take
        advantage of SMP.
</p><p>
	A more loosely coupled approach is to start multiple Erlang nodes
        on the same server.

</p></section>
  <section><title>...run distributed Erlang through a firewall?</title>
    <p>
	The simplest approach is to make an a-priori restriction to the
        TCP ports distributed Erlang uses to communicate through by
	setting the (undocumented) kernel variables 'inet_dist_listen_min' 
	and 'inet_dist_listen_max'. Example:
</p>

    <pre>
	application:set_env(kernel, inet_dist_listen_min, 9100).
	application:set_env(kernel, inet_dist_listen_max, 9105).
	</pre>

    <p>
	This forces Erlang to use only ports 9100--9105 for distributed
	Erlang traffic.

        In the above example, you would then need to configure your
        firewall to pass ports 9100--9105 as well as port 4369 (for
        the erlang port mapper).
</p><p>
	There are other approaches, such as tunnelling the information
	through SSH or writing your own distribution handler.

</p></section>
  <section>
    <marker id="crash-line-number"/>
    <title>...find out which line my Erlang program crashed at?</title>
    <p>
        The run-time system's error reports tell you which function
        crashed, but not the line number. Consider this module:
</p>

    <pre>
        -module(crash).
        -export([f/0]).

        f() ->
          g().

        g() ->
          A = 4,
          b = 5,  % Error is on this line.
          A.
        </pre>

    <pre>
        3> crash:f().
        ** exited: {{badmatch,5},[{crash,g,0},{shell,exprs,6},{shell,eval_loop,3}]} **
        </pre>

    <p>
        The crash message tells us that the crash was in the function
        <c>crash:g/1</c>, and that it was a badmatch. For
        programs with short functions, this is enough information for
        an experienced programmer to find the problem.
</p><p>
        In cases where you want more information, there are two options.
        The first is to use the <url href="http://www.erlang.org/doc/apps/debugger/">erlang debugger</url> to single-step
        the program.
</p><p>
        Another option is to compile your code using the smart 
        exceptions package, available from 
        <url href='http://jungerl.cvs.sourceforge.net/jungerl/jungerl/lib/smart_exceptions/src/'>the jungerl</url>. Smart exceptions then provide line
	number information:
</p>

    <pre>
        4> c(crash, [{parse_transform, smart_exceptions}]).    
        {ok,crash}
        5> crash:f().
        ** exited: {{crash,g,0},{line,9},match,[5]} **
        </pre>

  </section>
  <section>
    <marker id="erlrt-and-sae"/>
    <title>...distribute the Erlang programs I write to my friends/colleagues/users?</title>

    <p>
        Erlang programs only run on the Erlang VM, so every machine which
        is going to run an Erlang program needs to have a copy of the
        Erlang runtime installed.
</p><p>
        Installing the entire Erlang system from erlang.org (or, perhaps,
        indirectly via a packaging system such as Debian's or BSD's) is
        the simplest option in many cases.
</p><p>
        A more modular alternative is to install the 
	<url href="http://cean.process-one.net/">CEAN</url>
        runtime.
</p><p>
        A further alternative which has fallen into disuse is SAE: 
        stand-alone Erlang. SAE allows an Erlang program to be distributed
        as just two files, totalling about 500k. SAE no longer works
        on current Erlang releases, but for historical interest, there is
	<url href="http://www.sics.se/~joe/sae.html">
	a page about SAE</url> which shows you how to build a self-contained
	system for the (somewhat outdated) R9B Erlang release. 

</p></section>
  <section>
    <marker id="stderr"/>
    <title>...write to standard error (stderr) on a unix system?</title>

    <p>The simplest approach is to just open file descriptor two:</p>

    <code>
            P = open_port({fd,0,2}, [out]),
            port_command(P, "this text goes to stderr\\n").
    </code>
  </section>
</chapter>