In [1]:
string = "hello"

"hello"

In [5]:
is_binary(string)

true

Use `?` in front of a character literal to reveal its code point.

In [3]:
?a

97

In [4]:
?ł

322

In [6]:
"\u0061" === "a"

true

In [7]:
0x0061 = 97 = ?a

97

In [8]:
string = "hełło"

"hełło"

In [9]:
String.length(string)

5

In [10]:
byte_size(string)

7

In [11]:
"hello" <> <<0>>

<<104, 101, 108, 108, 111, 0>>

You can view a string's binary representation by passing the `binaries: :as_binaries` option to `IO.inspect/2`:

In [12]:
IO.inspect("hełło", binaries: :as_binaries)

<<104, 101, 197, 130, 197, 130, 111>>


"hełło"

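To connect this byte view with `String.length/1` and `byte_size/1`, here is a small sketch that lists how many bytes each character of the example string occupies. The variable names `bytes_per_char` and `total` are only illustrative.

```elixir
string = "hełło"

# Pair each grapheme with the number of bytes UTF-8 uses to encode it.
bytes_per_char =
  string
  |> String.graphemes()
  |> Enum.map(fn g -> {g, byte_size(g)} end)

IO.inspect(bytes_per_char)
# [{"h", 1}, {"e", 1}, {"ł", 2}, {"ł", 2}, {"o", 1}]

# The per-character sizes add up to the total byte size of the string.
total = bytes_per_char |> Enum.map(&elem(&1, 1)) |> Enum.sum()
IO.inspect(total == byte_size(string))
# true
```
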
A bitstring is a fundamental data type in Elixir, denoted with the `<<>>` syntax. A bitstring is a contiguous sequence of bits in memory.

By default, 8 bits (1 byte) are used to store each number in a bitstring, but you can specify the number of bits manually with the `::n` modifier, which sets the size to `n` bits, or with the more verbose declaration `::size(n)`.

In [13]:
<<42>> === <<42::8>>

true

In [14]:
<<3::4>>

<<3::size(4)>>

In [15]:
<<0::1, 0::1, 1::1, 1::1>> === <<3::4>>

true
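
As a minimal sketch of how size modifiers can be put to use, the following packs two 4-bit values into one byte and reads them back with a pattern of the same shape. The names `high`, `low`, and `packed` are made up for this illustration.

```elixir
# Pack two 4-bit values (a high and a low nibble) into a single byte.
high = 0xA
low = 0x5
packed = <<high::4, low::4>>

IO.inspect(packed)            # <<165>>
IO.inspect(bit_size(packed))  # 8

# Matching with the same sizes recovers the two fields.
<<h::4, l::4>> = packed
IO.inspect({h, l})            # {10, 5}

# A bitstring whose size is not a multiple of 8 bits.
IO.inspect(bit_size(<<3::4>>))  # 4
```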

Any value that exceeds what can be stored by the number of bits provisioned is truncated:

In [16]:
<<1>> === <<257>>

true
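
Truncation keeps only the lowest bits, so with the default size of 8 bits a non-negative integer is effectively stored modulo 256. A short sketch of that, which also shows that asking for 16 bits avoids the loss:

```elixir
# Only the lowest 8 bits survive, which for 257 is rem(257, 256) == 1.
IO.inspect(<<257>> == <<rem(257, 256)>>)  # true

# With 16 bits the value fits and is stored as two bytes.
IO.inspect(<<257::16>>)                   # <<1, 1>>

# Negative integers are truncated the same way: -1 keeps eight 1 bits.
IO.inspect(<<-1>> == <<255>>)             # true
```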

## Binaries

A binary is a bitstring where the number of bits is divisible by 8. That means that every binary is a bitstring, but not every bitstring is a binary. We can use the `is_bitstring/1` and `is_binary/1` functions to demonstrate this.

In [17]:
is_bitstring(<<3::4>>)

true

In [18]:
is_binary(<<3::4>>)

false

In [19]:
is_bitstring(<<0, 255, 42>>)

true

In [20]:
is_binary(<<0, 255, 42>>)

true

In [21]:
is_binary(<<42::16>>)

true
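
The divisibility rule can be checked directly with `bit_size/1`. The sketch below compares "bit size is a multiple of 8" with the result of `is_binary/1` for a few sample values. The `samples` list is just an illustration.

```elixir
samples = [<<3::4>>, <<0, 255, 42>>, <<42::16>>, <<1::1, 0::8>>]

# For each sample: {size in bits, divisible by 8?, is_binary?}
results =
  for bits <- samples do
    {bit_size(bits), rem(bit_size(bits), 8) == 0, is_binary(bits)}
  end

Enum.each(results, &IO.inspect/1)
# {4, false, false}
# {24, true, true}
# {16, true, true}
# {9, false, false}
```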

We can pattern match on binaries and bitstrings:

In [22]:
<<0, 1, x>> = <<0, 1, 2>>

<<0, 1, 2>>

In [23]:
x

2

In [24]:
<<0, 1, x>> = <<0, 1, 2, 3>>

MatchError: 1

Note that unless you explicitly use `::` modifiers, each entry in the binary pattern is expected to match a single byte (exactly 8 bits). That is why the match above fails: the pattern describes exactly three bytes, but the binary has four. If we want to match on a binary of unknown size, we can use the `binary` modifier at the end of the pattern:

In [24]:
<<0, 1, x::binary>> = <<0, 1, 2, 3>>

<<0, 1, 2, 3>>

In [25]:
x

<<2, 3>>

There are a couple of other modifiers that can be useful when pattern matching on binaries. The `binary-size(n)` modifier matches exactly `n` bytes of a binary:

In [26]:
<<head::binary-size(2), rest::binary>> = <<0, 1, 2, 3>>


<<0, 1, 2, 3>>

In [27]:
head

<<0, 1>>

In [28]:
rest

<<2, 3>>
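
These modifiers combine well when the size is itself part of the data. The following is a sketch under an assumed, made-up frame layout (one tag byte, one length byte, then the payload). The module `FrameSketch` and its `parse/1` function are hypothetical names, not part of any library.

```elixir
defmodule FrameSketch do
  # Hypothetical layout: <<tag, payload_length, payload, trailing bytes>>.
  # `len` is bound by the second byte and then used as the payload size.
  def parse(<<tag, len, payload::binary-size(len), rest::binary>>) do
    {:ok, tag, payload, rest}
  end

  # Anything else (too short, or not a binary) is rejected.
  def parse(_other), do: :error
end

IO.inspect(FrameSketch.parse(<<1, 3, "abc", "tail">>))
# {:ok, 1, "abc", "tail"}

IO.inspect(FrameSketch.parse(<<1, 5, "ab">>))
# :error
```

The second call fails to match the first clause because the frame announces five payload bytes but only two are present.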

**A string is a UTF-8 encoded binary**, where the code point for each character is encoded using 1 to 4 bytes. Thus every string is a binary, but due to the UTF-8 encoding rules, not every binary is a valid string.

In [29]:
is_binary("hello")

true

In [30]:
is_binary(<<239,191,19>>)

true

In [31]:
String.valid?(<<239,191,19>>)

false
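
To see why this particular binary is rejected: in UTF-8, the byte 239 (`0xEF`) announces a three-byte sequence, and the two bytes that follow must be continuation bytes in the range 128 to 191. The byte 19 is not. A brief sketch comparing it with a well-formed three-byte sequence:

```elixir
# 19 is not a continuation byte, so the sequence is malformed.
IO.inspect(String.valid?(<<239, 191, 19>>))        # false

# 189 is a continuation byte; these three bytes encode U+FFFD.
IO.inspect(String.valid?(<<239, 191, 189>>))       # true
IO.inspect(<<0xFFFD::utf8>> == <<239, 191, 189>>)  # true
```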

The string concatenation operator `<>` is actually a binary concatenation operator:

In [32]:
"a" <> "ha"

"aha"

In [33]:
<<0, 1>> <> <<2, 3>>

<<0, 1, 2, 3>>

Given that strings are binaries, we can also pattern match on strings:

In [34]:
<<head, rest::binary>> = "banana"

"banana"

In [35]:
head == ?b

true

In [36]:
rest

"anana"

However, remember that binary pattern matching works on bytes, so matching on a string with multibyte characters, such as "über", won't match the whole character. It will match only the first byte of that character.

In [37]:
"ü" <> <<0>>

<<195, 188, 0>>

In [38]:
<<x, rest::binary>> = "über"

"über"

In [41]:
x

195

In [40]:
rest

<<188, 98, 101, 114>>

Above, `x` matched only the first byte of the multibyte `ü` character.

Therefore, when pattern matching on strings that may contain multibyte characters, it is important to use the `utf8` modifier:

In [42]:
<<x::utf8, rest::binary>> = "über"

"über"

In [43]:
x == ?ü

true

In [44]:
rest

"ber"

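Applied recursively, the `utf8` modifier lets you walk a string one code point at a time. The module below is a minimal sketch of that idea. `CodepointSketch` is a hypothetical name, and in practice `String.to_charlist/1` already gives the same result for valid strings.

```elixir
defmodule CodepointSketch do
  # Take one UTF-8 encoded code point off the front, then recurse on the rest.
  def codepoints(<<cp::utf8, rest::binary>>), do: [cp | codepoints(rest)]
  # The empty binary ends the recursion.
  def codepoints(<<>>), do: []
end

IO.inspect(CodepointSketch.codepoints("über"))
# [252, 98, 101, 114]

IO.inspect(CodepointSketch.codepoints("über") == String.to_charlist("über"))
# true
```
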
A charlist is a list of integers where all the integers are valid code points. In practice, you will not come across them often, except perhaps when interfacing with Erlang, in particular when using older libraries that do not accept binaries as arguments.

In [45]:
'hełło'

[104, 101, 322, 322, 111]

In [46]:
is_list 'hełło'

true

In [47]:
'hello'

'hello'

In [48]:
List.first('hello')

104

You can see that instead of containing bytes, a charlist contains integer code points. By default, IEx prints the list as integers only if at least one of them falls outside the printable ASCII range. Otherwise it displays the corresponding characters.

In [49]:
'hello'

'hello'

In [52]:
'hełło'

[104, 101, 322, 322, 111]

In [51]:
to_string 'hełło'

"hełło"

Interpreting integers as code points may lead to some surprising behavior. For example, if you are storing a list of integers that all happen to fall within the printable ASCII range, by default IEx will interpret it as a charlist and display the corresponding ASCII characters.

In [53]:
heartbeats_per_min = [99, 97, 116]

'cat'
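
If you would rather see the integers, one option is to pass `charlists: :as_lists` to `IO.inspect/2` or `inspect/2`. A short sketch using the same list:

```elixir
heartbeats_per_min = [99, 97, 116]

# Force the list to be shown as integers rather than as characters.
IO.inspect(heartbeats_per_min, charlists: :as_lists)
# [99, 97, 116]

# The data itself is unchanged; only the display differs.
IO.inspect(Enum.sum(heartbeats_per_min))
# 312
```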

You can convert a charlist to a string and back using the `to_string/1` and `to_charlist/1` functions. `to_string/1` also accepts other types, such as atoms and integers:

In [54]:
to_charlist "hełło"

[104, 101, 322, 322, 111]

In [55]:
to_string 'hełło'

"hełło"

In [56]:
to_string :hello

"hello"

In [57]:
to_string 1

"1"

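These conversions are mostly needed at the boundary with Erlang. As a small sketch, `:erlang.integer_to_list/2` returns a charlist, which can be turned into a string with `to_string/1`. The variable name `hex` is only illustrative.

```elixir
# An Erlang function that returns a charlist: 255 written in base 16.
hex = :erlang.integer_to_list(255, 16)

IO.inspect(is_list(hex))               # true
IO.inspect(hex, charlists: :as_lists)  # [70, 70]

# Convert to an Elixir string, and back again.
IO.inspect(to_string(hex))             # "FF"
IO.inspect(to_charlist("FF") == hex)   # true
```
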
String (binary) concatenation uses the `<>` operator, but charlists, being lists, use the list concatenation operator `++`:

In [58]:
'this' <> 'fails'

CompileError: 1

In [58]:
'this' ++ 'works'

'thisworks'

In [59]:
"he" ++ "llo"

ArgumentError: 1

In [None]:
"he" <> "llo"

"hello"