
# Strings in Julia


#### Characters: the `Char` type


Characters are symbols such as letters {`A, B, C,` ...}, punctuation marks {`;,:,`...} or digits {`1,2,3,`...}. The ASCII standard fixes a set of 128 such characters used in English text and maps each one to an integer between 0 and 127.

Julia has the type `Char`, which represents a single character. Character literals are written between single quotes.
```julia
x = 't'
typeof(x)   # Char
```



We can convert a `Char` to an integer to get the numeric code associated with each character.

In [78]:
Int(' '), Int('!')

(32, 33)

We can also convert integers to characters:

In [106]:
[Char(x) for x in 32:50]

19-element Array{Char,1}:
 ' ' 
 '!' 
 '"' 
 '#' 
 '$' 
 '%' 
 '&' 
 '\''
 '(' 
 ')' 
 '*' 
 '+' 
 ',' 
 '-' 
 '.' 
 '/' 
 '0' 
 '1' 
 '2' 
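The two conversions above are inverses of each other, which a short sketch can make explicit. The variable names below are only illustrative, and the example assumes a Julia 1.x session, where adding an integer to a `Char` and subtracting two `Char` values are both defined.

```julia
# Round trip between a character and its integer code
c = 'A'
code = Int(c)          # 65
back = Char(code)      # 'A'

# Arithmetic on characters moves along the code table
next_char = c + 1      # 'B'
distance = 'z' - 'a'   # 25

println(code, " ", back, " ", next_char, " ", distance)
```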

Some ASCII characters are control characters with no printable symbol of their own. They are usually written as escape sequences built from standard symbols; for example, `x = '\x01'` is the ASCII character with code 1 (the first ASCII character, with code 0, is `'\0'`).


In [105]:
typeof('\x01')

Char

In [128]:
[Char(x) for x in 1:10]

10-element Array{Char,1}:
 '\x01'
 '\x02'
 '\x03'
 '\x04'
 '\x05'
 '\x06'
 '\a'  
 '\b'  
 '\t'  
 '\n'  
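The output above mixes two notations: hexadecimal escapes such as `'\x01'` and named escapes such as `'\t'`. As a small sketch (assuming Julia 1.x, with illustrative variable names), the following checks that both notations refer to plain integer codes and that these characters are classified as control characters.

```julia
# Named escapes are shorthand for specific codes
bell_code    = Int('\a')   # 7
tab_code     = Int('\t')   # 9
newline_code = Int('\n')   # 10

# A hexadecimal escape can name any ASCII character, printable or not
same_as_A = '\x41' == 'A'          # true: hexadecimal 41 is decimal 65

# Control characters are not printable
is_control   = iscntrl('\x01')     # true
is_printable = isprint('\x01')     # false

println(bell_code, " ", tab_code, " ", newline_code)
println(same_as_A, " ", is_control, " ", is_printable)
```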

## ASCII characters and beyond


To check whether a character is ASCII, Julia provides the function **`isascii`**.

Unicode extends ASCII to a far larger set of symbols; see, for example, https://unicode-table.com/en/#hangul-jamo

In [130]:
isascii('c'), isascii('ç')

(true, false)
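One way to see why `isascii` returns `false` for `ç` is to look at its integer code, which lies outside the ASCII range 0 to 127. The sketch below assumes Julia 1.x and writes the character with the escape `'\u00e7'` so that the example does not depend on how the file is encoded.

```julia
c = '\u00e7'            # the character ç
code = Int(c)           # 231, above the ASCII limit of 127

# Every code from 0 to 127 is ASCII; 128 is the first one that is not
all_ascii = all(isascii(Char(i)) for i in 0:127)   # true
first_non_ascii = !isascii(Char(128))               # true

println(code, " ", isascii(c), " ", all_ascii, " ", first_non_ascii)
```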


#### Strings: the `String` type

Strings are sequences of characters. String literals are written between double quotes. For example, `x = "This is a string"` is a `String`.
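To illustrate the "sequence of characters" view, here is a minimal sketch, assuming Julia 1.x; the variable names are hypothetical. It shows that the quote style determines the type and that indexing a string yields a `Char`.

```julia
# Single quotes give a Char, double quotes give a String
char_type   = typeof('a')      # Char
string_type = typeof("a")      # String

s = "This is a string"
first_char = s[1]              # 'T', a Char
chars = collect(s)             # a vector holding each Char of s
rebuilt = join(chars)          # back to the original String

println(char_type, " ", string_type, " ", first_char, " ", length(chars))
println(rebuilt == s)
```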


The function **`isascii`** also accepts a string: it returns `true` if all the characters of the string are ASCII (and `false` otherwise).
```julia
println(isascii("hunter"), " ", isascii("caçador"))   # prints: true false
```

There are many other characters used in non-English languages, including variants of the ASCII characters with accents and other modifications, related scripts such as Cyrillic and Greek, and scripts completely unrelated to ASCII and English, including Arabic, Chinese, Hebrew, Hindi, Japanese, and Korean. 


The Unicode standard tackles the complexities of what exactly a character is, and is generally accepted as the definitive standard addressing this problem. Depending on your needs, you can either ignore these complexities entirely and just pretend that only ASCII characters exist, or you can write code that can handle any of the characters or encodings that one may encounter when handling non-ASCII text. 

Julia makes dealing with plain ASCII text simple and efficient, and handling Unicode is as simple and efficient as possible. In particular, you can write C-style string code to process ASCII strings, and it will work as expected, both in terms of performance and semantics. If such code encounters non-ASCII text, it will fail gracefully with a clear error message, rather than silently introducing corrupt results. When this happens, modifying the code to handle non-ASCII data is straightforward.
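The failure mode described above can be sketched as follows. This example assumes Julia 1.x, where strings are UTF-8 encoded, indexed by byte position, and an invalid index raises a `StringIndexError` (older versions used a different error type). The variable names are illustrative.

```julia
s = "ca\u00e7ador"           # "caçador"; ç occupies two bytes in UTF-8

n_chars = length(s)          # 7 characters
n_bytes = ncodeunits(s)      # 8 bytes
third = s[3]                 # 'ç', which starts at byte index 3

# Byte index 4 falls in the middle of ç, so C-style indexing fails loudly
err = try
    s[4]
catch e
    e
end
println(err isa StringIndexError)

# Iterating over the valid indices works for any string, ASCII or not
for i in eachindex(s)
    print(s[i], ' ')
end
println()
```

For a pure ASCII string such as `"hunter"`, `length` and `ncodeunits` agree and every index from 1 to the length is valid, which is why C-style loops work unchanged in that case.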



In [31]:
a = "the house is big"
typeof(a)

String

