### CS102/CS103

Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

# Lecture 3: Computing with Strings

We have seen the different types of numbers that a `python`
program can handle, and the standard arithmetic operations
(and more) that it can perform on them.

Another basic data type in `python`, called `str` (short for **string**), is concerned with (small or large) pieces of text.
**Text processing** is an important feature of any programming
language, as much of the existing data and many of the problems
a program can deal with are text based.

Before discussing its properties
and the operations that make it easy to build
powerful text processing applications, we look at a simple
example of a complete `python` program.

## A Python Program

The following code represents a complete `python` program
and illustrates how such a program typically works:

* It asks its user for some input (a number between $0$ and $1$)

* It processes the data

* It outputs a response.

In fact, this program repeats the processing and output steps
a number of times.

```python
# A simple computer program illustrating chaotic behaviour
def chaos():
    print("This program illustrates a chaotic function")
    x = eval(input("Enter a number between 0 and 1: "))
    for i in range(10):
        x = 3.9 * x * (1 - x)
        print(x)
```

This example shows some of the typical elements of a `python`
program, which code editors usually highlight in different colors.

The first line
```python
# A simple computer program illustrating chaotic behaviour
```
is not really part of the program; it is a **comment** for human
readers of the code, briefly explaining its purpose.  Comments in
`python` start with a hash symbol (`#`) and continue from there
to the end of the line.

Text enclosed in double quotes (`"`) like
```python
"Enter a number between 0 and 1: "
```
is a **string**.  More on that later.

This function **calls** other functions: `print()`, `input()`,
`eval()` and `range()`.

Some words in the code, `def`, `for` and `in`, are **keywords**
that have a fixed meaning in all `python` programs.

Other words, `i` and `x`, are **variables**: names whose values can
change while the program runs, and which (unlike keywords) can be chosen
freely in each `python` program.

The line
```python
def chaos():
```
starts a **function definition**, which consists of all the code
that is indented under this heading.  It gives the **name**
`chaos` to the function to be defined.  Note how the line
ends with a colon (`:`).  This colon is not optional: omitting
it causes a syntax error.

The line
```python
for i in range(10):
```
starts a **for loop** consisting of all the code
that is indented under this heading.  In this case, the commands
in the loop are to be repeated 10 times.  This loop is a first
example of the expressive power that comes with programming languages:
repeated actions only need to be spelled out once.
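One way to see what `range(10)` contributes is to turn it into a list, and to count how often the loop body actually runs. This is a minimal sketch; the names `values` and `count` are illustrative only and not part of the lecture's program:

```python
# range(10) supplies the ten integers 0, 1, ..., 9
values = list(range(10))
print(values)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# The indented line is the loop body: it runs once for each value of i
count = 0
for i in range(10):
    count = count + 1
print(count)         # 10
```

So the loop variable `i` takes the values $0$ to $9$ in turn, which is why the body of the loop in `chaos()` is executed exactly 10 times.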

Indentation by **spaces** at the beginning of a line matters in
`python` and is used to indicate **groupings**, for example of the lines
that make up a function definition, or the ones making up a `for` loop.  This allows **blocks** of code to be **nested**
inside each other: in `chaos()`, the block of the `for` loop is nested inside the block of the function definition.
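The sketch below shows two levels of nesting in a small hypothetical function `total()` (not part of the lecture's code), which adds up the numbers $0, 1, \ldots, n-1$.  It uses `return` to hand its result back to the caller:

```python
def total(n):
    s = 0              # function body: one level of indentation, runs once
    for i in range(n):
        s = s + i      # loop body: two levels of indentation, runs n times
    return s           # back at one level: runs once, after the loop

print(total(5))        # 10, that is 0 + 1 + 2 + 3 + 4
```

Indenting the `return` line one more level would move it into the loop, and the function would then return after the first repetition (so `total(5)` would give `0`).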

So, this code is merely a *definition* of a 
program, called a **function** in `python`.  In order to get
it to do something, we need to **call** the function (by its **name**):

```python
chaos()
```
```
This program illustrates a chaotic function
Enter a number between 0 and 1: 0.25
0.73125
0.76644140625
0.6981350104385375
0.8218958187902304
0.5708940191969317
0.9553987483642099
0.166186721954413
0.5404179120617926
0.9686289302998042
0.11850901017563877
```


The behaviour of the program is _chaotic_: a small change in the
input causes a big change in the output, as a second run with the
input $0.26$ instead of $0.25$ shows.

```python
chaos()
```
```
This program illustrates a chaotic function
Enter a number between 0 and 1: 0.26
0.75036
0.73054749456
0.7677066257332165
0.6954993339002887
0.8259420407337192
0.5606709657211202
0.9606442322820199
0.14744687593470315
0.49025454937601765
0.9746296021493285
```
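To compare the two runs side by side without typing any input, here is a sketch of a variant called `chaos_values` (a hypothetical name, not part of the lecture's code).  It applies the same rule $x \mapsto 3.9\,x\,(1-x)$, but takes the starting value as a parameter and returns the values instead of printing them, so it needs no `eval(input(...))`:

```python
def chaos_values(x, steps=10):
    """Return the values obtained by applying x -> 3.9 * x * (1 - x) repeatedly."""
    values = []
    for i in range(steps):
        x = 3.9 * x * (1 - x)
        values.append(x)
    return values

# The same two starting values as in the runs above
run1 = chaos_values(0.25)
run2 = chaos_values(0.26)

# Print both runs side by side, together with the gap between them
for x1, x2 in zip(run1, run2):
    print(x1, x2, abs(x1 - x2))
```

The gap starts at about $0.02$ after the first step and, by the tenth step, has grown to more than $0.8$.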


## Strings

### String objects

A **string** is a sequence of characters, enclosed in double quotes (`"`) or single quotes (`'`).
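Both kinds of quotes produce exactly the same kind of object.  A practical reason for having two of them is that one kind of quote can appear inside a string delimited by the other kind, as this small sketch shows (the names `s1` and `s2` are just for illustration):

```python
# Single and double quotes give equal strings
print("double" == 'double')   # True

# An apostrophe inside double quotes, double quotes inside single quotes
s1 = "It's a string"
s2 = 'She said "hi"'
print(s1)                     # It's a string
print(s2)                     # She said "hi"
```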

```python
"double"
```
```
'double'
```

```python
'single'
```
```
'single'
```

```python
print("double", 'single')
```
```
double single
```

```python
type("double")
```
```
str
```

### Length, Indexing and Slicing

A string is made of characters.
The number of characters in a string is its **length**,
which can be obtained by calling the `len()` function.
The individual characters making up a string
can be accessed through their position in the string, using
an **index** inside a pair of square brackets (`[ ... ]`).

```python
len("python")
```
```
6
```

```python
#012345
"python"[1]
```
```
'y'
```

```python
"python"[6]   # error: IndexError (string index out of range)
```

* Note how a character is a string of length $1$.

* And how the positions begin at $0$ and end at length minus one.

* Negative indices can be used; they count from the back of the string, so that index $-1$ refers to the last character.
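The rules above can be checked on a single word.  In this sketch (variable names chosen for illustration), the valid positive indices of a string of length `n` are `0` to `n - 1`, and the negative index `-k` refers to the same character as `n - k`:

```python
word = "python"
n = len(word)

# Valid positive indices run from 0 to n - 1
for i in range(n):
    print(i, word[i])

# A negative index -k refers to the same character as n - k
print(word[-1] == word[n - 1])   # True: both are 'n'
print(word[-6] == word[0])       # True: both are 'p'
```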

```python
"python"[-1]
```
```
'n'
```

```python
"python"[-7]  # error: IndexError (string index out of range)
```

**Slicing.**
Square brackets can also be used to extract longer substrings, called **slices**.  Here two positional indices are needed, separated by a colon (`:`): one to mark the
start and one to mark the end of the substring.

```python
#01234567890123
"We love python"[3:7]
```
```
'love'
```

* Note how the substring starts at the first position
and stops just short of the second position.  

* In this way the
difference between the two indices ($7-3$) equals the
length of the substring ($4$).

* Negative indices can be used for slicing.

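* If the first index is omitted, it defaults to the beginning ($0$) of the string;
if the second index is omitted, it defaults to the end, that is, to the length of the string
(not $-1$: a slice ending at $-1$ stops just short of the last character).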
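The following sketch checks these slicing rules on the string from the example above, including the difference between omitting the second index and writing $-1$ there:

```python
s = "We love python"

print(s[:2])            # We             (first index omitted: starts at 0)
print(s[8:])            # python         (second index omitted: runs to the end)
print(s[8:] == s[8:len(s)])   # True: omitting the end is the same as using len(s)
print(s[:-1])           # We love pytho  (ending at -1 drops the last character)

# The length of the slice s[i:j] is j - i
print(len(s[3:7]))      # 4
```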

```python
"python programs are fun"[-3:]
```
```
'fun'
```

```python
"python"[:]  # a quick copy of the string
```
```
'python'
```

### String Arithmetic

Strings can be added to other strings, and multiplied by natural numbers.

* **Concatenation.**  The `+` operator can be used to 
glue two or more strings together into a new one.

```python
"hello " + "world!"
```
```
'hello world!'
```

* **Repetition.**  To repeat a string a fixed number of times, multiply the string with that number.

```python
"ab" * 5
```
```
'ababababab'
```
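Concatenation and repetition combine naturally.  A small sketch (the names `title` and `line` are illustrative): underline a title with one dash per character, and note two details of the operators.  Adding a string and a number is an error (a `TypeError`), so the number is first converted with the built-in `str()`:

```python
title = "Strings"
line = "-" * len(title)        # repetition: one dash per character of the title
print(title)
print(line)                    # -------

# The number may also come first, and repeating 0 times gives the empty string
print(3 * "ab")                # ababab
print(repr("ab" * 0))          # ''

# "Lecture " + 3 would raise a TypeError; convert the number first
print("Lecture " + str(3))     # Lecture 3
```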

### Other String Operations

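String objects come with many **methods** for string manipulation.  A method is called by writing a dot, its name and a pair of parentheses after the string, as in the following examples.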

```python
"a little string".capitalize()
```
```
'A little string'
```

```python
"A Title".center(20)
```
```
'      A Title       '
```

```python
"ATCGACTGATCGATCGTACGAT".count("AT")
```
```
4
```

```python
"ATCGACTGATCGATCGTACGAT".find("ACT")
```
```
4
```

```python
"ATCGACTGATCGATCGTACGAT".lower()
```
```
'atcgactgatcgatcgtacgat'
```

```python
"ATCGACTGATCGATCGTACGAT".replace("AT", "TA")
```
```
'TACGACTGTACGTACGTACGTA'
```

```python
"a little string".upper()
```
```
'A LITTLE STRING'
```
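Two details about these methods are worth checking directly: they return a *new* string and leave the original unchanged (strings in `python` cannot be modified in place), and their calls can be chained.  A short sketch, where the variable names are illustrative and `strip()` (which removes surrounding spaces) is one more standard string method:

```python
dna = "ATCGACTGATCGATCGTACGAT"
lower_dna = dna.lower()
print(dna)          # ATCGACTGATCGATCGTACGAT  (unchanged)
print(lower_dna)    # atcgactgatcgatcgtacgat

# Each call in a chain acts on the result of the previous one
print("  a little string ".strip().upper())   # A LITTLE STRING

# find() returns -1 when the substring does not occur at all
print(dna.find("GGG"))   # -1
```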

* The `string` module in `python`'s standard library contains further useful tools for working with strings.
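A module has to be imported before its contents can be used.  The sketch below shows a few things the `string` module provides: ready-made strings of letters and digits, and the function `capwords()`, which capitalizes every word:

```python
import string

print(string.ascii_lowercase)               # abcdefghijklmnopqrstuvwxyz
print(string.digits)                        # 0123456789
print(string.capwords("we love python"))    # We Love Python
```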

### The ASCII code

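A `python` program is usually composed of characters from
a fixed set of 128 symbols, called the ASCII code, based on the symbols that are found
on US keyboards.  Here, each character corresponds to a **number**
between $0$ and $127$.  The table below lists the printable characters,
with codes $32$ (the space) to $126$; the codes $0$ to $31$ and $127$ (DEL)
stand for non-printing control characters.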

<pre>
 32      48 0    64 @    80 P    96 `   112 p 
 33 !    49 1    65 A    81 Q    97 a   113 q 
 34 "    50 2    66 B    82 R    98 b   114 r 
 35 #    51 3    67 C    83 S    99 c   115 s 
 36 $    52 4    68 D    84 T   100 d   116 t 
 37 %    53 5    69 E    85 U   101 e   117 u 
 38 &    54 6    70 F    86 V   102 f   118 v 
 39 '    55 7    71 G    87 W   103 g   119 w 
 40 (    56 8    72 H    88 X   104 h   120 x 
 41 )    57 9    73 I    89 Y   105 i   121 y 
 42 *    58 :    74 J    90 Z   106 j   122 z 
 43 +    59 ;    75 K    91 [   107 k   123 { 
 44 ,    60 <    76 L    92 \   108 l   124 | 
 45 -    61 =    77 M    93 ]   109 m   125 } 
 46 .    62 >    78 N    94 ^   110 n   126 ~ 
 47 /    63 ?    79 O    95 _   111 o   127 DEL 
</pre>

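The `python` functions `ord()` and `chr()` convert between
a character and its numerical (ASCII) value:

* `ord()` returns the ASCII value of a character

* `chr()` returns the character for a given ASCII value

(Strictly speaking, `python` strings are made of Unicode characters; Unicode
extends ASCII, and for the 128 ASCII characters the numerical values agree.)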

```python
ord('A')
```
```
65
```

```python
chr(65)
```
```
'A'
```
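Because the capital letters `A` to `Z` have the consecutive codes $65$ to $90$, `ord()` and `chr()` make it easy to do arithmetic with letters.  As an illustration (not part of the lecture), here is a sketch of a simple Caesar shift; the function names `shift_letter` and `shift_word` are hypothetical, and only uppercase letters are handled:

```python
def shift_letter(c, k):
    """Shift the uppercase letter c by k places, wrapping around from Z to A."""
    offset = ord(c) - ord('A')                # position in the alphabet, 0 to 25
    return chr(ord('A') + (offset + k) % 26)  # % 26 wraps around the alphabet

def shift_word(word, k):
    """Apply shift_letter to every character of an uppercase word."""
    result = ""
    for c in word:
        result = result + shift_letter(c, k)
    return result

print(shift_word("PYTHON", 3))    # SBWKRQ
print(shift_word("SBWKRQ", -3))   # PYTHON
```

Shifting by $-k$ undoes a shift by $k$, since `%` in `python` always returns a value between $0$ and $25$ here, even for negative numbers.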

## Summary: Strings

* Strings are **sequences** of characters, and **characters** are strings of length $1$.

* String **literals** are enclosed in single (`'`) or double (`"`) quotes.

* There are built-in operations for **concatenation** (`+`),
**repetition** (`*`), **indexing** (`[]`), **slicing** (`[:]`) and **length** (`len()`).

* Internally, characters are represented by **numerical codes** (ASCII);
the functions `ord()` and `chr()` convert between the two representations.

* String objects support many useful **string manipulation** methods.

* Additional string tools are defined in **modules** such as the `string` module.