# Lecture 4 - Theory
# Topic : Strings and Lists

## 1. Strings

### Overview

A string can be seen as a list in which each element is a character. For example, the string "summer" has length 6; its first character is "s" and its last one is "r". As we will see, the order of the characters matters, just like the order of the elements in a list.

With strings (and lists), we can extract some of the characters (or elements) by using square brackets. Note that all indices start at 0. In "summer", $s$ is character 0, $u$ is character 1, ... and $r$ is character 5.

As with the ```range``` function, *a string*```[start:end]``` returns the characters from index start up to index end - 1: the character at index end is excluded.
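The parallel with ```range``` can be checked directly. The following is a minimal sketch; the names ```word```, ```start```, ```end``` and ```picked``` are illustrative only.

```python
word = "summer"
start, end = 1, 4

# range(start, end) includes start and excludes end
indices = list(range(start, end))
print(indices)            # [1, 2, 3]

# Picking the characters one by one at those indices...
picked = ""
for i in indices:
    picked += word[i]
print(picked)             # umm

# ...gives the same result as the slice word[start:end]
print(word[start:end])    # umm
```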

In [1]:
string1 = "Hello"
string2 = "Summer"


# To take the characters individually 
print("character 0 :", string1[0])
print("character 1 :",string1[1])
print("character 2 :", string1[2])
print("character 3 :", string1[3])
print("character 4 :", string1[4])

# This would raise an IndexError (the valid indices are 0 to 4)
#print(string1[5])



character 0 : H
character 1 : e
character 2 : l
character 3 : l
character 4 : o


In [2]:
# To take the first 3 letters of "Hello", we write:
print("3 first characters of Hello :", string1[:3], "or equivalently", string1[0:3])

# To take the characters 1, 2, 3 of "Summer", we write:
print("Characters 1, 2, 3 of Summer (indices always start at 0):", string2[1:4])

# We can also count from the end with negative indices
# This gives the last 4 characters
# The "end" index is omitted, so it defaults to the end of the string
print("Last 4 characters of Hello :", string1[-4:])


3 first characters of Hello : Hel or equivalently Hel
Characters 1, 2, 3 of Summer (indices always start at 0): umm
Last 4 characters of Hello : ello


### Escape Characters

Strings are defined as a sequence of characters, but there also exist a number of escape characters, many of which are non-printable, that are used to represent things like tabulation, newline, carriage return, etc. They are always written as a combination of the ```\``` character followed by a character (sometimes more than one), and Python interprets each combination as a single character.

Below is a short list of some useful escape characters:

| Backslash notation |   Description   |
|:------------------:|:---------------:|
| ```\t```           | Tab             |
| ```\n```           | Newline         |
| ```\b```           | Backspace       |
| ```\r```           | Carriage return |
| ```\\```           | Backslash       |
| ```\'```           | Single quote    |
| ```\"```           | Double quote    |
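To see how escape characters behave, here is a small sketch (the strings and the names ```text``` and ```path``` are made up for illustration). It uses ```\t``` and ```\n```, and also ```\\```, which is how a literal backslash is written inside a string. Each escape sequence counts as a single character.

```python
# \t inserts a tab and \n starts a new line when the string is printed
text = "Name:\tAda\nLanguage:\tPython"
print(text)

# An escape sequence is stored as ONE character, even though we type two
print(len("\t"))     # 1
print(len("a\nb"))   # 3

# A literal backslash is written as two backslashes
path = "C:\\temp"
print(path)          # C:\temp
print(len(path))     # 7
```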

### String Special Operators

There exist a number of special operators for strings, which allow performing operations such as concatenating two strings, slicing, repeating, etc.

Below is a list of some of these operators:

In [3]:
a = "Hello"
b = "Python"

| Operator |                                         Description                                        |            Example            |
|:--------:|:------------------------------------------------------------------------------------------:|:-----------------------------:|
| +        | Concatenation - Joins the strings on either side of the operator                           | a + b will give HelloPython   |
| *        | Repetition - Creates a new string by concatenating multiple copies of the same string      | a*2 will give HelloHello      |
| []       | Index - Gives the character at the given index                                             | a[1] will give e              |
| [ : ]    | Range Slice - Gives the characters in the given range                                      | a[1:4] will give ell          |
| in       | Membership - Returns True if a character (or substring) exists in the given string         | "H" in a will give True       |
| not in   | Membership - Returns True if a character (or substring) does not exist in the given string | "M" not in a will give True   |
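The operators in the table can be tried out as follows. This is a minimal sketch that redefines ```a``` and ```b``` so that it runs on its own.

```python
a = "Hello"
b = "Python"

print(a + b)          # HelloPython
print(a * 2)          # HelloHello
print(a[1])           # e
print(a[1:4])         # ell

# Membership tests return the booleans True or False
print("H" in a)       # True
print("M" not in a)   # True
print("ell" in a)     # True: "in" also works for substrings
```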

### Triple Quotes

Triple quotes in Python allow a string to span multiple lines. Such a string is delimited by three successive quotes on each side, using either single quote characters (') or double quote characters (").

In [4]:
paragraph = """this is a long string that is made up of
several lines and non-printable characters such as
TAB ( \t ) and they will show up that way when displayed.
NEWLINEs within the string, whether explicitly given like
this within the brackets [ \n ], or just a NEWLINE within
the variable assignment will also show up.
"""

print(paragraph)

this is a long string that is made up of
several lines and non-printable characters such as
TAB ( 	 ) and they will show up that way when displayed.
NEWLINEs within the string, whether explicitly given like
this within the brackets [ 
 ], or just a NEWLINE within
the variable assignment will also show up.



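**TIP:** In practice, triple quotes may be used to disable several lines of code at once, instead of inserting a (#) character at the beginning of each line. Strictly speaking, the result is a string that Python evaluates and discards rather than a true comment, so it must be indented like the surrounding code.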
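As an illustration of this tip, the sketch below disables two debugging lines inside a loop (the loop and the name ```total``` are made up for the example). When run as a script, the triple-quoted block produces no output; some interactive environments may echo the string instead.

```python
total = 0
for n in [1, 2, 3]:
    total += n
    # The two print calls below are disabled by wrapping them in triple quotes.
    # The block is a plain string, so it is indented like the code around it.
    """
    print("debug: n =", n)
    print("debug: total =", total)
    """

print(total)   # 6
```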

### Built-in String Methods (functions)

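Python provides a number of built-in functions and methods to manipulate strings (and there exist many more in additional modules and third-party packages... 😉 ). Apart from ```len```, which is a function called as ```len(string)```, the entries below are methods called on a string, as in ```string.count(str)```.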

Below is a list of such built-in functions:

|               Function               |                                                                          Description                                                                          |
|:------------------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| count(str, beg=0, end=len(string))   | Counts how many times str occurs in string, or in a substring of string if starting index beg and ending index end are given.                                 |
| find(str, beg=0, end=len(string))    | Determines whether str occurs in string, or in a substring of string if starting index beg and ending index end are given; returns the index of the first occurrence if found and -1 otherwise. |
| len(string)                          | Returns the length of the string.                                                                                                                             |
| lower()                              | Returns a copy of string with all uppercase letters converted to lowercase.                                                                                   |
| upper()                              | Returns a copy of string with all lowercase letters converted to uppercase.                                                                                   |
| replace(old, new [, max])            | Returns a copy of string with all occurrences of old replaced by new, or at most max occurrences if max is given.                                             |
| strip()                              | Performs lstrip() and rstrip(), which respectively remove leading and trailing whitespace.                                                                    |
| split(str=None, num=-1)              | Splits string according to delimiter str (any whitespace if not provided) and returns a list of substrings; performs at most num splits if num is given.      |

In [5]:
string1 = "HelloPython"
print("Numbers of o :", string1.count("o"))
print("Len of string :", len(string1))
print("Find a P :", string1.find("P"))
print("Find a p :", string1.find("p"), "which means.. never")

Numbers of o : 2
Len of string : 11
Find a P : 5
Find a p : -1 which means.. never
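The remaining methods from the table can be tried in the same way. This is a minimal sketch on a made-up string; note that every method returns a new string (or list) and leaves the original unchanged.

```python
text = "  Hello Python, hello lists  "

# strip() removes the leading and trailing whitespace
stripped = text.strip()
print(stripped)                          # Hello Python, hello lists

print(stripped.lower())                  # hello python, hello lists
print(stripped.upper())                  # HELLO PYTHON, HELLO LISTS

# replace() is case-sensitive: "Hello" is not affected here
print(stripped.replace("hello", "bye"))  # Hello Python, bye lists
# The third argument limits the number of replacements
print(stripped.replace("l", "L", 2))     # HeLLo Python, hello lists

# split() without argument splits on whitespace
print(stripped.split())                  # ['Hello', 'Python,', 'hello', 'lists']
# split() with a delimiter splits on that delimiter
print(stripped.split(", "))              # ['Hello Python', 'hello lists']
```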


## 2. Lists

Lists let us keep an arbitrary number of values inside a single variable (the list) so that we can operate on them together. Unlike a string, whose elements are all characters, a list can store values of any type. We access the elements via their index. Lists are one of the three basic sequence data types (the other two will be covered in the next lecture).

In [6]:
# Simple list containing integers
# Lists are delimited by square brackets []
# Elements are delimited by commas.
lst = [0,1,3,-5]
# Two different ways of printing this list
print(lst)
print([0,1,3,-5])

[0, 1, 3, -5]
[0, 1, 3, -5]


In [7]:
my_list = [3, "Hello", 7.5, True]
print("My list is", my_list, "\n")
print("First element (element #0) :", my_list[0])
print("Second element (element #1) :", my_list[1])
print("Third element (element #2) :", my_list[2])
print("Last element (element #3) :", my_list[3])

My list is [3, 'Hello', 7.5, True] 

First element (element #0) : 3
Second element (element #1) : Hello
Third element (element #2) : 7.5
Last element (element #3) : True


The function ```len``` gives the length of a list (number of elements in it) and is very useful.

In [8]:
lst = [3,4,5, [2,"hello"], True, False]
print(len(lst))

6


In [9]:
# we can use elements like variables as well

my_list = [3, "Hello", "Bonjour", 5, 7.5]

print(my_list[1]+my_list[2])
print(my_list[0]+my_list[3]+my_list[4])

HelloBonjour
15.5


We can change the value of an element by re-assigning it, just like a normal variable.

In [10]:
lst = [0,1,2,3,4]
# In order to change the element 1, we do :
print("The list before changing element 1:", lst)
lst[1] = "something else"
print("The list after changing element 1 :", lst)

The list before changing element 1: [0, 1, 2, 3, 4]
The list after changing element 1 : [0, 'something else', 2, 3, 4]
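Re-assigning an element works for lists but not for strings, even though both are indexed in the same way. The sketch below illustrates the difference; the names ```word``` and ```new_word``` are illustrative, and the ```try```/```except``` block is only there so that the example runs to the end.

```python
# A list element can be replaced in place
lst = ["H", "e", "y"]
lst[0] = "J"
print(lst)         # ['J', 'e', 'y']

# A string character cannot: Python raises a TypeError
word = "Hey"
try:
    word[0] = "J"
except TypeError as error:
    print("Strings cannot be modified in place:", error)

# Instead, we build a new string from slices of the old one
new_word = "J" + word[1:]
print(new_word)    # Jey
```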


In [11]:
# We can also have lists inside lists (nested lists)

list_of_list = [[1,2,3], [5,6,7, 8, 9, 10], ["hello", "summer"]]

print("The list: ",list_of_list, "\n")


The list:  [[1, 2, 3], [5, 6, 7, 8, 9, 10], ['hello', 'summer']] 



In [12]:
# In that case, what is the element #0 ?
print("Element 0: ", list_of_list[0])
print("Element 1: ", list_of_list[1])
print("Element 2: ", list_of_list[2], "\n")

# We can access the elements inside the lists
print(list_of_list[1][2])

Element 0:  [1, 2, 3]
Element 1:  [5, 6, 7, 8, 9, 10]
Element 2:  ['hello', 'summer'] 

7


As with strings, we can access several elements at once by slicing.

In [13]:
my_list = ["hi", 5, -45, 2, 3.7, 4, 7, 10]

# To access the 3 first elements:
print(my_list[:3])

# To access elements 3, 4, 5 and 6:
print(my_list[3:7])

# To access every element but the last
print(my_list[:7])
print(my_list[:len(my_list)-1])
print(my_list[:-1])

# To access every element but the last 4 (here, the first 4 elements):
print(my_list[:len(my_list)-4])
print(my_list[:-4])



['hi', 5, -45]
[2, 3.7, 4, 7]
['hi', 5, -45, 2, 3.7, 4, 7]
['hi', 5, -45, 2, 3.7, 4, 7]
['hi', 5, -45, 2, 3.7, 4, 7]
['hi', 5, -45, 2]
['hi', 5, -45, 2]
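The position of the negative index relative to the colon matters: placed before the colon it selects elements from the end, placed after the colon it drops them. A minimal sketch using the same list:

```python
my_list = ["hi", 5, -45, 2, 3.7, 4, 7, 10]

# The LAST 4 elements: start 4 positions before the end, go to the end
print(my_list[-4:])                  # [3.7, 4, 7, 10]
print(my_list[len(my_list) - 4:])    # [3.7, 4, 7, 10]

# Everything BUT the last 4 elements: stop 4 positions before the end
print(my_list[:-4])                  # ['hi', 5, -45, 2]
```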


There are some built-in functions, like ```sum```, which returns the sum of a list that contains numerical values.

In [14]:
lst = [1,2,3,4,5,6]
print(sum(lst))

21


You have already programmed a function that averages 2 numbers. Now we can do it for an arbitrary number of elements in a list!

In [15]:
def average(a, b):
    return (a+b)/2

In [16]:
def average_list(lst):
    sum_of_elements = 0
    for elem in lst:
        sum_of_elements += elem
    print("The sum is", sum_of_elements)
    print("There is", len(lst), "elements")
    
    avg = sum_of_elements/len(lst)
    print("So the average is", avg)
    return avg
        

In [17]:
lst =[1,2,3,4,5,6]
average_list(lst)

The sum is 21
There is 6 elements
So the average is 3.5


3.5

In [18]:
# Or simply use the built-in function sum:
print(sum(lst)/len(lst))

3.5
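Both versions divide by ```len(lst)```, so they fail with a ```ZeroDivisionError``` on an empty list. One possible way to handle this case is sketched below; the function name ```safe_average``` and the choice of returning ```None``` for an empty list are assumptions made for this example, not part of the lecture code.

```python
def safe_average(values):
    # An empty list has no average: return None instead of dividing by zero
    if len(values) == 0:
        return None
    return sum(values) / len(values)

print(safe_average([1, 2, 3, 4, 5, 6]))   # 3.5
print(safe_average([]))                   # None
```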


### List comprehension

A list comprehension is a syntactic construct, available in some programming languages, for creating a list based on an existing list (or any other sequence). It follows the mathematical set-builder notation. For example, the set of even numbers is defined as $$\{2n : n \in \mathbb{N}\}$$

In [19]:
# Generates the list of even numbers less than 25:
lst = [i for i in range(0, 25, 2)]
print("First way to generate it :",lst, "\n")
lst2 = [i for i in range(25) if i%2 ==0]
print("Second way to generate it :",lst2)

First way to generate it : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24] 

Second way to generate it : [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]


In [20]:
lst3 = [elem for elem in lst2 if elem%4==0]
print(lst3)

[0, 4, 8, 12, 16, 20, 24]
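A list comprehension is a compact way of writing a ```for``` loop that fills a list. The sketch below builds the same list both ways; the variable names are illustrative, and ```append``` is the list method that adds an element at the end of a list.

```python
# With an explicit loop
evens_loop = []
for i in range(25):
    if i % 2 == 0:
        evens_loop.append(i)

# With a list comprehension
evens_comprehension = [i for i in range(25) if i % 2 == 0]

print(evens_loop == evens_comprehension)   # True

# The expression before "for" can also transform each element
squares = [n * n for n in range(5)]
print(squares)                             # [0, 1, 4, 9, 16]
```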
