# Data Structures 1

**References**:
+ [ThinkPython (book)](https://allendowney.github.io/ThinkPython/)

**Content**:
+ Strings
    + Indexing
    + Slices
    + Immutability
    + Comparisons
    + Methods
    + Regular expressions
+ Lists
    + list operations
    + list methods
    + list comprehensions
+ Working with lists and strings
+ Objects, Values, & Aliasing 

## Strings

+ A string is a **sequence of characters**
+ A **character** can be a letter, a digit, a punctuation mark, or whitespace

### Indexing
+ select single characters via indexing (e.g., `word[2]`)
+ the index can be an integer, a variable, or an expression
+ Note: in Python, indexing starts at `0`
+ use a negative index to count backward from the end (e.g., `word[-1]` is the last character)

In [1]:
# select a letter using indexing
# index as integer

# index as variable

# index as expression

# negative index: get last letter 
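
The four indexing cases listed above could be filled in as follows. This is a minimal sketch; the variable names `word` and `i` and the example string are assumptions chosen for illustration.

```python
# hypothetical example values; any non-empty string works
word = "banana"
i = 1

first = word[0]         # index as integer: first character
second = word[i]        # index as variable
third = word[i + 1]     # index as expression
last = word[-1]         # negative index: last character
second_last = word[-2]  # second character from the end

print(first, second, third, last, second_last)
```

Note that an index outside the valid range (e.g., `word[10]` here) raises an `IndexError`.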


### Slices
+ A **segment of a string** is called a slice
+ different types of slices:
    + closed form: `[n:m] -> [n,m)` returns the part of the string from index `n` up to, but not including, index `m`
    + open start: `[:m] -> [0,m)` slice starts at the beginning of the string and goes up to, but not including, index `m`
    + open end: `[n:] -> [n,...]` slice starts at index `n` and goes until the end of the string
    + empty slice: `[n:n] -> [n,n)` yields an empty string

In [2]:
# slices
# select the letter 1,2,3

# select first three letters

# select last 3 letters

# empty element
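
One possible way to fill in the slicing cell, assuming the same hypothetical example string `word` as an illustration:

```python
# hypothetical example value
word = "banana"

middle = word[1:4]      # characters at index 1, 2, 3
first_three = word[:3]  # open start: first three characters
last_three = word[-3:]  # open end with negative start: last three characters
empty = word[2:2]       # start equals stop: empty string

print(middle, first_three, last_three, repr(empty))
```

Unlike indexing, slicing does not raise an error when the bounds exceed the string length; the slice is simply cut off at the end of the string.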


### Immutability
+ Strings are **immutable** (i.e., you can’t change an existing string by assigning a new value to one of its characters; you have to create a new string instead)

In [3]:
# strings are immutable

# work around
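
A minimal sketch of what happens when you try to modify a string in place, and of the usual workaround of building a new string. The variable names and the example text are assumptions for illustration.

```python
# hypothetical example value
greeting = "Hello, world!"

# item assignment on a string raises a TypeError
caught = False
try:
    greeting[0] = "J"
except TypeError as err:
    caught = True
    print("TypeError:", err)

# workaround: build a new string from a new first letter and a slice
new_greeting = "J" + greeting[1:]

print(greeting)      # the original string is unchanged
print(new_greeting)
```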


### Comparisons
+ evaluate whether
    + two strings are equal `==`
    + one string comes before `<` or after `>` another one in alphabetical order
    + note: all uppercase letters come before all lowercase letters (e.g., `"Z" < "a"`)

In [4]:
# check whether two strings are equal

# check whether first string comes before "c" in alphabet

# uppercase comes before lowercase
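
The comparison cases above could look like the following sketch. The example strings are assumptions; the behaviour shown follows from Python comparing strings character by character using their Unicode code points.

```python
# hypothetical example value
word = "banana"

is_equal = word == "banana"      # equality check
before_c = word < "c"            # "b" sorts before "c"
upper_first = "Zebra" < "apple"  # uppercase "Z" sorts before lowercase "a"

# the code points explain the ordering: ord("Z") is 90, ord("a") is 97
code_points = (ord("Z"), ord("a"))

# converting both strings to lowercase gives a case-insensitive comparison
normalized = "Zebra".lower() < "apple".lower()

print(is_equal, before_c, upper_first, code_points, normalized)
```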


### Methods
+ Strings provide methods that perform a variety of useful operations (for an overview of all string methods, use `dir(str)` or `dir("some_string")`)
+ A method call is called an **invocation** (e.g., in the case of `fruit.upper()`, we would say that we are invoking `upper` on `fruit`)
+ **Example methods**:
    + `lower`, `upper`
    + `replace`
    + `split`, `join`
    + `startswith`, `endswith`

In [5]:
# have a look into all methods of strings

# checkout how a method works

# example methods
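
A short sketch of the example methods listed above. The variable `fruit` follows the text; the other names and example values are assumptions for illustration.

```python
fruit = "banana"

# list the public string methods (names without a leading underscore)
public_methods = [name for name in dir(str) if not name.startswith("_")]

# help(str.replace) would print the documentation of a single method

upper = fruit.upper()               # all characters in uppercase
lower = "BaNaNa".lower()            # all characters in lowercase
replaced = fruit.replace("a", "o")  # replace every "a" with "o"
parts = "a,b,c".split(",")          # split at each comma into a list
joined = "-".join(parts)            # join list elements with "-"
starts = fruit.startswith("ban")    # check the beginning of the string
ends = fruit.endswith("na")         # check the end of the string

print(upper, lower, replaced, parts, joined, starts, ends)
```

All of these methods return a new object and leave `fruit` unchanged, which is consistent with strings being immutable.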


### Regular expressions for working with patterns in a text document
+ A module called `re` ([Documentation](https://docs.python.org/3/library/re.html)) provides functions related to regular expressions
+ it provides tools such as
    + checking whether a specific pattern appears in the text with `re.search(pattern, text)`
    + if the pattern is not in the text, `re.search` returns `None` (otherwise it returns a `Match` object)
    + checking for alternative spellings of one pattern (e.g., `re.search("col(o|ou)r", text)`)
    + string substitution with `re.sub(pattern, repl, string)`

In [6]:
# import the re module

# create a variable called abstract 

# check whether "Bayes" appears in abstract

# using indexing to check span

# returns None if pattern is not in string

# check whether null_result is empty

# check for different types of patterns

# string substitution
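
The steps in the cell above could be sketched as follows. The text stored in `abstract` is the sentence used later in these notes; the search terms `"frequentist"` and the colour example sentence are assumptions for illustration.

```python
import re

abstract = "Priors are a key feature of the Bayesian paradigm."

# check whether "Bayes" appears in abstract; returns a Match object
result = re.search("Bayes", abstract)

# the span gives start and end index of the match; use it for slicing
start, end = result.span()
matched_text = abstract[start:end]

# re.search returns None if the pattern is not in the string
null_result = re.search("frequentist", abstract)
is_missing = null_result is None

# alternation: match either "color" or "colour"
british = re.search("col(o|ou)r", "The colour of the sky").group()
american = re.search("col(o|ou)r", "The color of the sky").group()

# string substitution returns a new string
replaced = re.sub("Bayesian", "frequentist", abstract)

print(matched_text, is_missing, british, american)
print(replaced)
```

Because a failed search returns `None`, it is safer to test the result (e.g., `if result is not None:`) before calling `span()` or `group()` on it.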


### 🤩 You are ready to try the *String exercises* in the file `exercises-ds`

## Lists
Python’s basic container type is the list. A list is a **sequence**. 

+ We can define our own list with square brackets `[ ]`
+ elements in a list can be of any type (even of type list; **nested** structure)
+ a list without any elements is an **empty list** with length zero
+ we can get the **length of a list** by using `len(list)` 
+ to access an element in a list we can use the same indexing methods as with strings
    + but list indexing can become a bit more complex when we have nested structures
+ In contrast to a string, a list is **mutable** (i.e., elements in a list can be modified by assigning new values)
+ we can use the `in` operator to check whether an element appears in the list
    + note: the `in` operator only checks the top level of a nested list, not the deeper levels.

In [10]:
# list elements can be of any type
random_list = [1, "str", [1,2], 1.]
empty_list = []

# length of a list
len(random_list)
len(empty_list)

# indexing nested structures
random_list[2][1]

# lists are mutable
## replace the first element with a new value
random_list[0] = 3
random_list

# check whether an element appears in the list
"str" in random_list
# elements in nested lists are not detected
2 in random_list

False

### List operations
+ allowed operations for lists
    + `+` for concatenation 
    + `*` for repetition 

In [14]:
# create two lists
list1 = ["Hello", "I", "am"]
list2 = ["Flo"]

# concatenate two lists
list1+list2

# repeat list 3 times
list1*3

# what about / ?
# list1 / list2

['Hello', 'I', 'am', 'Hello', 'I', 'am', 'Hello', 'I', 'am']

### List methods
+ lists come with several built-in methods (see `dir(list)` or `dir([1,2])`)
+ Example methods are:
    + `append`, `extend`
    + `remove`, `pop`
    + `reverse`, `sort`

In [25]:
# see methods of lists
dir(list)
# example methods
# list1.append("Flo")
# list1.remove("Flo")
# list1.pop(0)
num_list = [1,2,3,4]
num_list.reverse()
num_list

[4, 3, 2, 1]
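
The methods listed above differ in whether they add, remove, or reorder elements. The following sketch runs them one after another; the list contents and variable names are assumptions for illustration.

```python
# hypothetical example lists
words = ["Hello", "I", "am"]

words.append("Flo")           # append adds ONE element at the end
words.extend(["and", "you"])  # extend adds EACH element of another list
after_adding = words.copy()   # snapshot for inspection

words.remove("and")           # remove deletes the first matching element
first = words.pop(0)          # pop deletes by index and returns the element

nums = [3, 1, 4, 2]
nums.sort()                   # sorts in place
sorted_nums = nums.copy()     # snapshot for inspection
nums.reverse()                # reverses in place

# pitfall: in-place methods return None, so do not assign their result
result = [1, 2].append(3)

print(after_adding, words, first)
print(sorted_nums, nums, result)
```

If the original list should stay untouched, the built-in `sorted(nums)` returns a new sorted list instead of modifying `nums`.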

### List comprehensions
+ A quick way to build a sequence is using a **list comprehension**
+ list comprehensions can be very convenient as they are short and compact, but they can be difficult for others to read
+ use list comprehensions carefully and always consider readability
+ they can be very helpful for initializing multiple lists before running a for-loop

In [42]:
# Build a list of Unicode code points from a string
# technique 1) using a for-loop
symbols = '$%&§&/('
codes = []
for symb in symbols:
    # ord() returns the Unicode code point of a character
    codes.append(ord(symb))

# technique 2) using list comprehensions
[ord(symb) for symb in symbols]

# use list comprehension with if-else conditional
## double a number if it is even and square it if it is odd
[x+x if x%2 == 0 else x*x for x in range(8)]

# use list comprehension with two for-clauses
# difference x-y for all combinations of two numbers x and y
[x-y for x in range(4) for y in range(4)]

# alternative: using list comprehension to initialize multiple lists
user = []
performance = []
data = []

user, performance, data = [[] for _ in range(3)]
user

[]

In [38]:
# just a side note about iteration
for x in range(8):
    print(x)

list(range(8))


0
1
2
3
4
5
6
7


[0, 1, 2, 3, 4, 5, 6, 7]

## Working with lists and strings
+ convert a string into a list of characters: `list(string)`
+ break down a string into a list of words: `string.split()`
+ join the strings in a list into a single string: `" ".join(list_of_strings)`

In [48]:
# define variable 'abstract'
abstract = "Priors are a key feature of the Bayesian paradigm."

# convert string into list
list(abstract)

# break down text string into words in list
in_words = abstract.split()
in_words
# add a word into the list
in_words.insert(3, "wonderful")
in_words
# join single strings again using whitespace
" ".join(in_words)

'Priors are a wonderful key feature of the Bayesian paradigm.'

## Objects, Values, & Aliasing

+ two situations:
    + (1) variables refer to the same object (both variables have the same `id()`)
    + (2) variables refer to different objects whose values are equal (the variables have different `id()`)
+ the `id()` function returns an identifier that is unique to the object for as long as the object exists
+ to check whether two variables refer to the same object, you can use `is`; to check whether their values are equal, use `==`

In [52]:
# situation (1)
# a and b are identical
a = "house"
b = "house"

# two variables refer to the same object
# (CPython reuses identical string literals here; this is an implementation detail)
a is b
print( id(a), id(b) )

# situation (2)
# c and d are equivalent
c = [1,2,3]
d = [1,2,3]

c is d
print( id(c), id(d) )

2622840412976 2622840412976
2622840904896 2622840909056


+ assigning an existing object to another variable does not create a new object (e.g., `b = [1,2,3]; a = b`)
    + `a` and `b` refer to the identical object
    + for mutable objects (e.g., lists) any in-place change made through `b` is also visible through `a`
    + for immutable objects (e.g., strings) this is less of a problem
+ An object with more than one reference has more than one name, so we say the object is **aliased**.
+ if you want to create a new object as a copy of another object, you can use `a = b.copy()`
    + this will create two different objects with the same value

In [57]:
# create variable with list of three elements
a = [1,2,3]
# assign variable to new variable
a2 = a
# print new variable
a2
# both objects are identical
a is a2
# if you change a you will automatically change a2
a[1] = 4
print(a, a2)
# but what if you just wanted a copy of a? Then use copy
b = [1,2,3]
b2 = b.copy()
b[1] = 4
print(b, b2)
print(id(b), id(b2))

[1, 4, 3] [1, 4, 3]
[1, 4, 3] [1, 2, 3]
2622840824320 2622840909120
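
One caveat worth keeping in mind: `list.copy()` makes a *shallow* copy, so nested lists inside the copy are still aliased. The sketch below illustrates this with a hypothetical nested list and contrasts it with `copy.deepcopy` from the standard library.

```python
import copy

# hypothetical nested example list
nested = [1, [2, 3]]

shallow = nested.copy()       # new outer list, but the inner list is shared
deep = copy.deepcopy(nested)  # new outer list AND new inner list

nested[0] = 99                # changes only the outer list of `nested`
nested[1].append(4)           # changes the inner list shared with `shallow`

print(nested)   # outer and inner change are visible
print(shallow)  # only the inner change is visible
print(deep)     # no change is visible
```

For flat lists such as `[1,2,3]` the shallow copy is all you need; the difference only matters once a list contains other mutable objects.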


Why is aliasing not a problem for immutable objects such as strings?

In [59]:
# Let's try it out
var1 = "some string"
var2 = var1
print(var1, var2)
# modify var2
var2 = var2+"!"
print(var1, var2)
# indeed, changing var2 did not cause changes in var1:
# var2+"!" created a new string object and var2 now refers to it

some string some string
some string some string!


### 🤩 You are ready to try the *List exercises* in the file `exercises-ds`