# Strings

## Table of content

1. [Strings](#Strings)
   1. [Table of content](#Table-of-content)
2. [Definitions](#Definitions)
   1. [Position Si](#Position-Si)
      1. [repr](#repr)
   2. [Substring Sij](#Substring-Sij)
   3. [Length N](#Length-N)
3. [String Methods](#String-Methods)
   1. [split](#split)
   2. [find](#find)
   3. [in](#in)
4. [The module re](#The-module-re)
   1. [split](#split)
   2. [wildcard in re](#wildcard-in-re)
   3. [finditer](#finditer)
   4. [findall](#findall)
5. [Modifying strings](#Modifying-strings)
   1. [upper and lower](#upper-and-lower)
   2. [strip](#strip)
   3. [replace](#replace)
   4. [Concatenating strings](#Concatenating-strings)
6. [Working with files](#Working-with-files)
   1. [open](#open)
   2. [write](#write)
   3. [saving](#saving)
7. [Exercises](#Exercises)
   1. [Exercise 1 - Simple Strings](#Exercise-1---Simple-Strings)
   2. [Exercise 2 - String Methods](#Exercise-2---String-Methods)
   3. [Exercise 3 - Working with Strings](#Exercise-3---Working-with-Strings)
   4. [Exercise 4 - A bit more about Strings](#Exercise-4---A-bit-more-about-Strings)
   5. [Exercise 5 - Writing Files](#Exercise-5---Writing-Files)
8. [Extras](#Extras)
   1. [Accessing the letters of the alphabet](#Accessing-the-letters-of-the-alphabet)

# Definitions

- Alphabets are the collection of symbols or letters that can make up the string
  - Σdna = {A,G,C,T} ; for nucleotides
  - Σamino = {A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}; for amino acids
- Each **Element E** of an alphabet is named symbol or letter
- **String S** is a juxtaposition of symbols or letters from the alphabet Sigma
  - S = (ΣE) (An order of symbols is defined)
  - E.g.: S = "ATGGAATGCTAATAG"
- **Position Si** is the letter E at position i in the string (counting starts at position 0!)
- **Substring Sij** consists of the letters from position Si to Sj, excluding the latter
- **Length N** of a string is the number of letters in it
- **Split Spliti** is the splitting of a string into two substrings at position i. Two new strings S0i and SiN are created
- **Split SplitDelimiter** is the splitting of a string into multiple substrings at each occurrence of the delimiter (which can be anything; typical examples are a space, a tab or a comma). The number of substrings is equal to the number of occurrences of the delimiter + 1
- **Prefix n** is a substring of the string S which contains the first n letters of S
- **Suffix n** is a substring of the string S which contains the last n letters of S
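
The definitions above map directly onto Python slicing. The following is a minimal sketch using the example string from the list; the values chosen for `i` and `n` are arbitrary illustrations, not part of the definitions.

```python
# Example string from the definitions above
S = "ATGGAATGCTAATAG"

i = 4  # split position (illustrative choice)
n = 3  # prefix/suffix length (illustrative choice)

left, right = S[:i], S[i:]   # split at position i: S0i and SiN
prefix = S[:n]               # the first n letters
suffix = S[-n:]              # the last n letters (needs n > 0)

print(left, right)     # ATGG AATGCTAATAG
print(prefix, suffix)  # ATG TAG
print(len(S))          # 15
```

Joining `left` and `right` again gives back the original string, since the split position belongs to exactly one of the two halves.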

## Position Si

We can access individual letters/symbols of a string in the same way as elements of a list.

- Indexing starts at 0
- Negative indices access the string from the back
- Only existing positions can be accessed; any other index raises an `IndexError` (string index out of range).
- Escape sequences such as `\n` (line break) or `\t` (tab) are written with an escape character but count as *one character*

```
S = "Examplestring"
S[0]   # "E"
S[1]   # "x"
S[-1]  # "g"
S[20]  # IndexError: string index out of range
```




In [8]:
S = "Examplestring"
print(S[0])
print(S[1])
print(S[-1])
print(S[15])

E
x
g


IndexError: string index out of range

### repr

If we access a character that has no visible representation (e.g. space, tab, line break), `print` shows it as if we hadn't accessed anything. Using `repr` inside `print` makes such characters visible by showing the string in quotes `''`, with escape sequences written out.

In [3]:
S = "Example string"
print(S[7])
print(repr(S[7]))

 
' '


In [7]:
S = "Example\nstring"

print("####")
print(S)

print("####")
print(repr(S))

print("####")
print(S[7])

print("####")
print(repr(S[7]))

print("####")

####
Example
string
####
'Example\nstring'
####


####
'\n'
####


## Substring Sij

To generate a substring, the string is sliced. This follows the slicing rules of lists:

- start index i is included
- end index j is excluded
- j can be higher than the length of the string; the slice then simply runs to the last letter/symbol
- the start or end index may be left empty if we want everything from the start/until the end

```
S = "Examplestring"
S[i:j]

S[2:5]  # "amp"
S[:5]   # same as S[0:5]: "Examp"
S[5:]   # same as S[5:13]: "lestring"
```




In [9]:
S = "Examplestring"
print(S[2:5])
print(S[:5])
print(S[5:])
print(S[5:50])

amp
Examp
lestring
lestring


## Length N

`len(S)` counts the number of symbols 

- including spaces, commas, linebreaks etc.
- `\n` etc. count as *one character*

In [12]:
S = "Examplestring"
S2 = "Example string"
S3 = "Example\nstring"

print(f"S has length {len(S)} and S2 and S3 have length {len(S2)} and {len(S3)}")

S has length 13 and S2 and S3 have length 14 and 14


# String methods

## split

`split` is a built-in string method that splits a string at each occurrence of a specific symbol/letter.

- It returns a list with the substrings
- by default it splits at whitespace (spaces, tabs, line breaks)
- otherwise the delimiter can be specified as an argument (can be anything, even several letters)
- the delimiter is *removed* from the result and is lost!
- If the delimiter is not encountered, `split` returns the entire string in a list of one element

Syntax:

```
string.split()

# to use "," as the delimiter:
string.split(",")
```

In [26]:
S = "This is an example"
split_list = S.split()
print(split_list)

['This', 'is', 'an', 'example']


In [30]:
S = "This is an example, so we need to make a bigger sentence"
split_list = S.split(",")
print(split_list)

['This is an example', ' so we need to make a bigger sentence']
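
The difference between the default behaviour and an explicit delimiter is easiest to see on a string with mixed whitespace. This is a small sketch; the example string is made up for illustration.

```python
# Made-up example: two spaces, a tab and a line break
S = "one  two\tthree\nfour"

# No argument: split at any run of whitespace, no empty strings
default_split = S.split()
print(default_split)   # ['one', 'two', 'three', 'four']

# Explicit single space: every space is a delimiter, so the double
# space produces an empty string; tab and line break are kept
space_split = S.split(" ")
print(space_split)     # ['one', '', 'two\tthree\nfour']

# Delimiter not present: the whole string comes back as one element
missing_split = S.split(";")
print(missing_split)   # ['one  two\tthree\nfour']
```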


## find

The string method `find` allows us to search for substrings.

- it returns the start index of the *first* occurrence of the search term
- case sensitivity applies
- if the search term is not present find returns -1

Syntax

```
string.find("term")
```




In [32]:
S = "This is an example"
idx = S.find("is")
print(idx)

idx2 = S.find("IS")
print(idx2)

2
-1
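
`find` also accepts an optional second argument, the index at which the search starts. A minimal sketch of how this can be used to step from one occurrence to the next (the variable names are illustrative):

```python
S = "This is an example"

first = S.find("is")               # search from the beginning
second = S.find("is", first + 1)   # continue after the first hit
third = S.find("is", second + 1)   # no further occurrence

print(first, second, third)  # 2 5 -1
```

Once `find` returns -1 there are no further occurrences, which makes -1 a natural stop condition if this is wrapped in a loop.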


## in

If we just want to know whether a substring is present and don't care *where* precisely it is, we can use `in`.

- it returns `True` or `False`

Examples:

    print("term" in string)
    
    if "term" in string:
        do something

In [33]:
S = "This is an example"
print("is" in S)

if "is" in S:
    print("Yeah")

True
Yeah


# The module re

This module contains functions for regular expression operations. It is part of the Python standard library.

- contains a lot of useful functions
- we will discuss only 3, but there are many more
- we call the functions with dot notation, e.g. `re.split`

Import syntax:

    import re

## split

Similar to the built-in `split`.

- several delimiters are allowed, separated by pipes `|`
- two arguments
  - the delimiter(s)
  - the string
- Returns a list with the substrings
  - if the string starts with a delimiter, the first element of the result list is an empty string
  - if two delimiters follow each other, this is also represented by an empty string
  - otherwise the delimiter is simply removed, like with the built-in `split`
  - The reason for this:
    - `split` opens a substring, which is closed when it comes across a delimiter.
    - If the first symbol is a delimiter, the substring is closed immediately after being opened and is empty.
    - The same happens if two delimiters follow each other: an empty string is produced.
    - The built-in `split` called without an argument ignores these empty strings, whereas `re.split` includes them (as does the built-in `split` when a delimiter is given explicitly).
    - to exclude the empty strings see the example below

Syntax:

    re.split("delimiter1|delimiter2", string)

In [59]:
import re

S = ",This is, a long, sentence; with weird punctuation,;!"
S_split0 = re.split(",|;",S)

print(S_split0)

#To exclude the empty strings:
S_split = [elem for elem in S_split0 if len(elem)>0]
print(S_split)

['', 'This is', ' a long', ' sentence', ' with weird punctuation', '', '!']
['This is', ' a long', ' sentence', ' with weird punctuation', '!']


## wildcard in re

The basic wildcard in re is `.`

- `.` stands for exactly one arbitrary character (by default any character except a line break)
- if a literal `.` is to be used as a delimiter, it needs to be combined with an escape character: `\.`
  - since Python expects certain symbols after `\` in a normal string and `.` is not one of them, this produces a syntax warning, although it still works
  - the correct syntax is then `\\.`, or equivalently the raw string `r"\."`

In [67]:
import re

S = "This is an example of two senteces. So I need a second sentence."
print(re.split("\.",S))

['This is an example of two senteces', ' So I need a second sentence', '']



In [66]:
import re

S = "This is an example of two senteces. So I need a second sentence."
print(re.split("\\.",S))

['This is an example of two senteces', ' So I need a second sentence', '']
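
The two roles of `.` can be put side by side. The sketch below uses raw strings (`r"..."`) so that the backslash reaches `re` unchanged; the example strings are made up for illustration.

```python
import re

dna = "ATGACGAAG"   # made-up example sequence

# As a wildcard, "." matches any single character
wildcard_hits = re.findall(r"A.G", dna)
print(wildcard_hits)   # ['ATG', 'ACG', 'AAG']

# Escaped, "\." matches only a literal dot
text = "One. Two. Three"
parts = re.split(r"\.", text)
print(parts)           # ['One', ' Two', ' Three']
```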


## finditer

Finds the start index of *all* occurrences of a search term.

`finditer` returns an iterator rather than a list, which makes getting to the indices slightly more involved.

Syntax using a list comprehension:

    idcs = [m.start() for m in re.finditer("a",S3)]

Explanation:

- `finditer` takes two arguments: the search term and the string.
- It produces an iterator which yields the match objects (= the found instances)
- by looping over this iterator we can look at the match objects
- An easy way to get to the start indices:
  - these match objects have a method `start`, which we call with dot notation and which returns the start index of the match
  - by looping over the match objects we can save the indices in a list
  - similarly there is a method called `end`, which returns the index directly after the end of the match

In [90]:
S = "This is an example, of two senteces, So I need a second, sentence."

idcs = [m.start() for m in re.finditer(",",S)]
print(idcs)

[18, 35, 55]


In [91]:
# accessing the match objects
test1 = re.finditer(",",S)

for m in test1:
    print(m)
    print(m.start())
    print(m.end())
    #print(dir(m))

<re.Match object; span=(18, 19), match=','>
18
19
<re.Match object; span=(35, 36), match=','>
35
36
<re.Match object; span=(55, 56), match=','>
55
56
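
The same pattern works for longer search terms. As a sketch, here are the positions of the motif "ATG" in the example string from the Definitions section; match objects also offer `span`, which returns start and end together as a tuple.

```python
import re

dna = "ATGGAATGCTAATAG"

# Start index of every (non-overlapping) occurrence of the motif
starts = [m.start() for m in re.finditer("ATG", dna)]

# (start, end) pairs; the end index is exclusive, as in slicing
spans = [m.span() for m in re.finditer("ATG", dna)]

print(starts)  # [0, 5]
print(spans)   # [(0, 3), (5, 8)]
```

Because the end index is exclusive, `dna[start:end]` reproduces the matched text for each pair.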


## findall

To see *how often* a substring occurs without caring where it occurs, `findall` is a good option.

- `findall` returns a list with the occurrences (which all look the same for a fixed search term)
- the length of that list is the number of occurrences

Syntax:

    Sf = re.findall("term",string)
    len(Sf)

In [92]:
S = "Hello, This is a long example"
Sf = re.findall("l",S)
print(Sf)
print(len(Sf))

['l', 'l', 'l', 'l']
4
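
For a fixed search term without wildcards, the built-in string method `count` gives the same number without needing `re`. A short sketch comparing the two; combining `count` with `lower` is one way to count case-insensitively.

```python
import re

S = "Hello, This is a long example"

via_findall = len(re.findall("l", S))   # count via re
via_count = S.count("l")                # count via the string method

# Case-insensitive count: lowercase the text first
h_any_case = S.lower().count("h")

print(via_findall, via_count)  # 4 4
print(h_any_case)              # 2
```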


# Modifying strings

Sometimes we need to make changes to strings, e.g. set sequence data to all uppercase or all lowercase so that case sensitivity doesn't cause problems.

## upper and lower

`upper` and `lower` are built-in string methods which turn a string into all uppercase or all lowercase.

- we assign the result to a new variable
- S itself remains unchanged, unless we reassign the result to S!
- `upper` and `lower` do not work in place

```
S = "This is a string"
S_upper = S.upper()
S_lower = S.lower()
```



In [119]:
S = "This is a string"
S_upper = S.upper()
S_lower = S.lower()

print(S)
print(S_upper)
print(S_lower)

This is a string
THIS IS A STRING
this is a string
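
A typical use is making a search independent of case. This sketch reuses the example string from the `find` section:

```python
S = "This is an example"

case_sensitive = S.find("IS")            # not found: case matters
case_insensitive = S.upper().find("IS")  # search in the uppercase copy
has_this = "this" in S.lower()           # membership test, ignoring case

print(case_sensitive)    # -1
print(case_insensitive)  # 2
print(has_this)          # True
```

The index found in the uppercase copy is also valid for `S`, because `upper` does not change the length of this string.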


## strip

`strip` removes whitespace at the beginning and end of the string.

- like with `upper` and `lower`, we need to assign the result to a new variable
- it doesn't change the original string, unless we reassign the result to the same variable

Syntax:

```
S = " This is a string"
S1 = S.strip()
```


In [120]:
S = " This is a string"
S1 = S.strip()
print(repr(S))
print(repr(S1))

' This is a string'
'This is a string'
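
If only one side should be cleaned, the related string methods `lstrip` and `rstrip` can be used. A small sketch with a made-up string that has whitespace on both sides:

```python
S = "  This is a string \n"

both = S.strip()    # both ends
left = S.lstrip()   # beginning only
right = S.rstrip()  # end only

print(repr(both))   # 'This is a string'
print(repr(left))   # 'This is a string \n'
print(repr(right))  # '  This is a string'
```

Whitespace inside the string is never touched by any of the three methods.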


## replace

`replace` changes one substring into another.

- takes two arguments
  - the substring I want to replace
  - what I want to replace it with

Syntax:

```
S = "This is a string"
Sreplace = S.replace("This","That")
```

Example of how to get rid of all spaces, line breaks and tabs:

```
S2 = "This is\ta string\n"   # example input
replacements = [" ", "\n", "\t"]
for repl in replacements:
    S2 = S2.replace(repl, "")
```




In [121]:
S = "This is a string"
Sreplace = S.replace("This","That")
print(repr(S))
print(repr(Sreplace))

'This is a string'
'That is a string'
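
`replace` changes *every* occurrence by default; an optional third argument limits the number of replacements. As a sketch, the DNA example string from the Definitions section is turned into its RNA spelling by replacing T with U:

```python
dna = "ATGGAATGCTAATAG"

rna = dna.replace("T", "U")            # replace all occurrences
first_only = dna.replace("T", "U", 1)  # replace only the first one

print(rna)         # AUGGAAUGCUAAUAG
print(first_only)  # AUGGAATGCTAATAG
print(dna)         # ATGGAATGCTAATAG (the original is unchanged)
```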


## Concatenating strings

Two strings can be combined with the `+` operator. They are joined directly, with nothing in between.

```
S1 = "This is a string"
S2 = "This is another string"
S3 = S1+S2

# S3 is now "This is a stringThis is another string"
```


In [123]:
S1 = "This is a string"
S2 = "This is another string"
S3 = S1+S2
print(S3)
S4 = S1 + " " + S2 + "!"
print(S4)

This is a stringThis is another string
This is a string This is another string!
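
When many strings have to be combined, the string method `join` is a common alternative to repeated `+`. It is called on the separator and takes a list of strings. A minimal sketch with a made-up word list:

```python
words = ["This", "is", "a", "string"]

sentence = " ".join(words)   # separator between the elements: one space
glued = "".join(words)       # empty separator: same result as repeated +

print(sentence)  # This is a string
print(glued)     # Thisisastring
```

`join` is in this sense the counterpart of `split`: splitting `sentence` at spaces gives back `words`.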


# Working with files

It is relatively simple to create files in Python. The first step is always opening a file.

## open

- Takes two arguments
  - the file name
  - what we want to do with the file
    - "w": writing, overwrites existing content
    - "a": appending, adds to an existing file (also works on non-existing files)
    - "r": reading, works only for existing files
- we always need to assign the file to a variable!!

```
file = open("TestFile.txt","w")
```

## write

To write into a file we use the file method `write`.

- takes a single argument
- has to be a string!
- doesn't create line breaks; if we want them we have to add them ourselves!!
- several strings can be combined with `+` before writing
- f-strings can be stored in a variable and written to the file like any other string
- several write statements underneath each other are all added to the file one after the other
  - no line breaks are added between them!! It works like `+` with strings
- the file doesn't need to exist for "w"; it is created when we open it (in the working directory, unless a path is given)

```
a = "test"
b = 34
file.write("string")

file.write("string1" + a + str(234) + str(b) + "\n")
```

## saving

To make sure that everything we wrote is saved in the file, we close it.

```
file.close()
```

Alternatively we put the whole thing in a `with` statement. The file is then closed, and thereby saved, automatically at the end of the block.

```
with open("Filename.txt", "w") as with_file:
    with_file.write(string)
```



In [150]:
file = open("Test.txt","w")
file.write("Hello world!\n")
# add a visual separator, 20 x #
file.write(20*'#' + "\n")
file.write("My name is Lisa")
file.close()

In [149]:
with open("Test2.txt", "w") as with_file:
    with_file.write("hello garden\n")
    with_file.write(50*'-' + "\n")
    with_file.write("My name is Lisa R\n")

In [151]:
file = open("Test.txt","a")
file.write("\nSome more lines\n")
file.close()
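
The mode "r" from the list above can be used to check what ended up in a file. The following is a sketch under assumptions: it writes its own small file into a temporary directory (so nothing is left behind) instead of reusing `Test.txt`, and uses the file methods `read` and line-by-line iteration, which are not covered above.

```python
import os
import tempfile

# Assumption: a temporary directory keeps the sketch self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Test.txt")

    with open(path, "w") as out_file:
        out_file.write("Hello world!\n")
        out_file.write("My name is Lisa\n")

    # read() returns the whole file as one string
    with open(path, "r") as in_file:
        content = in_file.read()

    # looping over the file yields one line at a time, including "\n"
    with open(path, "r") as in_file:
        lines = [line.strip() for line in in_file]

print(repr(content))  # 'Hello world!\nMy name is Lisa\n'
print(lines)          # ['Hello world!', 'My name is Lisa']
```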

# Exercises

## Exercise 1 - Simple Strings

- Write a Python script that defines five string variables containing these Strings:
  - “AAAAAAAAAAAAAAAAA”
  - “ACTGACTGACTGACTGACTG”
  - “HAIMGVVFTWIMALACAAPPLVGWSRY”
  - “SSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLG”
  - “PFSNVTGVVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQH”
- Print out the length of all five variables.
- Print out the letters at the indices 0,5,10,15,20 and 25 of the five strings if possible
- Create the substrings S-5-20, S-20-30, S-2-10 of all five strings if possible

In [23]:
#create the strings
S1 = "AAAAAAAAAAAAAAAAA"
S2 = "ACTGACTGACTGACTGACTG"
S3 = "HAIMGVVFTWIMALACAAPPLVGWSRY"
S4 = "SSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLG"
S5 = "PFSNVTGVVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQH"

#print the lengths
print(f"String 1 has length {len(S1)}")
print(f"String 2 has length {len(S2)}")
print(f"String 3 has length {len(S3)}")
print(f"String 4 has length {len(S4)}")
print(f"String 5 has length {len(S5)}")

String 1 has length 17
String 2 has length 20
String 3 has length 27
String 4 has length 33
String 5 has length 54


In [24]:
# I make two lists of indices, since String 1 and 2 won't have indices 20 and 25
idc1 = [0,5,10,15]
idc2 = [0,5,10,15,20,25]

#Loop over printing the indices using idc1 for S1 and S2 and idc2 for S3-5
for idx in idc1:
    print(S1[idx])
print()

for idx in idc1:
    print(S2[idx])
print()

for idx in idc2:
    print(S3[idx])
print()

for idx in idc2:
    print(S4[idx])
print()

for idx in idc2:
    print(S5[idx])
print()

A
A
A
A

A
C
T
G

H
V
I
C
L
R

S
N
I
Q
M
C

P
T
S
P
A
F



In [25]:
#Create the substrings S-5-20, S-20-30, S-2-10 of all five strings if possible (20:30 is not possible for S1 and S2)

S1sub5_20 = S1[5:20]
S1sub2_10 = S1[2:10]

S2sub5_20 = S2[5:20]
S2sub2_10 = S2[2:10]

S3sub5_20 = S3[5:20]
S3sub20_30 = S3[20:30]
S3sub2_10 = S3[2:10]

S4sub5_20 = S4[5:20]
S4sub20_30 = S4[20:30]
S4sub2_10 = S4[2:10]

S5sub5_20 = S5[5:20]
S5sub20_30 = S5[20:30]
S5sub2_10 = S5[2:10]

#print

print(S1sub5_20)
print(S1sub2_10)
print()
print(S2sub5_20)
print(S2sub2_10)
print()
print(S3sub5_20)
print(S3sub20_30)
print(S3sub2_10)
print()
print(S4sub5_20)
print(S4sub20_30)
print(S4sub2_10)
print()
print(S5sub5_20)
print(S5sub20_30)
print(S5sub2_10)

AAAAAAAAAAAA
AAAAAAAA

CTGACTGACTGACTG
TGACTGAC

VVFTWIMALACAAPP
LVGWSRY
IMGVVFTW

NPVIYIMLNKQFRNC
MLTTLCCGKN
SIYNPVIY

TGVVRSPFEQPQYYL
AEPWQFSMLA
SNVTGVVR


## Exercise 2 - String Methods

Use the strings from Exercise 01 to fulfill the following tasks:

- At which index are the first occurrences of the letters A, C, G, N, T and U
- Split the strings at each appearance of the patterns “TG”, “TT” and “AA”
- Use if statements to test whether the strings contain the patterns “FLL” and “MLT”

In [None]:
#create the strings
S1 = "AAAAAAAAAAAAAAAAA"
S2 = "ACTGACTGACTGACTGACTG"
S3 = "HAIMGVVFTWIMALACAAPPLVGWSRY"
S4 = "SSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLG"
S5 = "PFSNVTGVVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQH"

# I put them in a dictionary to make looping over them easier:
stringdict = {}
stringdict["S1"] = S1
stringdict["S2"] = S2
stringdict["S3"] = S3
stringdict["S4"] = S4
stringdict["S5"] = S5

In [44]:
# Where do the letters of list1 first occur in the strings?
# This is the "long" loop

list1 = ["A","C","G","N","T","U"]

for elem in list1:
    print(f"First occurrence of {elem} in S1 is at index {S1.find(elem)}")
    print(f"First occurrence of {elem} in S2 is at index {S2.find(elem)}")
    print(f"First occurrence of {elem} in S3 is at index {S3.find(elem)}")
    print(f"First occurrence of {elem} in S4 is at index {S4.find(elem)}")
    print(f"First occurrence of {elem} in S5 is at index {S5.find(elem)}")
    print()

First occurrence of A in S1 is at index 0
First occurrence of A in S2 is at index 0
First occurrence of A in S3 is at index 1
First occurrence of A in S4 is at index -1
First occurrence of A in S5 is at index 20

First occurrence of C in S1 is at index -1
First occurrence of C in S2 is at index 1
First occurrence of C in S3 is at index 15
First occurrence of C in S4 is at index 19
First occurrence of C in S5 is at index -1

First occurrence of G in S1 is at index -1
First occurrence of G in S2 is at index 3
First occurrence of G in S3 is at index 4
First occurrence of G in S4 is at index 27
First occurrence of G in S5 is at index 6

First occurrence of N in S1 is at index -1
First occurrence of N in S2 is at index -1
First occurrence of N in S3 is at index -1
First occurrence of N in S4 is at index 5
First occurrence of N in S5 is at index 3

First occurrence of T in S1 is at index -1
First occurrence of T in S2 is at index 2
First occurrence of T in S3 is at index 8
First occurrence of T in S4 is at index 22

In [45]:
# Where do the letters of list1 first occur in the strings?
#This is the short dictionary loop

list1 = ["A","C","G","N","T","U"]

for elem in list1:
    for key, value in stringdict.items():
        print(f"First occurrence of {elem} in {key} is at index {value.find(elem)}")
    print()

First occurrence of A in S1 is at index 0
First occurrence of A in S2 is at index 0
First occurrence of A in S3 is at index 1
First occurrence of A in S4 is at index -1
First occurrence of A in S5 is at index 20

First occurrence of C in S1 is at index -1
First occurrence of C in S2 is at index 1
First occurrence of C in S3 is at index 15
First occurrence of C in S4 is at index 19
First occurrence of C in S5 is at index -1

First occurrence of G in S1 is at index -1
First occurrence of G in S2 is at index 3
First occurrence of G in S3 is at index 4
First occurrence of G in S4 is at index 27
First occurrence of G in S5 is at index 6

First occurrence of N in S1 is at index -1
First occurrence of N in S2 is at index -1
First occurrence of N in S3 is at index -1
First occurrence of N in S4 is at index 5
First occurrence of N in S5 is at index 3

First occurrence of T in S1 is at index -1
First occurrence of T in S2 is at index 2
First occurrence of T in S3 is at index 8
First occurrence of T in S4 is at index 22

In [47]:
#“TG”, “TT” and “AA”
list2 = ["TG","TT","AA"]

for elem in list2:
    print(elem)
    for key, value in stringdict.items():
        print(f"{key}: {value.split(elem)}")
    print()

TG
S1: ['AAAAAAAAAAAAAAAAA']
S2: ['AC', 'AC', 'AC', 'AC', 'AC', '']
S3: ['HAIMGVVFTWIMALACAAPPLVGWSRY']
S4: ['SSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLG']
S5: ['PFSNV', 'VVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQH']

TT
S1: ['AAAAAAAAAAAAAAAAA']
S2: ['ACTGACTGACTGACTGACTG']
S3: ['HAIMGVVFTWIMALACAAPPLVGWSRY']
S4: ['SSSIYNPVIYIMLNKQFRNCML', 'LCCGKNPLG']
S5: ['PFSNVTGVVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQH']

AA
S1: ['', '', '', '', '', '', '', '', 'A']
S2: ['ACTGACTGACTGACTGACTG']
S3: ['HAIMGVVFTWIMALAC', 'PPLVGWSRY']
S4: ['SSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLG']
S5: ['PFSNVTGVVRSPFEQPQYYLAEPWQFSML', 'YMFLLIVLGFPINFLTLYVTVQH']



In [49]:
#Use if statements to test whether the strings contain the patterns “FLL” and “MLT”
for key, value in stringdict.items():
    if "FLL" in value: print(f"FLL is present in {key}")
    else:  print(f"FLL is not present in {key}")
    if "MLT" in value:  print(f"MLT is present in {key}")
    else: print(f"MLT is not present in {key}")

FLL is not present in S1
MLT is not present in S1
FLL is not present in S2
MLT is not present in S2
FLL is not present in S3
MLT is not present in S3
FLL is not present in S4
MLT is present in S4
FLL is present in S5
MLT is not present in S5


## Exercise 3 - Working with Strings

- Take the “lorem ipsum” String from down below and use it as a variable in Python
- Complete the following tasks:
  - How long is this text?
  - How many words are in this text? (split and len)
  - How often does each letter of the alphabet occur in this text? (re, len and a loop)
  - Are the words “minim”, “anim”, “caesar”, “brutus” and “duis” in the string?

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

In [93]:
import re
import string
S = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

In [96]:
#Finding the characters and words
print(f"The text has {len(S)} characters.")
print(f"The text has {len(S.split())} words.")

The text has 445 characters.
The text has 69 words.


In [117]:
#making a list of all letters of the alphabet
#(named all_letters so that the built-in all() is not overwritten)
all_letters = list(string.ascii_letters)
print(all_letters)

#the long version
for letr in all_letters:
    print(f"{letr} occurs {len(re.findall(letr,S))} times.")

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
a occurs 29 times.
b occurs 3 times.
c occurs 16 times.
d occurs 18 times.
e occurs 37 times.
f occurs 3 times.
g occurs 3 times.
h occurs 1 times.
i occurs 42 times.
j occurs 0 times.
k occurs 0 times.
l occurs 21 times.
m occurs 17 times.
n occurs 24 times.
o occurs 29 times.
p occurs 11 times.
q occurs 5 times.
r occurs 22 times.
s occurs 18 times.
t occurs 32 times.
u occurs 28 times.
v occurs 3 times.
w occurs 0 times.
x occurs 3 times.
y occurs 0 times.
z occurs 0 times.
A occurs 0 times.
B occurs 0 times.
C occurs 0 times.
D occurs 1 times.
E occurs 1 times.
F occurs 0 times.
G occurs 0 times.
H occurs 0 times.
I occurs 0 times.
J occurs 0 times.
K occurs 0 times.
L occurs 1 times.
M occurs 0 times.
N occurs 0 times.
O occ

In [116]:
#making lists of all lowercase and all uppercase letters of the alphabet
lower = list(string.ascii_lowercase)
# print(lower)
upper = list(string.ascii_uppercase)

#the (slightly) more fancy version
for LET,let in zip(upper,lower):
    print(f"{LET} occurs {len(re.findall(LET,S))} times and {let} occurs {len(re.findall(let,S))} times.")
    print(f"{LET} or {let} occur a total of {len(re.findall(LET,S)) + len(re.findall(let,S))} times.")
    print()

A occurs 0 times and a occurs 29 times.
A or a occur a total of 29 times.

B occurs 0 times and b occurs 3 times.
B or b occur a total of 3 times.

C occurs 0 times and c occurs 16 times.
C or c occur a total of 16 times.

D occurs 1 times and d occurs 18 times.
D or d occur a total of 19 times.

E occurs 1 times and e occurs 37 times.
E or e occur a total of 38 times.

F occurs 0 times and f occurs 3 times.
F or f occur a total of 3 times.

G occurs 0 times and g occurs 3 times.
G or g occur a total of 3 times.

H occurs 0 times and h occurs 1 times.
H or h occur a total of 1 times.

I occurs 0 times and i occurs 42 times.
I or i occur a total of 42 times.

J occurs 0 times and j occurs 0 times.
J or j occur a total of 0 times.

K occurs 0 times and k occurs 0 times.
K or k occur a total of 0 times.

L occurs 1 times and l occurs 21 times.
L or l occur a total of 22 times.

M occurs 0 times and m occurs 17 times.
M or m occur a total of 17 times.

N occurs 0 times and n occurs 24 time

In [109]:
# Are the words “minim”, “anim”, “caesar”, “brutus” and “duis” in the string?
list_words = ["minim","anim","caesar", "brutus","duis"]

for word in list_words:
    if word in S:
        print(f"{word} is part of the string.")
    else:
        print(f"{word} is not part of the string.")

minim is part of the string.
anim is part of the string.
caesar is not part of the string.
brutus is not part of the string.
duis is not part of the string.


## Exercise 4 - A bit more about Strings

Take the “lorem ipsum” string from Exercise 03 and complete the following tasks:

- Replace the strings et, in and eu with at, on and au
- Create an all lower case and an all upper case variant of it
- Split the text into its words and concatenate them together again (split and a loop)

In [124]:
import re
Lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

In [126]:
#Replace syllables
Lorem_repl = Lorem.replace("et","at")
Lorem_repl = Lorem_repl.replace("in","on")
Lorem_repl = Lorem_repl.replace("eu","au")

print(Lorem_repl)

Lorem ipsum dolor sit amat, consectatur adipiscong elit, sed do eiusmod tempor oncididunt ut labore at dolore magna aliqua. Ut enim ad monim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor on reprehenderit on voluptate velit esse cillum dolore au fugiat nulla pariatur. Exceptaur sont occaecat cupidatat non proident, sunt on culpa qui officia deserunt mollit anim id est laborum.


In [128]:
#Write text in all upper and all lower case
Lorem_upper = Lorem.upper()
Lorem_lower = Lorem.lower()

print(Lorem_upper)
print()
print(Lorem_lower)

LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISCING ELIT, SED DO EIUSMOD TEMPOR INCIDIDUNT UT LABORE ET DOLORE MAGNA ALIQUA. UT ENIM AD MINIM VENIAM, QUIS NOSTRUD EXERCITATION ULLAMCO LABORIS NISI UT ALIQUIP EX EA COMMODO CONSEQUAT. DUIS AUTE IRURE DOLOR IN REPREHENDERIT IN VOLUPTATE VELIT ESSE CILLUM DOLORE EU FUGIAT NULLA PARIATUR. EXCEPTEUR SINT OCCAECAT CUPIDATAT NON PROIDENT, SUNT IN CULPA QUI OFFICIA DESERUNT MOLLIT ANIM ID EST LABORUM.

lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


In [141]:
# split the words
Lorem_split = [elem for elem in re.split(" |,|\\.", Lorem) if len(elem)>0]
print(Lorem_split)
print()

#concatenate the words
Lorem_conc = ""
for elem in Lorem_split: 
    Lorem_conc+=elem

print(Lorem_conc)

['Lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur', 'adipiscing', 'elit', 'sed', 'do', 'eiusmod', 'tempor', 'incididunt', 'ut', 'labore', 'et', 'dolore', 'magna', 'aliqua', 'Ut', 'enim', 'ad', 'minim', 'veniam', 'quis', 'nostrud', 'exercitation', 'ullamco', 'laboris', 'nisi', 'ut', 'aliquip', 'ex', 'ea', 'commodo', 'consequat', 'Duis', 'aute', 'irure', 'dolor', 'in', 'reprehenderit', 'in', 'voluptate', 'velit', 'esse', 'cillum', 'dolore', 'eu', 'fugiat', 'nulla', 'pariatur', 'Excepteur', 'sint', 'occaecat', 'cupidatat', 'non', 'proident', 'sunt', 'in', 'culpa', 'qui', 'officia', 'deserunt', 'mollit', 'anim', 'id', 'est', 'laborum']

LoremipsumdolorsitametconsecteturadipiscingelitseddoeiusmodtemporincididuntutlaboreetdoloremagnaaliquaUtenimadminimveniamquisnostrudexercitationullamcolaborisnisiutaliquipexeacommodoconsequatDuisauteiruredolorinreprehenderitinvoluptatevelitessecillumdoloreeufugiatnullapariaturExcepteursintoccaecatcupidatatnonproidentsuntinculpaquiofficiadeseruntmollit

## Exercise 5 - Writing Files

Write a program that creates two different files.

- The first file should be opened normally and two lines should be written in it.
- Afterwards close the file, open it again and add another three lines of text to the file (use the mode "a" for append)
- The second file should be generated by using "with" and a loop to write a countdown from 10 to 0
- Each number should be in a separate line
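
One possible way to approach this exercise is sketched below. It is not the only solution: the file names and line contents are made up, and the files are written into a temporary directory (an assumption made here so the sketch does not clutter the working directory).

```python
import os
import tempfile

# Assumption: temporary directory and file names are illustrative
out_dir = tempfile.mkdtemp()
first_path = os.path.join(out_dir, "first.txt")
second_path = os.path.join(out_dir, "countdown.txt")

# First file: open normally, write two lines, close
file = open(first_path, "w")
file.write("Line 1\n")
file.write("Line 2\n")
file.close()

# Open the same file again in append mode and add three more lines
file = open(first_path, "a")
for number in range(3, 6):
    file.write(f"Line {number}\n")
file.close()

# Second file: "with" and a loop, countdown from 10 to 0,
# each number on its own line
with open(second_path, "w") as with_file:
    for number in range(10, -1, -1):
        with_file.write(f"{number}\n")
```

`range(10, -1, -1)` counts down from 10 and stops before -1, so 0 is the last number written.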

# Extras

## Accessing the letters of the alphabet

In [118]:
# Python program to print lists of the letters of the alphabet.
# Importing the string module.
import string
# Printing a list of the lowercase letters.
lower = list(string.ascii_lowercase)
print(lower)
# Printing a list of the uppercase letters.
upper = list(string.ascii_uppercase)
print(upper)
# Printing a list of the lowercase and uppercase letters
# (named all_letters so that the built-in all() is not overwritten)
all_letters = list(string.ascii_letters)
print(all_letters)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
