# 1. Simple data types
### 1.1 Strings

In [43]:
# You initialize a string and give it a name of your choice
amazing_string = "Hello"
second_amazing_string = "students"
third_amazing_string = "!"

When you initialize a variable (in this case, a string), you are only giving Python some information to store under the name you chose. This means that if you run the above cell, you won't see any output. However, Python now knows the strings that you defined, and you can use them in the following cells.
> Please keep in mind that writing code in a cell is not enough: you also need to run it with the small play button on the left.

You can concatenate (merge) several strings with the `+` operator.

In [44]:
all_strings = amazing_string + second_amazing_string + third_amazing_string
print(all_strings)
# For readability, add a space between the first two words: " " is a string too, because it is delimited by quotes
all_strings_space = amazing_string + " " + second_amazing_string + third_amazing_string
print(all_strings_space)

Hellostudents!
Hello students!


### Excursus: Python's built-in words

Programming languages like Python have many similarities with natural languages. For example, they **already have their own vocabulary**: Python's built-in words, which Python knows from the start. Some examples are `print` and `len`, which we have already seen. By contrast, **variable names** need to be defined by you. Python only learns them once you define them, and then remembers them as long as your notebook is running (for example, in the next cell you can still operate on the string you defined in the previous cell).
> In our use cases, the built-in words will mostly be **functions** and **methods**. Think of them as **performing an action** on the variables. For example, `len` computes the length of its input and returns it to you. The input itself (the **variable**) is something that **you will define**.

Naming variables is largely up to you. There are only a few strict rules (for example, a name cannot start with a digit, cannot contain spaces, and cannot be a reserved keyword such as `if` or `for`); everything else is convention. What you need to avoid is **naming variables after Python's built-in words**. For example, naming a string "`print`" would be a very bad idea.
> Keep in mind that Python's data types are built-in as well. Therefore, **do not name your variables after a data type**. In practice: do not name your string "str", your list "list", your integer "int", etc. **Even if it seems to work at first, it will create conflicts as you keep coding.**
> Tip: If you are unsure what to name a variable, choose something personal. Common conventions are names like "`my_string`" or "`new_string`". If you are handling specific data (e.g., your first name), a safe option is to refer to that in the name, e.g. `first_name`.
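Here is a minimal sketch of why shadowing a built-in name causes trouble. Using `list` as a variable name hides the built-in `list()` function until the variable is deleted with `del`. The values used here are only illustrative.

```python
# Shadow the built-in list() by using "list" as a variable name (don't do this!)
list = [1, 2, 3]

try:
    letters = list("abc")  # Python now tries to call our variable, not the built-in function
except TypeError as error:
    print("TypeError:", error)

# Deleting our variable makes the built-in list() visible again
del list
letters = list("abc")
print(letters)
```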

One of the questions that might arise is: How do I know Python's built-in words? <br>
Typically, you will learn them through experience. There are many built-in words, and you won't need (or remember) them all. While you are expected to remember the basic ones (`print`, `len`, and the others that you will see in these classes), you will also need to **search for solutions** when you need a specific operation. Do not worry: it all comes with experience, and you will remember more and more of them as you use them! <br>
A good resource is [**Python's official documentation**](https://docs.python.org/3/tutorial/index.html), where you can look them up when you need them.
> Hint: In the official Python documentation, strings are explained under "text".

> Hint: In VS Code (VSC), if you type a name that is a built-in word, the editor will recognize it and **highlight it in a specific color**. If you type a variable name that is not defined yet, the editor will underline it to flag it as unrecognized. Beware that once you write `=` after the variable name, the editor won't underline it anymore, because it expects you to be defining it.
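You can also check from inside Python whether a name is already taken. The sketch below uses the standard-library modules `builtins` and `keyword` and is only one possible approach; the documentation remains the main reference. Built-in names are the ones Python knows without any import. Keywords such as `for` or `if` are reserved and can never be used as variable names.

```python
import builtins
import keyword

# Names that Python knows without any import (functions, data types, errors, ...)
builtin_names = dir(builtins)
print("len" in builtin_names)        # True: len is a built-in function
print("str" in builtin_names)        # True: data types are built-in too
print("my_string" in builtin_names)  # False: this is a name you define yourself

# Reserved keywords cannot be used as variable names at all
print(keyword.iskeyword("for"))      # True
print(keyword.iskeyword("len"))      # False: len is a built-in, not a keyword

# Every built-in function carries a short description of what it does
print(len.__doc__)
```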

We will now see basic operations with strings.

In [45]:
my_string="Python is fun."
print(my_string)

Python is fun.


In [46]:
# Count the length of the string (length = number of characters).
len(my_string)

14

In [47]:
# Access a specific character of the string
my_string[3] # Important: Python indexing starts at 0, so this returns the 4th character

'h'

Here, you are directly evaluating an expression. A notebook automatically displays the value of the last expression in a cell, so you do not need `print`. This is different from the string initialization that you did above: an assignment such as `my_string = "Python is fun."` only stores a value and displays nothing. That is why you needed a `print` statement to see the result there, and you do not need it now.
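Because indexing starts at 0, the last character of a string is at index `len(string) - 1`, not `len(string)`. This small sketch uses the same example string:

```python
my_string = "Python is fun."

first_char = my_string[0]                  # index 0 is the 1st character
last_char = my_string[len(my_string) - 1]  # the last valid index is len - 1 (here 13)
print(first_char, last_char)

try:
    my_string[len(my_string)]              # one past the end: there is no such character
except IndexError as error:
    print("IndexError:", error)
```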

In [48]:
# Change capitalization of the string
upper_string = my_string.upper()
print(upper_string)
lower_string = my_string.lower()
print(lower_string)
capitalized_string = my_string.capitalize() # uppercases the first character of the string and lowercases all the others
print(capitalized_string)

PYTHON IS FUN.
python is fun.
Python is fun.
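On "Python is fun." the result of `capitalize()` looks unchanged. The difference between `capitalize()` and the related method `title()`, which is not covered above, is easier to see on a string with mixed capitalization. The example string is arbitrary:

```python
messy = "hello WORLD"
print(messy.capitalize())  # Hello world: first character uppercase, all the others lowercase
print(messy.title())       # Hello World: first character of every word uppercase
```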


In [49]:
# Split a string into words
words = my_string.split()
print(words)
# If no argument is passed (= there is nothing inside the parentheses), split() splits the string into words, using any whitespace (spaces, tabs, newlines) as the separator.
# However, you can also split the string based on a specific character, for example a comma:
comma_string = "Hello, my dear friend, are you doing well?"
print(comma_string.split(",")) # the separator is not included in the output, and it needs to be passed as a string (= in quotes)

['Python', 'is', 'fun.']
['Hello', ' my dear friend', ' are you doing well?']


A very useful built-in method for cleaning a string is `strip()`. It removes **leading and trailing whitespace**: spaces, tabs, and newlines at the beginning and at the end of the string. It does not remove whitespace **inside** the string.

In [50]:
# Remove whitespaces at the beginning and end of the string
string_to_strip = "             let's clean this string    "
stripped_string = string_to_strip.strip()
print(stripped_string)
# Strip only left side of the string
stripped_string_left = string_to_strip.lstrip()
print(stripped_string_left)
# Strip only right side of the string
stripped_string_right = string_to_strip.rstrip()
print(stripped_string_right)

let's clean this string
let's clean this string    
             let's clean this string
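Splitting on `","` leaves a space at the start of every piece after the first, as you saw in the `split()` example. This minimal sketch combines `split()` with `strip()` to clean each piece. `repr()` is used only to make the leading space visible.

```python
comma_string = "Hello, my dear friend, are you doing well?"

parts = comma_string.split(",")
print(repr(parts[1]))          # ' my dear friend' (note the leading space)
print(repr(parts[1].strip()))  # 'my dear friend'

# Clean every piece, one at a time, with a for loop
for part in parts:
    print(part.strip())
```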


In [51]:
# Substitute a substring
replaced_string = my_string.replace("fun", "boring")
print(replaced_string)

Python is boring.


> Important: Strings are **immutable**. This means that once you have created a string, you cannot modify it. What the `replace` method actually does is *return a new string*: a copy of the old one, with the change applied. The original string stays the same.
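You can check immutability by trying to change one character. Python raises a `TypeError`, while `replace()` leaves the original untouched and gives back a new string. The variable names below are illustrative:

```python
original = "Python is fun."

try:
    original[0] = "J"  # try to change the first character in place
except TypeError as error:
    print("TypeError:", error)

changed = original.replace("fun", "boring")
print(original)  # Python is fun.     (the original is untouched)
print(changed)   # Python is boring.  (a brand-new string)
```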

Python automatically recognizes the data type of the variables you initialize. A string is **anything enclosed in quotes**. For example, a string can contain digits: if it is delimited by quotes, it is still of data type string and not integer (whole number, `int`).
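This small sketch compares the string `"42"` with the integer `42` using the built-in `type()` function. The `+` operator concatenates strings but adds numbers.

```python
text_number = "42"  # quotes: this is a string
real_number = 42    # no quotes: this is an integer

print(type(text_number))          # <class 'str'>
print(type(real_number))          # <class 'int'>

print(text_number + text_number)  # 4242: + concatenates strings
print(real_number + real_number)  # 84: + adds numbers
```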

In [52]:
# Check if a string contains only numerical characters
num_string = "1" 
num_string.isdigit()

True

In [53]:
# Check if a string contains a specific substring
if "Python" in my_string: # "in" is a built-in word!
    print("Found Python!")

# Check where exactly it is (at which index)
my_string.find("Python")

Found Python!


0
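`find()` returns the index where the substring starts, and `-1` when the substring is not found. By comparison, `in` simply returns `True` or `False`. A short sketch:

```python
my_string = "Python is fun."

position_fun = my_string.find("fun")    # "fun" starts at index 10
position_java = my_string.find("Java")  # -1: "Java" does not appear in the string
print(position_fun)
print(position_java)

# A typical pattern: check the result before using it as an index
if position_java == -1:
    print("Java not found")
```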

In [54]:
# Count how many times a substring appears
my_string.count("Python")

1

In the `in` example above, we had to use a **control-flow** statement (`if`). Remember that in class we talked about `while`, `if`, `elif`, `else`, and `for`. They can be useful to apply the actions seen above **conditionally** or **repeatedly**. We will see this better in the section about **lists**, but first, here are some exercises about **strings**.

In [55]:
# Exercise 1
# Initialize two strings and merge them into a single string. Remember to add a space if necessary.

In [56]:
# Exercise 2
# Take the merged string of the above cell and turn all characters to uppercase

In [57]:
# Exercise 3
# Look for all the occurrences of "a" in the string

In [58]:
# Exercise 4
# Capitalize all the "a" characters in the string
# Hint: instead of using "upper", try replacing the character with its capitalized version


In [59]:
# Exercise 5
# Add the following string to your string: "Andrew, I am astonished." Count how many times the letter 'a' (or 'A') appears, then lowercase the whole string.

### 1.2 Integers (`int`)

In [60]:
# Remember: you need to define your variables
i = 3
j = 5
print(i)

3


We already saw briefly in the first week that you can easily perform mathematical operations with Python; we will come back to them below.

Python **automatically recognizes** the data type of the objects you initialize. You can convert an object to another type with the built-in functions `int()`, `float()`, and `str()`:

In [61]:
# Conversions
x = int(3.5) # int() drops the decimal part
print(x)
y = float(5)
print(y)
z = str(42) # the string "42", not the number 42
print(z)

3
5.0
42
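These conversions have a few details worth knowing. `int()` truncates toward zero instead of rounding. It accepts strings made only of digits, but raises a `ValueError` for a string like `"3.5"`. The variable names in this sketch are illustrative.

```python
truncated = int(3.9)            # 3: int() drops the decimal part, it does not round
truncated_negative = int(-3.9)  # -3: truncation goes toward zero
from_text = int("42") + 1       # 43: a string made only of digits converts to an int
print(truncated, truncated_negative, from_text)

try:
    int("3.5")                  # a decimal point is not allowed in a string passed to int()
except ValueError as error:
    print("ValueError:", error)

print(float("3.5"))             # 3.5: float() accepts the same string
```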


Some built-in functions for integers:

In [62]:
# Return maximum value
max(5, 3)

5

In [63]:
# Return minimum value
i=5
j=3
min(i, j)

3

In [64]:
# Return sum: sum() expects a collection of numbers (here, a list), not separate numbers
sum([i, j])

8

In [65]:
# Return absolute value
k = -1
abs(k)

1

In [66]:
# Perform numerical operations with the arithmetic operators that you know from basic maths (they are built-in too)
summed_ints = 3+5
# Of course, this works too (i = 5 and j = 3 from the cells above)
summed_ints = i+j
print(summed_ints)
subtracted = 5-3
print(subtracted)
multiplied = 5*3
print(multiplied)
divided = 5/3
print(divided)
power = 5**3 # 5 raised to the power of 3
print(power)

8
2
15
1.6666666666666667
125
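`/` always produces a float. Python also has two related operators not shown above: `//` (floor division) and `%` (modulo, the remainder of a division). A minimal sketch:

```python
exact = 6 / 3      # 2.0: "/" always returns a float, even for an exact division
quotient = 5 // 3  # 1: floor division rounds the result down to a whole number
remainder = 5 % 3  # 2: modulo gives the remainder of the division
print(exact, quotient, remainder)

# quotient * divisor + remainder gives back the original number
print(quotient * 3 + remainder)  # 5
```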


Control flow:

In [67]:
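count = 0 # let's initialize an integer variable called "count"

while count <= 3:
    count += 1
    print(count)
# the loop body runs as long as the condition is True: when count is 3, 3 <= 3 is still True, so count becomes 4 and is printed; then 4 <= 3 is False and the loop stops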

1
2
3
4


In [68]:
for i in range(7): # range() is a built-in function useful for looping: range(7) produces 0, 1, ..., 6
    if i < 3:
        print(f"{i} is lower than 3.")
    else:
        print(f"{i} is greater than or equal to 3.")

0 is lower than 3.
1 is lower than 3.
2 is lower than 3.
3 is greater than or equal to 3.
4 is greater than or equal to 3.
5 is greater than or equal to 3.
6 is greater than or equal to 3.
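`range()` comes up constantly in loops, including in Exercise 7. This sketch shows its three forms; `list()` is used only to display all the generated numbers at once:

```python
stop_only = list(range(7))         # [0, 1, 2, 3, 4, 5, 6]: 7 itself is excluded
start_stop = list(range(2, 7))     # [2, 3, 4, 5, 6]: starts at 2 instead of 0
with_step = list(range(0, 10, 3))  # [0, 3, 6, 9]: counts up in steps of 3
print(stop_only)
print(start_stop)
print(with_step)
```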


##### Exercises

In [69]:
# Exercise 6
i1 = 6
i2 = -4
# Return the absolute value of i2 and divide it by i1, then convert the obtained result to float

In [70]:
# Exercise 7
# Write a snippet that prints i+1 for all 0 < i < 5, and "Number is too big" otherwise. Test this exercise for all numbers from 0 to 7.

# 2. Complex data types
### 2.1 Lists (`list`)
We talked in class about **lists, tuples, sets, and dictionaries**.

In [None]:
# Initialize a list
my_list = [1,2,3,4,5]
print(my_list)

[1, 2, 3, 4, 5]


In [92]:
# Indexing
my_list[0]

1


In [73]:
# Slicing
# Remember: the element at the end index of the slice is not included
print(my_list[1:3]) # elements at indices 1 and 2 (index 3 excluded)
print(my_list[:3]) # elements from the start up to index 3 (excluded)
print(my_list[1:]) # elements from index 1 until the end

[2, 3]
[1, 2, 3]
[2, 3, 4, 5]


In [74]:
# Appending
my_list.append(6)
print(my_list)

[1, 2, 3, 4, 5, 6]


In [75]:
# Inserting
my_list.insert(1, 15) # add 15 at index 1
print(my_list)

[1, 15, 2, 3, 4, 5, 6]

In [76]:
# Removing
my_list.remove(2) # removes the first element whose VALUE is 2 (not the element at index 2)
print(my_list)

[1, 15, 3, 4, 5, 6]


In [77]:
del my_list[3] # removes the element at index 3 (del is a statement, so no parentheses are needed)
print(my_list)

[1, 15, 3, 5, 6]


In [78]:
my_list_2 = [0, 24]
lists = my_list+my_list_2 # concatenate two lists with +, just like strings
print(lists)

[1, 15, 3, 5, 6, 0, 24]


Some useful **built-in operations**:

In [79]:
print(len(my_list))      # Number of elements
print(max(my_list))      # Maximum value
print(min(my_list))      # Minimum value
print(sum(my_list))      # Sum of elements

5
15
1
30


In [80]:
# Reverse the elements of the list
my_list.reverse()
print(my_list)

[6, 5, 3, 15, 1]
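String methods such as `upper()` or `replace()` return a new string, as the note on immutability explained. List methods such as `reverse()` work differently: they change the list **in place** and return `None`. This minimal sketch shows this common pitfall, using illustrative names:

```python
numbers = [6, 5, 3, 15, 1]
result = numbers.reverse()  # reverse() changes numbers itself...
print(result)               # ...and returns None
print(numbers)              # [1, 15, 3, 5, 6]

# Compare with a string method: the original string is left unchanged
word = "fun"
shout = word.upper()
print(word, shout)          # fun FUN
```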


Let's **iterate through the list**. Please remember **control flow** from the lecture.

In [81]:
# Iterate through a list and print the elements separately

another_list = [1,2,3,"hello", 5] # lists can hold elements of different types

for elem in another_list: # you do not need to define elem beforehand: "for" and "in" (built-in!) create it for you
    print(elem)

1
2
3
hello
5


In [116]:
for index in range(len(another_list)): # iterate through all valid indices, from 0 to len(another_list) - 1 (we avoid the name "id", which is a built-in function)
    print(f"The element at position {index} is {another_list[index]}")

The element at position 0 is 1
The element at position 1 is 2
The element at position 2 is 3
The element at position 3 is hello
The element at position 4 is 5
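Python's built-in `enumerate()` gives you the index and the element at the same time. It is a common alternative to `range(len(...))`. This sketch produces the same output as the cell above:

```python
another_list = [1, 2, 3, "hello", 5]

# enumerate() yields (index, element) pairs, so no range(len(...)) is needed
for index, elem in enumerate(another_list):
    print(f"The element at position {index} is {elem}")
```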


##### Functions

Remember that you can define **functions**: reusable blocks of code that perform a specific operation. For example:

In [None]:
def print_position_element(input_list): # input_list is the argument or parameter
    for index in range(len(input_list)): # iterate through all valid indices, from 0 to len(input_list) - 1
        print(f"The element at position {index} is {input_list[index]}")

# Call the function
print_position_element(another_list)

The element at position 0 is 1
The element at position 1 is 2
The element at position 2 is 3
The element at position 3 is hello
The element at position 4 is 5


In [122]:
def try_to_increment(input_list): # input_list is the argument or parameter
    for elem in input_list:
        elem += 1 # same as elem = elem + 1, but it only changes the loop variable elem, not the list
    return input_list # so the list is returned unchanged
# Call the function on a new list
try_to_increment([1, 2, 3, 4, 5])

[1, 2, 3, 4, 5]

A function defines an action that you perform on a variable. In the first function above, we call this variable **input_list**. This is just a **placeholder** for whatever list you pass when calling the function. This means that you do not need to initialize the parameter before defining the function. However, you do need to:
- use the parameter name consistently (inside the function, the parameter must be used under exactly the same name)
- initialize the **actual variable** you pass when you call the function (`another_list` in the first function call above; you initialized this variable before, remember?)
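In the cell above that uses `elem += 1`, the list comes back unchanged. `elem` is only a temporary name for each element, and reassigning it does not touch the list. One way to actually get the incremented values is to build a new list with `append()`. This is a minimal sketch; the function name `add_one_to_each` is hypothetical.

```python
def add_one_to_each(input_list):
    """Return a NEW list in which every element of input_list is increased by 1."""
    result = []
    for elem in input_list:
        result.append(elem + 1)  # store the new value instead of only rebinding elem
    return result

numbers = [1, 2, 3, 4, 5]
print(add_one_to_each(numbers))  # [2, 3, 4, 5, 6]
print(numbers)                   # [1, 2, 3, 4, 5]: the original list is unchanged
```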

> The second part of the exercises (for lists, sets, dictionaries, and functions) is taken from [this Colab notebook by Ryan Orsinger](https://colab.research.google.com/github/ryanorsinger/101-exercises/blob/main/101-exercises.ipynb#scrollTo=maRyeWrzNtTh).

Some exercises now:

In [83]:
# Exercise 8
# Given the following assignment of the list of fruits, add "tomato" to the end of the list.
fruits = ["mango", "banana", "guava", "kiwi", "strawberry"]

# Write your code here, then run the cell.

assert fruits == ["mango", "banana", "guava", "kiwi", "strawberry", "tomato"], "Ensure the variable contains all the strings in the right order"
print("Exercise 8 is correct")

AssertionError: Ensure the variable contains all the strings in the right order

In [None]:
# Exercise 9
# Given the following assignment of the vegetables list, add "tomato" to the end of the list.
vegetables = ["eggplant", "broccoli", "carrot", "cauliflower", "zucchini"]

# your code

assert vegetables == ["eggplant", "broccoli", "carrot", "cauliflower", "zucchini", "tomato"], "Ensure the variable contains all the strings in the provided order"
print("Exercise 9 is correct")

In [None]:
# Exercise 10
# Sort the vegetables in alphabetical order

# your code

assert vegetables == ['broccoli', 'carrot', 'cauliflower', 'eggplant', 'tomato', 'zucchini']
print("Exercise 10 is correct.")

In [None]:
# Exercise 11
# Write the code necessary to sort the fruits in reverse alphabetical order
# Hint: you can sort the list and then reverse it!

# your code

assert fruits == ['tomato', 'strawberry', 'mango', 'kiwi', 'guava', 'banana']
print("Exercise 11 is correct.")

In [None]:
# Exercise 12
# Write the code necessary to produce a single list that holds all fruits then all vegetables in the order as they were sorted above.
fruits_and_veggies = None # replace None with your code

assert fruits_and_veggies == ['tomato', 'strawberry', 'mango', 'kiwi', 'guava', 'banana', 'broccoli', 'carrot', 'cauliflower', 'eggplant', 'tomato', 'zucchini']
print("Exercise 12 is correct")


### 2.2 Tuples (`tuple`)

In [None]:
# Initialize a tuple
my_tuple = (10, 20, 30, 40, 50)
print(my_tuple)

(10, 20, 30, 40, 50)


In [None]:
# Indexing and slicing (same as in lists)

print(my_tuple[0])
print(my_tuple[1:3])

10
(20, 30)


Some built-in operations (same as in lists):

In [None]:
print(len(my_tuple))      # Number of elements
print(max(my_tuple))      # Maximum value
print(min(my_tuple))      # Minimum value
print(sum(my_tuple))      # Sum of elements
print(10 in my_tuple)     # Check if 10 exists

5
50
10
150
True


**Iterate** through a tuple (same as in lists).

In [None]:
# Iterating (same as in lists)
for item in my_tuple:
    print(item)

for index in range(len(my_tuple)):
    print(f"The element at position {index} is {my_tuple[index]}")

10
20
30
40
50
The element at position 0 is 10
The element at position 1 is 20
The element at position 2 is 30
The element at position 3 is 40
The element at position 4 is 50
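The cells above show what tuples share with lists, but not the main difference: like strings, tuples are **immutable**. This minimal sketch shows the error you get when you try to change an element. It also shows one way to get a modified copy, by converting the tuple to a list and back.

```python
my_tuple = (10, 20, 30, 40, 50)

try:
    my_tuple[0] = 99  # tuples do not allow changing an element
except TypeError as error:
    print("TypeError:", error)

# To get a modified version, convert to a list, change it, and convert back
as_list = list(my_tuple)
as_list[0] = 99
new_tuple = tuple(as_list)
print(new_tuple)  # (99, 20, 30, 40, 50)
print(my_tuple)   # (10, 20, 30, 40, 50): the original is unchanged
```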


### 2.3 Sets (`set`)

> Reminder: Sets are **unordered**, so you cannot index or slice them.

In [84]:
my_set = {5, 12, 7, 18}  # unordered collection
empty_set = set()         # empty set: use set(), because {} creates an empty dictionary; you can add elements later

print(my_set)
print(empty_set)

{18, 12, 5, 7}
set()


In [85]:
# Access items (no indexing or slicing!)

for item in my_set:
    print(item)

18
12
5
7


In [86]:
# Adding elements
my_set.add(20)                  # Add a single element
my_set.update([25, 30])         # Add multiple elements
print("After adding elements:", my_set)

After adding elements: {5, 7, 12, 18, 20, 25, 30}


In [None]:
# Removing elements
my_set.remove(12)               # Remove specific element (error if not found)
my_set.discard(100)             # Remove element if exists (no error if not)

removed_element = my_set.pop()  # Remove and return an arbitrary element

print("Removed element with pop():", removed_element)
print("Set after removals:", my_set)

Removed element with pop(): 5
Set after removals: {7, 18, 20, 25, 30}


You can try out the basic operations that we saw for lists and tuples, like `len` or `sum`. Here are more specific **set operations**:

In [88]:
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# Union: all elements from both sets
union_set = set_a | set_b
print("Union:", union_set) 

# Intersection: elements common to both sets
intersection_set = set_a & set_b
print("Intersection:", intersection_set)  

# Difference: elements in set_a but not in set_b
difference_set = set_a - set_b
print("Difference :", difference_set) 

# Symmetric difference: elements in either set_a or set_b but not both
sym_diff_set = set_a ^ set_b
print("Symmetric Difference:", sym_diff_set)  

Union: {1, 2, 3, 4, 5}
Intersection: {3}
Difference : {1, 2}
Symmetric Difference: {1, 2, 4, 5}
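Each set operator also has an equivalent method: `union()`, `intersection()`, `difference()`, and `symmetric_difference()`. This short sketch uses the same two sets. One practical difference is that the methods accept any iterable, such as a list, while the operators require sets on both sides.

```python
set_a = {1, 2, 3}
set_b = {3, 4, 5}

print(set_a.union(set_b))                 # same as set_a | set_b
print(set_a.intersection(set_b))          # same as set_a & set_b
print(set_a.difference(set_b))            # same as set_a - set_b
print(set_a.symmetric_difference(set_b))  # same as set_a ^ set_b

# The methods also accept a list; set_a | [3, 6] would raise a TypeError instead
print(set_a.union([3, 6]))                # contains 1, 2, 3 and 6
```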


##### Exercises

In [None]:
# Exercise 13
set_n = {1, 2, 3, 4, 5}
set_m = {3, 9, 46}

# print the union of the two sets

In [None]:
# Exercise 14

# print the intersection of set_n and set_m

In [None]:
# Exercise 15
# Remove all duplicates from a list
# Hint: to do so, you can simply convert the list to a set with the built-in set() function.

### 2.4 Dictionaries (`dict`)

Dictionaries hold **key-value pairs**.

In [105]:
# Initialize a dictionary
tukey_paper = {
    "title": "The Future of Data Analysis",
    "author": "John W. Tukey",
    "link": "https://projecteuclid.org/euclid.aoms/1177704711",
    "year_published": 1962
}

thomas_paper = {
    "title": "A mathematical model of glutathione metabolism",
    "author": "Rachel Thomas",
    "link": "https://www.ncbi.nlm.nih.gov/pubmed/18442411",
}

In [106]:
# Access values
print(thomas_paper["title"])
print(thomas_paper.get("title"))

A mathematical model of glutathione metabolism
A mathematical model of glutathione metabolism
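Square brackets and `get()` behave differently when a key is missing. `[]` raises a `KeyError`, while `get()` returns `None` (or a default value you choose). This sketch uses a smaller, illustrative dictionary:

```python
paper = {
    "title": "A mathematical model of glutathione metabolism",
    "author": "Rachel Thomas",
}

print(paper.get("year_published"))             # None: the key is missing, but there is no error
print(paper.get("year_published", "unknown"))  # unknown: the default value you passed
print("title" in paper)                        # True: "in" checks the keys of a dictionary

try:
    paper["year_published"]                    # square brackets fail on a missing key
except KeyError as error:
    print("KeyError:", error)
```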


In [112]:
# Access all keys, all values, or all key-value pairs (items)

print(thomas_paper.keys())
print(thomas_paper.values())
print(thomas_paper.items())  

dict_keys(['title', 'author', 'link'])
dict_values(['A mathematical model of glutathione metabolism', 'Rachel Thomas', 'https://www.ncbi.nlm.nih.gov/pubmed/18442411'])
dict_items([('title', 'A mathematical model of glutathione metabolism'), ('author', 'Rachel Thomas'), ('link', 'https://www.ncbi.nlm.nih.gov/pubmed/18442411')])


In [110]:
# Add or update values
thomas_paper["year_published"] = 2008     # add a new key
thomas_paper["hello_students"] = "hello"  # add another new key
thomas_paper["hello_students"] = "hi"     # the key already exists, so its value is updated

print(thomas_paper)
print(tukey_paper)

{'title': 'A mathematical model of glutathione metabolism', 'author': 'Rachel Thomas', 'link': 'https://www.ncbi.nlm.nih.gov/pubmed/18442411', 'year_published': 2008, 'hello_students': 'hi'}
{'title': 'The Future of Data Analysis', 'author': 'John W. Tukey', 'link': 'https://projecteuclid.org/euclid.aoms/1177704711', 'year_published': 1962}


In [111]:
thomas_paper.pop("hello_students")   # remove by key (pop also returns the removed value)
print(thomas_paper)

{'title': 'A mathematical model of glutathione metabolism', 'author': 'Rachel Thomas', 'link': 'https://www.ncbi.nlm.nih.gov/pubmed/18442411', 'year_published': 2008}


In [114]:
# Loop through a dictionary
for key, value in thomas_paper.items():
    print(key, value)


title A mathematical model of glutathione metabolism
author Rachel Thomas
link https://www.ncbi.nlm.nih.gov/pubmed/18442411
year_published 2008


##### Exercises

In [None]:
# Exercise 16
# Write a function named get_paper_title that takes in a dictionary and returns the value stored under its "title" key

assert get_paper_title(tukey_paper) == "The Future of Data Analysis"
assert get_paper_title(thomas_paper) == "A mathematical model of glutathione metabolism"
print("Exercise 16 is correct.")

In [None]:

# Exercise 17
# Write a function named get_year_published that takes in a dictionary and returns the value behind the "year_published" key.

assert get_year_published(tukey_paper) == 1962
assert get_year_published(thomas_paper) == 2008
print("Exercise 17 is correct.")

In [None]:
# Run this code to create data for the next two questions
book = {
    "title": "Genetic Algorithms and Machine Learning for Programmers",
    "price": 36.99,
    "author": "Frances Buontempo"
}

In [None]:
# Exercise 18
# Write a function named get_price that takes in a dictionary and returns the price

assert get_price(book) == 36.99
print("Exercise 18 is complete.")

In [None]:
# Exercise 19
# Write a function named get_book_author that takes in a dictionary (such as the book variable declared above) and returns the author's name


assert get_book_author(book) == "Frances Buontempo"
print("Exercise 19 is complete.")