# Exam preparation

## Whirlwind, Lists, Loops, Cond. Stmts., Functions, Modules, Dicts

### What is the difference between a list and a dictionary?

A list contains an ordered sequence of items called elements. The elements in the list can be accessed using their index. As in most other languages, list indices start at `0`. This means that a list of size `5` has indices from `0` through `4`.

In [None]:
numbers = [1, 2, 3, 4, 5]
first = numbers[0]
last = numbers[4]

(first, last)

If you access an index that is out of bounds for the list, an `IndexError` is raised.

In [None]:
numbers[5]

A dictionary contains key-value pairs. This means that you insert values into a key. You can then access values by the key they were inserted with.

> A dictionary is a collection which is unordered, changeable and indexed. In Python dictionaries are written with curly brackets, and they have keys and values.

> Dictionaries are Python’s implementation of a data structure that is more generally known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.

You can create a new dictionary in Python by using curly brackets.

In [None]:
d = {"name": "Thomas", "age": 21}
d

This dictionary now contains two entries (pairs). The first pair has the key `name` and the value `Thomas`. The second pair has the key `age` and the value `21`.

We can access any value by using its associated key.

In [None]:
d["name"]

As with lists, you get an error (here a `KeyError`) if the provided key does not exist in the dictionary.

In [None]:
d["unknown"]

These are some of the methods on the dictionary type:

**`get` allows for accessing a value using its key. This does not produce an error as seen above.**

In [None]:
d.get("unknown") # returns None since the key does not exist.
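
`get` also accepts an optional second argument that is returned when the key is missing. A minimal sketch, where `d` is redefined so the block runs on its own and the fallback value `"n/a"` is an arbitrary choice for illustration:

```python
d = {"name": "Thomas", "age": 21}

# Without a default, a missing key gives None instead of raising KeyError
missing = d.get("unknown")

# With a default, the fallback value is returned instead of None
fallback = d.get("unknown", "n/a")

# For an existing key the stored value is returned and the default is ignored
name = d.get("name", "n/a")

print(missing, fallback, name)
```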

**`items` returns a view of 2-length tuples. Each tuple represents a key-value pair in the dictionary. Wrap it in `list(...)` if you need an actual list.**

In [None]:
d.items()

**`keys` returns a view of the keys in the dictionary.**

In [None]:
d.keys()

**`values` returns a view of the values in the dictionary.**

In [None]:
d.values()

**`update` adds key-value pairs from another dictionary or from an iterable of pairs. Note that this overwrites any existing key-value pair with the same key.**

In [None]:
d.update([("name", "Kasper")])
d

Alternatively, you can assign to a key directly using square brackets:

In [None]:
d["name"] = "Sanne"
d

**`in` operator checks if a key exists in the dictionary**

In [None]:
has_name = "name" in d
has_unknown = "unknown" in d

(has_name, has_unknown)
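
The difference between the two types also shows when looping over them. The following is only a sketch; `numbers` and `d` are redefined here so the block is self-contained:

```python
numbers = [10, 20, 30]
d = {"name": "Thomas", "age": 21}

# A list is addressed by position; enumerate yields (index, element) pairs
indexed = [(i, n) for i, n in enumerate(numbers)]

# A dictionary is addressed by key; items() yields (key, value) pairs
pairs = [(k, v) for k, v in d.items()]

print(indexed)
print(pairs)
```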

### How do you write a module in Python?

Modules in Python are just Python files. When you import a module, you gain access to it as a variable. For `import my_module` to work, the interpreter must be able to find a file named `my_module.py`, for example in the same folder as the importing script or in another directory on the module search path (`sys.path`).

In [None]:
import my_module
my_module.f_1()

You can also use an alias to avoid name collisions when importing multiple modules.

In [None]:
import my_module as my_mod
my_mod.f_1()

You can also import functions defined in the module by using the `from` keyword.

In [None]:
from my_module import f_2
f_2()
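
The imports above assume a file named `my_module.py` that these notes do not show. A sketch of what such a file could look like follows; only the names `f_1` and `f_2` come from the examples above, and the function bodies are hypothetical placeholders:

```python
# my_module.py (hypothetical contents; only the names f_1 and f_2 come from the text)

def f_1():
    """Return a message so the caller can see which function ran."""
    return "Hello from f_1"


def f_2():
    """Return a message so the caller can see which function ran."""
    return "Hello from f_2"


# Code under this guard runs only when the file is executed directly
# (python my_module.py), not when it is imported as a module.
if __name__ == "__main__":
    print(f_1())
    print(f_2())
```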

## Files, CLI, OO, Exceptions

### How do you open a file in Python?

https://docs.python.org/3/library/functions.html#open

The usual way to open a file in Python is the built-in `open` function:
`open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)`

The function takes many parameters, the most important being the `file` and `mode`.

- The `file` parameter takes the path to the file to open.
- The `mode` parameter takes a string that describes how the file is opened, for example `'r'` for reading, `'w'` for writing and `'a'` for appending.

![](open_modes.png)

In [None]:
with open("names.txt") as f:
    print(f.read())

The default encoding is platform-dependent and is not guaranteed to be UTF-8, so it is safest to specify it explicitly.

In [None]:
with open("names.txt", encoding='utf-8') as f:
    print(f.read())

Note that we use the `with` statement, which ensures that the file is closed after use. Below, an error occurs because the file was closed when we exited the `with` block. The second call to `read` is therefore attempting to read from a closed file and raises a `ValueError`.

In [None]:
with open("names.txt", encoding='utf-8') as f:
    f.read()

print(f.read())
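
The `mode` argument also controls writing. A small sketch, assuming a scratch file called `example.txt` (a hypothetical name) inside a temporary directory so that no real files are touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")  # hypothetical file name

    # "w" creates the file, or truncates it if it already exists
    with open(path, mode="w", encoding="utf-8") as f:
        f.write("Thomas\n")

    # "a" appends to the end of the file and keeps the existing content
    with open(path, mode="a", encoding="utf-8") as f:
        f.write("Kasper\n")

    # "r" (the default) opens the file for reading
    with open(path, mode="r", encoding="utf-8") as f:
        names = f.read().splitlines()

print(names)
```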

### How do you throw an exception in Python?

You throw exceptions in Python using the `raise` keyword.

In [None]:
raise Exception("Some error occurred here.")

You can catch exceptions using the `try` and `except` keywords.

In [None]:
def f():
    raise Exception("Exception thrown in f.")
    
try:
    f()
except Exception as e:
    print(e)
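
In practice it is often more useful to raise and catch specific exception types than the generic `Exception`. The sketch below assumes a made-up `InvalidAgeError` class and `parse_age` function, neither of which is part of the course material:

```python
class InvalidAgeError(Exception):
    """Hypothetical custom exception, defined only for this example."""


def parse_age(text):
    age = int(text)  # int() raises ValueError for non-numeric text
    if age < 0:
        raise InvalidAgeError(f"Age cannot be negative: {age}")
    return age


results = []
attempts = 0
for text in ["21", "-3", "abc"]:
    try:
        results.append(parse_age(text))
    except InvalidAgeError as e:
        # Raised by our own check in parse_age
        results.append(f"invalid age: {e}")
    except ValueError:
        # Raised by int() when the text is not a number
        results.append("not a number")
    finally:
        # Runs whether or not an exception was raised
        attempts += 1

print(results, attempts)
```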

## Intro to Plotting

### How do you plot a line chart in Matplotlib?

In [None]:
import pandas as pd
import numpy as np

In [None]:
gdp_data = pd.read_csv("gdp_data.csv")
gdp_data.head()

In [None]:
# I want the rows of the data frame to be accessed using their name
gdp_data.index = gdp_data["Country Name"]
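# Pass the mapping as columns=..., otherwise rename looks for these names among the row labels
gdp_data.rename(columns={"Country Name": "Name", "Country Code": "Code"}, inplace=True)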
gdp_data.head()
gdp_data.index

First I want to handle the `NaN` values in the dataset.

In [None]:
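# Interpolate along each row (axis=1) so gaps are filled from the same country's neighbouring years
gdp_data[gdp_data.columns[4:]] = gdp_data[gdp_data.columns[4:]].interpolate(axis=1)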
gdp_data.head()

Next I can plot the GDP of Afghanistan on a line chart.

In [None]:
import matplotlib.pyplot as plt

In [None]:
afg = gdp_data.loc["Afghanistan"][4:-1]
xs = afg.index
ys = afg
labels = list(map(lambda p: p[1] if p[0] % 5 == 0 else "", enumerate(xs)))
plt.figure()
plt.plot(xs, ys)
plt.xticks(xs, labels, rotation=90)
plt.show()

### What is an axis in a Matplotlib plot?

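An axis is an object responsible for drawing the axis line, the ticks and the labels for one dimension of a Matplotlib plot. Most charts have two axes, `x` and `y`. Note that Matplotlib also uses the term `Axes` for the whole plotting area, which contains the individual `Axis` objects.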

## Intro to NumPy and Plotting Contd.

### What does it mean that an array has a shape?

A shape describes the dimensions of an array, that is, its size along each dimension. In NumPy the shape is represented as a tuple.

In [None]:
a = np.array([1, 2, 3, 4, 5])
(a, a.shape)

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
(a, a.shape)

You can change the shape of NumPy arrays using the `reshape` method.

In [None]:
a = np.array([1, 2, 3, 4]).reshape(2, 2)
(a, a.shape)

Many operations in NumPy and in machine learning require that the arrays involved have the same (or at least compatible) shapes.
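
A short sketch of what matching shapes mean in practice. The arrays are made up for illustration. Strictly speaking, NumPy also accepts shapes that are compatible under its broadcasting rules, so "same shape" is a simplification:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
b = np.array([[10, 20, 30], [40, 50, 60]])  # shape (2, 3)

# Same shape: the addition is applied element by element
total = a + b

# Incompatible shapes: NumPy raises a ValueError
c = np.array([1, 2])  # shape (2,)
try:
    a + c
    mismatch = False
except ValueError:
    mismatch = True

# reshape can infer one dimension when it is given as -1
flat = a.reshape(-1)  # shape (6,)

print(total.shape, mismatch, flat.shape)
```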

### How do you plot multiple lines in Matplotlib?

We can draw multiple lines in the same figure by calling `plt.plot` multiple times before calling `plt.show`.

In [None]:
def plot_line(country):
    c_data = gdp_data.loc[country][4:-1]
    xs = c_data.index
    ys = c_data
    plt.plot(xs, ys)

plt.figure()
countries_to_show = gdp_data.index[:3]
for country in countries_to_show:
    plot_line(country)
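# Every row shares the same year columns, so take the tick positions from the columns
xs = gdp_data.columns[4:-1]
labels = list(map(lambda p: p[1] if p[0] % 5 == 0 else "", enumerate(xs)))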
plt.xticks(xs, labels, rotation=90)
plt.legend(countries_to_show)
plt.show()

## Intro to Pandas

### What is a DataFrame?

> Dataframe in python comes with pandas library. It is a 2- dimensional data structure, i.e., data is aligned in a tabular form in rows and column.

> The size of pandas dataframe is Mutable, the columns can be of different types and one can perform Arithmetic operation on rows and columns of dataframe.

> Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call `pd.Series(data, index=index)`.

In [None]:
gdp_data.head()

### How do you access a row in a DataFrame?

Rows in a DataFrame can be accessed by their numeric position or by their index label. In the above DataFrame the index label is the name of the country. Often the index label is simply the numeric position.

In [None]:
gdp_data.index

Using the `loc` property, we can access a row by its index label.

In [None]:
gdp_data.loc['Aruba'].head()

Using the `iloc` property, we can access a row by its numeric position.

In [None]:
gdp_data.iloc[0].head()

We can access a range of rows at the same time:

In [None]:
gdp_data.iloc[50:75]

We can access multiple rows at the same time:

In [None]:
gdp_data.iloc[[1, 2]]
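
Since `gdp_data.csv` is not included in these notes, the same access patterns are sketched below on a tiny made-up table. The column names, labels and numbers are hypothetical:

```python
import pandas as pd

# Hypothetical miniature table; the values are made up for illustration
df = pd.DataFrame(
    {"Code": ["AAA", "BBB", "CCC"], "Value": [1.0, 2.0, 3.0]},
    index=["Alpha", "Beta", "Gamma"],
)

by_label = df.loc["Beta"]             # row selected by its index label
by_position = df.iloc[1]              # the same row selected by its position
row_range = df.iloc[0:2]              # positions 0 and 1 (the end is exclusive)
label_range = df.loc["Alpha":"Beta"]  # label slices include the end label

print(by_label["Value"], by_position["Code"], len(row_range), len(label_range))
```

Note the difference between the two slices: `iloc` follows normal Python slicing and excludes the end position, while `loc` includes the end label.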

## Multiprocessing, generators and intro to Requests

### What is the difference between an iterator and a generator?

> Iterator is a more general concept: any object whose class has a next method (`__next__` in Python 3) and an `__iter__` method that does return self.

> Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.

> You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides next (and `__iter__` and `__init__`). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.

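There is a difference between `Iterable` and `Iterator`. Lists are `Iterable`, which means that they have an `__iter__` method that returns an `Iterator`. `Iterator`s have a `__next__` method that returns the next item, and an `__iter__` method that returns the iterator itself.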

In [None]:
# Iterator example
class MyIterator():
    
    def __init__(self):
        self.counter = 0
    
    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        self.counter += 1
        return self.counter
        
iterator = MyIterator()
print(next(iterator))
print(next(iterator))
print(next(iterator))

To stop iteration we can raise a `StopIteration` exception.

In [None]:
# Iterable example
class MyIterable():
    
    def __init__(self):
        self.counter = 0
        
    def __iter__(self):
        self.counter = 0
        return self
    
    def __next__(self):
        self.counter += 1
        if self.counter > 5:
            raise StopIteration()
        return self.counter
        
iterable = MyIterable()
for i in iterable:
    print(i)
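
For comparison, the same counting behaviour can be written as a generator. This is only a sketch; the function name `count_to` is made up for this example:

```python
def count_to(limit):
    """Generator that yields 1, 2, ..., limit."""
    counter = 0
    while counter < limit:
        counter += 1
        yield counter  # execution pauses here until the next value is requested


gen = count_to(5)

# A generator is an iterator: iter() returns the same object and next() works
is_own_iterator = iter(gen) is gen
first = next(gen)

# The loop consumes the remaining values and stops when the generator is exhausted
rest = [i for i in gen]

# A generator expression is a compact way to build a generator
squares = list(n * n for n in range(4))

print(is_own_iterator, first, rest, squares)
```

The state (`counter`) lives in the suspended function frame, so no class, `__next__` method or explicit `StopIteration` is needed.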

### How do you parallelise programs in Python?

We can use the `multiprocessing` module from the Python standard library, which runs work in separate processes.
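
A minimal sketch of a process pool. The helper name `parallel_factorials` and the worker count are assumptions made for this example. The work function must be importable by the worker processes, which is why a standard-library function (`math.factorial`) is used here; functions defined directly in a notebook cell may not work on every platform.

```python
import math
import multiprocessing


def parallel_factorials(numbers, workers=2):
    """Compute factorials in a pool of worker processes (hypothetical helper)."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map distributes the inputs across the workers and keeps the input order
        return pool.map(math.factorial, numbers)


# The guard is needed on platforms that start workers by re-importing this file
if __name__ == "__main__":
    print(parallel_factorials([1, 2, 3, 4, 5]))
```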

## Regular expressions

### What is regex?

>A regular expression is a sequence of characters that defines a search pattern.

Given the following dataset:

In [None]:
pd.read_csv('gdp_data.csv')

We can for example search for the country names in gdp_data.csv:

In [None]:
import re

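# Match a double-quoted name at the start of each line
pattern = re.compile(r'^"[a-zA-ZæøåÆØÅ ]+"', flags=re.MULTILINE)

# Read the file inside a with-block so it is closed afterwards
with open('gdp_data.csv') as f:
    data = f.read()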

print(pattern.findall(data))

### Most common patterns

|No|**Symbol**|**Effect**|
|--|--|--|
|1|`.`|matches any character except a newline|
|2|`\w`|matches any word character, i.e. a letter, digit or underscore (`_`)|
|3|`\W`|matches any non-word character|
|4|`\d`|matches a single digit|
|5|`\D`|matches a single character that is not a digit|
|6|`\s`|matches any whitespace character, such as a space, `\t` or `\n`|
|7|`\S`|matches a single non-whitespace character|
|8|`[abc]`|matches a single character in the set, i.e. `a`, `b` or `c`|
|9|`[^abc]`|matches a single character other than `a`, `b` or `c`|
|10|`[a-z]`|matches a single character in the range a to z|
|11|`[a-zA-Z]`|matches a single character in the range a-z or A-Z|
|12|`[0-9]`|matches a single character in the range 0-9|
|13|`^`|matches at the start of the string (or of each line with `re.MULTILINE`)|
|14|`$`|matches at the end of the string (or of each line with `re.MULTILINE`)|
|15|`+`|matches one or more of the preceding element (greedy match)|
|16|`*`|matches zero or more of the preceding element (greedy match)|
|17|`?`|matches zero or one of the preceding element|
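
A few of the patterns from the table, applied to a made-up sample string. The text and the variable names are only for illustration:

```python
import re

text = "Order 66 shipped to Aarhus on 2020-05-17"  # made-up sample string

digits = re.findall(r"\d+", text)                   # one or more digits
words = re.findall(r"[A-Z][a-z]+", text)            # a capital letter followed by lowercase letters
date = re.search(r"\d{4}-\d{2}-\d{2}", text)        # four digits, dash, two digits, dash, two digits
starts = bool(re.match(r"^Order", text))            # ^ anchors the match at the start
optional = re.findall(r"colou?r", "color colour")   # ? makes the u optional

print(digits, words, date.group(), starts, optional)
```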

## Selenium

>Selenium is a software that automates browser tasks. It's mostly used for automating web applications for testing purposes, but boring web-based administration tasks can obviously be automated as well.

In [None]:
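# Written against the Selenium 4 API (Service object and By locators)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager

browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
browser.get("https://www.google.com/")

# Find the search box by its name attribute and type a query
elm = browser.find_element(By.NAME, "q")
elm.send_keys("selenium automation software")
elm.send_keys(Keys.ENTER)

# quit() closes the browser window and ends the driver session
browser.quit()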