# Introduction to Python: Data types

## Finding out data types

So far we have worked with text and numeric values. These are two of the possible data types in Python. We can use the `type()` function to output the data type of a value.

**Task:**

1. Display the data type of the following values:
    - `5`
    - `5.3`
    - `'Hello'`
    - `True`
    - `None`
    - `('Apples', 5)`
    - `['Bananas', 7]`
    - `{'Apples': 5, 'Bananas': 7}`

Note the three different types of brackets, `()`, `[]` and `{}`, and where commas and colons are used.


In [1]:
values = [5, 5.3, 'Hello', True, None, ('Apples', 5), ['Bananas', 7], {'Apples': 5, 'Bananas': 7}]
for value in values:
    print(f"{value} -> {type(value)}")

5 -> <class 'int'>
5.3 -> <class 'float'>
Hello -> <class 'str'>
True -> <class 'bool'>
None -> <class 'NoneType'>
('Apples', 5) -> <class 'tuple'>
['Bananas', 7] -> <class 'list'>
{'Apples': 5, 'Bananas': 7} -> <class 'dict'>


In [2]:
print(type(5))
print(type(5.3))
print(type('Hello'))
print(type(True))
print(type(None))
print(type(('Apples', 5)))
print(type(['Bananas', 7]))
print(type({'Apples': 5, 'Bananas': 7}))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'NoneType'>
<class 'tuple'>
<class 'list'>
<class 'dict'>


2. Assign the number `5` to the variable `my_number` and output the data type of the variable.

In [3]:
my_number = 5
print(type(my_number))

<class 'int'>


3. Now divide the value stored in the variable by `2` and output the data type again.

In [4]:
my_number /= 2
print(my_number)
print(type(my_number))

2.5
<class 'float'>

4. Use the command `my_number = str(my_number)`. What type do you expect now? Display the content and the type of the variable.

In [10]:
my_number = str(my_number)
print(my_number)
type(my_number)

2.5


str

The data type of the variable changed to `str` (string) because the `str()` function converts the value passed to it into a string, and we assigned that string back to `my_number`.

*Note:* What you see here is a case of dynamic typing: a variable does not have a fixed data type but takes on the type of whatever value is currently assigned to it. Since some operations only work with certain data types (e.g., you cannot add an integer to a string), it is important to pay attention to data types.
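A short sketch of dynamic typing and of the integer/string mismatch mentioned in the note. The variable name `value` is only for illustration; the point is that the same name can hold values of different types, and that mixing `int` and `str` requires an explicit conversion.

```python
# The same variable can hold values of different types over time.
value = 5
print(type(value))   # <class 'int'>
value = "five"
print(type(value))   # <class 'str'>

# Adding an int and a str is not allowed.
try:
    result = 1 + "2"
except TypeError as err:
    print("TypeError:", err)  # TypeError: unsupported operand type(s) for +: 'int' and 'str'

# Convert explicitly to get the behaviour you want.
print(1 + int("2"))   # 3  (numeric addition)
print(str(1) + "2")   # 12 (string concatenation)
```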

## Integer

Integers are whole numbers (..., -2, -1, 0, 1, 2, ...). In principle, they can be as large as you want (as long as enough computer memory is available).

**Task:** Calculate the number of nanoseconds since the Big Bang (approximately) and store it in a variable called `bb_secs`.

In [15]:
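# approx. 13.8 billion years * days * hours * minutes * seconds * nanoseconds per second (leap years ignored)
bb_secs = 13_800_000_000 * 365 * 24 * 60 * 60 * 1_000_000_000

In [16]:
print(bb_secs)

435196800000000000000000000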


*Note:* For better readability, Python allows underscores `_` in numbers, so 13.8 billion can be written as `13800000000` or, more readably, as `13_800_000_000`. Another option is scientific notation, `13.8e+9`, which is a shortcut for 13.8 × 10⁹ (`13.8 * 10**9` in Python). Be aware that scientific notation always creates a `float`, not an `int`.
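The following sketch illustrates the two points above: integers are exact no matter how large they get, and scientific notation produces a `float` even when the value is a whole number. The names `big` and `age_years` are just for illustration.

```python
# Python integers have arbitrary precision: this result is exact.
big = 2**100
print(big)                 # 1267650600228229401496703205376
print(type(big))           # <class 'int'>

# Underscores are only for readability; both literals are the same int.
print(13_800_000_000 == 13800000000)   # True

# Scientific notation always produces a float, even for whole numbers.
age_years = 13.8e+9
print(type(age_years))     # <class 'float'>
print(int(age_years))      # 13800000000 (explicit conversion back to int)
```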

## Float

Floats are floating point numbers and only have finite precision. Often we do not see the inaccuracies caused by the limited precision because the values are still accurate to many decimal places and Python displays a rounded value by default. However, when calculating with numbers that are many orders of magnitude apart, numerical errors can quickly occur.

1. Use the `format()` function to display the number `0.1` with 20 decimal places (how Python 'sees' this number). Use `'.20f'` for the function argument `format_spec`.  
(Remember that you can use *Shift + Tab* to display the function documentation.)

In [26]:
format(0.1, '.20f')

'0.10000000000000000555'
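The same format specification can also be used inside an f-string, as in the very first code cell of this notebook. A small sketch comparing both forms, plus the default (rounded) display:

```python
x = 0.1

print(format(x, '.20f'))   # 0.10000000000000000555
print(f"{x:.20f}")         # same result, written as an f-string
print(format(x, '.2f'))    # 0.10
print(repr(x))             # 0.1 (the short form Python shows by default)
```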

2. What do you think will be the result of the following two calculations? Try it out and see what happens.
```python
10**(-20)          # a)
10**(-20) + 1 - 1  # b)
```

In [28]:
format(10**(-20), ".20f")

'0.00000000000000000001'

In [29]:
format(10**(-20) + 1 - 1, ".20f")

'0.00000000000000000000'
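A minimal sketch of two practical consequences of limited float precision: exact equality checks can fail, and the order of operations matters when the magnitudes differ greatly. Using `math.isclose` for comparisons is one common remedy, shown here as a suggestion rather than the only option.

```python
import math

# 0.1 and 0.2 cannot be stored exactly, so their sum is not exactly 0.3.
print(0.1 + 0.2)            # 0.30000000000000004
print(0.1 + 0.2 == 0.3)     # False

# Compare floats with a tolerance instead of ==.
print(math.isclose(0.1 + 0.2, 0.3))   # True

# Order of operations matters when magnitudes differ greatly:
tiny = 10**(-20)
print((tiny + 1) - 1)       # 0.0   -> tiny is lost when added to 1
print(tiny + (1 - 1))       # 1e-20 -> tiny survives
```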

## String

Strings are text. They come with a whole set of their own string manipulation methods: https://docs.python.org/3/library/stdtypes.html#string-methods

Strings can be enclosed either in double quotes (`"my text"`) or in single quotes (`'my text'`). Note that the quote character enclosing a string cannot appear unescaped inside that string: `"my "nice" text"` is not a valid expression, but `'my "nice" text'` and `"my 'nice' text"` are allowed (as is escaping the inner quotes with a backslash, e.g. `"my \"nice\" text"`).

1. Have Python output the following sentences:
    - My cat is called 'Bruno'.
    - My dog is called "Alfred".
    - Bruno's playmate is called "Alfred".

2. Find and use a string function to replace 'Bruno' with 'Max' and 'Alfred' with 'Moritz'.
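A sketch of the techniques involved, using different (made-up) sentences so the tasks above remain open: mixing quote types, escaping a quote with a backslash, and `str.replace()`. Note that `replace()` returns a new string and leaves the original unchanged.

```python
# Mix quote types so the inner quotes do not end the string.
print("The parrot is called 'Polly'.")
print('The hamster is called "Hammy".')

# Alternatively, escape the inner quote with a backslash.
print('Polly\'s friend is called "Hammy".')

# str.replace returns a NEW string; calls can be chained.
sentence = "Polly likes Hammy."
new_sentence = sentence.replace("Polly", "Rio").replace("Hammy", "Nibbles")
print(sentence)       # Polly likes Hammy.
print(new_sentence)   # Rio likes Nibbles.
```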

## List

In Python several values can be combined in a list. The values may even be of different data types. Lists are enclosed with square brackets and the elements of a list are separated with commas, e.g. `[5, 7, 9]`.

1. Create a list containing the following values and store it in a variable named `my_list`:
    - `5`
    - `3.7`
    - `'Hello'`

2. With square brackets and an index number, you can access the individual elements of a list, where the first element has index 0, the second index 1, etc. For example, given `numbers = [3, 7, 9]`:
    - `numbers[0]` results in `3`
    - `numbers[1]` results in `7`
    - `numbers[2]` results in `9`

    Output the entry `'Hello'` from `my_list`. What happens if you choose an index that is too large?

3. Change the first entry in `my_list` (the `5`) to a `4` without having to redefine the entire list.

4. Append another entry `'World'` to `my_list`.

    *Note:* You can join multiple *lists* with `+`, or use the `.append()` method (for single values) or the `.extend()` method (for lists).

5. You can access multiple entries of a list at once by using *slices*. Check out this Stack Overflow post about slices: https://stackoverflow.com/questions/509211/how-slicing-in-python-works  
Try out different ways to retrieve the last two entries stored in `my_list`.
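The list operations from the tasks above, sketched on the example list `numbers` rather than on `my_list`: indexing (including negative indices), what happens with an index that is too large, in-place changes, `append()`/`extend()`/`+`, and slicing.

```python
numbers = [3, 7, 9]

print(numbers[0])     # 3
print(numbers[-1])    # 9 (negative indices count from the end)

# An index that is too large raises an IndexError.
try:
    numbers[3]
except IndexError as err:
    print("IndexError:", err)   # IndexError: list index out of range

# Lists are mutable: entries can be changed in place.
numbers[0] = 1

# append() adds one element; extend() adds every element of another list.
numbers.append(11)
numbers.extend([13, 15])
print(numbers)        # [1, 7, 9, 11, 13, 15]

# Joining with + creates a new list and leaves `numbers` unchanged.
combined = numbers + [17]

# Slices: the start index is included, the end index is excluded.
print(numbers[1:3])   # [7, 9]
print(numbers[-2:])   # [13, 15] (the last two entries)
```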

## Tuple

Tuples are very similar to lists, except that they use round brackets (e.g. `(3, 7, 9)`) instead of square ones. Unlike lists, entries in a tuple cannot be changed (they are *immutable*). To change entries in a tuple, it must be completely redefined.

**Task:**
Try to repeat the above list tasks with tuples.

*Note:* Tuples with only one entry must be written with a comma after the entry, e.g. `(3,)`. If you write `(3)` instead, it is interpreted as the integer `3`.
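A sketch of how tuples behave differently from lists, using a made-up tuple `point`: reading and slicing work the same way, assignment to an entry fails, and "changing" a tuple means building a new one.

```python
point = (3, 7, 9)
print(point[1])       # 7
print(point[-2:])     # (7, 9)

# Tuples are immutable: item assignment raises a TypeError.
try:
    point[0] = 4
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'tuple' object does not support item assignment

# To "change" a tuple, build a new one and reassign the name.
point = (4,) + point[1:]
print(point)          # (4, 7, 9)

# The trailing comma makes the difference for single-entry tuples.
print(type((3,)))     # <class 'tuple'>
print(type((3)))      # <class 'int'>
```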

## Dictionary

Dictionaries are similar to lists, except that each value has a key. While lists are defined with square brackets, dictionaries use curly brackets as follows:
```python
personal_data = {'age': 20, 'first_name': 'Max', 'last_name': 'Mustermann'}  # ... (more entries possible)
```
The entries in the dictionary can be accessed via their keys, e.g. `personal_data['age']` would result in `20` here.

*Note:* Unlike variable names, string keys in dictionaries can be chosen freely and can even contain spaces.

1. Create a dictionary named `fruits` with the key-value pairs from the following table:

    | Key | Value |
    | ----------- | ----------- |
    | 'Apples' | 5 |
    | 'Bananas' | 7 |
    | 'Pears' | 'three' |

2. Add up the stored numbers of apples and bananas and store the sum in your dictionary under a new key named `'Total'`. Can the pears also be added?

3. Change the value for the pears to a numeric value and try again.
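The dictionary steps, sketched with a hypothetical `vegetables` dictionary instead of `fruits`: reading values by key, adding a new key, and what happens when a number and a string are added.

```python
vegetables = {'Carrots': 4, 'Leeks': 2}

# Access entries via their keys and store a result under a new key.
print(vegetables['Carrots'])   # 4
vegetables['Total'] = vegetables['Carrots'] + vegetables['Leeks']
print(vegetables)              # {'Carrots': 4, 'Leeks': 2, 'Total': 6}

# A value stored as text cannot be added to a number.
vegetables['Onions'] = 'two'
try:
    vegetables['Total'] += vegetables['Onions']
except TypeError as err:
    print("TypeError:", err)

# After switching to a numeric value, the addition works.
vegetables['Onions'] = 2
vegetables['Total'] += vegetables['Onions']
print(vegetables['Total'])     # 8
```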

## DataFrame

In data analysis, `DataFrames` are among the most frequently used data structures. They are provided by the `pandas` library.  
**Task:** Execute the following cells and observe the result.

In [5]:
# First, pandas is imported and renamed to 'pd'. This is common to save typing.
import pandas as pd

# Now a two-dimensional list is created and stored in X
X = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

# And finally, the two-dimensional list is converted into a DataFrame.
df = pd.DataFrame(X, columns=["first column", "second column", "third column", "fourth column"])

df


We can also add new data to the DataFrame:

In [None]:
# insert a new row with the index label 3
df.loc[3] = [12, 13, 14, 15]
df

In [None]:
# insert new column with name 'fifth column'
df['fifth column'] = [16, 17, 18, 19]
df

**Task:**

1. Add another row and column to the DataFrame `df` with the numbers 20 to 24 as content.

2. `DataFrame` is actually a *class*, so `df` is an object that not only stores values but also has a bunch of functions (methods) attached to it. Try it out by executing `df.plot()`. Since this method uses the `matplotlib` library, that library has to be installed; depending on your environment, you may also have to `import` it first (e.g. `import matplotlib.pyplot as plt`).

3. You can also pass function arguments to the `df.plot()` function, e.g. `df.plot(title='My title')`. Try it out to give your plot a title.

4. Find the documentation for the pandas `DataFrame.plot()` method and look up how to plot the data as a bar chart.

## A first real data set

Now let's look at a real dataset from the `sklearn` (scikit-learn) library, which we can convert into a pandas DataFrame. The dataset consists of 150 samples from 3 different iris species (flowers), with 4 measured attributes per sample.  
You can read more about this dataset here: https://en.wikipedia.org/wiki/Iris_flower_data_set

In [None]:
# Import the load_iris function from the sklearn.datasets library
from sklearn.datasets import load_iris

# The data is loaded as a dictionary-like object (a scikit-learn Bunch) with keys such as 'data', 'feature_names' and 'target'; the keys can also be accessed as attributes, e.g. iris.data
iris = load_iris()

# Load the data in a DataFrame
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
df_iris['target'] = iris.target

1. Display the `iris` *dictionary* loaded in the first step.

2. Now have the same data displayed in the tabular form of your *DataFrame*.

3. Plot the data of your DataFrame.

Now you probably realize why DataFrames are so popular among data scientists. 😉

## Let's build a classifier!

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# We split the data into training data (80%) and test data (20%):
# x_train and x_test hold the features (sepal length, sepal width, etc.)
# y_train and y_test hold the targets (the iris species, given by the class indices 0, 1, 2)
features = df_iris.drop('target', axis=1)  # all columns except for the target are used as features for the classification
targets = df_iris['target']
x_train, x_test, y_train, y_test = train_test_split(features, targets, test_size=0.2)

# Now we define a Neural Network Classifier (MLP stands for 'Multi-Layer-Perceptron') and fit it to the training data
model = MLPClassifier(max_iter=1000)
model.fit(x_train, y_train)

In [None]:
# Let's check the accuracy of our classifier via our test set:
score = model.score(x_test, y_test)
print(f'Accuracy of the model: {score*100:.0f}%')

**Task:** Rerun the training and test steps and see if the accuracy changes.
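The accuracy usually changes between runs because both the train/test split and the initial weights of the MLP are chosen randomly. Both `train_test_split` and `MLPClassifier` accept a `random_state` argument that makes runs reproducible. The sketch below uses only the standard library to illustrate the idea with a hypothetical helper `split_indices`; it is not how scikit-learn implements the split.

```python
import random

def split_indices(n, test_fraction=0.2, seed=None):
    """Hypothetical helper: shuffle indices 0..n-1 and split them into train and test lists."""
    rng = random.Random(seed)      # seed=None -> a different shuffle on every run
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(n * test_fraction)
    return indices[n_test:], indices[:n_test]

# With a fixed seed, the split is the same every time.
train_a, test_a = split_indices(150, seed=0)
train_b, test_b = split_indices(150, seed=0)
print(test_a == test_b)            # True
print(len(train_a), len(test_a))   # 120 30
```

Without a seed, every call shuffles differently, which is the same reason the classifier's accuracy varies from run to run.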