# Exercise 1: Type conversions

In the lecture, we discussed the basic built-in data types: integers, floating-point numbers, booleans, and strings. Python allows us to convert one type to another using the following functions:

- [`int()`](https://docs.python.org/3/library/functions.html#int) converts its argument to an integer.
- [`float()`](https://docs.python.org/3/library/functions.html#float) converts its argument to a floating-point number. 
- [`bool()`](https://docs.python.org/3/library/functions.html#bool) converts its argument to a boolean.
- [`str()`](https://docs.python.org/3/library/stdtypes.html#str) converts its argument to a string.

These conversions mostly work in an intuitive fashion, with some exceptions. Perform the following tasks to see how they work in detail:

1.  Define a string variable `s` with the value `'1'`. Convert this variable to an integer, a float, and a boolean. Do you get the same behavior if you define the string `s` to be `'1.0'` instead?
2.  Define the string variables `s1`, `s2`, and `s3` with values `'True'`, `'False'`, and `''` (empty string), respectively. Convert each of these to a boolean. Can you guess the conversion rule?
3.  Define a floating-point variable `x` with the value `0.9`. Convert this variable to an integer, a boolean, and a string.
4.  Define the integer variables `i1` and `i2` with values `0` and `2`, respectively. Convert each of these variables to a boolean.
5.  Define the boolean variables `b1` and `b2` with values `True` and `False`, respectively. Convert each of them to an integer.
6.  NumPy arrays cannot be converted element by element using `int()`, `float()`, etc. Instead, we have to use the
    method [`astype()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html) 
    and pass the desired data type (e.g., `int`, `float`, `bool`) as an argument.

    Create a NumPy array called `arr` with elements `[0.0, 0.5, 1.0]` and convert it to an integer
    and a boolean type.


### Solution

#### Part 1 — String conversion example

In [6]:
# Define the string variable
s = '1'

In [7]:
# Convert to integer
int(s)

1

In [8]:
# Convert to float
float(s)

1.0

In [9]:
# Convert to boolean: this returns True because any non-empty string is considered True
bool(s)

True

If we define `s = '1.0'` instead, `float()` and `bool()` behave exactly as before. The `int()` function, however, fails: it does not accept a string containing a decimal point, even though the value could be interpreted as the integer 1.

In [10]:
# Define a new string variable with a decimal point
s = '1.0'

# Attempt to convert to integer: this fails because the string contains a decimal point
int(s)

ValueError: invalid literal for int() with base 10: '1.0'

In [11]:
# Convert to float - works as before
float(s)

1.0

In [12]:
# Convert to boolean: works as before
bool(s)

True
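
If a string such as `'1.0'` should nevertheless end up as an integer, one possible workaround is to convert it to a float first. The following is only a minimal sketch: whether silently dropping a fractional part is acceptable depends on the application, and the variable names are chosen for illustration.

```python
s = '1.0'

# int(s) would raise a ValueError, so we convert to float first, then to int
x = float(s)
i = int(x)
print(i)

# float.is_integer() reports whether the float has no fractional part,
# i.e., whether the conversion to int loses any information
print(x.is_integer())
```

Checking `x.is_integer()` before calling `int()` guards against accidentally truncating a value such as `'1.5'`.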

#### Part 2 — Boolean string conversion

In [13]:
# Define string variables
s1 = 'True'
s2 = 'False'
s3 = ''

In [14]:
# Convert 'True' to boolean
bool(s1)

True

In [15]:
# Convert 'False' to boolean: this returns True because the string is non-empty
bool(s2)

True

In [16]:
# Convert '' to boolean: only the empty string is converted to False
bool(s3)

False
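
The conversion rule is therefore that `bool()` only checks whether a string is empty and never inspects its contents. If the text `'True'` or `'False'` itself should be interpreted, it has to be compared explicitly. The sketch below uses a hypothetical helper `parse_bool()`, which is not part of Python and is shown only to illustrate the idea.

```python
def parse_bool(text):
    """Hypothetical helper: interpret the text 'true'/'false' (any case) as a boolean."""
    cleaned = text.strip().lower()
    if cleaned == 'true':
        return True
    if cleaned == 'false':
        return False
    raise ValueError(f'Cannot interpret {text!r} as a boolean')

# bool() only checks whether the string is empty
print(bool('False'))

# The helper inspects the text itself
print(parse_bool('False'))
```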

#### Part 3 — Float conversion examples

In [17]:
# Define floating-point number
x = 0.9

In [18]:
# Convert to integer: The fractional part is truncated, NOT rounded!
int(x)

0

In [19]:
# Convert to boolean: Any numeric value other than 0 is interpreted as True
bool(x)

True

In [20]:
# Convert to string
str(x)

'0.9'
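
To make the difference between truncation and rounding more concrete, the following sketch compares `int()` with the built-in `round()` and with `math.floor()`. The negative value is an assumption added for illustration; it does not appear in the exercise.

```python
import math

x = 0.9

print(int(x))            # 0: the fractional part is dropped
print(round(x))          # 1: rounded to the nearest integer

# For negative numbers, truncation moves toward zero,
# whereas math.floor() always rounds down
print(int(-0.9))         # 0
print(math.floor(-0.9))  # -1
```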

#### Part 4 — Integer to boolean

In [21]:
# Define integer variables
i1 = 0
i2 = 2

In [22]:
# Convert 0 to boolean: the numeric value 0 is interpreted as False
bool(i1)

False

In [23]:
# Convert 2 to boolean: any non-zero numeric value is interpreted as True
bool(i2)

True

#### Part 5 — Boolean to integer

In [24]:
# Define boolean variables
b1 = True
b2 = False

In [25]:
# Convert True to integer
int(b1)

1

In [26]:
# Convert False to integer
int(b2)

0
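
Because `bool` is a subclass of `int` in Python, booleans can also be used directly in arithmetic without an explicit conversion. A small sketch, using a made-up list `flags`, shows how this allows us to count the number of `True` values:

```python
# Hypothetical list of boolean values, for illustration only
flags = [True, False, True, True]

# Each True counts as 1 and each False as 0
n_true = sum(flags)
print(n_true)       # 3

print(True + True)  # 2
```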

#### Part 6 — NumPy array conversions

In [27]:
# Import numpy
import numpy as np

# Create array of floating-point numbers
arr = np.array([0.0, 0.5, 1.0])
arr

array([0. , 0.5, 1. ])

In [28]:
# Convert to integer array: note that values are again truncated, NOT rounded.
arr.astype(int)

array([0, 0, 1])

In [29]:
# Convert to boolean array
arr.astype(bool)

array([False,  True,  True])
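
If rounding is desired instead of truncation, one option is to round the array before converting it. The sketch below assumes a slightly extended array (the value `1.7` is added for illustration) and also shows that `astype()` returns a new array and leaves the original unchanged.

```python
import numpy as np

arr = np.array([0.0, 0.5, 1.0, 1.7])

# Truncation: the fractional part is dropped
truncated = arr.astype(int)
print(truncated)    # [0 0 1 1]

# Round first, then convert. Note that np.round() rounds
# halves to the nearest even value, so 0.5 becomes 0.
rounded = np.round(arr).astype(int)
print(rounded)      # [0 0 1 2]

# The original array still contains floating-point numbers
print(arr.dtype)    # float64
```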

<span style="display: none;">SolutionEnd</span>

***
# Exercise 2: Working with strings

Strings in Python are full-fledged objects, i.e., they contain both the character data and additional functionality implemented via functions or so-called _methods_.
The official [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods) 
provides a comprehensive list of these methods. For our purposes, the most important are:

- [`str.lower()`](https://docs.python.org/3/library/stdtypes.html#str.lower) 
    and 
  [`str.upper()`](https://docs.python.org/3/library/stdtypes.html#str.upper)
  convert the string to lower or upper case, respectively.
- [`str.strip()`](https://docs.python.org/3/library/stdtypes.html#str.strip) 
  removes any leading or trailing whitespace characters from a string.
- [`str.count()`](https://docs.python.org/3/library/stdtypes.html#str.count)
  returns the number of occurrences of a substring within a string.
- [`str.startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith) 
    and 
  [`str.endswith()`](https://docs.python.org/3/library/stdtypes.html#str.endswith) 
  check whether a string starts or ends with a given substring.

**Important:** These methods need to be applied to a particular string variable, not the class `str` itself. For example, if you have a string variable `s`, you use `s.lower()`, etc.

Moreover, strings are also sequences and as such support indexing in the 
same way as lists or tuples, so for example `'NHH'[1]` returns the 2nd character `'H'`.
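
As a brief illustration of these points, the following sketch applies a few methods and an index to the short string `'NHH'`. The variable names are chosen for this example only. Note that string methods return a new string and leave the original unchanged.

```python
s = 'NHH'

# Methods are called on the string variable, not on the class str
lower = s.lower()
print(lower)               # nhh
print(s)                   # NHH: the original string is unchanged

print(s.startswith('NH'))  # True
print(s.count('H'))        # 2

# Indexing starts at 0, so index 1 selects the 2nd character
print(s[1])                # H
```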

In this exercise, you are asked to apply a few of these concepts.
Create a string variable with the value 
```python
s = '  NHH Norwegian School of Economics  '
```
and perform the following tasks:

1. Strip the surrounding spaces from the string using `strip()`.
2. Count the number of `'H'` in the string.
3. Modify your code so that it is case-insensitive, i.e., both instances of 
   `'h'` and `'H'` are counted.
4. Reverse the string, i.e., the last character should come first, and so on.
5. Create a new string which contains every 2nd character from the original.
6. Select the last character from this new string using at least two different methods.


### Solution

#### Part 1 — Strip whitespace

In [30]:
# Define the string
s = '  NHH Norwegian School of Economics  '

# strip leading & trailing whitespace
s = s.strip()
s

'NHH Norwegian School of Economics'

#### Part 2 — Count letter H

In [31]:
# Count the number of H in the string
s.count('H')

2

#### Part 3 — Case-insensitive count

In [32]:
# Count number of H, ignore case:
# We first convert the string to lowercase, then look for the number of h
s.lower().count('h')

3

#### Part 4 — Reverse string

In [33]:
# Reverse the string
s[::-1]

'scimonocE fo loohcS naigewroN HHN'

Recall that the slice `start:stop:step` can be used to index any sequence, including strings. We use the default values for `start` and `stop` (so they can be omitted), but set the step to `-1` to reverse the order.
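
The following sketch shows a few more slices for comparison. The string `word` and the list `numbers` are assumptions made for illustration.

```python
word = 'Economics'

print(word[:3])     # Eco: start omitted, stop at index 3 (exclusive)
print(word[3:])     # nomics: start at index 3, stop omitted
print(word[1::2])   # cnmc: every 2nd character, starting at index 1
print(word[::-1])   # scimonocE: step -1 reverses the string

# The same syntax works for other sequences such as lists
numbers = list(range(5))
print(numbers[::-1])   # [4, 3, 2, 1, 0]
```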

#### Part 5 — Every second character

In [34]:
# Select every 2nd character, starting with the first one
s2 = s[::2]
s2

'NHNreinSho fEoois'

#### Part 6 — Select last character

In [35]:
# Select last element using -1
s2[-1]

's'

In [36]:
# Select last element using len()
s2[len(s2) - 1]

's'

<span style="display: none;">SolutionEnd</span>

***
# Exercise 3: Summing lists and arrays

In this exercise, we investigate an additional difference between built-in lists and NumPy arrays: performance.
You are asked to compare the speed of the built-in `sum()` function with that of NumPy's `np.sum()`.

1. Create a list `lst` and a NumPy array `arr`, each of them containing the sequence 
   of ten values `0, 1, 2, ..., 9`.

   *Hint:* You can use the list constructor [`list()`](https://www.w3schools.com/python/ref_func_list.asp)
   and combine it with the [`range()`](https://docs.python.org/3/library/functions.html#func-range)
   function which returns an object representing a range of integers.

   *Hint:* You should create the NumPy array using 
   [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html).

2. We want to compute the sum of integers contained in `lst` and `arr`. Use 
   the built-in function [`sum()`](https://www.w3schools.com/python/ref_func_sum.asp)
   to sum elements of a list.
   For the NumPy array, use the NumPy function 
   [`np.sum()`](https://numpy.org/doc/stable/reference/generated/numpy.sum.html).

3. You are interested in benchmarking which summing function is faster.
    Repeat the steps from above, but use the line magic
    [`%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)
    to time the execution of a statement as follows:

    ```python
    %timeit statement
    ```

4.  Recreate the list and array to contain 100 integers starting from 0,
    and rerun the benchmark.

5.  Recreate the list and array to contain 10,000 integers starting from 0,
    and rerun the benchmark.


What do you conclude about the relative performance of built-in lists 
vs. NumPy arrays?

### Solution

#### Part 1 — Create list and array

In [37]:
# create list with 10 elements 0,1,...,9
lst = list(range(10))

In [38]:
import numpy as np
# create array with 10 elements 0,1,...,9
arr = np.arange(10)

#### Part 2 — Sum using functions

In [39]:
# sum list using the built-in function sum()
sum(lst)

45

In [40]:
# sum the NumPy array using NumPy's sum() function
np.sum(arr)

np.int64(45)
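
Note that `np.sum()` returns a NumPy scalar rather than a built-in Python integer, which is why the output above is displayed as `np.int64(45)` (the exact integer type can depend on the platform and NumPy version). If a plain `int` is needed, the result can be converted as in this minimal sketch:

```python
import numpy as np

arr = np.arange(10)

total = np.sum(arr)
print(type(total))        # a NumPy integer type

# Convert the NumPy scalar to a built-in Python integer
total_int = int(total)
print(type(total_int))    # <class 'int'>

# The array method sum() gives the same result as np.sum()
print(arr.sum() == total)
```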

#### Part 3 — Benchmark with %timeit

In [41]:
# benchmark summing list using built-in sum()
%timeit sum(lst)

44.9 ns ± 0.597 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [42]:
# Benchmark summing array using NumPy's sum()
%timeit np.sum(arr)

1.15 μs ± 6.58 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


As you can see, for a short list the built-in `sum()` was faster by a factor of about 25 (the exact difference varies depending on your hardware and platform).

#### Part 4 — Benchmark 100 elements

In [43]:
# Recreate list and array to contain 100 integers starting at 0
N = 100
lst = list(range(N))
arr = np.arange(N)

In [44]:
# benchmark built-in sum() with 100 elements
%timeit sum(lst)

329 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [45]:
# benchmark NumPy's sum() with 100 elements
%timeit np.sum(arr)

1.16 μs ± 5.62 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


For 100 elements, the built-in `sum()` is still faster, but only by a factor of about 3.5 (again, the exact values depend on your platform). Note that the execution time for `np.sum()` remained almost unchanged, which suggests that the function call has a high fixed cost but scales much better with the number of elements to be summed.

#### Part 5 — Benchmark 10000 elements

In [46]:
# Recreate list and array to contain 10,000 integers starting at 0
N = 10000
lst = list(range(N))
arr = np.arange(N)

In [47]:
# benchmark built-in sum() with 10000 elements
%timeit sum(lst)

44.6 μs ± 852 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [48]:
# benchmark NumPy's sum() with 10000 elements
%timeit np.sum(arr)

1.73 μs ± 7.73 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


Lastly, for 10,000 elements `np.sum()` is substantially faster, by a factor of about 25.

You should conclude that for large arrays, you can expect much better performance from NumPy's functions, but this may not be true for small datasets.
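
Outside of a notebook, where `%timeit` is not available, a similar comparison can be sketched with the `timeit` module from the standard library. The number of repetitions and the dictionary `results` below are assumptions made for illustration, and the measured times will differ across machines.

```python
import timeit
import numpy as np

number = 1000   # Repetitions per measurement (arbitrary choice)
results = {}

for n in (10, 100, 10_000):
    lst = list(range(n))
    arr = np.arange(n)

    # Average time per call in seconds
    t_list = timeit.timeit(lambda: sum(lst), number=number) / number
    t_arr = timeit.timeit(lambda: np.sum(arr), number=number) / number
    results[n] = (t_list, t_arr)

    print(f'n = {n:>6}: sum(lst) {t_list * 1e6:8.2f} µs, '
          f'np.sum(arr) {t_arr * 1e6:8.2f} µs')
```

Unlike `%timeit`, this sketch uses a fixed number of repetitions and reports only the mean, so the results are somewhat noisier.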
<span style="display: none;">SolutionEnd</span>