# Type conversions

In the lecture, we discussed the basic built-in data types: integers, floating-point numbers, booleans, and strings. Python allows us to convert one type to another using the following functions:

- [`int()`](https://docs.python.org/3/library/functions.html#int) converts its argument to an integer.
- [`float()`](https://docs.python.org/3/library/functions.html#float) converts its argument to a floating-point number.
- [`bool()`](https://docs.python.org/3/library/functions.html#bool) converts its argument to a boolean.
- [`str()`](https://docs.python.org/3/library/stdtypes.html#str) converts its argument to a string.

These conversions mostly work in an intuitive fashion, with some exceptions.
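
As a rough illustration of what "intuitive, with some exceptions" can mean, the sketch below applies the four functions to a few values chosen only for this example (they are not part of the exercise). It also shows one exception: `int()` raises a `ValueError` when a string does not look like an integer.

```python
# Illustrative values only; none of these come from the exercise.
as_int = int("42")        # string of digits -> integer 42
as_float = float("3.5")   # numeric string -> float 3.5
as_str = str(10)          # integer -> string '10'
as_bool = bool(10)        # non-zero number -> True

print(as_int, as_float, as_str, as_bool)

# An exception to the intuitive behaviour: int() rejects non-numeric text.
try:
    int("abc")
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(conversion_failed)
```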

1.  Define a string variable `s` with the value `'1.1'`. Convert this variable to integer, float, and boolean.
2.  Define the string variables `s1`, `s2`, and `s3` with values `'True'`, `'False'`, and `''` (empty string), respectively. Convert each of these to boolean. Can you guess the conversion rule?
3.  Define a floating-point variable `x` with the value `0.9`. Convert this variable to integer, boolean, and string.
4.  Define the integer variables `i1` and `i2` with values `0` and `2`, respectively. Convert each of these variables to boolean.
5.  Define the boolean variables `b1` and `b2` with values `True` and `False`, respectively. Convert each of them to integer.
6.  NumPy arrays cannot be converted using `int()`, `float()`, etc. Instead, we have to use the
    method [`astype()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html)
    and pass the desired data type (e.g., `int`, `float`, `bool`) as an argument.
    Create a NumPy array called `arr` with elements `[0.0, 0.5, 1.0]` and convert it to integer
    and boolean type.


In [1]:
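# 1. Define a string s with the value '1.1'. Convert it to integer, float, and boolean.

s = "1.1"
print(int(float(s)))  # int(s) raises ValueError, so convert via float first
print(float(s))
print(bool(s))  # any non-empty string is True

1
1.1
True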


In [2]:
# 2. Define the string variables s1, s2, and s3 with values 'True', 'False', and '' (empty string), respectively.
# Convert each of these to boolean. Can you guess the conversion rule?

s1 = "True"
s2 = "False"
s3 = ""

# Converting to boolean
print(bool(s1))
print(bool(s2))
print(bool(s3))

# The conversion rule is that any non-empty string is converted to True and the empty string is converted to False.

True
True
False
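
Because `bool()` only checks whether a string is empty, it cannot be used to read the text `'False'` as a boolean. One possible workaround, sketched here with a hypothetical helper `parse_bool` (not a built-in function), is to compare the text against the accepted spellings explicitly.

```python
def parse_bool(text):
    """Hypothetical helper: interpret the text 'True' or 'False' as a boolean."""
    cleaned = text.strip().lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    raise ValueError(f"Cannot interpret {text!r} as a boolean")


print(bool("False"))        # non-empty string, so this is True
print(parse_bool("False"))  # compares the content, so this is False
```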


In [3]:
# 3. Define a floating-point variable x with the value 0.9. Convert this variable to integer, boolean, and string.

x = 0.9

# Converting to integer
print(int(x))

# Converting to boolean
print(bool(x))

# Converting to string
print(str(x))

0
True
0.9
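
The result `0` above reflects that `int()` drops the fractional part instead of rounding. A minimal sketch contrasting this with the built-in `round()`, using an additional negative value chosen only for illustration:

```python
x = 0.9

truncated = int(x)     # drops the fractional part -> 0
rounded = round(x)     # rounds to the nearest integer -> 1
negative = int(-0.9)   # truncation moves toward zero, so this is also 0

print(truncated, rounded, negative)
```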


In [4]:
# 4. Define the integer variables i1 and i2 with values 0 and 2, respectively. Convert each of these variables to boolean.

i1 = 0
i2 = 2

# Converting to boolean
print(bool(i1))  # 0 is converted to False
print(bool(i2))  # Any non-zero integer is converted to True

False
True


In [5]:
# 5. Define the boolean variables b1 and b2 with values True and False, respectively. Convert each of them to integer.

b1 = True
b2 = False

# Converting to integer
print(int(b1))  # True is converted to 1
print(int(b2))  # False is converted to 0

1
0


In [9]:
# 6. NumPy arrays cannot be converted using int(), float(), etc. Instead, we have to use the method astype() and pass the desired data type (e.g., int, float, bool) as an argument.
# Create a NumPy array called arr with elements [0.0, 0.5, 1.0] and convert it to integer and boolean type.
import numpy as np

arr = np.array([0.0, 0.5, 1.0])

# Converting to integer
print(arr.astype(int))

# Converting to boolean
print(arr.astype(bool))

[0 0 1]
[False  True  True]
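
Note that `astype()` returns a new array and leaves the original untouched. The following sketch checks this by inspecting the `dtype` attribute; the variable names `arr_int` and `arr_bool` are introduced here only for illustration.

```python
import numpy as np

arr = np.array([0.0, 0.5, 1.0])

arr_int = arr.astype(int)    # new array; fractional parts are truncated
arr_bool = arr.astype(bool)  # new array; only 0.0 becomes False

print(arr.dtype)       # the original array is still floating-point
print(arr_int.dtype)   # an integer type (the exact width depends on the platform)
print(arr_bool.dtype)
```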


# Working with strings

Strings in Python are full-fledged objects, i.e., they contain both the character data and additional functionality implemented via functions, so-called _methods_.
The official [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods)
provides a comprehensive list of these methods. For our purposes, the most important are:

- [`str.lower()`](https://docs.python.org/3/library/stdtypes.html#str.lower)
  and
  [`str.upper()`](https://docs.python.org/3/library/stdtypes.html#str.upper)
  convert the string to lower or upper case, respectively.
- [`str.strip()`](https://docs.python.org/3/library/stdtypes.html#str.strip)
  removes any leading or trailing whitespace characters from a string.
- [`str.count()`](https://docs.python.org/3/library/stdtypes.html#str.count)
  returns the number of occurrences of a substring within a string.
- [`str.startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith)
  and
  [`str.endswith()`](https://docs.python.org/3/library/stdtypes.html#str.endswith)
  check whether a string starts or ends with a given substring.

Moreover, strings are also sequences, and as such support indexing in the
same way as lists or tuples.
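
The methods and the indexing behaviour listed above can be tried out on a small example. The string used here is an arbitrary illustration, not the one from the tasks that follow.

```python
text = "  Hello, World  "   # illustrative string only

stripped = text.strip()              # 'Hello, World'
print(stripped.lower())              # 'hello, world'
print(stripped.upper())              # 'HELLO, WORLD'
print(stripped.count("l"))           # 3
print(stripped.startswith("Hello"))  # True
print(stripped.endswith("World"))    # True

# Indexing works as for lists and tuples.
print(stripped[0])    # first character: 'H'
print(stripped[-1])   # last character: 'd'
```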

Create a string variable with the value

```python
s = '  NHH Norwegian School of Economics  '
```

and perform the following tasks:

1. Strip any surrounding spaces from the string using `strip()`.
2. Count the number of `'H'` in the string.
3. Modify your code so that it is case-insensitive, i.e., both instances of
   `'h'` and `'H'` are counted.
4. Reverse the string, i.e., the last character should come first, and so on.
5. Create a new string which contains every 2nd letter from the original.
6. Select the last character from this new string using at least two different methods.


In [10]:
# Create a string variable
s = "NHH Norwegian School of Economics"

In [11]:
# 1. Strip any surrounding spaces from the string using the strip() method.
print(s.strip())

NHH Norwegian School of Economics


In [15]:
# 2. Count the number of 'H' in the string
print(s.count("H"))

2


In [17]:
# 3. Case-insensitive count of the number of 'h' in the string
print(s.lower().count("h"))

3


In [19]:
# 4. Reverse the string, i.e., the last character becomes the first, and so on.
print(s[::-1])

scimonocE fo loohcS naigewroN HHN


In [22]:
# 5. Create a new string which contains every 2nd letter from the original.
x = s[::2]
print(x)

NHNreinSho fEoois
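
Both `s[::-1]` and `s[::2]` are instances of the general slice notation `[start:stop:step]`, where omitted parts take default values. A short sketch on an arbitrary example string (not the one from the exercise):

```python
letters = "abcdefg"   # illustrative string only

every_second = letters[::2]     # step 2 from the start: 'aceg'
every_second_odd = letters[1::2]  # step 2 starting at index 1: 'bdf'
reversed_letters = letters[::-1]  # negative step walks backwards: 'gfedcba'
middle = letters[2:5]           # indices 2, 3, 4: 'cde'

print(every_second, every_second_odd, reversed_letters, middle)
```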


In [24]:
# 6. Select the last character from this new string using at least two different methods.

# Method 1: negative index
print(x[-1])

# Method 2: index computed from the length
print(x[len(x) - 1])

s
s


In [27]:
# Check the start and end of the string with startswith() and endswith()

# Does the string start with 'NHH'?
print(s.startswith("NHH"))

# Does the string end with 'Economics'?
print(s.endswith("Economics"))

True
True


In [28]:
# Using the timeit module to measure the time taken to execute code

# Measure the total time (in seconds) for 10,000 runs of creating a list of numbers from 0 to 999 using range().
import timeit

print(timeit.timeit("list(range(1000))", number=10000))

0.09177149999959511


# Summing lists and arrays

In this exercise, we investigate an additional difference between built-in lists and NumPy arrays: performance.
You are asked to investigate performance differences for different implementations of the `sum()` function.

1. Create a list `lst` and a NumPy array `arr`, each of them containing the sequence
   of ten values `0, 1, 2, ..., 9`.

   _Hint:_ You can use the list constructor [`list()`](https://www.w3schools.com/python/ref_func_list.asp)
   and combine it with the [`range()`](https://docs.python.org/3/library/functions.html#func-range)
   function, which returns an object representing a range of integers.

   _Hint:_ You should create the NumPy array using
   [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html).

2. We want to compute the sum of integers contained in `lst` and `arr`. Use
   the built-in function [`sum()`](https://www.w3schools.com/python/ref_func_sum.asp)
   to sum elements of a list.
   For the NumPy array, use the NumPy function
   [`np.sum()`](https://numpy.org/doc/stable/reference/generated/numpy.sum.html).

3. You are interested in benchmarking which summing function is faster.
   Repeat the steps from above, but use the line magic
   [`%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit)
   to time the execution of a statement as follows:

   ```python
   %timeit statement
   ```

4. Recreate the list and array to contain 100 integers starting from 0,
   and rerun the benchmark.

5. Recreate the list and array to contain 10,000 integers starting from 0,
   and rerun the benchmark.

What do you conclude about the relative performance of built-in lists
vs. NumPy arrays?


In [29]:
# 1. Create a list and a NumPy array, each containing the sequence of 10 values 0, 1, ..., 9.
import numpy as np

lst = list(range(10))
arr = np.arange(10)

print(lst)
print(arr)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0 1 2 3 4 5 6 7 8 9]


In [30]:
# 2. Compute the sum of integers contained in lst and arr.

# Sum of integers in lst
print(sum(lst))

# Sum of integers in arr
print(np.sum(arr))

45
45


In [32]:
# 3. Benchmark which summing function is faster using %timeit.
# %timeit prints its result and returns None unless the -o option is given,
# so there is nothing useful to assign or print afterwards.

# Sum of integers in lst
%timeit sum(lst)

# Sum of integers in arr
%timeit np.sum(arr)

73.1 ns ± 1.58 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
1.66 μs ± 7.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [34]:
# 3. Alternative solution using the timeit module (prints total seconds for 10,000 runs)
import timeit as ti

# Sum of integers in lst
time_lst = ti.timeit("sum(lst)", globals=globals(), number=10000)

# Sum of integers in arr
time_arr = ti.timeit("np.sum(arr)", globals=globals(), number=10000)

print(time_lst)
print(time_arr)

0.0007499000003008405
0.016937899999902584
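
The values printed by `timeit.timeit()` are total seconds for all repetitions, whereas `%timeit` reports the time per loop. To make the two comparable, the total can be divided by the number of repetitions. The sketch below does this; it passes an explicit `namespace` dictionary (a name introduced here) instead of `globals()`, and the measured values will differ from machine to machine.

```python
import timeit

import numpy as np

lst = list(range(10))
arr = np.arange(10)
number = 10_000  # repetitions per measurement, as in the cell above

# Names that the timed statements are allowed to see.
namespace = {"lst": lst, "arr": arr, "np": np}

# timeit.timeit() returns the total time in seconds for all repetitions.
total_lst = timeit.timeit("sum(lst)", globals=namespace, number=number)
total_arr = timeit.timeit("np.sum(arr)", globals=namespace, number=number)

# Divide by the number of repetitions and convert seconds to nanoseconds.
per_loop_lst_ns = total_lst / number * 1e9
per_loop_arr_ns = total_arr / number * 1e9

print(f"sum(lst):    {per_loop_lst_ns:.1f} ns per loop")
print(f"np.sum(arr): {per_loop_arr_ns:.1f} ns per loop")
```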


In [46]:
# 4. Recreate the list and array to contain 100 integers starting from 0, and rerun the benchmark.
lst = list(range(100))
arr = np.arange(100)

%timeit sum(lst)
%timeit np.sum(arr)

346 ns ± 7.05 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.72 μs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [48]:
# Alternative solution
time_lst = ti.timeit("sum(lst)", globals=globals(), number=10000)
time_arr = ti.timeit("np.sum(arr)", globals=globals(), number=10000)

print(time_lst)
print(time_arr)

0.003518200000144134
0.018346600000768376


In [51]:
# 5. Recreate the list and array to contain 10,000 integers starting from 0, and rerun the benchmark.
N = 10_000

lst = list(range(N))
arr = np.arange(N)

%timeit sum(lst)
%timeit np.sum(arr)

25.2 μs ± 898 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
3.41 μs ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [55]:
# Alternative solution (N = 10,000 is reused here as the number of repetitions)
time_lst = ti.timeit("sum(lst)", globals=globals(), number=N)
time_arr = ti.timeit("np.sum(arr)", globals=globals(), number=N)

print(time_lst)
print(time_arr)

0.24967480000032083
0.030575499999940803
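
Taken together, the timings above suggest that the built-in `sum()` on a list is faster for 10 and 100 elements, whereas `np.sum()` on an array is several times faster for 10,000 elements. The exact numbers depend on the machine. One way to see the crossover in a single run is to loop over the three sizes and report the ratio of the two timings, as in this sketch (the repetition count of 1,000 and the names `results` and `namespace` are arbitrary choices for illustration):

```python
import timeit

import numpy as np

number = 1_000  # repetitions per measurement (arbitrary choice for this sketch)
results = {}

for size in (10, 100, 10_000):
    lst = list(range(size))
    arr = np.arange(size)
    namespace = {"lst": lst, "arr": arr, "np": np}

    time_lst = timeit.timeit("sum(lst)", globals=namespace, number=number)
    time_arr = timeit.timeit("np.sum(arr)", globals=namespace, number=number)

    # A ratio above 1 means the built-in sum() on the list was slower.
    results[size] = time_lst / time_arr
    print(f"size={size:>6}: list/array time ratio = {results[size]:.2f}")
```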
