<h2><center>Week 3 - Assignment</center></h2>
<h3><center>Programming for Data Science 2024</center></h3>

Exercises for the topics covered in the third lecture.

The exercise will be marked as passed if you get **at least 10/15** points.

Exercises must be handed in via **ILIAS** (Homework assignments). Deliver your submission as a compressed file (zip) containing one .py or .ipynb file with all exercises. Name both the .zip and the .py/.ipynb file after the two members of the group, in *SurnameName* format. Example: Riccardo Cusinato + Athina Tzovara = *CusinatoRiccardo_TzovaraAthina.zip*.

It's important to use comments to explain your code and show that you're able to take ownership of the exercises and discuss them.

You are not expected to collaborate outside of the group on exercises, and submitting other groups’ code as your own will result in 0 points.

For questions contact: *riccardo.cusinato@unibe.ch* with the subject: *Programming for Data Science 2024*.

**Deadline: 14:00, March 14, 2024.**

<h3 style="text-align:left;">Exercise 1 - Error investigation<span style="float: right">2 points</span></h3>

The code below squares and sums the numbers in the array *arr*, and holds the result in the variable *squared_sum*, which should be 1135. However, that is not the case. Correct the code and explain in a comment, clearly and amply, what was wrong.

In [74]:
import numpy as np

arr = np.array([13, 14, 15, 16, 17], dtype=np.int16)
squared_sum = np.sum(arr ** 2)
squared_sum

1135

In [75]:
###
# Correction: changed dtype from int8 to int16 (min = -32'768, max = 32'767).
# In NumPy, integers have a fixed size. The datatype int8 uses 8 bits, so it can only store integers
# from -128 to 127. Every square here (169 to 289) is already too big for int8, so arr ** 2 overflows
# before the sum is computed. int16 is wide enough for both the squares and the sum (1135).
###
print(np.iinfo(np.int8))
print(np.iinfo(np.int16))


Machine parameters for int8
---------------------------------------------------------------
min = -128
max = 127
---------------------------------------------------------------

Machine parameters for int16
---------------------------------------------------------------
min = -32768
max = 32767
---------------------------------------------------------------
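
To make the overflow concrete, here is a minimal sketch that reproduces the faulty result. It assumes, as the correction comment states, that the original array used `dtype=np.int8`; the names `arr_int8` and `wrapped` are only illustrative.

```python
import numpy as np

# Assumption: the uncorrected array was declared with dtype=np.int8.
arr_int8 = np.array([13, 14, 15, 16, 17], dtype=np.int8)

# Each square exceeds 127, so the stored values wrap around modulo 256.
wrapped = arr_int8 ** 2
print(wrapped.tolist())  # [-87, -60, -31, 0, 33]

# np.sum accumulates small signed integers in the platform integer,
# so the sum itself does not overflow, but it adds up the wrapped values.
print(np.sum(wrapped))  # -145

# Casting to a wider type before squaring gives the intended result.
print(np.sum(arr_int8.astype(np.int16) ** 2))  # 1135
```

Note that NumPy does not raise an error for this kind of array overflow, which is why checking `np.iinfo` for the chosen dtype is worthwhile.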


<h3 style="text-align:left;">Exercise 2 - Vacation selector<span style="float: right">3 points</span></h3>

The code below defines five vacation destinations (*locations*) and four attributes for each (*attributes*). Each row describes one destination, and the columns represent scores on the factors scenery, activities, food, and nightlife.

Write a function *vacation_advisor* that asks the user whether they find each of the attributes important or not, and suggests the best vacation spot based on these preferences.

Use techniques from the third lecture to solve the exercise.

Example interaction:
```python
Is scenery important to you [y/n]?    > y
Is activities important to you [y/n]? > y
Is food important to you [y/n]?       > n
Is nightlife important to you [y/n]?  > n
Based on your preferences, the best destination is Australia
```

In [76]:
# List of destinations
locations = np.array(["Hawaii", "Thailand", "Italy", "Australia", "Japan"])

# List of attributes for each destination. Each column is an attribute. Each row a destination.
attributes = np.array([
    [8, 8, 7, 6],
    [7, 9, 8, 7],
    [8, 6, 9, 7],
    [9, 8, 8, 6],
    [7, 9, 7, 8]
])

# Declare attribute names and initialize boolean array with preferences
attribute_names = ['scenery', 'activities', 'food', 'nightlife']


In [77]:
###
# YOUR CODE GOES HERE
###
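
One possible approach, sketched here under assumptions rather than given as the reference solution, is to turn the answers into a boolean array and use it to select columns of *attributes*. The `ask` parameter is a hypothetical hook (defaulting to `input`) added only so the example interaction can be replayed without typing.

```python
import numpy as np

locations = np.array(["Hawaii", "Thailand", "Italy", "Australia", "Japan"])
attributes = np.array([
    [8, 8, 7, 6],
    [7, 9, 8, 7],
    [8, 6, 9, 7],
    [9, 8, 8, 6],
    [7, 9, 7, 8],
])
attribute_names = ['scenery', 'activities', 'food', 'nightlife']


def vacation_advisor(ask=input):
    """Ask about each attribute and return the best-scoring destination.

    `ask` is a hypothetical hook (defaulting to input) so the function
    can be tried without typing answers by hand.
    """
    # Boolean array: True where the user answered 'y'.
    preferences = np.array([
        ask(f"Is {name} important to you [y/n]? > ").strip().lower() == 'y'
        for name in attribute_names
    ])
    # Boolean indexing keeps only the important columns; sum per destination.
    scores = attributes[:, preferences].sum(axis=1)
    best = locations[np.argmax(scores)]  # the first destination wins a tie
    print(f"Based on your preferences, the best destination is {best}")
    return best


# Replay the example interaction (y, y, n, n) without keyboard input.
answers = iter(['y', 'y', 'n', 'n'])
result = vacation_advisor(ask=lambda prompt: next(answers))
```

With scenery and activities selected, the summed scores are 16, 16, 14, 17 and 16, so Australia is suggested, as in the example interaction.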

<h3 style="text-align:left;">Exercise 3 - Indexing<span style="float: right">3 points</span></h3>

You have two arrays of the same length: temperature, *temp*, and humidity, *rh*. Write a program that:
1) Substitutes the values of *temp* for which the corresponding value of *rh* is less than 0.3 with *np.nan*.
2) Calculates the mean value of this new temperature array (do **not** calculate it on the original array).

As an example:

```python
temp = [70, 80, 90]
rh = [0.5, 0.2, 0.6]

temp_nan --> [70, np.nan, 90]
temp_avg --> 80
```

In [78]:
# Generate some surrogate data

np.random.seed(29041996)  # Make sure we all have the same data
temp = 20 * np.cos(np.linspace(0, 2 * np.pi, 100)) + 80 + 2 * np.random.randn(100)
rh = np.abs(0.1 * np.cos(np.linspace(0, 4 * np.pi, 100))
            + 0.3 + 0.05 * np.random.randn(100))

In [79]:
#print(temp)
rh_mask = rh < 0.3  # creates a boolean mask where the element at index i is True if rh[i] is smaller than 0.3
temp_nan = temp.copy()  # copy temp array to avoid changing original array in case we need the original data
temp_nan[rh_mask] = np.nan  # replace every value with np.nan if the corresponding index in rh is < 0.3

np.nanmean(temp_nan)  # calculate the mean, nanmean() is used to ignore the nan values, using mean() would return nan

79.00078388998652

<h3 style="text-align:left;">Exercise 4 - Base converter<span style="float: right">2 points</span></h3>

Write a function *int_to_bin* that takes a positive integer as input and returns the binary equivalent of that integer.

You can **not** use built-in methods such as *bin()* in your solution.

In [134]:
def int_to_bin(number: int) -> str:
    """Converts a non-negative integer to its binary representation.
    Returns a string prefixed with '0b'."""
    if number < 0:  # reject negative input
        raise ValueError('provided number must not be negative')

    bin_rep: str = '0b'  # start the binary representation with 0b, which signals the number is in binary
    if number == 0:  # guard clause if the number is 0, since this would yield an incorrect result otherwise
        bin_rep += '0'
        return bin_rep

    digits: int = bin_digits(number)  # calculate how many digits the binary representation will have

    for position in range(digits - 1, -1, -1):
        # for each digit, from most to least significant, check if the digit will be 0 or 1.
        position_value: int = 2 ** position
        # the quotient is 1 if the remaining number is larger than or equal to position_value, otherwise 0
        bin_rep += str(number // position_value)
        number = number % position_value  # calculate the remainder which still has to be converted to binary
    return bin_rep


def bin_digits(number: int) -> int:
    """Calculates how many digits the binary representation of a positive integer will have.
    Is equivalent to floor(log2(number)) + 1, but without using built-in methods."""
    position: int = 0
    while number >= 2 ** position:
        position += 1
    return position


test = 17
to_bin: str = int_to_bin(test)
print("int_to_bin: " + to_bin)
print("expected:   " + bin(test))

int_to_bin: 0b10001
expected:   0b10001


<h3 style="text-align:left;">Exercise 5 - Broadcasting<span style="float: right">2 points</span></h3>

Reshape *a* so it is possible to multiply *a* and *b*, and explain why you had to reshape *a* to be able to multiply the two arrays.

In [81]:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([2, 3])

###
# YOUR CODE GOES HERE
###

In [82]:
###
# YOUR COMMENT HERE
###
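
As a hedged illustration of the broadcasting rule involved: shapes are compared from the trailing dimension backwards, and each pair of dimensions must be equal or one of them must be 1. The sketch below uses `reshape(3, 2)`, which is one possible choice; `a.T` also has shape (3, 2) but pairs the elements differently. The names `a_reshaped` and `product` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([2, 3])                  # shape (2,)

# a * b fails: the trailing dimensions 3 and 2 are neither equal nor 1.
a_reshaped = a.reshape(3, 2)          # shape (3, 2), trailing dimension is now 2
product = a_reshaped * b              # b is stretched across the 3 rows
print(product.tolist())  # [[2, 6], [6, 12], [10, 18]]
```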

<h3 style="text-align:left;">Exercise 6 - Moving average<span style="float: right">3 points</span></h3>

Given the array of values, *a*, we can calculate the moving average by averaging nearby values and repeating the procedure sliding along the array. Here's an example of a 3-point moving average (ignoring the edges), with a for loop:

In [83]:
a = np.round(30 + np.random.randn(20) * 2, 1)
print(a)

# Moving average
a_avg = np.zeros_like(a)
# We're just ignoring the edge effects here
for i in range(1, len(a) - 1):
    sub = a[i - 1:i + 2]
    a_avg[i] = sub.mean()
# For the first and last point, we use the original values.
a_avg[[0, -1]] = a[[0, -1]]
print(a_avg)

[35.3 30.8 32.2 29.8 28.7 28.2 33.6 31.3 28.6 31.3 28.5 28.6 30.8 29.4
 31.7 31.9 31.2 29.3 30.7 33.3]
[35.3        32.76666667 30.93333333 30.23333333 28.9        30.16666667
 31.03333333 31.16666667 30.4        29.46666667 29.46666667 29.3
 29.6        30.63333333 31.         31.6        30.8        30.4
 31.1        33.3       ]


Write a function *mov_avg* that takes an array as input and returns its 3-point moving average. You **have to use broadcasting** to compute the moving average. As in the example, use the original array values at the borders.

In [84]:
###
# YOUR CODE GOES HERE
###
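
A minimal sketch of one way to do this, not necessarily the intended solution: broadcasting a column of centre indices against a row of offsets builds all 3-point windows at once. The helper names `centres`, `offsets`, `windows` and `example` are illustrative, and the example values are the first five entries of the printed array above.

```python
import numpy as np


def mov_avg(a):
    """3-point moving average; the two border values are copied unchanged."""
    a = np.asarray(a, dtype=float)
    a_avg = a.copy()  # borders keep the original values
    if a.size < 3:
        return a_avg
    # Broadcasting a column of centres (n-2, 1) against a row of offsets (3,)
    # yields an (n-2, 3) index array: one window of neighbours per row.
    centres = np.arange(1, a.size - 1)[:, np.newaxis]
    offsets = np.array([-1, 0, 1])
    windows = a[centres + offsets]
    a_avg[1:-1] = windows.mean(axis=1)
    return a_avg


example = np.array([35.3, 30.8, 32.2, 29.8, 28.7])
print(mov_avg(example))
```

The interior values agree with the for-loop version shown earlier, for instance (35.3 + 30.8 + 32.2) / 3 ≈ 32.77 at index 1.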