[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Humboldt-WI/bads/blob/master/exercises/1_ex_python.ipynb) 

# Exercises on Python programming
We covered a lot of concepts in the first [tutorial on Python programming](https://github.com/Humboldt-WI/bads/blob/master/tutorials/1_nb_python_intro.ipynb). Solving the exercises allows you to test your familiarity with these concepts.  

## Variables, assignments, and comparisons

1. Create two variables $a$ and $b$ and assign values of $3$ and $4.5$.

In [None]:
a = 3
b = 4.5

2. Query the type of variable $a$.

In [None]:
type(a)

int

3. Check whether variable $b$ is a text variable (i.e., of type `str`).

In [None]:
type(b) == str

False

In [None]:
# and as an alternative
isinstance(b, str)

False
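As a small illustration of the difference between the two checks above, the sketch below tests against several classes at once. The variable names `is_str_exact` and `is_number` are made up for this example.

```python
a = 3
b = 4.5

# type() compares against exactly one class
is_str_exact = type(b) == str

# isinstance() accepts a tuple of classes and also respects inheritance
is_number = isinstance(b, (int, float))

print(is_str_exact)  # False
print(is_number)     # True
```

`isinstance()` is usually the more flexible choice because it also returns `True` for subclasses of the given classes.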

4. Calculate $a^2 + \frac{1}{b}$, $\sqrt{a*b}$, and $\log_2(a)$.

In [None]:
a ** 2 + 1/b

9.222222222222221

In [None]:
import numpy as np
np.sqrt(a * b)

# we import numpy to use numpy.sqrt() function
# can alternatively be computed using (a * b) ** (1/2)

3.6742346141747673

In [None]:
np.log2(a)

1.584962500721156
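For scalar inputs, the same results can also be obtained without NumPy. The following is a minimal sketch using the standard library's `math` module; the variable names `root`, `power_root`, and `log2_a` are chosen for illustration only.

```python
import math

a = 3
b = 4.5

# square root via the math module and via exponentiation
root = math.sqrt(a * b)
power_root = (a * b) ** 0.5

# base-2 logarithm
log2_a = math.log2(a)

print(root, power_root, log2_a)
```

The `math` functions work on single numbers only, whereas the NumPy versions also accept whole arrays.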

## Matrix algebra
Create three additional variables as follows:

 $$ A = \left( \begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{matrix} \right) \quad
  B = \left( \begin{matrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{matrix} \right)  \quad
  y = \left( \begin{matrix} 1 \\ 2 \\ 3 \end{matrix} \right) $$

Perform the following operations. Note that mathematical operators like `*` might not behave the way you need them to. Wasn't there a powerful library for all sorts of numerical computations including classic linear algebra?

Calculate  

  1. $a*A$

In [None]:
A = np.array(([1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 10]))
a * A

# when multiplying a matrix by a scalar, we can use the operator * to perform element-wise multiplication

array([[ 3,  6,  9],
       [12, 15, 18],
       [21, 24, 30]])

  2. $A*B$

In [None]:
B = np.array(([1, 4, 7], 
              [2, 5, 8], 
              [3, 6, 9]))
np.matmul(A, B)

# matrix multiplication is performed using numpy.matmul()

array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 53, 128, 203]])
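To make the difference between the two kinds of multiplication concrete, here is a short sketch that contrasts `*` with the `@` operator, which is equivalent to `np.matmul()` for 2-D arrays. The names `elementwise` and `matrix_product` are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]])
B = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

# * multiplies entries at the same position (element-wise product)
elementwise = A * B

# @ performs matrix multiplication (rows of A times columns of B)
matrix_product = A @ B

print(elementwise)
print(matrix_product)
```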

  3. The inverse of matrix $A$ and store the result in a variable $invA$. Any ideas how to get Python to invert a matrix?

> Hint: NumPy is your friend.  

In [None]:
# Option 1 (standard approach)
invA = np.linalg.inv(A)
print(invA)

# Option 2 (for the curious reader)
invA = np.linalg.solve(A, np.eye(A.shape[0]))  # Look into the documentation of numpy.linalg.solve() to see what happens

print(invA)

array([[-0.66666667, -1.33333333,  1.        ],
       [-0.66666667,  3.66666667, -2.        ],
       [ 1.        , -2.        ,  1.        ]])
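Why does the second option work? `np.linalg.solve(A, X)` returns the solution $Z$ of the linear system $AZ = X$. If $X$ is the identity matrix, the solution is $A^{-1}$ by definition. The sketch below compares both routes; the variable names are assumptions made for this example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]], dtype=float)

# identity matrix with as many rows as A
identity = np.eye(A.shape[0])

# solving A Z = I yields Z = A^{-1}
invA_solve = np.linalg.solve(A, identity)
invA_direct = np.linalg.inv(A)

# compare up to floating-point tolerance
print(np.allclose(invA_solve, invA_direct))
```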

  4. Multiply $A$ and $invA$ and verify that the result is the identity matrix (i.e., 1s on the diagonal and 0s elsewhere). You'll probably find that it isn't, because computers usually make very small rounding errors when handling real numbers. The reason is interesting, but you'll have to look it up if you're interested.

In [None]:
res = np.matmul(A, invA)
res

# as you can see below, some matrix elements are not exactly zero

array([[1.00000000e+00, 0.00000000e+00, 1.11022302e-16],
       [0.00000000e+00, 1.00000000e+00, 2.22044605e-16],
       [8.88178420e-16, 0.00000000e+00, 1.00000000e+00]])

In [None]:
(res == np.eye(3)).all()

# res == np.eye(3) checks if values in two arrays are equal element-wise
# .all() is used to check if condition holds across all array elements

False
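Because of these rounding errors, floating-point arrays are usually compared up to a small tolerance rather than for exact equality. One way to do this, sketched below, is `np.allclose()`; the name `close_match` is illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]], dtype=float)

res = A @ np.linalg.inv(A)

# True if all entries agree with the identity matrix up to a small tolerance
close_match = np.allclose(res, np.eye(3))
print(close_match)
```

The tolerances can be adjusted through the `rtol` and `atol` arguments of `np.allclose()`.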

  5. The transpose of matrix $B$

In [None]:
np.transpose(B)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

  6. Fill the first row of matrix $B$ with ones

In [None]:
B[0,:] = np.ones(3)
B

array([[1, 1, 1],
       [2, 5, 8],
       [3, 6, 9]])

  7. Calculate the ordinary least squares estimator $\beta$ (i.e. a standard regression) 
$$ \beta = (A^{\top}A)^{-1}A^{\top} y $$  

In [None]:
y = np.array([1, 2, 3])  # 1-D array; transposing a 1-D array has no effect, so np.transpose() is not needed

# let's split our equation into two terms and compute beta step by step

term1 = np.linalg.inv(np.matmul(np.transpose(A), A))
term2 = np.matmul(np.transpose(A), y)
beta  = np.matmul(term1, term2)
beta

array([-3.33333333e-01,  6.66666667e-01, -6.92779167e-14])
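The explicit formula is good for learning, but inverting $A^{\top}A$ can amplify rounding errors. As a hedged alternative, the sketch below uses `np.linalg.lstsq()`, which solves the same least squares problem without forming the inverse. The variable names are assumptions for this example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]], dtype=float)
y = np.array([1, 2, 3], dtype=float)

# textbook formula: (A^T A)^{-1} A^T y
beta_normal = np.linalg.inv(A.T @ A) @ A.T @ y

# least squares solver; returns the solution plus diagnostic information
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

Both approaches should agree up to floating-point precision for this small, well-behaved example.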

## Indexing
1. Look at values of variables $A$, $B$, and $y$ from the last exercise

In [None]:
print(A)
print('')
print(B)
print('')
print(y)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8 10]]

[[1 1 1]
 [2 5 8]
 [3 6 9]]

[1 2 3]


2. Access the second element in the third row of $A$ and the first element in the second row of $B$, and compute their product

In [None]:
A[2, 1] * B[1, 0]

# remember that numbering in Python starts at 0

16

3. Multiply the first row of $A$ and the third column of $B$

In [None]:
np.matmul(A[0, :], B[:, 2])

44

4. Access the elements of y that are greater than 1 (without looking up their position manually)

In [None]:
y[y > 1]

array([2, 3])

5. Access the elements of A in the second column, for which the values in the first column are greater than or equal to 4.

In [None]:
A[:,1][A[:,0] >= 4]

array([5, 8])
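The same selection can be written in a single indexing step by passing the boolean mask as the row index and the column number as the column index. This is a minimal sketch; `mask` and `selected` are illustrative names.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]])

# boolean mask: one True/False entry per row of A
mask = A[:, 0] >= 4

# rows where the mask is True, second column only
selected = A[mask, 1]

print(mask)      # [False  True  True]
print(selected)  # [5 8]
```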

6. Access the 4th row of A. If this returns an error message, use Google to investigate the problem and find out what went wrong.

In [None]:
A[3, :]

# an IndexError is raised when trying to access an index that is outside of the array dimensions
# in our case, A only has three rows, so trying to access the 4th row results in an error

IndexError: index 3 is out of bounds for axis 0 with size 3
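If you want your program to keep running when such an error occurs, you can catch it. The following sketch shows one possible pattern with `try`/`except`, and also how a negative index retrieves the last row; the names `message` and `last_row` are made up for this example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]])

try:
    row = A[3, :]          # valid row indices are 0, 1, and 2
except IndexError as err:
    message = str(err)
    print('Caught IndexError:', message)

# negative indices count from the end, so -1 is the last row
last_row = A[-1, :]
print(last_row)
```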

## Custom functions
For many statistical applications it is practical to standardize variable values. One way to standardize is *centering and scaling*. In simple words, we make the variables comparable by reducing them to the same scale.

Start with implementing a custom function. Your function should take an argument **x**. To keep things simple, we expect x to always be a numeric vector (and not text or a matrix, for example). In the body of the function, calculate the mean and standard deviation of **x**. Store the results in variables **mu** and **std**, respectively. Then, for each element in the vector, subtract the mean and divide by the standard deviation.
$$ x_{new} = \frac{x-\mu}{std}$$
Make sure your function **returns** the standardized vector (i.e., $x_{new}$ in the equation) as its result. You might want to import `NumPy` for calculating the mean and standard deviation.

In [None]:
def standardize(x):
  '''
  Subtracts mean and divides by standard deviation

  Input: 
  - x (numeric numpy array)

  Output:
  - x_new (standardized numeric numpy array)
  '''

  import numpy as np

  mu  = np.mean(x)
  std = np.std(x)

  x_new = (x - mu) / std

  return x_new

  # when defining a function, it is always useful to provide a docstring, which is
  # a comment snippet in the following format: '''comment'''. It is common practice
  # to specify the goal of the function as well as its inputs, outputs and arguments.

You should always test your functions. Create a vector **x** with the elements (-100, -25, -10, 0, 10, 25, 100) and check if your function produces the correct result.

In [None]:
x = np.array([-100, -25, -10, 0, 10, 25, 100])
standardize(x)
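How do you know the result is correct? A standardized vector should have a mean of (approximately) 0 and a standard deviation of (approximately) 1. The sketch below checks both properties using a compact re-implementation of the function so that the snippet runs on its own.

```python
import numpy as np

def standardize(x):
    '''Subtracts mean and divides by standard deviation'''
    return (x - np.mean(x)) / np.std(x)

x = np.array([-100, -25, -10, 0, 10, 25, 100])
x_new = standardize(x)

# compare with a tolerance because of floating-point rounding
mean_ok = np.isclose(x_new.mean(), 0.0)
std_ok = np.isclose(x_new.std(), 1.0)

print(mean_ok, std_ok)
```

Note that `np.std()` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` if you need the sample standard deviation.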

*Optional*: Create a vector **b** with elements ("1", "2", "3") and check the function. Let's include a simple check in the function and give feedback. Before doing any calculations, use an `if` statement and `type()` to check if the input is a numeric vector. There are many ways to code the condition *x is numeric* in Python. Run a quick web search and use a simple approach. If the input is not numeric, skip the computations and print a message "input not numeric".

In [None]:
b = np.array(["1", "2", "3"])
standardize(b)

# does not work because b is not a numeric array

In [None]:
# when working with numpy arrays, one way to check if input is numeric is to use 
# the .dtype attribute to query the array type, and use np.issubdtype() to check if the type
# belongs to a parent type (see https://numpy.org/doc/stable/reference/arrays.scalars.html 
# for the hierarchy). In our case, we will check if the array's type is a subtype of np.number

def standardize(x):
  '''
  Subtracts mean and divides by standard deviation

  Input: 
  - x (numeric numpy array)

  Output:
  - x_new (standardized numeric numpy array)
  '''

  import numpy as np

  # stop execution if array is not numeric
  if not np.issubdtype(x.dtype, np.number): 
      print('Input not numeric. Received array of type:', x.dtype)
      return

  mu  = np.mean(x)
  std = np.std(x)

  x_new = (x - mu) / std

  return x_new

In [None]:
# test the new function with numeric input
standardize(x)

In [None]:
# test the new function with non-numeric input
standardize(b)
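Printing a message and returning nothing works for this exercise, but the caller then has to deal with a `None` result. A common alternative is to raise an exception. The sketch below shows this variant; the function name `standardize_strict` is hypothetical and not part of the exercise.

```python
import numpy as np

def standardize_strict(x):
    '''Standardizes x; raises a TypeError for non-numeric input (illustrative variant)'''
    x = np.asarray(x)  # also accepts plain Python lists
    if not np.issubdtype(x.dtype, np.number):
        raise TypeError(f'Input not numeric. Received array of type: {x.dtype}')
    return (x - x.mean()) / x.std()

# numeric input works as before
print(standardize_strict([1, 2, 3]))

# non-numeric input raises an error that the caller can handle
try:
    standardize_strict(np.array(['1', '2', '3']))
except TypeError as err:
    error_message = str(err)
    print(error_message)
```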

## Data structures 
Say you want to keep track of the members of the four houses of the famous Hogwarts School of Witchcraft and Wizardry. What might be a suitable data structure? We create a dictionary named **hogwarts** and use the names of the houses as keys. The values associated with those keys can be of any type that supports storing a collection of strings, i.e., the names of the members. Draw on your knowledge of Python dictionaries and lists to implement such a data structure. Populate the dictionary with the following data, and feel free to add more characters if you wish.

- Gryffindor: I'm sure you know many members of that house 
- Hufflepuff: notable members include Newt Scamander, Cedric Diggory and Nymphadora Tonks
- Ravenclaw: here we've got, e.g. Luna Lovegood, Gilderoy Lockhart and Filius Flitwick
- Slytherin: Draco Malfoy, Vincent Crabbe, Gregory Goyle, and of course the one that must not be named

In [None]:
hogwarts = {'Gryffindor': ['Harry Potter', 'Ron Weasley', 'Hermione Granger'],
            'Hufflepuff': ['Newt Scamander', 'Cedric Diggory', 'Nymphadora Tonks'],
            'Ravenclaw':  ['Luna Lovegood', 'Gilderoy Lockhart', 'Filius Flitwick'],
            'Slytherin':  ['Draco Malfoy', 'Vincent Crabbe', 'Gregory Goyle', 'Tom Riddle']
            }
hogwarts
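Once the dictionary exists, typical operations are adding members, looping over houses, and looking up a member's house. The following sketch illustrates these; the helper function `find_house` is a made-up name for this example.

```python
hogwarts = {'Gryffindor': ['Harry Potter', 'Ron Weasley', 'Hermione Granger'],
            'Hufflepuff': ['Newt Scamander', 'Cedric Diggory', 'Nymphadora Tonks'],
            'Ravenclaw':  ['Luna Lovegood', 'Gilderoy Lockhart', 'Filius Flitwick'],
            'Slytherin':  ['Draco Malfoy', 'Vincent Crabbe', 'Gregory Goyle', 'Tom Riddle']}

# add a member to an existing house
hogwarts['Gryffindor'].append('Neville Longbottom')

# loop over keys and values at the same time
for house, members in hogwarts.items():
    print(f'{house}: {len(members)} members')

def find_house(name, houses):
    '''Returns the house a name belongs to, or None if the name is not found'''
    for house, members in houses.items():
        if name in members:
            return house
    return None

print(find_house('Luna Lovegood', hogwarts))
```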

Dictionaries are really useful. Still, our data structure above is limited: we can only store the name of a witch or wizard. Wouldn't it be cool to be able to store more information, such as their favorite charm, best friend, pet, etc.?

Think about how we could realize this functionality. Well, we could create yet another dictionary in which we use a witch's/wizard's name as key and as value some other data structure in which we can store all the details we like. To our knowledge, the names of witches / wizards are unique in the Harry Potter universe, so that names could serve as (unique) keys; nerd alert. Still, a dictionary of dictionaries sounds pretty complicated. In fact, the task we described above is a perfect use case of custom classes. They allow us to store any piece of information about a person in one place.

Create a custom class wizard that facilitates storing the following properties:
- First name
- Last name
- Pet
- Pet name
- Patronus shape


Also implement a method `tell_pet()` that prints an output of the following format:
*"Harry Potter's owl is called Hedwig."* 

Implement one more method `expecto_patronum()`. Calling that method for Harry would produce the output (print):
*"A stag appears."*

In [None]:
class wizard():

    '''Custom wizard class that stores relevant wizard information'''

    def __init__(self,        
                 first_name,             
                 last_name,
                 pet            = None,
                 pet_name       = None,
                 patronus_shape = None): # first and last name have to be specified; pet and patronus data are set to None by default 

        # assign attributes
        self.first_name     = first_name 
        self.last_name      = last_name 
        self.pet            = pet 
        self.pet_name       = pet_name 
        self.patronus_shape = patronus_shape 

        
    def tell_pet(self): 

        '''Method for printing the pet information'''

        # there are several ways to print the required statement
        # one way is to use the so-called f strings; we will use .format() syntax
        print('{} {}\'s {} is called {}.'.format(self.first_name, self.last_name, self.pet, self.pet_name))
        return None


    def expecto_patronum(self): 

        '''Method for summoning a Patronus'''

        print('A {} appears.'.format(self.patronus_shape))
        return None
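The comment in `tell_pet()` mentions f-strings as an alternative to `.format()`. Here is a minimal sketch of that variant. The class name `Wizard` and the method `pet_sentence()` are assumptions for this example; the method returns the sentence instead of printing it, which makes it easier to test.

```python
class Wizard:
    '''Reduced wizard class that illustrates f-string formatting'''

    def __init__(self, first_name, last_name, pet=None, pet_name=None):
        self.first_name = first_name
        self.last_name = last_name
        self.pet = pet
        self.pet_name = pet_name

    def pet_sentence(self):
        '''Builds the pet sentence with an f-string and returns it'''
        return f"{self.first_name} {self.last_name}'s {self.pet} is called {self.pet_name}."

harry = Wizard('Harry', 'Potter', pet='owl', pet_name='Hedwig')
sentence = harry.pet_sentence()
print(sentence)  # Harry Potter's owl is called Hedwig.
```

Expressions inside the curly braces of an f-string are evaluated when the string is created, so no separate `.format()` call is needed.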

Update your dictionary with houses and their members. Instead of storing a list of names as values, your new dictionary should store a list of instances of the class wizard. Note that you need to create these instances first. So you need to create an instance of class wizard for Harry, another one for Ron, Malfoy, etc. In case your knowledge of the Potter universe is a bit shaky, simply invent the data you need. Just in case, [here is a little refresher of the expecto patronum spell](https://www.insider.com/harry-potter-characters-patronus-2018-11).

In [None]:
# creating class instances

HarryPotter = wizard(first_name     = 'Harry',
                     last_name      = 'Potter',
                     pet            = 'owl',
                     pet_name       = 'Hedwig',
                     patronus_shape = 'stag')

RonWeasley = wizard(first_name     = 'Ron',
                    last_name      = 'Weasley',
                    pet            = 'rat',
                    pet_name       = 'Scabbers',
                    patronus_shape = 'dog')

HermioneGranger = wizard(first_name      = 'Hermione',
                         last_name       = 'Granger',
                         pet             = 'cat',
                         pet_name        = 'Crookshanks',
                         patronus_shape  = 'otter')

In [None]:
# updating Gryffindor members in the dictionary
# other houses can be created in the same fashion

hogwarts['Gryffindor'] = [HarryPotter, RonWeasley, HermioneGranger]

In [None]:
# check Harry's pet

hogwarts['Gryffindor'][0].tell_pet()

In [None]:
# check Harry's patronus

hogwarts['Gryffindor'][0].expecto_patronum()
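Accessing members by position works, but with objects stored in the dictionary you can also loop over all members of a house. The sketch below uses a reduced version of the class so that it runs on its own; the list `names` is an illustrative addition.

```python
class wizard():
    '''Reduced wizard class with only the attributes needed for this example'''

    def __init__(self, first_name, last_name, patronus_shape=None):
        self.first_name = first_name
        self.last_name = last_name
        self.patronus_shape = patronus_shape

    def expecto_patronum(self):
        print('A {} appears.'.format(self.patronus_shape))

hogwarts = {'Gryffindor': [wizard('Harry', 'Potter', 'stag'),
                           wizard('Hermione', 'Granger', 'otter')]}

# every member of every house summons a Patronus
for house, members in hogwarts.items():
    for member in members:
        member.expecto_patronum()

# collect the full names of all Gryffindor members
names = [f'{m.first_name} {m.last_name}' for m in hogwarts['Gryffindor']]
print(names)
```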

## Well done!!! 
You will soon be a true Python wizard.