## 1. What is the Average Height of US Presidents?

Aggregates available in NumPy can be extremely useful for summarizing a set of values.
As a simple example, let's consider the heights of all US presidents.

This data is available in the file *president_heights.csv*, which is a simple comma-separated list of labels and values.

Find the mean height, the standard deviation of the heights, and the shortest and tallest presidents.

You can use `pandas` to read the file if you want, then convert the column to a NumPy array with `np.array`.

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv("data/president_heights.csv")  # read_csv already returns a DataFrame
heights = df['height(cm)']

avg = np.mean(heights)
std = np.std(heights)  # NumPy's default ddof=0: population standard deviation
min1 = df.loc[heights == np.min(heights)]  # all rows tied for the minimum
max1 = df.loc[heights == np.max(heights)]  # all rows tied for the maximum

print(avg)
print(std)
print(min1)
print(max1)

179.73809523809524
6.931843442745893
   order           name  height(cm)
3      4  James Madison         163
    order               name  height(cm)
15     16    Abraham Lincoln         193
33     36  Lyndon B. Johnson         193
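Two details of this task are easy to miss. They are shown here on a made-up four-element array, not the real presidents data. First, `np.std` computes the population standard deviation (`ddof=0`) unless told otherwise, while pandas' `Series.std` defaults to the sample version (`ddof=1`), so the two can disagree. Second, because two presidents share the maximum height, a boolean mask like the one above returns every tied row, whereas `np.argmax` returns only the first matching index.

```python
import numpy as np

# Made-up heights for illustration only (not the presidents data);
# note the tie for the maximum at 193.
heights = np.array([163, 193, 178, 193])

population_std = np.std(heights)      # ddof=0: divide by n (NumPy's default)
sample_std = np.std(heights, ddof=1)  # ddof=1: divide by n - 1 (pandas' default)

first_tallest = np.argmax(heights)                      # first maximum only
all_tallest = np.flatnonzero(heights == heights.max())  # every index tied for the maximum

print(population_std, sample_std)
print(first_tallest, all_tallest)
```

Which `ddof` is "right" depends on whether the data is treated as the whole population (every president) or as a sample; here it is the whole population, so `ddof=0` is a reasonable choice.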


## Exercise 2

Recall the polynomial formula

$$
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^N a_n x^n \tag{1}
$$

In the **math functions workshop**, you wrote a simple function `p(x, coeff)` to evaluate it without thinking about efficiency.

Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather than any form of Python loop.

(This is already implemented in `np.poly1d`, but use it only to test your function. Note that `np.poly1d` expects the coefficients in decreasing order of degree, with $a_N$ first.)

- Hint: Use `np.cumprod()`  


In [2]:
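import numpy as np

def p(x, coeffArr):
    """Evaluate sum(a_n * x**n) for coefficients ordered a_0, a_1, ..., a_N."""
    np_coeffArray = np.array(coeffArr)
    # [x, x, ..., x] with N entries; its cumulative product is [x, x**2, ..., x**N]
    np_filled_with_x = np.full(len(coeffArr) - 1, x)
    np_cumulative_prod_of_x = np.cumprod(np_filled_with_x)
    # prepend x**0 = 1 so the powers line up with a_0, ..., a_N
    final_array_to_multiply = np.insert(np_cumulative_prod_of_x, 0, 1)
    return np.sum(np_coeffArray * final_array_to_multiply)

print(p(3, [1, 2, 3, 4, 5]))
# test: np.poly1d wants the highest-degree coefficient first, hence the reversal
poly = np.poly1d([1, 2, 3, 4, 5][::-1])
poly(3)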

547


547
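The function above evaluates the polynomial at a single point. As a minimal sketch of one way to extend it to many points in one call, assuming the same coefficient order $a_0, \ldots, a_N$ as equation (1): `np.vander` with `increasing=True` builds a matrix whose row for $x_k$ is $[1, x_k, x_k^2, \ldots, x_k^N]$, and a matrix-vector product with the coefficients then computes the sum in (1) for every point. The name `p_vec` is hypothetical.

```python
import numpy as np

def p_vec(x, coeff):
    """Evaluate sum(a_n * x**n) at one or many points, with coeff = [a_0, ..., a_N]."""
    x = np.atleast_1d(x)      # a scalar x becomes a length-1 array
    coeff = np.asarray(coeff)
    # column n of the Vandermonde matrix holds x**n, for n = 0..N
    return np.vander(x, len(coeff), increasing=True) @ coeff

xs = np.array([0, 1, 2, 3])
print(p_vec(xs, [1, 2, 3, 4, 5]))
print(np.poly1d([1, 2, 3, 4, 5][::-1])(xs))  # should agree with the line above
```

Building the full matrix costs extra memory for very high degrees; for a single point the `cumprod` version above is just as good.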

## Exercise 3: Softmax

Read in `data/iris.csv` and compute the softmax of the sepal length. The formula for the softmax function $\sigma(x)$ for a vector $x = (x_0, x_1, \ldots, x_{n-1})$ is

$$\sigma(x)_j = \frac{e^{x_j}}{\sum_k e^{x_k}}$$


Your result should match the output of `scipy.special.softmax`.

In [3]:
from scipy.special import softmax
import math
import pandas as pd
import numpy as np

iris = pd.read_csv("data/iris.csv")  # read_csv already returns a DataFrame
sl = iris['sepallength']

def sftmax(lst):
    # exponentiate every value
    num = [math.exp(num1) for num1 in lst]
    total = np.sum(num)
    # normalize; rounding to 6 decimals is for display only, so these
    # values differ slightly from scipy's unrounded output
    return [round(num1 / total, 6) for num1 in num]

# softmax(sl)  # scipy's reference result, for comparison
sftmax(sl)

[0.00222,
 0.001817,
 0.001488,
 0.001346,
 0.002008,
 0.002996,
 0.001346,
 0.002008,
 0.001102,
 0.001817,
 0.002996,
 0.001644,
 0.001644,
 0.000997,
 0.00447,
 0.004044,
 0.002996,
 0.00222,
 0.004044,
 0.00222,
 0.002996,
 0.00222,
 0.001346,
 0.00222,
 0.001644,
 0.002008,
 0.002008,
 0.002453,
 0.002453,
 0.001488,
 0.001644,
 0.002996,
 0.002453,
 0.003311,
 0.001817,
 0.002008,
 0.003311,
 0.001817,
 0.001102,
 0.00222,
 0.002008,
 0.001218,
 0.001102,
 0.002008,
 0.00222,
 0.001644,
 0.00222,
 0.001346,
 0.002711,
 0.002008,
 0.01484,
 0.008144,
 0.013428,
 0.003311,
 0.009001,
 0.004044,
 0.007369,
 0.001817,
 0.009947,
 0.002453,
 0.002008,
 0.00494,
 0.005459,
 0.006033,
 0.003659,
 0.010994,
 0.003659,
 0.00447,
 0.006668,
 0.003659,
 0.00494,
 0.006033,
 0.007369,
 0.006033,
 0.008144,
 0.009947,
 0.01215,
 0.010994,
 0.005459,
 0.004044,
 0.003311,
 0.003311,
 0.00447,
 0.005459,
 0.002996,
 0.005459,
 0.010994,
 0.007369,
 0.003659,
 0.003311,
 0.003311,
 0.006033,
 0.
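The exercise is in a NumPy workshop, so a loop-free version may be closer to the intent. A minimal sketch, not the only correct answer: `np.exp` works on the whole array at once, and subtracting `max(x)` first leaves the result unchanged (the factor $e^{-\max x}$ cancels between numerator and denominator) while keeping `np.exp` from overflowing on large inputs. The name `softmax_np` is hypothetical.

```python
import numpy as np

def softmax_np(x):
    """Vectorized softmax: exp(x_j) / sum_k exp(x_k)."""
    x = np.asarray(x, dtype=float)
    # shifting by the max does not change the ratio but avoids overflow
    e = np.exp(x - x.max())
    return e / e.sum()

print(softmax_np([1.0, 2.0, 3.0]))
print(softmax_np([1000.0, 1000.0]))  # the naive formula would overflow here
```

Applied to the notebook's `sl`, `np.allclose(softmax_np(sl), softmax(sl))` should return `True`, since no rounding is applied.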

## Exercise 4: unique counts


Compute the counts of unique values row-wise.

Input:
```
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
arr
> array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
>        [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
>        [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
>        [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
>        [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
>        [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])
```
Desired Output:
```
> [[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
>  [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
>  [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
>  [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
>  [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
>  [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
```
The output has 10 columns, one for each number from 1 to 10, and each value is the number of times that number occurs in the corresponding row.
For example, cell (0, 2) has the value 2, which means the number 3 occurs exactly twice in the first row.

In [5]:
# run this first
import numpy as np
# no seed is set here, so this array differs from the example above
arr = np.random.randint(1, 11, size=(6, 10))
arr

array([[ 4,  8, 10,  8, 10,  1,  4,  3,  9,  8],
       [ 4, 10,  4,  6,  7,  3,  1,  8,  1,  5],
       [ 5,  5,  4,  5,  4,  8,  4,  6,  3,  8],
       [ 9,  1,  7,  2, 10,  7,  6,  2,  7,  8],
       [ 6, 10,  6,  9,  2,  2, 10,  4,  4,  8],
       [ 6,  6,  3,  1,  3,  1,  7,  7,  1,  9]])

In [6]:
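def unique(arr, low=1, high=10):
    """Count how often each value low..high occurs in every row of arr."""
    counts = []
    for row in arr:
        # count_nonzero(row == v) is already 0 when v is absent,
        # so no separate membership test is needed
        counts.append([np.count_nonzero(row == v) for v in range(low, high + 1)])
    # one row of counts per input row, so no hard-coded reshape is needed
    return np.array(counts)
unique(arr)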

array([[1, 0, 1, 2, 0, 0, 0, 3, 1, 2],
       [2, 0, 1, 2, 1, 1, 1, 1, 0, 1],
       [0, 0, 1, 3, 3, 1, 0, 2, 0, 0],
       [1, 2, 0, 0, 0, 1, 3, 1, 1, 1],
       [0, 2, 0, 2, 0, 2, 0, 1, 1, 2],
       [3, 0, 2, 0, 0, 2, 2, 0, 1, 0]])
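The same table can be built without any Python loop by broadcasting. A minimal sketch, assuming as above that the values lie in 1..10: `arr[:, :, None]` has shape (rows, cols, 1), comparing it with `np.arange(1, 11)` gives a boolean array of shape (rows, cols, 10), and summing over the column axis counts each value per row. The name `unique_counts` is hypothetical; the test input is the seeded example from the problem statement.

```python
import numpy as np

def unique_counts(arr, low=1, high=10):
    """Per-row counts of each value low..high, computed by broadcasting."""
    values = np.arange(low, high + 1)
    # (rows, cols, 1) == (n_values,)  ->  (rows, cols, n_values) booleans
    return (arr[:, :, None] == values).sum(axis=1)

example_arr = np.array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
                        [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
                        [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
                        [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
                        [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
                        [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])
print(unique_counts(example_arr))
```

The intermediate boolean array has rows × cols × 10 entries, which is cheap here but grows with the value range.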

## Exercise 5: One-Hot encodings

Compute the one-hot encodings (AKA dummy binary variables) for each unique value in the array.

Input:
```
np.random.seed(101) 
arr = np.random.randint(1,4, size=6)
arr
#> array([2, 3, 2, 2, 2, 1])
```
Output:
```
#> array([[ 0.,  1.,  0.],
#>        [ 0.,  0.,  1.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 1.,  0.,  0.]])
```

In [23]:
np.random.seed(101) 
arr1 = np.random.randint(1,4, size=6)
arr1

array([2, 3, 2, 2, 2, 1])

In [26]:
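def oneHotEncoding(arr):
    """One row per element, with a 1 in column (value - 1)."""
    n_classes = np.max(arr)  # assumes the values are the integers 1..max(arr)
    new = []
    for value in arr:
        row = np.zeros(n_classes, dtype=int)
        row[value - 1] = 1
        new.append(row)
    return np.array(new)
oneHotEncoding(arr1)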

array([[0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0]])
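A loop-free alternative, sketched under the assumption that columns should follow the sorted unique values: `np.unique(..., return_inverse=True)` maps each element to the position of its value, and indexing the identity matrix with those positions picks out one-hot rows. It also produces floats, like the desired output. The name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(arr):
    """One column per unique value (sorted), one row per element of arr."""
    # values: sorted unique entries; idx[k] is the position of arr[k] in values
    values, idx = np.unique(arr, return_inverse=True)
    # row idx[k] of the identity matrix has a single 1.0 in column idx[k]
    return np.eye(len(values))[idx]

example = np.array([2, 3, 2, 2, 2, 1])
print(one_hot(example))
```

Unlike the solution above, this does not assume the values start at 1 or are consecutive: `one_hot(np.array([10, 30, 10]))` gives two columns.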

## Exercise 6

Let `q` be a NumPy array of length `n` with `q.sum() == 1`.

Suppose that `q` represents a [probability mass function](https://en.wikipedia.org/wiki/Probability_mass_function) of a discrete distribution, so that `q[i]` is the probability of outcome `i`.

We want to generate a discrete random variable $ x $ such that $ \mathbb P\{x = i\} = q_i $.

In other words, `x` takes values in `range(len(q))` and `x = i` with probability `q[i]`.

The standard (inverse transform) algorithm is as follows:

- Divide the unit interval $ [0, 1] $ into $ n $ subintervals $ I_0, I_1, \ldots, I_{n-1} $ such that the length of $ I_i $ is $ q_i $.  
- Draw a uniform random variable $ U $ on $ [0, 1] $ and return the $ i $ such that $ U \in I_i $.  


The probability of drawing $ i $ is the length of $ I_i $, which is equal to $ q_i $.

We can implement the algorithm as follows:

```python
from random import uniform

def sample(q):
    a = 0.0
    U = uniform(0, 1)
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]
```

If you can’t see how this works, try tracing the flow for a simple example, such as `q = [0.25, 0.75]`. It helps to sketch the intervals on paper.

**Your exercise is to speed it up using NumPy, avoiding explicit loops**

- Hint: Use `np.searchsorted` and `np.cumsum`  
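To see why the hint works, the sketch below runs the loop algorithm above next to a `cumsum`/`searchsorted` version on the example `q = [0.25, 0.75]`, with fixed values of `U` instead of random ones so the results can be checked by hand. `np.cumsum(q)` holds the right endpoints of $I_0, \ldots, I_{n-1}$, and `np.searchsorted` (default `side='left'`) returns the first `i` with `U <= Q[i]`, which is exactly the `i` with $U \in I_i$. The function names are hypothetical.

```python
import numpy as np

def sample_loop(q, U):
    """The loop algorithm above, with U passed in so the result is reproducible."""
    a = 0.0
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]

def sample_vectorized(q, U):
    """Same mapping without a loop: first i with U <= cumsum(q)[i]."""
    return int(np.searchsorted(np.cumsum(q), U))

q = [0.25, 0.75]  # I_0 = (0, 0.25], I_1 = (0.25, 1]
for U in [0.1, 0.25, 0.6, 0.99]:
    print(U, sample_loop(q, U), sample_vectorized(q, U))

# searchsorted also accepts a whole array of U values at once
print(np.searchsorted(np.cumsum(q), [0.1, 0.25, 0.6, 0.99]))
```

The boundary case `U = 0.25` lands in $I_0$ for both versions, because the intervals are closed on the right and `side='left'` matches that convention.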


If you can, implement the functionality as a class called `DiscreteRV`, where

- the data for an instance of the class is the vector of probabilities `q`  
- the class has a `draw()` method, which returns one draw according to the algorithm described above  


If you can, write the method so that `draw(k)` returns `k` draws from `q`.

In [27]:
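import numpy as np

class DiscreteRV:

    def __init__(self, q):
        self.q = q
        # cumulative sums are the right endpoints of I_0, ..., I_{n-1}
        self.Q = np.cumsum(q)

    def draw(self):
        U = np.random.uniform(0, 1)
        print(U)  # show the uniform draw, for checking by hand
        # index of the first endpoint >= U, i.e. the i with U in I_i
        return self.Q.searchsorted(U)

# intervals: I_0 = (0, 0.1], I_1 = (0.1, 0.3], I_2 = (0.3, 0.6], I_3 = (0.6, 1]
d = DiscreteRV([0.1, 0.2, 0.3, 0.4])

d.draw()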

0.3069662196722378


2
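The class above returns one draw per call. For the optional `draw(k)` part, a minimal sketch of one way to do it: pass `size=k` to `np.random.uniform` so that `searchsorted` maps all `k` uniforms in a single call. `DiscreteRVBatch` is a hypothetical name, and the final `np.minimum` is an assumption-level guard in case `np.cumsum(q)` rounds to slightly below 1.

```python
import numpy as np

class DiscreteRVBatch:
    """Hypothetical variant of DiscreteRV whose draw(k) returns k samples."""

    def __init__(self, q):
        self.q = np.asarray(q, dtype=float)
        # right endpoints of the intervals I_0, ..., I_{n-1}
        self.Q = np.cumsum(self.q)

    def draw(self, k=None):
        # k=None gives a single uniform (a float); an integer k gives an array of k
        U = np.random.uniform(0, 1, size=k)
        idx = self.Q.searchsorted(U)
        # guard against Q[-1] being slightly less than 1 after rounding
        return np.minimum(idx, len(self.q) - 1)

d = DiscreteRVBatch([0.1, 0.2, 0.3, 0.4])
print(d.draw())    # one draw
print(d.draw(10))  # ten draws
```

A quick sanity check is that `np.bincount(d.draw(100_000), minlength=4) / 100_000` should come out close to `q`.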

## Exercise 7: Peaks

Find all the peaks in a 1D NumPy array `a`. Peaks are points surrounded by smaller values on both sides.

Input:
```
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
```
Desired Output:
```
#> array([2, 5])
```
where 2 and 5 are the positions of the peak values 7 and 6.

### 1. Solve this using a regular Python `for` loop

### 2. Solve this using no loops and only NumPy functions

In [31]:
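def peaks(arr):
    """Return the indices i whose value is strictly greater than both neighbors."""
    lst = []
    # the first and last elements have only one neighbor, so they are skipped
    for i in range(1, len(arr) - 1):
        if arr[i-1] < arr[i] and arr[i+1] < arr[i]:
            lst.append(i)
    return lst

peaks([1, 3, 7, 1, 2, 6, 0, 1])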

[2, 5]