# Assignment-2
* The objective of this assignment is to develop a solid understanding of Numpy array operations.

##  Numpy Array Operations:

### 1. Picking 5 interesting Numpy functions from: [Numpy Array Documentation](https://numpy.org/doc/stable/reference/routines.html)  and explaining each of them.

* function 1 = `np.linspace`
* function 2 = `np.digitize`
* function 3 = `np.random.choice`
* function 4 = `np.repeat`
* function 5 = `np.polyfit`

Let's begin by importing Numpy and listing out the functions covered in this notebook.

In [4]:
import numpy as np

In [5]:
# List of functions explained 

function1 = np.linspace
function2 = np.digitize
function3 = np.random.choice
function4 = np.repeat
function5 = np.polyfit

## Function 1 - `np.linspace`

This function returns evenly spaced numbers over a specified interval defined by the first two arguments of the function.

> **Syntax:** `np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)`  

```
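 * `start` = Starting value of the sequence
 * `stop` = Ending value of the sequence (included unless `endpoint=False`)
 * `num` = The number of samples to generate (default value - 50)
 * `endpoint` = Whether to include `stop` as the last sample (default value - True)
 * `retstep` = If True, return a tuple `(samples, step)` with the spacing between samples (default value - False)
 * `dtype` = Data type of the output array, e.g. int or float (default value - None, inferred from `start` and `stop`, never an integer type)
 * `axis` = Axis of the result along which the samples are stored; only relevant when `start` or `stop` are array-like (default value - 0)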
```

In [6]:
# Example 1 :- 

a = np.linspace(1,100,10)
print(a)

[  1.  12.  23.  34.  45.  56.  67.  78.  89. 100.]


**Explanation about example:-** This returns an array of 10 equally spaced values from 1 to 100. Note that both the start and end values (1 and 100) are included, because `endpoint=True` is the default.

In [7]:
# Example 2 :-

b = np.linspace(111,1111,11,endpoint=False)
b

array([ 111.        ,  201.90909091,  292.81818182,  383.72727273,
        474.63636364,  565.54545455,  656.45454545,  747.36363636,
        838.27272727,  929.18181818, 1020.09090909])

**Explanation about example:-** With `endpoint=False`, the end value (1111) is excluded: the interval from 111 to 1111 is divided into 11 equal steps of 1000/11 ≈ 90.91, and only the left edge of each step is returned.

In [8]:
# Example 3 :- 

c = np.linspace(10,1000,10,endpoint=True,retstep=True,dtype=float,axis=0)
c

(array([  10.,  120.,  230.,  340.,  450.,  560.,  670.,  780.,  890.,
        1000.]),
 110.0)

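**Explanation about example:-** Because `retstep=True`, the function returns a tuple: the array of samples and the step size between them (here 110.0). The remaining keyword arguments (`endpoint=True`, `dtype=float`, `axis=0`) do not change the result here.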
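
As a small sketch of how the tuple returned with `retstep=True` can be used (the variable names `samples`, `step` and `open_step` are only illustrative), the result can be unpacked and the spacing checked with `np.diff`:

```python
import numpy as np

# Unpack the (samples, step) tuple returned when retstep=True
samples, step = np.linspace(10, 1000, 10, retstep=True)

# Every consecutive difference should equal the reported step
spacing_ok = np.allclose(np.diff(samples), step)

# With endpoint=False the interval is split into `num` steps instead of `num - 1`
_, open_step = np.linspace(111, 1111, 11, endpoint=False, retstep=True)

print(step, spacing_ok, open_step)
```

Unpacking the tuple directly avoids indexing into the result with `c[0]` and `c[1]`.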

In [9]:
# Example 4 :- 

d = np.linspace(10,100,11,endpoint=True,retstep=True,dtype=float,axis=1)
d

AxisError: destination: axis 1 is out of bounds for array of dimension 1

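**Explanation about example (why it breaks and how to fix it):-** With scalar `start` and `stop`, `np.linspace` returns a 1-D array (a vector), which has only one axis, so `axis=1` is out of bounds. To fix it, use `axis=0` (the default) or omit the argument. Other values of `axis` only make sense when `start` or `stop` are arrays.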
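
The `axis` argument becomes meaningful once `start` and `stop` are arrays. The following is a minimal sketch with made-up values (the arrays `start` and `stop` are assumptions chosen for illustration, and array-like endpoints need a reasonably recent NumPy version):

```python
import numpy as np

# Two sequences at once: one from 0 to 1 and one from 10 to 20
start = np.array([0.0, 10.0])
stop = np.array([1.0, 20.0])

# axis=0 (default): samples run down the rows, one column per sequence
by_rows = np.linspace(start, stop, num=5, axis=0)

# axis=1: samples run along the columns, one row per sequence
by_cols = np.linspace(start, stop, num=5, axis=1)

print(by_rows.shape)  # (5, 2)
print(by_cols.shape)  # (2, 5)
```

Here `by_cols` is simply the transpose of `by_rows`.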

**Remarks:-** We often need a simple 1-D array (vector) of equally spaced values, for example as the x-axis of a plot or as a grid of parameter values, which makes this function very useful in data analysis and machine learning. 

## Function 2 - `np.digitize`

 * It returns the indices of the bins to which each value in the input array belongs. It can be really useful when working with continuous spaces in reinforcement learning. 
 
> **Syntax :-** `np.digitize(x, bins, right=False)`

```
Where, x = Input array to be binned  
       bins = 1-D array of bin edges; it must be monotonically increasing or decreasing 
       right = False (default): each interval includes its left edge, bins[i-1] <= x < bins[i]
               True: each interval includes its right edge, bins[i-1] < x <= bins[i]   
```
* See the examples to understand it better:

In [11]:
# Example 1 :- 

# Input array
x = np.array([.33])

# Bins - 5 bins in total
bins = np.array([-1,0,1,2])

# Digitize function :- 0.33 belong to the bin 0<= 0.33 <1 :- therefore returned index 2.
np.digitize(x,bins)




array([2])

**Explanation about example:-**  In the code above, we have 5 bins in total:
```
 x < -1    → Index 0
-1 ≤ x < 0 → Index 1
 0 ≤ x < 1 → Index 2
 1 ≤ x < 2 → Index 3
 2 ≤ x     → Index 4


```
Therefore, if we provide as an input 0.33, the function returns 2, since that is the index of the bin to which 0.33 belongs.

In [12]:
# Example 2 :- 

# The input array can contain several inputs
x = np.array([2,3.5,5.5])

# Bins - 6 bins in total
bins = np.array([1,2,3,4,5])

# Digitize function
np.digitize(x,bins)


array([2, 3, 5])

**Note the difference when using `right=True` (the default value is `False`).**

In [13]:
# The input array can contain several inputs
x = np.array([2,3.5,5.5])

# Bins - 6 bins in total
bins = np.array([1,2,3,4,5])

# Digitize function
np.digitize(x,bins,right=True)

array([1, 3, 5])

**Explanation about example:-** In the code above, we have 6 bins in total. Because `right=True`, each bin includes its right edge instead of its left edge:
```
 x ≤ 1     → Index 0
 1 < x ≤ 2 → Index 1
 2 < x ≤ 3 → Index 2
 3 < x ≤ 4 → Index 3
 4 < x ≤ 5 → Index 4
 5 < x     → Index 5 
```

Therefore, for the inputs we provided:
1. input = 2 => returns = 1 (2 lies in the interval 1 < x ≤ 2)
2. input = 3.5 => returns = 3 (3.5 lies in the interval 3 < x ≤ 4)
3. input = 5.5 => returns = 5 (5.5 lies in the interval 5 < x)

**Note:-** The value `2` sits exactly on a bin edge, so its index changes from `2` (with the default `right=False`) to `1` (with `right=True`). Values that do not lie on an edge, such as 3.5 and 5.5, get the same index in both cases.
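
To make the effect of `right` concrete, here is a short sketch that bins only values lying exactly on bin edges (the input values are chosen for illustration), since those are the only ones whose index changes:

```python
import numpy as np

bins = np.array([1, 2, 3, 4, 5])

# Values that coincide with bin edges
on_edges = np.array([1, 2, 5])

# right=False: bins[i-1] <= x < bins[i], so an edge belongs to the bin on its right
left_closed = np.digitize(on_edges, bins)

# right=True: bins[i-1] < x <= bins[i], so an edge belongs to the bin on its left
right_closed = np.digitize(on_edges, bins, right=True)

print(left_closed)   # [1 2 5]
print(right_closed)  # [0 1 4]
```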

In [14]:
# Example 3 :- 

# The input array can also be empty
x = np.array([])

# Bins - 4 bins in total
bins = np.array([-1,0,1])

# Digitize function
np.digitize(x,bins,right=True)

array([], dtype=int64)

**Explanation about example:-**  An empty input array always returns an empty array of indices.

**Remarks:-** In reinforcement learning, we can discretize state spaces by using uniformly-spaced grids. Discretization allows us to apply algorithms designed for discrete spaces such as Sarsa, Sarsamax, or Expected Sarsa to continuous spaces.
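
As a rough sketch of that idea, a continuous state variable can be mapped to a grid cell by combining `np.linspace` and `np.digitize`. The range `[-1, 1]`, the number of bins and the sample states below are assumptions made up for illustration, not values from any particular environment:

```python
import numpy as np

# Hypothetical 1-D state space on [-1, 1], split into 4 uniform cells
low, high, n_bins = -1.0, 1.0, 4

# Keep only the interior edges so the indices run from 0 to n_bins - 1
edges = np.linspace(low, high, n_bins + 1)[1:-1]   # [-0.5, 0.0, 0.5]

# Some continuous observations
states = np.array([-0.9, -0.2, 0.0, 0.7])

# Index of the grid cell each observation falls into
cells = np.digitize(states, edges)

print(cells)  # [0 1 2 3]
```

Dropping the outer edges means values outside the range are clipped into the first or last cell instead of getting their own index.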

## Function 3 - `np.random.choice`
* It returns a random sample from a given array. By default, a single value is returned.

**Syntax:-**  `np.random.choice(a, size=None, replace=True, p=None)`  

```
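 where, a = 1-D array-like to sample from, or an int n (then samples are drawn from np.arange(n))
        size = Number of values to draw, as an int or a tuple giving the output shape (default value = None, a single value is returned)
        replace = True (Default): sample with replacement, so the same element can be drawn more than once / False: sample without replacement, so each element can be drawn at most once
        p = Probability- To assign different probabilities to each element; the values must sum to 1 (By default, elements have equal probability of being selected)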

```


In [16]:
# Example 1 :-
# toss a coin
# 0-->head, 1-->tail

np.random.choice([0,1])


0

In [17]:
# toss a coin 5 times
# 0-->head, 1-->tail

np.random.choice([0,1],size=5)

array([0, 1, 1, 0, 0])

In [18]:
# toss a biased coin with 80% probability of obtain head & 20% tail.
# 0-->head, 1-->tail

np.random.choice([0,1],p=[0.8,0.2])

1

In [19]:
# toss a biased coin 10 times with 80% probability of obtain head & 20% tail.
# 0-->head, 1-->tail

np.random.choice([0,1],size=10,p=[0.8,0.2])

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [20]:
# toss a biased coin 10 times with 80% probability of obtain head & 20% tail.
# 0-->head, 1-->tail

np.random.choice([0,1],size=10,replace=True,p=[0.8,0.2])

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

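**Explanation about example:-** The examples above simulate tossing a coin. For a fair coin, the probability of getting *Head* or *Tail* is equal (i.e.- 50% or .5 or 1/2), which is what `np.random.choice()` uses by default.  
With the `p` argument we can bias the coin, and with `size` we choose the number of tosses. Passing `replace=True` explicitly, as in the last cell, is the same as the default. Since the values are random, the outputs will differ from run to run.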

In [21]:
# Example 2 :- 
# roll a dice
np.random.choice([1,2,3,4,5,6])

3

In [22]:
# roll a dice 10 times
np.random.choice([1,2,3,4,5,6],size=10)

array([1, 1, 1, 1, 6, 6, 1, 1, 4, 5])

In [23]:
# roll a biased dice (40% probability of obtaining 6, 20% for 5, and 10% for each of the rest)
np.random.choice([1,2,3,4,5,6],p=[.1,.1,.1,.1,.2,.4])

2

In [24]:
# roll a biased dice 6 times (40% probability of obtaining 6, 20% for 5, and 10% for each of the rest)
np.random.choice([1,2,3,4,5,6],size=6,p=[.1,.1,.1,.1,.2,.4])

array([6, 6, 6, 5, 1, 5])

**Explanation about example:-** The examples above simulate rolling a dice. For a fair dice, the probability of getting *1,2,3,4,5 or 6* is equal (i.e.- 16.67% or .1667 or 1/6).  
Using `np.random.choice()` we get random values with the given probabilities (by default every outcome is equally likely) and the given size.
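
One way to check that the `p` argument behaves as described is to draw many samples and compare the observed frequencies with the requested probabilities. This is only a sketch: the seed and the sample size of 100000 are arbitrary choices made for reproducibility.

```python
import numpy as np

# Fix the seed so the sketch gives the same numbers on every run
np.random.seed(42)

faces = [1, 2, 3, 4, 5, 6]
probs = [.1, .1, .1, .1, .2, .4]

# Roll the biased dice many times
rolls = np.random.choice(faces, size=100_000, p=probs)

# Relative frequency of each face (index 0 of bincount is unused)
freq = np.bincount(rolls, minlength=7)[1:] / rolls.size

print(np.round(freq, 2))
```

With this many rolls the printed frequencies should be close to `probs`, with 6 appearing in roughly 40% of the rolls.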

In [25]:
# Example 3 :-
# toss a biased coin 10 times (80% probability of obtain head - 20% tail)
# 0-->head, 1-->tail

np.random.choice([0,1],size=10,replace=False,p=[0.8,0.2])

ValueError: Cannot take a larger sample than population when 'replace=False'

**Explanation about example (why it breaks and how to fix it):-**  With `replace=False`, each element can be drawn at most once, so `size` cannot be larger than the population. Here the population has only 2 elements (`[0,1]`) but we ask for 10 samples, which raises the error. To fix this, either use `replace=True` (the default) or request a `size` of at most 2.
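
The following sketch shows both sides of this behaviour: a valid draw without replacement, the failing call wrapped in `try`/`except`, and the corrected call. The seed and the variable names are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # only for reproducibility of the sketch

# Valid: draw all 6 faces without replacement, i.e. a random ordering of the faces
faces = np.arange(1, 7)
ordering = np.random.choice(faces, size=6, replace=False)

# Invalid: asking for 10 samples from a population of 2 without replacement
try:
    np.random.choice([0, 1], size=10, replace=False)
except ValueError as err:
    message = str(err)

# Fixed: allow replacement so the same outcome can appear many times
tosses = np.random.choice([0, 1], size=10, replace=True, p=[0.8, 0.2])

print(ordering, message, tosses)
```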

**Remarks:-** `np.random.choice()` is a useful function for simulating random experiments and for drawing random samples from data. You can also look at related functions with different properties, such as `np.random.randint()`, `np.random.shuffle()`, `np.random.permutation()`, `np.random.rand()`, etc.

## Function 4 - `np.repeat`
* This function repeats the elements of an array. The number of repetitions is specified by the second argument repeats.

**Syntax:-** `np.repeat(a,repeats,axis=None)`

```
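    where,  a = input array
            repeats = The number of repetitions for each element (an int, or an array of ints giving one count per element along the axis)
            axis = int / None (Default). The axis along which to repeat values; with None the input is flattened and a flat array is returned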
            
```

In [27]:
# Example 1 :-
# repeat number 3 -> 5 times

np.repeat(3,5)

array([3, 3, 3, 3, 3])

In [28]:
# repeat each element of a 2x2 array -> 2 times (the array is flattened because no axis is given)

a = np.array([[1,2],
              [3,4]])
np.repeat(a,2)

array([1, 1, 2, 2, 3, 3, 4, 4])

In [29]:
np.repeat(a, 3, axis=1)

array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])

In [30]:
np.repeat(a, [2, 2], axis=0)

array([[1, 2],
       [1, 2],
       [3, 4],
       [3, 4]])

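**Explanation about example:-** Without an `axis`, the 2x2 array is flattened and every element is repeated. With `axis=1`, each element is repeated 3 times along the columns, so each row becomes three times as long. With `axis=0` and `repeats=[2, 2]`, each of the two rows is repeated twice.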

In [31]:
# Example 2 :-
# repeat string '2015' 15 times

np.repeat('2015',15)

array(['2015', '2015', '2015', '2015', '2015', '2015', '2015', '2015',
       '2015', '2015', '2015', '2015', '2015', '2015', '2015'],
      dtype='<U4')

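**Explanation about example:-** `np.repeat` also works on non-numeric values. Here the string '2015' is repeated 15 times, and the result is an array of strings with `dtype='<U4'` (Unicode strings of length 4).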

In [32]:
# Example 3 :-
arr = np.array([1,3,5])
np.repeat(arr,3,axis=3)

AxisError: axis 3 is out of bounds for array of dimension 1

**Explanation about example (why it breaks and how to fix it):-** The axis must be a valid axis of the array being repeated, i.e. smaller than its number of dimensions. Here `arr` is 1-D, so the only valid choice is `axis=0` (or equivalently `axis=-1`, or no axis at all).
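
A corrected version of the call could look like the sketch below, which also shows that `repeats` may be a list with one count per element (the counts `[1, 2, 3]` are just an illustration):

```python
import numpy as np

arr = np.array([1, 3, 5])

# Valid axis for a 1-D array: every element is repeated 3 times
fixed = np.repeat(arr, 3, axis=0)

# A different repeat count for each element
per_element = np.repeat(arr, [1, 2, 3])

print(fixed)        # [1 1 1 3 3 3 5 5 5]
print(per_element)  # [1 3 3 5 5 5]
```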

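**Remarks:-** The `np.repeat()` function is useful in data analysis whenever values, rows or columns need to be duplicated, for example to stretch a short array so that it matches the length of a longer one.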

## Function 5 - `np.polyfit`
* `np.polyfit()` function returns the coefficients (highest power first) of a polynomial of degree (*deg*) that fits the points (x,y), minimizing the squared error.
* This function can be very useful to find the relationship between a dependent variable and an independent variable, obtaining a line that best fits the data.
```
y = mx + c
```
where, x is the independent variable, 
       y is the dependent variable, 
       m is the slope,  
       c is the intercept.  
       
**To obtain both coefficients m and c, we can use the `np.polyfit` function as follows:**  

**Syntax:-** `np.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False)`

```
        where, x = independent variable
               y = dependent variable
               deg = degree of polynomial
               rcond = float / optional 
               full = bool / optional
               w = array / shape (M,) / optional
               cov = bool or str / optional

```

**Notes:-**
1. `rcond` = Relative condition number of the fit. Singular values smaller than this relative to the largest singular value will be ignored. The default value is len(x)*eps, where eps is the relative precision of the float type, about 2e-16 in most cases.
2. `full` =  Switch determining nature of return value. When it is False (the default) just the coefficients are returned, when True diagnostic information from the singular value decomposition is also returned.
3. `w` = Weights to apply to the y-coordinates of the sample points. For gaussian uncertainties, use 1/sigma (not 1/sigma**2).

4. `cov` = If given and not `False`, return not just the estimate but also its covariance matrix. By default, the covariance is scaled by chi2/dof, where dof = M - (deg + 1), i.e., the weights are presumed to be unreliable except in a relative sense and everything is scaled such that the reduced chi2 is unity. This scaling is omitted if `cov='unscaled'`, as is relevant for the case that the weights are 1/sigma, with sigma known to be a reliable estimate of the uncertainty.

In [33]:
# Example 1 :-
x = [1,2,3,4,5]
y = [1,4,9,16,25]
np.polyfit(x,y,1)

array([ 6., -7.])

**Explanation about example:-** Here x is a list of values that acts as the independent variable and y is a list of values that acts as the dependent variable. The output of `np.polyfit` gives `slope(m) = 6.0` and `intercept(c) = -7.0` for a `deg = 1` polynomial, i.e. the best-fit line is y = 6x - 7.
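
As a short sketch of what can be done with the returned coefficients, `np.polyval` evaluates the fitted polynomial, and raising `deg` to 2 recovers the exact relationship y = x² behind this data (the variable names are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = x ** 2

# Straight-line fit: coefficients come back as [slope, intercept]
m, c = np.polyfit(x, y, 1)

# Evaluate the fitted line at the original x values
line = np.polyval([m, c], x)

# A degree-2 fit matches the data exactly: coefficients close to [1, 0, 0]
quad = np.polyfit(x, y, 2)

print(m, c)
print(line)
print(np.round(quad, 6))
```

The straight line is only the best linear approximation, so `line` differs from `y`, whereas the degree-2 fit reproduces the points.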

In [34]:
# Example 2 :-
x = np.array([1,2,3])
y = np.array([1,4,9])
np.polyfit(x,y,1,rcond=.3,full=True,cov=False)


(array([1.22848325, 2.65382713]),
 array([], dtype=float64),
 1,
 array([1.3877392 , 0.27235987]),
 0.3)

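**Explanation about example:-** Here x is an array of ints that acts as the independent variable and y is an array of ints that acts as the dependent variable. Because `full=True`, `np.polyfit` returns a tuple with diagnostic information instead of just the coefficients: 
1. `array([1.22848325, 2.65382713])` = the coefficients, i.e. `slope(m) ≈ 1.23` and `intercept(c) ≈ 2.65` for `deg = 1`.
2. `array([])` = the residuals, which are empty here because the effective rank is lower than the number of coefficients.
3. `1` = the effective rank of the fit.
4. `array([1.3877392 , 0.27235987])` = the singular values of the scaled coefficient matrix (these are not a second slope and intercept).
5. `0.3` = the value of `rcond` that was used.

**Note:-** With `rcond=.3`, the second singular value is ignored because 0.272/1.388 ≈ 0.20 is smaller than 0.3. That is why the rank is 1 and the coefficients differ from the ordinary least-squares line, which for these points is y = 4x - 10/3. The diagnostic values are only returned when `full=True`.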
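
For comparison, here is a sketch of the same fit with the default `rcond`, where no singular value is discarded. The unpacked names follow the order of the returned tuple and are otherwise arbitrary:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 4, 9])

# Default rcond: both singular values are kept, so the fit has full rank
coeffs, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

print(coeffs)     # slope and intercept of the least-squares line
print(residuals)  # sum of squared residuals of the fit
print(rank)       # 2
```

With full rank, the slope and intercept are 4 and -10/3, and the residuals array holds the sum of squared errors instead of being empty.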

In [35]:
# Example 3 :-
x = np.random.rand(1,3)
y = np.full([2,3],3)
np.polyfit(x,y,2)

TypeError: expected 1D vector for x

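**Explanation about example (why it breaks and how to fix it):-** `x` must be a 1-D vector, but `np.random.rand(1,3)` creates a 2-D array of shape (1, 3). To fix it, create `x` with `np.random.rand(3)`. Note that `y` must then have shape (3,) or (3, K), so that its first dimension matches the length of `x`; the shape (2, 3) used here would still fail.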
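
A working variant might look like the sketch below. It uses fixed x values instead of random ones so the result is reproducible, and the second data set (`x ** 2`) is an assumption added only to show the 2-D `y` case:

```python
import numpy as np

# x must be 1-D
x = np.array([0.0, 1.0, 2.0])

# Case 1: y is 1-D with the same length as x (a constant value of 3)
y = np.full(3, 3.0)
coeffs = np.polyfit(x, y, 2)          # close to [0, 0, 3]

# Case 2: y is 2-D with shape (len(x), K); each column is fitted separately
y_multi = np.column_stack([np.full(3, 3.0), x ** 2])
coeffs_multi = np.polyfit(x, y_multi, 2)

print(np.round(coeffs, 6))
print(coeffs_multi.shape)  # (3, 2): one column of coefficients per data set
```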

**Remarks:-** `np.polyfit()` function is very useful in linear regression problems. Linear regression models the relationship between a dependent variable and an independent variable, obtaining a line that best fits the data.

## Conclusion

**Summarize what was covered in this notebook:**
* Choosing 5 interesting functions from the [Numpy documentation](https://numpy.org/doc/stable/reference/routines.html)
* Exploring the first function i.e- `np.linspace`.
* Exploring the second function i.e- `np.digitize`.
* Exploring the third function i.e- `np.random.choice`.
* Exploring the fourth function i.e- `np.repeat`.
* Exploring the fifth function i.e- `np.polyfit`.

**Next We will discuss the 100 Numpy questions to learn more about Numpy Library.**

## Reference Links
References and other interesting articles about Numpy arrays:
* Numpy official tutorial :[Click Here](https://numpy.org/doc/stable/user/quickstart.html)
* 10 Numpy functions we should learn : [Click Here](https://towardsdatascience.com/10-numpy-functions-you-should-know-1dc4863764c5)