# NumPy
NumPy is a Python library used for working with arrays.

It also has functions for working in the domains of linear algebra, the [Fourier transform](https://hackmd.io/@sysprog/fourier-transform), and matrices.

NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

NumPy stands for Numerical Python.

## Reference
- https://www.w3schools.com/python/numpy/numpy_intro.asp
- https://www.educba.com/what-is-numpy-in-python/

<br>

<img src="https://ugoproto.github.io/ugo_py_doc/img/scipy_cs/Numpy_Python_Cheat_Sheet.png" alt="drawing" width="1200"/>



# Import Library


In [None]:
import numpy as np

# Creating Arrays

<img src="https://fgnt.github.io/python_crashkurs_doc/_images/numpy_array_t.png" alt="drawing" width="600"/>



## 1-Dimensional Array

A 1-dimensional array is like a list.

### np.array

In [None]:
a = np.array([5, 6, 8])
print(a)

[5 6 8]


Object type

In [None]:
type(a)

numpy.ndarray

Data type in array

In [None]:
a.dtype

dtype('int64')

Specifying data type when creating an array

In [None]:
a = np.array([5, 6, 8], dtype = float)
a

array([5., 6., 8.])

If `dtype` is not given, the type is determined as the minimum type required to hold the objects in the sequence.

In [None]:
a = np.array([3.4, 5, 7.7])
a

array([3.4, 5. , 7.7])

In [None]:
a.dtype

dtype('float64')

### np.arange

Create a range-like array (the end value is excluded).

In [None]:
np.arange(10, 24, 2)

array([10, 12, 14, 16, 18, 20, 22])

In [None]:
list(range(10,22,2))

[10, 12, 14, 16, 18, 20]

The returned values are numpy.ndarray objects rather than range objects.

In [None]:
print(type(np.arange(10, 22, 2)))
print(type(range(10,22,2)))

<class 'numpy.ndarray'>
<class 'range'>


np.arange allows for fractions while range doesn't.

In [None]:
np.arange(2.1, 5.5)

array([2.1, 3.1, 4.1, 5.1])

In [None]:
range(2.1, 5.5)

TypeError: 'float' object cannot be interpreted as an integer

### np.linspace
An alternative to np.arange is np.linspace: instead of specifying the step size, we specify the first number, the last number, and how many evenly spaced points to generate in total (both endpoints are included by default).
```python
np.linspace(a, b, x) # x points that interpolate between a and b.
```

In [None]:
np.linspace(0, 8, 9)

array([0., 1., 2., 3., 4., 5., 6., 7., 8.])

In [None]:
np.linspace(1, 0, 5)

array([1.  , 0.75, 0.5 , 0.25, 0.  ])
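The relationship between the number of points and the step size can be sketched as follows. This is a minimal example that uses two optional `np.linspace` parameters, `retstep` and `endpoint`; the variable names are only for illustration.

```python
import numpy as np

# 9 points from 0 to 8 inclusive; retstep=True also returns the spacing
points, step = np.linspace(0, 8, 9, retstep=True)
print(points)  # evenly spaced values that include both 0 and 8
print(step)    # spacing between neighbouring points

# endpoint=False leaves out the stop value, similar to np.arange
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)
```

With `endpoint=False` the result behaves like `np.arange`, which never includes its stop value.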

Other useful functions when creating an array

### np.zeros

```python
np.zeros(x) # an array of x zeros
```
The result is float by default.

In [None]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [None]:
np.zeros(5, dtype = int)

array([0, 0, 0, 0, 0])

### np.ones

```python
np.ones(x) # an array of x ones
```

In [None]:
np.ones(5)

array([1., 1., 1., 1., 1.])

### np.pi

In [None]:
np.pi

3.141592653589793

In [None]:
np.linspace(0, 2*np.pi, 100)

array([0.        , 0.06346652, 0.12693304, 0.19039955, 0.25386607,
       0.31733259, 0.38079911, 0.44426563, 0.50773215, 0.57119866,
       0.63466518, 0.6981317 , 0.76159822, 0.82506474, 0.88853126,
       0.95199777, 1.01546429, 1.07893081, 1.14239733, 1.20586385,
       1.26933037, 1.33279688, 1.3962634 , 1.45972992, 1.52319644,
       1.58666296, 1.65012947, 1.71359599, 1.77706251, 1.84052903,
       1.90399555, 1.96746207, 2.03092858, 2.0943951 , 2.15786162,
       2.22132814, 2.28479466, 2.34826118, 2.41172769, 2.47519421,
       2.53866073, 2.60212725, 2.66559377, 2.72906028, 2.7925268 ,
       2.85599332, 2.91945984, 2.98292636, 3.04639288, 3.10985939,
       3.17332591, 3.23679243, 3.30025895, 3.36372547, 3.42719199,
       3.4906585 , 3.55412502, 3.61759154, 3.68105806, 3.74452458,
       3.8079911 , 3.87145761, 3.93492413, 3.99839065, 4.06185717,
       4.12532369, 4.1887902 , 4.25225672, 4.31572324, 4.37918976,
       4.44265628, 4.5061228 , 4.56958931, 4.63305583, 4.69652

## 2-Dimensional Array
A 2-dimensional array is often used to represent a matrix or a 2nd-order tensor (like a table).

In [None]:
#@title np.array
a = np.array([[1, 2, 3], [4, 5, 6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

Difference between single and double brackets

In [None]:
a = np.array([0, 0, 0, 0])
b = np.array([[0, 0, 0, 0]])

print(a)
print(b)

[0 0 0 0]
[[0 0 0 0]]


In [None]:
print(type(a))
print(type(b))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


If you think of arrays as lists, a is similar to a list of 4 items, while b is a list of 1 item, and that item is another list.

In [None]:
print(len(a))
print(len(b))

4
1


Precisely speaking, a is a 4-element one-dimensional array, and b is a 1 x 4 two-dimensional array.

In [None]:
print(a.shape)
print(b.shape)

(4,)
(1, 4)


The array in the earlier np.array example is a 2 x 3 two-dimensional array.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a.shape

(2, 3)

Use functions to create a 2-dimensional array

In [None]:
#@title np.zeros
np.zeros([3, 4])

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
np.zeros([3, 4], dtype = int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [None]:
#@title np.ones
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

### np.random

```python
np.random.rand
```

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

In [None]:
np.random.rand(3)

array([0.78667726, 0.19351885, 0.2387316 ])

In [None]:
np.random.rand(3, 2)

array([[0.71455028, 0.95185362],
       [0.6758807 , 0.04396246],
       [0.84846778, 0.42844699]])

`np.random.randint(low, high=None, size=None)`
Return random integers from the “discrete uniform” distribution of the specified dtype in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

In [None]:
np.random.randint(1, 100, size = (4, 3))

array([[43, 76, 37],
       [62, 66, 70],
       [82, 72, 34],
       [99, 39,  5]])

In [None]:
np.random.randint(1, 100, 12).reshape(4, 3)

array([[23, 11, 77],
       [46, 87,  5],
       [ 9, 58, 34],
       [79, 27, 47]])

np.random.seed

```python
np.random.seed(0) 
```

`np.random.seed` sets the seed of the NumPy pseudo-random number generator. The seed is the starting input from which NumPy generates its pseudo-random numbers, so using the same seed reproduces the same sequence of values.

- Reference https://www.sharpsightlabs.com/blog/numpy-random-seed/


In [None]:
np.random.seed(0)
np.random.randint(1, 100, size = (4, 3))

array([[45, 48, 65],
       [68, 68, 10],
       [84, 22, 37],
       [88, 71, 89]])

In [None]:
np.random.seed(0)
np.random.randint(1, 100, size = (4, 3))

array([[45, 48, 65],
       [68, 68, 10],
       [84, 22, 37],
       [88, 71, 89]])

In [None]:
np.random.randint(1, 100, size = (4, 3))

array([[89, 13, 59],
       [66, 40, 88],
       [47, 89, 82],
       [38, 26, 78]])

In [None]:
np.random.randint(1, 100, size = (4, 3))

array([[73, 10, 21],
       [81, 70, 80],
       [48, 65, 83],
       [89, 50, 30]])
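The effect of the seed can be checked directly. The sketch below compares two draws made after the same seed with a third draw made without re-seeding; the variable names `first`, `second`, and `third` are assumptions for illustration.

```python
import numpy as np

# Re-seeding with the same value restarts the same pseudo-random sequence
np.random.seed(0)
first = np.random.randint(1, 100, size=(4, 3))

np.random.seed(0)
second = np.random.randint(1, 100, size=(4, 3))

# Drawing again without re-seeding continues the sequence instead
third = np.random.randint(1, 100, size=(4, 3))

print(np.array_equal(first, second))  # True: identical draws after the same seed
print(np.array_equal(second, third))
```

Newer NumPy code often creates an independent generator with `np.random.default_rng(seed)` instead of seeding the global state, but the idea of reproducing a sequence from a seed is the same.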

# Simple Statistics

In [None]:
a = np.array([[11, 12, 13 , 14, 15], [21, 22, 23, 24, 25], [31, 32, 33, 34,35]])
print(a)
print('shape =' ,a.shape)

[[11 12 13 14 15]
 [21 22 23 24 25]
 [31 32 33 34 35]]
shape = (3, 5)


### np.sum

Total of elements

In [None]:
np.sum(a) # NumPy's numpy.sum

345

In [None]:
a.sum() # ndarray method, equivalent to np.sum(a)

345

Difference: Python's built-in sum is faster on lists, while NumPy's sum is faster on arrays.
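Speed is not the only difference. As a small sketch (the array `grid` is made up for illustration), the built-in `sum` iterates over the rows of a 2-dimensional array and adds them together, so it returns column totals rather than a single number.

```python
import numpy as np

grid = np.array([[11, 12, 13], [21, 22, 23]])

total = np.sum(grid)      # NumPy sum over every element
also_total = grid.sum()   # ndarray method, same result as np.sum(grid)
builtin = sum(grid)       # built-in sum adds the rows together

print(total)       # 102
print(also_total)  # 102
print(builtin)     # [32 34 36]
```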

Summing rows for each column

<img src="https://i.stack.imgur.com/h1alT.jpg" alt="drawing" width="400"/>



In [None]:
np.sum(a, axis = 0)

array([63, 66, 69, 72, 75])

Summing columns for each row

In [None]:
np.sum(a, axis = 1)

array([ 65, 115, 165])

### np.average
Average of elements

In [None]:
np.average(a)

23.0

In [None]:
np.average(a , axis = 1)

array([13., 23., 33.])

### np.std
Standard deviation of elements

In [None]:
np.std(a , axis = 0)

array([8.16496581, 8.16496581, 8.16496581, 8.16496581, 8.16496581])
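By default `np.std` computes the population standard deviation (dividing by N). A minimal sketch of the sample version, using the `ddof` parameter and a made-up array `values`:

```python
import numpy as np

values = np.array([[11, 12], [21, 22], [31, 32]])

population_std = np.std(values, axis=0)       # divides by N (default, ddof=0)
sample_std = np.std(values, axis=0, ddof=1)   # divides by N - 1

print(population_std)
print(sample_std)  # [10. 10.]
```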

Boolean Count

In [None]:
a

array([[11, 12, 13, 14, 15],
       [21, 22, 23, 24, 25],
       [31, 32, 33, 34, 35]])

In [None]:
a > 20

array([[False, False, False, False, False],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

In [None]:
np.sum(a > 20) # Count number of elements greater than 20

10

In [None]:
np.sum((a >= 20) & (a <= 30))

5

In [None]:
np.sum((a <= 20) | (a > 30))

10
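Counting works because `True` is treated as 1 and `False` as 0. The sketch below shows two related calls under the same idea, `np.count_nonzero` and `np.mean`; the names `scores` and `mask` are only for illustration.

```python
import numpy as np

scores = np.array([[11, 12, 13, 14, 15],
                   [21, 22, 23, 24, 25],
                   [31, 32, 33, 34, 35]])

mask = scores > 20

count = np.count_nonzero(mask)   # same count as np.sum(mask)
fraction = np.mean(mask)         # share of elements that satisfy the condition
per_row = np.sum(mask, axis=1)   # number of matches in each row

print(count)    # 10
print(per_row)  # [0 5 5]
print(fraction)
```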

### np.max
Maximum value of all elements

In [None]:
np.max(a)

35

### np.min
Minimum value of all elements

In [None]:
np.min(a)

11

#*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

# Application

## 1. Create dataset

In [None]:
#@title Demo mockdata

# 24 hrs pattern
# reference  https://www.statsmodels.org/devel/examples/notebooks/generated/statespace_seasonal.html

import numpy as np
import plotly.graph_objects as go

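x = np.arange(0, 2*np.pi, 0.5)  # 13 sample points over one sine period
y = np.sin(x)
# Stitch three scaled sine segments (9 + 9 + 6 points) into a 24-hour pattern
y3 = np.concatenate(((y * -0.5 - 0.75)[:9], (y * 0.5)[:9], (y * 1.2)[:6]), axis=None)
y3 = y3 - y3.min() + 1  # shift the curve so that its minimum value is 1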

y_axis = y3


# Create traces
fig = go.Figure()

fig.add_trace(go.Scatter( x = np.arange(0, len(y3)) ,  y = y_axis))
# fig.add_trace(go.Scatter( x = x ,  y = y_axis))
fig.update_layout(autosize=False, title = '24 Hourly Orders Pattern')

fig.show()

In [None]:
#@title Webscrap stop time
from googlesearch import search

for url in search('shopline', 
        num = 5, 
        stop = 5, 
        pause = np.random.randint(1, 3) , 
        safe='on' , 
        country= 'countryTW',
        ):
    print(url)


https://shopline.tw/
https://shopline.tw/careers
https://www.104.com.tw/company/1a2x6bj96b
https://tw.linkedin.com/company/shopline
https://www.yourator.co/companies/shopline


## 2. Statistics

In [None]:
#@title 2.table condition manipulation

# !pip install fuzzywuzzy
from fuzzywuzzy import fuzz
import numpy as np

from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())

import pandas as pd

data_url = 'https://docs.google.com/spreadsheets/d/1j1uCvkaD5FG7jyJhegl4Tq2o9O39vFNhyhieFJ-Nm0k/edit?usp=sharing'

wb = gc.open_by_url(data_url)

df = pd.DataFrame( wb.worksheet('Sheet1').get_all_values() )
df.columns = df.iloc[0]
raw_data = df[1:]

print(raw_data.shape)

df = raw_data.copy().head(45)
df['fuzz_score'] = df.apply(lambda x: fuzz.ratio(x.company_name, x.title), axis=1)

df.head(10)

df.groupby(by = ['company_name']).agg({'fuzz_score': [ np.average , 'mean'] })

(16671, 4)


| company_name | fuzz_score (average) | fuzz_score (mean) |
| --- | --- | --- |
| JC Kids x Korea | 42.888889 | 42.888889 |
| the eternity | 28.0 | 28.0 |
| 大吉利 | 24.333333 | 24.333333 |
| 敲玩藝 | 18.75 | 18.75 |
| 柏詩科技有限公司 | 56.1 | 56.1 |


#*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

## Indexing

### 1-Dimensional Array 
Similar to list indexing

In [None]:
a = np.arange(1, 11)
a

In [None]:
a[4:6]

In [None]:
a[::2] # a[start:end:step]

In [None]:
a[0:11:2]

In [None]:
a[range(0, 10, 2)]

In [None]:
list(range(0, 10, 2))

In [None]:
a[[0, 2, 4, 6, 8]]

In [None]:
a[::-1] # reverse order

In [None]:
a[::-2] # reverse order with decrements of 2
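One difference from list indexing is worth a short sketch: basic slicing returns a view that shares memory with the original array, while indexing with a list of positions returns a copy. The names `base`, `view`, and `copy` are assumptions for illustration.

```python
import numpy as np

base = np.arange(1, 11)

view = base[::2]               # basic slicing returns a view of base
copy = base[[0, 2, 4, 6, 8]]   # indexing with a list returns a new array

view[0] = 100                  # writes through to base
copy[1] = -1                   # does not affect base

print(base[0])                       # 100
print(base[2])                       # 3
print(np.shares_memory(base, view))  # True
print(np.shares_memory(base, copy))  # False
```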

### 2-Dimensional Array
```python
a[x, y] # element in xth row and yth column
```

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
a[1, 2] # element in 2nd row and 3rd column

In [None]:
a[0, :] # the 0th row

In [None]:
a[:, 2] # the column at index 2 (the 3rd column)

In [None]:
a[0, :2] # elements of 0th row before 2nd column

In [None]:
a[0:, :2] # elements starting from 0th row and before 2nd column

### Fancy Indexing

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
b = np.array([1, 0, 0, 1])
b

In [None]:
a[b,:] # access the given rows

In [None]:
a[:,b] # access the given columns

In [None]:
a[:,[1, 0, 0, 1]]

### Boolean Indexing

In [None]:
a = np.arange(1, 11)
a

In [None]:
a < 5

In [None]:
a[a < 5]

### Assign value

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
a[0,:] = 3
a

## Element Wise Operation
The benefit of arrays over lists is that we can do operations on the whole array at once, instead of going through each element one at a time as with a list. These are called "vectorized" operations, and they make the code shorter to write and faster to run. NumPy functions are designed to work on arrays rather than lists. DataFrame and Series from the pandas module are also built on NumPy arrays.
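A minimal sketch of the difference, using a made-up list `prices`: the list version visits each element explicitly, while the array version applies one expression to every element.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# List version: visit each element explicitly
doubled_list = [p * 2 for p in prices]

# Array version: one expression applies to every element
doubled_array = np.array(prices) * 2

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_array)  # [20. 40. 60.]

# Note that * on a plain list repeats the list rather than multiplying values
print(prices * 2)     # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```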

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.ones([2, 3])
print(a)
print(b)

### Addition

In [None]:
a + b

In [None]:
a + 1

In [None]:
a += 1 # a = a + 1
a

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
c = np.ones([2, 4])
c

In [None]:
a + c
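Adding a 2 x 3 array to a 2 x 4 array fails because the shapes are incompatible. The sketch below contrasts that with shapes that NumPy can broadcast; the names `m`, `row`, and `col` are only for illustration.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

row = np.array([10, 20, 30])           # shape (3,): added to every row
col = np.array([[100], [200]])         # shape (2, 1): added to every column

print(m + row)
print(m + col)

# Shapes (2, 3) and (2, 4) are incompatible, so NumPy raises ValueError
try:
    m + np.ones([2, 4])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```

Roughly speaking, two dimensions are compatible when they are equal or when one of them is 1.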

### Subtraction

In [None]:
a - b

In [None]:
a - 2

### Multiplication

In [None]:
a * 2

In [None]:
c = a
a * c

### Division

In [None]:
a / 2

In [None]:
a / a # divide each element of a by each element of a

### True/False Array

In [None]:
a > 3

In [None]:
(a >= 3) & (a <= 5)

In [None]:
(a < 3) | (a > 5)

### Other Operation

#### Negate

In [None]:
-a

#### Square Root

In [None]:
np.sqrt(a)

In [None]:
a ** (1/2)

#### np.power

In [None]:
np.power(a, 2)

In [None]:
a ** 2

In [None]:
np.power(a, 1/2)

#### np.log

In [None]:
np.log(a) # natural log

#### np.abs

In [None]:
a - 3 

In [None]:
np.abs(a - 3)

#### np.cumsum
Flatten the array to 1-D and compute the array of running totals.

In [None]:
np.cumsum(a)

## Manipulating

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

### Transpose

In [None]:
a.T

### np.where
Find the row and column indices of the elements satisfying the given condition.
<br>
result: ( array ( [ row1, row2, ... ] ) , array ( [ column1, column2,... ] ) )

In [None]:
np.where(a >= 3)

In [None]:
a[np.where(a >= 3)] = 3 # set every element greater than or equal to 3 to 3
a
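`np.where` also has a three-argument form that builds a new array instead of returning indices. A short sketch of both forms, using a fresh array `m` so that nothing is modified in place:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

rows, cols = np.where(m >= 3)   # index arrays of the matching elements
print(rows)  # [0 1 1 1]
print(cols)  # [2 0 1 2]

# Three-argument form: take 3 where the condition holds, otherwise keep m
capped = np.where(m >= 3, 3, m)
print(capped)
print(m)     # the original array is left unchanged
```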

### np.hstack
Horizontally stack arrays.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.ones([2, 3])
print(a)
print(b)

In [None]:
np.hstack([a, b])

In [None]:
np.hstack((b, a))

In [None]:
np.hstack([a, b, a])

### np.vstack
Vertically stack arrays.

In [None]:
np.vstack([a, b])

In [None]:
c = np.zeros(9)
c

In [None]:
np.vstack([a, c])

In [None]:
c = np.zeros([3, 3])
c

In [None]:
np.vstack([a, c])

In [None]:
np.hstack([a, c])

### reshape

In [None]:
np.arange(0, 6)

In [None]:
np.arange(0, 6).reshape(2, 3)

In [None]:
np.array([[0, 1, 2], [3, 4, 5]])

In [None]:
np.arange(0, 6).reshape(2, 3).reshape(1, 6)

In [None]:
np.arange(0, 6).reshape(3, 3)
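A reshape only succeeds when the new shape holds exactly the same number of elements. The sketch below shows the failing 3 x 3 case and the `-1` placeholder, which asks NumPy to infer one dimension; the variable names are assumptions for illustration.

```python
import numpy as np

flat = np.arange(0, 6)

auto = flat.reshape(2, -1)   # -1 lets NumPy infer the missing dimension
print(auto.shape)            # (2, 3)

# 6 elements cannot fill a 3 x 3 array, so reshape raises ValueError
try:
    flat.reshape(3, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```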

## Iterating

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
for row in a:
    print(row)

In [None]:
for column in a:
    print(column)

In [None]:
for x in a:
    print(x)

So, how do we iterate through columns?

In [None]:
for column in a.T:
    print(column)

## Sorting

In [None]:
a = np.array([[6, 10, 1, 2], [5, 8, 10, 3]])
a

### np.sort

Return an array that sorts each row of a (does not change a)

In [None]:
np.sort(a)


In [None]:
np.sort(a, axis = 1)

Return an array that sorts each column of a (does not change a)

In [None]:
np.sort(a, axis = 0)

### np.argsort

In [None]:
a = np.array([[6, 10, 1, 2], [5, 8, 10, 3]])
a

Return the column indices of each row of a in increasing order

In [None]:
np.argsort(a)

In [None]:
np.argsort(a, axis = 1)

Return the row indices of each column of a in increasing order

In [None]:
np.argsort(a, axis = 0)

Sort columns of a by row 0

In [None]:
a[:, a[0].argsort()]

In [None]:
a[0]

In [None]:
a[0].argsort()

In [None]:
a[:, [2, 3, 0, 1]]

Sort rows of a by column 0

In [None]:
a

In [None]:
a[a[:, 0].argsort(), :]

In [None]:
a[:, 0]

In [None]:
a[:, 0].argsort()

In [None]:
a[[1, 0], :]
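The index arrays returned by `np.argsort` can also be applied row by row with `np.take_along_axis`, or reversed to sort in descending order. A minimal sketch, using a copy of the same example values under the name `m`:

```python
import numpy as np

m = np.array([[6, 10, 1, 2], [5, 8, 10, 3]])

order = np.argsort(m, axis=1)                        # per-row sort order
sorted_rows = np.take_along_axis(m, order, axis=1)   # apply it row by row

print(np.array_equal(sorted_rows, np.sort(m, axis=1)))  # True

# Reversing the index order sorts the columns by row 0 in descending order
descending = m[:, m[0].argsort()[::-1]]
print(descending[0].tolist())  # [10, 6, 2, 1]
```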

## Optimization

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a2 = np.array([[3, 2, 5], [4, 5, 4]])
print(a)
print(a2)

[[1 2 3]
 [4 5 6]]
[[3 2 5]
 [4 5 4]]


### np.maximum
Element-wise maximum

In [None]:
np.maximum(a, 3)

array([[3, 3, 3],
       [4, 5, 6]])

In [None]:
np.maximum(a, a2)

array([[3, 2, 5],
       [4, 5, 6]])

### np.minimum
Element-wise minimum

In [None]:
np.minimum(a, 3)

In [None]:
np.minimum(a, a2)
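Combining the two functions bounds an array from below and above. The sketch below, with illustrative bounds 2 and 5, shows that `np.clip` gives the same result in one call.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# Raise everything below 2 up to 2, then cap everything above 5 at 5
bounded = np.minimum(np.maximum(m, 2), 5)

# np.clip performs the same lower and upper bounding in one call
clipped = np.clip(m, 2, 5)

print(bounded)
print(np.array_equal(bounded, clipped))  # True
```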

### np.argmax / np.argmin
For each column/row of an array, find the row/column index of the first element that is the largest/smallest.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a2 = np.array([[3, 2, 5], [4, 5, 4]])
print(a)
print(a2)

For each column of a2, find the row index of the first element that is the largest

In [None]:
np.argmax(a2, axis = 0)

For each row of a2, find the column index of the first element that is the smallest

In [None]:
np.argmin(a2, axis = 1)
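Without an `axis` argument, `np.argmax` returns an index into the flattened array. As a sketch, `np.unravel_index` converts that index back to a (row, column) pair; the name `m2` is a copy of the example values used for illustration.

```python
import numpy as np

m2 = np.array([[3, 2, 5], [4, 5, 4]])

flat_index = np.argmax(m2)                          # index into the flattened array
row, col = np.unravel_index(flat_index, m2.shape)   # convert to (row, column)

print(flat_index)                      # 2
print(int(row), int(col))              # 0 2
print(np.argmax(m2, axis=0).tolist())  # [1, 1, 0]
print(np.argmin(m2, axis=1).tolist())  # [1, 0]
```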

### np.argwhere
Find the (row, column) indices where an array satisfies the given condition

In [None]:
np.argwhere(a2 >= 4)

Interpretation: a2[0, 2] >= 4, a2[1, 0] >= 4, a2[1, 1] >= 4, a2[1, 2] >= 4