### **What? NumPy is a package in Python**
 A collection of pre-written functions, classes, and methods that can handle and manipulate data and calculate results.  


**Main feature: Arrays**  
 A type of data structure that stores elements and refers to their values via integer indices.

### **Why? Incredibly computationally stable and efficient**
 **Pandas:** Stores multiple types of data simultaneously.  
 **NumPy:** Core routines are written in C, a lower-level language.  
 - Shorter computation times.

### **When?**
 - Compute a lot of values for analysis.
 - Deal with vectors and matrices.
 - Import and pre-process data into Python directly.
 - Generate random data.

### **How?**
**N-D array class ≈ Python's list**  
 - Both store ordered data, and both are displayed with square brackets.  


**N-D array class**  
 - The N-D array class is more useful for computing values because operations work elementwise.
 - Functions (faster than plain Python functions), e.g. *np.function_name(ndarray)*
 - Methods (faster than plain Python methods), e.g. *ndarray.method_name()*
 - Operations
    - Broadcasting
    - Type casting






### **N-D array**
NumPy is commonly used for its N-D array.  
N-D array = N-Dimensional array. (N = natural numbers, including 0)

 - *point*: 0 dimensions.
 - *line*: 1 dimension.
 - *plane*: 2 dimensions.

**0-D array:** a single data point.  
**1-D array:** a sequence of values.  
**2-D array:** a collection of 1-D sequences.  

### Importing NumPy

In [1]:
import numpy as np

### Using NumPy

In [2]:
array_a = np.array([1,2,3])
array_a

array([1, 2, 3])

In [3]:
array_b = np.array([[1,2,3],[4,5,6]])
array_b

array([[1, 2, 3],
       [4, 5, 6]])

### NumPy Documentation
Documentation is a collection of instructions covering all the functions, methods, and classes within a module, with details on how to use them. A sort of user manual.  
[Documentation Link](https://numpy.org/doc/stable/)

In a Jupyter notebook, the shortcut for viewing the documentation of any function is *Shift + Tab*.

In [4]:
np.mean([1,2,3])

2.0

### A Short History of NumPy
The fundamental package for scientific computing with Python. *NumPy = Numeric Python*
- Numeric (compatible with SciPy): fast for smaller arrays.
- Scientific Python = SciPy
- Numarray (not compatible with SciPy): fast for large arrays.
- As a result, developers needed to switch between Numeric and Numarray.
- Thus, *Numeric + Numarray = SciPy Core = NumPy* (compatible with SciPy)

### N-D arrays:
- Originate from the NumPy package.
- A special data type in Python.
- Can store multiple numeric values in a sequence.
- Elementwise properties.  



**Elementwise Addition:**  
*array_a = [1,2,3]; array_b = [4,5,6]  
array_a + array_b = [1,2,3] + [4,5,6] = [5,7,9]*


In [5]:
array_a = np.array([1,2,3])
array_a

array([1, 2, 3])

In [6]:
type(array_a)

numpy.ndarray

array -> name of the function that creates the object.  
ndarray -> name of the class of the object.  
Calling np.array() returns an ndarray.

In [7]:
print(array_a)

[1 2 3]


Using the print function doesn't show 'array' before the output.

In [8]:
array_a.shape

(3,)

We have a 1-dimensional array of size 3.

**A 2-dimensional array**

In [9]:
array_b = np.array([[1,2,3],[11,22,33]])
print(array_b)

[[ 1  2  3]
 [11 22 33]]


### Geometric equivalence of arrays:
**1-D array**  
- Part of a line.
- Values on a single row (or column).  

**2-D array**
- Part of a plane.
- A collection of rows and columns.

In [10]:
type(array_b)

numpy.ndarray

The type is the same for both 1-D and 2-D arrays.

In [11]:
array_b.shape

(2, 3)

This shape describes a table of 2 rows. Each row is an array of 3 values, like a 2x3 table.

In [12]:
array_b.shape[0]

2

In [13]:
array_b.shape[1]

3

We get the number of rows and columns from the shape attribute by indexing (index 0 gives the number of rows and index 1 gives the number of columns).

In [14]:
array_c = np.array(12)
print(array_c)

12


In [15]:
type(array_c)

numpy.ndarray

The type is also the same for 0-D arrays (a single data point).

In [16]:
array_c.shape

()

The shape is empty because a single value does not have any shape or dimensions. It is a point in an N-dimensional space.  
However, a single value does have a shape if it is placed inside an array.

In [17]:
array_d = np.array([12])
array_d.shape

(1,)

This can be thought of as a point in a 1-D geometric space (a line).

In [18]:
array_d = np.array([[12]])
array_d.shape

(1, 1)

This is a point in a 2-D plane.  
The number of dimensions of an array can be identified by the number of opening square brackets (*[*) at the start of its output.
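
Counting brackets works for small outputs. As a hedged alternative, the sketch below reads the number of dimensions from the `ndim` attribute (an attribute not used elsewhere in these notes) and compares it with the length of `shape`. The variable names are made up for illustration.

```python
import numpy as np

# The same value wrapped in 0, 1 and 2 levels of square brackets
point_0d = np.array(12)
point_1d = np.array([12])
point_2d = np.array([[12]])

for arr in (point_0d, point_1d, point_2d):
    # ndim always equals the number of entries in the shape tuple
    print(arr.ndim, arr.shape, len(arr.shape) == arr.ndim)
```

Each extra pair of brackets adds one entry to `shape` and increases `ndim` by one.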

Thus, **ndarrays** can contain  
- single scalars
- sequences of numbers
- tables of values

However, their syntax resembles another Python datatype: *lists*.


### **Arrays vs Lists**

In [19]:
list_a = [1,2,3,4,5,6]
len(list_a)

6

In [20]:
list_b = [[1,2,3],[4,5,6]]
len(list_b)

2

Lists are defined similarly to ndarrays.  

list_a and list_b are Python list objects. list_a has 6 items, whereas list_b has 2. list_b is a list of lists, which is similar in format to a 2-D ndarray.  

We can pass lists directly as inputs to the np.array() function.

In [21]:
array_lb = np.array(list_b)

In [22]:
type(list_b)

list

In [23]:
type(array_lb)


numpy.ndarray

Even though the syntax resembles that of arrays, Python recognizes lists as a separate datatype.

In [24]:
print(list_b)


[[1, 2, 3], [4, 5, 6]]


In [25]:
print(array_lb)

[[1 2 3]
 [4 5 6]]


Even though these variables contain the same data, their printed outputs differ.

In [26]:
array_lb.shape

(2, 3)

In [27]:
# list_b.shape
len(list_b)

2

Arrays have a shape attribute, whereas lists don't (the commented-out list_b.shape would raise an AttributeError). That is also why a list is printed on a single line.

In [28]:
len(list_b[0])

3

We can access the sublists by indexing.

The biggest difference between lists and arrays is that array operations work elementwise, whereas list operations don't. To illustrate this:

In [29]:
list_bs = list_b[0] + list_b[1]
array_lbs = array_lb[0] + array_lb[1]

In [30]:
print(list_bs)
print(array_lbs)

[1, 2, 3, 4, 5, 6]
[5 7 9]


Adding two lists concatenates their items, whereas adding two arrays performs elementwise addition.

In [31]:
# import math
# math.sqrt(list_b)  # raises TypeError: math.sqrt expects a single number, not a list
np.sqrt(array_lb)

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Elementwise operations are not possible for lists without writing a loop.

Lists and arrays serve different purposes. The main purpose of a list is to store data, and the main purpose of an array is to compute mathematical operations.
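
To make the contrast concrete, here is a minimal sketch comparing the two approaches. It assumes a small list of perfect squares chosen only for illustration. The list needs an explicit loop, while the array takes a single function call.

```python
import math
import numpy as np

list_a = [1, 4, 9]
array_a = np.array(list_a)

# A list needs an explicit loop (here a list comprehension) over its items
list_roots = [math.sqrt(value) for value in list_a]

# An ndarray applies the function to every element in one call
array_roots = np.sqrt(array_a)

print(list_roots)   # [1.0, 2.0, 3.0]
print(array_roots)  # [1. 2. 3.]
```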

### Indexing
We use indices to refer to the individual elements within an array.  
They can be seen as coordinates that help us navigate through the array.

Indices are integers, e.g. array_a[1]  
1st position = index 0  
2nd position = index 1

In [32]:
array_a = np.array([[1,2,3],[4,5,6]])
array_a

array([[1, 2, 3],
       [4, 5, 6]])

In [33]:
array_a[0]

array([1, 2, 3])

In [34]:
array_a[1]

array([4, 5, 6])

In [35]:
# array_a[2]

array_a has only 2 rows, so array_a[2] raises an IndexError.
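
A small sketch of what happens with an out-of-range index, assuming the same 2x3 array as above. The negative index at the end is an extra not covered in these notes: it counts from the end of the axis.

```python
import numpy as np

array_a = np.array([[1, 2, 3], [4, 5, 6]])

message = None
try:
    array_a[2]  # valid row indices are 0 and 1
except IndexError as error:
    message = str(error)
    print("IndexError:", message)

# Negative indices count from the end, so -1 is the last row
last_row = array_a[-1]
print(last_row)  # [4 5 6]
```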

In [36]:
array_a[0][1]

2

We obtain the 2nd element of the 1st row.

In [37]:
array_a[0,1]

2

Both forms, [0,1] and [0][1], return the same element.

In [38]:
array_a[:,1]

array([2, 5])

We go through all the elements of the 2nd column. This is called slicing.

### Assigning values

In [39]:
array_a = np.array([[1,2,3],[4,5,6]])
array_a

array([[1, 2, 3],
       [4, 5, 6]])

In [40]:
# assign a single value in the array
array_a[0,2] = 9
array_a

array([[1, 2, 9],
       [4, 5, 6]])

In [41]:
# Assign a value to all elements of a row or column
array_a[0] = 5
array_a

array([[5, 5, 5],
       [4, 5, 6]])

In [42]:
array_a[:,0] = 0
array_a

array([[0, 5, 5],
       [0, 5, 6]])

In [43]:
# assign all values of the array
array_a[:] = 0
array_a

array([[0, 0, 0],
       [0, 0, 0]])

In [44]:
# assign different values to a row or column
array_a[0] = [1,2,3]
array_a

array([[1, 2, 3],
       [0, 0, 0]])

In [45]:
array_a[:,0] = [1,4]
array_a

array([[1, 2, 3],
       [4, 0, 0]])

### Elementwise properties
Whatever mathematical computation we conduct, it is applied to each element of the array.

In [46]:
array_a = np.array([7,8,9])
array_b = np.array([[1,2,3],[3,4,5]])

In [47]:
array_a + 2

array([ 9, 10, 11])

In [48]:
array_b + 2

array([[3, 4, 5],
       [5, 6, 7]])

In [49]:
# We can add 2 arrays of same shape.
array_a + array_b[0]

array([ 8, 10, 12])

In [50]:
array_a + array_b

array([[ 8, 10, 12],
       [10, 12, 14]])

We can also perform other elementwise operations such as subtraction, multiplication, and division.
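
A short sketch of those other operators, using two same-shape arrays chosen for illustration:

```python
import numpy as np

array_a = np.array([7, 8, 9])
array_b = np.array([1, 2, 3])

difference = array_a - array_b  # elementwise subtraction
product = array_a * array_b     # elementwise multiplication
quotient = array_a / array_b    # elementwise (true) division gives floats

print(difference)  # [6 6 6]
print(product)     # [ 7 16 27]
print(quotient)    # [7. 4. 3.]
```

Note that `*` multiplies matching elements; it is not matrix multiplication.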

### Datatypes

In [51]:
array_a = np.array([[1,2,3],[4,5,6]])
array_a

array([[1, 2, 3],
       [4, 5, 6]])

In [52]:
array_a = np.array([[1,2,3],[4,5,6]], dtype = np.float16)
array_a

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float16)

In [53]:
array_a = np.array([[1,2,3],[4,5,6]], dtype = np.str_)
array_a

array([['1', '2', '3'],
       ['4', '5', '6']], dtype='<U1')

In [54]:
array_a = np.array([[1,2,3],[4,5,6]], dtype = np.complex128)
array_a

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j]])

In [55]:
array_a = np.array([[1,2,3],[0,5,6]], dtype = np.bool_)
array_a

array([[ True,  True,  True],
       [False,  True,  True]])

Link -> [numpy datatype](https://numpy.org/devdocs/reference/generated/numpy.dtype.kind.html)
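
Besides passing `dtype` when creating an array, we can inspect and convert the datatype afterwards. The sketch below uses the `dtype` attribute and the `astype` method, neither of which appears in the cells above, so treat it as an illustrative addition.

```python
import numpy as np

array_int = np.array([[1, 2, 3], [4, 5, 6]])
# The default integer type depends on the platform (e.g. int64 or int32)
print(array_int.dtype)

# astype returns a converted copy; the original array keeps its dtype
array_float = array_int.astype(np.float32)
print(array_float.dtype)  # float32
print(array_int.dtype == array_float.dtype)  # False
```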

### Fundamentals of NumPy functions
- Universal parameters
- Universal functions

**Universal Functions**  
- Work with ndarrays on an element-by-element basis.
- An extension of the elementwise operations.
- Mathematical operations, trigonometric functions, comparison functions.
- Support broadcasting, type casting, and computing over a given axis.  
[Documentation](https://numpy.org/devdocs/reference/ufuncs.html)

### Broadcasting
We want to conduct elementwise operations but have operands of different sizes and/or dimensions.  
We can broadcast the smaller variable and create a broadcasted version with the size of the larger one.  
We can think of it as "stretching" one variable over the other to produce an output with the same shape.  

**Broadcasting Rules:**  
Two arrays can be broadcast together when one of the following holds:
1. The arrays have the same shape.
2. The arrays have the same number of dimensions, and the length of each dimension is either the same in both arrays or 1 in one of them.
3. An array with too few dimensions has its shape padded with dimensions of length 1 on the left, so that the 2nd rule is satisfied.
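
The "stretching" can be made visible with `np.broadcast_to`, a helper not used elsewhere in these notes. This is only a sketch with made-up variable names; in normal use NumPy does the stretching implicitly during the operation.

```python
import numpy as np

row = np.array([1, 2, 3])                  # shape (3,)
column = np.array([[1], [2]])              # shape (2, 1)
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# Rule 3: (3,) is treated as (1, 3), then stretched to (2, 3)
row_stretched = np.broadcast_to(row, matrix.shape)
# Rule 2: the length-1 axis of (2, 1) is stretched to length 3
column_stretched = np.broadcast_to(column, matrix.shape)
print(row_stretched)
print(column_stretched)

# Shapes (2,) and (2, 3) do not fit: the trailing lengths 2 and 3 differ
incompatible = False
try:
    np.array([1, 2]) + matrix
except ValueError as error:
    incompatible = True
    print("ValueError:", error)
```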

In [56]:
array_a = np.array([1,2,3])
array_a

array([1, 2, 3])

In [57]:
array_b = np.array([[1],[2]])
array_b

array([[1],
       [2]])

In [58]:
matrix_c = np.array([[1,2,3],[4,5,6]])
matrix_c

array([[1, 2, 3],
       [4, 5, 6]])

In [59]:
np.add(array_a,matrix_c)

array([[2, 4, 6],
       [5, 7, 9]])

In [60]:
np.add(array_b,matrix_c)

array([[2, 3, 4],
       [6, 7, 8]])

### Type casting
Taking every element of an array and changing it to a specified datatype.

In [61]:
np.add(array_b,matrix_c,dtype=np.float64)

array([[2., 3., 4.],
       [6., 7., 8.]])

In [62]:
# np.add(array_b,matrix_c, dtype = np.str_)

The commented-out call is invalid: the inputs would have to be cast to strings first, and NumPy cannot then add them as numbers.
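
One way around this, sketched below under the assumption that we only want the result in another type, is to compute with numbers first and cast afterwards using `astype` (a method not shown in the cells above).

```python
import numpy as np

array_b = np.array([[1], [2]])
matrix_c = np.array([[1, 2, 3], [4, 5, 6]])

# Cast as part of the ufunc call
cast_during = np.add(array_b, matrix_c, dtype=np.float64)
# Add first, then cast the finished result
cast_after = np.add(array_b, matrix_c).astype(np.float64)
print(np.array_equal(cast_during, cast_after))  # True

# For strings, add as numbers and convert only at the end
as_text = np.add(array_b, matrix_c).astype(np.str_)
print(as_text)
```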

### Running a function along (over) a given axis 
1. NumPy breaks down an N-D array into smaller arrays of N-1 dimensions.
2. It applies the function to each one.  

We can use this feature to run a function along each row or column.

In [63]:
matrix_c

array([[1, 2, 3],
       [4, 5, 6]])

In [64]:
np.mean(matrix_c, axis = 0)

array([2.5, 3.5, 4.5])

In [65]:
np.mean(matrix_c, axis = 1)

array([2., 5.])
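
To see which direction each axis refers to, the following sketch recomputes the same means with explicit loops over column and row slices. It is only an illustration of the result; NumPy does not literally run Python loops internally.

```python
import numpy as np

matrix_c = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows: one result per column
column_means = [matrix_c[:, j].mean() for j in range(matrix_c.shape[1])]
# axis=1 collapses the columns: one result per row
row_means = [matrix_c[i, :].mean() for i in range(matrix_c.shape[0])]

print(np.allclose(column_means, np.mean(matrix_c, axis=0)))  # True
print(np.allclose(row_means, np.mean(matrix_c, axis=1)))     # True
```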

### Slicing
**Basic Slicing**  
Taking chunks of values out of an existing array (NumPy returns them as a view of the original data). The slices consist of adjacent pieces of data.  

**Slice**  
It can contain entire rows and columns of the original array, or just parts of them.

In [66]:
matrix_a = np.array([[1,2,3],[4,5,6]])

In [67]:
matrix_a[:]

array([[1, 2, 3],
       [4, 5, 6]])

In [68]:
matrix_a[0:0]

array([], shape=(0, 3), dtype=int32)

In [69]:
matrix_a[0:1]

array([[1, 2, 3]])

In [70]:
matrix_a[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [71]:
matrix_a[:,:]

array([[1, 2, 3],
       [4, 5, 6]])

In [72]:
type(matrix_a[:,:])

numpy.ndarray

In [73]:
matrix_a[:1]

array([[1, 2, 3]])

In [74]:
matrix_a[1:]

array([[4, 5, 6]])

In [75]:
matrix_a[2:]

array([], shape=(0, 3), dtype=int32)

In [76]:
matrix_a[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [77]:
matrix_a[1]

array([4, 5, 6])

In [78]:
matrix_a[:1]

array([[1, 2, 3]])

Here, indexing returns a 1-D array, whereas slicing returns a 2-D array.

In [79]:
matrix_a[:-1]

array([[1, 2, 3]])

In [80]:
matrix_a[:,1:]

array([[2, 3],
       [5, 6]])

In [81]:
matrix_a[1:,1:]

array([[5, 6]])

### Stepwise Slicing
Slicing where we don't take consecutive values, but values that are a certain distance (step) apart.

In [82]:
matrix_b = np.array([[1,2,3,4,5],[1,3,5,6,7],[9,7,5,3,2]])
matrix_b

array([[1, 2, 3, 4, 5],
       [1, 3, 5, 6, 7],
       [9, 7, 5, 3, 2]])

In [83]:
matrix_b[::2,::]

array([[1, 2, 3, 4, 5],
       [9, 7, 5, 3, 2]])

In [84]:
matrix_b[::,::2]

array([[1, 3, 5],
       [1, 5, 7],
       [9, 5, 2]])

In [85]:
matrix_b[::2,::2]

array([[1, 3, 5],
       [9, 5, 2]])

In [86]:
matrix_b[::-2,::2]

array([[9, 5, 2],
       [1, 3, 5]])

In [87]:
matrix_b[0:1,::-2]

array([[5, 3, 1]])

### Conditional Slicing

In [88]:
matrix_c = np.array([[1,2,3,4,5],[1,3,5,6,7],[9,7,5,3,2]])
matrix_c

array([[1, 2, 3, 4, 5],
       [1, 3, 5, 6, 7],
       [9, 7, 5, 3, 2]])

In [89]:
matrix_c[:,0]

array([1, 1, 9])

In [90]:
matrix_c[:,0] > 2

array([False, False,  True])

In [91]:
matrix_c[:,:] > 2

array([[False, False,  True,  True,  True],
       [False,  True,  True,  True,  True],
       [ True,  True,  True,  True, False]])

In [92]:
matrix_c[matrix_c[:,:] > 2]

array([3, 4, 5, 3, 5, 6, 7, 9, 7, 5, 3])

In [93]:
matrix_c[(matrix_c[:,:] > 2) & (matrix_c[:,:] % 2 == 0)]

array([4, 6])

### Squeeze Function

In [94]:
matrix_d = np.array([[1,1,1,2,0],[2,4,6,8,10],[2,3,4,5,6]])
matrix_d

array([[ 1,  1,  1,  2,  0],
       [ 2,  4,  6,  8, 10],
       [ 2,  3,  4,  5,  6]])

In [95]:
type(matrix_d[0,0])

numpy.int32

In [96]:
print(matrix_d[0,0])

1


In [97]:
type(matrix_d[0,0:1])

numpy.ndarray

In [98]:
print(matrix_d[0,0:1])

[1]


In [99]:
type(matrix_d[0:1,0:1])

numpy.ndarray

In [100]:
print(matrix_d[0:1,0:1])

[[1]]


In [101]:
print(matrix_d[0,0].shape)
print(matrix_d[0,0:1].shape)
print(matrix_d[0:1,0:1].shape)

()
(1,)
(1, 1)


Here the 1st one is a scalar, the 2nd one is a vector, and the 3rd one is a matrix.  

What difference does it make whether we store it as a scalar, vector, or matrix?  
Certain functions or methods can only be executed with inputs of a fixed size.  

That is why we need the squeeze method, which removes all the length-1 dimensions of an array.

In [102]:
matrix_d[0:1,0:1].squeeze()

array(1)

In [103]:
type(matrix_d[0:1,0:1].squeeze())

numpy.ndarray

NumPy has an equivalent function:  
variable_name.squeeze() = np.squeeze(variable_name)
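
A quick check of that equivalence, sketched on a one-row slice of the same matrix (the variable names are made up for illustration):

```python
import numpy as np

matrix_d = np.array([[1, 1, 1, 2, 0], [2, 4, 6, 8, 10], [2, 3, 4, 5, 6]])

first_row = matrix_d[0:1, :]          # slicing keeps 2 dimensions: (1, 5)
via_method = first_row.squeeze()      # length-1 axis removed: (5,)
via_function = np.squeeze(first_row)  # same result through the function

print(first_row.shape, via_method.shape, via_function.shape)
print(np.array_equal(via_method, via_function))  # True
```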

In [104]:
print(matrix_d[0,0].squeeze().shape)
print(matrix_d[0,0:1].squeeze().shape)
print(matrix_d[0:1,0:1].squeeze().shape)

()
()
()


### Generating Data with NumPy

In [105]:
array_empty = np.empty(shape = (2,3))
array_empty

array([[2., 3., 4.],
       [6., 7., 8.]])

In [106]:
array_0s = np.zeros(shape = (2,3))
array_0s

array([[0., 0., 0.],
       [0., 0., 0.]])

In [107]:
array_0s = np.zeros(shape = (2,3), dtype = np.int8)
array_0s

array([[0, 0, 0],
       [0, 0, 0]], dtype=int8)

In [108]:
array_1s = np.ones(shape = (2,3))
array_1s

array([[1., 1., 1.],
       [1., 1., 1.]])

In [109]:
array_full = np.full(shape = (2,3), fill_value = 2)
array_full

array([[2, 2, 2],
       [2, 2, 2]])

### "_like" functions  
- Equivalent to np.empty, np.zeros, np.ones, np.full.
- We don't need to specify a shape or type.  
- We need to provide another array (whose shape and type they take).  


empty_like, zeros_like, ones_like, and full_like all work similarly.

In [110]:
array_a = np.array([[1,2,3,4,5],[5,4,3,2,1],[6,7,8,9,9],[1,2,5,6,8]])
array_empty_like = np.empty_like(array_a)
array_empty_like

array([[-1, -1,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0]])

In [111]:
array_0s_like = np.zeros_like(array_a)
array_0s_like

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [112]:
array_1s_like = np.ones_like(array_a)
array_1s_like

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [113]:
array_full_like = np.full_like(array_a, fill_value=3)
array_full_like

array([[3, 3, 3, 3, 3],
       [3, 3, 3, 3, 3],
       [3, 3, 3, 3, 3],
       [3, 3, 3, 3, 3]])

Applications of *zeros_like* in analysis  
1. Starting point for a planner (keeping track of how many times we increased or decreased an element in an array).
2. A "switch" where we change the values from 0 to 1 (and back), useful for creating dummy variables.  

Why are *_like* functions useful?  
They give us a second array in which we can store a value for each element of the original one. Convenient when working with huge databases (faster loading times).
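
As a rough sketch of the "switch" idea, the example below builds a dummy-variable array with `zeros_like` and a condition. The data and the threshold of 3 are hypothetical, chosen only for illustration.

```python
import numpy as np

scores = np.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])

# Start from zeros with the same shape and dtype as the data
flags = np.zeros_like(scores)

# Flip the "switch" to 1 wherever the (hypothetical) condition holds
flags[scores > 3] = 1
print(flags)
# [[0 0 0 1 1]
#  [1 1 0 0 0]]
```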


### "np.arange()" function  
arange = array range (NumPy's equivalent of Python's range function)

It creates a sequence of evenly spaced values within a given range (consecutive integers by default).

The "range" function returns a range object, whereas "arange" (array range) returns an array object. 

In [114]:
print(range(10))
print(list(range(10)))
print(type(range(10)))

range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'range'>


In [115]:
array_rng = np.arange(10)
array_rng

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

It is similar to the "range" output; the only difference is that the output is an array. Even though 10 is in the parentheses, 10 is not included in the output.

The "np.arange" function takes the arguments start, stop, step, and dtype, which is **almost** similar to slicing.

In [116]:
# array_rng = np.arange(start = 10)
# array_rng
array_rng = np.arange(stop = 10)
print(array_rng)
array_rng = np.arange(start = 0, stop = 10)
print(array_rng)

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]


When only the start argument is given, "arange" raises an error, because arange() requires stop to be specified. But when only the stop argument is given, it assumes "start = 0". 
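
The argument combinations can be sketched as follows. The positional forms mirror Python's range; the values are arbitrary.

```python
import numpy as np

stop_only = np.arange(10)      # start defaults to 0
start_stop = np.arange(2, 10)  # step defaults to 1
with_step = np.arange(2, 10, 3)

print(stop_only)   # [0 1 2 3 4 5 6 7 8 9]
print(start_stop)  # [2 3 4 5 6 7 8 9]
print(with_step)   # [2 5 8]

start_only_error = None
try:
    np.arange(start=10)  # stop is missing
except TypeError as error:
    start_only_error = str(error)
    print("TypeError:", start_only_error)
```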

The step argument can take both int and float.

In [117]:
array_rng = np.arange(start = 0, stop = 10, step = 2.5)
print(array_rng)

array_rng = np.arange(start = 0, stop = 10, step = 2.5, dtype = np.float32)
print(array_rng)

array_rng = np.arange(start = 0, stop = 10, step = 2.5, dtype = np.int32)
print(array_rng)

[0.  2.5 5.  7.5]
[0.  2.5 5.  7.5]
[0 2 4 6]


When the step is a float, the elements of the array are floats. Here, it includes 0, 2.5, 5, and 7.5 (it cannot include 10, since the stop value is excluded). 

When the dtype argument is changed to integer, the number of elements in the output remains the same as for the float dtype. It works as if the step were first cast to an integer (2.5 becomes 2) and then used to produce the same number of elements as the float output.
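
The number of elements can be checked against the formula ceil((stop - start) / step), as in the sketch below. The last lines use `np.linspace`, a function not covered in these notes that takes the number of points instead of a step; it is shown only as a possible alternative for non-integer spacing.

```python
import math
import numpy as np

start, stop, step = 0, 10, 2.5

expected_length = math.ceil((stop - start) / step)
float_values = np.arange(start, stop, step)
int_values = np.arange(start, stop, step, dtype=np.int32)

# Both outputs have the same number of elements
print(expected_length, len(float_values), len(int_values))  # 4 4 4

# linspace includes the end point and asks for a count, not a step
evenly_spaced = np.linspace(0, 7.5, num=4)
print(evenly_spaced)  # [0.  2.5 5.  7.5]
```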

### Random Generators

In [118]:
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

The **Generator** class takes a **bit generator** as an input and creates **generator** objects.


The resulting object is an instance of the numpy.random.Generator class.


PCG stands for Permuted Congruential Generator. The 64 indicates that its function pointers can produce values of up to 64 bits in size.

In [127]:
array_RG = gen(pcg())
# 1
print(array_RG.normal())
# 2
print(array_RG.normal(size = 5))
# 3
print(array_RG.normal(size = (5,5)))

1. It gives a single value from the standard normal distribution.
2. It gives a 1-D array of values from the standard normal distribution using the size argument.
3. It gives a 2-D array of values from the standard normal distribution using a tuple of dimensions in the size argument.

When no seed is passed, the bit generator picks a fresh "seed" each time it is created, so the values differ from run to run. A **seed** is a set of starting parameters for the algorithm.
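
Fixing the seed makes the output reproducible. The sketch below passes a `seed` to PCG64; the value 365 is arbitrary and chosen only for illustration.

```python
import numpy as np
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

# Two generators built from the same seed produce the same values
first = gen(pcg(seed=365)).normal(size=3)
second = gen(pcg(seed=365)).normal(size=3)
print(np.array_equal(first, second))  # True

# Without a seed, the starting state comes from the operating system,
# so these values change on every run
unseeded = gen(pcg()).normal(size=3)
print(unseeded)
```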