# NumPy

## Basic Introduction:
- NumPy stands for Numerical Python. It is a library whose core is written in C.
- It speeds up mathematical calculations by operating on multidimensional arrays.
- It introduces a new datatype called the *NumPy array* (`ndarray`), which is a powerful n-dimensional array.
- It is homogeneous, meaning every element has the same datatype. It is primarily used for storing numbers. Strings can also be stored (with a fixed-width datatype such as `<U3`), but this is rarely used.
- The size of an array is fixed when it is created. `np.append` exists, but it works differently from `list.append`: it returns a new array and does not change the original in place.
- Scalar values are of type `np.<datatype>`, for example `np.int64`.
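
The last three points can be checked with a short sketch. The variable names are only illustrative, and the `<U3` output assumes a little-endian machine.

```python
import numpy as np

arr = np.array([1, 2, 3])
appended = np.append(arr, 4)   # returns a new array; arr itself is untouched

print(arr.tolist())        # [1, 2, 3]
print(appended.tolist())   # [1, 2, 3, 4]

# Indexing a single element yields a NumPy scalar, not a plain Python int.
first = arr[0]
print(isinstance(first, np.integer))   # True

# Strings get a fixed-width Unicode dtype sized to the longest item.
words = np.array(["a", "bb", "ccc"])
print(words.dtype)   # <U3
```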

## Why use NumPy over a standard Python list?
- A Python integer is a full object of roughly 28 bytes; the exact size depends on the architecture and the value.
- A Python list stores pointers to PyObjects. The pointers are contiguous, but the values they point to are scattered across heap memory.
- In NumPy, numbers are stored as raw C values in one contiguous block. Hence, they consume less space.
- Other reasons include vectorization, faster arithmetic, built-in methods for matrix operations, etc.
- In Python, a variable only holds a reference; the actual values are stored on the heap.

In [8]:
import numpy as np
import sys

ndarray = np.array([1,2,3,4])
sys.getsizeof(ndarray[0]) # OP: 32 -> Not the size used inside the array: indexing returns a special np scalar, a wrapper around the raw value
# The wrapper is not stored for every element; it is a temporary scalar created only when an element is accessed
ndarray.itemsize # OP: 8 -> This gives the actual size of individual elements

standard = [1,2,3,4]
# itemsize will not work here because it exists only on NumPy arrays.
sys.getsizeof(standard[0]) # OP: 28 -> Size of pyobject

28
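
To compare whole containers rather than single elements, here is a rough sketch. It assumes 64-bit CPython and forces `int64` so the array size is predictable. The list total is only an estimate, since CPython may share small integer objects between containers.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# List: the pointer array itself plus one int object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Array: n raw 8-byte integers in one contiguous buffer.
array_bytes = np_arr.nbytes

print(array_bytes)                # 8000
print(list_bytes > array_bytes)   # True
```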

## Creating n-dimensional arrays
- Seven creation functions are covered below.
- You can create arrays of n dimensions, but it is a bit difficult for humans to visualize beyond 2D.
- In Python you can think of this as nested, nested and nested lists 😂
- Does that seem pointless? Take a case like collecting sales data for multiple weeks: stacking the weekly tables on top of each other gives a 3D array.
- Functions that fill in values for you (`np.zeros`, `np.ones`, `np.identity`, `np.linspace`) default to float because precision matters in `numerical python`. `np.array` and `np.full` infer the datatype from the values given, and `np.arange` with integer arguments returns integers.

In [37]:
# 1. np.array -> Converts a normal list to a NumPy array. Nested lists of any depth (3D, 4D, etc.) are converted too
sample = np.array([[1,2,3],[4,5,6]]) # Shape must be uniform: each row must have the same number of elements
sample2 = np.array([[[1,2],[3,4]]]) # 3D array

# 2. np.zeros -> Receives a single number or a tuple giving the size of each dimension. Each position is filled with 0
np.zeros((5)) # (5) is just the integer 5, not a tuple
np.zeros(5) # Same as above: creates a 1D array
np.zeros((2,3)) # 2D
np.zeros((1,2,3)) # 3D
# Think of the shape like this: the last value, e.g. 5 in (x,x,5), is the number of elements in each innermost array
# Each value before it says how many copies of the next level to place in the current level
# (2,2,3): three elements per innermost array, two such arrays per middle block, and two middle blocks at the top level
# Like recursion
np.zeros((2,2), dtype=int) # Get values of the specified type

# 3. np.ones -> Works same as zeros but fills with ones
np.ones((2,2))

# 4. np.full -> Apart from the shape, takes a fill value and prepopulates the whole array with it. The fill value is compulsory
np.full((2,2),1, dtype=float) # Follows the datatype of the fill value by default, or you can specify it separately with dtype

# 5. np.identity -> Creates an identity matrix: a square matrix where all diagonal elements are 1 and everything else is 0
np.identity(5) # Creates a 5x5 identity matrix (5 diagonal elements)

# 6. np.arange -> Same as Python's range function, but creates an array.
# The stop value is not included: np.arange(n) starts from 0 and ends at n-1
# Step is optional; it is the gap between consecutive values
np.arange(5)
np.arange(2,10) # Creates array from 2-9
np.arange(2,10,2) # OP: [2, 4, 6, 8]

# 7. np.linspace -> (start, stop, number_of_points). Includes both start and stop by default
# Creates that many equally spaced points across the range.
np.linspace(10,20,10) # 10 equally spaced points from 10 to 20 inclusive (the gap is 10/9, not 1)

# arr is a reference to the NumPy array; arr.copy() creates an independent copy
arr = np.array([1,2,3,4])
arr_copy = arr.copy()
arr_copy

array([1, 2, 3, 4])
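
A quick way to see which creation functions default to float and which follow their inputs is to inspect `dtype` directly. This is a small sketch; `dtype.kind` is used instead of the full name because the default integer width can differ between platforms.

```python
import numpy as np

print(np.zeros(3).dtype)               # float64
print(np.ones((2, 2)).dtype)           # float64
print(np.identity(2).dtype)            # float64
print(np.full((2, 2), 7).dtype.kind)   # i  (integer, inferred from the fill value)
print(np.arange(5).dtype.kind)         # i  (integer arguments give integers)

# linspace includes both endpoints and takes the number of points, not a step.
points = np.linspace(0, 1, 5)
print(points.tolist())                 # [0.0, 0.25, 0.5, 0.75, 1.0]

# The shape tuple reads from the outermost level to the innermost level.
print(np.zeros((2, 2, 3)).shape)       # (2, 2, 3)
```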

## Important Properties and Attributes of a NumPy Array
- *Consider arr1 to be a numpy array*
1. arr1.shape
2. arr1.ndim
3. arr1.size
4. arr1.itemsize
5. arr1.dtype

### Shape
- The shape attribute gives `how many elements are present at each nesting level`
- Returns a tuple. The length of the tuple is equal to the total number of axes.
- Each value says how many elements are present along that axis.
- Read it from left to right: outermost level to the most deeply nested level.

### Ndim
- Returns the number of dimensions of the array, i.e. the total number of axes (nesting levels) present.

### Size
- Gives the total number of items: the count of all actual values in the array, which equals the product of the shape.

### Item size
- The size in bytes of each individual element in memory, determined by the datatype.

### Dtype
- It returns the datatype of the array.

### astype()
- Method to convert from one datatype to another.
- `arr1.astype(float)` -> Converts the array datatype to float
- This does not modify arr1, but returns a new array with modified datatype.

In [50]:
k1 = [[[1,2],[3,4]],[[5,6],[7,8]]]
k2 = [[[1,2],[3,4]],[[5,6],[7]]]

# You can create a NumPy array from k1, but not from k2. NumPy requires a uniform shape at every level
# (recent NumPy versions raise a ValueError for ragged input like k2)
arr1 = np.array(k1)
arr1.shape # OP: (2, 2, 2) -> 2 elements at each level. axis 0: [[1,2],[3,4]] and [[5,6],[7,8]]. axis 1: [1,2] and [3,4]. axis 2: 1 and 2

arr1.ndim

arr1.size

arr1.itemsize

arr1.dtype

arr1.astype(float)

dtype('int64')
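
The attributes are easier to tell apart when printed side by side. This sketch uses a hypothetical 2x3x4 array and forces `int64` so that the byte sizes are the same on every platform.

```python
import numpy as np

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)

print(a.shape)      # (2, 3, 4)  -> elements per axis, outermost first
print(a.ndim)       # 3          -> len(a.shape)
print(a.size)       # 24         -> 2 * 3 * 4
print(a.itemsize)   # 8          -> bytes per int64 element
print(a.nbytes)     # 192        -> size * itemsize

# astype returns a new array; the original keeps its dtype.
b = a.astype(np.float32)
print(b.dtype, b.itemsize)   # float32 4
print(a.dtype)               # int64
```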

## Indexing, Iteration & Slicing

### Indexing & Slicing
- Indexing works the same way as with a standard Python list.
- Negative indexing and slicing are supported as well.
- Slicing a standard Python list creates a new list. Once the slice is stored in a variable and modified, the original list is not affected.
- Slicing a NumPy array creates a view: it shares memory with the original array, so modifying the slice also modifies the original.

#### arr1[2:4][1:3] vs arr1[2:4, 1:3]
- The first form is chained: `arr1[2:4]` is applied to the whole of arr1, and then `[1:3]` is applied to that output.
- Flow: arr1[2:4][1:3] -> ([2:4] applied to arr1) -> (result of arr1[2:4])[1:3] ([1:3] applied to the previous result, again along the first axis)
- The second form applies one slice per axis: in `[2:4, 1:3]`, `2:4` is applied to the topmost level (axis 0) and `1:3` to the second nested level (axis 1).
- Use the comma form to slice each axis (level) separately. Use chaining outside the brackets when you want to index the result of a previous operation.
- The comma form can also change the number of dimensions: if you give a single index instead of a range for an axis, that axis is dropped from the result.
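
The difference between the two forms, and between a list slice and an array slice, can be sketched as follows. The array `grid` is a hypothetical 6x4 example.

```python
import numpy as np

grid = np.arange(24).reshape(6, 4)

chained = grid[2:4][1:3]   # rows 2-3 first, then rows 1-2 of that result
comma = grid[2:4, 1:3]     # rows 2-3 and columns 1-2 in one step
print(chained.tolist())    # [[12, 13, 14, 15]]
print(comma.tolist())      # [[9, 10], [13, 14]]

# An integer index drops its axis; a slice keeps it.
print(grid[:, 2].shape)    # (6,)
print(grid[:, 2:3].shape)  # (6, 1)

# A list slice is an independent copy.
nums = [0, 1, 2, 3]
part = nums[1:3]
part[0] = 99
print(nums)                # [0, 1, 2, 3]

# An array slice is a view onto the same memory.
view = grid[0, :2]
view[0] = 99
print(int(grid[0, 0]))               # 99
print(np.shares_memory(view, grid))  # True
```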

### Iteration np.nditer(arr_reference)
- nditer lets you iterate over every element regardless of the dimension. Whether the array is 1D, 2D or 3D, you can iterate value by value instead of writing nested loops.
- It's a tool that lets you iterate over a NumPy array element-by-element, but in a memory-efficient and controllable way.
- Each item it yields is a zero-dimensional array tied to that memory location, not a plain Python number.
- `np.nditer(arr, order='C')` -> Row-major: goes row by row. The default is `order='K'`, which follows the memory layout and gives the same row-major order for a normally created array.
- `np.nditer(arr, order='F')` -> Column-major: goes column by column.
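
A small sketch of the two iteration orders on a hypothetical 2x3 array. The write-back loop at the end is an extra illustration of the "controllable" part: it assumes the `op_flags=['readwrite']` option, without which nditer items are read-only.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

row_major = [int(x) for x in np.nditer(m, order='C')]
col_major = [int(x) for x in np.nditer(m, order='F')]
print(row_major)   # [0, 1, 2, 3, 4, 5]
print(col_major)   # [0, 3, 1, 4, 2, 5]

# Modify elements in place through the iterator.
with np.nditer(m, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = x * 2
print(m.tolist())  # [[0, 2, 4], [6, 8, 10]]
```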

In [106]:
arr2 = np.arange(24).reshape(6,4) # 2-dimensional
arr2
arr2[2:4][1:3] # OP: [[12, 13, 14, 15]] -> 1. arr2[2:4] gives [[ 8,  9, 10, 11],[12, 13, 14, 15]] -> 2. [1:3] is applied to this result, giving the output

arr2[2:4, 1:3] # OP: [[ 9, 10],[13, 14]]

# Question: I want the entire 3rd column. [ 2,  6, 10, 14, 18, 22]
# First axis (top dimension): I want all rows. Second axis: only the 3rd column
arr2[:,2]

# Question: [[18,19],[22,23]]
arr2[4:,2:]

arr3 = arr2.reshape(3,2,4)

# Iteration
for i in np.nditer(arr3):
    print(i)
arr3

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23


array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])

## Numpy Operations

- NumPy arrays make it easier to perform mathematical operations on arrays, such as adding a scalar to all elements, or operations between arrays and matrices.

### Basic Operations
- Keep in mind that when you perform element-wise operations between arrays, they must have the same shape (or shapes that NumPy can broadcast, such as an array and a scalar). Otherwise the operation raises an error.
- Operations along the last axis are the fastest because they read contiguous memory with small jumps, while the other axes need larger jumps.
- For every axis other than the last (deepest) one, lay out the children of that axis and look down through them: the reduction combines the children element by element, and that axis disappears from the result.
- `sum` works along an axis in the same way.
1. arr1.mean()
2. arr1.std()
3. np.sin(arr1) -> Gives you the sine value
4. np.median(arr1)

In [115]:
arr1 = np.arange(1,6)
arr2 = np.arange(6,11)

arr1+arr2 # Returns an array adding corresponding elements
arr1*2 # Applies scalar operation to this array
arr1+2 
arr1>=3 # Returns a boolean array of the same shape, with True where the condition is satisfied and False elsewhere

arr1.dot(arr2) # Dot product of the two arrays

np.matmul(arr1, arr2) # Matrix multiplication; for two 1D arrays this gives the same result as the dot product

arr1.min() # Returns minimum in array
arr1.max() # Returns max in array

'''
[# axis-0 children
    [# axis-1 children
        [ 0,  1,  2,  3], # axis-2 children
        [ 4,  5,  6,  7]
    ],
    [
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]
    ],
    [
        [16, 17, 18, 19],
        [20, 21, 22, 23]
    ]
]
'''

arr3.min(axis=0)

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
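
To see how each axis collapses, the following sketch reduces a hypothetical 3x2x4 array along every axis and also shows what happens when shapes do not match. The names `cube` and `row` are illustrative.

```python
import numpy as np

cube = np.arange(24).reshape(3, 2, 4)

# The reduced axis disappears from the shape.
print(cube.min(axis=0).shape)   # (2, 4)
print(cube.min(axis=1).shape)   # (3, 4)
print(cube.min(axis=2).shape)   # (3, 2)

# sum follows the same rule: axis=2 adds up each innermost row.
print(cube.sum(axis=2).tolist())   # [[6, 22], [38, 54], [70, 86]]

# Mismatched shapes raise an error...
a = np.arange(1, 6)
try:
    a + np.arange(3)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
print(mismatch_failed)   # True

# ...but a compatible trailing shape is broadcast across the other axes.
row = np.array([10, 20, 30, 40])
print((cube + row).shape)   # (3, 2, 4)

# Dot product of two 1D arrays: 1*6 + 2*7 + 3*8 + 4*9 + 5*10
print(int(a.dot(np.arange(6, 11))))   # 130
```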

## Reshaping Numpy Arrays

1. Ravel
2. Reshape
3. Transpose
4. Stacking
5. Splitting

### Ravel
- This function flattens a higher-dimensional array into a 1D array.
- It returns a view whenever possible (in memory it is the same array), so modifying the result also updates the original array. When a view is not possible, for example with a non-contiguous array, it returns a copy.

### Reshape
- Reshape changes the shape of a NumPy array to any number of dimensions.
- Any shape works as long as the product of the new dimensions equals the total number of elements in the array.
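
A brief sketch of both functions on a hypothetical 24-element array. `np.shares_memory` is used to check whether a result is a view, and `-1` in a shape asks NumPy to work out that dimension itself.

```python
import numpy as np

a = np.arange(24).reshape(3, 2, 4)

# ravel on a contiguous array gives a view.
flat = a.ravel()
print(np.shares_memory(a, flat))          # True

# ravel on a non-contiguous slice has to copy.
flat_copy = a[:, :, ::2].ravel()
print(np.shares_memory(a, flat_copy))     # False

# Any shape whose product is 24 is accepted.
print(a.reshape(4, 6).shape)              # (4, 6)
print(a.reshape(2, -1).shape)             # (2, 12)

# A shape whose product is not 24 is rejected.
try:
    a.reshape(5, 5)
    bad_shape_failed = False
except ValueError:
    bad_shape_failed = True
print(bad_shape_failed)                   # True
```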

### Transpose
- This returns a view, not a copy, so modifying the transposed array also modifies the original. Call `.copy()` on the result if you need an independent array.
- As the name suggests, it gives the transpose of a matrix.
- Rows become columns and columns become rows.
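
Whether a transposed array shares memory with its source can be checked directly. This is a minimal sketch on a hypothetical 2x3 matrix.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
t = m.T                          # same as m.transpose()

print(t.shape)                   # (3, 2)
print(np.shares_memory(m, t))    # True

t[0, 1] = 99                     # t[0, 1] is the same element as m[1, 0]
print(int(m[1, 0]))              # 99

# An explicit copy is independent of the original.
independent = m.T.copy()
independent[0, 0] = -1
print(int(m[0, 0]))              # 0
```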

### Stacking
- It is a way to combine arrays together.
- You pass in a tuple of the arrays to combine. Their shapes must match along every axis except the one being joined.
- `np.hstack` joins arrays side by side and `np.vstack` places one on top of another.

### Split
- While stack is used to combine, split is used to split an array, as the name suggests.
- When you pass a number of sections, the length of the axis being split must be evenly divisible by that number.
- Again you have `vsplit` & `hsplit`.

## Fancy Indexing
- It is a method to extract desired rows from a numpy array.
- But why not slicing? Slicing is contiguous or must follow a fixed step pattern. You can't ask it to pick arbitrary rows.
- Fancy indexing allows exactly that behaviour.
- `arr[[r1,r2,r3]]` -> Pass in the indexes of desired rows.
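
A short sketch of fancy indexing on a hypothetical 6x4 table. Unlike a slice, the result is a copy, which the `np.shares_memory` check illustrates.

```python
import numpy as np

table = np.arange(24).reshape(6, 4)

# Rows can be picked in any order.
picked = table[[5, 0, 3]]
print(picked.tolist())   # [[20, 21, 22, 23], [0, 1, 2, 3], [12, 13, 14, 15]]

# The result does not share memory with the original.
print(np.shares_memory(table, picked))   # False
picked[0, 0] = -1
print(int(table[5, 0]))                  # 20

# The same idea works for columns when combined with a full slice on axis 0.
print(table[:, [0, 3]].shape)            # (6, 2)
```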

In [148]:
arr3.ndim 
arr4 = arr3.ravel() # Converts to 1D array

arr4[0] = 0 # Modifies the original array as well

arr3.reshape(1,2,3,4,1,1,1) # The number of dimensions doesn't matter as long as the product of the shape equals the total number of elements

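k = np.array([arr4]).transpose() # np.array([...]) copies the data; transpose() itself returns a view of that copy
k[0] = 100 # Does not reach arr3, because of the copy made by np.array above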

# np.hstack(()) # Horizontal stack -> Joins arrays side by side, left to right
# np.vstack(()) # Vertical stack -> Stacks one array on top of another

arr3
# np.hsplit(arr, no. of splits)
np.hsplit(arr3, 2)
np.vsplit(arr3, 3)

arr3[[0,2]]

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])

## Indexing With Boolean Arrays (Filtering)

- When you apply a comparison operator to an array, like `arr > 60`, NumPy generates a boolean array by applying the operation to every value: True where the value satisfies the condition and False everywhere else.
- Now suppose you don't want the boolean array; you just want the values satisfying the condition. How will you do that?
- You overlay the boolean array on the original array and extract the values at the True positions.
- Isn't this intuitive?
- The boolean array works as a mask. Wherever the boolean value is True, that value is returned, and the result is a 1D array. NumPy runs a C loop to do it.

In [167]:
arr_prac = np.arange(50,100)
arr_prac = arr_prac.reshape(10,5)
arr_prac = np.vsplit(arr_prac,2)[0]
arr_prac > 60 # Gives the boolean array
arr_prac[arr_prac>65] # Gives all the values satisfying that condition
arr_prac[(arr_prac>65) & (arr_prac%2==0)] # Combine conditions with & and wrap each one in parentheses

array([66, 68, 70, 72, 74])
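
The mask can be stored in a variable and reused, both for reading and for assignment. The sketch below uses a hypothetical `scores` array with the same values as above.

```python
import numpy as np

scores = np.arange(50, 75).reshape(5, 5)

mask = scores > 65
print(mask.shape, mask.dtype)   # (5, 5) bool

# Filtering always returns a 1D array of the matching values.
print(scores[mask].tolist())    # [66, 67, 68, 69, 70, 71, 72, 73, 74]

# Use & (and), | (or), ~ (not) with parentheses; Python's `and`/`or` do not work on arrays.
even_high = scores[(scores > 65) & (scores % 2 == 0)]
print(even_high.tolist())       # [66, 68, 70, 72, 74]

# A mask on the left-hand side assigns to the matching positions.
capped = scores.copy()
capped[capped > 70] = 70
print(int(capped.max()))        # 70
```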