### Deep vs. Shallow Copy
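- **Assignment** (`b = a`): Makes no copy at all. Both names refer to the same object, so changes through one name are visible through the other.
- **Shallow Copy**: Creates a new outer object but shares references to any nested objects. Changes to nested mutable items show up in both.
- **Deep Copy**: Creates a new object and recursively copies everything it contains, so it is fully independent of the original.

In [40]:
# Assignment (alias): no copy is made; both names refer to the same list
original = [1,2,3]
print("Original: ", original)
alias = original
original[0] = 'change'
print("1. alias: ",alias)
print(original is alias)

# Shallow copy: list.copy() builds a new outer list (use copy.deepcopy() for nested data)
original = [1,2,3]
print("Original: ", original)
shallow_copy = original.copy()
original[0] = 'change'
print("1. shallow copy: ",shallow_copy)
print(original is shallow_copy)

Original:  [1, 2, 3]
1. alias:  ['change', 2, 3]
True
Original:  [1, 2, 3]
1. shallow copy:  [1, 2, 3]
False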
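
The difference between a shallow and a deep copy only becomes visible with nested data. The following is a minimal sketch using the standard-library `copy` module; the variable names are chosen for illustration only.

```python
import copy

nested = [[1, 2], [3, 4]]

alias = nested                   # same object, no copy at all
shallow = copy.copy(nested)      # new outer list, inner lists still shared
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][0] = 'change'          # mutate an inner list
nested.append([5, 6])            # mutate the outer list

print(alias)    # [['change', 2], [3, 4], [5, 6]]
print(shallow)  # [['change', 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

The shallow copy misses the appended element (its outer list is separate) but still sees the change inside the shared inner list.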


# NumPy Notes

### Overview
- **NumPy**: Numerical Python (NumPy) is a powerful library for numerical computing in Python, extensively used for scientific computations.
- **Built on C**: NumPy's core is implemented primarily in C (with some C++) for efficient computation.
- **Key Features**:
  1. An array object of arbitrary homogeneous items.
  2. Fast mathematical operations over arrays.
  3. Linear Algebra, Fourier Transforms, Random Number Generation.

### Benefits of NumPy
- **Scientific Calculations**: NumPy is widely used for tasks requiring scientific computation.
- **No Zero Division Error**: Dividing a plain Python number by zero raises `ZeroDivisionError`. Dividing a NumPy array by zero instead returns `inf`, `-inf`, or `nan` and emits a `RuntimeWarning`.
- **Efficient Memory Allocation**: Homogeneous arrays ensure efficient memory storage and faster access compared to Python lists.
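
As a rough illustration of the division-by-zero point above, the sketch below divides a small float array by zero. `np.errstate` is used only to silence the warning inside the block; the variable names are made up for this example.

```python
import numpy as np

values = np.array([1.0, 0.0, -1.0])

# Silence the RuntimeWarning only inside this block
with np.errstate(divide="ignore", invalid="ignore"):
    result = values / 0.0

print(result)  # elements are inf, nan, -inf

# Plain Python floats raise instead of returning inf/nan
try:
    1.0 / 0.0
except ZeroDivisionError as exc:
    python_error = type(exc).__name__
print(python_error)  # ZeroDivisionError
```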

### Arrays vs. Lists
- **Array**: Homogeneous, faster, and more memory-efficient. Stores elements in a contiguous block of memory.
- **List**: Heterogeneous. The list itself stores references, and the referenced objects can live anywhere in memory.
  
  | Feature          | NumPy Array                 | Python List               |
  |------------------|-----------------------------|---------------------------|
  | Data Type        | Homogeneous (same type)      | Heterogeneous (any type)   |
  | Speed            | Faster (due to C backend)    | Slower                    |
  | Memory Efficiency| More efficient               | Less efficient            |

- **Matrix vs. Array**: `np.matrix` is a subclass of `ndarray` that is always 2D. The NumPy documentation no longer recommends it; regular arrays with the `@` operator cover the same linear algebra operations.

### Dimensions in NumPy
- **1D (Vector)**: A single-dimension array.
- **2D (Matrix)**: A two-dimensional array (rows and columns).
- **3D and higher (Tensor)**: Arrays with three or more dimensions, used for higher-dimensional data.

### Linear Algebra in NumPy
- **Matrix Operations**: NumPy has built-in functions for matrix multiplication, transposition, and solving linear equations.
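
A small sketch of the three operations named above, using a made-up 2x2 system. `np.linalg.solve` solves `A @ x = b`.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A              # matrix multiplication
transposed = A.T             # transpose
x = np.linalg.solve(A, b)    # solves A @ x = b

print(product)
print(transposed)
print(x)                     # approximately [0.8 1.4]
```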

### Fourier Transforms and Random Number Generation
- **Fourier Transform**: NumPy includes fast Fourier transform routines to analyze frequency components of signals.
- **Random Number Generation**: The `np.random` module is used for generating random numbers from various distributions.
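
To make the Fourier transform bullet concrete, here is a minimal sketch that recovers the frequency of a sine wave with `np.fft.rfft`. The sample rate and the 5 Hz signal are assumptions made for the example.

```python
import numpy as np

sample_rate = 64                              # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate      # one second of time stamps
signal = np.sin(2 * np.pi * 5 * t)            # 5 Hz sine wave

spectrum = np.fft.rfft(signal)                # FFT for real-valued input
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The bin with the largest magnitude is the dominant frequency
dominant = freqs[np.argmax(np.abs(spectrum))]
print(dominant)  # 5.0
```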

### Utility Functions
- **np.info()**: Prints help information for a NumPy function, class, or module.
- **np.lookfor()**: Searches NumPy docstrings for a keyword (available in NumPy 1.x, as used here; removed in NumPy 2.0).
  
### In-place vs. Copy Operations
- **In-place Operations**: Modify the original array without creating a copy (e.g., `x.sort()`).
- **Copy Operations**: Return a new array with the result (e.g., `np.sort(x)`).
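
The two sort calls mentioned above behave as follows; this is a small sketch with an arbitrary array.

```python
import numpy as np

x = np.array([3, 1, 2])

sorted_copy = np.sort(x)     # returns a new sorted array; x is unchanged
snapshot = x.tolist()        # record x before the in-place sort
print(x, sorted_copy)        # [3 1 2] [1 2 3]

returned = x.sort()          # sorts x in place and returns None
print(x, returned)           # [1 2 3] None
```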



In [341]:
import numpy as np
import time

# Generate two random NumPy arrays with one million elements each
array1 = np.random.rand(1000000)
array2 = np.random.rand(1000000)

# Measure time for the vectorized version
start_time_vectorized = time.time()
result_vectorized = np.dot(array1, array2)
print("Time taken for vectorized calculation: " + str(1000 * (time.time() - start_time_vectorized)) + " milliseconds")

# Measure time for the version using a for loop
result_loop = 0
start_time_for_loop = time.time()
for index in range(1000000):
    result_loop += array1[index] * array2[index]
print("Time taken for calculation using a for loop: " + str(1000 * (time.time() - start_time_for_loop)) + " milliseconds")

Time taken for vectorized calculation: 2.5854110717773438 milliseconds
Time taken for calculation using a for loop: 540.1439666748047 milliseconds


In [52]:
import numpy as np
print(np.__version__)
# print(np.__doc__)

1.26.4


## np.array

In [88]:
l = [1,2,3]
t = (1,2,3)
d = {'a': 1,  'b': 2}
d1 = {'a': [1,2],  'b': [2,3]}
s = {1,2,3,4}
st = 'sads'

# Convert list to numpy array
'''
np.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

* (Keyword-Only Arguments Separator):
Description: The * is a syntactical feature in Python that indicates all following parameters are keyword-only, meaning they must be passed by name 
rather than position.
      '''

# Convert each to a NumPy array
array_l = np.array(l)
array_t = np.array(t)
array_d = np.array(list(d.items()))  # Key-value pairs as rows; the mixed str/int values are coerced to strings
# array_d1 = np.array(list(d1.items()))  #ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 2) + inhomogeneous part.

array_s = np.array(s)           # A set is not a sequence: this gives a 0-d object array wrapping the set
array_st = np.array(st)         # A string becomes a 0-d string array, not an array of characters

# Print the resulting arrays
print("Array from list:", array_l)
print("Array from tuple:", array_t)
print("Array from dictionary items:", array_d)
print("Array from set:", array_s)
print("Array from string:", array_st)

Array from list: [1 2 3]
Array from tuple: [1 2 3]
Array from dictionary items: [['a' '1']
 ['b' '2']]
Array from set: {1, 2, 3, 4}
Array from string: sads


In [169]:
l1 = [[1,2], ['b', 1.0]]
l1 = np.array(l1)
for i in l1:
    print("Rows: ",i.dtype)
    for j in i:
        print("Row- Column: dtype ",j.dtype)
        print("type: ",type(j))
# NumPy arrays require all elements to share one data type, so NumPy picks a common type that can hold every element of l1.
# The list mixes integers, a string, and a float, so every element is promoted to a string (<U32).
# <U32 is a Unicode string dtype with a maximum length of 32 characters.

Rows:  <U32
Row- Column: dtype  <U1
type:  <class 'numpy.str_'>
Row- Column: dtype  <U1
type:  <class 'numpy.str_'>
Rows:  <U32
Row- Column: dtype  <U1
type:  <class 'numpy.str_'>
Row- Column: dtype  <U3
type:  <class 'numpy.str_'>

In [122]:
'''
np.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)

dtype = int, float, object, complex, bool, etc.
copy = False: lets NumPy create the array without duplicating the data if it is already in the correct form
(NumPy 1.x behaviour; in NumPy 2.x, copy=False raises an error if a copy cannot be avoided).

order =
The order parameter controls the memory layout of the array, that is, how the elements are stored in
memory and how they are accessed. The choice of layout can affect performance, especially in numerical
computations and when interfacing with other libraries. Possible values:
'C': Row-major (C-style) order. This is the layout NumPy uses for newly created arrays. It is useful when you need to interact with C libraries or when row-wise operations are more frequent.
'F': Column-major (Fortran-style) order. This layout is preferred when working with Fortran libraries or when column-wise operations are more frequent.
'A': Fortran-style order if the input is Fortran contiguous, C-style otherwise. Use when you want to preserve the memory layout of the original array, especially when converting or copying arrays.
'K': Match the layout of the input as closely as possible (the default for np.array). This is useful when the array's memory layout must be preserved, which can matter in performance-critical applications.

subok = True: subclasses of ndarray are passed through, so the returned array keeps the subclass type. Useful when working with custom subclasses of ndarray that have additional methods or properties, since these are preserved.
ndmin = Specifies the minimum number of dimensions that the resulting array should have.
like = Reference object that lets array creation dispatch through the __array_function__ protocol,
so the result is created as an array compatible with the `like` object.
'''
print(f"dtype: {np.array(l, dtype = float, order = 'A')}")
arr = np.array([[1, 2], [3, 4]], order='C', ndmin = 5)
print(arr)

class MyArray(np.ndarray):
    def custom_method(self):
        return "Custom method"

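# Note: calling the class directly treats [1, 2, 3] as a *shape*, so this creates an
# uninitialized array of shape (1, 2, 3); use np.array([1, 2, 3]).view(MyArray) to wrap data.
arr = MyArray([1, 2, 3])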
arr_subok = np.array(arr, subok=True)
print(type(arr_subok))  # Output: <class '__main__.MyArray'>
print(arr_subok.custom_method())  # Output: Custom method


dtype: [1. 2. 3.]
[[[[[1 2]
    [3 4]]]]]
<class '__main__.MyArray'>
Custom method


## np.matrix
- `np.matrix(data, dtype=None, copy=True)`
- Always 2D: a 1D input becomes a single row.

In [129]:
print(np.matrix(l))

[[1 2 3]]


## Shape, Size, Dimension

In [178]:
# Shape
print(f"""
Array: {arr}
Shape: {arr.shape}
Size: {arr.size}
Dimensions: {arr.ndim}""")



Array: [[[1. 0. 2.]
  [0. 3. 0.]]]
Shape: (1, 2, 3)
Size: 6
Dimensions: 3
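
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of its entries. A quick sketch on an array with a known shape (the shape is arbitrary):

```python
import numpy as np

m = np.zeros((2, 3, 4))

print(m.shape)   # (2, 3, 4)
print(m.ndim)    # 3, the number of axes, equal to len(m.shape)
print(m.size)    # 24, the total element count, equal to 2 * 3 * 4
```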


### np.array vs. np.asarray vs. np.asanyarray

- **`np.array`:** Copies the input by default (`copy=True`) and returns a new `ndarray`. With the default `subok=False`, an `ndarray` subclass is converted to a base `ndarray`, losing any specialized behavior. Useful when you need a fresh array and want to specify the dtype or other properties.

- **`np.asarray`:** Converts the input to an array but does not copy if the input is already an `ndarray` with a matching dtype and order. Subclasses are converted to a base `ndarray`. Useful for efficiency when the input may already be an array.

- **`np.asanyarray`:** Like `np.asarray`, but passes `ndarray` subclasses through unchanged. Useful when custom or specialized array types must keep their extended functionality.

### **Summary**

`np.asanyarray` is valuable when you have custom or specialized array types derived from `ndarray` and you want to ensure that operations or conversions do not strip away the subclass-specific features. It helps maintain the integrity and functionality of custom array types throughout your code.
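
The differences can be checked directly. In this sketch `TaggedArray` is a hypothetical subclass defined only for the illustration.

```python
import numpy as np

class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass used only for this illustration."""

base = np.array([1, 2, 3])
tagged = base.view(TaggedArray)   # wrap existing data in the subclass

copied = np.array(base)           # copy=True by default: new data buffer
same = np.asarray(base)           # already an ndarray: returned as-is
print(np.shares_memory(base, copied))  # False
print(same is base)                    # True

print(type(np.array(tagged)).__name__)       # ndarray
print(type(np.asarray(tagged)).__name__)     # ndarray
print(type(np.asanyarray(tagged)).__name__)  # TaggedArray
```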

## `np.fromfunction(function, shape, *, dtype=float, like=None, **kwargs)`
- Construct an array by executing a function over each coordinate.
- The resulting array therefore has a value fn(x, y, z) at coordinate (x, y, z).
- The function is called once with arrays of indices (one per axis), not once per element. Extra `**kwargs` are passed on to the function.

## `np.fromiter(iter, dtype, count=-1, *, like=None)`
- Create a new 1-dimensional array from an iterable object.
- `count` specifies the number of items to read from the iterable. The default of -1 reads all of it.

## `np.fromstring(string, dtype=float, count=-1, *, sep, like=None)`
- A new 1-D array initialized from text data in a string.

In [189]:
np.fromfunction(lambda i,j: i*j, shape = (2,3) )

array([[0., 0., 0.],
       [0., 1., 2.]])

In [197]:
np.fromfunction(lambda i,j,k: i*j*k, shape = (2,3,4) )   # i, j, k are index arrays, one per axis

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 1., 2., 3.],
        [0., 2., 4., 6.]]])

In [212]:
# Define a function that takes additional keyword arguments
def my_function(x, y, factor=1):
    return (x + y) * factor

# Pass factor to my_function through np.fromfunction's **kwargs
shape = (3, 3)
result = np.fromfunction(my_function, shape, dtype=int, factor=2)

print(result)

[[0 2 4]
 [2 4 6]
 [4 6 8]]


In [220]:
print([i for i in range(5)])  # List comprehension
iterable = (i for i in range(5)) # Generator expression: a one-shot iterator
print(iterable)  
print(next(iterable), next(iterable))

[0, 1, 2, 3, 4]
<generator object <genexpr> at 0x00000275D87AFD30>
0 1


In [222]:
np.fromiter(iterable,dtype = int)  # 0 and 1 were already consumed by the next() calls above

array([2, 3, 4])

In [241]:
# Create a NumPy array from a space-separated string
np.array('a b c d'.split(), dtype=object)

array(['a', 'b', 'c', 'd'], dtype=object)

In [230]:
np.fromstring('23 24 36', sep = " ", dtype = int ) # Parses numbers from text; non-numeric text is not allowed

array([23, 24, 36])

## `range` in Python vs. `arange` in NumPy
- `np.arange([start,] stop[, step,], dtype=None, *, like=None)`: Return evenly spaced values within a given interval.


In [250]:
list(range(0,2.1,.1))  # range() accepts only integers, so this raises TypeError

TypeError: 'float' object cannot be interpreted as an integer

In [246]:
np.arange(0,2.1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ])
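
With a float step, the number of elements `np.arange` returns depends on floating-point rounding, which is why the NumPy documentation suggests `np.linspace` for non-integer spacing. A short sketch of the two styles; the values are chosen for illustration.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step)

# linspace: you choose the number of points; the stop value is included by default
by_count = np.linspace(0, 2, 21)
print(len(by_count), by_count[0], by_count[-1])   # 21 0.0 2.0
```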

## `linspace` and `logspace`
np.linspace(
    start,
    stop,
    num=50,
    endpoint=True,
    retstep=False,
    dtype=None,
    axis=0,
)

retstep : bool, optional

    If True, return (`samples`, `step`), where `step` is the spacing
    between samples.


np.logspace(
    start,
    stop,
    num=50,
    endpoint=True,
    base=10.0,
    dtype=None,
    axis=0,
)

Return numbers spaced evenly on a log scale.

In linear space, the sequence starts at ``base ** start``
(`base` to the power of `start`) and ends with ``base ** stop``.

In [258]:
np.linspace(0,10,4,retstep=True)

(array([ 0.        ,  3.33333333,  6.66666667, 10.        ]),
 3.3333333333333335)

In [270]:
np.logspace(2,4, base = 3, num = 3 )

array([ 9., 27., 81.])

## zeros, zeros_like, ones, ones_like, full, full_like, empty, empty_like

In [272]:
np.zeros((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [276]:
np.zeros_like(np.asarray([1,2]))

array([0, 0])

In [278]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [284]:
np.full((3,3), 'a')  # np.full(shape, fill_value, dtype=None, order='C', *, like=None)

array([['a', 'a', 'a'],
       ['a', 'a', 'a'],
       ['a', 'a', 'a']], dtype='<U1')

In [287]:
np.empty((2,2)) # Return a new array of given shape and type, without initializing entries (values are whatever was already in memory).

array([[ 0.        ,  3.33333333],
       [ 6.66666667, 10.        ]])
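
The `*_like` variants take an existing array as a template and reuse its shape and dtype. A brief sketch, with a made-up `template` array:

```python
import numpy as np

template = np.array([[1, 2, 3],
                     [4, 5, 6]])                   # shape (2, 3), integer dtype

ones = np.ones_like(template)                      # same shape and dtype, filled with 1
sevens = np.full_like(template, 7)                 # same shape and dtype, filled with 7
halves = np.full_like(template, 0.5, dtype=float)  # override dtype so 0.5 is not truncated
scratch = np.empty_like(template)                  # same shape and dtype, contents arbitrary

print(ones.shape, ones.dtype == template.dtype)
print(sevens)
print(halves)
print(scratch.shape)
```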

## `np.eye(N, M=None, k=0, dtype=float, order='C', *, like=None)`
- Return a 2-D array with ones on the diagonal and zeros elsewhere.
- N : int
  Number of rows in the output.
- M : int, optional
  Number of columns in the output. If None, defaults to `N`.
- k : int, optional
  Index of the diagonal: 0 (the default) refers to the main diagonal,
  a positive value refers to an upper diagonal, and
  a negative value to a lower diagonal.

In [304]:
arr = np.eye(5,6,-2)
arr

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]])

In [306]:
import pandas as pd
pd.DataFrame(arr)

     0    1    2    3    4    5
0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  0.0  0.0  0.0  0.0  0.0
3  0.0  1.0  0.0  0.0  0.0  0.0
4  0.0  0.0  1.0  0.0  0.0  0.0


# Random

## Python's built-in `random` module
- choice
- randrange
- random
- shuffle
- uniform

In [309]:
import random
random.choice([1,2,3])

2

In [311]:
random.randrange(2,10,2)  # return will be int

8

In [313]:
random.random()  # float in the range [0.0, 1.0)

0.9502123736745469

In [326]:
l = [1,2,[3]]
random.shuffle(l)
l

[[3], 1, 2]

In [328]:
random.uniform(2,45)  # returns a float between the two bounds

8.200408296469128

## np.random
- `rand`: Random floats from a uniform distribution over [0.0, 1.0); the shape is given as separate arguments.
- `randn`: Random floats from a standard normal distribution (mean 0, standard deviation 1).
- `random_sample`: Random floats in the range [0.0, 1.0); the shape is given as a single `size` argument.
- `randint`: Random integers from `low` (inclusive) to `high` (exclusive).

In [334]:
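print(np.random.rand(2,3,4)) # Uniform on [0, 1), shape (2, 3, 4)
print(np.random.randn(2,3)) # Standard normal distribution, shape (2, 3)
print(np.random.random_sample()) # Single float in [0.0, 1.0)
print(np.random.randint(1,5, size = (3,4))) # Integers from 1 to 4, shape (3, 4)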

[[[0.29864068 0.86160962 0.9058072  0.76858325]
  [0.26123164 0.9384556  0.93864246 0.74504455]
  [0.91073504 0.23722471 0.49496735 0.80987834]]

 [[0.95456578 0.63748325 0.91084975 0.69213675]
  [0.04294299 0.8335869  0.36994852 0.936557  ]
  [0.48305288 0.12533161 0.96445418 0.01702583]]]
[[-0.84529039  0.41510056  0.42553059]
 [-0.98072436 -0.42782611  1.49856996]]
0.13889344396681091
[[2 3 1 4]
 [4 1 4 3]
 [4 3 4 1]]
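
The functions above belong to NumPy's legacy random interface. For new code the NumPy documentation recommends the `Generator` API created with `np.random.default_rng`. The sketch below shows rough equivalents; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)         # seed value chosen arbitrarily

uniform = rng.random((2, 3))                 # floats in [0.0, 1.0)
normal = rng.standard_normal((2, 3))         # mean 0, standard deviation 1
integers = rng.integers(1, 5, size=(3, 4))   # low inclusive, high exclusive

# Re-creating the generator with the same seed reproduces the same numbers
repeat = np.random.default_rng(seed=42).random((2, 3))
print(np.array_equal(uniform, repeat))       # True
```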


In [349]:
arr = np.random.randint(10,100, size= (3,4))
arr

array([[13, 65, 97, 22],
       [38, 18, 45, 63],
       [56, 49, 13, 82]])

In [351]:
arr[arr>36] # Give all elements where condition is True

array([65, 97, 38, 45, 63, 56, 49, 82])

In [353]:
arr

array([[13, 65, 97, 22],
       [38, 18, 45, 63],
       [56, 49, 13, 82]])

In [355]:
arr[0]

array([13, 65, 97, 22])

In [357]:
arr[2][1]

49

In [361]:
arr[0,[1,3]]  # Row 0, columns 1 and 3

array([65, 22])

In [363]:
arr[2:4,[2,3]]  # Rows 2 to 3 (only row 2 exists in this 3-row array), columns 2 and 3

array([[13, 82]])

In [68]:
arr[2:4]  # Rows 2 and 3 (this output was produced with a different, larger arr than the 3x4 array above)

array([[ 925970486,  828585574,  876032557,  909389154,  959263073],
       [ 758659376, 1664705894, 1714763060, 1633903971,          0]])

In [72]:
arr[2:4][0,2]  # Slice rows 2-3 first, then take row 0, column 2 of that result

876032557

In [74]:
arr[2:4,[0,2]]  # Rows 2-3, columns 0 and 2

array([[ 925970486,  876032557],
       [ 758659376, 1714763060]])

In [76]:
arr[2:4,0:2] # Slicing

array([[ 925970486,  828585574],
       [ 758659376, 1664705894]])

In [84]:
arr[2,3]

909389154

In [368]:
arr[1:3] #Slicing

array([[38, 18, 45, 63],
       [56, 49, 13, 82]])
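
The indexing styles used above differ in one important way: basic slices return views of the original data, while boolean masks and list (fancy) indexing return copies. A minimal sketch on a small array with predictable values; `grid` and the other names are made up for the example.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]

masked = grid[grid > 6]        # boolean mask, always returns a 1-D array
picked = grid[0, [1, 3]]       # row 0, columns 1 and 3
block = grid[1:3, 0:2]         # rows 1-2, columns 0-1
print(masked, picked, block.tolist())

# Basic slices are views: writing through the slice changes the original
view = grid[1:3]
view[0, 0] = -1
print(grid[1, 0])              # -1

# Indexing with a list (fancy indexing) returns a copy: the original is unaffected
columns = grid[:, [0, 2]]
columns[0, 0] = 99
print(grid[0, 0])              # 0
```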

In [378]:
a =  np.array([[1,2,3]])
b = np.random.randint(10,100, size = (3,3))
print(a, b)

[[1 2 3]] [[83 50 26]
 [37 90 16]
 [26 65 26]]


In [382]:
# Element-wise multiplication (a is broadcast across the rows of b)
a * b

array([[ 83, 100,  78],
       [ 37, 180,  48],
       [ 26, 130,  78]])

In [384]:
# Matrix multiplication (dot product of a with each column of b)
a @ b

array([[235, 425, 136]])
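
The three products are easy to mix up. The sketch below contrasts them on two small vectors chosen for illustration: `*` multiplies element by element, `@` computes the dot (matrix) product, and `np.cross` computes the vector cross product.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

elementwise = u * v       # [ 4 10 18]
dot = u @ v               # 1*4 + 2*5 + 3*6 = 32 (same as np.dot(u, v))
cross = np.cross(u, v)    # [-3  6 -3]

print(elementwise, dot, cross)
```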

In [386]:
# Broadcasting
# Operations are applied element by element; shapes that differ are stretched to match
# a + b
# a - b
# a / 5

c = np.array([[1,2,3]])
t = c.T
print(c,t)

[[1 2 3]] [[1]
 [2]
 [3]]


In [388]:
t

array([[1],
       [2],
       [3]])

In [390]:
c

array([[1, 2, 3]])

In [392]:
a+ c

array([[2, 4, 6]])

In [394]:
a + t

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
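
The `a + t` result above is broadcasting at work: a (1, 3) array and a (3, 1) array are both stretched to (3, 3) before the addition. A small sketch that makes the stretching explicit with `np.broadcast_to`; the names are chosen for the example.

```python
import numpy as np

row = np.array([[1, 2, 3]])      # shape (1, 3)
col = row.T                      # shape (3, 1)

# Shapes are compared right to left; a dimension of size 1 is stretched to match
total = row + col
print(total.shape)               # (3, 3)

# np.broadcast_to shows the stretched operands explicitly
row_full = np.broadcast_to(row, (3, 3))
col_full = np.broadcast_to(col, (3, 3))
print(np.array_equal(total, row_full + col_full))   # True

# Incompatible shapes raise an error instead of broadcasting
try:
    np.ones((2, 3)) + np.ones((3, 2))
except ValueError as exc:
    print("ValueError:", exc)
```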

In [396]:
np.exp(a)

array([[ 2.71828183,  7.3890561 , 20.08553692]])

In [398]:
np.log(a)

array([[0.        , 0.69314718, 1.09861229]])

In [400]:
np.min(a)

1

In [402]:
np.max(a)

3

In [404]:
a.flatten() # Return a 1D copy of the array

array([1, 2, 3])

In [406]:
# axis=0: collapse the rows, giving one result per column
# axis=1: collapse the columns, giving one result per row

In [408]:
np.sum(a)

6

In [412]:
np.sum(b, axis=0)

array([146, 205,  68])

In [414]:
np.sum(b, axis = 1)

array([159, 143, 117])

In [416]:
np.expand_dims(a, axis = 0) # Insert a new axis of length 1 at position 0

array([[[1, 2, 3]]])

In [418]:
np.expand_dims(a, axis = 0).shape

(1, 1, 3)
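
The `axis` argument decides where the new length-1 dimension goes. A short sketch on a 1-D vector, also showing the equivalent `np.newaxis` indexing and the inverse operation `np.squeeze`:

```python
import numpy as np

v = np.array([1, 2, 3])                 # shape (3,)

as_row = np.expand_dims(v, axis=0)      # shape (1, 3)
as_col = np.expand_dims(v, axis=1)      # shape (3, 1)
same_col = v[:, np.newaxis]             # indexing with np.newaxis does the same

print(as_row.shape, as_col.shape, same_col.shape)  # (1, 3) (3, 1) (3, 1)
print(np.squeeze(as_col).shape)                    # (3,)
```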