# **NumPy**

Many of the operations below will look familiar if you have used NumPy before. It is still worth working through this section, because these operations form the foundation of the statistical analysis done during EDA.

## **Arrays & Methods**

In [None]:
# Importing NumPy
import numpy as np

In [None]:
# 1D Array
array_1d = np.array(range(1,17))
array_1d

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])

In [None]:
# 2D Array
array_2d = array_1d.reshape(4,4)
array_2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [None]:
# 3D Array
array_3d = array_1d.reshape(2,4,2)
array_3d

array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12],
        [13, 14],
        [15, 16]]])

In [None]:
# Memory buffer (a memoryview object; its repr shows the buffer address)
print(f"Memory Address: {array_1d.data}")

# Shape of the data
print(f"Shape: {array_1d.shape}")

# Data type
print(f"Data Type: {array_1d.dtype}")

# Strides of the array (bytes to step along each dimension)
print(f"Strides: {array_1d.strides}")

# Item size (bytes per element)
print(f"Item Size: {array_1d.itemsize}")


Memory Address: <memory at 0x7eb11835b280>
Shape: (16,)
Data Type: int64
Strides: (8,)
Item Size: 8


In [None]:
# Shape of the 2D array
print(f"Shape: {array_2d.shape}")

# Strides of the array (bytes to step along each dimension)
print(f"Strides: {array_2d.strides}")

# Item size (bytes per element)
print(f"Item Size: {array_2d.itemsize}")

Shape: (4, 4)
Strides: (32, 8)
Item Size: 8


The tuple of strides gives the number of bytes to step in memory to reach the next element along each dimension of the array. The order of the values in the tuple matches the order of the array's dimensions.

* $32$: This is $\text{Item Size} \times \text{Number of Columns}$ ($8 \times 4$), that is, the number of ***bytes*** to skip to move from one row to the next. Each row holds 4 elements of 8 bytes each.

* $8$: This is the number of bytes to move from one column to the next within the same row, which equals the item size.
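The stride arithmetic above can be checked directly. The following is a minimal sketch; the variable names (`arr`, `wide`, `row_stride`, `col_stride`) are illustrative, and the dtype is fixed to `int64` so that the item size is 8 bytes regardless of platform.

```python
import numpy as np

# Fix the dtype so the item size is 8 bytes on every platform.
arr = np.arange(1, 17, dtype=np.int64).reshape(4, 4)

n_rows, n_cols = arr.shape
row_stride = arr.itemsize * n_cols   # bytes to skip one full row
col_stride = arr.itemsize            # bytes to step to the next column

print(arr.strides)                   # (32, 8)
print((row_stride, col_stride))      # (32, 8)

# A non-square shape shows that the row stride depends on the number of columns.
wide = np.arange(12, dtype=np.int64).reshape(2, 6)
print(wide.strides)                  # (48, 8)

# Transposing swaps the strides without moving any data.
print(arr.T.strides)                 # (8, 32)
```

The non-square case is the useful one: with 2 rows and 6 columns the row stride is 48 bytes, which only the column count can explain.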

## **Array from Built-In NumPy Functions**

In [None]:
# Array of ones
shape = (2, 2)
print(f"Ones: {np.ones(shape)}\n")

# Array of zeros
print(f"Zeros: {np.zeros(shape)}\n")

# Array of random values
print(f"Random Values: {np.random.random(shape)}\n")

# An empty (uninitialized) array: its contents are whatever was already in memory
print(f"Empty: {np.empty(shape)}\n")

# An array full of Ks (called a Full Array)
k = 5
print(f"Full: {np.full(shape, k)}\n")

# Aranged Array
start, stop, step = 5, 25, 3
print(f"Arrange: {np.arange(start, stop, step)}\n")

# Linspace
lower_bound, upper_bound, n = 10, 20, 20
print(f"Linspace: {np.linspace(lower_bound, upper_bound, n)}")

Ones: [[1. 1.]
 [1. 1.]]

Zeros: [[0. 0.]
 [0. 0.]]

Random Values: [[0.74688629 0.62653973]
 [0.38560424 0.67357972]]

Empty: [[0.74688629 0.62653973]
 [0.38560424 0.67357972]]

Full: [[5 5]
 [5 5]]

Arrange: [ 5  8 11 14 17 20 23]

Linspace: [10.         10.52631579 11.05263158 11.57894737 12.10526316 12.63157895
 13.15789474 13.68421053 14.21052632 14.73684211 15.26315789 15.78947368
 16.31578947 16.84210526 17.36842105 17.89473684 18.42105263 18.94736842
 19.47368421 20.        ]


## **I/O Operations**

In [None]:
# Saving data to a txt file
data = np.arange(10, 50, 6)
np.savetxt("data.txt", data, delimiter=",")

print(f"Data: {data}")

Data: [10 16 22 28 34 40 46]


In [None]:
# Loading data from the text file
np.loadtxt("data.txt", delimiter=",", unpack=True)

array([10., 16., 22., 28., 34., 40., 46.])

In [None]:
# Using "genfromtxt": skip the first line and fill any missing values with -999
np.genfromtxt("data.txt", skip_header=1, filling_values=-999)

array([16., 22., 28., 34., 40., 46.])
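Since `data.txt` has no missing entries, `filling_values` has no visible effect in the cell above. The sketch below shows both options on made-up in-memory text; the sample contents and the use of `io.StringIO` in place of a file are assumptions for illustration only.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV text with one empty field in the second row.
raw = StringIO("1,2,3\n4,,6\n7,8,9")

# The empty field is treated as missing and replaced by -999.
filled = np.genfromtxt(raw, delimiter=",", filling_values=-999)
print(filled)

# skip_header=1 drops the first line, here a hypothetical header row.
with_header = StringIO("x,y,z\n1,2,3")
values = np.genfromtxt(with_header, delimiter=",", skip_header=1)
print(values)
```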

## **Inspecting Arrays**

In [None]:
print(f"N Dims     : {array_3d.ndim}")
print(f"Shape      : {array_3d.shape}")
print(f"Size       : {array_3d.size}")
print(f"Item Size  : {array_3d.itemsize}")
print(f"Bytes Consumed : {array_3d.nbytes}")
print(f"Memory layout  : \n{array_3d.flags}")

N Dims     : 3
Shape      : (2, 4, 2)
Size       : 16
Item Size  : 8
Bytes Consumed : 128
Memory layout  : 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False



1. **C_CONTIGUOUS**: This attribute indicates whether the array data is in C-style contiguous order. In C-style order, the last axis changes the fastest (row-major order). If `C_CONTIGUOUS` is True, it means the array is stored in a way that is efficient for C-style indexing and operations.

2. **F_CONTIGUOUS**: This attribute indicates whether the array data is in Fortran-style contiguous order. In Fortran-style order, the first axis changes the fastest (column-major order). If `F_CONTIGUOUS` is True, it means the array is stored in a way that is efficient for Fortran-style indexing and operations.

3. **OWNDATA**: This attribute is True if the array owns the memory it uses. If `OWNDATA` is False, the array borrows its memory from another object, typically because it is a view of another array (here, `array_3d` is a reshaped view of `array_1d`). A copy, by contrast, owns its own data.

4. **WRITEABLE**: This attribute indicates whether the array data can be modified. If `WRITEABLE` is True, you can modify the data in the array. If False, the array is read-only.

5. **ALIGNED**: This attribute is True if the data and all of its elements are aligned appropriately for the hardware. Alignment can affect the performance of certain operations on the array.

6. **WRITEBACKIFCOPY**: This attribute is True if this array is a temporary copy of another array whose contents should be written back to that base array when the copy is resolved. It is mostly relevant to NumPy's C API; for ordinary arrays it is False.
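A few of these flags can be observed directly. This is a minimal sketch with illustrative names (`base`, `view`, `copy`, `matrix`), not part of the original notebook.

```python
import numpy as np

base = np.arange(16, dtype=np.int64)   # freshly allocated, owns its buffer
view = base.reshape(2, 4, 2)           # reshaping a contiguous array returns a view
copy = view.copy()                     # an explicit copy gets its own buffer

print(base.flags.owndata)   # True
print(view.flags.owndata)   # False
print(copy.flags.owndata)   # True
print(view.base is base)    # True

# Transposing a C-contiguous 2D array yields an F-contiguous view.
matrix = base.reshape(4, 4)
print(matrix.flags.c_contiguous, matrix.flags.f_contiguous)      # True False
print(matrix.T.flags.c_contiguous, matrix.T.flags.f_contiguous)  # False True

# Clearing WRITEABLE makes the array read-only.
copy.flags.writeable = False
try:
    copy[0, 0, 0] = 99
except ValueError as err:
    print(f"Read-only: {err}")
```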

## **Broadcasting**

Broadcasting is the mechanism that lets NumPy apply element-wise operations to arrays of different shapes. The smaller array is virtually stretched to match the shape of the larger one, without actually copying its data.
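As a rough illustration of how shapes are matched, the sketch below uses small hand-picked arrays rather than the random ones in this notebook. It assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)   # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])                   # shape (3,)

# Shapes are compared from the trailing axis: (2, 3) vs (3,) -> (2, 3).
print(np.broadcast_shapes(a.shape, b.shape))   # (2, 3)

# broadcast_to exposes the stretched operand as a read-only view.
stretched = np.broadcast_to(b, a.shape)
print(stretched.strides)    # (0, 8): a zero stride reuses the same row

print(a + b)
# [[10. 21. 32.]
#  [13. 24. 35.]]

# Incompatible trailing dimensions raise a ValueError.
try:
    a + np.ones(2)
except ValueError as err:
    print(f"Cannot broadcast: {err}")
```

The zero stride along the first axis shows that the smaller array is reused for every row instead of being physically replicated.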

In [None]:
a = np.random.random((2,2))
b = np.random.random(2)

print(
    f"A: {a}\n\n",
    f"B: {b}"
)

A: [[0.6722502  0.20366863]
 [0.71815131 0.83092016]]

 B: [0.03538458 0.86494721]


In [None]:
a + b

array([[0.70763478, 1.06861584],
       [0.75353588, 1.69586737]])

In [None]:
a - b

array([[ 0.63686563, -0.66127859],
       [ 0.68276673, -0.03402705]])

In [None]:
a * b

array([[0.02378729, 0.17616261],
       [0.02541148, 0.71870207]])

In [None]:
# Matrix-vector product (not an element-wise operation)
a @ b

array([0.1999499 , 0.74411355])

In [None]:
a / b

array([[18.99839638,  0.23546943],
       [20.29560303,  0.96065996]])

In [None]:
a // b

array([[18.,  0.],
       [20.,  0.]])

In [None]:
a % b

array([[0.03532783, 0.20366863],
       [0.01045979, 0.83092016]])

## **NumPy Slicing & Subsets**

In [None]:
array_3d

array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12],
        [13, 14],
        [15, 16]]])

In [None]:
array_3d[1]

array([[ 9, 10],
       [11, 12],
       [13, 14],
       [15, 16]])

In [None]:
array_3d[0, :, 0]

array([1, 3, 5, 7])

In [None]:
array_3d[array_3d >= 5]

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])

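Basic slicing uses the same `start:stop:step` syntax as Python lists, extended with one comma-separated index per axis. Two differences matter: basic slices return views rather than copies, and boolean masks such as `array_3d >= 5` select elements by condition and return a flattened copy.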
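The difference between views and copies is easy to miss, so here is a minimal sketch of it. The array is rebuilt under the illustrative name `arr` so that the notebook's `array_3d` is left untouched.

```python
import numpy as np

arr = np.arange(1, 17).reshape(2, 4, 2)

# One comma-separated index per axis, instead of chained list indexing.
print(arr[0, :, 0])          # [1 3 5 7]

# Basic slices are views: writing through them changes the original.
first_block = arr[0]
first_block[0, 0] = 100
print(arr[0, 0, 0])          # 100

# Boolean masks return a flattened copy: the original stays untouched.
large = arr[arr >= 15]
large[:] = -1
print(arr[1, 3])             # [15 16]

# By contrast, slicing a list copies it.
numbers = [1, 2, 3]
part = numbers[:2]
part[0] = 100
print(numbers)               # [1, 2, 3]
```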