<a href="https://www.kaggle.com/code/nilotpalmaitra/introduction-to-numpy?scriptVersionId=195681054" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
import datetime
print(f"Last updated: {datetime.datetime.now()}")

In [1]:
import numpy as np

# Check the version
print(np.__version__)

1.26.4


## 1. DataTypes and attributes

> **Note:** The main type in NumPy is `ndarray`. Even seemingly different kinds of arrays are still `ndarray`s, so an operation that works on one array will generally work on another.

In [5]:
# 1-dimensional array, also referred to as a vector
a1 = np.array([1, 2, 3])

# 2-dimensional array, also referred to as matrix
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# 3-dimensional array (here, a stack of two 3x3 matrices)
a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               [[10, 11, 12],
                [13, 14, 15],
                [16, 17, 18]]])

In [3]:
a1.shape, a1.ndim, a1.dtype, a1.size, type(a1)


((3,), 1, dtype('int64'), 3, numpy.ndarray)

In [4]:
a2.shape, a2.ndim, a2.dtype, a2.size, type(a2)

((2, 3), 2, dtype('float64'), 6, numpy.ndarray)

In [6]:
a3.shape, a3.ndim, a3.dtype, a3.size, type(a3)

((2, 3, 3), 3, dtype('int64'), 18, numpy.ndarray)

In [None]:
a1

In [None]:
a2

In [None]:
a3

### pandas DataFrame out of NumPy arrays

This example shows how NumPy serves as the backbone of many other libraries.

In [7]:
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(5, 3)),
                  columns=['a', 'b', 'c'])
df

   a  b  c
0  9  7  5
1  6  3  4
2  8  2  5
3  7  8  5
4  0  9  0


In [None]:
a2

In [8]:
df2 = pd.DataFrame(a2)
df2

     0    1    2
0  1.0  2.0  3.3
1  4.0  5.0  6.5


## 2. Creating arrays

* `np.array()`
* `np.ones()`
* `np.zeros()`
* `np.random.rand(5, 3)`
* `np.random.randint(10, size=5)`
* `np.random.seed()` - reproducible pseudo-random numbers
* Searching the documentation example (finding `np.unique()` and using it)
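
The last bullet refers to looking up a function in the documentation and trying it out. As a minimal sketch, here is `np.unique()` applied to a small hand-written array (the array `values` is an illustrative assumption, not data from this notebook):

```python
import numpy as np

# Hypothetical example array containing repeated values
values = np.array([3, 1, 2, 3, 1, 1])

# np.unique() returns the sorted unique elements;
# return_counts=True also returns how often each one occurs
unique_values, counts = np.unique(values, return_counts=True)

print(unique_values)  # [1 2 3]
print(counts)         # [3 1 2]
```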

In [None]:
# Create a simple array
simple_array = np.array([1, 2, 3])
simple_array

In [None]:
simple_array = np.array((1, 2, 3))
simple_array, simple_array.dtype

In [9]:
# Create an array of ones
ones = np.ones((10, 2))
ones

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [None]:
# The default datatype is 'float64'
ones.dtype

In [None]:
# You can change the datatype with .astype()
ones.astype(int)

In [None]:
# Create an array of zeros
zeros = np.zeros((5, 3, 3))
zeros

In [None]:
zeros.dtype

In [None]:
# Create an array within a range of values
range_array = np.arange(0, 10, 2)
range_array

In [None]:
# Random array
random_array = np.random.randint(10, size=(5, 3))
random_array

In [None]:
# Random array of floats (between 0 & 1)
np.random.random((5, 3))

In [None]:
np.random.random((5, 3))

In [None]:
# Random 5x3 array of floats (between 0 & 1), similar to above
np.random.rand(5, 3)

In [None]:
np.random.rand(5, 3)

In [None]:
# Set random seed to 0
np.random.seed(0)

# Make 'random' numbers
np.random.randint(10, size=(5, 3))

In [None]:
# Make more random numbers
np.random.randint(10, size=(5, 3))

In [None]:
# Set random seed to same number as above
np.random.seed(0)

# The same random numbers come out
np.random.randint(10, size=(5, 3))

In [None]:
np.random.seed(0)
df = pd.DataFrame(np.random.randint(10, size=(5, 3)))
df

In [None]:
# Your code here

In [None]:
a1

In [None]:
a2

In [None]:
a3

Array shapes are listed from the outermost axis to the innermost. For a 2-dimensional array this is `(row, column)`; each extra dimension adds another entry to the shape, for example `(2, 3, 3)` for `a3`.
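
To make the link between shape and indexing concrete, the sketch below rebuilds arrays with the same values as `a2` and `a3` (using `np.arange(...).reshape(...)` for `a3` is a shortcut assumed here, not how the notebook builds it) and indexes one position per axis:

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
# Same values as a3 earlier in the notebook, built with a shortcut
a3 = np.arange(1, 19).reshape(2, 3, 3)

print(a2.shape)     # (2, 3): 2 rows, 3 columns
print(a2[1, 2])     # 6.5: row index 1, column index 2

print(a3.shape)     # (2, 3, 3)
print(a3[0].shape)  # (3, 3): indexing the first axis leaves a 2-D array
print(a3[1, 0, 2])  # 12: second 2-D array, first row, third value
```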

In [None]:
a1[0]

In [None]:
a2[0]

In [None]:
a3[0]

In [None]:
# Get 2nd row (index 1) of a2
a2[1]

In [None]:
# Get the first 2 values of the first 2 rows of both 2-D arrays in a3
a3[:2, :2, :2]

In [None]:
a4 = np.random.randint(10, size=(2, 3, 4, 5))
a4

In [None]:
a4.shape

In [None]:
# Get only the first 4 numbers of each single vector
a4[:, :, :, :4]

### Arithmetic

In [None]:
a1

In [11]:
ones = np.ones(3)
ones

array([1., 1., 1.])

In [12]:
# Add two arrays
a1 + ones

array([2., 3., 4.])

In [None]:
# Subtract two arrays
a1 - ones

In [None]:
# Multiply two arrays
a1 * ones

In [None]:
# Multiply arrays of different shapes (a1 is broadcast across each row of a2)
a1 * a2

In [None]:
a1.shape, a2.shape

In [None]:
# This will error because shapes (2, 3) and (2, 3, 3) cannot be broadcast together (2 vs. 3 on the second-to-last axis)
a2 * a3

In [None]:
a3

In [None]:
a1

In [None]:
a1.shape

In [None]:
a2.shape

In [None]:
a2

In [None]:
a1 + a2

In [None]:
a2 + 2

In [None]:
# Raises an error because there's a shape mismatch (2, 3) vs. (2, 3, 3)
a2 + a3

In [None]:
# Divide two arrays
a1 / ones

In [None]:
# Divide using floor division
a2 // a1

In [None]:
# Take an array to a power
a1 ** 2

In [None]:
# You can also use np.square()
np.square(a1)

In [None]:
# Modulus divide (what's the remainder)
a1 % 2

You can also find the log or exponential of an array using `np.log()` and `np.exp()`.

In [None]:
# Find the log of an array
np.log(a1)

In [None]:
# Find the exponential of an array
np.exp(a1)

### Aggregation

Aggregation means reducing many values to a single summary value, such as a sum, mean, minimum or maximum.

In [None]:
sum(a1)

In [None]:
np.sum(a1)

**Tip:** Use NumPy's `np.sum()` on NumPy arrays and Python's `sum()` on Python `list`s.

In [13]:
massive_array = np.random.random(100000)
massive_array.size, type(massive_array)

(100000, numpy.ndarray)

In [14]:
%timeit sum(massive_array) # Python sum()
%timeit np.sum(massive_array) # NumPy np.sum()


21 ms ± 695 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
38.9 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [None]:
import random
massive_list = [random.randint(0, 10) for i in range(100000)]
len(massive_list), type(massive_list)

In [None]:
massive_list[:10]

In [None]:
%timeit sum(massive_list)
%timeit np.sum(massive_list)

In [None]:
a2

In [None]:
# Find the mean
np.mean(a2)

In [None]:
# Find the max
np.max(a2)

In [None]:
# Find the min
np.min(a2)

In [None]:
# Find the standard deviation
np.std(a2)

In [None]:
# Find the variance
np.var(a2)

In [None]:
# The standard deviation is the square root of the variance
np.sqrt(np.var(a2))

**What's mean?**

Mean is the same as average. You can find the average of a set of numbers by adding them up and dividing the total by how many there are.

**What's standard deviation?**

[Standard deviation](https://www.mathsisfun.com/data/standard-deviation.html) is a measure of how spread out numbers are.

**What's variance?**

The [variance](https://www.mathsisfun.com/data/standard-deviation.html) is the average of the squared differences from the mean.

To work it out, you:
1. Work out the mean
2. For each number, subtract the mean and square the result
3. Find the average of the squared differences
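
The three steps above can be sketched by hand and compared against `np.var()`. The array `numbers` is an illustrative assumption (it uses the same values as `low_var_array` in the demo that follows). Note that `np.var()` divides by the number of elements by default (`ddof=0`), which matches step 3.

```python
import numpy as np

# Illustrative data (same values as low_var_array)
numbers = np.array([2, 4, 6, 8, 10])

# 1. Work out the mean
mean = numbers.sum() / numbers.size

# 2. For each number, subtract the mean and square the result
squared_diffs = (numbers - mean) ** 2

# 3. Find the average of the squared differences
variance = squared_diffs.sum() / numbers.size

# The standard deviation is the square root of the variance
std = np.sqrt(variance)

print(mean, variance)                         # 6.0 8.0
print(np.isclose(variance, np.var(numbers)))  # True
print(np.isclose(std, np.std(numbers)))       # True
```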

In [None]:
# Demo of variance
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

np.var(high_var_array), np.var(low_var_array)

In [None]:
np.std(high_var_array), np.std(low_var_array)

In [None]:
# The standard deviation is the square root of the variance
np.sqrt(np.var(high_var_array))

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(high_var_array)
plt.show()

In [None]:
plt.hist(low_var_array)
plt.show()

### Reshaping

In [None]:
a2

In [None]:
a2.shape

In [None]:
# Fails: shapes (2, 3) and (2, 3, 3) cannot be broadcast together
a2 + a3

In [None]:
a2.reshape(2, 3, 1)

In [None]:
# Reshaping a2 to (2, 3, 1) lets it broadcast against a3's shape (2, 3, 3)
a2.reshape(2, 3, 1) + a3

### Transpose

A transpose reverses the order of the axes.

For example, an array with shape `(2, 3)` becomes `(3, 2)`.

In [None]:
a2.shape

In [None]:
a2.T

In [None]:
a2.transpose()

In [None]:
a2.T.shape

For arrays with more than two dimensions, a transpose by default reverses the order of all axes. For a 3-dimensional array this is the same as swapping the first and last axes.

For example, `(5, 3, 3)` -> `(3, 3, 5)`.
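
For 3-dimensional arrays, reversing all axes and swapping the first and last axes give the same result, but from four dimensions upward they differ. A small sketch with placeholder arrays (`x3` and `x4` are hypothetical names; only the shapes matter here):

```python
import numpy as np

x3 = np.zeros((5, 3, 3))
x4 = np.zeros((2, 3, 4, 5))

print(x3.T.shape)                # (3, 3, 5)
print(x4.T.shape)                # (5, 4, 3, 2): all axes reversed
print(x4.swapaxes(0, -1).shape)  # (5, 3, 4, 2): only first and last swapped

# np.transpose() also accepts an explicit axis order
print(np.transpose(x4, (0, 2, 1, 3)).shape)  # (2, 4, 3, 5)
```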

In [15]:
matrix = np.random.random(size=(5, 3, 3))
matrix

array([[[0.39035196, 0.31340464, 0.78055179],
        [0.81545165, 0.32758028, 0.7633072 ],
        [0.90417541, 0.48013234, 0.52709232]],

       [[0.38018834, 0.46332855, 0.80634384],
        [0.38165262, 0.67643762, 0.83753438],
        [0.97947719, 0.743634  , 0.06795351]],

       [[0.30456841, 0.30327055, 0.6876546 ],
        [0.57447301, 0.81233542, 0.17446427],
        [0.36476569, 0.57407963, 0.30111454]],

       [[0.35276496, 0.71134284, 0.87595599],
        [0.99296862, 0.74447408, 0.97123103],
        [0.34024125, 0.51779866, 0.35770978]],

       [[0.16750907, 0.8765897 , 0.20292387],
        [0.48079306, 0.92647824, 0.56538612],
        [0.72520204, 0.43742545, 0.49381532]]])

In [16]:
matrix.shape

(5, 3, 3)

In [None]:
matrix.shape

In [17]:
matrix.T

array([[[0.39035196, 0.38018834, 0.30456841, 0.35276496, 0.16750907],
        [0.81545165, 0.38165262, 0.57447301, 0.99296862, 0.48079306],
        [0.90417541, 0.97947719, 0.36476569, 0.34024125, 0.72520204]],

       [[0.31340464, 0.46332855, 0.30327055, 0.71134284, 0.8765897 ],
        [0.32758028, 0.67643762, 0.81233542, 0.74447408, 0.92647824],
        [0.48013234, 0.743634  , 0.57407963, 0.51779866, 0.43742545]],

       [[0.78055179, 0.80634384, 0.6876546 , 0.87595599, 0.20292387],
        [0.7633072 , 0.83753438, 0.17446427, 0.97123103, 0.56538612],
        [0.52709232, 0.06795351, 0.30111454, 0.35770978, 0.49381532]]])

In [18]:
matrix.T.shape

(3, 3, 5)

In [None]:
# Check to see if the reverse shape is same as transpose shape
matrix.T.shape == matrix.shape[::-1]

In [None]:
# Check to see if the first and last axes are swapped
matrix.T == matrix.swapaxes(0, -1) # swap first (0) and last (-1) axes

You can see more advanced forms of transposing in the NumPy documentation under [`numpy.transpose`](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html).

In [19]:
np.random.seed(0)
mat1 = np.random.randint(10, size=(3, 3))
mat2 = np.random.randint(10, size=(3, 2))

mat1.shape, mat2.shape

((3, 3), (3, 2))

In [20]:
mat1

array([[5, 0, 3],
       [3, 7, 9],
       [3, 5, 2]])

In [21]:
mat2

array([[4, 7],
       [6, 8],
       [8, 1]])

In [22]:
np.dot(mat1, mat2)

array([[ 44,  38],
       [126,  86],
       [ 58,  63]])

In [23]:
# Can also achieve np.dot() with "@"
# (however, they may behave differently at 3D+ arrays)
mat1 @ mat2

array([[ 44,  38],
       [126,  86],
       [ 58,  63]])
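
As a rough illustration of the 3D+ difference mentioned in the comment above, the sketch below compares output shapes for two placeholder arrays (`a` and `b` are assumed names, filled with ones). `@` treats the leading axis as a batch of matrices, whereas `np.dot()` combines the last axis of `a` with the second-to-last axis of `b` for every pair of leading indices.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

matmul_result = a @ b        # batched matrix multiplication
dot_result = np.dot(a, b)    # sum over a's last and b's second-to-last axis

print(matmul_result.shape)   # (2, 3, 5)
print(dot_result.shape)      # (2, 3, 2, 5)
```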

In [None]:
np.random.seed(0)
mat3 = np.random.randint(10, size=(4,3))
mat4 = np.random.randint(10, size=(4,3))
mat3

In [None]:
mat4

In [None]:
# This will fail as the inner dimensions of the matrices do not match
np.dot(mat3, mat4)

In [None]:
mat3.T.shape

In [None]:
# Dot product
np.dot(mat3.T, mat4)

In [None]:
# Element-wise multiplication, also known as Hadamard product
mat3 * mat4

### Dot product practical example, nut butter sales

In [25]:
np.random.seed(0)
sales_amounts = np.random.randint(20, size=(5, 3))
sales_amounts

array([[12, 15,  0],
       [ 3,  3,  7],
       [ 9, 19, 18],
       [ 4,  6, 12],
       [ 1,  6,  7]])

In [26]:
weekly_sales = pd.DataFrame(sales_amounts,
                            index=["Mon", "Tues", "Wed", "Thurs", "Fri"],
                            columns=["Almond butter", "Peanut butter", "Cashew butter"])
weekly_sales

       Almond butter  Peanut butter  Cashew butter
Mon               12             15              0
Tues               3              3              7
Wed                9             19             18
Thurs              4              6             12
Fri                1              6              7


In [27]:
prices = np.array([10, 8, 12])
prices

array([10,  8, 12])

In [28]:
butter_prices = pd.DataFrame(prices.reshape(1, 3),
                             index=["Price"],
                             columns=["Almond butter", "Peanut butter", "Cashew butter"])
butter_prices.shape

(1, 3)

In [None]:
weekly_sales.shape

In [29]:
# Find the total amount of sales for a whole day
total_sales = prices.dot(sales_amounts)
total_sales

ValueError: shapes (3,) and (5,3) not aligned: 3 (dim 0) != 5 (dim 0)

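The shapes aren't aligned. For a dot product, the inner dimensions must match: the last axis of the first array has to be the same length as the first axis of the second, as in `(1, 3)` and `(3, 5)`, where the middle two numbers are equal.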
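
A self-contained sketch of the alignment rule, with the sales figures and prices typed out from the outputs shown earlier. The names `same_total` and `manual` are hypothetical and only used to cross-check the dot product against an element-wise calculation.

```python
import numpy as np

prices = np.array([10, 8, 12])             # shape (3,)
sales_amounts = np.array([[12, 15,  0],
                          [ 3,  3,  7],
                          [ 9, 19, 18],
                          [ 4,  6, 12],
                          [ 1,  6,  7]])   # shape (5, 3)

# (3,) dot (3, 5) -> (5,): the inner dimensions (3 and 3) now match
total_sales = prices.dot(sales_amounts.T)

# Equivalent alternative: (5, 3) dot (3,) -> (5,)
same_total = sales_amounts.dot(prices)

# Cross-check: multiply element-wise, then sum each row
manual = (sales_amounts * prices).sum(axis=1)

print(total_sales)  # [240 138 458 232 142]
```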

In [None]:
prices

In [None]:
sales_amounts.T.shape

In [None]:
# To make the middle numbers the same, we can transpose
total_sales = prices.dot(sales_amounts.T)
total_sales

In [30]:
butter_prices.shape, weekly_sales.shape

((1, 3), (5, 3))

In [None]:
daily_sales = butter_prices.dot(weekly_sales.T)
daily_sales

In [None]:
# Need to transpose again
weekly_sales["Total"] = daily_sales.T
weekly_sales

### Comparison operators

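Comparison operators check, element by element, whether the values in one array are larger than, smaller than or equal to those in another (or a single number). The result is an array of booleans.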
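
A minimal sketch of what these comparisons return, redefining `a1` and `a2` with the same values as earlier so the block runs on its own (the names `greater` and `greater_equal` are illustrative):

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# a1 is broadcast across both rows of a2, so the result has a2's shape
greater = a1 > a2
greater_equal = a1 >= a2

print(greater.tolist())        # [[False, False, False], [False, False, False]]
print(greater_equal.tolist())  # [[True, True, False], [False, False, False]]
print((a1 > 5).tolist())       # [False, False, False]
```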

In [None]:
a1

In [None]:
a2

In [None]:
a1 > a2

In [None]:
a1 >= a2

In [None]:
a1 > 5

In [None]:
a1 == a1

In [None]:
a1 == a2

## 5. Sorting arrays

* [`np.sort()`](https://numpy.org/doc/stable/reference/generated/numpy.sort.html) - sort values in a specified dimension of an array.
* [`np.argsort()`](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) - return the indices to sort the array on a given axis.
* [`np.argmax()`](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) - return the index/indices which give the highest value(s) along an axis.
* [`np.argmin()`](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html) - return the index/indices which give the lowest value(s) along an axis.
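
Because `random_array` changes on each run, here is a small sketch of the four functions on a fixed array (`scores` is a hypothetical example, not data from this notebook), so the results can be checked by eye:

```python
import numpy as np

scores = np.array([[7, 2, 9],
                   [4, 8, 1]])

sorted_rows = np.sort(scores)         # sorts along the last axis by default
sort_order = np.argsort(scores)       # indices that would sort each row
row_max = np.argmax(scores, axis=1)   # index of the max within each row
col_min = np.argmin(scores, axis=0)   # index of the min within each column
flat_max = np.argmax(scores)          # no axis: index into the flattened array

print(sorted_rows.tolist())  # [[2, 7, 9], [1, 4, 8]]
print(sort_order.tolist())   # [[1, 0, 2], [2, 0, 1]]
print(row_max.tolist())      # [2, 1]
print(col_min.tolist())      # [1, 0, 1]
print(flat_max)              # 2
```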

In [None]:
random_array

In [None]:
np.sort(random_array)

In [None]:
np.argsort(random_array)

In [None]:
a1

In [None]:
# Return the indices that would sort an array
np.argsort(a1)

In [None]:
# No axis
np.argmin(a1)

In [None]:
random_array

In [None]:
# Index of the max value within each row (axis=1 works across the columns)
np.argmax(random_array, axis=1)

In [None]:
# Index of the min value within each column (axis=0 works down the rows)
np.argmin(random_array, axis=0)

In [34]:
from matplotlib.image import imread

img = imread('/kaggle/input/trialimages/unknown.png')
img

array([[[0.60784316, 0.5882353 , 0.5137255 , 1.        ],
        [0.6745098 , 0.64705884, 0.58431375, 1.        ],
        [0.67058825, 0.6313726 , 0.58431375, 1.        ],
        ...,
        [0.5294118 , 0.54509807, 0.5882353 , 1.        ],
        [0.49803922, 0.5058824 , 0.5529412 , 1.        ],
        [0.4745098 , 0.48235294, 0.5294118 , 1.        ]],

       [[0.6313726 , 0.6117647 , 0.5372549 , 1.        ],
        [0.68235296, 0.654902  , 0.5921569 , 1.        ],
        [0.6666667 , 0.627451  , 0.5803922 , 1.        ],
        ...,
        [0.5372549 , 0.5529412 , 0.59607846, 1.        ],
        [0.5137255 , 0.52156866, 0.5686275 , 1.        ],
        [0.49803922, 0.5058824 , 0.5529412 , 1.        ]],

       [[0.6156863 , 0.59607846, 0.52156866, 1.        ],
        [0.68235296, 0.654902  , 0.5921569 , 1.        ],
        [0.6666667 , 0.627451  , 0.5803922 , 1.        ],
        ...,
        [0.53333336, 0.54901963, 0.58431375, 1.        ],
        [0.5176471 , 0.529411

In [None]:
img.shape

<img src="https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/images/numpy-car-photo.png?raw=1" alt="photo of a car"/>

In [None]:
img2 = imread("/kaggle/input/trialimages/Screenshot 2023-07-12 205757.png")
img2.shape

In [None]:
img2[:,:,:3].shape

<img src="https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/images/numpy-dog-photo.png?raw=1" alt="photo a dog"/>