<details>
<summary>Meta Data</summary>

title: Numpy Crash Course
author: Juma Shafara
date: "2024-01"
date-modified: "2025-12-27"
description: This crash course will teach you the basics and intermediate concepts of the Numpy Library
keywords: [numpy, data types, array mathematics, aggregate functions, Subsetting, Slicing, Indexing]

</details>


![Photo by DATAIDEA](../../assets/banner4.png)

## Objective

In this lesson, you will learn what you need to get started with NumPy, namely:

<ul class="cursored-list">
<li><a href="#what-is-numpy"><i class="bi bi-cursor"></i> What is Numpy</a></li>
<li><a href="#inspecting-our-arrays"><i class="bi bi-cursor"></i> Inspecting Numpy arrays</a></li>
<li><a href="#array-mathematics"><i class="bi bi-cursor"></i> Performing array mathematics</a></li>
<li><a href="#subsetting-slicing-and-indexing"><i class="bi bi-cursor"></i> Subsetting, Slicing and Indexing arrays</a></li>
<li><a href="#array-manipulation"><i class="bi bi-cursor"></i> Array manipulation</a></li>
</ul>


## What is Numpy
- NumPy is a Python package for scientific computing.
- NumPy provides arrays, which are more powerful and faster alternatives to traditional Python lists.
- An array is a group of elements of the same data type: a standard NumPy array requires all of its elements to share one data type.
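
A quick way to see the same-type rule in action is to mix types when creating an array. In this minimal sketch (the variable names are only illustrative), NumPy does not store mixed types. Instead it converts every element to one common type.

```python
import numpy as np

# mixing ints and a float: every element is upcast to float64
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)          # float64

# mixing numbers and a string: every element becomes a string
mixed_with_text = np.array([1, 2, '3'])
print(mixed_with_text)              # ['1' '2' '3']
print(mixed_with_text.dtype.kind)   # U (Unicode string)
```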

## Why NumPy?
NumPy is the foundation of, or tightly integrated with, many Python data libraries, such as:

- Pandas
- SciPy
- Scikit-learn
- TensorFlow / PyTorch

It is fast because:

- It uses C under the hood
- It avoids Python loops using vectorization
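
Vectorization means writing one expression for the whole array instead of looping over its elements in Python. This sketch does not measure speed. It only shows that both approaches give the same result, while the NumPy version contains no Python-level loop.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# plain Python: an explicit loop over every element
squares_loop = [v ** 2 for v in values]

# NumPy: one vectorized expression; the looping happens inside compiled code
squares_vectorized = np.array(values) ** 2

print(squares_loop)        # [1, 4, 9, 16, 25]
print(squares_vectorized)  # [ 1  4  9 16 25]
```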

In [48]:
# Python list
py_list = [1, 2, 3]
py_list * 2   # duplicates list

[1, 2, 3, 1, 2, 3]

In [49]:
# NumPy array (numpy must be imported first)
import numpy as np

np_arr = np.array([1, 2, 3])
np_arr * 2    # element-wise multiplication

array([2, 4, 6])

In [50]:
## Uncomment and run this cell to install numpy
# !pip install numpy

## Inspecting our arrays

To use NumPy, we first import it (it must be installed for this to work). By convention, it is imported under the alias `np`.

In [51]:
# import the numpy module
import numpy as np

We can check which version we are using through the `__version__` attribute.

In [52]:
# checking the numpy version
np.__version__

'2.3.4'

NumPy gives us a more powerful alternative to the Python list called the NumPy `ndarray` (n-dimensional array). We create one using NumPy's `array()` function.

In [53]:
# creating a numpy array
num_arr = np.array([1, 2, 3, 4])

The object that's created by `array()` is called `ndarray`.
This can be shown by checking the type of the object using `type()`

In [54]:
# Checking type of object
type(num_arr)

numpy.ndarray

#### Data Types
The table below describes some of the most common data types we use in numpy

Data Type | Description
---------|------------
`int64` | Signed 64-bit integer
`float64` | Double-precision floating point
`complex128` | Complex numbers made of two 64-bit floats
`bool` | Boolean values (`True`/`False`)
`object` | References to arbitrary Python objects
`str_` | Fixed-length Unicode strings
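
NumPy normally infers the data type from the values you pass. You can also request one of the types in the table explicitly through the `dtype` parameter of `np.array()`. The following minimal sketch uses illustrative variable names.

```python
import numpy as np

# request a dtype explicitly with the dtype parameter
floats = np.array([1, 2, 3], dtype=np.float64)
flags = np.array([0, 1, 2], dtype=bool)        # 0 -> False, nonzero -> True
complexes = np.array([1, 2], dtype=np.complex128)
text = np.array(['a', 'bc'], dtype=np.str_)

print(floats.dtype, floats)   # float64 [1. 2. 3.]
print(flags.dtype, flags)     # bool [False  True  True]
print(complexes.dtype)        # complex128
print(text.dtype)             # <U2: fixed-length Unicode, at most 2 characters
```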

**Dimensions:**

A dimension (also called an axis) is a direction along which data is organized in an array. The number of dimensions determines the array's structure, e.g. 1D for a vector, 2D for a matrix and 3D for a stack of matrices (a tensor). We find the number of dimensions using the `ndim` attribute.

In [55]:
# finding the number of dimensions
num_arr.ndim

1

**Shape:**

The shape is a tuple giving the size of each dimension of an array. We can check the shape of a NumPy array using the `shape` attribute, as demonstrated below. A 1D array of four elements has shape `(4,)`, where the trailing comma marks a one-element tuple.

In [56]:
# shape of array
num_arr.shape

(4,)

**Length**

In NumPy, the length of an array is the size of its first axis (dimension), i.e. the number of elements along that axis. We find it using Python's built-in `len()` function.

In [57]:
# number of elements in array
len(num_arr)

4

**Size**

Size in NumPy refers to the total number of elements in an array across all dimensions. We can get the size of a NumPy array using the `size` attribute.

In [58]:
# total number of elements across all dimensions
num_arr.size

4
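
For a 1D array like `num_arr`, length and size give the same number. For a 2D array they differ: `len()` counts only along the first axis (the rows), while `size` counts every element. The sketch below uses a hypothetical `matrix`.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(len(matrix))   # 2 -> size of the first axis (number of rows)
print(matrix.size)   # 6 -> total number of elements
print(matrix.shape)  # (2, 3)
```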

**Data Type** (`dtype`)

`dtype` in NumPy refers to the data type of the elements stored in an array, such as `int`, `float`, `bool`, etc.

In [59]:
# finding data type of array elements
print(num_arr.dtype.name)

int64


**Converting Array Data Types**

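We can use the `astype()` method to convert an array from one data type to another. Converting floats to integers truncates the fractional part (toward zero) rather than rounding.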

In [60]:
# converting an array
float_arr = np.array([1.2, 3.5, 7.0])

# use astype() to convert to a specific data type (here: integers)
int_arr = float_arr.astype(int)

print(f'Array: {float_arr}, Data Type: {float_arr.dtype}')
print(f'Array: {int_arr}, Data Type: {int_arr.dtype}')

Array: [1.2 3.5 7. ], Data Type: float64
Array: [1 3 7], Data Type: int64
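
The output above shows `3.5` becoming `3`, because `astype(int)` truncates. If you want the nearest integer instead, one option is to round with `np.round()` before converting. Here is a minimal sketch with an illustrative extra negative value:

```python
import numpy as np

float_arr = np.array([1.2, 3.5, 7.0, -2.7])

# astype(int) drops the fractional part (truncates toward zero)
truncated = float_arr.astype(int)

# round first, then convert, to get the nearest integer
rounded = np.round(float_arr).astype(int)

print(truncated)  # [ 1  3  7 -2]
print(rounded)    # [ 1  4  7 -3]
```

Note that `np.round()` rounds exact halves to the nearest even value, so `2.5` becomes `2.0` while `3.5` becomes `4.0`.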


### Ask for help

NumPy's `np.info()` function prints the documentation for a NumPy function, class or attribute:

In [61]:
np.info(np.ndarray.shape)

Tuple of array dimensions.

The shape property is usually used to get the current shape of an array,
but may also be used to reshape the array in-place by assigning a tuple of
array dimensions to it.  As with `numpy.reshape`, one of the new shape
dimensions can be -1, in which case its value is inferred from the size of
the array and the remaining dimensions. Reshaping an array in-place will
fail if a copy is required.


    Setting ``arr.shape`` is discouraged and may be deprecated in the
    future.  Using `ndarray.reshape` is the preferred approach.

Examples
--------
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4])
>>> x.shape
(4,)
>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
>>> y.shape = (3, 6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot reshape array

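In Jupyter or IPython, putting `?` before (or after) an object's name displays the same documentation.

In [62]: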
?np.ndarray.shape

### Quick Array Inspection Cheatsheet

Attribute | Meaning
---------|--------
`ndim` | Number of dimensions
`shape` | Size along each dimension
`size` | Total number of elements
`dtype` | Data type of elements


In [63]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

print('Dimensions:', arr.ndim)
print('Shape:', arr.shape)
print('Size:', arr.size)
print('Dtype:', arr.dtype)

Dimensions: 2
Shape: (2, 3)
Size: 6
Dtype: int64


## Broadcasting

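Broadcasting allows NumPy to perform operations on arrays of different but compatible shapes. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1.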

In [64]:
arr = np.array([1, 2, 3])
arr + 10

array([11, 12, 13])

Explanation:

- The scalar `10` behaves as if it were stretched to the array's shape `(3,)`
- No stretched copy is actually created, so no extra memory is used

_This explains why NumPy feels magical._
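
Broadcasting also works between two arrays, not just between an array and a scalar. In this minimal sketch (array names are illustrative), a row and a column are each broadcast against a 2D array. A pair of shapes that breaks the compatibility rule raises a `ValueError`.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
column = np.array([[100], [200]])    # shape (2, 1)

# row is broadcast across both rows of matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# column is broadcast across all three columns of matrix
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# shapes (2, 3) and (2,) are incompatible: trailing sizes 3 and 2 differ
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print('ValueError:', err)
```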

## Array mathematics

NumPy has out-of-the-box tools to help us perform important mathematical operations.

#### Arithmetic Operations

Arithmetic operations in NumPy are element-wise operations like addition, subtraction, multiplication, and division that can be performed directly between arrays or between an array and a scalar.

In [65]:
# creating arrays
array1 = np.array([1, 4, 6, 7])
array2 = np.array([3, 5, 3, 1])

In [66]:
# subtract
difference1 = array2 - array1
print('difference1 =', difference1)

# another way
difference2 = np.subtract(array2, array1)
print('difference2 =', difference2)

difference1 = [ 2  1 -3 -6]
difference2 = [ 2  1 -3 -6]


As you can see, NumPy applies ordinary arithmetic operators element-wise: each element of one array is combined with the element at the same position in the other.

In [67]:
# sum
summation1 = array1 + array2
print('summation1 =', summation1)

# another way
summation2 = np.add(array1, array2)
print('summation2 =', summation2)

summation1 = [4 9 9 8]
summation2 = [4 9 9 8]


#### Trigonometric operations

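Trigonometric functions in NumPy, such as `np.sin()`, `np.cos()` and `np.tan()`, perform element-wise calculations on arrays (angles are in radians). The cell below also uses `np.log()`. It is not trigonometric, but it is likewise element-wise and computes the natural logarithm.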

In [68]:
# sin
print('sin(array1) =', np.sin(array1))
# cos
print('cos(array1) =', np.cos(array1))
# log
print('log(array1) =', np.log(array1))

sin(array1) = [ 0.84147098 -0.7568025  -0.2794155   0.6569866 ]
cos(array1) = [ 0.54030231 -0.65364362  0.96017029  0.75390225]
log(array1) = [0.         1.38629436 1.79175947 1.94591015]


In [69]:
# dot product
array1.dot(array2)

np.int64(48)

The `dot()` function:
- Performs a dot product for 1D arrays
- Performs matrix multiplication for 2D arrays
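
To see the 2D case described in the bullets above, here is a minimal sketch with two hypothetical 2×2 matrices. It compares `dot()` (matrix multiplication: row-by-column sums) with `*` (element-wise multiplication).

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# for 2D arrays, dot() performs matrix multiplication
print(A.dot(B))
# [[19 22]
#  [43 50]]

# A * B multiplies element-wise instead
print(A * B)
# [[ 5 12]
#  [21 32]]
```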

**Research:**

Find another way to compute the dot product (matrix product) of arrays.

#### Comparison

In NumPy, comparison operators perform element-wise comparisons on arrays and return boolean arrays of the same shape, where each element indicates True or False based on the corresponding element-wise comparison.

In [70]:
array1 == array2

array([False, False, False, False])

In [71]:
array1 > 3

array([False,  True,  True,  True])
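
Boolean arrays produced by comparisons can be combined and summarized. Python's `and`/`or` do not work element-wise on arrays. The sketch below uses `&` (and) with parentheses around each condition instead, and then counts and summarizes the matches.

```python
import numpy as np

array1 = np.array([1, 4, 6, 7])

# combine element-wise conditions with & (and) or | (or); parentheses are required
in_range = (array1 > 3) & (array1 < 7)
print(in_range)                                # [False  True  True False]

# True counts as 1, so summing a boolean array counts the matches
print(np.sum(in_range))                        # 2

# any()/all() collapse a boolean array to a single True/False
print((array1 > 3).any(), (array1 > 3).all())  # True False
```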

#### Aggregate functions

NumPy provides several aggregate functions that combine the elements of an array into a summary value. By default they reduce the whole array to a single scalar. Passing the `axis` argument reduces along one axis only.

In [72]:
# array sum
array_sum = array1.sum(axis=0)
print('Sum: ', array_sum)

Sum:  18


In [73]:
# average value
mean = array1.mean()
print('Mean: ', mean)

Mean:  4.5


In [74]:
# minimum value
minimum = array1.min()
print('Minimum: ', minimum)

Minimum:  1


In [75]:
# maximum value
maximum = array1.max()
print('Maximum: ', maximum)

Maximum:  7


In [76]:
# correlation coefficient matrix: the off-diagonal entries are the coefficient between array1 and array2
correlation_coefficient = np.corrcoef(array1, array2)
print('Correlation Coefficient: ', correlation_coefficient)

Correlation Coefficient:  [[ 1.         -0.46291005]
 [-0.46291005  1.        ]]


In [77]:
# standard deviation (population standard deviation, ddof=0 by default)
standard_deviation = np.std(array1)
print('Standard Deviation: ', standard_deviation)

Standard Deviation:  2.29128784747792


**Research:**

<i class="bi bi-cursor"></i> copying arrays (you might meet `view()`, `copy()`) 

## Subsetting, Slicing and Indexing
<i class="bi bi-cursor"></i> Indexing is the technique we use to access individual elements in an array. Indices start at 0: 0 refers to the first element, 1 to the second element, and so on.

<i class="bi bi-cursor"></i> Slicing is used to access elements of an array using a range of two indexes. The first index is the start of the range while the second index is the end of the range. The indexes are separated by a colon, i.e. `[start:end]`. The start index is included but the end index is excluded, so `[0:3]` selects the elements at positions 0, 1 and 2.

In [78]:
# Creating numpy arrays of different dimensions
# 1D array
arr1 = np.array([1, 4, 6, 7])
print('Array1 (1D): \n', arr1)

Array1 (1D): 
 [1 4 6 7]


In [79]:
# 2D array
arr2 = np.array([[1.5, 2, 3], [4, 5, 6]])
print('Array2 (2D): \n', arr2)

Array2 (2D): 
 [[1.5 2.  3. ]
 [4.  5.  6. ]]


In [80]:
# 3D array
arr3 = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                 [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
print('Array3 (3D): \n', arr3)

Array3 (3D): 
 [[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]]


In [81]:
# find the shape of each array
print('Array1 (1D):', arr1.shape)
print('Array2 (2D):', arr2.shape)
print('Array3 (3D):', arr3.shape)

Array1 (1D): (4,)
Array2 (2D): (2, 3)
Array3 (3D): (2, 3, 3)


#### Indexing

In [82]:
# accessing items in a 1D array
arr1[2]

np.int64(6)

In [83]:
# accessing items in 2D array
arr2[1, 2]

np.float64(6.0)

In [84]:
# accessing in a 3D array
arr3[0, 1, 2]

np.int64(6)
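
Indices can also be negative, counting back from the end of an axis. This sketch recreates `arr1` and `arr2` so that it runs on its own. It also shows that chained indexing such as `arr2[1][2]` reaches the same element as `arr2[1, 2]`, although the comma form does it in a single step.

```python
import numpy as np

arr1 = np.array([1, 4, 6, 7])
arr2 = np.array([[1.5, 2, 3], [4, 5, 6]])

# negative indices count from the end: -1 is the last element
print(arr1[-1])      # 7
print(arr2[-1, 0])   # 4.0 -> last row, first column

# both forms select row 1, column 2
print(arr2[1, 2] == arr2[1][2])  # True
```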

#### Slicing

In [85]:
# slicing 1D array
arr1[0:3]

array([1, 4, 6])
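
A few more slices of the same 1D array show that the end index is excluded, that either index may be omitted, and that an optional third value sets the step (`[start:end:step]`):

```python
import numpy as np

arr1 = np.array([1, 4, 6, 7])

print(arr1[1:3])   # [4 6]      -> end index 3 is excluded
print(arr1[:2])    # [1 4]      -> omitted start means "from the beginning"
print(arr1[2:])    # [6 7]      -> omitted end means "to the end"
print(arr1[::2])   # [1 6]      -> every second element
print(arr1[::-1])  # [7 6 4 1]  -> a negative step reverses the array
```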

In [86]:
# slicing a 2D array
arr2[1, 1:]
# row index = 1
# column index from 1 to end

array([5., 6.])

In [87]:
# slicing a 3D array
arr3[0, 1:, :2]
# block index = 0
# rows from index 1 to end
# columns from the start up to (not including) index 2

array([[4, 5],
       [7, 8]])

#### Boolean Indexing

Boolean indexing in NumPy allows you to select elements from an array based on a boolean condition or a boolean array of the same shape. The elements corresponding to True values in the boolean array/condition are selected, while those corresponding to False are discarded. 

In [88]:
# boolean indexing
arr1[arr1 < 5]

array([1, 4])
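
The condition `arr1 < 5` is itself a boolean array with the same shape as `arr1`. The same mask can also appear on the left of an assignment to change only the selected elements. This minimal sketch works on a copy (the name `clipped` is illustrative) so that `arr1` stays unchanged.

```python
import numpy as np

arr1 = np.array([1, 4, 6, 7])

# the condition produces a boolean mask of the same shape
mask = arr1 < 5
print(mask)               # [ True  True False False]

# use the mask for assignment: replace values below 5 with 0
clipped = arr1.copy()
clipped[clipped < 5] = 0
print(clipped)            # [0 0 6 7]
print(arr1)               # [1 4 6 7] (unchanged)
```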

**Research:**

Fancy Indexing

## Array manipulation

NumPy provides a wide range of functions for changing the shape, dimensions and structure of arrays to suit your needs.

In [89]:
print(arr2)

[[1.5 2.  3. ]
 [4.  5.  6. ]]


In [90]:
# transpose
arr2_transpose1 = np.transpose(arr2)
print('Transpose1: \n', arr2_transpose1)

Transpose1: 
 [[1.5 4. ]
 [2.  5. ]
 [3.  6. ]]


In [91]:
# another way
arr2_transpose2 = arr2.T
print('Transpose2: \n', arr2_transpose2)

Transpose2: 
 [[1.5 4. ]
 [2.  5. ]
 [3.  6. ]]


In [92]:
# combining arrays: join two 1D rows end to end
first = arr3[0, 2]
second = arr3[1, 0]

np.concatenate((first, second))

array([ 7,  8,  9, 10, 11, 12])

In [93]:
test_arr1 = np.array([[7, 8, 9], [10, 11, 12]])
test_arr2 = np.array([[1, 2, 3], [4, 5, 6]])

np.concatenate((test_arr1, test_arr2), axis=1)

array([[ 7,  8,  9,  1,  2,  3],
       [10, 11, 12,  4,  5,  6]])
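
The `axis` argument decides how `np.concatenate()` joins 2D arrays. With `axis=1` (above) the columns are placed side by side. With `axis=0`, which is the default, the rows are stacked. A minimal comparison of the resulting shapes:

```python
import numpy as np

test_arr1 = np.array([[7, 8, 9], [10, 11, 12]])
test_arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 (the default) stacks the rows of test_arr2 below test_arr1
rows_joined = np.concatenate((test_arr1, test_arr2), axis=0)
# axis=1 places the columns of test_arr2 to the right of test_arr1
cols_joined = np.concatenate((test_arr1, test_arr2), axis=1)

print(rows_joined.shape)  # (4, 3)
print(cols_joined.shape)  # (2, 6)
```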

### Homework

1. Create an array of 10 numbers
2. Remove the last element
3. Reshape it into a 3x3 matrix
4. Find the mean of each column

**Research:**
    
Adding/Removing Elements
- `resize()`
- `append()`
- `insert()`
- `delete()`

Changing array shape
- `ravel()`
- `reshape()`

In [27]:
# stacking
# np.vstack((a,b))
# np.hstack((a,b))
# np.column_stack((a,b))
# np.c_[a, b]

In [28]:
# splitting arrays
# np.hsplit()
# np.vsplit()
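
The stacking and splitting functions listed in the commented cells can be tried on small arrays. In this sketch, `a`, `b` and `grid` are hypothetical example arrays, and the splits divide evenly into equal parts.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.vstack((a, b)))        # 2x3: a and b as rows
print(np.hstack((a, b)))        # [1 2 3 4 5 6]
print(np.column_stack((a, b)))  # 3x2: a and b as columns
print(np.c_[a, b])              # same result as column_stack for 1D arrays

grid = np.arange(12).reshape(3, 4)
left, right = np.hsplit(grid, 2)          # 2 equal parts along the columns
top, middle, bottom = np.vsplit(grid, 3)  # 3 equal parts along the rows
print(left.shape, right.shape)            # (3, 2) (3, 2)
print(top)                                # [[0 1 2 3]]
```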
