# The `numpy` module

## What is the `numpy` module?

A collection of related functions is called a *module*. One of the most useful modules in Python is *numpy* (**num**erical **Py**thon), which contains many functions for numerical programming. It is technically an extension to the core Python functionality we've been focussing on so far, but it is included as standard in many Python distributions.

The `numpy` module builds on the core functionality and adds further features, including:

 - It is *performant*, meaning it is well optimised
 - It offers additional *numerical computing tools*
 - It adds a new object called an *n-dimensional array*

### Numpy arrays vs lists
One thing we can use the `numpy` module for is to create a new object called a *numpy array*. This is another data structure, in addition to the in-built Python types we've been learning about, and is similar to a list.

<table style="font-size:0.95em;font-family:Arial, Helvetica, sans-serif;border-spacing:5px;border-collapse:initial">
    <tr>
        <th style="background-color:lavender">
            Numpy arrays
        <td style="width:50%;text-align:left;vertical-align:top">
            Numpy module (and arrays) are a Python extension (but often come as standard)<br>
            <br>
            Ordered<br>
            <br>
            Mutable<br>
            <br>
            Less flexible<br>
             - One data type per array<br>
            <br>
            Allows implicit element-wise operations<br>
            <br>
            Generally quicker (optimised)<br>
            More memory efficient
        <th style="background-color:linen">
            Lists
        <td style="width:50%;text-align:left;vertical-align:top">
            Lists are part of Python in-built functionality<br>
            <br>
            Ordered<br>
            <br>
            Mutable<br>
            <br>
            Very flexible<br>
             - Any mix of types in one list<br>
            <br>
            Needs explicit element-wise operations<br>
            <br>
            Generally slower performance<br>
            Less memory efficient
    </tr>
</table>

In practice, `list` objects are highly flexible in both content and shape, whereas `numpy.array` objects are much stricter: every item must be the same type, and they work best when they have a consistent shape (e.g. a 2x3 grid).

### Numpy arrays

`numpy.array` objects are mutable, ordered containers, but every item must be of the same type and the array has an n-dimensional shape.

To use the `numpy` module we first need to *import* it.

In [2]:
import numpy as np

The `as` part of this import statement gives us a shorthand to use in the code when we want to access numpy, in this case `np`. This is the convention most often used for the numpy module. `import` statements themselves are the way we access additional Python modules such as `numpy` or `matplotlib`. 

One way to create a `numpy.array` is from a `list`:

In [3]:
list1 = [1.,1.,2.,3.,5.,8.]
arr1 = np.array(list1)

Here we need the `np.` prefix before the function name to tell Python to look in the `numpy` module.

We can also index and slice `numpy.array` objects in a similar way to other sequence objects (i.e. ordered objects with a length, like `list` objects):

In [4]:
print(arr1[0])
print(arr1[2:-1])

1.0
[2. 3. 5.]
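
Indexing looks the same for lists and arrays, but there is one difference worth knowing about. Slicing a `list` makes a copy, whereas a basic slice of a `numpy.array` is a *view* onto the same data. The sketch below illustrates this; the names `fib_list`, `fib_arr` and so on are only illustrative:

```python
import numpy as np

fib_list = [1., 1., 2., 3., 5., 8.]
fib_arr = np.array(fib_list)

# Negative indices and steps work as they do for lists
print(fib_arr[-1])    # 8.0
print(fib_arr[::2])   # [1. 2. 5.]

# Slicing a list gives an independent copy...
list_slice = fib_list[:2]
list_slice[0] = 99.
print(fib_list[0])    # 1.0 (original list unchanged)

# ...but slicing an array gives a view onto the same data
arr_slice = fib_arr[:2]
arr_slice[0] = 99.
print(fib_arr[0])     # 99.0 (original array changed too)

# Use .copy() when an independent array is needed
safe_slice = fib_arr[2:4].copy()
safe_slice[0] = -1.
print(fib_arr[2])     # 2.0 (original array unchanged)
```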


A `numpy.array` also has additional properties (*attributes*): *dtype*, which tells us what type of data the array contains, and *shape*, which tells us the dimensions of the array.

In [5]:
print(arr1.dtype)
print(arr1.shape)

float64
(6,)
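
To see how these attributes behave in other cases, here is a small sketch (the arrays `num_arr` and `grid` are made up for illustration). It shows numpy converting mixed numbers to a single type, and the shape of a 2-dimensional array:

```python
import numpy as np

# Mixing integers and floats: everything is converted to float
num_arr = np.array([1, 2.5, 4])
print(num_arr.dtype)   # float64
print(num_arr.shape)   # (3,)

# A list of equal-length lists becomes a 2-dimensional array
grid = np.array([[1., 2., 3.],
                 [4., 5., 6.]])
print(grid.shape)      # (2, 3): 2 rows, 3 columns
print(grid.ndim)       # 2
print(grid.dtype)      # float64
```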


### Element-wise operations

The `numpy` module itself also provides some additional tools and syntax to complete simple operations more succinctly. For instance, we've shown before one way to act on every item in a `list` using a `for` loop:

In [6]:
list2 = []
for item in list1:
    list2.append(item*4)
print(list2)

[4.0, 4.0, 8.0, 12.0, 20.0, 32.0]


For very simple operations there is a shorthand for creating a new list with a `for` loop, called a *list comprehension*.

In [7]:
list2 = [item*4 for item in list1]
print(list2)

[4.0, 4.0, 8.0, 12.0, 20.0, 32.0]


But this is still more complex than using a `numpy.array`, where the same operation can be performed using an operator directly on the whole array:

In [8]:
arr2 = arr1*4
print(arr2)

[ 4.  4.  8. 12. 20. 32.]
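
It is worth being careful here, because the same operators mean something different for a `list`. The sketch below (using an illustrative list called `values`) shows `*` and `+` behaving differently on lists and arrays:

```python
import numpy as np

values = [1., 2., 3.]

# Multiplying a list repeats it rather than scaling its items
repeated = values * 2
print(repeated)        # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# Multiplying an array scales each element
scaled = np.array(values) * 2
print(scaled)          # [2. 4. 6.]

# Similarly, + concatenates lists but adds arrays element-wise
joined = values + values
summed = np.array(values) + np.array(values)
print(joined)          # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(summed)          # [2. 4. 6.]
```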


### Operation speed

For large numbers of elements the time difference between operations using `list` objects and `numpy.array` objects can start to be measurable. We can quickly check this by importing the `time` module:

In [9]:
import time
num_range = 100000

In [10]:
time1 = time.time()
list_out = [item*4 for item in range(num_range)]
time2 = time.time()

list_time = time2-time1

In [11]:
time1 = time.time()
arr_out = np.arange(num_range)*4
time2 = time.time()

arr_time = time2-time1

Comparing the two, we can see that the operation takes longer with the `list` than with the `numpy.array` (although the exact ratio is highly variable):

In [12]:
print(f"Array operation is {list_time/arr_time:.0f} times faster for {num_range:,} numbers")

Array operation is 4 times faster for 100,000 numbers
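
Because a single measurement like this is noisy, one possible refinement is to repeat the measurement using the standard library `timeit` module. This is only a sketch: the helper functions `list_version` and `array_version` and the repeat counts are our own choices for illustration.

```python
import timeit
import numpy as np

num_range = 100_000

def list_version():
    # Multiply every number by 4 using a list comprehension
    return [item * 4 for item in range(num_range)]

def array_version():
    # Multiply every number by 4 using a numpy array
    return np.arange(num_range) * 4

# Run each function 20 times per measurement, take 5 measurements,
# and keep the fastest one as the least disturbed estimate
list_best = min(timeit.repeat(list_version, number=20, repeat=5))
arr_best = min(timeit.repeat(array_version, number=20, repeat=5))

print(f"List comprehension: {list_best / 20 * 1e3:.3f} ms per run")
print(f"Array operation:    {arr_best / 20 * 1e3:.3f} ms per run")
```

The exact numbers will differ between machines and runs, so treat the ratio as a rough guide rather than a fixed result.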


You may recall that when we first introduced `list` and `dict` objects, we also mentioned other Python objects which were similar but with some differences in functionality (`tuple` and `set` objects). In Python, as in many languages, there are often many tools which can be used to complete a task and it's up to you to choose the correct tool for the job. Overall, `list` objects may be more appropriate when you need to store a set of strings or if you don't know the number of elements in advance (appending to a `list` is faster than appending to a `numpy.array` due to the way the data is stored in memory). By contrast, `numpy.array` objects are more appropriate when performance is a factor or when you want to write numerical operations more simply.
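
As a rough illustration of the point about appending, one common pattern is to collect values in a `list` and convert to an array once at the end. The names below are made up for this sketch:

```python
import numpy as np

# Growing a list: append adds to the existing list in place
squares_list = []
for n in range(5):
    squares_list.append(float(n ** 2))

# Convert once at the end, when the number of items is known
squares_arr = np.array(squares_list)
print(squares_arr)         # [ 0.  1.  4.  9. 16.]

# np.append does not change the original array; it builds and
# returns a new, longer array each time it is called
extended = np.append(squares_arr, 25.)
print(squares_arr.shape)   # (5,)
print(extended.shape)      # (6,)
```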

## Working with `numpy`

To use the `numpy` module we always need to start by using an import statement. In this case we import the `numpy` module and use the shorthand `np`:

In [13]:
import numpy as np

In [14]:
arr1 = np.array([1.,1.,2.,3.,5.,8.])

We've seen that we can apply operators directly to a `numpy.array`:

In [15]:
arr1*3/2 + 5

array([ 6.5,  6.5,  8. ,  9.5, 12.5, 17. ])

Similarly you can use additional functions provided by the `numpy` module to do something to each element in the array. For example you can apply a square root:

In [16]:
print(np.sqrt(arr1))

[1.         1.         1.41421356 1.73205081 2.23606798 2.82842712]


Or perform a reductive operation such as calculating the mean of all the elements:

In [17]:
print(np.mean(arr1))

3.3333333333333335
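
"Reductive" here means the whole array is collapsed into a single value. A short sketch with a few other reductions (using an illustrative array called `fib`):

```python
import numpy as np

fib = np.array([1., 1., 2., 3., 5., 8.])

# The mean is the sum of the elements divided by their number
manual_mean = fib.sum() / fib.size
print(np.isclose(manual_mean, np.mean(fib)))   # True

# Other reductions also return a single value
smallest = np.min(fib)
largest = np.max(fib)
middle = np.median(fib)
print(smallest, largest)   # 1.0 8.0
print(middle)              # 2.5
```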


We can also apply mathematical operations over the whole array. For instance we can look at the `np.cos` function, which applies the cosine function element-wise:

In [18]:
np.cos?

Call signature:  np.cos(*args, **kwargs)
Type:            ufunc
String form:     <ufunc 'cos'>
File:            ~/.pyenv/versions/3.11.1/lib/python3.11/site-packages/numpy/__init__.py
Docstring:
cos(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

Cosine element-wise.

Parameters
----------
x : array_like
    Input array in radians.
out : ndarray, None, or tuple of ndarray and None, optional
    A location into which the result is stored. If provided, it must have
    a shape that the inputs broadcast to. If not provided or None,
    a freshly-allocated array is returned. A tuple (possible only as a
    keyword argument) must have length equal to the number of outputs.
where : array_like, optional
    This condition is broadcast over the input. At locations where the
    condition is True, the `out` array will be set to the ufunc result.
    Elsewhere, the `out` array wil

The help states that this wants an *array-like* object and wants the input in radians. We can write this as:

In [19]:
print(np.cos(arr1))

[ 0.54030231  0.54030231 -0.41614684 -0.9899925   0.28366219 -0.14550003]
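
If your angles are in degrees, they need converting to radians first. A minimal sketch, assuming an illustrative array of angles called `angles_deg`, using the `np.deg2rad` function:

```python
import numpy as np

angles_deg = np.array([0., 60., 90., 180.])

# Convert degrees to radians before calling np.cos
angles_rad = np.deg2rad(angles_deg)
cosines = np.cos(angles_rad)

# Rounding hides tiny floating-point errors
# (the computed cosine of 90 degrees is very small but not exactly 0)
print(np.round(cosines, 3))
```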


If we look at `arr1` we can see that it has not been changed by these operations. Each operation returns a new array, which you can either re-assign to the original variable name or store in a new variable:

In [20]:
print(arr1)
arr2 = arr1*3/2 + 5
print(arr2)

[1. 1. 2. 3. 5. 8.]
[ 6.5  6.5  8.   9.5 12.5 17. ]
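
The sketch below contrasts the options, using an illustrative array called `original`. It also shows that an augmented assignment such as `+=` behaves differently, because it changes the existing array in place:

```python
import numpy as np

original = np.array([1., 1., 2., 3., 5., 8.])

# An expression builds a new array and leaves the input untouched
doubled = original * 2
print(original)    # [1. 1. 2. 3. 5. 8.]
print(doubled)     # [ 2.  2.  4.  6. 10. 16.]

# Re-assigning the name replaces the old array with the result
original = original * 2
print(original)    # [ 2.  2.  4.  6. 10. 16.]

# An augmented assignment such as += modifies the array in place
original += 1
print(original)    # [ 3.  3.  5.  7. 11. 17.]
```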


## Element-wise operations on 1D arrays

Element-wise operations in numpy allow you to perform arithmetic or mathematical functions on each corresponding element of arrays. For example, if you have two arrays of the same length, `arr1` and `arr2`, you can add them directly: `arr1 + arr2`. This will produce a new array where each element is the sum of the elements at the same position in the original arrays. Similarly, you can use other operators (`-`, `*`, `/`) or numpy functions (`np.sqrt(arr1)`, `np.cos(arr1)`) to apply operations to each element individually. The arrays must have compatible shapes for these operations.

In [21]:
# Element-wise addition of arr1 and arr2
added = arr1 + arr2
print(added)
# Element-wise subtraction of arr1 and arr2
subtracted = arr1 - arr2
print(subtracted)

[ 7.5  7.5 10.  12.5 17.5 25. ]
[-5.5 -5.5 -6.  -6.5 -7.5 -9. ]


In [22]:
# Element-wise multiplication and division of arr1 and arr2
multiplied = arr1 * arr2
divided = arr1 / arr2

print("Element-wise multiplication:", multiplied)
print("Element-wise division:", divided)

Element-wise multiplication: [  6.5   6.5  16.   28.5  62.5 136. ]
Element-wise division: [0.15384615 0.15384615 0.25       0.31578947 0.4        0.47058824]


Take care when 1D arrays have different lengths. Element-wise operations such as `arr1 + arr3` or `arr1 * arr3` require the arrays to have the same length or otherwise compatible shapes (for example, one of them having a single element). If the shapes are incompatible, numpy raises a `ValueError` due to the shape mismatch.


In [23]:
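# Define an example array that is shorter than arr1 (3 elements rather than 6)
arr3 = np.array([1., 2., 3.])
# This will raise a ValueError because arr1 and arr3 have different lengths
result = arr1 + arr3


ValueError: operands could not be broadcast together with shapes (6,) (3,) 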
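
A self-contained sketch of the shape mismatch is given below, with the error caught so that the code keeps running. The arrays `long_arr`, `short_arr` and `single` are illustrative. It also shows one case of "compatible shapes" with different lengths: a single-element array.

```python
import numpy as np

long_arr = np.array([1., 1., 2., 3., 5., 8.])
short_arr = np.array([10., 20., 30.])

# Lengths 6 and 3 cannot be matched up element by element
error_message = None
try:
    result = long_arr + short_arr
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# A length-1 array (or a single number) is a compatible shape:
# its one value is reused for every element of the longer array
single = np.array([10.])
shifted = long_arr + single
print(shifted)   # [11. 11. 12. 13. 15. 18.]
```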

## Basic operations on 1D arrays

Summing all elements in a 1D numpy array can be done with `np.sum(arr1)`.

For cumulative summing, use `np.cumsum(arr1)`, which returns an array where each element is the sum of all elements up to and including that position.


Sorting is performed with `np.sort(arr1)`, which returns a sorted copy of the array.
 
To concatenate two arrays, use `np.concatenate([arr1, arr2])`. This joins the arrays end-to-end, creating a new array containing all elements from both arrays in order. Concatenation is useful for combining datasets or extending arrays.
  
To find unique elements, use `np.unique(arr1)`, which returns a sorted array of the distinct values in `arr1`. These operations are efficient and commonly used for data analysis.

In [24]:
# Summing all elements in arr1
total_sum = np.sum(arr1)
print("Sum of arr1:", total_sum)

# Cumulative summing of arr1
cumulative_sum = np.cumsum(arr1)
print("Cumulative sum of arr1:", cumulative_sum)

# Sorting arr1
sorted_arr = np.sort(arr1)
print("Sorted arr1:", sorted_arr)

# Combine arr1 and arr2 into a single array
combined = np.concatenate([arr1, arr2])
print("Combined array:", combined)

# Finding unique elements in combined
unique_elements = np.unique(combined)
print("Unique elements in combined:", unique_elements)

Sum of arr1: 20.0
Cumulative sum of arr1: [ 1.  2.  4.  7. 12. 20.]
Sorted arr1: [1. 1. 2. 3. 5. 8.]
Combined array: [ 1.   1.   2.   3.   5.   8.   6.5  6.5  8.   9.5 12.5 17. ]
Unique elements in combined: [ 1.   2.   3.   5.   6.5  8.   9.5 12.5 17. ]
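
Two variations on these operations can be handy. The sketch below uses an illustrative array called `unsorted`. It contrasts `np.sort` with the in-place `.sort()` method, and uses the `return_counts` option of `np.unique`:

```python
import numpy as np

unsorted = np.array([5., 1., 8., 1., 3., 2.])

# np.sort returns a sorted copy; the input keeps its order
sorted_copy = np.sort(unsorted)
print(unsorted)      # [5. 1. 8. 1. 3. 2.]
print(sorted_copy)   # [1. 1. 2. 3. 5. 8.]

# The .sort() method sorts the array itself and returns None
unsorted.sort()
print(unsorted)      # [1. 1. 2. 3. 5. 8.]

# return_counts=True also reports how often each value occurs
values, counts = np.unique(unsorted, return_counts=True)
print(values)        # [1. 2. 3. 5. 8.]
print(counts)        # [2 1 1 1 1]
```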
