# More on NumPy

## Objectives

- Explore advanced array operations in NumPy, including reshaping, flattening, concatenating, splitting, searching, and sorting arrays.
- Understand how to manipulate array shapes and dimensions for various data processing tasks.
- Learn array concatenation and splitting techniques for merging and dividing datasets.
- Demonstrate how to search for specific elements and sort arrays in NumPy.

## Background

This notebook continues exploring NumPy, focusing on more complex array operations. NumPy's versatility allows for efficient manipulation of array dimensions, seamless dataset integration, and effective data sorting and searching. These operations are crucial for preprocessing and analyzing data in scientific computing, data science, and machine learning applications.

## Datasets Used

The notebook does not reference external datasets; it uses synthetic data generated with NumPy functions.

## Reshape array

In [1]:
import numpy as np

The `reshape()` function gives a new shape to an array without changing its data. The new shape must be compatible with the original shape: both must hold the same number of elements.

In [2]:
m = np.array([[1, 2, 3, 4], 
              [5, 6, 7, 8], 
              [9, 10, 11, 12]]) 
print('Original array: shape =',m.shape, ' dim =', m.ndim)
m

Original array: shape = (3, 4)  dim = 2


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [3]:
m2 = m.reshape(4, 3)
print('shape =',m2.shape, ' dim =', m2.ndim)
m2

shape = (4, 3)  dim = 2


array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [4]:
m3 = m.reshape(2,6)
print('shape =',m3.shape, ' dim =', m3.ndim)
m3

shape = (2, 6)  dim = 2


array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [5]:
m4 = m.reshape(1,12)
print('shape =',m4.shape, ' dim =', m4.ndim)
m4

shape = (1, 12)  dim = 2


array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])

In [6]:
m5 = m.reshape(12,1)
print('shape =',m5.shape, ' dim =', m5.ndim)
m5

shape = (12, 1)  dim = 2


array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

Remember: the new shape must be compatible with the original shape. The array `m` has 12 elements, so a shape that holds 10 elements fails.

In [7]:
m7 = m.reshape(2,5)         # This will raise an error

ValueError: cannot reshape array of size 12 into shape (2,5)
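The compatibility rule can be checked before reshaping: the product of the new dimensions must equal the array's `size`. The following is a minimal sketch; `can_reshape` is a hypothetical helper introduced here for illustration and is not part of NumPy.

```python
import numpy as np

m = np.arange(1, 13).reshape(3, 4)   # same values as the array m used above

def can_reshape(arr, shape):
    """Hypothetical helper: True if `shape` holds exactly arr.size elements."""
    return int(np.prod(shape)) == arr.size

ok_shape = can_reshape(m, (2, 6))    # 2 * 6 == 12
bad_shape = can_reshape(m, (2, 5))   # 2 * 5 != 12

# Passing -1 for one dimension asks NumPy to infer it from the array size.
inferred = m.reshape(2, -1)

# An incompatible shape raises ValueError, which can be caught.
try:
    m.reshape(2, 5)
    raised = False
except ValueError:
    raised = True

print(ok_shape, bad_shape, inferred.shape, raised)
```

Using `-1` is convenient when only one dimension is known in advance.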

## Flatten array

In some cases, we need a one-dimensional array, that is, the array collapsed into one dimension. One option is `reshape()` with a single dimension:

In [8]:
m6 = m.reshape(12)
print('shape =',m6.shape, ' dim =', m6.ndim)
m6

shape = (12,)  dim = 1


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

There is an easier way: `flatten()` returns a copy of the array collapsed into one dimension.

In [9]:
m7 = m.flatten()
print('shape =',m7.shape, ' dim =', m7.ndim)
m7

shape = (12,)  dim = 1


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
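The two approaches differ in whether the result shares memory with the original. The sketch below, which assumes the same contiguous array `m`, uses `np.shares_memory` to show that `flatten()` returns a copy while `reshape()` returns a view in this case.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

flat_copy = m.flatten()    # always a new array
flat_view = m.reshape(12)  # a view here, because m is contiguous in memory

copy_shares = np.shares_memory(m, flat_copy)
view_shares = np.shares_memory(m, flat_view)

# Writing to the copy leaves m untouched; writing through the view changes m.
flat_copy[0] = -1
flat_view[1] = 99

print(copy_shares, view_shares)
print(m[0])
```

Choosing `flatten()` is the safer option when the flattened array will be modified independently of the original.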

## Iterating

Iterating means going through elements one by one.

Iterating 1-D array

In [10]:
m6

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [11]:
for x in m6:
    print(x)

1
2
3
4
5
6
7
8
9
10
11
12


Iterating 2-D array

In [12]:
m3

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [13]:
for x in m3:
    print(x)

[1 2 3 4 5 6]
[ 7  8  9 10 11 12]


Iterating over an n-D array goes through its first dimension only: each step yields an (n-1)-D subarray. For the 2-D array above, each step yields one row.
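To reach the individual elements of a 2-D array, the loop has to go one level deeper. The sketch below shows two options on an array equal to `m3`: nested loops, and the `flat` attribute, which iterates over every element in order.

```python
import numpy as np

m3 = np.arange(1, 13).reshape(2, 6)

# Iterating over a 2-D array yields its rows, which are 1-D arrays.
row_shapes = [row.shape for row in m3]

# Nested loops reach the individual elements.
nested = [int(x) for row in m3 for x in row]

# m3.flat iterates over every element without nesting.
flat = [int(x) for x in m3.flat]

print(row_shapes)
print(nested == flat)
```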

## Array concatenation

`np.concatenate()` joins a sequence of arrays along an existing axis.

### 1-D arrays

In [14]:
a1 = np.array([1, 1, 1])
a2 = np.array([2, 2, 2])

In [15]:
np.concatenate([a1, a2])

array([1, 1, 1, 2, 2, 2])

Remember, NumPy arrays are not Python lists. The `+` operator does not concatenate arrays; it adds them element by element:

In [16]:
print(a1 + a2)

[3 3 3]
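The contrast with Python lists can be seen side by side. This short sketch uses the same values as `a1` and `a2`:

```python
import numpy as np

a1 = np.array([1, 1, 1])
a2 = np.array([2, 2, 2])

list_sum = [1, 1, 1] + [2, 2, 2]     # lists: + concatenates
array_sum = a1 + a2                  # arrays: + adds element by element
joined = np.concatenate([a1, a2])    # arrays: explicit concatenation

print(list_sum)
print(array_sum)
print(joined)
```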


### 2-D arrays

2-D arrays have two dimensions. You need to specify the dimension you want to use for concatenation.

`axis`: 0 (join along rows, stacking the arrays vertically) or 1 (join along columns, placing the arrays side by side)

In [17]:
x0=np.zeros((2,2))
x0

array([[0., 0.],
       [0., 0.]])

In [18]:
x1=np.ones((2,2))
x1

array([[1., 1.],
       [1., 1.]])

Default value: axis=0

In [19]:
concat1 = np.concatenate([x0,x1])   
print(concat1,'\n')
print('Shape =',concat1.shape)

[[0. 0.]
 [0. 0.]
 [1. 1.]
 [1. 1.]] 

Shape = (4, 2)


You can include `axis=0`. It will produce the same result:

In [20]:
concat1 = np.concatenate([x0,x1], axis=0)
print(concat1,'\n')
print('Shape =',concat1.shape)

[[0. 0.]
 [0. 0.]
 [1. 1.]
 [1. 1.]] 

Shape = (4, 2)


In [21]:
concat2 = np.concatenate([x0,x1], axis=1)
print(concat2,'\n')
print('Shape =',concat2.shape)

[[0. 0. 1. 1.]
 [0. 0. 1. 1.]] 

Shape = (2, 4)


### Arrays of mixed dimensions

For working with arrays of mixed dimensions, it can be clearer to use the functions:
- `np.vstack` (vertical stack)
- `np.hstack` (horizontal stack)

Using vstack

In [22]:
a1 = np.array([1,2,3])
print('a1 =\n',a1)
a2 = np.array([[4,4,4],[5,5,5]])
print('a2 =\n',a2)
np.vstack([a1,a2])

a1 =
 [1 2 3]
a2 =
 [[4 4 4]
 [5 5 5]]


array([[1, 2, 3],
       [4, 4, 4],
       [5, 5, 5]])

Using hstack

In [23]:
print('a2 =\n',a2)
a3 = np.array([[8],[8]])
print('a3 =\n',a3)
np.hstack([a2,a3])

a2 =
 [[4 4 4]
 [5 5 5]]
a3 =
 [[8]
 [8]]


array([[4, 4, 4, 8],
       [5, 5, 5, 8]])
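For comparison, the same results can be obtained with `np.concatenate`, at the cost of reshaping the 1-D array by hand. The sketch below reuses the values of `a1`, `a2`, and `a3` and checks that both routes agree.

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([[4, 4, 4], [5, 5, 5]])
a3 = np.array([[8], [8]])

# np.concatenate needs matching dimensions, so the 1-D a1 is reshaped to (1, 3).
v_manual = np.concatenate([a1.reshape(1, 3), a2], axis=0)
v_stack = np.vstack([a1, a2])

# For 2-D inputs, hstack matches concatenation along axis=1.
h_manual = np.concatenate([a2, a3], axis=1)
h_stack = np.hstack([a2, a3])

print(np.array_equal(v_manual, v_stack))
print(np.array_equal(h_manual, h_stack))
```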

## Array split

Splitting is the opposite of concatenation. Passing an integer to `np.split()` divides the array into that many equal parts:

In [24]:
x = np.arange(9)
print('x =',x)
x1, x2, x3 = np.split(x, 3)
print(x1)
print(x2)
print(x3)

x = [0 1 2 3 4 5 6 7 8]
[0 1 2]
[3 4 5]
[6 7 8]


You can decide where to cut the original array to get arrays with different sizes.

Here, the split points `[2, 6]` produce 3 arrays of different sizes:
- x1, from index 0 to 1 (index 2 is not included)
- x2, from index 2 to 5 (index 6 is not included)
- x3, from index 6 to the last element

In [25]:
x1, x2, x3 = np.split(x, [2,6])
print('x1 =', x1)
print('x2 =', x2)
print('x3 =', x3)

x1 = [0 1]
x2 = [2 3 4 5]
x3 = [6 7 8]
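The split points behave like slice boundaries. As a minimal sketch, the following compares `np.split(x, [2, 6])` with plain slicing and confirms that concatenating the pieces restores the original array.

```python
import numpy as np

x = np.arange(9)

parts = np.split(x, [2, 6])
by_slicing = [x[:2], x[2:6], x[6:]]

# Each piece from np.split matches the corresponding slice.
same = all(np.array_equal(p, s) for p, s in zip(parts, by_slicing))

# Concatenating the pieces undoes the split.
rebuilt = np.concatenate(parts)

print(same)
print(rebuilt)
```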


In [26]:
g = np.arange(16).reshape((4,4))
g

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Vertical split

In [27]:
up, low = np.vsplit(g, 2)
print('up =\n',up)
print('low =\n',low)

up =
 [[0 1 2 3]
 [4 5 6 7]]
low =
 [[ 8  9 10 11]
 [12 13 14 15]]


Horizontal split

In [28]:
l, r = np.hsplit(g, 2)
print('left  =\n', l)
print('right =\n',r)

left  =
 [[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
right =
 [[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


## Search

You can search an array for a certain value with `np.where()`, which returns the indexes of the elements that match.

In [29]:
ar = np.array([-5, 6, 4, 4, 1, 0, -6])
print('Searching for 1', np.where(ar == 1))

Searching for 1 (array([4], dtype=int64),)


In [30]:
print('Searching for 4', np.where(ar == 4))

Searching for 4 (array([2, 3], dtype=int64),)


In [31]:
print('Searching for 9', np.where(ar == 9))

Searching for 9 (array([], dtype=int64),)


Find the indexes where the values are odd.

Remember that `%` returns the remainder of floor division, so for a positive divisor the result is never negative: `-5 % 2` is 1.

In [32]:
print('Searching for odd', np.where(ar%2 == 1))

Searching for odd (array([0, 4], dtype=int64),)
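The sign convention matters for negative values. The sketch below contrasts `%` with `np.fmod`, whose result takes the sign of the dividend; with `np.fmod`, the test `== 1` misses the odd value -5.

```python
import numpy as np

ar = np.array([-5, 6, 4, 4, 1, 0, -6])

floor_rem = ar % 2           # sign follows the divisor: -5 % 2 gives 1
trunc_rem = np.fmod(ar, 2)   # sign follows the dividend: fmod(-5, 2) gives -1

odd_idx = np.where(ar % 2 == 1)[0]
odd_idx_fmod = np.where(np.fmod(ar, 2) == 1)[0]   # misses index 0

print(floor_rem)
print(trunc_rem)
print(odd_idx, odd_idx_fmod)
```

Testing `ar % 2 != 0` is a convention-independent way to find odd values.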


In [33]:
print('Searching for positive numbers', np.where(ar > 0))

Searching for positive numbers (array([1, 2, 3, 4], dtype=int64),)
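The printed results are tuples because `np.where()` with a single condition returns one index array per dimension. A short sketch of how to unpack the result and retrieve the matching values, either through the indexes or directly with a boolean mask:

```python
import numpy as np

ar = np.array([-5, 6, 4, 4, 1, 0, -6])

result = np.where(ar > 0)     # a tuple with one index array per dimension
idx = result[0]               # the index array for the only dimension

values_by_index = ar[idx]     # values at the matching positions
values_by_mask = ar[ar > 0]   # a boolean mask selects the same elements

print(idx)
print(values_by_index)
```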


## Sort

In [34]:
ar = np.random.randint(10, size=10)
print('Original array:',ar)
print('Sorted array:  ',np.sort(ar))

Original array: [0 6 0 5 4 7 5 8 2 8]
Sorted array:   [0 0 2 4 5 5 6 7 8 8]


`np.sort()` returns a sorted copy of the array, leaving the original array unchanged.
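Two related tools are worth comparing with `np.sort()`: the array method `sort()`, which sorts in place, and `np.argsort()`, which returns the indexes that would sort the array. The sketch below uses a fixed array (the values printed above) instead of random data so that the result is reproducible.

```python
import numpy as np

ar = np.array([0, 6, 0, 5, 4, 7, 5, 8, 2, 8])
original = ar.copy()

sorted_copy = np.sort(ar)   # new array; ar is unchanged
order = np.argsort(ar)      # indexes that would sort ar

in_place = ar.copy()
in_place.sort()             # sorts the array itself and returns None

print(sorted_copy)
print(np.array_equal(ar, original))
```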

Sorting a 2-D array

In [35]:
ar2 = np.array([[3, 2, 5], [7, 0, 1]])

print(np.sort(ar2))

[[2 3 5]
 [0 1 7]]


In [36]:
ar2 = np.array([[1,0,2],[5,6,4],[3,-1,6]])
ar2

array([[ 1,  0,  2],
       [ 5,  6,  4],
       [ 3, -1,  6]])

By default, `np.sort()` sorts along the last axis, so each row is sorted independently:

In [37]:
print(np.sort(ar2))

[[ 0  1  2]
 [ 4  5  6]
 [-1  3  6]]


With `axis=0`, each column is sorted independently:

In [38]:
print(np.sort(ar2, axis=0))

[[ 1 -1  2]
 [ 3  0  4]
 [ 5  6  6]]
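The `axis` argument can be summarized in one sketch on the same values as `ar2`: the default is the last axis (`axis=-1`, which is `axis=1` for a 2-D array), `axis=0` sorts the columns, and `axis=None` flattens the array before sorting.

```python
import numpy as np

ar2 = np.array([[1, 0, 2], [5, 6, 4], [3, -1, 6]])

by_row_default = np.sort(ar2)           # last axis: each row sorted
by_row_explicit = np.sort(ar2, axis=1)  # same result for a 2-D array
by_column = np.sort(ar2, axis=0)        # each column sorted
flattened = np.sort(ar2, axis=None)     # flattened, then sorted

print(by_column)
print(flattened)
```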


## Conclusions

Key Takeaways:
- NumPy enables efficient reshaping and flattening of arrays.
- It provides functions for easy concatenation and splitting of datasets.
- Searching and sorting in NumPy facilitate data organization and retrieval.
- `np.vstack` and `np.hstack` simplify stacking arrays of mixed dimensions.
- NumPy is key for handling large datasets in data analysis and scientific computing.

## References

- VanderPlas, J. (2017) Python Data Science Handbook: Essential Tools for Working with Data. USA: O’Reilly Media, Inc. chapter 2.