![*INTERTECHNICA - SOLON EDUCATIONAL PROGRAMS - TECHNOLOGY LINE*](https://solon.intertechnica.com/assets/IntertechnicaSolonEducationalPrograms-TechnologyLine.png)

# Data Manipulation with Python - The NumPy Library - Reshaping and Broadcasting

*Basic initialization of the workspace.*

In [1]:
!python -m pip install numpy
import numpy as np
print ("NumPy installed at version: {}".format(np.__version__))

NumPy installed at version: 1.19.5


## 1. Array reshaping

Each array in NumPy is defined by the **number of its dimensions and the number of elements along each dimension**. A NumPy array can therefore be described by a tuple whose length equals the number of dimensions of the array, each element of the tuple being the number of elements along the corresponding dimension.

In NumPy, this information is obtained by accessing the **shape** attribute.



First of all, let's construct a one-dimensional and a two-dimensional array:

In [2]:
# constructing the unidimensional array
x_one_dimensional = np.array([ 1,   2,   3,   4,   5,   6,   7,   8,   9,  10])

# constructing the two-dimensional array
x_two_dimensional = np.array(
      [[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])



Let's inspect the shapes of these arrays:

In [3]:
# printing the shape of the one-dimensional array
print("The shape of the one-dimensional array is: {}".format(
    x_one_dimensional.shape
))

# printing the shape of the two-dimensional array
print("The shape of the two-dimensional array is: {}".format(
    x_two_dimensional.shape
))

The shape of the one-dimensional array is: (10,)
The shape of the two-dimensional array is: (10, 10)
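
The relationship between the shape tuple, the number of dimensions, and the number of elements can be checked directly. The following is a minimal sketch; the array `x` is an illustrative assumption and is not one of the arrays defined in this notebook.

```python
import numpy as np

# illustrative array (an assumption for this sketch): 100 elements as 10 x 10
x = np.arange(1, 101).reshape(10, 10)

# the length of the shape tuple equals the number of dimensions
print("ndim: {}, shape: {}".format(x.ndim, x.shape))

# the product of the shape entries equals the total number of elements
element_count = int(np.prod(x.shape))
print("size: {}, product of shape entries: {}".format(x.size, element_count))
```

The total number of elements (`size`) is the quantity that matters when deciding whether a new shape is compatible with an existing array.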


NumPy allows changing the shape of an array, provided that the **new shape is compatible with the old one**, meaning that both shapes describe the same total number of elements. This is done via the **reshape** method, which takes as its argument a tuple specifying the new shape.

In [4]:
# reshape a one-dimensional array so it becomes two-dimensional
one_dimensional_new_shape = (5,2)
x_one_dimensional_reshaped = x_one_dimensional.reshape(one_dimensional_new_shape)
print ("The one dimensional array reshaped on the new shape {} is: \n {}".format(
    one_dimensional_new_shape,
    x_one_dimensional_reshaped
))

The one dimensional array reshaped on the new shape (5, 2) is: 
 [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]
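
To illustrate what "compatible" means in practice, the sketch below attempts one valid and one invalid reshape. The array `x` and the flag `reshape_failed` are illustrative names introduced only for this example.

```python
import numpy as np

# illustrative array (an assumption for this sketch): 10 elements
x = np.arange(1, 11)

# compatible shape: 5 * 2 == 10 elements
ok = x.reshape((5, 2))
print("Compatible reshape gives shape {}".format(ok.shape))

# incompatible shape: 3 * 4 == 12 elements, which differs from 10,
# so NumPy raises a ValueError
try:
    x.reshape((3, 4))
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("Incompatible reshape failed: {}".format(error))
```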


We can do the same thing for multi-dimensional arrays as well:

In [5]:
# reshape a two-dimensional array so it becomes three-dimensional
two_dimensional_new_shape = (2, 2, 25)
x_two_dimensional_reshaped = x_two_dimensional.reshape(two_dimensional_new_shape)
print ("The 2-dimensional array reshaped on the new shape {} is: \n {}".format(
    two_dimensional_new_shape,
    x_two_dimensional_reshaped
))

The 2-dimensional array reshaped on the new shape (2, 2, 25) is: 
 [[[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    18  19  20  21  22  23  24  25]
  [ 26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
    43  44  45  46  47  48  49  50]]

 [[ 51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
    68  69  70  71  72  73  74  75]
  [ 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
    93  94  95  96  97  98  99 100]]]


It is possible to specify -1 for a single element of the tuple. In this case, the number of elements for that dimension is **inferred from the total number of elements and the sizes of the other dimensions**.

In [6]:
# reshaping a two-dimensional array with a new shape that has an explicit
# number of elements in one dimension; the size of the other dimension is
# calculated automatically
two_dimensional_new_shape = (-1,25)
x_two_dimensional_reshaped = x_two_dimensional.reshape(two_dimensional_new_shape)
print ("The 2-dimensional array reshaped on {} is: \n {} \n with the new shape {}".format(
    two_dimensional_new_shape,
    x_two_dimensional_reshaped,
    x_two_dimensional_reshaped.shape
))

The 2-dimensional array reshaped on (-1, 25) is: 
 [[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
   19  20  21  22  23  24  25]
 [ 26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43
   44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
   69  70  71  72  73  74  75]
 [ 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93
   94  95  96  97  98  99 100]] 
 with the new shape (4, 25)
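
The inference can be sketched for a few other shapes as well. The array `x` and the flag `two_unknowns_failed` below are illustrative assumptions; the last part shows that using -1 more than once is rejected, since the missing sizes could no longer be determined uniquely.

```python
import numpy as np

# illustrative array (an assumption for this sketch): 100 elements
x = np.arange(1, 101).reshape(10, 10)

# 100 / 25 = 4, so -1 resolves to 4
a = x.reshape((-1, 25))

# 100 / (2 * 5) = 10, so -1 resolves to 10
b = x.reshape((2, -1, 5))

# a lone -1 flattens the array into one dimension
c = x.reshape(-1)

print(a.shape, b.shape, c.shape)

# only one dimension may be left unknown
try:
    x.reshape((-1, -1))
    two_unknowns_failed = False
except ValueError as error:
    two_unknowns_failed = True
    print("Two unknown dimensions are rejected: {}".format(error))
```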


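It is possible to **increase the dimensionality of an array**, up to the maximum number of dimensions NumPy supports, by using a shape tuple with more dimensions and the value 1 for each additional dimension. Because a dimension of length 1 does not change the total number of elements, the new shape stays compatible with the old one.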

In [7]:
# increasing the dimensionality of the array from one to three without
# adding elements
expanding_new_shape = (1,1,10)
x_one_dimensional_expanded = x_one_dimensional.reshape(expanding_new_shape)
print ("The one-dimensional array reshaped as {} is: \n {}".format(
    expanding_new_shape,
    x_one_dimensional_expanded
))


The one-dimensional array reshaped as (1, 1, 10) is: 
 [[[ 1  2  3  4  5  6  7  8  9 10]]]
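
Besides `reshape`, NumPy provides `np.newaxis` and `np.expand_dims` for adding a dimension of length 1 at a chosen position. The sketch below is an optional alternative rather than the approach used in this notebook, and the variable names are illustrative.

```python
import numpy as np

# illustrative array (an assumption for this sketch): shape (10,)
x = np.arange(1, 11)

# insert a new leading axis: shape (1, 10)
row = x[np.newaxis, :]

# insert a new trailing axis: shape (10, 1)
column = x[:, np.newaxis]

# np.expand_dims does the same by naming the position of the new axis
expanded = np.expand_dims(x, axis=0)

print(row.shape, column.shape, expanded.shape)
```

These forms avoid having to spell out the full shape tuple, which can be convenient when the length of the array is not known in advance.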


Conversely, we can reduce the dimensionality of an array by **reshaping it with a tuple that has fewer dimensions**, for example by dropping the dimensions of length 1. Of course, the number of elements implied by the new shape must still match the number of elements in the array.

In [8]:
# reducing the dimensionality of the array by dropping the length-one dimensions
reducing_new_shape = (10,)
x_one_dimensional_reduced = x_one_dimensional_expanded.reshape(reducing_new_shape)
print ("The one-dimensional array reshaped as {} is: \n {}".format(
    reducing_new_shape,
    x_one_dimensional_reduced
))


The one-dimensional array reshaped as (10,) is: 
 [ 1  2  3  4  5  6  7  8  9 10]


To prevent unnecessary shape complexity, NumPy offers the function **squeeze**, which removes all dimensions of length one.
This is often necessary when dealing with data that has redundant formats or dimensions.

In [9]:
# removing all dimensions of length one via np.squeeze
x_one_dimensional_squeezed = np.squeeze(x_one_dimensional_expanded)
print ("The array \n {} \nwith shape {} can be squeezed to the array \n {} \n with shape {}".format(
    x_one_dimensional_expanded,
    x_one_dimensional_expanded.shape,
    x_one_dimensional_squeezed,
    x_one_dimensional_squeezed.shape,
))

The array 
 [[[ 1  2  3  4  5  6  7  8  9 10]]] 
with shape (1, 1, 10) can be squeezed to the array 
 [ 1  2  3  4  5  6  7  8  9 10] 
 with shape (10,)
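
When only some of the length-one dimensions should be removed, `np.squeeze` accepts an `axis` argument. The following sketch uses an illustrative array `x` of shape (1, 10, 1), which is an assumption made for this example.

```python
import numpy as np

# illustrative array (an assumption for this sketch): shape (1, 10, 1)
x = np.arange(1, 11).reshape(1, 10, 1)

# remove every dimension of length one: shape (10,)
all_squeezed = np.squeeze(x)

# remove only the first dimension: shape (10, 1)
first_only = np.squeeze(x, axis=0)

print(all_squeezed.shape, first_only.shape)

# squeezing a dimension whose length is not one raises a ValueError
try:
    np.squeeze(x, axis=1)
    squeeze_failed = False
except ValueError as error:
    squeeze_failed = True
    print("Cannot squeeze axis 1: {}".format(error))
```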


## 2. Array Broadcasting

NumPy facilitates operations between arrays of different shapes by automatically stretching them to a common shape as needed.

This mechanism is called **array broadcasting**.

### 2.1 Broadcasting with scalars

One of the most commonly used broadcasting operations is the **multiplication of an array by a scalar**.

In [10]:
# multiplication between a scalar and an array
x_1d = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
scalar = 2
x_1d_multiplied = x_1d * scalar
print(
  "The result of multiplication between array \n {} and \n scalar {} is : \n {} ".format(
    x_1d,
    scalar,
    x_1d_multiplied
  )
)

The result of multiplication between array 
 [ 1  2  3  4  5  6  7  8  9 10] and 
 scalar 2 is : 
 [ 2  4  6  8 10 12 14 16 18 20] 


In the example above, the scalar 2 was broadcast to the array [2, 2, 2, 2, 2, 2, 2, 2, 2, 2] so that its shape matches that of the target array. This is equivalent to the code below:

In [11]:
# the same operation can be performed without broadcasting
scalar_1_broadcasted = np.full(10, 2)
x_1d_multiplied_explicit_broadcast = x_1d * scalar_1_broadcasted
print(x_1d_multiplied_explicit_broadcast)

[ 2  4  6  8 10 12 14 16 18 20]
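
The stretched operand can also be inspected with `np.broadcast_to`, which returns a read-only view instead of allocating a full array. This is a minimal sketch with illustrative variable names; it suggests why broadcasting does not need to copy the scalar ten times in memory.

```python
import numpy as np

# illustrative array (an assumption for this sketch): shape (10,)
x_1d = np.arange(1, 11)

# view of the scalar 2 stretched to the shape of x_1d
scalar_view = np.broadcast_to(2, x_1d.shape)
print(scalar_view)

# a stride of 0 means every position refers to the same stored value
print("strides: {}".format(scalar_view.strides))

# the view is read-only, since all positions share one value
print("writeable: {}".format(scalar_view.flags.writeable))

# multiplying by the view gives the same result as multiplying by the scalar
same_result = np.array_equal(x_1d * scalar_view, x_1d * 2)
```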


The broadcasting mechanism can also be used with multi-dimensional arrays and with other operations, such as exponentiation:

In [12]:
# using broadcasting with multi-dimensional arrays
x_2d = np.array(
      [[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])
scalar = 2
x_2d_power = x_2d ** scalar

print(
  "The elements of array: \n {} \n raised to the power of {} is: \n {} ".format(
    x_2d,
    scalar,
    x_2d_power
  )
)

The elements of array: 
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]] 
 raised to the power of 2 is: 
 [[    1     4     9    16    25    36    49    64    81   100]
 [  121   144   169   196   225   256   289   324   361   400]
 [  441   484   529   576   625   676   729   784   841   900]
 [  961  1024  1089  1156  1225  1296  1369  1444  1521  1600]
 [ 1681  1764  1849  1936  2025  2116  2209  2304  2401  2500]
 [ 2601  2704  2809  2916  3025  3136  3249  3364  3481  3600]
 [ 3721  3844  3969  4096  4225  4356  4489  4624  4761  4900]
 [ 5041  5184  5329  5476  5625  5776  5929  6084  6241  6400]
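 [ 6561  6724  6889  7056  7225  7396  7569  7744  7921  8100]
 [ 8281  8464  8649  8836  9025  9216  9409  9604  9801 10000]] 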

### 2.2 Broadcasting with arrays

Broadcasting also works between two arrays, not only between an array and a scalar.

The broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

* **Rule 1**: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
* **Rule 2**: If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other array's size.
* **Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
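
The three rules can be sketched as a small function operating on shape tuples. The helper `broadcast_shape` below is hypothetical and written only for illustration; it is not part of NumPy. Its result is compared against the shape reported by `np.broadcast` as a sanity check.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))

    # Rule 1: pad the shorter shape with ones on its left side
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b:
            result.append(size_a)
        elif size_a == 1:
            # Rule 2: stretch the dimension of size 1 to the other size
            result.append(size_b)
        elif size_b == 1:
            result.append(size_a)
        else:
            # Rule 3: sizes disagree and neither of them is 1
            raise ValueError(
                "shapes {} and {} cannot be broadcast".format(shape_a, shape_b)
            )
    return tuple(result)

print(broadcast_shape((2,), (2, 2)))
print(broadcast_shape((3, 1), (4,)))

# sanity check against the shape NumPy itself computes
numpy_shape = np.broadcast(np.empty((3, 1)), np.empty((4,))).shape
print(numpy_shape)
```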

Let's consider the following case of arrays with different dimensions:

In [13]:
# create a one dimensional array
lh_array = np.array([1,1])

# create a two dimensional array
rh_array = np.array([[2,3], [4,5]])

The left-hand array, with shape (2,), is first padded to shape (1, 2) (Rule 1) and then **broadcast** to the shape (2, 2) of the right-hand array by duplicating its single row (Rule 2).

![Array Broadcasting](https://raw.githubusercontent.com/INTERTECHNICA-BUSINESS-SOLUTIONS-SRL/CourseDataManipulationWithPython/main/Module%202%20-%20The%20Numpy%20Library/Session%202%20-%20NumPy%20Basics/images/broadcasting.png)

In [14]:
# display the result of adding the two arrays
result_array = rh_array + lh_array
print("The shape of the result array is: {}".format(result_array.shape))
print("The result array is: \n {}".format(result_array))

The shape of the result array is: (2, 2)
The result array is: 
 [[3 4]
 [5 6]]
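
Two further cases may help make the rules concrete: one where both arrays are stretched, and one where Rule 3 applies. The arrays and names below are illustrative assumptions and are not taken from the cells above.

```python
import numpy as np

# both operands are stretched: shape (3, 1) with shape (4,) gives shape (3, 4)
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
table = column + row
print("Shape {} + shape {} gives shape {}".format(
    column.shape, row.shape, table.shape
))
print(table)

# Rule 3: shape (3,) is padded to (1, 3); the last dimension is 3 versus 2
# and neither size is 1, so the addition raises a ValueError
try:
    np.array([1, 2, 3]) + np.array([[2, 3], [4, 5]])
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print("Broadcasting failed: {}".format(error))
```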
