# Exercise - NumPy Intro - `ndarray` Basics - SOLUTION

Let's practice working with NumPy `ndarray`s. You may find NumPy's [reference documentation](https://numpy.org/doc/stable/reference/arrays.html) useful.

In [1]:
import numpy as np

Create the input data array with the numbers `1` to `500_000_000`.

In [2]:
arr = np.arange(1, 500_000_001)
arr

array([        1,         2,         3, ..., 499999998, 499999999,
       500000000])

Calculate how large the array is in GB with `nbytes`.

In [3]:
arr.nbytes / 1e9

4.0
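
The 4.0 GB result assumes NumPy's default integer dtype is 64-bit (8 bytes per element), which is typical but platform dependent. As a minimal sketch, the relationship between `nbytes`, `size`, and `itemsize` can be checked on a much smaller stand-in array (the name `small` is illustrative, not part of the exercise):

```python
import numpy as np

# Small stand-in for the 500-million-element array. The dtype is set
# explicitly so the result does not depend on the platform default.
small = np.arange(1, 1_001, dtype=np.int64)

# nbytes is the element count multiplied by the bytes per element.
assert small.nbytes == small.size * small.itemsize

small_bytes = small.nbytes                   # 1000 elements * 8 bytes
half_bytes = small.astype(np.int32).nbytes   # 1000 elements * 4 bytes
print(small_bytes, half_bytes)
```

Choosing a narrower dtype such as `int32` halves the memory footprint, provided the values fit in that type.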

How many dimensions does the array have?

In [4]:
arr.ndim  # `len(arr.shape)` gives the same result but is more verbose.

1

How many elements does the array have?

In [5]:
arr.size  # For a 1D array, `arr.shape[0]` gives the same result; in general, `arr.size` is the product of the lengths of all dimensions.

500000000

What is the shape of the array?

In [6]:
arr.shape

(500000000,)
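
For a 1D array, `ndim`, `size`, and `shape` carry nearly the same information, so the difference is easier to see in two dimensions. A small sketch, using a hypothetical 3-by-4 array named `grid`:

```python
import numpy as np

grid = np.zeros((3, 4))   # hypothetical small 2D example

ndim = grid.ndim          # number of axes
shape = grid.shape        # length along each axis
size = grid.size          # total number of elements

# size is the product of the lengths in shape.
assert size == int(np.prod(shape))
# len() only reports the length of the first axis.
assert len(grid) == shape[0]
print(ndim, shape, size)
```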

Create a new array with `5_000_000` elements containing equally spaced values between `0` and `1000` (inclusive).

In [7]:
arr = np.linspace(0, 1000, 5_000_000, endpoint=True)
arr

array([0.0000000e+00, 2.0000004e-04, 4.0000008e-04, ..., 9.9999960e+02,
       9.9999980e+02, 1.0000000e+03])
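
The spacing in the output is slightly more than `2e-04` because, with the endpoint included, the step is `1000 / (5_000_000 - 1)`. A smaller sketch with 5 points makes this easier to see; `retstep=True` asks `np.linspace` to return the step as well:

```python
import numpy as np

# With endpoint=True (the default), the step is (stop - start) / (num - 1).
pts, step = np.linspace(0, 1000, 5, endpoint=True, retstep=True)
print(pts, step)

# With endpoint=False, the stop value is excluded and the step is
# (stop - start) / num.
open_pts = np.linspace(0, 1000, 5, endpoint=False)
print(open_pts)
```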

Create a random array that is `10_000` by `5_000`.

In [8]:
arr = np.random.rand(10_000, 5_000)
arr

array([[0.47874424, 0.20358231, 0.02455485, ..., 0.51265253, 0.56159225,
        0.48772361],
       [0.63589186, 0.73143007, 0.79031565, ..., 0.94718263, 0.57647135,
        0.59946376],
       [0.26044923, 0.55745457, 0.25562921, ..., 0.03955561, 0.56221741,
        0.13027776],
       ...,
       [0.30329327, 0.17706943, 0.58210662, ..., 0.55591102, 0.4307622 ,
        0.29244399],
       [0.56156033, 0.50636521, 0.04072725, ..., 0.23903848, 0.35491201,
        0.62274408],
       [0.06134186, 0.60138429, 0.55531572, ..., 0.40086822, 0.19394324,
        0.06455707]])

Sort that array. By default, `np.sort` sorts along the last axis, so each row is sorted independently.

In [9]:
arr = np.sort(arr)
arr

array([[4.99030364e-05, 2.86984725e-04, 3.18596066e-04, ...,
        9.98660673e-01, 9.99031975e-01, 9.99757980e-01],
       [1.79366466e-04, 9.69908190e-04, 1.25822951e-03, ...,
        9.99022404e-01, 9.99292393e-01, 9.99988173e-01],
       [2.89258301e-04, 5.15617330e-04, 6.58856451e-04, ...,
        9.98459866e-01, 9.99097697e-01, 9.99530669e-01],
       ...,
       [2.92109349e-04, 5.41142713e-04, 1.33981200e-03, ...,
        9.99749901e-01, 9.99919843e-01, 9.99970247e-01],
       [9.49118403e-05, 5.76264566e-04, 7.49589571e-04, ...,
        9.99181024e-01, 9.99755446e-01, 9.99899431e-01],
       [2.43817661e-04, 3.21013669e-04, 5.35478917e-04, ...,
        9.99387768e-01, 9.99612477e-01, 9.99674279e-01]])
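
Each row in the output above is sorted on its own because `np.sort` uses `axis=-1` unless told otherwise. A small sketch of the three common choices for `axis`, using a hypothetical 2-by-3 array `m`:

```python
import numpy as np

m = np.array([[3, 1, 2],
              [0, 7, -1]])

by_row = np.sort(m)             # default axis=-1: sort within each row
by_col = np.sort(m, axis=0)     # sort within each column
flat = np.sort(m, axis=None)    # flatten first, then sort every value

print(by_row)
print(by_col)
print(flat)
```

`np.sort` returns a sorted copy, whereas `arr.sort()` sorts the array in place and returns `None`.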

Reshape the array to have the last dimension of length `5`.

In [10]:
arr = arr.reshape((-1, 5))
# -1 tells NumPy to infer that dimension's length from the total size and the other dimensions. `arr.reshape((10_000_000, 5))` is equivalent here.
arr

array([[4.99030364e-05, 2.86984725e-04, 3.18596066e-04, 3.85289129e-04,
        7.26262284e-04],
       [8.93863574e-04, 9.69600656e-04, 9.99128170e-04, 1.01126102e-03,
        1.16957942e-03],
       [1.38175632e-03, 1.46086187e-03, 1.48947347e-03, 1.68783114e-03,
        2.08555933e-03],
       ...,
       [9.97567090e-01, 9.97571545e-01, 9.97690733e-01, 9.97865478e-01,
        9.97983925e-01],
       [9.98144152e-01, 9.98516738e-01, 9.98521903e-01, 9.98622870e-01,
        9.98906715e-01],
       [9.99008691e-01, 9.99283477e-01, 9.99387768e-01, 9.99612477e-01,
        9.99674279e-01]])
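
As a rough illustration of how `-1` is resolved, the sketch below reshapes a small stand-in array (the names `a` and `b` are illustrative). It also shows that a target shape that does not divide the element count evenly is rejected:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)   # small stand-in for the (10_000, 5_000) array
b = a.reshape((-1, 2))            # NumPy infers 20 / 2 = 10 rows

assert b.shape == (10, 2)
# Elements keep their row-major order, so the first new row holds the
# first two elements of the old first row.
assert b[0].tolist() == [0, 1]

# 20 elements cannot be split into rows of 3, so this raises ValueError.
raised = False
try:
    a.reshape((-1, 3))
except ValueError:
    raised = True
print(b.shape, raised)
```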

Find the sum of each row. Rows are indexed along axis 0, but summing a row adds up its values across the columns, which collapses axis 1, so pass `axis=1`.

In [11]:
arr_sum = np.sum(arr, axis=1) # You could also write `arr.sum(axis=1)`.
arr_sum

array([1.76703524e-03, 5.04343284e-03, 8.10548213e-03, ...,
       4.98867877e+00, 4.99271238e+00, 4.99696669e+00])
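
The `axis` argument names the axis that is collapsed by the reduction, not the axis that is kept. A small sketch on a hypothetical 2-by-3 array `m`:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

row_sums = m.sum(axis=1)   # collapses the columns: one value per row
col_sums = m.sum(axis=0)   # collapses the rows: one value per column

print(row_sums, row_sums.shape)
print(col_sums, col_sums.shape)
```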

Using broadcasting, normalize each row of the reshaped array by dividing it by the row sum you just computed.

In [12]:
arr_normalized = arr / arr_sum[:, np.newaxis]
arr_normalized

array([[0.02824111, 0.1624103 , 0.18029978, 0.2180427 , 0.41100611],
       [0.17723317, 0.19225014, 0.19810478, 0.20051046, 0.23190146],
       [0.17047182, 0.18023134, 0.18376124, 0.20823328, 0.25730232],
       ...,
       [0.19996619, 0.19996708, 0.19999098, 0.200026  , 0.20004975],
       [0.19992022, 0.19999485, 0.19999588, 0.2000161 , 0.20007295],
       [0.19992302, 0.19997801, 0.19999889, 0.20004385, 0.20005622]])
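
The `np.newaxis` index is needed because broadcasting aligns shapes from the trailing dimension: an `(N, 5)` array divided by an `(N,)` array does not line up, whereas `(N, 5)` divided by `(N, 1)` does. A minimal sketch on a hypothetical 2-by-3 array, which also shows `keepdims=True` as an alternative way to get the `(N, 1)` shape:

```python
import numpy as np

m = np.array([[1.0, 1.0, 2.0],
              [2.0, 3.0, 5.0]])      # shape (2, 3)
sums = m.sum(axis=1)                 # shape (2,)

# (2, 3) / (2, 1): each row is divided by its own sum.
normalized = m / sums[:, np.newaxis]

# keepdims=True keeps the summed axis with length 1, so no reindexing is needed.
same = m / m.sum(axis=1, keepdims=True)

# (2, 3) / (2,): trailing lengths 3 and 2 do not match, so this fails.
mismatch = False
try:
    m / sums
except ValueError:
    mismatch = True
print(normalized, mismatch)
```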

Verify that your normalized array is actually normalized by checking that every row sums to 1.

In [13]:
np.testing.assert_allclose(np.sum(arr_normalized, axis=1), 1.0)
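
The cell above produces no output because `np.testing.assert_allclose` returns `None` when the check passes and raises `AssertionError` otherwise. It compares within a tolerance (by default `rtol=1e-07`), which matters here because floating-point row sums are rarely exactly `1.0`. A small sketch with made-up values:

```python
import numpy as np

# Tiny rounding errors of the kind produced by floating-point sums.
row_totals = np.array([1.0, 1.0 + 1e-9, 1.0 - 1e-9])

# Passes silently: every value is within the default relative tolerance.
np.testing.assert_allclose(row_totals, 1.0)

# A value that is off by 1% is well outside the tolerance.
detected = False
try:
    np.testing.assert_allclose(np.array([1.0, 1.01]), 1.0)
except AssertionError:
    detected = True
print(detected)
```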