## Reshape and view

> __`.reshape(x, y, z)` will change the way we access our array__

It is important to note that:
- reshape __USUALLY DOES NOT COPY UNDERLYING DATA__ (it is merely changing `strides` and the way we access it)
- __COPY OF `np.ndarray`s IS USUALLY NOT DONE__ (unless necessary)
- It almost never creates any problem for us (as long as we're working with `numpy` reasonably)

An array that shares memory with the original (no copy) is called a __`view`__, while an array with its own memory is called a __`copy`__.

![](./images/numpy_copy_view.png)

What does "working reasonably" mean?
- __After reshaping DON'T CHANGE ELEMENTS IN PLACE IN EITHER OF THE VIEWS__
- Use them in a "functional" manner, returning new objects (e.g. addition after reshape)
- See examples below

In [35]:
# elements 0-17 reshaped into shape (3, 2, 3)

arr = np.arange(18)

print(arr.shape, arr.strides)

reshaped = arr.reshape(3, 2, -1)

print(reshaped.shape, reshaped.strides)

print(f"Sharing underlying memory: {np.may_share_memory(arr, reshaped)}")

(18,) (8,)
(3, 2, 3) (48, 24, 8)
Sharing underlying memory: True
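To make the `strides` output above more concrete, here is a minimal sketch of how an index into the reshaped array maps back to the flat array. It assumes 8-byte integers (`dtype=np.int64` is set explicitly, since the default integer size can differ between platforms); the index `(1, 0, 1)` is just an illustrative choice.

```python
import numpy as np

arr = np.arange(18, dtype=np.int64)  # 8 bytes per element
reshaped = arr.reshape(3, 2, 3)      # strides: (48, 24, 8)

# Byte offset of reshaped[i, j, k] is i * 48 + j * 24 + k * 8
i, j, k = 1, 0, 1
byte_offset = sum(index * stride for index, stride in zip((i, j, k), reshaped.strides))
flat_index = byte_offset // arr.itemsize

print(reshaped.strides)                      # (48, 24, 8)
print(flat_index)                            # 7
print(arr[flat_index] == reshaped[i, j, k])  # True
```

Both arrays read the same bytes; only the arithmetic used to find an element differs.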


In [36]:
# Will change both arrays
arr[7] = 99999.

print(arr)
print(reshaped)

[    0     1     2     3     4     5     6 99999     8     9    10    11
    12    13    14    15    16    17]
[[[    0     1     2]
  [    3     4     5]]

 [[    6 99999     8]
  [    9    10    11]]

 [[   12    13    14]
  [   15    16    17]]]
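The cells above show the view case. As a rough sketch of the other case, the snippet below shows one situation where `reshape` has to copy (flattening a transposed, non-contiguous array) and how to request an independent copy explicitly with `.copy()`. Variable names are illustrative only.

```python
import numpy as np

arr = np.arange(18)

# Contiguous array: reshape can return a view
view = arr.reshape(3, 6)
print(np.shares_memory(arr, view))  # True

# The transpose is still a view, but it is no longer C-contiguous,
# so flattening it cannot be expressed with strides alone
transposed = view.T            # (6, 3)
flat = transposed.reshape(-1)  # (18,), numpy has to copy here
print(np.shares_memory(arr, flat))  # False

# Explicit copy: safe to modify without touching arr
independent = arr.reshape(3, 6).copy()
independent[0, 0] = -1
print(arr[0])  # 0
```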


In [62]:
# Correct usage, will not change underlying memory
# View will be used to multiply values within X1

X1 = np.random.randn(128, 10)

X2 = np.random.rand(1280)

X1 * X2.reshape(X1.shape)

array([[ 0.06607302, -0.03757907, -0.22012916, ..., -0.33735095,
         0.12948773, -0.32322123],
       [ 0.40553952, -0.10047234, -0.07102722, ..., -0.2595114 ,
         0.29522957,  0.04071672],
       [-0.11908935, -0.5060945 , -0.01849313, ...,  0.45426441,
         0.13575486,  0.35651088],
       ...,
       [-0.19608282,  0.14517763,  1.42295067, ...,  0.06966209,
         1.05039197,  0.7748389 ],
       [ 0.96215246, -0.0456957 ,  0.47949846, ...,  0.33081944,
        -0.20928297,  0.37043871],
       [ 0.84462781, -0.59371553,  0.01023492, ...,  1.03324647,
        -0.1447829 ,  0.06648208]])

## -1 in reshape

> `-1` is used in order to __infer__ missing dimensionality

It is pretty useful when:
- __we don't know some dimension beforehand__
- __we write a function that has to work independently of some dimension__

Let's see a dummy example:

In [10]:
np.random.randn(5, 6, 8).reshape(-1, 10).shape

(24, 10)

In [8]:
def make_second_dimension_10(array):
    assert array.size % 10 == 0, "Number of array elements has to be divisible by 10"
    return array.reshape(-1, 10)


print(make_second_dimension_10(np.random.randn(5, 6, 8)).shape)
make_second_dimension_10(np.random.randn(120)).shape

(24, 10)


(12, 10)
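Another common use of `-1`, sketched below under the assumption that the first dimension indexes samples: keep that dimension and flatten everything else, regardless of how many other dimensions there are. The function name and the example shapes are hypothetical.

```python
import numpy as np

def flatten_samples(batch):
    """Keep the first (batch) dimension and flatten all remaining ones."""
    # batch: (batch_size, d1, d2, ...) -> (batch_size, d1 * d2 * ...)
    return batch.reshape(batch.shape[0], -1)

images = np.zeros((32, 28, 28))     # hypothetical batch of 32 images
volumes = np.zeros((4, 3, 16, 16))  # different batch size and number of dimensions

print(flatten_samples(images).shape)   # (32, 784)
print(flatten_samples(volumes).shape)  # (4, 768)
```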

# Broadcasting

After explaining `fancy indexing` and `reshape`, let's take a look at a third, powerful feature of `numpy`:

> __Broadcasting means automatic expansion of a smaller array to match a larger one__

![](./images/numpy_broadcasting.png)

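Looking at the picture above:
- __Arrays have to be expandable__. Shapes are compared starting from the last (trailing) dimension and missing leading dimensions are treated as `1`, e.g.:
    - `(3, 10)` and `(10,)` work: the second one is treated as `(1, 10)` and expanded to `(3, 10)`
    - `(3, 10)` and `(3,)` __WILL NOT WORK__ as the last dimensions (`10` and `3`) do not match
    - We have to reshape the latter to `(3, 1)`, so the dimension of size `1` will be expanded to `10`
- __Dimensions have to match__ or be equal to `1` (example above)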

Let's see a few examples:

In [6]:
import numpy as np
(np.array([[1], [2], [3]]) * np.array([[1, 2]])).shape

(3, 2)

In [106]:
# Broadcasting for both arrays

arr1 = np.random.randn(10, 3)
arr2 = np.random.randn(10, 5)

result = arr1.reshape(-1, 1, 3) * arr2.reshape(10, -1, 1)
result.shape

(10, 5, 3)
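To check what the broadcasted product above actually contains, here is a small sketch comparing it with an explicit `np.einsum` formulation. The index letters (`n`, `i`, `j`) are arbitrary labels chosen for this illustration.

```python
import numpy as np

arr1 = np.random.randn(10, 3)
arr2 = np.random.randn(10, 5)

# (10, 1, 3) * (10, 5, 1) -> (10, 5, 3)
result = arr1.reshape(-1, 1, 3) * arr2.reshape(10, -1, 1)

# Same computation spelled out: result[n, i, j] = arr2[n, i] * arr1[n, j]
explicit = np.einsum("nj,ni->nij", arr1, arr2)

print(result.shape)                   # (10, 5, 3)
print(np.allclose(result, explicit))  # True
```

Each of the 10 rows gets its own `(5, 3)` outer product, computed without any Python loop.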

In [110]:
# Will not work
a = np.random.randn(1, 10)
b = np.random.randn(3)

a + b

ValueError: operands could not be broadcast together with shapes (1,10) (3,) 
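A minimal sketch of why this addition fails and one way to make it work: the trailing dimensions (`10` and `3`) do not match, so `numpy` raises a `ValueError`. Turning `b` into a column of shape `(3, 1)` makes both arrays expandable.

```python
import numpy as np

a = np.random.randn(1, 10)  # (1, 10)
b = np.random.randn(3)      # (3,)

try:
    a + b  # trailing dimensions 10 and 3 do not match
except ValueError as error:
    print(f"Broadcasting failed: {error}")

# Make b a column: (1, 10) + (3, 1) -> (3, 10)
result = a + b.reshape(3, 1)
print(result.shape)  # (3, 10)
```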

In [25]:
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([0, 2, 0]).reshape(3, 1)

x * y

array([[ 0,  0,  0],
       [ 8, 10, 12],
       [ 0,  0,  0]])

In [111]:
a = np.random.randn(3, 3)
b = np.random.randn(3)

a - b

array([[-0.31557747, -3.0836088 , -2.65585039],
       [ 1.21277083, -2.12886569,  0.18391569],
       [-1.09073187, -0.96382431, -2.76317141]])
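With random numbers it is hard to see which way `b` was expanded. The sketch below repeats the same kind of subtraction with small integers: a `(3,)` array is treated as `(1, 3)`, so it lines up with the columns. To line it up with the rows instead, it has to be given shape `(3, 1)`, here via `np.newaxis`.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)  # (3, 3)
b = np.array([10, 20, 30])      # (3,)

# b is treated as (1, 3): b[j] is subtracted from column j of every row
per_column = a - b
# b[:, np.newaxis] is (3, 1): b[i] is subtracted from every element of row i
per_row = a - b[:, np.newaxis]

print(per_column[0])  # [-10 -19 -28]
print(per_row[0])     # [-10  -9  -8]
```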

# Working with shapes

`numpy` is a library which allows us to work with `N`-dimensional arrays.

Due to that, we should try to __think in terms of shapes__, not in terms of specific elements.

Throughout the course you will often see (also today) that we will define many tasks in terms of __dimensions__ and __what each dimension represents__.


An example could be data of shape `(users, movies)` which specifies:
- Ratings given for a movie
- One row for every user
- One column for every movie

Visually (assume `?` are equal to zero):

![](./images/numpy_example_matrix.png)

Let's create such data and see operations one can do on it:

In [4]:
import numpy as np

users = 24
movies = 10

data = np.random.randint(0, 11, size=(users, movies)) # upper bound 11 is exclusive, so the maximum score is 10

data

array([[ 8,  0,  9,  7, 10,  6,  0,  7,  1,  3],
       [ 4,  8,  0,  2,  3, 10,  6,  8,  4,  5],
       [ 4,  5,  0,  6, 10,  3,  6,  6,  5,  4],
       [ 6,  2,  4,  9, 10,  4,  5,  2,  6,  7],
       [ 1,  2,  8,  0, 10,  2,  4,  8,  1, 10],
       [ 9, 10,  7,  8,  5,  4,  6,  4,  8,  9],
       [10,  3,  6,  7,  6,  9,  9,  4,  1,  1],
       [ 5,  3,  0,  9,  5,  2,  2, 10,  8,  0],
       [ 9,  8,  5,  2,  2,  7,  3,  6,  3,  4],
       [ 2, 10,  6,  3,  8,  1,  7,  1, 10,  4],
       [ 9,  3,  0,  8,  1,  4,  5, 10,  0,  4],
       [ 1,  4,  1,  1,  7,  5,  6,  5,  4,  9],
       [ 3,  9,  6,  6,  6,  4,  5,  6,  6, 10],
       [ 3,  3,  1,  8,  8,  6,  5,  3,  3,  8],
       [10,  7,  5,  9,  8,  6,  4,  8,  5,  1],
       [ 6,  4, 10,  0,  8,  4,  1,  4,  0,  8],
       [ 5, 10, 10,  4,  3, 10,  2,  3, 10,  1],
       [ 2,  4, 10,  6, 10,  3, 10,  0,  7,  5],
       [ 9,  8,  1,  9,  3,  9,  9,  5,  9,  9],
       [ 3,  9,  1,  2,  4,  9,  2,  7,  9,  1],
       [10,  8,  7, 

__Please notice__:
- If we just look at the numbers alone, they do not convey too much information
- If, instead, we think about what the dimensions represent, we can more easily reason about various operations

> __Most of `numpy` math (and not only math) operations allow us to specify an `axis` argument__

> __`axis` allows us to carry out an operation across a specific dimension__

__TIPS:__

- __WRITE DATA SHAPES IN CODE COMMENTS AS YOU APPLY TRANSFORMATIONS__
- __DIMENSION ACROSS WHICH WE CARRY OUT THE OPERATION IS OFTEN REMOVED__
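A short sketch of the second tip, using randomly generated ratings as a stand-in for real data (the seed and shapes are arbitrary). Reducing over an axis removes it from the shape, unless we pass `keepdims=True`, which keeps it with size `1` so the result can be broadcast back against the original array.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 11, size=(24, 10))  # (users, movies)

per_user = data.sum(axis=1)    # (users,)  movies dimension removed
per_movie = data.sum(axis=0)   # (movies,) users dimension removed

# keepdims=True keeps the reduced dimension with size 1
user_means = data.mean(axis=1, keepdims=True)  # (users, 1)

# (users, movies) - (users, 1) -> (users, movies)
centered = data - user_means

print(per_user.shape, per_movie.shape, user_means.shape, centered.shape)
```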



Let's see how one could __find the average rating for each user__:

In [5]:
# data: (users, movies)

# total_ratings: (users,)
total_ratings = data.sum(axis=1) # sum over the movies axis (across columns)

# mean_ratings: (users,)
mean_ratings = total_ratings / data.shape[1] # divide by total number of available movies

mean_ratings

array([5.1, 5. , 4.9, 5.5, 4.6, 7. , 5.6, 4.4, 4.9, 5.2, 4.4, 4.3, 6.1,
       4.8, 6.3, 4.5, 5.8, 5.7, 7.1, 4.7, 4.8, 3.4, 6.1, 4. ])

Average rating for a movie (__almost the same as previously, just changing dimensions!__):

In [6]:
# data: (users, movies)

# total_ratings: (movies,)
total_ratings = data.sum(axis=0) # sum over the users axis (across rows)

# mean_ratings: (movies,)
mean_ratings = total_ratings / data.shape[0] # divide by the total number of users who rated the movie

mean_ratings

array([5.375     , 5.29166667, 4.70833333, 5.5       , 6.08333333,
       4.875     , 4.75      , 5.20833333, 4.91666667, 5.04166667])
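The sum-then-divide steps above can also be written with `mean` and the same `axis` argument. A minimal sketch, again using randomly generated stand-in data with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 11, size=(24, 10))  # (users, movies)

# Per-movie average, computed both ways
manual = data.sum(axis=0) / data.shape[0]  # (movies,)
direct = data.mean(axis=0)                 # (movies,)

print(np.allclose(manual, direct))  # True
print(data.mean(axis=1).shape)      # (24,) per-user average
```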

Highest rating given to any movie by each user:

In [7]:
data.max(axis=1)

array([10, 10, 10, 10, 10, 10, 10, 10,  9, 10, 10,  9, 10,  8, 10, 10, 10,
       10,  9,  9, 10,  9, 10, 10])

Which movie (__movie index__) got the lowest score for each user:

In [9]:
data.argmin(axis=1)

array([1, 2, 2, 1, 3, 5, 8, 2, 3, 5, 2, 0, 0, 2, 9, 3, 9, 7, 2, 2, 5, 0,
       8, 1])

And which movie was most often the lowest-rated one across all users:

In [15]:
# Movie which got the lowest score per-user

lowest = data.argmin(axis=1) # (users,)

# Calculate how often each movie index occurred as the lowest-rated one
# minlength specifies number of entries (10 in our case as there are 10 movies)

counts = np.bincount(lowest, minlength=data.shape[1]) # (movies,)

# Get the movie which was the lowest-rated one most frequently:

np.argmax(counts) # scalar

2
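The same pipeline is easier to follow on a tiny hand-made matrix. The sketch below uses made-up ratings for 4 users and 3 movies. Note that `argmin` returns the first index when a row contains ties.

```python
import numpy as np

# Hypothetical ratings: (users, movies) = (4, 3)
small = np.array([
    [5, 1, 3],
    [2, 0, 9],
    [7, 8, 1],
    [4, 2, 6],
])

lowest = small.argmin(axis=1)                           # (users,)
counts = np.bincount(lowest, minlength=small.shape[1])  # (movies,)
worst_movie = np.argmax(counts)                         # scalar

print(lowest)       # [1 1 2 1]
print(counts)       # [0 3 1]
print(worst_movie)  # 1
```

Movie `1` was the lowest-rated one for three out of four users, so it wins the count.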

# Key Takeaways

- `reshape` changes the way we access arrays and usually does not copy the underlying data. It simply changes `strides` and the way we access it
- A `view` is an array that shares memory with the original one, so no data is copied
- A `copy` is an array with its own memory. It is created only when necessary, or when we want to physically copy the data
- Using `-1` in `reshape` infers the missing dimension. It's especially useful when we don't know some dimension beforehand
- Broadcasting is a `numpy` feature which automatically expands smaller arrays to match larger ones
- It's suggested to think of data stored in arrays in terms of shapes, rather than specific elements
