# [0.0] - Prerequisites (exercises)

<img src="https://raw.githubusercontent.com/info-arena/ARENA_img/main/misc/headers/header-00.png" width="350">

# Introduction

This page contains important prerequisites for the rest of the material in this chapter. The first page **Core Concepts / Knowledge** is a long list of important concepts and libraries that you should be familiar with before starting the rest of the material. The second page **Einops, Einsum & Tensor Manipulation** provides some coding exercises to help you get familiar with the `einops` library and its `einsum` function, as well as some exercises on indexing and other aspects of tensor manipulation. We anticipate adding more exercises here in the future, in the style of the second page.

See the homepage for more instructions on how to set up your environment, and install all the dependencies you need to run the exercises.

## Content & Learning Objectives

### 1️⃣ Core Concepts / Knowledge

This section doesn't have exercises; it goes through a list of important concepts with associated reading material and questions to check your understanding. We cover all the core material that you'll need for the rest of the course such as neural networks, linear algebra, probability, and calculus.

> ##### Learning Objectives
>
> - Understand the structure and function of neural networks
> - Learn essential linear algebra concepts like matrix operations and transformations
> - Understand core principles of probability and statistics, including expected value and variance
> - Learn how calculus concepts (particularly differentiation) are applied in optimization tasks
> - Cover some foundational information theory concepts, such as entropy and KL divergence
> - Enhance Python programming skills, focusing on NumPy and PyTorch basics

### 2️⃣ Einops, Einsum & Tensor Manipulation

These sections will introduce you to a third party library which you'll find very useful for the rest of this course, and will also require you to engage with various tensor operations (something else that will come up repeatedly during this course).

> ##### Learning Objectives
>
> - Understand the basics of Einstein summation convention
> - Learn how to use `einops` to perform basic tensor rearrangement, and `einsum` to perform standard linear algebra operations on tensors

# 1️⃣ Core Concepts / Knowledge

> ##### Learning Objectives
>
> - Understand the structure and function of neural networks
> - Learn essential linear algebra concepts like matrix operations and transformations
> - Understand core principles of probability and statistics, including expected value and variance
> - Learn how calculus concepts (particularly differentiation) are applied in optimization tasks
> - Cover some foundational information theory concepts, such as entropy and KL divergence
> - Enhance Python programming skills, focusing on NumPy and PyTorch basics

This page contains a list of all prerequisites we think will be helpful to learn before studying the ARENA program material. You can return to this page while you study. None of it is compulsory and some resources are likely to be much more helpful than others. We denote very high and high-priority resources with a double and single asterisk respectively, so if you have limited time then prioritise these. It is strongly recommended to at least read over everything with a double asterisk. Also, you should try and prioritise areas you think you might be weaker in than others (for instance, if you have a strong SWE background but less maths experience then you might want to spend more time on the maths sections). You can also return to this document throughout the programme, if there are any areas you want to brush up on.

The content is partially inspired by a similar doc handed out by Redwood to participants before the start of MLAB, as well as by pre-prerequisite material provided by Jacob Hilton on his [GitHub page](https://github.com/jacobhilton/deep_learning_curriculum).

> Throughout this document, there are some questions thrown in (denoted with boxes like these). It's not essential to answer all of these questions, but you should be able to answer most (or at least know how you might go about answering them).

If you are reading this, and know of any good material for these topics which we've missed out, please let us know and we might be able to add it in!

## Maths

### Neural Networks**

We won't assume any deep knowledge of neural networks or machine learning before the programme starts, but it's useful to have an idea of the basics so that the first week doesn't have quite as steep a learning curve. The best introductory resources here are 3B1B's videos on neural networks:

* [But what is a neural network? | Chapter 1, Deep learning](https://www.youtube.com/watch?v=aircAruvnKk)
* [Gradient descent, how neural networks learn | Chapter 2, Deep learning](https://www.youtube.com/watch?v=IHZwWFHWa-w)
* [What is backpropagation really doing? | Chapter 3, Deep learning](https://www.youtube.com/watch?v=Ilg3gGewQ5U)

You should prioritise the first two videos in this sequence.

### Linear Algebra**

Linear algebra lies at the core of a lot of machine learning.

<img src = "https://lh3.googleusercontent.com/lc5-ykchkkSylhM77XrLuCzfthg8bEtdUTV5EUcArilVKPlh9NZQDWkVnp55s35HftZAWdmyVmh3h5g83HtkOscn1g7x4wS5JO8s4mujPV8N2RbBqoCKiY0K_lVRJgVj7zuSyqPV4SIkbsjaWUQkqJY" width="360">

Here is a list of topics you should be comfortable with:

*   Linear transformations - what they are, and why they are important
    *   See [this](https://www.youtube.com/watch?v=kYB8IZa5AuE) video from 3B1B
*   How [matrix multiplication works](http://mlwiki.org/index.php/Matrix-Matrix_Multiplication)
*   Basic matrix properties: rank, trace, determinant, transpose
*   Bases, and basis transformations
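
If you'd like to check your understanding of these properties numerically, here is a minimal sketch using NumPy. The matrices `A` and `B` and the vector `v` are made-up examples (not taken from any of the linked resources), chosen so that `A` is deliberately rank-deficient.

```python
import numpy as np

# A small 3x3 matrix whose third row is the sum of the first two,
# so its rows are linearly dependent.
A = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
    [1.0, 3.0, 3.0],
])

rank = np.linalg.matrix_rank(A)  # number of linearly independent rows/columns
trace = np.trace(A)              # sum of the diagonal entries
det = np.linalg.det(A)           # zero (up to float error) because A is rank-deficient
A_T = A.T                        # transpose: rows become columns

print("rank:", rank)
print("trace:", trace)
print("det is (numerically) zero:", abs(det) < 1e-9)

# Matrix multiplication corresponds to composing linear transformations:
# applying B and then A to a vector is the same as applying the single matrix A @ B.
B = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])  # rotation by 90 degrees about the z-axis
v = np.array([1.0, 2.0, 3.0])
composition_matches = np.allclose(A @ (B @ v), (A @ B) @ v)
print("composition matches:", composition_matches)
```

A full-rank square matrix would instead have a non-zero determinant, which is one way to see the link between rank and invertibility.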

Some other non-essential topics that we also weakly recommend:

*   [Singular value decomposition](https://www.lesswrong.com/posts/iupCxk3ddiJBAJkts/six-and-a-half-intuitions-for-svd)
*   Eigenvalues and eigenvectors
*   Different types of matrix, and their significance (e.g. symmetric, orthogonal, identity, rotation matrices)

[This video series](https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab) by 3B1B provides a good overview of these core topics (although you can probably skip it if you already have a reasonably strong mathematical background).

If you have a lot more time, [Linear Algebra Done Right](https://link.springer.com/book/10.1007/978-3-319-11080-6) is the canonical textbook for covering this topic (although it will probably cover much more than you need to know). Alternatively, Neel Nanda has two [YouTube](https://www.youtube.com/watch?v=GkPhwnvRe-8) [videos](https://www.youtube.com/watch?v=0EB23unfLSU) covering linear algebra extensively.

### Probability & Statistics**

It's essential to understand the rules of probability, expected value and standard deviation, and helpful to understand independence and the normal distribution.

[This link](https://medium.com/jun94-devpblog/prob-stats-3-expected-value-variance-and-standard-deviation-bce9303d8da8) covers some of the essentials.
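
As a quick self-check, here is a small sketch which computes the expected value, variance and standard deviation of a fair six-sided die directly from the definitions. The die is just an illustrative example, and the variable names are our own.

```python
# Outcomes and probabilities for a fair six-sided die
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# E[X] = sum over outcomes of p(x) * x
expected_value = sum(p * x for p, x in zip(probs, outcomes))

# Var[X] = E[(X - E[X])^2]
variance = sum(p * (x - expected_value) ** 2 for p, x in zip(probs, outcomes))

# The standard deviation is the square root of the variance
std_dev = variance ** 0.5

# Equivalent formula for the variance: Var[X] = E[X^2] - E[X]^2
second_moment = sum(p * x**2 for p, x in zip(probs, outcomes))
variance_alt = second_moment - expected_value**2

print(f"E[X] = {expected_value:.4f}")
print(f"Var[X] = {variance:.4f} (alternative formula: {variance_alt:.4f})")
print(f"std = {std_dev:.4f}")
```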

### Calculus**

It's essential to understand differentiation and partial differentiation, and helpful to understand the basics of vector calculus including the chain rule and Taylor series.

Again, 3Blue1Brown has a good [video series](https://www.3blue1brown.com/topics/calculus) on this.
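
One way to test your understanding of partial derivatives and the chain rule is to compare hand-derived formulas against finite differences. The sketch below does this for a made-up function `f(x, y) = x²y + sin(y)`. The function, the evaluation points and the helper `central_diff` are all illustrative choices of ours.

```python
import math


def f(x, y):
    return x**2 * y + math.sin(y)


def df_dx(x, y):
    # Partial derivative with respect to x (treating y as a constant)
    return 2 * x * y


def df_dy(x, y):
    # Partial derivative with respect to y (treating x as a constant)
    return x**2 + math.cos(y)


def central_diff(func, a, eps=1e-6):
    # Numerical estimate of the derivative of a one-variable function at a
    return (func(a + eps) - func(a - eps)) / (2 * eps)


x0, y0 = 1.5, 0.7
numeric_dx = central_diff(lambda x: f(x, y0), x0)
numeric_dy = central_diff(lambda y: f(x0, y), y0)
print(df_dx(x0, y0), numeric_dx)
print(df_dy(x0, y0), numeric_dy)

# Chain rule: if g(t) = f(x(t), y(t)) with x(t) = 3t and y(t) = t^2, then
# dg/dt = df/dx * dx/dt + df/dy * dy/dt = df/dx * 3 + df/dy * 2t


def g(t):
    return f(3 * t, t**2)


t0 = 0.5
chain_rule_dg = df_dx(3 * t0, t0**2) * 3 + df_dy(3 * t0, t0**2) * 2 * t0
numeric_dg = central_diff(g, t0)
print(chain_rule_dg, numeric_dg)
```

Each pair of printed numbers should agree to several decimal places. This kind of numerical check is also a handy debugging tool when you write your own gradient code.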

## Programming

### Python**

It's important to be strong in Python, because this is the language we'll be using during the programme. As a rough indication, we expect you to be comfortable with at least 80-90% of the material [**here**](https://book.pythontips.com/en/latest/), up to section **"21. for/else"**. For a more thorough treatment of Python's core functionality, see [**here**](https://docs.python.org/3/tutorial/).

### Libraries

The following libraries would be useful to know to at least a basic level, before the course starts.

#### [NumPy](https://numpy.org/)**

NumPy is a staple of high-performance numerical Python, so it's well worth being familiar with. Additionally, the syntax for working with NumPy arrays is very similar to how you work with PyTorch tensors (often there are only minor differences, e.g. Torch tends to use the keyword `dim` where NumPy uses `axis`). Working through [these 100 basic NumPy exercises](https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb) would be a good idea, or if you're comfortable with NumPy already then you could try doing them in PyTorch (see below).
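
To make the `axis` keyword concrete, here is a short NumPy-only sketch on a toy array of our own. The comments note the PyTorch spelling of each call, but the code itself doesn't need PyTorch installed.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Reducing over axis 0 collapses the rows, leaving one value per column
col_sums = x.sum(axis=0)  # PyTorch equivalent: x.sum(dim=0)

# Reducing over axis 1 collapses the columns, leaving one value per row
row_sums = x.sum(axis=1)  # PyTorch equivalent: x.sum(dim=1)

# keepdims=True keeps the reduced axis as a size-1 dimension
row_means = x.mean(axis=1, keepdims=True)  # PyTorch spells this keepdim=True

print(col_sums)         # [3 5 7]
print(row_sums)         # [ 3 12]
print(row_means.shape)  # (2, 1)
```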

#### [PyTorch](https://pytorch.org/)**

We will be starting chapter 0 of the programme with some structured exercises designed to get everyone familiar with working in PyTorch. However, the more comfortable you are with PyTorch going in, the easier you'll probably find this. PyTorch has several useful tutorials, and to get comfortable working with tensors you might want to implement the 100 basic NumPy exercises linked to above, using PyTorch instead. Another option would be this [Learn the Basics](https://pytorch.org/tutorials/beginner/basics/intro.html) tutorial. You can skip this if you can clearly explain:

> * **At a high level, what is a `torch.Tensor`?**
> * **What is a `nn.Parameter`, and `nn.Module`?**
> * **When you call `.backward()`, where are your gradients stored?**
> * **What is a loss function? In general, what does it take for arguments, and what does it return?**
> * **What does an optimization algorithm do?**
> * **What is a hyperparameter, and how does it differ from a regular parameter?**
> * **What are some examples of hyperparameters?**

#### [Einops](https://einops.rocks/1-einops-basics) and [Einsum](https://einops.rocks/api/einsum/)*

These are great libraries to get comfortable with, when manipulating tensors. If you're comfortable using them, then you can say goodbye to awkward NumPy/PyTorch methods like transpose, permute and squeeze! We'll have a few einops and einsum exercises on the second day of the fundamentals chapter, but the more comfortable you are with these libraries the faster you'll be.

For einops, you can read through the examples up to "Fancy examples in random order". It's worth trying to play around with these in your own Jupyter notebook, to get more comfortable with them.

For einsum, [this page](https://rockt.github.io/2018/04/30/einsum) provides a basic intro to the Einstein summation convention, and shows some example tensor implementations. Note that we'll be using the einsum function from the einops library, which allows you to refer to dimensions by name rather than by a single letter.

#### [Typing](https://docs.python.org/3/library/typing.html)*

Type-checking Python functions that you write is a great way to catch bugs, and keep your code clear and readable. Python isn't a statically typed language, so type hints aren't enforced at runtime and you won't get errors from using incorrect type specifications unless you use a type checker like MyPy. However, if you're using VSCode then you can pair this library with a really useful automatic type checker (see next section).
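
Here is a small sketch of what type hints look like in practice. The functions `mean`, `find_index` and `double` are hypothetical examples of ours, and the built-in generic syntax `list[float]` assumes Python 3.9 or later.

```python
from typing import Optional


def mean(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def find_index(items: list[str], target: str) -> Optional[int]:
    """Return the position of `target` in `items`, or None if it's absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


def double(x: int) -> int:
    return x * 2


print(mean([1.0, 2.0, 3.0]))             # 2.0
print(find_index(["a", "b", "c"], "b"))  # 1
print(find_index(["a", "b", "c"], "z"))  # None

# Type hints are not enforced at runtime: this call runs without error and
# returns "abab", even though a static type checker would flag the str argument.
result = double("ab")
print(result)
```

The last call is exactly the kind of mistake a static type checker is designed to catch before you run the code.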

#### [Plotly](https://plotly.com/python)

Plotly is an interactive graphing library which is great for presenting results and investigating data. If you're already very familiar with a different Python plotting library (e.g. matplotlib) then I wouldn't recommend re-learning Plotly, but if you aren't already very familiar with matplotlib or you're open to learning Plotly, I'd strongly recommend giving it a try!

#### [Streamlit](https://share.streamlit.io)

Streamlit is a cool library for building and sharing data-based applications. It integrates very nicely with Plotly (see above), can be hosted on your personal GitHub, and is very intuitive & easy to learn relative to other libraries with similar features (e.g. Dash). This is not compulsory, but if you like the look of Streamlit then you might want to think about using it as a way to submit (or even make public) your end-of-week or capstone projects. See [this page](https://copy-suppression.streamlit.app/) I (Callum) made for presenting the results from the Copy Suppression paper I worked on during SERI MATS, as an example of what Streamlit can do.

## Software Engineering

### Basic coding skills**

If you've been accepted into this programme, then you probably already have this box ticked! However, polishing this area can't hurt. LeetCode is a good place to keep basic coding skills sharp. In particular, practising the planning and implementation of functions in the medium-hard sections of LeetCode might be helpful. Practising problems on [Project Euler](https://projecteuler.net/) is another idea.

### VSCode**

Although many of you might already be familiar with Jupyter Notebooks, we recommend working through structured exercises using VSCode. This is a powerful text editor which provides more features than Jupyter Notebooks. Some features it has are:

* **Shortcuts**  
    These are much more powerful than anything offered in Jupyter Notebooks. Below is a table of some particularly useful ones (see [this link](https://www.geeksforgeeks.org/visual-studio-code-shortcuts-for-windows-and-mac/) for more detail on how each of them works).
    |       Commands       |                   Windows/Linux                   |                         MAC                         |
    |:--------------------:|:-------------------------------------------------:|:---------------------------------------------------:|
    |      Delete Line     |                  Ctrl + Shift + K                 |                   Cmd + Shift + K                   |
    | Copy Line Up or Down | Shift + Alt + Up arrow or Shift +Alt + Down arrow | Opt + Shift + Up arrow or  Opt + Shift + Down arrow |
    |      Global Find     |                  Ctrl + Shift + F                 |                   Cmd + Shift + F                   |
    |       Copilot*       |             Ctrl + i / Ctrl + Shift + i           |              Cmd + i / Cmd + Shift + i              |
    | Toggle Block Comment |                  Ctrl + Shift + /                 |                   Cmd + Shift + /                   |
    |    Command Palette   |                  Ctrl + Shift + P                 |                   Cmd + Shift + P                   |
    |  Toggle Line Comment |                      Ctrl + /                     |                       Cmd + /                       |
    |  Trigger Suggestion  |                    Ctrl + Space                   |                     Cmd + Space                     |
    |    Toggle Sidebar    |                      Ctrl + B                     |                       Cmd + B                       |
    |  Multi Select Cursor |                      Ctrl + D                     |                       Cmd + D                       |
    |      Quick Open      |                      Ctrl + P                     |                       Cmd + P                       |
<br>

* **Type checking**      
    We discussed the typing module in a section above. This is particularly powerful when used alongside VSCode's type checker extension. You can enable type checking by going to the `settings.json` file in VSCode, and adding this line:  
    
    ```json
    {
        "python.analysis.typeCheckingMode": "basic"
    }
    ```

    You can open the `settings.json` file by first opening VSCode's Command Palette (see the shortcuts above), then finding the option Preferences: Open User Settings (JSON). We won't be using type checking like this in the course, because it can be a bit too strict sometimes, but it's a useful thing to know about.

* **Notebook functionality**  
Although VSCode does provide an extension which acts just like a Jupyter Notebook, it actually has a much more useful feature. Python files can also be made to act like notebooks, by adding `#%%` lines, which act as cell dividers. In this way, you can separate chunks of code and run them individually (and see their output in a new window). See [this page](https://code.visualstudio.com/docs/python/jupyter-support-py) for a further explanation.  
  
* **Debugger**  
The VSCode debugger is a great tool, and can be much more powerful and efficient than the standard practice of adding lines to print information about your output! You can set breakpoints in your code, and closely examine the local variables which are stored at that point in the program. More information can be found on [this page](https://lightrun.com/debugging/debug-python-in-vscode/).  
  
* **Testing**  
VSCode provides ways to easily keep track of and run tests which are written using the pytest or unittest libraries. We'll have a look at the former during the course.  
  
* **Remote machines**  
VSCode offers easy ways to execute code on remote machines over SSH.  
  
* **Copilot**  
GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real-time, right from your editor. We will be encouraging (although not requiring) use of Copilot during the program. It can't do everything, but it's very helpful in abstracting away the annoying low-level details and allowing you to focus on the higher-level concepts and structures.  
  
* **Local imports**  
If you're getting squiggly yellow lines under local imports (or local imports aren't working), add this to the workspace JSON file (you can access it by searching for "workspace JSON" in the command palette):

    ```json
    {
        "python.analysis.extraPaths": [
            "extrapath"
        ],
        "python.analysis.include": [
            "extrapath"
        ]
    }
    ```
    where `extrapath` is the path you want to add (e.g. `./chapter0_fundamentals/exercises`, if you're running imports like `import part1_raytracing.solutions as solutions`).
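
To illustrate the notebook functionality described in the list above, here is a minimal sketch of a `.py` file split into cells. The contents are a made-up example. VSCode treats each `# %%` line (equivalently `#%%`) as the start of a new cell that can be run on its own, while the file as a whole is still an ordinary Python script.

```python
# %%
# Cell 1: imports and data
import math

radii = [0.5, 1.0, 2.0]

# %%
# Cell 2: a computation that depends on the variables defined in cell 1
areas = [math.pi * r**2 for r in radii]

# %%
# Cell 3: inspect the results (re-run just this cell after tweaking the format)
for r, area in zip(radii, areas):
    print(f"radius={r:.1f}, area={area:.3f}")
```

Because the file is plain Python, you can still import from it in other files, which is one of the reasons we prefer this to `.ipynb` notebooks.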

### Jupyter Notebook / Colab**

Despite the overall awesomeness of VSCode, Jupyter Notebooks still have some advantages over VSCode, primarily in exploration and visualisation. Several of the advantages listed in the previous section also apply to .ipynb files created and edited in VSCode; the main reason we're not encouraging this for the ARENA in-person program is because we want participants to be able to write code that they can import from previous days (and because using .py files can help encourage better coding practices, rather than just creating a huge number of cells and dropping them under different headings, losing track of what code is where!).

Colab has a similar structure to Jupyter Notebooks, but it provides several additional features, most importantly GPU access. For anyone following this material virtually, we unfortunately won't be able to provide compute, so (assuming you don't have any way of accessing better compute yourselves e.g. via Lambda Labs) Colab might be the best option for you.

### Git*

Git is a piece of version control software, designed for tracking and managing changes in a set of files. It can be very powerful, but also a bit of a headache to use.

If you're following this course from the [repo](https://github.com/callummcdougall/ARENA_3.0), you'll need to be comfortable using Git to push and pull files. We also love to see participants contributing to open-source libraries during the program or capstone projects (e.g. TransformerLens or nnsight) as well as using GitHub for collaboration during their capstone projects. These will also require comfort working with Git.

![](https://lh3.googleusercontent.com/jpF1jPI7VGb9ssE-Lzn58nB_d4CQlZItqxhFFU_ZLTL3bv_x_82WVmgyIHQRDyemS7Ne5zpj4-rGVRkpACfR_u7Cw8DTGrMr8H_Sm7w74k03gfcJ2nkAwyk2CElLhMonZ6nhQiirM43DP40BIw1Id0k)

If you already have a strong SWE background then you might not need to spend as much time on this section. Otherwise, we recommend the [Learn Git Branching](https://learngitbranching.js.org/) tutorial series, for an intuitive and interactive set of exercises on Git. A few other resources which have been recommended (which will probably have a lot of overlap):

* [An Intro to Git and GitHub for Beginners](https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners)
* [Git Immersion](https://gitimmersion.com/index.html)
* [Git cheat sheet](https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet)

Ideally, you should feel comfortable with the following:

* Cloning a repository
* Creating and switching between branches
* Staging and committing changes
* Pushing branches

### Conda*

Virtual environments are a common way to manage dependencies when working across multiple different projects and are standard practice in most professional development contexts. During ARENA, we expect that participants will most likely work inside a single environment, but possibly more.

If you haven't used them already, please ensure that you have conda installed and know how to activate an environment. See: [Getting started with conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html).

### Unix

This won't matter a huge amount for this course (at least for the minimal versions of it), but it's an important part of your development as an engineer. [UC Berkeley's UNIX Tutorial](https://people.ischool.berkeley.edu/~kevin/unix-tutorial/toc.html) provides a comprehensive introduction (up to & including section 4 should suffice for most things you might be doing during this course). [Surrey University's UNIX Tutorial](https://users.cs.duke.edu/~alvy/courses/unixtut/) is also a good resource - up to tutorial 2 should be sufficient.

Also, you can use the instructions on [this page](https://cs50.readthedocs.io/terminal/) to set up a UNIX terminal to experiment with the methods covered in the above tutorials.

## Optional Reading

We will teach the course assuming you haven't read any of these, but they are all useful and relevant things to know and will allow you to understand the material more deeply and/or tackle a more advanced Week 4 project.

#### [**100 NumPy Exercises**](https://github.com/rougier/numpy-100)

After you've done the pre-course exercises, if you want more practice with this sort of thing, try these. Some of these are a lot more interesting than others - pick out some that sound fun and challenging and try solving them in PyTorch.

#### [**What is torch.nn really?**](https://pytorch.org/tutorials/beginner/nn_tutorial.html) **by Jeremy Howard**

You'll be implementing a lot of functionality of torch.nn and torch.optim yourself during the course. This is a good introduction to what functionality is in these packages. If you don't learn this now, you can pick it up during the course, though perhaps less deeply.

#### [**NLP Demystified**](https://www.nlpdemystified.org/) **by Nitin Punjabi**

An introduction to natural language processing assuming zero ML background.

*   If you've never done NLP before, it's worth skimming to get a general idea of the field.
*   If you've never built a basic feedforward neural network (MLP) from scratch, the section "Neural Networks I" has a good exercise on this.

#### [**Visualising Representations: Deep Learning and Human Beings**](https://colah.github.io/posts/2015-01-Visualizing-Representations/) **by Chris Olah**

Builds intuition with nice pictures about what deep networks are doing inside.

#### [**Spinning Up in Deep RL**](https://spinningup.openai.com/en/latest/) **by OpenAI**

Introduction to using deep learning for reinforcement learning. ARENA will assume zero RL experience, but having some understanding already means you'll have an easier time and can tackle more advanced versions of things in the course.

#### [**Introduction to Reinforcement Learning**](https://www.youtube.com/watch?v=2pWv7GOvuf0) **with David Silver**

This video course is fairly old (2015) and the state of the art has come a long way, but this is still useful to cover the basics. I would recommend Spinning Up in Deep RL over this unless you learn better from video lectures than reading.

#### [**The Matrix Cookbook**](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf) **by Petersen and Pedersen**

A densely packed reference of facts and identities about matrices. Definitely not intended to teach topics, but a good place to look up something you need. It's worth memorising identities 1-6, 11-16, and 18-23 from Page 6.

#### [**Zoom In: An Introduction to Circuits**](https://distill.pub/2020/circuits/zoom-in/) **by Chris Olah et al**

A very fun article on interpretability in neural networks trained for computer vision.

#### [**Why Momentum Really Works**](https://distill.pub/2017/momentum/) **by Gabriel Goh**

Variations of gradient descent that use momentum are extremely common. We'll teach the basics of momentum in the course, but if you want a richer and deeper understanding then this is a good article to read and reread.

#### [**The Matrix Calculus You Need for Deep Learning**](https://explained.ai/matrix-calculus/) **by Terence Parr and Jeremy Howard**

Takes you from knowing introductory calculus to calculus on matrices. This will be helpful for the backpropagation material, on day 3 of the course.

#### [**A Mathematical Framework for Transformer Circuits**](https://transformer-circuits.pub/2021/framework/index.html) **by Anthropic**

Analyses transformers starting with the simplest toy models and working up. A heavy read but very good for building intuition about what transformers can do. This will form the cornerstone of the mechanistic interpretability chapter.

#### [**In-context Learning and Induction Heads**](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) **by Anthropic**

Describes and analyses "induction heads", an important circuit learned by transformers.

#### [**Neural Networks and Deep Learning**](http://neuralnetworksanddeeplearning.com/) **by Michael Nielsen**

# 2️⃣ Einops, Einsum & Tensor Manipulation (Highly Recommended)

> ##### Learning Objectives
>
> - Understand the basics of Einstein summation convention
> - Learn how to use `einops` to perform basic tensor rearrangement, and `einsum` to perform standard linear algebra operations on tensors

Note - this section contains a large number of exercises. You should feel free to skim through them if you feel comfortable with the basic ideas.

## Reading

* Read about the benefits of the `einops` library [here](https://www.blopig.com/blog/2022/05/einops-powerful-library-for-tensor-operations-in-deep-learning/).
* If you haven't already, then review the [Einops basics tutorial](https://einops.rocks/1-einops-basics/) (up to the "fancy examples" section).
* Read [einsum is all you need](https://rockt.github.io/2018/04/30/einsum) (or [watch it](https://www.youtube.com/watch?v=pkVwUVEHmfI)) for a brief overview of the `einsum` function and how it works. (You don't need to read past section 2.10.)

## Setup

In [None]:
import os
import sys
from pathlib import Path

IN_COLAB = "google.colab" in sys.modules

chapter = "chapter0_fundamentals"
repo = "ARENA_3.0"
branch = "main"

# Install dependencies
try:
    import einops
except ImportError:
    %pip install einops jaxtyping

# Get root directory, handling 3 different cases: (1) Colab, (2) notebook not in ARENA repo, (3) notebook in ARENA repo
root = (
    "/content"
    if IN_COLAB
    else "/root"
    if repo not in os.getcwd()
    else str(next(p for p in Path.cwd().parents if p.name == repo))
)

if Path(root).exists() and not Path(f"{root}/{chapter}").exists():
    if not IN_COLAB:
        !sudo apt-get install unzip
        %pip install jupyter ipython --upgrade

    if not os.path.exists(f"{root}/{chapter}"):
        !wget -P {root} https://github.com/callummcdougall/ARENA_3.0/archive/refs/heads/{branch}.zip
        !unzip {root}/{branch}.zip '{repo}-{branch}/{chapter}/exercises/*' -d {root}
        !mv {root}/ARENA_3.0-{branch}/{chapter} {root}/{chapter}
        !rm {root}/{branch}.zip
        !rmdir {root}/ARENA_3.0-{branch}


if f"{root}/{chapter}/exercises" not in sys.path:
    sys.path.append(f"{root}/{chapter}/exercises")

os.chdir(f"{root}/{chapter}/exercises")

In [None]:
import math
import os
import sys
from pathlib import Path

import einops
import numpy as np
import torch as t
from torch import Tensor

# Make sure exercises are in the path
chapter = "chapter0_fundamentals"
section = "part0_prereqs"
root_dir = next(p for p in Path.cwd().parents if (p / chapter).exists())
exercises_dir = root_dir / chapter / "exercises"
section_dir = exercises_dir / section
if str(exercises_dir) not in sys.path:
    sys.path.append(str(exercises_dir))

import part0_prereqs.tests as tests
from part0_prereqs.utils import display_array_as_img, display_soln_array_as_img

MAIN = __name__ == "__main__"

## Einops

In [None]:
arr = np.load(section_dir / "numbers.npy")

`arr` is a 4D numpy array. The first axis corresponds to the number, and the next three axes are channels (i.e. RGB), height and width respectively. You have the function `utils.display_array_as_img` which takes in a numpy array and displays it as an image. There are two possible ways this function can be run:

* If the input is three-dimensional, the dimensions are interpreted as `(channel, height, width)` - in other words, as an RGB image.
* If the input is two-dimensional, the dimensions are interpreted as `(height, width)` - i.e. a monochrome image.

For example:

In [None]:
print(arr[0].shape)
display_array_as_img(arr[0])  # plotting the first image in the batch

In [None]:
print(arr[0, 0].shape)
display_array_as_img(arr[0, 0])  # plotting the first channel of the first image, as monochrome

In [None]:
arr_stacked = einops.rearrange(arr, "b c h w -> c h (b w)")
print(arr_stacked.shape)
display_array_as_img(arr_stacked)  # plotting all images, stacked in a row

A series of images follow below, which have been created using einops functions performed on arr. You should work through these and try to produce each of the images yourself. This page also includes solutions, but you should only look at them after you've tried for at least five minutes.

**Note - if you find you're comfortable with the first ~half of these, you can skip to later sections if you'd prefer, since these aren't particularly conceptually important.**

### Exercises - einops operations (match images)

> ```yaml
> Difficulty: 🔴🔴🔴⚪⚪
> Importance: 🔵🔵⚪⚪⚪
>
> You should spend up to ~45 minutes on these exercises collectively.
> If you think you get the general idea, then you can skip to the next section.
> You shouldn't spend longer than ~10 mins per exercise.
> ```

#### (1) Column-stacking

In [None]:
display_soln_array_as_img(1)

In [None]:
# Your code here - define arr1

display_array_as_img(arr1)

#### (2) Column-stacking and copying

In this example we take just the first digit, and copy it along rows using `einops.repeat`.

In [None]:
display_soln_array_as_img(2)

In [None]:
# Your code here - define arr2

display_array_as_img(arr2)

#### (3) Row-stacking and double-copying

This example is pretty similar to the previous one, except that the part of the original image we need to slice and pass into `einops.repeat` also has a batch dimension of 2 (since it includes the first 2 digits).

In [None]:
display_soln_array_as_img(3)

In [None]:
# Your code here - define arr3

display_array_as_img(arr3)

#### (4) Stretching

The image below was stretched vertically by a factor of 2.

In [None]:
display_soln_array_as_img(4)

In [None]:
# Your code here - define arr4

display_array_as_img(arr4)

#### (5) Split channels

The image below was created by splitting out the 3 channels of the image (i.e. red, green, blue) and turning these into 3 horizontally stacked images. The output is 2D (the display function interprets this as a monochrome image).

In [None]:
display_soln_array_as_img(5)

In [None]:
# Your code here - define arr5

display_array_as_img(arr5)

#### (6) Stack into rows & cols

This requires a rearrange operation with dimensions for row and column stacking.

In [None]:
display_soln_array_as_img(6)

In [None]:
# Your code here - define arr6

display_array_as_img(arr6)

#### (7) Transpose

Here, we've just flipped the image's horizontal and vertical dimensions. Transposing is a fairly common tensor operation.

In [None]:
display_soln_array_as_img(7)

In [None]:
# Your code here - define arr7

display_array_as_img(arr7)

#### (8) Shrinking

Hint - for this one, you should use **max pooling** - i.e. each pixel value in the output is the maximum of the corresponding 2x2 square in the original image.
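
To make the definition concrete, here is a minimal NumPy sketch of 2x2 max pooling on a toy 4x4 array. It deliberately uses a plain reshape rather than einops, so it shows what the operation computes without being the intended solution. The names `toy` and `pooled` are illustrative and not part of the exercise.

```python
import numpy as np

# Toy 4x4 "image" with values 0..15 (illustrative data, not the exercise's `arr`)
toy = np.arange(16).reshape(4, 4)

# Split each spatial axis into (output index, position inside the 2x2 block),
# giving shape (h_out, h_in, w_out, w_in) = (2, 2, 2, 2)
blocks = toy.reshape(2, 2, 2, 2)

# Take the maximum over the two "inside the block" axes
pooled = blocks.max(axis=(1, 3))

print(pooled)
# [[ 5  7]
#  [13 15]]
```

Each output pixel is the largest value in one non-overlapping 2x2 square of the input, so the output has half the height and half the width.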

In [None]:
display_soln_array_as_img(8)

In [None]:
# Your code here - define arr8

display_array_as_img(arr8)

### Broadcasting

Before we go through the next exercises, we'll need to address one important topic in tensor operations - **broadcasting**.

NumPy and PyTorch follow the same broadcasting rules. When two tensors are involved in an elementwise operation, NumPy/PyTorch tries to broadcast them (i.e. copy them along dimensions) so that they both have the same shape. The rules of broadcasting are as follows:

1. You can prepend dummy dimensions (of size 1) to the start of a tensor until both have the same number of dimensions
2. After this point, if some dimension has size 1 in one of the tensors, it can be repeated until it matches the size of the corresponding dimension in the other tensor

To give a simple example - suppose we have a 2D batch of data, of shape `data.shape = (N, k)` (i.e. we have `N` separate datapoints, each being a vector of length `k`). Suppose we want to add a vector `vec` of length `k` to each datapoint. This is a valid operation, because when we try and add these two objects together:

1. `vec` gets prepended with a dummy dimension so it has shape `(1, k)` and both are 2D
2. `vec` gets repeated along the first dimension so it has shape `(N, k)`, matching the shape of `data`

Then, our output has shape `(N, k)`, and elements `output[i, j] = data[i, j] + vec[j]`.
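
The two steps above can be written out explicitly. The following is a minimal sketch in NumPy (PyTorch follows the same rules); the concrete sizes `N = 4`, `k = 3` and the values in `vec` are assumptions chosen purely for illustration.

```python
import numpy as np

N, k = 4, 3  # illustrative sizes
data = np.arange(N * k).reshape(N, k)  # shape (N, k)
vec = np.array([100, 200, 300])        # shape (k,)

# Step 1: prepend a dummy dimension, giving shape (1, k)
vec_2d = vec[None, :]

# Step 2: repeat along the first dimension, giving shape (N, k)
vec_full = np.broadcast_to(vec_2d, (N, k))

# Broadcasting performs both steps implicitly
output = data + vec

assert output.shape == (N, k)
assert (output == data + vec_full).all()
assert output[2, 1] == data[2, 1] + vec[1]
```

Writing the intermediate shapes out by hand like this is a useful habit whenever you are unsure whether an operation will broadcast the way you expect.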

Broadcasting is an easy place to make mistakes, because it's easy to lose track of the exact shapes of the tensors involved. As a warm-up exercise, below are some examples of broadcasting. Can you figure out which are valid, and which will raise errors?


```python
x = t.ones((3, 1, 5))
y = t.ones((1, 4, 5))

z = x + y
```

```python
x = t.ones((8, 2, 6))
y = t.ones((8, 2))

z = x + y
```

```python
x = t.ones((8, 2, 6))
y = t.ones((2, 6))

z = x + y
```

```python
x = t.ones((10, 20, 30))
y = t.ones((20, 1))

z = x + y
```

```python
x = t.ones((4, 1))
y = t.ones((4,))

z = x + y
```

<font color="red">Add Answers Below

1.

</font>
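
Once you've written down your answers, one way to check them without building the tensors is `np.broadcast_shapes`, which applies the broadcasting rules to shapes alone. This is a sketch assuming NumPy 1.20 or later; `try_broadcast` is a hypothetical helper, and the shapes below are deliberately different from the ones in the questions.

```python
import numpy as np


def try_broadcast(*shapes):
    """Return the broadcast shape, or None if the shapes are incompatible."""
    try:
        return np.broadcast_shapes(*shapes)
    except ValueError:
        return None


print(try_broadcast((5, 1), (1, 7)))  # (5, 7): each size-1 dim is repeated
print(try_broadcast((2, 3), (3,)))    # (2, 3): (3,) is treated as (1, 3)
print(try_broadcast((2, 3), (2,)))    # None: (2,) becomes (1, 2), and 3 != 2
```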

Einops is a useful tool for reshaping tensors to enable broadcasting. If you just need to add or remove a dummy dimension, you don't need to use einops: `tensor.unsqueeze(dim)` will give you a new tensor with a dummy dimension of size 1 inserted at position `dim` in the new tensor, and `tensor.squeeze(dim)` will give you a tensor with the dimension at position `dim` removed (if it had size 1, otherwise nothing happens).

```python
x = t.ones((3, 1, 5))

print(x.unsqueeze(3).shape)  # (3, 1, 5, 1) because we add a new dummy dimension at idx 3 (the end) in the new tensor

print(x.squeeze(1).shape) # (3, 5) because we remove the dimension at idx 1 (it has size 1)

print(x.squeeze(0).shape) # (3, 1, 5) because we don't remove the leading dimension (it has size 3)
```

## Einsum

Einsum is a very useful function for performing linear operations, which you'll probably be using a lot during this programme.

> Note - we'll be using the `einops.einsum` version of the function, which works differently to the more conventional `torch.einsum`:
>
> * `einops.einsum` has the arrays as the first arguments, and uses spaces to separate dimensions in the string.
> * `torch.einsum` has the string as its first argument, and doesn't use spaces to separate dimensions (each dim is represented by a single character).
>
> For instance, `torch.einsum("ij,i->j", A, b)` is equivalent to `einops.einsum(A, b, "i j, i -> j")`. (Note, einops doesn't care whether there are spaces either side of `,` and `->`, so you don't need to match this syntax exactly.)

Although there are many different kinds of operations you can perform, they are all derived from three key rules:

1. Repeating letters in different inputs means those values will be multiplied, and those products will be in the output.
    * For example, `M = einops.einsum(A, B, "i j, i j -> i j")` just corresponds to the elementwise product `M = A * B` (because $M_{ij} = A_{ij} B_{ij}$).
2. Omitting a letter means that the axis will be summed over.
    * For example, if `x` is a 2D array with shape `(I, J)`, then `einops.einsum(x, "i j -> i")` will be a 1D array of length `I` containing the row sums of `x` (we're summing along the `j`-index, i.e. over the entries within each row).
3. We can return the unsummed axes in any order.
    * For example, `einops.einsum(x, "i j k -> k j i")` does the same thing as `einops.rearrange(x, "i j k -> k j i")`.
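
The three rules can be checked numerically. The sketch below uses `np.einsum` rather than `einops.einsum`, purely to keep it dependent on NumPy alone; `np.einsum` takes the string first and uses single-character dimension names, like `torch.einsum`. The array shapes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 3))
x = rng.normal(size=(2, 3, 4))

# Rule 1: letters repeated across inputs are multiplied together
# (einops equivalent: "i j, i j -> i j")
M = np.einsum("ij,ij->ij", A, B)
assert np.allclose(M, A * B)

# Rule 2: a letter omitted from the output is summed over
# (einops equivalent: "i j -> i")
row_sums = np.einsum("ij->i", A)
assert np.allclose(row_sums, A.sum(axis=1))

# Rule 3: the remaining axes can be returned in any order
# (einops equivalent: "i j k -> k j i")
x_perm = np.einsum("ijk->kji", x)
assert x_perm.shape == (4, 3, 2)
assert np.allclose(x_perm, x.transpose(2, 1, 0))
```

Comparing an einsum expression against a known NumPy operation, as the assertions do here, is the same pattern the exercise tests below rely on.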

*Note - the einops creators supposedly have plans to support shape rearrangement, e.g. with operations like `einops.einsum(x, y, "i j, j k l -> i (k l)")` (i.e. combining the features of rearrange and einsum), so we can all look forward to that day!*

### Exercises - einsum

> ```yaml
> Difficulty: 🔴🔴⚪⚪⚪
> Importance: 🔵🔵🔵🔵⚪
>
> You should spend up to 15-20 minutes on these exercises collectively.
> If you think you get the general idea, then you can skip to the next section.
> ```

In the following exercises, you'll write simple functions using `einsum` which replicate the functionality of standard NumPy functions: trace, matrix multiplication, inner and outer products. We've also included some test functions which you should run.

Note - this version of einsum will require that you include `->`, even if you're summing to a scalar (i.e. the right hand side of your string expression is empty).

In [None]:
def einsum_trace(mat: np.ndarray):
    """
    Returns the same as `np.trace`.
    """
    raise NotImplementedError()


def einsum_mv(mat: np.ndarray, vec: np.ndarray):
    """
    Returns the same as `np.matmul`, when `mat` is a 2D array and `vec` is 1D.
    """
    raise NotImplementedError()


def einsum_mm(mat1: np.ndarray, mat2: np.ndarray):
    """
    Returns the same as `np.matmul`, when `mat1` and `mat2` are both 2D arrays.
    """
    raise NotImplementedError()


def einsum_inner(vec1: np.ndarray, vec2: np.ndarray):
    """
    Returns the same as `np.inner`.
    """
    raise NotImplementedError()


def einsum_outer(vec1: np.ndarray, vec2: np.ndarray):
    """
    Returns the same as `np.outer`.
    """
    raise NotImplementedError()


tests.test_einsum_trace(einsum_trace)
tests.test_einsum_mv(einsum_mv)
tests.test_einsum_mm(einsum_mm)
tests.test_einsum_inner(einsum_inner)
tests.test_einsum_outer(einsum_outer)