# Module 1 - Implementing and training a neural network

## Environment verification
Start by confirming you have PyTorch, TorchVision and TensorBoard installed.


```python
import torch
import torchvision
from torch.utils.data import DataLoader
```
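The imports above only prove that PyTorch and TorchVision load. Here is a minimal sketch of a slightly more thorough check. It reports the installed version of each required package, including TensorBoard, using only the standard library, and then reports whether PyTorch can see a CUDA GPU. The helper name `check_environment` is hypothetical.

```python
import importlib.metadata
import importlib.util

REQUIRED = ("torch", "torchvision", "tensorboard")

def check_environment(packages=REQUIRED):
    """Return {package: installed version, "unknown", or None if not importable}."""
    report = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # package cannot be imported at all
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata found
    return report

for pkg, version in check_environment().items():
    print(f"{pkg:12s} {version or 'MISSING'}")

# Only ask about CUDA if PyTorch itself is installed
if importlib.util.find_spec("torch") is not None:
    import torch
    print("CUDA available:", torch.cuda.is_available())
```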

## QUESTIONS - General autonomous driving questions
This part presents some questions about autonomous driving, both general and specific to Formula Student. You should read the relevant parts of the rulebook and the beginner's guide to answer some of them (both are attached in the /docs folder of the repo). Feel free to use the internet.

1. List some pros and cons of using a stereo camera versus LiDAR versus RADAR for perception. You can research examples from the industry on why they use specific sensors and not others.
2. Stereo cameras can perceive both color and depth for each pixel. They can be bought as plug-and-play solutions (for example Intel RealSense or StereoLabs ZED 2) or built in-house from industrial cameras (for example Basler). Computing depth from multiple cameras requires processing, called "depth estimation", which plug-and-play solutions perform onboard. Which solution would you opt for if you had a small team with a tight budget? Consider complexity, reliability and cost in your decision.
3. In an autonomous car, monitoring of, and reaction to, critical failures is essential to prevent uncontrolled behavior. According to the rulebook and the beginner's guide, what must happen if the car detects a camera and/or LiDAR malfunction? Select the correct option(s), citing the relevant rule(s) you found:
    1. Play a sound using the TSAC.
    2. Eject the processing computer.
    3. Activate the EBS.
    4. Send a text message to the officials notifying the issue.
    5. Autonomously approach the ASR to perform a safe shutdown.
4. An autonomous driving pipeline is usually divided into perception, planning and control. Which algorithms do Formula Student teams most commonly use in each of these stages? You can research other teams' social media or FSG Academy, for example.
5. On a Formula Student car with an Autonomous System, some verifications must be performed by the Autonomous System before the car can operate in Manual Mode (with a driver). What are they?

1. 
It all depends on the context, obviously. A stereo camera (let's say an RGB-D one that captures both image and depth) can be super useful if, for example, we're training models that need to recognize some kind of object in the image: having that depth perception is a huge plus. There are even pre-processing solutions that estimate a depth map from a single regular image, giving you depth info on the different elements in the scene (https://github.com/apple/ml-depth-pro). But honestly, having real-time depth perception straight from the sensor is a big advantage.
If you've got something moving, like a drone or a car, depth perception is a must for making quick decisions. And getting depth directly from the camera saves a lot of resources compared to a regular 2D camera, where, like I said before, you'd have to do much more processing to get the same kind of data.

LiDAR, on the other hand, is awesome when you just need to measure distances and map the environment without worrying about visual images. So, if image capture isn't critical, we can use LiDAR alone for measuring distances, depending on the type of robot. Take a robot vacuum, for example: many of them don't need a camera, and a LiDAR plus a few other sensors are more than enough to map their surroundings and do the job.

As for RADAR, it works by sending out radio waves and catching the reflections to calculate the distance and speed of objects. Radar is also better at measuring speed directly (from the Doppler shift of the reflected wave), which is why it's used in cars to detect other vehicles on the road and accurately gauge how fast they are approaching. In a nutshell, if you need a system that performs well in all kinds of lighting and weather conditions, radar looks like a solid choice.
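The two radar measurements described above can be sketched with their idealized textbook relations. Range comes from the round-trip time of the echo, R = c·t / 2, which is the same time-of-flight relation pulsed LiDAR uses. Radial speed comes from the Doppler shift, v = f_d·c / (2·f_0). This is a minimal sketch with hypothetical echo values. Real automotive radars (typically FMCW) derive both quantities from beat frequencies, so treat it as an illustration of the physics, not of a sensor's firmware.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def radar_range(round_trip_time_s: float) -> float:
    """Distance to the target: the wave travels there and back, hence the / 2."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2

def radial_speed(doppler_shift_hz: float, carrier_hz: float) -> float:
    """Closing speed from the Doppler shift of the echo: v = f_d * c / (2 * f_0).
    A positive shift (echo at a higher frequency) means the target is approaching."""
    return doppler_shift_hz * SPEED_OF_LIGHT / (2 * carrier_hz)

# Hypothetical echo seen by a 77 GHz automotive radar
print(f"range: {radar_range(400e-9):.1f} m")         # range: 60.0 m
print(f"speed: {radial_speed(10e3, 77e9):.2f} m/s")  # speed: 19.47 m/s
```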

2.
Honestly, if I’ve got a small team and a tight budget, I’d probably go with a plug-and-play solution. With these, you get the depth estimation already built-in, which means I don’t have to worry about setting up a bunch of extra processing on our end. Less hassle, fewer headaches.

Building a custom setup is cool and more flexible, sure, but it's also way more complex. You'd need to handle all the depth estimation yourself, which can be a time sink, not to mention it requires people who really know what they're doing. And on a tight budget, the extra resources needed for a custom build can quickly add up.

What I usually ask myself is: "Will I need this again?", for example on other projects. If I know for certain that I will, I start looking into ways to build a custom camera setup myself in my free time. Of course, it will take a while, but in the long run it saves more money, and the more knowledge I gain on the topic, the better the end result will probably be.

So, in terms of reliability and cost-effectiveness, plug-and-play wins here. But off-the-shelf solutions shouldn't ALWAYS be the first choice.
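For a rectified stereo pair, the "depth estimation" step in question 2 ends in triangulation: depth Z = f·B / d. Here f is the focal length in pixels, B is the baseline between the two cameras in meters, and d is the disparity in pixels. This is a minimal NumPy sketch. It assumes the disparity map has already been computed, which is the hard part and is usually done with block matching or a learned model. The camera values are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.
    Pixels with non-positive disparity (no match, or infinitely far) become NaN."""
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical camera: 700 px focal length, 12 cm baseline
disparity = np.array([[42.0, 21.0], [84.0, 0.0]])
print(disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.12))
# depths of roughly 2 m, 4 m and 1 m, plus NaN where disparity is 0
```

The inverse relation explains why stereo depth gets noisy at long range: far objects have small disparities, so a one-pixel matching error changes the estimated depth a lot.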

## Dataset
The dataset used is the well-known MNIST, which consists of grayscale images of handwritten digits (0 to 9), each 28 pixels wide and 28 pixels high.

The goal of most models trained on this dataset is to classify the digit in each image, which is also our case.
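A digit classifier for this task outputs 10 scores per image, one per digit, and the predicted digit is the index of the highest score. This minimal sketch uses made-up scores, not real model outputs, to show how predictions and accuracy are typically computed in PyTorch.

```python
import torch

# Made-up model outputs for a batch of 4 images: one raw score (logit) per digit 0-9
logits = torch.tensor([
    [0.1, 2.5, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # highest score at index 1
    [3.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # index 0
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.2, 0.0, 0.0],  # index 7
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9],  # index 2 (true label is 9)
])
labels = torch.tensor([1, 0, 7, 9])

predicted_digits = logits.argmax(dim=1)  # index of the highest score in each row
accuracy = (predicted_digits == labels).float().mean().item()
print(predicted_digits.tolist(), accuracy)  # [1, 0, 7, 2] 0.75
```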

Download the training and validation datasets (the MNIST test split is used for validation):

```python
# MNIST training split (60,000 images) and test split (10,000 images, used here for validation)
training_set: torch.utils.data.Dataset = torchvision.datasets.MNIST("./data", train=True, download=True, transform=torchvision.transforms.ToTensor())
validation_set: torch.utils.data.Dataset = torchvision.datasets.MNIST("./data", train=False, download=True, transform=torchvision.transforms.ToTensor())
```

## Part 1 - MLP evaluation

Import the example MLP:

```python
from bobnet import BobNet
```

Create an instance of this model:

```python
model1 = BobNet()
```

Define the hyperparameters for this model:

```python
# batch size
MLP_BATCH_SIZE = 64

# learning rate
MLP_LEARNING_RATE = 0.001

# momentum
MLP_MOMENTUM = 0.9

# training epochs to run
MLP_EPOCHS = 10
```
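This sketch shows what the learning rate and momentum control. It follows the update rule documented for PyTorch's `torch.optim.SGD` with the default `dampening=0` and `nesterov=False`. Momentum keeps a running "velocity" `v <- momentum * v + grad`, and the parameters move by `-lr * v`. The code reimplements that rule by hand and checks it against `torch.optim.SGD` on made-up gradients. `manual_sgd_momentum` is a hypothetical helper.

```python
import torch

LR, MOMENTUM = 0.001, 0.9  # same values as MLP_LEARNING_RATE / MLP_MOMENTUM

def manual_sgd_momentum(param, grads, lr=LR, momentum=MOMENTUM):
    """SGD with momentum (no dampening, no Nesterov, no weight decay):
        v <- momentum * v + g   (v starts as g on the first step)
        p <- p - lr * v
    """
    p = param.clone()
    v = None
    for g in grads:
        v = g.clone() if v is None else momentum * v + g
        p = p - lr * v
    return p

# Made-up gradients for a 2-element parameter over three steps
grads = [torch.tensor([0.5, -1.0]), torch.tensor([0.3, 0.2]), torch.tensor([-0.1, 0.4])]
start = torch.tensor([1.0, -2.0])

w = start.clone().requires_grad_(True)
opt = torch.optim.SGD([w], lr=LR, momentum=MOMENTUM)
for g in grads:
    opt.zero_grad()
    w.grad = g.clone()  # inject a known gradient instead of calling backward()
    opt.step()

manual = manual_sgd_momentum(start, grads)
print(torch.allclose(w.detach(), manual))  # True
```

With momentum 0.9, past gradients keep contributing for several steps. This smooths out noisy mini-batch gradients and speeds up progress along directions where the gradients agree.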

Create the training and validation dataloaders from the datasets downloaded earlier:

```python
# create the training loader (shuffled, so each epoch sees the samples in a new order)
mlp_training_loader = DataLoader(training_set, batch_size=MLP_BATCH_SIZE, shuffle=True)

# create the validation loader (no shuffling needed for evaluation)
mlp_validation_loader = DataLoader(validation_set, batch_size=MLP_BATCH_SIZE, shuffle=False)
```
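Each iteration over these loaders yields an `(images, labels)` batch. This minimal sketch uses a synthetic `TensorDataset` in place of MNIST to avoid a download. It shows the batch shapes and that the last batch is smaller unless `drop_last=True` is passed. With MNIST's 60,000 training images and a batch size of 64, the training loader yields ceil(60000 / 64) = 938 batches per epoch, and the last one holds 32 images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 64  # same as MLP_BATCH_SIZE

# Synthetic stand-in for MNIST: 200 random "images" and labels (no download needed)
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
loader = DataLoader(TensorDataset(images, labels), batch_size=BATCH_SIZE, shuffle=True)

x, y = next(iter(loader))
print(x.shape, y.shape)                 # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(len(loader))                      # 4 batches
print([b[0].shape[0] for b in loader])  # [64, 64, 64, 8]
```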

Define the loss function and the optimizer:

```python
mlp_loss_fn = torch.nn.CrossEntropyLoss()

mlp_optimizer = torch.optim.SGD(model1.parameters(), lr=MLP_LEARNING_RATE, momentum=MLP_MOMENTUM)
```
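`torch.nn.CrossEntropyLoss` takes raw, unnormalized scores (logits) and integer class labels. Internally it applies `log_softmax` followed by the negative log-likelihood loss. The sketch below checks that equivalence on made-up numbers. This is also why a model trained with this loss normally ends with a plain linear layer rather than a softmax.

```python
import torch
import torch.nn.functional as F

loss_fn = torch.nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # raw scores, shape (batch, classes)
targets = torch.tensor([0, 2])                               # class indices, shape (batch,)

ce = loss_fn(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
# Same thing by hand: average of -log(probability of the correct class)
by_hand = -(logits.log_softmax(dim=1)[torch.arange(2), targets]).mean()

print(torch.allclose(ce, manual), torch.allclose(ce, by_hand))  # True True
```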

Run the training and validation:

```python
import utils

# how many batches between logs
LOGGING_INTERVAL = 100

utils.train_model(model1, MLP_EPOCHS, mlp_optimizer, mlp_loss_fn, mlp_training_loader, mlp_validation_loader, LOGGING_INTERVAL)
```

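If CUDA is reported as busy or unavailable (for example, because another process holds the GPU), one option is to fall back to the CPU. The sketch below assumes you control where the model and batches are placed. `utils.train_model` is not shown here, so whether it selects a device itself is unknown, and `pick_device` is a hypothetical helper. Alternatively, setting the environment variable `CUDA_VISIBLE_DEVICES` to an empty string before importing PyTorch hides the GPUs entirely.

```python
import torch

def pick_device() -> torch.device:
    """Prefer the GPU, but fall back to the CPU if CUDA is missing or unusable."""
    if torch.cuda.is_available():
        try:
            torch.zeros(1, device="cuda")  # forces CUDA initialisation; raises if the GPU is busy
            return torch.device("cuda")
        except RuntimeError as err:
            print(f"CUDA unusable ({err}); falling back to CPU")
    return torch.device("cpu")

device = pick_device()
print("Using device:", device)
```

The model and every batch then need `.to(device)` so that they live on the same device.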


### QUESTIONS
Explore the architecture in the script `mod1/bobnet.py`.
1. Why does the input layer have 784 inputs? Consider the characteristics of the MNIST samples.
2. Why does the output layer have 10 outputs?

## Part 2 - CNN implementation

Head over to the `cnn.py` file and implement a convolutional architecture (add some convolutional layers and fully connected layers). You can look up the LeNet or AlexNet architectures for insights and/or inspiration (a simpler version with fewer layers is fine). 2D convolutional layers in PyTorch are created using the `torch.nn.Conv2d` class. Activation functions (like ReLU and softmax) and loss functions can be found under `torch.nn.functional`.
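As one possible starting point, here is a minimal LeNet-inspired sketch. The class name `SmallLeNet` is hypothetical. Its layer sizes loosely follow the classic LeNet-5 and are not required by the assignment. It uses `torch.nn.Conv2d` for the convolutions and `torch.nn.functional` for ReLU and max pooling. It returns raw logits so it can be paired with `CrossEntropyLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallLeNet(nn.Module):
    """LeNet-inspired CNN for 1x28x28 inputs; returns raw logits for 10 classes."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)  # 1x28x28 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)            # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 6x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 16x5x5
        x = torch.flatten(x, 1)                     # -> 400 features per sample
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # no softmax: CrossEntropyLoss expects raw logits

model = SmallLeNet()
out = model(torch.zeros(8, 1, 28, 28))  # dummy batch of 8 images
print(out.shape)  # torch.Size([8, 10])
```

Running a dummy batch through the model, as at the end of the block, is a quick way to catch shape mismatches between the last convolutional layer and the first fully connected layer before training.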

Now, import the model:

```python
from cnn import CNN
```

Create an instance of this model:

```python
model2 = CNN()
```

Train the model:

```python
# TODO: run training and validation
```

### QUESTIONS

1. What are the advantages of using convolutional layers versus fully-connected layers for image processing?