# CS188 Assignment 1-1: Dataloader

Before we start, please put your name and UID in the following format

: Firstname LASTNAME, #00000000   //   e.g.) Yining Hong, #123456789

**Your Answer:**   
Your NAME, #XXXXXXXX

In this notebook you will implement the dataloader on the TinyPlaces dataset you created.

The dataloader will take in the tinyplaces-train and tinyplaces-val files you create and turn them into PyTorch tensors.

If `binary` is set to `False`, it will instead take in the tinyplaces-train-multiclass and tinyplaces-val-multiclass files you create.

If necessary, the dataloader should be able to subsample the data and return a dataset with the number of examples you request.

The goal of this exercise is to get you started with how to load data using [PyTorch](https://pytorch.org/).

# Setup Code
Before getting started we need to run some boilerplate code to set up our environment. You'll need to rerun this setup code each time you start the notebook.

First, run this cell to load the [autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html?highlight=autoreload) extension. This allows us to edit `.py` source files and re-import them into the notebook for a seamless editing and debugging experience.

In [None]:
%load_ext autoreload
%autoreload 2

### Google Colab Setup
Next we need to run a few commands to set up our environment on Google Colab.

Run the following cell to mount your Google Drive. Follow the link, sign in to your Google account (the same account you used to store this notebook!) and copy the authorization code into the text box that appears below.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Now check whether you can access the assignment folder.

In [None]:
import os

# TODO: Fill in the Google Drive path where you uploaded the assignment
# Example: if you created a 188 folder and put all the files in an Assignment1 folder inside it, use '188/Assignment1'
# GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = '188/Assignment1'
GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = 'Assignment1'
GOOGLE_DRIVE_PATH = os.path.join('drive', 'My Drive', GOOGLE_DRIVE_PATH_AFTER_MYDRIVE)
print(os.listdir(GOOGLE_DRIVE_PATH))
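
If `os.listdir` raises `FileNotFoundError`, the usual causes are a mistyped path or a Drive that is not mounted. The sketch below shows one way to make that failure more explicit. `find_assignment_dir` is a hypothetical helper written for illustration, not part of the assignment code, and the file name to look for is passed in by the caller.

```python
import os


def find_assignment_dir(root, subpath, expected_file):
    """Return root/subpath if it is a directory that contains expected_file.

    Hypothetical helper for illustration only. Raises FileNotFoundError
    with a descriptive message when the check fails.
    """
    path = os.path.join(root, subpath)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Not a directory (is Drive mounted?): {path}")
    if expected_file not in os.listdir(path):
        raise FileNotFoundError(f"{expected_file} not found in {path}")
    return path
```

In this notebook it could be called as `find_assignment_dir(os.path.join('drive', 'My Drive'), GOOGLE_DRIVE_PATH_AFTER_MYDRIVE, 'regression.py')`.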

Once you have successfully mounted your Google Drive and located the path to this assignment, run the following cell to allow us to import from the `.py` files of this assignment. If it works correctly, it should print the message:

```
Hello from regression.py!
```

as well as the last edit time for the file `regression.py`.

In [None]:
import sys
sys.path.append(GOOGLE_DRIVE_PATH)

import time, os
os.environ["TZ"] = "US/Eastern"
time.tzset()

from regression import hello
hello()

path = os.path.join(GOOGLE_DRIVE_PATH, 'regression.py')
edit_time = time.ctime(os.path.getmtime(path))
print('regression.py last edited on %s' % edit_time)

# Data preprocessing / Visualization

## Setup code
Run some setup code for this notebook: Import some useful packages and increase the default figure size.

In [None]:
import torch
import torchvision
import cs188
import matplotlib.pyplot as plt
import statistics

plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['font.size'] = 16

## Load the TinyPlaces dataset
The utility function `cs188.data.tinyplaces()` returns the entire TinyPlaces dataset as a set of four **Torch tensors**:

- `x_train` contains all training images (real numbers in the range $[0, 1]$)
- `y_train` contains all training labels (integers; either 0 or 1 for the binary version loaded here)
- `x_val` contains all validation images
- `y_val` contains all validation labels

In [None]:
x_train, y_train, x_val, y_val = cs188.data.tinyplaces(GOOGLE_DRIVE_PATH)

print('Training set:')
print('  data shape:', x_train.shape)
print('  labels shape:', y_train.shape)
print('Validation set:')
print('  data shape:', x_val.shape)
print('  labels shape:', y_val.shape)
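
Beyond printing shapes, it can help to confirm that the tensors match the description above: images as real numbers in $[0, 1]$ and labels as integers, with one label per image. The following is a minimal sketch of such a check. `check_split` is a hypothetical helper, and the synthetic tensors (including the 3x32x32 image shape) are assumptions used only to make the example runnable on its own.

```python
import torch


def check_split(x, y, num_classes):
    """Raise AssertionError if an (images, labels) pair looks malformed."""
    assert x.shape[0] == y.shape[0], "images and labels differ in length"
    assert x.is_floating_point(), "images should be floating point"
    assert x.min().item() >= 0.0 and x.max().item() <= 1.0, "pixels outside [0, 1]"
    assert not y.is_floating_point(), "labels should be integers"
    assert y.min().item() >= 0 and y.max().item() < num_classes, "label out of range"


# Synthetic stand-in data; the image shape here is an assumption.
x_demo = torch.rand(8, 3, 32, 32)
y_demo = torch.randint(0, 2, (8,))
check_split(x_demo, y_demo, num_classes=2)
```

With the real data, the equivalent calls would be `check_split(x_train, y_train, num_classes=2)` and `check_split(x_val, y_val, num_classes=2)` for the binary version.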

## Visualize the dataset
This cell visualizes some random examples from the training set. We will first try the binary classification dataset.

In [None]:
import random
from torchvision.utils import make_grid

classes = ['indoor', 'outdoor']
samples_per_class = 10
samples = []
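for y, cls in enumerate(classes):
    # Label each row of the grid with its class name
    plt.text(-4, 40 * y + 18, cls, ha='right')
    # Indices of all training examples that belong to class y
    idxs, = (y_train == y).nonzero(as_tuple=True)
    for _ in range(samples_per_class):
        # Pick one example of this class uniformly at random (with replacement)
        idx = idxs[random.randrange(idxs.shape[0])].item()
        samples.append(x_train[idx])
# Tile the sampled images into a grid with one class per row
img = make_grid(samples, nrow=samples_per_class)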
plt.imshow(cs188.tensor_to_image(img))
plt.axis('off')
plt.show()

Next, visualize the multi-class version:

In [None]:
x_train, y_train, x_val, y_val = cs188.data.tinyplaces(GOOGLE_DRIVE_PATH, binary=False)

samples_per_class = 10
samples = []
classes = ['bathroom', 'bedroom', 'bookstore', 'classroom', 'dining_room', 'food_court', 'kitchen', 'lobby', 'living_room', 'office', 'baseball_field', 'bridge', 'campsite', 'canyon', 'coast', 'fountain', 'highway', 'playground', 'mountain', 'rainforest']
for y, cls in enumerate(classes):
    plt.text(-4, 35 * y + 18, cls, ha='right')
    idxs, = (y_train == y).nonzero(as_tuple=True)
    for i in range(samples_per_class):
        idx = idxs[random.randrange(idxs.shape[0])].item()
        samples.append(x_train[idx])
img = torchvision.utils.make_grid(samples, nrow=samples_per_class)
plt.imshow(cs188.tensor_to_image(img))
plt.axis('off')
plt.show()
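
The visualization cells draw a fixed number of random images per class, so they do not show how many examples each class actually contains. They also fail with a `ValueError` from `random.randrange` if some class has no examples at all. Counting the labels is a quick way to check both. This is a minimal sketch; `count_labels` is a hypothetical helper that accepts either a plain list of integers or a 1-D tensor (converted through `.tolist()`).

```python
from collections import Counter


def count_labels(labels, class_names):
    """Map each class name to the number of examples carrying its index."""
    values = labels.tolist() if hasattr(labels, "tolist") else list(labels)
    counts = Counter(values)
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}


print(count_labels([0, 1, 1, 0, 1], ["indoor", "outdoor"]))
# prints {'indoor': 2, 'outdoor': 3}
```

In the notebook, `count_labels(y_train, classes)` would report the per-class counts for whichever version of the dataset is currently loaded.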

## Subsample the dataset
When implementing machine learning algorithms, it's usually a good idea to develop on a small sample of the full dataset first. This way your code will run much faster, allowing for more interactive and efficient development. Once you are satisfied that you have correctly implemented the algorithm, you can then rerun with the entire dataset.

The function `cs188.data.tinyplaces()` can automatically subsample the TinyPlaces dataset for us. To see how to use it, we can check the documentation using the built-in `help` command:

In [None]:
help(cs188.data.tinyplaces)

We will subsample the data to use only 100 training examples and 100 validation examples:

In [None]:
num_train = 100
num_val = 100

x_train, y_train, x_val, y_val = cs188.data.tinyplaces(GOOGLE_DRIVE_PATH, num_train, num_val)

print('Training set:')
print('  data shape:', x_train.shape)
print('  labels shape:', y_train.shape)
print('Validation set:')
print('  data shape:', x_val.shape)
print('  labels shape:', y_val.shape)
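
For reference, one common way to subsample a pair of aligned tensors is to draw a random permutation of the indices and keep the first few. The sketch below illustrates that idea only. It is not the implementation of `cs188.data.tinyplaces()`, which may select examples differently (for instance by taking the first `num_train` rows). The `subsample` function, its `seed` parameter, and the synthetic tensors are all assumptions made for illustration.

```python
import torch


def subsample(x, y, num, seed=0):
    """Return `num` randomly chosen (x, y) pairs, keeping them aligned."""
    if num > x.shape[0]:
        raise ValueError(f"requested {num} examples but only {x.shape[0]} exist")
    # A dedicated generator keeps the draw reproducible without touching
    # the global random state.
    gen = torch.Generator().manual_seed(seed)
    idx = torch.randperm(x.shape[0], generator=gen)[:num]
    # Index both tensors with the same indices so labels stay matched.
    return x[idx], y[idx]


# Synthetic stand-in data; the image shape here is an assumption.
x_demo = torch.rand(50, 3, 32, 32)
y_demo = torch.randint(0, 2, (50,))
x_small, y_small = subsample(x_demo, y_demo, num=10)
print(x_small.shape, y_small.shape)
```

Sampling without replacement through a permutation guarantees that no example appears twice in the subsampled set.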