<a href="https://colab.research.google.com/github/RexSword/1112-New-Learning-Algorithm/blob/main/hw_with_output_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## HW Requirement

• Implement the code for the 2-layer neural networks in CS231n 
2021 version with PyTorch (or TensorFlow). 

• Once you have the code (regardless of which framework you 
choose above), apply it to your own data. Split the data into 
training and test sets at a ratio of 80%:20%.

• You need to run the code with the following hyperparameter 
settings:

✓ Activation function: tanh, ReLU

✓ Data preprocessing

✓ Initial weights: small random number, Xavier or Kaiming/MSRA 
Initialization

✓ Loss function: without or with the regularization term (L2), λ = 0.001 or 0.0001

$$ E(w) = \frac{1}{N}\sum_{c=1}^{N}\left[f(X^c, w) - y^c\right]^2
 + \lambda\left[\sum_{i=0}^{p}(w_{i}^{o})^2
 + \sum_{i=1}^{p}\sum_{j=0}^{m}(w_{ij}^{H})^2\right]
$$

✓ Optimizer: gradient descent, Momentum, Adam

✓ Learning epochs: 100, 200, 300

✓ Amount of hidden nodes: 5, 8, 11

✓ Learning rate decay schedule: none and cosine

✓ Ensembles: top 3 models
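
The regularized loss above can be written out directly. The following is a minimal sketch in plain Python, assuming the predictions, the targets, and a flat list of all weights are already available; `l2_regularized_mse` and its parameter names are hypothetical and not part of the assignment.

```python
from typing import Sequence


def l2_regularized_mse(preds: Sequence[float], targets: Sequence[float],
                       weights: Sequence[float], lam: float) -> float:
    """E(w) = mean squared error + lam * sum of squared weights."""
    n = len(preds)
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


# Toy values chosen only for illustration.
loss = l2_regularized_mse(preds=[1.0, 2.0], targets=[0.0, 2.0],
                          weights=[1.0, -2.0], lam=0.1)
print(loss)
```

When the penalty is delegated to the `weight_decay` argument of `optim.SGD`, the optimizer adds `weight_decay * w` to each gradient, which is the gradient of `(weight_decay / 2) * w²`. Matching the λ of the formula exactly would therefore take `weight_decay = 2λ`.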

## Connect to gdrive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


## Check the GPU

In [None]:
!nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



In [None]:
import torch
# a = torch.Tensor([1000, 1000, 1000]).cuda()  # occupies roughly 1.1 GB of GPU memory

## Model

In [None]:
import torch
from torch import nn, optim, Generator
from torch.utils.data import DataLoader, Dataset, random_split


In [None]:
from functools import reduce
from operator import mul
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def product(nums: Iterable[T], func: Callable[[T, T], T] = mul) -> T:
    """Fold `nums` with `func` (multiplication by default), e.g. to flatten a tensor shape."""
    # func receives the next element first and the running result second,
    # which keeps the argument order of the original recursive version.
    return reduce(lambda acc, x: func(x, acc), nums)

In [None]:
ACTIVES = {
    "relu": nn.ReLU,
    "tanh": nn.Tanh
}
INIT_FUNCS = {
    "small_random": lambda x: nn.init.normal_(tensor=x, mean=0, std=0.01),
    "xavier": lambda x: nn.init.xavier_uniform_(tensor=x) if len(x.shape) > 1 else None,
    "kaiming": lambda x: nn.init.kaiming_uniform_(tensor=x, nonlinearity='relu') if len(x.shape) > 1 else None
}
OPTIM_FUNCS = {
    "sgd": optim.SGD,
    "momentum": lambda param, lr, weight_decay: optim.SGD(params=param, lr=lr, momentum=0.9, weight_decay=weight_decay),
    "adam": optim.Adam
}
SCHEDULERS = {
    "cos": lambda opt: torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=opt, T_max=200)
}
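
For reference, the cosine schedule selected by `"cos"` follows a closed-form curve. The sketch below reproduces that formula in plain Python; `cosine_lr` is a hypothetical helper, and `eta_min = 0` matches the default of `CosineAnnealingLR`.

```python
import math


def cosine_lr(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


# Learning rate 0.001 and T_max = 200, as in the notebook.
for epoch in (0, 50, 100, 200):
    print(epoch, cosine_lr(epoch, base_lr=0.001, t_max=200))
```

Because `T_max` is fixed at 200, a 100-epoch run stops halfway down the curve, at roughly half the initial rate, and a 300-epoch run continues past the minimum, where the rate starts to rise again. Passing the run's epoch count as `T_max` would avoid both.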


In [None]:
from collections.abc import Callable
class TwoLayerNetwork(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, num_classes: int, init_method:Callable, active_func:nn.modules.module.Module) -> None:
        super(TwoLayerNetwork, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        ## first layer
        self.fc1 = nn.Linear(input_size, hidden_size)
        ## activation
        self.active_func = active_func()
        ## second layer
        self.fc2 = nn.Linear(hidden_size, num_classes)
        ## initialize after both layers exist, so fc2's parameters are covered too
        for param in self.parameters():
            init_method(param)

    def forward(self, x):
        out = self.fc1(x)
        out = self.active_func(out)
        out = self.fc2(out)
        return out
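
To make the effect of `init_method` concrete, the sketch below computes the sampling bounds that the Xavier and Kaiming uniform initializers use for a layer of a given shape. It is plain Python rather than a call into PyTorch; the helper names are hypothetical, and the 784-to-5 layer shape is an assumption corresponding to flattened 28×28 Fashion-MNIST images with 5 hidden nodes.

```python
import math


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Weights are drawn from U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def kaiming_uniform_bound(fan_in: int, gain: float = math.sqrt(2.0)) -> float:
    """Weights are drawn from U(-a, a) with a = gain * sqrt(3 / fan_in).

    The default gain sqrt(2) is the one associated with ReLU.
    """
    return gain * math.sqrt(3.0 / fan_in)


fan_in, fan_out = 784, 5  # assumed shape of the first layer
print("xavier :", xavier_uniform_bound(fan_in, fan_out))
print("kaiming:", kaiming_uniform_bound(fan_in))
print("small random: std fixed at 0.01, independent of layer shape")
```

Both bounds shrink as the layer gets wider, whereas the small-random option keeps the same spread for every layer.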


In [None]:
import time

def train(model: TwoLayerNetwork, opt: nn.Module, device: str, epochs: int, learning_rate: float, trainloader: DataLoader, valloader: DataLoader, criterion: nn.modules.loss._Loss, sched: optim.lr_scheduler._LRScheduler, weight_decay:float):
    model.to(device)
    optimizer = opt(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    scheduler = sched(optimizer) if sched else None
    if epochs < 1:
        raise ValueError("Invalid epoch!!")
    else:
        epochs = int(epochs)

    # Record the start time
    start = time.time()
    # Train the model
    for epoch in range(epochs):
        train_loss = 0.0
        train_correct = 0
        model.train()
        for X, y in trainloader:
            X = X.view(-1, model.input_size).to(device)
            y = y.to(device)
            optimizer.zero_grad()
            outputs = model(X)
            loss = criterion(outputs, y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * X.size(0)
            _, predicted = torch.max(outputs.data, 1)
            train_correct += (predicted == y).sum().item()
        train_loss /= len(trainloader.dataset)
        train_accuracy = 100. * train_correct / len(trainloader.dataset)

        # Validate the model
        val_loss = 0.0
        val_correct = 0
        model.eval()
        with torch.no_grad():
            for X, y in valloader:
                X = X.view(-1, model.input_size).to(device)
                y = y.to(device)
                outputs = model(X)
                loss = criterion(outputs, y)
                val_loss += loss.item() * X.size(0)
                _, predicted = torch.max(outputs.data, 1)
                val_correct += (predicted == y).sum().item()
            val_loss /= len(valloader.dataset)
            val_accuracy = 100. * val_correct / len(valloader.dataset)
        if scheduler:
            scheduler.step()
        # Print epoch statistics
        print('Epoch [{}/{}], Train Loss: {:.4f}, Train Accuracy: {:.2f}%, Val Loss: {:.4f}, Val Accuracy: {:.2f}%'
              .format(epoch+1, epochs, train_loss, train_accuracy, val_loss, val_accuracy))
    # record the end time
    end = time.time()
    print('Model Training Time: {} s'.format(end-start))
    return end-start


In [None]:
def test(model:nn.Module, device:str, testloader:DataLoader):
    val_correct = 0
    model.eval()
    with torch.no_grad():
        for X, y in testloader:
            X = X.view(-1, model.input_size).to(device)
            y = y.to(device)
            outputs = model(X)
            _, predicted = torch.max(outputs.data, 1)
            val_correct += (predicted == y).sum().item()
        val_accuracy = 100. * val_correct / len(testloader.dataset)
        print("Model Accuracy:{}".format(val_accuracy))
        return val_accuracy

## Dataset

### pytorch dataset

In [None]:
# load pytorch dataset

from torchvision import datasets, transforms


def getPytorchData():
    trainset = datasets.FashionMNIST(
        root="./data/", train=True, download=True, transform=transforms.ToTensor())
    datum_size = product(trainset[0][0].size())
    class_amount = len(trainset.classes)
    testset = datasets.FashionMNIST(
        root="./data/", train=False, download=True, transform=transforms.ToTensor())
    # Split the training set into training and validation sets
    train_count = int(0.8 * len(trainset))
    valid_count = len(trainset) - train_count
    print(train_count, valid_count, len(testset))
    trainset, valset = random_split(
        trainset, (train_count, valid_count), Generator().manual_seed(42))
    # Create data loaders to load the data in batches
    trainloader = DataLoader(trainset, batch_size=32, shuffle=True)
    valloader = DataLoader(valset, batch_size=32, shuffle=True)
    testloader = DataLoader(testset, batch_size=32, shuffle=True)
    return trainloader, valloader, testloader, datum_size, class_amount


### customized pytorch dataset

In [None]:
import pandas as pd
import numpy as np
class HotelReservationDataset(Dataset):
    """Hotel Reservation dataset."""

    def __init__(self, csv_path):
        """
        Args:
            csv_path (string): Path to the CSV file with the reservation records.
        """
        # the raw CSV has 19 columns
        reservations = pd.read_csv(csv_path)
        # integer-encode the 5 object (string) columns, most frequent value first
        for col in map(lambda x: x[0], filter(lambda x:x[1]=="O", reservations.dtypes.items())):
            d = dict((j, i) for i, j in enumerate(reservations[col].value_counts().index))
            setattr(self, f"labels_of_{col}", d.keys())
            reservations[col]=reservations[col].map(d.__getitem__)
        # 17 features remain after dropping the ID (first) and label (last) columns
        self.feature = torch.from_numpy(reservations.iloc[:, 1:-1].to_numpy(dtype=np.float32))
        # booking status: two classes
        self.booking_status = torch.reshape(torch.tensor(reservations.iloc[:, -1:].to_numpy()), shape=(len(self.feature),))
        self.classes = list(getattr(self, f"labels_of_{reservations.columns[-1]}"))
    def __len__(self):
        return len(self.booking_status)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        return self.feature[idx], self.booking_status[idx]

# kaggle: ahsan81/hotel-reservations-classification-dataset
def getCustomizedData():
    # preprocess
    dataset = HotelReservationDataset(
        csv_path=r"D:\dataset\archive\Hotel Reservations.csv")
    class_amount = len(dataset.classes)
    # train test split
    train_count = int(0.7 * len(dataset))
    valid_count = int(0.2 * len(dataset))
    test_count = len(dataset) - train_count - valid_count
    print(train_count, valid_count, test_count)
    trainset, valset, testset = random_split(
        dataset, (train_count, valid_count, test_count), Generator().manual_seed(42))
    datum_size = product(trainset[0][0].size())
    # set loaders
    trainloader = DataLoader(trainset, batch_size=32, shuffle=True)
    valloader = DataLoader(valset, batch_size=32, shuffle=True)
    testloader = DataLoader(testset, batch_size=32, shuffle=True)
    return trainloader, valloader, testloader, datum_size, class_amount
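
The tabular features above are fed to the network on their raw scales. One common form of the data preprocessing named in the requirements is per-column standardization using statistics from the training split only. The following is a minimal sketch in plain Python on toy rows; `fit_standardizer` and `standardize` are hypothetical helpers, not part of the notebook.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return per-column means and standard deviations of the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Replace zero deviations by 1.0 so constant columns do not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std column-wise."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train_rows = [[2.0, 100.0], [4.0, 300.0]]   # toy values
test_rows = [[3.0, 200.0]]
means, stds = fit_standardizer(train_rows)  # statistics from the training split only
print(standardize(train_rows, means, stds))
print(standardize(test_rows, means, stds))
```

Reusing the training statistics for the validation and test rows keeps information from those splits out of the preprocessing step.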


### kaggle dataset

In [None]:
# # download data(zipped csv) from kaggle with username and apikey
# import os
# import json
# with open("kaggle.json", "r") as j:
#     for (k, v) in json.load(j).items():
#         os.environ[k] = v
# from kaggle.api.kaggle_api_extended import KaggleApi
# api = KaggleApi()
# api.authenticate()
# # https://www.kaggle.com/datasets/uciml/iris/download?datasetVersionNumber=2
# # owner/datasetname
# api.dataset_download_files('uciml/iris', path="./data/")


## Training

In [None]:
import sys


def training_schedule():
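    sys.stdout = open("./result/log.txt", "w")  # "log.txt" is a placeholder file name; the bare directory path "./result/" cannot be opened for writing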
    # processor
    device = "cuda" if torch.cuda.is_available(
    ) else "mps" if torch.backends.mps.is_available() else "cpu"
    # hyper parameters
    trainloader, valloader, testloader, input_size, output_size = getPytorchData()
    learning_rate = 0.001
    criterion = nn.CrossEntropyLoss()
    # ✓ Amount of hidden nodes: 5, 8, 11
    for hidden_size in (5, 8, 11):
        # ✓ Learning epochs: 100, 200, 300
        for epochs in (100, 200, 300):
            # Create model, optimizer, scheduler
            for (init, method) in INIT_FUNCS.items():
                for (active, func) in ACTIVES.items():
                    # ✓ Activation function: tanh, ReLU
                    # ✓ Initial weights: small random number, Xavier or Kaiming/MSRA Initialization
                    model = TwoLayerNetwork(input_size, hidden_size, output_size,
                                            init_method=method, active_func=func).to(device)
                    # ✓ Optimizer: gradient descent, Momentum, Adam
                    for (optimize, optm) in OPTIM_FUNCS.items():
                        # ✓ Learning rate decay schedule: none and cosine
                        for (schedule, schd) in SCHEDULERS.items():
                            # ✓ Loss function: without or with L2, λ = 0.001 or 0.0001
                            for weight_decay in (0.0, 0.001, 0.0001):
                                print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
                                train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
                                      trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
                                test(model=model, device=device, testloader=testloader)


In [None]:
import datetime

def _training_schedule():
    
    # FILE_PATH = "/{}.txt".format(datetime.date.today())
    
    # alternative for colab only
    FILE_PATH = "/content/gdrive/MyDrive/{}.txt".format(datetime.date.today())
    counter = 1
    test_result = {}

    with open(FILE_PATH,"w") as f:
      f.write("{}\n\n".format(datetime.datetime.today()))

    def write_spec_to_file():
        with open(FILE_PATH,"a") as f:
            f.write("- Model {} -\n".format(counter))
            f.write("hidden nodes: {} \nepochs: {} \ninit: {} \nactive: {} \noptimize: {} \nschedule: {} \nweight decay: {}\n".format(
                hidden_size, epochs, init, active, optimize, schedule, weight_decay))
            f.write("-"*50)

    def write_result_to_file(rs, tm):
        with open(FILE_PATH,"a") as f:
            f.write("\nModel Accuracy:{}\n".format(rs))
            f.write("Training Time:{:.2f} s\n".format(tm))
            f.write("-"*50+"\n")

    # processor
    device = "cuda" if torch.cuda.is_available(
    ) else "mps" if torch.backends.mps.is_available() else "cpu"
    
    # hyper parameters
    trainloader, valloader, testloader, input_size, output_size = getPytorchData()
    learning_rate = 0.001
    criterion = nn.CrossEntropyLoss()
    
    hidden_size = 5
    epochs = 10
    init = "small_random"
    method = INIT_FUNCS[init]
    active = "relu"
    func = ACTIVES[active]
    optimize = "sgd"
    optm = OPTIM_FUNCS[optimize]
    schedule = None
    schd = schedule
    weight_decay = 0.0
    
    # ✓ Amount of hidden nodes: 5, 8, 11
    for hidden_size in (5, 8, 11):
        model = TwoLayerNetwork(input_size=input_size, hidden_size=hidden_size,
                                num_classes=output_size, init_method=method, active_func=func)
        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()

        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1
    # ✓ Learning epochs: 100, 200, 300
    for epochs in (100, 200, 300):
        if epochs == 100:
          continue        
        model = TwoLayerNetwork(input_size=input_size, hidden_size=hidden_size,
                                num_classes=output_size, init_method=method, active_func=func)
        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()

        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1
    # Create model, optimizer, scheduler
    for (init, method) in INIT_FUNCS.items():
        if init == "small_random":
          continue

        model = TwoLayerNetwork(input_size=input_size, hidden_size=hidden_size,
                                num_classes=output_size, init_method=method, active_func=func)
        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()

        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1
    for (active, func) in ACTIVES.items():
        if active == "relu":
          continue
        model = TwoLayerNetwork(input_size=input_size, hidden_size=hidden_size,
                                num_classes=output_size, init_method=method, active_func=func)
        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()

        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1
    
    # ✓ Activation function: tanh, ReLU
    # ✓ Initial weights: small random number, Xavier or Kaiming/MSRA Initialization
    model = TwoLayerNetwork(input_size=input_size, hidden_size=hidden_size,
                            num_classes=output_size, init_method=method, active_func=func)
    
    # ✓ Optimizer: gradient descent, Momentum, Adam
    for (optimize, optm) in OPTIM_FUNCS.items():
        if optimize == "sgd":
          continue

        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()

        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1
    # ✓ Learning rate decay schedule: none and cosine
    for (schedule, schd) in SCHEDULERS.items():
        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()

        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1
    # ✓ Loss function: without or with L2, λ = 0.001 or 0.0001
    for weight_decay in (0.0, 0.001, 0.0001):
        if weight_decay == 0.0:
          continue

        print(hidden_size, epochs, init, active, optimize, schedule, weight_decay, "start")
        write_spec_to_file()
        
        time = train(model=model, opt=optm, device=device, epochs=epochs, learning_rate=learning_rate,
              trainloader=trainloader, valloader=valloader, criterion=criterion, sched=schd, weight_decay=weight_decay)
        result = test(model=model, device=device, testloader=testloader)
        write_result_to_file(result, time)

        test_result["model{}".format(counter)] = result
        counter += 1

    top3 = sorted(test_result, key=test_result.get, reverse=True)[:3]
    print("\nTop 3 Model:{}\n".format(','.join(top3)))
    with open(FILE_PATH,"a") as f:
        f.write("\nTop 3 Model:{}\n".format(','.join(top3)))
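
The schedule above only names the three best models. One way to turn them into an ensemble is a majority vote over their per-sample predictions. The sketch below is plain Python operating on lists of predicted class indices; `majority_vote` is a hypothetical helper, and ties are resolved in favour of the earliest-listed model, which assumes the models are passed best-first.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class predictions sample by sample."""
    ensembled = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # Among the most-voted classes, keep the one voted by the earliest model.
        winner = next(v for v in votes if counts[v] == best)
        ensembled.append(winner)
    return ensembled


# Toy predictions from three models on four samples.
model_a = [0, 1, 2, 3]
model_b = [0, 2, 2, 1]
model_c = [1, 2, 0, 2]
print(majority_vote([model_a, model_b, model_c]))
```

With PyTorch models, averaging the softmax outputs of the three networks before taking the argmax is a common alternative to voting on hard labels.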

In [None]:
_training_schedule()

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz


Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw

48000 12000 10000
5 10 small_random relu sgd None 0.0 start
Epoch [1/10], Train Loss: 2.1905, Train Accuracy: 13.07%, Val Loss: 2.0389, Val Accuracy: 28.24%
Epoch [2/10], Train Loss: 1.8573, Train Accuracy: 38.36%, Val Loss: 1.7086, Val Accuracy: 39.40%
Epoch [3/10], Train Loss: 1.5434, Train Accuracy: 44.54%, Val Loss: 1.4191, Val Accuracy: 49.75%
Epoch [4/10], Train Loss: 1.2782, Train Accuracy: 56.31%, Val Loss: 1.1736, Val Accuracy: 62.46%
Epoch [5/10], Train Loss: 1.0788, Train Accuracy: 63.99%, Val Loss: 1.0222, Val Accuracy: 64.07%
Epoch [6/10], Train Loss: 0.9636, Train Accuracy: 64.95%, Val Loss: 0.9378, Val Accuracy: 65.10%
Epoch [7/10], Train Loss: 0.8962, Train Accuracy: 66.16%, Val Loss: 0.8858, Val Accuracy: 66.34%
Epoch [8/10], Train Loss: 0.8520, Train Accuracy: 67.33%, Val Loss: 0.8496, Val Accuracy: 67.52%
Epoch [9/10], Train Loss: 0.8196, Train Accuracy: 68.65%, Val Loss: 0.8209,

KeyboardInterrupt: ignored