<a href="https://colab.research.google.com/github/HardikPaliwal/CS484Proj/blob/master/cs484%20proj.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **CS484 Final Project**

#### Topic 6: Weakly supervised classification

#### Hardik Paliwal (20725413), Lance Pereira (20719626)

______________________________________________________________

#**Table Of Contents**
- A) Abstract
- B) High Level Goals and Methedology
- C) Team Members and Contributions
- D) External Code Libraries
- E) Code
  - 1) Setup (imports, loading data)
  - 2) Define our Base CNN (Based on VGG11), Test, Train methods
  - 3) Split Fashion MNIST training data into M labeled images, and N-M unlabeled
  - 4) Train Base CNN on M labeled training images
  - 5) Use the 'semi' trained CNN to gain important features (feature embedings) of N-M *unlabeled* training images
  - 6) Match cluster labels to actual Fashion MNIST labels
  - 7) Define retrained CNN function using predicted labels from clustering for unlabeled data
- F) Experiments
  - 1) Diffirent ratios of M/N
  - 2) Diffirent Clustering Methods
- G) Results
- H) Conclusions

#**A) Abstract:**

For our project we have decided to do Project 6, choosing specifically Fashion MNIST. We use weakly supervised classification using feature extraction (embedings similar to a PCA) along with clustering methods to try and improve results compared to just using supervised learning.
 

#**B) High Level Goals and Methedology:**

Out high level goal is to use weakly supervised classification to improve our 
prediction ability compared to just training using labeled images. We hope to 
achieve atleast a 5% increase in our test prediction score using clustering 
methods such as Kmeans and Mini-Batch Kmeans. Our attempts to use other clustering methods were stopped by our limited RAM capacity, but Kmeans and Mini-Batch Kmeans are quite effective.

Our method is split into two experiments, 
- we first will test to see training
a CNN on M labeled images, then use a cluestering method to classify all the 
N images, using the majority label in each cluster as the predicted label for the unlabeled N-M images
- secondly we will try using the predicted labels from the previous step to 
train a new CNN model, to see if it performs better than the original CNN model trained on just the M labeled images

Our baseline will be a simple CNN  trained on the M labeled images. We will use this to see if our weakly supervised method improved the accuracy rate. 

#**C) Team Members and Contributions**

- Hardik Paliwal
  - Created CNN based on VGG11 for training
  - Created function to gather features (feature embedings) from pretrained CNN (trained on M labeled images)
  - Created method to get predicted labels from clustering methods
  - Created train,test methods

- Lance Pereira
  - Create function to retrain CNN using clustered labels
  - Modified clustering prediction methods to use only labeled training data
  - Created experiments for diffirent ratios of M:M-N
  - Created function to split data into labeled and unlabeled
  - Created experiments for diffirent clustering methods
    - Mini Batch Kmeans
    - Kmeans
  - Wrote/formated report



#**D) External Code Libraries**

We used

- Pytorch
  - Because it was crucial for the quick training of our models
  - Allowed us to not have to deal with calculating back propogation
- Sklearn
  - Provided us a large range of clustering methods for quick experimentation
- Numpy
  - Useful for large matrix operations

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision as tv
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torch.autograd import Variable
import sklearn
from torch import optim
import numpy as np
import itertools

%matplotlib inline

# **E) Code**

### **E) [1] Code: Setup**

The below cells:
- Import Fshion MNIST
- define functions that strip label from data

In [2]:
# Constants
dev=torch.device("cuda") 
NUM_EPOCHS = 4
NUM_CLUSTERS = 30
NUM_CLASSES = 10
UNKNOWN_CLASS = 11

In [3]:
!mkdir -p data/t1
!mkdir -p data/t2
!mkdir -p data/t3
!mkdir -p data/t4
!mkdir -p data/t5
!mkdir -p data/t6

In [4]:
# trainset = tv.datasets.FashionMNIST(root="./", download=True,train=True,  transform=tv.transforms.Compose(
#     [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
# trainloader = DataLoader(trainset, batch_size=128, shuffle=False)

testset = tv.datasets.FashionMNIST(root="./", download=True,train=False,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
testloader = DataLoader(testset, batch_size=128, shuffle=True)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./FashionMNIST/raw/train-images-idx3-ubyte.gz to ./FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./FashionMNIST/raw



### **E) [2] Code: Define our Base CNN (Based on VGG11), Test, Train methods**

In [5]:
#This is an implementation of VGG11 (which is a precursor to VGG16) for mnist dataset.
# it also takes in n, which is the number of classes. N+1 class stands for unknown. 

#this will let us differeniate the unlabelled data from the labelled data
class BasicNet(nn.Module):
    def __init__(self, n=9):
        super(BasicNet, self).__init__()
        self.batchNorm = [nn.BatchNorm2d(64), nn.BatchNorm2d(128),nn.BatchNorm2d(256), nn.BatchNorm2d(256),
                          nn.BatchNorm2d(512), nn.BatchNorm2d(512), nn.BatchNorm2d(512), nn.BatchNorm2d(512)]
        self.conv = [
        nn.Conv2d(1, 64, 3, 1, 1) ,nn.Conv2d(64, 128, 3, 1, 1), nn.Conv2d(128, 256, 3, 1, 1), nn.Conv2d(256, 256, 3, 1, 1)
       ,nn.Conv2d(256, 512, 3, 1, 1), nn.Conv2d(512, 512, 3, 1, 1), nn.Conv2d(512, 512, 3, 1, 1), nn.Conv2d(512, 512, 3, 1, 1)
        ]
        maxPool = nn.MaxPool2d(2, stride=2)
        self.conv1 = nn.Sequential(self.conv[0], self.batchNorm[0], nn.ReLU(), maxPool)
        self.conv2 = nn.Sequential(self.conv[1], self.batchNorm[1], nn.ReLU(), maxPool)
        self.conv3 = nn.Sequential(self.conv[2], self.batchNorm[2], nn.ReLU())
        self.conv4 = nn.Sequential(self.conv[3], self.batchNorm[3], nn.ReLU(), maxPool) 
        self.conv5 = nn.Sequential(self.conv[4], self.batchNorm[4], nn.ReLU())
        self.conv6 = nn.Sequential(self.conv[5], self.batchNorm[5], nn.ReLU(), maxPool)
        self.conv7 = nn.Sequential(self.conv[6], self.batchNorm[6], nn.ReLU()) 
        self.conv8 = nn.Sequential(self.conv[7], self.batchNorm[7], nn.ReLU(), maxPool)
        self.fc1 = nn.Linear(512, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, n+1)
        
    def forward(self, x, feature_embedding=False):
        dropOut = nn.Dropout(p=0.5)

        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = self.conv7(x)
        x = self.conv8(x)
        x = torch.flatten(x, 1)
        
        x = dropOut(F.relu(self.fc1(x)))
        x = dropOut(F.relu(self.fc2(x)))
        if(feature_embedding):
          return x
        #Not sure why this works without softmax. probably a reasoning givin in the paper (cause the output can range from anything (not normalized to a 0-1 probability range))
        x = self.fc3(x)
        return x

In [6]:
def test(data, net):
    net.eval()
    loss_func = nn.CrossEntropyLoss()

    total_correct = 0
    total_loss = 0
    with torch.no_grad():
        correct = 0
        total = 0
        for i, (images, labels) in enumerate(data):
            images= images.to(dev)
            labels = labels.to(dev)
            test_pred = net(images)

            pred = torch.max(test_pred, 1)[1].data.squeeze()
            total_correct+= (pred == labels).sum().item()
            loss = loss_func( test_pred, labels)
            total_loss+= loss.item()*images.size(0)
        # return total_correct/len(data.dataset), total_loss/len(data.dataset)
        # Note From Lance: We shouldn't divide the total loss
        return total_correct/len(data.dataset), total_loss
  
def train(num_epochs, net, trainloader):
    optimizer = optim.SGD(net.parameters(), lr=0.01)
    loss_func = nn.CrossEntropyLoss()

    accuracy_through_epochs = []
    total_step = len(trainloader)
    
    for epoch in range(num_epochs):
        net.train()
        for i, (images, labels) in enumerate(trainloader):
            images= images.to(dev)
            labels = labels.to(dev)
            optimizer.zero_grad()           
            prediction = net(images)
            loss = loss_func( prediction, labels)
            loss.backward()
            optimizer.step()
            if ((i +1) % 100 == 0):
                print(f"Epoch {epoch+1} / {num_epochs}, Step {i+1}/ {total_step} , Loss {loss.item()}")

    return accuracy_through_epochs, net

### **E) [3] Code: Split Fashion MNIST training data into M labeled images, and N-M unlabeled**

In [7]:
#Modifies dataset in place to only have values correspounding to the labels in classesToUse
def splitTrainingData(training_data, M_percent):
  # N is len(training_data)
  len_N = len(training_data)

  # M is the number of labeled images we want
  len_M = int(M_percent*len_N)
  
  labeled_data, unlabeled_data = torch.utils.data.random_split(training_data, [len_M, len_N - len_M])

  # strip the labels from unlabeled_data
  # unlabeled_data.dataset.targets[unlabeled_data.indices] = UNKNOWN_CLASS

  labeled_data_loader = DataLoader(labeled_data, batch_size=128, shuffle=True)
  unlabeled_data_loader = DataLoader(unlabeled_data, batch_size=128, shuffle=True)

  return labeled_data_loader, unlabeled_data_loader, labeled_data, unlabeled_data


In [8]:
# labeled_train_loader, unlabeled_train_loader, labeled_train_data, unlabeled_train_data = splitTrainingData(trainset, 0.7)

### **E) [4] Code: Train Base CNN on M labeled training images**

In [9]:
# This will be done in results baseline

# net = BasicNet()
# net.to(dev)
# result, trained_net = train(NUM_EPOCHS, net, labeled_train_loader)

### **E) [5] Code: Use the 'semi' trained CNN to gain important features (feature embedings) of N-M *unlabeled* training images**

In [10]:
#do this to store the results in 1 numpy array of 10000 images vs like 20 batches of size 128 images. 
#get memory error when doing it on a batch of size len(trainloader), so we have to combine the results for trainloader

def getFeatureEmbedings(dataloader, pretrained_net):
  featureEmbed = []
  predictedDigit = []

  with torch.no_grad():
    for i, (images, labels) in enumerate(dataloader):
        images= images.to(dev)
        labels = labels.to(dev)
        featureEmbed.append(pretrained_net(images, feature_embedding=True).to("cpu").numpy())
        pred = pretrained_net(images)
        predictedDigit.append(torch.max(pred, 1)[1].data.squeeze().to("cpu").numpy())

  # flatten lists
  featureEmbed = np.array(list(itertools.chain(*featureEmbed)))
  predictedDigit = np.array(list(itertools.chain(*predictedDigit)))

  return featureEmbed, predictedDigit

In [11]:
# We need the data with the labels
# trainFeatureEmbed, trainPredictedDigit = getFeatureEmbedings(trainloader)

### **E) [6] Code: Match cluster labels to actual Fashion MNIST labels**

In [12]:
#In order to see how well k-means did we can use this supervised method of defining what a cluster is by seting the cluster label as the most common digits in that cluster
#unsupervised approaches include: manually selecting class depending on mean image
def retrieve_cluster_to_classification(cluster_labels,y_train):
  reference_labels = {}
# For loop to run through each label of cluster label
  for i in range(len(np.unique(kmeans.labels_))):
    index = np.where(cluster_labels == i,1,0)
    # we only read 0:NUM_CLASSES so we dont read the unknown labels
    num = np.bincount(y_train[index==1])[:NUM_CLASSES].argmax()
    reference_labels[i] = num
    # TODO: Right now the refrence label just maps to the majority label, should we also consider the 2nd and 3rd highest
  return reference_labels

### **E) [7] Code: Define retrained CNN function using predicted labels from clustering for unlabeled data**



In [13]:
def retrain_CNN_with_predicted_labels(train_dataset, train_loader, my_labeled_trainset, my_labeled_trainset_loader, my_unlabeled_trainset, unlabeled_train_loader, cluster_method):
  
  print("Training PRETRAINED NET: ")
  # train on labeled data
  pretrained_net = BasicNet()
  pretrained_net.to(dev)
  result, trained_net = train(NUM_EPOCHS, pretrained_net, my_labeled_trainset_loader)

  # remove all unlabeled labels from main training set, set them to NUM_CLASSES
  train_targets = np.array(train_dataset.targets.numpy(), copy=True)  
  train_targets[my_unlabeled_trainset.indices] = NUM_CLASSES
  
  trainFeatureEmbed, _ = getFeatureEmbedings(train_loader, pretrained_net)
  unlabeledTrainFeatureEmbed, _ = getFeatureEmbedings(unlabeled_train_loader, pretrained_net)

  cluster_method.fit(trainFeatureEmbed)

  # mapping from NUM_CLUSTERS to FASHION_MNIST classes
  reference_labels = retrieve_cluster_to_classification(
      cluster_method.labels_, 
      train_targets
  )
  
  # Assign new labels
  predicted_test = cluster_method.predict(unlabeledTrainFeatureEmbed)
  
  for i in range(unlabeledTrainFeatureEmbed.shape[0]):
    my_unlabeled_trainset.dataset.targets[my_unlabeled_trainset.indices[i]] = reference_labels[predicted_test[i]]

  new_combined_data = torch.utils.data.ConcatDataset([my_labeled_trainset, my_unlabeled_trainset])
  new_combined_data_loader = DataLoader(new_combined_data, batch_size=128, shuffle=True)
  print("\nTraining RE-TRAINED NET: ")
  retrained_net = BasicNet()
  retrained_net.to(dev)
  result, retrained_net = train(NUM_EPOCHS, retrained_net, new_combined_data_loader)
  return retrained_net



#**Experiments**


###Experiment A) First we will create three experiments using diffirent ratios of M:N-M


1.   Ratio of 70% labeled to 30% unlabeled
2.   Ratio of 50% labeled to 50% unlabeled
3.   Ratio of 30% labeled to 70% unlabeled




In [14]:
# Experiment A: Diffirent Ratios of M to N

trainset1 = tv.datasets.FashionMNIST(root="./data/t1/", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader1 = DataLoader(trainset1, batch_size=128, shuffle=False)

trainset2 = tv.datasets.FashionMNIST(root="./data/t2/", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader2 = DataLoader(trainset2, batch_size=128, shuffle=False)

trainset3 = tv.datasets.FashionMNIST(root="./data/t3/", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader3 = DataLoader(trainset3, batch_size=128, shuffle=False)


Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/t1/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./data/t1/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/t1/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/t1/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./data/t1/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/t1/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/t1/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./data/t1/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/t1/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/t1/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./data/t1/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/t1/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/t2/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./data/t2/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/t2/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/t2/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./data/t2/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/t2/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/t2/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./data/t2/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/t2/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/t2/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./data/t2/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/t2/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/t3/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./data/t3/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/t3/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/t3/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./data/t3/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/t3/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/t3/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./data/t3/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/t3/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/t3/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./data/t3/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/t3/FashionMNIST/raw



###Experiment B) Then we will try training it with diffirent clustering methods

In [15]:
from sklearn.cluster import MiniBatchKMeans

kmeans = MiniBatchKMeans(n_clusters = NUM_CLUSTERS)

In [16]:
from sklearn.cluster import KMeans

gmm = KMeans(n_clusters = NUM_CLUSTERS)

###Experiment C) Train all the models combining diffirent experiments

In [17]:
labeled_train_loader_70, unlabeled_train_loader_30, labeled_train_data_70, unlabeled_train_data_30 = splitTrainingData(trainset1, 0.7)

retrained_net_kmeans_70_label_30_not = retrain_CNN_with_predicted_labels(
    trainset1, 
    trainloader1, 
    labeled_train_data_70, 
    labeled_train_loader_70,
    unlabeled_train_data_30, 
    unlabeled_train_loader_30, 
    kmeans
  )

accuracy_kmeans_70_label_30_not, loss = test(testloader, retrained_net_kmeans_70_label_30_not)


Training PRETRAINED NET: 
Epoch 1 / 4, Step 100/ 329 , Loss 0.7010186314582825
Epoch 1 / 4, Step 200/ 329 , Loss 0.34918907284736633
Epoch 1 / 4, Step 300/ 329 , Loss 0.33810415863990784
Epoch 2 / 4, Step 100/ 329 , Loss 0.3904504179954529
Epoch 2 / 4, Step 200/ 329 , Loss 0.290596067905426
Epoch 2 / 4, Step 300/ 329 , Loss 0.32221829891204834
Epoch 3 / 4, Step 100/ 329 , Loss 0.2152564823627472
Epoch 3 / 4, Step 200/ 329 , Loss 0.2274576723575592
Epoch 3 / 4, Step 300/ 329 , Loss 0.24228908121585846
Epoch 4 / 4, Step 100/ 329 , Loss 0.2220127284526825
Epoch 4 / 4, Step 200/ 329 , Loss 0.10616663098335266
Epoch 4 / 4, Step 300/ 329 , Loss 0.2788299322128296

Training RE-TRAINED NET: 
Epoch 1 / 4, Step 100/ 469 , Loss 1.518364429473877
Epoch 1 / 4, Step 200/ 469 , Loss 1.3597995042800903
Epoch 1 / 4, Step 300/ 469 , Loss 1.3184791803359985
Epoch 1 / 4, Step 400/ 469 , Loss 1.2740817070007324
Epoch 2 / 4, Step 100/ 469 , Loss 1.1837360858917236
Epoch 2 / 4, Step 200/ 469 , Loss 1.2902848

In [18]:
labeled_train_loader_50, unlabeled_train_loader_50, labeled_train_data_50, unlabeled_train_data_50 = splitTrainingData(trainset2, 0.5)

retrained_net_kmeans_50_label_50_not = retrain_CNN_with_predicted_labels(
    trainset2, 
    trainloader2,
    labeled_train_data_50, 
    labeled_train_loader_50,
    unlabeled_train_data_50, 
    unlabeled_train_loader_50, 
    kmeans
  )

accuracy_kmeans_50_label_50_not, loss = test(testloader, retrained_net_kmeans_50_label_50_not)


Training PRETRAINED NET: 
Epoch 1 / 4, Step 100/ 235 , Loss 0.5326619148254395
Epoch 1 / 4, Step 200/ 235 , Loss 0.3434411287307739
Epoch 2 / 4, Step 100/ 235 , Loss 0.3386166989803314
Epoch 2 / 4, Step 200/ 235 , Loss 0.3593297004699707
Epoch 3 / 4, Step 100/ 235 , Loss 0.20776411890983582
Epoch 3 / 4, Step 200/ 235 , Loss 0.2451578825712204
Epoch 4 / 4, Step 100/ 235 , Loss 0.20684514939785004
Epoch 4 / 4, Step 200/ 235 , Loss 0.23769143223762512

Training RE-TRAINED NET: 
Epoch 1 / 4, Step 100/ 469 , Loss 1.541312336921692
Epoch 1 / 4, Step 200/ 469 , Loss 1.3205475807189941
Epoch 1 / 4, Step 300/ 469 , Loss 1.1669249534606934
Epoch 1 / 4, Step 400/ 469 , Loss 1.4243630170822144
Epoch 2 / 4, Step 100/ 469 , Loss 1.2065227031707764
Epoch 2 / 4, Step 200/ 469 , Loss 1.1233632564544678
Epoch 2 / 4, Step 300/ 469 , Loss 1.1760815382003784
Epoch 2 / 4, Step 400/ 469 , Loss 1.116479754447937
Epoch 3 / 4, Step 100/ 469 , Loss 1.1860990524291992
Epoch 3 / 4, Step 200/ 469 , Loss 1.182968616

In [19]:
labeled_train_loader_30, unlabeled_train_loader_70, labeled_train_data_30, unlabeled_train_data_70 = splitTrainingData(trainset3, 0.3)

retrained_net_kmeans_30_label_70_not = retrain_CNN_with_predicted_labels(
    trainset3, 
    trainloader3,
    labeled_train_data_30, 
    labeled_train_loader_30,
    unlabeled_train_data_70, 
    unlabeled_train_loader_70, 
    kmeans
  )

accuracy_kmeans_30_label_70_not, loss = test(testloader, retrained_net_kmeans_30_label_70_not)


Training PRETRAINED NET: 
Epoch 1 / 4, Step 100/ 141 , Loss 0.6282566785812378
Epoch 2 / 4, Step 100/ 141 , Loss 0.36442428827285767
Epoch 3 / 4, Step 100/ 141 , Loss 0.25593528151512146
Epoch 4 / 4, Step 100/ 141 , Loss 0.16836588084697723

Training RE-TRAINED NET: 
Epoch 1 / 4, Step 100/ 469 , Loss 2.110738515853882
Epoch 1 / 4, Step 200/ 469 , Loss 1.8764426708221436
Epoch 1 / 4, Step 300/ 469 , Loss 1.9614042043685913
Epoch 1 / 4, Step 400/ 469 , Loss 1.964616060256958
Epoch 2 / 4, Step 100/ 469 , Loss 1.9200505018234253
Epoch 2 / 4, Step 200/ 469 , Loss 1.8790396451950073
Epoch 2 / 4, Step 300/ 469 , Loss 1.827128529548645
Epoch 2 / 4, Step 400/ 469 , Loss 1.8143970966339111
Epoch 3 / 4, Step 100/ 469 , Loss 1.896414875984192
Epoch 3 / 4, Step 200/ 469 , Loss 1.8787328004837036
Epoch 3 / 4, Step 300/ 469 , Loss 1.956749439239502
Epoch 3 / 4, Step 400/ 469 , Loss 1.9459476470947266
Epoch 4 / 4, Step 100/ 469 , Loss 1.9078254699707031
Epoch 4 / 4, Step 200/ 469 , Loss 1.908478617668

In [20]:
trainset4 = tv.datasets.FashionMNIST(root="./data/t4/", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader4 = DataLoader(trainset4, batch_size=128, shuffle=False)

trainset5 = tv.datasets.FashionMNIST(root="./data/t5/", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader5 = DataLoader(trainset5, batch_size=128, shuffle=False)

trainset6 = tv.datasets.FashionMNIST(root="./data/t6/", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader6 = DataLoader(trainset6, batch_size=128, shuffle=False)


Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/t4/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./data/t4/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/t4/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/t4/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./data/t4/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/t4/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/t4/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./data/t4/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/t4/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/t4/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./data/t4/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/t4/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/t5/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./data/t5/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/t5/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/t5/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./data/t5/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/t5/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/t5/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./data/t5/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/t5/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/t5/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./data/t5/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/t5/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/t6/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting ./data/t6/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/t6/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/t6/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting ./data/t6/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/t6/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/t6/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting ./data/t6/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/t6/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/t6/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting ./data/t6/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/t6/FashionMNIST/raw



In [21]:
gmm_labeled_train_loader_70, gmm_unlabeled_train_loader_30, gmm_labeled_train_data_70, gmm_unlabeled_train_data_30 = splitTrainingData(trainset4, 0.7)

retrained_net_gmm_70_label_30_not = retrain_CNN_with_predicted_labels(
    trainset4, 
    trainloader4,
    gmm_labeled_train_data_70, 
    gmm_labeled_train_loader_70,
    gmm_unlabeled_train_data_30, 
    gmm_unlabeled_train_loader_30, 
    gmm
)

accuracy_gmm_70_label_30_not, loss = test(testloader, retrained_net_gmm_70_label_30_not)


Training PRETRAINED NET: 
Epoch 1 / 4, Step 100/ 329 , Loss 0.7938423752784729
Epoch 1 / 4, Step 200/ 329 , Loss 0.341974675655365
Epoch 1 / 4, Step 300/ 329 , Loss 0.27451568841934204
Epoch 2 / 4, Step 100/ 329 , Loss 0.33505597710609436
Epoch 2 / 4, Step 200/ 329 , Loss 0.32659855484962463
Epoch 2 / 4, Step 300/ 329 , Loss 0.20678970217704773
Epoch 3 / 4, Step 100/ 329 , Loss 0.19777457416057587
Epoch 3 / 4, Step 200/ 329 , Loss 0.24567830562591553
Epoch 3 / 4, Step 300/ 329 , Loss 0.24411669373512268
Epoch 4 / 4, Step 100/ 329 , Loss 0.20997707545757294
Epoch 4 / 4, Step 200/ 329 , Loss 0.2560785114765167
Epoch 4 / 4, Step 300/ 329 , Loss 0.15490369498729706

Training RE-TRAINED NET: 
Epoch 1 / 4, Step 100/ 469 , Loss 1.6296995878219604
Epoch 1 / 4, Step 200/ 469 , Loss 1.4348787069320679
Epoch 1 / 4, Step 300/ 469 , Loss 1.3778351545333862
Epoch 1 / 4, Step 400/ 469 , Loss 1.2167047262191772
Epoch 2 / 4, Step 100/ 469 , Loss 1.4363646507263184
Epoch 2 / 4, Step 200/ 469 , Loss 1.35

In [22]:
gmm_labeled_train_loader_50, gmm_unlabeled_train_loader_50, gmm_labeled_train_data_50, gmm_unlabeled_train_data_50 = splitTrainingData(trainset5, 0.5)

retrained_net_gmm_50_label_50_not = retrain_CNN_with_predicted_labels(
    trainset5,
    trainloader5, 
    gmm_labeled_train_data_50, 
    gmm_labeled_train_loader_50,
    gmm_unlabeled_train_data_50, 
    gmm_unlabeled_train_loader_50, 
    gmm
  )

accuracy_gmm_50_label_50_not, loss = test(testloader, retrained_net_gmm_50_label_50_not)

Training PRETRAINED NET: 
Epoch 1 / 4, Step 100/ 235 , Loss 0.5064795017242432
Epoch 1 / 4, Step 200/ 235 , Loss 0.5110149383544922
Epoch 2 / 4, Step 100/ 235 , Loss 0.3052040934562683
Epoch 2 / 4, Step 200/ 235 , Loss 0.2788132131099701
Epoch 3 / 4, Step 100/ 235 , Loss 0.3539171814918518
Epoch 3 / 4, Step 200/ 235 , Loss 0.22007620334625244
Epoch 4 / 4, Step 100/ 235 , Loss 0.22114971280097961
Epoch 4 / 4, Step 200/ 235 , Loss 0.2535540461540222

Training RE-TRAINED NET: 
Epoch 1 / 4, Step 100/ 469 , Loss 1.8795806169509888
Epoch 1 / 4, Step 200/ 469 , Loss 2.072462797164917
Epoch 1 / 4, Step 300/ 469 , Loss 1.9363656044006348
Epoch 1 / 4, Step 400/ 469 , Loss 1.7415788173675537
Epoch 2 / 4, Step 100/ 469 , Loss 1.6757205724716187
Epoch 2 / 4, Step 200/ 469 , Loss 1.6719120740890503
Epoch 2 / 4, Step 300/ 469 , Loss 1.8761194944381714
Epoch 2 / 4, Step 400/ 469 , Loss 1.6647443771362305
Epoch 3 / 4, Step 100/ 469 , Loss 1.640655755996704
Epoch 3 / 4, Step 200/ 469 , Loss 1.7873338460

In [23]:
gmm_labeled_train_loader_30, gmm_unlabeled_train_loader_70, gmm_labeled_train_data_30, gmm_unlabeled_train_data_70 = splitTrainingData(trainset6, 0.3)

retrained_net_gmm_30_label_70_not = retrain_CNN_with_predicted_labels(
    trainset6, 
    trainloader6,
    gmm_labeled_train_data_30, 
    gmm_labeled_train_loader_30,
    gmm_unlabeled_train_data_70, 
    gmm_unlabeled_train_loader_70, 
    gmm
  )

accuracy_gmm_30_label_70_not, loss = test(testloader, retrained_net_gmm_30_label_70_not)

Training PRETRAINED NET: 
Epoch 1 / 4, Step 100/ 141 , Loss 0.5540263652801514
Epoch 2 / 4, Step 100/ 141 , Loss 0.3406590223312378
Epoch 3 / 4, Step 100/ 141 , Loss 0.3763241469860077
Epoch 4 / 4, Step 100/ 141 , Loss 0.21889708936214447

Training RE-TRAINED NET: 
Epoch 1 / 4, Step 100/ 469 , Loss 2.196823835372925
Epoch 1 / 4, Step 200/ 469 , Loss 2.1747653484344482
Epoch 1 / 4, Step 300/ 469 , Loss 2.1730053424835205
Epoch 1 / 4, Step 400/ 469 , Loss 2.1350526809692383
Epoch 2 / 4, Step 100/ 469 , Loss 2.121948003768921
Epoch 2 / 4, Step 200/ 469 , Loss 2.026075839996338
Epoch 2 / 4, Step 300/ 469 , Loss 2.1015124320983887
Epoch 2 / 4, Step 400/ 469 , Loss 2.140540361404419
Epoch 3 / 4, Step 100/ 469 , Loss 2.154820680618286
Epoch 3 / 4, Step 200/ 469 , Loss 2.2382009029388428
Epoch 3 / 4, Step 300/ 469 , Loss 2.0711488723754883
Epoch 3 / 4, Step 400/ 469 , Loss 2.1238675117492676
Epoch 4 / 4, Step 100/ 469 , Loss 1.8937480449676514
Epoch 4 / 4, Step 200/ 469 , Loss 2.14036154747009

#**Results**


###First we will look at the **baseline** results from the CNN trained on M labeled images

In [24]:
trainset = tv.datasets.FashionMNIST(root="./", download=True,train=True,  transform=tv.transforms.Compose(
    [tv.transforms.Resize(32), tv.transforms.ToTensor()]))
trainloader = DataLoader(trainset, batch_size=128, shuffle=False)

In [25]:
baseline_train_loader_70, baseline_train_loader_unlabeled_30, _, _ = splitTrainingData(trainset, 0.7)
net_70 = BasicNet()
net_70.to(dev)

result_70, trained_net_70 = train(NUM_EPOCHS, net_70, baseline_train_loader_70)

baseline_accuracy_70, _ = test(testloader, net_70)

Epoch 1 / 4, Step 100/ 329 , Loss 0.6264896392822266
Epoch 1 / 4, Step 200/ 329 , Loss 0.4041905403137207
Epoch 1 / 4, Step 300/ 329 , Loss 0.3113798499107361
Epoch 2 / 4, Step 100/ 329 , Loss 0.30991795659065247
Epoch 2 / 4, Step 200/ 329 , Loss 0.30522432923316956
Epoch 2 / 4, Step 300/ 329 , Loss 0.3041810393333435
Epoch 3 / 4, Step 100/ 329 , Loss 0.26886269450187683
Epoch 3 / 4, Step 200/ 329 , Loss 0.22970172762870789
Epoch 3 / 4, Step 300/ 329 , Loss 0.2266496866941452
Epoch 4 / 4, Step 100/ 329 , Loss 0.15625521540641785
Epoch 4 / 4, Step 200/ 329 , Loss 0.2137596607208252
Epoch 4 / 4, Step 300/ 329 , Loss 0.20167972147464752


In [26]:
baseline_train_loader_50, _, _, _ = splitTrainingData(trainset, 0.5)
net_50 = BasicNet()
net_50.to(dev)

result_50, trained_net_50 = train(NUM_EPOCHS, net_50, baseline_train_loader_50)

baseline_accuracy_50, _ = test(testloader, net_50)

Epoch 1 / 4, Step 100/ 235 , Loss 0.5154503583908081
Epoch 1 / 4, Step 200/ 235 , Loss 0.5405824780464172
Epoch 2 / 4, Step 100/ 235 , Loss 0.3780803382396698
Epoch 2 / 4, Step 200/ 235 , Loss 0.3361324369907379
Epoch 3 / 4, Step 100/ 235 , Loss 0.1479775756597519
Epoch 3 / 4, Step 200/ 235 , Loss 0.19431744515895844
Epoch 4 / 4, Step 100/ 235 , Loss 0.21139177680015564
Epoch 4 / 4, Step 200/ 235 , Loss 0.16650772094726562


In [27]:
net_30_v2 = BasicNet()
net_30_v2.to(dev)

# not actually unlabeled data, just split
result_30, trained_net_30 = train(NUM_EPOCHS, net_30_v2, baseline_train_loader_unlabeled_30)

baseline_accuracy_30, _ = test(testloader, net_30_v2)

Epoch 1 / 4, Step 100/ 141 , Loss 0.5600244402885437
Epoch 2 / 4, Step 100/ 141 , Loss 0.4201190769672394
Epoch 3 / 4, Step 100/ 141 , Loss 0.23710647225379944
Epoch 4 / 4, Step 100/ 141 , Loss 0.2247791588306427


In [28]:
print(f"Our accuracy with just the CNN on 70% data is: {baseline_accuracy_70} on test set from training on trainset" )
print(f"Our accuracy with just the CNN on 50% data is: {baseline_accuracy_50} on test set from training on trainset" )
print(f"Our accuracy with just the CNN on 30% data is: {baseline_accuracy_30} on test set from training on trainset" )

Our accuracy with just the CNN on 70% data is: 0.7691 on test set from training on trainset
Our accuracy with just the CNN on 50% data is: 0.8214 on test set from training on trainset
Our accuracy with just the CNN on 30% data is: 0.8554 on test set from training on trainset


### Next we will look at all the other methods that use weakly supervised learning, we aim to be better than the baseline in each


In [29]:
print(f"Our accuracy with the CNN and Mini-Batch Kmeans clustering with 70% labeled, 30% unlabeled is: {accuracy_kmeans_70_label_30_not} on test set" )

Our accuracy with the CNN and Mini-Batch Kmeans clustering with 70% labeled, 30% unlabeled is: 0.895 on test set


In [30]:
print(f"Our accuracy with the CNN and Mini-Batch Kmeans clustering with 50% labeled, 50% unlabeled is: {accuracy_kmeans_50_label_50_not} on test set" )

Our accuracy with the CNN and Mini-Batch Kmeans clustering with 50% labeled, 50% unlabeled is: 0.767 on test set


In [31]:
print(f"Our accuracy with the CNN and Mini-Batch Kmeans clustering with 30% labeled, 70% unlabeled is: {accuracy_kmeans_30_label_70_not} on test set" )

Our accuracy with the CNN and Mini-Batch Kmeans clustering with 30% labeled, 70% unlabeled is: 0.7913 on test set


In [32]:
print(f"Our accuracy with the CNN and Kmeans clustering with 70% labeled, 30% unlabeled is: {accuracy_gmm_70_label_30_not} on test set from training on trainset" )

Our accuracy with the CNN and Kmeans clustering with 70% labeled, 30% unlabeled is: 0.8842 on test set from training on trainset


In [33]:
print(f"Our accuracy with the CNN and Kmeans clustering with 50% labeled, 50% unlabeled is: {accuracy_gmm_50_label_50_not} on test set from training on trainset" )

Our accuracy with the CNN and Kmeans clustering with 50% labeled, 50% unlabeled is: 0.8632 on test set from training on trainset


In [34]:
print(f"Our accuracy with the CNN and Kmeans clustering with 30% labeled, 70% unlabeled is: {accuracy_gmm_30_label_70_not} on test set from training on trainset" )

Our accuracy with the CNN and Kmeans clustering with 30% labeled, 70% unlabeled is: 0.8547 on test set from training on trainset


#**Conclusions**

There were some positive results from our experiments. We found Mini-Batch Kmeans with a ratio of 70% labeled, 30% unlabeled yielded a 89% accuracy, which was 13% better than the baseline that was trained on just 70% of the data. We saw almost equal results with the Kmeans at 88% accuracy.

We also saw that with 50% labeled that Mini-Batch kmeans did worse than the baseline CNN trained at 50% of the data, Minibatch Kmeans did 76% against the 82% baseline. However the Kmeans at 50% did 4% better than the baseline with 86% accuracy.

Lastly we saw with 30% labeled data that Mini batch kmeans did worse again 79% compared to the 85% baseline. However we saw that pure K means provided basically equivalent results.


We were quite surprised by our results. We were happy that 70% labeled, 30% unlabeled yielded better results 4% better than the highest baseline CNN, and 13% better than an equally label trained CNN. But we were also happy to find that with 30% labeled, 70% unlabeled that the accuracy wasn't that bad. We still wonder why the baseline CNN for 30% labeled did so accuractely, but assume it was just to luck.


Regarding clustering methods we were limited in which clustering methods we could use due to RAM capacity and learning times. GMM was used originally used but it would take 2 hours to train with a GPU and High RAM settings on Google Colab, and would often crash. DBSCAN, and OPTICS models were also attempted but faced similar issues. 


One reason we believe that the results weren't as strong as we expected is that we randomly split the data into M% labeled and (N-M)% unlabeled, without taking into consideration the labels of the data. This could have lead to hotspots of certain label types being placed in the unlabeled set, leading to poor classification results.


Another reason we believe that could explain the result, is that we performed clustering on the feature emebedings of the images. Meaning that after we trained our original CNN on M labeled images, we used it to get a vector of size 4096 from each image in the training set. We performed clustering on the feature embeded version of each image, and 4096 might have been too small of a feature to encode all the data of an image required to cluster it properly.

Lastly we believe that the way that we assigned labels to the unlabeled images might be improved. We used the mode label in each cluster (from the labeled images in the cluster). We believe that in the future we should have only labeled images with a probability greater than a certain threshold as the label, as there were 30 clusters, and noisy/inacurate labels could have been introduced.