## **Baseline Testing**

This notebook is a lightly modified copy of the evaluation notebook. It runs the Hugging Face zero-shot image classification pipeline instead of the project's own model.

## Importing libraries and loading the model for evaluation

In [1]:
from fastbook import *
from glob import glob
from pathlib import Path
from sklearn.metrics import precision_recall_fscore_support, accuracy_score, roc_auc_score, classification_report, confusion_matrix 
from tqdm.notebook import tqdm

In [37]:
# The fastai learner is loaded only for its class vocabulary; it makes no predictions here.
learn_inf = load_learner('dbc_resnet50_fastai_bigv3.5-cleaned.pkl')
categories = learn_inf.dls.vocab

In [None]:
%cd eval

In [None]:
!pwd
!find . -type f ! -name '*.jpg' -delete
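
The `find ... -delete` command above is destructive, so it can help to preview what it would remove before running it. A minimal sketch, assuming the same rule (every regular file whose name does not end in `.jpg`, case-sensitive); `non_jpg_files` is a hypothetical helper and not part of the original notebook:

```python
from pathlib import Path


def non_jpg_files(root):
    """Return regular files under root whose names do not end in '.jpg'.

    Roughly mirrors `find . -type f ! -name '*.jpg'` (files only,
    case-sensitive), but lists the matches instead of deleting them.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and not p.name.endswith(".jpg")
    )
```

Directories such as `.ipynb_checkpoints` are not regular files, so neither the shell command nor this sketch touches them. That may be why the evaluation loop later still reports skipped entries.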

## Initializing huggingface pipeline

In [9]:
!pip install transformers -q
import os
from transformers import pipeline
from statistics import mean
from tqdm.notebook import tqdm
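# Zero-shot image classification with CLIP; device=0 selects the first GPU.
classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-large-patch14", device=0)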

classifier(
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.


[{'score': 0.9651920795440674, 'label': 'animals'},
 {'score': 0.029521921649575233, 'label': 'humans'},
 {'score': 0.005286007188260555, 'label': 'landscape'}]

In [10]:
import torch
if torch.cuda.is_available():
  print(torch.cuda.get_device_name(0))


Tesla T4


## Running The Evaluation

In [31]:
def get_topk(model_output, k=3, out_type="dict"):
    """Return the first k predictions from the pipeline output.

    out_type="dict" returns the raw {'score', 'label'} dicts;
    out_type="list" returns only the label strings.
    """
    top_k = model_output[:k]
    if out_type == "dict":
        return top_k
    if out_type == "list":
        return [item['label'] for item in top_k]
    raise ValueError(f"Invalid output type: {out_type!r}")

In [18]:
justTesting = classifier("cat.png", candidate_labels=categories)

In [32]:
print(get_topk(justTesting, out_type="list"))

['standardissuecat', 'burmesecats', 'ExoticShorthair']
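
Taking the top k by slicing works only if the candidates arrive sorted by descending score, which is what the parrots example output earlier suggests. A minimal sketch that does not rely on that ordering, using the scores from that example as sample data; `top_k_labels` is a hypothetical name:

```python
def top_k_labels(model_output, k=3):
    """Return the k highest-scoring labels from a list of {'score', 'label'} dicts."""
    ranked = sorted(model_output, key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in ranked[:k]]


# Scores copied from the parrots example output, deliberately shuffled.
sample_output = [
    {"score": 0.029521921649575233, "label": "humans"},
    {"score": 0.9651920795440674, "label": "animals"},
    {"score": 0.005286007188260555, "label": "landscape"},
]

print(top_k_labels(sample_output, k=2))
```

Sorting explicitly costs little for 30 candidate labels and removes a silent dependency on the pipeline's output order.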


In [36]:
from tqdm.notebook import tqdm

%cd /home/studio-lab-user/sagemaker/eval

truthlist = []
predictionlist = []

print(f"Categories to evaluate: {len(learn_inf.dls.vocab)}")
# Each class has its own folder, named exactly like its vocab entry.
for folder in tqdm(sorted(learn_inf.dls.vocab)):
    os.chdir(folder)
    tqdm.write(f"Evaluating accuracy for folder {folder}")
    for file in tqdm(sorted(os.listdir('.'))):
        if file.endswith(".jpg"):
            pipeline_output = classifier(file, candidate_labels=categories)
            top3 = get_topk(pipeline_output, out_type="list")
            truthlist.append(folder)
            predictionlist.append(top3)
        else:
            print("Skipping Checkpoint File - Not an image")
    os.chdir('..')

%cd /home/studio-lab-user/sagemaker

/home/studio-lab-user/sagemaker/eval
Categories to evaluate: 30


Evaluating accuracy for folder CalicoKittys
Evaluating accuracy for folder CreamsicleCats
Evaluating accuracy for folder ExoticShorthair
Evaluating accuracy for folder Flamepoints
Evaluating accuracy for folder MunchkinCats
Evaluating accuracy for folder NorwegianForestCats
Evaluating accuracy for folder OneOrangeBraincell
Evaluating accuracy for folder Siamesecats
Evaluating accuracy for folder SiberianCats
Evaluating accuracy for folder Torbie
Evaluating accuracy for folder TuxedoCats
Evaluating accuracy for folder TwoFacedCats
Skipping Checkpoint File - Not an image
Evaluating accuracy for folder bengalcats
Evaluating accuracy for folder blackcats
Evaluating accuracy for folder britishshorthair
Evaluating accuracy for folder burmesecats
Evaluating accuracy for folder cowcats
Evaluating accuracy for folder devonrex
Evaluating accuracy for folder mainecoons
Evaluating accuracy for folder nebelung
Evaluating accuracy for folder orientalshorthair
Evaluating accuracy for folder persiancat
Evaluating accuracy for folder ragdolls
Evaluating accuracy for folder russianblue
Skipping Checkpoint File - Not an image
Evaluating accuracy for folder savannah_cats
Evaluating accuracy for folder sphynx
Evaluating accuracy for folder standardissuecat
Skipping Checkpoint File - Not an image
Evaluating accuracy for folder tortico
Evaluating accuracy for folder torties
Skipping Checkpoint File - Not an image
Evaluating accuracy for folder watercolorcats

/home/studio-lab-user/sagemaker
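
The loop above depends on `os.chdir`, so an exception partway through leaves the notebook inside a class folder. A sketch of the same walk with `pathlib` and no directory changes, assuming one sub-folder per class under the evaluation root; `collect_predictions` and the `classify` callable are hypothetical stand-ins for the pipeline call plus `get_topk`:

```python
from pathlib import Path


def collect_predictions(eval_root, class_names, classify):
    """Walk eval_root/<class>/*.jpg and collect truths and predictions.

    classify is any callable that takes an image path string and returns
    a list of predicted labels (assumed here to be the top-3 labels).
    """
    truths, predictions = [], []
    for class_name in sorted(class_names):
        class_dir = Path(eval_root) / class_name
        for image_path in sorted(class_dir.iterdir()):
            # Skip anything that is not a .jpg file, e.g. checkpoint folders.
            if not (image_path.is_file() and image_path.name.endswith(".jpg")):
                continue
            truths.append(class_name)
            predictions.append(classify(str(image_path)))
    return truths, predictions
```

With the notebook's objects this could be called roughly as `collect_predictions('eval', categories, lambda p: get_topk(classifier(p, candidate_labels=categories), out_type='list'))`.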


In [38]:
tbackup = truthlist.copy()
pbackup = predictionlist.copy()

In [39]:
# Top-3 acceptance: if the true label is among the top 3 predictions, record it
# as the accepted answer; otherwise fall back to the top-1 prediction.
acceptedlist = []
for truth, top3 in zip(truthlist, predictionlist):
    if truth in top3:
        acceptedlist.append(truth)
    else:
        acceptedlist.append(top3[0])
print("Done.")

Done.
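
The `acceptedlist` construction counts a prediction as correct when the true label appears anywhere in the top 3. The same idea can be expressed directly as top-k accuracy. A minimal sketch on made-up labels; `top_k_accuracy` is a hypothetical helper and the toy data is illustrative only, not taken from the evaluation set:

```python
def top_k_accuracy(truths, predictions, k):
    """Fraction of samples whose true label is among the first k predictions."""
    if not truths:
        raise ValueError("truths must not be empty")
    hits = sum(truth in preds[:k] for truth, preds in zip(truths, predictions))
    return hits / len(truths)


# Toy data: each prediction is a ranked list of three labels.
toy_truths = ["sphynx", "torties", "nebelung", "sphynx"]
toy_predictions = [
    ["sphynx", "devonrex", "torties"],       # hit at rank 1
    ["tortico", "torties", "Torbie"],        # hit at rank 2
    ["russianblue", "blackcats", "sphynx"],  # miss
    ["devonrex", "nebelung", "sphynx"],      # hit at rank 3
]

print(top_k_accuracy(toy_truths, toy_predictions, k=1))  # 0.25
print(top_k_accuracy(toy_truths, toy_predictions, k=3))  # 0.75
```

Reporting top-1 and top-3 side by side makes it clear how much of the score comes from the relaxed acceptance rule.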


## Saving the evaluation output to a JSON file
 - Allows further processing of the data later without re-running the model

In [40]:
import json

def dump_eval_predictions(truths, predictions, accepted, version):
    filename = f"cat-v{version}-eval.json"
    data = {"truthlist": truths, "predictionlist": predictions, "acceptedlist": accepted}
    with open(filename, "w") as file:
        json.dump(data, file, indent=2)
        # Report the sizes of the arguments that were written, not the notebook globals.
        print(f"Successfully Dumped Evaluation Data to file {filename}\nItem Count -> T:{len(truths)} P:{len(predictions)} A:{len(accepted)}")
        
        
def load_eval_predictions(version):
    filename = f"cat-v{version}-eval.json"
    with open(filename, "r") as file:
        data = json.load(file)
        print(f"Successfully Loaded Evaluation Data from file {filename}\nItem Count -> T:{len(data['truthlist'])} P:{len(data['predictionlist'])} A:{len(data['acceptedlist'])}")
        print(f"Usage: truthlist, predictionlist, acceptedlist = load_eval_predictions(#)\n")
        return [data['truthlist'], data['predictionlist'], data['acceptedlist']]
        

### Saving The Data

In [41]:
dump_eval_predictions(truthlist, predictionlist, acceptedlist, "BASELINE")

Successfully Dumped Evaluation Data to file cat-vBASELINE-eval.json
Item Count -> T:2835 P:2835 A:2835


### Loading The Data

In [15]:
%cd ~/sagemaker
truthlist, predictionlist, acceptedlist = load_eval_predictions("BASELINE")

/home/studio-lab-user/sagemaker
Successfully Loaded Evaluation Data from file cat-v2-new-eval.json
Item Count -> T:3780 P:3780 A:3780
Usage: truthlist, predictionlist, acceptedlist = load_eval_predictions(#)



## Data Analysis
 - Classification Report
 - Confusion Matrix

In [55]:
# Top-1 prediction only, for comparison with the top-3 accepted list.
normallist = [item[0] for item in predictionlist]
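
`normallist` keeps only the top-1 label, while `acceptedlist` substitutes the true label whenever it appears in the top 3, so reports built from the two lists answer different questions. A sketch of per-class recall that can be run on either list, using only the standard library; `per_class_recall` is a hypothetical helper and the labels are toy data:

```python
from collections import Counter


def per_class_recall(truths, predicted):
    """Map each true class to the share of its samples predicted correctly."""
    support = Counter(truths)
    correct = Counter(t for t, p in zip(truths, predicted) if t == p)
    return {label: correct[label] / support[label] for label in support}


toy_truths = ["sphynx", "sphynx", "torties", "torties"]
toy_top1 = ["sphynx", "devonrex", "tortico", "tortico"]
toy_accepted = ["sphynx", "sphynx", "torties", "tortico"]

print(per_class_recall(toy_truths, toy_top1))      # top-1 only
print(per_class_recall(toy_truths, toy_accepted))  # top-3 credited
```

On this toy data the recall for `torties` rises from 0.0 to 0.5 once top-3 hits are credited, which is the kind of gap worth checking on the real lists.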

In [58]:
# Hugging Face zero-shot image classification pipeline (top-3 predictions accepted)
print(classification_report(truthlist, acceptedlist))

                     precision    recall  f1-score   support

       CalicoKittys       0.45      0.71      0.55        98
     CreamsicleCats       0.44      0.29      0.35        96
    ExoticShorthair       0.84      0.89      0.86        93
        Flamepoints       0.40      0.79      0.53        97
       MunchkinCats       0.59      0.14      0.22        94
NorwegianForestCats       0.83      0.16      0.27        95
 OneOrangeBraincell       0.56      0.12      0.20        83
        Siamesecats       0.76      0.74      0.75       100
       SiberianCats       0.94      0.19      0.31        90
             Torbie       0.67      0.74      0.70        97
         TuxedoCats       0.51      0.96      0.66        92
       TwoFacedCats       0.23      0.14      0.17       100
         bengalcats       0.80      0.86      0.83        91
          blackcats       0.80      0.85      0.83        89
   britishshorthair       0.92      0.69      0.79        97
        burmesecats    

In [None]:
%pip install seaborn

In [None]:
learn_inf.dls.vocab

In [None]:
from confusionMatrixVisualizer import make_confusion_matrix

make_confusion_matrix(confusion_matrix(truthlist, acceptedlist), group_names=list(learn_inf.dls.vocab), figsize=(40, 40))
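
A 30-class confusion matrix is hard to read at a glance, so listing the most frequent off-diagonal pairs can complement the plot. A minimal sketch using `collections.Counter`; `top_confusions` is a hypothetical helper and the labels below are toy data, not evaluation results:

```python
from collections import Counter


def top_confusions(truths, predicted, n=5):
    """Return the n most frequent (truth, predicted) pairs where the two differ."""
    pairs = Counter((t, p) for t, p in zip(truths, predicted) if t != p)
    return pairs.most_common(n)


toy_truths = ["tortico", "tortico", "tortico", "Torbie", "sphynx"]
toy_predicted = ["torties", "torties", "CalicoKittys", "torties", "sphynx"]

for (truth, guess), count in top_confusions(toy_truths, toy_predicted):
    print(f"{truth} -> {guess}: {count}")
```

Called as `top_confusions(truthlist, acceptedlist)`, this would list the class pairs that still get confused after top-3 acceptance.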