### **Initialization**
- I put these three lines at the top of each of my notebooks. The first two load IPython's `autoreload` extension and set it to reload all imported modules before each cell runs, which prevents stale-code problems when I edit and re-run the same project. The third renders Matplotlib plots inline within the notebook.

In [1]:
#@ INITIALIZATION: 
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**Installing and Importing Libraries**
- All libraries and dependencies required for the project are installed and imported in a single cell.

In [2]:
#@ IMPORTING MODULES: UNCOMMENT BELOW:
# !pip install transformers
# !pip install datasets
import torch 
import numpy as np
from time import perf_counter
from pathlib import Path
from datasets import load_dataset
from datasets import load_metric
from transformers import pipeline

#@ IGNORING WARNINGS: 
import warnings
warnings.filterwarnings("ignore")

### **Case Study: Intent Detection**

**The Dataset**
- We will use the CLINC150 dataset, which contains 22,500 in-scope queries spanning 150 intents across 10 domains (such as banking and travel), plus 1,200 out-of-scope queries.

In [3]:
#@ LOADING CLINC150 DATASET:
clinc = load_dataset("clinc_oos", "plus")                      # Loading dataset.
sample = clinc["test"][23]                                     # Initializing sample.
print(sample)                                                  # Inspection.
intents = clinc["test"].features["intent"]                     # Getting intent attribute.
print(intents.int2str(sample["intent"]))                       # Inspection.

Reusing dataset clinc_oos (/root/.cache/huggingface/datasets/clinc_oos/plus/1.0.0/abcc41d382f8137f039adc747af44714941e8196e845dfbdd8ae7a7e020e6ba1)



{'text': 'how would i say i need directions if i were french', 'intent': 61}
translate
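The `intents` object is a `ClassLabel` feature that maps between integer ids and intent names, which is why id 61 prints as `translate`. The class below is a hypothetical stand-in (not the `datasets` implementation) that mimics its `int2str`/`str2int` behavior on a toy label list:

```python
class SimpleClassLabel:
    """Hypothetical stand-in for the id <-> name mapping of a ClassLabel feature."""

    def __init__(self, names):
        self.names = list(names)
        self._name_to_id = {name: i for i, name in enumerate(self.names)}

    def int2str(self, idx):
        return self.names[idx]              # id -> name

    def str2int(self, name):
        return self._name_to_id[name]       # name -> id (KeyError if unknown)


# Toy label list; the real feature covers every CLINC150 intent plus an out-of-scope class.
toy_intents = SimpleClassLabel(["car_rental", "oos", "translate"])
print(toy_intents.int2str(2))               # translate
print(toy_intents.str2int("car_rental"))    # 0
```

This round trip is what `compute_accuracy` relies on later: the pipeline returns label names, and `str2int` turns them back into ids comparable with the dataset's `intent` column.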


In [4]:
#@ INITIALIZING FINE-TUNED BERT BASE MODEL: 
bert_ckpt = "transformersbook/bert-base-uncased-finetuned-clinc"                    # Initializing model checkpoint.
pipe = pipeline("text-classification", model=bert_ckpt)                             # Initializing pretrained bert pipeline. 

In [5]:
#@ RUNNING THE BERT PIPELINE:
query = """Hey, I'd like to rent a vehicle from Nov 1st to Nov 15th in
Paris and I need a 15 passenger van"""                                              # Initializing example query.
pipe(query)                                                                         # Classifying the query's intent.

[{'label': 'car_rental', 'score': 0.5490034818649292}]
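A score of 0.549 means the model is only moderately confident. As a hedged illustration (a sketch, not the `transformers` source), a single-label classification pipeline turns the model's raw logits into probabilities with a softmax and reports the top label with its probability. The logits and the three-label `id2label` mapping below are hypothetical:

```python
import numpy as np


def classify(logits, id2label):
    """Mimic a single-label classification head: softmax over logits, then argmax."""
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())     # subtract the max for numerical stability
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return [{"label": id2label[best], "score": float(probs[best])}]


# Hypothetical logits for a toy three-intent model.
id2label = {0: "car_rental", 1: "translate", 2: "oos"}
print(classify([2.0, 1.0, 0.5], id2label))
```

Because the probabilities sum to 1 across all intents, a middling top score usually means some probability mass went to related intents.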

**Performance Benchmark**

In [6]:
#@ SKELETON OF PERFORMANCE BENCHMARK:
class PerformanceBenchmark:                                            
    def __init__(self, pipeline, dataset, optim_type="BERT baseline"):
        self.pipeline = pipeline
        self.dataset = dataset
        self.optim_type = optim_type
    
    def compute_accuracy(self):
        pass
    
    def compute_size(self):
        pass
    
    def time_pipeline(self):
        pass
    
    def run_benchmark(self):
        metrics = {}
        metrics[self.optim_type] = self.compute_size()
        metrics[self.optim_type].update(self.time_pipeline())
        metrics[self.optim_type].update(self.compute_accuracy())
        return metrics
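Each method is expected to return a small dict, and `run_benchmark` merges them into one nested dict keyed by `optim_type`. A minimal sketch with hypothetical placeholder numbers (not real measurements) shows the resulting shape and how results for several model variants can be collected in one place:

```python
class StubBenchmark:
    """Hypothetical stand-in reusing PerformanceBenchmark.run_benchmark's merging logic."""

    def __init__(self, optim_type="BERT baseline"):
        self.optim_type = optim_type

    # Placeholder metrics; the real methods measure an actual pipeline.
    def compute_size(self):
        return {"size_mb": 1.0}

    def time_pipeline(self):
        return {"time_avg_ms": 2.0, "time_std_ms": 0.1}

    def compute_accuracy(self):
        return {"accuracy": 0.5}

    def run_benchmark(self):
        metrics = {}
        metrics[self.optim_type] = self.compute_size()            # start from the size dict
        metrics[self.optim_type].update(self.time_pipeline())     # merge in latency stats
        metrics[self.optim_type].update(self.compute_accuracy())  # merge in accuracy
        return metrics


all_metrics = StubBenchmark().run_benchmark()
all_metrics.update(StubBenchmark(optim_type="hypothetical student").run_benchmark())
print(all_metrics)
```

Note that the skeleton's `pass` methods return `None`, so calling `run_benchmark` before they are filled in fails with an `AttributeError` on `None.update`.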

In [7]:
#@ FUNCTION FOR COMPUTING ACCURACY:
accuracy_score = load_metric("accuracy")                            # Initializing accuracy metric.
def compute_accuracy(self):                                         # Defining compute accuracy function.
    preds, labels = [], []                                          # Collected predictions and true labels.
    for example in self.dataset:
        pred = self.pipeline(example["text"])[0]["label"]           # Predicted intent name (string).
        label = example["intent"]                                   # True intent id (integer).
        preds.append(intents.str2int(pred))                         # Converting predicted name to its id.
        labels.append(label)                                        # Adding to labels.
    accuracy = accuracy_score.compute(predictions=preds,
                                      references=labels)            # Getting accuracy score.
    print(f"Accuracy on test set - {accuracy['accuracy']:.3f}")     # Inspecting accuracy.
    return accuracy

#@ ADDING TO PERFORMANCE BENCHMARK:
PerformanceBenchmark.compute_accuracy = compute_accuracy
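The metric loaded with `load_metric("accuracy")` returns a dict with an `accuracy` key, which is why the code indexes `accuracy['accuracy']`. Conceptually it is just the fraction of predictions that equal their references; a plain-Python stand-in (a sketch, not the library's implementation) makes that explicit:

```python
def simple_accuracy(predictions, references):
    """Fraction of predictions that match their reference labels, in the metric's dict shape."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return {"accuracy": correct / len(references)}


print(simple_accuracy([61, 3, 7, 7], [61, 3, 5, 7]))  # {'accuracy': 0.75}
```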

In [9]:
#@ INSPECTING ONE PARAMETER TENSOR OF THE MODEL'S STATE DICT:
list(pipe.model.state_dict().items())[23]                           

('bert.encoder.layer.1.attention.self.query.bias',
 tensor([-2.7731e-01,  1.7375e-01, -2.4911e-01, -8.8099e-01,  4.1298e-01,
          1.5590e-02,  5.0920e-01,  2.6641e-01,  4.0497e-01,  3.3502e-02,
          8.8779e-01,  4.7020e-01,  8.2251e-03, -8.0806e-01,  4.2484e-01,
          9.7985e-03, -2.6104e-01,  1.8233e-01,  3.0953e-01,  1.0791e-01,
          2.8354e-01, -3.3286e-01,  6.4774e-01, -3.4947e-03,  7.4745e-01,
          1.8023e-02,  2.8089e-01, -5.8654e-01, -4.7697e-02, -4.2555e-01,
          7.5222e-01, -3.6536e-01, -5.2892e-01, -4.3233e-01, -2.4072e-01,
          6.1750e-01,  2.3975e-01,  5.2870e-01,  4.0786e-01, -5.0314e-01,
         -3.0598e-01,  5.5550e-01, -3.4961e-01,  7.0972e-01,  6.0526e-01,
          5.7314e-01, -1.8228e-01, -8.3693e-02, -3.2728e-01, -9.6335e-02,
          2.4435e-01, -2.2558e-03,  5.0501e-01, -1.9865e-01, -2.6915e-02,
         -3.7415e-01,  3.5009e-01,  7.1231e-01,  1.0904e-01,  3.3678e-01,
          3.2403e-01, -2.1816e-01,  6.0707e-01,  5.5292e-01, 

In [10]:
#@ FUNCTION FOR COMPUTING MODEL SIZE:
def compute_size(self):                                         # Defining function.
    state_dict = self.pipeline.model.state_dict()               # Model weights and buffers.
    tmp_path = Path("model.pt")                                 # Temporary file path.
    torch.save(state_dict, tmp_path)                            # Serializing the weights to disk.
    size_mb = tmp_path.stat().st_size / (1024 * 1024)           # File size in megabytes.
    tmp_path.unlink()                                           # Deleting temporary file.
    print(f"Model size (MB) - {size_mb:.2f}")                   # Inspecting size.
    return {"size_mb": size_mb}

#@ ADDING TO PERFORMANCE BENCHMARK:
PerformanceBenchmark.compute_size = compute_size                # Adding to performance benchmark.
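`compute_size` measures the model by serializing its `state_dict` and reading the file size. The same serialize-measure-delete idea can be sketched with the standard library alone, using `pickle` in place of `torch.save` (an assumption for illustration) and a uniquely named temporary file so nothing in the working directory is overwritten. The second helper is a rough sanity check: if the file is dominated by 32-bit floats at 4 bytes each, 418.16 MB corresponds to roughly 110 million parameters, in line with the commonly cited size of BERT-base.

```python
import os
import pickle
import tempfile


def serialized_size_mb(obj):
    """Pickle obj to a temporary file and return the file size in MiB."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        return os.path.getsize(path) / (1024 * 1024)
    finally:
        os.remove(path)  # always delete the temporary file


def estimated_float32_params(size_mb):
    """Rough parameter count, assuming the file is almost entirely float32 weights."""
    return size_mb * 1024 * 1024 / 4


print(f"{serialized_size_mb(bytes(2 * 1024 * 1024)):.2f} MB")
print(f"~{estimated_float32_params(418.16) / 1e6:.1f}M parameters")
```

The estimate is approximate because the saved file also contains parameter names, buffers, and serialization overhead.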

In [11]:
#@ CALCULATING TIME OF PIPELINE: 
for _ in range(3):
    start_time = perf_counter()                                 # Starting time.
    _ = pipe(query)                                             # Running pipeline.
    latency = perf_counter() - start_time                       # Calculating latency.
    print(f"Latency (ms) - {1000 * latency:.3f}")               # Inspection.

Latency (ms) - 118.871
Latency (ms) - 123.346
Latency (ms) - 150.441
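The three timings above vary by roughly 30 ms, and early calls can include one-time overhead, so a single measurement is unreliable. A framework-agnostic sketch of the usual remedy (untimed warm-up calls, then the mean and standard deviation over many timed runs) works with any callable. The function name is hypothetical, and the defaults are chosen to match the 10 warm-up and 100 timed runs used in `time_pipeline`:

```python
import statistics
from time import perf_counter


def measure_latency_ms(fn, *args, warmup=10, runs=100):
    """Time fn(*args) after untimed warm-up calls; return (mean, std) in milliseconds."""
    for _ in range(warmup):
        fn(*args)                                    # warm-up calls are not timed
    latencies = []
    for _ in range(runs):
        start = perf_counter()
        fn(*args)
        latencies.append(perf_counter() - start)
    mean_ms = 1000 * statistics.mean(latencies)
    std_ms = 1000 * statistics.pstdev(latencies)     # population std, like np.std's default
    return mean_ms, std_ms


# Usage with a stand-in workload instead of a real pipeline.
mean_ms, std_ms = measure_latency_ms(sum, range(10_000))
print(f"Average latency (ms) - {mean_ms:.3f} +/- {std_ms:.3f}")
```

`statistics.pstdev` computes the population standard deviation, which matches `np.std` with its default `ddof=0`.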


In [12]:
#@ FUNCTION TO CALCULATE TIME OF PIPELINE:
def time_pipeline(self, query="What is the pin number for my account?"):    # Defining function.
    latencies = []                                                          # Initialization.
    for _ in range(10):
        _ = self.pipeline(query)                                            # Warm-up run (not timed).
    for _ in range(100):
        start_time = perf_counter()                                         # Starting time.
        _ = self.pipeline(query)                                            # Running pipeline.
        latency = perf_counter() - start_time                               # Calculating latency.
        latencies.append(latency)                                           # Adding latencies.
    time_avg_ms = 1000 * np.mean(latencies)                                 # Time average.
    time_std_ms = 1000 * np.std(latencies)                                  # Time standard deviation. 
    print(f"Average latency (ms) - {time_avg_ms:.2f} +/- {time_std_ms:.2f}")
    return {"time_avg_ms": time_avg_ms, "time_std_ms": time_std_ms}

#@ ADDING TO PERFORMANCE BENCHMARK:
PerformanceBenchmark.time_pipeline = time_pipeline                          # Adding to performance benchmark.

In [13]:
#@ INSPECTING PERFORMANCE BENCHMARK:
pb = PerformanceBenchmark(pipe, clinc["test"])                              # Initializing performance benchmark class.
perf_metrics = pb.run_benchmark()                                           # Getting metrics.

Model size (MB) - 418.16
Average latency (ms) - 77.72 +/- 3.71
Accuracy on test set - 0.867


### **Knowledge Distillation**
- Knowledge distillation is a general-purpose method for training a smaller `student` model to mimic the behavior of a slower, larger, but better-performing `teacher` model.
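As a hedged preview of how the teacher's behavior can be transferred, the widely used soft-target formulation (Hinton et al., 2015) divides both models' logits by a temperature T > 1 to soften their probability distributions, then penalizes the KL divergence from the teacher's distribution to the student's, scaled by T² to keep gradient magnitudes comparable across temperatures. The NumPy sketch below uses toy logits and computes only that loss term:

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)           # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, averaged over the batch, times T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return temperature ** 2 * kl.mean()


teacher = np.array([[4.0, 1.0, 0.5]])                # toy teacher logits
student = np.array([[1.0, 2.0, 0.0]])                # toy student logits
print(softmax(teacher, 1.0).round(3))
print(softmax(teacher, 4.0).round(3))                # higher T gives a softer distribution
print(distillation_loss(student, teacher))
```

In practice this term is usually combined with the ordinary cross-entropy loss on the hard labels; how the two are weighted is a design choice.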