*Opening this notebook in Google Colab is highly recommended; a number of text cells will not render on GitHub.*

⚠️ <a href="https://colab.research.google.com/github/AlaFalaki/workshop-materials/blob/main/2021-practical-deep-learning/02-NLP_Part2.ipynb"> Click Here To Open in Google Colab </a> ⚠️

# Practical Deep Learning Workshop (Part 2)

# PyTorch vs FastAi (NLP)

* Look closely at 3 main stages
  * Read the Data
  * Implement the Model
  * Training Loop
* More applications like Translation/Summarization using Huggingface


# Install FastAI2

> Run the cell below to install FastAI2. <br /><br />
> ⚠️ Make sure to restart the current runtime after the installation so the changes take effect. Select 'Runtime' from the top menu and click 'Restart runtime'.

In [None]:
!pip install -Uq fastai

In [None]:
from fastai.text.all import *

# 1. Load The Data

In [None]:
path = untar_data(URLs.YELP_REVIEWS_POLARITY, dest="/content")
path

Path('/content/yelp_review_polarity_csv')

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## The PyTorch Way

> With plain PyTorch (and torchtext), we have to do every step ourselves. We need to define fields that match our dataset, and we need to carve a small validation set out of the train set. So, as a first step, I read the train.csv and test.csv files and write out a new valid.csv file. Also, be aware that I only use a small slice of the data (1/80 of it, about 1.25%) in these examples to make everything run faster.<br /><br />

*FastAi handles all these steps itself.*

In [None]:
from sklearn.model_selection import train_test_split

train_df = pd.read_csv(path/"train.csv", header=0, names=['label', 'text'])
test_df = pd.read_csv(path/"test.csv", header=0, names=['label', 'text'])

# Work with 1/80 of the data (about 1.25%) just to make the model train faster.
# ⚠️  Remove this part in your experiments to work with the full dataset.
train_df = train_df[0:len(train_df)//80]
test_df = test_df[0:len(test_df)//80]

# Make a validation set
train, valid = train_test_split(train_df, test_size=0.2)

# Write the new dataset to CSV files.
train.to_csv("/content/yelp_review_polarity_csv/train.csv", header=0, index=0)
valid.to_csv("/content/yelp_review_polarity_csv/valid.csv", header=0, index=0)
test_df.to_csv("/content/yelp_review_polarity_csv/test.csv", header=0, index=0)

print("Train set number of samples: {}".format(len(train)))
print("Valid set number of samples: {}".format(len(valid)))
print("Test. set number of samples: {}".format(len(test_df)))

Train set number of samples: 5599
Valid set number of samples: 1400
Test. set number of samples: 474


> Here we declare two functions:
* __tokenizer__: Uses the spaCy library to do word-level tokenization of the dataset.
* __encoding__: Encodes the dataset labels so the sentiments are represented by 0 and 1 instead of 1 and 2. Class indices start from 0, so a label of 2 would require 3 units at the final layer of our classifier, even though there are only two classes.

In [None]:
import spacy

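# spaCy 3 removed the 'en' shortcut, so load the small English model by its full name.
# (Assumes the model is installed: python -m spacy download en_core_web_sm)
en = spacy.load('en_core_web_sm')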

def tokenizer(sentence):
    return [tok.text for tok in en.tokenizer(sentence)]

In [None]:
tokenizer("This is a sample text to see how tokenization works here.")

['This',
 'is',
 'a',
 'sample',
 'text',
 'to',
 'see',
 'how',
 'tokenization',
 'works',
 'here',
 '.']

In [None]:
def encoding(inp):
  if inp == "1":
    return "0"
  else:
    return "1"
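
> To see why the labels are shifted to start from 0, here is a minimal pure-Python sketch of a cross-entropy loss. It is an illustration under my own assumptions (the `cross_entropy` helper and the example scores are made up), not PyTorch's implementation: the loss simply uses the label as an index into the model's outputs.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

logits = [0.2, 1.3]  # one score per class: the classifier has 2 output units

# The raw labels in the CSV files are 1 and 2.
for raw_label in (1, 2):
    shifted = raw_label - 1  # the same mapping encoding() performs: 1 -> 0, 2 -> 1
    print(raw_label, "->", shifted, "loss:", round(cross_entropy(logits, shifted), 3))

# Using the raw label 2 as an index points past the two available outputs.
try:
    cross_entropy(logits, 2)
except IndexError as err:
    print("raw label 2 is not a valid class index:", err)
```

> With labels 0 and 1, two output units are enough; keeping 1 and 2 would force a third, unused unit.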

> To make the data understandable to torchtext, we use the Field class. Each input/target variable needs to be represented by a Field. For each one we specify whether it is sequential (like the reviews), which tokenizer/preprocessing function to use, whether it needs a vocab, and whether it is the target value we want to predict.<br /><br/>
The fix_length argument keeps all the sequences at the same length: longer sequences are truncated and shorter ones are padded.
<br /><br />
After that, we put the fields in a list, each with a custom name, in the same order as the columns appear in the dataset, and read the CSV files using torchtext's TabularDataset class.
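
> The effect of fix_length can be sketched in a few lines of plain Python. This is only an illustration of the idea (the `pad_or_truncate` helper is hypothetical, not part of torchtext), using a length of 5 instead of 100 to keep the output readable.

```python
PAD_TOKEN = "<pad>"  # the same padding symbol that shows up in the vocabulary

def pad_or_truncate(tokens, fix_length, pad_token=PAD_TOKEN):
    """Return a copy of `tokens` that is exactly `fix_length` items long."""
    if len(tokens) >= fix_length:
        return tokens[:fix_length]  # too long: cut off the tail
    return tokens + [pad_token] * (fix_length - len(tokens))  # too short: pad the end

short_review = ["great", "food", "!"]
long_review = ["the", "service", "was", "slow", "and", "the", "food", "was", "cold"]

print(pad_or_truncate(short_review, 5))  # ['great', 'food', '!', '<pad>', '<pad>']
print(pad_or_truncate(long_review, 5))   # ['the', 'service', 'was', 'slow', 'and']
```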

In [None]:
from torchtext.legacy import data

TEXT  = data.Field(sequential=True, lower=True, tokenize=tokenizer, fix_length=100)
LABEL = data.Field(sequential=False, is_target=True, use_vocab=False, preprocessing=encoding)

In [None]:
datafields = [("label", LABEL), ("text", TEXT)]

train, valid, test = data.TabularDataset.splits(path="/content/yelp_review_polarity_csv",
                                          train="train.csv", validation="valid.csv", test="test.csv",
                                          format="csv", skip_header=True, fields=datafields)

> Now that we have a data source, we can build the vocabulary for the fields that need one (the TEXT field in our case), with max_size capping how many tokens we keep. (I used a relatively small number since we are using only a small slice of the whole dataset.) We can then inspect the vocabulary: the most_common tokens and the first 10 tokens. As you can see, the first two indexes are reserved for the < unk > (unknown) token, which stands in for words that were not common enough to make it into the vocabulary, and the < pad > token, which fills the shorter sequences so they all have the same size.
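
> Conceptually, building a vocabulary is just counting tokens and keeping the most frequent ones after the special tokens. The sketch below is a simplified stand-in written for this explanation (`build_vocab` and `numericalize` are hypothetical helpers, and tie-breaking between equally frequent tokens may differ from torchtext).

```python
from collections import Counter

def build_vocab(tokenized_texts, max_size, specials=("<unk>", "<pad>")):
    """Map the `max_size` most frequent tokens to ids, after the special tokens."""
    freqs = Counter(tok for text in tokenized_texts for tok in text)
    itos = list(specials) + [tok for tok, _ in freqs.most_common(max_size)]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    """Replace each token with its id; out-of-vocabulary tokens map to <unk>."""
    unk_id = stoi["<unk>"]
    return [stoi.get(tok, unk_id) for tok in tokens]

corpus = [["the", "food", "was", "good"], ["the", "food", "was", "bad"],
          ["the", "food"], ["the", "staff"]]

itos, stoi = build_vocab(corpus, max_size=3)
print(itos)                                                    # ['<unk>', '<pad>', 'the', 'food', 'was']
print(numericalize(["the", "food", "was", "terrible"], stoi))  # [2, 3, 4, 0]
```

> The word "terrible" never made it into the vocabulary, so it is mapped to index 0, the unknown token.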

In [None]:
TEXT.build_vocab(train, max_size = 15000)

In [None]:
print( TEXT.vocab.freqs.most_common(20) )

[('.', 40341), ('the', 36911), (',', 29210), ('and', 23807), ('i', 22757), ('a', 19584), ('to', 18940), (' ', 14701), ('was', 13154), ('of', 11616), ('it', 11431), ('is', 9177), ('in', 9126), ('for', 8656), ('that', 7848), ('my', 6691), ('you', 6132), ('but', 5844), ('this', 5817), ('with', 5727)]


In [None]:
print(TEXT.vocab.itos[:10])

['<unk>', '<pad>', '.', 'the', ',', 'and', 'i', 'a', 'to', ' ']


> Lastly, the BucketIterator class puts the data on the correct device (GPU/CPU) and splits it into batches.
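
> The "bucket" part of the name refers to grouping examples of similar length, so that each batch needs as little padding as possible. Here is a rough pure-Python sketch of that idea; the helper names are my own, and the real class also shuffles batches during training. Note that in this notebook fix_length already pads every review to 100 tokens, so the saving would only show up without it.

```python
def bucket_batches(examples, batch_size, sort_key=len):
    """Group examples of similar length so each batch needs little padding."""
    ordered = sorted(examples, key=sort_key)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def padding_needed(batches):
    """Count the pad tokens required to make every batch rectangular."""
    return sum(max(map(len, b)) * len(b) - sum(map(len, b)) for b in batches)

reviews = [["ok"], ["so", "good", "!"], ["bad"], ["would", "not", "come", "back"]]

in_order = [reviews[0:2], reviews[2:4]]           # batches in dataset order
bucketed = bucket_batches(reviews, batch_size=2)  # batches of similar length

print(padding_needed(in_order), padding_needed(bucketed))  # 5 1
```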

In [None]:
train_iter = data.BucketIterator(train, device=device, batch_size=32, sort_key=lambda x: len(x.text),
                                sort_within_batch=False, repeat=False)

valid_iter = data.BucketIterator(valid, device=device, batch_size=32, sort_key=lambda x: len(x.text),
                                sort_within_batch=False, repeat=False)

test_iter = data.BucketIterator(test, device=device, batch_size=32, sort_key=lambda x: len(x.text),
                                sort_within_batch=False, repeat=False)

In [None]:
for item in train_iter:
  print(item.text)
  break

tensor([[  46,    6,   64,  ...,  687,   63,  489],
        [  18, 2137,   18,  ...,  819,  215, 1612],
        [  47,   61,  673,  ...,   32,   19,   15],
        ...,
        [   2,   45, 7280,  ...,    1,    1,    1],
        [   9,   23,   88,  ...,    1,    1,    1],
        [   3,  166,    6,  ...,    1,    1,    1]], device='cuda:0')


## The FastAi Way

> There is not much to say about the FastAi way, since it takes care of everything. You just pick one of 3 factory methods depending on how your data is stored.

### From DataFrame

> Read/split/preprocess the data from a Dataframe.

In [None]:
df = pd.read_csv(path/"train.csv", header=0, names=['label', 'text'])
df.head()

dls = TextDataLoaders.from_df(df, text_col='text', label_col='label', seed=42,
                              valid_pct=0.2, shuffle=True, seq_len=72, bs=64, is_lm=False)


### From Folder

> If the dataset consists of one text file per sample, we can put all the train samples in a folder called **train** and do the same for the validation set. FastAi will read all the files and do the preprocessing for you.

In [None]:
path = "path/to/the/folders"

TextDataLoaders.from_folder(path, train='train', valid='valid', seed=42,
                            shuffle=True, seq_len=72, bs=64, is_lm=False)


<img width="800" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/from_folder.png" />

### From CSV

> If your data is in CSV format and has a column that marks which rows belong to the validation set, you can read the data and split it accordingly using the *from_csv* method.

In [None]:
TextDataLoaders.from_csv(path, csv_fname='texts.csv', text_col='text', label_col='label',
                         valid_col='is_valid', shuffle=True, seq_len=72, bs=64, is_lm=False)

# 2. Make the Model

## The PyTorch Way

> We already know that the model class should inherit from PyTorch's Module and have two main functions: init and forward. (The `Module` used below comes from the fastai import; it behaves like `nn.Module` but does not require calling `super().__init__()`.)
* __init function__: We define the model's parameters/layers in this function. The embedding layer represents tokens as dense vectors (similar in spirit to Word2Vec) and acts like a look-up table that returns the representation for a given token id. The next one is a 3-layer LSTM that serves as the encoder in our architecture. The last layer is a simple Linear layer that converts the encoder's representation to number_of_targets outputs (which is 2 here).
* __forward function__: Specifies the route that the input takes (the input is a batch of reviews) and the order of the layers.
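
> The "look-up table" behaviour of the embedding layer can be pictured with plain Python lists. This is a toy sketch under my own assumptions (random numbers instead of learned weights, tiny sizes instead of the vocabulary size and 300 used in the notebook).

```python
import random

random.seed(0)  # make the toy table reproducible

vocab_size, emb_dim = 6, 4  # toy sizes chosen for illustration

# One row of `emb_dim` numbers per token id; in a real model these are learned.
table = [[random.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(vocab_size)]

def embed(token_ids):
    """Look up the vector for each token id, like indexing rows of a matrix."""
    return [table[i] for i in token_ids]

sentence = [2, 3, 4, 0]  # token ids produced by the vocabulary
vectors = embed(sentence)
print(len(vectors), len(vectors[0]))  # 4 4 (sequence length, embedding size)
```

> Every token id always returns the same row, so identical words share one representation wherever they occur.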

In [None]:
class Classifier(Module):
  def __init__(self, vocab_size, is_fastai=False):
    self.hidden_size = 300
    self.number_of_layers = 3
    self.number_of_targets = 2

    self.emb        = nn.Embedding(vocab_size, self.hidden_size)
    
    if is_fastai:
      self.encoder    = nn.LSTM(self.hidden_size, self.hidden_size, self.number_of_layers, batch_first=True)
    else:
      self.encoder    = nn.LSTM(self.hidden_size, self.hidden_size, self.number_of_layers)

    self.classifier  = nn.Linear(self.hidden_size, self.number_of_targets)

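  def forward(self, inp):
    # inp holds token ids: (seq_len, batch) from torchtext, (batch, seq_len) from FastAi.
    inp_embded = self.emb(inp)
    # The LSTM returns (output, (h_n, c_n)); we only need the final states.
    _, hidden_state = self.encoder(inp_embded)

    # hidden_state[0] is h_n; [-1] selects the top LSTM layer: (batch, hidden_size).
    cls = self.classifier( hidden_state[0][-1] )
    
    return cls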

In [None]:
model = Classifier(len( TEXT.vocab ), is_fastai=False)

<center> <img width="500" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/linear-layer.png" /> </center>

> **Recurrent Neural Networks (RNNs)**[1]<br />
>The problem with feedforward networks (Linear layers) is that they cannot easily capture context: they look at each input independently. RNNs, in contrast, take previously seen tokens into account through a "memory" (the hidden state).

> <center> <img width="500" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/rnn.png" /> </center>
> <center> <small> Credit: Colah's blog </small> </center>
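
> The "memory" idea fits in a few lines of plain Python. The sketch below is a one-unit RNN with made-up weights (`w_x` and `w_h` are arbitrary values I picked, not learned ones): the hidden state is fed back in at every step, so the same inputs in a different order end in a different state.

```python
import math

def rnn_last_state(inputs, w_x=0.5, w_h=0.8):
    """Run a one-unit RNN over `inputs` and return the final hidden state."""
    h = 0.0  # the "memory" starts empty
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)  # new memory mixes the input with the old memory
    return h

a = rnn_last_state([1.0, 0.0, -1.0])
b = rnn_last_state([-1.0, 0.0, 1.0])  # same values, different order
print(a, b)
```

> A layer that just summed its inputs would give the same answer for both sequences; the recurrent update does not.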

> A problem with this architecture is the vanishing/exploding gradient. So, we will use LSTM [2] cells to mitigate this issue.

> <center> <img width="600" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/lstm.png" /> </center>
> <center> <small> Credit: Colah's blog </small> </center>

> Read more about the LSTM gates in detail [here](https://colah.github.io/posts/2015-08-Understanding-LSTMs/).

> <center> <img width="600" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/lstm-types.jpeg?a" /> </center>
> <center> <small> Credit: <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">Andrej Karpathy blog</a> </small> </center>

> There are different schemes for arranging LSTM units depending on the application. **One to many** can be used for *Image Captioning*, where the input is a single picture and the output is a sequence of tokens describing it. **Many to one** is what we did in the *Sentiment Analysis* example. Lastly, **many to many** models, also called **Sequence to Sequence** or **Encoder/Decoder** models, are used for *Translation* or *Summarization*.


## The FastAi Way

> We can quickly load a classifier built on a pre-trained language model with a more complete implementation. For example, this model includes dropout layers, which our custom model does not have.

In [None]:
learn = text_classifier_learner(dls, AWD_LSTM, seq_len=72, drop_mult=0.5,
                              loss_func=None, opt_func=Adam, lr=0.001, 
                              metrics=accuracy)

# 3. Train The Model

## The PyTorch Way

> You have to write the training loop and control everything yourself. It starts with selecting the loss function and optimizer and putting the model and loss function on the appropriate device.

In [None]:
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=1e-3)
loss_fn   = nn.CrossEntropyLoss()

In [None]:
model   = model.to(device)
loss_fn = loss_fn.to(device)

> The next step is to write the train function, which trains the model for one epoch. It starts by putting the model in training mode by calling model.train(). Then it iterates through the training set one batch at a time, calculates the loss/accuracy, and does a backward pass and an optimizer step for each batch.

In [None]:
def train(model, iterator, optimizer, loss_fn):
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for i, batch in enumerate(iterator):
        optimizer.zero_grad()

        predictions = model(batch.text).squeeze(1)
        
        loss = loss_fn(predictions, batch.label)
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()

        if i % 200 == 199:
            # Running average over the batches seen so far.
            print(f"[{i + 1}/{len(iterator)}] : epoch_acc: {epoch_acc / (i + 1):.2f}")

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

> The evaluate function does the same thing, but puts the model in evaluation mode and skips the gradient calculation (*torch.no_grad()*).

In [None]:
def evaluate(model, iterator, loss_fn):
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
        for i, batch in enumerate(iterator):
            predictions = model(batch.text).squeeze(1)
            
            loss = loss_fn(predictions, batch.label)
            acc = binary_accuracy(predictions, batch.label)
        
            epoch_loss += loss.item()
            epoch_acc += acc.item()
            
    return epoch_loss / len(iterator),  epoch_acc / len(iterator)

> The following function simply picks the most probable class index from the model's outputs with the argmax() method and compares it to the true labels to calculate the accuracy.

In [None]:
def binary_accuracy(preds, y):
    '''
    Return accuracy per batch ..
    '''
    preds = preds.argmax(1)
    correct = (preds == y).float()
    acc = correct.sum() / len(correct)
    
    return acc
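
> As a worked example of the same calculation without tensors, here is a small pure-Python version with four made-up predictions (the numbers and the `batch_accuracy` helper are hypothetical, chosen only to make the arithmetic easy to follow).

```python
def argmax(row):
    """Index of the largest value in `row`."""
    return max(range(len(row)), key=row.__getitem__)

def batch_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

logits = [[2.1, -0.3],   # predicts class 0
          [0.2, 0.9],    # predicts class 1
          [1.5, 1.4],    # predicts class 0
          [-1.0, 0.0]]   # predicts class 1
labels = [0, 1, 1, 1]

print(batch_accuracy(logits, labels))  # 0.75
```

> Three of the four predictions match their labels, so the batch accuracy is 0.75.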

> The main training loop just calls the train/evaluate functions for the number of epochs we specify.

In [None]:
import time
import datetime

N_epoches = 1

for epoch in range(N_epoches):
    
    start_time = time.time()
    
    train_loss, train_acc = train(model, train_iter, optimizer, loss_fn)
    valid_loss, valid_acc = evaluate(model, valid_iter, loss_fn)
    
    end_time = time.time()
    
    epoch_mins = str(datetime.timedelta(seconds=end_time - start_time))
        
    print(f'Epoch:  {epoch+1:02} | Epoch Time: {epoch_mins}')
    print(f'\tTrain  Loss: {train_loss: .3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\tValid  Loss: {valid_loss: .3f} | Valid Acc: {valid_acc*100:.2f}%')

Epoch:  01 | Epoch Time: -1 day, 23:59:55.703394
	Train  Loss:  0.693 | Train Acc: 51.00%
	Valid  Loss:  0.693 | Valid Acc: 53.06%


## The FastAi Way + PyTorch Model

> Or, you can write a custom model and let FastAi preprocess the data and handle the training loop for you.<br /><br />
There are callbacks for early stopping or saving checkpoints during training. Look at [Tracking callbacks](https://docs.fast.ai/callback.tracker.html#SaveModelCallback) in the FastAi documentation for examples. <br /><br />
Also, read more about [Weight Decay](https://towardsdatascience.com/this-thing-called-weight-decay-a7cd4bcfccab) for reducing overfitting and momentum ([moms](https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d)) for converging faster.
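
> To give a feel for what weight decay and momentum do to a single update, here is a minimal sketch on a one-parameter problem. It uses plain SGD rather than the Adam optimizer in the cell below, and the `sgd_step` helper and all hyperparameter values are assumptions picked for illustration.

```python
def sgd_step(w, grad, velocity, lr=0.1, mom=0.9, wd=0.01):
    """One update of a single weight with weight decay and momentum."""
    grad = grad + wd * w              # weight decay pulls w toward zero
    velocity = mom * velocity + grad  # momentum accumulates past gradients
    return w - lr * velocity, velocity

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    w, v = sgd_step(w, 2 * (w - 3), v)

print(round(w, 3))
```

> The weight settles slightly below 3, because the decay term keeps nudging it toward zero; that small pull is what discourages large weights.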

In [None]:
model2 = Classifier(len(dls.vocab[0]), is_fastai=True)

learn_fastai_pytorch = Learner(dls, model2,
                               loss_func=LabelSmoothingCrossEntropy(), metrics=accuracy,
                               opt_func=Adam, wd=None, moms=(0.95, 0.85, 0.95) 
                               )

In [None]:
learn_fastai_pytorch.fit_one_cycle(1, 4e-4)

epoch,train_loss,valid_loss,accuracy,time
0,0.690886,0.693385,0.522788,00:09


> There are different methods to fit a model using the FastAi library. The most basic one is the *fit* method, which trains the model with a fixed learning rate. Or, we can use [*fit_one_cycle()*](https://docs.fast.ai/callback.schedule.html#Learner.fit_one_cycle), which increases the learning rate to a max value and then decreases it. <br />
<img width="400" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/fit_one_cycle.png" /> <br />
Also, there is a method called [*fit_flat_cos()*](https://docs.fast.ai/callback.schedule.html#Learner.fit_flat_cos) that trains with a fixed learning rate up to a point and then decreases it following a cosine curve. <br />
<img width="200" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/fit_flat_cos.png" /> <br />
Lastly, there is the [*fit_sgdr()*](https://docs.fast.ai/callback.schedule.html#Learner.fit_sgdr) method, which uses cosine annealing with warm restarts: the learning rate decays over a cycle and then jumps back up at the start of the next one.<br />
<img width="200" src="https://raw.githubusercontent.com/AlaFalaki/workshop-materials/main/2021-practical-deep-learning/materials/fit_sgdr.png" />
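
> The shapes of the first two schedules can be reproduced with a bit of cosine arithmetic. The sketch below is my own simplified version, not FastAi's code; the constants (`pct_start`, `div`, `div_final`) are assumptions chosen for illustration, so check the documentation for the values the library actually uses.

```python
import math

def cos_anneal(start, end, pct):
    """Move from `start` to `end` along half a cosine wave as pct goes 0 -> 1."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_lr(pct, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    """Warm up from lr_max/div to lr_max, then anneal down to lr_max/div_final."""
    if pct < pct_start:
        return cos_anneal(lr_max / div, lr_max, pct / pct_start)
    return cos_anneal(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

def flat_cos_lr(pct, lr, pct_start=0.75, div_final=1e5):
    """Hold `lr` constant, then anneal it to lr/div_final with a cosine."""
    if pct < pct_start:
        return lr
    return cos_anneal(lr, lr / div_final, (pct - pct_start) / (1 - pct_start))

# pct is the fraction of training completed (0.0 = start, 1.0 = end).
for pct in (0.0, 0.25, 0.5, 1.0):
    print(pct, one_cycle_lr(pct, 4e-4), flat_cos_lr(pct, 4e-4))
```

> With a peak of 4e-4 (the value passed to fit_one_cycle above), the one-cycle schedule starts low, reaches the peak a quarter of the way in, and then decays, while the flat-cos schedule stays at 4e-4 for most of training before decaying.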

# Extra

> The network architecture that currently dominates the NLP world is the Transformer[4], which is based on the Attention mechanism[3]. There is a great [blog post](https://jalammar.github.io/illustrated-transformer/) by Jay Alammar if you want to learn more. <br /><br />
At this point in time, they are not easy to implement or train, but the models are really powerful. There are many pre-trained models for tasks such as Translation or Summarization. Luckily, libraries are trying to make these models more accessible for everyone. We are going to see the [🤗 huggingface](https://github.com/huggingface) library in action.<br /><br />
They implemented the idea of a Pipeline for nearly every NLP task, which loads a pre-trained model from a collection of 12,000+ models that can be browsed [here](https://huggingface.co/models).

In [None]:
!pip install transformers sentencepiece

In [None]:
from transformers import pipeline

> The following code cells are examples of how to use large pre-trained Transformer models for different tasks. The code is pretty self-explanatory.

## Translation

In [None]:
en_fr_translator = pipeline(task="translation_en_to_fr")
en_fr_translator("How old are you?")

[{'translation_text': ' quel âge êtes-vous?'}]

> Or, we can choose a model from the list and use it.

In [None]:
# This model translates English to Italian, so name the task and variable accordingly.
en_it_translator = pipeline(task="translation_en_to_it", model="Helsinki-NLP/opus-mt-en-it")
en_it_translator("How old are you?")

[{'translation_text': 'Quanti anni hai?'}]

## Summarization

In [None]:
# use BART
summarizer = pipeline("summarization")
# use T5
#summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base")


summarizer("""
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""", min_length=5, max_length=20)

[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.'}]

## Question Answering

In [None]:
question_answerer = pipeline("question-answering")

question_answerer(
    question="What is my job?",
    context="My name is Ala and I am a research assistant."
)

{'answer': 'research assistant',
 'end': 44,
 'score': 0.8837323188781738,
 'start': 26}

## Named Entity Recognition

In [None]:
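# `grouped_entities=True` is deprecated; `aggregation_strategy="simple"` is its replacement.
ner = pipeline("ner", aggregation_strategy="simple")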
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

[{'end': 18,
  'entity_group': 'PER',
  'score': 0.9981694,
  'start': 11,
  'word': 'Sylvain'},
 {'end': 45,
  'entity_group': 'ORG',
  'score': 0.9796019,
  'start': 33,
  'word': 'Hugging Face'},
 {'end': 57,
  'entity_group': 'LOC',
  'score': 0.99321055,
  'start': 49,
  'word': 'Brooklyn'}]

> You can see a complete list of tasks [here](https://huggingface.co/transformers/main_classes/pipelines.html#the-pipeline-abstraction).<br /><br />
You can also fine-tune these models on your own datasets. Here is an [example](https://nlpiation.medium.com/sentiment-analysis-by-fine-tuning-bert-feat-huggingfaces-trainer-class-97c5635035f7) of fine-tuning a pre-trained Transformer model (BERT) for text classification.

# Workshop Resources

> Go to [Vision Part 1](https://colab.research.google.com/github/AlaFalaki/workshop-materials/blob/main/2021-practical-deep-learning/03-Vision_Part1.ipynb) notebook. (Vision)<br /><br />
> Also, this [Github Repository](https://github.com/AlaFalaki/workshop-materials/tree/main/2021-practical-deep-learning) contains all the notebooks and materials presented in this workshop.

# Free Resources

> You can follow these courses if you want to go deeper into NLP and learn the concepts in more detail.

*   [FastAi course](https://course.fast.ai/)
*   [Huggingface](https://huggingface.co/course/chapter1)



# References

1. *McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), 115-133.*

2. *Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.*

3. *Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.*

4. *Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).*