<a id = 'top'></a>

#  A quick-start guide to fine-tune a BERT model with Keras
  * A. [What is BERT?](#introBERT)
  * B. [What is fine-tuning?](#fineTuned)
  * C. [Datasets](#datasetClass)
      * 1. [IMDB Description](#IMDBdesc)
      * 2. [Exploratory Data Analysis](#EDA)
  * D. [Model Preparation](#modelPrep)
      * 1. [Model Selection](#modelSelection)
      * 2. [Tokenizer Selection](#tokenizerSelect)
      * 3. [Auto Model](#autoModel)
      * 4. [Encode Data](#encodeData)
  * E. [Fine-Tuning The Model](#fineTuning)
     

Hugging Face is a company that offers a library of "transformers" as well as a collection of pre-trained language models.  These represent one source of code and abstract classes as well as a variety of documentation and examples. We are going to explore one way of working with these models at a very high level.  In later classes, when we have covered how a transformer works, we'll come back and look at them at a deeper level.  This tutorial is designed to look at the Hugging Face library at the same level of abstraction as the Keras Sequential API rather than at the lower level of abstraction of TensorFlow and the Keras Functional API.

---

Larger models, with millions or billions of parameters, can only be trained in a reasonable amount of time on a machine with a GPU.  Do not run this notebook on your GCP instance, as the training epochs will take far too long.  If you run this notebook in Colab, it is automatically configured to use a GPU.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datasci-w266/2023-fall-main/blob/master/materials/walkthrough_notebooks/bert_as_black_box/Keras_HuggingFace_Transformers_BERT_notebook.ipynb)

[Return to Top](#top)
 <a id = 'introBERT'></a>
# What is BERT?
This notebook leverages one of a variety of BERT models.  BERT models can be classified in terms of three parts.  The first part is a component named a transformer.  These can grow to be quite large.  BERT consists of either 12 or 24 layers of transformers. The second part is the training (called pre-training) the model already has on language.  The pre-training is characterized by one or more tasks.  The third part consists of the very specific tasks it is geared toward performing.  Different models use different sizes and layers of transformers and may be optimized for different languages and different tasks.  For example, CamemBERT is trained in French and SciBERT is trained on scientific journal articles.  You'll want to make sure you use a model appropriate to your language and task.

---

The [HuggingFace web site](https://huggingface.co/transformers) offers an interesting set of resources.  Their [model documentation](https://huggingface.co/transformers/model_summary.html) provides an excellent explanation of transformers as well as the growing variety of models they offer (see the right hand navigation column).  In addition, their collection of [notebooks](https://huggingface.co/transformers/notebooks.html) is a valuable set of examples.  

---

One word of caution:  this is a rapidly evolving resource, and as a result you can often run into bugs.  They will get fixed eventually, but some features may be buggy for a while.  

[Return to Top](#top)
 <a id = 'fineTuned'></a>
 # Fine-tuning a Model

We'll use abstract classes that simplify the process of training by consolidating a number of pieces under one class. It's a good way to begin working with these models.  PyTorch is the framework in which most Hugging Face models are written first. However, Hugging Face makes a point of porting models to TensorFlow, Google's deep learning framework. Many models first appear on Hugging Face in PyTorch and are ported to TensorFlow later, so depending on what model you want to use, you may have to run the PyTorch version. It's important to always be aware of which framework you're using. The good news is that Hugging Face has built these models so that the underlying weight parameters can be used across PyTorch and TensorFlow implementations. Only the commands you use to construct, run, and manipulate the model are specific to PyTorch or TensorFlow.  This notebook will demonstrate fine-tuning a TensorFlow Hugging Face model using Keras.  To do this, we'll need to select a data set, select a model, and invoke the model's tokenizer.

This section borrows liberally from the fine-tuning description at https://huggingface.co/transformers/training.html

In [None]:
!pip install -q transformers

[Return to Top](#top)
 <a id = 'datasetClass'></a>
# Datasets



Hugging Face provides [a class for managing datasets](https://huggingface.co/docs/datasets/index). They also provide a library of actual data that is accessible via this datasets class. We'll take advantage of the Hugging Face datasets object to access a well-known corpus, specifically IMDB.

In [None]:
!pip install -q datasets

[Return to Top](#top)
 <a id = 'IMDBdesc'></a>
### IMDB Description

IMDB is a set of movie reviews. It is set up for a binary sentiment classification task.  It is good for learning how to work with the Hugging Face Transformers library and also good for baselines.

In [None]:
from datasets import load_dataset

raw_datasets = load_dataset("imdb")


[Return to Top](#top)
 <a id = 'EDA'></a>
### Exploratory Data Analysis

Let's look inside the IMDB dataset and see what it contains.  We see it is already split into train, test, and unsupervised records.  

In [None]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Here is one sample record from the test set.  Each record contains a label and some text.  Different data sets will have different parts in their records.

In [None]:
raw_datasets['test'][0]

{'text': 'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn\'t match the background, and painfully one-dimensional characters cannot be overcome with a \'sci-fi\' setting. (I\'m sure there are those of you out there who think Babylon 5 is good sci-fi TV. It\'s not. It\'s clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It\'s really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it\'s rubbish as 
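
Two quick checks worth making on records like this are the label balance and the typical review length. The sketch below runs on three hypothetical records that only mimic the IMDB fields (a `text` string and an integer `label`, with 0 for negative and 1 for positive); the records and the `summarize` helper are illustrative assumptions, not part of the dataset or the library.

```python
from collections import Counter

# Hypothetical records with the same fields as the IMDB rows shown above.
toy_records = [
    {"text": "A wonderful, moving film.", "label": 1},
    {"text": "Dull plot and wooden acting.", "label": 0},
    {"text": "I loved every minute of it.", "label": 1},
]

def summarize(records):
    """Return label counts and the mean whitespace-delimited word count."""
    label_counts = Counter(r["label"] for r in records)
    word_counts = [len(r["text"].split()) for r in records]
    mean_words = sum(word_counts) / len(word_counts)
    return label_counts, mean_words

label_counts, mean_words = summarize(toy_records)
print(label_counts)
print(mean_words)
```

Because iterating over a split yields one dictionary per row, the same helper should also accept something like `raw_datasets['train']`, although that call is not run here.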

Here is a utility function that leverages the dataset structure to pick 10 random records from the dataset, load them into a data frame, and display them.

In [None]:
#from https://github.com/huggingface/notebooks/blob/master/examples/text_classification.ipynb
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    """Display `num_examples` distinct random rows of `dataset` as an HTML table."""
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    # Sample distinct row indices without replacement.
    picks = random.sample(range(len(dataset)), num_examples)

    df = pd.DataFrame(dataset[picks])
    # Replace integer class ids with their readable names (here, neg/pos).
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(raw_datasets['test'])

Unnamed: 0,text,label
0,"I think Josh Duhamel is so great!! The rest of the show is fun to watch, but, I think it is the handsome and sexy Josh Duhamel that makes the show ""Las Vegas"" really fun to see!! In the days of ""Magnum"" I loved Tom Selleck, I thought he was the sexiest man on the face of the earth!! A hunk on a television show is a must in order for women to enjoy watching something, especially just for purposes of innocuous entertainment!! I would have done anything to ""Win A Date With Tad Hamilton""!! Josh Duhamel is incredible and I will always have a super crush on him!! Josh is definitely a HUNK!! and I will watch ""Las Vegas"" all the time, Josh Duhamel is a big reason why too!!",pos
1,"This film powerfully demonstrates the struggle of two women in love in a culture so deeply entrenched in ritual and tradition. All this against a backdrop of an India which itself is struggling for freedom from these same values. This film is both political and personal and never too preachy or idealistic on either front. It is easy to see why ""Fire"" has caused riots in India, but tragic nonetheless. A true film such as this one deserves to be seen by all people of the world, not just privileged westerners.",pos
2,"For some unknown reason, 7 years ago, I watched this movie with my mother and sister. I don't think I've ever laughed as hard with them before. This movie was sooooo bad. How sequels were produced is beyond me. Its been awhile since I last saw this ""movie"", but the one impression that it has stuck with me over the years has been, ""They must have found the script in a dumpster in the backlot of a cheap movie studio, made into a ""movie"", and decided that it didn't suck enough, and made it worse. I'm pretty sure that they spent all the budget on camera work and the so called ""special effects"", and then had 13 cents left toward the script AND to pay the ""actors"".",neg
3,"Spending an hour seeing this brilliant Dan Finnerty and his ""Dan Band"" perform their special on Bravo is the most enjoyable hour I've ever spent watching TV. This young man (Dan) is such an incredible talent, as a singer, performer and even dancer. He can go from the cheesiest of ballad pop songs, all of which have only been sung by women, to hip-hop, rock, also songs written for women.. This guy can do anything. I've seen him live at least 11 times, so I was not expecting just how well that his show would adapt to a television or film format, but all reservations went away instantly when the show started because of Dan's overwhelming star quality.Do yourself a favor and watch this, or better yet, buy it.",pos
4,"The true story of a Spanish paraplegic, Ramón Sampedro, who fought for decades for the right to be euthenized. This film, along with the Best Picture winner of the same year, Million Dollar Baby, caused a stir that year with their depictions of disabled persons desiring death. Both advocates for the disabled and (unfortunately for the disability advocates) conservative pro-life groups protested both films, and their Oscar nominations. The nominations also came during the entire Terry Schiavo debacle, just to put it all in some historical perspective. The protests, especially from the disability groups, against Million Dollar Baby make some sense  the film clearly depicted, without wavering, the life of a paraplegic as worthless. The film's central character, Maggie Fitzgerald, becomes a paraplegic, doesn't seem to get any counseling whatsoever, no help whatsoever, and immediately wants to die. The film is, honestly, pretty dumb and uncomplex. The Sea Inside, based on the true story, is certainly a lot more thoughtful on the subject. It most likely got railroaded into the same category as Million Dollar Baby without its protesters having even seen it, an incredibly common phenomenon. The film does give time to many different sides of the argument. And it immediately declares that the wish to die is that of the protagonist and the protagonist alone. It is guilty of a couple of crimes, though, and I'd still understand why disability groups could have a problem with it. First and foremost, there's the protagonist's meeting with a paraplegic bishop. I don't look kindly on the way he's depicted. His orally operated wheelchair is depicted as absurd, and there's almost a comic sequence where his effeminate, boy-toy servants are dragging him, in his chair, up the stairs. He can't even reach the room in which Ramón is located, and one of the boy-toys is forced to carry the conversation between them. 
I had to think, gee, maybe if Ramón lived in a slightly more wheelchair-accessible household, he wouldn't spend his entire life in bed, and might find life more fulfilling (who knows how closely the film depicts the reality). Director Amenábar (The Others) also includes some laughable scenes that try to make this film about suicide more life-affirming, like a cross-cut sequence where Ramón looks thoughtful and his lawyer's baby is born. But besides a few ugly moments, the film is very good. It hurts that someone may want to die when they have the ability to bring so much joy and insight into the lives of others. However, in the end, our lives do belong to us. Shouldn't we have the right to choose? The film's strongest asset is its supporting characters, and the actors who play them. It depicts how Ramón's fight and decisions affect those around him with a beautiful precision. The family members in particular are great, and Ramón's final departure from them is absolutely heartbreaking, and had me in tears. My favorite performance in the film comes from Lola Dueñas, whom I also felt gave the best, or at least certainly most undervalued, performance in Almodóvar's Volver last year.",pos
5,"Golden Boy is in my opinion one the sleeper / lost treasures animes out there. A sexy comedy, about a young man quest to find his nitch in life and he blunders into all sort of odd jobs that somehow has this rather sexy girl who ultimately falls for him but he not really realizing it! Its truly something that you can easily miss if you at the name, but once viewing it...will fall for the comedy/silliness that lies inside. Truly a crime that only produced 6 OVA episodes and pilot movie were made. However, being unique as it is. I'm surprised it survived to produce that many. If you want a good laugh, with high quality anime that is (100% CGI free), check this anime out. Boy who one day may save the world....or maybe not.",pos
6,"OK, why complain about this movie? It's fiction. Deal with it. If you want to see the biography, go watch it. This is an original, fictionalized version of what happened in Wisconsin. People who are obsessed will complain about this, as they do every other deviation of the facts. Sad but true. I think making Kane Hodder the man in which the film is named after was a great idea. I thought it wasn't so good at first, I'll be honest. But that just made it even scarier. If you like Kane Hodder, Ed Gein or movies based on real events, I think this is a good movie. But if you're obsessed (like some other people) stay away from this movie and all others.",pos
7,"From the makers of Underworld, we have, by far, the worst werewolf movie I have ever seen. It is basically a reconstructed version of Underworld, yet lacking vampires (not a big deal), cool effects (a BIG deal), and generally just about everything that can possibly be done right to produce a decent film dealing with lycanthropy (the biggest deal of them all!). A twenty-something lycanthrope chocolate maker named Vivian is currently residing in Romania ever since her family was hunted down and executed in front of her years ago in America for being werewolves. There, she belongs to a small society (or pack) of werewolves and is apparently chosen to unwillingly wed the pack leader, Gabriel, whose son - some toad with a British accent - takes it upon himself to hunt outside of the pack. They have apparently been discovered in other countries prior and want to remain settled in Romania by avoiding negative attention, so of course, such activity is considered forbidden. Vivian ends up falling in love with an American artist who is oblivious to her involvement with the group of blood-thirsty predators. When they end up discovering the secret relationship, things get messy when someone is killed and the human is forced to participate in a deadly tradition in which he is set loose in the woods and is hunted - giving the pack a chance to transform into their ""wolfy"" selves. All this really consists of is a big leap before they light up and land as a wolf. Very cheezy effects. The entire movie is like a tamed down Underworld with some drippy, romantic montages and very little action. Watching this in the theater, I could not wait for it to end. A devastatingly boring disappointment. Avoid!",neg
8,"This absurd movie was about a ""Goodie-two-shoe,"" teen-girl that really wanted to be Valedictorian but finds her obstacle in a teacher name Mrs. Tingle. Katie Holmes, who plays this ""goodie-two-shoe,"" is faced with ""the biggest dilemma of her teenage life"" when this classmate guy of hers comes along with the final exams sample that should help them nail Mrs. Tingle's test. Mrs. Tingle comes along, catches Holmes, the classmate guy and her best friend with the sample of her final exam. Convinced that the three of them planned on cheating on here exam, Mrs. Tingle enthuses on her opportunity to ruin Holmes once and for all with allegations that can take away any chance of Holmes passing her class. And the classmate guy, who apparently has his eye on Holmes, always wondered why she never gave him the time of day (he's an idiot)? Feeling desperate, Holmes and her friends visit Mrs. Tingle in the middle of the night to try to dissuade her in believing that Holmes was planning to cheat. It all backs fire somehow when the classmate guy points a bow and arrow at Mrs. Tingle, threatening her to make things right for Holmes. Mrs. Tingle fights back but ultimately ends up as Holmes and her friend's captive.<br /><br />During Mrs. Tingle captivity under Holmes, they do everything from tying her up and gagging her in her own bed to blackmailing her with false pictures that they took of the unconscious Coach in bed with Mrs. Tingle. I found myself cringing when the kids were making themselves at home in Mrs. Tingle's house, eating up her food and going though her private work. At one point, Holmes found Mrs. Tingle's grade book and purposely changes the grade in her favor, decreasing the grade of her challenge for valedictorian. 
The end played out like a childish attempt to bring back the comedy that was sparingly in the beginning of the film, resolving on pure irony, slapstick and absurdity.<br /><br />This has to be the most unlikable and wickedly evil character Holmes would ever play in her entire life. I wanted to help Mrs. Tingle get free to really dig a grave for Holmes. She was manipulative, selfish and conniving. She even slept with the classmate guy despite her best friend's overwhelming interest in him...and she didn't like him. From attempting to ruin her challengers grades by seizing Mrs. Tingle's grade book to taking her best friend's man, you would think that Holmes would get what she deserves in the end, right? Unfortunately, she obtains everything her heart desires, showing that being wicked, manipulative, selfish and whining can get you what you want.<br /><br />Mrs. Tingle was suppose to be the character you didn't like. They didn't bring me to that point once to believe that she was this woman that needed to be ""taught this lesson."" She was like every other strict teacher who even gave valid reasons for her resentment of the next generation. Personally, I felt that her opinions about young people were validated with Holmes and her friend's actions every time. I kept hoping she could get free to call the police and nail Holmes. They kept her tied up in bed, ate up her food like a bunch of pigs, drank up the woman's wine, messed with her personal belongings and we're suppose to believe that she didn't deserve to take a bat to each of their heads? And the classmate guy has to be one of the most disliked characters in the history of film. Forget idiot, we need a new word for him that isn't in the Webster's dictionary. He brought the major trouble into Holme's life then made things worse when he came into Mrs.Tingle's house, uninvited behind Holmes, and corners Mrs. Tingle with a bow and arrow. I was thrilled every time Mrs. 
Tingle had a chance to slap fire out of him, or choke the wannabe actress best friend.<br /><br />If you're a teen out there and want to see when a teen's manipulation and wrong doing can get him or her the world, see this unfunny, caricature filled, unintentional film noir.",neg
9,"Death Wish 3 is exactly what a bad movie should be. Terrible acting! Implausible scenerios! Ridiculous death scenes! Creepy, evil-for-no-reason villains! The last 30 minutes of this movie just might be the best 30 minutes ever put on film, especially in the scene where the decent, hardworking citizens string chains across the street, knocking down the evil bikers and then shoot them, only to be joined by the neighborhood children (!!!) in celebration. And how can I forget the elderly woman with the broom? She's sweeping out the scum! And if that's not enough, let's not forget how quickly the punks give up after Fraker is killed. I'm laughing just thinking about it.<br /><br />I also love the death scene of Kersey's girlfriend. He just *walks away* after seeing her get blown up. It's little things like this that make Death Wish 3 such a bad movie. And I'm not even mentioning the bizarre soundtrack.<br /><br />I watched this movie because of Martin Balsam, who I seriously think is one of the finest character actors ever (and who's own ""getting beaten up by the scum"" scene is hilarious) and I walked away with a new favorite movie. Thank you, Death Wish 3 for making me laugh so hard.<br /><br />Some other things I forgot to mention: 1. The weird sound effect after Kersey says ""Cash!"" when buying his used car. Ha! It's so evil sounding. 2. MANDY Fraker. Mandy! Did the writers run out of tough guy names? 3. The fact that the gangs apparently have a ""lend and lease"" thug exchange program: ""I need some more guys."" And that Mandy has a working phone line in an abandoned building. 4. At the end of the movie, after Kersey blows up Fraker: is it just me, or does it look like the street gang is about to break into choreography as they're giving up? Just watch how in sync they are after the female punk gives the ""stop"" signal. <br /><br />I love this movie. Nothing cheers me up like Death Wish 3!",pos


[Return to Top](#top)
 <a id = 'modelPrep'></a>
## Model Preparation

[Return to Top](#top)
 <a id = 'modelSelection'></a>
### Model Selection

We need to pick the model we are going to train to classify IMDB.  We'll do that in several stages. First we define some variables to hold information about the model that we'll re-use.

In [None]:
model_checkpoint = "bert-base-cased"
batch_size = 8

[Return to Top](#top)
 <a id = 'tokenizerSelect'></a>
### Tokenizer Selection

We'll use the [AutoTokenizer](https://huggingface.co/docs/transformers/main/en/autoclass_tutorial#autotokenizer) object to avoid simple configuration mistakes because it ensures that we get the correct tokenizer given our pre-trained model.  This time the model we're using is BERT and we're selecting the cased version (meaning case is preserved) and the base version (rather than the large version).

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)


What's the tokenizer doing?  It's taking care of breaking down a sentence into the parts the model can understand and was trained on, as well as a bunch of housekeeping that's needed by the model in order to work properly.  Once we've covered how a transformer works in live session, we'll come back (in week 4) and discuss its various components.  For now, you don't need to understand it in order to make use of it.

Here's one example of what the tokenizer outputs.

In [None]:
tokenizer("Hello, we only need one sentence for our task but these reviews often have more.")

{'input_ids': [101, 8667, 117, 1195, 1178, 1444, 1141, 5650, 1111, 1412, 4579, 1133, 1292, 3761, 1510, 1138, 1167, 119, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

The tokenizer converts the incoming text to integer ids that are used to retrieve the model's input embeddings.  All tokenizers convert text to input ids, but each model's tokenizer has its own vocabulary, so the wrong tokenizer will produce the wrong set of token ids and result in very poor predictions.  The AutoTokenizer ensures the correct ids are assigned.
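
To make the three output fields concrete, here is a toy sketch of how such an encoding can be assembled. It is not the BERT tokenizer: the vocabulary and the `toy_encode` helper are hypothetical, whole words are looked up instead of WordPiece subwords, and only the ids 101 (`[CLS]`) and 102 (`[SEP]`) are taken from the real output above. The padding id 0 and unknown id 100 are assumptions chosen to resemble BERT's conventions.

```python
# Toy vocabulary (hypothetical). Only 101 and 102 mirror the real output above.
PAD_ID, UNK_ID, CLS_ID, SEP_ID = 0, 100, 101, 102
toy_vocab = {"Hello": 1000, "we": 1001, "need": 1002, "one": 1003, "sentence": 1004}

def toy_encode(text, max_length=8):
    """Encode text into the three fields a BERT-style tokenizer returns."""
    # Crude word splitting: treat commas and periods as separators.
    words = text.replace(",", " ").replace(".", " ").split()
    ids = [toy_vocab.get(w, UNK_ID) for w in words]
    # Truncate so that [CLS] and [SEP] still fit within max_length.
    ids = ids[: max_length - 2]
    ids = [CLS_ID] + ids + [SEP_ID]
    attention_mask = [1] * len(ids)
    # Pad to a fixed length; padded positions get attention_mask 0.
    n_pad = max_length - len(ids)
    ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad
    return {
        "input_ids": ids,
        "token_type_ids": [0] * max_length,  # single-segment input
        "attention_mask": attention_mask,
    }

encoded = toy_encode("Hello, we need one sentence.")
print(encoded)
```

The attention mask tells the model which positions hold real tokens and which are padding, which matters once every example is padded to the same length.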

In [None]:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)


In [None]:
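# Use 1,000 shuffled examples from each split so that training finishes quickly.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
# Keep the full 25,000-example splits available for longer runs.
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]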

[Return to Top](#top)
 <a id = 'autoModel'></a>
### Automatic Model Configuration

We'll use the [AutoModel abstraction](https://huggingface.co/docs/transformers/main/en/autoclass_tutorial#automodel) and invoke the TensorFlow port since we want to use Keras to train and run the model.  As a result we'll instantiate a copy of TFAutoModelForSequenceClassification.  Note the 'TF' at the beginning of the class name to designate it as a TensorFlow port.  The model for "sequence classification" is specifically structured to perform classification based on sequences of words like a sentence.  Hugging Face provides a set of models specifically configured for [particular NLP tasks](https://huggingface.co/docs/transformers/main/en/model_doc/auto) as shown by all of the AutoModelFor *FillInTheTask*.

In [None]:
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)


All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[Return to Top](#top)
 <a id = 'encodeData'></a>
### Encode Data


Let's create the encoded data for training.  Because we're using the TensorFlow port, we'll need to convert the contents of our Hugging Face dataset objects to TensorFlow tensors.  Hugging Face provides some nice conversion functions to assist in the process.

In [None]:
tf_train_dataset = small_train_dataset.remove_columns(["text"]).with_format("tensorflow")
tf_eval_dataset = small_eval_dataset.remove_columns(["text"]).with_format("tensorflow")

In [None]:
train_features = {x: tf_train_dataset[x] for x in tokenizer.model_input_names}
train_tf_dataset = tf.data.Dataset.from_tensor_slices((train_features, tf_train_dataset["label"]))
train_tf_dataset = train_tf_dataset.shuffle(len(tf_train_dataset)).batch(batch_size)

eval_features = {x: tf_eval_dataset[x] for x in tokenizer.model_input_names}
eval_tf_dataset = tf.data.Dataset.from_tensor_slices((eval_features, tf_eval_dataset["label"]))
eval_tf_dataset = eval_tf_dataset.batch(batch_size)
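
As a rough mental model of what shuffling and batching do to the 1,000 training examples, here is a plain-Python sketch. It does not use `tf.data`; the `shuffle_and_batch` helper is hypothetical, and the fixed seed is there only to make the sketch repeatable.

```python
import math
import random

def shuffle_and_batch(examples, batch_size, seed=0):
    """Shuffle a list of examples and split it into consecutive batches."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i : i + batch_size] for i in range(0, len(shuffled), batch_size)]

num_examples = 1000   # size of small_train_dataset
batch_size = 8        # value set earlier in the notebook
batches = shuffle_and_batch(range(num_examples), batch_size)

# One gradient update is made per batch, so this is the number of steps per epoch.
steps_per_epoch = math.ceil(num_examples / batch_size)
print(len(batches), steps_per_epoch)
```

With 1,000 examples and a batch size of 8, each epoch consists of 125 batches, which is the step count Keras reports while training.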

[Return to Top](#top)
 <a id = 'fineTuning'></a>
# Fine Tuning

In keeping with the Keras process we call model.compile first to make sure that all the pieces are in place.  We can follow that up with a call to model.summary() to make sure we've put the correct players together in the correct manner.  We can also see how much is trainable, which gives a sense of training time and resource requirements.

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)
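
The loss and metric chosen in the compile call above can be worked through by hand. The sketch below is a plain-Python illustration, not the Keras implementation: it assumes the model emits one raw score (logit) per class and that labels are integer class ids, which is what `from_logits=True` and the "sparse" variants indicate. The helper names and the example logits are made up.

```python
import math

def sparse_categorical_crossentropy_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def sparse_categorical_accuracy(batch_logits, labels):
    """Fraction of examples whose largest logit matches the integer label."""
    hits = sum(
        1
        for logits, y in zip(batch_logits, labels)
        if max(range(len(logits)), key=logits.__getitem__) == y
    )
    return hits / len(labels)

# Three made-up examples with two classes (0 = negative, 1 = positive).
batch_logits = [[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]]
labels = [0, 1, 0]

losses = [
    sparse_categorical_crossentropy_from_logits(l, y)
    for l, y in zip(batch_logits, labels)
]
mean_loss = sum(losses) / len(losses)
accuracy = sparse_categorical_accuracy(batch_logits, labels)
print(losses, mean_loss, accuracy)
```

A confident correct prediction gives a small loss, an undecided one gives log 2, and a confident wrong prediction gives a large loss, which is why the loss is a more sensitive training signal than accuracy alone.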

In [None]:
model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bert (TFBertMainLayer)      multiple                  108310272 
                                                                 
 dropout_37 (Dropout)        multiple                  0         
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
Total params: 108311810 (413.18 MB)
Trainable params: 108311810 (413.18 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
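
The numbers in the summary can be checked with a little arithmetic. This sketch assumes BERT-base's hidden width of 768 (not printed in the summary) and 32-bit floating-point weights; the `bert` parameter count is copied from the summary above.

```python
hidden_size = 768      # BERT-base hidden width (assumed; not printed in the summary)
num_labels = 2

# Dense classifier: one weight per (hidden unit, label) pair plus one bias per label.
classifier_params = hidden_size * num_labels + num_labels

bert_params = 108_310_272          # 'bert' row of the summary
total_params = bert_params + classifier_params

# float32 weights take 4 bytes each; the reported size uses 2**20 bytes per MB.
size_mb = total_params * 4 / 2**20
print(classifier_params, total_params, round(size_mb, 2))
```

The classifier head accounts for only 1,538 of roughly 108 million parameters; because every parameter is trainable, fine-tuning updates the whole BERT encoder and not just the new head.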


Then we call model.fit to perform the actual training.  Note that because of what the model has already learned about language in its pre-training phase, we can often limit our training to a small number of epochs, sometimes as few as one or two.

In [None]:
model.fit(train_tf_dataset, validation_data=eval_tf_dataset, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x7b36aff5b640>

We'll be using TensorFlow and BERT in some class assignments but instead of using AutoClasses we'll actually add our own layers on top of BERT so we can build intuition about how it works.