<a id = 'top'></a>

#  A quick-start guide to fine-tune a BERT model with Keras
  * A. [What is BERT?](#introBERT)
  * B. [What is fine-tuning?](#fineTuned)
  * C. [Datasets](#datasetClass)
      * 1. [IMDB Description](#IMDBdesc)
      * 2. [Exploratory Data Analysis](#EDA)
  * D. [Model Preparation](#modelPrep)
      * 1. [Model Selection](#modelSelection)
      * 2. [Tokenizer Selection](#tokenizerSelect)
      * 3. [Auto Model](#autoModel)
      * 4. [Encode Data](#encodeData)
  * E. [Fine-Tuning The Model](#fineTuning)
     

Hugging Face is a company that offers a library of "transformers" as well as a collection of pre-trained language models.  Together these provide code and abstract classes along with a variety of documentation and examples. We are going to explore one way of working with these models at a very high level.  In later classes, when we have covered how a transformer works, we'll come back and look at them at a deeper level.  This tutorial is designed to look at the Hugging Face library at the same level of abstraction as the Keras Sequential API rather than at the lower level of abstraction of TensorFlow and the Keras Functional API.

---

Larger models, with millions or billions of parameters, are only practical to train on a machine with a GPU.  Do not run this notebook on your GCP instance, as the training epochs will take far too long.  If you run this notebook in Colab, it is automatically configured to use a GPU.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datasci-w266/2024-fall-main/blob/master/materials/walkthrough_notebooks/bert_as_black_box/Keras_HuggingFace_Transformers_BERT_notebook.ipynb)

[Return to Top](#top)
 <a id = 'introBERT'></a>
# What is BERT?
This notebook leverages one of a variety of BERT models.  BERT models can be characterized in terms of three parts.  The first part is a component called a transformer.  These can grow to be quite large: BERT consists of either 12 or 24 transformer layers. The second part is the training (called pre-training) the model has already received on language.  The pre-training is characterized by one or more tasks.  The third part consists of the specific tasks the model is geared toward performing.  Different models use different sizes and numbers of transformer layers and may be optimized for different languages and different tasks.  For example, CamemBERT is trained on French text and SciBERT is trained on scientific journal articles.  You'll want to make sure you use a model appropriate to your language and task.

---

The [HuggingFace web site](https://huggingface.co/transformers) offers an interesting set of resources.  Their [model documentation](https://huggingface.co/transformers/model_summary.html) provides an excellent explanation of transformers as well as the growing variety of models they offer (see the right-hand navigation column).  In addition, their collection of [notebooks](https://huggingface.co/transformers/notebooks.html) is a valuable set of examples.

---

One word of caution:  this is a rapidly evolving resource and as a result you can often run into bugs.  They will get fixed eventually, but things may be buggy for a while.

[Return to Top](#top)
 <a id = 'fineTuned'></a>
 # Fine-tuning a Model

We'll use abstract classes that simplify the process of training by consolidating a number of pieces under one class. It's a good way to begin working with these models.  PyTorch is the native computational graph language used in Hugging Face. However, they make a point of porting models to TensorFlow, Google's computational graph language. Many models first get put on HuggingFace in PyTorch. Eventually they get ported over to TensorFlow. Depending on what model you want to use, you may have to run the PyTorch version. It's important to always be aware of which dialect you're using. The good news is that HuggingFace has built these models so that the underlying weight parameters can be used across PyTorch and TensorFlow implementations. It is simply the commands you use to construct, run, and manipulate the model that are in PyTorch or TensorFlow.  This notebook will demonstrate fine-tuning a TensorFlow HuggingFace model using Keras.  To do this, we'll need to select a data set, a model, and be sure to invoke its tokenizer.

 This section borrows liberally from the fine-tuning description in https://huggingface.co/transformers/training.html

In [None]:
!pip install -q transformers

[Return to Top](#top)
 <a id = 'datasetClass'></a>
# Datasets



HuggingFace provides [a class for managing datasets](https://huggingface.co/docs/datasets/index). They also provide a library of actual data that is accessible via this datasets class. We'll take advantage of the datasets object in HuggingFace to access some well-known corpora, specifically IMDB.

In [None]:
!pip install -q datasets

[Return to Top](#top)
 <a id = 'IMDBdesc'></a>
### IMDB Description

IMDB is a set of movie reviews. It is set up for a binary sentiment classification task.  It is good for learning how to work with the HuggingFace Transformers library and also good for baselines.

In [None]:
from datasets import load_dataset

raw_datasets = load_dataset("imdb")


[Return to Top](#top)
 <a id = 'EDA'></a>
### Exploratory Data Analysis

Let's look inside the IMDB dataset and see what it contains.  We see it is already split into train, test, and unsupervised records.  

In [None]:
raw_datasets

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Here is one sample record from the test set.  Each record contains a label and some text.  Different data sets will have different parts in their records.

In [None]:
raw_datasets['test'][0]

{'text': 'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn\'t match the background, and painfully one-dimensional characters cannot be overcome with a \'sci-fi\' setting. (I\'m sure there are those of you out there who think Babylon 5 is good sci-fi TV. It\'s not. It\'s clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It\'s really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it\'s rubbish as 

Here is a utility function that leverages the dataset structure to pick 10 random records from the dataset, load them into a data frame, and display them.

In [None]:
#from https://github.com/huggingface/notebooks/blob/master/examples/text_classification.ipynb
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))
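
The index-picking loop in `show_random_elements` draws distinct random positions one at a time. As a rough sketch of the same idea, the standard library's `random.sample` does this in one call. The helper name `pick_unique_indices`, the `label_names` list, and the stand-in labels are assumptions made for illustration; they are not part of the `datasets` library or the IMDB data.

```python
import random

def pick_unique_indices(dataset_size, num_examples=10, seed=None):
    """Return num_examples distinct random positions in range(dataset_size)."""
    if num_examples > dataset_size:
        raise ValueError("Can't pick more elements than there are in the dataset.")
    rng = random.Random(seed)  # private generator, so seeding does not affect other code
    return rng.sample(range(dataset_size), num_examples)

# Assumed label names, chosen to match the neg/pos strings shown in the table.
label_names = ["neg", "pos"]

picks = pick_unique_indices(25000, num_examples=10, seed=0)

# Stand-in integer labels (NOT real IMDB labels), mapped to readable names
# the same way the utility function maps ClassLabel columns.
toy_labels = [i % 2 for i in picks]
readable = [label_names[i] for i in toy_labels]

print(picks)
print(readable)
```

Sampling without replacement this way avoids the retry loop, which matters little for 10 picks but keeps the intent explicit.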

In [None]:
show_random_elements(raw_datasets['test'])

Unnamed: 0,text,label
0,"... when dubbed into another language. Let's face it: Neither Nielsen nor Schwarzenegger are really good actors when it comes to dialog. And given the campy lines they are supposed to utter this is a loose-loose situation. Any type of voice-over is sure to be an improvement (and it actually is - at least in the German version).<br /><br />But that is only a minor point. The acting is bad. The speeded up combat sequences are pathetic. Nielsen couldn't use her sword to fight her way out of a wet paper bag. This becomes painfully obvious when compared to the fluidity of motion exhibited by the kid (who has had some martial arts training, no doubt) and to the athleticism shown off by Sandahl Bergman.<br /><br />Schwarzenegger does his Conan thing - nothing new here.<br /><br />Some of the visuals are nice, I'll have to grant that. The dragon skeleton bridge looks cool. But more often than not the plaster is all too evident.<br /><br />Overall the movie isn't worth seeing. Even 'Conan the Destroyer' is better than this (although only marginally). I would have much rather seen Bergman as Red Sonja as she was originally supposed to be, but I doubt that that could have saved this movie - oh well.<br /><br />3/10",neg
1,"Bobbie Phillips, who in her own right has amassed a great list of credits as a hard working Hollywood actress, shines in this third installment of UPN and Village Roadshow's Chameleon series. In this installment, the sexual innuendo has been toned down with Kam showing a caring maternal side towards a recently orphaned genius teen. Bobbie delivers this role to the viewers with great panache'. The action and stunts were the best in the series.",pos
2,"This is an excellent movie. There were several parts to the movie I liked. This movie is very funny! Visit the Ernest fun club web site at www.ernestfunclub.com There are several movies such as the following: Ernest Goes to Camp, Ernest Saves Christmas, Ernest Goes to Jail, Ernest in the Army, Ernest Goes to School, Ernest Rides Again Slam Dunk Ernest etc. I highly recommend these for family movies. All star Jim Varney again try visiting www.ernestfunclub.com Which is the best Ernest movie? In my opinion there are actually 2 Ernest Goes to Camp and Ernest Goes to Jail. So if you have never seen Ernest P. Worrell its time to go and see him. You will find him quite satisfactory ""no what I mean""?",pos
3,"SPOILERS AHEAD For the first ten minutes or so of Star Witness we're introduced to a quote typical urban American family unquote in a nameless city, which is another way of saying Warner Brothers' version of NYC. Except for the young children, including the charming Dickie Moore, and sprightly Sally Blane, they're a pretty dreary lot, and their dinner table conversation is tedious and we wish the story would move along and bring in the star, Walter Huston. But wait, folks, wait. All of a sudden serious gangster movie action breaks out, drawing the family in against their will, and after that this baby never lets up. There's suspense, an Oscar-nominated script, good acting; everything you want old movies to beit is here. I do question Chic Sales performance; he must be an acquired taste, but his presence turns out to be crucial to the plot. He's treated to special status in the credits, so Warner Bros. must have really been high on Sale, but how his corny old man routine fit in with the public then is something lost to me. Perhaps it is lost to time period, an unknowable factor you had to be a 1931 moviegoer to understand. Also, the climax is typically melodramatic. Nevertheless, this right now is the best release of the studio that year I have seen so far (however, I've only seen eight, so perhaps that's an inconclusive view). Do not miss this when TCM shows it. 8 out of 10.",pos
4,"Start with the premise that you will do anything to replace your lost love with a look-alike. Throw in your scientific knowledge of a deforming disease (isn't this the stuff that Leo G. Carroll contracted from the spider venom in ""Tarantula""). Throw in the fact that the main character, instead of finding some way to attract the young woman, engages in heavy-handed stalking, until he totally draws attention to himself and has to hatch this insane plot: If he can make the girl's father sick, then help him recover, she will marry him. The problem is that most of the events are random and unpredictable. Anyone with half a brain would have seen through things. There's a third party, a woman that the doctor, played by J. Carroll Naish, has treated with great insensitivity. You know she is going to be a factor. There's also a gorilla kept in a cage who is used occasionally for heaven know's what. Oh well. There is so little sense to this who thing that it plays itself out and people get their just desserts.",neg
5,"So it's not an award winner, so what? Have you ever wanted to see a film that was just silly? ""The Villain"" and this one could top the list.My husband says that ""Jekyll and Hyde Together Again"" is one of those movies that if ""you've been there and done that"" you'll think this spoof on the 80's cocaine culture is a riot. I think the whole film is just fun. Nothing is sacred; hospitals, plastic surgery, Howard Hughes.... There are ongoing gags that you have to watch for to appreciate. To say that the film doesn't follow the book would be true, but then a lot of really good films take liberties with the published word also. I recommend this movie to all the old ""stoners"" among us. We may be smarter now, but we will still recognize and laugh at many folks we knew (ourselves?) back in the old days.",pos
6,"I saw it at a German press screening. Without giving too much away: Most critics really seemed to like it very much. There was even applause afterwards, which is quite unusual for that species. From my point of view and until now, it was the funniest movie of the year. It keeps the charm and wit of the three W+G shorts and it is enlarged with many references to these and other movies. Of course, there are obvious allusions to monster- and werewolf-movies, especially to ""An American Werewolf in London"", ""Jaws"", ""King Kong"" and even to Peter Jackson's ""Braindead""/""Dead Alive"", but also to other genres.<br /><br />Characterization was better done in ""Chicken Run"", but that movie had a complete new ""cast"" where introduction was necessary. Here, you are already able to know the two main characters. So, the new ""Wallace and Gromit""-movie is enjoyed best if you watched (and liked) the shorts already, yet it also works on its own. ""Chicken Run"" had the more convenient, but also more ""storytelling"" plot. Instead, this new Aardman masterpiece keeps that crazier and somehow more ""isolated"" feeling of the W+G shorts. Children should also enjoy it very much, especially because of the sweet rabbits (if you love cute bunnies, this is a must-see for you!!!) and because Gromit has a lot do to and really steals the show (children also love dogs... :-) ). But many jokes are thought for a more adult audience (there are even soft sexual allusions in it). 
The movie manages, like ""Shrek 1+2"" and ""The Incredibles"", to fulfil high level entertainment for the whole family, with adding a British and at least a little bit darker edge to the humour of American animated movies.<br /><br />The animation is  as expected  superb, and they kept true to the Aardman style because they didn't put in too many digital effects - I realized just a few when it came to Wallace's inventions.<br /><br />Finally, the score works fine in the movie, although one of the main themes definitely is ""borrowed"" by Randy Edelman's ""Dragonheart"" score.<br /><br />The bad thing is: It will probably take another six years from now until we can see a new animated gem from Nick Park & Co.",pos
7,"Really don't care that no one on here likes this movie,, i do , and that's what this review is about. Lou Diamond Phillips is great in this comedic role. that line about train a b and c is now to me an instant classic, the cg is great, yeah train looks a little fake,, but the aliens wow do they ever rock,, Todd Bridges,, where's Arnold, and Mr. Drummond,, wow he's been out of the loop , guess that's what jail does to you.. a bullet train is on it's way to Las Vegas with the Senator for him to deliver a big speech, a meteor has just hit,, and now all of a sudden we got aliens running loose aboard the train, and our hero cop has to save the day, to make matters worse his ex-wife is on board arguing with him. i just thought this movie was so wonderful,, a must see if you like action.",pos
8,"Retro Puppet Master starts in Kolewige during 1944 where puppet master Andre Toulon (Guy Rolfe) & his living puppets plan to escape Germany, hold up in an Inn puppet master Toulon reminisces about his early life & the point at which he learned the secret of giving life to dead objects way back in 1902 in Paris when his younger self (Greg Sestero) ran the Theate Magique. He describes the fateful night when he met a 3000 year old Egyptian sorcerer named Afzel (Jack Donner) & the eventual love of hi life the young & beautiful Ilsa (Brigitta Dau). He tells the story of how Afzel passed the gift of life to himself & gave life to his own wooden puppets that were part of the Theatre Magique show. However the gift of life was also a curse as the ancient God Sutek whom the secret was stolen from in the first place by Afzel wants it back & everyone who has learnt it dead...<br /><br />As of late I have been on a bit of a Puppet Master bender as being a big fan of the first three I decided to watch the rest of the franchise & as such I have seen Puppet Master 4 (1993), Puppet Master 5: The Final Chapter (1994), Curse of the Puppet Master (1998) & now Retro Puppet Master in the space of a couple of weeks & boy was it tough to get through them all, especially this one as it's the worse of the series so far. Retro Puppet Master feels like a cross between Puppet Master III: Toulon's Revenge (1991) with it's period setting & Puppet Master 4 & Puppet Master 5: The Final Chapter with Sutek trying to kill everyone associated with his stolen life giving secret. There's not much continuity here either, again there's none of the green serum featured in the earlier films & despite Andre Toulon committing suicide in 1939 at the start of the original Puppetmaster (1989) he is seen alive & well during 1944 in this. 
The majority of the story is told as a flashback & concentrates on Andre Toulon himself rather than the puppets, the film focuses on his relationship with Ilsa & him learning the secret of life & it's all rather dull & tedious stuff to be honest. Even at only 80 odd minutes Retro Puppet Master feels long & padded with no real pace & the no central concept as the plot never really settles down & generally hops around a lot. Then of course there's the baffling decision to totally redesign the puppets which I found incredible, I mean why would the makers take the one basic thing that made the Puppet Master films so memorable & completely do away with it? The puppets are seen briefly at the start & the end but otherwise we get these rubbishy looking wooden caricatures that are nowhere near as cool as their modern re-workings. It's never even explained why these puppets were used rather than the ones all Puppet Master fans have come to love although one suspects that Full Moon was hoping to make yet another sequel which dealt with that very question.<br /><br />If a poor story & a complete lack of our favourite puppets wasn't bad enough Full Moon decided to go with a PG-13 rating for this making Retro Puppet Master the only Puppet Master film not rated 'R' in the US (obviously other countries have their own film ratings systems) & therefore there's not a single drop of blood in the entire film, the puppets don't kill anyone, there's no swearing & no nudity either. This is tamer than tame kids stuff all the way. Besides the puppets themselves being rubbish the special effect are the wost of the series too, there's no stop motion animation at all in this one, no CGI computer effects (surely in 1999 CGI was cheap enough?) & all the effects are of the stiff rod puppet type effects. 
I mean whenever you see a puppet 'walk' the camera is always positioned above it's wait so it's legs don't have to be shown & there's obviously some production assistant just pushing the thing along, that's as complex & state of the art as the special effects get.<br /><br />The one positive thing that Retro Puppet Master does have going for it is that it looks rather nice, the period production design, costumes & props are actually quite impressive & it's a fairly handsome film to watch at times. Apparently filmed in Bucharest in Romania which doubles up quite nicely for turn of the last century Paris. The acting here is awful & maybe the worst of the series.<br /><br />Retro Puppet Master is more or less the final Puppet Master film as the next one Puppet Master: The Legacy (2004) basically edits together footage from the previous seven films & it's a pretty crappy way to round the series off which started so well with three excellent & distinctive little killer puppet flicks. Don't bother with this, just watch one of the first three again & just remember the good times... The killer puppets would return in the terrible spin-off flick Puppet Master vs Demonic Toys (2004).",neg
9,"This film almost gave me a nervous breakdown. When I was recovering from appendicitis a few years ago, I had just started teaching in secondary (high) school. The whole teaching business was all a bit nervewracking for a beginner, but to mentally prepare myself for going back into the classroom I decided to watch some rather awful films. The Flintstones was one of the films that I chose, and then I put ""King Of The Streets"" (the UK title of 'Alien Warrior') on. Just before it finished I found myself almost in tears at the sheer waste of it all...my life, the film stock, the £2 I had paid for it a couple of weeks ago in the Blockbuster ex rental section, the time it must have taken to print the sleeve art, the effort of the editors and musicians involved in the soundtrack (as negligable as their efforts were)...the list goes on.<br /><br />I love bad films. Let me make this perfectly clear - I LOVE watching crappy films from Blockbusters. Me and my mate Dan used to sit and watch many, many cheapjack horrors and laugh at them. But this was a different type of crappy film. I don't think that anything has come close to this, not even Tobe Hooper's ""Death Trap"" (which is probably my second worst film in the world). The whole making a car from abandoned parts section nearly killed me; the repetition of music at any available opportunity, regardless of on-screen events; the whole.... AAAGGGHHHHHHH!!!!! I can't even carry on with this 'critical' dissection, as my gag reflex has started. The futility of that film, even now, three years after I watched it for the first and last time, still renders me speechless (but I am still able to type). The whole ""making a car from odd parts"" section had me contemplating horrible things.<br /><br />I implore you, if you are interested in watching this film, just gaze at the cover of the video and imagine the worst possible version of the story synopsis on the back. 
I can almost guarantee that it won't be even half as bad as this film actually is. Don't, under any circumstances, contemplate actually watching it for any reason whatsoever. Not if you are a Christian and you want to see a Christ allegory; not if you are a bad movie afictionado and you want to experience the true nadir of trash; not even if you want your life to seem longer (and believe me, every second that this film runs seems like at least a minute). Make no mistake about it, this film is unholy. It is the antichrist in video form. As Bo Cattlett in Get Shorty said: ""I've seen better film on teeth"".",neg


[Return to Top](#top)
 <a id = 'modelPrep'></a>
## Model Preparation

[Return to Top](#top)
 <a id = 'modelSelection'></a>
### Model Selection

We need to pick the model we are going to train to classify IMDB.  We'll do that in several stages. First we define some variables to hold information about the model that we'll re-use.

In [None]:
model_checkpoint = "bert-base-cased"
batch_size = 8

[Return to Top](#top)
 <a id = 'tokenizerSelect'></a>
### Tokenizer Selection

We'll use the [AutoTokenizer](https://huggingface.co/docs/transformers/main/en/autoclass_tutorial#autotokenizer) object to avoid simple configuration mistakes because it ensures that we get the correct tokenizer for our pre-trained model.  This time the model we're using is BERT and we're selecting the cased version (meaning case is preserved) and the base version (rather than the large version).

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)



What's the tokenizer doing?  It's taking care of breaking down a sentence into the parts the model can understand and was trained on, as well as a bunch of housekeeping that's needed by the model in order to work properly.  Once we've covered how a transformer works in live session, we'll come back (in week 4) and discuss its various components.  For now, you don't need to understand it in order to make use of it.

Here's one example of what the tokenizer outputs.

In [None]:
tokenizer("Hello, we only need one sentence for our task but these reviews often have more.")

{'input_ids': [101, 8667, 117, 1195, 1178, 1444, 1141, 5650, 1111, 1412, 4579, 1133, 1292, 3761, 1510, 1138, 1167, 119, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

The tokenizer converts the incoming words (or pieces of words) to integer ids that are used to retrieve the model's input word embeddings.  All tokenizers convert text to input ids, but each uses its own vocabulary.  The wrong tokenizer will produce the wrong set of token ids and result in very poor predictions.  The AutoTokenizer ensures the correct ids are assigned.
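
To make the three output fields concrete, here is a toy sketch of a vocabulary lookup in plain Python. It is not how the real tokenizer is implemented: `toy_vocab` and `toy_encode` are hypothetical names, the ids are copied from the output above on the assumption that each word and punctuation mark maps to exactly one id, and the ids for `[PAD]` and `[UNK]` are likewise assumed. A real BERT tokenizer also splits rare words into sub-word pieces, which this sketch skips.

```python
import re

# Toy vocabulary (an assumption for illustration, not the real vocab file).
toy_vocab = {
    "[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
    "Hello": 8667, ",": 117, "we": 1195, "only": 1178, ".": 119,
}

def toy_encode(text, vocab):
    """Look up each word or punctuation mark and wrap the ids in [CLS] ... [SEP]."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # words and single punctuation marks
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]  # unknown words get [UNK]
    ids.append(vocab["[SEP]"])
    return {
        "input_ids": ids,
        "token_type_ids": [0] * len(ids),   # a single segment, so all zeros
        "attention_mask": [1] * len(ids),   # every position is a real token
    }

encoded = toy_encode("Hello, we only.", toy_vocab)
lowercase = toy_encode("hello, we only.", toy_vocab)

print(encoded["input_ids"])
print(lowercase["input_ids"])
```

In this toy vocabulary the lowercase `hello` falls back to the unknown id, a small reminder that a cased vocabulary treats `Hello` and `hello` as different entries and that ids only make sense relative to one specific vocabulary.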

In [None]:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
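
The `padding="max_length"` and `truncation=True` arguments in `tokenize_function` make every encoded review the same length, which is what lets the reviews be stacked into fixed-shape batches later. Below is a simplified sketch of that behaviour in plain Python. The function `pad_or_truncate` is hypothetical, the pad id 0 and separator id 102 are assumptions, and the real tokenizer handles truncation internally rather than through a helper like this.

```python
def pad_or_truncate(input_ids, max_length, pad_id=0, sep_id=102):
    """Return (ids, attention_mask), each exactly max_length long."""
    if len(input_ids) > max_length:
        # Keep the first max_length - 1 ids and finish with the separator id.
        ids = input_ids[:max_length - 1] + [sep_id]
    else:
        ids = list(input_ids)
    mask = [1] * len(ids)                 # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding

short_ids = [101, 8667, 119, 102]                   # shorter than max_length
long_ids = [101] + list(range(1000, 1010)) + [102]  # 12 ids, longer than max_length

padded_ids, padded_mask = pad_or_truncate(short_ids, max_length=8)
cut_ids, cut_mask = pad_or_truncate(long_ids, max_length=8)

print(padded_ids, padded_mask)
print(cut_ids, cut_mask)
```

The attention mask is what tells the model to ignore the padded positions. With no explicit `max_length`, the tokenizer falls back to the model's own limit, which for BERT base is understood to be 512 tokens, so long reviews lose their tail.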


In [None]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]

[Return to Top](#top)
 <a id = 'autoModel'></a>
### Automatic Model Configuration

We'll use the [AutoModel abstraction](https://huggingface.co/docs/transformers/main/en/autoclass_tutorial#automodel) and invoke the TensorFlow port since we want to use Keras to train and run the model.  As a result we'll instantiate a copy of TFAutoModelForSequenceClassification.  Note the 'TF' at the beginning of the class name, which designates it as a TensorFlow port.  The model for "sequence classification" is specifically structured to perform classification based on sequences of words like a sentence.  HuggingFace provides a set of models specifically configured for [particular NLP tasks](https://huggingface.co/docs/transformers/main/en/model_doc/auto) as shown by all of the AutoModelFor *FillInTheTask*.

In [None]:
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)


All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[Return to Top](#top)
 <a id = 'encodeData'></a>
### Encode Data


Let's create the encoded data for training.  Because we're using the TensorFlow port, we'll need to convert the contents of our HuggingFace dataset objects into TensorFlow tensors.  HuggingFace provides some nice conversion functions to assist in the process.

In [None]:
tf_train_dataset = small_train_dataset.remove_columns(["text"]).with_format("tensorflow")
tf_eval_dataset = small_eval_dataset.remove_columns(["text"]).with_format("tensorflow")

In [None]:
train_features = {x: tf_train_dataset[x] for x in tokenizer.model_input_names}
train_tf_dataset = tf.data.Dataset.from_tensor_slices((train_features, tf_train_dataset["label"]))
train_tf_dataset = train_tf_dataset.shuffle(len(tf_train_dataset)).batch(batch_size)

eval_features = {x: tf_eval_dataset[x] for x in tokenizer.model_input_names}
eval_tf_dataset = tf.data.Dataset.from_tensor_slices((eval_features, tf_eval_dataset["label"]))
eval_tf_dataset = eval_tf_dataset.batch(batch_size)
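
The `shuffle(...).batch(batch_size)` calls above determine how many weight updates happen per epoch. As a rough illustration, here is the same shuffle-then-batch idea in plain Python with the notebook's 1000 training examples and batch size of 8. The `batches` generator and the `toy_` names are hypothetical stand-ins and do not use `tf.data`.

```python
import math
import random

def batches(examples, batch_size, shuffle=False, seed=None):
    """Yield lists of up to batch_size examples, optionally in shuffled order."""
    order = list(range(len(examples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # full shuffle over all examples
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]

toy_examples = list(range(1000))   # stand-ins for the 1000 tokenized reviews
toy_batch_size = 8

steps_per_epoch = math.ceil(len(toy_examples) / toy_batch_size)
train_batches = list(batches(toy_examples, toy_batch_size, shuffle=True, seed=0))

print(steps_per_epoch)        # batches (weight updates) in one epoch
print(len(train_batches[0]))  # examples in the first batch
```

Because the shuffle buffer in the notebook is as large as the dataset, the order is fully randomized, as in this sketch. The evaluation set is batched without shuffling since order does not affect the metrics.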

[Return to Top](#top)
 <a id = 'fineTuning'></a>
# Fine Tuning

In keeping with the Keras process we call model.compile first to make sure that all the pieces are in place.  We can follow that up with a call to model.summary() to make sure we've put the correct players together in the correct manner.  We can also see how much is trainable, which gives a sense of training time and resource requirements.

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=tf.metrics.SparseCategoricalAccuracy(),
)
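
The loss chosen in `model.compile` above takes the model's raw output scores (logits) and an integer label. As a sketch of the calculation for a single example, the plain-Python function below applies a softmax to the logits and returns the negative log-probability of the true class. It is an illustration of the formula under those assumptions, not the Keras implementation, and the function names are made up for this example.

```python
import math

def sparse_categorical_crossentropy_from_logits(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

def predicted_class(logits):
    """Index of the largest logit, which is what the accuracy metric compares to the label."""
    return max(range(len(logits)), key=lambda i: logits[i])

uncertain = sparse_categorical_crossentropy_from_logits([0.0, 0.0], label=1)
confident_right = sparse_categorical_crossentropy_from_logits([-2.0, 3.0], label=1)
confident_wrong = sparse_categorical_crossentropy_from_logits([-2.0, 3.0], label=0)

print(uncertain, confident_right, confident_wrong)
print(predicted_class([-2.0, 3.0]))
```

Equal logits give a loss of ln 2 for two classes, a confident correct prediction gives a loss near zero, and a confident wrong one is penalized heavily. Keras averages these per-example values over each batch.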

In [None]:
model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bert (TFBertMainLayer)      multiple                  108310272 
                                                                 
 dropout_37 (Dropout)        multiple                  0 (unused)
                                                                 
 classifier (Dense)          multiple                  1538      
                                                                 
Total params: 108311810 (413.18 MB)
Trainable params: 108311810 (413.18 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
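
The numbers in the summary above can be checked by hand. The short calculation below assumes the BERT base encoder outputs 768 values per sequence and that each parameter is stored as a 4-byte float; both are assumptions about this checkpoint rather than values printed by the notebook.

```python
hidden_size = 768          # assumed output width of the BERT base encoder
num_labels = 2             # from num_labels=2 when the model was created

# Dense classifier: one weight per (hidden unit, label) pair plus one bias per label.
classifier_params = hidden_size * num_labels + num_labels

encoder_params = 108310272  # the 'bert' row of the summary
total_params = encoder_params + classifier_params

bytes_per_param = 4         # assuming float32 storage
size_mb = total_params * bytes_per_param / 1024**2

print(classifier_params)    # parameters in the new classification layer
print(total_params)         # parameters in the whole model
print(round(size_mb, 2))    # approximate size in MB
```

Only the small classifier layer starts from random values; the rest comes from pre-training, yet all of it is trainable during fine-tuning.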


Then we call model.fit to perform the actual training.  Note that, because of what the model has learned about language in its pre-training phase, we can often limit our training to a small number of epochs, sometimes as few as one or two.

In [None]:
model.fit(train_tf_dataset, validation_data=eval_tf_dataset, epochs=2)

Epoch 1/2
Epoch 2/2


<tf_keras.src.callbacks.History at 0x7b71fe0c5060>

We'll be using TensorFlow and BERT in some class assignments, but instead of using AutoClasses we'll add our own layers on top of BERT so we can build intuition about how it works.