## Introduction to the Transformers Library (Colab Recommended) 🤗


To recap, HuggingFace is an AI company that has blown up in the last few years, especially in the realm of Natural Language Processing (NLP).

In particular, the Transformers library has revolutionized the way people work with large-scale transformer models. The goal of this challenge is to introduce you to these models for the first time and show how easy they can be to work with.

### Why you should love HuggingFace:

#### Pre-trained Models 📚:

One of the best features of the Transformers library is its huge repo of pre-trained models. Whether you're looking to employ BERT, GPT-2, T5, RoBERTa, or any of the other transformer architectures, chances are you'll find a version that suits your needs in their model hub.

#### It's super easy 👍:

The library is designed to be user-friendly. Loading a model and its corresponding tokenizer can be done in just a couple of lines of code. This simplicity extends to fine-tuning as well, allowing you to adapt these powerful models to a wide range of tasks. The `pipeline` API we'll be using lets you go from model selection to getting results in just a few lines.

#### Tokenizers 🔄 and Datasets 📊 Libraries:

Alongside the Transformers library, HuggingFace also offers the Tokenizers and Datasets libraries. While the first provides efficient and easy-to-use tokenization methods, the second offers a whole bunch of datasets, meaning you have all the tools and data you need in one ecosystem.

#### Community-Driven 🌐:
The HuggingFace community is very active and any community member (you included) can upload their own models and datasets.

__If you are working in Colab__, you'll need to install the appropriate libraries in your Colab environment (you will already have them locally if you followed the setup instructions).

In [1]:
# Install the transformers library from HuggingFace
!pip install transformers torch pytesseract

Collecting pytesseract
  Downloading pytesseract-0.3.10-py3-none-any.whl (14 kB)
Installing collected packages: pytesseract
Successfully installed pytesseract-0.3.10


In [2]:
# You'll also need some extra tools that some of these models use under the hood
! pip install sentencepiece sacremoses

Collecting sacremoses
  Downloading sacremoses-0.1.1-py3-none-any.whl (897 kB)
Installing collected packages: sacremoses
Successfully installed sacremoses-0.1.1


Over the course of this notebook, you'll be using Pipelines to download and easily use some very powerful models. Bear in mind that some of these models are quite large (roughly 500 MB or more each), so make sure you have some free disk space on your machine, or run this notebook in Colab for faster download speeds!

We are going to be using pre-built models, and the best resource for implementing them is the [Pipelines documentation](https://huggingface.co/docs/transformers/main_classes/pipelines). If you ever want to delete the models locally after use, you can find them in your home directory at:

`~/.cache/huggingface/hub`
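
Before deleting anything, it can help to see how much space each downloaded model takes. Here is a minimal sketch using only the standard library. It assumes the default cache location under your home directory (`~/.cache/huggingface/hub`), and `dir_size_mb` is a helper name made up for this example.

```python
from pathlib import Path


def dir_size_mb(path: Path) -> float:
    """Total size of all regular files under `path` in megabytes (0.0 if missing)."""
    if not path.exists():
        return 0.0
    # Skip symlinks so files that are linked from several places are counted once.
    total_bytes = sum(
        f.stat().st_size
        for f in path.rglob("*")
        if f.is_file() and not f.is_symlink()
    )
    return total_bytes / 1_000_000


# Assumption: the default cache location has not been changed on your machine.
hub_cache = Path.home() / ".cache" / "huggingface" / "hub"

if hub_cache.exists():
    for model_dir in sorted(hub_cache.iterdir()):
        if model_dir.is_dir():
            print(f"{model_dir.name}: {dir_size_mb(model_dir):.1f} MB")
else:
    print(f"No cache found at {hub_cache}")
```

If the folder does not exist yet, nothing has been downloaded (or your cache lives somewhere else).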

In [3]:
from transformers import pipeline

### Basic Sentiment : 😀 /  😕 / 😠 / 😟

With that in mind, instantiate a pipeline for sentiment analysis __without__ specifying a model and try testing out that model with the sentence "Transformers are awesome!" Feel free to try some other sentences, too.

In [16]:
pipe = pipeline("text-classification")
pipe("The weather is gray and it is raining.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9626681208610535}]
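
The `score` in that output is a probability for the predicted label. As a rough sketch of the idea (not the library's actual code), a classifier's raw outputs (logits) are typically turned into probabilities with a softmax, and the highest one is reported. The logits below are made-up numbers for illustration only.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [-1.5, 1.8]  # hypothetical values, not taken from a real model run

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
result = {"label": labels[best], "score": probs[best]}
print(result)
```

A score close to 1.0 therefore means the model is very confident, which, as you are about to see, is not the same thing as being right.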

### Nuanced Sentiment 🤔

HuggingFace will default to using `distilbert-base-uncased-finetuned-sst-2-english` if we don't specify a model. This model will work fine on a lot of basic use cases, but it has been fine-tuned on a fairly limited corpus of text, the Stanford Sentiment Treebank:

`The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.`

A model trained on this corpus will likely perform poorly on sentences that include modern slang, e.g. "These beats are sick!". Try running sentences like this through your pipeline now: you should get negative labels even though they express quite positive sentiment.

In [17]:
pipe = pipeline("text-classification")
pipe("This beat is sick.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9997355341911316}]

Go to the list of HuggingFace models to see if you can find a model that specializes in Twitter sentiment (looking for `"twitter-roberta-base-sentiment-latest"` might be a good place to start); hopefully that should be a bit more up to date with all this new lingo! Now create a second pipeline, this time __specifying__ the model that we want to use (with `model=`), and see how our performance improves now that we're using a model fine-tuned on more relevant data.


In [20]:
pipe = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
pipe("It is raining.")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'label': 'neutral', 'score': 0.6948079466819763}]

You should see a much more accurate interpretation of the sentiment we're trying to express.

### Sentiment in other languages

While even our first pipeline will actually perform surprisingly well on simple sentences in other languages (e.g. "C'est bon" or "Está bueno"), it breaks down when handling more sophisticated ideas in those languages.

Here is an example review for the Jurassic World Dominion movie 😬:

"This was frankly a spectacular failure from start to finish, with  remarkably uninspired performances from some very well-paid actors who acted with all the passion of a wet biscuit"

Translated into Korean, it reads: "이것은 솔직히 처음부터 끝까지 엄청난 실패였으며 젖은 비스킷의 모든 열정으로 연기한 일부 매우 보수가 좋은 배우들의 현저하게 영감을 받지 못한 연기로 끝났습니다."

Try running the Korean text through either your default model or your Twitter model; you should see that neither picks up on how bad the review is.

In [21]:
pipe = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
pipe("이것은 솔직히 처음부터 끝까지 엄청난 실패였으며 젖은 비스킷의 모든 열정으로 연기한 일부 매우 보수가 좋은 배우들의 현저하게 영감을 받지 못한 연기로 끝났습니다")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'label': 'neutral', 'score': 0.7435014843940735}]

Now see if you can find a model that might perform better in the HuggingFace library and use it. Try using `"matthewburke/korean_sentiment"` in a `text-classification` pipeline and see if your results change.

In [22]:
pipe = pipeline("text-classification", model="matthewburke/korean_sentiment")
pipe("이것은 솔직히 처음부터 끝까지 엄청난 실패였으며 젖은 비스킷의 모든 열정으로 연기한 일부 매우 보수가 좋은 배우들의 현저하게 영감을 받지 못한 연기로 끝났습니다")

[{'label': 'LABEL_0', 'score': 0.9602949023246765}]
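
Some models return generic labels such as `LABEL_0` instead of readable names. Below is a small sketch of how you might translate them. The mapping itself is an assumption (0 = negative, 1 = positive), which fits the clearly negative review above, so check the model card before relying on it. `with_readable_labels` is a helper name invented for this example.

```python
# Assumed mapping: verify it against the model card before relying on it.
ASSUMED_LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def with_readable_labels(predictions, label_names=ASSUMED_LABEL_NAMES):
    """Return a copy of the predictions with generic labels replaced where known."""
    return [
        {**pred, "label": label_names.get(pred["label"], pred["label"])}
        for pred in predictions
    ]


# The raw output from the cell above
raw = [{"label": "LABEL_0", "score": 0.9602949023246765}]
print(with_readable_labels(raw))
```

Labels that are not in the mapping are left untouched, so the helper is safe to reuse with other models.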

In [27]:
pipe = pipeline("text-classification", model="KBLab/robust-swedish-sentiment-multiclass")
pipe("Det regnar!")

[{'label': 'NEUTRAL', 'score': 0.8396603465080261}]

### Translation ✍️

Let's stick with our language theme and see if we can find a model that can handle translating some sentences for us. The `opus-mt` project from the University of Helsinki is incredibly active on HuggingFace, creating and maintaining models designed to democratize translation for many different languages. Try using a `"Helsinki-NLP/opus-mt-<source-language>-<destination-language>"` model to see if you can translate between two languages (e.g. English to Spanish).
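
The naming pattern above can be sketched as a tiny helper that builds the model name from two language codes. `opus_mt_model_name` is a name made up for this example, and not every language pair is guaranteed to exist on the Hub, so look the result up before using it.

```python
def opus_mt_model_name(source_lang: str, target_lang: str) -> str:
    """Build a model name following the Helsinki-NLP opus-mt naming pattern."""
    return f"Helsinki-NLP/opus-mt-{source_lang.lower()}-{target_lang.lower()}"


print(opus_mt_model_name("en", "es"))  # Helsinki-NLP/opus-mt-en-es
print(opus_mt_model_name("en", "sv"))  # Helsinki-NLP/opus-mt-en-sv
```

You could then pass the result to `pipeline("translation", model=...)`.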

In [32]:
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-en-sv")
pipe("It's raining!")

[{'translation_text': 'Det regnar!'}]

In [33]:
pipe("Wow, that is a wonderful hat!")

[{'translation_text': 'Wow, det är en underbar hatt!'}]

### Summarization

Another really useful NLP task is summarizing a large amount of information into a small number of words. BART is a model that performs well on tasks like summarization; it combines ideas from two models you've already seen briefly in the lecture, a BERT-style bidirectional encoder and a GPT-style autoregressive decoder. Check out this [link](https://www.projectpro.io/article/transformers-bart-model-explained/553) for some more information on it.

Since BART models can be quite large, try to find the `distilbart-xsum-12-6` model on HuggingFace which is one of the smallest distillations available (we'll talk more about distillations later!). Integrate that model into a `"summarization"` pipeline, then take some text (e.g. perhaps by copy-pasting or scraping from [a BBC article](https://www.bbc.com/news/topics/cx2pk70323et)) and summarize it with your pipeline!

N.B. You need to be careful about context windows - here, you may run into an issue with your input being too long for the model!
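
One possible way around a too-long input is to split the text into smaller pieces, summarize each piece, and join the results. The sketch below is only a rough approach: it counts words rather than tokens (word count is just a proxy for what the model actually sees), and `split_into_chunks`, `summarize_long_text` and the 300-word default are all choices made up for this example.

```python
def split_into_chunks(text: str, max_words: int = 300) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(text: str, summarizer, max_words: int = 300) -> str:
    """Summarize each chunk separately and join the partial summaries.

    `summarizer` is any callable that behaves like a summarization pipeline,
    i.e. it returns a list containing a dict with a "summary_text" key.
    """
    partial_summaries = [
        summarizer(chunk)[0]["summary_text"].strip()
        for chunk in split_into_chunks(text, max_words)
    ]
    return " ".join(partial_summaries)


# Small demonstration of the chunking step on its own
print(split_into_chunks("one two three four five", max_words=2))
```

With a real pipeline you would call something like `summarize_long_text(article, pipe)`. Bear in mind that summarizing chunks independently can lose context that spans chunk boundaries.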

In [34]:
pipe = pipeline("summarization", model="sshleifer/distilbart-xsum-12-6")

In [39]:
# Triple quotes let the string span several lines
article = """On Wednesday, September 21, 2022, the Library of Congress announced that it will be migrating to FOLIO using EBSCO FOLIO Services to transform collections management and access via its Library Collections Access Platform. The announcement made it clear that the Library of Congress saw the decision to move to FOLIO as a way to further develop and implement a new, open-source IT platform that will revolutionize how the Librarys vast physical and digital collections are managed and made accessible for the public, Congress, Library employees and other institutions.With more than 60 institutions already using EBSCO FOLIO, and many more coming on board over the next few years (including the MOBIUS consortium), the decision by the Library of Congress cements FOLIO, the library and vendor-built open-source library services platform, as a viable option for libraries worldwide and as part of a revolutionary decision with far-reaching impact.As Librarian of Congress Carla Hayden said in the press release:"This is a milestone in our journey to implement a user-centered approach to connecting more people to the Librarys collections. We are grateful for Congress generous investment in this next-generation system that is essential to the Librarys digital-forward strategy, which harnesses technology to bridge geographical divides, expand our reach and enhance our services. 
Part of this digital-forward strategy includes a new, modern approach to accessing high quality metadata, including an expansion of the work the Library of Congress has been doing on BIBFRAME, which its announcement referred to as a new bibliographic description standard being developed by the library and partner organizations that uses a linked data model to make bibliographic information more useful both within and outside the library community.EBSCO Information Services Executive Vice President of Library Services and Research Databases, Gar Sydnor, said this partnership with the Library of Congress is a great opportunity for the FOLIO project. Being able to collaborate with the Library of Congress team and bring a modern approach to digital access is an honor and a real statement about the power of open source technology. We know that the work we do for the Library of Congress, an international leader with more holdings than any other library system, will indeed have a revolutionary impact worldwide on libraries and their patrons.By providing key development resources to the project and by offering world class support services, Sydnor said, EBSCO helps make open-source solutions viable, even to the largest institutions. An open-source community that, by design, is backed by vendors like EBSCO, has helped prove the projects founding principles of adaptability, professional hosting and professional support. Sites can feel confident moving forward with open-source software because we are there to provide implementation, hosting and support services. 
Given the increased consolidation in the ILS/LSP and overall library services marketplace, FOLIO is — and always will be — open source, so institutions have an open-source option and the opportunity to select the best service providers for the platform.The term next-generation has been used to describe new services that incorporate electronic resources more seamlessly into library automation systems including integrated library systems and newer library services platforms. EBSCOs work with large systems such as the Library of Congress and consortia such as MOBIUS, exemplify the next generations next gen — the future gen. Being part of a collaboration to expand BIBFRAME and deeper Web discovery through linked data is something EBSCO is uniquely qualified to do. As one of the leading SaaS innovators, EBSCO is able to take advantage of its development capabilities, its understanding of holdings, and search and discovery, to innovate for the future.Whether leveraging its own services such as EBSCO Discovery Service (EDS), Panorama or Full Text Finder, or its expertise developing the FOLIO platform as a tool for innovation, EBSCO provides libraries with the choice to select and implement the solutions that make sense for them. EBSCO has created not only a strong hosting service for EBSCO FOLIO libraries but an unparallel implementation team of library professionals designed to manage and support customers through the entire process.Changing ILS/LSPs is a big decision, and it is essential that libraries, of all sizes, know they can depend on a company that has its foundation in the library space, is contributing to the FOLIO project in a multitude of ways, and can support them throughout the entire migration and implementation process as a member of their team. As conceived, FOLIO stood for the Future Of Libraries Is Open. Maybe it is more accurate to say the future generation of library solutions is open."""

In [40]:
pipe(article)

[{'summary_text': ' The Library of Congress has announced that it is to move to an open-source library services platform.'}]

### Going further: Question Answering 🔍

What if we wanted to go further than just a summary? Perhaps asking questions about a specific dataset in an intuitive way? There's a model for that, too! Enter the (reasonably small) `roberta-base-squad2` - a model trained on question-answer pairs that can answer a `question` about a provided `context` (a body of text you will provide). Check the docs [here](https://huggingface.co/deepset/roberta-base-squad2?context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species.&question=How+many+species+are+in+the+Amazon%3F).

You know the drill: Create a `"question-answering"` pipeline with the `roberta-base-squad2` model, then try putting the `article` you picked before as your context and try asking a `question` about it.

In [41]:
pipe = pipeline("question-answering", model="deepset/roberta-base-squad2")

In [44]:
QA_input = {
    'question': 'What library service platform is LC moving to?',
    'context': article}
pipe(QA_input)

{'score': 0.429806649684906, 'start': 97, 'end': 102, 'answer': 'FOLIO'}
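
The `start` and `end` values in that output are character positions in the `context` string, so you can slice the context to recover the answer and see its surroundings. The sketch below uses just the opening of the article as the context and the result dictionary copied from the output above. The 30-character window is an arbitrary choice for this example.

```python
# Opening of the article used as context above
context = (
    "On Wednesday, September 21, 2022, the Library of Congress announced "
    "that it will be migrating to FOLIO using EBSCO FOLIO Services"
)

# Result copied from the question-answering output above
result = {"score": 0.429806649684906, "start": 97, "end": 102, "answer": "FOLIO"}

# Slicing the context with start/end gives back the answer text
span = context[result["start"]:result["end"]]
print(span)

# Show some surrounding text to check the answer in context
window = 30
left = max(0, result["start"] - window)
right = min(len(context), result["end"] + window)
print(context[left:right])
```

Looking at the surrounding text is a quick sanity check, especially when the `score` is as modest as it is here.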

### Speech to text 🎤

One of the best models for converting speech to text is the open-source Whisper model made by OpenAI (creator of ChatGPT etc.). Take a look at the diagram of the model architecture; it should now look quite similar to those you've already seen today:


<img src="https://wagon-public-datasets.s3.amazonaws.com/data-science-images/lectures/Transformers/whipser.png" width="450px">

Run the following commands to install some additional required packages and download this audio sample:

In [46]:
# For Linux / Colab (or Windows via WSL):
!sudo apt install ffmpeg

# For Mac users: comment out the line above and uncomment the line below
#!HOMEBREW_NO_AUTO_UPDATE=1 brew install ffmpeg

# -p avoids an error if the data folder already exists
!mkdir -p data
!curl https://wagon-public-datasets.s3.amazonaws.com/deep_learning_datasets/harvard.wav > data/harvard.wav

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3173k  100 3173k    0     0  2374k      0  0:00:01  0:00:01 --:--:-- 2375k


You can listen to the clip by importing `Audio` from `IPython.display` and loading the audio file (see the Algebra day recap for an example of how this is done!)

In [47]:
from IPython.display import Audio
Audio('data/harvard.wav', autoplay=True)


Find the smallest Whisper model version on HuggingFace (`whisper-tiny`) and use it to transcribe the audio. Try it on some other `.wav` files if you'd like!

In [48]:
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

In [49]:
pipe("data/harvard.wav")

{'text': ' The stale smell of old beer lingers. It takes heat to bring out the odor. A cold dip restores health in zest. A salt pickle tastes fine with ham. Tacos all pastora are my favorite. A zestful food is the hot cross bun.'}

In [52]:
pipe("/content/data/DilletanteYears_VocalTrack_231118-001.wav")



{'text': " One, two, breathe, tumble turn. All that's left I could learn."}

In [53]:
pipe("/content/data/10-Vox 4 - luftig-231220_1544.wav")



{'text': " I'm going to put the And satma, na plum mong, pokafe, havelka, wayori, tilselut, umidetala al-thati. I'm going to put it on the back of my mouth. I'm going to sleep. And sit man up low non-pocalfi, havelka, hajo di til slut, a nibetala ar fudid, sit man up low non-pocalfi, havelka, hajo di til slut, a nibetala ar fudid. And sit man, the plumb on the coffee, the velka, and your tea till slut, on the bed of the 30, set man up long"}

### Bonus: Let's get multimodal 😎: Visual Question Answering

We can even use question-answering style models on images if we'd like. Many of these models use chains under the hood that extract text from an image and then pass it through to a language model. In order to use the following model, you will need to `pip install Pillow pytesseract`, two libraries that will help us extract text from our images.

Once that's done, we're going to create a `"document-question-answering"` pipeline. We'll need a model for it, so search for the `layoutlm-invoices` model on HuggingFace. Then try to ask questions about this [`receipt.webp`](https://wagon-public-datasets.s3.amazonaws.com/data-science-images/lectures/Transformers/receipt.webp) (you can download the image to your data folder, or you can pass the URL directly into your model when you call it). Try asking how much the eggs cost, what the sales tax was and what the total was. Feel free to try it on some of your own images!
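
If you prefer to work from a local copy of the receipt, here is a minimal sketch that downloads it with the standard library. `local_path_for` and `download` are helper names made up for this example, and the `data` folder simply matches the one used for the audio clip earlier.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

RECEIPT_URL = (
    "https://wagon-public-datasets.s3.amazonaws.com/"
    "data-science-images/lectures/Transformers/receipt.webp"
)


def local_path_for(url: str, folder: str = "data") -> Path:
    """Work out where a downloaded file should live, based on the URL's file name."""
    return Path(folder) / Path(urlparse(url).path).name


def download(url: str, folder: str = "data") -> Path:
    """Download `url` into `folder` (skipped if the file is already there)."""
    target = local_path_for(url, folder)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(url, str(target))
    return target


print(local_path_for(RECEIPT_URL))
# download(RECEIPT_URL)  # uncomment to fetch the image
```

After running `download(RECEIPT_URL)`, the image can be opened from the `data` folder.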

For this to run, you'll need some dependencies:

In [54]:
# For Mac, uncomment the line below (and comment out the apt lines):
#!brew install tesseract

# For Linux or Colab etc., run these:
!sudo apt install tesseract-ocr
!sudo apt install libtesseract-dev

# Then restart your kernel and give it a try!

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 3 newly installed, 0 to remove and 35 not upgraded.
Need to get 4,816 kB of archives.
After this operation, 15.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-eng all 1:4.00~git30-7274cfa-1.1 [1,591 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr-osd all 1:4.00~git30-7274cfa-1.1 [2,990 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tesseract-ocr amd64 4.1.1-2.1build1 [236 kB]
Fetched 4,816 kB in 1s (5,116 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debc

In [55]:
pipe = pipeline("document-question-answering", model="impira/layoutlm-invoices")

Some weights of the model checkpoint at impira/layoutlm-invoices were not used when initializing LayoutLMForQuestionAnswering: ['token_classifier_head.bias', 'token_classifier_head.weight']
- This IS expected if you are initializing LayoutLMForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LayoutLMForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [67]:
from PIL import Image

image = Image.open("data/receipt.webp")
pipe(question='How much were the taxes?', image=image)

[{'score': 0.9999734163284302, 'answer': '$1.61', 'start': 98, 'end': 98}]

Congrats 🎉 You've just seen how simple it can be to start working with some advanced Transformer-based models and we've only just scratched the surface.

There are so many models you can explore in the HuggingFace library for all kinds of different tasks. Your imagination is the limit (well, your compute power can also be a limit sometimes 😅). To take these models even further for custom usage, we're going to tackle fine-tuning next.

⚠️⚠️⚠️ If you have been running these models locally, don't forget to clean up your `~/.cache/huggingface/hub` if you're limited on space, or you'll have a lot of unwanted models hanging around in your cache 🧹 ⚠️⚠️⚠️