In [1]:
%pip install --quiet transformers

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install --quiet gradio==4.27.0

Note: you may need to restart the kernel to use updated packages.


In [3]:
# The main package for working with Hugging Face models
import transformers

# Show only error-level log messages, to suppress transformers warnings
transformers.logging.set_verbosity_error()

## Reviewing the Pipeline

Use the pipeline registry to list the available pipeline tasks and to inspect the default models for a specific task.

In [4]:
from transformers.pipelines import PIPELINE_REGISTRY

# Print the tasks supported by Hugging Face pipelines
print(PIPELINE_REGISTRY.get_supported_tasks())

['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-feature-extraction', 'image-segmentation', 'image-to-image', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection']


In [5]:
# Look up the default model(s) registered for a specific task
print("\nDefault Model for Translation: ")
print(PIPELINE_REGISTRY.check_task('translation')[1].get('default'))


Default Model for Translation: 
{('en', 'fr'): {'model': {'pt': ('google-t5/t5-base', '686f1db'), 'tf': ('google-t5/t5-base', '686f1db')}}, ('en', 'de'): {'model': {'pt': ('google-t5/t5-base', '686f1db'), 'tf': ('google-t5/t5-base', '686f1db')}}, ('en', 'ro'): {'model': {'pt': ('google-t5/t5-base', '686f1db'), 'tf': ('google-t5/t5-base', '686f1db')}}}
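The `default` entry printed above is a nested dict: it is keyed by `(source, target)` language pairs, then `'model'`, then the framework (`'pt'` for PyTorch, `'tf'` for TensorFlow), and ends in a `(checkpoint, revision)` tuple. A minimal sketch of looking up the default checkpoint for a pair, using the printed dict as sample data; `default_checkpoint` is a hypothetical helper, not part of `transformers`, and the defaults themselves may differ between library versions.

```python
# Sample data copied from the output above (defaults may change between transformers versions)
TRANSLATION_DEFAULTS = {
    ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
    ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
    ("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
}


def default_checkpoint(defaults, src, tgt, framework="pt"):
    """Hypothetical helper: return (checkpoint, revision) for a language pair, or None."""
    entry = defaults.get((src, tgt))
    if entry is None:
        return None  # no default registered for this language pair
    return entry["model"].get(framework)


print(default_checkpoint(TRANSLATION_DEFAULTS, "en", "de"))
print(default_checkpoint(TRANSLATION_DEFAULTS, "en", "sv"))
```

There is no English-to-Swedish entry among these defaults, which is one reason the Swedish example further down loads a custom multilingual model instead.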


## Loading a Pipeline

In [6]:
from transformers import pipeline
import os

# Load a pipeline. The first call downloads the model checkpoint from the Hugging Face Hub and
# caches it locally on disk; if the model is already in the cache, the cached copy is used.
# The first download can take a while, depending on network bandwidth.

text_translation_classifier = pipeline(task="translation",
                                       model="google-t5/t5-base")

# By default the cache is at <user-home>/.cache/huggingface/hub

cache_dir = os.path.expanduser('~') + "/.cache/huggingface/hub"
print("Huggingface Cache directory is : ", cache_dir)

# List the contents of the cache directory
os.listdir(cache_dir)




For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on google-t5/t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


Huggingface Cache directory is :  C:\Users\batoo/.cache/huggingface/hub




['.locks',
 'models--bert-base-uncased',
 'models--facebook--bart-large-cnn',
 'models--facebook--mbart-large-50-many-to-many-mmt',
 'models--google-t5--t5-base',
 'models--microsoft--DialoGPT-medium',
 'models--microsoft--speecht5_tts',
 'models--sshleifer--distilbart-cnn-12-6',
 'version.txt']
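The hard-coded path above only covers the default location. The cache can be relocated with environment variables; the lookup order below (`HF_HUB_CACHE`, then `HF_HOME/hub`, then `XDG_CACHE_HOME/huggingface/hub`, then `~/.cache/huggingface/hub`) mirrors what recent `huggingface_hub` versions document, but treat it as an assumption and check the docs for your version. The sketch also turns cache folder names such as `models--google-t5--t5-base` back into repo IDs, since the cache replaces `/` with `--`. Both function names are hypothetical.

```python
import os


def resolve_hub_cache(env=None):
    """Sketch of the hub cache lookup order (assumed; verify against your huggingface_hub version)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return os.path.expanduser(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return os.path.join(os.path.expanduser(env["HF_HOME"]), "hub")
    base = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(base, "huggingface", "hub")


def repo_ids_from_cache(entries):
    """Convert model folder names like 'models--google-t5--t5-base' into repo IDs."""
    repo_ids = []
    for name in entries:
        if name.startswith("models--"):
            # Skip '.locks', 'version.txt', etc.; '--' in the folder name stands for '/'
            repo_ids.append(name[len("models--"):].replace("--", "/"))
    return sorted(repo_ids)


print(repo_ids_from_cache(["models--google-t5--t5-base", "models--bert-base-uncased", ".locks"]))
```

In the notebook, `repo_ids_from_cache(os.listdir(resolve_hub_cache()))` would list the cached models by repo ID.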

## Predicting a Translation with the Default Model

In [7]:
# Translate with the pipeline; T5 expects a task prefix at the start of the input
translation_results=text_translation_classifier("translate English to French: This is a great course")
print(translation_results)

[{'translation_text': "C'est un excellent cours"}]
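T5 is a text-to-text model, so with the generic `translation` task the input above carries the task prefix `translate English to French: ` explicitly. A minimal sketch that builds such prompts for the pairs listed in the defaults (en to fr, de, ro) and splits long text into sentence groups, since the tokenizer warning earlier concerns inputs longer than 512 tokens. Word counts are only a rough stand-in for tokens (assumption); count with the tokenizer for exact budgets. `t5_prompts` and `chunk_sentences` are hypothetical helpers.

```python
import re

# Prefix wording follows the French example above; German and Romanian use the same pattern
LANGUAGE_NAMES = {"en": "English", "fr": "French", "de": "German", "ro": "Romanian"}


def t5_prefix(src, tgt):
    return f"translate {LANGUAGE_NAMES[src]} to {LANGUAGE_NAMES[tgt]}: "


def chunk_sentences(text, max_words=200):
    """Greedily group sentences so each chunk has at most max_words words (a rough token proxy).
    A single sentence longer than max_words still becomes its own chunk."""
    text = text.strip()
    if not text:
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def t5_prompts(text, src="en", tgt="fr", max_words=200):
    prefix = t5_prefix(src, tgt)
    return [prefix + chunk for chunk in chunk_sentences(text, max_words)]


print(t5_prompts("This is a great course"))
```

The resulting list can be passed to the pipeline in one call, e.g. `text_translation_classifier(t5_prompts(long_text))`, which returns one result per chunk.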


## Using a Custom Model for Text Translation

In [8]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

In [9]:
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# Translate English to Swedish (mBART-50 identifies languages by codes such as en_XX and sv_SE)
tokenizer.src_lang = "en_XX"
tokenizer.tgt_lang = "sv_SE"

text_translation_classifier = pipeline(task="translation",
                                model=model,
                                tokenizer=tokenizer,
                                src_lang="en_XX", tgt_lang="sv_SE")

translation_results=text_translation_classifier("This is a great course")

print(translation_results)

# List the contents of the cache directory
os.listdir(cache_dir)

[{'translation_text': 'Det här är en fantastisk kurs'}]


['.locks',
 'models--bert-base-uncased',
 'models--facebook--bart-large-cnn',
 'models--facebook--mbart-large-50-many-to-many-mmt',
 'models--google-t5--t5-base',
 'models--microsoft--DialoGPT-medium',
 'models--microsoft--speecht5_tts',
 'models--sshleifer--distilbart-cnn-12-6',
 'version.txt']
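mBART-50 names languages with codes of the form `<language>_<region>` (for example `en_XX`, `sv_SE`), and the model card steers generation by forcing the target code as the first generated token (`forced_bos_token_id`); judging by the Swedish output above, passing `src_lang`/`tgt_lang` to the pipeline takes care of this. A small sketch that maps readable language names to codes before building the pipeline; the table is a partial, illustrative subset of the 50 codes (see the model card for the full list), and `mbart_pipeline_kwargs` is hypothetical.

```python
# Partial, illustrative subset of mBART-50 language codes (see the model card for all 50)
MBART50_CODES = {
    "English": "en_XX",
    "French": "fr_XX",
    "German": "de_DE",
    "Spanish": "es_XX",
    "Romanian": "ro_RO",
    "Swedish": "sv_SE",
}


def mbart_pipeline_kwargs(source, target):
    """Hypothetical helper: map language names to pipeline(src_lang=..., tgt_lang=...) kwargs."""
    try:
        return {"src_lang": MBART50_CODES[source], "tgt_lang": MBART50_CODES[target]}
    except KeyError as err:
        # Fail early with a readable message instead of inside the tokenizer
        raise ValueError(f"Language not in this table: {err.args[0]}") from None


print(mbart_pipeline_kwargs("English", "Swedish"))
```

For example, `pipeline(task="translation", model=model, tokenizer=tokenizer, **mbart_pipeline_kwargs("English", "German"))` would reuse the loaded mBART-50 model for a different target language.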

In [10]:
import gradio as gr

def text_translation(text='This is a great course'):
    translation_results=text_translation_classifier(text)
    return translation_results[0]["translation_text"]

# Wrap the translation function in a simple text-in, text-out web UI
demo = gr.Interface(fn=text_translation, inputs="text", outputs="text")
demo.launch()

Running on local URL:  http://127.0.0.1:7860
IMPORTANT: You are using gradio version 4.27.0, however version 4.29.0 is available, please upgrade.
--------

To create a public link, set `share=True` in `launch()`.
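`text_translation` above sends whatever the user types straight to the model, including an empty box. A minimal sketch of a thin, testable Gradio callback that injects the translator and skips blank input; `make_translate_fn` is hypothetical, and the fake translator stands in for `text_translation_classifier` so the wrapper can be checked without downloading a model.

```python
from typing import Callable, Dict, List


def make_translate_fn(translator: Callable[[str], List[Dict[str, str]]]) -> Callable[[str], str]:
    """Wrap a translation pipeline so the UI callback handles blank input gracefully."""
    def translate(text: str) -> str:
        text = (text or "").strip()
        if not text:
            return ""  # skip the model call for empty input
        results = translator(text)
        # The pipeline returns a list of dicts such as [{'translation_text': '...'}]
        return results[0]["translation_text"] if results else ""
    return translate


def fake_translator(text):
    """Stand-in for the real pipeline, for illustration only."""
    return [{"translation_text": text.upper()}]


translate = make_translate_fn(fake_translator)
print(translate("  This is a great course "))
```

With the real pipeline, the interface would be built as `gr.Interface(fn=make_translate_fn(text_translation_classifier), inputs="text", outputs="text")`.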


