# Hugging Face

In [None]:
# using sentiment analysis pipeline from Hugging Face Transformers
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    device=0,  # First CUDA GPU; use device=-1 to run on CPU
)
classifier(["I love using Hugging Face Transformers!", "I hate waiting in long lines."])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9971315860748291},
 {'label': 'NEGATIVE', 'score': 0.9968921542167664}]
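
The log above warns against relying on the default model, and a hard-coded `device=0` is likely to fail on a machine without a GPU. The following is a minimal sketch of one way to handle both, assuming PyTorch is the installed backend. The helper names `pick_device` and `build_sentiment_classifier` are hypothetical; the model name and revision are copied from the log output.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) when one is visible, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


def build_sentiment_classifier():
    """Build the pipeline with model and revision pinned (values taken from the log)."""
    from transformers import pipeline  # imported lazily so pick_device works on its own

    return pipeline(
        "sentiment-analysis",
        model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        revision="714eb0f",
        device=pick_device(),
    )


device = pick_device()
print(f"Selected device index: {device}")
```

Calling `build_sentiment_classifier()` downloads the model on first use, so it is left uncalled here.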

In [5]:
# using zero-shot classification pipeline from Hugging Face Transformers
from transformers import pipeline
classifier = pipeline(
    "zero-shot-classification",
    device=0,  # First CUDA GPU; use device=-1 to run on CPU
)
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cuda:0


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445956110954285, 0.1119767278432846, 0.04342765733599663]}
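
The zero-shot output stores labels and scores as two parallel lists. A small sketch of how they might be paired up for downstream use; the helper name `rank_labels` is hypothetical, and the `result` dict is copied from the output above rather than recomputed.

```python
# Output copied from the cell above, so no model download is needed here.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445956110954285, 0.1119767278432846, 0.04342765733599663],
}


def rank_labels(output: dict) -> list:
    """Pair each label with its score and sort from most to least likely."""
    pairs = zip(output["labels"], output["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_labels(result)
top_label, top_score = ranked[0]

for label, score in ranked:
    print(f"{label:<10} {score:.3f}")
```

In this recorded run the three scores add up to roughly 1, so they can be read as a distribution over the candidate labels.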

In [7]:
# text generation with Hugging Face Transformers
from transformers import pipeline

generator = pipeline(
    "text-generation",
    device=0,  # First CUDA GPU; use device=-1 to run on CPU
)
generator(
    "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains.",
    max_length=50,  # Overridden in this run by the default max_new_tokens (256); see the warning in the output
)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains.\n\nThey were said to have been made up of a single unicorn, which was said to have been able to communicate from its lower jaw to its upper and throat.\n\nIt was believed that a single specimen was needed to give researchers a better understanding of the unicorns.\n\nThe discovery at the moment has not yet been confirmed by the scientific community.\n\nThe team of researchers from the Universidad Nacional de los Andes (UNAM), in Chile, are now working on a project to study the unicorns in a larger area in Bolivia.\n\nThe researchers are also hoping to find out more about these creatures.\n\nThe team is currently working on a project to study the unicorns in Bolivia. Photo: AFP\n\n'We're looking at what the size of these animal was and what the species of unicorn it represents.' said Juan Miguel Vazquez-Puiguas, the project lead fr
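
The warning in the log says `max_new_tokens` takes precedence over `max_length`, which is why the output runs far past 50 tokens. A hedged sketch of one way to control the length and drop the echoed prompt: `generate_continuation` and `strip_prompt` are hypothetical helper names, the model name is taken from the log, and the sample text is a short excerpt of the recorded output.

```python
def generate_continuation(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate at most max_new_tokens new tokens and return only the continuation."""
    from transformers import pipeline  # imported lazily; calling this downloads GPT-2

    generator = pipeline("text-generation", model="openai-community/gpt2")
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # bounds the generated part, not prompt + output
        return_full_text=False,         # ask the pipeline not to echo the prompt
    )
    return outputs[0]["generated_text"]


def strip_prompt(outputs: list, prompt: str) -> list:
    """Remove the echoed prompt from outputs that were already generated."""
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text)
    return continuations


prompt = "In a shocking finding, scientist discovered a herd of unicorns."
# Shortened stand-in for a recorded pipeline output (prompt followed by generated text).
recorded = [{"generated_text": prompt + "\n\nThey were said to have been made up of a single unicorn"}]
continuations = strip_prompt(recorded, prompt)
print(continuations[0].strip())
```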

In [10]:
# using the fill-mask pipeline
from transformers import pipeline

unmasker = pipeline(
    "fill-mask",
    device=0,  # First CUDA GPU; use device=-1 to run on CPU
)
unmasker("France is <mask> country.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'score': 0.2671811878681183,
  'token': 277,
  'token_str': ' another',
  'sequence': 'France is another country.'},
 {'score': 0.19830884039402008,
  'token': 5063,
  'token_str': ' neither',
  'sequence': 'France is neither country.'}]
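
Each fill-mask prediction carries a score, a token id, the token string (with its leading space), and the completed sentence. A small sketch of how the best candidate might be extracted; `best_fill` is a hypothetical helper, and `predictions` is copied from the output above.

```python
# Predictions copied from the cell above, so no model is loaded here.
predictions = [
    {"score": 0.2671811878681183, "token": 277, "token_str": " another",
     "sequence": "France is another country."},
    {"score": 0.19830884039402008, "token": 5063, "token_str": " neither",
     "sequence": "France is neither country."},
]


def best_fill(candidates: list) -> dict:
    """Return the highest-scoring candidate with the token's leading space removed."""
    best = max(candidates, key=lambda item: item["score"])
    return {"word": best["token_str"].strip(), "score": best["score"], "sequence": best["sequence"]}


best = best_fill(predictions)
print(f"{best['word']} ({best['score']:.2f}): {best['sequence']}")
```

Both scores are well below 0.5 in this run, so the top candidate is better read as a guess than as a confident answer.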

In [12]:
# using question answering
from transformers import pipeline

question_answerer = pipeline(
    "question-answering",
    device=0,  # First CUDA GPU; use device=-1 to run on CPU
)
question_answerer(
    question="What is the capital of France?",
    context="France is good"
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'score': 0.5322698950767517, 'start': 0, 'end': 6, 'answer': 'France'}
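
The question-answering pipeline is extractive: `start` and `end` index into the context string, so the answer can only be a span of the context. Here the context never names a capital, and the model still returns "France" with a middling score. A sketch of a sanity check on the span plus a confidence cutoff; `accept_answer` is a hypothetical helper and the 0.8 threshold is an arbitrary assumption, not a library default.

```python
context = "France is good"
# Answer copied from the output above.
answer = {"score": 0.5322698950767517, "start": 0, "end": 6, "answer": "France"}


def accept_answer(result: dict, source: str, min_score: float = 0.8) -> bool:
    """Accept an answer only if its span matches the context and its score is high enough."""
    span = source[result["start"]:result["end"]]
    return span == result["answer"] and result["score"] >= min_score


span = context[answer["start"]:answer["end"]]
accepted = accept_answer(answer, context)
print(f"span={span!r} accepted={accepted}")
```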

In [13]:
# using summarization
from transformers import pipeline
summarizer = pipeline(
    "summarization",
    device=0,  # First CUDA GPU; use device=-1 to run on CPU
)
summarizer(
    "The Transformers library is an open-source library for natural language processing (NLP) tasks. It provides pre-trained models and tools for tasks such as text classification, question answering, and text generation. The library is widely used in the NLP community and has become a standard for many applications.",
    max_length=50,
    min_length=25,
    do_sample=False
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cuda:0


[{'summary_text': ' The Transformers library is an open-source library for natural language processing (NLP) tasks . It provides pre-trained models and tools for tasks such as text classification, question answering, and text generation .'}]
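
The recorded summary has a leading space and a space before each period, and it reproduces the first two sentences of the input nearly verbatim. A minimal clean-up sketch; `tidy_summary` is a hypothetical helper, and `summary_output` is copied from the output above.

```python
import re

# Output copied from the cell above.
summary_output = [{
    "summary_text": " The Transformers library is an open-source library for natural "
                    "language processing (NLP) tasks . It provides pre-trained models and "
                    "tools for tasks such as text classification, question answering, and "
                    "text generation ."
}]


def tidy_summary(text: str) -> str:
    """Trim surrounding whitespace and remove spaces that precede punctuation."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text.strip())


cleaned = tidy_summary(summary_output[0]["summary_text"])
print(cleaned)
```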

In [None]:
import sounddevice as sd
from scipy.io.wavfile import write
from transformers import pipeline

# Parameters
fs = 16000  # Sampling rate expected by Whisper
seconds = 5  # Duration to record

print("Recording...")
audio = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
sd.wait()  # Block until the recording has finished
print("Recording complete!")

# Save to a temporary WAV file
wav_path = "mic_input.wav"
write(wav_path, fs, audio)

# Load Whisper ASR pipeline
transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-base.en"
)

# Transcribe audio file
result = transcriber(wav_path)

print("Transcription:")
print(result['text'])

Recording...
Recording complete!


Device set to use cuda:0


Transcription:
 So this is testing how in phrase text to speech feature.
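
The cell above hands the recorded array straight to `scipy.io.wavfile.write`. Assuming the recorder returns floating-point samples, the file stores float data, which not every audio tool reads. One possible alternative is to write 16-bit PCM with the standard library only; `write_pcm16_wav` is a hypothetical helper, and a synthetic 440 Hz tone stands in for microphone input.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path: str, samples, sample_rate: int = 16000) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to the signed 16-bit integer range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(ints)}h", *ints))  # little-endian


fs = 16000  # same sampling rate as the recording cell
tone = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # one second
demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_pcm16_wav(demo_path, tone, fs)

# Read the header back to confirm what was written.
with wave.open(demo_path, "rb") as wav_file:
    frame_rate = wav_file.getframerate()
    n_frames = wav_file.getnframes()
    sample_width = wav_file.getsampwidth()
print(frame_rate, n_frames, sample_width)
```

With a real recording, the samples would come from the flattened microphone array instead of `tone`.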
