# [Hugging Face Pipeline](https://huggingface.co/docs/transformers/en/index)

Running a Hugging Face pipeline locally on your PC lets you apply pre-trained models to tasks such as text classification, question answering, and text generation without detailed knowledge of natural language processing or model architectures. This guide covers installation and four example pipelines: sentiment analysis, text generation, question answering, and named entity recognition.

The first time you run a pipeline for a given model, the `transformers` library downloads the model and tokenizer files, which can take a while depending on your internet connection. Later runs load the files from the local cache, so they skip the download and start much faster.
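To see how much disk space the cache is using, a small standard-library sketch like the following can help. It assumes the default cache layout of recent `huggingface_hub` releases (`~/.cache/huggingface/hub`, overridable through the `HF_HOME` or `HF_HUB_CACHE` environment variables); older versions may use a different location. The helper names `hub_cache_dir` and `dir_size_bytes` are made up for this example.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Best-guess location of the model cache (assumed layout, see the note above)."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit)
    hf_home = os.environ.get("HF_HOME")
    base = Path(hf_home) if hf_home else Path.home() / ".cache" / "huggingface"
    return base / "hub"


def dir_size_bytes(path: Path) -> int:
    """Total size of regular files below `path`; 0 if the directory is missing."""
    if not path.is_dir():
        return 0
    # Skip symlinks so files that are linked from several places are counted once.
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


cache = hub_cache_dir()
print(f"{cache}: {dir_size_bytes(cache) / 1e6:.1f} MB")
```

If the directory does not exist yet (no model has been downloaded), the reported size is simply 0.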

### Considerations:

- **Model Download Size:** Models can be large and need significant disk space and bandwidth; the default models used in this guide range from roughly 260 MB to 1.33 GB.
- **Computational Resources:** Large models, especially text-generation models, need substantial memory and compute. Make sure your PC has enough RAM and, if available, a supported GPU to speed up inference.
- **Environment Management:** Consider using a virtual environment (e.g., via `venv` or `conda`) to manage dependencies and avoid conflicts between different projects.

The examples below show how to run Hugging Face pipelines locally on your PC, using pre-trained models for several natural language processing tasks.

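## Install the required libraries
`!pip install transformers`

The leading `!` runs the command from a notebook cell; omit it in a terminal. `transformers` also needs a deep learning backend such as PyTorch or TensorFlow; the outputs below were produced with the TensorFlow backend.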
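Before running the cells, it can be worth checking that `transformers` and at least one deep learning backend (PyTorch or TensorFlow) are importable. The sketch below uses only the standard library; `check_environment` is a hypothetical helper, and `torch` and `tensorflow` are the import names of the two backends.

```python
from importlib.util import find_spec


def check_environment(required=("transformers",), backends=("torch", "tensorflow")):
    """Return (missing required packages, backends that are installed)."""
    missing = [name for name in required if find_spec(name) is None]
    available = [name for name in backends if find_spec(name) is not None]
    return missing, available


missing, available = check_environment()
if missing:
    print("Missing packages:", ", ".join(missing))
if not available:
    print("No backend found; install PyTorch or TensorFlow.")
```

`find_spec` only looks the package up without importing it, so the check is fast even when a heavy backend is installed.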


In [2]:
from transformers import pipeline

# Load a pipeline for sentiment analysis
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTo


In [3]:
# Use the pipeline to classify the sentiment of a sentence
result = classifier("I love using natural language processing tools!")
print(result)

[{'label': 'POSITIVE', 'score': 0.9996317625045776}]
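A pipeline also accepts a list of strings and returns one result per input. The sketch below wraps that in a hypothetical helper, `label_texts`, that marks low-confidence predictions as `UNCERTAIN`. The 0.9 threshold is an arbitrary value chosen for illustration, and `classifier` can be any callable that behaves like the sentiment pipeline above.

```python
def label_texts(classifier, texts, min_score=0.9):
    """Classify several texts in one call and flag low-confidence predictions."""
    texts = list(texts)
    results = classifier(texts)  # one {'label': ..., 'score': ...} dict per text
    labelled = []
    for text, res in zip(texts, results):
        label = res["label"] if res["score"] >= min_score else "UNCERTAIN"
        labelled.append((text, label, round(res["score"], 4)))
    return labelled
```

With the `classifier` created earlier, `label_texts(classifier, ["I love this!", "It was fine, I guess."])` returns one `(text, label, score)` tuple per sentence; the exact scores depend on the model version.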


In [4]:
# Explore text generation: continue a prompt with the default model
generator = pipeline('text-generation')
result = generator("In this course, we will teach you how to")
print(result)

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.



All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use Google+ in a project to create an idea.\n\nGoogle+)\n\nLet us start with the simplest possible scenario with the Google+ project.\n\nCreate a photo and upload it to'}]
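The warning in the output recommends naming the model explicitly, and the length and number of continuations can be controlled through generation arguments. A minimal sketch, assuming the `gpt2` checkpoint that the pipeline defaulted to: `generate_variants` and `build_generator` are hypothetical helpers, while `max_new_tokens`, `num_return_sequences`, and `do_sample` are standard generation arguments that the text-generation pipeline forwards to the model.

```python
def generate_variants(generator, prompt, n=2, max_new_tokens=30):
    """Return `n` sampled continuations of `prompt` as plain strings."""
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on newly generated tokens
        num_return_sequences=n,         # how many continuations to return
        do_sample=True,                 # sampling is needed for distinct outputs
    )
    return [out["generated_text"] for out in outputs]


def build_generator(model_name="gpt2"):
    """Create the pipeline with an explicit model name (import deferred until needed)."""
    from transformers import pipeline

    return pipeline("text-generation", model=model_name)
```

For example, `generate_variants(build_generator(), "In this course, we will teach you how to")` returns two strings. Because the output is sampled, it differs between runs; calling `transformers.set_seed` beforehand is one way to make it repeatable.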


In [5]:
# Question answering: extract the answer span from the given context
question_answerer = pipeline('question-answering')
context = "The name of the course is Natural Language Processing"
result = question_answerer(question="What is the name of the course to learn Generative AI?", context=context)
print(result)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for predictions without further training.



{'score': 0.9824060797691345, 'start': 26, 'end': 53, 'answer': 'Natural Language Processing'}
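The `start` and `end` values in the result are character offsets into `context`, so the answer can be recovered by slicing. The check below reuses the result printed above. Note that, by default, an extractive model picks some span from the context even when, as here, the question mentions something (Generative AI) that the context does not, so a high score is not proof that the question was answerable.

```python
context = "The name of the course is Natural Language Processing"

# Result copied from the output above
result = {"score": 0.9824060797691345, "start": 26, "end": 53,
          "answer": "Natural Language Processing"}

# Slice the context with the returned character offsets
span = context[result["start"]:result["end"]]
print(span)                      # Natural Language Processing
print(span == result["answer"])  # True
```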


In [6]:
# Named entity recognition (NER) pipeline
ner = pipeline('ner')
result = ner("The course is taught by Dr. Muhammad Aammar Tufail")
print(result)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.



[{'entity': 'I-PER', 'score': 0.99917185, 'index': 8, 'word': 'Muhammad', 'start': 28, 'end': 36}, {'entity': 'I-PER', 'score': 0.99955934, 'index': 9, 'word': 'A', 'start': 37, 'end': 38}, {'entity': 'I-PER', 'score': 0.9983216, 'index': 10, 'word': '##am', 'start': 38, 'end': 40}, {'entity': 'I-PER', 'score': 0.9992617, 'index': 11, 'word': '##mar', 'start': 40, 'end': 43}, {'entity': 'I-PER', 'score': 0.9995291, 'index': 12, 'word': 'Tu', 'start': 44, 'end': 46}, {'entity': 'I-PER', 'score': 0.9364724, 'index': 13, 'word': '##fa', 'start': 46, 'end': 48}, {'entity': 'I-PER', 'score': 0.9878686, 'index': 14, 'word': '##il', 'start': 48, 'end': 50}]


In [7]:
# print the result in a more readable format
for entity in result:
    print(f"{entity['entity']} : {entity['word']}")

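# I-PER is the person tag in the CoNLL-2003 label set that the default NER model was fine-tuned on.
# Words starting with ## are WordPiece sub-tokens that continue the previous token.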

I-PER : Muhammad
I-PER : A
I-PER : ##am
I-PER : ##mar
I-PER : Tu
I-PER : ##fa
I-PER : ##il
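The pieces starting with `##` are WordPiece sub-tokens, so a single name is spread over several entries. The token-classification pipeline has an `aggregation_strategy` argument (for example `pipeline('ner', aggregation_strategy='simple')`) that groups them for you. The sketch below does a simplified version by hand on the output shown above, just to make the idea concrete. `merge_word_pieces` is a hypothetical helper, not part of the library, and it only handles the `##` convention of BERT-style tokenizers.

```python
def merge_word_pieces(tokens):
    """Group consecutive tokens that share a tag into whole-word entities."""
    entities = []  # each item: [tag, text, index of the last token merged]
    for tok in tokens:
        tag = tok["entity"].split("-")[-1]  # "I-PER" -> "PER"
        word = tok["word"]
        is_piece = word.startswith("##")
        if entities and entities[-1][0] == tag and tok["index"] == entities[-1][2] + 1:
            # Sub-tokens are glued on directly; new words get a separating space.
            entities[-1][1] += word[2:] if is_piece else " " + word
            entities[-1][2] = tok["index"]
        else:
            entities.append([tag, word[2:] if is_piece else word, tok["index"]])
    return [{"tag": tag, "text": text} for tag, text, _ in entities]


# Trimmed copy of the pipeline output above (only the keys the helper needs)
result = [
    {"entity": "I-PER", "index": 8, "word": "Muhammad"},
    {"entity": "I-PER", "index": 9, "word": "A"},
    {"entity": "I-PER", "index": 10, "word": "##am"},
    {"entity": "I-PER", "index": 11, "word": "##mar"},
    {"entity": "I-PER", "index": 12, "word": "Tu"},
    {"entity": "I-PER", "index": 13, "word": "##fa"},
    {"entity": "I-PER", "index": 14, "word": "##il"},
]
print(merge_word_pieces(result))
# [{'tag': 'PER', 'text': 'Muhammad Aammar Tufail'}]
```

Grouping by adjacent token index keeps two separate names in the same sentence apart as long as at least one untagged token sits between them.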
