## 📚 1. Installing Essential Libraries

To get started with our natural language processing (NLP) task, we first need to install the necessary Python libraries. This cell ensures we have the latest versions of the required packages.

- **`transformers`**: The core library from Hugging Face. It provides the `pipeline` API, which is a powerful tool for easily using state-of-the-art pre-trained models.
- **`sentencepiece`**: A tokenization library. Models such as T5, ALBERT, and XLNet rely on it to break text into smaller units (tokens) that the model can process.
- **`sacremoses`**: A text-processing dependency (tokenization and normalization) that some tokenizers in `transformers` require.

In [None]:
!pip install -U transformers
!pip install -U sentencepiece
!pip install -U sacremoses
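
As a quick sanity check after installation, the following sketch reports which versions are actually present in the environment. It uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for this example and is not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is missing from the current environment.
            found[name] = None
    return found


# Report the three packages installed in the cell above.
for name, ver in installed_versions(
    ["transformers", "sentencepiece", "sacremoses"]
).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If a package shows up as not installed, restarting the kernel after the `pip` cell is usually worth trying before reinstalling.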

## 📂 2. Setting a Custom Cache Directory (Optional)

Hugging Face models can be very large. When you use a model for the first time, the library downloads it to a cache folder. This code block allows you to specify a custom location for this cache.

By setting the `HF_HOME` environment variable, we tell Hugging Face to store downloaded models and datasets under the specified path (`X:\...\models`). Set it before importing `transformers`, because the library reads the variable when it is first imported. A custom cache location helps manage disk space and keeps projects organized, especially when you work with several large models.

In [None]:
import os

# Raw string so the backslashes in the Windows path are not treated as escape sequences.
new_cache_dir = r"X:\AI-learin\courss\Fine-Tuning-LLM-with-HuggingFace-main\models"
os.environ['HF_HOME'] = new_cache_dir
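
A slightly more defensive variant is sketched below: it creates the folder if it does not exist yet and then points `HF_HOME` at it. The helper name `set_hf_cache` and the demo folder `hf_cache_demo` are assumptions made for illustration; substitute your own path. As above, this should run before `transformers` is imported.

```python
import os
import tempfile
from pathlib import Path


def set_hf_cache(cache_dir):
    """Create cache_dir if needed and point HF_HOME at it."""
    path = Path(cache_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(path)
    return path


# Hypothetical location, used here only so the example runs anywhere.
demo_dir = Path(tempfile.gettempdir()) / "hf_cache_demo"
cache_path = set_hf_cache(demo_dir)
print(os.environ["HF_HOME"])
```

Using `pathlib.Path` avoids hand-writing path separators, which sidesteps the backslash-escaping problem on Windows.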

## 📦 3. Importing Necessary Modules

With the libraries installed, we now import what the notebook needs.

- **`pipeline` from `transformers`**: This is the high-level API we'll use to perform our NLP task. It abstracts away the complex steps of tokenization, model inference, and processing the results.
- **`pandas` as `pd`**: A fundamental library for data science in Python. We use it to display the model's output in a structured and easy-to-read table (a DataFrame).

In [None]:
from transformers import pipeline
import pandas as pd

## 🔑 4. Performing Keyphrase Extraction with a NER Pipeline

This is the core of our notebook. We use a **Named Entity Recognition (NER)** pipeline to perform **Keyphrase Extraction**: automatically identifying the most important and representative phrases in a text, which is useful for summarizing content. The `"ner"` pipeline fits because the model labels each token as part of a keyphrase or not, the same token-classification setup that NER uses.

1.  **Creating the Pipeline**: We initialize the `pipeline` for the `"ner"` task.
    - **`model`**: We specify `"ml6team/keyphrase-extraction-kbir-inspec"`, a model from the Hugging Face Hub that has been specifically fine-tuned to extract keyphrases.
    - **`aggregation_strategy="simple"`**: This is a key parameter. It tells the pipeline to group words that belong to the same entity. For example, instead of identifying "Keyphrase" and "extraction" as separate tokens, it combines them into a single, meaningful entity: `"Keyphrase extraction"`.

2.  **Inference**: We define a sample text and pass it to our `ner_tagger` pipeline.

3.  **Displaying Results**: The pipeline returns a list of dictionaries, one per extracted keyphrase. Each contains the phrase text (`word`), its label (`entity_group`), a confidence score (`score`), and its character position in the text (`start`, `end`). We use `pd.DataFrame()` to present this information in a clean, tabular format, making it easy to see the results at a glance.
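
To make the grouping idea behind `aggregation_strategy="simple"` concrete, here is a simplified sketch in plain Python. It is not the library's implementation: it assumes hand-written `B-KEY` / `I-KEY` / `O` tags (begin, inside, outside) and ignores scores and subword handling. The function name `group_tagged_tokens` is hypothetical.

```python
def group_tagged_tokens(tagged_tokens):
    """Merge consecutive B-/I- tagged tokens into phrases.

    tagged_tokens: list of (token, tag) pairs, where tag is "B-KEY"
    (begins a keyphrase), "I-KEY" (continues one), or "O" (outside).
    """
    phrases = []
    current = []  # tokens of the phrase currently being built
    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            # A new phrase begins; close the previous one first.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-"):
            # Continue the open phrase (or start one if none is open).
            current.append(token)
        else:
            # "O" tag: close any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hand-written tags for illustration only, not real model output.
example = [
    ("Keyphrase", "B-KEY"), ("extraction", "I-KEY"), ("is", "O"),
    ("a", "O"), ("technique", "O"), ("in", "O"),
    ("text", "B-KEY"), ("analysis", "I-KEY"),
]
print(group_tagged_tokens(example))  # ['Keyphrase extraction', 'text analysis']
```

Without this grouping step, each token would be reported as its own entry, which is much harder to read.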

In [None]:

# Model fine-tuned for keyphrase extraction (token classification).
model = "ml6team/keyphrase-extraction-kbir-inspec"
# Other checkpoints must also be token-classification models to work with "ner",
# e.g. "dslim/bert-base-NER" (tags named entities rather than keyphrases).

ner_tagger = pipeline("ner", model=model, aggregation_strategy="simple")
text = "Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a document. Thanks to these keyphrases humans can understand the content of a text very quickly and easily without reading it completely."

outputs = ner_tagger(text)
pd.DataFrame(outputs)
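
The raw list can contain repeated or low-confidence phrases. A minimal post-processing sketch follows, assuming each item has the `word` and `score` keys that the aggregated `"ner"` pipeline returns. The helper `unique_keyphrases`, the threshold of 0.5, and the sample values (including the `KEY` label and all scores) are invented for illustration and are not real model output.

```python
def unique_keyphrases(outputs, min_score=0.5):
    """Return keyphrases above min_score, de-duplicated case-insensitively."""
    seen = set()
    phrases = []
    for item in outputs:
        phrase = item["word"].strip()  # drop stray leading/trailing spaces
        key = phrase.lower()
        if item["score"] >= min_score and key not in seen:
            seen.add(key)
            phrases.append(phrase)
    return phrases


# Made-up sample in the shape of the pipeline output (positions omitted).
sample_outputs = [
    {"entity_group": "KEY", "score": 0.99, "word": "Keyphrase extraction"},
    {"entity_group": "KEY", "score": 0.98, "word": " text analysis"},
    {"entity_group": "KEY", "score": 0.30, "word": "document"},
    {"entity_group": "KEY", "score": 0.95, "word": "keyphrase extraction"},
]
print(unique_keyphrases(sample_outputs))
```

In the notebook, the same function could be called as `unique_keyphrases(outputs)` on the real pipeline result.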