In [16]:
# Environment check for the transformers examples that follow:
# print the PyTorch version and report whether a CUDA GPU is available.
import torch
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU available")

2.5.1+cu121
True
NVIDIA GeForce RTX 4060 Laptop GPU


In [17]:
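# Text generation function
from transformers import pipeline

# Load the text-generation pipeline. No model is named here, so the
# pipeline falls back to its default checkpoint (see the log below).
model = pipeline("text-generation")

def generate_text(prompt, max_length=50):
    # max_length counts prompt tokens plus generated tokens. It is ignored
    # when the generation config also sets max_new_tokens (see the warning
    # printed by the next cell).
    result = model(prompt, max_length=max_length)
    return result[0]['generated_text']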

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


In [18]:
text = "Supernova is the one of the"
result = generate_text(text, max_length=25)
print(result)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=25) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Supernova is the one of the brightest galaxies in the Universe.

This galaxy is the first in the constellation of galaxies known to have a unique optical composition with an extraordinarily long period of intense light.

The Hubble Space Telescope has been studying the galaxy for nearly 40 years, and it has observed its light and made pictures of the galaxy.

Scientists have identified the first visible light from the galaxy in the past 100 years.

The Hubble telescope is a joint project of the European Space Agency and the National Science Foundation. It uses the Hubble Space Telescope to study the composition of the galaxy.

Explore further: Hubble's most powerful telescope ever discovered
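
The warnings in the cell above report that `max_length=25` was overridden by a default `max_new_tokens` of 256, which is why the generated text runs well past 25 tokens. A minimal sketch of one way to control the length is to pass `max_new_tokens` explicitly. The helper name `generate_text_capped` and its `generator` parameter are hypothetical, and the real pipeline call is left as a comment so the block runs without downloading a model.

```python
def generate_text_capped(generator, prompt, max_new_tokens=25):
    """Return a continuation limited to max_new_tokens newly generated tokens.

    `generator` is assumed to be a callable that behaves like a transformers
    text-generation pipeline (hypothetical parameter name).
    """
    # max_new_tokens counts only the generated tokens, not the prompt.
    result = generator(prompt, max_new_tokens=max_new_tokens)
    return result[0]["generated_text"]

# Usage with the pipeline loaded earlier in this notebook (assumed to be `model`):
# print(generate_text_capped(model, "Supernova is the one of the", max_new_tokens=25))
```

Taking the pipeline as an argument keeps the helper independent of which checkpoint is loaded.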


In [20]:
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else -1

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=device
)

def summarize_text(text, max_length=130, min_length=40):
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
        truncation=True
    )
    return summary[0]['summary_text']

result_summary = summarize_text(result)
print(result_summary)


Device set to use cuda:0
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


The Hubble Space Telescope has been studying the galaxy for nearly 40 years. It has observed its light and made pictures of the galaxy. Supernova is the one of the brightest galaxies in the Universe.


In [21]:
# Named entity recognition
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True.
ner_pipeline = pipeline("ner", aggregation_strategy="simple", device=device)

# With aggregation, adjacent tokens that share a label are merged into one entity;
# without it, every token is reported separately.
# Advantages of aggregation:
# 1. Readability: a multi-token name such as "European Space Agency" is returned
#    as a single entry.
# 2. Less redundancy: one entry per entity instead of one entry per token.
# 3. Usable spans: each entry carries start/end character offsets for the whole entity.
# Caveat: "simple" can still split one word into subword pieces ("Hu" / "##bble")
# when the pieces receive different labels, as the output below shows.

def recognize_entities(text):
    entities = ner_pipeline(text)
    return entities

entities_result = recognize_entities(result)
print(entities_result)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'entity_group': 'MISC', 'score': np.float32(0.38851184), 'word': 'Super', 'start': 0, 'end': 5}, {'entity_group': 'MISC', 'score': np.float32(0.8392791), 'word': 'Universe', 'start': 54, 'end': 62}, {'entity_group': 'ORG', 'score': np.float32(0.79329664), 'word': 'Hubble Space Telescope', 'start': 225, 'end': 247}, {'entity_group': 'MISC', 'score': np.float32(0.71022725), 'word': 'Hu', 'start': 457, 'end': 459}, {'entity_group': 'ORG', 'score': np.float32(0.8357145), 'word': '##bble', 'start': 459, 'end': 463}, {'entity_group': 'ORG', 'score': np.float32(0.99656385), 'word': 'European Space Agency', 'start': 500, 'end': 521}, {'entity_group': 'ORG', 'score': np.float32(0.99839514), 'word': 'National Science Foundation', 'start': 530, 'end': 557}, {'entity_group': 'MISC', 'score': np.float32(0.70515746), 'word': 'Hu', 'start': 571, 'end': 573}, {'entity_group': 'ORG', 'score': np.float32(0.5874928), 'word': '##bble Space Tel', 'start': 573, 'end': 587}, {'entity_group': 'MISC', 'score
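
The output above still contains subword fragments such as `'Hu'` followed by `'##bble'`, because the two pieces were given different labels. A minimal post-processing sketch, assuming the dictionary layout shown above, merges each `##` fragment into the entity that ends exactly where the fragment starts. The helper name `merge_subword_entities` is hypothetical, and keeping the label and score of the more confident fragment is an assumption made for illustration, not the library's behavior.

```python
def merge_subword_entities(entities):
    """Merge '##'-prefixed fragments into the directly preceding entity."""
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        ent = dict(ent)                      # copy so the input is not modified
        ent["score"] = float(ent["score"])   # plain float instead of np.float32
        prev = merged[-1] if merged else None
        if (prev is not None
                and ent["word"].startswith("##")
                and ent["start"] == prev["end"]):
            prev["word"] += ent["word"][2:]  # drop the '##' marker
            prev["end"] = ent["end"]
            # Assumption: keep the label and score of the more confident fragment.
            if ent["score"] > prev["score"]:
                prev["entity_group"] = ent["entity_group"]
                prev["score"] = ent["score"]
        else:
            merged.append(ent)
    return merged

# Three entries copied from the output above (scores written as plain floats).
sample = [
    {"entity_group": "MISC", "score": 0.71022725, "word": "Hu", "start": 457, "end": 459},
    {"entity_group": "ORG", "score": 0.8357145, "word": "##bble", "start": 459, "end": 463},
    {"entity_group": "ORG", "score": 0.99656385, "word": "European Space Agency", "start": 500, "end": 521},
]
merged_sample = merge_subword_entities(sample)
for ent in merged_sample:
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
# prints:
# ORG Hubble 457 463
# ORG European Space Agency 500 521
```

In the notebook this would be called as `merge_subword_entities(entities_result)`. A word-level aggregation strategy in the pipeline itself may make this step unnecessary.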

