# Transformer-Based Sentiment Analysis, NER, and Text Generation


In this notebook, we use pretrained transformer models from the Hugging Face `transformers` library to classify the sentiment of a movie review, extract named entities (such as people, places, and movie titles), and generate a continuation of a story from a starting prompt.


In [1]:

# Install the required libraries
!pip install transformers datasets torch --quiet



In [2]:

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification, GPT2LMHeadModel, GPT2Tokenizer
import torch


## Initialize Pipelines for Sentiment Analysis, NER, and Text Generation

In [3]:

# Sentiment Analysis Pipeline
sentiment_classifier = pipeline("sentiment-analysis")

# Named Entity Recognition Pipeline
ner_model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
ner_tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
ner_pipeline = pipeline("ner", model=ner_model, tokenizer=ner_tokenizer)

# Text Generation Pipeline (using GPT-2)
text_generator_model = GPT2LMHeadModel.from_pretrained("gpt2")
text_generator_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
text_generator = pipeline("text-generation", model=text_generator_model, tokenizer=text_generator_tokenizer)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Function to Perform All Three Tasks on Input Text

In [4]:

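def analyze_text(input_text):
    """Run sentiment analysis, NER, and text generation on input_text and return the results as a dict."""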
    # Sentiment Analysis
    sentiment_result = sentiment_classifier(input_text)

    # Named Entity Recognition (NER)
    ner_results = ner_pipeline(input_text)

    # Text Generation (max_length counts the prompt tokens plus the generated tokens)
    generated_text = text_generator(input_text, max_length=50, num_return_sequences=1)[0]['generated_text']

    # Return the results
    return {
        "Sentiment Analysis": sentiment_result,
        "Named Entities": ner_results,
        "Generated Text": generated_text
    }
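
A note on the `max_length=50` argument used in `analyze_text`: in `transformers` generation, `max_length` bounds the prompt length plus the newly generated tokens, whereas `max_new_tokens` bounds only the continuation. The sketch below is plain arithmetic that illustrates the difference. The helper name `new_token_budget` and the prompt sizes are hypothetical, and clamping the result to zero is a simplification of how the library handles over-long prompts.

```python
def new_token_budget(prompt_token_count, max_length):
    """Tokens left for generation when max_length caps prompt + output."""
    # Clamp at zero: a prompt longer than max_length leaves no room to generate.
    return max(0, max_length - prompt_token_count)


# Hypothetical prompt sizes, chosen only to show the effect of a fixed cap.
for prompt_tokens in (10, 20, 60):
    budget = new_token_budget(prompt_tokens, max_length=50)
    print(f"prompt={prompt_tokens} tokens -> up to {budget} new tokens")
```

A longer prompt therefore leaves less room for the continuation, which is one likely reason the generated text can come out short. Passing `max_new_tokens` instead keeps the continuation length independent of the prompt.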


## Example Usage

In [5]:

input_text = "Leonardo DiCaprio gave an amazing performance in Inception. The movie was mind-blowing!"

# Get results
results = analyze_text(input_text)

# Print results
print("Sentiment Analysis:", results["Sentiment Analysis"])
print("Named Entities:", results["Named Entities"])
print("Generated Text:", results["Generated Text"])


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Sentiment Analysis: [{'label': 'POSITIVE', 'score': 0.9998125433921814}]
Named Entities: [{'entity': 'I-PER', 'score': 0.9986952, 'index': 1, 'word': 'Leonardo', 'start': 0, 'end': 8}, {'entity': 'I-PER', 'score': 0.9974044, 'index': 2, 'word': 'Di', 'start': 9, 'end': 11}, {'entity': 'I-PER', 'score': 0.98325425, 'index': 3, 'word': '##C', 'start': 11, 'end': 12}, {'entity': 'I-PER', 'score': 0.72818923, 'index': 4, 'word': '##ap', 'start': 12, 'end': 14}, {'entity': 'I-PER', 'score': 0.9922174, 'index': 5, 'word': '##rio', 'start': 14, 'end': 17}, {'entity': 'I-MISC', 'score': 0.9970776, 'index': 11, 'word': 'Inc', 'start': 49, 'end': 52}, {'entity': 'I-MISC', 'score': 0.9951396, 'index': 12, 'word': '##ept', 'start': 52, 'end': 55}, {'entity': 'I-MISC', 'score': 0.9920169, 'index': 13, 'word': '##ion', 'start': 55, 'end': 58}]
Generated Text: Leonardo DiCaprio gave an amazing performance in Inception. The movie was mind-blowing!

The New York Times Book Review wrote that he received
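
The NER output above is reported per WordPiece token, so "DiCaprio" arrives as `Di`, `##C`, `##ap`, `##rio` and "Inception" as `Inc`, `##ept`, `##ion`. Below is a minimal sketch of one way to merge these tokens back into whole entities using the `start`/`end` character offsets. The helper name `merge_subword_entities` and the choice to report the lowest token score per entity are assumptions made for illustration, not part of the original notebook. The `transformers` NER pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping for you.

```python
def merge_subword_entities(token_entities, text):
    """Group token-level NER predictions into whole-entity spans.

    A token is merged into the previous span when it has the same entity
    type, is not tagged "B-" (which marks the start of a new entity), and
    is separated from the previous token only by whitespace (or nothing).
    """
    merged = []
    for tok in token_entities:
        label = tok["entity"].split("-", 1)[-1]  # "I-PER" -> "PER"
        prev = merged[-1] if merged else None
        continues_prev = (
            prev is not None
            and prev["label"] == label
            and not tok["entity"].startswith("B-")
            and text[prev["end"]:tok["start"]].strip() == ""
        )
        if continues_prev:
            prev["end"] = tok["end"]
            prev["scores"].append(float(tok["score"]))
        else:
            merged.append({"label": label, "start": tok["start"],
                           "end": tok["end"], "scores": [float(tok["score"])]})
    # Report the lowest token score as a conservative entity-level score.
    return [{"label": m["label"], "text": text[m["start"]:m["end"]],
             "score": min(m["scores"])} for m in merged]


# Token predictions copied from the output above (only the fields used here).
sample_text = "Leonardo DiCaprio gave an amazing performance in Inception. The movie was mind-blowing!"
sample_tokens = [
    {"entity": "I-PER", "score": 0.9986952, "start": 0, "end": 8},
    {"entity": "I-PER", "score": 0.9974044, "start": 9, "end": 11},
    {"entity": "I-PER", "score": 0.98325425, "start": 11, "end": 12},
    {"entity": "I-PER", "score": 0.72818923, "start": 12, "end": 14},
    {"entity": "I-PER", "score": 0.9922174, "start": 14, "end": 17},
    {"entity": "I-MISC", "score": 0.9970776, "start": 49, "end": 52},
    {"entity": "I-MISC", "score": 0.9951396, "start": 52, "end": 55},
    {"entity": "I-MISC", "score": 0.9920169, "start": 55, "end": 58},
]

entities = merge_subword_entities(sample_tokens, sample_text)
print([(e["label"], e["text"]) for e in entities])
# [('PER', 'Leonardo DiCaprio'), ('MISC', 'Inception')]
```

Slicing the original string by character offsets avoids having to strip the `##` prefixes by hand and preserves the original spacing and casing.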