In [None]:
# Step 1: Install required libraries
!pip install transformers datasets torch

# Step 2: Import necessary modules
from transformers import pipeline

# ==============================================
# Task 1: Load and Test a Pre-Trained NLP Model
# ==============================================

# Load a text generation model (GPT-2)
generator = pipeline("text-generation", model="gpt2")

# Define a prompt
prompt = "The impact of AI on future jobs is"

# Generate text
# Note: max_length counts the prompt tokens plus the newly generated tokens.
generated_text = generator(prompt, max_length=50, num_return_sequences=1)

# Display the output
print("Generated Text:")
print(generated_text[0]['generated_text'])

# Observations:
# - The output is fluent but drifts away from the prompt, and it may vary between runs.
# - With max_length=50 (prompt included), the text is cut off mid-sentence.

# Modify prompt and analyze results
prompt2 = "In the next 20 years, AI will"
generated_text2 = generator(prompt2, max_length=50, num_return_sequences=1)

print("\nModified Prompt Output:")
print(generated_text2[0]['generated_text'])

# ==============================================
# Task 2: Sentiment Analysis using Pre-Trained Model
# ==============================================

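# Load sentiment analysis pipeline
# No model is named here, so the library falls back to its default checkpoint;
# pass model=... explicitly when results need to be reproducible.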
sentiment_pipeline = pipeline("sentiment-analysis")

# Define test sentences
texts = [
    "I absolutely love working with NLP models!",
    "This assignment is too difficult and frustrating.",
    "The weather today is nice, and I feel great!"
]

# Perform sentiment analysis
results = sentiment_pipeline(texts)

# Display results
for text, result in zip(texts, results):
    print(f"\nText: {text}")
    print(f"Sentiment: {result['label']} (Score: {result['score']:.4f})")

# Observations:
# - All three test sentences are classified correctly, each with a score above 0.99.
# - The model only outputs POSITIVE or NEGATIVE, so a neutral statement is forced into one of the two.

# ==============================================
# Task 3: Named Entity Recognition (NER)
# ==============================================

# Load NER model
ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

# Define sample text (can be from Wikipedia or a news article)
text = "Apple Inc. was founded by Steve Jobs and Steve Wozniak in Cupertino, California."

# Perform Named Entity Recognition
entities = ner_pipeline(text)

# Display extracted entities
print("\nNamed Entities:")
for entity in entities:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}")

# Observations:
# - The model labels the organization, both people, and both locations correctly.
# - Names are split into WordPiece tokens ("Wozniak" -> "W", "##oz", "##nia", "##k"),
#   and each piece is reported as its own entity.

# ==============================================
# Conclusion:
# - Text generation produces fluent text, but it can drift off topic and stop mid-sentence at the length limit.
# - Sentiment analysis is confident on clearly worded sentences but has no neutral class for ambiguous ones.
# - NER identifies the entities correctly but reports them as subword tokens that still need merging.


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text:
The impact of AI on future jobs is probably as clear as the one to which it relates.

Why is it happening, once again? In an increasingly technological-orientated age of online jobs and Facebook, companies are trying to shift the work
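
The generated text above stops mid-sentence, most likely because the length limit was reached. One way to tidy such output for display is to trim it back to the last complete sentence. The helper below is a minimal sketch and not part of the original notebook: `trim_to_last_sentence` is a hypothetical name, and it assumes a sentence ends with `.`, `!` or `?` followed by whitespace, so abbreviations such as "Inc." would also count as sentence ends.

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Return text up to and including the last sentence terminator.

    Hypothetical helper: assumes a sentence ends with '.', '!' or '?'
    followed by whitespace or the end of the string.
    """
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    # If no terminator is found, leave the text unchanged.
    return text[:ends[-1]] if ends else text


# First generation output, copied from the run above.
sample = (
    "The impact of AI on future jobs is probably as clear as the one "
    "to which it relates.\n\nWhy is it happening, once again? In an "
    "increasingly technological-orientated age of online jobs and "
    "Facebook, companies are trying to shift the work"
)
trimmed = trim_to_last_sentence(sample)
print(trimmed)
```

This only post-processes the string; it does not change how many tokens the model generates.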


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



Modified Prompt Output:
In the next 20 years, AI will continue to make it possible to use the Internet for self-driving cars by giving it more autonomy, including by making artificial intelligence a priority over human-controlled cars.

"It's amazing to think that


Device set to use cpu



Text: I absolutely love working with NLP models!
Sentiment: POSITIVE (Score: 0.9995)

Text: This assignment is too difficult and frustrating.
Sentiment: NEGATIVE (Score: 0.9997)

Text: The weather today is nice, and I feel great!
Sentiment: POSITIVE (Score: 0.9999)
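
All three scores above are close to 1.0, but the model used here only has POSITIVE and NEGATIVE labels, so a neutral sentence would still receive one of them. A common workaround is to treat low-confidence predictions as undecided. The following is a sketch under stated assumptions: `label_with_threshold` and the 0.9 cut-off are hypothetical choices, and the fourth result is a made-up example to show the fallback, not output from this run.

```python
def label_with_threshold(result: dict, threshold: float = 0.9) -> str:
    """Return the predicted label, or 'UNCERTAIN' below the threshold.

    The 0.9 cut-off is an illustrative assumption, not a tuned value.
    """
    return result["label"] if result["score"] >= threshold else "UNCERTAIN"


# Results copied from the output above (label and rounded score).
results = [
    {"label": "POSITIVE", "score": 0.9995},
    {"label": "NEGATIVE", "score": 0.9997},
    {"label": "POSITIVE", "score": 0.9999},
]
# Made-up low-confidence result, for illustration only.
hypothetical = {"label": "POSITIVE", "score": 0.62}

labels = [label_with_threshold(r) for r in results + [hypothetical]]
print(labels)
```

Classifier scores are not guaranteed to be well calibrated, so any threshold would need checking against labeled examples before being relied on.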


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu



Named Entities:
Entity: Apple, Label: I-ORG
Entity: Inc, Label: I-ORG
Entity: Steve, Label: I-PER
Entity: Job, Label: I-PER
Entity: ##s, Label: I-PER
Entity: Steve, Label: I-PER
Entity: W, Label: I-PER
Entity: ##oz, Label: I-PER
Entity: ##nia, Label: I-PER
Entity: ##k, Label: I-PER
Entity: Cup, Label: I-LOC
Entity: ##ert, Label: I-LOC
Entity: ##ino, Label: I-LOC
Entity: California, Label: I-LOC
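
The list above contains WordPiece fragments: a token that starts with `##` continues the token before it. A minimal sketch of gluing those pieces back into whole words follows. `merge_wordpieces` is a hypothetical helper, its input is the word/label pairs copied from the output above, and it only rejoins subword pieces; it does not group multi-word names such as "Steve Jobs" into a single entity.

```python
def merge_wordpieces(tokens):
    """Join '##' continuation pieces onto the preceding token.

    tokens: list of (word, label) pairs in the order they were printed.
    Returns one (word, label) pair per whole word; the label of the
    first piece is kept for the merged word.
    """
    merged = []
    for word, label in tokens:
        if word.startswith("##") and merged:
            prev_word, prev_label = merged[-1]
            merged[-1] = (prev_word + word[2:], prev_label)
        else:
            merged.append((word, label))
    return merged


# Word/label pairs copied from the NER output above.
tokens = [
    ("Apple", "I-ORG"), ("Inc", "I-ORG"),
    ("Steve", "I-PER"), ("Job", "I-PER"), ("##s", "I-PER"),
    ("Steve", "I-PER"), ("W", "I-PER"), ("##oz", "I-PER"),
    ("##nia", "I-PER"), ("##k", "I-PER"),
    ("Cup", "I-LOC"), ("##ert", "I-LOC"), ("##ino", "I-LOC"),
    ("California", "I-LOC"),
]
words = merge_wordpieces(tokens)
for word, label in words:
    print(f"Entity: {word}, Label: {label}")
```

The `ner` pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens into entity spans. With it, the result dictionaries use `entity_group` in place of `entity`, so the print loop in the notebook would need adjusting.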
