# Week 2: NLP Tasks and Tokenization Techniques

Applied Learning Assignment 1:

● Select an NLP task (e.g., NER, sentiment analysis, or text summarization).

● Use a library like spaCy, NLTK, or Hugging Face Transformers to implement
the task.

● Process a sample dataset, such as extracting named entities from text,
summarizing multiple documents, or classifying text by sentiment.

● Write a brief report detailing:

➢ The selected task.

➢ The steps of implementation.

➢ The observed results.


In [1]:
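# Install spaCy: !pip install spacy
# Download the English model if it is not already installed: !python -m spacy download en_core_web_sm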
import spacy

# Load pre-trained spaCy model
nlp = spacy.load("en_core_web_sm")

# Input text
text = "Nunsi Shiaki was born on August 4, 1961, in Gboko Benue State, Niegria."

# Process text
doc = nlp(text)

# Extract entities
print("Entities and their labels:")
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

Entities and their labels:
Nunsi Shiaki -> PERSON
August 4, 1961 -> DATE
Gboko Benue State -> FAC
Niegria -> GPE


➢ Selected Task:
The task performed was Named Entity Recognition (NER) using the spaCy NLP library. NER involves identifying and classifying key pieces of information (entities) in text into predefined categories such as person names, locations, organizations, dates, etc.
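NER models typically predict one tag per token and then group consecutive tags into entity spans. A common tagging scheme is BIO: `B-` begins an entity, `I-` continues it, and `O` marks tokens outside any entity (spaCy works with the closely related BILUO scheme). The function below is a minimal plain-Python sketch of that grouping step, not spaCy's internal code. The tags are hand-written for illustration rather than produced by a model.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_text, label) pairs."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose label differs from the open entity, starts a new span
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:  # "O": close any open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:  # close an entity that runs to the end of the sentence
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Hand-written (hypothetical) tags, not model output
tokens = ["Nunsi", "Shiaki", "was", "born", "in", "Gboko", "Benue", "State"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "B-GPE", "B-GPE", "I-GPE"]
print(bio_to_spans(tokens, tags))
```

The second `B-GPE` tag is what lets "Gboko" and "Benue State" come out as two separate entities rather than one merged span.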

➢ Steps of Implementation:
Installation and Import:

Installed the spaCy library.

Imported spaCy into the Python environment.

Model Loading:

Loaded the pre-trained English model en_core_web_sm provided by spaCy.

Text Input:

Defined an input sentence:
"Nunsi Shiaki was born on August 4, 1961, in Gboko Benue State, Niegria."

Text Processing:

Processed the text with the loaded pipeline to produce a Doc object, which holds the tokens together with their linguistic annotations (such as part-of-speech tags, dependency labels, and named entities).

Entity Extraction:

Iterated over doc.ents, the sequence of entity spans recognized in the Doc object.

Printed each entity's text (ent.text) along with its label (ent.label_), e.g., PERSON, DATE, GPE.
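Tokenization is the first thing spaCy does when `nlp(text)` is called, and every later annotation, including the entity spans, is attached to those tokens. spaCy's English tokenizer is rule-based, with prefix, suffix, and infix rules plus special-case exceptions. The regex below is only a rough plain-Python approximation for illustration, not spaCy's algorithm.

```python
import re

# A token is either a run of word characters or a single non-space punctuation mark
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Approximate word tokenization with a single regular expression."""
    return TOKEN_PATTERN.findall(text)


sentence = "Nunsi Shiaki was born on August 4, 1961, in Gboko Benue State, Niegria."
print(simple_tokenize(sentence))
```

The approximation breaks on contractions: it splits "don't" into "don", "'", "t", whereas spaCy's English tokenizer exceptions produce "do" and "n't".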

➢ Observed Results:
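When the code was executed, spaCy correctly tagged the person's name (PERSON), the date of birth (DATE), and the country (GPE), recognizing "Niegria" as a GPE despite the misspelling. However, it merged "Gboko Benue State" into a single entity and labeled it FAC (facility). This is incorrect: "Gboko" (a town) and "Benue State" (a Nigerian state) are two separate geopolitical entities and should both be labeled GPE.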
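One way to repair this kind of error is to post-process the model's output with a list of known place names (a gazetteer). The sketch below uses a hypothetical helper, `correct_with_gazetteer`, and a hand-made `KNOWN_GPES` list; the input list copies the entities printed by the spaCy cell. Within a real spaCy pipeline, the built-in `entity_ruler` component (added with `nlp.add_pipe("entity_ruler", before="ner")`) is the more idiomatic way to supply such patterns.

```python
# Hypothetical gazetteer of known geopolitical entities (illustrative only)
KNOWN_GPES = ["Gboko", "Benue State", "Nigeria"]


def correct_with_gazetteer(entities, gazetteer):
    """Relabel (and split) entity spans that are fully covered by known place names."""
    names = sorted(gazetteer, key=len, reverse=True)  # prefer the longest matching name
    corrected = []
    for text, label in entities:
        remaining, pieces = text, []
        while remaining:
            match = next((name for name in names if remaining.startswith(name)), None)
            if match is None:
                break
            pieces.append((match, "GPE"))
            remaining = remaining[len(match):].lstrip()
        if pieces and not remaining:
            corrected.extend(pieces)  # the whole span is made of known places
        else:
            corrected.append((text, label))  # otherwise keep the model's prediction
    return corrected


# Entities as printed by the spaCy cell
model_entities = [
    ("Nunsi Shiaki", "PERSON"),
    ("August 4, 1961", "DATE"),
    ("Gboko Benue State", "FAC"),
    ("Niegria", "GPE"),
]
corrected = correct_with_gazetteer(model_entities, KNOWN_GPES)
for text, label in corrected:
    print(f"{text} -> {label}")
```

The helper only overrides the model when known names cover an entire span, so entities it does not recognize keep the labels spaCy assigned.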

In [1]:
# Install Hugging Face Transformers: !pip install transformers
from transformers import pipeline

# Load pre-trained sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

# Input text
text = ["I love school!", "I don't really like attending classes."]

# Perform sentiment analysis
results = sentiment_pipeline(text)
for i, result in enumerate(results):
    print(f"Text: {text[i]}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.4f}")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu




In [2]:
# Input text
text = ["I love school!", "I don't really like attending classes."]

# Perform sentiment analysis
results = sentiment_pipeline(text)
for i, result in enumerate(results):
    print(f"Text: {text[i]}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.4f}")

Text: I love school!
Sentiment: POSITIVE, Score: 0.9999
Text: I don't really like attending classes.
Sentiment: NEGATIVE, Score: 0.9988


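➢ Selected Task:
The second task performed was sentiment analysis using the Hugging Face Transformers library. Sentiment analysis classifies a piece of text by the opinion it expresses, here as POSITIVE or NEGATIVE.

➢ Steps of Implementation: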
Installation and Import:

Installed the transformers library using !pip install transformers.

Imported the pipeline utility from the transformers module.

Model Loading:

Initialized a pre-trained sentiment analysis pipeline.

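Because no model name was passed, the pipeline fell back to its default model, distilbert/distilbert-base-uncased-finetuned-sst-2-english (revision 714eb0f), a DistilBERT model fine-tuned on the SST-2 dataset to classify English text as positive or negative.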

Input Text:

Defined a list of sample sentences:

"I love school!"

"I don't really like attending classes."

Performing Sentiment Analysis:

Passed the list of texts into the pipeline to analyze their sentiments.

The pipeline returned a list of results, each containing:

A label (POSITIVE or NEGATIVE)

A confidence score (the model's probability for the predicted label, between 0 and 1)

Displaying Results:

Used a for loop to print each input text alongside its predicted sentiment and the associated confidence score (formatted to 4 decimal places).
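The indexing loop can also be written with `zip`, which pairs each sentence with its result directly instead of looking it up by position. The sketch below wraps that in a hypothetical `format_results` helper and feeds it the results printed by the In [2] cell (scores as printed, already rounded), so it runs without downloading the model.

```python
def format_results(texts, results):
    """Pair each input sentence with its pipeline result and format both lines."""
    lines = []
    for sentence, result in zip(texts, results):
        lines.append(f"Text: {sentence}")
        lines.append(f"Sentiment: {result['label']}, Score: {result['score']:.4f}")
    return lines


texts = ["I love school!", "I don't really like attending classes."]
# Results in the pipeline's output format, copied from the notebook output
results = [
    {"label": "POSITIVE", "score": 0.9999},
    {"label": "NEGATIVE", "score": 0.9988},
]
print("\n".join(format_results(texts, results)))
```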

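➢ Observed Results:
Both sentences were classified correctly. "I love school!" was labeled POSITIVE with a score of 0.9999, and "I don't really like attending classes." was labeled NEGATIVE with a score of 0.9988, so the model handled the negation in "don't really like".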
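The score reported by the pipeline is a softmax probability: the classifier outputs one raw score (logit) per class, the softmax turns these into probabilities that sum to 1, and the pipeline returns the most probable label with its probability. A plain-Python sketch follows. The logits are made-up values rather than real model output, and the label order `{0: "NEGATIVE", 1: "POSITIVE"}` is an assumption about the SST-2 model's configuration.

```python
import math

# Assumed mapping from class index to label (check the model's config in practice)
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable label and its probability, pipeline-style."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits, not taken from the real model
print(classify([-4.2, 4.6]))
```

A score close to 1 therefore means one logit was much larger than the other; it measures the model's confidence, not a guarantee that the label is correct.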