### `spaCy` Use Case: Named Entity Recognition (NER)
Use Case: Extract named entities (e.g., person names, organizations, locations) from text using spaCy's pre-trained models.

Code Example:

In [2]:
# !pip install spacy
# !python -m spacy download en_core_web_sm

In [1]:
import spacy

# Load a pre-trained spaCy model for English (en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Example text for Named Entity Recognition (NER)
text = """
Apple is looking at buying U.K. startup for $1 billion. 
Tim Cook, the CEO of Apple, confirmed the deal in San Francisco last week.
"""

# Process the text
doc = nlp(text)

# Extract and print named entities
print("Named Entities, Phrases, and Concepts:")
for entity in doc.ents:
    print(f"{entity.text} ({entity.label_})")


Named Entities, Phrases, and Concepts:
Apple (ORG)
U.K. (GPE)
$1 billion (MONEY)
Tim Cook (PERSON)
Apple (ORG)
San Francisco (GPE)
last week (DATE)


Explanation:

* We load a small pre-trained English pipeline (`en_core_web_sm`), which includes components for part-of-speech tagging, dependency parsing, and named entity recognition (NER).
* `doc.ents` holds the entity spans found in the text. Each span exposes its text and a label such as `ORG`, `GPE`, `MONEY`, `PERSON`, or `DATE`.
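
Once entities are extracted, a common next step is to group them by label. The sketch below is only an illustration: it hard-codes the (text, label) pairs printed above so that it runs without the spaCy model, and `group_by_label` is a hypothetical helper name, not part of spaCy.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above. With spaCy loaded, these
# would come from [(ent.text, ent.label_) for ent in doc.ents].
entities = [
    ("Apple", "ORG"),
    ("U.K.", "GPE"),
    ("$1 billion", "MONEY"),
    ("Tim Cook", "PERSON"),
    ("Apple", "ORG"),
    ("San Francisco", "GPE"),
    ("last week", "DATE"),
]


def group_by_label(pairs):
    """Map each label to its distinct entity texts, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:  # skip repeated mentions such as "Apple"
            grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```

Deduplicating by text is a simplification; two different entities that share the same surface form would be collapsed into one entry.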


### `spaCy` Use Case: Dependency Parsing
Use Case: Extract syntactic relationships between words in a sentence (e.g., subject, verb, object) using spaCy’s dependency parsing.

Code Example:

In [4]:
import spacy

# Load the English model for dependency parsing
nlp = spacy.load("en_core_web_sm")

# Example sentence
sentence = "The cat sat on the mat."

# Process the sentence
doc = nlp(sentence)

# Display syntactic dependencies
print(f"{'Word':<12} {'Dependency':<15} {'Head Word'}")
for token in doc:
    print(f"{token.text:<12} {token.dep_:<15} {token.head.text}")


Word         Dependency      Head Word
The          det             cat
cat          nsubj           sat
sat          ROOT            sat
on           prep            sat
the          det             mat
mat          pobj            on
.            punct           sat


Explanation:

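* In dependency parsing, each word is linked to a "head" word and labeled with its syntactic relation to that head (e.g., `nsubj` for the nominal subject, `pobj` for the object of a preposition). The root of the sentence (`sat`) is its own head.
* The code above prints each token, its dependency label (`token.dep_`), and the word it depends on (`token.head.text`).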
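
To show how these labels can be used, the sketch below walks the (word, dependency, head) rows from the table above to recover the subject of the root verb and the prepositional object attached to it. It works on hard-coded rows so it runs without the model; `children_of` is a hypothetical helper. Matching heads by their text is an assumption that only holds because every head word in this sentence is unique; with real spaCy tokens, comparing `token.head` or `token.i` would be safer.

```python
# (word, dependency, head) rows copied from the table above. With spaCy these
# would come from [(t.text, t.dep_, t.head.text) for t in doc].
rows = [
    ("The", "det", "cat"),
    ("cat", "nsubj", "sat"),
    ("sat", "ROOT", "sat"),
    ("on", "prep", "sat"),
    ("the", "det", "mat"),
    ("mat", "pobj", "on"),
    (".", "punct", "sat"),
]


def children_of(head, dep, rows):
    """Return the words attached to `head` with dependency label `dep`."""
    return [word for word, d, h in rows if h == head and d == dep]


# The root is the token labeled ROOT (it is its own head).
root = next(word for word, dep, _ in rows if dep == "ROOT")
subjects = children_of(root, "nsubj", rows)

# Follow prep -> pobj to find what the action relates to.
prep_objects = [
    (prep, obj)
    for prep in children_of(root, "prep", rows)
    for obj in children_of(prep, "pobj", rows)
]

print(f"root={root}, subjects={subjects}, prepositional objects={prep_objects}")
```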

### `spaCy` Use Case: Part-of-Speech Tagging
Use Case: Identify the part of speech (e.g., noun, verb, adjective) for each word in a sentence using spaCy.

Code Example:


In [6]:
import spacy

# Load the spaCy model for English
nlp = spacy.load("en_core_web_sm")

# Example sentence for POS tagging
sentence = "Python is an amazing programming language."

# Process the sentence
doc = nlp(sentence)

# Print each word with its part of speech
for token in doc:
    print(f"{token.text}: {token.pos_}")


Python: PROPN
is: AUX
an: DET
amazing: ADJ
programming: NOUN
language: NOUN
.: PUNCT


Explanation:

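* The `pos_` attribute gives the coarse-grained part-of-speech tag for each token (e.g., `NOUN`, `VERB`, `ADJ`). Note that "is" is tagged `AUX` (auxiliary) here rather than `VERB`.
* This is useful for tasks such as syntactic analysis, text classification, or filtering tokens by word class.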
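
As a small illustration of filtering by word class, the sketch below counts the tags from the output above and keeps only content words. The (word, tag) pairs are hard-coded so the snippet runs without the model, and the choice of which tags count as "content" is an assumption made for this example.

```python
from collections import Counter

# (word, POS) pairs copied from the output above. With spaCy these would
# come from [(t.text, t.pos_) for t in doc].
tagged = [
    ("Python", "PROPN"),
    ("is", "AUX"),
    ("an", "DET"),
    ("amazing", "ADJ"),
    ("programming", "NOUN"),
    ("language", "NOUN"),
    (".", "PUNCT"),
]

# How often each tag occurs in the sentence.
tag_counts = Counter(pos for _, pos in tagged)

# Assumed set of "content" tags; adjust it to suit the task.
CONTENT_TAGS = {"NOUN", "PROPN", "ADJ", "VERB"}
content_words = [word for word, pos in tagged if pos in CONTENT_TAGS]

print(tag_counts)
print(content_words)
```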


### Hugging Face Transformers Use Case: Sentiment Analysis
Use Case: Perform sentiment analysis (e.g., determine whether a sentence is positive or negative) using a pre-trained model from Hugging Face.

Code Example:

In [3]:
from transformers import pipeline

# Load a pre-trained sentiment-analysis pipeline from Hugging Face
sentiment_analyzer = pipeline("sentiment-analysis")

# Example text for sentiment analysis
text = "I love working with Python, it makes development fun and easy!"

# Analyze sentiment
result = sentiment_analyzer(text)

# Print the result
print(result)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'POSITIVE', 'score': 0.9997923970222473}]


Explanation:

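* We use the `pipeline` function to load a pre-trained sentiment analysis model from Hugging Face. Because no model was specified, it falls back to a default checkpoint (here `distilbert/distilbert-base-uncased-finetuned-sst-2-english`), which labels the text `POSITIVE` or `NEGATIVE` and returns a confidence score. Passing `model=` explicitly avoids the warning and keeps results reproducible.
* The pipeline handles tokenization, inference, and post-processing, so you do not need to manage the model and tokenizer manually.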
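
The default model has no neutral class, so one possible workaround is to treat low-confidence predictions as undecided. The sketch below post-processes the result printed above; the helper `label_with_threshold`, the `"UNCERTAIN"` label, and the 0.75 cutoff are all assumptions for illustration, not part of the Transformers API.

```python
def label_with_threshold(prediction, threshold=0.75):
    """Return the predicted label, or "UNCERTAIN" when the score is below `threshold`.

    `prediction` is one dict from the pipeline output, e.g.
    {"label": "POSITIVE", "score": 0.99}.
    """
    if prediction["score"] >= threshold:
        return prediction["label"]
    return "UNCERTAIN"


# Result copied from the output above.
result = [{"label": "POSITIVE", "score": 0.9997923970222473}]
labels = [label_with_threshold(p) for p in result]
print(labels)

# A hypothetical low-confidence prediction falls into the undecided bucket.
print(label_with_threshold({"label": "NEGATIVE", "score": 0.6}))
```

A low score here only signals that the model is unsure between its two classes; it is not the same as a trained neutral category.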

### Hugging Face Transformers Use Case: Text Summarization
Use Case: Generate a summary for a long text using a pre-trained text summarization model from Hugging Face.

Code Example:

In [5]:
from transformers import pipeline

# Load a pre-trained summarization pipeline from Hugging Face
summarizer = pipeline("summarization")

# Example long text
text = """
The Apollo 11 mission was the first manned mission to land on the Moon. 
On July 20, 1969, American astronauts Neil Armstrong and Buzz Aldrin landed the lunar module, 
Eagle, on the surface of the Moon while Michael Collins remained in orbit around the Moon. 
Neil Armstrong became the first human to set foot on the Moon, followed shortly by Buzz Aldrin. 
The event was broadcast to an audience of millions, and it marked a significant milestone in the history of space exploration.
"""

# Summarize the text
summary = summarizer(text, max_length=100, min_length=50, do_sample=False)

# Print the summary
print(summary[0]['summary_text'])


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


 The Apollo 11 mission was the first manned mission to land on the Moon . American astronauts Neil Armstrong and Buzz Aldrin landed the lunar module on July 20, 1969 . The event was broadcast to an audience of millions, and it marked a significant milestone in space exploration .


Explanation:

* The `summarization` pipeline condenses a long text while retaining its main points.
* The `max_length` and `min_length` parameters bound the length of the summary, measured in tokens rather than words. Setting `do_sample=False` disables sampling, so the same input yields the same summary.
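
To get a rough sense of how much the text was condensed, the sketch below compares whitespace-separated word counts of the input and the summary shown above. Word counts are only an approximation here, since the length parameters are expressed in model tokens; `word_count` is a hypothetical helper.

```python
# Input text and summary copied from the example above.
original = """
The Apollo 11 mission was the first manned mission to land on the Moon.
On July 20, 1969, American astronauts Neil Armstrong and Buzz Aldrin landed the lunar module,
Eagle, on the surface of the Moon while Michael Collins remained in orbit around the Moon.
Neil Armstrong became the first human to set foot on the Moon, followed shortly by Buzz Aldrin.
The event was broadcast to an audience of millions, and it marked a significant milestone in the history of space exploration.
"""

summary = (
    " The Apollo 11 mission was the first manned mission to land on the Moon . "
    "American astronauts Neil Armstrong and Buzz Aldrin landed the lunar module "
    "on July 20, 1969 . The event was broadcast to an audience of millions, and "
    "it marked a significant milestone in space exploration ."
)


def word_count(text):
    """Count whitespace-separated words (a rough proxy for model tokens)."""
    return len(text.split())


ratio = word_count(summary) / word_count(original)
print(f"{word_count(original)} words -> {word_count(summary)} words ({ratio:.0%} of original)")
```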


### Hugging Face Transformers Use Case: Named Entity Recognition (NER)
Use Case: Extract named entities (like locations, organizations, people) from text using a Hugging Face model for NER.

Code Example:

In [8]:
from transformers import pipeline

# Load a pre-trained NER pipeline from Hugging Face
ner_model = pipeline("ner")

# Example text for NER
text = "Barack Obama was born in Honolulu, Hawaii, and served as the 44th President of the United States."

# Perform NER on the text
entities = ner_model(text)

# Print the named entities
for entity in entities:
    print(entity)


No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is

{'entity': 'I-PER', 'score': np.float32(0.9989973), 'index': 1, 'word': 'Barack', 'start': 0, 'end': 6}
{'entity': 'I-PER', 'score': np.float32(0.99942195), 'index': 2, 'word': 'Obama', 'start': 7, 'end': 12}
{'entity': 'I-LOC', 'score': np.float32(0.99836236), 'index': 6, 'word': 'Honolulu', 'start': 25, 'end': 33}
{'entity': 'I-LOC', 'score': np.float32(0.9995079), 'index': 8, 'word': 'Hawaii', 'start': 35, 'end': 41}
{'entity': 'I-LOC', 'score': np.float32(0.99870396), 'index': 19, 'word': 'United', 'start': 83, 'end': 89}
{'entity': 'I-LOC', 'score': np.float32(0.9919043), 'index': 20, 'word': 'States', 'start': 90, 'end': 96}
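
The output above is token-level: "Barack" and "Obama" are reported separately, as are "United" and "States". The sketch below merges neighboring tokens that share a label into full entity spans using the character offsets. It runs on a hard-coded copy of the output (scores and indices omitted for brevity); `merge_adjacent` and its `max_gap` parameter are hypothetical names introduced for this example. Recent versions of Transformers can also do this grouping inside the pipeline via the `aggregation_strategy` argument (for example `pipeline("ner", aggregation_strategy="simple")`).

```python
text = "Barack Obama was born in Honolulu, Hawaii, and served as the 44th President of the United States."

# Token-level predictions copied from the output above (scores omitted).
tokens = [
    {"entity": "I-PER", "word": "Barack", "start": 0, "end": 6},
    {"entity": "I-PER", "word": "Obama", "start": 7, "end": 12},
    {"entity": "I-LOC", "word": "Honolulu", "start": 25, "end": 33},
    {"entity": "I-LOC", "word": "Hawaii", "start": 35, "end": 41},
    {"entity": "I-LOC", "word": "United", "start": 83, "end": 89},
    {"entity": "I-LOC", "word": "States", "start": 90, "end": 96},
]


def merge_adjacent(tokens, text, max_gap=1):
    """Merge consecutive tokens with the same label into (span_text, label) pairs.

    Two tokens are merged when at most `max_gap` characters separate them,
    which covers a single space but not ", ".
    """
    spans = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # "I-PER" -> "PER"
        last = spans[-1] if spans else None
        if last and last["label"] == label and tok["start"] - last["end"] <= max_gap:
            last["end"] = tok["end"]  # extend the current span
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    return [(text[s["start"]:s["end"]], s["label"]) for s in spans]


merged = merge_adjacent(tokens, text)
for span_text, label in merged:
    print(f"{span_text} ({label})")
```

With `max_gap=1`, "Honolulu" and "Hawaii" stay separate because a comma and a space lie between them, while "United States" is joined across its single space.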
