# Day-53: Named Entity Recognition (NER) + POS Tagging

We've mastered turning words into meaning-rich vectors. Now we shift our focus from individual words to sentence structure and information extraction. Today we're diving into two essential structural analysis tools of NLP: Part-of-Speech (POS) Tagging and Named Entity Recognition (NER), primarily using spaCy, a widely used open-source NLP library.

## Topics Covered

- Part-of-Speech (POS) Tagging
- Named Entity Recognition (NER)
- POS Tagging and NER with spaCy

## Part-of-Speech (POS) Tagging: The Grammar Check

POS Tagging is the process of marking up a word in a text as corresponding to a particular part of speech, based on both its definition and its context.

- `How it Works`: Every word is labeled with its grammatical role: noun, verb, adjective, adverb, pronoun, preposition, etc. This is crucial for understanding relationships between words and for more advanced techniques like lemmatization (which needs the POS tag to correctly find the base form).

- `Analogy`: Grammar School. It's like having a grammar teacher go through your sentence and label every word's function.

- `Example`:

    - `Sentence`: "Apple stock soared yesterday."

        - `POS Tags`:

            "Apple" → Noun (NNP - Proper Noun)

            "stock" → Noun (NN - Noun, singular)

            "soared" → Verb (VBD - Verb, past tense)

            "yesterday" → Noun (NN - Noun, singular)
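To see why lemmatization depends on the POS tag, here is a minimal toy sketch. The `LEMMA_TABLE` lookup and the `lemmatize` helper are hypothetical names made up for this illustration; real lemmatizers (including spaCy's) rely on far larger rule sets and lookup tables.

```python
# Toy illustration: the same surface word can have different base forms
# depending on its part of speech. LEMMA_TABLE and lemmatize() are
# hypothetical helpers for this sketch, not part of any library.
LEMMA_TABLE = {
    ("meeting", "NOUN"): "meeting",  # "The meeting ran long."
    ("meeting", "VERB"): "meet",     # "We are meeting at noon."
    ("leaves", "NOUN"): "leaf",      # "The leaves turned red."
    ("leaves", "VERB"): "leave",     # "She leaves at five."
    ("soared", "VERB"): "soar",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the base form for (word, pos); fall back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


if __name__ == "__main__":
    samples = [
        ("meeting", "NOUN"),
        ("meeting", "VERB"),
        ("leaves", "NOUN"),
        ("leaves", "VERB"),
    ]
    for word, pos in samples:
        print(f"{word:<8} {pos:<5} -> {lemmatize(word, pos)}")
```

Without the POS tag, a lookup keyed on the word alone could not tell "meeting" the noun from "meeting" the verb, which is why the tagger usually runs before the lemmatizer in a pipeline.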

## Named Entity Recognition (NER): Extracting Key Information 

NER is the process of automatically locating and classifying key elements in text into pre-defined categories such as person names, organizations, locations, quantities, monetary values, and dates.

- `How it Works`: NER models use context to identify spans of text that correspond to "named entities." This is essential for building knowledge graphs, automating customer support, and summarizing large documents.

- `Analogy`: The Highlighter. It's like automatically highlighting the most important facts (names, dates, places) in a long document.

- `Example (Common spaCy Entity Types)`:

        ORG: Organization (e.g., Google, Tesla)

        PERSON: People (e.g., Elon Musk, Taylor Swift)

        GPE: Geopolitical Entity (e.g., Paris, Germany)

        DATE: Absolute or relative dates/periods (e.g., 2025, yesterday)

- `Example NER`:

    - `Sentence`: "Tim Cook announced a new iPhone at Apple Park last Tuesday."

    - `Entities`:

        "Tim Cook" → PERSON

        "Apple Park" → FAC (Facility)

        "last Tuesday" → DATE
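NER output is essentially a list of labeled text spans. The sketch below mimics that output shape with a hand-written lookup table (a gazetteer) and regular expressions. This is deliberately not how spaCy's statistical model works, since it ignores context entirely; `GAZETTEER` and `find_entities` are hypothetical names used only for illustration.

```python
import re

# Hypothetical hand-written gazetteer: exact phrase -> entity label.
GAZETTEER = {
    "Tim Cook": "PERSON",
    "Apple Park": "FAC",
    "last Tuesday": "DATE",
}


def find_entities(text: str) -> list[tuple[str, str, int, int]]:
    """Return (entity_text, label, start_char, end_char) tuples in text order."""
    found = []
    for phrase, label in GAZETTEER.items():
        # \b word boundaries avoid matching inside longer words.
        for match in re.finditer(rf"\b{re.escape(phrase)}\b", text):
            found.append((match.group(), label, match.start(), match.end()))
    # Sort by start offset so entities appear in reading order.
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    sentence = "Tim Cook announced a new iPhone at Apple Park last Tuesday."
    for ent_text, label, start, end in find_entities(sentence):
        print(f"{ent_text!r:<16} {label:<7} chars {start}-{end}")
```

A lookup like this breaks down quickly: it cannot recognize names it has never seen, and it cannot tell "Apple" the company from "apple" the fruit. That limitation is the reason trained NER models use the surrounding context.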

## The Power of spaCy

spaCy is an efficient, production-oriented Python library for advanced NLP tasks. Whereas NLTK is geared more toward teaching and research, spaCy is designed for speed and scale, which makes it a popular choice for NER and POS tagging in production systems.

## Code Example: spaCy for POS Tagging and NER

In [2]:
! pip install spacy


In [3]:
! python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [4]:
import spacy

# Load the small English language model
try:
    nlp = spacy.load("en_core_web_sm")
except OSError as err:
    # The model is not installed. Stop here with a clear message instead of
    # failing later with a NameError when `nlp` is first used.
    raise RuntimeError(
        "spaCy model 'en_core_web_sm' not found. "
        "Run 'python -m spacy download en_core_web_sm' and re-run this cell."
    ) from err

# Sample text for analysis
text = "Google announced that CEO Sundar Pichai visited London on Monday to discuss a $1 billion investment."

# Process the text with the spaCy pipeline
doc = nlp(text)

# --- 1. PART-OF-SPEECH (POS) TAGGING ---
print("--- POS TAGGING ---")
for token in doc:
    # token.text: The original word
    # token.pos_: The simple POS tag (e.g., NOUN, VERB)
    # token.tag_: The detailed POS tag (e.g., NNP, VBD)
    print(f"{token.text:<10} | {token.pos_:<8} | {token.tag_:<5}")

# --- 2. NAMED ENTITY RECOGNITION (NER) ---
print("\n--- NAMED ENTITY RECOGNITION (NER) ---")
for ent in doc.ents:
    # ent.text: The actual entity text
    # ent.label_: The type of entity
    print(f"Entity: {ent.text:<20} | Type: {ent.label_:<10} | Explanation: {spacy.explain(ent.label_)}")

--- POS TAGGING ---
Google     | PROPN    | NNP  
announced  | VERB     | VBD  
that       | SCONJ    | IN   
CEO        | PROPN    | NNP  
Sundar     | PROPN    | NNP  
Pichai     | PROPN    | NNP  
visited    | VERB     | VBD  
London     | PROPN    | NNP  
on         | ADP      | IN   
Monday     | PROPN    | NNP  
to         | PART     | TO   
discuss    | VERB     | VB   
a          | DET      | DT   
$          | SYM      | $    
1          | NUM      | CD   
billion    | NUM      | CD   
investment | NOUN     | NN   
.          | PUNCT    | .    

--- NAMED ENTITY RECOGNITION (NER) ---
Entity: Google               | Type: ORG        | Explanation: Companies, agencies, institutions, etc.
Entity: Sundar Pichai        | Type: PERSON     | Explanation: People, including fictional
Entity: London               | Type: GPE        | Explanation: Countries, cities, states
Entity: Monday               | Type: DATE       | Explanation: Absolute or relative dates or periods
Entity: $1 billi
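Once entities are extracted, a common next step is to group them by type, for instance before loading them into a knowledge graph. The sketch below assumes the entities have already been pulled out as `(text, label)` pairs; `group_by_label` is a hypothetical helper name. The sample pairs are copied from the complete rows of the output above.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [texts, ...]}, keeping input order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


if __name__ == "__main__":
    # With spaCy, this list would typically come from:
    #   [(ent.text, ent.label_) for ent in doc.ents]
    entities = [
        ("Google", "ORG"),
        ("Sundar Pichai", "PERSON"),
        ("London", "GPE"),
        ("Monday", "DATE"),
    ]
    for label, texts in group_by_label(entities).items():
        print(f"{label:<7} {texts}")
```

Keeping the helper independent of spaCy objects makes it easy to unit-test and to reuse with entities coming from any other extractor.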

## Summary of Day 53

Today, you learned how to structurally analyze text using spaCy. POS Tagging determines the grammatical function of every word, while NER identifies and classifies critical real-world entities like people, organizations, and monetary values. This structural understanding is vital for advanced tasks like automated content classification and question-answering systems.

## What's Next (Day 54)

Now that you can extract both semantic meaning (Word2Vec) and structural information (NER), the next logical step is to analyze the feeling behind the words! Tomorrow, on Day 54, we'll dive into Sentiment Analysis, learning how to use libraries like VADER and build simple logistic regression models to detect the polarity (positive/negative/neutral) of text data.