# Named Entity Recognition (NER) and POS Tagging with spaCy

## 📚 Learning Objectives

By completing this notebook, you will:
- Implement NER using spaCy
- Perform POS tagging using spaCy
- Extract named entities from text
- Analyze and visualize NER results

## 🔗 Prerequisites

- ✅ Unit 2: Text representation completed
- ✅ Understanding of NLP fundamentals
- ✅ Python, spaCy knowledge

---

## Official Structure Reference

This notebook covers practical activities from **Course 07, Unit 3**:
- Implementing NER and POS tagging using spaCy
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 3 Practical Content

---

## Introduction

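**Named Entity Recognition (NER)** finds spans of text that name real-world entities (persons, organizations, locations, etc.) and classifies each span into a category. **POS tagging** assigns a part-of-speech tag (noun, verb, adjective, etc.) to each token. For example, in "Tim Cook is the CEO of Apple", an NER system would typically label "Tim Cook" as a person and "Apple" as an organization.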

In [1]:
import numpy as np
import pandas as pd
from collections import Counter

# Try importing spaCy
try:
    import spacy
    HAS_SPACY = True
    try:
        nlp = spacy.load("en_core_web_sm")
        print("✅ spaCy English model loaded!")
    except OSError:
        print("⚠️  spaCy model not found. Download with: python -m spacy download en_core_web_sm")
        print("   Using basic spaCy without model...")
        # No pipeline is available, so the NER and POS cells below only print setup instructions
        nlp = None
except ImportError:
    HAS_SPACY = False
    nlp = None
    print("⚠️  spaCy not available. Install with: pip install spacy")

print("\n✅ Libraries imported!")

⚠️  spaCy model not found. Download with: python -m spacy download en_core_web_sm
   Using basic spaCy without model...

✅ Libraries imported!
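
The output above shows that `en_core_web_sm` was not found in this run, so the later cells only print setup instructions. As a rough, model-free sketch (not part of the original notebook), spaCy's rule-based `entity_ruler` component can be added to a blank English pipeline to show what `doc.ents` looks like. This assumes spaCy v3. The patterns are hand-written for illustration, and the function name `rule_based_entities` is hypothetical. A trained model predicts entities statistically rather than by lookup, so its results will differ.

```python
# Sketch: rule-based entities on a blank pipeline (no model download needed).
# Assumes spaCy v3; the patterns are hand-written examples, not model predictions.
try:
    import spacy
except ImportError:  # mirror the notebook's graceful fallback
    spacy = None


def rule_based_entities(text):
    """Return (text, label) pairs found by a small hand-written entity ruler."""
    if spacy is None:
        return []
    nlp_blank = spacy.blank("en")               # tokenizer only, no trained components
    ruler = nlp_blank.add_pipe("entity_ruler")  # rule-based component that sets doc.ents
    ruler.add_patterns([
        {"label": "ORG", "pattern": "Apple"},
        {"label": "PERSON", "pattern": [{"LOWER": "steve"}, {"LOWER": "jobs"}]},
        {"label": "GPE", "pattern": "Cupertino"},
    ])
    doc = nlp_blank(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


demo_entities = rule_based_entities("Apple was founded by Steve Jobs in Cupertino.")
for ent_text, ent_label in demo_entities:
    print(f"{ent_text:12s} | {ent_label}")
```

The ruler only matches what its patterns list, so it is useful for checking the plumbing (`doc.ents`, `ent.text`, `ent.label_`) rather than as a substitute for the pre-trained model.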


## Part 1: Named Entity Recognition (NER) with spaCy


In [2]:
if HAS_SPACY and nlp is not None:
    # Sample text for NER
    text = """
    Apple Inc. is an American multinational technology company headquartered in 
    Cupertino, California. Tim Cook is the CEO of Apple. The company was founded 
    by Steve Jobs in 1976. Microsoft Corporation, founded by Bill Gates, is 
    another major technology company based in Redmond, Washington.
    """
    
    print("=" * 60)
    print("Named Entity Recognition (NER)")
    print("=" * 60)
    print(f"\nInput text:\n{text.strip()}\n")
    
    # Process text with spaCy
    doc = nlp(text)
    
    # Extract named entities
    print("Named Entities Found:")
    print("-" * 60)
    entities = []
    for ent in doc.ents:
        entities.append({
            'Text': ent.text, 'Label': ent.label_,
            'Description': spacy.explain(ent.label_)
        })
        print(f"{ent.text:20s} | {ent.label_:15s} | {spacy.explain(ent.label_)}")
    
    print(f"\nTotal entities found: {len(entities)}")
    
    # Count by entity type
    entity_counts = Counter([ent['Label'] for ent in entities])
    print("\nEntity type distribution:")
    for label, count in entity_counts.items():
        print(f"  {label}: {count}")
else:
    print("=" * 60)
    print("NER with spaCy (Installation Required)")
    print("=" * 60)
    print("""
    To use spaCy for NER:
    
    1. Install spaCy:
       pip install spacy
    
    2. Download English model:
       python -m spacy download en_core_web_sm
    
    3. Use spaCy:
       import spacy
       nlp = spacy.load("en_core_web_sm")
       doc = nlp(text)
       for ent in doc.ents:
           print(ent.text, ent.label_)
    """)


NER with spaCy (Installation Required)

    To use spaCy for NER:
    
    1. Install spaCy:
       pip install spacy
    
    2. Download English model:
       python -m spacy download en_core_web_sm
    
    3. Use spaCy:
       import spacy
       nlp = spacy.load("en_core_web_sm")
       doc = nlp(text)
       for ent in doc.ents:
           print(ent.text, ent.label_)
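
Once entities have been extracted, a common next step is to group them by label. The sketch below works on plain `(text, label)` pairs, such as `[(ent.text, ent.label_) for ent in doc.ents]`, so it runs without a model. The helper name `group_entities_by_label` is hypothetical, and the sample pairs are hand-written for illustration rather than actual model output.

```python
from collections import defaultdict


def group_entities_by_label(pairs):
    """Map each label to the distinct entity texts seen, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        cleaned = " ".join(text.split())  # collapse line breaks inside an entity span
        if cleaned not in grouped[label]:
            grouped[label].append(cleaned)
    return dict(grouped)


# Hand-written pairs for illustration only; real model output may differ.
sample_pairs = [
    ("Apple Inc.", "ORG"),
    ("Cupertino", "GPE"),
    ("Tim Cook", "PERSON"),
    ("Apple", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Apple", "ORG"),
]
grouped = group_entities_by_label(sample_pairs)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```

Collapsing whitespace matters for multi-line input like the sample text above, where an entity that spans a line break may carry the newline and indentation in `ent.text`.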
    


## Part 2: Part-of-Speech (POS) Tagging with spaCy


In [3]:
if HAS_SPACY and nlp is not None:
    # Sample sentence for POS tagging
    sentence = "Natural language processing helps computers understand human language."
    
    print("=" * 60)
    print("Part-of-Speech (POS) Tagging")
    print("=" * 60)
    print(f"\nInput sentence: {sentence}\n")
    
    # Process sentence
    doc = nlp(sentence)
    
    # Extract POS tags
    print("POS Tags:")
    print("-" * 60)
    pos_data = []
    for token in doc:
        pos_data.append({
            'Word': token.text, 'POS': token.pos_,
            'POS_Description': spacy.explain(token.pos_),
            'Tag': token.tag_,
            'Tag_Description': spacy.explain(token.tag_)
        })
        print(f"{token.text:15s} | {token.pos_:10s} | {token.tag_:10s} | {spacy.explain(token.pos_)}")
    
    # Count POS tags
    pos_counts = Counter([token.pos_ for token in doc])
    print("\nPOS distribution:")
    for pos, count in pos_counts.items():
        print(f"  {pos}: {count}")
else:
    print("Note: Install spaCy and download model for POS tagging")


Note: Install spaCy and download model for POS tagging
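
POS tags are often used to keep only content words before counting or feature extraction. The sketch below takes `(word, pos)` pairs, such as `[(token.text, token.pos_) for token in doc]`, and keeps nouns, proper nouns, verbs, adjectives, and adverbs. The names `CONTENT_POS` and `content_words` are hypothetical, and the tags for the sample sentence were written by hand for illustration; a model's tags may differ.

```python
from collections import Counter

# Coarse-grained Universal POS tags treated as "content" words in this sketch.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def content_words(tagged, keep=CONTENT_POS):
    """Return lowercased words whose coarse POS tag is in `keep`."""
    return [word.lower() for word, pos in tagged if pos in keep]


# Hand-tagged version of the sample sentence (illustrative, not model output).
tagged_sample = [
    ("Natural", "ADJ"), ("language", "NOUN"), ("processing", "NOUN"),
    ("helps", "VERB"), ("computers", "NOUN"), ("understand", "VERB"),
    ("human", "ADJ"), ("language", "NOUN"), (".", "PUNCT"),
]
words = content_words(tagged_sample)
word_counts = Counter(words)
print(words)
print(word_counts.most_common(1))
```

Passing a different `keep` set, for example `{"NOUN", "PROPN"}`, narrows the result to noun-like tokens only.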


## Summary

### Key Concepts:
1. **Named Entity Recognition (NER)**: Identifies and classifies entities
   - Common spaCy labels: PERSON, ORG (organization), GPE (countries, cities, states), LOC (other locations), DATE, MONEY, etc.
   - Useful for information extraction, question answering, knowledge graphs

2. **Part-of-Speech (POS) Tagging**: Assigns grammatical tags to words
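   - Common coarse-grained tags (`token.pos_`, Universal POS): NOUN, VERB, ADJ, ADV, PRON, DET, etc.; `token.tag_` holds the finer-grained tag (e.g. NN, VBZ in English models)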
   - Useful for parsing, text understanding, feature engineering

3. **spaCy**: Modern NLP library with pre-trained models
   - Fast and efficient
   - Built-in NER and POS tagging
   - Supports multiple languages

### Applications:
- **NER**: Information extraction, knowledge graph construction, search
- **POS Tagging**: Text preprocessing, feature engineering, parsing

**Reference:** Course 07, Unit 3: "Machine Learning for NLP" - NER and POS tagging practical content
