# Extracting Real-World Information: Named Entity Recognition
So far, we have analysed text by looking at individual words or overall themes. __Named Entity Recognition__ (NER) is different: it is the task of extracting specific, named 'entities' from text. Think of it as a textual detective looking for clues such as:
- PERSON: People's names
- ORG: Organisations, companies, institutions
- GPE: Geopolitical Entities (Countries, Cities, States)
- DATE: Dates and time periods

To do this, we will introduce a new library: spaCy. While NLTK is a great toolkit for learning and research, spaCy is designed for high-performance, production-level NLP.

## Setup: Installing and Loading spaCy

In [None]:
# Install the spaCy library
!pip install spacy

# Download the small English language model
!python -m spacy download en_core_web_sm

# Import spaCy and our usual pandas
import spacy
import pandas as pd

# Load the spaCy model. The 'nlp' object is our gateway to all of spaCy's power.
nlp = spacy.load("en_core_web_sm")

# Load our BBC dataset
url = 'https://storage.googleapis.com/dataset-uploader/bbc/bbc-text.csv'
bbc_df = pd.read_csv(url)

print("Setup complete. spaCy is ready.")
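
Re-running the install and download commands every time the notebook starts can be slow. As a minimal sketch, assuming only the Python standard library, the check below reports whether a package is already importable before you decide to install it. The helper name `is_installed` is hypothetical and is not part of spaCy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Check the two packages this notebook relies on.
# The download command installs 'en_core_web_sm' as a regular Python package,
# so it can be looked up by name just like 'spacy'.
for name in ("spacy", "en_core_web_sm"):
    status = "already installed" if is_installed(name) else "missing, run the install commands above"
    print(f"{name}: {status}")
```

If both lines report "already installed", the two `!` commands can be skipped for that session.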

## Processing Text and Accessing Entities
Using spaCy is simple. You pass a string of text to the nlp object, and it returns a processed Doc object. The Doc object contains a large amount of information about the text, including its named entities.

In [None]:
# Get the text of the first article in the 'politics' category
politics_article = bbc_df[bbc_df['category'] == 'politics']['text'].iloc[0]

# Process the text with the nlp object
doc = nlp(politics_article)

# The entities are stored in the 'ents' attribute of the doc
# Let's loop through them and print the entity text and its label
print("Entities found in the article:")
for ent in doc.ents:
    print(f"- {ent.text} ({ent.label_})")
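
A long article can produce dozens of entities, so a per-label summary is often easier to read than the full list. The sketch below counts entities by label. It relies only on each entity having a `label_` attribute, as spaCy's entities do. The function name `count_entity_labels` and the sample entities are made up for illustration so that the block runs without a model; in the notebook you would pass `doc.ents` instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_entity_labels(ents):
    """Count how many entities carry each label (e.g. PERSON, ORG)."""
    return Counter(ent.label_ for ent in ents)


# Stand-in entities (illustrative only, not real model output).
# In the notebook, replace sample_ents with doc.ents.
sample_ents = [
    SimpleNamespace(text="Tony Blair", label_="PERSON"),
    SimpleNamespace(text="Labour", label_="ORG"),
    SimpleNamespace(text="UK", label_="GPE"),
    SimpleNamespace(text="Gordon Brown", label_="PERSON"),
]

label_counts = count_entity_labels(sample_ents)

# most_common() lists the labels from most to least frequent
for label, count in label_counts.most_common():
    print(f"{label}: {count}")
```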

## Understanding and Visualising Entities
What do labels like "GPE" or "NORP" mean? We can use spacy.explain() to find out.

In [None]:
print(f"GPE: {spacy.explain('GPE')}")
print(f"NORP: {spacy.explain('NORP')}")
print(f"ORG: {spacy.explain('ORG')}")
print(f"PERSON: {spacy.explain('PERSON')}")
print(f"DATE: {spacy.explain('DATE')}")
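
Rather than looking labels up one at a time, you may want a glossary of only the labels that appear in an article. The sketch below builds one. It takes the lookup function as a parameter so that `spacy.explain` can be passed in, and it substitutes a placeholder when the lookup returns `None`, which `spacy.explain` does for terms it does not recognise. The name `build_label_glossary`, the sample entities, and the sample descriptions are all assumptions for illustration.

```python
from types import SimpleNamespace


def build_label_glossary(ents, explain):
    """Map each distinct entity label to a description.

    `explain` is any callable that takes a label and returns a string
    or None, such as spacy.explain.
    """
    labels = sorted({ent.label_ for ent in ents})
    return {label: explain(label) or "(no description available)" for label in labels}


# Stand-in data (illustrative only) so the sketch runs without spaCy.
sample_ents = [
    SimpleNamespace(text="London", label_="GPE"),
    SimpleNamespace(text="British", label_="NORP"),
    SimpleNamespace(text="Paris", label_="GPE"),
    SimpleNamespace(text="Example", label_="MADE_UP"),
]
sample_descriptions = {
    "GPE": "Countries, cities, states",
    "NORP": "Nationalities or religious or political groups",
}

# dict.get returns None for unknown labels, mimicking the fallback case
glossary = build_label_glossary(sample_ents, sample_descriptions.get)
for label, description in glossary.items():
    print(f"{label}: {description}")
```

In the notebook, the equivalent call would be `build_label_glossary(doc.ents, spacy.explain)`.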

One of spaCy's most useful features is its built-in visualiser, displacy. It can generate a colour-coded visualisation of the entities right in our notebook.

In [None]:
from spacy import displacy

# Use displacy to render the entities in our processed doc
# jupyter=True tells it to render directly in the notebook
displacy.render(doc, style="ent", jupyter=True)

## Exercise
Your task is to find all the organisations mentioned in the first 'tech' article.
1. Select the text of the first article whose category is 'tech'.
2. Process it with the nlp object to create a doc.
3. Create an empty list called tech_organisation.
4. Loop through the entities in the doc (doc.ents). If an entity's label (ent.label_) is 'ORG', append its text (ent.text) to your list.
5. Print the final list of organisation names.

In [None]:
# Your code for the exercise here