**Introduction**

Natural Language Processing (NLP) is a field of Artificial Intelligence concerned with enabling computers to interpret and generate human language. One of its key applications is Named Entity Recognition (NER): automatically identifying spans of text that name people, organizations, dates, locations, and similar items, and assigning each span a category.

In this project, spaCy's pre-trained small English pipeline (en_core_web_sm) is used to extract named entities from a short news excerpt. The goal is to turn unstructured news text into structured records (entity text, label, and position) that can be inspected and analyzed. Using spaCy's NLP pipeline, the project demonstrates how NER can be applied in domains such as news analysis, information retrieval, and data-driven decision-making.

**Installing Libraries**

In [1]:
!pip install spacy
# Note: nltk is installed here but is not used in the cells below
!pip install nltk
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


**Importing and Loading Data**

In [2]:
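import pandas as pd
import spacy
import requests                # imported for fetching pages; not used below
from bs4 import BeautifulSoup  # imported for HTML parsing; not used below

# Load the small pre-trained English pipeline, which includes the NER component
nlp = spacy.load("en_core_web_sm")
# Show up to 200 DataFrame rows before pandas truncates the display
pd.set_option("display.max_rows", 200)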

**Applying NER to a Sample Text**

In [3]:
content = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha over the cash-for-query allegations against her. Moitra was ousted from the Parliament last week after the Ethics Committee of the Lok Sabha found her guilty of jeopardising national security by sharing her parliamentary portal's login credentials with businessman Darshan Hiranandani."
doc = nlp(content)
# Print each entity with its character offsets (end is exclusive) and its label
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Trinamool Congress 0 18 ORG
Mahua Moitra 26 38 PERSON
the Supreme Court 49 66 ORG
Moitra 157 163 NORP
Parliament 184 194 ORG
last week 195 204 DATE
the Ethics Committee 211 231 ORG
Darshan Hiranandani 373 392 PERSON
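
The two numbers printed for each entity are character offsets into the original string, with the end offset exclusive, so slicing the text with them should give back the entity text. The sketch below checks this using only the first sentence of the sample and the first three rows of the output above. The helper name `slice_entities` is made up for illustration and is not part of spaCy.

```python
# First sentence of the sample text; the offsets are copied from the output above.
text = (
    "Trinamool Congress leader Mahua Moitra has moved the Supreme Court "
    "against her expulsion from the Lok Sabha over the cash-for-query "
    "allegations against her."
)

# (start_char, end_char, label) triples as printed by the notebook cell.
reported = [
    (0, 18, "ORG"),
    (26, 38, "PERSON"),
    (49, 66, "ORG"),
]


def slice_entities(source, spans):
    """Hypothetical helper: return (surface_text, label) via source[start:end]."""
    return [(source[start:end], label) for start, end, label in spans]


recovered = slice_entities(text, reported)
for surface, label in recovered:
    print(surface, "->", label)
```

Because the offsets refer to characters rather than tokens, they can be stored alongside the raw text and used later without re-running the pipeline.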


**Visualizing Entities**

In [4]:
from spacy import displacy
displacy.render(doc, style="ent")
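
displaCy draws the highlighted entities directly in the notebook. When the result has to go into a log file or a plain-text report, a rough stand-in is to wrap each entity span in brackets with its label. The sketch below does this with standard-library code only. The `annotate` function and the `[text|LABEL]` format are illustrative assumptions, not displaCy features, and the spans are the first three offsets printed in the earlier cell.

```python
def annotate(source, spans):
    """Wrap each (start, end, label) span as [text|LABEL].

    Assumes the spans do not overlap, which holds for spaCy's doc.ents.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(source[cursor:start])              # text before the entity
        pieces.append(f"[{source[start:end]}|{label}]")  # the marked entity
        cursor = end
    pieces.append(source[cursor:])                       # trailing text, if any
    return "".join(pieces)


text = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court"
spans = [(0, 18, "ORG"), (26, 38, "PERSON"), (49, 66, "ORG")]

marked = annotate(text, spans)
print(marked)
```

With a real `doc`, the spans could be built as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`.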

**Creating a DataFrame for Entities**

In [5]:
# Collect one row per entity: surface text, label, and lemmatized form
entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=['text', 'type', 'lemma'])
print(df)

                   text    type                 lemma
0    Trinamool Congress     ORG    Trinamool Congress
1          Mahua Moitra  PERSON          Mahua Moitra
2     the Supreme Court     ORG     the Supreme Court
3                Moitra    NORP                Moitra
4            Parliament     ORG            Parliament
5             last week    DATE             last week
6  the Ethics Committee     ORG  the Ethics Committee
7   Darshan Hiranandani  PERSON   Darshan Hiranandani
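
Once the entities are in tabular form, a natural next step is to summarize them by type. The sketch below copies the `text` and `type` columns from the table above into a plain list and groups them with the standard library, so it runs without spaCy or pandas. The variable names are illustrative.

```python
from collections import Counter, defaultdict

# (text, type) pairs copied from the DataFrame printed above.
rows = [
    ("Trinamool Congress", "ORG"),
    ("Mahua Moitra", "PERSON"),
    ("the Supreme Court", "ORG"),
    ("Moitra", "NORP"),
    ("Parliament", "ORG"),
    ("last week", "DATE"),
    ("the Ethics Committee", "ORG"),
    ("Darshan Hiranandani", "PERSON"),
]

# How many entities carry each label.
label_counts = Counter(label for _, label in rows)

# Which entity texts fall under each label, in order of appearance.
by_label = defaultdict(list)
for entity_text, label in rows:
    by_label[label].append(entity_text)

for label, count in label_counts.most_common():
    print(label, count, by_label[label])
```

On the DataFrame itself, `df['type'].value_counts()` should give the same counts. The grouped view also makes the questionable row easy to spot: "Moitra" appears alone under NORP even though it refers to the person already tagged as PERSON.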


**Conclusion**

In this project, we used spaCy's pre-trained Named Entity Recognition (NER) component to extract structured information from unstructured news text. On the sample passage, the model identified both persons (Mahua Moitra and Darshan Hiranandani), several organizations, and one date expression. The output was not error-free: the second mention of "Moitra" was labeled NORP rather than PERSON, and "Lok Sabha" was not tagged at all, so results from a small general-purpose model should be reviewed before they are relied on.

By converting text into structured data through entity extraction, this project shows how NLP can automate parts of information retrieval and analysis in journalism, business intelligence, and data analytics. spaCy provides an efficient pipeline for this kind of text analysis and a foundation for more advanced applications such as sentiment analysis, document classification, and knowledge graph construction.