## Step 1: Installing Libraries

In [1]:
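!pip install spacy
!pip install nltk
# pandas, requests and beautifulsoup4 are imported in Step 2
!pip install pandas requests beautifulsoup4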
!python -m spacy download en_core_web_sm

Collecting spacy
  Downloading spacy-3.8.7-cp313-cp313-win_amd64.whl.metadata (28 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy)
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl.metadata (2.8 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy)
  Downloading spacy_loggers-1.0.5-py3-none-any.whl.metadata (23 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Downloading murmurhash-1.0.13-cp313-cp313-win_amd64.whl.metadata (2.2 kB)
Collecting cymem<2.1.0,>=2.0.2 (from spacy)
  Downloading cymem-2.0.11-cp313-cp313-win_amd64.whl.metadata (8.8 kB)
Collecting preshed<3.1.0,>=3.0.2 (from spacy)
  Downloading preshed-3.0.10-cp313-cp313-win_amd64.whl.metadata (2.5 kB)
Collecting thinc<8.4.0,>=8.3.4 (from spacy)
  Downloading thinc-8.3.6-cp313-cp313-win_amd64.whl.metadata (15 kB)
Collecting wasabi<1.2.0,>=0.9.1 (from spacy)
  Downloading wasabi-1.1.3-py3-none-any.whl.metadata (28 kB)
Collecting srsly<3.0.0,>=2.4.3 (from spacy)
  Downloading srsly-2.5.1-cp313-cp313-win_amd6


[notice] A new release of pip is available: 25.1.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip







Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)




## Step 2: Importing Libraries and Loading the Model

In [2]:
import pandas as pd
import spacy
import requests 
from bs4 import BeautifulSoup

nlp = spacy.load("en_core_web_sm")
pd.set_option("display.max_rows", 200)
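
The imports above only work if every package is present in the active environment. As a minimal sketch, the standard-library `importlib.metadata` module can report which distributions are installed. The `REQUIRED` list and the helper names `installed_version` and `missing_packages` are assumptions made for illustration, not part of the original notebook.

```python
from importlib import metadata

# PyPI distribution names for the imports used in this notebook.
# Note that the `bs4` module is distributed as "beautifulsoup4".
REQUIRED = ["spacy", "pandas", "requests", "beautifulsoup4"]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_packages(dist_names):
    """Return the names from dist_names that are not installed."""
    return [name for name in dist_names if installed_version(name) is None]


for name in REQUIRED:
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

Anything reported as missing can be installed with `pip` before rerunning the import cell.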

## Step 3: Applying NER to a Sample Text

In [3]:
content = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha over the cash-for-query allegations against her. Moitra was ousted from the Parliament last week after the Ethics Committee of the Lok Sabha found her guilty of jeopardising national security by sharing her parliamentary portal's login credentials with businessman Darshan Hiranandani."

doc = nlp(content)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Trinamool Congress 0 18 ORG
Mahua Moitra 26 38 PERSON
the Supreme Court 49 66 ORG
Moitra 157 163 NORP
Parliament 184 194 ORG
last week 195 204 DATE
the Ethics Committee 211 231 ORG
Darshan Hiranandani 373 392 PERSON


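**The loop prints each entity's text, its start and end character offsets in `content`, and its predicted label.** Note that the model labels the second mention, "Moitra", as NORP rather than PERSON, so the predictions are worth spot-checking.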
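
To make the offsets concrete, the sketch below slices the text with the start and end values taken from the output above. It uses only the first sentence of the sample text and the first three entities, and it needs no spaCy model. The variable names `spans` and `recovered` are illustrative.

```python
# First sentence of the sample text, copied from the cell above.
content = ("Trinamool Congress leader Mahua Moitra has moved the Supreme Court "
           "against her expulsion from the Lok Sabha over the cash-for-query "
           "allegations against her.")

# (start_char, end_char, label) triples copied from the printed output.
spans = [(0, 18, "ORG"), (26, 38, "PERSON"), (49, 66, "ORG")]

# end_char is exclusive, so content[start:end] recovers the entity text.
recovered = [(content[start:end], label) for start, end, label in spans]
for text, label in recovered:
    print(f"{label:<7}{text}")
```

Because the end offset is exclusive, the values plug directly into Python slicing without any adjustment.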

## Step 4: Visualizing Entities

In [8]:
from spacy import displacy
displacy.render(doc, style="ent")  # Raises the ImportError below in this environment (Python 3.13)



ImportError: cannot import name 'display' from 'IPython.core.display' (C:\Users\siddharth.padigar\AppData\Local\Programs\Python\Python313\Lib\site-packages\IPython\core\display.py)
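
One possible workaround, assuming the error comes from displacy's notebook display path (which imports `display` from `IPython.core.display`), is to pass `jupyter=False` so that `displacy.render` returns the markup as a string. The sketch below builds a small document with a blank pipeline and hand-set entities so that it runs without the downloaded model. The names `nlp_blank`, `demo_doc`, and `html` are illustrative, and in the notebook the existing `doc` could be passed instead.

```python
import spacy
from spacy import displacy

# A blank English pipeline (tokenizer only) keeps the sketch self-contained;
# in the notebook, the existing `doc` could be passed to render() instead.
nlp_blank = spacy.blank("en")
text = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court."
demo_doc = nlp_blank(text)

# Entities are set by hand here, using character offsets from the Step 3 output.
demo_doc.ents = [
    demo_doc.char_span(0, 18, label="ORG"),
    demo_doc.char_span(26, 38, label="PERSON"),
    demo_doc.char_span(49, 66, label="ORG"),
]

# jupyter=False asks displacy to return the markup as a string instead of
# displaying it through IPython.
html = displacy.render(demo_doc, style="ent", jupyter=False)
print(html[:200])

# In a notebook, the string could then be shown with the public IPython API:
# from IPython.display import HTML, display
# display(HTML(html))
```

The returned string can also be written to an `.html` file and opened in a browser, which avoids the notebook display machinery altogether.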

## Step 5: Creating a Dataframe for Entities

In [7]:
entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=['text', 'type', 'lemma'])

print(df)

                   text    type                 lemma
0    Trinamool Congress     ORG    Trinamool Congress
1          Mahua Moitra  PERSON          Mahua Moitra
2     the Supreme Court     ORG     the Supreme Court
3                Moitra    NORP                Moitra
4            Parliament     ORG            Parliament
5             last week    DATE             last week
6  the Ethics Committee     ORG  the Ethics Committee
7   Darshan Hiranandani  PERSON   Darshan Hiranandani
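
Once the entities are in a DataFrame, ordinary pandas operations apply. As a small sketch, the rows below are copied from the table above (without the lemma column) and then summarised by predicted label. The names `type_counts` and `people` are illustrative.

```python
import pandas as pd

# Rows copied from the DataFrame printed above (text, type).
entities = [
    ("Trinamool Congress", "ORG"),
    ("Mahua Moitra", "PERSON"),
    ("the Supreme Court", "ORG"),
    ("Moitra", "NORP"),
    ("Parliament", "ORG"),
    ("last week", "DATE"),
    ("the Ethics Committee", "ORG"),
    ("Darshan Hiranandani", "PERSON"),
]
df = pd.DataFrame(entities, columns=["text", "type"])

# Number of entities per predicted label.
type_counts = df["type"].value_counts()
print(type_counts)

# Only the entities predicted to be people.
people = df.loc[df["type"] == "PERSON", "text"].tolist()
print(people)
```

Filtering by label this way also makes misclassifications easy to spot: the second "Moitra" mention is absent from `people` because the model tagged it as NORP.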
