
# Implementation of Named Entity Recognition (NER) in Python

Step 1: Installing Libraries

In [None]:
!pip install spacy
!pip install nltk
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 59.7 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Step 2: Importing Libraries and Loading the Model

We will use the pandas and spaCy libraries. spacy.load('en_core_web_sm') loads the small English pipeline downloaded in Step 1.

In [None]:
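import pandas as pd
import spacy
# requests and BeautifulSoup are not used in the steps below; they would
# only be needed to fetch text from a web page.
import requests
from bs4 import BeautifulSoup

# Load the small English pipeline downloaded in Step 1.
nlp = spacy.load('en_core_web_sm')
# Show up to 200 rows when a DataFrame is displayed.
pd.set_option('display.max_rows', 200)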

Step 3: Applying NER to a Sample Text

We use a short sample news passage here; you can substitute any text of your choice.

doc = nlp(content): Processes the text stored in content with the nlp pipeline and stores the resulting document object in the variable doc for further analysis.

for ent in doc.ents: Iterates over the named entities (doc.ents) identified in the processed document and prints each entity's text, label, and start and end character offsets.

In [None]:
content = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha over the cash-for-query allegations against her. Moitra was ousted from the Parliament last week after the Ethics Committee of the Lok Sabha found her guilty of jeopardising national security by sharing her parliamentary portal's login credentials with businessman Darshan Hiranandani."
doc = nlp(content)
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)

Trinamool Congress ORG 0 18
Mahua Moitra PERSON 26 38
the Supreme Court ORG 49 66
Moitra NORP 157 163
Parliament ORG 184 194
last week DATE 195 204
the Ethics Committee ORG 211 231
Darshan Hiranandani PERSON 373 392


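Each line of the output shows the entity text, its predicted label and its start and end character offsets in the text. The predictions are not always correct: here the second mention of Moitra is labelled NORP instead of PERSON, and Lok Sabha is not detected at all.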
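
To make the offsets concrete, here is a minimal sketch that runs without spaCy. It copies the rows printed above into a hypothetical list named rows and checks that slicing content with each pair of offsets returns the entity text. It assumes, as the output suggests, that the end offset is exclusive.

```python
content = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha over the cash-for-query allegations against her. Moitra was ousted from the Parliament last week after the Ethics Committee of the Lok Sabha found her guilty of jeopardising national security by sharing her parliamentary portal's login credentials with businessman Darshan Hiranandani."

# (text, label, start_char, end_char) rows copied from the output above.
# The name `rows` is illustrative and not part of spaCy.
rows = [
    ("Trinamool Congress", "ORG", 0, 18),
    ("Mahua Moitra", "PERSON", 26, 38),
    ("the Supreme Court", "ORG", 49, 66),
    ("Moitra", "NORP", 157, 163),
    ("Parliament", "ORG", 184, 194),
    ("last week", "DATE", 195, 204),
    ("the Ethics Committee", "ORG", 211, 231),
    ("Darshan Hiranandani", "PERSON", 373, 392),
]

for text, label, start, end in rows:
    # The end offset is exclusive, so a plain slice recovers the entity text.
    assert content[start:end] == text
    print(f"{label:<7} {start:>3}-{end:<3} {content[start:end]!r}")
```

Because the offsets index into the original string, they can be used to highlight or replace entities in content without searching for the text again.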



Step 4: Visualizing Entities

We will highlight the entities in the text, together with their categories, so the results are easier to read.

displacy.render(doc, style="ent"): Renders the processed doc with each named entity highlighted in the text and tagged with its category, such as person, organization or location.

In [None]:
from spacy import displacy
displacy.render(doc, style="ent")
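
Outside a notebook, displacy.render returns the markup as a string instead of displaying it. The sketch below shows one way to save the highlighted text to an HTML file. To stay runnable without the downloaded model, it assumes displaCy's manual mode (manual=True) with a hand-written dictionary. The shortened sentence, the name example and the file name entities.html are illustrative choices; the offsets are taken from the Step 3 output.

```python
from spacy import displacy

# Hand-written input in displaCy's manual format (an assumption made so the
# sketch runs without a trained pipeline). Offsets come from the Step 3 output.
example = {
    "text": "Trinamool Congress leader Mahua Moitra has moved the Supreme Court.",
    "ents": [
        {"start": 0, "end": 18, "label": "ORG"},
        {"start": 26, "end": 38, "label": "PERSON"},
        {"start": 49, "end": 66, "label": "ORG"},
    ],
    "title": None,
}

# jupyter=False forces the markup to be returned as a string;
# page=True wraps it in a complete HTML page.
html = displacy.render(example, style="ent", manual=True, jupyter=False, page=True)

# entities.html is a hypothetical output file name.
with open("entities.html", "w", encoding="utf-8") as f:
    f.write(html)
```

With the doc object from Step 3, the equivalent call would be displacy.render(doc, style="ent", jupyter=False, page=True), without the manual flag.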

Step 5: Creating a DataFrame for Entities

entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]: Creates a list of tuples, one per named entity in the processed doc object, each holding the entity's text, label (type) and lemma (base form). The list is then loaded into a pandas DataFrame with the columns Text, Type and Lemma.

In [None]:
entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=['Text', 'Type', 'Lemma'])
df

|   | Text | Type | Lemma |
|---|------|------|-------|
| 0 | Trinamool Congress | ORG | Trinamool Congress |
| 1 | Mahua Moitra | PERSON | Mahua Moitra |
| 2 | the Supreme Court | ORG | the Supreme Court |
| 3 | Moitra | NORP | Moitra |
| 4 | Parliament | ORG | Parliament |
| 5 | last week | DATE | last week |
| 6 | the Ethics Committee | ORG | the Ethics Committee |
| 7 | Darshan Hiranandani | PERSON | Darshan Hiranandani |
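
Once the entities are in tabular form, a common next step is to tally them by type or filter for one type. The sketch below does this with the standard library only, so it runs without pandas or spaCy. The (text, label) pairs are copied from the table above, and the names type_counts and people are illustrative.

```python
from collections import Counter

# (text, label) pairs copied from the Step 5 table.
entities = [
    ("Trinamool Congress", "ORG"),
    ("Mahua Moitra", "PERSON"),
    ("the Supreme Court", "ORG"),
    ("Moitra", "NORP"),
    ("Parliament", "ORG"),
    ("last week", "DATE"),
    ("the Ethics Committee", "ORG"),
    ("Darshan Hiranandani", "PERSON"),
]

# Count how many entities were found for each label.
type_counts = Counter(label for _, label in entities)
for label, count in type_counts.most_common():
    print(label, count)

# Keep only the entities labelled PERSON.
people = [text for text, label in entities if label == "PERSON"]
print(people)
```

With the DataFrame from Step 5, df['Type'].value_counts() gives the same tallies, and df[df['Type'] == 'PERSON'] selects the same rows.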
