**Import Required Library**

In [1]:
import spacy

import spacy: Imports the spaCy library, which provides natural language processing tools such as tokenization, part-of-speech tagging, and named entity recognition.

**Load spaCy Model**

In [2]:
nlp = spacy.load('en_core_web_sm')

nlp = spacy.load('en_core_web_sm'): Loads spaCy's small English pipeline. The pipeline handles tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and more. It is distributed as a separate package, so it must be installed first (for example, by running python -m spacy download en_core_web_sm); otherwise spacy.load raises an OSError.

**Define the Text to be Processed**

In [3]:
# text = "Michael Jackson came to India in 1996 for a concert in Mumbai."
text = "I am Mahiyat Tanzim, a CSE undergraduate in Jahangirnagar University, Savar, Dhaka, Bangladesh."

text: The input sentence containing the named entities to be anonymized. The commented-out line is an alternative example sentence.

**Process the Text with Spacy**

In [4]:
doc = nlp(text)

doc = nlp(text): Runs the text through the spaCy pipeline and returns a Doc object that holds the tokens along with their part-of-speech tags, named entities, and other annotations.

**Anonymize Named Entities**

In [5]:
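anonymized_tokens = []
for token in doc:
    # token.ent_type_ is the entity label, or '' if the token is not part of an entity
    if token.ent_type_ in ['PERSON', 'GPE', 'DATE', 'ORG']:
        anonymized_tokens.append(token.ent_type_)  # replace the token with its entity label
    else:
        anonymized_tokens.append(token.text)  # keep the original token text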



*   anonymized_tokens = []: Initializes an empty list to store the anonymized tokens.
*   for token in doc: Iterates over each token in the processed text (doc).
    *   if token.ent_type_ in ['PERSON', 'GPE', 'DATE', 'ORG']: Checks whether the token belongs to a named entity of type PERSON, GPE (geopolitical entity, such as a country or city), DATE, or ORG (organization). For tokens outside any entity, token.ent_type_ is an empty string, so the check fails.
        *   If true, it appends the entity type (e.g., 'PERSON', 'GPE') to anonymized_tokens, replacing the token with its label. Because the replacement happens token by token, a multi-token entity such as "Mahiyat Tanzim" becomes PERSON PERSON.
        *   If false, it appends the original token text to anonymized_tokens.
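The token loop above repeats the label for every token of a multi-token entity. One way to emit each label only once, sketched here as an alternative rather than the notebook's own method, is to check spaCy's token.ent_iob_ attribute, which is "B" on the first token of an entity and "I" on the tokens that continue it. To keep the sketch runnable without a model, the function takes plain (text, ent_iob_, ent_type_) triples; the helper name anonymize_tokens and the sample triples are hypothetical, hand-written to mirror the tokenization in the output shown in the next step.

```python
# Hypothetical helper: collapses multi-token entities using IOB tags.
LABELS = {"PERSON", "GPE", "DATE", "ORG"}


def anonymize_tokens(tokens, labels=LABELS):
    """Return token strings with each hidden entity replaced by a single label.

    tokens: iterable of (text, ent_iob, ent_type) triples, mirroring
    spaCy's token.text, token.ent_iob_ and token.ent_type_.
    """
    out = []
    for text, iob, ent_type in tokens:
        if ent_type in labels:
            if iob == "B":
                out.append(ent_type)  # first token of the entity: emit the label once
            # "I" tokens continue the same entity, so nothing is appended
        else:
            out.append(text)  # token outside any hidden entity: keep it
    return out


# Hand-written sample triples (assumed, not real model output)
sample = [
    ("I", "O", ""), ("am", "O", ""),
    ("Mahiyat", "B", "PERSON"), ("Tanzim", "I", "PERSON"), (",", "O", ""),
    ("a", "O", ""), ("CSE", "B", "ORG"), ("undergraduate", "O", ""), ("in", "O", ""),
    ("Jahangirnagar", "B", "ORG"), ("University", "I", "ORG"), (",", "O", ""),
    ("Savar", "B", "GPE"), (",", "O", ""), ("Dhaka", "B", "GPE"), (",", "O", ""),
    ("Bangladesh", "B", "GPE"), (".", "O", ""),
]
print(" ".join(anonymize_tokens(sample)))
# prints: I am PERSON , a ORG undergraduate in ORG , GPE , GPE , GPE .
```

With a real Doc, the input would be built as [(t.text, t.ent_iob_, t.ent_type_) for t in doc].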



**Combine Tokens into Anonymized Text**

In [6]:
anonymized_text = ' '.join(anonymized_tokens)
print(anonymized_text)

I am PERSON PERSON , a ORG undergraduate in ORG ORG , GPE , GPE , GPE .




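*   anonymized_text = ' '.join(anonymized_tokens): Joins the tokens in the anonymized_tokens list into a single string, with one space between each pair, to form the anonymized version of the original text. Because the original whitespace is discarded, punctuation also gets a leading space (for example, "GPE ," instead of "GPE,").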
*   print(anonymized_text): Prints the anonymized text.
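Joining with single spaces loses the original spacing. An alternative, sketched here under the assumption that the entity character offsets come from spaCy's doc.ents (each Span exposes start_char, end_char and label_), is to splice the labels directly into the original string. The helper names anonymize_spans and make_span are hypothetical, and the spans in the example are located by hand with str.index rather than produced by the model.

```python
# Hypothetical helper: splices entity labels into the original text by character offset.
LABELS = {"PERSON", "GPE", "DATE", "ORG"}


def anonymize_spans(text, spans, labels=LABELS):
    """Replace each (start_char, end_char, label) span whose label is hidden.

    With spaCy, spans could be built as
    [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].
    """
    pieces = []
    last = 0
    for start, end, label in sorted(spans):
        if label not in labels or start < last:
            continue  # leave other entities untouched; skip overlapping spans
        pieces.append(text[last:start])  # original text before this span
        pieces.append(label)
        last = end
    pieces.append(text[last:])  # remainder after the last replaced span
    return "".join(pieces)


text = "I am Mahiyat Tanzim, a CSE undergraduate in Jahangirnagar University, Savar, Dhaka, Bangladesh."


def make_span(phrase, label):
    """Build a (start, end, label) tuple for the first occurrence of phrase (sample data only)."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


spans = [
    make_span("Mahiyat Tanzim", "PERSON"), make_span("CSE", "ORG"),
    make_span("Jahangirnagar University", "ORG"), make_span("Savar", "GPE"),
    make_span("Dhaka", "GPE"), make_span("Bangladesh", "GPE"),
]
print(anonymize_spans(text, spans))
# prints: I am PERSON, a ORG undergraduate in ORG, GPE, GPE, GPE.
```

Working on character offsets keeps the original whitespace and punctuation intact and emits one label per entity. spaCy does not allow overlapping spans in doc.ents, so the overlap check is only a safeguard for spans gathered from other sources.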



**Summary**

This notebook replaces the named entities in a text (person names, geopolitical entities, dates, and organizations) with their entity type labels, one token at a time. The output is an anonymized version of the original text in which sensitive information is replaced by these labels.