
In [None]:
# Install spaCy first if it is missing:
# > pip install -U spacy
import spacy
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)



In [None]:
# Download the small English spaCy model:
!python -m spacy download en_core_web_sm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download

In [None]:
# Import SpaCy English model
import en_core_web_sm
from spacy import displacy
from collections import Counter
from pprint import pprint

In [None]:
# Load the English pipeline: tokenizer, tagger, parser and NER
# (the small model does not ship static word vectors)
nlp = en_core_web_sm.load()

In [None]:
# Example sentence; the next cell converts it to a spaCy Doc
sentence = 'Albert Einstein was born in Germany'


In [None]:
print('Original sentence:')
print(sentence)
# Run the pipeline to turn the raw text into a spaCy Doc
spacy_obj = nlp(sentence)

Original sentence:
Albert Einstein was born in Germany


In [None]:
print('Token-labels:')
for x in spacy_obj:
  print(dict(text=x, prefix=x.ent_iob_, entity_type=x.ent_type_))

Token-labels:
{'text': Albert, 'prefix': 'B', 'entity_type': 'PERSON'}
{'text': Einstein, 'prefix': 'I', 'entity_type': 'PERSON'}
{'text': was, 'prefix': 'O', 'entity_type': ''}
{'text': born, 'prefix': 'O', 'entity_type': ''}
{'text': in, 'prefix': 'O', 'entity_type': ''}
{'text': Germany, 'prefix': 'B', 'entity_type': 'GPE'}
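
The `B`/`I`/`O` prefixes above mark where an entity begins, continues, or is absent. As a rough illustration of how such tags combine into entities, here is a minimal sketch in plain Python. The helper `iob_to_spans` is hypothetical (not part of spaCy), the token triples are copied by hand from the output above, and joining words with a single space is a simplifying assumption.

```python
def iob_to_spans(tokens):
    """Group (text, iob, entity_type) triples into (entity_text, label) pairs."""
    spans = []
    words, label = [], None
    for text, iob, ent_type in tokens:
        # "I" continues the entity in progress when the type matches.
        if iob == "I" and words and ent_type == label:
            words.append(text)
            continue
        # Anything else closes the entity in progress, if there is one.
        if words:
            spans.append((" ".join(words), label))
            words, label = [], None
        # "B" (or a stray "I") opens a new entity; "O" opens nothing.
        if iob in ("B", "I"):
            words, label = [text], ent_type
    if words:
        spans.append((" ".join(words), label))
    return spans


# Token triples copied from the Token-labels output above.
tokens = [
    ("Albert", "B", "PERSON"),
    ("Einstein", "I", "PERSON"),
    ("was", "O", ""),
    ("born", "O", ""),
    ("in", "O", ""),
    ("Germany", "B", "GPE"),
]
spans = iob_to_spans(tokens)
print(spans)
```

In practice `spacy_obj.ents` already provides these groupings, so the sketch is only meant to show what the prefixes encode.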


In [None]:
print('Entities:')
for x in spacy_obj.ents:
  print(dict(text=x.text, label=x.label_, start_char=x.start_char, end_char=x.end_char))


Entities:
{'text': 'Albert Einstein', 'label': 'PERSON', 'start_char': 0, 'end_char': 15}
{'text': 'Germany', 'label': 'GPE', 'start_char': 28, 'end_char': 35}
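
The character offsets above can be read as ordinary Python slice bounds: `start_char` is inclusive and `end_char` is exclusive. A small check, using offsets copied by hand from the output above rather than computed by spaCy:

```python
sentence = "Albert Einstein was born in Germany"

# (start_char, end_char, label) triples copied from the output above.
offsets = [(0, 15, "PERSON"), (28, 35, "GPE")]

# Slicing with the offsets recovers exactly the entity text.
recovered = [(sentence[start:end], label) for start, end, label in offsets]
print(recovered)
```

This is what makes the offsets suitable for locating an entity precisely, even when the same string occurs elsewhere in the text.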


In [None]:
print('Entity counts:')
# Count how often each entity label occurs
labels = [x.label_ for x in spacy_obj.ents]
counts = Counter(labels)
print(counts)


Entity counts:
Counter({'PERSON': 1, 'GPE': 1})


In [None]:
print('Get detailed info:')
# Get more detailed info for named entities and descriptions of the corresponding labels
for ent in spacy_obj.ents:
  print(ent.text, ent.start_char, ent.end_char,
        ent.label_, spacy.explain(ent.label_))


Get detailed info:
Albert Einstein 0 15 PERSON People, including fictional
Germany 28 35 GPE Countries, cities, states


In [None]:
# Visualize Entities
displacy.render(spacy_obj, jupyter=True, style='ent')

In [None]:
''' Task
Replace all identified named entities with the assigned tag in square brackets.

For example:
  Albert Einstein was born in Germany -->
  [PERSON] was born in [GPE].

Tips: The most reliable way to de-identify the text is to use the character
indices output by spaCy to locate the named entities, as opposed to doing a
global search and replace based on the entity strings. Before starting,
formulate a plan. Is it better to start at the beginning of the document or at
the end?

'''

# https://www.nobelprize.org/prizes/physics/1903/marie-curie/biographical/
text = '''
Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, 1867, the daughter of a secondary-school teacher.
She received a general education in local schools and some scientific training from her father.
She became involved in a students’ revolutionary organization and found it prudent to leave Warsaw, then in the part of Poland dominated by Russia, for Cracow, which at that time was under Austrian rule.
In 1891, she went to Paris to continue her studies at the Sorbonne where she obtained Licenciateships in Physics and the Mathematical Sciences.
She met Pierre Curie, Professor in the School of Physics in 1894 and in the following year they were married.
She succeeded her husband as Head of the Physics Laboratory at the Sorbonne, gained her Doctor of Science degree in 1903, and following the tragic death of Pierre Curie in 1906, she took his place as Professor of General Physics in the Faculty of Sciences, the first time a woman had held this position.
She was also appointed Director of the Curie Laboratory in the Radium Institute of the University of Paris, founded in 1914.
'''
print(text)





Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, 1867, the daughter of a secondary-school teacher. 
She received a general education in local schools and some scientific training from her father. 
She became involved in a students’ revolutionary organization and found it prudent to leave Warsaw, then in the part of Poland dominated by Russia, for Cracow, which at that time was under Austrian rule. 
In 1891, she went to Paris to continue her studies at the Sorbonne where she obtained Licenciateships in Physics and the Mathematical Sciences. 
She met Pierre Curie, Professor in the School of Physics in 1894 and in the following year they were married. 
She succeeded her husband as Head of the Physics Laboratory at the Sorbonne, gained her Doctor of Science degree in 1903, and following the tragic death of Pierre Curie in 1906, she took his place as Professor of General Physics in the Faculty of Sciences, the first time a woman had held this position. 
She was also app
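
One way to explore the question posed in the task (start at the beginning or at the end?) is to try both orders on the short example sentence. The sketch below uses plain Python with hand-written `(start, end, label)` triples taken from the earlier output. The helper `replace_in_order` is hypothetical and deliberately naive: it always uses the original offsets, which is what exposes the problem.

```python
sentence = "Albert Einstein was born in Germany"
spans = [(0, 15, "PERSON"), (28, 35, "GPE")]


def replace_in_order(text, spans):
    """Apply each replacement using the ORIGINAL offsets, in the order given."""
    for start, end, label in spans:
        text = text[:start] + "[" + label + "]" + text[end:]
    return text


# Front to back: the first replacement shortens the text, so the
# offsets of the second entity no longer point at "Germany".
forward = replace_in_order(sentence, spans)

# Back to front: edits only touch text after the remaining entities,
# so their offsets stay valid.
backward = replace_in_order(sentence, sorted(spans, reverse=True))

print(forward)
print(backward)  # [PERSON] was born in [GPE]
```

Because a tag such as `[PERSON]` rarely has the same length as the text it replaces, working from the end of the document keeps every not-yet-processed offset correct.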

In [None]:
# Create entity tagger function
nlp = en_core_web_sm.load()
def tagger(sentence, nlp=nlp):
  '''Print the text and a table of its entities, render them, and return the entity spans.'''
  print('')
  print('Original sentence')
  print(sentence)

  spacy_obj = nlp(sentence)
  entities = list(spacy_obj.ents)

  # Collect one row per entity for a tabular overview
  rows = []
  for x in entities:
    rows.append(dict(text=x.text, label=x.label_, start_char=x.start_char, end_char=x.end_char))

  df = pd.DataFrame(rows)
  print(df)

  # Visualize Entities
  displacy.render(spacy_obj, jupyter=True, style='ent')

  return entities

In [None]:
# Get all entities in document
entities = tagger(text)


Original sentence

Marie Curie, née Maria Sklodowska, was born in Warsaw on November 7, 1867, the daughter of a secondary-school teacher. 
She received a general education in local schools and some scientific training from her father. 
She became involved in a students’ revolutionary organization and found it prudent to leave Warsaw, then in the part of Poland dominated by Russia, for Cracow, which at that time was under Austrian rule. 
In 1891, she went to Paris to continue her studies at the Sorbonne where she obtained Licenciateships in Physics and the Mathematical Sciences. 
She met Pierre Curie, Professor in the School of Physics in 1894 and in the following year they were married. 
She succeeded her husband as Head of the Physics Laboratory at the Sorbonne, gained her Doctor of Science degree in 1903, and following the tragic death of Pierre Curie in 1906, she took his place as Professor of General Physics in the Faculty of Sciences, the first time a woman had held this position

In [None]:
# Deidentify entity

# Start with the last entity so the offsets of earlier entities stay valid
entity = entities[-1]
print(f'entity: {entity}')

# Convert str to list of characters
text_as_list = list(text)
print(f'text_as_list: {text_as_list}')

# Replace entity with deidentification token
start = entity.start_char
end = entity.end_char
label = entity.label_
deid_token = '[' + label + ']'
text_as_list[start:end]  = list(deid_token)

# Convert list of char back to string
text_as_str = ''.join(text_as_list)
print(f'text_as_str: {text_as_str}')

entity: 1914
text_as_list: ['\n', 'M', 'a', 'r', 'i', 'e', ' ', 'C', 'u', 'r', 'i', 'e', ',', ' ', 'n', 'é', 'e', ' ', 'M', 'a', 'r', 'i', 'a', ' ', 'S', 'k', 'l', 'o', 'd', 'o', 'w', 's', 'k', 'a', ',', ' ', 'w', 'a', 's', ' ', 'b', 'o', 'r', 'n', ' ', 'i', 'n', ' ', 'W', 'a', 'r', 's', 'a', 'w', ' ', 'o', 'n', ' ', 'N', 'o', 'v', 'e', 'm', 'b', 'e', 'r', ' ', '7', ',', ' ', '1', '8', '6', '7', ',', ' ', 't', 'h', 'e', ' ', 'd', 'a', 'u', 'g', 'h', 't', 'e', 'r', ' ', 'o', 'f', ' ', 'a', ' ', 's', 'e', 'c', 'o', 'n', 'd', 'a', 'r', 'y', '-', 's', 'c', 'h', 'o', 'o', 'l', ' ', 't', 'e', 'a', 'c', 'h', 'e', 'r', '.', ' ', '\n', 'S', 'h', 'e', ' ', 'r', 'e', 'c', 'e', 'i', 'v', 'e', 'd', ' ', 'a', ' ', 'g', 'e', 'n', 'e', 'r', 'a', 'l', ' ', 'e', 'd', 'u', 'c', 'a', 't', 'i', 'o', 'n', ' ', 'i', 'n', ' ', 'l', 'o', 'c', 'a', 'l', ' ', 's', 'c', 'h', 'o', 'o', 'l', 's', ' ', 'a', 'n', 'd', ' ', 's', 'o', 'm', 'e', ' ', 's', 'c', 'i', 'e', 'n', 't', 'i', 'f', 'i', 'c', ' ', 't', 'r', 'a', 

In [None]:
entities

[Marie Curie,
 Maria Sklodowska,
 Warsaw,
 November 7, 1867,
 Warsaw,
 Poland,
 Russia,
 Cracow,
 Austrian,
 1891,
 Paris,
 Sorbonne,
 Licenciateships in Physics,
 the Mathematical Sciences,
 Pierre Curie,
 the School of Physics,
 1894,
 the following year,
 the Physics Laboratory,
 Sorbonne,
 1903,
 Pierre Curie,
 1906,
 General Physics,
 the Faculty of Sciences,
 first,
 the Curie Laboratory,
 the Radium Institute of the University of Paris,
 1914]

In [None]:
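# Process entities from last to first so earlier character offsets remain valid.
# Note: reverse() works in place, so run this cell only once.
entities.reverse()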

In [None]:
# entities[0] (the last entity in the text) was already replaced above, so skip it
for entity in entities[1:]:
  print(entity)
  start = entity.start_char
  end = entity.end_char
  label = entity.label_
  deid_token = '[' + label + ']'
  text_as_list[start:end] = list(deid_token)

# Convert list of char back to string once all replacements are done
text_as_str = ''.join(text_as_list)

the Radium Institute of the University of Paris
the Curie Laboratory
first
the Faculty of Sciences
General Physics
1906
Pierre Curie
1903
Sorbonne
the Physics Laboratory
the following year
1894
the School of Physics
Pierre Curie
the Mathematical Sciences
Licenciateships in Physics
Sorbonne
Paris
1891
Austrian
Cracow
Russia
Poland
Warsaw
November 7, 1867
Warsaw
Maria Sklodowska
Marie Curie


In [None]:
print(f'text_as_str: {text_as_str}')

text_as_str: 
[PERSON], née [PERSON], was born in [GPE] on [DATE], the daughter of a secondary-school teacher. 
She received a general education in local schools and some scientific training from her father. 
She became involved in a students’ revolutionary organization and found it prudent to leave [GPE], then in the part of [GPE] dominated by [GPE], for [GPE], which at that time was under [NORP] rule. 
In [DATE], she went to [GPE] to continue her studies at the [PRODUCT] where she obtained [ORG] and [ORG]. 
She met [PERSON], Professor in [ORG] in [DATE] and in [DATE] they were married. 
She succeeded her husband as Head of [ORG] at the [PRODUCT], gained her Doctor of Science degree in [DATE], and following the tragic death of [PERSON] in [DATE], she took his place as Professor of [ORG] in [ORG], the [ORDINAL] time a woman had held this position. 
She was also appointed Director of [ORG] in [ORG], founded in [DATE].
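
As an alternative to mutating a list of characters from the end, the same result can be built in a single forward pass by copying the text between entities and emitting a tag for each entity. The function `deidentify` below is a hypothetical helper, not part of spaCy, and the sample spans are written by hand. With spaCy they could be collected as `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`.

```python
def deidentify(text, spans):
    """Replace each (start, end, label) span in text with '[label]'."""
    pieces = []
    cursor = 0  # position in the original text copied so far
    for start, end, label in sorted(spans):
        if start < cursor:
            # Defensive: skip a span that overlaps one already handled.
            continue
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(f"[{label}]")        # the tag that replaces the entity
        cursor = end
    pieces.append(text[cursor:])           # remaining text after the last entity
    return "".join(pieces)


# Hand-written example; offsets refer to the original string.
sample = "She met Pierre Curie in 1894."
sample_spans = [(24, 28, "DATE"), (8, 20, "PERSON")]
result = deidentify(sample, sample_spans)
print(result)  # She met [PERSON] in [DATE].
```

Since every slice is taken from the original string, the offsets never need adjusting, and the input order of the spans does not matter.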

