<a href="https://colab.research.google.com/github/AtifQureshi110/BERT/blob/main/pipeline_ner.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- ner_pipeline = pipeline('ner', grouped_entities=True)

- The ner_pipeline is created with the command pipeline('ner', grouped_entities=True).
- This NER (Named Entity Recognition) pipeline identifies named entities in text, such as persons (PER), locations (LOC), organizations (ORG), and miscellaneous names (MISC). It does not tag ordinary nouns.
- When grouped_entities is set to True, the pipeline merges the tokens of a multi-word name such as "New York City" into a single entity instead of reporting each token separately.
- When grouped_entities is set to False, the pipeline returns one result per token, so the same name comes back as separate pieces such as "New", "York", and "City".
- The pipeline takes text as input and returns the detected entities, each with a label, a confidence score, and its character positions in the text.
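
To make the effect of grouped_entities concrete, here is a simplified sketch of the grouping idea in plain Python. It is not the transformers implementation: the function name group_entities and the token dictionaries are hypothetical, and real pipeline output also carries scores and character offsets.

```python
def group_entities(tokens):
    """Merge consecutive tokens of the same entity type into one entry.

    Simplified illustration only; not the transformers implementation.
    """
    groups = []
    for token in tokens:
        # "B-LOC" -> prefix "B", label "LOC"
        prefix, _, label = token["entity"].partition("-")
        continues_previous = (
            bool(groups)
            and groups[-1]["entity_group"] == label
            and prefix == "I"
        )
        if continues_previous:
            groups[-1]["word"] += " " + token["word"]
        else:
            groups.append({"entity_group": label, "word": token["word"]})
    return groups


# Hypothetical token-level predictions, similar to what an ungrouped pipeline reports
tokens = [
    {"entity": "B-LOC", "word": "New"},
    {"entity": "I-LOC", "word": "York"},
    {"entity": "I-LOC", "word": "City"},
    {"entity": "B-PER", "word": "Sarah"},
]

grouped = group_entities(tokens)
print(grouped)
```

With grouping, the three location tokens end up in one entry and the person stays separate.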

#### libraries

In [None]:
!pip install transformers



In [None]:
from transformers import pipeline

In [None]:
# Activation of the NER pipeline
ner_pipeline = pipeline('ner', grouped_entities=True)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
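
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, using the model name and revision printed in the warning: the helper name build_ner_pipeline is hypothetical, and aggregation_strategy="simple" is assumed to be the replacement that recent transformers releases suggest for grouped_entities=True.

```python
# Values copied from the warning message printed by the pipeline
MODEL_NAME = "dbmdz/bert-large-cased-finetuned-conll03-english"
MODEL_REVISION = "f2482bf"


def build_ner_pipeline():
    """Create the NER pipeline with a pinned model and revision (hypothetical helper)."""
    # Imported here so that defining the helper does not require transformers
    from transformers import pipeline

    return pipeline(
        "ner",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
        aggregation_strategy="simple",  # assumed equivalent of grouped_entities=True
    )
```

Calling build_ner_pipeline() downloads the model on first use, just like the unpinned call.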


### basic example for understanding

In [None]:
input = "Embark on an adventure with Sarah and her trusty camera, capturing breathtaking USA landscapes she likes work in NASA."
detection = ner_pipeline(input)

In [None]:
detection

[{'entity_group': 'PER',
  'score': 0.9947277,
  'word': 'Sarah',
  'start': 28,
  'end': 33},
 {'entity_group': 'LOC',
  'score': 0.99928904,
  'word': 'USA',
  'start': 80,
  'end': 83},
 {'entity_group': 'ORG',
  'score': 0.9953433,
  'word': 'NASA',
  'start': 113,
  'end': 117}]
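
The start and end values in this output are character offsets into the input string. The short check below reuses the sentence and the detection result shown above and slices the text with those offsets. The variable names text and spans are only for illustration.

```python
text = ("Embark on an adventure with Sarah and her trusty camera, "
        "capturing breathtaking USA landscapes she likes work in NASA.")

# Copied from the pipeline output shown above
detection = [
    {"entity_group": "PER", "score": 0.9947277, "word": "Sarah", "start": 28, "end": 33},
    {"entity_group": "LOC", "score": 0.99928904, "word": "USA", "start": 80, "end": 83},
    {"entity_group": "ORG", "score": 0.9953433, "word": "NASA", "start": 113, "end": 117},
]

# Slice the original text with each entity's offsets
spans = [text[ent["start"]:ent["end"]] for ent in detection]

for ent, span in zip(detection, spans):
    print(ent["entity_group"], span, span == ent["word"])
```

Each slice matches the word reported by the pipeline, so the offsets can be used to highlight or replace entities in the original text.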


#### data as a list within a list

In [None]:
# for understanding purpose
info = []
for i in detection:
  info.append((i["entity_group"], i["word"]))
info

[('PER', 'Sarah'), ('LOC', 'USA'), ('ORG', 'NASA')]

In [None]:
entity_words_info = [[i['entity_group'], i['word']] for i in detection]
entity_words_info

[['PER', 'Sarah'], ['LOC', 'USA'], ['ORG', 'NASA']]
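
If several entities share a label, it can be handy to group the words by label instead of keeping separate pairs. A small sketch, assuming the same list-of-lists shape as entity_words_info above (the name words_by_label is hypothetical):

```python
from collections import defaultdict

# Same shape as the entity_words_info list shown above
entity_words_info = [['PER', 'Sarah'], ['LOC', 'USA'], ['ORG', 'NASA']]

# Collect every word under its entity label
words_by_label = defaultdict(list)
for label, word in entity_words_info:
    words_by_label[label].append(word)

words_by_label = dict(words_by_label)
print(words_by_label)
```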

In [None]:
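import pandas as pd

# Format each [label, word] pair as "LABEL: word" and store all of them in one row
entity_data = [f"{label}: {word}" for label, word in entity_words_info]
ner_df = pd.DataFrame({"text": [input], "entity_data": [", ".join(entity_data)]})
ner_df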


Unnamed: 0,text,entity_data
0,Embark on an adventure with Sarah and her trus...,"PER: Sarah, LOC: USA, ORG: NASA"


#### data in a list

In [None]:
ner = []
for i in info:
  ner.append(i[1])
ner

['Sarah', 'USA', 'NASA']

In [None]:
entity_words = [i[1] for i in info]
entity_words

['Sarah', 'USA', 'NASA']

In [None]:
import pandas as pd
ner_df = pd.DataFrame({"text":[input], "entity_words":[", ".join(ner)]})
ner_df

Unnamed: 0,text,entity_words
0,Embark on an adventure with Sarah and her trus...,"Sarah, USA, NASA"


#### data

In [None]:
import pandas as pd

entity_info = [['PER', 'Sarah'], ['LOC', 'USA'], ['ORG', 'NASA']]
entity_words = [entry[1] for entry in entity_info]

ner_df = pd.DataFrame({"text": [input], "entity_words": [", ".join(entity_words)]})
ner_df


In [None]:
detection = [{'entity_group': 'PER', 'score': 0.9947277, 'word': 'Sarah', 'start': 28, 'end': 33},
 {'entity_group': 'LOC', 'score': 0.99928904, 'word': 'USA', 'start': 80, 'end': 83},
 {'entity_group': 'ORG', 'score': 0.9953433, 'word': 'NASA', 'start': 113, 'end': 117}]

entity_words = [entry["word"] for entry in detection]

ner_df = pd.DataFrame({"Entity Words": [", ".join(entity_words)]})
ner_df


Unnamed: 0,Entity Words
0,"Sarah, USA, NASA"


In [None]:
detection = [{'entity_group': 'PER', 'score': 0.9947277, 'word': 'Sarah', 'start': 28, 'end': 33},
 {'entity_group': 'LOC', 'score': 0.99928904, 'word': 'USA', 'start': 80, 'end': 83},
 {'entity_group': 'ORG', 'score': 0.9953433, 'word': 'NASA', 'start': 113, 'end': 117}]

info = []
for i in detection:
    info.append((i["entity_group"], i["word"]))

input_text = "Embark on an adventure with Sarah and her trusty camera, capturing breathtaking USA landscapes she likes work in NASA."

ner_df = pd.DataFrame({"text": [input_text] * len(info), "ner": info})
ner_df


Unnamed: 0,text,ner
0,Embark on an adventure with Sarah and her trus...,"(PER, Sarah)"
1,Embark on an adventure with Sarah and her trus...,"(LOC, USA)"
2,Embark on an adventure with Sarah and her trus...,"(ORG, NASA)"
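
Storing the label and the word in separate fields, rather than as one tuple, makes later filtering simpler. A possible sketch using plain dictionaries follows. The names records and org_words are hypothetical, and a list like this can be passed to pd.DataFrame to get one column per key.

```python
input_text = ("Embark on an adventure with Sarah and her trusty camera, "
              "capturing breathtaking USA landscapes she likes work in NASA.")
info = [("PER", "Sarah"), ("LOC", "USA"), ("ORG", "NASA")]

# One record per entity, with label and word kept apart
records = [
    {"text": input_text, "label": label, "word": word}
    for label, word in info
]

# Example filter: keep only the organizations
org_words = [record["word"] for record in records if record["label"] == "ORG"]
print(org_words)
```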


## NER pipeline


In [None]:
# Main NER pipeline section; the sections above are for understanding only
# Create a custom dataset
import pandas as pd
sentences = [
    "Embark on an adventure with Sarah and her trusty camera, capturing breathtaking landscapes in the UK. She loves her work at NASA.",
    "The bustling streets of New York City are always filled with energy and excitement, making it a dream destination for many.",
    "Walking through the charming streets of Paris, the Eiffel Tower stands tall as a symbol of romance and architecture.",
    "Sarah's adventurous spirit led her to explore the stunning landscapes of the USA, while she pursued her dreams at NATO.",
    "In the heart of Paris, the Eiffel Tower stands as a testament to the city's romantic charm and architectural marvels.",
    "Exploring the vast Amazon rainforest, exotic birds showcase nature's vibrant melodies and beauty.",
    "The bustling streets of London are a captivating blend of tradition and modernity, making it a favorite destination.",
    "Amidst the tranquility of the Swiss Alps, a cozy chalet offers respite and breathtaking mountain views.",]
# Create a DataFrame
df = pd.DataFrame({"Text": sentences})
# Display the DataFrame
df

In [None]:
# Activation of the NER pipeline
ner_pipeline = pipeline('ner', grouped_entities=True)

In [None]:
# Detect the named entities in each text
ner_detection = [ner_pipeline(ner_info) for ner_info in df['Text']]
ner_detection

In [None]:
# Select only the words and their categories
selected_entities = [
    [(entity['entity_group'], entity['word']) for entity in sublist]
    for sublist in ner_detection
]

selected_entities

In [None]:
# Finally, add the entity info to the custom dataset
df["entity_words"] = selected_entities

In [None]:
df
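
Once every sentence has its list of (category, word) pairs, the categories can be summarized across the whole dataset. The sketch below uses made-up values for selected_entities purely to show the shape. They are not the actual pipeline output for the sentences above.

```python
from collections import Counter

# Illustrative values only; the real list comes from the pipeline
selected_entities = [
    [("PER", "Sarah"), ("LOC", "UK"), ("ORG", "NASA")],
    [("LOC", "New York City")],
    [("LOC", "Paris"), ("LOC", "Eiffel Tower")],
]

# Count how often each category appears over all sentences
label_counts = Counter(
    label for sentence in selected_entities for label, _ in sentence
)
print(label_counts)

# Unique location names in order of first appearance
locations = list(dict.fromkeys(
    word for sentence in selected_entities for label, word in sentence if label == "LOC"
))
print(locations)
```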

#### generation

In [None]:
generator = pipeline("text-generation", model="distilgpt2")


In [None]:
prompt = "AI and data science course we will teach you how to"
entity_names = ["John", "TechCorp", "San Francisco"]

In [None]:
# Goal: have the model generate the entities itself (names of a person, a place, and an organization).
# For now the entity names are hard-coded and inserted into the prompt.
prompt = "AI and data science course we will teach you how to"
entity_names = ["John", "TechCorp", "San Francisco"]
output = generator(
    prompt + f" {entity_names[0]} will guide {entity_names[1]} in {entity_names[2]}",
    max_length=30,
    num_return_sequences=3,
    top_k=50
)
gen_text=[]
for idx, sequence in enumerate(output):
  gen_text.append(sequence)
  print(f"Generated Text {idx+1}: {sequence['generated_text']}\n")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text 1: AI and data science course we will teach you how to John will guide TechCorp in San Francisco.

Generated Text 2: AI and data science course we will teach you how to John will guide TechCorp in San Francisco.

Generated Text 3: AI and data science course we will teach you how to John will guide TechCorp in San Francisco. See more Collider at their website http://www.



In [None]:
for i in gen_text:
  print(i['generated_text'])

AI and data science course we will teach you how to John will guide TechCorp in San Francisco.
AI and data science course we will teach you how to John will guide TechCorp in San Francisco.
AI and data science course we will teach you how to John will guide TechCorp in San Francisco. See more Collider at their website http://www.


In [None]:
prompt = "AI and data science course we will teach you how to"
entity_names = ["John", "TechCorp", "San Francisco"]

output = generator(
    prompt + f" {entity_names[0]} will guide {entity_names[1]} in {entity_names[2]}",
    max_length=30,
    num_return_sequences=3,
    top_k=50
)

gen_text = []
for idx, sequence in enumerate(output):
    gen_text.append(sequence)
    print(f"Generated Text {idx+1}: {sequence['generated_text']}\n")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text 1: AI and data science course we will teach you how to John will guide TechCorp in San Francisco using free online data.

The courseederalist

Generated Text 2: AI and data science course we will teach you how to John will guide TechCorp in San Francisco. Our website will have a list of resources, links

Generated Text 3: AI and data science course we will teach you how to John will guide TechCorp in San Francisco. Our course is taught online on the web with Google
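
Each generated text starts with the full prompt, so the newly generated part can be isolated by cutting the prompt off. A minimal sketch, where the helper name continuation_only is hypothetical and the sample string is copied from "Generated Text 2" above:

```python
prompt = "AI and data science course we will teach you how to"
entity_names = ["John", "TechCorp", "San Francisco"]
full_prompt = prompt + f" {entity_names[0]} will guide {entity_names[1]} in {entity_names[2]}"


def continuation_only(generated_text, full_prompt):
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(full_prompt):
        return generated_text[len(full_prompt):]
    return generated_text


# Copied from "Generated Text 2" in the output above
generated = ("AI and data science course we will teach you how to John will guide "
             "TechCorp in San Francisco. Our website will have a list of resources, links")

new_text = continuation_only(generated, full_prompt)
print(repr(new_text))
```

The text-generation pipeline also accepts return_full_text=False, which should give a similar result without post-processing.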



In [None]:
prompt = "AI and data science course we will teach you how to"

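# Try to generate entity names using the text generation pipeline.
# Note: distilgpt2 is a causal language model with no mask token, so "<mask>" is
# treated as ordinary text and is not filled in (see the generated output below).
entity_name_generation_prompt = "Generate entity names: <mask> <mask> <mask>"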
entity_name_output = generator(
    entity_name_generation_prompt,
    max_length=20,
    num_return_sequences=3,
    top_k=50
)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:

# Extract generated entity names
entity_names = [sequence['generated_text'].replace("Generate entity names: ", "") for sequence in entity_name_output]

# Use the generated entity names in the main prompt
main_output = generator(
    prompt + f" {entity_names[0]} will guide {entity_names[1]} in {entity_names[2]}",
    max_length=30,
    num_return_sequences=3,
    top_k=50
)

gen_text = []
for idx, sequence in enumerate(main_output):
    gen_text.append(sequence)
    print(f"Generated Text {idx+1}: {sequence['generated_text']}\n")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text 1: AI and data science course we will teach you how to <mask> <mask> <mask> <mask> <mask> will guide <mask> <mask> <mask> </mask> </filter> in <mask> <mask> <mask> <mask> <mask> <

Generated Text 2: AI and data science course we will teach you how to <mask> <mask> <mask> <mask> <mask> will guide <mask> <mask> <mask> </mask> </filter> in <mask> <mask> <mask> <mask> <mask> <

Generated Text 3: AI and data science course we will teach you how to <mask> <mask> <mask> <mask> <mask> will guide <mask> <mask> <mask> </mask> </filter> in <mask> <mask> <mask> <mask> <mask> <
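
The output above shows the "<mask>" placeholders being copied into the final prompt instead of being replaced by names. One way to catch this before building the main prompt is to clean each generated candidate and check whether anything is left. This is a sketch under assumptions: the helper name extract_candidate and the sample strings are hypothetical.

```python
import re

INSTRUCTION = "Generate entity names: "


def extract_candidate(generated_text, instruction=INSTRUCTION):
    """Strip the instruction and literal <mask> placeholders from generated text."""
    if generated_text.startswith(instruction):
        generated_text = generated_text[len(instruction):]
    # Remove <mask> and </mask> markers, then collapse the leftover whitespace
    without_masks = re.sub(r"</?mask>", " ", generated_text)
    return " ".join(without_masks.split())


# Hypothetical generated strings
only_masks = "Generate entity names: <mask> <mask> <mask> <mask> <mask>"
with_name = "Generate entity names: John <mask> Smith"

empty_candidate = extract_candidate(only_masks)
name_candidate = extract_candidate(with_name)
print(repr(empty_candidate), repr(name_candidate))
```

An empty result signals that the model produced no usable name, so the main prompt should not be built from it.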



In [None]:


prompt = "AI and data science course we will teach you how to"

# Generate entity names using the text generation pipeline
entity_name_generation_prompt = "Generate entity names: <mask> <mask> <mask>"
entity_name_output = generator(
    entity_name_generation_prompt,
    max_length=20,
    num_return_sequences=3,
    top_k=50
)

# Extract generated entity names
entity_names = [sequence['generated_text'].replace("Generate entity names: ", "") for sequence in entity_name_output]

# Use the generated entity names in the main prompt
main_output = generator(
    prompt + f" {entity_names[0]} will guide {entity_names[1]} in {entity_names[2]}",
    max_length=30,
    num_return_sequences=3,
    top_k=50,
    max_new_tokens=100  # takes precedence over max_length when both are set; increase as needed
)

gen_text = []
for idx, sequence in enumerate(main_output):
    gen_text.append(sequence)
    print(f"Generated Text {idx+1}: {sequence['generated_text']}\n")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=100) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
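
The warning above comes from setting two length limits at once. max_length counts the prompt tokens plus the generated tokens, while max_new_tokens counts only the generated ones. The sketch below spells out that arithmetic. The helper name new_token_budget and the prompt length of 12 tokens are hypothetical.

```python
def new_token_budget(prompt_tokens, max_length=None, max_new_tokens=None):
    """Return how many tokens may be generated under the given limits."""
    if max_new_tokens is not None:
        # max_new_tokens takes precedence, as the warning states
        return max_new_tokens
    if max_length is not None:
        # max_length includes the prompt, so subtract its length
        return max(max_length - prompt_tokens, 0)
    return None


# Hypothetical prompt length of 12 tokens
only_max_length = new_token_budget(12, max_length=30)
both_limits = new_token_budget(12, max_length=30, max_new_tokens=100)
print(only_max_length, both_limits)
```

With only max_length=30 the model has room for 18 new tokens here; once max_new_tokens=100 is added, that value decides the budget.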


Generated Text 1: AI and data science course we will teach you how to <mask> <mask> <mask> <mask> <mask> will guide <mask> <mask> <mask> <mask> <mask> in <mask> <mask> <mask> <mask> <mask> <mask> In class we use the "mask" in class to define an implementation, but you must have enough knowledge from previous modules in order to be able to understand other module types. If we can help you understand other modules we will be able to develop new functions that represent the entire module. Once a module can take place in class, we will add them directly into the package in your class.



In this class we will define a new class which uses the "mask"

Generated Text 2: AI and data science course we will teach you how to <mask> <mask> <mask> <mask> <mask> will guide <mask> <mask> <mask> <mask> <mask> in <mask> <mask> <mask> <mask> <mask> </ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> <ul> 