# Local Inference

This notebook demonstrates how to run inference with a fine-tuned named entity recognition (NER) model on example sentences.

It includes:
- Loading the fine-tuned model.
- Performing Named Entity Recognition (NER) on example sentences.
- Cleaning the extracted entities and displaying them as pandas DataFrames.

### Changing to the main directory

In [None]:
%cd ..

### Importing Necessary Libraries

In [2]:
from transformers import pipeline
from transformers import BertTokenizerFast, DistilBertTokenizerFast 
from transformers import AutoModelForTokenClassification 

import pandas as pd 

from utilities import OUTPUT_MODEL
from utilities import clean_and_group_entities

### Load Fine-tuned Model

In [3]:
model_fine_tuned = AutoModelForTokenClassification.from_pretrained(OUTPUT_MODEL)
# The checkpoint's tokenizer config records a BERT tokenizer ('BertTokenizer'), so load it
# with the matching fast class; loading it as DistilBertTokenizerFast triggers a class-mismatch warning
tokenizer = BertTokenizerFast.from_pretrained(OUTPUT_MODEL)


### Running Inference

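Run the NER pipeline on an example sentence. With `aggregation_strategy="simple"`, each result contains the predicted entity type (`entity_group`), a confidence `score`, the matched text (`word`), and the character offsets (`start`, `end`) of the entity in the input.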

In [4]:
nlp = pipeline("ner", model=model_fine_tuned, tokenizer=tokenizer, aggregation_strategy="simple")

example = """Texas Instruments LM358 op-amp requires dual power supply."""

ner_results = nlp(example)

print(ner_results)

Device set to use cuda:0


[{'entity_group': 'VENDOR', 'score': 0.8711902, 'word': 'texas instruments', 'start': 0, 'end': 17}, {'entity_group': 'PRODUCT', 'score': 0.8888053, 'word': 'l', 'start': 18, 'end': 19}, {'entity_group': 'PRODUCT', 'score': 0.9007871, 'word': '##m', 'start': 19, 'end': 20}, {'entity_group': 'PRODUCT', 'score': 0.88946533, 'word': '##35', 'start': 20, 'end': 22}, {'entity_group': 'PRODUCT', 'score': 0.87212956, 'word': '##8', 'start': 22, 'end': 23}, {'entity_group': 'COMPONENT', 'score': 0.8891128, 'word': 'op', 'start': 24, 'end': 26}, {'entity_group': 'COMPONENT', 'score': 0.65238696, 'word': '- amp', 'start': 26, 'end': 30}]
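In the raw output above, `LM358` comes back as four separate `PRODUCT` entities (`l`, `##m`, `##35`, `##8`). As we understand it, the `simple` aggregation strategy in `transformers` appends a token to the current group only when the token has the same entity type and is not tagged `B-`; a `B-` tag always opens a new group. One plausible explanation is therefore that the model tagged each subword `B-PRODUCT`. The sketch below re-creates that grouping rule; `split_tag` and `group_simple` are hypothetical names, and the token-level tags are invented for illustration because the notebook only shows aggregated output. `transformers` also offers word-level strategies (`"first"`, `"average"`, `"max"`, fast tokenizers only) that assign one label per word and may keep `LM358` together.

```python
from typing import Dict, List, Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a BIO tag into (prefix, entity type); tags without a prefix count as 'I'."""
    if tag.startswith("B-"):
        return "B", tag[2:]
    if tag.startswith("I-"):
        return "I", tag[2:]
    return "I", tag  # e.g. "O"


def group_simple(tokens: List[Dict]) -> List[Dict]:
    """Group token predictions: a token extends the current group only if it has
    the same entity type and is not tagged 'B-'. Groups labelled 'O' are dropped."""
    groups: List[Dict] = []
    for tok in tokens:
        prefix, label = split_tag(tok["entity"])
        if groups and groups[-1]["entity_group"] == label and prefix != "B":
            groups[-1]["tokens"].append(tok["word"])
            groups[-1]["end"] = tok["end"]
        else:
            groups.append({"entity_group": label, "tokens": [tok["word"]],
                           "start": tok["start"], "end": tok["end"]})
    return [g for g in groups if g["entity_group"] != "O"]


# Hypothetical token-level predictions for "LM358" (offsets taken from the output above)
pieces = [("l", 18, 19), ("##m", 19, 20), ("##35", 20, 22), ("##8", 22, 23)]
all_b = [{"entity": "B-PRODUCT", "word": w, "start": s, "end": e} for w, s, e in pieces]
bio = [{"entity": ("B-" if i == 0 else "I-") + "PRODUCT", "word": w, "start": s, "end": e}
       for i, (w, s, e) in enumerate(pieces)]

print(len(group_simple(all_b)))  # 4: every B- tag opens a new group
print(len(group_simple(bio)))    # 1: I- tags extend the group opened by B-
```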


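The raw output still splits words into fragments: `LM358` comes back as `l`, `##m`, `##35`, `##8`, and `op-amp` as `op` and `- amp`. The `clean_and_group_entities` helper from `utilities` merges subword tokens back into whole words and merges overlapping entities, which makes the results easier to read.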
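The `utilities` module is not shown here, so the exact rules of `clean_and_group_entities` are unknown. The cleaned tables later in this notebook are consistent with merging same-type entities whose character spans touch (one entity's `start` equals the previous entity's `end`) and keeping the lowest score among the merged pieces. Below is a minimal sketch of that idea under those assumptions; `merge_subword_entities` is a hypothetical name, not the project's helper, and it does not handle overlapping spans. Unlike the cleaned tables, it takes the entity text from the original sentence through the offsets, so it keeps the original casing (`2.4GHz` rather than `2.4ghz`).

```python
from typing import Any, Dict, List


def merge_subword_entities(entities: List[Dict[str, Any]], text: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for clean_and_group_entities.

    Merges consecutive entities of the same type whose character spans touch,
    keeps the lowest score of the merged pieces, and slices the entity text
    from the original sentence using the start/end offsets.
    """
    merged: List[Dict[str, Any]] = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if prev is not None and prev["entity_group"] == ent["entity_group"] and ent["start"] == prev["end"]:
            prev["end"] = ent["end"]
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append({key: ent[key] for key in ("entity_group", "start", "end", "score")})
    for ent in merged:
        ent["word"] = text[ent["start"]:ent["end"]]
    return merged


# Raw pipeline output for example_2, copied from the table further down
text = "Use the oscilloscope to measure the 2.4GHz signal."
raw = [
    {"entity_group": "EQUIPMENT", "score": 0.993913, "word": "os", "start": 8, "end": 10},
    {"entity_group": "EQUIPMENT", "score": 0.994249, "word": "##ci", "start": 10, "end": 12},
    {"entity_group": "EQUIPMENT", "score": 0.993899, "word": "##llo", "start": 12, "end": 15},
    {"entity_group": "EQUIPMENT", "score": 0.993257, "word": "##scope", "start": 15, "end": 20},
    {"entity_group": "DESIGN_PARAM", "score": 0.988858, "word": "2", "start": 36, "end": 37},
    {"entity_group": "DESIGN_PARAM", "score": 0.990325, "word": ".", "start": 37, "end": 38},
    {"entity_group": "DESIGN_PARAM", "score": 0.989654, "word": "4", "start": 38, "end": 39},
    {"entity_group": "DESIGN_PARAM", "score": 0.991569, "word": "##gh", "start": 39, "end": 41},
    {"entity_group": "DESIGN_PARAM", "score": 0.991421, "word": "##z", "start": 41, "end": 42},
]
for ent in merge_subword_entities(raw, text):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"], ent["score"])
# EQUIPMENT oscilloscope 8 20 0.993257
# DESIGN_PARAM 2.4GHz 36 42 0.988858
```

On the first example, the same call (`merge_subword_entities(ner_results, example)`) would join `l`/`##m`/`##35`/`##8` into `LM358` and `op`/`- amp` into `op-amp`, since both sets of fragments have touching offsets.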

In [17]:
# Clean and group entities
cleaned_results = clean_and_group_entities(ner_results)

# Convert to DataFrame for better visualization
df = pd.DataFrame(cleaned_results)
df



In [18]:
example_2 = "Use the oscilloscope to measure the 2.4GHz signal."

In [19]:
ner_results = nlp(example_2)
df = pd.DataFrame(ner_results)
df

| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | EQUIPMENT | 0.993913 | os | 8 | 10 |
| 1 | EQUIPMENT | 0.994249 | ##ci | 10 | 12 |
| 2 | EQUIPMENT | 0.993899 | ##llo | 12 | 15 |
| 3 | EQUIPMENT | 0.993257 | ##scope | 15 | 20 |
| 4 | DESIGN_PARAM | 0.988858 | 2 | 36 | 37 |
| 5 | DESIGN_PARAM | 0.990325 | . | 37 | 38 |
| 6 | DESIGN_PARAM | 0.989654 | 4 | 38 | 39 |
| 7 | DESIGN_PARAM | 0.991569 | ##gh | 39 | 41 |
| 8 | DESIGN_PARAM | 0.991421 | ##z | 41 | 42 |


In [20]:
# Clean and group entities
cleaned_results = clean_and_group_entities(ner_results)

# Convert to DataFrame for better visualization
df = pd.DataFrame(cleaned_results)
df

| | entity_group | word | start | end | score |
|---|---|---|---|---|---|
| 0 | EQUIPMENT | oscilloscope | 8 | 20 | 0.993257 |
| 1 | DESIGN_PARAM | 2.4ghz | 36 | 42 | 0.988858 |


In [21]:
test_samples = [
    "During the testing phase, the Tektronix oscilloscope was used to analyze the 2.4GHz signal from the ESP32 microcontroller through a high-bandwidth SMA connector.",
    "The engineering team used MATLAB and LTSpice software to simulate the behavior of a complex RF circuit containing multiple BC547 transistors and 0.1μF ceramic capacitors operating at 915MHz.",
    "The PCB manufacturing process at Texas Instruments follows strict IPC-A-610 and RoHS standards, using high-grade FR4 substrate and lead-free solder for mounting surface-mount devices including 0402 resistors.",
    "Using the Keysight spectrum analyzer connected to a 50Ω impedance matched antenna, we measured the signal strength of the nRF52840 Bluetooth module transmitting at -20dBm through a quarter-wave monopole antenna.",
    "The Xilinx Vivado development suite was used to program the Artix-7 FPGA on a custom development board, which included a 100MHz crystal oscillator and multiple ADC channels for sensor interfacing via I2C protocol.",
]

In [22]:
ner_results = nlp(test_samples[0])
cleaned_results = clean_and_group_entities(ner_results)
pd.DataFrame(cleaned_results)

| | entity_group | word | start | end | score |
|---|---|---|---|---|---|
| 0 | VENDOR | tektronix | 30 | 39 | 0.993652 |
| 1 | EQUIPMENT | oscilloscope | 40 | 52 | 0.988301 |
| 2 | DESIGN_PARAM | 2.4ghz | 77 | 83 | 0.991594 |
| 3 | PRODUCT | esp | 100 | 103 | 0.679144 |
| 4 | TECHNOLOGY | microcontroller | 106 | 121 | 0.893514 |
| 5 | COMPONENT | sma connector | 147 | 160 | 0.671683 |
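Here `esp` (start=100, end=103) covers only part of `ESP32`; similar partial spans appear in later samples (`blue` for Bluetooth, `art` for Artix-7, and `os`/`llator` for oscillator). One optional post-processing step, not part of the notebook's `utilities`, is to widen each span to the surrounding whitespace-delimited word in the input text. The sketch below does that under the assumption that entities never need to stop mid-word; `expand_to_word` and `expand_entities` are hypothetical names. It strips trailing sentence punctuation picked up by the widening and collapses fragments of the same type that widen to the same word. It only fixes span boundaries, not labels: `Artix-7` would still be tagged `VENDOR`.

```python
from typing import Any, Dict, List, Tuple

TRAILING_PUNCT = ".,;:!?"


def expand_to_word(text: str, start: int, end: int) -> Tuple[int, int]:
    """Widen [start, end) to the enclosing whitespace-delimited word."""
    left, right = start, end
    while left > 0 and not text[left - 1].isspace():
        left -= 1
    while right < len(text) and not text[right].isspace():
        right += 1
    # Drop sentence punctuation added by the widening (e.g. "board,"),
    # but never shrink below the original span
    while right > end and text[right - 1] in TRAILING_PUNCT:
        right -= 1
    return left, right


def expand_entities(entities: List[Dict[str, Any]], text: str) -> List[Dict[str, Any]]:
    """Expand every entity to full words; fragments of the same type that land
    on the same word are collapsed into one entry keeping the lowest score."""
    by_span: Dict[Tuple[str, int, int], Dict[str, Any]] = {}
    for ent in entities:
        start, end = expand_to_word(text, ent["start"], ent["end"])
        key = (ent["entity_group"], start, end)
        if key in by_span:
            by_span[key]["score"] = min(by_span[key]["score"], ent["score"])
        else:
            by_span[key] = {"entity_group": ent["entity_group"], "word": text[start:end],
                            "start": start, "end": end, "score": ent["score"]}
    return sorted(by_span.values(), key=lambda e: e["start"])


snippet = "the ESP32 microcontroller"
print(expand_to_word(snippet, 4, 7))  # (4, 9), i.e. "ESP32"
```

In the notebook this could be applied after cleaning, e.g. `expand_entities(cleaned_results, test_samples[0])`, assuming `cleaned_results` is a list of dicts with the columns shown above.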


In [23]:
ner_results = nlp(test_samples[1])
cleaned_results = clean_and_group_entities(ner_results)
pd.DataFrame(cleaned_results)

| | entity_group | word | start | end | score |
|---|---|---|---|---|---|
| 0 | SOFTWARE | matlab | 26 | 32 | 0.934292 |
| 1 | SOFTWARE | ltspice | 37 | 44 | 0.890051 |
| 2 | TECHNOLOGY | rf | 92 | 94 | 0.792354 |
| 3 | COMPONENT | transistors | 129 | 140 | 0.842874 |
| 4 | DESIGN_PARAM | 0.1μf | 145 | 150 | 0.975451 |
| 5 | MATERIAL | ceramic | 151 | 158 | 0.846612 |
| 6 | COMPONENT | capacitors | 159 | 169 | 0.692648 |
| 7 | DESIGN_PARAM | 915mhz | 183 | 189 | 0.965939 |


In [24]:
ner_results = nlp(test_samples[2])
cleaned_results = clean_and_group_entities(ner_results)
pd.DataFrame(cleaned_results)

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


| | entity_group | word | start | end | score |
|---|---|---|---|---|---|
| 0 | TECHNOLOGY | pcb | 4 | 7 | 0.776639 |
| 1 | VENDOR | texas instruments | 33 | 50 | 0.853849 |
| 2 | MATERIAL | lead | 131 | 135 | 0.839052 |
| 3 | COMPONENT | 0402 | 193 | 197 | 0.858203 |
| 4 | COMPONENT | resistors | 198 | 207 | 0.677335 |
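The warning printed above is emitted by `transformers` after repeated single-input pipeline calls on a GPU. Pipelines also accept a list of inputs together with a `batch_size` argument, so all five test samples can be processed in one call instead of one cell per sample. The helper below is a minimal sketch of that; `collect_entities` and its `clean` parameter are hypothetical names, and the default `batch_size=8` is an arbitrary starting point to tune against GPU memory. It works with any callable that follows the pipeline's list-in, list-of-lists-out convention.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]


def collect_entities(
    nlp: Callable[..., List[List[Entity]]],
    texts: List[str],
    batch_size: int = 8,
    clean: Callable[[List[Entity]], List[Entity]] = lambda ents: ents,
) -> List[Entity]:
    """Run NER over all texts in one pipeline call and flatten the results.

    `clean` is applied to each sample's entity list (e.g. clean_and_group_entities),
    and every returned row is tagged with the index of the sample it came from.
    """
    per_sample = nlp(texts, batch_size=batch_size)
    rows: List[Entity] = []
    for sample_idx, entities in enumerate(per_sample):
        for ent in clean(entities):
            rows.append({"sample": sample_idx, **ent})
    return rows


# Usage in the notebook:
# df = pd.DataFrame(collect_entities(nlp, test_samples, clean=clean_and_group_entities))
```

Keeping the sample index as a column lets one DataFrame hold the results for every sentence, which can then be filtered with `df[df["sample"] == 0]` and so on.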


In [25]:
ner_results = nlp(test_samples[3])
cleaned_results = clean_and_group_entities(ner_results)
pd.DataFrame(cleaned_results)

| | entity_group | word | start | end | score |
|---|---|---|---|---|---|
| 0 | VENDOR | keysight | 10 | 18 | 0.938339 |
| 1 | EQUIPMENT | spectrum analyzer | 19 | 36 | 0.889039 |
| 2 | DESIGN_PARAM | 50ω | 52 | 55 | 0.990709 |
| 3 | PRODUCT | nrf52840 | 122 | 130 | 0.923482 |
| 4 | PRODUCT | blue | 131 | 135 | 0.615939 |
| 5 | DESIGN_PARAM | 20dbm | 165 | 170 | 0.669012 |
| 6 | COMPONENT | monopole | 194 | 202 | 0.810546 |


In [26]:
ner_results = nlp(test_samples[4])
cleaned_results = clean_and_group_entities(ner_results)
pd.DataFrame(cleaned_results)

| | entity_group | word | start | end | score |
|---|---|---|---|---|---|
| 0 | VENDOR | xilinx | 4 | 10 | 0.956922 |
| 1 | VENDOR | art | 60 | 63 | 0.87337 |
| 2 | TECHNOLOGY | fpga | 68 | 72 | 0.93157 |
| 3 | DESIGN_PARAM | 100mhz | 121 | 127 | 0.990575 |
| 4 | COMPONENT | crystal | 128 | 135 | 0.767239 |
| 5 | COMPONENT | os | 136 | 138 | 0.627278 |
| 6 | COMPONENT | llator | 140 | 146 | 0.699194 |
| 7 | COMPONENT | adc | 160 | 163 | 0.847222 |
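Several of the lowest-scoring entities across the test samples are fragments, for example `os` (0.627) and `llator` (0.699) here, `blue` (0.616) for Bluetooth, and `esp` (0.679) for ESP32. A simple optional filter is to drop predictions below a confidence threshold before summarizing. The sketch below groups the remaining words by entity type; `summarize_entities` is a hypothetical helper and `min_score=0.7` is an arbitrary choice, not a tuned value. The threshold trades recall for precision: correct entities such as `capacitors` (0.693) and `resistors` (0.677) also fall below it, and a confident mislabel such as `art` (0.873, from Artix-7) is kept as `VENDOR`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize_entities(entities: List[Dict[str, Any]], min_score: float = 0.7) -> Dict[str, List[str]]:
    """Group entity words by type, skipping predictions scored below min_score."""
    summary: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            summary[ent["entity_group"]].append(ent["word"])
    return dict(summary)


# Rows copied from the cleaned output for test_samples[4] above
rows = [
    {"entity_group": "VENDOR", "word": "xilinx", "score": 0.956922},
    {"entity_group": "VENDOR", "word": "art", "score": 0.87337},
    {"entity_group": "TECHNOLOGY", "word": "fpga", "score": 0.93157},
    {"entity_group": "DESIGN_PARAM", "word": "100mhz", "score": 0.990575},
    {"entity_group": "COMPONENT", "word": "crystal", "score": 0.767239},
    {"entity_group": "COMPONENT", "word": "os", "score": 0.627278},
    {"entity_group": "COMPONENT", "word": "llator", "score": 0.699194},
    {"entity_group": "COMPONENT", "word": "adc", "score": 0.847222},
]
print(summarize_entities(rows))
# {'VENDOR': ['xilinx', 'art'], 'TECHNOLOGY': ['fpga'], 'DESIGN_PARAM': ['100mhz'], 'COMPONENT': ['crystal', 'adc']}
```

In the notebook, `summarize_entities(cleaned_results)` would apply the same filter directly, assuming `cleaned_results` is a list of dicts with `entity_group`, `word`, and `score` keys.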
