
Syntactic Parser and Semantic Parser for Translation of Word Forms

Introduction

The aim is to develop a syntactic parser and a semantic parser that support the translation of word forms. Syntactic parsing identifies parts of speech and groups them into phrases; semantic parsing derives meaning from the text by identifying entities and the relationships between them.

Syntactic Parsing
Syntactic parsing analyzes the grammatical structure of a sentence to determine how its words relate to one another. It identifies parts of speech (POS) and arranges words into a parse tree. The steps used here are:

Tokenization: Break text into tokens (words or phrases).
POS Tagging: Assign grammatical categories (tags) to each token (e.g., noun, verb).
Chunking: Group tokens into meaningful phrases (e.g., noun phrases, verb phrases).
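
The three steps above can be sketched end to end with the standard library alone. This is a minimal illustration, not the NLTK implementation: the lexicon `TOY_LEXICON` and the helper names `tokenize`, `tag`, and `chunk_noun_phrases` are hypothetical, and the tagger is a plain dictionary lookup rather than a trained model.

```python
import re

# Hypothetical toy lexicon: word -> Penn Treebank-style tag (illustration only).
TOY_LEXICON = {
    "the": "DT", "a": "DT", "this": "DT",
    "quick": "JJ", "brown": "JJ", "lazy": "JJ",
    "fox": "NN", "dog": "NN",
    "jumps": "VBZ", "over": "IN",
}

def tokenize(text):
    """Tokenization: split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def tag(tokens):
    """POS tagging: look each token up in the toy lexicon; unknown tokens get 'UNK'."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]

def chunk_noun_phrases(tagged):
    """Chunking: group tokens matching DT? JJ* NN into noun phrases."""
    # Encode the tag sequence as a string such as "<DT><JJ><NN>" so that an
    # ordinary regular expression can be matched against it.
    tag_string = "".join(f"<{t}>" for _, t in tagged)
    chunks = []
    for m in re.finditer(r"(?:<DT>)?(?:<JJ>)*<NN>", tag_string):
        # Convert character offsets back to token indices by counting '<'.
        start = tag_string[:m.start()].count("<")
        end = start + m.group().count("<")
        chunks.append(" ".join(word for word, _ in tagged[start:end]))
    return chunks

tokens = tokenize("The quick brown fox jumps over the lazy dog.")
tagged = tag(tokens)
noun_phrases = chunk_noun_phrases(tagged)
print(noun_phrases)  # ['The quick brown fox', 'the lazy dog']
```

Matching a regular expression over the tag sequence is the same idea that a tag-pattern chunk grammar expresses; a real tagger replaces the dictionary lookup with a statistical model.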

Semantic Parsing
Semantic parsing derives the meaning of a text by interpreting the relationships and roles of the words within a sentence. Typical subtasks include:

Named Entity Recognition (NER): Identify and classify entities (e.g., names, dates, locations).
Relation Extraction: Determine the relationships between entities.
Semantic Role Labeling: Assign roles to words (e.g., agent, object).
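
As a rough, rule-based sketch of these three subtasks, the example below uses only the standard library. Everything in it is an assumption made for illustration: the gazetteer `GAZETTEER`, the single hand-written "was born in" pattern, and the informal role names `theme` and `location`. A statistical parser would replace each of these lookups.

```python
import re

# Hypothetical gazetteer: surface form -> entity label (illustration only).
GAZETTEER = {"Barack Obama": "PERSON", "Hawaii": "GPE", "Honolulu": "GPE"}

def find_entities(text):
    """NER by lookup: return (surface form, label, start offset) per hit."""
    hits = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            hits.append((name, label, m.start()))
    return sorted(hits, key=lambda hit: hit[2])

def extract_born_in(text):
    """Relation extraction with one hand-written pattern: 'X was born in Y'."""
    labels = {name: label for name, label, _ in find_entities(text)}
    relations = []
    for m in re.finditer(r"(.+?) was born in (\w+)", text):
        subj, place = m.group(1), m.group(2)
        # Keep the match only if both arguments are known entities of the right type.
        if labels.get(subj) == "PERSON" and labels.get(place) == "GPE":
            relations.append((subj, "born_in", place))
    return relations

def label_roles(relation):
    """Role labeling: attach informal role names to a 'born_in' relation."""
    subj, _, place = relation
    return {"predicate": "born", "theme": subj, "location": place}

sentence = "Barack Obama was born in Hawaii."
entities = find_entities(sentence)
relations = extract_born_in(sentence)
roles = [label_roles(r) for r in relations]
print(entities)
print(relations)
print(roles)
```

The pattern-based approach only covers relations it has an explicit rule for, which is why dependency parses are usually preferred once more than a handful of relation types are needed.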

**Syntactic Parser Implementation**

In [1]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import RegexpParser

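# Ensure you have the necessary NLTK resources downloaded.
# Note: recent NLTK releases (3.9 and later) look for 'punkt_tab' and
# 'averaged_perceptron_tagger_eng' instead; download those if a LookupError is raised.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Tokenize the text into words and tag each token with its part of speech
def tokenize_and_tag(text):
    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)
    return tagged_tokens

# Chunk the tagged tokens into noun phrases
def chunk_text(tagged_tokens):
    # NP = optional determiner, any number of adjectives, then one singular noun (NN).
    # Plural nouns (NNS) are not matched, and a noun-noun compound such as
    # "permission information" is split into two separate chunks.
    pattern = 'NP: {<DT>?<JJ>*<NN>}'
    cp = RegexpParser(pattern)  # Create a chunk parser
    chunked_tree = cp.parse(tagged_tokens)
    return chunked_tree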

# Extract chunks based on the specified pattern
def extract_chunks(chunked_tree):
    chunks = []
    for subtree in chunked_tree.subtrees(filter=lambda t: t.label() == 'NP'):
        chunks.append(' '.join(word for word, pos in subtree.leaves()))
    return chunks

# Example usage
text = "This domain uses prior coordination and permission information."
tagged_tokens = tokenize_and_tag(text)
chunked_tree = chunk_text(tagged_tokens)
chunks = extract_chunks(chunked_tree)

# Print each extracted noun-phrase chunk
for chunk in chunks:
    print(chunk)


This domain
prior coordination
permission
information


**Semantic Parser Implementation**

In [2]:
import spacy

# Load the small English spaCy model
# (if it is missing, install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Function to perform named entity recognition
def named_entity_recognition(text):
    doc = nlp(text)
    entities = [(entity.text, entity.label_) for entity in doc.ents]
    return entities

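# Function to perform a simple dependency-based relation extraction.
# For each entity, inspect the children of the entity's syntactic head and keep
# direct objects ('dobj') and attributes ('attr'). Relations expressed through a
# preposition (e.g. "born in Hawaii") are not captured by this rule, so the
# example sentence below yields no relations.
def relation_extraction(text):
    doc = nlp(text)
    relations = []
    for ent in doc.ents:
        for token in ent.root.head.children:
            if token.dep_ in ('attr', 'dobj'):
                relations.append((ent.text, token.head.text, token.text))
    return relations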

# Example usage
text = "Barack Obama was born in Hawaii."
entities = named_entity_recognition(text)
relations = relation_extraction(text)

# Print named entities
print("Named Entities:")
for entity in entities:
    print(entity)

# Print relations
print("Relations:")
for relation in relations:
    print(relation)

Named Entities:
('Barack Obama', 'PERSON')
('Hawaii', 'GPE')
Relations:
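
The relation list is empty because "Hawaii" hangs off the preposition "in" rather than being a direct object or attribute of "born". One possible extension is to follow the path verb, then preposition, then prepositional object. The sketch below shows that traversal on a hand-written dependency structure so that it runs without a model; the `Token` class, the function `prepositional_relations`, and the parse itself are assumptions for illustration, not captured spaCy output. With spaCy tokens, the equivalent attributes are `token.dep_` and `token.children`.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    text: str
    dep: str                      # dependency label, e.g. 'nsubjpass'
    children: list = field(default_factory=list)

def prepositional_relations(head, subject_deps=("nsubj", "nsubjpass")):
    """Return (subject, 'head prep', object) triples found below a verbal head."""
    subjects = [c for c in head.children if c.dep in subject_deps]
    relations = []
    for prep in (c for c in head.children if c.dep == "prep"):
        for obj in (c for c in prep.children if c.dep == "pobj"):
            for subj in subjects:
                relations.append((subj.text, f"{head.text} {prep.text}", obj.text))
    return relations

# Hand-written parse of "Barack Obama was born in Hawaii." with the entity
# "Barack Obama" already merged into one node (an assumption for brevity).
born = Token("born", "ROOT", children=[
    Token("Barack Obama", "nsubjpass"),
    Token("was", "auxpass"),
    Token("in", "prep", children=[Token("Hawaii", "pobj")]),
])

prep_relations = prepositional_relations(born)
print(prep_relations)  # [('Barack Obama', 'born in', 'Hawaii')]
```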


**Applying the Syntactic Parser to Web Page Text**

In [3]:
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import RegexpParser

# Ensure you have the necessary NLTK resources downloaded
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

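# Fetch the raw HTML of a page
def fetch_html_content(url):
    response = requests.get(url, timeout=10)  # avoid hanging on an unresponsive server
    response.raise_for_status()  # fail early on HTTP errors (4xx/5xx)
    return response.text

# Strip the markup and return the page's text content
def parse_html_to_text(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    text = soup.get_text()
    return text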

# Tokenize the text into words and tag each token with its part of speech
def tokenize_and_tag(text):
    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)
    return tagged_tokens

# Define your chunking pattern
def chunk_text(tagged_tokens):
    pattern = 'NP: {<DT>?<JJ>*<NN>}'  # Chunk noun phrases
    cp = RegexpParser(pattern)  # Create a chunk parser
    chunked_tree = cp.parse(tagged_tokens)
    return chunked_tree

# Extract chunks based on the specified pattern
def extract_chunks(chunked_tree):
    chunks = []
    for subtree in chunked_tree.subtrees(filter=lambda t: t.label() == 'NP'):
        chunks.append(' '.join(word for word, pos in subtree.leaves()))
    return chunks

# Example usage
url = 'http://example.com'  # Replace with your target URL
html_content = fetch_html_content(url)
text = parse_html_to_text(html_content)
tagged_tokens = tokenize_and_tag(text)
chunked_tree = chunk_text(tagged_tokens)
chunks = extract_chunks(chunked_tree)

# Print each noun-phrase chunk as plain text
for chunk in chunks:
    print(chunk)

# Print each noun-phrase subtree together with its POS tags
for subtree in chunked_tree.subtrees(filter=lambda t: t.label() == 'NP'):
    print(subtree)

This domain
use
this domain
literature
prior coordination
permission
information
(NP This/DT domain/NN)
(NP use/NN)
(NP this/DT domain/NN)
(NP literature/NN)
(NP prior/JJ coordination/NN)
(NP permission/NN)
(NP information/NN)
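
The HTML-to-text step can also be checked offline. The sketch below is a standard-library alternative to the BeautifulSoup call, useful for testing the extraction logic on a literal string without network access; the class `VisibleTextExtractor`, the function `visible_text`, and the sample markup are hypothetical names and data introduced for illustration.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def visible_text(html):
    """Return the text content of an HTML string, joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

sample_html = """<html><head>
<style>body { color: red; }</style>
<script>var x = 1;</script></head>
<body><h1>Example Domain</h1><p>This domain is for use in examples.</p></body></html>"""

page_text = visible_text(sample_html)
print(page_text)
```

The resulting string can be passed to `tokenize_and_tag` in place of the text returned by `parse_html_to_text`.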
