
# Module 1: Input

## Import required Libraries

In [1]:
import spacy
import nltk 
from spacy.tokenizer import Tokenizer
import re

## Load the English language model

In [2]:
nlp = spacy.load("en_core_web_sm")

## Read the words from which the sentence will be built

For now, the input is read as a single string of space-separated words. It could equally be supplied as a list or a tuple of words.

In [3]:
doc = nlp(input("> "))

> i stay pune
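The note above says the words could also arrive as a list or a tuple. A minimal sketch of one way to accept all three forms, assuming a hypothetical helper `normalize_words` (not part of spaCy or of the original notebook) that produces the single string `nlp()` expects:

```python
def normalize_words(words):
    """Return one space-separated string from a str, list, or tuple.

    Hypothetical helper for illustration; the notebook itself only reads a string.
    """
    if isinstance(words, str):
        # Collapse repeated whitespace so "i  stay pune" becomes "i stay pune".
        return " ".join(words.split())
    if isinstance(words, (list, tuple)):
        # Strip each entry and drop empty ones before joining.
        return " ".join(str(w).strip() for w in words if str(w).strip())
    raise TypeError("expected str, list, or tuple, got {}".format(type(words).__name__))


print(normalize_words("i  stay pune"))
print(normalize_words(["i", "stay", "pune"]))
print(normalize_words(("i", "stay", "pune")))
```

All three calls print `i stay pune`; the result could then be passed to `nlp()` in place of the raw `input()` string.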



# Module 2: Preprocessing

## Tag each word with its part of speech and dependency label

Using spaCy, we loop over the tokens in the sequence and print each word's fine-grained POS tag (token.tag_) and its dependency label (token.dep_).

In [4]:
for token in doc: 
   print(" ->Token {} \nPOS: {}, \ndep: {}".format(token.text, token.tag_, token.dep_))    

 ->Token i 
POS: PRP, 
dep: nsubj
 ->Token stay 
POS: VBP, 
dep: ROOT
 ->Token pune 
POS: NN, 
dep: attr


In [5]:
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])

Verbs: ['stay']
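The first cell above prints `token.tag_` (a fine-grained tag such as `VBP`), while the second filters on `token.pos_` (a coarse tag such as `VERB`). The sketch below illustrates how the two relate for the three tags seen in the output. `COARSE_POS` and `coarse_pos` are illustrative names, and the table is hand-written and deliberately partial; spaCy assigns `pos_` itself.

```python
# Partial, hand-written mapping from fine-grained tags (token.tag_) to
# coarse tags (token.pos_). Only the tags seen in the output above are listed.
COARSE_POS = {
    "PRP": "PRON",  # personal pronoun, e.g. "i"
    "VBP": "VERB",  # non-3rd-person present verb, e.g. "stay" (auxiliaries may be AUX)
    "NN": "NOUN",   # singular noun, e.g. "pune"
}


def coarse_pos(tag):
    """Look up the coarse tag; unlisted tags fall back to "X" (other)."""
    return COARSE_POS.get(tag, "X")


# (word, fine-grained tag) pairs copied from the tagging output above.
tagged = [("i", "PRP"), ("stay", "VBP"), ("pune", "NN")]
verbs = [word for word, tag in tagged if coarse_pos(tag) == "VERB"]
print("Verbs:", verbs)  # Verbs: ['stay']
```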


# Module 2: Subject-Verb-Object Extraction

## Define Subject, Wh-word and Object Constants

These constants list the dependency labels that mark a word as a subject or an object, and the tags that mark a wh-word. They are used below to identify each word's syntactic dependency relation.

In [6]:
OBJECT_DEPS = {"dobj", "dative", "attr", "oprd"}
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "agent", "expl"}
WH_WORDS = {"WP", "WP$", "WRB"}

## Extract Subject, Verb and Object in the existing words

Verbs are picked out by their coarse POS tag (token.pos_). Subjects and objects are picked out by the dependency label (token.dep_) of the word itself or of its head.

In [7]:
def extract_svo(doc):
    """Return the subject, verb and object words of doc as lowercase strings."""
    sub = []  # subject words
    at = []   # object (attribute) words
    print(type(at))  # debug output
    ve = []   # verbs
    for token in doc:
        # is this a verb?
        if token.pos_ == "VERB":
            ve.append(token.text)
        # is this the object, or a direct child of the object?
        if token.dep_ in OBJECT_DEPS or token.head.dep_ in OBJECT_DEPS:
            at.append(token.text)
        # is this the subject, or a direct child of the subject?
        if token.dep_ in SUBJECT_DEPS or token.head.dep_ in SUBJECT_DEPS:
            sub.append(token.text)
    return " ".join(sub).strip().lower(), " ".join(ve).strip().lower(), " ".join(at).strip().lower()
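To follow the extraction logic without loading a model, the sketch below replays it on hand-built tokens. `FakeToken` and `extract_svo_sketch` are hypothetical stand-ins, not spaCy classes; the dependency labels are copied from the tagging output above, and the coarse tags for "i" and "pune" are assumed.

```python
OBJECT_DEPS = {"dobj", "dative", "attr", "oprd"}
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "agent", "expl"}


class FakeToken:
    """Hypothetical stand-in exposing only the attributes the extraction reads."""

    def __init__(self, text, pos_, dep_, head=None):
        self.text = text
        self.pos_ = pos_
        self.dep_ = dep_
        # A root token is its own head, which this stand-in mirrors.
        self.head = head if head is not None else self


def extract_svo_sketch(tokens):
    """Same checks as extract_svo(), minus the debug print."""
    subjects, verbs, objects = [], [], []
    for token in tokens:
        if token.pos_ == "VERB":
            verbs.append(token.text)
        if token.dep_ in OBJECT_DEPS or token.head.dep_ in OBJECT_DEPS:
            objects.append(token.text)
        if token.dep_ in SUBJECT_DEPS or token.head.dep_ in SUBJECT_DEPS:
            subjects.append(token.text)
    return (" ".join(subjects).lower(), " ".join(verbs).lower(), " ".join(objects).lower())


# "i stay pune": "stay" is the root, and both other words attach to it.
stay = FakeToken("stay", "VERB", "ROOT")
i = FakeToken("i", "PRON", "nsubj", head=stay)
pune = FakeToken("pune", "NOUN", "attr", head=stay)

result = extract_svo_sketch([i, stay, pune])
print(result)  # ('i', 'stay', 'pune')
```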

## Check if a question exists

The fine-grained tag identifies wh-words (WP, WP$, WRB) in the sequence. A sequence is treated as a question if its first token is a verb or if it contains a wh-word.

In [8]:
def is_question(doc):
    # is the first token a verb?
    if len(doc) > 0 and doc[0].pos_ == "VERB":
        return True, ""
    # go over all words
    for token in doc:
        # is it a wh- word?
        if token.tag_ in WH_WORDS:
            return True, token.text.lower()
    return False, ""
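The two checks can be exercised on hand-tagged input, as in this sketch. `Tok` and `is_question_sketch` are hypothetical names, and the tags are assigned by hand for illustration rather than produced by spaCy.

```python
from collections import namedtuple

WH_WORDS = {"WP", "WP$", "WRB"}

# Hypothetical stand-in carrying only the attributes the check reads.
Tok = namedtuple("Tok", ["text", "tag_", "pos_"])


def is_question_sketch(tokens):
    """Same logic as is_question(), applied to a plain list of stand-in tokens."""
    # A verb in first position counts as a question.
    if len(tokens) > 0 and tokens[0].pos_ == "VERB":
        return True, ""
    # Otherwise look for the first wh-word.
    for token in tokens:
        if token.tag_ in WH_WORDS:
            return True, token.text.lower()
    return False, ""


statement = [Tok("i", "PRP", "PRON"), Tok("stay", "VBP", "VERB"), Tok("pune", "NN", "NOUN")]
wh_question = [Tok("Where", "WRB", "ADV"), Tok("i", "PRP", "PRON"), Tok("stay", "VBP", "VERB")]
command = [Tok("stay", "VB", "VERB"), Tok("pune", "NN", "NOUN")]

print(is_question_sketch(statement))    # (False, '')
print(is_question_sketch(wh_question))  # (True, 'where')
print(is_question_sketch(command))      # (True, '')
```

The last case shows a limitation of the first check: a verb-initial command is also reported as a question.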

Run the above functions

In [9]:
subject, verb, attribute = extract_svo(doc)
question, wh_word = is_question(doc)
print("svo:, \nsubject: {}, \nverb: {}, \nattribute: {}, \nquestion: {}, wh_word: {}".format(subject, verb, attribute, question, wh_word))

<class 'list'>
svo:, 
subject: i, 
verb: stay, 
attribute: pune, 
question: False, wh_word: 


In [10]:
# True only if extract_svo() found a non-empty attribute
attribute_present = bool(attribute and attribute.strip())
print(attribute_present)

True


# Module 3: Grammar Rules Designing 

## Create a complete sentence

A demonstration of how the parsing functions can work. It covers only one use case, the sequence:

"I stay Pune"

Accurate prediction of the appropriate preposition, and likewise of prefixes, suffixes and so on, is left for later work.

<img src= "tree.png">

In [11]:
def assemble(*args):
    return " ".join(args)

In [12]:
def determine_preposition(attribute):
    return assemble("in", attribute)

In [13]:
def sentence(NP,VP):
    return assemble(NP,VP)

In [14]:
def NP(T,N):
    return assemble(T,N)

In [15]:
def VP(Verb,N):
    return assemble(Verb,N)

In [16]:
print(subject)
print(verb)

i
stay


# Module 4: Parsing Trial 

In [17]:
VP1 = VP(verb,determine_preposition(attribute))
output = sentence(subject,VP1) 
print(output) 

i stay in pune


The output for this demo test case is "i stay in pune" (lowercased by extract_svo), that is, "I stay in Pune".

Here only one preposition had to be added, and many use cases cannot be handled by these functions.
Tense is also a major factor in determining sentence structure; at the moment, we are brainstorming how to handle it.

Our objective going forward is to determine the appropriate stop words, prepositions and tense for each sentence.
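One possible direction for the preposition step, sketched under loud assumptions: choose the preposition from the verb instead of always using "in". The table `PREPOSITION_BY_VERB` and the function `determine_preposition_for` are hypothetical, and the entries are illustrative guesses rather than rules from this notebook or from any library.

```python
def assemble(*args):
    """Join the given words with single spaces (same helper as above)."""
    return " ".join(args)


# Hypothetical verb-to-preposition table; entries are illustrative only.
PREPOSITION_BY_VERB = {
    "stay": "in",
    "go": "to",
    "come": "from",
}


def determine_preposition_for(verb, attribute, default="in"):
    """Pick a preposition from the verb, falling back to `default` if unlisted."""
    preposition = PREPOSITION_BY_VERB.get(verb, default)
    return assemble(preposition, attribute)


print(assemble("i", "stay", determine_preposition_for("stay", "pune")))  # i stay in pune
print(assemble("i", "go", determine_preposition_for("go", "pune")))      # i go to pune
```

A fixed table like this cannot cover tense or verbs with several valid prepositions, which is why those remain open items.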