# English-to-CLIPS programming language translator

English variants for *deftemplate*:
- __ template has properties..
- Template "__" has properties..
- Create a __ template.. 
- __ has properties


English variants for *assert*:
- There exists..
- Assert..
- Add a fact about../Add to the base..


English variants for *defrule*:
- If..
- When..

Examples:
- Cat template has properties of color, age, and name.

*(deftemplate cat (slot color) (slot age) (slot name))*

- There exists a cat with the name Bob.

*(assert (cat (name "Bob")))*

- If there exists a cat named Bob, then there exists a cat named Tom.

*(defrule rule1 (cat (name "Bob")) => (assert (cat (name "Tom"))))*
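
The three target expressions above can also be assembled from plain Python data. The following is a minimal sketch, not the translator itself: the helper names (`render_deftemplate`, `render_fact`, `render_assert`, `render_defrule`) are hypothetical, and it assumes that string slot values are wrapped in straight double quotes while other values are emitted unchanged.

```python
def render_deftemplate(name, slots):
    """Render a deftemplate with one (slot ...) entry per slot name."""
    slot_part = ' '.join('(slot {})'.format(slot) for slot in slots)
    return '(deftemplate {} {})'.format(name, slot_part)


def render_fact(template, values):
    """Render a template fact; string values get straight double quotes."""
    parts = []
    for slot, value in values.items():
        text = '"{}"'.format(value) if isinstance(value, str) else str(value)
        parts.append('({} {})'.format(slot, text))
    return '({} {})'.format(template, ' '.join(parts))


def render_assert(template, values):
    """Wrap a rendered fact in an assert command."""
    return '(assert {})'.format(render_fact(template, values))


def render_defrule(name, condition, action):
    """Combine an already rendered condition and action into a rule."""
    return '(defrule {} {} => {})'.format(name, condition, action)


print(render_deftemplate('cat', ['color', 'age', 'name']))
print(render_assert('cat', {'name': 'Bob'}))
print(render_defrule('rule1',
                     render_fact('cat', {'name': 'Bob'}),
                     render_assert('cat', {'name': 'Tom'})))
```

The three printed lines correspond to the deftemplate, assert, and defrule examples listed above.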

### Algorithm:
1. ask for a sentence 
2. check grammar and syntax of the sentence
3. infer the semantics of the given sentence
  - tokenize the sentence
  - determine the type of the sentence (template/assertion/rule)
4. produce a translated expression 
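
The four steps can be read as a small pipeline of interchangeable functions. The sketch below only illustrates that structure under simplifying assumptions: `run_pipeline` and `classify_by_keywords` are hypothetical names, tokenization is a plain whitespace split rather than a real tokenizer, and the validity check and the output step are stubs.

```python
def run_pipeline(read_sentence, is_valid, classify, produce):
    """Run the four steps once; return None when the sentence is rejected."""
    sentence = read_sentence()                   # 1. ask for a sentence
    if not is_valid(sentence):                   # 2. check grammar and syntax
        return None
    sentence_type = classify(sentence.split())   # 3. tokenize, determine the type
    return produce(sentence_type, sentence)      # 4. produce a translated expression


def classify_by_keywords(words):
    """Toy classifier: only recognises rules of the form 'if/when ... then ...'."""
    lowered = [word.casefold() for word in words]
    has_condition = 'if' in lowered or 'when' in lowered
    return 'defrule' if has_condition and 'then' in lowered else 'undefined'


result = run_pipeline(
    read_sentence=lambda: 'If it is rainy then I walk',
    is_valid=lambda sentence: len(sentence) > 0,       # stub for a grammar check
    classify=classify_by_keywords,
    produce=lambda sentence_type, sentence: (sentence_type, sentence),  # stub
)
print(result)
```

Passing the steps in as functions keeps each one replaceable, so a stub can later be swapped for a real grammar checker or parser without touching the others.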

### MVP specifications

1. implement translation into 3 basic CLIPS statements: defrule, deftemplate, assert

2. implement real parsing rather than a wholly hardcoded mapping of words: use tokens, compose a grammar, apply parsing

3. write clean code and necessary documentation

4. include grammar/syntax check of the given sentence


In [None]:
!pip install spacy beautifulsoup4
!pip install language-tool-python

Collecting language-tool-python
  Downloading language_tool_python-2.5.5-py3-none-any.whl (31 kB)
Installing collected packages: language-tool-python
Successfully installed language-tool-python-2.5.5


In [None]:
import spacy
from spacy import displacy
import en_core_web_sm
import language_tool_python
import copy

In [None]:
# Functions for analysis of tokens

def check_for_defrule(tokens):
  # if..
  # when..
  # NOTE: for now, the word "then" is obligatory in the initial sentence
  presence_main_clause = False
  presence_cond_clause = False
  for token in tokens:
    if (token.text.casefold() == 'if') or (token.text.casefold() == 'when'):
      presence_cond_clause = True
    if (token.text.casefold() == 'then'):
      presence_main_clause = True
  if (presence_main_clause and presence_cond_clause):
    return 'defrule'
    
  return 'undefined'


def check_for_deftemplate(tokens):
  #- __ template has properties..
  #- Template "__" has properties..
  #- Create a __ template.. 
  #- __ has properties  
  presence_template_word = False
  presence_property_word = False
  for token in tokens:
    if (token.text.casefold() == 'template'):
      presence_template_word = True
    if (token.text.casefold() == 'property') or (token.text.casefold() == 'properties'):
      presence_property_word = True
  if (presence_template_word or presence_property_word):
    return 'deftemplate'
    
  return 'undefined'


def check_for_assert(tokens):
  #- There exists..
  #- Assert..
  #- Add a fact about../Add to the base.
  # combination of (subject+verb) or just verbs
  presence_subject = False
  presence_verb = False
  for token in tokens:
    if (token.dep_ == 'nsubj') and (token.head.text is not None):
      presence_subject = True
      presence_verb = True
    if ('VB' in token.tag_ ):
      presence_verb = True
  if (presence_subject or presence_verb):
    return 'assert'
    
  return 'undefined'  


def determine_sentence_type(tokens):
  #1. check for patterns of DEFRULE
  #2. check for patterns of DEFTEMPLATE
  #3. check for pattern of ASSERT (presence of subject+verb)
  #4. otherwise "undefined type"
  sent_type = 'undefined'
  sent_type = check_for_defrule(tokens)
  if sent_type == 'undefined':
    sent_type = check_for_deftemplate(tokens)
    if sent_type == 'undefined':
      sent_type = check_for_assert(tokens)

  return sent_type
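
The order of the checks matters: a rule sentence may also contain the word "properties" or a subject with a verb, so the rule check has to run first. The sketch below reproduces that precedence in a simplified form on stand-in tokens, so it can be tried without a language model. `FakeToken`, `make_tokens`, and `classify` are hypothetical names, and the dependency labels and tags are assigned by hand for illustration; they are not parser output.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed token: only the attributes the checks read.
FakeToken = namedtuple('FakeToken', ['text', 'dep_', 'tag_'])


def make_tokens(tagged_words):
    """Build stand-in tokens from (text, dependency, tag) triples."""
    return [FakeToken(text, dep, tag) for text, dep, tag in tagged_words]


def classify(tokens):
    """Simplified precedence: defrule, then deftemplate, then assert."""
    words = {token.text.casefold() for token in tokens}
    if words & {'if', 'when'} and 'then' in words:
        return 'defrule'
    if words & {'template', 'property', 'properties'}:
        return 'deftemplate'
    if any(token.dep_ == 'nsubj' or 'VB' in token.tag_ for token in tokens):
        return 'assert'
    return 'undefined'


# Hand-labelled tokens (assumed labels, for illustration only).
fact = make_tokens([('I', 'nsubj', 'PRP'), ('have', 'ROOT', 'VBP'),
                    ('an', 'det', 'DT'), ('apple', 'dobj', 'NN')])
template = make_tokens([('Cat', 'nsubj', 'NN'), ('has', 'ROOT', 'VBZ'),
                        ('properties', 'dobj', 'NNS')])
rule = make_tokens([('If', 'mark', 'IN'), ('cat', 'nsubj', 'NN'),
                    ('has', 'advcl', 'VBZ'), ('properties', 'dobj', 'NNS'),
                    ('then', 'advmod', 'RB'), ('I', 'nsubj', 'PRP'),
                    ('walk', 'ROOT', 'VBP')])

print(classify(fact), classify(template), classify(rule))
```

The third sentence contains both a template keyword and a subject with a verb, yet it is classified as a rule because that check comes first.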

In [None]:
# Functions for parsing tokens


def extract_subject_verb_pairs(tokens):
  # 1. search through noun chunks
  # 2. find subjects - chunks with 'nsubj' as a root; -> assign them to actors in the separate facts
  # 3. find corresponding verbs for subjects 
  # NOTE: Sentences of the type "There is/are" do NOT have subjects; instead they have attributes ('attr') attached to verbs
  # Also, the order of words in facts is: (verb, subject, attribute/object)
  # Direct objects ('dobj') are attached later, in enhance_pairs_with_objects
  subject_verb_pairs = []
  for chunk in tokens.noun_chunks:
    if chunk.root.dep_ in ('nsubj', 'attr'):
      # [verb, subject] or [verb, attribute]
      subject_verb_pairs.append([chunk.root.head.text, chunk.root.text])
  return subject_verb_pairs


def enhance_pairs_with_objects(tokens, subject_verb_pairs):
  # attach objects to corresponding pairs (verb+subject)
  # Example: (have+I) <- apple
  enhanced_pairs = copy.deepcopy(subject_verb_pairs)
  for chunk in tokens.noun_chunks:
    if (chunk.root.dep_ == 'dobj'):
        i = len(enhanced_pairs)
        for pair in reversed(enhanced_pairs): 
          i -= 1
          if (pair[0] == chunk.root.head.text):
            enhanced_pairs[i].append(chunk.root.text)
            break
  return enhanced_pairs


def enhance_pairs_with_adjectives(tokens, enhanced_pairs):
  # attach adjectives to corresponding pairs (verb+subject)
  # Example: (am+I) <- cool
  enhanced_pairs_v2 = copy.deepcopy(enhanced_pairs)
  start = 0
  for i in range(len(tokens)):
    if (tokens[i].dep_ == 'acomp'):
      for j in range(start, len(enhanced_pairs_v2)):
        if (enhanced_pairs_v2[j][0] == tokens[i].head.text):
          enhanced_pairs_v2[j].append(tokens[i].text)
          start += 1
          break
  return enhanced_pairs_v2


def extract_facts(tokens):
  # Extraction of facts(parsing) is done with the sequence of steps:
  # 1. find pairs (action+actor)
  # 2. enrich the pairs with corresponding objects
  # 3. enrich the pairs with adjectival complements
  subject_verb_pairs = extract_subject_verb_pairs(tokens)
  enhanced_pairs_v1 = enhance_pairs_with_objects(tokens, subject_verb_pairs)
  enhanced_pairs_v2 = enhance_pairs_with_adjectives(tokens, enhanced_pairs_v1)
  return enhanced_pairs_v2
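
The object-attachment step can be tried on plain lists, without a parser. The sketch below mirrors the reverse search used above: each object is appended to the last pair whose verb matches. `attach_objects` and its `(verb, object)` input format are assumptions made for this illustration.

```python
import copy


def attach_objects(pairs, verb_object_links):
    """Append each object to the last pair whose verb matches.

    pairs: lists of the form [verb, subject]
    verb_object_links: (verb, object) tuples in sentence order
    """
    enhanced = copy.deepcopy(pairs)  # leave the input pairs untouched
    for verb, obj in verb_object_links:
        for pair in reversed(enhanced):
            if pair[0] == verb:
                pair.append(obj)
                break
    return enhanced


pairs = [['have', 'I'], ['like', 'you']]
print(attach_objects(pairs, [('have', 'apple'), ('like', 'pears')]))

# With a repeated verb, both objects land on the last matching pair.
repeated = [['have', 'I'], ['have', 'you']]
print(attach_objects(repeated, [('have', 'apple'), ('have', 'pear')]))
```

The second call shows a limitation of matching by verb text alone: when the same verb occurs twice, the earlier pair never receives its object.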

In [None]:
def extract_defrule_details(tokens):
  # List of the details:
  # 1. explicitly given rule's name (not implemented in MVP)
  # 2. facts in conditional part 
  # 3. facts in action part 
  details = {}
  # assigning default 'name-of-the-rule'
  details['name'] = 'name-of-the-rule'

  # 1. determine which part of the sentence (1st or 2nd) is conditional
  # 2. determine which facts from conditional part and which ones from action part (mark verbs from the facts accordingly) 
  # NOTE: keywords are compared case-insensitively, as in check_for_defrule;
  #       if the sentence starts with none of them, 'condition' stays unset
  details['conditional_verbs'] = []
  details['action_verbs'] = []
  first_word = tokens[0].text.casefold()
  if first_word in ('if', 'when'):
    details['condition'] = 1
    for token in tokens:
      if (token.text.casefold() == 'then'):
        break
      if ('VB' in token.tag_ ):
        details['conditional_verbs'].append(token.text)
  elif first_word == 'then':
    details['condition'] = 2 
    for token in tokens:
      if (token.text.casefold() in ('if', 'when')):
        break
      if ('VB' in token.tag_ ):
        details['action_verbs'].append(token.text)

  return details


def extract_deftemplate_details(tokens):
  # List of the details:
  # 1. template's name
  # 2. slot names

  # Two types of sentences supported for MVP:
  #1 - __ template has..
  #2 - __ has.. 

  # Also, only one template per sentence is supported for MVP
  details = {}
  details['name'] = ''
  #look for word expression "__ template"
  for token in tokens:
    if (token.dep_ == 'compound') and (token.head.text == 'template'):
      details['name'] = token.text
  if (details['name'] == ''):
    #or assign subject as a name 
    for token in tokens:
      if (token.dep_ == 'nsubj'):
        details['name'] = token.text
        break

  #To extract properties:
  # 1. Find the first noun (pobj or)
  # 2. Iterate over the next tokens while they are delimited with ','/'and' + they are nouns

  #NOTE: there are two variants:
  # 1 - there is a collocation 'property of'/'properties of'
  # 2 - otherwise
  details['slots'] = []
  num_tokens = len(tokens)
  start = None
  # variant 1: slots follow 'property of'/'properties of'
  # (checked first, so that 'has properties of ..' does not yield a slot called 'properties')
  for i in range(1, num_tokens):
    if (tokens[i].text == 'of') and (tokens[i-1].text in ('property', 'properties')):
      start = i + 1
      break
  # variant 2: otherwise, slots follow 'have'/'has'
  if start is None:
    for i in range(1, num_tokens):
      if (tokens[i].text in ('have', 'has')):
        start = i + 1
        break
  if start is not None:
    j = start
    while (j < num_tokens) and (tokens[j].pos_ in ('NOUN', 'PUNCT', 'CCONJ')):
      if (tokens[j].pos_ == 'NOUN'):
        details['slots'].append(tokens[j].text)
      j += 1

  return details


def extract_assert_details(tokens):
  # List of the details:
  details = {}
  return details


def extract_specifications(sentence_type, tokens):
  # Extract specific word expressions (keywords, commands) for each type of the sentence
  if sentence_type == 'defrule':
    return extract_defrule_details(tokens)
  if sentence_type == 'deftemplate':
    return extract_deftemplate_details(tokens) 
  if sentence_type == 'assert':
    return extract_assert_details(tokens)

  return None
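
The slot-collection loop in `extract_deftemplate_details` walks forward from a start position and keeps nouns for as long as it sees only nouns, punctuation, and conjunctions. The sketch below isolates that scan on `(text, part-of-speech)` tuples. `collect_slots` is a hypothetical helper, and the part-of-speech tags are assigned by hand for illustration rather than produced by a tagger.

```python
def collect_slots(tagged_tokens, start):
    """Collect nouns from `start` while tokens are nouns, punctuation or conjunctions."""
    slots = []
    for text, pos in tagged_tokens[start:]:
        if pos not in ('NOUN', 'PUNCT', 'CCONJ'):
            break  # any other part of speech ends the list of properties
        if pos == 'NOUN':
            slots.append(text)
    return slots


# "Cat template has properties of color, age, and name." with assumed tags
tagged = [('Cat', 'NOUN'), ('template', 'NOUN'), ('has', 'VERB'),
          ('properties', 'NOUN'), ('of', 'ADP'), ('color', 'NOUN'),
          (',', 'PUNCT'), ('age', 'NOUN'), (',', 'PUNCT'),
          ('and', 'CCONJ'), ('name', 'NOUN'), ('.', 'PUNCT')]

print(collect_slots(tagged, 5))  # scan starts right after "of"
print(collect_slots(tagged, 3))  # scan starts right after "has"
```

Starting right after "has" picks up the word "properties" itself and stops at "of", which is why the start position has to be chosen with care.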

In [None]:
def form_defrule(specifications, facts):
  # (defrule rule1 (cat (name "Bob")) => (assert (cat (name "Tom"))))
  translated_sentence = '(defrule ' + specifications['name'] + '\n    '
  # will call translate_to_assert(specifications, facts)

  num_cond = len(specifications['conditional_verbs'])
  num_action = len(specifications['action_verbs'])
  start = 0
  end = 0
  if (specifications['condition'] == 1):
    start = 0
    end = num_cond
  else:
    start = num_action
    end = len(facts)

  for i in range(start, end):
      fact = ' '.join(facts[i])
      translated_sentence = translated_sentence + '(' + fact + ')\n    ' 

  translated_sentence = translated_sentence + '=>\n    '

  if (specifications['condition'] == 2):
    start = 0
    end = num_action
  else:
    start = num_cond
    end = len(facts)

  for i in range(start, end):
    fact = ' '.join(facts[i])
    translated_sentence = translated_sentence + '(assert (' + fact + '))\n    ' 

  # close the defrule expression opened at the top
  return translated_sentence.rstrip() + ')'


def form_deftemplate(specifications, facts):
  # (deftemplate cat (slot color) (slot age) (slot name)) 
  # join() also avoids an IndexError when no slots were extracted
  slots = '\n    '.join('(slot ' + slot + ')' for slot in specifications['slots'])
  return '(deftemplate ' + specifications['name'] + '\n    ' + slots + ')'


def form_assert(specifications, facts):
  # (assert (cat (name "Bob")))
  translated_sentences = []
  for fact in facts:
    fact_str = ' '.join(fact)
    sentence = '(assert (' + fact_str + '))'
    translated_sentences.append(sentence) 

  return translated_sentences
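
Rule assembly comes down to splitting the list of facts at one index: the facts before it become patterns on the left-hand side, and the rest are asserted on the right-hand side. The sketch below shows that split on plain lists. `assemble_defrule` is a hypothetical helper that covers only the condition-first word order, and it closes the rule with a final parenthesis.

```python
def assemble_defrule(name, facts, num_condition_facts):
    """Build a defrule: the first num_condition_facts facts are patterns,
    the remaining facts are asserted when the rule fires."""
    lines = ['(defrule ' + name]
    for fact in facts[:num_condition_facts]:
        lines.append('    (' + ' '.join(fact) + ')')
    lines.append('    =>')
    for fact in facts[num_condition_facts:]:
        lines.append('    (assert (' + ' '.join(fact) + '))')
    return '\n'.join(lines) + ')'


rule_text = assemble_defrule('name-of-the-rule',
                             [['is', 'it', 'rainy'], ['walk', 'I']],
                             num_condition_facts=1)
print(rule_text)
```

Because the split index is taken from a count of verbs while the facts come from noun chunks, the two can disagree for sentences with auxiliary verbs, so it is worth checking the index against the number of extracted facts.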

In [None]:
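class Translator(object):

  def __init__(self):
    # load the spaCy model once instead of on every call to tokenize()
    self.nlp = en_core_web_sm.load()

  def tokenize(self, sentence):
    # parse the sentence into tokens annotated with tags and dependencies
    doc = self.nlp(sentence)
    return doc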


  def analyse_tokens(self, tokens):
    result = []
    sentence_type = determine_sentence_type(tokens)
    if sentence_type == 'undefined':
      return 'Sorry, the translator does not support this kind of sentence.'

    facts = extract_facts(tokens)
    specifications = extract_specifications(sentence_type, tokens)

    if sentence_type == 'defrule':
      result = form_defrule(specifications, facts)
    if sentence_type == 'deftemplate':
      result = form_deftemplate(specifications, facts)
    if sentence_type == 'assert':
      result = form_assert(specifications, facts)

    return result


  def translate(self, sentence):
    # integrate all translation steps
    tokens = self.tokenize(sentence)
    translated_sentence = self.analyse_tokens(tokens)
    return translated_sentence

In [None]:
def ask_for_sentence(tool):
  # ask for a sentence from the user
  sentence = ""

  # check the sentence for mistakes in grammar/syntax
  while True:
    sentence = input('Enter a sentence in English: ')
    matches = tool.check(sentence)    
    if (len(matches) == 0):
      break
    print('The entered sentence has incorrect grammar or syntax. Please try again.')
  return sentence


def print_translation(translated_sentence):
  if isinstance(translated_sentence, list):
    for sentences in translated_sentence:
      print(sentences)
  else:
    print(translated_sentence)
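
The retry loop in `ask_for_sentence` is hard to exercise automatically, because it depends on both keyboard input and LanguageTool. One possible variant, sketched below, takes the reading function and the checker as parameters so that both can be replaced in a test. `read_valid_sentence` and its parameters are hypothetical, and the checker used in the example is a toy stand-in rather than a grammar checker.

```python
def read_valid_sentence(read, check, report=print):
    """Keep reading until `check` returns no issues, then return the sentence.

    read:   callable returning the next candidate sentence
    check:  callable returning a list of issues (empty when the sentence is fine)
    report: callable used to show the retry message
    """
    while True:
        sentence = read()
        if len(check(sentence)) == 0:
            return sentence
        report('The entered sentence has incorrect grammar or syntax. Please try again.')


# Scripted input and a toy checker instead of input() and LanguageTool.
inputs = iter(['I has apple', 'I have an apple.'])
messages = []
accepted = read_valid_sentence(
    read=lambda: next(inputs),
    check=lambda sentence: ['agreement'] if 'I has' in sentence else [],
    report=messages.append,
)
print(accepted, len(messages))
```

In the notebook, the same function could be called with `lambda: input('Enter a sentence in English: ')` as `read` and `tool.check` as `check`.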

## A set of English sentences supported for translation by the MVP version
**To define a "defrule" provide a sentence:**
- with a conditional part starting with "if"/"when"
- with an action part starting with "then"
- either part may contain multiple facts

**To define a "deftemplate" provide a sentence:**
- with a single template definition
- with the template's name given as in the following examples:
  - *template-name* template has..
  - *template-name* has.. 
- with at least one word from the list: "template", "property", "properties" (needed to determine the type of the sentence correctly). 


**To define an "assert" provide a sentence:**
- see Notes below
- if one sentence contains multiple facts, they are split into multiple "assert" statements 

### Notes:
- coordinated subjects are not supported (e.g. ~Alice and Tom sing.~)
- coordinated actions are not supported (e.g. ~I sing and walk.~)
- coordinated adjectives are not supported (e.g. ~I am happy and calm.~)
- the order of words in the produced translation is *(verb subject object/adjective)*, for example (assert (have I apple)); this ordering was taken from a tutorial.
- construction of "assert" facts with multiple attributes is not supported (e.g. ~(assert (person (name Bob) (age 20)))~) 

In [None]:
translator = Translator()
tool = language_tool_python.LanguageTool('en-US')
sentence = ""

while (sentence != 'stop'):
  sentence = ask_for_sentence(tool)
  if sentence == 'stop': 
    break
  translated_sentence = translator.translate(sentence)
  print_translation(translated_sentence)

Downloading LanguageTool: 100%|██████████| 203M/203M [00:11<00:00, 17.7MB/s]
Unzipping /tmp/tmpwbfihrlj.zip to /root/.cache/language_tool_python.
Downloaded https://www.languagetool.org/download/LanguageTool-5.4.zip to /root/.cache/language_tool_python.


(assert (have I apple))
The entered sentence has incorrect grammar or syntax. Please try again.
(defrule name-of-the-rule
    (is it rainy)
    =>
    (assert (walk I))
    
(deftemplate Cat
    (slot name))
