# Demonstrating the recognition pipeline

### The Pipeline for Syntax Tree Generation

The following image shows the steps taken to convert a natural language sentence into a document (spaCy's `Doc` object), which is later converted into Python code.

![Text](https://spacy.io/assets/img/pipeline.svg)

### Import all necessary dependencies 

In [1]:
import spacy
from spacy import displacy
from IPython.display import HTML, display
print('Imports successful')

Imports successful


### Load spaCy model for recognition

In [2]:
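# 'en' is a spaCy v2 shortcut link; spaCy v3 removed shortcuts and needs the full package name, such as 'en_core_web_sm'.
nlp = spacy.load('en')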

### A utility to display tables for better visualisation of output

In [3]:
def displayAsTable(data):
    # Render a list of rows as an HTML table: one <tr> per row, one <td> per cell.
    rows = ('<td>{}</td>'.format('</td><td>'.join(str(cell) for cell in row)) for row in data)
    display(HTML('<table><tr>{}</tr></table>'.format('</tr><tr>'.join(rows))))

In [None]:
input_text = input("Input a sentence or a group of sentences: ")
doc = nlp(input_text)

## Step 1 : Tokenisation and Part-of-Speech Tagging

Tokenisation breaks down the sentence into a list of tokens.

**The algorithm can be summarized as follows:**

1. Iterate over space-separated substrings
2. Check whether we have an explicitly defined rule for this substring. If we do, use it.
3. Otherwise, try to consume a prefix.
4. If we consumed a prefix, go back to the beginning of the loop, so that special-cases always get priority.
5. If we didn't consume a prefix, try to consume a suffix.
6. If we can't consume a prefix or suffix, look for "infixes", such as hyphens.
7. Once we can't consume any more of the string, handle it as a single token.
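
The steps above can be sketched in plain Python. This is a simplified illustration, not spaCy's actual tokenizer: the names `SPECIAL_CASES`, `PREFIX_RE`, `SUFFIX_RE`, `INFIX_RE` and `tokenize` are hypothetical, and the tiny rule sets are made up for the example (spaCy's real rules are language-specific and much larger). The sketch also assumes that the loop restarts after a suffix is consumed, in the same way as after a prefix.

```python
import re

# Hypothetical, deliberately tiny rule sets (assumptions for illustration only).
SPECIAL_CASES = {"don't": ["do", "n't"], "U.K.": ["U.K."]}
PREFIX_RE = re.compile(r'''^[\[\("']''')       # one opening bracket or quote
SUFFIX_RE = re.compile(r'''[\]\)"'.,!?]$''')   # one closing bracket, quote or punctuation mark
INFIX_RE = re.compile(r'([-~])')               # capture group keeps the infix as its own token


def tokenize(text):
    tokens = []
    for substring in text.split():                   # step 1: whitespace-separated substrings
        suffixes = []                                # suffixes are found right-to-left
        while substring:
            if substring in SPECIAL_CASES:           # step 2: explicit rules win
                tokens.extend(SPECIAL_CASES[substring])
                break
            match = PREFIX_RE.search(substring)
            if match:                                # steps 3-4: strip a prefix, then restart
                tokens.append(match.group())
                substring = substring[match.end():]
                continue
            match = SUFFIX_RE.search(substring)
            if match:                                # step 5: strip a suffix, then restart
                suffixes.append(match.group())
                substring = substring[:match.start()]
                continue
            # steps 6-7: split on infixes; each remaining piece is a single token
            tokens.extend(piece for piece in INFIX_RE.split(substring) if piece)
            break
        tokens.extend(reversed(suffixes))            # restore left-to-right suffix order
    return tokens


print(tokenize("(don't stop-start!)"))
# ['(', 'do', "n't", 'stop', '-', 'start', '!', ')']
```

Checking the special cases at the top of every pass is what keeps `U.K.` intact: without that check, the suffix rule would strip its final period.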

In [5]:
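token_list = []
for token in doc:
    # Skip stop words so that only content-bearing tokens are listed.
    if not token.is_stop:
        # Columns: text, coarse POS tag, POS description, dependency-label description.
        token_list.append([token.text, token.pos_, spacy.explain(token.pos_), spacy.explain(token.dep_)])
displayAsTable(token_list)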

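| Token | POS | POS description | Dependency description |
|---|---|---|---|
| Create | VERB | verb | |
| array | NOUN | noun | direct object |
| 10 | NUM | numeral | |
| numbers | NOUN | noun | object of preposition |
| . | PUNCT | punctuation | punctuation |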


## Step 2 : Word similarity 


In [6]:
words = nlp('create develop initialise start')

similarity_table = []

# Compare every token with every token, including itself.
for token1 in words:
    for token2 in words:
        similarity_table.append([token1.text, token2.text, token1.similarity(token2)])

# Show the first three rows, then the second-to-last row.
displayAsTable(similarity_table[:3])
displayAsTable(similarity_table[-2:-1])

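| Token 1 | Token 2 | Similarity |
|---|---|---|
| create | create | 1.0 |
| create | develop | 0.5023777 |
| create | initialise | 0.2719381 |

| Token 1 | Token 2 | Similarity |
|---|---|---|
| start | initialise | 0.47232231 |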
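
By default, spaCy's `similarity` methods report the cosine similarity between the vectors of the two objects being compared. The sketch below shows that measure in plain Python. The function name `cosine_similarity` and the three-dimensional `toy_vectors` are assumptions made up for illustration; they are not taken from the loaded model, and returning `0.0` for a zero-length vector is a choice made for this sketch only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # sketch-only convention: undefined similarity is reported as 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real word vectors have hundreds of dimensions.
toy_vectors = {
    'create': [1.0, 2.0, 0.0],
    'develop': [2.0, 1.0, 0.0],
}

print(cosine_similarity(toy_vectors['create'], toy_vectors['create']))
print(cosine_similarity(toy_vectors['create'], toy_vectors['develop']))
```

A vector compared with itself scores 1 (up to floating-point rounding), which matches the `create`/`create` row in the output above.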


In [7]:
# Draw the dependency parse of the input text inline in the notebook.
options = {'compact': False, 'bg': '#09a3d5',
           'color': 'white'}

displacy.render(doc, style='dep', jupyter=True, options=options)

In [11]:
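# For each noun chunk, print: chunk text | root token | root's dependency label | root's head token.
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
          chunk.root.head.text, sep='|')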

the leftmost element|element|pobj|from
the list|list|pobj|of
one by one compare|compare|conj|Start
each element|element|pobj|with
the list|list|pobj|of
