## NLP SECOND ASSIGNMENT

The assignment consists of developing a pipeline that takes as input a sentence in one of four languages (English, French, German or Italian) and outputs its syntactic tree.

We can assume the following:
- Adjectives in English and German precede the noun
- Adjectives in Italian and French follow the noun
- All verbs are in the present tense
- No pronouns are allowed
- Only one adverb is allowed, and it always follows the verb
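Except for tense, these assumptions can be checked mechanically on the POS tags before parsing. The sketch below is a hypothetical helper (`check_assumptions` is not part of the assignment code) that works on Universal POS tags, such as the ones spaCy returns in `token.pos_`. The language codes `en`, `de`, `it` and `fr` are assumptions chosen for illustration. Verb tense is not checked, because a coarse tag such as `VERB` does not encode it.

```python
# Hypothetical helper, not part of the assignment code: checks a sequence of
# Universal POS tags against the stated assumptions. Tense is NOT checked,
# because coarse tags such as VERB do not encode it.
ADJ_BEFORE_NOUN = {"en": True, "de": True, "it": False, "fr": False}


def check_assumptions(tags, lang):
    """Return the list of violated assumptions (empty if the sentence conforms)."""
    problems = []
    if "PRON" in tags:
        problems.append("pronoun found")

    adverbs = [i for i, t in enumerate(tags) if t == "ADV"]
    verbs = [i for i, t in enumerate(tags) if t == "VERB"]
    if len(adverbs) > 1:
        problems.append("more than one adverb")
    if adverbs and not any(v < adverbs[0] for v in verbs):
        problems.append("adverb not after the verb")

    # Each adjective must reach a NOUN by moving right (en/de) or left (it/fr),
    # possibly skipping over other adjectives.
    step = 1 if ADJ_BEFORE_NOUN[lang] else -1
    for i, t in enumerate(tags):
        if t != "ADJ":
            continue
        j = i + step
        while 0 <= j < len(tags) and tags[j] == "ADJ":
            j += step
        if not (0 <= j < len(tags) and tags[j] == "NOUN"):
            problems.append(f"adjective at position {i} misplaced")
    return problems


# Tag sequences taken from the English and Italian examples
print(check_assumptions(["DET", "ADJ", "NOUN", "VERB", "ADV", "PUNCT"], "en"))
print(check_assumptions(["DET", "NOUN", "ADJ", "VERB", "ADV", "PUNCT"], "it"))
```

Both calls print an empty list. Given the Italian tag order with `lang="en"`, the helper reports the adjective at position 2 as misplaced.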

For this assignment I wrote a simple context-free grammar (CFG) that covers the input sentences. I used spaCy for part-of-speech (POS) tagging and NLTK for parsing with the grammar and generating the trees.

## Preliminary imports

```python
import nltk
import spacy
from nltk.tree import TreePrettyPrinter
```

## TREE GENERATOR

```python
def tree_generator(sentences, grammar_init, nlp):
    """POS-tag each sentence with spaCy, extend the base CFG with lexical
    rules for the observed tokens, and print the first parse tree found."""
    for sent in sentences:
        parsed_sent = nlp(sent)
        for token in parsed_sent:
            print(token.text, token.pos_)
        print(f"{sent}\n")

        # Collect the distinct words observed for each POS tag, in order of appearance
        lexicon = {}
        for token in parsed_sent:
            word = f'"{token.text}"'
            words = lexicon.setdefault(token.pos_, [])
            if word not in words:
                words.append(word)

        # Add one lexical rule per POS tag, e.g. NOUN -> "car".
        # The rules go into a per-sentence copy of the base grammar, so words
        # from earlier sentences do not accumulate across iterations.
        sent_grammar = grammar_init
        for pos, words in lexicon.items():
            sent_grammar += f"{pos} -> {' | '.join(words)}\n"

        nltk_grammar = nltk.CFG.fromstring(sent_grammar)
        parser = nltk.ChartParser(nltk_grammar)

        # Parse spaCy's tokenization, so that terminals match the lexical rules
        spacy_tokenized = [token.text for token in parsed_sent]
        trees = list(parser.parse(spacy_tokenized))
        if trees:
            print(trees[0])
            print(TreePrettyPrinter(trees[0]).text())
        else:
            print("No parse found for this sentence.")
```
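The rule-building step inside `tree_generator` can also be written as a standalone function, which makes it testable without loading a spaCy model. The sketch below is a hypothetical refactor (`lexical_rules` is not part of the original code) that takes `(text, pos)` pairs; with spaCy these could come from `[(t.text, t.pos_) for t in doc]`. It assumes that NLTK's grammar reader accepts terminals in either single or double quotes, and switches to single quotes for tokens that contain a double quote.

```python
def lexical_rules(tagged_tokens):
    """Build NLTK-style lexical rules from (text, pos) pairs.

    One rule per POS tag, listing each distinct word once, in order of
    first appearance (the same behaviour as the loop in tree_generator).
    """
    lexicon = {}
    for text, pos in tagged_tokens:
        # Assumption: a terminal may be quoted with ' or "; pick the one
        # not contained in the token (tokens with both are not handled).
        word = f"'{text}'" if '"' in text else f'"{text}"'
        words = lexicon.setdefault(pos, [])
        if word not in words:
            words.append(word)
    return "".join(f"{pos} -> {' | '.join(words)}\n" for pos, words in lexicon.items())


tagged = [("The", "DET"), ("red", "ADJ"), ("car", "NOUN"),
          ("runs", "VERB"), ("quickly", "ADV"), (".", "PUNCT")]
print(lexical_rules(tagged))
```

Appending the returned string to the base grammar gives the same input that `tree_generator` passes to `nltk.CFG.fromstring`.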

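```python
# Base CFG shared by all four languages. NP accepts adjectives on either side
# of the noun (ADJ NP for English and German, NP ADJ for Italian and French).
# Lexical rules such as NOUN -> "car" are appended per sentence by tree_generator.
grammar = """
S -> NP VP PUNCT | NP VP | PUNCT NP VP PUNCT
NP -> NOUN | NP ADJ | DET NP | ADJ NP
VP -> VP NP | VERB | VP ADV | VP PUNCT
"""
```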

## ITALIAN

```python
file = [
    "L'automobile rossa corre velocemente."
]

# Load the small Italian spaCy pipeline, used here for POS tagging
nlp = spacy.load("it_core_news_sm")

tree_generator(file, grammar, nlp)
```

```text
L' DET
automobile NOUN
rossa ADJ
corre VERB
velocemente ADV
. PUNCT
L'automobile rossa corre velocemente.

(S
  (NP (NP (DET L') (NP (NOUN automobile))) (ADJ rossa))
  (VP (VP (VERB corre)) (ADV velocemente))
  (PUNCT .))
                     S                              
             ________|___________________________    
            NP                  |                |  
      ______|________           |                |   
     NP              |          VP               |  
  ___|______         |      ____|_______         |   
 |          NP       |     VP           |        |  
 |          |        |     |            |        |   
DET        NOUN     ADJ   VERB         ADV     PUNCT
 |          |        |     |            |        |   
 L'     automobile rossa corre     velocemente   .  
```



## ENGLISH

```python
file = [
    "The red car runs quickly.",
]

# Load the small English spaCy pipeline, used here for POS tagging
nlp = spacy.load("en_core_web_sm")

tree_generator(file, grammar, nlp)
```

```text
The DET
red ADJ
car NOUN
runs VERB
quickly ADV
. PUNCT
The red car runs quickly.

(S
  (NP (DET The) (NP (ADJ red) (NP (NOUN car))))
  (VP (VP (VERB runs)) (ADV quickly))
  (PUNCT .))
             S                         
      _______|______________________    
     NP                |            |  
  ___|___              |            |   
 |       NP            VP           |  
 |    ___|___      ____|_____       |   
 |   |       NP   VP         |      |  
 |   |       |    |          |      |   
DET ADJ     NOUN VERB       ADV   PUNCT
 |   |       |    |          |      |   
The red     car  runs     quickly   .  
```



## GERMAN

```python
file = [
    "Die hohe Palme schwankt sanft."
]

# Load the small German spaCy pipeline, used here for POS tagging
nlp = spacy.load("de_core_news_sm")

tree_generator(file, grammar, nlp)
```

```text
Die DET
hohe ADJ
Palme NOUN
schwankt VERB
sanft ADV
. PUNCT
Die hohe Palme schwankt sanft.

(S
  (NP (DET Die) (NP (ADJ hohe) (NP (NOUN Palme))))
  (VP (VP (VERB schwankt)) (ADV sanft))
  (PUNCT .))
               S                           
      _________|________________________    
     NP                      |          |  
  ___|____                   |          |   
 |        NP                 VP         |  
 |    ____|____        ______|____      |   
 |   |         NP     VP          |     |  
 |   |         |      |           |     |   
DET ADJ       NOUN   VERB        ADV  PUNCT
 |   |         |      |           |     |   
Die hohe     Palme schwankt     sanft   .  
```



## FRENCH

```python
file = [
    "Le chat noir dormir tranquillement."
]

# Load the small French spaCy pipeline, used here for POS tagging
nlp = spacy.load("fr_core_news_sm")

tree_generator(file, grammar, nlp)
```

```text
Le DET
chat NOUN
noir ADJ
dormir VERB
tranquillement ADV
. PUNCT
Le chat noir dormir tranquillement.

(S
  (NP (NP (DET Le) (NP (NOUN chat))) (ADJ noir))
  (VP (VP (VERB dormir)) (ADV tranquillement))
  (PUNCT .))
              S                                  
          ____|_______________________________    
         NP               |                   |  
      ___|____            |                   |   
     NP       |           VP                  |  
  ___|___     |      _____|________           |   
 |       NP   |     VP             |          |  
 |       |    |     |              |          |   
DET     NOUN ADJ   VERB           ADV       PUNCT
 |       |    |     |              |          |   
 Le     chat noir dormir     tranquillement   .  
```

