# Morphological analysis

One of the first tasks in morphological analysis is to detect the language of the message. This stage generally underpins most of the processing that can be carried out at higher levels.


In [2]:
# A simple example of a language-detection API

from langdetect import detect, DetectorFactory  # langdetect supports 55 languages out of the box
DetectorFactory.seed = 0  # fix the seed so that results are reproducible

print("First example: ", detect("War doesn't show who's right, just who's left."))

print("Second example: ", detect("Ein, zwei, drei, vier"))

First example:  en
Second example:  de


More details on this library are available [here](https://github.com/Mimino666/langdetect/blob/master/README.md).
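
As a rough intuition for why some detections are easy, note that the writing system alone already separates many languages. The sketch below is a deliberately naive stand-in and is not how `langdetect` works internally: it only looks up the Unicode name of each letter with the standard `unicodedata` module and keeps the most frequent script. The helper name `dominant_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script among the letters of `text`.

    Assumption: the first word of a character's Unicode name
    (e.g. 'LATIN', 'ARABIC', 'CJK') is a good enough proxy for its script.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skip spaces, digits and punctuation
            name = unicodedata.name(ch, "UNKNOWN")
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


for sample in ["Ein, zwei, drei, vier", "我们通过练习和错误学习"]:
    print(sample, "->", dominant_script(sample))
```

A script check like this can tell Arabic or Chinese apart from Latin-script text, but it cannot distinguish English, French and Spanish from one another. That is where a statistical detector such as `langdetect` is needed.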

## Exercise 1


In this section, we'll be looking at the **spaCy** library.

Using [this page](https://applied-language-technology.mooc.fi/html/notebooks/part_ii/03_basic_nlp.html#morphological-analysis) as a reference, students are requested to:

- Consider the following sentences:
    1. "إننا نتعلم من خلال الممارسة والخطأ"
    2. "It’s by practicing and mistaking that we learn"
    3. "C'est en pratiquant et en se trompant qu'on apprend"
    4. "Es practicando y equivocándonos que aprendemos"
    5. "我们通过练习和错误学习"

- For each sentence, detect its language (the solution below uses `langdetect` alongside the spaCy library). Then, for the English sentence:

    1. List the different morphemes present in the sentence
    2. Specify the aspects of the third morpheme
    3. Return a dictionary of this phrase

In [5]:
import spacy
from langdetect import detect

# Load spaCy's small English pipeline
nlp = spacy.load("en_core_web_sm")

# Sentences in different languages
sentences = [
    "إننا نتعلم من خلال الممارسة والخطأ",
    "It’s by practicing and mistaking that we learn",
    "C'est en pratiquant et en se trompant qu'on apprend",
    "Es practicando y equivocándonos que aprendemos",
    "我们通过练习和错误学习"
]

# Dictionary to store results
results = {}

# Loop through each sentence
for sentence in sentences:
    # Detect language
    language = detect(sentence)
    print(f"Detected language for sentence: '{sentence}' is {language}")

    # Only process the English sentence for further analysis
    if language == 'en':
        # Process the sentence with spaCy
        doc = nlp(sentence)

        # Extract the token texts (spaCy tokens are used here as a stand-in for morphemes)
        morphemes = [token.text for token in doc]
        print("Morphemes in the English sentence:", morphemes)

        # Get the third morpheme and its aspects (left empty if the sentence is too short)
        aspects = {}
        if len(morphemes) >= 3:
            third_morpheme = doc[2]
            aspects = {
                "text": third_morpheme.text,
                "lemma": third_morpheme.lemma_,
                "POS": third_morpheme.pos_,
                "dependency": third_morpheme.dep_,
                "shape": third_morpheme.shape_,
                "is_alpha": third_morpheme.is_alpha,
                "is_stop": third_morpheme.is_stop
            }
            print("Aspects of the third morpheme:", aspects)

        # Create a dictionary representation of the sentence
        sentence_dict = {token.text: {"lemma": token.lemma_, "POS": token.pos_, "dependency": token.dep_} for token in doc}
        print("Dictionary representation of the English sentence:", sentence_dict)

        # Store the results for the English sentence
        results["English"] = {
            "morphemes": morphemes,
            "third_morpheme_aspects": aspects,
            "sentence_dict": sentence_dict
        }

print("Results:", results)



Detected language for sentence: 'إننا نتعلم من خلال الممارسة والخطأ' is ar
Detected language for sentence: 'It’s by practicing and mistaking that we learn' is en
Morphemes in the English sentence: ['It', '’s', 'by', 'practicing', 'and', 'mistaking', 'that', 'we', 'learn']
Aspects of the third morpheme: {'text': 'by', 'lemma': 'by', 'POS': 'ADP', 'dependency': 'prep', 'shape': 'xx', 'is_alpha': True, 'is_stop': True}
Dictionary representation of the English sentence: {'It': {'lemma': 'it', 'POS': 'PRON', 'dependency': 'nsubj'}, '’s': {'lemma': '’', 'POS': 'VERB', 'dependency': 'ROOT'}, 'by': {'lemma': 'by', 'POS': 'ADP', 'dependency': 'prep'}, 'practicing': {'lemma': 'practice', 'POS': 'VERB', 'dependency': 'pcomp'}, 'and': {'lemma': 'and', 'POS': 'CCONJ', 'dependency': 'cc'}, 'mistaking': {'lemma': 'mistake', 'POS': 'VERB', 'dependency': 'conj'}, 'that': {'lemma': 'that', 'POS': 'PRON', 'dependency': 'dobj'}, 'we': {'lemma': 'we', 'POS': 'PRON', 'dependency': 'nsubj'}, 'learn': {'lemma
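
One caveat about the dictionary built above: it is keyed by token text, so a word that occurs twice in a sentence keeps only its last entry. The sketch below illustrates this in plain Python, using a whitespace split as a stand-in for spaCy tokens (an assumption made only to keep the example free of model downloads); the variable names are hypothetical.

```python
# Whitespace split stands in for spaCy tokenization (assumption for illustration)
tokens = "the fox jumps over the dog".split()

# Keyed by text: the second "the" overwrites the first one
by_text = {tok: {"position": i} for i, tok in enumerate(tokens)}

# One record per token: nothing is lost, order is preserved
by_position = [{"text": tok, "position": i} for i, tok in enumerate(tokens)]

print(len(tokens), len(by_text), len(by_position))
```

For sentences with repeated words, a list of per-token records (or a dictionary keyed by token index) is therefore a safer representation.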

In addition, the same tasks can be performed with a dedicated library called Polyglot (a Python package, typically installed on Linux), which covers the tasks described above and many others, such as tokenization, part-of-speech tagging, named entity recognition, polarity detection, embeddings and transliteration. Examples can be found on [the PyPI page](https://pypi.org/project/polyglot/) or on [this site, which presents the same tasks](https://www.geeksforgeeks.org/natural-language-processing-using-polyglot-introduction/). Finally, we can go even further with Morfessor: see [this link](https://polyglot.readthedocs.io/en/latest/MorphologicalAnalysis.html) and [this one](http://aayushsanghavi.blogspot.com/2018/03/morphological-segmentation-of-words.html).

# Syntactic analysis

## Syntactic analysis with NLTK

In this introductory section, we first look at a simple example of syntax tree generation using the NLTK library.

### Example 1

In [None]:
import nltk
regles_grammaire = nltk.CFG.fromstring("""
 S -> NP VP
 PP -> P NP
 NP -> Det N | Det N PP | 'I'
 VP -> V NP | VP PP
 Det -> 'a' | 'my'
 N -> 'mouse' | 'closet'
 V -> 'found'
 P -> 'in'
 """)

sent = ['I', 'found', 'a', 'mouse', 'in', 'my', 'closet']
parser = nltk.ChartParser(regles_grammaire)
for tree in parser.parse(sent):
    print(tree)
        

Go further (by exploring the notion of *context-free grammar*) with [this example](https://www.nltk.org/book/ch08.html).
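
The grammar of Example 1 is ambiguous: "in my closet" can attach either to the noun phrase ("a mouse in my closet") or to the verb phrase ("found ... in my closet"), so the parser yields more than one tree. The sketch below is a plain-Python illustration of that point, not NLTK's chart parser: it re-encodes the same rules as a dictionary and counts the distinct derivations with memoized recursion. The names `GRAMMAR` and `count_parses` are hypothetical, and the approach assumes a grammar with no empty productions and no cyclic unit rules.

```python
from functools import lru_cache

# Same rules as in Example 1; any symbol that is not a key is a terminal (a word)
GRAMMAR = {
    "S": [("NP", "VP")],
    "PP": [("P", "NP")],
    "NP": [("Det", "N"), ("Det", "N", "PP"), ("I",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "Det": [("a",), ("my",)],
    "N": [("mouse",), ("closet",)],
    "V": [("found",)],
    "P": [("in",)],
}


def count_parses(tokens, start="S"):
    """Count the parse trees that derive `tokens` from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # Number of ways `symbol` can derive tokens[i:j]
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # Number of ways the symbols in `rhs` can jointly derive tokens[i:j]
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The first symbol covers tokens[i:k]; every remaining symbol
        # needs at least one token, which bounds k from above.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count_symbol(rhs[0], i, k)
            if left:
                total += left * count_sequence(rhs[1:], k, j)
        return total

    return count_sequence((start,), 0, len(tokens))


print(count_parses("I found a mouse in my closet".split()))
```

Counting derivations this way is a quick check when you modify the rules: a count of 0 means the sentence is not covered, and a count above 1 means the grammar is ambiguous for that sentence.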

## Exercise 2

Starting from the previous example, extend the grammar rules so that they cover the following sentences:

- I found a mouse in my closet 
- The fox jumps over the dog
- I bought a toy for my son
- I am in the classroom

In [None]:
#Copy your solution here
#Let's do it

### Example 2

In [2]:
# Import the required libraries
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize, RegexpParser

# Text example
sample_text = "The quick brown fox jumps over the lazy dog"

# Tag each token of the sentence with its part of speech
tagged = pos_tag(word_tokenize(sample_text))

# Define a cascaded chunk grammar: later rules can reuse the chunks built by earlier ones
chunker = RegexpParser("""NP: {<DT>?<JJ>*<NN>} #To extract Noun Phrases
P: {<IN>} #To extract Prepositions
V: {<V.*>} #To extract Verbs
PP: {<P> <NP>} #To extract Prepositional Phrases
VP: {<V> <NP|PP>*} #To extract Verb Phrases """)

# Apply the chunk grammar to the tagged sentence
output = chunker.parse(tagged)
print("After Extracting\n", output)

# Draw the tree (see the window that appears)
output.draw()


You can go further by exploring NLTK's Treebank library. Here is an introductory [link](https://www.nltk.org/) and a [video](https://www.youtube.com/watch?v=V19xvvjWPmE&ab_channel=HugoLarochelle) to give you a better idea. _**By the way, the author of this channel has posted some other very interesting NLP videos online. Check them out ;)**_
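
To make the tag patterns of Example 2 less mysterious, here is a minimal sketch of the idea behind a rule such as `NP: {<DT>?<JJ>*<NN>}`: the tags are written out as one string and an ordinary regular expression is run over it. This is only similar in spirit to what `RegexpParser` does and is not NLTK's actual code. The tags are written by hand (an assumption: the real tagger may label some words differently), and the helper `chunk` is hypothetical.

```python
import re

# Hand-written POS tags (assumption: pos_tag may return slightly different labels)
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]


def chunk(tagged_tokens, tag_pattern):
    """Return the word groups whose tag sequence matches `tag_pattern`."""
    # Encode the tag sequence as "<DT><JJ><JJ><NN>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for match in re.finditer(tag_pattern, tag_string):
        # Each "<" before the match corresponds to one earlier token
        first = tag_string[:match.start()].count("<")
        length = match.group().count("<")
        chunks.append([word for word, _ in tagged_tokens[first:first + length]])
    return chunks


# Optional determiner, any number of adjectives, then a noun
NP_PATTERN = r"(?:<DT>)?(?:<JJ>)*<NN>"
noun_phrases = chunk(tagged, NP_PATTERN)
print(noun_phrases)
```

Unlike this sketch, the cascaded grammar in Example 2 also lets later rules (`PP`, `VP`) match the labels of chunks produced by earlier rules, which is why the label `<P>` must be spelled exactly as it was declared.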

## Exercise 3 
***

Let's go back to our first link [spaCy](https://applied-language-technology.mooc.fi/html/notebooks/part_ii/03_basic_nlp.html#morphological-analysis). You are required to:
- Display the syntactic dependencies of the following quotations:
    1. "Motivation is what gets you started, habit is what keeps you going." ~Jim Ryun
    2. "Dream big and never give up."
- Display the dependency trees (syntax trees) using spaCy's **displaCy** visualizer.

In [3]:
#Let's do it
#Copy your solution here
#...