# Dolores Version 01
Parse a natural-language sentence and store the parsed tree in a graph database.   
Ask a question in natural language, parse the question, and retrieve the answer from the graph database.    
Christoph Windheuser, May 24, 2020

## Parse a sentence
Use the Google Cloud Natural Language API for this.    
Documentation on syntax analysis: https://cloud.google.com/natural-language/docs/analyzing-syntax#language-syntax-string-python    
Documentation on the parsing: https://cloud.google.com/natural-language/docs/morphology

In [1]:
from google.cloud import language_v1
# Note: the enums submodule is only available in google-cloud-language versions before 2.0.
from google.cloud.language_v1 import enums

def sample_analyze_syntax(text_content):
    """
    Analyze the syntax of a string and print the details of each token.

    Args:
      text_content: The text content to analyze.
    """

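    # Alternative: rely on application default credentials.
    # client = language_v1.LanguageServiceClient()
    # Here the client is created from a local service account key file.
    client = language_v1.LanguageServiceClient.from_service_account_json('/Users/cwindheu/Dev/hobby-projects/neobot/env/KnowledgeGraphChatbot-fc85f16b2135.json')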
 
    # text_content = 'This is a short sentence.'

    # Available types: PLAIN_TEXT, HTML
    type_ = enums.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {"content": text_content, "type": type_, "language": language}

    # Available values: NONE, UTF8, UTF16, UTF32
    encoding_type = enums.EncodingType.UTF8

    response = client.analyze_syntax(document, encoding_type=encoding_type)
    # Loop through tokens returned from the API
    for token in response.tokens:
        # Get the text content of this token. Usually a word or punctuation.
        text = token.text
        print(u"Token text: {}".format(text.content))
        print(
            u"Location of this token in overall document: {}".format(text.begin_offset)
        )
        # Get the part of speech information for this token.
        # Parts of speech are as defined in:
        # http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf
        part_of_speech = token.part_of_speech
        # Get the tag, e.g. NOUN, or ADJ for adjective.
        print(
            u"Part of Speech tag: {}".format(
                enums.PartOfSpeech.Tag(part_of_speech.tag).name
            )
        )
        # Get the voice, e.g. ACTIVE or PASSIVE
        print(u"Voice: {}".format(enums.PartOfSpeech.Voice(part_of_speech.voice).name))
        # Get the tense, e.g. PAST, FUTURE, or PRESENT.
        print(u"Tense: {}".format(enums.PartOfSpeech.Tense(part_of_speech.tense).name))
        # See API reference for additional Part of Speech information available
        # Get the lemma of the token. For a description of lemmas, see:
        # https://en.wikipedia.org/wiki/Lemma_(morphology)
        print(u"Lemma: {}".format(token.lemma))
        # Get the dependency tree parse information for this token.
        # For more information on dependency labels:
        # http://www.aclweb.org/anthology/P13-2017
        dependency_edge = token.dependency_edge
        print(u"Head token index: {}".format(dependency_edge.head_token_index))
        print(
            u"Label: {}".format(enums.DependencyEdge.Label(dependency_edge.label).name)
        )
        # Print an empty line between tokens.
        print()

    # Get the language of the text, which will be the same as
    # the language specified in the request or, if not specified,
    # the automatically-detected language.
    print(u"Language of the text: {}".format(response.language))
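
The function above only prints each token. To store the parse in a graph database, the head token index and the dependency label of every token first have to be turned into edges. The following is a minimal sketch of that step, not part of the original notebook: the `Token` tuple and the `dependency_edges` helper are hypothetical names, and the token records are written by hand to mirror the fields the API returns for "John has a green car." (the punctuation token is left out for brevity). It relies on the API convention that the root token's head index is its own index.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical, simplified stand-in for one token of the API response."""
    text: str
    head_index: int  # position of the head token in the token list
    label: str       # dependency label, e.g. NSUBJ or DOBJ


def dependency_edges(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head text, label, dependent text) triples for all non-root tokens."""
    edges = []
    for index, token in enumerate(tokens):
        # The root token points to itself, so it contributes no edge.
        if token.head_index == index:
            continue
        head = tokens[token.head_index]
        edges.append((head.text, token.label, token.text))
    return edges


# Hand-written records mirroring the fields printed by sample_analyze_syntax.
tokens = [
    Token("John", 1, "NSUBJ"),
    Token("has", 1, "ROOT"),
    Token("a", 4, "DET"),
    Token("green", 4, "AMOD"),
    Token("car", 1, "DOBJ"),
]

edges = dependency_edges(tokens)
for head, label, dependent in edges:
    print("{} -[{}]-> {}".format(head, label, dependent))
```

Each triple could then become one relationship between two word nodes in the graph database. Using the token text as the node key is a simplification; a sentence with repeated words would need the token index as part of the key.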


In [2]:
sample_analyze_syntax("John has a green car.")


Token text: John
Location of this token in overall document: 0
Part of Speech tag: NOUN
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: John
Head token index: 1
Label: NSUBJ

Token text: has
Location of this token in overall document: 5
Part of Speech tag: VERB
Voice: VOICE_UNKNOWN
Tense: PRESENT
Lemma: have
Head token index: 1
Label: ROOT

Token text: a
Location of this token in overall document: 9
Part of Speech tag: DET
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: a
Head token index: 4
Label: DET

Token text: green
Location of this token in overall document: 11
Part of Speech tag: ADJ
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: green
Head token index: 4
Label: AMOD

Token text: car
Location of this token in overall document: 17
Part of Speech tag: NOUN
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: car
Head token index: 1
Label: DOBJ

Token text: .
Location of this token in overall document: 20
Part of Speech tag: PUNCT
Voice: VOICE_UNKNOWN
Tense: TENSE_UNKNOWN
Lemma: .
