```text
SPDX-FileCopyrightText: 2023 Google LLC
SPDX-License-Identifier: Apache-2.0
Author: Laurent Picard (https://github.com/PicardParis)
```

# 🤯 Using the Natural Language API with Python

<center>
<table><tr><td>
<img src="pics/natural_language_api.png" style="height:200px" height="200" />
</td></tr></table>
<table><tr>
<td><a href="https://colab.research.google.com/github/PicardParis/cloud-snippets/blob/main/python/colab/Using the Natural Language API with Python.ipynb">
<img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo" align="center"> Run in Colab
</a></td>
<td><a href="https://github.com/PicardParis/cloud-snippets/blob/main/python/colab/Using the Natural Language API with Python.ipynb">
<img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo" align="center"> View on GitHub
</a></td>
</tr></table>
</center>

The [Natural Language API](https://cloud.google.com/natural-language/docs/) lets you extract information from unstructured text using Google machine learning. In this tutorial, you'll focus on using its Python client library to perform the following:

- Sentiment analysis
- Entity analysis
- Syntax analysis
- Content classification

This notebook requires a Google Cloud project:

- If needed, [create a new Google Cloud project](https://console.cloud.google.com/cloud-resource-manager).
- Make sure that billing is enabled for your project.
- It uses billable services but should not generate any cost (see the Natural Language API [free monthly thresholds](https://cloud.google.com/natural-language/pricing)).

> This Colab notebook is a port of a tutorial originally published on [Google Developers Codelabs](https://codelabs.developers.google.com/codelabs/cloud-natural-language-python3).

---


In [None]:
# @title ⚙️ Enter your project ID and launch "Run all" {display-mode: "form"}
PROJECT_ID = ""  # @param {type:"string"}
assert PROJECT_ID, "❌ Please enter your project ID"
print(f'✔️ PROJECT_ID: "{PROJECT_ID}"')

- Launch _Runtime_ > _Run all_.
- The first time, you'll need to allow access to your Google credentials. Select your Google Cloud account. It should have rights to this project.
- If the setup step installs packages and restarts, launch "Run all" again.


---
## ✔️ Setup

In [None]:
# @title a. Check packages (may restart) {display-mode: "form"}
import sys
from importlib.metadata import PackageNotFoundError, version

from IPython.core.getipython import get_ipython

# Services needed for this lab (with minimum major version)
GOOGLE_CLOUD_SERVICES = [
    ("language", 2),
]
APIS = [f"{service}.googleapis.com" for service, _ in GOOGLE_CLOUD_SERVICES]

# Check runtime
running_in_colab = "google.colab" in sys.modules
assert running_in_colab, "❌ The notebook was not tested outside of Colab"
print("✔️ Running in Colab")

# Check packages
packages = []
for service, min_major in GOOGLE_CLOUD_SERVICES:
    package = f"google-cloud-{service}"
    lib = f"google.cloud.{service}"
    try:
        lib_version = version(lib)
        lib_major = int(lib_version.split(".")[0])
        if min_major <= lib_major:
            print(f"✔️ {lib}=={lib_version}")
            # Note: Assumes (min_major < lib_major) versions are non-breaking
            continue
        packages.append(package)
        print(f"📦️ {package} to be updated…")
    except PackageNotFoundError:
        packages.append(package)
        print(f"📦️ {package} to be installed…")

if packages:
    # Install and restart
    requirements = " ".join(packages)
    %pip install --upgrade $requirements --quiet
    if instance := get_ipython():
        instance.kernel.do_shutdown(True)
    raise RuntimeWarning("🔄 Restarting… (run the cell again)")

✔️ Running in Colab
✔️ google.cloud.language==2.9.1


In [None]:
# @title b. Check authentication {display-mode: "form"}
from google.colab import auth as google_auth

google_auth.authenticate_user(project_id=PROJECT_ID)
print("✔️ Authenticated")

✔️ Authenticated


In [None]:
# @title c. Check project APIs {display-mode: "form"}
res = !gcloud services list --enabled --format "value(config.name)"

apis_to_enable = ""
for api in APIS:
    if api in res:
        print(f"✔️ {api} is enabled")
    else:
        apis_to_enable += f"{api} "

if apis_to_enable:
    print(f"🔓 Enabling {apis_to_enable}…")
    # Enable every missing API (not just the last one checked in the loop)
    !gcloud services enable $apis_to_enable
elif not APIS:
    print("✔️ No specific API needed")

✔️ language.googleapis.com is enabled


In [None]:
# @title d. Define helpers {display-mode: "form"}
import pandas as pd
from IPython.display import display


def show_table(columns, data, formats=None, remove_empty_columns=False):
    df = pd.DataFrame(columns=columns, data=data)
    if remove_empty_columns:
        empty_cols = [col for col in df if df[col].eq("").all()]
        df.drop(empty_cols, axis=1, inplace=True)
    # Customize formatting
    styler = df.style
    if formats:
        styler.format(formats)
    # Left-align string columns
    df = df.convert_dtypes()
    str_cols = list(df.select_dtypes("string").keys())
    styler = styler.set_properties(subset=str_cols, **{"text-align": "left"})
    # Center headers
    styler.set_table_styles([{"selector": "th", "props": [("text-align", "center")]}])
    styler.hide()
    display(styler)

---
## 🐍 Using the Python client library

You can use the Natural Language API in Python with the client library `google-cloud-language`. The previous step already checked its installation. Import the client library:


In [None]:
from google.cloud import language

---
## 1️⃣ Sentiment analysis

Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within it, classifying the expressed sentiment as positive, negative, or neutral. It is performed with the `analyze_sentiment` method, at both the sentence and the document levels.


In [None]:
# @title `analyze_text_sentiment`
def analyze_text_sentiment(text: str) -> language.AnalyzeSentimentResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_sentiment(document=document)

In [None]:
# @title `show_text_sentiment` {display-mode: "form"}
def show_text_sentiment(response: language.AnalyzeSentimentResponse):
    columns = ["score", "sentence"]
    data = [(s.sentiment.score, s.text.content) for s in response.sentences]
    formats = {"score": "{:+.1f}"}
    print("At sentence level:")
    show_table(columns, data, formats)

    sentiment = response.document_sentiment
    columns = ["score", "magnitude", "language"]
    data = [(sentiment.score, sentiment.magnitude, response.language)]
    formats = {"score": "{:+.1f}", "magnitude": "{:.1f}"}
    print("")
    print("At document level:")
    show_table(columns, data, formats)

In [None]:
# @title Input {display-mode: "form"}
text = "Python is a very readable language, which makes it easy to understand and maintain code. It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks. One disadvantage is its speed: it's not as fast as some other programming languages."  # @param {type:"string"}

In [None]:
# @title Analysis
# Send a request to the API
analyze_sentiment_response = analyze_text_sentiment(text)

In [None]:
# @title Output
show_text_sentiment(analyze_sentiment_response)

At sentence level:


score,sentence
0.8,"Python is a very readable language, which makes it easy to understand and maintain code."
0.9,"It's simple, very flexible, easy to learn, and suitable for a wide variety of tasks."
-0.4,One disadvantage is its speed: it's not as fast as some other programming languages.



At document level:


score,magnitude,language
0.4,2.2,en


Notes:

- For information on which languages are supported by the Natural Language API, see [Language Support](https://cloud.google.com/natural-language/docs/languages#sentiment_analysis).
- The `score` of the sentiment ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall sentiment from the given information.
- The `magnitude` of the sentiment ranges from 0.0 to +infinity and indicates the overall strength of sentiment from the given information. The more information provided, the higher the magnitude.
- For more information on how to interpret the `score` and `magnitude` sentiment values included in the analysis, see [Interpreting sentiment analysis values](https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values).
- Each API response returns the document's automatically detected language (as an ISO 639-1 code). It is shown here and omitted from the following analysis examples.
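
To make the interplay between `score` and `magnitude` more concrete, here is a minimal sketch that turns a document-level sentiment into a rough label. The function name `describe_sentiment` and both thresholds are assumptions chosen for illustration; they are not part of the API and should be tuned for your own content.

```python
def describe_sentiment(score: float, magnitude: float) -> str:
    """Return a rough label for a document-level sentiment.

    The thresholds below are illustrative assumptions, not API constants.
    """
    score_threshold = 0.25  # Assumption: |score| below this is close to neutral
    magnitude_threshold = 1.0  # Assumption: above this, emotion is noticeable

    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Score near zero: use the magnitude to tell a low-emotion text apart
    # from a text whose positive and negative sentiments cancel out
    return "mixed" if magnitude >= magnitude_threshold else "neutral"


# Document-level values taken from the output above
print(describe_sentiment(score=0.4, magnitude=2.2))  # positive
```

A score near zero is ambiguous on its own, which is why the sketch only consults the magnitude in that case.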


---
## 2️⃣ Entity analysis

Entity analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities. It is performed with the `analyze_entities` method.


In [None]:
# @title `analyze_text_entities`
def analyze_text_entities(text: str) -> language.AnalyzeEntitiesResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_entities(document=document)

In [None]:
# @title `show_text_entities` {display-mode: "form"}
def show_text_entities(response: language.AnalyzeEntitiesResponse):
    columns = ("name", "type", "salience", "mid", "wikipedia_url")
    data = (
        (
            entity.name,
            entity.type_.name,
            entity.salience,
            entity.metadata.get("mid", ""),
            entity.metadata.get("wikipedia_url", ""),
        )
        for entity in response.entities
    )
    formats = {"salience": "{:.1%}"}
    show_table(columns, data, formats)

In [None]:
# @title Input {display-mode: "form"}
text = "Guido van Rossum is best known as the creator of Python, which he named after the Monty Python comedy troupe. He was born in Haarlem, Netherlands."  # @param {type:"string"}

In [None]:
# @title Analysis
# Send a request to the API
analyze_entities_response = analyze_text_entities(text)

In [None]:
# @title Output
show_text_entities(analyze_entities_response)

name,type,salience,mid,wikipedia_url
Guido van Rossum,PERSON,49.8%,/m/01h05c,https://en.wikipedia.org/wiki/Guido_van_Rossum
Python,ORGANIZATION,38.4%,/m/05z1_,https://en.wikipedia.org/wiki/Python_(programming_language)
creator,PERSON,5.1%,,
Monty Python,PERSON,3.2%,/m/04sd0,https://en.wikipedia.org/wiki/Monty_Python
comedy troupe,PERSON,1.6%,,
Haarlem,LOCATION,1.0%,/m/0h095,https://en.wikipedia.org/wiki/Haarlem
Netherlands,LOCATION,0.7%,/m/059j2,https://en.wikipedia.org/wiki/Netherlands


Notes:

- For information on which languages are supported by this method, see [Language Support](https://cloud.google.com/natural-language/docs/languages#entity_analysis).
- The `type` of the entity is an enum that lets you classify or differentiate entities. For example, it can help distinguish the similarly named entities _“T.E. Lawrence”_ (a `PERSON`) and _“Lawrence of Arabia”_ (the film, tagged as a `WORK_OF_ART`). See [`Entity.Type`](https://cloud.google.com/python/docs/reference/language/latest/google.cloud.language_v1.types.Entity.Type).
- The entity `salience` indicates the importance or relevance of this entity to the entire document text. This score can assist information retrieval and summarization by prioritizing salient entities. Scores closer to 0.0 are less important, while scores closer to 1.0 are highly important.
- For more information, see [Entity analysis](https://cloud.google.com/natural-language/docs/basics#entity_analysis).
- You can also combine both entity analysis and sentiment analysis with the `analyze_entity_sentiment` method. See [Entity sentiment analysis](https://cloud.google.com/natural-language/docs/basics#entity_analysis).
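
As a small illustration of prioritizing salient entities, the sketch below ranks `(name, salience)` pairs and keeps those above a cutoff. The pairs are copied from the output table above; the helper name `salient_entities` and the `min_salience` cutoff are hypothetical and only serve the example.

```python
# (name, salience) pairs copied from the output table above
entities = [
    ("Guido van Rossum", 0.498),
    ("Python", 0.384),
    ("creator", 0.051),
    ("Monty Python", 0.032),
    ("comedy troupe", 0.016),
    ("Haarlem", 0.010),
    ("Netherlands", 0.007),
]


def salient_entities(
    entities: list[tuple[str, float]],
    min_salience: float = 0.1,  # Hypothetical cutoff, tune for your use case
) -> list[str]:
    """Return entity names at or above the cutoff, most salient first."""
    ranked = sorted(entities, key=lambda entity: entity[1], reverse=True)
    return [name for name, salience in ranked if salience >= min_salience]


print(salient_entities(entities))  # ['Guido van Rossum', 'Python']
```

With a real response, the same pairs could be built from `entity.name` and `entity.salience` for each entity in `response.entities`.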


---
## 3️⃣ Syntax analysis

Syntax analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally based on word boundaries), providing further analysis on those tokens. It is performed with the `analyze_syntax` method.


In [None]:
# @title `analyze_text_syntax`
def analyze_text_syntax(text: str) -> language.AnalyzeSyntaxResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.analyze_syntax(document=document)

In [None]:
# @title `show_text_syntax` {display-mode: "form"}
def get_token_info(token: language.Token | None) -> list[str]:
    parts = [
        "tag",
        "aspect",
        "case",
        "form",
        "gender",
        "mood",
        "number",
        "person",
        "proper",
        "reciprocity",
        "tense",
        "voice",
    ]
    if not token:
        return ["token", "lemma"] + parts

    text = token.text.content
    lemma = token.lemma if token.lemma != token.text.content else ""
    info = [text, lemma]
    for part in parts:
        pos = token.part_of_speech
        info.append(getattr(pos, part).name if part in pos else "")

    return info


def show_text_syntax(response: language.AnalyzeSyntaxResponse):
    tokens = len(response.tokens)
    sentences = len(response.sentences)
    columns = get_token_info(None)
    data = (get_token_info(token) for token in response.tokens)

    print(f"Analyzed {tokens} token(s) from {sentences} sentence(s)")
    show_table(columns, data, remove_empty_columns=True)

In [None]:
# @title Input {display-mode: "form"}
text = "Guido van Rossum is best known as the creator of Python. He was born in Haarlem, Netherlands."  # @param {type:"string"}

In [None]:
# @title Analysis
# Send a request to the API
analyze_syntax_response = analyze_text_syntax(text)

In [None]:
# @title Output
show_text_syntax(analyze_syntax_response)

Analyzed 20 token(s) from 2 sentence(s)


token,lemma,tag,case,gender,mood,number,person,proper,tense,voice
Guido,,NOUN,,,,SINGULAR,,PROPER,,
van,,NOUN,,,,SINGULAR,,PROPER,,
Rossum,,NOUN,,,,SINGULAR,,PROPER,,
is,be,VERB,,,INDICATIVE,SINGULAR,THIRD,,PRESENT,
best,well,ADV,,,,,,,,
known,know,VERB,,,,,,,PAST,
as,,ADP,,,,,,,,
the,,DET,,,,,,,,
creator,,NOUN,,,,SINGULAR,,,,
of,,ADP,,,,,,,,


Syntax information has several uses. One of them is extracting lemmas: a `lemma` is the "root" word upon which a token is based, which lets you handle words by their canonical forms. In the table above, the `lemma` column is left empty when the lemma is identical to the token.
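
Here is a minimal sketch of why lemmas are convenient: counting words by canonical form instead of surface form. It works on plain `(token, lemma)` pairs where an empty lemma means "same as the token". The first four pairs come from the output table above; the pair `("was", "be")` is an assumption added for illustration, and the helper name `count_lemmas` is hypothetical.

```python
from collections import Counter

# (token, lemma) pairs; an empty lemma means "same as the token"
pairs = [
    ("is", "be"),
    ("best", "well"),
    ("known", "know"),
    ("creator", ""),
    ("was", "be"),  # Assumption: not shown in the output above
]


def count_lemmas(pairs: list[tuple[str, str]]) -> Counter:
    """Count occurrences of each canonical form (lowercased)."""
    return Counter((lemma or token).lower() for token, lemma in pairs)


counts = count_lemmas(pairs)
print(counts["be"])  # 2: "is" and "was" share the same lemma
```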

If you dive deeper into the response insights, you'll also find the relationships between the tokens. Here is a visual interpretation showing the complete syntax analysis for this example:

![Syntax Analysis](./pics/natural_language_syntax.png)


For more information, see the following:

- [`language.AnalyzeSyntaxResponse`](https://cloud.google.com/python/docs/reference/language/latest/google.cloud.language_v1.types.AnalyzeSyntaxResponse)
- [Language Support](https://cloud.google.com/natural-language/docs/languages#syntactic_analysis)
- [Syntactic analysis](https://cloud.google.com/natural-language/docs/basics#syntactic_analysis)
- [Morphology & Dependency Trees](https://cloud.google.com/natural-language/docs/morphology)
- Create your own parse trees with the online [Natural Language demo](https://cloud.google.com/natural-language/#natural-language-api-demo).


---
## 4️⃣ Content classification

Content classification analyzes a document and returns a list of content categories that apply to the text found in the document. It is performed with the `classify_text` method.


In [None]:
# @title `classify_text`
def classify_text(text: str) -> language.ClassifyTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.classify_text(document=document)

In [None]:
# @title `show_text_classification` {display-mode: "form"}
def show_text_classification(response: language.ClassifyTextResponse):
    columns = ["category", "confidence"]
    data = ((category.name, category.confidence) for category in response.categories)
    formats = {"confidence": "{:.0%}"}
    show_table(columns, data, formats)

In [None]:
# @title Input {display-mode: "form"}
text = "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace."  # @param {type:"string"}

> Important: You must supply a text block (document) with at least twenty tokens.
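
Since classification needs at least twenty tokens, a rough local pre-check can avoid sending texts that are obviously too short. The sketch below simply counts whitespace-separated words. This is an assumption, not the tokenizer used by the API: the syntax analysis output earlier reports 20 tokens for a 17-word text, which suggests the API also counts punctuation, so this estimate is likely on the low side. The helper names are hypothetical.

```python
MIN_TOKENS = 20  # From the note above: at least twenty tokens are required


def rough_token_count(text: str) -> int:
    """Approximate the token count by splitting on whitespace.

    Assumption: the API tokenizer likely returns more tokens than this
    (e.g. punctuation), so this is only a conservative local estimate.
    """
    return len(text.split())


def probably_long_enough(text: str) -> bool:
    """Return True if the rough estimate reaches the minimum token count."""
    return rough_token_count(text) >= MIN_TOKENS


print(probably_long_enough("Python is a programming language."))  # False
```

A text that fails this check may still be accepted by the API, so treat the result as a hint rather than a guarantee.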


In [None]:
# @title Analysis
# Send a request to the API
classify_text_response = classify_text(text)

In [None]:
# @title Output
show_text_classification(classify_text_response)

category,confidence
/Computers & Electronics/Programming,99%
/Science/Computer Science,99%


For more information, see the following docs:

- [`ClassifyTextResponse`](https://cloud.google.com/python/docs/reference/language/latest/google.cloud.language_v1.types.ClassifyTextResponse)
- [Language Support](https://cloud.google.com/natural-language/docs/languages#content_classification)
- [Content Classification](https://cloud.google.com/natural-language/docs/basics#content-classification)


---

## 🎉 Congratulations

You learned how to use the Natural Language API with Python!

<center>
<table><tr><td>
<img src="pics/natural_language_api.png" style="height:200px;" height="200" />
</td></tr></table>
<table><tr>
