# GCP | Natural Language API
This notebook serves as a tool to introduce and set up the Google Cloud Natural Language API. Everything from setting up the Anaconda environment to explaining what the API is doing behind the scenes is contained here. The sample code provided by Google is included as a `.py` file for each method. Unless noted otherwise, it is unchanged from Google's source code as of this writing. All links to source code are included in this notebook.

## Overview
The links below point to outside references that were used to build this notebook. Additional references are included on a per-section basis.
* [Google Natural Language Docs](https://cloud.google.com/natural-language/docs/how-to)
* Methods
    * [Analyzing Sentiment](https://cloud.google.com/natural-language/docs/analyzing-sentiment)
    * [Analyzing Entities](https://cloud.google.com/natural-language/docs/analyzing-entities)
    * Analyzing Entity Sentiment
    * Analyzing Syntax
    * Classifying Content
* [Google Cloud | Natural Language API Basics](https://cloud.google.com/natural-language/docs/basics)

## Getting Started
Before getting into the methods, it's best to understand what needs to be done so that Windows can access Google's Python API. It's not always intuitive, and some steps here may not be necessary, but this is what I did on a fresh install of Windows to get everything working. After installing the necessary packages, we will run the standard Python imports that are common across the API's methods. There are also some settings that are common to the methods. Those have been left in the sample code `.py` files but are discussed in a section here. The next step after the imports is to set up Google Cloud authentication. We will then load some sample text and move on to running the API.

### Basic Setup (Windows)
There is a really good intro to this on the [Google Cloud Python GitHub](https://github.com/googleapis/google-cloud-python/blob/master/language/README.rst).
To use the Google Cloud Python client libraries with Anaconda on Windows, run the following commands in the Anaconda Prompt:

```
conda install -c conda-forge google-cloud-core
conda install -c conda-forge google-api-python-client
conda install -c conda-forge google-cloud-language
```
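To confirm that the packages landed in the active environment, a small check like the one below can help. It is a sketch that uses only the standard library's `importlib.metadata` (Python 3.8+), and the helper names are hypothetical. It also flags whether the installed `google-cloud-language` still ships the `enums` module this notebook imports, on the assumption (per the library's 2.0 upgrade notes) that `enums` was removed in version 2.0.0.

```python
from importlib import metadata  # standard library, Python 3.8+

# Distribution names installed by the conda commands above.
PACKAGES = ("google-cloud-core", "google-api-python-client", "google-cloud-language")


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_enums_module(version):
    """Assumption: google-cloud-language removed `language_v1.enums` in 2.0.0."""
    if version is None:
        return False
    return int(version.split(".")[0]) < 2


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name}: {installed_version(name)}")
    print("enums available:", has_enums_module(installed_version("google-cloud-language")))
```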

### Python Imports
Each of the methods used in this API requires the same standard imports.

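```python
# package imports
# Note: the `enums` module exists in google-cloud-language 1.x. It was removed in 2.0,
# where the same values are reached through the types (e.g. language_v1.Document.Type).
from google.cloud import language_v1
from google.cloud.language_v1 import enums
```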

### Common Settings
The following parameters are presented here for reference only. Each method has its own references to this code so that the methods can be used independently.


```python 
client = language_v1.LanguageServiceClient()
```
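The client is the specific instance of the API that will perform the request. Creating the client loads your credentials (the service account key configured under Authentication), which determine the project whose access, quota, and billing apply. Those permissions are checked when a request is sent, which is why a `PermissionDenied` error shows up on the first API call rather than at this step.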

```python
# Available types: PLAIN_TEXT, HTML
type_ = enums.Document.Type.PLAIN_TEXT
```

```python
# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type": type_, "language": language}

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = enums.EncodingType.UTF8
```
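The `encoding_type` setting matters because the API reports positions such as `begin_offset` in units of the chosen encoding: with `UTF8`, offsets count UTF-8 bytes, while Python strings are indexed by code points (which is why `UTF32` offsets line up with Python indices). The helper below is a sketch, with a hypothetical name, for mapping a UTF-8 byte offset back to a Python string index when `UTF8` is used as above.

```python
def utf8_offset_to_index(text, byte_offset):
    """Convert a UTF-8 byte offset into an index into the Python string `text`.

    Raises UnicodeDecodeError if the offset falls inside a multi-byte character.
    """
    prefix = text.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8"))


text = "one 6.1\u2033 model"
# "6.1″" is 4 characters but 6 UTF-8 bytes, because ″ (U+2033) takes 3 bytes.
byte_offset = len("one 6.1\u2033 ".encode("utf-8"))
index = utf8_offset_to_index(text, byte_offset)
print(byte_offset, index, text[index:])  # 11 9 model
```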

### Sample Text Setup
The sample text for this run comes from Seeking Alpha. Though this is intended to become part of a web-scraping Python pipeline, for now the example assumes that we were able to scrape, clean, and provide plain-text input to the program.

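```python
# example text from the sample_text.txt file in the local directory
# read as UTF-8; Windows' default code page (often cp1252) garbles characters such as ″
with open('sample_text.txt', encoding='utf-8') as sample_text:
    text_content = sample_text.read()
#print(text_content)
```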
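If `open()` is called without an `encoding` argument on Windows, Python typically falls back to the locale's code page (often cp1252). A UTF-8 file containing a character such as ″ (double prime, U+2033, used as an inch mark) then decodes into a sequence like `â€³`. The sketch below reproduces the problem and the round trip that recovers the original character; it assumes the file on disk is UTF-8.

```python
original = "6.1\u2033"  # "6.1″", as stored in a UTF-8 file

# What a cp1252 read of the UTF-8 bytes produces (mojibake): "6.1â€³"
garbled = original.encode("utf-8").decode("cp1252")

# Reversing the wrong decode recovers the text; reading with encoding="utf-8" avoids it.
repaired = garbled.encode("cp1252").decode("utf-8")
```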

### Authentication
In order to run the app, you need a service account key. You can set one up [here](https://console.cloud.google.com/apis/credentials/serviceaccountkey?_ga=2.7810612.-720658759.1575606022&_gac=1.241828278.1575606022.EAIaIQobChMIhO_p55Wg5gIV6B-tBh00MQ5aEAAYASAAEgLlf_D_BwE) or read more about it at this [link](https://cloud.google.com/natural-language/docs/reference/libraries). The key's path is stored in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable (via `os.environ`) so that it can be read by any of the methods we use. The script below adds an extra check that the key file exists, which prevents a less obvious error later when running the API methods.

```python
# credentials
import os
path_to_auth = "C:/users/b/OneDrive/Luminous/Luminous-io-3888a66e965f.json"

if os.path.exists(path_to_auth):
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = path_to_auth
else:
    print('Error: the path to the authentication file does not exist.')
```
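`os.path.exists` only confirms that something is at the path. A slightly stricter check, sketched below with hypothetical helper names, also confirms that the file parses as JSON and carries `"type": "service_account"` (the field present in downloaded service account keys), and returns the key's `project_id`.

```python
import json
import os


def validate_key_info(info):
    """Return project_id from a parsed key, or raise ValueError if it is not a service account key."""
    if info.get("type") != "service_account":
        raise ValueError("expected a service account key (type 'service_account')")
    return info.get("project_id")


def load_service_account_key(path):
    """Check that `path` is an existing, parseable service account key and return its project_id."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no key file at {path!r}")
    with open(path, encoding="utf-8") as key_file:
        info = json.load(key_file)  # raises json.JSONDecodeError if the file is not valid JSON
    return validate_key_info(info)
```

Calling `load_service_account_key(path_to_auth)` before setting `os.environ['GOOGLE_APPLICATION_CREDENTIALS']` surfaces a wrong path or the wrong kind of file immediately, and the returned project ID can be compared with the project where the API is enabled.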


***

## Methods
### Analyzing Sentiment
> Sentiment Analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral. Sentiment analysis is performed through the analyzeSentiment method.

#### Cautions
When first running the app, you may get a `PermissionDenied` error if the API is not enabled for the project ID. The stack trace will provide a URL to enable the Cloud Natural Language API for your project. For me, it was this [link](https://console.developers.google.com/apis/library/language.googleapis.com?project=luminous-io).

#### Sample Code

```python
from ex_analyze_sentiment import sample_analyze_sentiment as analyze_sentiment
analyze_sentiment(text_content, language_v1, enums)
```

```
Document sentiment score: -0.10000000149011612
Document sentiment magnitude: 10.899999618530273
***
Sentence text: We expect the two higher end models (one 6.1″, one 6.7″) to include mmWave support, triple camera and World facing 3D sensing, while the lower-end models (one 6.1″, one 5.4″) will include support for only sub-6 GHz and dual camera (no world-facing 3D sensing).
Sentence sentiment score: -0.6000000238418579
Sentence sentiment magnitude: 0.6000000238418579

Sentence text: I wrote this article myself, and it expresses my own opinions.
Sentence sentiment score: 0.800000011920929
Sentence sentiment magnitude: 0.800000011920929


Language of the text: en
```


#### Interpreting the Output

You can read more about the details of what is happening here in the [Natural Language API Basics - Interpreting Sentiment Analysis](https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values).

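In short, the *score* ranges from -1.0 (negative) to 1.0 (positive) and indicates the overall emotional leaning of the text. The *magnitude* is the overall strength of emotion, both positive and negative, and ranges from 0.0 upward; a low magnitude indicates the document is not very emotional. Unlike the score, the magnitude is not normalized: every emotional expression adds to it, so longer documents tend to have larger magnitudes. That is why the document magnitude above (about 10.9) is much larger than either sentence's.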
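Google's documentation suggests choosing your own thresholds for labels such as positive, negative, or neutral, and notes that a near-zero score with a high magnitude points to mixed emotions rather than neutral text. The function below is a sketch of that idea; its name and both cutoff values are arbitrary assumptions. Because magnitude grows with text length, a sensible magnitude cutoff for a whole document will be higher than one for a single sentence.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Label a (score, magnitude) pair; the default cutoffs are arbitrary assumptions."""
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Score near zero: a low magnitude suggests neutral text, a high one mixed emotions.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


# Values taken from the sample output in this notebook.
print(describe_sentiment(-0.1, 10.9))  # document: mixed
print(describe_sentiment(-0.6, 0.6))   # first sentence: negative
print(describe_sentiment(0.8, 0.8))    # second sentence: positive
```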


### Analyzing Entities
> Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities. Entity analysis is performed with the analyzeEntities method. For information about the types of entities Natural Language API identifies, see the Entity documentation.

`analyzeEntities()`
> Finds named entities (currently proper names and common nouns) in the text along with entity types, salience, mentions for each entity, and other properties.

#### Types

<table class="constants responsive" id="Type.ENUM_VALUES-table">
            <thead>
              <tr>
                <th colspan="2">Enums</th>
              </tr>
            </thead>
            <tbody>
              <tr id="Type.ENUM_VALUES.UNKNOWN">
                <td><code class="apitype"><span>UNKNOWN</span></code></td>
                <td>Unknown</td>
              </tr>
              <tr id="Type.ENUM_VALUES.PERSON">
                <td><code class="apitype"><span>PERSON</span></code></td>
                <td>Person</td>
              </tr>
              <tr id="Type.ENUM_VALUES.LOCATION">
                <td><code class="apitype"><span>LOCATION</span></code></td>
                <td>Location</td>
              </tr>
              <tr id="Type.ENUM_VALUES.ORGANIZATION">
                <td><code class="apitype"><span>ORGANIZATION</span></code></td>
                <td>Organization</td>
              </tr>
              <tr id="Type.ENUM_VALUES.EVENT">
                <td><code class="apitype"><span>EVENT</span></code></td>
                <td>Event</td>
              </tr>
              <tr id="Type.ENUM_VALUES.WORK_OF_ART">
                <td><code class="apitype"><span>WORK_OF_ART</span></code></td>
                <td>Artwork</td>
              </tr>
              <tr id="Type.ENUM_VALUES.CONSUMER_GOOD">
                <td><code class="apitype"><span>CONSUMER_GOOD</span></code></td>
                <td>Consumer product</td>
              </tr>
              <tr id="Type.ENUM_VALUES.OTHER">
                <td><code class="apitype"><span>OTHER</span></code></td>
                <td>Other types of entities</td>
              </tr>
              <tr id="Type.ENUM_VALUES.PHONE_NUMBER">
                <td><code class="apitype"><span>PHONE_NUMBER</span></code></td>
                <td><p>Phone number</p><p>The metadata lists the phone number, formatted according to local convention, plus whichever additional elements appear in the text:</p>
<ul>
  <li><code>number</code> - the actual number, broken down into sections as per local convention</li>
  <li><code>national_prefix</code> - country code, if detected</li>
  <li><code>area_code</code> - region or area code, if detected</li>
  <li><code>extension</code> - phone extension (to be dialed after connection), if detected</li>
</ul></td>
              </tr>
              <tr id="Type.ENUM_VALUES.ADDRESS">
                <td><code class="apitype"><span>ADDRESS</span></code></td>
                <td><p>Address</p><p>The metadata identifies the street number and locality plus whichever additional elements appear in the text:</p>
<ul>
  <li><code>street_number</code> - street number</li>
  <li><code>locality</code> - city or town</li>
  <li><code>street_name</code> - street/route name, if detected</li>
  <li><code>postal_code</code> - postal code, if detected</li>
  <li><code>country</code> - country, if detected</li>
  <li><code>broad_region</code> - administrative area, such as the state, if detected</li>
  <li><code>narrow_region</code> - smaller administrative area, such as county, if detected</li>
  <li><code>sublocality</code> - used in Asian addresses to demark a district within a city, if detected</li>
</ul></td>
              </tr>
              <tr id="Type.ENUM_VALUES.DATE">
                <td><code class="apitype"><span>DATE</span></code></td>
                <td><p>Date</p><p>The metadata identifies the components of the date:</p>
<ul>
  <li><code>year</code> - four digit year, if detected</li>
  <li><code>month</code> - two digit month number, if detected</li>
  <li><code>day</code> - two digit day number, if detected</li>
</ul></td>
              </tr>
              <tr id="Type.ENUM_VALUES.NUMBER">
                <td><code class="apitype"><span>NUMBER</span></code></td>
                <td><p>Number</p><p>The metadata is the number itself.</p></td>
              </tr>
              <tr id="Type.ENUM_VALUES.PRICE">
                <td><code class="apitype"><span>PRICE</span></code></td>
                <td><p>Price</p><p>The metadata identifies the <code>value</code> and <code>currency</code>.</p></td>
              </tr>
            </tbody>
</table>

[source](https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity#Type)
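For the entity types whose metadata keys are listed in the table, a small formatter can turn the metadata into readable text. This is a sketch with a hypothetical function name. It assumes the type is passed by name (with the 1.x client, `enums.Entity.Type(entity.type).name`, as in Google's sample code) and that the metadata behaves like a dictionary of strings.

```python
def describe_metadata(type_name, metadata):
    """Render entity metadata as text using the keys documented for each type."""
    if type_name == "DATE":
        # year / month / day are each present only if detected
        return "-".join(metadata[key] for key in ("year", "month", "day") if key in metadata)
    if type_name == "PRICE":
        return f"{metadata.get('value', '')} {metadata.get('currency', '')}".strip()
    # Other types: fall back to sorted key=value pairs.
    return ", ".join(f"{key}={value}" for key, value in sorted(metadata.items()))


print(describe_metadata("DATE", {"year": "2020", "month": "09"}))      # 2020-09
print(describe_metadata("PRICE", {"value": "999", "currency": "USD"}))  # 999 USD
```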

#### Sample Code
The sample source code has been altered to include a filter based on salience and a `show_all` argument that defaults to `False`; if set to `True`, it prints all entities and their salience.

```python
from ex_analyze_entities import sample_analyze_entities as analyze_entities
analyze_entities(text_content)
# pass True as the second argument (show_all) to print every entity and its salience
#analyze_entities(text_content, True)
```
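The modified `ex_analyze_entities.py` is not shown in this notebook, but the filtering it describes can be sketched in a few lines. The function below is a hypothetical stand-in, not the file's actual code; it only relies on each entity having a `salience` attribute, which the API's entity objects provide, and a namedtuple plays the part of the response entities.

```python
from collections import namedtuple

# Hypothetical stand-in for the entities in an analyzeEntities response.
Entity = namedtuple("Entity", ["name", "salience"])


def filter_by_salience(entities, min_salience=0.10, show_all=False):
    """Return entities with salience above min_salience, or all of them if show_all is True."""
    if show_all:
        return list(entities)
    return [entity for entity in entities if entity.salience > min_salience]


entities = [Entity("entity A", 0.42), Entity("entity B", 0.15), Entity("entity C", 0.03)]
for entity in filter_by_salience(entities):
    print(f"{entity.name}: {entity.salience:.2f}")
```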

#### Interpreting the Output
From my current understanding of NLP, *salience* appears to be the key metric for determining the relevant content of the document. From the [Natural Language API Basics - Entity Analysis](https://cloud.google.com/natural-language/docs/basics#entity_analysis):
> salience indicates the importance or relevance of this entity to the entire document text. This score can assist information retrieval and summarization by prioritizing salient entities. Scores closer to 0.0 are less important, while scores closer to 1.0 are highly important.

For that reason, the original sample code has been edited to keep only entities with a salience greater than 0.10. This cutoff is arbitrary and was chosen based on this specific text. The sample code could be further extended to limit the response to the top 10 entities by salience.

It appears that the responses are ordered by salience, i.e. the first entity in the response is the one the document is most likely about.
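Rather than relying on the response order, the top-10 extension mentioned above can sort explicitly. A sketch, again with a namedtuple standing in for the response's entities and hypothetical function names, using `heapq.nlargest` from the standard library:

```python
import heapq
from collections import namedtuple

# Hypothetical stand-in for the entities in an analyzeEntities response.
Entity = namedtuple("Entity", ["name", "salience"])


def top_entities(entities, n=10):
    """Return the n most salient entities, highest salience first."""
    return heapq.nlargest(n, entities, key=lambda entity: entity.salience)


def is_sorted_by_salience(entities):
    """Check whether entities are already in descending order of salience."""
    saliences = [entity.salience for entity in entities]
    return all(a >= b for a, b in zip(saliences, saliences[1:]))


entities = [Entity("entity A", 0.05), Entity("entity B", 0.60), Entity("entity C", 0.20)]
print([entity.name for entity in top_entities(entities, n=2)])  # ['entity B', 'entity C']
print(is_sorted_by_salience(entities))                          # False
```

Running `is_sorted_by_salience` on the `entities` of a real response is one way to test the observation that the API already returns entities in salience order.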