# Google's Natural Language service

## Google's AI and machine learning products
[Google's AI and machine learning products](https://cloud.google.com/products/ai/?tab=tab2)
+ **AI Hub**, a hosted repository of plug-and-play AI components, encourages experimentation and collaboration within an organization.
+ **AI building blocks** make it easy for developers to add AI features to their applications.
+ **AI Platform**, a code-based data science development environment, lets ML developers and data scientists quickly take projects from ideation to deployment.

In this section, we are interested in the ["AI building blocks"](https://cloud.google.com/products/ai/building-blocks/), which consist of four categories:
+ Sight: Vision, Video
+ Language: Translation, Natural Language
+ Conversation: Dialogflow, Cloud Text-to-Speech API, Cloud Speech-to-Text API
+ Structured data: AutoML Tables, Recommendations AI, Cloud Inference API

More precisely, we will explore the **Natural Language** service (API and AutoML) in the "Language" category, and part of the **Vision API** (object detection).





## Natural Language

**Natural Language** service uses Google machine learning to reveal the structure and meaning of text. We can extract information about people, places, and events, and better understand social media sentiment and customer conversations. **Natural Language** enables us to analyze text and also integrate it with our document storage on Google Cloud Storage. 

Within this service, Google offers AutoML Natural Language and the Natural Language API; we are mainly interested in the latter.

**Task 1:** Try the demo of the [**Natural Language**](https://cloud.google.com/natural-language/) service and observe the results. What do you think about the quality of the analysis?

When we try out some text, the API demo produces some analyses: 
+ **Entity Analysis** provides information about entities in the text, which generally refer to named "things" such as famous individuals, landmarks, common objects, etc. Entities can be proper nouns (specific people, places, organizations, etc.) or common nouns. A good general rule is that if something is a noun, it qualifies as an "entity." For each entity, we have:
  + its `type` (location, person, other, etc.) 
  + a `salience` score, indicating its relevance to the text. Its value is between 0 and 1, where 1 means highly important.
  + some `metadata`, which contains information from the knowledge repository about the entity (for example, its Wikipedia URL)
  + `mentions`, indicating the offset positions within the text where it is mentioned.
+ **Sentiment analysis** attempts to determine the attitude (positive or negative) expressed in the entire document and in each sentence; a separate entity sentiment analysis does the same for each entity. Sentiment is represented by `score` and `magnitude` values. `score` ranges from -1 (very negative) to +1 (very positive). `magnitude` indicates the overall strength of emotion (both positive and negative) within the given text. Unlike `score`, `magnitude` is not normalized, so longer texts may have a greater `magnitude`.
+ **Syntactic analysis** provides a powerful set of tools for analyzing and parsing text. The text is divided into sentences and tokens (usually words), and each token's part of speech and dependencies are determined.
+ **Content Classification** returns a list of content categories that apply to the text.
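
To make the `score`/`magnitude` pair more concrete, here is a minimal sketch that turns the two values into a rough verbal reading. The function name `describe_sentiment` and the thresholds (0.25 for the score, 1.0 for the magnitude) are assumptions chosen for illustration; they are not defined by the API and should be tuned for your own texts.

```python
def describe_sentiment(score, magnitude, score_threshold=0.25, magnitude_threshold=1.0):
    """Return a rough label for a (score, magnitude) pair.

    The thresholds are illustrative assumptions, not part of the API.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Score is near zero: a high magnitude suggests that positive and negative
    # statements cancel out, a low magnitude suggests little emotion at all.
    if magnitude >= magnitude_threshold:
        return "mixed"
    return "neutral"


print(describe_sentiment(0.8, 3.0))
print(describe_sentiment(-0.6, 4.0))
print(describe_sentiment(0.1, 0.0))
print(describe_sentiment(0.0, 4.0))
```

The point of the last two calls is that a score close to 0 can mean either "no emotion" or "strong emotions in both directions", and only the magnitude helps to tell these cases apart.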
  

**Guide:**
+ Create a service account.
+ [Download a private key as JSON](https://console.cloud.google.com/apis/credentials/serviceaccountkey) (see also this [guide](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-console)). If you are using Colab, upload the key.
+ Passing the key path directly to the client, as in
`client = language.LanguageServiceClient("/content/My First Project-f1aac12d8d07.json")`, [won't work](https://github.com/googleapis/google-cloud-python/issues/5349). Instead, load the key with `service_account.Credentials.from_service_account_file(...)` and pass the result through the `credentials=` argument, as done in the code of the next section.
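
Before handing the key to the client, it can help to sanity-check the downloaded file. The sketch below uses only the standard library; the helper name `check_key_file` and the choice of fields to check are assumptions for illustration (a service account key file normally contains these fields, among others).

```python
import json

# Fields we expect in a service account key file (illustrative subset).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found in a key file (empty list if none)."""
    with open(path, "r", encoding="utf-8") as key_file:
        info = json.load(key_file)
    problems = ["missing field: " + name for name in REQUIRED_FIELDS if name not in info]
    if info.get("type") != "service_account":
        problems.append("'type' should be 'service_account'")
    return problems


# Example usage (hypothetical path, replace with your own key file):
# print(check_key_file("/content/drive/My Drive/my-key.json"))
```

If the returned list is empty, the same path can be given to `service_account.Credentials.from_service_account_file(...)`.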




## First text analysis

From the [overview page](https://cloud.google.com/natural-language/) of Natural Language, we can select "Get started" and then "Natural Language API" to reach the [documentation site](https://cloud.google.com/natural-language/docs/quickstarts).

**Task 2:** Use the Cloud Natural Language API (through the Google Cloud Client Libraries for Python) to analyze your first text. You may want to use this [Quickstart](https://cloud.google.com/natural-language/docs/quickstart-client-libraries), the documentation, etc. To use this service, you will need to create or choose a project with a billing account. You may also need to create a private key (JSON).


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [None]:
!pip install --upgrade google-cloud-language

Collecting google-cloud-language
  Downloading https://files.pythonhosted.org/packages/ba/b8/965a97ba60287910d342623da1da615254bded3e0965728cf7fc6339b7c8/google_cloud_language-1.3.0-py2.py3-none-any.whl (83kB)
Installing collected packages: google-cloud-language
  Found existing installation: google-cloud-language 1.2.0
    Uninstalling google-cloud-language-1.2.0:
      Successfully unin

In [None]:
# Imports the Google Cloud client library
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
from google.oauth2 import service_account

# Loads the service account key and instantiates a client
credentials = service_account.Credentials.from_service_account_file("/content/drive/My Drive/My_First_Project-599c9a871c2b.json")

client = language.LanguageServiceClient(credentials=credentials)

# The text to analyze
text = u'Hello, world!'
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects the sentiment of the text
sentiment = client.analyze_sentiment(document=document).document_sentiment

print('Text: {}'.format(text))
print('Sentiment: {}, {}'.format(sentiment.score, sentiment.magnitude))

Text: Hello, world!
Sentiment: 0.30000001192092896, 0.30000001192092896


## Sentiment analysis

**Task 3:** 
+ Download this [data](https://www.kaggle.com/crowdflower/twitter-airline-sentiment) for sentiment analysis from Kaggle. Read the data description to understand the columns.

+ Use Google's model to analyze the first 10 tweets. Compare the results with the given sentiments. Are they different? What is your opinion?





In [None]:
import zipfile
with zipfile.ZipFile('/content/drive/My Drive/Toptal/twitter-airline-sentiment.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/')

In [None]:
import pandas as pd
data = pd.read_csv('/content/Tweets.csv')

In [None]:
data.head()

| | tweet_id | airline_sentiment | airline_sentiment_confidence | negativereason | negativereason_confidence | airline | airline_sentiment_gold | name | negativereason_gold | retweet_count | text | tweet_coord | tweet_created | tweet_location | user_timezone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 570306133677760513 | neutral | 1.0 | | | Virgin America | | cairdin | | 0 | @VirginAmerica What @dhepburn said. | | 2015-02-24 11:35:52 -0800 | | Eastern Time (US & Canada) |
| 1 | 570301130888122368 | positive | 0.3486 | | 0.0 | Virgin America | | jnardino | | 0 | @VirginAmerica plus you've added commercials t... | | 2015-02-24 11:15:59 -0800 | | Pacific Time (US & Canada) |
| 2 | 570301083672813571 | neutral | 0.6837 | | | Virgin America | | yvonnalynn | | 0 | @VirginAmerica I didn't today... Must mean I n... | | 2015-02-24 11:15:48 -0800 | Lets Play | Central Time (US & Canada) |
| 3 | 570301031407624196 | negative | 1.0 | Bad Flight | 0.7033 | Virgin America | | jnardino | | 0 | @VirginAmerica it's really aggressive to blast... | | 2015-02-24 11:15:36 -0800 | | Pacific Time (US & Canada) |
| 4 | 570300817074462722 | negative | 1.0 | Can't Tell | 1.0 | Virgin America | | jnardino | | 0 | @VirginAmerica and it's a really big bad thing... | | 2015-02-24 11:14:45 -0800 | | Pacific Time (US & Canada) |


In [None]:
scores = []
magnitudes = []
for text in data['text'][:10]:
  document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)
  sentiment = client.analyze_sentiment(document=document).document_sentiment
  scores.append(sentiment.score)
  magnitudes.append(sentiment.magnitude)

In [None]:
pd.DataFrame({'given_sentiment':data['airline_sentiment'][:10],
             'given_confidence':data['airline_sentiment_confidence'][:10],
             'predicted_score':pd.Series(scores)})

| | given_sentiment | given_confidence | predicted_score |
|---|---|---|---|
| 0 | neutral | 1.0 | 0.0 |
| 1 | positive | 0.3486 | -0.2 |
| 2 | neutral | 0.6837 | 0.0 |
| 3 | negative | 1.0 | -0.9 |
| 4 | negative | 1.0 | -0.8 |
| 5 | negative | 1.0 | 0.1 |
| 6 | positive | 0.6745 | 0.2 |
| 7 | neutral | 0.634 | 0.2 |
| 8 | positive | 0.6559 | 0.5 |
| 9 | positive | 1.0 | 0.6 |
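
To compare the two columns more systematically, one option is to convert each predicted score into a label and count the matches. The sketch below hard-codes the ten values from the table above; the function `score_to_label` and the 0.25 threshold are assumptions for illustration, and a different threshold would give a different agreement rate.

```python
# Given labels and predicted scores, copied from the table above.
given = ["neutral", "positive", "neutral", "negative", "negative",
         "negative", "positive", "neutral", "positive", "positive"]
predicted_scores = [0.0, -0.2, 0.0, -0.9, -0.8, 0.1, 0.2, 0.2, 0.5, 0.6]


def score_to_label(score, threshold=0.25):
    """Map a sentiment score in [-1, 1] to a label (threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


predicted = [score_to_label(score) for score in predicted_scores]
matches = sum(g == p for g, p in zip(given, predicted))
agreement = matches / len(given)

# Show the rows where the two sources disagree.
for index, (g, p) in enumerate(zip(given, predicted)):
    if g != p:
        print(index, "given:", g, "predicted:", p)
print("Agreement:", agreement)
```

Note that several of the disagreements involve tweets whose given confidence is low, so the dataset labels themselves are uncertain there.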


## Content classification

**Task 4:** 
1. Find and download a dataset for text classification.
1. Choose several texts, then apply the classification API. Have you obtained good results?
1. Can you use the classification API for a specific text classification problem?


In [None]:
import urllib.request

url = 'http://qwone.com/~jason/20Newsgroups/20news-19997.tar.gz'
urllib.request.urlretrieve(url, '/content/20news-19997.tar.gz')

('/content/20news-19997.tar.gz', <http.client.HTTPMessage at 0x7f3ab4ae6c50>)

In [None]:
import tarfile
tar = tarfile.open('/content/20news-19997.tar.gz')
tar.extractall()
tar.close()

In [None]:
import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient(credentials=credentials)

import glob
texts_paths = glob.glob('/content/20_newsgroups/comp.graphics/*')[:5]
print(texts_paths)

for path in texts_paths:
  # Read the newsgroup post; the context manager closes the file afterwards
  with open(path, 'r') as text_file:
    text = text_file.read()

  document = types.Document(
      content=text,
      type=enums.Document.Type.PLAIN_TEXT)

  # Ask the API for the content categories of this document
  categories = client.classify_text(document).categories

  print(u'=' * 80)
  print('Document:', path)
  for category in categories:
      print(u'=' * 20)
      print(u'{:<16}: {}'.format('name', category.name))
      print(u'{:<16}: {}'.format('confidence', category.confidence))

['/content/20_newsgroups/comp.graphics/38918', '/content/20_newsgroups/comp.graphics/38442', '/content/20_newsgroups/comp.graphics/38971', '/content/20_newsgroups/comp.graphics/38584', '/content/20_newsgroups/comp.graphics/38543']
Document: /content/20_newsgroups/comp.graphics/38918
name            : /Jobs & Education/Education
confidence      : 0.5199999809265137
Document: /content/20_newsgroups/comp.graphics/38442
name            : /Computers & Electronics
confidence      : 0.7599999904632568
name            : /Science/Computer Science
confidence      : 0.6800000071525574
Document: /content/20_newsgroups/comp.graphics/38971
name            : /Arts & Entertainment
confidence      : 0.800000011920929
Document: /content/20_newsgroups/comp.graphics/38584
Document: /content/20_newsgroups/comp.graphics/38543
name            : /Computers & Electronics/Software
confidence      : 0.5
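
For the last question of Task 4, one simple idea is to map Google's generic categories onto the labels of our own problem. The sketch below is only an illustration: the mapping `CATEGORY_TO_LABEL`, the label names, and the helper `to_custom_label` are all hypothetical, and the categories are passed as plain `(name, confidence)` pairs rather than API objects.

```python
# Hypothetical mapping from Google category prefixes to our own labels.
CATEGORY_TO_LABEL = {
    "/Computers & Electronics": "comp",
    "/Science": "sci",
    "/Arts & Entertainment": "rec",
}


def to_custom_label(categories, mapping=CATEGORY_TO_LABEL, default="unknown"):
    """Return our label for the most confident category that has a mapping.

    `categories` is a list of (name, confidence) pairs.
    """
    ranked = sorted(categories, key=lambda pair: pair[1], reverse=True)
    for name, _confidence in ranked:
        for prefix, label in mapping.items():
            # Match the category itself or any of its sub-categories.
            if name == prefix or name.startswith(prefix + "/"):
                return label
    return default


# Categories returned for two of the documents above.
print(to_custom_label([("/Computers & Electronics", 0.76),
                       ("/Science/Computer Science", 0.68)]))
print(to_custom_label([("/Jobs & Education/Education", 0.52)]))
```

Documents for which the API returns no category, or only categories outside the mapping, fall back to `default`, which is one reason a custom AutoML model may be a better fit for a specific problem.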


## AutoML Natural Language

**AutoML** services allow us to adapt Google models to our problems.

**Task 5:** 
+ Build a custom classification model using Cloud AutoML Natural Language. You might use this [Quickstart](https://cloud.google.com/natural-language/automl/docs/quickstart).
+ When the training is finished, check whether you get good test performance.
+ Apply the model to classify the text below using Python. Print the labels and the scores.

**Notice:**
+ Don't use a lot of data, because uploading and training are quite time-consuming. Training on the first 1,000 examples of the Quickstart dataset may take 4 hours (the training time is not proportional to the number of examples).
+ Proceed to the next tasks while the model is training in the cloud.


In [None]:
# text to test
content = 'I measured my weight and found to be 1 pound lesser than the earlier day'

In [None]:
!pip install google-cloud-automl

Collecting google-cloud-automl
  Downloading https://files.pythonhosted.org/packages/6d/83/13ec95d3689b53586f72f5e81c253eb11e43ef5219ba55b7d7495ea86324/google_cloud_automl-0.10.0-py2.py3-none-any.whl (372kB)
Installing collected packages: google-cloud-automl
Successfully installed google-cloud-automl-0.10.0


In [None]:
from google.cloud import automl_v1beta1
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('/content/drive/My Drive/My_First_Project-599c9a871c2b.json')

def get_prediction(content, project_id, model_id, credentials):
  """Send `content` to the AutoML text model and return the prediction response."""
  prediction_client = automl_v1beta1.PredictionServiceClient(credentials=credentials)

  # Full resource name of the trained model
  name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
  payload = {'text_snippet': {'content': content, 'mime_type': 'text/plain'}}
  params = {}
  # predict() blocks until the service returns a response
  return prediction_client.predict(name, payload, params)

project_id = 'vocal-unfolding-268915'
model_id = 'TCN1454844679040226528'

pred = get_prediction(content, project_id, model_id, credentials)

# Print each predicted label with its score
for result in pred.payload:
  print('Name:', result.display_name + '; score:', result.classification.score)

NotFound: ignored

## Entity Analysis

**Task 6:** Use the Google API to find the entities in the text below. For each entity, print its name, type, salience, and Wikipedia URL.

In [None]:
# Text to analyse
text = 'The name machine learning was coined in 1959 by \
Arthur Samuel. Tom M. Mitchell provided a widely quoted, \
more formal definition of the algorithms studied in the machine \
learning field: "A computer program is said to learn from experience \
E with respect to some class of tasks T and performance measure P if \
its performance at tasks in T, as measured by P, improves with experience E."'

In [None]:
import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient(credentials=credentials)

if isinstance(text, six.binary_type):
    text = text.decode('utf-8')

# Instantiates a plain text document.
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects entities in the document. You can also analyze HTML with:
#   document.type == enums.Document.Type.HTML
entities = client.analyze_entities(document).entities

for entity in entities:
    entity_type = enums.Entity.Type(entity.type)
    print('=' * 20)
    print(u'{:<16}: {}'.format('name', entity.name))
    print(u'{:<16}: {}'.format('type', entity_type.name))
    print(u'{:<16}: {}'.format('salience', entity.salience))
    print(u'{:<16}: {}'.format('wikipedia_url',
          entity.metadata.get('wikipedia_url', '-')))

name            : name machine learning
type            : OTHER
salience        : 0.3405567407608032
wikipedia_url   : -
name            : T
type            : OTHER
salience        : 0.09719357639551163
wikipedia_url   : -
name            : performance measure P
type            : OTHER
salience        : 0.09703781455755234
wikipedia_url   : -
name            : Arthur Samuel
type            : PERSON
salience        : 0.07251947373151779
wikipedia_url   : https://en.wikipedia.org/wiki/Arthur_Samuel
name            : experience
type            : OTHER
salience        : 0.04233669862151146
wikipedia_url   : -
name            : machine learning field
type            : LOCATION
salience        : 0.040550436824560165
wikipedia_url   : -
name            : computer program
type            : OTHER
salience        : 0.040550436824560165
wikipedia_url   : -
name            : algorithms
type            : OTHER
salience        : 0.040059760212898254
wikipedia_url   : -
name            : respect
type
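
Once the entities are available, they can be filtered like any other list. The sketch below copies a few rows of the output above into plain dictionaries, so it runs without the API; the helper names `linked_entities` and `top_entities` and the 0.05 salience cut-off are assumptions for illustration.

```python
# A few entities copied from the output above (None means no Wikipedia URL).
entities = [
    {"name": "name machine learning", "type": "OTHER",
     "salience": 0.3405567407608032, "wikipedia_url": None},
    {"name": "T", "type": "OTHER",
     "salience": 0.09719357639551163, "wikipedia_url": None},
    {"name": "performance measure P", "type": "OTHER",
     "salience": 0.09703781455755234, "wikipedia_url": None},
    {"name": "Arthur Samuel", "type": "PERSON",
     "salience": 0.07251947373151779,
     "wikipedia_url": "https://en.wikipedia.org/wiki/Arthur_Samuel"},
    {"name": "machine learning field", "type": "LOCATION",
     "salience": 0.040550436824560165, "wikipedia_url": None},
]


def linked_entities(items):
    """Keep only the entities that the API linked to a Wikipedia page."""
    return [item for item in items if item["wikipedia_url"]]


def top_entities(items, min_salience=0.05):
    """Keep entities above a salience cut-off, most salient first."""
    kept = [item for item in items if item["salience"] >= min_salience]
    return sorted(kept, key=lambda item: item["salience"], reverse=True)


print([item["name"] for item in linked_entities(entities)])
print([item["name"] for item in top_entities(entities)])
```

Filtering on the Wikipedia URL is a quick way to separate well-known named entities from generic nouns such as "T" or "experience".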

## Google Vision API

[Cloud Vision API](https://cloud.google.com/vision/) allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

#### List of all Cloud Vision API features:
1. Face detection
1. Landmark detection
1. Logo detection
1. Label detection (Provides generalized labels for an image)
1. Text detection
1. Document text detection (dense text / handwriting)
1. Image properties
1. Object localization
1. Crop hint detection
1. Web entities and pages

[Try it!](https://cloud.google.com/vision/docs/drag-and-drop)

**Task 7:** Detect objects in the [image](https://upload.wikimedia.org/wikipedia/commons/1/14/Animal_diversity.png) below using the Google API and comment on the result.

![](https://upload.wikimedia.org/wikipedia/commons/1/14/Animal_diversity.png)



In [None]:
!pip install --upgrade google-cloud-vision

Collecting google-cloud-vision
  Downloading https://files.pythonhosted.org/packages/eb/23/6d5a728333ce568fb484d0d7edd0b7c04b16cf6325af31d957eb51ed077d/google_cloud_vision-0.42.0-py2.py3-none-any.whl (435kB)
Installing collected packages: google-cloud-vision
Successfully installed google-cloud-vision-0.42.0


In [None]:
import urllib.request

url = "https://upload.wikimedia.org/wikipedia/commons/1/14/Animal_diversity.png"
urllib.request.urlretrieve(url, '/content/animals.png')

('/content/animals.png', <http.client.HTTPMessage at 0x7f053f7017f0>)

In [None]:
import io

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

client = vision.ImageAnnotatorClient(credentials=credentials)

# Loads the image into memory
file_name = '/content/animals.png'
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()



In [None]:
image = types.Image(content=content)

In [None]:
print('='*40)
print('Label detection:')
response = client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description)

# Multiple object detection
print('='*40)
print('Multiple object detection:')
path =  '/content/animals.png'
with open(path, 'rb') as image_file:
    content = image_file.read()
image = vision.types.Image(content=content)

objects = client.object_localization(
    image=image).localized_object_annotations

print('Number of objects found: {}'.format(len(objects)))
for object_ in objects:
    print('\n{} (confidence: {})'.format(object_.name, object_.score))
    print('Normalized bounding polygon vertices: ')
    for vertex in object_.bounding_poly.normalized_vertices:
        print(' - ({}, {})'.format(vertex.x, vertex.y))

Label detection:
Labels:
Bengal tiger
Wildlife
Organism
Tiger
Graphic design
Collage
Adaptation
Illustration
Photography
Font
Multiple object detection:
Number of objects found: 10

Tiger (confidence: 0.8511397838592529)
Normalized bounding polygon vertices: 
 - (0.3848460614681244, 0.33132120966911316)
 - (0.6488093733787537, 0.33132120966911316)
 - (0.6488093733787537, 0.49733075499534607)
 - (0.3848460614681244, 0.49733075499534607)

Animal (confidence: 0.8192296028137207)
Normalized bounding polygon vertices: 
 - (0.6680896878242493, 0.5144973397254944)
 - (0.9220726490020752, 0.5144973397254944)
 - (0.9220726490020752, 0.6526311039924622)
 - (0.6680896878242493, 0.6526311039924622)

Animal (confidence: 0.8099767565727234)
Normalized bounding polygon vertices: 
 - (0.6845981478691101, 0.33075228333473206)
 - (0.9412550926208496, 0.33075228333473206)
 - (0.9412550926208496, 0.5037023425102234)
 - (0.6845981478691101, 0.5037023425102234)

Animal (confidence: 0.7711507678031921)
Normal
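
The bounding polygons are expressed in normalized coordinates (between 0 and 1), so they must be scaled by the image size before drawing boxes. Here is a minimal sketch of that conversion; the helper name `to_pixels` and the 1000 x 800 image size are assumptions for illustration, so replace them with the real dimensions of your image.

```python
def to_pixels(normalized_vertices, width, height):
    """Convert (x, y) pairs in [0, 1] to integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in normalized_vertices]


# Vertices of the "Tiger" object from the output above.
tiger = [
    (0.3848460614681244, 0.33132120966911316),
    (0.6488093733787537, 0.33132120966911316),
    (0.6488093733787537, 0.49733075499534607),
    (0.3848460614681244, 0.49733075499534607),
]

# Assumed image size, used only for this example.
print(to_pixels(tiger, width=1000, height=800))
```

The resulting pixel coordinates can then be passed to any drawing library to overlay the boxes on the image.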

## Extra tasks
+ Build a model using "AutoML Vision".
+ Try similar tasks with other services.
+ Use other programming languages.