## Invoking an ML API

This notebook demonstrates how to invoke a deployed ML model (in this case, the Google Cloud Natural Language API)
from a batch or streaming pipeline.

We will use Apache Beam.

### Install Beam

Restart the kernel after installing Beam.

In [3]:
%pip install --upgrade --quiet apache-beam[gcp]

Note: you may need to restart the kernel to use updated packages.


## Try out Beam

In [1]:
!rm -rf output.txt* beam-temp*

In [119]:
import apache_beam as beam
from apache_beam.ml.gcp import naturallanguageml as nlp

def parse_nlp_result(response):
    """
Pulls required info from a response that looks like this:

sentences {
  text {
    content: "I love walking along the Seine."
  }
  sentiment {
    magnitude: 0.699999988079071
    score: 0.699999988079071
  }
}
entities {
  name: "Seine"
  type: LOCATION
  metadata {
    key: "mid"
    value: "/m/0f3vz"
  }
  metadata {
    key: "wikipedia_url"
    value: "https://en.wikipedia.org/wiki/Seine"
  }
  salience: 1.0
  mentions {
    text {
      content: "Seine"
      begin_offset: 25
    }
    type: PROPER
  }
}
document_sentiment {
  magnitude: 0.699999988079071
  score: 0.699999988079071
}
language: "en"
    """
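    def get_entity_value(entities, search_key):
        # Return the first metadata value found for search_key, or '' if none.
        # (This helper is not used by the return value below.)
        for entity in entities:
            if search_key in entity.metadata:
                return entity.metadata[search_key]
        return ''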
    
    return [
        # response, # entire string
        response.sentences[0].text.content, # first sentence
        [entity.name for entity in response.entities], # all entities
        [entity.metadata['wikipedia_url'] for entity in response.entities], # urls
        response.language,
        response.document_sentiment.score
    ]


features = nlp.types.AnnotateTextRequest.Features(
    extract_entities=True,
    extract_document_sentiment=True,
    extract_syntax=False
)

p = beam.Pipeline()
(p 
 | beam.Create(['Has President Obama been to Paris?', 'Sophie loves walking along the Seine.', "C'est terrible"])
 | beam.Map(lambda x : nlp.Document(x, type='PLAIN_TEXT'))
 | nlp.AnnotateText(features)
 | beam.Map(parse_nlp_result)
 | beam.io.WriteToText('output.txt')
)
result = p.run()
result.wait_until_finish()

'DONE'

In [120]:
!cat output.txt*

['Has President Obama been to Paris?', ['Obama', 'Paris'], ['https://en.wikipedia.org/wiki/Barack_Obama', 'https://en.wikipedia.org/wiki/Paris'], 'en', 0.0]
["C'est terrible", [], [], 'fr', -0.8999999761581421]
['Sophie loves walking along the Seine.', ['Sophie', 'Seine'], ['', 'https://en.wikipedia.org/wiki/Seine'], 'en', 0.800000011920929]
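
The parsing step can be checked locally without calling the API. The sketch below is a stand-in under stated assumptions, not the real response type: `fake_response` is a hypothetical object built with `types.SimpleNamespace` that mimics only the fields `parse_nlp_result` reads, and plain dictionaries replace the metadata maps. Because a plain `dict` raises `KeyError` on a missing key, this version uses `dict.get` so that an entity without a `wikipedia_url` yields an empty string, matching the `''` shown for 'Sophie' in the output above.

```python
from types import SimpleNamespace

def parse_nlp_result(response):
    """Same fields as the notebook's parser, with a default for missing URLs."""
    return [
        response.sentences[0].text.content,             # first sentence
        [entity.name for entity in response.entities],  # all entity names
        [entity.metadata.get('wikipedia_url', '') for entity in response.entities],
        response.language,
        response.document_sentiment.score,
    ]

# Hypothetical stand-in for an AnnotateText response (only the fields read above).
fake_response = SimpleNamespace(
    sentences=[SimpleNamespace(
        text=SimpleNamespace(content='Sophie loves walking along the Seine.'))],
    entities=[
        SimpleNamespace(name='Sophie', metadata={}),
        SimpleNamespace(
            name='Seine',
            metadata={'wikipedia_url': 'https://en.wikipedia.org/wiki/Seine'}),
    ],
    language='en',
    document_sentiment=SimpleNamespace(score=0.8),
)

parsed = parse_nlp_result(fake_response)
print(parsed)
```

A stand-in like this only exercises the list-building logic; it says nothing about how the API itself scores text.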


## Changing input to BigQuery and running on Cloud

To run on Google Cloud, read the input from BigQuery and set the runner to DataflowRunner.

In [17]:
%%bigquery
SELECT text FROM `bigquery-public-data.hacker_news.comments`
WHERE author = 'AF' LIMIT 10

Unnamed: 0,text
0,"I think there's a major problem with this, and..."
1,"Speaking of Rails, there are other options in ..."
2,I don't see the point in this as a serious pro...
3,"Nope. It is a nice package, but there's too ma..."
4,I'll be perfectly honest: what is popular on D...
5,Also keep in mind that Python has much better ...
6,"I'm just wondering, what specifically about th..."
7,Just get a basic knowledge of each of them.<p>...
8,Haven't we already discussed this? Google is n...
9,The general idea is interesting and possibly e...
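
The preview shows that the comment text contains HTML markup such as `<p>`. Since the pipeline in this notebook submits each comment as `PLAIN_TEXT`, that markup is passed to the API as literal text. One optional approach, which the notebook itself does not use, is to clean each comment first. The sketch below uses only the standard library; `clean_comment` and the sample string are hypothetical and not part of the notebook or the dataset.

```python
import html
import re

def clean_comment(text):
    """Convert HTML-flavoured comment text into plain text."""
    text = re.sub(r'<p>', '\n\n', text, flags=re.IGNORECASE)  # paragraph breaks
    text = re.sub(r'<[^>]+>', '', text)                       # drop any other tags
    return html.unescape(text).strip()                        # decode entities last

# Hypothetical sample in the style of the comments above.
sample = 'First paragraph.<p>&#62; quoted <i>reply</i>'
cleaned = clean_comment(sample)
print(repr(cleaned))
```

Tags are stripped before entities are decoded so that an escaped `&lt;` in a comment is not mistaken for markup. In the pipeline, a function like this could sit in a `beam.Map` step between extracting `text` and creating the document.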


In [74]:
%%writefile nlp_pipeline.py

PROJECT='ai-analytics-solutions'
BUCKET='ai-analytics-solutions-kfpdemo'
REGION='us-central1'

from datetime import datetime
import apache_beam as beam

def parse_nlp_result(response):
    return [
        # response, # entire string
        response.sentences[0].text.content,
        response.language,
        response.document_sentiment.score
    ]

def run():
    from apache_beam.ml.gcp import naturallanguageml as nlp
    
    features = nlp.types.AnnotateTextRequest.Features(
        extract_entities=True,
        extract_document_sentiment=True,
        extract_syntax=False
    )
    options = beam.options.pipeline_options.PipelineOptions()
    google_cloud_options = options.view_as(beam.options.pipeline_options.GoogleCloudOptions)
    google_cloud_options.project = PROJECT
    google_cloud_options.region = REGION
    google_cloud_options.job_name = 'nlpapi-{}'.format(datetime.now().strftime("%Y%m%d-%H%M%S"))
    google_cloud_options.staging_location = 'gs://{}/staging'.format(BUCKET)
    google_cloud_options.temp_location = 'gs://{}/temp'.format(BUCKET)
    options.view_as(beam.options.pipeline_options.StandardOptions).runner = 'DataflowRunner' # 'DirectRunner'

    p = beam.Pipeline(options=options)
    (p 
     | 'bigquery' >> beam.io.Read(beam.io.BigQuerySource(
         query="SELECT text FROM `bigquery-public-data.hacker_news.comments` WHERE author = 'AF' AND LENGTH(text) > 10",
         use_standard_sql=True))
      | 'txt'      >> beam.Map(lambda x : x['text'])
      | 'doc'      >> beam.Map(lambda x : nlp.Document(x, type='PLAIN_TEXT'))
    #  | 'todict'   >> beam.Map(lambda x : nlp.Document.to_dict(x))
      | 'nlp'      >> nlp.AnnotateText(features, timeout=10)
      | 'parse'    >> beam.Map(parse_nlp_result)
      | 'gcs'      >> beam.io.WriteToText('gs://{}/output.txt'.format(BUCKET), num_shards=1)
    )
    result = p.run()
    result.wait_until_finish()

if __name__ == '__main__':
    run()

Overwriting nlp_pipeline.py
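
The job name and Cloud Storage paths in `nlp_pipeline.py` are plain string formatting, so they can be inspected before submitting a job. The sketch below repeats that formatting outside Beam. `make_job_name`, the fixed timestamp, and the bucket name `my-bucket` are hypothetical, and the naming rule in the regular expression (lowercase letters, digits, and hyphens, starting with a letter) is an assumption about what Dataflow accepts rather than something stated in this notebook.

```python
import re
from datetime import datetime

BUCKET = 'my-bucket'  # hypothetical bucket name; replace with your own

def make_job_name(prefix='nlpapi', now=None):
    """Build a timestamped job name the same way nlp_pipeline.py does."""
    now = now or datetime.now()
    return '{}-{}'.format(prefix, now.strftime('%Y%m%d-%H%M%S'))

# A fixed timestamp keeps the example reproducible.
job_name = make_job_name(now=datetime(2020, 6, 1, 12, 30, 0))
staging_location = 'gs://{}/staging'.format(BUCKET)
temp_location = 'gs://{}/temp'.format(BUCKET)

# Assumed naming rule: lowercase letters, digits and hyphens,
# starting with a letter and ending with a letter or digit.
is_valid = re.fullmatch(r'[a-z]([-a-z0-9]*[a-z0-9])?', job_name) is not None

print(job_name, is_valid)
print(staging_location, temp_location)
```

Including the timestamp down to the second makes it unlikely that two runs of the script produce the same job name.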


In [75]:
!python3 nlp_pipeline.py

In [76]:
!gsutil cat gs://$BUCKET/output.txt*

["I think there's a major problem with this, and it is that discussions come in all shapes and sizes across the web.<p>Think about it.", 'en', -0.20000000298023224]
['It is just a joke that Facebook could be valued at $6 billion.', 'en', -0.5]
["This article doesn't make too much sense to me.<p>First of all, as Sam mentioned, the companies aren't that much different in size.", 'en', -0.4000000059604645]
['&#62; If only a real Lisp would come along that had the leadership, library, community and documentation of Python.<p>Someone was asking for that just the other day.', 'en', 0.0]
["If the original poster has his service out when he does this, it could end up being both very bad PR for her and very good PR for him.<p>In fact, her ripping off his idea could end up being better publicity than he would've gotten anywhere else.", 'en', -0.699999988079071]
["I've had my share of bad experiences with Apple products/computers.", 'en', 0.0]
['Of course.', 'en', 0.0]
["I don't think so.", 'en',
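
Each output line is the string form of the Python list returned by `parse_nlp_result`, so the results can be read back for further analysis. The sketch below is one possible way to do that with `ast.literal_eval`; it uses three complete lines copied from the output above (the truncated last line is left out), and the variable names are illustrative.

```python
import ast

# Three lines copied from the output above; each is the repr of a Python list.
lines = [
    "['It is just a joke that Facebook could be valued at $6 billion.', 'en', -0.5]",
    '''["I've had my share of bad experiences with Apple products/computers.", 'en', 0.0]''',
    "['Of course.', 'en', 0.0]",
]

rows = [ast.literal_eval(line) for line in lines]  # safely parse each list literal
scores = [score for _text, _lang, score in rows]
mean_score = sum(scores) / len(scores)
negative = [text for text, _lang, score in rows if score < 0]

print(mean_score)
print(negative)
```

If the results need to be consumed by other tools, having the pipeline write JSON lines instead of list reprs would likely be more robust, though that is a change to the pipeline rather than something this notebook does.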

Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.