# Quotes Pipeline

<img src="./Quote-Sentiment-Pipeline.jpg">

## Building the Quote Fetcher Cloud Function

These steps are adapted from the GCP tutorial [Using Pub/Sub to trigger a Cloud Function](https://cloud.google.com/scheduler/docs/tut-pub-sub) and the Quickstart example in the [Functions Framework GitHub README](https://github.com/GoogleCloudPlatform/functions-framework-python).

Steps:

1) Create the Pub/Sub topic that the Quote Fetcher Cloud Function publishes quotes to

2) Create the Quote Fetcher Cloud Function

3) Create the Pub/Sub topic that triggers the Quote Fetcher Cloud Function

4) Create a Cloud Scheduler job that publishes to the trigger topic


Create a Pub/Sub topic for the Quote Fetcher Cloud Function to publish quotes to.

In [48]:
! gcloud pubsub topics create quotes

Created topic [projects/qwiklabs-gcp-04-2ad6a04dc593/topics/quotes].
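The name in the output above is the fully qualified topic path, in the form `projects/{project}/topics/{topic}`. As a minimal sketch of that format, the path can be assembled and split as shown below. The helper names `build_topic_path` and `split_topic_path` are hypothetical and exist only for illustration.

```python
def build_topic_path(project_id: str, topic_id: str) -> str:
    """Return the fully qualified Pub/Sub topic path."""
    return f"projects/{project_id}/topics/{topic_id}"


def split_topic_path(path: str) -> tuple:
    """Split a topic path back into (project_id, topic_id)."""
    parts = path.split("/")
    if len(parts) != 4 or parts[0] != "projects" or parts[2] != "topics":
        raise ValueError(f"not a topic path: {path!r}")
    return parts[1], parts[3]


# Project and topic IDs taken from the gcloud output above.
path = build_topic_path("qwiklabs-gcp-04-2ad6a04dc593", "quotes")
print(path)
```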


In [11]:
%%bash

if [ ! -d quote_fetcher ]
then
  echo "creating quote_fetcher directory"
  mkdir quote_fetcher
fi

if [ ! -d pubsub_schd ]
then
  echo "creating pubsub_schd directory"
  mkdir pubsub_schd
fi

creating pubsub_schd directory


Write requirements.txt for Quote Fetcher Cloud Function

In [2]:
%%writefile ./quote_fetcher/requirements.txt

requests>=2.26.0,<2.27.0
beautifulsoup4>=4.9.3,<4.10.0
pydantic>=1.8.2,<1.9.0
google-cloud-language>=2.2.2,<2.3.0
google-cloud-pubsub>=2.7.0,<2.8.0

Overwriting ./quote_fetcher/requirements.txt


Write Quote Fetcher Cloud Function source code

In [54]:
%%writefile ./quote_fetcher/main.py
"""
Cloud Function to fetch quotes from quotes.toscrape.com/random
and publish them to Pub/Sub
"""

import json
import os
import typing

import requests

from bs4 import BeautifulSoup
from google.cloud import language_v1, pubsub_v1

from pydantic import BaseModel

PROJECT_ID = os.environ['PROJECT_ID']
TOPIC_ID = os.environ['TOPIC_ID']


class Quote(BaseModel):
    """A scraped quote plus optional sentiment scores from the Natural Language API."""
    text: str
    author: str
    tags: typing.Sequence[str]
    sentiment: typing.Optional[float] = None
    magnitude: typing.Optional[float] = None

    def calc_sentiment(self):
        """Score the quote text and store the document-level sentiment on the model."""
        client = language_v1.LanguageServiceClient()
        doc = {
            'content': self.text,
            'type_': language_v1.Document.Type.PLAIN_TEXT,
            'language': 'en',
        }

        request = {
            'document': doc,
            'encoding_type': language_v1.EncodingType.UTF8,
        }
        response = client.analyze_sentiment(request=request)

        self.sentiment = response.document_sentiment.score
        self.magnitude = response.document_sentiment.magnitude
        

def fetch_quote(event, context):
    """Background function: scrape one random quote, score it, and publish it."""
    quote_url = 'https://quotes.toscrape.com/random'

    # Fail fast on network problems or HTTP errors instead of parsing an error page.
    response = requests.get(quote_url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')

    quote_el = soup.find('div', class_='quote')

    quote = Quote(
        text=quote_el.find('span', class_='text').get_text(),
        author=quote_el.find('small', class_='author').get_text(),
        tags=[el.get_text() for el in quote_el.find_all('a', class_='tag')]
    )

    quote.calc_sentiment()

    quote_data = quote.dict()
    print("PROJECT_ID " + PROJECT_ID)
    print("TOPIC_ID " + TOPIC_ID)
    print(quote_data)

    # Publish the quote as UTF-8 encoded JSON and wait for the publish to
    # complete before the function returns.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
    future = publisher.publish(topic_path, json.dumps(quote_data).encode('utf-8'))
    future.result()

    return quote_data

Overwriting ./quote_fetcher/main.py
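The function publishes each quote as UTF-8 encoded JSON. The sketch below shows that round trip using only the standard library, so it runs without the Cloud client libraries. `encode_quote` and `decode_quote` are hypothetical helpers: the first mirrors the `json.dumps(...).encode('utf-8')` call in `main.py`, the second stands in for whatever a downstream subscriber might do. The sample values are copied from the function logs in this notebook.

```python
import json

# Sample payload shaped like Quote.dict(); values copied from the function logs.
quote_data = {
    "text": "“Any fool can know. The point is to understand.”",
    "author": "Albert Einstein",
    "tags": ["knowledge", "learning", "understanding", "wisdom"],
    "sentiment": -0.4000000059604645,
    "magnitude": 1.0,
}


def encode_quote(data: dict) -> bytes:
    """Serialize a quote dict to the bytes that get published."""
    return json.dumps(data).encode("utf-8")


def decode_quote(payload: bytes) -> dict:
    """Turn a published payload back into a quote dict."""
    return json.loads(payload.decode("utf-8"))


payload = encode_quote(quote_data)
assert decode_quote(payload) == quote_data
```

Pub/Sub message bodies are raw bytes, which is why the JSON string has to be encoded before publishing and decoded again on the consuming side.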


In [55]:
%%writefile ./quote_fetcher/deploy-cloud-function.sh
#!/bin/bash

if [ -d quote_fetcher ]
then
  cd quote_fetcher
fi

set -ex

PROJECT_ID=$(gcloud config get-value project)
TOPIC_ID=quotes

gcloud functions deploy quote_fetcher \
  --set-env-vars PROJECT_ID=$PROJECT_ID,TOPIC_ID=$TOPIC_ID \
  --entry-point fetch_quote \
  --runtime python37 \
  --trigger-topic quote-fetcher-topic

Overwriting ./quote_fetcher/deploy-cloud-function.sh


Deploy the Quote Fetcher Cloud Function 

In [56]:
%%bash

chmod +x quote_fetcher/deploy-cloud-function.sh

./quote_fetcher/deploy-cloud-function.sh

availableMemoryMb: 256
buildId: 26ae3d5e-cd07-47f4-9d22-fb2e11e21dbd
buildName: projects/774131484409/locations/us-central1/builds/26ae3d5e-cd07-47f4-9d22-fb2e11e21dbd
entryPoint: fetch_quote
environmentVariables:
  PROJECT_ID: qwiklabs-gcp-04-2ad6a04dc593
  TOPIC_ID: quotes
eventTrigger:
  eventType: google.pubsub.topic.publish
  failurePolicy: {}
  resource: projects/qwiklabs-gcp-04-2ad6a04dc593/topics/quote-fetcher-topic
  service: pubsub.googleapis.com
ingressSettings: ALLOW_ALL
labels:
  deployment-tool: cli-gcloud
name: projects/qwiklabs-gcp-04-2ad6a04dc593/locations/us-central1/functions/quote_fetcher
runtime: python37
serviceAccountEmail: qwiklabs-gcp-04-2ad6a04dc593@appspot.gserviceaccount.com
sourceUploadUrl: https://storage.googleapis.com/gcf-upload-us-central1-0590737b-324c-4a57-b58b-fd974ee68e4f/dafde907-e63e-49a2-af06-913d2ad64842.zip
status: ACTIVE
timeout: 60s
updateTime: '2021-08-20T18:54:22.876Z'
versionId: '11'


+++ gcloud config get-value project
++ PROJECT_ID=qwiklabs-gcp-04-2ad6a04dc593
++ TOPIC_ID=quotes
++ gcloud functions deploy quote_fetcher --set-env-vars PROJECT_ID=qwiklabs-gcp-04-2ad6a04dc593,TOPIC_ID=quotes --entry-point fetch_quote --runtime python37 --trigger-topic quote-fetcher-topic
Deploying function (may take a while - up to 2 minutes)...
.
For Cloud Build Logs, visit: https://console.cloud.google.com/cloud-build/builds;region=us-central1/54d6c459-b086-4628-997f-47ab7f04daaa?project=774131484409
............................done.


Publish a test message to the `quote-fetcher-topic` Pub/Sub topic to trigger the function, then read the function logs to confirm that it ran.

In [61]:
! gcloud pubsub topics publish quote-fetcher-topic --message "this is a test message"

messageIds:
- '2904445036089587'
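The Quote Fetcher ignores its trigger payload, but for a Pub/Sub-triggered background function the message body arrives base64-encoded in the event's `data` field. A minimal sketch of reading it, assuming that event shape, is shown below. `read_message` and the hand-built `event` dict are illustrative only and are not part of the deployed function.

```python
import base64


def read_message(event: dict) -> str:
    """Decode the base64 Pub/Sub payload carried in a background-function event."""
    data = event.get("data")
    if not data:
        return ""
    return base64.b64decode(data).decode("utf-8")


# Hand-built stand-in for the event delivered for the test message above.
event = {"data": base64.b64encode(b"this is a test message").decode("ascii")}
print(read_message(event))  # prints "this is a test message"
```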


In [62]:
! gcloud functions logs read quote_fetcher --limit 12

LEVEL  NAME           EXECUTION_ID  TIME_UTC                 LOG
D      quote_fetcher  ibww3r39xc0u  2021-08-20 18:54:57.443  Function execution took 378 ms, finished with status: 'ok'
I      quote_fetcher  ibww3r39xc0u  2021-08-20 18:54:57.379  {'text': '“Any fool can know. The point is to understand.”', 'author': 'Albert Einstein', 'tags': ['knowledge', 'learning', 'understanding', 'wisdom'], 'sentiment': -0.4000000059604645, 'magnitude': 1.0}
I      quote_fetcher  ibww3r39xc0u  2021-08-20 18:54:57.379  TOPIC_ID quotes
I      quote_fetcher  ibww3r39xc0u  2021-08-20 18:54:57.379  PROJECT_ID qwiklabs-gcp-04-2ad6a04dc593
D      quote_fetcher  ibww3r39xc0u  2021-08-20 18:54:57.066  Function execution started
D      quote_fetcher  ibwwhjj1rgdk  2021-08-20 18:54:55.964  Function execution took 758 ms, finished with status: 'ok'
I      quote_fetcher  ibwwhjj1rgdk  2021-08-20 18:54:55.941  {'text': "“That's the problem with drinking, I thought, as I poured myself a drink. If something bad ha
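Each logged quote carries a `sentiment` score and a `magnitude`. The Natural Language API reports the score between -1.0 (negative) and 1.0 (positive), while the magnitude is a non-negative measure of overall emotional strength. One way to turn the two numbers into a coarse label is sketched below. The `label_sentiment` helper and its cutoff values are assumptions chosen for illustration, not values defined by the API or used by this pipeline.

```python
def label_sentiment(score: float, magnitude: float,
                    score_cutoff: float = 0.25,
                    magnitude_cutoff: float = 0.5) -> str:
    """Map a (score, magnitude) pair to a coarse label using assumed cutoffs."""
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # A near-zero score with high magnitude suggests mixed feelings;
    # with low magnitude the text is simply neutral.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


# Values from the Albert Einstein quote in the logs above.
print(label_sentiment(-0.4000000059604645, 1.0))  # prints "negative"
```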

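In Cloud Shell, run the following to enable Cloud Scheduler, create the App Engine app used by Cloud Scheduler, and schedule a job that publishes to `quote-fetcher-topic` every two minutes.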

```sh
gcloud services enable cloudscheduler.googleapis.com

export PROJECT_ID=$(gcloud config get-value project)
gcloud app create --project $PROJECT_ID --region us-central

gcloud scheduler jobs create pubsub quotefetcher \
  --schedule "*/2 * * * *" \
  --topic quote-fetcher-topic \
  --message-body "fetch quote"
```
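The schedule `*/2 * * * *` is unix-cron syntax for every second minute, so the job publishes to `quote-fetcher-topic` at minutes 0, 2, 4, and so on of each hour. The sketch below lists the run times this produces after a given moment. `next_runs` is a hypothetical helper that handles only this `*/N` minute pattern, not general cron expressions.

```python
from datetime import datetime, timedelta


def next_runs(start: datetime, step_minutes: int = 2, count: int = 3) -> list:
    """Return the next `count` whole minutes after `start` divisible by the step."""
    # Move to the start of the next whole minute.
    current = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    runs = []
    while len(runs) < count:
        if current.minute % step_minutes == 0:
            runs.append(current)
        current += timedelta(minutes=1)
    return runs


# Start from a timestamp seen in the function logs.
for run in next_runs(datetime(2021, 8, 20, 18, 54, 57)):
    print(run.isoformat())
```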



In [11]:
! ls -l

total 80
-rw-r--r-- 1 jupyter jupyter 64369 Aug 20 15:54 Quote-Sentiment-Pipeline.jpg
drwxr-xr-x 2 jupyter jupyter  4096 Aug 20 16:17 quote_fetcher
-rw-r--r-- 1 jupyter jupyter 11507 Aug 20 16:19 quotes-pipeline.ipynb
