![Clarify Logo](https://global-uploads.webflow.com/5e81e464dad44d3a9a32d1f4/5ed10fc3f1ff8467f4466786_logo.svg)

# Google Cloud Hosting

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/main/media/docker/cloud.png" alt="clarify doodle" width="400">

In this tutorial, we will create a docker image and deploy it to Google Cloud. The docker image will use the [PyClarify](https://pypi.org/project/pyclarify/) package, which provides a fast and easy way to get data from Clarify and write data back. For more details on how to use it, see the [documentation](https://searis.github.io/pyclarify/). In the docker image, we will also use the [OpenWeather API](https://openweathermap.org/api), but feel free to use any API you prefer.

## Prerequisites
It is highly recommended that you have gone through the [Introduction Notebook](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb) for the basics of using PyClarify. This tutorial will not go into detail about how to use PyClarify. Basic knowledge of docker is also helpful, since this tutorial does not aim to show how docker works, but how to use it with Clarify.

## What do you need
1. [Your Credentials](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb#credentials)
2. [Google Cloud Account](https://cloud.google.com/free)
3. [Google Cloud SDK](https://cloud.google.com/sdk/docs/install)
4. [Docker](https://www.docker.com/products/docker-desktop)

## What we will do
1. [Choose an API from where you will get your data](#api)
2. [Integrating with Clarify](#integration)
3. [Create a docker image](#image)
4. [Google Cloud Hosting](#hosting)
5. [See results in Clarify](#clarify)

<hr>


Other resources:
- [PyClarify documentation](https://searis.github.io/pyclarify/)
- [Clarify Developer documentation](https://docs.clarify.io/reference/http)
- [Introduction Notebook](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb)
- [Google Cloud Platform](https://cloud.google.com)
- [OpenWeather API](https://openweathermap.org)


## Getting started

The first step is to download your credentials and upload the file to this workspace. Click [here](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb#credentials) for how to do it.

After that, download [Docker](https://www.docker.com/products/docker-desktop) and [Google Cloud SDK](https://cloud.google.com/sdk/docs/install).


<a name="api"></a>
# Choose an API from where you will get your data

In this example, we will use the free [OpenWeather API](https://openweathermap.org/api). The only thing you need to use this API is an API key. To get this key you need to sign up [here](https://home.openweathermap.org/users/sign_up). After you sign up you will receive an email with your API key. For more details, you can read the [OpenWeather API documentation](https://openweathermap.org/appid).

To understand how the OpenWeather API works and what data we will use, you can run the cell below. Add your API key and specify a city of your choice.


In [None]:
import requests 

# provide your key
API_key = "<your-API-key>"

# specify a city from which you will receive weather data, for example Trondheim
city_name = "Trondheim"

# make a get request
url = f"https://api.openweathermap.org/data/2.5/weather?q={city_name}&appid={API_key}&units=metric"
response_data = requests.get(url).json()

print(response_data)
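The URL in the cell above interpolates the city name directly, which can break for names containing spaces or special characters. A minimal sketch of one way to build the same URL with percent-encoded query parameters, using only the standard library; the helper name `build_weather_url` and the key `demo-key` are hypothetical and not part of the tutorial:

```python
from urllib.parse import urlencode

# Same endpoint as in the cell above
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city_name: str, api_key: str, units: str = "metric") -> str:
    """Return the current-weather URL with its query parameters encoded."""
    query = urlencode({"q": city_name, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"


# "demo-key" is a placeholder, not a real API key
print(build_weather_url("New York", "demo-key"))
```

The returned string can be passed to `requests.get` exactly as in the cell above.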

The [response](https://openweathermap.org/current#parameter) will look something like this: 

```json

{
    "coord": {"lon": 10.3951, "lat": 63.4305},
    "weather": [
        {
            "id": 803,                                    -> id
            "main": "Clouds",                             -> main
            "description": "broken clouds",
            "icon": "04d",                               
        }
    ],
    "base": "stations",
    "main": {
        "temp": 4.05,                                     -> temp
        "feels_like": 1.45,
        "temp_min": 3.64,
        "temp_max": 7.05,
        "pressure": 1021,
        "humidity": 90,
    },
    "visibility": 10000,
    "wind": {"speed": 2.89, "deg": 230, "gust": 4.39},
    "clouds": {"all": 63},
    "dt": 1634544451,
    "sys": {
        "type": 2,
        "id": 131520,
        "country": "NO",
        "sunrise": 1634537710,
        "sunset": 1634572297,
    },
    "timezone": 7200,                                     
    "id": 3133880,
    "name": "Trondheim",
    "cod": 200,
}
```

From this response, we will only use the weather condition `id` (under `weather`) and `temp` (the temperature, under `main`).


In [None]:
# extract data from API
temperature = response_data["main"]["temp"]
weather_condition_enum_number = response_data["weather"][0]["id"]
values = [temperature, weather_condition_enum_number]

print(f"temperature: {temperature}, weather_condition_enum_number: {weather_condition_enum_number}")
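If the API key or city name is wrong, the payload will not contain the `main` and `weather` fields and the lookups above will raise a `KeyError`. The sketch below wraps the extraction in a hypothetical helper, `extract_values`, that checks the `cod` status field first. It assumes that error payloads carry a `cod` other than 200 and, usually, a `message` field; verify this against the OpenWeather documentation.

```python
def extract_values(response_data: dict) -> tuple:
    """Return (temperature, weather condition id) from a current-weather payload."""
    # The sample response above reports success as "cod": 200.
    # Comparing as strings tolerates the code arriving as int or str.
    if str(response_data.get("cod")) != "200":
        details = response_data.get("message", response_data)
        raise ValueError(f"OpenWeather request failed: {details}")

    temperature = response_data["main"]["temp"]
    condition_id = response_data["weather"][0]["id"]
    return temperature, condition_id


# Trimmed-down copy of the sample response shown earlier
sample = {"cod": 200, "main": {"temp": 4.05}, "weather": [{"id": 803, "main": "Clouds"}]}
print(extract_values(sample))
```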

<a name="integration"></a>
# Integrating with Clarify

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/main/media/docker/clarify.png" alt="clarify image">

## Create signals

Before we start writing data to Clarify, we will first create two signals with metadata. One signal is for the temperature values and the other signal is for the weather conditions. 

In [None]:
from pyclarify import ClarifyClient, SignalInfo

client = ClarifyClient("./clarify-credentials.json")

INPUT_ID = ["temperature_value", "weather_condition"]

signal_temperature_values = SignalInfo(
    name = "Temperature value",
    type = "numeric",
    description="Temperature from the OpenWeather API",
    labels={"data-source": ["OpenWeather API"], "location": ["Trondheim"]},
    engUnit="°C"
)

signal_temperature_enums = SignalInfo(
    name = "Weather condition",
    type = "enum",
    description="Weather condition from the OpenWeather API",
    labels={"data-source": ["OpenWeather API"], "location": ["Trondheim"]},
)
response_save_signals = client.save_signals(input_ids = INPUT_ID, signals = [signal_temperature_values, signal_temperature_enums])
print(response_save_signals)

Now you should be able to see the two signals in Clarify. Go to **Integrations**, click on the integration you wrote the signals to and go to **Show all signals**.

### Enums

Before creating an item from a signal, we will add some metadata to the `weather_condition` signal. In the OpenWeather API [documentation](https://openweathermap.org/weather-conditions), under *Weather condition codes*, the codes are organized in groups, and the codes in a group in most cases share the same `main` value.

We could specify the `main` value for every ID, but for simplicity this tutorial uses only 6 groups. For every `id` we take the first digit and map it to the group's `main` value. For example, if `id = 201` we set the id to `2` and the main value to `Thunderstorm`.

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/main/media/docker/group.png" alt="Weather condition codes" width="400">


The 6 groups which we will use are:

```JSON
2: Thunderstorm,
3: Drizzle,
5: Rain,
6: Snow,
7: Atmosphere,
8: Clouds
```
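The grouping rule can be sketched as a small lookup. The helper name `weather_group` is hypothetical. It uses integer division by 100, which gives the same result as taking the first digit as long as the ids have three digits, as in the sample response.

```python
# The six groups listed above, keyed by the first digit of the condition id
WEATHER_GROUPS = {
    2: "Thunderstorm",
    3: "Drizzle",
    5: "Rain",
    6: "Snow",
    7: "Atmosphere",
    8: "Clouds",
}


def weather_group(condition_id: int) -> tuple:
    """Map a three-digit condition id to (group number, group name)."""
    group = condition_id // 100
    if group not in WEATHER_GROUPS:
        raise ValueError(f"Unexpected weather condition id: {condition_id}")
    return group, WEATHER_GROUPS[group]


print(weather_group(201))  # (2, 'Thunderstorm')
print(weather_group(803))  # (8, 'Clouds')
```

Note that with this simplification every id starting with 8 is labeled `Clouds`, including any code in that range whose own `main` value is different.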


Enum values can be set using the Clarify UI or using the PyClarify SDK.

#### Method 1

Using the User Interface in Clarify, you can specify the enum values.

If the signal is unpublished, you can go to your Signal page and click on the signal you want to publish. From there you can add enum values.

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/main/media/docker/add_enums.gif" alt="Add enum values" />

A second way is to go to your Item page, click on an Item and from the metadata you can add or change the enum values.

In this tutorial, we will use the UI in Clarify since it is a more generic and dynamic way. 


#### Method 2

Saving enum values using the API can be done with the following code. 

> Note that here you assume that you know all the possible values of the enums.

In [None]:
from pyclarify import SignalInfo

main = {
    '2': 'Thunderstorm',
    '3': 'Drizzle',
    '5': 'Rain',
    '6': 'Snow',
    '7': 'Atmosphere',
    '8': 'Clouds'
}

signal = SignalInfo(name="Weather condition", type="enum", enumValues=main)

response_save_signals = client.save_signals(input_ids = [INPUT_ID[1]], signals=[signal])
print(response_save_signals)

## Write data to Clarify

Using the PyClarify package we can write data to Clarify. For more information check out the [PyClarify documentation](https://searis.github.io/pyclarify/) and the [Introduction tutorial](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb). 

In [None]:
from pyclarify import DataFrame
from datetime import datetime

# Get the current UTC time in ISO 8601 format; the trailing "Z" marks UTC
current_time = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")

# extract data from API
temperature = response_data["main"]["temp"]
weather_condition_enum_number = int(str(response_data["weather"][0]["id"])[0])
values = [temperature, weather_condition_enum_number]

# create a data frame model
series = {}
for input_id_, data_ in zip(INPUT_ID, values):
    series[input_id_] = [data_]

df = DataFrame(times=[current_time], series=series)

# write signal data to Clarify
response = client.insert(data=df)
print(response)
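The timestamps written in this tutorial are ISO 8601 strings ending in `Z`, which denotes UTC, so the time must really be in UTC before it is formatted. A minimal sketch of a helper that converts any timezone-aware `datetime` to that format; the name `to_utc_iso8601` is hypothetical, and the example reuses the `dt` value from the sample OpenWeather response:

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso8601(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, so it cannot be converted reliably
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# "dt" from the sample response is a Unix timestamp in seconds
observed_at = datetime.fromtimestamp(1634544451, tz=timezone.utc)
print(to_utc_iso8601(observed_at))  # 2021-10-18T08:07:31Z

# The same instant expressed with a +02:00 offset gives the same string
local = observed_at.astimezone(timezone(timedelta(hours=2)))
print(to_utc_iso8601(local))  # 2021-10-18T08:07:31Z
```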

<a name="image"></a> 
# Create a docker image

Now we are ready to create our docker image, which will write data at a fixed interval to the signals that we created [previously](#integration).

The folder structure which we want to create is the following:

```
docker/
├── Dockerfile
├── clarify-credentials.json
├── app.py
└── requirements.txt
```

Click [here](https://github.com/searis/data-science-tutorials/tree/main/tutorials/docker) to see the folder and files in the github repository.

## Python Server

We need a Python server that keeps the docker container running and writes data to Clarify every minute.

For this we will use [Flask](https://flask.palletsprojects.com/en/2.0.x/) and [Gunicorn](https://gunicorn.org).

Create an `app.py` script.

Here is example code that writes data every minute to the `temperature_value` and `weather_condition` signals. The weather data is for the city of Trondheim.

Don't forget to specify your `API_KEY`.

```python
from apscheduler.schedulers.background import BackgroundScheduler
from datetime import datetime
from flask import Flask
from pyclarify import APIClient, DataFrame
import requests


INPUT_ID = ["temperature_value", "weather_condition"]
API_KEY = "<your-API-key>"
CITY_NAME = "Trondheim" 

def main():
   
    client = APIClient("./clarify-credentials.json")
    url = f"https://api.openweathermap.org/data/2.5/weather?q={CITY_NAME}&appid={API_KEY}&units=metric"
    response = requests.get(url).json()

    # Get the current UTC time in ISO 8601 format; the trailing "Z" marks UTC
    current_time = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")

    # extract data from API
    temperature = response["main"]["temp"]
    weather_condition_enum_number = int(str(response["weather"][0]["id"])[0])
    values = [temperature, weather_condition_enum_number]

    # create a data frame model
    series = {}
    for input_id_, data_ in zip(INPUT_ID, values):
        series[input_id_] = [data_]

    df = DataFrame(times=[current_time], series=series)

    # write signal data to Clarify
    response = client.insert(data=df)
    print(response)


app = Flask(__name__)

sched = BackgroundScheduler(daemon=True)
sched.add_job(main, 'interval', minutes=1)
sched.start()

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=8080)
```
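Hardcoding the API key in `app.py` bakes it into the image. As one possible alternative, the key can be read from an environment variable when the app starts. This is a minimal sketch: the variable name `OPENWEATHER_API_KEY` and the helper `read_api_key` are assumptions, not part of the tutorial's repository.

```python
import os


def read_api_key(env_var: str = "OPENWEATHER_API_KEY", environ=os.environ) -> str:
    """Return the API key stored in the given environment variable."""
    api_key = environ.get(env_var, "")
    if not api_key:
        # Fail early with a clear message instead of sending requests without a key
        raise RuntimeError(f"Set the {env_var} environment variable before starting the app")
    return api_key


# Demonstration with an explicit mapping; in app.py you would call read_api_key()
print(read_api_key(environ={"OPENWEATHER_API_KEY": "demo-key"}))
```

With this approach the variable has to be provided when the container starts, for example with the `-e` option of `docker run` or the `--set-env-vars` option of `gcloud run deploy`.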

## Dockerfile

The Dockerfile uses Python 3.9 as the base image and copies the requirements file, the Clarify credentials and our Python script into the image. It then installs the requirements and starts the [gunicorn](https://docs.gunicorn.org/en/latest/run.html) server.

```bash 
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt clarify-credentials.json app.py ./

RUN pip3 install -r requirements.txt

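# Cloud Run expects the server to listen on the port in $PORT (8080 by default)
CMD exec gunicorn --bind :${PORT:-8080} --workers 1 --threads 8 --timeout 0 app:app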
```

## Requirements

Create a `requirements.txt` file and add the following:

```
pyclarify==0.1.0
requests==2.26.0
gunicorn==20.1.0
Flask==2.0.2
APScheduler==3.8.0
```

## Build the docker image

If you want to test your image locally, run:

```sh
$ docker build -t YOUR-IMAGE-TAG .
$ docker run YOUR-IMAGE-TAG
```

To stop your container, first list the running containers:

```sh 
$ docker ps
```

Copy the container ID from the output and pass it to `docker stop`:

```sh 
$ docker stop <container-id>
```

<a name="hosting"></a>
# Google Cloud Hosting

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/main/media/docker/hosting.png" alt="clarify image" width="600">

Now that we have a working docker image, we can push it to a cloud environment. We will use the Google Cloud Run environment to host our docker image. Google has great guides on [adding your docker image to their registry](https://cloud.google.com/container-registry/docs/pushing-and-pulling) and [deploying it on their run service](https://cloud.google.com/run/docs/deploying#command-line). 

> Keep in mind that their systems may change in the future and their guides should be the source of truth!


Make sure you have installed the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) before we start. Check that it is installed correctly by running `gcloud version`.

Before connecting the SDK to your Google account, you can [create a project](https://console.cloud.google.com/projectcreate) so that the SDK can connect to the project as well. If you have multiple Google accounts, make sure that you are logged into the correct one.

Once that is done, connect to your Google account by running `gcloud auth login` in the terminal. To check that the correct account and project are set, run `gcloud config list`.

> To use `Container Registry` and `Cloud Run` environments make sure that you have enabled them both. 
>
>* [Enable Container Registry API](https://console.developers.google.com/apis/api/containerregistry.googleapis.com)
>* [Enable Cloud Run API](https://console.cloud.google.com/apis/library/run.googleapis.com) 



## Publish your image in the Container Registry


For the first step we start by building the image and adding a `tag` to it:
```sh
$ docker build -t YOUR-IMAGE-TAG .
```

Now that we have a built image with a tag, we can set its target path in [Container Registry](https://cloud.google.com/container-registry/docs/pushing-and-pulling#add-registry), including the gcr.io registry host and your project ID in place of `YOUR-PROJECT-ID`:

```sh
$ docker tag YOUR-IMAGE-TAG gcr.io/YOUR-PROJECT-ID/YOUR-IMAGE-TAG
```

The final step of publishing the image is to push it to the container registry. To do this run:

```sh
$ docker push gcr.io/YOUR-PROJECT-ID/YOUR-IMAGE-TAG
```

Now your image is published! The Container Registry adds the registry to your project, creates a storage bucket for the registry, and stores the image. For more information please refer to [their guides](https://cloud.google.com/container-registry/docs/pushing-and-pulling).


Head over to the [Container Registry](https://console.cloud.google.com/gcr/images/) where you can verify that your image is there. 



## Use Cloud Run services to run your image

**Method 1**

To create a service in Cloud Run, go to [Cloud Run](https://console.cloud.google.com/run) and click the create service button. From there you can select your image URL, give your service a name and specify a location. Under `Autoscaling`, set both the `Minimum number of instances` and the `Maximum number of instances` to 1 so that you are always sending data to Clarify. Select `Require authentication` and click create.

> Google Cloud Run uses [autoscaling](https://cloud.google.com/run#all-features), which means that it scales from zero to N containers depending on the traffic received. If your site receives no traffic, it will scale down to zero instances after 15 minutes and thus stop sending data with the background scheduler. As this container's main purpose is to write data to Clarify with a background scheduler, we want exactly one container running at all times.



**Method 2**

The same steps can be done by running the command: 

```sh
$ gcloud run deploy YOUR-IMAGE-TAG --image gcr.io/YOUR-PROJECT-ID/YOUR-IMAGE-TAG --min-instances 1 --max-instances 1
```

After you run this command you will be asked two questions. They are fairly self-explanatory, but to follow this tutorial exactly, answer:
- 13 (the europe-north1 region; the number shown in your list may differ, so check it before answering)
- N (to not allow unauthenticated invocations)

Google provides a quickstart guide for building and deploying python services [here](https://cloud.google.com/run/docs/quickstarts/build-and-deploy/python).


<a name="clarify"></a>
# See results in Clarify

Time to see your data in a Timeline. Clarify provides a nice way to visualize your data clearly and easily. For more information about how to create a timeline in Clarify click [here](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb#bonus).




<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/main/media/docker/timeline.gif" alt="Clarify Timeline" />

# Where to go next 

* [Forecasting](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Forecasting.ipynb)
* [Pattern Recognition](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Pattern%20Recognition.ipynb)