# RSS Feed

## Summary

This notebook takes the latest entry from a CNN RSS feed, downloads the article as text,
  embeds the text into a vector database, uses Google Bison to summarize it, and presents the summary
  of the article in a Streamlit app.

This notebook outlines how to:

1. Download an RSS feed and the article text
2. Make an AI catalog dataset from the RSS feed data/article content data
3. Embed content into a vector database on DataRobot (using the DataRobot API)
4. Set up a GenAI model (LLM) using Google Bison for deployment
5. Deploy in DataRobot
6. Build a Streamlit front end for interacting with the deployment
7. Receive article summaries and recommendations back via the Streamlit app


## Requirements
The datarobot package requires an API token and an endpoint to interact with DataRobot. See https://docs.datarobot.com/en/docs/api/api-quickstart/index.html#configure-api-authentication for the available methods and pick the one relevant to you.

In [None]:
# These are the packages used in this accelerator
# The format below is used in DataRobot notebooks to install packages. If running this in a DataRobot notebook, uncomment the entries below

# !pip install beautifulsoup4
# !pip install datarobot
# !pip install datarobot-early-access
# !pip install feedparser
# !pip install jinja2
# !pip install requests

## Setup

### Import libraries

In [None]:
import json
import time
import zipfile

from bs4 import BeautifulSoup
import datarobot as dr
from datarobot import Dataset as ds
from datarobot import Deployment as dep
from datarobot import UseCase as uc
from datarobot._experimental.models.genai.llm_blueprint import LLMBlueprint as llm_bp
from datarobot._experimental.models.genai.playground import Playground as pg
from datarobot._experimental.models.genai.vector_database import (
    ChunkingParameters,
    VectorDatabase,
)
import feedparser as fp
import requests as r

### Bind variables

In [None]:
# These variables can also be fetched from a secret store or config files
# The URL may vary depending on your hosting preference; the example below is for DataRobot Managed AI Cloud
DATAROBOT_ENDPOINT = "https://app.datarobot.com/api/v2"

# The API token can be found or created by clicking the avatar icon and then </> Developer Tools in the DataRobot interface.
DATAROBOT_API_TOKEN = "...put your DataRobot API Token here..."

# To create a Google service account that has access to Google Bison: https://cloud.google.com/iam/docs/service-account-overview
# To create a DataRobot in-platform credential: https://docs.datarobot.com/en/docs/data/connect-data/stored-creds.html#credentials-management
# This is the ID of the credential stored in DataRobot. It should look something like "65a84c02f5c11145013f9848"
GCP_CREDENTIALS = "<put your credentials id here>"
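
The comment above mentions that these values can come from a secret store or config files. A minimal sketch of one such option, reading them from environment variables, is shown here. The helper name `load_settings` and the environment variable names are assumptions chosen to mirror the notebook's Python variables; they are not part of the original accelerator.

```python
import os


def load_settings(env=None):
    """Read the three notebook settings from environment variables.

    The variable names are assumptions that mirror the notebook's Python
    variables; rename them to match your own secret store or config.
    """
    env = os.environ if env is None else env
    names = ("DATAROBOT_ENDPOINT", "DATAROBOT_API_TOKEN", "GCP_CREDENTIALS")
    # Fail early with a clear message instead of sending empty credentials.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

With the variables exported in the shell, `settings = load_settings()` returns a dictionary whose entries can replace the hard-coded strings above, which keeps the token out of the notebook file.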

### Connect to DataRobot

You can read more about different options for [connecting to DataRobot from the client](https://docs.datarobot.com/en/docs/api/api-quickstart/api-qs.html).

In [None]:
# Set up a DataRobot client using the bound variables
dr.Client(endpoint=DATAROBOT_ENDPOINT, token=DATAROBOT_API_TOKEN)

## Download RSS feed and the article text

In [None]:
# using feedparser grab the remote xml
cnnfeed = fp.parse("http://rss.cnn.com/rss/cnn_topstories.rss")

# pick the first entry (article) in the rss feed
cnn_0 = r.get(cnnfeed.entries[0].link)

# parse the returned html file using beautiful soup
soup = BeautifulSoup(cnn_0.content, "html.parser")

# grab the actual html element required, in this case the one that has the `articleBody` sections that make the actual text of the article
# thereby not getting the various advertisements and such
soup.body.contents[11].string

# load that text (it is a string in the soup output) into json for easier parsing
htmlcontentjson = json.loads(soup.body.contents[11].string)

# empty string into which to append the article content
article_text = ""

# loop through the json and grab just the article content
for k in htmlcontentjson["liveBlogUpdate"]:
    article_text = article_text + k["articleBody"]
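
Selecting the element by position (`contents[11]`) depends on the exact page layout at the time of writing. A minimal sketch of a layout-independent alternative is shown here, assuming the JSON payload sits in a `<script type="application/ld+json">` tag (the original does not state which tag holds it). It uses only the standard library so it runs without extra packages; the names `JsonLdCollector` and `extract_article_text` are hypothetical. Unlike the loop above, it joins the sections with a newline.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_article_text(html):
    """Join every articleBody found under a liveBlogUpdate list."""
    collector = JsonLdCollector()
    collector.feed(html)
    parts = []
    for raw in collector.blocks:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(payload, dict):
            continue
        for update in payload.get("liveBlogUpdate", []):
            if isinstance(update, dict) and update.get("articleBody"):
                parts.append(update["articleBody"])
    return "\n".join(parts)
```

Under these assumptions, `extract_article_text(cnn_0.text)` would return an empty string rather than raising when the page has no matching block, which makes a layout change easier to detect.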

## Make an AI catalog dataset from the RSS feed/article content

In [None]:
# write article text to a file on disk
with open("cnn_article.txt", "w", encoding="utf-8") as file:
    file.write(article_text)

# make a zipfile with that file inside it
with zipfile.ZipFile("cnn_article.zip", "w", zipfile.ZIP_DEFLATED, False) as zip_file:
    zip_file.write("cnn_article.txt")

# push that to datarobot
# https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#datasets
cnn_dataset = ds.upload("cnn_article.zip")

# get the dataset id
cnn_dataset_id = cnn_dataset.id
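
The write-then-zip steps above can also be done in one pass without an intermediate text file. The following is a minimal sketch using only the standard library; the helper name `zip_text` is hypothetical, and the round trip at the end runs in a temporary directory purely to show that the archive contains what was written.

```python
import os
import tempfile
import zipfile


def zip_text(text, member_name, zip_path):
    """Write `text` into `zip_path` as a single UTF-8 encoded member."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, text.encode("utf-8"))
    return zip_path


# Round trip in a temporary directory to check the archive contents.
with tempfile.TemporaryDirectory() as workdir:
    archive_path = zip_text(
        "Example article text", "cnn_article.txt", os.path.join(workdir, "cnn_article.zip")
    )
    with zipfile.ZipFile(archive_path) as archive:
        member_names = archive.namelist()
        round_trip = archive.read("cnn_article.txt").decode("utf-8")
```

In the notebook, `zip_text(article_text, "cnn_article.txt", "cnn_article.zip")` would produce the same archive name that is uploaded above.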

## Create a Use Case

In [None]:
# create a use case
# https://datarobot-public-api-client.readthedocs-hosted.com/en/latest-release/autodoc/api_reference.html#use-cases
cnn_use_case = uc.create(name="RSS_FEED", description="A Use Case for the RSSFeed accelerator")

## Embed content (text file) into a vector database on DataRobot using the API

In [None]:
# https://datarobot-public-api-client.readthedocs-hosted.com/en/early-access/autodoc/api_reference.html#datarobot._experimental.models.genai.vector_database.VectorDatabase

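# make a ChunkingParameters object
# the positional arguments are, in order: embedding model, chunking method, chunk size, chunk overlap percentage, separators
cp = ChunkingParameters("jinaai/jina-embedding-t-en-v1", "recursive", 256, 0, ["\n"])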

# make a vectordb
cnn_vdb = VectorDatabase.create(
    cnn_dataset_id, cp, use_case=cnn_use_case, name="cnn_article_jina_emb_t_en_v1"
)

# sleep for 1 minute to allow the vectordb to be created
time.sleep(60)
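
A fixed one-minute sleep can be too short for a large document or needlessly long for a small one. A minimal sketch of a polling helper that waits on a condition instead is shown here. The helper `wait_until` is hypothetical, and the usage in the trailing comment assumes that the vector database object exposes an `execution_status` attribute that becomes `"COMPLETED"`; check that against the client version you have installed.

```python
import time


def wait_until(check, timeout=600.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check()` every `interval` seconds until it returns True.

    Raises TimeoutError if `timeout` seconds pass without success.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return
        if clock() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        sleep(interval)


# Hypothetical usage (assumes an `execution_status` attribute exists):
# wait_until(lambda: VectorDatabase.get(cnn_vdb.id).execution_status == "COMPLETED")
```

The `sleep` and `clock` parameters exist only so the helper can be exercised quickly in tests; the defaults are the standard library functions.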

## Create an LLM Playground instance

In [None]:
# https://datarobot-public-api-client.readthedocs-hosted.com/en/early-access/autodoc/api_reference.html#datarobot._experimental.models.genai.playground.Playground

# Create the playground in the cnn_use_case
cnn_rssfeed_playground = pg.create(
    "CNN_RSS_FEED", "Used for the CNN RSS Feed Accelerator Example", cnn_use_case
)

## Create an LLM Model Blueprint

In [None]:
# https://datarobot-public-api-client.readthedocs-hosted.com/en/early-access/autodoc/api_reference.html#datarobot._experimental.models.genai.llm_blueprint.LLMBlueprint

# Create string for system prompt, summarize the article
# Adjacent string literals are concatenated, so no indentation whitespace ends up inside the prompt
sys_prompt = (
    "You are a helpful and factual AI assistant. Your job is to help summarize news articles. "
    "Be as concise as possible and do not make anything up. If you do not know the answer or do not have enough context, respond accordingly. "
    "Do not make assumptions about what values to use with functions. Always ask for clarification if a user request is ambiguous. "
    "If you need to clarify a question, reply by asking for more details before giving an answer. "
    "Format the answer as readable to a business executive."
)

# Create the LLM Blueprint
cnn_llm_blueprint = llm_bp.create(
    cnn_rssfeed_playground,
    "cnn_blueprint",
    llm="google-bison",
    llm_settings={"system_prompt": sys_prompt},
    vector_database=cnn_vdb,
)

# Save the LLM Blueprint, this "locks" the blueprint in the GUI so it is no longer editable
cnn_llm_blueprint.update(is_saved=True)

# Get the Blueprint ID
cnn_llm_blueprint_id = cnn_llm_blueprint.id

## Deploy LLM Blueprint

In [None]:
# Register custom model from Blueprint
# https://datarobot-public-api-client.readthedocs-hosted.com/en/early-access/autodoc/api_reference.html#datarobot._experimental.models.genai.llm_blueprint.LLMBlueprint.register_custom_model
deployed_cnn_llm_blueprint = cnn_llm_blueprint.register_custom_model(
    prompt_column_name="prompt_text", target_column_name="response_text"
)

# This next section is used to 'edit' the deployed blueprint and add the required GCP credentials.

# Get the url for the deployed llm bp
deployed_cnn_llm_blueprint_url = deployed_cnn_llm_blueprint._path.format(
    deployed_cnn_llm_blueprint.custom_model_id
)

# Make a string of that url
path = f"{deployed_cnn_llm_blueprint_url}"

# Create a payload to upload the gcp creds edit
# the baseEnvironmentId is from v6 of the `[DataRobot] Python 3.11 GenAI` pre-made environment
# if using a different environment, this value will need to be updated.
payload = {
    "baseEnvironmentId": "64d2ba178dd3f0b1fa2162f0",
    "runtimeParameterValues": json.dumps(
        [
            {
                "fieldName": "GOOGLE_SERVICE_ACCOUNT",
                "type": "credential",
                "value": GCP_CREDENTIALS,
            }
        ]
    ),
}

# update the deployed blueprint with the credentials it needs to connect to GCP
response = deployed_cnn_llm_blueprint._client.patch(path, json=payload)

# update to the latest version of the blueprint as the above patch method creates a new CustomModelVersion
deployed_cnn_llm_blueprint_patched = dr.CustomModelVersion.get(
    deployed_cnn_llm_blueprint.custom_model_id, response.json()["id"]
)

# The default prediction server is used when making predictions against the deployment, and is a requirement for creating a deployment on DataRobot cloud.
prediction_server = dr.PredictionServer.list()[0]

# Deploy custom model
cnn_custom_model_deployment = dep.create_from_custom_model_version(
    deployed_cnn_llm_blueprint_patched.id,
    "cnn deployment",
    description=None,
    default_prediction_server_id=prediction_server.id,
    max_wait=600,
    importance=None,
)
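
The front-end script in the next section needs a prediction server key, a deployment ID, and a prediction URL. A minimal sketch of deriving those from the objects created above, rather than copying them by hand, is shown here. The helper name `frontend_settings` is hypothetical, and it assumes the prediction server object exposes `url` and `datarobot_key` attributes; the stand-in objects let the sketch run without a DataRobot connection.

```python
from types import SimpleNamespace


def frontend_settings(prediction_server, deployment):
    """Build the values the front-end script expects from API objects.

    Assumes `prediction_server` has `url` and `datarobot_key` attributes
    and `deployment` has an `id` attribute.
    """
    base = prediction_server.url.rstrip("/")
    return {
        "DATAROBOT_KEY": prediction_server.datarobot_key,
        "DEPLOYMENT_ID": deployment.id,
        "API_URL": f"{base}/predApi/v1.0/deployments/",
    }


# Stand-in objects so the sketch runs without a DataRobot connection.
example = frontend_settings(
    SimpleNamespace(url="https://example.invalid", datarobot_key="key-123"),
    SimpleNamespace(id="dep-456"),
)
```

In the notebook this would be called as `frontend_settings(prediction_server, cnn_custom_model_deployment)`.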

## Make a local front-end web page

The following section is an example of the content of a `make_web_page.py` file that renders
a small .html page, which can then be used to interact with the LLM deployment created above.

There are many ways to make a front end; this is just a quick example.

In [None]:
#!/usr/bin/env python

# ----- imports -----
import os.path

from jinja2 import Template

# ----- variables ----
# Set these variables to be used in the following html web page render.

# DataRobot icon, update as needed
customerLogoURL = "https://app.datarobot.com/static/assets/dr-logo-for-dark-bg.svg"

# The API token can be found or created by clicking the avatar icon and then </> Developer Tools in the DataRobot interface.
API_KEY = "...put your DataRobot API Token here..."

# The key of the DataRobot prediction server to use.
# This is the `datarobot_key` of the prediction server selected above with `dr.PredictionServer.list()[0]`
DATAROBOT_KEY = "544ec55f-61bf-f6ee-0caf-15c7f919a45d"

# The ID for your deployment
# it will look something like: "65a855245f01592b653ae283"
DEPLOYMENT_ID = "...your deployment id here..."

# The main datarobot prediction endpoint. Update as needed.
API_URL = "https://mlops.dynamic.orm.datarobot.com/predApi/v1.0/deployments/"

# The file name for the rendered html page.
filename = "rss_feed_app.html"

# ----- functions -----

# function used to render the .html


def create_webpage(
    customerLogoURL="https://app.datarobot.com/static/assets/dr-logo-for-dark-bg.svg",
    customerLogoSizePercent=100,
    API_KEY=None,
    DATAROBOT_KEY=None,
    DEPLOYMENT_ID=None,
    API_URL=None,
    filename="datarobot_llm_deployment_app.html",
):
    template = Template(
        """ <!DOCTYPE html>
        <html lang="en">
        <head>
    <meta charset="UTF-8">
    <title>DataRobot API</title>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>

    <style>
        body {
            display: flex;
            flex-direction: column;
            align-items: center;
            height: 100vh;
            margin: 0;
            background-color: black;
            color: white;
            font-family: "Roboto", sans-serif;
        }

        h1 {
            font-family: "Roboto", sans-serif;
        }

        #header {
            display: flex;
            flex-direction: column;
            align-items: center;
        }

        #header img {
            width: 5%;
            max-width: 600px;
            height: auto;
            margin-top: 20px;
            margin-bottom: 20px;
        }

        #chatContainer {
            width: 360px;
            height: 500px;
            border: 1px solid #A9A9A9;
            display: flex;
            flex-direction: column;
            justify-content: space-between;
            background-color: black;
            color: white;
            font-family: "Roboto", sans-serif;
            border-radius: 5px;
        }

        #chatBox {
            height: 80%;
            padding: 15px;
            overflow-y: auto;
            border-bottom: 1px solid #000000;
        }

        #inputContainer {
            height: 20%;
            padding: 10px 15px;
            box-sizing: border-box;
        }

        #inputContainer textarea {
            background-color: #2d8fe2;
            color: white;
            width: 100%;
            height: 50%;
            resize: none;
            box-sizing: border-box;
        }

        #inputContainer button {
            width: 100%;
            height: 30%;
            box-sizing: border-box;
            background-color: #ff5600;
            color: white;
            border-radius: 5px;
        }

        .botMessage,
        .userMessage {
            margin-bottom: 10px;
            padding: 10px;
            border-radius: 4px;
            white-space: pre-wrap;
        }

        .botMessage {
            background-color: #39b54a;
            align-self: flex-start;
        }

        .userMessage {
            background-color: #2d8fe2;
            color: white;
            align-self: flex-end;
        }
    </style>
</head>
<body>
    <div id="header">
        <img src="https://assets-global.website-files.com/6394776949089d0d96702959/63d1bd209b9d3181111c34d6_DataRobotWhite-p-500.png" alt="DataRobot Mascot" style="width: 5%; max-width: 600px; height: auto;">
        <img src="{{ customerLogoURL }}" alt="Customer Logo" style="width: {{ customerLogoSizePercent }}%; max-width: 600px; height: auto;">
        </div>

    <div id="chatContainer">
        <div id="chatBox"></div>

        <div id="inputContainer">
            <textarea id="inputText"></textarea>
            <button id="submitBtn">Submit</button>
        </div>
    </div>

    <script>
        $(document).ready(function () {
            $("#submitBtn").click(sendData);
        });

        function sendData() {
            let inputText = document.getElementById("inputText").value;
            let csv = "prompt_text\\n" + inputText;

            if (inputText.trim() !== "") {
                appendMessage(inputText, "user");
                document.getElementById("inputText").value = "";

                let api_key = "{{ API_KEY }}";
                let datarobot_key = "{{ DATAROBOT_KEY }}";
                let deployment_id = "{{ DEPLOYMENT_ID }}";
                let api_url = "{{ API_URL }}" + deployment_id + "/predictions";

                $.ajax({
                    url: api_url,
                    method: "POST",
                    data: csv,
                    headers: {
                        "Content-Type": "text/plain; charset=UTF-8",
                        "Authorization": "Bearer " + api_key,
                        "DataRobot-Key": datarobot_key
                    },
                    success: function (data) {
                        console.log(data);
                        appendMessage(data.data[0].prediction, "bot");
                    },
                    error: function (error) {
                        console.log(error);
                        appendMessage("Error: " + JSON.stringify(error, undefined, 2), "bot");
                    }
                });
            }
        }

        function appendMessage(message, sender) {
            let chatBox = document.getElementById("chatBox");
            let messageBox = document.createElement("pre");

            messageBox.classList.add(sender + "Message");
            messageBox.textContent = message;

            chatBox.appendChild(messageBox);
            chatBox.scrollTop = chatBox.scrollHeight;
        }
    </script>

</body>

        </html>
        """
    )

    html_string = template.render(
        {
            "customerLogoURL": customerLogoURL,
            "customerLogoSizePercent": customerLogoSizePercent,
            "API_KEY": API_KEY,
            "DATAROBOT_KEY": DATAROBOT_KEY,
            "DEPLOYMENT_ID": DEPLOYMENT_ID,
            "API_URL": API_URL,
        }
    )

    with open(os.path.expanduser(filename), "w") as f:
        f.write(html_string)


# ----- main -----
if __name__ == "__main__":
    # Don't forget to call the function at the end
    create_webpage(
        customerLogoURL=customerLogoURL,
        customerLogoSizePercent=100,
        API_KEY=API_KEY,
        DATAROBOT_KEY=DATAROBOT_KEY,
        DEPLOYMENT_ID=DEPLOYMENT_ID,
        API_URL=API_URL,
        filename=filename,
    )
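
The request that the page's `sendData` function posts can also be prepared from Python, which is handy for checking the deployment before opening the page. The following is a minimal sketch that mirrors the URL and headers used in the page's script. The helper name `build_prediction_request` is hypothetical. Unlike the page, it builds the CSV body with the `csv` module, so prompts containing commas, quotes, or newlines are quoted correctly.

```python
import csv
import io


def build_prediction_request(api_url, deployment_id, api_key, datarobot_key, prompt):
    """Return (url, headers, body) mirroring what the page's sendData() posts."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prompt_text"])  # header row: the prompt column name
    writer.writerow([prompt])  # one data row; quoted only when needed
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",
        "Authorization": "Bearer " + api_key,
        "DataRobot-Key": datarobot_key,
    }
    url = api_url + deployment_id + "/predictions"
    return url, headers, buffer.getvalue().encode("utf-8")
```

Passing the result to `requests.post(url, data=body, headers=headers)` and reading `["data"][0]["prediction"]` from the JSON response would follow the same path as the page's success handler.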