# Fully managed workflow orchestration with [*Covalent Cloud*](https://www.covalent.xyz/cloud/)

<div align="center">
<img src="https://www.covalent.xyz/wp-content/uploads/2023/05/cloud-hero-small.png" alt="covalent-cloud-hero-image" height=300>
</div>

## Usage

*Covalent Cloud* provides nearly the same interface as open-source Covalent.

👉 Add one or more [@ct.electron](https://docs.covalent.xyz/docs/user-documentation/concepts/covalent-basics#electron) decorators to designate workflow tasks (i.e. *electrons*).

👉 Assign an executor to each electron to choose its *compute resources*, without worrying about provider-specific details.

## Covalent Cloud Highlights

- ☁️ Complete cloud abstraction. No infrastructure setup, no provider credentials required.
- 🤝 Support for multiple users and enterprises.
- 🧾 Unified billing and usage tracking.



---

## Environment

Run the following commands to create and activate the environment:
```shell
$ conda env create -f "environment-cc.yml"
$ conda activate covalent_cloud_pydata_2023
```

---

# TUTORIAL: AI Image Generator for Wikipedia Articles

## Imports

In [17]:
import re
from collections import OrderedDict
from pathlib import Path
from typing import Any, List

import covalent as ct
import covalent_cloud as cc
import torch
from diffusers import StableDiffusionPipeline
from transformers import T5ForConditionalGeneration, T5Tokenizer
from wikipediaapi import Wikipedia

cc.save_api_key(open("cc_api_key").read().strip())

## Create Reusable Cloud Environment(s)

Create the environment once, then refer to it by name in the executors below.

In [18]:
ENV_NAME = "wiki-llm"

cc.create_env(
    ENV_NAME,
    pip="requirements-cc.txt",
    conda=["python=3.9", "pip"],
)

Environment Already Exists.


## Covalent Cloud Executors

- no infrastructure setup
- no provider credentials
- only environment and resources

In [19]:
CPU_EXECUTOR = cc.CloudExecutor(
    env=ENV_NAME,
    num_cpus=2,
    memory="12GB",
    time_limit=1800
)

GPU_EXECUTOR = cc.CloudExecutor(
    env=ENV_NAME,
    num_cpus=4,
    num_gpus=1,
    memory="32GB",
    time_limit=3600,
    gpu_type="t4",
)

## Constants

In [20]:
# Declare some constants.
MAX_LENGTH_PROMPT = 77
INPUT_BLANK = "summarize: {text}"  # format prompt text into this
TOKENIZER_MODEL = "t5-small"

## First, some useful helper code...

A post-processor class that renders the generated images and summaries in this Jupyter notebook.

In [21]:
class PostProcessor:
    """Post-process the generated images and summaries."""

    # Regex pattern for splitting sentences.
    _end_sentence_pattern = r"(?<=[.!?])\s+(?=[a-z])"

    # HTML template for displaying image-summary pairs.
    _html = """
<!DOCTYPE html>
<html>
<body>
    <h2>{title}</h2>
    <img src="data:image/jpeg;base64,{src}" alt="{title}">
    <p style="font-size:18px">{summary}</p>
</body>
</html>
"""

    def __init__(
        self,
        section_names: List[str],
        summaries: List[str],
        images: List[Any],
    ):

        self.section_names = self._process_sections(section_names)
        self.summaries = self._process_summaries(summaries)
        self.images = images

    @property
    def contents(self) -> List[tuple]:
        return list(zip(self.section_names, self.summaries, self.images))

    @classmethod
    def _process_summaries(cls, summaries: List[str]) -> List[str]:
        """Apply aesthetic clean-ups to the summaries."""

        _summaries = []
        for summary in summaries:
            sentences = re.split(cls._end_sentence_pattern, summary)
            _summaries.append(' '.join(map(str.capitalize, sentences)))

        return _summaries

    @staticmethod
    def _process_sections(section_names: List[str]) -> List[str]:
        """Apply aesthetic clean-ups to section names."""

        _section_names = []
        for name in section_names:
            words = name.split(' ')
            _section_names.append(' '.join(map(str.capitalize, words)))

        return _section_names

    def display_all(self):
        """Display title-image-summary for each section."""

        import base64
        from io import BytesIO
        from IPython.display import display, HTML

        # Render images and embed into notebook.
        for section_name, summary, image in self.contents:

            # get image as bytes
            buffered = BytesIO()
            image.save(buffered, format="JPEG")
            img_bytes = base64.b64encode(buffered.getvalue()).decode("utf-8")

            # save image file
            image.save(f"{section_name}.png", format="PNG")

            display(
                HTML(self._html.format(title=section_name, src=img_bytes, summary=summary))
            )
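
To make the text clean-up concrete, here is a minimal standalone sketch of the two helpers above, using the same regex as `_end_sentence_pattern`. The function names `capitalize_sentences` and `capitalize_words` and the sample strings are illustrative and not part of the notebook.

```python
import re

# Same pattern as PostProcessor._end_sentence_pattern: split on whitespace that
# follows '.', '!' or '?' and precedes a lowercase letter.
END_SENTENCE = r"(?<=[.!?])\s+(?=[a-z])"


def capitalize_sentences(summary: str) -> str:
    """Capitalize the first character of each sentence in a summary."""
    sentences = re.split(END_SENTENCE, summary)
    return " ".join(map(str.capitalize, sentences))


def capitalize_words(name: str) -> str:
    """Capitalize every space-separated word of a section name."""
    return " ".join(map(str.capitalize, name.split(" ")))


print(capitalize_sentences("first sentence here. second sentence here."))
print(capitalize_words("study journeys"))
```

Note that `str.capitalize` lowercases every character after the first, so capitals in the middle of a sentence (proper nouns, for example) are lost. The pattern only splits before a lowercase letter, which suggests the summaries are expected to arrive uncapitalized.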


A function that loads a pre-trained model from local files if a cached copy exists, and otherwise downloads it and caches it.

In [22]:
def get_model(type_: object, id_: str, label=None, **params):
    """Download files for the model if necessary and return the model."""

    # Create the cache directory if needed (safe if it already exists).
    _base_dir = Path('/tmp/wiki_image_summary')
    _base_dir.mkdir(parents=True, exist_ok=True)

    model_id = id_.replace('/', '_')
    if label:
        model_id += f"_{label}"

    model_path = _base_dir / model_id

    if model_path.exists():
        return type_.from_pretrained(model_path, **params)

    model = type_.from_pretrained(id_, **params)
    model.save_pretrained(model_path)
    return model
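
The download-or-load logic can be tried without any model libraries. The sketch below assumes a hypothetical `FakeModel` class that mimics only the `from_pretrained` / `save_pretrained` pair, and adds a hypothetical `base_dir` parameter so the cache can live in a temporary directory. Neither is part of the notebook.

```python
import tempfile
from pathlib import Path


class FakeModel:
    """Hypothetical stand-in exposing the two methods the cache helper relies on."""

    downloads = 0  # counts simulated downloads

    def __init__(self, name: str):
        self.name = name

    @classmethod
    def from_pretrained(cls, source, **params):
        if Path(source).is_dir():
            # Local copy: read the saved name back from disk.
            return cls((Path(source) / "name.txt").read_text())
        # Otherwise treat `source` as a remote model ID and "download" it.
        cls.downloads += 1
        return cls(str(source))

    def save_pretrained(self, path):
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "name.txt").write_text(self.name)


def cached_get_model(type_, id_, label=None, base_dir=Path("/tmp/wiki_image_summary"), **params):
    """Return a cached model if present; otherwise download it and cache it."""
    base_dir.mkdir(parents=True, exist_ok=True)

    # Build a filesystem-safe cache key from the model ID and optional label.
    model_id = id_.replace("/", "_")
    if label:
        model_id += f"_{label}"
    model_path = base_dir / model_id

    if model_path.exists():
        return type_.from_pretrained(model_path, **params)

    model = type_.from_pretrained(id_, **params)
    model.save_pretrained(model_path)
    return model


with tempfile.TemporaryDirectory() as tmp:
    first = cached_get_model(FakeModel, "org/model", base_dir=Path(tmp))
    second = cached_get_model(FakeModel, "org/model", base_dir=Path(tmp))
    print(FakeModel.downloads)
```

Only the first call triggers a simulated download. The second call finds the cached directory and loads from it.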


# Now, let's define a workflow!

## Electrons

#### These are the tasks that Covalent will ship to various backends.

### 1. Read from Wikipedia
A task that uses the Wikipedia API to find an article by name and retrieve text from the specified sections.

In [23]:
@ct.electron(executor=CPU_EXECUTOR)
def get_page_sections(
    page_title: str,
    section_titles: List[str],
) -> OrderedDict:
    """Get the title and text for each requested section of the page."""

    wiki_wiki = Wikipedia("AQUser", 'en')
    page = wiki_wiki.page(page_title)

    if not page.exists():
        raise RuntimeError(f"Wikipedia page '{page_title}' not found.")

    section_titles = [s.lower() for s in section_titles]

    section_texts = OrderedDict()
    for section in page.sections:
        section_title = section.title.lower()

        if section_title in section_titles:
            text_parts = [section.text]
            for subsection in section.sections:
                text_parts.append(subsection.text)

            section_texts[section_title] = '\n'.join(text_parts)

    if len(section_texts) == 0:
        raise RuntimeError("No text retrieved from any sections.")

    return section_texts
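
The section-matching loop can be exercised offline. The following sketch assumes a hypothetical `FakeSection` dataclass in place of the section objects returned by `wikipediaapi`. It mirrors only the `title`, `text`, and `sections` attributes used above, and the sample text is made up.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSection:
    """Hypothetical stand-in for a Wikipedia section object."""

    title: str
    text: str
    sections: List["FakeSection"] = field(default_factory=list)


def collect_section_texts(sections, wanted_titles):
    """Collect text (plus first-level subsection text) for the wanted titles."""
    wanted = [title.lower() for title in wanted_titles]

    texts = OrderedDict()
    for section in sections:
        title = section.title.lower()
        if title in wanted:
            parts = [section.text] + [sub.text for sub in section.sections]
            texts[title] = "\n".join(parts)
    return texts


page_sections = [
    FakeSection("Early life", "Intro text.", [FakeSection("School", "Subsection text.")]),
    FakeSection("Legacy", "Legacy text."),
    FakeSection("References", ""),
]

result = collect_section_texts(page_sections, ["Legacy", "early LIFE"])
print(list(result.items()))
```

Matching is case-insensitive, the keys come back lowercased, and the result follows the order of sections on the page rather than the order of the request. Only one level of subsections is included.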


### 2. Summarize Article Sections

This task uses a text-to-text model (T5) to condense each section into a short summary, which serves as the prompt for the subsequent Stable Diffusion model.

In [24]:
@ct.electron(executor=CPU_EXECUTOR)
def generate_reduced_summaries(
    section_texts: OrderedDict,
    model_name: str = TOKENIZER_MODEL,
    max_length: int = MAX_LENGTH_PROMPT,
) -> List[str]:
    """Generate a `max_length` summary from the batch of text sections."""

    # Prefix each section's text with the summarization instruction.
    section_texts_formatted = [
        INPUT_BLANK.format(text=section_text)
        for section_text in section_texts.values()
    ]

    # Tokenize the formatted sections, padding or truncating each to 1024 tokens.
    tokenizer = get_model(T5Tokenizer, model_name, label="tokenizer")
    inputs = tokenizer(
        section_texts_formatted,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=1024,
    )

    # Generate a `max_length` summary from the digest.
    model = get_model(T5ForConditionalGeneration, model_name)
    output_sequences = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        do_sample=False,
        max_length=max_length,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True,
    )

    # Decode the summaries.
    summaries = [
        tokenizer.decode(output, skip_special_tokens=True)
        for output in output_sequences
    ]

    if torch.cuda.is_available():
        model.to("cpu")

    return summaries


### 3. Generate Images from Summaries

In [25]:
@ct.electron(executor=GPU_EXECUTOR)
def generate_images(
    prompts: List[str],
    width: int,
    height: int,
    num_inference_steps: int,
    *,
    model_type: Any = StableDiffusionPipeline,
    model_id: str = "runwayml/stable-diffusion-v1-5",
    **params,
):
    """Generate an image based on a summary prompt."""

    model = get_model(model_type, model_id, **params)

    if torch.cuda.is_available():
        model.to("cuda")

    model_output = model(
        prompts,
        width=width,
        height=height,
        num_inference_steps=num_inference_steps,
    )

    if torch.cuda.is_available():
        model.to("cpu")

    return model_output.images

### 4. Obtain Post-Processor Object

This task creates an instance of the `PostProcessor` class defined earlier in this notebook, which renders the model outputs.

In [26]:
@ct.electron(executor=CPU_EXECUTOR)
def post_process(
    section_names: List[str],
    summaries: List[List[str]],
    images: List[List[Any]],
) -> PostProcessor:
    """Flatten the batched summaries and images and wrap them in a `PostProcessor`."""

    # Flatten the once-nested (per-batch) input lists.
    summaries = [summary for batch in summaries for summary in batch]
    images = [image for batch in images for image in batch]

    pp = PostProcessor(section_names, summaries, images)

    return pp
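
As a small illustration of what this task hands to `PostProcessor`, the sketch below flattens per-batch lists and pairs them positionally with the section names, as `PostProcessor.contents` does with `zip`. The placeholder strings stand in for real summaries and images.

```python
from itertools import chain

section_names = ["Early life", "Study journeys", "Later life"]

# One inner list per batch, as collected by the workflow.
summaries_batched = [["summary 1", "summary 2"], ["summary 3"]]
images_batched = [["image 1", "image 2"], ["image 3"]]

# Flatten the once-nested lists (equivalent to the nested comprehensions above).
summaries = list(chain.from_iterable(summaries_batched))
images = list(chain.from_iterable(images_batched))

# Pair names, summaries, and images by position.
contents = list(zip(section_names, summaries, images))
print(contents)
```

Because the pairing is purely positional and `zip` stops at the shortest input, the section names are assumed to line up one-to-one, in order, with the retrieved sections.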

<a id='Lattice'></a>

## Lattice

#### Definition of the main workflow function

In [27]:
@ct.lattice(executor=CPU_EXECUTOR, workflow_executor=CPU_EXECUTOR)
def workflow(
    page_title: str,
    sections: List[str],
    width: int = 800,
    height: int = 640,
    num_inference_steps: int = 100,
    batch_size: int = 2,
) -> PostProcessor:
    """Retrieve text sections from Wikipedia page. Generate summaries and images."""

    summaries_all = []
    images_all = []

    for i in range(0, len(sections), batch_size):

        sections_batch = sections[i:i + batch_size]
        section_texts = get_page_sections(page_title, sections_batch)
        section_text_summaries = generate_reduced_summaries(section_texts)

        section_text_images = generate_images(
            section_text_summaries,
            width,
            height,
            num_inference_steps,
        )

        summaries_all.append(section_text_summaries)
        images_all.append(section_text_images)

    return post_process(sections, summaries_all, images_all)
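
The batching loop in the lattice can be previewed on its own. This sketch uses a hypothetical helper named `batched` (not part of the notebook) that slices the section list the same way, with `range(0, len(sections), batch_size)`.

```python
def batched(items, batch_size):
    """Split `items` into consecutive slices of at most `batch_size` elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


sections = [
    "Early life",
    "Study journeys",
    "Later life",
    "Mathematically inspired work",
    "Legacy",
]

batches = batched(sections, batch_size=2)
for batch in batches:
    print(batch)
```

With five sections and the default `batch_size` of 2, the loop runs three times. Each iteration calls the three task electrons once, so the workflow makes nine task calls plus one `post_process` call.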


## Dispatching...

Generate summaries and images from sections of the Wikipedia entry for the famous Dutch artist M. C. Escher.

In [28]:
PAGE_TITLE = "M. C. Escher"

PAGE_SECTIONS = [
    "Early life",
    "Study journeys",
    "Later life",
    "Mathematically inspired work",
    "Legacy",
]

In [29]:
dispatch_id = cc.dispatch(workflow)(PAGE_TITLE, PAGE_SECTIONS)
print(dispatch_id)
results = cc.get_result(dispatch_id, wait=True)

9f5da563-31ba-47e8-addd-0dc0c3cb2435


## Workflow Graph: Running

- Navigate to https://app.covalent.xyz (requires login credentials)
- Select the corresponding dispatch

<div align='center'>
<img src="https://drive.google.com/uc?id=1sm6vsmpjCzwAV9c85xp8Pdn8KLFD7hkT" alt="ui_saas_running" width=1200>
</div>

## Getting Workflow Results

In [30]:
# Download results from cloud server.
results.result.load()
post_processor = results.result.value

# Render the model outputs.
post_processor.display_all()
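
For reference, `display_all` embeds each image directly in the HTML as a base64 data URI. A minimal sketch of that encoding step follows. The helper name `to_data_uri` is hypothetical, and the three sample bytes are just the leading bytes of a JPEG file rather than a full image.

```python
import base64


def to_data_uri(raw: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw bytes as a data URI suitable for an <img src=...> attribute."""
    encoded = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


uri = to_data_uri(b"\xff\xd8\xff")
print(uri)
```

The general form is `data:<mime type>;base64,<payload>`, so the image needs no separate file or network request when the notebook is rendered.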