# Notebook for GoogleMultiModalFlow 

In this example, we show how to use one of Google's multimodal models as an image classifier via `uniflow`.

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set it up with the following commands:
```
conda create -n uniflow python=3.10 -y
conda activate uniflow  # some systems require `source activate uniflow`
```

Next, you will need a valid [Google API key](https://ai.google.dev/tutorials/setup) to run the code. Once you have the key, set it as the environment variable `GOOGLE_API_KEY` in a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).
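
A missing key usually surfaces only when the first request is sent, so it can help to fail early. The snippet below is a minimal sketch of such a check; `require_env` is a hypothetical helper written for this notebook, not part of `uniflow` or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper (not a uniflow API): raises RuntimeError if the
    variable is unset or contains only whitespace.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value
```

Calling `require_env("GOOGLE_API_KEY")` right after `load_dotenv()` in the import cell below would stop the notebook with a clear message if the key was not picked up.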

### Update system path

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Import dependencies

In [3]:
import PIL.Image
import pprint

from dotenv import load_dotenv
from IPython.display import display

from uniflow import PromptTemplate
from uniflow.flow.client import TransformClient
from uniflow.flow.flow_factory import FlowFactory
from uniflow.flow.config import TransformConfig
from uniflow.op.model.model_config import GoogleMultiModalModelConfig
from uniflow.viz import Viz
from uniflow.op.prompt import Context

load_dotenv()

True

### Display the different flows

In [3]:
FlowFactory.list()

{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformCopyFlow',
  'TransformGoogleFlow',
  'TransformGoogleMultiModalModelFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow'],
 'rater': ['RaterFlow']}
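
The `flow_name` used in the configuration later in this notebook appears under `'transform'` in this listing. As a small sketch, a name can be checked against a listing of this shape before a config is built; `has_flow` is a hypothetical helper, and the dictionary is a hand-copied subset of the output above rather than a live call to `FlowFactory.list()`.

```python
# Hypothetical helper, not part of uniflow: check a flow name against a
# listing shaped like the FlowFactory.list() output shown above.
def has_flow(flows: dict, kind: str, name: str) -> bool:
    """Return True if `name` is listed under `kind` in `flows`."""
    return name in flows.get(kind, [])


# Subset of the listing above, copied by hand for illustration.
flows = {
    "transform": ["TransformGoogleFlow", "TransformGoogleMultiModalModelFlow"],
    "rater": ["RaterFlow"],
}

print(has_flow(flows, "transform", "TransformGoogleMultiModalModelFlow"))  # True
print(has_flow(flows, "transform", "NoSuchFlow"))  # False
```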

### Prepare Prompts
Here, we load all the images that need to be classified.

In [4]:
input = [
    PIL.Image.open('data/dog.jpeg'),
    PIL.Image.open('data/cat.jpeg'),
    PIL.Image.open('data/monkey.jpeg'),
]

Next, we wrap each image in the `input` list above in the `Context` class so that it can be processed by `uniflow`.

In [5]:

data = [
    Context(context=c)
    for c in input
]

### Use LLM to generate data
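In this example, we build a `TransformConfig` that combines the `TransformGoogleMultiModalModelFlow` flow, a default `GoogleMultiModalModelConfig`, and a custom prompt asking the model to classify each image as dog, cat, or neither.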

In [17]:
config = TransformConfig(
    flow_name="TransformGoogleMultiModalModelFlow",
    model_config=GoogleMultiModalModelConfig(),
    prompt_template=PromptTemplate( # update with your prompt.
        instruction="""You are a multimodal AI model designed to classify images based on their content.
        Your specific task is to determine whether the provided image is dog or cat.
        Answer dog if dog is in image, cat if cat is in image, and neither if neither dog or cat is in image.
        Explain your answer step by step, then output your result.
        Your output should be in format. Explain: ... Answer: dog, cat, neither.""",
    ),
)
client = TransformClient(config)

Now we call the `run` method on the `client` object to classify the images prepared above.

In [18]:
output = client.run(data)


  0%|          | 0/3 [00:00<?, ?it/s]

100%|██████████| 3/3 [00:12<00:00,  4.20s/it]
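
Each response is free text in the `Explain: ... Answer: ...` format requested by the prompt, so a small parser can reduce it to a single label. The following is a minimal sketch, assuming the model keeps the word `Answer` followed by a colon; `parse_label` and `LABELS` are hypothetical names introduced here, and the regular expression also tolerates Markdown bold such as `**Answer:**`.

```python
import re
from typing import Optional

# Labels allowed by the prompt's "Answer: dog, cat, neither" instruction.
LABELS = ("dog", "cat", "neither")

# Matches "Answer: dog" and bold variants such as "**Answer:** dog"
# or "**Answer**: dog"; the captured group is the word after the colon.
_ANSWER_RE = re.compile(r"answer\*{0,2}\s*:\s*\*{0,2}\s*(\w+)", re.IGNORECASE)


def parse_label(response: str) -> Optional[str]:
    """Return the label after the last 'Answer:' marker, or None.

    Hypothetical helper (not a uniflow API). None is returned when no
    marker is found or when the word after it is not one of LABELS.
    """
    matches = _ANSWER_RE.findall(response)
    if not matches:
        return None
    label = matches[-1].lower()
    return label if label in LABELS else None


# Abbreviated sample responses, for illustration only.
print(parse_label(" **Explain:** A puppy sitting on grass.\n\n**Answer:** dog"))  # dog
print(parse_label(" Explain: A gray cat with stripes.\nAnswer: cat"))  # cat
print(parse_label(" There is a monkey in the image."))  # None
```

Judging by the printed output in the next section, each response string sits at `result['output'][0]['response'][0]`, so the parser could be applied to that field for every item in `output`. Returning `None` for unparseable text keeps format drift visible instead of silently mapping it to a label.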


### View the output

Let's take a look at the generated output.

In [19]:
pprint.pprint(output)

[{'output': [{'error': 'No errors.',
              'response': [' **Explain:** The image shows a golden retriever '
                           'puppy sitting on green grass. The puppy is looking '
                           'up at something off camera. There are yellow '
                           'flowers scattered on the ground around the puppy.\n'
                           '\n'
                           '**Answer:** dog']}],
  'root': <uniflow.node.Node object at 0x106bbc0a0>},
 {'output': [{'error': 'No errors.',
              'response': [' Explain: The image shows a gray cat with stripes '
                           'lying on a white surface. The cat is looking at '
                           'the camera.\n'
                           'Answer: cat']}],
  'root': <uniflow.node.Node object at 0x1061b36a0>},
 {'output': [{'error': 'No errors.',
              'response': [' There is a monkey in the image.\n'
                           'Explain: The image shows a monkey sitting on a