# Notebook for Paper Comparison Flow via OpenAI
In this example, we will show you how to compare two papers using OpenAI's models via `uniflow`'s `TransformComparisonOpenAIFlow`.


### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.

Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` in a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).
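
Before making any API calls, it can help to confirm that the key was actually picked up. The following is a minimal sketch and not part of `uniflow`: the helper name `require_env` is hypothetical, and it assumes the variable has already been loaded into the process environment (for example by `load_dotenv()`, which this notebook calls later).

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration only; it is not a uniflow API.
    """
    # Fall back to the real process environment when no mapping is passed.
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to the .env file in the repository root."
        )
    return value


# Demonstration with an explicit mapping, so no real key is needed here.
key = require_env("OPENAI_API_KEY", env={"OPENAI_API_KEY": "sk-placeholder"})
```

In the notebook itself, calling `require_env("OPENAI_API_KEY")` after `load_dotenv()` would fail fast with a readable message instead of a later authentication error.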

In this example, we'll use two papers in markdown format from `example/transform/data/raw_input/`.

### Update system path

In [81]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Import dependencies

In [82]:
from dotenv import load_dotenv

from uniflow.flow.flow_factory import FlowFactory
from uniflow.flow.client import TransformClient
from uniflow.flow.config import TransformOpenAIConfig
from uniflow.op.model.model_config import OpenAIModelConfig
from uniflow.op.prompt import Context

load_dotenv()


True

In [83]:
FlowFactory.list()

{'extract': ['ExtractHTMLFlow',
  'ExtractImageFlow',
  'ExtractIpynbFlow',
  'ExtractMarkdownFlow',
  'ExtractPDFFlow',
  'ExtractTxtFlow',
  'ExtractGmailFlow'],
 'transform': ['TransformAzureOpenAIFlow',
  'TransformCopyFlow',
  'TransformGoogleFlow',
  'TransformGoogleMultiModalModelFlow',
  'TransformHuggingFaceFlow',
  'TransformLMQGFlow',
  'TransformOpenAIFlow',
  'TransformSummaryGoogleFlow',
  'TransformComparisonOpenAIFlow'],
 'rater': ['RaterFlow']}

### Prepare the input data
The two papers have already been preprocessed into markdown format.

In [86]:
with open(r"data/raw_input/paper_1_raw.md", 'r') as file:
    paper_1_content = file.read()

with open(r"data/raw_input/paper_2_raw.md", 'r') as file:
    paper_2_content = file.read()

raw_context_input = [
    paper_1_content,
    paper_2_content,
]
raw_context_input

['# The Impact of Artificial Intelligence on Healthcare: A Review\n\n## Abstract\nArtificial intelligence (AI) has emerged as a transformative technology in healthcare, offering opportunities to improve patient outcomes, streamline processes, and enhance decision-making. This paper provides an overview of the current state of AI in healthcare, explores its applications, benefits, challenges, and future prospects.\n\n## Introduction\nIn recent years, artificial intelligence has gained significant traction across various industries, and healthcare is no exception. With advancements in machine learning, natural language processing, and robotics, AI has the potential to revolutionize healthcare delivery, diagnosis, treatment, and management. This paper aims to delve into the role of AI in healthcare, highlighting its implications, challenges, and future directions.\n\n## Background\nThe integration of AI into healthcare systems has been facilitated by the exponential growth of data, couple
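
If more files are added later, the two `open` calls above can be folded into a small loader. This is a sketch under stated assumptions: `read_papers` is a hypothetical helper (not part of `uniflow`), the files are assumed to be UTF-8 encoded, and an empty file is treated as an error so that a blank paper is never sent to the model.

```python
from pathlib import Path
from typing import Iterable, List


def read_papers(paths: Iterable[str]) -> List[str]:
    """Read each markdown file as UTF-8 text, failing early on empty files.

    Hypothetical helper for illustration only.
    """
    contents = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        if not text.strip():
            raise ValueError(f"{path} is empty")
        contents.append(text)
    return contents
```

With the paths used in this notebook, `read_papers(["data/raw_input/paper_1_raw.md", "data/raw_input/paper_2_raw.md"])` would produce the same list as `raw_context_input`.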

### Use LLM to compare two papers

In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to compare the two papers.

In [99]:
input_data = [[Context(Context=raw_context_input[0]), Context(Context=raw_context_input[1])]]
config = TransformOpenAIConfig(
    flow_name="TransformComparisonOpenAIFlow",
    model_config=OpenAIModelConfig(),
)
client = TransformClient(config)

Now we call the `run` method on the `client` object to run the comparison on the pair of papers prepared above.

In [100]:
output = client.run(input_data)

100%|██████████| 1/1 [01:05<00:00, 65.77s/it]


### Process the output

Let's take a look at the generated output. The raw output is nested, so we need a little post-processing to pull out the response text.

In [101]:
comparison = []

for item in output:
    for i in item.get('output', []):
        for response in i.get('response', []):
            comparison.append(response)

comparison

['Similarities between paper A and paper B:\n1. Both papers focus on the impact of renewable energy utilization on global carbon emissions in response to climate change.\n2. Both papers emphasize the need to transition from non-renewable to renewable energy sources to reduce carbon emissions.\n3. They both highlight the importance of renewable energy sources such as solar, wind, and hydroelectric power as potential solutions to mitigate climate change.\n\nDifferences between paper A and paper B:\n1. Paper A specifically mentions the use of statistical models to analyze data from various countries over the past 20 years to assess the effectiveness of renewable energy in reducing carbon footprints, while paper B does not mention specific research methods.\n2. Paper A focuses on the findings that suggest adopting renewable energy can significantly reduce global carbon emissions, emphasizing the importance of policies that support renewable energy investments and infrastructure development
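
The nested loop above can also be written as a reusable function, and the response text can be split into its two parts. This is a minimal sketch: `collect_responses` and `split_sections` are hypothetical helpers, the sample data is made up to mirror the `output` / `response` nesting used in this notebook, and the split assumes the response contains a blank line followed by a heading that starts with `Differences`, as in the output shown above.

```python
from typing import Any, Dict, Iterable, List


def collect_responses(output: Iterable[Dict[str, Any]]) -> List[str]:
    """Flatten item -> 'output' -> 'response' into one list of strings."""
    return [
        response
        for item in output
        for entry in item.get("output", [])
        for response in entry.get("response", [])
    ]


def split_sections(text: str) -> Dict[str, str]:
    """Split a comparison at the 'Differences' heading.

    If no such heading is found, the whole text is returned under
    'similarities' and 'differences' is left empty.
    """
    head, sep, tail = text.partition("\n\nDifferences")
    differences = (sep.strip() + tail).strip() if sep else ""
    return {"similarities": head.strip(), "differences": differences}


# Made-up sample shaped like the client's output, for illustration only.
sample_output = [
    {
        "output": [
            {
                "response": [
                    "Similarities between paper A and paper B:\n1. Shared topic."
                    "\n\nDifferences between paper A and paper B:\n1. Different methods."
                ]
            }
        ]
    }
]

responses = collect_responses(sample_output)
sections = split_sections(responses[0])
```

Keeping the two parts separate makes it easier to display or store them independently, for example as two columns in a table.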

## End of the notebook

Check out more Uniflow use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>