# Project: Building a Research Paper Analyst with Prompt Engineering
* Notebook by Adam Lang
* Date: 6/27/2024

# Overview
* This is a mini-project using GPT models and LangChain to build a Research Paper Analyst with Prompt Engineering.
* As in the previous project, we will use the ChatGPT API through LangChain. No output parser is needed here because we are not generating structured output.

## Install dependencies

In [1]:
# install pinned LangChain dependencies
!pip install langchain==0.1.19
!pip install langchain-openai==0.1.6
!pip install langchain-community==0.0.38


## API Tokens
* OpenAI access

In [2]:
## enter the API key
from getpass import getpass


OPENAI_KEY = getpass('Enter your OpenAI key: ')

Enter your OpenAI key: ··········


In [3]:
## openai environment variables
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY

## LLM Dependencies

In [4]:
## imports
from langchain_core.prompts import ChatPromptTemplate # prompt templates from langchain
from langchain_openai import ChatOpenAI

In [5]:
# instantiate chatgpt instance
chatgpt = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0) # less randomness

# Project: Building a Research Paper Analyst
Scenario: Let's build an application using LangChain and ChatGPT to analyze research paper abstracts. The application will do the following:
1. Write a short summary (maximum 10 lines) for a general audience based on the input research paper abstract.
2. Give a detailed report for a healthcare company with bullet points for pros and cons of the ethics of using Generative AI found in the paper.
3. Give a detailed report for a Generative AI company solving healthcare problems and use bullet points for key issues mentioned for Gen AI for text, images and structured healthcare data.

Goal:
1. We will try to use the `ChatPromptTemplate` to have a conversation with ChatGPT for each task above and "talk to the data".
2. Each task above produces output for a different stakeholder or audience.

## Research Paper Abstract

In [6]:
paper_abstract = """
The widespread use of ChatGPT and other emerging technology powered by generative
artificial intelligence (AI) has drawn much attention to potential ethical issues, especially in
high-stakes applications such as healthcare.1–3 However, less clear is how to resolve such
issues beyond following guidelines and regulations that are still under discussion and
development. On the other hand, other types of generative AI have been used to synthesize
images and other types of data for research and practical purposes, which have resolved some
ethical issues and exposed other ethical issues,4,5 but such technology is less often the focus
of ongoing ethical discussions. Here we highlight gaps in current ethical discussions of
generative AI via a systematic scoping review of relevant existing research in healthcare, and
reduce the gaps by proposing an ethics checklist for comprehensive assessment and
transparent documentation of ethical discussions in generative AI development. While the
checklist can be readily integrated into the current peer review and publication system to
enhance generative AI research, it may also be used in broader settings to disclose ethicsrelated considerations in generative AI-powered products (or real-life applications of such
products) to help users establish reasonable trust in their capabilities.

Current ethical discussions on generative AI in healthcare
We conducted a systematic scoping review to analyse current ethical discussions on
generative AI in healthcare. Our search in four major academic research databases for
relevant publications from January 2013 to July 2023 yielded 2859 articles (see Methods for
detailed search strategy and Supplementary Figure S1 for the PRISMA flow diagram), of
which 193 articles were included for analysis based on application data modality (text, image,
or structured data), ethical issues discussed, generative AI involved, and whether generative
AI causes or offers technical solutions for issues raised.

Generative AI for text data-based healthcare
Forty-one of the 193 articles discussed ethical considerations pertaining to generative AI
applications for text data, with 20 articles describing methodological developments or
applications of generative AI and the other 21 articles describing review-type works on this
topic. Although some of these review-type articles used the general term “generative AI”, the
main body and supporting evidence focused on LLMs. Twenty-nine articles had in-depth
discussions on ethical issues, whereas the other 12 articles only briefly touched on some
ethical aspects.
Among the 41 articles, 29 articles focused on discussing ethical issues caused by LLMs (and
specifically by GPT in 16 of the articles), covering a wide range of application scenarios and
considered the application of all 10 ethical principles identified in the review (see Figure 1),
as well as other less discussed concerns such as human-AI interaction, and the rights of
LLMs to be considered as co-authors in scientific papers. One paper only commented briefly
on the need for ethical considerations in LLMs and is summarised in the “Others” category.
Although all ethical principles are equally important, some are discussed more often than
others, e.g., non-maleficence (also referred to in the literature as ‘benevolence’), equity, and
privacy.
Fifteen of the 41 articles aimed to resolve some existing ethical issues (for example,
confidentiality of medical data) by using LLMs and other generative AI (e.g., GAN,
autoencoder or diffusion), such as, to reduce privacy concerns by generating synthetic
medical text, to reduce disparity by providing accessible services and assistance, to detect
health-related misinformation, to generate trusted content, and to improve accountability or
transparency over existing approaches. While most articles focused on either identifying
ethical issues caused by generative AI or proposing generative AI-based solutions, three
articles discussed both to provide a more balanced perspective.

Generative AI for image and structured data-based healthcare
Unlike the diverse application scenarios of generative AI based on text data, for image and
structured data, this use of generative AI focuses on data synthesis and encryption. Hence the
majority of articles discussed the methodological developments of generative AI as giving
rise to a more distinctive and focused set of ethical issues.
5
Notably, of the 98 articles on image data and 58 articles on structured data, more than half
(n=63 for image data and n=33 for structured data) only mentioned ethical considerations as a
brief motivation for methodological developments or as a general discussion point. The rest
included more in-depth discussions or evaluations of ethical issues. Among these 155 articles
(as one article covered multiple modalities), 11 articles were review-type work, where 10
articles reviewed methods that mentioned one or two ethical perspectives, and only one
article24 discussed detailed ethical concerns on generative AI applications.
Resolving privacy issues was the main aim of articles for these two data modalities (n=74 for
image data and n=50 for structured data; see Figure 1), predominantly by generating synthetic
data using GAN. Eight articles on image data and 9 articles on structured data used
generative AI to reduce bias, e.g., by synthesizing data for under-represented subgroups in
existing databases. For both data modalities, we did not see explicit discussions on resolving
autonomy, integrity, or morality issues using generative AI, and for structured data the articles
additionally lacked discussions on trust or transparency.
Only 11 articles for image data selectively discussed some ethical issues that generative AI
can give rise to, without specific discussions regarding autonomy, integrity, or morality. For
structured data, only 4 articles discussed equity, privacy, or data security issues caused by
generative AI. Only two articles on structured data included both the cause and resolving
perspectives by discussing ethical issues that may arise from limitations of methods
proposed, specifically bias induced when synthesizing data in order to resolve privacy issues.
"""

In [7]:
print(len(paper_abstract))

6230


We can see this is a fairly long abstract, at 6,230 characters.
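
To get a rough sense of what that length means for the model's context window, the character count can be converted to an approximate token count. The sketch below assumes the common rule of thumb of about 4 characters per token for English text; the helper name `estimate_tokens` and the ratio are illustrative assumptions, not an exact tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; chars_per_token is an assumed heuristic, not a tokenizer."""
    return math.ceil(len(text) / chars_per_token)


# stand-in string with the same character count as the abstract above
sample = "x" * 6230
print(estimate_tokens(sample))
```

For an exact count, a real tokenizer (for example the `tiktoken` library) would be needed. Keep in mind that the abstract is resent with every turn of the conversation later in this notebook, so its size is paid for on each call.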

## Prompt Template creation
* Research paper analysis
* Transformation

In [8]:
## using a system prompt role for ChatGPT
SYS_PROMPT = """
You are an expert in Artificial Intelligence or AI.
Your job is to transform the input data which is a research paper abstract given below
based on the instruction input by the user.


"""
## prompt instantiation for Chat
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", SYS_PROMPT),
        ("human", "{instruction}"),

    ]
)

## LCEL LLM Chain Creation
* No output parser is needed --> we are not generating any structured output.

In [9]:
## LCEL chain
chain = prompt | chatgpt
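
The `|` operator chains components so that the output of the prompt becomes the input of the model. As a mental model only, the toy class below shows how Python's `__or__` method can compose two steps. `Step`, `fake_prompt`, and `fake_model` are hypothetical stand-ins, and this is not LangChain's actual implementation.

```python
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b -> a new step that runs a, then feeds its result into b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# hypothetical stand-ins for the prompt template and the chat model
fake_prompt = Step(lambda inputs: f"SYSTEM: be helpful\nHUMAN: {inputs['instruction']}")
fake_model = Step(lambda text: f"(model saw {len(text)} characters)")

toy_chain = fake_prompt | fake_model
print(toy_chain.invoke({"instruction": "Summarize the abstract."}))
```

The point of the sketch is only the data flow: `invoke` on the combined chain runs the left step first and hands its result to the right step.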

## Generate Summary Report #1
* Below we wrap the prompt in a `HumanMessage`, which marks it as a message from the user. Keeping these message objects in a list lets us store the conversation history.

In [12]:
# prompt
prompt_txt = f"""
Based on the following research paper abstract,
create a summary report of maximum 10 lines
for a general audience of stakeholders

Abstract:
{paper_abstract}
"""

print(prompt_txt)


Based on the following research paper abstract,
create a summary report of maximum 10 lines
for a general audience of stakeholders

Abstract:

The widespread use of ChatGPT and other emerging technology powered by generative
artificial intelligence (AI) has drawn much attention to potential ethical issues, especially in
high-stakes applications such as healthcare.1–3 However, less clear is how to resolve such
issues beyond following guidelines and regulations that are still under discussion and
development. On the other hand, other types of generative AI have been used to synthesize
images and other types of data for research and practical purposes, which have resolved some
ethical issues and exposed other ethical issues,4,5 but such technology is less often the focus
of ongoing ethical discussions. Here we highlight gaps in current ethical discussions of
generative AI via a systematic scoping review of relevant existing research in healthcare, and
reduce the gaps by proposing an ethi

In [13]:
from langchain_core.messages import HumanMessage

# prompt sent as a "human"
prompt_txt = f"""
Based on the following research paper abstract,
create a summary report of maximum 10 lines
for a general audience of stakeholders

Abstract:
{paper_abstract}
"""

# start the chat history with the first human message
messages = [HumanMessage(content=prompt_txt)]

## pass the history to the chain as the `instruction` variable
user_instruction = {'instruction': messages}

# invoke chain response
response = chain.invoke(user_instruction)
messages.append(response)

In [14]:
## print response (10 line summary)
print(response.content)

Summary Report:

The use of generative artificial intelligence (AI), such as ChatGPT, in high-stakes applications like healthcare has raised ethical concerns. While guidelines and regulations are still evolving, there is a need to address gaps in ethical discussions. A systematic review highlighted the importance of an ethics checklist for transparent documentation in generative AI development. 

In healthcare, generative AI is being used for text, image, and structured data. Ethical considerations vary across these modalities, with a focus on issues like privacy, bias reduction, and misinformation detection. While some articles propose solutions using generative AI, others highlight ethical dilemmas caused by these technologies. 

For image and structured data, privacy concerns are a key focus, with efforts to generate synthetic data to mitigate risks. However, discussions on autonomy, integrity, and morality issues are lacking. Overall, there is a need for comprehensive ethical asses

In [15]:
response.content

'Summary Report:\n\nThe use of generative artificial intelligence (AI), such as ChatGPT, in high-stakes applications like healthcare has raised ethical concerns. While guidelines and regulations are still evolving, there is a need to address gaps in ethical discussions. A systematic review highlighted the importance of an ethics checklist for transparent documentation in generative AI development. \n\nIn healthcare, generative AI is being used for text, image, and structured data. Ethical considerations vary across these modalities, with a focus on issues like privacy, bias reduction, and misinformation detection. While some articles propose solutions using generative AI, others highlight ethical dilemmas caused by these technologies. \n\nFor image and structured data, privacy concerns are a key focus, with efforts to generate synthetic data to mitigate risks. However, discussions on autonomy, integrity, and morality issues are lacking. Overall, there is a need for comprehensive ethica

In [17]:
## lets look at a list of messages stored
messages

[HumanMessage(content='\nBased on the following research paper abstract,\ncreate a summary report of maximum 10 lines\nfor a general audience of stakeholders\n\nAbstract:\n\nThe widespread use of ChatGPT and other emerging technology powered by generative\nartificial intelligence (AI) has drawn much attention to potential ethical issues, especially in\nhigh-stakes applications such as healthcare.1–3 However, less clear is how to resolve such\nissues beyond following guidelines and regulations that are still under discussion and\ndevelopment. On the other hand, other types of generative AI have been used to synthesize\nimages and other types of data for research and practical purposes, which have resolved some\nethical issues and exposed other ethical issues,4,5 but such technology is less often the focus\nof ongoing ethical discussions. Here we highlight gaps in current ethical discussions of\ngenerative AI via a systematic scoping review of relevant existing research in healthcare, an

Summary:
* The list above holds the prompt first, followed by the LLM response.
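
One detail worth noting: `{instruction}` in the template is a plain string placeholder, so a list passed to it is converted to text before it reaches the model. Below is a pure-Python sketch of that conversion, assuming the template uses Python's standard `str.format`-style substitution (LangChain's default `f-string` template format). `FakeHumanMessage` is a hypothetical stand-in for `HumanMessage`.

```python
from dataclasses import dataclass


@dataclass
class FakeHumanMessage:
    """Hypothetical stand-in for langchain_core's HumanMessage."""
    content: str


history = [FakeHumanMessage(content="Summarize the abstract.")]

# the same kind of substitution a "{instruction}" string template performs
rendered = "{instruction}".format(instruction=history)
print(rendered)
```

In other words, under this assumption the model receives a single human message whose text is the printed form of the whole list. LangChain also provides `MessagesPlaceholder` for passing a list of messages as separate chat turns.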

## Generate Summary Report #2
* We don't have to send the abstract again with each new prompt, because the first prompt is already stored as a `HumanMessage` in `messages`.
* We do have to create a new prompt, though, to generate the 2nd report summary.

In [18]:
## prompt 2
prompt_txt = f"""
Use only the research paper abstract from earlier and create a detailed report for a healthcare company.
In the detailed report you create, make sure to include a maximum of 3 bullet points that contain the pros and cons of ethics
in Generative AI

"""

## append the new prompt to the message history
## the messages list now contains: original prompt --> LLM response --> new prompt
messages.append(HumanMessage(content=prompt_txt))

# user_instruction
user_instruction = {'instruction': messages}

# invoke response
response = chain.invoke(user_instruction)
messages.append(response)

In [19]:
# response
response

AIMessage(content='**Detailed Report for Healthcare Company:**\n\nThe use of generative artificial intelligence (AI), such as ChatGPT, in high-stakes applications like healthcare has raised ethical concerns. While guidelines and regulations are still evolving, there is a need to address gaps in ethical discussions. A systematic review highlighted the importance of an ethics checklist for transparent documentation in generative AI development.\n\nIn healthcare, generative AI is being used for text, image, and structured data. Ethical considerations vary across these modalities, with a focus on issues like privacy, bias reduction, and misinformation detection. While some articles propose solutions using generative AI, others highlight ethical dilemmas caused by these technologies.\n\nFor image and structured data, privacy concerns are a key focus, with efforts to generate synthetic data to mitigate risks. However, discussions on autonomy, integrity, and morality issues are lacking. Overa

In [20]:
print(response.content)

**Detailed Report for Healthcare Company:**

The use of generative artificial intelligence (AI), such as ChatGPT, in high-stakes applications like healthcare has raised ethical concerns. While guidelines and regulations are still evolving, there is a need to address gaps in ethical discussions. A systematic review highlighted the importance of an ethics checklist for transparent documentation in generative AI development.

In healthcare, generative AI is being used for text, image, and structured data. Ethical considerations vary across these modalities, with a focus on issues like privacy, bias reduction, and misinformation detection. While some articles propose solutions using generative AI, others highlight ethical dilemmas caused by these technologies.

For image and structured data, privacy concerns are a key focus, with efforts to generate synthetic data to mitigate risks. However, discussions on autonomy, integrity, and morality issues are lacking. Overall, there is a need for c

### Check history of messages so far....

In [21]:
messages

[HumanMessage(content='\nBased on the following research paper abstract,\ncreate a summary report of maximum 10 lines\nfor a general audience of stakeholders\n\nAbstract:\n\nThe widespread use of ChatGPT and other emerging technology powered by generative\nartificial intelligence (AI) has drawn much attention to potential ethical issues, especially in\nhigh-stakes applications such as healthcare.1–3 However, less clear is how to resolve such\nissues beyond following guidelines and regulations that are still under discussion and\ndevelopment. On the other hand, other types of generative AI have been used to synthesize\nimages and other types of data for research and practical purposes, which have resolved some\nethical issues and exposed other ethical issues,4,5 but such technology is less often the focus\nof ongoing ethical discussions. Here we highlight gaps in current ethical discussions of\ngenerative AI via a systematic scoping review of relevant existing research in healthcare, an

## Generate Summary Report #3
* Now we add the previous LLM responses and new instructions to the list of messages and pass the entire thing to the LLM so it has access to the historical conversation.
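
The append-then-invoke pattern used for each report can be wrapped in a small helper. The sketch below uses plain dictionaries and a hypothetical `fake_model` callable in place of the LangChain chain, so the names `ask` and `fake_model` are illustrative assumptions only.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask(history: List[Message], instruction: str,
        model: Callable[[List[Message]], str]) -> str:
    """Append the user turn, call the model on the full history, store the reply."""
    history.append({"role": "human", "content": instruction})
    reply = model(history)
    history.append({"role": "ai", "content": reply})
    return reply


def fake_model(history: List[Message]) -> str:
    # hypothetical stand-in: reports how many turns it was given
    return f"reply based on {len(history)} message(s)"


history: List[Message] = []
print(ask(history, "Summarize the abstract in 10 lines.", fake_model))
print(ask(history, "Now write a report for a healthcare company.", fake_model))
print(len(history))
```

Each call sees every earlier turn, which is why the later prompts can say "the research paper abstract from earlier" without repeating it.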

In [22]:
## 3rd prompt
prompt_txt = f"""
Use only the research paper abstract from earlier and create a detailed report for a Generative AI company that is solving healthcare problems using structured data.
In this report please include the following sections with maximum 3 key points in each section related to issues with structured data in healthcare:
1. Generative AI for text
2. Generative AI for images
3. Generative AI for structured data

"""

## append message history
messages.append(HumanMessage(content=prompt_txt))

# user instruction
user_instruction = {'instruction': messages}

# response
response = chain.invoke(user_instruction)

In [23]:
# response
response

AIMessage(content='**Detailed Report for Generative AI Company in Healthcare (Structured Data Focus):**\n\nThe use of generative artificial intelligence (AI), such as ChatGPT, in high-stakes healthcare applications has brought attention to ethical concerns. Guidelines and regulations are evolving, emphasizing the need for comprehensive ethical assessments and transparency in development.\n\n**Generative AI for Text:**\n- Ethical considerations in text-based generative AI applications are crucial for maintaining patient privacy and ensuring trustworthy outputs.\n- Issues such as misinformation detection and human-AI interaction need to be addressed to build user trust in AI-powered healthcare solutions.\n- Clear ethical guidelines can guide the responsible development and deployment of generative AI for text data in healthcare.\n\n**Generative AI for Images:**\n- Privacy concerns are a significant focus in generative AI applications for image data, with efforts to generate synthetic dat

In [24]:
## actual response
print(response.content)

**Detailed Report for Generative AI Company in Healthcare (Structured Data Focus):**

The use of generative artificial intelligence (AI), such as ChatGPT, in high-stakes healthcare applications has brought attention to ethical concerns. Guidelines and regulations are evolving, emphasizing the need for comprehensive ethical assessments and transparency in development.

**Generative AI for Text:**
- Ethical considerations in text-based generative AI applications are crucial for maintaining patient privacy and ensuring trustworthy outputs.
- Issues such as misinformation detection and human-AI interaction need to be addressed to build user trust in AI-powered healthcare solutions.
- Clear ethical guidelines can guide the responsible development and deployment of generative AI for text data in healthcare.

**Generative AI for Images:**
- Privacy concerns are a significant focus in generative AI applications for image data, with efforts to generate synthetic data to mitigate risks.
- Addres

# Summary
* We used `ChatPromptTemplate` together with a list of `HumanMessage` prompts and the model's `AIMessage` responses to store the conversation history, then passed that history back to the LLM with every new prompt.
* This is the "basic" way to create a "chatbot" with your data. There are more efficient ways to do this, which I will cover in a separate project.