In [None]:
'''
TODO LIST:
-Write a POC for evaluating and creating a suggestion based on a single requirement/guideline/best practice
-Include an example of the author's past writing for tone adjustment
-Add a guideline about length
-Prompting strategies like generating 5 different outputs and having it judge each one and take the best components of each
'''

In [1]:
import openai
import guidance
from dotenv import load_dotenv
from os import environ

In [13]:
load_dotenv()
openai.api_key = environ['OPENAI_API_KEY']
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

example_significance_section = '''Chronic infections caused by biofilms annually affect 17 million Americans, cause at least 550,000 American
deaths, and cost the US healthcare system billions of dollars [2-8]. Chronic wounds in particular can cost, per
patient, tens of thousands of dollars per year, and are prevented from healing because they are infected by
bacterial biofilms dominated by Pseudomonas aeruginosa [10]. Biofilm infection in chronic wounds afflicts both
diabetic and non-diabetic patients and can lead to amputation [14].
Biofilms resist antibiotics and evade the host immune defense [10, 15-18]. In biofilms, a heterogeneous
matrix of differentiated extracellular polymers (EPS) and proteins holds bacteria in place [19-22], thus
controlling intercellular associations and differentiation of microenvironments [19, 23-34]. Matrix polymers and
proteins also confer intercellular cohesion on biofilm bacteria, thereby determining the mechanical resistance
of the biofilm to physical breakup. The impact of spatial structure and matrix mechanics on biofilm properties
such as virulence, antibiotic resistance, and immune evasion are largely unknown. Indeed, we know little
about what specific structures and mechanics develop in biofilm infections, and extant techniques to probe
these properties are largely lacking. These are significant and critical unaddressed problems.
We will bring to bear a unique combination of techniques, a number of which we have recently
developed, for measuring and controlling the in vivo and in vitro structural and the in vitro mechanical
properties of biofilms [9, 11, 35]. Using our distinctive combination of microbiological and physical expertise,
we will also develop new approaches to measuring the mechanical properties of biofilms in vivo and how they
impact resistance to immunological clearance. We will determine: what mechanical and structural properties
develop for biofilm infections in chronic wounds in vivo; how the spatial structures and associated material
transport in biofilm infections impact antibiotic resistance and host-damaging virulence; how virulence and
biofilm mechanics impact resistance to immunological clearance. In short, we will contribute knowledge of
what biofilm structures and mechanics develop in P. aeruginosa biofilm infections in chronic wounds, and the
degree to which these structures and mechanics impact the virulence, antibiotic resistance, and persistence of
the biofilm infection. This contribution will be significant because it will identify cases where structure and/or
mechanics should be or inform specific therapeutic targets. We expect that treatments addressing physical
properties could also be combined with traditional therapeutics for enhanced clinical outcome. In addition to
treatments addressing virulence, antibiotic resistance, and immune evasion, our work is also likely to lead to
improved approaches for debridement and negative pressure wound therapy.
In the short term, our work will advance understanding of an important pathogen that forms biofilm
infections in wounds and many other anatomical sites [36-51]. Moreover, the work proposed here will develop
a platform of complementary techniques and knowledge that will be extensible to future studies of other
infection sites and other organisms, including multi-species infections and engineered microbial consortia [52-
54]. Thus, this platform will be a foundational resource for the emerging field of physical microbiology &
medicine. We expect that our immediate results, as well as the subsequent work building on the platform we
develop, will lead to improved quality of life and medical outcomes, and reduced healthcare costs.'''

guidelines = '''1. Try to build a 'funnel' of ideas - start with the broad problem and work down toward the specific problem that you're focusing on.
2. Don't just focus on public health benefits, also focus on how it will move fellow researchers' work forward.
3. Don't just explain why the research is needed; explain why it is needed NOW, more so than other possible work in this area.
4. Use language directly from the descriptions of the information given above to make it clear how you are utilizing each piece of information
5. Don't use hedging words like 'may' as one might in a scholarly manuscript - this is not a manuscript, it's effectively a sales pitch for your research.
6. The final product should be between 800 and 1200 words. Use any extra space to explain more about the background behind the proposed project, as the people reading this aren't necessarily experts in the subfield of the proposed research.'''

example_research_QnA = '''Q. Give a short summary of the issue that your research is hoping to address.
A. Alzheimer's Disease often does not show noticeable clinical symptoms until there is already somewhat advanced molecular pathology. This means that early detection of AD is necessary if we hope to prevent brain damage and cognitive decline. Early detection of AD is difficult because the symptoms are not yet measurable.
Q. How widespread is the issue that you are researching? How many people are affected by it? How much money is wasted dealing with this issue each year?
A. More than 6 million Americans have AD. AD is expected to cost the US $345 billion in 2023, and that number is expected to grow to over $1 trillion by 2050.
Q. What approach has prior research taken to address this issue? What methods or perspectives have been used to try and solve this problem?
A. Broadly, prior research has used a number of different input variables to predict AD development. These include features extracted from imaging and signal data, genetic data, and neuropsychological data. Prediction algorithms range from heavily parameterized and biologically motivated models to data-driven machine learning models. Relevant to my project, people have used hemodynamic measures in the past such as CBF and ATT as input variables to neural networks to predict AD development. 
Q. How did the outcomes of prior research contribute to an understanding of or solution to this issue?
A. Prior research has established that CBF and ATT are related to symptoms of AD, such as hippocampal volume, white matter signal abnormality volume, and scores on neuropsychological examinations such as NIH toolbox. There is a lot of evidence that correlations between hemodynamic measures and AD symptoms exist.
Q. Where do the outcomes of prior research fall short? What need still exists that has not been satisfied by prior research?
A. While strong correlations between hemodynamic measures and AD symptoms have been found, prior research has not successfully found a robust method for predicting AD development longitudinally, before clinical symptoms present. This is problematic because it means that we cannot detect AD early enough to prevent brain damage and cognitive decline.
Q. Give a short summary of your proposed project. What approach will you use that has not been used before, and why will this approach enable you to satisfy the unmet need when prior research couldn't?
A. I will use convolutional neural networks to detect late-onset Alzheimer's disease before cognitive symptoms become measurable. The input to the network will be cortical surface maps of cerebral blood flow and arterial transit time, calculated from arterial spin labeling MRI scans and FreeSurfer's cortical surface reconstruction. This research is the first to use vertex-wise hemodynamic data directly as input to a CNN rather than using summary measures as inputs to a more traditional neural network structure. This may be more effective at early diagnosis than existing methods because vertex-wise data is incredibly high-dimensional and difficult for humans to interpret. However, this complexity may be exactly what is needed to predict symptomatic decline.
Q. When you fill these holes in prior research, why will that be useful? How will it benefit the people suffering from the core issue?
A. Earlier diagnosis means better treatment which means better patient outcomes. This will greatly reduce financial and emotional burden on both patients and caregivers everywhere, as well as save the U.S. healthcare system billions of dollars.
Q. When you fill these holes in prior research, how will it move your field forward? How will the concepts, methods, technologies, treatments, services, or preventative interventions that drive this field be changed if the proposed aims are achieved?
A. This will be groundbreaking work on a new research strategy of using hemodynamic cortical surface maps as direct inputs to CNNs for prediction of clinical variables. If achievable, this will not only focus the field on a promising new research avenue, but also provide a framework for using this new methodology to study other pathologies.
Q. Why is NOW an opportune time for this research to be conducted? What new technological or conceptual advance has enabled this research such that it could not have been done in the past? Why is it pressing enough that it cannot wait until the future?
A. Recent advances in the field of data science such as intuitive deep learning software packages, cloud computing services, and conceptual work on neural network structure have democratized what used to be considered advanced techniques. As recently as 10 years ago, very few labs would have had the compute power or computer science knowledge to conduct research with advanced CNN techniques. In addition, ASL MRI is a relatively new technique. Multi-delay ASL allowing for the computation of ATT is even newer and still not widely available. There has never been a better time to perform this style of research.'''

full_prompt = f'''{{{{#system~}}}}
You are GrantGPT - A helpful AI that aids investigators in writing grant proposals for scientific research funded by the National Institutes of Health (NIH).
You are an expert at writing the Significance section of the Research Strategy portion of the grant proposal.

Here is an example of a Significance section:
{example_significance_section}

The investigator has responded to several questions about their research. You will be given the questions and answers. Use the information
contained in the answers to construct the significance section.

While constructing the significance section, you will follow the following guidelines:
{guidelines}
{{{{~/system}}}}

{{{{#user~}}}}
{{{{research_QnA}}}}
{{{{~/user}}}}

{{{{#assistant~}}}}
{{{{gen 'draft' max_tokens=1500}}}}
{{{{~/assistant}}}}'''

program = guidance(full_prompt)

In [14]:
print(full_prompt)

{{#system~}}
You are GrantGPT - A helpful AI that aids investigators in writing grant proposals for scientific research funded by the National Institutes of Health (NIH).
You are an expert at writing the Significance section of the Research Strategy portion of the grant proposal.

Here is an example of a Significance section:
Chronic infections caused by biofilms annually affect 17 million Americans, cause at least 550,000 American
deaths, and cost the US healthcare system billions of dollars [2-8]. Chronic wounds in particular can cost, per
patient, tens of thousands of dollars per year, and are prevented from healing because they are infected by
bacterial biofilms dominated by Pseudomonas aeruginosa [10]. Biofilm infection in chronic wounds afflicts both
diabetic and non-diabetic patients and can lead to amputation [14].
Biofilms resist antibiotics and evade the host immune defense [10, 15-18]. In biofilms, a heterogeneous
matrix of differentiated extracellular polymers (EPS) and pro
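
The printed prompt shows ordinary two-brace handlebars tags such as `{{#system~}}`, although the f-string source writes them with four braces. A minimal sketch of why, using a shortened template and a placeholder string (`example_text` here is a stand-in, not the real example section): inside an f-string, `{{` and `}}` are escapes for literal single braces, so four braces render as two, while single-brace fields like `{example_text}` are interpolated immediately by Python and the remaining `{{research_QnA}}` is left for the template engine to fill later.

```python
# Stand-in for the long example section used in the notebook (hypothetical text).
example_text = "Example body."

# Four braces in the f-string source become two literal braces in the result;
# single-brace fields are substituted by Python right away.
template = f'''{{{{#system~}}}}
{example_text}
{{{{~/system}}}}

{{{{#user~}}}}
{{{{research_QnA}}}}
{{{{~/user}}}}'''

print(template)
```

One consequence of this design is that any literal `{` or `}` inside the interpolated strings passes through untouched by Python, but a `{{...}}` sequence inside them would later be read as a template tag.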

In [15]:
executed_program = program(research_QnA=example_research_QnA)
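
Guideline 6 asks for a final product between 800 and 1200 words, and the TODO list mentions a length guideline, so a simple post-generation check may be useful. The following is only a sketch: `check_length` is a hypothetical helper, words are counted by whitespace splitting, and it assumes the generated text is available as a plain string (in the guidance version used here, presumably via `executed_program['draft']`).

```python
from typing import Tuple


def check_length(draft: str, min_words: int = 800, max_words: int = 1200) -> Tuple[bool, int]:
    """Return (is_within_bounds, word_count) for a draft.

    Hypothetical helper: counts whitespace-separated tokens as words,
    with default bounds taken from guideline 6.
    """
    word_count = len(draft.split())
    return min_words <= word_count <= max_words, word_count


# Assumption: the draft would normally come from the executed program,
# e.g. draft_text = executed_program['draft']. A dummy string is used here.
draft_text = "word " * 900
within_bounds, n_words = check_length(draft_text)
print(f"{n_words} words; within guideline 6 bounds: {within_bounds}")
```

A draft that fails the check could be regenerated or passed back to the model with a request to expand or trim it.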