## Publication highlight generator

In [None]:
import glob
import os

import docx2txt
import openai
import pandas as pd
import PyPDF2
from tqdm import tqdm

# assumes your OpenAI API key is set in the OPENAI_API_KEY environment variable
openai.api_key = os.getenv("OPENAI_API_KEY")


### Example titles and abstracts to use as few-shot examples

In [None]:
# sample text from the publication
example_text_one = """Title:
Multisector Dynamics: Advancing the Science of Complex Adaptive Human-Earth Systems

Abstract:
The field of MultiSector Dynamics (MSD) explores the dynamics and co-evolutionary pathways of human and Earth systems with a focus on critical goods, services, and amenities delivered to people through interdependent sectors. This commentary lays out core definitions and concepts, identifies MSD science questions in the context of the current state of knowledge, and describes ongoing activities to expand capacities for open science, leverage revolutions in data and computing, and grow and diversify the MSD workforce. Central to our vision is the ambition of advancing the next generation of complex adaptive human-Earth systems science to better address interconnected risks, increase resilience, and improve sustainability. This will require convergent research and the integration of ideas and methods from multiple disciplines. Understanding the tradeoffs, synergies, and complexities that exist in coupled human-Earth systems is particularly important in the context of energy transitions and increased future shocks.

"""

example_text_two = """Title:
The Role of Regional Connections in Planning for Future Power System Operations Under Climate Extremes

Abstract:
Identifying the sensitivity of future power systems to climate extremes must consider the concurrent effects of changing climate and evolving power systems. We investigated the sensitivity of a Western U.S. power system to isolated and combined heat and drought when it has low (5%) and moderate (31%) variable renewable energy shares, representing historic and future systems. We used an electricity operational model combined with a model of historically extreme drought (for hydropower and freshwater-reliant thermoelectric generators) over the Western U.S. and a synthetic, regionally extreme heat event in Southern California (for thermoelectric generators and electricity load). We found that the drought has the highest impact on summertime production cost (+10% to +12%), while temperature-based deratings have minimal effect (at most +1%). The Southern California heat wave scenario impacting load increases summertime regional net imports to Southern California by 10–14%, while the drought decreases them by 6–12%. Combined heat and drought conditions have a moderate effect on imports to Southern California (−2%) in the historic system and a stronger effect (+8%) in the future system. Southern California dependence on other regions decreases in the summertime with the moderate increase in variable renewable energy (−34% imports), but hourly peak regional imports are maintained under those infrastructure changes. By combining synthetic and historically driven conditions to test two infrastructures, we consolidate the importance of considering compounded heat wave and drought in planning studies and suggest that region-to-region energy transfers during peak periods are key to optimal operations under climate extremes.
"""
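
Both examples follow the same `Title:` / `Abstract:` layout. As a minimal sketch, a new publication could be put into that layout with a small helper. `format_publication` is a hypothetical name introduced here for illustration and is not part of the original notebook.

```python
def format_publication(title, abstract):
    """Lay out a title and abstract in the same form as the examples above.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    return f"Title:\n{title.strip()}\n\nAbstract:\n{abstract.strip()}\n"


# usage with placeholder strings
sample = format_publication("  A Short Title ", "One sentence of abstract.\n")
print(sample)
```

Stripping whitespace first keeps stray newlines from copied text out of the prompt.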


### Prompt engineering

In [36]:
# setting the system scope
system_scope = """You are a technical science editor.  You are constructing high impact highlight content from recent publications."""


#### For Word document content

In [37]:
# title generation
title_prompt = """
- Generate a title for the highlight that should pique the interest of the reader while also being somewhat descriptive. 
- Strictly clever titles do not do as well.  
- Output as one short sentence. 

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE: Setting the Stage for the Future of MultiSector Dynamics Research
##
PROMPT: {1}
RESPONSE: Planning Future Power Systems Under Climate Extremes

###
PROMPT: {2}

"""

# subtitle generation
subtitle_prompt = """
- Provide a short subtitle.
- Use no more than 155 characters with spaces.
- The goal for the subtitle is to provide further information that will encourage people to read more.  
- Do not produce sentences using colons. 

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE: This commentary defines key terms and concepts for the field of MultiSector Dynamics and identifies important science questions driving the field forward.
##
PROMPT: {1}
RESPONSE: The combined impact of heat waves and drought on the Western U.S. significantly affects energy production costs and interregional transfers during peak hours.

### 
PROMPT: {2}

"""

# the science section
science_prompt = """
- Describe the scientific results for a non-expert, non-scientist audience.
- Use 75 to 100 words. 
- The paragraph should be understandable to a high school senior or college freshman. 
- Use short sentences and short words. 
- Avoid technical terms if possible; if necessary, define them. 
- Provide the necessary context so someone can have a very basic understanding of what you did. 
- Start with things the reader already knows and move on to more complex ideas. 

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE: MultiSector Dynamics (MSD) is a scientific field that studies the co-evolution of human and Earth systems. Example research areas include sustainability, climate change risks, and energy system transitions. In this commentary we provide definitions for core concepts and themes in the field. We also describe important science questions, ongoing activities, and provide a vision for the field moving forward. A key part of the future vision is the goal to facilitate a diverse, transdisciplinary workforce and to leverage open science to tackle MSD problems.
##
PROMPT: {1}
RESPONSE: This study investigates the importance of the interactions between climate change and energy system transitions for power system planning and operations. The research examines the individual and combined impacts of a Southern California heat wave and a Western U.S. drought on the historical (5% renewables) and a projected future Western U.S. power system (31% renewables). The key findings are that drought has a higher impact on energy production costs than the heat wave, the cost increases for the combined events are similar to the drought scenario alone, and interregional transfers during peak demand hours are complex and highly sensitive to extreme events and the generation mix.

### 
PROMPT: {2}

"""

# the impact section
impact_prompt = """
- Describe the impact of the research to a non-expert, non-scientist audience.
- Use 75-100 words. 
- The impact of use-inspired science is typically a potential technological advance while the impact of discovery research might be to open up new frontiers of science or resolve a long-standing question. 
- The paragraph should be understandable to a high school senior or college freshman. 
- Use short sentences and short words. 
- Avoid technical terms if possible; if necessary, define them. 
- Include fields impacted such as energy generation, quantum computing, disease diagnostics, etc. 

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE: This is the first paper that comprehensively describes the field of MultiSector Dynamics, carefully laying out key terms and concepts for the community. The survey of existing research in the field is a useful marker for where we are and where we want and need to go. This paper will help grow the MSD community by making it easier for new researchers to assimilate into the field.
##
PROMPT: {1}
RESPONSE: This research highlights the importance of considering the compounded effects of extreme heat and drought in planning studies for evolving power systems. The study emphasizes the need for interregional coordination to respond to extreme events such as heat and drought and identifies supply-side water stress and demand-side temperature stress as key variables that need to be considered. The findings of the study can help power system planners focus on scenario development, model complexity, and computational power for improving the operational efficiency of power systems under extreme weather conditions. The limitations of the study highlight the need for further research to address the challenges faced by power systems due to extreme weather conditions.

### 
PROMPT: {2}

"""

# the summary section
summary_prompt = """
- Write a paragraph or two with additional details of the work. 
- Use no more than 200 words.
- It should be still accessible to the non-specialist but may be more technical if necessary. 
- As a point of style, we usually do not mention the name of the institution. 
- If there is a DOE Office of Science user facility involved, such as NERSC, you can mention the user facility. 

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE:  The field of MultiSector Dynamics (MSD) explores the dynamics and co-evolutionary pathways of human and Earth systems with a focus on critical goods, services, and amenities delivered to people through interdependent sectors. This commentary lays out core definitions and concepts, identifies MSD science questions in the context of the current state of knowledge, and describes ongoing activities to expand capacities for open science, leverage revolutions in data and computing, and grow and diversify the MSD workforce. Central to our vision is the ambition of advancing the next generation of complex adaptive human-Earth systems science to better address interconnected risks, increase resilience, and improve sustainability. This will require convergent research and the integration of ideas and methods from multiple disciplines. Understanding the tradeoffs, synergies, and complexities that exist in coupled human-Earth systems is particularly important in the context of energy transitions and increased future shocks.
##
PROMPT: {1}
RESPONSE: This study evaluates the sensitivity of the Western U.S. power grid to the impact of a single drought scenario and a single Southern California heat wave scenario on generation and load, simulating two levels of variable renewable generation shares in each scenario (5% and 31%). The findings suggest that the Western U.S. responds to drought by using additional natural gas generation and to Southern California heat by leveraging the system's interconnectedness. During peak times, regional transfers are just as high in the moderate variable renewable energy (VRE) system as in the low VRE system simulated, highlighting the need for hourly interregional transfers and related transmission expansion, energy storage, or market flexibility solutions. The study demonstrates the importance of modeling water-based grid stress and extreme electricity demand scenarios, and the need to use power system models that represent regional grid interactions to design and evaluate infrastructure under extreme events. These findings are crucial in developing a risk-based approach to planning for extreme events and improving the operational efficiency of power systems under extreme weather conditions. The study provides a valuable toolset that can be expanded to incorporate other grid stressors in high-resolution power system models.

### 
PROMPT: {2}

"""
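
The prompts above ask for length limits (for example, no more than 155 characters for the subtitle and 75 to 100 words for the science paragraph), but the model is not guaranteed to respect them. The following is a minimal sketch of a check that could be run on any generated section. `within_limits` is a hypothetical helper introduced here, and counting words by splitting on whitespace is an assumption about how the limits should be measured.

```python
def within_limits(content, max_chars=None, min_words=None, max_words=None):
    """Return True when `content` satisfies every limit that was supplied.

    Hypothetical helper: words are counted by splitting on whitespace.
    """
    n_chars = len(content)
    n_words = len(content.split())

    if max_chars is not None and n_chars > max_chars:
        return False
    if min_words is not None and n_words < min_words:
        return False
    if max_words is not None and n_words > max_words:
        return False
    return True


# usage with placeholder strings
print(within_limits("A short subtitle.", max_chars=155))
print(within_limits("too few words", min_words=75, max_words=100))
```

A section that fails the check can be regenerated or trimmed by hand before it goes into the template.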


#### For PowerPoint content

In [38]:
ppt_objective_prompt = """
- State the core purpose of the study in one sentence.

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE: This commentary defines key terms and concepts for the field of MultiSector Dynamics and identifies important science questions driving the field forward.
##
PROMPT: {1}
RESPONSE: This study investigates the effects of temperature and drought extremes on the Western U.S. power grid, while taking into account the increasing penetration of variable renewable energy sources, using a high-resolution operational power system model.

### 
PROMPT: {2}

"""

ppt_approach_prompt = """
- Clearly and concisely state in 2-3 points how this work accomplished the stated objective.

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE: 
- MultiSector Dynamics (MSD) is a scientific field that studies the co-evolution of human and Earth systems. Example research areas include sustainability, climate change, and energy transitions.
- In this commentary we provide definitions for core concepts and themes in the field. We also describe important science questions, ongoing activities, and provide a vision for the field moving forward. 
- A key part of the future vision is the goal to facilitate a diverse, transdisciplinary workforce and to leverage open science to tackle MSD problems.
##
PROMPT: {1}
RESPONSE:
- Evaluated contemporary and hypothesized Western U.S. infrastructures with 5% and 31% variable renewable generation shares for sensitivity to drought and Southern California heat wave scenarios on generation and load.
- Used a stochastic temperature simulation combined with spatially resolved historical drought as a toolset to incorporate other grid stressors in high-resolution power system models, leading to improved sensitivity analyses not limited by the current ability of climate models to capture extreme conditions.

### 
PROMPT: {2}

"""

ppt_impact_prompt = """
- Clearly and concisely state in 2-3 points the critical results and outcomes from this research.

The following are example prompts with appropriate responses:

PROMPT: {0}
RESPONSE:
- This is the first paper that comprehensively describes the field of MultiSector Dynamics, carefully laying out key terms and concepts for the community.
- The survey of existing research in the field is a useful marker for where we are and where we want and need to go.
- This paper will help grow the MSD community by making it easier for new researchers to assimilate into the field.
##
PROMPT: {1}
RESPONSE:
- Highlights the need to consider regional connectedness in planning for extreme events and that this need may persist with increasing penetration of variable renewable energy resources.
- Emphasizes the importance of modeling water-based grid stress and extreme electricity demand scenarios with high-resolution power system models for a risk-based approach to planning for extreme events.

### 
PROMPT: {2}

"""


### Content for the highlight, usually the title and abstract of the publication

In [None]:
text = """Title:
Addressing Uncertainty in MultiSector Dynamics Research

Abstract:
This online book is meant to provide an open science “living” resource on uncertainty characterization methods for the MultiSector Dynamics (MSD) community and other technical communities confronting sustainability, climate, and energy transition challenges. The last decade has seen rapid growth in science efforts seeking to address the interconnected nature of these challenges across scales, sectors, and systems. Accompanying these advances is the growing realization that the deep integration of research from many disciplinary fields is non-trivial and raises important questions. How and why models are developed seems to have an obvious answer (“to gain understanding”). But what does it actually mean to gain understanding? What if a small change in a model or its data fundamentally changes our perceptions of what we thought we understood? What controls the outcomes of our model(s)? How do we understand the implications of model coupling, such as when one model is on the receiving end of several other models that are considered “input data”?
The often quoted “All models are wrong, but some are useful.” (George Box) is a bit of a conflation trap, often used to excuse known weaknesses in complex models as just an unavoidable outcome of being a modeler. In fact, the quote actually refers to a specific class of small-scale statistical models within an application context that assures a much higher degree of understanding and data quality control than is typical for the coupled human-natural systems applications in the MSD area. Moreover, Box was actually warning readers to avoid overparameterization and emphasizing the need to better understand what underlying factors cause your model to be wrong [1].
So, in short, there is a tension when attaining better performance by means of increasing the complexity of a model or model-based workflow. Box highlights that a modeler requires a clear diagnostic understanding of this performance-complexity tradeoff. If we move from small-scale models simulating readily-observed phenomena to the MSD context, things get quite a bit more complicated. How can we provide robust insights for unseen futures that emerge across a myriad of human and natural systems? Sometimes even asking, “what is a model?” or “what is data?” is complicated (e.g., data assimilated weather products, satellite-based signals translated through retrieval algorithms, demographic changes, resource demands, etc.). This MSD guidance text seeks to help readers navigate these challenges. It is meant to serve as an evolving resource that helps the MSD community learn how to better address uncertainty while working with complex chains of models bridging sectors, scales, and systems. It is not intended to be an exhaustive resource, but instead should be seen as a guided tour through state-of-the-science methods in uncertainty characterization, including global sensitivity analysis and exploratory modeling, to provide insights into complex human-natural systems interactions.
To aid readers in navigating the text, the key goals for each chapter are summarized below.
Chapter 1 uses the Integrated Multisector Multiscale Modeling project as a living lab to encapsulate the challenges that emerge in bridging disciplines to make consequential model-based insights while acknowledging the tremendous array of uncertainties that shape them.
Chapter 2 helps the reader to better understand the importance of using diagnostic modeling to interrogate why uncertain model behaviors may emerge. The chapter also aids readers to better understand the diverse disciplinary perspectives that exist on how best to pursue consequential model-based discoveries.
Chapter 3 is a technical tools-focused primer for readers on the key elements of uncertainty characterization that includes ensemble-based design of experiments, quantitative methods for computing global sensitivities, and a summary of existing software packages.
Chapter 4 narrates for readers how and why the tools from the previous chapter can be applied in a range of tasks from diagnosing model performance to formal exploratory modeling methods for making consequential model-based discoveries.
The supplemental appendices provided in the text are also important resources for readers. They provide a glossary to help bridge terminology challenges, a brief summary of uncertainty quantification tools for more advanced readers, and a suite of Jupyter notebook tutorials that provide hands-on training tied to the contents of Chapter 3 and Chapter 4.
This text was written with a number of different audiences in mind.
Technical experts in uncertainty may find this to be a helpful and unique resource bridging a number of perspectives that have not been combined in prior books (e.g., formal model diagnostics, global sensitivity analysis, and exploratory modeling under deep uncertainty).
Readers from different sector-specific and disciplinary-specific backgrounds can use this text to better understand potential differences and opportunities in how to make model-based insights.
Academic or junior researchers can utilize this freely available text for training and teaching resources that include hands-on coding experiences.
This text itself represents our strong commitment to open science and will evolve as a living resource as the communities of researchers provide feedback, innovations, and future tools."""


### Generate content sections for the Word document

In [None]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": title_prompt.format(example_text_one, example_text_two, text)}]


response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.7,
    messages=messages)

title_content = response["choices"][0]["message"]["content"]

title_content
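
Every generation cell in this notebook builds the same two-message list, calls the chat endpoint, and reads the first choice. One way to avoid repeating that pattern is sketched below. `build_messages` and `generate_section` are hypothetical helpers introduced here. The `complete` argument is assumed to be `openai.ChatCompletion.create`, the call this notebook already uses. It is passed in as a parameter so the helper can be exercised without network access.

```python
def build_messages(system_scope, template, example_one, example_two, text):
    """Fill the {0}, {1}, {2} slots of a prompt template and wrap it as chat messages."""
    return [
        {"role": "system", "content": system_scope},
        {"role": "user", "content": template.format(example_one, example_two, text)},
    ]


def generate_section(complete, system_scope, template, example_one, example_two,
                     text, temperature=0.5, model="gpt-4", max_tokens=3000):
    """Call `complete` with the filled prompt and return the first choice's content.

    `complete` is assumed to accept the same keyword arguments and return the
    same response shape as the calls used elsewhere in this notebook.
    """
    messages = build_messages(system_scope, template, example_one, example_two, text)
    response = complete(model=model,
                        max_tokens=max_tokens,
                        temperature=temperature,
                        messages=messages)
    return response["choices"][0]["message"]["content"]


# usage with a stand-in for the API call, so the sketch runs offline
def fake_complete(**kwargs):
    return {"choices": [{"message": {"content": "A generated title"}}]}


print(generate_section(fake_complete, "scope", "{0} | {1} | {2}", "a", "b", "c"))
```

In the notebook itself, the title cell would then reduce to a single call such as `generate_section(openai.ChatCompletion.create, system_scope, title_prompt, example_text_one, example_text_two, text, temperature=0.7)`.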


In [None]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": subtitle_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

subtitle_content = response["choices"][0]["message"]["content"]

subtitle_content


In [None]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": science_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

science_content = response["choices"][0]["message"]["content"]

science_content


In [None]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": impact_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

impact_content = response["choices"][0]["message"]["content"]

impact_content


In [None]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": summary_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

summary_content = response["choices"][0]["message"]["content"].replace("\n\n", " ")

summary_content


### Generate content for the PowerPoint template

In [39]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": ppt_objective_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

ppt_objective_content = response["choices"][0]["message"]["content"].replace("\n\n", " ")

ppt_objective_content


'RESPONSE: This online book serves as a living resource to guide the MultiSector Dynamics community in addressing uncertainty and navigating the challenges associated with complex chains of models bridging sectors, scales, and systems, while providing insights into human-natural systems interactions.'

In [40]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": ppt_approach_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

ppt_approach_content = response["choices"][0]["message"]["content"].replace("\n\n", " ")

ppt_approach_content


'RESPONSE:\n- This online book serves as a living resource for the MultiSector Dynamics (MSD) community, addressing uncertainty characterization methods and providing guidance on navigating challenges in complex model-based workflows.\n- The book is divided into four chapters, covering the challenges of bridging disciplines, the importance of diagnostic modeling, a technical primer on uncertainty characterization, and the application of these tools in various tasks. The supplemental appendices include a glossary, a summary of uncertainty quantification tools, and hands-on Jupyter notebook tutorials.'

In [41]:
messages=[{"role": "system",
           "content": system_scope},
          {"role": "user",
           "content": ppt_impact_prompt.format(example_text_one, example_text_two, text)}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    max_tokens=3000,
    temperature=0.5,
    messages=messages)

ppt_impact_content = response["choices"][0]["message"]["content"].replace("\n\n", " ")

ppt_impact_content


'RESPONSE:\n- The online book serves as a comprehensive and evolving resource for understanding and addressing uncertainty in MultiSector Dynamics research, including global sensitivity analysis and exploratory modeling.\n- The text is designed for various audiences, including technical experts, researchers from different backgrounds, and academics, and includes hands-on coding experiences and training resources to facilitate learning and application.'
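
The outputs above show the model echoing the `RESPONSE:` label from the few-shot examples and returning bullet points as a single string. Before the text is placed into a slide, it could be cleaned up along the following lines. This is a minimal sketch: `strip_response_label` and `split_bullets` are hypothetical helpers introduced here, and they assume each bullet sits on its own line and starts with `- `, as in the outputs shown.

```python
def strip_response_label(content, label="RESPONSE:"):
    """Remove a leading label such as 'RESPONSE:' and surrounding whitespace."""
    cleaned = content.strip()
    if cleaned.startswith(label):
        cleaned = cleaned[len(label):].strip()
    return cleaned


def split_bullets(content):
    """Return the bullet points of a response as a list of plain strings."""
    bullets = []
    for line in strip_response_label(content).splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between bullets
        if line.startswith("- "):
            line = line[2:].strip()  # drop the bullet marker only
        bullets.append(line)
    return bullets


# usage with placeholder strings
print(strip_response_label("RESPONSE: One sentence."))
print(split_bullets("RESPONSE:\n- First point.\n- Second point."))
```

Only the `- ` marker is removed, so a bullet whose text itself begins with a minus sign keeps it.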