<span style="color:orange; font-weight:bold">Note: To answer questions based on text documents, we recommend the procedure in <a href="https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb">Question Answering using Embeddings</a>. Some of the code below may rely on <a href="https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints">deprecated API endpoints</a>.</span>

# Creating a synthetic Q&A dataset
I use ChatGPT to generate the questions and answers for this FAQ-style dataset.

The CSV file I created contains the introduction section of a journal paper titled "Long short-term memory recurrent neural network for modeling temporal patterns in long-term power forecasting for solar PV facilities: Case study of South Korea" (https://www.sciencedirect.com/science/article/pii/S095965261934346X).

## 1. Read in the data, and create a context
Create a context by concatenating the title, the heading, and the content of each section.

In [45]:
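import pandas as pd

# Load the paper sections (one row per part of the introduction)
df = pd.read_csv('paper.csv')
# Build the context from the title, heading and content of each section
df['context'] = df.title + "\n" + df.heading + "\n\n" + df.content
# Note: str.len() counts characters, so 'token' is a character count, not a model token count
df['token'] = df['context'].str.len()
df.head()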

Unnamed: 0,title,heading,content,context,token
0,Long short-term memory recurrent neural networ...,Introduction1,"Among other renewable energy sources (e.g., wi...",Long short-term memory recurrent neural networ...,2158
1,Long short-term memory recurrent neural networ...,Introduction2,"To address those issues, machine learningebase...",Long short-term memory recurrent neural networ...,2417
2,Long short-term memory recurrent neural networ...,Introduction3,Many of the previous studies listed in Table 1...,Long short-term memory recurrent neural networ...,1722
3,Long short-term memory recurrent neural networ...,Introduction4,"Although short-term forecasting (e.g., 1 h or ...",Long short-term memory recurrent neural networ...,1700
4,Long short-term memory recurrent neural networ...,Introduction5,"Thererfore, the purpose of this study is to pr...",Long short-term memory recurrent neural networ...,1575
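
The `token` column above is computed with `str.len()`, so it measures characters. As a rough sketch, assuming the common rule of thumb of about four characters per token for English text (a heuristic, not a real tokenizer), a token estimate could be derived as follows. The names `estimate_tokens` and `chars_per_token` are hypothetical and not part of the original notebook.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Roughly estimate the token count of `text` from its character count.

    Assumption: about `chars_per_token` characters per token, which is only
    a rule of thumb for English text and will differ from a real tokenizer.
    """
    return math.ceil(len(text) / chars_per_token)


# Example: the first context is 2158 characters long
approx = estimate_tokens("x" * 2158)
```

For the first row (2158 characters) this gives about 540 tokens. Exact counts would require the model's actual tokenizer.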


## 2. Create questions based on the context
Use gpt-3.5-turbo to generate a number of plausible questions about the contents of each part of the paper's introduction section.

In [46]:
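import openai  # this cell uses the pre-1.0 openai package interface (openai.ChatCompletion)

openai.api_key = 'your_key'  # replace with your own API key


def get_questions_chatGPT(context):
    # The prompt ends with "1." so the model continues directly with the first question
    prompt = f"Write questions based on the text below\n\nText: {context}\n\nQuestions:\n1."

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0]["message"]["content"]


# One API call per row; the result is a single numbered string of questions
df['questions'] = df.context.apply(get_questions_chatGPT)
# Put back the leading "1." that was supplied as part of the prompt
df['questions'] = "1." + df.questions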
print(df[['questions']].values[0][0])

1.What is PV solar energy and why is it important in renewable energy sources?

2. Why is accurately predicting the potential PV power available at candidate sites critical to the success of solar PV projects?

3. Why is long-term forecasting of PV power important in balancing electricity supply and demand, improving energy performance, and financial planning?

4. What are the challenges in estimating the potential of power generation at new candidate sites?

5. Why is the use of solar irradiation data at high temporal resolutions important when designing PV systems for smart grids?

6. What was pointed out in previous studies about the verification of potential energy generation with actual PV power data?
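
Each cell of the `questions` column holds one numbered string like the output above. If individual questions are needed, a minimal sketch for splitting such a string is shown here. It assumes every item starts on its own line with a number followed by a period, which matches the output above but is not guaranteed for every model response. `split_numbered` is a hypothetical helper, not part of the original notebook.

```python
import re


def split_numbered(text):
    """Split a string like '1.Foo\\n\\n2. Bar' into ['Foo', 'Bar'].

    Assumption: each item begins at the start of a line with digits and a period.
    """
    # Split wherever a line starts with "<number>." (optionally followed by spaces)
    parts = re.split(r"^\s*\d+\.\s*", text, flags=re.MULTILINE)
    # Drop the empty piece before the first marker and trim whitespace
    return [p.strip() for p in parts if p.strip()]


sample = (
    "1.What is PV solar energy and why is it important in renewable energy sources?\n\n"
    "2. What are the challenges in estimating the potential of power generation "
    "at new candidate sites?"
)
questions = split_numbered(sample)
```

A line that happens to begin with a number and a period inside a question would be split incorrectly, so the result is worth spot-checking.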


In [47]:
df

Unnamed: 0,title,heading,content,context,token,questions
0,Long short-term memory recurrent neural networ...,Introduction1,"Among other renewable energy sources (e.g., wi...",Long short-term memory recurrent neural networ...,2158,1.What is PV solar energy and why is it import...
1,Long short-term memory recurrent neural networ...,Introduction2,"To address those issues, machine learningebase...",Long short-term memory recurrent neural networ...,2417,1.What approach has been adopted to address th...
2,Long short-term memory recurrent neural networ...,Introduction3,Many of the previous studies listed in Table 1...,Long short-term memory recurrent neural networ...,1722,1.What was the focus of previous studies liste...
3,Long short-term memory recurrent neural networ...,Introduction4,"Although short-term forecasting (e.g., 1 h or ...",Long short-term memory recurrent neural networ...,1700,1.Why is long-term forecasting needed in asses...
4,Long short-term memory recurrent neural networ...,Introduction5,"Thererfore, the purpose of this study is to pr...",Long short-term memory recurrent neural networ...,1575,1.What is the purpose of the study?\n2. What t...
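
The question-generation cell above uses `openai.ChatCompletion`, which was removed in version 1.0 of the `openai` Python package. As a hedged sketch, the same request with the newer client interface might look like the following. The function name `get_questions` and the choice to pass the client in as a parameter are assumptions made for illustration; the prompt is the same as in the original cell.

```python
def get_questions(client, context, model="gpt-3.5-turbo"):
    """Generate numbered questions for `context` with an openai>=1.0 style client.

    `client` is expected to expose `chat.completions.create`, as the
    `openai.OpenAI` client does.
    """
    prompt = f"Write questions based on the text below\n\nText: {context}\n\nQuestions:\n1."
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # In openai>=1.0 the response is an object, so use attribute access
    return "1." + completion.choices[0].message.content


# Possible usage (requires the openai package and an API key):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# df["questions"] = df.context.apply(lambda c: get_questions(client, c))
```

Passing the client in keeps the function easy to exercise with a stand-in object, without making real API calls.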


In [49]:
print(df.content.values[0])

Among other renewable energy sources (e.g., wind, tides, geothermal heat), photovoltaic (PV) solar energy is one of the most promising renewable energies available all over the world (International Energy Agency, 2018). However, solar energy generation is affected by geographical location, and thus accurately predicting the potential PV power available at candidate sites is critical to the success of solar PV projects (International Finance Corporation (IFC), 2019). For example, estimated power generation commonly serves as a crucial input to assess the feasibility of PV projects and select a suitable installation location for PV panels (Liu et al., 2017). Long-term forecasting of PV power is also important in balancing electricity supply and demand, improving energy performance (Lin and Pai, 2016), and financial planning (International Finance Corporation, 2019). However, estimating the potential for solar PV power generation is challenging because of topographical and meteorological 

## 3. Create answers based on the context
Use gpt-3.5-turbo to answer the questions given the relevant part of the paper's introduction section.

In [50]:
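def get_answers_chatGPT(row):
    # Provide both the context and its numbered questions; the prompt ends with "1."
    # so the model continues directly with the first answer
    prompt = f"Write answer based on the text below\n\nText: {row.context}\n\nQuestions:\n{row.questions}\n\nAnswers:\n1."

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0]["message"]["content"]


df['answers'] = df.apply(get_answers_chatGPT, axis=1)
# Put back the leading "1." that was supplied as part of the prompt
df['answers'] = "1." + df.answers
# Drop rows with missing values and renumber the index
df = df.dropna().reset_index(drop=True)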
print(df[['answers']].values[0][0])

1.PV solar energy is a renewable energy source that is considered as one of the most promising sources of renewable energy worldwide. It is important because it can help reduce dependence on fossil fuels and mitigate the negative effects of climate change on the environment.

2. Accurately predicting the potential PV power available at candidate sites is critical to the success of solar PV projects because it serves as a crucial input to assess the feasibility of PV projects and select a suitable installation location for PV panels.

3. Long-term forecasting of PV power is important in balancing electricity supply and demand, improving energy performance, and financial planning. It helps ensure that energy supply meets demand and helps optimize the performance of PV facilities.

4. The challenges in estimating the potential of power generation at new candidate sites include topographical and meteorological conditions that differ from region to region and vary over time. The amount of p

In [51]:
df

Unnamed: 0,title,heading,content,context,token,questions,answers
0,Long short-term memory recurrent neural networ...,Introduction1,"Among other renewable energy sources (e.g., wi...",Long short-term memory recurrent neural networ...,2158,1.What is PV solar energy and why is it import...,1.PV solar energy is a renewable energy source...
1,Long short-term memory recurrent neural networ...,Introduction2,"To address those issues, machine learningebase...",Long short-term memory recurrent neural networ...,2417,1.What approach has been adopted to address th...,1.Machine learning-based approaches have been ...
2,Long short-term memory recurrent neural networ...,Introduction3,Many of the previous studies listed in Table 1...,Long short-term memory recurrent neural networ...,1722,1.What was the focus of previous studies liste...,1.Generating power predictions for a single PV...
3,Long short-term memory recurrent neural networ...,Introduction4,"Although short-term forecasting (e.g., 1 h or ...",Long short-term memory recurrent neural networ...,1700,1.Why is long-term forecasting needed in asses...,1.Long-term forecasting is needed in assessing...
4,Long short-term memory recurrent neural networ...,Introduction5,"Thererfore, the purpose of this study is to pr...",Long short-term memory recurrent neural networ...,1575,1.What is the purpose of the study?\n2. What t...,1.The purpose of the study is to propose and e...
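
At this point each row holds one numbered string of questions and one numbered string of answers. To obtain individual question-answer pairs, the two strings can be aligned by their item numbers. The following is a minimal sketch under the assumption that every item starts on its own line with a number and a period; `numbered_items` and `pair_by_number` are hypothetical helpers, not part of the original notebook.

```python
import re


def numbered_items(text):
    """Map item numbers to item text, e.g. '1.Foo\\n2. Bar' -> {1: 'Foo', 2: 'Bar'}."""
    # With a capturing group, re.split returns: ['', '1', 'Foo\n', '2', 'Bar']
    parts = re.split(r"^\s*(\d+)\.\s*", text, flags=re.MULTILINE)
    numbers = parts[1::2]
    bodies = parts[2::2]
    return {int(n): body.strip() for n, body in zip(numbers, bodies)}


def pair_by_number(questions, answers):
    """Return (question, answer) tuples for item numbers present in both strings."""
    q = numbered_items(questions)
    a = numbered_items(answers)
    # Questions without a matching answer number are skipped
    return [(q[n], a[n]) for n in sorted(q) if n in a]
```

Applied row by row, for example with `df.apply(lambda r: pair_by_number(r.questions, r.answers), axis=1)`, this would give a list of pairs per section. Aligning by number rather than by position means a missing answer does not shift the remaining pairs.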


## 4. Save the Q&A dataset based on the paper introduction section

In [52]:
df.to_csv('paper_qa.csv', index=False)
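
The `questions` and `answers` cells contain embedded newlines, so the saved file relies on CSV quoting to keep each cell intact. The sketch below illustrates that round trip with the standard-library `csv` module and an in-memory buffer. The row contents are made-up placeholders, and the snippet is only an illustration of the quoting behaviour, not the notebook's own save step.

```python
import csv
import io

# Made-up example row with multi-line cells, similar in shape to the dataset
rows = [
    {
        "heading": "Introduction5",
        "questions": "1.What is the purpose of the study?\n2. What method is proposed?",
        "answers": "1.Placeholder answer one.\n2. Placeholder answer two.",
    }
]

# Write to an in-memory buffer; newline="" leaves line endings to the csv module
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["heading", "questions", "answers"])
writer.writeheader()
writer.writerows(rows)

# Read the data back and confirm the multi-line cells survived unchanged
buffer.seek(0)
restored = list(csv.DictReader(buffer))
round_trip_ok = restored == rows
```

A similar check on the real file, reading `paper_qa.csv` back and comparing a few cells with the DataFrame, is a cheap way to confirm nothing was lost on save.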