### Prompting

In [3]:
from openai import OpenAI
import os

key = os.environ.get("OPENAI_API_KEY")
client = OpenAI(api_key=key)

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are an expert in Requirements Engineering. Your purpose is to organizing scientific data in an openly available and long-term way with respect to building, publishing, and evaluating an initial Knowledge Graph of empirical research in Requirement Engineering. To achieve this goal, you need to create an knowledge graph which enables sustainable literature reviews to synthesize a comprehensive, up-to-date, and long-term available overview of the state and evolution of empirical research in Requirement Engineering. To create such a knowledge graph, you first need to come up with some competency questions. An competency question is a natural language question that represents an information need related to the content of a knowledge graph and for which a knowledge graph must provide relevant information to anwser the question"},
    {"role": "user", "content": "Now you are developing an knowledge graph about the state and evolution of the empirical research in Requirements Engineering. Derive 77 competency questions."}
  ]
)
# Note: the sentence "Now you are given a set of documents about the state and evolution of this field." was deleted from the user prompt.
print(completion.choices[0].message.content)

1. What are the key topics in empirical research in Requirements Engineering?
2. Which are the most cited empirical research papers in Requirements Engineering?
3. What are the main challenges in conducting empirical research in Requirements Engineering?
4. Which research methods are commonly used in empirical research in Requirements Engineering?
5. What are the emerging trends in empirical research in Requirements Engineering?
6. How has the focus of empirical research in Requirements Engineering evolved over time?
7. What are the most influential conferences or journals in empirical research in Requirements Engineering?
8. What are the differences between qualitative and quantitative research in Requirements Engineering?
9. How do researchers in Requirements Engineering ensure the validity of their empirical studies?
10. What are the ethical considerations in conducting empirical research in Requirements Engineering?
11. How do researchers select appropriate research participants in
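
The cells in this section send the same system and user prompts and change only the model name. A minimal sketch of building the message list once is shown below. `build_messages` and its parameters are hypothetical names introduced here for illustration and are not part of the original notebook; the user prompt wording is kept verbatim from the cell above.

```python
def build_messages(system_prompt: str, n_questions: int) -> list[dict]:
    """Return the chat messages for a competency-question request (hypothetical helper)."""
    # Wording copied verbatim from the notebook so the request stays comparable.
    user_prompt = (
        "Now you are developing an knowledge graph about the state and evolution "
        "of the empirical research in Requirements Engineering. "
        f"Derive {n_questions} competency questions."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Shortened placeholder system prompt; the notebook uses the full text shown above.
messages = build_messages("You are an expert in Requirements Engineering.", 77)
print(messages[1]["content"])
```

The returned list could then be passed as `messages=` to `client.chat.completions.create` for each model being compared.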

In [4]:
completion = client.chat.completions.create(
  model="gpt-4-turbo",
  messages=[
    {"role": "system", "content": "You are an expert in Requirements Engineering. Your purpose is to organizing scientific data in an openly available and long-term way with respect to building, publishing, and evaluating an initial Knowledge Graph of empirical research in Requirement Engineering. To achieve this goal, you need to create an knowledge graph which enables sustainable literature reviews to synthesize a comprehensive, up-to-date, and long-term available overview of the state and evolution of empirical research in Requirement Engineering. To create such a knowledge graph, you first need to come up with some competency questions. An competency question is a natural language question that represents an information need related to the content of a knowledge graph and for which a knowledge graph must provide relevant information to anwser the question"},
    {"role": "user", "content": "Now you are developing an knowledge graph about the state and evolution of the empirical research in Requirements Engineering. Derive 77 competency questions."}
  ]
)
# Note: the sentence "Now you are given a set of documents about the state and evolution of this field." was deleted from the user prompt.
print(completion.choices[0].message.content)

Creating 77 competency questions to guide the development of a Knowledge Graph on the state and evolution of empirical research in Requirements Engineering is an extensive task that requires thoughtful consideration of the potential queries that researchers, practitioners, educators, and other stakeholders might have. Below are a range of competency questions grouped by different themes relevant to Requirements Engineering research:

### General Overview
1. What research has been published in Requirements Engineering since [year]?
2. Which journals publish the most on Requirements Engineering?
3. Who are the leading authors in Requirements Engineering research?
4. What are the most cited papers in Requirements Engineering?
5. What universities or institutions lead in Requirements Engineering research?

### Trends and Evolution
6. What are the emerging topics in Requirements Engineering?
7. How has the focus of Requirements Engineering research shifted over the last decade?
8. What rese

In [8]:
completion = client.chat.completions.create(
  model="gpt-4-turbo",
  messages=[
    {"role": "system", "content": "You are an expert in Requirements Engineering. Your purpose is to organizing scientific data in an openly available and long-term way with respect to building, publishing, and evaluating an initial Knowledge Graph of empirical research in Requirement Engineering. To achieve this goal, you need to create an knowledge graph which enables sustainable literature reviews to synthesize a comprehensive, up-to-date, and long-term available overview of the state and evolution of empirical research in Requirement Engineering. To create such a knowledge graph, you first need to come up with some competency questions. An competency question is a natural language question that represents an information need related to the content of a knowledge graph and for which a knowledge graph must provide relevant information to anwser the question"},
    {"role": "user", "content": "Now you are developing an knowledge graph about the state and evolution of the empirical research in Requirements Engineering. Derive 77 competency questions."}
  ]
)
# Note: the sentence "Now you are given a set of documents about the state and evolution of this field." was deleted from the user prompt.
print(completion.choices[0].message.content)

Creating 77 competency questions to cover the breadth and depth of empirical research in Requirements Engineering (RE) for a knowledge graph is ambitious but very impactful. Here are the competency questions sorted into various categories to cover diverse aspects of the domain:
 
### General Overview
1. What are the key theories in Requirements Engineering?
2. Who are the leading researchers in Requirements Engineering?
3. What institutions are prominent in Requirements Engineering research?
4. When was the earliest published research in Requirements Engineering?
5. How has the funding trend for Requirements Engineering research changed over time?
6. What are the most cited papers in Requirements Engineering?
7. Which countries contribute most to Requirements Engineering research?

### Methodology
8. What methodologies are most commonly used in Requirements Engineering research?
9. How have methodologies in Requirements Engineering evolved over the years?
10. What new methodologies hav
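
The model replies mix an introductory sentence, markdown headings, and numbered questions, so the questions have to be separated from the rest before they can be compared with the expert CQs. The notebook does not show how this was done; the following is one possible sketch, assuming each question sits on its own line in the form `N. question text`. `extract_questions` is a hypothetical helper name.

```python
import re


def extract_questions(text: str) -> list[str]:
    """Collect lines of the form '12. Question text' and drop the numbering."""
    pattern = re.compile(r"^\s*\d+\.\s+(.*\S)\s*$")
    questions = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            questions.append(match.group(1))
    return questions


# Sample built from lines of the reply shown above.
sample = """### General Overview
1. What are the key theories in Requirements Engineering?
2. Who are the leading researchers in Requirements Engineering?

### Methodology
8. What methodologies are most commonly used in Requirements Engineering research?"""

questions = extract_questions(sample)
for question in questions:
    print(question)
```

Headings and blank lines are skipped because they do not start with a number followed by a period.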

### Use Sentence Transformers to compare CQ similarity

In [5]:
# !pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer, util
import pandas as pd

model = SentenceTransformer("all-MiniLM-L6-v2")

data = pd.read_csv('/Users/sherry/python-coding/Prompting/requirement-engeneering/genCQs-expertCQs-no-reference.csv', names=['expertCQs', 'genCQs'], header=0)

genCQs = list(data['genCQs'])
expertCQs = list(data['expertCQs'])

genCQs_embeddings = model.encode(genCQs)
expertCQs_embeddings = model.encode(expertCQs)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(genCQs_embeddings, expertCQs_embeddings)
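
`util.cos_sim` returns a matrix whose entry `[i][j]` is the cosine similarity between the i-th generated CQ embedding and the j-th expert CQ embedding. As a rough illustration of what that number means, the sketch below computes the same quantity in plain Python on toy two-dimensional vectors. The vectors and the helper names `cosine_similarity` and `pairwise_cosine` are made up for this example and are not the notebook's data or the library's implementation.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosine(rows_a, rows_b):
    """Matrix with one row per vector in rows_a and one column per vector in rows_b."""
    return [[cosine_similarity(a, b) for b in rows_b] for a in rows_a]


# Toy stand-ins for the two embedding lists (illustrative values only).
toy_gen = [[1.0, 0.0], [1.0, 1.0]]
toy_expert = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

toy_sim = pairwise_cosine(toy_gen, toy_expert)
for row in toy_sim:
    print([round(value, 4) for value in row])
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.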



In [6]:
# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
# Iterate over the actual list lengths instead of a hard-coded 77
for genCQs_idx in range(len(genCQs)):
    for expertCQs_idx in range(len(expertCQs)):
        all_sentence_combinations.append([cos_sim[genCQs_idx][expertCQs_idx], genCQs_idx, expertCQs_idx])

# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, genCQs_idx, expertCQs_idx in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(genCQs[genCQs_idx], expertCQs[expertCQs_idx], score))

Top-5 most similar pairs:
What are the advantages of using systematic reviews in synthesizing empirical research in Requirements Engineering? 	 How has the proportions of  empirical methods to conduct (systematic literature) reviews, so-called secondary research, evolved over time? 	 0.6904
What are the advantages of using systematic reviews in synthesizing empirical research in Requirements Engineering? 	 What empirical methods are used to conduct integrative and interpretive (systematic literature) reviews, so-called secondary research? 	 0.6341
How do researchers in Requirements Engineering mitigate the influence of researcher bias in their studies? 	 How do the authors justify preventing bias in the study results? 	 0.6187
Which research methods are commonly used in empirical research in Requirements Engineering? 	 How often are which empirical methods used? 	 0.6155
How do researchers in Requirements Engineering address the issue of publication bias in their research? 	 How do the
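
In the output above, the first two rows share the same generated CQ, so a global ranking can be dominated by a few questions. A complementary view is to look up the best-matching expert CQ for each generated CQ. The sketch below shows this on a small hand-written matrix; the scores are invented for illustration, and `best_matches` is a hypothetical helper. With the notebook's data, a nested list such as `cos_sim.tolist()` could presumably be passed instead.

```python
def best_matches(sim_matrix):
    """For each row, return (row_index, best_column_index, best_score)."""
    results = []
    for row_idx, row in enumerate(sim_matrix):
        # Index of the highest score in this row.
        col_idx = max(range(len(row)), key=lambda j: row[j])
        results.append((row_idx, col_idx, row[col_idx]))
    return results


# Rows: generated CQs, columns: expert CQs (illustrative values only).
toy_sim = [
    [0.69, 0.63, 0.10],
    [0.20, 0.62, 0.61],
]

for gen_idx, expert_idx, score in best_matches(toy_sim):
    print(f"generated {gen_idx} -> expert {expert_idx} ({score:.2f})")
```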

### Save the cosine similarity scores and the corresponding CQ pairs to a CSV file

In [7]:
# Save each cosine similarity score and its CQ pair to a CSV file
genCQ_ls = []
expertCQ_ls = []
score_ls = []
for score, genCQs_idx, expertCQs_idx in all_sentence_combinations:
    genCQ_ls.append(genCQs[genCQs_idx])
    expertCQ_ls.append(expertCQs[expertCQs_idx])
    # score is a 0-dim tensor; store it as a string with four decimals
    score_ls.append(f"{score.item():.4f}")

cos_sim_df = pd.DataFrame({
    'genCQ': genCQ_ls,
    'expertCQ': expertCQ_ls,
    'cos_score': score_ls,
})

cos_sim_df.to_csv("/Users/sherry/python-coding/Prompting/requirement-engeneering/gen-expert-CQs-cos-no-reference.csv")