
Repository for code and data associated with the CAM framework paper on analogy generation and evaluation via InstructGPT (Davinci).

Data

PLM-based Analogy Generation

/data/non_adapt: Candidate Analogies for Cybersecurity (cyber), Machine Learning (ai), and High-school science (sci) domains
/data/non_analogies: Non-analogies used as negative samples for training the Analoginess Scorer
/data/sci_src: Additional analogies for the High-school science domain (generated with the source concept as part of the prompt), used as positive samples for training the Analoginess Scorer

For all file names, pn denotes analogies generated with prompt ID n in the paper, ht denotes high temperature, and lt denotes low temperature.
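
The naming convention can be decoded programmatically. The sketch below is a hypothetical helper, not part of this repository. It assumes the pn, ht, and lt tokens appear in a file name separated by underscores or other non-alphanumeric characters. The example name `cyber_p3_ht.txt` is invented for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def parse_file_name(path: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (prompt_id, temperature) encoded in a data file name.

    Assumption: tokens such as "p3" and "ht" are separated by
    non-alphanumeric characters (e.g. the hypothetical "cyber_p3_ht.txt").
    Missing tokens are reported as None.
    """
    tokens = re.split(r"[^A-Za-z0-9]+", Path(path).stem)
    prompt_id, temperature = None, None
    for tok in tokens:
        match = re.fullmatch(r"p(\d+)", tok)
        if match:
            prompt_id = int(match.group(1))  # pn -> prompt ID n
        elif tok in ("ht", "lt"):
            temperature = "high" if tok == "ht" else "low"
    return prompt_id, temperature


print(parse_file_name("cyber_p3_ht.txt"))  # (3, 'high')
```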

Each file contains the following tab-separated fields: (1) Generated Analogy, (2) Target Concept, (3) Prompt.
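
A minimal sketch for loading these files with Python's standard `csv` module. It assumes one record per line, no header row, UTF-8 encoding, and no tabs or newlines embedded inside a field. The names `GeneratedAnalogy`, `parse_generated`, and `read_generated` are hypothetical and are not part of `src/`.

```python
import csv
from typing import Iterable, List, NamedTuple


class GeneratedAnalogy(NamedTuple):
    analogy: str  # (1) Generated Analogy
    target: str   # (2) Target Concept
    prompt: str   # (3) Prompt


def parse_generated(lines: Iterable[str]) -> List[GeneratedAnalogy]:
    """Parse tab-separated lines into (analogy, target, prompt) records."""
    rows = []
    # QUOTE_NONE: treat quote characters inside analogies as plain text.
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append(GeneratedAnalogy(*fields[:3]))
    return rows


def read_generated(path: str) -> List[GeneratedAnalogy]:
    # newline="" is how the csv module recommends opening files.
    with open(path, encoding="utf-8", newline="") as f:
        return parse_generated(f)
```

Separating parsing from file I/O lets the same parser run on any iterable of lines, such as a file handle or an in-memory list.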

Analogical Style Scorer

/data/analoginess_scorer/non_adapt.txt: Candidate Analogies classified as non-analogy and analogy

Each file contains the following tab-separated fields: (1) Generated Analogy, (2) Target Concept, (3) Prompt, (4) Temperature (lt for low, ht for high), (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Predicted Class (0 for non-analogy, 1 for analogy).
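
The predicted class makes it straightforward to keep only the generations classified as analogies. This is a hedged sketch. It assumes the six-field, tab-separated layout listed here, with the class stored as the literal `0` or `1`. `ScoredAnalogy`, `parse_scored`, and `keep_analogies` are hypothetical names.

```python
import csv
from typing import Iterable, List, NamedTuple


class ScoredAnalogy(NamedTuple):
    analogy: str       # (1) Generated Analogy
    target: str        # (2) Target Concept
    prompt: str        # (3) Prompt
    temperature: str   # (4) "lt" or "ht"
    domain: str        # (5) Domain / Preference / Discipline
    is_analogy: bool   # (6) Predicted Class: 1 -> analogy, 0 -> non-analogy


def parse_scored(lines: Iterable[str]) -> List[ScoredAnalogy]:
    rows = []
    for f in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(f) < 6:
            continue  # skip blank or malformed lines
        rows.append(ScoredAnalogy(f[0], f[1], f[2], f[3].strip(), f[4],
                                  f[5].strip() == "1"))
    return rows


def keep_analogies(rows: List[ScoredAnalogy]) -> List[ScoredAnalogy]:
    """Keep only rows the scorer predicted as analogies (class 1)."""
    return [r for r in rows if r.is_analogy]
```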

After Prompt Refinement:

/data/analoginess_scorer/prompt_refiner/non_adapt.txt: Candidate Analogies classified as non-analogy and analogy after prompt refinement

Each file contains the following tab-separated fields: (1) Generated analogy after prompt refinement, (2) Target Concept, (3) Prompt, (4) Temperature (lt for low, ht for high), (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Predicted Class (0 for non-analogy, 1 for analogy), (7) Generated analogy before prompt refinement (classified as non-analogy).
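
Because field (7) holds the original non-analogy and field (6) holds the class of the refined output, the two can be paired to see how often refinement turned a non-analogy into an analogy. This is a minimal sketch under the same tab-separated assumptions. `RefinedAnalogy`, `parse_refined`, and `recovery_rate` are hypothetical names, and the "recovery rate" is an illustrative statistic, not one defined by this repository.

```python
import csv
from typing import Iterable, List, NamedTuple


class RefinedAnalogy(NamedTuple):
    after: str        # (1) analogy generated after prompt refinement
    target: str       # (2) Target Concept
    is_analogy: bool  # (6) predicted class of the refined analogy
    before: str       # (7) original generation, classified as non-analogy


def parse_refined(lines: Iterable[str]) -> List[RefinedAnalogy]:
    rows = []
    for f in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(f) < 7:
            continue  # skip blank or malformed lines
        rows.append(RefinedAnalogy(f[0], f[1], f[5].strip() == "1", f[6]))
    return rows


def recovery_rate(rows: List[RefinedAnalogy]) -> float:
    """Fraction of refined generations now predicted as analogies."""
    return sum(r.is_analogy for r in rows) / len(rows) if rows else 0.0
```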

PLM-based Source Extraction

/data/extracted_src/non_adapt.txt: Sources extracted from the analogies

Each file contains the following tab-separated fields: (1) Generated Analogy, (2) Target Concept, (3) Prompt, (4) Temperature (lt for low, ht for high), (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Extracted Source(s).

If the PLM generated multiple mappings between the source and target concepts, they are separated by ###. We used only the first mapping in our experiments.
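
Keeping only the first mapping amounts to splitting field (6) on `###` and taking the first piece. The sketch below assumes the six-field, tab-separated layout described here. `first_mapping` and `parse_extracted` are hypothetical helper names.

```python
import csv
from typing import Iterable, List, Tuple


def first_mapping(extracted: str) -> str:
    """Keep only the first source mapping from a '###'-separated field."""
    return extracted.split("###")[0].strip()


def parse_extracted(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (analogy, target concept, first extracted source) triples."""
    out = []
    for f in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(f) < 6:
            continue  # skip blank or malformed lines
        out.append((f[0], f[1], first_mapping(f[5])))
    return out
```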

Analogies retrieved from Web

/data/ret_am/: Each file contains the analogies retrieved from the Web for one target-source concept pair, stored as a JSON dictionary with the following keys: (1) "src_spec_res", lists of Bing search results returned for the source-specific queries; (2) "gen_res", lists of Bing search results returned for the general queries; (3) "all_bing_queries", the list of all source-specific and general queries.
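
A hedged sketch for iterating over the retrieval files and checking that each one has the three documented keys. It assumes every regular file in `data/ret_am` is a UTF-8 JSON file. The file naming scheme and the internal structure of individual search results are not documented here, so the sketch does not inspect them. `missing_keys` and `iter_retrieved` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Tuple

# Keys documented for each file in /data/ret_am/.
EXPECTED_KEYS = ("src_spec_res", "gen_res", "all_bing_queries")


def missing_keys(record: Dict) -> List[str]:
    """Return the documented keys that are absent from a loaded record."""
    return [key for key in EXPECTED_KEYS if key not in record]


def iter_retrieved(folder: str = "data/ret_am") -> Iterator[Tuple[str, Dict]]:
    """Yield (file name, parsed JSON) for every regular file in the folder."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        absent = missing_keys(record)
        if absent:
            raise KeyError(f"{path.name} lacks keys {absent}")
        yield path.name, record
```

For example, `len(record["all_bing_queries"])` gives the number of queries issued for a concept pair.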

Ranked Analogies by Unsupervised CAM

/data/ranked_analogies/non_adapt.txt: Contains Creative Ranking Score, Reliability Score, and Creativity Score of Analogies.

Analogical Style Scorer Model

Trained model available at: https://drive.google.com/file/d/1egpgQyjDd9I0b-_wmIRPsgnK0OGs9FqX/view?usp=sharing.

Download the model and place it under /model in this repository folder to run the Analogical Style Scorer prediction code (src/analoginess_scorer_pred.py).

Crowd Evaluation

data/crowd-eval/science.json, data/crowd-eval/ml.json, data/crowd-eval/cyber.json: Format {analogy: {meaning: {worker id1: score1, ...}, novelty: {worker id1: score1, ...}, queries: {worker id1: queries1}, urls found: {worker id1: urls1}}}
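
A hedged sketch for summarizing the crowd judgments. It averages each analogy's per-worker meaning and novelty scores and assumes the scores are numbers or numeric strings. The `queries` and `urls found` entries are left untouched. `load_crowd` and `mean_scores` are hypothetical names.

```python
import json
from statistics import mean
from typing import Dict


def load_crowd(path: str) -> Dict[str, Dict]:
    """Load one crowd-eval file, e.g. data/crowd-eval/science.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mean_scores(crowd: Dict[str, Dict]) -> Dict[str, Dict[str, float]]:
    """Average the per-worker meaning and novelty scores of each analogy."""
    summary = {}
    for analogy, judgments in crowd.items():
        summary[analogy] = {
            aspect: mean(float(s) for s in judgments[aspect].values())
            for aspect in ("meaning", "novelty")
            if judgments.get(aspect)  # skip aspects with no worker scores
        }
    return summary
```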

Bhaavya/CAM
