Skip to content

Bhaavya/CAM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 

Repository files navigation

Repository for code and data associated with the CAM framework paper on analogy generation and evaluation via InstructGPT (Davinci).

Data

PLM-based Analogy Generation

  • /data/non_adapt: Candidate Analogies for Cybersecurity (cyber), Machine Learning (ai), and High-school science (sci) domains
  • /data/non_analogies: Non-Analogies used as negative sample for training Analoginess Scorer
  • /data/sci_src: Additional Analogies (generated with source concept as part of the prompt) for High-school science domain used as positive samples for training Analoginess Scorer

For all files names, pn means analogies generated with prompt id n in the paper, ht means high temperature, lt means low temperature.

Each file contains the following fields separated by tab: (1) Generated Analogy, (2) Target Concept, (3) Prompt.

Analogical Style Scorer

  • /data/analoginess_scorer/non_adapt.txt: Candidate Analogies classified as non-analogy and analogy

Each file contains the following fields separated by tab: (1) Generated Analogy, (2) Target Concept, (3) Prompt, (4) Temperature (low -- lt or high --ht) (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Predicted Class (0 -- non-analogy, 1 -- analogy).

After Prompt Refinement:

  • /data/analoginess_scorer/prompt_refiner/non_adapt.txt: Candidate Analogies classified as non-analogy and analogy after prompt refinement

Each file contains the following fields separated by tab: (1) Generated analogy after prompt refinement, (2) Target Concept, (3) Prompt, (4) Temperature (low -- lt or high --ht) (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Predicted Class (0 -- non-analogy, 1 -- analogy), (7) Generated analogy before prompt refinement (classified as non-analogy).

PLM-based Source Extraction

  • /data/extracted_src/non_adapt.txt: Source extracted from Analogies

Each file contains the following fields separated by tab: (1) Generated Analogy, (2) Target Concept, (3) Prompt, (4) Temperature (low -- lt or high --ht) (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Extracted Source(s).

In case the PLM generated multiple mappings between source and target concepts, they are separated by ###. We only used the first mapping in our experiments.

Analogies retrieved from Web

/data/ret_am/: Each file contains analogies retrieved from the Web for one target and source concept pair. It has a json dictionary with the following keys and values: (1) "src_spec_res" with lists of Bing search results returned for source-specific queries, (2) "gen_res" with lists of Bing search results returned for general queries, (3) "all_bing_queries" with the list of all the source-specific and general queries.

Ranked Analogies by Unsupervised CAM

  • /data/ranked_analogies/non_adapt.txt: Contains Creative Ranking Score, Reliability Score, and Creativity Score of Analogies.

Analogical Style Scorer Model

Trained model available at: https://drive.google.com/file/d/1egpgQyjDd9I0b-_wmIRPsgnK0OGs9FqX/view?usp=sharing.

Download the model and place in under /model under this repo folder to run Analogical Style Scorer Prediction Code (src/analoginess_scorer_pred.py)

Crowd Evaluation

data/crowd-eval/science.json, data/crowd-eval/ml.json, data/crowd-eval/cyber.json: Format {analogy: meaning: {worker id1: score1, ...}, novelty: {worker id1: score1, ...}, queries: {worker id1: queries1}, urls found: {worker id1: urls1}}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages