incorrect neuronpedia reference for few-shot sample in intruder_prompt.py #154

@d0rbu

delphi/scorers/classifier/prompts/intruder_prompt.py includes some samples for few-shot prompting, and most of them have a helpful comment pointing to the neuronpedia page the example came from. however, i think DSCORER_EXAMPLE_THREE has the wrong link attached: i was not able to find its examples on the linked page, and the link is identical to the one for DSCORER_EXAMPLE_TWO. probably a copy-paste error?

# https://www.neuronpedia.org/gpt2-small/8-res-jb/12654
DSCORER_EXAMPLE_TWO = """Examples:

Example 0: enact an individual health insurance mandate?âĢĿ, Pelosi's response was to dismiss both
Example 1: climate, TomblinâĢĻs Chief of Staff Charlie Lorensen said.Ċ
Example 2: no wonderworking relics, no true Body and Blood of Christ, no true Baptism
Example 3:ĊĊIt has been devised by Director of Public Prosecutions (DPP)
Example 4: and fair investigation not even include the Director of Athletics? · Finally, we believe the
"""
DSCORER_RESPONSE_TWO_COT = "I can see that there are several examples that have the word 'of' before a capital letter. The intruder is 0 because it does not."
DSCORER_RESPONSE_TWO = "[RESPONSE]: 0"

# this is the same link as above; i wasn't able to find these words on the page, but i was able to find the DSCORER_EXAMPLE_TWO examples on it
# https://www.neuronpedia.org/gpt2-small/8-res-jb/12654
DSCORER_EXAMPLE_THREE = """Examples:

Example 0: Climbing
Example 1: running
Example 2: swim
Example 3: eating
Example 4: cycling
"""
DSCORER_RESPONSE_THREE_COT = "All examples are related to activities, the first 3 and the last one being about sports and physical activities. Eating is not a sport or physical activity so it is the intruder."

DSCORER_RESPONSE_THREE = "[RESPONSE]: 3"

i would appreciate it if the comment could be updated to the correct link, but if the original page is no longer available then it's probably better to just remove the link altogether
