error when running 'Snippet to concepts generation' script #10

@gnohgnailoug

Issue: KeyError 'seed' When Running Snippet to Concepts Generation Script

Description

When I run the 'Snippet to concepts generation' script (instruct mode "S->C"), it fails with a KeyError because the input examples have no 'seed' key.

Error Output

  File "/share/home/starcoder2-self-align/src/star_align/self_ossinstruct.py", line 514, in <module>
    asyncio.run(main())
  File "/share/home/anaconda3/envs/starcoder-generate/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/share/home/anaconda3/envs/starcoder-generate/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/share/home/starcoder2-self-align/src/star_align/self_ossinstruct.py", line 411, in main
    kwargs = build_kwargs(args.instruct_mode, example)
  File "/share/home/starcoder2-self-align/src/star_align/self_ossinstruct.py", line 307, in build_kwargs
    kwargs["snippet"] = example["seed"]
KeyError: 'seed'
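The traceback shows that build_kwargs reads example["seed"] directly, so in "S->C" mode every record in the seed file apparently needs a `seed` field. A quick way to confirm which records lack it, and which keys they do have, is a small stdlib check like the one below. `find_missing_key` is a hypothetical helper written for this report, not part of star_align, and the `content` key in the sample data is only illustrative.

```python
import json
from typing import Iterable, List, Tuple


def find_missing_key(lines: Iterable[str], key: str = "seed") -> List[Tuple[int, List[str]]]:
    """Return (line_number, keys_present) for each JSONL record that lacks `key`."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines, but keep counting them for line numbers
        record = json.loads(line)
        if key not in record:
            problems.append((lineno, sorted(record.keys())))
    return problems


# Example on in-memory lines; with a real file, pass the open file object:
#   with open("seed.jsonl", encoding="utf-8") as f:
#       print(find_missing_key(f))
sample = ['{"seed": "def f(): pass"}', '{"content": "def g(): pass", "id": 1}']
print(find_missing_key(sample))  # [(2, ['content', 'id'])]
```

Reporting the keys that are present, not just the failing line, makes it obvious which field actually holds the code snippet.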

Command Used

python src/star_align/self_ossinstruct.py \
    --instruct_mode "S->C" \
    --seed_data_files seed.jsonl \
    --max_new_data 50000 \
    --tag concept_gen \
    --temperature 0.7 \
    --seed_code_start_index 0 \
    --model bigcode/starcoder2-15b \
    --num_fewshots 8 \
    --num_batched_requests 32 \
    --num_sample_per_request 1

Source of Seed Data

The seed data file was taken from the Hugging Face dataset:
bigcode/python-stack-v1-functions-filtered-sc2
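If the exported records store the code under a different field, one local workaround is to rewrite the file so each record also carries a `seed` field. This is a sketch under assumptions, not the repository's own preprocessing: `add_seed_field` is hypothetical, and the default source key `content` is a guess, so substitute whatever key the check on the error output reports for your file.

```python
import json
from typing import Iterable, Iterator


def add_seed_field(lines: Iterable[str], source_key: str = "content") -> Iterator[str]:
    """Yield JSONL lines where each record gains a "seed" field copied from `source_key`.

    Records that already have "seed" are passed through unchanged.
    Raises KeyError if a record has neither "seed" nor `source_key`.
    """
    for line in lines:
        if not line.strip():
            continue  # drop blank lines
        record = json.loads(line)
        if "seed" not in record:
            # ASSUMPTION: `source_key` is the field holding the code snippet.
            record["seed"] = record[source_key]
        yield json.dumps(record, ensure_ascii=False)


# Usage with files (paths are placeholders):
#   with open("raw.jsonl", encoding="utf-8") as src, \
#        open("seed.jsonl", "w", encoding="utf-8") as dst:
#       for out_line in add_seed_field(src):
#           dst.write(out_line + "\n")
converted = list(add_seed_field(['{"content": "def f(): pass", "id": 7}']))
print(converted[0])  # {"content": "def f(): pass", "id": 7, "seed": "def f(): pass"}
```

The sketch copies the field instead of renaming it, so the original key stays available in case another part of the pipeline reads it.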

Steps to Reproduce

  1. Set up the environment using the provided command.
  2. Run the script as shown in the command section.

Actual Behavior

The script fails with a KeyError indicating that the 'seed' key is missing from the input data examples.

Additional Information

  • Environment: Conda environment with Python 3.10
