update alpaca gpt4 to use dataset entry #2869

Merged

@CloseChoice CloseChoice commented Apr 23, 2023

#2827

Update alpaca gpt4 to use dataset entry.

I ran

~$ python check_dataset_appearances.py -d alpaca_gpt4 --cache_dir .cache --mode sft
'Found the following occurances in TRAIN alpaca_gpt4:'
{   re.compile('\\[\\d+(?:,\\s*\\d+)*?\\]'): [   '[3, 45, 99, 2, 8, 6, 72]',
                                                 '[10, 8, 7, 4]',
                                                 '[1, 2, 3, 4, 5]',
                                                 '[7, 3, 4, 6, 2]']}
'Found the following occurances in VAL alpaca_gpt4:'
{   'openai': [   'u’re approved, get your API key.\n'
                  '\n'
                  '2. Install the `openai` library in your Python environment '
                  'using p']}

I checked every flagged occurrence: all of them are programming-related and none is an actual reference, so this looks fine. One of the flagged entries, for example:

DatasetEntry(questions=['Re-order the integer list given in the input field such that '
                        'all odd numbers are first and even numbers are last.'
                        '\n[2, 3, 8, 45, 6, 99, 72]'],
             answers=['[3, 45, 99, 2, 8, 6, 72]'],
             context=None,
             lang=None,
             length=None,
             quality=None,
             humor=None,
             creativity=None)
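
To illustrate why plain integer lists get flagged, here is a minimal sketch that applies the regex from the output above to the sample answer. The `find_occurrences` helper and the second test string are made up for illustration; the actual logic in `check_dataset_appearances.py` may differ.

```python
import re

# Pattern copied from the check output above: a bracketed,
# comma-separated list of integers such as "[1]" or "[2, 3]".
BRACKET_LIST = re.compile(r'\[\d+(?:,\s*\d+)*?\]')


def find_occurrences(texts, pattern):
    """Hypothetical helper: collect every substring of `texts` matching `pattern`."""
    return [m.group(0) for text in texts for m in pattern.finditer(text)]


answers = [
    '[3, 45, 99, 2, 8, 6, 72]',          # answer from the entry above
    'See [1] and [2, 3] for details.',   # made-up citation-style text
]
hits = find_occurrences(answers, BRACKET_LIST)
print(hits)
```

The same pattern matches both citation-style markers and ordinary list literals, which is why the flagged entries had to be reviewed by hand.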

@CloseChoice CloseChoice marked this pull request as ready for review April 23, 2023 21:54
@andreaskoepf andreaskoepf left a comment


ok .. nice!

@andreaskoepf andreaskoepf merged commit f745129 into LAION-AI:main Apr 24, 2023
1 check passed
grgau pushed a commit to grgau/Open-Assistant that referenced this pull request May 8, 2023