
Added biostars_qa dataset and pre-processing scripts #2353

Merged
merged 5 commits into from Apr 11, 2023

@cannin (Contributor) commented Apr 6, 2023

Adds the BioStars QA dataset. Resolves #2236

github-actions bot commented Apr 6, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@olliestanley (Collaborator)

Would it be possible to put the code in each file into a function, and then have a single script that calls both functions? That would make it a little easier to reproduce this dataset quickly if needed in future.
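As a sketch of what the suggested layout could look like (not the merged code), each per-file script becomes a function and a single driver calls them in order. Only the name and signature of get_biostars_dataset come from later in this thread. The names process_biostars_data and build_biostars_qa, the record fields, and both function bodies are hypothetical stubs so the example runs offline.

```python
# Hypothetical driver script: one entry point that rebuilds the dataset.
# Only get_biostars_dataset's name and signature come from this thread;
# the function bodies and process_biostars_data are illustrative stubs.


def get_biostars_dataset(start_idx=9557161, accept_threshold=1000000, sleep=0.1, folder="biostars"):
    """Download step (stub). The real step fetches BioStars posts; this stub
    ignores every parameter except start_idx and returns fake records."""
    return [
        {"id": start_idx, "question": "How do I index a BAM file?", "answer": "Run samtools index."},
        {"id": start_idx + 1, "question": "Why does my alignment fail?", "answer": None},
    ]


def process_biostars_data(posts):
    """Pre-processing step (hypothetical): keep answered posts as QA pairs."""
    return [
        {"question": post["question"], "answer": post["answer"]}
        for post in posts
        if post["answer"]
    ]


def build_biostars_qa(**download_kwargs):
    """Run both steps in order so the dataset can be reproduced with one call."""
    posts = get_biostars_dataset(**download_kwargs)
    return process_biostars_data(posts)


if __name__ == "__main__":
    for row in build_biostars_qa():
        print(row["question"], "->", row["answer"])
```

Because each step remains an ordinary function, the download and pre-processing can still be run separately, for example to re-run only the pre-processing on data that was already downloaded.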

@olliestanley (Collaborator) left a comment


Thank you!

@olliestanley olliestanley enabled auto-merge (squash) April 11, 2023 07:44
@olliestanley olliestanley merged commit 2997fd1 into LAION-AI:main Apr 11, 2023
1 check passed
@simorgh10

Hello,

I have started delving into the Open Assistant code and am going through the latest commits. I am also new to open-source contribution, so I beg your indulgence.

In commit 2997fd1, the parameters of get_biostars_dataset are overwritten by hard-coded local assignments, so any arguments a caller passes are ignored.

This seems to be a minor detail, though. Do I need to open a new issue for it?

def get_biostars_dataset(start_idx=9557161, accept_threshold=1000000, sleep=0.1, folder="biostars"):
    ...
    start_idx = 9557161
    accept_threshold = 1000000
    sleep = 0.1
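A minimal illustration of the reported issue and the obvious fix, which is to delete the three reassignments. The _buggy and _fixed suffixes are hypothetical names used only for comparison, and the real download logic is omitted: both functions just return the values they would end up using.

```python
# Minimal illustration of the reported issue; the download logic is omitted
# and both hypothetical functions simply return the values they would use.


def get_biostars_dataset_buggy(start_idx=9557161, accept_threshold=1000000, sleep=0.1, folder="biostars"):
    # These reassignments rebind the parameter names, discarding the caller's arguments.
    start_idx = 9557161
    accept_threshold = 1000000
    sleep = 0.1
    return start_idx, accept_threshold, sleep, folder


def get_biostars_dataset_fixed(start_idx=9557161, accept_threshold=1000000, sleep=0.1, folder="biostars"):
    # Defaults live only in the signature, so caller overrides take effect.
    return start_idx, accept_threshold, sleep, folder


print(get_biostars_dataset_buggy(start_idx=1, sleep=0.5))  # (9557161, 1000000, 0.1, 'biostars')
print(get_biostars_dataset_fixed(start_idx=1, sleep=0.5))  # (1, 1000000, 0.5, 'biostars')
```

With no arguments both versions return the same values, because the hard-coded assignments repeat the defaults. The issue only matters when a caller tries to override a parameter.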

@olliestanley (Collaborator)


Good spot. I don't think this is going to cause a problem, but you're welcome to open a pull request fixing it; there's no need to open an issue first!

Successfully merging this pull request may close these issues:

Add Biostars Dataset for Bioinformatics QA
3 participants