Datasets for benchmarking strategies #8

Merged
merged 59 commits into from May 26, 2022

paulmorio

Modules to download and process datasets from online sources into `torch.utils.data.Dataset` instances, with additional attributes for (stratified) k-fold CV as described in the paper.
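
For readers unfamiliar with stratified k-fold CV, here is a minimal standard-library sketch of the idea: each fold receives a near-equal share of every class. The function name `stratified_kfold_indices` and its parameters are hypothetical and are not taken from this PR; the modules here may compute and store their splits differently.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs, one per fold.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into folds
    # so every fold gets a near-equal share of each class.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Fold k is the test set; the remaining folds form the train set.
    splits = []
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits


labels = [0] * 10 + [1] * 5
splits = stratified_kfold_indices(labels, n_splits=5, seed=0)
```

Index lists like these could then be attached to a dataset object as extra attributes, which is one plausible reading of "additional attributes" above.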

This adds a few new dependencies, namely openpyxl, xlrd, and pyreadr, for processing the Excel and R data storage formats of the original raw data.

Also included are utility functions for transforming each of the datasets into datamanagers that have "cold" or "warm" label initialisations for benchmarking AL strategies on the datasets.
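
As a rough illustration of the "cold" versus "warm" distinction, the sketch below splits sample indices into an initial labelled set and an unlabelled pool. It rests on two assumptions that this PR does not state: that "cold" means starting with one labelled example per class, and that "warm" means starting with a random labelled fraction of the data. The function `initial_labelled_indices` and the `warm_fraction` parameter are hypothetical, and the actual utility functions may define both modes differently.

```python
import random


def initial_labelled_indices(labels, mode="cold", warm_fraction=0.1, seed=0):
    """Split indices into (labelled, unlabelled) for an AL benchmark.

    Hypothetical helper; the "cold"/"warm" definitions are assumptions.
    """
    rng = random.Random(seed)
    all_indices = list(range(len(labels)))

    if mode == "cold":
        # Assumption: cold start labels a single random example per class.
        shuffled = all_indices[:]
        rng.shuffle(shuffled)
        seen_classes = set()
        labelled = []
        for idx in shuffled:
            if labels[idx] not in seen_classes:
                seen_classes.add(labels[idx])
                labelled.append(idx)
    elif mode == "warm":
        # Assumption: warm start labels a random fraction of all samples.
        n_labelled = max(1, round(warm_fraction * len(labels)))
        labelled = rng.sample(all_indices, n_labelled)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    labelled_set = set(labelled)
    unlabelled = [i for i in all_indices if i not in labelled_set]
    return sorted(labelled), unlabelled


labels = [0, 1] * 10
cold_labelled, cold_unlabelled = initial_labelled_indices(labels, mode="cold")
warm_labelled, warm_unlabelled = initial_labelled_indices(
    labels, mode="warm", warm_fraction=0.25
)
```

The two index lists are the kind of input a datamanager would need to track which samples a strategy can query.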

Includes tests for all the modules implemented.

…UCI (and some others that are stored as R mat files or excel xls files)
…l dataset instead of just the train portion using concatdataset
@paulmorio

Updated with some tests for uci_datasets. I looked into updating the coverage badge dynamically, and the best solution I've come across so far is this GitHub Action: https://github.com/marketplace/actions/dynamic-badges

Unfortunately, I don't have the rights to access the repo's secrets settings, so I can't complete the setup instructions there.

I've added a reference documentation page. This could be followed by another tutorial at a later date.

thomasgaudelet previously approved these changes May 9, 2022
a-pouplin previously approved these changes May 11, 2022
@paulmorio paulmorio dismissed stale reviews from a-pouplin and thomasgaudelet via f6c0513 May 22, 2022 15:26
thomasgaudelet previously approved these changes May 23, 2022

@thomasgaudelet thomasgaudelet left a comment

Nice!

a-pouplin previously approved these changes May 24, 2022
@paulmorio paulmorio dismissed stale reviews from a-pouplin and thomasgaudelet via 110cc48 May 25, 2022 12:09
@paulmorio paulmorio merged commit 86899d1 into main May 26, 2022
@thomasgaudelet thomasgaudelet deleted the datasets branch May 30, 2022 06:40

4 participants