Dataset splitting not working #3
Comments
Hi Christian,
Thank you for getting back to me so quickly.
My data are indeed from long deployments of acoustic tags on penguins. Even
though they are saved in shorter chunks, I use a single deployment ID and
time relative to the start of deployment, so that I can synchronize the audio
recordings with data from the tag's other sensors. I will try renaming the
files and giving the training another go.
Thanks again for your help,
Danuta
On Tue, May 23, 2023 at 6:18 PM ChristianBergler ***@***.***> wrote:
Hi Danuta,
I am not familiar with your data structure, but I have a guess about what might
have gone wrong. ANIMAL-SPOT internally expects the following filename structure:
"label_id_year_tape_startlabeltime_endlabeltime" ... Based on the "year" and
"tape" information, it internally creates a set of "recording tapes" from
the given data. A recording tape is always the combination of year and
tape name. When ANIMAL-SPOT does the data split (automatically), it makes
sure that NONE of the tapes are shared across partitions, in order to avoid
"cheating": audio data from the same tape, distributed across training
and test, makes things easier for the model, because it has already seen the
data during training. I think this is your problem. Very likely the
number of different tapes in your case is small, so ANIMAL-SPOT puts
everything into some of the buckets and nothing is left for the
remaining buckets. If you do not have more distinct tapes and
everything comes, e.g., from one recording, you can also "fool" ANIMAL-SPOT
by naming the "year_tape" information in an artificial, random way to
simulate different recording tapes. That should solve your problem.
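
The renaming workaround described above could be sketched roughly as follows. This is only an illustration, not part of ANIMAL-SPOT: the helper names `parse_fields` and `assign_artificial_tapes`, the `T000`-style tape labels, and the assumption that no field contains an underscore are all hypothetical. The sketch only computes new names; actually renaming files on disk is left to the caller.

```python
import random
from pathlib import PurePath


def parse_fields(filename):
    """Split 'label_id_year_tape_start_end.ext' into six fields plus suffix.

    Assumes (hypothetically) that no field itself contains an underscore.
    """
    path = PurePath(filename)
    parts = path.stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return parts, path.suffix


def assign_artificial_tapes(filenames, n_tapes=10, seed=0):
    """Return {old_name: new_name} with the tape field replaced.

    Files are shuffled and then dealt round-robin onto n_tapes artificial
    tape names, so every artificial tape is used when there are at least
    n_tapes files. The year field is left unchanged.
    """
    names = sorted(filenames)
    random.Random(seed).shuffle(names)
    mapping = {}
    for i, name in enumerate(names):
        parts, suffix = parse_fields(name)
        parts[3] = f"T{i % n_tapes:03d}"  # hypothetical artificial tape label
        mapping[name] = "_".join(parts) + suffix
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("renaming would produce duplicate filenames")
    return mapping
```

The returned mapping can be inspected first and then applied, for example with `os.rename(old, new)` for each pair, ideally on a copy of the data.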
Hi again, the training works with the randomized tape names. Thanks again! Best,
Hi,
when setting up training for an Animal-Spot binary classification, I get a weird error. The dataset does not seem to be split according to the values specified in `main.py`. As you can see in the error messages below, the training set contains 0 files, whereas the validation and test splits contain the remaining files. When I run the script multiple times, it is random whether the train, val, or test dataset is omitted. In every re-run, one of the splits contains 0 files, which results in the error below.
What can I do?
Greetings,
Danuta
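
A symptom like this is what one would expect from a split that keeps each recording tape (the "year_tape" part of the filename) entirely within one partition when there are very few distinct tapes. The following toy sketch illustrates that effect. It is not ANIMAL-SPOT's actual splitting code: the function names, the random weighted assignment, and the 70/15/15 fractions are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def tape_key(filename):
    """Return 'year_tape' from 'label_id_year_tape_start_end.ext' (assumed layout)."""
    parts = filename.rsplit(".", 1)[0].split("_")
    return f"{parts[2]}_{parts[3]}"


def split_by_tape(filenames, fractions=(0.7, 0.15, 0.15), seed=None):
    """Assign whole tapes to train/val/test; no tape is shared across partitions."""
    by_tape = defaultdict(list)
    for name in filenames:
        by_tape[tape_key(name)].append(name)

    rng = random.Random(seed)
    partitions = ("train", "val", "test")
    splits = {p: [] for p in partitions}
    for tape in sorted(by_tape):
        # Hypothetical rule: pick one partition per tape, weighted by fraction.
        target = rng.choices(partitions, weights=fractions)[0]
        splits[target].extend(by_tape[tape])
    return splits


# With only two distinct tapes, at least one of the three partitions must be
# empty, and which one it is depends on the random draw.
files = [f"call_{i}_2023_{tape}_{i}_{i + 1}.wav" for tape in ("A", "B") for i in range(5)]
for partition, members in split_by_tape(files).items():
    print(partition, len(members))
```

Counting the distinct `tape_key` values in a dataset directory is a quick way to check whether there are enough tapes to fill all three partitions.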