questions about the dataset #1

Open
coconutbrand opened this issue Apr 12, 2024 · 3 comments

Comments

@coconutbrand

Thanks a lot for creating this dataset! I can't find much description of the dataset, but it looks really useful, so I have a few questions. 1. What is the difference between this and the LA MIDI dataset? Is this a superset of the LA MIDI dataset, or are they for different purposes? 2. Does this or LA MIDI contain any AI-generated music? (I had a similar question about the Tegridy MIDI dataset you created, too.)

@asigalov61
Owner

@coconutbrand Thank you for your questions :)

Monster MIDI Dataset is indeed a superset of LA MIDI Dataset. The main difference is that Monster is a raw/unfiltered dataset (noisy) while LA dataset was filtered to be suitable for training AI models.

I compiled the Monster dataset mostly for music information retrieval (MIR) purposes, while the LA dataset was compiled specifically for Music AI purposes.

RE AI-generated music in Monster... The Monster MIDI dataset contains all sorts of MIDIs (that's why it is a raw MIDI dataset). It also contains black MIDIs, MIDI art, low-quality MIDIs, and melody MIDIs. There may be other stuff too, but this is what I have noticed while working with it.

To the best of my knowledge, only quality AI-generated music was added to Monster, so it should not be a concern, since it would be indistinguishable from human music.

To elaborate a bit more, Monster is basically a superset of all datasets (with some exceptions) that are present or listed in the Tegridy MIDI dataset repo. So you can cross-reference MIDIs that way if you want.

Also, please note that all MIDIs in the Monster dataset were read-checked and rewritten into a proper MIDI format, so their MD5 hashes are different from the originals. This was done to normalize and standardize the dataset and also to fix all errors and spam, like bad MIDI sigs or erroneous information.
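As a rough illustration of why the hashes no longer match: MD5 is computed over the raw bytes of a file, so any rewrite that changes even one byte yields a different digest. The sketch below is not part of the dataset's tooling; the helper names `md5_of_file` and `same_bytes` are hypothetical and it uses only the Python standard library.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True only when both files have the same MD5 digest (byte-identical content)."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A rewritten MIDI would fail `same_bytes` against its original even if it plays the same notes, so matching files against other copies would have to rely on something other than the file hash.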

Hope this answers your questions but if not, feel free to ask :)

Sincerely,

Alex

@coconutbrand
Author

Thank you for the detailed answer! One last question: did you transcribe any MIDIs yourself, or are these mostly MIDIs available on the internet? Again, thanks for making this!

@asigalov61
Owner

@coconutbrand Yes, Monster includes transcribed MIDIs (by me and others), but not a lot. So yes, it's mostly publicly available MIDIs off the internet.
