Migration to python3 and increase chance to make it run on your own machine #21

Open

benelot wants to merge 22 commits into master
Conversation

benelot commented May 8, 2019

Migrates to Python 3 and pure Keras, cleans up and beautifies the code, structures the input and output data, and adds a requirements.txt and some nice READMEs.

I hope this helps some people get it running.

Note: The compiled version in the repo seems to be built from a state not included in the repository and has some very minor differences, such as the bottom-left buttons for different instruments (as far as I could tell). It is a rewrite in C++; see #22.

benelot (Author) commented May 12, 2019

I should also add Python 2 compatibility, at least for print. But I probably used some other syntax that is not Python 2 compatible; I am not sure.

benelot (Author) commented May 12, 2019

I should add installation instructions to the documentation.

benelot (Author) commented May 12, 2019

I should add how to load the saved slider settings.

benelot (Author) commented May 12, 2019

I should add timestamps to the files written to disk.

HackerPoet (Owner) commented
How stable would you say this PR is right now? I haven't been able to test it myself, but if you were able to parse the data, train a network, and use the generator successfully, then I would feel comfortable merging this in.

benelot (Author) commented May 18, 2019

I did some more stuff since then, but I think it is not in this branch. I will test it again and give you the green light to merge. At the time I made this PR, I could properly load and parse the data, train the net, and use the generator.

Just in case you are interested: in some continued work of mine, I also added PyTorch training and composer support. Should I make a separate PR, or do you want to keep your project as is and just take the migration?

ghost commented Jun 5, 2019

Hello,

I'm using Python 3.7.3, and I installed the libraries listed in the requirements.txt from your pull request. There was one deviation, though: pip install numpy==1.14.3 failed, so I installed numpy 1.16.4 instead. Would this cause instability?

The main issue I'm encountering is loading songs properly.
Are there more detailed instructions on how to use the load_songs.py script?

I created a folder titled "Music" in the directory with all the other Python files from your pull request, and I put 150 MIDI files in it. However, when I run load_songs.py, the command prompt shows that no samples are saved (see the attached image).

Is there a reason this is happening?

[Screenshot: songs not loading]

benelot (Author) commented Jun 6, 2019

So the problem of saving 0 samples is just that the code did not find your MIDI files. The folder must be called data/raw; the preprocessed samples are then placed into data/interim. The update to numpy==1.16.4 is correct and should not cause any instability. I will finish up the PR soon.
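
If anyone else hits the "saved 0 samples" case, a quick sanity check along these lines confirms the loader can actually see your files (a minimal sketch; the *.mid glob pattern is an assumption, adjust it if your files use .midi):

```python
from pathlib import Path

# The preprocessing step only looks under data/raw; zero matches here
# is exactly the "saved 0 samples" situation described above.
midi_files = sorted(Path('data/raw').rglob('*.mid'))
print(f'Found {len(midi_files)} MIDI files under data/raw')
```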

ghost commented Jun 6, 2019

Awesome! Thanks for the fast response; that helped me load the MIDI files properly.
I ran some test trials of load_songs.py -> train.py -> live_edit.py and hit one more issue. I used a sample of 10 MIDI files, as well as a sample of 400 files.

However, after training and running both models (the 10-file set first, then the 400-file set), both end up condensing all notes to the start of each measure. I did not alter the source code, so each training phase ran the full 2000 epochs. Any suspicions as to why this is happening?

[Screenshot: live_edit issue]

HackerPoet (Owner) commented
That's typical when all sliders are zeroed at the center. Does it still happen when you generate random songs with 'R' or adjust the sliders?

ghost commented Jun 6, 2019

Sorry, I should've specified: the issue persists even when changing the feature sliders and the note-certainty generation level.

HackerPoet (Owner) commented
Assuming you had WRITE_HISTORY on, did any of the earlier epochs produce music that had notes outside the first beat?

ghost commented Jun 7, 2019

Unfortunately no; all generated notes seem to be restricted to the first beat.
Here are a couple of snapshots of what the 10th epoch looks like (trained with 400 songs).

[Screenshot: epoch 10 output]

HackerPoet (Owner) commented
That's strange. Any other info would be helpful. What did your loss look like? Mine was around 0.003 and converged quickly. If you're getting a similar loss, it may be a bug in generating the music. If not, the network is having trouble converging, and you may need to adjust its parameters to compensate for your smaller dataset (mine was 10x larger, at over 4000 MIDI files).

[Screenshot: loss scores]

ghost commented Jun 7, 2019

Here's the loss after training for 2000 epochs with 400 MIDI files. I'll adjust the parameters, scrape for more files to increase the sample size, and report back here with the results.

[Screenshot: loss scores]

HackerPoet (Owner) commented
Okay, that does look like over-fitting to the smaller dataset. Try increasing the dropout rate DO_RATE to 0.25 or 0.5 to see if that helps. Also, just as a sanity check, can you confirm that the test song test.mid became nearly identical to a song from your sample set?
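
For anyone following along, here is a minimal sketch of how a shared DO_RATE constant typically enters a Keras stack (the layer sizes here are illustrative, not the exact train.py architecture):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

DO_RATE = 0.25  # try 0.25 or 0.5 instead of the default on small datasets

# Every Dropout layer reuses the one constant, so raising DO_RATE
# strengthens regularization across the whole network at once.
model = Sequential([
    Dense(2000, activation='relu', input_shape=(96 * 96,)),
    Dropout(DO_RATE),
    Dense(120, activation='relu'),
    Dropout(DO_RATE),
    Dense(96 * 96, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```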

ghost commented Jun 14, 2019

Finally had some time to do a few more tests.
I posted the results as images; all four of my training tests ultimately hit the same problem, where all generated notes are condensed into the first beat of each measure. The parameter choices were arbitrary, since I had limited time to try a few different things:

  • 100 epochs, 400 MIDI files for training, DO rate = 0.1
  • 50 epochs, 400 MIDI files for training, DO rate = 0.5
  • 100 epochs, 400 MIDI files for training, DO rate = 0.25
  • 100 epochs, 3000 MIDI files for training, DO rate = 0.5

For the last image, I used Euphony (a MIDI visual player) to play the test.mid file from the fourth trial (100 epochs, 3000 MIDI files, DO rate 0.5).
To clarify, the condensed notes seen in the last image occur 16 times (notes only appear to be generated once per measure after training).

[Screenshots: 4 test results; 16 consolidated notes from trial 4, epoch 100]

benelot (Author) commented Jun 23, 2019 via email

ghost commented Jul 12, 2019

Sounds good, benelot. I'll do my best to keep trying different variations and tests; this is quite a strange issue.

benelot (Author) commented Jul 14, 2019

OK, so I finally have some time to test this through with you, @moonflowAeon. Sorry for the delay in answering. I will prepare a run-pipeline Jupyter notebook/script and include the Bach dataset in the repository for some simple training (~1.4 MB, so it should not inflate/clutter the repo).

benelot (Author) commented Jul 14, 2019

I included some more commits I made a while ago and tested all stages with the Bach dataset, which is now in the data/raw folder. Please try the whole training again with that dataset and tell me how it goes on your setup. I also updated requirements.txt to reflect the dependencies that work for me here. If you want to test only the composer stage, the updated Readme.md in the repo links to a pretrained model. Here is the excerpt:

Pretrained models

Below are some pretrained models for different datasets. To use a model, download the zip from the
link below and extract it into the results/history folder of your project. The zip contains a folder
named e followed by a number (the number of epochs the model was trained for) and a model.h5.
Pay attention when extracting not to overwrite one of your own trained, most beloved models!
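
For reference, the extraction step could be scripted like this (a sketch; the zip name is illustrative, and the existence check is there to honor the warning above about not overwriting your own models):

```python
import os
import zipfile

target = 'results/history'
with zipfile.ZipFile('pretrained-bach.zip') as zf:
    for name in zf.namelist():
        # Refuse to clobber an existing checkpoint rather than silently overwrite it.
        if os.path.exists(os.path.join(target, name)):
            raise SystemExit(f'Refusing to overwrite {name}')
        zf.extract(name, target)
```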

ahfriedman commented
I'm not sure where to post this, but I couldn't find a better place, so I figured I'd ask here.

In midi_utils.py, I noticed midi_to_samples has an encode_length option. If I read the comments correctly, it should make notes play for longer than one tick. However, enabling it changes nothing. I guessed this was because the end of each note is already set, but when I tried changing that, I broke it. Are notes with length not yet supported?

benelot (Author) commented Jul 27, 2019 via email

ahfriedman commented Jul 27, 2019

I am using the same code and am able to get the network to train.
When I forced encode_length to True and re-ran preprocess_songs and then train.py, gt.mid still only had notes without duration. Looking at midi_utils.py, I wondered if the notes were being cut short by:

```python
for note in all_notes:
    for start_end in all_notes[note]:
        if len(start_end) == 1:
            start_end.append(start_end[0] + 1)
```

(starting on line 86 of midi_utils.py), since when the length is encoded on line 107, every note appears to end one tick after it starts. The program only threw an error when I removed this section to see whether the length-encoding part of the code would compensate.
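
For reference, here is a hypothetical sketch of what full length encoding would need to do instead of the one-tick fallback above (names and shapes are illustrative, not the repo's actual code):

```python
import numpy as np

def encode_with_length(all_notes, num_steps=96, num_notes=96):
    sample = np.zeros((num_steps, num_notes), dtype=np.uint8)
    for note, spans in all_notes.items():
        for span in spans:
            start = span[0]
            end = span[1] if len(span) > 1 else start + 1  # same fallback as above
            sample[start:end, note] = 1  # hold the note from start up to end
    return sample
```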

benelot (Author) commented Jul 27, 2019 via email

ahfriedman commented
Thanks! Even if it just ends up putting a note on every tick between the start and end of the actual note and then has to convert back to notes with length, I think the results could be cool. The only change I had to make to get it to run was installing pyaudio via:

```
pip install pipwin
pipwin install pyaudio
```

But I guess I kind of asked for issues by not switching over to Linux to run this. In requirements.txt, pyaudio is also listed twice. I didn't try this on any specific dataset, just some MIDI files of songs I found (so I also increased the generated song length).

Thank you for putting time into this. I couldn't get the original version to work on the GPU, and for this version, just pip install tensorflow-gpu did the trick. It's been really interesting to mess with. Just for fun, and partly because of the songs I happened to train it with, I've been working on a Java program that attempts to split the bass/rhythm/drums into a second instrument. I used a comically small sample size for training, so at times it reuses small parts of songs, but I was asking for that. It still produces interesting results.

ghost commented Jul 27, 2019

Sorry for the delay, @benelot! The updated migration you posted has been stable in the training tests I've run on it.

Here are five models I recorded data on:

Model 1 (4.5 hours processing time)
• 2000 epochs
• Bach repository
[Screenshot: model 1 data]

Model 2
• 100 epochs
• Bach repository
[Screenshot: model 2 data]

Model 3
• 100 epochs
• Random MIDI repository (168 songs)
[Screenshot: model 3 data]

Model 4
• 100 epochs
• Random MIDI repository (936 songs)
[Screenshot: model 4 data]
[Attachment: sample, model 4]

Model 5 (14 hours processing time)
• 4000 epochs
• Bach repository
[Screenshot: model 5 data]
[Attachment: sample, model 5]

benelot (Author) commented Jul 29, 2019

So it looks like it works!

benelot (Author) commented Jul 29, 2019

@HackerPoet It looks like we are ready to merge.

HackerPoet (Owner) commented
@ahfriedman The note length feature was something I originally planned to do, but I later decided against it to make the training more uniform, and the code was left in an unfinished state. The problem was that some of the MIDIs in my database contained zero-length notes, some had small fixed durations, and some had dynamic lengths. I was worried the network would get confused by this with my limited dataset, and it would have complicated the synth and MIDI code, so I just scrapped it.

@benelot @moonflowAeon @ahfriedman It's great that you all have it working now! I'll merge this PR after a quick review. I'd love to see any samples or models you produce from other datasets; I'm really curious how they compare to the video game one I used.

HackerPoet (Owner) left a comment
Where do these Bach MIDI files come from? Can you add the appropriate license to this folder, plus a readme that credits the author of the transcription? I want to verify it has a permissive license before I add it to this repository.

benelot (Author) commented Jul 30, 2019

The Bach Chorales music21 MIDIs are under the following license:
http://web.mit.edu/music21/doc/about/about.html#the-music21-corpus
Here is the segment that explains the contents of the dataset:
http://web.mit.edu/music21/doc/about/referenceCorpus.html#johann-sebastian-bach

The dataset is also used in the BachProp paper (which is where I got it from):
https://github.com/FlorianColombo/BachProp

I can produce and upload some samples from the Bach chorales for you. The Bach-trained model is referenced in the readme for download (I will keep it on my Google Drive for the foreseeable future).

benelot (Author) commented Jul 30, 2019

Thanks again for this really cool project, it is really fun!

ahfriedman commented
To try to get songs with note duration, I ended up writing a Java program that takes a MIDI file and rewrites each long note as the same note played on every tick between its start and end. I figured I would train on the outputs and write another tool to convert the per-tick notes back into long notes.

After training on this, I noticed composer.py started producing notes with duration, which I didn't think it could do. Sadly, running these MIDI files back through my program still results in a noisy mess; what I'm uploading is one of the better results so far (I would upload the MIDI directly, but GitHub doesn't support .mid attachments).
test.zip

I'll probably continue to mess with this in hopes of it getting better, but I'm not sure how much better it can get.
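
For anyone who wants to try the same trick without Java, here is a rough Python analogue using mido (hypothetical; not the repo's own midi_utils code): each held note is replaced by a burst of one-tick notes spanning its duration, so length is encoded as repetition.

```python
import mido

def explode_note_lengths(in_path, out_path, step=12):
    mid = mido.MidiFile(in_path)
    out = mido.MidiFile(ticks_per_beat=mid.ticks_per_beat)
    for track in mid.tracks:
        events = []          # (absolute_tick, 'note_on'/'note_off', pitch)
        now, active = 0, {}  # active maps pitch -> start tick
        for msg in track:
            now += msg.time
            if msg.type == 'note_on' and msg.velocity > 0:
                active[msg.note] = now
            elif msg.type in ('note_off', 'note_on'):  # note_on with velocity 0 acts as off
                start = active.pop(msg.note, None)
                if start is None:
                    continue
                # Re-emit the note once every `step` ticks across its duration.
                for t in range(start, max(now, start + step), step):
                    events.append((t, 'note_on', msg.note))
                    events.append((t + 1, 'note_off', msg.note))
        new_track = mido.MidiTrack()
        last = 0
        for t, kind, pitch in sorted(events):  # sorting keeps delta times non-negative
            vel = 100 if kind == 'note_on' else 0
            new_track.append(mido.Message(kind, note=pitch, velocity=vel, time=t - last))
            last = t
        out.tracks.append(new_track)
    out.save(out_path)  # note: meta messages (tempo etc.) are dropped in this sketch
```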

ahfriedman commented
@HackerPoet @benelot After doing more training, I've been able to get some interesting outputs of songs with note duration. I think it's still a bit more chaotic than normal, and training has a lot of variance, but it's definitely training and starting to produce good outputs.

Also @benelot, I've noticed that while training, the random-vector MIDI files don't seem to work (I don't know if this is intended). Another thing that may or may not be intended: the best model is saved to results/history/, but when continuing training, the code looks in results/. If you don't manually copy the model over, it will be overwritten and training will effectively reset.
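
Until that is fixed, a manual workaround could look like this (the e2000 folder and model.h5 names follow the layout described earlier in the thread; point it at your newest checkpoint):

```python
import shutil

# Copy the latest checkpoint from results/history/ back into results/
# so a resumed training run continues from it instead of starting over.
shutil.copy('results/history/e2000/model.h5', 'results/model.h5')
```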

Thank you both for working on this and making it public so people can use it!

Krakitten commented
@benelot @HackerPoet
I was messing around with the composer application and added a few features; I only made a small change to the network itself:
https://github.com/Krakitten/Composer/tree/migration
More details are in the readme on my fork.

• A command to save slider values (also saves instrument, model path, threshold, speed, and volume)
• Importing the saved txt files
• A new instrument
• Some additional commands to tweak existing songs
• Blending between multiple songs in a series. I was surprised by how well it transitioned; I blended 6 of the songs my model generated: https://drive.google.com/file/d/17MNTsHMXghApAa_GcUMB-pY0PTWWRnIF/view?usp=sharing

I also noticed the model was overfitting my data (4000 MIDI files from https://www.ninsheetmusic.org/, the vast majority Nintendo), so I changed the number of params to 40 and got much better results.

benelot (Author) commented Feb 21, 2020

Soo, to merge or not to merge?
