This image generation approach is similar to diffusion, but it uses upscaling in the process. It is an experiment using scikit-learn and is in no way better than Stable Diffusion, DALL-E, ...
- A video is first converted into thousands of single frames at the desired resolutions (automatedDataPrep.ipynb).
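The key output of this step is a matched pair of frame sets at two resolutions. I don't know the exact method automatedDataPrep.ipynb uses, but the resolution-pair part can be sketched with plain NumPy block averaging (the real notebook would read frames from a video file first):

```python
import numpy as np

def downscale(frame, factor):
    """Downscale an H x W grayscale frame by block averaging.
    Assumes H and W are divisible by `factor`."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Synthetic 64x64 frame standing in for one extracted video frame
rng = np.random.default_rng(0)
frame = rng.random((64, 64))

small = downscale(frame, 4)  # 16x16 -> the low-resolution input (X)
big = downscale(frame, 2)    # 32x32 -> the higher-resolution target (Y)
print(small.shape, big.shape)  # (16, 16) (32, 32)
```

Each frame thus yields one (small, big) pair for training.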
- These frames are then fed to upscalerGradually.ipynb, which trains a neural network on them. This works by taking the low-resolution frames as X (input) and the higher-resolution frames as Y (output). I played around with the hidden layer sizes and found that a single layer with as many neurons as the input shape works quite well.
- The resulting models are saved into the specified folder.
- The upscaling models can be tested via onlyTestingUpscalerGradual.ipynb.
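The training step above can be sketched with scikit-learn's `MLPRegressor`. This is a minimal stand-in, not the actual notebook code: the toy frames, the `np.kron` fake upsampling relationship, and the hyperparameters here are all assumptions; only the X/Y setup and the single hidden layer sized to the input follow the description above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-ins for the extracted frames: 8x8 "small" and 16x16 "big".
# In the real pipeline both come from the video; here we fake the big
# frames by nearest-neighbor upsampling the small ones with np.kron.
n_frames = 50
small = rng.random((n_frames, 8, 8))
big = np.kron(small, np.ones((1, 2, 2)))

X = small.reshape(n_frames, -1)  # 64 input features per frame
Y = big.reshape(n_frames, -1)    # 256 output values per frame

# One hidden layer with as many neurons as the input shape
model = MLPRegressor(hidden_layer_sizes=(X.shape[1],),
                     max_iter=500, random_state=0)
model.fit(X, Y)

# "Testing": predict one frame and reshape it back into an image
upscaled = model.predict(X[:1]).reshape(16, 16)
print(upscaled.shape)  # (16, 16)
```

Gradual upscaling would chain several such models, each trained to go one resolution step up.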
With my computing power (16 GB RAM, i7 quad-core, no GPU) it was quite hard to get anything meaningful out of this, but it is interesting to play around with. Given 10x the data and hours or days of training, which my system frankly can't handle, it would probably produce something genuinely interesting: when testing with varying amounts of data, you can clearly see that more data results in huge improvements. Mhhh, I should try this in the cloud.