AIFFEL X SeSAC Hackathon Project
- Team-Introduction
- Team-Member
- AKMA-Introduction
- Factors-in-AKMA
- AKMA-Walkthrough
- Changes-from-the-reference
- Project-Timeline
- Custom-Model
- Tech-Stack
- Sample-Result
- Reference
Audio Kinetic Media Art (AKMA), pronounced the same as the Korean word for "devil", is a media art generation solution based on StyleGAN3 that produces media art reacting actively to the volume of the input audio.
Name | Position | Email | Role |
---|---|---|---|
최동현 | Project Manager | david0302@naver.com | Project Management, Model Framework Development, ML Research, Model Optimization, Data Acquisition |
이상현 | Technology Manager | roughideal@gmail.com | Model Management, StyleGAN3 Training, Server Engineering, ML Research, Model Automation |
김영현 | Lead Programmer | overevo489@gmail.com | Version Control, StyleGAN3 Encoder Training, Model Serving, ML Research, Model Automation |
이호진 | Team Member | hojinlee93@gmail.com | Model QA, Model Experiment, Model Optimization, ML Research, Documentation |
윤세영 | Team Member | yoonsy1023@gmail.com | Data Acquisition |
Users can adjust the following factors to produce the target media art.
- Input Audio : Must be a wav file (24-bit wav files are not accepted).
- Input Network_pkl : StyleGAN3 model containing the content that the user wants in the media art.
- fps : Frames per second of the result.
- window length : Mainly controls how flat the waveform of the input audio becomes, which affects how sensitively the media art reacts to the volume of the input audio. Must be an odd number.
- polyorder : The order of the polynomial used to fit the samples. Must be smaller than window length.
- compression : The larger the value, the more the waveform of the input audio is compressed exponentially, which reduces its variance and makes the result less reactive to the input audio. Must be larger than 0. The default value is 1; the recommended range is 0.5 to 2.
- seeds_top_num : Number of seeds corresponding to the largest volume of the waveform. The larger the number, the more images the model produces in the same duration, which makes the video faster. Must be larger than 4.
- seeds_bottom_num : Number of seeds corresponding to zero volume of the waveform. The larger the number, the more images the model produces in the same duration, which makes the video faster. Must be larger than 4.
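The constraints listed above can be checked before starting a run. The following is a minimal sketch, not AKMA's actual code: the `AkmaParams` class and the `validate_params` function are hypothetical names introduced for illustration, and the rule that `fps` must be positive is an assumption (the list only describes it as frames per second).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AkmaParams:
    """Hypothetical container for the adjustable factors listed above."""
    fps: int
    window_length: int
    polyorder: int
    seeds_top_num: int
    seeds_bottom_num: int
    compression: float = 1.0  # default value stated in the factor list


def validate_params(p: AkmaParams) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if p.fps <= 0:  # assumption: not stated explicitly in the README
        problems.append("fps must be positive")
    if p.window_length < 1 or p.window_length % 2 == 0:
        problems.append("window length must be an odd number")
    if p.polyorder >= p.window_length:
        problems.append("polyorder must be smaller than window length")
    if p.compression <= 0:
        problems.append("compression must be larger than 0")
    if p.seeds_top_num <= 4:
        problems.append("seeds_top_num must be larger than 4")
    if p.seeds_bottom_num <= 4:
        problems.append("seeds_bottom_num must be larger than 4")
    return problems
```

For example, `validate_params(AkmaParams(fps=30, window_length=11, polyorder=3, seeds_top_num=8, seeds_bottom_num=5))` returns an empty list, while an even window length or a seed count of 4 produces a message per violated rule.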
1. Input the audio file. It must be a wav file; 24-bit wav files are not accepted.
2. Adjust fps, window length, polyorder, and compression. You can check the waveform image, which serves as the guidance for the audio-reactive function. Adjust these factors until you get a desirable waveform for your media art.
3. Input the StyleGAN3 network pkl file.
4. Adjust the number of seeds for the top and bottom of the waveform. The larger the number, the more images the model produces in the same duration, which makes the video faster. We generally recommend using more seeds for the top, so the video becomes more active as the volume increases.
5. Start generation and wait until the process ends.
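The wav requirement in step 1 can be pre-checked with Python's standard `wave` module. This is only a sketch under assumptions: `wav_sample_width_bits`, `is_supported_wav`, and `make_wav` are hypothetical helpers, not part of AKMA, and the check covers bit depth only.

```python
import io
import wave


def wav_sample_width_bits(source) -> int:
    """Return bits per sample of a PCM WAV, given a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getsampwidth() * 8


def is_supported_wav(source) -> bool:
    """True if the input parses as a PCM WAV and is not 24-bit."""
    try:
        return wav_sample_width_bits(source) != 24
    except (wave.Error, EOFError):
        # Not a WAV file, or a WAV variant the standard reader rejects.
        return False


def make_wav(sample_width_bytes: int) -> io.BytesIO:
    """Demo helper: build a tiny silent mono WAV in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sample_width_bytes)
        w.setframerate(8000)
        w.writeframes(b"\x00" * sample_width_bytes * 10)
    buf.seek(0)
    return buf
```

Calling `is_supported_wav(make_wav(2))` checks a 16-bit file and `is_supported_wav(make_wav(3))` checks a 24-bit one. Files that the standard reader cannot parse are also reported as unsupported.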
StyleGAN3 is applied.
More adjustment options are provided for the waveform, which guides the audio-reactive function.
Interpolates between two videos according to the waveform, not just between two images.
Video speed can be adjusted by adding more seeds.
StyleGAN3 is alias-free by design, building on signal processing theory, which resolves the texture-sticking problem of StyleGAN2 and produces higher-quality images.
This results in significantly better video generation.
The reference does not keep the original audio file. AKMA keeps the original audio file and merges it into the enhanced video.
By using a sinc filter, Real-ESRGAN reduces ringing and overshoot artifacts, improving image quality.
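The waveform-driven interpolation between two videos mentioned above can be illustrated with a per-frame linear blend. This is a rough sketch under stated assumptions, not the project's implementation: it assumes the volume envelope is already normalized to [0, 1], that each video is a sequence of per-frame vectors (for example latent vectors), and that the blend is linear. `blend_by_volume` is a hypothetical name.

```python
from typing import List, Sequence


def blend_by_volume(
    bottom_frames: Sequence[Sequence[float]],
    top_frames: Sequence[Sequence[float]],
    volume: Sequence[float],
) -> List[List[float]]:
    """Blend two per-frame vector sequences using a volume envelope.

    volume[t] == 0 selects the "bottom" frame, volume[t] == 1 selects the
    "top" frame, and values in between mix the two linearly (assumption).
    """
    if not (len(bottom_frames) == len(top_frames) == len(volume)):
        raise ValueError("all inputs must have one entry per frame")
    blended = []
    for low, high, v in zip(bottom_frames, top_frames, volume):
        v = min(max(v, 0.0), 1.0)  # clamp to [0, 1]
        blended.append([(1.0 - v) * a + v * b for a, b in zip(low, high)])
    return blended
```

With this form, quiet passages stay close to the bottom sequence and loud passages move toward the top sequence, which matches the described behavior of the video becoming more active as the volume increases.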
If you are curious about the project timeline, please click the link above.
- The model was trained on image data from the East Sea.
- Model
- Custom Data
- Serving