
AKMA : Audio Kinetic Media Art

AIFFEL X SeSAC Hackathon Project

Table of Contents

Team-Introduction


Team-Member

| Name | Position | Email | Role |
| --- | --- | --- | --- |
| 최동현 | Project Manager | david0302@naver.com | Project Management, Model Framework Development, ML Research, Model Optimization, Data Acquisition |
| 이상현 | Technology Manager | roughideal@gmail.com | Model Management, StyleGAN3 Training, Server Engineering, ML Research, Model Automation |
| 김영현 | Lead Programmer | overevo489@gmail.com | Version Control, StyleGAN3 Encoder Training, Model Serving, ML Research, Model Automation |
| 이호진 | Team Member | hojinlee93@gmail.com | Model QA, Model Experiment, Model Optimization, ML Research, Documentation |
| 윤세영 | Team Member | yoonsy1023@gmail.com | Data Acquisition |

AKMA-Introduction

Audio Kinetic Media Art, or AKMA for short (pronounced the same as the Korean word for "devil"), is a media art generation solution based on StyleGAN3 that produces media art reacting actively to the volume of the input audio.

Factors-in-AKMA

Users can adjust the following factors to produce the target media art.

- Input Audio: Must be a WAV file (24-bit WAV files are not supported).
- Input Network_pkl: A StyleGAN3 model (network pickle) trained on the content the user wants in the media art.
- fps: Frames per second of the output video.
- window length: Mainly adjusts the smoothness of the input audio's waveform, which determines how sensitively the media art reacts to the volume of the input audio. Must be an odd number.
- polyorder: The order of the polynomial used to fit the samples. Must be smaller than window length.
- compression: The larger the value, the more the waveform of the input audio is compressed exponentially, reducing its variance and making the result less reactive to the input audio. The default value is 1. Must be larger than 0; values between 0.5 and 2 are recommended.
- seeds_top_num: Number of seeds corresponding to the largest volume of the waveform. The larger the number, the more images the model produces in the same duration, making the video faster. Must be larger than 4.
- seeds_bottom_num: Number of seeds corresponding to zero volume of the waveform. The larger the number, the more images the model produces in the same duration, making the video faster. Must be larger than 4.
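The way these factors shape the guidance waveform can be sketched in Python as below. This is a minimal sketch under stated assumptions, not AKMA's actual code: it assumes the waveform is the per-frame peak amplitude of the audio, that window length and polyorder are passed to a Savitzky-Golay filter (`scipy.signal.savgol_filter`, whose `window_length` and `polyorder` parameters match these names), and that compression is applied as the power law `x ** (1 / compression)` on the normalized curve. The helper names `validate_factors` and `audio_guidance` are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def validate_factors(window_length, polyorder, compression,
                     seeds_top_num, seeds_bottom_num):
    """Hypothetical helper: enforce the constraints listed above."""
    if window_length % 2 == 0:
        raise ValueError("window length must be an odd number")
    if polyorder >= window_length:
        raise ValueError("polyorder must be smaller than window length")
    if compression <= 0:
        raise ValueError("compression must be larger than 0")
    if seeds_top_num <= 4 or seeds_bottom_num <= 4:
        raise ValueError("seed counts must be larger than 4")


def audio_guidance(samples, sample_rate, fps, window_length, polyorder,
                   compression=1.0):
    """Hypothetical helper: one guidance value in [0, 1] per video frame."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 2:                     # (n, channels) -> mono
        samples = samples.mean(axis=1)
    hop = sample_rate / fps                   # audio samples per video frame
    n_frames = int(len(samples) // hop)
    # Assumption: the waveform is the peak absolute amplitude of each frame's audio
    envelope = np.array([
        np.abs(samples[int(i * hop):int((i + 1) * hop)]).max()
        for i in range(n_frames)
    ])
    # Assumption: Savitzky-Golay smoothing driven by window length and polyorder
    smooth = savgol_filter(envelope, window_length, polyorder)
    smooth = np.clip(smooth, 0.0, None)       # the polynomial fit can dip below zero
    peak = smooth.max()
    if peak > 0:
        smooth = smooth / peak                # normalize to [0, 1]
    # Assumption: power-law compression; larger values pull the curve toward 1
    # and shrink its variance, so the video reacts less to volume changes
    return smooth ** (1.0 / compression)
```

In this sketch a larger window length averages over more frames, so short volume spikes are flattened, which is the sensitivity trade-off the window length factor describes.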

AKMA-Walkthrough

Audio Kinetic Media Art

1. Input an audio file. It must be a WAV file; 24-bit WAV files are not supported.


2. Adjust fps, window length, polyorder, and compression. You can check the waveform image, which serves as the guide for the audio-reactive function. Adjust these factors until you get a waveform that suits your media art.


3. Input a StyleGAN3 network pkl file.


4. Adjust the number of seeds for the top and bottom of the waveform. The larger the number, the more images the model produces in the same duration, making the video faster. I usually recommend giving a larger number to the top, so the video becomes more active as the volume increases.


5. Start generation and wait until the process finishes.
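The 24-bit restriction in step 1 can be checked before generation starts. Below is a minimal sketch using Python's standard `wave` module, which reports the sample width in bytes (3 bytes means 24-bit). The helper name `check_input_wav` is hypothetical, and this is not necessarily how AKMA itself validates its input.

```python
import wave


def check_input_wav(path: str) -> tuple[int, int]:
    """Hypothetical helper: reject 24-bit WAV files before generation.

    Returns (sample_rate, n_channels) for an accepted file.
    """
    try:
        with wave.open(path, "rb") as wav:
            sample_width = wav.getsampwidth()          # bytes per sample
            rate, channels = wav.getframerate(), wav.getnchannels()
    except wave.Error as exc:
        # The standard wave module only understands PCM WAV data
        raise ValueError(f"{path} is not a readable PCM WAV file: {exc}") from exc
    if sample_width == 3:                              # 3 bytes = 24-bit
        raise ValueError(f"{path} is a 24-bit WAV file; convert it to 16-bit first")
    return rate, channels
```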

Changes-from-the-reference

Changes from StyleGAN2 audio reactive

StyleGAN3 is applied.

More adjustment options are provided for the waveform, which serves as the guide for the audio-reactive function.

Interpolates between two videos according to the waveform, rather than only between two images.

Video speed can be adjusted by adding more seeds.
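Interpolating two videos by the waveform, and why more seeds make a faster video, can be sketched with NumPy. This is a minimal sketch, not AKMA's implementation: it assumes each "video" is a latent path that passes through one random z vector per seed (drawn with `np.random.RandomState(seed)`, z size 512 as in StyleGAN3's default), that the two paths are blended frame by frame with the guidance value as the weight, and that each resulting latent would then be passed to the StyleGAN3 generator (not shown). `keyframe_path` and `audio_reactive_latents` are hypothetical names.

```python
import numpy as np


def keyframe_path(seeds, n_frames, z_dim=512):
    """Hypothetical helper: a latent 'video' that passes through one z per seed.

    More seeds in the same n_frames means the path changes faster.
    """
    seeds = list(seeds)
    keys = np.stack([np.random.RandomState(s).randn(z_dim) for s in seeds])
    key_t = np.linspace(0, n_frames - 1, len(seeds))   # frame index of each keyframe
    frame_t = np.arange(n_frames)
    # Linear interpolation between neighbouring keyframes, one dimension at a time
    return np.stack([np.interp(frame_t, key_t, keys[:, d]) for d in range(z_dim)],
                    axis=1)


def audio_reactive_latents(guidance, seeds_top, seeds_bottom, z_dim=512):
    """Hypothetical helper: blend the two latent videos frame by frame.

    guidance holds one value in [0, 1] per frame (0 = silence, 1 = loudest).
    """
    guidance = np.asarray(guidance, dtype=np.float64)
    n_frames = len(guidance)
    top = keyframe_path(seeds_top, n_frames, z_dim)        # used at high volume
    bottom = keyframe_path(seeds_bottom, n_frames, z_dim)  # used at zero volume
    weight = guidance[:, None]                              # shape (n_frames, 1)
    return (1.0 - weight) * bottom + weight * top           # shape (n_frames, z_dim)
```

Linear interpolation keeps the sketch short; spherical interpolation (slerp) is a common alternative for Gaussian latents because it keeps their norm roughly constant along the path.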

Why StyleGAN3?

By achieving alias-free generation based on signal processing theory, StyleGAN3 solves the texture-sticking problem of StyleGAN2 and produces higher-quality images.

This makes it significantly better suited to video generation.

Changes from Real-ESRGAN-Video-Batch-Process

The reference does not keep the original audio. This version keeps the original audio and merges it into the enhanced video.
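Keeping the original audio can be done with ffmpeg by taking the video stream from the enhanced file and the audio stream from the original. Below is a minimal sketch, assuming ffmpeg is installed and on PATH; the helper names are hypothetical, and the exact options AKMA uses may differ. The video stream is copied unchanged, while the audio is re-encoded to AAC because an MP4 container does not accept every source audio codec.

```python
import subprocess


def build_merge_command(enhanced_video, original_media, output):
    """Hypothetical helper: ffmpeg arguments that pair new video with old audio."""
    return [
        "ffmpeg", "-y",          # overwrite output without asking
        "-i", enhanced_video,    # input 0: upscaled video (audio lost)
        "-i", original_media,    # input 1: original file with the audio track
        "-map", "0:v:0",         # first video stream of input 0
        "-map", "1:a:0",         # first audio stream of input 1
        "-c:v", "copy",          # keep the enhanced frames as encoded
        "-c:a", "aac",           # re-encode audio so it fits an MP4 container
        "-shortest",             # end at the shorter of the two streams
        output,
    ]


def merge_original_audio(enhanced_video, original_media, output):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_merge_command(enhanced_video, original_media, output),
                   check=True)
```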

Why Real-ESRGAN?

By using sinc filters to simulate ringing and overshoot artifacts during training, Real-ESRGAN learns to reduce them, improving image quality.

If you are curious about the project timeline, please click the link above.

Custom-Model

East-Sea Data

  • A model trained on image data from the East Sea.

Tech-Stack

Sample-Result

Reference