AKMA : Audio Kinetic Media Art

AIFFEL X SeSAC Hackathon Project

Team-Introduction

Audio Kinetic Media Art, or AKMA for short (pronounced like the Korean word for "devil"), is a media art generation solution based on StyleGAN3. It produces media art that reacts to the volume of the input audio.

Team-Member

| Name | Position | Email | Role |
| --- | --- | --- | --- |
| 최동현 | Project Manager | david0302@naver.com | Project Management, Model Framework Development, ML Research, Model Optimization, Data Acquisition |
| 이상현 | Technology Manager | roughideal@gmail.com | Model Management, StyleGAN3 Training, Server Engineering, ML Research, Model Automation |
| 김영현 | Lead Programmer | overevo489@gmail.com | Version Control, StyleGAN3 Encoder Training, Model Serving, ML Research, Model Automation |
| 이호진 | Team Member | hojinlee93@gmail.com | Model QA, Model Experiment, Model Optimization, ML Research, Documentation |
| 윤세영 | Team Member | yoonsy1023@gmail.com | Data Acquisition |

AKMA-Introduction

Audio Kinetic Media Art, or AKMA for short (pronounced like the Korean word for "devil"), is a media art generation solution based on StyleGAN3. It produces media art that reacts to the volume of the input audio.

Factors-in-AKMA

Users can adjust the following factors to produce the desired media art.

- Input Audio : Must be a WAV file. 24-bit WAV files are not supported.
- Input Network_pkl : A StyleGAN3 network pkl file trained on the content the user wants in the media art.
- fps : Frames per second of the output video.
- window length : Controls how strongly the waveform of the input audio is flattened, and therefore how sensitively the media art reacts to the volume of the input audio. Must be an odd number.
- polyorder : The order of the polynomial used to fit the samples. Must be smaller than window length.
- compression : The larger the value, the more the waveform of the input audio is compressed exponentially. This reduces the variance of the waveform and makes the result less reactive to the input audio. Must be larger than 0. The default value is 1; values between 0.5 and 2 are recommended.
- seeds_top_num : Number of seeds that correspond to the largest volume of the waveform. With more seeds, the model passes through more images in the same duration, so the video moves faster. Must be larger than 4.
- seeds_bottom_num : Number of seeds that correspond to zero volume of the waveform. With more seeds, the model passes through more images in the same duration, so the video moves faster. Must be larger than 4.
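
The odd window length and the polyorder constraint match those of a Savitzky-Golay filter, so the waveform factors above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not AKMA's actual code: the function name `volume_envelope`, the defaults for `fps`, `window_length`, and `polyorder`, the use of `scipy.signal.savgol_filter`, and especially the `1 / compression` exponent are all assumptions, chosen so that a larger compression value flattens the waveform as described above.

```python
import numpy as np
from scipy.signal import savgol_filter


def volume_envelope(signal, rate, fps=24, window_length=33, polyorder=3,
                    compression=1.0):
    """Return one smoothed volume value in [0, 1] per video frame.

    Hypothetical helper: the names and the exact order of the processing
    steps are assumptions made for illustration.
    """
    # Constraints taken from the factor list above.
    if window_length % 2 == 0:
        raise ValueError("window length must be an odd number")
    if polyorder >= window_length:
        raise ValueError("polyorder must be smaller than window length")
    if compression <= 0:
        raise ValueError("compression must be larger than 0")

    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim == 2:  # stereo -> mono
        signal = signal.mean(axis=1)
    signal = np.abs(signal)

    # One chunk of samples per video frame, averaged to a single volume.
    frames = int(np.ceil(len(signal) / rate * fps))
    edges = np.linspace(0, len(signal), frames + 1).round().astype(int)
    envelope = np.array(
        [signal[a:b].mean() for a, b in zip(edges[:-1], edges[1:])]
    )

    peak = envelope.max()
    if peak > 0:
        envelope = envelope / peak  # normalize to [0, 1]

    # ASSUMPTION: on [0, 1], the exponent 1 / compression pushes values
    # toward 1 when compression > 1, which lowers the variance.
    envelope = envelope ** (1.0 / compression)

    # Savitzky-Golay smoothing; needs window_length <= number of frames.
    envelope = savgol_filter(envelope, window_length, polyorder)
    return np.clip(envelope, 0.0, 1.0)
```

In this sketch, raising `window_length` smooths the curve more, while raising `compression` narrows the gap between quiet and loud frames.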

AKMA-Walkthrough

1. Input the audio file. It must be a WAV file; 24-bit WAV files are not supported.

2. Adjust fps, window length, polyorder, and compression. You can check the waveform image, which guides the audio-reactive function. Keep adjusting these factors until the waveform looks right for your media art.

3. Input the StyleGAN3 network pkl file.

4. Adjust the number of seeds for the top and bottom of the waveform. With more seeds, the model passes through more images in the same duration, so the video moves faster. We usually recommend giving more seeds to the top, so that the video becomes more active as the volume rises.

5. Start generation and wait until the process finishes.
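
The WAV requirement in step 1 can be checked before starting a run. The sketch below uses only Python's standard `wave` module; the function name `check_wav` and the error message are hypothetical and not part of AKMA.

```python
import wave


def check_wav(path):
    """Return (sample_rate, bit_depth, channels), rejecting 24-bit files.

    Hypothetical pre-flight check; `wave.open` itself raises `wave.Error`
    when the file is not a WAV file it can parse.
    """
    with wave.open(path, "rb") as wav:
        bit_depth = wav.getsampwidth() * 8  # bytes per sample -> bits
        if bit_depth == 24:
            raise ValueError("24-bit WAV files are not supported")
        return wav.getframerate(), bit_depth, wav.getnchannels()
```

A 16-bit mono file sampled at 44.1 kHz would return `(44100, 16, 1)`, while a 24-bit file raises `ValueError` before any generation work begins.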

Changes-from-the-reference

Changes from StyleGAN2 audio reactive

StyleGAN3 is used as the generator.  

More adjustment options are provided for the waveform, which guides the audio-reactive function.

Two videos are interpolated according to the waveform, rather than only two images.  

Video speed can be adjusted by adding more seeds.
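
One way to picture interpolating two videos by the waveform is shown below. It is only a sketch under assumptions: random 512-dimensional vectors stand in for StyleGAN3 seed latents, and the helper names `seed_track` and `blend_tracks` are hypothetical. Each track walks through its own set of seeds over the full duration, and the per-frame volume decides how much of the top track is mixed into the bottom track.

```python
import numpy as np


def seed_track(num_seeds, frames, dim=512, rng_seed=0):
    """Linearly interpolate through `num_seeds` random latents.

    More seeds over the same number of frames means larger steps per
    frame, i.e. a faster-moving video.
    """
    rng = np.random.default_rng(rng_seed)
    keys = rng.standard_normal((num_seeds, dim))  # stand-in for seed latents
    pos = np.linspace(0, num_seeds - 1, frames)   # fractional key index
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, num_seeds - 1)
    frac = (pos - lo)[:, None]
    return (1 - frac) * keys[lo] + frac * keys[hi]


def blend_tracks(envelope, bottom, top):
    """Mix two latent tracks frame by frame using volume values in [0, 1]."""
    a = np.asarray(envelope, dtype=np.float64)[:, None]
    return (1 - a) * bottom + a * top


frames = 48
bottom = seed_track(5, frames, rng_seed=1)   # like seeds_bottom_num = 5
top = seed_track(9, frames, rng_seed=2)      # like seeds_top_num = 9
envelope = np.linspace(0.0, 1.0, frames)     # placeholder volume curve
latents = blend_tracks(envelope, bottom, top)
```

In a real pipeline each row of `latents` would be passed to the generator to render one frame; that call is omitted here because it depends on the loaded network.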

Why StyleGAN3?

StyleGAN3 makes the generator alias-free by applying signal processing theory. This solves the texture-sticking problem of StyleGAN2, in which fine details stay fixed to pixel coordinates instead of moving with the object.  

As a result, generated videos look significantly more natural.

Changes from Real-ESRGAN-Video-Batch-Process

The reference does not keep the original audio. AKMA keeps the original audio and merges it into the enhanced video.
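
A merge of this kind is commonly done with ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH; whether AKMA uses ffmpeg, and the helper names `build_mux_command` and `mux`, are assumptions for illustration.

```python
import subprocess


def build_mux_command(enhanced_video, original_audio, output):
    """Build an ffmpeg command that pairs enhanced video with original audio."""
    return [
        "ffmpeg", "-y",
        "-i", enhanced_video,   # input 0: upscaled video
        "-i", original_audio,   # input 1: original audio
        "-map", "0:v:0",        # take the video stream from input 0
        "-map", "1:a:0",        # take the audio stream from input 1
        "-c:v", "copy",         # do not re-encode the video
        "-c:a", "aac",          # encode audio to AAC for the MP4 container
        "-shortest",            # stop at the end of the shorter stream
        output,
    ]


def mux(enhanced_video, original_audio, output):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(
        build_mux_command(enhanced_video, original_audio, output),
        check=True,
    )
```

Copying the video stream avoids a second lossy encode after upscaling; only the audio is re-encoded.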

Why Real-ESRGAN?

Real-ESRGAN uses sinc filters during training to simulate ringing and overshoot artifacts, so the trained model learns to suppress them, improving image quality.

Project-Timeline

If you are curious about the project timeline, please click the link above.

Custom-Model

East-Sea

A model trained on image data from the East Sea.

Tech-Stack

Model
Custom Data
- YOUTUBE-WAVE
- EAST-SEA
Serving
- BentoML
- Google Cloud Platform

Sample-Result

AKMA+awesome_landscape+Real-ESRGAN+PARADOX
AKMA+awesome_wikiart+PARADOX
AKMA+EAST-SEA

Reference

StyleGAN3
StyleGAN2 Reactive Audio
Real-ESRGAN
Real-ESRGAN-Video-Batch-Process
