[Research] Audio-video offset synchronization and audio+input delay calibration #87

Closed
dtinth opened this issue Feb 4, 2015 · 7 comments

@dtinth
Member

dtinth commented Feb 4, 2015

Assumptions

  • Response time is zero. When the player hears the sound or sees that the note hits the judgement area, he/she pushes the button immediately.
  • When the game emits the sound, it takes time _S_ until the player hears it. (Audio latency)
  • When the game renders the display, it takes time _D_ until the player sees it. (Video latency)
  • When the player hits the button, it takes time _I_ until the computer recognizes it. (Input latency)

Findings

  • It is impossible to measure/calibrate the values of _S_, _D_, and _I_ separately.
  • But it is possible to find these values:
    • _S+I_ (Audio + input latency)
    • _D+I_ (Video + input latency)
    • _S-D_ (Audio-video offset)

Methods

Calibration

  1. Measure _A_ = _S+I_

    This is the time it takes from when the sound is emitted to when the computer recognizes the button press. This means that when the computer emits a sound at time _t_, it will receive the button press at time _t' = t+A_. To compensate, when a button press is received at time _t'_, judgement must be performed for time _t'-A = t_.

  2. Measure _B_ = _S-D_

    This is the audio/video offset. It can be measured by letting the user adjust the value until audio and video appear in sync. With this value, we adjust the display to show notes at time _t+B_.

    • When the computer emits a sound at time _t_, the player will hear it at time _t+S_.
    • When the computer draws graphics at time _t+B_, the player will see them at time _t+B+D = t+(S-D)+D = t+S_, so the player will perceive audio and video as being in sync.

Gameplay

  • sound(t): At time _t_, emit sound as usual.
  • display(t+B): At time _t_, emit graphic in sync with the sound.
  • judgment(t-A): Judge notes _A_ units of time behind the sound emission.
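
The scheme above can be sketched as follows. This is a hypothetical illustration, not code from the actual game; the offset values and function names are made up.

```python
# Hypothetical sketch of applying the calibrated offsets
# A = S+I (audio + input latency) and B = S-D (audio-video offset).
# The concrete values below are made-up examples, in seconds.

A = 0.030  # calibrated audio + input latency
B = 0.010  # calibrated audio-video offset

def sound_time(t):
    """Emit the sound at the note's nominal time, unadjusted."""
    return t

def display_time(t):
    """Draw the note B later, so the player perceives audio and video in sync."""
    return t + B

def judgment_time(t_press):
    """Map the wall-clock press time back onto the note timeline."""
    return t_press - A

# A press received at wall-clock time t' = t + A judges as exactly t:
t = 1.000
assert abs(judgment_time(t + A) - t) < 1e-9
```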

Auto-Keysounds

This is a problem if the value of _A_ is large: if the player hits the note at the correct time, they will hear delayed keysounds.

That's why in most keysounded music games, the user has to press the button significantly before the sound is emitted. Examples include DJMAX Technika and Tone Sphere.

Therefore, we have to play the keysound for the player, so that the player hears it at the correct time. This is called "Auto-sound" in Open2Jam.

Enabling AutoSound

AutoSound will only be enabled when the value of _A_ is significantly large; I'd say 16 milliseconds. A warning message should appear in the synchronization dialog, saying that AutoSound has been enabled to compensate for the audio delay.

AutoSound Mechanics

  • Each player has a playing state _p_, which defaults to true.
    • This state is true when the player is actively playing and false when the player isn't playing any notes.
    • When the user doesn't play any notes, we don't want keysounds to be emitted automatically, so we suspend the AutoSound mechanism until the user hits a note again, setting _p_ back to true.
  • Each note _n_ has a keysound state k(n), which defaults to "NONE".
  • If the player hits the note _n_ before it is sounded (k(n) is "NONE"):
    • Emit the keysound.
    • Set k(n) to "EMITTED".
    • Set _p_ to true.
  • If it's time for the note (_t_ = t(n)), k(n) is "NONE", and the player is playing (_p_ is true):
    • Emit the keysound.
    • Set k(n) to "EMITTED".
  • If the note is missed and k(n) is "EMITTED":
    • Stop the keysound.
    • Set _p_ to false.
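
The state machine above could look something like this. It is a minimal sketch; the `Note` and `AutoSound` names and the callback interface are illustrative, not from the actual codebase.

```python
# Hypothetical sketch of the AutoSound state machine described above.

class Note:
    def __init__(self, time, keysound):
        self.time = time
        self.keysound = keysound
        self.state = "NONE"  # the keysound state k(n)

class AutoSound:
    def __init__(self, play, stop):
        self.play = play     # callback: emit a keysound
        self.stop = stop     # callback: silence a keysound
        self.playing = True  # the playing state p, defaults to true

    def on_hit(self, note):
        """Player hits note n before it is sounded."""
        if note.state == "NONE":
            self.play(note.keysound)
            note.state = "EMITTED"
            self.playing = True

    def on_time(self, note):
        """It's time for note n (t = t(n))."""
        if note.state == "NONE" and self.playing:
            self.play(note.keysound)
            note.state = "EMITTED"

    def on_miss(self, note):
        """Note n is judged as missed."""
        if note.state == "EMITTED":
            self.stop(note.keysound)
            self.playing = False
```

Once a note is missed, `playing` becomes false and later notes stay silent until the player hits one, which flips `playing` back to true.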
@dtinth
Member Author

dtinth commented Feb 4, 2015

Calibrating the value of _A_

Methods

  • Play a song and instruct the user to press the button on every beat.
  • Record at least 56 samples of _A_ (the time the keypress is registered minus the time the sound is emitted).
  • Analyze the recorded samples.
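
The measurement itself is straightforward. A sketch, using made-up example timestamps:

```python
# Hypothetical sketch of collecting samples of A: each sample is the time
# a keypress is registered minus the time the corresponding beat's sound
# was emitted. The timestamps below are made-up example data, in seconds.
from statistics import mean, stdev

sound_times = [0.0, 0.5, 1.0, 1.5]          # when each beat's sound was emitted
press_times = [0.031, 0.528, 1.035, 1.526]  # when each keypress was registered

samples = [p - s for p, s in zip(press_times, sound_times)]
a_estimate = mean(samples)  # estimated audio + input latency A
spread = stdev(samples)     # sample standard deviation
```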

Status

Data have been collected and are being analyzed.

@dtinth
Member Author

dtinth commented Feb 25, 2015

From the data analysis, we have found that the delay is 13.96 ms on average. This is very small, so we are pretty safe to use a 0 ms delay as the default value.

The average standard deviation of the delay is 21.57 milliseconds. This means that, on average, we are 99% confident that the actual delay is within 6.59 ms of the obtained mean.

This shows that our method is quite effective.

Our advisor, @jittat, suggested that we can try to reduce the number of required samples, so that the calibration process would become shorter.


@Nachanok

Try simulating the same experiment, but using only the first _n_ samples. Find the smallest value of _n_ such that the average resulting 99% confidence interval half-width is less than 10 milliseconds.
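
The suggested simulation could be sketched like this. It is a hypothetical outline: `t_crit_for` stands in for a lookup of the two-tailed 99% t critical value for a given number of degrees of freedom (e.g. from a t-table or `scipy.stats.t.ppf(0.995, df)`), and the data shapes are assumptions.

```python
# Hypothetical sketch: truncate each person's taps to the first n samples,
# compute each person's 99% confidence-interval half-width, average them,
# and find the smallest n where that average drops below 10 ms.
from math import sqrt
from statistics import mean, stdev

def ci_halfwidth(samples, t_crit):
    """Half-width of the confidence interval for the mean of `samples`."""
    return t_crit * stdev(samples) / sqrt(len(samples))

def average_halfwidth(all_samples, n, t_crit):
    """Average half-width across people, each truncated to their first n taps."""
    return mean(ci_halfwidth(person[:n], t_crit) for person in all_samples)

def smallest_sufficient_n(all_samples, t_crit_for, limit_ms=10.0):
    """Smallest n whose average half-width is below `limit_ms`, or None."""
    max_n = min(len(person) for person in all_samples)
    for n in range(2, max_n + 1):
        if average_halfwidth(all_samples, n, t_crit_for(n - 1)) < limit_ms:
            return n
    return None
```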

@dtinth dtinth mentioned this issue Mar 4, 2015
@Nachanok
Contributor

Nachanok commented Mar 4, 2015

@dtinth
Member Author

dtinth commented Mar 5, 2015

@Nachanok Thank you! I think it's better for you to focus on the skin's code. I'll continue from here. 😄

@dtinth dtinth assigned dtinth and unassigned Nachanok Mar 5, 2015
@dtinth
Member Author

dtinth commented Mar 5, 2015

I've done some changes to our calculation:

Since our sample is going to be small, and we only have the sample standard deviation (we don't know the population standard deviation), I used the t-score instead of the z-score.
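
To illustrate the difference, here is a made-up numerical example (the 21.57 ms figure is the average sample standard deviation reported earlier; the t critical value is approximated from a standard t-table):

```python
# Hypothetical illustration: for small n, the 99% t critical value is
# larger than the normal (z) one, so the confidence interval is wider.
from math import sqrt

n = 35
sample_sd = 21.57  # average sample standard deviation from the data, in ms
z_crit = 2.576     # two-tailed 99% normal (z) critical value
t_crit = 2.73      # approx. two-tailed 99% t critical value for df = 34

z_halfwidth = z_crit * sample_sd / sqrt(n)
t_halfwidth = t_crit * sample_sd / sqrt(n)
# t_halfwidth > z_halfwidth: the t-based interval is wider, as expected
```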

Some communication problems also led to the sample data being incorrectly trimmed.

@dtinth
Member Author

dtinth commented Mar 5, 2015

Here are the confidence interval results (half-widths, in milliseconds) for various values of _n_. For each value of _n_, each person's tapping pattern is truncated to the first _n_ taps.

| n | 90% | 95% | 99% |
| --- | --- | --- | --- |
| 9999 | 3.940332 | 4.723988 | 5.654889 |
| 50 | 4.623089 | 5.554879 | 6.670214 |
| 49 | 4.613531 | 5.544791 | 6.660451 |
| 48 | 4.884783 | 5.869317 | 7.047786 |
| 47 | 4.927309 | 5.921907 | 7.113446 |
| 46 | 4.976760 | 5.982942 | 7.189458 |
| 45 | 5.037827 | 6.058083 | 7.282660 |
| 44 | 5.107284 | 6.143473 | 7.388458 |
| 43 | 5.138891 | 6.183502 | 7.439988 |
| 42 | 5.137204 | 6.183623 | 7.443767 |
| 41 | 5.494494 | 6.611390 | 7.954823 |
| 40 | 5.514859 | 6.638205 | 7.990986 |
| 39 | 5.581117 | 6.720470 | 8.094260 |
| 38 | 5.661806 | 6.820375 | 8.219230 |
| 37 | 5.737184 | 6.914181 | 8.337363 |
| 36 | 5.784753 | 6.974789 | 8.416010 |
| 35 | 5.761313 | 6.950079 | 8.392228 |
| 34 | 6.284942 | 7.577877 | 9.143717 |
| 33 | 6.369274 | 7.683485 | 9.277815 |
| 32 | 6.447250 | 7.781886 | 9.404011 |
| 31 | 6.690453 | 8.080363 | 9.773102 |
| 30 | 6.598211 | 7.974306 | 9.653954 |
| 29 | 6.629881 | 8.018521 | 9.717638 |
| 28 | 6.577397 | 7.961592 | 9.659865 |
| 27 | 7.366215 | 8.909082 | 10.796907 |
| 26 | 7.480202 | 9.054390 | 10.985765 |
| 25 | 7.561358 | 9.161038 | 11.129612 |
| 24 | 7.778887 | 9.434326 | 11.478393 |
| 23 | 7.907150 | 9.601103 | 11.700677 |
| 22 | 8.114986 | 9.866621 | 12.047054 |
| 21 | 8.375315 | 10.198808 | 12.479895 |
| 20 | 9.080049 | 11.039995 | 13.479732 |
| 19 | 9.291662 | 11.314664 | 13.845325 |
| 18 | 9.602974 | 11.714706 | 14.371431 |
| 17 | 10.136242 | 12.391364 | 15.247396 |
| 16 | 10.484268 | 12.849220 | 15.867984 |
| 15 | 10.148322 | 12.475783 | 15.474883 |
| 14 | 9.879257 | 12.191519 | 15.205670 |

Each song section is 28 hits. Therefore, with only one section, we are 98% sure that our observed mean will be within 10 ms of the actual audio+input delay.

To be more sure (99%), I think obtaining just 35 samples is good enough.

@dtinth dtinth changed the title Audio-video offset synchronization and audio+input delay calibration [Research] Audio-video offset synchronization and audio+input delay calibration Mar 7, 2015
@dtinth
Member Author

dtinth commented Mar 7, 2015

I think the preliminary research has obtained satisfactory results.

The task of implementing them in the game will be another issue.

@dtinth dtinth closed this as completed Mar 7, 2015
@dtinth dtinth removed the c:ready label Mar 7, 2015