syndata generation

Ben Cowley edited this page Jun 8, 2016 · 5 revisions

Synthetic data generation

Figure 1. Flowchart of the synthetic data generation.

Seed data

BCI competition IV dataset 1

We use the calibration data from dataset 1 as example seed data for synthetic data generation. The dataset is described in full detail on the competition web page (http://bbci.de/competition/iv/desc_1.html).

In short, the dataset was recorded from healthy subjects during motor imagery of limb movement (hands and feet). Visual cues were presented on a computer screen, and the subject was instructed to imagine moving the corresponding limb.

Technical specifications

  • Device: BrainAmp MR plus
  • Sampling rate: 100 Hz (downsampled)
  • Number of channels: 59

Data generation process

Synthetic data can be generated from seed data with generate_synthetic_data(), which first converts the example dataset to EEGLAB format and then passes it to the ctaptest functions.

The new base data is generated with an autoregressive (AR) model, using the original data and a target channel location file. The user can also specify the order of the AR model. The process for generating synthetic data is as follows:

  1. Pick one channel from the provided channel location file
  2. Find the closest matching channel in the input data set
  3. Generate synthetic data with the AR model
  4. Repeat for the next channel
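
The loop above can be sketched in a few lines of Python. This is an illustration only, not the ctaptest implementation: the function names (nearest_channel, fit_ar, simulate_ar, generate_synthetic), the dictionary-based data layout, the Euclidean distance used for channel matching, and the Yule-Walker estimator solved with the Levinson-Durbin recursion are all assumptions made for the example. The toolbox may fit and drive its AR model differently.

```python
import math
import random


def nearest_channel(target_xyz, seed_locs):
    """Return the name of the seed channel closest to target_xyz (Euclidean distance)."""
    return min(seed_locs, key=lambda name: math.dist(target_xyz, seed_locs[name]))


def fit_ar(x, order):
    """Estimate AR coefficients, noise variance and mean of a non-constant signal.

    Uses biased autocovariance estimates and the Levinson-Durbin recursion
    (an assumed estimator, chosen because it needs no external libraries).
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    r = [sum(xc[t] * xc[t - k] for t in range(k, n)) / n for k in range(order + 1)]
    a = []      # a[j] multiplies the sample at lag j + 1
    err = r[0]  # prediction error variance, updated at every model order
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err, mean


def simulate_ar(a, noise_var, mean, n_samples, rng, burn_in=200):
    """Drive the fitted AR model with Gaussian white noise; discard the burn-in."""
    sd = math.sqrt(noise_var)
    x = [0.0] * len(a)
    for _ in range(burn_in + n_samples):
        x.append(sum(a[j] * x[-1 - j] for j in range(len(a))) + rng.gauss(0.0, sd))
    return [v + mean for v in x[-n_samples:]]


def generate_synthetic(seed_data, seed_locs, target_locs, order, n_samples, seed=0):
    """Steps 1-4: one synthetic signal per channel in the target location file."""
    rng = random.Random(seed)
    out = {}
    for name, xyz in target_locs.items():          # 1. pick a target channel
        src = nearest_channel(xyz, seed_locs)      # 2. closest seed channel
        a, var, mean = fit_ar(seed_data[src], order)
        out[name] = simulate_ar(a, var, mean, n_samples, rng)  # 3. generate
    return out                                     # 4. loop covers the rest
```

Here seed_data maps channel names to lists of samples, and seed_locs and target_locs map channel names to (x, y, z) coordinates. Each synthetic channel is generated independently in this sketch, so it reproduces the spectral shape of the matched seed channel but not the correlations between channels.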

Artifact injection

Once a new synthetic dataset has been generated, artifacts can be added to it. The three artifact types currently included in the ctaptest module are blinks, EMG, and bad channels.

Blinks

Blink artifacts represent ocular artifacts caused by eye blinks. They have an impulse-like waveform, are most prominent at the frontal electrodes, and gradually attenuate towards the central and parietal regions. The user can specify the start time, duration, and amplitude of each blink.
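
As a rough illustration of the idea, a blink can be modelled as a smooth pulse added to every channel with a gain that falls off from front to back. The sketch below is not taken from ctaptest: the Hann-shaped pulse, the quadratic frontal_weight() gain, and the convention that each channel has a normalised anterior-posterior coordinate y (+1 at the front of the head, -1 at the back) are all assumptions for the example.

```python
import math


def blink_waveform(duration_s, amplitude, fs):
    """Hann-shaped pulse: zero at both ends, close to `amplitude` at the centre."""
    n = max(int(round(duration_s * fs)), 2)
    return [amplitude * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
            for i in range(n)]


def frontal_weight(y):
    """Hypothetical spatial gain: 1 at the front (y=+1), 0.25 at y=0, 0 at the back."""
    return ((y + 1.0) / 2.0) ** 2


def add_blink(data, chan_y, start_s, duration_s, amplitude, fs):
    """Return a copy of `data` with one blink added; the input is left unchanged."""
    pulse = blink_waveform(duration_s, amplitude, fs)
    start = int(round(start_s * fs))
    out = {}
    for name, signal in data.items():
        gain = frontal_weight(chan_y[name])
        noisy = list(signal)
        for i, p in enumerate(pulse):
            if 0 <= start + i < len(noisy):  # clip blinks that run past the data
                noisy[start + i] += gain * p
        out[name] = noisy
    return out
```

Calling add_blink() once per blink, with its own start time, duration, and amplitude, mirrors the three user-specified parameters described above.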

EMG

EMG artifacts represent myogenic contamination caused by muscle activation. In EEG this typically appears as burst-like, high-frequency activity. The user can specify the start time, amplitude, duration, location, radius, and frequency range of the EMG burst. Location and radius define the origin and spatial extent of the EMG contamination.
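
A minimal sketch of how these parameters could fit together is given below. It is an assumption-laden example rather than the ctaptest code: the burst is approximated as a sum of sinusoids with random frequencies and phases inside the requested band, `amplitude` is interpreted as the approximate RMS of the burst, and the gain is assumed to fall off linearly from 1 at `location` to 0 at distance `radius`. The names emg_burst and add_emg are hypothetical.

```python
import math
import random


def emg_burst(duration_s, amplitude, f_lo, f_hi, fs, rng, n_components=40):
    """Band-limited noise burst built from random sinusoids in [f_lo, f_hi] Hz."""
    n = int(round(duration_s * fs))
    comps = [(rng.uniform(f_lo, f_hi), rng.uniform(0.0, 2.0 * math.pi))
             for _ in range(n_components)]
    # Each sinusoid has RMS 1/sqrt(2), so this scale gives an RMS near `amplitude`.
    scale = amplitude * math.sqrt(2.0 / n_components)
    return [scale * sum(math.sin(2.0 * math.pi * f * t / fs + ph) for f, ph in comps)
            for t in range(n)]


def add_emg(data, chan_xyz, start_s, duration_s, amplitude,
            location, radius, f_lo, f_hi, fs, seed=0):
    """Return a copy of `data` with an EMG burst on channels within `radius`."""
    if f_hi > fs / 2.0:
        raise ValueError("f_hi must not exceed the Nyquist frequency fs / 2")
    rng = random.Random(seed)
    burst = emg_burst(duration_s, amplitude, f_lo, f_hi, fs, rng)
    start = int(round(start_s * fs))
    out = {}
    for name, signal in data.items():
        dist = math.dist(chan_xyz[name], location)
        gain = max(0.0, 1.0 - dist / radius)  # assumed linear spatial falloff
        noisy = list(signal)
        if gain > 0.0:
            for i, b in enumerate(burst):
                if 0 <= start + i < len(noisy):
                    noisy[start + i] += gain * b
        out[name] = noisy
    return out
```

With the 100 Hz seed data described earlier, the upper edge of the frequency range cannot exceed 50 Hz, which is why the sketch checks f_hi against the Nyquist frequency.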

Bad channel

Bad channels are EEG channels where the variation has either been reduced to nearly a flatline (representing a broken electrode) or amplified to produce abnormally high amplitudes (representing a loose electrode). The user can specify the channel and the scaling factor for the channel.
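
Both cases reduce to scaling the variation of one channel, which the following sketch illustrates. It assumes that the scaling is applied to deviations from the channel mean, so the offset of the channel is preserved. Whether ctaptest scales around the mean or scales the raw signal is not stated here, and make_bad_channel is a hypothetical name.

```python
def make_bad_channel(data, channel, scale):
    """Return a copy of `data` in which the variation of `channel` is scaled.

    scale << 1 approximates a flatlined (broken) electrode,
    scale >> 1 approximates a loose electrode with abnormally high amplitudes.
    """
    out = {name: list(signal) for name, signal in data.items()}
    signal = out[channel]
    mean = sum(signal) / len(signal)
    out[channel] = [mean + scale * (v - mean) for v in signal]
    return out
```

For example, a scaling factor of 0.01 leaves only 1% of the original variation, while a factor of 10 amplifies it tenfold.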