Submission for Kaggle's American Epilepsy Society Seizure Prediction Challenge

This README and repository modelled on

Hardware / OS platform used

  • Various servers owned by Edinburgh University Informatics Department:
    • 64 AMD Opteron cores, 256GB RAM, 4TB disk
    • Scientific Linux
  • Various mid-high end desktops and laptops:
    • Intel processors (i3 and Xeons), 8-64GB RAM, 0.5-8TB disk
    • Arch Linux



  • MATLAB or Octave
  • Python 3.4.1
    • scikit_learn-0.15.2
    • numpy-1.8.1
    • scipy
    • h5py

Generate features

Place the path to the raw data, organised by subject, under the RAW_DATA_DIRS key of SETTINGS.json, and check the other values set in SETTINGS.json.



Then run ./preprocessing.m with:

matlab -nodisplay -nosplash -r "preprocessing"

or similar.

This calculates features using the feature functions listed in the FEATURES field of SETTINGS.json and writes them to the TRAIN_DATA_PATH directory as HDF5 files.

HDF5 structure:

$feature_name.h5 = {$subject: {$type : {$segment_file_name : $feature_vector } } }

  • $feature_name.h5: the feature name, modification type and version number, e.g. raw_feat_var_v2.h5 or ica_feat_covar_v5.h5
  • $type: data type e.g. 'preictal', 'interictal' or 'test'
  • $segment_file_name: the filename for the segment from which that vector was generated
  • $feature_vector: A 1xNxM feature vector for that segment using the specified feature function
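The nested layout above can be walked generically. What follows is a minimal sketch rather than the repository's own loader: `iter_feature_vectors` is a hypothetical helper, and the segment names and values in the example are made up. It accepts any nested mapping, which should include an open `h5py.File`, since HDF5 groups expose `items()` like a dictionary.

```python
def iter_feature_vectors(feature_file, subjects=None, types=None):
    """Yield (subject, type, segment_file_name, feature_vector) tuples.

    `feature_file` is any nested mapping laid out as
    {subject: {type: {segment_file_name: feature_vector}}}.
    `subjects` and `types` are optional collections used as filters.
    """
    for subject, by_type in feature_file.items():
        if subjects is not None and subject not in subjects:
            continue
        for data_type, by_segment in by_type.items():
            if types is not None and data_type not in types:
                continue
            for segment_name, vector in by_segment.items():
                yield subject, data_type, segment_name, vector


# Stand-in for an HDF5 file; segment names and values are invented.
example = {
    "Dog_1": {
        "preictal": {"segment_a.mat": [[0.1, 0.2]]},
        "interictal": {"segment_b.mat": [[0.3, 0.4]]},
        "test": {"segment_c.mat": [[0.5, 0.6]]},
    }
}

# Keep only the labelled segments, as one would for training.
training_rows = list(
    iter_feature_vectors(example, types={"preictal", "interictal"})
)
```

With h5py, the same function could be called on a file opened with `h5py.File(path, "r")`; each yielded dataset can then be read into memory with `vector[()]`.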

Train classifier

One classifier is trained for each patient and serialised into the directory specified in SETTINGS.json under MODEL_PATH (default is model/).

This is achieved by running:


To run alternative models the options can be accessed through the standard help interface:

./ -h

Cross validation

Cross validation is run as part of the training script. The AUC for each subject and over all subjects is calculated and saved. If the verbose option is set, the calculated values are also printed to the command line.
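To make the two reported quantities concrete, here is a small pure-Python sketch of per-subject and overall AUC. It is an illustration only: the prediction values and the second subject name are invented, and pooling all predictions before scoring is one assumed reading of "over all subjects" (the repository may instead use another definition).

```python
def auc(labels, scores):
    """Area under the ROC curve by pair counting (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented held-out predictions: subject -> (true labels, predicted scores).
predictions = {
    "Dog_1": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "Subject_B": ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
}

per_subject_auc = {s: auc(y, p) for s, (y, p) in predictions.items()}

# Assumed meaning of "over all subjects": pool every prediction, then score.
all_labels = [y for labels, _ in predictions.values() for y in labels]
all_scores = [p for _, scores in predictions.values() for p in scores]
overall_auc = auc(all_labels, all_scores)
```

The pooled score depends on how well calibrated the per-subject classifiers are relative to each other, which is why it can differ from the mean of the per-subject values.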

Important note: cross validation splits the data by the hours into which the recordings are divided, so segments from the same hour are kept together. This is very important, as it respects the split between training and test data used for the leaderboard.
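The idea of splitting by hour can be sketched as a grouped fold assignment. This is a hedged illustration, not the repository's code: `hour_grouped_folds` is a hypothetical helper, the segment names are invented, and how an hour identifier is obtained for each segment is left as an assumption.

```python
from collections import defaultdict


def hour_grouped_folds(segment_hours, n_folds):
    """Split segments into folds so that no hour straddles two folds.

    `segment_hours` maps segment name -> hour identifier.
    Returns a list of (train_segments, validation_segments) pairs.
    """
    by_hour = defaultdict(list)
    for segment, hour in segment_hours.items():
        by_hour[hour].append(segment)

    # Deal whole hours round-robin into folds.
    folds = [[] for _ in range(n_folds)]
    for i, hour in enumerate(sorted(by_hour)):
        folds[i % n_folds].extend(by_hour[hour])

    splits = []
    for k in range(n_folds):
        validation = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        splits.append((train, validation))
    return splits


# Invented example: six segments recorded over three hours.
segment_hours = {
    "seg_1": 0, "seg_2": 0,
    "seg_3": 1, "seg_4": 1,
    "seg_5": 2, "seg_6": 2,
}
splits = hour_grouped_folds(segment_hours, n_folds=3)
```

Because whole hours are assigned to folds, a segment in the validation set never shares an hour with a segment in the corresponding training set.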

Make prediction

After training, the model files will be in the default model directory (model). These are loaded automatically, along with the test data, to classify the test data points. The results are written to an output csv in the default output directory (output):


As above, options can be viewed by:

./ -h


    "TRAIN_DATA_PATH": "train", 
    "MODEL_PATH": "model", 
    "SUBJECTS": ["Dog_1",
    "FEATURES": ["feat_var",
    "TEST_DATA_PATH": "test", 
    "SUBMISSION_PATH": "output",
    "VERSION": "_v1",
    "RAW_DATA_DIRS": ["/disk/data2/neuroglycerin/hail-seizure-data/",
  • SUBJECTS: list of which subjects to use in the current run
  • VERSION: string to indicate version number of this run
  • RAW_DATA_DIRS: directory that contains the raw .mat data organised by subject
  • FEATURES: list of features used in this run
  • TRAIN_DATA_PATH: directory holding the preprocessed extracted features from raw data in per-feature HDF5s
  • MODEL_PATH: directory containing the serialised models
  • TEST_DATA_PATH: directory containing all output related to model testing (CV etc).
  • SUBMISSION_PATH: directory containing the submission csv for the current run
  • THRESHOLD: if present will activate VarianceThreshold
  • PCA: if present will activate a Principal Component Analysis transform, options not implemented
  • SELECTION: if present will activate univariate feature selection. Dictionary inside each of these keys will be used as options, keys are:
  • TREE_EMBEDDING: Random Tree Embedding transformation
  • BAGGING: meta-bagger using selected classifier as base, options are set as a dictionary at this key.
  • RFE: use recursive feature elimination, only works with linear SVC
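As a rough illustration of how such a settings file might be checked before a run, the sketch below parses a JSON string and reports which optional steps are switched on. It is not the repository's loader: `parse_settings` is a hypothetical helper, treating the first eight keys as required is an assumption, and the example values (single-element lists, an empty PCA dictionary) are illustrative only.

```python
import json

# Keys the list above describes for every run; treating them as required
# is an assumption of this sketch.
REQUIRED_KEYS = {
    "TRAIN_DATA_PATH", "MODEL_PATH", "SUBJECTS", "FEATURES",
    "TEST_DATA_PATH", "SUBMISSION_PATH", "VERSION", "RAW_DATA_DIRS",
}
# Keys that switch on an optional transformation or meta-estimator.
OPTIONAL_KEYS = {
    "THRESHOLD", "PCA", "SELECTION", "TREE_EMBEDDING", "BAGGING", "RFE",
}


def parse_settings(text):
    """Parse settings JSON; return (settings, sorted active optional keys)."""
    settings = json.loads(text)
    missing = REQUIRED_KEYS - settings.keys()
    if missing:
        raise KeyError("missing settings: " + ", ".join(sorted(missing)))
    active = sorted(OPTIONAL_KEYS & settings.keys())
    return settings, active


# Illustrative settings; list contents are shortened to one entry each.
example_text = """
{
    "TRAIN_DATA_PATH": "train",
    "MODEL_PATH": "model",
    "SUBJECTS": ["Dog_1"],
    "FEATURES": ["feat_var"],
    "TEST_DATA_PATH": "test",
    "SUBMISSION_PATH": "output",
    "VERSION": "_v1",
    "RAW_DATA_DIRS": ["/disk/data2/neuroglycerin/hail-seizure-data/"],
    "PCA": {}
}
"""

settings, active_steps = parse_settings(example_text)
```

Failing early on a missing key is a design choice of this sketch; it avoids discovering a misconfigured path only after feature extraction has already run.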

Model documentation

Our final model was a combination of four models, all of which used a support vector machine classifier with feature selection. Notes on this, and the code actually used in the competition, can be found in the [Comparing outputs][comparing] IPython notebook. The settings for each of these models can be found in the settings directory of the repository.

The important part of this code, which combines the outputs to produce the final csv, can be found in the script. Calling this with the four csvs listed in merge.json will produce our final output csv:

./ -s merge.json -o merged_many_v1.csv
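The text above does not say how the four outputs are combined, so the following is only one plausible sketch: it assumes each csv has one row per test segment and takes the plain mean of the predicted scores. `merge_predictions` is a hypothetical helper, and the column names `clip` and `preictal` and the example rows are assumptions, not taken from the repository.

```python
import csv
import io
from collections import defaultdict


def merge_predictions(csv_texts, id_column="clip", score_column="preictal"):
    """Average the score for each id across several prediction CSVs.

    Returns a list of (id, mean_score) in first-seen order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    order = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = row[id_column]
            if key not in counts:
                order.append(key)
            totals[key] += float(row[score_column])
            counts[key] += 1
    return [(key, totals[key] / counts[key]) for key in order]


# Two invented model outputs for the same two segments.
model_a = "clip,preictal\nsegment_a.mat,0.2\nsegment_b.mat,0.8\n"
model_b = "clip,preictal\nsegment_a.mat,0.4\nsegment_b.mat,0.6\n"

merged = merge_predictions([model_a, model_b])
```

Other combination rules, such as rank averaging or weighted means, would fit the same interface; the notebook mentioned above is the place to check which one was actually used.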

