
Data Formatting Standards

Dan Levenstein edited this page Mar 4, 2019 · 160 revisions

General: Data for analysis are stored and manipulated as MATLAB structures that contain all the information needed to work with the data. These structures are saved in .mat files, and commonly used data types follow the standardized formats outlined in this document.

All files pertaining to a given recording should be in a single self-contained folder called baseName. For example, /recording7/recording7.ripples.events.mat is a file containing information about ripples from recording7, in the .events format.
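The naming convention can be captured in a small helper. This is a minimal Python sketch, not part of buzcode: the function name `buzcode_path` is hypothetical, and it assumes that baseName is simply the name of the recording folder. Formats without a type suffix (such as baseName.sessionInfo.mat) are handled by omitting `fmt`.

```python
import os
from typing import Optional


def buzcode_path(base_path: str, data_name: str, fmt: Optional[str] = None) -> str:
    """Return <basePath>/<baseName>.<dataName>[.<fmt>].mat.

    Hypothetical helper: assumes baseName is the name of the recording folder,
    as in /recording7/recording7.ripples.events.mat.
    """
    base_name = os.path.basename(os.path.normpath(base_path))
    parts = [base_name, data_name] + ([fmt] if fmt else []) + ["mat"]
    return os.path.join(base_path, ".".join(parts))


# The ripples file from the text, and a file that has no format suffix.
print(buzcode_path("/recording7", "ripples", "events"))
print(buzcode_path("/recording7", "sessionInfo"))
```

Deriving baseName from the folder keeps the file names and the folder in sync, which is the point of the self-contained-folder rule.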

As you (inevitably) run into data types that don't really fit into any of these boxes, please consult the #code-development channel at buzsakilab.slack.com/messages/code-development/, and add the necessary formats/standards.


ANIMAL METADATA

This is the most general category, including all information that is valid for the whole experiment (i.e., the animal).
File naming format: baseName.AnimalMetadata.mat

Information about the implant and implanted animal/brain.

  • Species
  • Strain
  • Sex
  • Age
  • Weight
  • GeneticLine
  • VirusInjection type
  • VirusCoordinates(AP,ML,DV)
  • VirusInjectionDate
  • SurgeryDate
  • Probes
    • Type
    • SiteSpatialLayout
    • SiteSizesInUm
    • Impedances
    • OrientationOfProbe (not sure best way to do this)
    • APCoordinate
    • MLCoordinate
    • APAngle
    • MLAngle
    • DepthFromSurface
  • TargetAnatomy
  • Anesthesia
  • Analgesics
  • Antibiotics
  • SurgicalComplications
  • SurgicalNotes

SESSION INFO

This structure contains information specific to a given session. Most of it can be extracted from 'baseName.xml' using LoadParameters.m. The .depth field is NOT generated by LoadParameters and must be entered either manually or with another function. Note: regions can be added to the .xml using the regions plugin, and bad channels can be added with the badchannels plugin. Alternatively, both can be added using bz_getSessionInfo(basePath,'editGUI',true) (BETA, please improve!).

File naming format: baseName.sessionInfo.mat

Required struct fields (where a value is shown, it is an example):

  • channels: Nx1 vector listing channels in this session
  • region: Nx1 cell array listing brain region for each channel (examples: 'ls', 'CA1','PPC','hpc')
  • depth: distance (in um) the probe has been advanced since implantation (screw turns * thread count)
  • spikeGroups: [1×1 struct]
  • nChannels: 128
  • FileName: 'DT2_rPPC_rCCG_3540um_1288um_20160227_160227_121226'
  • HiPassFreq: 500
  • Date: '2016-02-27'
  • VoltageRange: 20
  • Amplification: 1000
  • lfpSampleRate: 1250

Optional struct fields:

  • badchannels: vector of channels not to use for clustering or LFP analyses (default = [])
  • rates: [1×1 struct] (only for backwards compatibility)
  • Offset: 0
  • nBits: 16
  • AnatGrps: {1×13 cell} (only for backwards compatibility, redundant with .spikeGroups)
  • SpkGrps: {1×13 cell} (only for backwards compatibility, redundant with .spikeGroups)
  • ElecGp: {1×13 cell} (only for backwards compatibility, redundant with .spikeGroups)
  • nElecGps: 13 (only for backwards compatibility, redundant with .spikeGroups)
  • SampleTime: 50
  • LFPLoPassFreq
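A quick sanity check on a loaded sessionInfo can catch missing fields before analysis. This is a minimal Python sketch, assuming the struct has been loaded as a Python dict; `check_session_info` is a hypothetical name, and the only rules it enforces are the required-field list above, one region entry per channel, and badchannels (default []) being a subset of channels.

```python
# Required sessionInfo fields, copied from the list above.
REQUIRED_SESSIONINFO_FIELDS = (
    "channels", "region", "depth", "spikeGroups", "nChannels", "FileName",
    "HiPassFreq", "Date", "VoltageRange", "Amplification", "lfpSampleRate",
)


def check_session_info(session_info: dict) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_SESSIONINFO_FIELDS if name not in session_info]

    channels = session_info.get("channels")
    region = session_info.get("region")
    # region is Nx1, with one entry per channel.
    if channels is not None and region is not None and len(channels) != len(region):
        problems.append("region must have one entry per channel")

    # badchannels is optional and defaults to an empty list.
    bad = session_info.get("badchannels", [])
    if channels is not None:
        unknown = [ch for ch in bad if ch not in channels]
        if unknown:
            problems.append(f"badchannels not listed in channels: {unknown}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a user fix an incomplete sessionInfo in a single pass.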

CELL INFO

A .cellinfo.mat is a file format for storing info about recorded units.
Examples include connectivity.cellinfo.mat and celltype.cellinfo.mat. One very common cellinfo.mat file is baseName.spikes.cellinfo.mat.
File naming format: baseName.cellinfoName.cellinfo.mat

Conventions: every cell should have a UID (unit ID#) that is used to reference that cell in every cellinfo.mat. This UID is generated the first time bz_GetSpikes is called.

Struct Fields (Required):

  • UID
  • sessionName
  • region

Example Struct fields:

  • SpikeGroupID: Which spike group it came from
  • MaxChannel: Channel of max waveform
  • PeakVoltage: Voltage of peak of max waveform
  • PeakToTroughTime: Measured on average waveform
  • LRatio: Clustering quality metric
  • IsolationDistance: Isolation quality metric
  • Class: flexible category per user
  • Promoter: based on optogenetic tagging response
  • SynapticEffect: E or I. Based on short timescale CCGs
  • MeanFiringRate: Hz, Over whole recording
  • ReceptiveField: Orientation, Place, Movement, Color, none
  • WakeFiringRate: Hz
  • NREMFiringRate: Hz
  • REMFiringRate: Hz
  • MicroarousalFiringRate: Hz
  • RippleModulation: Ratio of ripple firing rate to baseline firing rate
  • Many others: each user can add fields as needed, e.g., theta modulation depth
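Because every cellinfo.mat references cells by the same UID, fields from different files can be combined by joining on UID. This is a minimal Python sketch, assuming each struct is a dict with a 'UID' list plus per-cell lists of the same length; `merge_cellinfo_by_uid` is a hypothetical name and not a buzcode function.

```python
def merge_cellinfo_by_uid(*cellinfos: dict) -> dict:
    """Map each UID to a dict of its per-cell fields, merged across cellinfo structs.

    Assumes per-cell fields are lists aligned with 'UID'; scalar fields such as
    sessionName are skipped.
    """
    merged = {}
    for info in cellinfos:
        uids = info["UID"]
        for key, values in info.items():
            # Only per-cell lists aligned with UID are merged.
            if key == "UID" or not isinstance(values, list) or len(values) != len(uids):
                continue
            for uid, value in zip(uids, values):
                merged.setdefault(uid, {})[key] = value
    return merged


spikes = {"UID": [1, 2], "sessionName": "recording7", "region": ["CA1", "CA1"]}
celltype = {"UID": [2, 1], "Class": ["pE", "pI"]}  # same cells, different order
print(merge_cellinfo_by_uid(spikes, celltype))
```

Joining on UID rather than on row position is what makes the order of cells inside each file irrelevant.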

POP INFO

A .popinfo.mat is a file format for storing info about a population of neurons. This can be used to store data/results for population decoding methods.
File naming format: baseName.popinfoName.popinfo.mat

Struct Fields (Required):

  • UID: list of UIDs for the cells in the population
  • sessionName
  • region

Example Struct fields:

  • results: MATLAB table with decoding results

CHANNEL INFO

A .channelinfo.mat is a file format for storing info about recorded channels. For example, rippleCorrelogram.channelinfo.mat, csd.channelinfo.mat, electrodeImpedanceMeasures.channelinfo.mat.

File naming format: baseName.channelinfoName.channelinfo.mat

Struct Fields (required):

  • chanID (0-indexed, matching neuroscope)
  • data: the info about the recorded channels (tentatively, channels should be the first dimension)
  • baseName
  • detectorinfo: substructure with information about the detection method (fields below)
    • .detectorname: name of the function used for detection
    • .detectionparms: parameters used for detection
    • .detectiondate: date of detection

Struct fields (optional, depending on what is being stored):

  • x_bins
  • eventName
  • timestamps
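Since chanID is 0-indexed (NeuroScope numbering) while MATLAB arrays are 1-indexed, looking up a channel by its NeuroScope number is an easy place to be off by one. This is a minimal Python sketch, assuming the struct is a dict and that channels are the first dimension of `data` (the tentative convention above); `channel_data` is a hypothetical name.

```python
def channel_data(channel_info: dict, neuroscope_channel: int):
    """Return the row of channel_info['data'] for a 0-indexed NeuroScope channel.

    Assumes channels are the first dimension of `data`.
    """
    chan_ids = list(channel_info["chanID"])
    if neuroscope_channel not in chan_ids:
        raise KeyError(f"channel {neuroscope_channel} not in chanID")
    # Position within chanID (0-based in Python; in MATLAB the row would be this + 1).
    row = chan_ids.index(neuroscope_channel)
    return channel_info["data"][row]
```

Looking the channel up in chanID, instead of using the channel number as a row index, also works when only a subset of channels is stored.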

LFP

The raw LFP (1250 Hz, all channels) is stored in a baseName.lfp file, as in NeuroScope. This file contains the LFP data for every channel in the recording, and channels are 0-indexed. LFP can be loaded into a buzcode lfp structure in MATLAB using the I/O function bz_GetLFP.
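For reading a .lfp file outside MATLAB, the sketch below pulls one channel out of the raw binary. It is a minimal Python sketch, not bz_GetLFP: it assumes the NeuroScope layout of signed 16-bit little-endian samples interleaved across channels (sample 0 of every channel, then sample 1, and so on), takes nChannels from sessionInfo, and returns raw integer values without any conversion to volts. The function names are hypothetical.

```python
import array
import sys


def parse_lfp_bytes(raw: bytes, n_channels: int, channel: int) -> list:
    """Extract one 0-indexed channel from raw .lfp bytes.

    Assumes signed 16-bit little-endian samples, interleaved across channels.
    """
    samples = array.array("h")  # 'h' = signed short (2 bytes on common platforms)
    samples.frombytes(raw)
    if sys.byteorder != "little":
        samples.byteswap()  # convert from the assumed little-endian file order
    if len(samples) % n_channels:
        raise ValueError("sample count is not a multiple of n_channels")
    return samples[channel::n_channels].tolist()


def read_lfp_channel(path: str, n_channels: int, channel: int, sampling_rate: float = 1250.0):
    """Return (timestamps in seconds, samples) for one channel of a .lfp file."""
    with open(path, "rb") as f:
        values = parse_lfp_bytes(f.read(), n_channels, channel)
    timestamps = [i / sampling_rate for i in range(len(values))]
    return timestamps, values
```

Reading the whole file into memory is only reasonable for short recordings; for long sessions a memory-mapped read would be the more practical choice.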

An additional .lfp.mat file format is also available for storing processed LFP data or LFP from selected channels. For example, one could store a wavelet spectrogram, or a copy of a single commonly used channel with artifacts removed. The .lfp.mat file contains a single struct called lfpprocessName.
File naming format: baseName.lfpprocessName.lfp.mat
(e.g., '/recording1/recording1.CA1Wavelets.lfp.mat')

Struct fields (required):

  • data: a [Nt x Nd] matrix of the processed LFP data (e.g., for 100 wavelet frequencies, Nd=100; for a beta band-filtered signal from 56 channels, Nd=56)
  • timestamps: a [Nt x 1] vector of timestamps
  • samplingRate: sampling frequency of the processed LFP
  • channels: channel number(s) from which the LFP came (0-indexed)

Struct fields (optional):

  • duration: length (in seconds) of recording
  • params: parameters used in processing of the LFP
  • freqs: frequencies of each dimension (if applicable)
  • region: brain region of the recording site(s)
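The required fields are tied together: timestamps must have Nt entries, matching the rows of data, at the stated samplingRate. This is a minimal Python sketch of assembling such a struct as a dict, assuming evenly sampled data; `make_lfp_struct` is a hypothetical name, and here duration is the length of the stored data itself.

```python
def make_lfp_struct(data, sampling_rate, channels, start_time=0.0, **optional):
    """Assemble an lfp.mat-style struct (as a dict) from an [Nt x Nd] nested list.

    Timestamps are start_time + i / sampling_rate, assuming even sampling.
    Optional fields (params, freqs, region) are passed through unchanged.
    """
    n_t = len(data)
    n_d = len(data[0]) if n_t else 0
    if any(len(row) != n_d for row in data):
        raise ValueError("every time point must have the same number of dimensions (Nd)")
    lfp = {
        "data": data,
        "timestamps": [start_time + i / sampling_rate for i in range(n_t)],
        "samplingRate": sampling_rate,
        "channels": list(channels),
        "duration": n_t / sampling_rate,  # optional field: length in seconds
    }
    lfp.update(optional)
    return lfp


# Two wavelet frequencies from a single channel: Nd = 2.
lfp = make_lfp_struct([[1, 2], [3, 4], [5, 6]], 1250.0, [0], freqs=[4.0, 8.0], region="CA1")
```

Generating timestamps from samplingRate avoids the two fields drifting apart when data are resampled.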

EVENTS

An .events.mat is a file format for storing times and info about detected events. An .events.mat file has within it a single struct: eventsName.
File naming format: baseName.eventName.events.mat
Functions for detecting events and creating .events.mat files can be found in detectors (https://github.com/buzsakilab/buzcode/tree/master/detectors/eventDetection).

Struct fields (required):

  • timestamps: NeuroScope-compatible [N x 1] or [N x 2] matrix of event times (in seconds); with two columns, each row is [start stop]
  • detectorinfo: substructure with information about the detection method (fields below)
    • .detectorname: name of the function used for detection
    • .detectionparms: parameters used for detection
    • .detectiondate: date of detection
    • .detectionintervals: [start stop] pairs of intervals used for detection (optional)
    • .detectionchannel: channel used for detection (optional)

Struct fields (optional; examples of event-specific fields):

  • amplitudes: [N x 1] matrix
  • frequencies: [N x 1] matrix
  • durations: [N x 1] matrix
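With [start stop] timestamps, the optional durations field follows directly from the two columns. This is a minimal Python sketch of building an events struct as a dict; `make_events_struct` and the detector name in the usage line are hypothetical, and the ISO date format for detectiondate is an assumption, since the standard does not fix one.

```python
import datetime


def make_events_struct(timestamps, detector_name, params, **event_fields):
    """Build an events.mat-style struct (as a dict) from [start, stop] pairs in seconds."""
    durations = []
    for start, stop in timestamps:
        if stop < start:
            raise ValueError(f"event stops before it starts: {(start, stop)}")
        durations.append(stop - start)
    events = {
        "timestamps": [list(pair) for pair in timestamps],
        "detectorinfo": {
            "detectorname": detector_name,
            "detectionparms": params,
            # Date format is an assumption; the standard does not specify one.
            "detectiondate": datetime.date.today().isoformat(),
        },
        "durations": durations,  # optional event-specific field, [N x 1]
    }
    events.update(event_fields)  # e.g. amplitudes=[...], frequencies=[...]
    return events


ev = make_events_struct([(1.0, 1.05), (2.0, 2.1)], "hypotheticalDetector",
                        {"threshold": 5}, amplitudes=[3.2, 4.1])
```

Recording the detector name, parameters, and date alongside the events is what makes a detection reproducible later.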
    

STATES

A data format for holding multiple temporal states with starts and stops. A .states.mat file has within it a single struct: statetypeName. Examples include sleep/wake states and behavioral states.
File naming format: baseName.statetypeName.states.mat

Struct fields (required):

  • ints: a structure containing start/stop times for each state
    • .stateName: [N x 2] start/stop times for each instance of state stateName

For example: SleepState.ints.REM will be a [N x 2] array of start and stop times of all REM epochs and SleepState.ints.NREM will be a [N x 2] array of start and stop times of all NREM epochs.

  • detectorinfo: substructure with information about the detection method
    • .detectorname: name of the function used for detection
    • .detectionparms: parameters used for detection
    • .detectiondate: date detection was run

Struct fields (optional):

  • idx:
    • .states: [t x 1] vector giving the state at each point in time
    • .timestamps: [t x 1] vector of times for each timepoint in idx.states
    • .statenames: {Nstates} cell array with the name of each state
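The optional idx representation can be derived from ints. This is a minimal Python sketch with several stated assumptions: states are coded 1..Nstates in the order of statenames (matching 1-based MATLAB indexing into the cell array), 0 marks time points covered by no state, and intervals are treated as half-open [start, stop) so a boundary is not labeled twice. `ints_to_idx` is a hypothetical name.

```python
def ints_to_idx(ints: dict, timestamps) -> dict:
    """Build an idx struct (states, timestamps, statenames) from ints.

    ints maps stateName -> list of (start, stop) pairs in seconds.
    """
    statenames = list(ints)  # state order follows the key order of ints
    states = [0] * len(timestamps)  # 0 = not covered by any state (assumption)
    for code, name in enumerate(statenames, start=1):  # 1-based codes
        for start, stop in ints[name]:
            for i, t in enumerate(timestamps):
                if start <= t < stop:  # half-open interval [start, stop)
                    states[i] = code
    return {"states": states, "timestamps": list(timestamps), "statenames": statenames}


idx = ints_to_idx({"NREM": [(0, 2)], "REM": [(2, 3)]}, [0, 1, 2, 3, 4])
```

The nested loop is O(intervals x timepoints), which keeps the logic obvious; a vectorized version would be preferable for long recordings.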
    

MANIPULATION

A file format for storing experimental manipulations. A .manipulation.mat file has within it a single struct: manipulationName.
File naming format: baseName.manipulationName.manipulation.mat

Struct fields:

  • timestamps: a [Nt x 1] vector of timestamps corresponding to the manipulation times
  • data: a [Nt x Nd] matrix containing the corresponding manipulation value (e.g., magnitude) at each timestamp. Multiple columns can be used for multiple types of co-occurring manipulations. Manipulations that don't share the same timestamps should be stored in separate .manipulation.mat files.
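The one rule that structures this format is that every row of data belongs to one timestamp. This is a minimal Python sketch of enforcing it when building the struct as a dict; `make_manipulation_struct` is a hypothetical name.

```python
def make_manipulation_struct(timestamps, data):
    """Pair [Nt x 1] timestamps with [Nt x Nd] manipulation values."""
    if len(data) != len(timestamps):
        raise ValueError("data needs one row per timestamp; use a separate "
                         ".manipulation.mat file for manipulations on other timestamps")
    widths = {len(row) for row in data}
    if len(widths) > 1:
        raise ValueError("every row of data must have the same number of columns (Nd)")
    return {"timestamps": list(timestamps), "data": [list(row) for row in data]}


# Two co-occurring manipulations (two columns) sharing the same timestamps.
m = make_manipulation_struct([0.5, 1.5], [[1.0, 0.0], [2.0, 1.0]])
```

A length mismatch usually means two manipulations with different timing were forced into one file, which the standard asks to split.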
    

BEHAVIOR

A data format for storing behavioral data. A .behavior.mat file contains a single struct: behaviorName. The behaviorName struct contains a description of the behavior, time series of important aspects of behavior (e.g., position, head direction), and can also contain relevant behavioral events. Note: care must be taken to ensure that time data are stored correctly, i.e., that physiology and behavior are aligned. If possible, all time measures should be in seconds, with time 0 aligned to the start of the .dat file.
File naming format: baseName.behaviordataName.behavior.mat

Required fields:

  • timestamps: array of timestamps that match the data subfields (in seconds)
  • samplingRate: sampling rate of behavioral tracking (Hz)
  • (datasubstruct): a substructure containing the behavior values corresponding to each timestamp (see below)
  • behaviorinfo: a substructure with information about the behavior, acquisition, and processing
    • .description: text description of the behavior
    • .acquisitionsystem: (e.g., 'optitrack', 'LED', 'motion capture')
    • .processingfunction: name of the function used to process the behavior
    • .substructnames: names of the datasubstructs (e.g., 'position')

Possible Data Substructures:

  • position: .x, .y, and .z
    • units: millimeters, centimeters, or meters (default)
  • orientation: .x, .y, .z, and .w
    • rotationType: euler or quaternion (default)
  • pupil: .x, .y, .diameter

Optional fields:

  • events: substructure of important time markers and information for behavioral events
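Putting the timing rules into practice means converting tracker times to seconds, shifting them so 0 is the start of the .dat file, and deriving samplingRate from the result. This is a minimal Python sketch with loud assumptions: `dat_start` (the .dat file's first sample expressed in the tracker's own clock) is hypothetical and must come from your own synchronization signal, samplingRate is estimated as 1 / median inter-sample interval, and only a 2D position substructure is filled in. All function names are hypothetical.

```python
import statistics


def align_to_dat_start(tracker_times, dat_start, seconds_per_unit=1.0):
    """Convert tracker timestamps to seconds with 0 at the start of the .dat file.

    dat_start is the (hypothetical) time of the .dat file's first sample,
    in the tracker's own clock and units.
    """
    return [(t - dat_start) * seconds_per_unit for t in tracker_times]


def estimate_sampling_rate(timestamps):
    """Estimate samplingRate (Hz) as 1 / median inter-sample interval."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    return 1.0 / statistics.median(b - a for a, b in zip(timestamps, timestamps[1:]))


def make_behavior_struct(timestamps, x, y, description, system, function_name):
    """Assemble a behavior.mat-style struct (as a dict) with a 2D position substructure."""
    return {
        "timestamps": list(timestamps),
        "samplingRate": estimate_sampling_rate(timestamps),
        "position": {"x": list(x), "y": list(y), "units": "meters"},
        "behaviorinfo": {
            "description": description,
            "acquisitionsystem": system,
            "processingfunction": function_name,
            "substructnames": ["position"],
        },
    }


# Tracker clock in milliseconds; the .dat file started at tracker time 1000 ms.
t = align_to_dat_start([1000, 1010, 1020, 1030], dat_start=1000, seconds_per_unit=0.001)
behavior = make_behavior_struct(t, [0, 1, 2, 3], [0, 0, 0, 0],
                                "open field", "LED", "hypothetical_processing_fn")
```

Using the median interval keeps the samplingRate estimate robust to a few dropped tracking frames.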

LED TRACKING

bz_processConvertLED2Behav.m

OPTITRACK/MOTIVE 3D HEAD TRACKING

bz_processConvertOptitrack2Behav.m

MARKERLESS MOTION CAPTURE

ANALYSIS RESULTS