Data Formatting Standards

General: Data for analysis are stored and manipulated as structures that contain all the information needed to work with that data. These structures are saved in .mat files, and commonly used data types have a standardized format, as outlined in this document.

All files pertaining to a given recording should be in a single self-contained folder called baseName. For example, /recording7/recording7.ripples.events.mat will be a file containing information about ripples from recording7, in the .events format.
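The naming convention above can be sketched in MATLAB as follows. This is a minimal illustration, not part of buzcode: the helper name `bzFile` is hypothetical, and deriving baseName from the folder name with `fileparts` assumes the folder name contains no dots.

```matlab
% Folder that holds every file for one recording (example path from the text).
basePath = '/recording7';

% baseName is the folder name; fileparts would treat anything after a dot
% as an extension, so this assumes the folder name has no dots.
[~, baseName] = fileparts(basePath);

% Hypothetical helper: build <basePath>/<baseName>.<name>.<format>.mat
bzFile = @(name, fmt) fullfile(basePath, sprintf('%s.%s.%s.mat', baseName, name, fmt));

eventsFile = bzFile('ripples', 'events');
disp(eventsFile)
```

Keeping the path construction in one place makes it harder for a file to end up outside its recording folder or with a mismatched baseName.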

As you (inevitably) run into data types that don't really fit into any of these boxes, please consult the #code-development channel at buzsakilab.slack.com/messages/code-development/, and add the necessary format/standards.


ANIMAL METADATA

This is the most general category including all information that is valid for the whole experiment (i.e. animal).
File naming format: baseName.AnimalMetadata.mat

Information about the implant and implanted animal/brain.

Species
Strain
Sex
Age
Weight
GeneticLine
VirusInjectionType
VirusCoordinates(AP,ML,DV)
VirusInjectionDate
SurgeryDate
Probes
- Type
- SiteSpatialLayout
- SiteSizesInUm
- Impedances
- OrientationOfProbe (not sure best way to do this)
- APCoordinate
- MLCoordinate
- APAngle
- MLAngle
- DepthFromSurface
TargetAnatomy
Anesthesia
Analgesics
Antibiotics
SurgicalComplications
SurgicalNotes

SESSION INFO

This structure will contain information that is specific to a given session. Most information can be extracted from 'baseName.xml' using LoadParameters.m. The .depth field is NOT generated by LoadParameters and must be entered either manually or with another function. Note: regions can be added to the .xml using the regions plugin, and bad channels can be added to the .xml with the badchannels plugin. Alternatively, both can be added using bz_getSessionInfo(basePath,'editGUI',true). (BETA, please improve!)

File naming format: baseName.sessionInfo.mat

Required struct fields:

channels: Nx1 vector listing channels in this session
region: Nx1 cell array listing brain region for each channel (examples: 'ls', 'CA1','PPC','hpc')
depth: screw turns * thread count, since implantation (measured in um)
spikeGroups: [1×1 struct]
nChannels: 128
FileName: 'DT2_rPPC_rCCG_3540um_1288um_20160227_160227_121226'
HiPassFreq: 500
Date: '2016-02-27'
VoltageRange: 20
Amplification: 1000
lfpSampleRate: 1250

Optional struct fields:

badchannels: vector of channels not to use for clustering, LFP analyses (default = [])
rates: [1×1 struct] (only for backwards compatibility)
Offset: 0
nBits: 16
AnatGrps: {1×13 cell} (only for backwards compatibility, redundant with .spikeGroups)
SpkGrps: {1×13 cell} (only for backwards compatibility, redundant with .spikeGroups)
ElecGp: {1×13 cell} (only for backwards compatibility, redundant with .spikeGroups)
nElecGps: 13 (only for backwards compatibility, redundant with .spikeGroups)
SampleTime: 50
LFPLoPassFreq
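
As a rough sketch, a sessionInfo struct can be checked against the required fields listed above before it is saved. All values here are illustrative placeholders, the 0-indexed channel list is an assumption, and the contents of spikeGroups are left empty because they are not specified on this page.

```matlab
% Minimal sessionInfo with placeholder values (not a real recording).
sessionInfo.channels      = (0:3)';                      % Nx1, assumed 0-indexed
sessionInfo.region        = {'CA1'; 'CA1'; 'PPC'; 'PPC'}; % Nx1, one entry per channel
sessionInfo.depth         = 1288;                        % um
sessionInfo.spikeGroups   = struct();                    % contents not specified here
sessionInfo.nChannels     = 4;
sessionInfo.FileName      = 'recording7';
sessionInfo.HiPassFreq    = 500;
sessionInfo.Date          = '2016-02-27';
sessionInfo.VoltageRange  = 20;
sessionInfo.Amplification = 1000;
sessionInfo.lfpSampleRate = 1250;

% Required field names, copied from the list above.
required = {'channels', 'region', 'depth', 'spikeGroups', 'nChannels', ...
            'FileName', 'HiPassFreq', 'Date', 'VoltageRange', ...
            'Amplification', 'lfpSampleRate'};

% isfield accepts a cell array and returns one logical per name.
missing = required(~isfield(sessionInfo, required));
if ~isempty(missing)
    error('sessionInfo is missing required field(s): %s', strjoin(missing, ', '));
end

% Optional field falls back to its documented default.
if ~isfield(sessionInfo, 'badchannels')
    sessionInfo.badchannels = [];
end
```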

CELL INFO

A .cellinfo.mat is a file format for storing info about recorded units.
For example, connectivity.cellinfo.mat, celltype.cellinfo.mat. One very common cellinfo.mat file is baseName.spikes.cellinfo.mat.
File naming format: baseName.cellinfoName.cellinfo.mat

Conventions: all cells should have associated with them a UID (unit ID#) that is used to reference that cell in every cellinfo.mat. This UID is generated the first time bz_GetSpikes is called.

Struct Fields (Required):

UID
sessionName
region

Example Struct fields:

SpikeGroupID: Which spike group it came from
MaxChannel: Channel of max waveform
PeakVoltage: Voltage of peak of max waveform
PeakToTroughTime: Measured on average waveform
LRatio: Clustering quality metric
IsolationDistance: Isolation quality metric
Class: flexible category per user
Promoter: based on optogenetic tagging response
SynapticEffect: E or I. Based on short timescale CCGs
MeanFiringRate: Hz, Over whole recording
ReceptiveField: Orientation, Place, Movement, Color, none
WakeFiringRate: Hz
NREMFiringRate: Hz
REMFiringRate: Hz
MicroarousalFiringRate: Hz
RippleModulation: Ratio of ripple firing rate to baseline firing rate
So many others... we can think of this as something each user can add to - ie theta modulation depth etc etc
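
A small sketch of a cellinfo struct, assuming that per-cell fields hold one entry per UID in matching order (this page does not state the ordering explicitly). The struct name `celltype` follows the celltype.cellinfo.mat example above; the firing rates and regions are made up.

```matlab
baseName = 'recording7';

% Required fields
celltype.UID         = [1 2 3];
celltype.sessionName = baseName;
celltype.region      = {'CA1', 'CA1', 'PPC'};

% Example user-added field (made-up values, Hz)
celltype.MeanFiringRate = [2.1 14.8 0.6];

% Assumed convention: every per-cell field has one entry per UID.
nCells = numel(celltype.UID);
assert(numel(celltype.region) == nCells);
assert(numel(celltype.MeanFiringRate) == nCells);

% Because UIDs are shared across cellinfo files, a selection made here
% can be used to index any other cellinfo struct from the same session.
ca1UIDs = celltype.UID(strcmp(celltype.region, 'CA1'));
```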

POP INFO

A .popinfo.mat is a file format for storing info about a population of neurons. This can be used to store data/results for population decoding methods.
File naming format: baseName.popinfoName.popinfo.mat

Struct Fields (Required):

UID - list of UIDs for the cells in the population
sessionName
region

Example Struct fields:

results: Matlab table with decoding results

CHANNEL INFO

A .channelinfo.mat is a file format for storing info about recorded channels. For example, rippleCorrelogram.channelinfo.mat, csd.channelinfo.mat, electrodeImpedanceMeasures.channelinfo.mat.

File naming format: baseName.channelinfoName.channelinfo.mat

Struct Fields (required):

chanID (0-indexed, matching neuroscope)
data (the info about recorded channels, channels should be in the first dimension?)
baseName
detectorinfo: substructure with information about the detection method (fields below)
.detectorname: name of the function used for detection
.detectionparms: parameters used for detection
.detectiondate: date of detection

(optional, depending on what is being stored):

x_bins
eventName
timestamps

LFP

The raw (1250Hz, all channels) LFP will be stored in a baseName.lfp file, a la neuroscope. This file has the LFP data for every channel in the recording, and channels are 0-indexed. LFP can be loaded into a buzcode lfp structure in matlab using the I/O function bz_GetLFP.

An additional .lfp.mat file format is also available for storing processed LFP data or LFP from select channels. For example, one could store the wavelet spectrum or store a copy of a single channel commonly used for analyses with artifacts removed. The .lfp.mat has within it a single struct called lfpprocessName.
File naming format: baseName.lfpprocessName.lfp.mat
(e.g. '/recording1/recording1.CA1Wavelets.lfp.mat')

Struct fields (required):

data: a [Nt x Nd] matrix of the processed LFP data (e.g. for 100 wavelet frequencies, Nd=100; for a beta band-filtered signal from 56 channels, Nd=56)
timestamps: a [Nt x 1] vector of timestamps
samplingRate: sampling frequency of the processed LFP
channels: channel number(s) from which the LFP came (0-indexed)

Struct fields (optional):

duration: length (in seconds) of recording
params: parameters used in processing of the LFP
freqs: frequencies of each dimension (if applicable)
region: brain region where the recording site is
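
The .lfp.mat layout can be illustrated with synthetic data. In this sketch the struct name `CA1Theta`, the channel number, and the 8 Hz sine standing in for a filtered signal are all hypothetical; only the field names and shapes come from the lists above.

```matlab
% Synthetic stand-in for a processed single-channel LFP:
% 2 s of an 8 Hz sine sampled at 1250 Hz.
samplingRate = 1250;
timestamps   = (0:2*samplingRate - 1)' / samplingRate;   % [Nt x 1], seconds

% Required fields
CA1Theta.data         = sin(2*pi*8*timestamps);  % [Nt x Nd], here Nd = 1
CA1Theta.timestamps   = timestamps;
CA1Theta.samplingRate = samplingRate;
CA1Theta.channels     = 12;                      % hypothetical 0-indexed channel

% Optional fields
CA1Theta.duration = numel(timestamps) / samplingRate;  % seconds
CA1Theta.region   = 'CA1';

% Rows of data must line up with timestamps.
assert(size(CA1Theta.data, 1) == numel(CA1Theta.timestamps));

% The struct would then be saved under its own name, e.g.
% save(fullfile(basePath, [baseName '.CA1Theta.lfp.mat']), 'CA1Theta')
```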

EVENTS

An .events.mat is a file format for storing times and info about detected events. An .events.mat file has within it a single struct: eventName.
File naming format: baseName.eventName.events.mat
Functions for detection and creation of .events.mats can be found here: [detectors](https://github.com/buzsakilab/buzcode/tree/master/detectors/eventDetection)

Struct fields (required):

   timestamps: neuroscope compatible matrix with 1-2 columns - [starts stops] (in seconds)

   detectorinfo: substructure with information about the detection method (fields below)

      .detectorname: name of the function used for detection

      .detectionparms: parameters used for detection

      .detectiondate: date of detection

      .detectionintervals: [start stop] pairs of intervals used for detection (optional)

      .detectionchannel: channel used for detection (optional)

(examples of optional event-specific fields)

```
 amplitudes: [Nx1 matrix]
frequencies: [Nx1 matrix]
  durations: [Nx1 matrix]
```
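
Putting the required and optional fields together, a minimal sketch of writing and reading an .events.mat might look like this. The detector name, its parameter, and the event times are made up for illustration, and `tempdir` stands in for the real recording folder.

```matlab
baseName = 'recording7';
basePath = tempdir;   % stand-in for the real recording folder

% Required: [starts stops] in seconds, one row per event (made-up times)
ripples.timestamps = [10.20 10.26; 35.71 35.80; 61.05 61.11];

% Required: how the events were detected (names and values are hypothetical)
ripples.detectorinfo.detectorname   = 'myRippleDetector';
ripples.detectorinfo.detectionparms = struct('thresholdSD', 3);
ripples.detectorinfo.detectiondate  = datestr(now, 'yyyy-mm-dd');

% Optional event-specific field: [Nx1] duration of each event
ripples.durations = diff(ripples.timestamps, 1, 2);

% The file holds a single struct whose name matches the eventName in the file name.
eventsFile = fullfile(basePath, [baseName '.ripples.events.mat']);
save(eventsFile, 'ripples');

% Loading into a variable keeps the workspace clean; the struct is loaded.ripples.
loaded = load(eventsFile);
```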

STATES

A data format for holding multiple temporal states with starts and stops. A .states.mat file has within it a single struct: statetypeName. Examples might be sleep/wake states, behavioral states....
File naming format: baseName.statetypeName.states.mat

Struct fields (required):

      ints: a structure containing start/stop times for each state.

       .stateName [N x 2] start/stop time for each instance of state stateName

For example: SleepState.ints.REM will be a [N x 2] array of start and stop times of all REM epochs and SleepState.ints.NREM will be a [N x 2] array of start and stop times of all NREM epochs.

    detectorinfo: substructure with information about the detection method

    .detectorname: name of the function used for detection

    .detectionparms: parameters used for detection

    .detectiondate: date detection was run

Struct fields (optional):

      idx: a structure giving the state at each point in time

        .states a [t x 1] vector giving the state at each point in time

        .timestamps [t x 1] vector of times for each timepoint in idx.states

        .statenames {Nstates} cell array for the name of each state
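
The relationship between ints and idx can be sketched as below. This is an assumption-laden illustration rather than the buzcode implementation: the 1 s time grid, the use of 0 for unscored time, and the choice to store each state as its index into statenames are all assumptions, and the interval values are made up.

```matlab
% Made-up intervals, [N x 2] start/stop in seconds
SleepState.ints.NREM = [0 4; 7 9];
SleepState.ints.REM  = [4 6];

stateNames = fieldnames(SleepState.ints);   % {'NREM'; 'REM'}
t          = (0:9)';                        % hypothetical 1 s time grid
states     = zeros(size(t));                % 0 = no state assigned (assumption)

for s = 1:numel(stateNames)
    ints = SleepState.ints.(stateNames{s});
    for k = 1:size(ints, 1)
        % Mark samples inside [start, stop) with the index of this state
        states(t >= ints(k, 1) & t < ints(k, 2)) = s;
    end
end

SleepState.idx.states     = states;        % [t x 1]
SleepState.idx.timestamps = t;             % [t x 1]
SleepState.idx.statenames = stateNames';   % {Nstates}
```

With this encoding, `SleepState.idx.statenames{SleepState.idx.states(i)}` gives the name of the state at sample i whenever the state is nonzero.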

MANIPULATION

A file format for storing experimental manipulations. A .manipulation.mat file has within it a single struct: manipulationName.
File naming format: baseName.manipulationName.manipulation.mat

   timestamps: a [Nt x 1] vector of timestamps that correspond to manipulation time

   data: a [Nt x Nd] matrix containing the corresponding manipulation value (i.e. magnitude) at each timestamp. Can have multiple columns for multiple types of co-occurring manipulations. Manipulations that don't correspond to the same timestamps should have separate manipulation.mat files.
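
For instance, two co-occurring manipulations that share the same timestamps could be stored as two columns of data. The struct name `laserStim`, the column meanings, and all values in this sketch are hypothetical.

```matlab
% Made-up stimulation times, [Nt x 1] in seconds
laserStim.timestamps = [12.0; 12.5; 13.0; 40.0];

% [Nt x Nd]: column 1 = blue laser power (mW), column 2 = red laser power (mW).
% Both share the timestamps above, so they can live in one file.
laserStim.data = [5 0;
                  5 0;
                  10 0;
                  0 2];

% Each row of data must correspond to exactly one timestamp.
assert(size(laserStim.data, 1) == numel(laserStim.timestamps));

% Example query: times at which the first manipulation was active
blueOnTimes = laserStim.timestamps(laserStim.data(:, 1) > 0);
```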

BEHAVIOR

A data format for storing behavioral data. A .behavior.mat file contains a single struct: behaviorName. The behaviorName struct contains a description of the behavior, time series of important aspects of behavior (e.g. position, head direction), and can also contain relevant behavioral events. Note: care needs to be taken to ensure that time data are stored well, i.e. that things align between physiology and behavior. If possible, all time measures should be in units of seconds, and time 0 should be aligned to the start of the .dat file.
File naming format: baseName.behaviorName.behavior.mat

Required fields:

timestamps: array of timestamps that match the data subfields (in seconds)
samplingRate: sampling rate of behavioral tracking (Hz)
(datasubstruct): a substructure containing the behavior values corresponding to each timestamp - see below
behaviorinfo: a substructure with information about the behavior, acquisition, and processing
.description: text description of the behavior
.acquisitionsystem: (example 'optitrack','LED','motion capture')
.processingfunction: name of the function used to process the behavior
.substructnames: names of the datasubstructs (e.g. 'position')

Possible Data Substructures:

position: .x, .y, and .z
units: millimeters, centimeters, meters[default]
orientation: .x, .y, .z, and .w
rotationType: euler or quaternion[default]
pupil: .x, .y, .diameter

Optional fields:

events: substructure of important time markers and information for behavioral events
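
A minimal sketch of a behavior struct with a position substructure, using a synthetic circular trajectory. The struct name `tracking` is hypothetical, and placing units inside the position substructure is an assumption, since this page does not say where that field lives.

```matlab
samplingRate = 120;                                 % Hz, made-up tracking rate
t = (0:samplingRate - 1)' / samplingRate;           % 1 s of samples, in seconds

% Required fields
tracking.timestamps   = t;
tracking.samplingRate = samplingRate;

% Data substructure: one value per timestamp (synthetic circle, 1 m radius)
tracking.position.x     = cos(2*pi*t);
tracking.position.y     = sin(2*pi*t);
tracking.position.z     = zeros(size(t));
tracking.position.units = 'meters';                 % placement is an assumption

tracking.behaviorinfo.description        = 'synthetic circular trajectory';
tracking.behaviorinfo.acquisitionsystem  = 'optitrack';
tracking.behaviorinfo.processingfunction = 'none (synthetic example)';
tracking.behaviorinfo.substructnames     = {'position'};

% Every substructure named in behaviorinfo should exist and match timestamps.
names = tracking.behaviorinfo.substructnames;
for i = 1:numel(names)
    sub = tracking.(names{i});
    assert(numel(sub.x) == numel(tracking.timestamps));
end
```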
