This notebook introduces two packages:
- `npc_lims` for getting information about sessions (metadata, paths, status)
- `npc_sessions` for processing session data and converting it to NWB format

# Setup
- first ensure credentials are available via any combination of:
    - system environment variables
    - a .env file
    - `Secrets` in a CodeOcean capsule's `environment` tab
    
- see https://github.com/AllenInstitute/npc_lims/blob/main/README.md for info on
    the minimum set of credentials required, and setting up tokens in CodeOcean

- the first code cell will be a good test: an exception will be raised if credentials
  aren't set up correctly

# Finding available sessions

## from databases of tracked sessions
- if you don't need a specific session, the easiest way to get a list of all DR
  sessions is to call `npc_lims.get_session_info()` with no arguments

In [1]:
import npc_lims

all_sessions = npc_lims.get_session_info()
len(all_sessions)

4261

- the dataclass returned has minimal information about the session:

In [4]:
import pprint

pprint.pprint(all_sessions[0])

SessionInfo(id='670243_2023-10-17',
            project='DynamicRouting',
            is_ephys=False,
            is_sync=False,
            allen_path=WindowsUPath('//allen/programs/mindscope/workgroups/dynamicrouting/DynamicRoutingTask/Data/670243'),
            day=None,
            session_kwargs={},
            notes='',
            issues=[])


- the two `bool` attributes (`is_ephys` and `is_sync`) are set on
  initialization and should give enough information to filter for training sessions
  in behavior boxes (no sync), habs or opto-only sessions (sync, no ephys) and
  full ephys sessions
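- as a sketch of that filtering (not part of `npc_lims`), the two flags can be mapped onto the three categories above; `categorize` and its labels are hypothetical, and since it only reads `is_ephys` and `is_sync` it should also accept the `SessionInfo` objects returned by `npc_lims.get_session_info()` (here it's run on stand-in objects):

```python
from collections import Counter
from types import SimpleNamespace


def categorize(session) -> str:
    """Hypothetical helper: label a session using only its `is_sync`/`is_ephys` flags."""
    if session.is_ephys:
        return "ephys"  # full ephys session (ephys sessions also have sync)
    if session.is_sync:
        return "hab_or_opto"  # sync but no ephys: habituation or opto-only
    return "training"  # no sync: training session in a behavior box


# stand-in objects with the same two attributes as `SessionInfo`
examples = [
    SimpleNamespace(is_ephys=False, is_sync=False),
    SimpleNamespace(is_ephys=False, is_sync=True),
    SimpleNamespace(is_ephys=True, is_sync=True),
    SimpleNamespace(is_ephys=False, is_sync=False),
]
counts = Counter(categorize(s) for s in examples)
print(counts)
```

- with real data, something like `Counter(categorize(s) for s in npc_lims.get_session_info())` would give a quick overview of what's in the databases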

- additional attributes are evaluated on demand, since they take time to look for
  files on S3:

In [8]:
(
    all_sessions[0].is_uploaded, # this refers to a "proper" upload into aind's system, with metadata
    all_sessions[0].is_sorted, 
    all_sessions[0].is_annotated, 
    all_sessions[0].cloud_path,
)

(False,
 False,
 False,
 S3Path('s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243'))
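- the on-demand attributes follow a common pattern: an expensive lookup that only runs when the attribute is first accessed; a minimal sketch using `functools.cached_property`, assuming the result can be cached for the object's lifetime (`LazyInfo` and `_count_files_on_s3` are hypothetical stand-ins, and whether `npc_lims` caches these values isn't stated here):

```python
import functools


class LazyInfo:
    """Hypothetical stand-in for a session-info object with on-demand attributes."""

    def __init__(self, session_id: str) -> None:
        self.id = session_id  # cheap: set immediately on initialization
        self.lookups = 0  # counts how many times the "slow" lookup actually ran

    def _count_files_on_s3(self) -> int:
        # placeholder for a slow network call (e.g. globbing an S3 prefix)
        self.lookups += 1
        return 0

    @functools.cached_property
    def is_uploaded(self) -> bool:
        # evaluated on first access only; the result is then stored on the instance
        return self._count_files_on_s3() > 0


info = LazyInfo("670243_2023-10-17")
assert info.lookups == 0  # creating the object did no lookups
print(info.is_uploaded, info.is_uploaded)  # second access reuses the cached value
print(info.lookups)
```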

- `cloud_path` is an instance of `upath.UPath` 
- it behaves like a `pathlib.Path` object from the Python stdlib, allowing us to work with S3
  much like a regular filesystem, although sometimes with a bit of extra work (below, a
  file's bytes are read into an in-memory buffer before `h5py` opens it):

In [15]:
import io
import h5py

stim_files = all_sessions[0].cloud_path.glob('*.hdf5')
for _ in range(5):
    print(next(stim_files).as_posix())
    
h5py.File(io.BytesIO(next(stim_files).read_bytes()), 'r').keys()

s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230502_152458.hdf5
s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230503_155301.hdf5
s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230504_151133.hdf5
s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230505_134550.hdf5
s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230508_150905.hdf5


<KeysViewHDF5 ['acquisitionSignalLine', 'ampModFreq', 'autoRewardMissTrials', 'autoRewardOnsetFrame', 'behavNidaqDevice', 'blockCatchProb', 'blockStim', 'blockStimProb', 'blockStimRewarded', 'configPath', 'customSampling', 'deltaWheelPos', 'digitalSolenoidTrigger', 'drawDiodeBox', 'evenSampling', 'firstBlockNogoStim', 'frameIntervals', 'frameRate', 'frameSignalLine', 'framesPerBlock', 'galvoNidaqDevice', 'galvoVoltage', 'gratingEdge', 'gratingEdgeBlurWidth', 'gratingOri', 'gratingPhase', 'gratingSF', 'gratingSize', 'gratingTF', 'gratingType', 'incorrectSound', 'incorrectSoundDur', 'incorrectSoundFreq', 'incorrectSoundLevel', 'incorrectSoundVolume', 'incorrectTimeoutColor', 'incorrectTimeoutFrames', 'incorrectTrialRepeats', 'lickFrames', 'lickLine', 'linearSweepFreq', 'logSweepFreq', 'manualRewardFrames', 'maxFrames', 'maxTrials', 'maxWheelAngleChange', 'microphoneCh', 'microphoneData', 'minUnimodalTrials', 'minWheelAngleChange', 'monBackgroundColor', 'monDistance', 'monGamma', 'monSize
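- the cell above calls `next()` repeatedly on the `glob()` generator, which raises `StopIteration` if there are fewer files than expected, and the order of glob results isn't guaranteed; a sketch of a more defensive version (the helper names are hypothetical, and it's demonstrated on a local `pathlib.Path`, assuming a `UPath` behaves the same way as described above):

```python
import io
import itertools
import pathlib
import tempfile


def first_n_matches(directory, pattern: str, n: int) -> list:
    """Return up to `n` paths matching `pattern`, sorted by their POSIX string.

    Sorting makes the order reproducible; `islice` simply stops early
    instead of raising StopIteration when fewer than `n` files exist.
    """
    matches = sorted(directory.glob(pattern), key=lambda p: p.as_posix())
    return list(itertools.islice(matches, n))


def load_in_memory(path) -> io.BytesIO:
    """Read a whole file into a seekable in-memory buffer (as done above for h5py)."""
    return io.BytesIO(path.read_bytes())


# demo on a local temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    for name in ("b.hdf5", "a.hdf5", "c.txt"):
        (folder / name).write_bytes(name.encode())
    found = [p.name for p in first_n_matches(folder, "*.hdf5", 5)]
    buffer = load_in_memory(folder / "a.hdf5")
    print(found, buffer.read())
```

- sorting means listing every match first, which may be slow for a large S3 folder; drop the `sorted()` call if the order doesn't matter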

## from subject and date
- to get a specific session:
    - pass a string containing a labtracks mouse ID (MID) and a date, and our
      session databases will be queried
        - the date should be YYYYMMDD (any separators accepted)
        - MID and date can be in any order (whitespace and underscore separators accepted)
        - e.g. a stim filename works fine (the time is ignored):
          `DynamicRouting1_620263_20220425_084516`
- the following are all equivalent:


In [19]:
sessions = (
    npc_lims.get_session_info('670243_20230505'),
    npc_lims.get_session_info('670243 2023-05-05'),
    npc_lims.get_session_info('2023-05-05 670243'),
    npc_lims.get_session_info('DynamicRouting1_670243_20230505_134550'),
)
assert len(set(sessions)) == 1
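- the matching rules listed above can be sketched with two regular expressions; this is a hypothetical re-implementation for illustration, not the code `npc_lims` uses, and it assumes a MID is exactly 6 digits and a date starts with `20`; because it searches anywhere in the string, it also accepts full paths:

```python
import datetime
import re

# hypothetical patterns, not the ones used by npc_lims
DATE = re.compile(r"(?<!\d)(20\d{2})[-_ ./]?(\d{2})[-_ ./]?(\d{2})(?!\d)")
MID = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def parse_session_id(text: str) -> str:
    """Normalize a string containing a 6-digit MID and a YYYYMMDD date to 'MID_YYYY-MM-DD'."""
    date_match = DATE.search(text)
    if date_match is None:
        raise ValueError(f"no date found in {text!r}")
    # datetime.date validates month/day, so a YYYYDDMM string like 20231705 raises
    date = datetime.date(*map(int, date_match.groups()))
    # search for the MID outside the date, so digits from the date aren't reused
    remainder = text[: date_match.start()] + " " + text[date_match.end():]
    mid_match = MID.search(remainder)
    if mid_match is None:
        raise ValueError(f"no 6-digit mouse ID found in {text!r}")
    # the first standalone 6-digit number wins, so a trailing HHMMSS time is ignored
    return f"{mid_match.group(1)}_{date.isoformat()}"


for text in ("670243_20230505", "2023-05-05 670243", "DynamicRouting1_670243_20230505_134550"):
    print(parse_session_id(text))
```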

## from a path
- if a MID and date can be parsed from the path, it can be used as above
- URIs for cloud paths are fine too:

In [20]:
sessions = (
    npc_lims.get_session_info('//allen/programs/mindscope/workgroups/dynamicrouting/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230505_134550.hdf5'),
    npc_lims.get_session_info('s3://aind-scratch-data/ben.hardcastle/DynamicRoutingTask/Data/670243/DynamicRouting1_670243_20230505_134550.hdf5'),
)
assert len(set(sessions)) == 1

---
# session data
For analysis of actual data we switch over to `npc_sessions`:
- `npc_sessions` uses `npc_lims` to get paths and session info
- `npc_sessions.utils` contains functions for processing raw data files from a session (sync file, hdf5 stim files, OpenEphys files, etc.)
- one big class, `DynamicRoutingSession`, is devoted to pulling together all of the processed components and metadata, and converting them to NWB format
    - as above, we pass a session ID (labtracks mouse ID and date) or a path
    - importing the package is currently slow
    - initialization of the object should always be fast (data is processed on demand)

In [1]:
import npc_sessions

In [2]:
npc_sessions.DynamicRoutingSession('670243 2023-05-05')

DynamicRoutingSession('670243_2023-05-05')

- again, if you don't need a particular session and just want to loop over all
  available sessions, an iterator is provided which queries session-tracking
  databases and returns `DynamicRoutingSession` instances (most recent first)
- see the function docstring for more info about usage: https://github.com/AllenInstitute/npc_sessions/blob/f92e226ab0922b4919b9442dc63594de572ecb78/src/npc_sessions/sessions.py#L49

In [2]:
import npc_sessions
 
next(npc_sessions.get_sessions())

DynamicRoutingSession('670243_2023-10-17')
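- since an iterator only does work as items are requested, `itertools.islice` can take the first few sessions without building the rest; a sketch with a hypothetical generator standing in for `npc_sessions.get_sessions()`:

```python
import itertools


def slow_source(log: list):
    """Hypothetical stand-in for a lazy session iterator: each item is built on demand."""
    for session_id in ("670243_2023-10-17", "670243_2023-05-05", "620263_2022-04-25"):
        log.append(session_id)  # record that this item was actually produced
        yield session_id


produced: list = []
first_two = list(itertools.islice(slow_source(produced), 2))
print(first_two, produced)  # the third item is never produced
```

- with the real iterator, `list(itertools.islice(npc_sessions.get_sessions(), 10))` should give the 10 most recent sessions, assuming each session object is created as the iterator advances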

- a metadata-only NWB instance is provided, which uses information from available
  session files to give a richer overview than the dataclass returned from
  `npc_lims.get_session_info()`
  - since it requires opening stim hdf5 files and some globbing in folders
    (potentially in the cloud), generating this object takes some time, but it is
    designed to be relatively fast

In [3]:
session = next(npc_sessions.get_sessions())

print(f"{type(session.metadata) = }")
session.metadata

type(session.metadata) = <class 'pynwb.file.NWBFile'>


- the `keywords` list in the `NWBFile` tells us which components are available for
  the session
- these are derived from `bool` attributes on the session object: 

In [12]:
print(session.keywords)
(
    session.is_task,    # has a readable `DynamicRouting1*.hdf5` file
    session.is_opto,    # opto applied during behavior task trials, not optotagging
    session.is_sync,
    session.is_video,
    session.is_ephys,
    session.is_sorted,
    session.is_annotated,
    session.is_templeton,
)

['behavior', 'sync', 'video', 'ephys', 'no units']


(True, False, True, True, True, False, False, False)
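- to make the relationship concrete, here's a hypothetical reconstruction that reproduces the `keywords` printed above from the flag values; it is inferred only from outputs in this notebook (not from the `npc_sessions` source) and leaves out flags whose keyword doesn't appear here, such as `is_opto` and `is_templeton`:

```python
def derive_keywords(is_task, is_sync, is_video, is_ephys, is_sorted) -> list:
    """Hypothetical mapping from session flags to NWB `keywords`."""
    keywords = []
    if is_task:
        keywords.append("behavior")
    if is_sync:
        keywords.append("sync")
    if is_video:
        keywords.append("video")
    if is_ephys:
        keywords.append("ephys")
        if not is_sorted:
            keywords.append("no units")  # ephys present, but no sorted units yet
    return keywords


print(derive_keywords(is_task=True, is_sync=True, is_video=True, is_ephys=True, is_sorted=False))
```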

- these are used throughout the session object to determine the course of
  processing, for example:
  - almost every session has a `DynamicRouting1*.hdf5` file, but in a couple of
    sessions writing the file to disk failed, so it's not a given
  - before generating a trials table from the `DynamicRouting1*.hdf5` file, we can
    check whether `session.is_task == True`
- these attributes are also useful for filtering sessions:

In [4]:
for session in npc_sessions.get_sessions():
    if session.is_task and session.is_ephys:
        break
session.metadata

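- one subtlety with the `for`/`break` loop above: if no session matches, `session` is still bound to the last session checked; a sketch of a hypothetical helper that returns `None` instead, using `next()` with a default (the stand-in objects only mimic the flags used here):

```python
from types import SimpleNamespace


def first_matching(sessions, *required_flags: str):
    """Return the first session with all `required_flags` set, or None if none match."""
    return next(
        (s for s in sessions if all(getattr(s, flag) for flag in required_flags)),
        None,
    )


# stand-ins for sessions from `npc_sessions.get_sessions()` (hypothetical values)
candidates = [
    SimpleNamespace(session_id="a", is_task=True, is_ephys=False),
    SimpleNamespace(session_id="b", is_task=True, is_ephys=True),
]
match = first_matching(candidates, "is_task", "is_ephys")
print(match.session_id if match else None)
```

- with real sessions this would be `first_matching(npc_sessions.get_sessions(), "is_task", "is_ephys")`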


- when generating the "full" `NWBFile` (metadata + data), all available components
  will be processed:
  - for example, if `sync` is not in `keywords`, trial times and running/lick data
    will be generated with "good-enough" timing info
  - ephys data requires timing info from sync, however, so currently a session
    cannot have any ephys components without sync
    
- processing of some components, like ephys data, may take a minute or two, so it
  may make sense to disable components you don't need:
  - to get the previous session without ephys, we pass a kwarg to
    `get_sessions()` or to `DynamicRoutingSession()`:

In [5]:
session = npc_sessions.DynamicRoutingSession(session.session_id, is_ephys=False)
assert session.is_ephys is False
session.keywords



['behavior', 'sync', 'video']

- note that `ephys` is no longer in `keywords`

- note that any attribute of the `NWBFile` (like `session_id` and `keywords`) is also
  accessible via our `DynamicRoutingSession` instance
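- both behaviors can be illustrated with a hypothetical class: a constructor keyword overrides an auto-detected flag, and `__getattr__` forwards any other attribute lookup to a metadata object; this is a sketch of the general pattern under those assumptions, not how `DynamicRoutingSession` is implemented (a `SimpleNamespace` stands in for the `NWBFile`):

```python
import functools
from types import SimpleNamespace


class SessionSketch:
    """Hypothetical sketch: overridable flags plus attribute delegation to metadata."""

    def __init__(self, metadata, is_ephys=None):
        self._metadata = metadata
        self._is_ephys_override = is_ephys  # None means "detect from files"

    @functools.cached_property
    def is_ephys(self) -> bool:
        if self._is_ephys_override is not None:
            return self._is_ephys_override  # user choice wins, e.g. to skip slow processing
        return self._detect_ephys()

    def _detect_ephys(self) -> bool:
        # placeholder for looking for ephys files on disk or in the cloud
        return True

    def __getattr__(self, name):
        # only called when normal lookup fails, so real attributes take priority;
        # private names are excluded to avoid infinite recursion before __init__ runs
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.__dict__["_metadata"], name)


meta = SimpleNamespace(session_id="674723_2023-10-16", lab="B3")
default = SessionSketch(meta)
no_ephys = SessionSketch(meta, is_ephys=False)
print(default.is_ephys, no_ephys.is_ephys, no_ephys.session_id)
```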

# NWB components
https://github.com/AllenInstitute/npc_sessions/tree/f92e226ab0922b4919b9442dc63594de572ecb78#current-nwb-components

## Intervals

- in each NWB file there can be multiple tables (think dataframes) with
  information about intervals of time:
    - the commonality is that each row contains information about one time interval
    - each interval must have a `start_time` and a `stop_time`, specified in
      seconds relative to `session_start_time`
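- converting one of these relative times back to wall-clock time is a single `timedelta` addition; a sketch using the `session_start_time` and the epoch `stop_time` from the outputs in this section (`to_absolute` is a hypothetical helper):

```python
import datetime

# values copied from outputs in this section, used for illustration only
session_start_time = datetime.datetime.fromisoformat("2023-10-16T16:20:09-07:00")
stop_time = 3652.836311  # seconds relative to session_start_time


def to_absolute(relative_seconds: float, start: datetime.datetime) -> datetime.datetime:
    """Convert an NWB interval time (seconds since session start) to a wall-clock datetime."""
    return start + datetime.timedelta(seconds=relative_seconds)


absolute = to_absolute(stop_time, session_start_time)
print(absolute.isoformat())
```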


In [5]:
session.nwb

root pynwb.file.NWBFile at 0x2372257158800
Fields:
  acquisition: {
    rewards <class 'ndx_events.events.Events'>
  }
  analysis: {
    performance <class 'pynwb.epoch.TimeIntervals'>
  }
  epoch_tags: ['DynamicRouting1' 'rewards' 'opto']
  epochs: epochs <class 'pynwb.epoch.TimeIntervals'>
  experiment_description: visual-auditory task-switching behavior experiment
  file_create_date: [datetime.datetime(2023, 10, 16, 20, 55, 36, 864721, tzinfo=tzlocal())]
  identifier: b70b7805-3a42-4baf-b781-7b716c58b997
  intervals: {
    DynamicRouting1 <class 'pynwb.epoch.TimeIntervals'>,
    performance <class 'pynwb.epoch.TimeIntervals'>,
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  keywords: ['behavior']
  lab: B3
  session_description: training session with behavioral task data, without precise timing information
  session_id: 674723_2023-10-16
  session_start_time: 2023-10-16 16:20:09-07:00
  source_script: https://raw.githubusercontent.com/samgale/DynamicRoutingTask/c2b14223cbce1abd

In [3]:
session.intervals.keys()

dict_keys(['trials', 'performance', 'DynamicRouting1'])

In [9]:
session.epochs[:]

| id | start_time | stop_time | notes | tags |
|---|---|---|---|---|
| 0 | 0.0 | 3652.836311 | | [DynamicRouting1, opto, rewards] |


In [12]:
print(session.intervals['performance'].description)
display(session.intervals['performance'][:])

behavioral performance for each context block in task (refers to `trials` or `intervals['DynamicRouting1'])


| id | start_time | stop_time | block_index | context | cross_modal_dprime | signed_cross_modal_dprime | same_modal_dprime | nonrewarded_modal_dprime | vis_intra_dprime | aud_intra_dprime |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.720853 | 607.965242 | 0 | sound1 | 3.502658 | -3.502658 | 3.53115 | 0.373205 | 0.373205 | 3.53115 |
| 1 | 610.033819 | 1215.495176 | 1 | vis1 | 1.684584 | 1.684584 | 3.897895 | 1.873203 | 3.897895 | 1.873203 |
| 2 | 1215.595412 | 1824.79295 | 2 | sound1 | 3.190051 | -3.190051 | 2.905353 | 0.341099 | 0.341099 | 2.905353 |
| 3 | 1831.465261 | 2434.974954 | 3 | vis1 | 1.834303 | 1.834303 | 3.897895 | 1.377781 | 3.897895 | 1.377781 |
| 4 | 2437.443327 | 3046.691135 | 4 | sound1 | 3.219483 | -3.219483 | 3.158572 | 0.65638 | 0.65638 | 3.158572 |
| 5 | 3046.991846 | 3649.050234 | 5 | vis1 | 2.455901 | 2.455901 | 3.507725 | 1.321337 | 3.507725 | 1.321337 |
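- since `performance` and the trials tables share the same time base, a trial can be matched to its context block by its `start_time`; a minimal stdlib sketch using `bisect`, with block times copied from the table above and two trial start times from the `DynamicRouting1` table (`block_at` is hypothetical; with real data you'd read these columns from `session.intervals['performance'][:]`):

```python
import bisect

# block start/stop times and contexts copied from the `performance` table above
block_starts = [3.720853, 610.033819, 1215.595412, 1831.465261, 2437.443327, 3046.991846]
block_stops = [607.965242, 1215.495176, 1824.79295, 2434.974954, 3046.691135, 3649.050234]
contexts = ["sound1", "vis1", "sound1", "vis1", "sound1", "vis1"]


def block_at(time_s: float):
    """Return the index of the block containing `time_s`, or None if it falls outside all blocks."""
    i = bisect.bisect_right(block_starts, time_s) - 1  # last block starting at or before time_s
    if i >= 0 and time_s <= block_stops[i]:
        return i
    return None


# start times of trials 0 and 529, plus a made-up time in the gap between blocks 0 and 1
for t in (3.720853, 3625.663559, 608.5):
    i = block_at(t)
    print(t, i, contexts[i] if i is not None else "between blocks")
```

- trial 529 lands in block 5 (`vis1`), consistent with `is_vis_context` being `True` for that trial in the trials table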


In [22]:
trials = session.intervals['DynamicRouting1']
print(trials.description)
display(trials[:])
for column in trials.colnames:
    print(f'{column}: {getattr(trials, column).description}')

visual-auditory task-switching behavior trials


| id | start_time | stop_time | quiescent_start_time | quiescent_stop_time | stim_start_time | stim_stop_time | opto_start_time | opto_stop_time | response_window_start_time | response_window_stop_time | ... | is_catch | is_aud_target | is_vis_target | is_aud_nontarget | is_vis_nontarget | is_vis_context | is_aud_context | is_context_switch | is_repeat | is_opto |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.720853 | 9.243129 | 3.720853 | 5.239639 | 5.239639 | 5.739639 | | | 5.323433 | 6.240619 | ... | False | True | False | False | False | False | True | False | False | False |
| 1 | 10.427502 | 15.948696 | 10.427502 | 11.945201 | 11.945201 | 12.445201 | | | 12.028680 | 12.946130 | ... | False | True | False | False | False | False | True | False | False | False |
| 2 | 17.316422 | 22.837956 | 17.316422 | 18.834854 | 18.834854 | 19.334854 | | | 18.918002 | 19.835282 | ... | False | True | False | False | False | False | True | False | False | False |
| 3 | 23.155107 | 28.676501 | 23.155107 | 24.673197 | 24.673197 | 25.173197 | | | 24.756314 | 25.673491 | ... | False | True | False | False | False | False | True | False | False | False |
| 4 | 32.962974 | 38.484333 | 32.962974 | 34.481124 | 34.481124 | 34.981124 | | | 34.564493 | 35.482080 | ... | False | True | False | False | False | False | True | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 528 | 3618.440595 | 3624.011909 | 3618.440595 | 3619.975536 | 3619.975536 | 3620.476072 | | | 3620.058750 | 3620.976009 | ... | False | False | False | False | True | True | False | False | False | False |
| 529 | 3625.663559 | 3631.235058 | 3625.663559 | 3627.214633 | 3627.214633 | 3627.714633 | | | 3627.298461 | 3628.215501 | ... | False | True | False | False | False | True | False | False | False | False |
| 530 | 3631.719120 | 3637.306608 | 3631.719120 | 3633.253021 | 3633.253021 | 3633.753021 | | | 3633.336477 | 3634.271230 | ... | False | True | False | False | False | True | False | False | False | False |
| 531 | 3637.473479 | 3643.044656 | 3637.473479 | 3639.008428 | 3639.008428 | 3639.508428 | | | 3639.091515 | 3640.008880 | ... | False | False | False | True | False | True | False | False | False | False |


start_time: Start time of epoch, in seconds
stop_time: Stop time of epoch, in seconds
quiescent_start_time: start of interval in which the subject should not lick, otherwise the trial will start over; only the last quiescent interval (which was not violated) is included
quiescent_stop_time: end of interval in which the subject should not lick, otherwise the trial will start over
stim_start_time: onset of visual or auditory stimulus
stim_stop_time: offset of visual or auditory stimulus
opto_start_time: Onset of optogenetic inactivation
opto_stop_time: offset of optogenetic inactivation
response_window_start_time: start of interval in which the subject should lick if a GO trial, otherwise should not lick
response_window_stop_time: end of interval in which the subject should lick if a GO trial, otherwise should not lick
response_time: time of first lick within the response window; nan if no lick occurred
reward_time: delivery time of water reward, for contingent and non-contingent rewards
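- since the per-trial events are stored as times, durations and delays are differences between columns, with NaN marking events that didn't happen (e.g. no opto, or no lick for `response_time`); a stdlib sketch using values copied from rows 0-4 of the table above (helper names are hypothetical; on the DataFrame displayed by `trials[:]`, pandas column arithmetic would do the same job):

```python
import math

# rows 0-4 of the trials table above; opto columns are empty (NaN) for these trials
stim_start = [5.239639, 11.945201, 18.834854, 24.673197, 34.481124]
stim_stop = [5.739639, 12.445201, 19.334854, 25.173197, 34.981124]
response_window_start = [5.323433, 12.028680, 18.918002, 24.756314, 34.564493]
opto_start = [math.nan] * 5
opto_stop = [math.nan] * 5


def differences(later, earlier):
    """Element-wise `later - earlier`; NaN inputs propagate to NaN outputs."""
    return [x - y for x, y in zip(later, earlier)]


def finite_mean(values):
    """Mean of the non-NaN values, or None if every value is NaN (e.g. no opto trials)."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else None


stim_durations = differences(stim_stop, stim_start)
window_delays = differences(response_window_start, stim_start)
opto_durations = differences(opto_stop, opto_start)
print(finite_mean(stim_durations), finite_mean(window_delays), finite_mean(opto_durations))
```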

In [15]:
session.nwb