# check for mismatches in video frames and sync counts
Doug Ollerenshaw (dougo@alleninstitute.org), 3/25/2020  

This addresses MPE issue:  
https://web.powerapps.com/webplayer/app?hidenavbar=true&RequestID=106&appId=%2fproviders%2fMicrosoft.PowerApps%2fapps%2fbe88d2e1-5c4c-41ba-9757-cf5d85ea99e0  
Also:  
https://github.com/AllenInstitute/AllenSDK/issues/967

Short summary of the issue: the line labels on sync were switched for a subset of experiments, such that the line for the eye tracking video was given the label assigned to behavior monitoring and vice versa. 

The purpose of this notebook is to extract the frame count from every eye tracking and behavior video, as well as the sync pulse count from every associated sync file, and then compare the two. If labels are assigned correctly, the counts should match. 

For each ophys session, I'm extracting the following:  
* `behavior_monitoring_frame_count`: The number of frames in the behavior monitoring movie
* `eye_tracking_frame_count`: The number of frames in the eye tracking movie
* `behavior_monitoring_sync_count`: The number of rising edges in the sync line assigned to the behavior movie
* `eye_tracking_sync_count`: The number of rising edges in the sync line assigned to the eye tracking movie
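
As a rough illustration of what "number of rising edges" means here, the sketch below counts low-to-high transitions in a toy digital signal with NumPy. The helper name `count_rising_edges` and the toy array are hypothetical. The notebook itself takes the edges from `sync.get_sync_data`, whose internals are not reproduced here.

```python
import numpy as np


def count_rising_edges(signal):
    """Count 0 -> 1 transitions in a 1-D binary signal (hypothetical helper)."""
    signal = np.asarray(signal, dtype=int)
    # np.diff is +1 exactly where the signal steps from 0 to 1
    return int(np.sum(np.diff(signal) == 1))


# toy line with three pulses, standing in for a camera sync line
toy_line = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
n_edges = count_rising_edges(toy_line)
print(n_edges)
```

If every camera frame produces one pulse on its sync line, this count should equal the number of frames in the corresponding movie.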

Then I'm checking the following conditions (False indicates a problem):  
* `behavior_counts_match = behavior_monitoring_frame_count == behavior_monitoring_sync_count`
* `eyetracking_counts_match = eye_tracking_frame_count == eye_tracking_sync_count`

Where counts don't match, the following are possible:  
* Line labels are switched. This would be indicated by the following conditions:
  * `behavior_monitoring_frame_count == eye_tracking_sync_count`
  * `eye_tracking_frame_count == behavior_monitoring_sync_count`  
  * `behavior_monitoring_frame_count != eye_tracking_frame_count` (this last condition avoids the possibility that both movies have the same frame count)
* The sync line for either the behavior or eye camera is corrupted (i.e., it contains spurious events that cause the count not to match the number of frames)
* There is some issue that prevents loading/reading either the movies or the sync file
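
The decision logic above can be sketched for a single session as follows. This is a minimal illustration, not code from the notebook: `classify_session` and its return labels are hypothetical names, and a non-matching count that is not a clean swap is labeled only as a mismatch, since the counts alone cannot confirm that a sync line is corrupted. The example values are taken from the sample table later in this notebook.

```python
import math


def _missing(value):
    """True if a count is None or NaN (the movie or sync file could not be read)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_session(behavior_frames, behavior_sync, eye_frames, eye_sync):
    """Return a label for one session (hypothetical helper)."""
    counts = (behavior_frames, behavior_sync, eye_frames, eye_sync)
    if any(_missing(c) for c in counts):
        return 'unreadable'
    if behavior_frames == behavior_sync and eye_frames == eye_sync:
        return 'ok'
    if (behavior_frames == eye_sync
            and eye_frames == behavior_sync
            and behavior_frames != eye_frames):
        return 'likely_label_switch'
    return 'count_mismatch'


# session 809547429: counts are swapped between the two lines
print(classify_session(135995.0, 136001.0, 136001.0, 135995.0))
# session 958772311: both movies match their own sync lines
print(classify_session(136293.0, 136293.0, 136278.0, 136278.0))
# session 902193346: movie frame counts are missing
print(classify_session(float('nan'), 136066.0, float('nan'), 136074.0))
```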

## imports

In [1]:
import numpy as np
import pandas as pd

from visual_behavior import database as db
from visual_behavior import utilities as vbu

from allensdk.brain_observatory.behavior import sync
from allensdk.brain_observatory.behavior.behavior_project_cache import BehaviorProjectCache as bpc



## custom function definitions

In [2]:
def get_path(well_known_files, name):
    '''
    get path from well_known_files dataframe
    inputs:
        well_known_files (dataframe with index set to name)
        name (string with desired name)
    returns:
        string with full filepath
    '''
    if name in well_known_files.index:
        return ''.join(well_known_files.loc[name][['storage_directory', 'filename']].tolist())
    else:
        return None
    

def get_movie_frame_count(osid, movie_type):
    '''
    get movie frame count for a given movie type
    inputs:
        ophys_session_id
        movie_type ('behavior_monitoring' or 'eye_tracking')
    returns:
        number of frames in the given movie
    '''
    well_known_files = db.get_well_known_files(osid).set_index('name')
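    if movie_type == 'eye_tracking':
        movie_path = get_path(well_known_files, 'RawEyeTrackingVideo')
    elif movie_type == 'behavior_monitoring':
        movie_path = get_path(well_known_files, 'RawBehaviorTrackingVideo')
    else:
        # fail with a clear message instead of an unbound movie_path below
        raise ValueError("movie_type must be 'behavior_monitoring' or 'eye_tracking'")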
    if movie_path:
        movie = vbu.Movie(movie_path)
        return movie.frame_count
    else:
        return None

def get_sync_event_count(osid, movie_type):
    '''
    get sync event count for a given movie type
    inputs:
        ophys_session_id
        movie_type ('behavior_monitoring' or 'eye_tracking')
    returns:
        number of rising edges in the associated sync line
    '''
    well_known_files = db.get_well_known_files(osid).set_index('name')
    sync_path = get_path(well_known_files, 'OphysRigSync')
    if sync_path:
        try:
            sync_data = sync.get_sync_data(sync_path)
            return len(sync_data[movie_type])
        except OSError:
            return None
    else:
        return None

## get all experiments from LIMS, remove duplicate sessions

In [3]:
manifest_file = "/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/2020_cache/manifest_all.json"
cache = bpc.from_lims(manifest=manifest_file)
all_experiments = cache.get_experiment_table() 
all_experiments = all_experiments.reset_index()

#drop duplicate sessions (these represent multiple planes (or "experiments") within a single mesoscope session)
all_experiments = all_experiments.drop_duplicates('ophys_session_id')
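
To illustrate the deduplication step above on a small scale, the sketch below builds a toy table in which one mesoscope session contributes several planes and then keeps one row per session. The toy values and the `ophys_experiment_id` column name are assumptions made for illustration. Only `ophys_session_id` and `equipment_name` appear in this notebook.

```python
import pandas as pd

# toy stand-in for the experiment table: one row per imaging plane ("experiment")
toy_experiments = pd.DataFrame({
    'ophys_experiment_id': [1, 2, 3, 4],          # hypothetical column
    'ophys_session_id': [100, 100, 100, 200],
    'equipment_name': ['MESO.1', 'MESO.1', 'MESO.1', 'CAM2P.4'],
})

# keep the first row for each session; rows for the remaining planes are dropped
toy_sessions = toy_experiments.drop_duplicates('ophys_session_id')
print(toy_sessions)
```

Because the movies and the sync file belong to the session rather than to a plane, one row per session is enough for the count comparison.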

## extract frame and sync counts for every session
NOTE: Extracting the counts takes 30-40 minutes. Set `use_cache = True` to load a previously saved version instead.

In [4]:
use_cache = True

csv_path = '/allen/programs/braintv/workgroups/nc-ophys/visual_behavior/2020_cache/all_experiments_with_frame_counts.csv'
if use_cache:
    all_experiments = pd.read_csv(csv_path)
else:
    for movie_type in ['behavior_monitoring','eye_tracking']:
        print('on {}'.format(movie_type))
        
        colname = '{}_frame_count'.format(movie_type)
        all_experiments[colname] = all_experiments['ophys_session_id'].map(
            lambda osid: get_movie_frame_count(osid, movie_type)
        )
        
        colname = '{}_sync_count'.format(movie_type)
        all_experiments[colname] = all_experiments['ophys_session_id'].map(
            lambda osid: get_sync_event_count(osid, movie_type)
        )
        
    all_experiments.to_csv(
        csv_path,
        index = False
    )

## compare frame and sync counts for every session

In [5]:
all_experiments['behavior_counts_match'] = (
    all_experiments['behavior_monitoring_frame_count'] == 
    all_experiments['behavior_monitoring_sync_count']
)

all_experiments['eyetracking_counts_match'] = (
    all_experiments['eye_tracking_frame_count'] == 
    all_experiments['eye_tracking_sync_count']
)
all_experiments['likely_label_switch'] = np.logical_and(
    np.logical_and(
        (
            all_experiments['eye_tracking_frame_count'] == 
            all_experiments['behavior_monitoring_sync_count']
        ),
        (
            all_experiments['behavior_monitoring_frame_count'] == 
            all_experiments['eye_tracking_sync_count']
        ),
    ),
    (
        all_experiments['behavior_monitoring_frame_count'] != 
        all_experiments['eye_tracking_frame_count']
    )
)

In [25]:
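# date range spanned by the sessions flagged as likely label switches
first_date = all_experiments.query('likely_label_switch==True')['date_of_acquisition'].min()
last_date = all_experiments.query('likely_label_switch==True')['date_of_acquisition'].max()

# sessions inside that date range where both movies have the same frame count:
# for these, a label switch could not be detected by comparing counts
all_experiments['possible_label_switch'] = np.logical_and(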
    np.logical_and(
        all_experiments['date_of_acquisition'] >= first_date,
        all_experiments['date_of_acquisition'] <= last_date,
    ),
    (
        all_experiments['behavior_monitoring_frame_count'] == 
        all_experiments['eye_tracking_frame_count']
    )
)

In [26]:
all_experiments['possible_label_switch'].value_counts()

False    1013
Name: possible_label_switch, dtype: int64

## look at results

In [6]:
cols_to_show = [
    'ophys_session_id',
    'project_code',
    'date_of_acquisition',
    'equipment_name',
    'behavior_monitoring_frame_count',
    'behavior_monitoring_sync_count',
    'behavior_counts_match',
    'eye_tracking_frame_count',
    'eye_tracking_sync_count',
    'eyetracking_counts_match',
    'likely_label_switch',
]
all_experiments[cols_to_show].sample(10)

| | ophys_session_id | project_code | date_of_acquisition | equipment_name | behavior_monitoring_frame_count | behavior_monitoring_sync_count | behavior_counts_match | eye_tracking_frame_count | eye_tracking_sync_count | eyetracking_counts_match | likely_label_switch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 958772311 | VisualBehaviorMultiscope | 2019-10-01 08:25:28.921369 | MESO.1 | 136293.0 | 136293.0 | True | 136278.0 | 136278.0 | True | False |
| 410 | 963969115 | VisualBehavior | 2019-10-09 19:31:13.000000 | CAM2P.5 | 136064.0 | 136064.0 | True | 136067.0 | 136073.0 | False | False |
| 624 | 809547429 | VisualBehavior | 2019-01-17 15:40:01.000000 | CAM2P.4 | 135995.0 | 136001.0 | False | 136001.0 | 135995.0 | False | True |
| 921 | 902193346 | VisualBehaviorTask1B | 2019-07-09 20:13:34.000000 | CAM2P.5 | NaN | 136066.0 | False | NaN | 136074.0 | False | False |
| 769 | 931972753 | VisualBehaviorTask1B | 2019-08-26 20:27:44.000000 | CAM2P.5 | 13980.0 | 135658.0 | False | 13977.0 | 135650.0 | False | False |
| 154 | 793204200 | VisualBehavior | 2018-12-11 20:18:27.000000 | CAM2P.4 | 135919.0 | 135936.0 | False | 135936.0 | 135919.0 | False | True |
| 178 | 940447433 | VisualBehavior | 2019-09-06 15:30:02.000000 | CAM2P.5 | 135992.0 | 135992.0 | True | 136016.0 | 136016.0 | True | False |
| 84 | 880753403 | VisualBehavior | 2019-06-04 20:08:17.000000 | CAM2P.3 | 135907.0 | 135907.0 | True | 271816.0 | 271816.0 | True | False |
| 858 | 837083549 | VisualBehavior | 2019-03-14 17:59:58.000000 | CAM2P.5 | 126973.0 | 126973.0 | True | 126990.0 | 126990.0 | True | False |
| 522 | 914634556 | VisualBehaviorTask1B | 2019-07-31 15:12:49.000000 | CAM2P.3 | 136900.0 | 136900.0 | True | 273803.0 | 273803.0 | True | False |


In [7]:
all_experiments['behavior_counts_match'].value_counts()

True     869
False    144
Name: behavior_counts_match, dtype: int64

In [8]:
all_experiments['eyetracking_counts_match'].value_counts()

True     862
False    151
Name: eyetracking_counts_match, dtype: int64

In [9]:
all_experiments['likely_label_switch'].value_counts()

False    947
True      66
Name: likely_label_switch, dtype: int64

### sessions with likely switched line labels:
There are 66 of these, and they all appear to be on CAM2P.4:

In [10]:
all_experiments.query('likely_label_switch == True')[cols_to_show].sort_values(by='date_of_acquisition')

| | ophys_session_id | project_code | date_of_acquisition | equipment_name | behavior_monitoring_frame_count | behavior_monitoring_sync_count | behavior_counts_match | eye_tracking_frame_count | eye_tracking_sync_count | eyetracking_counts_match | likely_label_switch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 231 | 778113069 | VisualBehavior | 2018-11-13 19:01:21.000000 | CAM2P.4 | 144692.0 | 144696.0 | False | 144696.0 | 144692.0 | False | True |
| 232 | 778861532 | VisualBehavior | 2018-11-14 17:23:02.000000 | CAM2P.4 | 144824.0 | 144829.0 | False | 144829.0 | 144824.0 | False | True |
| 233 | 779655532 | VisualBehavior | 2018-11-15 18:17:13.000000 | CAM2P.4 | 144815.0 | 144823.0 | False | 144823.0 | 144815.0 | False | True |
| 235 | 781729538 | VisualBehavior | 2018-11-19 17:56:34.000000 | CAM2P.4 | 144953.0 | 144974.0 | False | 144974.0 | 144953.0 | False | True |
| 237 | 783911882 | VisualBehavior | 2018-11-21 17:11:40.000000 | CAM2P.4 | 144770.0 | 144783.0 | False | 144783.0 | 144770.0 | False | True |
| 1001 | 784219195 | VisualBehavior | 2018-11-21 20:53:09.000000 | CAM2P.4 | 145133.0 | 145147.0 | False | 145147.0 | 145133.0 | False | True |
| 1002 | 785540118 | VisualBehavior | 2018-11-26 23:27:15.000000 | CAM2P.4 | 144926.0 | 144940.0 | False | 144940.0 | 144926.0 | False | True |
| 239 | 785774407 | VisualBehavior | 2018-11-27 17:11:43.000000 | CAM2P.4 | 145086.0 | 145093.0 | False | 145093.0 | 145086.0 | False | True |
| 1003 | 786235595 | VisualBehavior | 2018-11-27 22:45:51.000000 | CAM2P.4 | 144879.0 | 144887.0 | False | 144887.0 | 144879.0 | False | True |
| 240 | 789885492 | VisualBehavior | 2018-12-03 18:18:06.000000 | CAM2P.4 | 144813.0 | 144822.0 | False | 144822.0 | 144813.0 | False | True |
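
One way to check the "all on CAM2P.4" observation without reading the table by eye is to count flagged sessions per rig. The sketch below does this on a toy frame whose four rows are copied from tables in this notebook. Applying the same `query` and `value_counts` calls to the full `all_experiments` table is an assumption about how one might run the check, not something the notebook does.

```python
import pandas as pd

# four rows copied from the tables in this notebook (session id, rig, flag)
toy = pd.DataFrame({
    'ophys_session_id': [778113069, 778861532, 958772311, 963969115],
    'equipment_name': ['CAM2P.4', 'CAM2P.4', 'MESO.1', 'CAM2P.5'],
    'likely_label_switch': [True, True, False, False],
})

# count flagged sessions per rig; a single entry means a single affected rig
switched_by_rig = (
    toy.query('likely_label_switch == True')['equipment_name'].value_counts()
)
print(switched_by_rig)
```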


### sessions where the eye tracking frame count does not match the associated sync line (excluding the likely switched lines)
There are 85 of these and they don't appear to be clustered by rig or in time

In [11]:
query_string = 'eyetracking_counts_match == False and likely_label_switch == False'
bad_eyetracking_sessions = all_experiments.query(query_string)
print(len(bad_eyetracking_sessions))
bad_eyetracking_sessions[cols_to_show].sort_values(by='date_of_acquisition')

85


| | ophys_session_id | project_code | date_of_acquisition | equipment_name | behavior_monitoring_frame_count | behavior_monitoring_sync_count | behavior_counts_match | eye_tracking_frame_count | eye_tracking_sync_count | eyetracking_counts_match | likely_label_switch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 583 | 729895074 | VisualBehaviorIntegrationTest | 2018-08-03 18:45:08.000000 | CAM2P.5 | 127765.0 | 127765.0 | True | 127757.0 | 127776.0 | False | False |
| 822 | 757050730 | VisBIntTestDatacube | 2018-09-24 16:59:05.000000 | CAM2P.5 | 109457.0 | 109457.0 | True | 109450.0 | 109463.0 | False | False |
| 745 | 759914365 | VisBIntTestDatacube | 2018-10-02 19:04:04.000000 | CAM2P.3 | 109229.0 | 109229.0 | True | 109262.0 | 109267.0 | False | False |
| 21 | 760965194 | VisBIntTestDatacube | 2018-10-05 21:48:40.000000 | CAM2P.5 | 116009.0 | 116057.0 | False | 116071.0 | 116074.0 | False | False |
| 228 | 774704549 | VisualBehavior | 2018-11-07 18:32:44.000000 | CAM2P.4 | 145007.0 | 145012.0 | False | 145006.0 | 145007.0 | False | False |
| 229 | 775377832 | VisualBehavior | 2018-11-08 20:44:29.000000 | CAM2P.4 | 145218.0 | 145246.0 | False | 145245.0 | 145218.0 | False | False |
| 430 | 775749782 | VisualBehavior | 2018-11-09 16:53:36.000000 | CAM2P.5 | NaN | 4798.0 | False | NaN | 145213.0 | False | False |
| 431 | 776926613 | VisualBehavior | 2018-11-12 17:20:23.000000 | CAM2P.5 | NaN | NaN | False | NaN | NaN | False | False |
| 230 | 777022133 | VisualBehavior | 2018-11-12 18:57:13.000000 | CAM2P.4 | NaN | 144899.0 | False | NaN | 144881.0 | False | False |
| 432 | 778015591 | VisualBehavior | 2018-11-13 17:52:17.000000 | CAM2P.5 | 144976.0 | 144976.0 | True | 144999.0 | 145003.0 | False | False |
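
Whether mismatched sessions cluster by rig or in time can be examined with a rig-by-month count table. The sketch below uses five rows copied from the table above. Binning by calendar month and using `pd.crosstab` are illustrative choices, not steps taken in this notebook.

```python
import pandas as pd

# five rows copied from the table above
toy_bad = pd.DataFrame({
    'ophys_session_id': [729895074, 757050730, 759914365, 760965194, 774704549],
    'date_of_acquisition': [
        '2018-08-03 18:45:08.000000',
        '2018-09-24 16:59:05.000000',
        '2018-10-02 19:04:04.000000',
        '2018-10-05 21:48:40.000000',
        '2018-11-07 18:32:44.000000',
    ],
    'equipment_name': ['CAM2P.5', 'CAM2P.5', 'CAM2P.3', 'CAM2P.5', 'CAM2P.4'],
})

# bin acquisition dates by calendar month (an illustrative choice of bin width)
month = pd.to_datetime(toy_bad['date_of_acquisition']).dt.strftime('%Y-%m')

# rows: rig, columns: month, values: number of mismatched sessions
counts = pd.crosstab(toy_bad['equipment_name'], month)
print(counts)
```

A table dominated by one row or one column would point to a rig-specific or time-limited problem, whereas counts spread across cells are consistent with the observation above.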


### sessions where the behavior monitoring frame count does not match the associated sync line (excluding the likely switched lines)
There are 78 of these and they don't appear to be clustered by rig or in time

In [12]:
query_string = 'behavior_counts_match == False and likely_label_switch == False'
bad_behavior_sessions = all_experiments.query(query_string)
print(len(bad_behavior_sessions))
bad_behavior_sessions[cols_to_show].sort_values(by='date_of_acquisition')

78


| | ophys_session_id | project_code | date_of_acquisition | equipment_name | behavior_monitoring_frame_count | behavior_monitoring_sync_count | behavior_counts_match | eye_tracking_frame_count | eye_tracking_sync_count | eyetracking_counts_match | likely_label_switch |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 817 | 744914467 | VisBIntTestDatacube | 2018-08-31 19:03:04.000000 | CAM2P.3 | 110482.0 | 110484.0 | False | 110495.0 | 110495.0 | True | False |
| 656 | 750369782 | VisBIntTestDatacube | 2018-09-10 20:28:44.000000 | CAM2P.3 | 142843.0 | 142863.0 | False | 142872.0 | 142872.0 | True | False |
| 566 | 754599570 | VisBIntTestDatacube | 2018-09-18 15:25:06.000000 | CAM2P.3 | 109180.0 | 109187.0 | False | 109199.0 | 109199.0 | True | False |
| 744 | 759663321 | VisBIntTestDatacube | 2018-10-01 17:27:32.000000 | CAM2P.3 | 109550.0 | 109551.0 | False | 109583.0 | 109583.0 | True | False |
| 746 | 760885292 | VisBIntTestDatacube | 2018-10-05 17:45:46.000000 | CAM2P.3 | 109138.0 | 109143.0 | False | 109148.0 | 109148.0 | True | False |
| 21 | 760965194 | VisBIntTestDatacube | 2018-10-05 21:48:40.000000 | CAM2P.5 | 116009.0 | 116057.0 | False | 116071.0 | 116074.0 | False | False |
| 228 | 774704549 | VisualBehavior | 2018-11-07 18:32:44.000000 | CAM2P.4 | 145007.0 | 145012.0 | False | 145006.0 | 145007.0 | False | False |
| 229 | 775377832 | VisualBehavior | 2018-11-08 20:44:29.000000 | CAM2P.4 | 145218.0 | 145246.0 | False | 145245.0 | 145218.0 | False | False |
| 430 | 775749782 | VisualBehavior | 2018-11-09 16:53:36.000000 | CAM2P.5 | NaN | 4798.0 | False | NaN | 145213.0 | False | False |
| 431 | 776926613 | VisualBehavior | 2018-11-12 17:20:23.000000 | CAM2P.5 | NaN | NaN | False | NaN | NaN | False | False |
