## fill_imaging_batch_table
It became clear that imaging batches were going to be necessary, just like clearing batches. The solution is to create an ImagingBatch() table in the schema, analogous to ClearingBatch(), and then add a foreign key from the Sample() table to ImagingBatch(). Because a foreign key can only point to rows that already exist, the ImagingBatch() table needs to be populated before the foreign key can be added to a revised Sample() table. 

In this notebook, I populate the ImagingBatch() table from existing requests. 

In [1]:
import pickle
import os.path, sys
from datetime import datetime
import pandas as pd
import numpy as np
import datajoint as dj
dj.config["enable_python_native_blobs"] = True # So I can store python dictionaries in blob columns

## Connect to the db

In [2]:
dj.config['database.host'] = 'datajoint00.pni.princeton.edu'
dj.conn()

Please enter DataJoint username: ahoag
Please enter DataJoint password: ········
Connecting ahoag@datajoint00.pni.princeton.edu:3306


DataJoint connection (connected) ahoag@datajoint00.pni.princeton.edu:3306

In [3]:
# set up object for light sheet schema
db_lightsheet = dj.create_virtual_module('ahoag_lightsheet_demo','ahoag_lightsheet_demo')
# db_lightsheet = dj.create_virtual_module('u19lightserv_lightsheet','u19lightserv_lightsheet')

## Setup for the ingestion

In [4]:
# Here are the columns
db_lightsheet.Request.ImagingBatch()

username,request_name,imaging_batch_number,imager,number_in_imaging_batch,imaging_request_date_submitted,imaging_request_time_submitted,imaging_performed_date,imaging_progress,imaging_dict
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",1,,8,2019-11-18,17:22:11,,complete,=BLOB=
ahoag,test123,1,,1,2020-10-02,16:22:55,,in progress,=BLOB=
ahoag,test_3p6x,1,,1,2020-09-25,12:55:11,,in progress,=BLOB=
ahoag,test_many_samples,1,,10,2020-10-06,12:41:53,,incomplete,=BLOB=
ahoag,test_many_samples2,1,,3,2020-10-16,9:54:39,,in progress,=BLOB=
ahoag,test_many_samples2,2,,1,2020-10-16,9:54:39,,in progress,=BLOB=
ahoag,test_many_samples3,1,,2,2020-10-16,11:58:47,,incomplete,=BLOB=
ahoag,test_many_samples3,2,,1,2020-10-16,11:58:47,,incomplete,=BLOB=
ahoag,test_udisco,1,,1,2020-10-01,9:21:24,,incomplete,=BLOB=
ahoag,test_udisco_rat,1,,1,2020-10-01,10:05:34,,in progress,=BLOB=


Most of this information can be obtained from the ImagingRequest() table.

The part that needs care is assigning imaging_batch_number. Samples in a request belong to the same imaging batch only if the same resolutions and channels were requested for them, so I need to work out the imaging batches from the resolutions and channels requested in each request. 
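As a rough illustration of the grouping idea, here is a minimal pure-Python sketch that does not touch the database. The function name `assign_imaging_batches` and the example sample settings are hypothetical and only meant to show the logic: samples with identical `{resolution: [channels]}` settings share a batch number, and a new batch is opened whenever a sample's settings have not been seen before.

```python
def assign_imaging_batches(imaging_dict):
    """Group samples whose {resolution: [channels]} settings are identical.

    Returns (batches, sample_to_batch), where batches maps a 1-based batch
    number to {'number_in_imaging_batch': int, 'imaging_dict': dict}.
    """
    batches = {}
    sample_to_batch = {}
    for sample_name, settings in imaging_dict.items():
        # Look for an existing batch with exactly the same settings
        for batch_number, batch in batches.items():
            if batch['imaging_dict'] == settings:
                batch['number_in_imaging_batch'] += 1
                sample_to_batch[sample_name] = batch_number
                break
        else:
            # No match: this sample starts a new batch
            batch_number = len(batches) + 1
            batches[batch_number] = {'number_in_imaging_batch': 1,
                                     'imaging_dict': settings}
            sample_to_batch[sample_name] = batch_number
    return batches, sample_to_batch

# Hypothetical request: samples 1 and 3 share settings, sample 2 differs
example = {
    'sample-001': {'1.3x': ['488', '647']},
    'sample-002': {'4x': ['555']},
    'sample-003': {'1.3x': ['488', '647']},
}
batches, sample_to_batch = assign_imaging_batches(example)
print(sample_to_batch)
```

Note that the third sample joins batch 1 even though batch 2 was created in between, so the count has to be incremented on the matching batch rather than on the most recently created one.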

In [5]:
request_contents = db_lightsheet.Request().fetch(as_dict=True)
sample_contents = db_lightsheet.Request.Sample()
imaging_request_contents = db_lightsheet.Request.ImagingRequest()
channel_contents = db_lightsheet.Request.ImagingChannel()

In [81]:
imaging_batch_insert_list = []
for request_dict in request_contents:
    imaging_dict = {}
    username = request_dict['username']
    request_name = request_dict['request_name']
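    # Debugging filter: only the test123 request is processed in this run.
    # Remove these two lines to process every request.
    if request_name != "test123":
        continue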
    print(f"Username: {username}")
    print(f"Request Name: {request_name}")
    print()
    date_submitted = request_dict['date_submitted']
    time_submitted = request_dict['time_submitted']
    imaging_batch_master_dict = {
        'username':username,
        'request_name':request_name,
        'imaging_request_date_submitted':date_submitted,
        'imaging_request_time_submitted':time_submitted,
        'imaging_progress':'incomplete'
    }
    # loop through samples and figure out imaging batches
    sample_contents_this_request = (sample_contents & \
        {'username':username,'request_name':request_name}).fetch(as_dict=True)
    for sample_dict in sample_contents_this_request:
        sample_name = sample_dict['sample_name']
        
        imaging_dict[sample_name] = {}
        imaging_request_this_sample = imaging_request_contents & \
            {'username':username,'request_name':request_name,'sample_name':sample_name}
        channel_contents_this_sample = channel_contents & \
            {'username':username,'request_name':request_name,'sample_name':sample_name}
        for channel_dict in channel_contents_this_sample:
            image_resolution = channel_dict['image_resolution']
            channel_name = channel_dict['channel_name']
            if image_resolution in imaging_dict[sample_name].keys():
                imaging_dict[sample_name][image_resolution].append(channel_name)
            else:
                imaging_dict[sample_name][image_resolution] = [channel_name]
    print(imaging_dict)
    # Count up numbers in each batch
    used_dicts_dict = {} # {batch_number:imaging_dict}
    batch_counter_dict = {} # {1:{'number_in_batch':1,'imaging_dict':{},}}
    batch_number = 0
    for sample_name in imaging_dict.keys():
        if imaging_dict[sample_name] in used_dicts_dict.values():
#             print(f"sample {sample_name} belongs to an existing imaging batch")
            # Figure out the imaging batch this sample belongs to
            this_batch_number = next(key for key in used_dicts_dict
                if used_dicts_dict[key] == imaging_dict[sample_name])
            # Increment the matching batch, which is not necessarily the most recent one
            batch_counter_dict[this_batch_number]['number_in_imaging_batch'] +=1
            continue
        else:
            batch_number +=1
#             print(f"sample {sample_name} is first in batch number {batch_number}")
            # make a new entry
            batch_counter_dict[batch_number]={'number_in_imaging_batch':1,
                'imaging_dict':imaging_dict[sample_name]}
            used_dicts_dict[batch_number] = imaging_dict[sample_name]
            
    # Loop through batch_counter_dict and make one insert dict per batch
    for batch_number in batch_counter_dict.keys():
        imaging_batch_insert_dict = imaging_batch_master_dict.copy()
        imaging_batch_insert_dict['imaging_batch_number'] = batch_number
        number_in_batch = batch_counter_dict[batch_number]['number_in_imaging_batch']
        imaging_batch_insert_dict['number_in_imaging_batch'] = number_in_batch
        imaging_dict_this_batch = batch_counter_dict[batch_number]['imaging_dict']
        imaging_batch_insert_dict['imaging_dict'] = imaging_dict_this_batch
        imaging_batch_insert_list.append(imaging_batch_insert_dict)
# print(imaging_batch_insert_list)
# db_lightsheet.Request.ImagingBatch.insert(imaging_batch_insert_list,skip_duplicates=True)
# imaging_dict
        

Username: ahoag
Request Name: test123

{'test123-001': {'1.3x': ['488', '647']}}


In [51]:
# Here are the columns
db_lightsheet.Request.ImagingBatch()

username,request_name,imaging_batch_number,imager,number_in_imaging_batch,imaging_request_date_submitted,imaging_request_time_submitted,imaging_performed_date,imaging_progress,imaging_dict
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",1,,8,2019-11-18,17:22:11,,incomplete,=BLOB=
ahoag,test123,1,,1,2020-10-02,16:22:55,,incomplete,=BLOB=
ahoag,test_3p6x,1,,1,2020-09-25,12:55:11,,incomplete,=BLOB=
ahoag,test_many_samples,1,,10,2020-10-06,12:41:53,,incomplete,=BLOB=
ahoag,test_many_samples2,1,,3,2020-10-16,9:54:39,,incomplete,=BLOB=
ahoag,test_many_samples2,2,,1,2020-10-16,9:54:39,,incomplete,=BLOB=
ahoag,test_many_samples3,1,,2,2020-10-16,11:58:47,,incomplete,=BLOB=
ahoag,test_many_samples3,2,,1,2020-10-16,11:58:47,,incomplete,=BLOB=
ahoag,test_udisco,1,,1,2020-10-01,9:21:24,,incomplete,=BLOB=
ahoag,test_udisco_rat,1,,1,2020-10-01,10:05:34,,incomplete,=BLOB=


OK, that seemed to work. Now let's build the rows for the new Sample table (SampleCopy() below) with the correct imaging batch numbers.

In [84]:
sample_reinsert_list = []
for sample_dict in sample_contents:
    # Figure out which batch this corresponds to 
    sample_reinsert_dict = sample_dict.copy()
    username = sample_dict['username']
    sample_name = sample_dict['sample_name']
    request_name = sample_dict['request_name']
    imaging_dict_this_sample = {}
    imaging_request_this_sample = imaging_request_contents & \
        {'username':username,'request_name':request_name,'sample_name':sample_name}
    channel_contents_this_sample = channel_contents & \
        {'username':username,'request_name':request_name,'sample_name':sample_name}
    for channel_dict in channel_contents_this_sample:
        image_resolution = channel_dict['image_resolution']
        channel_name = channel_dict['channel_name']
        if image_resolution in imaging_dict_this_sample.keys():
            imaging_dict_this_sample[image_resolution].append(channel_name)
        else:
            imaging_dict_this_sample[image_resolution] = [channel_name]
    # Now check the imaging batch entries for a match to this imaging dict
    # Restrict by username as well, in case two users share a request name
    imaging_batch_contents_this_request =  db_lightsheet.Request.ImagingBatch() & \
        {'username':username,'request_name':request_name}
    print(f"Sample name: {sample_name}")
    for imaging_batch_dict in imaging_batch_contents_this_request:
#         print("trying to match to imaging dict:")
#         print(imaging_batch_dict['imaging_dict'])
        if imaging_dict_this_sample == imaging_batch_dict['imaging_dict']:
            imaging_batch_number = imaging_batch_dict['imaging_batch_number']
            sample_reinsert_dict['imaging_batch_number'] = imaging_batch_number
            break
    else:
        print(f"No match to this imaging batch dict for sample {sample_name}:")
        print(imaging_dict_this_sample)
#     print()
    sample_reinsert_list.append(sample_reinsert_dict)

Sample name: sample-001
Sample name: sample-002
Sample name: sample-003
Sample name: sample-004
Sample name: sample-005
Sample name: sample-006
Sample name: sample-007
Sample name: sample-008
Sample name: test123-001
Sample name: test_3p6x-001
Sample name: test_many_samples-001
Sample name: test_many_samples-002
Sample name: test_many_samples-003
Sample name: test_many_samples-004
Sample name: test_many_samples-005
Sample name: test_many_samples-006
Sample name: test_many_samples-007
Sample name: test_many_samples-008
Sample name: test_many_samples-009
Sample name: test_many_samples-010
Sample name: test_many_samples2-001
Sample name: test_many_samples2-002
Sample name: test_many_samples2-003
Sample name: test_many_samples2-004
Sample name: test_many_samples3-001
Sample name: test_many_samples3-002
Sample name: test_many_samples3-003
Sample name: test_udisco-001
Sample name: test_udisco_rat-001
Sample name: sample-001
Sample name: sample-001
Sample name: sample-001
Sample name: sample-

Sample name: sample-007
Sample name: sample-008
Sample name: sample-009
Sample name: sample-010
Sample name: sample-011
Sample name: sample-001
Sample name: sample-002
Sample name: sample-003
Sample name: sample-004
Sample name: sample-005
Sample name: sample-006
Sample name: sample-007
Sample name: sample-008
Sample name: sample-001
Sample name: sample-002
Sample name: sample-003
Sample name: sample-004
Sample name: sample-005
Sample name: sample-006
Sample name: sample-007
Sample name: sample-008
Sample name: sample-009
Sample name: sample-010
Sample name: sample-011
Sample name: sample-001
Sample name: sample-001
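One caveat with matching on the imaging dictionary: Python compares lists element by element, so `['488', '647']` and `['647', '488']` are not equal even though they describe the same channels. The sketch below shows one way to make the lookup insensitive to channel order. The helper names `normalize` and `find_imaging_batch_number` and the example rows are hypothetical, not part of the schema.

```python
def normalize(settings):
    """Return an order-insensitive form of {resolution: [channels]}."""
    return {resolution: sorted(channels)
            for resolution, channels in settings.items()}

def find_imaging_batch_number(sample_settings, batch_rows):
    """Return the imaging_batch_number of the first row whose imaging_dict
    matches sample_settings (ignoring channel order), or None if none match."""
    target = normalize(sample_settings)
    for row in batch_rows:
        if normalize(row['imaging_dict']) == target:
            return row['imaging_batch_number']
    return None

# Hypothetical ImagingBatch rows for a single request
batch_rows = [
    {'imaging_batch_number': 1, 'imaging_dict': {'1.3x': ['488', '647']}},
    {'imaging_batch_number': 2, 'imaging_dict': {'4x': ['555']}},
]
match = find_imaging_batch_number({'1.3x': ['647', '488']}, batch_rows)
no_match = find_imaging_batch_number({'1.1x': ['790']}, batch_rows)
print(match, no_match)
```

Returning `None` for an unmatched sample makes it easy to report the sample instead of silently inserting it without a batch number.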


In [85]:
db_lightsheet.Request.SampleCopy()

username,request_name,sample_name,clearing_protocol,antibody1,antibody2,clearing_batch_number,imaging_batch_number,subject_fullname
,,,,,,,,


In [86]:
db_lightsheet.Request.SampleCopy().insert(sample_reinsert_list)

In [87]:
db_lightsheet.Request.SampleCopy()

username,request_name,sample_name,clearing_protocol,antibody1,antibody2,clearing_batch_number,imaging_batch_number,subject_fullname
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-001,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-002,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-003,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-004,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-005,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-006,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-007,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",sample-008,iDISCO+_immuno,SySy Rb antiFos 1:1000,Donkey antiRabbit 488 1:500,1,1,
ahoag,test123,test123-001,iDISCO abbreviated clearing,,,1,1,
ahoag,test_3p6x,test_3p6x-001,iDISCO abbreviated clearing,,,1,1,


Finally, I need to go back and update the imaging_progress column of the ImagingBatch() table based on the imaging_progress of the corresponding entries in the ImagingRequest() table. 
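The rule applied in the update cell of this notebook can be sketched as a small standalone function: a batch is 'complete' only if every imaging request in it is complete, 'in progress' if any request is in progress, and 'incomplete' otherwise. The function name `summarize_imaging_progress` is hypothetical. As an added assumption, an empty list is treated as 'incomplete' here, because `all()` on an empty sequence returns `True` and would otherwise report an empty batch as complete.

```python
def summarize_imaging_progress(progress_values):
    """Collapse per-request progress strings into one batch-level status."""
    progress_values = list(progress_values)
    # Guard against the empty case: all([]) is True in Python
    if progress_values and all(p == 'complete' for p in progress_values):
        return 'complete'
    if any(p == 'in progress' for p in progress_values):
        return 'in progress'
    return 'incomplete'

print(summarize_imaging_progress(['complete', 'complete']))
print(summarize_imaging_progress(['complete', 'in progress']))
print(summarize_imaging_progress(['complete', 'incomplete']))
```

With this rule, a batch that mixes complete and not-yet-started requests is still reported as 'incomplete', which matches the branch order used in this notebook.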


In [96]:
batch_contents = db_lightsheet.Request.ImagingBatch()
batch_contents

username,request_name,imaging_batch_number,imager,number_in_imaging_batch,imaging_request_date_submitted,imaging_request_time_submitted,imaging_performed_date,imaging_progress,imaging_dict
afalkner,"MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6",1,,8,2019-11-18,17:22:11,,incomplete,=BLOB=
ahoag,test123,1,,1,2020-10-02,16:22:55,,incomplete,=BLOB=
ahoag,test_3p6x,1,,1,2020-09-25,12:55:11,,incomplete,=BLOB=
ahoag,test_many_samples,1,,10,2020-10-06,12:41:53,,incomplete,=BLOB=
ahoag,test_many_samples2,1,,3,2020-10-16,9:54:39,,incomplete,=BLOB=
ahoag,test_many_samples2,2,,1,2020-10-16,9:54:39,,incomplete,=BLOB=
ahoag,test_many_samples3,1,,2,2020-10-16,11:58:47,,incomplete,=BLOB=
ahoag,test_many_samples3,2,,1,2020-10-16,11:58:47,,incomplete,=BLOB=
ahoag,test_udisco,1,,1,2020-10-01,9:21:24,,incomplete,=BLOB=
ahoag,test_udisco_rat,1,,1,2020-10-01,10:05:34,,incomplete,=BLOB=


In [126]:
for batch_dict in batch_contents:
    # find samples in this request with this imaging_batch_number
    username = batch_dict['username']
    request_name = batch_dict['request_name']
    imaging_batch_number = batch_dict['imaging_batch_number']
    print(username,request_name,imaging_batch_number)
    current_imaging_progress = batch_dict['imaging_progress']
    this_batch_contents = batch_contents & \
        {'username':username,
         'request_name':request_name,'imaging_batch_number':imaging_batch_number}
    sample_contents_this_batch = sample_contents & \
        {'username':username,
         'request_name':request_name,'imaging_batch_number':imaging_batch_number}
    sample_names = tuple(sample_contents_this_batch.fetch('sample_name'))
    if len(sample_names) > 1:
        imaging_request_contents_this_batch = imaging_request_contents & \
            {'username':username,
             'request_name':request_name} & f'sample_name IN {sample_names}'
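    elif len(sample_names) == 1:
        imaging_request_contents_this_batch = imaging_request_contents & \
            {'username':username,'request_name':request_name,'sample_name':sample_names[0]}
    else:
        # No samples are assigned to this batch: skip it rather than reuse
        # the query left over from the previous iteration
        continue
    imaging_progress_array = imaging_request_contents_this_batch.fetch('imaging_progress')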
    if all([x=='complete' for x in imaging_progress_array]):
        imaging_progress = 'complete'
    elif any([x=='in progress' for x in imaging_progress_array]):
        imaging_progress = 'in progress'
    else:
        imaging_progress = 'incomplete'
    # Now update the ImagingBatch entry
    if imaging_progress != current_imaging_progress:
        dj.Table._update(this_batch_contents,'imaging_progress',imaging_progress)

afalkner MFNP2,_MFNP3,_MMNP4,_MMNP5,_MMNP6,_FMNP4,_FMNP5,_FMNP6 1
ahoag test123 1
ahoag test_3p6x 1
ahoag test_many_samples 1
ahoag test_many_samples2 1
ahoag test_many_samples2 2
ahoag test_many_samples3 1
ahoag test_many_samples3 2
ahoag test_udisco 1
ahoag test_udisco_rat 1
apv2 20190313_IBL_DiI_1 1
apv2 ibl_witten_04 1
ejdennis 10-13_brains,_names_TBD 1
ejdennis 201905_atlas00x_where_x=1:n 1
ejdennis 20190606_atlas00x_where_x=11-20 1
ejdennis E112,_E126,_E137 1
ejdennis K310_(CM-diI),_K315_(CM-diI),_K320,_K321,_K323,_K327,_K333,_K334 1
ejdennis W118,_K292,_K293,_K295,_K301,_K302,_K303,_K304,_K305,_K306,_K307 1
ejdennis W122,_W128,_E111 1
ejdennis X013 1
ejdennis X042 1
ejdennis X050 1
jverpeut 10032019_CNOtest 1
jverpeut AdultChronicD_MLI_Lawrence_(1-12_each_batch) 1
jverpeut an1-31 1
jverpeut CDymaze_1-10 1
jverpeut cruslat_ymaze_TiffanyP_6.20.19_(12_samples) 1
jverpeut cruslat_ymaze_TiffanyP_6.20.19_(13_samples) 1
jverpeut DREADDymaze 1
jverpeut EAAT4-_14_samples 1
jverpeut Linds