## Correcting the oversights in image file naming conventions for the February screen
 1. We numbered ALL of the image files sequentially. As a result, the pre and post images of the same plates are no longer linked by their numbers, and the February screening image files need to be renamed before image analysis. For example, 20200210_01_pre001.tif is the pre-assay image for the plates in 20200210_04_fin004.tif. This error has been noted and we will adjust for future screens.
 2. We will eliminate the 2nd numerical identifier in future assays because the scanner software already increments a counter for us. For example, 20200210_01_pre001.tif will become 20200210_pre001.tif in future assays.
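Once the files follow the corrected convention (`YYYYMMDD_preNNN.tif` / `YYYYMMDD_finNNN.tif`), pre and post images of the same plates share a date and a counter. The sketch below shows how they could be paired by parsing the names. The regular expression and the helper names `parse_image_name` and `pair_pre_and_fin` are illustrative assumptions and are not part of the original notebook.

```python
import re

# Hypothetical pattern for the corrected convention: date, stage, 3-digit counter.
IMAGE_NAME = re.compile(r"(?P<date>\d{8})_(?P<stage>pre|fin)(?P<counter>\d{3})\.tif")

def parse_image_name(filename):
    """Split e.g. '20200210_pre001.tif' into ('20200210', 'pre', 1).

    Returns None for names outside the convention (e.g. '.DS_Store').
    """
    match = IMAGE_NAME.fullmatch(filename)
    if match is None:
        return None
    return match["date"], match["stage"], int(match["counter"])

def pair_pre_and_fin(filenames):
    """Group names by (date, counter) so each pre image sits next to its post image."""
    pairs = {}
    for name in filenames:
        parsed = parse_image_name(name)
        if parsed is None:
            continue  # skip files that do not follow the convention
        date, stage, counter = parsed
        pairs.setdefault((date, counter), {})[stage] = name
    return pairs

pairs = pair_pre_and_fin(["20200210_pre001.tif", "20200210_fin001.tif", ".DS_Store"])
print(pairs)
# {('20200210', 1): {'pre': '20200210_pre001.tif', 'fin': '20200210_fin001.tif'}}
```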

In [5]:
import os
import pandas as pd
import glob
import pathlib
import fnmatch
import re

## Loading the image folder, retrieving the file names as originally entered, and writing the modified names back to the respective images.

#### Removing the arbitrary middle value (see the example above)


In [20]:
directory = "/Users/emilyfryer/Desktop/Neuroplant/Images/"

In [3]:

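# Strip the 2-digit middle counter: 20200210_01_pre001.tif -> 20200210_pre001.tif.
# Only names still in the original "YYYYMMDD_NN_" form match, so re-running is safe.
scanner_name = re.compile(r'(\.?\d{8}_)\d{2}_(.+)')
for filename in os.listdir(directory):
    match = scanner_name.fullmatch(filename)
    if match:
        os.rename(os.path.join(directory, filename),
                  os.path.join(directory, match.group(1) + match.group(2)))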
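Renaming files is hard to undo, so it can help to compute the planned changes first and inspect them before calling `os.rename`. The sketch below is a dry-run version of the middle-value removal. The helper name `plan_middle_value_removal` is hypothetical, and the pattern assumes the original scanner names look like `YYYYMMDD_NN_<stage>NNN.tif`, optionally with a leading `.` for iCloud placeholders.

```python
import re

# Matches the original scanner names, e.g. '20200210_01_pre001.tif'.
# The optional leading '.' also covers iCloud placeholders such as
# '.20200210_01_pre001.tif.icloud'.
SCANNER_NAME = re.compile(r"(\.?\d{8}_)\d{2}_(.+)")

def plan_middle_value_removal(filenames):
    """Return {old_name: new_name} without touching the disk.

    Names that do not match SCANNER_NAME (already renamed files, '.DS_Store')
    are left out, so the plan can be computed repeatedly.
    """
    plan = {}
    for name in filenames:
        match = SCANNER_NAME.fullmatch(name)
        if match:
            plan[name] = match.group(1) + match.group(2)
    return plan

print(plan_middle_value_removal(["20200210_01_pre001.tif", "20200210_pre001.tif", ".DS_Store"]))
# {'20200210_01_pre001.tif': '20200210_pre001.tif'}
```

In the notebook this would be called as `plan_middle_value_removal(os.listdir(directory))`. After the plan has been checked, each pair can be passed to `os.rename`.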



#### Creating lists of pre and fin filenames that can be sorted in ascending order. This is needed because `os.listdir` and `Path.glob` return files in arbitrary order, not in numerical or alphabetical order.

In [2]:
image_folder = pathlib.Path("/Users/emilyfryer/Desktop/Neuroplant/Images/")

In [3]:
# Collect the names of all pre and fin (post) images in the folder
pre_list = [img.name for img in image_folder.glob('*pre*')]
fin_list = [img.name for img in image_folder.glob('*fin*')]
print(fin_list)

['.20200217_fin007.tif.icloud', '20200210_fin003.tif', '20200228_fin017.tif', '20200228_fin016.tif', '.20200210_fin002.tif.icloud', '.20200210_fin001.tif.icloud', '20200228_fin018.tif', '20200214_fin004.tif', '20200214_fin005.tif', '20200217_fin009.tif', '20200217_fin008.tif', '.20200214_fin006.tif.icloud', '20200224_fin013.tif', '20200221_fin011.tif', '20200224_fin014.tif', '20200224_fin015.tif', '20200221_fin010.tif', '20200221_fin012.tif']
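The output above includes names like `.20200217_fin007.tif.icloud`. These are iCloud Drive placeholders for files that have not been downloaded to this machine, so they are not usable images. The sketch below shows one way to leave hidden entries out when listing a folder. The helper name `list_stage_images` is hypothetical, and dropping placeholders assumes the real files are downloaded before the renaming step.

```python
import pathlib
import tempfile

def list_stage_images(folder, stage):
    """Return the names of '<stage>' images in `folder`, skipping hidden files.

    Hidden entries include '.DS_Store' and iCloud placeholders such as
    '.20200217_fin007.tif.icloud'.
    """
    return [
        path.name
        for path in pathlib.Path(folder).glob(f"*{stage}*")
        if not path.name.startswith(".")
    ]

# Small self-contained demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["20200210_fin003.tif", ".20200210_fin001.tif.icloud", "20200210_pre001.tif"]:
        (pathlib.Path(tmp) / name).touch()
    found = sorted(list_stage_images(tmp, "fin"))
    print(found)  # ['20200210_fin003.tif']
```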


#### Creating functions that will sort the filenames in ascending order based on numeric values in the file name string

In [6]:
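def atoi(text):
    # Convert runs of digits to int so that, e.g., 10 sorts after 9
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split '20200210_fin003.tif' into ['', 20200210, '_fin', 3, '.tif'] for natural sorting
    return [atoi(c) for c in re.split(r'(\d+)', text)]

pre_list.sort(key=natural_keys)
fin_list.sort(key=natural_keys)
print(fin_list)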

['20200210_fin003.tif', '20200214_fin004.tif', '20200214_fin005.tif', '20200217_fin008.tif', '20200217_fin009.tif', '20200221_fin010.tif', '20200221_fin011.tif', '20200221_fin012.tif', '20200224_fin013.tif', '20200224_fin014.tif', '20200224_fin015.tif', '20200228_fin016.tif', '20200228_fin017.tif', '20200228_fin018.tif', '.20200210_fin001.tif.icloud', '.20200210_fin002.tif.icloud', '.20200214_fin006.tif.icloud', '.20200217_fin007.tif.icloud']
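The sketch below is a compact equivalent of the natural sort above, applied to unpadded names to show where it differs from a plain `sorted()`. For the zero-padded names in this folder (`fin001` to `fin018`), plain string sorting would give the same order. The key only changes the result once counters lose their padding. It also explains the output above: `'.20200210_fin001.tif.icloud'` splits into a list starting with `'.'`, which compares greater than the `''` that starts regular names, so iCloud placeholders land at the end.

```python
import re

def natural_keys(text):
    """Split 'fin10.tif' into ['fin', 10, '.tif'] so digit runs compare as numbers."""
    return [int(chunk) if chunk.isdigit() else chunk for chunk in re.split(r"(\d+)", text)]

names = ["fin10.tif", "fin2.tif", "fin1.tif"]
print(sorted(names))                    # ['fin1.tif', 'fin10.tif', 'fin2.tif']
print(sorted(names, key=natural_keys))  # ['fin1.tif', 'fin2.tif', 'fin10.tif']
```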


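#### Replacing the old file names with new ones based on each file's position in the sorted lists above. Now the pre and fin images map to each other by their numerical values.
<p>For example: 20200228_pre018.tif and 20200228_fin018.tif</p>
<p>Note: iCloud placeholders such as .20200210_fin001.tif.icloud sort to the end of the lists above, so those files should be downloaded before running this cell.</p>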

In [75]:
def renumbered_name(filename, sorted_names):
    # Keep 'YYYYMMDD_pre' / 'YYYYMMDD_fin' (first 12 characters) and replace the
    # counter with the file's 1-based position in the sorted list, zero-padded to 3 digits
    index_val = sorted_names.index(filename) + 1
    return filename[:12] + f'{index_val:03d}.tif'

for filename in os.listdir(directory):
    if fnmatch.fnmatch(filename, '*pre*') and filename in pre_list:
        new_name = renumbered_name(filename, pre_list)
    elif fnmatch.fnmatch(filename, '*fin*') and filename in fin_list:
        new_name = renumbered_name(filename, fin_list)
    else:
        continue
    old = os.path.join(directory, filename)
    new = os.path.join(directory, new_name)
    # os.rename silently overwrites on macOS/Linux, so refuse to clobber another file
    if new != old and os.path.exists(new):
        raise FileExistsError(new)
    os.rename(old, new)


## Now we need to modify filenames in the metadata so they match the new filenames in the directory

#### Read in the image metadata 

In [14]:

data = pd.read_csv("/Users/emilyfryer/Desktop/feb_analysis/metadata/Feb2020_ImageData (Responses) - Form Responses 1.csv")

In [57]:
# Check the dataframe structure and headers
data.head()

Unnamed: 0,Timestamp,Name of individual capturing image:,Date:,Image file name:,Is this the pre or post assay image?,Plate number in slot 1:,Plate number in slot 2:,Plate number in slot 3:,Plate number in slot 4:,Compound in slot 1:,Compound in slot 2:,Compound in slot 3:,Compound in slot 4:,Filenames_modified
3,2/10/20 14:55,Emily,2/10/20,20200210_fin004.tif,Post,NPS01_001,NPS01_002,NPS01_003,NPS01_004,Vanillin,Theophylline,Valproic acid,Quinine hydrochloride,20200210_fin001.tif
4,2/10/20 15:03,Emily,2/10/20,20200210_fin005.tif,Post,NPS01_005,NPS01_006,NPS01_007,NPS01_008,Physostigmine,Theobromine,Lobeline hydrochloride,DMSO,20200210_fin002.tif
5,2/10/20 15:06,Emily,2/10/20,20200210_fin006.tif,Post,NPS01_009,NPS01_010,NPS01_011,NPS01_012,Synephrine tartrate,Anisodamine hydrobromide,4-hydroxybenzoic acid,Carvone,20200210_fin003.tif
0,2/10/20 11:04,Emily,2/10/20,20200210_pre001.tif,Pre,NPS01_001,NPS01_002,NPS01_003,NPS01_004,Vanillin,Theophylline,Valproic acid,Quinine hydrochloride,20200210_pre001.tif
1,2/10/20 11:14,Emily,2/10/20,20200210_pre002.tif,Pre,NPS01_005,NPS01_006,NPS01_007,NPS01_008,Physostigmine,Theobromine,Lobeline hydrochloride,DMSO,20200210_pre002.tif


#### Removing the unnecessary middle values

In [16]:
# Drop the 2-digit middle counter, e.g. 20200210_01_pre001.tif -> 20200210_pre001.tif
data['Image file name:'] = data['Image file name:'].apply(lambda x: x[:9] + x[12:])


#### Sorting the dataframe by file name so we can be sure that its rows line up with the sorted list of new file names generated below

In [32]:
data = data.sort_values(by=['Image file name:'])
data

Unnamed: 0,Timestamp,Name of individual capturing image:,Date:,Image file name:,Is this the pre or post assay image?,Plate number in slot 1:,Plate number in slot 2:,Plate number in slot 3:,Plate number in slot 4:,Compound in slot 1:,Compound in slot 2:,Compound in slot 3:,Compound in slot 4:
3,2/10/20 14:55,Emily,2/10/20,20200210_fin004.tif,Post,NPS01_001,NPS01_002,NPS01_003,NPS01_004,Vanillin,Theophylline,Valproic acid,Quinine hydrochloride
4,2/10/20 15:03,Emily,2/10/20,20200210_fin005.tif,Post,NPS01_005,NPS01_006,NPS01_007,NPS01_008,Physostigmine,Theobromine,Lobeline hydrochloride,DMSO
5,2/10/20 15:06,Emily,2/10/20,20200210_fin006.tif,Post,NPS01_009,NPS01_010,NPS01_011,NPS01_012,Synephrine tartrate,Anisodamine hydrobromide,4-hydroxybenzoic acid,Carvone
0,2/10/20 11:04,Emily,2/10/20,20200210_pre001.tif,Pre,NPS01_001,NPS01_002,NPS01_003,NPS01_004,Vanillin,Theophylline,Valproic acid,Quinine hydrochloride
1,2/10/20 11:14,Emily,2/10/20,20200210_pre002.tif,Pre,NPS01_005,NPS01_006,NPS01_007,NPS01_008,Physostigmine,Theobromine,Lobeline hydrochloride,DMSO
2,2/10/20 14:52,Emily,2/10/20,20200210_pre003.tif,Pre,NPS01_009,NPS01_010,NPS01_011,NPS01_012,Synephrine tartrate,Anisodamine hydrobromide,4-hydroxybenzoic acid,Carvone
12,2/17/20 11:14,Emily,2/14/20,20200214_fin010.tif,Post,NPS01_013,NPS01_014,NPS01_015,NPS01_016,Naringenin,Brucine,Valproic acid,Curcumin
13,2/17/20 11:16,Emily,2/14/20,20200214_fin011.tif,Post,NPS01_017,NPS01_018,NPS01_019,NPS01_020,Quercetin,Picrotoxin,Securinine,DMSO
14,2/17/20 11:16,Emily,2/14/20,20200214_fin012.tif,Post,NPS01_021,NPS01_022,NPS01_023,NPS01_024,Jasmone,Rutin,Phenylethylamine hydrochloride,Carvone
6,2/14/20 12:13,Tessa,2/14/20,20200214_pre007.tif,Pre,NPS01_013,NPS01_014,NPS01_015,NPS01_016,Naringenin,Brucine,Valproic acid,Curcumin
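The new file names are attached to this dataframe by position, so one wrong or missing file would shift every row after it. Before that positional assignment, a quick check could confirm that each row's original name and its new name agree on date and stage. The sketch below assumes both lists use `YYYYMMDD_<stage>NNN.tif`, where the first 12 characters hold the date and stage. `check_alignment` is a hypothetical helper. Once `new_names` exists, it could be called as `check_alignment(data['Image file name:'].tolist(), new_names)`.

```python
def check_alignment(original_names, modified_names):
    """Confirm two positionally paired lists agree on date and stage.

    Raises ValueError on a length mismatch or on the first pair whose
    first 12 characters (e.g. '20200210_fin') differ.
    """
    if len(original_names) != len(modified_names):
        raise ValueError(f"{len(original_names)} metadata rows vs {len(modified_names)} files")
    for old, new in zip(original_names, modified_names):
        if old[:12] != new[:12]:
            raise ValueError(f"{old} paired with {new}")

check_alignment(["20200210_fin004.tif", "20200210_pre001.tif"],
                ["20200210_fin001.tif", "20200210_pre001.tif"])  # passes silently
```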


#### Retrieving the newly created file names as a list, sorting them to match the row order of the dataframe above, and adding the list to the dataframe as a new column. This could have been done in the 'for-loop' above but was overlooked.


In [25]:
# Skip hidden entries such as .DS_Store (and any .icloud placeholders) so only image files remain
new_names = sorted(f for f in os.listdir(directory) if not f.startswith('.'))

In [33]:
# Positional assignment: row i of the sorted dataframe receives new_names[i]
data['Filenames_modified'] = new_names
data.head()

Unnamed: 0,Timestamp,Name of individual capturing image:,Date:,Image file name:,Is this the pre or post assay image?,Plate number in slot 1:,Plate number in slot 2:,Plate number in slot 3:,Plate number in slot 4:,Compound in slot 1:,Compound in slot 2:,Compound in slot 3:,Compound in slot 4:,Filenames_modified
3,2/10/20 14:55,Emily,2/10/20,20200210_fin004.tif,Post,NPS01_001,NPS01_002,NPS01_003,NPS01_004,Vanillin,Theophylline,Valproic acid,Quinine hydrochloride,20200210_fin001.tif
4,2/10/20 15:03,Emily,2/10/20,20200210_fin005.tif,Post,NPS01_005,NPS01_006,NPS01_007,NPS01_008,Physostigmine,Theobromine,Lobeline hydrochloride,DMSO,20200210_fin002.tif
5,2/10/20 15:06,Emily,2/10/20,20200210_fin006.tif,Post,NPS01_009,NPS01_010,NPS01_011,NPS01_012,Synephrine tartrate,Anisodamine hydrobromide,4-hydroxybenzoic acid,Carvone,20200210_fin003.tif
0,2/10/20 11:04,Emily,2/10/20,20200210_pre001.tif,Pre,NPS01_001,NPS01_002,NPS01_003,NPS01_004,Vanillin,Theophylline,Valproic acid,Quinine hydrochloride,20200210_pre001.tif
1,2/10/20 11:14,Emily,2/10/20,20200210_pre002.tif,Pre,NPS01_005,NPS01_006,NPS01_007,NPS01_008,Physostigmine,Theobromine,Lobeline hydrochloride,DMSO,20200210_pre002.tif
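The positional assignment above depends on both lists being sorted the same way. Another approach, sketched below, is to record an `{original name: new name}` dictionary while renaming and look each row up in it with `pandas.Series.map`. Row order then no longer matters, and any name missing from the dictionary shows up as NaN. The `renames` dictionary and the two-row `metadata` table are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical rename record: name after removing the middle value -> renumbered name.
renames = {
    "20200210_fin004.tif": "20200210_fin001.tif",
    "20200210_pre001.tif": "20200210_pre001.tif",
}

metadata = pd.DataFrame({"Image file name:": ["20200210_pre001.tif", "20200210_fin004.tif"]})

# Series.map looks each name up in the dictionary; unmatched names become NaN.
metadata["Filenames_modified"] = metadata["Image file name:"].map(renames)
missing = metadata[metadata["Filenames_modified"].isna()]
print(metadata)
```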


In [34]:
data.to_csv("/Users/emilyfryer/Desktop/feb_analysis/metadata/modified_image_data.csv")

In [35]:
batch = pd.read_csv('/Users/emilyfryer/Desktop/feb_analysis/metadata/Assay Batch Data (Responses) - Form Responses 1.csv')

In [55]:
batch.head()

Unnamed: 0,Timestamp,Recorder's Name:,Date:,Temperature (C):,Humidity:,Date chemotaxis plates were poured:,Worm Strain in Well P:,Start Time for Plating Worms,End Time for Plating Worms,Chemotaxis Start Time:,Chemotaxis End Time:,Additional notes:,Worm Strain in Well Q:.1,Worm Strain in Well R:.1,Worm Strain in Well S:.1,Start Time Compounded Added to Plates:,End Time Compounded Added to Plates:,Modified_date
0,2/10/2020 15:15:50,Sylvia Fechner,2/10/2020,20.3,13.65 (low),2/5/2020,N2,1:05:00 PM,1:36:00 PM,1:36:00 PM,1:55:00 PM,1:56 start time after drying,Osm-9,Tax-4,Double,12:05:00 PM,,20200210
1,2/14/2020 15:03:56,Emily,2/14/2020,20.0,37%,2/5/2020,Osm-9,12:52:00 PM,1:18:00 PM,1:30:00 PM,2:30:00 PM,Plate ID: NPS01_021 Well P: Solvent side has ~...,N2,Tax-4,Double,11:40:00 AM,11:58:00 AM,20200214
2,2/17/2020 13:30:48,Tessa Logan,2/17/2020,23.7,25%,2/12/2020,Tax-4,11:49:00 AM,12:20:00 PM,12:35:00 PM,1:35:00 PM,Worm washing started 11:20; plate 27 and 28 th...,Double,N2,Osm-9,10:51:00 AM,11:05:00 AM,20200217
3,2/21/2020 17:03:45,Emily,2/21/2020,21.6,Lo,2/5/2020,Double,11:23:00 AM,11:55:00 AM,12:10:00 PM,1:10:00 PM,1/2 of chemotaxis plates poured on 12/5/20 and...,Tax-4,Osm-9,N2,10:35:00 AM,10:45:00 AM,20200221
4,2/24/2020 13:20:39,Sylvia Fechner,2/24/2020,23.3,33,2/12/2020,Tax-4,12:33:00 PM,12:58:00 PM,1:19:00 PM,2:19:00 PM,"half plates poured on 02/12, other half 02/19....",Osm-9,Double,N2,12:02:00 PM,12:16:00 PM,20200224


In [52]:
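def modify_dates(date):
    # Convert an M/D/YYYY form date to YYYYMMDD, e.g. '2/10/2020' -> '20200210'.
    # Splitting on '/' and zero-padding also handles '2/5/2020' and two-digit months.
    month, day, year = date.split('/')
    return year + month.zfill(2) + day.zfill(2)

batch['Modified_date'] = batch['Date:'].apply(modify_dates)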
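The sketch below does the same conversion with the standard library's `datetime` module. `strptime` also rejects impossible dates such as `2/30/2020` instead of silently producing a bad key. The helper name `to_yyyymmdd` is hypothetical. It assumes the form's `Date:` column is always written as month/day/four-digit year, which matches the values shown above.

```python
from datetime import datetime

def to_yyyymmdd(date_text):
    """Convert a form date such as '2/10/2020' to '20200210'.

    strptime accepts months and days with or without a leading zero and
    raises ValueError for dates that do not exist.
    """
    return datetime.strptime(date_text, "%m/%d/%Y").strftime("%Y%m%d")

print(to_yyyymmdd("2/10/2020"))  # 20200210
print(to_yyyymmdd("2/5/2020"))   # 20200205
```

In the notebook this could be applied as `batch['Date:'].apply(to_yyyymmdd)`.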

#### Dropping unnecessary columns and exporting the new dataframe as a .csv to be used for the image analysis

In [58]:
# errors='ignore' skips columns that are not present; none of these names appear in the
# batch.head() output above, which is why the call without it raised a KeyError
batch = batch.drop(columns=['Worm Strain in Well P: [Tax-4]', 'Worm Strain in Well P: [Osm-9]',
                            'Worm Strain in Well Q:', 'Worm Strain in Well P: [Double]',
                            'Worm Strain in Well R:', 'Worm Strain in Well S:'],
                   errors='ignore')

In [59]:
batch.to_csv('/Users/emilyfryer/Desktop/feb_analysis/metadata/feb_batch_data_mod.csv')
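`Modified_date` uses the same `YYYYMMDD` format as the first 8 characters of the image file names. If the image analysis links each image to its assay batch by date, which is an assumption here, the two exported tables could be joined as sketched below. The miniature tables are hypothetical, apart from the two temperatures, which are taken from the `batch.head()` output above. When `feb_batch_data_mod.csv` is read back in, passing `dtype={'Modified_date': str}` keeps the dates as strings. Otherwise pandas parses them as integers, and they will not match the string prefixes.

```python
import pandas as pd

# Hypothetical miniature versions of the two exported tables.
images = pd.DataFrame({"Filenames_modified": ["20200210_fin001.tif", "20200214_pre004.tif"]})
batches = pd.DataFrame({"Modified_date": ["20200210", "20200214"], "Temperature (C):": [20.3, 20.0]})

# The first 8 characters of each image name are the assay date (YYYYMMDD).
images["Modified_date"] = images["Filenames_modified"].str[:8]

# validate="many_to_one" raises if a date appears more than once in the batch table.
combined = images.merge(batches, on="Modified_date", how="left", validate="many_to_one")
print(combined)
```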