### This script builds the import spreadsheet used to create GS records.

Note that departments differ in which fields they require. For details, see Submitting DNA and Tissue Plates to Biorepository.docx, available at https://www.dropbox.com/s/jeme1tt7zb0668t/Submitting%20DNA%20and%20Tissue%20Plates%20to%20Biorepository.docx?dl=0

In [1]:
import pandas as pd

#### Import your FIMS spreadsheets and concatenate them (if there is more than one):

In [5]:
# Read the 'Samples' sheet from each FIMS download.
# tissueOtherCatalogNumbers is read as text so the barcodes are not converted to numbers.
fims01_df = pd.read_excel('../../../SIBN Projects/Djibouti/FIMS spreadsheets/geomeDownloads/djiboutiGeomeUpload.xlsx', 
                          sheet_name = 'Samples', dtype={'tissueOtherCatalogNumbers':str})
fims02_df = pd.read_excel('../../../SIBN Projects/Djibouti/FIMS spreadsheets/geomeDownloads/smMammP01GeomeUpload.xlsx', 
                          sheet_name = 'Samples', dtype={'tissueOtherCatalogNumbers':str})
# Stack the two sheets into one dataframe; sort = False leaves the columns unsorted.
specimenData_df = pd.concat([fims01_df, fims02_df], sort = False)
specimenData_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 112 entries, 0 to 93
Data columns (total 34 columns):
materialSampleID             112 non-null object
institutionCode              112 non-null object
kingdom                      112 non-null object
phylum                       112 non-null object
scientificName               112 non-null object
yearCollected                112 non-null int64
locality                     112 non-null object
country                      112 non-null object
tissuePlate                  112 non-null object
tissueWell                   112 non-null object
collectionCode               112 non-null object
catalogNumber                112 non-null int64
class                        18 non-null object
order                        18 non-null object
family                       18 non-null object
genus                        112 non-null object
specificEpithet              112 non-null object
voucherCatalogNumber         112 non-null object
dayCollected       
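
The `Int64Index: 112 entries, 0 to 93` line in the output above suggests that the row labels repeat after concatenation, because each sheet keeps its own numbering. That is harmless for the steps in this notebook, but it can surprise you if you later select rows by label. The sketch below uses two small made-up dataframes (`sheet_a` and `sheet_b` are hypothetical stand-ins, not the real FIMS files) to show the effect and how `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Hypothetical stand-ins for two FIMS sheets (made-up values)
sheet_a = pd.DataFrame({'materialSampleID': ['A1', 'A2', 'A3'],
                        'tissuePlate': ['P01', 'P01', 'P01']})
sheet_b = pd.DataFrame({'materialSampleID': ['B1', 'B2'],
                        'tissuePlate': ['P02', 'P02']})

# Default behaviour: each frame keeps its own row labels, so labels repeat
stacked = pd.concat([sheet_a, sheet_b], sort=False)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([sheet_a, sheet_b], sort=False, ignore_index=True)
print(renumbered.index.tolist())
```

If you adopt `ignore_index=True` in the cell above, the index line printed by `info()` will change accordingly.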

#### Tell pandas which columns to keep:

For birds & mammals, the required columns are:
Plate Name, 
Well position, 
2D Barcode (including leading zero), 
USNM #, 
Field # (if available), 
Tissue BR# (if known), 
Genus, 
Species, 
Collector, 
Genetic Sample type (e.g. tissue, DNA), 
Genetic Sample Preservative, 
Extraction method, 
GenBank Accession Number,
Marker
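
The "including leading zero" requirement for the 2D barcode is the likely reason the import cell reads `tissueOtherCatalogNumbers` with `dtype` set to `str`. The following is a minimal sketch of that behaviour, using `pd.read_csv` on an in-memory string instead of the real Excel files; the barcode values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical barcodes with a leading zero (not taken from the real spreadsheets)
raw = "tissueWell,tissueOtherCatalogNumbers\nA01,0214375891\nA02,0214375887\n"

# Without a dtype, pandas infers integers and the leading zero is lost
as_number = pd.read_csv(io.StringIO(raw))

# With dtype=str for that column, the text is kept exactly as written
as_text = pd.read_csv(io.StringIO(raw), dtype={'tissueOtherCatalogNumbers': str})

print(as_number['tissueOtherCatalogNumbers'].tolist())
print(as_text['tissueOtherCatalogNumbers'].tolist())
```

One caveat, stated as an assumption about your files: if Excel itself stored the barcode as a number, the zero is probably already gone before pandas reads it, so the column should be stored as text in the spreadsheet as well.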

In [8]:
# Build a new dataframe containing only the FIMS columns needed for the GS import
gsRecords_df = pd.DataFrame(specimenData_df, columns = ['tissuePlate', 'tissueWell', 'tissueOtherCatalogNumbers', 
                                                        'catalogNumber', 'tissueID', 'genus', 
                                                        'specificEpithet'])
gsRecords_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 112 entries, 0 to 93
Data columns (total 7 columns):
tissuePlate                  112 non-null object
tissueWell                   112 non-null object
tissueOtherCatalogNumbers    112 non-null object
catalogNumber                112 non-null int64
tissueID                     112 non-null object
genus                        112 non-null object
specificEpithet              112 non-null object
dtypes: int64(1), object(6)
memory usage: 7.0+ KB
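
One thing to watch for when selecting columns this way: `pd.DataFrame(..., columns=[...])` does not raise an error if a requested column is absent from the source dataframe. It creates that column and fills it with missing values. The sketch below, which uses a small made-up dataframe (`specimen_df` and `wanted` are hypothetical names), shows one way to check for absent columns before exporting.

```python
import pandas as pd

# Hypothetical stand-in for specimenData_df (made-up values)
specimen_df = pd.DataFrame({'tissuePlate': ['P01'],
                            'tissueWell': ['A01'],
                            'genus': ['Rattus']})

wanted = ['tissuePlate', 'tissueWell', 'genus', 'specificEpithet']

# Columns that were requested but do not exist in the source data
missing = [col for col in wanted if col not in specimen_df.columns]
print(missing)

# The selection still succeeds; the absent column is filled with NaN
subset = pd.DataFrame(specimen_df, columns=wanted)
print(subset['specificEpithet'].isna().all())
```

In the real output above, every column reports 112 non-null values, so no column was missing in that run.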


#### Add the columns that aren't in the FIMS spreadsheets:

In [9]:
# Assigning a single value to a new column applies it to every row
gsRecords_df['genetic sample type'] = 'DNA, RNA, Proteins; Whole genomic DNA'
gsRecords_df['extraction method'] = 'autogen'
gsRecords_df['marker'] = 'COI'
gsRecords_df.head()

| | tissuePlate | tissueWell | tissueOtherCatalogNumbers | catalogNumber | tissueID | genus | specificEpithet | genetic sample type | extraction method | marker |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Djibouti16_P01 | E10 | 214375891 | 602629 | AE7VU52 | Ichneumia | albicauda | DNA, RNA, Proteins; Whole genomic DNA | autogen | COI |
| 1 | Djibouti16_P01 | F11 | 214375887 | 602581 | AE7VV90 | Gerbillus | dasyurus | DNA, RNA, Proteins; Whole genomic DNA | autogen | COI |
| 2 | Djibouti16_P01 | D10 | 214375910 | 602592 | AE7VU44 | Rattus | rattus | DNA, RNA, Proteins; Whole genomic DNA | autogen | COI |
| 3 | Djibouti16_P01 | E11 | 214375890 | 602624 | AE7VV87 | Genetta | abyssinica | DNA, RNA, Proteins; Whole genomic DNA | autogen | COI |
| 4 | Djibouti16_P01 | H11 | 214375863 | 602618 | AE7VW11 | Ichneumia | albicauda | DNA, RNA, Proteins; Whole genomic DNA | autogen | COI |


#### Export the dataframe to an Excel spreadsheet. Make sure to give it a descriptive name:

In [10]:
gsRecords_df.to_excel('../../../../SIBN Biorepository/GS files sent to data managers/djibouti_mamm_2020_Apr_16.xlsx', 
                      index = False)
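
The file name above appears to follow a project, department, date pattern. If you want to generate names like that consistently, something along these lines could work. The helper `gs_filename` and its parameters are hypothetical and not part of the original notebook; it returns only the file name, so the directory still has to be prepended.

```python
from datetime import date

def gs_filename(project, department, when):
    """Build a name such as 'djibouti_mamm_2020_Apr_16.xlsx' (hypothetical helper)."""
    # %b is the abbreviated month name; it depends on the active locale
    stamp = when.strftime('%Y_%b_%d')
    return f"{project}_{department}_{stamp}.xlsx"

# Reproduce the name used in the export cell
print(gs_filename('djibouti', 'mamm', date(2020, 4, 16)))

# Or stamp a file with today's date
print(gs_filename('djibouti', 'mamm', date.today()))
```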