## SquamataAeroMag - Jupyter notebook for batch releasing Aeromagnetic data to ScienceBase

### Version 1.4

This module performs the following operations:
- Create list of data directories.
- Identify files accompanying data release.
- Create file listing for metadata XML markup.
- Identify and load Aeromagnetic *.CSV data file.
- Create file list of data release files
- Create User Editable Keywords Listing
- Create entity and attribute XML markup for all files being released.
- Poplulate metadata template
- Validate metadata; create error log; create HTML and FGDC Text versions of the metadata. (In development - use https://mrdata.usgs.gov/validation/ for validation in the interim)

Known issues needing repair:
- Add items here...

## Future development plans for SquamataSB

- Create all child metadata files from first example created in previous steps. 
- Batch upload files to ScienceBase.
- Batch remove files from ScienceBase. 
- Change ScienceBase parameters such as citation information, add orcid ids, add USGS CMS tags, etc. 

### Instructions
- Create a template xml format that contains boilerplate text common to all childeren in a data release.  Be sure this template contains the approriate curly bracket tags, {SquamataTagExample} used to populate the template using SquamataAMT.

### To execute a function/command select a cell and Hold-Shift + Press-Enter

**The 'r' signifies a string literal. Use for paths.**

Metadata wizard:  Advanced, Open In a jupyter Notebook?
Metadata Wizard 2.o from ScienceBase

v1.0: Read csv File into pandas
v1.1: Create Array of csv column headers, assign header array of channel descriptions and units, create entity and attribute text for csv file
v1.2 Create file listing, create editable keywords text
v1.3 Creat entity and attributes for files other than CSV
v1.4 Populate metadata template


In [None]:
# Phil Brown (pbrown@usgs.gov) 2019 Beta
# Working Python 3 Notebook used to facilitate the release of Audio Magnetotelluric (AMT) Data to ScienceBase.

In [None]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

In [1]:
# Load required Libraries
import sys
import os
import zipfile
import csv
#import pysb
import requests
import shutil
from shutil import copyfile
import zipfile
import datetime
import glob
from lxml import etree
import json
import pickle
import shutil
import fileinput
import json
import pandas as pd
import numpy as np
from IPython.core.display import display
from IPython.core.display import HTML
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode
import re
from ipywidgets import *
from IPython.display import display
from IPython.html.widgets import widgets
import datetime
import dateutil.parser
import time
from IPython.display import Javascript



# 1) Step One - Set Directory Paths
## Please set directory paths below
### Directory paths include
- Data Path: This is the path to the data, data structure should have a directory for each station
- Template Path: The path to the XML metadata template being used for the data.  This template should already include all information common to all child metadata files e.g. originators, larger work citation, etc.

In [2]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
csvPath = r"C:\CurrentWork\DataReleases\VillageGrove\Data" #The 'r' signifies a string literal. Use for paths.
magMetaDataTemplatePath = r"C:\CurrentWork\DataReleases\VillageGrove"
magMetaDataTemplateName = "VillageGrove-Colorado-USA-AirborneMagnetics-2012_TEMPLATE.xml"

In [3]:
#Check Paths for the fun of it
print ('The Mag CSV Data Path is: ' + '"' + csvPath + '"')
magMetaDataTemplatePath + "\\" + magMetaDataTemplateName

The Mag CSV Data Path is: "C:\CurrentWork\DataReleases\VillageGrove\Data"


'C:\\CurrentWork\\DataReleases\\VillageGrove\\VillageGrove-Colorado-USA-AirborneMagnetics-2012_TEMPLATE.xml'

In [4]:
#Explore data files and directory structure hosted below the provided provided parent data directory

In [5]:
#Produce directory listing of station (SB Object Children)
#Either set up the root directory with station subdirectories only or delete non-station directories from the list array
#mtDataDirList = os.walk(mtDataPath)
#mtDataDirList = [entry.path for entry in os.scandir(mtDataPath) if entry.is_dir()]
gridDirList = next(os.walk(csvPath))[0]
gridDirList

'C:\\CurrentWork\\DataReleases\\VillageGrove\\Data'

# Get path for the CSV file

In [6]:
#!!!!!!!!!Run this code and select A single station !!!!!!!!!
#!!!!!!!!!Skip the loop cell below and run cells sequentually !!!!!!!!!!
#!!!!!!!!!Use Shift-Enter to then execute each cell one by one!!!!!!!!!! 
#mtStationPath = mtDataPath + '\\' + mtDataDirList[0]
csvPath = csvPath + "\\"
csvPath

'C:\\CurrentWork\\DataReleases\\VillageGrove\\Data\\'

In [7]:

#Look for CSV file to load
csvList = glob.glob(os.path.join(csvPath , '*.csv'),  recursive=True)
csvFilePath = csvList[0]
print ('CSV File Path:\n' + csvFilePath)
csvPathArray = csvFilePath.split('\\')
csvFileName = str(csvPathArray[len(csvPathArray)-1])
print ('CSV File Name:\n' + csvFileName)

CSV File Path:
C:\CurrentWork\DataReleases\VillageGrove\Data\HighResolutionAeromagneticSurveyVillaGroveColorado2012.csv
CSV File Name:
HighResolutionAeromagneticSurveyVillaGroveColorado2012.csv


In [8]:
# Get file listing for supplemental information and entity and attribute section starting with the geosoftdatabase files
fileListing = ""
gdbList = glob.glob(os.path.join(csvPath, '*.gdb'),  recursive=True)
gdbListFormatted = []
#txtList
print ('Geosoft database, .gdb file only listing:\n')
for i in range(len(gdbList)):
  splitGdbList = gdbList[i].split('\\')
  gdbListFormatted.append(splitGdbList[len(splitGdbList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitGdbList[len(splitGdbList) - 1] + '\n'

#print ('File Listing:\n' + fileListing)
#Remove readme file from list.  May need to search an remove if it does not come up first in alphabetical order?'
#del txtListFormatted[0]
print (gdbListFormatted)

Geosoft database, .gdb file only listing:

['11003_VillaGrove_Final_Data.gdb', '11003_VillaGrove_Final_Data_NAVD88.gdb']


In [9]:
# Get file listing for supplemental information and entity and attribute section starting with the geosoftdatabase files
fileListing = ""
xmlList = glob.glob(os.path.join(csvPath, '*.xml'),  recursive=True)
xmlListFormatted = []
#txtList
print ('Geosoft database, .xml file only listing:\n')
for i in range(len(xmlList)):
  splitXmlList = gdbList[i].split('\\')
  xmlListFormatted.append(splitXmlList[len(splitXmlList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitXmlList[len(splitXmlList) - 1] + '.xml\n'

#print ('File Listing:\n' + fileListing)
#Remove readme file from list.  May need to search an remove if it does not come up first in alphabetical order?'
#del txtListFormatted[0]
print (xmlListFormatted)

Geosoft database, .xml file only listing:

['11003_VillaGrove_Final_Data.gdb', '11003_VillaGrove_Final_Data_NAVD88.gdb']


In [10]:
# Get file listing for geosoft grid files *.grd
grdList = glob.glob(os.path.join(csvPath + "\Grids", '*.grd'),  recursive=True)
grdListFormatted = []
#txtList
print ('Geosoft database, .grd file only listing:\n')
for i in range(len(grdList)):
  splitGrdList = grdList[i].split('\\')
  grdListFormatted.append(splitGrdList[len(splitGrdList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitGrdList[len(splitGrdList) - 1] + '\n'

#print ('File Listing:\n' + fileListing)
#Remove readme file from list.  May need to search an remove if it does not come up first in alphabetical order?'
#del txtListFormatted[0]
print (grdListFormatted)

Geosoft database, .grd file only listing:

['11003_VillaGrove_DTM.grd', '11003_VillaGrove_RadarAlt.grd', '11003_VillaGrove_TMF.grd']


In [11]:
# Get file listing for geosoft grid attribute files *.gxf
gxfList = glob.glob(os.path.join(csvPath + "\Grids", '*.gxf'),  recursive=True)
gxfListFormatted = []
#txtList
print ('Geosoft database, .gxf file only listing:\n')
for i in range(len(gxfList)):
  splitGxfList = gxfList[i].split('\\')
  gxfListFormatted.append(splitGxfList[len(splitGxfList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitGxfList[len(splitGxfList) - 1] + '\n'

#print ('File Listing:\n' + fileListing)
#Remove readme file from list.  May need to search an remove if it does not come up first in alphabetical order?'
#del txtListFormatted[0]
print (gxfListFormatted)

Geosoft database, .gxf file only listing:

['11003_VillaGrove_DTM.gxf', '11003_VillaGrove_RadarAlt.gxf', '11003_VillaGrove_TMF.gxf']


In [12]:
# Get file listing for the ReadMe file *.rtf
rtfList = glob.glob(os.path.join(csvPath, '*.rtf'),  recursive=True)
rtfListFormatted = []
print ('ReadMe file, .rtf file only listing:\n')
for i in range(len(rtfList)):
  splitRtfList = rtfList[i].split('\\')
  rtfListFormatted.append(splitRtfList[len(splitRtfList) - 1])
  fileListing = fileListing + '\t\t\t\t\t\t' + splitRtfList[len(splitRtfList) - 1] + '\n'

print (rtfListFormatted)

ReadMe file, .rtf file only listing:

['11003_VillaGrove_Final_Data_ReadMe.rtf']


In [13]:
# Get file listing for the report file *.pdf
pdfList = glob.glob(os.path.join(magMetaDataTemplatePath + "\Report", '*.pdf'),  recursive=True)
pdfListFormatted = []
#txtList
print ('Geosoft database, .pdf file only listing:\n')
for i in range(len(pdfList)):
  splitPdfList = pdfList[i].split('\\')
  pdfListFormatted.append(splitPdfList[len(splitPdfList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitPdfList[len(splitPdfList) - 1] + '\n'

print (pdfListFormatted)

Geosoft database, .pdf file only listing:

['11003_VillaGrove_Final_Survey_Report.pdf']


In [14]:
#Look at file listing
print ('File listing:\n' + fileListing)

File listing:
						11003_VillaGrove_Final_Data.gdb.xml
						11003_VillaGrove_Final_Data_NAVD88.gdb.xml
						11003_VillaGrove_DTM.grd
						11003_VillaGrove_RadarAlt.grd
						11003_VillaGrove_TMF.grd
						11003_VillaGrove_DTM.gxf
						11003_VillaGrove_RadarAlt.gxf
						11003_VillaGrove_TMF.gxf
						11003_VillaGrove_Final_Data_ReadMe.rtf
						11003_VillaGrove_Final_Survey_Report.pdf



## Create keywords text. 


In [15]:
## Create editable keywords example.  
## Example text is created after running this cell
## This text is displayed by running "display(keywords) below
keywords = widgets.Textarea(
    value='\t\t<keywords>\n\t\t\t<theme>\n\t\t\t\t<themekt>ISO 19115 Topic Category</themekt>' \
    + '\n\t\t\t\t<themekey>biota</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>None</themekt>' \
    + '\n\t\t\t\t<themekey>impedance</themekey>\n\t\t\t\t<themekey>tipper</themekey>' \
    + '\n\t\t\t\t<themekey>apparent resistivity</themekey>\n\t\t\t\t<themekey>impedance phase</themekey>' \
    + '\n\t\t\t\t<themekey>impedance strike</themekey>\n\t\t\t\t<themekey>MT</themekey>' \
    + '\n\t\t\t\t<themekey>audiomagnetotelluric</themekey>\n\t\t\t\t<themekey>magnetotelluric</themekey>' \
    + '\n\t\t\t\t<themekey>AMT</themekey>\n\t\t\t\t<themekey>sounding</themekey>' \
    + '\n\t\t\t\t<themekey>Geology, Geophysics, and Geochemistry Science Center</themekey>' \
    + '\n\t\t\t\t<themekey>GGGSC</themekey>\n\t\t\t\t<themekey>Mineral Resources Program</themekey>' \
    + '\n\t\t\t\t<themekey>MRP</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>USGS Thesaurus</themekt>' \
    + '\n\t\t\t\t<themekey>Magnetic field (earth)</themekey>\n\t\t\t\t<themekey>Geophysics</themekey>' \
    + '\n\t\t\t\t<themekey>GPS measurement</themekey>\n\t\t\t\t<themekey>Electromagnetic surveying</themekey>' \
    + '\n\t\t\t\t<themekey>Magnetic surveying</themekey>\n\t\t\t</theme>\n\t\t\t<place>' \
    + '\n\t\t\t\t<placekt>USGS Geographic Names Information System (GNIS), https://geonames.usgs.gov</placekt>' \
    + '\n\t\t\t\t<placekey>Colorado</placekey>\n\t\t\t\t<placekey>Silverton</placekey>' \
    + '\n\t\t\t\t<placekey>' + '{county} County</placekey>\n\t\t\t</place>\n\t\t</keywords>',
    placeholder='Type something',
    #description='String:',
    layout=Layout(width='100%', height='666px'),
    disabled=False
)
print ('Keywords list created.')

Keywords list created.


### Change the text in the textbox below to reflect what should be included as the key words for all child items

Note that changing the text below at any time creates a keywords section of the metadata seen EXACTLY as it is shown below

In [16]:
# Run this cell for key word text to edit.  
# Edit the text in place.  
# When complete move on to the next step

display(keywords)

## Lets now import and index values from the CSV file
- We need create arrays of description and unit values for the metadata template.  
- We also want to run stats on the csv columns for the entity and attributes section

In [17]:
# read file into pandas; note that can take over 1 minute depending on the size of the CSV file and computing power.

magDataFrame = pd.read_csv(csvFilePath, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
magDataFrame

Unnamed: 0,baroC,baseA,dir,fid10,hgps,julian,lat,line10,lon,m3l,...,mreslvl,mreslvld,mreslvldi,mreslvli,raltlc,x,x_NAD83,y,y_NAD83,z
0,3342.0,51498.129,70,62231.0,17.286392,2011.161644,38.515278,10,-106.226280,51466.078,...,51458.488,51457.086,-425.378,-423.975,150.23,393091.7,393091.7,4263701.0,4263701.0,3342.3
1,3341.8,51498.129,70,62231.1,17.286419,2011.161644,38.515285,10,-106.226250,51465.520,...,51457.949,51456.563,-425.910,-424.523,149.95,393094.5,393094.5,4263702.0,4263702.0,3342.1
2,3341.7,51498.129,70,62231.2,17.286444,2011.161644,38.515297,10,-106.226219,51464.973,...,51457.422,51456.051,-426.431,-425.060,149.67,393097.2,393097.2,4263703.0,4263703.0,3341.9
3,3341.5,51498.129,70,62231.3,17.286475,2011.161644,38.515305,10,-106.226189,51464.430,...,51456.898,51455.543,-426.948,-425.592,149.40,393099.9,393099.9,4263704.0,4263704.0,3341.7
4,3341.4,51498.129,70,62231.4,17.286500,2011.161644,38.515316,10,-106.226158,51463.898,...,51456.387,51455.047,-427.453,-426.113,149.14,393102.7,393102.7,4263705.0,4263705.0,3341.5
5,3341.2,51498.129,70,62231.5,17.286531,2011.161644,38.515324,10,-106.226120,51463.379,...,51455.883,51454.559,-427.950,-426.626,148.90,393105.4,393105.4,4263706.0,4263706.0,3341.3
6,3341.1,51498.129,70,62231.6,17.286558,2011.161644,38.515331,10,-106.226089,51462.871,...,51455.395,51454.086,-428.432,-427.123,148.68,393108.2,393108.2,4263707.0,4263707.0,3341.1
7,3340.9,51498.129,70,62231.7,17.286586,2011.161644,38.515343,10,-106.226059,51462.375,...,51454.918,51453.625,-428.902,-427.609,148.47,393110.9,393110.9,4263708.0,4263708.0,3340.9
8,3340.8,51498.129,70,62231.8,17.286614,2011.161644,38.515350,10,-106.226028,51461.883,...,51454.445,51453.168,-429.368,-428.091,148.27,393113.7,393113.7,4263709.0,4263709.0,3340.7
9,3340.6,51498.129,70,62231.9,17.286639,2011.161644,38.515362,10,-106.225998,51461.410,...,51453.996,51452.734,-429.810,-428.549,148.08,393116.4,393116.4,4263710.0,4263710.0,3340.5


In [18]:
# now get an array of the header names
magHeaderList = list(magDataFrame.columns.values)
magHeaderList

['baroC',
 'baseA',
 'dir',
 'fid10',
 'hgps',
 'julian',
 'lat',
 'line10',
 'lon',
 'm3l',
 'migrfz2',
 'mreslc',
 'mreslcb',
 'mreslvl',
 'mreslvld',
 'mreslvldi',
 'mreslvli',
 'raltlc',
 'x',
 'x_NAD83',
 'y',
 'y_NAD83',
 'z']

In [19]:
#Create Entity and Attribute Text:
csvAttributes = ''
csvEntityDescription = []
csvEntityUnit = []
csvEntityDescription.append("Baro altitude, edited; corrected for spikes, noise, adjusted to a constant level")#baroC
csvEntityUnit.append("meters")#baroC
csvEntityDescription.append("Base A Total Magnetic Field; mag base station, edited")#baseA
csvEntityUnit.append("nanoteslas")#baseA
csvEntityDescription.append("Direction (Degrees from North)")#dir
csvEntityUnit.append("degrees")#dir
csvEntityDescription.append("Fiducial Time, UTC seconds past midnight")#fid10
csvEntityUnit.append("seconds")#fid10
csvEntityDescription.append("Time")#hgps
csvEntityUnit.append("hours")#hgps
csvEntityDescription.append("Year and Julian Date")#julian
csvEntityUnit.append("YYYY.DDDDDD")#julian
csvEntityDescription.append("Latitude, WGS84")#lat
csvEntityUnit.append("degrees")#lat
csvEntityDescription.append("Line Number")#line10
csvEntityUnit.append("none")#line10
csvEntityDescription.append("Longitude, WGS84")#lon
csvEntityUnit.append("degrees")#lon
csvEntityDescription.append("Total Magnetic Field, lag removed")#m3l
csvEntityUnit.append("nanoteslas")#m3l
csvEntityDescription.append("Applied IGRF Field")#migrfz2
csvEntityUnit.append("nanoteslas")#migrfz2
csvEntityDescription.append("Total Magnetic Field, partial IGRF removed; partial IGRF to drape surface removed from m3l")#mreslc
csvEntityUnit.append("nanoteslas")#mreslc
csvEntityDescription.append("Total Magnetic Field, diurnals corrected;filtered diurnal signal removed from mreslc")#mreslcb
csvEntityUnit.append("nanoteslas")#mreslcb
csvEntityDescription.append("Total Magnetic Field, leveled;Intersection leveling correction applied on mreslcb")#mreslvl
csvEntityUnit.append("nanoteslas")#mreslvl
csvEntityDescription.append("Total Magnetic Field, micro-leveled; Micro-leveling correction applied on mreslvl")#mreslvld
csvEntityUnit.append("nanoteslas")#mreslvld
csvEntityDescription.append("TMF, micro-leveled, IGRF removed")#mreslvldi
csvEntityUnit.append("nanoteslas")#mreslvldi
csvEntityDescription.append("Total Magnetic Field, leveled, IGRF removed from mreslvl")#mreslvli
csvEntityUnit.append("nanoteslas")#mreslvli
csvEntityDescription.append("Radar altitude, edited; AGL, corrected for spikes, noise, adjusted through DTM leveling")#raltlc
csvEntityUnit.append("meters")#raltlc
csvEntityDescription.append("UTM Easting, WGS84, Z13N, Differential GPS")#x
csvEntityUnit.append("meters")#x
csvEntityDescription.append("UTM Easting, NAD83, Z13N, Differential GPS")#x_NAD83
csvEntityUnit.append("meters")#x_NAD83
csvEntityDescription.append("UTM Northing, WGS84, Z13N, Differential GPS")#y
csvEntityUnit.append("meters")#y
csvEntityDescription.append("UTM Northing, NAD83, Z13N, Differential GPS")#y_NAD83
csvEntityUnit.append("meters")#y_NAD83
csvEntityDescription.append("GPS altitude, MSL, from Differential GPS")#z
csvEntityUnit.append("meters")#z
#Now get a min and max value for all columns
csvMax = []
csvMin = []
entAttText = ""
csvHeaderListLength = len(magHeaderList)
for i in range(csvHeaderListLength):
 #print (magHeaderList[i])
 strCol = magHeaderList[i]
 csvMax.append(magDataFrame[strCol].max())
 csvMin.append(magDataFrame[strCol].min())
 print ('Max. Value ' + magHeaderList[i] + ' :' + str(csvMax[i]))
 print ('Min. Value ' + magHeaderList[i] + ' :' + str(csvMin[i]))
 entAtti = '\n\t\t\t\t<attr>\n\t\t\t\t\t<attrlabl>' + strCol + '</attrlabl>\n' \
        + '\t\t\t\t\t<attrdef>' + csvEntityDescription[i] + '</attrdef>' \
        + '\n\t\t\t\t\t<attrdefs>EON GEOSCIENCES INC.</attrdefs>' \
        + '\n\t\t\t\t\t<attrdomv>\n\t\t\t\t\t\t<rdom>\n\t\t\t\t\t\t\t<rdommin>'+ str(csvMin[i]) + '</rdommin>\n\t\t\t\t\t\t\t<rdommax>' \
        + str(csvMax[i]) + '</rdommax>\n\t\t\t\t\t\t\t' + '<attrunit>' + csvEntityUnit[i] + '</attrunit>\n\t\t\t\t\t\t</rdom>\n\t\t\t\t\t\t' \
        + '</attrdomv>\n' \
        + '\t\t\t\t</attr>'
 entAttText = entAttText + entAtti

csvAttributes = entAttText

print ('\nEntity and Attribute Text:\n' + csvAttributes)
 

Max. Value baroC :3680.7
Min. Value baroC :2250.4
Max. Value baseA :51524.102
Min. Value baseA :51475.102
Max. Value dir :340
Min. Value dir :70
Max. Value fid10 :88116.0
Min. Value fid10 :53023.0
Max. Value hgps :24.0016833333333
Min. Value hgps :0.0158361111111111
Max. Value julian :2011.2410958904102
Min. Value julian :2011.1616438356198
Max. Value lat :38.616119
Min. Value lat :37.803234
Max. Value line10 :8280
Min. Value line10 :10
Max. Value lon :-105.564178
Min. Value lon :-106.22628
Max. Value m3l :53003.703
Min. Value m3l :50260.0
Max. Value migrfz2 :51963.7
Min. Value migrfz2 :51550.7
Max. Value mreslc :53003.543
Min. Value mreslc :50259.539000000004
Max. Value mreslcb :52999.405999999995
Min. Value mreslcb :50267.148
Max. Value mreslvl :53001.715
Min. Value mreslvl :50262.75
Max. Value mreslvld :52998.379
Min. Value mreslvld :50267.59
Max. Value mreslvldi :1318.381
Min. Value mreslvldi :-1648.44
Max. Value mreslvli :1321.714
Min. Value mreslvli :-1653.28
Max. Value raltlc :4

In [20]:
#strip Tabs from string
stripFileListing = fileListing.replace('\t', '')
#Create array of values by splitting on line return \n
arrayFileListing = stripFileListing.splitlines()
arrayFileListing

['11003_VillaGrove_Final_Data.gdb.xml',
 '11003_VillaGrove_Final_Data_NAVD88.gdb.xml',
 '11003_VillaGrove_DTM.grd',
 '11003_VillaGrove_RadarAlt.grd',
 '11003_VillaGrove_TMF.grd',
 '11003_VillaGrove_DTM.gxf',
 '11003_VillaGrove_RadarAlt.gxf',
 '11003_VillaGrove_TMF.gxf',
 '11003_VillaGrove_Final_Data_ReadMe.rtf',
 '11003_VillaGrove_Final_Survey_Report.pdf']

In [21]:
# Create the Entity and Attributes for the other files being released besides the *.CSV data file
bFilesEntAttText = ''
bFilesEandA = ''
for i in range(len(arrayFileListing)):
    if arrayFileListing[i].find('.pdf') != -1:
        bFilesEandA = bFilesEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Binary File ' + arrayFileListing[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Contractors report in Adobe Portable Document Format (PDF).  This report describes an aeromagnetic survey over the Village Grove region of Colorado area under contract to the EON Geosciences Inc.</enttypd>' \
            + '\n\t\t\t\t<enttypds>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>report</attrlabl>' \
            + '\n\t\t\t\t<attrdef>Contractor report in Adobe PDF Format</attrdef>\n\t\t\t\t<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>This report describes the survey parameters, field operations, quality control results, and data reduction procedures used to produce the aeromagnetic data.</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n' 
    if arrayFileListing[i].find('.rtf') != -1:
        bFilesEandA = bFilesEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Rich Text File ' + arrayFileListing[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>ReadMe file in Rich Text Format containing *.CSV file entities, attributes, and processing notes</enttypd>' \
            + '\n\t\t\t\t<enttypds>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Rich Text File</attrlabl>\n\t\t\t\t<attrdef>ReadMe file in Rich Text Format (RTF) containing *.CSV file entities, attributes, and processing notes</attrdef>\n\t\t\t\t<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>ReadMe file in Rich Text Format containing *.CSV file entities, attributes, and processing notes</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
    if arrayFileListing[i].find('.gdb') != -1:
        bFilesEandA = bFilesEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Binary File ' + arrayFileListing[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Binary data base file in Geosoft (https://www.geosoft.com) Oasis montaj GRD format.  A free viewer is available for GRD files from Geosoft, visit https://www.geosoft.com/products/geosoft-viewer</enttypd>' \
            + '\n\t\t\t\t<enttypds>Geosoft Inc., Toronto, ON Canada</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Binary data base file</attrlabl>\n\t\t\t\t<attrdef>Binary data base file described in contractor report and ReadMe file</attrdef>\n\t\t\t\t<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>Binary Data Base File described in contractor report and ReadMe file</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
            
    if arrayFileListing[i].find('.xml') != -1:
        bFilesEandA = bFilesEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>ASCII XML File ' + arrayFileListing[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>ASCII text XML formatted metadata file describing the data base file of the same name.  A free viewer is available for GRD files from Geosoft, visit https://www.geosoft.com/products/geosoft-viewer</enttypd>' \
            + '\n\t\t\t\t<enttypds>Geosoft Inc., Toronto, ON Canada</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>ASCII XML formatted text file</attrlabl>\n\t\t\t\t<attrdef>ASCII XML formatted text file containing metadata describing the Geosoft data base file of the same name</attrdef>\n\t\t\t\t<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>ASCII XML formatted text file containing metadata describing the Geosoft data base file of the same name</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
            
    if arrayFileListing[i].find('.gxf') != -1:
        bFilesEandA = bFilesEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>ASCII Grid File ' + arrayFileListing[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>ASCII grid file in Geosoft (https://www.geosoft.com) Grid Exchange Format, GXF (https://www.geosoft.com/media/uploads/resources/technical-notes/gxfr3d9_1.pdf) </enttypd>' \
            + '\n\t\t\t\t<enttypds>Geosoft Inc., Toronto, ON Canada</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>ASCII Grid File</attrlabl>\n\t\t\t\t<attrdef>ASCII grid file in GXF format described in contractor report and ReadMe file, WGS84 UTM Zone 13N</attrdef>\n\t\t\t\t<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</attrdefs>\n'\
            + '\t\t\t\t<attrdomv>\n\t\t\t\t\t<udom>ASCII Grid File described in contractor report and ReadMe file, WGS84 UTM Zone 13N</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
    
    if arrayFileListing[i].find('.grd') != -1:
        bFilesEandA = bFilesEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Binary Grid File ' + arrayFileListing[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Binary grid file in Geosoft (https://www.geosoft.com) Oasis montaj GRD format.  A free viewer is available for GRD files from Geosoft, visit https://www.geosoft.com/products/geosoft-viewer</enttypd>' \
            + '\n\t\t\t\t<enttypds>Geosoft Inc., Toronto, ON Canada</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Binary grid file</attrlabl>\n\t\t\t\t<attrdef>Binary grid file described in contractor report and ReadMe file, WGS84 UTM Zone 13N</attrdef>\n\t\t\t\t<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</attrdefs>\n'\
            + '\t\t\t\t<attrdomv>\n\t\t\t\t\t<udom>Binary Grid File described in contractor report and ReadMe file, WGS84 UTM Zone 13N</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
    bFilesEntAttText = bFilesEandA
print (bFilesEntAttText) 


		<detailed>
			<enttyp>
				<enttypl>Binary File 11003_VillaGrove_Final_Data.gdb.xml</enttypl>
				<enttypd>Binary data base file in Geosoft (https://www.geosoft.com) Oasis montaj GRD format.  A free viewer is available for GRD files from Geosoft, visit https://www.geosoft.com/products/geosoft-viewer</enttypd>
				<enttypds>Geosoft Inc., Toronto, ON Canada</enttypds>
			</enttyp>
			<attr>
				<attrlabl>Binary data base file</attrlabl>
				<attrdef>Binary data base file described in contractor report and ReadMe file</attrdef>
				<attrdefs>EON Geosciences, Montreal, Quebec, Canada: Airborne geophysical contractor for the U.S. Geological Survey.</attrdefs>
				<attrdomv>
					<udom>Binary Data Base File described in contractor report and ReadMe file</udom>
				</attrdomv>
			</attr>
		</detailed>
		<detailed>
			<enttyp>
				<enttypl>ASCII XML File 11003_VillaGrove_Final_Data.gdb.xml</enttypl>
				<enttypd>ASCII text XML formatted metadata file describing the data base file of the same

# Populate Metadata Template

In [22]:
#Load XML Metadata Template File and Read It
metaData = os.path.join(magMetaDataTemplatePath, magMetaDataTemplateName)
print ('Metadata path: ' + magMetaDataTemplatePath + '\n')
xmlTemplateFile = open(metaData, 'r')
metaDataContent = xmlTemplateFile.readlines()
print(metaDataContent)
xmlTemplateFile.close()


Metadata path: C:\CurrentWork\DataReleases\VillageGrove

['ï»¿<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<origin>Grauch, V. J.</origin>\n', '\t\t\t\t<pubdate>2019</pubdate>\n', '\t\t\t\t<title>High Resolution Aeromagnetic Survey, Villa Grove, Colorado, USA 2012</title>\n', '\t\t\t\t<geoform>Binary Files and ASCII text</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators: Brown, P.J., http://orcid.org/0000-0002-2415-7462; Grauch, V. J., https://orcid.org/0000-0002-0761-3489</othercit>\n', '\t\t\t\t<onlink>http://dx.doi.org/10.5066/F7416V52</onlink>\n', '\t\t\t</citeinfo>\n', '\t\t</citation>\n', '\t\t<descript>\n', "\t\t\t<abstract>This data release includes the airborne magnetic survey data co

In [23]:
# Replace values of current metadata template with the appropriate values.  
# All of this input should have been defined when going through the steps outlined above.
lineString = ''
newMetaDataContent = metaDataContent
myfilename = magMetaDataTemplatePath + '\VillageGrove-Colorado-USA-AirborneMagnetics-2012.xml'
xmlFile = open(myfilename,"w+")
print(myfilename)
#print(keywords.value)
for i in range(len(metaDataContent)):
    lineString = metaDataContent[i]
    {fileListing}
    if lineString.find('{fileListing}'):
     lineString = lineString.replace('{fileListing}', fileListing)
    
    if lineString.find('{csvAttributes}'):
     lineString = lineString.replace('{csvAttributes}', csvAttributes)
    
    if lineString.find('{bFilesEntAttText}'):
     lineString = lineString.replace('{bFilesEntAttText}', bFilesEntAttText)
    
    else:
     lineString = lineString
    xmlFile.write(lineString)
    
    print (lineString)

xmlFile.close()
print ('Creation of new metadata file is complete\n\n') 


C:\CurrentWork\DataReleases\VillageGrove\VillageGrove-Colorado-USA-AirborneMagnetics-2012.xml
ï»¿<?xml version="1.0" encoding="UTF-8"?>

<metadata>

	<idinfo>

		<citation>

			<citeinfo>

				<origin>Brown, P. J.</origin>

				<origin>Grauch, V. J.</origin>

				<pubdate>2019</pubdate>

				<title>High Resolution Aeromagnetic Survey, Villa Grove, Colorado, USA 2012</title>

				<geoform>Binary Files and ASCII text</geoform>

				<pubinfo>

					<pubplace>Denver, CO</pubplace>

					<publish>U.S. Geological Survey</publish>

				</pubinfo>

				<othercit>Additional information about Originators: Brown, P.J., http://orcid.org/0000-0002-2415-7462; Grauch, V. J., https://orcid.org/0000-0002-0761-3489</othercit>

				<onlink>http://dx.doi.org/10.5066/F7416V52</onlink>

			</citeinfo>

		</citation>

		<descript>

			<abstract>This data release includes the airborne magnetic survey data collected from the Manchester region of Iowa. The Mineral Resources Program of the U.S. Geological Survey

### Check this file to see if it is valid against the FGDC metadata standard (FGDC-STD-001-1998)

## https://mrdata.usgs.gov/validation/

In [None]:
# Show the resulting child xml metadata file example 
#for i in range(len(newMetaDataContent)):
print ('Station ' + mtDataDirList[i] + ' complete')
print (newMetaDataContent)