## SquamataAssemballyAMT - Jupyter notebook for batch releasing Audio Magnetotellic (AMT) data to ScienceBase

This module performs the following operations:
- Create list of data directories.
- Identify files accompanying data release.
- Create file listing for metadata XML markup.
- Identify and load MT EDI file.
- Collect and harvest release parameters common to ALL metadata childs.
- Clean up and reformat harvested values to be XML metadata complient.
- Create User Editable Keywords Listing
- Create entity and attribute XML markup.
- Poplulate metadata template
- Validate metadata; create error log; create HTML and FGDC Text versions of the metadata.
- Create all child metadata files from first example created in previous steps. (In development)
- Perhaps upload files to ScienceBase (In development)
- Change ScienceBase parameters such as citation information, add orcid ids, add USGS CMS tags, etc. (In development)

### To execute a function/command select a cell and Hold-Shift + Press-Enter

**The 'r' signifies a string literal. Use for paths.**

Metadata wizard:  Advanced, Open In a jupyter Notebook?
Metadata Wizard 2.o from ScienceBase

In [1]:
# Phil Brown (pbrown@usgs.gov) 2018
# Working Python 3 Notebook used to facilitate the release of Audio Magnetotelluric (AMT) Data to ScienceBase.

In [2]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

Jupyter is working.


In [7]:
# Load required Libraries
import sys
import os
import zipfile
import csv
import pysb
import requests
import shutil
from shutil import copyfile
import zipfile
import datetime
import glob
from lxml import etree
import json
import pickle
import shutil
import fileinput
import json
import pandas as pd
import numpy as np
from IPython.core.display import display
from IPython.core.display import HTML
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode
import re
from ipywidgets import *
from IPython.display import display
from IPython.html.widgets import widgets
import datetime
import dateutil.parser



# 1) Step One - Set Directory Paths
## Please set directory paths below
### Directory paths include
- Data Path: This is the path to the data, data structure should have a directory for each station
- Template Path: The path to the XML metadata template being used for the data.  This template should already include all information common to all child metadata files e.g. originators, larger work citation, etc.

In [8]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
mtDataPath = r"C:\DataReleases\Arkansas AMT data release" #The 'r' signifies a string literal. Use for paths.
mtMataDataTemplatePath = r"C:\DataReleases\Arkansas AMT data release"
mtMataDataTemplateName = "MT-MetaData_TEMPLATE.xml"

In [9]:
#Check Paths for the fun of it
print ('The MT Data Path is: ' + '"' + mtDataPath + '"')
mtMataDataTemplatePath + "\\" + mtMataDataTemplateName

The MT Data Path is: "C:\DataReleases\Arkansas AMT data release"


'C:\\DataReleases\\Arkansas AMT data release\\MT-MetaData_TEMPLATE.xml'

# 2) Step Two - Collect Common Parameters
## The first step is collect the information common to all child metadata sets
### Values Include:
- Data Release Title
    - Currently the title is collected from the EDI file; this title generally contains the station number.
- Data Release Originator(s)
- Larger Work Title
- Larger Work Originator(s)
- Larger Work URL
- Theme Keywords
- Location Keyword
- etc. etc

**As things stand, all information is either gathered from the EDI file unless it is common to all files; these common items are to included in the metadata template file upfront...**



## Now, let's explore our data. 
- What files do we have? 
- What files do we import values from?

In [10]:
#Review content in file explorer

In [11]:
#Produce directory listing of station (SB Object Children)
#Either set up the root directory with station subdirectories only or delete non-station directories from the list array
mtDataDirList = os.listdir(mtDataPath)
mtDataDirList

['AMT050',
 'AMT070',
 'AMT090',
 'AMT115',
 'AMT140',
 'AMT170',
 'MT-MetaData_TEMPLATE.xml']

In [12]:
#Let's start with the first staion and check the result - we can then loop through the process 
#for the remaining stations in the list.
mtStationPath = mtDataPath + '\\' + mtDataDirList[0]
mtStationPath

'C:\\DataReleases\\Arkansas AMT data release\\AMT050'

In [13]:
#Look for EDI file to load
ediList = glob.glob(os.path.join(mtStationPath, '**/*MT*.edi'),  recursive=True)
ediPath = ediList[0]
ediList
print ('EDI File Path:\n' + ediPath)       

EDI File Path:
C:\DataReleases\Arkansas AMT data release\AMT050\USA-Arkansas-Buffalo_River-2017-AMT050.edi


## Enter the information unique to this data set but common to all metadata files
### These include:
- Data Release Title
- Data Release Authors
- Theme Keywords
- Location Keywords

## After this step, information will be harvested from the MT EDI file. 
### These include:
- ProductId=USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT01.edi
- ExternalUrl Url=https://doi.org/10.5066/F72F7MQ7
- Attachment Filename=https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf
- Survey Purpose Description: 
- Data Description:
- Citation Title=Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
- Citation Authors=Chad E. Ailes, Brian D. Rodriguez
- Citation Year=2011
- YearCollected=2009
- Country=USA                                  
- Ellipsoid=Clarke 1866                                                          
- Location datum=NAD27 CONUS                                                     
- SITE LATITUDE=36.752985000                                                     
- SITE LONGITUDE=-105.560966167                                                  
- Elevation units="meters"=2608.00                                                                     
- Start=2009-07-21T19:52:03 UTC/GMT
- End=2009-07-21T20:34:20 UTC/GMT
- ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 
- Entities and Attributes:
    - FREQUENCIES
    - IMPEDANCE ROTATION ANGLES
    - IMPEDANCES
    - TIPPER PARAMETERS
    - COMPUTED PARAMETERS


## Lets now import and index values from the EDI Files
- We need these values for the metadata template.  
- We also want to run stats on some of these values for the entity and attributes section

In [14]:
    #Load EDI File and Read It
    ediFile = open(ediPath, 'r')
    ediContent = ediFile.read()
    ediFile.close()
    print(ediContent)


>HEAD                                                                           
                                                                                
  DATAID="Buffalo Natl River"                                                   
  ACQBY=USGS                                                                    
  ACQDATE=2017-08-27
  STATE=Arkansas                                                                
  COUNTY=Newton                                                                 
  UNITS=M                                                                       
  STDVERS=1.0                                                                   
  PROGVERS=GEOTOOLS_2.3                                                         
  PROGDATE=09/16/94                                                             
                                                                                
>INFO   MAXLINES=1000                                                           
       

In [15]:
#Now assign values to the SB MetaDataWizard Template unknowns
list_ = ediContent.splitlines()
list_length = len (list_)

# there are probally easier ways to loop through the below but I like having it all hard coded upfront
# it's easire to track an change for me
# use the example below to extract additional parameters
# Not that all variables being collected are not necessarily used in populating the template.
# Values can be hardcoded into the metadata xml template and/or harvested from the edi file

for X in list_:
  if "ProductId" in X:
    productArray = X.split('=')
    productIdArray = productArray[1].split('.')
    productId = productIdArray[0]
    # We may want to reformat this are parse out this name further for use with a root name based on the Data Release Title?
    productId = productId.replace("-", " ")
    productId = productId.replace("_", " ")
    drTitle = productId
    print ('Child Title: ' + productId)
  if "ExternalUrl Url" in X:
    externalURLArray = X.split('=')
    externalURL = externalURLArray[1]
    print ('<onlink>: ' + externalURL)
  if "STATE" in X:
    stateArray = X.split('=')
    state = stateArray[1].replace('"', "") #remove quotes around state
    print ('State: ' + state)
  if "COUNTY" in X:
    countyArray = X.split('=')
    county = countyArray[1]
    print ('County: ' + county)
  if "Start" in X:
    startArray = X.split('=')
    start = startArray[1]
    print ('Start: ' + start)
  if "End" in X:
    endArray = X.split('=')
    end = endArray[1]
    print ('End: ' + end)
  if "Attachment Filename" in X and "http" in X:
    lgwrklinkArray = X.split('=')
    lgwrklink = lgwrklinkArray[1]
    print ('Attachment Filename Link: ' + lgwrklink)
  if "Citation Title" in X:
    citTitArray = X.split('=')
    citTit = citTitArray[1]
    print ('Citation Title: ' + citTit)
  if "Citation Authors" in X:
    citNamesArray = X.split('=')
    citAuthorsArray = citNamesArray[1].split(',')
    for author in citAuthorsArray:
     author = author.strip()
     print ('Author: '+ author)
  if "Citation Year" in X:
    citYearArray = X.split('=')
    citYear = citYearArray[1]
    print ('Citation Year: ' + citYear)
  if "YearCollected" in X:
    yearColArray = X.split('=')
    yearCol = yearColArray[1]
    print ('Year Collected: ' + yearCol)
  if "Ellipsoid" in X:
    ellipsoidArray = X.split('=')
    ellipsoid = ellipsoidArray[1]
    print ('Ellipsoid: ' + ellipsoid)
  if "Location datum" in X:
    locDatumArray = X.split('=')
    locDatum = locDatumArray[1]
    print ('Local datum: ' + locDatum)
  if "SITE LATITUDE" in X:
    sitLatArray = X.split('=')
    sitLat = sitLatArray[1] # !!! probally need to reformat this to have only 6 significant digits Also need to trim extra spaces !!!
    print ('Site latitude: ' + sitLat)
  if "SITE LONGITUDE" in X:
    sitLonArray = X.split('=')
    sitLon = sitLonArray[1] # !!! probally need to reformat this to have only 6 significant digits Also need to trim extra spaces !!!
    print ('Site longitude: ' + sitLon)
  if "Elevation units" in X:
    elevationStringArray = X.split('=')
    siteElevation = elevationStringArray[2] 
    print ('Site Elevation: ' + siteElevation)
    elevationUnits = elevationStringArray[1].replace('"', "")
    print ('Elevation Units: ' + elevationUnits)
    
# Code below returns values that occupy more than one line
    
for i in range(list_length):
 value = list_[i] 
 if value.replace(" ", "") == 'SurveyPurposeDescription:':
   startIndPurpose = i + 1
   #print ('startIndPurpose: ' + str(startIndPurpose))
 if value.replace(" ", "") == 'DataDescription:':
   endIndPurpose = i - 1
   #print ('endIndPurpose: ' + str(endIndPurpose))
purpose = list_[startIndPurpose]
for j in range(startIndPurpose + 1,endIndPurpose): 
    purpose = purpose + list_[j]
    purposeClean = re.sub(' +', ' ',purpose)
print ('\nAbstract:\n\t' + purposeClean)

for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == 'DataDescription:':
   startIndDescription = k + 1
   #print ('startIndDescription: ' + str(startIndDescription))
 if value.replace(" ", "") == 'FILECREATOR:':
   endIndDescription = k - 9
   #print ('endIndDescription: ' + str(endIndDescription))
description = list_[startIndDescription]
for l in range(startIndDescription + 1,endIndDescription): 
    description = description + list_[l]
    descriptionClean = re.sub(' +', ' ',description)
print ('\nPurpose:\n\t' + descriptionClean)
    

State: Arkansas                                                                
County: Newton                                                                 
Child Title: USA Arkansas Buffalo River 2017 AMT050
<onlink>: https://doi.org/10.5066/P9CIAXC5 
Citation Title: Audiomagnetotelluric data, Buffalo River watershed, Arkansas, 2017
Author: Brian D. Rodriguez and Mark R. Hudson
Citation Year: 2018                                                             
Year Collected: 2017
Ellipsoid: Clarke 1866                                                          
Local datum: NAD27 CONUS                                                     
Site latitude: 36.07789                                                         
Site longitude: -93.31091                                                       
Site Elevation: 681.91                                                
Elevation Units: meters
Start: 2017-08-27T19:01:59 UTC/GMT
End: 2017-08-27T23:45:56 UTC/GMT  
Start: 2017-08-27T19:01:59 

In [16]:
# Now let's format the start time and end time to be what the XML file wants for <begdate> and <enddate>

begdateArr = start.split(' ')
begdate_str = begdateArr[0]
begdate_obj = dateutil.parser.parse(begdate_str)
begdate = begdate_obj.strftime('%Y%m%d')
print('<begdate> ', begdate) 

enddateArr = end.split(' ')
enddate_str = enddateArr[0]
enddate_obj = dateutil.parser.parse(enddate_str)
enddate = enddate_obj.strftime('%Y%m%d')
print('<enddate> ', enddate) 


<begdate>  20170827
<enddate>  20170827


In [17]:
#Now we reformat the lat  and longitude to 6 sig figs as well as trim of any extra spaces
#Brian seems to be stripping out the sig figs now so there is no need.  
#Also may whant to round up instead of just stripping values?

##sitLat = sitLat.strip()
##sitLat = sitLat[:-3] 
##sitLon = sitLon.strip()
##sitLon = sitLon[:-3]

print ('Site latitude: ' + sitLat)
print ('Site longitude: ' + sitLon)

Site latitude: 36.07789                                                         
Site longitude: -93.31091                                                       


In [18]:
# Now Reformat county by trimming the extra spaces
county = county.strip()
print ('County: ' + county)

County: Newton


In [21]:
## Create editable keywords example.  
## Example text is created after running this cell
## This text is displayed by running "display(keywords) below
keywords = widgets.Textarea(
    value='\t\t<keywords>\n\t\t\t<theme>\n\t\t\t\t<themekt>ISO 19115 Topic Category</themekt>' \
    + '\n\t\t\t\t<themekey>biota</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>None</themekt>' \
    + '\n\t\t\t\t<themekey>impedance</themekey>\n\t\t\t\t<themekey>tipper</themekey>' \
    + '\n\t\t\t\t<themekey>apparent resistivity</themekey>\n\t\t\t\t<themekey>impedance phase</themekey>' \
    + '\n\t\t\t\t<themekey>impedance strike</themekey>\n\t\t\t\t<themekey>MT</themekey>' \
    + '\n\t\t\t\t<themekey>audiomagnetotelluric</themekey>\n\t\t\t\t<themekey>magnetotelluric</themekey>' \
    + '\n\t\t\t\t<themekey>AMT</themekey>\n\t\t\t\t<themekey>sounding</themekey>' \
    + '\n\t\t\t\t<themekey>Geology, Geophysics, and Geochemistry Science Center</themekey>' \
    + '\n\t\t\t\t<themekey>GGGSC</themekey>\n\t\t\t\t<themekey>Mineral Resources Program</themekey>' \
    + '\n\t\t\t\t<themekey>MRP</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>USGS Thesaurus</themekt>' \
    + '\n\t\t\t\t<themekey>Magnetic field (earth)</themekey>\n\t\t\t\t<themekey>Geophysics</themekey>' \
    + '\n\t\t\t\t<themekey>GPS measurement</themekey>\n\t\t\t\t<themekey>Electromagnetic surveying</themekey>' \
    + '\n\t\t\t\t<themekey>Magnetic surveying</themekey>\n\t\t\t</theme>\n\t\t\t<place>' \
    + '\n\t\t\t\t<placekt>USGS Geographic Names Information System (GNIS)</placekt>' \
    + '\n\t\t\t\t<placekey>New Mexico</placekey>\n\t\t\t\t<placekey>Rio Grande del Norte National Monument</placekey>' \
    + '\n\t\t\t\t<placekey>' + county + ' County</placekey>\n\t\t\t\t<placekey>Rio Grande</placekey>\n\t\t\t</place>\n\t\t</keywords>',
    placeholder='Type something',
    #description='String:',
    layout=Layout(width='100%', height='666px'),
    disabled=False
)
print ('Keywords list created.')

Keywords list created.


### Change the text in the textbox below to relflect what should be included as the key words for all child items

Note that changing the text below at any time creates a keywords section of the metadata seen EXACTLY as it is shown below

In [103]:
# Run this cell for key word text to edit.  
# Edit the text in place.  
# When complete move on to the next step

display(keywords)

Entity and Attribute Values for the EDI file.  List !****FREQUENCIES****!,!****IMPEDANCE ROTATION ANGLES****!,!****IMPEDANCES****!,!****COMPUTED PARAMETERS****!

Here we load the frequencies
>!****FREQUENCIES****!

In [104]:
# Import entity and attributes - !****FREQUENCIES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):

 value = list_[k] 
 if value.replace(" ", "") == '>!****FREQUENCIES****!':
   startIndFrequencies = k + 3
   print ('startIndFrequencies: ' + str(startIndFrequencies))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCEROTATIONANGLES****!':
   endIndFrequencies = k - 1
   print ('endIndFrequencies: ' + str(endIndFrequencies))

frequencyData = []
fdata = []
fdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
frequencyDF = pd.DataFrame(fdata)
for j in range(startIndFrequencies,endIndFrequencies):
    fdataTemp = list_[j]
    fdataTemp = re.sub(' +', ' ',fdataTemp)
    fdataTemp = fdataTemp.split(" ")
    del fdataTemp[0]
    fdata = fdata + fdataTemp
    
print (fdata)  
fdata = np.array(fdata).astype(np.float) #convert String to floats
frequencyDF = pd.DataFrame(fdata,columns=['Frequencies'])
frequencyDF


startIndFrequencies: 277
endIndFrequencies: 283
['3.55000000E+03', '2.73000000E+03', '2.42000000E+03', '1.87000000E+03', '1.17000000E+03', '9.60000000E+02', '8.85000000E+02', '7.20000000E+02', '5.80000000E+02', '4.60000000E+02', '3.40000000E+02', '2.70000000E+02', '2.10000000E+02', '1.72399994E+02', '1.50000000E+02', '1.22099998E+02', '1.00000000E+02', '8.59400024E+01', '7.90000000E+01', '6.00600014E+01', '4.15000000E+01', '2.83199997E+01', '1.90400009E+01', '1.22100000E+01', '7.32399988E+00', '4.39400005E+00']


Unnamed: 0,Frequencies
0,3550.0
1,2730.0
2,2420.0
3,1870.0
4,1170.0
5,960.0
6,885.0
7,720.0
8,580.0
9,460.0


In [105]:
# Now lets get the stats of the frequency data
#Make Array of Max Vallues
frequencyMax = frequencyDF[('Frequencies')].max()
print ('Max. Frequency: ' + str(frequencyMax))
frequencyMin = frequencyDF[('Frequencies')].min()
print ('Min. Frequency: ' + str(frequencyMin))

Max. Frequency: 3550.0
Min. Frequency: 4.39400005


Here we load the Impedance Rotation Angles
>!****IMPEDANCE ROTATION ANGLES****!

In [106]:
# Import entity and attributes - !****IMPEDANCE ROTATION ANGLES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCEROTATIONANGLES****!':
   startIndROT = k + 3
   print ('startIndROT: ' + str(startIndROT))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCES****!':
   endIndROT = k - 1
   print ('endIndROT: ' + str(endIndROT))

rdata = []
rdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
rotationDF = pd.DataFrame(rdata)
for j in range(startIndROT,endIndROT):
    rdataTemp = list_[j]
    rdataTemp = re.sub(' +', ' ',rdataTemp)
    rdataTemp = rdataTemp.split(" ")
    del rdataTemp[0]
    rdata = rdata + rdataTemp
    
print (rdata)  
rdata = np.array(rdata).astype(np.float) #convert String to floats
rotationDF = pd.DataFrame(rdata,columns=['ZROT'])
rotationDF

startIndROT: 287
endIndROT: 293
['0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00']


Unnamed: 0,ZROT
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
5,0.0
6,0.0
7,0.0
8,0.0
9,0.0


In [107]:
# Now lets get the stats of the rotation data
#Make Array of Max Vallues
rotationMax = rotationDF[('ZROT')].max()
print ('Max. ZROT: ' + str(frequencyMax))
rotationMin = rotationDF[('ZROT')].min()
print ('Min. ZROT: ' + str(frequencyMin))

Max. ZROT: 3550.0
Min. ZROT: 4.39400005


Here we load the impedances
>!****IMPEDANCES****!

In [108]:
# Import entity and attributes - !****IMPEDANCES****! plan to break some of these individual chunks into objects/functions

# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCES****!':
   startIndImpedances = k + 1
   print ('startIndImpedances: ' + str(startIndImpedances))
 
 if value.replace(" ", "") ==  '>!****TIPPERPARAMETERS****!':
   endIndImpedances = k - 1
   print ('endIndImpedances: ' + str(endIndImpedances))

#Construct Array of Channel Headers   
count = 0
impedanceLabel = []
impedanceData = []
data = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
impedanceDF = pd.DataFrame(data)
for l in range(startIndImpedances,endIndImpedances): 
    if list_[l][0] == '>':
     temp = list_[l].split(" ", 1)
     #print (temp)
     impedanceLabel.append((temp[0].split(">"))[1])
     dataTemp = list_[l+1]
     for j in range(l+2,l+8):
      dataTemp = dataTemp + list_[j]
      dataTemp = re.sub(' +', ' ',dataTemp)
     data = dataTemp.split(" ")
     del data[0] # need to check for empty strings and delete these from the array of the string can't be converted to a float
     del data[len(data)-1] # need to check for empty strings and delete
     #print (data)
     data = np.array(data).astype(np.float) #convert String to floats
     se = pd.Series(data)
     print ((temp[0].split(">"))[1])   
     impedanceDF[((temp[0].split(">"))[1])] = se.values
    
    count = count + 1

#impedanceDF = pd.DataFrame(data, columns=(impedanceLabel))
impedanceDF
#data
#se 

startIndImpedances: 295
endIndImpedances: 402
ZXXR
ZXXI
ZXX.VAR
ZXYR
ZXYI
ZXY.VAR
ZYXR
ZYXI
ZYX.VAR
ZYYR
ZYYI
ZYY.VAR


Unnamed: 0,ZXXR,ZXXI,ZXX.VAR,ZXYR,ZXYI,ZXY.VAR,ZYXR,ZYXI,ZYX.VAR,ZYYR,ZYYI,ZYY.VAR
0,367.010925,331.371613,7226.53955,-1093.21973,-657.220154,18359.4414,-391.723572,-517.906982,38721.3984,-1675.78271,-1050.82922,98373.9531
1,306.581604,495.610596,844.390137,-341.106293,112.650757,8051.26611,-753.217468,-686.681396,2515.08813,-697.920837,-150.388245,23981.3848
2,323.175903,325.9505,500.675842,115.803909,119.785774,3154.01587,-882.499756,-499.462219,1389.26306,321.956177,-177.594162,8751.68652
3,141.722778,442.028656,472.750183,393.872955,-545.837524,2136.89722,-978.120911,-329.152039,622.111145,548.287231,-187.278702,2812.02979
4,-266.792267,686.77832,1314.92419,1229.47485,-1107.36743,8289.16992,-1298.83069,-305.44931,1112.01624,1123.63953,-271.982727,7010.05566
5,-319.259308,-205.590057,1086.69861,1152.98901,569.431763,3532.70654,-595.929382,-504.314697,569.315613,615.076538,547.650146,1850.76624
6,-61.971527,-35.475929,833.264771,61.100998,126.169968,2252.47388,-182.721603,-147.34877,1041.3551,32.716114,57.317795,2814.98218
7,-42.160847,-246.904785,130.138062,707.872803,823.747742,456.483582,-508.417694,-536.096191,278.142944,332.330872,600.131592,975.638428
8,15.102742,-38.368511,1279.59094,171.901123,638.413269,1370.85999,-98.669571,-56.908051,457.213959,4.779356,37.718197,489.825531
9,24.236101,0.444963,252.236435,245.412659,300.017944,312.407532,-317.878723,-425.340179,510.252533,148.002792,168.98468,631.97345


In [109]:
# Now lets get the stats of the impedance data
#Make Array of Max Vallues
impedanceMax = []
for i in range (0,len(impedanceLabel)):
    impedanceMax.append(impedanceDF[(impedanceLabel[i])].max())
    
impedanceMin = []
for i in range (0,len(impedanceLabel)):
    impedanceMin.append(impedanceDF[(impedanceLabel[i])].min())

impedanceMin

[-319.259308,
 -246.904785,
 20.3838634,
 -1093.21973,
 -1107.36743,
 4.19846487,
 -1298.83069,
 -686.681396,
 48.1407127,
 -1675.78271,
 -1050.82922,
 21.5212269]

Here we load the tipper parameters
>!****TIPPER PARAMETERS****!

In [110]:
# Import entity and attributes - !****TIPPER PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****TIPPERPARAMETERS****!':
   startIndTipper = k + 1
   print ('startIndTipper: ' + str(startIndTipper))
 
 if value.replace(" ", "") ==  '>!****COMPUTEDPARAMETERS****!':
   endIndTipper = k - 1
   print ('endIndTipper: ' + str(endIndTipper))

#Construct Array of Channel Headers   
count = 0
tipperLabel = []
tipperData = []
tdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
tipperDF = pd.DataFrame(tdata)
for l in range(startIndTipper,endIndTipper): 
    if list_[l][0] == '>':
     ttemp = list_[l].split(" ", 1)
     #print (ttemp)
     tipperLabel.append((ttemp[0].split(">"))[1])
     tdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      tdataTemp = tdataTemp + list_[j]
      tdataTemp = re.sub(' +', ' ',tdataTemp)
      tdata = tdataTemp.split(" ")
     #print (tdata)
     del tdata[0]
     del tdata[len(tdata)-1] # need to check for empty strings and delete
     tdata = np.array(tdata).astype(np.float) #convert String to floats
     te = pd.Series(tdata)
     print ((ttemp[0].split(">"))[1])   
     tipperDF[((ttemp[0].split(">"))[1])] = te.values
    
    count = count + 1

#tipperDF = pd.DataFrame(tdata, columns=(tipperLabel))
tipperDF
#tdata
#te 

startIndTipper: 404
endIndTipper: 457
TXR.EXP
TXI.EXP
TXVAR.EXP
TYR.EXP
TYI.EXP
TYVAR.EXP


Unnamed: 0,TXR.EXP,TXI.EXP,TXVAR.EXP,TYR.EXP,TYI.EXP,TYVAR.EXP
0,0.134743,0.023227,0.006163,0.815096,0.106846,0.015658
1,-0.37488,0.127913,0.006292,0.189078,-0.049881,0.059999
2,0.283068,0.12708,0.004819,-0.991411,0.018533,0.030358
3,0.194199,-0.046293,0.000132,-0.221568,-0.281418,0.000598
4,0.092244,0.132269,0.001762,-0.032708,-0.502063,0.011108
5,0.165504,-0.107432,0.000227,0.338185,0.004927,0.000737
6,0.39028,-0.108327,0.005069,0.976413,-0.251144,0.013702
7,0.170524,-0.100838,0.000263,0.162084,0.065559,0.000921
8,0.37016,0.116904,0.011584,0.300582,-0.004274,0.012411
9,0.254276,0.012799,0.001691,0.281947,-0.039079,0.002094


In [111]:
# Now lets get the stats of the tipper data

# Make Array of Max Values
tipperMax = []
for i in range (0,len(tipperLabel)):
    tipperMax.append(tipperDF[(tipperLabel[i])].max())
print ('Tipper Max: ' + str(tipperMax))    

# Make Array of Min Values
tipperMin = []
for i in range (0,len(tipperLabel)):
    tipperMin.append(tipperDF[(tipperLabel[i])].min())
print ('Tipper Min: ' + str(tipperMin))

Tipper Max: [2.717031, 0.132268548, 1.49873149, 0.976413071, 0.106846295, 0.740545928]
Tipper Min: [-0.374880165, -2.05637932, 0.000132251458, -0.991410553, -0.637328267, 0.000375486881]


Here we load the computed parameters
>!****COMPUTED PARAMETERS****!

In [112]:
# Import entity and attributes - !****COMPUTED PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****COMPUTEDPARAMETERS****!':
   startIndPar = k + 1
   print ('startIndPar: ' + str(startIndPar))
 
 if value.replace(" ", "") ==  '>END':
   endIndPar = k - 1
   print ('endIndPar: ' + str(endIndPar))

#Construct Array of Channel Headers   
count = 0
parLabel = []
parData = []
pdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
parDF = pd.DataFrame(pdata)
for l in range(startIndPar,endIndPar): 
    if list_[l][0] == '>':
     ptemp = list_[l].split(" ", 1)
     #print (ptemp)
     parLabel.append((ptemp[0].split(">"))[1])
     pdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      pdataTemp = pdataTemp + list_[j]
      pdataTemp = re.sub(' +', ' ',pdataTemp)
      pdata = pdataTemp.split(" ")
     #print (pdata)
     del pdata[0]
     del pdata[len(pdata)-1] # need to check for empty strings and delete
     pdata = np.array(pdata).astype(np.float) #convert String to floats
     pe = pd.Series(pdata)
     print ((ptemp[0].split(">"))[1])   
     parDF[((ptemp[0].split(">"))[1])] = te.values
    
    count = count + 1

parDF
#pdata
#pe 

startIndPar: 459
endIndPar: 818
RHOROT
RHOXX
RHOXX.ERR
RHOXY
RHOXY.ERR
RHOYX
RHOYX.ERR
RHOYY
RHOYY.ERR
PHSXX
PHSXX.ERR
PHSXY
PHSXY.ERR
PHSYX
PHSYX.ERR
PHSYY
PHSYY.ERR
TIPMAG
TIPMAG.ERR
TIPPHS
TIPPHS.ERR
ZSTRIKE
ZSKEW
TSTRIKE
COH
COH
COH
COH
EPREDCOH
EPREDCOH
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE


Unnamed: 0,RHOROT,RHOXX,RHOXX.ERR,RHOXY,RHOXY.ERR,RHOYX,RHOYX.ERR,RHOYY,RHOYY.ERR,PHSXX,...,TIPMAG.ERR,TIPPHS,TIPPHS.ERR,ZSTRIKE,ZSKEW,TSTRIKE,COH,EPREDCOH,SIGAMP,SIGNOISE
0,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,...,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658,0.015658
1,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,...,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999,0.059999
2,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,...,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358,0.030358
3,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,...,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598,0.000598
4,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,...,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108,0.011108
5,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,...,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737,0.000737
6,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,...,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702,0.013702
7,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,...,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921,0.000921
8,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,...,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411,0.012411
9,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,...,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094,0.002094


In [113]:
# Now lets get the stats of the computed parameters

# Make Array of Max Values
parMax = []
for i in range (0,len(parLabel)):
    parMax.append(parDF[(parLabel[i])].max())
print ('Computed Pararmeters Max: ' + str(parMax))    

# Make Array of Min Values
parMin = []
for i in range (0,len(parLabel)):
    parMin.append(parDF[(parLabel[i])].min())
print ('Computed Pararmeters Min: ' + str(parMin))

Computed Pararmeters Max: [0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928, 0.740545928]
Computed Pararmeters Min: [0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.000375486881, 0.00037548

## We have what we need to build the entity and attribute section for the EDI file so let's do that below:

In [114]:
#Create EDI file EandA section

        ediEandA = '\t\t<detailed>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        + '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Frequency - Hz</rdom>\n\t\t\t\t<rdommin>' \
        + strFreqMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxRSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Hz</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Amplitude - Volts/Gamma</rdom>\n\t\t\t\t<rdommin>' \
        + strAmpMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmpMaxRSP + '</rdommax>\n\t\t\t\t</rdommax>\n\t\t\t\t' \
        + '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Phase - Degrees</rdom>\n\t\t\t\t<rdommin>' \
        + strGammaMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strGammaMaxRSP + '</rdommax>\n\t\t\t\t</rdommax>\n\t\t\t\t' \
        + '<attrunit>Degrees</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t</attr>\n\t</detailed>'

IndentationError: unexpected indent (<ipython-input-114-fb0f010acd16>, line 3)

## Now lets get the range of values from the RSP values

In [143]:
#First Get the list of RSP files
rspList = glob.glob(os.path.join(mtStationPath, '*.RSP'),  recursive=True)
#rspList
fileListing = ''
rspFileListing = []
for i in range(len(rspList)):
    splitRspList = rspList[i].split('\\')
    fileListing = fileListing + '\t\t\t\t\t\t' + splitRspList[len(splitRspList) - 1] + '\n'
    rspFileListing.append(splitRspList[len(splitRspList) - 1])
print (fileListing)


						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP



## Now the raw Binary File Listing - this can be T files or W files
We will need to figure out the best way of filtering on thise - may need to build array and then delete AVG, dmp and edi file.

These are listed in the edi file as well but they are not all there.  

    ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 

- Which files need to be included in the data release?
- What is the best way to get this listing?

In [144]:
#Add the raw frequency files to the list except the AVG, dmp and edi file
binList = glob.glob(os.path.join(mtStationPath, 'WP*.*'),  recursive=True)
#binList
for i in range(len(binList)):
  splitBinList = binList[i].split('\\')
  if splitBinList[len(splitBinList) - 1].find('AVG') == -1 and splitBinList[len(splitBinList) - 1].find('dmp') == -1 and splitBinList[len(splitBinList) - 1].find('edi') == -1:
   fileListing = fileListing + '\t\t\t\t\t\t' + splitBinList[len(splitBinList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP



In [145]:
#Now finally add the processed ASCII text files to the list
txtList = glob.glob(os.path.join(mtStationPath, '*.txt'),  recursive=True)
#txtList
for i in range(len(txtList)):
  splitTxtList = txtList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitTxtList[len(splitTxtList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC7_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FCB_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FCC_AC.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FCC_AD.txt
						USA-Arkansas-Buffalo_River-2017-AMT050-FCD_AE.txt
						USA-Arkansas-Buff

# Now get values and stats on the .RSP files that are listed to add to the Ent. and Att. information

### These files are all fixed width format (666) but it looks like the BFS*.RSP and EF*.RSP are slightly different beasts that have different formats

In [146]:
# Load the BFS*.RSP files into pandasand create entities and attributes
allRspEandP = ''
strFreqMaxRSP = ''
strAmpMaxRSP = ''
strGammaMaxRSP = ''
strFreqMinRSP = ''
strAmpMinRSP = ''
strGammaMinRSP = ''
intNumberBSF = 0
intNumberEFF = 0
for i in range(len(rspList)):
#for i in range(3):
    if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        dfRSP = pd.read_fwf(rspList[i], widths=[6,6,6], skiprows=5, parse_dates=True).rename(columns={'31':'Freq', '1':'Amp', 'Unnamed: 2':'Gamma'})
#print (rspList[1])
        #dfRSP    
    
# Now lets get the stats of the RSP data

# Make Array of Max Values
        FreqMaxRSP = dfRSP['Freq'].max()
        strFreqMaxRSP = str(FreqMaxRSP)
        AmpMaxRSP = dfRSP['Amp'].max()
        strAmpMaxRSP = str(AmpMaxRSP)
        GammaMaxRSP = dfRSP['Gamma'].max()
        strGammaMaxRSP = str(GammaMaxRSP)
        FreqMinRSP = dfRSP['Freq'].min()
        strFreqMinRSP = str(FreqMinRSP)
        AmpMinRSP = dfRSP['Amp'].min()
        strAmpMinRSP = str(AmpMinRSP)
        GammaMinRSP = dfRSP['Gamma'].min()
        strGammaMinRSP = str(GammaMinRSP)
# now print RSP entity and attribute    
        rspEandA = '\t\t<detailed>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        + '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Frequency - Hz</rdom>\n\t\t\t\t<rdommin>' \
        + strFreqMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxRSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Hz</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Amplitude - Volts/Gamma</rdom>\n\t\t\t\t<rdommin>' \
        + strAmpMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmpMaxRSP + '</rdommax>\n\t\t\t\t</rdommax>\n\t\t\t\t' \
        + '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Phase - Degrees</rdom>\n\t\t\t\t<rdommin>' \
        + strGammaMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strGammaMaxRSP + '</rdommax>\n\t\t\t\t</rdommax>\n\t\t\t\t' \
        + '<attrunit>Degrees</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t</attr>\n\t</detailed>'
        print ('i Loop = ' + str(i))
        allRspEandP = allRspEandP +  rspEandA + '\n'
    if rspList[i].find('EF') > 0:
        print ('i Loop = ' + str(i) + ' EF File Found')
        dfRSP = pd.read_fwf(rspList[i], widths=[6,6,6,6,6,6,6,6,6], skiprows=5, parse_dates=True)\
        .rename(columns={'42':'Freq', '4':'Amp1', 'Low F':'Phz1', 'requen':'Amp2', 'cy':'Phz2', '10Hz':'Amp3', 'Out':'Phz3',\
        'Unnamed: 7':'Amp4', 'Unnamed: 8':'Phz4'})
#print (rspList[1])
dfRSP    
EandA = allRspEandP #[:-2] # this removes last two characters of the string.  
# Now lets get the stats of the RSP data

# Make Array of Max Values
        #FreqMaxRSP = dfRSP['Freq'].max()
        #strFreqMaxRSP = str(FreqMaxRSP)
        #AmpMaxRSP = dfRSP['Amp'].max()
        #strAmpMaxRSP = str(AmpMaxRSP)
        #GammaMaxRSP = dfRSP['Gamma'].max()
        #strGammaMaxRSP = str(GammaMaxRSP)
        #FreqMinRSP = dfRSP['Freq'].min()
        #strFreqMinRSP = str(FreqMinRSP)
        #AmpMinRSP = dfRSP['Amp'].min()
        #strAmpMinRSP = str(AmpMinRSP)
        #GammaMinRSP = dfRSP['Gamma'].min()
        #strGammaMinRSP = str(GammaMinRSP)
# now print RSP entity and attribute    
        #rspEandA = '\t\t<detailed>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        #+ '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        #+ '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        #+ '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Frequency - Hz</rdom>\n\t\t\t\t<rdommin>' \
        #+ strFreqMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxRSP + '</rdommax>\n\t\t\t\t' \
        #+ '<attrunit>Hz</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        #+ '\t\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        #+ '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Amplitude - Volts/Gamma</rdom>\n\t\t\t\t<rdommin>' \
        #+ strAmpMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmpMaxRSP + '</rdommax>\n\t\t\t\t</rdommax>\n\t\t\t\t' \
        #+ '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        #+ '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        #+ '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>Phase - Degrees</rdom>\n\t\t\t\t<rdommin>' \
        #+ strGammaMinRSP + '</rdommin>\n\t\t\t\t<rdommax>' + strGammaMaxRSP + '</rdommax>\n\t\t\t\t</rdommax>\n\t\t\t\t' \
        #+ '<attrunit>Degrees</attrunit>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        #+ '\t\t</attr>\n\t</detailed>\n'
        #print ('i Loop = ' + str(i))
        #allRspEandP = allRspEandP + rspEandA
#allRspEandP

#Create Test File
##EPfilename = r"C:\CurrentWork\DataManagement\SquamataMT\testJunk.xml"
##EPFile = open(EPfilename,"w+")
##print(EPfilename)
##EPFile.write(allRspEandP)
##EPFile.close()

i Loop = 0
i Loop = 1
i Loop = 2
i Loop = 3 EF File Found
i Loop = 4 EF File Found


# Populate Metadata Template

In [147]:
#Load XML Metadata Template File and Read It
metaData = os.path.join(mtMataDataTemplatePath, mtMataDataTemplateName)
xmlTemplateFile = open(metaData, 'r')
metaDataContent = xmlTemplateFile.readlines()
print(metaDataContent)
xmlTemplateFile.close()


['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>1</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>{onlink}</onlink>\n', '\t\t\t\t<lworkcit>\n', '\t\t\t\t\t<citeinfo>\n', '\t\t\t\t\t\t<origin>Ailes, C. E.</origin>\n', '\t\t\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t\t\t<pubdate>2011</pubdate>\n', '\t\t\t\t\t\t<title>Audiomagnetotelluric data, Taos Plateau Vol

In [148]:
# Replace values of current metadata template with the appropriate values.  
# All of this input should have been defined when going through the steps outlined above.
lineString = ''
newMetaDataContent = metaDataContent
splitFileName = ediList[0].split('.')
myfilename = splitFileName[0] + '.xml'
xmlFile = open(myfilename,"w+")
print(myfilename)
#print(keywords.value)
for i in range(len(metaDataContent)):
    lineString = metaDataContent[i]
    if lineString.find('{title}'):
     lineString = lineString.replace('{title}', drTitle)
    
    if lineString.find('{abstract}'):
     lineString = lineString.replace('{abstract}', purposeClean)
    
    if lineString.find('{purpose}'):
     lineString = lineString.replace('{purpose}', descriptionClean)
    
    if lineString.find('{BeginFileListingHere}'):
     lineString = lineString.replace('{BeginFileListingHere}', fileListing)
    
    if lineString.find('{keywords}'):
     lineString = lineString.replace('{keywords}', keywords.value)
    
    if lineString.find('{begdate}'):
     lineString = lineString.replace('{begdate}', begdate)
    
    if lineString.find('{enddate}'):
     lineString = lineString.replace('{enddate}', enddate)
    
    if lineString.find('{SiteLon}'):
     lineString = lineString.replace('{SiteLon}', sitLon)
    
    if lineString.find('{SiteLon}'):
     lineString = lineString.replace('{SiteLat}', sitLat)
    
    if lineString.find('{EandA}'):
     lineString = lineString.replace('{EandA}', EandA)
   
    # {county}

    else:
     lineString = lineString
    xmlFile.write(lineString)
    
    #print (lineString)
     
    
    
#for r in (metaDataContent):
    #newMetaDataContent = metaDataContent.replace('{title}', drTitle)
    #newMetaDataContent = metaDataContent.replace('{keywords}', keywords.value)
xmlFile.close()

print ('Creation of new metadata file is complete\n\n') 
#Load EDI File and Read It
##checkFile = open(open(myfilename, 'r')
##checkFileContent = checkFile.read()
##checkFile.close()
##print(checkFileContent)

C:\DataReleases\Arkansas AMT data release\AMT050\USA-Arkansas-Buffalo_River-2017-AMT050.xml
Creation of new metadata file is complete




### At this point the new metadata file should be created.  Check the result below.  If the created file is OK, run the loop for the rest of the children...

In [149]:
# Show the resulting child xml metadata file example 
#for i in range(len(newMetaDataContent)):
print (newMetaDataContent)

['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>1</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>{onlink}</onlink>\n', '\t\t\t\t<lworkcit>\n', '\t\t\t\t\t<citeinfo>\n', '\t\t\t\t\t\t<origin>Ailes, C. E.</origin>\n', '\t\t\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t\t\t<pubdate>2011</pubdate>\n', '\t\t\t\t\t\t<title>Audiomagnetotelluric data, Taos Plateau Vol