## SquamataAssemballyAMT - Jupyter notebook for batch releasing Audio Magnetotellic (AMT) data to ScienceBase

This module performs the following operations:
- Create list of data directories.
- Identify files accompanying data release.
- Create file listing for metadata XML markup.
- Identify and load MT EDI file.
- Clean up and reformat harvested values to be XML metadata complient.
- Create User Editable Keywords Listing
- Create entity and attribute XML markup.
- Poplulate metadata template
- Validate metadata; create error log; create HTML and FGDC Text versions of the metadata. (In development - use https://mrdata.usgs.gov/validation/ for validation in the interim)

Known issues needing repair:
- Fix procstep section; do we need this function to collect file information or do we plan on handeling this with boilerplate.

## Future development plans for SquamataSB

- Create all child metadata files from first example created in previous steps. 
- Batch upload files to ScienceBase.
- Batch remove files from ScienceBase. 
- Change ScienceBase parameters such as citation information, add orcid ids, add USGS CMS tags, etc. 

### Instructions
- Create a template xml format that contains boilerplate text common to all childeren in a data release.  Be sure this template contains the approriate curly bracket tags, {SquamataTagExample} used to populate the template using SquamataAMT.

### To execute a function/command select a cell and Hold-Shift + Press-Enter

**The 'r' signifies a string literal. Use for paths.**

Metadata wizard:  Advanced, Open In a jupyter Notebook?
Metadata Wizard 2.o from ScienceBase

In [1]:
# Phil Brown (pbrown@usgs.gov) 2019 Beta
# Working Python 3 Notebook used to facilitate the release of Audio Magnetotelluric (AMT) Data to ScienceBase.

In [2]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

Jupyter is working.


In [19]:
# Load required Libraries
import sys
import os
import zipfile
import csv
#import pysb
import requests
import shutil
from shutil import copyfile
import zipfile
import datetime
import glob
from lxml import etree
import json
import pickle
import shutil
import fileinput
import json
import pandas as pd
import numpy as np
from IPython.core.display import display
from IPython.core.display import HTML
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode
import re
from ipywidgets import *
from IPython.display import display
from IPython.html.widgets import widgets
import datetime
import dateutil.parser

UsageError: Line magic function `%install_ext` not found.


# 1) Step One - Set Directory Paths
## Please set directory paths below
### Directory paths include
- Data Path: This is the path to the data, data structure should have a directory for each station
- Template Path: The path to the XML metadata template being used for the data.  This template should already include all information common to all child metadata files e.g. originators, larger work citation, etc.

In [6]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
mtDataPath = r"C:\CurrentWork\DataReleases\Arkansas AMT data release" #The 'r' signifies a string literal. Use for paths.
mtMetaDataTemplatePath = r"C:\CurrentWork\DataReleases\Arkansas AMT data release"
mtMetaDataTemplateName = "MT-MetaData_TEMPLATE.xml"

In [7]:
#Check Paths for the fun of it
print ('The MT Data Path is: ' + '"' + mtDataPath + '"')
mtMetaDataTemplatePath + "\\" + mtMetaDataTemplateName

The MT Data Path is: "C:\CurrentWork\DataReleases\Arkansas AMT data release"


'C:\\CurrentWork\\DataReleases\\Arkansas AMT data release\\MT-MetaData_TEMPLATE.xml'

## Now, let's explore our data. 
- What files do we have? 
- What files do we import values from?

In [8]:
#Explore data files and directory structure hosted below the provided provided parent data directory

In [9]:
#Produce directory listing of station (SB Object Children)
#Either set up the root directory with station subdirectories only or delete non-station directories from the list array
mtDataDirList = os.listdir(mtDataPath)
mtDataDirList

['AMT050',
 'AMT070',
 'AMT090',
 'AMT115',
 'AMT140',
 'AMT170',
 'MT-MetaData_TEMPLATE.xml',
 'MT-MetaData_TEMPLATE.xml.bak']

In [23]:
#Let's start with the first station and check the result - we can then loop through the process 
#for the remaining stations in the list.
mtStationPath = mtDataPath + '\\' + mtDataDirList[1]
mtStationPath

'C:\\CurrentWork\\DataReleases\\Arkansas AMT data release\\AMT070'

In [24]:
#Look for EDI file to load
ediList = glob.glob(os.path.join(mtStationPath, '**/*MT*.edi'),  recursive=True)
ediPath = ediList[0]
#ediList
print ('EDI File Path:\n' + ediPath)
ediPathArray = ediPath.split('\\')
ediFile = str(ediPathArray[len(ediPathArray)-1])
print ('EDI File:\n' + ediFile)    
#ediPathArray

EDI File Path:
C:\CurrentWork\DataReleases\Arkansas AMT data release\AMT070\USA-Arkansas-Buffalo_River-2017-AMT070.edi
EDI File:
USA-Arkansas-Buffalo_River-2017-AMT070.edi


## Harvest from the MT EDI file. 
### Parameters include:
- ProductId=USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT01.edi
- ExternalUrl Url=https://doi.org/10.5066/F72F7MQ7
- Attachment Filename=https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf
- Survey Purpose Description: 
- Data Description:
- Citation Title=Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
- Citation Authors=Chad E. Ailes, Brian D. Rodriguez
- Citation Year=2011
- YearCollected=2009
- Country=USA                                  
- Ellipsoid=Clarke 1866                                                          
- Location datum=NAD27 CONUS                                                     
- SITE LATITUDE=36.752985000                                                     
- SITE LONGITUDE=-105.560966167                                                  
- Elevation units="meters"=2608.00                                                                     
- Start=2009-07-21T19:52:03 UTC/GMT
- End=2009-07-21T20:34:20 UTC/GMT
- ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 
- Entities and Attributes:
    - FREQUENCIES
    - IMPEDANCE ROTATION ANGLES
    - IMPEDANCES
    - TIPPER PARAMETERS
    - COMPUTED PARAMETERS


## Lets now import and index values from the EDI Files
- We need these values for the metadata template.  
- We also want to run stats on some of these values for the entity and attributes section

In [25]:
    #Load EDI File and Read It
    ediFile = open(ediPath, 'r')
    ediContent = ediFile.read()
    ediFile.close()
    print(ediContent)


>HEAD                                                                           
                                                                                
  DATAID="Buffalo Natl River"                                                   
  ACQBY=USGS                                                                    
  ACQDATE=2017-08-24
  STATE=Arkansas                                                                
  COUNTY=Newton                                                                 
  UNITS=M                                                                       
  STDVERS=1.0                                                                   
  PROGVERS=GEOTOOLS_2.3                                                         
  PROGDATE=09/16/94                                                             
                                                                                
>INFO   MAXLINES=1000                                                           
       

In [26]:
#Now assign values to the SB MetaDataWizard Template unknowns
list_ = ediContent.splitlines()
list_length = len (list_)

# there are probally easier ways to loop through the below but I like having it all hard coded upfront
# it's easire to track an change for me
# Use the examples provided below to extract additional parameters
# Not that all variables being collected are not necessarily used in populating the template.
# Note that values can be hardcoded into the metadata xml template and/or harvested from the edi file

for X in list_:
  if "ProductId" in X:
    productArray = X.split('=')
    productIdArray = productArray[1].split('.')
    productId = productIdArray[0]
    # We may want to reformat this are parse out this name further for use with a root name based on the Data Release Title?
    productId = productId.replace("-", " ")
    productId = productId.replace("_", " ")
    drTitle = productId
    print ('Child Title: ' + productId)
  if "ExternalUrl Url" in X:
    externalURLArray = X.split('=')
    externalURL = externalURLArray[1]
    print ('<onlink>: ' + externalURL)
  if "STATE" in X:
    stateArray = X.split('=')
    state = stateArray[1].replace('"', "") #remove quotes around state
    print ('State: ' + state)
  if "COUNTY" in X:
    countyArray = X.split('=')
    county = countyArray[1]
    print ('County: ' + county)
  if "Start" in X:
    startArray = X.split('=')
    start = startArray[1]
    print ('Start: ' + start)
  if "End" in X:
    endArray = X.split('=')
    end = endArray[1]
    print ('End: ' + end)
  if "Attachment Filename" in X and "http" in X:
    lgwrklinkArray = X.split('=')
    lgwrklink = lgwrklinkArray[1]
    print ('Attachment Filename Link: ' + lgwrklink)
  if "Citation Title" in X:
    citTitArray = X.split('=')
    citTit = citTitArray[1]
    print ('Citation Title: ' + citTit)
  if "Citation Authors" in X:
    citNamesArray = X.split('=')
    citAuthorsArray = citNamesArray[1].split(',')
    for author in citAuthorsArray:
     author = author.strip()
     print ('Author: '+ author)
  if "Citation Year" in X:
    citYearArray = X.split('=')
    citYear = citYearArray[1]
    print ('Citation Year: ' + citYear)
  if "YearCollected" in X:
    yearColArray = X.split('=')
    yearCol = yearColArray[1]
    print ('Year Collected: ' + yearCol)
  if "Ellipsoid" in X:
    ellipsoidArray = X.split('=')
    ellipsoid = ellipsoidArray[1]
    print ('Ellipsoid: ' + ellipsoid)
  if "Location datum" in X:
    locDatumArray = X.split('=')
    locDatum = locDatumArray[1]
    print ('Local datum: ' + locDatum)
  if "SITE LATITUDE" in X:
    sitLatArray = X.split('=')
    sitLat = sitLatArray[1] # !!! probally need to reformat this to have only 6 significant digits Also need to trim extra spaces !!!
    print ('Site latitude: ' + sitLat)
  if "SITE LONGITUDE" in X:
    sitLonArray = X.split('=')
    sitLon = sitLonArray[1] # !!! probally need to reformat this to have only 6 significant digits Also need to trim extra spaces !!!
    print ('Site longitude: ' + sitLon)
  if "Elevation units" in X:
    elevationStringArray = X.split('=')
    siteElevation = elevationStringArray[2] 
    print ('Site Elevation: ' + siteElevation)
    elevationUnits = elevationStringArray[1].replace('"', "")
    print ('Elevation Units: ' + elevationUnits)
    
# Code below returns values that occupy more than one line
    
for i in range(list_length):
 value = list_[i] 
 if value.replace(" ", "") == 'SurveyPurposeDescription:':
   startIndPurpose = i + 1
   #print ('startIndPurpose: ' + str(startIndPurpose))
 if value.replace(" ", "") == 'DataDescription:':
   endIndPurpose = i - 1
   #print ('endIndPurpose: ' + str(endIndPurpose))
purpose = list_[startIndPurpose]
for j in range(startIndPurpose + 1,endIndPurpose): 
    purpose = purpose + list_[j]
    purposeClean = re.sub(' +', ' ',purpose)
print ('\nAbstract:\n\t' + purposeClean)

for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == 'DataDescription:':
   startIndDescription = k + 1
   #print ('startIndDescription: ' + str(startIndDescription))
 if value.replace(" ", "") == 'FILECREATOR:':
   endIndDescription = k - 9
   #print ('endIndDescription: ' + str(endIndDescription))
description = list_[startIndDescription]
for l in range(startIndDescription + 1,endIndDescription): 
    description = description + list_[l]
    descriptionClean = re.sub(' +', ' ',description)
print ('\nPurpose:\n\t' + descriptionClean)
    

State: Arkansas                                                                
County: Newton                                                                 
Child Title: USA Arkansas Buffalo River 2017 AMT070
<onlink>: https://doi.org/10.5066/P9CIAXC5 
Citation Title: Audiomagnetotelluric data, Buffalo River watershed, Arkansas, 2017
Author: Brian D. Rodriguez and Mark R. Hudson
Citation Year: 2018                                                             
Year Collected: 2017
Ellipsoid: Clarke 1866                                                          
Local datum: NAD27 CONUS                                                     
Site latitude: 36.08805                                                         
Site longitude: -93.31399                                                       
Site Elevation: 661.39                                                
Elevation Units: meters
Start: 2017-08-24T19:27:28 UTC/GMT
End: 2017-08-24T20:44:02 UTC/GMT  
Start: 2017-08-24T19:27:28 

In [27]:
# Now let's format the start time and end time to be what the XML file wants for <begdate> and <enddate>

begdateArr = start.split(' ')
begdate_str = begdateArr[0]
begdate_obj = dateutil.parser.parse(begdate_str)
begdate = begdate_obj.strftime('%Y%m%d')
print('<begdate> ', begdate) 

enddateArr = end.split(' ')
enddate_str = enddateArr[0]
enddate_obj = dateutil.parser.parse(enddate_str)
enddate = enddate_obj.strftime('%Y%m%d')
print('<enddate> ', enddate) 


<begdate>  20170824
<enddate>  20170824


In [28]:
#Now we reformat the lat  and longitude to 6 sig figs as well as trim of any extra spaces
#Brian seems to be stripping out the sig figs now so there is no need.  
#Also may whant to round up instead of just stripping values?

sitLat = sitLat.strip()
##sitLat = sitLat[:-3] 
sitLon = sitLon.strip()
##sitLon = sitLon[:-3]

print ('Site latitude: ' + sitLat)
print ('Site longitude: ' + sitLon)

Site latitude: 36.08805
Site longitude: -93.31399


In [29]:
# Now Reformat county by trimming the extra spaces
county = county.strip()
print ('County: ' + county)

County: Newton


In [30]:
## Create editable keywords example.  
## Example text is created after running this cell
## This text is displayed by running "display(keywords) below
keywords = widgets.Textarea(
    value='\t\t<keywords>\n\t\t\t<theme>\n\t\t\t\t<themekt>ISO 19115 Topic Category</themekt>' \
    + '\n\t\t\t\t<themekey>biota</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>None</themekt>' \
    + '\n\t\t\t\t<themekey>impedance</themekey>\n\t\t\t\t<themekey>tipper</themekey>' \
    + '\n\t\t\t\t<themekey>apparent resistivity</themekey>\n\t\t\t\t<themekey>impedance phase</themekey>' \
    + '\n\t\t\t\t<themekey>impedance strike</themekey>\n\t\t\t\t<themekey>MT</themekey>' \
    + '\n\t\t\t\t<themekey>audiomagnetotelluric</themekey>\n\t\t\t\t<themekey>magnetotelluric</themekey>' \
    + '\n\t\t\t\t<themekey>AMT</themekey>\n\t\t\t\t<themekey>sounding</themekey>' \
    + '\n\t\t\t\t<themekey>Geology, Geophysics, and Geochemistry Science Center</themekey>' \
    + '\n\t\t\t\t<themekey>GGGSC</themekey>\n\t\t\t\t<themekey>Mineral Resources Program</themekey>' \
    + '\n\t\t\t\t<themekey>MRP</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>USGS Thesaurus</themekt>' \
    + '\n\t\t\t\t<themekey>Magnetic field (earth)</themekey>\n\t\t\t\t<themekey>Geophysics</themekey>' \
    + '\n\t\t\t\t<themekey>GPS measurement</themekey>\n\t\t\t\t<themekey>Electromagnetic surveying</themekey>' \
    + '\n\t\t\t\t<themekey>Magnetic surveying</themekey>\n\t\t\t</theme>\n\t\t\t<place>' \
    + '\n\t\t\t\t<placekt>USGS Geographic Names Information System (GNIS), https://geonames.usgs.gov</placekt>' \
    + '\n\t\t\t\t<placekey>Arkansas</placekey>\n\t\t\t\t<placekey>Buffalo River</placekey>' \
    + '\n\t\t\t\t<placekey>Bear Creek</placekey>' \
    + '\n\t\t\t\t<placekey>' + county + ' County</placekey>\n\t\t\t</place>\n\t\t</keywords>',
    placeholder='Type something',
    #description='String:',
    layout=Layout(width='100%', height='666px'),
    disabled=False
)
print ('Keywords list created.')

Keywords list created.


### Change the text in the textbox below to relflect what should be included as the key words for all child items

Note that changing the text below at any time creates a keywords section of the metadata seen EXACTLY as it is shown below

In [31]:
# Run this cell for key word text to edit.  
# Edit the text in place.  
# When complete move on to the next step

display(keywords)

Entity and Attribute Values for the EDI file.  List !****FREQUENCIES****!,!****IMPEDANCE ROTATION ANGLES****!,!****IMPEDANCES****!,!****COMPUTED PARAMETERS****!

Here we load the frequencies
>!****FREQUENCIES****!

In [32]:
# Import entity and attributes - !****FREQUENCIES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):

 value = list_[k] 
 if value.replace(" ", "") == '>!****FREQUENCIES****!':
   startIndFrequencies = k + 3
   print ('startIndFrequencies: ' + str(startIndFrequencies))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCEROTATIONANGLES****!':
   endIndFrequencies = k - 1
   print ('endIndFrequencies: ' + str(endIndFrequencies))

frequencyData = []
fdata = []
fdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
frequencyDF = pd.DataFrame(fdata)
for j in range(startIndFrequencies,endIndFrequencies):
    fdataTemp = list_[j]
    fdataTemp = re.sub(' +', ' ',fdataTemp)
    fdataTemp = fdataTemp.split(" ")
    del fdataTemp[0]
    fdata = fdata + fdataTemp
    
print (fdata)  
fdata = np.array(fdata).astype(np.float) #convert String to floats
frequencyDF = pd.DataFrame(fdata,columns=['Frequencies'])
frequencyDF


startIndFrequencies: 269
endIndFrequencies: 275
['6.50000000E+03', '4.90000000E+03', '3.55000000E+03', '2.73000000E+03', '2.20000000E+03', '1.87000000E+03', '1.50000000E+03', '1.17000000E+03', '8.85000000E+02', '7.20000000E+02', '5.80000000E+02', '4.60000000E+02', '3.40000000E+02', '2.70000000E+02', '2.10000000E+02', '1.72399994E+02', '1.50000000E+02', '1.22099998E+02', '1.00000000E+02', '8.59400024E+01', '7.90000000E+01', '6.00600014E+01', '4.15000000E+01', '2.83199997E+01', '1.90400009E+01', '1.22100000E+01', '7.32399988E+00', '4.39400005E+00']


Unnamed: 0,Frequencies
0,6500.0
1,4900.0
2,3550.0
3,2730.0
4,2200.0
5,1870.0
6,1500.0
7,1170.0
8,885.0
9,720.0


In [33]:
# Now lets get the stats of the frequency data
#Make Array of Max Vallues
frequencyMax = frequencyDF[('Frequencies')].max()
print ('Max. Frequency: ' + str(frequencyMax))
frequencyMin = frequencyDF[('Frequencies')].min()
print ('Min. Frequency: ' + str(frequencyMin))

Max. Frequency: 6500.0
Min. Frequency: 4.39400005


Here we load the Impedance Rotation Angles
>!****IMPEDANCE ROTATION ANGLES****!

In [34]:
# Import entity and attributes - !****IMPEDANCE ROTATION ANGLES****! plan to break some of these individual chunks into objects/functions
# Get Range of IMPEDANCE ROTATION ANGLES in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCEROTATIONANGLES****!':
   startIndROT = k + 3
   print ('startIndROT: ' + str(startIndROT))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCES****!':
   endIndROT = k - 1
   print ('endIndROT: ' + str(endIndROT))

rdata = []
rdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
rotationDF = pd.DataFrame(rdata)
for j in range(startIndROT,endIndROT):
    rdataTemp = list_[j]
    rdataTemp = re.sub(' +', ' ',rdataTemp)
    rdataTemp = rdataTemp.split(" ")
    del rdataTemp[0]
    rdata = rdata + rdataTemp
    
print (rdata)  
rdata = np.array(rdata).astype(np.float) #convert String to floats
rotationDF = pd.DataFrame(rdata,columns=['ZROT'])
rotationDF

startIndROT: 279
endIndROT: 285
['0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00']


Unnamed: 0,ZROT
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
5,0.0
6,0.0
7,0.0
8,0.0
9,0.0


In [35]:
# Now lets get the stats of the rotation data

#Make Array of Max Values
rotationMax = rotationDF[('ZROT')].max()
print ('Max. ZROT: ' + str(frequencyMax))

#Make Array of Min Values
rotationMin = rotationDF[('ZROT')].min()
print ('Min. ZROT: ' + str(frequencyMin))

Max. ZROT: 6500.0
Min. ZROT: 4.39400005


Here we load the impedances
>!****IMPEDANCES****!

In [36]:
# Import entity and attributes - !****IMPEDANCES****! plan to break some of these individual chunks into objects/functions
# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCES****!':
   startIndImpedances = k + 1
   print ('startIndImpedances: ' + str(startIndImpedances))
 
 if value.replace(" ", "") ==  '>!****TIPPERPARAMETERS****!':
   endIndImpedances = k - 1
   print ('endIndImpedances: ' + str(endIndImpedances))

#Construct Array of Channel Headers   
count = 0
impedanceLabel = []
impedanceData = []
data = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
impedanceDF = pd.DataFrame(data)
for l in range(startIndImpedances,endIndImpedances): 
    if list_[l][0] == '>':
     temp = list_[l].split(" ", 1)
     #print (temp)
     impedanceLabel.append((temp[0].split(">"))[1])
     dataTemp = list_[l+1]
     for j in range(l+2,l+8):
      dataTemp = dataTemp + list_[j]
      dataTemp = re.sub(' +', ' ',dataTemp)
     data = dataTemp.split(" ")
     del data[0] # need to check for empty strings and delete these from the array of the string can't be converted to a float
     del data[len(data)-1] # need to check for empty strings and delete
     #print (data)
     data = np.array(data).astype(np.float) #convert String to floats
     se = pd.Series(data)
     print ((temp[0].split(">"))[1])   
     impedanceDF[((temp[0].split(">"))[1])] = se.values
    
    count = count + 1

#impedanceDF = pd.DataFrame(data, columns=(impedanceLabel))
impedanceDF
#data
#se 

startIndImpedances: 287
endIndImpedances: 394
ZXXR
ZXXI
ZXX.VAR
ZXYR
ZXYI
ZXY.VAR
ZYXR
ZYXI
ZYX.VAR
ZYYR
ZYYI
ZYY.VAR


Unnamed: 0,ZXXR,ZXXI,ZXX.VAR,ZXYR,ZXYI,ZXY.VAR,ZYXR,ZYXI,ZYX.VAR,ZYYR,ZYYI,ZYY.VAR
0,1065.57751,357.211884,59927.4844,618.584106,-486.526764,208864.141,1251.7124,549.140442,70205.7188,1059.84204,270.060699,244686.672
1,579.262085,256.633911,22954.7461,318.416077,19.273792,110121.258,479.498474,-69.786194,30218.959,60.170555,-612.868591,144970.016
2,185.068314,272.776703,4980.34082,-3.444757,618.570801,26946.3164,-83.267891,-492.342804,7647.64453,-835.666443,-1228.22681,41377.8633
3,162.22467,110.392448,1085.61536,178.804459,366.356445,4665.2041,-205.539444,-468.528442,1768.33337,-587.739258,-792.77124,7599.04102
4,-26.270411,193.291153,434.598846,334.688599,103.523888,1963.95691,-319.910431,-456.56662,446.634064,-282.331482,-530.427185,2018.34424
5,11.565822,83.428268,1005.65271,447.361023,206.430099,2954.82617,-199.54332,-324.801056,596.280579,-128.947235,-495.960846,1752.00208
6,-110.342842,90.335815,986.508789,-1.71272,373.965332,5955.87354,-275.496155,-218.513672,1238.70044,-368.099976,-151.817337,7478.43555
7,-26.440443,31.11882,3206.4353,148.810898,438.073212,18069.3848,-86.469253,-76.551643,1979.66919,59.872913,113.989967,11156.1289
8,-62.733917,-26.778345,1720.94458,32.317169,117.646538,9973.47852,-170.823303,-44.882687,1885.50195,-155.097397,53.17683,10927.1465
9,-119.109741,33.570026,2739.98608,-186.332825,327.466431,17452.7246,-100.359818,-26.406057,2196.75317,-30.766642,161.910309,13992.5293


In [37]:
# Now lets get the stats of the impedance data

# Make Array of Max Values
impedanceMax = []
for i in range (0,len(impedanceLabel)):
    impedanceMax.append(impedanceDF[(impedanceLabel[i])].max())
print ('Impedance Max: ' + str(impedanceMax))

# Make Array of Min Values
impedanceMin = []
for i in range (0,len(impedanceLabel)):
    impedanceMin.append(impedanceDF[(impedanceLabel[i])].min())
print ('Impedance Min: ' + str(impedanceMin))

Impedance Max: [1065.57751, 357.211884, 59927.4844, 618.584106, 618.570801, 208864.141, 1251.7124, 549.140442, 70205.7188, 1059.84204, 270.060699, 244686.672]
Impedance Min: [-148.541885, -26.7783451, 15.2794561, -186.332825, -486.526764, 58.1282158, -319.910431, -492.342804, 1.95360684, -835.666443, -1228.22681, 7.80162477]


Here we load the tipper parameters
>!****TIPPER PARAMETERS****!

In [38]:
# Import entity and attributes - !****TIPPER PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column

# Get Range of TIPPER PARAMETERS in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****TIPPERPARAMETERS****!':
   startIndTipper = k + 1
   print ('startIndTipper: ' + str(startIndTipper))
 
 if value.replace(" ", "") ==  '>!****COMPUTEDPARAMETERS****!':
   endIndTipper = k - 1
   print ('endIndTipper: ' + str(endIndTipper))

#Construct Array of Channel Headers   
count = 0
tipperLabel = []
tipperData = []
tdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
tipperDF = pd.DataFrame(tdata)
for l in range(startIndTipper,endIndTipper): 
    if list_[l][0] == '>':
     ttemp = list_[l].split(" ", 1)
     #print (ttemp)
     tipperLabel.append((ttemp[0].split(">"))[1])
     tdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      tdataTemp = tdataTemp + list_[j]
      tdataTemp = re.sub(' +', ' ',tdataTemp)
      tdata = tdataTemp.split(" ")
     #print (tdata)
     del tdata[0]
     del tdata[len(tdata)-1] # need to check for empty strings and delete
     tdata = np.array(tdata).astype(np.float) #convert String to floats
     te = pd.Series(tdata)
     print ((ttemp[0].split(">"))[1])   
     tipperDF[((ttemp[0].split(">"))[1])] = te.values
    
    count = count + 1

#tipperDF = pd.DataFrame(tdata, columns=(tipperLabel))
tipperDF
#tdata
#te 

startIndTipper: 396
endIndTipper: 449
TXR.EXP
TXI.EXP
TXVAR.EXP
TYR.EXP
TYI.EXP
TYVAR.EXP


Unnamed: 0,TXR.EXP,TXI.EXP,TXVAR.EXP,TYR.EXP,TYI.EXP,TYVAR.EXP
0,-0.665486,0.084047,0.01026,0.00631,0.305008,0.035758
1,-0.501416,-0.019533,0.005649,0.262341,0.123699,0.027102
2,-0.352996,-0.024871,0.003058,0.436723,0.038778,0.016546
3,-0.281586,-0.057647,0.002072,0.562518,-0.000197,0.008902
4,-0.049265,0.020411,0.00094,0.456231,0.006581,0.004246
5,-0.341862,-0.004663,0.002074,0.520874,0.085227,0.006095
6,-0.310928,-0.016964,0.006471,0.69633,0.218187,0.039066
7,-0.361637,0.191704,0.023742,0.55887,0.741859,0.133794
8,-0.628506,0.22793,0.032993,0.13186,0.869534,0.191205
9,-0.48865,0.103331,0.019423,0.415461,0.640408,0.123721


In [39]:
# Now lets get the stats of the tipper data

# Make Array of Max Values
tipperMax = []
for i in range (0,len(tipperLabel)):
    tipperMax.append(tipperDF[(tipperLabel[i])].max())
print ('Tipper Max: ' + str(tipperMax))    

# Make Array of Min Values
tipperMin = []
for i in range (0,len(tipperLabel)):
    tipperMin.append(tipperDF[(tipperLabel[i])].min())
print ('Tipper Min: ' + str(tipperMin))

Tipper Max: [0.216024354, 1.18306506, 0.492599487, 0.696329534, 0.869533837, 1.58599246]
Tipper Min: [-0.772653282, -0.151319191, 0.000366475666, -0.474566609, -1.23376417, 0.00277365581]


Here we load the computed parameters
>!****COMPUTED PARAMETERS****!

In [40]:
# Import entity and attributes - !****COMPUTED PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of COMPUTED PARAMETERS in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****COMPUTEDPARAMETERS****!':
   startIndPar = k + 1
   print ('startIndPar: ' + str(startIndPar))
 
 if value.replace(" ", "") ==  '>END':
   endIndPar = k - 1
   print ('endIndPar: ' + str(endIndPar))

#Construct Array of Channel Headers   
count = 0
parLabel = []
parData = []
pdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
parDF = pd.DataFrame(pdata)
for l in range(startIndPar,endIndPar): 
    if list_[l][0] == '>':
     ptemp = list_[l].split(" ", 1)
     #print (ptemp)
     parLabel.append((ptemp[0].split(">"))[1])
     pdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      pdataTemp = pdataTemp + list_[j]
      pdataTemp = re.sub(' +', ' ',pdataTemp)
      pdata = pdataTemp.split(" ")
     #print (pdata)
     del pdata[0]
     del pdata[len(pdata)-1] # need to check for empty strings and delete
     pdata = np.array(pdata).astype(np.float) #convert String to floats
     pe = pd.Series(pdata)
     print ((ptemp[0].split(">"))[1])   
     parDF[((ptemp[0].split(">"))[1])] = te.values
    
    count = count + 1

parDF
#pdata
#pe 

startIndPar: 451
endIndPar: 810
RHOROT
RHOXX
RHOXX.ERR
RHOXY
RHOXY.ERR
RHOYX
RHOYX.ERR
RHOYY
RHOYY.ERR
PHSXX
PHSXX.ERR
PHSXY
PHSXY.ERR
PHSYX
PHSYX.ERR
PHSYY
PHSYY.ERR
TIPMAG
TIPMAG.ERR
TIPPHS
TIPPHS.ERR
ZSTRIKE
ZSKEW
TSTRIKE
COH
COH
COH
COH
EPREDCOH
EPREDCOH
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE


Unnamed: 0,RHOROT,RHOXX,RHOXX.ERR,RHOXY,RHOXY.ERR,RHOYX,RHOYX.ERR,RHOYY,RHOYY.ERR,PHSXX,...,TIPMAG.ERR,TIPPHS,TIPPHS.ERR,ZSTRIKE,ZSKEW,TSTRIKE,COH,EPREDCOH,SIGAMP,SIGNOISE
0,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,...,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758,0.035758
1,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,...,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102,0.027102
2,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,...,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546,0.016546
3,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,...,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902,0.008902
4,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,...,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246,0.004246
5,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,...,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095,0.006095
6,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,...,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066,0.039066
7,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,...,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794,0.133794
8,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,...,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205,0.191205
9,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,...,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721,0.123721


In [41]:
# Now lets get the stats of the computed parameters

# Make Array of Max Values
parMax = []
for i in range (0,len(parLabel)):
    parMax.append(parDF[(parLabel[i])].max())
print ('Computed Pararmeters Max: ' + str(parMax))    

# Make Array of Min Values
parMin = []
for i in range (0,len(parLabel)):
    parMin.append(parDF[(parLabel[i])].min())
print ('Computed Pararmeters Min: ' + str(parMin))

Computed Pararmeters Max: [1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246, 1.58599246]
Computed Pararmeters Min: [0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0.00277365581, 0

## Now lets get the range of values from the RSP values

## Now the raw Binary File Listing - this can be T files or W files
We will need to figure out the best way of filtering on thise - may need to build array and then delete AVG, dmp and edi file.

These are listed in the edi file as well but they are not all there.  

    ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 

- Which files need to be included in the data release?
- What is the best way to get this listing?

In [42]:
#First Get the list of RSP files
rspList = glob.glob(os.path.join(mtStationPath, '*.RSP'),  recursive=True)
#rspList
ediFile = str(ediPathArray[len(ediPathArray)-1])
fileListing = ''
fileListing = fileListing + '\t\t\t\t\t\t' + ediFile + '\n' #start the file listing with the main EDI file
rspFileListing = []
for i in range(len(rspList)):
    splitRspList = rspList[i].split('\\')
    fileListing = fileListing + '\t\t\t\t\t\t' + splitRspList[len(splitRspList) - 1] + '\n'
    rspFileListing.append(splitRspList[len(splitRspList) - 1])
print (fileListing)


						USA-Arkansas-Buffalo_River-2017-AMT070.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP



In [43]:
#No Raw frequency files so skip this step for AMT
#Add the raw frequency files to the list except the AVG, dmp and edi file
'''
binList = glob.glob(os.path.join(mtStationPath, 'WP*.*'),  recursive=True)

for i in range(len(binList)):
  splitBinList = binList[i].split('\\')
  if splitBinList[len(splitBinList) - 1].find('AVG') == -1 and splitBinList[len(splitBinList) - 1].find('dmp') == -1 and splitBinList[len(splitBinList) - 1].find('edi') == -1:
   fileListing = fileListing + '\t\t\t\t\t\t' + splitBinList[len(splitBinList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing
'''

"\nbinList = glob.glob(os.path.join(mtStationPath, 'WP*.*'),  recursive=True)\n\nfor i in range(len(binList)):\n  splitBinList = binList[i].split('\\')\n  if splitBinList[len(splitBinList) - 1].find('AVG') == -1 and splitBinList[len(splitBinList) - 1].find('dmp') == -1 and splitBinList[len(splitBinList) - 1].find('edi') == -1:\n   fileListing = fileListing + '\t\t\t\t\t\t' + splitBinList[len(splitBinList) - 1] + '\n'\n\nprint ('File ListingfileListing:\n' + fileListing\n"

In [44]:
#Now finally add the processed ASCII text files to the list
txtList = glob.glob(os.path.join(mtStationPath, '*.txt'),  recursive=True)
txtListFormatted = []
#txtList
print ('Text File Only Listing:\n')
for i in range(len(txtList)):
  splitTxtList = txtList[i].split('\\')
  txtListFormatted.append(splitTxtList[len(splitTxtList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitTxtList[len(splitTxtList) - 1] + '\n'

#print ('File Listing:\n' + fileListing)
#Remove readme file from list.  May need to search an remove if it does not come up first in alphabetical order?'
del txtListFormatted[0]
print (txtListFormatted)

Text File Only Listing:

['USA-Arkansas-Buffalo_River-2017-AMT070-FC6_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC6_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC6_03.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC7_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC7_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC8_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC8_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC9_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FC9_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AA.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AB.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AC.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AD.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCC_AE.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCC_AF.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCD_AG.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCD_AH.txt', 'USA-Arkansas-Buffalo_River-2017-AMT070-FCD_AI.txt', 'USA-Arkansas-Buffal

In [45]:
# Create the Entity and Attributes for the .txt? listing
txtEandA = ''
#As before, we read in the list of files

for i in range(len(txtListFormatted)):
    txtEandA = txtEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Text File ' + txtListFormatted[i] + '</enttypl>\n' \
        + '\t\t\t\t<enttypd>Header file in ASCII text format for raw binary time series files</enttypd>' \
        + '\n\t\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t\t</enttyp>' \
        + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Header Information</attrlabl>\n\t\t\t\t<attrdef>Header description and settings for time series binary content</attrdef>\n\t\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t\t<attrdomv>'\
        + '\n\t\t\t\t\t<udom>Header description and settings for time series binary content</udom>\n\t\t\t\t</attrdomv>' \
        + '\n\t\t\t</attr>\n\t\t</detailed>\n'   
    
#Add the final .txt listing for the readme file.  
#Be sure to comment this out if there is no readme.txt file included with the data release

readmeEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Text File readme.txt</enttypl>'\
        + '\n\t\t\t<enttypd>Read Me file describing the naming format of the EDI files and that they may need to be renamed to be imported into certain software packages.</enttypd>'\
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>'\
        + '\n\t\t</enttyp>\n\t\t</detailed>\n'
    
txtEandA = txtEandA + readmeEandA
print (txtEandA)
ALL_EandA = '<eainfo>\n'   
ALL_EandA = ALL_EandA + txtEandA

		<detailed>
			<enttyp>
				<enttypl>Text File USA-Arkansas-Buffalo_River-2017-AMT070-FC6_01.txt</enttypl>
				<enttypd>Header file in ASCII text format for raw binary time series files</enttypd>
				<enttypds>U.S. Geological Survey</enttypds>
			</enttyp>
			<attr>
				<attrlabl>Header Information</attrlabl>
				<attrdef>Header description and settings for time series binary content</attrdef>
				<attrdefs>U.S. Geological Survey</attrdefs>
				<attrdomv>
					<udom>Header description and settings for time series binary content</udom>
				</attrdomv>
			</attr>
		</detailed>
		<detailed>
			<enttyp>
				<enttypl>Text File USA-Arkansas-Buffalo_River-2017-AMT070-FC6_02.txt</enttypl>
				<enttypd>Header file in ASCII text format for raw binary time series files</enttypd>
				<enttypds>U.S. Geological Survey</enttypds>
			</enttyp>
			<attr>
				<attrlabl>Header Information</attrlabl>
				<attrdef>Header description and settings for time series binary content</attrdef>
				<attrdefs>U.S. 

# Now get values and stats on the .RSP files that are listed to add to the Ent. and Att. information

### These files are all fixed width format (666) but it looks like the BFS*.RSP and EF*.RSP are slightly different beasts that have different formats

In [46]:
# Load the BFS*.RSP files into pandasand create entities and attributes
allRspEandP = ''
strFreqMaxBSP = ''
strAmpMaxBSP = ''
strGammaMaxBSP = ''
strFreqMinBSP = ''
strAmpMinBSP = ''
strGammaMinBSP = ''

strFreqMaxESP = ''
strFreqMinESP = ''
strAmp1MaxESP = ''
strAmp1MinESP = ''
strAmp2MaxESP = ''
strAmp2MinESP = ''
strAmp3MaxESP = ''
strAmp3MinESP = ''
strAmp4MaxESP = ''
strAmp4MinESP = ''
strPhz1MaxESP = ''
strPhz1MinESP = ''
strPhz2MaxESP = ''
strPhz2MinESP = ''
strPhz3MaxESP = ''
strPhz3MinESP = ''
strPhz4MaxESP = ''
strPhz4MinESP = ''

intNumberBSF = 0
intNumberEFF = 0
for i in range(len(rspList)):
#for i in range(3):
    if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        bfRSP = pd.read_fwf(rspList[i], widths=[6,6,6], skiprows=5, parse_dates=True).rename(columns={'31':'Freq', '1':'Amp', 'Unnamed: 2':'Gamma'})
#print (rspList[1])
        print (bfRSP)    
    
# Now lets get the stats of the bfRSP data

# Make Array of Min and Max Values
        FreqMaxRSP = bfRSP['Freq'].max()
        strFreqMaxBSP = str(FreqMaxRSP)
        AmpMaxRSP = bfRSP['Amp'].max()
        strAmpMaxBSP = str(AmpMaxRSP)
        GammaMaxRSP = bfRSP['Gamma'].max()
        strGammaMaxBSP = str(GammaMaxRSP)
        FreqMinRSP = bfRSP['Freq'].min()
        strFreqMinBSP = str(FreqMinRSP)
        AmpMinRSP = bfRSP['Amp'].min()
        strAmpMinBSP = str(AmpMinRSP)
        GammaMinRSP = bfRSP['Gamma'].min()
        strGammaMinBSP = str(GammaMinRSP)
# now print RSP entity and attribute    
        bspEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        + '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strFreqMinBSP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxBSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Hz</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strAmpMinBSP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmpMaxBSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strGammaMinBSP + '</rdommin>\n\t\t\t\t<rdommax>' + strGammaMaxBSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Degrees</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t</detailed>'
        print ('i Loop = ' + str(i))
        allRspEandP = allRspEandP +  bspEandA + '\n'
    if rspList[i].find('EF') > 0:
        print ('i Loop = ' + str(i) + ' EF File Found')
        efRSP = pd.read_fwf(rspList[i], widths=[6,6,6,6,6,6,6,6,6], skiprows=5, parse_dates=True)\
        .rename(columns={'42':'Freq', '4':'Amp1', 'Low F':'Phz1', 'requen':'Amp2', 'cy':'Phz2', '10Hz':'Amp3', 'Out':'Phz3',\
        'Unnamed: 7':'Amp4', 'Unnamed: 8':'Phz4'})
#Delete headers from data to remove all strings from Pandas Data Frame

        efRSP = efRSP[efRSP.Amp2 != 'Freque']

# Make Array of Min and Max Values
        strFreqMaxESP = str(efRSP['Freq'].max())
        strFreqMinESP = str(efRSP['Freq'].min())
        strAmp1MaxESP = str(efRSP['Amp1'].max())
        strAmp1MinESP = str(efRSP['Amp1'].min())
        strAmp2MaxESP = str(efRSP['Amp2'].max())
        strAmp2MinESP = str(efRSP['Amp2'].min())
        strAmp3MaxESP = str(efRSP['Amp3'].max())
        strAmp3MinESP = str(efRSP['Amp3'].min())
        strAmp4MaxESP = str(efRSP['Amp4'].max())
        strAmp4MinESP = str(efRSP['Amp4'].min())
        strPhz1MaxESP = str(efRSP['Phz1'].max())
        strPhz1MinESP = str(efRSP['Phz1'].min())
        strPhz2MaxESP = str(efRSP['Phz2'].max())
        strPhz2MinESP = str(efRSP['Phz2'].min())
        strPhz3MaxESP = str(efRSP['Phz3'].max())
        strPhz3MinESP = str(efRSP['Phz3'].min())
        strPhz4MaxESP = str(efRSP['Phz4'].max())
        strPhz4MinESP = str(efRSP['Phz4'].min())
        espEandA = '\t<detailed>\n\t\t\t<enttyp>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        + '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strFreqMinESP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxESP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Hz</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strAmp1MinESP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmp1MaxESP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strPhz1MinESP + '</rdommin>\n\t\t\t\t<rdommax>' + strPhz1MaxESP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Degrees</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t</detailed>'
        print ('i Loop = ' + str(i))
        allRspEandP = allRspEandP +  espEandA + '\n' 
        print (str(allRspEandP)) 

        
EandA = allRspEandP # [-:2] # this removes the last characters of the string to get rid of the last line return.  
ALL_EandA = ALL_EandA +  allRspEandP


        Freq      Amp  Gamma
0       0.10  0.00191   89.3
1       0.15  0.00287   88.9
2       0.20  0.00383   88.5
3       0.30  0.00574   87.8
4       0.40  0.00764   87.1
5       0.60  0.01140   85.7
6       0.80  0.01520   84.2
7       1.00  0.01890   82.8
8       1.50  0.02810   79.2
9       2.00  0.03700   75.8
10      3.00  0.05360   69.2
11      4.00  0.06820   63.1
12      6.00  0.09130   52.7
13      8.00  0.10700   44.6
14     10.00  0.11800   38.3
15     15.00  0.13600   27.4
16     20.00  0.13790   23.1
17     30.00  0.14800   14.1
18     40.00  0.14800   10.1
19     80.00  0.15000    5.6
20    100.00  0.15000    4.5
21    200.00  0.15190    3.2
22    400.00  0.15190    0.9
23   1000.00  0.15190    4.3
24   2000.00  0.15190   -0.5
25   4000.00  0.15190   -1.7
26   8000.00  0.15500   -4.4
27  10000.00  0.15700   -6.2
28  15000.00  0.15800  -10.2
29  20000.00  0.16300  -14.0
30  25000.00  0.16941  -19.6
i Loop = 0
        Freq      Amp  Gamma
0       0.10  0.00183   89.3
1  

## Now let's work on the SD mode files

### Overview of SD files,

The prefix file naming convention for the SD mode AMT non-transmitter files are:
AANNNRN where AA is the survey area, NNN is the site number, R is the run "number" (run A, run B, run C, etc.), and N is a subset run number (first run is A1, second run is A2, third run is A3, etc.)

The suffix file naming convention for the SD mode AMT non-transmitter files are:
SD6 sample frequencies are always 79, 90, 100, 150, 210, 270, 340, 460, 580 Hertz
SD7 sample frequencies are always 340, 460, 580, 720, 885, 1170 Hertz
SD8 sample frequencies are always 1170, 1500, 1870, 2200, 2730, 3550, 4900, 6500, 9000 Hertz
SD9 sample frequencies are always 6500, 9000, 11590, 15290, 19500, 23370 Hertz

The prefix file naming convention for the SD mode AMT transmitter files are:
AANNNRN where AA is the survey area, NNN is the site number, R is the run "number" (run A, run B, run C, etc.), and N is a subset run "number" (first run is AA, second run is AB, third run is AC, etc.)

The suffix file naming convention for the SD mode AMT transmitter files are:
SDA sample frequency is always 960 Hertz
SDB sample frequency is always 1200 Hertz
SDC sample frequency is always 1870 Hertz
SDD sample frequency is always 2420 Hertz
SDE sample frequency is always 2730 Hertz
SDF sample frequency is always 3550 Hertz
SDG sample frequency is always 5210 Hertz
SDH sample frequency is always 6850 Hertz
SDI sample frequency is always 11590 Hertz
SDJ sample frequency is always 15920 Hertz
SDK sample frequency is always 23370 Hertz

The file format of the SD mode files are described in each of the text files (last block of information within each file). It is the same data format regardless of whether or not a transmitter was used to collect the data. We must have been looking at the text file for the FC (Fourier Coefficient) files in your office, so I was confused that I hadn't described the data format (it is there in the text files for all SD files).



In [47]:
#First get a listing of the SD mode files
#Now finally add the processed ASCII text files to the list
sdList = glob.glob(os.path.join(mtStationPath, '*.sd?'),  recursive=True)
#sdaList
sdFileList = ''
for i in range(len(sdList)):
  splitSdList = sdList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitSdList[len(splitSdList) - 1] + '\n'
  sdFileList = sdFileList + splitSdList[len(splitSdList) - 1] + '\n'

#print ('File ListingfileListing:\n' + fileListing)
print ('SD file listing:\n' + sdFileList)


SD file listing:
AR107A1.SD6
AR107A1.SD7
AR107A1.SD8
AR107A1.SD9
AR107A2.SD6
AR107A2.SD7
AR107A2.SD8
AR107A2.SD9
AR107A3.SD6
AR107AA.SDA
AR107AA.SDK
AR107AB.SDA
AR107AB.SDK
AR107AC.SDB
AR107AC.SDK
AR107AD.SDB
AR107AD.SDK
AR107AE.SDC
AR107AE.SDK
AR107AF.SDC
AR107AF.SDJ
AR107AG.SDD
AR107AG.SDJ
AR107AH.SDD
AR107AI.SDD
AR107AJ.SDE
AR107AK.SDE
AR107AL.SDF
AR107AM.SDE
AR107AN.SDE
AR107AO.SDF
AR107AP.SDF
AR107AQ.SDG
AR107AR.SDG
AR107AS.SDH
AR107AT.SDH
AR107AU.SDH
AR107AV.SDI
AR107AW.SDI
AR107AX.SDJ
AR107AY.SDJ
AR107AZ.SDJ



In [48]:
#As before, we read in the list of files

for i in range(len(sdList)):
#for i in range(3):
    #if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        dfSD = pd.read_fwf(sdList[i], widths=[15,15,15,15,15], skiprows=27, parse_dates=True).rename(columns={'79.0000  32.':'Amp1', '00000   20   1':'Amp2', 'Unnamed: 2':'Amp3', 'Unnamed: 3':'Amp4', 'Unnamed: 4':'Amp5'})
#Now strip out the header rows in the data as to not screw up calculating the range
        #dfSD = dfSD[dfSD.Amp4 != '00000   20   1']
        print (dfSD)        
    

               Amp1             Amp2      Amp3      Amp4       Amp5
0   78.43617901e-02              0.0 -2.466982 -0.695490   8.685007
1               0.0  -1.73370693e-02 -0.006457  0.062490   0.005230
2   45.53783757e-05              0.0 -0.012838  0.000924   0.039871
3   -1.40428358e-02  27.87193084e-05 -0.000125  0.000214   0.000000
4   17.66267466e-03  -1.98228124e-02 -0.038671  0.080741  -0.000227
5   60.37991003e-05  -3.09672770e-04  0.000310  0.000938   0.000000
6      90.0000  32.   00000   20   1       NaN       NaN        NaN
7   11.95985511e-01              0.0 -3.570451  0.770737  15.447788
8               0.0  -2.14143609e-02  0.004307  0.093651  -0.001623
9   57.96725187e-05              0.0 -0.018119  0.007824   0.065776
10  -1.16298988e-02  39.09823515e-05 -0.000074  0.000349   0.000000
11  98.75625162e-04  -3.09439948e-02 -0.065299  0.120000  -0.000393
12  72.61861151e-05  -3.80715238e-04  0.000468  0.001250   0.000000
13    100.0000  32.   00000   20   1       NaN  

     1200.0000  58.   00000   15   1          Amp3      Amp4      Amp5
0   48.89275870e-01              0.0 -3.113427e+00 -0.217449  2.844879
1               0.0  17.69282872e-04  5.920487e-03 -0.002372 -0.007259
2   24.60078692e-06              0.0 -8.739576e-04  0.006886  0.000010
3   -2.85395767e-03  14.67239506e-07  5.585084e-07  0.000013  0.000000
4   -4.52205102e-04  10.65348418e-04  3.543066e-04 -0.001471  0.000004
5   24.53515352e-07  11.05234373e-09  4.181819e-07  0.000001  0.000000
6    1200.0000  58.   00000   15   1           NaN       NaN       NaN
7   11.85288228e+00              0.0 -7.764261e+00 -0.200828  5.944846
8               0.0  46.64392453e-04  1.548346e-02 -0.004410 -0.013447
9   37.51199251e-06              0.0 -1.971287e-03  0.014987  0.000780
10  -8.23818915e-03  12.44273631e-06  5.359461e-06  0.000023  0.000000
11  -1.92876280e-03  32.38575632e-04  1.441273e-03 -0.003031  0.000007
12  59.03381925e-07  26.64731657e-07  1.716211e-06  0.000002  0.000000
13   1

    23370.0000 620.   00000   15   1          Amp3          Amp4          Amp5
0   45.55152076e-02              0.0 -7.087189e-01  3.081424e-02  1.884883e+00
1               0.0  13.24746148e-05  1.857883e-05 -4.642349e-04  1.066177e-04
2   31.80665978e-08              0.0  1.378059e-04 -2.339437e-05 -5.117787e-04
3   26.49527817e-05  20.27254202e-08 -2.530460e-08  2.888766e-07  0.000000e+00
4   58.15816763e-07  -3.23891623e-06  3.905688e-04 -1.062246e-04 -1.677364e-07
5   29.59507649e-09  -2.09441822e-07 -2.200433e-08  6.163622e-07  0.000000e+00
6   23370.0000 620.   00000   15   1           NaN           NaN           NaN
7   32.25369011e-02              0.0 -7.312486e-01 -1.733352e-02  2.352832e+00
8               0.0  23.25250554e-05 -1.406697e-06 -6.875106e-04  6.824439e-05
9   34.37777416e-08              0.0  2.230159e-04 -4.933377e-05 -6.550995e-04
10  21.90933803e-05  26.75079203e-08 -2.284638e-08  2.685391e-07  0.000000e+00
11  -1.09483260e-04  11.38073216e-06  2.882552e-04 -

     2420.0000 188.   00000   15   1          Amp3          Amp4          Amp5
0   20.12019898e-02              0.0 -1.835968e-01  2.809926e-02  2.012528e-01
1               0.0  94.06638600e-06  1.341175e-04 -8.198176e-05 -1.768216e-04
2   39.48629913e-08              0.0 -1.080047e-05  1.300843e-04  1.779661e-05
3   -9.92232638e-05  18.43185758e-08  3.216951e-09  2.508157e-07  0.000000e+00
4   -8.02639658e-05  -6.96227220e-06  7.159191e-05  5.893907e-06 -1.381604e-07
5   65.50657901e-09  -1.74758059e-07  6.495418e-08  4.189978e-07  0.000000e+00
6    2420.0000 188.   00000   15   1           NaN           NaN           NaN
7   17.13955884e-02              0.0 -1.511578e-01  2.838166e-02  1.803248e-01
8               0.0  56.04307948e-06  1.443237e-04 -5.621494e-06 -2.035609e-04
9   42.50769351e-08              0.0 -2.156194e-05  1.048794e-04  2.801268e-05
10  -6.48949534e-05  14.06904924e-08  2.165514e-08  1.812756e-07  0.000000e+00
11  -3.24114169e-05  -1.24681561e-05  1.083827e-05 -

    11590.0000 620.   00000   15   1          Amp3          Amp4          Amp5
0   48.86751535e-02              0.0 -1.693268e-01  3.333456e-02  1.393859e-01
1               0.0  -6.29959189e-04 -4.681675e-04  3.375843e-04  2.367237e-04
2   23.35385300e-07              0.0 -1.077556e-04 -1.102810e-04  7.457896e-06
3   52.17773656e-06  48.18937631e-08  1.421939e-10  2.252866e-07  0.000000e+00
4   16.02031696e-05  29.49088831e-05 -7.353142e-05 -6.263112e-05 -8.012454e-07
5   -1.61400751e-08  -2.01822074e-07 -1.331396e-08  5.362226e-07  0.000000e+00
6   11590.0000 620.   00000   15   1           NaN           NaN           NaN
7   25.13768889e-01              0.0 -8.007939e-01  2.591884e-01  4.765780e-01
8               0.0  -3.96731824e-04  2.540043e-04  3.150405e-04  8.585859e-06
9   62.63533052e-08              0.0  1.468009e-04  1.067885e-03  1.455471e-04
10  -2.96219890e-04  45.39901208e-08 -8.011941e-08  8.836651e-07  0.000000e+00
11  28.68137686e-05  25.83701563e-05 -1.087912e-04 -

In [49]:
#listing of the Fourier Coefficient files (FC) files.
#These are Binary so we don't need to come up with the range of the values.
fcList = glob.glob(os.path.join(mtStationPath, '*.fc?'),  recursive=True)
#fcList
for i in range(len(fcList)):
  splitFcList = fcList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitFcList[len(splitFcList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						USA-Arkansas-Buffalo_River-2017-AMT070.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AC.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AD.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCC_AE.txt
						USA-Arkansas-Buffalo_Riv

In [50]:
# Create the Entity and Attributes for the FC listing
for i in range(len(fcList)):
    splitFcList = fcList[i].split('\\')
    fcEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Binary File ' + splitFcList[len(splitFcList) - 1] + '</enttypl>\n' \
        + '\t\t\t<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>' \
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t</enttyp>' \
        + '\n<attr>\n\t\t\t<attrlabl>Data Value</attrlabl>\n\t\t\t<attrdef>Binary Data Value</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t<attrdomv>'\
        + '\n\t\t\t\t<udom>Binary Data Value</udom>\n\t\t\t\t</attrdomv>' \
        + '\t\t</attr>\n\t</detailed>\n'
    print (fcEandA)
ALL_EandA = ALL_EandA + fcEandA    
        
        
        
        
        

		<detailed>
		<enttyp>
			<enttypl>Binary File AR107A1.FC6</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>
			<enttypds>U.S. Geological Survey</enttypds>
		</enttyp>
<attr>
			<attrlabl>Data Value</attrlabl>
			<attrdef>Binary Data Value</attrdef>
			<attrdefs>U.S. Geological Survey</attrdefs>
			<attrdomv>
				<udom>Binary Data Value</udom>
				</attrdomv>		</attr>
	</detailed>

		<detailed>
		<enttyp>
			<enttypl>Binary File AR107A1.FC7</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.<

In [51]:
#listing of the Time Series TS? files
#Now finally add the processed ASCII text files to the list
tsList = glob.glob(os.path.join(mtStationPath, '*.ts?'),  recursive=True)
#tsList
for i in range(len(tsList)):
  splitTsList = tsList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitTsList[len(splitTsList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						USA-Arkansas-Buffalo_River-2017-AMT070.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AC.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AD.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCC_AE.txt
						USA-Arkansas-Buffalo_Riv

In [52]:
# Create the Entity and Attributes for the Time series .TS? listing
for i in range(len(tsList)):
    splitTsList = tsList[i].split('\\')
    tsEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Binary File ' + splitTsList[len(splitTsList) - 1] + '</enttypl>\n' \
        + '\t\t\t<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>' \
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>' \
        + '\n\t\t</enttyp>\n\t\t<attr>\n\t\t\t<attrlabl>Data Value</attrlabl>\n\t\t\t<attrdef>Binary Data Value</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t<attrdomv>'\
        + '\n\t\t\t\t<udom>Binary Data Value</udom>\n\t\t\t\t</attrdomv>' \
        + '\t\t</attr>\n\t</detailed>\n'
    print (tsEandA)
ALL_EandA = ALL_EandA + tsEandA

		<detailed>
		<enttyp>
			<enttypl>Binary File AR107A1.TS1</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>
			<enttypds>U.S. Geological Survey</enttypds>
		</enttyp>
		<attr>
			<attrlabl>Data Value</attrlabl>
			<attrdef>Binary Data Value</attrdef>
			<attrdefs>U.S. Geological Survey</attrdefs>
			<attrdomv>
				<udom>Binary Data Value</udom>
				</attrdomv>		</attr>
	</detailed>

		<detailed>
		<enttyp>
			<enttypl>Binary File AR107A2.TS1</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels

In [53]:
#listing of the BP? files
#Now finally add the processed ASCII text files to the list
bpList = glob.glob(os.path.join(mtStationPath, '*.bp?'),  recursive=True)
#bpList
for i in range(len(bpList)):
  splitBpList = bpList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitBpList[len(splitBpList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						USA-Arkansas-Buffalo_River-2017-AMT070.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCA_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AC.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCB_AD.txt
						USA-Arkansas-Buffalo_River-2017-AMT070-FCC_AE.txt
						USA-Arkansas-Buffalo_Riv

In [54]:
# Create the Entity and Attributes for the .BP? listing

#As before, we read in the list of files

for i in range(len(bpList)):
#for i in range(3):
    #if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        dfBP = pd.read_fwf(bpList[i], widths=[15,15,15,15,15], skiprows=27, parse_dates=True).rename(columns={'4.3945   1.':'Amp1', '85697    1   2':'Amp2', 'Unnamed: 2':'Amp3', 'Unnamed: 3':'Amp4', 'Unnamed: 4':'Amp5'})
#Now strip out the header rows in the data as to not screw up calculating the range
        #dfSD = dfSD[dfSD.Amp4 != '00000   20   1']
        print (dfBP)        


for i in range(len(bpList)):
    splitBpList = tsList[i].split('\\')
    bpEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Binary File ' + splitBpList[len(splitBpList) - 1] + '</enttypl>\n' \
        + '\t\t\t<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>' \
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t</enttyp>' \
        + '\n<attr>\n\t\t\t<attrlabl>Data Value</attrlabl>\n\t\t\t<attrdef>Binary Data Value</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t<attrdomv>'\
        + '\n\t\t\t\t<udom>Binary Data Value</udom>\n\t\t\t\t</attrdomv>' \
        + '\t\t</attr>\n\t</detailed>\n'
    print (bpEandA)
    
ALL_EandA = ALL_EandA +  bpEandA

               Amp1             Amp2          Amp3      Amp4          Amp5
0   26.46152775e-03              0.0 -1.621413e-02 -0.000732  3.660247e-02
1               0.0  -2.12743708e-05  1.611640e-05  0.000126 -4.652991e-06
2   11.66422483e-06              0.0 -2.843110e-04  0.000043 -1.644689e-04
3   62.20732719e-06  -1.43688475e-06  2.604866e-06  0.000015  0.000000e+00
4   36.64344858e-06  19.39807701e-05 -4.614082e-04  0.000045  1.418937e-06
5   -7.21114448e-06  78.83054831e-08 -5.324942e-06  0.000051  0.000000e+00
6       7.3242   3.   83474    1   4           NaN       NaN           NaN
7   53.16958381e-03              0.0 -2.509836e-02  0.006166  6.656381e-02
8               0.0  13.82783626e-06  2.169803e-04  0.000446 -2.535260e-04
9   10.58956726e-06              0.0 -1.027067e-03  0.000381  2.884323e-04
10  -1.61539904e-05  -2.68364885e-07  5.318863e-06  0.000027  0.000000e+00
11  23.62957838e-05  -1.60336919e-04 -1.093120e-03  0.000408 -1.993054e-06
12  -2.85674004e-06  -1.6

# Populate Metadata Template

In [55]:
#Load XML Metadata Template File and Read It
metaData = os.path.join(mtMetaDataTemplatePath, mtMetaDataTemplateName)
print ('Metadata path: ' + mtMetaDataTemplatePath + '\n')
xmlTemplateFile = open(metaData, 'r')
metaDataContent = xmlTemplateFile.readlines()
print(metaDataContent)
xmlTemplateFile.close()


Metadata path: C:\CurrentWork\DataReleases\Arkansas AMT data release

['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<origin>Hudson, M. R.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>1</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>https://doi.org/10.5066/P9CIAXC5</onlink>\n', '\t\t\t</citeinfo>\n', '\t\t</citation>\n', '\t\t<descript>\n', '\t\t\t<abstract>This dataset includes audio-magne

In [56]:
# Replace values of current metadata template with the appropriate values.  
# All of this input should have been defined when going through the steps outlined above.
lineString = ''
newMetaDataContent = metaDataContent
splitFileName = ediList[0].split('.')
myfilename = splitFileName[0] + '.xml'
xmlFile = open(myfilename,"w+")
print(myfilename)
#print(keywords.value)
for i in range(len(metaDataContent)):
    lineString = metaDataContent[i]
    if lineString.find('{title}'):
     lineString = lineString.replace('{title}', citTit + '; Station ' + drTitle)
    
    if lineString.find('{abstract}'):
     lineString = lineString.replace('{abstract}', purposeClean)
    
    if lineString.find('{purpose}'):
     lineString = lineString.replace('{purpose}', descriptionClean)
    
    if lineString.find('{BeginFileListingHere}'):
     lineString = lineString.replace('{BeginFileListingHere}', fileListing)
    
    if lineString.find('{keywords}'):
     lineString = lineString.replace('{keywords}', keywords.value)
    
    if lineString.find('{begdate}'):
     lineString = lineString.replace('{begdate}', begdate)
    
    if lineString.find('{enddate}'):
     lineString = lineString.replace('{enddate}', enddate)
    
    if lineString.find('{SiteLon}'):
     lineString = lineString.replace('{SiteLon}', sitLon)
    
    if lineString.find('{SiteLon}'):
     lineString = lineString.replace('{SiteLat}', sitLat)
    
    if lineString.find('{EandA}'):
     lineString = lineString.replace('{EandA}', ALL_EandA)
   
    # {county}

    else:
     lineString = lineString
    xmlFile.write(lineString)
    
    #print (lineString)
     
    
    
#for r in (metaDataContent):
    #newMetaDataContent = metaDataContent.replace('{title}', drTitle)
    #newMetaDataContent = metaDataContent.replace('{keywords}', keywords.value)
xmlFile.close()

print ('Creation of new metadata file is complete\n\n') 
#Load EDI File and Read It
##checkFile = open(open(myfilename, 'r')
##checkFileContent = checkFile.read()
##checkFile.close()
##print(checkFileContent)

C:\CurrentWork\DataReleases\Arkansas AMT data release\AMT070\USA-Arkansas-Buffalo_River-2017-AMT070.xml
Creation of new metadata file is complete




### Check this file to see if it is valid against the FGDC metadata standard (FGDC-STD-001-1998)

## https://mrdata.usgs.gov/validation/

In [57]:
# Show the resulting child xml metadata file example 
#for i in range(len(newMetaDataContent)):
print (newMetaDataContent)

['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<origin>Hudson, M. R.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>1</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>https://doi.org/10.5066/P9CIAXC5</onlink>\n', '\t\t\t</citeinfo>\n', '\t\t</citation>\n', '\t\t<descript>\n', '\t\t\t<abstract>This dataset includes audio-magnetotelluric (AMT) sounding data collected in August 2017 in the Buffalo