## SquamataAssemballyAMT - Jupyter notebook for batch releasing Audio Magnetotellic (AMT) data to ScienceBase

This module performs the following operations:
- Create list of data directories.
- Identify files accompanying data release.
- Create file listing for metadata XML markup.
- Identify and load MT EDI file.
- Clean up and reformat harvested values to be XML metadata complient.
- Create User Editable Keywords Listing
- Create entity and attribute XML markup.
- Poplulate metadata template
- Validate metadata; create error log; create HTML and FGDC Text versions of the metadata. (In development - use https://mrdata.usgs.gov/validation/ for validation in the interim)

Known issues needing repair:
- Fix procstep section; do we need this function to collect file information or do we plan on handeling this with boilerplate.

## Future development plans for SquamataSB

- Create all child metadata files from first example created in previous steps. 
- Batch upload files to ScienceBase.
- Batch remove files from ScienceBase. 
- Change ScienceBase parameters such as citation information, add orcid ids, add USGS CMS tags, etc. 

### Instructions
- Create a template xml format that contains boilerplate text common to all childeren in a data release.  Be sure this template contains the approriate curly bracket tags, {SquamataTagExample} used to populate the template using SquamataAMT.

### To execute a function/command select a cell and Hold-Shift + Press-Enter

**The 'r' signifies a string literal. Use for paths.**

Metadata wizard:  Advanced, Open In a jupyter Notebook?
Metadata Wizard 2.o from ScienceBase

In [224]:
# Phil Brown (pbrown@usgs.gov) 2019 Beta
# Working Python 3 Notebook used to facilitate the release of Audio Magnetotelluric (AMT) Data to ScienceBase.

In [225]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

Jupyter is working.


In [226]:
# Load required Libraries
import sys
import os
import zipfile
import csv
#import pysb
import requests
import shutil
from shutil import copyfile
import zipfile
import datetime
import glob
from lxml import etree
import json
import pickle
import shutil
import fileinput
import json
import pandas as pd
import numpy as np
from IPython.core.display import display
from IPython.core.display import HTML
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode
import re
from ipywidgets import *
from IPython.display import display
from IPython.html.widgets import widgets
import datetime
import dateutil.parser
import time
from IPython.display import Javascript

# 1) Step One - Set Directory Paths
## Please set directory paths below
### Directory paths include
- Data Path: This is the path to the data, data structure should have a directory for each station
- Template Path: The path to the XML metadata template being used for the data.  This template should already include all information common to all child metadata files e.g. originators, larger work citation, etc.

In [227]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
mtDataPath = r"C:\CurrentWork\DataReleases\Arkansas AMT data release" #The 'r' signifies a string literal. Use for paths.
mtMetaDataTemplatePath = r"C:\CurrentWork\DataReleases\Arkansas AMT data release"
mtMetaDataTemplateName = "MT-MetaData_TEMPLATE.xml"

In [223]:
#Check Paths for the fun of it
print ('The MT Data Path is: ' + '"' + mtDataPath + '"')
mtMetaDataTemplatePath + "\\" + mtMetaDataTemplateName

The MT Data Path is: "C:\CurrentWork\DataReleases\Arkansas AMT data release"


'C:\\CurrentWork\\DataReleases\\Arkansas AMT data release\\MT-MetaData_TEMPLATE.xml'

## Now, let's explore our data. 
- What files do we have? 
- What files do we import values from?

In [213]:
#Explore data files and directory structure hosted below the provided provided parent data directory

In [207]:
#Produce directory listing of station (SB Object Children)
#Either set up the root directory with station subdirectories only or delete non-station directories from the list array
#mtDataDirList = os.walk(mtDataPath)
#mtDataDirList = [entry.path for entry in os.scandir(mtDataPath) if entry.is_dir()]
mtDataDirList = next(os.walk(mtDataPath))[1]
mtDataDirList

['AMT050', 'AMT070', 'AMT090', 'AMT115', 'AMT140', 'AMT170']

# We can select an indivdual station to test or run code for all stations in batch mode.  

In [200]:
#!!!!!!!!!Run this code and select A single station !!!!!!!!!
#!!!!!!!!!Skip the loop cell below and run cells sequentually !!!!!!!!!!
#!!!!!!!!!Use Shift-Enter to then execute each cell one by one!!!!!!!!!! 
mtStationPath = mtDataPath + '\\' + mtDataDirList[0]
mtStationPath

'C:\\CurrentWork\\DataReleases\\Arkansas AMT data release\\AMT050'

# -----------------------------------------------------------------------------------------------------------

In [264]:
#!!!!!!!!!!!!!!!!!!  Run this code to loop through ALL Data Sets !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#!!!!!!!This code will produce metadata files for ALL stations in batch mode!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#!!!!!!!!!!!!!!!!!!  Skip running this code if processing a single station !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
for i in range (0,len(mtDataDirList)):
    mtStationPath = mtDataPath + '\\' + mtDataDirList[i]
    from IPython.display import Javascript
    display(Javascript('IPython.notebook.execute_cells_below()'))

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

# -----------------------------------------------------------------------------------------------------------

In [265]:

#Look for EDI file to load
ediList = glob.glob(os.path.join(mtStationPath, '**/*MT*.edi'),  recursive=True)
ediPath = ediList[0]
#ediList
print ('EDI File Path:\n' + ediPath)
ediPathArray = ediPath.split('\\')
ediFile = str(ediPathArray[len(ediPathArray)-1])
print ('EDI File:\n' + ediFile)    
#ediPathArray

EDI File Path:
C:\CurrentWork\DataReleases\Arkansas AMT data release\AMT170\USA-Arkansas-Buffalo_River-2017-AMT170.edi
EDI File:
USA-Arkansas-Buffalo_River-2017-AMT170.edi


## Harvest from the MT EDI file. 
### Parameters include:
- ProductId=USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT01.edi
- ExternalUrl Url=https://doi.org/10.5066/F72F7MQ7
- Attachment Filename=https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf
- Survey Purpose Description: 
- Data Description:
- Citation Title=Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
- Citation Authors=Chad E. Ailes, Brian D. Rodriguez
- Citation Year=2011
- YearCollected=2009
- Country=USA                                  
- Ellipsoid=Clarke 1866                                                          
- Location datum=NAD27 CONUS                                                     
- SITE LATITUDE=36.752985000                                                     
- SITE LONGITUDE=-105.560966167                                                  
- Elevation units="meters"=2608.00                                                                     
- Start=2009-07-21T19:52:03 UTC/GMT
- End=2009-07-21T20:34:20 UTC/GMT
- ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 
- Entities and Attributes:
    - FREQUENCIES
    - IMPEDANCE ROTATION ANGLES
    - IMPEDANCES
    - TIPPER PARAMETERS
    - COMPUTED PARAMETERS


## Lets now import and index values from the EDI Files
- We need these values for the metadata template.  
- We also want to run stats on some of these values for the entity and attributes section

In [266]:
    #Load EDI File and Read It
    ediFile = open(ediPath, 'r')
    ediContent = ediFile.read()
    ediFile.close()
    print(ediContent)


>HEAD                                                                           
                                                                                
  DATAID="Buffalo Natl River"                                                   
  ACQBY=USGS                                                                    
  ACQDATE=2017-08-25
  STATE=Arkansas                                                                
  COUNTY=Boone                                                                  
  UNITS=M                                                                       
  STDVERS=1.0                                                                   
  PROGVERS=GEOTOOLS_2.3                                                         
  PROGDATE=09/16/94                                                             
                                                                                
>INFO   MAXLINES=1000                                                           
       

In [267]:
#Now assign values to the SB MetaDataWizard Template unknowns
list_ = ediContent.splitlines()
list_length = len (list_)

# there are probally easier ways to loop through the below but I like having it all hard coded upfront
# it's easire to track an change for me
# Use the examples provided below to extract additional parameters
# Not that all variables being collected are not necessarily used in populating the template.
# Note that values can be hardcoded into the metadata xml template and/or harvested from the edi file

for X in list_:
  if "ProductId" in X:
    productArray = X.split('=')
    productIdArray = productArray[1].split('.')
    productId = productIdArray[0]
    # We may want to reformat this are parse out this name further for use with a root name based on the Data Release Title?
    productId = productId.replace("-", " ")
    productId = productId.replace("_", " ")
    drTitle = productId
    print ('Child Title: ' + productId)
  if "ExternalUrl Url" in X:
    externalURLArray = X.split('=')
    externalURL = externalURLArray[1]
    print ('<onlink>: ' + externalURL)
  if "STATE" in X:
    stateArray = X.split('=')
    state = stateArray[1].replace('"', "") #remove quotes around state
    print ('State: ' + state)
  if "COUNTY" in X:
    countyArray = X.split('=')
    county = countyArray[1]
    print ('County: ' + county)
  if "Start" in X:
    startArray = X.split('=')
    start = startArray[1]
    print ('Start: ' + start)
  if "End" in X:
    endArray = X.split('=')
    end = endArray[1]
    print ('End: ' + end)
  if "Attachment Filename" in X and "http" in X:
    lgwrklinkArray = X.split('=')
    lgwrklink = lgwrklinkArray[1]
    print ('Attachment Filename Link: ' + lgwrklink)
  if "Citation Title" in X:
    citTitArray = X.split('=')
    citTit = citTitArray[1]
    print ('Citation Title: ' + citTit)
  if "Citation Authors" in X:
    citNamesArray = X.split('=')
    citAuthorsArray = citNamesArray[1].split(',')
    for author in citAuthorsArray:
     author = author.strip()
     print ('Author: '+ author)
  if "Citation Year" in X:
    citYearArray = X.split('=')
    citYear = citYearArray[1]
    print ('Citation Year: ' + citYear)
  if "YearCollected" in X:
    yearColArray = X.split('=')
    yearCol = yearColArray[1]
    print ('Year Collected: ' + yearCol)
  if "Ellipsoid" in X:
    ellipsoidArray = X.split('=')
    ellipsoid = ellipsoidArray[1]
    print ('Ellipsoid: ' + ellipsoid)
  if "Location datum" in X:
    locDatumArray = X.split('=')
    locDatum = locDatumArray[1]
    print ('Local datum: ' + locDatum)
  if "SITE LATITUDE" in X:
    sitLatArray = X.split('=')
    sitLat = sitLatArray[1] # !!! probally need to reformat this to have only 6 significant digits Also need to trim extra spaces !!!
    print ('Site latitude: ' + sitLat)
  if "SITE LONGITUDE" in X:
    sitLonArray = X.split('=')
    sitLon = sitLonArray[1] # !!! probally need to reformat this to have only 6 significant digits Also need to trim extra spaces !!!
    print ('Site longitude: ' + sitLon)
  if "Elevation units" in X:
    elevationStringArray = X.split('=')
    siteElevation = elevationStringArray[2] 
    print ('Site Elevation: ' + siteElevation)
    elevationUnits = elevationStringArray[1].replace('"', "")
    print ('Elevation Units: ' + elevationUnits)
    
# Code below returns values that occupy more than one line
    
for i in range(list_length):
 value = list_[i] 
 if value.replace(" ", "") == 'SurveyPurposeDescription:':
   startIndPurpose = i + 1
   #print ('startIndPurpose: ' + str(startIndPurpose))
 if value.replace(" ", "") == 'DataDescription:':
   endIndPurpose = i - 1
   #print ('endIndPurpose: ' + str(endIndPurpose))
purpose = list_[startIndPurpose]
for j in range(startIndPurpose + 1,endIndPurpose): 
    purpose = purpose + list_[j]
    purposeClean = re.sub(' +', ' ',purpose)
print ('\nAbstract:\n\t' + purposeClean)

for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == 'DataDescription:':
   startIndDescription = k + 1
   #print ('startIndDescription: ' + str(startIndDescription))
 if value.replace(" ", "") == 'FILECREATOR:':
   endIndDescription = k - 9
   #print ('endIndDescription: ' + str(endIndDescription))
description = list_[startIndDescription]
for l in range(startIndDescription + 1,endIndDescription): 
    description = description + list_[l]
    descriptionClean = re.sub(' +', ' ',description)
print ('\nPurpose:\n\t' + descriptionClean)
    

State: Arkansas                                                                
County: Boone                                                                  
Child Title: USA Arkansas Buffalo River 2017 AMT170
<onlink>: https://doi.org/10.5066/P9CIAXC5 
Citation Title: Audiomagnetotelluric data, Buffalo River watershed, Arkansas, 2017
Author: Brian D. Rodriguez and Mark R. Hudson
Citation Year: 2018                                                             
Year Collected: 2017
Ellipsoid: Clarke 1866                                                          
Local datum: NAD27 CONUS                                                     
Site latitude: 36.12309                                                         
Site longitude: -93.27205                                                       
Site Elevation: 634.66                                                
Elevation Units: meters
Start: 2017-08-25T18:22:52 UTC/GMT
End: 2017-08-25T22:38:48 UTC/GMT  
Start: 2017-08-25T18:22:52 

In [268]:
#Reformat product ID to more correctly reflect child (station) name
arrChildID = productId.split(' ')
strChildID = arrChildID[-1]
strChildID


'AMT170'

In [269]:
# Now let's format the start time and end time to be what the XML file wants for <begdate> and <enddate>

begdateArr = start.split(' ')
begdate_str = begdateArr[0]
begdate_obj = dateutil.parser.parse(begdate_str)
begdate = begdate_obj.strftime('%Y%m%d')
print('<begdate> ', begdate) 

enddateArr = end.split(' ')
enddate_str = enddateArr[0]
enddate_obj = dateutil.parser.parse(enddate_str)
enddate = enddate_obj.strftime('%Y%m%d')
print('<enddate> ', enddate) 


<begdate>  20170825
<enddate>  20170825


In [270]:
#Now we reformat the lat  and longitude to 6 sig figs as well as trim of any extra spaces
#Brian seems to be stripping out the sig figs now so there is no need.  
#Also may whant to round up instead of just stripping values?

sitLat = sitLat.strip()
##sitLat = sitLat[:-3] 
sitLon = sitLon.strip()
##sitLon = sitLon[:-3]

print ('Site latitude: ' + sitLat)
print ('Site longitude: ' + sitLon)

Site latitude: 36.12309
Site longitude: -93.27205


In [271]:
# Now Reformat county by trimming the extra spaces
county = county.strip()
print ('County: ' + county)

County: Boone


In [272]:
## Create editable keywords example.  
## Example text is created after running this cell
## This text is displayed by running "display(keywords) below
keywords = widgets.Textarea(
    value='\t\t<keywords>\n\t\t\t<theme>\n\t\t\t\t<themekt>ISO 19115 Topic Category</themekt>' \
    + '\n\t\t\t\t<themekey>biota</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>None</themekt>' \
    + '\n\t\t\t\t<themekey>impedance</themekey>\n\t\t\t\t<themekey>tipper</themekey>' \
    + '\n\t\t\t\t<themekey>apparent resistivity</themekey>\n\t\t\t\t<themekey>impedance phase</themekey>' \
    + '\n\t\t\t\t<themekey>impedance strike</themekey>\n\t\t\t\t<themekey>MT</themekey>' \
    + '\n\t\t\t\t<themekey>audiomagnetotelluric</themekey>\n\t\t\t\t<themekey>magnetotelluric</themekey>' \
    + '\n\t\t\t\t<themekey>AMT</themekey>\n\t\t\t\t<themekey>sounding</themekey>' \
    + '\n\t\t\t\t<themekey>Geology, Geophysics, and Geochemistry Science Center</themekey>' \
    + '\n\t\t\t\t<themekey>GGGSC</themekey>\n\t\t\t\t<themekey>Mineral Resources Program</themekey>' \
    + '\n\t\t\t\t<themekey>MRP</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>USGS Thesaurus</themekt>' \
    + '\n\t\t\t\t<themekey>Magnetic field (earth)</themekey>\n\t\t\t\t<themekey>Geophysics</themekey>' \
    + '\n\t\t\t\t<themekey>GPS measurement</themekey>\n\t\t\t\t<themekey>Electromagnetic surveying</themekey>' \
    + '\n\t\t\t\t<themekey>Magnetic surveying</themekey>\n\t\t\t</theme>\n\t\t\t<place>' \
    + '\n\t\t\t\t<placekt>USGS Geographic Names Information System (GNIS), https://geonames.usgs.gov</placekt>' \
    + '\n\t\t\t\t<placekey>Colorado</placekey>\n\t\t\t\t<placekey>Silverton</placekey>' \
    + '\n\t\t\t\t<placekey>' + county + ' County</placekey>\n\t\t\t</place>\n\t\t</keywords>',
    placeholder='Type something',
    #description='String:',
    layout=Layout(width='100%', height='666px'),
    disabled=False
)
print ('Keywords list created.')

Keywords list created.


### Change the text in the textbox below to relflect what should be included as the key words for all child items

Note that changing the text below at any time creates a keywords section of the metadata seen EXACTLY as it is shown below

In [273]:
# Run this cell for key word text to edit.  
# Edit the text in place.  
# When complete move on to the next step

display(keywords)

### The module below collects the file date property for use in creating the process description dates

In [274]:
# module to get file process dates for process descriptions
# Visit https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior for time format standards

# Set {CollectionDate} to the station end date
strCollectionDate = enddate
print ("{CollectionDate} " + strCombinationDate)

# {CombinationDate} Get the Time stamp for the AVG file
# First we need to get the file path
avgList = glob.glob(os.path.join(mtStationPath, '*.AVG'),  recursive=True)
#print (avgList)
avgTime = time.ctime(os.path.getctime(avgList[0]))
print ("avgTime: " + avgTime)
avgTime_obj = datetime.datetime.strptime(avgTime,'%a %b %d %H:%M:%S %Y')
strCombinationDate = datetime.datetime.strftime(avgTime_obj,'%Y%m%d')
print ("{CombinationDate} " + strCombinationDate)


# {ConversionDate} is the date of the older edi file, this is the second file in the edi list
# No that isn't it, I filtered to get the newer edi file, oh I'll just find the file 
# in an array of edi files that isn't the other one

ediAllList = glob.glob(os.path.join(mtStationPath, '*.edi'),  recursive=True)
#print (ediAllList)
for i in range (0,len(ediAllList)):
  if ediAllList[i] != ediPath:
    OriginalEDIPath = ediAllList[i]
os.path.getctime(OriginalEDIPath)
oediTime = time.ctime(os.path.getctime(OriginalEDIPath))
print ("oediTime: " + oediTime)
oediTime_obj = datetime.datetime.strptime(oediTime,'%a %b %d %H:%M:%S %Y')
strConversionDate = datetime.datetime.strftime(oediTime_obj,'%Y%m%d')
print ("{ConversionDate} " + strConversionDate)

# {RotationDate} Get the Time stamp for the AVG file
# First we need to get the file path
dmpList = glob.glob(os.path.join(mtStationPath, '*.dmp'),  recursive=True)
#print (dmpList)
dmpTime = time.ctime(os.path.getctime(dmpList[0]))
print ("dmpTime: " + dmpTime)
dmpTime_obj = datetime.datetime.strptime(dmpTime,'%a %b %d %H:%M:%S %Y')
strRotationDate = datetime.datetime.strftime(dmpTime_obj,'%Y%m%d')
print ("{RotationDate} " + strRotationDate)

#{HarvestDate} = final edi file 
os.path.getctime(ediPath)
filetime = time.ctime(os.path.getctime(ediPath))
print ("Harvest Time: " + filetime)
filetime_obj = datetime.datetime.strptime(filetime,'%a %b %d %H:%M:%S %Y')
strHarvestDate = datetime.datetime.strftime(filetime_obj,'%Y%m%d')
print ("{HarvestDate} " + strHarvestDate)

{CollectionDate} 20181221
avgTime: Fri Dec 21 10:41:16 2018
{CombinationDate} 20181221
oediTime: Fri Dec 21 10:41:16 2018
{ConversionDate} 20181221
dmpTime: Fri Dec 21 10:41:16 2018
{RotationDate} 20181221
Harvest Time: Fri Dec 21 10:41:14 2018
{HarvestDate} 20181221


Entity and Attribute Values for the EDI file.  List !****FREQUENCIES****!,!****IMPEDANCE ROTATION ANGLES****!,!****IMPEDANCES****!,!****COMPUTED PARAMETERS****!

Here we load the frequencies
>!****FREQUENCIES****!

In [275]:
# Import entity and attributes - !****FREQUENCIES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):

 value = list_[k] 
 if value.replace(" ", "") == '>!****FREQUENCIES****!':
   startIndFrequencies = k + 2
   print ('startIndFrequencies: ' + str(startIndFrequencies))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCEROTATIONANGLES****!':
   endIndFrequencies = k - 1
   print ('endIndFrequencies: ' + str(endIndFrequencies))

frequencyData = []
fdata = []
fdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
frequencyDF = pd.DataFrame(fdata)
for j in range(startIndFrequencies,endIndFrequencies):
    fdataTemp = list_[j]
    fdataTemp = re.sub(' +', ' ',fdataTemp)
    fdataTemp = fdataTemp.split(" ")
    del fdataTemp[0]
    fdata = fdata + fdataTemp
    
print (fdata)  
fdata = np.array(fdata).astype(np.float) #convert String to floats
frequencyDF = pd.DataFrame(fdata,columns=['Frequencies'])
frequencyDF


startIndFrequencies: 276
endIndFrequencies: 283
['2.33700000E+04', '1.52900000E+04', '1.15900000E+04', '6.85000000E+03', '5.21000000E+03', '3.55000000E+03', '2.73000000E+03', '2.42000000E+03', '1.87000000E+03', '1.17000000E+03', '9.60000000E+02', '8.85000000E+02', '7.20000000E+02', '5.80000000E+02', '4.60000000E+02', '3.40000000E+02', '2.70000000E+02', '2.10000000E+02', '1.72399994E+02', '1.50000000E+02', '1.22099998E+02', '1.00000000E+02', '8.59400024E+01', '7.90000000E+01', '6.00600014E+01', '4.15000000E+01', '2.83199997E+01', '1.90400009E+01', '1.22100000E+01', '7.32399988E+00', '4.39400005E+00']


Unnamed: 0,Frequencies
0,23370.0
1,15290.0
2,11590.0
3,6850.0
4,5210.0
5,3550.0
6,2730.0
7,2420.0
8,1870.0
9,1170.0


In [276]:
# Now lets get the stats of the frequency data
#Make Array of Max Vallues
frequencyMax = frequencyDF[('Frequencies')].max()
print ('Max. Frequency: ' + str(frequencyMax))
frequencyMin = frequencyDF[('Frequencies')].min()
print ('Min. Frequency: ' + str(frequencyMin))

Max. Frequency: 23370.0
Min. Frequency: 4.39400005


Here we load the Impedance Rotation Angles
>!****IMPEDANCE ROTATION ANGLES****!

In [277]:
# Import entity and attributes - !****IMPEDANCE ROTATION ANGLES****! plan to break some of these individual chunks into objects/functions
# Get Range of IMPEDANCE ROTATION ANGLES in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCEROTATIONANGLES****!':
   startIndROT = k + 3
   print ('startIndROT: ' + str(startIndROT))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCES****!':
   endIndROT = k - 1
   print ('endIndROT: ' + str(endIndROT))

rdata = []
rdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
rotationDF = pd.DataFrame(rdata)
for j in range(startIndROT,endIndROT):
    rdataTemp = list_[j]
    rdataTemp = re.sub(' +', ' ',rdataTemp)
    rdataTemp = rdataTemp.split(" ")
    del rdataTemp[0]
    rdata = rdata + rdataTemp
    
print (rdata)  
rdata = np.array(rdata).astype(np.float) #convert String to floats
rotationDF = pd.DataFrame(rdata,columns=['ZROT'])
rotationDF

startIndROT: 287
endIndROT: 293
['0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00']


Unnamed: 0,ZROT
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
5,0.0
6,0.0
7,0.0
8,0.0
9,0.0


In [278]:
# Now lets get the stats of the rotation data

#Make Array of Max Values
rotationMax = rotationDF[('ZROT')].max()
print ('Max. ZROT: ' + str(frequencyMax))

#Make Array of Min Values
rotationMin = rotationDF[('ZROT')].min()
print ('Min. ZROT: ' + str(frequencyMin))

Max. ZROT: 23370.0
Min. ZROT: 4.39400005


Here we load the impedances
>!****IMPEDANCES****!

In [279]:
# Import entity and attributes - !****IMPEDANCES****! plan to break some of these individual chunks into objects/functions
# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCES****!':
   startIndImpedances = k + 1
   print ('startIndImpedances: ' + str(startIndImpedances))
 
 if value.replace(" ", "") ==  '>!****TIPPERPARAMETERS****!':
   endIndImpedances = k - 1
   print ('endIndImpedances: ' + str(endIndImpedances))

#Construct Array of Channel Headers   
count = 0
impedanceLabel = []
impedanceData = []
data = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
impedanceDF = pd.DataFrame(data)
for l in range(startIndImpedances,endIndImpedances): 
    if list_[l][0] == '>':
     temp = list_[l].split(" ", 1)
     #print (temp)
     impedanceLabel.append((temp[0].split(">"))[1])
     dataTemp = list_[l+1]
     for j in range(l+2,l+8):
      dataTemp = dataTemp + list_[j]
      dataTemp = re.sub(' +', ' ',dataTemp)
     data = dataTemp.split(" ")
     del data[0] # need to check for empty strings and delete these from the array of the string can't be converted to a float
     del data[len(data)-1] # need to check for empty strings and delete
     #print (data)
     data = np.array(data).astype(np.float) #convert String to floats
     se = pd.Series(data)
     print ((temp[0].split(">"))[1])   
     impedanceDF[((temp[0].split(">"))[1])] = se.values
    
    count = count + 1

#impedanceDF = pd.DataFrame(data, columns=(impedanceLabel))
impedanceDF
#data
#se 

startIndImpedances: 295
endIndImpedances: 402
ZXXR
ZXXI
ZXX.VAR
ZXYR
ZXYI
ZXY.VAR
ZYXR
ZYXI
ZYX.VAR
ZYYR
ZYYI
ZYY.VAR


Unnamed: 0,ZXXR,ZXXI,ZXX.VAR,ZXYR,ZXYI,ZXY.VAR,ZYXR,ZYXI,ZYX.VAR,ZYYR,ZYYI,ZYY.VAR
0,306.417297,153.6651,20182.2754,42.056553,23.067776,2207.79932,458.28772,263.788025,125620.938,124.106056,63.9813,13742.0488
1,-121.648033,1214.50964,277.478912,456.200684,-380.605255,617.505005,-910.950073,-2847.14136,776.005859,153.21582,816.279358,1726.93298
2,-194.413559,1003.82898,220.478546,492.204712,-324.573273,410.787109,-920.670837,-2231.74927,183.322174,24.663624,681.263,341.558777
3,-605.233093,1005.24805,355.032196,716.199707,-2.126833,191.005203,-750.646606,-2669.88989,379.128204,-369.631531,547.602661,203.968719
4,-865.367371,181.611008,1661.30859,895.711609,546.919495,1687.07324,-833.327698,-1716.62915,457.183594,-242.709686,381.736298,464.273926
5,-209.054016,-578.315247,1080.15234,378.869293,869.398865,1047.16187,-388.380219,-1503.98962,296.772919,-272.456329,491.295166,287.70871
6,73.385651,-284.886017,117.486847,111.841835,563.580872,158.889893,-199.908508,-1106.48889,37.973393,-274.891296,218.525635,51.355434
7,81.734467,-245.261032,63.382687,202.891602,578.881714,89.1838,-317.923737,-1074.58765,39.3088,-192.160568,377.716095,55.310184
8,116.643867,-105.666679,12.097741,119.268715,411.3078,13.46182,-207.066391,-857.970947,9.948493,-167.331741,199.614105,11.070232
9,108.668556,42.936996,261.296631,138.533844,264.574036,220.732117,-211.893753,-540.16095,214.133072,-160.275284,-237.92215,180.890366


In [280]:
# Now lets get the stats of the impedance data

# Make Array of Max Values
impedanceMax = []
for i in range (0,len(impedanceLabel)):
    impedanceMax.append(impedanceDF[(impedanceLabel[i])].max())
print ('Impedance Max: ' + str(impedanceMax))

# Make Array of Min Values
impedanceMin = []
for i in range (0,len(impedanceLabel)):
    impedanceMin.append(impedanceDF[(impedanceLabel[i])].min())
print ('Impedance Min: ' + str(impedanceMin))

Impedance Max: [306.417297, 1214.50964, 20182.2754, 895.711609, 869.398865, 4891.87061, 458.28772, 263.788025, 125620.938, 153.21582, 816.279358, 13742.0488]
Impedance Min: [-865.367371, -578.315247, 12.0977411, 40.4731903, -380.605255, 9.24014187, -920.670837, -2847.14136, 9.948493, -369.631531, -237.92215, 3.48604012]


Here we load the tipper parameters
>!****TIPPER PARAMETERS****!

In [281]:
# Import entity and attributes - !****TIPPER PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column

# Get Range of TIPPER PARAMETERS in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****TIPPERPARAMETERS****!':
   startIndTipper = k + 1
   print ('startIndTipper: ' + str(startIndTipper))
 
 if value.replace(" ", "") ==  '>!****COMPUTEDPARAMETERS****!':
   endIndTipper = k - 1
   print ('endIndTipper: ' + str(endIndTipper))

#Construct Array of Channel Headers   
count = 0
tipperLabel = []
tipperData = []
tdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
tipperDF = pd.DataFrame(tdata)
for l in range(startIndTipper,endIndTipper): 
    if list_[l][0] == '>':
     ttemp = list_[l].split(" ", 1)
     #print (ttemp)
     tipperLabel.append((ttemp[0].split(">"))[1])
     tdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      tdataTemp = tdataTemp + list_[j]
      tdataTemp = re.sub(' +', ' ',tdataTemp)
      tdata = tdataTemp.split(" ")
     #print (tdata)
     del tdata[0]
     del tdata[len(tdata)-1] # need to check for empty strings and delete
     tdata = np.array(tdata).astype(np.float) #convert String to floats
     te = pd.Series(tdata)
     print ((ttemp[0].split(">"))[1])   
     tipperDF[((ttemp[0].split(">"))[1])] = te.values
    
    count = count + 1

#tipperDF = pd.DataFrame(tdata, columns=(tipperLabel))
tipperDF
#tdata
#te 

startIndTipper: 404
endIndTipper: 457
TXR.EXP
TXI.EXP
TXVAR.EXP
TYR.EXP
TYI.EXP
TYVAR.EXP


Unnamed: 0,TXR.EXP,TXI.EXP,TXVAR.EXP,TYR.EXP,TYI.EXP,TYVAR.EXP
0,-0.069348,-0.002339,0.000728,0.204662,0.010997,8e-05
1,0.264808,-0.20174,0.00018,-0.205061,0.02329,0.0004
2,0.421451,-0.144033,7e-06,-0.252255,-0.074843,1.3e-05
3,0.448583,0.398693,0.001469,-0.014735,-0.291666,0.00079
4,0.217084,0.272175,0.000761,0.160449,-0.208331,0.000773
5,0.251809,0.099355,4.4e-05,-0.070953,0.003048,4.3e-05
6,0.239689,0.124724,4.8e-05,-0.103358,-0.031085,6.5e-05
7,0.269647,0.159076,7.6e-05,-0.192334,-0.025541,0.000107
8,0.225318,0.172532,0.000175,-0.171354,-0.04041,0.000194
9,-0.033002,0.195693,0.002129,0.239712,-0.047049,0.001798


In [282]:
# Now lets get the stats of the tipper data

# Make Array of Max Values
tipperMax = []
for i in range (0,len(tipperLabel)):
    tipperMax.append(tipperDF[(tipperLabel[i])].max())
print ('Tipper Max: ' + str(tipperMax))    

# Make Array of Min Values
tipperMin = []
for i in range (0,len(tipperLabel)):
    tipperMin.append(tipperDF[(tipperLabel[i])].min())
print ('Tipper Min: ' + str(tipperMin))

Tipper Max: [0.448583275, 0.91566056, 15.8525887, 0.912206829, 0.248657346, 6.89921093]
Tipper Min: [-0.604699671, -0.201740324, 7.20155958e-06, -0.252254784, -0.774622023, 1.34176671e-05]


Here we load the computed parameters
>!****COMPUTED PARAMETERS****!

In [283]:
# Import entity and attributes - !****COMPUTED PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of COMPUTED PARAMETERS in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****COMPUTEDPARAMETERS****!':
   startIndPar = k + 1
   print ('startIndPar: ' + str(startIndPar))
 
 if value.replace(" ", "") ==  '>END':
   endIndPar = k - 1
   print ('endIndPar: ' + str(endIndPar))

#Construct Array of Channel Headers   
count = 0
parLabel = []
parData = []
pdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
parDF = pd.DataFrame(pdata)
for l in range(startIndPar,endIndPar): 
    if list_[l][0] == '>':
     ptemp = list_[l].split(" ", 1)
     #print (ptemp)
     parLabel.append((ptemp[0].split(">"))[1])
     pdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      pdataTemp = pdataTemp + list_[j]
      pdataTemp = re.sub(' +', ' ',pdataTemp)
      pdata = pdataTemp.split(" ")
     #print (pdata)
     del pdata[0]
     del pdata[len(pdata)-1] # need to check for empty strings and delete
     pdata = np.array(pdata).astype(np.float) #convert String to floats
     pe = pd.Series(pdata)
     print ((ptemp[0].split(">"))[1])   
     parDF[((ptemp[0].split(">"))[1])] = te.values
    
    count = count + 1

parDF
#pdata
#pe 

startIndPar: 459
endIndPar: 818
RHOROT
RHOXX
RHOXX.ERR
RHOXY
RHOXY.ERR
RHOYX
RHOYX.ERR
RHOYY
RHOYY.ERR
PHSXX
PHSXX.ERR
PHSXY
PHSXY.ERR
PHSYX
PHSYX.ERR
PHSYY
PHSYY.ERR
TIPMAG
TIPMAG.ERR
TIPPHS
TIPPHS.ERR
ZSTRIKE
ZSKEW
TSTRIKE
COH
COH
COH
COH
EPREDCOH
EPREDCOH
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE


Unnamed: 0,RHOROT,RHOXX,RHOXX.ERR,RHOXY,RHOXY.ERR,RHOYX,RHOYX.ERR,RHOYY,RHOYY.ERR,PHSXX,...,TIPMAG.ERR,TIPPHS,TIPPHS.ERR,ZSTRIKE,ZSKEW,TSTRIKE,COH,EPREDCOH,SIGAMP,SIGNOISE
0,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,...,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05,8e-05
1,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,...,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004,0.0004
2,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,...,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05,1.3e-05
3,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,...,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079,0.00079
4,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,...,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773,0.000773
5,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,...,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05,4.3e-05
6,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,...,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05,6.5e-05
7,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,...,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107,0.000107
8,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,...,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194,0.000194
9,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,...,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798,0.001798


In [284]:
# Now lets get the stats of the computed parameters

# Make Array of Max Values
parMax = []
for i in range (0,len(parLabel)):
    parMax.append(parDF[(parLabel[i])].max())
print ('Computed Pararmeters Max: ' + str(parMax))    

# Make Array of Min Values
parMin = []
for i in range (0,len(parLabel)):
    parMin.append(parDF[(parLabel[i])].min())
print ('Computed Pararmeters Min: ' + str(parMin))

Computed Pararmeters Max: [6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093, 6.89921093]
Computed Pararmeters Min: [1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.34176671e-05, 1.

## Now lets get the range of values from the RSP values

## Now the raw Binary File Listing - this can be T files or W files
We will need to figure out the best way of filtering on thise - may need to build array and then delete AVG, dmp and edi file.

These are listed in the edi file as well but they are not all there.  

    ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 

- Which files need to be included in the data release?
- What is the best way to get this listing?

In [285]:
#First Get the list of RSP files
rspList = glob.glob(os.path.join(mtStationPath, '*.RSP'),  recursive=True)
#rspList
ediFile = str(ediPathArray[len(ediPathArray)-1])
fileListing = ''
fileListing = fileListing + '\t\t\t\t\t\t' + ediFile + '\n' #start the file listing with the main EDI file
rspFileListing = []
for i in range(len(rspList)):
    splitRspList = rspList[i].split('\\')
    fileListing = fileListing + '\t\t\t\t\t\t' + splitRspList[len(splitRspList) - 1] + '\n'
    rspFileListing.append(splitRspList[len(splitRspList) - 1])
print (fileListing)


						USA-Arkansas-Buffalo_River-2017-AMT170.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP



In [286]:
#Now finally add the processed ASCII text files to the list
txtList = glob.glob(os.path.join(mtStationPath, '*.txt'),  recursive=True)
txtListFormatted = []
#txtList
print ('Text File Only Listing:\n')
for i in range(len(txtList)):
  splitTxtList = txtList[i].split('\\')
  txtListFormatted.append(splitTxtList[len(splitTxtList) - 1])
  #print (txtListFormatted[i])  
  fileListing = fileListing + '\t\t\t\t\t\t' + splitTxtList[len(splitTxtList) - 1] + '\n'

#print ('File Listing:\n' + fileListing)
#Remove readme file from list.  May need to search an remove if it does not come up first in alphabetical order?'
del txtListFormatted[0]
print (txtListFormatted)

Text File Only Listing:

['USA-Arkansas-Buffalo_River-2017-AMT170-FC6_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC6_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC6_03.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC7_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC7_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC8_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC8_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC9_01.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC9_02.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FC9_03.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCA_AA.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCB_AB.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BA.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BB.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCC_CA.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCD_CB.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCE_CC.txt', 'USA-Arkansas-Buffalo_River-2017-AMT170-FCE_DA.txt', 'USA-Arkansas-Buffal

In [287]:
# Create the Entity and Attributes for the .txt? listing
txtEandA = ''
#As before, we read in the list of files

for i in range(len(txtListFormatted)):
    if txtListFormatted[i].find('-BP') != -1:
        txtEandA = txtEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Text File ' + txtListFormatted[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Header file in ASCII text format for raw cross power files</enttypd>' \
            + '\n\t\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Header Information</attrlabl>\n\t\t\t\t<attrdef>Header description and settings for cross power binary content</attrdef>\n\t\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>Header description and settings for cross power binary content</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n' 
    if txtListFormatted[i].find('-FC') != -1:
        txtEandA = txtEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Text File ' + txtListFormatted[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Header file in ASCII text format for raw fourier coefficient files</enttypd>' \
            + '\n\t\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Header Information</attrlabl>\n\t\t\t\t<attrdef>Header description and settings for fourier coefficient binary content</attrdef>\n\t\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>Header description and settings for fourier coefficient binary content</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
    if txtListFormatted[i].find('-SD') != -1:
        txtEandA = txtEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Text File ' + txtListFormatted[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Header file in ASCII text format for sychronous detection cross power files</enttypd>' \
            + '\n\t\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Header Information</attrlabl>\n\t\t\t\t<attrdef>Header description and settings for cross power binary content</attrdef>\n\t\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>Header description and settings for cross power binary content</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
            
    if txtListFormatted[i].find('-TS') != -1:
        txtEandA = txtEandA + '\t\t<detailed>\n\t\t\t<enttyp>\n\t\t\t\t<enttypl>Text File ' + txtListFormatted[i] + '</enttypl>\n' \
            + '\t\t\t\t<enttypd>Header file in ASCII text format for binary time series files</enttypd>' \
            + '\n\t\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t\t</enttyp>' \
            + '\n\t\t\t<attr>\n\t\t\t\t<attrlabl>Header Information</attrlabl>\n\t\t\t\t<attrdef>Header description and settings for time series binary content</attrdef>\n\t\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t\t<attrdomv>'\
            + '\n\t\t\t\t\t<udom>Header description and settings for time series binary content</udom>\n\t\t\t\t</attrdomv>' \
            + '\n\t\t\t</attr>\n\t\t</detailed>\n'
    
#Add the final .txt listing for the readme file.  
#Be sure to comment this out if there is no readme.txt file included with the data release

readmeEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Text File readme.txt</enttypl>'\
        + '\n\t\t\t<enttypd>Read Me file describing the naming format of the EDI files and that they may need to be renamed to be imported into certain software packages.</enttypd>'\
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>'\
        + '\n\t\t</enttyp>\n\t\t</detailed>\n'
    
txtEandA = txtEandA + readmeEandA
print (txtEandA)
ALL_EandA = '<eainfo>\n'   
ALL_EandA = ALL_EandA + txtEandA

		<detailed>
			<enttyp>
				<enttypl>Text File USA-Arkansas-Buffalo_River-2017-AMT170-FC6_01.txt</enttypl>
				<enttypd>Header file in ASCII text format for raw fourier coefficient files</enttypd>
				<enttypds>U.S. Geological Survey</enttypds>
			</enttyp>
			<attr>
				<attrlabl>Header Information</attrlabl>
				<attrdef>Header description and settings for fourier coefficient binary content</attrdef>
				<attrdefs>U.S. Geological Survey</attrdefs>
				<attrdomv>
					<udom>Header description and settings for fourier coefficient binary content</udom>
				</attrdomv>
			</attr>
		</detailed>
		<detailed>
			<enttyp>
				<enttypl>Text File USA-Arkansas-Buffalo_River-2017-AMT170-FC6_02.txt</enttypl>
				<enttypd>Header file in ASCII text format for raw fourier coefficient files</enttypd>
				<enttypds>U.S. Geological Survey</enttypds>
			</enttyp>
			<attr>
				<attrlabl>Header Information</attrlabl>
				<attrdef>Header description and settings for fourier coefficient binary content</at

# Now get values and stats on the .RSP files that are listed to add to the Ent. and Att. information

### These files are all fixed width format (666) but it looks like the BFS*.RSP and EF*.RSP are slightly different beasts that have different formats

In [288]:
# Load the BFS*.RSP files into pandasand create entities and attributes
allRspEandP = ''
strFreqMaxBSP = ''
strAmpMaxBSP = ''
strGammaMaxBSP = ''
strFreqMinBSP = ''
strAmpMinBSP = ''
strGammaMinBSP = ''

strFreqMaxESP = ''
strFreqMinESP = ''
strAmp1MaxESP = ''
strAmp1MinESP = ''
strAmp2MaxESP = ''
strAmp2MinESP = ''
strAmp3MaxESP = ''
strAmp3MinESP = ''
strAmp4MaxESP = ''
strAmp4MinESP = ''
strPhz1MaxESP = ''
strPhz1MinESP = ''
strPhz2MaxESP = ''
strPhz2MinESP = ''
strPhz3MaxESP = ''
strPhz3MinESP = ''
strPhz4MaxESP = ''
strPhz4MinESP = ''

intNumberBSF = 0
intNumberEFF = 0
for i in range(len(rspList)):
#for i in range(3):
    if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        bfRSP = pd.read_fwf(rspList[i], widths=[6,6,6], skiprows=5, parse_dates=True).rename(columns={'31':'Freq', '1':'Amp', 'Unnamed: 2':'Gamma'})
#print (rspList[1])
        print (bfRSP)    
    
# Now lets get the stats of the bfRSP data

# Make Array of Min and Max Values
        FreqMaxRSP = bfRSP['Freq'].max()
        strFreqMaxBSP = str(FreqMaxRSP)
        AmpMaxRSP = bfRSP['Amp'].max()
        strAmpMaxBSP = str(AmpMaxRSP)
        GammaMaxRSP = bfRSP['Gamma'].max()
        strGammaMaxBSP = str(GammaMaxRSP)
        FreqMinRSP = bfRSP['Freq'].min()
        strFreqMinBSP = str(FreqMinRSP)
        AmpMinRSP = bfRSP['Amp'].min()
        strAmpMinBSP = str(AmpMinRSP)
        GammaMinRSP = bfRSP['Gamma'].min()
        strGammaMinBSP = str(GammaMinRSP)
# now print RSP entity and attribute    
        bspEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        + '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strFreqMinBSP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxBSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Hz</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strAmpMinBSP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmpMaxBSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strGammaMinBSP + '</rdommin>\n\t\t\t\t<rdommax>' + strGammaMaxBSP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Degrees</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t</detailed>'
        print ('i Loop = ' + str(i))
        allRspEandP = allRspEandP +  bspEandA + '\n'
    if rspList[i].find('EF') > 0:
        print ('i Loop = ' + str(i) + ' EF File Found')
        efRSP = pd.read_fwf(rspList[i], widths=[6,6,6,6,6,6,6,6,6], skiprows=5, parse_dates=True)\
        .rename(columns={'42':'Freq', '4':'Amp1', 'Low F':'Phz1', 'requen':'Amp2', 'cy':'Phz2', '10Hz':'Amp3', 'Out':'Phz3',\
        'Unnamed: 7':'Amp4', 'Unnamed: 8':'Phz4'})
#Delete headers from data to remove all strings from Pandas Data Frame

        efRSP = efRSP[efRSP.Amp2 != 'Freque']

# Make Array of Min and Max Values
        strFreqMaxESP = str(efRSP['Freq'].max())
        strFreqMinESP = str(efRSP['Freq'].min())
        strAmp1MaxESP = str(efRSP['Amp1'].max())
        strAmp1MinESP = str(efRSP['Amp1'].min())
        strAmp2MaxESP = str(efRSP['Amp2'].max())
        strAmp2MinESP = str(efRSP['Amp2'].min())
        strAmp3MaxESP = str(efRSP['Amp3'].max())
        strAmp3MinESP = str(efRSP['Amp3'].min())
        strAmp4MaxESP = str(efRSP['Amp4'].max())
        strAmp4MinESP = str(efRSP['Amp4'].min())
        strPhz1MaxESP = str(efRSP['Phz1'].max())
        strPhz1MinESP = str(efRSP['Phz1'].min())
        strPhz2MaxESP = str(efRSP['Phz2'].max())
        strPhz2MinESP = str(efRSP['Phz2'].min())
        strPhz3MaxESP = str(efRSP['Phz3'].max())
        strPhz3MinESP = str(efRSP['Phz3'].min())
        strPhz4MaxESP = str(efRSP['Phz4'].max())
        strPhz4MinESP = str(efRSP['Phz4'].min())
        espEandA = '\t<detailed>\n\t\t\t<enttyp>\n\t\t\t<enttypl>Text File ' + rspFileListing [i] + '</enttypl>\n' \
        + '\t\t\t<enttypd>System Calibration File</enttypd>\n\t\t\t<enttypds>Electromagnetic Instruments (EMI)</enttypds>\n\t\t</enttyp>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Freq</attrlabl>\n\t\t\t<attrdef>Frequency - Hz</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strFreqMinESP + '</rdommin>\n\t\t\t\t<rdommax>' + strFreqMaxESP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Hz</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t<attr>\n\t\t\t<attrlabl>Amp</attrlabl>\n\t\t\t<attrdef>Amplitude - Volts/Gamma</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strAmp1MinESP + '</rdommin>\n\t\t\t\t<rdommax>' + strAmp1MaxESP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Volts/Gamma</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t\t\t<attr>\n\t\t\t<attrlabl>Phz</attrlabl>\n\t\t\t<attrdef>Phase - Degrees</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>' \
        + '\n\t\t\t<attrdomv>\n\t\t\t\t<rdom>\n\t\t\t\t<rdommin>' \
        + strPhz1MinESP + '</rdommin>\n\t\t\t\t<rdommax>' + strPhz1MaxESP + '</rdommax>\n\t\t\t\t' \
        + '<attrunit>Degrees</attrunit>\n\t\t\t</rdom>\n\t\t\t</attrdomv>\n\t\t</attr>\n' \
        + '\t</detailed>'
        print ('i Loop = ' + str(i))
        allRspEandP = allRspEandP +  espEandA + '\n' 
        print (str(allRspEandP)) 

        
EandA = allRspEandP # [-:2] # this removes the last characters of the string to get rid of the last line return.  
ALL_EandA = ALL_EandA +  allRspEandP


        Freq      Amp  Gamma
0       0.10  0.00191   89.3
1       0.15  0.00287   88.9
2       0.20  0.00383   88.5
3       0.30  0.00574   87.8
4       0.40  0.00764   87.1
5       0.60  0.01140   85.7
6       0.80  0.01520   84.2
7       1.00  0.01890   82.8
8       1.50  0.02810   79.2
9       2.00  0.03700   75.8
10      3.00  0.05360   69.2
11      4.00  0.06820   63.1
12      6.00  0.09130   52.7
13      8.00  0.10700   44.6
14     10.00  0.11800   38.3
15     15.00  0.13600   27.4
16     20.00  0.13790   23.1
17     30.00  0.14800   14.1
18     40.00  0.14800   10.1
19     80.00  0.15000    5.6
20    100.00  0.15000    4.5
21    200.00  0.15190    3.2
22    400.00  0.15190    0.9
23   1000.00  0.15190    4.3
24   2000.00  0.15190   -0.5
25   4000.00  0.15190   -1.7
26   8000.00  0.15500   -4.4
27  10000.00  0.15700   -6.2
28  15000.00  0.15800  -10.2
29  20000.00  0.16300  -14.0
30  25000.00  0.16941  -19.6
i Loop = 0
        Freq      Amp  Gamma
0       0.10  0.00183   89.3
1  

## Now let's work on the SD mode files

### Overview of SD files,

The prefix file naming convention for the SD mode AMT non-transmitter files are:
AANNNRN where AA is the survey area, NNN is the site number, R is the run "number" (run A, run B, run C, etc.), and N is a subset run number (first run is A1, second run is A2, third run is A3, etc.)

The suffix file naming convention for the SD mode AMT non-transmitter files are:
SD6 sample frequencies are always 79, 90, 100, 150, 210, 270, 340, 460, 580 Hertz
SD7 sample frequencies are always 340, 460, 580, 720, 885, 1170 Hertz
SD8 sample frequencies are always 1170, 1500, 1870, 2200, 2730, 3550, 4900, 6500, 9000 Hertz
SD9 sample frequencies are always 6500, 9000, 11590, 15290, 19500, 23370 Hertz

The prefix file naming convention for the SD mode AMT transmitter files are:
AANNNRN where AA is the survey area, NNN is the site number, R is the run "number" (run A, run B, run C, etc.), and N is a subset run "number" (first run is AA, second run is AB, third run is AC, etc.)

The suffix file naming convention for the SD mode AMT transmitter files are:
SDA sample frequency is always 960 Hertz
SDB sample frequency is always 1200 Hertz
SDC sample frequency is always 1870 Hertz
SDD sample frequency is always 2420 Hertz
SDE sample frequency is always 2730 Hertz
SDF sample frequency is always 3550 Hertz
SDG sample frequency is always 5210 Hertz
SDH sample frequency is always 6850 Hertz
SDI sample frequency is always 11590 Hertz
SDJ sample frequency is always 15920 Hertz
SDK sample frequency is always 23370 Hertz

The file format of the SD mode files are described in each of the text files (last block of information within each file). It is the same data format regardless of whether or not a transmitter was used to collect the data. We must have been looking at the text file for the FC (Fourier Coefficient) files in your office, so I was confused that I hadn't described the data format (it is there in the text files for all SD files).



In [289]:
#First get a listing of the SD mode files
#Now finally add the processed ASCII text files to the list
sdList = glob.glob(os.path.join(mtStationPath, '*.sd?'),  recursive=True)
#sdaList
sdFileList = ''
for i in range(len(sdList)):
  splitSdList = sdList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitSdList[len(splitSdList) - 1] + '\n'
  sdFileList = sdFileList + splitSdList[len(splitSdList) - 1] + '\n'

#print ('File ListingfileListing:\n' + fileListing)
print ('SD file listing:\n' + sdFileList)


SD file listing:
AR170A1.SD6
AR170A1.SD7
AR170A1.SD8
AR170A1.SD9
AR170A2.SD6
AR170A2.SD7
AR170A2.SD8
AR170A2.SD9
AR170A3.SD6
AR170A3.SD9
AR170AA.SDA
AR170AB.SDB
AR170BA.SDC
AR170BB.SDC
AR170CA.SDC
AR170CB.SDD
AR170CC.SDE
AR170DA.SDE
AR170EA.SDE
AR170EB.SDF
AR170EC.SDG
AR170FA.SDG
AR170FB.SDH
AR170FC.SDH
AR170GA.SDH
AR170GB.SDI
AR170HA.SDI
AR170HB.SDJ
AR170HC.SDK
AR170HD.SDK
AR170IA.SDI
AR170IB.SDJ
AR170JA.SDJ
AR170KA.SDJ
AR170LA.SDJ
AR170LB.SDI



In [290]:
#As before, we read in the list of files

for i in range(len(sdList)):
#for i in range(3):
    #if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        dfSD = pd.read_fwf(sdList[i], widths=[15,15,15,15,15], skiprows=27, parse_dates=True).rename(columns={'79.0000  32.':'Amp1', '00000   20   1':'Amp2', 'Unnamed: 2':'Amp3', 'Unnamed: 3':'Amp4', 'Unnamed: 4':'Amp5'})
#Now strip out the header rows in the data as to not screw up calculating the range
        #dfSD = dfSD[dfSD.Amp4 != '00000   20   1']
        print (dfSD)        
    

               Amp1             Amp2      Amp3      Amp4       Amp5
0   86.25229496e-03              0.0 -0.455947 -0.153087   4.670031
1               0.0  -6.86214441e-03 -0.002106  0.067742  -0.003314
2   99.36826705e-05              0.0 -0.000956 -0.000308   0.008660
3   34.93329495e-04  12.31010608e-05  0.000055  0.000023   0.000000
4   69.24365362e-04  -4.46505012e-03 -0.048565  0.059606  -0.000744
5   83.88387153e-05  -4.78466039e-05  0.000148  0.001282   0.000000
6      90.0000  32.   00000   20   1       NaN       NaN        NaN
7   45.76751123e-03              0.0 -0.022931 -0.144560   1.416386
8               0.0  -7.48451380e-04 -0.002103  0.019907  -0.002410
9   28.79635125e-05              0.0 -0.000091  0.000160   0.000960
10  13.13329871e-04  11.76887512e-06  0.000018  0.000006   0.000000
11  21.30180853e-04  84.15316058e-05 -0.013605  0.017493  -0.000221
12  22.70130383e-05  40.53935116e-07  0.000025  0.000354   0.000000
13    100.0000  32.   00000   20   1       NaN  

      6500.0000 620.             Amp2          Amp3          Amp4  \
0    10.60040167e-03              0.0  1.922143e-03  4.128546e-03   
1                0.0  57.34393228e-07  9.387248e-06  2.144321e-05   
2    59.65137117e-08              0.0 -6.083508e-06  6.503849e-06   
3    -6.59141246e-06  87.14687989e-09  1.025267e-08  2.174961e-08   
4    -8.50799508e-06  -1.03463872e-05 -5.937094e-06  3.220916e-06   
5    -4.98351058e-09  -3.29805190e-08  9.918152e-09  3.298961e-07   
6     9000.0000 620.   00000   20   1           NaN           NaN   
7    73.23751254e-03              0.0  1.949879e-02 -9.391240e-03   
8                0.0  -4.52227176e-05 -5.156796e-05 -5.330879e-06   
9    93.38571288e-08              0.0 -5.228476e-05  2.763849e-05   
10   -9.16477540e-06  12.16815399e-08 -6.149521e-08  6.336638e-08   
11   23.93772672e-06  28.51059423e-06 -3.827954e-06  1.005718e-05   
12   -4.56987339e-09  -6.67802255e-08 -3.819624e-08  4.984702e-07   
13   11590.0000 620.   00000   20 

     1870.0000 106.   00000   15   1          Amp3          Amp4          Amp5
0   89.16024997e-05              0.0  3.427359e-05  2.289519e-04  2.253983e-03
1               0.0  30.16934049e-07  9.660327e-07  6.796548e-06 -1.106647e-05
2   11.47769608e-08              0.0 -1.812619e-07  7.467364e-07  8.289636e-07
3   -1.17812600e-06  92.83452932e-10  2.951194e-09  2.414978e-09  0.000000e+00
4   -2.64793753e-06  13.96740122e-07 -3.962024e-06  3.883538e-06 -7.008167e-08
5   -1.26638111e-08  -1.05932932e-10  3.130312e-09  1.151469e-07  0.000000e+00
6    1870.0000 106.   00000   15   1           NaN           NaN           NaN
7   30.77009091e-04              0.0 -3.215968e-04 -9.907139e-04  2.496577e-03
8               0.0  10.89251253e-07  1.181415e-06  5.445590e-06 -4.034373e-06
9   92.80670265e-09              0.0 -1.145631e-06  2.463786e-06 -1.519195e-07
10  -1.62451852e-06  91.51341936e-10  6.661063e-10  4.260059e-09  0.000000e+00
11  -7.03972418e-06  32.12611197e-07 -3.518077e-06 -

    23370.0000 620.   00000   15   1          Amp3      Amp4       Amp5
0   17.54306965e+01              0.0 -7.952567e+01 -1.642427  61.518064
1               0.0  23.31388453e-03 -1.475663e-02 -0.037088   0.020950
2   11.95990181e-05              0.0  2.349379e-02 -0.013287  -0.004558
3   28.28809619e-04  57.12199646e-07 -1.598653e-07  0.000010   0.000000
4   -2.29878137e-03  11.53631884e-04  7.678207e-03 -0.003808  -0.000026
5   -1.23690849e-06  -9.41265538e-08 -1.536339e-07  0.000006   0.000000
6   23370.0000 620.   00000   15   1           NaN       NaN        NaN
7   42.34392906e-01              0.0  5.590549e+00  0.043591   9.468943
8               0.0  17.38731938e-03 -1.044895e-02  0.023390  -0.012783
9   27.40672287e-05              0.0 -7.983513e-03  0.004671  -0.011071
10  59.59144260e-04  -1.08824894e-04  2.055038e-07  0.000044   0.000000
11  -3.74664409e-03  20.62009411e-04 -4.782623e-03  0.002219  -0.000059
12  -3.49102937e-06  23.25534721e-06  1.415913e-06  0.000013   0

    15290.0000 620.   00000   15   1         Amp3         Amp4        Amp5
0   16.71672191e+03              0.0  -884.701697   954.701973  155.515182
1               0.0  13.38164864e-01     1.978030     0.111554   -0.072663
2   67.80859035e-05              0.0    -1.136428     1.108818    0.081281
3   -1.34591415e-01  -3.28852671e-04     0.000118     0.000590    0.000000
4   49.16247420e-02  71.64765444e-02     0.010304    -0.057516    0.000135
5   21.61167183e-06  -5.28133192e-06    -0.000102     0.000047    0.000000
6   15290.0000 620.   00000   15   1          NaN          NaN         NaN
7   19.51344410e+03              0.0 -1249.773990  1343.552272  214.991891
8               0.0  10.34708828e-01     2.224587     0.137984   -0.125921
9   61.03861868e-05              0.0    -0.826296     1.559707    0.132708
10  -1.55436204e-01  -1.97538436e-04     0.000082     0.000555    0.000000
11  54.98966665e-02  78.94896667e-02     0.015713    -0.081752    0.000128
12  -1.72564025e-06  23.9

In [291]:
#listing of the Fourier Coefficient files (FC) files.
#These are Binary so we don't need to come up with the range of the values.
fcList = glob.glob(os.path.join(mtStationPath, '*.fc?'),  recursive=True)
#fcList
for i in range(len(fcList)):
  splitFcList = fcList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitFcList[len(splitFcList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						USA-Arkansas-Buffalo_River-2017-AMT170.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCB_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BA.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BB.txt
						USA-Arkansas-Buffalo_Riv

In [292]:
# Create the Entity and Attributes for the FC listing
for i in range(len(fcList)):
    splitFcList = fcList[i].split('\\')
    fcEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Binary File ' + splitFcList[len(splitFcList) - 1] + '</enttypl>\n' \
        + '\t\t\t<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>' \
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t</enttyp>' \
        + '\n<attr>\n\t\t\t<attrlabl>Data Value</attrlabl>\n\t\t\t<attrdef>Binary Data Value</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t<attrdomv>'\
        + '\n\t\t\t\t<udom>Binary Data Value</udom>\n\t\t\t\t</attrdomv>' \
        + '\t\t</attr>\n\t</detailed>\n'
    print (fcEandA)
ALL_EandA = ALL_EandA + fcEandA    
        
        
        
        
        

		<detailed>
		<enttyp>
			<enttypl>Binary File AR170A1.FC6</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>
			<enttypds>U.S. Geological Survey</enttypds>
		</enttyp>
<attr>
			<attrlabl>Data Value</attrlabl>
			<attrdef>Binary Data Value</attrdef>
			<attrdefs>U.S. Geological Survey</attrdefs>
			<attrdomv>
				<udom>Binary Data Value</udom>
				</attrdomv>		</attr>
	</detailed>

		<detailed>
		<enttyp>
			<enttypl>Binary File AR170A1.FC7</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.<

In [293]:
#listing of the Time Series TS? files
#Now finally add the processed ASCII text files to the list
tsList = glob.glob(os.path.join(mtStationPath, '*.ts?'),  recursive=True)
#tsList
for i in range(len(tsList)):
  splitTsList = tsList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitTsList[len(splitTsList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						USA-Arkansas-Buffalo_River-2017-AMT170.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCB_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BA.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BB.txt
						USA-Arkansas-Buffalo_Riv

In [294]:
# Create the Entity and Attributes for the Time series .TS? listing
for i in range(len(tsList)):
    splitTsList = tsList[i].split('\\')
    tsEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Binary File ' + splitTsList[len(splitTsList) - 1] + '</enttypl>\n' \
        + '\t\t\t<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>' \
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>' \
        + '\n\t\t</enttyp>\n\t\t<attr>\n\t\t\t<attrlabl>Data Value</attrlabl>\n\t\t\t<attrdef>Binary Data Value</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t<attrdomv>'\
        + '\n\t\t\t\t<udom>Binary Data Value</udom>\n\t\t\t\t</attrdomv>' \
        + '\t\t</attr>\n\t</detailed>\n'
    print (tsEandA)
ALL_EandA = ALL_EandA + tsEandA

		<detailed>
		<enttyp>
			<enttypl>Binary File AR170A1.TS1</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>
			<enttypds>U.S. Geological Survey</enttypds>
		</enttyp>
		<attr>
			<attrlabl>Data Value</attrlabl>
			<attrdef>Binary Data Value</attrdef>
			<attrdefs>U.S. Geological Survey</attrdefs>
			<attrdomv>
				<udom>Binary Data Value</udom>
				</attrdomv>		</attr>
	</detailed>

		<detailed>
		<enttyp>
			<enttypl>Binary File AR170A2.TS1</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels

In [295]:
#listing of the BP? files
#Now finally add the processed ASCII text files to the list
bpList = glob.glob(os.path.join(mtStationPath, '*.bp?'),  recursive=True)
#bpList
for i in range(len(bpList)):
  splitBpList = bpList[i].split('\\')
  fileListing = fileListing + '\t\t\t\t\t\t' + splitBpList[len(splitBpList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
						USA-Arkansas-Buffalo_River-2017-AMT170.edi
						BF6-9621.RSP
						BF6-9624.RSP
						BF6-9625.RSP
						EF-9515X.RSP
						EF-9515Y.RSP
						readme.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC6_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC7_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC7_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC8_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC8_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_01.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_02.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FC9_03.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCA_AA.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCB_AB.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BA.txt
						USA-Arkansas-Buffalo_River-2017-AMT170-FCC_BB.txt
						USA-Arkansas-Buffalo_Riv

In [296]:
# Create the Entity and Attributes for the .BP? listing

#As before, we read in the list of files

for i in range(len(bpList)):
#for i in range(3):
    #if rspList[i].find('BF') > 0: #Do this if the file has a BF style Fixed Width Format
        dfBP = pd.read_fwf(bpList[i], widths=[15,15,15,15,15], skiprows=27, parse_dates=True).rename(columns={'4.3945   1.':'Amp1', '85697    1   2':'Amp2', 'Unnamed: 2':'Amp3', 'Unnamed: 3':'Amp4', 'Unnamed: 4':'Amp5'})
#Now strip out the header rows in the data as to not screw up calculating the range
        #dfSD = dfSD[dfSD.Amp4 != '00000   20   1']
        #print (dfBP)        


for i in range(len(bpList)):
    splitBpList = tsList[i].split('\\')
    bpEandA = '\t\t<detailed>\n\t\t<enttyp>\n\t\t\t<enttypl>Binary File ' + splitBpList[len(splitBpList) - 1] + '</enttypl>\n' \
        + '\t\t\t<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>' \
        + '\n\t\t\t<enttypds>U.S. Geological Survey</enttypds>\n\t\t</enttyp>' \
        + '\n<attr>\n\t\t\t<attrlabl>Data Value</attrlabl>\n\t\t\t<attrdef>Binary Data Value</attrdef>\n\t\t\t<attrdefs>U.S. Geological Survey</attrdefs>\n\t\t\t<attrdomv>'\
        + '\n\t\t\t\t<udom>Binary Data Value</udom>\n\t\t\t\t</attrdomv>' \
        + '\t\t</attr>\n\t</detailed>\n'
    print (bpEandA)
    
ALL_EandA = ALL_EandA +  bpEandA

		<detailed>
		<enttyp>
			<enttypl>Binary File AR170A1.TS1</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.</enttypd>
			<enttypds>U.S. Geological Survey</enttypds>
		</enttyp>
<attr>
			<attrlabl>Data Value</attrlabl>
			<attrdef>Binary Data Value</attrdef>
			<attrdefs>U.S. Geological Survey</attrdefs>
			<attrdomv>
				<udom>Binary Data Value</udom>
				</attrdomv>		</attr>
	</detailed>

		<detailed>
		<enttyp>
			<enttypl>Binary File AR170A2.TS1</enttypl>
			<enttypd>Raw Time Series Binary Data File. The header information for this file contains field notes including survey identifier, personnel, date/time, weather, indication of a remote MT site, additional comments and sensor configuration for the electric and magnetic field channels.<

# Populate Metadata Template

In [297]:
#Load XML Metadata Template File and Read It
metaData = os.path.join(mtMetaDataTemplatePath, mtMetaDataTemplateName)
print ('Metadata path: ' + mtMetaDataTemplatePath + '\n')
xmlTemplateFile = open(metaData, 'r')
metaDataContent = xmlTemplateFile.readlines()
print(metaDataContent)
xmlTemplateFile.close()


Metadata path: C:\CurrentWork\DataReleases\Arkansas AMT data release

['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<origin>Hudson, M. R.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>1</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>https://doi.org/10.5066/P9CIAXC5</onlink>\n', '\t\t\t</citeinfo>\n', '\t\t</citation>\n', '\t\t<descript>\n', '\t\t\t<abstract>This dataset includes audio-magne

In [298]:
# Replace values of current metadata template with the appropriate values.  
# All of this input should have been defined when going through the steps outlined above.
lineString = ''
newMetaDataContent = metaDataContent
splitFileName = ediList[0].split('.')
myfilename = splitFileName[0] + '.xml'
xmlFile = open(myfilename,"w+")
print(myfilename)
#print(keywords.value)
for i in range(len(metaDataContent)):
    lineString = metaDataContent[i]
    if lineString.find('{title}'):
     lineString = lineString.replace('{title}', citTit + '; Station ' + strChildID)
    
    if lineString.find('{abstract}'):
     lineString = lineString.replace('{abstract}', purposeClean)
    
    if lineString.find('{purpose}'):
     lineString = lineString.replace('{purpose}', descriptionClean)
    
    if lineString.find('{BeginFileListingHere}'):
     lineString = lineString.replace('{BeginFileListingHere}', fileListing)
    
    if lineString.find('{keywords}'):
     lineString = lineString.replace('{keywords}', keywords.value)
    
    if lineString.find('{CollectionDate}'):
     lineString = lineString.replace('{CollectionDate}', strCombinationDate)
    
    if lineString.find('{CombinationDate}'):
     lineString = lineString.replace('{CombinationDate}', strCombinationDate)

    if lineString.find('{ConversionDate}'):
     lineString = lineString.replace('{ConversionDate}', strConversionDate)

    if lineString.find('{RotationDate}'):
     lineString = lineString.replace('{RotationDate}', strRotationDate)
    
    if lineString.find('{HarvestDate}'):
     lineString = lineString.replace('{HarvestDate}', strHarvestDate)
   
    if lineString.find('{begdate}'):
     lineString = lineString.replace('{begdate}', begdate)
    
    if lineString.find('{enddate}'):
     lineString = lineString.replace('{enddate}', enddate)
    
    if lineString.find('{SiteLon}'):
     lineString = lineString.replace('{SiteLon}', sitLon)
    
    if lineString.find('{SiteLon}'):
     lineString = lineString.replace('{SiteLat}', sitLat)
    
    if lineString.find('{EandA}'):
     lineString = lineString.replace('{EandA}', ALL_EandA)
   
    # {county}

    else:
     lineString = lineString
    xmlFile.write(lineString)
    
    #print (lineString)
     
    
    
#for r in (metaDataContent):
    #newMetaDataContent = metaDataContent.replace('{title}', drTitle)
    #newMetaDataContent = metaDataContent.replace('{keywords}', keywords.value)
xmlFile.close()

print ('Creation of new metadata file is complete\n\n') 
#Load EDI File and Read It
##checkFile = open(open(myfilename, 'r')
##checkFileContent = checkFile.read()
##checkFile.close()
##print(checkFileContent)

C:\CurrentWork\DataReleases\Arkansas AMT data release\AMT170\USA-Arkansas-Buffalo_River-2017-AMT170.xml
Creation of new metadata file is complete




### Check this file to see if it is valid against the FGDC metadata standard (FGDC-STD-001-1998)

## https://mrdata.usgs.gov/validation/

In [299]:
# Show the resulting child xml metadata file example 
#for i in range(len(newMetaDataContent)):
print (newMetaDataContent)

['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<origin>Hudson, M. R.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>1</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>https://doi.org/10.5066/P9CIAXC5</onlink>\n', '\t\t\t</citeinfo>\n', '\t\t</citation>\n', '\t\t<descript>\n', '\t\t\t<abstract>This dataset includes audio-magnetotelluric (AMT) sounding data collected in August 2017 in the Buffalo