
<h1>AmeriFlux and FLUXNET BIF File Parser</h1>
<p>
This notebook walks the user through reading in an AmeriFlux or FLUXNET BIF (BADM Interchange Format) file and manipulating its contents, either to answer questions or to write out useful subsets of the data in a simple, usable format.
<p>
To start using this notebook, you will need an Excel BIF file that you downloaded from either AmeriFlux or FLUXNET. In this example, the file is in my home directory and I started with the AmeriFlux all sites BIF. This code should be able to read any BIF. 
<p>
To execute the whole notebook, choose "Cell->Run All" from the menu above. If you have updated a single segment of code in a notebook, select the cell you want to execute and choose "Cell->Run Cells". The first code cell in this notebook does all of the setup and must be run to completion before any of the other cells can be run. When it is done running, it will print 'DONE' below the code box.
<p>
Several of the code segments include settings that you can change to make the code do something different. For instance, you can have it look for a different VARIABLE_GROUP or narrow the list of sites by specifying a sites filter. In the cells that can write to a file, the printtofile variable controls whether the output goes to a file or to the screen.
<p>
If no path is defined in the file specification, the file is assumed to be in the same directory as this notebook.



<h2>Initialize the notebook by reading in the BIF and parsing it into useful data structures</h2>
The first three code blocks below must always be executed before any of the other activities in this notebook. Before executing the first block, specify the name and location of the BIF file in the bif_file_name line near the top of the next code box.
The code below completes the following three steps.
<ul>
<li><h3>Read the BIF file into a data structure</h3>
The code below first imports some useful code libraries so that they will be handy for use later. It then opens the Excel BIF file and reads it into a data structure. </li>
<li><h3>Create a class for holding the variable group</h3>
A class is an object-oriented programming concept. It allows us to create a custom data type, with its own methods, for holding and operating on the variable groups in the BIF. I do not recommend modifying this class unless you are comfortable with programming in Python.</li>
<li><h3>Use the new class to turn the BIF file contents into a list of Variable Groups</h3>
This next section of the code converts the BIF data from a pandas DataFrame that looks just like the BIF into a dictionary of variable group instances, keyed by GROUP_ID, with each variable group's entries included.</li>
</ul>

In [None]:
# First to import useful libraries.
import numpy as np
import pandas as pd
# import to enable ordered dictionaries
# import collections as cl
import matplotlib.pyplot as plt
from operator import itemgetter
# imports to connect to and interpret web services
import requests
import json
import pprint
# import to allow us to work with files
import sys
# import to enable date and time management
import datetime
# import to manage directories and files
import os
# import to enable copying of structures
from copy import deepcopy

# open the BIF file
# customize the file name in the next  line with your actual BIF file.
bif_file_name = '20181022-AMF_AA-Net_BIF_LATEST.xlsx'
# bif_file_name = 'FLX_AA-Flx_BIF_LATEST.xlsx'
# bif_file_name = 'US-ARM_BIF.xlsx'

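# split off the file extension; os.path handles names that contain a path or extra dots
bif_file_prefix, bif_file_suffix = os.path.splitext(os.path.basename(bif_file_name))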
#print("DEBUG prefix ", bif_file_prefix)
bif_file = pd.ExcelFile(bif_file_name)

# Now read the BIF file into a data structure
bif_contents = pd.read_excel(bif_file)

# get and print the header row from the Excel file just below this code segment.
bif_header = bif_contents.columns
print("The BIF File Column Headers are:",'\n', bif_header)

# Now to check that the columns are in the order expected
correctindex = ['SITE_ID', 'GROUP_ID', 'VARIABLE_GROUP', 'VARIABLE', 'DATAVALUE']
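# compare as plain lists so that a file with missing or extra columns is reported instead of raising an error
if list(bif_header) == correctindex: print("BIF Column order and labels in the file were as expected")
else: print("ERROR: Bad BIF file column header")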
    
    
# Now to define the class structure for storing variable groups and its methods  
class VariableGrp(object):
    """Store one BADM variable group and the methods that operate on the group.

    Attributes:
        parray: A dictionary containing the values associated with this group. The keys SITE_ID,
                GROUP_ID, and VARIABLE_GROUP hold the site id, the group id, and the variable group.
                Every other key is a parameter (VARIABLE) in the group, mapped to its DATAVALUE.
    Methods:
        __init__(self, header, row): The object constructor. It is called when a new group is first
                encountered to store the elements of the first row of the group. header is the index
                of column labels and row is the row from the BIF file that we are interpreting.
        addparam(self, row): Add the parameter in another BIF row to the group. Returns 1 if a value
                was stored and 0 if the row had no value.
        getgrouptype(self): Return the type of this group (VARIABLE_GROUP).
        getsiteid(self): Return the SITE_ID of this group.
        getparamkeys(self): Return all the keys stored for this group.
        isgroup(self, gtype): Determine whether this is a gtype group.
        hasparam(self, param, param_val): Check if this group has this parameter with the specified param_val.
        hasparamtype(self, param): Check if this group has this parameter, whatever its value.
        printcontents(self): Print the group in a vertical format with each parameter as a row.
        printlateral(self, printlabels, labels, printtofile, fout): Print the group in column format.
                If printlabels is True, the column headers are printed first using the labels provided
                as an argument. printtofile is a boolean: True sends the output to the file pointed to
                by fout instead of the screen. fout should be an already open file.
    """
    def __init__(self, header, row):
            
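            # create a dictionary that maps each label to its value
            self.parray = dict()
            # First store the SITE_ID, GROUP_ID, and VARIABLE_GROUP
            # (.iloc makes the positional lookup explicit; newer pandas versions no longer accept row[i] here)
            for i in range(3):
                self.parray[header[i]] = row.iloc[i]
            # Now add the first parameter, but only if the row has
            # a DATAVALUE column and the value is not empty
            if len(row.index) > 4 and pd.notnull(row.iloc[4]):
                self.parray[row.iloc[3]] = row.iloc[4]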


    def addparam(self, row):
            # store the parameter only if the row has a DATAVALUE column and the value is not empty
            if len(row.index) > 4 and pd.notnull(row.iloc[4]):
                self.parray[row.iloc[3]] = row.iloc[4]
            else:
                return 0
            return 1
    
    def getgrouptype(self):
             return self.parray["VARIABLE_GROUP"]
    
    def getsiteid(self):
            return self.parray["SITE_ID"]
        
    def getparamkeys(self):
            return self.parray.keys()
    
    def isgroup(self, gtype):
            if self.parray["VARIABLE_GROUP"] == gtype:
                    return True
            else:
                    return False
            
    def hasparam(self, param, param_val):
        # a missing parameter raises KeyError (not ValueError) on lookup,
        # so test for membership first instead of catching an exception
        return param in self.parray and self.parray[param] == param_val
    
    def hasparamtype(self, param):
        # True if the group has this parameter, whatever its value
        return param in self.parray
    
    def printcontents(self):
            for param, val in self.parray.items():
                print("\t", param, val)
                
    def printlateral(self, printlabels, labels, printtofile, fout):
        columnheader = ""
        row = ""
        for param, val in labels.items():
            if(printlabels):
                columnheader = columnheader + '"' + param + '"' + ", "
            if param in self.parray:
                row = row + '"' + str(self.parray[param]) + '"' + ", "
            else:
                row = row + ", "

        # print to either the screen or the file depending on the printtofile argument
        if printtofile:
            # check if this is the first line of the file and we should print column labels
            if printlabels:
                fout.write(columnheader + '\n')
            # print the variable to the file
            fout.write(row + '\n')
        else:
            # we are writing to the screen check if this is the first line and print column headers before the variable
            if printlabels:
                print(columnheader)
            # print the variable to the screen
            print(row)
        
# end of definition of the class        
            
 
# Now to parse the file into instances of the VariableGrp class
# first to initialize some variables
grpid = 0
grpcnt = -1
siteslist = dict()
bif_groups = dict()
# loop through the rows of the BIF file we just read in
for ndex,bif_row in bif_contents.iterrows():
    # check if this is a variable group we already started (known group_id)
    grpid = bif_row['GROUP_ID']
    if grpid in bif_groups:  
        # We have already seen this groupid
        #first make sure that this is not a duplicate gid for a different group
        if( bif_row['SITE_ID'] == bif_groups[grpid].getsiteid() and bif_row['VARIABLE_GROUP'] == bif_groups[grpid].getgrouptype()):
            # Add this row to the existing variable group - it is ok
            bif_groups[grpid].addparam(bif_row)
        else:
            # this is a second different group with the same grpid so reject
            print("ERROR IN BIF: Duplicated GroupID found and ignored - ", grpid)
                
    else:
        # This is a new group so we need to create a new VariableGrp and store the header and row in it
        grpcnt = grpcnt + 1
        bif_groups[grpid] = VariableGrp(bif_header, bif_row)

# Now to fill in the list of sites 
for gid, group in bif_groups.items():
    if group.getsiteid() in siteslist:
        siteslist[group.getsiteid()] += 1
    else:
        siteslist[group.getsiteid()] = 1

# print the list of sites found in the file and the number of groups for each
# print(siteslist)
# print the last group read in just to make sure things look ok
# bif_groups[grpid].printcontents()
# print(grpcnt, " groups found ", len(siteslist.keys()), " sites found ", len(grouptypeslist.keys()), " group types found") # just to let the user know this part has executed

print('DONE  ')
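
To make the parsing step concrete, here is a minimal sketch of the same grouping idea applied to a few hand-written rows instead of a real BIF. The rows, site IDs, and values are made up for illustration, and `group_rows` is a hypothetical helper that is not part of this notebook. It uses plain dictionaries rather than the `VariableGrp` class, and it skips the duplicate GROUP_ID check that the cell above performs.

```python
# Hypothetical sample rows in BIF column order:
# (SITE_ID, GROUP_ID, VARIABLE_GROUP, VARIABLE, DATAVALUE)
sample_rows = [
    ("US-AAA", 1, "GRP_IGBP", "IGBP", "DBF"),
    ("US-AAA", 2, "GRP_LAI", "LAI_O_DEC", "4.5"),
    ("US-AAA", 2, "GRP_LAI", "LAI_DATE", "20050615"),
    ("US-BBB", 3, "GRP_IGBP", "IGBP", "ENF"),
]


def group_rows(rows):
    """Collect BIF-style rows into one dictionary per GROUP_ID."""
    groups = {}
    for site_id, group_id, variable_group, variable, value in rows:
        if group_id not in groups:
            # first row of a new group: store the three identifying fields
            groups[group_id] = {
                "SITE_ID": site_id,
                "GROUP_ID": group_id,
                "VARIABLE_GROUP": variable_group,
            }
        # every row contributes one parameter to its group
        groups[group_id][variable] = value
    return groups


sample_groups = group_rows(sample_rows)
for group_id, params in sample_groups.items():
    print(group_id, params)
```

The two GRP_LAI rows share GROUP_ID 2, so they end up in a single dictionary that holds both LAI_O_DEC and LAI_DATE.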

<h2> Find and Print all the Group Types in this File</h2>
This next code segment finds all the unique group types and prints out a list of the group types found in the file. It will also create the grouptypeslist used in later blocks and print the number of instances of each group type that were found in the file. The code block below must be run before using the rest of the notebook.

In [None]:
# Now to fill in the list of sites and the list of grouptypes and their parameters
grouptypeslist = dict()
for gid, group in bif_groups.items():
    params = group.getparamkeys()
    grouptype = group.getgrouptype()

    if grouptype in grouptypeslist:
        for param in params:
                if param in grouptypeslist[grouptype].keys():
                    grouptypeslist[grouptype][param] +=1
                else:                     
                    grouptypeslist[grouptype][param] = 1
    else:
        grouptypeslist[grouptype] = dict()
        for param in group.getparamkeys():
                grouptypeslist[grouptype][param] = 1

# print the group types and all parameters found for each group type
# after each parameter print the number of groups containing this parameter
for gid, grouptype in grouptypeslist.items():
    paramslist = "\n" + gid + ": " + str(grouptype["SITE_ID"]) + " (# is number of occurrences)\n Parameters: "
    for param, val in grouptype.items():
        if (param != 'SITE_ID' and param != 'GROUP_ID' and param != "VARIABLE_GROUP"):
            paramslist = paramslist + param + ": " + str(val) + ", "
    print(paramslist)
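
The counting done in the cell above can also be written with `collections.Counter` from the standard library. The sketch below is only an illustration under assumed data: `sample_groups` is a made-up list of parameter dictionaries standing in for the parsed groups, and `FIXED_KEYS`, `group_counts`, and `param_counts` are hypothetical names that do not appear elsewhere in this notebook.

```python
from collections import Counter

# Hypothetical parsed groups: one dictionary of parameters per group.
sample_groups = [
    {"SITE_ID": "US-AAA", "VARIABLE_GROUP": "GRP_LAI", "LAI_O_DEC": "4.5", "LAI_DATE": "20050615"},
    {"SITE_ID": "US-AAA", "VARIABLE_GROUP": "GRP_LAI", "LAI_O_DEC": "3.9"},
    {"SITE_ID": "US-BBB", "VARIABLE_GROUP": "GRP_IGBP", "IGBP": "ENF"},
]

# keys that identify the group rather than describe a parameter
FIXED_KEYS = {"SITE_ID", "GROUP_ID", "VARIABLE_GROUP"}

group_counts = Counter()   # number of groups per group type
param_counts = {}          # per group type, number of groups holding each parameter
for params in sample_groups:
    gtype = params["VARIABLE_GROUP"]
    group_counts[gtype] += 1
    counter = param_counts.setdefault(gtype, Counter())
    counter.update(key for key in params if key not in FIXED_KEYS)

for gtype, n in group_counts.items():
    print(gtype, n, dict(param_counts[gtype]))
```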



<h2>Define the List of Sites to Use in Filtering the Queries in the Notebook Below Here</h2>
The code segment below defines a list of sites, SitesofInterest. Many of the later code segments in the notebook use this list to filter their results. Depending on whether you want those code segments to operate on all of the sites in this BIF or only a subset, uncomment (remove the # sign from) the appropriate line in the code segment right below here. Leave only one of the options that set SitesofInterest active.

If you want to define the sites of interest by some property like IGBP, example code for doing that is disabled at the end of this block between two lines that contain only """. Put a # in front of each of those two lines to use that section to define SitesofInterest instead of the simple lists used at the top of the code.

In [None]:
SitesofInterest = []
# Customize: activate only one of the three options below (use the instructions 
# at the start of each option to learn how to comment or uncomment it).

# 1. All sites: leave the next line uncommented. To use option 2 or 3 instead,
# put a # in front of it.
SitesofInterest = list(siteslist.keys())

# 2. A specific list of sites: list them between the brackets below and 
# remove the # from the front of the next line.
#SitesofInterest = ['US-MMS', 'US-Seg', 'US-NGB']

# 3. Sites with a specific IGBP type: put a # in front of each of the two 
# lines below that contain only """ so that the code between them runs. 
# To deactivate the section again, remove those two # signs.
"""
SitesofInterest = []
grouptype = 'GRP_IGBP'
param = 'IGBP'
paramvalue = 'DBF'
for gid, group in bif_groups.items():
    # First to find all the GRP_IGBP groups
    if group.isgroup(grouptype):
        # Now to just look for all the IGBPs that are paramvalue
        if group.hasparam( param, paramvalue ):
            # if this is a IGBP that matches then add the site to SitesofInterest
            if group.getsiteid() not in SitesofInterest:
                SitesofInterest.append(group.getsiteid())
"""               
# Print the resulting SitesofInterest to check that it is what you want.
print('SitesofInterest contains:', SitesofInterest)            
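
The property-based filter in option 3 comes down to collecting the SITE_ID of every group that has the right group type and parameter value. Here is a compact sketch of that idea on made-up data. `sample_groups` and the helper `sites_matching` are hypothetical and use plain dictionaries in place of `VariableGrp` instances.

```python
# Hypothetical parsed groups: one dictionary of parameters per group.
sample_groups = [
    {"SITE_ID": "US-AAA", "VARIABLE_GROUP": "GRP_IGBP", "IGBP": "DBF"},
    {"SITE_ID": "US-BBB", "VARIABLE_GROUP": "GRP_IGBP", "IGBP": "ENF"},
    {"SITE_ID": "US-CCC", "VARIABLE_GROUP": "GRP_IGBP", "IGBP": "DBF"},
    {"SITE_ID": "US-CCC", "VARIABLE_GROUP": "GRP_LAI", "LAI_O_DEC": "4.5"},
]


def sites_matching(groups, grouptype, param, paramvalue):
    """Return the sorted, de-duplicated site IDs whose groups match the filter."""
    return sorted({
        g["SITE_ID"]
        for g in groups
        if g.get("VARIABLE_GROUP") == grouptype and g.get(param) == paramvalue
    })


dbf_sites = sites_matching(sample_groups, "GRP_IGBP", "IGBP", "DBF")
print(dbf_sites)
```

Building a set first removes duplicate sites, which is the job the `not in SitesofInterest` check does in the cell above.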


<h2>Print a csv File With All Entries of a Particular Group Type</h2>
Specify the group type and whether you want to print to a file or to the screen.

In [None]:
# Create a csv file that contains the parameters for a user-defined group type and site list.
# The csv file is named as follows: YYYY-MM-DD-HH-MM-SS-<GROUP_NAME>.csv
# The SitesofInterest variable contains the list of sites of interest.

# printlabel is used to cause the column labels to be printed out. 
printlabel = True
# customize this by changing the next 2 lines
grouptype = 'GRP_LAI'  # VARIABLE_GROUP we are looking for
printtofile = True   # Indicates whether to print to a file

#filename = 'YYYY-MM-DD-HH-MM-SS-<grouptype>.csv'  # the file where the results will be written
time = str(datetime.datetime.now().isoformat(sep='-', timespec='seconds'))
time = time.replace( ':', '-')

# if we will write to a file - first open it
if printtofile:
    filename = time + "-" + grouptype + ".csv"
    print ("DEBUG - content filename",filename)
    fout = open(filename, 'w')
else:
    fout = sys.stdout

count = 0

#loop through all of the groups that were in the BIF
for gid, group in bif_groups.items():
    # if this is a group of the group type we are looking for
    if group.isgroup(grouptype) and group.getsiteid() in SitesofInterest:
            # print this group out in column format
            group.printlateral(printlabel, grouptypeslist[grouptype], printtofile, fout)
            # turn the column labels off after the first time
            printlabel = False
            count += 1
            
# now to close the file if we opened it            
if printtofile:
    fout.close()
    if( count > 0):
        print( "File " + filename +" written and closed")
    else:
        os.remove(filename)
        print( "File " + filename + " not written - no instances of " + grouptype + " in BIF for chosen sites")
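
The `printlateral` method builds each csv line by hand, so a value that itself contains a double quote could produce a line that some csv readers misparse. As a possible alternative, the sketch below writes the same kind of table with `csv.DictWriter` from the standard library, which escapes such values. It is not what this notebook does. The sample groups are made up, `LAI_COMMENT` is used here only as an illustrative parameter name, and an in-memory buffer stands in for the open output file.

```python
import csv
import io

# column labels, in the order they should appear in the file
labels = ["SITE_ID", "GROUP_ID", "VARIABLE_GROUP", "LAI_O_DEC", "LAI_COMMENT"]

# Hypothetical groups; the second one has no LAI_COMMENT value.
sample_groups = [
    {"SITE_ID": "US-AAA", "GROUP_ID": 2, "VARIABLE_GROUP": "GRP_LAI",
     "LAI_O_DEC": "4.5", "LAI_COMMENT": 'measured "in situ", plot 1'},
    {"SITE_ID": "US-BBB", "GROUP_ID": 7, "VARIABLE_GROUP": "GRP_LAI",
     "LAI_O_DEC": "3.9"},
]

buffer = io.StringIO()  # stands in for a file opened with open(filename, 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=labels, restval="", quoting=csv.QUOTE_ALL)
writer.writeheader()              # column labels, written once
writer.writerows(sample_groups)   # missing parameters become empty cells
csv_text = buffer.getvalue()
print(csv_text)

# read the text back to confirm that the quoted value survives the round trip
rows_back = list(csv.DictReader(io.StringIO(csv_text)))
```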

<h1>Create a directory and then write a csv file per group type in the BADM</h1>
This code will create a directory named for the date, time, and BIF file name (YYYY-MM-DD-HH-MM-SS-<BIF file name>). In the directory it will create a csv file per group type that contains all the parameters and their values for a user-defined site list.
The csv file for a given group type is named as follows: YYYY-MM-DD-HH-MM-SS-<Group Type>.csv
The SitesofInterest variable contains the list of sites to filter by and was defined earlier.

In [None]:
# Create a csv file per group type that contains all the parameters and their values for a user-defined site list.
# The csv file for a given group type is named as follows: YYYY-MM-DD-HH-MM-SS-<GROUP_NAME>.csv
# The SitesofInterest variable contains the list of sites of interest.

# create a directory to put all the output files. The directory will be where the Jupyter notebook
# is and the name of the directory is YYYY-MM-DD-HH-MM-SS-<BIF file name that was read in last>
# get the current date and time and replace the colons with -
time = str(datetime.datetime.now().isoformat(sep='-', timespec='seconds'))
time = time.replace( ':', '-')

# create the new directory name
newdirectorypath = "./" + time + "-" + bif_file_prefix
print("DEBUG directory name", newdirectorypath)
# create the directory if it does not already exist
if not os.path.exists(newdirectorypath):
    os.makedirs(newdirectorypath)

# For each group type create a file in the directory
for grouptype in grouptypeslist:
    # printlabel is used to cause the column labels to be printed out. 
    printlabel = True
    count = 0
    # if we are printing to a file, build the file name and open the file
    # name the file YYYY-MM-DD-HH-MM-SS-<the group type that we are printing out>.csv
    filename = newdirectorypath + "/"+ time + "-" + grouptype +  ".csv"
    
    # we will write to a file - first open it
    fout = open(filename, 'w')
    #loop through all of the groups that were in the BIF
    for gid, group in bif_groups.items():
        # if this is a group of the group type we are looking for
        if group.isgroup(grouptype) and group.getsiteid() in SitesofInterest:
            # print this group out in column format
            group.printlateral(printlabel, grouptypeslist[grouptype], True, fout)
            # turn the column labels off after the first time
            printlabel = False
            count += 1
            
    # now to close the file we opened            
    fout.close()
    if( count == 0):
        os.remove(filename)
        print( "File " + filename + " not written - no instances of " + grouptype + " in BIF for chosen sites")
    # print( "Filename "+ filename + " written and closed")
print("All group files written")
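
The timestamp used in the directory and file names can be checked on its own. The sketch below uses a fixed, made-up moment so that the result is repeatable. It shows that `strftime` gives the same label as the `isoformat` plus `replace` combination used in the cell above, then creates a directory with that label inside a temporary folder. The helper `timestamp_label` and the prefix `example_bif` are hypothetical names.

```python
import datetime
import os
import tempfile


def timestamp_label(moment):
    """Format a datetime as YYYY-MM-DD-HH-MM-SS for use in file names."""
    return moment.strftime("%Y-%m-%d-%H-%M-%S")


# a fixed moment so the example does not depend on the current time
fixed_moment = datetime.datetime(2018, 10, 22, 14, 5, 9)
label = timestamp_label(fixed_moment)
print(label)  # 2018-10-22-14-05-09

# the approach used in the cell above gives the same text
iso_label = fixed_moment.isoformat(sep='-', timespec='seconds').replace(':', '-')

# create the directory under a temporary folder that is removed afterwards
with tempfile.TemporaryDirectory() as base:
    outdir = os.path.join(base, label + "-" + "example_bif")
    os.makedirs(outdir, exist_ok=True)  # no error if it already exists
    created = os.path.isdir(outdir)
```

Passing `exist_ok=True` to `os.makedirs` replaces the separate `os.path.exists` check.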

 <h2>Print all the Groups that Have a Particular Parameter</h2>
 This next section of code enables you to search through all of the groups that are in the BIF file. It will filter by the sites in SitesofInterest with a particular group type and specific parameter. In this case, the example is printing out all the GRP_TEAM_MEMBERs where the TEAM_MEMBER_ROLE is equal to PI. 

In [None]:
printlabel = True
# customize this by changing the next four lines
grouptype = 'GRP_TEAM_MEMBER' # the VARIABLE_GROUP we are looking for
param = 'TEAM_MEMBER_ROLE' # the parameter to search for
paramvalue = 'PI' # the parameter value to search for
printtofile = False # set to True if you want a file written, False if you do not want a file written

# if we will write to a file - first name it and then open it
if printtofile:
    # create the new file name where we want to write the results and name it BIF file name + group type + param value
    filename = "./" + bif_file_prefix  + "-" + grouptype + "-" + paramvalue + ".csv"
    fout = open(filename, 'w')
else:
    fout = sys.stdout
# loop through all of the BIF groups in the file
for gid, group in bif_groups.items():
    # First to find all the grouptype groups (Tower team members)
    if group.isgroup(grouptype) and group.getsiteid() in SitesofInterest:
        # Now to just look for all the grouptype groups that are also for sites in our filter
        if group.hasparam( param, paramvalue ):
            # if this group is the param and paramvalue we are looking 
            # for then print out the group in a column oriented format below
            # the printlabel variable is True the first time through to print column headers
            group.printlateral(printlabel, grouptypeslist[grouptype], printtofile, fout)
            printlabel = False
# now to close the file if we opened it
if printtofile:
    fout.close()
    print( "Filename "+ filename + " written and closed")

<h2>Print all the groups of the group type specified</h2>
This next section of code prints all the instances of the group type specified in 'grouptype', either to a csv file or to the screen. In this case, the example writes out all the GRP_DM_WATER groups. The output will be filtered by the SitesofInterest list defined above.

In [None]:
# printlabel is used to cause the column labels to be printed out. 
printlabel = True
# customize this by changing the next 2 lines
grouptype = 'GRP_DM_WATER'  # VARIABLE_GROUP we are looking for
printtofile = True   # Indicates whether to print to a file True will write to a file and False will write to the screen

if printtofile:
    # create the new file name where we want to write the results and name it BIF file name + group type 
    filename = "./" + bif_file_prefix  + "-" + grouptype + ".csv"

    # now open the file
    fout = open(filename, 'w')
else:
    fout = sys.stdout
    
count = 0
#loop through all of the groups that were in the BIF
for gid, group in bif_groups.items():
    # if this is a group of the group type we are looking for
    if group.isgroup(grouptype) and group.getsiteid() in SitesofInterest:
            # print this group out in column format
            group.printlateral(printlabel, grouptypeslist[grouptype], printtofile, fout)
            # turn the column labels off after the first time
            printlabel = False
            count += 1
            
# now to close the file if we opened it            
if printtofile:
    fout.close()
    if (count >0 ):
        print( "Filename "+ filename + " written and closed")
    else:
        os.remove(filename)
        print( "File " + filename + " not written - no instances of " + grouptype + " in BIF for chosen sites")


<h2>For a Particular Group Type Print the List of Sites With That Group Type</h2>
This segment of code below will loop through all of the groups in the BIF looking for the 'grouptype' matching the specified VARIABLE_GROUP and will create a list of sites with that group type and the count of how many instances of the group type each site has. The output will be filtered by the SitesofInterest list defined above.

In [None]:
# customize the line below to define the VARIABLE_GROUP desired
grouptype = "GRP_LAI" # VARIABLE_GROUP to search for
# create an empty dictionary for holding the sites and the number of instances per site
sitelist = dict()
# loop through all of the groups that were in the BIF
for gid, group in bif_groups.items():
    # if this is the group type and passes the site filter
    if group.getgrouptype() == grouptype and group.getsiteid() in SitesofInterest:
        # remember the SITE_ID of this group and increment the count
        site = group.getsiteid()
        if site in sitelist:
            sitelist[site] += 1
        else:
            sitelist[site] = 1

# print below the sites reporting the group and the # of instances            
print( "List of sites reporting ", grouptype)
print("SITE_ID", "   ", "# of Instances of ", grouptype)
for site,instances in sorted(sitelist.items(), key=itemgetter(1), reverse = True):
    print( site, "\t", instances)
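
The count-then-sort pattern in the cell above matches what `Counter.most_common` in the standard library provides. A short sketch on made-up (site, group type) pairs follows. `sample_groups`, `wanted`, and `site_counts` are hypothetical names.

```python
from collections import Counter

# Hypothetical (SITE_ID, VARIABLE_GROUP) pairs, one per group.
sample_groups = [
    ("US-AAA", "GRP_LAI"),
    ("US-BBB", "GRP_LAI"),
    ("US-AAA", "GRP_LAI"),
    ("US-AAA", "GRP_IGBP"),
]

wanted = "GRP_LAI"
# count how many groups of the wanted type each site reports
site_counts = Counter(site for site, gtype in sample_groups if gtype == wanted)

# most_common() returns the sites ordered from most to fewest instances
for site, instances in site_counts.most_common():
    print(site, "\t", instances)
```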


<h2>Plot the Values of a Parameter for a Group Type and Site</h2>
This code segment plots out the values of a parameter in a group type for a specific site. It will only work for numeric parameters.

In [None]:
# customize the next four lines to define the site, group type, and parameter to plot
site = "US-Ha1"   # the site to plot values for
grouptype = "GRP_LAI"   # the VARIABLE_GROUP to plot values for
paramy = "LAI_O_DEC"  # the parameter to plot on the y axis
paramx = "LAI_DATE"   # the parameter to plot on the x axis

# First to get all the x and y values as lists for the plot
xvals = []
yvals = []
# loop through all of the groups that were in the BIF
for gid, group in bif_groups.items():
    # if this is the grouptype and the site we are looking for
    if group.getgrouptype() == grouptype and group.getsiteid() == site:
        # if it has the parameters we are looking for paramx, paramy
        if( group.hasparamtype(paramy) and group.hasparamtype(paramx)):
            xvals.append(int(group.parray[paramx]))
            yvals.append(float(group.parray[paramy]))

# now to plot the collected values
plt.plot(xvals, yvals, 'ro')
plt.show()
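
The cell above converts the date parameter to an integer, which places the points on a numeric axis rather than a calendar axis. If the dates are stored as digit strings such as YYYYMMDD, they could be converted to `datetime` objects before plotting. The sketch below rests on that assumption about the date layout. It only handles the four lengths listed in `_FORMATS`, and `parse_badm_date` is a hypothetical helper, not part of this notebook.

```python
import datetime

# Assumed layouts, keyed by the number of digits in the value.
_FORMATS = {
    4: "%Y",
    6: "%Y%m",
    8: "%Y%m%d",
    12: "%Y%m%d%H%M",
}


def parse_badm_date(value):
    """Convert a digit-only date such as 20050615 into a datetime object."""
    text = str(value).strip()
    fmt = _FORMATS.get(len(text))
    if fmt is None:
        raise ValueError("unsupported date layout: " + repr(value))
    return datetime.datetime.strptime(text, fmt)


# made-up values of the kind that might appear in a date parameter
sample_dates = [20050615, "200507", "2006"]
parsed = [parse_badm_date(d) for d in sample_dates]
for original, moment in zip(sample_dates, parsed):
    print(original, "->", moment.isoformat())
```

A list built this way could be passed to `plt.plot` in place of `xvals`, since matplotlib accepts `datetime` values on the x axis.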