# Function sandbox

This notebook contains tests for code and functions that have been moved to a separate module.

### Batch upsert
<div class="alert alert-danger">
Use functions from module instead of this version
</div>
Define a function to batch process insert or update queries:

A quick test with random data; use `execute=False` to print the query instead of running it:
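A minimal sketch of such a batch function, assuming psycopg2-style `%s` placeholders and a PostgreSQL `INSERT ... ON CONFLICT` upsert. The name `batch_upsert` and its parameters are hypothetical, not the module's API. Table and column names are interpolated directly into the SQL, so they must come from trusted code and never from spreadsheet content; only the values are passed as parameters.

```python
from typing import Any, Optional, Sequence


def batch_upsert(table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]],
                 conflict_cols: Sequence[str], cur: Optional[Any] = None,
                 execute: bool = True) -> str:
    """Build one multi-row INSERT ... ON CONFLICT query; run it or print it.

    Identifiers (table, columns) are inserted as-is and must be trusted.
    Values are sent separately as %s parameters (psycopg2 style).
    """
    if not rows:
        raise ValueError("no rows to upsert")
    ncols = len(columns)
    if any(len(row) != ncols for row in rows):
        raise ValueError("every row must have one value per column")
    row_tpl = "(" + ",".join(["%s"] * ncols) + ")"
    values_sql = ",".join([row_tpl] * len(rows))
    # columns outside the conflict target are overwritten with the new values
    update_cols = [c for c in columns if c not in conflict_cols]
    if update_cols:
        action = "DO UPDATE SET " + ", ".join(f"{c} = EXCLUDED.{c}" for c in update_cols)
    else:
        action = "DO NOTHING"
    query = (f"INSERT INTO {table} ({','.join(columns)}) VALUES {values_sql} "
             f"ON CONFLICT ({','.join(conflict_cols)}) {action};")
    params = [value for row in rows for value in row]  # flattened row values
    if execute:
        if cur is None:
            raise ValueError("a cursor is required when execute=True")
        cur.execute(query, params)
    else:
        print(query, params)
    return query
```

For example, `batch_upsert('form.field_visit', ['visit_id', 'visit_date'], rows, ['visit_id', 'visit_date'], execute=False)` prints the statement and its flattened parameters without touching the database. For very large batches, `psycopg2.extras.execute_values` is an alternative that splits the `VALUES` list into pages for you.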

### Functions to read records in a spreadsheet

<div class="alert alert-danger">
Use functions from module instead of this version
</div>

We need a wrapping function that applies a lower-level function (`create_record_function`) to all rows in a `worksheet` of the selected `workbook`, using a column dictionary `col_dictionary`. We add `**kwargs` to pass additional arguments on to the lower-level function.
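A minimal sketch of such a wrapper, assuming the workbook is read with openpyxl (the `item[k].value` access in the functions below matches openpyxl cell objects). The row loop is split into a hypothetical helper, `apply_to_rows`, so it can be used without a file. It also assumes that a lower-level function returns `None` for rows it skips and may return a list when one row yields several records, as `create_field_visit_record` does with one record per visit date; lists are flattened into the result.

```python
import os
from typing import Any, Callable, Iterable, List


def apply_to_rows(rows: Iterable[Any], col_dictionary: dict,
                  create_record_function: Callable, **kwargs) -> List[dict]:
    """Apply a row-level record function to every row and collect the records."""
    records = []
    for row in rows:
        result = create_record_function(row, col_dictionary, **kwargs)
        if result is None:            # header or empty row was skipped
            continue
        if isinstance(result, list):  # e.g. one record per visit date
            records.extend(result)
        else:
            records.append(result)
    return records


def import_records_from_workbook(inputdir, filename, worksheet, col_dictionary,
                                 create_record_function, **kwargs):
    """Open the workbook with openpyxl and build records from one worksheet."""
    from openpyxl import load_workbook  # only needed when reading real files
    # data_only=True returns the cached values of formula cells
    workbook = load_workbook(os.path.join(inputdir, filename), data_only=True)
    try:
        rows = workbook[worksheet].iter_rows()  # each row is a tuple of cells
        return apply_to_rows(rows, col_dictionary, create_record_function, **kwargs)
    finally:
        workbook.close()
```

Any extra keyword arguments, such as the `lookup` and vocabularies used by `create_quadrat_sample_record`, travel unchanged through `**kwargs` to the lower-level function.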


#### Insert into field_site table

This function creates an insert record from one row of the spreadsheet (`item`) using a column dictionary (`sw`).

We need to consider:
- geom might be single or multiple points
- projection (SRID) is UTM GDA zone 55 or 56, lat/long WGS84, or a different format
- elevation in m, or NULL 
- GPS uncertainty in meters, or NULL
- text description of GPS location, or NULL
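A minimal sketch of such a function, written only to illustrate the points above; it is not the module version. Assumptions: `sw['utm_zone']` holds 55 or 56 (possibly with a trailing letter), mapped to the GDA94 / MGA SRIDs 28355 and 28356 (the example record below uses 28356); points with no recognised zone but within ±180/±90 are taken as WGS84 longitude in `xs` and latitude in `ys` (SRID 4326); anything else leaves `geom` NULL; and `'na'`, `'NA'` or empty cells mean NULL. The helper names `make_geom`, `_fmt`, `SRID_BY_ZONE` and `MISSING` are hypothetical.

```python
import re
from typing import Any, Optional, Sequence

# Assumption: UTM zones map to GDA94 / MGA SRIDs; unmatched lat/long goes to WGS84.
SRID_BY_ZONE = {55: 28355, 56: 28356}
WGS84_SRID = 4326
MISSING = (None, "", "na", "NA")


def _fmt(value: Any) -> str:
    """Write a coordinate without a spurious '.0' (Excel often stores floats)."""
    number = float(value)
    return str(int(number)) if number.is_integer() else repr(number)


def make_geom(xs: Sequence[Any], ys: Sequence[Any], zone: Any) -> Optional[str]:
    """Return an ST_GeomFromText(...) expression for one or more points, or None."""
    points = [(x, y) for x, y in zip(xs, ys) if x not in MISSING and y not in MISSING]
    if not points:
        return None
    match = re.match(r"\s*(\d+)", str(zone)) if zone not in MISSING else None
    if match and int(match.group(1)) in SRID_BY_ZONE:
        srid = SRID_BY_ZONE[int(match.group(1))]
    elif all(abs(float(x)) <= 180 and abs(float(y)) <= 90 for x, y in points):
        srid = WGS84_SRID  # assumed longitude in xs, latitude in ys
    else:
        return None  # unknown projection: leave geom NULL
    coords = [f"{_fmt(x)} {_fmt(y)}" for x, y in points]
    if len(coords) == 1:
        wkt = f"POINT({coords[0]})"
    else:
        wkt = "MULTIPOINT(" + ",".join(f"({c})" for c in coords) + ")"
    return f"ST_GeomFromText('{wkt}', {srid})"


def create_field_site_record(item, sw):
    site_label = item[sw['site_label']].value
    if site_label is None or site_label == "Site":  # empty row or header row
        return None
    record = {'site_label': site_label}
    # optional columns: elevation, GPS uncertainty and descriptions may be NULL
    for column in ('location_description', 'elevation', 'gps_uncertainty_m',
                   'gps_geom_description'):
        if column in sw:
            val = item[sw[column]].value
            if val not in MISSING:
                record[column] = val
    zone = item[sw['utm_zone']].value if 'utm_zone' in sw else None
    geom = make_geom([item[k].value for k in sw['xs']],
                     [item[k].value for k in sw['ys']], zone)
    if geom is not None:
        record['geom'] = geom
    return record
```

The geometry is returned as an SQL expression string so that PostGIS builds the point when the insert runs; using tuples for `xs` and `ys` lets the same code produce a `POINT` or a `MULTIPOINT`.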

Test this function with one workbook:

```python
worksheet = 'Site'
filename = 'UNSWFireVegResponse_UplandBasalt_AlexThomsen+DK.xlsx'
col_definitions = {'site_label': 0, 'location_description': 10, 'utm_zone': 11,
                   'xs': (12,), 'ys': (13,),
                   'gps_uncertainty_m': 14,
                   'gps_geom_description': 17,
                   'elevation': 38, 'visit_date': (2, 4, 5)}
survey = "UplandBasalt"

records = import_records_from_workbook(inputdir, filename, worksheet, col_definitions, create_field_site_record)

len(records)
```

```
28
```

Check details from one record:

```python
records[12]
```

```
{'site_label': 'MWL03',
 'location_description': 'Wynnes Rock Lookout Road',
 'gps_geom_description': '30 m transect for woody plants >2m tall, with two 5x5m subplots at either end (subplots 1 & 2) with 20x5m subplot in middle (subplot 3); non-woody spp and woodplants <2m tall counted in the two 5x5m subplots',
 'geom': "ST_GeomFromText('POINT(256134 6288811)', 28356)"}
```

#### Insert into field_visits table

This function creates insert records (one per visit date) from one row of the spreadsheet (`item`) using a column dictionary (`sw`).

We need to consider:
- iterate over multiple visit dates in different columns
- add survey name to the record
- text description of visit, or NULL
- observerlist to be split into multiple names (list or array)

```python
from datetime import datetime


def create_field_visit_record(item, sw):
    """Return one field_visit record per valid visit date found in the row."""
    site_label = item[sw['site_label']].value
    records = list()
    # skip empty rows and the header row
    if site_label is None or site_label == "Site":
        return records
    for k in sw['visit_date']:
        visit_date = item[k].value
        # only cells holding an actual date produce a visit record
        if isinstance(visit_date, datetime):
            record = {'visit_id': site_label, 'visit_date': visit_date}
            if 'survey' in sw:
                record['survey_name'] = sw['survey']
            for column in ('visit_description', 'mainobserver', 'observerlist', 'replicate_nr'):
                if column in sw:
                    val = item[sw[column]].value
                    if val is not None and val not in ('na', 'NA'):
                        if column == 'observerlist':
                            # comma-separated names become a list
                            val = val.split(',')
                        record[column] = val
            records.append(record)
    return records
```

Test the function with one workbook/worksheet:

```python
worksheet = 'Site'
filename = 'UNSWFireVegResponse_UplandBasalt_AlexThomsen+DK.xlsx'
col_definitions = {'site_label': 0, 'location_description': 10, 'utm_zone': 11,
                   'xs': (12,), 'ys': (13,),
                   'gps_uncertainty_m': 14,
                   'gps_geom_description': 17,
                   'observerlist': 3,
                   'elevation': 38, 'visit_date': (2, 4, 5, 6, 7, 8, 9),
                   'survey': "UplandBasalt"}

records = import_records_from_workbook(inputdir, filename, worksheet, col_definitions, create_field_visit_record)

len(records)
```

```
42
```

```python
records[21]
```

```
{'visit_id': 'MWL04',
 'visit_date': datetime.datetime(2000, 12, 14, 0, 0),
 'survey_name': 'UplandBasalt',
 'observerlist': ['Alexandria Thomsen', ' Stephan Wilson']}
```
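The output shows that splitting on `','` keeps the space after each comma (`' Stephan Wilson'`). A small normalisation step, sketched here as a hypothetical helper `split_observers`, would strip the whitespace and drop empty entries before the list is stored:

```python
from typing import List


def split_observers(value: str) -> List[str]:
    """Split a comma-separated observer list into trimmed, non-empty names."""
    return [name.strip() for name in value.split(',') if name.strip()]
```

In `create_field_visit_record` it could replace `val.split(',')`; `split_observers('Alexandria Thomsen, Stephan Wilson')` gives `['Alexandria Thomsen', 'Stephan Wilson']`.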

### Import records to database
I created another function that calls the functions above to turn the data from a workbook into records, which are then imported into the database.

This function was renamed to `import_site_and_visit_records`.

This function passes the keyword arguments `**kwargs` on to the lower-level functions. This works because both `create_record_function`s have a similar structure, so the column correspondence for both can be defined in the same dictionary, as in the `col_definitions` used in the tests above.
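A minimal sketch of that chaining, with the reading and database steps passed in as functions so it can run on its own. `read_records` is expected to behave like `import_records_from_workbook`; `builders` (a table name mapped to its row-level function, e.g. `{'field_site': create_field_site_record, 'field_visit': create_field_visit_record}`) and `insert_records` are hypothetical parameters, not part of the module's API. Every builder receives the same column dictionary and the same `**kwargs`.

```python
from typing import Callable, Dict, List


def import_site_and_visit_records(inputdir: str, filename: str, worksheet: str,
                                  col_dictionary: dict,
                                  read_records: Callable[..., List[dict]],
                                  builders: Dict[str, Callable],
                                  insert_records: Callable[[str, List[dict]], int],
                                  **kwargs) -> Dict[str, int]:
    """Build records for each target table from one worksheet and insert them.

    read_records behaves like import_records_from_workbook; builders maps a
    table name to its row-level function; insert_records(table, records)
    writes the records and returns how many rows were inserted.
    """
    inserted = {}
    for table, create_record_function in builders.items():
        # same column dictionary and **kwargs for every builder
        records = read_records(inputdir, filename, worksheet, col_dictionary,
                               create_record_function, **kwargs)
        inserted[table] = insert_records(table, records)
    return inserted
```

Because builders are processed in dictionary order, listing the site builder before the visit builder means sites exist before visits that refer to them are inserted.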


#### Create field sample records

This is a lower level function that will create a field sample record from an `item` (a row in the spreadsheet), using the dictionary or "switch" in `sw`:

```python
from datetime import datetime


def create_field_sample_record(item, sw):
    visit_id = item[sw['visit_id']].value
    # skip empty rows and the 'Site Number' header row
    if visit_id is not None and visit_id not in ('Site Number',):
        # replicate number from a column, a fixed value, or unknown
        if 'replicate_nr' in sw:
            replicatenr = item[sw['replicate_nr']].value
        elif 'fixed_replicate_nr' in sw:
            replicatenr = sw['fixed_replicate_nr']
        else:
            replicatenr = None
        samplenr = item[sw['sample_nr']].value
        record = {'visit_id': visit_id, 'replicate_nr': replicatenr, 'sample_nr': samplenr}
        if 'date' in sw:
            visit_date = item[sw['date']].value
            if isinstance(visit_date, datetime):
                record['visit_date'] = visit_date.date()
        return record
```

#### Create quadrat sample records

This is a lower-level function that creates a quadrat sample record from an `item` (a row in the spreadsheet), using the dictionary or "switch" in `sw`. It uses a lookup table to fill in the visit date when the row does not provide one, and the vocabularies for seedbank and resprouting organ to normalise the raw values of these variables:

```python
from datetime import datetime


def create_quadrat_sample_record(item, sw, lookup, valid_seedbank, valid_organ):
    species = item[sw['species']].value
    spcode = item[sw['spcode']].value
    visit_id = item[sw['visit_id']].value
    if species is not None:
        record = {'visit_id': visit_id, 'sample_nr': item[sw['sample_nr']].value,
                  'species': species}
        # provenance and data-quality notes collected for the 'comments' field
        comms = list()
        if 'workbook' in sw:
            comms.append("Imported from workbook %s using python script" % sw['workbook'])
        if 'worksheet' in sw:
            comms.append("Imported from spreadsheet %s" % sw['worksheet'])

        visit_date = item[sw['date']].value if 'date' in sw else None

        # replicate number from a column, a fixed value, or unknown
        if 'replicate_nr' in sw:
            replicate_nr = item[sw['replicate_nr']].value
        elif 'fixed_replicate_nr' in sw:
            replicate_nr = sw['fixed_replicate_nr']
        else:
            replicate_nr = None

        if isinstance(visit_date, datetime):
            record['visit_date'] = visit_date.date()
        else:
            # no usable date: look up the visit by site and replicate number
            found = [n for n in lookup
                     if n['visit_id'] == visit_id and n.get('replicate_nr') == replicate_nr]
            if len(found) == 1 and 'visit_date' in found[0]:
                visit_date = found[0]['visit_date']
                if isinstance(visit_date, datetime):
                    record['visit_date'] = visit_date.date()
                    comms.append("visit date not provided, matched by replicate nr %s" % replicate_nr)
                else:
                    record['visit_date'] = visit_date
                    comms.append("matched by replicate nr %s, assuming date object" % replicate_nr)
            else:
                comms.append("neither visit date nor replicate nr was matched ( replicate nr %s ), no date" % replicate_nr)

        if (isinstance(spcode, str) and spcode.isnumeric()) or isinstance(spcode, int):
            record['species_code'] = spcode

        count_columns = ('adults_unburnt', 'resprouts_live', 'resprouts_died', 'resprouts_kill',
                         'resprouts_reproductive', 'recruits_live', 'recruits_reproductive',
                         'recruits_died')
        for k in ('species_notes', 'resprout_organ', 'seedbank') + count_columns + ('notes',):
            if k in sw:
                vals = item[sw[k]].value
                if vals is None or vals in ('na', 'NA'):
                    continue
                if k in ('resprout_organ', 'seedbank'):
                    # keep only values found in the controlled vocabulary
                    vocabulary = valid_organ if k == 'resprout_organ' else valid_seedbank
                    if vals in vocabulary:
                        record[k] = vals
                    elif isinstance(vals, str) and vals.capitalize() in vocabulary:
                        record[k] = vals.capitalize()
                    else:
                        comms.append("%s written as %s" % (k.replace('_', ' '), vals))
                elif k == 'notes':
                    if isinstance(vals, (int, float, complex)):
                        comms.append("Comment column included a numeric value of %s" % vals)
                    else:
                        comms.append(vals)
                elif k in count_columns:
                    # counts must be integers; anything else is kept as a comment
                    if isinstance(vals, int):
                        record[k] = vals
                    else:
                        comms.append("%s written as %s" % (k, vals))
                else:
                    record[k] = vals

        if len(comms) > 0:
            record["comments"] = comms

        return record
```
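The function above scans the whole `lookup` list for every spreadsheet row. If the list of visits grows large, one alternative (a swapped-in technique, not what the function does) is to index it once by `(visit_id, replicate_nr)`; a key with exactly one entry corresponds to the `len(found) == 1` case. The names `build_visit_index` and `find_visit_date` are hypothetical, and unlike the function above this sketch returns `None` both when nothing matches and when the single match has no `visit_date`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


def build_visit_index(lookup: List[dict]) -> Dict[Tuple[Any, Any], List[dict]]:
    """Group visit records by (visit_id, replicate_nr) in a single pass."""
    index = defaultdict(list)
    for visit in lookup:
        index[(visit['visit_id'], visit.get('replicate_nr'))].append(visit)
    return dict(index)


def find_visit_date(index: Dict[Tuple[Any, Any], List[dict]],
                    visit_id: Any, replicate_nr: Any) -> Optional[Any]:
    """Return the visit date only when exactly one visit matches, else None."""
    found = index.get((visit_id, replicate_nr), [])
    if len(found) == 1:
        return found[0].get('visit_date')
    return None
```

The index is built once per worksheet, so each row costs a dictionary lookup instead of a pass over every visit.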

#### Validate site/quadrat records and update the database

This function filters a list of `records` to keep only unique records and then validates them against the information in table `field_visit` (visit_id, visit_date and replicate_nr). Any valid but missing visits are inserted into table `field_visit`, and the samples are inserted into table `field_sample`. It returns the visits recorded for these sites after the update.


```python
import psycopg2
from psycopg2.extras import DictCursor


def validate_and_update_site_records(records, useconn=None):
    # `params` (connection settings) is assumed to be defined earlier in the notebook
    if useconn is None:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params)
    else:
        # reuse the caller's connection; it is not closed here
        conn = useconn

    cur = conn.cursor(cursor_factory=DictCursor)
    # drop duplicate records and collect the sites they refer to
    unique_records = list()
    sites = list()
    for record in records:
        if record not in unique_records:
            unique_records.append(record)
            if record['visit_id'] not in sites:
                sites.append(record['visit_id'])
    # alternative
    # from psycopg2 import sql
    # qry = sql.SQL('SELECT DISTINCT visit_id,visit_date,replicate_nr FROM form.field_visit WHERE visit_id IN ({}) ORDER by visit_id, visit_date;').format(
    #     sql.SQL(',').join(map(sql.Literal, sites))
    # )
    # psycopg2 adapts a Python tuple to a parenthesised list for IN %s
    qryvisits = cur.mogrify('SELECT DISTINCT visit_id,visit_date,replicate_nr FROM form.field_visit WHERE visit_id IN %s ORDER by visit_id, visit_date;', (tuple(sites),))
    cur.execute(qryvisits)
    visits = cur.fetchall()
    updated_rows = 0
    for record in unique_records:
        if any(d['visit_id'] == record['visit_id'] for d in visits):
            if 'visit_date' in record:
                # match on site and date
                found = [n for n in visits
                         if n['visit_id'] == record['visit_id'] and n['visit_date'] == record['visit_date']]
                record['found'] = len(found)
            elif 'replicate_nr' in record:
                # no date in the record: match on site and replicate number
                found = [n for n in visits
                         if n['visit_id'] == record['visit_id'] and n['replicate_nr'] == record['replicate_nr']]
                record['found'] = len(found)
                if len(found) > 0:
                    record['visit_date'] = found[0]['visit_date']

            if 'visit_date' in record:
                # insert visit and sample; rows that already exist are skipped
                cur.execute('INSERT INTO form.field_visit(visit_id,visit_date) values %s ON CONFLICT DO NOTHING',
                            (tuple([record['visit_id'], record['visit_date']]),))
                if cur.rowcount > 0:
                    updated_rows = updated_rows + cur.rowcount
                cur.execute('INSERT INTO form.field_samples(visit_id,visit_date,sample_nr) values %s ON CONFLICT DO NOTHING',
                            (tuple([record['visit_id'], record['visit_date'], record['sample_nr']]),))
                if cur.rowcount > 0:
                    updated_rows = updated_rows + cur.rowcount
            else:
                print("record for %s is incomplete" % record['visit_id'])
        else:
            print("%s not found" % record['visit_id'])
            record['found'] = 0

    print("%s rows updated" % updated_rows)
    conn.commit()

    # re-read the visits to return the state after the update
    cur.execute(qryvisits)
    updated_visits = cur.fetchall()

    cur.close()

    if useconn is None and conn is not None:
        conn.close()
        print('Database connection closed.')
    return updated_visits
```
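The validation steps can be sketched without a database in plain Python: remove duplicate records while keeping their order, then look each one up in the visits returned by the query, first by `visit_date` and otherwise by `replicate_nr`. Here `visits` is a list of dicts standing in for the `DictCursor` rows, and the function name `prepare_sample_records` is hypothetical. Unlike the function above, it checks duplicates with a set of key tuples (assuming the record values are hashable) and works on copies, so the input records are not modified.

```python
from typing import List, Tuple


def prepare_sample_records(records: List[dict],
                           visits: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Deduplicate records and split them into (ready, skipped).

    'ready' records belong to a known site and have a visit_date (taken from
    the visits when matched by replicate_nr); each gets a 'found' count.
    """
    seen = set()
    unique_records = []
    for record in records:
        key = tuple(sorted(record.items()))  # assumes hashable values
        if key not in seen:
            seen.add(key)
            unique_records.append(dict(record))  # copy, keep input unchanged
    ready, skipped = [], []
    for record in unique_records:
        same_site = [v for v in visits if v['visit_id'] == record['visit_id']]
        if 'visit_date' in record:
            found = [v for v in same_site if v['visit_date'] == record['visit_date']]
        elif 'replicate_nr' in record:
            found = [v for v in same_site if v['replicate_nr'] == record['replicate_nr']]
            if found:
                record['visit_date'] = found[0]['visit_date']
        else:
            found = []
        record['found'] = len(found)
        # a known site plus a date is enough to insert visit and sample
        if same_site and 'visit_date' in record:
            ready.append(record)
        else:
            skipped.append(record)
    return ready, skipped
```

The `ready` list corresponds to the records the function above would insert; the `skipped` list holds the ones it reports as "not found" or "incomplete".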

