# Biomass Field Data Clean v3

#### env = zonal

This notebook reads in the field data csv (one site per row) of basal wedge counts within each site.
It cleans the file by dropping rows with a null easting value, filling in the uid column based on the index, and filling any missing datum value with WGS84 (data with missing datum values was collected in 2013).

Functions defined in this notebook:
-	init_clean: drops rows with a null easting, fills the uid column and fills missing datum values with WGS84.
-	filter_dataframe: filters the dataframe on datum and zone, returning four df.
-	convert_gdf: converts each dataframe to geographic GDA94 and returns a list of GeoDataFrames.
-	concat_and_clean: creates longitude and latitude columns holding the GDA94 coordinates and removes the other coordinate columns.
-	search_for_names: returns a list of column names that include the substring (var_).
-	add_columns: transfers iterrow data to the output df.
-	prop: calculates site proportions and returns a df with alive proportions ‘alv_prop’, dead proportions ‘ded_prop’ and total proportions ‘total_prop’.

Exports:
-	initial_biomass_cleaned_v4.csv
-	av_test_v4.csv
-	df_av_v4.csv
-	df2_v4.csv


In [1]:
import pandas as pd
import geopandas as gpd
import os

In [2]:
dir_ = r"F:\cdu\data"
output_dir = r"F:\cdu\data\output"

In [3]:
os.listdir(dir_)

['obs_sheets',
 'output',
 '2009_2013_sites.xlsx',
 'tree_biomass_field_data_20220917.csv',
 'zonal_stats',
 'tern_data']

In [4]:
# original field data test (13 sites)
#csv_ = os.path.join(dir_, "tree_biomass_field_data_copy.csv")
# merged csv above and obs sheet 2013
csv_ = os.path.join(dir_, "tree_biomass_field_data_20220917.csv")

In [5]:
#csv_ = "\\pgb-bas01\DENR_Satellite_Imagery$\Scratch\Rob\tern\tree_biomass_field_data\biomass_carbon\tree_biomass_field_data_copy.csv"

In [9]:
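#df = pd.read_excel(csv_, sheet_name="tree_biomass_field_data", encoding='windows-1252')
# the file is comma separated, so the default delimiter is used; a whitespace
# delimiter (r'\s+') would also split on the spaces inside species names and notes
df = pd.read_csv(csv_, encoding='windows-1252')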

ParserError: Error tokenizing data. C error: Expected 244 fields in line 4, saw 316


In [7]:
df.shape

(46, 1)

In [8]:
print(len(df.site.unique()))

AttributeError: 'DataFrame' object has no attribute 'site'

In [12]:
def init_clean(df):
    
    """ This function cleans the file by dropping rows with a null easting value, 
    filling in the uid column based on the index and filling any missing datum value with 
    WGS84 (data with missing values was collected in 2013).
    
    Note: df is modified in place, so the frame passed in is changed as well. """
    
    # fill in uid column
    df['uid'] = df.index + 1
    # drop any row where an easting is missing (i.e. no coordinates)
    df.dropna(axis=0, subset=['easting'], inplace=True)
    
    # fill in datum to WGS84 - due to data collection in 2012 2013
    df['datum'] = df['datum'].fillna('WGS84')
    
    return df
     

In [13]:
clean_df = init_clean(df)

KeyError: ['easting']
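
Because `init_clean` edits the frame it is given, the raw `df` and `clean_df` end up being the same object. The sketch below shows a non-mutating variant on a three-row toy frame; `init_clean_copy` and the toy values are hypothetical and are not part of this notebook.

```python
import pandas as pd

def init_clean_copy(df):
    """Non-mutating variant of init_clean (hypothetical helper, not in the notebook)."""
    out = df.copy()
    # uid is assigned before rows are dropped, as in init_clean
    out["uid"] = out.index + 1
    out = out.dropna(subset=["easting"])
    # assign() returns a new frame, so the input is never touched
    return out.assign(datum=out["datum"].fillna("WGS84"))

# Toy data: the second row has no easting, the last two have no datum.
toy = pd.DataFrame({
    "easting": [700000.0, None, 710000.0],
    "datum": ["GDA94", None, None],
})
cleaned = init_clean_copy(toy)
print(cleaned)
print(len(toy))  # the input still has all three rows
```

With this variant, later cells would need to be passed the returned frame rather than the raw one.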

In [11]:
clean_df.shape

(34, 244)

In [12]:
def filter_dataframe(df):
    
    """ This function filters the dataframe based on datum and zone information returning four df. """
    
    wgs = df[df['datum']== 'WGS84']
    gda = df[df['datum']== 'GDA94']

    wgs52 = wgs[wgs['zone'] == 52.0]
    wgs53 = wgs[wgs['zone'] == 53.0]

    gda52 = gda[gda['zone'] == 52.0]
    gda53 = gda[gda['zone'] == 53.0]
    
    return [gda52, gda53, wgs52, wgs53]

In [13]:
df_list = filter_dataframe(clean_df)

In [14]:
def convert_gdf(df_list, epsg_list):
    
    """ This function converts each dataframe to geographic GDA94 (EPSG:4283) and returns a list of GeoDataFrames. """
    
    gdf_list = []
    for df, epsg in zip(df_list, epsg_list):
        # build point geometry from the projected easting/northing columns
        gdf = gpd.GeoDataFrame(
            df, geometry=gpd.points_from_xy(df.easting, df.northing))

        # declare the CRS the coordinates were recorded in
        gdf = gdf.set_crs(epsg=epsg)

        # reproject to geographic GDA94
        gdf = gdf.to_crs(4283)

        print(gdf.crs)
        gdf_list.append(gdf)
          
    return gdf_list

In [15]:
gdf_list = convert_gdf(df_list, [28352, 28353, 32752, 32753])

epsg:4283
epsg:4283
epsg:4283
epsg:4283
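
The EPSG list passed to `convert_gdf` follows a fixed pattern: GDA94 / MGA zone NN is EPSG:283NN and WGS 84 / UTM zone NN South is EPSG:327NN. A minimal sketch of deriving the codes from the datum and zone values is shown below; `utm_epsg` is a hypothetical helper, not part of this notebook, and it assumes every site lies in the southern hemisphere.

```python
def utm_epsg(datum, zone):
    """Return the EPSG code for a southern-hemisphere UTM/MGA zone.

    GDA94 / MGA zone NN is EPSG:283NN; WGS 84 / UTM zone NN South is EPSG:327NN.
    """
    prefixes = {"GDA94": 28300, "WGS84": 32700}
    if datum not in prefixes:
        raise ValueError(f"unsupported datum: {datum!r}")
    return prefixes[datum] + int(zone)

# Same order as the list returned by filter_dataframe: gda52, gda53, wgs52, wgs53
pairs = [("GDA94", 52.0), ("GDA94", 53.0), ("WGS84", 52.0), ("WGS84", 53.0)]
epsg_list = [utm_epsg(datum, zone) for datum, zone in pairs]
print(epsg_list)
```

This yields the same four codes that are hard-coded in the call above.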


In [16]:
def concat_and_clean(gdf_list):
    """ This function concatenates the GeoDataFrames, creates longitude and latitude columns 
    holding the GDA94 coordinates and removes the other existing coordinate columns. """
    
    final = pd.concat(gdf_list)
    
    final['lon_gda94'] = final.geometry.x
    final['lat_gda94'] = final.geometry.y

    del final['zone']
    del final['easting']
    del final['northing']
    del final['lattitude']
    del final['longitude']
    
    final['datum'] = 'GDA94'
    
    return final
    

In [17]:
clean_gdf = concat_and_clean(gdf_list)
print(clean_gdf.shape)
print(len(clean_gdf.site.unique()))

(34, 242)
34


##### Export to csv

In [18]:
clean_df_ = pd.DataFrame(clean_gdf)
clean_df = clean_df_.reset_index(drop=True)
# fill in uid column
clean_df['uid'] = clean_df.index + 1
clean_df

Unnamed: 0,ï»¿notes,uid,site,date,am_date,datum,factor,count,cent_sp01,cent_l_sp01,...,nw_d_sp09,nw_sp10,nw_l_sp10,nw_d_sp10,nw_sp11,nw_l_sp11,nw_d_sp11,geometry,lon_gda94,lat_gda94
0,,1,girra02,5/6/2012,20120605.0,GDA94,0.1,7.0,Melaleuca nervoa,23.0,...,,,,,,,,POINT (131.13200 -12.52341),131.132,-12.523407
1,,2,jdr01,22/05/2012,20120522.0,GDA94,0.75,7.0,E. miniata,26.0,...,,,,,,,,POINT (131.59970 -13.88379),131.5997,-13.883786
2,,3,jdr02,23/05/2012,20120523.0,GDA94,0.75,7.0,E. tetrodonta,29.0,...,,,,,,,,POINT (131.60025 -13.85727),131.600246,-13.857266
3,,4,jdr03,23/05/2012,20120523.0,GDA94,0.25,7.0,Syzygium sp.,37.0,...,,,,,,,,POINT (131.56791 -13.84688),131.567909,-13.846884
4,Juvinale eucalytpus,5,jdr04,23/05/2012,20120523.0,GDA94,0.25,7.0,Species 1,15.0,...,,,,,,,,POINT (131.59773 -13.97467),131.597729,-13.974669
5,grassland,6,jdr05,24/05/2012,20120524.0,GDA94,0.0,7.0,,,...,,,,,,,,POINT (147.66450 -82.04603),147.664501,-82.046029
6,,7,auver03,10/10/2012,20121010.0,GDA94,0.1,7.0,Bauhinia cunninghamii,8.0,...,,,,,,,,POINT (129.97898 -15.71043),129.978985,-15.710431
7,basal factor not recorded,8,auver04,10/10/2012,20121010.0,GDA94,0.1,7.0,Bauhinia cunninghamii,4.0,...,,,,,,,,POINT (129.81524 -15.82118),129.815235,-15.821182
8,,9,auver05,11/10/2012,20121011.0,GDA94,0.1,7.0,,,...,,,,,,,,POINT (129.82772 -15.86185),129.827717,-15.861849
9,no trees,10,auver06,11/10/2012,20121011.0,GDA94,0.1,7.0,,,...,,,,,,,,POINT (129.68501 -16.00775),129.68501,-16.007755
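
The `ï»¿notes` header in the table above is what a UTF-8 byte-order mark looks like when it is decoded as windows-1252. The sketch below reproduces the effect with the standard library only. The two-line sample is made up for illustration, and whether the real file reads cleanly as `utf-8-sig` is an assumption that would need checking against the rest of its contents.

```python
import csv
import io

# Hypothetical sample: a UTF-8 file that starts with a byte-order mark (BOM).
raw = "notes,site\nno trees,auver06\n".encode("utf-8-sig")

def read_header(raw_bytes, encoding):
    """Decode the bytes with the given encoding and return the CSV header row."""
    text = raw_bytes.decode(encoding)
    return next(csv.reader(io.StringIO(text)))

header_cp1252 = read_header(raw, "windows-1252")  # BOM bytes survive as three characters
header_utf8 = read_header(raw, "utf-8-sig")       # BOM is stripped during decoding

print(header_cp1252)
print(header_utf8)
```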


In [22]:
clean_df = pd.DataFrame(clean_gdf)
clean_df.to_csv(os.path.join(output_dir, "initial_biomass_cleaned_v4.csv"), index=False)

In [23]:
def search_for_names(gdf, var_):
    """ This function returns a list of column names that includes the substring (var_) """
    var_cols = [col for col in gdf.columns if var_ in col]

    return var_cols

In [24]:
var_cols = search_for_names(clean_gdf, "sp")

In [25]:
# define a list of column header basal positions
pos_list = ["cent", "north", "south", "ne", "se", "nw", "sw"]
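
The loop further down picks columns with substring tests such as `pos in col`, which match anywhere in a column name. As a stricter alternative, the sketch below classifies a column by its prefix. `classify_column` is a hypothetical helper, and it assumes every position follows the `cent_sp01` / `cent_l_sp01` / `cent_d_sp01` naming visible in the column listing.

```python
pos_list = ["cent", "north", "south", "ne", "se", "nw", "sw"]

def classify_column(col, positions):
    """Return (position, kind) for a basal count column, or None if it is not one.

    kind is "species", "live" or "dead".
    """
    for pos in positions:
        if col.startswith(pos + "_l_"):
            return pos, "live"
        if col.startswith(pos + "_d_"):
            return pos, "dead"
        if col.startswith(pos + "_sp"):
            return pos, "species"
    return None

columns = ["site", "cent_sp01", "cent_l_sp01", "cent_d_sp01", "nw_d_sp09", "lon_gda94"]
for col in columns:
    print(col, classify_column(col, pos_list))
```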

In [33]:
def add_columns(df2, row):
    
    """ This function transfers iterrow data to output df. """
    
    df2['site'] = row['site']
    df2['uid'] =  row['uid']
    df2['date'] = str(row['am_date'])
    df2['factor'] = row['factor']
    df2['loc_count'] = row['count']
    
#     # replace 9999.0 from factor
#     fact_ = row['factor']
    
    
#     if fact_ == 9999.0:
#         print("fact_: ", fact_)
#         factor_ = 0.0
#         print("factor_: ", factor_)  
#     else:
#         factor_ == fact_
      
#     df2['factor'] = factor_
#     df2['factor'] = row['factor']
    df2['geometry'] = row['geometry']
    df2['lon_gda94'] = row['lon_gda94']
    df2['lat_gda94'] = row['lat_gda94']
    
    return df2

In [34]:
clean_gdf.columns
clean_gdf.shape

(34, 242)

In [35]:
import numpy as np
clean_gdf.fillna(9999, inplace=True)
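
After this fill, 9999 acts as the marker for a missing entry, and the loop below compares each cell against `"9999"`, `9999` and `9999.0` in turn. A minimal sketch of one helper that covers all three cases follows; `is_missing` and `MISSING` are hypothetical names, not part of this notebook.

```python
MISSING = 9999  # same value as the fillna call above

def is_missing(value):
    """True if value is the 9999 sentinel, stored as a number or as text."""
    # 9999 == 9999.0 is True in Python, so one numeric comparison covers int and float.
    return value == MISSING or value == str(MISSING)

values = ["E. miniata", 9999, 9999.0, "9999", 26.0]
valid = [v for v in values if not is_missing(v)]
print(valid)
```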

In [36]:
def prop(df):
    
    """ This function calculates site proportions (as percentages) and returns the df with 
    alive proportions ‘alv_prop’, dead proportions ‘ded_prop’ and total proportions ‘total_prop’. """
    
    print("--------------------------------------PROPORTIONS-----------------------------------")
    print(df)
    al_prop = []
    ded_prop = []
    total_prop = []
    tot_alive = df.alive.sum()
    print("tot_alive: ", tot_alive)

    tot_dead = df.dead.sum()
    print("tot_dead: ", tot_dead)
    tot_all = tot_alive + tot_dead
    print("tot_all: ", tot_all)
    for index, row in df.iterrows():
        print("-"*50)
        print("alive proportion: ", (row["alive"]/tot_alive)*100)
        al_prop.append((row["alive"]/tot_alive)*100)
        print("dead proportion: ", (row["dead"]/tot_dead)*100)
        ded_prop.append((row["dead"]/tot_dead)*100)
        print("total proportion: ", ((row["alive"] + row["dead"])/(tot_alive + tot_dead)*100))
        
        total_prop.append((row["alive"] + row["dead"])/(tot_alive + tot_dead)*100)
        
    df["alv_prop"] = al_prop
    df["ded_prop"] = ded_prop
    df["total_prop"] = total_prop
    
    return df
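
The arithmetic in `prop` can be checked by hand with plain lists. The sketch below uses the girra02 counts printed in the `df_list` output further down. `proportions` is a hypothetical helper, and returning 0.0 when a class total is zero is an assumption made here for illustration; `prop` itself divides without that guard.

```python
def proportions(alive, dead):
    """Return (alive %, dead %, total %) lists for per-species stem counts."""
    tot_alive = sum(alive)
    tot_dead = sum(dead)
    tot_all = tot_alive + tot_dead

    def pct(value, total):
        # Guard against dividing by zero when a site has no stems in a class.
        return value / total * 100 if total else 0.0

    alv_prop = [pct(a, tot_alive) for a in alive]
    ded_prop = [pct(d, tot_dead) for d in dead]
    total_prop = [pct(a + d, tot_all) for a, d in zip(alive, dead)]
    return alv_prop, ded_prop, total_prop

# girra02: Eucalyptus alba, Lophostemon lactifluus, Melaleuca nervoa, P. spiralis
alive = [12.0, 20.0, 93.0, 19.0]
dead = [0.0, 4.0, 19.0, 1.0]
alv_prop, ded_prop, total_prop = proportions(alive, dead)
print([round(p, 6) for p in alv_prop])
print([round(p, 6) for p in ded_prop])
print([round(p, 6) for p in total_prop])
```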
        

In [37]:
df_list = []
index_list = []

site_name_list = []
site_count_list = []
site_average_alive_list = []
site_average_dead_list = []
site_average_total_list = []

for index, row in clean_gdf.iterrows():
#     print(index)
#     print(row)
    row.dropna(inplace=True)
    site = row["site"]
    count = row["count"]
    uid = row["uid"]
    print("="*50)
    print(site)
    print("count: ", count)
    site_list = []
    index_list.append(site_list)
    species = []
    alive_count = []
    dead_count = []
    
    site_list = []
    count_list = []
    site_pos_al = []
    site_pos_ded = []
    site_pos_tot = []
    
    for pos in pos_list:
        pos_species = []
        pos_sp_live = []
        pos_sp_dead = []
        position_list = []
        
        pos_cols = [col for col in clean_df.columns if pos in col]
#         print("pos_cols: ", pos_cols)
        for cols in pos_cols:
#             print("="*50)
#             print(cols)
            if "{0}_sp".format(pos) in cols:
#                 print("species")

                if(row[cols] == "9999"):
                    pass
                    #print('true')
                elif(row[cols] == 9999):
                    pass
                    #print('true')
                elif(row[cols] == 9999.0):
                    pass
                    #print('true')
                else:
                    print("Valid species: ", row[cols])
                    species.append(row[cols])
                    pos_species.append(row[cols])
                    
                    
                                        
            elif "{0}_l".format(pos) in cols:
                #print("live")
                live = row[cols]
                
                if(row[cols] == "9999"):
                    pass
                    #print('true')
                elif(row[cols] == 9999):
                    pass
                    #print('true')
                elif(row[cols] == 9999.0):
                    pass
                    #print('true')
                else:
                    print("Valid alive: ", row[cols])
                    alive_count.append(row[cols])
                    pos_sp_live.append(row[cols])
                    
                
                    
            elif "{0}_d".format(pos) in cols:
                #print("dead")
                dead = row[cols]
                
                
                if(row[cols] == "9999"):
                    pass
                    #print('true')
                elif(row[cols] == 9999):
                    pass
                    #print('true')
                elif(row[cols] == 9999.0):
                    pass
                    #print('true')
                else:
                    print("Valid dead: ", row[cols])
                    dead_count.append(row[cols])
                    pos_sp_dead.append(row[cols])
                    
            else:
                print("ERROR"*100)

            
        sum_al = sum(pos_sp_live)
        sum_dead = sum(pos_sp_dead)
        sum_all = sum_al + sum_dead
        #print("sum_all: ", sum_all)
        #print(site)
        #print(count)
        site_list.append(site)
        count_list.append(count)
        site_pos_al.append(sum_al)
        site_pos_ded.append(sum_dead)
        site_pos_tot.append(sum_all)

        print("position totals: ", sum_al, sum_dead, sum_all)
        print("-"*50)
    print("Total basal count locations within site: ", count)
    print("List of ALIVE position totals stems: ", site_pos_al)
    print("Totals at: ", sum(site_pos_al))
    print("Equals: ", sum(site_pos_al)/count)


    print("List of DEAD position totals stems: ", site_pos_ded)
    print("Totals at: ", sum(site_pos_ded))
    print("Equals: ", sum(site_pos_ded)/count)
    

    print("List of TOTAL position totals stems: ", site_pos_tot)
    print("Totals at: ", sum(site_pos_tot))
    print("Equals: ", sum(site_pos_tot)/count)
    
    site_name_list.append(site)
    site_count_list.append(count)
    site_average_alive_list.append(sum(site_pos_al)/count)
    site_average_dead_list.append(sum(site_pos_ded)/count)
    site_average_total_list.append(sum(site_pos_tot)/count)

    
    print('='*20)
    
    df = pd.DataFrame(list(zip(species, alive_count, dead_count)),
           columns =['species', 'alive', 'dead'])
    


    print("Groupby Species: ")
    # groupby species list
    df2 = df.groupby('species').sum()
    df3 = df2.reset_index()
    print("*"*50)
    print("groupby dataframe: ", df3)
    #print("df columns: ", df3.columns)
    
    # calculate strand species proportions
    df3 = prop(df3)
    print("-"*50, df3)
    # add columns and fill
    df3 = add_columns(df3, row)
    
    
    print("-"*50, df3)
    if df3.empty:
        print("+"*50)
        print("Data frame is Empty")
        
        empty_header_list = ["species", "alive", "dead", "alv_prop", "ded_prop", "total_prop"]
        #,  'site', 'uid', 'date', 'factor', 'loc_count', 'geometry', 'lon_gda94', 'lat_gda94']
        empty_data_list = ["None", 0.0, 0.0, 0.0, 0.0, 0.0] #, site, uid, date, factor, loc_count, geometry, lon_gda94, lat_gda_94]
        
        # convert the list into dataframe row
        data = pd.DataFrame(empty_data_list).T
 
        # add columns
        data.columns = empty_header_list

        df3 = data
        df3 = add_columns(df3, row)
        
        print("+"*50)
        print("newdf3: ", df3)
    else:
        print("*"*50)
        print("Data frame is NOT Empty")
        df3 = add_columns(df3, row)
        df3_columns = df3.columns.tolist()
        print(df3_columns)
        print("*"*50) 
    
    df_list.append(df3)
   

# site_name_list = []
# site_count_list = []
# site_average_alive_list = []
# site_average_dead_list = []
# site_average_total_list = []

print("+"*50)
print(site_name_list)
print(site_average_alive_list)

site_average_alive_list
site_average_dead_list
site_average_total_list
print("-"*50)

# create lists of useful data
factor_list = clean_df['factor'].tolist()
print("Factor list: ", factor_list)

factor = [0.0 if item == 9999.0 else item for item in factor_list]
print("Factor: ", factor)


## Convert averages by multiplying the site average strand counts by the basal wedge factor
cor_site_average_alive_list = [i*fac for i, fac in zip(site_average_alive_list, factor)]
cor_site_average_dead_list = [i*fac for i, fac in zip(site_average_dead_list, factor)]
cor_site_average_total_list = [i*fac for i, fac in zip(site_average_total_list, factor)]


df_av = pd.DataFrame(list(zip(site_name_list, factor, site_count_list, site_average_alive_list, site_average_dead_list, site_average_total_list,
                             cor_site_average_alive_list, cor_site_average_dead_list, cor_site_average_total_list)),
       columns =['site', 'factor', 'count', 'avg_alive', 'avg_dead', 'avg_total', 'cor_av_liv', 'cor_av_ded', 'cor_av_tot'])
print("DF_AV", df_av)

girra02
count:  7.0
Valid species:  Melaleuca nervoa
Valid alive:  23.0
Valid dead:  0.0
Valid species:  Lophostemon lactifluus
Valid alive:  1.0
Valid dead:  0.0
Valid species:  P. spiralis
Valid alive:  2.0
Valid dead:  0.0
Valid species:  Eucalyptus alba
Valid alive:  1.0
Valid dead:  0.0
position totals:  27.0 0.0 27.0
--------------------------------------------------
Valid species:  Melaleuca nervoa
Valid alive:  20.0
Valid dead:  5.0
Valid species:  P. spiralis
Valid alive:  1.0
Valid dead:  1.0
Valid species:  Eucalyptus alba
Valid alive:  2.0
Valid dead:  0.0
position totals:  23.0 6.0 29.0
--------------------------------------------------
Valid species:  Melaleuca nervoa
Valid alive:  3.0
Valid dead:  8.0
Valid species:  Lophostemon lactifluus
Valid alive:  4.0
Valid dead:  2.0
Valid species:  P. spiralis
Valid alive:  2.0
Valid dead:  0.0
Valid species:  Eucalyptus alba
Valid alive:  4.0
Valid dead:  0.0
position totals:  13.0 10.0 23.0
-------------------------------------




                   species  alive  dead
0                Buchorvet   11.0   0.0
1           C. armstrongii    3.0   0.0
2               C. fraseri    2.0   0.0
3             Corymbia sp.    1.0   0.0
4         E. chlorostachys   38.0   2.0
5               E. miniata   18.0   0.0
6            E. tetrodonta   98.0   4.0
7              F. achleata    3.0   0.0
8               L. humilis    5.0   0.0
9                P. careya    6.0   0.0
10            P. pubescens    1.0   0.0
11             P. spiralis   48.0   4.0
12  Syzygium suborbiculare    8.0   0.0
tot_alive:  242.0
tot_dead:  10.0
tot_all:  252.0
--------------------------------------------------
alive proportion:  4.545454545454546
dead proportion:  0.0
total proportion:  4.365079365079365
--------------------------------------------------
alive proportion:  1.2396694214876034
dead proportion:  0.0
total proportion:  1.1904761904761905
--------------------------------------------------
alive proportion:  0.8264462809917356
dead

--------------------------------------------------                      species  alive  dead   alv_prop   ded_prop  total_prop
0         Buchanania obovata    1.0   0.0   0.452489   0.000000    0.420168
1         Corymbia polycarpa   23.0   0.0  10.407240   0.000000    9.663866
2        Corymbia polysciada   32.0   0.0  14.479638   0.000000   13.445378
3           E. chlorostachys   67.0   0.0  30.316742   0.000000   28.151261
4                 E. miniata   17.0   2.0   7.692308  11.764706    7.983193
5              E. tetrodonta   19.0   0.0   8.597285   0.000000    7.983193
6     Grevillea pteridifolia    3.0   0.0   1.357466   0.000000    1.260504
7                 L. humilis   43.0   0.0  19.457014   0.000000   18.067227
8     Lophostemon lactifluus    9.0   0.0   4.072398   0.000000    3.781513
9                P. spiralis    3.0   0.0   1.357466   0.000000    1.260504
10  Terminalia ferdinandiana    2.0   0.0   0.904977   0.000000    0.840336
11                      dead    2.0  

8  Terminalia carpentariae   11.0   0.0   5.418719   0.000000    5.092593
--------------------------------------------------                    species  alive  dead   alv_prop   ded_prop  total_prop  \
0       Acacia aulacocarpa   17.0   5.0   8.374384  38.461538   10.185185   
1       Buchanania obovata   11.0   0.0   5.418719   0.000000    5.092593   
2               E. miniata   92.0   8.0  45.320197  61.538462   46.296296   
3            E. tetrodonta   26.0   0.0  12.807882   0.000000   12.037037   
4      Grevillea agrifolia    1.0   0.0   0.492611   0.000000    0.462963   
5                P. careya   17.0   0.0   8.374384   0.000000    7.870370   
6        Persoonia falcata   11.0   0.0   5.418719   0.000000    5.092593   
7   Petalostigma pubescens   17.0   0.0   8.374384   0.000000    7.870370   
8  Terminalia carpentariae   11.0   0.0   5.418719   0.000000    5.092593   

     site  uid        date  factor  loc_count  \
0  legu07   36  20121008.0    0.25        7.0   
1  leg

In [38]:
df_av.to_csv(os.path.join(output_dir, "av_test_v4.csv"), index=False)

# Original

In [39]:
# df_list = []
# index_list = []

# site_name_list = []
# site_count_list = []
# site_average_alive_list = []
# site_average_dead_list = []
# site_average_total_list = []

# for index, row in clean_gdf.iterrows():
# #     print(index)
# #     print(row)
#     row.dropna(inplace=True)
#     site = row["site"]
#     count = row["count"]
#     uid = row["uid"]
#     print("="*50)
#     print(site)
#     print("count: ", count)
#     site_list = []
#     index_list.append(site_list)
#     species = []
#     alive_count = []
#     dead_count = []
    
#     site_list = []
#     count_list = []
#     site_pos_al = []
#     site_pos_ded = []
#     site_pos_tot = []
    
#     for pos in pos_list:
#         pos_species = []
#         pos_sp_live = []
#         pos_sp_dead = []
#         position_list = []
        
#         pos_cols = [col for col in clean_df.columns if pos in col]
# #         print("pos_cols: ", pos_cols)
#         for cols in pos_cols:
# #             print("="*50)
# #             print(cols)
#             if "{0}_sp".format(pos) in cols:
# #                 print("species")

#                 if(row[cols] == "9999"):
#                     pass
#                     #print('true')
#                 elif(row[cols] == 9999):
#                     pass
#                     #print('true')
#                 elif(row[cols] == 9999.0):
#                     pass
#                     #print('true')
#                 else:
#                     print("Valid species: ", row[cols])
#                     species.append(row[cols])
#                     pos_species.append(row[cols])
                    
                    
                                        
#             elif "{0}_l".format(pos) in cols:
#                 #print("live")
#                 live = row[cols]
                
#                 if(row[cols] == "9999"):
#                     pass
#                     #print('true')
#                 elif(row[cols] == 9999):
#                     pass
#                     #print('true')
#                 elif(row[cols] == 9999.0):
#                     pass
#                     #print('true')
#                 else:
#                     print("Valid alive: ", row[cols])
#                     alive_count.append(row[cols])
#                     pos_sp_live.append(row[cols])
                    
                
                    
#             elif "{0}_d".format(pos) in cols:
#                 #print("dead")
#                 dead = row[cols]
                
                
#                 if(row[cols] == "9999"):
#                     pass
#                     #print('true')
#                 elif(row[cols] == 9999):
#                     pass
#                     #print('true')
#                 elif(row[cols] == 9999.0):
#                     pass
#                     #print('true')
#                 else:
#                     print("Valid dead: ", row[cols])
#                     dead_count.append(row[cols])
#                     pos_sp_dead.append(row[cols])
                    
#             else:
#                 print("ERROR"*100)

            
#         sum_al = sum(pos_sp_live)
#         sum_dead = sum(pos_sp_dead)
#         sum_all = sum_al + sum_dead
#         #print("sum_all: ", sum_all)
#         #print(site)
#         #print(count)
#         site_list.append(site)
#         count_list.append(count)
#         site_pos_al.append(sum_al)
#         site_pos_ded.append(sum_dead)
#         site_pos_tot.append(sum_all)

#         print("position totals: ", sum_al, sum_dead, sum_all)
#         print("-"*50)
#     print("Total basal count locations within site: ", count)
#     print("List of ALIVE position totals stems: ", site_pos_al)
#     print("Totals at: ", sum(site_pos_al))
#     print("Equals: ", sum(site_pos_al)/count)


#     print("List of DEAD position totals stems: ", site_pos_ded)
#     print("Totals at: ", sum(site_pos_ded))
#     print("Equals: ", sum(site_pos_ded)/count)
    

#     print("List of TOTAL position totals stems: ", site_pos_tot)
#     print("Totals at: ", sum(site_pos_tot))
#     print("Equals: ", sum(site_pos_tot)/count)
    
#     site_name_list.append(site)
#     site_count_list.append(count)
#     site_average_alive_list.append(sum(site_pos_al)/count)
#     site_average_dead_list.append(sum(site_pos_ded)/count)
#     site_average_total_list.append(sum(site_pos_tot)/count)

    
#     print('='*20)
    
#     df = pd.DataFrame(list(zip(species, alive_count, dead_count)),
#            columns =['species', 'alive', 'dead'])

#     print("Groupby Species: ")
#     # groupby species list
#     df2 = df.groupby('species').sum()
#     df3 = df2.reset_index()
#     print("*"*50)
#     print("groupby dataframe: ", df3)
#     #print("df columns: ", df3.columns)
    
#     # calculate strand species proportions
#     df3 = prop(df3)
#     print("-"*50, df3)
#     # add columns and fill
#     df3 = add_columns(df3, row)
    
    
#     print("-"*50, df3)
#     df_list.append(df3)
   

# # site_name_list = []
# # site_count_list = []
# # site_average_alive_list = []
# # site_average_dead_list = []
# # site_average_total_list = []

# print("+"*50)
# print(site_name_list)
# print(site_average_alive_list)

# site_average_alive_list
# site_average_dead_list
# site_average_total_list
# print("-"*50)

# # create lists of useful data
# factor_list = clean_df['factor'].tolist()
# print("Factor list: ", factor_list)

# factor = [0.0 if item == 9999.0 else item for item in factor_list]
# print("Factor: ", factor)


# ## Convert averages by multiplying the site average strand counts by the basal wedge factor
# cor_site_average_alive_list = [i*fac for i, fac in zip(site_average_alive_list, factor)]
# cor_site_average_dead_list = [i*fac for i, fac in zip(site_average_dead_list, factor)]
# cor_site_average_total_list = [i*fac for i, fac in zip(site_average_total_list, factor)]


# df_av = pd.DataFrame(list(zip(site_name_list, factor, site_count_list, site_average_alive_list, site_average_dead_list, site_average_total_list,
#                              cor_site_average_alive_list, cor_site_average_dead_list, cor_site_average_total_list)),
#        columns =['site', 'factor', 'count', 'avg_alive', 'avg_dead', 'avg_total', 'cor_av_liv', 'cor_av_ded', 'cor_av_tot'])
# print("DF_AV", df_av)

In [40]:
print(df_av.shape)
print(len(df_av.site.unique()))

(34, 9)
34


In [41]:
df_av

Unnamed: 0,site,factor,count,avg_alive,avg_dead,avg_total,cor_av_liv,cor_av_ded,cor_av_tot
0,girra02,0.1,7.0,20.714286,3.428571,24.142857,2.071429,0.342857,2.414286
1,jdr01,0.75,7.0,23.285714,0.142857,23.428571,17.464286,0.107143,17.571429
2,jdr02,0.75,7.0,28.857143,1.0,29.857143,21.642857,0.75,22.392857
3,jdr03,0.25,7.0,33.714286,0.857143,34.571429,8.428571,0.214286,8.642857
4,jdr04,0.25,7.0,11.285714,0.285714,11.571429,2.821429,0.071429,2.892857
5,jdr05,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0
6,auver03,0.1,7.0,10.428571,0.0,10.428571,1.042857,0.0,1.042857
7,auver04,0.1,7.0,5.142857,0.285714,5.428571,0.514286,0.028571,0.542857
8,auver05,0.1,7.0,0.571429,0.0,0.571429,0.057143,0.0,0.057143
9,auver06,0.1,7.0,0.0,0.0,0.0,0.0,0.0,0.0


In [42]:
df_av.to_csv(os.path.join(output_dir, "df_av_v4.csv"), index=False)

In [43]:
for i in df_list:
    print("-"*50)
    print(i)

--------------------------------------------------
                  species  alive  dead   alv_prop   ded_prop  total_prop  \
0         Eucalyptus alba   12.0   0.0   8.333333   0.000000    7.142857   
1  Lophostemon lactifluus   20.0   4.0  13.888889  16.666667   14.285714   
2        Melaleuca nervoa   93.0  19.0  64.583333  79.166667   66.666667   
3             P. spiralis   19.0   1.0  13.194444   4.166667   11.904762   

      site  uid        date  factor  loc_count  \
0  girra02    6  20120605.0     0.1        7.0   
1  girra02    6  20120605.0     0.1        7.0   
2  girra02    6  20120605.0     0.1        7.0   
3  girra02    6  20120605.0     0.1        7.0   

                                       geometry  lon_gda94  lat_gda94  
0  POINT (131.1320003975128 -12.52340665621054)    131.132 -12.523407  
1  POINT (131.1320003975128 -12.52340665621054)    131.132 -12.523407  
2  POINT (131.1320003975128 -12.52340665621054)    131.132 -12.523407  
3  POINT (131.1320003975128 -

In [44]:
df1 = pd.concat(df_list)
df2 = df1.reset_index(drop=True)

In [45]:
df2

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor,loc_count,geometry,lon_gda94,lat_gda94
0,Eucalyptus alba,12,0,8.33333,0,7.14286,girra02,6,20120605.0,0.10,7.0,POINT (131.1320003975128 -12.52340665621054),131.132000,-12.523407
1,Lophostemon lactifluus,20,4,13.8889,16.6667,14.2857,girra02,6,20120605.0,0.10,7.0,POINT (131.1320003975128 -12.52340665621054),131.132000,-12.523407
2,Melaleuca nervoa,93,19,64.5833,79.1667,66.6667,girra02,6,20120605.0,0.10,7.0,POINT (131.1320003975128 -12.52340665621054),131.132000,-12.523407
3,P. spiralis,19,1,13.1944,4.16667,11.9048,girra02,6,20120605.0,0.10,7.0,POINT (131.1320003975128 -12.52340665621054),131.132000,-12.523407
4,Corymbia sp.,21,0,12.8834,0,12.8049,jdr01,25,20120522.0,0.75,7.0,POINT (131.5996997759004 -13.88378624750307),131.599700,-13.883786
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,E. chlorostachys,16,5,8.60215,31.25,10.396,gulf11,15,20130716.0,0.25,7.0,POINT (134.2071533638711 -14.96792758927459),134.207153,-14.967928
196,E. miniata,36,3,19.3548,18.75,19.3069,gulf11,15,20130716.0,0.25,7.0,POINT (134.2071533638711 -14.96792758927459),134.207153,-14.967928
197,E. tetrodonta,116,6,62.3656,37.5,60.396,gulf11,15,20130716.0,0.25,7.0,POINT (134.2071533638711 -14.96792758927459),134.207153,-14.967928
198,P. pubescens,6,1,3.22581,6.25,3.46535,gulf11,15,20130716.0,0.25,7.0,POINT (134.2071533638711 -14.96792758927459),134.207153,-14.967928


In [46]:

df2['factor'] = df2['factor'].replace(9999.0,0.0)
print(df2)


                    species alive dead  alv_prop ded_prop total_prop     site  \
0           Eucalyptus alba    12    0   8.33333        0    7.14286  girra02   
1    Lophostemon lactifluus    20    4   13.8889  16.6667    14.2857  girra02   
2          Melaleuca nervoa    93   19   64.5833  79.1667    66.6667  girra02   
3               P. spiralis    19    1   13.1944  4.16667    11.9048  girra02   
4              Corymbia sp.    21    0   12.8834        0    12.8049    jdr01   
..                      ...   ...  ...       ...      ...        ...      ...   
195        E. chlorostachys    16    5   8.60215    31.25     10.396   gulf11   
196              E. miniata    36    3   19.3548    18.75    19.3069   gulf11   
197           E. tetrodonta   116    6   62.3656     37.5     60.396   gulf11   
198            P. pubescens     6    1   3.22581     6.25    3.46535   gulf11   
199          Planchonia sp.     1    1  0.537634     6.25   0.990099   gulf11   

     uid        date  facto

In [47]:
df_list[0]
print(len(df_list))

34


In [48]:
df2.head(10)

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor,loc_count,geometry,lon_gda94,lat_gda94
0,Eucalyptus alba,12,0,8.33333,0.0,7.14286,girra02,6,20120605.0,0.1,7.0,POINT (131.1320003975128 -12.52340665621054),131.132,-12.523407
1,Lophostemon lactifluus,20,4,13.8889,16.6667,14.2857,girra02,6,20120605.0,0.1,7.0,POINT (131.1320003975128 -12.52340665621054),131.132,-12.523407
2,Melaleuca nervoa,93,19,64.5833,79.1667,66.6667,girra02,6,20120605.0,0.1,7.0,POINT (131.1320003975128 -12.52340665621054),131.132,-12.523407
3,P. spiralis,19,1,13.1944,4.16667,11.9048,girra02,6,20120605.0,0.1,7.0,POINT (131.1320003975128 -12.52340665621054),131.132,-12.523407
4,Corymbia sp.,21,0,12.8834,0.0,12.8049,jdr01,25,20120522.0,0.75,7.0,POINT (131.5996997759004 -13.88378624750307),131.5997,-13.883786
5,E. miniata,142,1,87.1166,100.0,87.1951,jdr01,25,20120522.0,0.75,7.0,POINT (131.5996997759004 -13.88378624750307),131.5997,-13.883786
6,Corymbia sp.,38,5,19.1919,71.4286,20.9756,jdr02,26,20120523.0,0.75,7.0,POINT (131.600245706593 -13.85726623878167),131.600246,-13.857266
7,E. tetrodonta,160,2,80.8081,28.5714,79.0244,jdr02,26,20120523.0,0.75,7.0,POINT (131.600245706593 -13.85726623878167),131.600246,-13.857266
8,Syzygium sp.,236,6,100.0,100.0,100.0,jdr03,27,20120523.0,0.25,7.0,POINT (131.5679093748641 -13.84688438272733),131.567909,-13.846884
9,Species 1,73,2,92.4051,100.0,92.5926,jdr04,28,20120523.0,0.25,7.0,POINT (131.5977290340626 -13.97466897171563),131.597729,-13.974669


In [49]:
print(df2.shape)
print(len(df2.site.unique()))

(200, 14)
34


In [50]:
df_av

Unnamed: 0,site,factor,count,avg_alive,avg_dead,avg_total,cor_av_liv,cor_av_ded,cor_av_tot
0,girra02,0.1,7.0,20.714286,3.428571,24.142857,2.071429,0.342857,2.414286
1,jdr01,0.75,7.0,23.285714,0.142857,23.428571,17.464286,0.107143,17.571429
2,jdr02,0.75,7.0,28.857143,1.0,29.857143,21.642857,0.75,22.392857
3,jdr03,0.25,7.0,33.714286,0.857143,34.571429,8.428571,0.214286,8.642857
4,jdr04,0.25,7.0,11.285714,0.285714,11.571429,2.821429,0.071429,2.892857
5,jdr05,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0
6,auver03,0.1,7.0,10.428571,0.0,10.428571,1.042857,0.0,1.042857
7,auver04,0.1,7.0,5.142857,0.285714,5.428571,0.514286,0.028571,0.542857
8,auver05,0.1,7.0,0.571429,0.0,0.571429,0.057143,0.0,0.057143
9,auver06,0.1,7.0,0.0,0.0,0.0,0.0,0.0,0.0


In [51]:
print(df_av.site.unique())

['girra02' 'jdr01' 'jdr02' 'jdr03' 'jdr04' 'jdr05' 'auver03' 'auver04'
 'auver05' 'auver06' 'auver07' 'lit01' 'buff01' 'ep01' 'girra01' 'hshr01'
 'hsf01' 'hsf02' 'wedo01' 'wed03' 'umb07' 'legu01' 'legu02' 'legu03'
 'legu04' 'legu05' 'legu06' 'legu07' 'leg08' 'auver01' 'auver02' 'centa13'
 'centa14' 'gulf11']


In [52]:
print(df2.site.unique())

['girra02' 'jdr01' 'jdr02' 'jdr03' 'jdr04' 'jdr05' 'auver03' 'auver04'
 'auver05' 'auver06' 'auver07' 'lit01' 'buff01' 'ep01' 'girra01' 'hshr01'
 'hsf01' 'hsf02' 'wedo01' 'wed03' 'umb07' 'legu01' 'legu02' 'legu03'
 'legu04' 'legu05' 'legu06' 'legu07' 'leg08' 'auver01' 'auver02' 'centa13'
 'centa14' 'gulf11']
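
The two printed site lists above are compared by eye. A minimal sketch of an automated check follows; `df2_demo`, `df_av_demo` and `site_mismatch` are hypothetical names used only for illustration, with a few site names taken from the output above.

```python
import pandas as pd

# Hypothetical stand-ins for df2 (species rows) and df_av (one row per site)
df2_demo = pd.DataFrame({"site": ["girra02", "girra02", "jdr01", "jdr02"]})
df_av_demo = pd.DataFrame({"site": ["girra02", "jdr01", "jdr02"]})


def site_mismatch(left, right):
    """Return (sites only in left, sites only in right) as sorted lists."""
    left_sites, right_sites = set(left["site"]), set(right["site"])
    return sorted(left_sites - right_sites), sorted(right_sites - left_sites)


only_left, only_right = site_mismatch(df2_demo, df_av_demo)
# Both lists are empty when the two dataframes cover the same sites
print(only_left, only_right)
```

Running the same check on the real `df2` and `df_av` before the merge would flag any site that would be dropped by an inner join.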


In [53]:
df2.to_csv(os.path.join(output_dir, "df2_v4.csv"), index=False)

In [54]:
result = pd.merge(df2, df_av, on="site")

### Dataframe has site averages and corrected averages calculated

column headers:
 - avg: average stem counts per site (i.e. sum(cent, north etc.) / num(basal counts within site))
 - cor_av: corrected stem count average (i.e. avg * factor)
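
As a rough illustration of the two definitions above, the sketch below computes a corrected average (`avg * factor`) and merges it onto species rows. The frames `species_demo` and `site_av_demo` are hypothetical, filled with a few values from the outputs above. Dropping `factor` from the site table before merging is an assumption made here to avoid the `factor_x` / `factor_y` pair; it is not what the notebook does.

```python
import pandas as pd

# Hypothetical species-level rows (several per site)
species_demo = pd.DataFrame({
    "species": ["Eucalyptus alba", "P. spiralis", "Corymbia sp."],
    "site": ["girra02", "girra02", "jdr01"],
    "factor": [0.1, 0.1, 0.75],
})

# Hypothetical site-level averages (one row per site)
site_av_demo = pd.DataFrame({
    "site": ["girra02", "jdr01"],
    "factor": [0.1, 0.75],
    "avg_total": [24.142857, 23.428571],
})

# cor_av = avg * factor
site_av_demo["cor_av_tot"] = site_av_demo["avg_total"] * site_av_demo["factor"]

# validate="many_to_one" raises if a site appears more than once in the site table
merged = pd.merge(
    species_demo,
    site_av_demo.drop(columns="factor"),
    on="site",
    how="left",
    validate="many_to_one",
)
print(merged)
```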
    

In [55]:
result

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor_x,...,lon_gda94,lat_gda94,factor_y,count,avg_alive,avg_dead,avg_total,cor_av_liv,cor_av_ded,cor_av_tot
0,Eucalyptus alba,12,0,8.33333,0,7.14286,girra02,6,20120605.0,0.10,...,131.132000,-12.523407,0.10,7.0,20.714286,3.428571,24.142857,2.071429,0.342857,2.414286
1,Lophostemon lactifluus,20,4,13.8889,16.6667,14.2857,girra02,6,20120605.0,0.10,...,131.132000,-12.523407,0.10,7.0,20.714286,3.428571,24.142857,2.071429,0.342857,2.414286
2,Melaleuca nervoa,93,19,64.5833,79.1667,66.6667,girra02,6,20120605.0,0.10,...,131.132000,-12.523407,0.10,7.0,20.714286,3.428571,24.142857,2.071429,0.342857,2.414286
3,P. spiralis,19,1,13.1944,4.16667,11.9048,girra02,6,20120605.0,0.10,...,131.132000,-12.523407,0.10,7.0,20.714286,3.428571,24.142857,2.071429,0.342857,2.414286
4,Corymbia sp.,21,0,12.8834,0,12.8049,jdr01,25,20120522.0,0.75,...,131.599700,-13.883786,0.75,7.0,23.285714,0.142857,23.428571,17.464286,0.107143,17.571429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,E. chlorostachys,16,5,8.60215,31.25,10.396,gulf11,15,20130716.0,0.25,...,134.207153,-14.967928,0.25,7.0,26.571429,2.285714,28.857143,6.642857,0.571429,7.214286
196,E. miniata,36,3,19.3548,18.75,19.3069,gulf11,15,20130716.0,0.25,...,134.207153,-14.967928,0.25,7.0,26.571429,2.285714,28.857143,6.642857,0.571429,7.214286
197,E. tetrodonta,116,6,62.3656,37.5,60.396,gulf11,15,20130716.0,0.25,...,134.207153,-14.967928,0.25,7.0,26.571429,2.285714,28.857143,6.642857,0.571429,7.214286
198,P. pubescens,6,1,3.22581,6.25,3.46535,gulf11,15,20130716.0,0.25,...,134.207153,-14.967928,0.25,7.0,26.571429,2.285714,28.857143,6.642857,0.571429,7.214286


In [56]:
print(len(result.site.unique()))

34


In [57]:
#df2_ = basal_area_m2(result)
df2_ = result

In [58]:
df2_.sample(3)

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor_x,...,lon_gda94,lat_gda94,factor_y,count,avg_alive,avg_dead,avg_total,cor_av_liv,cor_av_ded,cor_av_tot
134,E. chlorostachys,27,0,15.9763,0,15.6069,legu04,33,20121007.0,0.1,...,129.257523,-15.3967,0.1,7.0,24.142857,0.571429,24.714286,2.414286,0.057143,2.471429
39,Antidesma sp.,7,0,4.21687,0,3.91061,buff01,3,20120713.0,0.25,...,130.894779,-11.793886,0.25,7.0,23.714286,1.857143,25.571429,5.928571,0.464286,6.392857
141,E. miniata,41,4,17.2996,25,17.7866,legu05,34,20121007.0,0.1,...,129.240838,-15.385716,0.1,7.0,33.857143,2.285714,36.142857,3.385714,0.228571,3.614286


In [59]:
# def proportions(df_):
#     """ Calculate the site proportions (%) """
#     df_list = []
#     for uid in df_.uid.unique():
#         print(uid)
#         df = df_[df_['uid']== uid]
#         al_prop_list = []
#         dead_prop_list = []
#         tot_prop_list = []
#         total_alive = df['alive'].sum()
#         print("total alive count: ", total_alive)
#         total_dead = df["dead"].sum()
#         for species in df["species"].unique():
#             alive = df.loc[df["species"]==species, "alive"].iloc[0]
#             dead = df.loc[df["species"]==species, "dead"].iloc[0]

#             alive_port = alive / total_alive *100
#             dead_port = dead / total_dead *100
#             tot_port = (alive + dead) /(total_alive + total_dead) *100

#             al_prop_list.append(alive_port)
#             dead_prop_list.append(dead_port)
#             tot_prop_list.append(tot_port)

#             print("total_alive: ", total_alive)

#         df["al_prop"] = al_prop_list
#         df["d_prop"] = dead_prop_list
#         df["tot_prop"] = tot_prop_list
        
#         df_list.append(df)
#     df1 = pd.concat(df_list)
    
#     return df1
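
The commented-out loop above divides each species count by its site total. A minimal sketch of the same calculation with `groupby(...).transform` is given below. The function name `site_proportions` and the `counts` frame are hypothetical, and returning NaN for sites with a zero total is an assumption, not necessarily what the original pipeline did.

```python
import pandas as pd

# Counts for site girra02 (uid 6), taken from the df2 output above
counts = pd.DataFrame({
    "uid": [6, 6, 6, 6],
    "species": ["Eucalyptus alba", "Lophostemon lactifluus",
                "Melaleuca nervoa", "P. spiralis"],
    "alive": [12, 20, 93, 19],
    "dead": [0, 4, 19, 1],
})


def site_proportions(df):
    """Return a copy of df with alv_prop, ded_prop and total_prop in percent."""
    out = df.copy()
    grouped = out.groupby("uid")
    alive_total = grouped["alive"].transform("sum")
    dead_total = grouped["dead"].transform("sum")
    both_total = alive_total + dead_total
    # .where() turns zero denominators into NaN so empty sites do not divide by zero
    out["alv_prop"] = out["alive"] / alive_total.where(alive_total != 0) * 100
    out["ded_prop"] = out["dead"] / dead_total.where(dead_total != 0) * 100
    out["total_prop"] = (out["alive"] + out["dead"]) / both_total.where(both_total != 0) * 100
    return out


props = site_proportions(counts)
print(props[["species", "alv_prop", "ded_prop", "total_prop"]])
```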

In [60]:
#df3 = proportions(df2_)

In [61]:
#df3

In [62]:
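def define_class(df):
    """Append a coefficient class (co_ef_nme) plus the per-component coefficients
    and r2 values for each row's species; unlisted species use the 'Tefe' class."""
    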
    class_ = []
    leaves = []
    leaves_oth = []
    twigs = []
    bark = []
    bark_oth = []
    wood = []
    wood_oth = []
    branches = []
    branches_oth = []
    stems = []
    stems_oth = []
    agb = []
    agb_oth = []
    root = []
    root_oth = []
    tot_bm = []
    
    
    for index, row in df.iterrows():
        
        species = row["species"]
        #print(species)
        
    
        # Accept the full name with or without the stray leading space
        if species in ("E. tetrodonta", "Eucalyptus tetrodonta", " Eucalyptus tetrodonta"):
            
            #print("Eute")
            class_.append("Eute")
            leaves.append(122)
            leaves_oth.append(0.84)
            twigs.append(127)
            bark.append(341)
            bark_oth.append(0.99)
            wood.append(2161)
            wood_oth.append(0.93)
            branches.append(799)
            branches_oth.append(0.85)
            stems.append(2502)
            stems_oth.append(0.95)
            agb.append(3403)
            agb_oth.append(0.97)
            root.append(542)
            root_oth.append(0.57)
            tot_bm.append(3945)
            
        elif species == "E. miniata" or species == "Eucalyptus miniata":
   
            #print("Eumi")
            class_.append("Eumi")
            leaves.append(50)
            leaves_oth.append(0.96)
            twigs.append(52)
            bark.append(218)
            bark_oth.append(0.92)
            wood.append(1829)
            wood_oth.append(0.95)
            branches.append(375)
            branches_oth.append(0.79)
            stems.append(2047)
            stems_oth.append(0.96)
            agb.append(2472)
            agb_oth.append(0.96)
            root.append(542)
            root_oth.append(0.57)
            tot_bm.append(3014)

        elif species == "E. porrecta" or species == "Eucalyptus porrecta" or species =="C. porrecta" or species == "Corymbia porrecta":
               
            #print("Eupo")    
            class_.append("Eupo")
            leaves.append(73)
            leaves_oth.append(0.85)
            twigs.append(76)
            bark.append(326)
            bark_oth.append(0.98)
            wood.append(1289)
            wood_oth.append(0.98)
            branches.append(619)
            branches_oth.append(0.96)
            stems.append(1616)
            stems_oth.append(0.90)
            agb.append(2308)
            agb_oth.append(0.98)
            root.append(542)
            root_oth.append(0.57)
            tot_bm.append(2850)
            
        elif species == "E. bleeseri" or species == "Eucalyptus bleeseri" or species =="C. bleeseri" or species == "Corymbia bleeseri":
            
            #print("Eubl")
            class_.append("Eubl")
            leaves.append(49)
            leaves_oth.append(0.80)
            twigs.append(51)
            bark.append(347)
            bark_oth.append(0.97)
            wood.append(2225)
            wood_oth.append(0.97)
            branches.append(1163)
            branches_oth.append(0.90)
            stems.append(2573)
            stems_oth.append(0.97)
            agb.append(3785)
            agb_oth.append(0.96)
            root.append(542)
            root_oth.append(0.57)
            tot_bm.append(4327)
            
            
        elif species == "E. chlorostachys" or species == "Erythrophleum chlorostachys":
            
            #print("Erch")
            class_.append("Erch")
            leaves.append(154)
            leaves_oth.append(0.75)
            twigs.append(160)
            bark.append(401)
            bark_oth.append(0.95)
            wood.append(1044)
            wood_oth.append(0.92)
            branches.append(814)
            branches_oth.append(0.62)
            stems.append(1445)
            stems_oth.append(0.93)
            agb.append(2413)
            agb_oth.append(0.82)
            root.append(542)
            root_oth.append(0.57)
            tot_bm.append(2955)
            
        else:
            
            #print("Other")
            class_.append("Tefe")
            leaves.append(93)
            leaves_oth.append(0.89)
            twigs.append(97)
            bark.append(379)
            bark_oth.append(0.92)
            wood.append(1233)
            wood_oth.append(0.92)
            branches.append(935)
            branches_oth.append(0.83)
            stems.append(1612)
            stems_oth.append(0.92)
            agb.append(2640)
            agb_oth.append(0.91)
            root.append(542)
            root_oth.append(0.57)
            tot_bm.append(3182)
            
    df["co_ef_nme"] = class_
    df["leaves"] = leaves
    df["leaves_r2"] = leaves_oth
    df["twigs"] = twigs
    df["bark"] = bark
    df["bark_r2"] = bark_oth
    df["wood"] = wood
    df["wood_r2"] = wood_oth
    df["branches"]= branches
    df["branches_r2"] = branches_oth
    df["stems"] = stems
    df["stems_r2"] = stems_oth
    df["agb"] = agb
    df["agb_r2"] = agb_oth
    df["root"] = root
    df["root_r2"] = root_oth
    df["tot_bm"] = tot_bm
    
    
    return df
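

# The if/elif chain in define_class can also be expressed as two lookup tables.
# This is only a sketch: `COEFFICIENTS`, `SPECIES_TO_CLASS` and `define_class_lookup`
# are hypothetical names, and only three classes and three components are copied
# from the function above to keep the example short.

```python
import pandas as pd

# Coefficients copied from define_class (subset of classes and components)
COEFFICIENTS = {
    "Eute": {"leaves": 122, "agb": 3403, "root": 542},
    "Eumi": {"leaves": 50, "agb": 2472, "root": 542},
    "Tefe": {"leaves": 93, "agb": 2640, "root": 542},
}

# Species name -> coefficient class
SPECIES_TO_CLASS = {
    "E. tetrodonta": "Eute",
    "Eucalyptus tetrodonta": "Eute",
    "E. miniata": "Eumi",
    "Eucalyptus miniata": "Eumi",
}


def define_class_lookup(df, default="Tefe"):
    """Return a copy of df with co_ef_nme and the coefficient columns joined on."""
    out = df.copy()
    # Strip whitespace so ' Eucalyptus tetrodonta' matches the clean name
    cleaned = out["species"].astype(str).str.strip()
    out["co_ef_nme"] = cleaned.map(SPECIES_TO_CLASS).fillna(default)
    coef = pd.DataFrame.from_dict(COEFFICIENTS, orient="index")
    return out.join(coef, on="co_ef_nme")


demo = pd.DataFrame({"species": ["E. tetrodonta", " Eucalyptus tetrodonta",
                                 "Syzygium sp.", "E. miniata"]})
res = define_class_lookup(demo)
print(res)
```

# Keeping the numbers in one table makes it easier to check them against their
# source and to add a class without touching the control flow.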
            
    

In [63]:
df3 = df2_

In [64]:
df4 = define_class(df3)
df4.to_csv(os.path.join(output_dir, "df4.csv"), index=False)
df4

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor_x,...,wood_r2,branches,branches_r2,stems,stems_r2,agb,agb_r2,root,root_r2,tot_bm
0,Eucalyptus alba,12,0,8.33333,0,7.14286,girra02,6,20120605.0,0.10,...,0.92,935,0.83,1612,0.92,2640,0.91,542,0.57,3182
1,Lophostemon lactifluus,20,4,13.8889,16.6667,14.2857,girra02,6,20120605.0,0.10,...,0.92,935,0.83,1612,0.92,2640,0.91,542,0.57,3182
2,Melaleuca nervoa,93,19,64.5833,79.1667,66.6667,girra02,6,20120605.0,0.10,...,0.92,935,0.83,1612,0.92,2640,0.91,542,0.57,3182
3,P. spiralis,19,1,13.1944,4.16667,11.9048,girra02,6,20120605.0,0.10,...,0.92,935,0.83,1612,0.92,2640,0.91,542,0.57,3182
4,Corymbia sp.,21,0,12.8834,0,12.8049,jdr01,25,20120522.0,0.75,...,0.92,935,0.83,1612,0.92,2640,0.91,542,0.57,3182
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,E. chlorostachys,16,5,8.60215,31.25,10.396,gulf11,15,20130716.0,0.25,...,0.92,814,0.62,1445,0.93,2413,0.82,542,0.57,2955
196,E. miniata,36,3,19.3548,18.75,19.3069,gulf11,15,20130716.0,0.25,...,0.95,375,0.79,2047,0.96,2472,0.96,542,0.57,3014
197,E. tetrodonta,116,6,62.3656,37.5,60.396,gulf11,15,20130716.0,0.25,...,0.93,799,0.85,2502,0.95,3403,0.97,542,0.57,3945
198,P. pubescens,6,1,3.22581,6.25,3.46535,gulf11,15,20130716.0,0.25,...,0.92,935,0.83,1612,0.92,2640,0.91,542,0.57,3182


In [65]:
for i in df4.columns:
    print(i)

species
alive
dead
alv_prop
ded_prop
total_prop
site
uid
date
factor_x
loc_count
geometry
lon_gda94
lat_gda94
factor_y
count
avg_alive
avg_dead
avg_total
cor_av_liv
cor_av_ded
cor_av_tot
co_ef_nme
leaves
leaves_r2
twigs
bark
bark_r2
wood
wood_r2
branches
branches_r2
stems
stems_r2
agb
agb_r2
root
root_r2
tot_bm


In [66]:
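def total_ba_m2(df):
    """Compute per-species basal area (alive, dead, total) and multiply each
    component coefficient by the total basal area."""
    
    # Total basal area
    tot_ba = []
    al_ba = []
    ded_ba = []
    
    # Carbon stock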
    c_leaves = []
    c_twigs = []
    c_bark = []
    c_wood = []
    c_branches = []
    c_stems = []
    c_agb = []
    c_roots = []
    

        
    for index, row in df.iterrows():
        #print(index)
        
        # Calculate total site basal area
        total_basal = (row["total_prop"] * row["cor_av_tot"])/100
        tot_ba.append(total_basal)
        #tot_ba.append((row["total_prop"] * row["cor_av_tot"])/100)
        
        alive_basal = (row["alv_prop"] * row["cor_av_liv"])/100
        al_ba.append(alive_basal)
        #al_ba.append((row["alv_prop"] * row["cor_av_liv"])/100)
        
        dead_basal = (row["ded_prop"] * row["cor_av_ded"])/100
        ded_ba.append(dead_basal)
        #ded_ba.append((row["ded_prop"] * row["cor_av_ded"])/100)
        # Calculate site carbon
        
        c_leaves.append(total_basal * row["leaves"])
        c_twigs.append(total_basal * row["twigs"])
        c_bark.append(total_basal * row["bark"])
        c_wood.append(total_basal * row["wood"])
        c_branches.append(total_basal * row["branches"])
        c_stems.append(total_basal * row["stems"])
        c_agb.append(total_basal * row["agb"])
        c_roots.append(total_basal * row["root"])
        
#         c_leaves.append(row["total_prop"] * row["leaves"])
#         c_twigs.append(row["total_prop"] * row["twigs"])
#         c_bark.append(row["total_prop"] * row["bark"])
#         c_wood.append(row["total_prop"] * row["wood"])
#         c_branches.append(row["total_prop"] * row["branches"])
#         c_stems.append(row["total_prop"] * row["stems"])
#         c_agb.append(row["total_prop"] * row["agb"])
#         c_roots.append(row["total_prop"] * row["root"])

    # Total basal area    
    df["total_alv_ba"] = al_ba
    df["total_ded_ba"] = ded_ba
    df["total_ba"] = tot_ba
    # Carbon stock
    df["c_leaves"] = c_leaves
    df["c_twigs"] = c_twigs
    df["c_bark"] = c_bark
    df["c_wood"] = c_wood
    df["c_branches"] = c_branches
    df["c_stems"] = c_stems
    df["c_agb"] = c_agb
    df["c_roots"] = c_roots
            
    return df
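

# The row-by-row loop in total_ba_m2 is plain column arithmetic, so it can be
# written without iterrows. A sketch under assumptions: `total_ba_vectorised` and
# `demo` are hypothetical, only two components are shown, and the input columns
# are assumed to be numeric (df.info() later reports several of them as object
# dtype, so pd.to_numeric may be needed on the real data first).

```python
import pandas as pd

# Two girra02 rows, with values taken from the outputs above
demo = pd.DataFrame({
    "total_prop": [7.14286, 66.6667],
    "cor_av_tot": [2.414286, 2.414286],
    "leaves": [93, 93],
    "agb": [2640, 2640],
})


def total_ba_vectorised(df, components=("leaves", "agb")):
    """Return a copy of df with total_ba and one c_<component> column per component."""
    out = df.copy()
    # total basal area = site proportion (%) * corrected site average / 100
    out["total_ba"] = out["total_prop"] * out["cor_av_tot"] / 100
    for name in components:
        out[f"c_{name}"] = out["total_ba"] * out[name]
    return out


r = total_ba_vectorised(demo)
print(r)
```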
        

In [67]:
df4.columns

Index(['species', 'alive', 'dead', 'alv_prop', 'ded_prop', 'total_prop',
       'site', 'uid', 'date', 'factor_x', 'loc_count', 'geometry', 'lon_gda94',
       'lat_gda94', 'factor_y', 'count', 'avg_alive', 'avg_dead', 'avg_total',
       'cor_av_liv', 'cor_av_ded', 'cor_av_tot', 'co_ef_nme', 'leaves',
       'leaves_r2', 'twigs', 'bark', 'bark_r2', 'wood', 'wood_r2', 'branches',
       'branches_r2', 'stems', 'stems_r2', 'agb', 'agb_r2', 'root', 'root_r2',
       'tot_bm'],
      dtype='object')

In [68]:
df5 = total_ba_m2(df4)

In [69]:

df5

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor_x,...,total_ded_ba,total_ba,c_leaves,c_twigs,c_bark,c_wood,c_branches,c_stems,c_agb,c_roots
0,Eucalyptus alba,12,0,8.33333,0,7.14286,girra02,6,20120605.0,0.10,...,0.000000,0.172449,16.037755,16.727551,65.358163,212.629592,161.239796,277.987755,455.265306,93.467347
1,Lophostemon lactifluus,20,4,13.8889,16.6667,14.2857,girra02,6,20120605.0,0.10,...,0.057143,0.344898,32.075510,33.455102,130.716327,425.259184,322.479592,555.975510,910.530612,186.934694
2,Melaleuca nervoa,93,19,64.5833,79.1667,66.6667,girra02,6,20120605.0,0.10,...,0.271429,1.609524,149.685714,156.123810,610.009524,1984.542857,1504.904762,2594.552381,4249.142857,872.361905
3,P. spiralis,19,1,13.1944,4.16667,11.9048,girra02,6,20120605.0,0.10,...,0.014286,0.287415,26.729592,27.879252,108.930272,354.382653,268.732993,463.312925,758.775510,155.778912
4,Corymbia sp.,21,0,12.8834,0,12.8049,jdr01,25,20120522.0,0.75,...,0.000000,2.250000,209.250000,218.250000,852.750000,2774.250000,2103.750000,3627.000000,5940.000000,1219.500000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,E. chlorostachys,16,5,8.60215,31.25,10.396,gulf11,15,20130716.0,0.25,...,0.178571,0.750000,115.500000,120.000000,300.750000,783.000000,610.500000,1083.750000,1809.750000,406.500000
196,E. miniata,36,3,19.3548,18.75,19.3069,gulf11,15,20130716.0,0.25,...,0.107143,1.392857,69.642857,72.428571,303.642857,2547.535714,522.321429,2851.178571,3443.142857,754.928571
197,E. tetrodonta,116,6,62.3656,37.5,60.396,gulf11,15,20130716.0,0.25,...,0.214286,4.357143,531.571429,553.357143,1485.785714,9415.785714,3481.357143,10901.571429,14827.357143,2361.571429
198,P. pubescens,6,1,3.22581,6.25,3.46535,gulf11,15,20130716.0,0.25,...,0.035714,0.250000,23.250000,24.250000,94.750000,308.250000,233.750000,403.000000,660.000000,135.500000


In [70]:
df5.columns

Index(['species', 'alive', 'dead', 'alv_prop', 'ded_prop', 'total_prop',
       'site', 'uid', 'date', 'factor_x', 'loc_count', 'geometry', 'lon_gda94',
       'lat_gda94', 'factor_y', 'count', 'avg_alive', 'avg_dead', 'avg_total',
       'cor_av_liv', 'cor_av_ded', 'cor_av_tot', 'co_ef_nme', 'leaves',
       'leaves_r2', 'twigs', 'bark', 'bark_r2', 'wood', 'wood_r2', 'branches',
       'branches_r2', 'stems', 'stems_r2', 'agb', 'agb_r2', 'root', 'root_r2',
       'tot_bm', 'total_alv_ba', 'total_ded_ba', 'total_ba', 'c_leaves',
       'c_twigs', 'c_bark', 'c_wood', 'c_branches', 'c_stems', 'c_agb',
       'c_roots'],
      dtype='object')

In [71]:
df5.to_csv(os.path.join(output_dir, "df5_v4.csv"), index=False)

In [72]:
df5.sample(2)

Unnamed: 0,species,alive,dead,alv_prop,ded_prop,total_prop,site,uid,date,factor_x,...,total_ded_ba,total_ba,c_leaves,c_twigs,c_bark,c_wood,c_branches,c_stems,c_agb,c_roots
73,Denhamia obscura,1,0,0.588235,0.0,0.520833,hshr01,7,20120606.0,1.0,...,0.0,0.142857,13.285714,13.857143,54.142857,176.142857,133.571429,230.285714,377.142857,77.428571
113,P. careya,6,3,2.98507,18.75,4.14747,wed03,11,20120710.0,0.25,...,0.107143,0.321429,29.892857,31.178571,121.821429,396.321429,300.535714,518.142857,848.571429,174.214286


In [79]:
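def total_kg_per_site(df):
    """Sum each component within a site (uid) and write the site total
    to every row belonging to that site."""
    df_list = []

    for uid in df.uid.unique():
        df1 = df[df["uid"]==uid]
        #print(df1)
        # -------------------------------- Biomass -----------------------------------
        # Note: leaves are scaled by 0.47; every other component is scaled by 0.49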
        df1["bio_l_kg1ha"] = (df1.loc[:, "c_leaves"].sum())*0.47
        
        df1["bio_t_kg1ha"] = (df1.loc[:, "c_twigs"].sum())*0.49
        
        df1["bio_b_kg1ha"] = (df1.loc[:, "c_bark"].sum())*0.49
        
        df1["bio_w_kg1ha"] = (df1.loc[:, "c_wood"].sum())*0.49
        
        df1["bio_br_kg1ha"] = (df1.loc[:, "c_branches"].sum())*0.49
        
        df1["bio_s_kg1ha"] = (df1.loc[:, "c_stems"].sum())*0.49
        
        df1["bio_r_kg1ha"] = (df1.loc[:, "c_roots"].sum())*0.49
        
        df1["bio_agb_kg1ha"] = (df1.loc[:, "c_agb"].sum())*0.49
        
        # ------------------------------ Carbon ------------------------------------
        
        df1["c_l_kg1ha"] = (df1.loc[:, "c_leaves"].sum())
        
        df1["c_t_kg1ha"] = (df1.loc[:, "c_twigs"].sum())
        
        df1["c_b_kg1ha"] = (df1.loc[:, "c_bark"].sum())
        
        df1["c_w_kg1ha"] = (df1.loc[:, "c_wood"].sum())
        
        df1["c_br_kg1ha"] = (df1.loc[:, "c_branches"].sum())
        
        df1["c_s_kg1ha"] = (df1.loc[:, "c_stems"].sum())

        df1["c_r_kg1ha"] = (df1.loc[:, "c_roots"].sum())
        
        df1["c_agb_kg1ha"] = (df1.loc[:, "c_agb"].sum())
        
        print(df1)
        df_list.append(df1)

        
        #dfmi.loc[:, ('one', 'second')]
        
#     for index, row in df.iterrows():
#         tot_ba_ha.append((row["tot_prop"] * row["ba_all_m2"])/100)
#         al_ba_ha.append((row["al_prop"] * row["ba_alive_m2"])/100)
        
#     df["alv_ba_ha"] = al_ba_ha
#     df["tot_ba_ha"] = tot_ba_ha
    final_df = pd.concat(df_list)        
    return final_df
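

# total_kg_per_site assigns new columns to a filtered slice of df, which is what
# triggers the SettingWithCopyWarning when it is called. A sketch of one way to
# avoid the slice entirely is a grouped transform on a copy. `site_totals`,
# `demo` and the `factors` mapping are hypothetical; the output column names and
# the 0.47 / 0.49 factors follow the function above.

```python
import pandas as pd

demo = pd.DataFrame({
    "uid": [6, 6, 25],
    "c_leaves": [16.0, 32.0, 209.25],
    "c_agb": [455.0, 910.0, 5940.0],
})


def site_totals(df, factors):
    """Return a copy of df with per-site sums broadcast to every row of the site.

    factors maps a short component code to (source column, scaling factor).
    """
    out = df.copy()
    for short, (column, factor) in factors.items():
        # transform("sum") returns one value per row, aligned with the original index
        site_sum = out.groupby("uid")[column].transform("sum")
        out[f"c_{short}_kg1ha"] = site_sum
        out[f"bio_{short}_kg1ha"] = site_sum * factor
    return out


totals = site_totals(demo, {"l": ("c_leaves", 0.47), "agb": ("c_agb", 0.49)})
print(totals)
```

# Because nothing is assigned to a slice, no warning is raised and the row order
# of the input is preserved.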

In [80]:
df5 = total_kg_per_site(df5)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  

                  species alive dead alv_prop ded_prop total_prop     site  \
0         Eucalyptus alba    12    0  8.33333        0    7.14286  girra02   
1  Lophostemon lactifluus    20    4  13.8889  16.6667    14.2857  girra02   
2        Melaleuca nervoa    93   19  64.5833  79.1667    66.6667  girra02   
3             P. spiralis    19    1  13.1944  4.16667    11.9048  girra02   

   uid        date  factor_x  ...  bio_r_kg1ha bio_agb_kg1ha   c_l_kg1ha  \
0    6  20120605.0       0.1  ...      641.186       3123.12  224.528571   
1    6  20120605.0       0.1  ...      641.186       3123.12  224.528571   
2    6  20120605.0       0.1  ...      641.186       3123.12  224.528571   
3    6  20120605.0       0.1  ...      641.186       3123.12  224.528571   

    c_t_kg1ha   c_b_kg1ha    c_w_kg1ha   c_br_kg1ha    c_s_kg1ha    c_r_kg1ha  \
0  234.185714  915.014286  2976.814286  2257.357143  3891.828571  1308.542857   
1  234.185714  915.014286  2976.814286  2257.357143  3891.828571  

    species alive dead alv_prop ded_prop total_prop    site  uid        date  \
131    None     0    0        0        0          0  legu02   31  20121005.0   

     factor_x  ...  bio_r_kg1ha bio_agb_kg1ha  c_l_kg1ha  c_t_kg1ha  \
131       0.0  ...          0.0           0.0        0.0        0.0   

     c_b_kg1ha  c_w_kg1ha  c_br_kg1ha  c_s_kg1ha  c_r_kg1ha  c_agb_kg1ha  
131        0.0        0.0         0.0        0.0        0.0          0.0  

[1 rows x 74 columns]
    species alive dead alv_prop ded_prop total_prop    site  uid        date  \
132    None     0    0        0        0          0  legu03   32  20121006.0   

     factor_x  ...  bio_r_kg1ha bio_agb_kg1ha  c_l_kg1ha  c_t_kg1ha  \
132       0.0  ...          0.0           0.0        0.0        0.0   

     c_b_kg1ha  c_w_kg1ha  c_br_kg1ha  c_s_kg1ha  c_r_kg1ha  c_agb_kg1ha  
132        0.0        0.0         0.0        0.0        0.0          0.0  

[1 rows x 74 columns]
                     species alive dead alv_pr


In [81]:
df5.to_csv(os.path.join(output_dir, "kg1ha bio_c_v4.csv"), index=False)

In [82]:
df5.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 200 entries, 0 to 199
Data columns (total 74 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   species        200 non-null    object 
 1   alive          200 non-null    object 
 2   dead           200 non-null    object 
 3   alv_prop       200 non-null    object 
 4   ded_prop       191 non-null    object 
 5   total_prop     200 non-null    object 
 6   site           200 non-null    object 
 7   uid            200 non-null    int64  
 8   date           200 non-null    object 
 9   factor_x       200 non-null    float64
 10  loc_count      200 non-null    float64
 11  geometry       200 non-null    object 
 12  lon_gda94      200 non-null    float64
 13  lat_gda94      200 non-null    float64
 14  factor_y       200 non-null    float64
 15  count          200 non-null    float64
 16  avg_alive      200 non-null    float64
 17  avg_dead       200 non-null    float64
 18  avg_total 

In [87]:
df_totals = df5[["uid", "site", "date", "lon_gda94", "lat_gda94", "bio_l_kg1ha", "bio_t_kg1ha", "bio_b_kg1ha", "bio_w_kg1ha",
                 "bio_br_kg1ha", "bio_s_kg1ha", "bio_r_kg1ha", "bio_agb_kg1ha", "c_l_kg1ha", "c_t_kg1ha",
                 "c_b_kg1ha", "c_w_kg1ha", "c_br_kg1ha", "c_s_kg1ha", "c_r_kg1ha", "c_agb_kg1ha"]]

In [88]:
# df_final[df_final["uid"] == 1]

In [89]:
# Reassign rather than use inplace=True on a column selection, which raises SettingWithCopyWarning
df_totals = df_totals.drop_duplicates(subset=["site"])
df_totals.to_csv(os.path.join(output_dir, "c_bio_site_totals_v4.csv"), index=False)
