## Read all csv files in a directory
This notebook goes through all the csv files in 00_Library and gets their gauge ID numbers. It then checks whether each ID appears in the shapefile where the reservoir polygons are matched to the gauge IDs.

In [1]:
import pandas as pd
import geopandas as gpd
import glob    # finds all the file paths that match a pattern, e.g. every csv in a directory

We'll start by reading the shapefile we're working with. I made it in ArcGIS, using a Spatial Join to connect gauge locations (which I scraped from the BOM website) to reservoir polygons (the DEA National HydroPolys layer where type = 'Reservoir'). Then we'll read all of the csv files in 00_Library, the library of good data that I hand-picked from BOM (the 00 stands for the year 2000, which is when all the records start), and make a shapefile that only has reservoir polygons for the gauges in this library. This will be the shapefile used in the first pilot of converting depth to surface area.
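
The join itself was done in ArcGIS, so it isn't part of this notebook. For anyone without ArcGIS, here is a rough sketch of how a similar nearest-feature join with a search distance might look in geopandas. The two toy layers, the `EPSG:3577` projection, and the 1000 m `max_distance` (guessed from the `_1km` in the file name) are all assumptions for illustration, not the settings that were actually used.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Made-up gauge points in a projected CRS, so distances are in metres
gauges = gpd.GeoDataFrame(
    {"gauge_ID": ["A1", "B2"]},
    geometry=[Point(500, 500), Point(10_000, 10_000)],
    crs="EPSG:3577",
)

# One made-up reservoir polygon (a 1 km x 1 km square)
reservoirs = gpd.GeoDataFrame(
    {"NAME": ["DEMO RESERVOIR"]},
    geometry=[Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])],
    crs="EPSG:3577",
)

# Attach the nearest gauge to each polygon, but only if it is within 1000 m.
# The polygon layer is on the left, so the result keeps the polygon geometry.
joined = gpd.sjoin_nearest(reservoirs, gauges, how="inner", max_distance=1000)
print(joined[["NAME", "gauge_ID"]])
```

Note that with the polygons on the left, each reservoir only picks up its single nearest gauge, which may or may not be what the ArcGIS join did.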

In [3]:
# open the shapefile with the polygons attached to the gauge IDs
gdf = gpd.read_file("Spatial_Join_gauges_to_polys/pointsID_polys_join_1km.shp")
# drop rows with a missing value in any column (including gauge_ID), then index by gauge ID
gdf = gdf.dropna().set_index('gauge_ID')

# read the first row of every csv file in the 00_Library directory using glob. This makes a list of one-row pandas dataframes.
lib_list = [pd.read_csv(filename, nrows=1, escapechar='#')  # only need the first row, that's where the ID is
            for filename in glob.glob("00_Library/*.csv")]

# make a list of the gauge IDs from 00_Library
ID_list = []

for df in lib_list:
    ID = df.iloc[0, 1]  # the ID is in the first row of the second column
    ID_list.append(ID)
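
If the ID lookup gets reused in other notebooks, it could be wrapped in a small function. The sketch below runs on two made-up csv files written to a temporary folder; the file layout (ID in the second column of the first data row) just mirrors the cell above, and the function name `read_gauge_id` is hypothetical. Reading with `dtype=str` is one way to keep numeric-looking IDs as strings from the start.

```python
import glob
import os
import tempfile

import pandas as pd


def read_gauge_id(csv_path):
    """Return the value in the first data row, second column, as a string."""
    first_row = pd.read_csv(csv_path, nrows=1, dtype=str)
    return first_row.iloc[0, 1]


# Build two tiny made-up csv files so the sketch runs on its own
demo_dir = tempfile.mkdtemp()
for name, gauge in [("a.csv", "425022"), ("b.csv", "RE604")]:
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write("label,value\nStation ID," + gauge + "\n")

demo_ids = sorted(read_gauge_id(p) for p in glob.glob(os.path.join(demo_dir, "*.csv")))
print(demo_ids)
```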

Let's see how many gauges from 00_Library are attached to polygons in the pointsID_polys_join_1km shapefile:

In [4]:
true_list = []  # gauge IDs from 00_Library that are attached to a reservoir polygon in pointsID_polys_join_1km
false_list = []  # gauge IDs that cannot be found in the shapefile

for i in ID_list:
    # check whether the gauge ID exists in the shapefile (compare it as a string, not a number)
    if str(i) in gdf.index:
        true_list.append(str(i))
    else:
        false_list.append(str(i))

print('There are this many gauges in 00_Library that can also be found in the shapefile:', len(true_list))
print('This many gauges in 00_Library were not found in the shapefile:', len(false_list))

There are this many gauges in 00_Library that can also be found in the shapefile: 143
This many gauges in 00_Library were not found in the shapefile: 30
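
The `str()` conversion in that loop matters because index membership in pandas is type-sensitive: an integer ID will not match the same ID stored as text. Here is a small sketch of that, using toy values rather than the real IDs, along with a set-style alternative to the loop using `Index.intersection` and `Index.difference`.

```python
import pandas as pd

# Toy stand-ins for gdf.index and ID_list (values are illustrative only)
shapefile_ids = pd.Index(["TAYLORS", "425022", "136023A"])
library_ids = ["TAYLORS", 425022, "999999"]

print(425022 in shapefile_ids)       # the integer is compared against string labels
print(str(425022) in shapefile_ids)  # the string form is compared against string labels

# Convert everything to strings once, then split into found / not found
library_index = pd.Index([str(i) for i in library_ids])
matched = library_index.intersection(shapefile_ids)
unmatched = library_index.difference(shapefile_ids)
print(len(matched), len(unmatched))
```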


Use the list to make a new geodataframe with only the gauges/polygons in that list.

In [5]:
gdf2 = gdf[gdf.index.isin(true_list)] #make a geodataframe with rows where the ID is in the true list
gdf2

gauge_ID,Join_Count,TARGET_FID,JOIN_FID,NAME,SHAPE_Area,Area_calc,lat,lon,staion_nam,lat_1,lon_1,geometry
TAYLORS,1,3,536,LAKE TAYLOR,0.000490,4.853574e+06,-36.783437,142.383614,Taylors Lake,-36.773120,142.386560,"POLYGON ((142.39295 -36.77087, 142.39255 -36.7..."
RE604,1,4,554,UPPER STONY CREEK RESERVOIR,0.000051,5.002874e+05,-37.818353,144.203635,Upper Stony,-37.823300,144.206200,"POLYGON ((144.21140 -37.81367, 144.21152 -37.8..."
sp-o10334,1,5,137,LAKE EILDON,0.013548,1.335458e+08,-37.162231,145.965819,EILDON,-37.225123,145.932049,"POLYGON ((145.90362 -37.00929, 145.90379 -37.0..."
425022,1,6,253,LAKE MENINDEE,0.015008,1.566768e+08,-32.342074,142.328469,LAKE MENINDEE,-32.295600,142.368200,"POLYGON ((142.35087 -32.28318, 142.35105 -32.2..."
sp-o11534,1,8,563,WARANGA BASIN,0.005671,5.633755e+07,-36.555218,145.096791,WARANGA BASIN,-36.556052,145.096115,"POLYGON ((145.15262 -36.53798, 145.15294 -36.5..."
...,...,...,...,...,...,...,...,...,...,...,...,...
136023A,1,663,373,NED CHURCHWARD WEIR,0.000514,5.740195e+06,-25.083942,152.026595,Ned Churchward HW,-25.051500,152.099450,"POLYGON ((151.99580 -25.18014, 151.99548 -25.1..."
136020A,1,664,50,BEN ANDERSON BARRAGE,0.000538,6.021928e+06,-24.927564,152.231507,Ben Anderson Barrage,-24.930000,152.255250,"POLYGON ((152.16701 -24.99459, 152.16765 -24.9..."
136003C,1,666,105,CLAUDE WHARTON WEIR,0.000133,1.481334e+06,-25.617167,151.559284,Claude Wharton HW,-25.614740,151.595000,"POLYGON ((151.52507 -25.62287, 151.52643 -25.6..."
125008A,1,669,342,MARIAN WEIR,0.000134,1.544632e+06,-21.146641,148.885758,Mirani Weir HW,-21.177880,148.830000,"POLYGON ((148.82987 -21.17723, 148.82926 -21.1..."


Great, now we can save this as a shapefile. I'll just take a few of these columns, since I don't need the FID or Join columns anymore. lat_1 and lon_1 are the gauge locations, and lat and lon are the centroids of the polygons.

In [8]:
gdf3 = gdf2[['NAME', 'staion_nam', 'lat', 'lon', 'geometry']]
gdf3

gauge_ID,NAME,staion_nam,lat,lon,geometry
TAYLORS,LAKE TAYLOR,Taylors Lake,-36.783437,142.383614,"POLYGON ((142.39295 -36.77087, 142.39255 -36.7..."
RE604,UPPER STONY CREEK RESERVOIR,Upper Stony,-37.818353,144.203635,"POLYGON ((144.21140 -37.81367, 144.21152 -37.8..."
sp-o10334,LAKE EILDON,EILDON,-37.162231,145.965819,"POLYGON ((145.90362 -37.00929, 145.90379 -37.0..."
425022,LAKE MENINDEE,LAKE MENINDEE,-32.342074,142.328469,"POLYGON ((142.35087 -32.28318, 142.35105 -32.2..."
sp-o11534,WARANGA BASIN,WARANGA BASIN,-36.555218,145.096791,"POLYGON ((145.15262 -36.53798, 145.15294 -36.5..."
...,...,...,...,...,...
136023A,NED CHURCHWARD WEIR,Ned Churchward HW,-25.083942,152.026595,"POLYGON ((151.99580 -25.18014, 151.99548 -25.1..."
136020A,BEN ANDERSON BARRAGE,Ben Anderson Barrage,-24.927564,152.231507,"POLYGON ((152.16701 -24.99459, 152.16765 -24.9..."
136003C,CLAUDE WHARTON WEIR,Claude Wharton HW,-25.617167,151.559284,"POLYGON ((151.52507 -25.62287, 151.52643 -25.6..."
125008A,MARIAN WEIR,Mirani Weir HW,-21.146641,148.885758,"POLYGON ((148.82987 -21.17723, 148.82926 -21.1..."


In [9]:
output = '00_Library_reservois'
gdf3.to_file(output)

Great, now you have a shapefile that only has the reservoirs with good depth data. You can use it later to test notebooks that ingest more than one reservoir at a time.