Plan for the steps in this processing code
1. Define a base grid
2. Load all the tif files - know how
3. Load the shp files - know how
4. Project the shp files to tif, matching the resolution, bounds, etc. of the base grid - know how
5. Generate the train deposit / occurrence tif files - know how
6. Unify all the tif data - know how

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
import rasterio
import pandas as pd
import geopandas as gpd

import utilities as utils

RAW_DATA_DIR = "data/LAWLEY22-RAW/geophysics/"
DERIV_DATA_DIR = "data/LAWLEY22-DERIV/geophysics/"

In [None]:
tifs, shps = utils.get_input_var_files("uscan")

Loads the raster data

In [None]:
rasters = utils.load_rasters(tifs, rasters_path=RAW_DATA_DIR, verbosity=1)

We'll need to upsample rasters that have too low a resolution

In [None]:
rasters = utils.resample_rasters(rasters[-2], rasters, tifs)
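
To make the upsampling step concrete, here is a minimal sketch of nearest-neighbour upsampling on a plain NumPy array. It is not the implementation of `utils.resample_rasters`, whose internals are not shown here; the function name `upsample_nearest` and the integer `factor` are assumptions made for illustration only.

```python
import numpy as np


def upsample_nearest(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: repeat every cell `factor` times along both axes."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Repeat rows first, then columns, so each coarse cell becomes a
    # factor x factor block holding the same value (NaN stays NaN).
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


coarse = np.array([[1.0, 2.0],
                   [3.0, np.nan]])
fine = upsample_nearest(coarse, 2)
print(fine.shape)
```

A real raster resample also has to update the affine transform and handle non-integer resolution ratios, which this sketch leaves out.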

Loads rasters of the vector data if available; otherwise generates them

In [None]:
try:
    rasters += utils.load_rasters(shps, rasters_path=RAW_DATA_DIR, verbosity=1)
except rasterio.RasterioIOError:
    base_raster = rasters[0] # defaults to intermediate resolution raster
    vectors = utils.load_vectors(shps, vectors_path=RAW_DATA_DIR, verbosity=0)
    pbar = tqdm(zip(shps, vectors), total=len(shps))  # zip has no len(), so pass the total explicitly
    for shp, vector in pbar:
        pbar.set_description(f"Processing {shp}")
        utils.proximity_raster_of_vector_points(base_raster, shp, vector)
    rasters += utils.load_rasters(shps, rasters_path=RAW_DATA_DIR, verbosity=1)
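
The idea behind a proximity raster can be sketched on a toy grid: each cell stores the distance to the nearest vector point. This is only an illustration under simplifying assumptions (points given as row/column indices, distances in cell units, brute-force search); `proximity_grid` is a hypothetical name and does not reflect how `utils.proximity_raster_of_vector_points` is actually written.

```python
import numpy as np


def proximity_grid(shape, points_rc):
    """Hypothetical helper: distance (in cells) from each cell to the nearest point."""
    rows, cols = np.indices(shape)             # row/column index of every cell
    pts = np.asarray(points_rc, dtype=float)   # shape (n_points, 2) as (row, col)
    # Broadcast to (n_points, n_rows, n_cols), then keep the minimum over points
    dist = np.hypot(rows[None] - pts[:, 0, None, None],
                    cols[None] - pts[:, 1, None, None])
    return dist.min(axis=0)


grid = proximity_grid((3, 4), [(0, 0), (2, 3)])
print(grid.round(2))
```

The brute-force broadcast is fine for a handful of points on a small grid; for continental-scale rasters a spatial index or a distance transform would be the more likely choice.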

Loads the base grid for all data if available; otherwise generates it

In [None]:
# grid_cell_ids = utils.generate_s2_grid(rasters, DERIV_DATA_DIR, "s2_grid_uscan")

Initialize the datacube

In [None]:
# datacube = utils.init_datacube({"s2_cell_id": grid_cell_ids}, ["s2_cell_center", "s2_cell_poly"] + tifs + shps, verbosity=1)

Load file with Deposits and Occurrences

In [None]:
# df_dep, df_occ = utils.process_raw_deposit_file('GeologyMineralOccurrences_USCanada_Australia.csv', csv_path='data/LAWLEY22-RAW/labels/', region='USCanada', dep_grp='MVT')

Adding MVT_Deposit, MVT_Occurrence columns to datacube

In [None]:
# datacube, notrecogdep = utils.mvt_dep_occur_to_s2cells(datacube, df_dep, colname='MVT_Deposit')
# datacube, notrecogocc = utils.mvt_dep_occur_to_s2cells(datacube, df_occ, colname='MVT_Occurrence')

Add neighbors column to datacube

In [None]:
# datacube = utils.neighbor_deposits(datacube, deptype='MVT')

Populate the datacube using as many processes as there are available CPUs

In [None]:
# datacube = utils.populate_datacube(datacube, tifs+shps)
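
As a rough picture of the "one process per CPU" pattern, the sketch below fans independent layers out to a process pool sized by `os.cpu_count()`. The names `summarize_column` and `populate_parallel` are hypothetical stand-ins, and the per-layer work (a simple mean) is a placeholder; how `utils.populate_datacube` distributes its work is not shown in this notebook.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def summarize_column(item):
    """Placeholder for the per-layer work: returns (name, mean of the values)."""
    name, values = item
    return name, sum(values) / len(values)


def populate_parallel(layers, max_workers=None):
    """Process each layer in a worker process; defaults to one worker per CPU."""
    max_workers = max_workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so names and results stay aligned
        return dict(pool.map(summarize_column, layers.items()))
```

Worker functions must be picklable by reference, so in a notebook they would normally live in an importable module (such as `utilities`) rather than in a cell, and a script should call `populate_parallel` from under an `if __name__ == "__main__":` guard.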

Final filtering of s2 cells that contain no data

In [None]:
# print(f"Removing {np.count_nonzero(np.isnan(datacube.loc[:, tifs+shps].values).all(axis=1))} s2 cells that have no geophysical data")
# datacube = datacube[~np.isnan(datacube.loc[:, tifs+shps].values).all(axis=1)]
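
The filter above keeps a row unless every geophysical column is NaN. A small NumPy example with made-up values shows the same mask logic in isolation:

```python
import numpy as np

# Toy stand-in for datacube.loc[:, tifs+shps].values (values are invented)
values = np.array([[1.0, np.nan],
                   [np.nan, np.nan],
                   [0.5, 2.0]])

all_missing = np.isnan(values).all(axis=1)   # True only where the whole row is NaN
n_removed = int(np.count_nonzero(all_missing))
kept = values[~all_missing]

print(f"Removing {n_removed} rows with no data; {len(kept)} remain")
```

Note that the row with a single valid value is kept. On a DataFrame, `dropna(how="all", subset=...)` should express the same rule, assuming the subset lists the same columns.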

Store the datacube for future use, or reload a previously stored copy

In [None]:
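from shapely import wkt

# datacube.to_csv(f"{DERIV_DATA_DIR}datacube_uscan.csv")
# Reload the stored datacube; the CSV holds the cell polygons as WKT strings
datacube = pd.read_csv(f"{DERIV_DATA_DIR}datacube_uscan.csv")
# Parse the WKT strings back into shapely geometries
datacube["s2_cell_poly"] = datacube["s2_cell_poly"].apply(wkt.loads)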

In [None]:
datacube.info()

We'll rasterize the datacube to visualize it and confirm the output

In [None]:
base_raster = rasters[0]
utils.rasterize_datacube(datacube, base_raster, DERIV_DATA_DIR)