# First run of crspy

Once you have installed crspy into your Python environment, it's time to prepare the working directory.

First, copy the example `name_list.py` file (also available [here](https://github.com/danpower101/crspy/blob/master/name_list.py)) into the working directory and make the following change:

- Change `defaultdir` to the path of your working directory, e.g. `defaultdir = "C:/users/dan/crns_wd/"`.

crspy expects the working directory to follow a particular folder structure that keeps the different data sources and outputs organised, and the processing steps depend on it. To make setup easier, crspy provides a function that creates this structure for you, which you can run below.

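```python
from name_list import nld  # settings dictionary defined in name_list.py
import crspy

# Create the expected folder structure inside the working directory
wd = nld['defaultdir']
crspy.initial(wd)
```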

## Adding data

If you look in the working directory, the folder structure should now be in place. You will also notice some files, such as `metadata.csv`. More information on this file can be found [here](https://github.com/danpower101/crspy/wiki/Metadata).
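To confirm that the input folders used in the next steps exist, a small stand-alone check like the one below can help. It is a sketch, not part of crspy: the helper `missing_subdirs` and the `EXPECTED_SUBDIRS` list are hypothetical, with the folder names taken from the paths mentioned in this guide (crspy may create more folders than these).

```python
from pathlib import Path

# Input folders referred to later in this guide (assumed list, not crspy's own;
# crspy.initial() may create additional folders).
EXPECTED_SUBDIRS = [
    "data/calibration_data",
    "data/crns_data/raw",
    "data/era5land",
]


def missing_subdirs(wd):
    """Return the expected sub-folders that do not exist under ``wd``."""
    root = Path(wd)
    return [d for d in EXPECTED_SUBDIRS if not (root / d).is_dir()]


# Example usage (hypothetical helper, not part of crspy):
# print(missing_subdirs(nld['defaultdir']))  # an empty list means all are present
```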

To demonstrate crspy's functions, some example files have been uploaded to the GitHub page [here](https://github.com/danpower101/crspy/blob/master/data/).

Replace the skeleton `metadata.csv` (created by the function above) with the one found on the GitHub page. Also copy over the `nmdb_stations.csv` file.

You also need to:

- Place the calibration data (Calib_USA_Site_011.csv) into the `data/calibration_data/` folder. 
- Place the raw data (USA_SITE_011.txt) into the `data/crns_data/raw/` folder.
- Place the era5land data (example_era5land.nc) into the `data/era5land/` folder. (Generated using Copernicus Climate Change Service Information [2020])
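If you prefer to script this step, the sketch below copies the downloaded example files into the folders listed above. The function `place_example_files` and the `DESTINATIONS` mapping are hypothetical; the destinations mirror the list above, and `metadata.csv` goes to `data/`, the location read in the metadata step. `nmdb_stations.csv` is left out because this guide does not state which folder it belongs in, so copy it manually as described on the GitHub page.

```python
import shutil
from pathlib import Path

# Destination folder (relative to the working directory) for each example file,
# taken from the list above. metadata.csv overwrites the skeleton file.
DESTINATIONS = {
    "Calib_USA_Site_011.csv": "data/calibration_data",
    "USA_SITE_011.txt": "data/crns_data/raw",
    "example_era5land.nc": "data/era5land",
    "metadata.csv": "data",
}


def place_example_files(download_dir, wd):
    """Copy each example file from download_dir into its folder under wd.

    Returns the paths of the copied files. Folders are created if missing.
    """
    placed = []
    for name, subdir in DESTINATIONS.items():
        target_dir = Path(wd) / subdir
        target_dir.mkdir(parents=True, exist_ok=True)
        # shutil.copy2 keeps file metadata and returns the new file's path
        placed.append(Path(shutil.copy2(Path(download_dir) / name, target_dir)))
    return placed
```

For example, `place_example_files("C:/users/dan/Downloads", nld['defaultdir'])`, where the download folder is wherever you saved the example files.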

The example era5land data has been pre-extracted using the era5land functions (see [here](https://github.com/danpower101/crspy/wiki/ERA5-Land-Data) for more information on this process). If you build your own era5land file you can give it a different name, but the name must then also be changed in the `name_list.py` file to match.

Two more external datasets are required, which can be downloaded by running the functions below. These provide land cover and above-ground biomass (AGB) data from the European Space Agency Climate Change Initiative (ESA CCI). Before downloading, your computer must first be set up to interact with the [CDS database](https://cds.climate.copernicus.eu/cdsapp#!/home); instructions can be found [here](https://github.com/danpower101/crspy/wiki/ERA5-Land-Data).

These are large files, so this step can be skipped for now if desired. Skipping it removes the AGB correction from the calculations.

```python
# UNCOMMENT THE LINES BELOW IF YOU WISH TO RUN THIS STEP
# crspy.dl_land_cover()  # about 2 GB
# crspy.dl_agb()  # about 25 GB
```
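Before starting downloads of this size, a quick pre-flight check can save time. The sketch below assumes the downloads go through the standard `cdsapi` client, which looks for credentials in the `CDSAPI_URL`/`CDSAPI_KEY` environment variables or in a `.cdsapirc` file (by default in your home directory, or at the path in `CDSAPI_RC`). The helpers `cds_credentials_present` and `enough_space` are hypothetical, not crspy functions, and the sizes are the approximate figures quoted above.

```python
import os
import shutil
from pathlib import Path

# Approximate download sizes quoted in this guide, in GB (assumed rough values).
DOWNLOAD_SIZES_GB = {"land_cover": 2, "agb": 25}


def cds_credentials_present():
    """True if cdsapi-style credentials appear to be configured."""
    if os.environ.get("CDSAPI_URL") and os.environ.get("CDSAPI_KEY"):
        return True
    rc_path = os.environ.get("CDSAPI_RC", str(Path.home() / ".cdsapirc"))
    return Path(rc_path).is_file()


def enough_space(path, datasets=("land_cover", "agb")):
    """Check free disk space at path against the quoted download sizes."""
    needed_bytes = sum(DOWNLOAD_SIZES_GB[d] for d in datasets) * 1024**3
    return shutil.disk_usage(path).free >= needed_bytes


# Example usage:
# print(cds_credentials_present(), enough_space(nld['defaultdir']))
```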

## Fill the metadata

The code below fills the metadata table (more information on what is collected can be found [here](https://github.com/danpower101/crspy/wiki/Metadata)). It also calculates a site-specific beta coefficient (required for the pressure correction of the neutron signal) and a reference pressure for each site, based on the equations of Desilets (2021).

```python
import pandas as pd  # pandas is needed to read in the csv

# Read in the metadata.csv file using pandas
meta = pd.read_csv(nld['defaultdir'] + "/data/metadata.csv")

# Fill the metadata table (including beta coefficient and reference pressure)
meta = crspy.fill_metadata(meta)
```
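The beta coefficient and reference pressure filled in above are what a pressure correction of the neutron counts needs. A commonly used form is the multiplicative factor `f_p = exp(beta * (P - P_ref))`, applied as `N_corrected = N_raw * f_p`. The sketch below illustrates that form only; the function name `pressure_correction` is hypothetical, the beta value in the example is illustrative, and crspy's internal implementation (units, sign convention) may differ.

```python
import numpy as np


def pressure_correction(pressure_hpa, reference_pressure_hpa, beta):
    """Multiplicative pressure correction factor for neutron counts.

    Uses the exponential form f_p = exp(beta * (P - P_ref)) with beta in 1/hPa.
    Higher pressure means more atmospheric shielding and fewer counts, so for
    positive beta f_p > 1 when P > P_ref, scaling the counts back up.
    """
    pressure_hpa = np.asarray(pressure_hpa, dtype=float)
    return np.exp(beta * (pressure_hpa - reference_pressure_hpa))


# Illustrative values only (not taken from the example site):
raw_counts = np.array([1800.0, 1750.0])
pressure = np.array([1005.0, 1015.0])
corrected = raw_counts * pressure_correction(pressure, 1010.0, 0.0076)
```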

## Process crns data

With the metadata filled correctly we can now process the data. This is done with a simple function call shown below. 

The `getlistoffiles` function returns a list of all the raw datasets in the raw folder, which helps when processing many sites at once. For now we only have a single site.

```python
# Use the getlistoffiles function to get a list of file paths for the time series data
fileslist = crspy.getlistoffiles(nld['defaultdir'] + "/data/crns_data/raw/")

# Process the first file in the list. Outputs are the processed dataframe and the metadata.
# calibrate=True runs the N0 calibration: use it when metadata.csv has no N0
# value for the site but calibration data are available.
df, meta = crspy.process_raw_data(fileslist[0], calibrate=True)
```
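To see which raw files are sitting in the folder before processing, a plain `pathlib` listing is enough. The sketch below is a stand-alone stand-in, not crspy's implementation: the helper `list_raw_files` is hypothetical, and its suffix filter is an assumption (`crspy.getlistoffiles` may behave differently, for example by also searching sub-folders).

```python
from pathlib import Path


def list_raw_files(raw_dir, suffixes=(".txt", ".csv")):
    """Sorted paths of files directly inside raw_dir with an allowed suffix.

    The suffix check is case-insensitive; sub-folders are ignored.
    """
    return sorted(
        str(p)
        for p in Path(raw_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Example usage:
# print(list_raw_files(nld['defaultdir'] + "/data/crns_data/raw/"))
```

With several sites in the raw folder, the processing call above can be wrapped in a loop such as `for f in fileslist: df, meta = crspy.process_raw_data(f, calibrate=True)`, assuming each site has the metadata and calibration data it needs.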

## Check the outputs

Once the process above has completed, you can check the folders to see the saved outputs. The important folders are:

- `/data/crns_data/final/` will contain the final output file including soil moisture estimates as well as variables that have been calculated along the way.
- `/data/qa/` will contain a folder of QA graphs: time series that are useful for identifying possible issues with the data.
- `/data/n0_calibration/` will contain several files related to the N0 calibration process. The report should be checked to confirm that reasonable values have been used (these values are collected automatically, so they are written out here to give full line of sight). Much more detail on the N0 calibration process can be found in [Schrön et al., 2017](https://doi.org/10.5194/hess-21-5009-2017).
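The soil moisture estimates in the final output and the calibrated N0 are linked by a counts-to-soil-moisture relation. A widely used form is the shape function of Desilets et al. (2010), `theta_g = a0 / (N / N0 - a1) - a2` with `a0 = 0.0808`, `a1 = 0.372`, `a2 = 0.115`, where `theta_g` is gravimetric soil moisture (g/g; multiply by dry bulk density in g/cm³ for volumetric). The sketch below shows that relation and its inversion for a single calibration point. The function names are hypothetical, and crspy's actual processing (including the spatial weighting of calibration samples described by Schrön et al., 2017) involves more steps than shown here.

```python
# Constants of the Desilets et al. (2010) shape function.
A0, A1, A2 = 0.0808, 0.372, 0.115


def gravimetric_sm(n_corr, n0):
    """Gravimetric soil moisture (g/g) from corrected neutron counts and N0."""
    return A0 / (n_corr / n0 - A1) - A2


def n0_from_calibration(n_corr, theta_g):
    """Invert the same relation to estimate N0 from one calibration point.

    n_corr is the corrected count during calibration and theta_g the
    (already averaged) gravimetric soil moisture of the soil samples.
    """
    return n_corr / (A0 / (theta_g + A2) + A1)
```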