(processing-raw-data)=
# Processing raw data

## 0) Activate the pyopia environment

If you installed PyOPIA in a virtual environment as per the guide [here](https://pyopia.readthedocs.io/en/latest/intro.html#installing), you should activate this environment first, e.g.:

```
uv sync
```
and
```
source .venv/bin/activate
```

## 1) Create a new project folder with a config file and metadata template
To start a new image processing project with PyOPIA, you can use the `init-project` command (here creating a project called 'myproject'):

```
pyopia init-project myproject
```

For help and additional options for this command, run: `pyopia init-project --help`

You should now have a new project folder ('myproject') containing a config file ('config.toml') and a README file with suggested steps to perform before starting processing. Several other input files and subfolders are also generated:

```
myproject/
├── auxillarydata
│   └── auxillary_data.csv
├── config.toml
├── images
├── metadata.json
├── processed
├── pyopia-default-classifier-20250409.keras
└── README
```

## 2) Make sure you are happy with your config file

Refer to the comments in the examples given in {ref}`toml-config`.

For detailed help on the arguments specific to a pipeline class, refer to the API documentation for that class.

Particle classification is provided by `[steps.classifier]`, which points to a pre-trained Keras CNN model. A default classifier model (the `.keras` file in the project folder) is provided by the `init-project` command.

## 3) Add project-relevant metadata

PyOPIA generates a self-describing netCDF file during processing, which in addition to particle statistics contains some basic metadata. This metadata is partly taken from the 'metadata.json' file generated in step 1.

The generated template file 'metadata.json' contains several items that should be filled out, such as 'title' and 'creator_name'. Also check that you are happy with the proposed default license (CC BY-SA).

You can add your own metadata items in this file as well.
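You can edit 'metadata.json' in any text editor. If you prefer to fill it in from a script, a minimal Python sketch could look like the following. It assumes the file holds a single flat JSON object of key/value pairs; the `update_metadata` helper and the custom item `project_id` are hypothetical.

```python
import json
from pathlib import Path


def update_metadata(path: Path, **items: str) -> dict:
    """Set or add items in a metadata.json file and write it back."""
    # Assumes the file holds one flat JSON object of key/value pairs.
    metadata = json.loads(path.read_text(encoding="utf-8"))
    metadata.update(items)
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return metadata


if __name__ == "__main__":
    path = Path("myproject/metadata.json")
    if not path.is_file():
        print(f"{path} not found; run 'pyopia init-project myproject' first")
    else:
        update_metadata(
            path,
            title="Particle images from my survey",
            creator_name="Jane Doe",
            project_id="my-survey-2025",  # hypothetical custom item
        )
```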

## 4) Add auxiliary data

A typical image dataset will be associated with some auxiliary data variables, e.g. temperature, salinity and depth for a profiling setup deployed at sea. This information can optionally be incorporated into the particle statistics netCDF file that PyOPIA generates, to ease post-processing of the data. Such information should be added as time series in the auxiliary data file ('auxillary_data.csv'). Each row in this file should consist of a time stamp and one or more auxiliary data values. The auxiliary data are interpolated to the time of each image being processed, so the time stamps need not match the images exactly, but they should cover the same time period. See the generated template file ('auxillarydata/auxillary_data.csv') for more information.

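PyOPIA performs this matching itself during processing. Purely to illustrate the idea, the sketch below interpolates a depth time series onto an image time stamp. It is not PyOPIA's implementation: linear interpolation is an assumption, and the `time`/`depth` column names and example values are hypothetical, so check the generated template for the real layout.

```python
import csv
import io
from bisect import bisect_left
from datetime import datetime

# Hypothetical auxiliary data; the real template may use other column names.
EXAMPLE_CSV = """time,depth
2024-05-01T10:00:00,5.0
2024-05-01T10:00:10,7.0
2024-05-01T10:00:20,6.0
"""


def read_series(text: str, column: str) -> tuple[list[datetime], list[float]]:
    """Read the time column and one data column from CSV text (rows sorted by time)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    times = [datetime.fromisoformat(row["time"]) for row in rows]
    values = [float(row[column]) for row in rows]
    return times, values


def interpolate_at(times: list[datetime], values: list[float], t: datetime) -> float:
    """Linearly interpolate the series at time t; t must lie within the series."""
    if not times[0] <= t <= times[-1]:
        raise ValueError(f"{t} is outside the auxiliary data time range")
    i = bisect_left(times, t)  # index of the first time stamp >= t
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    fraction = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return values[i - 1] + fraction * (values[i] - values[i - 1])


if __name__ == "__main__":
    times, depths = read_series(EXAMPLE_CSV, "depth")
    image_time = datetime(2024, 5, 1, 10, 0, 4)  # time stamp of one image
    print(interpolate_at(times, depths, image_time))  # 5.8
```

An image taken 4 s after the first reading gets a depth 40% of the way from 5.0 m to 7.0 m. This is also why the auxiliary time series should span the whole imaging period: images outside it cannot be interpolated.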
## 5) Process!

Run the command-line processing, which only needs to know which config file to use, e.g.:

```
pyopia process config.toml
```

## 6) Output

* You should expect an output folder defined by the `output_datafile` argument within the `[steps.output]` step.
  * This will contain either a single new .nc file or, if you used the `append = false` option (intended for {ref}`big-data`), several .nc files.
* If you defined the `export_outputpath` argument in `[steps.statextract]`, you will also have a folder containing a series of .h5 files, which contain all the particle ROIs.
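Because the number of .nc files depends on the `append` setting, a small helper that collects every output file can be handy for post-processing. This sketch assumes the outputs end up somewhere below a single folder (e.g. 'processed', if that is where `output_datafile` and `export_outputpath` point); the `find_outputs` helper is hypothetical and not part of PyOPIA.

```python
from pathlib import Path


def find_outputs(folder: Path) -> dict[str, list[Path]]:
    """Collect netCDF statistics files and .h5 ROI files found below `folder`."""
    return {
        "netcdf": sorted(folder.rglob("*.nc")),  # one or several, depending on `append`
        "roi_h5": sorted(folder.rglob("*.h5")),  # present only if ROIs were exported
    }


if __name__ == "__main__":
    folder = Path("myproject/processed")  # hypothetical location; match your config
    if not folder.is_dir():
        print(f"{folder} not found; adjust the path to your output settings")
    else:
        for kind, files in find_outputs(folder).items():
            print(f"{kind}: {len(files)} file(s)")
            for path in files:
                print(f"  {path}")
```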