This document describes the code archive I have built and uses it to analyze some COSMIC data.

#### Pre/post processing and data merging

The first part of the code consists of two bash scripts, `uc2roppall` and `ppall`, which process all of the data in a given folder. They need the relevant ROPP tools to be installed in order to work.

`uc2roppall` runs a UCAR-to-ROPP conversion script on every file with the given properties and puts the results in another folder.

`ppall` postprocesses all files in a given folder and moves them into a new one.

Both scripts have to be edited to work with a different folder, but this is easy: you only need to change the directory paths set in the first few lines of each script.
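The bash scripts themselves are not reproduced here. As a rough illustration of the pattern they follow (directory settings at the top, then one command per matching file), here is a minimal Python sketch. The names `INPUT_DIR`, `OUTPUT_DIR`, `PATTERN`, `plan_jobs`, and `run_jobs` are hypothetical, and the actual ROPP command line is deliberately left to the caller (`make_command`) rather than guessed.

```python
import subprocess
from pathlib import Path

# Hypothetical settings: like the first lines of the bash scripts,
# these are the only values that should need editing for a new folder.
INPUT_DIR = Path("input_data")
OUTPUT_DIR = Path("output_data")
PATTERN = "*.nc"


def plan_jobs(input_dir, output_dir, pattern):
    """Pair every input file matching `pattern` with an output path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [(src, output_dir / src.name) for src in sorted(input_dir.glob(pattern))]


def run_jobs(jobs, make_command, dry_run=True):
    """Build (and optionally run) one external command per (src, dst) pair.

    make_command is supplied by the caller and must return an argv list;
    no particular ROPP command or flags are assumed here. With dry_run=True
    the commands are only collected and returned, never executed.
    """
    commands = []
    for src, dst in jobs:
        argv = make_command(src, dst)
        commands.append(argv)
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(argv, check=True)
    return commands
```

For example, `run_jobs(plan_jobs(INPUT_DIR, OUTPUT_DIR, PATTERN), lambda src, dst: ["echo", str(src), str(dst)])` lists the commands that would be run without executing anything, which is a cheap way to check the folder settings before a long batch.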

Once all of the data has been post-processed, start Python and run the function `mergeNetCDF4Directory` from `netCDF4utils`. It is documented in the code file, but I reproduce the documentation here:

```
mergeNetCDF4Directory(string: directoryName, string: fileBeginning, string: outputFileName,
                      character: separator = '|', array of strings: variables = DEFAULT_VARIABLES):
    This function takes the netCDF4 files in a directory whose names start with fileBeginning
    and merges them into a pandas dataframe. The dataframe is then written to a CSV file and returned.
    The reason for this function:
    When ROPP post-processes occultations, it outputs files that are unwieldy, and each file
    holds only one occultation.
    Pandas dataframes are much more user-friendly and work with many more packages, so it is
    nicer to have all the data in that form.
    It is also far more useful to have all of the occultations in one place so they can be
    analyzed together.
    Things to note:
    Data should be indexed by occultation ID by default.
    Use something like this to select a row:
        df.loc[df['occ_id'] == target_id]
```

This function saves the merged CSV to the file `outputFileName` and returns the dataframe. It takes all of the ROPP files in the directory `directoryName` whose names start with `fileBeginning` and merges them into one pandas dataframe. `variables` is a list of strings naming which variables from the ROPP files go into the merged file. It defaults to `DEFAULT_VARIABLES`, which is defined in the code, but it can be overridden with any valid variable names you want in the merged CSV.
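The merge itself is implemented in `netCDF4utils`; the sketch below only illustrates the general idea and is not that code. It assumes each ROPP file has already been read into a Python dict mapping variable names to values (in practice something like `netCDF4.Dataset` would do the reading, which is left out so the sketch runs without data files). Array-valued variables are flattened into space-separated strings so that each occultation fits on one CSV row, with `'|'` as the field delimiter to mirror the documented default. The names `matching_files`, `serialize_array`, and `merge_records` are hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def matching_files(directory, file_beginning):
    """Return the files in `directory` whose names start with `file_beginning`."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.name.startswith(file_beginning))


def serialize_array(values):
    """Flatten an array into one space-separated string (one CSV cell)."""
    return " ".join(repr(float(v)) for v in np.ravel(values))


def merge_records(records, variables, separator="|"):
    """Merge one dict per occultation into a dataframe and its CSV text.

    Scalars are stored as-is; arrays are serialized with serialize_array.
    """
    rows = []
    for record in records:
        row = {}
        for name in variables:
            value = np.asarray(record[name])
            row[name] = value.item() if value.ndim == 0 else serialize_array(value)
        rows.append(row)
    df = pd.DataFrame(rows, columns=list(variables))
    return df, df.to_csv(sep=separator, index=False)
```

Writing the arrays out explicitly, rather than relying on how numpy prints an array, avoids numpy's summarization of long arrays with `...`, which would silently lose data.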

##### Python Processing

Once the merged file exists, it can easily be copied from the server to any other machine, for example with `scp`. Once it is on a computer that can be used for analysis (one that can run `matplotlib`, etc.), the next function, `ReadMergedCSV`, can be used.

To use it, the `netCDF4utils` Python package has to be imported into the Jupyter notebook.

It takes a file path, so that it knows where the merged CSV is located, and a dataframe, into which it puts the result. This function is quite slow, because the arrays are stored in the CSV as text and every one of them must be parsed back into numbers before it can be put into the dataframe.
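The slowness comes from converting text back into numbers one cell at a time. Assuming array-valued cells hold space-separated numbers and the file uses `|` as its delimiter (both are assumptions about the merged format, not taken from `ReadMergedCSV`'s source), a minimal reading sketch with pandas looks like this. `parse_array_cell` and `read_merged` are hypothetical names.

```python
import numpy as np
import pandas as pd


def parse_array_cell(text):
    """Convert one cell such as '310.0 -inf' into a float array."""
    # numpy accepts 'inf', '-inf' and 'nan' when converting strings to float
    return np.array(text.split(), dtype=float)


def read_merged(source, array_columns, separator="|"):
    """Read a merged CSV, parsing the listed columns into numpy arrays.

    pandas calls each converter once per cell, in Python, which is why this
    step gets slow with many occultations and long profiles.
    """
    converters = {name: parse_array_cell for name in array_columns}
    return pd.read_csv(source, sep=separator, converters=converters)
```

Passing the parser through `converters` means pandas hands it the raw cell text, so single-element arrays are not first coerced to floats.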

After this, the dataframe can be manipulated like any other pandas dataframe.

Some important things to note:

1. Many columns hold an array for each row rather than a single value, so plotting and analysis must be done either row by row or on a relevant scalar extracted from each row, which lets all of the occultations be analyzed together.
2. The data is not particularly clean: I have seen significant artefacts in it that must be addressed before we can say the analysis is correct. These range from refractivity values set to -infinity to bending angles so close to zero at high impact parameters that the behavior of the curves there is not meaningful.
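Both points can be handled with a little numpy once the array columns are parsed. The sketch below assumes a dataframe whose array columns already hold numpy arrays. It replaces non-finite refractivity values (such as -infinity) with NaN, drops bending angles above an impact-parameter cutoff that you would have to choose yourself, and reduces each row to a single number so all occultations can be compared at once. The function names and the fixed-cutoff approach are illustrative assumptions, not the checks `netCDF4utils` performs.

```python
import numpy as np
import pandas as pd


def clean_refractivity(values):
    """Return a float copy with -inf, +inf and NaN all set to NaN."""
    arr = np.array(values, dtype=float)  # np.array copies by default
    arr[~np.isfinite(arr)] = np.nan
    return arr


def cut_high_impact(impact, bending, max_impact):
    """Keep only (impact, bending) pairs with impact <= max_impact.

    max_impact is a hypothetical, user-chosen threshold for where bending
    angles become too close to zero to be meaningful.
    """
    impact = np.asarray(impact, dtype=float)
    bending = np.asarray(bending, dtype=float)
    keep = impact <= max_impact
    return impact[keep], bending[keep]


def row_scalar(df, column, reducer=np.nanmax):
    """Reduce an array-valued column to one number per occultation."""
    return df[column].map(lambda arr: reducer(arr) if np.size(arr) else np.nan)
```

Swapping `reducer` (for example `np.nanmin` or `np.nanmean`) gives other per-occultation summaries that can then be plotted against each other like ordinary columns.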

There are some additional functions that can be used to analyze the data. These include averaging, exponential coefficient determination, data plotting, binned plotting, moving average plotting, and minimum and maximum finding. All of them are designed to help find trends in this particular data.
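The package's own versions of these helpers are not reproduced here. As a hedged illustration of what some of them typically involve: refractivity falls off roughly exponentially with height, N(z) ≈ N0·exp(-z/H), so an exponential coefficient can be estimated with a straight-line fit to ln N; a moving average is a convolution with a flat window; and a binned average groups points by one variable before averaging another. The function names are hypothetical, and the log-linear fit is one common choice (it only uses positive, finite values), not necessarily what `netCDF4utils` does.

```python
import numpy as np


def fit_exponential(z, values):
    """Fit values ≈ a * exp(b * z) by least squares on log(values).

    Only positive, finite points are used; returns (a, b).
    """
    z = np.asarray(z, dtype=float)
    values = np.asarray(values, dtype=float)
    ok = np.isfinite(values) & (values > 0) & np.isfinite(z)
    b, log_a = np.polyfit(z[ok], np.log(values[ok]), 1)  # slope first
    return np.exp(log_a), b


def moving_average(values, window):
    """Mean over each run of `window` consecutive points ('valid' mode)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")


def binned_mean(x, y, edges):
    """Mean of y in each half-open bin [edges[i], edges[i+1]); NaN if empty."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.digitize(x, edges) - 1
    means = np.full(len(edges) - 1, np.nan)
    for i in range(len(edges) - 1):
        in_bin = idx == i
        if in_bin.any():
            means[i] = y[in_bin].mean()
    return means
```

With a fitted coefficient `b` from a refractivity profile, the scale height is H = -1/b, which is a convenient single number per occultation for comparing profiles.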

To illustrate some of these functions and their usefulness, I will import a merged dataset and manipulate it with the functions defined in the `netCDF4utils` package.

```python
import netCDF4utils as utils
```

