Skip to content

Latest commit

 

History

History
140 lines (105 loc) · 4.97 KB

creating_hgc_dataframes.rst

File metadata and controls

140 lines (105 loc) · 4.97 KB

Creating HGC compatible DataFrames

WaDI is intended to be a generic processor of hydrochemical data but also offers special support for HGC, which is a package for correction, validation and analysis of groundwater quality samples. An example workflow for reading data from a spreadsheet file in wide format is provided below. It demonstrates two key features

  • how to use WaDI's default dictionary to map the feature names to HGC feature names
  • automatic unit conversion to HGC compatible units

As always, the first step is to define the WaDI DataObject that will perform the required steps. As can be seen from the function arguments, the name of the log file will be 'hgc_compatible.log' and no messages will be printed to the screen because the silent keyword argument is set to True.

.. ipython:: python

    import wadi as wd

    wdo = wd.DataObject(log_fname='hgc_compatible.log',
        silent=True,
    )

Second, the file_reader of the DataObject must understand the structure of the spreadsheet file. As explained in :doc:`example 2 <../user_guide/example_02>`, this is done by creating dictionaries with keyword arguments which enable importing multiple different 'blocks' of data from the file. Here, the general sample data (sample code, location, sampling date) are found in columns A through E and the feature data in columns F through N. A separate dictionary needs to be created for both.

.. ipython:: python

    df0_kwargs = {
        "skiprows": [1],
        "usecols": "A:E",
        "units_row": 1,
        "datatype": "sampleinfo",
    }

    df1_kwargs = {
        "skiprows": [1],
        "usecols": "F:N",
        "units_row": 1,
        "datatype": "feature",
    }

As can be seen from the items in both dictionaries defined above, "skiprows": [1] is provided so that the second row of the spreadsheet is not read when reading the concentration (or sample info) data. This is because the second row contains the units. This row must be read separately, which is accomplished by also passing "units_row": 1.

With these dictionaries defined, the file_reader can be initialized and the result of the import operation can be visualized by printing the contents of the imported DataFrame

For the purpose of the demonstration in this documentation, find the folder with the data files

.. ipython:: python

    from wadi.documentation_helpers import get_data_dir
    DATA_DIRECTORY = get_data_dir()

.. ipython:: python

    wdo.file_reader(
        file_path=DATA_DIRECTORY / "hgc_example.xlsx",
        format="wide",
        blocks=[df0_kwargs, df1_kwargs],
    )

    print(wdo.get_imported_dataframe().head())

The next step is to define a mapping dictionary. WaDI's default dictionary already contains the HGC names for features. In this case, the feature names provided in the spreadsheet are according to the Dutch SIKB standard, which are also known to WaDI's default database (see :doc:`example 2 <../user_guide/mapping_dictionaries>`).

The mapping dictionary is created by calling the default_dict function of the MapperDict object, with the arguments 'SIKB' and 'HGC'. The resulting dictionary is stored under the variable name names_dict, which is then used to initialize the name_map of the WaDI DataObject. The selected match methods are 'exact' and 'fuzzy', with the latter being included because it is case insensitive.

.. ipython:: python

    names_dict = wd.mapper.MapperDict.default_dict('SIKB', 'HGC')

    wdo.name_map(
        m_dict=names_dict,
        match_method=["exact", "fuzzy"],
    )

The final step before the data can be converted is to define the harmonizer. The convert_units argument needs to be set to True and instead of specifying chemical concentration units, the target_units are set to 'hgc'. WaDI then understands that it must convert the feature units to values that are prescribed in HGC, which are different for different species (for example, mg/l for chloride, but ug/l for bromide).

.. ipython:: python

    df = wdo.harmonizer(
        convert_units=True,
        target_units="hgc",
    )

The data can now be converted and displayed on the screen.

.. ipython:: python
    :okexcept:

    df = wdo.get_converted_dataframe()

    print(df.head())

The user should always check the contents of the DataFrame created by WaDI to ensure that the mapping and harmonzing operations yielded the desired results. This is why it is critically important to inspect the conversion results, especially the column names, the units and the concentrations before proceeding with doing any calculations in HGC!

HGC requires a DataFrame without the units. This can be created by setting the include_units argument of the get_converted_dataframe function to False.

.. ipython:: python
    :okexcept:

    df_hgc = wdo.get_converted_dataframe(include_units=False)

    print(df_hgc.head())