# Example workflow with this project's packages


This walkthrough shows how to use the `daphnia` data mutation and transformation functions to process `.npz` files produced by the `TRex` animal tracking software.


## Workflow

* Convert `.npz` output files into pandas DataFrames with `NPZer`

* Clean TRex data by removing invalid points, such as sudden jumps or missing data points, with `TRexDataCleaner`

* Impute missing data with `TRexImputer` and a relevant imputation method

* Visualize daphnia tracking with `visualizer`

## Importing relevant packages


Copy the following code into a `code` cell at the top of your Jupyter notebook to import the relevant packages:



In [25]:
# Import necessary tools
import pandas as pd
import numpy as np
import sys
import os

# Declare root path for accessing package
sys.path.append(os.path.abspath(os.path.join('..', '..')))

# import module
from src.data_manipulation.NPZer import NPZer
from src.data_manipulation.TRexDataCleaner import TRexDataCleaner
from src.data_manipulation.TRexImputer import TRexImputer




Below are code examples for each operation. To process new data, copy the code into a Jupyter notebook inside the `workspace/notebooks` folder and place the data inside `workspace/data`.

## NPZ file to pandas DataFrame



In [32]:


# Set desired parameters ("../../" reaches the root directory: the notebook is two folders deep and the data is in root)
SOURCE_DIR = '../../data/npz_file/single_7_9_fish1.MP4_fish0.npz'
INVERT_Y = True
PARAMS = ['time', 'X', 'Y']

In [35]:
# Unzip and turn data into a pandas table
unzippedData = NPZer.pandafy(source_dir=SOURCE_DIR,
                              invertY=INVERT_Y,
                              params=PARAMS)

# Print data in form of pandas table
print('TRex Data:\n', unzippedData)

TRex Data:
              time          X         Y
0        0.000000  23.536650 -1.792803
1        0.016949  23.517750 -1.792841
2        0.033898  23.517750 -1.792841
3        0.050847  23.517750 -1.792841
4        0.067796  23.517750 -1.792841
...           ...        ...       ...
10817  183.338989  19.579285 -6.965172
10818  183.355927  19.569004 -6.989434
10819  183.372879  19.588287 -7.017863
10820  183.389832  19.577187 -7.083682
10821  183.406784  19.577187 -7.083682

[10822 rows x 3 columns]
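
The conversion above can be approximated with plain NumPy and pandas. The following is a minimal sketch, not the `NPZer` implementation: the helper name `npz_to_dataframe` is hypothetical, and it assumes that the archive stores one equal-length array per parameter and that inverting Y means negating it.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def npz_to_dataframe(path, params, invert_y=False):
    """Load the selected arrays from a .npz archive into a DataFrame (hypothetical helper)."""
    with np.load(path) as archive:
        # Each requested key becomes one column; all arrays must share a length.
        frame = pd.DataFrame({key: archive[key] for key in params})
    if invert_y and "Y" in frame.columns:
        # Assumption: inverting Y flips the sign of the coordinate.
        frame["Y"] = -frame["Y"]
    return frame


# Build a tiny archive so the example runs without real TRex output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez(
        demo_path,
        time=np.array([0.0, 0.5, 1.0]),
        X=np.array([1.0, 2.0, 3.0]),
        Y=np.array([4.0, 5.0, 6.0]),
        SPEED=np.array([0.0, 2.0, 2.0]),  # extra key that is not requested
    )
    demo_frame = npz_to_dataframe(demo_path, ["time", "X", "Y"], invert_y=True)

print(demo_frame)
```

Only the keys listed in `params` are read, which mirrors how `PARAMS` restricts the table to `time`, `X` and `Y` in the cell above.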


## pandas DataFrame to .NPZ file



In [37]:
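# Zip the DataFrame back into a .npz file
# This notebook runs from workspace/notebooks, so workspace/data is '../data'.
# The original path '../workspace/data' does not exist from here and raised:
# FileNotFoundError: [Errno 2] No such file or directory: '../workspace/data.npz'
SAVE_DIR = '../data'
NPZer.npzip(data=unzippedData, save_dir=SAVE_DIR)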
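
Writing a DataFrame to a `.npz` archive can be sketched with `numpy.savez`, which appends `.npz` to a file name that lacks it and does not create missing parent directories. The helper `dataframe_to_npz` below is hypothetical and is not the `NPZer.npzip` implementation; it creates the parent directory first so a wrong or missing folder does not surface as a `FileNotFoundError`.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def dataframe_to_npz(frame, save_path):
    """Store every column of `frame` as a named array in a .npz archive (hypothetical helper)."""
    parent = os.path.dirname(save_path)
    if parent:
        # numpy.savez does not create directories, so do it explicitly.
        os.makedirs(parent, exist_ok=True)
    np.savez(save_path, **{str(col): frame[col].to_numpy() for col in frame.columns})
    # numpy.savez appends '.npz' when the name does not already end with it.
    return save_path if save_path.endswith(".npz") else save_path + ".npz"


demo = pd.DataFrame({"time": [0.0, 0.5], "X": [1.0, 2.0], "Y": [-4.0, -5.0]})

with tempfile.TemporaryDirectory() as tmp:
    written = dataframe_to_npz(demo, os.path.join(tmp, "new_folder", "demo"))
    with np.load(written) as archive:
        restored = {key: archive[key].tolist() for key in archive.files}

print(written.endswith(".npz"), restored)
```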

## Clean pandas.DataFrame of Invalid Data


In [40]:
# Clean Data

dataCleaner = TRexDataCleaner()
VMAX = 15
cleanedData, removedData = dataCleaner.renderDiscontinuities(data=unzippedData, vmax=VMAX)



# Print cleaned data
print(cleanedData)



             time          X         Y
0        0.000000  23.536650 -1.792803
1        0.016949  23.517750 -1.792841
2        0.033898  23.517750 -1.792841
3        0.050847  23.517750 -1.792841
4        0.067796  23.517750 -1.792841
...           ...        ...       ...
10817  183.338989  19.579285 -6.965172
10818  183.355927  19.569004 -6.989434
10819  183.372879  19.588287 -7.017863
10820  183.389832  19.577187 -7.083682
10821  183.406784  19.577187 -7.083682

[10822 rows x 3 columns]
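
To illustrate what a speed threshold such as `VMAX` can do, here is a minimal sketch of jump removal. It is not the `TRexDataCleaner.renderDiscontinuities` algorithm: the function `split_by_speed` is hypothetical, and it assumes that a row is invalid when the speed needed to reach it from the last kept row exceeds `vmax`, and that the columns contain no missing values.

```python
import numpy as np
import pandas as pd


def split_by_speed(frame, vmax):
    """Split rows into (kept, removed) using a maximum plausible speed (hypothetical helper)."""
    t = frame["time"].to_numpy()
    x = frame["X"].to_numpy()
    y = frame["Y"].to_numpy()
    keep = np.ones(len(frame), dtype=bool)
    last_kept = 0  # index of the most recent row that was accepted
    for i in range(1, len(frame)):
        dt = t[i] - t[last_kept]
        dist = np.hypot(x[i] - x[last_kept], y[i] - y[last_kept])
        if dt > 0 and dist / dt > vmax:
            keep[i] = False  # implausibly fast: treat as a tracking jump
        else:
            last_kept = i
    kept = frame[keep].reset_index(drop=True)
    removed = frame[~keep].reset_index(drop=True)
    return kept, removed


demo = pd.DataFrame(
    {
        "time": [0.0, 1.0, 2.0, 3.0],
        "X": [0.0, 1.0, 50.0, 2.0],  # the third sample jumps far away
        "Y": [0.0, 0.0, 0.0, 0.0],
    }
)
kept, removed = split_by_speed(demo, vmax=15)
print(kept)
print(removed)
```

Comparing each row against the last kept row, rather than the immediately preceding one, keeps the valid sample that follows a jump from being discarded along with it.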


## Impute missing data with an imputation method

In [43]:
# Import necessary tools
from src.data_manipulation.TRexImputer import TRexImputer

imputer = TRexImputer()

# Set desired parameters
DATA = cleanedData
FUNCTION = 'avgValue'

FileNotFoundError: Directory c:\Users\jwright\Documents\GitHub\daphnia\workspace\notebooks\src\data_manipulation\imputation_strategies does not exist.
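
Independently of `TRexImputer`, an average-based imputation can be sketched in plain pandas. This is an assumption about what a strategy named `avgValue` might do, not its actual implementation: the hypothetical helper `fill_with_neighbour_average` replaces each missing value with the mean of the nearest valid values before and after the gap.

```python
import numpy as np
import pandas as pd


def fill_with_neighbour_average(frame, columns=("X", "Y")):
    """Fill gaps with the mean of the surrounding valid values (hypothetical helper)."""
    out = frame.copy()
    for col in columns:
        prev_valid = out[col].ffill()  # last valid value before each row
        next_valid = out[col].bfill()  # next valid value after each row
        # At the start or end of the series only one neighbour exists; reuse it.
        prev_valid = prev_valid.fillna(next_valid)
        next_valid = next_valid.fillna(prev_valid)
        out[col] = out[col].fillna((prev_valid + next_valid) / 2)
    return out


demo = pd.DataFrame(
    {
        "time": [0.0, 0.1, 0.2, 0.3],
        "X": [1.0, np.nan, 3.0, 4.0],
        "Y": [0.0, np.nan, np.nan, 6.0],
    }
)
filled = fill_with_neighbour_average(demo)
print(filled)
```

Every row inside a gap receives the same value here; a linear interpolation (for example `Series.interpolate()`) would instead ramp between the two neighbours.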

## Data visualization

In [11]:
# code
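
Until the cell above is filled in, a trajectory plot can be sketched directly with matplotlib. This does not use the project's `visualizer` module, whose interface is not shown here; the function `plot_trajectory` is hypothetical and assumes a DataFrame with `X` and `Y` columns like the ones produced earlier.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_trajectory(frame):
    """Draw the tracked path as a line in the X-Y plane (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.plot(frame["X"], frame["Y"], linewidth=0.8)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_title("Tracked trajectory")
    # Equal scaling keeps distances comparable along both axes.
    ax.set_aspect("equal", adjustable="datalim")
    return fig, ax


demo = pd.DataFrame({"X": [23.5, 23.0, 21.0, 19.6], "Y": [-1.8, -3.0, -5.5, -7.1]})
fig, ax = plot_trajectory(demo)
```

In a notebook, `plot_trajectory(cleanedData)` would display the figure inline; in a script, `fig.savefig(...)` writes it to disk.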
