ClustProject/KUDataPreprocessing

KU Data Preprocessing Package

1. Repository Structure

.
├── dataset
│   ├── ecg_mitbih_test.csv
│   ├── imputed_data
│   │   └── ecg_mitbih_test_imputed.csv
│   ├── decomposed_data
│   │   ├── seasonal_decomposed.csv
│   │   └── trend_decomposed.csv
│   └── synchronized_data
│       └── synchronized_dtw.csv
├── imputation.py
├── seasonal_trend_decomposition.py
├── synchronization.py
└── README.md

2. Preprocessing module

2.1 Missing Value (NA) Imputation

2.1.1 Supported Options & Sample Usage

Impute the missing values in a dataset and save the result.

  • Simple imputation with the mean, median, most_frequent, or constant strategy
# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='simple' \
                     --strategy='mean' \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv'
  • KNN imputation
# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='knn' \
                     --n_neighbors=5 \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv'
  • MICE imputation
# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='mice' \
                     --strategy='mean' \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv'
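To make the strategy names concrete, here is a minimal pure-Python sketch of column-wise simple imputation. It is not the code in imputation.py; the function name `simple_impute` and the `fill_value` parameter are assumptions made for illustration, and the KNN and MICE options are not covered.

```python
import math
import statistics

STRATEGIES = ("mean", "median", "most_frequent", "constant")


def simple_impute(rows, strategy="mean", fill_value=0.0):
    """Fill NaN cells column by column (hypothetical helper, not the repo's API)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        # Values actually observed in column j
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        if strategy == "constant" or not observed:
            # Fall back to the constant when a column has no observed values
            fills.append(fill_value)
        elif strategy == "mean":
            fills.append(statistics.fmean(observed))
        elif strategy == "median":
            fills.append(statistics.median(observed))
        else:  # most_frequent
            fills.append(statistics.mode(observed))
    # Return a new table; the input rows are left untouched
    return [
        [fills[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


if __name__ == "__main__":
    nan = math.nan
    table = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]
    print(simple_impute(table, strategy="mean"))
```

Each column gets a single fill value computed from its observed entries, which is why this family of methods is cheap but ignores relationships between columns.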

2.1.2 Testing the Imputation Module by Adding Random NAs to a Temporary Dataset

Add the --test_module argument to the command line to test the module. When --test_module is given, imputation.py automatically adds random NAs to the dataset and then imputes the missing values.

# Sample Usage
python imputation.py --data_path='./dataset/ecg_mitbih_test.csv' \
                     --option='simple' \
                     --strategy='mean' \
                     --output_path='./dataset/imputed_data/ecg_mitbih_test_imputed.csv' \
                     --test_module
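The idea behind this test mode can be sketched as follows: hide a random subset of known cells, impute them, and compare the imputed values with the hidden originals. This is only an illustration under assumptions; the names `add_random_nas` and `rmse_on_masked`, the `na_ratio` and `seed` parameters, and the use of RMSE as the score are hypothetical and not taken from imputation.py.

```python
import math
import random


def add_random_nas(rows, na_ratio=0.1, seed=0):
    """Return a copy of rows with a random fraction of cells set to NaN.

    Also returns the (row, column) positions that were masked, so an
    imputation result can later be scored against the original values.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    masked = rng.sample(cells, int(len(cells) * na_ratio))
    corrupted = [list(row) for row in rows]  # copy, keep the input intact
    for i, j in masked:
        corrupted[i][j] = math.nan
    return corrupted, masked


def rmse_on_masked(original, imputed, masked):
    """Root-mean-square error restricted to the artificially masked cells."""
    if not masked:
        return 0.0
    squared = [(original[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    data = [[float(i + j) for j in range(4)] for i in range(10)]
    corrupted, masked = add_random_nas(data, na_ratio=0.25, seed=1)
    # Trivial baseline: replace every NaN with 0.0, then score it
    baseline = [[0.0 if math.isnan(v) else v for v in row] for row in corrupted]
    print(len(masked), rmse_on_masked(data, baseline, masked))
```

Scoring only the masked cells keeps the error estimate tied to values whose ground truth is known.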

2.2 Seasonal Trend Decomposition and Prediction (STL)

2.2.1 Seasonal Trend Detection using Seasonal-Trend LOESS (STL)

2.2.2 Diagnosis of Patterns in Time-Series Data

# Sample Usage
python seasonal_trend_decomposition.py --data_path='./dataset/machine_temperature_system_failure.csv' \
                                       --seasonal_output_path='./dataset/decomposed_data/seasonal_decomposed.csv' \
                                       --trend_output_path='./dataset/decomposed_data/trend_decomposed.csv'
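The decomposition step writes a seasonal component and a trend component to separate files. To show what those two outputs represent, the sketch below uses classical additive decomposition (a centered moving-average trend plus per-phase seasonal means). This is a deliberately simpler stand-in, not STL, which fits the components with LOESS smoothing; the function name `decompose_additive` and the `period` parameter are assumptions for illustration.

```python
def decompose_additive(series, period):
    """Split series into trend + seasonal + residual (classical, not STL)."""
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods")
    half = period // 2

    # Trend: centered moving average; undefined (None) near both edges
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2:
            trend[t] = sum(window) / period
        else:
            # Even period: half weight on the two end points (2 x period MA)
            inner = sum(window[1:-1])
            trend[t] = (0.5 * window[0] + inner + 0.5 * window[-1]) / period

    # Seasonal: mean detrended value for each phase of the cycle
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    center = sum(means) / period  # force the seasonal component to sum to zero
    seasonal = [means[t % period] - center for t in range(n)]

    # Residual: whatever trend and seasonal do not explain
    resid = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, resid


if __name__ == "__main__":
    pattern = [1.0, -1.0, 2.0, -2.0]
    values = [0.5 * t + pattern[t % 4] for t in range(24)]
    trend, seasonal, resid = decompose_additive(values, period=4)
    print(trend[2:6], seasonal[:4])
```

STL differs mainly in robustness and flexibility: it lets the seasonal shape change over time and down-weights outliers, which this fixed-pattern sketch cannot do.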

2.3 Synchronization using DTW and soft-DTW

# Sample Usage
python synchronization.py --data_path='./dataset/power_voltage.csv' \
                          --dtw_output_path='./dataset/synchronized_data/synchronized_dtw.csv' \
                          --plot_output_path='./dataset/synchronized_data' \
                          --option='dtw' \
                          --distance=2
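For readers unfamiliar with DTW, the following is a minimal pure-Python sketch of classic dynamic time warping between two one-dimensional series, followed by synchronization along the warping path. It is not the implementation in synchronization.py: the function names are hypothetical, the absolute difference is assumed as the pointwise cost, and soft-DTW (which replaces the hard minimum with a smooth one) is not shown.

```python
def dtw(a, b):
    """Return (total cost, warping path) between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed pointwise distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover which indices were matched
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def synchronize(a, b):
    """Resample both series along the DTW path so they share one time axis."""
    _, path = dtw(a, b)
    return [a[i] for i, _ in path], [b[j] for _, j in path]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 0.0, 1.0, 2.0, 3.0]
    print(dtw(x, y))
    print(synchronize(x, y))
```

The full cost table takes O(n·m) time and memory, so long series are usually handled with a windowing constraint or a dedicated library.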

About

Data and preprocessing provided by Korea University
