Odor data analysis

This study focuses on developing an odor prediction model and interpreting the model's classification results using explainable AI (XAI) methods.

Reference

Research purpose

  • Prevention of odor in pig barns by managing chemical substances (odor substances) that affect odor generation
  • Creation of an optimal prediction model for complex odors using 15 odorous substances
  • Identification of the influence of odorous substances on complex odors and the interaction effect between odorous substances
  • Creation of a complex odor classification prediction model using 15 odorous substances and measurement-related variables

Data information

  • Response variable: complex odor
  • Explanatory variables: 15 odorous substances
    • Ammonia
    • Sulfur compounds: Hydrogen Sulfide, Methyl mercaptan, Dimethyl sulfide, Dimethyl disulfide
    • Volatile organic compounds: Acetic acid, Propionic acid, Butyric acid, Iso-Butyric acid, Valeric acid, Iso-Valeric acid, Phenol, para-Cresol, Indole, Skatole


Analysis process

Research 1

  • Compare different analysis processes to find the optimal predictive model
  • Data problems and solutions
    • High missing rate: because future data will be collected through sensors and the missing rate may remain high, missing values are imputed rather than removed
    • Small amount of data: models are validated with Leave-One-Out Cross-Validation (LOOCV), which is suited to small datasets
  • Data pre-processing
    • Missing-value imputation: simple imputation (mean, median), multivariate imputation (Bayesian), multiple imputation (Bayesian ridge, Gaussian process regression, KNN)
    • Feature preprocessing: standardization, Partial Least Squares (PLS), Principal Component Analysis (PCA)
  • Prediction models: Regression, SVM, RandomForest, ExtraTree, XGBoost, DNN
  • Model validation: R-squared and MAPE, computed through LOOCV
  • Additional Analysis: Correlation Analysis, Principal Component Analysis(PCA), Identification of predictor feature importance
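
The validation loop described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes a single predictor, mean imputation, and a one-variable least-squares fit, and the names `fit_line`, `loocv_predictions`, `r_squared`, `mape` and the toy numbers are all made up for the example. The point it shows is that the imputation value is recomputed from the training fold in every iteration, and that R-squared and MAPE are computed once over the pooled held-out predictions, since a single held-out sample cannot yield an R-squared on its own.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def loocv_predictions(xs, ys):
    """Leave-one-out predictions; None in xs marks a missing measurement."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        # The fill value comes from the training fold only, so the
        # held-out sample never leaks into preprocessing.
        fill = mean([x for x in train_x if x is not None])
        train_x = [fill if x is None else x for x in train_x]
        test_x = fill if xs[i] is None else xs[i]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * test_x + intercept)
    return preds


def r_squared(y_true, y_pred):
    """Coefficient of determination over the pooled predictions."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100 * mean([abs((t - p) / t) for t, p in zip(y_true, y_pred)])


# Made-up toy values: one substance concentration and the complex odor.
substance = [1.0, 2.0, None, 4.0, 5.0, 6.0]
odor = [12.0, 19.0, 33.0, 41.0, 52.0, 58.0]

preds = loocv_predictions(substance, odor)
print("R-squared:", r_squared(odor, preds))
print("MAPE (%):", mape(odor, preds))
```

With the full set of 15 substances, the same structure would apply with a multivariate imputer and model in place of the one-variable helpers.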


Research 2

  • Features related to measurement: measurement time (year, month, day), measurement location (inside the pig barn, outside the pig barn, site boundary)
  • Summary
    • Perform data preprocessing based on the findings of Research 1 and compare multiple machine learning models
    • Reduce the risk of overfitting by repeating the analysis 30 times, and select the optimal model using 8 evaluation metrics
    • Identify the influence and interaction effects of odorous substances through XAI methods
  • Data pre-processing
    • Complex odor: converted from continuous values into a binary class (emission permitted / emission not permitted) in accordance with the domestic odor prevention law
    • Measurement-related variables: measurement time is converted into a season variable and then one-hot encoded; measurement location is one-hot encoded
    • Variable preprocessing: multivariate imputation (Bayesian ridge) and standardization
  • Prediction models: k-Nearest Neighbor, SVC, RandomForest, LightGBM, ExtraTree, XGBoost
  • Model validation: F1-score, Accuracy, Sensitivity, Specificity
  • Identification of influence: XAI - Partial Dependence Plot, variable importance
  • Additional analysis: correlation analysis and VIF (continuous variables), ANOVA (categorical variables)
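
The Research 2 preprocessing and validation metrics can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the repository's implementation: the month-to-season mapping, the category labels, the choice of "limit exceeded" as the positive class, and the `limit` value are all hypothetical, because the source does not state them.

```python
SEASONS = ("spring", "summer", "autumn", "winter")
LOCATIONS = ("inside", "outside", "boundary")


def month_to_season(month):
    """Assumed mapping: Mar-May spring, Jun-Aug summer, Sep-Nov autumn."""
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "autumn"
    return "winter"


def one_hot(value, categories):
    """One indicator column per category, 1 where the value matches."""
    return [1 if value == c else 0 for c in categories]


def encode_measurement(month, location):
    """Season indicators followed by location indicators."""
    return one_hot(month_to_season(month), SEASONS) + one_hot(location, LOCATIONS)


def binarize(odor, limit):
    """1 if the complex odor exceeds the limit, else 0 (limit is a placeholder)."""
    return 1 if odor > limit else 0


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # recall on the positive class
        "specificity": _ratio(tn, tn + fp),  # recall on the negative class
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up example: a July measurement at the site boundary.
print(encode_measurement(7, "boundary"))
print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

When the analysis is repeated with different random splits, these metrics would be collected per run and averaged before models are compared.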

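
A partial dependence curve, the XAI tool named in Research 2, can be computed by hand to show what it measures: one feature is forced to each value on a grid for every row, and the model's predictions are averaged. The sketch below is illustrative only. `toy_predict` is a hypothetical stand-in for a fitted classifier's score, and the feature values are invented.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical model score: linear in two substance concentrations."""
    first, second = row
    return 0.1 * first + 0.2 * second


# Made-up rows of (substance 1, substance 2) concentrations.
rows = [(1.0, 1.0), (3.0, 2.0)]
grid = [0.0, 10.0]

curve = partial_dependence(toy_predict, rows, feature_index=0, grid=grid)
print(list(zip(grid, curve)))
```

Passing two feature indices and a two-dimensional grid in the same way gives a two-way partial dependence surface, which is one common way to inspect interaction effects between substances.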
