[BUG]: Pycaret : target column-Missing values Handling Issue #3938

neelgandhi77 · 2024-03-08T09:55:46Z

pycaret version checks

I have checked that this issue has not already been reported here.
I have confirmed this bug exists on the latest version of pycaret.
I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).

Issue Description

When the target column contains NaN values and is passed as the target to PyCaret's setup, a missing-value error is raised. The available imputation methods work for the other columns, but not for the chosen target column.

Reproducible Example

'''
Dataset at a Glance
Country	Year	Status	Life expectancy	Adult Mortality	infant deaths	Alcohol	percentage expenditure	Hepatitis B	Measles	...	Polio	Total expenditure	Diphtheria	HIV/AIDS	GDP	Population	thinness 1-19 years	thinness 5-9 years	Income composition of resources	Schooling
0	Afghanistan	2015	Developing	65.0	263.0	62	0.01	71.279624	65.0	1154	...	6.0	8.16	65.0	0.1	584.259210	33736494.0	17.2	17.3	0.479	10.1
1	Afghanistan	2014	Developing	59.9	271.0	64	0.01	73.523582	62.0	492	...	58.0	8.18	62.0	0.1	612.696514	327582.0	17.5	17.5	0.476	10.0

'''

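import pandas as pd
# The target is continuous, so only the regression module is needed.
# (With both wildcard imports, the regression setup shadowed the classification one anyway.)
from pycaret.regression import setup

data = pd.read_csv('Life Expectancy.csv')
data.head()
data.columns = data.columns.str.strip()  # strip stray whitespace from column names

# Everything above runs successfully.

s = setup(data, target='Life expectancy', numeric_imputation="mean", categorical_imputation="mode")
# Raises ValueError: missing values found in the target column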

Expected Behavior

Missing values in the target column should be handled by the selected imputation method, and setup should complete successfully.
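Until something like this is supported, the target would presumably have to be imputed by hand before calling setup. The snippet below is only a minimal sketch of that idea using plain pandas: the small frame is a made-up stand-in for 'Life Expectancy.csv' (its values are illustrative, not taken from the dataset), and mean imputation is just one possible choice.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for 'Life Expectancy.csv'; values are illustrative only.
data = pd.DataFrame(
    {
        "Adult Mortality": [263.0, 271.0, 268.0, 272.0],
        "Life expectancy": [65.0, 59.9, np.nan, 59.5],
    }
)

target = "Life expectancy"

# Fill missing target values with the mean of the observed target values.
data[target] = data[target].fillna(data[target].mean())

# Count what is left; setup's target check should no longer find NaN.
missing_after = int(data[target].isna().sum())
print(missing_after)

# setup(data, target=target, ...) could then be called on this frame.
```

Filling the target this way means the model is trained partly on labels that were never observed, so the scores reported afterwards may be optimistic.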

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 s = setup(data, target = 'Life expectancy',numeric_imputation="mean",categorical_imputation="mode")

File c:\Users\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycaret\regression\functional.py:593, in setup(data, data_func, target, index, train_size, test_data, ordinal_features, numeric_features, categorical_features, date_features, text_features, ignore_features, keep_features, preprocess, create_date_columns, imputation_type, numeric_imputation, categorical_imputation, iterative_imputation_iters, numeric_iterative_imputer, categorical_iterative_imputer, text_features_method, max_encoding_ohe, encoding_method, rare_to_value, rare_value, polynomial_features, polynomial_degree, low_variance_threshold, group_features, drop_groups, remove_multicollinearity, multicollinearity_threshold, bin_numeric_features, remove_outliers, outliers_method, outliers_threshold, transformation, transformation_method, normalize, normalize_method, pca, pca_method, pca_components, feature_selection, feature_selection_method, feature_selection_estimator, n_features_to_select, transform_target, transform_target_method, custom_pipeline, custom_pipeline_position, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, html, session_id, system_log, log_experiment, experiment_name, experiment_custom_tags, log_plots, log_profile, log_data, verbose, memory, profile, profile_kwargs)
    591 exp = _EXPERIMENT_CLASS()
    592 set_current_experiment(exp)
--> 593 return exp.setup(
    594     data=data,
    595     data_func=data_func,
    596     target=target,
    597     index=index,
    598     train_size=train_size,
    599     test_data=test_data,
    600     ordinal_features=ordinal_features,
    601     numeric_features=numeric_features,
    602     categorical_features=categorical_features,
    603     date_features=date_features,
    604     text_features=text_features,
    605     ignore_features=ignore_features,
    606     keep_features=keep_features,
    607     preprocess=preprocess,
    608     create_date_columns=create_date_columns,
    609     imputation_type=imputation_type,
    610     numeric_imputation=numeric_imputation,
    611     categorical_imputation=categorical_imputation,
    612     iterative_imputation_iters=iterative_imputation_iters,
    613     numeric_iterative_imputer=numeric_iterative_imputer,
    614     categorical_iterative_imputer=categorical_iterative_imputer,
    615     text_features_method=text_features_method,
    616     max_encoding_ohe=max_encoding_ohe,
    617     encoding_method=encoding_method,
    618     rare_to_value=rare_to_value,
    619     rare_value=rare_value,
    620     polynomial_features=polynomial_features,
    621     polynomial_degree=polynomial_degree,
    622     low_variance_threshold=low_variance_threshold,
    623     group_features=group_features,
    624     drop_groups=drop_groups,
    625     remove_multicollinearity=remove_multicollinearity,
    626     multicollinearity_threshold=multicollinearity_threshold,
    627     bin_numeric_features=bin_numeric_features,
    628     remove_outliers=remove_outliers,
    629     outliers_method=outliers_method,
    630     outliers_threshold=outliers_threshold,
    631     transformation=transformation,
    632     transformation_method=transformation_method,
    633     normalize=normalize,
    634     normalize_method=normalize_method,
    635     pca=pca,
    636     pca_method=pca_method,
    637     pca_components=pca_components,
    638     feature_selection=feature_selection,
    639     feature_selection_method=feature_selection_method,
    640     feature_selection_estimator=feature_selection_estimator,
    641     n_features_to_select=n_features_to_select,
    642     transform_target=transform_target,
    643     transform_target_method=transform_target_method,
    644     custom_pipeline=custom_pipeline,
    645     custom_pipeline_position=custom_pipeline_position,
    646     data_split_shuffle=data_split_shuffle,
    647     data_split_stratify=data_split_stratify,
    648     fold_strategy=fold_strategy,
    649     fold=fold,
    650     fold_shuffle=fold_shuffle,
    651     fold_groups=fold_groups,
    652     n_jobs=n_jobs,
    653     use_gpu=use_gpu,
    654     html=html,
    655     session_id=session_id,
    656     system_log=system_log,
    657     log_experiment=log_experiment,
    658     experiment_name=experiment_name,
    659     experiment_custom_tags=experiment_custom_tags,
    660     log_plots=log_plots,
    661     log_profile=log_profile,
    662     log_data=log_data,
    663     verbose=verbose,
    664     memory=memory,
    665     profile=profile,
    666     profile_kwargs=profile_kwargs,
    667 )

File c:\Users\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycaret\regression\oop.py:711, in RegressionExperiment.setup(self, data, data_func, target, index, train_size, test_data, ordinal_features, numeric_features, categorical_features, date_features, text_features, ignore_features, keep_features, preprocess, create_date_columns, imputation_type, numeric_imputation, categorical_imputation, iterative_imputation_iters, numeric_iterative_imputer, categorical_iterative_imputer, text_features_method, max_encoding_ohe, encoding_method, rare_to_value, rare_value, polynomial_features, polynomial_degree, low_variance_threshold, group_features, drop_groups, remove_multicollinearity, multicollinearity_threshold, bin_numeric_features, remove_outliers, outliers_method, outliers_threshold, transformation, transformation_method, normalize, normalize_method, pca, pca_method, pca_components, feature_selection, feature_selection_method, feature_selection_estimator, n_features_to_select, transform_target, transform_target_method, custom_pipeline, custom_pipeline_position, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, html, session_id, system_log, log_experiment, experiment_name, experiment_custom_tags, log_plots, log_profile, log_data, engine, verbose, memory, profile, profile_kwargs)
    708 if data_func is not None:
    709     data = data_func()
--> 711 self.data = self._prepare_dataset(data, target)
    712 self.target_param = self.data.columns[-1]
    713 self.index = index

File c:\Users\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycaret\internal\preprocess\preprocessor.py:165, in Preprocessor._prepare_dataset(self, X, y)
    163 # Check that y has no missing values
    164 if y.isna().any():
--> 165     raise ValueError(
    166         f"{y.isna().sum()} missing values found in the target column: "
    167         f"{y.name}. To proceed, remove the respective rows from the data. "
    168     )
    170 return df_shrink_dtypes(
    171     X.merge(y.to_frame(), left_index=True, right_index=True)
    172 )

ValueError: 10 missing values found in the target column: Life expectancy. To proceed, remove the respective rows from the data.
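The error message itself points at a workaround: drop the rows whose target is missing before calling setup. A minimal sketch of that step with plain pandas follows; the small frame is a made-up stand-in for 'Life Expectancy.csv' (illustrative values only), and the names n_missing and clean are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for 'Life Expectancy.csv'; values are illustrative only.
data = pd.DataFrame(
    {
        "Adult Mortality": [263.0, 271.0, 268.0, 272.0],
        "Life expectancy": [65.0, 59.9, np.nan, 59.5],
    }
)

target = "Life expectancy"

# Same kind of check that _prepare_dataset performs on the target column.
n_missing = int(data[target].isna().sum())
print(f"{n_missing} missing values in target column")

# Remove rows with a missing target and rebuild a clean 0..n-1 index.
clean = data.dropna(subset=[target]).reset_index(drop=True)

# setup(clean, target=target, numeric_imputation="mean", ...) could then be called;
# the imputation options would still apply to the remaining feature columns.
```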

Pycaret 3.2.0


neelgandhi77 added the bug Something isn't working label Mar 8, 2024
