Getting ValueError when running the notebook with XGBoost on the Titanic dataset. #3
Description
Hi,
Thanks for sharing your work!
I just tested the Titanic dataset, downloaded from https://www.kaggle.com/c/titanic/data, with XGBoost as follows:
m, feats, trainm, testm = Auto_ViML(train, target, test, sample_submission, scoring_parameter=scoring_parameter, hyper_param='GS', feature_reduction=True, Boosting_Flag=True, Binning_Flag=False, Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False, verbose=1)
After running the above code, I got the following error:
ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name
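For anyone trying to reproduce this, here is a minimal sketch of how to spot the columns that would trip this check before the data reaches the model. It only assumes the rule stated in the error message (int, float or bool are accepted). The helper name `find_unsupported_columns` and the three toy rows are hypothetical and are not part of Auto_ViML or XGBoost.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def find_unsupported_columns(df):
    """Return the names of columns whose dtype is not int, float or bool.

    Hypothetical helper: it mirrors the rule quoted in the error message,
    not the actual XGBoost implementation.
    """
    return [
        col
        for col in df.columns
        if not (is_numeric_dtype(df[col]) or is_bool_dtype(df[col]))
    ]


# Toy frame shaped like a few Titanic columns (illustrative values only).
sample = pd.DataFrame(
    {
        "Pclass": [3, 1, 3],
        "Age": [22.0, 38.0, 26.0],
        "Name": ["passenger one", "passenger two", "passenger three"],
    }
)

unsupported = find_unsupported_columns(sample)
print(unsupported)
```

Running the same helper on the frame that is actually passed to the model should show whether `Name` is the only string column left at that point.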
The same error also seems to occur with Boosting_Flag=None. The console log just before the error is below:
Train (Size: 891,12) has Single_Label with target: ['Survived']
"
################### Binary-Class ##################### "
Shuffling the data set before training
Class -> Counts -> Percent
1: 342 -> 38.4%
0: 549 -> 61.6%
Selecting 2-Class Classifier...
Using GridSearchCV for Hyper Parameter tuning...
Target Survived is already numeric. No transformation done.
Top columns in Train with missing values: ['Cabin', 'Age', 'Embarked']
and their missing value totals: [687, 177, 2]
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 3
Number of String-Categorical Columns = 1
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 1
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 2
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 2
Number of Columns to Delete = 0
11 Predictors classified...
This does not include the Target column(s)
2 variables removed since they were some ID or low-information variables
Completed Label Encoding, Missing Value Imputing and Scaling of data without errors.
No Missing values in Train
Test data has no missing values
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data
Data Ready for Modeling with Target variable = Survived
Starting Selection among 11 predictors...
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data
Adding 6 categorical variables to reduced numeric variables of 5
Selected No. of variables = 11
Finding Important Features...
in 11 variables
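As a possible workaround until this is fixed, the string columns could be removed or encoded before the data is handed over. The sketch below is only an assumption about what might help, not the library's own fix: `make_numeric` is a hypothetical helper, the choice to drop `Name` is mine, and the toy rows are illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def make_numeric(df, drop_cols=("Name",)):
    """Drop free-text columns and integer-encode the remaining non-numeric ones.

    Hypothetical helper for illustration; not part of Auto_ViML.
    """
    # Drop only the listed columns that are actually present.
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    for col in out.columns:
        if not is_numeric_dtype(out[col]):
            # Category codes map each distinct value to an integer;
            # missing values are encoded as -1.
            out[col] = out[col].astype("category").cat.codes
    return out


toy = pd.DataFrame(
    {
        "Name": ["a", "b", "c"],
        "Sex": ["male", "female", "female"],
        "Age": [22.0, 38.0, 26.0],
    }
)

cleaned = make_numeric(toy)
print(cleaned.dtypes)
```

One caveat: encoding train and test separately this way can assign different integers to the same value, so in practice the mapping should be fitted on train and reused on test.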