Getting ValueError when running notebook with XGBoost on Titanic dataset #3

@dsbyprateekg

Hi,

Thanks for sharing your work!
I just tested the Titanic dataset downloaded from https://www.kaggle.com/c/titanic/data with XGBoost, as follows:
m, feats, trainm, testm = Auto_ViML(train, target, test, sample_submission, scoring_parameter=scoring_parameter, hyper_param='GS', feature_reduction=True, Boosting_Flag=True, Binning_Flag=False, Add_Poly=0, Stacking_Flag=False, Imbalanced_Flag=False, verbose=1)

Running the above code produced the following error:
ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in fields Name

The same error seems to occur with Boosting_Flag=None. The console log just before the error is below:

Train (Size: 891,12) has Single_Label with target: ['Survived']
"
################### Binary-Class ##################### "
Shuffling the data set before training
Class -> Counts -> Percent
1: 342 -> 38.4%
0: 549 -> 61.6%
Selecting 2-Class Classifier...
Using GridSearchCV for Hyper Parameter tuning...
Target Survived is already numeric. No transformation done.
Top columns in Train with missing values: ['Cabin', 'Age', 'Embarked']
and their missing value totals: [687, 177, 2]
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 3
Number of String-Categorical Columns = 1
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 1
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 2
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 2
Number of Columns to Delete = 0
11 Predictors classified...
This does not include the Target column(s)
2 variables removed since they were some ID or low-information variables
Completed Label Encoding, Missing Value Imputing and Scaling of data without errors.
No Missing values in Train
Test data has no missing values
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data

Data Ready for Modeling with Target variable = Survived
Starting Selection among 11 predictors...
Number of numeric variables = 5
No variables were removed since no highly correlated variables found in data
Adding 6 categorical variables to reduced numeric variables of 5
Selected No. of variables = 11
Finding Important Features...
in 11 variables
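
The error message says the data passed to XGBoost may only contain int, float or bool columns, and it names the Name field. A minimal sketch for checking which columns of a pandas DataFrame would fail that requirement is below. The helper name `non_numeric_columns` and the sample values are made up for illustration; this is not part of Auto_ViML and only shows one way to inspect the dtypes.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return names of columns whose dtype is not numeric or bool.

    Hypothetical helper for illustration; not an Auto_ViML function.
    """
    return df.select_dtypes(exclude=["number", "bool"]).columns.tolist()


# Tiny made-up frame that mimics a few Titanic columns (values are illustrative).
sample = pd.DataFrame({
    "Pclass": [3, 1],
    "Age": [22.0, 38.0],
    "Name": ["Passenger A", "Passenger B"],
    "Sex": ["male", "female"],
})

# Columns listed here still hold strings and would need encoding or dropping.
offending = non_numeric_columns(sample)
print(offending)
```

Running the same check on the actual train DataFrame right before the failing step would show whether Name (or any other string column) is still present with an object dtype.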
