diff --git a/predict-credit-churn/CreditChurn.ipynb b/predict-credit-churn/CreditChurn.ipynb
index 121b21c..eeb399e 100644
--- a/predict-credit-churn/CreditChurn.ipynb
+++ b/predict-credit-churn/CreditChurn.ipynb
@@ -94,18 +94,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First let's use some built in functions from EvalML to convert the data to a woodwork data structure and then cast its dtypes to something we'd rather work with. Then we're going to take a look at some of the unqiue, non-numeric values in the features. Sure enough, `Education_Level`, `Marital_Status`, and `Income_Category` have `Unknown` as a value. This is something we'll have to remember before we get to the model training, since `Unknown` isn't an acceptable value for any of the features."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from evalml.utils.gen_utils import _convert_to_woodwork_structure, _convert_woodwork_types_wrapper\n",
-    "data = _convert_to_woodwork_structure(data)\n",
-    "data = _convert_woodwork_types_wrapper(data.to_dataframe())"
+    "We're going to take a look at some of the unique, non-numeric values in the features. Sure enough, `Education_Level`, `Marital_Status`, and `Income_Category` have `Unknown` as a value. This is something we'll have to remember before we get to the model training, since `Unknown` isn't an acceptable value for any of the features."
    ]
   },
   {
@@ -183,7 +172,7 @@
    "outputs": [],
    "source": [
     "X = data.copy()\n",
-    "data = data.drop(['Credit_Limit'], axis=1)\n",
+    "X = X.drop(['Credit_Limit'], axis=1)\n",
     "y = X.pop('Attrition_Flag')\n",
     "\n",
     "X['Income_Category'] = X['Income_Category'].replace({'Less than $40K':0,\n",
@@ -230,6 +219,25 @@
     "X = preprocessing(X, y)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Using `infer_feature_types`, we can convert our dataset into a [Woodwork](https://github.com/alteryx/woodwork) data structure, and even [specify what types](https://evalml.alteryx.com/en/stable/user_guide/automl.html) certain features should be. For example, we want to cast `Income_Category` as a categorical type, rather than the natural language type it was inferred as."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from evalml.utils.gen_utils import infer_feature_types\n",
+    "X = infer_feature_types(X, feature_types={'Income_Category': 'categorical',\n",
+    " 'Education_Level': 'categorical'})\n",
+    "X"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},