### Encoding categorical features
The ```categories``` and ```dummies``` arguments of the ```df_to_pandas``` method can be used to encode categorical features. In this example, we create an encoding template for the ```ACSIncome``` task, named ```ACSIncome_categories```, based on Appendix B.1 of the original [folktables paper](https://arxiv.org/abs/2108.04884). Note that the encodings of categorical features are subject to change; updates and additional information can be found in the [ACS PUMS documentation](https://www.census.gov/programs-surveys/acs/microdata/documentation.html).

In [None]:
from folktables import ACSDataSource, ACSIncome, generate_categories

The ```categories``` argument of the ```df_to_pandas``` method expects a nested dict that maps each categorical column name to a dict from raw codes to their descriptive labels.

In [None]:
ACSIncome_categories = {
    "COW": {
        1.0: (
            "Employee of a private for-profit company or "
            "business, or of an individual, for wages, "
            "salary, or commissions"
        ),
        2.0: (
            "Employee of a private not-for-profit, tax-exempt, "
            "or charitable organization"
        ),
        3.0: "Local government employee (city, county, etc.)",
        4.0: "State government employee",
        5.0: "Federal government employee",
        6.0: (
            "Self-employed in own not incorporated business, "
            "professional practice, or farm"
        ),
        7.0: (
            "Self-employed in own incorporated business, "
            "professional practice or farm"
        ),
        8.0: "Working without pay in family business or farm",
        9.0: "Unemployed and last worked 5 years ago or earlier or never worked",
    },
    "SCHL": {
        1.0: "No schooling completed",
        2.0: "Nursery school, preschool",
        3.0: "Kindergarten",
        4.0: "Grade 1",
        5.0: "Grade 2",
        6.0: "Grade 3",
        7.0: "Grade 4",
        8.0: "Grade 5",
        9.0: "Grade 6",
        10.0: "Grade 7",
        11.0: "Grade 8",
        12.0: "Grade 9",
        13.0: "Grade 10",
        14.0: "Grade 11",
        15.0: "12th grade - no diploma",
        16.0: "Regular high school diploma",
        17.0: "GED or alternative credential",
        18.0: "Some college, but less than 1 year",
        19.0: "1 or more years of college credit, no degree",
        20.0: "Associate's degree",
        21.0: "Bachelor's degree",
        22.0: "Master's degree",
        23.0: "Professional degree beyond a bachelor's degree",
        24.0: "Doctorate degree",
    },
    "MAR": {
        1.0: "Married",
        2.0: "Widowed",
        3.0: "Divorced",
        4.0: "Separated",
        5.0: "Never married or under 15 years old",
    },
    "SEX": {1.0: "Male", 2.0: "Female"},
    "RAC1P": {
        1.0: "White alone",
        2.0: "Black or African American alone",
        3.0: "American Indian alone",
        4.0: "Alaska Native alone",
        5.0: (
            "American Indian and Alaska Native tribes specified; "
            "or American Indian or Alaska Native, "
            "not specified and no other"
        ),
        6.0: "Asian alone",
        7.0: "Native Hawaiian and Other Pacific Islander alone",
        8.0: "Some Other Race alone",
        9.0: "Two or More Races",
    },
}
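
To make the role of the nested dict concrete, here is a minimal sketch on a toy dataframe. It uses plain pandas (```Series.map```) rather than folktables, so it illustrates the idea of the mapping and is not necessarily how ```df_to_pandas``` is implemented. The names ```toy_categories```, ```toy_df```, and ```apply_categories``` are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical toy encoding in the same nested-dict shape as ACSIncome_categories.
toy_categories = {
    "SEX": {1.0: "Male", 2.0: "Female"},
    "MAR": {1.0: "Married", 5.0: "Never married or under 15 years old"},
}

# Toy data with raw numeric codes, plus a numeric column that is left alone.
toy_df = pd.DataFrame(
    {"AGEP": [25, 40, 61], "SEX": [1.0, 2.0, 1.0], "MAR": [5.0, 1.0, 1.0]}
)


def apply_categories(df, categories):
    """Return a copy of df with coded columns replaced by their labels."""
    encoded = df.copy()
    for column, mapping in categories.items():
        # Codes missing from the mapping become NaN in this sketch.
        encoded[column] = encoded[column].map(mapping)
    return encoded


toy_encoded = apply_categories(toy_df, toy_categories)
print(toy_encoded)
```

Columns that do not appear in the dict (here ```AGEP```) keep their numeric values.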


We set the ```dummies``` argument to ```True``` to create dummy variables for all categorical features (those encoded via the ```categories``` argument). The default value is ```False```, which returns a pandas dataframe with the encoded categorical features but without dummy columns.

In [None]:
data_source = ACSDataSource(survey_year='2018', horizon='1-Year', survey='person')
ca_data = data_source.get_data(states=["CA"], download=True)

ca_features, ca_labels, _ = ACSIncome.df_to_pandas(ca_data, categories=ACSIncome_categories, dummies=True)

ca_features.head()

In [None]:
ca_labels.head()
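
As a rough illustration of what dummy variables look like, the sketch below applies ```pd.get_dummies``` to a small, made-up dataframe whose categorical columns already hold labels. It is a standalone pandas example under that assumption, not a description of folktables internals; ```toy_encoded``` and ```toy_dummies``` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy frame whose categorical columns are already label-encoded.
toy_encoded = pd.DataFrame(
    {
        "AGEP": [25, 40, 61],
        "SEX": ["Male", "Female", "Male"],
        "MAR": ["Never married", "Married", "Married"],
    }
)

# One indicator column per (feature, label) pair; numeric columns pass through.
toy_dummies = pd.get_dummies(toy_encoded)
print(sorted(toy_dummies.columns))
```

Each label becomes its own indicator column named ```<feature>_<label>```, which is why readable labels from the ```categories``` argument make the resulting feature names easier to interpret.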

## Automatically generated categorical feature encoding
The ```categories``` argument can be generated automatically: the ```get_definitions``` method downloads the definitions available [here](https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/), and the ```generate_categories``` function creates the categories dictionary from them.
Note that this feature only works for ```survey_year``` >= '2017' and for categories that do not need the externally defined ```PUMA``` codes ([PUMA codes](https://www2.census.gov/geo/docs/reference/puma/2010_PUMA_Names.txt)).

In [None]:
definition_df = data_source.get_definitions(download=True)
categories = generate_categories(features=ACSIncome.features, definition_df=definition_df)

ca_features, ca_labels, _ = ACSIncome.df_to_pandas(ca_data, categories=categories, dummies=True)

ca_features.head()

In [None]:
ca_labels.head()
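
The general idea of turning a table of definitions into a categories dictionary can be sketched as follows. This is only a conceptual illustration: the column names ```feature```, ```code```, and ```label``` and the helper ```build_categories``` are hypothetical, and the sketch makes no claim to match the layout of ```definition_df``` or the behavior of ```generate_categories```.

```python
import pandas as pd

# Hypothetical long-format definitions: one row per (feature, code, label).
# These column names are assumptions made for this example only.
toy_definitions = pd.DataFrame(
    {
        "feature": ["SEX", "SEX", "MAR", "MAR"],
        "code": [1.0, 2.0, 1.0, 2.0],
        "label": ["Male", "Female", "Married", "Widowed"],
    }
)


def build_categories(definitions, features):
    """Collect {feature: {code: label}} for the requested features."""
    categories = {}
    for feature, rows in definitions.groupby("feature"):
        if feature in features:
            categories[feature] = dict(zip(rows["code"], rows["label"]))
    return categories


# AGEP has no rows in the toy table, so it is simply absent from the result.
toy_categories = build_categories(toy_definitions, features=["SEX", "MAR", "AGEP"])
print(toy_categories)
```

The result has the same nested-dict shape as the hand-written ```ACSIncome_categories``` template, so either can be passed as the ```categories``` argument.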