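# Category Encoders for scikit-learn

#### The Category Encoders package (`category_encoders`)
   * a scikit-learn-contrib project, separate from scikit-learn itself; largely derived from the Patsy package, the formula library used by statsmodels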
   
   #### Classic Encoders:
        * Ordinal: convert string labels to integer values 1 through *k*. Suits ordinal data.
        * OneHot: one 0/1 column per value, each comparing that value against all other values. Suits nominal and ordinal data.
        * Binary: ordinal-encode the labels, then write each integer in binary digits, with one column per digit. Some info loss but fewer dimensions. Suits ordinal data.
        * BaseN: generalizes Ordinal and Binary to an encoding in any chosen base. Suits nominal and ordinal data. Doesn’t add much functionality. Probably avoid.
        * Hashing: like OneHot but with fewer dimensions, at the cost of some info loss due to hash collisions. Suits nominal and ordinal data.
        * Sum: just like OneHot except that one value is held out and encoded as -1 across all columns.
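
To make the column counts above concrete, here is a minimal hand-rolled sketch of the classic encodings in plain Python. It is an illustration only, not the `category_encoders` implementation: the function names, the SHA-256 hash, the choice of four hash columns, and the choice of the last level as the -1 row for Sum are all assumptions made for this example.

```python
import hashlib


def ordinal_encode(values):
    """Map each distinct label to an integer 1..k, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return [mapping[v] for v in values], mapping


def one_hot_encode(codes, k):
    """k columns; exactly one 1 per row."""
    return [[1 if code == j else 0 for j in range(1, k + 1)] for code in codes]


def binary_encode(codes, k):
    """Write each ordinal code in base 2, one column per binary digit."""
    width = max(1, k.bit_length())  # digits needed to write the largest code, k
    return [[(code >> shift) & 1 for shift in reversed(range(width))]
            for code in codes]


def sum_encode(codes, k):
    """k - 1 columns; the last level (an assumption here) is -1 everywhere."""
    rows = []
    for code in codes:
        if code == k:
            rows.append([-1] * (k - 1))
        else:
            rows.append([1 if code == j else 0 for j in range(1, k)])
    return rows


def hash_encode(values, n_components=4):
    """Fixed number of columns; distinct labels may collide in one column."""
    rows = []
    for v in values:
        digest = hashlib.sha256(v.encode("utf-8")).hexdigest()
        row = [0] * n_components
        row[int(digest, 16) % n_components] = 1
        rows.append(row)
    return rows


cities = ["Lima", "Oslo", "Rome", "Lima", "Cairo", "Kyiv"]
codes, mapping = ordinal_encode(cities)   # codes == [1, 2, 3, 1, 4, 5]
k = len(mapping)                          # k == 5 unique values

one_hot = one_hot_encode(codes, k)        # 5 columns
binary = binary_encode(codes, k)          # 3 columns
sum_coded = sum_encode(codes, k)          # 4 columns
hashed = hash_encode(cities)              # 4 columns, regardless of k

print(len(one_hot[0]), len(binary[0]), len(sum_coded[0]), len(hashed[0]))
```

With five unique values, OneHot needs five columns while Binary needs only three, which is the dimensionality saving the list refers to. The hashed width stays at four however many values appear, so some labels will share a column once *k* exceeds it.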
        
   #### Contrast Encoders:
        * The contrast encoders all have multiple issues that I argue make them unlikely to be useful for machine learning. They all output one column for each value found in the original column.
        * Helmert (reverse): the mean of the dependent variable for a level is compared to the mean of the dependent variable over all previous levels.
        * Backward Difference: the mean of the dependent variable for a level is compared with the mean of the dependent variable for the prior level.
        * Polynomial: orthogonal polynomial contrasts. For *k* = 4 levels, the coefficients taken on by polynomial coding are the linear, quadratic, and cubic trends in the categorical variable.
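
The comparisons described above can be sketched directly from the level means. The snippet below is a rough illustration in plain Python, not the encoded columns the library produces: it computes the quantities that the backward-difference and reverse-Helmert contrasts are meant to estimate, using made-up data and hypothetical helper names. It also lists the standard unnormalized orthogonal polynomial coefficients for four evenly spaced levels; software typically rescales these to unit length.

```python
from statistics import mean


def level_means(levels, y, order):
    """Mean of the dependent variable for each level, in the given order."""
    return [mean([v for lv, v in zip(levels, y) if lv == name]) for name in order]


def backward_differences(means):
    """Each level's mean minus the prior level's mean."""
    return [means[i] - means[i - 1] for i in range(1, len(means))]


def reverse_helmert(means):
    """Each level's mean minus the average of all previous levels' means."""
    return [means[i] - mean(means[:i]) for i in range(1, len(means))]


# Unnormalized orthogonal polynomial contrasts for k = 4 evenly spaced levels.
POLY_K4 = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up, balanced example data: two observations per level.
order = ["low", "mid", "high", "top"]
levels = ["low", "low", "mid", "mid", "high", "high", "top", "top"]
y = [1.0, 3.0, 4.0, 6.0, 5.0, 7.0, 10.0, 12.0]

means = level_means(levels, y, order)     # [2.0, 5.0, 6.0, 11.0]
diffs = backward_differences(means)       # [3.0, 1.0, 5.0]
helmert = reverse_helmert(means)          # first two entries: 3.0, 2.5

# Each polynomial contrast sums to zero and is orthogonal to the others.
names = list(POLY_K4)
orthogonal = all(
    dot(POLY_K4[a], POLY_K4[b]) == 0
    for i, a in enumerate(names) for b in names[i + 1:]
)
print(means, diffs, helmert, orthogonal)
```

Each scheme yields *k* - 1 comparisons for *k* levels, and every one of them depends on the ordering of the levels, so they only make sense when that ordering is meaningful.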

   #### Bayesian Encoders:
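       * The Bayesian encoders use information from the dependent variable (DV) in their encodings. They output one column per encoded feature and can work well with high-cardinality data.
       * Target: replace each value with the mean of the DV for that value; you must take steps to avoid overfitting and response leakage. Suits nominal and ordinal data. Typically used for classification tasks, though the same idea applies to a continuous DV.
       * LeaveOneOut: similar to Target, but each row's own DV value is left out of its mean to avoid contamination. Suits nominal and ordinal data. Typically used for classification tasks, though the same idea applies to a continuous DV.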
       * WeightOfEvidence: added in v1.3. Not covered in the docs as of April 11, 2019. The method is explained in this post.
       * James-Stein: forthcoming in v1.4. Described in the code here.
       * M-estimator: forthcoming in v1.4. Described in the code here. A simplified target encoder.
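
As a rough illustration of how the target-based encoders use the dependent variable, the sketch below implements a smoothed category mean and a leave-one-out mean in plain Python. The function names and toy data are hypothetical, and the m-estimate blend shown here is one common textbook form; the smoothing formulas in the library itself may differ.

```python
from collections import defaultdict


def category_stats(categories, targets):
    """Per-category sum and count of the target, plus the overall mean (prior)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    prior = sum(targets) / len(targets)
    return sums, counts, prior


def m_estimate_encoding(categories, targets, m=1.0):
    """Category mean shrunk toward the prior; m = 0 gives the plain target mean."""
    sums, counts, prior = category_stats(categories, targets)
    return {cat: (sums[cat] + m * prior) / (counts[cat] + m) for cat in counts}


def leave_one_out_encoding(categories, targets):
    """Per-row category mean that excludes the row's own target value."""
    sums, counts, prior = category_stats(categories, targets)
    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # A category seen only once has no other rows; fall back to the prior.
            encoded.append(prior)
    return encoded


# Toy data: a categorical feature and a binary dependent variable.
colors = ["red", "red", "red", "blue", "blue", "green"]
bought = [1, 1, 0, 0, 1, 1]

plain = m_estimate_encoding(colors, bought, m=0.0)     # plain target mean
smoothed = m_estimate_encoding(colors, bought, m=2.0)  # shrunk toward 4/6
loo = leave_one_out_encoding(colors, bought)

print(plain["green"], smoothed["green"], loo)
```

The single `green` row shows why leakage matters: its plain target mean is exactly its own label (1.0), whereas the smoothed value is pulled toward the overall mean and the leave-one-out value never sees the row's own label.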
       
   * Note that all Category Encoders impute missing values automatically by default. However, I recommend filling in missing data yourself prior to encoding so you can test the results of several methods.
   * **Some terminology:**
       * *k* is the original number of unique values in your data column
       * *High cardinality* means a lot of unique values (a large *k*)
       * *High dimensionality* means a matrix with many columns (features); it brings the curse of dimensionality, which often results in overfitting
       * *Sparse* data is a matrix with lots of zeroes relative to other values. Some algorithms may not work well with sparse data
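
The notes above can be tied together with a small sketch: fill the missing values yourself in two different ways, then compare the resulting *k* and the share of zeroes a OneHot encoding would produce. This is plain Python with hypothetical helper names and made-up data, with `None` standing in for a missing value.

```python
from collections import Counter


def fill_with_placeholder(values, placeholder="missing"):
    """Treat missingness as its own category."""
    return [placeholder if v is None else v for v in values]


def fill_with_mode(values):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def cardinality(values):
    """k: the number of unique values in the column."""
    return len(set(values))


def one_hot_zero_fraction(values):
    """Share of zeroes in a one-hot matrix: each row has one 1 and k - 1 zeroes."""
    k = cardinality(values)
    return (k - 1) / k


raw = ["a", "b", None, "a", "c", None, "a"]

as_category = fill_with_placeholder(raw)
as_mode = fill_with_mode(raw)

k_category = cardinality(as_category)   # 4: a, b, c, missing
k_mode = cardinality(as_mode)           # 3: a, b, c

print(k_category, one_hot_zero_fraction(as_category))  # 4 0.75
print(k_mode, one_hot_zero_fraction(as_mode))
```

The two fill strategies give different values of *k*, and therefore different widths and sparsity after encoding, which is the reason to try several before settling on one.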
       
        
        
