encoderder

Encode a CSV-like file into a sparse format so that auxiliary features can be used as input to a factorization machine (FM).

Factorization machine implementations commonly take libsvm-format input. This repo was created to make experiments with that format more convenient.

Sparse format for FM

(ref:https://towardsdatascience.com/factorization-machines-for-item-recommendation-with-implicit-feedback-data-5655a7c749db)

As the reference above illustrates, an FM takes users and items as one-hot encoded input, which produces a huge sparse matrix. The scikit-learn OneHotEncoder is not well suited to this kind of task.

Furthermore, we also want to add short-text features or other auxiliary features to the FM. These features can be of any type, e.g. categorical or numerical data, and we want to encode them all, which is why this tool was developed.
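
To make the idea concrete, here is a minimal sketch (not this repo's implementation) of how user and item IDs could be one-hot encoded directly into libsvm-style lines, so that only the non-zero entries are ever written. The function name `to_libsvm_lines`, the zero-based feature indices, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def to_libsvm_lines(rows: Iterable[Tuple[str, str, float]]) -> List[str]:
    """One-hot encode (user, item, label) rows into libsvm-style lines.

    Each distinct (column, value) pair gets its own feature index, so the
    dense one-hot matrix is never materialized.
    """
    feature_index: Dict[Tuple[str, str], int] = {}
    lines = []
    for user, item, label in rows:
        indices = []
        for column, value in (("user", user), ("item", item)):
            key = (column, value)
            # Assign the next free index the first time a value is seen.
            if key not in feature_index:
                feature_index[key] = len(feature_index)
            indices.append(feature_index[key])
        features = " ".join(f"{i}:1" for i in sorted(indices))
        lines.append(f"{label:g} {features}")
    return lines


# Hypothetical sample data: (user id, item id, rating).
sample = [("u1", "i1", 5.0), ("u2", "i1", 3.0), ("u1", "i2", 1.0)]
for line in to_libsvm_lines(sample):
    print(line)
```

Each output line has the form `label index:value ...`, and a row with one user and one item always has exactly two non-zero features, however many distinct users and items exist.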

Usage

python3 encoderder.py -c [config]

Config

encoderder supports a JSON config format, which looks like:

{
    "train": {
        "input": "./ml.csv",
        "output": "./train.txt",
        "cached": true,
        "seperator": ",",
        "header": true,
        "sparse": true,
        "target_columns": [
            {
                "index": 0,
                "type": "cat"
            },
            {
                "index": 1,
                "type": "cat"
            },
            {
                "index": 2,
                "type": "truth"
            }
        ]
    }
}
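
As a rough illustration of how a config like this could be read, the sketch below parses the JSON and checks that every target column uses one of the documented types. The helper `load_section` and its validation rules are hypothetical and are not taken from encoderder itself.

```python
import json
from typing import Any, Dict

# Feature types documented in this README.
KNOWN_TYPES = {"cat", "num", "truth"}


def load_section(text: str, section: str = "train") -> Dict[str, Any]:
    """Parse a JSON config string and return one validated section."""
    config = json.loads(text)[section]
    for column in config["target_columns"]:
        if column["type"] not in KNOWN_TYPES:
            raise ValueError(f"unknown column type: {column['type']!r}")
        if not isinstance(column["index"], int) or column["index"] < 0:
            raise ValueError(f"invalid column index: {column['index']!r}")
    return config


# A trimmed copy of the example config above.
CONFIG_TEXT = """
{
    "train": {
        "input": "./ml.csv",
        "output": "./train.txt",
        "seperator": ",",
        "header": true,
        "target_columns": [
            {"index": 0, "type": "cat"},
            {"index": 1, "type": "cat"},
            {"index": 2, "type": "truth"}
        ]
    }
}
"""

train = load_section(CONFIG_TEXT)
print(train["input"], [c["type"] for c in train["target_columns"]])
```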

Config attributes

INPUT/OUTPUT:

input : input file
output : output file in sparse format

TARGET COLUMNS:

target_columns : the columns of interest that you want to encode; each entry gives the column index and its feature type. The supported types are:
- cat : categorical data
- num : numerical data
- truth : labeled data
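
The sketch below shows one plausible way the three types could map onto a libsvm-style line. It is an assumption, not a description of encoderder's internals: a `cat` value gets its own feature index with value 1, a `num` column gets a single feature index carrying the raw number, and the `truth` column becomes the label. The function `encode_row` and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vocab = Dict[Tuple[int, Optional[str]], int]


def encode_row(row: Sequence[str], target_columns: List[dict], vocab: Vocab) -> str:
    """Encode one parsed CSV row as 'label index:value ...'."""
    label = "0"
    features: List[Tuple[int, float]] = []
    for column in target_columns:
        idx, kind = column["index"], column["type"]
        value = row[idx]
        if kind == "truth":
            label = value
        elif kind == "cat":
            # One feature index per distinct (column, value) pair.
            feat = vocab.setdefault((idx, value), len(vocab))
            features.append((feat, 1.0))
        elif kind == "num":
            # Assumption: one feature index per numerical column.
            feat = vocab.setdefault((idx, None), len(vocab))
            features.append((feat, float(value)))
        else:
            raise ValueError(f"unknown column type: {kind!r}")
    features.sort()
    body = " ".join(f"{i}:{v:g}" for i, v in features)
    return f"{label} {body}"


columns = [
    {"index": 0, "type": "cat"},
    {"index": 1, "type": "cat"},
    {"index": 2, "type": "num"},
    {"index": 3, "type": "truth"},
]
vocab: Vocab = {}
for row in (["u1", "i1", "0.5", "4"], ["u2", "i1", "2", "1"]):
    print(encode_row(row, columns, vocab))
```

The shared `vocab` dictionary keeps feature indices stable across rows, so the same user or item always maps to the same column of the sparse matrix.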

OTHER CONFIG

cached : turn it on to cache some data in memory, at the cost of higher memory usage
seperator : the symbol that separates the columns
header : turn it on to skip the first line if the dataset contains a header
sparse : turn it on to generate a SciPy COO matrix as output in npz format
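
For readers unfamiliar with the npz output, here is a small sketch of writing and reading a SciPy COO matrix with `scipy.sparse.save_npz` and `scipy.sparse.load_npz`. The matrix contents and the file name `train.npz` are made up for illustration, and the exact layout of the file encoderder writes is not confirmed here.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Two samples, three one-hot features: only non-zero entries are stored.
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 1, 2])
vals = np.array([1.0, 1.0, 1.0, 1.0])
matrix = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.npz")  # hypothetical file name
    sparse.save_npz(path, matrix)
    loaded = sparse.load_npz(path)

print(loaded.shape, loaded.nnz)
print(loaded.toarray())
```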
