
Feature-based Recommendation

Click-through rate (CTR) prediction takes users' and items' features as input and outputs the probability that a user clicks an item. A core task for this problem is learning (high-order) feature interactions, because feature combinations are usually powerful indicators for prediction. We propose a self-attentive neural method (AutoInt) to automatically learn representations of high-order combinatorial features.

In AutoInt, we first project all sparse features (both categorical and numerical) into a low-dimensional space. Next, we feed the embeddings of all fields into multiple stacked interacting layers, each implemented by a self-attentive neural network. The output of the final interacting layer is a low-dimensional representation of the learnt combinatorial features, which is then used to estimate the CTR via a sigmoid function.
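To make the description above concrete, here is a minimal NumPy sketch of one interacting layer followed by a sigmoid output. It assumes standard multi-head dot-product self-attention with a residual connection and ReLU; the function name `interacting_layer`, the weight names, and all shapes are illustrative choices for this sketch, not the repository's TensorFlow code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interacting_layer(emb, w_q, w_k, w_v, w_res):
    """One self-attentive interacting layer (illustrative sketch).

    emb:   (num_field, d_in) field embeddings
    w_q, w_k, w_v: (heads, d_in, d_head) per-head projections
    w_res: (d_in, heads * d_head) residual projection
    """
    q = np.einsum("fd,hde->hfe", emb, w_q)
    k = np.einsum("fd,hde->hfe", emb, w_k)
    v = np.einsum("fd,hde->hfe", emb, w_v)
    # Each field attends to every field: (heads, num_field, num_field).
    attn = softmax(q @ k.transpose(0, 2, 1), axis=-1)
    heads_out = attn @ v                               # (heads, num_field, d_head)
    concat = np.concatenate(list(heads_out), axis=-1)  # (num_field, heads * d_head)
    # Residual connection keeps the original field information.
    return np.maximum(concat + emb @ w_res, 0.0)

rng = np.random.default_rng(0)
num_field, d_in, heads, d_head = 39, 16, 2, 32  # assumed sizes
emb = rng.normal(size=(num_field, d_in))
w_q, w_k, w_v = (0.1 * rng.normal(size=(heads, d_in, d_head)) for _ in range(3))
w_res = 0.1 * rng.normal(size=(d_in, heads * d_head))

out = interacting_layer(emb, w_q, w_k, w_v, w_res)

# Flatten the field representations and map them to a click probability.
w_out = 0.01 * rng.normal(size=out.size)
ctr = 1.0 / (1.0 + np.exp(-(out.reshape(-1) @ w_out)))
```

Stacking several such layers, with the output of one used as the `emb` of the next, corresponds to the "multiple stacked interacting layers" described above.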

Next, we introduce how to run AutoInt on four benchmark data sets.


Requirements

  • Tensorflow 1.4.0-rc1
  • Python 3
  • CUDA 8.0+ (For GPU)


Input Format

AutoInt requires the input data in the following format:

  • train_x: matrix with shape (num_sample, num_field). train_x[s][t] is the feature value of field t of sample s in the dataset. For a categorical feature, the value defaults to 1.
  • train_i: matrix with shape (num_sample, num_field). train_i[s][t] is the feature index of field t of sample s in the dataset. The maximal value of train_i is the feature size.
  • train_y: label of each sample in the dataset.
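As an illustration of this format, the following sketch builds the three arrays from a toy dataset with one numerical field and one categorical field. The field names (`price`, `color`), the helper `index_of`, the 0-based indexing, and the choice to give a numerical field a single shared index are all assumptions of this sketch; the repository's preprocessing may number features differently.

```python
import numpy as np

# Hypothetical toy samples: one numerical field and one categorical field.
samples = [
    {"price": 0.5, "color": "red", "label": 1},
    {"price": 1.2, "color": "blue", "label": 0},
    {"price": 0.1, "color": "red", "label": 0},
]

feature_index = {}  # maps a feature key to a global integer index

def index_of(key):
    # Assign the next free index the first time a key is seen.
    if key not in feature_index:
        feature_index[key] = len(feature_index)
    return feature_index[key]

train_x, train_i, train_y = [], [], []
for s in samples:
    # Numerical field: one shared index, the value is the number itself.
    # Categorical field: one index per distinct category, the value is 1.
    train_i.append([index_of("price"), index_of(("color", s["color"]))])
    train_x.append([s["price"], 1.0])
    train_y.append(s["label"])

train_x = np.array(train_x, dtype=np.float32)  # (num_sample, num_field)
train_i = np.array(train_i, dtype=np.int64)    # (num_sample, num_field)
train_y = np.array(train_y, dtype=np.int64)    # (num_sample,)
feature_size = len(feature_index)              # number of distinct features
```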

If you want to know how to preprocess the data, please refer to data/Dataprocess/Criteo/


We use four public real-world datasets (Avazu, Criteo, KDD12, MovieLens-1M) in our experiments. Since the first three datasets are very large, they cannot fit into memory as a whole. In our implementation, we split the whole dataset into 10 parts, using the first file as the test set and the second file as the validation set. We provide the code for preprocessing these three datasets in data/Dataprocess. If you want to reuse this code, you should first run the dataset's preprocessing script to generate train_x.txt, train_i.txt, train_y.txt as described in Input Format. Then you should run data/Dataprocess/Kfold_split/ to split the whole dataset into ten folds. Finally, you can run the scaling script to scale the numerical values (optional).
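The ten-part split described above could look roughly like the sketch below. It assigns every sample a fold id from 1 to 10 while keeping the label ratio similar across folds, then uses fold 1 for testing and fold 2 for validation. The function `assign_folds`, the label-balanced assignment, and the toy labels are assumptions made for illustration; this is not the script shipped in data/Dataprocess.

```python
import numpy as np

def assign_folds(labels, num_folds=10, seed=0):
    """Return a fold id in 1..num_folds for every sample (illustrative)."""
    rng = np.random.default_rng(seed)
    folds = np.zeros(len(labels), dtype=np.int64)
    for value in np.unique(labels):
        # Shuffle the samples of one class, then deal them out round-robin.
        idx = np.flatnonzero(labels == value)
        rng.shuffle(idx)
        folds[idx] = np.arange(len(idx)) % num_folds + 1
    return folds

# Toy labels: 80 negatives and 20 positives.
labels = np.array([0] * 80 + [1] * 20)
folds = assign_folds(labels)

test_mask = folds == 1   # first part: test set
valid_mask = folds == 2  # second part: validation set
train_mask = folds > 2   # remaining eight parts: training set
```

Storing only the fold id per sample, rather than copying rows, lets each part be written out separately when the full dataset does not fit into memory.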

To help you test the correctness of the code and familiarize yourself with it, we upload the first 10000 samples of the Criteo dataset in train_examples.txt. We also provide scripts for preprocessing and training. (Please refer to data/; you may need to modify the paths in these scripts.)

After you run the preprocessing in data/, you should get a folder named Criteo that contains part*, feature_size.npy, fold_index.npy, and train_*.txt. feature_size.npy contains the total number of features, which is used to initialize the model. train_*.txt is the whole dataset. If you use another small dataset, say MovieLens-1M, you only need to modify the function _run_ in autoint/

Here's how to run the preprocessing.

cd data
mkdir Criteo
python ./Dataprocess/Criteo/
python ./Dataprocess/Kfold_split/
python ./Dataprocess/Criteo/

Here's how to run the training.

CUDA_VISIBLE_DEVICES=0 python -m autoint.train \
                        --data_path data --data Criteo \
                        --blocks 3 --heads 2  --block_shape "[64, 64, 64]" \
                        --is_save --has_residual \
                        --save_path ./models/Criteo/b3h2_64x64x64/ \
                        --field_size 39  --run_times 1 \
                        --epoch 3 --batch_size 1024

You should see output like this:

train logs
start testing!...
restored from ./models/Criteo/b3h2_64x64x64/1/
test-result = 0.8088, test-logloss = 0.4430
test_auc [0.8088305055534442]
test_log_loss [0.44297631300399626]
avg_auc 0.8088305055534442
avg_log_loss 0.44297631300399626
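For readers unfamiliar with the two metrics reported above, the sketch below computes AUC and log loss on a tiny hand-made example. The functions `auc` and `log_loss` are written for this illustration and are not taken from the repository; the pairwise AUC needs memory proportional to the number of positive-negative pairs, so it is only suitable for small arrays.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    # Mean negative log-likelihood of the true labels.
    p = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def auc(y_true, y_prob):
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.6])

example_auc = auc(y_true, y_prob)            # 3 of 4 pairs are ordered correctly
example_logloss = log_loss(y_true, y_prob)
```

A higher AUC and a lower log loss both indicate better predictions.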


If you find AutoInt useful for your research, please consider citing the following paper:

  title={AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks},
  author={Weiping, Song and Chence, Shi and Zhiping, Xiao and Zhijian, Duan and Yewen, Xu and Ming, Zhang and Jian, Tang},
  journal={arXiv preprint arXiv:1810.11921},

Contact information

If you have questions related to the code, feel free to contact Weiping Song, Chence Shi and Zhijian Duan.


This implementation is inspired by Kyubyong Park's transformer and Chenglong Chen's DeepFM.
