
The Aesthetic Multi-Attribute Network (AMAN) contains a Multi-Attribute Feature Network (MAFN), a Channel and Spatial Attention Network (CSAN), and a Language Generation Network (LGN). The core of MAFN contains GFN and AFN, which regress the global score and attribute scores of an image in PCCD using multi-task regression. They share the dense feature map and have separate global and attribute feature maps, respectively. Our AMAN is pre-trained on PCCD and fine-tuned on our DPC-Captions dataset. The CSAN dynamically adjusts the attention weights along the channel and spatial dimensions [6] of the extracted features. The LGN generates the final comments with LSTM networks, which are fed with ground-truth attribute captions from DPC-Captions and attribute feature maps from CSAN.
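The channel/spatial reweighting that CSAN performs can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (the actual CSAN learns its weights with trained layers); here the channel weights come from a softmax over per-channel global averages, and the spatial weights from a softmax over the channel-averaged map, purely for illustration:

```python
import numpy as np

def channel_spatial_attention(feat):
    """Reweight a (C, H, W) feature map along channel, then spatial, dims.

    Illustrative only: weights are derived from simple pooled statistics
    with a softmax, standing in for CSAN's learned attention weights.
    """
    # Channel attention: one weight per channel from its global average.
    chan_logits = feat.mean(axis=(1, 2))                      # shape (C,)
    chan_w = np.exp(chan_logits) / np.exp(chan_logits).sum()  # softmax
    feat = feat * chan_w[:, None, None]

    # Spatial attention: one weight per position from the channel mean.
    spat_logits = feat.mean(axis=0)                           # shape (H, W)
    spat_w = np.exp(spat_logits) / np.exp(spat_logits).sum()  # softmax
    return feat * spat_w[None, :, :]

weighted = channel_spatial_attention(np.random.rand(8, 4, 4))
print(weighted.shape)  # (8, 4, 4)
```

The output keeps the input's shape, so the reweighted attribute feature maps can be passed on to the LGN unchanged in dimensionality.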

DPC-Captions Dataset

The aesthetic attributes of image captions are from PCCD [1], which contains comments and a score for each of 7 aesthetic attributes (including overall impression, etc.). However, the scale of PCCD is quite small (only 4,235 images), while the AVA [2] dataset contains 255,530 images, each with an assessment score distribution. The images and score distributions of the AVA dataset are crawled from a photography website, where comments from multiple reviewers are attached to every image. However, these comments are not arranged by aesthetic attribute. We therefore crawl 330,000 images together with their comments from the same website. We call this dataset AVA-Plus.
Images of DPC-Captions are selected from AVA-Plus with the help of the PCCD dataset. The aesthetic attributes of the PCCD dataset include Color Lighting, Composition, Depth of Field, Focus, General Impression and Use of Camera. For each aesthetic attribute, the top-5 most frequent keywords are selected from the captions. We omit adverbs, prepositions and conjunctions, and we merge words with similar meanings, such as color and colour. Statistics of the keyword frequencies are shown in the table below.
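The keyword-selection step above can be sketched with Python's standard library. The stop-word list, spelling-merge table, and captions here are illustrative stand-ins, not the ones used to build DPC-Captions:

```python
from collections import Counter

# Hypothetical stop-word and spelling-merge tables, for illustration only.
STOPWORDS = {"the", "a", "of", "in", "and", "is", "very", "to", "nice", "great"}
SYNONYMS = {"colour": "color", "colours": "colors"}  # unify spelling variants

def top_keywords(captions, k=5):
    """Return the k most frequent keywords across an attribute's captions."""
    counts = Counter()
    for caption in captions:
        for word in caption.lower().split():
            word = word.strip(".,!?")
            word = SYNONYMS.get(word, word)   # merge colour -> color, etc.
            if word and word not in STOPWORDS:
                counts[word] += 1
    return [w for w, _ in counts.most_common(k)]

captions = ["The colour is warm and the light is soft",
            "Great color and nice lighting",
            "Soft light, warm colours"]
print(top_keywords(captions, k=3))  # e.g. ['color', 'warm', 'light']
```

In the actual pipeline the same counting is applied per aesthetic attribute, and the resulting keywords are then used to assign AVA-Plus comments to attributes.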

(The datasets can be downloaded from the Baidu Netdisk link below.)

PCCD Dataset

PCCD is a nearly fully annotated dataset, containing comments and a score for each of its 7 aesthetic attributes (including overall impression, etc.).

(The dataset is included in the /data directory.)


Requirements

The project code needs to run on Maxwell- or Pascal-architecture graphics cards; incompatibility issues may occur with graphics cards of newer architectures.

Ubuntu 16.04 LTS, NVIDIA driver 384, CUDA 8.0, cuDNN v7.1.3



You can run the file: to get the final file.

Other Files (including the h5 file and other required files)

Baidu Netdisk: Link (password: upgb)

Our Paper

Xin Jin, Le Wu, Geng Zhao, Xiaodong Li, Xiaokun Zhang, Shiming Ge, Dongqing Zou, Bin Zhou, Xinghui Zhou. Aesthetic Attributes Assessment of Images. ACM Multimedia (ACMMM), Nice, France, 21-25 Oct. 2019. pdf-HD(31.1MB) pdf-LR(1.11MB) arXiv(1907.04983)


Please cite the ACM Multimedia paper if you use Aesthetic Multi-Attribute Network in your work:

@inproceedings{Jin2019AestheticAttributes,
  author    = {Xin Jin and Le Wu and Geng Zhao and Xiaodong Li and Xiaokun Zhang and Shiming Ge and Dongqing Zou and Bin Zhou and Xinghui Zhou},
  title     = {Aesthetic Attributes Assessment of Images},
  booktitle = {Proceedings of the 27th {ACM} International Conference on Multimedia,
               {MM} 2019, Nice, France, October 21-25, 2019},
  pages     = {311--319},
  year      = {2019},
  crossref  = {DBLP:conf/mm/2019},
  doi       = {10.1145/3343031.3350970},
  timestamp = {Fri, 06 Dec 2019 16:44:03 +0100},
  bibsource = {dblp computer science bibliography}
}

