This project contains the implementation of the approach proposed in I. Gallo, A. Calefati, S. Nawaz and M.K. Janjua, "Image and Encoded Text Fusion for Multi-Modal Classification", DICTA 2018, Canberra, Australia.
accuracy
dataManagement
imageManipulation
labelManagement
logger
model
modelSaver
parameterManager
patience
result
scripts
tensorflowWrapper
textManagement
view
.gitignore
README.md
extract-dataset-ferramenta.sh
extraction_parameters.csv
requirements.txt
train-on-ferramenta.sh
training_parameters.csv


Multi-modal classification

This code is the implementation of the approach described in:

I. Gallo, A. Calefati, S. Nawaz and M.K. Janjua, "Image and Encoded Text Fusion for Multi-Modal Classification", presented at the 2018 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia, 2018

If you use this code, in whole or in part, please cite our paper.

Pre-requisites

The requirements.txt file lists the dependencies the project needs in order to work. The TensorFlow package is not included, because you can use either the GPU version or the standard one.

Install TensorFlow

To install TensorFlow, follow the instructions at: https://www.tensorflow.org/install

Install dependencies

Install dependencies with:

pip install -r requirements.txt
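
After both installation steps, a quick check can confirm that the packages are visible to the Python interpreter you plan to use. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the list of modules to check is an assumption that you should adapt to the contents of requirements.txt.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "tensorflow" is checked because it is installed separately from
    # requirements.txt; add other module names as needed (assumption).
    missing = missing_modules(["tensorflow"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

The check only looks up the modules without importing them, so it runs quickly even when TensorFlow is installed.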

How to use

  1. Launch the bash script "train-on-ferramenta.sh"

bash train-on-ferramenta.sh

It first downloads the tar.gz files of the dataset, then extracts them and, finally, launches the training process.

  2. When the process is over, launch the "extract-dataset-ferramenta.sh" script

bash extract-dataset-ferramenta.sh

It processes the original dataset and makes a copy of it in which the images contain the encoding of the text information on top.

  3. With the resulting dataset, you can train a simple CNN for image classification that exploits the advantages of this approach.
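
The extraction step produces images that carry an encoding of the text on top. As a rough, toy illustration of that idea only, the sketch below paints a numeric vector as small gray squares along the top of a grayscale image. It is not the repository's encoding: the function name `overlay_encoding`, the block layout, the grayscale format, and the use of values in [0, 1] are all assumptions made for illustration. The actual encoder and layout are defined by the project's code and extraction_parameters.csv.

```python
def overlay_encoding(image, encoding, block=2):
    """Return a copy of `image` with `encoding` painted as squares along the top.

    image: list of rows, each a list of grayscale ints in 0..255.
    encoding: list of floats in [0, 1] (a stand-in for an encoded text vector).
    block: side length, in pixels, of the square used for each value.
    """
    height, width = len(image), len(image[0])
    per_row = width // block  # how many squares fit across the image
    if per_row == 0:
        raise ValueError("block is wider than the image")
    out = [row[:] for row in image]  # copy so the input is left untouched
    for index, value in enumerate(encoding):
        top = (index // per_row) * block
        left = (index % per_row) * block
        if top + block > height:
            raise ValueError("encoding does not fit in the image")
        # Clamp to [0, 1], then scale to the 0..255 grayscale range.
        gray = round(max(0.0, min(1.0, value)) * 255)
        for y in range(top, top + block):
            for x in range(left, left + block):
                out[y][x] = gray
    return out
```

Because the text ends up inside the pixels, a plain image classifier can consume both modalities without a separate text branch, which is the point of step 3.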

Custom parameters

To change the parameters used, edit the values in training_parameters.csv and extraction_parameters.csv.

These two files also specify where to find the dataset to load and its format.

If you want to run our code on your own multi-modal dataset (containing images and text for each sample), please check the format of train.csv and val.csv.
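
One way to check the format of a CSV file is to print its first few records and count the fields in each. The helper below is a minimal sketch, not part of the repository: the name `preview_rows` is hypothetical, and the comma delimiter and UTF-8 encoding are assumptions that may not match the actual train.csv and val.csv files.

```python
import csv
import sys
from itertools import islice


def preview_rows(lines, n_rows=3, delimiter=","):
    """Parse the first `n_rows` records from an iterable of CSV text lines."""
    reader = csv.reader(lines, delimiter=delimiter)
    return list(islice(reader, n_rows))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python preview_csv.py path/to/file.csv  (script name is hypothetical)
    with open(sys.argv[1], newline="", encoding="utf-8") as handle:
        for row in preview_rows(handle):
            print(len(row), "fields:", row)
```

Running it on the dataset's CSV files and then on your own lets you compare the number and order of fields side by side.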

Help

If you have trouble running the code, please contact us:
