forked from oeg-upm/TINTO

Algorithm for converting Tidy Data into Synthetic Images

BorjaRei/TINTO
TINTO is an engine that constructs Synthetic Images from Tidy Data (also known as Tabular Data).

Citing TINTO: If you use TINTO in your work, please cite the INFFUS paper:

@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}

Main Features

  • Supports all CSV data in Tidy Data format.
  • For now, the algorithm only converts tabular data for binary and multi-class classification problems.
  • Input data formats:
    • Tabular files: The input data must be a CSV file that follows the Tidy Data format.
    • Tidy Data: The target (variable to be predicted) should be the last column of the dataset. All preceding columns are the features.
    • All data must be in numerical form. TINTO does not accept data in string or any other non-numeric format.
  • Two dimensionality reduction algorithms are available for image creation: PCA and t-SNE, both from the scikit-learn Python library.
  • The synthetic images are created in black and white, i.e., with 1 channel.
  • The synthetic image dimensions can be set as a parameter when creating them.
  • The synthetic images can be created using characteristic pixels or the blurring painting technique (where overlapping pixels are combined using the maximum or the average).
  • Runs on Linux, Windows and macOS systems.
  • Compatible with Python 3.7 or higher.
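
As a rough illustration of the characteristic-pixel idea, the sketch below assumes that each feature has already been given a 2-D coordinate (for example by PCA or t-SNE) and scales those coordinates onto a square pixel grid, then paints one sample's feature values at those positions. The function names `to_pixel_grid` and `render_sample` are hypothetical and are not part of TINTO; the actual implementation may differ.

```python
def to_pixel_grid(coords, size):
    """Scale 2-D coordinates into integer (col, row) positions on a size x size grid."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero span when all points share one coordinate.
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    return [
        (round((x - min_x) / span_x * (size - 1)),
         round((y - min_y) / span_y * (size - 1)))
        for x, y in coords
    ]


def render_sample(values, pixels, size):
    """Build a single-channel image (list of rows) for one sample.

    Assumption: each feature value is written to its own pixel. If two
    features share a pixel, the later one simply overwrites the earlier one.
    """
    image = [[0.0] * size for _ in range(size)]
    for value, (col, row) in zip(values, pixels):
        image[row][col] = value
    return image


# Three made-up feature coordinates placed on a 5x5 grid.
pixels = to_pixel_grid([(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)], size=5)
image = render_sample([0.1, 0.5, 0.9], pixels, size=5)
```

Each row of the dataset would produce one such image, with the pixel positions shared across all samples.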

Video Documentation

TINTO-short-withSound.mp4

Getting Started

TINTO is easy to use from the terminal.

First, install the required libraries:

    pip install -r requirements.txt

To run the engine via command line and see all the arguments you just need to execute the following:

    python tinto.py -h

[Figure: Help]

The default parameters are the following:

  • Dimensionality reduction algorithm (-alg): Selects the dimensionality reduction algorithm used for image creation, either PCA or t-SNE. The default is PCA.
  • Image size (-px): 20x20 pixels.
  • Blurring (-B): False by default, i.e., the blurring technique is not used and the images are created with characteristic pixels.
  • Amplification (-aB): Only used if blurring is enabled. This is the blurring amplification; the default is π, i.e., approximately 3.141592653589793.
  • Blurring distance (-dB): Only used if blurring is enabled. This is the blurring distance; the default is 0.1 (10%).
  • Blurring steps (-sB): Only used if blurring is enabled. This is the number of blurring steps; the default is 4, i.e., the blurring expands 4 pixels.
  • Blurring option (-oB): Only used if blurring is enabled. This controls how overlapping pixels are combined; the default is mean, i.e., if two pixels overlap, the average of the two overlapping values is used.
  • Save configuration (-sC): Saves the configuration to a pickle object. False by default.
  • Load configuration (-lC): Loads the configuration from a pickle object. False by default.
  • Seed (-sd): Sets the seed for the random numbers. The default is 20.
  • t-SNE times replication (-tt): Only used when t-SNE is selected. The default is 4.
  • Verbose (-v): Shows the execution progress in the terminal. False by default.
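
To make the blurring option (-oB) more concrete, the following minimal sketch shows how several intensity contributions landing on the same pixel could be reduced to a single value with either the mean or the maximum. The helper `combine_overlap` is a hypothetical name used only for illustration; it is not taken from the TINTO source.

```python
def combine_overlap(contributions, option="mean"):
    """Reduce the intensities that fall on one pixel to a single value.

    option="mean"    -> average of the overlapping values
    option="maximum" -> largest of the overlapping values
    """
    if not contributions:
        return 0.0  # nothing was painted on this pixel
    if option == "mean":
        return sum(contributions) / len(contributions)
    if option == "maximum":
        return max(contributions)
    raise ValueError(f"unknown blurring option: {option!r}")


# Two blurred features overlap on the same pixel.
overlap = [0.25, 0.75]
mean_value = combine_overlap(overlap, "mean")
max_value = combine_overlap(overlap, "maximum")
```

With the mean, strongly and weakly activated features are smoothed together; with the maximum, the strongest contribution wins.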

Previous considerations

Please note that the following considerations must be taken into account before running the script:

  • Data must be in CSV format with the default separator, i.e., commas.
  • Images can only be created from data for a binary or multi-class classification problem.
  • The last column should be the target (variable to predict).
  • The preceding columns are the features.
  • All variables must be in numerical format.
  • The script treats the first row as the feature names by default, so every feature must be named.
  • Each sample (row) of the dataset will correspond to an image.

For example, the following table shows how the classic IRIS dataset should look for a run:

sepal length    sepal width    petal length    petal width    target
4.9             3.0            1.4             0.2            1
7.0             3.2            4.7             1.4            2
6.3             3.3            6.0             2.5            3
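
The considerations above can be checked before running the script. The sketch below is one possible pre-flight check using only the Python standard library; `check_tidy_csv` is a hypothetical helper, not something shipped with TINTO. It verifies that there is a named header row, that every row has the same number of columns, and that every value is numeric.

```python
import csv
import io


def _is_number(cell):
    """Return True if the cell can be parsed as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_tidy_csv(text):
    """Return a list of problems; an empty list means the data looks usable."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ["expected a header row followed by at least one sample"]
    header, samples = rows[0], rows[1:]
    problems = []
    if len(header) < 2:
        problems.append("need at least one feature column plus the target column")
    if all(_is_number(name) for name in header):
        problems.append("first row looks like data; features must be named")
    for line_no, row in enumerate(samples, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} values, found {len(row)}"
            )
            continue
        for name, cell in zip(header, row):
            if not _is_number(cell):
                problems.append(
                    f"line {line_no}: column '{name}' is not numeric: {cell!r}"
                )
    return problems


iris_sample = (
    "sepal length,sepal width,petal length,petal width,target\n"
    "4.9,3.0,1.4,0.2,1\n"
    "7.0,3.2,4.7,1.4,2\n"
    "6.3,3.3,6.0,2.5,3\n"
)
issues = check_tidy_csv(iris_sample)
```

For a file on disk, the same function can be called with the file's contents, e.g. `check_tidy_csv(open("iris.csv").read())`.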

Simple example without Blurring

The following example shows how to create 20x20 images with characteristic pixels, i.e. without blurring.

    python tinto.py "iris.csv" "iris_images"

The images are created with the following considerations regarding the parameters used:

  • python: to launch the Python script
  • tinto.py: the name of the script
  • iris.csv: the dataset to use. In this example, the IRIS dataset is used.
  • iris_images: the folder where the images will be saved.

Since no other parameters are given, the following defaults are used:

  • Image size: 20x20 pixels.
  • Blurring: No blurring is used.
  • Seed: 20.

Within the output folder "iris_images/" there is one numbered subfolder per target value. For example, for the dataset iris.csv there are three subfolders named "1/", "2/" and "3/". The following figure shows an image created with this example.
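
A quick way to confirm that the output has the expected layout is to count the files in each class subfolder. The sketch below assumes only the folder structure described above (one subfolder per target value); `count_images_per_class` is a hypothetical helper and makes no assumption about the image file format.

```python
from pathlib import Path


def count_images_per_class(output_dir):
    """Map each class subfolder name to the number of files it contains."""
    counts = {}
    for class_dir in sorted(Path(output_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for entry in class_dir.iterdir() if entry.is_file()
            )
    return counts
```

Calling `count_images_per_class("iris_images")` after the run should return one entry per target value, and the counts should add up to the number of rows in the CSV, since each sample corresponds to one image.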

[Figure: Characteristic]

More specific example

The following example shows how to create images with blurring using more specific parameters.

    python tinto.py "iris.csv" "iris_images_tSNE" -B -alg t-SNE -oB maximum -px 30 -sB 5

The images are created with the following considerations regarding the parameters used:

  • Blurring (-B): Creates the images with the blurring technique.
  • Dimensionality reduction algorithm (-alg): t-SNE is used.
  • Blurring option (-oB): Overlapping pixels take the maximum value.
  • Image size (-px): 30x30 pixels.
  • Blurring steps (-sB): The blurring expands 5 pixels.
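
When several configurations need to be generated, it can be convenient to assemble the command line from Python. The sketch below builds the argument list for the flags documented above; `build_tinto_command` is a hypothetical wrapper, and it assumes that `tinto.py` is in the current working directory and that `python` resolves to the intended interpreter.

```python
def build_tinto_command(csv_path, out_dir, *, blurring=False, algorithm=None,
                        overlap=None, pixels=None, blur_steps=None):
    """Assemble a tinto.py command line as a list of arguments.

    Only flags that are explicitly set are added, so omitted options
    fall back to the script's own defaults.
    """
    cmd = ["python", "tinto.py", csv_path, out_dir]
    if blurring:
        cmd.append("-B")
    if algorithm is not None:
        cmd += ["-alg", algorithm]
    if overlap is not None:
        cmd += ["-oB", overlap]
    if pixels is not None:
        cmd += ["-px", str(pixels)]
    if blur_steps is not None:
        cmd += ["-sB", str(blur_steps)]
    return cmd


# Reproduces the arguments of the example above.
command = build_tinto_command(
    "iris.csv", "iris_images_tSNE",
    blurring=True, algorithm="t-SNE", overlap="maximum", pixels=30, blur_steps=5,
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch the script.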

[Figure: Blurring]

License

TINTO is available under the Apache License 2.0.

Authors

Ontology Engineering Group, Universidad Politécnica de Madrid.

Contributors

See the full list of contributors here.

Ontology Engineering Group Universidad Politécnica de Madrid Universidad de Castilla-La Mancha
