The task is to classify every block of a document's page layout, as detected by a segmentation process. This is an essential step in document analysis because it separates text from graphic areas. The five classes are: text (1), horizontal line (2), picture (3), vertical line (4) and graphic (5).
The 5473 examples come from 54 distinct documents. Each observation describes one block. All attributes are numeric, and there are no missing values.
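The class coding and the data properties described above can be checked with a small helper. The following is only a minimal sketch: the function name `summarize_blocks` is hypothetical, the row layout (numeric attributes followed by the class code in the last position) is an assumption, and the sample rows are made up for illustration rather than taken from the dataset.

```python
from collections import Counter
import math

# Class codes and names as listed in the description above.
CLASS_NAMES = {
    1: "text",
    2: "horizontal line",
    3: "picture",
    4: "vertical line",
    5: "graphic",
}


def summarize_blocks(rows):
    """Validate rows and count blocks per class name.

    Assumption: each row is a sequence of numeric attributes with the
    integer class code (1-5) in the last position.
    """
    counts = Counter()
    for i, row in enumerate(rows):
        *features, label = row
        for value in features:
            # Reject anything that is not a number, as well as NaN placeholders.
            if not isinstance(value, (int, float)) or math.isnan(value):
                raise ValueError(f"row {i}: non-numeric or missing attribute {value!r}")
        if label not in CLASS_NAMES:
            raise ValueError(f"row {i}: unknown class code {label!r}")
        counts[CLASS_NAMES[label]] += 1
    return dict(counts)


# Made-up sample rows for illustration only, not taken from the dataset.
sample = [
    (5.0, 7.0, 35.0, 1),
    (6.0, 7.0, 42.0, 1),
    (1.0, 90.0, 90.0, 2),
]
print(summarize_blocks(sample))
```

Running the same check on the full dataset would be a quick way to confirm that all attributes are numeric and that only the five class codes appear.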
We advise you to first explore the Notebook file and the Presentation file to learn more about our dataset, in particular the data visualization and the data modeling.
Then you can go through our Flask app to play with the parameters and make predictions.
Group members: ZOBIRI Samia and SEYDI Aminata