Python-for-Data-Analysis

The task is to classify every block of a document's page layout, as detected by a segmentation process. This is an essential step in document analysis because it separates text from graphic areas. There are five classes: text (1), horizontal line (2), picture (3), vertical line (4) and graphic (5).
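As a minimal sketch, the five class codes listed above can be kept in one place so that predictions are displayed with readable labels. The names `BlockClass`, `label_for` and `is_text` are hypothetical and are not taken from the notebook or from `app.py`.

```python
from enum import IntEnum


class BlockClass(IntEnum):
    """Class codes exactly as listed in the description above."""
    TEXT = 1
    HORIZONTAL_LINE = 2
    PICTURE = 3
    VERTICAL_LINE = 4
    GRAPHIC = 5


def label_for(code: int) -> str:
    """Return a readable label for a class code (hypothetical helper).

    Raises ValueError if the code is not one of the five known classes.
    """
    return BlockClass(code).name.replace("_", " ").lower()


def is_text(code: int) -> bool:
    """True when the block is a text block, False for the four other classes."""
    return BlockClass(code) is BlockClass.TEXT
```

For example, `label_for(4)` gives the label for a vertical line, and `is_text` expresses the text versus non-text separation mentioned above.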

The 5473 examples come from 54 distinct documents. Each observation describes one block. All attributes are numeric, and there are no missing values.
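The properties stated above (numeric attributes, no missing values, class codes from 1 to 5) can be checked before modeling. The sketch below uses only the standard library and assumes, hypothetically, that each record is a sequence of string fields with the class code in the last position; the function name `summarize_blocks` and the sample rows are illustrative and are not real data from the dataset.

```python
from collections import Counter


def summarize_blocks(rows):
    """Validate block records and count the examples per class.

    Assumption: each row is a sequence of string fields and the class
    code is the last field. Raises ValueError on the first bad row.
    """
    class_counts = Counter()
    for line_no, row in enumerate(rows, start=1):
        # An empty row or an empty field counts as a missing value.
        if not row or any(field.strip() == "" for field in row):
            raise ValueError(f"row {line_no}: missing value")
        try:
            values = [float(field) for field in row]
        except ValueError as exc:
            raise ValueError(f"row {line_no}: non-numeric attribute") from exc
        label = int(values[-1])
        if label not in range(1, 6):
            raise ValueError(f"row {line_no}: unknown class {label}")
        class_counts[label] += 1
    return {
        "n_rows": sum(class_counts.values()),
        "class_counts": dict(class_counts),
    }


# Tiny made-up sample for illustration only.
sample = [
    ["5", "7", "0.35", "1"],
    ["1", "120", "0.9", "2"],
    ["9", "9", "0.5", "1"],
]
summary = summarize_blocks(sample)
```

Run on the full data file, `n_rows` should equal the 5473 examples mentioned above if the file matches the description.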

We advise you to first explore the Notebook file and the Presentation file to learn more about our dataset, including the data visualization and the data modeling.

You can then use our Flask app to experiment with the parameters and make predictions.

Group members: ZOBIRI Samia and SEYDI Aminata

Folders and files: image, models_and_data, templates, PageBlock.ipynb, Presentation.pdf, README.md, app.py
