
apomarinov/pdf-proc


Task 1

Look at source.pdf: it has 2 tables on every page.
Imagine you need to extract every table into its own file, named after the table's number, counting left to right and page by page:
Page 1: table1 and table2
Page 2: table3 and table4
...

In the images folder there are files that correspond to tables from the pdf.
If the pdf had 50 pages:

  • Files 1 to 50 are all of the left tables on every page of the pdf
  • Files 51 to 100 are all of the right tables on every page of the pdf

Each file contains its actual position in the pdf: File 51 is the first right-column table, so its content is 2, and so on.
Create an algorithm that renames all files to their correct position in the pdf, that is, so that each file's name matches its content. The files are "images", so you cannot read their content and use it for the file name.
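
One possible approach, sketched below, relies only on the numbering pattern described above: with N pages, file k for k ≤ N is a left table and belongs at position 2k - 1, while file k for k > N is a right table and belongs at position 2(k - N). The sketch assumes the files are named `<number>.<extension>` (for example `51.png`) and that the folder holds exactly two files per page. The function names `target_number` and `rename_tables` are hypothetical, and this is not necessarily the solution on the other branch.

```python
from pathlib import Path


def target_number(current: int, pages: int) -> int:
    """Map a file's current number to its table position in the pdf."""
    if not 1 <= current <= 2 * pages:
        raise ValueError(f"file number {current} is outside 1..{2 * pages}")
    if current <= pages:
        # Left tables occupy the odd positions: 1, 3, 5, ...
        return 2 * current - 1
    # Right tables occupy the even positions: 2, 4, 6, ...
    return 2 * (current - pages)


def rename_tables(folder: Path) -> None:
    """Rename every numbered file in `folder` to its position in the pdf."""
    # Assumption: table files have purely numeric stems, e.g. "51.png".
    files = [p for p in folder.iterdir() if p.is_file() and p.stem.isdigit()]
    if len(files) % 2:
        raise ValueError("expected two files per page")
    pages = len(files) // 2

    # Phase 1: move everything to temporary names, so that no rename
    # overwrites a file that has not been processed yet.
    staged = []
    for path in files:
        tmp = path.with_name(f"tmp_{path.name}")
        path.rename(tmp)
        staged.append((tmp, target_number(int(path.stem), pages), path.suffix))

    # Phase 2: move each temporary file to its final name.
    for tmp, number, suffix in staged:
        tmp.rename(folder / f"{number}{suffix}")
```

The two-phase rename matters because the old and new names overlap: file 2 must become 3 while a file named 3 still exists, so renaming in place could clobber files.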

Task 2

  • From source.pdf, create an image pair for every word: one image for English and one for Thai (the rightmost column).
  • Check the example folder for an example result.
  • Use any library you want. Hint: ImageMagick, OpenCV
  • Name the files of every pair by the word's position in the pdf:
    • for the word ablaze (Page 1) you should have 2 images: 4_EN and 4_TH
    • for the word accept (Page 1) you should have 2 images: 21_EN and 21_TH
    • for the word acquire (Page 2) you should have 2 images: 40_EN and 40_TH
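
As a rough illustration of one way to approach the row splitting and naming, the sketch below finds the vertical bands of a page that contain ink and builds the pair names. It assumes a per-pixel-row "has ink" profile has already been computed from a rendered page (for example with OpenCV or ImageMagick, not shown here), and that words are numbered continuously across pages, which is what the examples above suggest. `find_row_bands` and `pair_names` are hypothetical helpers, not part of any library.

```python
from typing import List, Tuple


def find_row_bands(has_ink: List[bool], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pixel-row ranges, end exclusive, that contain ink.

    Blank runs shorter than `min_gap` rows are treated as part of the band.
    """
    bands = []
    start = None      # first row of the band currently being built
    blank_run = 0     # consecutive blank rows seen inside that band
    for y, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The band ended where the blank run began.
                bands.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Close a band that runs to the bottom, dropping trailing blanks.
        bands.append((start, len(has_ink) - blank_run))
    return bands


def pair_names(word_number: int) -> Tuple[str, str]:
    """Build the English and Thai image names for one word."""
    return f"{word_number}_EN", f"{word_number}_TH"
```

Each band would then be cropped twice, once for the English column and once for the rightmost (Thai) column, and saved under the names returned by `pair_names`.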

Check out the other branch for solutions ;)
