GitHub - apomarinov/pdf-proc

Task 1

Look at source.pdf. It has 2 tables on every page.
Imagine you need to extract every table into its own file, named after the table's number, counting from left to right:
Page1: table1 and table2
Page2: table3 and table4
...

In the images folder there are files that correspond to tables from the pdf.
If the pdf had 50 pages:

Files 1 to 50 are all of the left tables on every page of the pdf
Files 51 to 100 are all of the right tables on every page of the pdf

Each file contains its actual position in the pdf: file 51 is the first right-column table, so its content is 2, and so on.
Create an algorithm that renames every file to its correct position in the pdf, so that each file's name matches its content. The files are "images", so you cannot read their content and use it for the file name.
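
One possible approach, as a minimal sketch rather than the repository's own solution: because the left tables come first and the right tables second, the correct number can be computed from the current file number alone. With N pages, file i (1 to N) is the left table on page i, which is table 2i - 1; file N + j is the right table on page j, which is table 2j. The function names, the ".png" extension, and the "tmp_" prefix below are assumptions made for illustration; the task does not specify them.

```python
import os


def correct_number(file_number, page_count):
    """Map a current file number to the table number it should carry."""
    if not 1 <= file_number <= 2 * page_count:
        raise ValueError("file number out of range")
    if file_number <= page_count:
        # Left table on page `file_number` is the odd-numbered table.
        return 2 * file_number - 1
    # Right table on page `file_number - page_count` is the even-numbered table.
    return 2 * (file_number - page_count)


def rename_tables(folder, extension=".png"):
    """Rename files like 1.png .. 2N.png in place (the extension is an assumption)."""
    stems = [os.path.splitext(name)[0] for name in os.listdir(folder)
             if name.endswith(extension)]
    numbers = sorted(int(stem) for stem in stems if stem.isdigit())
    page_count = len(numbers) // 2
    mapping = {old: correct_number(old, page_count) for old in numbers}

    # Phase 1: move everything to temporary names, so that no rename can
    # overwrite a file that has not been processed yet.
    for old in numbers:
        os.rename(os.path.join(folder, f"{old}{extension}"),
                  os.path.join(folder, f"tmp_{old}{extension}"))
    # Phase 2: move each temporary file to its final name.
    for old, new in mapping.items():
        os.rename(os.path.join(folder, f"tmp_{old}{extension}"),
                  os.path.join(folder, f"{new}{extension}"))
    return mapping
```

For the 50-page example, correct_number(51, 50) gives 2, which matches the statement that file 51 has content 2. The two-phase rename is a design choice: renaming directly would, for instance, move file 2 onto 3 before file 3 itself has been moved.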

Task 2

From source.pdf, create an image pair for every word: one for English and one for Thai (rightmost column).
Check the example folder for an example result.
Use any library you want. Hint: ImageMagick, OpenCV.
Name the files for every pair as they appear in the pdf:
- for the word ablaze (Page 1) you should have 2 images: 4_EN and 4_TH
- for the word accept (Page 1) you should have 2 images: 21_EN and 21_TH
- for the word acquire (Page 2) you should have 2 images: 40_EN and 40_TH
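
The naming scheme can be sketched independently of the image library. The helper below is a hypothetical illustration, not the repository's solution: it assumes a page has already been rendered to an image, that the horizontal row boundaries and the pixel spans of the English and Thai columns have been found by some other step (for example, line detection), and that rows are numbered consecutively starting at first_number. All parameter names are invented for this example.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def pair_boxes(row_edges: List[int],
               en_span: Tuple[int, int],
               th_span: Tuple[int, int],
               first_number: int) -> Dict[str, Box]:
    """Return one crop box per output image name for a single page.

    row_edges holds the y coordinate of each row boundary, top to bottom
    (assumed to be known already); en_span and th_span are the (left, right)
    pixel spans of the English and Thai columns.
    """
    boxes: Dict[str, Box] = {}
    # Consecutive edges delimit one row each.
    for offset, (top, bottom) in enumerate(zip(row_edges, row_edges[1:])):
        number = first_number + offset
        boxes[f"{number}_EN"] = (en_span[0], top, en_span[1], bottom)
        boxes[f"{number}_TH"] = (th_span[0], top, th_span[1], bottom)
    return boxes
```

Each box uses the (left, upper, right, lower) order that Pillow's Image.crop accepts; with an OpenCV/NumPy image, the same box corresponds to the slice page[top:bottom, left:right].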

Check out the other branch for solutions ;)