Parse Images to get Text #5

Closed
3 tasks done
AnimeshSinha1309 opened this issue Mar 10, 2020 · 2 comments
Labels: segment:books (Databasing and collecting information on books), type:feature (New feature or request)

Comments

@AnimeshSinha1309
Owner

AnimeshSinha1309 commented Mar 10, 2020

Develop a basic model that can extract a book's details from just a photograph of its front page. This is primarily useful for older or self-bound books, where the task should be relatively easy, yet scaling this project to parse the index and more will depend crucially on our ability to do this. The following is involved in the development right now.

  • Get the basic PyTesseract model working.
  • Collect and tag a small dataset of books, then compute the bounding-box areas and plot them for title/author/waste.
  • Start combining the text boxes based on overlap, proximity and size to get the title/author.
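
As a rough illustration of the first and third tasks, here is a minimal sketch that assumes the word-level dictionary returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`. The helper names (`ocr_words`, `group_lines`, `guess_title`), the confidence cutoff of 60 and the "tallest line is the title" rule are all assumptions made for illustration, not what this project actually implements.

```python
from collections import defaultdict


def ocr_words(image_path):
    """Run Tesseract on an image and return its word-level data as a dict."""
    # Imported lazily so the helpers below also work without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )


def group_lines(data, min_conf=60.0):
    """Group recognised words into lines, each with its text and mean word height."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Rows that are not words have empty text and conf -1;
        # some pytesseract versions return conf as a string, hence float().
        if not text.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((text, data["height"][i]))
    return [
        {
            "text": " ".join(word for word, _ in words),
            "height": sum(height for _, height in words) / len(words),
        }
        for _, words in sorted(lines.items())
    ]


def guess_title(lines):
    """Hypothetical heuristic: treat the tallest line of text as the title."""
    if not lines:
        return None
    return max(lines, key=lambda line: line["height"])["text"]
```

Something like `guess_title(group_lines(ocr_words("cover.jpg")))` would then return a title candidate, where `cover.jpg` is a placeholder path. The tagged dataset from the second task is what would tell us whether height alone is a good enough signal.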
AnimeshSinha1309 added the type:feature (New feature or request) label Mar 10, 2020
AnimeshSinha1309 added this to the Early Preview Run milestone Mar 10, 2020
AnimeshSinha1309 added this to To do in Flutter App Jul 2, 2020
AnimeshSinha1309 added the segment:books (Databasing and collecting information on books) label Jul 2, 2020
@KanishAnand
Collaborator

KanishAnand commented Jul 5, 2020

Issues in OCR

  • The language of the book's cover page.
  • Accuracy is still not great, especially for large text or cursive fonts.
  • We need to decide the distance limit for merging bounding boxes.
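
One possible way to frame the distance limit is relative to text height instead of as a fixed pixel count, so that large titles and small author names are treated alike. The sketch below is only an illustration of that idea: the box format `(left, top, right, bottom)`, the function names and the default `limit=0.5` are assumptions, not values taken from this project.

```python
def gap(a, b):
    """Return the (horizontal, vertical) gap between two boxes.

    Boxes are (left, top, right, bottom); a gap is 0 when the boxes
    overlap or touch along that axis.
    """
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return dx, dy


def merge_boxes(boxes, limit=0.5):
    """Repeatedly merge boxes whose gaps are within `limit` times the smaller box height."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                # Assumed rule: the allowed gap scales with the shorter box.
                reach = limit * min(a[3] - a[1], b[3] - b[1])
                dx, dy = gap(a, b)
                if dx <= reach and dy <= reach:
                    # Replace the pair with the box that encloses both.
                    boxes[i] = (
                        min(a[0], b[0]), min(a[1], b[1]),
                        max(a[2], b[2]), max(a[3], b[3]),
                    )
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

The restart after each merge keeps the logic simple at the cost of extra passes, which should be acceptable for the handful of boxes found on a cover. The value of `limit` would still have to be tuned against the tagged dataset.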

@AnimeshSinha1309 AnimeshSinha1309 moved this from To do to In progress in Flutter App Jul 5, 2020
@AnimeshSinha1309
Owner Author

We have this working; any integrations are deferred until we need them, I guess, certainly not in this sprint.

Flutter App automation moved this from In progress to Done Jul 25, 2020