Use Machine Learning #3

Divyansh-Gemini · 2024-03-01T15:44:49Z

Use machine learning instead of plain keyword search in the PDF to improve accuracy.

sarayusreeyadavpadala · 2024-04-07T08:09:29Z

@Divyansh-Gemini I would like to work on this issue. Could you provide more detail on what is needed?

Divyansh-Gemini · 2024-04-17T18:18:16Z

Hi @sarayusreeyadavpadala,
Our goal was to extract the required details from the PDF, such as exporter name, exporter address, invoice number, invoice date, port of discharge, total net weight, total gross weight, etc.

For this we use the camelot library to read the PDF's table data into a pandas DataFrame, search that DataFrame for keywords, and return the value in the cell next to each match.
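To make the current approach concrete, here is a minimal sketch of that keyword lookup. It is not the code from camScript.py: the function name and the sample rows are hypothetical, and it works on plain lists of rows (roughly what `df.values.tolist()` would give for a camelot table) so that it runs without any PDF.

```python
def find_value_next_to(rows, keyword):
    """Return the cell to the right of the first cell containing `keyword`.

    `rows` is a table as a list of rows (each row a list of cells).
    Matching is case-insensitive; returns None if nothing is found.
    """
    needle = keyword.lower()
    for row in rows:
        # Skip the last cell: it has no right-hand neighbour.
        for i, cell in enumerate(row[:-1]):
            if needle in str(cell).lower():
                value = str(row[i + 1]).strip()
                if value:
                    return value
    return None


# Hypothetical table rows, made up for illustration only.
rows = [
    ["Exporter", "ABC Foods Pvt. Ltd.", "Invoice No.", "INV-001"],
    ["Port of Discharge", "Hamburg", "Invoice Date", "01-03-2024"],
]

print(find_value_next_to(rows, "invoice no"))
print(find_value_next_to(rows, "port of discharge"))
```

The lookup assumes the value always sits in the cell immediately to the right of its label. An invoice that puts the value below the label, or a scan with no table structure at all, breaks that assumption.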

The problem with this approach is that it only works for PDF invoices in a fixed format. If the invoice uses a different format, or the PDF is a scanned invoice, it does not work.

To solve this, we need to detect the values of those particular fields in the PDF itself, regardless of layout.

OCR can be useful for getting the data out of a scanned PDF.
ML can be used to detect the values of the fields we require.
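One way to picture the proposed pipeline: whatever produces the text (OCR for a scan, the text layer otherwise), the field detector only sees raw text and no longer depends on table cells. The sketch below uses hand-written regular expressions as a stand-in for that detector. It is a rule-based baseline, not the ML approach itself, and the patterns, field names, and sample text are all assumptions made for illustration.

```python
import re

# Hypothetical patterns standing in for a trained field detector.
FIELD_PATTERNS = {
    "invoice_no": re.compile(
        r"invoice\s*(?:no|number|#)\.?\s*[:\-]?\s*(\S+)", re.I
    ),
    "invoice_date": re.compile(
        r"invoice\s*date\s*[:\-]?\s*"
        r"([0-9]{1,2}[-/.][0-9]{1,2}[-/.][0-9]{2,4})",
        re.I,
    ),
    "total_gross_weight": re.compile(
        r"gross\s*weight\s*[:\-]?\s*([0-9][0-9.,]*\s*kgs?)", re.I
    ),
}


def detect_fields(text):
    """Return {field_name: value or None} for each known field."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1).strip() if match else None
    return found


# Made-up text, as an OCR step might return it for a scanned invoice.
ocr_text = """COMMERCIAL INVOICE
Invoice No: INV-001   Invoice Date: 01/03/2024
Total Gross Weight - 1,250.5 KGS
"""

print(detect_fields(ocr_text))
```

A learned model would replace `FIELD_PATTERNS` with something trained on labelled invoices, but the interface (text in, field values out) could stay the same, which makes a baseline like this useful for comparing accuracy.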

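Sample PDF Invoices:

INVOICE 6341.pdf

Sample Packing List _ Commercial Invoice 1.pdf

Sample Packing List _ Commercial Invoice 2.pdf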

The Python code currently in use is at app/src/main/python/camScript.py.

Feel free to ask if you have any other questions.

sarayusreeyadavpadala · 2024-04-19T15:15:46Z

Thank you :-) @Divyansh-Gemini

sarayusreeyadavpadala · 2024-04-19T16:23:08Z

Are the PDFs being uploaded by the user specifically about food export invoices?

Divyansh-Gemini · 2024-04-21T11:53:56Z

Yes, the invoices are for exports of food products only.

Divyansh-Gemini added the "enhancement" (New feature or request) label on Mar 1, 2024
