Copy with bibtex / citation #1527
anandkumar89 started this conversation in Ideas
One way to get the document text is with the sioyek Python package:

    from sioyek.sioyek import Sioyek, Document

    sioyek = Sioyek(SIOYEK_PATH, LOCAL_DATABASE_FILE, SHARED_DATABASE_FILE)
    pdf = Document(pdf_path, sioyek)
    for page in range(pdf.doc.page_count):
        ptext = pdf.doc[page].get_text()
        ...

However, I would only recommend this if you are planning on using some of the other sioyek methods. You can look through them here. The library gives an amazing level of control, but since sioyek uses the MuPDF engine, you can just do:

    import pymupdf

    with pymupdf.open(fpath) as doc:
        for page in range(doc.page_count):
            ptext = doc[page].get_text()

As for the OCR, I'm not sure why you need that. If your PDF lacks a text layer, you can run it through something like ocrmypdf.
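To decide whether OCR is needed at all, one rough approach is to look for pages whose extracted text is empty or nearly empty. The sketch below is only an illustration: the function names `read_page_texts` and `pages_without_text` and the `min_chars` threshold are made up for this example and are not part of sioyek or PyMuPDF.

```python
from typing import Iterable, List


def read_page_texts(pdf_path: str) -> List[str]:
    """Return the plain text of every page, extracted with PyMuPDF."""
    import pymupdf  # imported here so the helper below also works without PyMuPDF

    with pymupdf.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def pages_without_text(page_texts: Iterable[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages whose extracted text is (nearly) empty.

    min_chars is an arbitrary threshold picked for illustration only.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len(text.strip()) < min_chars
    ]
```

If `pages_without_text(read_page_texts(fpath))` returns any page indices, the file probably has scanned pages and may be worth passing through ocrmypdf before extracting references.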
I often read articles, and I want to copy selected text along with its BibTeX entry or citation: not just the markers [1,2] etc., but with the citation appended as -
I need help with the following:
Getting the text of the PDF (for parsing references): does sioyek parse the PDF and make its text available? I plan to write Python code that finds and extracts the references from the article and saves them in a database. This function would be called once per article, with the results stored in the database.
Select a rectangle and then OCR: how can I get a screenshot of a selected rectangle? Is this possible directly in sioyek, or is there another easy way?
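A minimal sketch of the first item, under some loud assumptions: the article uses numbered references of the form `[n] ...` under a heading line reading "References" or "Bibliography", and the full document text has already been extracted as one string. The table name `refs` and the functions `extract_references`, `store_references`, and `append_citations` are hypothetical names for this illustration, not anything sioyek provides.

```python
import re
import sqlite3
from typing import List, Tuple

# A line consisting only of a "References" or "Bibliography" heading.
HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE)
# A numbered entry "[12] ..." running up to the next "[n]" at a line start.
ENTRY = re.compile(r"\[(\d+)\]\s*(.+?)(?=\n\s*\[\d+\]|\Z)", re.DOTALL)
# An in-text marker such as "[3]" or "[1,2]".
MARKER = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def extract_references(full_text: str) -> List[Tuple[int, str]]:
    """Return (number, citation) pairs found after the last references heading."""
    headings = list(HEADING.finditer(full_text))
    if not headings:
        return []
    tail = full_text[headings[-1].end():]
    # Collapse line breaks and runs of spaces inside each entry.
    return [(int(num), " ".join(body.split())) for num, body in ENTRY.findall(tail)]


def store_references(conn: sqlite3.Connection, article: str,
                     refs: List[Tuple[int, str]]) -> None:
    """Save one article's references; running it again replaces existing rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS refs ("
            "article TEXT, number INTEGER, citation TEXT, "
            "PRIMARY KEY (article, number))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO refs VALUES (?, ?, ?)",
            [(article, num, text) for num, text in refs],
        )


def append_citations(selection: str, conn: sqlite3.Connection, article: str) -> str:
    """Return the selected text followed by the stored citation for each marker."""
    numbers: List[int] = []
    for group in MARKER.findall(selection):
        for part in group.split(","):
            number = int(part)
            if number not in numbers:
                numbers.append(number)
    lines = [selection]
    for number in numbers:
        row = conn.execute(
            "SELECT citation FROM refs WHERE article = ? AND number = ?",
            (article, number),
        ).fetchone()
        if row is not None:
            lines.append(f"[{number}] {row[0]}")
    return "\n".join(lines)
```

The extraction step would run once per article; after that, `append_citations` only needs the database. Author-year styles or unnumbered reference lists would need a different `ENTRY` pattern.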
Thanks in advance. I'm totally new to sioyek, so any heads-up or suggestions would be helpful.