The `.ps` parser currently uses `pstotext` to extract content from PostScript files. This is a great starting point, but it might be nice to replicate the behavior of the `.pdf` parser and have a pure Python fallback method to make textract usable across platforms. From a minute of googling around, it looks like others have started down this path:
I'm not sure whether it makes more sense to roll our own or just use one of these existing packages to extract the text properly (I have a slight bias toward the latter), but I thought I'd open this issue in case it inspires ideas or contributions from others.