Font Identification #6

Closed
knowtheory opened this issue Jun 4, 2019 · 3 comments

Comments

@knowtheory
Collaborator

As it turns out, Congress's Office of Legislative Counsel provides guidelines for legislation. These guidelines include specific advice about formatting and style.

The guide specifies a font hierarchy, which we can use as the basis for assumptions about the draft legislative documents uploaded to the tool.

@DanielSchuman
Contributor

DanielSchuman commented Jun 4, 2019 via email

@knowtheory
Collaborator Author

@GullicksonK helpfully points out that GPO provides stylesheets for legislative XML documents (which may be included in some legislative PDFs).

And here's an example document

@knowtheory knowtheory mentioned this issue Jun 4, 2019
4 tasks
@knowtheory
Collaborator Author

After sufficient spelunking through PDF.js internals, I've found that it's possible to read the font data embedded in the PDF, and to export the fonts as well. For the time being, just reading the intended font (its name, and whether it's a bold or italic variant) is sufficient to style the output DocX. So I'm going to mark this one as complete.
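
For anyone picking this up later, here is a rough sketch of the "name plus bold/italic" step. It is not the code used in the tool. It assumes the raw font name has already been read out of the PDF (for example `ABCDEF+TimesNewRomanPS-BoldItalicMT`) and relies only on common naming conventions: a six-letter subset tag before `+`, and a style suffix after a hyphen or comma. `FontStyle` and `parseFontName` are hypothetical names made up for this illustration.

```typescript
// Hypothetical shape for the style information the DocX output would need.
interface FontStyle {
  family: string;
  bold: boolean;
  italic: boolean;
}

// Parse a raw PDF font name into a family name plus bold/italic flags.
// This is a heuristic based on naming conventions, not a PDF.js API.
function parseFontName(rawName: string): FontStyle {
  // Embedded font subsets carry a six-uppercase-letter tag followed by "+".
  const name = rawName.replace(/^[A-Z]{6}\+/, "");

  // The style suffix conventionally follows the first hyphen or comma,
  // e.g. "Courier-Oblique" or "Arial,Italic".
  const sepIndex = name.search(/[-,]/);
  const family = sepIndex === -1 ? name : name.slice(0, sepIndex);
  const suffix = sepIndex === -1 ? "" : name.slice(sepIndex + 1);

  return {
    family,
    bold: /bold/i.test(suffix),
    italic: /italic|oblique/i.test(suffix),
  };
}

// Example usage with a subset-tagged bold italic font name.
const style = parseFontName("ABCDEF+TimesNewRomanPS-BoldItalicMT");
console.log(style.family, style.bold, style.italic);
```

Names without a separator (say, a font simply called `TimesBold`) would come back as non-bold here, so a real implementation would probably also want to consult the font descriptor flags where they are available.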


2 participants