HTML ingestion fails if there is no <style> tag in the document head. #397

zdavis opened this Issue Jun 14, 2017 · 0 comments



zdavis commented Jun 14, 2017

No description provided.

@zdavis zdavis added the bug label Jun 14, 2017

@zdavis zdavis added this to the v0.2 milestone Jun 14, 2017

@zdavis zdavis self-assigned this Jun 14, 2017

zdavis added a commit that referenced this issue Jun 20, 2017

[F] Refactor and improve ingestion; add specs
This commit moves all file-level access to ingested sources out of the
individual strategies and into the ingestion object. This change
simplifies code across all the strategies, because it allows each
strategy to access the ingestion source via relative package paths.
Prior to this change, each strategy was responsible for unzipping
packages, translating relative paths to absolute paths, etc. This
change also makes ingestion safer, as it gives us a central place to
validate relative paths to ensure that the ingestion doesn't escape
the extracted or copied package directory. All ingestion strategies
are now compatible with an extractable (zip) file, or with a source

This commit also adds integration specs for each strategy and improved
sample ingestion sources.
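
The central path check mentioned in the commit message (validating relative paths so that ingestion cannot escape the extracted or copied package directory) could look roughly like the sketch below. This is an illustration only, not the project's code: the names `resolve_in_package` and `UnsafePathError` are hypothetical, and it assumes the package has already been extracted to a directory on disk.

```python
from pathlib import Path


class UnsafePathError(ValueError):
    """Raised when a relative path would resolve outside the package root."""


def resolve_in_package(root: Path, relative: str) -> Path:
    """Turn a package-relative path into an absolute one, refusing escapes.

    `root` is the directory the package was extracted or copied into.
    Both paths are fully resolved first, so `..` segments, absolute
    inputs, and symlinks are all judged by where they really point.
    """
    root = root.resolve()
    candidate = (root / relative).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise UnsafePathError(
            f"{relative!r} resolves outside the package root"
        ) from None
    return candidate
```

With a single helper like this owned by the ingestion object, a strategy only ever passes package-relative paths such as `"css/site.css"` and never builds absolute paths itself.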

Finally, it removes Word ingestion, as there was overlap between HTML
and Word ingestion. Along these lines, the Google doc ingestion, which
pulls the document in as HTML, now delegates almost all functionality
to the HTML strategy. As such, it's an example of how we can write
strategies that rely on other strategies.
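
As a rough illustration of one strategy relying on another, the sketch below has a Google Doc strategy fetch the document as HTML and hand it straight to an HTML strategy. All class and parameter names here are hypothetical, and the `fetch_html` callable stands in for whatever actually retrieves the export. The HTML strategy also treats a missing `<style>` tag as an empty style list, which is one plausible way to avoid the failure named in the issue title; the source does not say how the actual fix works.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _StyleCollector(HTMLParser):
    """Collects the text of every <style> element; none found means []."""

    def __init__(self) -> None:
        super().__init__()
        self.styles: List[str] = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
            self.styles.append("")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.styles[-1] += data


class HtmlStrategy:
    """Hypothetical HTML strategy: tolerates documents with no <style> tag."""

    def ingest(self, html: str) -> Dict[str, object]:
        collector = _StyleCollector()
        collector.feed(html)
        collector.close()
        return {"source": "html", "styles": collector.styles}


class GoogleDocStrategy:
    """Hypothetical Google Doc strategy that delegates to HtmlStrategy."""

    def __init__(self, fetch_html: Callable[[str], str]) -> None:
        # fetch_html is an assumed stand-in for the real export call.
        self._fetch_html = fetch_html
        self._html_strategy = HtmlStrategy()

    def ingest(self, doc_id: str) -> Dict[str, object]:
        result = self._html_strategy.ingest(self._fetch_html(doc_id))
        result["source"] = "google_doc"
        return result
```

The Google Doc class adds only the retrieval step and a source label; everything else is inherited by delegation, which mirrors the overlap the commit message describes.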

Resolves #399
Resolves #398
Resolves #397

@zdavis zdavis closed this in f769e2a Jun 20, 2017
