Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtractBoilerpipeText to remove headers as well. #253 #256

Merged
merged 3 commits into from Aug 11, 2018
Merged

Conversation

greebie
Copy link
Contributor

@greebie greebie commented Aug 11, 2018

GitHub issue(s):
#253

What does this Pull Request do?

Alters ExtractBoilerpipeText to remove headers from text output as well.

How should this be tested?

The test I created + Travis and Codecov should cover it.

Additional Notes:

I attempted to provide a .keepHeader option, but to do so would require moving the .apply call from the object and that would require documentation changes. This seemed too much to accommodate a fairly marginal use-case.

This option can be provided by returning a class with .keepHeader and .text attributes. I do not think it's too necessary.

Interested parties

@ianmilligan1

Thanks in advance for your help with the Archives Unleashed Toolkit!

@codecov
Copy link

codecov bot commented Aug 11, 2018

Codecov Report

Merging #256 into master will decrease coverage by 0.16%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #256      +/-   ##
==========================================
- Coverage   70.52%   70.35%   -0.17%     
==========================================
  Files          41       41              
  Lines        1038     1039       +1     
  Branches      191      191              
==========================================
- Hits          732      731       -1     
- Misses        240      242       +2     
  Partials       66       66
Impacted Files Coverage Δ
...ivesunleashed/matchbox/ExtractBoilerpipeText.scala 66.66% <100%> (-20.84%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4cf9a7...a35da9a. Read the comment docs.

Copy link
Member

@ianmilligan1 ianmilligan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner output!

i.e.

(20091219,www.davidsuzuki.org,http://www.davidsuzuki.org/_pvw706D01C3/Oceans/PNCIMA/Unique-Features-Within-PNCIMA.asp,Here are more of the many unique features within PNCIMA: Endangered Species •There are 32 species listed as Endangered, Threatened, and Special Concern by the Committee on the Status of Endangered Wildlife in Canada (COSEWIC) that live in or migrate through the PNCIMA. North Pacific Right Whales •They are the most endangered whale in the world ...

@ianmilligan1
Copy link
Member

I'm happy to merge if this looks good to you as well @ruebot.

@ruebot
Copy link
Member

ruebot commented Aug 11, 2018

Do we need to update any documentation?

@ianmilligan1
Copy link
Member

Nope, it's a straight change (just removes the http headers from the boilerplate removal, which is what we should have been doing all along) - tested w/ the documentation script and works well.

@ruebot
Copy link
Member

ruebot commented Aug 11, 2018

@greebie does this resolve #253? The title and description of that issue, and this PR don't align well.

@ruebot ruebot merged commit 84a4c09 into master Aug 11, 2018
@ruebot ruebot deleted the issue-253 branch August 11, 2018 21:05
@greebie
Copy link
Contributor Author

greebie commented Aug 12, 2018

Yes it does resolve. I thought the solution was something else, but this seemed more straight-forward.

@ruebot
Copy link
Member

ruebot commented Aug 12, 2018

Ok. In the future, please be more explicit because as it stands now, it is very unclear. We need to know how and why the PR resolves the issues, and if it differs from the issue created, why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants