new url extensions, new rules, and unit tests #40

Open
arderyp opened this issue Apr 27, 2016 · 0 comments


arderyp commented Apr 27, 2016

New extensions to add to the URL glue logic:

  1. shtml

Add new tests that capture these examples. Download data from prod and check for records marked "bad_scrape".

NEW RULES:

  1. If an element ends in ";", don't glue the next element? See http://www.supremecourt.gov/opinions/11pdf/10-945.pdf
  2. If the next element starts with '/', keep gluing. See the second example, with bad spacing, here: http://www.supremecourt.gov/opinions/12pdf/11-465_g314.pdf
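
A rough sketch of how the proposed rules might fit together, for discussion only. It is not the project's actual glue code: the function name `glue_url`, the `TERMINAL_EXTENSIONS` list (apart from `shtml`), the order in which the rules are checked, and the fallback of stopping when no rule applies are all assumptions.

```python
# Hypothetical extension list; only "shtml" comes from this issue.
TERMINAL_EXTENSIONS = ("html", "htm", "pdf", "shtml")


def ends_with_extension(url):
    """Return True if the URL already ends in a known file extension."""
    lowered = url.lower()
    return any(lowered.endswith("." + ext) for ext in TERMINAL_EXTENSIONS)


def glue_url(elements):
    """Glue elements[0] to the elements that follow it.

    Returns (url, consumed), where consumed is the number of
    elements that were joined into the URL.
    """
    url = elements[0]
    consumed = 1
    for nxt in elements[1:]:
        # Proposed rule 1: a trailing ";" ends the URL.
        if url.endswith(";"):
            break
        # A known extension (now including shtml) ends the URL.
        if ends_with_extension(url):
            break
        # Proposed rule 2: a leading "/" means the URL continues.
        if nxt.startswith("/"):
            url += nxt
            consumed += 1
            continue
        # Assumed fallback: stop when no rule says to keep gluing.
        break
    return url, consumed


if __name__ == "__main__":
    print(glue_url(["http://example.com/docs", "/page.shtml", "(last"]))
    print(glue_url(["http://example.com/a;", "/b"]))
```

Checking ";" before '/' means rule 1 wins when both apply; whether that is the right precedence is an open question, like rule 1 itself.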

arderyp changed the title from "new extensions and unit tests" to "new url extensions, new rules, and unit tests" on Apr 28, 2016