Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Newlines When Adding/Removing Template Properties #185

Closed
betson opened this issue Aug 15, 2017 · 3 comments
Closed

Broken Newlines When Adding/Removing Template Properties #185

betson opened this issue Aug 15, 2017 · 3 comments

Comments

@betson
Copy link

betson commented Aug 15, 2017

I've encountered a few templates where adding/removing properties results in unanticipated changes to the newlines, forcing multiple properties on the same line afterwards. This may be related to #155, but this is related more to newlines than whitespace around key/value.

The basic code sample I'm running is:

template.add('pop', api_val[1] + reference)
template.add('census estimate yr', year, before='pop')
template.remove('census yr')

Where template is the parsed "Infobox U.S. County" template from a number of different pages. The specific values of api_val[1], reference, and year are not super important, but you can see their values in the examples below. The diffs are displayed using this function, where old_text and new_text are just generated from str(template) before/after the changes:

def generate_diff(old_text, new_text):
    diff = unified_diff(old_text.splitlines(keepends=True), new_text.splitlines(keepends=True))
    diff = list(diff)
    return ''.join(diff)

Here are three examples that demonstrate the problem in different ways:

Spalding County, Georgia

@@ -9,8 +9,8 @@
 | area_land_sq_mi = 196 |
 | area_water_sq_mi = 3.1 |
 | area percentage = 1.6% |
-| census yr = 2010|
-| pop = 64073 |
+|
+|census estimate yr=2016| pop = 64806<ref name=PopHousingEst>{{cite web|url=https://www.census.gov/programs-surveys/popest.html|title=Population and Housing Unit Estimates |date=August 15, 2017 |accessdate=August 15, 2017|publisher=[[U.S. Census Bureau]]}}</ref> |
 | density_sq_mi = 326 |
 | web = www.spaldingcounty.com |
 | named for = [[Thomas Spalding]]

Clinton County, Illinois

@@ -13,8 +13,7 @@
  |area_land_sq_mi = 474
  |area_water_sq_mi = 29
  |area percentage = 5.8%
- |census yr = 2010
- |pop = 37762
+ |census estimate yr= 2016|pop = 37729<ref name=PopHousingEst>{{cite web|url=https://www.census.gov/programs-surveys/popest.html|title=Population and Housing Unit Estimates |date=August 15, 2017 |accessdate=August 15, 2017|publisher=[[U.S. Census Bureau]]}}</ref>
  |density_sq_mi = 80
  |web = www.clintonco.illinois.gov
 | district = 15th

Winnebago County, Illinois

@@ -8,9 +8,8 @@
  area_total_sq_mi = 519 |
  area_land_sq_mi = 513|
  area_water_sq_mi = 5.9 |
- area percentage = 1.1% |
- census yr = 2010|
- pop = 295266 |
+ area percentage = 1.1% |census estimate yr = 2016|
+ pop = 285873<ref name=PopHousingEst>{{cite web|url=https://www.census.gov/programs-surveys/popest.html|title=Population and Housing Unit Estimates |date=August 15, 2017 |accessdate=August 15, 2017|publisher=[[U.S. Census Bureau]]}}</ref> |
  density_sq_mi = 575 
 | web = www.wincoil.us 
 | founded year = 1836

Each of these pages has templates that have a slightly abnormal layout, but mwparserfromhell does well on most templates I've come across like this. Here is an example that works fine:

Rockdale County, Georgia

@@ -9,8 +9,8 @@
  area_land_sq_mi = 130 |
  area_water_sq_mi = 2.3 |
  area percentage = 1.7% |
- census yr = 2010|
- pop = 85215 |
+ census estimate yr = 2016|
+ pop = 89355<ref name=PopHousingEst>{{cite web|url=https://www.census.gov/programs-surveys/popest.html|title=Population and Housing Unit Estimates |date=August 15, 2017 |accessdate=August 15, 2017|publisher=[[U.S. Census Bureau]]}}</ref> |
  density_sq_mi = 657 |
  web = www.rockdalecounty.org 
 | ex image = Rockdale-county-courthouse.jpg

Other changes also generate the same problems. I tried manipulating the Parameter objects directly to just "rename" the census yr property into census estimate yr, but that change also results in changes to the newlines:

var = template.get('census yr')
var.name = 'census estimate yr'
@earwig
Copy link
Owner

earwig commented Aug 15, 2017

Thanks for the report. First impression is that this looks to be the same general issue as #155.

The syntax in example 1 is incredibly broken and I don't think I have much hope of getting MWPFH to do something 'correct' here, other than fixing spacing around "census estimate yr". 2 and 3 are more legitimate problems.

@earwig
Copy link
Owner

earwig commented Aug 15, 2017

I tweaked some of the logic in the above commit, which fixes the problems in your second and third examples and gives better results in the first. Example 1 is really a GIGO situation and I don't think it can be improved further.

I'm sure there are other nefariously formatted templates in the wild; if you see more, let's use the newly renamed #155 for them.

@betson
Copy link
Author

betson commented Aug 16, 2017

I can confirm that resolves the issue for the two templates mentioned. I'll comment on the other issue if I see any regressions with other templates or find other problematic ones.

Thank you for responding so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants