fix: refresh One Building weather data and fix broken URLs #272 #273

Merged
FedericoTartarini merged 6 commits into development from
fix/272-url-problem
May 8, 2026

Conversation

@t-kramer
Contributor

@t-kramer t-kramer commented May 6, 2026

Fixes broken EPW weather file links in the One Building dataset map mentioned in #272.

The issue was initially reported for Singapore (Seletar AP, Changi Intl AP, Paya Lebar) #269 but turned out to be systemic. climate.onebuilding.org had reorganized their file structure, breaking thousands of URLs across all regions.

Rather than patching individual URLs, I refreshed the entire dataset from source:

  • Downloaded 49 updated KML files from climate.onebuilding.org/sources/ (up from 12 in the 2022 snapshot) and regenerated one_building.csv from scratch with 97,926 entries (vs. ~43,888 previously)
  • Fixed import_one_building_files.py to skip header Placemarks without URLs, required by the new KML format
  • Updated OneBuilding files.zip with the new KML files
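
A minimal sketch of the "skip Placemarks without URLs" behaviour described above. This is not the code in import_one_building_files.py: the regex patterns, the `extract_entries` name, and the sample KML are assumptions made for illustration only.

```python
import re

# Hypothetical patterns; the real script's regexes may differ.
PLACEMARK_RE = re.compile(r"<Placemark>(.*?)</Placemark>", re.DOTALL)
NAME_RE = re.compile(r"<name>(.*?)</name>", re.DOTALL)
URL_RE = re.compile(r'https?://[^\s<"]+')


def extract_entries(kml_text):
    """Return one dict per Placemark that contains a URL; skip the others."""
    entries = []
    for body in PLACEMARK_RE.findall(kml_text):
        # Run the URL regex once and reuse the captured value below.
        url_match = URL_RE.search(body)
        if url_match is None:
            continue  # header Placemark without a download link
        name_match = NAME_RE.search(body)
        name = name_match.group(1).strip() if name_match else ""
        url = url_match.group(0)
        entries.append({"name": name, "url": url, "link": f'<a href="{url}">{name}</a>'})
    return entries


# Made-up sample: one header Placemark and one station Placemark.
SAMPLE_KML = """
<Document>
  <Placemark><name>Header without a link</name></Placemark>
  <Placemark>
    <name>Example Station</name>
    <description>URL https://example.org/region/Example_Station.zip</description>
  </Placemark>
</Document>
"""

entries = extract_entries(SAMPLE_KML)
print(entries)
```

Only the second Placemark produces an entry; the header Placemark is skipped instead of raising on a missing match.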

After regeneration, 1,344 URLs were still broken due to further structural changes on the server.
I created a fix_broken_urls.py script to auto-discover correct paths, covering:

  • Files moved into state/province subdirectories
  • .epw → .zip extension changes (Ireland MetEireann dataset)
  • Removed path segments (California Climate Zones)
  • Renamed location strings, matched via WMO station ID
  • State directory typos (e.g. PB_Paraba → PB_Paraiba)
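
The first three cases in the list above can be sketched as a pure function that proposes candidate paths for a broken URL. fix_broken_urls.py itself is not shown in this PR, so the `candidate_urls` name, its parameters, and the example.org URL are hypothetical. A real script would presumably also request each candidate from the server before accepting it; that step is left out here, as are the WMO-ID matching and typo cases.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(url, subdirs=(), drop_segments=()):
    """Propose alternative locations for a file whose URL no longer resolves."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    candidates = []

    def add(path):
        candidate = urlunsplit(parts._replace(path=path))
        if candidate != url and candidate not in candidates:
            candidates.append(candidate)

    # Case 1: file moved into a state/province subdirectory.
    for sub in subdirs:
        add(posixpath.join(directory, sub, filename))

    # Case 2: .epw extension replaced by .zip.
    if filename.endswith(".epw"):
        add(posixpath.join(directory, filename[: -len(".epw")] + ".zip"))

    # Case 3: a path segment was removed on the server.
    segments = directory.split("/")
    for segment in drop_segments:
        if segment in segments:
            shortened = "/".join(s for s in segments if s != segment)
            add(posixpath.join(shortened, filename))

    return candidates


found = candidate_urls(
    "https://example.org/data/BRA_Brazil/Station.epw",
    subdirs=("PB_Paraiba",),
    drop_segments=("data",),
)
for candidate in found:
    print(candidate)
```

Keeping the candidate generation separate from the network check makes the path rules easy to test without touching the server.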

Result: 152 broken links remain out of 97,926 (0.16%). These files were confirmed absent from the server.
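
For reference, the percentages follow directly from the counts quoted above. The variable names are only for illustration.

```python
total_entries = 97_926
broken_after_regeneration = 1_344
broken_after_fix = 152

share_before = f"{broken_after_regeneration / total_entries:.2%}"
share_after = f"{broken_after_fix / total_entries:.2%}"
repaired = broken_after_regeneration - broken_after_fix
share_repaired = f"{repaired / broken_after_regeneration:.1%}"

print(share_before)    # 1.37%
print(share_after)     # 0.16%
print(repaired)        # 1192
print(share_repaired)  # 88.7%
```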

Summary by CodeRabbit

  • Bug Fixes

    • Improved URL handling in KML file imports with more efficient extraction logic.
    • Enhanced location information request formatting for more reliable data retrieval.
  • Tests

    • Added explicit loading state assertions to ensure UI elements fully load before validation.
  • Chores

    • Updated .gitignore with additional entries for maintenance scripts.

@t-kramer t-kramer requested a review from FedericoTartarini May 6, 2026 15:53
@t-kramer t-kramer added the bug Something isn't working label May 6, 2026
@coderabbitai

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7bc83dfa-c9b3-4696-bc42-43e3f3b8f3ad

📝 Walkthrough

This PR updates data maintenance script ignores in .gitignore, refactors URL extraction in a KML import function to pre-capture regex matches, adds Dash loading-state readiness checks to two UI tests, and applies minor formatting to a requests call.

Changes

Data Maintenance Scripts

| Layer / File(s) | Summary |
|---|---|
| Ignore Patterns: .gitignore | Adds comment block and ignore rules for check_links.py and fix_broken_urls.py maintenance scripts. |

URL Extraction & Test Reliability

| Layer / File(s) | Summary |
|---|---|
| URL Extraction Refactoring: pages/lib/import_one_building_files.py | Pre-extracts URL from KML <Placemark> via url_match regex once; skips placemark if no match found. Reuses captured value in HTML link construction instead of re-running regex. |
| Test Readiness Assertions: tests/test_summary.py | Both test_location_info_loaded and test_unit_switch now explicitly wait for #location-info element to exit Dash loading state (data-dash-is-loading != "true") with 20s timeout before verifying expected values. |
| Minor Formatting: pages/summary.py | Reformats requests.get() URL argument onto indented line; endpoint and timeout=5 remain unchanged. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: refreshing One Building weather data and fixing broken URLs, which aligns perfectly with the substantial dataset refresh, KML updates, URL corrections, and fix_broken_urls.py script additions. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@FedericoTartarini FedericoTartarini left a comment


Thank you very much for fixing this. This is extremely important. However, I believe that something else has also changed because now, as you can see, the tests are not passing because the top component in the summary page is not loading properly for all the weather files. Would you like me to merge this pull request and then we open another issue for that specific problem? Or would you like to fix it in this pull request? My general preference is not to merge any pull requests that are failing the test.

@FedericoTartarini
Contributor

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_summary.py (1)

56-73: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pipeline failure: hard-asserting optional external-API text makes this test flaky.

The pipeline failure (expected to contain text 'Köppen-Geiger climate zone: …' but actual value did not) is a direct consequence of asserting text that pages/summary.py only renders when http://climateapi.scottpinkelman.com returns HTTP 200. In CI, that endpoint is either down or rate-limited, or the 5 s timeout fires. In each case the except Exception: pass path runs and the text is never rendered. The not_to_have_attribute wait correctly signals that the Dash callback has finished, but a finished callback with a silenced external-API failure still produces no climate-zone text.

Fix options (simplest first):

✅ Option A — skip the climate-zone line from the assertions (least intrusive)
     expected_texts = [
         "Location: Bologna Marconi AP, ITA",
         "Longitude: 11.2969",
         "Latitude: 44.5308",
         "Elevation above sea level: 37.0 m",
         "This file is based on data collected between 2004 and 2018",
-        "Köppen-Geiger climate zone: Cfa. Humid subtropical, no dry season.",
         "Average yearly temperature: 14.5 °C",
         "Hottest yearly temperature (99%): 34.0 °C",
         "Coldest yearly temperature (1%): -2.0 °C",
         "Annual cumulative horizontal solar radiation: 1546.12 kWh/m2",
         "Percentage of diffuse horizontal solar radiation: 39.4 %",
     ]
✅ Option B — assert the element is either present with correct text or absent (resilient to API down)
+    climate_locator = info_section.get_by_text("Köppen-Geiger climate zone:", exact=False)
+    if climate_locator.count() > 0:
+        expect(climate_locator).to_contain_text("Cfa. Humid subtropical, no dry season.")
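
The idea behind Option B can be illustrated without a browser. The sketch below is not Playwright code and is not part of the test suite; `missing_texts` and the sample strings are hypothetical. An optional line is checked only when its label is actually rendered, so an unavailable API is tolerated while a wrong value is still reported.

```python
def missing_texts(rendered, required, optional=()):
    """Return expected strings that are absent from the rendered text.

    Each optional item is a (label, full_text) pair and is checked only
    when its label appears in the rendered text.
    """
    missing = [text for text in required if text not in rendered]
    for label, full_text in optional:
        if label in rendered and full_text not in rendered:
            missing.append(full_text)
    return missing


REQUIRED = ["Location: Bologna Marconi AP, ITA", "Longitude: 11.2969"]
OPTIONAL = [
    (
        "Köppen-Geiger climate zone:",
        "Köppen-Geiger climate zone: Cfa. Humid subtropical, no dry season.",
    )
]

base = "Location: Bologna Marconi AP, ITA Longitude: 11.2969"
api_up = base + " Köppen-Geiger climate zone: Cfa. Humid subtropical, no dry season."
api_down = base
api_wrong = base + " Köppen-Geiger climate zone: Dfb."

print(missing_texts(api_up, REQUIRED, OPTIONAL))
print(missing_texts(api_down, REQUIRED, OPTIONAL))
print(missing_texts(api_wrong, REQUIRED, OPTIONAL))
```

The first two calls report nothing missing; the third reports the climate-zone line because the label is present with an unexpected value.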
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_summary.py` around lines 56 - 73, The test is flaky because the
"Köppen-Geiger climate zone" line is only rendered when the external climate API
succeeds; update tests/test_summary.py to not hard-assert that line: either (A)
remove the "Köppen-Geiger climate zone: Cfa. Humid subtropical, no dry season."
entry from the expected_texts list, or (B) make the assertion for that specific
string optional by wrapping the check for that text (the loop that iterates over
expected_texts and calls expect(info_section).to_contain_text(text)) in a
conditional/try-block so that a missing climate-zone string is tolerated (e.g.,
try expect(...).to_contain_text(climate_text) and ignore AssertionError),
leaving all other expected_texts unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5328e4f0-cf5b-4034-aa63-6c6b98dffee3

📥 Commits

Reviewing files that changed from the base of the PR and between bd05cd3 and ee1baaa.

⛔ Files ignored due to path filters (2)
  • assets/data/OneBuilding files.zip is excluded by !**/*.zip
  • assets/data/one_building.csv is excluded by !**/*.csv
📒 Files selected for processing (4)
  • .gitignore
  • pages/lib/import_one_building_files.py
  • pages/summary.py
  • tests/test_summary.py

Comment thread tests/test_summary.py
Comment on lines +56 to +58
    expect(info_section).not_to_have_attribute(
        "data-dash-is-loading", "true", timeout=20000
    )

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

not_to_have_attribute may pass before loading even starts — consider waiting for loading to begin first.

data-dash-is-loading="true" is only set once Dash's callback begins rendering; if the check runs before the attribute is ever attached (i.e. before the callback fires), the assertion passes immediately and subsequent text checks run against incomplete content. A more robust pattern is to first assert the attribute is "true" (i.e. loading started) and then assert it is no longer "true":

🛡️ Proposed fix
-    expect(info_section).not_to_have_attribute(
-        "data-dash-is-loading", "true", timeout=20000
-    )
+    expect(info_section).to_have_attribute(
+        "data-dash-is-loading", "true", timeout=10000
+    )
+    expect(info_section).not_to_have_attribute(
+        "data-dash-is-loading", "true", timeout=20000
+    )

The same pattern should be applied to lines 94–96 in test_unit_switch.
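
The race described in this comment can be reproduced with a small simulation. This is a sketch, not Playwright: `FakeElement`, `wait_for`, and the scripted timeline are hypothetical stand-ins for a DOM node and a retrying assertion.

```python
class FakeElement:
    """Stand-in for a DOM node; each poll returns the next scripted value."""

    def __init__(self, states):
        self._states = iter(states)
        self._last = None
        self.polls = 0

    def get_attribute(self, name):
        # Once the script is exhausted, keep returning the final value.
        self._last = next(self._states, self._last)
        self.polls += 1
        return self._last


def wait_for(element, predicate, max_polls=50):
    """Poll the loading attribute until predicate(value) holds."""
    for _ in range(max_polls):
        if predicate(element.get_attribute("data-dash-is-loading")):
            return element.polls
    raise TimeoutError("condition not met within max_polls")


def is_loading(value):
    return value == "true"


def not_loading(value):
    return value != "true"


# Scripted timeline: idle, idle, loading, loading, done.
TIMELINE = [None, None, "true", "true", None]

# One-step wait: satisfied on the first poll, before loading has started.
one_step = FakeElement(TIMELINE)
polls_one_step = wait_for(one_step, not_loading)

# Two-step wait: first see loading start, then see it finish.
two_step = FakeElement(TIMELINE)
wait_for(two_step, is_loading)
polls_two_step = wait_for(two_step, not_loading)

print(polls_one_step, polls_two_step)  # 1 5
```

One trade-off worth noting: if the callback starts and finishes before the first assertion polls, the "loading started" step never observes "true" and times out, so the two-step pattern can introduce its own flakiness for very fast callbacks.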

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_summary.py` around lines 56 - 58, The test currently uses
expect(info_section).not_to_have_attribute("data-dash-is-loading","true",...)
which can pass before loading begins; update the test to first wait for the
loading attribute to appear by asserting
expect(info_section).to_have_attribute("data-dash-is-loading","true",
timeout=...) and only after that assert
expect(info_section).not_to_have_attribute("data-dash-is-loading","true",
timeout=...); apply the same two-step pattern to the analogous checks in the
test_unit_switch block (lines referencing info_section in that test) so you wait
for loading to start then for it to finish.

@t-kramer
Contributor Author

t-kramer commented May 7, 2026

@FedericoTartarini Yes, I noticed the failed tests and will make sure to implement the fix. It seems to be a minor issue and I'll try to get it done ASAP so we can merge.

@t-kramer
Contributor Author

t-kramer commented May 7, 2026

@FedericoTartarini, the issue was a dead API for Koeppen-Geiger climate zones, causing the location_info test to fail. It also causes an issue in the live version of Clima. I replaced it with a local workaround using the kgcpy package, which is more robust and does the same thing. The app is ready to go and should be deployed ASAP to apply the fixes. After that, Clima should run again with almost double the weather files after my recent addition of new files on onebuilding.org.

Let me know if you have any questions.

@stefanoschiavon @giobetti
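
A rough sketch of how a local climate-zone lookup could replace the HTTP call mentioned in the comment above. The actual change to pages/summary.py is not shown in this thread, so `climate_zone_line` and `ZONE_DESCRIPTIONS` are hypothetical. The lookup is passed in as a function so the example runs without kgcpy installed. The commented-out kgcpy call reflects my understanding of that package's API and should be treated as an assumption.

```python
# Descriptions keyed by Köppen-Geiger code; only the entry quoted in this
# thread is listed here.
ZONE_DESCRIPTIONS = {"Cfa": "Humid subtropical, no dry season"}


def climate_zone_line(lat, lon, lookup):
    """Format the summary line, or return None when no zone can be found."""
    try:
        code = lookup(lat, lon)
    except Exception:
        return None  # the page can simply omit the line
    if not code:
        return None
    description = ZONE_DESCRIPTIONS.get(code)
    if description is None:
        return f"Köppen-Geiger climate zone: {code}."
    return f"Köppen-Geiger climate zone: {code}. {description}."


def stub_lookup(lat, lon):
    """Stand-in for a local lookup; always answers with the Bologna zone."""
    return "Cfa"


# Assumed kgcpy usage (not executed here):
#   from kgcpy import lookupCZ
#   climate_zone_line(44.5308, 11.2969, lookupCZ)
line = climate_zone_line(44.5308, 11.2969, stub_lookup)
print(line)
```

Because the lookup runs locally, the line no longer depends on an external endpoint being reachable, which is what made the test flaky.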

@FedericoTartarini FedericoTartarini merged commit 5daa36b into development May 8, 2026
2 checks passed