fix: refresh One Building weather data and fix broken URLs #272 by t-kramer · Pull Request #273 · CenterForTheBuiltEnvironment/clima

t-kramer · 2026-05-06T15:53:23Z

Fixes broken EPW weather file links in the One Building dataset map mentioned in #272.

The issue was initially reported for Singapore (Seletar AP, Changi Intl AP, Paya Lebar) in #269 but turned out to be systemic. climate.onebuilding.org had reorganized its file structure, breaking thousands of URLs across all regions.

Rather than patching individual URLs, I refreshed the entire dataset from source:

Downloaded 49 updated KML files from climate.onebuilding.org/sources/ (up from 12 in the 2022 snapshot) and regenerated one_building.csv from scratch with 97,926 entries (vs. ~43,888 previously)
Fixed import_one_building_files.py to skip header Placemarks without URLs, required by the new KML format
Updated OneBuilding files.zip with the new KML files
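
A minimal sketch of the Placemark-skipping behaviour described in the list above, assuming each data Placemark embeds its download link somewhere in its body and header Placemarks do not. The function name `extract_urls`, both regular expressions, and the sample KML are illustrative assumptions, not the contents of import_one_building_files.py.

```python
import re

# Assumed pattern: a link to a .zip or .epw file somewhere inside the Placemark.
URL_RE = re.compile(r"https?://[^\s\"'<>]+\.(?:zip|epw)", re.IGNORECASE)
PLACEMARK_RE = re.compile(r"<Placemark>(.*?)</Placemark>", re.DOTALL)


def extract_urls(kml_text: str) -> list[str]:
    """Return one URL per Placemark, skipping Placemarks that carry no URL."""
    urls = []
    for placemark in PLACEMARK_RE.findall(kml_text):
        url_match = URL_RE.search(placemark)  # run the regex once and reuse it
        if url_match is None:
            continue  # header Placemark without a download link
        urls.append(url_match.group(0))
    return urls


# Hypothetical two-Placemark document: one header, one station entry.
SAMPLE_KML = """
<Document>
  <Placemark><name>Region header</name></Placemark>
  <Placemark>
    <name>Example Station</name>
    <description><![CDATA[<a href="https://example.org/data/Example_Station.zip">Download</a>]]></description>
  </Placemark>
</Document>
"""

print(extract_urls(SAMPLE_KML))
```

Only the second Placemark contributes a URL; the header is skipped instead of raising on a missing match.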

After regeneration, 1,344 URLs were still broken due to further structural changes on the server.
I created a fix_broken_urls.py script to auto-discover correct paths, covering:

Files moved into state/province subdirectories
.epw → .zip extension changes (Ireland MetEireann dataset)
Removed path segments (California Climate Zones)
Renamed location strings, matched via WMO station ID
State directory typos (e.g. PB_Paraba → PB_Paraiba)
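
One way such auto-discovery could work is to generate candidate URLs per strategy and keep the first one the server confirms. The sketch below is an assumption about the approach, not the contents of fix_broken_urls.py: the helper names, the typo table, and the `exists` callback (which in practice would wrap an HTTP request) are hypothetical, and only two of the listed strategies are shown.

```python
from typing import Callable, Iterator, Optional

# Hypothetical typo table; the PR gives PB_Paraba -> PB_Paraiba as one example.
DIRECTORY_TYPOS = {"PB_Paraba": "PB_Paraiba"}


def candidate_urls(url: str) -> Iterator[str]:
    """Yield plausible replacements for a broken URL, one strategy at a time."""
    # Strategy: .epw -> .zip extension change.
    if url.endswith(".epw"):
        yield url[: -len(".epw")] + ".zip"
    # Strategy: correct known directory-name typos.
    for wrong, right in DIRECTORY_TYPOS.items():
        if f"/{wrong}/" in url:
            yield url.replace(f"/{wrong}/", f"/{right}/")


def repair_url(url: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that `exists` confirms, or None if none works."""
    for candidate in candidate_urls(url):
        if exists(candidate):
            return candidate
    return None


# Usage with a stand-in for the server: a set of URLs known to exist.
known = {"https://example.org/BRA/PB_Paraiba/station.zip"}
print(repair_url("https://example.org/BRA/PB_Paraba/station.zip", known.__contains__))
```

Returning None when no candidate is confirmed keeps unrepairable links visible, so they can be reported rather than silently rewritten.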

Result: 152 broken links remain out of 97,926 (0.16%); these files were confirmed absent from the server.

Summary by CodeRabbit

Bug Fixes
- Improved URL handling in KML file imports with more efficient extraction logic.
- Enhanced location information request formatting for more reliable data retrieval.
Tests
- Added explicit loading state assertions to ensure UI elements fully load before validation.
Chores
- Updated .gitignore with additional entries for maintenance scripts.

coderabbitai · 2026-05-06T15:53:36Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7bc83dfa-c9b3-4696-bc42-43e3f3b8f3ad

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


📝 Walkthrough

This PR updates data maintenance script ignores in .gitignore, refactors URL extraction in a KML import function to pre-capture regex matches, adds Dash loading-state readiness checks to two UI tests, and applies minor formatting to a requests call.

Changes

Data Maintenance Scripts

Layer / File(s)	Summary
Ignore Patterns `.gitignore`	Adds comment block and ignore rules for `check_links.py` and `fix_broken_urls.py` maintenance scripts.

URL Extraction & Test Reliability

Layer / File(s)	Summary
URL Extraction Refactoring `pages/lib/import_one_building_files.py`	Pre-extracts URL from KML `<Placemark>` via `url_match` regex once; skips placemark if no match found. Reuses captured value in HTML link construction instead of re-running regex.
Test Readiness Assertions `tests/test_summary.py`	Both `test_location_info_loaded` and `test_unit_switch` now explicitly wait for `#location-info` element to exit Dash loading state (`data-dash-is-loading != "true"`) with 20s timeout before verifying expected values.
Minor Formatting `pages/summary.py`	Reformats `requests.get()` URL argument onto indented line; endpoint and `timeout=5` remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hops through links once more,
Pre-captured regex stores the URL core,
Tests now wait for dashboards bright,
Loading states confirmed right,
Scripts ignored, logic tight—hooray for the fix! 🐰

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: refreshing One Building weather data and fixing broken URLs, which aligns perfectly with the substantial dataset refresh, KML updates, URL corrections, and fix_broken_urls.py script additions.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


FedericoTartarini

Thank you very much for fixing this; it is extremely important. However, I believe something else has also changed: as you can see, the tests are now failing because the top component on the summary page does not load properly for all the weather files. Would you like me to merge this pull request and open a separate issue for that problem, or would you prefer to fix it in this pull request? My general preference is not to merge pull requests with failing tests.

FedericoTartarini · 2026-05-06T23:40:46Z

@coderabbitai review

coderabbitai · 2026-05-06T23:41:22Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_summary.py (1)
56-73: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pipeline failure: hard-asserting optional external-API text makes this test flaky.

The pipeline failure (expected to contain text 'Köppen-Geiger climate zone: …' but actual value did not) is a direct consequence of asserting text that pages/summary.py only renders when http://climateapi.scottpinkelman.com returns HTTP 200. In CI, that endpoint is down or rate-limited, or the 5 s timeout fires, so the except Exception: pass path is taken and the text is never rendered. The not_to_have_attribute wait correctly signals that the Dash callback has finished, but a finished callback with a silenced external-API failure still produces no climate-zone text.
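
The failure mode can be reproduced without a network. The sketch below is a simplified stand-in, not the code in pages/summary.py: `build_info_lines` and the injected `fetch_zone` callable are hypothetical, and the exception is swallowed in the way the paragraph above describes.

```python
from typing import Callable


def build_info_lines(location: str, fetch_zone: Callable[[], str]) -> list[str]:
    """Build summary lines; the climate-zone line is added only if the fetch succeeds."""
    lines = [f"Location: {location}"]
    try:
        lines.append(f"Köppen-Geiger climate zone: {fetch_zone()}")
    except Exception:
        pass  # silenced failure: the function still completes, but without the line
    return lines


def api_down() -> str:
    """Stand-in for an external API call that times out."""
    raise TimeoutError("external API did not answer in time")


print(build_info_lines("Bologna Marconi AP, ITA", lambda: "Cfa"))
print(build_info_lines("Bologna Marconi AP, ITA", api_down))
```

Both calls return normally, so a "callback finished" signal looks the same in each case; only the second result lacks the climate-zone line that the test asserts.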

Fix options (simplest first):
✅ Option A — skip the climate-zone line from the assertions (least intrusive)
     expected_texts = [
         "Location: Bologna Marconi AP, ITA",
         "Longitude: 11.2969",
         "Latitude: 44.5308",
         "Elevation above sea level: 37.0 m",
         "This file is based on data collected between 2004 and 2018",
-        "Köppen-Geiger climate zone: Cfa. Humid subtropical, no dry season.",
         "Average yearly temperature: 14.5 °C",
         "Hottest yearly temperature (99%): 34.0 °C",
         "Coldest yearly temperature (1%): -2.0 °C",
         "Annual cumulative horizontal solar radiation: 1546.12 kWh/m2",
         "Percentage of diffuse horizontal solar radiation: 39.4 %",
     ]
✅ Option B — assert the element is either present with correct text or absent (resilient to API down)
+    climate_locator = info_section.get_by_text("Köppen-Geiger climate zone:", exact=False)
+    if climate_locator.count() > 0:
+        expect(climate_locator).to_contain_text("Cfa. Humid subtropical, no dry season.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_summary.py` around lines 56 - 73, The test is flaky because the
"Köppen-Geiger climate zone" line is only rendered when the external climate API
succeeds; update tests/test_summary.py to not hard-assert that line: either (A)
remove the "Köppen-Geiger climate zone: Cfa. Humid subtropical, no dry season."
entry from the expected_texts list, or (B) make the assertion for that specific
string optional by wrapping the check for that text (the loop that iterates over
expected_texts and calls expect(info_section).to_contain_text(text)) in a
conditional/try-block so that a missing climate-zone string is tolerated (e.g.,
try expect(...).to_contain_text(climate_text) and ignore AssertionError),
leaving all other expected_texts unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_summary.py`:
- Around line 56-58: The test currently uses
expect(info_section).not_to_have_attribute("data-dash-is-loading","true",...)
which can pass before loading begins; update the test to first wait for the
loading attribute to appear by asserting
expect(info_section).to_have_attribute("data-dash-is-loading","true",
timeout=...) and only after that assert
expect(info_section).not_to_have_attribute("data-dash-is-loading","true",
timeout=...); apply the same two-step pattern to the analogous checks in the
test_unit_switch block (lines referencing info_section in that test) so you wait
for loading to start then for it to finish.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5328e4f0-cf5b-4034-aa63-6c6b98dffee3

📥 Commits

Reviewing files that changed from the base of the PR and between bd05cd3 and ee1baaa.

⛔ Files ignored due to path filters (2)

assets/data/OneBuilding files.zip is excluded by !**/*.zip
assets/data/one_building.csv is excluded by !**/*.csv

📒 Files selected for processing (4)

.gitignore
pages/lib/import_one_building_files.py
pages/summary.py
tests/test_summary.py

coderabbitai · 2026-05-06T23:44:37Z

+    expect(info_section).not_to_have_attribute(
+        "data-dash-is-loading", "true", timeout=20000
+    )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

not_to_have_attribute may pass before loading even starts — consider waiting for loading to begin first.

data-dash-is-loading="true" is only set once Dash's callback begins rendering; if the check runs before the attribute is ever attached (i.e. before the callback fires), the assertion passes immediately and subsequent text checks run against incomplete content. A more robust pattern is to first assert the attribute is "true" (i.e. loading started) and then assert it is no longer "true":

🛡️ Proposed fix

-    expect(info_section).not_to_have_attribute(
-        "data-dash-is-loading", "true", timeout=20000
-    )
+    expect(info_section).to_have_attribute(
+        "data-dash-is-loading", "true", timeout=10000
+    )
+    expect(info_section).not_to_have_attribute(
+        "data-dash-is-loading", "true", timeout=20000
+    )

The same pattern should be applied to lines 94–96 in test_unit_switch.
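
The race described above can be illustrated without a browser. The sketch below uses a hypothetical `FakeElement` whose simulated clock advances by one tick per poll, plus a hypothetical `wait_until` helper; it is a toy model of the timing problem, not Playwright or Dash code.

```python
from typing import Callable


def wait_until(predicate: Callable[[], bool], max_polls: int) -> bool:
    """Poll `predicate` up to `max_polls` times; return True as soon as it holds."""
    for _ in range(max_polls):
        if predicate():
            return True
    return False


class FakeElement:
    """Stand-in for a DOM node; each poll advances a simulated clock by one tick."""

    def __init__(self, start_tick: int, end_tick: int) -> None:
        self.tick = 0
        self.start_tick = start_tick  # tick at which loading begins
        self.end_tick = end_tick      # tick at which loading ends

    def poll_loading(self) -> bool:
        loading = self.start_tick <= self.tick < self.end_tick
        self.tick += 1
        return loading

    @property
    def content_ready(self) -> bool:
        return self.tick >= self.end_tick


# One-step wait: "not loading" already holds before loading has started.
el = FakeElement(start_tick=3, end_tick=6)
wait_until(lambda: not el.poll_loading(), max_polls=50)
print("one-step wait, content ready:", el.content_ready)

# Two-step wait: first observe loading begin, then observe it end.
el = FakeElement(start_tick=3, end_tick=6)
wait_until(el.poll_loading, max_polls=50)
wait_until(lambda: not el.poll_loading(), max_polls=50)
print("two-step wait, content ready:", el.content_ready)
```

The one-step wait returns on the first poll with no content present, while the two-step wait returns only after loading has finished. A caveat of the two-step pattern is that its first wait can time out if loading starts and ends before the first check runs.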

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    expect(info_section).not_to_have_attribute(
-        "data-dash-is-loading", "true", timeout=20000
-    )
+    expect(info_section).to_have_attribute(
+        "data-dash-is-loading", "true", timeout=10000
+    )
+    expect(info_section).not_to_have_attribute(
+        "data-dash-is-loading", "true", timeout=20000
+    )


t-kramer · 2026-05-07T07:07:29Z

@FedericoTartarini Yes, I noticed the failed tests and will make sure to implement the fix. It seems to be a minor issue and I'll try to get it done ASAP so we can merge.

t-kramer · 2026-05-07T11:48:50Z

@FedericoTartarini, the issue was a dead API for Köppen-Geiger climate zones, which caused the location_info test to fail and also breaks the live version of Clima. I replaced it with a local lookup using the kgcpy package, which is more robust and does the same thing. The app is ready to go and should be deployed ASAP to apply the fixes. After that, Clima should run again with almost double the weather files, following my recent addition of new files on onebuilding.org.

Let me know if you have any questions.

@stefanoschiavon @giobetti
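
The local lookup mentioned in the comment above could be wired in roughly as follows. This is a sketch under stated assumptions, not the code merged in this PR: `climate_zone_line` and `stub_lookup` are hypothetical names, and the lookup is injected so the example runs without kgcpy installed.

```python
from typing import Callable, Optional


def climate_zone_line(
    lat: float, lon: float, lookup: Callable[[float, float], str]
) -> Optional[str]:
    """Format the climate-zone line from a local lookup, or return None if it fails."""
    try:
        code = lookup(lat, lon)
    except Exception:
        return None
    return f"Köppen-Geiger climate zone: {code}"


def stub_lookup(lat: float, lon: float) -> str:
    # Stand-in so the sketch runs without kgcpy installed; always answers "Cfa".
    return "Cfa"


# Assumption: with kgcpy installed, its lookupCZ(lat, lon) function could be passed
# as `lookup`; check the package documentation before relying on that signature.
print(climate_zone_line(44.5308, 11.2969, stub_lookup))
```

Because the lookup is local, the line no longer depends on an external endpoint being reachable during tests.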

fix: refresh One Building weather data and fix broken URLs #272

f6d26da

t-kramer requested a review from FedericoTartarini May 6, 2026 15:53

t-kramer added the bug Something isn't working label May 6, 2026

t-kramer added 3 commits May 6, 2026 18:31

test: increase timeout for location info loading in summary tests

edbe62f

fix: format for ruff

82920e5

fix: add timeout to Köppen-Geiger climate zone API request

ee1baaa

FedericoTartarini reviewed May 6, 2026

coderabbitai Bot reviewed May 6, 2026

fix: replace dead Köppen-Geiger API with kgcpy local lookup

1d29fa8

fix: update humidity legend entry #271

bc86cd4

t-kramer mentioned this pull request May 7, 2026

Remove humidity comfort band or add reference #271

Closed

FedericoTartarini merged commit 5daa36b into development May 8, 2026
2 checks passed

coderabbitai Bot mentioned this pull request May 8, 2026

Refactor header layout, fix weather data URLs, and update version #274

Open

FedericoTartarini merged 6 commits into development from fix/272-url-problem
