Feat: pems_data package #187

Conversation
Force-pushed from a2ae975 to c341d86
Force-pushed from e259419 to 3ed130e
Force-pushed from 774bb4e to 8a3ce21
Force-pushed from 03937a7 to 46be947
Force-pushed from 7e174b4 to 9be9a63
Force-pushed from 46be947 to 69f95c7
Force-pushed from 69f95c7 to b6191ea
This organization is really nice @thekaveman!

There are maybe just a few things that need tweaking. I'm testing these changes (I started by running `bin/build.sh`) and this is what I'm seeing so far:

- `web` built ✅
- `dev` built ✅
- `streamlit` ⚠️

```
ERROR [streamlit app_container 9/9] RUN pip install $(find /wheels -name pems_streamlit*.whl)
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/caltrans/app/pems_data'
```

- I think the reason is that `streamlit`'s `pyproject.toml` lists `"pems_data @ file:./pems_data"` as a dependency, but we have not copied `pems_data` into the `app_container` stage. The fix would be to add `COPY pems_data pems_data` to the `streamlit` Dockerfile.
- Adding the above gets us over that error and `pems_data` starts getting installed, but then I see this error:

  ```
  Processing ./pems_data (from pems-streamlit==2024.12.2.dev183+gb6191ea.d20250725)
  Getting requirements to build wheel: finished with status 'error'
  LookupError: setuptools-scm was unable to detect version for /caltrans/app.
  ```

- The fix is to also copy the `.git` directory to the `app_container` stage so that setuptools-scm can see it in the container. That fixes this, but then we get this error:

  ```
  creating src/pems_data.egg-info
  error: could not create 'src/pems_data.egg-info': Permission denied
  ```

- I'm testing/troubleshooting this last point now, but wanted to let you know what I'm seeing so far.
@lalver1 yep, you are totally right. My bad, I was testing within the devcontainer and didn't run the Streamlit container directly. Working on a fix...
@lalver1 Your diagnosis was correct:

And then of course you saw the issues with trying to copy the extra files in... The issue happens because for the devcontainer, we want the

This commit makes a few changes. It sort of feels like a "hack", but honestly I wanted to try and avoid having to use a tool like
@lalver1 I'm considering whether we maybe want to go the other way...

Maybe that feels cleaner? And also, I'm worried about rewriting a file that is part of the source in terms of calculating the version info, since that looks at
Force-pushed from 1f9e7d9 to 5246949
@lalver1 I did go the other way; it turns out we don't need to rewrite anything. The direct package dependency reference works in both!
Force-pushed from 5246949 to bf7fa41
Nice job @thekaveman! 🙌 It's so nice we didn't need to overwrite anything in the Dockerfile.
Force-pushed from eaa79ce to db02e6e
- get a fully-formed bucket URL
- get a list of object prefixes matching a pattern
- read a parquet file into a DataFrame

- metadata and imputed aggregate 5min URLs
- get district metadata
- get station imputed aggregate 5min data

using a file: dependency on pems_data doesn't work during Docker build time, since we 1) don't copy the pems_data source for the install stage, and 2) don't copy the .git directory either -- nor do we want to copy either of these

instead, use a normal package dependency during Docker build, and build and download both local packages and all dependencies into a 'wheelhouse' for install

update CI to install all 3 packages for tests
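To make the data-access commits above a bit more concrete, here is a minimal sketch of what such helpers could look like, assuming boto3 and pandas (with s3fs) as the underlying libraries; the class, method, and bucket names are illustrative guesses, not the actual `pems_data` API. The station-specific pieces (district metadata, imputed aggregate 5-minute data) would build on the same primitives by pointing them at specific keys and prefixes.

```python
"""Illustrative sketch only: S3 data-access helpers assuming boto3 + pandas/s3fs.

Class, method, and bucket names are guesses for illustration, not the pems_data API.
"""
import boto3
import pandas as pd


class S3Bucket:
    """Wraps a single S3 bucket."""

    def __init__(self, name: str):
        self.name = name
        self._client = boto3.client("s3")

    def url(self, key: str = "") -> str:
        """Return a fully-formed s3:// URL for the bucket, or for a key within it."""
        return f"s3://{self.name}/{key}" if key else f"s3://{self.name}"

    def prefixes(self, pattern: str, delimiter: str = "/") -> list[str]:
        """Return object prefixes in the bucket whose names contain `pattern`."""
        paginator = self._client.get_paginator("list_objects_v2")
        matches = []
        for page in paginator.paginate(Bucket=self.name, Delimiter=delimiter):
            for common in page.get("CommonPrefixes", []):
                if pattern in common["Prefix"]:
                    matches.append(common["Prefix"])
        return matches

    def read_parquet(self, key: str, **kwargs) -> pd.DataFrame:
        """Read a parquet object into a DataFrame (pandas reads s3:// URLs via s3fs)."""
        return pd.read_parquet(self.url(key), **kwargs)
```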
Force-pushed from 0c218f5 to 28ccf15
create an abstract base class interface for a data source, which represents the "how" of fetching data

refactor S3Bucket to S3DataSource

the concept of a Service here is for the "what" of accessing specific data for business logic

refactor the StationsBucket to a StationsService

StationsService needs an IDataSource to read from (the "how")
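A rough sketch of how that sources/services split could be wired, assuming pandas for the DataFrame return type; the method names, constructor signatures, and prefix layout here are assumptions, not necessarily what the PR implements:

```python
"""Sketch of the sources/services split: IDataSource is the "how" of fetching,
StationsService is the "what" of the business logic. Names/signatures are assumptions."""
from abc import ABC, abstractmethod

import pandas as pd


class IDataSource(ABC):
    """Abstract interface for how data gets fetched."""

    @abstractmethod
    def read_dataframe(self, path: str) -> pd.DataFrame:
        """Read the object at `path` into a DataFrame."""


class S3DataSource(IDataSource):
    """An IDataSource backed by an S3 bucket (pandas + s3fs)."""

    def __init__(self, bucket: str):
        self.bucket = bucket

    def read_dataframe(self, path: str) -> pd.DataFrame:
        return pd.read_parquet(f"s3://{self.bucket}/{path}")


class StationsService:
    """Knows which station datasets to request, via any IDataSource."""

    def __init__(self, source: IDataSource):
        self.source = source

    def district_metadata(self, district: str) -> pd.DataFrame:
        # the prefix layout here is hypothetical
        return self.source.read_dataframe(f"stations/metadata/district={district}")
```

Injecting the `IDataSource` this way also means tests can hand `StationsService` an in-memory fake instead of hitting S3.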
Force-pushed from 28ccf15 to 8eee59f
This looks great! And the `sources` and `services` refactor also makes sense 👍
Thanks! I have a docs PR coming soon to help us keep track of all this stuff too...
Closes #179

### What this PR does

Follows the same general approach and outline from #181 and #184.

### New `pems_data` package

Create a new Python package with its own `pyproject.toml` listing dependencies and other metadata. The code is organized into the `src` structure, similar to `pems_web` in #181 and `pems_streamlit` in #184.

### Refactor `pems_streamlit` data access

#166 originally introduced loading of some key data from S3. This PR refactors that access out of `pems_streamlit` and into `pems_data` in two initial modules:

- `pems_data.s3` contains general S3 access classes for PeMS
- `pems_data.stations` contains station-specific S3 access classes for PeMS

`pems_data` is introduced as a dependency of `pems_streamlit`, which then uses the refactored classes in the same way as before.
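For illustration, consuming code in `pems_streamlit` might then look roughly like the following; the import paths follow the module layout above and the class names come from the later sources/services refactor, but the exact signatures and bucket name are placeholders:

```python
# Hypothetical consumer-side usage from pems_streamlit; the import paths follow the
# module layout above, but the names, signatures, and bucket are placeholders.
from pems_data.s3 import S3DataSource
from pems_data.stations import StationsService

stations = StationsService(S3DataSource(bucket="example-pems-bucket"))
district_metadata = stations.district_metadata("11")  # e.g. Caltrans District 11
```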
### Devcontainer

Add `pems_data` to the devcontainer Dockerfile so it is installed (as editable), with its dependencies.

### Tests

Ensures coverage is reported for the new `pems_data` package. No changes needed to CI since `pems_data` is installed as a dependency of `pems_streamlit` as normal (we don't need editable installs in CI).