
Self Hosting Questions #11

Open
goldbattle opened this issue Jan 1, 2023 · 11 comments

Comments

@goldbattle

Many thanks for the project and open sourcing your processing scripts.
I tried to dabble a bit myself before finding this project and only got as far as extracting and looking at the data in the GRIB files.
I was able to run the Docker image with your included scripts to download the data from S3 onto my machine.

Building

cd pirateweather/wgrib2/
docker build -t wgrib2 -f Dockerfile .
docker image list

Running

docker run --net=host \
    -v "D:\\WEATHER\\:/mnt" \
    -e bucket="noaa-hrrr-bdp-pds" \
    -e download_path="/mnt/data/efs1z" \
    -e temp_path="/mnt/data/tmp" \
    -e time="2022-12-30T15:45:00Z" \
    wgrib2 \
    /mnt/pirateweather/scripts/hrrrh_combined-fargate.py

I have a couple questions.

  1. The timestamp input seems to determine which times are downloaded. How do you normally specify this? As the current time?
  2. Do I need to run all the scripts? Or, since I am in the US, can I just run the HRRR script, which I believe has most of the data? (You note in the docs that it doesn't seem to have UV, but that is fine for me currently.)
  3. If I do need to run the other scripts, can they run concurrently, or only sequentially?
  4. The example run command puts files inside an efs1z folder. Is this a specific folder name, or does it have some meaning here? Should the other scripts write to a different folder?
  5. I want to run the download on a cron job. Could you explain how I should interpret the trigger table? Should I run HRRR every hour to get data, or would every 3 hours suffice?

The one piece of code I am trying to find is whatever maps a lat/lon to queries against these files. The docs say a Lambda function does this and describe the process in some detail. Is that code public? If so, could you point me in the right direction? Many thanks!

@alexander0042
Collaborator

alexander0042 commented Jan 3, 2023

Hi,

Thanks for checking out this project, and I appreciate your detailed questions here! The "open" aspect of this project is really important to me, so I'm happy to see people digging into the source, but I know this side of things could be much (much!) clearer. I'll try to address things point by point here.

  1. The "time" parameter is designed to be the current time as a string, using the format "%Y-%m-%dT%H:%M:%S%z". This is how AWS reports when the function is run, and the processing script then counts back the number of hours given in that table to find the file.
  2. Nope! I call this as 4 separate step functions, just changing the run command (like you've done!).
  3. The docker image is designed to run one script at a time, but no reason you couldn't have multiple copies of the same image running.
  4. The efs1z is just my internal AWS structure coming out, so you could store it anywhere. If you're curious, the name comes from storage on the EFS file system (which is an incredibly flexible tool to get data to Lambda), set to use 1 zone.
  5. I sort of covered this in the first question, but to clarify, you want to run it on the "Ingest Times (UTC)" row and pass the current time to the container. So to run HRRR-Hourly (hrrrh), you'd set Cron to 2:30,8:30,14:30,20:30 and pass (using 2:30 as an example) "2023-01-01T02:30:00+0000".
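To make the trigger concrete, here's a minimal sketch of what points 1 and 5 describe. The helper names and the example offset are my own illustrations, not code from the repo; a crontab entry like `30 2,8,14,20 * * *` would invoke the container with the output of `trigger_timestamp()` as the `time` environment variable.

```python
from datetime import datetime, timedelta, timezone

def trigger_timestamp(now=None):
    """Format the current UTC time the way the scripts expect
    (this is the string a cron job would pass via -e time=...)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S%z")

def model_run(trigger, hours_back):
    """Parse a trigger timestamp and count back to the model run hour,
    mirroring how the processing script uses the hours in the trigger table."""
    t = datetime.strptime(trigger, "%Y-%m-%dT%H:%M:%S%z")
    return t - timedelta(hours=hours_back)
```

For example, a 02:30 UTC trigger with a two-hour offset resolves to the 00:30 run of the same day.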

With respect to the read script, you're right that it's not currently in this repository. There are two issues with it: it's an uncommented mess from me learning Python on the fly while building this, and it relies on a ton of assumptions about Lambda and AWS API Gateway. I think an easier approach is to ask what your ultimate goal is and work from that direction, since with these scripts all the data will be there. Something along the lines of this notebook is what I have in mind, since it shows a Python script to extract a data point time series from the NetCDF file.
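On the read side, the core of mapping a lat/lon onto these files is a nearest-grid-cell lookup. Here's a minimal NumPy sketch; the function name and the squared-degree distance metric are my own simplifications, not the Lambda's actual code, and a real implementation would account for the grid's projection:

```python
import numpy as np

def nearest_cell(lats, lons, lat, lon):
    """Return the (row, col) index of the grid cell closest to (lat, lon).

    lats/lons are the 2-D coordinate arrays stored alongside the data.
    Squared-degree distance is crude but adequate for a regional grid
    like HRRR's.
    """
    dist = (lats - lat) ** 2 + (lons - lon) ** 2
    return np.unravel_index(np.argmin(dist), dist.shape)
```

With the index in hand, slicing the NetCDF variable along its time axis at that cell (e.g. `var[:, row, col]`) gives the point time series the notebook demonstrates.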

@goldbattle
Author

goldbattle commented Jan 3, 2023 via email

@SoulRaven

+1, the project is interesting. I will take it for a spin and integrate it into an open-source project with the API written in Python, as a ready-to-go solution in my roundbox project. That is also a work in progress, but the idea is to integrate everything as quickly as possible and have it ready to deploy.
Can you share more about what you have written in the backend for the API?

@github-actions

There has been no activity on this issue for ninety days, and unless you comment on it, the issue will automatically close in seven days.

@github-actions github-actions bot added the stale label Jul 12, 2023
@goldbattle
Author

Please leave this open, as the API-to-raw-data scripts are still not included in the open-sourced code.

@github-actions github-actions bot removed the stale label Jul 13, 2023
@alexander0042
Collaborator

Happy to leave this open for now, and it is still on the roadmap; however, the issue remains that everything is very tightly integrated with AWS/Lambda at the moment, so it isn't usable outside of my specific environment. To speed up response times, I'm eventually migrating this to Docker, so it is very doable down the line! I'll also caution that the processing scripts download ~100 GB/day, so self hosting will require a pretty beefy internet connection.

@fox91

fox91 commented Jul 13, 2023

Self hosting on AWS is always an option 😉
Please release it as is; we don't mind if it isn't optimized or can't be run with one click.
Open source doesn't mean "runs easily on your device with your custom config"...

@lordbagel42

That is exactly my thinking. I dislike subscriptions because my internet isn't the most stable; I would much rather donate some money and run the server on my own hardware, so that if I wanted to make 50,000 API requests per month, I could. I personally want it for Home Assistant.

@cloneofghosts cloneofghosts pinned this issue Nov 7, 2023
@cloneofghosts cloneofghosts added this to the Pirate Weather 2.0 milestone Dec 12, 2023
@msft-jeelpatel

Hi, is there any detailed guide on how to self host this and run it on your own machine?

@alexander0042
Collaborator

Posting this here since I think it fits with this discussion, but I'm looking into what license I should use for the open-source stuff. Currently, everything is licensed under Apache 2.0; however, since the V2.0 code is pretty well all new, there's an option to take another look at this. My goal is to make it possible to self host and run the entire stack (which will require a pretty beefy computing setup, but within the realm of possibility), but I also want to avoid what happened to Redis: having some provider come along and replicate it all without contributing back to keep improving this project. Along these lines, I'm debating releasing V2 under the AGPL, and I'm curious what people think about this.

I know it's a pretty restrictive license; however, the current status quo is not having the source public at all, which certainly isn't ideal either! The flip side is that I think I'll have to add a contributor license agreement to make commercial use of the project possible with permission. Again, definitely not ideal, but for the free version of this to keep running, the AWS bill has to be paid somehow, so this seems like the way. I'm envisioning a MinIO sort of structure: not ideal, but a practical way to keep this open while keeping the lights on for the project.

@lordbagel42

Personally, I'm a member of SlimeVR; we dual-license under Apache and MIT.

I dislike the GPL for how "poisonous" it is. However, it's better than nothing, and I will support the project either way.
