Weather DB Schema Plans (long term storage v. API call) #24

Open
tnigon opened this issue May 11, 2020 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@tnigon
Contributor

tnigon commented May 11, 2020

To Do

Create a schematic for DB schema/structure for weather data. This should include all relevant DB tables in addition to any weather data stored in memory (e.g., from a quick API call; maybe for forecast data).

  1. For each DB table or in-memory dataframe, list the column headings that describe the nature of the data. Make it obvious which tables can be joined together.
  2. For each DB table or in-memory dataframe, define the temporal resolution that must be stored (15 min, hourly, daily, etc.).
  3. For now, let's assume we will want to organize/query weather data by location (not by customer, field name, etc.). Not sure what this looks like, but try to capture this concept in the schematic.
  4. Which data we actually use for our prediction models is somewhat irrelevant to this issue. As we build more models/customer use cases, we will naturally rely on different weather products, which we can't anticipate now. Set up to scale easily for basic weather columns (e.g., those of the EPIC .Dly files): T_min (°C), T_max (°C), relative humidity (%), precipitation (mm), wind speed (m/s at 2 m above ground level), and solar radiation (MJ/m^2).
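
As a starting point for discussion, here is a minimal sketch of what a location-keyed layout could look like, using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration, not a decided design: the table names (location, weather_product, weather_obs), the column names, the allowed resolution values, and the sample coordinates and readings. The idea is that location_id and product_id are the join keys, the temporal resolution is recorded per product, and the basic columns from item 4 live in one observation table.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    latitude    REAL NOT NULL,
    longitude   REAL NOT NULL,
    UNIQUE (latitude, longitude)
);
CREATE TABLE weather_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    resolution  TEXT NOT NULL CHECK (resolution IN ('15min', 'hourly', 'daily')),
    is_default  INTEGER NOT NULL DEFAULT 0   -- 1 for the fallback product
);
CREATE TABLE weather_obs (
    location_id INTEGER NOT NULL REFERENCES location (location_id),
    product_id  INTEGER NOT NULL REFERENCES weather_product (product_id),
    obs_time    TEXT NOT NULL,               -- ISO 8601 timestamp (UTC)
    t_min_c     REAL,                        -- deg C
    t_max_c     REAL,                        -- deg C
    rh_pct      REAL,                        -- %
    precip_mm   REAL,                        -- mm
    wind_2m_m_s REAL,                        -- m/s at 2 m above ground level
    solar_mj_m2 REAL,                        -- MJ/m^2
    PRIMARY KEY (location_id, product_id, obs_time)
);
"""


def create_weather_db(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


# Usage: one location, one default product, one made-up daily record.
conn = create_weather_db()
conn.execute("INSERT INTO location VALUES (1, 44.97, -93.23)")
conn.execute("INSERT INTO weather_product VALUES (1, 'default_api', 'daily', 1)")
conn.execute(
    "INSERT INTO weather_obs VALUES "
    "(1, 1, '2020-05-11T00:00:00Z', 5.0, 18.0, 60.0, 0.0, 3.2, 21.5)"
)

# The three tables join on location_id and product_id.
rows = conn.execute(
    """
    SELECT l.latitude, p.name, o.t_max_c
    FROM weather_obs AS o
    JOIN location AS l USING (location_id)
    JOIN weather_product AS p USING (product_id)
    """
).fetchall()
```

Keeping the product in its own table means a new weather product can be added as a row rather than as a schema change, which seems in line with the goal of scaling without knowing future products.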

Notes

  • Retrieving and storing data is separate from compiling it (what I would call populating an X matrix). This card deals with data storage, not with building an X matrix. I suggest always having a default weather product: whichever product gives us the highest likelihood that data exist, probably some hourly or daily historical API. Then, for each "observed" data product (e.g., a weather station/point), we store its data in addition to the "default". Only when populating the X matrix do we choose which weather data to use. A simple if/then rule would be: if weather station data exist, they overwrite the API data; otherwise, use the API data.
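
The if/then rule in the note above could be sketched roughly as follows, in plain Python. The function name choose_weather, the (location_id, date) key, and the column names are hypothetical. Applying the overwrite per column, so that a missing station value falls back to the API value, is one possible reading of the rule and not something the note specifies.

```python
def choose_weather(api_rows, station_rows):
    """Pick the values to use when populating the X matrix.

    Both arguments map a (location_id, date) key to a dict of
    column name -> value. Station values overwrite the default (API)
    values wherever a station value exists; otherwise the API value is kept.
    """
    merged = {}
    for key in sorted(api_rows.keys() | station_rows.keys()):
        record = dict(api_rows.get(key, {}))   # start from the default product
        for column, value in station_rows.get(key, {}).items():
            if value is not None:              # station data exist: overwrite
                record[column] = value
        merged[key] = record
    return merged


# Made-up example values for two days at one location.
api = {
    (1, "2020-05-10"): {"t_max_c": 17.0, "precip_mm": 2.0},
    (1, "2020-05-11"): {"t_max_c": 18.0, "precip_mm": 0.0},
}
station = {
    (1, "2020-05-11"): {"t_max_c": 19.5, "precip_mm": None},
}
chosen = choose_weather(api, station)
```

Because both products stay in storage untouched, the choice can be changed later (for example, to prefer a different product) without re-retrieving anything.
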
@tnigon tnigon added the documentation Improvements or additions to documentation label May 11, 2020
@tnigon tnigon added this to To do in research_tools project via automation May 11, 2020
@tnigon tnigon moved this from To do to In progress in research_tools project May 11, 2020