-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DS Dict #34
Comments
Right now all dependencies are imported through sagerx.py and user_defined_macros...which should allow us to move things outside of dynamic dag pretty easily. Another consideration here using JSON files to store this data. JSON would allow us to define functions in it to pass to dataset_dicts if needed as one offs. There are pros and cons (I think mostly cons) but just an alternative https://stackoverflow.com/questions/51936785/store-python-function-in-json/51938459 |
Could also consider moving dynamic DAG into its own folder with settings files for each DS also stored in same folder. This way moving them out of dynamic DAG will be easy |
Sorry totally missed these comments earlier. Im still somewhat torn about the best way to handle this. The JSON alt makes sense but I do think the cons outway the pros if we are using |
Alternative consider making independent DAGs for each data source. Then using either functional or object paradigm make a generate tasks code block that can be placed as a single line of code to generate all tasks for each DAG. This will module and abstract the DAG functions, organize everything. Looking at the datasets we currently have there is benefit on a few (CMS, RxNorm) to adjust our process and grab historical files. So I feel like in the short future we will be pulling alot of these out of the dynamic DAG anyways, so my thought is how do we do that and still keep the code maintainable and able to share a common set of processes. This is pretty complex and might be above my software engineering skills at the moment. |
As example the Download_dataset should be a class in its own module. With subclasses and methods that can be adjusted from a dataset info class. Then a SQL class should be formed in its own module Then a DS class should call both of these as part of composition |
@Bridg109 / @yevgenybulochnik - does this issue represent the huge topic of re-structuring the dynamic DAG into like functional programming or a class system? What would be a better name for this issue than "DS Dict" to capture that? Feel free to rename it. |
See notes about RxNorm and NADAC on this PR when we tackle this issue: #158 |
Fixed by #215 |
Problem Statement
The DS (data source) dictionary is starting to get large. We had discussed moving this into the data source folders themselves. Not sure we want to do this at the moment but wanted to start a discussion.
Questions:
Criteria for Success
Move the large dictionary out of the dynamic dag. We need to pay close attention to what the dependencies are for each DS. Kent i think you already thought about this to some degree with name spacing the user defined macros
Additional Information
The text was updated successfully, but these errors were encountered: