This was written in 2019 to address a structural issue in the codebase for my master's thesis. It was not intended to ever be seen or maintained by others... but I have no other nonproprietary code samples. It's not great code; it's lacking:
- type checking
- proper docstrings
- variable name expansion (in my workflow, this happens in the transition from concept to prototype)
- unit testing for all classes
- CI with integration testing
Makefile-like dependency checking in Python.
The problem boils down to "If file A depends on file B, then check whether file B has been modified more recently than file A." Obviously, this can get complex with large trees of dependencies that might have interlinking between branches.
Nodes are manually created, and are each characterized by a manually set mtime.
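A minimal sketch of the check described above, not the actual thesis code. The `Node` class, its fields, and `is_stale` are hypothetical names chosen for illustration; mtimes are plain numbers set by hand, as in the description. A node counts as stale if any dependency is newer than it, or if any dependency is itself stale (the way make would rebuild an intermediate target first). Cycles are not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node: a name, a manually set mtime, and its dependencies."""
    name: str
    mtime: float
    deps: List["Node"] = field(default_factory=list)


def is_stale(node: Node) -> bool:
    """True if any dependency is newer than the node or is itself stale."""
    return any(d.mtime > node.mtime or is_stale(d) for d in node.deps)


# src was modified (t=200) after intermediate was built (t=150),
# so intermediate is stale, and final is stale because it depends on it.
src = Node("src.csv", mtime=200.0)
mid = Node("intermediate.db", mtime=150.0, deps=[src])
final = Node("final.db", mtime=300.0, deps=[mid])

print(is_stale(src))    # False: no dependencies
print(is_stale(mid))    # True: src is newer
print(is_stale(final))  # True: mid is stale
```

Because a shared dependency can be reached through several branches, a larger tree would probably want to cache results per node rather than recompute them on every visit.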
Superseded
Use case:
- Given a bunch of CSVs in each of many VID directories
- eg1: .../src/[VID]/*.csv
- eg2: .../src/[VID]-*.csv
- For each VID filespec, we want a separate .../db/VID.pickle that depends on the matching CSVs
Simple solution:
- create every parent db/VID node.
- for each such parent, create a regex matching the path of all dependencies
- create a dependent regex node.
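The simple solution could be sketched roughly as follows. Everything here is an assumption for illustration: the VID list, the example paths (given relative to an unspecified base directory), and the `dependency_regex` helper. Instead of real node objects, the "tree" is just a dict from each parent path to the source paths its regex matches.

```python
import re

# Hypothetical inputs; in practice these would come from a directory scan.
vids = ["v001", "v002"]
paths = [
    "src/v001/a.csv",
    "src/v001/b.csv",
    "src/v002-trip.csv",
    "src/v003/a.csv",      # belongs to a VID we did not ask for
    "src/v001/notes.txt",  # not a CSV
]


def dependency_regex(vid: str) -> re.Pattern:
    """Regex matching both layouts: src/VID/*.csv and src/VID-*.csv."""
    return re.compile(r"src/" + re.escape(vid) + r"(?:/[^/]*|-[^/]*)\.csv")


# One parent per VID, each mapped to the paths matched by its regex.
tree = {
    f"db/{vid}.pickle": [p for p in paths if dependency_regex(vid).fullmatch(p)]
    for vid in vids
}

for parent, deps in tree.items():
    print(parent, "<-", deps)
```

Using `fullmatch` together with the explicit `/` or `-` after the VID keeps a VID such as `v00` from accidentally matching the files of `v001`.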
TBD Hard solution:
- define a search that extracts the VID from the parent node
- use a subexpression to create a dependent node for each match.
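Since the hard solution is still TBD, the following is only one way it might look. The pattern `db/(\w+)\.pickle`, the template `src/\1/*.csv`, and the function name `dependent_spec` are all hypothetical. The idea is to search the parent path for the VID and substitute it into a dependency template through the `\1` backreference.

```python
import re
from typing import Optional

# Hypothetical search: capture the VID from a parent path like db/VID.pickle.
parent_pattern = re.compile(r"db/(\w+)\.pickle")

# Hypothetical template: \1 is replaced by the captured VID.
dep_template = r"src/\1/*.csv"


def dependent_spec(parent_path: str) -> Optional[str]:
    """Return the dependency filespec for a parent path, or None if no VID is found."""
    match = parent_pattern.search(parent_path)
    if match is None:
        return None
    # Match.expand performs the backreference substitution on the template.
    return match.expand(dep_template)


print(dependent_spec("db/v001.pickle"))  # src/v001/*.csv
print(dependent_spec("db/readme.txt"))   # None
```

The returned filespec is still a glob, so a real implementation would need a further step that expands it into one dependent node per matching file.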
NAME
/      -> (basedir)/db/final.db
/A     -> (basedir)/db/intermediate_a.db
/A/AA  -> (basedir)/src1/[a]/*.csv
/A/AB  -> (basedir)/src2/[glob_a].csv [WATCHES MANY FILES]
/B     -> (basedir)/db/intermediate_b.db
/C     -> (basedir)/db/intermediate_c.db
/C/ABA -> special case? [REFERENCE TO EXISTING NODE!]
med_tree_literal = {'/': [{'a': 'c*'}, {'b': None}]}
r = dnode('root', 'db/final.db')
a = dnode('A', 'db/intermediate_a.db', parent=r)
aa = dnode('aa', 'db/intermediate_a.db', parent=a)
This is represented as:
'db/final.db'
'src1/[a]/*.csv'db/final.dbdb/intermediate_a.db'
'db/final.dbdb/intermediate_a.db
- Tree container holding nodes.
- Root node stores tree-global info --> bad; special node.
Choices:
- Static basedir. Inelegant, may have multiple trees.
- Specify absolute path when creating every node. Annoying.
- Root node remembers basedir, subsequent paths are relative. Bad: Every node needs to know root; can't create bottom-up.
- Tree class that holds nodes. Seems unnecessary.
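To make the trade-off concrete, here is a rough sketch of the last choice, a tree class that owns the base directory. `Node`, `Tree`, `resolve`, `walk`, and the base directory `/tmp/thesis` are hypothetical and are not the `dnode` class used above. Nodes only store relative paths, so they can be created bottom-up before any tree or root exists.

```python
from pathlib import PurePosixPath
from typing import Iterator, List


class Node:
    """Hypothetical node holding only a name and a path relative to the base directory."""

    def __init__(self, name: str, relpath: str) -> None:
        self.name = name
        self.relpath = relpath
        self.children: List["Node"] = []

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


class Tree:
    """Hypothetical container: the only place that knows the base directory."""

    def __init__(self, basedir: str, root: Node) -> None:
        self.basedir = PurePosixPath(basedir)
        self.root = root

    def resolve(self, node: Node) -> PurePosixPath:
        """Absolute path of a node, computed on demand from the tree's basedir."""
        return self.basedir / node.relpath

    def walk(self) -> Iterator[Node]:
        """Yield every node depth-first, parents before children."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children))


# Built bottom-up: leaves first, root and tree last.
aa = Node("aa", "src1/[a]/*.csv")
a = Node("A", "db/intermediate_a.db")
a.add(aa)
root = Node("root", "db/final.db")
root.add(a)
tree = Tree("/tmp/thesis", root)

resolved = [str(tree.resolve(n)) for n in tree.walk()]
print(resolved)
```

This avoids both the special root node and the static basedir, at the cost of an extra class, which is the "seems unnecessary" objection noted in the list.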
VID   -> text description of filespec glob
[VID] -> expanded list
ALLCDBs
cdb-([VID]).db --> one per VID, each depending on db-\1.db ../\1/.csv
ALLTDBs
tdb-([VID]).db tripdb ../trips_and_charges/.csv cdb-\1.db cdb-\1.db
tiny clean ALLTDBS ALLCDBS