# Run Analysis

Data sources are stored as arbitrarily deep directories in /data. A data source is a directory with an index file whose front matter (file type, cadence, and so on) describes how to access the data and when, or whether, to update it automatically.  
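
The index file's name and front-matter syntax are not pinned down here. As a minimal sketch, assuming flat `key: value` lines between two `---` delimiters (a common front-matter convention) and hypothetical keys `filetype` and `cadence`, the metadata could be read roughly like this:

```python
def parse_front_matter(text: str) -> dict:
    """Return the key/value pairs found between the first two '---' lines.

    Assumption: front matter is flat 'key: value' text; nested values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != '---':
        return {}  # no front matter block at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == '---':
            break  # closing delimiter ends the block
        key, sep, value = line.partition(':')
        if sep:  # skip lines that are not 'key: value'
            meta[key.strip()] = value.strip()
    return meta


# Hypothetical index file content, for illustration only.
example = "---\nfiletype: csv\ncadence: static\n---\nNotes about this source.\n"
print(parse_front_matter(example))
```

Splitting on the first colon only keeps values such as URLs intact. If the real index files use full YAML, a proper YAML parser would be the safer choice.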

Some data sources have a static cadence, meaning they won't be automatically updated by this notebook.  

Others specify an update cadence, a last-updated time, and an update method, which may include things like an API key.  
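
One way to decide whether a source is due for a refresh is sketched below. The field names `cadence` and `last_updated`, the cadence values other than `static`, and the ISO 8601 timestamp format are all assumptions made for illustration; the real index files may differ.

```python
from datetime import datetime, timedelta

# Hypothetical cadence names; the real index files may use different values.
CADENCES = {
    'hourly': timedelta(hours=1),
    'daily': timedelta(days=1),
    'weekly': timedelta(weeks=1),
}


def is_due(meta: dict, now: datetime) -> bool:
    """Return True if the source described by `meta` should be refreshed at `now`."""
    cadence = meta.get('cadence', 'static')
    if cadence == 'static':
        return False  # static sources are never updated automatically
    last = meta.get('last_updated')
    if last is None:
        return True  # never fetched, so fetch now
    # An unknown cadence name raises KeyError rather than being silently skipped.
    return now - datetime.fromisoformat(last) >= CADENCES[cadence]


now = datetime(2024, 1, 10, 12, 0)
print(is_due({'cadence': 'static'}, now))
print(is_due({'cadence': 'daily', 'last_updated': '2024-01-08T12:00:00'}, now))
print(is_due({'cadence': 'daily', 'last_updated': '2024-01-10T06:00:00'}, now))
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the check easy to test.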

An analysis, likewise, is an arbitrarily deep directory within /analysis that contains an index file with front matter such as title and dependencies.  

Dependencies are paths to data sources or other analyses. Whenever any one of an analysis's dependencies was modified more recently than the analysis itself, the analysis is stale and needs to be run again.  
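
The staleness rule can be written as a small predicate. This is a sketch under the assumption that modification times are plain numbers such as Unix timestamps; the function name `is_stale` is hypothetical.

```python
from typing import Iterable


def is_stale(analysis_mtime: float, dependency_mtimes: Iterable[float]) -> bool:
    """Return True if any dependency was modified after the analysis."""
    return any(dep > analysis_mtime for dep in dependency_mtimes)


print(is_stale(100.0, [50.0, 120.0]))  # one dependency is newer
print(is_stale(100.0, [50.0, 100.0]))  # nothing is strictly newer
print(is_stale(100.0, []))             # no dependencies at all
```

With a strict comparison, a dependency that shares the analysis's exact timestamp does not trigger a re-run. Whether ties should count as stale is a design choice the text above leaves open.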

In [1]:
#### Make sure all required packages are installed and imported

import importlib, subprocess, sys
from typing import Optional

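def _ensure(pkg_name: str, import_name: Optional[str] = None):
    """Import a package, installing it with pip first if it is missing."""
    name = import_name or pkg_name
    try:
        module = importlib.import_module(name)
    except ModuleNotFoundError:
        # Install, then refresh the import system's caches so the new package is found.
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', pkg_name])
        importlib.invalidate_caches()
        module = importlib.import_module(name)
    # Expose the module under its import name, as a plain `import` would.
    globals()[name] = module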

_ensure('pandas')

print('All dependencies ready.\n')

All dependencies ready.



## Update Data Sources

This cell needs to find all of the /data index files, check whether each data source needs to be updated, and then perform whatever updates are appropriate.  
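
The discovery step could look something like the sketch below, which uses `pathlib.Path.rglob` to search at any depth. The index file name `index.md` and the helper name `find_index_files` are assumptions; the demo builds a throwaway directory tree rather than touching ./data.

```python
import tempfile
from pathlib import Path
from typing import List


def find_index_files(root: str, index_name: str = 'index.md') -> List[Path]:
    """Return every file named `index_name` under `root`, at any depth, sorted."""
    return sorted(Path(root).rglob(index_name))


# Build a small temporary tree with two hypothetical data sources.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'weather' / 'daily').mkdir(parents=True)
    (Path(tmp) / 'weather' / 'daily' / 'index.md').write_text('---\ncadence: daily\n---\n')
    (Path(tmp) / 'census').mkdir()
    (Path(tmp) / 'census' / 'index.md').write_text('---\ncadence: static\n---\n')
    found = [p.relative_to(tmp).as_posix() for p in find_index_files(tmp)]

print(found)
```

Each path found this way identifies one data source (the file's parent directory), whose front matter can then be checked against its cadence.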



In [None]:
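def updateData(path):
    """Update every data source under `path` that is due for a refresh."""
    # Not implemented yet: find index files, check each cadence, run updates.
    pass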

updateData('./data')

## Update Analyses

This cell needs to do the same type of scan across all the analyses in /analysis. It needs to iterate over the analyses and check the last-modified time of each one's dependencies. If any dependency was modified more recently than the analysis, then the analysis needs to be run again. The last-modified time of an analysis is the most recent file modification time anywhere in its directory, because the analysis directory will contain some arbitrary number of output files.  
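
A minimal sketch of the "most recent file modification time" rule follows. The helper name `latest_mtime` and the choice to return `0.0` for a directory with no files are assumptions. The demo sets explicit timestamps with `os.utime` so the result does not depend on the clock.

```python
import os
import tempfile
from pathlib import Path


def latest_mtime(directory: str) -> float:
    """Return the newest modification time among all files under `directory`.

    Returns 0.0 when there are no files, so an analysis that has never
    produced output counts as older than any dependency.
    """
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob('*') if p.is_file()]
    return max(mtimes, default=0.0)


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / 'index.md'
    new = Path(tmp) / 'output' / 'chart.png'
    new.parent.mkdir()
    old.write_text('---\ntitle: demo\n---\n')
    new.write_bytes(b'')
    os.utime(old, (1000, 1000))  # (access time, modification time)
    os.utime(new, (2000, 2000))
    newest = latest_mtime(tmp)
    empty_dir = Path(tmp) / 'empty'
    empty_dir.mkdir()
    newest_empty = latest_mtime(str(empty_dir))

print(newest, newest_empty)
```

Only regular files are considered; directory timestamps are ignored because they change when entries are added or removed, not when file contents change.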

Because some analyses will list other analyses as dependencies, this check across all of the analyses needs to keep repeating until a full pass finds nothing to do, up to some reasonable maximum number of passes so that it cannot loop indefinitely.  
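
The repeat-until-settled loop can be illustrated with an in-memory simulation. Everything here is an assumption made for the sketch: the name `run_until_settled`, the dictionaries standing in for the file system, and "running" an analysis by bumping its timestamp. A real version would execute the analysis and read times from disk.

```python
from typing import Dict, List


def run_until_settled(deps: Dict[str, List[str]],
                      mtimes: Dict[str, float],
                      max_passes: int = 10) -> List[str]:
    """Re-run stale analyses pass by pass until a pass finds nothing to do.

    `deps` maps each analysis to its dependency names; `mtimes` maps every
    data source and analysis to a last-modified time and is updated in place.
    """
    def stale_now() -> List[str]:
        return [a for a, ds in deps.items() if any(mtimes[d] > mtimes[a] for d in ds)]

    ran: List[str] = []
    clock = max(mtimes.values(), default=0.0)
    for _ in range(max_passes):
        stale = stale_now()
        if not stale:
            return ran  # settled: nothing left to run
        for name in stale:
            clock += 1            # simulate running: the analysis is now newest
            mtimes[name] = clock
            ran.append(name)
    if stale_now():
        raise RuntimeError(f'analyses still stale after {max_passes} passes')
    return ran


# 'summary' depends on 'clean', which depends on the data source 'raw'.
deps = {'summary': ['clean'], 'clean': ['raw']}
mtimes = {'raw': 5.0, 'clean': 3.0, 'summary': 4.0}
order = run_until_settled(deps, mtimes)
print(order)
```

In this example the first pass re-runs only `clean`, which in turn makes `summary` stale, so a second pass is needed. A circular dependency would never settle, which is what the pass limit guards against.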

In [None]:
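def updateAnalyses(path):
    """Re-run every analysis under `path` that is older than one of its dependencies."""
    # Not implemented yet: scan analyses, compare modification times, repeat until settled.
    pass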

updateAnalyses('./analysis')