
4. Update


Structure

The data update process is governed through Jupyter notebooks, which are currently undergoing a major refactoring for Python modularization.

  • The map updater updates the interactive incidence map. It is automated and set up to run daily. When a run finishes, the map data is automatically sent to a custom Mapbox style, based on a custom tileset (see the first sketch after this list). This means a new dataset is generated for the daily mapbox.html embed, and the panel links are updated in the Grafana backend. Furthermore, an automated push to the data repository is also made by @roeimbot.

  • The data updater updates the daily data. It is also automated and set up to run daily (see the second sketch after this list). This includes:

    • Daily case and vaccination data update in the InfluxDB backend. All of these changes are immediately reflected in the Grafana frontend.
    • Welcome dashboard update through the Grafana backend
    • New and active cases on a county level
    • Daily county level case and incidence maps

    Furthermore, an automated push to the data repository is also made by @roeimbot.

  • The formatter updates everything else. It is triggered manually but is set up to run at least weekly. All updates are configured to take place in the InfluxDB backend and are immediately reflected in the Grafana frontend. The Welcome dashboard update is processed through the Grafana backend. This stage does not include an automated data repository push, as its output is picked up by the daily updates the next day at the latest. The formatter data update includes (on top of the daily updates):

    • Week-ahead case forecast
    • Governmental measures
    • Global cases and events
    • Stocks, financial markets and exchange rates
    • Firms' measures
    • Firms' social media
    • News feed update
    • Social media update
    • Oxford government stringency indicator
    • Google mobility data by sector and county
    • Border crossings
    • Company registries
    • County-level mobility map
    • Real estate transactions and prices
    • County and city-level real estate map
    • Industry matrix
    • Severity matrix
    • Macroeconomic heatmap
    • GDP time series history and forecast
    • Data complexity measures
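
As a minimal illustration of the map updater's hand-off to Mapbox, here is a sketch using the mapbox Python SDK's Uploader service. The file name, tileset ID, and token handling are assumptions for illustration, not the updater's actual identifiers:

    # pip install mapbox
    from mapbox import Uploader

    # Assumes a secret (sk.*) access token with the uploads:write scope in
    # the MAPBOX_ACCESS_TOKEN environment variable.
    service = Uploader()

    # Hypothetical GeoJSON output of the map updater notebook.
    with open("incidence.geojson", "rb") as src:
        response = service.upload(src, "roeim-incidence")

    # 201 Created means Mapbox accepted the upload; the custom style built
    # on this tileset then serves the fresh data to the mapbox.html embed.
    assert response.status_code == 201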

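Similarly, the data updater's InfluxDB write could look roughly like the sketch below, assuming the 1.x influxdb Python client; the connection details, database, measurement, and field names are all illustrative:

    # pip install influxdb
    from influxdb import InfluxDBClient

    # Illustrative connection details; the actual backend will differ.
    client = InfluxDBClient(host="localhost", port=8086, database="roeim")

    # One point per county per day; tag by county so Grafana can group by it.
    points = [{
        "measurement": "daily_cases",
        "tags": {"county": "CJ"},
        "time": "2021-05-01T00:00:00Z",
        "fields": {"new_cases": 120, "vaccinations": 800},
    }]

    # Grafana reads straight from InfluxDB, so dashboards reflect new
    # points immediately after this call.
    client.write_points(points)
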
Data push

Throughout RoEIM, we use @roeimbot for automated data pushes to the data repository. We chose to update only the data subfolder of the project repository, since the rest of the code requires more curation from our side. This requires you to set up a "partial git sync" with only that particular subfolder. In technical terms, this is called a sparse-checkout. These are the steps for setting it up:

  • Create a subfolder as the home directory for your target GitHub repo

    mkdir incidence
    cd incidence

  • Initialize a repository here

    git init
    git remote add -f origin https://github.com/denesdata/roeim

  • Set up sparse-checkout and configure it for your data folder

    git config core.sparseCheckout true
    echo "your/data/folder/path/from/origin" >> .git/info/sparse-checkout

  • Pull! If you set it up correctly, this will populate only the data folder.

    git pull origin master

  • Then, any time you generate new data, just send it to the data folder through Jupyter and push it up to the GitHub data repository (a notebook-side sketch follows the commands below).

    git add --all
    git commit --all -m "automated incidence data update 2021-05-01"
    git push origin master
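
    For instance, a notebook cell along these lines could close out a daily run; the file path mirrors the sparse-checkout placeholder above, and the commit message format is just an example:

      import subprocess
      from datetime import date

      import pandas as pd

      # Hypothetical daily output; in practice this comes from the updater.
      df = pd.DataFrame({"county": ["CJ", "B"], "new_cases": [120, 450]})
      df.to_csv("your/data/folder/path/from/origin/incidence.csv", index=False)

      # Stage, commit, and push the refreshed data folder. check=True makes
      # a failed git command raise instead of passing silently.
      msg = f"automated incidence data update {date.today().isoformat()}"
      subprocess.run(["git", "add", "--all"], check=True)
      subprocess.run(["git", "commit", "--all", "-m", msg], check=True)
      subprocess.run(["git", "push", "origin", "master"], check=True)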

  • If you would like to avoid the GitHub authentication prompt on every push, it might help to set up a credential cache. (The last parameter is the timeout for the cache, in seconds.)

    git config --global credential.helper "cache --timeout=2592000"

🤓 AWESOME!

You're a real RoEIM GEEK now! You may, however, want to read further about the structure of the 👉 5. Data.
