[epic] OpenSpending new stewardship and migration to new setup (2020) #1479

Open
7 of 35 tasks
rufuspollock opened this issue Jul 3, 2020 · 7 comments

Comments

@rufuspollock
Member

rufuspollock commented Jul 3, 2020

It's time to evolve OpenSpending! Specifically:

  1. OpenSpending stewardship is moving to Datopian
  2. Migrate OpenSpending platform into DataHub.io (key APIs will be maintained and there may be a dedicated catalog)

Key goals

  • Migrate OpenSpending onto DataHub.io so as to provide a long-term sustainable home for this data and its users
  • Support GIFT and other stakeholders in continuing to open up fiscal data.

Background:

  • OpenSpending is a fiscal data platform
  • It was started (as Where Does My Money Go) at OKF in ~2009.
  • It has evolved a lot over the years. First, from Where Does My Money Go to OpenSpending. Then in 2014/2015 to a new microservice structure (designed by Rufus and Adam).
  • The last few years have seen less core development due to reduced funding.
  • Today, there are a variety of users including the Global Initiative for Fiscal Transparency (GIFT), various governments (via GIFT) and others.

Acceptance

  • OpenSpending stewarded by Datopian and they are responsible for infrastructure.
  • OpenSpending datasets stored following pattern: f11s+git+💨 (frictionless+github+s3)
    • Frictionless: dataset structure
    • GitHub: for storing metadata and as the edit/read interface
    • Cloud: for file storage
  • OpenSpending showcase part of DataHub.io
  • Dedicated API (for GIFT and other API consumers)

NB:

  • NOT migrating visualizations

Specifics

OpenSpending users

  • Old urls for openspending.org still "work" i.e. either still resolve or redirect to the right place
  • API still works for datasets where API matters
    • Works: either still returns as before or responds with info on the upgrade path
  • All useful/active data migrated (or archived)
    • Data has been filtered
    • Data is migrated to FGC (f11s+git+cloud)
  • Datasets show up on datahub.io in xxx TODO organization (e.g. openspending)
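
A minimal sketch of the "old urls still work" rule above, in Python. Everything named here is an assumption for illustration: the `MIGRATED` and `API_KEPT` tables, the `handle_legacy` function, the `/api/3/cubes/<dataset>/...` path shape and the datahub.io target URLs (the organization name is still a TODO in this issue). Only the `/viewer/<dataset>` path shape appears elsewhere in this thread.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    status: int               # HTTP status code to return
    location: Optional[str]   # redirect target, if any
    body: Optional[str]       # short message for the caller, if any


# Hypothetical mapping from old dataset ids to their new home.
MIGRATED = {
    "example-budget": "https://datahub.io/openspending/example-budget",
    "old-budget": "https://datahub.io/openspending/old-budget",
}

# Hypothetical set of datasets "where API matters" and the API stays up.
API_KEPT = {"example-budget"}


def handle_legacy(path: str) -> Response:
    """Decide what an old openspending.org path should do after migration."""
    parts = [p for p in path.split("/") if p]

    # Assumed API path shape: /api/3/cubes/<dataset>/<endpoint>
    if parts[:3] == ["api", "3", "cubes"] and len(parts) > 3:
        dataset = parts[3]
        if dataset in API_KEPT:
            return Response(200, None, "served as before")
        if dataset in MIGRATED:
            # API retired for this dataset: point at the upgrade path.
            return Response(410, None, "API retired, data moved to " + MIGRATED[dataset])
        return Response(404, None, "unknown dataset")

    # Old viewer pages redirect to the dataset's new location.
    if len(parts) == 2 and parts[0] == "viewer" and parts[1] in MIGRATED:
        return Response(301, MIGRATED[parts[1]], None)

    return Response(404, None, "unknown path")
```

Returning 410 with a pointer to the new location is one possible reading of "info on upgrade path"; a JSON body with the same information would serve API clients equally well.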

GIFT

  • Can rely on the API and supervise and/or approve changes made to the hosting system.

NB: there is separate further work for the import system

OKF

  • Current OpenSpending infrastructure on k8s can be shut down so it is no longer incurring significant monthly costs

Secondary

  • The community site has been migrated community.openspending.org ...
  • (??) Clean archive copy (write to html and then to e.g. s3 which we can call old-2020.openspending.org) not sure this is possible / worth doing since everything is JS etc. More valuable for old pre 2016 site 😉

NOT needed

  • Maintaining the visualizations

Tasks

  • Get access and audit data
  • Plan of migration and preliminaries
  • Actual migration

Overview and Status of migration

graph TD

start[Start]

start --> comms[Make announcement]
comms --> discuss[Community Discussions]
discuss --> planofmig
start --> getaccess[Take on DNS, get access]
getaccess --> audit[Audit data]
audit --> planofmig[Plan of Migration]
planofmig --> mig[Actual Migration]
mig --> theend[End]

classDef done fill:#21bf73,stroke:#333,stroke-width:1px;
classDef nearlydone fill:lightgreen,stroke:#333,stroke-width:1px;
classDef inprogress fill:orange,stroke:#333,stroke-width:1px;
classDef next fill:lightblue,stroke:#333,stroke-width:1px;

class getaccess,start,comms done;
class audit nearlydone;
class discuss,planofmig inprogress;
class mig next;

graph TD

subgraph Key
  done[Done]
  nearlydone[Nearly Done]
  inprogress[In Progress]
  next[Next Up]
end

classDef done fill:#21bf73,stroke:#333,stroke-width:1px;
classDef nearlydone fill:lightgreen,stroke:#333,stroke-width:1px;
classDef inprogress fill:orange,stroke:#333,stroke-width:1px;
classDef next fill:lightblue,stroke:#333,stroke-width:1px;

class done done;
class nearlydone nearlydone;
class inprogress inprogress;
class next next;

Get access and audit data

Notes

  • Most of the OS data is not used (we think).
  • OS API is used by some govs e.g. Mexican gov (and maybe others).

Plan of migration and preliminaries

  • Migration to github org ...
  • Plan out layout
  • Data in cloud via giftless
    • Set up giftless
  • Automated UAT tests
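
One way the automated UAT tests could look, as a rough Python sketch: compare what the old and the new setup return for the same paths. The `compare_endpoints` function, the `Fetch` signature and the `ignore_keys` default are assumptions for illustration; the issue does not specify how the tests will be built.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A fetcher takes a path and returns (HTTP status, parsed JSON body).
# In a real run it would wrap an HTTP client; here it is left abstract.
Fetch = Callable[[str], Tuple[int, Dict]]


def compare_endpoints(
    paths: Iterable[str],
    fetch_old: Fetch,
    fetch_new: Fetch,
    ignore_keys: Tuple[str, ...] = ("timestamp",),
) -> List[str]:
    """Return a list of human-readable failures (empty list means pass)."""

    def strip(body: Dict) -> Dict:
        # Drop volatile top-level keys that are expected to differ.
        return {k: v for k, v in body.items() if k not in ignore_keys}

    failures = []
    for path in paths:
        old_status, old_body = fetch_old(path)
        new_status, new_body = fetch_new(path)
        if old_status != new_status:
            failures.append(f"{path}: status {old_status} != {new_status}")
        elif strip(old_body) != strip(new_body):
            failures.append(f"{path}: body differs")
    return failures
```

Passing the fetchers in as arguments keeps the comparison logic testable without network access, and lets the same check run against a staging deployment before any DNS switch.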

Actual migration

  • Migrate datasets
    • Script
    • Run
    • ...
  • Spin up bespoke API service for those who need it
  • Show datasets on datahub.io
  • Migrate the community site into tech.datopian.com/openspending/ or openspending.datopian.com

Analysis

What will a dataset look like in storage when migration is done

  • each OpenSpending dataset has a repo on github in an org related to OpenSpending
    • TODO: is this separate from org for app code or not?
  • each dataset is a Frictionless Data Package with a datapackage.json: a Tabular Data Package and possibly (esp. GIFT ones) also a Fiscal Data Package
    • Data if small could be stored on github
    • For larger data we store to cloud
      • Either S3 or GCP (prob s3)
      • If on cloud we store with giftless 🎁 with proper lfs info
        • do we want datahub next setup for this - yes, probably ...
    • README (?)
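
The layout above can be sketched in Python as follows. This is an illustration under stated assumptions, not the migration script: the function names, the `LFS_THRESHOLD` cut-off and the example field list are hypothetical. The descriptor uses the Tabular Data Package profile names from the Frictionless specs, and the pointer text follows the Git LFS pointer file format, which is what a giftless-backed repo would hold in place of a large file. Fiscal Data Package extras are not modelled here.

```python
import hashlib
import json

# Hypothetical cut-off: files below this stay in git, larger ones go to cloud.
LFS_THRESHOLD = 1_000_000  # bytes


def build_descriptor(name, title, csv_path, fields):
    """Build a minimal Tabular Data Package descriptor (datapackage.json)."""
    return {
        "name": name,
        "title": title,
        "profile": "tabular-data-package",
        "resources": [
            {
                "name": "data",
                "path": csv_path,
                "profile": "tabular-data-resource",
                "schema": {"fields": fields},
            }
        ],
    }


def lfs_pointer(data: bytes) -> str:
    """Return the Git LFS pointer text that replaces a large file in the repo."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


def plan_storage(data: bytes, threshold: int = LFS_THRESHOLD) -> str:
    """Decide where the bytes live: directly in git, or in cloud via LFS."""
    return "git" if len(data) < threshold else "lfs"


if __name__ == "__main__":
    fields = [{"name": "year", "type": "integer"}, {"name": "amount", "type": "number"}]
    descriptor = build_descriptor("example-budget", "Example Budget", "data/budget.csv", fields)
    print(json.dumps(descriptor, indent=2))
    print(lfs_pointer(b"year,amount\n2020,100\n"))
```

With this split, the repo on GitHub always contains the metadata and either the data itself or a small pointer, so readers and editors work against git while the bytes sit in S3 or GCP.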

Questions from Sebastien

"Identify data is test / obsolete / irrelevant"

  • Beyond duplicates, empty tables and test data, what would be "obsolete" or "irrelevant"?
  • Hypotheses
    • A: We want to keep everything pretty much
    • B: we want to keep the minimum
      • Beyond what was mentioned, I think everything will be good except whatever Lorena lists as not being important. I think Lorena and Rufus would know best how to do the cleanup once I do a "pre-selection". I would assume that data that has been sitting for a long time without any updates might not be as useful but there might be some "historic" data that doesn't change over time that's still relevant today (?).
    • C: something in between
  • Rufus: we want to get rid of as much as possible 😄 so that we have high quality material. "Fewer, better"
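
The "pre-selection" discussed above could be sketched like this in Python. The `DatasetInfo` fields, the `preselect` function, the three-year staleness window and the name-based test heuristic are all assumptions for illustration; the actual criteria are still being agreed in this thread. Stale datasets are flagged for review rather than dropped, since historic data may still be relevant.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetInfo:
    name: str
    row_count: int
    content_hash: str          # hash of the data, used to spot duplicates
    last_updated: date
    keep_listed: bool = False  # hypothetical flag: a stakeholder asked to keep it


def preselect(datasets, today, stale_after_days=3 * 365):
    """Label each dataset as keep, drop (with reason) or review."""
    seen_hashes = set()
    report = {}
    # Newest first, so the most recent copy of a duplicate is the one kept.
    for ds in sorted(datasets, key=lambda d: d.last_updated, reverse=True):
        # Split the name into words so "latest" is not mistaken for "test".
        tokens = re.split(r"[-_\s]+", ds.name.lower())
        if ds.keep_listed:
            label = "keep"
        elif ds.row_count == 0:
            label = "drop: empty"
        elif "test" in tokens:
            label = "drop: test"
        elif ds.content_hash in seen_hashes:
            label = "drop: duplicate"
        elif (today - ds.last_updated).days > stale_after_days:
            label = "review: stale"
        else:
            label = "keep"
        if not label.startswith("drop"):
            seen_hashes.add(ds.content_hash)
        report[ds.name] = label
    return report
```

The output is only a starting list for the manual cleanup: the "review" and "keep" entries would still go to the people who know the data before anything is archived.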

What is DataHub next?

  • Read https://github.com/datopian/datahub-next. Come up with an understanding of a) how do we store a dataset (or data file) i.e. metadata and bytes [b) how do we retrieve c) how do we showcase [d) how do we upload]]

What are we migrating? What, from where and to where on GitHub?

  • We are migrating all data (we plan to keep)
  • We are migrating APIs
  • We are migrating showcases (to datahub.io, and, for GIFT possibly their own site)
@rufuspollock rufuspollock changed the title [epic] OpenSpending migration to new setup and stewardship (2020) [epic] OpenSpending new stewardship and migration to new setup (2020) Jul 3, 2020
@rufuspollock rufuspollock self-assigned this Jul 9, 2020
@sglavoie sglavoie added this to In Progress in Cleanup July 2020 Jul 17, 2020
@jbothma

jbothma commented Jul 20, 2020

It's time to evolve OpenSpending! Specifically:

  • OpenSpending stewardship is moving to Datopian
  • Migrate OpenSpending platform into DataHub.io (key APIs will be maintained and there may be a dedicated catalog)

Really happy to hear you've found a future home!

API still works for datasets where API matters

  • Works: either still returns as before or with info on upgrade path

Heads-up vulekamali.gov.za very much depends on the fiscal (babbage) API - particularly the model and aggregate endpoints. Possibly also facts and members.

Please give notice to jd@openup.org.za, info@openup.org.za and info@vulekamali.gov.za if this API will cease to work and requires changes in the implementation.

NOT migrating visualizations

I would love to hear more about this decision. We don't currently use these visualisations for vulekamali - they were never quite what we wanted or as user-friendly as we wanted. It is also quite easy for someone who doesn't quite know what they're doing to do an aggregate that is nonsensical, without realising it, and misinterpret the data.

I'm wondering if it's simply a cost/scoping issue, if there is a better alternative, or why the visualisations feature is being cut.

All the best for the migration! This is a tool that saved a massive amount of effort when building vulekamali.gov.za

@jbothma

jbothma commented Jul 20, 2020

We currently have quite detailed documentation for adding datasets using OS Packager which we then use in the API. This is because the people in our national treasury who would be preparing and uploading this data are not confident enough with yaml, git, and so on, to use the declarative and automated tooling.

Will OS Packager still be available? Will a replacement be available with equivalent functionality?

@rufuspollock
Member Author

rufuspollock commented Jul 21, 2020

@jbothma

NOT migrating visualizations

I would love to hear more about this decision. We don't currently use these visualisations for vulekamali - they were never quite what we wanted or as user-friendly as we wanted. It is also quite easy for someone who doesn't quite know what they're doing to do an aggregate that is nonsensical, without realising it, and misinterpret the data.

I'm wondering if it's simply a cost/scoping issue, if there is a better alternative, or why the visualisations feature is being cut.

Like you said, we have found people did not use this that much, and we want to converge towards the Data Explorer which we use in DataHub: http://tech.datopian.com/data-explorer/

We currently have quite detailed documentation for adding datasets using OS Packager which we then use in the API. This is because the people in our national treasury who would be preparing and uploading this data are not confident enough with yaml, git, and so on, to use the declarative and automated tooling.

Will OS Packager still be available? Will a replacement be available with equivalent functionality?

@jbothma our default plan is to replace it, probably with a general UI tool for adding data to DataHub. We get that we want an option for relatively non-technical users.

PS: @jbothma we now have a new chat channel on discord https://discord.gg/xvS9hwj

@rufuspollock rufuspollock moved this from Someday Maybe to Blocked (Waiting for) in Migration to New Setup (2020) Jul 23, 2020
@rufuspollock rufuspollock moved this from Blocked (Waiting for) to In progress in Migration to New Setup (2020) Jul 23, 2020
@rufuspollock
Member Author

@jbothma as per email thread (and sorry to miss this in July)

Heads-up vulekamali.gov.za very much depends on the fiscal (babbage) API - particularly the model and aggregate endpoints. Possibly also facts and members.

Please give notice to jd@openup.org.za, info@openup.org.za and info@vulekamali.gov.za if this API will cease to work and requires changes in the implementation.

We're generally planning to deprecate the API, and to do so quite soon (no one other than you has contacted us or responded so far). However, as stated in the original issue:

Spin up bespoke API service for those who need it

In your case it sounds like you would need that, at least for some period of time, so we'll be looking to do that. /cc @sglavoie

@rufuspollock
Member Author

UPDATE: we have shut down OpenSpending uploading for the time being as we prepare the migration. We plan to re-enable this (on new setup) in January.

@schlos

schlos commented Apr 14, 2021

Hi @rufuspollock ,
What will happen with offenerhaushalt.de? It also uses the OpenSpending API (https://github.com/okfde/offenerhaushalt.de) and I've been working on a fork for a Croatian open spending project.

Moreover, the OpenSpending Project Team from Open Knowledge International (OKI), in cooperation with GIFT (Global Initiative for Fiscal Transparency) and the BOOST initiative of the World Bank, worked on visualizations for the Croatian state budget, which are available at: https://openspending.org/viewer/667df60aa07c34260eae9b55b2778712:croatia-budget-spending?lang=en

  • what will happen to all those already existing visualizations on OpenSpending.org and access to these data via API?

Thanks,
Miroslav

@rufuspollock
Member Author

@schlos can you have a catch-up with @sglavoie, who is leading the migration? That way we can understand what you are doing at the moment and how that could work going forward.
