The worst data management ever #1508

Open
avatorl opened this issue Mar 25, 2020 · 12 comments
Comments

@avatorl avatorl commented Mar 25, 2020

Dear @CSSEGISandData,

you're the only team that provides consolidated data fast enough (WHO has a 2-day delay). But if you need help with data management, please feel free to ask the community for help! I'm sure there are a lot of professionals here who can suggest better data management approaches.

Because renaming 'South Korea' to 'Korea, South' and the 'Long' field to 'Long_', changing the CSV file structure without any notification, and then doing the same with the March 22 file today is just insane. Honestly, it's the worst data management ever... But I'm sure all together we can improve this process and you'll be the best COVID-19 data provider in the world...

Kind regards,
Andrzej

My work https://avatorl.org/covid-19/ depends on you. In any case, thank you for your hard work!
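
One way a consumer might absorb renames like the two mentioned above is an alias table applied while reading the CSV. The following is a minimal sketch, not the maintainers' or any commenter's method. Only the renames 'Long' to 'Long_' and 'South Korea' to 'Korea, South' come from this thread; the function name `normalize_rows`, the column name `Country`, and the choice of which spelling is canonical are assumptions made for illustration.

```python
import csv
import io

# Alias tables map a renamed spelling back to the spelling the application
# expects. Only these two renames are taken from the thread; treating the
# older spelling as canonical is this sketch's assumption.
COLUMN_ALIASES = {"Long_": "Long"}
COUNTRY_ALIASES = {"Korea, South": "South Korea"}


def normalize_rows(csv_text, country_column="Country"):
    """Parse CSV text and return rows with known renames undone.

    `country_column` is a hypothetical column name; pass whatever the
    real file uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        # Rename columns that are listed in the alias table.
        row = {COLUMN_ALIASES.get(key, key): value for key, value in raw.items()}
        # Rename country values that are listed in the alias table.
        if country_column in row:
            name = row[country_column]
            row[country_column] = COUNTRY_ALIASES.get(name, name)
        rows.append(row)
    return rows
```

With this in place, a file using the old names and a file using the new names produce the same rows, so a future rename only needs a new entry in one of the two tables.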

@amnonbc amnonbc commented Mar 25, 2020

Every time the data schema is changed, every application that relies on this data gets broken.
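
A dependent application can at least fail loudly, instead of silently producing wrong numbers, by checking the CSV header before parsing. This is a minimal sketch under assumptions: `SchemaError`, `check_header`, and the column names in the usage note are hypothetical and not part of this repository.

```python
import csv
import io


class SchemaError(ValueError):
    """Raised when a CSV header is missing columns the application needs."""


def check_header(csv_text, required):
    """Return the header row, or raise SchemaError naming the missing columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    missing = sorted(set(required) - set(header))
    if missing:
        raise SchemaError(f"missing columns: {missing}; got header: {header}")
    return header
```

For example, calling `check_header(text, ["Lat", "Long"])` on a file whose header now says `Long_` raises `SchemaError` with both the missing name and the actual header in the message, which makes the rename easy to spot.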

@oltdaniel oltdaniel commented Mar 25, 2020

@avatorl @amnonbc I try to keep a consistent structure by converting the data on each update. It is available at https://c.oltdaniel.at . But yes, I need to keep an eye on the data changes each time, and it is really annoying.

The new weird line I found is a7f6c19#diff-f5c0b4fda7af2ad33c80e431ca1030adR240 and now I need to write a special case for it.

@XaXaXa12345 XaXaXa12345 commented Mar 25, 2020

The data from this project is simply wrong.
Italy, France, Spain, and Switzerland don't match official figures.
It seems like these guys are feeding fake data to the public.
I've switched my projects to www.worldometers.info/coronavirus/
It's much more reliable and trustworthy than junk that is provided by this project.

@cortical-iv cortical-iv commented Mar 25, 2020

After the breaking changes introduced here, and the lack of response, communication, and professionalism, I switched over to this:
https://covidtracking.com/

For the data interface:
https://covidtracking.com/api/

It is maintained by real developers.

@idkmsu idkmsu commented Mar 25, 2020

Why are you depending on someone else's data for your application? And then why are you complaining about data that someone else's application uses and is designed for? And why are you complaining about your application breaking when they change the data format to work with their application, not yours? You all need to gain a little perspective, realize that this isn't your project's data, and treat it with respect instead of bashing the project. It's not yours. They are kind enough to post the data for others to use, but they are not responsible for what you do with it in the end. The world is already in a pretty bad place, and the last thing anyone needs is to hear you gripe and complain about your applications breaking because the owner of this changes the data to work with their application more effectively. Shame on you all.

@oltdaniel oltdaniel commented Mar 25, 2020

@idkmsu That's like the worst argument I've heard. You publish data so others can use it, and using means they depend on it.

@idkmsu idkmsu commented Mar 25, 2020

But they aren't maintaining the data for your application, are they? You should be thanking them for doing it at all, not bashing them and acting like jerks.

@oltdaniel oltdaniel commented Mar 25, 2020

@idkmsu You have a certain responsibility by publishing a dataset. If you change some stuff each damn day, and you are the biggest known source even used by TV channels, your "fans" or developers relying on this source get angry, cause they need to fix up shit every day.

Yes, we are all grateful for this data source, but the data format they present to us is definitely not the one they feed to their dashboard. It is really hard to parse their stuff in a new format every day and to keep adding special cases that have to be replaced by new ones the next day.

@idkmsu idkmsu commented Mar 25, 2020

I'll agree to disagree. If I am publishing a dataset that I am using for my project, anyone else that uses it for their own purposes is responsible for modifying their application to work with my data. Anyone that gripes about datasets changing for their own purposes from a project that isn't theirs is just lazy. You are spoiled by CI/CD.

@amnonbc amnonbc commented Mar 25, 2020

The data was presumably published to Github to allow people to use it and build systems which analyse and display the parameters of the pandemic.
But anyone who builds such a system will regularly have it broken by schema changes.

You say that I am lazy because I don't want to keep adapting my application to chase after the latest schema change. And you may be right. Once bitten, twice shy. I have moved my application to pull data from https://raw.githubusercontent.com/datasets/covid-19/master/data/time-series-19-covid-combined.csv . Hopefully the maintainers will keep this data schema stable.

@roenw roenw commented Mar 26, 2020

@idkmsu Although I do thank them for the work they do, if I build my application around their data and their formatting, I could have a certain expectation that the data will continue to work with my application for the foreseeable future.

The data changes have broken the backend Python/Flask API that my application is built on, too. Changes in data formatting have widespread effects.
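
A backend can limit the blast radius of a format change by keeping the last snapshot that parsed correctly and serving that until the new file is understood. The sketch below is only an illustration of that idea and is independent of Flask; the class name `SnapshotStore` and the required column names used with it are hypothetical.

```python
import csv
import io


class SnapshotStore:
    """Holds the most recent rows whose header contained all required columns."""

    def __init__(self, required_columns):
        self.required = set(required_columns)
        self.rows = []  # last known-good rows, empty until the first success

    def update(self, csv_text):
        """Replace the stored rows if the new file is usable.

        Returns True when the snapshot was replaced and False when the
        new file was rejected and the previous rows were kept.
        """
        reader = csv.DictReader(io.StringIO(csv_text))
        header = set(reader.fieldnames or [])
        if not self.required <= header:
            return False
        self.rows = list(reader)
        return True
```

An API handler would then read from `store.rows` and log or alert whenever `update` returns False, so users see slightly stale data rather than an error page.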

@idkmsu idkmsu commented Mar 26, 2020

Until I see that the whiners on this comment thread have donated a LARGE sum of money to this group, or better yet, that they are working on the data aggregation efforts, I'm standing by my statements and my position that it is shameful that you guys are complaining about a free product. The next thing you guys are going to do is ask for a refund. If you aren't paying for it, don't complain about it. There is enough going on in the world without entitled developers thinking they can complain about something that is not designed for them. It's data being shared. Work with it, or aggregate it yourself.

7 participants