Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB structure COVID 19 #107

Open
alessandroNa opened this issue Feb 23, 2020 · 6 comments
Open

DB structure COVID 19 #107

alessandroNa opened this issue Feb 23, 2020 · 6 comments

Comments

@alessandroNa
Copy link

@alessandroNa alessandroNa commented Feb 23, 2020

I manipulated the file structure and created a single DB with the following structure

| Data | Province / State | Country / Region | Type | Value |

The manipulation was done by hand so the file is not updated systematically. I hope we can automate data extraction with this structure in order to make the analysis easier.

Obviously report any errors

Covid 19.xlsx

Good luck to all

@3verse
Copy link

@3verse 3verse commented Feb 24, 2020

hi @DataEnthusiast84 thanks for sending this over.
Question, might be just me :) ... how come you're not using a column for confirmed/recovered/deaths? Wouldn't that reduce the overall number of records and data redundancy of your table?

@alessandroNa
Copy link
Author

@alessandroNa alessandroNa commented Feb 24, 2020

Hello @3verse and thanks for the question!

I state that I am not an expert systems engineer but from what little I remember during the examination of databases is that creating a database structure as simple as possible for queries is the best choice.
I say this as a non-expert, however, during my previous work I have always found myself at an advantage to have structures of this kind.
Also because to select data it would be enough (for example in SQL) to apply a "where" clause.
I add: a structure like mine is easily manipulated through subsequent queries (and on excel via pivot tables) while a structure like the one you assume is already the result of a data organization in a certain type. Let's say that the DB as I built it would be the simplest possible model that allows data processing and organization at all levels and in all ways.

This is why I am "pushing" those responsible for this project to create files of this type ... ease of organizing data in any way rather than, for example, creating a column for each day.

For curiosity: I write from Italy...

P.S:I'm sorry for my bad English...

@3verse
Copy link

@3verse 3verse commented Feb 24, 2020

hi @DataEnthusiast84,

Definitely the structure with all dates spread across columns didn't make much sense to me as I would probably consider one record per day/location, so I agree with you on that one - it must be related to how other tools can digest it, I'm using Tableau for my visualisations for example and I could still pivot it in a way that works well, although it would force me to having to do this every day to update my dashboards.

As per the use of three columns per confirmed/recovered/deaths I'd suggest that approach so that's simply easier to get totals by days without having to resort to complicated queries when selecting the data. Say, for example, you want to get a daily total of confirmed/recovered/deaths to trend them on a line chart, you'd only have to query something that would look like
select date, region, sum(confirmed), sum(recovered), sum(deaths) group by date order by date

makes sense?

p.s.: absolutely no need to apologise :) also, I'm from Italy too but I live in London at the moment.

@alessandroNa
Copy link
Author

@alessandroNa alessandroNa commented Feb 24, 2020

Hello @3verse

it is a pleasure to talk to you!

In the case of a structured DB as I proposed, it would be enough to insert a clause of type "where".
For example:

select date, region, sum (value)
from tableExample
where type = confirmed
group by ...

Warning: in the data set that is made available the sum (value) or the sum (confirmed) does not provide readable data since the daily data are the "accumulated" and not only the new cases. The new cases are the difference between day n and day n-1.

Regarding the number of columns, I imagine that for example mysql places a limit on the management of the columns. Generally speaking, it is always better to have a DB with "infinite" rows than a DB with numerous columns!
Generally speaking, we can say that "type" is an attribute of the entity (which we can call coronavirus daily events). And then if for example you had to add a further "type" for example the "healed" one way (right) is to update the table of "types" or another way (in my personal opinion wrong) is to add an additional column to an already existing DB (you could talk about problems with control and validation set data).

I hope I explained myself with my bad English!

@3verse
Copy link

@3verse 3verse commented Feb 24, 2020

Pleasure is mine @DataEnthusiast84 :)

@alessandroNa
Copy link
Author

@alessandroNa alessandroNa commented Feb 25, 2020

Update: DB structure at 24/02/2020

Covid 19 20200224.xlsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants