Italian data source #119

Open
AlessandroAnnini opened this issue Feb 24, 2020 · 29 comments

Comments

@AlessandroAnnini AlessandroAnnini commented Feb 24, 2020

Hi,
could you please update your data using the official Italian source:
http://www.salute.gov.it/portale/nuovocoronavirus/dettaglioContenutiNuovoCoronavirus.jsp?lingua=italiano&id=5351&area=nuovoCoronavirus&menu=vuoto

Strictly speaking it's not a "data source", just a web page, but it is updated frequently and it includes the "Province/State" breakdown.

@alessandroNa alessandroNa commented Feb 24, 2020

A small reflection:

It is shameful that in 2020 we cannot get access to structured data and that communication still happens through static web pages.

@AlessandroAnnini AlessandroAnnini commented Feb 24, 2020

@DataEnthusiast84 I think exactly the same: it's an ugly page with spelling errors, as I tweeted here on their account.

The Italian government is the one using the most outdated, illogical and opaque technologies in the world.
It's a shame that no one reacts to this, not even journalists, who should do the best work they can... while they cannot even translate correctly:
here the writer says that the data can be downloaded as a Google spreadsheet, probably because she has no idea what GitHub is.

So, yeah, you're quite right!

P.S. Hours later the error is still there.

@iceweasel1 iceweasel1 commented Feb 24, 2020

> A small reflection:
>
> It is shameful that in 2020 we cannot get access to structured data and that communication still happens through static web pages.

As our Chancellor Dr. Angela Merkel said in Germany in 2013 about the Internet:

"The internet is uncharted territory for all of us."

I think that's probably still true for the whole world 7 years later...

@alessandroNa alessandroNa commented Feb 25, 2020

Hello @AlessandroAnnini

I suggest creating a database that we update from the cases published daily by the various sources, taking the official sources as the reference, of course. What do you think?

Obviously the proposal is open to everyone.
It could be hosted here on GitHub or on Google Sheets.
My only problem is that with Qlik Sense I can't connect directly to GitHub, so if someone more experienced than me could write a script that updates a sheet on Google Sheets directly, it would be wonderful.

I have in mind a DB structure like the following:

| Date | Region | Province | Common | Type | Value |

The Type column would take one of: positive, deceased, discharged.
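To make the proposed structure concrete, here is a minimal Python sketch of one row of that table. The class name `CaseRecord`, the helper `to_csv` and the validation rules are assumptions made for illustration; only the six column names and the three type values come from the proposal above.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

# Column names as proposed in the thread; the three allowed types likewise.
FIELDS = ["Date", "Region", "Province", "Common", "Type", "Value"]
ALLOWED_TYPES = {"positive", "deceased", "discharged"}


@dataclass(frozen=True)
class CaseRecord:
    """One long-format row: a single count for one place, day and type."""
    day: date
    region: str
    province: str
    common: str
    kind: str   # stored in the "Type" column
    value: int

    def __post_init__(self):
        # Hypothetical validation: reject unknown types and negative counts.
        if self.kind not in ALLOWED_TYPES:
            raise ValueError(f"unknown type: {self.kind!r}")
        if self.value < 0:
            raise ValueError("value must be non-negative")

    def to_row(self):
        return [self.day.isoformat(), self.region, self.province,
                self.common, self.kind, str(self.value)]


def to_csv(records):
    """Serialize records to CSV text with the proposed header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for record in records:
        writer.writerow(record.to_row())
    return buf.getvalue()
```

Keeping one value per row (long format) means a new type can be added later without changing the columns, which tends to suit tools that pivot on a Type field.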

If we get organized, we could create a repository for the Italian situation here on GitHub or on Kaggle...

@AlessandroAnnini AlessandroAnnini commented Feb 25, 2020

It's a nice idea, but yet another source, unofficial and delayed, would be of little value to us or to anybody else... This should be institutional, official and real-time (I pay taxes in Italy; it would be nice to get back at least a simple service like this).

I tweeted about this, feel free to retweet and spread it everywhere: https://twitter.com/ale_annini/status/1232282022229532672

@alessandroNa alessandroNa commented Feb 25, 2020

@AlessandroAnnini I think the ministry / civil protection department holds a lot of data and simply does not think of sharing it on platforms like GitHub. Definitely a questionable choice.
Today I heard statistics on the number of infected men and women, etc. So I imagine the data is there, but it is not shared in a uniform way with tools accessible to everyone.

@marinog82 marinog82 commented Feb 28, 2020

I've created an API that serves the data in this structure:
| Date | Region | Province | Common | Type | Value |

Every day I update the data with the information published on the official site: http://www.salute.gov.it/portale/nuovocoronavirus/dettaglioContenutiNuovoCoronavirus.jsp?lingua=italiano&id=5351&area=nuovoCoronavirus&menu=vuoto

This is the URL of the API:

https://sheets.googleapis.com/v4/spreadsheets/1jxkZpw2XjQzG04VTwChsqRnWn4-FsHH6a7UHVxvO95c/values/Dati?majorDimension=ROWS&key=AIzaSyAy6NFBLKa42yB9KMkFNucI4NLyXxlJ6jQ
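For anyone who wants to consume an endpoint of this kind from a script, the following is a minimal sketch, assuming the response has the usual Google Sheets `values` shape (a JSON object whose `values` key holds a list of rows, with the header in the first row). The function names are hypothetical, and the spreadsheet ID, sheet name and key are left as parameters rather than hard-coded.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_values_url(spreadsheet_id, sheet_name, api_key):
    """Build a Sheets v4 values URL of the same form as the one above."""
    base = "https://sheets.googleapis.com/v4/spreadsheets"
    query = urlencode({"majorDimension": "ROWS", "key": api_key})
    return f"{base}/{quote(spreadsheet_id)}/values/{quote(sheet_name)}?{query}"


def rows_to_records(payload):
    """Turn {"values": [header, row, ...]} into a list of dicts.

    Rows shorter than the header are padded with empty strings, because
    trailing empty cells may be missing from the response.
    """
    values = payload.get("values", [])
    if not values:
        return []
    header, *rows = values
    records = []
    for row in rows:
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


def fetch_records(spreadsheet_id, sheet_name, api_key, timeout=30):
    """Fetch the sheet over the network and return its rows as dicts."""
    url = build_values_url(spreadsheet_id, sheet_name, api_key)
    with urlopen(url, timeout=timeout) as resp:
        return rows_to_records(json.load(resp))
```

Only `fetch_records` touches the network; `rows_to_records` can be tried on any saved response.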

@alessandroNa alessandroNa commented Feb 29, 2020

Great work @marinog82!!!
One question: can you link the Google spreadsheet directly or create a repository here?

I use Qlik Sense Business (formerly Cloud) but I don't understand how I can connect to your fantastic work.

I suggest to everyone this repository that I created:
DataEnthusiast84/CoViD-19-Italy
but it's very, very hard to find data at this level of detail!

P.S.: speaking as someone from Abruzzo, a suggestion... Abruzzo has only one "b" ;)

@marinog82 marinog82 commented Feb 29, 2020

Yes, absolutely.
You can find the spreadsheet at this link, in the sheet "Dati":

https://docs.google.com/spreadsheets/d/1jxkZpw2XjQzG04VTwChsqRnWn4-FsHH6a7UHVxvO95c/edit?usp=sharing

@alessandroNa alessandroNa commented Mar 1, 2020

@marinog82 thanks for sharing!!!! I suggest you take a look at this repository that I created...
DataEnthusiast84/CoViD-19-Italy

There is little data in it, but I think the schema in that example is somewhat interesting.

P.S.: how often does your script update? On the page http://www.salute.gov.it/nuovocoronavirus I think the official update is at 18:00.

@ThomasAlessandro ThomasAlessandro commented Mar 2, 2020

Hi @marinog82 @DataEnthusiast84,

I would like to share my data too. I gathered this information from the salute.gov.it website and the protezionecivile.gov.it website. As of today I have not found any other source for more detailed data in Italy (for example the sex, country or age of the patients), although for a while they were reporting sex and age (at least for the deceased), so that information exists somewhere.

From what I can observe, the data in Italy is updated every day at 18:00 GMT+1, after the press conference of the Protezione Civile.

Here is the link to the data: https://1drv.ms/x/s!Au3cCceWILI85BbL648-rWtVp8cs?e=tJJ4Wt

Please let me know if you have any feedback. I would like to add the R0 for each region, but I still don't know how it is calculated.

Thank you

@alessandroNa alessandroNa commented Mar 3, 2020

Hi @ThomasAlessandro!

Thanks for sharing your work!
From what I understand, there is currently no "open" data source, and the only data available to us is what the Ministero della Salute provides, organized the way we see it every day.

I hope that at least the ministry itself has these data.

@etrevis etrevis commented Mar 4, 2020

Hi all,

thanks for sharing! @ThomasAlessandro @marinog82
I'm struggling to find data for individual provinces. I know it's made available daily here (here-pdf) and that the Italian Wikipedia page somehow plots the data on the map. Any idea how to get it?

@ThomasAlessandro ThomasAlessandro commented Mar 4, 2020

Hi @etrevis,

Digging around the wiki page, I found out that the user updating the map is called night lantern and that he gets the data from a website called The Local. I don't know where The Local gets its data from, as the website keeps asking me to sign up for their newsletter and blocks the page :P. Like you, so far I am using the information and data given by the Protezione Civile.

If you manage to discover where The Local gets its data, let me know :)

Hope this helps

@etrevis etrevis commented Mar 4, 2020

Hi @ThomasAlessandro,

I tried the same without success. Then I found historical data on ilsole24ore.it. I managed to send a message to the designer who made the infographics; hopefully she will share the data :)
Anyway, I will start recording it myself from today.

@alessandroNa alessandroNa commented Mar 4, 2020

Hi all!!

@etrevis
Let us know if they share the data! That would be very nice!

@alessandroNa alessandroNa commented Mar 5, 2020

> Yes, absolutely.
> You can find the spreadsheet at this link, in the sheet "Dati":
>
> https://docs.google.com/spreadsheets/d/1jxkZpw2XjQzG04VTwChsqRnWn4-FsHH6a7UHVxvO95c/edit?usp=sharing

@marinog82 what about updates to this spreadsheet? The file stops at 4/3/2020...

@AlessandroAnnini AlessandroAnnini commented Mar 6, 2020

@etrevis the data you're looking for can be found here:
http://www.protezionecivile.gov.it/documents/20182/1221364/Dati+Province+5marzo2020

If you change the date in the URL you can get the data for other days too, but only starting from March 2nd:
http://www.protezionecivile.gov.it/documents/20182/1221364/Dati+Province+2marzo2020
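The date suffix in these two URLs looks like the day without zero padding, the Italian month name and the year. A minimal sketch of building such a URL from a date follows; it assumes the pattern holds for other days and months, which the thread only confirms for the March 2020 examples shown here, and the function name is hypothetical.

```python
from datetime import date

BASE = "http://www.protezionecivile.gov.it/documents/20182/1221364"

# Italian month names; only "marzo" is confirmed by the URLs in this thread.
ITALIAN_MONTHS = [
    "gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
    "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre",
]

# Per the comment above, files exist only from March 2nd, 2020 onwards.
FIRST_AVAILABLE = date(2020, 3, 2)


def province_pdf_url(day):
    """Return the assumed URL of the per-province document for a given day."""
    if day < FIRST_AVAILABLE:
        raise ValueError("no province document before 2020-03-02")
    month = ITALIAN_MONTHS[day.month - 1]
    return f"{BASE}/Dati+Province+{day.day}{month}{day.year}"
```

For example, `province_pdf_url(date(2020, 3, 5))` reproduces the first URL above.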

Obviously the connection is NOT secure... It is http and not https, which will cause 'mixed content' issues when you try to fetch the data from a script.

In Italy we force people to use certified email (PEC) but we don't believe in https connections because WHATEVER!!!

@marinog82 marinog82 commented Mar 6, 2020

@DataEnthusiast84 Yesterday the government changed the way it publishes the data and now uses PDFs,
so I need to change my script.

@AlessandroAnnini AlessandroAnnini commented Mar 6, 2020

@marinog82 did you manage to get the document even though it is hosted on an insecure server?

@marinog82 marinog82 commented Mar 6, 2020

@AlessandroAnnini which document? The pdf?

@AlessandroAnnini AlessandroAnnini commented Mar 6, 2020

yes, that one

@marinog82 marinog82 commented Mar 6, 2020

@AlessandroAnnini you can get the document this way:
curl -L -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36' http://www.protezionecivile.gov.it/documents/20182/1221364/Dati+Riepilogo+Nazionale+3marzo2020
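The same request can be sketched in Python with only the standard library. This is a rough equivalent, not a tested replacement for the curl command: `urlopen` follows redirects by default (roughly what `-L` does), and the function names and the destination path are hypothetical.

```python
from urllib.request import Request, urlopen

# Same browser-like User-Agent string as in the curl command above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
)


def build_request(url):
    """Create a request that sends the browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def download(url, dest_path, timeout=30):
    """Download the document at url and save it to dest_path."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        data = resp.read()
    with open(dest_path, "wb") as out:
        out.write(data)
    return dest_path
```

Calling `download(url, "riepilogo.pdf")` with the URL from the curl command would save the file locally, assuming the server still answers the way it did for curl.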

@alessandroNa alessandroNa commented Mar 6, 2020

Instead of making the data open... now we distribute it as PDFs...

Poor Italy...

@AlessandroAnnini AlessandroAnnini commented Mar 6, 2020

Thanks @marinog82, we're probably talking about different scripts :)

@alessandroNa alessandroNa commented Mar 6, 2020

@marinog82

https://docs.google.com/spreadsheets/d/1jxkZpw2XjQzG04VTwChsqRnWn4-FsHH6a7UHVxvO95c/edit#gid=0

Warning: from 6/2/2020 the script populates the worksheet in capital letters. This can create problems for data analysis software.

For example:
LOMBARDY <> Lombardy
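On the consumer side, one way to work around the mismatch is to compare region names through a normalized key. The sketch below is an assumption about how one might do that, with hypothetical function names and made-up sample values; it does not change how the spreadsheet itself is written.

```python
from collections import defaultdict


def region_key(name):
    """Case-insensitive, whitespace-tolerant key for a region name."""
    return " ".join(name.split()).casefold()


def total_by_region(rows):
    """Sum (region, value) pairs, treating LOMBARDY and Lombardy as one."""
    totals = defaultdict(int)
    for region, value in rows:
        totals[region_key(region)] += value
    return dict(totals)


# Made-up values, only to show that both spellings end up in one bucket.
sample = [("LOMBARDY", 2), ("Lombardy", 3), (" lombardy ", 1)]
print(total_by_region(sample))
```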

UPDATE:
There is another problem: the rows dated 6/3/2020 contain the data of 5/3/2020. Please verify...

@etrevis etrevis commented Mar 7, 2020

> @etrevis the data you're looking for can be found here:
> http://www.protezionecivile.gov.it/documents/20182/1221364/Dati+Province+5marzo2020
>
> If you change the date in the URL you can get the data for other days too, but only starting from March 2nd:
> http://www.protezionecivile.gov.it/documents/20182/1221364/Dati+Province+2marzo2020
>
> Obviously the connection is NOT secure... It is http and not https, which will cause 'mixed content' issues when you try to fetch the data from a script.
>
> In Italy we force people to use certified email (PEC) but we don't believe in https connections because WHATEVER!!!

Thanks, I knew about these files but I was looking for something more, let's say, modern.
You can try 'forcing' https by typing it in the URL, but it looks like they misconfigured the server, since the certificates are still valid.

As of now, ilsole24ore.it has not replied to my request. I have good news though: province data from 02/03/2020 onwards are now available on my repo, with a structure like the one provided by @ThomasAlessandro. Before that date only (his) regional data are available.

I have automated the process in Python (except for downloading the PDFs, but that should be easy to implement); I will upload the code soon.

Enjoy!

@ThomasAlessandro ThomasAlessandro commented Mar 7, 2020

Hi all,

I think this is going to be a big help to us.

The Protezione Civile has just created its own map of the situation in Italy. You can find the repository here: https://github.com/pcm-dpc/COVID-19

@alessandroNa alessandroNa commented Mar 7, 2020

@ThomasAlessandro

Wow..
I will never stop thanking you for this report. Finally our prayers have been heard! Open data finally available and detailed!
