# Blessed be the Fruit - Project documentation
<a href="https://orsolamborrini.github.io/blessedfruit/">Blessed be the Fruit</a> is a project developed by Maddalena Ghiotto, Chloe Papadopoulou, and Orsola Maria Borrini for the final exam of the course <a href="https://www.unibo.it/it/didattica/insegnamenti/insegnamento/2022/424645">"Open Access and Digital Ethics"</a>, taught by Professor Monica Palmirani within the <a href="https://corsi.unibo.it/2cycle/DigitalHumanitiesKnowledge">Digital Humanities and Digital Knowledge Master's Degree</a> (University of Bologna), during the A.Y. 2022/2023.

## Introduction
Many existing studies, mainly focused on the US, have related teen pregnancy to a variety of socioeconomic factors that may influence it, among them low income and poverty, education levels, race or ethnicity and, finally, religion.<br> In this project we wanted to move the focus away from the US, one of the industrialized countries with the highest teen pregnancy and birth rates, to Italy, and to study whether there is a relation between education, religious observance and <b>pregnancy rates</b> in the Mediterranean country.

As the Catholic Church is headquartered in Vatican City, an enclave within Rome, the relationship between Italians and this Church is particularly strong. However, according to the <a href="https://www.gesis.org/en/eurobarometer-data-service/survey-series/standard-special-eb/study-overview/eurobarometer-904-za7556-december-2018">Eurobarometer survey</a> of 2018, 85.6% of Italy’s population is Christian, while 2.6% follows other religions and 11.7% is non-religious. Since we wanted to analyse any possible influence on pregnancy rates in young women without distinguishing between the various faiths, we decided to set aside the special relation with the Catholic Church and considered <b>general religious observance</b> instead.

Correlation is not causation, however, and many other factors surely contribute to pregnancy rates in young women. We have therefore also included the <b>education level</b>, measured through the early leavers from higher education (aged 18 to 24).


## Scenario

[<b>Istat</b>](https://www.istat.it/) is the Italian National Institute of Statistics, the main producer of official statistics in the service of citizens and policy-makers. Its data are organised in several databases that can be browsed and downloaded for free.<br>
Specifically, we have used both general and specific databases:
<ul>
<li><a href="http://dati.istat.it/?lang=en"><b>I.Stat</b></a>, a data warehouse organised by theme, presented in multidimensional tables and with a wide range of standard metadata</li>
<li><a href="https://esploradati.istat.it/databrowser/#/"><b>IstatData</b></a>, the new database into which all I.Stat content will be gradually migrated (until the data transfer is completed, the two systems will coexist)</li>
<li><a href="https://demo.istat.it/?l=en"><b>Demo - Demography in figures</b></a>, providing official data on resident population in the Italian municipalities and information on main demographic phenomena</li>
</ul>
To keep our project as accessible as possible in the future, we have preferred IstatData over I.Stat whenever possible.


### Statement of responsibility
Team member | Task | Contact
--- | --- | ---
Maddalena Ghiotto | Project ideation, Data retrieval, Mashup datasets, Technical analysis, RDF assertion of the metadata | [contact](mailto:maddalena.ghiotto@studio.unibo.it)
Chloe Papadopoulou | Project ideation, Data retrieval, Ethical analysis, Visualization, Sustainability of the update | [contact](mailto:chloi.papadopoulou@studio.unibo.it)
Orsola Maria Borrini | Project ideation, Data retrieval, Mashup datasets, Quality and legal analyses, Website development | [contact](mailto:orsolamaria.borrini@studio.unibo.it)


## Original and mashup datasets
The project uses <b>16 different datasets</b> in total: 7 source datasets and 9 mashup datasets.

The <b>7 source datasets</b> have been downloaded in .csv format from different databases belonging to Istat:

Id | Dataset | Description (factor of interest) | Provenance | Link / Path
--- | --- | --- | --- | ---
D1 | Population estimates 2002-2019 by age and sex at Jan 1st | POPULATION | demo | [Link](https://demo.istat.it/app/?i=RIC&l=en)
D2 | Resident population by age, sex and marital status on 1st January 2022 | POPULATION | demo | [Link](https://demo.istat.it/app/?i=POS&l=en)
D3 | Aspects of daily life: Religious observances - regions and type of municipality | RELIGION | I.Stat | Daily life and citizen opinions > Social Activities and religious observances > Religious observances - regions and type of municipality
D4 | Mother - Age and citizenship | PREGNANCY | IstatData | [Link](https://esploradati.istat.it/databrowser/#/en/dw/categories/IT1,POP,1.0/POP_BIRTHFERT/DCIS_NATI1/DCIS_NATI1_PARENT_CHARACT/IT1,25_74_DF_DCIS_NATI1_8,1.0)
D5 | Spontaneous abortions - resignation from the place of the event: Age of women - prov. | PREGNANCY | I.Stat | Health Statistics > Women Reproductive Health > Spontaneous abortions - resignation from the place of the event > Provincial data > Age of women - prov.
D6 | Induced abortions - Migration: Events by region of residence of the woman and region of intervention | PREGNANCY | I.Stat | Health Statistics > Women Reproductive Health > Voluntary interruptions of pregnancy - characteristics of the woman > Provincial data > Age - prov. of event
D7 | Early leavers from education and training - aged 18 to 24 - previous regulation (until 2020) | EDUCATION | I.Stat | Education and Training > Early leavers from education and training - aged 18 to 24 - previous regulation (until 2020) > Data summary

During the <b>download phase</b>, we have manually filtered out everything that was not of interest for our research, keeping only the data strictly related to our research question (we have, for example, discarded any information related to marital status in the datasets regarding population).

The source datasets then went through an additional <b>clean up phase</b>, in which we discarded duplicate data (e.g., columns with different names and values that refer to the same information) and irrelevant data and, when necessary, added missing "coded data" to make the datasets easier to manage.
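
The clean up steps described above can be illustrated with a minimal sketch. It is not the code from the project notebooks: the column names, the toy values and the `REGION_CODES` lookup are assumptions made up for this example, and only the Python standard library is used.

```python
import csv
import io

# Toy extract standing in for a downloaded Istat CSV.
# Column names and values are invented for illustration only.
RAW_CSV = """Territory,TIME,Select time,Flag Codes,Value
Piemonte,2017,2017,,10.5
Lombardia,2017,2017,,12.0
Lombardia,2017,2017,,12.0
"""

# Hypothetical label-to-code lookup used to add the missing "coded data".
REGION_CODES = {"Piemonte": "ITC1", "Lombardia": "ITC4"}


def clean_rows(rows, drop_columns, code_lookup):
    """Drop unwanted columns, add a region code, and remove repeated rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Keep only the columns that are neither redundant nor irrelevant.
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        # Add the coded value; unknown labels get an empty code.
        kept["region_code"] = code_lookup.get(kept["Territory"], "")
        key = tuple(kept.items())
        if key not in seen:  # skip rows that were already emitted
            seen.add(key)
            cleaned.append(kept)
    return cleaned


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = clean_rows(rows, {"Select time", "Flag Codes"}, REGION_CODES)
for row in cleaned:
    print(row)
```

Here `Select time` plays the role of a column that repeats the information in `TIME`, and `Flag Codes` the role of an irrelevant column; the repeated Lombardia row is emitted only once.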

Finally, we proceeded with the <b>mashup phase</b>, creating three main mashup datasets (one per factor of interest) to answer our research question. As with the source and clean datasets, we kept the three years of our time span (2017, 2018 and 2019) separate, ending up with 9 final mashup datasets (three for each factor of interest).

Id | Dataset | Description (factor of interest) | Original source datasets | Year
--- | --- | --- | --- | ---
MD1_2017 | Religious observance in each region | RELIGION - % of religious observance in each region (over the total population) | D1, D2, D3 | 2017
MD1_2018 | Religious observance in each region | RELIGION - % of religious observance in each region (over the total population) | D1, D2, D3 | 2018
MD1_2019 | Religious observance in each region | RELIGION - % of religious observance in each region (over the total population) | D1, D2, D3 | 2019
MD2_2017 | Pregnancy rates in young women in each region | PREGNANCY - % of pregnancies in young women (15-25) in each region (over the total population of young women aged 15-25) | D4, D5, D6 | 2017
MD2_2018 | Pregnancy rates in young women in each region | PREGNANCY - % of pregnancies in young women (15-25) in each region (over the total population of young women aged 15-25) | D4, D5, D6 | 2018
MD2_2019 | Pregnancy rates in young women in each region | PREGNANCY - % of pregnancies in young women (15-25) in each region (over the total population of young women aged 15-25) | D4, D5, D6 | 2019
MD3_2017 | (Higher) education rates in young women in each region | EDUCATION - % of women early leavers (18-24) in each region (over the total population) | D1, D2, D7 | 2017
MD3_2018 | (Higher) education rates in young women in each region | EDUCATION - % of women early leavers (18-24) in each region (over the total population) | D1, D2, D7 | 2018
MD3_2019 | (Higher) education rates in young women in each region | EDUCATION - % of women early leavers (18-24) in each region (over the total population) | D1, D2, D7 | 2019

The code and more detailed documentation for the clean up and mashup phases are freely downloadable and can be found in `documentation > CLEAN.ipynb` and `documentation > MASHUP.ipynb`.
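
As a rough illustration of the kind of computation behind a mashup dataset, the sketch below joins two region-keyed tables and derives a percentage over the population. It assumes that both inputs are absolute counts keyed by the same region labels; the function name and all figures are invented for the example and do not come from the notebooks or from Istat.

```python
def percentage_by_region(counts, population):
    """Join two region-keyed mappings and return count * 100 / population."""
    # Only regions present in both inputs can be given a rate.
    shared = counts.keys() & population.keys()
    return {
        region: round(counts[region] * 100 / population[region], 2)
        for region in sorted(shared)
        if population[region] > 0  # avoid dividing by zero
    }


# Invented figures, not Istat data.
events = {"Region A": 250, "Region B": 90, "Region C": 10}
residents = {"Region A": 10_000, "Region B": 4_500}

rates = percentage_by_region(events, residents)
print(rates)
```

Regions without a matching population figure (here `Region C`) are left out instead of being given a guessed denominator.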

## Quality analysis
Following the Italian <a href="https://docs.italia.it/italia/daf/lg-patrimonio-pubblico/it/stabile/aspettiorg.html#qualita-dei-dati"><b>National Guidelines</b></a> ("Linee guida nazionali per la valorizzazione del patrimonio informativo pubblico"), developed in the context of the Data & Analytics Framework project by AgID and the Digital Transformation Team, we have performed a quality analysis of our source datasets to ensure their <b>good condition</b> and their <b>suitability</b> for the intended use.
Specifically, there are four main factors to look for when analysing data quality:
<ul>
<li><b>Accuracy (syntactic and semantic)</b>: the data and its attributes correctly represent the real value of the concept or event they refer to</li>
<li><b>Coherence</b>: the data and its attributes do not present any contradictions with respect to other data in the context of use by the administration owner</li>
<li><b>Completeness</b>: the data are exhaustive for what concerns every expected value and with respect to the related entities (sources) that contribute to the definition of the procedure</li>
<li><b>Timeliness (or promptness of updating)</b>: the data and its attributes refer to the "correct time" (up to date) with respect to the procedure they refer to</li>
</ul>
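
Two of these factors, completeness and syntactic accuracy, lend themselves to simple automated checks. The sketch below is only one possible way to run them and is not the procedure used for the table that follows: the `Territory` and `Value` field names, the 0 to 100 range and the sample rows are assumptions made up for the example.

```python
def completeness_issues(rows, expected_regions, value_field="Value"):
    """Return regions that are absent and regions whose value cell is empty."""
    present = {row["Territory"] for row in rows}
    missing = sorted(set(expected_regions) - present)
    empty = sorted(
        row["Territory"]
        for row in rows
        if not (row.get(value_field) or "").strip()
    )
    return {"missing_regions": missing, "empty_values": empty}


def syntactic_issues(rows, value_field="Value", low=0.0, high=100.0):
    """Return (region, raw value) pairs that are not numbers within [low, high]."""
    issues = []
    for row in rows:
        text = (row.get(value_field) or "").strip()
        if not text:
            continue  # empty cells are reported by completeness_issues
        try:
            value = float(text)
        except ValueError:
            issues.append((row["Territory"], text))
            continue
        if not low <= value <= high:
            issues.append((row["Territory"], text))
    return issues


# Invented rows, not Istat data.
sample = [
    {"Territory": "Region A", "Value": "23.4"},
    {"Territory": "Region B", "Value": ""},
    {"Territory": "Region C", "Value": "n.a."},
    {"Territory": "Region D", "Value": "140"},
]
expected = ["Region A", "Region B", "Region C", "Region D", "Region E"]

print(completeness_issues(sample, expected))
print(syntactic_issues(sample))
```

Semantic accuracy, coherence and timeliness still need a manual comparison against the source documentation, since they depend on what the values are supposed to mean and on when they were published.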

The following table showcases the quality of each of the source datasets and highlights possible flaws.

Id | Accuracy | Coherence | Completeness | Timeliness
--- | --- | --- | --- | ---
D1 - Population 2017 | Satisfied | x | x | x
D2 - Population 2018, 2019 | x | x | x | x
D3 - Religious observance | x | x | x | x
D4 - Live births | x | x | x | x
D5 - Spontaneous abortions | x | x | x | x
D6 - Induced abortions | x | x | x | x
D7 - Early leavers from education | x | x | x | x


## Legal analysis

### Privacy Issues
To check: | D1 - Population 2017 | D2 - Population 2018, 2019 | D3 - Religious observance | D4 - Live births | D5 - Spontaneous abortions | D6 - Induced abortions | D7 - Early leavers from education
--- | --- | --- | --- | --- | --- | --- | --- 
Is the dataset free of any personal data as defined in the Regulation (EU) 2016/679? | y | y | y | y | y | y | y 
Is the dataset free of any indirect personal data that could be used for identifying the natural person? | y | y | y | y | y | y | y 
Is the dataset free of any particular personal data (art. 9 GDPR)? | y | y | y | y | y | y | y
Is the dataset free of any information that combined with common data available in the web, could identify the person? | y | y | y | y | y | y | y 
Is the dataset free of any information related to human rights (e.g., refugees, witness protection, etc.)? | y | y | y | y | y | y | y
Do you use a tool for calculating the range of the risk of deanonymization? | y | y | y | y | y | y | y 
Are you using geolocalization capabilities? | y | y | y | y | y | y | y 
Did you check that the open data platform respects all the privacy regulations (registration of the end-user, profiling, cookies, analytics, etc.)? | y | y | y | y | y | y | y
Do you know who is, in your open data platform, the Controller and Processor of the privacy data of the system? | y | y | y | y | y | y | y 
Where are the datasets physically stored (country and jurisdiction)? | y | y | y | y | y | y | y 
Do you have non-personal data? | y | y | y | y | y | y | y 

### Intellectual Property Rights
To check: | D1 - Population 2017 | D2 - Population 2018, 2019 | D3 - Religious observance | D4 - Live births | D5 - Spontaneous abortions | D6 - Induced abortions | D7 - Early leavers from education
--- | --- | --- | --- | --- | --- | --- | --- 
Have you created and generated the dataset? | y | y | y | y | y | y | y 
Are you the owner of the dataset? | y | y | y | y | y | y | y 
Are you sure not to use third party data without the proper authorization and license? | y | y | y | y | y | y | y
Have you checked if there are any limitations in your national legal system for releasing some kind of datasets with open license? | y | y | y | y | y | y | y 

### Licences
To check: | D1 - Population 2017 | D2 - Population 2018, 2019 | D3 - Religious observance | D4 - Live births | D5 - Spontaneous abortions | D6 - Induced abortions | D7 - Early leavers from education
--- | --- | --- | --- | --- | --- | --- | --- 
Do you release the dataset with an open data licence? | y | y | y | y | y | y | y 
Do you include the clause: "In any case the dataset can't be used for re-identifying the person"? | y | y | y | y | y | y | y 
Do you release the API (in case you have it) with an open source license? | y | y | y | y | y | y | y
Have you checked that the open data/API platform licence regime is in compliance with your IPR policy? | y | y | y | y | y | y | y 


### Limitations on public access
To check: | D1 - Population 2017 | D2 - Population 2018, 2019 | D3 - Religious observance | D4 - Live births | D5 - Spontaneous abortions | D6 - Induced abortions | D7 - Early leavers from education
--- | --- | --- | --- | --- | --- | --- | --- 
Do you check that the dataset concerns your institutional competences, scope and finality? | y | y | y | y | y | y | y 
Do you check the limitations for the publication stated by your national legislation or by the EU directives? | y | y | y | y | y | y | y 
Do you check if there are some limitations connected to the international relations, public security or national defence? | y | y | y | y | y | y | y
Do you check if there are some limitations concerning the public interest? | y | y | y | y | y | y | y 
Do you check the international law limitations? | y | y | y | y | y | y | y
Do you check the INSPIRE law limitations for the spatial data? | y | y | y | y | y | y | y 


### Economic conditions
To check: | D1 - Population 2017 | D2 - Population 2018, 2019 | D3 - Religious observance | D4 - Live births | D5 - Spontaneous abortions | D6 - Induced abortions | D7 - Early leavers from education
--- | --- | --- | --- | --- | --- | --- | --- 
Do you check that the dataset could be released for free? | y | y | y | y | y | y | y 
Do you check if there are some agreements with some other partners in order to release the dataset at a reasonable price? | y | y | y | y | y | y | y
Do you check if the open data platform terms of service include a clause of “non liability agreement” regarding the dataset and API provided? | y | y | y | y | y | y | y
In case you decide to release the dataset at a reasonable price, do you check if the limitations imposed by the new directive 2019/1024/EU are respected? | y | y | y | y | y | y | y
In case you decide to release the dataset at a reasonable price, do you check the e-Commerce directive and regulation? | y | y | y | y | y | y | y


### Temporal aspects
To check: | D1 - Population 2017 | D2 - Population 2018, 2019 | D3 - Religious observance | D4 - Live births | D5 - Spontaneous abortions | D6 - Induced abortions | D7 - Early leavers from education
--- | --- | --- | --- | --- | --- | --- | ---
Do you have a temporal policy for updating the dataset? | y | y | y | y | y | y | y
Do you have some mechanism for informing the end-user that the dataset is updated at a given time to avoid mis-usage and so potential risk of damage? | y | y | y | y | y | y | y 
Did you check if the dataset for some reason cannot be indexed by the search engines (e.g., Google, Yahoo, etc.)? | y | y | y | y | y | y | y
In case of personal data, do you have a reasonable technical mechanism for collecting request of deletion (e.g., right to be forgotten)? | y | y | y | y | y | y | y 

## Ethical analysis

## Technical analysis

Source datasets:

Id | Provenance | Format | Metadata | URI | Licence
--- | --- | --- | --- | --- | ---
D1 | [demo](https://demo.istat.it/?l=en) | .csv, .xlsx, .pdf | Not provided |  [Link](https://demo.istat.it/app/?i=RIC&l=en) | CC BY 3.0
D2 | [demo](https://demo.istat.it/?l=en) | .csv, .xlsx, .pdf | Not provided |  [Link](https://demo.istat.it/app/?i=POS&l=en) | CC BY 3.0
D3 | [I.Stat](http://dati.istat.it/?lang=en) |  .csv, .xlsx, .px, .xml | Provided |  [Link](http://dati.istat.it/index.aspx?queryid=24349) | CC BY 3.0
D4 | [IstatData](https://esploradati.istat.it/databrowser/#/) | .json, .xml, .xlsx, .csv | Provided | [Link](https://esploradati.istat.it/databrowser/#/en/dw/categories/IT1,POP,1.0/POP_BIRTHFERT/DCIS_NATI1/DCIS_NATI1_PARENT_CHARACT/IT1,25_74_DF_DCIS_NATI1_8,1.0) | CC BY 3.0
D5 | [I.Stat](http://dati.istat.it/?lang=en) | .csv, .xlsx, .px, .xml | Provided |  [Link](http://dati.istat.it/index.aspx?queryid=29218) | CC BY 3.0
D6 | [I.Stat](http://dati.istat.it/?lang=en) | .csv, .xlsx, .px, .xml | Provided |  [Link](http://dati.istat.it/index.aspx?queryid=7098) | CC BY 3.0
D7 | [I.Stat](http://dati.istat.it/?lang=en) | .csv, .xlsx, .px, .xml | Provided |  [Link](http://dati.istat.it/Index.aspx?DataSetCode=DCCV_ESL_UNT2020) | CC BY 3.0

Mashup datasets:

Id | Creation date | Format | Metadata | URI | Licence
--- | --- | --- | --- | --- | ---
MD1 | creation_date | .csv | Provided | [MD1_17](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD1_17.csv), [MD1_18](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD1_18.csv), [MD1_19](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD1_19.csv) | final_licence
MD2 | creation_date | .csv | Provided | [MD2-PERC-17](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD2-PERC-2017.csv), [MD2-PERC-18](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD2-PERC-2018.csv), [MD2-PERC-19](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD2-PERC-2019.csv) | final_licence
MD3 | creation_date | .csv | Provided | [MD3_17](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD3_17.csv), [MD3_18](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD3_18.csv), [MD3_19](https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD3_19.csv) | final_licence
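
The URIs in the table above point to GitHub's HTML file viewer. To read a mashup dataset programmatically, the raw file address is usually needed instead. The helper below is a small sketch of that conversion; `github_blob_to_raw` is a name introduced here for illustration, and it assumes a branch name without slashes, as is the case for `main`.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(blob_url):
    """Turn a github.com "blob" URL into its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected path shape: owner / repo / "blob" / ref / path to the file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *file_path])


blob = "https://github.com/OrsolaMBorrini/blessedfruit/blob/main/data/mashupDS/MD1_17.csv"
raw_url = github_blob_to_raw(blob)
print(raw_url)
```

The resulting address can then be passed to any CSV reader that accepts URLs, or fetched first with `urllib.request.urlopen`.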

## Sustainability of the update

## Visualization

## RDF Assertion of the metadata