Open Access Website structure

1.  HOME

    1.  About the Project

    2.  Problem (scenario) and research questions

2.  Data

    1.  Brief intro

    2.  Source datasets

    3.  Mashup datasets

    4.  Processing of data

3.  Analysis

    1.  Quality Analysis

> Following Brazil’s Open Data Policy (Decree No. 8.777/2016) and the
> reuse principles described in the national open data
> platform [<u>dados.gov.br</u>](https://dados.gov.br/), we conducted a
> comprehensive quality assessment of all datasets used in this study.
> This evaluation ensures that our analysis of urbanization,
> deforestation, and flood-related disasters in Brazil is grounded in
> reliable and ethically reusable public data.
>
> The quality assessment was conducted using four key dimensions:
> Accuracy, Coherence, Completeness, and Timeliness.

1.  **Accuracy (Syntactic and Semantic)**

-   **S2ID (D1):** Data is self-reported by municipalities, which
    introduces variability in accuracy due to local reporting
    capabilities and technical standards.

-   **INMET (D2):** Rainfall data is collected from calibrated
    meteorological stations, ensuring a high degree of measurement
    reliability.

-   **IBGE (D3):** Official demographic statistics from the 2018 census;
    syntactically and semantically consistent.

-   **MapBiomas Urban Expansion (D4):** Derived from satellite imagery
    with validated classification processes. Urban land cover classes
    were extracted consistently.

-   **MapBiomas Deforestation (D5):** Follows the same rigorous image
    classification process as D4, with a strong accuracy track record
    across Brazilian biomes.

-   **(D6)**

2.  **Coherence**

-   Municipality names and state identifiers were standardized to ensure
    interoperability between datasets.

-   Rainfall (INMET) and disaster records (S2ID) were temporally and
    spatially aligned and showed expected patterns in high-risk regions.

-   Urban expansion and forest loss trends (MapBiomas) complemented each
    other and reflected known environmental transitions.
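
The name standardization described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names (`municipio`, `uf`, `events`, `rain_mm`) and the sample rows are hypothetical, and the normalization rules (strip accents, upper-case, collapse whitespace) are an assumption about what "standardized" means here.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return an accent-free, upper-case, single-spaced municipality name."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse repeated whitespace and unify case.
    return " ".join(stripped.upper().split())


def join_key(municipality: str, uf: str) -> tuple[str, str]:
    """Build a (name, state) key; the state separates homonymous municipalities."""
    return normalize_name(municipality), uf.strip().upper()


# Hypothetical rows standing in for two sources that spell the same place differently.
disasters = [{"municipio": "São Paulo ", "uf": "sp", "events": 3}]
rainfall = [{"municipio": "SAO  PAULO", "uf": "SP", "rain_mm": 1450.0}]

# Index one source by the shared key, then attach its values to the other.
rain_by_key = {join_key(r["municipio"], r["uf"]): r for r in rainfall}
merged = [
    {**d, "rain_mm": rain_by_key[join_key(d["municipio"], d["uf"])]["rain_mm"]}
    for d in disasters
    if join_key(d["municipio"], d["uf"]) in rain_by_key
]
print(merged)
```

Where both sources carry an official municipality code, joining on that code is likely more robust than joining on names; the name-based key is shown only because names are what the text mentions.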

3.  **Completeness**

-   **S2ID:** Underreporting is a known issue in certain municipalities.
    The dataset required cleaning to handle encoding issues and numeric
    inconsistencies.

-   **INMET:** Complete for 2024, though metadata extraction for each
    weather station required manual merging.

-   **IBGE:** Fully complete for 2018. No updated census values are
    available at the municipal level for more recent years.

-   **MapBiomas (D4 & D5):** Complete and consistent at the national
    level, covering 1985–2022 with annual updates and no missing years.
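
The S2ID cleaning step mentioned above might look like the sketch below. The text does not say which encoding problems or numeric inconsistencies were found, so both helpers rest on assumptions: that the encoding issue is UTF-8 text mis-decoded as Latin-1, and that numbers use "." as thousands separator and "," as decimal mark. The function names are hypothetical.

```python
from typing import Optional


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1 (e.g. 'SÃ£o' -> 'São')."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not this kind of mojibake; keep it unchanged.
        return text


def parse_br_number(raw: str) -> Optional[float]:
    """Parse numbers written with '.' for thousands and ',' for decimals."""
    cleaned = raw.strip()
    if cleaned in {"", "-"}:
        return None  # treat blanks and dashes as missing, not as zero
    try:
        return float(cleaned.replace(".", "").replace(",", "."))
    except ValueError:
        return None  # unparseable cells are reported as missing


print(fix_mojibake("SÃ£o Paulo"))
print(parse_br_number("1.234,56"))
```

Returning `None` rather than `0` for unparseable cells keeps underreporting visible in later completeness checks instead of silently counting it as "no impact".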

4.  **Timeliness**

-   **S2ID:** Includes records up to 2024. Reporting delays may limit
    visibility of recent disaster impacts.

-   **INMET:** Fully up to date with daily records through 2024.

-   **IBGE:** Timeliness is limited by the date of the last census
    (2018), affecting population-based indicators.

-   **MapBiomas:** Last available version is 2022 (Collection 9), with
    annual updates generally released mid-year.

> **Summary Table – Data Quality Dimensions**

| **ID** | **Dataset**                   | **Accuracy** | **Coherence** | **Completeness** | **Timeliness** |
|----|--------------------------|---------|-----------|-------------|-----------|
| D1     | S2ID – Disaster Records       | Medium       | Medium        | Medium           | Medium         |
| D2     | INMET – Rainfall Data         | High         | High          | High             | High           |
| D3     | IBGE – Population Data (2018) | High         | High          | High             | Low            |
| D4     | MapBiomas – Urban Expansion   | High         | High          | High             | Medium         |
| D5     | MapBiomas – Deforestation     | High         | High          | High             | Medium         |

> All datasets were obtained through official Brazilian open data
> platforms and are used following the country's open data reuse policy.
> Any pre-processing performed preserved the dataset’s original
> structure, and all transformations are documented to support
> reproducibility and transparency.

1.  Legal Analysis

> This legal analysis evaluates the permissibility and sustainability of
> using and publishing the datasets incorporated into this project. The
> assessment is grounded in Brazilian open data regulation, especially
> the *Política de Dados Abertos* (Decree No. 8.777/2016), the *Lei de
> Acesso à Informação (LAI)*, and reuse principles from the national
> platform [<u>dados.gov.br</u>](https://dados.gov.br/).
>
> **1. Privacy Issues**
>
> All datasets used are classified as **non-personal data** under
> the *Lei Geral de Proteção de Dados (LGPD – Law No. 13.709/2018)*.
> None of the datasets
> ([<u>S2ID</u>](https://s2id.mi.gov.br/), [<u>INMET</u>](https://portal.inmet.gov.br/), [<u>IBGE</u>](https://www.ibge.gov.br/), <u>MapBiomas</u>)
> contain names, document numbers, biometric data, or any attribute that
> could identify individuals. S2ID includes aggregated figures on
> population affected by disasters, but these are fully anonymized and
> reported at the municipal level, presenting no risk of
> deanonymization.
>
> **2. Intellectual Property Rights**
>
> All datasets originate from official Brazilian public agencies and are
> released under public or open terms:

-   [**<u>S2ID</u>**](https://s2id.mi.gov.br/) – Ministério da
    Integração e do Desenvolvimento Regional

-   [**<u>INMET</u>**](https://portal.inmet.gov.br/) – Instituto
    Nacional de Meteorologia

-   [**<u>IBGE</u>**](https://www.ibge.gov.br/) – Instituto Brasileiro
    de Geografia e Estatística

-   [**<u>MapBiomas</u>**](https://mapbiomas.org/) – Scientific and
    civic coalition with open data policies

> These sources do not impose restrictions on reuse, as long as proper
> attribution is ensured.
>
> **3. Licenses**
>
> Datasets are licensed as follows:

-   **IBGE**: CC BY 4.0

-   **MapBiomas**: CC BY-SA 4.0

-   **S2ID and INMET**: No explicit license stated; published under
    institutional open-access terms

> The final mashup datasets produced in this project are published under
> the **CC BY 4.0** license to ensure compatibility and openness.
>
> **4. Access Limitations**
>
> All datasets are openly available without registration or access
> restrictions. No sensitive data is included. There are no diplomatic,
> military, or classified elements requiring special handling, in
> accordance with Art. 7 of the *Lei de Acesso à Informação*.
>
> **5. Economic Conditions**
>
> All data was accessed free of charge from public repositories. The
> reuse is non-commercial and respects the original platforms’ terms of
> service. No resale or licensing fees are involved.
>
> **6. Temporal Aspects**
>
> Datasets vary in update frequency:

-   **INMET**: Updated daily (2024 complete)

-   **S2ID**: Updated by local governments, may have delays

-   **IBGE**: Latest complete data from 2018 census

-   **MapBiomas**: Updated annually, most recent version is from 2022

> Documentation includes date references to avoid misinterpretation and
> ensure transparent temporal alignment.
>
> **Final Note on Publication**
>
> This project adopts **best practices in open data reuse** as
> recommended
> by [<u>dados.gov.br</u>](https://dados.gov.br/dados/conteudo/saiba-como-publicar-um-reuso).
> This includes:

-   Attributing all original sources

-   Documenting transformations and derived variables

-   Maintaining transparency in all methodological choices

> Furthermore, Brazil’s Open Data Policy aligns with international
> standards such as:

-   The **International Open Data Charter**, promoting open-by-default
    public data

-   The **Open Government Partnership (OGP)**, of which Brazil is a
    founding member

-   The **UNESCO Recommendation on Open Science** and the **UN SDGs**,
    emphasizing open data in disaster resilience

-   The **Digital Public Goods Alliance**, which recognizes open
    geospatial and environmental data as infrastructure

> Publishing these integrated datasets under the **CC BY 4.0
> license** ensures alignment with national and global open access
> principles.

1.  Ethical Analysis

> **Data Ethics Approach**
>
> The ethical analysis of this project follows the principles of
> the [<u>Data Ethics EU
> Guidelines</u>](https://dataethics.eu/data-ethics-principles/) and
> the [<u>Open Data Institute’s Ethics
> Canvas</u>](https://theodi.org/article/the-data-ethics-canvas-2021).
> Given the project’s focus on climate-related disasters (e.g., floods)
> and their impact on human populations and ecosystems, we ensured an
> ethical handling of public data from acquisition to interpretation.
>
> Our datasets include information aggregated at both
> the **municipality** and **state (UF)** levels, which required
> specific attention to **regional equity**, **data granularity**,
> and **avoiding misinterpretation of territorial vulnerabilities**.
>
> **Data Ethics Principles**

-   **Human-Centric Design:** The analysis aims to understand how
    urbanization, forest loss, and climate dynamics contribute to
    disaster exposure, particularly among vulnerable communities. The
    use of demographic indicators and disaster impact data seeks to
    inform equitable urban and environmental policies centered on
    human resilience.

-   **Fairness and Equity:** Aggregated indicators were used to avoid
    identifying individuals or small groups. In comparing
    municipalities and states, we were careful not to stigmatize
    certain regions with higher reported disasters or exposure.
    Instead, we emphasized structural and historical factors (e.g.,
    infrastructure, land use) to explain disparities.

-   **Transparency:** The data sources (S2ID, INMET, MapBiomas, and
    IBGE) are publicly accessible and cited throughout the project. All
    transformations, filtering methods, and indicators (e.g., urban
    growth, deforestation rate, precipitation volume) are documented
    in a reproducible workflow.

-   **Accountability:** The project adheres to the [<u>Brazilian Open
    Data
    Policy</u>](https://dados.gov.br/dados/conteudo/politica-de-dados-abertos),
    and metadata were respected for each reused dataset. Data reuse
    policies were checked through the <u>Reuso de
    Dados</u> guidelines. The team also ensured internal
    responsibility for every data-cleaning or aggregation operation.

-   **Privacy and Respect for Affected Populations:** No microdata or
    personally identifiable information was used. When handling
    sensitive metrics (e.g., number of people affected or displaced),
    we ensured these were aggregated at the municipal or state level.
    Regional summaries are framed to inform risk reduction, not to
    judge the effectiveness of local responses.

> **Ethical Concerns and Mitigation**

-   **Avoiding Sensationalism:** While 2024 saw historic flooding in
    southern Brazil, the project refrains from spotlighting dramatic
    figures without proper context. Instead, it contextualizes flood
    patterns within national trends and regional vulnerabilities.

-   **Geographical Sensitivity:** Because data are available at
    both **municipal and state level**, we took care to ensure that
    analyses do not reinforce existing inequalities between wealthier
    and poorer regions. Disparities in data reporting across states
    were acknowledged and mitigated with proportional indicators.

-   **Responsibility in Interpretation:** Visualizations and analyses
    are explicitly marked as non-causal. Correlation between
    urbanization and flood frequency, for example, is interpreted as a
    signal, not proof, of systemic risk.

-   **Public Engagement and Literacy:** The project includes simplified
    graphs and charts to communicate findings to broader audiences,
    particularly civil society, educators, and local administrators.
    Technical notebooks and a GitHub repository are provided for
    specialists who wish to audit or extend the methodology.
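
The proportional indicators mentioned under Geographical Sensitivity can be illustrated with a per-capita rate. The sketch below assumes a rate per 100,000 inhabitants, which is one common choice and not necessarily the one used in the project; the function name and the figures are hypothetical.

```python
def rate_per_100k(count: float, population: float) -> float:
    """People affected (or events) per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round inputs give round results.
    return count * 100_000 / population


# Hypothetical figures, chosen only to show the effect of normalization.
regions = {
    "Large state": {"affected": 5_000, "population": 10_000_000},
    "Small state": {"affected": 1_000, "population": 500_000},
}
rates = {
    name: rate_per_100k(r["affected"], r["population"])
    for name, r in regions.items()
}
print(rates)
```

In this made-up case the larger state reports more people affected in absolute terms, yet the smaller state has the higher rate, which is the kind of distortion a proportional indicator is meant to correct.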

> **Final Note**
>
> The ethical safeguards adopted in this project reflect
> both **Brazilian public data governance
> standards** and **international ethical frameworks**. By treating
> both **individual and regional vulnerabilities** with respect and
> care, the project promotes the ethical reuse of public data to support
> resilience-building and informed policymaking.

1.  Technical Analysis

> This technical analysis provides a comprehensive overview of the
> source datasets used in our project on hydrological disasters and
> climate vulnerability in Brazil. All datasets were evaluated in light
> of the **Brazilian Open Data Policy** and metadata expectations
> published on [<u>dados.gov.br</u>](https://dados.gov.br).
> Additionally, we applied the **FAIR principles** (Findable,
> Accessible, Interoperable, Reusable) to assess their potential for
> long-term reuse and integration.
>
> **Metadata Assessment**
>
> We classified metadata quality using the AGID-inspired model based on
> syntactic and semantic correctness, completeness, and consistency of
> documentation.

| **ID** | **Source** | **Format**  | **Metadata**                                                                                                        | **License**                      | **URI**                                                 |
|----|----------|-------|-------------------------|----------|-----------------|
| **D1** | S2ID (MIDR) | .csv        | Level 2: Basic metadata and COBRADE classifications included; no persistent URI. Data structure requires cleaning.  | Open access, no explicit license | [<u>s2id.mi.gov.br</u>](https://s2id.mi.gov.br)         |
| **D2** | INMET      | .csv        | Level 1: Minimal metadata; station names in filenames; no unique identifiers or descriptors.                        | Unclear                          | [<u>bdmep.inmet.gov.br</u>](https://bdmep.inmet.gov.br) |
| **D3** | IBGE       | .csv, .shp  | Level 4: Fully documented, standardized geocodes, shapefiles and API endpoints provided.                            | CC BY 4.0                        | [<u>ibge.gov.br</u>](https://www.ibge.gov.br/)          |
| **D4, D5** | MapBiomas | .csv, .xlsx | Level 4: Rich metadata, class hierarchies and long-term collections (Collection 9). Clear documentation and schema. | CC BY-SA 4.0                     | [<u>mapbiomas.org</u>](https://mapbiomas.org)           |

> **FAIR Principles Evaluation**
>
> We also evaluated each dataset according to the **FAIR data
> principles** to assess their reuse potential:

| **FAIR Principle** | **Description**                                                                   | **Assessment**                                                                                                                                           |
|-------------|------------------------|------------------------------------|
| **Findable**       | Data should have globally unique, persistent identifiers and searchable metadata. | IBGE and MapBiomas datasets are fully findable via [<u>dados.gov.br</u>](https://dados.gov.br). INMET and S2ID lack persistent identifiers.              |
| **Accessible**     | Data and metadata should be retrievable using open protocols.                     | All datasets are downloadable via open protocols. IBGE and MapBiomas are most robust. INMET provides access via a portal but with minimal documentation. |
| **Interoperable**  | Data should use formal standards and shared vocabularies.                         | MapBiomas uses standard land classification codes. IBGE relies on official geographic codes. S2ID uses COBRADE, while INMET lacks standard structure.    |
| **Reusable**       | Data should include clear licenses, provenance, and rich documentation.           | IBGE and MapBiomas provide complete licensing and documentation. S2ID is open but lacks formal reuse policy. INMET does not specify license terms.       |

1.  RDF metadata assertion of the datasets

> The mashup datasets produced are described with metadata aligned with
> the DCAT Version 3 specification. We chose it because it is a current,
> interoperable metadata standard: it keeps the descriptions compatible
> with modern catalog systems and eases reuse and integration across
> platforms.

> In our implementation, metadata is expressed using the Resource
> Description Framework (RDF), a W3C standard that structures
> information as a set of triples—subject, predicate, and object. This
> model enables precise, machine-readable assertions about datasets,
> including their title, publisher, license, themes, spatial and
> temporal coverage, and distribution formats. RDF metadata assertions
> facilitate linking and federating datasets across different
> institutions and domains, thereby supporting the principles of Linked
> Open Data and the Semantic Web.
>
> Although we considered the Brazilian context and national open data
> policies—especially those outlined on the *Plataforma de Dados
> Abertos* (dados.gov.br)—we chose to implement DCAT Version 3. While
> national guidelines often reference earlier versions of DCAT, our
> decision allows us to leverage recent enhancements like improved
> support for multilingualism, versioning, and complex data
> distributions.
>
> This approach is fully aligned with Brazil’s legislative framework for
> transparency and open government, particularly:

-   **Lei nº 12.527/2011** (*Lei de Acesso à Informação* – LAI), which
    guarantees citizens' access to public information;

-   **Decreto nº 8.777/2016**, which establishes the *Política de Dados
    Abertos do Poder Executivo federal*, encouraging the proactive
    release of government data;

-   the *Estratégia de Governo Digital* (EGD), which promotes data
    interoperability and digital innovation in public administration.

> By adopting RDF-based metadata assertions and adhering to DCAT Version
> 3, our dataset descriptions remain semantically interoperable with
> both national and international frameworks. This supports Brazil’s
> broader goals of data transparency, accountability, and open
> innovation, while also preparing our metadata for integration into
> global Linked Data environments.
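
The subject, predicate, object model described above can be sketched with a few assertions serialized as N-Triples. This is an illustration only: the dataset URI and title are hypothetical placeholders (the project's real identifiers are not given in the text), and plain string formatting stands in for an RDF library to keep the example self-contained.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifier; replace with the dataset's real URI.
DATASET = "https://example.org/dataset/mashup-1"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: str = "en") -> str:
    """Format a language-tagged string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


# Each tuple is one assertion: (subject, predicate, object).
triples = [
    (iri(DATASET), iri(RDF_TYPE), iri(DCAT + "Dataset")),
    (iri(DATASET), iri(DCT + "title"), literal("Example mashup dataset")),
    (iri(DATASET), iri(DCT + "license"),
     iri("https://creativecommons.org/licenses/by/4.0/")),
]


def to_ntriples(rows) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

In practice a dedicated RDF library would handle serialization and validation; the point here is only that every metadata statement reduces to one such triple.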

1.  Semantic Web

> The metadata for the source datasets was primarily extracted from
> their original repositories. In cases where metadata was missing or
> incomplete, additional descriptive elements were inferred and
> integrated following the same principles applied to the mashup
> datasets. For example, thematic categories were assigned based on
> controlled vocabularies inspired by recognized authority files and
> aligned with Brazilian open data classification practices.
>
> To enhance the semantic description of our datasets and ensure
> compliance with **Linked Open Data (LOD)** principles, we adopted
> several well-established ontologies, including **DCAT**, **DCTERMS**,
> **PROV**, **FOAF**, **ADMS**, **SKOS**, and **CC**. These ontologies
> offer a structured framework for metadata modeling that promotes
> interoperability, discoverability, and reusability across systems and
> platforms:
>
> **Use of Ontologies:**

-   **DCAT (Data Catalog Vocabulary)** plays a central role in
    describing datasets and catalogs on the Web. Among its key
    properties, dcat:dataset associates datasets with catalogs,
    dcat:theme classifies them by subject (e.g., *Population and
    Society*, *civil protection, environment*), and dcat:distribution
    identifies available formats such as CSV, JSON, or RDF. Learn more
    at [<u>W3C DCAT</u>](https://www.w3.org/TR/vocab-dcat-3/).

-   **DCTERMS (Dublin Core Terms)** provides essential metadata elements
    like dcterms:title (dataset titles), dcterms:description, and
    dcterms:accessRights, which clarify whether datasets are public,
    restricted, or confidential. This ensures alignment with **Lei nº
    12.527/2011 (Lei de Acesso à Informação)** and other open government
    data policies in Brazil.

-   **ADMS (Asset Description Metadata Schema)** complements DCAT by
    enabling the description of data assets, services, and public sector
    information—playing a vital role in the management and cataloging of
    government data under initiatives like the **Política de Dados
    Abertos**, regulated by **Decreto nº 8.777/2016**.

-   **CC (Creative Commons)** vocabularies (e.g., cc:license) specify
    licensing terms for each dataset, facilitating transparency in data
    usage rights and encouraging responsible reuse, in line with
    Brazil’s open data commitments.

-   **PROV (Provenance Ontology)** and **FOAF (Friend of a Friend)** are
    used to describe data provenance and the agents (individuals or
    institutions) responsible for generating and curating the data. This
    supports traceability, accountability, and long-term data
    stewardship.

-   **SKOS (Simple Knowledge Organization System)** structures
    controlled vocabularies and taxonomies used in tagging and
    classifying datasets, improving semantic interoperability and
    searchability.
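
The prefixed names used above (dcat:theme, dcterms:title, cc:license, and so on) are shorthand for full IRIs. The sketch below shows that expansion for the seven vocabularies listed; the namespace IRIs are the published ones for each vocabulary, while the `expand` helper is a hypothetical name introduced for illustration.

```python
# Namespace IRIs for the vocabularies adopted in the metadata.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "adms": "http://www.w3.org/ns/adms#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "cc": "http://creativecommons.org/ns#",
}


def expand(curie: str) -> str:
    """Turn a prefixed name such as 'dcat:theme' into its full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


for term in ("dcat:theme", "dcterms:accessRights", "cc:license"):
    print(term, "->", expand(term))
```

Rejecting unknown prefixes, rather than passing them through, catches typos such as a space after the colon before they reach the published metadata.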

> By adopting this ontology-based approach, our metadata ensures
> compliance with both Brazilian legislation and international best
> practices. It supports the principles outlined in the **Estratégia de
> Governo Digital (EGD)** and integrates seamlessly with Brazil’s
> national open data platform,
> [<u>dados.gov.br</u>](https://dados.gov.br), fostering transparency,
> civic participation, and digital innovation.