<a href="https://colab.research.google.com/github/alekswheeler/global-_species_abundance_and_diversity/blob/main/AquecimentoGlobalEAbumdanciaDeEspecies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

<p>Below we describe the data and the variables constructed for the analysis in the paper “Estimating Extinction Risks with Species Occurrence Data from the Global Biodiversity Information Facility” by Susmita Dasgupta, Brian Blankespoor, and David Wheeler (2024).</p>
<p><strong>Variables:</strong></p>
<p><u>Species</u></p>
<ul>
<li>species = species with occurrence data. Source: GBIF and authors’ calculation.</li>
<li>d_assignedrisk = GBIF species listed in the IUCN Red List (Version 2022-2). Yes = 1, No = 0.</li>
</ul>
<p><u>Threats</u></p>
<ul>
<li>parkpct = formal protection index. To compute <em>parkpct</em>, the shapefile from the World Database on Protected Areas (UNEP-WCMC 2019), which includes 283,568 polygons, was transformed into a global raster with a resolution of 0.05 decimal degrees (about 5 km). Each raster cell has value 1 if it includes a protected area and 0 otherwise.</li>
<li>popdens = population density. Density is measured with a spatial raster at 2.5 min resolution (0.042 decimal degrees) from the Gridded Population of the World (GPW), v4 (SEDAC/CIESIN 2023). Using this population raster, each terrestrial raster cell is assigned to one of 15 scaled population groups, <em>popdens</em>.</li>
<li>eezpct = percent of the total occurrence area of a species that lies within an EEZ. The Exclusive Economic Zones (EEZ) boundary shapefile comes from the Maritime Boundaries Geodatabase maintained by the Flanders Marine Institute (2019). For each species, the EEZ coverage index <em>eezpct</em> is the percent of its total occurrence area that lies within an EEZ.</li>
<li>popshadow = coastal population influence. A general index of population influence, constructed with a spatial kriging algorithm that replaces offshore missing values in the global terrestrial population raster with projected values from proximate onshore populations.</li>
<li>totfishing = total fishing intensity [Fishing Intensity (1)]. AIS-based (Automatic Identification System) global fishing effort from Global Fishing Watch (e.g. Kroodsma et al. 2018).</li>
<li>natfishing = nature total fishing intensity [Fishing Intensity (2)]. These estimates of fishing intensity are derived from high-resolution satellite imagery by Global Fishing Watch in collaboration with several research institutions (Paolo et al. 2024).</li>
<li>pthreat = predicted threat probability index (0-100).</li>
<li>popgroupmax = maximum population group (1 to 15).</li>
</ul>
<p><u>References:</u></p>
<p>Dasgupta, S., B. Blankespoor, and D. Wheeler. 2023. Revisiting Global Biodiversity: A Spatial Analysis of Species Occurrence Data from the Global Biodiversity Information Facility. World Bank, September.</p>
<p>Flanders Marine Institute. 2019. Maritime Boundaries Geodatabase: Maritime Boundaries and Exclusive Economic Zones (200NM), version 11. Available online at https://www.marineregions.org/. <a href="https://doi.org/10.14284/386" rel="noopener noreferrer" target="_blank">https://doi.org/10.14284/386</a></p>
<p>Global Fishing Watch. https://globalfishingwatch.org/dataset-and-code-fishing-effort/</p>
<p>Kroodsma, D. A., J. Mayorga, T. Hochberg, N. A. Miller, K. Boerder, F. Ferretti, ... and B. Worm. 2018. Tracking the global footprint of fisheries. <em>Science</em>, 359(6378): 904–908.</p>
<p>IUCN. 2022. The IUCN Red List of Threatened Species. Version 2022-2. <a href="https://www.iucnredlist.org/" rel="noopener noreferrer" target="_blank">https://www.iucnredlist.org</a>. Accessed on [9 May 2023] at <a href="https://doi.org/10.15468/0qnb58" rel="noopener noreferrer" target="_blank">https://doi.org/10.15468/0qnb58</a></p>
<p>SEDAC/CIESIN (NASA Socioeconomic Data and Applications Center/Center for International Earth Science Information Network, Columbia University). 2023. gpw_v4_population_count_adjusted_to_2015_unwpp_country_totals_rev11_2020_2pt5_min.tif</p>
<p>Paolo, F., D. Kroodsma, J. Raynor et al. 2024. Satellite mapping reveals extensive industrial activity at sea. <em>Nature</em>, 625: 85–91.</p>
<p>UNEP-WCMC. 2019. User Manual for the World Database on Protected Areas and world database on other effective area-based conservation measures: 1.6. UNEP-WCMC: Cambridge, UK. Available at: http://wcmc.io/WDPA_Manual</p>
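
The definition of eezpct above is a simple ratio, and a small sketch may make it concrete. The helper `coverage_percent` and the cell coordinates below are hypothetical, and the sketch assumes every raster cell covers the same area, which is a simplification and not necessarily how the authors computed the index.

```python
def coverage_percent(occurrence_cells, zone_cells):
    """Percent of a species' occurrence cells that also fall inside a zone.

    Assumption: all cells are treated as equal in area.
    """
    occurrence_cells = set(occurrence_cells)
    if not occurrence_cells:
        raise ValueError("species has no occurrence cells")
    inside = occurrence_cells & set(zone_cells)
    return 100.0 * len(inside) / len(occurrence_cells)


# Hypothetical (row, col) raster cells, for illustration only.
occurrence = {(0, 0), (0, 1), (1, 0), (1, 1)}
eez = {(0, 1), (1, 1), (2, 2)}

eezpct = coverage_percent(occurrence, eez)
print(eezpct)  # 50.0: two of the four occurrence cells lie inside the zone
```

For a real global grid in decimal degrees, cell area shrinks toward the poles, so an area-weighted version would be closer to a true "percent of occurrence area".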

TODO:

  * Understand our dataset to see whether it fits the problem we want to address
  * Define the problem

Tasks: Find a way to augment our data through GBIF (temporal data or occurrences per year)
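
One possible starting point for the GBIF task is the public occurrence search endpoint, which can return counts grouped by year through its facet parameters. The sketch below is an assumption-laden outline: the function names are hypothetical, the species name is only an illustration, and the facet structure expected by `parse_year_counts` should be checked against a real response before relying on it.

```python
import json
import urllib.parse
import urllib.request

GBIF_SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def build_year_facet_url(scientific_name):
    """Build a search URL that asks only for per-year facet counts."""
    params = {
        "scientificName": scientific_name,
        "facet": "year",
        "facetLimit": 300,  # assumed large enough to cover all years
        "limit": 0,         # skip the occurrence records themselves
    }
    return GBIF_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_year_counts(payload):
    """Convert the assumed facet structure into a {year: count} dict."""
    counts = {}
    for facet in payload.get("facets", []):
        if facet.get("field", "").upper() == "YEAR":
            for entry in facet.get("counts", []):
                counts[int(entry["name"])] = int(entry["count"])
    return dict(sorted(counts.items()))


def fetch_year_counts(scientific_name, timeout=30):
    """Query GBIF (network access required) and return occurrences per year."""
    url = build_year_facet_url(scientific_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_year_counts(json.load(resp))


# Example call, commented out because it needs network access:
# year_counts = fetch_year_counts("Acer rubrum")
```

The resulting per-year counts could then be joined to the BioTIME records on species name and `YEAR`, if the two sources turn out to use compatible taxonomy.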


## Problem definition

## Dataset description

In [1]:
import os
import zipfile
import requests

# 🔗 Dropbox URL (dl=1 forces a direct download)
dropbox_url = "https://www.dropbox.com/s/z2govg67jp3jn75/archive.zip?dl=1"

# 📂 Local name of the ZIP file in Colab
zip_path = "dataset.zip"

# ⬇️ Download the ZIP file from Dropbox, failing early on an HTTP error
response = requests.get(dropbox_url, timeout=60)
response.raise_for_status()
with open(zip_path, "wb") as f:
    f.write(response.content)

print("✅ Download complete!")

# 📂 Extraction folder
extract_folder = "/content/dataset_csvs"
os.makedirs(extract_folder, exist_ok=True)

# 🔄 Extract every file in the ZIP
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_folder)

# 🔍 List the extracted CSV files
csv_files = [f for f in os.listdir(extract_folder) if f.endswith(".csv")]

print("\n📂 Extracted CSV files:")
for file in csv_files:
    print(f"- {file}")

print("\n✅ Extraction complete! All CSVs are in:", extract_folder)


✅ Download complete!

📂 Extracted CSV files:
- BioTIMECitations_24_06_2021.csv
- BioTIMEMetadata_24_06_2021.csv
- BioTIMEQuery_24_06_2021.csv

✅ Extraction complete! All CSVs are in: /content/dataset_csvs


In [10]:
import chardet

with open("/content/dataset_csvs/BioTIMEMetadata_24_06_2021.csv", "rb") as f:
    result = chardet.detect(f.read(100000))  # Reads only the first 100 KB
    print(result["encoding"])

ascii
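
A 100 KB sample only characterises the start of the file, so an `ascii` result does not rule out non-ASCII bytes further in. As a minimal sketch, the hypothetical helper below scans a whole file and reports where the first non-ASCII byte sits; the temporary file is made up purely to demonstrate it.

```python
import os
import tempfile


def first_non_ascii_offset(path, chunk_size=1 << 20):
    """Return the offset of the first byte >= 0x80, or None if pure ASCII."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None
            if not chunk.isascii():
                # Only walk the chunk byte by byte when it is known to be dirty.
                for i, byte in enumerate(chunk):
                    if byte >= 0x80:
                        return offset + i
            offset += len(chunk)


# Hypothetical file: 150,000 ASCII bytes, then a single 0x92 byte.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"a" * 150_000 + b"\x92" + b"b" * 10)
    demo_path = tmp.name

demo_offset = first_non_ascii_offset(demo_path)
print(demo_offset)  # 150000, beyond a 100,000-byte sample
os.remove(demo_path)
```

Running the same helper on `BioTIMEMetadata_24_06_2021.csv` would show whether the first problematic byte lies past the sampled region.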


In [2]:
import pandas as pd

csv_path = "/content/dataset_csvs/BioTIMEQuery_24_06_2021.csv"  # Replace with the actual file name
bio_time_query_df = pd.read_csv(csv_path)

print(bio_time_query_df.head())  # Show the first rows


   Unnamed: 0  STUDY_ID  DAY  MONTH  YEAR  \
0           1        10  NaN    NaN  1984   
1           2        10  NaN    NaN  1984   
2           3        10  NaN    NaN  1984   
3           4        10  NaN    NaN  1984   
4           5        10  NaN    NaN  1984   

                                SAMPLE_DESC PLOT  ID_SPECIES  LATITUDE  \
0  47.400000_-95.120000_12_Control_0_Medium   12          22      47.4   
1  47.400000_-95.120000_12_Control_0_Medium   12          23      47.4   
2  47.400000_-95.120000_12_Control_0_Medium   12          24      47.4   
3  47.400000_-95.120000_12_Control_0_Medium   12         607      47.4   
4   47.400000_-95.120000_12_Control_0_Small   12        1911      47.4   

   LONGITUDE  sum.allrawdata.ABUNDANCE  sum.allrawdata.BIOMASS    GENUS  \
0     -95.12                       1.0                     0.0     Acer   
1     -95.12                       3.0                     0.0     Acer   
2     -95.12                       1.0                     

In [3]:
import pandas as pd

csv_path = "/content/dataset_csvs/BioTIMEMetadata_24_06_2021.csv"  # Replace with the actual file name
bio_time_metadata_df = pd.read_csv(csv_path)

print(bio_time_metadata_df.head())  # Show the first rows

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 221216: invalid start byte
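
The read fails at byte 0x92, position 221216, which is past the first 100 KB that the earlier chardet check sampled. In Windows-1252 (cp1252), 0x92 is the right single quotation mark, so one plausible reading is that the file was saved in that encoding; this is an assumption, not something the notebook confirms. The sketch below uses a hypothetical helper, `pick_encoding`, to try a short list of candidate encodings against the full file, and demonstrates it on a made-up two-line CSV.

```python
import os
import tempfile


def pick_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    # Unreachable with the defaults, since latin-1 accepts every byte.
    raise ValueError(f"none of {candidates} can decode {path}")


# Hypothetical CSV containing the same 0x92 byte, for illustration only.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"STUDY_ID,TITLE\n10,Smith\x92s plots\n")
    demo_path = tmp.name

demo_encoding = pick_encoding(demo_path)
print(demo_encoding)  # cp1252
os.remove(demo_path)
```

The chosen name can then be passed to pandas, for example `pd.read_csv(csv_path, encoding=pick_encoding(csv_path))`. Because `latin-1` never raises, a successful decode does not prove the text is correct, so it is worth inspecting a few rows that contain accented or quoted text.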

# Methodology

## Preprocessing

## Data visualization

## Classification

# Conclusion

## Discussion of Results

# Video


A video (approximately 5 minutes long) describing the work and the results. I intend to share this video with the other students in the course and with the DI professors. **The videos are only meant to make grading easier and will not be published!**

# References

https://www.kaggle.com/datasets/thedevastator/global-species-abundance-and-diversity