# Estimating deforestation in the Amazon

The goal is to estimate the fraction of the Amazon rainforest lost between 2000 and 2015. The data contain gold-standard deforestation labels for a set of parcels, collected through field visits (1), as well as forest-cover predictions obtained by applying computer vision to satellite imagery (2).

1. E. L. Bullock, C. E. Woodcock, C. Souza Jr, P. Olofsson, Satellite‐based estimates reveal widespread forest degradation in the Amazon. Global Change Biology 26(5), 2956–2969 (2020).
2. J. O. Sexton, J. X-P. Song, M. Feng, P. Noojipady, A. Anand, C. Huang, D-H. Kim, K. M. Collins, S. Channan, C. DiMiceli, J. R. Townshend, Global, 30-m resolution continuous fields of tree cover: Landsat-based rescaling of MODIS vegetation continuous fields with lidar-based estimates of error. International Journal of Digital Earth 6(5), 427–448 (2013).
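
The idea of combining a small set of gold-standard labels with a large set of model predictions can be sketched as a prediction-powered confidence interval for a mean. This is a minimal illustration, not the code in `FL_cpp_method`: the function name `ppi_mean_ci` and the synthetic data are assumptions made for the example, and the interval uses a plain normal approximation.

```python
import numpy as np
from statistics import NormalDist

def ppi_mean_ci(Y, Yhat, Yhat_unlabeled, alpha=0.05):
    """Normal-approximation CI for a mean from labels plus predictions (hypothetical helper)."""
    n, N = len(Y), len(Yhat_unlabeled)
    # Point estimate: mean prediction on the unlabeled data, corrected by the
    # average prediction error (the "rectifier") measured on the labeled data.
    theta_hat = Yhat_unlabeled.mean() + (Y - Yhat).mean()
    # Two variance terms: one from the predictions, one from the rectifier.
    imputed_var = Yhat_unlabeled.var(ddof=1) / N
    rectifier_var = (Y - Yhat).var(ddof=1) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * np.sqrt(imputed_var + rectifier_var)
    return theta_hat - half_width, theta_hat + half_width

# Synthetic data (an assumption for illustration, not the forest dataset).
rng = np.random.default_rng(0)
Y = rng.binomial(1, 0.15, size=150).astype(float)
Yhat = np.clip(0.7 * Y + 0.1 + rng.normal(0, 0.1, size=150), 0, 1)
Y_hidden = rng.binomial(1, 0.15, size=1500)  # labels we pretend not to observe
Yhat_unlabeled = np.clip(0.7 * Y_hidden + 0.1 + rng.normal(0, 0.1, size=1500), 0, 1)

lower, upper = ppi_mean_ci(Y, Yhat, Yhat_unlabeled)
print(lower, upper)
```

With few labels and many predictions, the rectifier term usually dominates the width of the interval, so better predictions (smaller errors on the labeled parcels) translate directly into a tighter interval.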

### Import necessary packages

In [1]:
import numpy as np
from datasets import load_dataset
from FL_cpp_method import analyze_dataset, plot_cpp

In [2]:
# Example call
dataset_name = "forest"
data = load_dataset('../data/', dataset_name)
Y_total = data["Y"]
Yhat_total = data["Yhat"]

alpha = 0.05

method = "mean"

dataset_dist = 'IID'
# dataset_dist = 'Non-IID'

# num_ratio = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
num_ratio = [1, 1, 1, 1, 1]  # balanced data volume across nodes
# num_ratio = [1,2,3,4,5]  # unbalanced data volume across nodes
# num_ratio = [5,4,3,2,1]  # unbalanced data volume across nodes

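# Compute the true value from the labels, the CPP interval on each node,
# the interval on the combined data, and the mean CPP interval after FL
true_theta, cpp_intervals, ppi_ci_combined, mean_cpp = analyze_dataset(alpha, None, Y_total, Yhat_total, dataset_dist,
                                                                            num_ratio, method, grid=None)
# Plot the results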
file_name = dataset_dist + '-' + dataset_name + '.pdf'
xlim = [-0.1, 0.5]
ylim = [0, 1.0]
title = "fraction of areas deforested"
plot_cpp(true_theta, cpp_intervals, ppi_ci_combined, mean_cpp, file_name, xlim, ylim, title)

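Group: 1
Labeled sample size: 32
Unlabeled sample size: 288
Group: 2
Labeled sample size: 31
Unlabeled sample size: 288
Group: 3
Labeled sample size: 31
Unlabeled sample size: 288
Group: 4
Labeled sample size: 31
Unlabeled sample size: 288
Group: 5
Labeled sample size: 31
Unlabeled sample size: 288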
imputed var: [9.77475999e-06]
rectifier var [0.00085123]
Labeled sample size: 156
Unlabeled sample size: 1440

Final results:
True theta: 0.15162907268170425
CPP intervals: [array([0.10265189, 0.39930464]), array([0.01110186, 0.21075008]), array([0.01941744, 0.2187347 ]), array([0.15040467, 0.43878369]), array([0.0753131 , 0.34892104])]
Confidence interval on combined data: [0.1459373  0.26310118]
Confidence interval after federated aggregation: [0.14006069 0.2550829 ]
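
The per-group sample sizes in the output depend on `num_ratio`. How `analyze_dataset` partitions the data is not shown here, so the following is only a sketch of one plausible IID split: the helper `split_by_ratio` is hypothetical, and it simply shuffles the indices and cuts them in proportion to the ratio. The `Non-IID` setting is not covered.

```python
import numpy as np

def split_by_ratio(n_samples, num_ratio, seed=0):
    """Randomly partition indices 0..n_samples-1 in proportion to num_ratio (hypothetical helper)."""
    ratio = np.asarray(num_ratio, dtype=float)
    # Floor each group's share, then hand the leftover samples to the first groups.
    sizes = np.floor(n_samples * ratio / ratio.sum()).astype(int)
    sizes[: n_samples - sizes.sum()] += 1
    # Shuffle once so every group is a random (IID) subset of the data.
    perm = np.random.default_rng(seed).permutation(n_samples)
    return np.split(perm, np.cumsum(sizes)[:-1])

labeled_groups = split_by_ratio(156, [1, 1, 1, 1, 1])
unlabeled_groups = split_by_ratio(1440, [1, 1, 1, 1, 1])
print([len(g) for g in labeled_groups])    # [32, 31, 31, 31, 31]
print([len(g) for g in unlabeled_groups])  # [288, 288, 288, 288, 288]
```

For the balanced ratio this happens to give the same group sizes as the output above, but that does not confirm it is the rule the notebook uses. An unbalanced ratio such as `[1, 2, 3, 4, 5]` gives the later groups proportionally more samples.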
