# Deforestation Data Analysis: Sumatra Island, Indonesia

## Objective
Prove that Sumatra's deforestation is greater than the deforestation of all other regions of Indonesia combined (Indonesia minus Sumatra) over 2013-2020.

### Hypothesis
* Annual deforestation summed over all provinces in Sumatra > annual deforestation summed over all provinces outside Sumatra

### Steps
1. Data Preprocessing
2. Exploratory Data Analysis (EDA)
3. Statistical Analysis (Hypothesis Testing)

## 1. Data Preprocessing

In [1]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy import stats as st
import warnings
warnings.filterwarnings('ignore')

### 1.1. Loading the Data

In [2]:
data = pd.read_excel('Indo_60_1641792743_fix.xls', sheet_name='B Indonesia')
data

| | provinsi | kh_1314 | apl_1314 | deforestasi_1314 | kh_1415 | apl_1415 | deforestasi_1415 | kh_1516 | apl_1516 | deforestasi_1516 | ... | deforestasi_1617 | kh_1718 | apl_1718 | deforestasi_1718 | kh_1819 | apl_1819 | deforestasi_1819 | kh_1920 | apl_1920 | deforestasi_1920 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aceh | 3363.9 | 4284.2 | 7648.1 | -72.1 | 3303.5 | 3231.5 | 12523 | 10548.6 | 23071.6 | ... | 15515.7 | 3216.4 | 4285.8 | 7502.2 | 6737.5 | 4870.5 | 11608 | 1195.3 | 722.6 | 1917.9 |
| 1 | Sumatera Utara | 4675.2 | 1465.3 | 6140.5 | 15450.3 | 5215.4 | 20665.7 | 7907.3 | 1807 | 9714.3 | ... | 22549.6 | 4255.7 | 3063.4 | 7319.1 | 9583.9 | 2818.5 | 12402.4 | 470.4 | 763.2 | 1233.6 |
| 2 | Sumatera Barat | 3330.3 | 1725.7 | 5056 | 8813.9 | 1685.1 | 10498.9 | 8199.1 | -381.4 | 7817.8 | ... | 8680 | 3824.1 | 1689.9 | 5514 | 7626 | 1698.9 | 9324.8 | 700.2 | 74.4 | 774.6 |
| 3 | Riau | 180786.5 | 21152.9 | 201939.4 | 124314.6 | 11216.1 | 135530.7 | 18365.4 | 5825.4 | 24190.9 | ... | 6981.6 | 23672.3 | 20042.5 | 43714.8 | 136998.3 | 5012.9 | 142011.1 | 5672.6 | 893.6 | 6566.2 |
| 4 | Jambi | -12809.6 | 2868.1 | -9941.5 | 15422.4 | 1470.7 | 16893.1 | 24263 | 558.4 | 24821.4 | ... | 32871.3 | 5994.4 | 3241.2 | 9235.6 | 26109.6 | 1272.3 | 27382 | 4035.0 | 451.9 | 4486.9 |
| 5 | Sumatera Selatan | 2853.5 | 673.7 | 3527.2 | 271032.8 | 19744.2 | 290777 | 3089.3 | 1204.8 | 4294.2 | ... | 22286.6 | 3619.4 | 121.6 | 3741 | 57857.6 | 2797.5 | 60655.1 | -2353.6 | 41.6 | -2312.0 |
| 6 | Bengkulu | 12106.8 | -37.4 | 12069.4 | 2139.6 | 1993.5 | 4133.1 | 1493.5 | 305.2 | 1798.8 | ... | 5091.5 | 7064 | 642.4 | 7706.4 | 1959.4 | 201.2 | 2160.6 | 3022.5 | 337.3 | 3359.8 |
| 7 | Lampung | 197.8 | -33.9 | 163.9 | 12909.9 | 3790.3 | 16700.2 | 1384.1 | -170.5 | 1213.6 | ... | 4420.3 | 1104.7 | 253.8 | 1358.5 | 182.3 | 443.2 | 625.4 | 384.1 | 79.2 | 463.3 |
| 8 | Kepulauan Bangka Belitung | 1209.7 | 489.7 | 1699.4 | 13594.7 | 5697.7 | 19292.3 | 344.8 | 1347.5 | 1692.3 | ... | 3181.7 | 3559.1 | 186.7 | 3745.8 | 1574.3 | 580.2 | 2154.4 | 89.3 | 29.1 | 118.4 |
| 9 | Kepulauan Riau | 4.3 | 0.4 | 4.6 | 1191.5 | 130.4 | 1322 | 2651.4 | -1953.7 | 697.7 | ... | 5662.8 | 503.9 | -646.4 | -142.5 | 356.7 | 205.3 | 562 | 980.3 | 334.1 | 1314.4 |


### 1.2. Initial Data Exploration

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34 entries, 0 to 33
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   provinsi          34 non-null     object 
 1   kh_1314           34 non-null     object 
 2   apl_1314          34 non-null     object 
 3   deforestasi_1314  34 non-null     object 
 4   kh_1415           34 non-null     object 
 5   apl_1415          34 non-null     object 
 6   deforestasi_1415  34 non-null     object 
 7   kh_1516           34 non-null     object 
 8   apl_1516          34 non-null     object 
 9   deforestasi_1516  34 non-null     object 
 10  kh_1617           34 non-null     object 
 11  apl_1617          34 non-null     object 
 12  deforestasi_1617  34 non-null     object 
 13  kh_1718           34 non-null     object 
 14  apl_1718          34 non-null     object 
 15  deforestasi_1718  34 non-null     object 
 16  kh_1819           34 non-null     object 
 ...

**Data description**:


- `provinsi`: province name
- `kh_1314`: deforestation inside the forest estate (kawasan hutan), 2013-2014
- `apl_1314`: deforestation in Areal Penggunaan Lain (other-use areas, i.e. outside the forest estate), 2013-2014
- `deforestasi_1314`: total deforestation (inside plus outside the forest estate), 2013-2014
- `kh_1415`: deforestation inside the forest estate (kawasan hutan), 2014-2015
- `apl_1415`: deforestation in Areal Penggunaan Lain (outside the forest estate), 2014-2015
- `deforestasi_1415`: total deforestation, 2014-2015
- `kh_1516`: deforestation inside the forest estate (kawasan hutan), 2015-2016
- `apl_1516`: deforestation in Areal Penggunaan Lain (outside the forest estate), 2015-2016
- `deforestasi_1516`: total deforestation, 2015-2016
- `kh_1617`: deforestation inside the forest estate (kawasan hutan), 2016-2017
- `apl_1617`: deforestation in Areal Penggunaan Lain (outside the forest estate), 2016-2017
- `deforestasi_1617`: total deforestation, 2016-2017
- `kh_1718`: deforestation inside the forest estate (kawasan hutan), 2017-2018
- `apl_1718`: deforestation in Areal Penggunaan Lain (outside the forest estate), 2017-2018
- `deforestasi_1718`: total deforestation, 2017-2018
- `kh_1819`: deforestation inside the forest estate (kawasan hutan), 2018-2019
- `apl_1819`: deforestation in Areal Penggunaan Lain (outside the forest estate), 2018-2019
- `deforestasi_1819`: total deforestation, 2018-2019
- `kh_1920`: deforestation inside the forest estate (kawasan hutan), 2019-2020
- `apl_1920`: deforestation in Areal Penggunaan Lain (outside the forest estate), 2019-2020
- `deforestasi_1920`: total deforestation, 2019-2020
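In the rows displayed in the table above, each `deforestasi_*` value equals the matching `kh_*` plus `apl_*` value, up to one-decimal rounding (for example, Aceh 2013-2014: 3363.9 + 4284.2 = 7648.1). A quick consistency check is sketched below; `check_total` is a hypothetical helper, and the three rows are copied from that table.

```python
import pandas as pd

# Three rows copied from the table above (2013-2014 columns only)
sample = pd.DataFrame({
    'provinsi': ['Aceh', 'Riau', 'Jambi'],
    'kh_1314': [3363.9, 180786.5, -12809.6],
    'apl_1314': [4284.2, 21152.9, 2868.1],
    'deforestasi_1314': [7648.1, 201939.4, -9941.5],
})

def check_total(df, period, tol=0.2):
    """Return True if deforestasi_<period> == kh_<period> + apl_<period> within tol.

    The tolerance allows for rounding to one decimal (e.g. Aceh 2014-2015
    differs by 0.1 in the source table).
    """
    diff = df[f'kh_{period}'] + df[f'apl_{period}'] - df[f'deforestasi_{period}']
    return bool((diff.abs() <= tol).all())

print(check_total(sample, '1314'))  # True
```

On the full notebook frame the same check could be run for every period, e.g. `all(check_total(data, p) for p in ['1314', '1415', '1516', '1617', '1718', '1819', '1920'])`; that call has not been run against the full file here.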


**Interim conclusion:**

* The value columns are stored as `object` instead of float, because some cells contain the placeholder string `-`.
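Whether `-` in the source file means "zero deforestation" or "no data" is not stated here, so replacing it with 0 (as the next cell does) is an assumption. It can matter for the comparison: several non-Sumatran provinces show exactly 0.0 in some periods, which may come from `-` cells. A minimal sketch of the two options on a toy series (the two numbers are the Aceh and Sumatera Utara `kh_1314` values; the `-` is inserted for illustration):

```python
import pandas as pd

# A single '-' string forces pandas to store the whole column as dtype object
raw = pd.Series([3363.9, '-', 4675.2])
print(raw.dtype)  # object

# Option 1 (used in this notebook): treat '-' as 0, then cast to float
as_zero = raw.replace('-', 0).astype(float)

# Option 2 (alternative): turn any non-numeric text into NaN, keeping it visibly missing
as_nan = pd.to_numeric(raw, errors='coerce')

print(as_zero.tolist())     # [3363.9, 0.0, 4675.2]
print(as_nan.isna().sum())  # 1
```

With option 2, later sums skip NaN by default, so provinces with missing periods would need to be handled explicitly rather than silently counted as zero.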

In [4]:
# Replace the placeholder string '-' with 0 (assumes '-' means zero deforestation)
data = data.replace('-', 0)

In [5]:
# Convert every column except `except_col` to float (modifies df in place and returns it)
def convert_to_float(df, except_col):
    float_cols = [col for col in df.columns if col != except_col]

    # Convert all columns in float_cols to float
    df[float_cols] = df[float_cols].astype(float)
    
    return df

In [6]:
data = convert_to_float(data, 'provinsi')

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34 entries, 0 to 33
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   provinsi          34 non-null     object 
 1   kh_1314           34 non-null     float64
 2   apl_1314          34 non-null     float64
 3   deforestasi_1314  34 non-null     float64
 4   kh_1415           34 non-null     float64
 5   apl_1415          34 non-null     float64
 6   deforestasi_1415  34 non-null     float64
 7   kh_1516           34 non-null     float64
 8   apl_1516          34 non-null     float64
 9   deforestasi_1516  34 non-null     float64
 10  kh_1617           34 non-null     float64
 11  apl_1617          34 non-null     float64
 12  deforestasi_1617  34 non-null     float64
 13  kh_1718           34 non-null     float64
 14  apl_1718          34 non-null     float64
 15  deforestasi_1718  34 non-null     float64
 16  kh_1819           34 non-null     float64
 ...

In [8]:
# Summary statistics of the numeric columns
data.describe()

| | kh_1314 | apl_1314 | deforestasi_1314 | kh_1415 | apl_1415 | deforestasi_1415 | kh_1516 | apl_1516 | deforestasi_1516 | kh_1617 | ... | deforestasi_1617 | kh_1718 | apl_1718 | deforestasi_1718 | kh_1819 | apl_1819 | deforestasi_1819 | kh_1920 | apl_1920 | deforestasi_1920 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | ... | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 | 34.0 |
| mean | 17207.879412 | 6166.911765 | 23374.758824 | 47976.905882 | 16269.052941 | 64245.97 | 25368.602941 | 11641.794118 | 37010.408824 | 17481.370588 | ... | 28235.926471 | 13136.697059 | 12712.658824 | 25849.355882 | 22109.802941 | 5093.635294 | 27203.432353 | 3940.923529 | 2850.829412 | 6791.752941 |
| std | 58144.945266 | 18556.230151 | 75302.626538 | 146450.573955 | 47894.051449 | 191640.3 | 74398.26166 | 35679.151092 | 109085.989675 | 52566.481337 | ... | 84059.14575 | 38097.362435 | 37593.781952 | 75472.700559 | 67503.976902 | 15180.046703 | 81727.066273 | 11478.979284 | 8412.340034 | 19703.945237 |
| min | -12809.6 | -3901.0 | -14880.5 | -72.1 | 0.0 | 0.0 | 0.0 | -1953.7 | 0.0 | -8915.4 | ... | -14802.7 | -1683.9 | -646.4 | -1369.5 | -229.2 | 0.0 | -228.8 | -2353.6 | 0.0 | -2312.0 |
| 25% | 117.225 | 5.35 | 114.15 | 1755.875 | 1498.225 | 4333.2 | 1411.45 | 0.0 | 1333.275 | 0.0 | ... | 0.0 | 1329.7 | 395.4 | 1797.775 | 571.2 | 189.05 | 787.0 | 118.15 | 36.125 | 142.725 |
| 50% | 1682.55 | 778.5 | 2477.2 | 7259.9 | 3472.7 | 11225.2 | 6452.75 | 1577.25 | 7636.65 | 3081.6 | ... | 5377.15 | 3589.25 | 2303.55 | 6853.8 | 2125.9 | 1233.6 | 4384.6 | 858.7 | 363.6 | 1462.1 |
| 75% | 5062.725 | 2636.65 | 7357.45 | 14965.475 | 10934.45 | 21198.65 | 17785.425 | 5668.675 | 24663.775 | 15085.1 | ... | 21349.125 | 8443.65 | 8480.85 | 16859.575 | 9094.425 | 3285.225 | 12203.8 | 3160.425 | 2098.275 | 6841.8 |
| max | 292533.9 | 104837.9 | 397370.9 | 815607.5 | 276574.0 | 1092182.0 | 431266.3 | 197910.6 | 629176.9 | 297183.2 | ... | 480010.8 | 223323.9 | 216115.2 | 439439.1 | 375866.7 | 86591.8 | 462458.5 | 66995.7 | 48464.1 | 115459.8 |


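**Conclusion:**

* All value columns are now float.
* No values are missing (every column has 34 non-null entries), bearing in mind that `-` cells were set to 0 above.
* No explicit duplicate check was run in the cells above, so the absence of duplicate rows is assumed rather than verified.
* `describe()` covers all 34 rows, including the national `INDONESIA` total, so statistics such as `mean`, `std`, and `max` mix individual provinces with that total.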
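Verifying missing values and duplicates takes only a few lines. The sketch below wraps both checks in a hypothetical `quality_report` helper and runs it on a made-up toy frame; on the notebook data it would be called as `quality_report(data)`. Checking `provinsi` on its own catches a province listed twice with different numbers, which `duplicated()` on whole rows would miss.

```python
import pandas as pd

def quality_report(df):
    """Count missing cells, fully duplicated rows, and repeated province names."""
    return {
        'missing_cells': int(df.isna().sum().sum()),
        'duplicate_rows': int(df.duplicated().sum()),
        'duplicate_provinces': int(df['provinsi'].duplicated().sum()),
    }

# Toy frame (made-up values): one duplicated row and one missing value
toy = pd.DataFrame({
    'provinsi': ['Aceh', 'Aceh', 'Bali'],
    'deforestasi_1314': [7648.1, 7648.1, None],
})
print(quality_report(toy))
# {'missing_cells': 1, 'duplicate_rows': 1, 'duplicate_provinces': 1}
```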

## 2. Exploratory Data Analysis (EDA)

In [9]:
data['provinsi'].unique()

array(['Aceh', 'Sumatera Utara', 'Sumatera Barat', 'Riau', 'Jambi',
       'Sumatera Selatan', 'Bengkulu', 'Lampung',
       'Kepulauan Bangka Belitung', 'Kepulauan Riau', 'DKI Jakarta',
       'Jawa Barat', 'Jawa Tengah', 'DI Yogyakarta', 'Jawa Timur',
       'Banten', 'Bali', 'Nusa Tenggara Barat', 'Nusa Tenggara Timur',
       'Kalimantan Barat', 'Kalimantan Tengah', 'Kalimantan Selatan',
       'Kalimantan Timur dan Kalimantan Utara ', 'Sulawesi Utara',
       'Sulawesi Tengah', 'Sulawesi Selatan', 'Sulawesi Tenggara',
       'Gorontalo', 'Sulawesi Barat', 'Maluku', 'Maluku Utara',
       'Papua Barat', 'Papua', 'INDONESIA'], dtype=object)

The provinces in Sumatra are: Aceh, Sumatera Utara, Sumatera Barat, Riau, Jambi, Sumatera Selatan, Bengkulu, Lampung, Kepulauan Bangka Belitung, and Kepulauan Riau. They occupy the first ten rows of `data`.
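Slicing with `iloc` depends on the row order of the Excel sheet. Selecting by name is more robust if the sheet is ever reordered; a sketch with a hypothetical `split_by_region` helper follows. It strips whitespace before matching because at least one label in the data, `'Kalimantan Timur dan Kalimantan Utara '`, has a trailing space, and it drops the national `INDONESIA` row so the total is not counted as a province. The `sum` values in the toy frame are placeholders.

```python
import pandas as pd

SUMATERA = ['Aceh', 'Sumatera Utara', 'Sumatera Barat', 'Riau', 'Jambi',
            'Sumatera Selatan', 'Bengkulu', 'Lampung',
            'Kepulauan Bangka Belitung', 'Kepulauan Riau']

def split_by_region(df, col='provinsi', total_label='INDONESIA'):
    """Return (Sumatra rows, other-province rows), excluding the national total row."""
    names = df[col].str.strip()          # ignore stray leading/trailing spaces
    is_sumatera = names.isin(SUMATERA)
    is_total = names == total_label
    return df[is_sumatera].copy(), df[~is_sumatera & ~is_total].copy()

# Toy frame: real province labels, placeholder values
toy = pd.DataFrame({
    'provinsi': ['Aceh', 'Riau', 'DKI Jakarta',
                 'Kalimantan Timur dan Kalimantan Utara ', 'INDONESIA'],
    'sum': [1.0, 2.0, 3.0, 4.0, 10.0],
})
sumatera, others = split_by_region(toy)
print(sumatera['provinsi'].tolist())  # ['Aceh', 'Riau']
print(len(others))                    # 2
```

On the notebook data this would replace the two `iloc` slices with `sumatera_data, outside_sumatera_data = split_by_region(data)`.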

In [10]:
# Sumatra provinces: the first ten rows (indices 0-9)
sumatera_data = data.iloc[0:10]

In [11]:
# Keep only the total-deforestation columns, put the province name back in front,
# and add a 'sum' column with each province's total over 2013-2020
sumatera_data_deforestasi = sumatera_data.filter(like='deforestasi').copy()
sumatera_data_deforestasi.insert(0, 'provinsi', sumatera_data['provinsi'])
sumatera_data_deforestasi['sum'] = sumatera_data_deforestasi.iloc[:, 1:].sum(axis=1)

In [12]:
sumatera_data_deforestasi

| | provinsi | deforestasi_1314 | deforestasi_1415 | deforestasi_1516 | deforestasi_1617 | deforestasi_1718 | deforestasi_1819 | deforestasi_1920 | sum |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Aceh | 7648.1 | 3231.5 | 23071.6 | 15515.7 | 7502.2 | 11608.0 | 1917.9 | 70495.0 |
| 1 | Sumatera Utara | 6140.5 | 20665.7 | 9714.3 | 22549.6 | 7319.1 | 12402.4 | 1233.6 | 80025.2 |
| 2 | Sumatera Barat | 5056.0 | 10498.9 | 7817.8 | 8680.0 | 5514.0 | 9324.8 | 774.6 | 47666.1 |
| 3 | Riau | 201939.4 | 135530.7 | 24190.9 | 6981.6 | 43714.8 | 142011.1 | 6566.2 | 560934.7 |
| 4 | Jambi | -9941.5 | 16893.1 | 24821.4 | 32871.3 | 9235.6 | 27382.0 | 4486.9 | 105748.8 |
| 5 | Sumatera Selatan | 3527.2 | 290777.0 | 4294.2 | 22286.6 | 3741.0 | 60655.1 | -2312.0 | 382969.1 |
| 6 | Bengkulu | 12069.4 | 4133.1 | 1798.8 | 5091.5 | 7706.4 | 2160.6 | 3359.8 | 36319.6 |
| 7 | Lampung | 163.9 | 16700.2 | 1213.6 | 4420.3 | 1358.5 | 625.4 | 463.3 | 24945.2 |
| 8 | Kepulauan Bangka Belitung | 1699.4 | 19292.3 | 1692.3 | 3181.7 | 3745.8 | 2154.4 | 118.4 | 31884.3 |
| 9 | Kepulauan Riau | 4.6 | 1322.0 | 697.7 | 5662.8 | -142.5 | 562.0 | 1314.4 | 9421.0 |


In [13]:
# All other provinces: rows 10-32 (row 33 is the national 'INDONESIA' total and is excluded)
outside_sumatera_data = data.iloc[10:33]

In [14]:
outside_sumatera_data

| | provinsi | kh_1314 | apl_1314 | deforestasi_1314 | kh_1415 | apl_1415 | deforestasi_1415 | kh_1516 | apl_1516 | deforestasi_1516 | ... | deforestasi_1617 | kh_1718 | apl_1718 | deforestasi_1718 | kh_1819 | apl_1819 | deforestasi_1819 | kh_1920 | apl_1920 | deforestasi_1920 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | DKI Jakarta | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | -0.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11 | Jawa Barat | -10979.5 | -3901.0 | -14880.5 | 1750.5 | 4309.5 | 6060.0 | 0.0 | 0.0 | 0.0 | ... | -188.0 | 2598.9 | 3789.6 | 6388.5 | 3782.4 | 1194.9 | 4977.4 | 0.0 | 0.0 | 0.0 |
| 12 | Jawa Tengah | -79.2 | -15.1 | -94.3 | 1589.5 | 3641.9 | 5231.4 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 2004.7 | 1110.8 | 3115.6 | 1859.6 | 44.5 | 1904.0 | 0.0 | 0.0 | 0.0 |
| 13 | DI Yogyakarta | 0.9 | 31.9 | 32.8 | 51.8 | 0.0 | 51.8 | 0.0 | 0.0 | 0.0 | ... | -270.2 | 2.4 | 587.4 | 589.8 | 141.4 | 185.0 | 326.4 | 0.0 | 0.0 | 0.0 |
| 14 | Jawa Timur | 5452.2 | 2044.9 | 7497.1 | 3621.0 | 4128.1 | 7749.2 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3298.6 | 5511.7 | 8810.3 | 5066.4 | 738.3 | 5804.7 | 0.0 | 0.0 | 0.0 |
| 15 | Banten | -237.3 | -28.7 | -266.1 | 1282.2 | 529.2 | 1811.5 | 0.0 | 0.0 | 0.0 | ... | -5262.1 | -523.6 | 240.6 | -283.1 | 67.3 | 16.1 | 83.4 | 0.0 | 34.3 | 34.3 |
| 16 | Bali | 110.0 | 20.2 | 130.2 | 77.5 | 0.0 | 77.5 | 4096.0 | 3359.5 | 7455.5 | ... | 234.2 | 99.5 | 177.4 | 276.9 | 89.3 | 1.7 | 91.0 | 43.8 | 1.2 | 45.0 |
| 17 | Nusa Tenggara Barat | 0.0 | 0.0 | 0.0 | 8896.4 | 186.8 | 9083.2 | 12561.9 | 9800.5 | 22362.4 | ... | -14802.7 | 7065.6 | 3170.7 | 10236.4 | 12382.4 | 3589.4 | 15971.9 | 10571.9 | 2436.0 | 13007.9 |
| 18 | Nusa Tenggara Timur | 138.9 | -30.0 | 108.8 | 2962.7 | 10969.3 | 13932.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 6037.1 | 11652.9 | 17689.9 | 1279.2 | 2233.6 | 3512.8 | 2579.3 | 5710.2 | 8289.5 |
| 19 | Kalimantan Barat | 7872.8 | 22046.8 | 29919.6 | 5754.0 | 34723.2 | 40477.2 | 60146.3 | 64809.7 | 124956.0 | ... | 19296.9 | 13157.7 | 29133.7 | 42291.4 | 15109.1 | 8985.9 | 24095.0 | 6546.6 | 9787.9 | 16334.5 |


In [15]:
# Keep only the total-deforestation columns, put the province name back in front,
# and add a 'sum' column with each province's total over 2013-2020
outside_sumatera_data_deforestasi = outside_sumatera_data.filter(like='deforestasi').copy()
outside_sumatera_data_deforestasi.insert(0, 'provinsi', outside_sumatera_data['provinsi'])
outside_sumatera_data_deforestasi['sum'] = outside_sumatera_data_deforestasi.iloc[:, 1:].sum(axis=1)

In [16]:
outside_sumatera_data_deforestasi

| | provinsi | deforestasi_1314 | deforestasi_1415 | deforestasi_1516 | deforestasi_1617 | deforestasi_1718 | deforestasi_1819 | deforestasi_1920 | sum |
|---|---|---|---|---|---|---|---|---|---|
| 10 | DKI Jakarta | 0.0 | 0.0 | 0.0 | -0.8 | 0.0 | 0.0 | 0.0 | -0.8 |
| 11 | Jawa Barat | -14880.5 | 6060.0 | 0.0 | -188.0 | 6388.5 | 4977.4 | 0.0 | 2357.4 |
| 12 | Jawa Tengah | -94.3 | 5231.4 | 0.0 | 0.0 | 3115.6 | 1904.0 | 0.0 | 10156.7 |
| 13 | DI Yogyakarta | 32.8 | 51.8 | 0.0 | -270.2 | 589.8 | 326.4 | 0.0 | 730.6 |
| 14 | Jawa Timur | 7497.1 | 7749.2 | 0.0 | 0.0 | 8810.3 | 5804.7 | 0.0 | 29861.3 |
| 15 | Banten | -266.1 | 1811.5 | 0.0 | -5262.1 | -283.1 | 83.4 | 34.3 | -3882.1 |
| 16 | Bali | 130.2 | 77.5 | 7455.5 | 234.2 | 276.9 | 91.0 | 45.0 | 8310.3 |
| 17 | Nusa Tenggara Barat | 0.0 | 9083.2 | 22362.4 | -14802.7 | 10236.4 | 15971.9 | 13007.9 | 55859.1 |
| 18 | Nusa Tenggara Timur | 108.8 | 13932.0 | 0.0 | 0.0 | 17689.9 | 3512.8 | 8289.5 | 43533.0 |
| 19 | Kalimantan Barat | 29919.6 | 40477.2 | 124956.0 | 19296.9 | 42291.4 | 24095.0 | 16334.5 | 297370.6 |


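**Conclusion:**

* Separate tables were built for the Sumatran provinces and for all other provinces (excluding the national `INDONESIA` row).
* Each table has a `sum` column holding the province's total deforestation over 2013-2020.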
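The `sum` column aggregates across periods for each province (a row sum). The hypothesis, however, is phrased in terms of annual totals across provinces, which is the column sum instead. A minimal sketch of both aggregations on the Sumatra values from the `sumatera_data_deforestasi` table above:

```python
import pandas as pd

periods = ['1314', '1415', '1516', '1617', '1718', '1819', '1920']
cols = [f'deforestasi_{p}' for p in periods]

# Values copied from the sumatera_data_deforestasi table above
rows = {
    'Aceh':                      [7648.1, 3231.5, 23071.6, 15515.7, 7502.2, 11608.0, 1917.9],
    'Sumatera Utara':            [6140.5, 20665.7, 9714.3, 22549.6, 7319.1, 12402.4, 1233.6],
    'Sumatera Barat':            [5056.0, 10498.9, 7817.8, 8680.0, 5514.0, 9324.8, 774.6],
    'Riau':                      [201939.4, 135530.7, 24190.9, 6981.6, 43714.8, 142011.1, 6566.2],
    'Jambi':                     [-9941.5, 16893.1, 24821.4, 32871.3, 9235.6, 27382.0, 4486.9],
    'Sumatera Selatan':          [3527.2, 290777.0, 4294.2, 22286.6, 3741.0, 60655.1, -2312.0],
    'Bengkulu':                  [12069.4, 4133.1, 1798.8, 5091.5, 7706.4, 2160.6, 3359.8],
    'Lampung':                   [163.9, 16700.2, 1213.6, 4420.3, 1358.5, 625.4, 463.3],
    'Kepulauan Bangka Belitung': [1699.4, 19292.3, 1692.3, 3181.7, 3745.8, 2154.4, 118.4],
    'Kepulauan Riau':            [4.6, 1322.0, 697.7, 5662.8, -142.5, 562.0, 1314.4],
}
sumatera = pd.DataFrame.from_dict(rows, orient='index', columns=cols)

# Row sums: each province's total over 2013-2020 (reproduces the 'sum' column)
province_total = sumatera.sum(axis=1)
# Column sums: the Sumatra-wide total for each period (one value per period)
yearly_total = sumatera.sum(axis=0)

print(yearly_total.round(1))
```

On the notebook frame the same per-period totals would be `sumatera_data_deforestasi.filter(like='deforestasi').sum()`; the `filter` skips both `provinsi` and `sum`.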

## 3. Statistical Analysis (Hypothesis Testing)

### 3.1. Testing the Hypothesis

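**First hypothesis:**

**Annual deforestation summed over all provinces in Sumatra > annual deforestation summed over all provinces outside Sumatra.**

In practice, the test below compares the `sum` column (each province's total over 2013-2020) of the 10 Sumatran provinces with that of the 23 other provinces, so it compares the average province in each group rather than the island-wide annual totals.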

Before the t-test, check whether the two groups have equal variances. A common rule of thumb: if the ratio of the larger sample variance to the smaller one is below 4, the variances can be treated as equal. Alternatively, use Levene's test, where H0 is that both groups have equal variances and H1 is that the variances differ. If the Levene p-value is below the significance level (alpha), reject H0; otherwise there is no evidence against equal variances.
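The rule of thumb and Levene's test can be compared side by side. The sketch below uses the `sum` values displayed earlier in this notebook; only the first ten non-Sumatran provinces are displayed there, so `others_shown` is a partial sample and its output will not match the full test in the next cells. `variance_ratio` is a hypothetical helper. Note that `scipy.stats.levene` centers on the median by default (the Brown-Forsythe variant); `center='mean'` gives Levene's original test.

```python
import numpy as np
from scipy import stats as st

# 'sum' values displayed above; others_shown covers only 10 of the 23 other provinces
sumatera = np.array([70495.0, 80025.2, 47666.1, 560934.7, 105748.8,
                     382969.1, 36319.6, 24945.2, 31884.3, 9421.0])
others_shown = np.array([-0.8, 2357.4, 10156.7, 730.6, 29861.3,
                         -3882.1, 8310.3, 55859.1, 43533.0, 297370.6])

def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one (ddof=1)."""
    va, vb = np.var(a, ddof=1), np.var(b, ddof=1)
    return max(va, vb) / min(va, vb)

ratio = variance_ratio(sumatera, others_shown)
lev = st.levene(sumatera, others_shown)  # default center='median'

print(f'variance ratio = {ratio:.2f} (rule of thumb: equal if < 4 -> {ratio < 4})')
print(f'Levene p-value = {lev.pvalue:.3f}')
```

Both checks assess the same question; when they disagree, the formal test is usually preferred, although with samples this small neither is very reliable.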

In [18]:
# Samples for the Levene test: per-province totals over 2013-2020
sample_1 = outside_sumatera_data_deforestasi['sum']
sample_2 = sumatera_data_deforestasi['sum']

In [19]:
# Levene test (SciPy's default centers on the median)
levene_test = st.levene(sample_1, sample_2)

print(levene_test)
if levene_test.pvalue > 0.05:
    print("No evidence to reject the hypothesis that the two groups have equal variances.")
else:
    print("Reject H0: the variances of the two groups differ significantly.")

LeveneResult(statistic=0.12611726344437793, pvalue=0.7248971700965459)
No evidence to reject the hypothesis that the two groups have equal variances.


Because the p-value (0.725) is greater than the significance level (alpha = 0.05), we cannot reject H0, so the two samples are treated as having equal variances (`equal_var = True`).

* H0 (null hypothesis): the mean per-province deforestation total over 2013-2020 is the same for Sumatra and for the provinces outside Sumatra.
* H1 (alternative hypothesis): the two means differ. This is the two-sided alternative that `st.ttest_ind` tests by default; the directional claim (Sumatra > others) would require a one-sided test.
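The next cell runs the two-sided test. Because the stated hypothesis is directional, a one-sided test matches it more closely. A sketch, assuming SciPy 1.6 or newer (where `ttest_ind` gained the `alternative` argument): the Sumatra sample must be passed first so that `alternative='greater'` means "Sumatra > others". Welch's test (`equal_var=False`) is a common, more conservative choice when group sizes differ (10 vs 23 here). The arrays are the partial values displayed earlier, so the numbers are illustrative only; `one_sided_ttest` is a hypothetical helper.

```python
import numpy as np
from scipy import stats as st

# 'sum' values displayed above; others_shown covers only 10 of the 23 other provinces
sumatera = np.array([70495.0, 80025.2, 47666.1, 560934.7, 105748.8,
                     382969.1, 36319.6, 24945.2, 31884.3, 9421.0])
others_shown = np.array([-0.8, 2357.4, 10156.7, 730.6, 29861.3,
                         -3882.1, 8310.3, 55859.1, 43533.0, 297370.6])

def one_sided_ttest(a, b, equal_var=True):
    """Two-sample t-test of H1: mean(a) > mean(b). Needs SciPy >= 1.6."""
    res = st.ttest_ind(a, b, equal_var=equal_var, alternative='greater')
    return res.statistic, res.pvalue

for equal_var, label in ((True, 'Student'), (False, 'Welch')):
    t, p = one_sided_ttest(sumatera, others_shown, equal_var=equal_var)
    print(f'{label}: t = {t:.3f}, one-sided p = {p:.3f}')
```

On the notebook data the call would be `one_sided_ttest(sample_2, sample_1)`, with `sample_2` (Sumatra) first.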

In [20]:
# Two-sided independent-samples t-test; equal_var=True follows the Levene result
alpha = 0.05

results = st.ttest_ind(sample_1, sample_2, equal_var=True)

print('p-value: ', results.pvalue)

if results.pvalue < alpha:
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

p-value:  0.5485576567086605
We fail to reject the null hypothesis


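**Conclusion:**

With p = 0.549 > 0.05 we fail to reject H0: the test finds no statistically significant difference between the per-province deforestation totals of Sumatra and those of the other provinces. The hypothesis that deforestation in Sumatra is greater than in the rest of Indonesia is therefore **not supported** by this test. Failing to reject H0 does not prove the hypothesis true (or false); it only means the data show no significant difference.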
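The test above compares provinces, not years. A sketch of the annual framing stated in the hypothesis: total each `deforestasi_*` column within each group, then run a paired one-sided t-test across the periods (paired by period, since both groups share the same years). `yearly_totals` and `compare_yearly` are hypothetical helpers, `alternative` requires SciPy 1.6 or newer, and the toy frames hold made-up numbers, not values from the dataset. With only seven periods in the real data such a test has little power, so a non-significant result would again be inconclusive.

```python
import pandas as pd
from scipy import stats as st

def yearly_totals(df):
    """Sum every deforestasi_* column across provinces: one total per period."""
    return df.filter(like='deforestasi').sum(axis=0)

def compare_yearly(sumatera_df, others_df):
    """Paired one-sided t-test, H1: Sumatra's yearly total > the other provinces' yearly total."""
    a, b = yearly_totals(sumatera_df), yearly_totals(others_df)
    a, b = a.align(b, join='inner')  # keep only periods present in both frames
    return st.ttest_rel(a, b, alternative='greater')

# Hypothetical toy frames (made-up numbers, NOT from the dataset)
sum_toy = pd.DataFrame({'provinsi': ['A', 'B'],
                        'deforestasi_1314': [300.0, 200.0],
                        'deforestasi_1415': [400.0, 250.0],
                        'deforestasi_1516': [350.0, 150.0]})
oth_toy = pd.DataFrame({'provinsi': ['C', 'D'],
                        'deforestasi_1314': [100.0, 50.0],
                        'deforestasi_1415': [120.0, 80.0],
                        'deforestasi_1516': [90.0, 60.0]})
print(compare_yearly(sum_toy, oth_toy))
```

On the notebook data this would be `compare_yearly(sumatera_data_deforestasi, outside_sumatera_data_deforestasi)`; the `sum` column is ignored because its name does not contain `deforestasi`.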

## General Conclusions

#### A. Preprocessing

From the data preprocessing step:
1. The DataFrame contains forest deforestation figures for Indonesia by province and by period, 2013-2020.
2. Cells containing `-` were treated as missing and replaced with 0; duplicates were not checked explicitly.

#### B. Exploratory Data Analysis (EDA)

From the exploratory data analysis:
1. The Sumatran provinces were sliced into a new table containing only the total-deforestation columns.
2. A `sum` column with each province's total deforestation over 2013-2020 was added.
3. The same table was built for the provinces outside Sumatra.

#### C. Statistical Analysis (Hypothesis Testing)

From the hypothesis test:
1. The first hypothesis (deforestation across all provinces in Sumatra > deforestation across all provinces outside Sumatra) is not supported: the t-test (p = 0.549) does not reject H0, so no significant difference was found between the two groups. The test neither confirms nor refutes the hypothesis.