## Checking the suitability of the Gumbel distribution for daily wind data

In [None]:
import numpy as np
import pandas as pd
import scipy as sp

In [None]:
import folium

### First, a DataFrame called `rekap` is created to store a recap of the Anderson-Darling test results

In [None]:
rekap = pd.DataFrame(data = np.zeros((5,2)), columns = ['AD value','critical value'],
                     index = ['1 month','2 months','3 months','4 months','6 months'])
rekap

# Location of Evaluation Station 1

The first station evaluated is the AWS Mekarsari wind station in Depok, located at the following coordinates:

In [None]:
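# Stamen tiles are no longer built into recent folium versions (they moved to Stadia Maps),
# so the default OpenStreetMap tiles are used here
peta = folium.Map(location=[-6.41882,107.5], zoom_start=8.4, tiles='OpenStreetMap')
folium.Marker([-6.41882,106.9832],
              popup='<strong>AWS Mekarsari Station</strong>',
              tooltip='AWS Mekarsari Station').add_to(peta)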
peta

In [None]:
def marks(fmap, coord):
    """Add a small red circle marker at coord = [lat, lon] to the folium map `fmap`."""
    folium.CircleMarker(coord, radius=0.3, color='red').add_to(fmap)

marks(peta,[-6.4,107])
marks(peta,[-6.4,106.9])
marks(peta,[-6.4,107.1])
marks(peta,[-6.5,107])
marks(peta,[-6.5,107.1])
marks(peta,[-6.5,106.9])
marks(peta,[-6.3,107])
peta

In [None]:
folium.Marker([-7.368609,108.11317],
              popup='<strong>AWS Tasikmalaya Station</strong>',
              tooltip='AWS Tasikmalaya Station').add_to(peta)

In [None]:
marks(peta,[-7.3,108])
marks(peta,[-7.4,108])
marks(peta,[-7.3,108.1])
marks(peta,[-7.4,108.1])
marks(peta,[-7.5,108.1])
marks(peta,[-7.3,108.2])
marks(peta,[-7.4,108.2])
peta

### For presentation purposes, the following cells display an overview map with a 0.1° grid of points covering the study area

In [None]:
peta = folium.Map(location=[-7.25,107.0], zoom_start=8.4, tiles='OpenStreetMap')  # Stamen tiles no longer built in
peta

In [None]:
res = 0.1
longitude = 105.0
for i in range(40):
    latitude = -9.0
    for j in range(35):
        marks(peta,[latitude,longitude])
        latitude = latitude+res
    longitude = longitude+res
    
peta
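The nested loop above places one marker every 0.1° over 40 longitudes (105.0° to 108.9°) and 35 latitudes (-9.0° to -5.6°), i.e. 1400 points. A minimal NumPy sketch that builds the same coordinate list from `origin + index * res`, so no floating-point error accumulates from repeated additions; the helper name `grid_points` is hypothetical and the rounding to one decimal assumes a 0.1° spacing:

```python
import numpy as np

def grid_points(lon0=105.0, lat0=-9.0, res=0.1, n_lon=40, n_lat=35):
    """Return [lat, lon] pairs on a regular grid (hypothetical helper).

    Each coordinate is origin + index * res, rounded to one decimal
    (suitable for the 0.1 degree spacing used in this notebook).
    """
    lons = np.round(lon0 + res * np.arange(n_lon), 1)
    lats = np.round(lat0 + res * np.arange(n_lat), 1)
    # Outer loop over longitude, inner loop over latitude, as in the notebook cell
    return [[float(lat), float(lon)] for lon in lons for lat in lats]

points = grid_points()
print(len(points))
print(points[0], points[-1])
```

Each pair can then be passed to `marks(peta, point)` exactly as in the loop.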

Import the daily maximum wind data from a CSV file into a pandas DataFrame

In [None]:
df = pd.read_csv('data/STA2042.csv', header = None)  # forward slash avoids the invalid '\S' escape
df.columns = ['tgl1','tgl2','jam','windspeed']

Drop the days without measurements and the redundant columns. The output shows that 1325 rows and 2 columns remain.

In [None]:
df = df.dropna()
df = df.drop(columns = ['jam','tgl2'])
df.shape

In [None]:
df['tgl1'] = pd.to_datetime(df['tgl1'])  # parse dates; infer_datetime_format is deprecated in pandas 2.x
df = df.set_index('tgl1')

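Below, the maximum wind speed of each month is extracted from the DataFrame (`"MS"` is the pandas month-start frequency). Months with fewer than 15 valid daily records are discarded, and the maxima are sorted in ascending order.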

In [None]:
dfmaxperM = df.windspeed.resample("MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)
dfmaxperM_sort.head(15)
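The same resample-and-filter pattern is repeated below for 2-, 3-, 4-, and 6-month blocks. A minimal sketch of a reusable helper, tried on synthetic daily data rather than the station files; the name `block_maxima` and its `min_count` parameter are hypothetical, and 15 is simply the threshold used in this notebook:

```python
import numpy as np
import pandas as pd

def block_maxima(series, freq, min_count=15):
    """Maxima of `series` per resampling block, dropping blocks with too few records.

    series    : pandas Series with a DatetimeIndex (e.g. daily wind speed)
    freq      : pandas offset alias such as "MS", "2MS", "3MS"
    min_count : minimum number of non-missing values a block must contain
    """
    stats = series.resample(freq).agg(['max', 'count'])
    # Keep the maximum only where the block has enough valid observations
    maxima = stats['max'].where(stats['count'] >= min_count).dropna()
    return maxima.sort_values(ascending=True)

# Synthetic example: 1 Jan to 31 Mar 2020 (91 days), most of February missing
idx = pd.date_range('2020-01-01', periods=91, freq='D')
wind = pd.Series(np.arange(91, dtype=float), index=idx)
wind.loc['2020-02-05':'2020-02-29'] = np.nan   # February keeps only 4 valid days
res = block_maxima(wind, 'MS')
print(res)
```

February is dropped because it holds only 4 valid days, so only the January and March maxima remain.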

The Anderson-Darling (AD) test is performed with the function **scipy.stats.anderson** provided by SciPy.
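For reference, the Anderson-Darling statistic compares the sorted sample $x_{(1)} \le \dots \le x_{(n)}$ with the CDF $F$ of the Gumbel distribution fitted to that sample:

$$ A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right)\right] $$

where, for `dist='gumbel_r'`, $F(x) = \exp\left[-\exp\left(-(x-\mu)/\beta\right)\right]$ with the location $\mu$ and scale $\beta$ estimated from the sample. Larger values of $A^2$ indicate a poorer fit.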

**If the AD value > critical value, the data are concluded not to fit the Gumbel distribution.** The significance level is fixed at 5%.
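A minimal sketch of this decision rule on a synthetic sample drawn from a Gumbel distribution with arbitrary parameters (not station data); the helper name `gumbel_ad_decision` is hypothetical. `scipy.stats.anderson` returns the critical values together with their significance levels in percent, so looking up 5.0 avoids relying on a fixed position. The critical values for `gumbel_r` also depend on the sample size, which is why they differ slightly between intervals in the recap.

```python
import numpy as np
from scipy import stats

def gumbel_ad_decision(sample, alpha_percent=5.0):
    """AD test against gumbel_r; returns (statistic, critical value, accepted)."""
    result = stats.anderson(np.asarray(sample, dtype=float), dist='gumbel_r')
    # significance_level is given in percent, e.g. [25, 10, 5, 2.5, 1] for gumbel_r
    levels = list(result.significance_level)
    crit = result.critical_values[levels.index(alpha_percent)]
    # Reject the Gumbel hypothesis when the statistic exceeds the critical value
    return float(result.statistic), float(crit), bool(result.statistic <= crit)

rng = np.random.default_rng(0)
sample = stats.gumbel_r.rvs(loc=10.0, scale=2.0, size=50, random_state=rng)
stat, crit, accepted = gumbel_ad_decision(sample)
print(f"AD = {stat:.3f}, critical (5%) = {crit:.3f}, accepted = {accepted}")
```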

In [None]:
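import scipy.stats
def testAD(series):
    """Anderson-Darling test of `series` against the Gumbel (gumbel_r) distribution.

    Returns the AD statistic, the critical value at 5% significance,
    and the significance level itself (in percent).
    """
    sample = series.astype(np.float64).to_numpy()
    result = sp.stats.anderson(sample, dist='gumbel_r')
    # significance levels for gumbel_r are [25, 10, 5, 2.5, 1] %; pick the 5% entry
    i5 = list(result.significance_level).index(5.0)
    return result.statistic, result.critical_values[i5], result.significance_level[i5]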

a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[0,:] = [a,b]
rekap

### The data are plotted to visualize their distribution, using the maximum value of each 1-month period

### The plotting procedure is written as a function so that it can easily be reused in the following experiments

In [None]:
# Gumbel probability plot
import matplotlib.pyplot as plt

def letsplot(dfs, frequency):
    """Plot sorted maxima (first column of `dfs`) against the Gumbel reduced variate."""
    n = len(dfs)
    calib = pd.DataFrame({'rank': np.arange(1, n + 1)})
    # Gringorten plotting position: P = (i - 0.44) / (N + 0.12)
    calib['gringorten'] = (calib['rank'] - 0.44) / (n + 0.12)
    # Gumbel reduced variate: y = -ln(-ln P)
    calib['y'] = -np.log(-np.log(calib['gringorten']))

    # Observed maxima (must be sorted in ascending order) and least-squares line
    speeds = dfs.iloc[:, 0].to_numpy()
    m, b = np.polyfit(calib['y'], speeds, 1)

    plt.figure(figsize=(15, 5))
    plt.scatter(calib['y'], speeds, color='red', label='Observed')
    plt.plot(calib['y'], m * calib['y'] + b, label='Estimated')
    plt.title('Design wind speed, maximum data per ' + str(frequency))
    plt.xlabel('Reduced Variate')
    plt.ylabel('Design Wind Speed (m/s)')
    plt.legend(fontsize='x-large')
    plt.show()

letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'1 month')

**As the graph above shows, for 1-month maxima the data points deviate from the regression line, which represents the *best fit* of the Gumbel distribution.**
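The regression line in `letsplot` has the form $x = m\,y + b$. Since the Gumbel quantile function can be written $x = \mu + \beta\,y$ with $y = -\ln(-\ln P)$, the slope $m$ and intercept $b$ act as graphical estimates of the scale $\beta$ and location $\mu$. A noise-free sanity check, using made-up values $\mu = 20$ and $\beta = 3$ (illustrative only, not station results), confirms that points lying exactly on a Gumbel line give back those parameters:

```python
import numpy as np

# Illustrative (made-up) Gumbel parameters, not fitted to any station data
mu_true, beta_true = 20.0, 3.0

n = 30
rank = np.arange(1, n + 1)
p = (rank - 0.44) / (n + 0.12)      # Gringorten plotting positions
y = -np.log(-np.log(p))             # Gumbel reduced variate
x = mu_true + beta_true * y         # points lying exactly on the Gumbel line

m, b = np.polyfit(y, x, 1)          # same least-squares call as in letsplot
print(f"slope (scale estimate) = {m:.3f}, intercept (location estimate) = {b:.3f}")
```

With real maxima the points scatter around the line, as in the plot above, so $m$ and $b$ are only approximate; `scipy.stats.gumbel_r.fit` would give maximum-likelihood estimates instead.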

Next, the maxima are taken over 2-month blocks (`"2MS"` = every 2 months, starting at the beginning of a month).
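With a multiple such as `"2MS"`, pandas splits the record into consecutive 2-month blocks, each labelled by its first day. A minimal check on synthetic data (one value per day for the first half of 2021, not station data) that counts how many days fall in each block:

```python
import numpy as np
import pandas as pd

# One synthetic value per day from 1 Jan to 30 Jun 2021
days = pd.date_range('2021-01-01', '2021-06-30', freq='D')
speed = pd.Series(np.ones(len(days)), index=days)

# Number of daily records in each 2-month block (labelled by block start)
blocks = speed.resample('2MS').count()
print(blocks)
```

Note that the notebook applies the same 15-valid-day threshold whatever the block length, so a 2-month block is kept as soon as it holds 15 valid days.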

In [None]:
dfmaxperM = df.windspeed.resample("2MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

In [None]:
a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[1,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'2 months')

The 2-month maxima also deviate strongly from the *best fit* Gumbel regression line, and the AD value again exceeds the critical value.

Next, the maxima are taken over 3-month blocks.

In [None]:
dfmaxperM = df.windspeed.resample("3MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[2,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'3 months')

For the 3-month maxima, the Gumbel distribution is **accepted**, although only narrowly: the AD value (0.716) is just below the critical value (0.724).

In [None]:
dfmaxperM = df.windspeed.resample("4MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[3,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'4 months')

For the 4-month maxima, the Gumbel distribution is again **not acceptable**.

In [None]:
dfmaxperM = df.windspeed.resample("6MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[4,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'6 months')

For the 6-month maxima, the Gumbel distribution is also **not accepted**.

In [None]:
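x = range(5)
y = np.array(rekap.iloc[:,0])   # AD statistics
z = np.array(rekap.iloc[:,1])   # critical values at 5% significance
labels = ['1 month','2 months','3 months','4 months', '6 months']
top = max(y.max(), z.max()) * 1.1  # upper edge of the shaded "rejected" band
plt.figure(figsize=(15,5))
plt.xticks(x, labels)
plt.plot(x, y, label='AD Value')
plt.plot(x, z, label='Critical Limit Value')
plt.legend()
plt.title('Recap of AD test results for different data intervals at AWS Mekarsari',
          fontsize = 15)
# Green: below the critical value (Gumbel accepted); red: above it (rejected)
plt.fill_between(x, 0, z, color = 'green', alpha = 0.2)
plt.fill_between(x, z, top, color = 'red', alpha = 0.2)
plt.show()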

## Conclusion

### *Overall*, the data at AWS Mekarsari station can be considered Gumbel-distributed only when the maxima are taken every 3 months

### Next, check the daily data for AWS Tasikmalaya Station, which is located at the following coordinates:

In [None]:
peta = folium.Map(location=[-7.368609,108.11317],zoom_start=10)
folium.Marker([-7.368609,108.11317],
              popup='<strong>AWS Tasikmalaya Station</strong>',
              tooltip='AWS Tasikmalaya Station').add_to(peta)
#peta

In [None]:
marks(peta,[-7.3,108])
marks(peta,[-7.4,108])
marks(peta,[-7.3,108.1])
marks(peta,[-7.4,108.1])
marks(peta,[-7.5,108.1])
marks(peta,[-7.3,108.2])
marks(peta,[-7.4,108.2])
#peta

In [None]:
rekap = pd.DataFrame(data = np.zeros((5,2)), columns = ['AD value','critical value'],
                     index = ['1 month','2 months','3 months','4 months','6 months'])
rekap

In [None]:
def marks(fmap, coord):
    """Add a flag marker at coord = [lat, lon] to the folium map `fmap`."""
    folium.Marker(coord, icon=folium.Icon(icon='flag')).add_to(fmap)

marks(peta,[-6.4,107])
marks(peta,[-6.4,106.9])
marks(peta,[-6.4,107.1])
marks(peta,[-6.5,107])
marks(peta,[-6.5,107.1])
marks(peta,[-6.5,106.9])
marks(peta,[-6.3,107])
peta

In [None]:
df = pd.read_csv('data/STA2086.csv', header = None)  # forward slash avoids the invalid '\S' escape
df.columns = ['tgl1','tgl2','jam','windspeed']
df.shape

In [None]:
df = df.dropna()
df = df.drop(columns = ['jam','tgl2'])
df.shape

df['tgl1'] = pd.to_datetime(df['tgl1'])  # parse dates; infer_datetime_format is deprecated in pandas 2.x
df = df.set_index('tgl1')

dfmaxperM = df.windspeed.resample("MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

**The same steps are carried out for the AWS Tasikmalaya station, checking the maxima taken over 1, 2, 3, 4, and 6 months.**
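Because the same three cells (block maxima, AD test, plot) are repeated for every interval, the recap table could also be filled in a loop. A minimal sketch on synthetic daily data; the function name `ad_recap` is hypothetical, and the random Weibull-shaped series only stands in for a station file:

```python
import numpy as np
import pandas as pd
from scipy import stats

def ad_recap(daily, freqs=('MS', '2MS', '3MS', '4MS', '6MS'), min_count=15):
    """AD statistic and 5% critical value of the block maxima for several block lengths."""
    rows = {}
    for freq in freqs:
        agg = daily.resample(freq).agg(['max', 'count'])
        # Drop blocks with fewer than `min_count` valid daily values
        maxima = agg['max'].where(agg['count'] >= min_count).dropna()
        res = stats.anderson(maxima.to_numpy(dtype=float), dist='gumbel_r')
        i5 = list(res.significance_level).index(5.0)   # 5% significance level
        rows[freq] = [res.statistic, res.critical_values[i5]]
    return pd.DataFrame.from_dict(rows, orient='index',
                                  columns=['AD value', 'critical value'])

# Synthetic stand-in for a station record: about 8 years of daily wind speed
rng = np.random.default_rng(1)
days = pd.date_range('2010-01-01', periods=8 * 365, freq='D')
daily = pd.Series(rng.weibull(2.0, len(days)) * 5.0, index=days)
table = ad_recap(daily)
print(table)
```

Each row of `table` then corresponds to one row of `rekap`.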

In [None]:
a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[0,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'1 month')

In [None]:
dfmaxperM = df.windspeed.resample("2MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

In [None]:
a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[1,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'2 months')

In [None]:
dfmaxperM = df.windspeed.resample("3MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

In [None]:
a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[2,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'3 months')

In [None]:
dfmaxperM = df.windspeed.resample("4MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

In [None]:
a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[3,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'4 months')

In [None]:
dfmaxperM = df.windspeed.resample("6MS").agg(['max','count'])
dfinvalid = dfmaxperM['count'] < 15
dfmaxperM = dfmaxperM['max']
dfmaxperM[dfinvalid] = np.nan
dfmaxperM = dfmaxperM.dropna()
dfmaxperM_sort = dfmaxperM.sort_values(ascending = True)

In [None]:
a,b,c = testAD(dfmaxperM_sort)
rekap.iloc[4,:] = [a,b]
rekap

In [None]:
letsplot(dfmaxperM_sort.to_frame().astype(np.float64),'6 months')

In [None]:
x = range(5)
y = np.array(rekap.iloc[:,0])   # AD statistics
z = np.array(rekap.iloc[:,1])   # critical values at 5% significance
labels = ['1 month','2 months','3 months','4 months', '6 months']
top = max(y.max(), z.max()) * 1.1  # upper edge of the shaded "rejected" band
plt.figure(figsize=(15,5))
plt.xticks(x, labels)
plt.plot(x, y, label='AD Value')
plt.plot(x, z, label='Critical Limit Value')
plt.legend()
plt.title('Recap of AD test results for different data intervals, Tasikmalaya station', fontsize = 15)
# Green: below the critical value (Gumbel accepted); red: above it (rejected)
plt.fill_between(x, 0, z, color = 'green', alpha = 0.2)
plt.fill_between(x, z, top, color = 'red', alpha = 0.2)
plt.show()

## For the AWS Tasikmalaya station, the maxima taken over 1, 2, 3, 4, and 6 months are all accepted as Gumbel-distributed

# Conclusion

## For AWS Mekarsari station:
### Maximum wind data taken every 1 month:
$$ AD_{\text{value}} = 2.966, \quad AD_{\text{critical}} = 0.737 $$
Conclusion: the 1-month maxima are **not acceptable** as a Gumbel distribution.

### Maximum wind data taken every 2 months:
$$ AD_{\text{value}} = 1.851, \quad AD_{\text{critical}} = 0.729 $$
Conclusion: the 2-month maxima are **not acceptable** as a Gumbel distribution.

### Maximum wind data taken every 3 months:
$$ AD_{\text{value}} = 0.716, \quad AD_{\text{critical}} = 0.724 $$
Conclusion: the 3-month maxima are **acceptable** as a Gumbel distribution.

### Maximum wind data taken every 4 months:
$$ AD_{\text{value}} = 0.879, \quad AD_{\text{critical}} = 0.717 $$
Conclusion: the 4-month maxima are **not acceptable** as a Gumbel distribution.

### Maximum wind data taken every 6 months:
$$ AD_{\text{value}} = 1.091, \quad AD_{\text{critical}} = 0.71 $$
Conclusion: the 6-month maxima are **not acceptable** as a Gumbel distribution.


## For AWS Tasikmalaya station:
### Maximum wind data taken every 1 month:
$$ AD_{\text{value}} = 0.395, \quad AD_{\text{critical}} = 0.737 $$
Conclusion: the 1-month maxima are **acceptable** as a Gumbel distribution.

### Maximum wind data taken every 2 months:
$$ AD_{\text{value}} = 0.276, \quad AD_{\text{critical}} = 0.729 $$
Conclusion: the 2-month maxima are **acceptable** as a Gumbel distribution.

### Maximum wind data taken every 3 months:
$$ AD_{\text{value}} = 0.417, \quad AD_{\text{critical}} = 0.724 $$
Conclusion: the 3-month maxima are **acceptable** as a Gumbel distribution.

### Maximum wind data taken every 4 months:
$$ AD_{\text{value}} = 0.498, \quad AD_{\text{critical}} = 0.717 $$
Conclusion: the 4-month maxima are **acceptable** as a Gumbel distribution.

### Maximum wind data taken every 6 months:
$$ AD_{\text{value}} = 0.685, \quad AD_{\text{critical}} = 0.71 $$
Conclusion: the 6-month maxima are **acceptable** as a Gumbel distribution.
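The recap values reported above can also be gathered in one DataFrame and checked against the decision rule (AD value > critical value means the Gumbel distribution is rejected). A minimal sketch; the numbers are copied from this conclusion and nothing is recomputed:

```python
import pandas as pd

intervals = ['1 month', '2 months', '3 months', '4 months', '6 months']
recap = pd.DataFrame({
    'station': ['Mekarsari'] * 5 + ['Tasikmalaya'] * 5,
    'interval': intervals * 2,
    'AD value': [2.966, 1.851, 0.716, 0.879, 1.091,
                 0.395, 0.276, 0.417, 0.498, 0.685],
    'critical value': [0.737, 0.729, 0.724, 0.717, 0.71] * 2,
})
# Decision rule used throughout the notebook (5% significance level)
recap['accepted'] = recap['AD value'] <= recap['critical value']
print(recap.pivot(index='interval', columns='station', values='accepted'))
```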