Based on issue [#60](https://github.com/taruma/hidrokit/issues/60): __Determining criteria based on monthly total values__

Problem description:
- I have a __daily dataset__ of values a, b, and c spanning 50 years.
- I want to categorize the values in column a __based on__ their monthly totals.
- The resulting category is stored in a new column named 'kategori'.

In [0]:
# import library
import pandas as pd
import numpy as np

In [0]:
# build a random dataset
SEED = 110891
np.random.seed(SEED)
date_index = pd.date_range('20000101', '20501231')
dataset = pd.DataFrame(
    index=date_index,
    data=np.random.rand(len(date_index), 3)*10,
    columns='a b c'.split()
)
dataset.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 18628 entries, 2000-01-01 to 2050-12-31
Freq: D
Data columns (total 3 columns):
a    18628 non-null float64
b    18628 non-null float64
c    18628 non-null float64
dtypes: float64(3)
memory usage: 582.1 KB


In [0]:
dataset.head()

date,a,b,c
2000-01-01,2.877766,3.213378,2.609509
2000-01-02,6.087079,0.528814,2.18141
2000-01-03,7.788553,8.597564,3.401175
2000-01-04,7.952444,3.182324,1.6006
2000-01-05,1.721316,7.618591,5.018759


In [0]:
# compute the total of each column for every month
total_summary = dataset.groupby([dataset.index.year, dataset.index.month]).sum()
total_summary

year,month,a,b,c
2000,1,162.847727,156.897045,144.812644
2000,2,146.883002,163.938908,125.541164
2000,3,149.917733,162.343038,153.770483
2000,4,151.332312,146.398578,131.572011
2000,5,148.824870,159.312776,178.204217
2000,6,142.645200,141.804644,162.177388
2000,7,170.140213,130.409430,191.273702
2000,8,170.068540,159.861622,179.001989
2000,9,178.699585,157.825042,144.558428
2000,10,145.530305,191.522354,178.904414
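
The same monthly totals can also be obtained with `resample`, which keeps a single datetime index instead of a (year, month) MultiIndex. The following is a minimal sketch under stated assumptions: the small frame `demo` and the names `by_group` and `by_resample` are illustrative and are not part of the notebook's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical small dataset: two months of daily values (not the notebook's data)
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", "2000-02-29")
demo = pd.DataFrame(rng.random((len(idx), 3)) * 10, index=idx, columns=list("abc"))

# Approach used in the notebook: group by (year, month)
by_group = demo.groupby([demo.index.year, demo.index.month]).sum()

# Alternative: resample to month-start frequency ("MS") and sum
by_resample = demo.resample("MS").sum()

# Both give the same totals; only the index differs
print(np.allclose(by_group.to_numpy(), by_resample.to_numpy()))
print(by_resample)
```

The MultiIndex form is convenient for lookups such as `.loc[(year, month), column]`, while the resampled form is easier to plot or join against other time series.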


In [0]:
# select the value for a specific year and month in a given column
TAHUN, BULAN, KOLOM = 2000, 1, 'a'
total_summary.loc[(TAHUN, BULAN), KOLOM]

162.84772662835172

In [0]:
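# look up the monthly total for each daily row, then map the total to a category

def total_each_month(x, summary, column='a'):
    """Return the monthly total in `summary` for the month that row `x` falls in."""
    # after reset_index(), the unnamed datetime index becomes the 'index' column
    year = x['index'].year
    month = x['index'].month
    return summary.loc[(year, month), column]

def criteria(value):
    """Map a monthly total to 'rendah' (low), 'sedang' (medium), or 'tinggi' (high)."""
    if value < 130:
        return "rendah"
    elif value <= 160:
        return "sedang"
    else:
        return "tinggi"

KOLOM = 'a'

# attach each row's monthly total, then categorize it
dataset['total_a'] = dataset.reset_index().apply(
    lambda x: total_each_month(x, summary=total_summary, column=KOLOM), axis=1
).values
dataset['kategori'] = dataset['total_a'].apply(criteria)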
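
Row-wise `apply` with one `.loc` lookup per row may become slow on long daily records. As a hedged alternative, the sketch below broadcasts the monthly total with `groupby(...).transform("sum")` and assigns categories with `np.select`, using the same thresholds (below 130, 130 to 160 inclusive, above 160). The frame `demo` and its constant daily values are assumptions chosen so the totals are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with constant daily values per month (not the notebook's data)
idx = pd.date_range("2001-01-01", "2001-03-31")
demo = pd.DataFrame({"a": 5.0}, index=idx)      # January total: 31 * 5 = 155
demo.loc[demo.index.month == 2, "a"] = 4.0      # February total: 28 * 4 = 112
demo.loc[demo.index.month == 3, "a"] = 6.0      # March total: 31 * 6 = 186

# Broadcast each month's total back onto every daily row of that month
keys = [demo.index.year, demo.index.month]
demo["total_a"] = demo.groupby(keys)["a"].transform("sum")

# Conditions are checked in order, so the second one only applies when total >= 130
total = demo["total_a"]
demo["kategori"] = np.select(
    [total < 130, total <= 160],
    ["rendah", "sedang"],
    default="tinggi",
)

# One row per month to inspect the result
print(demo.groupby(keys)[["total_a", "kategori"]].first())
```

`np.select` is used here instead of `pd.cut` because the middle class is closed on both ends (130 and 160 both count as "sedang"), which a single `right=True` or `right=False` binning does not express directly.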

In [0]:
dataset.head()

date,a,b,c,total_a,kategori
2000-01-01,2.877766,3.213378,2.609509,162.847727,tinggi
2000-01-02,6.087079,0.528814,2.18141,162.847727,tinggi
2000-01-03,7.788553,8.597564,3.401175,162.847727,tinggi
2000-01-04,7.952444,3.182324,1.6006,162.847727,tinggi
2000-01-05,1.721316,7.618591,5.018759,162.847727,tinggi


In [0]:
dataset.sample(n=10, random_state=SEED)

date,a,b,c,total_a,kategori
2048-11-11,1.223814,0.436503,5.906254,146.398529,sedang
2006-04-05,7.494172,3.334511,1.98266,161.010925,tinggi
2038-06-19,1.737448,1.266665,2.950612,130.366532,sedang
2022-06-14,7.999164,1.079437,4.436997,171.157052,tinggi
2012-10-24,9.006064,1.942224,0.583746,160.42403,tinggi
2007-06-10,0.790671,6.943135,1.963164,139.102825,sedang
2008-03-31,7.362242,3.070636,9.000031,143.855427,sedang
2004-08-20,6.696845,2.676529,3.607975,179.57375,tinggi
2014-12-28,1.035354,8.811785,0.589958,143.172699,sedang
2044-07-11,3.686524,8.440504,0.009692,151.676157,sedang
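
Because every day in a month shares the same category, it can be easier to inspect the result at month level. The sketch below applies the same thresholds directly to the monthly totals and counts the months in each category. The frame `demo` (a constant 5.25 per day over 2001) is an assumption made for illustration, and `criteria` is restated here only to keep the example self-contained.

```python
import pandas as pd

def criteria(value):
    # Same thresholds as the notebook's criteria()
    if value < 130:
        return "rendah"
    elif value <= 160:
        return "sedang"
    return "tinggi"

# Hypothetical daily data: 5.25 per day for all of 2001 (not the notebook's dataset)
idx = pd.date_range("2001-01-01", "2001-12-31")
demo = pd.DataFrame({"a": 5.25}, index=idx)

# One total per (year, month), then one category per month
monthly_total = demo.groupby([demo.index.year, demo.index.month])["a"].sum()
monthly_category = monthly_total.apply(criteria)

# Number of months that fall in each category
counts = monthly_category.value_counts()
print(counts)
```

With these assumed values, 31-day months total 162.75, 30-day months total 157.5, and February totals 147, so only the 31-day months exceed the upper threshold of 160.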


# Changelog

```
- 20190717 - 1.0.0 - Initial
```

#### Copyright &copy; 2019 [Taruma Sakti Megariansyah](https://taruma.github.io)

Source code in this notebook is licensed under an [MIT License](https://choosealicense.com/licenses/mit/). Data in this notebook is licensed under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license.