# Data & Algorithm Understanding

## Data Understanding

### 📊 Dataset Name
liputan6_data.tar.gz

### 🌍 Languages
- Indonesian

### 🧩 Data Structure
|Column Name|Data Type|
|----|--------|
|`id`|`string`|
|`url`|`string`|
|`clean_article`|`string`|
|`clean_summary`|`string`|
|`extractive_summary`|`string`|

### Data Instances
|Column Name|Example Value|
|----------|-----------|
|`id`|26408|
|`url`|https://www.liputan6.com/news/read/26408/pbb-siap-membantu-penyelesaian-konflik-ambon|
|`clean_article`|Liputan6.com, Ambon: Partai Bulan Bintang wilayah Maluku bertekad membantu pemerintah menyelesaikan konflik di provinsi tersebut. Syaratnya, penanganan penyelesaian konflik Maluku harus dimulai dari awal kerusuhan, yakni 19 Januari 1999. Demikian hasil Musyawarah Wilayah I PBB Maluku yang dimulai Sabtu pekan silam dan berakhir Senin (31/12) di Ambon. Menurut seorang fungsionaris PBB Ridwan Hasan, persoalan di Maluku bisa selesai asalkan pemerintah dan aparat keamanan serius menangani setiap persoalan di Maluku secara komprehensif dan bijaksana. Itulah sebabnya, PBB wilayah Maluku akan menjadikan penyelesaian konflik sebagai agenda utama partai. PBB Maluku juga akan mendukung penegakan hukum secara terpadu dan tanpa pandang bulu. Siapa saja yang melanggar hukum harus ditindak. Ridwan berharap, Ketua PBB Maluku yang baru, Ali Fauzi, dapat menindak lanjuti agenda politik partai yang telah diamanatkan dan mau mendukung penegakan hukum di Maluku. (ULF/Sahlan Heluth).|
|`clean_summary`|Konflik Ambon telah berlangsung selama tiga tahun. Partai Bulan Bintang wilayah Maluku siap membantu pemerintah menyelesaikan kasus di provinsi tersebut.|
|`extractive_summary`|Liputan6.com, Ambon: Partai Bulan Bintang wilayah Maluku bertekad membantu pemerintah menyelesaikan konflik di provinsi tersebut. Siapa saja yang melanggar hukum harus ditindak.|

### Data Fields
|Column Name|Description|
|----------|----------|
|`id`|Unique identifier of the article|
|`url`|URL of the article|
|`clean_article`|Original article text|
|`clean_summary`|Abstractive summary|
|`extractive_summary`|Extractive summary|

## Algorithm Understanding

# Model Training & Evaluation

### Load the Dataset and Convert It to CSV


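```python
import glob

import pandas as pd

# Collect every JSON file of the training split (sorted for a stable row order).
file_list = sorted(glob.glob('../data/liputan6_data/canonical/train/*.json'))
# Each file is read as JSON Lines; all files are stacked into one DataFrame.
df_list = [pd.read_json(f, lines=True) for f in file_list]
df_train = pd.concat(df_list, ignore_index=True)
df_train.head()
```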

| |id|url|clean_article|clean_summary|extractive_summary|
|---|---|---|---|---|---|
|0|100000|https://www.liputan6.com/news/read/100000/yudh...|[[Liputan6, ., com, ,, Jakarta, :, Presiden, S...|[[Menurut, Presiden, Susilo, Bambang, Yudhoyon...|[0, 1]|
|1|100002|https://www.liputan6.com/news/read/100002/jepa...|[[Liputan6, ., com, ,, Jakarta, :, Perdana, Me...|[[Pada, masa, silam, Jepang, terlalu, ambisius...|[2, 3]|
|2|100003|https://www.liputan6.com/news/read/100003/pulu...|[[Liputan6, ., com, ,, Kutai, :, Banjir, denga...|[[Puluhan, hektare, areal, persawahan, yang, s...|[1, 5]|
|3|100004|https://www.liputan6.com/news/read/100004/pres...|[[Liputan6, ., com, ,, Jakarta, :, Presiden, S...|[[Sekjen, PBB, Kofi, Annan, memuji, langkah, P...|[2, 5]|
|4|100005|https://www.liputan6.com/news/read/100005/warg...|[[Liputan6, ., com, ,, Solok, :, Warga, Kampun...|[[Untuk, mempercepat, pelaksanaan, belajar-men...|[0, 2]|
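
The preview above suggests that the canonical files store `clean_article` and `clean_summary` as lists of tokenized sentences, and `extractive_summary` as a list of sentence indices into `clean_article`. The following is a minimal sketch of turning one such record back into plain text, assuming that layout holds for every row. The helper names `join_tokens` and `extract_sentences` and the toy record are hypothetical, and the naive space join does not restore the original spacing around punctuation.

```python
from typing import List


def join_tokens(sentences: List[List[str]]) -> str:
    """Join tokenized sentences into one string separated by single spaces."""
    return " ".join(" ".join(tokens) for tokens in sentences)


def extract_sentences(article: List[List[str]], indices: List[int]) -> str:
    """Build the extractive summary text from sentence indices."""
    return join_tokens([article[i] for i in indices])


# Toy record shaped like the rows in the preview (not real dataset content).
article = [
    ["Liputan6", ".", "com", ",", "Jakarta", ":", "Contoh", "kalimat", "pertama", "."],
    ["Kalimat", "kedua", "."],
    ["Kalimat", "ketiga", "."],
]
extractive_summary = [0, 2]

article_text = join_tokens(article)
summary_text = extract_sentences(article, extractive_summary)
print(summary_text)
```

On the real data, the same helpers could be applied per row with `df_train['clean_article'].apply(join_tokens)`.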


```python
# Persist the combined training split as a single CSV file.
df_train.to_csv('../data/train_data.csv', index=False)
```
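
One caveat with this CSV export: nested Python lists are written as their string representation, so reading the file back yields plain strings rather than lists. The following is a minimal sketch of the round trip on a toy frame, assuming the list columns contain only Python literals. It uses an in-memory buffer in place of `../data/train_data.csv`, and `ast.literal_eval` is one possible way to parse the values back, not necessarily what the rest of this project does.

```python
import ast
import io

import pandas as pd

# Toy frame with the same column layout as the preview (values are made up).
df = pd.DataFrame({
    "id": [1],
    "clean_summary": [[["Contoh", "ringkasan", "."]]],
    "extractive_summary": [[0, 2]],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# After the round trip the list columns come back as strings.
df_back = pd.read_csv(buffer)
assert isinstance(df_back.loc[0, "extractive_summary"], str)

# Parse the string representation back into Python lists.
for col in ["clean_summary", "extractive_summary"]:
    df_back[col] = df_back[col].apply(ast.literal_eval)

restored = df_back.loc[0, "extractive_summary"]
print(restored)
```

If the list structure matters downstream, a format that preserves nesting (for example JSON Lines via `DataFrame.to_json(..., orient='records', lines=True)`) avoids this parsing step.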


```python
# Load the development (validation) split the same way as the training split.
file_list = sorted(glob.glob('../data/liputan6_data/canonical/dev/*.json'))
df_list = [pd.read_json(f, lines=True) for f in file_list]
df_dev = pd.concat(df_list, ignore_index=True)
df_dev.head()
```

| |id|url|clean_article|clean_summary|extractive_summary|
|---|---|---|---|---|---|
|0|1|https://www.liputan6.com/news/read/1/batas-pen...|[[Liputan6, ., com, ,, Jakarta, :, Pemerintah,...|[[Pemerintah, memberikan, tenggat, 14, hari, k...|[1, 8]|
|1|10|https://www.liputan6.com/news/read/10/belasan-...|[[Liputan6, ., com, ,, Jakarta, :, Diperkiraka...|[[Satu, dari, 20, orang, Indonesia, diperkirak...|[2, 4]|
|2|1000|https://www.liputan6.com/news/read/1000/lagi--...|[[Liputan6, ., com, ,, Banda, Aceh, :, Aksi, p...|[[Peledakan, bom, kembali, terjadi, di, Aceh, ...|[2, 5]|
|3|10000|https://www.liputan6.com/news/read/10000/penge...|[[Liputan6, ., com, ,, Surabaya, :, Petugas, K...|[[Polres, Surabaya, Timur, menangkap, seorang,...|[0, 5]|
|4|10001|https://www.liputan6.com/news/read/10001/menye...|[[Liputan6, ., com, ,, Jakarta, :, Yogyakarta,...|[[Lima, seniman, Yogyakarta, dan, Bali, mengge...|[0, 2]|
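
Since the same glob-and-concatenate pattern is used for every split, it can be wrapped in a small function. The following is a sketch, assuming each split lives in `<root>/<split>/*.json` with one JSON record per line (which is what `lines=True` expects). The function name `load_split` and its parameters are hypothetical and not part of the original notebook.

```python
import glob
import os

import pandas as pd


def load_split(root: str, split: str) -> pd.DataFrame:
    """Read every JSON file of one split into a single DataFrame."""
    pattern = os.path.join(root, split, "*.json")
    files = sorted(glob.glob(pattern))  # sorted for a reproducible row order
    if not files:
        # Fail early instead of letting pd.concat raise on an empty list.
        raise FileNotFoundError(f"no JSON files match {pattern}")
    frames = [pd.read_json(path, lines=True) for path in files]
    return pd.concat(frames, ignore_index=True)
```

With this helper, the dev split would be loaded as `df_dev = load_split('../data/liputan6_data/canonical', 'dev')`.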


```python
# Persist the combined development split as a single CSV file.
df_dev.to_csv('../data/dev_data.csv', index=False)
```


```python
# Load the test split the same way as the training and development splits.
file_list = sorted(glob.glob('../data/liputan6_data/canonical/test/*.json'))
df_list = [pd.read_json(f, lines=True) for f in file_list]
df_test = pd.concat(df_list, ignore_index=True)
df_test.head()
```

| |id|url|clean_article|clean_summary|extractive_summary|
|---|---|---|---|---|---|
|0|13019|https://www.liputan6.com/news/read/13019/kapol...|[[Liputan6, ., com, ,, Jakarta, :, Kepolisian,...|[[Kapolda, Riau, baru, Brigjen, Pol, .], [John...|[0, 4, 9]|
|1|13020|https://www.liputan6.com/news/read/13020/bi-di...|[[Liputan6, ., com, ,, Jakarta, :, Bank, Indon...|[[Kendati, Bank, Sentral, AS, menurunkan, suku...|[0, 4]|
|2|13022|https://www.liputan6.com/news/read/13022/pemer...|[[Liputan6, ., com, ,, Jakarta, :, Berbagai, k...|[[Pemerintah, bermaksud, akan, lebih, menganda...|[0, 7]|
|3|13024|https://www.liputan6.com/news/read/13024/perub...|[[Liputan6, ., com, ,, Jakarta, :, Penghapusan...|[[Revisi, Kepmennaker, Nomor, 78, Tahun, 2001,...|[0, 8]|
|4|13025|https://www.liputan6.com/news/read/13025/puluh...|[[Liputan6, ., com, ,, Jakarta, :, Operasi, Sa...|[[Polisi, menangkap, 32, pengunjung, Diskotik,...|[2, 3]|


```python
# Persist the combined test split as a single CSV file.
df_test.to_csv('../data/test_data.csv', index=False)
```
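
Before training, it can be worth confirming that the three splits do not share articles. The following is a minimal sketch, assuming `id` uniquely identifies an article as stated in the Data Fields table. The function name `overlapping_ids` and the toy frames are hypothetical stand-ins for `df_train`, `df_dev`, and `df_test`.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

import pandas as pd


def overlapping_ids(splits: Dict[str, pd.DataFrame]) -> Dict[Tuple[str, str], Set[int]]:
    """Return the ids shared by each pair of splits (an empty set means no overlap)."""
    ids = {name: set(frame["id"].tolist()) for name, frame in splits.items()}
    return {(a, b): ids[a] & ids[b] for a, b in combinations(ids, 2)}


# Toy frames standing in for df_train, df_dev and df_test (ids are made up
# so that one pair overlaps on purpose).
toy = {
    "train": pd.DataFrame({"id": [100000, 100002]}),
    "dev": pd.DataFrame({"id": [1, 10]}),
    "test": pd.DataFrame({"id": [13019, 10]}),
}
overlaps = overlapping_ids(toy)
for pair, shared in overlaps.items():
    print(pair, sorted(shared))
```

On the real data the call would be `overlapping_ids({'train': df_train, 'dev': df_dev, 'test': df_test})`, and every returned set is expected to be empty if the splits are disjoint.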
