# Normalization / Preprocessing

Normalization, or preprocessing, is applied to the raw text so that the later steps work with good-quality data.

The Sastrawi library is used because it is a natural language processing (NLP) library for Indonesian. Its main purpose is to support tasks such as stemming (reducing an affixed word to its root form) and stopword removal for Indonesian text. In this notebook Sastrawi handles the stemming step, while tokenization and stopword removal are done with NLTK.

The code below performs the following steps:


*   Data cleaning (checking for and removing null values)
*   Tokenizing
*   Stopword removal
*   Stemming
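As a rough illustration of how the first three steps fit together on a single sentence, here is a minimal sketch that uses only the standard library. The names `STOPWORDS` and `preprocess` are hypothetical, the stopword list is a tiny stand-in for NLTK's Indonesian list, whitespace splitting stands in for `word_tokenize`, and stemming is left out because it needs Sastrawi.

```python
import re

# Hypothetical mini stopword list; the notebook uses NLTK's Indonesian list instead
STOPWORDS = {"yang", "di", "dan", "adalah", "sebuah"}

def preprocess(text):
    """Clean, tokenize, and remove stopwords from one document."""
    # Cleaning: drop every character that is not a letter or whitespace
    cleaned = re.sub(r"[^a-zA-Z\s]", "", text).strip()
    # Tokenizing: lowercase, then split on whitespace (a simplification)
    tokens = cleaned.lower().split()
    # Stopword removal: keep only tokens that are not in the stopword set
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("Web server adalah sebuah perangkat lunak (software).")
print(tokens)  # ['web', 'server', 'perangkat', 'lunak', 'software']
```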



Install the Sastrawi library

In [None]:
!pip install Sastrawi

Collecting Sastrawi
  Downloading Sastrawi-1.0.1-py2.py3-none-any.whl (209 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.7/209.7 kB 3.9 MB/s eta 0:00:00
Installing collected packages: Sastrawi
Successfully installed Sastrawi-1.0.1


In [None]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

import warnings
import pandas as pd
import numpy as np
import nltk
import re
import csv

nltk.download('stopwords')
nltk.download('punkt')
warnings.filterwarnings('ignore')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
df = pd.read_csv('/content/drive/MyDrive/ppw/data_label_crawling.csv')
df


Unnamed: 0,Judul,Penulis,Dosen Pembimbing I,Dosen Pembimbing II,Abstrak,Label
0,PERANCANGAN DAN IMPLEMENTASI SISTEM DATABASE ...,A.Ubaidillah S.Kom,Budi Setyono M.T,Hermawan S.T,Sistem informasi akademik (SIAKAD) merupaka...,RPL
1,APLIKASI KONTROL DAN MONITORING JARINGAN KOMPU...,"M. Basith Ardianto,","Drs. Budi Soesilo, MT","Koko Joni, ST",Berjalannya koneksi jaringan komputer dengan l...,RPL
2,RANCANG BANGUN APLIKASI PROXY SERVER UNTUK ENK...,"Akhmad Suyandi, S.Kom","Drs. Budi Soesilo, M.T","Hermawan, ST, MT",Web server adalah sebuah perangkat lunak serve...,RPL
3,SISTEM PENDUKUNG KEPUTUSAN OPTIMASI PENJADWALA...,Heri Supriyanto,"Mulaab, S.Si., M.Kom","Firli Irhamni, ST., M.Kom",Penjadwalan kuliah di Perguruan Tinggi me...,KK
4,SISTEM AUGMENTED REALITY ANIMASI BENDA BERGERA...,Septian Rahman Hakim,"Arik Kurniawati, S.Kom., M.T.","Haryanto, S.T., M.T.",Seiring perkembangan teknologi yang ada diduni...,KK
...,...,...,...,...,...,...
853,PENERAPAN ALGORITMA LONG-SHORT TERM MEMORY UNT...,Rachmad Agung Pambudi,"Eka Mala Sari Rochman, S.Kom., M.Kom","Sri Herawati, S.Kom., M.Kom",Investasi saham selama ini memiliki resiko ker...,KK
854,SISTEM PENCARIAN TEKS AL-QURAN TERJEMAHAN BERB...,Nadila Hidayanti,"Achmad Jauhari, S.T., M.Kom","Ika Oktavia Suzanti, S.Kom., M.Cs",Information Retrieval (IR) merupakan pengambil...,KK
855,KLASIFIKASI KOMPLEKSITAS VISUAL CITRA SAMPAH M...,Afni Sakinah,"Dr. Indah Agustien Siradjuddin, S.Kom., M.Kom.","Moch. Kautsar Sophan, S.Kom., M.MT.",Klasifikasi citra merupakan proses pengelompok...,KK
856,IDENTIFIKASI BINER ATRIBUT PEJALAN KAKI MENGGU...,Friska Fatmawatiningrum,"Dr. Indah Agustien Siradjuddin, S.Kom., M.Kom.","Prof. Dr. Arief Muntasa, S.Si., M.MT.",Identifikasi atribut pejalan kaki merupakan sa...,KK


**Cleaning**

Check for and remove null values

In [None]:
df.isnull().sum()

Judul                   6
Penulis                10
Dosen Pembimbing I     10
Dosen Pembimbing II    11
Abstrak                29
Label                   0
dtype: int64

In [None]:
df = df.dropna()
df.isnull().sum()

Judul                  0
Penulis                0
Dosen Pembimbing I     0
Dosen Pembimbing II    0
Abstrak                0
Label                  0
dtype: int64

Remove every character that is not a letter or whitespace

In [None]:
def cleaning(text):
  text = re.sub(r'[^a-zA-Z\s]', '', text).strip()
  return text

df['Cleaning'] = df['Abstrak'].apply(cleaning)
df['Cleaning']

0      Sistem  informasi  akademik  SIAKAD merupakan ...
1      Berjalannya koneksi jaringan komputer dengan l...
2      Web server adalah sebuah perangkat lunak serve...
3      Penjadwalan  kuliah  di  Perguruan  Tinggi  me...
4      Seiring perkembangan teknologi yang ada diduni...
                             ...                        
853    Investasi saham selama ini memiliki resiko ker...
854    Information Retrieval IR merupakan pengambilan...
855    Klasifikasi citra merupakan proses pengelompok...
856    Identifikasi atribut pejalan kaki merupakan sa...
857    Topik deteksi objek telah menarik perhatian ya...
Name: Cleaning, Length: 828, dtype: object
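To see what the pattern `[^a-zA-Z\s]` does to a single string, the sketch below applies it to a made-up sample. Because matched characters are deleted rather than replaced, words joined by punctuation fuse together. The second variant, which substitutes a space and then collapses whitespace, is an assumption shown for comparison and is not what the notebook does.

```python
import re

sample = "Al-Quran (IR) 2020,data"

# Pattern used in the notebook: matched characters are deleted outright
deleted = re.sub(r"[^a-zA-Z\s]", "", sample).strip()

# Variant (an assumption, not the notebook's method): replace with a space,
# then collapse runs of whitespace into one space
spaced = re.sub(r"\s+", " ", re.sub(r"[^a-zA-Z\s]", " ", sample)).strip()

print(deleted)  # AlQuran IR data
print(spaced)   # Al Quran IR data
```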

Print any document that still contains special characters

In [None]:
def cek_specialCharacter(dokumen):
  karakter = ['!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '-', '_', '+', '=', '{', '}', '[', ']', '|', '\\', ':', ';', '"', "'", '<', '>', ',', '.', '?', '/', '`', '~']
  for i in dokumen:
    if i in karakter:
      # Print the document once, at the first special character found
      print(dokumen)
      break
df['Cleaning'].apply(cek_specialCharacter)

0      None
1      None
2      None
3      None
4      None
       ... 
853    None
854    None
855    None
856    None
857    None
Name: Cleaning, Length: 828, dtype: object
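The check above prints offending documents and returns `None` for every row. A variant that returns a boolean can be easier to count or filter on. The sketch below is one way to write it, assuming the hypothetical name `has_special_character`. It relies on `string.punctuation`, which holds the same 32 ASCII characters as the `karakter` list.

```python
import string

def has_special_character(text):
    """Return True if the text contains any ASCII punctuation character."""
    return any(ch in string.punctuation for ch in text)

print(has_special_character("Sistem informasi akademik SIAKAD"))    # False
print(has_special_character("Sistem informasi akademik (SIAKAD)"))  # True
```

Applied to the `Cleaning` column with `apply`, the result would be a boolean Series whose sum is the number of documents that still contain punctuation.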

**Tokenizing**

In [None]:
def tokenizer(text):
  text = text.lower()
  return word_tokenize(text)

df['Tokenizing'] = df['Cleaning'].apply(tokenizer)
df['Tokenizing']

0      [sistem, informasi, akademik, siakad, merupaka...
1      [berjalannya, koneksi, jaringan, komputer, den...
2      [web, server, adalah, sebuah, perangkat, lunak...
3      [penjadwalan, kuliah, di, perguruan, tinggi, m...
4      [seiring, perkembangan, teknologi, yang, ada, ...
                             ...                        
853    [investasi, saham, selama, ini, memiliki, resi...
854    [information, retrieval, ir, merupakan, pengam...
855    [klasifikasi, citra, merupakan, proses, pengel...
856    [identifikasi, atribut, pejalan, kaki, merupak...
857    [topik, deteksi, objek, telah, menarik, perhat...
Name: Tokenizing, Length: 828, dtype: object

In [None]:
def count_word(dokumens):
  return len(dokumens)

df['Count Word'] = df['Tokenizing'].apply(count_word)
df

Unnamed: 0,Judul,Penulis,Dosen Pembimbing I,Dosen Pembimbing II,Abstrak,Label,Cleaning,Tokenizing,Count Word
0,PERANCANGAN DAN IMPLEMENTASI SISTEM DATABASE ...,A.Ubaidillah S.Kom,Budi Setyono M.T,Hermawan S.T,Sistem informasi akademik (SIAKAD) merupaka...,RPL,Sistem informasi akademik SIAKAD merupakan ...,"[sistem, informasi, akademik, siakad, merupaka...",150
1,APLIKASI KONTROL DAN MONITORING JARINGAN KOMPU...,"M. Basith Ardianto,","Drs. Budi Soesilo, MT","Koko Joni, ST",Berjalannya koneksi jaringan komputer dengan l...,RPL,Berjalannya koneksi jaringan komputer dengan l...,"[berjalannya, koneksi, jaringan, komputer, den...",204
2,RANCANG BANGUN APLIKASI PROXY SERVER UNTUK ENK...,"Akhmad Suyandi, S.Kom","Drs. Budi Soesilo, M.T","Hermawan, ST, MT",Web server adalah sebuah perangkat lunak serve...,RPL,Web server adalah sebuah perangkat lunak serve...,"[web, server, adalah, sebuah, perangkat, lunak...",182
3,SISTEM PENDUKUNG KEPUTUSAN OPTIMASI PENJADWALA...,Heri Supriyanto,"Mulaab, S.Si., M.Kom","Firli Irhamni, ST., M.Kom",Penjadwalan kuliah di Perguruan Tinggi me...,KK,Penjadwalan kuliah di Perguruan Tinggi me...,"[penjadwalan, kuliah, di, perguruan, tinggi, m...",134
4,SISTEM AUGMENTED REALITY ANIMASI BENDA BERGERA...,Septian Rahman Hakim,"Arik Kurniawati, S.Kom., M.T.","Haryanto, S.T., M.T.",Seiring perkembangan teknologi yang ada diduni...,KK,Seiring perkembangan teknologi yang ada diduni...,"[seiring, perkembangan, teknologi, yang, ada, ...",137
...,...,...,...,...,...,...,...,...,...
853,PENERAPAN ALGORITMA LONG-SHORT TERM MEMORY UNT...,Rachmad Agung Pambudi,"Eka Mala Sari Rochman, S.Kom., M.Kom","Sri Herawati, S.Kom., M.Kom",Investasi saham selama ini memiliki resiko ker...,KK,Investasi saham selama ini memiliki resiko ker...,"[investasi, saham, selama, ini, memiliki, resi...",173
854,SISTEM PENCARIAN TEKS AL-QURAN TERJEMAHAN BERB...,Nadila Hidayanti,"Achmad Jauhari, S.T., M.Kom","Ika Oktavia Suzanti, S.Kom., M.Cs",Information Retrieval (IR) merupakan pengambil...,KK,Information Retrieval IR merupakan pengambilan...,"[information, retrieval, ir, merupakan, pengam...",134
855,KLASIFIKASI KOMPLEKSITAS VISUAL CITRA SAMPAH M...,Afni Sakinah,"Dr. Indah Agustien Siradjuddin, S.Kom., M.Kom.","Moch. Kautsar Sophan, S.Kom., M.MT.",Klasifikasi citra merupakan proses pengelompok...,KK,Klasifikasi citra merupakan proses pengelompok...,"[klasifikasi, citra, merupakan, proses, pengel...",259
856,IDENTIFIKASI BINER ATRIBUT PEJALAN KAKI MENGGU...,Friska Fatmawatiningrum,"Dr. Indah Agustien Siradjuddin, S.Kom., M.Kom.","Prof. Dr. Arief Muntasa, S.Si., M.MT.",Identifikasi atribut pejalan kaki merupakan sa...,KK,Identifikasi atribut pejalan kaki merupakan sa...,"[identifikasi, atribut, pejalan, kaki, merupak...",211


**Stopword**

In [None]:
# A set gives constant-time membership tests
corpus = set(stopwords.words('indonesian'))

def stopwordText(words):
  return [word for word in words if word not in corpus]

df['Stopword Removal'] = df['Tokenizing'].apply(stopwordText)

# Join the tokens back into a single string
df['Full Text'] = df['Stopword Removal'].apply(lambda x: ' '.join(x))
df['Full Text']

0      sistem informasi akademik siakad sistem inform...
1      berjalannya koneksi jaringan komputer lancar g...
2      web server perangkat lunak server berfungsi me...
3      penjadwalan kuliah perguruan kompleks permasal...
4      seiring perkembangan teknologi didunia muncul ...
                             ...                        
853    investasi saham memiliki resiko kerugian dikar...
854    information retrieval ir pengambilan informasi...
855    klasifikasi citra proses pengelompokan piksel ...
856    identifikasi atribut pejalan kaki salah peneli...
857    topik deteksi objek menarik perhatian perkemba...
Name: Full Text, Length: 828, dtype: object

**Stemming**

In [None]:
# Create the stemmer once and reuse it for every document
factory = StemmerFactory()
stemmer = factory.create_stemmer()

def stemmingText(dokumens):
    return [stemmer.stem(i) for i in dokumens]

df['Stemming'] = df['Stopword Removal'].apply(stemmingText)

# Keep the stemming result in a new DataFrame
stemmed_df = df[['Stemming']]

# Save the stemmed DataFrame to a CSV file
stemmed_df.to_csv('/content/drive/MyDrive/ppw/PPWcoba/hasil_stemming.csv', index=False)


### VSM

Once the data has been preprocessed (normalized), the next step is the VSM (Vector Space Model), a document representation model used in natural language processing (NLP) and information retrieval. VSM converts text documents into a vector space in which every word or term is one dimension of the vector.
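A minimal sketch of this idea, using made-up token lists and the hypothetical helper `to_vector`: the vocabulary fixes the order of the dimensions, and each document becomes a vector of term counts along them.

```python
from collections import Counter

def to_vector(tokens, vocabulary):
    """Map a token list to a count vector with one dimension per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

# Made-up example documents, already tokenized
docs = [
    ["web", "server", "perangkat", "lunak", "server"],
    ["jaringan", "komputer", "server"],
]

# Sorted vocabulary: every distinct term becomes one dimension
vocabulary = sorted({t for d in docs for t in d})
vectors = [to_vector(d, vocabulary) for d in docs]

print(vocabulary)  # ['jaringan', 'komputer', 'lunak', 'perangkat', 'server', 'web']
print(vectors[0])  # [0, 0, 1, 1, 2, 1]
print(vectors[1])  # [1, 1, 0, 0, 1, 0]
```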

This step covers:


*   One Hot Encoding
*   Term Frequency
*   Logarithmic Frequency
*   TF-IDF



**One Hot Encoding**

In [None]:
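def pandasOneHotEncoder(dokumens):
  # One row per (document, token), then sum the dummy columns per document.
  # groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0
  encoder = pd.get_dummies(dokumens.apply(pd.Series).stack()).groupby(level=0).sum()
  result = pd.concat([dokumens, encoder], axis=1)

  return result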

oneHotEncoder = pandasOneHotEncoder(df['Stopword Removal'])
oneHotEncoder

Unnamed: 0,Stopword Removal,a,aalysis,aam,ab,abad,abadi,ability,abjad,absensi,...,zara,zat,zcz,zf,zona,zone,zoning,zoom,zucara,zungu
0,"[sistem, informasi, akademik, siakad, sistem, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"[berjalannya, koneksi, jaringan, komputer, lan...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"[web, server, perangkat, lunak, server, berfun...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"[penjadwalan, kuliah, perguruan, kompleks, per...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"[seiring, perkembangan, teknologi, didunia, mu...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
853,"[investasi, saham, memiliki, resiko, kerugian,...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
854,"[information, retrieval, ir, pengambilan, info...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
855,"[klasifikasi, citra, proses, pengelompokan, pi...",2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
856,"[identifikasi, atribut, pejalan, kaki, salah, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
oneHotEncoder.to_csv('/content/drive/MyDrive/ppw/PPWcoba/OneHotEncoder.csv', index=False)
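Because the dummy columns are summed per document, a term that occurs more than once gets a value above 1 (row 855 shows 2.0 in column `a`), so the table holds counts rather than strict 0/1 indicators. The sketch below contrasts the two encodings on made-up token lists. The helper names `count_encode` and `one_hot_encode` are hypothetical.

```python
# Made-up example documents, already tokenized
docs = [
    ["web", "server", "perangkat", "lunak", "server"],
    ["jaringan", "komputer", "server"],
]
vocabulary = sorted({t for d in docs for t in d})

def count_encode(tokens):
    """Number of occurrences of each vocabulary term (what summing dummies gives)."""
    return [tokens.count(term) for term in vocabulary]

def one_hot_encode(tokens):
    """1 if the vocabulary term occurs at least once in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

print(vocabulary)               # ['jaringan', 'komputer', 'lunak', 'perangkat', 'server', 'web']
print(count_encode(docs[0]))    # [0, 0, 1, 1, 2, 1]
print(one_hot_encode(docs[0]))  # [0, 0, 1, 1, 1, 1]
```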

**Term Frequency**

In [None]:
def term_freq(dokumens):
  # Create a CountVectorizer object
  vectorizer = CountVectorizer()
  tf_matrix = vectorizer.fit_transform(dokumens).toarray()
  terms = vectorizer.get_feature_names_out()

  final_tf = pd.DataFrame(tf_matrix, columns=terms)
  # Insert by position: after dropna() the index of dokumens has gaps,
  # so index-based alignment would pair rows with the wrong documents
  final_tf.insert(0, 'Dokumen', dokumens.to_numpy())

  return (vectorizer, final_tf, tf_matrix, terms)

tf_vectorizer, final_tf, tf_matrix, tf_terms = term_freq(df['Full Text'])
final_tf

Unnamed: 0,Dokumen,aalysis,aam,ab,abad,abadi,ability,abjad,absensi,absolut,...,zara,zat,zcz,zf,zona,zone,zoning,zoom,zucara,zungu
0,sistem informasi akademik siakad sistem inform...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,berjalannya koneksi jaringan komputer lancar g...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,web server perangkat lunak server berfungsi me...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,penjadwalan kuliah perguruan kompleks permasal...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,seiring perkembangan teknologi didunia muncul ...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
823,kurangnya pemahaman gejala penyakit saluran pe...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
824,data set hilang utama studi bersifat substansi...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
825,proses seleksi penerimaan tenaga kerja faktor ...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
826,sapi salah hewan ternak komoditi utama bahan p...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
final_tf.to_csv('/content/drive/MyDrive/ppw/PPWcoba/TermFrequensi.csv', index=False)
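The term-frequency table has no `a` column even though the one-hot table does. This follows from the default token pattern of `CountVectorizer`, which keeps only tokens of two or more word characters. The sketch below reproduces that counting rule with the standard library on a made-up sentence. `term_frequency` is a hypothetical helper, not part of scikit-learn.

```python
import re
from collections import Counter

# Default token pattern of scikit-learn's CountVectorizer: 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def term_frequency(text):
    """Count how often each token of two or more characters occurs in the text."""
    return Counter(TOKEN_PATTERN.findall(text.lower()))

tf = term_frequency("klasifikasi citra a b kelas a citra")
print(sorted(tf.items()))  # [('citra', 2), ('kelas', 1), ('klasifikasi', 1)]
```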

**Logarithmic Frequency**

In [None]:
def logarithm_freq(dokumens):
  return np.log10(dokumens + 1)

df_logarithm_freq = pd.DataFrame(tf_matrix, columns=tf_terms).apply(logarithm_freq)
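# Insert by position: df's index has gaps after dropna(), so index alignment would mismatch rows
df_logarithm_freq.insert(0, 'Dokumen', df['Full Text'].to_numpy())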
df_logarithm_freq

Unnamed: 0,Dokumen,aalysis,aam,ab,abad,abadi,ability,abjad,absensi,absolut,...,zara,zat,zcz,zf,zona,zone,zoning,zoom,zucara,zungu
0,sistem informasi akademik siakad sistem inform...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,berjalannya koneksi jaringan komputer lancar g...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,web server perangkat lunak server berfungsi me...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,penjadwalan kuliah perguruan kompleks permasal...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,seiring perkembangan teknologi didunia muncul ...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
823,kurangnya pemahaman gejala penyakit saluran pe...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
824,data set hilang utama studi bersifat substansi...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
825,proses seleksi penerimaan tenaga kerja faktor ...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
826,sapi salah hewan ternak komoditi utama bahan p...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
df_logarithm_freq.to_csv('/content/drive/MyDrive/ppw/PPWcoba/Logarithm Frequensi.csv', index=False)
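The transformation applied here is log10(tf + 1). Adding 1 keeps a count of zero at zero and dampens large counts. A small worked example with the standard library, using the hypothetical helper `log_frequency`:

```python
import math

def log_frequency(tf):
    """Log-scaled term frequency: log10(tf + 1)."""
    return math.log10(tf + 1)

for tf in (0, 1, 2, 9, 99):
    print(tf, round(log_frequency(tf), 3))
# 0 0.0
# 1 0.301
# 2 0.477
# 9 1.0
# 99 2.0
```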

**TF-IDF**


In [None]:
def tfidf(dokumen):
  vectorizer = TfidfVectorizer()
  x = vectorizer.fit_transform(dokumen).toarray()
  terms = vectorizer.get_feature_names_out()

  final_tfidf = pd.DataFrame(x, columns=terms)
  # Insert by position: the index of dokumen has gaps after dropna()
  final_tfidf.insert(0, 'Dokumen', dokumen.to_numpy())

  return (vectorizer, final_tfidf)

tfidf_vectorizer, final_tfidf = tfidf(df['Full Text'])
final_tfidf

Unnamed: 0,Dokumen,aalysis,aam,ab,abad,abadi,ability,abjad,absensi,absolut,...,zara,zat,zcz,zf,zona,zone,zoning,zoom,zucara,zungu
0,sistem informasi akademik siakad sistem inform...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,berjalannya koneksi jaringan komputer lancar g...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,web server perangkat lunak server berfungsi me...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,penjadwalan kuliah perguruan kompleks permasal...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,seiring perkembangan teknologi didunia muncul ...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
823,kurangnya pemahaman gejala penyakit saluran pe...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
824,data set hilang utama studi bersifat substansi...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
825,proses seleksi penerimaan tenaga kerja faktor ...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
826,sapi salah hewan ternak komoditi utama bahan p...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
final_tfidf.to_csv('/content/drive/MyDrive/ppw/PPWcoba/TF-IDF.csv', index=False)
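With its default settings, `TfidfVectorizer` weights each raw count by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and then scales every row to unit Euclidean length. The sketch below works through that calculation by hand on made-up token lists. `tfidf_rows` is a hypothetical helper, and it skips the vectorizer's own tokenization.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """TF-IDF with smoothed IDF and L2-normalized rows for tokenized documents."""
    n = len(docs)
    vocabulary = sorted({t for d in docs for t in d})
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    rows = []
    for d in docs:
        counts = Counter(d)
        weights = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        rows.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, idf, rows

docs = [["server", "web", "server"], ["server", "jaringan"]]
vocabulary, idf, rows = tfidf_rows(docs)

print(vocabulary)                                   # ['jaringan', 'server', 'web']
print({t: round(v, 3) for t, v in idf.items()})     # {'jaringan': 1.405, 'server': 1.0, 'web': 1.405}
print([round(w, 3) for w in rows[0]])
```

A term that appears in every document, such as `server` here, gets the minimum IDF of 1, so rarer terms carry relatively more weight.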