## Term Frequency-Inverse Document Frequency (TF-IDF)
Term Frequency - Inverse Document Frequency (TF-IDF) is a widely used statistical method in natural language processing and information retrieval. **It measures how important a term is within a document relative to a collection of documents** (i.e., relative to a corpus).
- **Term Frequency:** The TF of a term is the number of times the term appears in a document divided by the total number of words in that document.
- **Inverse Document Frequency:** The IDF of a term reflects how rare the term is across the corpus: it grows as the proportion of documents containing the term shrinks. Words confined to a small percentage of documents (e.g., technical jargon) receive higher importance values than words common across all documents (e.g., a, the, and).
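
The two definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the textbook form `tf * log(N / df)`; the toy corpus and the function names are hypothetical and not part of the project data, and scikit-learn (used later in this notebook) applies a smoothed, normalized variant.

```python
import math

# Hypothetical toy corpus, already tokenized (not taken from the project data).
docs = [
    ["bird", "two", "wing", "two", "leg"],
    ["black", "bird"],
    ["iphon", "smartphon"],
]

def term_frequency(term, doc):
    # Occurrences of the term divided by the total number of words in the document.
    return doc.count(term) / len(doc)

def inverse_document_frequency(term, docs):
    # log(N / df): the fewer documents contain the term, the larger the value.
    # Assumes the term occurs in at least one document (df > 0).
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    return term_frequency(term, doc) * inverse_document_frequency(term, docs)

print(tf_idf("two", docs[0], docs))   # term found only in this document, used twice
print(tf_idf("bird", docs[0], docs))  # term shared with another document
```

In this sketch `two` outranks `bird` in the first document because it is both more frequent there and rarer across the corpus.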

### About Project
This project is divided into **three notebooks** that explain the usage of TF-IDF on **Bahasa Indonesia and English** text. The process flow runs from data collection (building the corpus) through pre-processing to fitting the algorithm. The detailed steps are:
1. **Data Collection (self-produced)**
2. **Text Pre-Processing (Case Folding, Punctuation Removal, Tokenizing, Stop-Word Removal, Stemming)**
3. **Fitting the TF-IDF Algorithm**
4. **Testing for Input and Output**

#### The project is divided into three notebooks:
1. text-preprocessing-english.ipynb
2. text-preprocessing-indonesia.ipynb
3. implementation.ipynb
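
The pre-processing step listed in the process flow can be sketched as follows. This is a rough illustration only, not the code from the two pre-processing notebooks: the stop-word list is a tiny hypothetical one, and stemming is left as an optional callable because the stemmer actually used by the project is not shown here.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "and", "has", "he", "is", "it", "of", "the"}

def preprocess(text, stop_words=STOP_WORDS, stem=None):
    """Apply case folding, punctuation removal, tokenizing,
    stop-word removal and (optionally) stemming to one document."""
    text = text.lower()                                               # case folding
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    tokens = text.split()                                             # tokenizing on whitespace
    tokens = [t for t in tokens if t not in stop_words]               # stop-word removal
    if stem is not None:                                              # stemming, if a stemmer is supplied
        tokens = [stem(t) for t in tokens]
    return " ".join(tokens)

print(preprocess("Antony has a bird and he lost it"))  # antony bird lost
```

Passing a real stemmer as `stem` (for example the `stem` method of NLTK's `PorterStemmer`, which is an assumption about tooling rather than something stated in this notebook) would produce stemmed forms such as the `antoni` seen in the cleaned corpus below.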

### Libraries used in this project

In [1]:
# Numpy for numerical manipulation
!pip install numpy



In [2]:
# Dataframe manipulation using Pandas
!pip install pandas



In [3]:
# TF-IDF Algorithm using sklearn
!pip install scikit-learn



## Library Initialization

In [4]:
import numpy as np
import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer

## Importing Dataset for English

### Training Dataset

In [5]:
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', 1000)

dataset = pd.read_csv("clean-corpus-inggris.csv", index_col=0)
dataset

Unnamed: 0,teks
0,call bird habit
1,brother like bird month father gave black bird
2,antoni bird lost
3,greedi characterist hate
4,bird two wing two leg
5,sustain becom key focu consum electron nokia set apart priorit ecofriendli practic manufactur process align grow demand ethic sourc recycl mobil devic
6,advent augment realiti applic smartphon like iphon transform versatil tool blur line digit physic realm offer user immers experi previous unimagin
7,smartphon manufactur strive market domin iphon distinguish seamless ecosystem integr hardwar softwar servic creat cohes user experi unparallel competitor
8,era privaci concern loom larg iphon stand robust secur featur provid user peac mind amidst grow threat person data smartphon
9,nokia synonym mobil innov undergo resurg telecommun industri leverag heritag reintroduc icon design infus modern technolog advanc


### Output Dataset

In [6]:
validate = pd.read_csv("corpus-inggris.csv")
validate

Unnamed: 0,id,text,topic
0,ENG1,"They called him a bird, because of his habit",bird
1,ENG2,My brother likes bird and after a month my father gave him a black bird,bird
2,ENG3,Antony has a bird and he lost it,bird
3,ENG4,Greedy is the most characteristic that I hate,hate
4,ENG5,Bird has two wings and two legs,bird
5,ENG6,"As sustainability becomes a key focus in consumer electronics, Nokia sets itself apart by prioritizing eco-friendly practices in its manufacturing processes, aligning with the growing demand for ethically sourced and recyclable mobile devices.",nokia
6,ENG7,"With the advent of augmented reality applications, smartphones like the iPhone are transforming into versatile tools that blur the lines between digital and physical realms, offering users immersive experiences previously unimaginable.",smartphone
7,ENG8,"As smartphone manufacturers strive for market dominance, the iPhone distinguishes itself with its seamless ecosystem, where integration between hardware, software, and services creates a cohesive user experience unparalleled by its competitors.",smartphone
8,ENG9,"In an era where privacy concerns loom large, the iPhone stands out for its robust security features, providing users with peace of mind amidst growing threats to personal data on smartphones.",smartphone
9,ENG10,"Nokia, once synonymous with mobile innovation, is undergoing a resurgence in the telecommunications industry, leveraging its heritage to reintroduce iconic designs infused with modern technological advancements.",nokia


In [7]:
corpus = dataset.teks.tolist()
corpus

['call bird habit',
 'brother like bird month father gave black bird',
 'antoni bird lost',
 'greedi characterist hate',
 'bird two wing two leg',
 'sustain becom key focu consum electron nokia set apart priorit ecofriendli practic manufactur process align grow demand ethic sourc recycl mobil devic',
 'advent augment realiti applic smartphon like iphon transform versatil tool blur line digit physic realm offer user immers experi previous unimagin',
 'smartphon manufactur strive market domin iphon distinguish seamless ecosystem integr hardwar softwar servic creat cohes user experi unparallel competitor',
 'era privaci concern loom larg iphon stand robust secur featur provid user peac mind amidst grow threat person data smartphon',
 'nokia synonym mobil innov undergo resurg telecommun industri leverag heritag reintroduc icon design infus modern technolog advanc',
 'cellular biolog research uncov intric mechan govern cell signal pathway shed light fundament process crucial understand huma

### Scikit-learn TF-IDF

In [8]:
tr_idf_model  = TfidfVectorizer()
tf_idf_vector = tr_idf_model.fit_transform(corpus)

In [9]:
print(type(tf_idf_vector), tf_idf_vector.shape)

<class 'scipy.sparse._csr.csr_matrix'> (15, 181)


In [10]:
tf_idf_array = tf_idf_vector.toarray()

print(tf_idf_array)

[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.40036069 0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]


In [11]:
words_set = tr_idf_model.get_feature_names_out()

print(words_set)

['advanc' 'advent' 'align' 'amidst' 'antoni' 'apart' 'applic' 'art'
 'assess' 'augment' 'bartend' 'becom' 'beverag' 'biolog' 'bird' 'black'
 'blur' 'bottl' 'brother' 'call' 'cell' 'cellular' 'certain'
 'characterist' 'circuitri' 'cohes' 'compel' 'competitor' 'concern'
 'consequ' 'consum' 'creat' 'crucial' 'data' 'dehydr' 'demand' 'design'
 'develop' 'devic' 'differ' 'digit' 'diseas' 'distinguish' 'domin' 'drink'
 'ecofriendli' 'ecolog' 'ecosystem' 'electron' 'elucid' 'environment'
 'era' 'essenti' 'ethic' 'examin' 'excess' 'experi' 'father' 'featur'
 'find' 'flavor' 'focu' 'footprint' 'fundament' 'gather' 'gave' 'govern'
 'greedi' 'grow' 'habit' 'hardwar' 'harmoni' 'hate' 'health' 'heritag'
 'hospit' 'human' 'icon' 'immers' 'impact' 'impair' 'individu' 'industri'
 'infus' 'ingredi' 'innov' 'integr' 'interact' 'intric' 'iphon' 'judgment'
 'key' 'larg' 'learn' 'leg' 'leverag' 'light' 'like' 'line' 'loom' 'lost'
 'manipul' 'manufactur' 'market' 'master' 'materi' 'mechan' 'memori'
 'mind' 

In [12]:
df_tf_idf = pd.DataFrame(tf_idf_array, columns = list(words_set))

df_tf_idf

Unnamed: 0,advanc,advent,align,amidst,antoni,apart,applic,art,assess,augment,bartend,becom,beverag,biolog,bird,black,blur,bottl,brother,call,cell,cellular,certain,characterist,circuitri,cohes,compel,competitor,concern,consequ,consum,creat,crucial,data,dehydr,demand,design,develop,devic,differ,digit,diseas,distinguish,domin,drink,ecofriendli,ecolog,ecosystem,electron,elucid,environment,era,essenti,ethic,examin,excess,experi,father,featur,find,flavor,focu,footprint,fundament,gather,gave,govern,greedi,grow,habit,hardwar,harmoni,hate,health,heritag,hospit,human,icon,immers,impact,impair,individu,industri,infus,ingredi,innov,integr,interact,intric,iphon,judgment,key,larg,learn,leg,leverag,light,like,line,loom,lost,manipul,manufactur,market,master,materi,mechan,memori,mind,mitig,mixolog,mobil,modern,month,neural,neurobiolog,nokia,nuanc,offer,often,packag,pathway,peac,person,physic,popular,potenti,practic,precis,previous,priorit,privaci,process,product,profess,prompt,provid,realiti,realm,record,recycl,reintroduc,requir,research,resurg,robust,scientist,seamless,secur,servic,set,shed,signal,smartphon,social,soda,softwar,sourc,stand,strive,sustain,synonym,techniqu,technolog,telecommun,threat,tool,transform,two,unawar,uncov,undergo,underli,understand,unimagin,unparallel,usag,user,versatil,water,wing
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444852,0.0,0.0,0.0,0.0,0.633288,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.633288,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.505381,0.359728,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.312363,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.633288,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.633288,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.57735,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.57735,0.0,0.0,0.0,0.0,0.57735,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.275662,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.392431,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.784861,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.392431
5,0.0,0.0,0.221564,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.192391,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.171693,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.192391,0.0,0.0,0.0,0.0,0.192391,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.221564,0.0,0.192391,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.0,0.0,0.0,0.0,0.221564,0.0,0.0,0.192391,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.22848,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.177052,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.177052,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.198396,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.177052,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.22848,0.0,0.0,0.0,0.0,0.0,0.0,0.22848,0.0,0.0,0.177052,0.22848,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.244292,0.0,0.244292,0.0,0.0,0.0,0.212126,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.244292,0.244292,0.0,0.0,0.0,0.244292,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.189305,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.244292,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.244292,0.0,0.0,0.189305,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.189305,0.244292,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.244292,0.0,0.244292,0.0,0.0,0.0,0.189305,0.0,0.0,0.244292,0.0,0.0,0.244292,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.244292,0.0,0.189305,0.0,0.0,0.0
8,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.201581,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.179894,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.179894,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.232148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.179894,0.0,0.0,0.0
9,0.247978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.247978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.247978,0.0,0.0,0.247978,0.0,0.0,0.0,0.0,0.247978,0.247978,0.0,0.215327,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.247978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.215327,0.247978,0.0,0.0,0.0,0.215327,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.247978,0.0,0.0,0.247978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.247978,0.0,0.247978,0.247978,0.0,0.0,0.0,0.0,0.0,0.0,0.247978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
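
Row 0 of the table above (`call bird habit`) can be reproduced by hand. The sketch below assumes `TfidfVectorizer`'s default settings (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row). It also assumes that `bird` occurs in 4 of the 15 documents and that `call` and `habit` occur in 1 each. The printed corpus is truncated, so these document frequencies are inferred, not read off directly. The function name is hypothetical.

```python
import math

def sklearn_style_tfidf(term_counts, doc_freq, n_docs):
    """TF-IDF weights for one document, following TfidfVectorizer's defaults:
    raw counts, idf = ln((1 + n) / (1 + df)) + 1, then L2 normalization."""
    raw = {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in term_counts.items()
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}

# Document 0: "call bird habit" (each term appears once).
weights = sklearn_style_tfidf(
    term_counts={"call": 1, "bird": 1, "habit": 1},
    doc_freq={"call": 1, "bird": 4, "habit": 1},  # assumed document frequencies
    n_docs=15,
)
for term, weight in weights.items():
    print(term, round(weight, 6))
```

Under these assumptions the weights come out at about 0.633288 for `call` and `habit` and 0.444852 for `bird`, in line with row 0. The more widespread term `bird` gets the lower weight.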


In [13]:
result = pd.concat([df_tf_idf, validate.text], axis=1)
result.head(2)

Unnamed: 0,advanc,advent,align,amidst,antoni,apart,applic,art,assess,augment,bartend,becom,beverag,biolog,bird,black,blur,bottl,brother,call,cell,cellular,certain,characterist,circuitri,cohes,compel,competitor,concern,consequ,consum,creat,crucial,data,dehydr,demand,design,develop,devic,differ,digit,diseas,distinguish,domin,drink,ecofriendli,ecolog,ecosystem,electron,elucid,environment,era,essenti,ethic,examin,excess,experi,father,featur,find,flavor,focu,footprint,fundament,gather,gave,govern,greedi,grow,habit,hardwar,harmoni,hate,health,heritag,hospit,human,icon,immers,impact,impair,individu,industri,infus,ingredi,innov,integr,interact,intric,iphon,judgment,key,larg,learn,leg,leverag,light,like,line,loom,lost,manipul,manufactur,market,master,materi,mechan,memori,mind,mitig,mixolog,mobil,modern,month,neural,neurobiolog,nokia,nuanc,offer,often,packag,pathway,peac,person,physic,popular,potenti,practic,precis,previous,priorit,privaci,process,product,profess,prompt,provid,realiti,realm,record,recycl,reintroduc,requir,research,resurg,robust,scientist,seamless,secur,servic,set,shed,signal,smartphon,social,soda,softwar,sourc,stand,strive,sustain,synonym,techniqu,technolog,telecommun,threat,tool,transform,two,unawar,uncov,undergo,underli,understand,unimagin,unparallel,usag,user,versatil,water,wing,text
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444852,0.0,0.0,0.0,0.0,0.633288,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.633288,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"They called him a bird, because of his habit"
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.505381,0.359728,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.312363,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.359728,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,My brother likes bird and after a month my father gave him a black bird


In [14]:
print(words_set)

['advanc' 'advent' 'align' 'amidst' 'antoni' 'apart' 'applic' 'art'
 'assess' 'augment' 'bartend' 'becom' 'beverag' 'biolog' 'bird' 'black'
 'blur' 'bottl' 'brother' 'call' 'cell' 'cellular' 'certain'
 'characterist' 'circuitri' 'cohes' 'compel' 'competitor' 'concern'
 'consequ' 'consum' 'creat' 'crucial' 'data' 'dehydr' 'demand' 'design'
 'develop' 'devic' 'differ' 'digit' 'diseas' 'distinguish' 'domin' 'drink'
 'ecofriendli' 'ecolog' 'ecosystem' 'electron' 'elucid' 'environment'
 'era' 'essenti' 'ethic' 'examin' 'excess' 'experi' 'father' 'featur'
 'find' 'flavor' 'focu' 'footprint' 'fundament' 'gather' 'gave' 'govern'
 'greedi' 'grow' 'habit' 'hardwar' 'harmoni' 'hate' 'health' 'heritag'
 'hospit' 'human' 'icon' 'immers' 'impact' 'impair' 'individu' 'industri'
 'infus' 'ingredi' 'innov' 'integr' 'interact' 'intric' 'iphon' 'judgment'
 'key' 'larg' 'learn' 'leg' 'leverag' 'light' 'like' 'line' 'loom' 'lost'
 'manipul' 'manufactur' 'market' 'master' 'materi' 'mechan' 'memori'
 'mind' 

In [15]:
search = input("Enter the keyword to search for: ")
result[[search, "text"]].sort_values(by=search, ascending=False)

Enter the keyword to search for:  wing


Unnamed: 0,wing,text
4,0.392431,Bird has two wings and two legs
0,0.0,"They called him a bird, because of his habit"
1,0.0,My brother likes bird and after a month my father gave him a black bird
2,0.0,Antony has a bird and he lost it
3,0.0,Greedy is the most characteristic that I hate
5,0.0,"As sustainability becomes a key focus in consumer electronics, Nokia sets itself apart by prioritizing eco-friendly practices in its manufacturing processes, aligning with the growing demand for ethically sourced and recyclable mobile devices."
6,0.0,"With the advent of augmented reality applications, smartphones like the iPhone are transforming into versatile tools that blur the lines between digital and physical realms, offering users immersive experiences previously unimaginable."
7,0.0,"As smartphone manufacturers strive for market dominance, the iPhone distinguishes itself with its seamless ecosystem, where integration between hardware, software, and services creates a cohesive user experience unparalleled by its competitors."
8,0.0,"In an era where privacy concerns loom large, the iPhone stands out for its robust security features, providing users with peace of mind amidst growing threats to personal data on smartphones."
9,0.0,"Nokia, once synonymous with mobile innovation, is undergoing a resurgence in the telecommunications industry, leveraging its heritage to reintroduce iconic designs infused with modern technological advancements."
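
The lookup above raises a `KeyError` when the keyword is not a column of the table, which happens for any word outside the vocabulary, including unstemmed forms such as `wings`. The sketch below shows one possible guard in plain Python. The function name and the list-based layout are hypothetical, and the three rows reuse the `bird` and `wing` scores of documents 0, 1 and 4 from the table above.

```python
def search_keyword(keyword, vocabulary, tfidf_rows, texts):
    """Return (score, text) pairs for documents containing the keyword,
    sorted by descending score. Returns [] for out-of-vocabulary keywords."""
    keyword = keyword.strip().lower()
    if keyword not in vocabulary:
        return []
    col = vocabulary.index(keyword)
    scored = [(row[col], text) for row, text in zip(tfidf_rows, texts) if row[col] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Scores copied from the table above for the columns 'bird' and 'wing'.
vocabulary = ["bird", "wing"]
tfidf_rows = [
    [0.444852, 0.0],
    [0.505381, 0.0],
    [0.275662, 0.392431],
]
texts = [
    "They called him a bird, because of his habit",
    "My brother likes bird and after a month my father gave him a black bird",
    "Bird has two wings and two legs",
]

print(search_keyword("wing", vocabulary, tfidf_rows, texts))
print(search_keyword("wings", vocabulary, tfidf_rows, texts))  # not in the vocabulary
```

Because the vocabulary holds stemmed terms, a query would normally need the same pre-processing as the corpus before the lookup; that step is left out of this sketch.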


## Importing Dataset for Indonesian

### Training Dataset

In [16]:
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', 1000)

dataset_id = pd.read_csv("clean-corpus-indonesia.csv", index_col=0)
dataset_id

Unnamed: 0,teks
0,makhluk hidup objek kaji utama bidang biologi
1,biologi seluler cabang biologi ajar struktur fungsi sel makhluk hidup
2,jenis sel makhluk hidup
3,sel istilah biologi unit dasar hidup
4,inti sel kandung materi genetik bentuk dna
5,gen instruksi simpan dna sel
6,sel unit kecil susun tubuh makhluk hidup
7,biologi kembang diferensiasi sel proses sel kembang fungsi spesifik
8,makhluk hidup uniseluler makhluk hidup memilki sel tunggal multiseluler makhluk hidup organisme milik sel
9,biokimia cabang biologi ajar kimia dasar hidup makhluk hidup


### Output Dataset

In [17]:
validate_id = pd.read_csv("corpus-indonesia.csv")
validate_id

Unnamed: 0,id,teks,topik
0,IND1,MAKHLUK HIDUP ADALAH OBJEK KAJIAN UTAMA DALAM BIDANG BIOLOGI.,biologi
1,IND2,Biologi seluler = cabang biologi yang mempelajari struktur dan fungsi sel-sel dalam makhluk hidup.,biologi
2,IND3,Apa Saja Jenis-Jenis Sel yang Ada pada Makhluk Hidup?,sel
3,IND4,Sel merupakan istilah biologi untuk unit dasar dari kehidupan.,sel
4,IND5,Inti sel mengandung materi genetik dalam bentuk DNA.,dna
5,IND6,Gen adalah instruksi yang disimpan dalam DNA suatu sel.,dna
6,IND7,Mengapa sel disebut sebagai unit terkecil yang menyusun tubuh makhluk hidup?,sel
7,IND8,"Dalam biologi perkembangan, diferensiasi sel adalah proses di mana sel-sel mengembangkan fungsi spesifik.",biologi
8,IND9,"Makhluk hidup uniseluler adalah makhluk hidup yang hanya memilki sebuah sel tunggal, sedangkan multiseluler adalah makhluk hidup atau organisme yang memiliki lebih dari satu sel.",sel
9,IND10,BIOKIMIA ADALAH CABANG BIOLOGI YANG MEMPELAJARI KIMIA DASAR DALAM KEHIDUPAN MAKHLUK HIDUP.,biologi


In [18]:
corpus_id = dataset_id.teks.tolist()
corpus_id

['makhluk hidup objek kaji utama bidang biologi',
 'biologi seluler cabang biologi ajar struktur fungsi sel makhluk hidup',
 'jenis sel makhluk hidup',
 'sel istilah biologi unit dasar hidup',
 'inti sel kandung materi genetik bentuk dna',
 'gen instruksi simpan dna sel',
 'sel unit kecil susun tubuh makhluk hidup',
 'biologi kembang diferensiasi sel proses sel kembang fungsi spesifik',
 'makhluk hidup uniseluler makhluk hidup memilki sel tunggal multiseluler makhluk hidup organisme milik sel',
 'biokimia cabang biologi ajar kimia dasar hidup makhluk hidup',
 'studi biologi bantu paham kerja sistem makhluk hidup',
 'evolusi proses utama biologi ubah populasi makhluk hidup',
 'proses evolusi bentuk agam jenis sel dunia',
 'evolusi sel hasil organel kompleks mitokondria kloroplas',
 'teliti baru evolusi sel lanjut respons tekan seleksi lingkung ubah']

### Scikit-learn TF-IDF

In [19]:
tr_idf_model_id  = TfidfVectorizer()
tf_idf_vector_id = tr_idf_model_id.fit_transform(corpus_id)

In [20]:
print(type(tf_idf_vector_id), tf_idf_vector_id.shape)

<class 'scipy.sparse._csr.csr_matrix'> (15, 63)


In [21]:
tf_idf_array_id = tf_idf_vector_id.toarray()

print(tf_idf_array_id)

[[0.         0.         0.         0.         0.         0.46901929
  0.         0.2578775  0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.22389126 0.         0.         0.         0.         0.46901929
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.23993837 0.         0.
  0.         0.         0.         0.46901929 0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.40726428]
 [0.         0.35055419 0.         0.         0.         0.
  0.         0.44393797 0.35055419 0.         0.         0.
  0.         0.         0.35055419 0.         0.         0.
  0.19271521 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.20652781 0.

In [22]:
words_set_id = tr_idf_model_id.get_feature_names_out()

print(words_set_id)

['agam' 'ajar' 'bantu' 'baru' 'bentuk' 'bidang' 'biokimia' 'biologi'
 'cabang' 'dasar' 'diferensiasi' 'dna' 'dunia' 'evolusi' 'fungsi' 'gen'
 'genetik' 'hasil' 'hidup' 'instruksi' 'inti' 'istilah' 'jenis' 'kaji'
 'kandung' 'kecil' 'kembang' 'kerja' 'kimia' 'kloroplas' 'kompleks'
 'lanjut' 'lingkung' 'makhluk' 'materi' 'memilki' 'milik' 'mitokondria'
 'multiseluler' 'objek' 'organel' 'organisme' 'paham' 'populasi' 'proses'
 'respons' 'sel' 'seleksi' 'seluler' 'simpan' 'sistem' 'spesifik'
 'struktur' 'studi' 'susun' 'tekan' 'teliti' 'tubuh' 'tunggal' 'ubah'
 'uniseluler' 'unit' 'utama']


In [23]:
df_tf_idf_id = pd.DataFrame(tf_idf_array_id, columns = list(words_set_id))

df_tf_idf_id

Unnamed: 0,agam,ajar,bantu,baru,bentuk,bidang,biokimia,biologi,cabang,dasar,diferensiasi,dna,dunia,evolusi,fungsi,gen,genetik,hasil,hidup,instruksi,inti,istilah,jenis,kaji,kandung,kecil,kembang,kerja,kimia,kloroplas,kompleks,lanjut,lingkung,makhluk,materi,memilki,milik,mitokondria,multiseluler,objek,organel,organisme,paham,populasi,proses,respons,sel,seleksi,seluler,simpan,sistem,spesifik,struktur,studi,susun,tekan,teliti,tubuh,tunggal,ubah,uniseluler,unit,utama
0,0.0,0.0,0.0,0.0,0.0,0.469019,0.0,0.257878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.223891,0.0,0.0,0.0,0.0,0.469019,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.239938,0.0,0.0,0.0,0.0,0.0,0.469019,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.407264
1,0.0,0.350554,0.0,0.0,0.0,0.0,0.0,0.443938,0.350554,0.0,0.0,0.0,0.0,0.0,0.350554,0.0,0.0,0.0,0.192715,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.206528,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.168813,0.0,0.40371,0.0,0.0,0.0,0.40371,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.400813,0.0,0.0,0.0,0.72909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.429541,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.351101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.306736,0.0,0.484427,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.266311,0.0,0.0,0.557882,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.233281,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.484427,0.0
4,0.0,0.0,0.0,0.0,0.364252,0.0,0.0,0.0,0.0,0.0,0.0,0.364252,0.0,0.0,0.0,0.0,0.419485,0.0,0.0,0.0,0.419485,0.0,0.0,0.0,0.419485,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.419485,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.17541,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.438079,0.0,0.0,0.0,0.504507,0.0,0.0,0.0,0.504507,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.210962,0.0,0.0,0.504507,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.227097,0.0,0.0,0.0,0.0,0.0,0.0,0.475736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243374,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.198931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.475736,0.0,0.0,0.475736,0.0,0.0,0.0,0.413096,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.190203,0.0,0.0,0.345936,0.0,0.0,0.0,0.300387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.691872,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.26807,0.0,0.289309,0.0,0.0,0.0,0.0,0.345936,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.42973,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.46053,0.0,0.300074,0.300074,0.0,0.300074,0.0,0.0,0.300074,0.0,0.0,0.0,0.0,0.250954,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.300074,0.0,0.300074,0.0,0.0
9,0.0,0.362513,0.0,0.0,0.0,0.0,0.417483,0.229541,0.362513,0.362513,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.398579,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.417483,0.0,0.0,0.0,0.0,0.213574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
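
In row 2 above (`jenis sel makhluk hidup`), `jenis` scores 0.72909 while `sel` scores only 0.351101, even though each appears once in that document. The difference comes from how many documents contain each term. The sketch below counts those document frequencies directly from the `corpus_id` list printed earlier; it splits on whitespace, which is assumed to give the same tokens as `TfidfVectorizer` for this particular corpus.

```python
from collections import Counter

# Copied from the corpus_id output above.
corpus_id = [
    'makhluk hidup objek kaji utama bidang biologi',
    'biologi seluler cabang biologi ajar struktur fungsi sel makhluk hidup',
    'jenis sel makhluk hidup',
    'sel istilah biologi unit dasar hidup',
    'inti sel kandung materi genetik bentuk dna',
    'gen instruksi simpan dna sel',
    'sel unit kecil susun tubuh makhluk hidup',
    'biologi kembang diferensiasi sel proses sel kembang fungsi spesifik',
    'makhluk hidup uniseluler makhluk hidup memilki sel tunggal multiseluler makhluk hidup organisme milik sel',
    'biokimia cabang biologi ajar kimia dasar hidup makhluk hidup',
    'studi biologi bantu paham kerja sistem makhluk hidup',
    'evolusi proses utama biologi ubah populasi makhluk hidup',
    'proses evolusi bentuk agam jenis sel dunia',
    'evolusi sel hasil organel kompleks mitokondria kloroplas',
    'teliti baru evolusi sel lanjut respons tekan seleksi lingkung ubah',
]

# Document frequency: number of documents that contain each term at least once.
doc_freq = Counter(term for doc in corpus_id for term in set(doc.split()))

print(len(doc_freq))  # 63 distinct terms, matching the 63 columns of the matrix
for term in ["jenis", "makhluk", "hidup", "sel"]:
    print(term, doc_freq[term])
```

`jenis` appears in 2 of the 15 documents and `sel` in 11, so the rarer term receives the larger IDF and therefore the higher weight in row 2.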


In [24]:
result_id = pd.concat([df_tf_idf_id, validate_id.teks], axis=1)
result_id.head(2)

Unnamed: 0,agam,ajar,bantu,baru,bentuk,bidang,biokimia,biologi,cabang,dasar,diferensiasi,dna,dunia,evolusi,fungsi,gen,genetik,hasil,hidup,instruksi,inti,istilah,jenis,kaji,kandung,kecil,kembang,kerja,kimia,kloroplas,kompleks,lanjut,lingkung,makhluk,materi,memilki,milik,mitokondria,multiseluler,objek,organel,organisme,paham,populasi,proses,respons,sel,seleksi,seluler,simpan,sistem,spesifik,struktur,studi,susun,tekan,teliti,tubuh,tunggal,ubah,uniseluler,unit,utama,teks
0,0.0,0.0,0.0,0.0,0.0,0.469019,0.0,0.257878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.223891,0.0,0.0,0.0,0.0,0.469019,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.239938,0.0,0.0,0.0,0.0,0.0,0.469019,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.407264,MAKHLUK HIDUP ADALAH OBJEK KAJIAN UTAMA DALAM BIDANG BIOLOGI.
1,0.0,0.350554,0.0,0.0,0.0,0.0,0.0,0.443938,0.350554,0.0,0.0,0.0,0.0,0.0,0.350554,0.0,0.0,0.0,0.192715,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.206528,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.168813,0.0,0.40371,0.0,0.0,0.0,0.40371,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Biologi seluler = cabang biologi yang mempelajari struktur dan fungsi sel-sel dalam makhluk hidup.


In [25]:
print(words_set_id)

['agam' 'ajar' 'bantu' 'baru' 'bentuk' 'bidang' 'biokimia' 'biologi'
 'cabang' 'dasar' 'diferensiasi' 'dna' 'dunia' 'evolusi' 'fungsi' 'gen'
 'genetik' 'hasil' 'hidup' 'instruksi' 'inti' 'istilah' 'jenis' 'kaji'
 'kandung' 'kecil' 'kembang' 'kerja' 'kimia' 'kloroplas' 'kompleks'
 'lanjut' 'lingkung' 'makhluk' 'materi' 'memilki' 'milik' 'mitokondria'
 'multiseluler' 'objek' 'organel' 'organisme' 'paham' 'populasi' 'proses'
 'respons' 'sel' 'seleksi' 'seluler' 'simpan' 'sistem' 'spesifik'
 'struktur' 'studi' 'susun' 'tekan' 'teliti' 'tubuh' 'tunggal' 'ubah'
 'uniseluler' 'unit' 'utama']


In [26]:
search_id = input("Enter the keyword to search for: ")
result_id[[search_id, "teks"]].sort_values(by=search_id, ascending=False)

Enter the keyword to search for:  agam


Unnamed: 0,agam,teks
12,0.457544,PROSES EVOLUSI TELAH MEMBENTUK BERAGAMNYA JENIS SEL YANG ADA DI DUNIA INI.
0,0.0,MAKHLUK HIDUP ADALAH OBJEK KAJIAN UTAMA DALAM BIDANG BIOLOGI.
1,0.0,Biologi seluler = cabang biologi yang mempelajari struktur dan fungsi sel-sel dalam makhluk hidup.
2,0.0,Apa Saja Jenis-Jenis Sel yang Ada pada Makhluk Hidup?
3,0.0,Sel merupakan istilah biologi untuk unit dasar dari kehidupan.
4,0.0,Inti sel mengandung materi genetik dalam bentuk DNA.
5,0.0,Gen adalah instruksi yang disimpan dalam DNA suatu sel.
6,0.0,Mengapa sel disebut sebagai unit terkecil yang menyusun tubuh makhluk hidup?
7,0.0,"Dalam biologi perkembangan, diferensiasi sel adalah proses di mana sel-sel mengembangkan fungsi spesifik."
8,0.0,"Makhluk hidup uniseluler adalah makhluk hidup yang hanya memilki sebuah sel tunggal, sedangkan multiseluler adalah makhluk hidup atau organisme yang memiliki lebih dari satu sel."
