Name     : **Muhamad Ilyas**  
Email    : **181240000831@unisnu.ac.id**  
Domicile : **Kabupaten Jepara, Jawa Tengah**

# **Post Recommendation Domain**

## **Project Overview: Post Recommendation System**

Posting is the act of publishing content to the internet, or more broadly, publication through online electronic media. As electronic media has developed, creating content has become easier, and a post can reach a large audience within a short time. Such content often becomes *trending*, that is, it attracts attention from many people.

A recommendation system is needed so that published content, article content in particular, is easy for readers to find according to their preferences.

This project aims to build a recommendation system for readers of article content, using a **Content-based Filtering** approach.

**Reference**:

[MDPI journal article, "Recommendation Systems: Algorithms, Challenges, Metrics, and Business Opportunities"](https://www.mdpi.com/2076-3417/10/21/7748)

## **Business Understanding**
When looking for something to read, many readers have difficulty finding other articles that match their interests. A recommendation system is needed to address this.

### **Problem Statements**
How can article content be recommended to readers according to their interests using machine learning techniques?

### **Goals**
The goal is to build an article recommendation system for readers based on the dataset files containing post data, user data, and view data.

### **Solution Statements**
To solve this problem, I propose a **Content-based Filtering** approach.


*   **Content-based Filtering** is a recommendation approach that recommends items similar to those the user liked in the past. This project recommends article content that matches each reader's preferences. Similarity here is based on the post's category.
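
The idea can be illustrated with a minimal sketch in plain Python. The titles, categories, and the helper `recommend_by_category` below are made up for illustration and are not part of the dataset or the notebook; the project itself uses TF-IDF and cosine similarity in the Model Development section.

```python
from collections import Counter

# Hypothetical catalogue: (title, category) pairs invented for this example.
catalogue = [
    ("Intro to Oils", "painting"),
    ("Brush Care", "painting"),
    ("Budget Basics", "banking"),
    ("Sketching Hands", "drawing"),
]

def recommend_by_category(viewed_titles, catalogue, k=2):
    """Return up to k unseen titles, ranked by how often the reader viewed their category."""
    category_of = dict(catalogue)
    # Count how often each category appears in the reader's history.
    counts = Counter(category_of[t] for t in viewed_titles if t in category_of)
    seen = set(viewed_titles)
    unseen = [(t, c) for t, c in catalogue if t not in seen]
    # sorted() is stable, so posts with equal counts keep catalogue order.
    ranked = sorted(unseen, key=lambda tc: counts[tc[1]], reverse=True)
    return [t for t, _ in ranked[:k]]

result = recommend_by_category(["Intro to Oils"], catalogue)
print(result)
```

A reader who viewed one painting post gets the remaining painting post ranked first; posts from categories the reader never viewed only fill the remaining slots.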

## **Data Understanding**
The dataset I use comes from [Kaggle "Post Recommendations Dataset"](https://www.kaggle.com/vatsalparsaniya/post-pecommendation?select=view_data.csv), which consists of three CSV files:
1. Users Data, containing 500 rows
1. Posts Data, containing 6000 rows
1. View Data, containing 71800 rows

The columns of each table are as follows:

**User Data**
- user_id : the user's ID
- first_name : the user's first name
- last_name : the user's last name
- gender : the user's gender
- avatar : the user's avatar
- city : the user's city
- academics : the user's education level

**Posts Data**
- post_id : the post's ID
- title : the post's title
- category : the post's category

**View Data**
- user_id : the user's ID
- post_id : the post's ID
- time_stamp : the time the user viewed the post

### **Loading Libraries**

**Mounting Google Drive**

In [None]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Import Library**

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse as sp
from sklearn.metrics import pairwise as pw
import seaborn as sns

In [None]:
!pip install lightfm

from lightfm import LightFM
from lightfm.evaluation import precision_at_k, recall_at_k, auc_score, reciprocal_rank

Collecting lightfm
  Downloading lightfm-1.16.tar.gz (310 kB)

### **Load Dataset**

**Reading the Dataset**

In [None]:
# read the CSV files
post = pd.read_csv('/content/drive/MyDrive/Dataset/Rekomendasi_Posting/post_data.csv')
user = pd.read_csv('/content/drive/MyDrive/Dataset/Rekomendasi_Posting/user_data.csv')
view = pd.read_csv('/content/drive/MyDrive/Dataset/Rekomendasi_Posting/view_data.csv')

print('Number of posts: ', len(post.post_id.unique()))
print('Number of users: ', len(user.user_id.unique()))
print('Number of users with views: ', len(view.user_id.unique()))

Number of posts:  6000
Number of users:  500
Number of users with views:  500


**Post**

In [None]:
post.head()

Unnamed: 0,title,category,post_id
0,Find A Quick Way To GRAPHIC,graphic,10260109
1,How To Sell CRAFT,Craft,39550285
2,POLITICS An Incredibly Easy Method That Works ...,politics,935118791
3,5 Brilliant Ways To Use POLITICAL,political,151805043
4,How To Make Your MATHEMATICS Look Amazing In ...,Mathematics,995833095


## **Data Preparation**

#### **Exploring the Post Variable**

**Viewing the Data**

In [None]:
# view the post variable
post

Unnamed: 0,title,category,post_id
0,Find A Quick Way To GRAPHIC,graphic,10260109
1,How To Sell CRAFT,Craft,39550285
2,POLITICS An Incredibly Easy Method That Works ...,politics,935118791
3,5 Brilliant Ways To Use POLITICAL,political,151805043
4,How To Make Your MATHEMATICS Look Amazing In ...,Mathematics,995833095
...,...,...,...
5995,Who Else Wants To Be Successful With PROGRAMMING,programming,815625033
5996,Avoid The Top 10 SCIENCE Mistakes,science,870247682
5997,7 and a Half Very Simple Things You Can Do To...,drawing,856393394
5998,Why Everything You Know About ZOOLOGY Is A Lie,zoology,152219066


**Checking Data Types**

In [None]:
# explore the post variable
post.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6000 entries, 0 to 5999
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   title     6000 non-null   object
 1   category  6000 non-null   object
 2   post_id   6000 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 140.8+ KB


**Describe data**

In [None]:
post.describe()

Unnamed: 0,post_id
count,6000.0
mean,501563400.0
std,287206900.0
min,10109920.0
25%,251386700.0
50%,497758400.0
75%,748732700.0
max,999953800.0


**Checking Categories**

In [None]:
# view the category entries in post
print('Number of posts: ', len(post.post_id.unique()))
print('Categories: ', post.category.unique())

Number of posts:  6000
Categories:  ['graphic' 'Craft' 'politics' 'political' 'Mathematics' 'zoology'
 'business' 'dance' 'banking' 'HR management' 'art' 'science' 'Music'
 'operating system' 'Fashion Design' 'programming' 'painting'
 'photography' 'drawing' 'GST']
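
The labels above mix capitalisation ('Craft', 'Music', 'GST' next to 'graphic', 'dance'). `TfidfVectorizer` lowercases text by default, so this does not affect the model built later, but a normalised copy can make grouping and filtering easier. A minimal sketch on a few of the labels printed above; the helper name `normalise_category` is an assumption, not part of the notebook:

```python
def normalise_category(label: str) -> str:
    """Trim surrounding whitespace and lowercase a category label."""
    return label.strip().lower()

# A few labels copied from the output above.
raw = ["graphic", "Craft", "HR management", "Fashion Design", "GST"]
clean = [normalise_category(c) for c in raw]
print(clean)
```

Note that this only unifies casing; near-synonyms such as 'politics' and 'political' would still be treated as separate categories.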


#### **Exploring the User Variable**

**Viewing the Data**

In [None]:
# view the columns of the user variable
user

Unnamed: 0,user_id,first_name,last_name,gender,avatar,city,academics
0,5eece14efc13ae6609000000,Milena,Lacelett,Female,https://robohash.org/quisidomnis.png?size=50x5...,Blagoveshchensk,undergraduate
1,5eece14efc13ae6609000001,Nolan,Satcher,Male,https://robohash.org/dignissimosrepudiandaedol...,Wufeng,undergraduate
2,5eece14efc13ae6609000002,Eveleen,Cotterell,Female,https://robohash.org/remomnissuscipit.png?size...,Barra Bonita,undergraduate
3,5eece14efc13ae6609000003,Petrina,Berr,Female,https://robohash.org/estquasconsectetur.png?si...,San Angelo,undergraduate
4,5eece14efc13ae6609000004,Saunderson,Duquesnay,Male,https://robohash.org/nullaaest.png?size=50x50&...,Olszówka,graduate
...,...,...,...,...,...,...,...
495,5eece14ffc13ae66090001ef,Jada,Capaldi,Female,https://robohash.org/optioperferendisnobis.png...,Pau,undergraduate
496,5eece14ffc13ae66090001f0,Robin,Kike,Male,https://robohash.org/voluptatemestenim.png?siz...,Komendantsky aerodrom,graduate
497,5eece14ffc13ae66090001f1,Gwenneth,Dally,Female,https://robohash.org/etnihilqui.png?size=50x50...,Łomża,graduate
498,5eece14ffc13ae66090001f2,Nickolas,McTrustram,Male,https://robohash.org/nonquiaut.png?size=50x50&...,Bani,graduate


**Checking Data Types**

In [None]:
user.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     500 non-null    object
 1   first_name  500 non-null    object
 2   last_name   500 non-null    object
 3   gender      500 non-null    object
 4   avatar      500 non-null    object
 5   city        500 non-null    object
 6   academics   500 non-null    object
dtypes: object(7)
memory usage: 27.5+ KB


**Variable Shape**

In [None]:
# shape of the user variable
print(user.shape)

(500, 7)


#### **Exploring the View Variable**

**Viewing the Data**

In [None]:
# view the columns of the view variable
view

Unnamed: 0,user_id,post_id,time_stamp
0,5eece14ffc13ae660900008b,136781766,01/01/2019 01:30 PM
1,5eece14efc13ae660900003c,43094523,01/01/2019 01:33 PM
2,5eece14efc13ae6609000025,42428071,01/01/2019 01:43 PM
3,5eece14ffc13ae66090001d4,76472880,01/01/2019 01:54 PM
4,5eece14ffc13ae66090000ac,202721843,01/01/2019 02:00 PM
...,...,...,...
71795,5eece14ffc13ae660900018c,615389604,12/31/2019 12:37 AM
71796,5eece14ffc13ae660900010c,348689108,12/31/2019 12:50 PM
71797,5eece14ffc13ae6609000190,619052165,12/31/2019 12:51 AM
71798,5eece14efc13ae6609000067,426384418,12/31/2019 12:51 PM


**Checking Data Types**

In [None]:
view.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71800 entries, 0 to 71799
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     71800 non-null  object
 1   post_id     71800 non-null  int64 
 2   time_stamp  71800 non-null  object
dtypes: int64(1), object(2)
memory usage: 1.6+ MB
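
`view.info()` reports `time_stamp` as `object`, meaning the values are stored as strings. If time-based analysis is needed, they could be parsed into datetimes. A sketch, assuming the format seen in the sample rows (month/day/year with a 12-hour clock and an AM/PM marker); this step is not part of the original notebook:

```python
import pandas as pd

# Two sample values copied from the view table above.
sample = pd.Series(["01/01/2019 01:30 PM", "12/31/2019 12:37 AM"])

# Assumed format: month/day/year, 12-hour clock with AM/PM marker.
parsed = pd.to_datetime(sample, format="%m/%d/%Y %I:%M %p")
hours = parsed.dt.hour.tolist()
print(parsed.dtype, hours)
```

Passing an explicit `format` avoids ambiguity between day-first and month-first dates such as 01/02/2019.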


**Describe data**

In [None]:
# view summary statistics for view
view.describe()

Unnamed: 0,post_id
count,71800.0
mean,500265200.0
std,286864000.0
min,10109920.0
25%,250536000.0
50%,497671600.0
75%,747250000.0
max,999953800.0


**Variable Shape**

In [None]:
# shape of the view variable
print(view.shape)

(71800, 3)


In [None]:
# count the users and posts that appear in view
print('Number of user_id: ', len(view.user_id.unique()))
print('Number of post_id: ', len(view.post_id.unique()))
print('Number of views: ', len(view.time_stamp))

Number of user_id:  500
Number of post_id:  6000
Number of views:  71800


### **Joining the user, view, and post Files**

In [None]:
# combine all user_id values from user and view into one array
user_view = np.concatenate((
    user.user_id.unique(),
    view.user_id.unique()
))

# sort the values and remove duplicates
user_view = np.sort(np.unique(user_view))

print('Total number of users: ', len(user_view))

Total number of users:  500


In [None]:
# merge the view and post dataframes on post_id
posting = pd.merge(view, post, on='post_id', how='left')
posting

Unnamed: 0,user_id,post_id,time_stamp,title,category
0,5eece14ffc13ae660900008b,136781766,01/01/2019 01:30 PM,Sexy BANKING,banking
1,5eece14efc13ae660900003c,43094523,01/01/2019 01:33 PM,10 Ways To Immediately Start Selling PROGRAMMING,programming
2,5eece14efc13ae6609000025,42428071,01/01/2019 01:43 PM,DRAWING Adventures,drawing
3,5eece14ffc13ae66090001d4,76472880,01/01/2019 01:54 PM,The Ultimate Guide To POLITICS,politics
4,5eece14ffc13ae66090000ac,202721843,01/01/2019 02:00 PM,ZOOLOGY And Love Have 4 Things In Common,zoology
...,...,...,...,...,...
71795,5eece14ffc13ae660900018c,615389604,12/31/2019 12:37 AM,5 Brilliant Ways To Teach Your Audience About ...,operating system
71796,5eece14ffc13ae660900010c,348689108,12/31/2019 12:50 PM,The Secrets To Finding World Class Tools For ...,GST
71797,5eece14ffc13ae6609000190,619052165,12/31/2019 12:51 AM,Double Your Profit With These 5 Tips on CRAFT,Craft
71798,5eece14efc13ae6609000067,426384418,12/31/2019 12:51 PM,It's All About (The) DANCE,dance


**Aggregating Posts per User (Merged Files)**

In [None]:
# group by user_id; note that sum() adds up post_id values, it does not count views
posting.groupby('user_id').sum()

Unnamed: 0_level_0,post_id
user_id,Unnamed: 1_level_1
5eece14efc13ae6609000000,18681567067
5eece14efc13ae6609000001,88180254335
5eece14efc13ae6609000002,1622785879
5eece14efc13ae6609000003,74019301414
5eece14efc13ae6609000004,53991518779
...,...
5eece14ffc13ae66090001ef,93485019069
5eece14ffc13ae66090001f0,130604286210
5eece14ffc13ae66090001f1,56615051425
5eece14ffc13ae66090001f2,7908529474
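
Summing `post_id` adds identifier values together, so the numbers above are not view counts. If the goal is the number of views per user, counting rows per group is one option. A sketch on a tiny made-up frame (the IDs below are placeholders, not dataset values):

```python
import pandas as pd

# Made-up stand-in for the merged `posting` frame.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "post_id": [101, 102, 101],
})

# size() counts rows per user; nunique() counts distinct posts per user.
views_per_user = toy.groupby("user_id").size()
posts_per_user = toy.groupby("user_id")["post_id"].nunique()
print(views_per_user.to_dict(), posts_per_user.to_dict())
```

The same two calls could be applied to `posting` directly; the results are not shown here because they were not computed in the original notebook.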


### **Handling Missing Values**

In [None]:
# check for missing values
posting.isnull().sum()

user_id       0
post_id       0
time_stamp    0
title         0
category      0
dtype: int64

In [None]:
# merge posting with user on user_id
posting_user = pd.merge(posting, user[['user_id','first_name']], on='user_id', how='left')

posting_user

Unnamed: 0,user_id,post_id,time_stamp,title,category,first_name
0,5eece14ffc13ae660900008b,136781766,01/01/2019 01:30 PM,Sexy BANKING,banking,Hollie
1,5eece14efc13ae660900003c,43094523,01/01/2019 01:33 PM,10 Ways To Immediately Start Selling PROGRAMMING,programming,Reinaldos
2,5eece14efc13ae6609000025,42428071,01/01/2019 01:43 PM,DRAWING Adventures,drawing,Jamison
3,5eece14ffc13ae66090001d4,76472880,01/01/2019 01:54 PM,The Ultimate Guide To POLITICS,politics,Herschel
4,5eece14ffc13ae66090000ac,202721843,01/01/2019 02:00 PM,ZOOLOGY And Love Have 4 Things In Common,zoology,Fabien
...,...,...,...,...,...,...
71795,5eece14ffc13ae660900018c,615389604,12/31/2019 12:37 AM,5 Brilliant Ways To Teach Your Audience About ...,operating system,Godfry
71796,5eece14ffc13ae660900010c,348689108,12/31/2019 12:50 PM,The Secrets To Finding World Class Tools For ...,GST,Christabel
71797,5eece14ffc13ae6609000190,619052165,12/31/2019 12:51 AM,Double Your Profit With These 5 Tips on CRAFT,Craft,Bobbe
71798,5eece14efc13ae6609000067,426384418,12/31/2019 12:51 PM,It's All About (The) DANCE,dance,Reagen


In [None]:
# check for missing values
posting_user.isnull().sum()

user_id       0
post_id       0
time_stamp    0
title         0
category      0
first_name    0
dtype: int64

### **Re-checking the Data**

**Sorting Posts by User ID**

In [None]:
# sort posting_user by user_id
fix_post = posting_user.sort_values('user_id', ascending=True)

fix_post

Unnamed: 0,user_id,post_id,time_stamp,title,category,first_name
35655,5eece14efc13ae6609000000,463497729,05/02/2020 09:49 PM,DRAWING Expert Interview,drawing,Milena
54621,5eece14efc13ae6609000000,248322316,09/05/2019 03:43 AM,Best 50 Tips For POLITICS,politics,Milena
13822,5eece14efc13ae6609000000,426579591,02/18/2019 02:32 PM,Turn Your BANKING Into A High Performing Machine,banking,Milena
21189,5eece14efc13ae6609000000,496086394,03/15/2019 04:21 AM,Find Out How I Cured My PAINTING In 2 Days,painting,Milena
58759,5eece14efc13ae6609000000,776624330,10/02/2019 11:17 AM,How GST Made Me A Better Salesperson,GST,Milena
...,...,...,...,...,...,...
19572,5eece14ffc13ae66090001f3,350848098,03/09/2019 08:35 AM,Secrets To Getting SCIENCE To Complete Tasks ...,science,Hewie
2891,5eece14ffc13ae66090001f3,335868578,01/11/2019 03:20 AM,Interesting Facts I Bet You Never Knew About ...,Mathematics,Hewie
60262,5eece14ffc13ae66090001f3,315719214,10/13/2019 05:01 AM,3 Ways Create Better MATHEMATICS With The Hel...,Mathematics,Hewie
70068,5eece14ffc13ae66090001f3,230418952,12/19/2019 11:31 PM,Is GST Worth [$] To You?,GST,Hewie


**Checking the Number of Users**

In [None]:
# count the unique users in fix_post
len(fix_post.user_id.unique())

500

**Checking Unique Post Categories**

In [None]:
# check the unique post categories
fix_post.category.unique()

array(['drawing', 'politics', 'banking', 'painting', 'GST', 'Music',
       'science', 'Craft', 'operating system', 'art', 'photography',
       'Fashion Design', 'business', 'programming', 'political',
       'graphic', 'HR management', 'Mathematics', 'zoology', 'dance'],
      dtype=object)

**Checking a Category**

In [None]:
# filter the rows of one category
fix_post[fix_post['category'] == 'science']

Unnamed: 0,user_id,post_id,time_stamp,title,category,first_name
64898,5eece14efc13ae6609000000,373489370,11/14/2019 06:31 AM,The SCIENCE Mystery Revealed,science,Milena
16100,5eece14efc13ae6609000001,482013078,02/25/2020 05:01 AM,Want A Thriving Business? Focus On SCIENCE!,science,Nolan
26801,5eece14efc13ae6609000001,477559082,04/02/2020 11:39 AM,Apply These 5 Secret Techniques To Improve SC...,science,Nolan
35921,5eece14efc13ae6609000001,628428739,05/03/2020 08:24 AM,How To Save Money with SCIENCE?,science,Nolan
43690,5eece14efc13ae6609000001,512937025,06/23/2019 04:15 PM,Can You Really Find SCIENCE (on the Web)?,science,Nolan
...,...,...,...,...,...,...
40372,5eece14ffc13ae66090001f3,837953527,05/31/2019 11:47 AM,You Will Thank Us - 10 Tips About SCIENCE You...,science,Hewie
66108,5eece14ffc13ae66090001f3,429195097,11/22/2019 11:10 AM,Got Stuck? Try These Tips To Streamline Your ...,science,Hewie
51249,5eece14ffc13ae66090001f3,101134061,08/13/2019 03:52 PM,3 Ways To Master SCIENCE Without Breaking A S...,science,Hewie
43558,5eece14ffc13ae66090001f3,181078873,06/22/2019 05:19 AM,Who Else Wants To Enjoy SCIENCE,science,Hewie


**Creating the preparation Variable**

In [None]:
# create the preparation variable
preparation = fix_post
preparation.sort_values('user_id')

Unnamed: 0,user_id,post_id,time_stamp,title,category,first_name
35655,5eece14efc13ae6609000000,463497729,05/02/2020 09:49 PM,DRAWING Expert Interview,drawing,Milena
32889,5eece14efc13ae6609000000,152382065,04/23/2020 05:44 PM,Some People Excel At GRAPHIC And Some Don't -...,graphic,Milena
12053,5eece14efc13ae6609000000,356832954,02/12/2019 04:36 PM,Revolutionize Your PHOTOGRAPHY With These Eas...,photography,Milena
42881,5eece14efc13ae6609000000,621229618,06/17/2019 09:04 PM,How We Improved Our PROGRAMMING In One Week(M...,programming,Milena
28266,5eece14efc13ae6609000000,871854229,04/07/2020 05:16 PM,14 Days To A Better FASHION DESIGN,Fashion Design,Milena
...,...,...,...,...,...,...
49799,5eece14ffc13ae66090001f3,577855799,08/03/2019 08:00 PM,Is OPERATING SYSTEM Worth [$] To You?,operating system,Hewie
46718,5eece14ffc13ae66090001f3,851484973,07/13/2019 10:24 AM,Answered: Your Most Burning Questions About DR...,drawing,Hewie
43747,5eece14ffc13ae66090001f3,185587418,06/23/2019 09:12 PM,The Death Of MUSIC And How To Avoid It,Music,Hewie
10492,5eece14ffc13ae66090001f3,809678544,02/06/2020 10:14 PM,5 Romantic FASHION DESIGN Ideas,Fashion Design,Hewie


**Dropping Duplicate Data**

In [None]:
# drop duplicate rows by user_id (keeps the first row for each user)
preparation = preparation.drop_duplicates('user_id')
preparation

Unnamed: 0,user_id,post_id,time_stamp,title,category,first_name
35655,5eece14efc13ae6609000000,463497729,05/02/2020 09:49 PM,DRAWING Expert Interview,drawing,Milena
51708,5eece14efc13ae6609000001,68297155,08/16/2019 05:02 AM,How To Handle Every DANCE Challenge With Ease...,dance,Nolan
26931,5eece14efc13ae6609000002,712376577,04/03/2019 10:56 AM,Got Stuck? Try These Tips To Streamline Your ...,political,Eveleen
41760,5eece14efc13ae6609000003,416785331,06/09/2019 10:27 AM,Sick And Tired Of Doing ART The Old Way? Read...,art,Petrina
51120,5eece14efc13ae6609000004,673865778,08/12/2019 05:07 AM,What Make PROGRAMMING Don't Want You To Know,programming,Saunderson
...,...,...,...,...,...,...
50230,5eece14ffc13ae66090001ef,253404881,08/06/2019 06:31 PM,Interesting Facts I Bet You Never Knew About ...,programming,Jada
22349,5eece14ffc13ae66090001f0,964033517,03/19/2019 04:31 AM,How To Sell BANKING,banking,Robin
29141,5eece14ffc13ae66090001f1,709925436,04/10/2020 06:06 PM,How To Become Better With ART In 10 Minutes,art,Gwenneth
22318,5eece14ffc13ae66090001f2,944738783,03/19/2019 01:16 AM,Secrets To POLITICAL – Even In This Down Economy,political,Nickolas
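
`drop_duplicates('user_id')` keeps only the first row it encounters for each user, so every reader ends up represented by a single viewed post. A sketch with made-up rows showing this behaviour, plus one possible alternative (an assumption, not used in this notebook) that joins all of a user's categories into one string:

```python
import pandas as pd

# Made-up rows standing in for the sorted view history.
toy = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "category": ["drawing", "politics", "dance"],
})

# Default keep='first': one row per user, the first in the current order.
first_only = toy.drop_duplicates("user_id")

# Hypothetical alternative: aggregate every viewed category per user.
profile = toy.groupby("user_id")["category"].agg(" ".join)
print(first_only["category"].tolist(), profile.to_dict())
```

With the first approach, user `u1` is described only by 'drawing'; the aggregated profile would retain 'politics' as well.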


**Converting Series to Lists**

In [None]:
# convert each series to a list
posting_id = preparation['user_id'].tolist()

posting_title = preparation['title'].tolist()

posting_category = preparation['category'].tolist()

posting_name = preparation['first_name'].tolist()

print(len(posting_id))
print(len(posting_title))
print(len(posting_category))
print(len(posting_name))

500
500
500
500


**Combining id, title, category, and name into One DataFrame**

In [None]:
# build a DataFrame from a dictionary of the lists
post_new = pd.DataFrame({
    'id': posting_id,
    'title': posting_title,
    'category': posting_category,
    'name': posting_name
})
post_new

Unnamed: 0,id,title,category,name
0,5eece14efc13ae6609000000,DRAWING Expert Interview,drawing,Milena
1,5eece14efc13ae6609000001,How To Handle Every DANCE Challenge With Ease...,dance,Nolan
2,5eece14efc13ae6609000002,Got Stuck? Try These Tips To Streamline Your ...,political,Eveleen
3,5eece14efc13ae6609000003,Sick And Tired Of Doing ART The Old Way? Read...,art,Petrina
4,5eece14efc13ae6609000004,What Make PROGRAMMING Don't Want You To Know,programming,Saunderson
...,...,...,...,...
495,5eece14ffc13ae66090001ef,Interesting Facts I Bet You Never Knew About ...,programming,Jada
496,5eece14ffc13ae66090001f0,How To Sell BANKING,banking,Robin
497,5eece14ffc13ae66090001f1,How To Become Better With ART In 10 Minutes,art,Gwenneth
498,5eece14ffc13ae66090001f2,Secrets To POLITICAL – Even In This Down Economy,political,Nickolas


## **Model Development with Content-based Filtering**

### **TF-IDF Vectorizer**

In [None]:
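# build the TF-IDF vectorizer
tf = TfidfVectorizer()

tf.fit(post_new['category'])

# list the learned vocabulary; on scikit-learn >= 1.2 use get_feature_names_out() instead
tf.get_feature_names()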

['art',
 'banking',
 'business',
 'craft',
 'dance',
 'design',
 'drawing',
 'fashion',
 'graphic',
 'gst',
 'hr',
 'management',
 'mathematics',
 'music',
 'operating',
 'painting',
 'photography',
 'political',
 'politics',
 'programming',
 'science',
 'system',
 'zoology']

In [None]:
# fit and transform the categories into a matrix
tfidf_matrix = tf.fit_transform(post_new['category'])

tfidf_matrix.shape

(500, 23)

In [None]:
# convert the sparse TF-IDF vectors to a dense matrix
tfidf_matrix.todense()

matrix([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]])

In [None]:
# view the TF-IDF matrix with post titles as rows and category terms as columns
pd.DataFrame(
    tfidf_matrix.todense(),
    columns=tf.get_feature_names(),
    index=post_new.title
).sample(23, axis=1).sample(10, axis=0)

Unnamed: 0_level_0,business,science,art,painting,craft,fashion,zoology,music,photography,political,operating,gst,system,management,programming,hr,dance,graphic,design,mathematics,banking,drawing,politics
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1
CRAFT: The Samurai Way,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Fear? Not If You Use POLITICAL The Right Way!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Top 25 Quotes On POLITICS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
Why Ignoring MUSIC Will Cost You Time and Sales,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Why Most DANCE Fail,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Proof That POLITICAL Is Exactly What You Are Looking For,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5 Things To Do Immediately About MATHEMATICS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
How To Win Buyers And Influence Sales with DRAWING,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
The Secret of Successful MATHEMATICS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
Short Story: The Truth About BANKING,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
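
Every sampled row above contains a single 1.0 because each of those categories is one word, and a one-term vector normalised to unit length is always 1. Two-word categories such as 'Fashion Design' split their weight across both terms. The sketch below re-implements the weighting by hand to show this, assuming scikit-learn's default settings (smoothed IDF `ln((1 + n) / (1 + df)) + 1` followed by L2 normalisation) and simple whitespace tokenisation; the function `tfidf_rows` is illustrative and not part of the notebook:

```python
import math

def tfidf_rows(docs):
    """Compute L2-normalised TF-IDF rows for lowercased, whitespace-split docs."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    # Document frequency: number of docs containing each term.
    df = {t: sum(t in doc for doc in tokenised) for t in vocab}
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenised:
        weights = [doc.count(t) * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in weights))
        rows.append([w / norm for w in weights])
    return vocab, rows

vocab, rows = tfidf_rows(["painting", "Fashion Design", "painting"])
print(vocab)
print([round(w, 3) for w in rows[0]], [round(w, 3) for w in rows[1]])
```

The 'painting' row carries all of its weight on one term, while the 'Fashion Design' row carries about 0.707 on each of 'fashion' and 'design'.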


### **Cosine Similarity**

In [None]:
# compute the cosine similarity (degree of similarity)
cosine = cosine_similarity(tfidf_matrix)
cosine

array([[1., 0., 0., ..., 0., 0., 1.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 1., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 1., 0.],
       [1., 0., 0., ..., 0., 0., 1.]])

In [None]:
# build a dataframe from the cosine variable, labelled by user name
cosine_df = pd.DataFrame(cosine, index=post_new['name'], columns=post_new['name'])
print('Shape:', cosine_df.shape)

cosine_df.sample(10, axis=1).sample(15, axis=0)

Shape: (500, 500)


name,Beitris,Philippa,Brew,Carri,Baily,Gauthier,Laraine,Carney,Kirby,Trevar
name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
Michal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Carri,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Darin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Krishnah,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hermon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
Lezlie,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
Edwin,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Trace,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
Hunter,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Chase,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
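
Cosine similarity is the dot product of two vectors divided by the product of their lengths. Because none of the 20 categories shares a term with another, two users score 1.0 when their kept posts have the same category and 0.0 otherwise, which matches the 0.0/1.0 pattern in the sample above. A minimal sketch with hand-written vectors (the vectors and the `cosine` helper are illustrative, not taken from the notebook):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Columns stand for the terms: design, fashion, painting.
painting = [0.0, 0.0, 1.0]              # one-term category
fashion_design = [0.7071, 0.7071, 0.0]  # two-term category

same = cosine(painting, painting)
different = cosine(painting, fashion_design)
print(same, different)
```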


### **Getting Recommendations**

In [None]:
def post_recommendations(nama, similarity_data=cosine_df, items=post_new[['name', 'category', 'title']], k=5):
    """Return up to k posts from the users most similar to the user named `nama`."""
    # Partition the similarity column so the largest values sit at the end
    index = similarity_data.loc[:,nama].to_numpy().argpartition(
        range(-1, -k, -1))
    
    # Take the names with the highest similarity from the partitioned index
    closest = similarity_data.columns[index[-1:-(k+2):-1]]
    
    # Drop the queried name so it does not appear in its own recommendations
    closest = closest.drop(nama, errors='ignore')
 
    return pd.DataFrame(closest).merge(items).head(k)

In [None]:
# look up the post kept for the user named 'Lenora'
post_new[post_new.name.eq('Lenora')]

Unnamed: 0,id,title,category,name
318,5eece14ffc13ae660900013e,17 Tricks About PAINTING You Wish You Knew Be...,painting,Lenora


**Getting the Top Recommendations**

In [None]:
# get post recommendations for 'Lenora'
post_recommendations('Lenora')

Unnamed: 0,name,category,title
0,Bethanne,painting,5 Reasons PAINTING Is A Waste Of Time
1,Bethanne,politics,Proof That POLITICS Is Exactly What You Are L...
2,Bethanne,Fashion Design,5 Best Ways To Sell FASHION DESIGN
3,Onfre,painting,PAINTING And Love Have 4 Things In Common
4,Gian,painting,The PAINTING Mystery Revealed
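
In the output above, 'Bethanne' appears three times with three different categories. One likely explanation is that `first_name` is not unique across the 500 users, so merging on `name` returns every user who shares that name, including ones whose category differs from the query. A sketch with made-up rows that reproduces the effect and shows one possible alternative, keying on the unique `id` instead (an assumption, not what the notebook does):

```python
import pandas as pd

# Made-up stand-in for post_new with a repeated first name.
items = pd.DataFrame({
    "id": ["a", "b", "c"],
    "name": ["Bethanne", "Bethanne", "Onfre"],
    "category": ["painting", "politics", "painting"],
})

# Merging on a non-unique name pulls in every user with that name.
by_name = pd.DataFrame({"name": ["Bethanne"]}).merge(items)

# Merging on the unique id returns exactly the intended row.
by_id = pd.DataFrame({"id": ["a"]}).merge(items)
print(len(by_name), len(by_id))
```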


## **Model Evaluation**

**Evaluating the Content-based Filtering Model**

This project uses precision as its evaluation metric. Precision measures the share of recommended items that are relevant, treating each recommendation as a binary outcome: relevant or not relevant. In the recommendations for 'Lenora' above, 3 of the 5 recommended posts share the category of her post (painting). The formula for precision is as follows:
![Precision](https://dicoding-web-img.sgp1.cdn.digitaloceanspaces.com/original/academy/dos:819311f78d87da1e0fd8660171fa58e620211012160253.png)
```
Precision = 3/5
So the precision is 60%
```
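
The calculation can be reproduced from the recommendation table for 'Lenora', whose own post is in the `painting` category. The sketch below treats a recommendation as relevant when its category matches the query's category; the helper name `precision_at_k` is an assumption and not part of the notebook:

```python
def precision_at_k(recommended, relevant_category):
    """Share of recommended items whose category equals the query category."""
    hits = sum(1 for c in recommended if c == relevant_category)
    return hits / len(recommended)

# Categories of the five recommendations returned for 'Lenora'.
recommended = ["painting", "politics", "Fashion Design", "painting", "painting"]
score = precision_at_k(recommended, "painting")
print(score)
```

This is a single-query measurement; averaging the same score over many users would give a more representative picture of the model.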
