
# Google Play Store App Recommendations
An application is software containing a program that performs specific functions to serve a user's goals. Applications are available on various operating systems, such as iOS and Android. Google Play Store is the official app store for Android, and it offers many applications that support users' daily activities across categories such as Game for entertainment, Education for learning, and many others. With so many apps available, users are often not satisfied with just one: they want to try others, but ones similar to the app they already use. For example, someone who has grown bored of playing the same Game app may want to try a different Game app. For that reason, this notebook builds a Google Play Store app recommendation system that recommends apps similar to one the user has already used.

## Business Understanding
The goal is to build a system that recommends apps similar to one the user has already used. The system uses a Content Based Filtering approach because the recommendations are derived from the apps the user has used.

### Problem Statement
- How can the system recommend apps that are similar to an app the user has already used?

### Goal
- Provide recommendations for apps that are similar to an app the user has already used.

### Solution approach
To build the app recommendation system, the technique used is **Content Based Filtering**. Content Based Filtering is a recommendation approach that recommends items similar to items the user has liked or used before. In this case, the system recommends apps similar to a previously used app based on the app's category. An advantage of this approach is that it can recommend brand-new items, even ones that nobody has *rated* yet. Its drawback is that it needs data about the items a user has used, that is, the user's interaction history with the recommendation system.

## Data Understanding

Required libraries

In [None]:
import os
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from pathlib import Path
import matplotlib.pyplot as plt


### Data Loading
Load the dataset from [Google Play Store Apps](https://www.kaggle.com/lava18/google-play-store-apps).

In [None]:
# Kaggle API credentials: supply your own key and never commit a real one
os.environ['KAGGLE_USERNAME'] = "niatanggang"
os.environ['KAGGLE_KEY'] = "<your-kaggle-api-key>"

!kaggle datasets download -d lava18/google-play-store-apps

!unzip -q google-play-store-apps -d .
os.listdir('/content/')

Downloading google-play-store-apps.zip to /content
  0% 0.00/1.94M [00:00<?, ?B/s]
100% 1.94M/1.94M [00:00<00:00, 65.3MB/s]


['.config',
 'googleplaystore.csv',
 'license.txt',
 'googleplaystore_user_reviews.csv',
 'google-play-store-apps.zip',
 'sample_data']

The file used is **googleplaystore.csv**.

In [None]:
df_app = pd.read_csv('/content/googleplaystore.csv')

df_app

|  | App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10836 | Sya9a Maroc - FR | FAMILY | 4.5 | 38 | 53M | 5,000+ | Free | 0 | Everyone | Education | July 25, 2017 | 1.48 | 4.1 and up |
| 10837 | Fr. Mike Schmitz Audio Teachings | FAMILY | 5.0 | 4 | 3.6M | 100+ | Free | 0 | Everyone | Education | July 6, 2018 | 1.0 | 4.1 and up |
| 10838 | Parkinson Exercices FR | MEDICAL | NaN | 3 | 9.5M | 1,000+ | Free | 0 | Everyone | Medical | January 20, 2017 | 1.0 | 2.2 and up |
| 10839 | The SCP Foundation DB fr nn5n | BOOKS_AND_REFERENCE | 4.5 | 114 | Varies with device | 1,000+ | Free | 0 | Mature 17+ | Books & Reference | January 19, 2015 | Varies with device | Varies with device |


### App Variable Description (df_app)

In [None]:
df_app.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


From the info above, googleplaystore.csv has 13 features: 1 of type float and 12 of type object. However, only the **App** and **Category** features are used here to build the content-based recommendation system.

In [None]:
df_app = df_app[['App', 'Category']]
df_app.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   App       10841 non-null  object
 1   Category  10841 non-null  object
dtypes: object(2)
memory usage: 169.5+ KB


The dataset contains 10841 rows.

In [None]:
df_app.shape

(10841, 2)

## Data Preparation

### Missing Values

Check whether the data contains any missing values.

In [None]:
df_app.isnull().sum()

App         0
Category    0
dtype: int64

Display the number of app categories. The data contains 34 categories.

In [None]:
print("Number of categories : ", len(df_app.Category.unique()))
print("App categories : ", df_app.Category.unique())

Number of categories :  34
App categories :  ['ART_AND_DESIGN' 'AUTO_AND_VEHICLES' 'BEAUTY' 'BOOKS_AND_REFERENCE'
 'BUSINESS' 'COMICS' 'COMMUNICATION' 'DATING' 'EDUCATION' 'ENTERTAINMENT'
 'EVENTS' 'FINANCE' 'FOOD_AND_DRINK' 'HEALTH_AND_FITNESS' 'HOUSE_AND_HOME'
 'LIBRARIES_AND_DEMO' 'LIFESTYLE' 'GAME' 'FAMILY' 'MEDICAL' 'SOCIAL'
 'SHOPPING' 'PHOTOGRAPHY' 'SPORTS' 'TRAVEL_AND_LOCAL' 'TOOLS'
 'PERSONALIZATION' 'PRODUCTIVITY' 'PARENTING' 'WEATHER' 'VIDEO_PLAYERS'
 'NEWS_AND_MAGAZINES' 'MAPS_AND_NAVIGATION' '1.9']


The output above shows a Category value of '1.9'. The next step inspects which apps have the Category 1.9.

In [None]:
df_app[df_app['Category'] == '1.9']

|  | App | Category |
|---|---|---|
| 10472 | Life Made WI-Fi Touchscreen Photo Frame | 1.9 |
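
A stray label such as '1.9' can also be caught without reading the list by eye, by checking every value against the naming pattern the other categories follow. The sketch below is only an illustration: the helper `find_malformed_categories` and the pattern (uppercase words joined by underscores) are assumptions inferred from the valid labels printed above, and a short hand-written list stands in for `df_app['Category']`.

```python
import re

# Assumed pattern: uppercase words joined by underscores, e.g. ART_AND_DESIGN.
CATEGORY_PATTERN = re.compile(r"[A-Z]+(?:_[A-Z]+)*")


def find_malformed_categories(categories):
    """Return the distinct labels that do not match the assumed pattern."""
    return sorted({c for c in categories if not CATEGORY_PATTERN.fullmatch(c)})


# Hand-written sample standing in for df_app['Category'].
sample = ["ART_AND_DESIGN", "GAME", "FAMILY", "1.9", "TOOLS", "GAME"]
malformed = find_malformed_categories(sample)
print(malformed)
```

Because the helper only iterates over its argument, passing the `Category` column itself should work the same way.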


Only one app has the category 1.9. Because it is a single row, it is removed.

In [None]:
# Filter the row out instead of dropping in place; this avoids pandas' SettingWithCopyWarning on the sliced DataFrame
df_app = df_app[df_app.Category != '1.9']
print("Number of categories : ", len(df_app.Category.unique()))
print("App categories : ", df_app.Category.unique())

Number of categories :  33
App categories :  ['ART_AND_DESIGN' 'AUTO_AND_VEHICLES' 'BEAUTY' 'BOOKS_AND_REFERENCE'
 'BUSINESS' 'COMICS' 'COMMUNICATION' 'DATING' 'EDUCATION' 'ENTERTAINMENT'
 'EVENTS' 'FINANCE' 'FOOD_AND_DRINK' 'HEALTH_AND_FITNESS' 'HOUSE_AND_HOME'
 'LIBRARIES_AND_DEMO' 'LIFESTYLE' 'GAME' 'FAMILY' 'MEDICAL' 'SOCIAL'
 'SHOPPING' 'PHOTOGRAPHY' 'SPORTS' 'TRAVEL_AND_LOCAL' 'TOOLS'
 'PERSONALIZATION' 'PRODUCTIVITY' 'PARENTING' 'WEATHER' 'VIDEO_PLAYERS'
 'NEWS_AND_MAGAZINES' 'MAPS_AND_NAVIGATION']


## Modeling : Content Based Filtering
An app recommendation system based on similarity to a previously used app.

### TF-IDF Vectorizer

Use `TfidfVectorizer()` to build a feature representation of each app's Category.

In [None]:
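from sklearn.feature_extraction.text import TfidfVectorizer
 
# Initialise the TfidfVectorizer
# Note: the name tf shadows the earlier `import tensorflow as tf`
tf = TfidfVectorizer()
 
# Compute the IDF weights on the Category column
tf.fit(df_app['Category']) 
 
# List the feature names in feature-index order
# (get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it)
tf.get_feature_names_out().tolist()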

['art_and_design',
 'auto_and_vehicles',
 'beauty',
 'books_and_reference',
 'business',
 'comics',
 'communication',
 'dating',
 'education',
 'entertainment',
 'events',
 'family',
 'finance',
 'food_and_drink',
 'game',
 'health_and_fitness',
 'house_and_home',
 'libraries_and_demo',
 'lifestyle',
 'maps_and_navigation',
 'medical',
 'news_and_magazines',
 'parenting',
 'personalization',
 'photography',
 'productivity',
 'shopping',
 'social',
 'sports',
 'tools',
 'travel_and_local',
 'video_players',
 'weather']

Fit the vectorizer and transform the data into a matrix.

In [None]:
tfidf_matrix = tf.fit_transform(df_app['Category']) 
tfidf_matrix.shape 

(10840, 33)

Convert the TF-IDF vectors into a dense matrix with `todense()`.

In [None]:
tfidf_matrix.todense()

matrix([[1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]])
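
Every entry in this matrix is exactly 0 or 1, which follows from the input: each Category value is a single token (underscores count as word characters), so each row has one nonzero weight, and L2 normalisation rescales that weight to 1.0. The sketch below reproduces this by hand on a toy list. It assumes scikit-learn's default weighting (smoothed IDF, raw term counts, L2 row normalisation); the function `tfidf_rows` is a hypothetical helper, not part of any library.

```python
import math
import re
from collections import Counter

# Same shape as scikit-learn's default token pattern: words of 2+ characters.
TOKEN = re.compile(r"\b\w\w+\b")


def tfidf_rows(documents):
    """Hand-rolled TF-IDF with smoothed IDF and L2 row normalisation."""
    tokenised = [TOKEN.findall(doc.lower()) for doc in documents]
    vocabulary = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocabulary, rows


# Toy stand-in for df_app['Category'].
categories = ["ART_AND_DESIGN", "GAME", "ART_AND_DESIGN", "FAMILY"]
vocabulary, rows = tfidf_rows(categories)
print(vocabulary)
for row in rows:
    print(row)
```

With one token per document, the IDF weight cancels out during normalisation, so the result is effectively a one-hot encoding of the Category.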

Build a DataFrame to inspect the TF-IDF matrix.
- Columns contain the Category
- Rows contain the app name

A matrix value of 1.0 means the app in that row belongs to the Category named by the column. For example, the app *Palace Pets in Whisker Haven* belongs to the *family* Category.

In [None]:
pd.DataFrame(
    tfidf_matrix.todense(), 
    columns=tf.get_feature_names_out(),  # get_feature_names() was removed in scikit-learn 1.2
    index=df_app.App
).sample(8, axis=1).sample(10, axis=0)

| App | finance | entertainment | medical | news_and_magazines | family | tools | lifestyle | house_and_home |
|---|---|---|---|---|---|---|---|---|
| FindLoving | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Chinese Chess / Co Tuong | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| D. H. Lawrence Poems FREE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Remote For Pioneer AV Receivers and Blu-Ray | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| Weather by eltiempo.es | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| abeoCoder | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Florida Offline Road Map | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Palace Pets in Whisker Haven | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| BZ Berner Zeitung E-Paper | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Beauty and the Beast | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |


### Cosine Similarity
Compute the cosine similarity between apps.

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
 
cosine_sim = cosine_similarity(tfidf_matrix) 
cosine_sim

array([[1., 1., 1., ..., 0., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
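
The similarity matrix contains only 0s and 1s because the rows being compared are effectively one-hot Category vectors: two apps in the same Category score 1, and any other pair scores 0. The sketch below checks this by hand. The three vectors and their variable names are made up for illustration, using a toy vocabulary of three categories.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One-hot rows over the toy vocabulary [art_and_design, family, game].
sketch_app = [1.0, 0.0, 0.0]    # an ART_AND_DESIGN app
coloring_app = [1.0, 0.0, 0.0]  # another ART_AND_DESIGN app
puzzle_app = [0.0, 0.0, 1.0]    # a GAME app

same_category = cosine(sketch_app, coloring_app)
different_category = cosine(sketch_app, puzzle_app)
print(same_category, different_category)
```

One consequence is that all apps within a Category are tied at the same score, so the order in which they are recommended does not reflect any finer notion of similarity.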

Display the similarity matrix for each app.

In [None]:
cosine_sim_df = pd.DataFrame(cosine_sim, index=df_app['App'], columns=df_app['App'])
print('Shape:', cosine_sim_df.shape)
 
cosine_sim_df.sample(5, axis=1).sample(10, axis=0)

Shape: (10840, 10840)


| App | Alzashop.com | BRL AG | Oxford Dictionary of English : Free | HTC Sense Input - EN | RetailMeNot - Coupons, Deals & Discount Shopping |
|---|---|---|---|---|---|
| AB Screen Recorder | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| Flow Free | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Power Widget | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| InstaCam - Camera for Selfie | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Nero AirBurn | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Fashion in Vogue | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| yHomework - Math Solver | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 Nights at Pixel Pizzeria - 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Elemental Knights R Platinum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Night Photo Frame | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


Define the function `app_recommendations` with the parameters `nama_app` (the app name), `similarity_data`, `items` (App and Category), and `k` (the number of recommendations to return).

In [None]:
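def app_recommendations(nama_app, similarity_data=cosine_sim_df, items=df_app[['App', 'Category']], k=10):
    # Partition the similarity column so the largest scores sit at the end of the index array
    index = similarity_data.loc[:,nama_app].to_numpy().argpartition(
        range(-1, -k, -1))
    
    # Take the apps with the highest similarity, from highest to lowest
    closest = similarity_data.columns[index[-1:-(k+2):-1]]
    
    # Remove the query app so it is not recommended to itself
    closest = closest.drop(nama_app, errors='ignore')
 
    # Attach the Category column and return at most k rows
    return pd.DataFrame(closest).merge(items).head(k)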
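
The idea behind the function is: take the similarity scores of every app against the query app, keep the highest-scoring ones, drop the query itself, and return the first k. A library-free sketch of that idea is shown below. It is not the notebook's implementation: `top_k_similar` is a hypothetical helper, the query is abbreviated to "Sketch", and the scores in the dictionary are made up to mimic the 0/1 values seen above.

```python
import heapq


def top_k_similar(query, scores, k=10):
    """Return up to k names with the highest score, excluding the query."""
    candidates = ((name, s) for name, s in scores.items() if name != query)
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return [name for name, _ in best]


# Made-up similarity scores of each app against the query app "Sketch".
scores = {
    "Sketch": 1.0,
    "Paint Splash!": 1.0,
    "How To Draw Food": 1.0,
    "Flow Free": 0.0,
    "Power Widget": 0.0,
}
recommended = top_k_similar("Sketch", scores, k=2)
print(recommended)
```

Filtering the query out before selecting the top k guarantees that k results come back whenever enough candidates exist, instead of relying on an extra slot being reserved for the query.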

Apply the function to find recommendations similar to the app *Sketch - Draw & Paint*.

In [None]:
df_app[df_app.App.eq('Sketch - Draw & Paint')]

|  | App | Category |
|---|---|---|
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN |


The app *Sketch - Draw & Paint* is in the *ART_AND_DESIGN* category. Call `app_recommendations` to get recommendations from a similar category.

In [None]:
app_recommendations('Sketch - Draw & Paint')

|  | App | Category |
|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN |
| 1 | Little Teddy Bear Colouring Book Game | ART_AND_DESIGN |
| 2 | Install images with music to make video withou... | ART_AND_DESIGN |
| 3 | Canva: Poster, banner, card maker & graphic de... | ART_AND_DESIGN |
| 4 | Popsicle Sticks and Similar DIY Craft Ideas | ART_AND_DESIGN |
| 5 | Paint Splash! | ART_AND_DESIGN |
| 6 | How To Draw Food | ART_AND_DESIGN |
| 7 | How to draw Ladybug and Cat Noir | ART_AND_DESIGN |
| 8 | Drawing Clothes Fashion Ideas | ART_AND_DESIGN |
| 9 | Textgram - write on photos | ART_AND_DESIGN |


## Evaluation

### Precision
Formula:  
`precision = number of relevant recommendations / number of recommended items`

Ten items are recommended (k=10), and all 10 are relevant to the app *Sketch - Draw & Paint* because every recommended app is in the same category, *ART_AND_DESIGN*.

In [None]:
relevant = 10
item_recommended = 10
precision = relevant/item_recommended
print("Precision = ", precision)

Precision =  1.0
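
Instead of hard-coding the counts, precision can be computed from the recommendation list. The sketch below assumes, as the text above does, that a recommendation is relevant when its Category matches the query app's Category. The helper `precision_at_k` and the hand-built list of categories are illustrative stand-ins for the output of `app_recommendations`.

```python
def precision_at_k(recommended_categories, target_category):
    """Share of recommended items whose category matches the target."""
    if not recommended_categories:
        return 0.0
    relevant = sum(1 for c in recommended_categories if c == target_category)
    return relevant / len(recommended_categories)


# Categories of the ten recommendations returned for 'Sketch - Draw & Paint'.
recommended_categories = ["ART_AND_DESIGN"] * 10
precision = precision_at_k(recommended_categories, "ART_AND_DESIGN")
print("Precision =", precision)
```

Since the similarity score itself is based only on Category, this metric is expected to be high by construction; it confirms that the pipeline works rather than measuring how well the recommendations match user taste.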
