# Code Documentation

## Library Imports

The following libraries are imported in the code:

- `pandas` (imported as `pd`): A library for data manipulation and analysis.
- `numpy` (imported as `np`): A library for numerical computing in Python.
- `sklearn.feature_extraction.text.TfidfVectorizer`: A scikit-learn class that converts a collection of raw documents to a matrix of TF-IDF features.
- `sklearn.metrics.pairwise.cosine_similarity`: A scikit-learn function that computes the cosine similarity between pairs of samples.


In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Library Imports

The following library is imported in the code:

- `google.colab`: A library for working with Google Colab, a cloud-based Jupyter notebook environment.

## Functionality

1. The code snippet uses the `drive` module from `google.colab` to mount Google Drive in the Colab runtime. This makes the files and directories stored in Drive accessible from within the Colab environment.

2. The `drive.mount()` function is called with `/content/drive` as the mount point. Once Google Drive is mounted, its files and directories can be accessed with standard file I/O operations.


In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
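
As a rough sketch of how the mount step could be guarded so the notebook also runs outside Colab, the snippet below wraps the mount call and lists a directory with standard file I/O. The helper names `mount_drive_if_colab` and `list_entries` are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return True if mounted.

    Hypothetical helper: outside Colab the import fails and nothing is mounted.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def list_entries(directory):
    """Return the sorted entry names of a directory, or [] if it does not exist."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


if mount_drive_if_colab():
    # After mounting, Drive behaves like an ordinary directory tree.
    print(list_entries("/content/drive/MyDrive"))
else:
    print("Not running in Colab; skipping the Drive mount.")
```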


# Code

## The code snippet performs the following tasks:

- The code snippet uses the `pd.read_csv()` function from the pandas library to read data from a CSV file.

- The file path of the CSV file is specified as `/content/drive/MyDrive/For Capstone/Collecting data/Place Detail (Scored + Keyword 1 & 2 Extracted  + Additional Feature (longlang, contact etc)) + (finished Vectorized).csv`. Note the double space after `Extracted`, which is part of the file name used in the code.

- The data from the CSV file is stored in the variable `df` as a pandas DataFrame, which is the return type of `pd.read_csv()`. The DataFrame can then be processed and analyzed with the rest of the pandas API.

In [4]:
# Read the data from the CSV file
df = pd.read_csv('/content/drive/MyDrive/For Capstone/Collecting data/Place Detail (Scored + Keyword 1 & 2 Extracted  + Additional Feature (longlang, contact etc)) + (finished Vectorized).csv')
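
The later cells rely on the `Name` and `One_Keywords` columns, so it can help to check for them right after loading. The following is a minimal sketch under that assumption. `load_places`, `REQUIRED_COLUMNS`, and the in-memory sample rows are hypothetical and stand in for the real file.

```python
import io

import pandas as pd

# Columns used by the recommendation cells (assumption based on the later code)
REQUIRED_COLUMNS = {"Name", "One_Keywords"}


def load_places(source):
    """Read a CSV and fail early if a required column is missing."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")
    return df


# Tiny in-memory stand-in for the real file (made-up rows, illustration only)
sample_csv = io.StringIO(
    "Name,One_Keywords\n"
    "Place A,coffee wifi\n"
    "Place B,\n"
)
sample_df = load_places(sample_csv)
print(sample_df.shape)
print(sample_df["One_Keywords"].isna().sum())  # empty cells are read as NaN
```

The empty keyword cell is read as `NaN`, which is why the next cell fills missing values before vectorizing.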

## The code snippet demonstrates the following functionality:

- It preprocesses the `One_Keywords` column in the DataFrame by replacing `np.nan` values with empty strings.
- It performs TF-IDF vectorization on the preprocessed `One_Keywords` column using the `TfidfVectorizer` from `scikit-learn`.
- It computes the cosine similarity between the TF-IDF vector representations.
- It provides a `get_recommendations()` function to retrieve the top N recommended items based on cosine similarity, given an item title.

In [5]:
# Replace np.nan values with empty strings
df['One_Keywords'] = df['One_Keywords'].fillna("")

# Initialize the TfidfVectorizer and vectorize the keywords
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['One_Keywords'])

# Compute the cosine similarity between the TF-IDF vector representations
cosine_sim = cosine_similarity(X)

# Function that returns recommendations
def get_recommendations(item_title, cosine_similarities, df, top_n=5):
    item_index = df[df['Name'] == item_title].index[0]  # Get the index of the reference item (assumes the default RangeIndex, so labels equal positions)
    similarity_scores = list(enumerate(cosine_similarities[item_index]))  # Get the similarity scores with all other items
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)  # Sort by similarity score, highest first
    top_items = similarity_scores[1:top_n+1]  # Take the top N items (skipping the first entry, assumed to be the reference item itself)
    top_item_indices = [i[0] for i in top_items]  # Get the indices of the top items
    top_item_titles = df['Name'].iloc[top_item_indices]  # Get the names of the top items
    return top_item_titles
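
To make the TF-IDF and cosine similarity steps concrete, here is a small sketch on a made-up keyword list (the strings are illustrative and not taken from the dataset). It shows the shape of the TF-IDF matrix and two properties of the similarity matrix that matter for the recommender.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up keyword strings; the last one mimics a row whose NaN was filled with ""
toy_keywords = ["coffee wifi cozy", "coffee cozy garden", "seafood beach", ""]

toy_vectorizer = TfidfVectorizer()
toy_X = toy_vectorizer.fit_transform(toy_keywords)  # sparse matrix: rows = items, columns = terms
toy_sim = cosine_similarity(toy_X)                  # dense square matrix of pairwise similarities

print(toy_X.shape)
print(np.round(toy_sim, 2))
```

Items that share keywords get a score above zero, items with no shared keywords score zero, and a row with empty keywords scores zero against everything, including itself.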

# Functionality
## The code snippet demonstrates the following functionality:

- It calls the `get_recommendations()` function with a specific item title, cosine similarity matrix, DataFrame, and the desired number of top recommendations.
- The function returns a pandas Series of recommended item names, indexed by their row positions in `df` and ordered by cosine similarity.
- The recommended item titles are then printed to the console.

In [6]:
# Example usage: get recommendations for the reference item "Warkop Brewok II"

recommendations = get_recommendations("Warkop Brewok II", cosine_sim, df, top_n=10)
print("Rekomendasi:")
print(recommendations)


Rekomendasi:
2522               Warkop Ucok Kediri
999                    Tematik Coffee
1776                       Merci Cafe
1787           bitiga Coffee and Food
3886                      Kopi Shinta
208       Das.kopi (kopi.ketan.pecel)
2352    Sebelas Coffee Crafter Klaten
2329              MOKSHA COFFEE SPACE
2454                       Kopi Lokal
2944                   Omah Majapahit
Name: Name, dtype: object
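
The function above raises an `IndexError` for a title that is not in `df`, and it assumes the reference item sorts first. If an earlier row has identical keywords (similarity 1.0), the reference item itself can appear in the results. The sketch below is one possible variant that handles both cases. The name `get_recommendations_safe` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def get_recommendations_safe(item_title, cosine_similarities, df, top_n=5):
    """Hypothetical variant: empty result for unknown titles, and the
    reference item is excluded by position rather than by sort order."""
    matches = np.flatnonzero((df["Name"] == item_title).to_numpy())
    if matches.size == 0:
        return pd.Series([], dtype=object, name="Name")
    item_pos = matches[0]  # first matching row, as a positional index
    scores = np.asarray(cosine_similarities[item_pos], dtype=float).copy()
    scores[item_pos] = -np.inf  # guarantee the reference item is never returned
    top_positions = np.argsort(-scores, kind="stable")[:top_n]
    top_positions = top_positions[np.isfinite(scores[top_positions])]
    return df["Name"].iloc[top_positions]


# Toy data: "A" and "B" have identical keywords, so their similarity is 1.0
toy_df = pd.DataFrame({"Name": ["A", "B", "C"]})
toy_sim = np.array([
    [1.0, 1.0, 0.2],
    [1.0, 1.0, 0.2],
    [0.2, 0.2, 1.0],
])
print(get_recommendations_safe("B", toy_sim, toy_df, top_n=2).tolist())
print(get_recommendations_safe("Unknown", toy_sim, toy_df).tolist())
```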


# Returning a Different Output Format from the First Version

Example of one returned item:
> 'name': 'Warkop Ucok Kediri', 'address': 'Jl. Patiunus No.21, Kemasan, Kec. Kota, Kota Kediri, Jawa Timur 64129, Indonesia', 'rating': 4.6, 'total_review': 64, 'url_photo': [photo link](https://lh3.googleusercontent.com/places/ANJU3Dsy73PEHuK0qVfnJM1_Qcc-tzPrrk0YGsXbwTsvjJOTnvvCKvh0ZkN_CXa2U8TBuJPczUcv35YVCAJ1kbnCQY-_vTFVMTuBWMU=s1600-w400)

In [8]:
def get_recommendations(item_title, cosine_similarities, df, top_n=5):
    item_index = df[df['Name'] == item_title].index[0]  # Get the index of the reference item
    similarity_scores = list(enumerate(cosine_similarities[item_index]))  # Get the similarity scores with other items
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)  # Sort by similarity scores
    top_items = similarity_scores[1:top_n+1]  # Take the top N items (excluding the reference item)
    top_item_indices = [i[0] for i in top_items]  # Get the indices of the top items
    top_item_data = df.iloc[top_item_indices].reset_index(drop=True)  # Get the data of the top items and reset index

    # Create a list of dictionaries for the recommended items
    recommendations = []
    for _, row in top_item_data.iterrows():
        recommendation = {
            'name': row['Name'],
            'address': row['Formatted Address'],
            'rating': row['rating'],
            'total_review': row['total_reviews'],
            'url_photo': row.get('Photo URL', 'N/A')
        }
        recommendations.append(recommendation)

    return recommendations
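
As an alternative to looping with `iterrows()`, the same list of dictionaries can be built with `rename()` and `to_dict(orient="records")`. This is only a sketch: `COLUMN_MAP`, `rows_to_records`, and the sample rows are hypothetical. It also replaces missing values with `'N/A'`, whereas `row.get('Photo URL', 'N/A')` in the cell above only falls back when the column itself is absent.

```python
import pandas as pd

# Source column -> output key, mirroring the dictionary built in the cell above
COLUMN_MAP = {
    "Name": "name",
    "Formatted Address": "address",
    "rating": "rating",
    "total_reviews": "total_review",
    "Photo URL": "url_photo",
}


def rows_to_records(rows):
    """Convert selected DataFrame rows to a list of plain dictionaries."""
    available = [col for col in COLUMN_MAP if col in rows.columns]
    records = rows[available].rename(columns=COLUMN_MAP).to_dict(orient="records")
    for record in records:
        for key in COLUMN_MAP.values():
            value = record.get(key, "N/A")  # absent column -> 'N/A'
            record[key] = "N/A" if pd.isna(value) else value  # missing value -> 'N/A'
    return records


# Made-up rows for illustration; the second one has no photo URL
toy_rows = pd.DataFrame({
    "Name": ["Place A", "Place B"],
    "Formatted Address": ["Street 1", "Street 2"],
    "rating": [4.5, 4.0],
    "total_reviews": [10, 20],
    "Photo URL": ["https://example.com/a.jpg", None],
})
toy_records = rows_to_records(toy_rows)
for record in toy_records:
    print(record)
```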


In [9]:
# Get recommendations
recommendations = get_recommendations("Warkop Brewok II", cosine_sim, df, top_n=10)

# Print the recommendations
print("Rekomendasi:")
for recommendation in recommendations:
    print(recommendation)

Rekomendasi:
{'name': 'Warkop Ucok Kediri', 'address': 'Jl. Patiunus No.21, Kemasan, Kec. Kota, Kota Kediri, Jawa Timur 64129, Indonesia', 'rating': 4.6, 'total_review': 64, 'url_photo': 'https://lh3.googleusercontent.com/places/ANJU3Dsy73PEHuK0qVfnJM1_Qcc-tzPrrk0YGsXbwTsvjJOTnvvCKvh0ZkN_CXa2U8TBuJPczUcv35YVCAJ1kbnCQY-_vTFVMTuBWMU=s1600-w400'}
{'name': 'Tematik Coffee', 'address': 'Jl. Kalimantan No.54, Kasin, Kec. Klojen, Kota Malang, Jawa Timur 65117, Indonesia', 'rating': 4.9, 'total_review': 51, 'url_photo': 'https://lh3.googleusercontent.com/places/ANJU3Dum9fUhiRXZstNUc1UcCVPHUgnB-8Opvibp03bQzRc7HkMik6wMt8yDgmkM3Z3w9afb9dFJT_ltlmCoj5LBsUBewKTLtDtuK2Y=s1600-w400'}
{'name': 'Merci Cafe', 'address': 'Jl. Kav. DPR III No.23, Nggrekmas, Pagerwojo, Kec. Buduran, Kabupaten Sidoarjo, Jawa Timur 61252, Indonesia', 'rating': 4.6, 'total_review': 817, 'url_photo': 'https://lh3.googleusercontent.com/places/ANJU3Duo2nzd8_vuaBwgN266FKyVtIjAPGCCsjgTI0Rs_NJGcj5XIxbcRxVhaRIRDQPqnEvAYXwwGq94bFdI0jB

# Export Pickle Files for Back-End Requirements

- It saves the computed `cosine_sim` matrix as a pickle file.
- It saves the fitted `vectorizer` object as a pickle file.


In [None]:
import pickle

# Read the data from the CSV file
df = pd.read_csv('/content/drive/MyDrive/For Capstone/Collecting data/Place Detail (Scored + Keyword 1 & 2 Extracted  + Additional Feature (longlang, contact etc)) + (finished Vectorized).csv')

# Replace np.nan values with empty strings
df['One_Keywords'] = df['One_Keywords'].fillna("")

# Initialize the TfidfVectorizer and vectorize the keywords
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['One_Keywords'])

# Compute the cosine similarity between the TF-IDF vector representations
cosine_sim = cosine_similarity(X)

# Save cosine_sim as a pickle file
with open('/content/drive/MyDrive/For Capstone/cosine_similarity.pkl', 'wb') as f:
    pickle.dump(cosine_sim, f)

# Save vectorizer as a pickle file
with open('/content/drive/MyDrive/For Capstone/vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)
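
On the back-end side, the saved objects would be read back with `pickle.load()`. The sketch below shows a save and load round trip with toy stand-ins in a temporary directory, plus one possible use of the loaded vectorizer: scoring a free-text query against the item vectors. The toy keywords, the temporary paths, and the query are assumptions for illustration. Pickle files should only be loaded from trusted sources, and scikit-learn objects are best unpickled with the same library version that created them.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the real objects (made-up keywords, illustration only)
toy_keywords = ["coffee wifi cozy", "coffee garden", "seafood beach"]
toy_vectorizer = TfidfVectorizer()
toy_X = toy_vectorizer.fit_transform(toy_keywords)
toy_sim = cosine_similarity(toy_X)

with tempfile.TemporaryDirectory() as tmp_dir:
    sim_path = Path(tmp_dir) / "cosine_similarity.pkl"
    vec_path = Path(tmp_dir) / "vectorizer.pkl"

    # Save both objects, as in the cell above
    with open(sim_path, "wb") as f:
        pickle.dump(toy_sim, f)
    with open(vec_path, "wb") as f:
        pickle.dump(toy_vectorizer, f)

    # Load them back, as a back-end service would
    with open(sim_path, "rb") as f:
        loaded_sim = pickle.load(f)
    with open(vec_path, "rb") as f:
        loaded_vectorizer = pickle.load(f)

# The loaded vectorizer keeps its fitted vocabulary, so it can transform new text
query_vec = loaded_vectorizer.transform(["cozy coffee"])
query_scores = cosine_similarity(query_vec, toy_X).ravel()

print(np.array_equal(loaded_sim, toy_sim))
print(int(query_scores.argmax()))  # position of the best-matching toy item
```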