# Yemeksepeti Data Collection

This notebook uses web scraping techniques to gather data on restaurants from the popular platform "Yemeksepeti". The collected data is stored in JSON format in the `collected_data` directory.

The dataset covers a wide range of information about the restaurants, including their menu items, pricing, customer comments, ratings, and corresponding timestamps.

The primary goal is then to conduct semantic analysis on the collected dataset in order to gain deeper insight into consumer behavior patterns in Turkey.

#### 📁 The files are too large (13.9 MB + 84.4 MB) to be uploaded to GitHub. You can access them through [Google Drive](https://drive.google.com/drive/folders/1l4J1IXDtvGCOBzbD7jX-Y-Kud4FOj86S?usp=sharing).

In [None]:
# import libraries

import grequests
import requests
import json
import os

In [None]:
# directories 

RESTAURANT_LIST_DIR = '../collected_data/yemeksepeti_restaurants_list_per_city/'
REVIEWS_COLLECTION_DIR = '../collected_data/yemeksepeti_reviews_collection_per_city/'

## A List of Restaurants in Every City in Turkey

In this section, our goal is to gather a comprehensive list of all restaurants in every city in Turkey, complete with detailed information such as their address, budget, cuisine offerings, and most importantly, the URLs leading to their respective pages on Yemeksepeti.

To accomplish this task, we send a request to the listing API that backs Yemeksepeti and save the JSON response for each city. Keeping the data as JSON makes it straightforward to organize and explore later.
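
As a rough illustration of what parsing the response can look like, the sketch below pulls the restaurant codes out of an already parsed listing. The `data` -> `items` -> `code` layout is the one read later in this notebook; the helper name `extract_restaurant_codes` and the trimmed sample payload are hypothetical and exist only for this example.

```python
import json


def extract_restaurant_codes(listing):
    """Return the 'code' of every restaurant in a parsed listing response."""
    # Fall back to empty containers so an error payload does not raise.
    items = (listing.get("data") or {}).get("items") or []
    return [item["code"] for item in items if "code" in item]


# Hypothetical, heavily trimmed payload used only for illustration.
sample_body = '{"data": {"available_count": 2, "items": [{"code": "ab12"}, {"code": "cd34"}]}}'
sample_codes = extract_restaurant_codes(json.loads(sample_body))
print(sample_codes)
```

A real response carries many more fields per restaurant; only the ones this notebook actually reads are shown here.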

In [None]:
# get detailed data for all restaurants in a city and return its content
def get_restaurants_data_for_city(city_id):
    # Yemeksepeti uses the Delivery Hero servers to store its data, so we can get a restaurant's data by sending a request to Delivery Hero
    # the arguments ...&offset=0&limit=500&... can be added to the url to control how many records are requested
    yemeksepeti_request_url = f"https://disco.deliveryhero.io/listing/api/v1/pandora/vendors?language_id=2&vertical=restaurants&country=tr&include=characteristics&configuration=Variant1&sort=&city_id={city_id}"
    
    # the headers are copied from the requests made by Yemeksepeti's webpage
    # perseus-client-id and perseus-session-id might need to be updated occasionally
    headers = {
        'accept': 'application/json, text/plain, */*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9',
        'dnt': '1',
        'origin': 'https://www.yemeksepeti.com',
        'perseus-client-id': '1683402284518.770138949646337300.d4t6e4b8h2',
        'perseus-session-id': '1686229407319.641123708405837200.lsjp7m4zlk', #'perseus-session-id': '1686226160511.373585736230385800.r7egjnpz9u',
        'referer': 'https://www.yemeksepeti.com/',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
        'x-disco-client-id': 'web',
        'x-fp-api-key': 'volo '
    }

    # send the request and store the response
    response = requests.get(yemeksepeti_request_url, headers=headers)
    
    return response
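
The comment about `offset` and `limit` suggests that large cities could be fetched page by page. The following is a minimal sketch of that idea, not the notebook's own method: `build_listing_url` and `iter_all_items` are hypothetical helpers, `fetch_page` is a caller-supplied function that returns the parsed JSON for a URL, and stopping on the first short page is an assumption about how the API pages its results.

```python
from urllib.parse import urlencode

BASE_URL = "https://disco.deliveryhero.io/listing/api/v1/pandora/vendors"


def build_listing_url(city_id, offset=0, limit=500):
    """Build the listing URL used above, with explicit offset and limit."""
    params = {
        "language_id": 2,
        "vertical": "restaurants",
        "country": "tr",
        "include": "characteristics",
        "configuration": "Variant1",
        "sort": "",
        "city_id": city_id,
        "offset": offset,
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def iter_all_items(fetch_page, city_id, limit=500):
    """Yield restaurants page by page; fetch_page(url) must return a parsed dict."""
    offset = 0
    while True:
        page = fetch_page(build_listing_url(city_id, offset, limit))
        items = (page.get("data") or {}).get("items") or []
        yield from items
        # Assumption: a page shorter than `limit` means there is nothing left.
        if len(items) < limit:
            break
        offset += limit
```

In this notebook, `fetch_page` could wrap `requests.get(url, headers=headers).json()` with the same headers as above.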


In [None]:
# A list of all cities in Turkey, ordered so that index + 1 equals the city's id (il kodu)
# e.g. cities[33] == cities[34 - 1] == "İstanbul"

cities = ["Adana", "Adıyaman", "Afyon", "Ağrı", "Amasya", "Ankara", "Antalya", "Artvin", "Aydın", "Balıkesir", "Bilecik", "Bingöl", "Bitlis", "Bolu", "Burdur", "Bursa", "Çanakkale", "Çankırı", "Çorum", "Denizli", "Diyarbakır", "Edirne", "Elazığ", "Erzincan", "Erzurum", "Eskişehir", "Gaziantep", "Giresun", "Gümüşhane", "Hakkari", "Hatay", "Isparta", "İçel (Mersin)", "İstanbul", "İzmir", "Kars", "Kastamonu", "Kayseri", "Kırklareli", "Kırşehir", "Kocaeli", "Konya", "Kütahya", "Malatya", "Manisa", "Kahramanmaraş", "Mardin", "Muğla", "Muş", "Nevşehir", "Niğde", "Ordu", "Rize", "Sakarya", "Samsun", "Siirt", "Sinop", "Sivas", "Tekirdağ", "Tokat", "Trabzon", "Tunceli", "Şanlıurfa", "Uşak", "Van", "Yozgat", "Zonguldak", "Aksaray", "Bayburt", "Karaman", "Kırıkkale", "Batman", "Şırnak", "Bartın", "Ardahan", "Iğdır", "Yalova", "Karabük", "Kilis", "Osmaniye", "Düzce"]

In [None]:
# Save the data of restaurants in all cities as separate JSON files
def save_restaurants_data_json():
    restaurants_data_files_list = os.listdir(RESTAURANT_LIST_DIR)

    for city_id, city_name in enumerate(cities):
        filename = f'{city_id + 1}_yemeksepeti_{city_name}_restaurants_data.json'

        # check for missing cities
        if (filename not in restaurants_data_files_list):
            # Fetch the data for a specific city
            print(f'({city_id+1} / {len(cities)}) Fetching the data for {city_name} restaurants:')
            restaurants_data = get_restaurants_data_for_city(city_id + 1)

            # save the raw response body as json
            print("saving the json data...")
            DATA_EXPORT_DIR = RESTAURANT_LIST_DIR + filename
            with open(DATA_EXPORT_DIR, 'wb') as out_file:
                out_file.write(restaurants_data.content)

            print(f"successfully exported the data. path: `{DATA_EXPORT_DIR}`\n")

In [None]:
save_restaurants_data_json()
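
Because the function above writes whatever body comes back, a failed request (for example after the session headers expire) would be saved to disk and then skipped on later runs. One possible guard is sketched below; `is_valid_listing` is a hypothetical helper, and the check on `data` and `items` assumes the response layout that the review-collection code reads later.

```python
import json


def is_valid_listing(status_code, body):
    """Return True when a response looks like a usable listing payload."""
    if status_code != 200:
        return False
    try:
        parsed = json.loads(body)
    except ValueError:
        # Covers invalid JSON as well as bodies that cannot be decoded.
        return False
    if not isinstance(parsed, dict):
        return False
    data = parsed.get("data")
    return isinstance(data, dict) and "items" in data
```

It could be called as `is_valid_listing(restaurants_data.status_code, restaurants_data.content)` before the file is opened for writing.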

## Restaurant Reviews Compilation
With our inventory of restaurants across all cities in Turkey in hand, we now turn our attention to capturing the customer experience through reviews.

In this section, our aim is to collect the reviews of each restaurant through the reviews API and store them in JSON format.

In [None]:
# grequests is used to fetch the responses asynchronously, since the number of requests is large
# this function only builds the (unsent) request; grequests.map sends them later
 
def get_restaurant_reviews(restaurant_code):
    restaurant_ratings_request_url = f"https://reviews-api-tr.fd-api.com/reviews/vendor/{restaurant_code}?global_entity_id=YS_TR"
    
    request = grequests.get(restaurant_ratings_request_url)
    return request


In [None]:
def exception_handler(request, exception):
    print(f"Failed to fetch reviews. request: {request}, error: {exception}")

In [None]:
# returns a dictionary of all restaurant reviews in {'code': review_data, ...} format
# screenlock = Semaphore(value=1)

def get_restaurant_reviews_from_city(city_id, city_name):
    restaurants_data_filename = f'{city_id + 1}_yemeksepeti_{city_name}_restaurants_data.json'
    DATA_FILE_DIR = RESTAURANT_LIST_DIR + restaurants_data_filename
    with open(DATA_FILE_DIR, 'r', encoding='utf-8') as file:
        json_data = json.load(file)
        
    restaurants_count = len(json_data['data']['items']) # or = json_data['data']['available_count']
    print(f"Fetching reviews from {restaurants_count} restaurants in {city_name}.", flush=True)

    # all reviews are stored as {'code': review_data, ...} in the dictionary below
    reviews_collection = {}
    # named pending_requests to avoid shadowing the imported `requests` module
    pending_requests = []
    restaurant_codes = []
    for restaurant in json_data['data']['items']:
        # restaurant_url = restaurant['redirection_url']
        restaurant_code = restaurant['code']
        pending_requests.append(get_restaurant_reviews(restaurant_code))
        restaurant_codes.append(restaurant_code)

    responses = grequests.map(pending_requests, exception_handler=exception_handler)

    for response, restaurant_code in zip(responses, restaurant_codes):
        if response is not None and response.status_code == 200:
            reviews_collection[restaurant_code] = response.json()

    return reviews_collection
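
The function above hands every request for a city to `grequests.map` in a single call. For large cities it may be gentler on the server to send them in smaller groups. The sketch below shows one way to do that in plain Python; `in_batches`, `map_in_batches`, and the default batch size of 50 are hypothetical, and `send_batch` stands in for whatever actually sends a group of requests.

```python
def in_batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def map_in_batches(pending, send_batch, batch_size=50):
    """Send `pending` in batches and return the responses in the original order."""
    responses = []
    for batch in in_batches(pending, batch_size):
        # send_batch must return one result per request, in the same order.
        responses.extend(send_batch(batch))
    return responses
```

Here `send_batch` could be `lambda batch: grequests.map(batch, exception_handler=exception_handler)`. `grequests.map` also accepts a `size` argument that limits how many requests run at once, which may be the simpler option.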

In [None]:
# Save the customer reviews for all restaurants in every city as separate JSON files
def save_customer_reviews_json():
    review_file_list = os.listdir(REVIEWS_COLLECTION_DIR)
    for city_id, city_name in enumerate(cities):
        
        restaurants_reviews_filename = f'{city_id + 1}_yemeksepeti_{city_name}_restaurants_reviews.json'

        if restaurants_reviews_filename not in review_file_list:

            reviews_collection = get_restaurant_reviews_from_city(city_id, city_name)

            REVIEWS_EXPORT_DIR = REVIEWS_COLLECTION_DIR + restaurants_reviews_filename
            with open(REVIEWS_EXPORT_DIR, "w") as json_file:
                json.dump(reviews_collection, json_file)

In [None]:
save_customer_reviews_json()
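
Once the files are written, a quick sanity check is to count how many restaurants each review file contains. The sketch below relies only on the `{'code': review_data, ...}` layout described above; `count_restaurants_per_file` is a hypothetical helper, and the demo runs against a temporary directory rather than the real `collected_data` folder.

```python
import json
import os
import tempfile


def count_restaurants_per_file(directory):
    """Map each JSON file name in `directory` to its number of top-level keys."""
    counts = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as json_file:
            counts[name] = len(json.load(json_file))
    return counts


# Demo on a throwaway directory with one made-up review file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "1_yemeksepeti_Adana_restaurants_reviews.json")
    with open(demo_path, "w", encoding="utf-8") as json_file:
        json.dump({"ab12": {}, "cd34": {}}, json_file)
    demo_counts = count_restaurants_per_file(demo_dir)

print(demo_counts)
```

Pointing the helper at `REVIEWS_COLLECTION_DIR` would give the per-city counts for the collected data.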