# ☕ Data Extraction: Merida Coffee Shops

This notebook covers the data acquisition step. It uses two services from the Google Places API, part of the Google Maps Platform, to collect data on coffee shops in Merida, Yucatan.

---

## Google Places API Workflow

This project follows a two-step extraction process:

1. **Places Text Search (New)** – performs text-based queries to extract the Place IDs.
2. **Place Details** – uses the collected Place IDs to retrieve the complete metadata for each place.

---

## 1. Defining the Search Area

The Google Places API provides two parameters to geographically constrain results: **locationBias** and **locationRestriction**.  
Since this project requires strict adherence to Merida’s municipal limits, the **locationRestriction** parameter is selected. It defines a bounding box that limits all returned results to the specified area, ensuring spatial precision and consistency.
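
As a rough illustration of the difference, the sketch below builds the geographic part of a Text Search request body for both options. The helper name `rectangle_filter` is hypothetical and not part of the original workflow; the `low`/`high` rectangle layout matches the payload used later in this notebook.

```python
def rectangle_filter(low, high, strict=True):
    """Return the geographic part of a Text Search request body.

    low / high are (latitude, longitude) tuples for the SW and NE corners.
    strict=True  -> locationRestriction (results outside the box are excluded)
    strict=False -> locationBias (results near the box are preferred, not required)
    """
    rectangle = {
        'low': {'latitude': low[0], 'longitude': low[1]},
        'high': {'latitude': high[0], 'longitude': high[1]},
    }
    key = 'locationRestriction' if strict else 'locationBias'
    return {key: {'rectangle': rectangle}}


# Example corners: the first grid cell used in this notebook.
sw = (20.891532412575916, -89.73272017481521)
ne = (sw[0] + 0.0135, sw[1] + 0.0135)

restricted = rectangle_filter(sw, ne, strict=True)
biased = rectangle_filter(sw, ne, strict=False)
print(list(restricted), list(biased))
```

Only the top-level key changes between the two options, so switching from a strict box to a soft preference would be a one-word change in the payload.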

### Gridding the City with Folium

Because a Text Search request returns at most 20 results per page, and this project requests a single page per query, Merida’s area is subdivided into smaller rectangular viewports.  
Each grid cell defines a separate locationRestriction (SW and NE coordinates) and is queried independently with the Text Search service.  
Once all Place IDs are collected, the Place Details service is called to extract complete metadata for each identified place.

The Folium library is used to visualize and verify the grid layout, ensuring full coverage of the study area and validating the data extraction process.


In [8]:
import folium

#Create the map of Merida using an approximate center point.
m = folium.Map(location=[20.9939879883004, -89.62853393602846],min_zoom=12)
delta = 0.0135 #The variable “delta” controls how “large” each rectangle is.

initial_lat, initial_lng = 20.891532412575916, -89.73272017481521 #This is the initial SW point from which the loop starts creating the viewport rectangles.


#The points generated will be stored and will be used when the API is called.
rectangles_viewports = []

#Generate each SW and NE point from each rectangle using the initial point
for i in range(15):
    for j in range(15):
        low = [initial_lat+i*delta, initial_lng+j*delta]
        high = [initial_lat+(i+1)*delta, initial_lng+(j+1)*delta]
        folium.Rectangle(
            bounds = [low, high],
            tooltip = f'({i+1},{j+1})',
            fill = True
        ).add_to(m)
        
        rectangles_viewports.append((tuple(low), tuple(high)))
m

Besides the inner city, the grid also covers surrounding metropolitan localities such as Kanasin, Caucel, Temozon Norte, Uman, Las Americas, and Cholul.

In [2]:
#Number of calls that will be made to the API
print(len(rectangles_viewports))

225
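
The count follows from the 15 × 15 grid. As a quick sanity check, the sketch below derives the grid's overall extent and an approximate cell size in kilometers. It assumes a spherical-Earth approximation of about 111.32 km per degree of latitude, which is an assumption of this example and not part of the original workflow; the helper name `grid_summary` is hypothetical.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)


def grid_summary(initial_lat, initial_lng, delta, rows, cols):
    """Summarize a regular lat/lng grid built from its SW corner."""
    ne_lat = initial_lat + rows * delta
    ne_lng = initial_lng + cols * delta
    mid_lat = (initial_lat + ne_lat) / 2
    cell_height_km = delta * KM_PER_DEGREE
    # Degrees of longitude shrink with the cosine of the latitude.
    cell_width_km = delta * KM_PER_DEGREE * math.cos(math.radians(mid_lat))
    return {
        'cells': rows * cols,
        'north_east': (ne_lat, ne_lng),
        'cell_height_km': cell_height_km,
        'cell_width_km': cell_width_km,
    }


summary = grid_summary(20.891532412575916, -89.73272017481521, 0.0135, 15, 15)
print(summary)
```

Under that approximation, each cell is roughly 1.5 km tall and 1.4 km wide.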


## 2. Data Acquisition
The process uses two Google Places API services: first the Text Search (New) service to obtain Place IDs, then the Place Details service to collect comprehensive data for each identified place.

### Place IDs extraction


In [2]:
import time
import requests
from dotenv import load_dotenv
import os
import pandas as pd
import json

In [3]:
#Loading the API key to make the requests
load_dotenv()
API_KEY = os.getenv('API_KEY')

In [5]:
#Define the API endpoint and headers to make the calls
search_url = "https://places.googleapis.com/v1/places:searchText"

search_headers = {
    'Content-Type' : 'application/json',
    'X-Goog-Api-Key': API_KEY,
    'X-Goog-FieldMask': 'places.id' #Since we will only extract the Place IDs in this part, only this parameter will be specified for it to be returned.
}

In [None]:
places_id_raw = [] #The raw data returned by the API will be stored here

#Extract the Place IDs by iterating through each of the 225 rectangle viewports, which will be saved in a .json file.
for low, high in rectangles_viewports:
    try: 
        payload = {
            'textQuery' : 'cafeteria',
            'includedType': 'cafe',
            'strictTypeFiltering':True,
            'pageSize': 20,
            'locationRestriction' : { #Using the rectangle viewports we generated earlier, we use their NE and SW coordinates to perform the search.
                'rectangle':{
                    'low':{
                        'latitude' : low[0],
                        'longitude' : low[1]
                    },
                    'high':{
                        'latitude': high[0],
                        'longitude': high[1]
                        
                    }
                }
            }
        }
        
        response = requests.post(url = search_url, json=payload, headers = search_headers)
        time.sleep(0.5)
        response.raise_for_status()
        
        data = response.json() #We convert the responses into a JSON format.
        
        places_id_raw.extend(data.get('places', [])) #We store each ID in the variable “places_id_raw.” 
        
    except requests.exceptions.RequestException as e:
        print(f'ERROR!!! --> {e}')
    

#We store all the IDs we receive in a .json file.
with open('data/places_id_raw.json', 'w') as f:
    json.dump(places_id_raw, f, indent=4)

In [4]:
#Load the stored Place IDs from the JSON file for use in the "Place Details" calls
with open('data/places_id_raw.json', 'r') as file:
    ids_dict = json.load(file)

In [5]:
#Number of coffee shops found
len(ids_dict)

748

The search returned 748 candidate coffee shops. This should be a fairly reliable estimate, though some locations may not have been returned by the API, for example in grid cells that hit the 20-result limit. In the Exploratory Analysis section, this dataset will be cleaned and refined to determine the final number of coffee shops.
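
One way to check this estimate would be to count distinct IDs and to flag grid cells that returned a full page of results. The sketch below assumes the results were kept per cell in a dictionary, which the extraction loop above does not do (it stores a flat list); `results_per_cell` and `audit_cells` are hypothetical names, and the sample data is made up for illustration.

```python
def audit_cells(results_per_cell, page_size=20):
    """Return (unique_ids, saturated_cells) for per-cell Text Search results.

    results_per_cell maps a cell index to the list of place dicts
    (each with an 'id' key) returned for that cell.
    """
    unique_ids = set()
    saturated_cells = []
    for cell, places in results_per_cell.items():
        unique_ids.update(place['id'] for place in places)
        # A full page suggests the cell may contain more places than were returned.
        if len(places) >= page_size:
            saturated_cells.append(cell)
    return unique_ids, saturated_cells


# Made-up sample: cell 0 is sparse, cell 1 fills a whole page,
# and 'id-0' appears in both cells.
sample = {
    0: [{'id': 'id-0'}, {'id': 'id-a'}],
    1: [{'id': f'id-{n}'} for n in range(20)],
}
ids, saturated = audit_cells(sample)
print(len(ids), saturated)
```

Cells flagged this way could then be split into smaller viewports and queried again.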

### Place Details data extraction


The fields we will request are:
- **displayName** = Name of the establishment
- **formattedAddress** = Address
- **location** = Geographic coordinates (Longitude and Latitude)
- **businessStatus** = Whether the establishment is in operation or not
- **primaryTypeDisplayName** = Google Maps category
- **priceRange** = Typical price range per person
- **rating** = 1 to 5-star scale based on user reviews
- **userRatingCount** = Number of reviews
- **websiteUri** = Website or social media link of the place
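
These field names are passed to the API as a single comma-separated field mask. Below is a minimal sketch of assembling that string from a list, using a hypothetical helper `build_field_mask`. The optional prefix reflects that the Text Search call earlier in this notebook requests `places.id`, while the Place Details call uses bare field names.

```python
DETAIL_FIELDS = [
    'displayName',
    'formattedAddress',
    'location',
    'businessStatus',
    'primaryTypeDisplayName',
    'priceRange',
    'rating',
    'userRatingCount',
    'websiteUri',
]


def build_field_mask(fields, prefix=''):
    """Join field names into the comma-separated form used by X-Goog-FieldMask."""
    return ','.join(f'{prefix}{field}' for field in fields)


details_mask = build_field_mask(DETAIL_FIELDS)
search_mask = build_field_mask(['id'], prefix='places.')
print(details_mask)
print(search_mask)
```

Keeping the fields in a list makes it easier to add or drop one without editing a long string by hand.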

In [None]:
#We define the headers to make API calls. Here we put the fields we require.
details_headers = {
    'Content-Type' : 'application/json',
    'X-Goog-Api-Key': API_KEY,
    'X-Goog-FieldMask':'displayName,formattedAddress,location,businessStatus,primaryTypeDisplayName,priceRange,rating,userRatingCount,websiteUri'
}

In [None]:
#Variable where we will store the data
details_data_raw = []

In [None]:
for place in ids_dict:
    try:
        details_url = f"https://places.googleapis.com/v1/places/{place['id']}" #We call the API for each Place ID we obtained in the previous section.
        
        response = requests.get(url=details_url, headers=details_headers)
        time.sleep(0.5)
        response.raise_for_status()
        
        #We convert the responses into a JSON format and add them to the variable we defined earlier.
        data = response.json()
        
        details_data_raw.append(data)
        
    except requests.exceptions.RequestException as e:
        print(f'ERROR!!! --> {e}')


#We convert “details_data_raw” into a pandas DataFrame and save it to a CSV file.
pd.DataFrame(details_data_raw).to_csv('data/details_data_raw.csv')

This completes the creation of the dataset. The last rows of the DataFrame look as follows:

In [7]:
pd.read_csv('data/details_data_raw.csv', index_col=0).tail()

|  | formattedAddress | location | rating | businessStatus | userRatingCount | displayName | primaryTypeDisplayName | priceRange | websiteUri |
|---|---|---|---|---|---|---|---|---|---|
| 743 | 97302 Calle 100-2 #140h Fraccionamiento las am... | {'latitude': 21.080935099999998, 'longitude': ... |  | OPERATIONAL |  | {'text': 'Espacio 11:11', 'languageCode': 'es'} | {'text': 'Cafe', 'languageCode': 'en-US'} |  |  |
| 744 | C. 49-B 926, entre 112 y 108 A, Fraccionamient... | {'latitude': 21.0818943, 'longitude': -89.6616... | 5.0 | OPERATIONAL | 10.0 | {'text': 'Frapplo', 'languageCode': 'es'} | {'text': 'Coffee Shop', 'languageCode': 'en-US'} | {'startPrice': {'currencyCode': 'MXN', 'units'... | https://www.instagram.com/frapplo_/ |
| 745 | C. 100-1 137, Fraccionamiento Las Américas 2, ... | {'latitude': 21.081186499999998, 'longitude': ... | 5.0 | OPERATIONAL | 2.0 | {'text': 'Mejorar el futuro!', 'languageCode':... | {'text': 'Cafe', 'languageCode': 'en-US'} |  |  |
| 746 | Carr. Mérida - Progreso, 97302 Xcanatún, Yuc.,... | {'latitude': 21.0814901, 'longitude': -89.6352... | 4.3 | OPERATIONAL | 183.0 | {'text': 'Starbucks Carretera Progreso', 'lang... | {'text': 'Coffee Shop', 'languageCode': 'en-US'} | {'startPrice': {'currencyCode': 'MXN', 'units'... |  |
| 747 | C. 21 entre 4, 97302 Chablekal, Yuc., Mexico | {'latitude': 21.091645099999997, 'longitude': ... | 5.0 | OPERATIONAL | 1.0 | {'text': 'DRAGÓN SUSHI🐉', 'languageCode': 'es'} | {'text': 'Coffee Shop', 'languageCode': 'en-US'} |  |  |
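
Because the nested API objects were written to CSV, columns such as `displayName` and `primaryTypeDisplayName` come back as strings that look like Python dictionaries. One possible way to recover the values is sketched below on sample strings taken from the rows above; `parse_cell` is a hypothetical helper, not part of the original notebook.

```python
import ast


def parse_cell(value):
    """Turn a stringified dict from the CSV back into a dict.

    Non-string values (for example NaN for missing data) are returned as None.
    """
    if not isinstance(value, str):
        return None
    return ast.literal_eval(value)


# Sample cell values taken from the displayName and primaryTypeDisplayName columns.
display_name = parse_cell("{'text': 'Frapplo', 'languageCode': 'es'}")
primary_type = parse_cell("{'text': 'Coffee Shop', 'languageCode': 'en-US'}")
missing = parse_cell(float('nan'))

print(display_name['text'], '|', primary_type['text'], '|', missing)
```

Applied to a DataFrame column, this could look like `df['displayName'].map(parse_cell)`. Another option might be to flatten the raw list with `pd.json_normalize(details_data_raw)` before saving, which would avoid the string round trip.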


Due to limitations related to the data format and file size, the exploratory analysis will be conducted in a separate notebook to ensure a cleaner workflow and better performance during the analysis process.