# ☕ Data Extraction: Merida Coffee Shops

This notebook opens the project with the **Data Acquisition** phase. The primary data source is the **Google Places API**, accessed through the Google Maps Platform. This API provides up-to-date geospatial and business details that the competitive analysis depends on.

The goal of this phase is to systematically extract the location and metadata for all coffee shops within the municipal boundaries of Merida, Yucatan.

---

### Google Places API Service Selection

We use the **Places Text Search (New)** service because it accepts a free-form text query (for example, "coffee shop") and returns a list of matching places. Each result carries the details selected in the request's field mask, such as coordinates, ratings, and price levels.
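A minimal sketch of how a single Text Search (New) request could be assembled, assuming the v1 REST endpoint and the `X-Goog-Api-Key` / `X-Goog-FieldMask` headers described in Google's documentation. The helper `build_text_search_request`, the query string, and the chosen fields are illustrative assumptions, not part of this notebook; the request is only built here, not sent.

```python
import json

# Text Search (New) REST endpoint, called with HTTP POST.
TEXT_SEARCH_URL = "https://places.googleapis.com/v1/places:searchText"

# The New API only returns the fields listed in the field mask.
# This selection (an assumption) covers id, name, address, coordinates,
# rating and price level.
DEFAULT_FIELD_MASK = ",".join([
    "places.id",
    "places.displayName",
    "places.formattedAddress",
    "places.location",
    "places.rating",
    "places.priceLevel",
])


def build_text_search_request(query, api_key, field_mask=DEFAULT_FIELD_MASK):
    """Return (url, headers, json_body) for one Text Search (New) call.

    Hypothetical helper: it assembles the request but does not send it.
    """
    headers = {
        "Content-Type": "application/json",
        "X-Goog-Api-Key": api_key,
        "X-Goog-FieldMask": field_mask,
    }
    body = json.dumps({"textQuery": query})
    return TEXT_SEARCH_URL, headers, body


url, headers, body = build_text_search_request(
    "coffee shop in Merida, Yucatan", "YOUR_API_KEY"
)
# With the notebook's `requests` import, the call would look like:
# response = requests.post(url, headers=headers, data=body)
```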

## 1. Defining the Search Area

The Google Places API offers two parameters for geographically limiting a search: `locationBias` and `locationRestriction`.

| Method | Description | Rationale for Selection |
| :--- | :--- | :--- |
| **`locationBias`** | Specifies a region (like a circle or rectangle) near which results are preferred, but places **outside** this region may still be returned. | Less precise for municipal boundary analysis. |
| **`locationRestriction`** | **Strictly** defines a bounding box (rectangle viewport) from which all results must originate; results outside this area are discarded. | **Selected Option:** This guarantees that every extracted place lies inside the defined rectangle covering Merida, keeping the dataset geographically precise and preventing the overlap issues common when merging results from circular searches. |

We will define the search area using a **rectangle viewport** via the `locationRestriction` parameter to ensure a high-quality, boundary-conforming dataset.
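As a sketch of what that parameter looks like in a request body, assuming the `rectangle` / `low` / `high` structure from the Text Search (New) reference (`low` is the south-west corner, `high` the north-east one). The helper `make_location_restriction` is hypothetical; its corner check deliberately ignores rectangles that cross the antimeridian, which cannot occur around Merida.

```python
def make_location_restriction(low, high):
    """Build a Text Search (New) `locationRestriction` rectangle.

    `low` (south-west) and `high` (north-east) are (latitude, longitude)
    tuples. The ordering check below assumes no antimeridian crossing.
    """
    (low_lat, low_lng), (high_lat, high_lng) = low, high
    if not (low_lat < high_lat and low_lng < high_lng):
        raise ValueError("low must be strictly south-west of high")
    return {
        "rectangle": {
            "low": {"latitude": low_lat, "longitude": low_lng},
            "high": {"latitude": high_lat, "longitude": high_lng},
        }
    }


# Example: cell (0,0) of the grid generated below, restricting a text query.
payload = {
    "textQuery": "coffee shop",
    "locationRestriction": make_location_restriction(
        (20.891532412575916, -89.73272017481521),
        (20.905032412575917, -89.71922017481522),
    ),
}
```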

#### Gridding the City with Folium

To ensure full coverage of Merida and avoid missing results due to the **Google Places API’s 20-result limit per request**, the city’s bounding box is divided into smaller **rectangular viewports**.

Each viewport defines its own `locationRestriction` (its south-west `low` and north-east `high` corners) and is queried separately. Because each cell covers a small area, dense neighborhoods are less likely to exceed the per-request limit and silently drop coffee shops.
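The grid only guarantees completeness if no cell reaches that cap. One way to detect this, sketched here as an assumption rather than the notebook's own method, is to treat any cell whose single request returns the full 20 results as saturated and split it into four quadrants that are queried again. `needs_split` and `split_cell` are hypothetical helpers.

```python
MAX_RESULTS_PER_REQUEST = 20  # per-request cap discussed above


def needs_split(n_results, cap=MAX_RESULTS_PER_REQUEST):
    """A cell that returns the full cap may be hiding additional places."""
    return n_results >= cap


def split_cell(low, high):
    """Split a (low, high) rectangle into four equal quadrants.

    Corners are (latitude, longitude) tuples; low is SW and high is NE.
    Returns four (low, high) pairs.
    """
    (lat0, lng0), (lat1, lng1) = low, high
    mid_lat, mid_lng = (lat0 + lat1) / 2, (lng0 + lng1) / 2
    return [
        ((lat0, lng0), (mid_lat, mid_lng)),  # south-west quadrant
        ((lat0, mid_lng), (mid_lat, lng1)),  # south-east quadrant
        ((mid_lat, lng0), (lat1, mid_lng)),  # north-west quadrant
        ((mid_lat, mid_lng), (lat1, lng1)),  # north-east quadrant
    ]


quadrants = split_cell((0.0, 0.0), (2.0, 4.0))
```

Splitting only saturated cells keeps the number of extra requests proportional to how dense an area actually is, instead of shrinking `delta` for the whole city.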

##### Grid Generation

We use the **Folium** library to visualize and verify the grid layout. Each rectangle corresponds to one API query area.



In [20]:
# Import the libraries used throughout this notebook
import pandas as pd
import numpy as np
import folium
import requests
import time

In [21]:

# Create a map of Merida centered on an approximate city-center point.
m = folium.Map(location=[20.9939879883004, -89.62853393602846], zoom_start=12)

# Side length of each grid cell, in degrees. 0.0135° is roughly 1.5 km
# north-south and 1.4 km east-west at Merida's latitude.
delta = 0.0135

# South-west corner of the whole grid; every cell is offset from this point.
initial_lat, initial_lng = 20.891532412575916, -89.73272017481521

# SW ("low") and NE ("high") corners of every cell, reused later for the API calls.
low_points = []
high_points = []

# 15 x 15 grid: i steps north (latitude), j steps east (longitude).
for i in range(15):
    for j in range(15):
        low = [initial_lat + i * delta, initial_lng + j * delta]
        high = [initial_lat + (i + 1) * delta, initial_lng + (j + 1) * delta]
        folium.Rectangle(
            bounds=[low, high],
            tooltip=f"({i},{j})",
            fill=True,
        ).add_to(m)

        low_points.append(tuple(low))
        high_points.append(tuple(high))

m
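To sanity-check the scale chosen for `delta`, the cell and grid dimensions can be approximated with a spherical-Earth conversion. This is a sketch: the helper `cell_size_km` and the figure of 111.32 km per degree of latitude are approximations introduced here, and the east-west size shrinks with the cosine of the latitude.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def cell_size_km(delta_deg, latitude_deg):
    """Approximate (north-south, east-west) size in km of a delta x delta cell."""
    north_south = delta_deg * KM_PER_DEG_LAT
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Notebook values: delta = 0.0135 degrees, map center near latitude 20.99, 15 x 15 cells.
ns_km, ew_km = cell_size_km(0.0135, 20.99)
grid_ns_km, grid_ew_km = 15 * ns_km, 15 * ew_km
print(f"cell ~ {ns_km:.2f} km x {ew_km:.2f} km; grid ~ {grid_ns_km:.1f} km x {grid_ew_km:.1f} km")
```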

In [22]:
print(low_points)
print(high_points)

[(20.891532412575916, -89.73272017481521), (20.891532412575916, -89.71922017481522), (20.891532412575916, -89.70572017481521), (20.891532412575916, -89.69222017481522), (20.891532412575916, -89.67872017481521), (20.891532412575916, -89.66522017481522), (20.891532412575916, -89.65172017481521), (20.891532412575916, -89.63822017481522), (20.891532412575916, -89.62472017481521), (20.891532412575916, -89.61122017481522), (20.891532412575916, -89.59772017481521), (20.891532412575916, -89.58422017481521), (20.891532412575916, -89.57072017481521), (20.891532412575916, -89.55722017481521), (20.891532412575916, -89.54372017481522), (20.905032412575917, -89.73272017481521), (20.905032412575917, -89.71922017481522), (20.905032412575917, -89.70572017481521), (20.905032412575917, -89.69222017481522), (20.905032412575917, -89.67872017481521), (20.905032412575917, -89.66522017481522), (20.905032412575917, -89.65172017481521), (20.905032412575917, -89.63822017481522), (20.905032412575917, -89.62472017

In [23]:
print(len(low_points))
print(len(high_points))

225
225
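With the 225 cell corners stored, the extraction loop could pair them up, query each cell, follow pagination, and merge the results. The sketch below assumes each response is the parsed JSON of a Text Search (New) call (a `places` list plus an optional `nextPageToken`, which is only returned if it is included in the field mask). `collect_places` and the `fetch_page` callable are hypothetical; a stub stands in for the real API so the logic can be checked offline. Places are de-duplicated on their `id`, so a shop returned by two neighboring cells is stored once.

```python
import time


def collect_places(cells, fetch_page, pause_s=0.0):
    """Query every (low, high) cell and merge the results without duplicates.

    `fetch_page(low, high, page_token)` is a hypothetical callable that sends
    one Text Search (New) request for the cell and returns the parsed JSON.
    """
    places_by_id = {}
    for low, high in cells:
        page_token = None
        while True:
            response = fetch_page(low, high, page_token)
            for place in response.get("places", []):
                # Keep the first copy of each place seen across cells.
                places_by_id.setdefault(place["id"], place)
            page_token = response.get("nextPageToken")
            if not page_token:
                break
            time.sleep(pause_s)  # optional pause before requesting the next page
    return list(places_by_id.values())


def _stub_fetch(low, high, page_token):
    """Offline stand-in: two cells, the first with a second page of results."""
    if low == (0.0, 0.0):
        if page_token is None:
            return {"places": [{"id": "a"}, {"id": "b"}], "nextPageToken": "page-2"}
        return {"places": [{"id": "c"}]}
    return {"places": [{"id": "b"}, {"id": "d"}]}  # "b" also appears in cell 1


merged = collect_places(
    [((0.0, 0.0), (1.0, 1.0)), ((1.0, 0.0), (2.0, 1.0))], _stub_fetch
)
# With the real grid: cells = list(zip(low_points, high_points))
```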
