You are an assistant that analyzes a Point of Interest (POI) for food & nightlife (restaurants, cafes, bars). 
Your task is to produce a **structured JSON** describing the POI in **3 hierarchical levels**.

**Allowed values:**

1. "poi_type" (top-level category) – must be one of:
   - "Restaurant"
   - "Cafe"
   - "Bar"

2. "main_subcategory" (more specific category) – must be one of:
   - Restaurant: ["Casual Restaurant", "Fine Dining", "Family Restaurant", "Buffet", "Diner"]
   - Cafe: ["Coffee Shop", "Bakery", "Dessert Shop", "Brunch Spot", "Tea House"]
   - Bar: ["Pub", "Cocktail Lounge", "Wine Bar", "Rooftop Bar", "Nightclub", "Live Music Bar", "Brewery"]

3. "specialization" (style, cuisine, focus) – examples include:
   - Restaurant: ["Seafood", "Sushi", "BBQ", "Steakhouse", "Italian", "French Cuisine", "Local Food", "Vegan/Vegetarian", "Fast Casual", "Street Food"]
   - Cafe: ["Coffee & Light Bites", "Pastries", "Breakfast/Brunch", "Desserts", "Vegan/Vegetarian Options", "Work-friendly", "Specialty Tea"]
   - Bar: ["Beer Selection", "Cocktails", "Wine List", "Rooftop View", "Live Music", "Happy Hour", "Craft Beer", "Speakeasy"]

Additionally, estimate **suitability percentages** for SOLO, GROUP, and FAMILY visitors (each a number from 0 to 100; the three values do not need to sum to 100).

**Rules:**
- Only classify food & nightlife POIs.
- **Do not invent new categories**; always choose from the allowed values above.
- Suitability is based on POI metadata ("Popular for", "Atmosphere", "Offerings").
- Keep JSON strictly valid; do not add any extra text outside the JSON.
- Use three levels consistently: poi_type → main_subcategory → specialization.

**Example Input (Restaurant):**



{
  "Offerings": [
    "All you can eat",
    "Coffee",
    "Private dining room",
    "Quick bite",
    "Small plates",
    "Vegan options",
    "Vegetarian options"
  ],
  "Highlights": [
    "Great coffee",
    "Great dessert",
    "Great tea selection"
  ],
  "Popular for": [
    "Family-friendly",
    "Groups",
    "LGBTQ+ friendly",
    "Tourists",
    "Transgender safespace"
  ],
  "Atmosphere": [
    "Casual",
    "Cozy",
    "Quiet",
    "Romantic",
    "Upscale"
  ],
  "Accessibility": [
    "Breakfast",
    "Brunch",
    "Lunch",
    "Dinner",
    "Catering",
    "Dessert",
    "Seating",
    "Table service"
  ]
}

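**Return JSON with the keys "poi_type", "main_subcategory", "specialization", and "suitability" (an object with "SOLO", "GROUP", and "FAMILY"), using the allowed values above.**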

In [2]:
!pip install google-genai

Collecting google-genai
  Downloading google_genai-1.57.0-py3-none-any.whl.metadata (53 kB)
Collecting google-auth<3.0.0,>=2.46.0 (from google-auth[requests]<3.0.0,>=2.46.0->google-genai)
  Downloading google_auth-2.47.0-py3-none-any.whl.metadata (6.4 kB)
Collecting tenacity<9.2.0,>=8.2.3 (from google-genai)
  Using cached tenacity-9.1.2-py3-none-any.whl.metadata (1.2 kB)
Collecting websockets<15.1.0,>=13.0.0 (from google-genai)
  Using cached websockets-15.0.1-cp311-cp311-win_amd64.whl.metadata (7.0 kB)
Collecting distro<2,>=1.7.0 (from google-genai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting sniffio (from google-genai)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3.0.0,>=2.46.0->google-auth[requests]<3.0.0,>=2.46.0->google-genai)
  Using cached pyasn1_modules-0.4.2-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3.0.0,>=2.46.0->google-auth[requests]<3.0.0,

In [5]:
# used for GPT
!pip install openai


Collecting openai
  Downloading openai-2.14.0-py3-none-any.whl.metadata (29 kB)
Collecting jiter<1,>=0.10.0 (from openai)
  Downloading jiter-0.12.0-cp311-cp311-win_amd64.whl.metadata (5.3 kB)
Downloading openai-2.14.0-py3-none-any.whl (1.1 MB)
Downloading jiter-0.12.0-cp311-cp311-win_amd64.whl (204 kB)
Installing collected packages: jiter, openai


In [1]:
import os
from google import genai
from openai import OpenAI
from google.genai import types
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
os.getcwd()

'c:\\Users\\nguye\\Desktop\\vinamo\\Kyanon-support-localtion\\scripts\\clean_data'

# LLM gemini-2.5-flash

In [9]:
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
client = genai.Client(api_key=GOOGLE_API_KEY)

model = "gemini-2.5-flash"


In [None]:
filepath = os.path.join(os.getcwd(), "input.txt")

# Read the contents of the file
with open(filepath, "r", encoding="utf-8") as f:
    text = f.read()


contents = [
    types.Content(
        role="user",
        parts=[
            types.Part.from_text(text=text),
        ],
    ),
]

generate_content_config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(
        thinking_budget=0,
    ),
)

response = client.models.generate_content(
    model=model,
    contents=contents,
    config=generate_content_config,
)

final_response = response.candidates[0].content.parts[0].text
print(final_response)
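The prompt asks for strictly valid JSON with no extra text, but a reply can still arrive wrapped in a Markdown code fence. The following is a minimal sketch of a tolerant parser, not part of the original notebook; the helper name `parse_poi_reply` is hypothetical, and it assumes the reply holds a single JSON object.

```python
import json

# Three backticks, built at runtime so this snippet stays easy to embed.
FENCE = "`" * 3


def parse_poi_reply(reply_text: str) -> dict:
    """Parse a model reply into a dict, tolerating an optional Markdown code fence."""
    cleaned = reply_text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()
        # Drop the opening fence line (it may carry a language tag such as "json").
        lines = lines[1:]
        # Drop the closing fence line if it is present.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    # json.loads raises json.JSONDecodeError if the remaining text is not valid JSON.
    return json.loads(cleaned)
```

With this in place, `parse_poi_reply(final_response)` would return a dictionary that later cells can inspect, and a `json.JSONDecodeError` signals that the reply broke the "strictly valid JSON" rule.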


# LLM gpt-5-mini

In [4]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=OPENAI_API_KEY)

In [None]:
# Read the input file
filepath = os.path.join(os.getcwd(), "input.txt")
with open(filepath, "r", encoding="utf-8") as f:
    text = f.read()

response = client.responses.create(
    model="gpt-5-mini",
    input=text
)

print(response.output_text)

{
  "poi_type": "Cafe",
  "main_subcategory": "Bakery",
  "specialization": "Pastries",
  "suitability": {
    "SOLO": 75,
    "GROUP": 85,
    "FAMILY": 90
  }
}
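A reply like the one above can be checked against the allowed values listed in the prompt. The sketch below is one possible check, not something the notebook itself does; `ALLOWED_SUBCATEGORIES` and `validate_poi` are hypothetical names. It assumes "specialization" only needs to be a non-empty string, because the prompt presents those values as examples rather than a closed list.

```python
# Allowed "main_subcategory" values per "poi_type", copied from the prompt.
ALLOWED_SUBCATEGORIES = {
    "Restaurant": {"Casual Restaurant", "Fine Dining", "Family Restaurant",
                   "Buffet", "Diner"},
    "Cafe": {"Coffee Shop", "Bakery", "Dessert Shop", "Brunch Spot", "Tea House"},
    "Bar": {"Pub", "Cocktail Lounge", "Wine Bar", "Rooftop Bar", "Nightclub",
            "Live Music Bar", "Brewery"},
}


def validate_poi(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result passed."""
    problems = []

    poi_type = result.get("poi_type")
    subcategory = result.get("main_subcategory")
    if poi_type not in ALLOWED_SUBCATEGORIES:
        problems.append(f"unknown poi_type: {poi_type!r}")
    elif subcategory not in ALLOWED_SUBCATEGORIES[poi_type]:
        problems.append(f"{subcategory!r} is not a subcategory of {poi_type!r}")

    # Assumption: specialization is free-form, so only its type is checked.
    specialization = result.get("specialization")
    if not isinstance(specialization, str) or not specialization:
        problems.append("specialization must be a non-empty string")

    suitability = result.get("suitability")
    if not isinstance(suitability, dict):
        suitability = {}
    for key in ("SOLO", "GROUP", "FAMILY"):
        value = suitability.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 100:
            problems.append(f"suitability[{key!r}] must be a number from 0 to 100")

    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every issue in a reply before deciding whether to retry the request.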


# Append extra data to the text before sending it to the LLM

In [6]:
import os
from openai import OpenAI

client = OpenAI()

# Read the input file
filepath = os.path.join(os.getcwd(), "input.txt")
with open(filepath, "r", encoding="utf-8") as f:
    text = f.read()

# Extra data to append
additional_data = "\n\nPlease analyze the content above using the JSON structure for a POI (Restaurant/Cafe/Bar)."

# Combine the file contents with the extra data
combined_input = text + additional_data

# Send to the LLM
response = client.responses.create(
    model="gpt-5-mini",
    input=combined_input
)

print(response.output_text)


{
  "poi_type": "Bar",
  "main_subcategory": "Live Music Bar",
  "specialization": "Live Music",
  "suitability": {
    "SOLO": 50,
    "GROUP": 90,
    "FAMILY": 25
  }
}
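The cell above appends a fixed sentence to the prompt text. If the goal is to classify many POIs with the same instructions, one option is to append each POI's metadata as JSON instead. This is a sketch under that assumption; the helper `build_prompt` and the `**Input:**` label are hypothetical and do not appear in the original `input.txt`.

```python
import json


def build_prompt(instructions: str, poi_metadata: dict) -> str:
    """Append one POI's metadata, serialized as JSON, to the instruction text."""
    # ensure_ascii=False keeps non-ASCII names (e.g. Vietnamese) readable.
    poi_json = json.dumps(poi_metadata, ensure_ascii=False, indent=2)
    return f"{instructions.rstrip()}\n\n**Input:**\n\n{poi_json}\n"
```

For example, `build_prompt(text, {"Atmosphere": ["Cozy"], "Popular for": ["Groups"]})` yields a string that can be passed as `input` to `client.responses.create` in place of `combined_input`.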
