# Notebook: Collecting customer data and the province/city catalog from the Pancake POS API

## 1. Fetching customer information

### 1.1. Import libraries & config
### Explanation
- Import the required libraries (`os`, `json`, `requests`, `pandas`, `sqlalchemy`, …).
- Read the environment variables from `.env` (API key, DB credentials, shop_id).
- This keeps secrets out of the code instead of hard-coding them.
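
Since `os.getenv` silently returns `None` for a variable that is not set, a missing entry in `.env` only surfaces later as a confusing connection or API error. A minimal sketch of an early check follows; the helper `require_env` and the `REQUIRED` list are hypothetical names introduced here for illustration, not part of the original notebook.

```python
import os

def require_env(names, environ=os.environ):
    """Return {name: value} for the given variables; raise if any is missing or empty."""
    values = {name: environ.get(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values

# Variable names read by this notebook (hypothetical constant for illustration)
REQUIRED = ["API_KEY", "DB_USER", "DB_PASS", "DB_HOST", "DB_PORT", "DB_BRONZE", "SHOP_ID"]
```

One possible use is to call `require_env(REQUIRED)` right after `load_dotenv()`, so that a misconfigured `.env` fails immediately with a clear message.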

In [1]:
import os
import json
import time
import random
import requests
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.dialects.mysql import LONGTEXT, VARCHAR
from sqlalchemy.types import BigInteger, DateTime
from dotenv import load_dotenv
from datetime import datetime

# Load environment variables
load_dotenv()

API_KEY   = os.getenv("API_KEY")
DB_USER   = os.getenv("DB_USER")
DB_PASS   = os.getenv("DB_PASS")
DB_HOST   = os.getenv("DB_HOST")
DB_PORT   = os.getenv("DB_PORT")
DB_BRONZE = os.getenv("DB_BRONZE")
SHOP_ID   = os.getenv("SHOP_ID")  # SHOP_ID from .env


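### 1.2. Connect to the Bronze database
### Explanation
- Uses `SQLAlchemy` to create an engine for the MySQL connection.
- Target database: the Bronze schema, where raw data is stored.
- Prints a confirmation message. Note that `create_engine` is lazy: it does not open a connection until the engine is first used, so the message only confirms that the engine object was created.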
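
The connection URL is assembled with an f-string, which breaks if the password contains characters such as `@` or `/`. The sketch below shows one way to percent-encode the credentials with the standard library before building the URL. The helper `build_mysql_url` and the sample credentials are hypothetical; only the `mysql+pymysql://` URL shape and the `winner_bronze` database name come from the notebook.

```python
from urllib.parse import quote_plus

def build_mysql_url(user, password, host, port, database):
    """Build a mysql+pymysql URL, percent-encoding the user name and password."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )

# Made-up credentials, only to show the escaping
print(build_mysql_url("etl", "p@ss/word", "localhost", 3306, "winner_bronze"))
# mysql+pymysql://etl:p%40ss%2Fword@localhost:3306/winner_bronze
```

The resulting string can be passed to `create_engine` in place of the hand-built f-string.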

In [18]:
engine_bronze = create_engine(
    f"mysql+pymysql://{DB_USER}:{DB_PASS}@{DB_HOST}:{DB_PORT}/{DB_BRONZE}"
)

print(f"✅ Successfully connected to Bronze: {DB_BRONZE}")


✅ Successfully connected to Bronze: winner_bronze


### 1.3. Function to fetch one page of customer data from the API
### Explanation
- Function `get_customers_page`:
  - Calls the `/customers` API with `shop_id`, `page`, `page_size`.
  - Returns the list of customers (`data`) and the total number of pages (`total_pages`).
  - Retries up to 3 times on failure (timeout or HTTP error).
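
The retry behaviour described above can be isolated into a small helper, which makes it easy to exercise without touching the network. This is a sketch under the assumption that a linear backoff of `base_delay * attempt` seconds is wanted; `call_with_retries` and its parameters are hypothetical names, not part of the notebook.

```python
import time

def call_with_retries(fn, max_retries=3, base_delay=5, sleep=time.sleep):
    """Call fn() up to max_retries times, waiting base_delay * attempt between failures.

    Returns fn()'s result, or None if every attempt raised an exception.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}")
            if attempt < max_retries:
                sleep(base_delay * attempt)  # 5s, then 10s with the defaults
    return None
```

Passing `sleep` as a parameter lets a test replace the real delay with a recorder, so the backoff schedule can be checked instantly.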

In [3]:
def get_customers_page(shop_id, page=1, page_size=1000, max_retries=3):
    """
    Call GET /shops/{shop_id}/customers to fetch one page of customers.

    Returns (data, total_pages); returns ([], None) if every attempt fails.
    """
    url = f"https://pos.pages.fm/api/v1/shops/{shop_id}/customers"
    params = {"api_key": API_KEY, "page": page, "page_size": page_size}

    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.get(url, params=params, timeout=60)
            if resp.status_code == 200:
                j = resp.json()
                data = j.get("data", [])
                # total_pages may sit at the top level or under "meta"
                total_pages = j.get("total_pages") or j.get("meta", {}).get("total_pages")
                return data, total_pages
            print(f"⚠️ HTTP {resp.status_code} at page {page}, attempt {attempt}")
        except (requests.RequestException, ValueError) as e:
            # network/timeout errors, or a response body that is not valid JSON
            print(f"⚠️ Exception at page {page}: {e}, attempt {attempt}")
        if attempt < max_retries:
            time.sleep(5 * attempt)  # linear backoff: 5s, then 10s
    return [], None


### 1.4. Function to write a batch to MySQL
### Explanation
- Function `insert_customers_batch`:
  - Takes a DataFrame `df_batch`.
  - Writes the data to the `customers_raw` table in the Bronze schema.
  - Columns: `shop_id`, `customer_id`, `raw_json`, `extracted_at`.
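
To make the expected row shape concrete, the sketch below builds the four columns from a list of customer dicts using only the standard library. The helper `build_customer_rows`, its `now` parameter, and the sample customer are hypothetical; casting the id to a string is an assumption made here to match the `VARCHAR(50)` column.

```python
import json
from datetime import datetime

def build_customer_rows(shop_id, customers, now=None):
    """Turn raw customer dicts into rows with the four customers_raw columns."""
    extracted_at = now or datetime.now()  # one timestamp for the whole batch
    rows = []
    for cust in customers:
        cust_id = cust.get("id")
        rows.append({
            "shop_id": shop_id,
            "customer_id": None if cust_id is None else str(cust_id),
            # ensure_ascii=False keeps Vietnamese characters readable in the stored JSON
            "raw_json": json.dumps(cust, ensure_ascii=False),
            "extracted_at": extracted_at,
        })
    return rows
```

A list built this way can be handed to `pd.DataFrame(rows)` and then to `insert_customers_batch`.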

In [4]:
def insert_customers_batch(df_batch):
    df_batch.to_sql(
        "customers_raw",
        con=engine_bronze,
        if_exists="append",
        index=False,
        dtype={
            "shop_id": BigInteger(),
            "customer_id": VARCHAR(50),   # customer id may be a string
            "raw_json": LONGTEXT(),
            "extracted_at": DateTime()
        }
    )


### 1.5. Function to fetch & load multiple customer pages
### Explanation
- Function `fetch_and_load_customers`:
  - Loops over the API pages one by one.
  - Converts each page of data into a DataFrame.
  - Saves the batch to MySQL with `insert_customers_batch`.
  - Sleeps for a random delay (1-3s) between pages to avoid being rate-limited.
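
The paging logic can be sketched as a generator that is independent of the HTTP call, which makes the two stop conditions (an empty page, or reaching `total_pages`) easy to verify with a fake fetcher. `iter_pages` and `fetch_page` are hypothetical names; `fetch_page(page)` is assumed to return `(items, total_pages)` like `get_customers_page` does.

```python
def iter_pages(fetch_page, start_page=1):
    """Yield (page, items) until a page comes back empty or the last page is reached.

    fetch_page(page) must return (items, total_pages); total_pages may be None.
    """
    page = start_page
    while True:
        items, total_pages = fetch_page(page)
        if not items:
            break  # empty page: nothing left (or the fetch failed)
        yield page, items
        if total_pages and page >= total_pages:
            break  # the API reported that this was the last page
        page += 1
```

A caller could then write `for page, customers in iter_pages(lambda p: get_customers_page(SHOP_ID, p)): ...` and keep the insert and the random delay inside the loop body.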

In [5]:
def fetch_and_load_customers(shop_id, start_page=1, page_size=1000):
    page = start_page
    while True:
        customers, total_pages = get_customers_page(shop_id, page, page_size)
        if not customers:
            # an empty list is also returned when every retry for this page failed
            print("📦 No more data at page", page)
            break

        df_batch = pd.DataFrame([{
            "shop_id": shop_id,
            "customer_id": cust.get("id"),
            "raw_json": json.dumps(cust, ensure_ascii=False),
            "extracted_at": datetime.now()
        } for cust in customers])

        insert_customers_batch(df_batch)
        print(f"✅ Page {page}/{total_pages} - Loaded {len(customers)} customers")

        if total_pages and page >= total_pages:
            break

        time.sleep(random.uniform(1, 3))  # random sleep to avoid being blocked
        page += 1


### 1.6. Run the data collection
### Explanation
- Call `fetch_and_load_customers` to collect all customer data.
- Result: the `customers_raw` table in the Bronze schema holds the raw data from the API.


In [6]:
fetch_and_load_customers(SHOP_ID, start_page=1, page_size=1000)


✅ Page 1/37 - Loaded 1000 customers
✅ Page 2/37 - Loaded 1000 customers
✅ Page 3/37 - Loaded 1000 customers
✅ Page 4/37 - Loaded 1000 customers
✅ Page 5/37 - Loaded 1000 customers
✅ Page 6/37 - Loaded 1000 customers
✅ Page 7/37 - Loaded 1000 customers
✅ Page 8/37 - Loaded 1000 customers
✅ Page 9/37 - Loaded 1000 customers
✅ Page 10/37 - Loaded 1000 customers
✅ Page 11/37 - Loaded 1000 customers
✅ Page 12/37 - Loaded 1000 customers
✅ Page 13/37 - Loaded 1000 customers
✅ Page 14/37 - Loaded 1000 customers
✅ Page 15/37 - Loaded 1000 customers
✅ Page 16/37 - Loaded 1000 customers
✅ Page 17/37 - Loaded 1000 customers
✅ Page 18/37 - Loaded 1000 customers
✅ Page 19/37 - Loaded 1000 customers
✅ Page 20/37 - Loaded 1000 customers
✅ Page 21/37 - Loaded 1000 customers
✅ Page 22/37 - Loaded 1000 customers
✅ Page 23/37 - Loaded 1000 customers
✅ Page 24/37 - Loaded 1000 customers
✅ Page 25/37 - Loaded 1000 customers
✅ Page 26/37 - Loaded 1000 custo

--- 
## 2. Fetching the province/city catalog from the Pancake POS API
In this part, we call the `/geo/provinces` endpoint to fetch the catalog of provinces/cities in Vietnam (country_code = "84").
The result is saved to the `province_raw` table in the **Bronze layer**.

### 2.1. Declare BASE_URL and headers (Authorization)
### Explanation
- `BASE_URL`: the root URL of the Pancake POS API.
- `headers`: carries the Authorization token (`Bearer API_KEY`) used to authenticate API calls.
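
The notebook passes the key in two different ways: section 1 sends it as an `api_key` query parameter, while this section puts it in an `Authorization` header. The sketch below only builds the two request shapes side by side with the standard library, without sending anything. Which style a given endpoint actually accepts is not established here and should be checked against the API documentation; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://pos.pages.fm/api/v1"

def request_with_query_key(endpoint, api_key, **params):
    """Style used in section 1: the key travels as an `api_key` query parameter."""
    query = urlencode({"api_key": api_key, **params})
    return Request(f"{BASE_URL}{endpoint}?{query}")

def request_with_bearer(endpoint, api_key, **params):
    """Style used in section 2: the key travels in the Authorization header."""
    url = f"{BASE_URL}{endpoint}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})

# Build (but do not send) both variants with a placeholder key
req_a = request_with_query_key("/geo/provinces", "KEY", country_code="84")
req_b = request_with_bearer("/geo/provinces", "KEY", country_code="84")
print(req_a.full_url)
print(req_b.full_url, req_b.get_header("Authorization"))
```

Keeping both shapes behind small functions makes it straightforward to switch the whole notebook to whichever style turns out to be correct.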


In [14]:
BASE_URL = "https://pos.pages.fm/api/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

### 2.2. Generic API call function
### Explanation
- The `fetch_api` function can be reused across multiple endpoints.
- Takes an `endpoint` and `params` → calls the API → returns the `"data"` part of the response.
- Handles errors (`try/except`) and prints a message if the call fails.


In [15]:
# API call function

def fetch_api(endpoint, params=None):
    url = f"{BASE_URL}{endpoint}"
    try:
        res = requests.get(url, headers=headers, params=params, timeout=30)
        res.raise_for_status()
        return res.json().get("data", [])
    except Exception as e:
        print(f"❌ Error calling API {url}: {e}")
        return []


### 2.3. Function to fetch the province/city catalog
### Explanation
- The `get_provinces` function calls the `/geo/provinces` API.
- The returned data is converted into a `DataFrame` with 2 columns:
  - `province_id`
  - `province_name`
- Prints the number of provinces/cities fetched.
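
Indexing with `p["id"]` and `p["name"]` raises `KeyError` as soon as one entry lacks a field. A more defensive variant of the conversion step might look like the sketch below. The helper `provinces_to_rows` is hypothetical, and both skipping malformed entries and casting the id to a string (to match the `VARCHAR(20)` column used later) are assumptions, not behaviour of the original notebook.

```python
def provinces_to_rows(provinces):
    """Keep only entries that have both fields and map them to the two target columns."""
    rows = []
    for p in provinces:
        if "id" not in p or "name" not in p:
            continue  # skip malformed entries instead of raising KeyError
        rows.append({"province_id": str(p["id"]), "province_name": p["name"]})
    return rows

sample = [{"id": 805, "name": "An Giang"}, {"name": "missing id"}]
print(provinces_to_rows(sample))
# [{'province_id': '805', 'province_name': 'An Giang'}]
```

The resulting list can be passed to `pd.DataFrame(rows, columns=["province_id", "province_name"])`, which also keeps the column names when the list is empty.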


In [16]:
# Fetch the province/city catalog

def get_provinces():
    provinces = fetch_api("/geo/provinces", params={"country_code": "84"})
    print(f"✅ Fetched {len(provinces)} provinces/cities")

    df = pd.DataFrame([{
        "province_id": p["id"],
        "province_name": p["name"]
    } for p in provinces])

    return df

# Test run
df_provinces = get_provinces()
df_provinces.head()


✅ Fetched 63 provinces/cities


|   | province_id | province_name |
|---|-------------|---------------|
| 0 | 805 | An Giang |
| 1 | 221 | Bắc Giang |
| 2 | 207 | Bắc Kạn |
| 3 | 821 | Bạc Liêu |
| 4 | 106 | Bắc Ninh |


### 2.4. Write the provinces data to MySQL (Bronze)
### Explanation
- Uses `to_sql` to write the data to the `province_raw` table.
- `if_exists="replace"`: drops the existing table and rewrites it (switch to `"append"` to accumulate rows instead).
- MySQL column types:
  - `province_id`: VARCHAR(20).
  - `province_name`: VARCHAR(255).
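
The difference between the two `if_exists` modes can be illustrated without a MySQL server. The sketch below mimics them with the standard-library `sqlite3` module and an in-memory database. It is a stand-in to show the semantics only, not how pandas implements `to_sql`; `write_table` and `count_rows` are hypothetical helpers.

```python
import sqlite3

def write_table(conn, name, rows, if_exists="append"):
    """Write (province_id, province_name) rows, mimicking the two if_exists modes."""
    if if_exists == "replace":
        conn.execute(f"DROP TABLE IF EXISTS {name}")  # discard the old table first
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} "
        "(province_id VARCHAR(20), province_name VARCHAR(255))"
    )
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)
    conn.commit()

def count_rows(conn, name):
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]

conn = sqlite3.connect(":memory:")
rows = [("805", "An Giang"), ("221", "Bắc Giang")]

write_table(conn, "province_raw", rows, if_exists="replace")
write_table(conn, "province_raw", rows, if_exists="replace")
print(count_rows(conn, "province_raw"))  # 2: the table was rebuilt each time

write_table(conn, "province_raw", rows, if_exists="append")
print(count_rows(conn, "province_raw"))  # 4: rows were added on top
```

For a small reference table that is re-fetched in full on every run, `"replace"` keeps reruns idempotent, whereas `"append"` would duplicate the rows each time.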


In [19]:
df_provinces.to_sql(
    name="province_raw",          # table name in MySQL
    con=engine_bronze,            # connection to the Bronze DB
    if_exists="replace",          # overwrite; use "append" to add rows instead
    index=False,
    dtype={
        "province_id": VARCHAR(20),
        "province_name": VARCHAR(255)
    }
)
print("✅ Saved table province_raw to Bronze DB")


✅ Saved table province_raw to Bronze DB
