## Redrawing the Chart of Taiwan's Migrant Worker Composition

### Reference Article and Data Sources

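1. News

    [Migrant workers in Taiwan set to pass the 800,000 mark, and the power industry is recruiting too! Nearly 90,000 have lost contact; this industry accounts for the most, by 李瑋萱 (2024)](https://tw.news.yahoo.com/全台移工將破80萬大關-電業也招手-失聯者逼近9萬人-這產業占最多-091507621.html)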

2. Chart

<pre>
    <a href="https://statfy.mol.gov.tw/index12.aspx" target="_blank">
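        <img alt="Ministry of Labor - Labor Statistics Query Website - Number of Industrial and Social Welfare Migrant Workers - By Nationality" title="Ministry of Labor - Labor Statistics Query Website - Number of Industrial and Social Welfare Migrant Workers - By Nationality" src="https://s.yimg.com/ny/api/res/1.2/o0wftHciHulxsXZDp0WuyQ--/YXBwaWQ9aGlnaGxhbmRlcjt3PTk2MA--/https://media.zenfs.com/en/stormmediagroup.com/8a0c5682dde912118663d746fadadb12" width="50%">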
    </a>
</pre>

3. Data

    [Ministry of Labor - Labor Statistics Query Website - Number of Industrial and Social Welfare Migrant Workers by Nationality](https://statdb.mol.gov.tw/statiscla/webMain.aspx?sys=100&kind=10&type=1&funid=wqrymenu2&cparm1=wq14&rdm=I4y9dcIi)


In [18]:
import os
os.makedirs("data", exist_ok=True)

In [19]:
from http.client import HTTPSConnection

# Create the connection before the try block so that `conn` is always
# bound when the finally clause runs.
conn = HTTPSConnection("statdb.mol.gov.tw")
try:
    conn.request(
        "GET",
        (
            "/statiscla/webMain.aspx"
            "?sys=220&ym=8000&ymt=11310&kind=21&type=1"
            "&funid=wq1402&cycle=1&outmode=2&compmode=0&outkind=11&fldspc=1,6,"
        ),
    )
    resp = conn.getresponse()
    if resp.status == 200:
        # Stream the response body to disk in 4 KiB chunks.
        with open("data/wq1402.csv", "wb") as fo:
            while True:
                chunk = resp.read(4096)
                if not chunk:
                    break
                fo.write(chunk)
    else:
        print(f"Failure: {resp.status} ({resp.reason})")
finally:
    conn.close()
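
The query string in the request above is easier to audit when written as a mapping rather than as concatenated string literals. The following is a minimal sketch that reuses only the parameter names and values already present in that request. What each parameter means is not documented in this notebook, so none is interpreted here. The names `QUERY` and `build_path` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode

# Parameters copied verbatim from the request above. Their meaning is
# not documented in this notebook, so no interpretation is attached.
QUERY = {
    "sys": "220",
    "ym": "8000",
    "ymt": "11310",
    "kind": "21",
    "type": "1",
    "funid": "wq1402",
    "cycle": "1",
    "outmode": "2",
    "compmode": "0",
    "outkind": "11",
    "fldspc": "1,6,",
}


def build_path(params: dict[str, str]) -> str:
    """Return the request path with an encoded query string."""
    # safe="," keeps the commas in fldspc literal, matching the
    # hand-written URL instead of percent-encoding them as %2C.
    return "/statiscla/webMain.aspx?" + urlencode(params, safe=",")


path = build_path(QUERY)
print(path)
```

The resulting `path` is identical to the hand-written string, so it could be passed to `conn.request("GET", path)` unchanged.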

In [21]:
import csv
import json
import re

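PAT_DATE_TW = re.compile(r"(?P<y>\d+)年\s+(?P<m>\d+)月")


def parse_date_tw(s: str) -> str:
    """Convert an ROC-calendar year and month to an ISO string: yyyy-mm."""
    d = PAT_DATE_TW.match(s)
    if d is None:
        raise ValueError(f"Unrecognized date: {s!r}")
    y = int(d["y"]) + 1911  # ROC (Minguo) year -> Gregorian year
    m = int(d["m"])
    return f"{y:04d}-{m:02d}"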


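# Columns holding the total head count per nationality: Indonesia,
# Malaysia, Philippines, Thailand, Vietnam, Mongolia.
COLUMNS = [
    "總計/印尼",
    "總計/馬來西亞",
    "總計/菲律賓",
    "總計/泰國",
    "總計/越南",
    "總計/蒙古",
]

with (
    open("data/wq1402.csv", mode="r", encoding="big5") as fi,
    open("data.json", mode="w", encoding="utf-8") as fo,
):
    ro: list[dict[str, str | int]] = []
    for ri in csv.DictReader(fi):
        x = parse_date_tw(ri["統計期"])  # "統計期" is the statistics-period column
        for col in COLUMNS:
            cell = ri[col]
            # A cell holding only a fullwidth hyphen-minus (U+FF0D) is stored as 0.
            y = 0 if cell == "\uFF0D" else int(cell)
            # Output keys: l = nationality label, x = yyyy-mm, y = head count.
            ro.append(dict(l=col.replace("總計/", ""), x=x, y=y))
    json.dump(ro, fo, ensure_ascii=False, indent=2)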
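
The cell above writes a flat list of `{l, x, y}` records, whereas a line chart usually needs one series per nationality. The following is a minimal sketch of that regrouping, assuming records shaped like those in `data.json`. The sample counts are made up for illustration and are not real statistics, and `group_series` is a hypothetical helper that is not part of the original notebook.

```python
from collections import defaultdict

# Hypothetical sample records in the same shape as data.json
# (l = nationality label, x = ISO year-month, y = head count).
# The counts are invented for illustration only.
records = [
    {"l": "印尼", "x": "2024-02", "y": 120},
    {"l": "越南", "x": "2024-01", "y": 90},
    {"l": "印尼", "x": "2024-01", "y": 100},
    {"l": "越南", "x": "2024-02", "y": 95},
]


def group_series(rows):
    """Group flat records into {label: [(x, y), ...]} sorted by x."""
    series = defaultdict(list)
    for row in rows:
        series[row["l"]].append((row["x"], row["y"]))
    # ISO yyyy-mm strings sort chronologically as plain strings.
    return {label: sorted(points) for label, points in series.items()}


series = group_series(records)
for label, points in series.items():
    print(label, points)
```

To run this against the real data, the sample list could be replaced with `json.load(open("data.json", encoding="utf-8"))`.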

In [None]:
%%html
<script>
    document.getElementsByTagName('h1')[0].style.color = 'gray';
    document.getElementsByTagName('h1')[0].hidden = false;
</script>
<h1 hidden>My First Heading</h1>