### 1. Motivation
#### What is your dataset?
We used four datasets related to the Austin Animal Center:
- Austin_Animal_Center_Intakes_20250419.csv: records of animals entering the shelter, including intake dates and animal types.
- Austin_Animal_Center_Outcomes_20250419.csv: outcome records of animals leaving the shelter (e.g., adoption, euthanasia).
- shelter_geocoded_locations.csv: a manually geocoded dataset that maps official shelter addresses in Austin to their corresponding latitude and longitude coordinates, based on data from the Austin Animal Center's official shelter list.
- neighbour.json: Austin neighborhood boundary polygons, downloaded from https://data.austintexas.gov/.

#### Why did you choose this/these particular dataset(s)?
The intake dataset offers a rich, time-stamped record of real-world animal shelter operations. With the help of manually geocoded shelter addresses, we were able to conduct meaningful spatial analysis. This multi-angle approach allows for an accessible and engaging visual story.

#### What was your goal for the end user's experience?
We want users to understand the patterns behind stray animal intake: when animals enter shelters most frequently, and which shelter regions are under more pressure. The visualizations should guide users from temporal overview to regional insights.

### 2. Basic stats

#### Data Cleaning & Preprocessing
- Merged intake & outcome tables: Joined on Animal ID and a per-animal visit number, so repeat visits by the same animal are kept as separate rows.
- Date parsing: Converted the intake and outcome DateTime columns to proper datetime format.
- Time aggregation: Grouped intake counts by month and by shelter region.
- Outliers: Removed records with missing animal type or unmatchable shelter data, and records whose outcome time precedes the intake time.
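
To make the repeat-visit pairing concrete, here is a minimal sketch on toy data. The animal IDs and dates are hypothetical, and the helper `number_visits` is our own illustrative name; the idea is that the n-th intake of an animal is matched with its n-th outcome.

```python
import pandas as pd

# Toy data (hypothetical): animal A1 visits twice, A2 once and has no outcome yet.
intakes = pd.DataFrame({
    "Animal ID": ["A1", "A1", "A2"],
    "DateTime": pd.to_datetime(["2020-01-05", "2021-03-10", "2020-06-01"]),
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A1"],
    "DateTime": pd.to_datetime(["2020-01-20", "2021-04-02"]),
})

def number_visits(df, col):
    """Sort by animal and time, then number each animal's events 1, 2, 3, ..."""
    out = df.sort_values(["Animal ID", "DateTime"]).copy()
    out[col] = out.groupby("Animal ID").cumcount() + 1
    return out

# Pair the n-th intake with the n-th outcome of the same animal.
paired = number_visits(intakes, "Visit").merge(
    number_visits(outcomes, "Visit"),
    on=["Animal ID", "Visit"],
    how="left",                       # keep intakes that have no outcome yet
    suffixes=(" Intake", " Outcome"),
)
print(paired)
```

A left join is used so that animals still in care stay in the table with an empty outcome timestamp.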

In [None]:
import pandas as pd

# Load the Intakes and Outcomes datasets
intake_df = pd.read_csv("Austin_Animal_Center_Intakes_20250419.csv")
outcome_df = pd.read_csv("Austin_Animal_Center_Outcomes_20250419.csv")

def print_basic_info(df, name):
    print(f"Dataset name: {name}")
    print(f"Shape (rows, columns): {df.shape}")
    print("\nColumn names:")
    print(df.columns.tolist())
    print("\nMissing values per column:")
    print(df.isnull().sum())
    print("\nSample rows:")
    print(df.head(5))
    print("\n" + "="*50 + "\n")

# Print basic information for the Intakes data
print_basic_info(intake_df, "Intakes")

# Print basic information for the Outcomes data
print_basic_info(outcome_df, "Outcomes")

# Collect the set of Animal IDs in each table
intake_ids = set(intake_df["Animal ID"])
outcome_ids = set(outcome_df["Animal ID"])

# Check how the two ID sets overlap
only_in_intake = intake_ids - outcome_ids
only_in_outcome = outcome_ids - intake_ids
common_ids = intake_ids & outcome_ids

print(f"🎯 Unique IDs in Intake: {len(intake_ids)}")
print(f"🎯 Unique IDs in Outcome: {len(outcome_ids)}")
print(f"✅ IDs present in both: {len(common_ids)}")
print(f"⚠️ IDs only in Intake: {len(only_in_intake)}")
print(f"⚠️ IDs only in Outcome: {len(only_in_outcome)}")

# Are there duplicate IDs?
intake_duplicates = intake_df["Animal ID"].duplicated().sum()
outcome_duplicates = outcome_df["Animal ID"].duplicated().sum()

print(f"\n🔁 Duplicate Animal IDs in Intake: {intake_duplicates}")
print(f"🔁 Duplicate Animal IDs in Outcome: {outcome_duplicates}")

# Number of times each ID appears in each table
intake_counts = intake_df["Animal ID"].value_counts()
outcome_counts = outcome_df["Animal ID"].value_counts()

# Does every ID appear exactly once?
print("Intake one-to-one:", all(intake_counts == 1))
print("Outcome one-to-one:", all(outcome_counts == 1))

# Parse the timestamps and normalize them to tz-naive (no time zone)
intake_df["DateTime"] = pd.to_datetime(intake_df["DateTime"], errors="coerce").dt.tz_localize(None)
outcome_df["DateTime"] = pd.to_datetime(outcome_df["DateTime"], errors="coerce").dt.tz_localize(None)

# Sort each animal's records by time and number every intake and outcome
intake_df_sorted = intake_df.sort_values(by=["Animal ID", "DateTime"])
outcome_df_sorted = outcome_df.sort_values(by=["Animal ID", "DateTime"])

intake_df_sorted["Intake Number"] = intake_df_sorted.groupby("Animal ID").cumcount() + 1
outcome_df_sorted["Outcome Number"] = outcome_df_sorted.groupby("Animal ID").cumcount() + 1

# Rename columns to avoid merge conflicts; keep MonthYear and Date of Birth
intake_df_sorted = intake_df_sorted.rename(columns={
    "DateTime": "Intake Datetime",
    "MonthYear": "Intake MonthYear",
    "Sex upon Intake": "Sex upon Intake",
    "Age upon Intake": "Age upon Intake",
    "Intake Type": "Intake Type",
    "Intake Condition": "Intake Condition",
    "Found Location": "Found Location"
})

outcome_df_sorted = outcome_df_sorted.rename(columns={
    "DateTime": "Outcome Datetime",
    "MonthYear": "Outcome MonthYear",
    "Sex upon Outcome": "Sex upon Outcome",
    "Age upon Outcome": "Age upon Outcome",
    "Outcome Type": "Outcome Type",
    "Outcome Subtype": "Outcome Subtype",
    "Date of Birth": "Date of Birth"
})

# Merge Intake and Outcome (paired by visit number)
merged_df = pd.merge(
    intake_df_sorted,
    outcome_df_sorted,
    left_on=["Animal ID", "Intake Number"],
    right_on=["Animal ID", "Outcome Number"],
    how="left",  # keep every intake, even without a matching outcome
    suffixes=("_Intake", "_Outcome")
)

# Fill in records that have no outcome
merged_df["Outcome Number"] = merged_df["Outcome Number"].fillna(0).astype(int)
merged_df["Outcome Type"] = merged_df["Outcome Type"].fillna("No Outcome")
merged_df["Outcome Subtype"] = merged_df["Outcome Subtype"].fillna("Unknown")

# Sanity filter: the outcome time must be missing or not earlier than the intake time
# (.copy() avoids pandas' SettingWithCopyWarning when columns are added below)
merged_df = merged_df[
    (merged_df["Outcome Datetime"].isna()) |
    (merged_df["Outcome Datetime"] >= merged_df["Intake Datetime"])
].copy()

# Combine duplicated fields, preferring the Intake values
merged_df["Name"] = merged_df["Name_Intake"].combine_first(merged_df["Name_Outcome"])
merged_df["Animal Type"] = merged_df["Animal Type_Intake"].combine_first(merged_df["Animal Type_Outcome"])
merged_df["Breed"] = merged_df["Breed_Intake"].combine_first(merged_df["Breed_Outcome"])
merged_df["Color"] = merged_df["Color_Intake"].combine_first(merged_df["Color_Outcome"])

# Fill the numeric outcome field for animals with no outcome
merged_df["Outcome Number"] = merged_df["Outcome Number"].fillna(0).astype(int)

# Arrange the final column order; keep Intake/Outcome MonthYear and Date of Birth
final_df = merged_df[[
    "Animal ID", "Name", "Animal Type", "Date of Birth", "Intake Datetime", "Intake Type", "Intake Condition",
    "Intake MonthYear", "Found Location", "Outcome Datetime", "Outcome Type", "Outcome Subtype",
    "Outcome MonthYear", "Intake Number", "Outcome Number", "Breed", "Color",
    "Sex upon Intake", "Age upon Intake", "Sex upon Outcome", "Age upon Outcome"
]]

# Save the cleaned, merged table
final_df.to_csv("updated_cleaned_merged_animal_data.csv", index=False)

# Show a preview (when running in Jupyter)
print("Merge complete: {} records.".format(len(final_df)))
final_df.head(30)


In [None]:
import os
import pandas as pd
import googlemaps
from concurrent.futures import ThreadPoolExecutor
import time
import math

# Read the Google Maps API key from an environment variable instead of hard-coding it
# (the variable name GOOGLE_MAPS_API_KEY is our choice; set it before running)
api_key = os.environ["GOOGLE_MAPS_API_KEY"]
gmaps = googlemaps.Client(key=api_key)

# Load the address file
file_path = 'addresses_part_1.csv'  # differs for each team member
df = pd.read_csv(file_path)

# The address column is assumed to be named 'Formatted Address'
addresses = df['Formatted Address'].dropna().unique()

# Geocoding function
def geocode_address(address):
    try:
        # Geocode the address with the Google Maps API
        result = gmaps.geocode(address)
        if result:
            lat = result[0]['geometry']['location']['lat']
            lng = result[0]['geometry']['location']['lng']
            return address, lat, lng
        else:
            return address, None, None
    except Exception:
        # Treat any API or network error as "no result" for this address
        return address, None, None

# Multithreaded geocoding
def geocode_addresses_in_parallel(addresses):
    total_addresses = len(addresses)
    progress_interval = math.ceil(total_addresses / 10)  # print progress after every 10% of addresses
    results = []

    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = {executor.submit(geocode_address, address): address for address in addresses}

        completed = 0
        for future in futures:
            result = future.result()
            results.append(result)
            completed += 1
            if completed % progress_interval == 0:
                print(f"Processed {completed} / {total_addresses} addresses")

    return results

# Run the geocoding
start_time = time.time()
geocoded_data = geocode_addresses_in_parallel(addresses)
end_time = time.time()

# Convert the results to a DataFrame
geocoded_df = pd.DataFrame(geocoded_data, columns=['Address', 'Latitude', 'Longitude'])

# Save the geocoding results to a new file
output_file = 'geocoded_addresses_part_1.csv'  # differs for each team member
geocoded_df.to_csv(output_file, index=False)

# Print the elapsed time
print(f"Geocoding finished in {end_time - start_time} seconds.")

# Preview the results
geocoded_df.head()

# Show the output file name
output_file

#### Geocoding
- Found-location geocoding: Converted the free-text found locations to latitude/longitude coordinates with the Google Maps API.
- Shelter matching: Linked each record to a shelter location based on the shelter_name or address field.
- Geolocation: Used shelter_geocoded_locations.csv to attach coordinates to each shelter site.
- Neighborhood mapping: Mapped coordinates to official Austin neighborhood polygons using spatial joins.
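
As an illustration of what a spatial join does for each point, the following sketch tests whether a coordinate falls inside a polygon with the classic ray-casting rule. It is a simplified stand-in, not the notebook's method (the notebook relies on shapely and GeoPandas); the two square "neighborhoods" and the function names are hypothetical, and polygons with holes are not handled.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: a point is inside if a ray going east crosses an odd number of edges."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_neighborhood(lon, lat, polygons):
    """Return the name of the first polygon containing the point, or None."""
    for name, ring in polygons.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None

# Hypothetical square neighborhoods, vertices in (lon, lat) order.
polygons = {
    "West": [(-97.80, 30.20), (-97.75, 30.20), (-97.75, 30.30), (-97.80, 30.30)],
    "East": [(-97.75, 30.20), (-97.70, 30.20), (-97.70, 30.30), (-97.75, 30.30)],
}

print(assign_neighborhood(-97.78, 30.25, polygons))
print(assign_neighborhood(-97.60, 30.25, polygons))
```

Points that fall outside every polygon return `None`, which corresponds to the records dropped as unmatchable during cleaning.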

In [None]:
import pandas as pd

# Read the original file
file_path = 'unique_geocoding_ready_addresses.csv'  # path to the uploaded file
df = pd.read_csv(file_path)

# The address column is assumed to be named 'Formatted Address'
addresses = df['Formatted Address'].dropna()

# Split the data into two halves
addresses_part_1 = addresses[:len(addresses)//2]
addresses_part_2 = addresses[len(addresses)//2:]

# Save the two halves as separate CSV files
addresses_part_1.to_csv('addresses_part_1.csv', index=False)
addresses_part_2.to_csv('addresses_part_2.csv', index=False)


import os
import pandas as pd
import googlemaps
from concurrent.futures import ThreadPoolExecutor
import time
import math

# Read the Google Maps API key from an environment variable instead of hard-coding it
# (the variable name GOOGLE_MAPS_API_KEY is our choice; set it before running)
api_key = os.environ["GOOGLE_MAPS_API_KEY"]
gmaps = googlemaps.Client(key=api_key)

# Load the address file
file_path = 'addresses_part_2.csv'  # differs for each team member
df = pd.read_csv(file_path)

# The address column is assumed to be named 'Formatted Address'
addresses = df['Formatted Address'].dropna().unique()

# Geocoding function
def geocode_address(address):
    try:
        # Geocode the address with the Google Maps API
        result = gmaps.geocode(address)
        if result:
            lat = result[0]['geometry']['location']['lat']
            lng = result[0]['geometry']['location']['lng']
            return address, lat, lng
        else:
            return address, None, None
    except Exception:
        # Treat any API or network error as "no result" for this address
        return address, None, None

# Multithreaded geocoding
def geocode_addresses_in_parallel(addresses):
    total_addresses = len(addresses)
    progress_interval = math.ceil(total_addresses / 10)  # print progress after every 10% of addresses
    results = []

    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = {executor.submit(geocode_address, address): address for address in addresses}

        completed = 0
        for future in futures:
            result = future.result()
            results.append(result)
            completed += 1
            if completed % progress_interval == 0:
                print(f"Processed {completed} / {total_addresses} addresses")

    return results

# Run the geocoding
start_time = time.time()
geocoded_data = geocode_addresses_in_parallel(addresses)
end_time = time.time()

# Convert the results to a DataFrame
geocoded_df = pd.DataFrame(geocoded_data, columns=['Address', 'Latitude', 'Longitude'])

# Save the geocoding results to a new file
output_file = 'geocoded_addresses_part_2.csv'  # differs for each team member
geocoded_df.to_csv(output_file, index=False)

# Print the elapsed time
print(f"Geocoding finished in {end_time - start_time} seconds.")

# Preview the results
geocoded_df.head()

# Show the output file name
output_file


import pandas as pd

# 1. Read the two geocoded parts
df_part_1 = pd.read_csv("geocoded_addresses_part_1.csv")
df_part_2 = pd.read_csv("geocoded_addresses_part_2.csv")

# 2. Combine the two datasets
merged_df = pd.concat([df_part_1, df_part_2], ignore_index=True)

# 3. Drop duplicate addresses
merged_df = merged_df.drop_duplicates(subset=['Address'])

# 4. Save the combined data
merged_df.to_csv("geocoded_addresses_merged.csv", index=False)

print("Merge complete: {} unique address records.".format(len(merged_df)))



#### Dataset Statistics
- Intakes dataset rows: 126,234
- Outcomes dataset rows: 124,513
- Unique animal types: 5 (Dogs, Cats, Birds, Others, Small Mammals)
- Mapped shelter coordinates: all listed shelters were matched successfully
- Time span: October 2013 to April 2025
- Number of mapped shelters: 12
- Geocoded locations: 68,395

### 3. Data Analysis

#### Dynamic Hotspot Map of Animal Intakes
We used Folium’s HeatMapWithTime to animate changes in geographic hotspots across Austin over time. Each frame represents a specific month from October 2013 to April 2025.
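
The data shape behind such an animation can be sketched as follows: one list of `[lat, lon, weight]` entries per month, where points that fall in the same rounded grid cell reinforce each other. This is a minimal standalone sketch; the records are hypothetical and `build_frames` is an illustrative name, with the rounding precision and weight step chosen as assumptions.

```python
from collections import Counter, defaultdict

def build_frames(records, precision=3, step=0.2):
    """Group (month, lat, lon) records into one weighted frame per month."""
    by_month = defaultdict(list)
    for month, lat, lon in records:
        by_month[month].append((lat, lon))

    index = sorted(by_month)  # "YYYY-MM" strings sort chronologically
    frames = []
    for month in index:
        points = by_month[month]
        # Count how many points share each rounded grid cell.
        cells = Counter((round(lat, precision), round(lon, precision))
                        for lat, lon in points)
        # Weight grows with the cell count and is capped at 1.0.
        frames.append([
            [lat, lon,
             min(cells[(round(lat, precision), round(lon, precision))] * step, 1.0)]
            for lat, lon in points
        ])
    return index, frames

# Hypothetical records: two nearby pickups in one month, one in the next.
records = [
    ("2013-10", 30.2671, -97.7431),
    ("2013-10", 30.2672, -97.7432),
    ("2013-11", 30.3000, -97.7000),
]
index, frames = build_frames(records)
print(index)
print(frames)
```

The `index` list supplies the label shown for each frame, and `frames` has the nested-list layout that a time-aware heatmap layer expects.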

In [None]:
import pandas as pd
import folium
from folium.plugins import HeatMapWithTime, Fullscreen, MeasureControl
import branca.colormap as cm

# === Step 1: Load data ===
df = pd.read_csv("merged_data.csv", parse_dates=["Intake Datetime"])
df = df.dropna(subset=["Latitude", "Longitude", "Intake Datetime"]).copy()
df = df[(df["Latitude"].between(-90, 90)) & (df["Longitude"].between(-180, 180))]
df["Month_Formatted"] = df["Intake Datetime"].dt.strftime("%Y-%m")

# === Step 2: Add weight ===
def add_weight(points_list):
    from collections import defaultdict
    point_counts = defaultdict(int)
    weighted = []
    for p in points_list:
        rounded = (round(p[0], 3), round(p[1], 3))
        point_counts[rounded] += 1
    for p in points_list:
        rounded = (round(p[0], 3), round(p[1], 3))
        weighted.append([p[0], p[1], min(point_counts[rounded] * 0.2, 1)])
    return weighted

time_index = sorted(df["Month_Formatted"].unique())
heat_data = [add_weight(df[df["Month_Formatted"] == m][["Latitude", "Longitude"]].values.tolist()) for m in time_index]

# === Step 3: Setup map ===
center = [df["Latitude"].median(), df["Longitude"].median()]
m = folium.Map(location=center, zoom_start=12, tiles="CartoDB Positron", control_scale=True)
MeasureControl(position="bottomright").add_to(m)

# === Step 4: Color gradient ===
colors = ["#c4e9f2", "#7fbbdd", "#f58b05", "#ffc22f"]
colormap = cm.LinearColormap(colors=colors, index=[0, 0.3, 0.6, 1], vmin=0, vmax=1, caption="Hotspot Density")
colormap.add_to(m)

# === Step 5: Heatmap with time ===
HeatMapWithTime(
    data=heat_data,
    index=time_index,
    radius=15,
    min_opacity=0.3,
    max_opacity=0.9,
    gradient={i / 3: colors[i] for i in range(4)},
    use_local_extrema=True,
    auto_play=True,
    display_index=True
).add_to(m)

# === Step 6: Title box ===
title_html = f"""
<div id="title-card" style="
    position: absolute;
    top: 20px;
    left: 20px;
    z-index: 9999;
    background-color: rgba(255,255,255,0.85);
    padding: 10px 15px;
    border-radius: 8px;
    font-family: sans-serif;
    box-shadow: 0 0 8px rgba(0,0,0,0.1);
    max-width: 300px;
    font-size: 13px;
">
    <h4 style="margin: 0; font-size: 1em;"><b>Dynamic Hotspot Map of Animal Intakes</b></h4>
    <p style="margin: 2px 0;">Monthly distribution of animal intake hotspots</p>
    <p style="margin: 0;">Data range: {df["Intake Datetime"].min().strftime('%Y-%m')} to {df["Intake Datetime"].max().strftime('%Y-%m')}</p>
</div>
"""
m.get_root().html.add_child(folium.Element(title_html))

# === Step 7: Final JS fix using MutationObserver ===
custom_js = """
<script>
document.addEventListener("DOMContentLoaded", function () {
    const map = document.querySelector('.folium-map');
    if (map) map.style.position = 'relative';

    const observer = new MutationObserver(() => {
        const ctrl = document.querySelector('.leaflet-control-timecontrol');
        const target = document.querySelector('.leaflet-bottom.leaflet-left');

        if (ctrl && target && ctrl.querySelectorAll("button").length > 0) {
            // Remove all buttons except play/pause (index 1)
            const buttons = ctrl.querySelectorAll("button");
            buttons.forEach((btn, i) => {
                if (i !== 1) btn.remove();
            });

            // Remove fps control row
            const rows = ctrl.querySelectorAll("tr");
            if (rows.length > 1) rows[1].remove();

            // Move and style control
            target.appendChild(ctrl);
            Object.assign(ctrl.style, {
                position: 'absolute',
                left: '0',
                bottom: '0',
                margin: '20px',
                zIndex: '10000',
                background: 'white',
                borderRadius: '8px',
                padding: '8px',
                boxShadow: '0 0 6px rgba(0,0,0,0.2)',
                display: 'inline-block',
                maxWidth: '600px',
                overflow: 'hidden',
                transform: 'translateX(0%)'
            });

            observer.disconnect();
        }
    });

    observer.observe(document.body, { childList: true, subtree: true });
});
</script>
"""
m.get_root().html.add_child(folium.Element(custom_js))

# === Step 8: Fullscreen button ===
Fullscreen(position="topright").add_to(m)

# === Step 9: Save output ===
m.save("heatmap_final_clean_controls.html")
print("✅ Saved: heatmap_final_clean_controls.html")

#### Choropleth of Animal Intakes
Using a GeoJSON file of Austin neighborhoods, we mapped total intake counts per region and applied a yellow-to-blue gradient using Folium and D3 for visual contrast.

In [None]:
import pandas as pd
import folium
import json
from shapely.geometry import shape, Point
import branca.colormap as cm

# === Step 1: Load the intake data ===
df = pd.read_csv("merged_data.csv", parse_dates=["Intake Datetime"])
df = df.dropna(subset=["Latitude", "Longitude"])

# === Step 2: Load the GeoJSON data ===
with open("Neighborhoods_20250506.geojson") as f:
    gj = json.load(f)

# Auto-detect the property that holds the region name
region_field = next((k for k in gj["features"][0]["properties"]
                     if "name" in k.lower() or "label" in k.lower()),
                    list(gj["features"][0]["properties"].keys())[0])

# === Step 3: Match points to regions ===
polygon_map = {f["properties"][region_field]: shape(f["geometry"]) for f in gj["features"]}

def assign_region(row):
    pt = Point(row["Longitude"], row["Latitude"])
    for name, poly in polygon_map.items():
        if poly.contains(pt):
            return name
    return None

df["region"] = df.apply(assign_region, axis=1)
df = df.dropna(subset=["region"])

# === Step 4: Count intakes per region ===
region_counts = df.groupby("region").size().reset_index(name="animal_count")
region_dict = region_counts.set_index("region")["animal_count"].to_dict()

# Write the counts into the GeoJSON properties
for f in gj["features"]:
    rid = f["properties"].get(region_field)
    count = region_dict.get(rid, 0)
    f["properties"]["animal_count"] = int(count)

# === Step 5: Create the base map ===
m = folium.Map(location=[30.27, -97.74], zoom_start=11, tiles="CartoDB Positron")

# === Step 6: Create the color scale (pale yellow to blue) ===
colormap = cm.LinearColormap(
    colors=["#FFF5BF", "#639BFF"],  # pale yellow → blue
    vmin=min(region_dict.values()),
    vmax=max(region_dict.values()),
    caption="Total Animal Intakes by Neighborhood"
)
colormap.add_to(m)

# === Step 7: Draw the GeoJson polygon layer ===
folium.GeoJson(
    gj,
    style_function=lambda feature: {
        "fillColor": colormap(feature["properties"].get("animal_count", 0)),
        "color": "#A5C8FF",         # light-blue border
        "weight": 0.5,
        "fillOpacity": 0.7
    },
    tooltip=folium.GeoJsonTooltip(
        fields=[region_field, "animal_count"],
        aliases=["Neighborhood:", "Animal Intakes:"],
        localize=True,
        style=(
            "background-color: #FFF5BF; "
            "color: #3C3C3C; "
            "font-family: Comic Sans MS, sans-serif; "
            "font-size: 12px; "
            "padding: 6px; border-radius: 5px;"
        )
    )
).add_to(m)

# === Step 8: Add the title card ===
title_html = """
<div style="
    position: absolute;
    top: 20px;
    left: 20px;
    z-index: 9999;
    background-color: #FFFAF0;
    padding: 10px 15px;
    border-radius: 10px;
    font-family: Comic Sans MS, sans-serif;
    box-shadow: 0 0 6px rgba(0,0,0,0.1);
    color: #3C3C3C;
    font-size: 13px;
    max-width: 300px;
">
    <h4 style="margin: 0; font-size: 16px;"><b>Choropleth of Animal Intakes</b></h4>
    <p style="margin: 4px 0;">Neighborhoods shaded by total intake counts</p>
    <p style="margin: 0;">Data source: merged_data.csv</p>
</div>
"""
m.get_root().html.add_child(folium.Element(title_html))

# === Step 9: Export to an HTML file ===
m.save("choropleth_yellow_to_blue.html")
print("✅ Saved: choropleth_yellow_to_blue.html")

#### Monthly shelter-intake trend
We grouped the dataset by intake month and counted total animal entries. A time series chart was created using Plotly to reflect seasonal and long-term trends.
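
To separate the seasonal pattern from the long-term trend, one option is to average the monthly counts by calendar month across years. The sketch below does this on hypothetical timestamps; the column names `year` and `month` are our own additions, and months with no intakes in a given year are simply absent rather than counted as zero.

```python
import pandas as pd

# Hypothetical intake timestamps standing in for the "Intake Datetime" column.
df = pd.DataFrame({"Intake Datetime": pd.to_datetime([
    "2020-05-03", "2020-05-17", "2020-12-01",
    "2021-05-09", "2021-12-11", "2021-12-25",
])})
df["year"] = df["Intake Datetime"].dt.year
df["month"] = df["Intake Datetime"].dt.month

# Intakes per (year, month), then the mean for each calendar month across years.
monthly = df.groupby(["year", "month"]).size()
seasonal = monthly.groupby(level="month").mean()
print(seasonal)
```

Plotting `seasonal` next to the full time series would show whether peaks recur in the same months each year.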

In [None]:
import pandas as pd
import plotly.graph_objects as go

# === Step 1: Load the intake data ===
df = pd.read_csv("merged_data.csv", parse_dates=["Intake Datetime"])
df = df.dropna(subset=["Intake Datetime"])
df["Month"] = df["Intake Datetime"].dt.to_period("M").astype(str)

monthly_counts = (
    df.groupby("Month")
    .size()
    .reset_index(name="Count")
    .sort_values("Month")
)
monthly_counts["Month"] = pd.to_datetime(monthly_counts["Month"])

# === Step 2: Custom color palette ===
colors = {
    "background": "#FFFAF0",   # cream white
    "line": "#639BFF",         # line blue
    "marker": "#FFE25F",       # lemon-yellow markers
    "font": "#3C3C3C"          # dark gray text
}

# === Step 3: Build the chart ===
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=monthly_counts["Month"],
    y=monthly_counts["Count"],
    mode="lines+markers",
    line=dict(color=colors["line"], width=4, shape="spline"),
    marker=dict(size=9, color=colors["marker"], line=dict(width=1, color="white")),
    hovertemplate="Month: %{x|%b %Y}<br>Intakes: %{y}<extra></extra>"
))

# === Step 4: Layout and styling ===
fig.update_layout(
    title="From Streets to Shelter: Monthly Intake Trend",
    xaxis_title="Month",
    yaxis_title="Number of Animals",
    template="simple_white",
    font=dict(family="Comic Sans MS, Arial, sans-serif", size=16, color=colors["font"]),
    hovermode="x unified",
    margin=dict(t=60, l=60, r=40, b=60),
    plot_bgcolor=colors["background"],
    paper_bgcolor=colors["background"]
)

# === Step 5: Export to an HTML file ===
fig.write_html("monthly_intake_lemon_highlight_cartoon.html")
print("✅ Saved: monthly_intake_lemon_highlight_cartoon.html")

#### Shelter Coverage & Intakes
This map shows the area within 5 km of each animal shelter together with the locations where most animals are taken in. It lets us quickly spot places with many intakes but no nearby shelter, which are candidates for new shelters or extra support.
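
The map conveys coverage visually; a rough numeric counterpart is the share of intake points lying within the radius of at least one shelter. The sketch below assumes great-circle (haversine) distance with a mean Earth radius of 6,371 km, and uses hypothetical coordinates; `coverage_share` is an illustrative helper, not part of the notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def coverage_share(intakes, shelters, radius_m=5000):
    """Fraction of intake points within radius_m of at least one shelter."""
    if not intakes:
        return 0.0
    covered = sum(
        any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in shelters)
        for lat, lon in intakes
    )
    return covered / len(intakes)

# Hypothetical coordinates: one shelter, three intake points at increasing distance.
shelters = [(30.27, -97.74)]
intakes = [(30.27, -97.74), (30.29, -97.74), (30.40, -97.74)]
print(coverage_share(intakes, shelters))
```

Computing this share per neighborhood would turn the visual impression of under-served areas into a ranked list.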

In [None]:
import json, warnings
import pandas as pd, geopandas as gpd
from shapely.geometry import shape
import folium, branca
from folium.plugins import MarkerCluster
from tqdm import tqdm

warnings.filterwarnings("ignore", category=UserWarning)

# basics
NEIGH_JSON  = "neighbour.json"
ANIMAL_CSV  = "merged_data.csv"
SHELTER_CSV = "updated_shelter_geocoded_locations.csv"
ICON_PATH   = "rounded_shelter_icon.png"
OUTCOME_COL      = "Outcome Type"
ADOPTED_VALUE    = "Adoption"


CIRCLE_RADIUS = 5000          # shelter radius (m)
MAX_ANIMALS   = 6000          # sampling cap
COLOR_MIN, COLOR_MAX, N_BINS = "#fff5bf", "#92b7ec", 7

# Load the neighbourhood polygons (neighbour.json)
def load_neighbourhood(path=NEIGH_JSON):
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    return gpd.GeoDataFrame(
        [{"geometry": shape(feat["the_geom"]),
          **{k: v for k, v in feat.items() if k != "the_geom"}} for feat in raw],
        crs="EPSG:4326")

gdf_neigh = load_neighbourhood()
name_col = next(c for c in ["neighname","NAME","name","neighborhood"] if c in gdf_neigh)

# Load animals & shelters
animals  = pd.read_csv(ANIMAL_CSV).dropna(subset=["Latitude","Longitude"])
if len(animals) > MAX_ANIMALS:
    animals = animals.sample(MAX_ANIMALS, random_state=42)
shelters = pd.read_csv(SHELTER_CSV)

# Match animals to neighbourhoods
gdf_animals = gpd.GeoDataFrame(
    animals, geometry=gpd.points_from_xy(animals.Longitude, animals.Latitude), crs="EPSG:4326")
joined = gpd.sjoin(gdf_animals, gdf_neigh[[name_col,"geometry"]],
                   predicate="within", how="left")
cnts = joined.groupby(name_col)["Animal ID"].count().rename("animal_cnt").reset_index()
gdf_neigh = gdf_neigh.merge(cnts, on=name_col, how="left").fillna({"animal_cnt":0})

# colour
bins = pd.qcut(gdf_neigh["animal_cnt"], N_BINS, duplicates="drop", retbins=True)[1]
cm = branca.colormap.LinearColormap([COLOR_MIN,COLOR_MAX],
                                    vmin=bins.min(), vmax=bins.max()
                                    ).to_step(len(bins)-1)
cm.caption = "Total Animal Intakes by Neighborhood"
def style_fn(f):
    return {"fillColor": cm(f["properties"]["animal_cnt"]),
            "color":"#A5C8FF","weight":0.6,"fillOpacity":0.7}

# Folium map
m = folium.Map(location=[30.27,-97.74], zoom_start=11, tiles="cartodbpositron")

folium.GeoJson(
    gdf_neigh.to_json(), style_function=style_fn,
    tooltip=folium.GeoJsonTooltip(
        fields=[name_col,"animal_cnt"],
        aliases=["Neighborhood","Total Intakes"],
        sticky=False, labels=True,
        style=("background:#FFF5BF;font-family:'Comic Sans MS';"
               "padding:4px;border-radius:5px;font-size:12px;"))
    ,name="Intake Choropleth").add_to(m)
cm.add_to(m)

# Info card
m.get_root().html.add_child(branca.element.Element(f"""
<div style="position:absolute;top:20px;left:20px;z-index:9999;
            background:#FFFAF0;padding:10px 15px;border-radius:10px;
            font-family:'Comic Sans MS',sans-serif;font-size:13px;color:#3C3C3C;
            box-shadow:0 0 6px rgba(0,0,0,0.1);max-width:320px;">
  <h4 style="margin:0;font-size:16px;"><b>Shelter Coverage & Intakes</b></h4>
  <p style="margin:4px 0;">Base color: intake counts (yellow→blue)<br/>
     Blue circle: {CIRCLE_RADIUS//1000}&nbsp;km shelter radius<br/>
     House: shelter location</p>
</div>"""))



# MarkerCluster color
cluster = MarkerCluster(
    name='Stray Animals (cluster)',
    icon_create_function=r'''
    function(cluster){
        var count = cluster.getChildCount();
        var color = '#FFA534';          // ≤25  orange
        if (count > 100)      { color = '#A3C4F3'; }   // >100  pastel blue
        else if (count > 25)  { color = '#FFE599'; }   // 26‑100 pastel yellow

        return new L.DivIcon({
            html: '<div style="background:'+color+';"><span>'+count+'</span></div>',
            className: 'custom-cluster',
            iconSize: [40, 40]
        });
    }
    '''
).add_to(m)

# Add the animal points to the cluster
for _, r in animals.iterrows():
    folium.CircleMarker(
        location=[r.Latitude, r.Longitude],
        radius=3,
        color='#3186CC',
        fill=True,
        fill_color='#3186CC',
        fill_opacity=0.5,
        popup=f"{r['Animal Type']} — {r['Found Location']}"
    ).add_to(cluster)

# Extra CSS for the cluster bubbles (centered count, circular shape)
m.get_root().html.add_child(branca.element.Element("""
<style>
.custom-cluster div{
  width:40px; height:40px; line-height:40px; border-radius:20px;
  text-align:center; color:#FFFFFF; font-weight:bold;
  font-family:Arial,Helvetica,sans-serif;
  box-shadow:0 0 0 1.5px #fff inset;
}
</style>
"""))


# Individual points
COLOR_MAP = {
    "Dog":   "#FFA534",
    "Cat":   "#FFE599",
    "Other": "#A3C4F3"
}
DEFAULT_COLOR = "#8CD5FF"

# FeatureGroup
type_groups = {}
for typ in animals["Animal Type"].unique():
    fg = folium.FeatureGroup(name=f"{typ} (points)", show=True)
    fg.add_to(m)
    type_groups[typ] = fg

# add points to groups
for _, r in animals.iterrows():
    typ = r["Animal Type"]
    fg  = type_groups.get(typ)
    if fg is None:                       # unknown new type in the CSV
        fg = type_groups.setdefault("Other", folium.FeatureGroup(
                name="Other (points)", show=True).add_to(m))
    color = COLOR_MAP.get(typ, DEFAULT_COLOR)
    folium.CircleMarker(
        location=[r.Latitude, r.Longitude],
        radius=3.5,
        color=color, weight=0.5,
        fill=True, fill_color=color, fill_opacity=0.35,
        popup=f"{typ} — {r['Found Location']}"
    ).add_to(fg)

#Shelters & Coverage


shelter_fg = folium.FeatureGroup(name="Shelters & Coverage", show=True)

for _, r in shelters.iterrows():
    # Coverage circle
    folium.Circle(
        location=[r.latitude, r.longitude],
        radius=CIRCLE_RADIUS,
        color="#629BFD", weight=1,
        fill=True, fill_color="#629BFD", fill_opacity=0.25,
        tooltip=r.shelter_name
    ).add_to(shelter_fg)

    folium.Marker(
        location=[r.latitude, r.longitude],
        icon=folium.CustomIcon(ICON_PATH, icon_size=(34, 34)),
        tooltip=r.shelter_name
    ).add_to(shelter_fg)

# Key step: add the FeatureGroup to the map last so it sits on the top layer
shelter_fg.add_to(m)


folium.LayerControl(position="topright", collapsed=False).add_to(m)

m.save("shelter_and_animal_intakes.html")
print("Map saved: shelter_and_animal_intakes.html")

#### Intake → Outcome Flow
We made this Sankey chart to show where most animals enter the system and where they finally go, so we can see how well the main rescue‑to‑placement pipeline works and which small streams still need help.
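
A numeric companion to the Sankey view is the share of each outcome within every intake type, which makes small streams comparable to large ones. The sketch below uses a row-normalized cross-tabulation on hypothetical toy rows; the category values are illustrative and not taken from the actual data.

```python
import pandas as pd

# Hypothetical toy rows using the same two columns as the Sankey chart.
df = pd.DataFrame({
    "Intake Type": ["Stray", "Stray", "Stray", "Owner Surrender", "Owner Surrender"],
    "Outcome Type": ["Adoption", "Return to Owner", "Adoption", "Adoption", "Transfer"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
shares = pd.crosstab(df["Intake Type"], df["Outcome Type"], normalize="index")
print(shares.round(2))
```

Each row then reads as "of the animals that entered this way, what fraction left by each route".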

In [None]:
import pandas as pd, plotly.graph_objects as go

df = pd.read_csv("merged_data.csv")

src_col = "Intake Type"
tgt_col = "Outcome Type"

flow = (df.groupby([src_col, tgt_col])
          .size()
          .reset_index(name="count"))

# Build the label → index mapping
labels = pd.concat([flow[src_col], flow[tgt_col]]).unique().tolist()
label2id = {lbl:i for i, lbl in enumerate(labels)}

flow["src_id"] = flow[src_col].map(label2id)
flow["tgt_id"] = flow[tgt_col].map(label2id)

# 3) Sankey
src_pal = ['#f9cb9c', '#ffe599']
tgt_color = '#cfe2f3'

# Identify which labels are source nodes
src_set = set(flow[src_col])

# Build the node colors
node_colors = []
src_color_map = {}
for i, lbl in enumerate(labels):
    if lbl in src_set:
        c = src_pal[i % 2]         # alternate between the two source colors
        src_color_map[lbl] = c
        node_colors.append(c)
    else:
        node_colors.append(tgt_color)

# #RRGGBB → rgba(r,g,b,α)
def hex2rgba(hexclr, alpha=0.4):
    hexclr = hexclr.lstrip('#')
    r, g, b = (int(hexclr[j:j+2], 16) for j in (0,2,4))
    return f"rgba({r},{g},{b},{alpha})"

link_colors = [
    hex2rgba(src_color_map[ labels[s] ]) for s in flow["src_id"]
]


fig = go.Figure(go.Sankey(
    node=dict(
        label     = labels,
        pad       = 20,
        thickness = 18,
        color     = node_colors
    ),
    link=dict(
        source = flow["src_id"],
        target = flow["tgt_id"],
        value  = flow["count"],
        color  = link_colors
    )
))
fig.update_layout(
    title_text = "Austin Animal Intake → Outcome Flow (2014‑2024)",
    font_size  = 12, height = 500,width=860,
    font_family='Comic Sans MS',
    margin     = dict(l=10, r=10, t=40, b=10),
    paper_bgcolor= "rgba(0,0,0,0)",   # ← transparent canvas
    plot_bgcolor = "rgba(0,0,0,0)",    # ← transparent plot area
    autosize=False
)
fig.write_html(
    "animal_outcomes_flow_sankey.html",
    include_plotlyjs="cdn",
    full_html=False,
    config={"responsive": True}
)
print("Sankey saved")



### 4. Genre
#### Which genre of data story did you use?
We used a Martini Glass narrative structure: the stem is a focused, linear presentation (time series → heatmap animation), followed by an open exploration body (static choropleth map for spatial analysis and comparison).

#### Visual Narrative tools used
- Graphical highlighting (e.g., saturation of heatmap points over time)
- Progressive reveal (the animated heatmap adds time dimension interactively)
- Visual grouping (e.g., color mapping in the choropleth for comparing regions)

#### Narrative Structure tools used
- Author-driven sequence at the start (clear framing of the problem through time trend)
- Reader-driven interaction in choropleth map and animation slider
- Multi-messaging via annotation blocks and map legends

### 5. Visualizations
To effectively communicate insights from the dataset, we developed a set of visualizations, each tailored to a specific narrative need:

We began with an animated heatmap, built using Folium’s HeatMapWithTime, which dynamically displays how the geographic concentration of stray animal pickups evolved over time. This animation reveals consistent hotspots in downtown and east Austin, and shows how stray activity gradually expanded northeast over the years. It adds a powerful spatio-temporal dimension to the narrative, guiding the viewer through changing urban shelter pressures.

To offer a cumulative view of shelter burden, we created a neighborhood-level choropleth map using Folium and GeoJSON. The map uses a yellow-to-blue color gradient to represent total intake counts per region. Areas like East Riverside and St. John clearly stand out, allowing readers to compare neighborhood-level disparities in animal intake.

We then introduced a monthly time series line chart, constructed with Plotly, to illustrate seasonal trends in shelter intakes from 2013 to 2025.

### 6. Discussion
#### What went well?
This project provided a rich opportunity to explore stray animal intake patterns in Austin by combining spatial, temporal, and categorical data through a carefully structured visual narrative.
Our use of the Martini Glass structure (Segel & Heer, 2010) was especially helpful in guiding the viewer. The linear sequence of visualizations at the beginning ensured a clear introduction to the problem. As the narrative progressed, we allowed more room for exploration, for example through the interactive intake-condition outcome chart and the Sankey diagram. These elements supported both author-driven messaging and reader-driven discovery, making the overall experience engaging and accessible.


#### What is still missing? What could be improved? Why?
Despite these strengths, several limitations remain. Some of our charts were static, especially the bar and pie charts, which could have been more engaging if built with interactive tools like Plotly or Altair. Additionally, while we conducted extensive exploratory analysis, we did not incorporate predictive modeling. Future versions of this project could include classification models (e.g., predicting adoption likelihood based on intake condition, age, or shelter) to offer more actionable insights for shelter operations. Finally, although we used a manually geocoded list of shelter locations to assess service coverage, the spatial accuracy of some intake data may still be limited by the original address format or missing GPS points. Expanding the dataset or combining it with open civic maps could enhance precision in further iterations.

### 7. Contributions
- s242613 Yuling Zhai: Geocoding / Heatmap of stray-animal pickup locations / Neighborhood-level intake counts / Monthly shelter-intake trend
- s242614 Shimin Huang: Data Cleaning & Preprocessing / Shelter Coverage & Intakes / Intake-Outcome Flow

### References
Segel, E., & Heer, J. (2010). Narrative Visualization: Telling Stories with Data. IEEE Transactions on Visualization and Computer Graphics, 16(6), 1139-1148.
