### 1. Motivation
#### What is your dataset?
We used four datasets related to the Austin Animal Center:
- Austin_Animal_Center_Intakes_20250419.csv: records of animals entering the shelter, including intake dates and animal types.
- Austin_Animal_Center_Outcomes_20250419.csv: outcome records of animals leaving the shelter (e.g., adoption, euthanasia).
- shelter_geocoded_locations.csv: a manually geocoded dataset that maps official shelter addresses in Austin to their corresponding latitude and longitude coordinates, based on data from the Austin Animal Center's official shelter list.
- neighbour.json: downloaded from https://data.austintexas.gov/.

#### Why did you choose this/these particular dataset(s)?
The intake dataset offers a rich, time-stamped record of real-world animal shelter operations. With the help of manually geocoded shelter addresses, we were able to conduct meaningful spatial analysis. This multi-angle approach allows for an accessible and engaging visual story.

#### What was your goal for the end user's experience?
We want users to understand the patterns behind stray animal intake: when animals enter shelters most frequently, and which shelter regions are under more pressure. The visualizations should guide users from temporal overview to regional insights.

### 2. Basic stats

#### Data Cleaning & Preprocessing Steps
- Merged intake & outcome tables: Joined on Animal ID, keeping duplicates for repeat visits.
- Date parsing: Converted intake_datetime to proper datetime format.
- Shelter matching: Linked each record to a shelter location based on the shelter_name or address field.
- Geolocation: Used shelter_geocoded_locations.csv to attach coordinates to each shelter site.
- Neighborhood mapping: Mapped coordinates to official Austin neighborhood polygons using spatial joins.
- Time aggregation: Grouped intake counts by month and by shelter region.
- Outliers: Removed records with missing animal type or unmatchable shelter data.
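
The steps above can be sketched with pandas roughly as follows. This is a minimal illustration on toy rows, not the exact pipeline used for this project: the `Outcome Type` and `Shelter` columns and all values are assumptions made for the example, while `Animal ID` and `Intake Datetime` follow the names used in this notebook.

```python
import pandas as pd

# Toy stand-ins for the intake and outcome tables (all values are made up).
intakes = pd.DataFrame({
    "Animal ID": ["A1", "A2", "A2", "A3"],
    "Intake Datetime": ["2014-01-05 10:00", "2014-01-20 09:30",
                        "2014-03-02 14:15", "2014-03-09 08:00"],
    "Animal Type": ["Dog", "Cat", "Cat", None],
    "Shelter": ["North", "North", "East", "East"],  # hypothetical column
})
outcomes = pd.DataFrame({
    "Animal ID": ["A1", "A2"],
    "Outcome Type": ["Adoption", "Transfer"],  # hypothetical column
})

# A left merge on Animal ID keeps every intake row, including repeat visits.
merged = intakes.merge(outcomes, on="Animal ID", how="left")

# Parse timestamps; unparseable values become NaT instead of raising.
merged["Intake Datetime"] = pd.to_datetime(merged["Intake Datetime"], errors="coerce")

# Drop records with a missing animal type or timestamp.
merged = merged.dropna(subset=["Animal Type", "Intake Datetime"])

# Count intakes per month and shelter.
merged["Month"] = merged["Intake Datetime"].dt.to_period("M").astype(str)
monthly = merged.groupby(["Month", "Shelter"]).size().reset_index(name="Count")
print(monthly)
```

Note that if the outcomes table also holds several rows per animal, a plain merge on `Animal ID` pairs every intake with every outcome for that animal, so a real pipeline would likely need an additional key (for example, visit order) to keep one row per visit.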

#### Dataset Statistics
- Intakes dataset rows: 126,234
- Outcomes dataset rows: 124,513
- Unique animal types: 5 (Dogs, Cats, Birds, Others, Small Mammals)
- Mapped shelter coordinates: Successfully matched to all listed shelters
- Time span: October 2013 to April 2025
- Number of mapped shelters: 12

### 3. Data Analysis

#### Dynamic Hotspot Map of Animal Intakes
We used Folium’s HeatMapWithTime to animate changes in geographic hotspots across Austin over time. Each frame represents a specific month from October 2013 to April 2025.

In [None]:
import pandas as pd
import folium
from folium.plugins import HeatMapWithTime, Fullscreen, MeasureControl
import branca.colormap as cm

# === Step 1: Load data ===
df = pd.read_csv("merged_data.csv", parse_dates=["Intake Datetime"])
df = df.dropna(subset=["Latitude", "Longitude", "Intake Datetime"]).copy()
df = df[(df["Latitude"].between(-90, 90)) & (df["Longitude"].between(-180, 180))]
df["Month_Formatted"] = df["Intake Datetime"].dt.strftime("%Y-%m")

# === Step 2: Add weight ===
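def add_weight(points_list):
    """Return [lat, lon, weight] rows, weighting each point by local density."""
    from collections import defaultdict
    point_counts = defaultdict(int)
    weighted = []
    # First pass: count points per location, rounded to 3 decimals
    for p in points_list:
        rounded = (round(p[0], 3), round(p[1], 3))
        point_counts[rounded] += 1
    # Second pass: weight = 0.2 per point at that location, capped at 1
    for p in points_list:
        rounded = (round(p[0], 3), round(p[1], 3))
        weighted.append([p[0], p[1], min(point_counts[rounded] * 0.2, 1)])
    return weighted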

time_index = sorted(df["Month_Formatted"].unique())
heat_data = [add_weight(df[df["Month_Formatted"] == m][["Latitude", "Longitude"]].values.tolist()) for m in time_index]

# === Step 3: Setup map ===
center = [df["Latitude"].median(), df["Longitude"].median()]
m = folium.Map(location=center, zoom_start=12, tiles="CartoDB Positron", control_scale=True)
MeasureControl(position="bottomright").add_to(m)

# === Step 4: Color gradient ===
colors = ["#c4e9f2", "#7fbbdd", "#f58b05", "#ffc22f"]
colormap = cm.LinearColormap(colors=colors, index=[0, 0.3, 0.6, 1], vmin=0, vmax=1, caption="Hotspot Density")
colormap.add_to(m)

# === Step 5: Heatmap with time ===
HeatMapWithTime(
    data=heat_data,
    index=time_index,
    radius=15,
    min_opacity=0.3,
    max_opacity=0.9,
    gradient={i / 3: colors[i] for i in range(4)},
    use_local_extrema=True,
    auto_play=True,
    display_index=True
).add_to(m)

# === Step 6: Title box ===
title_html = f"""
<div id="title-card" style="
    position: absolute;
    top: 20px;
    left: 20px;
    z-index: 9999;
    background-color: rgba(255,255,255,0.85);
    padding: 10px 15px;
    border-radius: 8px;
    font-family: sans-serif;
    box-shadow: 0 0 8px rgba(0,0,0,0.1);
    max-width: 300px;
    font-size: 13px;
">
    <h4 style="margin: 0; font-size: 1em;"><b>Dynamic Hotspot Map of Animal Intakes</b></h4>
    <p style="margin: 2px 0;">Monthly distribution of animal intake hotspots</p>
    <p style="margin: 0;">Data range: {df["Intake Datetime"].min().strftime('%Y-%m')} to {df["Intake Datetime"].max().strftime('%Y-%m')}</p>
</div>
"""
m.get_root().html.add_child(folium.Element(title_html))

# === Step 7: Final JS fix using MutationObserver ===
custom_js = """
<script>
document.addEventListener("DOMContentLoaded", function () {
    const map = document.querySelector('.folium-map');
    if (map) map.style.position = 'relative';

    const observer = new MutationObserver(() => {
        const ctrl = document.querySelector('.leaflet-control-timecontrol');
        const target = document.querySelector('.leaflet-bottom.leaflet-left');

        if (ctrl && target && ctrl.querySelectorAll("button").length > 0) {
            // Remove all buttons except play/pause (index 1)
            const buttons = ctrl.querySelectorAll("button");
            buttons.forEach((btn, i) => {
                if (i !== 1) btn.remove();
            });

            // Remove fps control row
            const rows = ctrl.querySelectorAll("tr");
            if (rows.length > 1) rows[1].remove();

            // Move and style control
            target.appendChild(ctrl);
            Object.assign(ctrl.style, {
                position: 'absolute',
                left: '0',
                bottom: '0',
                margin: '20px',
                zIndex: '10000',
                background: 'white',
                borderRadius: '8px',
                padding: '8px',
                boxShadow: '0 0 6px rgba(0,0,0,0.2)',
                display: 'inline-block',
                maxWidth: '600px',
                overflow: 'hidden',
                transform: 'translateX(0%)'
            });

            observer.disconnect();
        }
    });

    observer.observe(document.body, { childList: true, subtree: true });
});
</script>
"""
m.get_root().html.add_child(folium.Element(custom_js))

# === Step 8: Fullscreen button ===
Fullscreen(position="topright").add_to(m)

# === Step 9: Save output ===
m.save("heatmap_final_clean_controls.html")
print("✅ Saved: heatmap_final_clean_controls.html")
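
To make the weighting in Step 2 concrete, here is a small standalone sketch of the same idea: points are binned by rounding their coordinates to three decimals (about 110 m in latitude), and each point's weight is 0.2 times the number of points in its bin, capped at 1. The helper name `weight_points`, its parameters, and the coordinates are made up for illustration; it is not meant to replace the function used in the cell above.

```python
from collections import Counter

def weight_points(points, step=0.2, cap=1.0, digits=3):
    """Return [lat, lon, weight] rows; weight grows with local point density."""
    # Bin each point by rounding its coordinates.
    bins = [(round(lat, digits), round(lon, digits)) for lat, lon in points]
    counts = Counter(bins)
    # Each point inherits the capped weight of its bin.
    return [[lat, lon, min(counts[b] * step, cap)]
            for (lat, lon), b in zip(points, bins)]

# Made-up coordinates: three points share one bin, one point is alone.
sample = [
    [30.2671, -97.7431],
    [30.2672, -97.7432],
    [30.2669, -97.7429],
    [30.3000, -97.7000],
]
weighted = weight_points(sample)
for row in weighted:
    print(row)
```

With these defaults the weight saturates once a bin holds five points, so very dense bins all receive the same weight. A different step or a log-scaled weight would be one way to keep more contrast between busy locations.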

#### Choropleth of Animal Intakes
Using a GeoJSON file of Austin neighborhoods, we mapped total intake counts per region and applied a yellow-to-blue gradient using Folium and D3 for visual contrast.

In [None]:
import pandas as pd
import folium
import json
from shapely.geometry import shape, Point
import branca.colormap as cm

# === Step 1: Load intake data ===
df = pd.read_csv("merged_data.csv", parse_dates=["Intake Datetime"])
df = df.dropna(subset=["Latitude", "Longitude"])

# === Step 2: Load GeoJSON data ===
with open("Neighborhoods_20250506.geojson") as f:
    gj = json.load(f)

# Auto-detect the region name field
region_field = next((k for k in gj["features"][0]["properties"]
                     if "name" in k.lower() or "label" in k.lower()),
                    list(gj["features"][0]["properties"].keys())[0])

# === Step 3: Match points to regions ===
polygon_map = {f["properties"][region_field]: shape(f["geometry"]) for f in gj["features"]}

def assign_region(row):
    pt = Point(row["Longitude"], row["Latitude"])
    for name, poly in polygon_map.items():
        if poly.contains(pt):
            return name
    return None

df["region"] = df.apply(assign_region, axis=1)
df = df.dropna(subset=["region"])

# === Step 4: Aggregate intake counts per region ===
region_counts = df.groupby("region").size().reset_index(name="animal_count")
region_dict = region_counts.set_index("region")["animal_count"].to_dict()

# Write the counts into the GeoJSON properties
for f in gj["features"]:
    rid = f["properties"].get(region_field)
    count = region_dict.get(rid, 0)
    f["properties"]["animal_count"] = int(count)

# === Step 5: Create the base map ===
m = folium.Map(location=[30.27, -97.74], zoom_start=11, tiles="CartoDB Positron")

# === Step 6: Create the color gradient (light yellow to primary blue) ===
colormap = cm.LinearColormap(
    colors=["#FFF5BF", "#639BFF"],  # light yellow → primary blue
    vmin=min(region_dict.values()),
    vmax=max(region_dict.values()),
    caption="Total Animal Intakes by Neighborhood"
)
colormap.add_to(m)

# === Step 7: Draw the GeoJson polygon layer ===
folium.GeoJson(
    gj,
    style_function=lambda feature: {
        "fillColor": colormap(feature["properties"].get("animal_count", 0)),
        "color": "#A5C8FF",         # light blue border
        "weight": 0.5,
        "fillOpacity": 0.7
    },
    tooltip=folium.GeoJsonTooltip(
        fields=[region_field, "animal_count"],
        aliases=["Neighborhood:", "Animal Intakes:"],
        localize=True,
        style=(
            "background-color: #FFF5BF; "
            "color: #3C3C3C; "
            "font-family: Comic Sans MS, sans-serif; "
            "font-size: 12px; "
            "padding: 6px; border-radius: 5px;"
        )
    )
).add_to(m)

# === Step 8: Add title card ===
title_html = """
<div style="
    position: absolute;
    top: 20px;
    left: 20px;
    z-index: 9999;
    background-color: #FFFAF0;
    padding: 10px 15px;
    border-radius: 10px;
    font-family: Comic Sans MS, sans-serif;
    box-shadow: 0 0 6px rgba(0,0,0,0.1);
    color: #3C3C3C;
    font-size: 13px;
    max-width: 300px;
">
    <h4 style="margin: 0; font-size: 16px;"><b>Choropleth of Animal Intakes</b></h4>
    <p style="margin: 4px 0;">Neighborhoods shaded by total intake counts</p>
    <p style="margin: 0;">Data source: merged_data.csv</p>
</div>
"""
m.get_root().html.add_child(folium.Element(title_html))

# === Step 9: Export to HTML file ===
m.save("choropleth_yellow_to_blue.html")
print("✅ Saved: choropleth_yellow_to_blue.html")
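
The region assignment in Step 3 relies on shapely's `contains` test. As a rough, dependency-free illustration of what such a test checks, the sketch below applies the classic ray-casting rule to simple polygons without holes. The function `point_in_polygon`, the simplified `assign_region` signature, and the two square "neighborhoods" are made up for this example, and shapely handles cases (holes, multipolygons, boundary points) that this sketch ignores.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: toggle 'inside' for each edge crossed by a ray to the right."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_region(lon, lat, regions):
    """Return the first region whose ring contains the point, else None."""
    for name, ring in regions.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None

# Two made-up square regions, vertices in (longitude, latitude) order.
regions = {
    "West Square": [(-97.80, 30.20), (-97.75, 30.20), (-97.75, 30.30), (-97.80, 30.30)],
    "East Square": [(-97.75, 30.20), (-97.70, 30.20), (-97.70, 30.30), (-97.75, 30.30)],
}

west = assign_region(-97.78, 30.25, regions)
east = assign_region(-97.72, 30.25, regions)
outside = assign_region(-97.60, 30.25, regions)
print(west, east, outside)
```

Scanning every polygon for every row, as both this sketch and the cell above do, scales with rows times polygons. For a table of this size, a spatial index or a GeoPandas spatial join would probably be faster, though that was not tested here.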

#### Monthly shelter-intake trend
We grouped the dataset by intake month and counted total animal entries. A time series chart was created using Plotly to reflect seasonal and long-term trends.

In [None]:
import pandas as pd
import plotly.graph_objects as go

# === Step 1: Load intake data ===
df = pd.read_csv("merged_data.csv", parse_dates=["Intake Datetime"])
df = df.dropna(subset=["Intake Datetime"])
df["Month"] = df["Intake Datetime"].dt.to_period("M").astype(str)

monthly_counts = (
    df.groupby("Month")
    .size()
    .reset_index(name="Count")
    .sort_values("Month")
)
monthly_counts["Month"] = pd.to_datetime(monthly_counts["Month"])

# === Step 2: Custom color styles ===
colors = {
    "background": "#FFFAF0",   # cream white
    "line": "#639BFF",         # line blue
    "marker": "#FFE25F",       # lemon-yellow markers
    "font": "#3C3C3C"          # dark gray font
}

# === Step 3: Build the chart ===
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=monthly_counts["Month"],
    y=monthly_counts["Count"],
    mode="lines+markers",
    line=dict(color=colors["line"], width=4, shape="spline"),
    marker=dict(size=9, color=colors["marker"], line=dict(width=1, color="white")),
    hovertemplate="Month: %{x|%b %Y}<br>Intakes: %{y}<extra></extra>"
))

# === Step 4: Layout and style ===
fig.update_layout(
    title="From Streets to Shelter: Monthly Intake Trend",
    xaxis_title="Month",
    yaxis_title="Number of Animals",
    template="simple_white",
    font=dict(family="Comic Sans MS, Arial, sans-serif", size=16, color=colors["font"]),
    hovermode="x unified",
    margin=dict(t=60, l=60, r=40, b=60),
    plot_bgcolor=colors["background"],
    paper_bgcolor=colors["background"]
)

# === Step 5: Export to HTML file ===
fig.write_html("monthly_intake_lemon_highlight_cartoon.html")
print("✅ Saved: monthly_intake_lemon_highlight_cartoon.html")
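
One detail worth checking in a monthly count like this: `groupby(...).size()` only returns months that contain at least one record, so a month with zero intakes would be skipped and the line would simply bridge the gap. Below is a minimal sketch of filling such gaps with pandas, using made-up dates; whether the real data has any empty months is not something verified here.

```python
import pandas as pd

# Made-up intake timestamps with no records in February 2014.
dates = pd.to_datetime(
    ["2014-01-03", "2014-01-17", "2014-03-08", "2014-04-21", "2014-04-22"]
)
df = pd.DataFrame({"Intake Datetime": dates})

# Count records per calendar month (the result is indexed by monthly periods).
counts = df.groupby(df["Intake Datetime"].dt.to_period("M")).size()

# Reindex over the full month range so empty months appear as 0.
full_range = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
counts = counts.reindex(full_range, fill_value=0)

# Convert the periods to month-start timestamps for plotting.
monthly_counts = pd.DataFrame({
    "Month": counts.index.to_timestamp(),
    "Count": counts.values,
})
print(monthly_counts)
```

The resulting `monthly_counts` frame has the same `Month` and `Count` columns as the one plotted above, so it could be passed to the same Plotly trace.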

### 4. Genre
#### Which genre of data story did you use?
We used a Martini Glass narrative structure: the stem is a focused, linear presentation (time series → heatmap animation), followed by an open exploration body (static choropleth map for spatial analysis and comparison).

#### Visual Narrative tools used
- Graphical highlighting (e.g., saturation of heatmap points over time)
- Progressive reveal (the animated heatmap adds time dimension interactively)
- Visual grouping (e.g., color mapping in the choropleth for comparing regions)

#### Narrative Structure tools used
- Author-driven sequence at the start (clear framing of the problem through time trend)
- Reader-driven interaction in choropleth map and animation slider
- Multi-messaging via annotation blocks and map legends

### 5. Visualizations
To effectively communicate insights from the dataset, we developed a set of visualizations, each tailored to a specific narrative need:

We began with an animated heatmap, built using Folium’s HeatMapWithTime, which dynamically displays how the geographic concentration of stray animal pickups evolved over time. This animation reveals consistent hotspots in downtown and east Austin, and shows how stray activity gradually expanded northeast over the years. It adds a powerful spatio-temporal dimension to the narrative, guiding the viewer through changing urban shelter pressures.

To offer a cumulative view of shelter burden, we created a neighborhood-level choropleth map using Folium and GeoJSON. The map uses a yellow-to-blue color gradient to represent total intake counts per region. Areas like East Riverside and St. John clearly stand out, allowing readers to compare neighborhood-level disparities in animal intake.

We then introduced a monthly time series line chart, constructed with Plotly, to illustrate seasonal trends in shelter intakes from 2013 to 2025.

### 6. Discussion
#### What went well?
This project provided a rich opportunity to explore stray animal intake patterns in Austin by combining spatial, temporal, and categorical data through a carefully structured visual narrative.
Our use of the Martini Glass structure (Segel & Heer, 2010) was especially helpful in guiding the viewer. The linear sequence of visualizations at the beginning ensured a clear introduction to the problem. As the narrative progressed, we allowed more room for exploration, such as through the interactive intake-condition outcome chart and the Sankey diagram. These elements supported both author-driven messaging and reader-driven discovery, making the overall experience engaging and accessible.


#### What is still missing? What could be improved? Why?
Despite these strengths, several limitations remain. Some of our charts were static, especially the bar and pie charts, which could have been more engaging if built with interactive tools like Plotly or Altair. Additionally, while we conducted extensive exploratory analysis, we did not incorporate predictive modeling. Future versions of this project could include classification models (e.g., predicting adoption likelihood based on intake condition, age, or shelter) to offer more actionable insights for shelter operations. Finally, although we used a manually geocoded list of shelter locations to assess service coverage, the spatial accuracy of some intake data may still be limited by the original address format or missing GPS points. Expanding the dataset or combining it with open civic maps could enhance precision in further iterations.

We lost about 55% of geocoded addresses, which reduced the spatial accuracy. An interactive dashboard combining filters (e.g., by animal type, intake condition) would offer deeper insights. Labeling and overlays on maps could enhance interpretability for first-time viewers.

### 7. Contributions
s242613 Yuling Zhai: Heatmap of stray-animal pickup locations/Neighborhood-level intake counts/Monthly shelter-intake trend

### References
Segel, E., & Heer, J. (2010). Narrative Visualization: Telling Stories with Data. IEEE Transactions on Visualization and Computer Graphics, 16(6), 1139-1148.
