STEP 1 — LOAD CRIME DATA + INITIAL INSPECTION

In [1]:
import pandas as pd

In [None]:
import boto3
import warnings
warnings.filterwarnings('ignore')
bucket = "group-6-chicago-crime-data"
s3 = boto3.client("s3")
CRIME_KEY = "raw/Crimes_2018_to_Present.csv"
crime_df = pd.read_csv(f"s3://{bucket}/{CRIME_KEY}", low_memory=False)
print("Crime data shape:", crime_df.shape)
crime_df.head()


Crime data shape: (1923226, 22)


Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,14030083,JJ490890,11/15/2025 10:54:00 PM,051XX W BLOOMINGDALE AVE,420,BATTERY,AGGRAVATED - KNIFE / CUTTING INSTRUMENT,RESIDENCE,True,True,...,37.0,25.0,04B,1141877.0,1911484.0,2025,11/23/2025 03:42:58 PM,41.913178,-87.754211,"(41.913177726, -87.754210718)"
1,14030104,JJ490911,11/15/2025 10:49:00 PM,048XX W HENDERSON ST,4387,OTHER OFFENSE,VIOLATE ORDER OF PROTECTION,APARTMENT,True,True,...,31.0,15.0,26,1143465.0,1921833.0,2025,11/23/2025 03:42:58 PM,41.941547,-87.748117,"(41.941546843, -87.748117196)"
2,14030070,JJ490877,11/15/2025 10:45:00 PM,009XX W 116TH ST,560,ASSAULT,SIMPLE,RESIDENCE,False,True,...,21.0,53.0,08A,1171942.0,1827824.0,2025,11/23/2025 03:42:58 PM,41.682996,-87.646216,"(41.682995567, -87.646215535)"
3,14030085,JJ490876,11/15/2025 10:43:00 PM,021XX N LONG AVE,520,ASSAULT,AGGRAVATED - KNIFE / CUTTING INSTRUMENT,SCHOOL - PUBLIC GROUNDS,False,False,...,36.0,19.0,04A,1140035.0,1913783.0,2025,11/23/2025 03:42:58 PM,41.91952,-87.760922,"(41.919520368, -87.760921537)"
4,14030065,JJ490870,11/15/2025 10:40:00 PM,005XX N LAWNDALE AVE,2017,NARCOTICS,MANUFACTURE / DELIVER - CRACK,ALLEY,True,False,...,27.0,23.0,18,1151626.0,1903280.0,2025,11/23/2025 03:42:58 PM,41.890479,-87.718611,"(41.890478932, -87.718610734)"
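
The `Date` values shown above are strings such as `11/15/2025 10:54:00 PM`. A minimal sketch of parsing them with an explicit format, assuming every row follows that 12-hour month/day/year layout (the helper name `parse_crime_date` and the constant are illustrative, not part of the notebook):

```python
from datetime import datetime

# Assumed layout, inferred from the sample rows: MM/DD/YYYY hh:mm:ss AM/PM
CRIME_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

def parse_crime_date(raw: str) -> datetime:
    """Parse one crime 'Date' string into a datetime (hypothetical helper)."""
    return datetime.strptime(raw.strip(), CRIME_DATE_FORMAT)

parsed = parse_crime_date("11/15/2025 10:54:00 PM")
print(parsed.isoformat())
```

The same format string can be passed as `format=` to `pd.to_datetime`, which avoids per-row format inference on a frame of this size.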


Parse ProQuest TXT → article-level news_df

In [None]:
import boto3, os

# If not already created earlier:
# s3 = boto3.client("s3")


prefix = "raw/news/"   # adjust if your path is different

# Note: list_objects_v2 returns at most 1,000 keys per call; use a paginator for larger prefixes
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)

contents = resp.get("Contents", [])
txt_keys = [o["Key"] for o in contents if o["Key"].lower().endswith(".txt")]

print("Total S3 objects under", prefix, ":", len(contents))
print("TXT files found:", len(txt_keys))
print("First 5 keys:")
for k in txt_keys[:5]:
    print("  ", k)

# Download each file into news_local/ (skipping files that are already there)
os.makedirs("news_local", exist_ok=True)

for key in txt_keys:
    local_name = os.path.basename(key)
    local_path = os.path.join("news_local", local_name)
    if not os.path.exists(local_path):
        print("Downloading:", key, "->", local_path)
        s3.download_file(bucket, key, local_path)

print("Done. Local TXT files:", len(os.listdir("news_local")))
import os, re, glob
import pandas as pd
from tqdm import tqdm

# -----------------------------------------------------------
# Helper: extract multi-line field safely
# -----------------------------------------------------------
def extract_field(block, field_name):
    """
    Extracts a metadata field like 'Title:' or 'Publication date:'.

    Rule:
    - Capture everything after "FieldName:"
    - Stop when we hit the next field label or the end of block
    """
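    # Anchor the label at the start of a line so "Title" cannot match inside "Publication title"
    pattern = rf"^{re.escape(field_name)}\s*:\s*(.*?)(?=\n[A-Za-z ]+\s*:|\Z)"
    match = re.search(pattern, block, flags=re.DOTALL | re.IGNORECASE | re.MULTILINE)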
    if match:
        return match.group(1).strip()
    return None


# -----------------------------------------------------------
# Final parsing function
# -----------------------------------------------------------
def parse_proquest_file(path):
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        text = f.read()

    # Split articles by line of underscores
    blocks = re.split(r"_+\s*\n", text)

    articles = []
    for block in blocks:
        block = block.strip()
        if len(block) < 50:
            continue

        # Extract EXACT required fields
        title  = extract_field(block, "Title")
        author = extract_field(block, "Author")
        pubdate = extract_field(block, "Publication date")
        doctype = extract_field(block, "Document type")
        subject = extract_field(block, "Subject")
        fulltext = extract_field(block, "Full text")

        # Title fallback (never skip)
        if not title:
            first_line = block.splitlines()[0].strip()
            title = first_line[:150]

        # Full text fallback (never missing)
        if not fulltext:
            # No "Full text:" label found: keep the whole block, metadata included
            lines = block.splitlines()
            fulltext = "\n".join(lines).strip()

        articles.append({
            "Title": title,
            "Author": author,
            "PublicationDateRaw": pubdate,
            "DocumentType": doctype,
            "Subject": subject,
            "FullText": fulltext,
            "RawBlock": block
        })

    return articles


# -----------------------------------------------------------
# Parse ALL files in news_local/
# -----------------------------------------------------------
all_articles = []

txt_files = glob.glob("news_local/*.txt")
print("Local TXT files found:", len(txt_files))

for file in txt_files:
    parsed = parse_proquest_file(file)
    print(f"Parsed {len(parsed):4d} articles from {os.path.basename(file)}")
    all_articles.extend(parsed)

news_df = pd.DataFrame(all_articles)
print("\nNEWS raw parsed shape:", news_df.shape)

news_df.head()

 

Local TXT files found: 31
Parsed  501 articles from ProQuestDocuments-2025-11-25 (22).txt
Parsed  539 articles from ProQuestDocuments-2025-11-25 (18).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (14).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (15).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (19).txt
Parsed  502 articles from ProQuestDocuments-2025-11-25 (23).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (28).txt
Parsed  508 articles from ProQuestDocuments-2025-11-25 (12).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (32).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (24).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (7).txt
Parsed  514 articles from ProQuestDocuments-2025-11-25 (6).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (25).txt
Parsed    6 articles from ProQuestDocuments-2025-11-25 (33).txt
Parsed  501 articles from ProQuestDocuments-2025-11-25 (13).txt
Parsed  501 arti

Unnamed: 0,Title,Author,PublicationDateRaw,DocumentType,Subject,FullText,RawBlock
0,"1 dead, 2 hurt after stolen vehicle hits house...","Fry, Paige","Dec 15, 2019",News,Fires; Fatalities; Automobile theft,A teenage boy was killed and two other teenage...,"1 dead, 2 hurt after stolen vehicle hits house..."
1,Democrats' impeachment of Trump is way too thin,"Kass, John","Dec 15, 2019",News,Extortion; Impeachment; Scandals; Candidates,Where did the Trump impeachment go?\nWhen Hous...,Democrats' impeachment of Trump is way too thi...
2,"Nearly 1,000 volunteers help hang lights down ...","Williams-Harris, Deanese","Dec 14, 2019",News,Community; Trauma; Holiday decorations,"Community activists from My Block, My Hood, My...","Nearly 1,000 volunteers help hang lights down ..."
3,"1 teen dead, 2 injured after stolen vehicle cr...","Fry, Paige","Dec 14, 2019",News,Fatalities; Automobile theft,A teenage boy was killed and two other teenage...,"1 teen dead, 2 injured after stolen vehicle cr..."
4,VOICE OF THE PEOPLE,,"Dec 14, 2019",News,Agreements; Arbitration; Judaism; Long term he...,"Judaism is not a nationality\nAs a Jew, I am h...",VOICE OF THE PEOPLE\n\nhttps://www.proquest.co...
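
To make the field-extraction rule concrete, here is a small sketch that runs a variant of `extract_field` on a made-up block. The variant anchors the label at the start of a line so that `Title` cannot match inside `Publication title`. The sample text is invented for illustration; real ProQuest exports may order or name fields differently.

```python
import re

def extract_field(block, field_name):
    """Return the text after 'field_name:' up to the next 'Label:' line."""
    pattern = rf"^{re.escape(field_name)}\s*:\s*(.*?)(?=\n[A-Za-z ]+\s*:|\Z)"
    match = re.search(pattern, block, flags=re.DOTALL | re.IGNORECASE | re.MULTILINE)
    return match.group(1).strip() if match else None

# Invented sample block, not taken from the real export files.
sample = (
    "Example headline\n\n"
    "Author: Doe, Jane\n"
    "Full text: First paragraph.\nSecond paragraph.\n"
    "Subject: Crime; Police\n"
    "Title: Example headline\n"
    "Publication title: Example Tribune\n"
    "Publication date: Dec 15, 2019\n"
    "Document type: News\n"
)

fields = {name: extract_field(sample, name)
          for name in ["Title", "Author", "Publication date", "Full text"]}
print(fields)
```

Multi-line values such as `Full text` are captured whole because the lazy match only stops at a line that looks like a new `Label:`. The flip side is that a body line which itself ends a short phrase with a colon would cut the field short.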


In [4]:
!pip install faiss-cpu --no-build-isolation --no-cache-dir




In [5]:
!pip install -q pandas numpy boto3 sentence-transformers faiss-cpu plotly matplotlib seaborn tqdm

In [6]:
!pip install -q google-generativeai

In [7]:
import os, gc, json, re, glob, warnings
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
from sentence_transformers import SentenceTransformer
import faiss
from datetime import datetime, timedelta

warnings.filterwarnings('ignore')


  from .autonotebook import tqdm as notebook_tqdm


In [8]:
CACHE_DIR = "cache"
os.makedirs(CACHE_DIR, exist_ok=True)

CRIME_CHUNKS_FILE = f"{CACHE_DIR}/crime_chunks.jsonl"
NEWS_CHUNKS_FILE  = f"{CACHE_DIR}/news_chunks.jsonl"
LINKS_FILE        = f"{CACHE_DIR}/crime_news_links.jsonl"

In [9]:
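def json_safe(x):
    """Convert NumPy/pandas scalars to JSON-serializable Python values."""
    if x is None: return None
    # bool is a subclass of int, so check it first to keep True/False intact
    if isinstance(x, (bool, np.bool_)): return bool(x)
    if isinstance(x, (int, np.integer)): return int(x)
    if isinstance(x, (float, np.floating)): return float(x)
    if isinstance(x, pd.Timestamp): return x.isoformat()
    return str(x)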

def write_jsonl(path, obj):
    obj = {k: json_safe(v) if not isinstance(v, dict)
           else {a: json_safe(b) for a,b in v.items()}
           for k,v in obj.items()}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(obj) + "\n")
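
The JSONL cache files are written one record per line and can be read back the same way. A minimal round-trip sketch using only the standard library; `append_jsonl`, `iter_jsonl`, and the file name are illustrative names, not the notebook's own helpers:

```python
import json
import os
import tempfile

def append_jsonl(path, record):
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def iter_jsonl(path):
    """Yield records one at a time without loading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_chunks.jsonl")  # hypothetical file name
    append_jsonl(path, {"text": "Crime Report Summary", "metadata": {"count": 5}})
    append_jsonl(path, {"text": "Title: Example", "metadata": {"count": 1}})
    records = list(iter_jsonl(path))

print(len(records), records[0]["metadata"]["count"])
```

Because each record sits on its own line, a reader can stream the file in batches, which is what keeps the later embedding step memory-safe.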

In [10]:
class NarrativeIntelligenceSystem:

    def __init__(self, crime_df, news_df, embedding_model="all-MiniLM-L6-v2"):
        print("Initializing system...")
        self.crime_df = crime_df.copy()
        self.news_df  = news_df.copy()

        self.model = SentenceTransformer(embedding_model)

        # Pre-clean
        self.crime_df["Date"] = pd.to_datetime(self.crime_df["Date"])
        self.news_df["PublicationDate"] = pd.to_datetime(
            self.news_df["PublicationDateRaw"], errors="coerce"
        )

        print("Data ready.")

    # ------------------------------------------------------------
    # CREATE CRIME CHUNKS
    # ------------------------------------------------------------
    def create_crime_chunks(self, min_crimes=5):
        print(" Creating crime chunks → JSONL")

        if os.path.exists(CRIME_CHUNKS_FILE):
            os.remove(CRIME_CHUNKS_FILE)

        g = self.crime_df.groupby(["Primary Type", "Block", self.crime_df["Date"].dt.date])

        for (ptype, block, day), grp in tqdm(g, total=len(g)):
            if len(grp) < min_crimes:
                continue

            text = f"""
Crime Report Summary
Type: {ptype}
Block: {block}
Date: {day}
Incidents: {len(grp)}
Arrests: {grp['Arrest'].sum()}
Latitude: {grp['Latitude'].mean()}
Longitude: {grp['Longitude'].mean()}
"""
            metadata = {
                "primary_type": ptype,
                "block": block,
                "date": str(day),
                "count": len(grp)
            }

            write_jsonl(CRIME_CHUNKS_FILE, {"text": text, "metadata": metadata})

        print("Crime chunks saved.")

    # ------------------------------------------------------------
    # CREATE NEWS CHUNKS
    # ------------------------------------------------------------
    def create_news_chunks(self, max_len=1500):
        print(" Creating news chunks → JSONL")

        if os.path.exists(NEWS_CHUNKS_FILE):
            os.remove(NEWS_CHUNKS_FILE)

        for _, row in tqdm(self.news_df.iterrows(), total=len(self.news_df)):
            text = f"Title: {row['Title']}\n\n{row['FullText']}"
            metadata = {
                "title": str(row['Title']),
                "date": str(row["PublicationDate"]),
                "subject": str(row.get("Subject"))
            }

            write_jsonl(NEWS_CHUNKS_FILE, {
                "text": text[:max_len],
                "metadata": metadata
            })

        print("News chunks saved.")

    # ------------------------------------------------------------
    # DETERMINISTIC LINKING ENGINE
    # ------------------------------------------------------------
    def build_crime_news_links(self, days_window=3):
        print("Building deterministic crime ↔ news links")

        if os.path.exists(LINKS_FILE):
            os.remove(LINKS_FILE)

        for _, crime in tqdm(self.crime_df.iterrows(), total=len(self.crime_df)):
            date = crime["Date"]
            ptype = crime["Primary Type"]
            block = crime["Block"]

            window = self.news_df[
                (self.news_df["PublicationDate"] >= date - pd.Timedelta(days=days_window)) &
                (self.news_df["PublicationDate"] <= date + pd.Timedelta(days=days_window))
            ]

            # Street-name match: Block looks like "051XX W BLOOMINGDALE AVE", so the
            # street name is the third token (the second is only the direction letter)
            tokens = block.split()
            street = tokens[2] if len(tokens) > 2 else block
            candidates = window[window["FullText"].str.contains(street, case=False, na=False, regex=False)]

            # Primary-type keyword match on the first word of the crime type
            candidates = candidates[candidates["FullText"].str.contains(ptype.split()[0], case=False, na=False, regex=False)]

            for _, news in candidates.iterrows():
                write_jsonl(LINKS_FILE, {
                    "crime_type": ptype,
                    "crime_block": block,
                    "crime_date": str(date),
                    "news_title": news["Title"],
                    "news_date": str(news["PublicationDate"])
                })

        print("Deterministic links created.")

    # ------------------------------------------------------------
    # SEMANTIC VECTOR INDEX (memory-safe)
    # ------------------------------------------------------------
    def build_vector_index(self, jsonl_file, output_index, batch_size=128):
        print(f"Building vector index: {output_index}")

        index = None
        texts = []

        # Streaming read
        with open(jsonl_file, "r", encoding="utf-8") as f:
            for line in tqdm(f):
                obj = json.loads(line)
                texts.append(obj["text"])

                if len(texts) >= batch_size:
                    emb = self.model.encode(texts, convert_to_numpy=True, batch_size=16)
                    if index is None:
                        index = faiss.IndexFlatL2(emb.shape[1])
                    index.add(emb.astype("float32"))
                    texts = []
                    gc.collect()

        # Remaining
        if texts:
            emb = self.model.encode(texts, convert_to_numpy=True, batch_size=16)
            if index is None:
                index = faiss.IndexFlatL2(emb.shape[1])
            index.add(emb.astype("float32"))

        faiss.write_index(index, output_index)
        print("Index saved:", output_index)

    # ------------------------------------------------------------
    # HYBRID RAG QUERY
    # ------------------------------------------------------------
    def rag_query(self, user_query, top_k=5):
        print(" Running Hybrid RAG Query...")

        q_emb = self.model.encode([user_query], convert_to_numpy=True)

        # Load indices
        crime_index = faiss.read_index(f"{CACHE_DIR}/crime_index.faiss")
        news_index  = faiss.read_index(f"{CACHE_DIR}/news_index.faiss")

        d_c, i_c = crime_index.search(q_emb, top_k)
        d_n, i_n = news_index.search(q_emb, top_k)

        # Load chunks
        def read_chunk(file, idx):
            with open(file, "r", encoding="utf-8") as f:
                for line_num, line in enumerate(f):
                    if line_num == idx:
                        return json.loads(line)

        crime_chunks = [read_chunk(CRIME_CHUNKS_FILE, i) for i in i_c[0]]
        news_chunks  = [read_chunk(NEWS_CHUNKS_FILE, i) for i in i_n[0]]

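        # Load deterministic links. Note: these are not filtered by the query;
        # the first 20 links in the file are returned for every question.
        links = []
        with open(LINKS_FILE, "r", encoding="utf-8") as f:
            for line in f:
                links.append(json.loads(line))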

        return {
            "crime": crime_chunks,
            "news": news_chunks,
            "links": links[:20]
        }
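
`read_chunk` reopens and rescans the JSONL file once per retrieved index. A sketch of fetching all requested lines in a single pass follows; `read_chunks` is a hypothetical helper, not part of the class above. It also maps negative indices to `None`, since FAISS pads results with `-1` when an index holds fewer than `top_k` vectors.

```python
import json
import os
import tempfile

def read_chunks(path, indices):
    """Return the JSONL records at the given 0-based line numbers, in the
    order requested, scanning the file once. Negative or out-of-range
    indices map to None."""
    wanted = {int(i) for i in indices if int(i) >= 0}
    found = {}
    with open(path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f):
            if line_num in wanted:
                found[line_num] = json.loads(line)
                if len(found) == len(wanted):
                    break  # everything requested has been read
    return [found.get(int(i)) for i in indices]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_chunks.jsonl")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        for n in range(4):
            f.write(json.dumps({"text": f"chunk {n}"}) + "\n")
    result = read_chunks(path, [2, 0, -1, 9])

print(result)
```

Callers still need to handle the `None` entries before reading `["text"]` from each chunk.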

In [11]:

import google.generativeai as genai

class NarrativeChatbot:

    def __init__(self, api_key, system):
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel("models/gemini-pro-latest")
        self.system = system

    def ask(self, query):
        rag = self.system.rag_query(query)

        context = "\n\n".join([
            "=== CRIME DATA ===\n" + "\n".join(c["text"] for c in rag["crime"]),
            "=== NEWS DATA ===\n" + "\n".join(n["text"] for n in rag["news"]),
            "=== LINKS FOUND ===\n" + "\n".join(json.dumps(l) for l in rag["links"])
        ])

        prompt = f"""
You are a Chicago crime-narrative analyst.

User question:
{query}

Use ALL crime + news + cross-links below to generate a unified narrative:

{context}

Return a structured, factual narrative insight combining both datasets.
"""

        return self.model.generate_content(prompt).text
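
`ask` reads `c["text"]` from every retrieved chunk, so a `None` entry (which `read_chunk` returns when no line matches the index) would raise a `TypeError`. A sketch of assembling the same three-part context while skipping empty entries; `build_context` is an illustrative helper and the sample data is invented:

```python
import json

def build_context(rag):
    """Join retrieved crime text, news text, and links into one prompt
    context, skipping any None entries."""
    crime = [c["text"] for c in rag.get("crime", []) if c]
    news = [n["text"] for n in rag.get("news", []) if n]
    links = [json.dumps(link) for link in rag.get("links", [])]
    return "\n\n".join([
        "=== CRIME DATA ===\n" + "\n".join(crime),
        "=== NEWS DATA ===\n" + "\n".join(news),
        "=== LINKS FOUND ===\n" + "\n".join(links),
    ])

# Invented retrieval result with one missing crime chunk.
demo_rag = {
    "crime": [{"text": "Type: THEFT"}, None],
    "news": [{"text": "Title: Example"}],
    "links": [{"crime_type": "THEFT", "news_title": "Example"}],
}
context = build_context(demo_rag)
print(context)
```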




In [13]:
system = NarrativeIntelligenceSystem(crime_df, news_df)

system.create_crime_chunks()
system.create_news_chunks()
system.build_crime_news_links(days_window=3)

system.build_vector_index(CRIME_CHUNKS_FILE, f"{CACHE_DIR}/crime_index.faiss")
system.build_vector_index(NEWS_CHUNKS_FILE, f"{CACHE_DIR}/news_index.faiss")

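# Read the API key from the environment instead of hard-coding it in the notebook
# (GEMINI_API_KEY is an assumed variable name; set it before running this cell)
chat = NarrativeChatbot(os.environ["GEMINI_API_KEY"], system)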

chat.ask("What happened near Lincoln Ave related to robbery this week?")


Initializing system...
Data ready.
 Creating crime chunks → JSONL


100%|██████████| 1870768/1870768 [00:21<00:00, 86614.44it/s] 


Crime chunks saved.
 Creating news chunks → JSONL


100%|██████████| 15299/15299 [00:01<00:00, 13007.98it/s]


News chunks saved.
Building deterministic crime ↔ news links


100%|██████████| 1923226/1923226 [48:21<00:00, 662.92it/s]  


Deterministic links created.
Building vector index: cache/crime_index.faiss


507it [00:04, 122.36it/s]


Index saved: cache/crime_index.faiss
Building vector index: cache/news_index.faiss


15299it [03:06, 81.99it/s]


Index saved: cache/news_index.faiss
 Running Hybrid RAG Query...


'Based on an analysis of the provided crime data, news reports, and cross-links, here is a unified narrative regarding robbery-related events near Lincoln Avenue this week.\n\n### Narrative Insight\n\nWhile official crime data for Lincoln Avenue this week specifies **theft**, not robbery, several violent **robberies** have been reported in adjacent North Side neighborhoods, including Lincoln Park, which Lincoln Avenue passes through. Analysis suggests a distinction between property crime on the avenue itself and more violent, person-to-person crime in the surrounding area.\n\n### Detailed Breakdown\n\n*   **Thefts on Lincoln Avenue:** Crime data from April 21, 2025, shows a cluster of five **theft** incidents on the **6100 block of North Lincoln Avenue**. One arrest was made in connection with these events. It is important to note these are classified as theft, which typically does not involve force or the threat of force against a victim.\n\n*   **Violent Robberies in Lincoln Park:** 
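
The linking pass above filters the whole news frame once per crime row and took about 48 minutes for 1,923,226 rows. One possible way to cut that cost is to bucket articles by publication date first, so each crime only inspects articles inside its window. The sketch below uses toy data, compares at day granularity rather than with timestamps, and takes the street name from the third token of the block; all function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def street_name(block):
    """'051XX W BLOOMINGDALE AVE' -> 'BLOOMINGDALE' (third token);
    falls back to the whole string for shorter blocks."""
    tokens = block.split()
    return tokens[2] if len(tokens) > 2 else block

def index_news_by_date(articles):
    """Group articles (dicts with 'date', 'title', 'text') by publication date."""
    buckets = defaultdict(list)
    for art in articles:
        buckets[art["date"]].append(art)
    return buckets

def link_crime(crime, buckets, days_window=3):
    """Return titles of articles within +/- days_window days of the crime whose
    text mentions both the street name and the first word of the crime type."""
    street = street_name(crime["block"]).lower()
    keyword = crime["type"].split()[0].lower()
    titles = []
    for offset in range(-days_window, days_window + 1):
        for art in buckets.get(crime["date"] + timedelta(days=offset), []):
            text = art["text"].lower()
            if street in text and keyword in text:
                titles.append(art["title"])
    return titles

# Toy data, invented for illustration only.
articles = [
    {"date": date(2025, 4, 22), "title": "A", "text": "Theft reported on Lincoln Avenue."},
    {"date": date(2025, 4, 22), "title": "B", "text": "Council votes on budget."},
    {"date": date(2025, 5, 30), "title": "C", "text": "Theft on Lincoln Avenue again."},
]
buckets = index_news_by_date(articles)
crime = {"type": "THEFT", "block": "061XX N LINCOLN AVE", "date": date(2025, 4, 21)}
links = link_crime(crime, buckets)
print(links)
```

Article C mentions the same street and crime type but falls outside the three-day window, so only article A is linked.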

In [14]:
# API key is read from the environment (GEMINI_API_KEY is an assumed variable name)
chat = NarrativeChatbot(os.environ["GEMINI_API_KEY"], system)
from IPython.display import Markdown, display
display(Markdown((chat.ask("Did any news articles cover the criminal damage incidents near Englewood?"))))

 Running Hybrid RAG Query...


Based on an analysis of the provided crime reports, news articles, and cross-links, there is no evidence that the specified criminal damage incidents were covered in the news.

### Narrative Insight

The available data presents two separate, unconnected narratives regarding crime in Chicago.

**1. Unreported Criminal Damage Incidents:**
Crime reports detail several incidents of "CRIMINAL DAMAGE" across the city between 2020 and 2025. Of the locations provided, only one falls within the Englewood neighborhood:
*   **070XX S LOWE AVE** (Englewood)
*   **070XX S BENNETT AVE** (South Shore)
*   **091XX S LOWE AVE** (Washington Heights)
*   **047XX S GREENWOOD AVE** (Kenwood)
*   **027XX W LELAND AVE** (Lincoln Square)

Despite multiple incidents at each location, none of these specific events are mentioned in the provided news articles or connected to them via the cross-links.

**2. News Coverage Focused on Violent Crime in Englewood:**
The supplied news articles exclusively cover severe violent crime—specifically shootings—in the Englewood and West Englewood neighborhoods. These reports detail multiple fatalities and injuries, police responses, and community reactions, including one instance where a police shooting in Englewood led to looting and property damage downtown on the Magnificent Mile. However, this coverage does not extend to the specific, smaller-scale criminal damage incidents listed in the crime data.

**Conclusion:**
The provided data indicates a clear separation between the types of crime that receive media attention and those that do not. While Englewood is the focus of intense news coverage for shootings and gun violence, the data shows no news reporting on the listed incidents of criminal damage in or near the neighborhood. The cross-links further confirm this disconnect, as they link other crimes to other news stories entirely, but do not establish any connection between the supplied criminal damage reports and the news articles on Englewood.

In [15]:
display(Markdown((chat.ask("What crimes with arrests were reported near the N Lincoln Ave corridor?"))))

 Running Hybrid RAG Query...


Based on an analysis of the provided crime data, news reports, and cross-links, here is a unified narrative regarding crimes with arrests near the N. Lincoln Avenue corridor.

### **Analysis of Crime on the N. Lincoln Avenue Corridor**

The provided data points to two distinct criminal events with arrests occurring directly on North Lincoln Avenue.

1.  **Violent Crime and SWAT Response in North Center:** A news report details a significant incident in the **4100 block of N. Lincoln Avenue**. On a Monday morning, a 74-year-old man was shot and critically wounded in the adjacent 2000 block of West Berteau Avenue. The subsequent police investigation led to a SWAT team establishing a perimeter at the corner of Berteau and Lincoln Avenues. According to Ald. Matt Martin (47th), **one arrest was made** in connection with the shooting, while a search for a second individual continues.

2.  **Theft Incidents in West Ridge:** Further north, official crime data reports a series of five theft incidents on **April 21, 2025**, in the **6100 block of N. Lincoln Avenue**. This cluster of activity resulted in **one arrest**.

### **Related Incidents in Adjacent Neighborhoods**

Cross-links in the data suggest connections between crime reports in neighborhoods intersected by or near the Lincoln Avenue corridor and various news reports. However, these links do not confirm that the news articles are reporting on these specific crime instances or that arrests occurred in every case. Relevant incidents from these links include:

*   **Criminal Sexual Assault:** A report was filed for an incident on the **3300 block of N. Damen Avenue** in North Center, a street in close proximity to Lincoln Avenue.
*   **Theft:** Reports were also filed on the **2200 block of N. Racine Avenue** in Lincoln Park and the **3400 block of N. Clark Street** in Lakeview, both neighborhoods through which the Lincoln corridor runs.

### **Geographically Unrelated Incidents with Arrests**

To provide a complete picture of the supplied dataset, it is important to note that it also contains multiple crime reports with arrests that are geographically distant from the N. Lincoln Avenue corridor. These incidents are not part of a pattern related to the corridor itself.

*   **West Side Thefts:** A significant number of arrests were made in connection with theft on West North Avenue. Reports from the **4600 and 4700 blocks of W. North Avenue** show a total of 15 theft incidents across three separate dates, leading to a combined **13 arrests**.
*   **South Side Offenses:** In Englewood, a crime report from the **5800 block of S. Ada Street** documents 13 incidents classified as "Other Offense," resulting in **6 arrests**.
*   **Suburban Crime:** News reports detail numerous arrests made by police in the suburban communities of **Evanston** and **Park Ridge** for offenses including robbery, battery, drug possession, and DUI. These are separate jurisdictions and events from the Chicago-based data.

In [16]:
display(Markdown((chat.ask("How effective has police response been in Englewood according to arrest patterns and news reporting?"))))

 Running Hybrid RAG Query...


As a Chicago crime-narrative analyst, here is a structured insight into the effectiveness of police response in Englewood based on the provided data.

### **Narrative Insight: Police Effectiveness in Englewood**

Based on the supplied data, police response in Englewood is characterized by a complex mix of high-stakes engagement, strategic and equipment-related deficiencies, and inconsistent enforcement outcomes. While police are actively involved in volatile and dangerous situations within the neighborhood, the data points to systemic challenges that undermine overall effectiveness, particularly concerning accountability and follow-through on certain crime types.

### **Analysis by Data Source**

**1. Arrest Patterns & Enforcement Disparities**

The crime data provided reveals a stark contrast in police effectiveness based on crime type and location, suggesting a targeted but narrow enforcement strategy.

*   **Ineffectiveness in Property Crime:** For the crime of **Criminal Damage**, the data shows a 0% arrest rate across four separate locations and a total of 23 reported incidents. Two of these locations (070XX S Lowe Ave and 091XX S Lowe Ave) are within the Englewood/West Englewood area. This pattern indicates a systemic failure to resolve property crime cases, which can erode community trust and create an environment of impunity for lower-level offenses.
*   **High Effectiveness in Targeted Operations:** In sharp contrast, a narcotics operation at 034XX W Chicago Ave (Humboldt Park, not Englewood) resulted in a 100% arrest rate, with 17 arrests for 17 incidents. While occurring outside the neighborhood in question, this demonstrates that when a specific crime type is targeted, the department can be highly effective. The disparity suggests that resources and strategic priorities are focused away from crimes like criminal damage.

**2. News Reporting: A Focus on Crisis and Systemic Failures**

News reports center on Englewood as a flashpoint for significant and often controversial police incidents, highlighting critical operational failures.

*   **Equipment and Accountability Deficiencies:** Multiple articles detail the shooting of Latrell Allen in Englewood by a police officer who was not equipped with a body-worn camera. This officer was part of a newly formed "hot spot" unit deployed by Superintendent David Brown. Mayor Lori Lightfoot publicly acknowledged this as a systemic problem, citing contract issues and departmental reorganization. The absence of video evidence in such a critical incident erodes public trust and complicates the official narrative, directly contributing to civil unrest.
*   **Volatile Environment and Violence Against Officers:** Police work in Englewood is portrayed as exceptionally dangerous. One report details two officers being shot in the neighborhood in less than a week during separate encounters, one of which was a traffic stop. This underscores the high-risk nature of police engagement in the area.
*   **Inconsistent Strategic Command:** The response to public unrest shows further cracks. A report from the city's Inspector General highlights how a protest originating in Englewood following the death of George Floyd was a precursor to city-wide chaos that the Chicago Police Department was unprepared for and failed to control. This points to a failure in high-level strategic planning. Conversely, a report from a Fourth of July weekend notes that Englewood was "quiet" during a high-visibility patrol that included a ride-along with Mayor Lightfoot, suggesting that a concentrated police presence can have a temporary, localized deterrent effect.

**3. Cross-Link Analysis**

An analysis of the provided cross-links reveals that they are **not relevant** to the specific events, locations, or news reports concerning Englewood. The linked crimes and articles pertain to different neighborhoods (e.g., Portage Park, Lincoln Park), different topics (federal immigration agents, a 2004 cold case), and do not connect to the provided Englewood-centric data. Therefore, they offer no insight into police effectiveness within the Englewood neighborhood itself.

### **Unified Narrative**

The combined data paints a portrait of a police department that is heavily engaged in Englewood but struggles with effectiveness on multiple fronts. The narrative is not one of neglect, but of deeply flawed and inconsistent execution.

Police presence in Englewood is defined by high-stakes, violent encounters—both initiated by and directed against officers. The creation of "hot spot" units shows a strategic intent to address crime, yet this strategy is immediately undermined by fundamental failures, such as deploying these units without essential accountability tools like body cameras. This single failure fueled significant public backlash.

While officers face clear and present danger, the department's broader effectiveness is questionable. The complete absence of arrests for repeated instances of criminal damage in and around Englewood suggests that such crimes are not a priority, leaving residents to deal with neighborhood decay. This contrasts sharply with the department's demonstrated ability to conduct successful, high-arrest operations against narcotics elsewhere.

Ultimately, police response in Englewood appears paradoxical: it is simultaneously intense and insufficient. Officers are on the front lines of violent confrontations, but the strategic and logistical support behind them is flawed, leading to failures in accountability, inconsistent crime resolution, and an inability to manage large-scale civil unrest originating within the neighborhood.

## RAG-LLM EVALUATION METRICS
### Comprehensive evaluation of retrieval quality and generation performance
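
Two of the retrieval metrics in this section are simple set-based ratios and counts. The sketch below shows the diversity ratio (unique values over total retrieved) on made-up values. It also contrasts the number of distinct dates with the span in days between the earliest and latest date, which is a different quantity. `date_span_days` is an illustrative addition, not one of the evaluator's metrics.

```python
from datetime import date

def diversity_ratio(items):
    """Unique items divided by total items; 0 for an empty list."""
    return len(set(items)) / len(items) if items else 0

def date_span_days(iso_dates):
    """Days between the earliest and latest ISO date (0 if fewer than two)."""
    parsed = sorted(date.fromisoformat(d) for d in set(iso_dates))
    return (parsed[-1] - parsed[0]).days if len(parsed) > 1 else 0

# Invented values standing in for fields parsed from retrieved crime chunks.
crime_types = ["THEFT", "THEFT", "BATTERY", "NARCOTICS", "THEFT"]
crime_dates = ["2025-04-21", "2025-04-21", "2025-04-28"]

print(diversity_ratio(crime_types))                         # 3 unique of 5
print(len(set(crime_dates)), date_span_days(crime_dates))   # distinct dates vs. span
```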

In [17]:
import numpy as np
import re
import json
from typing import Dict, List
from collections import Counter


class RAGEvaluator:
    """Comprehensive RAG-LLM evaluation metrics for NarrativeIntelligenceSystem."""
    
    def __init__(self, system, chat):
        """
        Args:
            system: NarrativeIntelligenceSystem instance
            chat: NarrativeChatbot instance
        """
        self.system = system
        self.chat = chat
        
    # ========== RETRIEVAL METRICS ==========
    
    def evaluate_retrieval_quality(self, query: str, top_k=5) -> Dict:
        """Evaluate quality of retrieved documents."""
        rag_results = self.system.rag_query(query, top_k=top_k)
        
        metrics = {
            'crime_docs_retrieved': len(rag_results['crime']),
            'news_docs_retrieved': len(rag_results['news']),
            'links_found': len(rag_results['links']),
            'total_retrieved': len(rag_results['crime']) + len(rag_results['news'])
        }
        
        return metrics
    
    def evaluate_retrieval_diversity(self, query: str, top_k=5) -> Dict:
        """Measure diversity of retrieved documents."""
        rag_results = self.system.rag_query(query, top_k=top_k)
        
        # Extract crime types from retrieved crime chunks
        crime_types = []
        for crime_doc in rag_results['crime']:
            if crime_doc and 'text' in crime_doc:
                # Parse crime type from text
                match = re.search(r'Type:\s*(.+)', crime_doc['text'])
                if match:
                    crime_types.append(match.group(1).strip())
        
        # Extract news titles
        news_titles = []
        for news_doc in rag_results['news']:
            if news_doc and 'metadata' in news_doc:
                news_titles.append(news_doc['metadata'].get('title', 'Unknown'))
        
        metrics = {
            'unique_crime_types': len(set(crime_types)),
            'total_crime_docs': len(crime_types),
            'crime_diversity_ratio': len(set(crime_types)) / len(crime_types) if crime_types else 0,
            'unique_news_articles': len(set(news_titles)),
            'total_news_docs': len(news_titles),
            'news_diversity_ratio': len(set(news_titles)) / len(news_titles) if news_titles else 0
        }
        
        return metrics
    
    def evaluate_retrieval_coverage(self, query: str, top_k=5) -> Dict:
        """Evaluate temporal and spatial coverage."""
        rag_results = self.system.rag_query(query, top_k=top_k)
        
        # Extract dates and locations from crime data
        crime_dates = []
        crime_blocks = []
        
        for crime_doc in rag_results['crime']:
            if crime_doc and 'text' in crime_doc:
                # Parse date
                date_match = re.search(r'Date:\s*(.+)', crime_doc['text'])
                if date_match:
                    crime_dates.append(date_match.group(1).strip())
                
                # Parse block
                block_match = re.search(r'Block:\s*(.+)', crime_doc['text'])
                if block_match:
                    crime_blocks.append(block_match.group(1).strip())
        
        # Extract news dates (skip articles without a date so a placeholder is not counted as a date)
        news_dates = []
        for news_doc in rag_results['news']:
            if news_doc and 'metadata' in news_doc:
                news_date = news_doc['metadata'].get('date')
                if news_date:
                    news_dates.append(news_date)
        
        metrics = {
            'unique_crime_dates': len(set(crime_dates)),
            'unique_crime_locations': len(set(crime_blocks)),
            'unique_news_dates': len(set(news_dates)),
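            # Both coverage figures are counts of distinct strings, so they repeat the counts above;
            # the 'days' label holds only if each parsed Date string has no time component
            'temporal_coverage_days': len(set(crime_dates)),
            'spatial_coverage_blocks': len(set(crime_blocks)),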
            'deterministic_links': len(rag_results['links'])
        }
        
        return metrics
    
    # ========== GENERATION METRICS ==========
    
    def evaluate_answer_quality(self, query: str, answer: str, rag_results: Dict) -> Dict:
        """Evaluate quality of generated answer."""
        
        # 1. Answer length and structure
        word_count = len(answer.split())
        sentence_count = len([s for s in answer.split('.') if s.strip()])
        avg_sentence_length = word_count / sentence_count if sentence_count > 0 else 0
        
        # 2. Information density - check for numbers/statistics
        numbers = re.findall(r'\b\d+(?:\.\d+)?%?\b', answer)
        stats_density = len(numbers) / word_count if word_count > 0 else 0
        
        # 3. Source citation - count distinct citation phrases present in the answer
        answer_lower = answer.lower()
        citation_keywords = ['according to', 'based on', 'data shows', 'reported', 'article', 
                           'news', 'crime data', 'records indicate', 'analysis shows']
        citations = sum(1 for keyword in citation_keywords if keyword in answer_lower)
        
        # 4. Specificity - mentions of locations and dates
        # Note: the date pattern counts any standalone 4-digit number (e.g. a block number) as a year
        locations = len(re.findall(r'\b(?:block|avenue|street|ave|st|district|neighborhood)\b', answer_lower))
        dates = len(re.findall(r'\b\d{4}\b|\b(?:january|february|march|april|may|june|july|august|september|october|november|december)\b', answer_lower))
        
        # 5. Structure indicators
        headers = len(re.findall(r'(?:^|\n)#+\s+|(?:^|\n)\*\*[^*]+\*\*', answer))
        lists = len(re.findall(r'(?:^|\n)\s*[-*•]\s+', answer))
        
        metrics = {
            'word_count': word_count,
            'sentence_count': sentence_count,
            'avg_sentence_length': round(avg_sentence_length, 2),
            'statistics_mentioned': len(numbers),
            'stats_density': round(stats_density, 4),
            'citation_indicators': citations,
            'location_mentions': locations,
            'temporal_mentions': dates,
            'has_structure': headers + lists > 0,
            'specificity_score': round((locations + dates + len(numbers)) / word_count, 4) if word_count > 0 else 0
        }
        
        return metrics
    
    def evaluate_context_utilization(self, query: str, answer: str, rag_results: Dict) -> Dict:
        """Measure how well the answer uses retrieved context."""
        
        answer_lower = answer.lower()
        
        # Extract key terms from retrieved crime documents
        crime_terms_in_context = []
        for crime_doc in rag_results['crime']:
            if crime_doc and 'text' in crime_doc:
                # Extract crime type
                type_match = re.search(r'Type:\s*(.+)', crime_doc['text'])
                if type_match:
                    crime_terms_in_context.append(type_match.group(1).strip().lower())
                
                # Extract block
                block_match = re.search(r'Block:\s*(.+)', crime_doc['text'])
                if block_match:
                    crime_terms_in_context.append(block_match.group(1).strip().lower())
        
        # Extract key terms from news
        news_terms_in_context = []
        for news_doc in rag_results['news']:
            if news_doc and 'metadata' in news_doc:
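                # Guard against a missing or None title before lowercasing
                title = (news_doc['metadata'].get('title') or '').lower()
                # Extract meaningful words from title (>4 chars), dropping attached punctuation
                words = [w for w in re.findall(r"[a-z0-9']+", title) if len(w) > 4]
                news_terms_in_context.extend(words)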
        
        # Check how many context terms appear in answer
        crime_terms_used = sum(1 for term in crime_terms_in_context if term and term in answer_lower)
        news_terms_used = sum(1 for term in news_terms_in_context if term and term in answer_lower)
        
        total_context_terms = len(crime_terms_in_context) + len(news_terms_in_context)
        total_used = crime_terms_used + news_terms_used
        
        # Check if links were mentioned
        links_mentioned = 'link' in answer_lower or 'connection' in answer_lower or 'related' in answer_lower
        
        metrics = {
            'crime_context_terms_available': len(crime_terms_in_context),
            'crime_context_terms_used': crime_terms_used,
            'news_context_terms_available': len(news_terms_in_context),
            'news_context_terms_used': news_terms_used,
            'context_utilization_rate': round(total_used / total_context_terms, 4) if total_context_terms > 0 else 0,
            'crime_sources_available': len(rag_results['crime']),
            'news_sources_available': len(rag_results['news']),
            'links_available': len(rag_results['links']),
            'links_mentioned_in_answer': links_mentioned
        }
        
        return metrics
    
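    def evaluate_faithfulness(self, answer: str, rag_results: Dict) -> Dict:
        """Estimate lexical support for the answer in the retrieved documents.

        A sentence counts as supported when at least 30% of its key words
        (alphabetic, >4 chars) occur as substrings of the retrieved text.
        This is a word-overlap heuristic, not a semantic entailment check.
        """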
        
        # Extract all retrieved text
        all_context = []
        
        for crime_doc in rag_results['crime']:
            if crime_doc and 'text' in crime_doc:
                all_context.append(crime_doc['text'].lower())
        
        for news_doc in rag_results['news']:
            if news_doc and 'text' in news_doc:
                all_context.append(news_doc['text'].lower())
        
        combined_context = ' '.join(all_context)
        
        # Split answer into claims (sentences)
        claims = [s.strip() for s in answer.split('.') if s.strip() and len(s.strip()) > 10]
        
        # Simple faithfulness check - are key terms from claims in context?
        supported_claims = 0
        for claim in claims:
            # Extract key nouns/terms from claim (simple heuristic)
            claim_words = [w.lower() for w in claim.split() if len(w) > 4 and w.isalpha()]
            if claim_words:
                # Check if at least 30% of key words appear in context
                matches = sum(1 for word in claim_words if word in combined_context)
                if matches / len(claim_words) >= 0.3:
                    supported_claims += 1
        
        metrics = {
            'total_claims': len(claims),
            'supported_claims': supported_claims,
            'faithfulness_score': round(supported_claims / len(claims), 4) if claims else 0,
            'unsupported_claims': len(claims) - supported_claims,
            'context_size_words': len(combined_context.split())
        }
        
        return metrics
    
    # ========== COMPREHENSIVE EVALUATION ==========
    
    def comprehensive_evaluation(self, query: str, top_k=5) -> Dict:
        """Run all evaluation metrics on a query."""
        
        print(f"Running comprehensive evaluation for query: '{query}'")
        print(f"   Retrieving top {top_k} results...\n")
        
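        # Get retrieval results (used below to score the generated answer)
        rag_results = self.system.rag_query(query, top_k=top_k)
        
        # Generate answer. chat.ask appears to run its own retrieval, so the generation
        # metrics are meaningful only if it retrieves the same context as rag_results.
        print("   Generating answer...")
        answer = self.chat.ask(query)
        
        # Run all evaluations
        eval_results = {
            'query': query,
            'top_k': top_k,
            'answer': answer,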
            'retrieval_metrics': {
                'quality': self.evaluate_retrieval_quality(query, top_k),
                'diversity': self.evaluate_retrieval_diversity(query, top_k),
                'coverage': self.evaluate_retrieval_coverage(query, top_k)
            },
            'generation_metrics': {
                'answer_quality': self.evaluate_answer_quality(query, answer, rag_results),
                'context_utilization': self.evaluate_context_utilization(query, answer, rag_results),
                'faithfulness': self.evaluate_faithfulness(answer, rag_results)
            }
        }
        
        return eval_results
    
    def print_evaluation_report(self, eval_results: Dict):
        """Print a formatted evaluation report."""
        
        print("\n" + "="*80)
        print("RAG-LLM EVALUATION REPORT")
        print("="*80)
        
        print(f"\nQuery: {eval_results['query']}")
        
        # Retrieval Metrics
        print("\n" + "-"*80)
        print("RETRIEVAL METRICS")
        print("-"*80)
        
        qual = eval_results['retrieval_metrics']['quality']
        print(f"\n  Quality:")
        print(f"    • Crime Documents Retrieved: {qual['crime_docs_retrieved']}")
        print(f"    • News Documents Retrieved: {qual['news_docs_retrieved']}")
        print(f"    • Deterministic Links Found: {qual['links_found']}")
        print(f"    • Total Documents: {qual['total_retrieved']}")
        
        div = eval_results['retrieval_metrics']['diversity']
        print(f"\n  Diversity:")
        print(f"    • Unique Crime Types: {div['unique_crime_types']} / {div['total_crime_docs']}")
        print(f"    • Crime Diversity Ratio: {div['crime_diversity_ratio']:.2%}")
        print(f"    • Unique News Articles: {div['unique_news_articles']} / {div['total_news_docs']}")
        print(f"    • News Diversity Ratio: {div['news_diversity_ratio']:.2%}")
        
        cov = eval_results['retrieval_metrics']['coverage']
        print(f"\n  Coverage:")
        print(f"    • Unique Crime Dates: {cov['unique_crime_dates']}")
        print(f"    • Unique Crime Locations: {cov['unique_crime_locations']}")
        print(f"    • Temporal Coverage: {cov['temporal_coverage_days']} days")
        print(f"    • Spatial Coverage: {cov['spatial_coverage_blocks']} blocks")
        
        # Generation Metrics
        print("\n" + "-"*80)
        print("GENERATION METRICS")
        print("-"*80)
        
        ans_qual = eval_results['generation_metrics']['answer_quality']
        print(f"\n  Answer Quality:")
        print(f"    • Word Count: {ans_qual['word_count']}")
        print(f"    • Sentence Count: {ans_qual['sentence_count']}")
        print(f"    • Avg Sentence Length: {ans_qual['avg_sentence_length']:.1f} words")
        print(f"    • Statistics Mentioned: {ans_qual['statistics_mentioned']}")
        print(f"    • Citation Indicators: {ans_qual['citation_indicators']}")
        print(f"    • Location Mentions: {ans_qual['location_mentions']}")
        print(f"    • Temporal Mentions: {ans_qual['temporal_mentions']}")
        print(f"    • Has Structure: {'Yes' if ans_qual['has_structure'] else 'No'}")
        print(f"    • Specificity Score: {ans_qual['specificity_score']:.4f}")
        
        util = eval_results['generation_metrics']['context_utilization']
        print(f"\n  Context Utilization:")
        print(f"    • Crime Terms Used: {util['crime_context_terms_used']} / {util['crime_context_terms_available']}")
        print(f"    • News Terms Used: {util['news_context_terms_used']} / {util['news_context_terms_available']}")
        print(f"    • Overall Utilization Rate: {util['context_utilization_rate']:.2%}")
        print(f"    • Sources Available: {util['crime_sources_available']} crime, {util['news_sources_available']} news, {util['links_available']} links")
        print(f"    • Links Mentioned: {'Yes' if util['links_mentioned_in_answer'] else 'No'}")
        
        faith = eval_results['generation_metrics']['faithfulness']
        print(f"\n  Faithfulness:")
        print(f"    • Total Claims: {faith['total_claims']}")
        print(f"    • Supported Claims: {faith['supported_claims']}")
        print(f"    • Faithfulness Score: {faith['faithfulness_score']:.2%}")
        print(f"    • Unsupported Claims: {faith['unsupported_claims']}")
        print(f"    • Context Size: {faith['context_size_words']:,} words")
        
        # Overall Assessment
        print("\n" + "-"*80)
        print("OVERALL ASSESSMENT")
        print("-"*80)
        
        # Calculate aggregate score (unweighted mean of the five component scores below)
        ret_qual = eval_results['retrieval_metrics']['quality']
        ret_div = eval_results['retrieval_metrics']['diversity']
        ans_qual = eval_results['generation_metrics']['answer_quality']
        ans_util = eval_results['generation_metrics']['context_utilization']
        ans_faith = eval_results['generation_metrics']['faithfulness']
        
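        # Normalize by the maximum possible count (top_k crime + top_k news docs);
        # falls back to top_k=5 when the results dict carries no 'top_k' key
        max_docs = 2 * eval_results.get('top_k', 5)
        retrieval_score = min(ret_qual['total_retrieved'] / max_docs, 1.0)
        diversity_score = (ret_div['crime_diversity_ratio'] + ret_div['news_diversity_ratio']) / 2
        # Scale so that a specificity score of 0.02 or higher maps to 1.0
        answer_quality_score = min(ans_qual['specificity_score'] * 50, 1.0)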
        utilization_score = ans_util['context_utilization_rate']
        faithfulness_score = ans_faith['faithfulness_score']
        
        scores = {
            'Retrieval Volume': retrieval_score,
            'Retrieval Diversity': diversity_score,
            'Answer Quality': answer_quality_score,
            'Context Usage': utilization_score,
            'Faithfulness': faithfulness_score
        }
        
        for metric, score in scores.items():
            bar = "█" * int(score * 20) + "░" * (20 - int(score * 20))
            print(f"  {metric:20} {bar} {score:.2%}")
        
        overall_score = np.mean(list(scores.values()))
        print(f"\n  Overall RAG Score: {overall_score:.2%}")
        
        print("\n" + "="*80)


print("Fixed RAG Evaluator loaded!")


Fixed RAG Evaluator loaded!
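
The faithfulness metric in `RAGEvaluator` tests each key word with a substring check against the concatenated context, so a word such as `arrest` also matches inside `arrested`, and words with attached punctuation are dropped by `isalpha()`. A minimal sketch of a stricter whole-word variant follows. The function name `lexical_support` and its signature are illustrative and not part of the notebook; the 30% threshold and the >4-character rule are carried over from the class.

```python
import re
from typing import Dict, List


def lexical_support(answer: str, context_texts: List[str],
                    min_len: int = 5, threshold: float = 0.3) -> Dict:
    """Whole-word variant of the faithfulness heuristic (hypothetical helper)."""
    # Tokenize the context once into a set for whole-word lookups
    context_tokens = set()
    for text in context_texts:
        context_tokens.update(re.findall(r"[a-z]+", text.lower()))

    # Split on sentence-ending punctuation followed by whitespace,
    # so decimals such as "3.5" do not break a claim in two
    claims = [c.strip() for c in re.split(r"(?<=[.!?])\s+", answer)
              if len(c.strip()) > 10]

    supported = 0
    for claim in claims:
        key_words = [w for w in re.findall(r"[a-z]+", claim.lower())
                     if len(w) >= min_len]
        if not key_words:
            continue
        hits = sum(1 for w in key_words if w in context_tokens)
        if hits / len(key_words) >= threshold:
            supported += 1

    return {
        "total_claims": len(claims),
        "supported_claims": supported,
        "faithfulness_score": round(supported / len(claims), 4) if claims else 0,
    }


# Toy inputs, for illustration only
context = ["Type: THEFT Block: 061XX N LINCOLN AVE Arrest: True"]
answer = ("Five theft incidents were reported on Lincoln Avenue. "
          "Aliens landed downtown yesterday evening.")
print(lexical_support(answer, context))
```

On the toy input, the first sentence shares `theft` and `lincoln` with the context (2 of 5 key words) and counts as supported; the second shares none.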


In [19]:
# Initialize the evaluator
evaluator = RAGEvaluator(system, chat)
print("RAG Evaluator initialized and ready!")

RAG Evaluator initialized and ready!


In [20]:
# Run comprehensive evaluation on a single query
test_query = "What are the main public safety concerns in Englewood?"

eval_results = evaluator.comprehensive_evaluation(
    query=test_query,
    top_k=5
)

# Print detailed report
evaluator.print_evaluation_report(eval_results)

Running comprehensive evaluation for query: 'What are the main public safety concerns in Englewood?'
   Retrieving top 5 results...

 Running Hybrid RAG Query...
   Generating answer...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...

RAG-LLM EVALUATION REPORT

Query: What are the main public safety concerns in Englewood?

--------------------------------------------------------------------------------
RETRIEVAL METRICS
--------------------------------------------------------------------------------

  Quality:
    • Crime Documents Retrieved: 5
    • News Documents Retrieved: 5
    • Deterministic Links Found: 20
    • Total Documents: 10

  Diversity:
    • Unique Crime Types: 1 / 5
    • Crime Diversity Ratio: 20.00%
    • Unique News Articles: 5 / 5
    • News Diversity Ratio: 100.00%

  Coverage:
    • Unique Crime Dates: 5
    • Unique Crime Locations: 5
    • Temporal Coverage: 5 days
    • Spatial Coverage: 5 b
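
The log above shows `Running Hybrid RAG Query...` five times for a single evaluation: `comprehensive_evaluation` retrieves once, each of the three `evaluate_retrieval_*` methods retrieves again, and the remaining call appears to come from `chat.ask`. One way to avoid the repeats is a small memoizing wrapper around the system. This is a sketch only: `CachedRAGSystem` is a hypothetical name, and it assumes `rag_query` returns the same result for the same `(query, top_k)` and that callers do not mutate the returned dict.

```python
from typing import Any, Dict, Tuple


class CachedRAGSystem:
    """Hypothetical proxy that memoizes rag_query results per (query, top_k)."""

    def __init__(self, system: Any):
        self._system = system
        self._cache: Dict[Tuple[str, int], Dict] = {}

    def rag_query(self, query: str, top_k: int = 5) -> Dict:
        key = (query, top_k)
        if key not in self._cache:
            # Only the first call for a given key reaches the real system
            self._cache[key] = self._system.rag_query(query, top_k=top_k)
        return self._cache[key]


class _StubSystem:
    """Stand-in for NarrativeIntelligenceSystem, used only to count calls."""

    def __init__(self):
        self.calls = 0

    def rag_query(self, query: str, top_k: int = 5) -> Dict:
        self.calls += 1
        return {"crime": [], "news": [], "links": []}


stub = _StubSystem()
cached = CachedRAGSystem(stub)
for _ in range(4):
    cached.rag_query("Englewood public safety", top_k=5)
print(stub.calls)  # 1
```

Passing `CachedRAGSystem(system)` to `RAGEvaluator` in place of `system` would let the three retrieval metrics reuse one retrieval; `chat.ask` would still retrieve on its own unless the chatbot is built on the same wrapper.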

In [21]:
# Batch evaluation on multiple queries
test_queries = [
    "What crimes with arrests were reported near the N Lincoln Ave corridor?",
    "How effective has police response been in Englewood according to arrest patterns?",
    "Were any criminal damage incidents near Englewood covered by news?"
]

print(" Running batch evaluation...\n")

batch_results = []
for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*80}")
    print(f"Query {i}/{len(test_queries)}")
    print('='*80)
    
    result = evaluator.comprehensive_evaluation(query, top_k=3)
    batch_results.append(result)
    evaluator.print_evaluation_report(result)
    
    print("\n" + "="*80)
    print()

# Aggregate statistics
print("\n\n" + "="*80)
print(" AGGREGATE STATISTICS ACROSS ALL QUERIES")
print("="*80)

avg_retrieval_volume = np.mean([r['retrieval_metrics']['quality']['total_retrieved'] for r in batch_results])
avg_faithfulness = np.mean([r['generation_metrics']['faithfulness']['faithfulness_score'] for r in batch_results])
avg_utilization = np.mean([r['generation_metrics']['context_utilization']['context_utilization_rate'] for r in batch_results])
avg_word_count = np.mean([r['generation_metrics']['answer_quality']['word_count'] for r in batch_results])

print(f"\n  Average Documents Retrieved: {avg_retrieval_volume:.1f}")
print(f"  Average Faithfulness Score: {avg_faithfulness:.2%}")
print(f"  Average Context Utilization: {avg_utilization:.2%}")
print(f"  Average Answer Length: {avg_word_count:.0f} words")
print(f"\n  Total Queries Evaluated: {len(batch_results)}")
print("\n" + "="*80)

 Running batch evaluation...


Query 1/3
Running comprehensive evaluation for query: 'What crimes with arrests were reported near the N Lincoln Ave corridor?'
   Retrieving top 3 results...

 Running Hybrid RAG Query...
   Generating answer...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...

RAG-LLM EVALUATION REPORT

Query: What crimes with arrests were reported near the N Lincoln Ave corridor?

--------------------------------------------------------------------------------
RETRIEVAL METRICS
--------------------------------------------------------------------------------

  Quality:
    • Crime Documents Retrieved: 3
    • News Documents Retrieved: 3
    • Deterministic Links Found: 20
    • Total Documents: 6

  Diversity:
    • Unique Crime Types: 1 / 3
    • Crime Diversity Ratio: 33.33%
    • Unique News Articles: 3 / 3
    • News Diversity Ratio: 100.00%

  Coverage:
    • Unique Crime Dates: 3
    • Unique Crim
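
The aggregate block in the cell above averages four hand-picked metrics. A sketch that averages every numeric metric in the nested result dicts, using only the standard library, is shown here. `flatten_metrics` and `average_metrics` are hypothetical helpers, and the sample values are placeholders chosen only to show the shape of `batch_results`, not measured results.

```python
from statistics import mean
from typing import Dict, List


def flatten_metrics(result: Dict, prefix: str = "") -> Dict[str, float]:
    """Flatten nested metric dicts into {'a.b.c': value}, keeping only numbers."""
    flat = {}
    for key, value in result.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_metrics(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Booleans such as 'has_structure' are skipped on purpose
            flat[path] = float(value)
    return flat


def average_metrics(batch_results: List[Dict]) -> Dict[str, float]:
    """Average each numeric metric over the results that report it."""
    rows = [flatten_metrics(r) for r in batch_results]
    keys = sorted({k for row in rows for k in row})
    return {k: mean(row[k] for row in rows if k in row) for k in keys}


# Placeholder values only, shaped like the output of comprehensive_evaluation
sample_batch = [
    {"query": "q1", "answer": "a1",
     "retrieval_metrics": {"quality": {"total_retrieved": 6}},
     "generation_metrics": {"faithfulness": {"faithfulness_score": 0.5}}},
    {"query": "q2", "answer": "a2",
     "retrieval_metrics": {"quality": {"total_retrieved": 4}},
     "generation_metrics": {"faithfulness": {"faithfulness_score": 1.0}}},
]

for name, value in average_metrics(sample_batch).items():
    print(f"{name}: {value:.2f}")
```

With the real `batch_results`, `average_metrics(batch_results)` would return one mean per metric path, for example `generation_metrics.faithfulness.faithfulness_score`.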

In [22]:
# Evaluate your own custom query
from IPython.display import Markdown, display

custom_query = "What happened near Lincoln Ave related to robbery this week?"

print(f"Evaluating: {custom_query}\n")
result = evaluator.comprehensive_evaluation(custom_query, top_k=5)

# Show the answer
print("\n" + "="*80)
print("GENERATED ANSWER")
print("="*80)
display(Markdown(result['answer']))

# Show metrics
print()
evaluator.print_evaluation_report(result)

Evaluating: What happened near Lincoln Ave related to robbery this week?

Running comprehensive evaluation for query: 'What happened near Lincoln Ave related to robbery this week?'
   Retrieving top 5 results...

 Running Hybrid RAG Query...
   Generating answer...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...
 Running Hybrid RAG Query...

GENERATED ANSWER


Based on an analysis of the provided crime data and news reports, here is a unified narrative regarding robbery-related incidents near Lincoln Avenue this week.

**Direct Incident on Lincoln Avenue:**

According to official crime data, there was a cluster of **thefts**, not robberies, on **Monday, April 21, 2025**. Five separate theft incidents were reported on the **6100 block of North Lincoln Avenue**. In connection with these events, one arrest has been made. The data does not specify the nature of these thefts or if they were related.

**Broader Context from Regional News:**

While the specific incidents on Lincoln Avenue were classified as theft, several significant **robberies** have occurred recently in similarly named or nearby locations, which contributes to the public safety narrative in the area:

*   **Lincoln Park Neighborhood Robbery:** A high-profile robbery occurred last week in the Lincoln Park neighborhood where a young teacher was stalked, attacked, and slammed to the ground in the afternoon. Her attacker, who allegedly hoped to steal her phone for $100, was released on a $100 bond, causing public outcry from the victim's family.
*   **Lincoln Park Home Invasion:** In a separate, violent incident this week, a woman was beaten, tied up, and had her mouth taped by three men during a home invasion robbery in the Lincoln Park neighborhood, on the 400 block of West Fullerton Parkway.
*   **Lincoln Highway Armed Robbery (Clarification):** An armed robbery at Ryan's Pub occurred in Frankfort Township on Lincoln Highway, a distinct location from Chicago's Lincoln Avenue. This incident resulted in one of the alleged offenders being fatally shot by a patron. Two other suspects have since been charged.

**Summary Narrative:**

This week's primary police activity on Lincoln Avenue itself involved five reported thefts on the North Side, leading to one arrest. However, the broader public concern about robbery is likely fueled by recent, more violent crimes in the nearby Lincoln Park neighborhood, including the widely reported attack on a teacher and a brutal home invasion. It is important to distinguish the specific theft reports on North Lincoln Avenue from these other robbery incidents and to note that the armed robbery at Ryan's Pub occurred on Lincoln Highway in a different municipality.

The provided cross-links did not contain any connections between these specific crime reports and news articles.



RAG-LLM EVALUATION REPORT

Query: What happened near Lincoln Ave related to robbery this week?

--------------------------------------------------------------------------------
RETRIEVAL METRICS
--------------------------------------------------------------------------------

  Quality:
    • Crime Documents Retrieved: 5
    • News Documents Retrieved: 5
    • Deterministic Links Found: 20
    • Total Documents: 10

  Diversity:
    • Unique Crime Types: 1 / 5
    • Crime Diversity Ratio: 20.00%
    • Unique News Articles: 5 / 5
    • News Diversity Ratio: 100.00%

  Coverage:
    • Unique Crime Dates: 5
    • Unique Crime Locations: 3
    • Temporal Coverage: 5 days
    • Spatial Coverage: 3 blocks

--------------------------------------------------------------------------------
GENERATION METRICS
--------------------------------------------------------------------------------

  Answer Quality:
    • Word Count: 381
    • Sentence Count: 15
    • Avg Sentence Length: 25.4 words
   