CRS = w1*C + w2*S + w3*T + w4*A

A higher C (total number of crimes) indicates more danger → should increase risk → penalize

A higher S (violent crime ratio) means more violence → more dangerous → penalize

A higher T (nighttime crime ratio) means more crimes at night → riskier → penalize

A higher A (arrest rate) implies better law enforcement and safety → should reduce risk

So, to ensure that a higher CRS means higher risk, we need to reverse the direction of A’s impact.
That is, use (1 - A) to reflect that lower arrest rates imply higher risk.

Updated formula：CRS = w1*C + w2*S + w3*T + w4*(1-A)

In [2]:
import pandas as pd

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Step 1.1: Read the cleaned data

In [4]:
# Set file path
file_path = '/content/drive/My Drive/IDS 575 Final project/cleaned_crimes_data.csv'

# Read CSV file and parse date columns
df = pd.read_csv(file_path, parse_dates=['Date', 'Updated On'])

df.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,13751906,JJ148279,2024-12-31,047XX S JUSTINE ST,890,THEFT,FROM BUILDING,CONSTRUCTION SITE,False,False,...,20.0,61,6,1166798.0,1873451.0,2024,2025-02-21 15:40:32,41.808314,-87.663746,"(41.808313612, -87.663746144)"
1,13737151,JJ134861,2024-12-31,066XX N Olmsted ave,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,41.0,9,11,1125425.0,1943135.0,2024,2025-02-05 15:42:18,42.000321,-87.813948,"(42.000320673, -87.813948185)"
2,13732732,JJ129829,2024-12-31,030XX W 53RD PL,2826,OTHER OFFENSE,HARASSMENT BY ELECTRONIC MEANS,RESIDENCE,False,True,...,14.0,63,26,1156929.0,1869004.0,2024,2025-01-31 15:40:40,41.796316,-87.700064,"(41.796315735, -87.700063673)"
3,13728836,JJ123618,2024-12-31,071XX S VERNON AVE,1152,DECEPTIVE PRACTICE,ILLEGAL USE CASH CARD,APARTMENT,False,False,...,6.0,69,11,1180479.0,1857851.0,2024,2025-01-27 15:40:44,41.765202,-87.614046,"(41.765202297, -87.614046499)"
4,13724075,JJ119468,2024-12-31,052XX S PULASKI RD,281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,PARKING LOT / GARAGE (NON RESIDENTIAL),False,True,...,23.0,62,2,1150587.0,1869670.0,2024,2025-01-22 15:41:07,41.798269,-87.723303,"(41.79826919, -87.723303209)"


In [None]:
# Step 1.2: Count total number of crimes (C) for each community

In [5]:
crime_counts = df.groupby('Community Area').size().reset_index(name='Total_Crimes')

In [None]:
# Step 1.3: Calculate S (proportion of violent crimes)

In [6]:
# Define list of violent crime types
violent_crimes = ['BATTERY', 'ASSAULT', 'ROBBERY', 'HOMICIDE', 'CRIM SEXUAL ASSAULT', 'WEAPONS VIOLATION']

# Mark whether the crime is violent
df['Is_Violent'] = df['Primary Type'].isin(violent_crimes)

# Count violent crimes for each community
violent_counts = df.groupby('Community Area')['Is_Violent'].sum().reset_index(name='Violent_Crimes')

# Merge total crimes and violent crimes, then calculate the ratio
merged_s = pd.merge(crime_counts, violent_counts, on='Community Area')
merged_s['Violent_Ratio'] = merged_s['Violent_Crimes'] / merged_s['Total_Crimes']

In [None]:
# Step 1.4: Calculate T (proportion of nighttime crimes)

In [7]:
# Extract hour from the 'Date' column
df['Hour'] = df['Date'].dt.hour

# Determine if it's nighttime: from 6 PM (18) to 6 AM (before 6)
df['Is_Night'] = df['Hour'].apply(lambda x: x >= 18 or x < 6)

# Count nighttime crimes
night_counts = df.groupby('Community Area')['Is_Night'].sum().reset_index(name='Night_Crimes')

# Merge and calculate ratio
merged_t = pd.merge(merged_s, night_counts, on='Community Area')
merged_t['Night_Ratio'] = merged_t['Night_Crimes'] / merged_t['Total_Crimes']

In [None]:
# Step 1.5: Calculate A (arrest rate)

In [8]:
# Count number of arrests (where 'Arrest' is True)
arrest_counts = df.groupby('Community Area')['Arrest'].sum().reset_index(name='Arrest_Count')

# Merge and calculate arrest rate
merged_a = pd.merge(merged_t, arrest_counts, on='Community Area')
merged_a['Arrest_Rate'] = merged_a['Arrest_Count'] / merged_a['Total_Crimes']

In [None]:
# Step 1.6: Normalize all indicators and calculate final CRS score

In [9]:
# To align direction: use (1 - Arrest_Rate), fewer arrests imply higher risk
merged_a['Adj_Arrest'] = 1 - merged_a['Arrest_Rate']

# Normalization function (Min-Max scaling)
def normalize(series):
    return (series - series.min()) / (series.max() - series.min())

# Normalize the four indicators
merged_a['C_norm'] = normalize(merged_a['Total_Crimes'])
merged_a['S_norm'] = normalize(merged_a['Violent_Ratio'])
merged_a['T_norm'] = normalize(merged_a['Night_Ratio'])
merged_a['A_norm'] = normalize(merged_a['Adj_Arrest'])

# Set weights
w1, w2, w3, w4 = 0.4, 0.3, 0.2, 0.1

# Calculate overall CRS score and map it to 0–100 range
merged_a['CRS_Score'] = (
    w1 * merged_a['C_norm'] +
    w2 * merged_a['S_norm'] +
    w3 * merged_a['T_norm'] +
    w4 * merged_a['A_norm']
) * 100

# Assign labels: Low / Medium / High
def label_risk(score):
    if score <= 30:
        return 'Low'
    elif score <= 60:
        return 'Medium'
    else:
        return 'High'

merged_a['Risk_Level'] = merged_a['CRS_Score'].apply(label_risk)

In [None]:
# Step 1.7: Export as CRS score table CSV file

In [10]:
output_path = '/content/drive/My Drive/IDS 575 Final project/community_crs_scores.csv'

# Select output columns
final_df = merged_a[[
    'Community Area', 'Total_Crimes', 'Violent_Ratio',
    'Night_Ratio', 'Arrest_Rate', 'CRS_Score', 'Risk_Level'
]]

# Save to CSV
final_df.to_csv(output_path, index=False)
print("CRS score table saved successfully!")

CRS score table saved successfully!


In [None]:
# Step 2: Locate crimes within 500 meters of user-provided address and calculate risk indicators

In [11]:
import pandas as pd
import numpy as np
import requests
from math import radians, cos, sin, asin, sqrt

# Load cleaned crime data
file_path = '/content/drive/My Drive/IDS 575 Final project/cleaned_crimes_data.csv'
df = pd.read_csv(file_path, parse_dates=['Date'])

# Check if required columns are present
print(df.columns)

Index(['ID', 'Case Number', 'Date', 'Block', 'IUCR', 'Primary Type',
       'Description', 'Location Description', 'Arrest', 'Domestic', 'Beat',
       'District', 'Ward', 'Community Area', 'FBI Code', 'X Coordinate',
       'Y Coordinate', 'Year', 'Updated On', 'Latitude', 'Longitude',
       'Location'],
      dtype='object')


In [None]:
# Step 2.2: Define function to get latitude and longitude from address (using Google Geocoding API)

In [12]:
from getpass import getpass
GOOGLE_API_KEY = getpass("Please enter your Google Maps API key:")

Please enter your Google Maps API key:··········


In [13]:
def get_lat_lon_from_address(address):
    base_url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {
        'address': address,
        'key': GOOGLE_API_KEY
    }
    response = requests.get(base_url, params=params).json()

    if response['status'] == 'OK':
        location = response['results'][0]['geometry']['location']
        return location['lat'], location['lng']
    else:
        print("Failed to parse address:", response['status'])
        return None, None

In [None]:
# Step 2.3: Define function to calculate distance between two points (Haversine formula)

In [14]:
def haversine_distance(lat1, lon1, lat2, lon2):
    R = 6371.0              # Radius of the Earth (in kilometers)
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])  # Convert to radians

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = sin(dlat/2)**2 + cos(lat1)*cos(lat2)*sin(dlon/2)**2
    c = 2 * asin(sqrt(a))

    distance = R * c
    return distance * 1000  # Convert to meters

In [15]:
# Load mapping table of community area numbers and names
area_path = '/content/drive/My Drive/IDS 575 Final project/community_areas.csv'
area_df = pd.read_csv(area_path)
area_map = dict(zip(area_df['Community Area'], area_df['Area Name']))

In [None]:
# Step 2.4: Calculate crimes within 500 meters of a given address and output risk indicators

In [16]:
def analyze_risk_nearby(address):
    # Get latitude and longitude from the address
    lat, lon = get_lat_lon_from_address(address)
    if lat is None:
        return

    # Find all crimes within 500 meters
    df['distance'] = df.apply(lambda row: haversine_distance(lat, lon, row['Latitude'], row['Longitude']), axis=1)
    nearby = df[df['distance'] <= 500]

    if nearby.empty:
        print("No crime records within 500 meters of the address.")
        return

    # Identify the nearest crime to determine the community area
    nearest = nearby.loc[nearby['distance'].idxmin()]
    community_area = int(nearest['Community Area']) if not pd.isna(nearest['Community Area']) else 'Unknown'
    area_name = area_map.get(community_area, "Unknown Area")

    # Look up CRS score for the area
    crs_score_area = None
    if community_area is not None:
        match_row = final_df[final_df['Community Area'] == community_area]
        if not match_row.empty:
            crs_score_area = match_row.iloc[0]['CRS_Score']

    # Calculate four risk indicators
    C = len(nearby)

    violent_types = ['BATTERY', 'ASSAULT', 'ROBBERY', 'HOMICIDE']
    S = len(nearby[nearby['Primary Type'].isin(violent_types)]) / C

    # Extract hour from datetime
    hours = nearby['Date'].dt.hour
    T = len(nearby[(hours >= 18) | (hours < 6)]) / C

    A = len(nearby[nearby['Arrest'] == True]) / C

    # Calculate a simple CRS score (normalized proportions)
    w1, w2, w3, w4 = 0.4, 0.3, 0.2, 0.1
    C_norm = np.log1p(C) / 10  # log(1+C), scaled to roughly 0–1.5
    CRS = (w1*C_norm + w2*S + w3*T + w4*(1 - A)) * 100

    print(f"Address: {address}")
    print(f"Community Area: {community_area} - {area_name}")
    if crs_score_area is not None:
        print(f"CRS score for the area: {crs_score_area:>6.2f}")
    print("\nRisk data within 500 meters of the address:")
    print(f"Total crimes (C):            {C:>6}")
    print(f"Violent crime ratio (S):     {S:>6.2%}")
    print(f"Nighttime crime ratio (T):   {T:>6.2%}")
    print(f"Arrest rate (A):             {A:>6.2%}")
    print(f"CRS score (within 500m):     {CRS:>6.2f}")

In [17]:
user_input = input("Please enter the Chicago address you want to check (e.g., '233 S Wacker Dr, Chicago, IL'):\n")

analyze_risk_nearby(user_input)

Please enter the Chicago address you want to check (e.g., '233 S Wacker Dr, Chicago, IL'):
233 S Wacker Dr, Chicago, IL
Address: 233 S Wacker Dr, Chicago, IL
Community Area: 32 - Loop
CRS score for the area:  37.77

Risk data within 500 meters of the address:
Total crimes (C):              3138
Violent crime ratio (S):     20.01%
Nighttime crime ratio (T):   37.44%
Arrest rate (A):              8.86%
CRS score (within 500m):      54.81


In [18]:
import openai
from getpass import getpass
openai.api_key = getpass("Please enter your OpenAI API key:")

Please enter your OpenAI API key:··········


In [19]:
user_input = input("Please enter the full address risk report text:\n")

Please enter the full address risk report text:
Please enter the Chicago address you want to check (e.g., '233 S Wacker Dr, Chicago, IL'): 233 S Wacker Dr, Chicago, IL Address: 233 S Wacker Dr, Chicago, IL Community Area: 32 - Loop CRS score for the area:  37.77  Risk data within 500 meters of the address: Total crimes (C):              3138 Violent crime ratio (S):     20.01% Nighttime crime ratio (T):   37.44% Arrest rate (A):              8.86% CRS score (within 500m):      54.81


In [21]:
!pip install openai==0.28

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl.metadata (13 kB)
Downloading openai-0.28.0-py3-none-any.whl (76 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m76.5/76.5 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.75.0
    Uninstalling openai-1.75.0:
      Successfully uninstalled openai-1.75.0
Successfully installed openai-0.28.0


In [20]:
# Compose prompt
prompt = f"""
Below is a safety analysis report for a specific address:

{user_input}

Based on this report, please provide 4–5 travel safety suggestions for tourists.
Use both the area-level score and the local risk indicators within 500 meters to support your advice.
Also, determine which part of the area (East, South, West, North) the address is located in,
and compare the local crime situation with the overall community area.
"""

# Step 3: Call ChatGPT API
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a safety advisor specializing in travel around Chicago. You provide practical travel advice for tourists."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.7
)

In [21]:
advice = response['choices'][0]['message']['content']
print("Travel Advice:\n")
print(advice)

Travel Advice:

1. Be Vigilant: Given the high total crime rate (3138 crimes) within a 500m radius of the address, it's important to remain vigilant and aware of your surroundings at all times. This is particularly true during the night, where the nighttime crime ratio is 37.44%. 

2. Avoid Late Night Walks: Considering that the nighttime crime ratio (37.44%) is higher than the daytime, it is safer to avoid late-night walks. If necessary, consider using a trusted taxi service or rideshare app for transportation during these hours.

3. Stick to Crowded Areas: With a violent crime ratio of 20.01%, tourists should stick to crowded and well-lit areas, especially at night. Avoid taking shortcuts through alleyways or less-trafficked streets. 

4. Secure Valuables: Given the overall crime rate, securing your valuables is crucial. Don't display expensive jewelry or electronics in public, and always make sure your belongings are secure and close to you.

5. Be Aware of Local Enforcement: The ar

Travel Advice:

1. Be Vigilant: Given the high total crime rate (3138 crimes) within a 500m radius of the address, it's important to remain vigilant and aware of your surroundings at all times. This is particularly true during the night, where the nighttime crime ratio is 37.44%.

2. Avoid Late Night Walks: Considering that the nighttime crime ratio (37.44%) is higher than the daytime, it is safer to avoid late-night walks. If necessary, consider using a trusted taxi service or rideshare app for transportation during these hours.

3. Stick to Crowded Areas: With a violent crime ratio of 20.01%, tourists should stick to crowded and well-lit areas, especially at night. Avoid taking shortcuts through alleyways or less-trafficked streets.

4. Secure Valuables: Given the overall crime rate, securing your valuables is crucial. Don't display expensive jewelry or electronics in public, and always make sure your belongings are secure and close to you.

5. Be Aware of Local Enforcement: The arrest rate of 8.86% implies that law enforcement is active in the area. Don’t hesitate to approach them for help or direction if needed.

The address is located in the Loop area, which is considered the central business district of Chicago. The Loop area has a CRS score of 37.77, which is lower than the local CRS score within 500m (54.81), indicating a higher crime rate in the immediate vicinity of the address compared to the overall community area.

Stay safe and enjoy your visit to Chicago!