# Create Location Object

This notebook processes location sensor data to create behavior events and objects that represent a user's location patterns. The process involves:

1. Identifying when a user enters or exits specific locations (geofences)
2. Detecting when a user is in transit between locations
3. Creating location segments that represent stays at different locations
4. Linking these events and objects to the original sensor data
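
Step 3 can be pictured as a run-length grouping of per-sample location states. The sketch below is only an illustration of that idea, not the logic inside `LocationEventManager`; the function name `build_segments` and the sample data are hypothetical.

```python
from datetime import datetime


def build_segments(samples):
    """Collapse consecutive samples with the same state into segments.

    `samples` is a time-ordered list of (timestamp, state) tuples.
    Returns a list of dicts with location_type, start_time and end_time.
    """
    segments = []
    for timestamp, state in samples:
        if segments and segments[-1]["location_type"] == state:
            # Same state as the open segment: extend it.
            segments[-1]["end_time"] = timestamp
        else:
            # State changed: close the previous segment (an "Exiting" event)
            # and open a new one (an "Entering" event) at this timestamp.
            if segments:
                segments[-1]["end_time"] = timestamp
            segments.append(
                {"location_type": state, "start_time": timestamp, "end_time": timestamp}
            )
    return segments


# Hypothetical per-sample states for one morning
samples = [
    (datetime(2025, 5, 11, 8, 0), "home"),
    (datetime(2025, 5, 11, 8, 30), "home"),
    (datetime(2025, 5, 11, 8, 45), "in_transit"),
    (datetime(2025, 5, 11, 9, 30), "work"),
    (datetime(2025, 5, 11, 17, 0), "work"),
]
segments = build_segments(samples)
for segment in segments:
    print(segment["location_type"], segment["start_time"].time(), segment["end_time"].time())
```

Each boundary between two segments corresponds to one Exiting event for the old location and one Entering event for the new one.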

## Data Structure

The location events and objects follow this structure:

### Location Events
- **Type**: `location_event`
- **Attributes**:
  - `lifecycle`: "Entering" or "Exiting"
  - `location_type`: Type of location (e.g., "home", "work", "in_transit", "other")
- **Relationships**: Links to the source sensor event

### Location Segments
- **Type**: `location_segment`
- **Attributes**:
  - `location_type`: Type of location
  - `start_time`: When the segment began
  - `end_time`: When the segment ended
- **Relationships**: Links to the Entering and Exiting events
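
As a rough illustration, the two records might look like the dictionaries below. The keys `id`, `behaviorEventType`, `time`, `type`, `attributes` and `relationships` are the ones the later cells of this notebook read; the IDs, timestamps and the exact shape of the relationship entries are assumptions made for the example. The helper `get_attr` is hypothetical and simply wraps the attribute lookup that the notebook repeats in several places.

```python
def get_attr(record, name, default=None):
    """Return the value of the named attribute, or `default` if it is absent."""
    return next(
        (attr["value"] for attr in record.get("attributes", []) if attr["name"] == name),
        default,
    )


# Illustrative location event (ID and timestamp are made up)
location_event = {
    "id": "loc-event-001",
    "behaviorEventType": "location_event",
    "time": "2025-05-11T08:00:00Z",
    "attributes": [
        {"name": "lifecycle", "value": "Entering"},
        {"name": "location_type", "value": "home"},
    ],
    "relationships": [],  # would link to the source sensor event
}

# Illustrative location segment; the relationship keys are an assumption
# inferred from the diagnostics cell further down
location_segment = {
    "id": "loc-segment-001",
    "type": "location_segment",
    "attributes": [
        {"name": "location_type", "value": "home"},
        {"name": "start_time", "value": "2025-05-11T08:00:00+00:00"},
        {"name": "end_time", "value": "2025-05-11T08:45:00+00:00"},
    ],
    "relationships": [
        {"behaviorEvent": "loc-event-001", "type": "starts_with"},
    ],
}

print(get_attr(location_event, "lifecycle"))
print(get_attr(location_segment, "location_type"))
```

Looking attributes up by name rather than by list position keeps the code working even if the attribute order changes.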

## Location State Detection

The system determines a user's location state using these rules:

1. **In Transit**:
   - Distance between consecutive points > 50 meters
   - Time between points < 2 minutes

2. **In Geofence**:
   - User is within the radius of a defined location
   - Not in transit

3. **Other**:
   - Not in transit
   - Not in any defined geofence
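
The rules above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the code in `LocationEventManager`: it uses the haversine formula for distance, treats the two transit conditions as jointly required, and the names `haversine_m` and `classify_point` are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_point(prev, curr, geofences,
                   transit_distance_m=50.0, transit_time=timedelta(minutes=2)):
    """Return 'in_transit', a geofence name, or 'other' for the current point.

    `prev` and `curr` are dicts with 'time', 'latitude' and 'longitude';
    `prev` may be None for the first point.
    """
    if prev is not None:
        moved = haversine_m(prev["latitude"], prev["longitude"],
                            curr["latitude"], curr["longitude"])
        elapsed = curr["time"] - prev["time"]
        if moved > transit_distance_m and elapsed < transit_time:
            return "in_transit"
    for name, fence in geofences.items():
        distance = haversine_m(curr["latitude"], curr["longitude"],
                               fence["latitude"], fence["longitude"])
        if distance <= fence["radius"]:
            return name
    return "other"


# Hypothetical geofence and GPS points
geofences = {"home": {"latitude": 52.0973, "longitude": 5.1094, "radius": 50}}
points = [
    {"time": datetime(2025, 5, 11, 8, 0), "latitude": 52.0973, "longitude": 5.1094},
    {"time": datetime(2025, 5, 11, 8, 1), "latitude": 52.0974, "longitude": 5.1094},
    {"time": datetime(2025, 5, 11, 8, 2), "latitude": 52.1073, "longitude": 5.1094},
    {"time": datetime(2025, 5, 11, 8, 10), "latitude": 52.1073, "longitude": 5.1094},
]
states = [classify_point(prev, curr, geofences)
          for prev, curr in zip([None] + points[:-1], points)]
print(states)
```

The transit check runs first, so a point that moves quickly through a geofence is still labelled as in transit, matching the "Not in transit" condition on the other two states.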


## 1. Setup

In [1]:
import sys
sys.path.append('..')  # make the project root importable so that `src` resolves
import numpy as np
import pandas as pd
import json
from datetime import datetime, timedelta
from pathlib import Path
from src.oced.location_objects import LocationEventManager
from src.oced.time_objects import TimeObject
import matplotlib.pyplot as plt
import seaborn as sns
from src.oced.oced_data_query import OCEDDataQuery


## 2. Load OCED Data
First, we load the OCED data and extract the location sensor events. These events contain:
- Timestamp
- Latitude and longitude coordinates
- Additional sensor data (altitude, speed, error)

In [None]:
# Get OCED-mHealth Data from JSON File
dataQuery = OCEDDataQuery()
oced_data_file = "stress_self_reports.json"
data_dict = dataQuery.load_json(oced_data_file)

# Quick look at the data structure
print(f"Number of behavior events: {len(data_dict.get('behaviorEvents', []))}")
print(f"Number of objects: {len(data_dict.get('objects', []))}")

# Diagnostic code to inspect location sensor events
location_sensor_events = [
    event for event in data_dict.get('sensorEvents', [])
    if event.get('sensorEventType') == 'location'
]

print(f"Number of location sensor events: {len(location_sensor_events)}")
location_sensor_events

## 3. Define Location Geofences

Location geofences define the boundaries of important locations. Each geofence has:
- A unique name (e.g., "home", "work")
- Center coordinates (latitude, longitude)
- Radius in meters

These geofences will be used to determine when a user enters or exits these locations.

In [None]:
location_geofences = {
    'home': {
        'latitude': 52.09733976415986, 
        'longitude': 5.109440239875549,
        'radius': 50  # meters
    },
    'work': {
        'latitude': 51.44792083437223,
        'longitude': 5.486085861578487,
        'radius': 200
    },
    'gym': {
        'latitude': 52.09881115258374,
        'longitude': 5.11129722583215,
        'radius': 50
    }
}

# Visualize geofences on a map
def plot_geofences(geofences):
    plt.figure(figsize=(10, 10))
    
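    # Plot each geofence centre. Marker size `s` is in points squared, not metres,
    # so it only hints at the relative radius of each geofence.
    for name, geofence in geofences.items():
        plt.scatter(geofence['longitude'], geofence['latitude'], 
                   label=name, s=geofence['radius']*2)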
    
    plt.title('Location Geofences')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.legend()
    plt.grid(True)
    plt.show()

plot_geofences(location_geofences)
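
Because the plot axes are in degrees while each geofence radius is in metres, drawing a geofence at its true size needs a unit conversion. The sketch below shows one common approximation (about 111,320 m per degree of latitude, scaled by the cosine of the latitude for longitude); the function name `geofence_extent_degrees` is hypothetical and the constant is approximate.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def geofence_extent_degrees(latitude, radius_m):
    """Return (lat_radius_deg, lon_radius_deg) for a circle of `radius_m` metres."""
    lat_radius = radius_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    lon_radius = radius_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(latitude)))
    return lat_radius, lon_radius


lat_r, lon_r = geofence_extent_degrees(52.0973, 50)
print(f"latitude radius: {lat_r:.6f} deg, longitude radius: {lon_r:.6f} deg")
```

The two values could then be used as half the height and half the width of a `matplotlib.patches.Ellipse` centred on the geofence, which would appear as a circle on the ground even though it is an ellipse in degree space.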

## 4. Process Location Sensor Events Data

### 4.1. Initialize the LocationEventManager

In [4]:
location_manager = LocationEventManager()

### 4.2. Create the necessary event and object types

In [None]:
# Create event types
print("Creating location event types...")
data_with_types = location_manager.create_location_event_type(data_dict)
event_types = list(data_with_types.get('behaviorEventTypes', []))  # read from the returned data; avoids shadowing `type`
event_types

In [None]:
# Create object types
print("Creating location object types...")
data_with_types = location_manager.create_location_object_type(data_with_types)
object_types = list(data_with_types.get('objectTypes', []))  # read from the returned data; avoids shadowing `type`
object_types

### 4.3. Process sensor events to create location events and segments

In [None]:
# Get location sensor events
sensor_events = [event for event in data_dict.get('sensorEvents', [])
                    if event.get('sensorEventType') == 'location']
print(f"Found {len(sensor_events)} location sensor events")

# Get the user ID from the data
user_objects = [obj for obj in data_dict.get('objects', []) if obj['type'] == 'player']
if not user_objects:
    raise ValueError("No user object found in the data")
user_id = user_objects[0]['id']
print(f"Using user ID: {user_id}")
    
# Create location events and objects
print("Creating location events and segments...")
extended_data, location_events = location_manager.create_location_events_and_objects(
    data=data_with_types,  # the data returned by the type-creation steps above
    sensor_events=sensor_events,
    user_id=user_id,
    location_geofences=location_geofences,
    transit_distance_threshold=50.0,  # meters
    transit_time_threshold=timedelta(minutes=2),
    min_segment_duration=timedelta(minutes=5),
    invalid_gps_duration_threshold=timedelta(minutes=300),
    default_home_geofence="home"
    )

# Print statistics
print("\nStatistics:")
print(f"Total location events created: {len(location_events)}")
print(f"Total location segments created: {len(location_manager.location_objects)}")

location_manager.location_objects
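
A quick way to sanity-check the segments is to total the time spent per location type. The sketch below assumes each segment stores `location_type`, `start_time` and `end_time` as name/value attributes with ISO 8601 timestamps, as the later cells of this notebook read them; the function name `hours_per_location` and the example segments are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hours_per_location(segments):
    """Sum segment durations in hours, grouped by location_type."""
    totals = defaultdict(float)
    for segment in segments:
        attrs = {attr["name"]: attr["value"] for attr in segment.get("attributes", [])}
        start = datetime.fromisoformat(attrs["start_time"])
        end = datetime.fromisoformat(attrs["end_time"])
        totals[attrs["location_type"]] += (end - start).total_seconds() / 3600
    return dict(totals)


def make_segment(location_type, start, end):
    """Build an illustrative segment object (made-up data)."""
    return {
        "type": "location_segment",
        "attributes": [
            {"name": "location_type", "value": location_type},
            {"name": "start_time", "value": start},
            {"name": "end_time", "value": end},
        ],
    }


example_segments = [
    make_segment("home", "2025-05-11T00:00:00+00:00", "2025-05-11T08:00:00+00:00"),
    make_segment("work", "2025-05-11T09:00:00+00:00", "2025-05-11T17:00:00+00:00"),
    make_segment("home", "2025-05-11T18:00:00+00:00", "2025-05-11T23:30:00+00:00"),
]
totals = hours_per_location(example_segments)
print(totals)
```

Totals far above 24 hours per day, or a day dominated by "other", would suggest overlapping segments or geofences that are too small.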

## 5. Create visualizations
We'll create two visualization functions:
1. A function to get events and segments for a specific day
2. A function to create the actual visualization plots

The visualization will show:
- Location events (Entering/Exiting) on a timeline
- Location segments as horizontal bars
- Different colors for different location types
- Time information for each event and segment

In [8]:
def get_day_events(extended_data, date_str):
    """Get all events and segments for a specific day."""
    day_events = {
        'sensor': [],
        'behavior': []
    }
    
    # Filter sensor events
    for event in extended_data.get('sensorEvents', []):
        event_time = datetime.fromisoformat(event['time'].replace('Z', '+00:00'))
        if event_time.strftime('%Y-%m-%d') == date_str:
            day_events['sensor'].append(event)
    
    # Filter behavior events (location events)
    for event in extended_data.get('behaviorEvents', []):
        if event['behaviorEventType'] == 'location_event':
            event_time = datetime.fromisoformat(event['time'].replace('Z', '+00:00'))
            if event_time.strftime('%Y-%m-%d') == date_str:
                day_events['behavior'].append(event)
    
    # Get location segments
    day_segments = []
    for obj in extended_data.get('objects', []):
        if obj['type'] == 'location_segment':
            # Accept both 'Z'-suffixed and offset-style ISO timestamps
            start_time = datetime.fromisoformat(next(attr['value'] for attr in obj['attributes'] 
                                                   if attr['name'] == 'start_time').replace('Z', '+00:00'))
            if start_time.strftime('%Y-%m-%d') == date_str:
                day_segments.append(obj)
    
    return day_events, day_segments

def visualize_location_day(extended_data, date_str, location_geofences):
    """Create visualization of location events and segments for a specific day."""
    # Get events and segments for the day
    day_events, day_segments = get_day_events(extended_data, date_str)
    
    # Create figure with two subplots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10), height_ratios=[1, 2])
    
    # Plot 1: Location events
    event_types = ['Entering', 'Exiting']
    colors = {'Entering': 'green', 'Exiting': 'red'}
    
    for event in day_events['behavior']:
        event_time = datetime.fromisoformat(event['time'].replace('Z', '+00:00'))
        lifecycle = next(attr['value'] for attr in event['attributes'] 
                        if attr['name'] == 'lifecycle')
        location_type = next(attr['value'] for attr in event['attributes'] 
                           if attr['name'] == 'location_type')
        
        # Plot event
        ax1.scatter(event_time, 0, color=colors[lifecycle], marker='|', s=100)
        ax1.text(event_time, 0.1, f"{lifecycle}\n{location_type}", 
                rotation=45, ha='right', va='bottom')
    
    ax1.set_title('Location Events')
    ax1.set_yticks([])
    ax1.grid(True, alpha=0.3)
    
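    # Plot 2: Location segments
    # Look up location_type by name rather than by position in the attribute list
    segment_types = sorted({
        next(attr['value'] for attr in segment['attributes']
             if attr['name'] == 'location_type')
        for segment in day_segments
    })
    segment_colors = plt.cm.Set3(np.linspace(0, 1, len(segment_types)))
    color_map = dict(zip(segment_types, segment_colors))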
    
    for i, segment in enumerate(day_segments):
        start_time = datetime.fromisoformat(next(attr['value'] for attr in segment['attributes'] 
                                               if attr['name'] == 'start_time'))
        end_time = datetime.fromisoformat(next(attr['value'] for attr in segment['attributes'] 
                                             if attr['name'] == 'end_time'))
        location_type = next(attr['value'] for attr in segment['attributes'] 
                           if attr['name'] == 'location_type')
        
        # Plot segment
        ax2.barh(i, (end_time - start_time).total_seconds() / 3600, 
                left=start_time.hour + start_time.minute/60,
                color=color_map[location_type], alpha=0.7)
        ax2.text(start_time.hour + start_time.minute/60, i, 
                f"{location_type}\n{start_time.strftime('%H:%M')}-{end_time.strftime('%H:%M')}",
                va='center')
    
    ax2.set_title('Location Segments')
    ax2.set_xlabel('Hour of Day')
    ax2.set_yticks([])
    ax2.set_xlim(0, 24)
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    return fig

In [None]:
import numpy as np  # used by visualize_location_day for the segment colour map

date_to_visualize = "2025-05-11"  # Replace with a date present in your data
fig = visualize_location_day(extended_data, date_to_visualize, location_geofences)
plt.show()


## 6. Diagnostics

In [None]:
def analyze_location_data(extended_data, location_manager):
    """Analyze location events and segments to understand their relationship."""
    # Get all location events
    location_events = [event for event in extended_data.get('behaviorEvents', [])
                      if event['behaviorEventType'] == 'location_event']
    
    # Count events by type
    entering_events = [e for e in location_events 
                      if next(attr['value'] for attr in e['attributes'] 
                             if attr['name'] == 'lifecycle') == 'Entering']
    exiting_events = [e for e in location_events 
                     if next(attr['value'] for attr in e['attributes'] 
                            if attr['name'] == 'lifecycle') == 'Exiting']
    
    # Get all segments
    segments = [obj for obj in extended_data.get('objects', [])
               if obj['type'] == 'location_segment']
    
    # Analyze relationships
    print(f"Total location events: {len(location_events)}")
    print(f"  - Entering events: {len(entering_events)}")
    print(f"  - Exiting events: {len(exiting_events)}")
    print(f"Total location segments: {len(segments)}")
    
    # Check for orphaned events (events not linked to segments)
    orphaned_events = []
    for event in location_events:
        event_id = event['id']
        is_linked = False
        for segment in segments:
            # Use .get so relationships without a 'behaviorEvent' key are skipped
            if any(rel.get('behaviorEvent') == event_id 
                  for rel in segment.get('relationships', [])):
                is_linked = True
                break
        if not is_linked:
            orphaned_events.append(event)
    
    print(f"\nOrphaned events (not linked to any segment): {len(orphaned_events)}")
    if orphaned_events:
        print("Example orphaned event:")
        print(json.dumps(orphaned_events[0], indent=2))
    
    # Check for segments with missing events
    incomplete_segments = []
    for segment in segments:
        has_enter = False
        has_exit = False
        for rel in segment.get('relationships', []):
            # Use .get so relationships without a 'type' key are skipped
            if rel.get('type') == 'starts_with':
                has_enter = True
            elif rel.get('type') == 'ends_with':
                has_exit = True
        if not (has_enter and has_exit):
            incomplete_segments.append(segment)
    
    print(f"\nSegments with missing events: {len(incomplete_segments)}")
    if incomplete_segments:
        print("Example incomplete segment:")
        print(json.dumps(incomplete_segments[0], indent=2))

print("\nAnalyzing location events and segments...")
analyze_location_data(extended_data, location_manager)

## 7. Add relationships with objects and events

### 7.1. Add relationships to Physical Activity Bout Objects and Events

In [None]:
# Get all PA bout objects from extended_data
pa_bout_objects = [
    obj for obj in extended_data.get('objects', [])
    if obj['type'] == 'physical_activity_bout'  # Adjust this if your PA bout type has a different name
]

# Add relationships between PA bout objects and location objects
updated_data = location_manager.relate_location_to_pa_bouts(
    extended_data,
    pa_bout_objects
)

# You can verify the relationships were added by checking a few location segments
# For example, to see relationships for the first location segment:
location_segments = [
    obj for obj in updated_data.get('objects', [])
    if obj['type'] == 'location_segment'
]

if location_segments:
    first_segment = location_segments[0]
    print("Location segment ID:", first_segment['id'])
    print("Location type:", next(attr['value'] for attr in first_segment['attributes'] 
                                if attr['name'] == 'location_type'))
    print("Overlapping PA bouts:")
    for rel in first_segment.get('relationships', []):
        if rel.get('qualifier') == 'overlaps_with_pa_bout':  # skip relationships without a qualifier
            print(f"- PA bout ID: {rel['id']}")
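
The `overlaps_with_pa_bout` qualifier suggests a time-interval overlap test between a location segment and a PA bout. How `relate_location_to_pa_bouts` decides this is not shown in this notebook, so the sketch below is only one plausible way to express the check; the function name `intervals_overlap` and the times are hypothetical.

```python
from datetime import datetime


def intervals_overlap(start_a, end_a, start_b, end_b):
    """Return True when the half-open intervals [start, end) share any time."""
    return start_a < end_b and start_b < end_a


# Hypothetical work segment and two PA bouts
segment_start = datetime(2025, 5, 11, 9, 0)
segment_end = datetime(2025, 5, 11, 17, 0)

lunch_walk = (datetime(2025, 5, 11, 12, 0), datetime(2025, 5, 11, 12, 30))
evening_run = (datetime(2025, 5, 11, 17, 0), datetime(2025, 5, 11, 17, 20))

print(intervals_overlap(segment_start, segment_end, *lunch_walk))
print(intervals_overlap(segment_start, segment_end, *evening_run))
```

With half-open intervals, a bout that starts exactly when a segment ends is not counted as overlapping, which avoids linking one bout to two adjacent segments at a shared boundary.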

In [None]:
# Now, relate PA bout atomic events to locations
pa_event_type = "physical_activity_bout_event"  # Adjust this to match your PA bout event type
updated_data = location_manager.relate_pa_events_to_locations(
    extended_data=updated_data,
    pa_event_type=pa_event_type
)

# Validation code to check the relationships
# 1. Check some PA events and their location relationships
#    (filter on the same event type that was passed to the manager above)
pa_events = [
    event for event in updated_data.get('behaviorEvents', [])
    if event['behaviorEventType'] == pa_event_type
]

print("Validating PA event to location relationships:")
print("-" * 50)

# Check the first 5 PA events (or fewer if not that many exist)
for pa_event in pa_events[:5]:
    print(f"\nPA Event ID: {pa_event['id']}")
    print(f"Time: {pa_event['time']}")
    
    # Get PA bout attributes (if any)
    pa_attributes = {attr['name']: attr['value'] for attr in pa_event.get('attributes', [])}
    print("PA Event attributes:", pa_attributes)
    
    # Get location relationships
    location_rels = [
        rel for rel in pa_event.get('relationships', [])
        if rel.get('qualifier') == 'occurred_in_location'  # skip relationships without a qualifier
    ]
    
    if location_rels:
        print("Found in location segments:")
        for rel in location_rels:
            # Find the location segment object
            loc_segment = next(
                (obj for obj in updated_data.get('objects', [])
                 if obj['id'] == rel['id'] and obj['type'] == 'location_segment'),
                None
            )
            if loc_segment:
                # Get location type
                loc_type = next(
                    (attr['value'] for attr in loc_segment['attributes']
                     if attr['name'] == 'location_type'),
                    "unknown"
                )
                # Get time range
                start_time = next(
                    (attr['value'] for attr in loc_segment['attributes']
                     if attr['name'] == 'start_time'),
                    "unknown"
                )
                end_time = next(
                    (attr['value'] for attr in loc_segment['attributes']
                     if attr['name'] == 'end_time'),
                    "unknown"
                )
                print(f"- Location: {loc_type}")
                print(f"  Time range: {start_time} to {end_time}")
    else:
        print("No location segment found for this PA event")

# 2. Print some statistics
total_pa_events = len(pa_events)
events_with_locations = sum(
    1 for event in pa_events
    if any(rel.get('qualifier') == 'occurred_in_location' 
           for rel in event.get('relationships', []))
)

print("\nRelationship Statistics:")
print("-" * 50)
print(f"Total PA events: {total_pa_events}")
print(f"Events with location relationships: {events_with_locations}")
print(f"Events without location relationships: {total_pa_events - events_with_locations}")
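# Guard against division by zero when no PA events match the filter
if total_pa_events:
    print(f"Coverage: {(events_with_locations/total_pa_events)*100:.1f}%")
else:
    print("Coverage: n/a (no PA events found)")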