# SNWK Competition Data Analysis

This notebook loads and analyzes Swedish Nose Work competition data, transforming it into a structured DataFrame for statistical analysis.

## 1. Load Competition Data

Load the JSON file containing competition results and display basic information about the dataset.

In [1]:
import json
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import numpy as np
from nw_stats.config import ProjectPaths
import os

# Load competition data
filename = "snwk_competition_results_20251008_050303.json" 
filepath = os.path.join(ProjectPaths.DATA, filename)

try:
    with open(filepath, "r", encoding="utf-8") as f:
        competitions_data = json.load(f)
    
    print(f" Successfully loaded data from: {filename}")
    print(f" Dataset contains {len(competitions_data)} competitions")
    
    # Display basic information about the dataset
    if competitions_data:
        sample_comp = competitions_data[0]
        print(f"\n Sample competition:")
        print(f"   • Date: {sample_comp.get('datum', 'N/A')}")
        print(f"   • Location: {sample_comp.get('plats', 'N/A')}")
        print(f"   • Type: {sample_comp.get('typ', 'N/A')}")
        print(f"   • Class: {sample_comp.get('klass', 'N/A')}")
        print(f"   • Result sections: {len(sample_comp.get('resultat', []))}")
        
except FileNotFoundError:
    print(f" File '{filename}' not found in {ProjectPaths.DATA}")
except Exception as e:
    print(f"Error reading file: {e}")

 Successfully loaded data from: snwk_competition_results_20251008_050303.json
 Dataset contains 4478 competitions

 Sample competition:
   • Date: 2025-10-07
   • Location: Öckerö
   • Type: TEM
   • Class: NW1
   • Result sections: 5


## 2. Data Transformation

Transform the nested competition data into a flat DataFrame with one row per participant search.
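
The flattening walks three levels of nesting: competition, result set (one per search), and participant row. The sketch below illustrates that shape on a tiny example. Only the key names come from the transformation code in this notebook; every value is an invented placeholder, and only a subset of the columns is kept for brevity.

```python
import pandas as pd

# Hypothetical competition record; only the key names come from the notebook.
sample_competitions = [
    {
        "datum": "2025-01-01",
        "plats": "Exempelstad",
        "typ": "TSM",
        "klass": "NW1",
        "resultat": [
            {
                "sök": "Inomhus",
                "domare": ["Domare A"],
                "tabell": [
                    {"handler": "Förare A", "dog_call_name": "Hund A", "points": 25},
                    {"handler": "Förare B", "dog_call_name": "Hund B", "points": 20},
                ],
            },
            {
                "sök": "total",
                "domare": [],
                "tabell": [
                    {"handler": "Förare A", "dog_call_name": "Hund A", "points": 100},
                ],
            },
        ],
    }
]

# One output row per participant per result set.
rows = [
    {
        "datum": comp.get("datum", ""),
        "klass": comp.get("klass", ""),
        "typ_av_sök": result_set.get("sök", ""),
        "förare": participant.get("handler", ""),
        "poäng": participant.get("points", ""),
    }
    for comp in sample_competitions
    for result_set in comp.get("resultat", [])
    for participant in result_set.get("tabell", [])
]

sample_df = pd.DataFrame(rows)
print(sample_df)
```

Competition-level fields such as `datum` and `klass` are repeated on every row, which is what makes the flat table easy to filter and group.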

**Note**: Each row represents one search result rather than one competition, so diploma information is not directly available. The `typ_av_sök` column contains `total` rows alongside the individual searches, so always filter by search type before aggregating.
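
To see why the filter matters, here is a small sketch with invented rows. It assumes that `total` rows summarize a whole competition while the other rows are individual searches; the column names follow the DataFrame built in this notebook, and the values are placeholders.

```python
import pandas as pd

# Hypothetical rows; column names follow the DataFrame built in this notebook.
demo = pd.DataFrame(
    {
        "hund_namn": ["Hund A", "Hund A", "Hund A"],
        "typ_av_sök": ["Inomhus", "Fordon", "total"],
        "poäng": [25, 20, 45],
    }
)

# Mixing 'total' rows with individual searches skews the mean.
mixed_mean = demo["poäng"].mean()

# Keep only individual searches before aggregating.
searches = demo[demo["typ_av_sök"] != "total"]
search_mean = searches["poäng"].mean()

print(f"Mixed: {mixed_mean:.1f}, individual searches only: {search_mean:.1f}")
```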

In [2]:
def convert_time_to_seconds(time_str):
    """
    Convert time string formats to seconds for analysis.
    
    Args:
        time_str: Time in format 'MM:SS,ss', 'HH:MM:SS,ss', or plain seconds ('SS,ss')
        
    Returns:
        float: Time in seconds, or None if invalid
        
    Examples:
        '02:30,45' -> 150.45 seconds
        '1:05:30,12' -> 3930.12 seconds
    """
    if pd.isna(time_str) or time_str == '':
        return None
    
    try:
        # Replace comma with dot for decimals
        time_str = str(time_str).replace(',', '.')
        
        if ':' in time_str:
            parts = time_str.split(':')
            if len(parts) == 2:  # MM:SS.ss
                minutes = float(parts[0])
                seconds = float(parts[1])
                return minutes * 60 + seconds
            elif len(parts) == 3:  # HH:MM:SS.ss
                hours = float(parts[0])
                minutes = float(parts[1]) 
                seconds = float(parts[2])
                return hours * 3600 + minutes * 60 + seconds
        else:
            # Just seconds
            return float(time_str)
    except (ValueError, TypeError):  # unparseable input is treated as missing
        return None
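
The docstring examples can be checked by hand: '02:30,45' is 2 × 60 + 30.45 = 150.45 seconds, and '1:05:30,12' is 1 × 3600 + 5 × 60 + 30.12 = 3930.12 seconds. The sketch below is a condensed restatement of the same arithmetic. `to_seconds` is a hypothetical helper, not part of the notebook, and unlike `convert_time_to_seconds` it raises on invalid input instead of returning `None`.

```python
def to_seconds(time_str):
    """Condensed variant: fold colon-separated parts left to right."""
    total = 0.0
    # Swap the decimal comma for a dot, then treat each part as base-60.
    for part in str(time_str).replace(",", ".").split(":"):
        total = total * 60 + float(part)
    return total


for raw in ["02:30,45", "1:05:30,12", "45,5"]:
    print(raw, "->", round(to_seconds(raw), 2))
```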

In [4]:
def create_participants_dataframe(competitions_data):
    """
    Transform nested competition data into a flat DataFrame.
    
    Args:
        competitions_data: List of competition dictionaries from JSON
        
    Returns:
        pd.DataFrame: Flattened data with one row per participant search
    """
    participants_list = []
    
    for comp in competitions_data:
        # Extract competition metadata
        comp_date = comp.get('datum', '')
        comp_location = comp.get('plats', '')
        comp_type = comp.get('typ', '')
        comp_class = comp.get('klass', '')
        comp_organizer = comp.get('arrangör', '')
        comp_coordinator = comp.get('anordnare', '')
        
        # Process each result set (different search types/moments)
        for result_set in comp.get('resultat', []):
            search_type = result_set.get('sök', '')
            judges = result_set.get('domare', [])
            judge_names = ', '.join(judges) if judges else ''
            
            # Process each participant in this result set
            for participant in result_set.get('tabell', []):
                participant_row = {
                    # Competition information
                    'klass': comp_class,
                    'datum': comp_date,
                    'plats': comp_location,
                    'typ': comp_type,
                    'arrangör': comp_organizer,
                    'anordnare': comp_coordinator,
                    'typ_av_sök': search_type,
                    'domare': judge_names,
                    
                    # Participant information
                    'förare': participant.get('handler', ''),
                    'hund_namn': participant.get('dog_call_name', ''),
                    'stamtavlenamn': participant.get('dog_full_name', ''),
                    'hundras': participant.get('dog_breed', ''),
                    'start_position': participant.get('start_number', ''),
                    'placering': participant.get('placement', ''),
                    'poäng': participant.get('points', ''),
                    'fel': participant.get('faults', ''),
                    'tid': convert_time_to_seconds(participant.get('time', ''))
                }
                
                participants_list.append(participant_row)
    
    return pd.DataFrame(participants_list)

# Transform data to DataFrame
print(" Creating participants DataFrame...")
df_participants = create_participants_dataframe(competitions_data)

print(f" DataFrame created successfully!")
print(f" Records: {len(df_participants):,}")
print(f" Columns: {len(df_participants.columns)}")
print(f" Memory usage: {df_participants.memory_usage().sum() / 1024:.1f} KB")

# Display sample of the data
print(f"\n Sample data:")
df_participants.head()

 Creating participants DataFrame...
 DataFrame created successfully!
 Records: 679,085
 Columns: 17
 Memory usage: 90191.1 KB

 Sample data:


|   | klass | datum | plats | typ | arrangör | anordnare | typ_av_sök | domare | förare | hund_namn | stamtavlenamn | hundras | start_position | placering | poäng | fel | tid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NW1 | 2025-10-07 | Öckerö | TEM | Svenska Nose Work Klubben | Hubbes Hundkurser | total | okänd | Morten Hellenberg | Vilja | Becca | Labrador retriever | 20 | 1 | 100 | 0 | 90.89 |
| 1 | NW1 | 2025-10-07 | Öckerö | TEM | Svenska Nose Work Klubben | Hubbes Hundkurser | total | okänd | Annika Malmhäll | Flora | Suki-Yaki's Flora Fantasia | Shih tzu | 15 | 2 | 100 | 0 | 101.71 |
| 2 | NW1 | 2025-10-07 | Öckerö | TEM | Svenska Nose Work Klubben | Hubbes Hundkurser | total | okänd | Ewa Carlsson | Zally | Tufflaz Head In The Clouds | Lagotto romagnolo | 23 | 3 | 100 | 0 | 117.36 |
| 3 | NW1 | 2025-10-07 | Öckerö | TEM | Svenska Nose Work Klubben | Hubbes Hundkurser | total | okänd | Gull-Britt Samuelsson | Ester | Ester | Dansk-svensk gårdshund | 18 | 4 | 100 | 0 | 124.53 |
| 4 | NW1 | 2025-10-07 | Öckerö | TEM | Svenska Nose Work Klubben | Hubbes Hundkurser | total | okänd | Eva Andersson | Chico | Weiefors Just A Miracle | Papillon | 6 | 5 | 100 | 0 | 161.31 |


## 3. Data Exploration

Explore the dataset to understand the data distribution and quality.
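
One quality check worth sketching: the transformation fills absent keys with an empty string, so a plain `isna()` count would miss them. The example below counts both kinds of gaps per column. The rows are invented placeholders, and only the column names come from this notebook.

```python
import pandas as pd

# Hypothetical rows; '' mimics the default used when a key is absent.
demo = pd.DataFrame(
    {
        "förare": ["Förare A", "", "Förare C"],
        "poäng": [25, 20, ""],
        "tid": [90.5, None, 120.0],
    }
)

# Count values that are either NaN/None or an empty string, per column.
missing = (demo.isna() | demo.eq("")).sum()
print(missing)
```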

In [7]:
# Explore data categories and distributions
print(" Data Categories:")
print(f"   • Search types: {sorted(df_participants['typ_av_sök'].unique())}")
print(f"   • Competition types: {sorted(df_participants['typ'].unique())}")
print(f"   • Classes: {sorted(df_participants['klass'].unique())}")

print(f"\n Data Counts:")
print(f"   • Unique dogs: {df_participants['stamtavlenamn'].nunique():,}")
print(f"   • Unique handlers: {df_participants['förare'].nunique():,}")
print(f"   • Unique locations: {df_participants['plats'].nunique():,}")
print(f"   • Unique organizers: {df_participants['arrangör'].nunique():,}")

# Example analysis: Performance by search type for a specific dog
example_dog = "Springer Nova's Nemo Of Deye"
print(f"\n Example Analysis - {example_dog}:")

dog_data = df_participants[df_participants['stamtavlenamn'] == example_dog]
if len(dog_data) > 0:
    for search_type in ['Inomhus', 'Fordon', 'Behållare']:
        search_data = dog_data[dog_data['typ_av_sök'] == search_type]
        if len(search_data) > 0:
            avg_points = search_data['poäng'].mean()
            count = len(search_data)
            print(f"   • {search_type}: {avg_points:.1f} avg points ({count} searches)")
        else:
            print(f"   • {search_type}: No data")
else:
    print("   No data found for this dog")


 Data Categories:
   • Search types: ['Behållare', 'Fordon', 'Inomhus', 'Utomhus', 'total']
   • Competition types: ['TEM', 'TSM']
   • Classes: ['NW1', 'NW2', 'NW3']

 Data Counts:
   • Unique dogs: 8,397
   • Unique handlers: 6,642
   • Unique locations: 529
   • Unique organizers: 122

 Example Analysis - Springer Nova's Nemo Of Deye:
   • Inomhus: 20.3 avg points (76 searches)
   • Fordon: 20.4 avg points (52 searches)
   • Behållare: 20.0 avg points (64 searches)
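
The per-search-type loop above can also be expressed as a single `groupby`. This is a sketch on invented rows for one dog, not the notebook's own code; the column names match `df_participants`, and `total` rows are excluded on the assumption that they summarize whole competitions.

```python
import pandas as pd

# Hypothetical results for one dog; column names match df_participants.
dog_rows = pd.DataFrame(
    {
        "typ_av_sök": ["Inomhus", "Inomhus", "Fordon", "total"],
        "poäng": [25, 15, 20, 60],
    }
)

# Average points and number of searches per search type, excluding 'total'.
summary = (
    dog_rows[dog_rows["typ_av_sök"] != "total"]
    .groupby("typ_av_sök")["poäng"]
    .agg(["mean", "count"])
)
print(summary)
```

Unlike the loop, this reports whatever search types are present, so a type such as `Utomhus` would not need to be listed by hand.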


## 4. Streamlit Dashboard

The data has been used to create an interactive Streamlit dashboard for exploring competition statistics.

In [9]:
# Instructions for running the Streamlit dashboard
print(" Streamlit Dashboard Available!")
print(" Location: ../streamlit_app.py")
print("\n To run the interactive dashboard:")
print("   1. Open terminal")
print("   2. cd /home/loke/projects/nw_stats/nw_stats/streamlit_app")
print("   3. streamlit run streamlit_app.py")
print("\n The dashboard will open in your browser at http://localhost:8501")
print("\n Features:")
print("   • Interactive filters by competition type, search type, and class")
print("   • Points distribution visualization")
print("   • Top performing dogs analysis")
print("   • Individual dog performance tracking")
print("   • Raw data exploration")

 Streamlit Dashboard Available!
 Location: ../streamlit_app.py

 To run the interactive dashboard:
   1. Open terminal
   2. cd /home/loke/projects/nw_stats/nw_stats/streamlit_app
   3. streamlit run streamlit_app.py

 The dashboard will open in your browser at http://localhost:8501

 Features:
   • Interactive filters by competition type, search type, and class
   • Points distribution visualization
   • Top performing dogs analysis
   • Individual dog performance tracking
   • Raw data exploration
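
The filtering by competition type, search type, and class could be backed by a small pandas helper along these lines. This is a sketch only: `filter_results` is a hypothetical function, not taken from `streamlit_app.py`, and the demo rows are invented.

```python
import pandas as pd


def filter_results(df, typ=None, search_type=None, klass=None):
    """Return rows matching every filter that is not None (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    if typ is not None:
        mask &= df["typ"] == typ
    if search_type is not None:
        mask &= df["typ_av_sök"] == search_type
    if klass is not None:
        mask &= df["klass"] == klass
    return df[mask]


# Invented rows using the notebook's column names.
demo = pd.DataFrame(
    {
        "typ": ["TEM", "TEM", "TSM"],
        "typ_av_sök": ["Inomhus", "Fordon", "Inomhus"],
        "klass": ["NW1", "NW2", "NW1"],
    }
)
print(filter_results(demo, typ="TEM", klass="NW1"))
```

Leaving a filter as `None` means "no restriction", which maps naturally onto an "All" option in a dropdown.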
