# Family Activity Ranking Model

This notebook implements a rule-based scoring model that combines group members' ages and power levels (overall ability) to rank activities suitable for the entire group.

## Features:
- Age-based activity filtering
- Power/ability-based scoring
- Group compatibility analysis
- Multi-factor ranking algorithm

## Google Colab Compatible
Upload your `dataset/dataset empty -open space-.csv` file when running on Colab.

## 1. Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')

# Set style for visualizations
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries imported successfully")

## 2. Define Group Member Structure

Each member has:
- **age**: Member's age in years
- **overall_ability**: Power level (1-10 scale) representing physical and cognitive capabilities
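
Since the ability scale is bounded, it can help to check inputs before building a group. The following is a minimal sketch, not part of the notebook's classes: `validate_member` is a hypothetical helper, and the rule that ages must be non-negative is an assumption.

```python
def validate_member(name: str, age: int, overall_ability: float) -> None:
    """Raise ValueError if age or ability is outside the expected range.

    Hypothetical helper: the 1-10 ability bounds come from the scale above,
    the non-negative age rule is an assumption.
    """
    if age < 0:
        raise ValueError(f"{name}: age must be non-negative, got {age}")
    if not 1 <= overall_ability <= 10:
        raise ValueError(
            f"{name}: overall_ability must be between 1 and 10, got {overall_ability}"
        )


# A valid member passes silently
validate_member("Emma", age=10, overall_ability=7.0)

# An out-of-range ability is rejected
try:
    validate_member("Liam", age=6, overall_ability=12.0)
except ValueError as err:
    print(f"Rejected: {err}")
```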

In [None]:
class GroupMember:
    """Represents a family member with age and ability attributes"""
    
    def __init__(self, name: str, age: int, overall_ability: float, 
                 interests: List[str] = None, special_needs: List[str] = None):
        self.name = name
        self.age = age
        self.overall_ability = overall_ability  # 1-10 scale
        self.interests = interests or []
        self.special_needs = special_needs or []
    
    def __repr__(self):
        return f"{self.name} (Age: {self.age}, Ability: {self.overall_ability})"


class FamilyGroup:
    """Manages a group of family members"""
    
    def __init__(self, members: List[GroupMember] = None):
        self.members = members or []
    
    def add_member(self, member: GroupMember):
        self.members.append(member)
    
    def get_age_range(self) -> Tuple[int, int]:
        """Returns (min_age, max_age) of the group"""
        if not self.members:
            return (0, 0)
        ages = [m.age for m in self.members]
        return (min(ages), max(ages))
    
    def get_avg_ability(self) -> float:
        """Returns average overall_ability of the group"""
        if not self.members:
            return 0.0
        # Cast to a plain float so the return value matches the annotation
        return float(np.mean([m.overall_ability for m in self.members]))
    
    def get_ability_range(self) -> Tuple[float, float]:
        """Returns (min_ability, max_ability) of the group"""
        if not self.members:
            return (0.0, 0.0)
        abilities = [m.overall_ability for m in self.members]
        return (min(abilities), max(abilities))
    
    def __repr__(self):
        return f"FamilyGroup({len(self.members)} members)"


print("✓ Group member classes defined")

## 3. Create Sample Family Group

Define your family members here with their ages and ability levels.

In [None]:
# Example family group
sample_group = FamilyGroup([
    GroupMember("Emma", age=10, overall_ability=7.0, 
                interests=["Arts & Crafts", "Reading", "Nature"]),
    GroupMember("Liam", age=6, overall_ability=5.0, 
                interests=["Sports", "Building", "Outdoors"]),
    GroupMember("Sophia", age=13, overall_ability=8.5, 
                interests=["Music", "Dance", "Art"]),
    GroupMember("Dad", age=42, overall_ability=6.0, 
                interests=["Sports", "Hiking", "Cooking"]),
])

print("Family Group Summary:")
print("=" * 50)
for member in sample_group.members:
    print(f"  {member}")
print("=" * 50)
print(f"Age Range: {sample_group.get_age_range()[0]}-{sample_group.get_age_range()[1]} years")
print(f"Average Ability: {sample_group.get_avg_ability():.2f}/10")
print(f"Ability Range: {sample_group.get_ability_range()[0]}-{sample_group.get_ability_range()[1]}")

## 4. Load Activity Dataset

### For Google Colab:
Upload your CSV file using the code below, or mount Google Drive.
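
If you prefer mounting Google Drive over uploading, one possible pattern is sketched below. `google.colab.drive.mount` is the standard Colab call; the Drive location of the CSV is a hypothetical path that you would replace with your own, and the `IN_COLAB` flag is just an illustrative name.

```python
# Detect whether the notebook is running on Colab
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    drive.mount('/content/drive')
    # Hypothetical location: adjust to wherever the file sits in your Drive
    csv_path = "/content/drive/MyDrive/dataset empty -open space-.csv"
else:
    # Local path used elsewhere in this notebook
    csv_path = "dataset/dataset empty -open space-.csv"

print(f"Using dataset at: {csv_path}")
```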

In [None]:
# Uncomment for Google Colab file upload
# from google.colab import files
# uploaded = files.upload()
# csv_path = list(uploaded.keys())[0]  # Get uploaded filename

# For local execution
csv_path = "dataset/dataset empty -open space-.csv"

# Load activities
def load_activities(file_path: str) -> pd.DataFrame:
    """Load and preprocess activity dataset"""
    df = pd.read_csv(file_path)
    
    # Clean and convert data types
    df['age_min'] = pd.to_numeric(df['age_min'], errors='coerce')
    df['age_max'] = pd.to_numeric(df['age_max'], errors='coerce')
    df['duration_mins'] = pd.to_numeric(df['duration_mins'], errors='coerce')
    
    # Parse comma-separated tags into a list; missing values become an empty list
    df['tags'] = df['tags'].apply(
        lambda x: [tag.strip() for tag in str(x).split(',') if tag.strip()] if pd.notna(x) else []
    )
    
    return df

activities_df = load_activities(csv_path)

print(f"✓ Loaded {len(activities_df)} activities")
print(f"\nSample activities:")
print(activities_df[['title', 'age_min', 'age_max', 'duration_mins', 'tags']].head(10))
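
The loader above reads specific columns, so a CSV with different headers fails with a `KeyError`. A small check like the one below can make that failure easier to diagnose. This is a sketch: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the demo frame stands in for the real dataset.

```python
import pandas as pd

# Columns that the loading cell reads and prints; taken from the code above
REQUIRED_COLUMNS = ["title", "age_min", "age_max", "duration_mins", "tags"]


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


# Hypothetical two-row frame standing in for the real CSV
demo = pd.DataFrame({
    "title": ["Nature Walk", "Board Game"],
    "age_min": [4, 6],
    "age_max": [99, 12],
    "tags": ["outdoors, exercise", "fun"],
})

# duration_mins was left out of the demo frame on purpose
print(missing_columns(demo))
```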

## 5. Activity Ranking Model

This model ranks activities based on multiple factors:
1. **Age Fit Score**: How much of the group's age span the activity's age range covers
2. **Ability Score**: How closely the estimated activity difficulty matches the group's average ability
3. **Coverage Score**: The fraction of group members whose age falls within the activity's range
4. **Diversity Score**: How wide the activity's age range is, used as a proxy for accommodating different ability levels
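
The four factors are combined as a weighted sum and scaled to 0-100. The short worked example below uses the default weights from `calculate_composite_score`; the component scores are made-up values chosen only to illustrate the arithmetic.

```python
# Default weights from the model; they sum to 1.0
weights = {"age_fit": 0.30, "coverage": 0.30, "ability": 0.25, "diversity": 0.15}

# Hypothetical component scores for one activity, each in the range 0-1
scores = {"age_fit": 0.5, "coverage": 0.75, "ability": 0.8, "diversity": 1.0}

# Weighted sum, scaled to 0-100:
# 0.30*0.5 + 0.30*0.75 + 0.25*0.8 + 0.15*1.0 = 0.725
composite = 100 * sum(weights[key] * scores[key] for key in weights)

print(f"Composite score: {composite:.1f}/100")
```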

In [None]:
class ActivityRankingModel:
    """Model to rank activities for a family group based on age and ability"""
    
    def __init__(self, group: FamilyGroup, activities: pd.DataFrame):
        self.group = group
        self.activities = activities
        self.ranked_activities = None
    
    def calculate_age_fit_score(self, activity_row) -> float:
        """
        Calculate how well activity age range fits the group.
        Returns score 0-1 (higher is better)
        """
        group_min, group_max = self.group.get_age_range()
        activity_min = activity_row['age_min']
        activity_max = activity_row['age_max']
        
        # Check if activity range overlaps with group range
        if activity_max < group_min or activity_min > group_max:
            return 0.0  # No overlap
        
        # Calculate the overlap in whole years, inclusive on both ends so it
        # is measured the same way as group_range below
        overlap_min = max(activity_min, group_min)
        overlap_max = min(activity_max, group_max)
        overlap_size = overlap_max - overlap_min + 1
        
        group_range = group_max - group_min + 1
        
        # Score based on how much of the group range is covered
        coverage = overlap_size / group_range if group_range > 0 else 1.0
        
        return min(coverage, 1.0)
    
    def calculate_member_coverage(self, activity_row) -> float:
        """
        Calculate what percentage of group members can participate.
        Returns score 0-1 (1 = all members can participate)
        """
        if not self.group.members:
            return 0.0
        
        activity_min = activity_row['age_min']
        activity_max = activity_row['age_max']
        
        eligible_count = sum(
            1 for member in self.group.members 
            if activity_min <= member.age <= activity_max
        )
        
        return eligible_count / len(self.group.members)
    
    def estimate_difficulty(self, activity_row) -> float:
        """
        Estimate activity difficulty based on age range and tags.
        Returns difficulty score 1-10 (10 = most difficult)
        """
        # Base difficulty on minimum age requirement
        age_difficulty = activity_row['age_min'] / 2.0  # Scale to roughly 1-10
        
        # Adjust based on tags
        difficulty_modifiers = {
            'exercise': 1.5,
            'sports': 2.0,
            'coordination': 1.5,
            'stem': 1.5,  # lowercase key: tags are lowercased before lookup
            'problem solving': 2.0,
            'balance': 1.0,
            'sensory': -0.5,
            'fun': -0.5,
        }
        
        tags = activity_row['tags']
        modifier = sum(
            difficulty_modifiers.get(tag.strip().lower(), 0) 
            for tag in tags
        )
        
        difficulty = age_difficulty + modifier
        return np.clip(difficulty, 1, 10)
    
    def calculate_ability_score(self, activity_row) -> float:
        """
        Calculate how well activity difficulty matches group ability.
        Returns score 0-1 (higher = better match)
        """
        activity_difficulty = self.estimate_difficulty(activity_row)
        group_avg_ability = self.group.get_avg_ability()
        
        # Score based on how close difficulty is to average ability
        # Activities slightly below average ability are preferred (more accessible)
        difference = abs(activity_difficulty - (group_avg_ability - 1))
        
        # Convert difference to score (smaller difference = higher score)
        score = 1.0 - (difference / 10.0)
        return max(0, score)
    
    def calculate_diversity_score(self, activity_row) -> float:
        """
        Calculate whether activity can accommodate different ability levels.
        Returns score 0-1 (higher = more inclusive)
        """
        activity_age_range = activity_row['age_max'] - activity_row['age_min']
        
        # Wider age ranges typically accommodate more ability levels
        diversity = min(activity_age_range / 10.0, 1.0)
        
        return diversity
    
    def calculate_composite_score(self, activity_row, weights: Dict[str, float] = None) -> float:
        """
        Calculate weighted composite score for an activity.
        
        Args:
            activity_row: Activity data
            weights: Dictionary of weights for each component score
        
        Returns:
            Composite score 0-100
        """
        if weights is None:
            weights = {
                'age_fit': 0.30,       # 30% - Age appropriateness is crucial
                'coverage': 0.30,      # 30% - How many can participate
                'ability': 0.25,       # 25% - Difficulty match
                'diversity': 0.15,     # 15% - Inclusivity
            }
        
        scores = {
            'age_fit': self.calculate_age_fit_score(activity_row),
            'coverage': self.calculate_member_coverage(activity_row),
            'ability': self.calculate_ability_score(activity_row),
            'diversity': self.calculate_diversity_score(activity_row),
        }
        
        composite = sum(scores[key] * weights[key] for key in weights.keys())
        
        # Attach component scores to the row for inspection; under DataFrame.apply
        # the row is a copy, so these values are not written back to the DataFrame
        for key, value in scores.items():
            activity_row[f'score_{key}'] = value
        
        return composite * 100  # Scale to 0-100
    
    def rank_activities(self, top_n: int = None, weights: Dict[str, float] = None) -> pd.DataFrame:
        """
        Rank all activities for the group.
        
        Args:
            top_n: Return only top N activities (None = all)
            weights: Custom weights for scoring components
        
        Returns:
            DataFrame of ranked activities with scores
        """
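        ranked = self.activities.copy()
        
        # Store component scores as real columns; values assigned to a row
        # inside apply() are not written back to the DataFrame
        scorers = {
            'age_fit': self.calculate_age_fit_score,
            'coverage': self.calculate_member_coverage,
            'ability': self.calculate_ability_score,
            'diversity': self.calculate_diversity_score,
        }
        for key, scorer in scorers.items():
            ranked[f'score_{key}'] = ranked.apply(scorer, axis=1)
        
        # Calculate composite score for each activity
        ranked['composite_score'] = ranked.apply(
            lambda row: self.calculate_composite_score(row, weights), 
            axis=1
        )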
        
        # Sort by score (descending)
        ranked = ranked.sort_values('composite_score', ascending=False)
        
        # Filter out activities with zero score (completely inappropriate)
        ranked = ranked[ranked['composite_score'] > 0]
        
        # Add rank column
        ranked['rank'] = range(1, len(ranked) + 1)
        
        self.ranked_activities = ranked
        
        if top_n is not None:
            return ranked.head(top_n)
        return ranked
    
    def get_recommendations(self, n: int = 10) -> pd.DataFrame:
        """
        Get top N recommended activities with detailed scores.
        """
        if self.ranked_activities is None:
            self.rank_activities()
        
        cols = ['rank', 'title', 'composite_score', 'score_age_fit', 
                'score_coverage', 'score_ability', 'score_diversity',
                'age_min', 'age_max', 'duration_mins', 'tags']
        
        return self.ranked_activities[cols].head(n)


print("✓ Activity Ranking Model defined")
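
To make the scoring rules concrete, the sketch below reproduces three of the calculations by hand for the sample group from section 3. The activity (ages 5-12, tagged `sports` and `fun`) is hypothetical, and the code is standalone rather than a call into `ActivityRankingModel`.

```python
import numpy as np

ages = [10, 6, 13, 42]                # Emma, Liam, Sophia, Dad
abilities = [7.0, 5.0, 8.5, 6.0]

# Hypothetical activity used only for this walkthrough
activity = {"age_min": 5, "age_max": 12, "tags": ["sports", "fun"]}

# Member coverage: share of members whose age lies inside the activity range
eligible = [a for a in ages if activity["age_min"] <= a <= activity["age_max"]]
coverage = len(eligible) / len(ages)              # 2 of 4 members -> 0.5

# Difficulty: age_min / 2 plus tag modifiers, clipped to 1-10
modifiers = {"sports": 2.0, "fun": -0.5}          # subset of the model's table
raw_difficulty = activity["age_min"] / 2.0 + sum(
    modifiers.get(tag, 0) for tag in activity["tags"]
)
difficulty = float(np.clip(raw_difficulty, 1, 10))  # 2.5 + 1.5 -> 4.0

# Ability score: distance from (average ability - 1), scaled to 0-1
avg_ability = float(np.mean(abilities))           # 6.625
ability_score = max(0.0, 1.0 - abs(difficulty - (avg_ability - 1)) / 10.0)

print(f"coverage={coverage}, difficulty={difficulty}, ability_score={ability_score:.2f}")
```

Only Emma and Liam fall inside the 5-12 range, so coverage is 0.5 even though the difficulty is a reasonable match for the group.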

## 6. Generate Activity Rankings

Apply the model to rank activities for your family group.

In [None]:
# Create model instance
model = ActivityRankingModel(sample_group, activities_df)

# Rank activities
ranked_activities = model.rank_activities()

print(f"✓ Ranked {len(ranked_activities)} suitable activities")
print(f"\nTop 15 Recommended Activities for Your Group:")
print("=" * 100)

recommendations = model.get_recommendations(n=15)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print(recommendations.to_string(index=False))

## 7. Visualize Results

In [None]:
# Visualization 1: Score Distribution
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Overall score distribution
axes[0, 0].hist(ranked_activities['composite_score'], bins=30, color='steelblue', edgecolor='black')
axes[0, 0].set_title('Distribution of Composite Scores', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Composite Score')
axes[0, 0].set_ylabel('Number of Activities')
axes[0, 0].axvline(ranked_activities['composite_score'].mean(), color='red', 
                   linestyle='--', label=f'Mean: {ranked_activities["composite_score"].mean():.1f}')
axes[0, 0].legend()

# Component scores for top 20 activities
top_20 = ranked_activities.head(20)
score_cols = ['score_age_fit', 'score_coverage', 'score_ability', 'score_diversity']

x = np.arange(len(top_20))
width = 0.2

for i, col in enumerate(score_cols):
    label = col.replace('score_', '').replace('_', ' ').title()
    axes[0, 1].bar(x + i * width, top_20[col], width, label=label)

axes[0, 1].set_title('Component Scores - Top 20 Activities', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Activity Rank')
axes[0, 1].set_ylabel('Score (0-1)')
axes[0, 1].legend()
axes[0, 1].set_xticks(x + width * 1.5)
# Label by actual count so the plot also works with fewer than 20 activities
axes[0, 1].set_xticklabels(range(1, len(top_20) + 1), rotation=0)

# Top 15 activities by composite score
top_15 = ranked_activities.head(15)
activity_names = [title[:20] + '...' if len(title) > 20 else title for title in top_15['title']]

axes[1, 0].barh(activity_names, top_15['composite_score'], color='teal')
axes[1, 0].set_title('Top 15 Activities by Score', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Composite Score')
axes[1, 0].invert_yaxis()

# Member coverage analysis
coverage_dist = ranked_activities['score_coverage'].value_counts().sort_index()
# Coverage values lie in 0-1, so use narrow bars to keep them from overlapping
axes[1, 1].bar(coverage_dist.index, coverage_dist.values, width=0.05, color='coral', edgecolor='black')
axes[1, 1].set_title('Member Coverage Distribution', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Coverage Score (fraction of members who can participate)')
axes[1, 1].set_ylabel('Number of Activities')

plt.tight_layout()
plt.show()

print("✓ Visualizations generated")

## 8. Detailed Analysis of Top Activities

In [None]:
def print_activity_details(activity_row):
    """Print detailed information about an activity"""
    print(f"\n{'='*80}")
    print(f"RANK #{int(activity_row['rank'])}: {activity_row['title'].upper()}")
    print(f"{'='*80}")
    print(f"Overall Score: {activity_row['composite_score']:.1f}/100")
    print(f"\nComponent Scores:")
    print(f"  • Age Fit:        {activity_row['score_age_fit']:.2f} (Share of the group's age span covered)")
    print(f"  • Coverage:       {activity_row['score_coverage']:.2f} (Fraction of members who can participate)")
    print(f"  • Ability Match:  {activity_row['score_ability']:.2f} (Difficulty appropriateness)")
    print(f"  • Diversity:      {activity_row['score_diversity']:.2f} (Width of the activity's age range)")
    print(f"\nActivity Details:")
    print(f"  • Age Range:      {int(activity_row['age_min'])}-{int(activity_row['age_max'])} years")
    print(f"  • Duration:       {int(activity_row['duration_mins'])} minutes")
    print(f"  • Tags:           {', '.join(activity_row['tags'])}")
    print(f"  • Cost:           {activity_row['cost']}")
    print(f"  • Location:       {activity_row['indoor_outdoor']}")
    print(f"  • Players:        {activity_row['players']}")

# Show detailed analysis of top 5
print("\n" + "#" * 80)
print("# DETAILED ANALYSIS - TOP 5 RECOMMENDED ACTIVITIES")
print("#" * 80)

for idx in range(min(5, len(ranked_activities))):
    print_activity_details(ranked_activities.iloc[idx])

## 9. Customize Scoring Weights

You can adjust the importance of different factors by changing weights.
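
The composite score stays on a 0-100 scale only when the weights sum to 1. If you would rather think in relative importance (for example 2 : 5 : 2 : 1), a small helper can rescale them. `normalize_weights` below is a hypothetical utility, not part of the model class.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative weights so that they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {key: w / total for key, w in weights.items()}


# Relative importance expressed as plain ratios
raw_weights = {"age_fit": 2, "coverage": 5, "ability": 2, "diversity": 1}
normalized = normalize_weights(raw_weights)
print(normalized)
```

The keys must still be the four component names (`age_fit`, `coverage`, `ability`, `diversity`), because `calculate_composite_score` looks up a score for every key in the weights dictionary.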

In [None]:
# Example: Prioritize activities where ALL members can participate
custom_weights = {
    'age_fit': 0.20,       # 20%
    'coverage': 0.50,      # 50% - Prioritize full group participation
    'ability': 0.20,       # 20%
    'diversity': 0.10,     # 10%
}

# Re-rank with custom weights
custom_ranked = model.rank_activities(weights=custom_weights)
custom_recs = custom_ranked[['rank', 'title', 'composite_score', 'score_coverage', 
                              'age_min', 'age_max']].head(10)

print("Top 10 Activities with Custom Weights (Prioritizing Full Participation):")
print("=" * 80)
print(custom_recs.to_string(index=False))

## 10. Export Results

In [None]:
# Export top recommendations to CSV
output_file = 'top_activity_recommendations.csv'
recommendations_export = ranked_activities.head(50)[[
    'rank', 'title', 'composite_score', 'score_age_fit', 'score_coverage',
    'score_ability', 'score_diversity', 'age_min', 'age_max', 
    'duration_mins', 'cost', 'indoor_outdoor', 'tags'
]]

recommendations_export.to_csv(output_file, index=False)
print(f"✓ Exported top 50 recommendations to '{output_file}'")

# Summary statistics
print(f"\n{'='*80}")
print("MODEL SUMMARY")
print(f"{'='*80}")
print(f"Total activities evaluated:     {len(activities_df)}")
print(f"Suitable activities found:      {len(ranked_activities)}")
print(f"Average composite score:        {ranked_activities['composite_score'].mean():.2f}")
print(f"Best activity:                  {ranked_activities.iloc[0]['title']}")
print(f"Best activity score:            {ranked_activities.iloc[0]['composite_score']:.2f}/100")
print(f"\nGroup Details:")
print(f"  Members:                      {len(sample_group.members)}")
print(f"  Age range:                    {sample_group.get_age_range()[0]}-{sample_group.get_age_range()[1]} years")
print(f"  Average ability:              {sample_group.get_avg_ability():.2f}/10")
print("=" * 80)

## 11. Create Your Own Family Group

Modify the cell below to create your own family group and get personalized recommendations!

In [None]:
# CREATE YOUR OWN GROUP HERE
my_group = FamilyGroup([
    # Add your family members here
    # GroupMember("Name", age=X, overall_ability=Y),
    # overall_ability scale: 1=very low, 5=average, 10=very high
])

if my_group.members:
    # Create model for your group
    my_model = ActivityRankingModel(my_group, activities_df)
    my_ranked = my_model.rank_activities()
    my_recs = my_model.get_recommendations(n=20)
    
    print(f"\n🎯 TOP 20 ACTIVITIES FOR YOUR GROUP:")
    print("=" * 100)
    print(my_recs.to_string(index=False))
else:
    print("⚠️ Please add members to my_group to get recommendations!")

---

## Summary

This notebook implements an activity ranking model that:

1. **Combines** group member ages and ability levels
2. **Analyzes** activity suitability across four scoring components
3. **Ranks** activities using a weighted scoring algorithm
4. **Provides** recommendations with per-component score breakdowns

### Scoring Components:
- **Age Fit**: How much of the group's age span the activity's age range covers
- **Coverage**: Fraction of members who can participate
- **Ability Match**: How closely activity difficulty aligns with the group's average power/ability
- **Diversity**: How wide the activity's age range is, as a proxy for accommodating different levels

### Next Steps:
- Customize weights based on your priorities
- Add more group members
- Filter by indoor/outdoor, cost, or season
- Integrate with calendar scheduling
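
As a starting point for the filtering step, the sketch below narrows a ranked table by location and cost. The column names `indoor_outdoor` and `cost` match those used in this notebook, but the values (`"outdoor"`, `"free"`, and so on) and the `filter_ranked` helper are assumptions; check what your dataset actually contains.

```python
import pandas as pd

# Hypothetical ranked output; the values are made up for illustration
ranked = pd.DataFrame({
    "title": ["Nature Walk", "Board Game", "Obstacle Course"],
    "composite_score": [82.0, 75.5, 71.0],
    "indoor_outdoor": ["outdoor", "indoor", "outdoor"],
    "cost": ["free", "low", "free"],
})


def filter_ranked(df: pd.DataFrame, location: str = None, cost: str = None) -> pd.DataFrame:
    """Keep rows matching the given location and/or cost (case-insensitive)."""
    mask = pd.Series(True, index=df.index)
    if location is not None:
        mask &= df["indoor_outdoor"].str.lower() == location.lower()
    if cost is not None:
        mask &= df["cost"].str.lower() == cost.lower()
    return df[mask]


outdoor_free = filter_ranked(ranked, location="outdoor", cost="free")
print(outdoor_free["title"].tolist())
```

Filtering after ranking keeps the original rank order intact, so the remaining rows are still sorted by composite score.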

**Compatible with Google Colab** - Upload your dataset and run!