Q16 - Real-time Data Rush

Question: Welcome to Real-time Data Rush!
You are given a dataset of magical creatures and their real-time activities.
Each creature performs various activities with different durations throughout the day.
Your task is to analyze this time-based data to answer the following questions:

- Calculate the total duration of activities for each creature.
- Determine the creature with the longest single activity duration.
- Identify the most common activity performed by each creature.
- Calculate the average activity duration for each type of activity across all creatures.
- Find the top 3 creatures with the highest average activity duration.

Datasets:

creature_activities: Contains columns (creature_id, creature_name, activity, duration), where activity is the type of activity and duration is the time spent on that activity in minutes.

In [None]:
import pandas as pd
import numpy as np

# Seed for reproducibility
np.random.seed(1212)

# Generate synthetic data
creature_ids = np.arange(1, 11)
creature_names = ['Frodo', 'Gandalf', 'Hermione', 'Legolas', 'Bilbo', 'Galadriel', 'Dumbledore', 'Gimli', 'Harry', 'Aragorn']
activities = ['Flying', 'Potion Making', 'Spell Casting', 'Herb Gathering', 'Treasure Hunting']
durations = np.arange(1, 121)  # Durations from 1 to 120 minutes

data = []
for creature_id, creature_name in zip(creature_ids, creature_names):
    num_activities = np.random.randint(5, 15)  # 5 to 14 activities; the upper bound is exclusive
    for _ in range(num_activities):
        activity = np.random.choice(activities)
        duration = np.random.choice(durations)
        data.append([creature_id, creature_name, activity, duration])

# Create DataFrame
creature_activities = pd.DataFrame(data, columns=['creature_id', 'creature_name', 'activity', 'duration'])

# Display the dataset
creature_activities.head()

In [None]:
# Calculate the total duration of activities for each creature.
total_activity_duration_creature = creature_activities.groupby('creature_name')['duration'].sum().reset_index()
total_activity_duration_creature.columns = ['creature_name', 'total_duration']
total_activity_duration_creature

In [None]:
# Determine the creature with the longest single activity duration.
# idxmax() returns an index label, so look the row up with .loc rather than the positional .iloc.
creature_longest_activity = creature_activities.loc[creature_activities['duration'].idxmax()]
creature_longest_activity
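
Note that `idxmax()` returns only the first row that reaches the maximum, so a tie for the longest activity would go unnoticed. The sketch below shows one way to list every tied row. The `toy` DataFrame is hypothetical data invented for this illustration, not the generated dataset.

```python
import pandas as pd

# Hypothetical toy data (not the notebook's generated dataset), built so that
# two creatures share the maximum duration.
toy = pd.DataFrame({
    'creature_name': ['Frodo', 'Gandalf', 'Hermione'],
    'activity': ['Flying', 'Spell Casting', 'Potion Making'],
    'duration': [90, 120, 120],
})

# idxmax() gives the label of the first maximum only.
first_longest = toy.loc[toy['duration'].idxmax()]

# A boolean mask keeps every row whose duration equals the maximum.
all_longest = toy[toy['duration'] == toy['duration'].max()]
```

Here `first_longest` holds a single row, while `all_longest` keeps both tied rows.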

In [None]:
# Identify the most common activity performed by each creature.
# mode() returns its values sorted, so on a tie .iloc[0] keeps the alphabetically first activity.
most_common_activity_creature = creature_activities.groupby('creature_name')['activity'].agg(lambda x: x.mode().iloc[0]).reset_index()
most_common_activity_creature
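
When a creature has two or more equally frequent activities, taking the first element of `mode()` reports only one of them. A minimal sketch of reporting all tied activities follows, assuming a comma-separated string is an acceptable output format. The `toy` DataFrame and the `all_modes` name are hypothetical and used only for this illustration.

```python
import pandas as pd

# Hypothetical toy data: Frodo has a two-way tie, Gimli has a clear winner.
toy = pd.DataFrame({
    'creature_name': ['Frodo', 'Frodo', 'Gimli', 'Gimli', 'Gimli'],
    'activity': ['Flying', 'Spell Casting', 'Flying', 'Flying', 'Herb Gathering'],
})

# Join every modal activity into one string per creature instead of
# keeping only the first one.
all_modes = toy.groupby('creature_name')['activity'].agg(
    lambda x: ', '.join(x.mode())
)
```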

In [None]:
# Calculate the average activity duration for each type of activity across all creatures
avg_activity_duration = creature_activities.groupby('activity')['duration'].mean().reset_index()
avg_activity_duration

In [None]:
# Find the top 3 creatures with the highest average activity duration
avg_activity_duration_creature = creature_activities.groupby('creature_name')['duration'].mean().reset_index()
avg_activity_duration_creature.columns = ['creature_name', 'avg_duration']
avg_activity_duration_creature.nlargest(3, 'avg_duration')
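
The per-creature total, average, and longest duration can also be computed in a single pass with pandas named aggregation, which avoids renaming columns afterwards. This is an alternative sketch, not the approach used in the cells above. The `toy` DataFrame and the output column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data with easy-to-check numbers.
toy = pd.DataFrame({
    'creature_name': ['Frodo', 'Frodo', 'Gimli', 'Gimli', 'Harry'],
    'duration': [10, 30, 50, 70, 40],
})

# Named aggregation: each keyword becomes an output column.
summary = toy.groupby('creature_name')['duration'].agg(
    total_duration='sum',
    avg_duration='mean',
    longest_activity='max',
).reset_index()

# Top 3 creatures by average duration, highest first.
top_3 = summary.nlargest(3, 'avg_duration')
```

With only three creatures in the toy data, `top_3` simply returns all of them, ordered by `avg_duration`.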