Q12 - Async Adventures in Data

Question: Welcome to Async Adventures in Data!
You are given a dataset of magical quests undertaken by various whimsical characters.
Each character can perform multiple quests, and each quest has different stages.
Your task is to perform asynchronous data manipulation to answer the following questions:

- Count the total number of quests each character has completed.
- Calculate the average number of stages per quest for each character.
- Identify the character with the highest average quest duration.
- Determine the most common quest type for each character.
- Find the character with the most diverse quest portfolio (i.e., highest number of unique quest types).

Dataset:

magical_quests: Contains columns (character_id, character_name, quests), where quests is a JSON-encoded string holding a list of dictionaries with keys (quest_id, quest_type, stages, duration).
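
Because `quests` holds one JSON string per row, it can help to unpack it into a long table with one row per quest. The following is a minimal sketch that assumes the column layout described above. The two sample rows and the names `sample` and `quests_long` are made up for illustration. It uses `Series.explode` and `pd.json_normalize`.

```python
import json

import pandas as pd

# Hypothetical two-row sample with the same layout as magical_quests
sample = pd.DataFrame({
    'character_id': [1, 2],
    'character_name': ['Frodo', 'Gandalf'],
    'quests': [
        json.dumps([{'quest_id': 1, 'quest_type': 'Treasure Hunt', 'stages': 2, 'duration': 10},
                    {'quest_id': 2, 'quest_type': 'Rescue Mission', 'stages': 4, 'duration': 30}]),
        json.dumps([{'quest_id': 3, 'quest_type': 'Spell Casting', 'stages': 5, 'duration': 50}]),
    ],
})

# Decode each JSON string into a list of dicts, then give every quest its own row
long_rows = sample.assign(quests=sample['quests'].map(json.loads)).explode('quests', ignore_index=True)

# Expand the per-quest dicts into columns and attach them to the character columns
quest_cols = pd.json_normalize(long_rows['quests'].tolist())
quests_long = pd.concat([long_rows.drop(columns='quests'), quest_cols], axis=1)
print(quests_long)
```

With this layout, each per-character question becomes a `groupby('character_name')` over ordinary columns. This sketch assumes every character has at least one quest, which the generator below guarantees. An empty list would explode to a missing value.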

In [None]:
import pandas as pd
import numpy as np
import json

# Seed for reproducibility
np.random.seed(808)

# Generate synthetic data
character_ids = np.arange(1, 11)
character_names = ['Frodo', 'Gandalf', 'Hermione', 'Legolas', 'Bilbo', 'Galadriel', 'Dumbledore', 'Gimli', 'Harry', 'Aragorn']
quest_types = ['Treasure Hunt', 'Dragon Slaying', 'Potion Making', 'Spell Casting', 'Rescue Mission']
stage_options = np.arange(1, 6)
duration_options = np.arange(1, 101)

data = []
for character_id, character_name in zip(character_ids, character_names):
    num_quests = np.random.randint(5, 11)
    quests = []
    for _ in range(num_quests):
        quest_id = int(np.random.randint(1, 1001))
        quest_type = np.random.choice(quest_types)
        stages = int(np.random.choice(stage_options))
        duration = int(np.random.choice(duration_options))
        quests.append({
            'quest_id': quest_id,
            'quest_type': quest_type,
            'stages': stages,
            'duration': duration
        })
    # Store each character's quest list as a JSON string in a single column
    data.append([character_id, character_name, json.dumps(quests)])

# Create DataFrame
magical_quests = pd.DataFrame(data, columns=['character_id', 'character_name', 'quests'])

# Display the dataset
magical_quests.head(10)

In [None]:
# Count the total number of quests each character has completed
# (the data has no completion flag, so every listed quest counts)
magical_quests['total_quests'] = magical_quests['quests'].apply(lambda x: len(json.loads(x)))
magical_quests[['character_name', 'total_quests']]
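
Each cell in this notebook calls `json.loads` again on the same strings. The following sketch decodes the column once and reuses the parsed lists. The column name `quests_parsed` is a hypothetical addition, not part of the original dataset, and the sample rows are made up. It relies on `Series.str.len()`, which pandas documents as working on list elements as well as strings.

```python
import json

import pandas as pd

# Hypothetical sample in the same shape as magical_quests
df = pd.DataFrame({
    'character_name': ['Frodo', 'Gandalf'],
    'quests': [
        json.dumps([{'quest_type': 'Treasure Hunt', 'stages': 2, 'duration': 10},
                    {'quest_type': 'Rescue Mission', 'stages': 4, 'duration': 30}]),
        json.dumps([{'quest_type': 'Spell Casting', 'stages': 5, 'duration': 50}]),
    ],
})

# Decode each JSON string once; later steps can read this column directly
df['quests_parsed'] = df['quests'].map(json.loads)

# Series.str.len() returns the length of each list element
df['total_quests'] = df['quests_parsed'].str.len()
print(df[['character_name', 'total_quests']])
```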

In [None]:
# Calculate the average number of stages per quest for each character
magical_quests['avg_stages'] = magical_quests['quests'].apply(lambda x: pd.DataFrame(json.loads(x))['stages'].mean())
magical_quests[['character_name', 'avg_stages']]
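
Building a full DataFrame for every row just to take one mean is relatively heavy. The following is a lighter sketch that uses the standard library's `statistics.fmean`. The helper `field_mean` and the sample strings are hypothetical. The same helper covers both the stages and the duration averages.

```python
import json
from statistics import fmean

import pandas as pd

def field_mean(quests_json, key):
    """Average one numeric field across a character's quests (hypothetical helper)."""
    return fmean(quest[key] for quest in json.loads(quests_json))

# Hypothetical JSON strings, one per character
quests = pd.Series([
    json.dumps([{'stages': 2, 'duration': 10}, {'stages': 4, 'duration': 30}]),
    json.dumps([{'stages': 5, 'duration': 50}]),
])

# Series.apply forwards the extra keyword argument to field_mean
avg_stages = quests.apply(field_mean, key='stages')      # (2 + 4) / 2 and 5 / 1
avg_duration = quests.apply(field_mean, key='duration')  # (10 + 30) / 2 and 50 / 1
print(avg_stages.tolist(), avg_duration.tolist())
```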

In [None]:
# Identify the character with the highest average quest duration
magical_quests['avg_duration'] = magical_quests['quests'].apply(lambda x: pd.DataFrame(json.loads(x))['duration'].mean())
character_highest_avg_duration = magical_quests.loc[magical_quests['avg_duration'].idxmax()]
character_highest_avg_duration[['character_name', 'avg_duration']]
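
`idxmax` returns only the first label that holds the maximum, so a tie for the highest average would hide the other leaders. The following sketch uses made-up averages to contrast `idxmax` with a boolean mask that keeps every tied row.

```python
import pandas as pd

# Hypothetical averages where two characters tie for the top spot
scores = pd.DataFrame({
    'character_name': ['Frodo', 'Gandalf', 'Hermione'],
    'avg_duration': [42.5, 61.0, 61.0],
})

# idxmax reports only the first row holding the maximum
first_only = scores.loc[scores['avg_duration'].idxmax(), 'character_name']

# A boolean mask keeps every row that reaches the maximum
is_top = scores['avg_duration'] == scores['avg_duration'].max()
all_leaders = scores.loc[is_top, 'character_name'].tolist()
print(first_only, all_leaders)
```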

In [None]:
# Determine the most common quest type for each character
def most_common_quest_type(quests_json):
    quests_df = pd.DataFrame(json.loads(quests_json))
    return quests_df['quest_type'].mode().iloc[0]

magical_quests['most_common_quest_type'] = magical_quests['quests'].apply(most_common_quest_type)
magical_quests[['character_name', 'most_common_quest_type']]
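
`Series.mode()` returns every tied value in sorted order, so `.iloc[0]` resolves a tie alphabetically. If ties should instead go to the type that appeared first, `collections.Counter.most_common` behaves that way. The following sketch uses a made-up list of quest types for one character to contrast the two.

```python
from collections import Counter

import pandas as pd

# Hypothetical quest types for one character: two-way tie between the first two types
types = ['Spell Casting', 'Dragon Slaying', 'Spell Casting', 'Dragon Slaying', 'Treasure Hunt']

# Series.mode() returns all tied values, sorted, so iloc[0] would pick 'Dragon Slaying'
by_mode = pd.Series(types).mode().tolist()

# Counter.most_common orders equal counts by first appearance
by_counter = Counter(types).most_common(1)[0][0]
print(by_mode, by_counter)
```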

In [None]:
# Find the character with the most diverse quest portfolio (i.e., highest number of unique quest types)
def unique_quest_types(quests_json):
    quests_df = pd.DataFrame(json.loads(quests_json))
    return quests_df['quest_type'].nunique()

magical_quests['num_unique_quest_types'] = magical_quests['quests'].apply(unique_quest_types)
# With only five quest types, ties are possible; idxmax returns the first character at the maximum
character_diverse_quests = magical_quests.loc[magical_quests['num_unique_quest_types'].idxmax()]
character_diverse_quests[['character_name', 'num_unique_quest_types']]
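
The task statement asks for asynchronous data manipulation, while the cells above all run synchronously. The following is one possible pattern, offered as a sketch. It assumes Python 3.9+ for `asyncio.to_thread` and hands each character's parsing to a worker thread, then collects the results with `asyncio.gather`. The helpers `summarize` and `summarize_all` and the sample rows are hypothetical. JSON parsing at this size is CPU-bound and small, so threads are unlikely to speed it up. The value here is the structure, which would pay off if each character's quests came from a slow source such as a network call.

```python
import asyncio
import json

import pandas as pd

def summarize(name, quests_json):
    """Compute per-character metrics from one JSON string (hypothetical helper)."""
    quests = json.loads(quests_json)
    return {
        'character_name': name,
        'total_quests': len(quests),
        'avg_stages': sum(q['stages'] for q in quests) / len(quests),
        'avg_duration': sum(q['duration'] for q in quests) / len(quests),
        'unique_quest_types': len({q['quest_type'] for q in quests}),
    }

async def summarize_all(df):
    # One worker-thread task per character; gather returns results in input order
    tasks = [asyncio.to_thread(summarize, name, quests_json)
             for name, quests_json in zip(df['character_name'], df['quests'])]
    return pd.DataFrame(await asyncio.gather(*tasks))

# Hypothetical sample in the same shape as magical_quests
sample = pd.DataFrame({
    'character_name': ['Frodo', 'Gandalf'],
    'quests': [
        json.dumps([{'quest_type': 'Treasure Hunt', 'stages': 2, 'duration': 10},
                    {'quest_type': 'Rescue Mission', 'stages': 4, 'duration': 30}]),
        json.dumps([{'quest_type': 'Spell Casting', 'stages': 5, 'duration': 50}]),
    ],
})

summary = asyncio.run(summarize_all(sample))
most_diverse = summary.loc[summary['unique_quest_types'].idxmax(), 'character_name']
print(summary)
print(most_diverse)
```

Inside a Jupyter cell, an event loop is already running, and `asyncio.run` raises a `RuntimeError` there. In a notebook, use `summary = await summarize_all(magical_quests)` instead.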