Q11 - Nested Data Labyrinth

Question: Welcome to the Nested Data Labyrinth!
You are given a dataset of enchanted forests and the magical creatures that inhabit them.
Each forest contains multiple creatures, and each creature has various attributes.
Your task is to navigate through this nested data to answer the following questions:

- Count the total number of creatures in each forest.
- Calculate the average power level of creatures in each forest.
- Identify the forest with the highest average age of creatures.
- Determine the most common creature type in each forest.
- Find the top 2 creatures with the highest power levels in each forest.

Datasets:

enchanted_forests: Contains columns (forest_id, forest_name, creatures), where creatures is a JSON-encoded string holding a list of dictionaries with keys (creature_id, creature_name, creature_type, age, power_level). Decode it with json.loads before use.

In [None]:
import pandas as pd
import numpy as np
import json

# Seed for reproducibility
np.random.seed(707)

# Generate synthetic data
forest_ids = np.arange(1, 6)
forest_names = ['Mystic Grove', 'Enchanted Woods', 'Faerie Forest', 'Dragons Den', 'Unicorn Utopia']
creature_types = ['Dragon', 'Unicorn', 'Phoenix', 'Griffin', 'Mermaid']
creature_names = ['Draco', 'Sparkle', 'Flare', 'Griff', 'Ariel']

data = []
for forest_id, forest_name in zip(forest_ids, forest_names):
    num_creatures = np.random.randint(5, 11)
    creatures = []
    for _ in range(num_creatures):
        # IDs are drawn independently, so duplicates are possible
        creature_id = np.random.randint(1, 1001)
        creature_type = np.random.choice(creature_types)
        creature_name = np.random.choice(creature_names)
        age = np.random.randint(1, 101)
        power_level = np.random.randint(50, 501)
        creatures.append({
            'creature_id': creature_id,
            'creature_name': creature_name,
            'creature_type': creature_type,
            'age': age,
            'power_level': power_level
        })
    data.append([forest_id, forest_name, json.dumps(creatures)])

# Create DataFrame
enchanted_forests = pd.DataFrame(data, columns=['forest_id', 'forest_name', 'creatures'])

# Display the dataset
enchanted_forests.head()

In [None]:
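# Helper functions (json is already imported in the first cell)
def extract_creatures(row):
    """Decode one forest row's JSON-encoded 'creatures' string into a DataFrame."""
    return pd.DataFrame(json.loads(row['creatures']))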
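The cells below decode the JSON string again for every question. One alternative is to flatten the nested column once into a long table with one row per creature. The following is a minimal sketch on a hypothetical two-forest sample; `sample`, `flatten_creatures`, and `flat` are illustrative names, not part of the original notebook.

```python
import json
import pandas as pd

# Hypothetical sample shaped like enchanted_forests (creatures is a JSON string)
sample = pd.DataFrame({
    'forest_id': [1, 2],
    'forest_name': ['Mystic Grove', 'Enchanted Woods'],
    'creatures': [
        json.dumps([
            {'creature_id': 1, 'creature_name': 'Draco', 'creature_type': 'Dragon',
             'age': 30, 'power_level': 400},
            {'creature_id': 2, 'creature_name': 'Flare', 'creature_type': 'Phoenix',
             'age': 10, 'power_level': 200},
        ]),
        json.dumps([
            {'creature_id': 3, 'creature_name': 'Ariel', 'creature_type': 'Mermaid',
             'age': 50, 'power_level': 100},
        ]),
    ],
})

def flatten_creatures(forests):
    """Return one row per creature, tagged with the forest it belongs to."""
    frames = []
    for _, row in forests.iterrows():
        creatures_df = pd.DataFrame(json.loads(row['creatures']))
        # Put the forest name in the first column so rows stay identifiable
        creatures_df.insert(0, 'forest_name', row['forest_name'])
        frames.append(creatures_df)
    return pd.concat(frames, ignore_index=True)

flat = flatten_creatures(sample)
print(flat)
```

With the data in this shape, each question becomes an ordinary groupby on forest_name.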

In [None]:
# Count the total number of creatures in each forest.
enchanted_forests['num_creatures'] = enchanted_forests['creatures'].apply(lambda x: len(json.loads(x)))
enchanted_forests[['forest_name', 'num_creatures']]

In [None]:
# Calculate the average power level of creatures in each forest
enchanted_forests['avg_power_level'] = enchanted_forests['creatures'].apply(lambda x: pd.DataFrame(json.loads(x))['power_level'].mean())
enchanted_forests[['forest_name', 'avg_power_level']]

In [None]:
# Identify the forest with the highest average age of creatures
enchanted_forests['avg_creature_age'] = enchanted_forests['creatures'].apply(lambda x: pd.DataFrame(json.loads(x))['age'].mean())
highest_age_forest = enchanted_forests.loc[enchanted_forests['avg_creature_age'].idxmax()]
highest_age_forest[['forest_name', 'avg_creature_age']]
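The count, average power level, and average age can also be computed in one pass if the creatures are already in a flat, one-row-per-creature table. This sketch assumes such a table exists; `flat`, `summary`, and `oldest_forest` are hypothetical names, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical flattened table: one row per creature
flat = pd.DataFrame({
    'forest_name': ['Mystic Grove', 'Mystic Grove', 'Enchanted Woods'],
    'age': [30, 10, 50],
    'power_level': [400, 200, 100],
})

# Named aggregation: each output column is (source column, aggregation)
summary = flat.groupby('forest_name').agg(
    num_creatures=('power_level', 'count'),
    avg_power_level=('power_level', 'mean'),
    avg_creature_age=('age', 'mean'),
)

# The index is forest_name, so idxmax returns the forest's name directly
oldest_forest = summary['avg_creature_age'].idxmax()
print(summary)
print(oldest_forest)
```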

In [None]:
# Determine the most common creature type in each forest
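def most_common_creature_type(creatures_json):
    """Return the most frequent creature type in one forest.

    mode() returns tied values in sorted order, so ties resolve to the
    alphabetically first type.
    """
    creatures_df = pd.DataFrame(json.loads(creatures_json))
    return creatures_df['creature_type'].mode().iloc[0]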

enchanted_forests['most_common_creature_type'] = enchanted_forests['creatures'].apply(most_common_creature_type)
enchanted_forests[['forest_name', 'most_common_creature_type']]
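With only 5 to 10 creatures per forest, ties for the most common type are likely, and taking the first element of mode() silently keeps only one of them. The sketch below, on a hypothetical list of types, shows how to see every tied type instead; `types`, `first_only`, and `all_top` are illustrative names.

```python
import pandas as pd

# Hypothetical creature types for one forest, with a two-way tie
types = pd.Series(['Dragon', 'Unicorn', 'Dragon', 'Unicorn', 'Phoenix'])

# mode() returns all tied values, sorted
tied_modes = types.mode().tolist()

# Taking only the first element drops the other tied type
first_only = types.mode().iloc[0]

# Equivalent check via value_counts: keep every type that hits the max count
counts = types.value_counts()
all_top = sorted(counts[counts == counts.max()].index)

print(tied_modes, first_only, all_top)
```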

In [None]:
# Find the top 2 creatures with the highest power levels in each forest
def top_2_creatures(creatures_json):
    """Return the two rows with the highest power_level (ties keep the first occurrence)."""
    creatures_df = pd.DataFrame(json.loads(creatures_json))
    return creatures_df.nlargest(2, 'power_level')[['creature_name', 'power_level']]

# Each cell of top_2_creatures holds a small DataFrame, so print them one forest at a time
enchanted_forests['top_2_creatures'] = enchanted_forests['creatures'].apply(top_2_creatures)
for _, row in enchanted_forests.iterrows():
    print(f"Forest: {row['forest_name']}")
    print(row['top_2_creatures'])
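Storing a DataFrame inside each cell works for printing but is awkward to filter or export. If the creatures are available as a flat table, the top 2 per forest can be returned as a single ordinary DataFrame. This is a sketch on hypothetical data; `flat` and `top2` are illustrative names.

```python
import pandas as pd

# Hypothetical flattened table: one row per creature
flat = pd.DataFrame({
    'forest_name': ['Mystic Grove', 'Mystic Grove', 'Mystic Grove',
                    'Enchanted Woods', 'Enchanted Woods'],
    'creature_name': ['Draco', 'Flare', 'Griff', 'Ariel', 'Sparkle'],
    'power_level': [400, 200, 450, 100, 150],
})

# Sort by power within each forest, then keep the first two rows per group
top2 = (
    flat.sort_values(['forest_name', 'power_level'], ascending=[True, False])
        .groupby('forest_name')
        .head(2)
        .reset_index(drop=True)
)
print(top2)
```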