Q15 - High Dimensional Hide and Seek

Question: Welcome to High Dimensional Hide and Seek!
You are given a dataset of magical artifacts and their multidimensional properties.
Each artifact has several attributes across different dimensions.
Your task is to analyze this high-dimensional data and complete the following tasks:

- Count the total number of artifacts in each category.
- Calculate the average value of each attribute for each category.
- Identify the category with the highest average "magic_intensity" attribute.
- Determine the artifact with the highest combined value across all attributes.
- Find the top 3 categories with the highest variance in the "enchantment" attribute.

Datasets:

magical_artifacts: Contains columns (artifact_id, category, attributes), where attributes is a JSON-encoded string holding a dictionary with keys (magic_intensity, power_level, durability, enchantment, rarity). Each value is an integer from 1 to 100.
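
As a minimal sketch of what one cell of the attributes column looks like and how it can be decoded, the snippet below parses a single hand-written JSON string. The values are made up for illustration and are not taken from the generated dataset.

```python
import json

# Hypothetical example of one value stored in the "attributes" column
raw_attributes = (
    '{"magic_intensity": 42, "power_level": 17, "durability": 88, '
    '"enchantment": 5, "rarity": 63}'
)

# json.loads turns the string back into a regular Python dict
parsed = json.loads(raw_attributes)

# Once decoded, the attributes can be summed or inspected like any dict
combined_value = sum(parsed.values())
print(parsed["magic_intensity"], combined_value)
```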

In [None]:
import pandas as pd
import numpy as np
import json

# Seed for reproducibility
np.random.seed(1111)

# Generate synthetic data
artifact_ids = np.arange(1, 21)
categories = ['Wands', 'Potions', 'Amulets', 'Scrolls', 'Rings']
attributes_keys = ['magic_intensity', 'power_level', 'durability', 'enchantment', 'rarity']

data = []
for artifact_id in artifact_ids:
    category = np.random.choice(categories)
    attributes = {
        'magic_intensity': np.random.randint(1, 101),
        'power_level': np.random.randint(1, 101),
        'durability': np.random.randint(1, 101),
        'enchantment': np.random.randint(1, 101),
        'rarity': np.random.randint(1, 101)
    }
    data.append([artifact_id, category, json.dumps(attributes)])

# Create DataFrame
magical_artifacts = pd.DataFrame(data, columns=['artifact_id', 'category', 'attributes'])

# Display the dataset
magical_artifacts.head()

In [None]:
# Count the total number of artifacts in each category.
artifact_count_per_category = (
    magical_artifacts['category']
    .value_counts()
    .rename_axis('category')
    .reset_index(name='artifact_count')
)
artifact_count_per_category

In [None]:
# Extract attributes as a DataFrame and merge as columns
attributes_df = magical_artifacts['attributes'].apply(lambda x: pd.Series(json.loads(x)))
magical_artifacts = pd.concat([magical_artifacts.drop(columns=['attributes']), attributes_df], axis=1)
magical_artifacts
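
The row-wise apply with pd.Series works, but it builds one Series per row, which can get slow on larger tables. A possible alternative, sketched here on a hypothetical two-row stand-in for magical_artifacts, is to decode every string first and hand the resulting list of dicts to pd.json_normalize.

```python
import json
import pandas as pd

# Hypothetical two-row stand-in for magical_artifacts (values are made up)
sample = pd.DataFrame({
    "artifact_id": [1, 2],
    "category": ["Wands", "Rings"],
    "attributes": [
        json.dumps({"magic_intensity": 10, "power_level": 20}),
        json.dumps({"magic_intensity": 30, "power_level": 40}),
    ],
})

# Decode all JSON strings, then build the attribute columns in one call
parsed = pd.json_normalize(sample["attributes"].map(json.loads).tolist())
parsed.index = sample.index  # keep row alignment with the original frame

expanded = pd.concat([sample.drop(columns=["attributes"]), parsed], axis=1)
print(expanded)
```

Either approach leaves the frame without the attributes column, so the extraction cell should only be run once per freshly generated dataset.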

In [None]:
# Calculate the average value of each attribute for each category
avg_attributes_category = magical_artifacts.groupby(['category'])[attributes_keys].mean().reset_index()
avg_attributes_category
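
The per-category count and the per-category mean can also be computed in a single grouped pass using pandas named aggregation. The sketch below uses a small hand-made frame (hypothetical values, not the generated dataset) and a single attribute to keep it short.

```python
import pandas as pd

# Hypothetical toy data with one attribute column
toy = pd.DataFrame({
    "artifact_id": [1, 2, 3],
    "category": ["Wands", "Wands", "Rings"],
    "magic_intensity": [10, 30, 50],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = toy.groupby("category").agg(
    artifact_count=("artifact_id", "count"),
    avg_magic_intensity=("magic_intensity", "mean"),
)
print(summary)
```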

In [None]:
# Identify the category with the highest average "magic intensity" attribute
avg_magic_intensity = magical_artifacts.groupby(['category'])['magic_intensity'].mean().reset_index()
avg_magic_intensity.loc[avg_magic_intensity['magic_intensity'].idxmax()]
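
Note that idxmax returns only the first label when several categories share the maximum. If ties matter, one option is to compare against the maximum directly, as in this sketch with made-up averages.

```python
import pandas as pd

# Hypothetical per-category averages with a deliberate tie
avg = pd.Series(
    {"Amulets": 61.5, "Rings": 72.0, "Wands": 72.0},
    name="magic_intensity",
)

# idxmax reports only the first category that reaches the maximum
first_only = avg.idxmax()

# A boolean mask keeps every category tied for the maximum
all_top = avg[avg == avg.max()].index.tolist()
print(first_only, all_top)
```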

In [None]:
# Determine the artifact with the highest combined value of all attributes.
magical_artifacts['total_attribute_value'] = magical_artifacts[attributes_keys].sum(axis=1)
artifact_highest_total_value = magical_artifacts.loc[magical_artifacts['total_attribute_value'].idxmax()]
artifact_highest_total_value[['artifact_id', 'category', 'total_attribute_value']]

In [None]:
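# Find the top 3 categories with the highest variance in the "enchantment" attribute.
# var() computes the sample variance (ddof=1), so a category with a single artifact yields NaN.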
variance_enchantment_category = magical_artifacts.groupby(['category'])['enchantment'].var().reset_index()
top3_categories = variance_enchantment_category.nlargest(3, 'enchantment')
top3_categories
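
With only 20 artifacts spread over 5 categories, a category may end up with a single artifact, and the default sample variance is undefined in that case. The sketch below, on hypothetical toy values, contrasts the default ddof=1 with the population variance (ddof=0) and drops undefined results before ranking.

```python
import pandas as pd

# Hypothetical toy data: "Scrolls" has only one artifact
toy = pd.DataFrame({
    "category": ["Wands", "Wands", "Rings", "Rings", "Scrolls"],
    "enchantment": [10, 30, 50, 54, 70],
})

grouped = toy.groupby("category")["enchantment"]

sample_var = grouped.var()            # ddof=1: NaN for single-artifact groups
population_var = grouped.var(ddof=0)  # ddof=0: 0.0 for single-artifact groups

# Drop undefined variances explicitly before taking the top 3
ranked = sample_var.dropna().nlargest(3)
print(ranked)
```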