Q13 - Geo Pandas Adventure

Question: Welcome to Geo Pandas Adventure!
You are given a dataset of magical locations and the whimsical creatures that inhabit them.
Each location has different environmental properties and magical attributes.
Your task is to analyze this geographical data to answer the following questions:

- Count the total number of unique creatures in each location.
- Calculate the average magical energy level for each location.
- Identify the location with the highest diversity of creature types.
- Determine the most common magical attribute in each location.
- Find the top 3 locations with the highest average environmental scores.

Datasets:

magical_locations: Contains columns (location_id, location_name, creatures, magical_attributes, environmental_score), where creatures is a JSON-encoded string holding a list of dictionaries with keys (creature_id, creature_name, creature_type, magical_energy), and magical_attributes is a list of attribute names.
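
Because the generator below serializes `creatures` with `json.dumps`, each cell has to be decoded before it can be analyzed. A minimal sketch of that step on a hand-written sample row (the values here are illustrative assumptions, not taken from the generated data):

```python
import json
import pandas as pd

# Hypothetical sample row mirroring the schema described above.
sample = pd.DataFrame({
    'location_id': [1],
    'location_name': ['Mystic Mountain'],
    'creatures': [json.dumps([
        {'creature_id': 17, 'creature_name': 'Draco', 'creature_type': 'Dragon', 'magical_energy': 320},
        {'creature_id': 42, 'creature_name': 'Ariel', 'creature_type': 'Mermaid', 'magical_energy': 150},
    ])],
    'magical_attributes': [['Glowing Stones', 'Mystic Waters']],
    'environmental_score': [88],
})

# Decode the JSON string into a list of dictionaries, then into a small table.
decoded = json.loads(sample.loc[0, 'creatures'])
creatures_df = pd.DataFrame(decoded)
print(creatures_df)
```

Every per-location calculation in the cells that follow starts from this same decode step.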

In [None]:
import pandas as pd
import numpy as np
import json

# Seed for reproducibility
np.random.seed(909)

# Generate synthetic data
location_ids = np.arange(1, 6)
location_names = ['Mystic Mountain', 'Enchanted Forest', 'Faerie Glen', 'Dragon’s Lair', 'Unicorn Meadow']
creature_types = ['Dragon', 'Unicorn', 'Phoenix', 'Griffin', 'Mermaid']
creature_names = ['Draco', 'Sparkle', 'Flare', 'Griff', 'Ariel']
magical_attributes_options = ['Glowing Stones', 'Whispering Winds', 'Mystic Waters', 'Enchanted Trees', 'Floating Islands']
environmental_scores = np.random.randint(50, 101, size=len(location_ids))

data = []
for location_id, location_name in zip(location_ids, location_names):
    num_creatures = np.random.randint(5, 11)
    creatures = []
    for _ in range(num_creatures):
        creature_id = int(np.random.randint(1, 1001))
        creature_type = np.random.choice(creature_types)
        creature_name = np.random.choice(creature_names)
        magical_energy = int(np.random.randint(50, 501))
        creatures.append({
            'creature_id': creature_id,
            'creature_name': creature_name,
            'creature_type': creature_type,
            'magical_energy': magical_energy
        })
    magical_attributes = np.random.choice(magical_attributes_options, np.random.randint(1, 4), replace=False).tolist()
    data.append([location_id, location_name, json.dumps(creatures), magical_attributes, environmental_scores[location_id-1]])

# Create DataFrame
magical_locations = pd.DataFrame(data, columns=['location_id', 'location_name', 'creatures', 'magical_attributes', 'environmental_score'])

# Display the dataset
magical_locations.head()

In [None]:
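# Count the total number of unique creatures in each location.
# Assumes creature_id identifies a creature; IDs are drawn at random and can repeat,
# so count distinct IDs rather than the length of the list.
magical_locations['creature_count'] = magical_locations['creatures'].apply(lambda x: len({creature['creature_id'] for creature in json.loads(x)}))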
magical_locations[['location_name', 'creature_count']]

In [None]:
# Calculate the average magical energy level for each location.
magical_locations['avg_energy_level'] = magical_locations['creatures'].apply(lambda x: pd.DataFrame(json.loads(x))['magical_energy'].mean())
magical_locations[['location_name', 'avg_energy_level']]

In [None]:
# Identify the location with the highest diversity of creature types.
def creature_diversity(creatures_json):
    creatures_df = pd.DataFrame(json.loads(creatures_json))
    return creatures_df['creature_type'].nunique()

magical_locations['creature_diversity'] = magical_locations['creatures'].apply(creature_diversity)
# idxmax() returns the first maximum, so a tie resolves to the earliest location.
location_highest_diversity = magical_locations.loc[magical_locations['creature_diversity'].idxmax()]
location_highest_diversity[['location_name', 'creature_diversity']]
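
`idxmax()` returns only the first row holding the maximum, so any locations tied for the highest diversity are left out of the answer above. With five creature types and five to ten creatures per location, ties are plausible. A minimal sketch of a tie-aware lookup, using hypothetical diversity counts rather than the generated data:

```python
import pandas as pd

# Hypothetical diversity counts; two locations share the maximum.
diversity = pd.DataFrame({
    'location_name': ['Mystic Mountain', 'Enchanted Forest', 'Faerie Glen'],
    'creature_diversity': [4, 3, 4],
})

# Keep every row whose count equals the maximum, instead of only the first.
max_diversity = diversity['creature_diversity'].max()
tied = diversity[diversity['creature_diversity'] == max_diversity]
print(tied)
```

The same filter could be applied to `magical_locations` once the `creature_diversity` column exists.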

In [None]:
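# Determine the most common magical attribute in each location.
# Attributes are sampled without replacement, so each one appears once per location;
# mode() then returns all of them in sorted order and iloc[0] picks the alphabetically first.
def most_common_attribute(attribute_list):
    return pd.Series(attribute_list).mode().iloc[0]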

magical_locations['most_common_attribute'] = magical_locations['magical_attributes'].apply(most_common_attribute)
magical_locations[['location_name', 'most_common_attribute']]

In [None]:
# Find the top 3 locations with the highest environmental scores
# (each location has a single score, so no averaging is needed).
top_3_locations = magical_locations.nlargest(3, 'environmental_score')
top_3_locations[['location_name', 'environmental_score']]
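
The per-location answers above each decode the JSON column separately. One possible alternative is to flatten the data once into a table with one row per creature and then aggregate with `groupby`. The sketch below uses a small hypothetical sample with the same schema, and it assumes every location has at least one creature (an empty list would explode to a missing value):

```python
import json
import pandas as pd

# Hypothetical two-location sample with the same schema as magical_locations.
sample = pd.DataFrame({
    'location_name': ['Mystic Mountain', 'Faerie Glen'],
    'creatures': [
        json.dumps([
            {'creature_id': 1, 'creature_name': 'Draco', 'creature_type': 'Dragon', 'magical_energy': 100},
            {'creature_id': 2, 'creature_name': 'Flare', 'creature_type': 'Phoenix', 'magical_energy': 300},
        ]),
        json.dumps([
            {'creature_id': 3, 'creature_name': 'Sparkle', 'creature_type': 'Unicorn', 'magical_energy': 250},
        ]),
    ],
})

# One row per creature: decode the JSON, explode the lists, expand the dictionaries.
long_df = sample.assign(creatures=sample['creatures'].apply(json.loads))
long_df = long_df.explode('creatures', ignore_index=True)
creature_cols = pd.json_normalize(long_df['creatures'].tolist())
long_df = pd.concat([long_df[['location_name']], creature_cols], axis=1)

# Aggregate the three creature-based answers in a single pass.
summary = long_df.groupby('location_name').agg(
    creature_count=('creature_id', 'nunique'),
    avg_energy_level=('magical_energy', 'mean'),
    creature_diversity=('creature_type', 'nunique'),
)
print(summary)
```

This trades a slightly longer setup for a single table that supports further grouping, filtering, or plotting without decoding the JSON again.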