
# Structuring Knowledge, Automating Analysis
## Dictionaries and Functions in Digital Humanities (DH)

Welcome to our exploration of how we organize complex information and automate repetitive tasks in digital humanities (DH) scholarship. Today we'll learn about two powerful Python tools that transform how we work with cultural data: **dictionaries** for organizing rich, structured information, and **functions** for automating our analytical processes.

In DH, we constantly work with *complex cultural objects* that have multiple attributes: a manuscript has an author, date, location, language, and genre; a historical figure has a name, birth year, occupation, and social networks; an artwork has a creator, medium, dimensions, and cultural context. **Dictionaries** help us keep all this interconnected information organized and accessible.

But our work often involves *repetitive analytical tasks*: calculating word frequencies across hundreds of texts, standardizing inconsistent historical data, or generating reports across multiple collections. **Functions** allow us to write analytical procedures once and apply them systematically across our materials.

Today's title reflects a fundamental shift in DH: from "structuring knowledge" (organizing what we know about *cultural objects* via **dictionaries**) to "automating analysis" (*systematically applying* our scholarly methods at scale, via **functions**).
  
### Understanding Data Complexity: From Simple to Structured

Let's connect today's concepts to what we've already learned:

- **Variables** = **Individual facts**: A single piece of information (a title, a date, a name)
- **Lists** = **Collections of *similar* items**: Multiple related pieces of the same type (all the book titles in a library, all the dates in a timeline)
- **Dictionaries** = *Complex cultural objects*: Multiple different types of information about a single item (all the metadata about one book, all the details about one historical event)
- **Functions** = *Analytical procedures*: Reusable methods for processing cultural data systematically

Think of today's lesson as learning to create detailed library catalog entries for *complex cultural materials*, then developing *systematic methods for analyzing* those collections at scale.

## Part 1: What Are Dictionaries?

A dictionary is like a detailed catalog card for a book, or a record in a database. While a **list** stores a sequence of items in order (usually items of the same kind), a **dictionary** stores *several different pieces of information* about a single subject, each under its own label. It's the difference between a simple bibliography (a **list** of titles only) and a comprehensive catalog entry (a **dictionary** of titles, authors, dates, publishers, subjects, etc.).

In DH, **dictionaries** let us represent the rich, multifaceted nature of *cultural objects and historical subjects*.

In [None]:
# A simple list - just titles
book_titles = ["Beloved", "Invisible Man", "Their Eyes Were Watching God"]

# A dictionary - rich information about one book
book_record = {
    "title": "Beloved",
    "author": "Toni Morrison",
    "publication_year": 1987,
    "genre": "Historical Fiction",
    "setting": "Ohio, post-Civil War",
    "awards": ["Pulitzer Prize for Fiction (1988)"],  # Morrison herself received the 1993 Nobel Prize in Literature
    "themes": ["memory", "trauma", "motherhood", "slavery"]
}

print("Simple list:", book_titles)
print("\nRich dictionary:")
for key, value in book_record.items():
    print(f"{key}: {value}")

### Try it yourself:
Create a dictionary representing a digital media object you're familiar with:

In [None]:
# Your turn: create a dictionary for a podcast, video game, social media creator, etc.
my_media_object = {
    # Add key-value pairs here
    # "title": "...",
    # "creator": "...",
    # "year": ...,
    # Add more attributes that matter for your chosen object
}

print(my_media_object)

## Part 2: Accessing and Understanding Dictionary Structure

Just as scholars need to extract specific information from catalog records, we need systematic ways to access information stored in dictionaries. Unlike lists (which have numerical positions for indexing), dictionaries use meaningful **keys** (data categories) to access **values** (the data in those categories).

In [None]:
# Historical figure from digital humanities research
historical_figure = {
    "name": "Ida B. Wells-Barnett",
    "birth_year": 1862,
    "death_year": 1931,
    "occupation": ["journalist", "activist", "researcher"],
    "known_for": "anti-lynching research and activism",
    "publications": ["Southern Horrors", "The Red Record"],
    "methodology": "data-driven journalism"
}

# Accessing specific information
print(f"Name: {historical_figure['name']}")
print(f"Birth year: {historical_figure['birth_year']}")
print(f"Known for: {historical_figure['known_for']}")

# What if we want all the keys (categories of information)?
print(f"\nAvailable information: {list(historical_figure.keys())}")

# What if we want all the values (the actual data)?
print(f"\nAll data: {list(historical_figure.values())}")

### Discussion Question:
Compare Ida B. Wells-Barnett's methodology (the `"methodology"` key in the code above) with modern digital humanities approaches. How do both use data systematically to understand social patterns?

In [None]:
# Try accessing different pieces of information from your dictionary above
# What happens if you try to access a key that doesn't exist?

## Part 2.5: Modifying and Building Dictionaries - Evolving Knowledge

Scholarly knowledge evolves as we discover new information, correct errors, or add new analytical categories. Dictionaries in Python are **mutable** (changeable, as in "mutation"), meaning we can update their values as our understanding of cultural objects deepens.

This flexibility mirrors how digital archives and scholarly databases grow and change over time.

In [None]:
# Let's build a record for a contemporary digital creator
creator_profile = {
    "name": "ContraPoints (Natalie Wynn)",
    "platform": "YouTube",
    "content_type": "video essays",
    "debut_year": 2016
}

print("Initial profile:", creator_profile)

# Adding new information as we research further
creator_profile["subscriber_count"] = "1.6M+ (as of 2025)"
creator_profile["topics"] = ["philosophy", "politics", "gender theory", "internet culture"]
creator_profile["production_style"] = ["theatrical", "academic", "satirical"]

# Updating existing information
creator_profile["platform"] = ["YouTube", "Patreon"]  # More comprehensive

# Checking our expanded record
print("\nExpanded profile:")
for category, info in creator_profile.items():
    print(f"  {category}: {info}")
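Assignment is not the only way to change a dictionary. As a minimal sketch (the `record` below is a made-up example, not part of the lesson's data), `update()` merges several key-value pairs at once, `pop()` removes a key and hands back its value, and `del` removes a key outright:

```python
# Hypothetical catalog record used only for illustration
record = {"title": "Untitled Letter", "year": "c. 1850", "notes": "water damage"}

# update() adds or overwrites several keys in one step
record.update({"year": 1851, "language": "English"})

# pop() removes a key and returns its value (here we keep the old note)
old_notes = record.pop("notes")

# del removes a key without returning anything
del record["language"]

print(record)
print("Removed note:", old_notes)
```

Using `pop()` rather than `del` is handy when the removed value still matters, for example when moving a note into a separate log of corrections.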

## Part 3: Working with Complex Cultural Collections

Substantial DH projects often involve multiple related objects--like all the works by an author, all the artifacts from a time period, or all the documents from an archive. We can organize these using **dictionaries within dictionaries**, creating rich, interconnected datasets.

In [None]:
# A small digital archive of Harlem Renaissance writers
harlem_renaissance = {
    "langston_hughes": {
        "full_name": "James Mercer Langston Hughes",
        "birth_year": 1901,
        "death_year": 1967,
        "genres": ["poetry", "novels", "short stories", "plays"],
        "famous_works": ["The Weary Blues", "Montage of a Dream Deferred"],
        "themes": ["jazz culture", "racial pride", "American dream"],
        "innovation": "jazz poetry"
    },
    "zora_neale_hurston": {
        "full_name": "Zora Neale Hurston",
        "birth_year": 1891,
        "death_year": 1960,
        "genres": ["novels", "short stories", "anthropology", "folklore"],
        "famous_works": ["Their Eyes Were Watching God", "Mules and Men"],
        "themes": ["Black womanhood", "Southern culture", "folk traditions"],
        "innovation": "anthropological fiction"
    },
    "claude_mckay": {
        "full_name": "Festus Claudius McKay",
        "birth_year": 1889,
        "death_year": 1948,
        "genres": ["poetry", "novels"],
        "famous_works": ["Harlem Shadows", "Home to Harlem"],
        "themes": ["racial protest", "exile", "urban life"],
        "innovation": "sonnet form for racial themes"
    }
}

# Analyzing our collection
print("Harlem Renaissance Digital Archive")
print("=" * 35)

for writer_key, writer_info in harlem_renaissance.items():
    name = writer_info["full_name"]
    innovation = writer_info["innovation"]
    print(f"{name}: {innovation}")
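Because each value in a nested dictionary is itself a dictionary, individual facts can be reached by chaining keys, one level at a time. The sketch below uses a small stand-in `archive` (an assumption for illustration, mirroring the shape of `harlem_renaissance`) so that it runs on its own:

```python
# Stand-in archive with the same shape as the collection above
archive = {
    "zora_neale_hurston": {
        "full_name": "Zora Neale Hurston",
        "famous_works": ["Their Eyes Were Watching God", "Mules and Men"],
    }
}

# First key selects the writer, second key selects one attribute
writer = archive["zora_neale_hurston"]
print(writer["full_name"])

# Keys and list indexes can be chained in a single expression
first_work = archive["zora_neale_hurston"]["famous_works"][0]
print(first_work)
```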

### Try It Yourself: Building a Collection

In [None]:
# Create a small collection of related cultural objects
# Examples: favorite podcasts, influential video games, important films, etc.
my_collection = {
    # "object1_key": {
    #     "title": "...",
    #     "creator": "...",
    #     "year": ...,
    #     "significance": "..."
    # },
    # Add more objects...
}

# Analyze your collection
print("My Cultural Collection:")
print("=" * 25)
# Add code to display information about your collection

## Part 4: Systematic Analysis with Loops and Dictionaries

One of the most powerful aspects of working with structured cultural data is the ability to ask *systematic questions* across entire collections. We can combine **loops** with **dictionaries** to analyze patterns, extract insights, and generate reports.

In [None]:
# Let's analyze patterns in our Harlem Renaissance collection

print("=== Birth Year Analysis ===")
birth_years = []
for writer in harlem_renaissance.values():
    birth_years.append(writer["birth_year"])
    print(f"{writer['full_name']}: born {writer['birth_year']}")

earliest = min(birth_years)
latest = max(birth_years)
print(f"\nGeneration span: {earliest} to {latest} ({latest - earliest} years)")

print("\n=== Innovation Analysis ===")
for writer_key, writer_data in harlem_renaissance.items():
    name = writer_data["full_name"]
    innovation = writer_data["innovation"]
    themes = ", ".join(writer_data["themes"][:2])  # First two themes
    print(f"{name}: {innovation}")
    print(f"  Key themes: {themes}")
    print()
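A loop can also filter a collection, keeping only the records that answer a particular question. This sketch asks which writers worked in a given genre; the `writers` data and the `poets` list are illustrative assumptions, trimmed from the collection above so the example is self-contained:

```python
# Trimmed stand-in for the collection (only the fields this example needs)
writers = {
    "langston_hughes": {"full_name": "James Mercer Langston Hughes",
                        "genres": ["poetry", "novels", "short stories", "plays"]},
    "zora_neale_hurston": {"full_name": "Zora Neale Hurston",
                           "genres": ["novels", "short stories", "anthropology", "folklore"]},
    "claude_mckay": {"full_name": "Festus Claudius McKay",
                     "genres": ["poetry", "novels"]},
}

# Collect the names of writers whose genres include "poetry"
poets = []
for writer in writers.values():
    if "poetry" in writer["genres"]:
        poets.append(writer["full_name"])

print("Poets:", poets)
```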

## Part 5: Introduction to Functions - Automating Scholarly Methods

As digital humanists, we often find ourselves *repeating the same analytical procedures* across different texts, datasets, or collections. **Functions** allow us to encapsulate our scholarly methods into reusable tools, making our work more efficient and reproducible.

Think of **functions** as automated scholarly procedures--like having a research assistant who can apply the same analytical method to any material you give them.

### From Collection Analysis to Individual Object Analysis

So far, we've worked with our **entire collection** (`harlem_renaissance`) using loops. Now we'll learn to create functions that work with **individual cultural objects** (like a single writer's dictionary).

For example, instead of analyzing all writers at once, we can create a function that analyzes just one writer, then apply that function to any writer we choose. This gives us more flexibility and reusable analytical tools.

### Basic Function Structure

In [None]:
# A very simple function to start with
def get_writer_innovation(writer_dict):
    """
    Extract a writer's key innovation from their dictionary.
    This is a simple function that takes input and returns output.
    """
    innovation = writer_dict["innovation"]
    return innovation

# Using our simple function
print("=== Simple Function Example ===")
langston_innovation = get_writer_innovation(harlem_renaissance["langston_hughes"])
print(f"Langston Hughes' innovation: {langston_innovation}")

zora_innovation = get_writer_innovation(harlem_renaissance["zora_neale_hurston"])
print(f"Zora Neale Hurston's innovation: {zora_innovation}")

# Notice:
# 1. The function takes one input (a writer dictionary)
# 2. It does something with that input (extracts the innovation)
# 3. It returns a result that we can use

### Building Complexity: From Simple Extraction to Analysis

Now let's build on this basic pattern with a more sophisticated function that performs analysis:

In [None]:
# A slightly more involved function: it calculates, prints, and returns
def analyze_writer_lifespan(writer_dict):
    """
    Calculate and describe a writer's lifespan from biographical data.
    This function takes a writer dictionary and returns analytical information.

    Parameter:
        writer_dict: A single writer's dictionary (like harlem_renaissance["langston_hughes"])
    """
    name = writer_dict["full_name"]
    birth = writer_dict["birth_year"]
    death = writer_dict["death_year"]

    lifespan = death - birth

    print(f"{name} ({birth}-{death})")
    print(f"Lived {lifespan} years")

    if lifespan < 50:
        print("Brief but impactful life")
    elif lifespan < 70:
        print("Full creative career")
    else:
        print("Long, influential life")

    return lifespan

# Using our function with individual writer dictionaries from our collection
print("=== Writer Lifespan Analysis ===")
# Note: harlem_renaissance["langston_hughes"] gives us just one writer's dictionary
langston_lifespan = analyze_writer_lifespan(harlem_renaissance["langston_hughes"])
print(f"Returned value: {langston_lifespan} years\n")

zora_lifespan = analyze_writer_lifespan(harlem_renaissance["zora_neale_hurston"])
print(f"Returned value: {zora_lifespan} years")
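`analyze_writer_lifespan` both prints and returns. One possible refinement, sketched here with a hypothetical `calculate_lifespan` helper, is to keep the calculation in a function that only returns a value, so the caller decides whether to print it, store it, or feed it into further analysis:

```python
def calculate_lifespan(writer_dict):
    """Return a writer's lifespan in years without printing anything."""
    return writer_dict["death_year"] - writer_dict["birth_year"]

# Stand-in record so the example runs on its own
hurston = {"full_name": "Zora Neale Hurston", "birth_year": 1891, "death_year": 1960}

# The returned value can be stored, compared, or printed as needed
years = calculate_lifespan(hurston)
print(f"{hurston['full_name']} lived {years} years")
```

Separating calculation from display is a design choice rather than a rule; for quick exploration, a function that prints as it goes can be perfectly adequate.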

### Functions with Multiple Parameters

In [None]:
def compare_writers(writer1_dict, writer2_dict, comparison_attribute):
    """
    Compare two writers based on a specified attribute.
    This demonstrates how functions can take multiple inputs for complex analysis.
    """
    name1 = writer1_dict["full_name"]
    name2 = writer2_dict["full_name"]

    value1 = writer1_dict[comparison_attribute]
    value2 = writer2_dict[comparison_attribute]

    print(f"Comparing {name1} and {name2}:")
    print(f"  {name1}'s {comparison_attribute}: {value1}")
    print(f"  {name2}'s {comparison_attribute}: {value2}")

    if comparison_attribute == "birth_year":
        if value1 < value2:
            print(f"  {name1} was born earlier")
        elif value1 > value2:
            print(f"  {name2} was born earlier")
        else:
            print("  They were born in the same year")

    return (value1, value2)

# Using our comparison function
print("=== Writer Comparison Analysis ===")
birth_comparison = compare_writers(
    harlem_renaissance["langston_hughes"],
    harlem_renaissance["claude_mckay"],
    "birth_year"
)
print(f"Birth years: {birth_comparison}")
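Parameters can also be given default values, and arguments can be passed by name. The function below is a hypothetical variation on the lookup pattern above, not part of the lesson's code: `attribute` defaults to `"innovation"` unless the caller names something else.

```python
def describe_writer(writer_dict, attribute="innovation"):
    """Return a one-line description of one attribute of a writer."""
    value = writer_dict[attribute]
    return f"{writer_dict['full_name']}: {attribute} = {value}"

# Stand-in record so the example runs on its own
mckay = {"full_name": "Festus Claudius McKay", "birth_year": 1889,
         "innovation": "sonnet form for racial themes"}

# Uses the default attribute
print(describe_writer(mckay))

# Overrides the default with a keyword argument
print(describe_writer(mckay, attribute="birth_year"))
```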

## Part 6: Advanced Functions for Cultural Data Analysis

Substantial DH work often requires sophisticated **functions** that can process complex cultural datasets, handle edge cases, and provide meaningful insights. Let's build functions that demonstrate professional DH analytical techniques.

In [None]:
def extract_themes_from_collection(collection_dict):
    """
    Extract and count all themes across a cultural collection.
    This function demonstrates how to analyze patterns across datasets.
    """
    all_themes = []

    # Collect all themes from all writers
    for writer_key, writer_data in collection_dict.items():
        themes = writer_data.get("themes", [])  # Safe access with default
        all_themes.extend(themes)

    # Count theme frequency
    theme_counts = {}
    for theme in all_themes:
        if theme in theme_counts:
            theme_counts[theme] += 1
        else:
            theme_counts[theme] = 1

    return theme_counts

def generate_collection_report(collection_dict, collection_name):
    """
    Generate a comprehensive report about a cultural collection.
    This demonstrates how functions can create professional outputs.
    """
    print(f"=== {collection_name} Analysis Report ===")
    print(f"Collection size: {len(collection_dict)} individuals")

    # Birth year analysis
    birth_years = [writer["birth_year"] for writer in collection_dict.values()]
    earliest_birth = min(birth_years)
    latest_birth = max(birth_years)

    print(f"Birth year range: {earliest_birth} to {latest_birth}")
    print(f"Generation span: {latest_birth - earliest_birth} years")

    # Theme analysis
    themes = extract_themes_from_collection(collection_dict)
    print(f"\nThematic diversity: {len(themes)} unique themes")

    # Most common themes
    sorted_themes = sorted(themes.items(), key=lambda x: x[1], reverse=True)
    print("Most common themes:")
    for theme, count in sorted_themes[:3]:
        print(f"  - {theme}: appears {count} time(s)")

    return {
        "size": len(collection_dict),
        "birth_range": (earliest_birth, latest_birth),
        "themes": themes
    }

# Using our advanced functions
report_data = generate_collection_report(harlem_renaissance, "Harlem Renaissance Writers")
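`extract_themes_from_collection` relies on `.get("themes", [])`, which returns a default instead of raising `KeyError` when a key is missing. The same method can shorten the counting loop, and the standard library's `collections.Counter` does the tally in one call. A small sketch with made-up records (the second deliberately has no `themes` key):

```python
from collections import Counter

# Made-up records; "b" has no "themes" key on purpose
records = {
    "a": {"themes": ["exile", "memory"]},
    "b": {},
    "c": {"themes": ["memory"]},
}

all_themes = []
for record in records.values():
    # .get() falls back to an empty list instead of raising KeyError
    all_themes.extend(record.get("themes", []))

# Manual tally using .get() with a default count of 0
manual_counts = {}
for theme in all_themes:
    manual_counts[theme] = manual_counts.get(theme, 0) + 1

# Counter produces the same tally in one step
counter_counts = Counter(all_themes)

print(manual_counts)
print(counter_counts.most_common(1))
```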

### 🎯 You Try: Building Your Own Analysis Function

In [None]:
def analyze_my_collection(collection_dict):
    """
    Create your own function to analyze your cultural collection.
    Think about what questions you want to ask of your data.
    """
    # Your code here
    # Examples of what you might analyze:
    # - Most common creators or years
    # - Average length of titles
    # - Distribution across categories
    # - Patterns in your data


# Test your function with your collection
# analyze_my_collection(my_collection)

## 🎵 Fun Interlude: The Banana Fanna Function Challenge

Before we dive into real-world applications, let's have some fun with functions! This exercise combines string manipulation, function creation, and iteration--all key skills in DH text processing.

### Aims of the Exercise

Create functions that manipulate names in specific creative ways, as illustrated in the song "[The Name Game](https://www.youtube.com/watch?v=NeF7jqf0GU4)" (1964). This might seem silly, but the underlying skills (string manipulation, pattern recognition, systematic processing) are fundamental to text analysis in DH.

### The Three Practice Challenges

1. **String Transformation / Data Normalization**: Write a function that transforms text (like normalizing historical name variants)
2. **Pattern Recognition & Generation**: Create a function that follows a specific pattern (like generating consistent citation formats)  
3. **Systematic / Iterative Processing**: Apply your function systematically across a dataset (like standardizing an entire collection)

These are exactly the kinds of operations we do in DH--just with more playful content!

In [None]:
# The dataset we'll work with
names = ["becky", "timmy", "kyle", "sam", "kendra", "marcela", "curt"]

print("Our dataset:", names)
print("Ready to transform this data systematically!")

### 🎯 Practice 1: String Transformation / Data Normalization Function

**Challenge**: Write a function that *changes* the list so that all the names are in all caps. The function cannot just print the names--it must change the original list.

*Why this matters* in DH: Normalizing data formats is crucial for analysis. Historical records often have inconsistent capitalization, spelling variants, or formatting. Learning to systematically transform data prepares you for more substantial text cleaning tasks.

In [None]:
def make_names_uppercase(name_list):
    """
    Transform all names in a list to uppercase.
    This modifies the original list (in-place transformation).
    """
    # Your code here
    # Hint: You'll need to loop through the list and modify each item
    # Remember: string.upper() makes a string uppercase


# Test your function
print("Before:", names)
make_names_uppercase(names)
print("After:", names)

### 🎯 Practice 2: Pattern Recognition & Generation Function

**Challenge**: Write a function that puts someone's name into the "banana fanna fo fanna" song.

Using the name Katie as an example, the song follows this pattern:
```
Katie, Katie, bo-batie,
Bonana-fanna fo-fatie
Fee fi mo-matie
Katie!
```

Run your function on one of the names from the names list above.

*Why this matters* in DH: Pattern recognition and generation are fundamental to text analysis. Whether you're generating bibliographic citations, creating consistent metadata formats, or normalizing historical documents, you're applying systematic patterns to transform text.
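To make the citation comparison concrete without giving away the song, here is a minimal sketch of pattern-based text generation. It uses string indexing and an f-string to turn a name into a short "Surname, F." form; the function name `short_citation_name` and the format itself are illustrative assumptions, not a standard citation style.

```python
def short_citation_name(full_name):
    """Turn 'First Last' into 'Last, F.' (a simplified, illustrative format)."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    # first[0] is the first character of the given name
    return f"{last}, {first[0]}."

print(short_citation_name("Toni Morrison"))
print(short_citation_name("Zora Neale Hurston"))
```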

In [None]:
def banana_fanna_song(name):
    """
    Generate the "banana fanna fo fanna" lyrics for a given name.
    This demonstrates pattern-based text generation.
    """
    # Your code here
    # Hint: You'll need to:
    # 1. Work with the name to create the rhyming parts
    # 2. Follow the specific pattern shown above
    # 3. Handle both uppercase and lowercase names

    # The pattern is:
    # [Name], [Name], bo-b[name without first letter],
    # Bonana-fanna fo-f[name without first letter]
    # Fee fi mo-m[name without first letter]
    # [Name]!
    pass  # Placeholder so the cell runs; replace it with your code



# Test your function
test_name = "Katie"
print(f"Testing with {test_name}:")
banana_fanna_song(test_name)

print("\nTesting with one of our names:")
banana_fanna_song(names[0])  # Test with the first name from our list

### 🎯 Practice 3: Systematic / Iterative Processing

**Challenge**: Write a for **loop** that applies the banana fanna **function** to the now all-caps name list. Alternatively, write a function that applies Practice 2 to every name in a list. Both approaches produce the same output.

**Discussion Question**: What's the difference between these two approaches?

*Why this matters* in DH: This is the essence of computational text analysis--taking a procedure you can do by hand and systematically applying it across an entire corpus. Whether you're analyzing sentiment across thousands of tweets, extracting named entities from historical documents, or generating metadata for a digital collection, you're using this same pattern: *function + iteration = systematic analysis*.

In [None]:
# Approach 1: Using a for loop with our existing function
print("=== Approach 1: Loop + Function ===")
for name in names:
    print(f"\nSong for {name}:")
    banana_fanna_song(name)

print("\n" + "="*50 + "\n")

# Approach 2: Create a function that processes the entire list
def banana_fanna_for_all(name_list):
    """
    Process an entire list of names with the banana fanna song.
    This demonstrates how to create functions that work on collections.
    """

    print("=== Approach 2: Collection Processing Function ===")

    # Your code here
    # This function should call banana_fanna_song() for each name in the list


# Test the collection processing function
banana_fanna_for_all(names)

# Discussion: What are the advantages of each approach?
# When might you use one vs the other in digital humanities work?
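The same "function + iteration" pattern applies when a function returns a value instead of printing. As a sketch, with a hypothetical `count_words` function and a short list of titles chosen for illustration, here is a loop next to an equivalent list comprehension:

```python
def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())

titles = ["Harlem Shadows", "Their Eyes Were Watching God", "Mules and Men"]

# Loop version: build up a list of results one item at a time
counts_loop = []
for title in titles:
    counts_loop.append(count_words(title))

# Comprehension version: the same iteration in a single expression
counts_comp = [count_words(title) for title in titles]

print(counts_loop)
print(counts_comp)
```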

### 🤔 Reflection: From Silly Songs to Serious Scholarship

What you just practiced with the "Name Game" exercise are foundational skills for computational text analysis:

1. **Data Normalization / String Transformation** (Practice 1): Making data consistent for analysis
   - *In DH*: Standardizing historical spelling variants, normalizing metadata formats
   
2. **Pattern Recognition & Generation** (Practice 2): Following systematic rules to transform text
   - *In DH*: Creating consistent citations, extracting structured data from unstructured text
   
3. **Systematic / Iterative Processing** (Practice 3): Applying procedures systematically across datasets
   - *In DH*: Analyzing thousands of documents, processing entire digital collections

The skills are the same whether you're generating silly songs or serious scholarship. The difference is the nature of the data, the complexity of the patterns, and the significance of the insights you generate.

---

## Part 7: Real-World Digital Humanities Applications

Let's explore how dictionaries and functions work together in actual DH research scenarios. These examples demonstrate the kind of systematic cultural analysis that drives contemporary scholarship.

In [None]:
# Simulating a digital humanities research project:
# Analyzing representation in contemporary streaming content

streaming_content = {
    "bridgerton": {
        "title": "Bridgerton",
        "platform": "Netflix",
        "year": 2020,
        "genre": "period drama",
        "lead_demographics": ["multiracial casting", "female-centered"],
        "cultural_significance": "reimagining historical representation",
        "viewer_millions": 82
    },
    "squid_game": {
        "title": "Squid Game",
        "platform": "Netflix",
        "year": 2021,
        "genre": "thriller",
        "lead_demographics": ["Korean cast", "working-class focus"],
        "cultural_significance": "global non-English breakthrough",
        "viewer_millions": 111
    },
    "reservation_dogs": {
        "title": "Reservation Dogs",
        "platform": "Hulu",
        "year": 2021,
        "genre": "comedy-drama",
        "lead_demographics": ["Indigenous youth", "rural community"],
        "cultural_significance": "authentic Indigenous storytelling",
        "viewer_millions": 2.8
    }
}

def analyze_representation_patterns(content_collection):
    """
    Analyze patterns of representation in streaming content.
    This function demonstrates how DH scholars study media representation.
    """
    print("=== Representation Analysis ===")

    platforms = {}
    total_viewers = 0
    years = []

    for show_key, show_data in content_collection.items():
        # Platform analysis
        platform = show_data["platform"]
        if platform in platforms:
            platforms[platform] += 1
        else:
            platforms[platform] = 1

        # Viewership analysis
        viewers = show_data["viewer_millions"]
        total_viewers += viewers

        # Temporal analysis
        years.append(show_data["year"])

        # Cultural significance reporting
        title = show_data["title"]
        demographics = ", ".join(show_data["lead_demographics"])
        significance = show_data["cultural_significance"]

        print(f"{title} ({show_data['year']}):")
        print(f"  Demographics: {demographics}")
        print(f"  Significance: {significance}")
        print(f"  Viewership: {viewers}M")
        print()

    # Summary statistics
    print("=== Summary Statistics ===")
    print(f"Total shows analyzed: {len(content_collection)}")
    print(f"Platform distribution: {platforms}")
    print(f"Year range: {min(years)}-{max(years)}")
    print(f"Average viewership: {total_viewers/len(content_collection):.1f}M")

    return {
        "platforms": platforms,
        "avg_viewership": total_viewers/len(content_collection),
        "year_range": (min(years), max(years))
    }

# Run our analysis
analysis_results = analyze_representation_patterns(streaming_content)
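`generate_collection_report` sorted its themes with `sorted(..., key=lambda ...)`. The same idiom can rank whole records by any numeric field. A sketch with stand-in data (the `shows` dictionary copies only the fields needed here):

```python
# Stand-in data: title and viewership only
shows = {
    "bridgerton": {"title": "Bridgerton", "viewer_millions": 82},
    "squid_game": {"title": "Squid Game", "viewer_millions": 111},
    "reservation_dogs": {"title": "Reservation Dogs", "viewer_millions": 2.8},
}

# key= tells sorted() which value to compare; reverse=True puts the largest first
ranked = sorted(shows.values(), key=lambda show: show["viewer_millions"], reverse=True)

for position, show in enumerate(ranked, start=1):
    print(f"{position}. {show['title']}: {show['viewer_millions']}M")
```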

## Part 8: Data Cleaning and Standardization Functions

Cultural data is often messy, inconsistent, or incomplete. Digital humanists need robust functions that can clean, standardize, and prepare data for analysis. This is crucial work that underlies all subsequent scholarly analysis.

In [None]:
# Messy historical data (typical of what we encounter in archives)
messy_historical_data = {
    "person1": {
        "name": "Mary Gallagher (?)",
        "age": "28 years",
        "occupation": "married woman",
        "location": "County Cork, Ireland"
    },
    "person2": {
        "name": "JOHN SANIN",
        "age": "19",
        "occupation": "laborer",
        "location": "cork county, ireland"
    },
    "person3": {
        "name": "Anthony Clark Jr.",
        "age": "unknown",
        "occupation": "Laborer",
        "location": "Cork County"
    }
}

def standardize_historical_record(person_dict):
    """
    Clean and standardize messy historical data.
    This function demonstrates essential data preparation work in DH.
    """
    cleaned_record = {}

    # Standardize name (remove uncertainty markers, fix capitalization)
    name = person_dict["name"]
    name = name.replace(" (?)", "").replace("(?)", "")  # Remove uncertainty
    name = name.title()  # Proper capitalization
    cleaned_record["name"] = name

    # Standardize age (extract numbers, handle missing data)
    age_raw = person_dict["age"]
    if "unknown" in age_raw.lower():
        cleaned_record["age"] = None
        cleaned_record["age_estimated"] = False
    else:
        # Extract just the number
        age_clean = ''.join(filter(str.isdigit, age_raw))
        cleaned_record["age"] = int(age_clean) if age_clean else None
        # Caution: this reuses the "?" marker from the name field as a proxy for an estimated age
        cleaned_record["age_estimated"] = "?" in person_dict["name"]

    # Standardize occupation (consistent categories)
    occupation = person_dict["occupation"].lower()
    if "married" in occupation:
        cleaned_record["occupation"] = "married"
        cleaned_record["gender"] = "female"
    elif "laborer" in occupation:
        cleaned_record["occupation"] = "laborer"
        cleaned_record["gender"] = "male"  # Historical assumption, could be refined
    else:
        cleaned_record["occupation"] = occupation
        cleaned_record["gender"] = "unknown"

    # Standardize location
    location = person_dict["location"]
    # Title-case, then re-join on commas so each comma is followed by exactly one space
    location = ", ".join(part.strip() for part in location.title().split(","))
    cleaned_record["location"] = location

    return cleaned_record

def process_historical_collection(messy_collection):
    """
    Apply standardization to an entire historical collection.
    This demonstrates batch processing of cultural data.
    """
    cleaned_collection = {}

    print("=== Data Cleaning Process ===")

    for person_id, person_data in messy_collection.items():
        print(f"Processing {person_id}:")
        print(f"  Original: {person_data['name']}")

        cleaned_data = standardize_historical_record(person_data)
        cleaned_collection[person_id] = cleaned_data

        print(f"  Cleaned: {cleaned_data['name']}")
        print(f"  Age: {cleaned_data['age']} (estimated: {cleaned_data['age_estimated']})")
        print(f"  Occupation: {cleaned_data['occupation']}")
        print()

    return cleaned_collection

# Process our messy data
cleaned_data = process_historical_collection(messy_historical_data)

# Verify our cleaning worked
print("=== Cleaned Dataset Summary ===")
for person_id, clean_data in cleaned_data.items():
    print(f"{clean_data['name']}: {clean_data['occupation']}, age {clean_data['age']}")
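One caveat with joining every digit in the age field, as `standardize_historical_record` does: a value such as "28 or 29" would collapse to 2829. A possible alternative, sketched here with a hypothetical `extract_age` helper, uses the standard `re` module to take only the first run of digits:

```python
import re

def extract_age(age_raw):
    """Return the first whole number found in age_raw, or None if there is none."""
    match = re.search(r"\d+", age_raw)
    return int(match.group()) if match else None

# Try the helper on a few representative raw values
for raw in ["28 years", "19", "unknown", "28 or 29"]:
    print(f"{raw!r} -> {extract_age(raw)}")
```

Taking the first number is itself an editorial decision; a project might instead prefer to keep both values or flag the record for manual review.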

## Part 9: Your Turn - A Digital Humanities Research Project

Now it's time to apply what you've learned. Choose one of the three research project options below or create your own. Each project combines dictionaries for data organization with functions for systematic analysis.

### Option 1: Contemporary Music Analysis

In [None]:
# Create a collection of contemporary musicians/artists with rich metadata
music_collection = {
    # Example structure:
    # "artist_key": {
    #     "name": "Artist Name",
    #     "genre": "primary genre",
    #     "debut_year": 20XX,
    #     "origin": "City, Country",
    #     "themes": ["theme1", "theme2"],
    #     "cultural_impact": "description",
    #     "streaming_millions": XXX
    # }
}

def analyze_music_trends(collection):
    """
    Write a function to analyze patterns in contemporary music.
    Consider: geographic distribution, genre evolution, themes, etc.
    """
    # Your analysis code here


# Your tasks:
# 1. Add at least 4 artists with complete metadata
# 2. Implement the analysis function
# 3. Run analysis and interpret results
# 4. Consider what this tells us about contemporary culture

# Write your code here:

### Option 2: Social Media Creator Analysis

In [None]:
# Create a collection of social media creators across platforms
creator_collection = {
    # Example structure:
    # "creator_key": {
    #     "name": "Creator Name",
    #     "platform": "primary platform",
    #     "content_type": "videos/posts/etc",
    #     "follower_count": "1.5M",
    #     "demographic": ["young adult", "LGBTQ+", etc.],
    #     "topics": ["topic1", "topic2"],
    #     "cultural_role": "description"
    # }
}

def analyze_creator_ecosystem(collection):
    """
    Write a function to analyze the social media creator landscape.
    Consider: platform distribution, demographics, content types, etc.
    """
    # Your analysis code here

# Your tasks:
# 1. Add creators from different platforms and demographics
# 2. Implement the analysis function
# 3. Consider what this reveals about digital culture
# 4. Think about representation and access

# Write your code here:
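
One practical wrinkle in this option: the example structure stores `follower_count` as text such as `"1.5M"`, which cannot be compared or averaged directly. The helper below is one possible way to standardize it, in the spirit of the data cleaning earlier in this lesson. The function name `parse_follower_count` and the suffix conventions (K for thousand, M for million, B for billion) are assumptions made for this sketch.

```python
def parse_follower_count(raw):
    """Convert a follower string such as "1.5M" or "820K" into an integer.

    Assumes K = thousand, M = million, B = billion (an assumption of this
    sketch). Returns None when the value cannot be interpreted.
    """
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

    # Values that are already numeric just need converting to int
    if isinstance(raw, (int, float)):
        return int(raw)

    # Normalize the text: trim spaces, uppercase the suffix, drop commas
    text = str(raw).strip().upper().replace(",", "")
    if not text:
        return None

    multiplier = 1
    if text[-1] in multipliers:
        multiplier = multipliers[text[-1]]
        text = text[:-1]

    try:
        return round(float(text) * multiplier)
    except (ValueError, OverflowError):
        return None  # text such as "unknown" cannot be converted

print(parse_follower_count("1.5M"))
print(parse_follower_count("820K"))
print(parse_follower_count("unknown"))
```

Returning `None` for unreadable values, rather than guessing, keeps the gaps in your data visible when you interpret the results.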

### Option 3: Video Game Cultural Analysis

In [None]:
# Create a collection of influential video games
game_collection = {
    # Example structure:
    # "game_key": {
    #     "title": "Game Title",
    #     "release_year": 20XX,
    #     "developer": "Studio Name",
    #     "genre": "primary genre",
    #     "cultural_themes": ["theme1", "theme2"],
    #     "representation": ["demographic1", "demographic2"],
    #     "innovation": "what it introduced",
    #     "sales_millions": XX
    # }
}

def analyze_gaming_culture(collection):
    """
    Write a function to analyze trends in video game culture.
    Consider: representation, themes, innovation, temporal patterns, etc.
    """
    # Your analysis code here


# Your tasks:
# 1. Add games that represent different eras and approaches
# 2. Analyze cultural and technological trends
# 3. Consider questions of representation and access
# 4. Think about games as cultural texts

# Write your code here:
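
If you want to look at the "temporal patterns" mentioned in the docstring above, one possible starting point is to group records by decade. This is a minimal sketch, not a full analysis: the helper `group_by_decade` and the records in `sample_games` are hypothetical placeholders, and the years shown are invented for illustration rather than taken from real games.

```python
def group_by_decade(collection, year_field="release_year"):
    """Group record keys by the decade of their release year."""
    decades = {}
    for key, record in collection.items():
        year = record.get(year_field)
        if not isinstance(year, int):
            continue  # skip records with a missing or non-numeric year
        decade = (year // 10) * 10  # e.g. 2004 becomes 2000
        decades.setdefault(decade, []).append(key)
    # Return the decades in chronological order
    return dict(sorted(decades.items()))

# Hypothetical placeholder records; replace with your own researched data
sample_games = {
    "game_a": {"title": "Game A", "release_year": 1998},
    "game_b": {"title": "Game B", "release_year": 2004},
    "game_c": {"title": "Game C", "release_year": 2009},
    "game_d": {"title": "Game D"},  # no year recorded, so it is skipped
}

games_by_decade = group_by_decade(sample_games)
for decade, keys in games_by_decade.items():
    print(f"{decade}s: {len(keys)} game(s)")
```

Note the design choice to skip records without a usable year: it keeps the function from crashing, but it also means incomplete records silently drop out of the picture, which is worth mentioning when you interpret your results.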

## Reflection Questions

As we conclude our exploration of dictionaries and functions, consider these questions about data, knowledge, and analysis:

1. **Structured Knowledge**: How does organizing information in dictionaries change the kinds of questions we can ask about culture? What details become visible or invisible in this process?

2. **Analytical Automation**: What are the benefits and risks of automating scholarly analysis through functions? What aspects of interpretation should remain human?

3. **Data Representation**: When we structure cultural objects as dictionaries, what aspects of those objects might we be overlooking? How do our categories shape our conclusions?

4. **Scale and Intimacy**: How does the ability to analyze hundreds or thousands of cultural objects automatically change our relationship to individual works? What do we gain and lose?

5. **Future Applications**: How might the skills you've learned today apply to your own research interests? What cultural collections would you want to organize and analyze?

Write your thoughts in the Text cell below:

### Your Reflections:

(Double-click this cell to edit and write your thoughts)

## Key Takeaways

Today we've learned powerful tools for organizing and analyzing cultural data:

**Dictionaries for Cultural Data**
- **Structure complex information** about cultural objects, people, and phenomena
- **Organize rich metadata** that preserves the multifaceted nature of cultural materials
- **Enable sophisticated queries** about relationships and patterns in collections
- **Support nested organization** for complex historical and cultural datasets

**Functions for Scholarly Automation**
- **Encapsulate analytical methods** so they can be applied systematically
- **Ensure reproducible research** by codifying our scholarly procedures
- **Handle repetitive tasks** efficiently across large cultural collections
- **Enable analysis at scale** that would be impractical to do by hand
- **Clean and standardize messy data** from historical and cultural sources

**Professional Digital Humanities Skills**
- **Data cleaning and standardization** for working with real-world cultural data
- **Systematic analysis** of patterns across cultural collections
- **Report generation** for sharing findings with scholarly communities
- **Scalable research methods** that grow with the size of digital collections

**Critical Digital Humanities Perspectives**
- **Questioning our categories**: Understanding how our organizational choices shape our findings
- **Balancing automation and interpretation**: Knowing when human judgment is essential
- **Considering representation**: Being aware of whose voices and perspectives are included or excluded
- **Thinking about access**: Understanding how technical barriers might limit scholarly participation

These tools prepare you for advanced work in DH, from analyzing historical archives to studying contemporary digital culture. Remember, the technical skills are powerful, but the critical thinking about how we use them is what makes the work truly scholarly.

As you continue your studies, you'll use these concepts to structure complex research data, automate analytical workflows, and ask new kinds of questions about culture and society. The combination of rich data organization and systematic analysis opens up entirely new possibilities for humanities scholarship.