# Taylor Swift Python Tutorial: Dictionary Skills

Master advanced dictionary techniques using Taylor's rich discography data! Dictionaries are a natural fit for structured data, and techniques like comprehensions and nested structures keep the code that handles it short and readable.

## Learning Goals
- Master **nested dictionaries** for complex data structures
- Use **dictionary methods** effectively (`.get()`, `.keys()`, `.values()`, `.items()`)
- Create powerful **dictionary comprehensions**
- Build **data analysis pipelines** with dictionaries
- Handle real-world **data transformation** scenarios

## Advanced Dictionary Methods

Let's work with comprehensive Taylor Swift album data:

In [None]:
# Comprehensive Taylor Swift discography
taylor_discography = {
    "Taylor Swift": {
        "year": 2006,
        "genre": "Country",
        "tracks": 11,
        "sales_millions": 2.5
    },
    "Fearless": {
        "year": 2008,
        "genre": "Country",
        "tracks": 13,
        "sales_millions": 7.2
    },
    "1989": {
        "year": 2014,
        "genre": "Pop",
        "tracks": 13,
        "sales_millions": 6.2
    },
    "folklore": {
        "year": 2020,
        "genre": "Alternative",
        "tracks": 16,
        "sales_millions": 1.3
    }
}

print("=== Dictionary Methods Demo ===")

# .get() method - safe access with default values
fearless_sales = taylor_discography["Fearless"].get("sales_millions", 0)
fearless_awards = taylor_discography["Fearless"].get("awards", "No awards data")

print(f"Fearless sales: {fearless_sales} million")
print(f"Fearless awards: {fearless_awards}")

# .keys(), .values(), .items()
album_names = list(taylor_discography.keys())
print(f"\nAlbum names: {album_names}")

# Extract specific information using .values()
all_years = [info['year'] for info in taylor_discography.values()]
print(f"Release years: {all_years}")

# .items() for iteration
print("\nAlbum summaries:")
for album, info in taylor_discography.items():
    print(f"  {album} ({info['year']}): {info['genre']}, {info['tracks']} tracks")

# .update() method - merge dictionaries
new_albums = {
    "Midnights": {
        "year": 2022,
        "genre": "Pop",
        "tracks": 13,
        "sales_millions": 3.5
    }
}

taylor_discography.update(new_albums)
print(f"\nAfter update: {len(taylor_discography)} albums")

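# .pop() - remove a key and return its value
discog_copy = taylor_discography.copy()  # Shallow copy: the nested album dicts are still shared
removed_album = discog_copy.pop("Taylor Swift", None)  # The None default avoids a KeyError if the key is missing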
print(f"\nRemoved album: {removed_album}")

# .setdefault() - add key only if it doesn't exist
for album_info in taylor_discography.values():
    album_info.setdefault("certified_platinum", True)  # Assume all are platinum

print(f"\nAdded platinum certification to all albums")
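
`.popitem()` is a close relative of `.pop()`: it takes no key and removes the most recently inserted pair. The sketch below uses a small hypothetical `albums` dictionary (not the tutorial's `taylor_discography`) so the insertion order is easy to follow.

```python
# Hypothetical sample data, kept small so the insertion order is easy to see
albums = {"Fearless": 2008, "1989": 2014, "folklore": 2020}

# .popitem() removes and returns the most recently inserted (key, value) pair
last_album, last_year = albums.popitem()
print(f"Removed: {last_album} ({last_year})")
print(f"Remaining: {albums}")

# .pop() with a default never raises, even when the key is missing
missing = albums.pop("Lover", None)
print(f"Popping a missing key returned: {missing}")

# .popitem() on an empty dictionary raises KeyError, so guard it
empty = {}
try:
    empty.popitem()
except KeyError:
    print("Cannot popitem() from an empty dictionary")
```

Because dictionaries preserve insertion order (Python 3.7+), `.popitem()` behaves like a stack: last in, first out.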

## Nested Dictionaries - Complex Data Structures

Handle real-world music data with multiple levels:

In [None]:
# Complex nested structure for Taylor Swift's career
taylor_career = {
    "albums": {
        "folklore": {
            "release_info": {
                "date": "2020-07-24",
                "label": "Republic Records",
                "surprise_release": True
            },
            "tracks": {
                "the 1": {"duration": 3.30, "writers": ["Taylor Swift", "Aaron Dessner"]},
                "cardigan": {"duration": 3.59, "writers": ["Taylor Swift", "Aaron Dessner"]},
                "august": {"duration": 4.21, "writers": ["Taylor Swift", "Jack Antonoff"]}
            },
            "awards": {
                "grammy": {"wins": 1, "nominations": 8, "categories": ["Album of the Year"]},
                "other": ["American Music Award", "Billboard Music Award"]
            }
        },
        "Midnights": {
            "release_info": {
                "date": "2022-10-21",
                "label": "Republic Records",
                "surprise_release": False
            },
            "tracks": {
                "Lavender Haze": {"duration": 3.22, "writers": ["Taylor Swift", "Jack Antonoff"]},
                "Anti-Hero": {"duration": 3.20, "writers": ["Taylor Swift", "Jack Antonoff"]},
                "Midnight Rain": {"duration": 2.54, "writers": ["Taylor Swift", "Jack Antonoff"]}
            },
            "awards": {
                "grammy": {"wins": 0, "nominations": 6, "categories": []},
                "other": ["MTV Video Music Award", "People's Choice Award"]
            }
        }
    },
    "collaborators": {
        "Aaron Dessner": {"albums": ["folklore", "evermore"], "role": "producer"},
        "Jack Antonoff": {"albums": ["1989", "reputation", "Lover", "Midnights"], "role": "producer"}
    }
}

print("=== Nested Dictionary Navigation ===")

# Deep access
folklore_release = taylor_career["albums"]["folklore"]["release_info"]["date"]
print(f"folklore release date: {folklore_release}")

# Safe deep access using multiple .get() calls
midnights_grammy_wins = (taylor_career.get("albums", {})
                        .get("Midnights", {})
                        .get("awards", {})
                        .get("grammy", {})
                        .get("wins", 0))
print(f"Midnights Grammy wins: {midnights_grammy_wins}")

# Iterate through nested structure
print("\nAlbum track analysis:")
for album_name, album_data in taylor_career["albums"].items():
    tracks = album_data["tracks"]
    total_duration = sum(track["duration"] for track in tracks.values())
    avg_duration = total_duration / len(tracks)
    
    print(f"  {album_name}: {len(tracks)} tracks, {total_duration:.1f} min total, {avg_duration:.1f} min avg")

# Find all unique writers across albums
all_writers = set()
for album_data in taylor_career["albums"].values():
    for track_data in album_data["tracks"].values():
        all_writers.update(track_data["writers"])

print(f"\nAll writers involved: {sorted(all_writers)}")

# Collaborator analysis
print("\nCollaborator analysis:")
for collaborator, info in taylor_career["collaborators"].items():
    album_count = len(info["albums"])
    print(f"  {collaborator} ({info['role']}): {album_count} albums - {', '.join(info['albums'])}")
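
Chaining `.get()` calls works, but it gets long for deep structures. One possible alternative is a small helper that walks the keys for you. `deep_get` and the trimmed-down `career` dictionary below are hypothetical names used only for this sketch.

```python
def deep_get(data, *keys, default=None):
    """Follow keys one level at a time; return default if any step is missing."""
    current = data
    for key in keys:
        # Stop as soon as we hit a non-dict or a missing key
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical, trimmed-down stand-in for the taylor_career structure
career = {
    "albums": {
        "Midnights": {"awards": {"grammy": {"nominations": 6}}}
    }
}

nominations = deep_get(career, "albums", "Midnights", "awards", "grammy", "nominations", default=0)
not_there = deep_get(career, "albums", "evermore", "awards", "grammy", "nominations", default=0)

print(f"Midnights nominations: {nominations}")
print(f"evermore nominations: {not_there}")
```

Unlike a `.get()` chain with `{}` defaults, this helper also returns the default when an intermediate value is not a dictionary at all.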

## Dictionary Comprehensions

Create dictionaries efficiently and elegantly:

In [None]:
# Sample data for comprehensions
songs = ["Love Story", "Shake It Off", "Anti-Hero", "cardigan", "22"]
years = [2008, 2014, 2022, 2020, 2012]
durations = [3.55, 3.39, 3.20, 3.59, 3.85]

# Basic dictionary comprehension
song_years = {song: year for song, year in zip(songs, years)}
print(f"Song years: {song_years}")

# Comprehension with transformation
song_lengths = {song: len(song) for song in songs}
print(f"\nSong title lengths: {song_lengths}")

# Comprehension with condition
recent_songs = {song: year for song, year in song_years.items() if year >= 2020}
print(f"Recent songs (2020+): {recent_songs}")

# Complex comprehension with multiple data sources
song_data = {song: {"year": year, "duration": duration} 
            for song, year, duration in zip(songs, years, durations)}

print("\nComplex song data:")
for song, data in song_data.items():
    print(f"  {song}: {data}")

# Conditional value in comprehension
song_eras = {song: "Modern" if year >= 2020 else "Classic" 
            for song, year in song_years.items()}
print(f"\nSong eras: {song_eras}")

# Comprehension from existing dictionary
album_info = {
    "Fearless": {"genre": "Country", "tracks": 13},
    "1989": {"genre": "Pop", "tracks": 13},
    "folklore": {"genre": "Alternative", "tracks": 16}
}

# Extract specific field
album_genres = {album: info["genre"] for album, info in album_info.items()}
print(f"\nAlbum genres: {album_genres}")

# Filter and transform
long_albums = {album: f"{info['tracks']} tracks" 
              for album, info in album_info.items() 
              if info["tracks"] > 13}
print(f"Long albums: {long_albums}")

# Invert the mapping: group albums under each genre (one genre can hold several albums)
genre_albums = {}
for album, genre in album_genres.items():
    if genre not in genre_albums:
        genre_albums[genre] = []
    genre_albums[genre].append(album)

# Tidier alternative: defaultdict(list) creates the empty list on first access
from collections import defaultdict
genre_albums_better = defaultdict(list)
for album, genre in album_genres.items():
    genre_albums_better[genre].append(album)

print(f"\nAlbums by genre: {dict(genre_albums_better)}")
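
The same grouping can also be written with `.setdefault()`, which needs no import and leaves you with a plain `dict`. The sample mapping below is hypothetical and adds a second Pop album so that one group has more than one entry.

```python
# Hypothetical sample mapping of album -> genre
sample_genres = {
    "Fearless": "Country",
    "1989": "Pop",
    "folklore": "Alternative",
    "Midnights": "Pop",
}

grouped = {}
for album, genre in sample_genres.items():
    # setdefault returns the existing list, or inserts and returns a new empty one
    grouped.setdefault(genre, []).append(album)

print(f"Albums by genre: {grouped}")
```

A `defaultdict` is convenient when you group often. `.setdefault()` avoids the surprise of a missing key being created just by reading it.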

## Advanced Dictionary Patterns

Handle complex data processing scenarios:

In [None]:
# Chart performance data
chart_performance = [
    {"song": "Love Story", "peak_position": 4, "weeks_on_chart": 46, "year": 2008},
    {"song": "Shake It Off", "peak_position": 1, "weeks_on_chart": 50, "year": 2014},
    {"song": "Anti-Hero", "peak_position": 1, "weeks_on_chart": 30, "year": 2022},
    {"song": "cardigan", "peak_position": 1, "weeks_on_chart": 25, "year": 2020},
    {"song": "22", "peak_position": 20, "weeks_on_chart": 35, "year": 2012}
]

# Group by peak position
by_peak_position = {}
for song_data in chart_performance:
    pos = song_data["peak_position"]
    if pos not in by_peak_position:
        by_peak_position[pos] = []
    by_peak_position[pos].append(song_data["song"])

print("Songs grouped by peak chart position:")
for position in sorted(by_peak_position.keys()):
    songs_at_position = by_peak_position[position]  # Avoid overwriting the earlier `songs` list
    print(f"  #{position}: {', '.join(songs_at_position)}")

# Calculate statistics by decade
decade_stats = {}
for song_data in chart_performance:
    decade = (song_data["year"] // 10) * 10  # 2008 -> 2000, 2014 -> 2010
    
    if decade not in decade_stats:
        decade_stats[decade] = {
            "songs": [],
            "total_weeks": 0,
            "number_ones": 0,
            "best_position": float('inf')
        }
    
    decade_stats[decade]["songs"].append(song_data["song"])
    decade_stats[decade]["total_weeks"] += song_data["weeks_on_chart"]
    if song_data["peak_position"] == 1:
        decade_stats[decade]["number_ones"] += 1
    decade_stats[decade]["best_position"] = min(
        decade_stats[decade]["best_position"], 
        song_data["peak_position"]
    )

print("\nStatistics by decade:")
for decade, stats in sorted(decade_stats.items()):
    print(f"  {decade}s: {len(stats['songs'])} songs, "
          f"{stats['number_ones']} #1 hits, "
          f"{stats['total_weeks']} total chart weeks")

# Create lookup dictionaries for efficient access
song_lookup = {song_data["song"]: song_data for song_data in chart_performance}

# Quick access to any song's data
def get_song_info(song_name):
    return song_lookup.get(song_name, "Song not found")

print(f"\nAnti-Hero info: {get_song_info('Anti-Hero')}")
print(f"Unknown song: {get_song_info('Unknown Song')}")

# Performance ranking system
def calculate_chart_score(song_data):
    """Calculate a chart performance score."""
    position_score = 101 - song_data["peak_position"]  # Closer to #1 = higher score
    longevity_score = song_data["weeks_on_chart"]
    return position_score + longevity_score * 0.5

# Rank songs by performance
song_scores = {song_data["song"]: calculate_chart_score(song_data) 
              for song_data in chart_performance}

ranked_songs = sorted(song_scores.items(), key=lambda x: x[1], reverse=True)

print("\nSongs ranked by chart performance:")
for i, (song, score) in enumerate(ranked_songs, 1):
    print(f"  {i}. {song}: {score:.1f} points")
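
When all you need is a tally, `collections.Counter` can replace the manual "check, initialise, increment" pattern used in the decade statistics. This is a minimal sketch on a hypothetical, trimmed copy of the chart data; it counts songs only and does not rebuild the full `decade_stats` structure.

```python
from collections import Counter

# Hypothetical, trimmed copy of the chart data used in this section
chart_sample = [
    {"song": "Love Story", "peak_position": 4, "year": 2008},
    {"song": "Shake It Off", "peak_position": 1, "year": 2014},
    {"song": "Anti-Hero", "peak_position": 1, "year": 2022},
    {"song": "cardigan", "peak_position": 1, "year": 2020},
    {"song": "22", "peak_position": 20, "year": 2012},
]

# Count every song by decade
songs_per_decade = Counter((entry["year"] // 10) * 10 for entry in chart_sample)

# Count only the #1 hits by decade
number_ones_per_decade = Counter(
    (entry["year"] // 10) * 10
    for entry in chart_sample
    if entry["peak_position"] == 1
)

for decade in sorted(songs_per_decade):
    # A Counter returns 0 for keys it has never seen, so no .get() is needed
    print(f"  {decade}s: {songs_per_decade[decade]} songs, "
          f"{number_ones_per_decade[decade]} #1 hits")
```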

## Dictionary Merging and Updating

Combine data from multiple sources:

In [None]:
# Different data sources about Taylor Swift albums
basic_info = {
    "Fearless": {"year": 2008, "genre": "Country"},
    "1989": {"year": 2014, "genre": "Pop"},
    "folklore": {"year": 2020, "genre": "Alternative"}
}

sales_data = {
    "Fearless": {"sales_millions": 7.2, "certification": "Diamond"},
    "1989": {"sales_millions": 6.2, "certification": "Platinum"},
    "folklore": {"sales_millions": 1.3, "certification": "Platinum"}
}

awards_data = {
    "Fearless": {"grammy_wins": 4, "grammy_nominations": 5},
    "1989": {"grammy_wins": 3, "grammy_nominations": 7},
    "folklore": {"grammy_wins": 1, "grammy_nominations": 8}
}

print("=== Dictionary Merging Techniques ===")

# Method 1: Manual merging with .update()
combined_manual = {}
for album in basic_info:
    combined_manual[album] = basic_info[album].copy()
    if album in sales_data:
        combined_manual[album].update(sales_data[album])
    if album in awards_data:
        combined_manual[album].update(awards_data[album])

print("Manual merge result:")
for album, data in combined_manual.items():
    print(f"  {album}: {data}")

# Method 2: Dictionary comprehension merge
combined_comprehension = {
    album: {
        **basic_info.get(album, {}),
        **sales_data.get(album, {}),
        **awards_data.get(album, {})
    }
    for album in basic_info.keys() | sales_data.keys() | awards_data.keys()  # Key views support set union directly
}

print("\nComprehension merge result (same data as above; key order may differ):")
print(f"Albums processed: {list(combined_comprehension.keys())}")

# Method 3: Merge with conflict resolution
def merge_album_data(*data_sources):
    """Merge multiple data sources with conflict tracking."""
    all_albums = set()
    for data_source in data_sources:
        all_albums.update(data_source.keys())
    
    merged = {}
    conflicts = {}
    
    for album in all_albums:
        merged[album] = {}
        
        for data_source in data_sources:
            if album in data_source:
                for key, value in data_source[album].items():
                    if key in merged[album] and merged[album][key] != value:
                        # Conflict detected
                        conflicts.setdefault(album, {})[key] = {
                            'old': merged[album][key],
                            'new': value
                        }
                    merged[album][key] = value
    
    return merged, conflicts

# Test with conflicting data
conflicting_data = {
    "1989": {"year": 2014, "genre": "Synth-pop"}  # Different genre
}

merged_data, conflicts = merge_album_data(basic_info, sales_data, awards_data, conflicting_data)

if conflicts:
    print("\nConflicts detected:")
    for album, album_conflicts in conflicts.items():
        print(f"  {album}: {album_conflicts}")

# Add derived fields
for album, data in merged_data.items():
    # Calculate efficiency score (Grammy wins per nomination)
    if 'grammy_wins' in data and 'grammy_nominations' in data:
        if data['grammy_nominations'] > 0:
            data['grammy_efficiency'] = data['grammy_wins'] / data['grammy_nominations']
        else:
            data['grammy_efficiency'] = 0
    
    # Calculate commercial impact: sales * (Grammy wins + 1), so albums with zero wins still score
    if 'sales_millions' in data and 'grammy_wins' in data:
        data['commercial_impact'] = data['sales_millions'] * (data['grammy_wins'] + 1)

print("\nEnhanced data with derived fields:")
for album, data in merged_data.items():
    efficiency = data.get('grammy_efficiency', 0)
    impact = data.get('commercial_impact', 0)
    print(f"  {album}: Grammy efficiency {efficiency:.2f}, Commercial impact {impact:.1f}")
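
On Python 3.9 and later, the `|` operator offers a third way to merge flat dictionaries, alongside `.update()` and `**` unpacking. The sketch below assumes Python 3.9+ and uses two small hypothetical dictionaries. As with the other techniques, the right-hand side wins when keys collide.

```python
# Hypothetical sample data with one overlapping key ("genre")
basic = {"year": 2014, "genre": "Pop"}
extra = {"sales_millions": 6.2, "genre": "Synth-pop"}

# ** unpacking: works on Python 3.5+
merged_unpack = {**basic, **extra}

# | operator: requires Python 3.9+, builds a new dictionary
merged_pipe = basic | extra

# |= operator: in-place merge, equivalent to .update()
merged_in_place = dict(basic)
merged_in_place |= extra

print(merged_unpack)
print(merged_pipe == merged_unpack == merged_in_place)
print(basic)  # The inputs are left untouched
```

All three are shallow merges: nested dictionaries are replaced wholesale, not combined, which is why the album data above is merged one album at a time.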

## Practice Time! 🎵

Let's practice advanced dictionary techniques:

In [None]:
# Practice Exercise 1:
# Given this song data, create a dictionary comprehension that:
# 1. Maps song names to their duration in seconds (multiply by 60)
# 2. Only includes songs longer than 3.5 minutes
# 3. Adds "(Extended)" to the song name if duration > 4 minutes

song_durations = {
    "Love Story": 3.55,
    "All Too Well": 5.29,
    "22": 3.85,
    "ME!": 3.13,
    "cardigan": 3.59
}

# Your code here:



In [None]:
# Practice Exercise 2:
# Create a function that takes the nested career data and returns:
# - A dictionary mapping each writer to their total song count
# - A dictionary mapping each writer to the albums they've worked on

career_data = {
    "folklore": {
        "cardigan": ["Taylor Swift", "Aaron Dessner"],
        "august": ["Taylor Swift", "Jack Antonoff"]
    },
    "Midnights": {
        "Anti-Hero": ["Taylor Swift", "Jack Antonoff"],
        "Lavender Haze": ["Taylor Swift", "Jack Antonoff"]
    }
}

# Your code here:



In [None]:
# Practice Exercise 3:
# Create a data transformation pipeline that:
# 1. Takes a list of song dictionaries
# 2. Groups them by decade
# 3. Calculates average duration for each decade
# 4. Returns a summary dictionary

songs_data = [
    {"title": "Love Story", "year": 2008, "duration": 3.55},
    {"title": "22", "year": 2012, "duration": 3.85},
    {"title": "Shake It Off", "year": 2014, "duration": 3.39},
    {"title": "cardigan", "year": 2020, "duration": 3.59},
    {"title": "Anti-Hero", "year": 2022, "duration": 3.20}
]

# Your code here:



## Real-World Example: Music Analytics Dashboard

Build a comprehensive analytics system:

In [None]:
class TaylorSwiftAnalytics:
    """
    Comprehensive analytics dashboard for Taylor Swift's career data.
    """
    
    def __init__(self):
        self.data = {
            "albums": {},
            "songs": {},
            "chart_performance": {},
            "collaborations": {},
            "awards": {}
        }
    
    def add_album(self, name, year, genre, tracks=None):
        """Add album information."""
        self.data["albums"][name] = {
            "year": year,
            "genre": genre,
            "tracks": tracks or [],
            "track_count": len(tracks) if tracks else 0
        }
    
    def add_song(self, title, album, duration, writers=None):
        """Add song information."""
        self.data["songs"][title] = {
            "album": album,
            "duration": duration,
            "writers": writers or ["Taylor Swift"],
            "year": self.data["albums"].get(album, {}).get("year", None)
        }
        
        # Add to album's track list
        if album in self.data["albums"]:
            if title not in self.data["albums"][album]["tracks"]:
                self.data["albums"][album]["tracks"].append(title)
                self.data["albums"][album]["track_count"] = len(self.data["albums"][album]["tracks"])
    
    def get_genre_analysis(self):
        """Analyze Taylor's evolution across genres."""
        genre_data = {}
        
        for album, info in self.data["albums"].items():
            genre = info["genre"]
            if genre not in genre_data:
                genre_data[genre] = {
                    "albums": [],
                    "total_tracks": 0,
                    "years": []
                }
            
            genre_data[genre]["albums"].append(album)
            genre_data[genre]["total_tracks"] += info["track_count"]
            genre_data[genre]["years"].append(info["year"])
        
        # Add derived statistics
        for genre, data in genre_data.items():
            data["year_range"] = f"{min(data['years'])}-{max(data['years'])}"
            data["avg_tracks_per_album"] = data["total_tracks"] / len(data["albums"])
        
        return genre_data
    
    def get_collaboration_network(self):
        """Analyze collaboration patterns."""
        collaborators = {}
        
        for song, info in self.data["songs"].items():
            for writer in info["writers"]:
                if writer != "Taylor Swift":  # Exclude Taylor herself
                    if writer not in collaborators:
                        collaborators[writer] = {
                            "songs": [],
                            "albums": set(),
                            "total_duration": 0
                        }
                    
                    collaborators[writer]["songs"].append(song)
                    collaborators[writer]["albums"].add(info["album"])
                    collaborators[writer]["total_duration"] += info["duration"]
        
        # Convert sets to lists and add statistics
        for collaborator, data in collaborators.items():
            data["albums"] = sorted(data["albums"])  # Sorted list keeps the output order stable
            data["song_count"] = len(data["songs"])
            data["album_count"] = len(data["albums"])
            data["avg_song_duration"] = data["total_duration"] / data["song_count"]
        
        return collaborators
    
    def generate_report(self):
        """Generate comprehensive analytics report."""
        report = {}
        
        # Basic statistics
        report["overview"] = {
            "total_albums": len(self.data["albums"]),
            "total_songs": len(self.data["songs"]),
            "career_span": None,
            "total_duration": sum(song["duration"] for song in self.data["songs"].values())
        }
        
        if self.data["albums"]:
            years = [album["year"] for album in self.data["albums"].values()]
            report["overview"]["career_span"] = f"{min(years)}-{max(years)}"
        
        # Genre analysis
        report["genre_analysis"] = self.get_genre_analysis()
        
        # Collaboration network
        report["collaborations"] = self.get_collaboration_network()
        
        return report

# Create and populate the analytics system
analytics = TaylorSwiftAnalytics()

# Add sample data
analytics.add_album("folklore", 2020, "Alternative")
analytics.add_album("Midnights", 2022, "Pop")

analytics.add_song("cardigan", "folklore", 3.59, ["Taylor Swift", "Aaron Dessner"])
analytics.add_song("august", "folklore", 4.21, ["Taylor Swift", "Jack Antonoff"])
analytics.add_song("Anti-Hero", "Midnights", 3.20, ["Taylor Swift", "Jack Antonoff"])
analytics.add_song("Lavender Haze", "Midnights", 3.22, ["Taylor Swift", "Jack Antonoff"])

# Generate and display report
report = analytics.generate_report()

print("🎵 TAYLOR SWIFT ANALYTICS DASHBOARD 🎵")
print("=" * 50)

print("\n📊 OVERVIEW:")
overview = report["overview"]
print(f"   Albums: {overview['total_albums']}")
print(f"   Songs: {overview['total_songs']}")
print(f"   Career span: {overview['career_span']}")
print(f"   Total duration: {overview['total_duration']:.1f} minutes")

print("\n🎭 GENRE EVOLUTION:")
for genre, data in report["genre_analysis"].items():
    print(f"   {genre}: {len(data['albums'])} albums ({data['year_range']}), "
          f"{data['total_tracks']} total tracks")

print("\n🤝 TOP COLLABORATORS:")
collaborators = sorted(report["collaborations"].items(), 
                      key=lambda x: x[1]["song_count"], reverse=True)
for collaborator, data in collaborators[:3]:  # Top 3
    print(f"   {collaborator}: {data['song_count']} songs across {data['album_count']} albums")

## Key Takeaways

### Dictionary Methods Mastery
- **Safe access**: `.get(key, default)` instead of `d[key]` when a key may be missing
- **Iteration**: `.keys()`, `.values()`, `.items()` for different loop needs
- **Modification**: `.update(other_dict)`, `.pop(key)`, `.setdefault(key, default)`
- **Copying**: `.copy()` for shallow copies, `copy.deepcopy()` for nested structures
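
The difference between the two kinds of copy matters as soon as a dictionary holds other dictionaries. A minimal sketch with a hypothetical one-album dictionary:

```python
import copy

# Hypothetical nested data
original = {"folklore": {"tracks": 16}}

shallow = original.copy()        # New outer dict, same inner dicts
deep = copy.deepcopy(original)   # New outer dict and new inner dicts

# Changing nested data through the shallow copy also changes the original
shallow["folklore"]["tracks"] = 17
print(f"Original after shallow edit: {original['folklore']['tracks']}")

# Changing nested data through the deep copy leaves the original alone
deep["folklore"]["tracks"] = 99
print(f"Original after deep edit: {original['folklore']['tracks']}")
```

This is why `.setdefault()` on the inner album dictionaries would show up in a `.copy()` of the discography too: only the top level is duplicated.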

### Nested Dictionary Patterns
- **Deep access**: Chain `.get()` calls for safe navigation
- **Structure**: Use consistent nesting levels and key naming
- **Validation**: Always check if keys exist before accessing nested data
- **Flattening**: Extract nested data into simpler structures when needed
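
Flattening can be sketched as a short recursive function that joins nested keys into a single dotted key. `flatten` and the sample `album` dictionary are hypothetical names for illustration; the dotted separator is an arbitrary choice.

```python
def flatten(nested, parent_key="", sep="."):
    """Return a one-level dict whose keys are the joined paths of the nested keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the path so far
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample shaped like one album entry from this tutorial
album = {
    "release_info": {"date": "2020-07-24", "surprise_release": True},
    "awards": {"grammy": {"wins": 1}},
}

flat_album = flatten(album)
for path, value in flat_album.items():
    print(f"  {path}: {value}")
```

Note that in this sketch lists are kept as values rather than expanded, and empty nested dictionaries disappear from the result.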

### Dictionary Comprehensions
- **Basic**: `{key: value for item in iterable}`
- **Filtering**: `{k: v for k, v in d.items() if condition}`
- **Transformation**: `{transform(k): transform(v) for k, v in d.items()}`
- **From lists**: `{key_func(item): value_func(item) for item in list}`

### Advanced Patterns
- **Grouping**: Use `defaultdict` or manual grouping for categorizing data
- **Merging**: Use `**dict` syntax or `.update()` for combining dictionaries
- **Lookup tables**: Build a dictionary keyed by the field you search on (such as the song title) for fast retrieval
- **Aggregation**: Combine multiple dictionaries with conflict resolution

### Best Practices
- Use `.get()` with defaults for safe key access
- Prefer dictionary comprehensions for simple transformations
- Keep nesting levels reasonable (max 3-4 levels)
- Use meaningful key names that describe the data
- Consider `defaultdict` for grouping operations
- Validate data structure before processing nested dictionaries

### Performance Considerations
- Dictionary lookups are O(1) average case - use them for fast data access
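- `.keys()`, `.values()`, and `.items()` return lightweight views, so calling them is cheap; avoid wrapping them in `list()` inside a loop unless you need a snapshot
- Dictionary comprehensions are usually a little faster than building the same dictionary in an explicit loop
- Consider `collections.ChainMap` to look up keys across several dictionaries without copying them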
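
`collections.ChainMap`, mentioned in the last bullet, layers dictionaries instead of copying them: lookups search each dictionary in order and return the first match. A minimal sketch with two hypothetical dictionaries:

```python
from collections import ChainMap

# Hypothetical data: overrides are searched first, then defaults
overrides = {"genre": "Synth-pop"}
defaults = {"year": 2014, "genre": "Pop"}

album_view = ChainMap(overrides, defaults)

print(album_view["genre"])  # Found in overrides first
print(album_view["year"])   # Falls through to defaults

# The view is live: changes to the underlying dictionaries show up immediately
defaults["tracks"] = 13
print(album_view["tracks"])

# Convert to a regular dict when you need an independent merged copy
merged = dict(album_view)
print(merged)
```

Since nothing is copied until `dict(album_view)` is called, this can be useful when the source dictionaries are large or keep changing.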

Next up: Modules & Packages - where we'll learn to organize and import code! 🎤