# Taylor Swift Python Tutorial: Modules & Packages

Learn to organize and reuse code using Python's module system! We'll explore importing, creating modules, and using Python's extensive standard library to enhance our Taylor Swift data analysis.

## Learning Goals
- Understand **importing** and different import styles
- Explore Python's **standard library** modules
- Learn about **packages** and namespaces
- Create reusable **utility modules**
- Apply modules to Taylor Swift data analysis

## Basic Importing

Python's import system lets us use code from other modules:

In [None]:
# Different ways to import modules
import math
import random
from datetime import datetime, timedelta
from collections import Counter, defaultdict
import json

# Using math module for Taylor Swift data
song_ratings = [9.5, 8.7, 9.2, 8.9, 9.8, 9.1, 8.5]

# Calculate statistics
average_rating = sum(song_ratings) / len(song_ratings)
variance = sum((x - average_rating) ** 2 for x in song_ratings) / len(song_ratings)
std_deviation = math.sqrt(variance)

print(f"Taylor Swift song ratings analysis:")
print(f"Average rating: {average_rating:.2f}")
print(f"Standard deviation: {std_deviation:.2f}")
print(f"Highest rating: {max(song_ratings)}")
print(f"Rating range: {max(song_ratings) - min(song_ratings):.1f}")

# Using random module for playlist shuffling
playlist = ["Love Story", "Shake It Off", "Anti-Hero", "cardigan", "22"]
print(f"\nOriginal playlist: {playlist}")

shuffled_playlist = playlist.copy()
random.shuffle(shuffled_playlist)
print(f"Shuffled playlist: {shuffled_playlist}")

# Random song selection
random_song = random.choice(playlist)
print(f"Random song pick: {random_song}")

# Using datetime for release date calculations
folklore_release = datetime(2020, 7, 24)
midnights_release = datetime(2022, 10, 21)
today = datetime.now()

folklore_age = today - folklore_release
gap_between_albums = midnights_release - folklore_release

print(f"\nAlbum timeline:")
print(f"folklore was released {folklore_age.days} days ago")
print(f"Gap between folklore and Midnights: {gap_between_albums.days} days")
print(f"That's about {gap_between_albums.days / 365:.1f} years")
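
The manual calculation above divides by `n`, which gives the *population* standard deviation. The `statistics` module, used later in this tutorial, offers both variants. The short sketch below reuses the same ratings list to compare them; the variable names are only illustrative.

```python
import math
import statistics

song_ratings = [9.5, 8.7, 9.2, 8.9, 9.8, 9.1, 8.5]

# Manual population standard deviation (divide by n), as in the cell above
mean = sum(song_ratings) / len(song_ratings)
manual_population_sd = math.sqrt(
    sum((x - mean) ** 2 for x in song_ratings) / len(song_ratings)
)

population_sd = statistics.pstdev(song_ratings)  # divides by n
sample_sd = statistics.stdev(song_ratings)       # divides by n - 1

print(f"Manual (population): {manual_population_sd:.4f}")
print(f"statistics.pstdev:   {population_sd:.4f}")
print(f"statistics.stdev:    {sample_sd:.4f}")
```

The sample version is always a little larger for the same data, so it is worth knowing which one a given function returns before comparing results.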

## Collections Module - Advanced Data Structures

The collections module provides specialized container types:

In [None]:
from collections import Counter, defaultdict, namedtuple, deque

# Counter - count things automatically
taylor_lyrics = """
I've got a blank space baby and I'll write your name
So it goes he can't keep his wild eyes on the road
You were my crown now I'm in exile seeing you out
And I can go anywhere I want anywhere I want just not home
"""

# Count words in lyrics
words = taylor_lyrics.lower().split()
word_counts = Counter(words)

print("Most common words in Taylor's lyrics:")
for word, count in word_counts.most_common(5):
    print(f"  '{word}': {count} times")

# Count genres across discography
album_genres = ["Country", "Country", "Country", "Pop", "Pop", "Pop", "Alternative", "Alternative", "Pop"]
genre_counts = Counter(album_genres)

print(f"\nGenre distribution: {dict(genre_counts)}")
print(f"Most common genre: {genre_counts.most_common(1)[0]}")

# defaultdict - automatic default values
songs_by_album = defaultdict(list)

# Add songs without checking if album key exists
songs_by_album["folklore"].append("cardigan")
songs_by_album["folklore"].append("august")
songs_by_album["Midnights"].append("Anti-Hero")
songs_by_album["Midnights"].append("Lavender Haze")

print(f"\nSongs by album: {dict(songs_by_album)}")

# Group songs by first letter
songs = ["Love Story", "Anti-Hero", "Lavender Haze", "August", "Shake It Off"]
by_first_letter = defaultdict(list)

for song in songs:
    first_letter = song[0].upper()
    by_first_letter[first_letter].append(song)

print(f"\nSongs by first letter: {dict(by_first_letter)}")

# namedtuple - lightweight classes
Song = namedtuple('Song', ['title', 'album', 'year', 'duration'])
Album = namedtuple('Album', ['name', 'year', 'genre', 'tracks'])

# Create song objects
love_story = Song("Love Story", "Fearless", 2008, 3.55)
anti_hero = Song("Anti-Hero", "Midnights", 2022, 3.20)

print(f"\nSong objects:")
print(f"  {love_story.title} from {love_story.album} ({love_story.year}) - {love_story.duration} min")
print(f"  {anti_hero.title} from {anti_hero.album} ({anti_hero.year}) - {anti_hero.duration} min")

# Create album objects
folklore = Album("folklore", 2020, "Alternative", 16)
midnights = Album("Midnights", 2022, "Pop", 13)

albums = [folklore, midnights]
total_tracks = sum(album.tracks for album in albums)
print(f"\nTotal tracks across albums: {total_tracks}")

# deque - efficient queue operations
concert_queue = deque(["Love Story", "Shake It Off", "Anti-Hero"])

print(f"\nConcert setlist queue: {list(concert_queue)}")

# Add songs to both ends
concert_queue.appendleft("22")  # Add to beginning
concert_queue.append("All Too Well")  # Add to end

print(f"Updated setlist: {list(concert_queue)}")

# Remove from both ends
opening_song = concert_queue.popleft()
closing_song = concert_queue.pop()

print(f"Opening with: {opening_song}")
print(f"Closing with: {closing_song}")
print(f"Middle songs: {list(concert_queue)}")

## Date and Time Operations

Handle dates and times in music data:

In [None]:
from datetime import datetime, timedelta, date
from collections import Counter  # used below for month and weekday counts
import calendar

# Taylor Swift album release dates
album_releases = {
    "Taylor Swift": date(2006, 10, 24),
    "Fearless": date(2008, 11, 11),
    "Speak Now": date(2010, 10, 25),
    "Red": date(2012, 10, 22),
    "1989": date(2014, 10, 27),
    "reputation": date(2017, 11, 10),
    "Lover": date(2019, 8, 23),
    "folklore": date(2020, 7, 24),
    "evermore": date(2020, 12, 11),
    "Midnights": date(2022, 10, 21)
}

print("=== Taylor Swift Release Date Analysis ===")

# Calculate album ages
today = date.today()
album_ages = {}

for album, release_date in album_releases.items():
    age = today - release_date
    album_ages[album] = age.days

# Find oldest and newest albums
oldest_album = max(album_ages.items(), key=lambda x: x[1])
newest_album = min(album_ages.items(), key=lambda x: x[1])

print(f"Oldest album: {oldest_album[0]} ({oldest_album[1]} days old)")
print(f"Newest album: {newest_album[0]} ({newest_album[1]} days old)")

# Analyze release patterns
release_months = [release_date.month for release_date in album_releases.values()]
month_counts = Counter(release_months)

print("\nRelease month frequency:")
for month_num, count in sorted(month_counts.items()):
    month_name = calendar.month_name[month_num]
    print(f"  {month_name}: {count} albums")

# Find gaps between albums
sorted_releases = sorted(album_releases.items(), key=lambda x: x[1])
gaps = []

for i in range(1, len(sorted_releases)):
    prev_album, prev_date = sorted_releases[i-1]
    curr_album, curr_date = sorted_releases[i]
    gap = curr_date - prev_date
    gaps.append((prev_album, curr_album, gap.days))

print("\nGaps between album releases:")
for prev_album, curr_album, days in gaps:
    years = days / 365.25
    print(f"  {prev_album} → {curr_album}: {days} days ({years:.1f} years)")

# Find longest and shortest gaps
longest_gap = max(gaps, key=lambda x: x[2])
shortest_gap = min(gaps, key=lambda x: x[2])

print(f"\nLongest gap: {longest_gap[0]} → {longest_gap[1]} ({longest_gap[2]} days)")
print(f"Shortest gap: {shortest_gap[0]} → {shortest_gap[1]} ({shortest_gap[2]} days)")

# Predict next album date (based on average gap)
average_gap = sum(gap[2] for gap in gaps) / len(gaps)
last_release = max(album_releases.values())
predicted_next = last_release + timedelta(days=int(average_gap))

print(f"\nAverage gap between albums: {average_gap:.0f} days ({average_gap/365.25:.1f} years)")
print(f"Predicted next album date: {predicted_next.strftime('%B %d, %Y')}")

# Day of week analysis
weekday_counts = Counter(release_date.weekday() for release_date in album_releases.values())
weekday_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

print("\nRelease day of week frequency:")
for weekday, count in sorted(weekday_counts.items()):
    print(f"  {weekday_names[weekday]}: {count} albums")

favorite_weekday = max(weekday_counts.items(), key=lambda x: x[1])
print(f"\nTaylor's favorite release day: {weekday_names[favorite_weekday[0]]} ({favorite_weekday[1]} albums)")

## Working with JSON Data

Handle structured data such as API responses and configuration files:

In [None]:
import json
from pprint import pprint

# Sample Taylor Swift data in JSON format (like from an API)
taylor_data = {
    "artist": {
        "name": "Taylor Swift",
        "birth_date": "1989-12-13",
        "genres": ["Country", "Pop", "Alternative", "Folk"],
        "active_years": "2004-present"
    },
    "albums": [
        {
            "title": "folklore",
            "release_date": "2020-07-24",
            "genre": "Alternative",
            "tracks": [
                {"title": "the 1", "duration": 210, "track_number": 1},
                {"title": "cardigan", "duration": 219, "track_number": 2},
                {"title": "the last great american dynasty", "duration": 231, "track_number": 3}
            ],
            "awards": ["Grammy Album of the Year 2021"],
            "certifications": {"US": "Platinum", "UK": "Gold", "Canada": "2x Platinum"}
        },
        {
            "title": "Midnights",
            "release_date": "2022-10-21",
            "genre": "Pop",
            "tracks": [
                {"title": "Lavender Haze", "duration": 202, "track_number": 1},
                {"title": "Anti-Hero", "duration": 200, "track_number": 2},
                {"title": "Midnight Rain", "duration": 174, "track_number": 3}
            ],
            "awards": [],
            "certifications": {"US": "2x Platinum", "UK": "Platinum"}
        }
    ],
    "statistics": {
        "total_albums": 10,
        "grammy_wins": 12,
        "grammy_nominations": 46,
        "total_sales_millions": 200
    }
}

print("=== Working with JSON Data ===")

# Convert to JSON string
json_string = json.dumps(taylor_data, indent=2)
print(f"JSON string length: {len(json_string)} characters")
print("\nFirst 200 characters of JSON:")
print(json_string[:200] + "...")

# Parse JSON (simulate reading from API/file)
parsed_data = json.loads(json_string)
print(f"\nParsed data type: {type(parsed_data)}")

# Extract and analyze data
artist_name = parsed_data["artist"]["name"]
total_albums = len(parsed_data["albums"])
total_tracks = sum(len(album["tracks"]) for album in parsed_data["albums"])

print(f"\nArtist: {artist_name}")
print(f"Albums in dataset: {total_albums}")
print(f"Total tracks: {total_tracks}")

# Analyze track durations
all_durations = []
for album in parsed_data["albums"]:
    for track in album["tracks"]:
        all_durations.append(track["duration"])

avg_duration = sum(all_durations) / len(all_durations)
min_duration = min(all_durations)
max_duration = max(all_durations)

print(f"\nTrack duration analysis:")
print(f"  Average: {avg_duration/60:.2f} minutes")
print(f"  Shortest: {min_duration/60:.2f} minutes")
print(f"  Longest: {max_duration/60:.2f} minutes")

# Find tracks by criteria
long_tracks = []
for album in parsed_data["albums"]:
    for track in album["tracks"]:
        if track["duration"] > 210:  # Longer than 3.5 minutes
            long_tracks.append({
                "title": track["title"],
                "album": album["title"],
                "duration": track["duration"]
            })

print(f"\nTracks longer than 3.5 minutes:")
for track in long_tracks:
    duration_min = track["duration"] / 60
    print(f"  {track['title']} from {track['album']}: {duration_min:.2f} min")

# Certification analysis
all_certifications = {}
for album in parsed_data["albums"]:
    for country, cert in album["certifications"].items():
        if country not in all_certifications:
            all_certifications[country] = []
        all_certifications[country].append(f"{album['title']}: {cert}")

print(f"\nCertifications by country:")
for country, certs in all_certifications.items():
    print(f"  {country}: {', '.join(certs)}")

# Create summary report
def create_album_summary(album_data):
    """Create a summary for an album."""
    total_duration = sum(track["duration"] for track in album_data["tracks"])
    track_count = len(album_data["tracks"])
    avg_track_length = total_duration / track_count if track_count > 0 else 0
    
    return {
        "title": album_data["title"],
        "release_date": album_data["release_date"],
        "genre": album_data["genre"],
        "track_count": track_count,
        "total_duration_minutes": total_duration / 60,
        "avg_track_length_minutes": avg_track_length / 60,
        "has_awards": len(album_data["awards"]) > 0,
        "certification_count": len(album_data["certifications"])
    }

album_summaries = [create_album_summary(album) for album in parsed_data["albums"]]

print(f"\nAlbum summaries:")
for summary in album_summaries:
    print(f"  {summary['title']} ({summary['genre']}):")
    print(f"    {summary['track_count']} tracks, {summary['total_duration_minutes']:.1f} min total")
    print(f"    Avg track: {summary['avg_track_length_minutes']:.2f} min")
    print(f"    Awards: {'Yes' if summary['has_awards'] else 'No'}")

## Math and Statistics

Analyze numerical data in music:

In [None]:
import math
import statistics
from decimal import Decimal, ROUND_HALF_UP

# Song rating data from various sources
song_ratings = {
    "Love Story": [9.2, 8.8, 9.5, 9.0, 8.9, 9.3, 9.1],
    "Shake It Off": [8.5, 8.7, 8.3, 8.9, 8.6, 8.4, 8.8],
    "All Too Well": [9.8, 9.9, 9.7, 9.6, 9.8, 9.9, 9.8],
    "Anti-Hero": [9.1, 8.9, 9.3, 9.2, 9.0, 9.4, 9.1],
    "cardigan": [9.0, 8.8, 9.2, 8.9, 9.1, 8.7, 9.0]
}

print("=== Statistical Analysis of Song Ratings ===")

# Calculate statistics for each song
song_stats = {}
for song, ratings in song_ratings.items():
    stats = {
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
        "mode": statistics.mode(ratings) if len(set(ratings)) < len(ratings) else "No mode",
        "stdev": statistics.stdev(ratings),
        "variance": statistics.variance(ratings),
        "min": min(ratings),
        "max": max(ratings),
        "range": max(ratings) - min(ratings)
    }
    song_stats[song] = stats

# Display statistics
for song, stats in song_stats.items():
    print(f"\n{song}:")
    print(f"  Mean: {stats['mean']:.2f}")
    print(f"  Median: {stats['median']:.2f}")
    print(f"  Std Dev: {stats['stdev']:.3f}")
    print(f"  Range: {stats['range']:.1f} ({stats['min']:.1f} - {stats['max']:.1f})")

# Find best and most consistent songs
best_song = max(song_stats.items(), key=lambda x: x[1]['mean'])
most_consistent = min(song_stats.items(), key=lambda x: x[1]['stdev'])

print(f"\nHighest rated song: {best_song[0]} (avg: {best_song[1]['mean']:.2f})")
print(f"Most consistent ratings: {most_consistent[0]} (std dev: {most_consistent[1]['stdev']:.3f})")

# Chart position analysis with math functions
chart_positions = [4, 1, 1, 20, 2, 8, 3, 1, 5, 12]

# Calculate chart success metrics
avg_position = statistics.mean(chart_positions)
median_position = statistics.median(chart_positions)
geometric_mean = statistics.geometric_mean(chart_positions)  # Good for multiplicative data

print(f"\nChart Position Analysis:")
print(f"Average position: {avg_position:.1f}")
print(f"Median position: {median_position}")
print(f"Geometric mean: {geometric_mean:.2f}")

# Chart success score (lower position = higher score)
success_scores = [1/pos for pos in chart_positions]  # Inverse for success score
avg_success = statistics.mean(success_scores)

print(f"Average success score: {avg_success:.3f}")

# Stream count analysis (using large numbers)
stream_counts = [1_200_000_000, 800_000_000, 650_000_000, 900_000_000, 750_000_000]
song_names = ["Anti-Hero", "Shake It Off", "Love Story", "cardigan", "All Too Well"]

# Total and average streams
total_streams = sum(stream_counts)
avg_streams = statistics.mean(stream_counts)

print(f"\nStreaming Analysis:")
print(f"Total streams: {total_streams:,}")
print(f"Average streams per song: {avg_streams:,.0f}")

# Find outliers using standard deviation
stream_mean = statistics.mean(stream_counts)
stream_stdev = statistics.stdev(stream_counts)
threshold = 1.5 * stream_stdev  # 1.5 standard deviations

outliers = []
for song, streams in zip(song_names, stream_counts):
    if abs(streams - stream_mean) > threshold:
        outliers.append((song, streams, "above" if streams > stream_mean else "below"))

if outliers:
    print(f"\nStream count outliers:")
    for song, streams, direction in outliers:
        print(f"  {song}: {streams:,} ({direction} average)")

# Precise calculations with Decimal for financial data
streaming_rate = Decimal('0.003')  # $0.003 per stream
total_revenue = Decimal('0')

print(f"\nRevenue calculations (${streaming_rate} per stream):")
for song, streams in zip(song_names, stream_counts):
    revenue = Decimal(str(streams)) * streaming_rate
    total_revenue += revenue
    # Round to nearest cent
    rounded_revenue = revenue.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    print(f"  {song}: ${rounded_revenue:,}")

print(f"\nTotal estimated revenue: ${total_revenue.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP):,}")

# Growth rate calculations
quarterly_streams = [100_000_000, 120_000_000, 150_000_000, 180_000_000, 200_000_000]
growth_rates = []

for i in range(1, len(quarterly_streams)):
    growth = (quarterly_streams[i] - quarterly_streams[i-1]) / quarterly_streams[i-1]
    growth_rates.append(growth)

avg_growth_rate = statistics.mean(growth_rates)
print(f"\nQuarterly growth analysis:")
print(f"Average growth rate: {avg_growth_rate:.1%} per quarter")

# Predict next quarter using growth rate
predicted_next = quarterly_streams[-1] * (1 + avg_growth_rate)
print(f"Predicted next quarter: {predicted_next:,.0f} streams")

## Practice Time! 🎵

Let's practice using modules and packages:

In [None]:
# Practice Exercise 1:
# Use the random module to create:
# 1. A function that randomly selects 5 songs from a playlist
# 2. A function that creates a random rating (1-10) for each song
# 3. A function that randomly shuffles and returns the top 3 songs

import random

playlist = [
    "Love Story", "Shake It Off", "All Too Well", "Anti-Hero", "cardigan",
    "22", "ME!", "Blank Space", "Style", "august", "exile", "Lavender Haze"
]

# Your code here:



In [None]:
# Practice Exercise 2:
# Use the collections module to:
# 1. Count how many songs start with each letter
# 2. Group songs by title length (short ≤5 chars, medium 6-10, long >10)
# 3. Create a Song namedtuple and make song objects

from collections import Counter, defaultdict, namedtuple

songs = [
    "Love Story", "Shake It Off", "All Too Well", "Anti-Hero", "22",
    "ME!", "...Ready For It?", "We Are Never Ever Getting Back Together"
]

# Your code here:



In [None]:
# Practice Exercise 3:
# Use datetime and statistics modules to:
# 1. Calculate the average gap between these album releases
# 2. Find which month is most common for releases
# 3. Predict when the next album might be released

from datetime import date
import statistics
import calendar

releases = {
    "1989": date(2014, 10, 27),
    "reputation": date(2017, 11, 10),
    "Lover": date(2019, 8, 23),
    "folklore": date(2020, 7, 24),
    "evermore": date(2020, 12, 11),
    "Midnights": date(2022, 10, 21)
}

# Your code here:



## Real-World Example: Music Data Pipeline

Build a comprehensive data processing system using multiple modules:

In [None]:
import json
import statistics
import math
from datetime import datetime, timedelta
from collections import defaultdict, Counter, namedtuple
from decimal import Decimal, ROUND_HALF_UP
import random

class MusicDataProcessor:
    """
    Comprehensive music data processing system using Python standard library.
    """
    
    def __init__(self):
        self.Track = namedtuple('Track', ['title', 'duration', 'streams', 'rating'])
        self.Album = namedtuple('Album', ['title', 'release_date', 'genre', 'tracks'])
        self.data = []
    
    def load_sample_data(self):
        """Load sample Taylor Swift data."""
        sample_data = {
            "albums": [
                {
                    "title": "folklore",
                    "release_date": "2020-07-24",
                    "genre": "Alternative",
                    "tracks": [
                        {"title": "cardigan", "duration": 219, "streams": 800000000, "rating": 9.2},
                        {"title": "august", "duration": 261, "streams": 650000000, "rating": 8.9},
                        {"title": "exile", "duration": 284, "streams": 400000000, "rating": 9.0}
                    ]
                },
                {
                    "title": "Midnights",
                    "release_date": "2022-10-21",
                    "genre": "Pop",
                    "tracks": [
                        {"title": "Anti-Hero", "duration": 200, "streams": 1200000000, "rating": 9.1},
                        {"title": "Lavender Haze", "duration": 202, "streams": 500000000, "rating": 8.7},
                        {"title": "Midnight Rain", "duration": 174, "streams": 300000000, "rating": 8.5}
                    ]
                }
            ]
        }
        
        # Convert to structured objects
        for album_data in sample_data["albums"]:
            tracks = []
            for track_data in album_data["tracks"]:
                track = self.Track(
                    title=track_data["title"],
                    duration=track_data["duration"],
                    streams=track_data["streams"],
                    rating=track_data["rating"]
                )
                tracks.append(track)
            
            album = self.Album(
                title=album_data["title"],
                release_date=datetime.strptime(album_data["release_date"], "%Y-%m-%d").date(),
                genre=album_data["genre"],
                tracks=tracks
            )
            self.data.append(album)
    
    def analyze_streaming_performance(self):
        """Analyze streaming performance across all tracks."""
        all_streams = []
        track_performance = []
        
        for album in self.data:
            for track in album.tracks:
                all_streams.append(track.streams)
                track_performance.append({
                    'title': track.title,
                    'album': album.title,
                    'streams': track.streams,
                    'rating': track.rating,
                    'duration': track.duration
                })
        
        # Statistical analysis
        stats = {
            'total_streams': sum(all_streams),
            'average_streams': statistics.mean(all_streams),
            'median_streams': statistics.median(all_streams),
            'stdev_streams': statistics.stdev(all_streams),
            'min_streams': min(all_streams),
            'max_streams': max(all_streams)
        }
        
        # Find top performers
        top_tracks = sorted(track_performance, key=lambda x: x['streams'], reverse=True)[:3]
        
        return stats, top_tracks
    
    def calculate_revenue(self, rate_per_stream=0.003):
        """Calculate estimated revenue from streaming."""
        rate = Decimal(str(rate_per_stream))
        album_revenues = {}
        total_revenue = Decimal('0')
        
        for album in self.data:
            album_revenue = Decimal('0')
            for track in album.tracks:
                track_revenue = Decimal(str(track.streams)) * rate
                album_revenue += track_revenue
                total_revenue += track_revenue
            
            album_revenues[album.title] = album_revenue.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
        
        return album_revenues, total_revenue.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    
    def analyze_ratings_vs_streams(self):
        """Analyze correlation between ratings and stream counts."""
        ratings = []
        streams = []
        track_data = []
        
        for album in self.data:
            for track in album.tracks:
                ratings.append(track.rating)
                streams.append(track.streams)
                track_data.append({
                    'title': track.title,
                    'rating': track.rating,
                    'streams': track.streams,
                    'streams_per_rating': track.streams / track.rating
                })
        
        # Calculate Pearson correlation coefficient (statistics.correlation needs Python 3.10+)
        if len(ratings) > 1:
            correlation = statistics.correlation(ratings, streams)
        else:
            correlation = 0
        
        # Find efficiency (streams per rating point)
        most_efficient = max(track_data, key=lambda x: x['streams_per_rating'])
        least_efficient = min(track_data, key=lambda x: x['streams_per_rating'])
        
        return {
            'correlation': correlation,
            'most_efficient': most_efficient,
            'least_efficient': least_efficient,
            'avg_rating': statistics.mean(ratings),
            'rating_stdev': statistics.stdev(ratings) if len(ratings) > 1 else 0
        }
    
    def create_playlist_recommendation(self, criteria=None):
        """Create a recommended playlist based on criteria."""
        all_tracks = []
        for album in self.data:
            for track in album.tracks:
                all_tracks.append({
                    'title': track.title,
                    'album': album.title,
                    'genre': album.genre,
                    'duration': track.duration,
                    'streams': track.streams,
                    'rating': track.rating
                })
        
        # Apply filters if criteria provided
        filtered_tracks = all_tracks
        
        if criteria:
            if 'min_rating' in criteria:
                filtered_tracks = [t for t in filtered_tracks if t['rating'] >= criteria['min_rating']]
            
            if 'max_duration' in criteria:
                filtered_tracks = [t for t in filtered_tracks if t['duration'] <= criteria['max_duration']]
            
            if 'genre' in criteria:
                filtered_tracks = [t for t in filtered_tracks if t['genre'] == criteria['genre']]
        
        # Sort by rating * log10(streams) (popularity score)
        for track in filtered_tracks:
            track['popularity_score'] = track['rating'] * math.log10(track['streams'])
        
        filtered_tracks.sort(key=lambda x: x['popularity_score'], reverse=True)
        
        # Shuffle top tracks for variety
        top_tracks = filtered_tracks[:6]  # Get top 6
        random.shuffle(top_tracks)
        
        return top_tracks[:4]  # Return 4 shuffled tracks
    
    def generate_report(self):
        """Generate comprehensive analysis report."""
        streaming_stats, top_tracks = self.analyze_streaming_performance()
        album_revenues, total_revenue = self.calculate_revenue()
        rating_analysis = self.analyze_ratings_vs_streams()
        
        report = {
            'timestamp': datetime.now().isoformat(),
            'summary': {
                'total_albums': len(self.data),
                'total_tracks': sum(len(album.tracks) for album in self.data),
                'total_streams': streaming_stats['total_streams'],
                'total_revenue': float(total_revenue)
            },
            'streaming_analysis': streaming_stats,
            'top_tracks': top_tracks,
            'revenue_by_album': {k: float(v) for k, v in album_revenues.items()},
            'rating_analysis': rating_analysis
        }
        
        return report

# Create and run the processor
processor = MusicDataProcessor()
processor.load_sample_data()

print("🎵 COMPREHENSIVE MUSIC DATA ANALYSIS 🎵")
print("=" * 50)

# Generate full report
report = processor.generate_report()

print(f"\n📊 SUMMARY:")
print(f"   Albums analyzed: {report['summary']['total_albums']}")
print(f"   Total tracks: {report['summary']['total_tracks']}")
print(f"   Total streams: {report['summary']['total_streams']:,}")
print(f"   Estimated revenue: ${report['summary']['total_revenue']:,.2f}")

print(f"\n🎧 TOP PERFORMING TRACKS:")
for i, track in enumerate(report['top_tracks'], 1):
    print(f"   {i}. {track['title']} ({track['album']}): {track['streams']:,} streams")

print(f"\n💰 REVENUE BY ALBUM:")
for album, revenue in report['revenue_by_album'].items():
    print(f"   {album}: ${revenue:,.2f}")

print(f"\n📈 RATING ANALYSIS:")
rating_data = report['rating_analysis']
print(f"   Rating-Stream Correlation: {rating_data['correlation']:.3f}")
print(f"   Average rating: {rating_data['avg_rating']:.2f}")
print(f"   Most efficient track: {rating_data['most_efficient']['title']}")

# Create sample playlist
playlist = processor.create_playlist_recommendation({'min_rating': 8.5})
print(f"\n🎵 RECOMMENDED PLAYLIST (min rating 8.5):")
for i, track in enumerate(playlist, 1):
    print(f"   {i}. {track['title']} ({track['rating']}/10)")

print(f"\nAnalysis completed at: {report['timestamp']}")

## Key Takeaways

### Import Styles
- **Full module**: `import math` → use `math.sqrt()`
- **Specific items**: `from math import sqrt` → use `sqrt()` directly
- **Multiple items**: `from collections import Counter, defaultdict`
- **All items**: `from module import *` (use sparingly)
- **Aliases**: `import numpy as np` → use `np.array()`
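
To make the styles above concrete, here is a small sketch that uses three of them side by side. The ratings are made-up values, and the `stats` alias is just an illustrative choice.

```python
import math                 # full module: names stay behind the "math." prefix
from math import sqrt       # specific item: sqrt is bound directly
import statistics as stats  # alias: a shorter local name for the module

ratings = [9.5, 8.7, 9.2]  # made-up ratings for illustration

root_via_module = math.sqrt(16)
root_direct = sqrt(16)
mean_via_alias = stats.mean(ratings)

# Both spellings refer to the very same function object
same_function = sqrt is math.sqrt

print(root_via_module, root_direct, same_function)
print(f"{mean_via_alias:.2f}")
```

Whichever style you pick, the module is loaded the same way; the styles only differ in which names end up in your namespace.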

### Essential Standard Library Modules
- **math**: Mathematical functions (`sqrt`, `log`, `ceil`, `floor`)
- **statistics**: Statistical functions (`mean`, `median`, `stdev`)
- **random**: Random number generation (`choice`, `shuffle`, `randint`)
- **datetime**: Date and time operations (`datetime`, `timedelta`, `date`)
- **collections**: Specialized containers (`Counter`, `defaultdict`, `namedtuple`)
- **json**: JSON encoding/decoding (`dumps`, `loads`)
- **decimal**: Precise decimal arithmetic for financial calculations

### Collections Module Highlights
- **Counter**: Count hashable objects automatically
- **defaultdict**: Dictionary with default values for missing keys
- **namedtuple**: Lightweight classes for structured data
- **deque**: Double-ended queue for efficient append/pop operations
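
Two features of these containers that the earlier cells did not show are `deque`'s `maxlen` argument and adding `Counter` objects together. The sketch below uses invented play counts purely for illustration.

```python
from collections import Counter, deque

# Keep only the three most recently played songs; maxlen drops the oldest
recently_played = deque(maxlen=3)
for song in ["Love Story", "22", "cardigan", "Anti-Hero"]:
    recently_played.append(song)

# Counters can be added to merge tallies from two sources (made-up numbers)
plays_monday = Counter({"cardigan": 2, "22": 1})
plays_tuesday = Counter({"cardigan": 1, "Anti-Hero": 4})
total_plays = plays_monday + plays_tuesday

print(list(recently_played))
print(total_plays.most_common(1))
```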

### Best Practices
- Import only what you need to keep the namespace clean
- Use specific imports (`from module import function`) for frequently used items
- Use full module imports (`import module`) when using many functions
- Avoid `from module import *`: it hides where names come from and can silently overwrite names you already defined
- Group imports: standard library, third-party, local modules
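
The "local modules" group refers to modules you write yourself. A module is simply a `.py` file, so a reusable utility module could look like the sketch below. To keep the example runnable in one cell, it writes the file to a temporary folder first; in a real project you would just save the file next to your script. The module name `swift_utils` and the function `seconds_to_minutes` are hypothetical names invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical utility module (normally saved as swift_utils.py)
MODULE_SOURCE = '''
"""Small helpers for song data (illustrative only)."""

def seconds_to_minutes(seconds):
    """Convert a duration in seconds to an 'M:SS' string."""
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes}:{remainder:02d}"
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "swift_utils.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)  # let Python find the new module
    try:
        swift_utils = importlib.import_module("swift_utils")
        formatted = swift_utils.seconds_to_minutes(219)
    finally:
        sys.path.remove(tmp_dir)

print(formatted)
```

With the file saved in your working folder, a plain `import swift_utils` would do the same job without the `tempfile` and `sys.path` steps.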

### When to Use Which Module
- **datetime**: Working with dates, times, durations
- **collections**: When you need specialized containers
- **statistics**: For statistical analysis of numerical data
- **json**: For API data, configuration files, data exchange
- **math**: For mathematical calculations beyond basic arithmetic
- **random**: For simulations, sampling, shuffling
- **decimal**: For financial calculations requiring precision
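
Why reach for `decimal` when money is involved? A quick sketch of the classic float rounding surprise shows the difference. Note that the `Decimal` values are built from strings, because building them from floats would carry the binary rounding error along.

```python
from decimal import Decimal

float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")

float_is_exact = float_total == 0.3
decimal_is_exact = decimal_total == Decimal("0.3")

# A Decimal built from a float inherits the float's inexact value
from_float_matches = Decimal(0.1) == Decimal("0.1")

print(float_is_exact)       # False
print(decimal_is_exact)     # True
print(from_float_matches)   # False
```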

### Performance Considerations
- A module's code runs only once, on first import; later imports reuse the cached module
- Put imports at the top of files for clarity
- `from module import function` skips one attribute lookup per call, which can help slightly in hot loops
- Consider lazy imports for heavy modules used conditionally
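
The points above can be sketched in a few lines. Python keeps already imported modules in `sys.modules`, and a lazy import is just an `import` statement placed inside a function. The function name `median_rating` is made up for this example, and `statistics` stands in for a heavier module.

```python
import sys

import json
first_import = sys.modules["json"]

import json  # repeated on purpose: served from the sys.modules cache
same_module_object = sys.modules["json"] is first_import

def median_rating(ratings):
    """Return the median rating, importing statistics only when called."""
    import statistics  # lazy import: deferred until the first call
    return statistics.median(ratings)

print(same_module_object)
print(median_rating([8.5, 9.8, 9.1]))
```

Lazy imports trade a little readability for faster startup, so they are usually worth it only when the module is slow to load and rarely needed.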

Next up: Files & JSON - where we'll learn to read and write data to files! 🎤