# Lesson 12: Course Review & Final Project Launch

**Session:** Week 3, Sunday (3 hours)  
**Learning Objectives:**
- Consolidate and review all Python fundamentals learned
- Demonstrate mastery through comprehensive exercises
- Launch the capstone final project
- Plan next steps in your data science journey
- Celebrate your incredible transformation from beginner to data scientist!

---

## 🎉 Welcome to Your Final Class!

**Congratulations! You've completed an incredible journey:**

### 📈 Your Transformation
- **3 weeks ago**: Complete Python beginners
- **Today**: Confident data scientists who can build complete analysis systems!

### 🎯 What You've Mastered
- **Week 1**: Python foundations (variables, data types, strings, lists, dictionaries)
- **Week 2**: Program logic (conditionals, loops, functions, code organization) 
- **Week 3**: Real-world skills (file I/O, error handling, data analysis, project development)

### 🏆 What Makes You Special
Unlike many Python courses that focus only on syntax, you've learned:
- **Problem-solving thinking** - How to break down complex challenges
- **Data intuition** - How to find insights in messy, real-world data
- **Professional practices** - Error handling, testing, and documentation
- **Business communication** - Translating technical work into actionable insights

**Today we review, consolidate, and launch your final capstone project!** 🚀

## 📚 Complete Course Review

Let's systematically review everything you've learned:

In [None]:
# WEEK 1 FUNDAMENTALS REVIEW
print("📊 WEEK 1: PYTHON FUNDAMENTALS REVIEW")
print("=" * 50)

# 1. Variables and Data Types
print("\n🏗️ VARIABLES & DATA TYPES:")

# Basic data types
name = "Data Scientist"  # String
age = 25                 # Integer
salary = 85000.50        # Float
is_employed = True       # Boolean
skills = None            # NoneType

print(f"Name: {name} (type: {type(name).__name__})")
print(f"Age: {age} (type: {type(age).__name__})")
print(f"Salary: ${salary:,.2f} (type: {type(salary).__name__})")
print(f"Employed: {is_employed} (type: {type(is_employed).__name__})")
print(f"Skills: {skills} (type: {type(skills).__name__})")

# Type conversion mastery
print("\n🔄 TYPE CONVERSION MASTERY:")
score_str = "95"
score_int = int(score_str)
score_float = float(score_str)
print(f"String '{score_str}' → int: {score_int}, float: {score_float}")

# 2. String Operations
print("\n📝 STRING OPERATIONS:")
message = "  Python Data Science  "
print(f"Original: '{message}'")
print(f"Cleaned: '{message.strip().lower()}'")
print(f"Title case: '{message.strip().title()}'")
print(f"Contains 'data': {'data' in message.lower()}")
print(f"Word count: {len(message.split())}")

# F-strings mastery
first_name = "Alice"
last_name = "Johnson"
score = 94.7
print(f"\nF-string mastery: {first_name} {last_name} scored {score:.1f}%")
print(f"Formatted: {score:08.2f} (padded with zeros)")

print("\n✅ Week 1 fundamentals: MASTERED!")
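
The conversions above assume the string always holds a valid number. The following is a minimal sketch of guarding a conversion with `try`/`except`; the helper name `to_number` and its `default` parameter are illustrative assumptions, not part of the course code.

```python
def to_number(text, default=None):
    """Convert text to int or float; return default if it is not numeric."""
    cleaned = text.strip()
    try:
        return int(cleaned)          # Try the stricter conversion first
    except ValueError:
        pass
    try:
        return float(cleaned)        # Fall back to float for values like "94.7"
    except ValueError:
        return default               # Not a number at all


print(to_number("95"))
print(to_number(" 94.7 "))
print(to_number("n/a", 0))
```

Trying `int` before `float` keeps whole numbers as integers, which matters when the value is later used as a count or an index.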

In [None]:
# WEEK 1 DATA STRUCTURES REVIEW
print("\n📊 WEEK 1: DATA STRUCTURES REVIEW")
print("=" * 50)

# 3. Lists Mastery
print("\n📋 LISTS MASTERY:")
students = ['Alice', 'Bob', 'Charlie', 'Diana']
grades = [95, 87, 92, 89]

# List operations
print(f"Students: {students}")
print(f"Grades: {grades}")
print(f"First student: {students[0]}, Last grade: {grades[-1]}")
print(f"First 2 students: {students[:2]}")
print(f"Average grade: {sum(grades)/len(grades):.1f}")

# List methods
new_students = students.copy()
new_students.append('Eve')
new_students.insert(1, 'Frank')
print(f"Updated list: {new_students}")
print(f"Bob's position: {new_students.index('Bob')}")

# List comprehension preview
high_grades = [grade for grade in grades if grade >= 90]
print(f"High grades (≥90): {high_grades}")

# 4. Dictionaries Mastery
print("\n🗃️ DICTIONARIES MASTERY:")
student_data = {
    'name': 'Alice Johnson',
    'age': 22,
    'major': 'Data Science',
    'grades': [95, 88, 92, 90],
    'contact': {
        'email': 'alice@university.edu',
        'phone': '555-0123'
    }
}

# Dictionary operations
print(f"Student: {student_data['name']}")
print(f"Email: {student_data['contact']['email']}")
print(f"Average grade: {sum(student_data['grades'])/len(student_data['grades']):.1f}")

# Dictionary methods
print(f"\nAll keys: {list(student_data.keys())}")
print(f"Has phone: {'phone' in student_data['contact']}")

# Adding new data
student_data['gpa'] = 3.85
student_data['skills'] = ['Python', 'SQL', 'Statistics']
print(f"Updated student data keys: {list(student_data.keys())}")

# 5. Sets (bonus)
print("\n🎯 SETS (BONUS):")
python_students = {'Alice', 'Bob', 'Charlie', 'Diana'}
sql_students = {'Bob', 'Diana', 'Eve', 'Frank'}

print(f"Python students: {python_students}")
print(f"SQL students: {sql_students}")
print(f"Both skills: {python_students & sql_students}")
print(f"Any skill: {python_students | sql_students}")
print(f"Python only: {python_students - sql_students}")

print("\n✅ Week 1 data structures: MASTERED!")
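
Slicing `students[:2]` returns the first two names in list order, not the two highest scorers. The sketch below assumes `students` and `grades` are parallel lists, as in the cell above, and ranks by grade using `zip` and `sorted`. The names `ranked`, `top_two`, and `grade_by_student` are illustrative.

```python
students = ['Alice', 'Bob', 'Charlie', 'Diana']
grades = [95, 87, 92, 89]

# Pair each student with their grade, then sort the pairs by grade, highest first
ranked = sorted(zip(students, grades), key=lambda pair: pair[1], reverse=True)
top_two = [name for name, _ in ranked[:2]]
print(f"Top 2 by grade: {top_two}")

# dict(zip(...)) builds a name -> grade lookup from the same two lists
grade_by_student = dict(zip(students, grades))

# .get() returns a fallback instead of raising KeyError for unknown names
print(grade_by_student.get('Eve', 'no grade recorded'))
```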

In [None]:
# WEEK 2 PROGRAM LOGIC REVIEW
print("\n🧠 WEEK 2: PROGRAM LOGIC REVIEW")
print("=" * 50)

# 6. Conditionals Mastery
print("\n🚦 CONDITIONALS MASTERY:")

def evaluate_student(name, grade, attendance):
    """Comprehensive student evaluation using conditionals"""
    
    # Grade evaluation
    if grade >= 95:
        grade_level = "Exceptional"
        letter = "A+"
    elif grade >= 90:
        grade_level = "Excellent"
        letter = "A"
    elif grade >= 80:
        grade_level = "Good"
        letter = "B"
    elif grade >= 70:
        grade_level = "Satisfactory"
        letter = "C"
    else:
        grade_level = "Needs Improvement"
        letter = "F"
    
    # Attendance evaluation
    if attendance >= 95:
        attendance_status = "Perfect"
    elif attendance >= 85:
        attendance_status = "Good"
    else:
        attendance_status = "Concerning"
    
    # Combined evaluation with complex conditions
    if grade >= 90 and attendance >= 90:
        overall_status = "Dean's List"
    elif grade >= 80 and attendance >= 80:
        overall_status = "Good Standing"
    elif grade < 70 or attendance < 75:
        overall_status = "Academic Probation"
    else:
        overall_status = "Satisfactory"
    
    return {
        'name': name,
        'grade': grade,
        'letter': letter,
        'grade_level': grade_level,
        'attendance': attendance,
        'attendance_status': attendance_status,
        'overall_status': overall_status
    }

# Test conditional logic
test_students = [
    ('Alice', 96, 98),
    ('Bob', 78, 85),
    ('Charlie', 65, 70)
]

for name, grade, attendance in test_students:
    result = evaluate_student(name, grade, attendance)
    print(f"{result['name']}: {result['letter']} ({result['grade']}%), {result['attendance']}% attendance → {result['overall_status']}")

print("\n✅ Conditionals mastery: Complex decision-making achieved!")
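
The letter-grade `if`/`elif` chain can also be written as a lookup table, which keeps the cutoffs in one place. This is a sketch of that alternative using the standard library `bisect` module, with the same cutoffs as `evaluate_student`. The names `CUTOFFS`, `LETTERS`, and `letter_for` are hypothetical.

```python
from bisect import bisect_right

# Cutoffs must be sorted ascending; LETTERS has one more entry than CUTOFFS
CUTOFFS = [70, 80, 90, 95]
LETTERS = ["F", "C", "B", "A", "A+"]


def letter_for(grade):
    """Map a numeric grade to a letter by counting how many cutoffs it meets."""
    return LETTERS[bisect_right(CUTOFFS, grade)]


for grade in (96, 90, 78, 65):
    print(grade, letter_for(grade))
```

`bisect_right` counts the cutoffs that are less than or equal to the grade, so a grade exactly on a boundary (for example 90) lands in the higher band, matching the `>=` comparisons in the original chain.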

In [None]:
# WEEK 2 LOOPS REVIEW
print("\n🔄 WEEK 2: LOOPS MASTERY REVIEW")
print("=" * 50)

# 7. For Loops Mastery
print("\n🔁 FOR LOOPS MASTERY:")

# Sales data for analysis
sales_data = [
    {'product': 'Laptop', 'price': 999.99, 'quantity': 5, 'category': 'Electronics'},
    {'product': 'Mouse', 'price': 29.99, 'quantity': 15, 'category': 'Electronics'},
    {'product': 'Book', 'price': 19.99, 'quantity': 8, 'category': 'Education'},
    {'product': 'Desk', 'price': 299.99, 'quantity': 3, 'category': 'Furniture'},
]

# Pattern 1: Accumulator pattern
total_revenue = 0
total_items = 0

for item in sales_data:
    item_revenue = item['price'] * item['quantity']
    total_revenue += item_revenue
    total_items += item['quantity']
    print(f"{item['product']}: ${item_revenue:,.2f} revenue ({item['quantity']} units)")

print(f"\nTotal Revenue: ${total_revenue:,.2f}")
print(f"Total Items Sold: {total_items}")
print(f"Average Revenue per Product: ${total_revenue/len(sales_data):,.2f}")

# Pattern 2: Grouping pattern
categories = {}
for item in sales_data:
    category = item['category']
    if category not in categories:
        categories[category] = {'revenue': 0, 'items': 0}
    
    categories[category]['revenue'] += item['price'] * item['quantity']
    categories[category]['items'] += item['quantity']

print("\nRevenue by Category:")
for category, stats in categories.items():
    print(f"{category}: ${stats['revenue']:,.2f} ({stats['items']} items)")

# Pattern 3: Nested loops for comparison
print("\nPrice Comparison Matrix:")
for i, item1 in enumerate(sales_data):
    for j, item2 in enumerate(sales_data):
        if i < j:  # Avoid duplicate comparisons
            price_diff = abs(item1['price'] - item2['price'])
            print(f"{item1['product']} vs {item2['product']}: ${price_diff:.2f} difference")

print("\n✅ For loops mastery: Complex data processing achieved!")
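
The grouping and pairwise-comparison patterns above can also be expressed with standard library helpers. The sketch below repeats the same `sales_data` so it runs on its own, and uses `collections.defaultdict` and `itertools.combinations`. The names `revenue_by_category` and `pairs` are illustrative.

```python
from collections import defaultdict
from itertools import combinations

sales_data = [
    {'product': 'Laptop', 'price': 999.99, 'quantity': 5, 'category': 'Electronics'},
    {'product': 'Mouse', 'price': 29.99, 'quantity': 15, 'category': 'Electronics'},
    {'product': 'Book', 'price': 19.99, 'quantity': 8, 'category': 'Education'},
    {'product': 'Desk', 'price': 299.99, 'quantity': 3, 'category': 'Furniture'},
]

# defaultdict(float) starts every new key at 0.0, so no "if key not in" check is needed
revenue_by_category = defaultdict(float)
for item in sales_data:
    revenue_by_category[item['category']] += item['price'] * item['quantity']

for category, revenue in revenue_by_category.items():
    print(f"{category}: ${revenue:,.2f}")

# combinations(..., 2) yields each unordered pair once, replacing the i < j check
pairs = list(combinations(sales_data, 2))
for item1, item2 in pairs:
    diff = abs(item1['price'] - item2['price'])
    print(f"{item1['product']} vs {item2['product']}: ${diff:.2f} difference")
```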

In [None]:
# WEEK 2 FUNCTIONS REVIEW
print("\n⚙️ WEEK 2: FUNCTIONS MASTERY REVIEW")
print("=" * 50)

# 8. Functions Mastery
print("\n🔧 FUNCTIONS MASTERY:")

# Basic function with parameters and return
def calculate_grade_stats(grades):
    """
    Calculate comprehensive statistics for a list of grades
    
    Args:
        grades (list): List of numeric grades
    
    Returns:
        dict: Dictionary containing grade statistics
    """
    if not grades:
        return None
    
    stats = {
        'count': len(grades),
        'average': sum(grades) / len(grades),
        'highest': max(grades),
        'lowest': min(grades),
        'passing_count': sum(1 for grade in grades if grade >= 70)
    }
    
    stats['passing_rate'] = (stats['passing_count'] / stats['count']) * 100
    stats['range'] = stats['highest'] - stats['lowest']
    
    return stats

# Function with default parameters
def format_grade_report(student_name, grade, attendance=100, extra_credit=0):
    """
    Format a comprehensive grade report
    
    Args:
        student_name (str): Student's name
        grade (float): Base grade
        attendance (float, optional): Attendance percentage. Defaults to 100.
        extra_credit (float, optional): Extra credit points. Defaults to 0.
    
    Returns:
        str: Formatted grade report
    """
    final_grade = min(100, grade + extra_credit)
    
    # Attendance adjustment
    if attendance < 80:
        final_grade *= 0.95  # 5% penalty for poor attendance
    
    report = f"""
    GRADE REPORT FOR {student_name.upper()}
    ================================
    Base Grade: {grade}%
    Extra Credit: +{extra_credit}%
    Attendance: {attendance}%
    Final Grade: {final_grade:.1f}%
    Status: {'PASS' if final_grade >= 70 else 'FAIL'}
    """
    
    return report.strip()

# Function with *args and **kwargs
def analyze_multiple_classes(*class_grades, **options):
    """
    Analyze grades across multiple classes
    
    Args:
        *class_grades: Variable number of grade lists
        **options: Optional parameters (show_details, min_passing_grade)
    
    Returns:
        dict: Analysis results
    """
    show_details = options.get('show_details', True)
    min_passing = options.get('min_passing_grade', 70)
    
    results = {
        'total_classes': len(class_grades),
        'class_stats': [],
        'overall_average': 0,
        'total_students': 0
    }
    
    all_grades = []
    
    for i, grades in enumerate(class_grades, 1):
        class_stats = calculate_grade_stats(grades)
        if class_stats:
            class_stats['class_number'] = i
            class_stats['passing_rate_custom'] = (sum(1 for g in grades if g >= min_passing) / len(grades)) * 100
            results['class_stats'].append(class_stats)
            all_grades.extend(grades)
            
            if show_details:
                print(f"Class {i}: Avg {class_stats['average']:.1f}%, {class_stats['passing_rate_custom']:.1f}% passing")
    
    if all_grades:
        results['overall_average'] = sum(all_grades) / len(all_grades)
        results['total_students'] = len(all_grades)
    
    return results

# Test functions
print("Testing Function Mastery:")

# Test basic function
sample_grades = [95, 87, 92, 78, 85, 90, 88]
stats = calculate_grade_stats(sample_grades)
print(f"\nGrade Statistics: Average {stats['average']:.1f}%, {stats['passing_rate']:.1f}% passing")

# Test function with defaults
report = format_grade_report("Alice Johnson", 88, attendance=95, extra_credit=5)
print(f"\nSample Report:\n{report}")

# Test function with *args/**kwargs
class1_grades = [95, 87, 92, 78, 85]
class2_grades = [88, 90, 76, 82, 94]
class3_grades = [91, 89, 85, 88, 92]

analysis = analyze_multiple_classes(class1_grades, class2_grades, class3_grades, 
                                  show_details=True, min_passing_grade=75)
print(f"\nOverall Analysis: {analysis['total_students']} students across {analysis['total_classes']} classes")
print(f"Overall Average: {analysis['overall_average']:.1f}%")

print("\n✅ Functions mastery: Professional code organization achieved!")
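
Because `**options` accepts any keyword, a misspelled option such as `show_detail=True` is silently ignored. The sketch below shows keyword-only parameters as one way to catch that mistake. `summarize_classes` is a hypothetical, simplified stand-in for `analyze_multiple_classes`, not a replacement for it.

```python
def summarize_classes(*class_grades, show_details=True, min_passing_grade=70):
    """Return the passing rate (in percent) for each non-empty class.

    Parameters after *class_grades can only be passed by keyword, and an
    unknown keyword raises TypeError instead of being ignored.
    """
    rates = []
    for number, grades in enumerate(class_grades, 1):
        if not grades:
            continue  # Skip empty classes rather than dividing by zero
        passing = sum(1 for g in grades if g >= min_passing_grade)
        rate = passing / len(grades) * 100
        rates.append(rate)
        if show_details:
            print(f"Class {number}: {rate:.1f}% passing")
    return rates


rates = summarize_classes([95, 87, 92, 78, 85], [88, 90, 76, 82, 94],
                          min_passing_grade=75)

try:
    summarize_classes([90, 60], show_detail=False)  # Typo in the option name
except TypeError as error:
    print(f"Caught: {error}")
```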

In [None]:
# WEEK 3 REAL-WORLD SKILLS REVIEW
print("\n🌍 WEEK 3: REAL-WORLD SKILLS REVIEW")
print("=" * 50)

# 9. File I/O and Error Handling Mastery
print("\n📁 FILE I/O & ERROR HANDLING MASTERY:")

import csv
import json
from datetime import datetime

def safe_data_processor(filename, output_format='json'):
    """
    Safely process data file with comprehensive error handling
    
    Args:
        filename (str): Input file path
        output_format (str): Output format ('json' or 'csv')
    
    Returns:
        dict: Processing results with success/error information
    """
    results = {
        'success': False,
        'records_processed': 0,
        'errors': [],
        'output_file': None,
        'processing_time': None
    }
    
    start_time = datetime.now()
    
    try:
        # Attempt to read the file
        processed_data = []
        
        with open(filename, 'r', newline='') as file:  # newline='' is recommended by the csv module
            # Try to determine file type
            first_line = file.readline()
            file.seek(0)  # Reset to beginning
            
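            # JSON also contains commas, so rule it out before assuming CSV
            if ',' in first_line and not first_line.lstrip().startswith(('{', '[')):  # Likely CSV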
                try:
                    reader = csv.DictReader(file)
                    for row_num, row in enumerate(reader, 1):
                        try:
                            # Process each row with validation
                            if all(value.strip() for value in row.values()):
                                processed_data.append(row)
                                results['records_processed'] += 1
                            else:
                                results['errors'].append(f"Row {row_num}: Empty fields found")
                        except Exception as e:
                            results['errors'].append(f"Row {row_num}: {str(e)}")
                            
                except csv.Error as e:
                    results['errors'].append(f"CSV parsing error: {str(e)}")
                    return results
                    
            else:  # Try JSON
                try:
                    data = json.load(file)
                    if isinstance(data, list):
                        processed_data = data
                        results['records_processed'] = len(data)
                    else:
                        processed_data = [data]
                        results['records_processed'] = 1
                except json.JSONDecodeError as e:
                    results['errors'].append(f"JSON parsing error: {str(e)}")
                    return results
        
        # Save processed data
        if processed_data:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            
            if output_format == 'json':
                output_filename = f'processed_data_{timestamp}.json'
                with open(output_filename, 'w') as outfile:
                    json.dump({
                        'processing_info': {
                            'source_file': filename,
                            'processed_at': datetime.now().isoformat(),
                            'record_count': len(processed_data)
                        },
                        'data': processed_data
                    }, outfile, indent=2)
                    
            else:  # CSV output
                output_filename = f'processed_data_{timestamp}.csv'
                if processed_data and isinstance(processed_data[0], dict):
                    with open(output_filename, 'w', newline='') as outfile:
                        fieldnames = list(processed_data[0].keys())
                        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
                        writer.writeheader()
                        writer.writerows(processed_data)
            
            results['output_file'] = output_filename
            results['success'] = True
        
    except FileNotFoundError:
        results['errors'].append(f"File '{filename}' not found")
    except PermissionError:
        results['errors'].append(f"Permission denied accessing '{filename}'")
    except Exception as e:
        results['errors'].append(f"Unexpected error: {str(e)}")
    
    finally:
        end_time = datetime.now()
        results['processing_time'] = (end_time - start_time).total_seconds()
    
    return results

# Create sample data for testing
sample_data = [
    {'name': 'Alice', 'score': 95, 'subject': 'Math'},
    {'name': 'Bob', 'score': 87, 'subject': 'Math'},
    {'name': 'Charlie', 'score': 92, 'subject': 'Math'}
]

# Create test CSV file
with open('test_data.csv', 'w', newline='') as file:
    fieldnames = ['name', 'score', 'subject']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sample_data)

# Test file processing
print("Testing File Processing:")
result = safe_data_processor('test_data.csv', 'json')

if result['success']:
    print(f"✅ Successfully processed {result['records_processed']} records")
    print(f"✅ Output saved to: {result['output_file']}")
    print(f"✅ Processing time: {result['processing_time']:.3f} seconds")
else:
    print("❌ Processing failed:")
    for error in result['errors']:
        print(f"   • {error}")

# Test error handling
print("\nTesting Error Handling:")
error_result = safe_data_processor('nonexistent_file.csv')
print(f"Error handling test: {len(error_result['errors'])} errors caught gracefully")

print("\n✅ File I/O & Error Handling mastery: Production-ready code achieved!")
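
`csv.DictReader` returns every field as a string, so a score written as `95` is read back as `'95'`. The sketch below converts the `score` column while reading, and uses a temporary directory so no test files are left behind. `load_scores` and the `scores.csv` file name are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def load_scores(path):
    """Read a CSV with a 'score' column, converting each score to int.

    Returns (rows, errors): valid rows as dicts, plus one message per bad row.
    """
    rows, errors = [], []
    with open(path, 'r', newline='') as file:
        for row_num, row in enumerate(csv.DictReader(file), 1):
            try:
                row['score'] = int(row['score'])
                rows.append(row)
            except (KeyError, TypeError, ValueError) as error:
                errors.append(f"Row {row_num}: {error!r}")
    return rows, errors


# Write a small file with one valid row and one row with a non-numeric score
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'scores.csv'
    path.write_text("name,score,subject\nAlice,95,Math\nBob,abc,Math\n")
    rows, errors = load_scores(path)

print(rows)
print(errors)
```

Catching only `KeyError`, `TypeError`, and `ValueError` keeps unexpected problems visible, whereas a bare `except Exception` would hide them.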

In [None]:
# WEEK 3 DATA ANALYSIS MASTERY REVIEW
print("\n📊 WEEK 3: DATA ANALYSIS MASTERY REVIEW")
print("=" * 50)

# 10. Complete Data Analysis Pipeline
print("\n🔬 COMPLETE DATA ANALYSIS PIPELINE:")

class DataAnalysisPipeline:
    """
    Complete data analysis pipeline demonstrating all skills
    """
    
    def __init__(self, name="Data Analysis"):
        self.name = name
        self.raw_data = []
        self.clean_data = []
        self.analysis_results = {}
        self.insights = []
        self.errors = []
    
    def load_data(self, data_source):
        """Load data from various sources"""
        try:
            if isinstance(data_source, str):  # File path
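                with open(data_source, 'r', newline='') as file:
                    if data_source.endswith('.json'):
                        self.raw_data = json.load(file)
                    elif data_source.endswith('.csv'):
                        reader = csv.DictReader(file)
                        self.raw_data = list(reader)
                    else:
                        # Fail loudly instead of silently keeping the old data
                        raise ValueError("Unsupported file type (expected .json or .csv)")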
            elif isinstance(data_source, list):  # Direct data
                self.raw_data = data_source
            else:
                raise ValueError("Unsupported data source type")
                
            return len(self.raw_data)
            
        except Exception as e:
            self.errors.append(f"Data loading error: {str(e)}")
            return 0
    
    def clean_data(self):
        """Clean and validate data"""
        self.clean_data = []
        
        for i, record in enumerate(self.raw_data):
            try:
                # Skip empty or invalid records
                if not record or not isinstance(record, dict):
                    self.errors.append(f"Record {i}: Invalid format")
                    continue
                
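                # Clean each field; reject the whole record if any field is invalid
                clean_record = {}
                is_valid = True
                for key, value in record.items():
                    if isinstance(value, str):
                        cleaned_value = value.strip().title()
                        if not cleaned_value:  # Reject records with missing text fields
                            self.errors.append(f"Record {i}: Missing value for {key}")
                            is_valid = False
                            break
                        clean_record[key] = cleaned_value
                    elif isinstance(value, (int, float)):
                        if value < 0:  # Reject negative values for this example
                            self.errors.append(f"Record {i}: Negative value for {key}")
                            is_valid = False
                            break
                        clean_record[key] = value
                    else:
                        clean_record[key] = value
                
                if is_valid and clean_record:  # Only add fully valid records
                    self.clean_data.append(clean_record)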
                    
            except Exception as e:
                self.errors.append(f"Record {i}: Cleaning error - {str(e)}")
        
        return len(self.clean_data)
    
    def analyze(self):
        """Perform comprehensive analysis"""
        if not self.clean_data:
            self.errors.append("No clean data available for analysis")
            return {}
        
        results = {
            'record_count': len(self.clean_data),
            'data_quality': {
                'clean_records': len(self.clean_data),
                'total_records': len(self.raw_data),
                'error_count': len(self.errors),
                'quality_score': (len(self.clean_data) / len(self.raw_data)) * 100 if self.raw_data else 0
            }
        }
        
        # Analyze numeric fields
        numeric_fields = []
        for record in self.clean_data:
            for key, value in record.items():
                if isinstance(value, (int, float)) and key not in numeric_fields:
                    numeric_fields.append(key)
        
        for field in numeric_fields:
            values = [record[field] for record in self.clean_data if field in record]
            if values:
                results[f'{field}_stats'] = {
                    'count': len(values),
                    'sum': sum(values),
                    'mean': sum(values) / len(values),
                    'min': min(values),
                    'max': max(values),
                    'range': max(values) - min(values)
                }
        
        # Analyze categorical fields
        categorical_fields = []
        for record in self.clean_data:
            for key, value in record.items():
                if isinstance(value, str) and key not in categorical_fields:
                    categorical_fields.append(key)
        
        from collections import Counter  # Import once, before the loop
        
        for field in categorical_fields:
            values = [record[field] for record in self.clean_data if field in record]
            if values:
                counts = Counter(values)
                results[f'{field}_distribution'] = {
                    'unique_values': len(counts),
                    'most_common': counts.most_common(3),
                    'total_count': len(values)
                }
        
        self.analysis_results = results
        return results
    
    def generate_insights(self):
        """Generate business insights from analysis"""
        insights = []
        
        if not self.analysis_results:
            insights.append("No analysis results available for insight generation")
            return insights
        
        # Data quality insights
        quality_score = self.analysis_results['data_quality']['quality_score']
        if quality_score >= 95:
            insights.append(f"✅ Excellent data quality ({quality_score:.1f}%) - analysis is highly reliable")
        elif quality_score >= 85:
            insights.append(f"⚠️ Good data quality ({quality_score:.1f}%) - minor data issues detected")
        else:
            insights.append(f"❌ Poor data quality ({quality_score:.1f}%) - results should be interpreted carefully")
        
        # Generate insights for numeric fields
        for key, value in self.analysis_results.items():
            if key.endswith('_stats'):
                field_name = key.replace('_stats', '')
                if value['range'] > value['mean'] * 2:
                    insights.append(f"📊 High variability in {field_name} (range: {value['range']:.1f}, mean: {value['mean']:.1f})")
                
                if value['max'] > value['mean'] * 3:
                    insights.append(f"🔍 Potential outliers detected in {field_name} (max: {value['max']:.1f})")
        
        # Generate insights for categorical fields
        for key, value in self.analysis_results.items():
            if key.endswith('_distribution'):
                field_name = key.replace('_distribution', '')
                if value['most_common']:
                    top_category, count = value['most_common'][0]
                    percentage = (count / value['total_count']) * 100
                    if percentage > 50:
                        insights.append(f"📈 {field_name} heavily concentrated: '{top_category}' represents {percentage:.1f}% of data")
        
        self.insights = insights
        return insights
    
    def generate_report(self):
        """Generate comprehensive analysis report"""
        report = f"""
{'='*60}
{self.name.upper()} - ANALYSIS REPORT
{'='*60}

📊 DATA OVERVIEW:
   Records Loaded: {len(self.raw_data)}
   Clean Records: {len(self.clean_data)}
   Data Quality: {self.analysis_results.get('data_quality', {}).get('quality_score', 0):.1f}%
   Errors Found: {len(self.errors)}

🔍 KEY INSIGHTS:"""
        
        for insight in self.insights:
            report += f"\n   • {insight}"
        
        if self.errors:
            report += "\n\n⚠️ DATA QUALITY ISSUES:"
            for i, error in enumerate(self.errors[:5], 1):  # Show first 5 errors
                report += f"\n   {i}. {error}"
            if len(self.errors) > 5:
                report += f"\n   ... and {len(self.errors) - 5} more issues"
        
        report += "\n\n📈 DETAILED STATISTICS:"
        for key, value in self.analysis_results.items():
            if isinstance(value, dict) and 'mean' in value:
                field_name = key.replace('_stats', '').title()
                report += f"\n   {field_name}: avg={value['mean']:.1f}, range={value['min']:.1f}-{value['max']:.1f}"
        
        report += "\n" + "="*60
        return report.strip()

# Test the complete pipeline
print("Testing Complete Data Analysis Pipeline:")

# Create sample data with quality issues
sample_student_data = [
    {'name': 'alice johnson', 'score': 95, 'subject': 'math', 'attendance': 98},
    {'name': 'BOB SMITH', 'score': 87, 'subject': 'MATH', 'attendance': 85},
    {'name': '  charlie brown  ', 'score': 92, 'subject': 'math', 'attendance': 90},
    {'name': '', 'score': 78, 'subject': 'math', 'attendance': 95},  # Missing name
    {'name': 'diana prince', 'score': -5, 'subject': 'math', 'attendance': 88},  # Invalid score
    {'name': 'eve wilson', 'score': 91, 'subject': 'math', 'attendance': 92},
]

# Run complete analysis
pipeline = DataAnalysisPipeline("Student Performance Analysis")
records_loaded = pipeline.load_data(sample_student_data)
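# Note: __init__ assigns a list to self.clean_data, which shadows the clean_data()
# method on the instance, so pipeline.clean_data() would raise a TypeError.
# Calling the method through the class avoids the name clash.
clean_records = DataAnalysisPipeline.clean_data(pipeline)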
analysis = pipeline.analyze()
insights = pipeline.generate_insights()
report = pipeline.generate_report()

print("\nPipeline Results:")
print(f"Raw records: {records_loaded}, Clean records: {clean_records}")
print(f"Quality score: {analysis['data_quality']['quality_score']:.1f}%")
print(f"Generated {len(insights)} insights")

print(f"\n{report}")

print("\n✅ Complete Data Analysis Pipeline: MASTERED!")
print("🎓 You can now build production-ready data analysis systems!")
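
The `max > mean * 3` check in `generate_insights` is a rough heuristic. As one possible refinement, the sketch below flags values that lie more than a chosen number of standard deviations from the mean, using the standard library `statistics` module. The function name `find_outliers`, the default threshold of 2.0, and the sample scores are assumptions made for illustration.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []  # stdev needs at least two data points
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []  # All values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


scores = [95, 87, 92, 78, 91, 88, 90, 5]
outliers = find_outliers(scores)
print(f"Outliers: {outliers}")
```

With small samples a single extreme value inflates the standard deviation, so the threshold usually needs tuning for the dataset at hand.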

## 🏆 Mastery Assessment Challenge

**Prove your mastery with this comprehensive challenge:**

In [None]:
# COMPREHENSIVE MASTERY CHALLENGE
print("🏆 COMPREHENSIVE MASTERY CHALLENGE")
print("=" * 50)

print("""
🎯 THE ULTIMATE PYTHON CHALLENGE:

Build a COMPLETE DATA SCIENCE SYSTEM in 30 minutes that:

✅ WEEK 1 SKILLS:
   • Uses all major data types (strings, ints, floats, lists, dicts)
   • Demonstrates string processing and formatting
   • Shows mastery of list and dictionary operations

✅ WEEK 2 SKILLS:
   • Uses complex conditionals for business logic
   • Implements loops for data processing
   • Organizes code with well-designed functions

✅ WEEK 3 SKILLS:
   • Handles file I/O with error management
   • Processes real data with cleaning/validation
   • Generates actionable business insights

🏗️ SYSTEM REQUIREMENTS:
   1. Load data from CSV file
   2. Clean and validate data quality
   3. Perform statistical analysis
   4. Generate business recommendations
   5. Export results to JSON
   6. Handle all errors gracefully
   7. Include comprehensive documentation

📊 SAMPLE SCENARIO: E-COMMERCE SALES ANALYSIS
   • Analyze product sales performance
   • Identify top customers and products
   • Calculate revenue trends
   • Generate inventory recommendations
   • Predict future sales patterns
""")

# Create challenge dataset
import csv
import random
from datetime import datetime, timedelta

def create_challenge_dataset():
    """Create the challenge dataset"""
    print("\n🔧 Creating challenge dataset...")
    
    products = ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Tablet', 'Phone', 'Headphones']
    customers = ['Alice Smith', 'Bob Johnson', 'Charlie Brown', 'Diana Prince', 'Eve Wilson']
    
    challenge_data = []
    base_date = datetime.now() - timedelta(days=30)
    
    for i in range(50):  # 50 sales records
        date = (base_date + timedelta(days=random.randint(0, 30))).strftime('%Y-%m-%d')
        customer = random.choice(customers)
        product = random.choice(products)
        quantity = random.randint(1, 5)
        price = random.uniform(20, 1000)
        
        # Add some data quality issues
        if i % 10 == 0:  # Every 10th record has issues
            if i % 20 == 0:
                customer = ''  # Missing customer
            else:
                quantity = -1  # Invalid quantity
        
        challenge_data.append({
            'date': date,
            'customer': customer,
            'product': product,
            'quantity': quantity,
            'price': round(price, 2),
            'total': round(quantity * price, 2)
        })
    
    # Save challenge dataset
    with open('challenge_sales_data.csv', 'w', newline='') as file:
        fieldnames = ['date', 'customer', 'product', 'quantity', 'price', 'total']
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(challenge_data)
    
    print(f"✅ Challenge dataset created: {len(challenge_data)} records in 'challenge_sales_data.csv'")
    print("🎯 Dataset includes data quality issues for you to handle!")

# Create the challenge dataset
create_challenge_dataset()

print("""
⏰ YOUR 30-MINUTE CHALLENGE STARTS NOW!

🎯 SUCCESS CRITERIA:
   • System processes all data without crashing
   • Generates meaningful business insights
   • Exports results to JSON format
   • Code is well-organized with functions
   • Includes proper error handling
   • Documentation explains your approach

💡 HINTS:
   • Start with data loading and cleaning
   • Build incrementally - test each function
   • Focus on the most important insights first
   • Remember: working code beats perfect code

🚀 GO! Show us your Python mastery!
""")

# Challenge implementation space
print("\n" + "="*60)
print("YOUR CHALLENGE IMPLEMENTATION GOES BELOW:")
print("="*60)

# TODO: Implement your complete data science system here!
# Use all the skills you've learned across the 3 weeks

class MasteryChallengeSystem:
    """Your mastery challenge implementation"""
    
    def __init__(self):
        # TODO: Initialize your system
        pass
    
    def load_and_clean_data(self, filename):
        """TODO: Load and clean the challenge dataset"""
        pass
    
    def analyze_sales_performance(self):
        """TODO: Analyze sales performance"""
        pass
    
    def generate_insights(self):
        """TODO: Generate business insights"""
        pass
    
    def export_results(self, filename):
        """TODO: Export results to JSON"""
        pass

# TODO: Run your system
# system = MasteryChallengeSystem()
# system.load_and_clean_data('challenge_sales_data.csv')
# system.analyze_sales_performance()
# insights = system.generate_insights()
# system.export_results('challenge_results.json')
# print(insights)

print("\n⏰ Timer starts when you begin implementing!")
print("🏆 Show us what you've learned!")
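
The challenge file deliberately contains rows with an empty `customer` and rows with `quantity = -1`. One possible shape for the cleaning step is sketched below. It is an illustration only, not the expected solution: the helper name `clean_sales_rows`, the choice to reject bad rows rather than repair them, and the sample rows (dates, names, products) are all assumptions made for this example.

```python
import csv
import io


def clean_sales_rows(rows):
    """Split raw CSV rows into (clean, rejected) lists.

    Assumes the columns written by create_challenge_dataset():
    date, customer, product, quantity, price, total.
    """
    clean, rejected = [], []
    for row in rows:
        try:
            quantity = int(row["quantity"])
            price = float(row["price"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # missing or unparseable numeric field
            continue

        customer = (row.get("customer") or "").strip()
        if not customer or quantity <= 0 or price < 0:
            rejected.append(row)  # known quality issues: no customer, bad quantity
            continue

        clean.append({
            "date": row.get("date", ""),
            "customer": customer,
            "product": row.get("product", ""),
            "quantity": quantity,
            "price": price,
            # Recompute the total instead of trusting the value in the file
            "total": round(quantity * price, 2),
        })
    return clean, rejected


# Hypothetical in-memory sample standing in for challenge_sales_data.csv
sample = io.StringIO(
    "date,customer,product,quantity,price,total\n"
    "2024-01-01,Alice,Laptop,2,500.00,1000.00\n"
    "2024-01-02,,Mouse,1,25.00,25.00\n"
    "2024-01-03,Bob,Monitor,-1,200.00,-200.00\n"
)
clean, rejected = clean_sales_rows(csv.DictReader(sample))
print(f"{len(clean)} clean rows, {len(rejected)} rejected")
```

Keeping the rejected rows, instead of silently dropping them, lets you report how many records failed validation, which is itself a useful data quality insight.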

## 🚀 Final Project Launch

**Your capstone project - the culmination of your Python journey!**

In [None]:
# FINAL PROJECT INTRODUCTION
print("🚀 FINAL PROJECT LAUNCH")
print("=" * 50)

print("""
🎓 CONGRATULATIONS! You're ready for your CAPSTONE PROJECT!

This is your chance to:
   ✨ Demonstrate mastery of all Python fundamentals
   ✨ Solve a real business problem with data
   ✨ Build a portfolio-worthy project
   ✨ Show your growth from beginner to data scientist

🎯 PROJECT OVERVIEW:
   • Duration: 4 weeks (Weeks 4-7; work at your own pace)
   • Individual project with instructor support
   • Choice of 3 compelling business scenarios
   • Real datasets with genuine challenges
   • Professional presentation of findings

📊 WHAT YOU'LL BUILD:
   1. Complete data analysis system from scratch
   2. Professional data processing pipeline
   3. Business intelligence dashboard data
   4. Executive summary with recommendations
   5. Technical documentation for your code
   6. Presentation-ready insights and visuals

🏆 PROJECT THEMES TO CHOOSE FROM:
""")

project_themes = {
    "Healthcare Analytics": {
        "description": "Analyze patient outcomes and hospital efficiency",
        "business_value": "Improve patient care and reduce costs",
        "datasets": "Patient records, treatment outcomes, resource utilization",
        "skills_focus": "Data cleaning, statistical analysis, outcome prediction",
        "deliverables": "Risk assessment model, efficiency recommendations"
    },
    
    "Financial Services": {
        "description": "Credit risk analysis and fraud detection system",
        "business_value": "Reduce financial losses and improve decision-making",
        "datasets": "Transaction history, credit scores, demographic data",
        "skills_focus": "Pattern recognition, anomaly detection, risk modeling",
        "deliverables": "Fraud detection rules, credit scoring model"
    },
    
    "Retail Intelligence": {
        "description": "Customer behavior and inventory optimization",
        "business_value": "Increase sales and improve customer satisfaction",
        "datasets": "Sales history, customer demographics, inventory data",
        "skills_focus": "Customer segmentation, demand forecasting, optimization",
        "deliverables": "Customer insights, inventory recommendations"
    }
}

for theme, details in project_themes.items():
    print(f"\n🔹 {theme.upper()}:")
    print(f"   📋 Description: {details['description']}")
    print(f"   💼 Business Value: {details['business_value']}")
    print(f"   📊 Data Sources: {details['datasets']}")
    print(f"   🎯 Skills Focus: {details['skills_focus']}")
    print(f"   📈 Key Deliverables: {details['deliverables']}")

print("""

📋 PROJECT REQUIREMENTS:

🔧 TECHNICAL REQUIREMENTS:
   • Data loading with comprehensive error handling
   • Data cleaning and quality validation
   • Statistical analysis with Python fundamentals
   • Business intelligence generation
   • Results export to multiple formats (JSON, CSV)
   • Professional code documentation
   • Comprehensive testing and validation

📊 ANALYSIS REQUIREMENTS:
   • Exploratory data analysis with insights
   • Key performance indicators (KPIs)
   • Trend analysis and pattern identification
   • Business recommendations with data support
   • Risk assessment and opportunity identification
   • Data visualization preparation (dashboard-ready)

📝 DELIVERABLES:
   1. Complete Python analysis system (Jupyter notebook)
   2. Executive summary report (PDF/Markdown)
   3. Technical documentation (README with setup)
   4. Data quality assessment report
   5. Business recommendations presentation
   6. Code portfolio ready for job interviews

⏰ TIMELINE:
   • Week 4: Project planning and data exploration
   • Week 5: Core development and analysis
   • Week 6: Testing, documentation, and presentation prep
   • Week 7: Final presentations and peer review

🎯 SUCCESS CRITERIA:
   • System processes real data without errors
   • Generates actionable business insights
   • Code demonstrates mastery of Python fundamentals
   • Professional presentation suitable for stakeholders
   • Portfolio-ready project for job applications
""")

print("\n🎉 YOUR FINAL PROJECT AWAITS!")
print("This is your chance to shine and show the world what you've learned!")
print("\n📋 Next steps: Choose your theme and begin planning!")
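
The technical requirements above ask for results export to multiple formats with error handling. A minimal sketch of that step follows, assuming your analysis produces a summary dictionary and a list of row dictionaries. The function name `export_results`, the file names, and the sample figures are hypothetical and only serve the example.

```python
import csv
import json
import os
import tempfile


def export_results(summary, rows, json_path, csv_path):
    """Write a summary dict to JSON and a list of row dicts to CSV.

    Returns True on success, False if either file could not be written.
    """
    try:
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump(summary, f, indent=2, ensure_ascii=False)

        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            if rows:  # an empty list produces an empty file
                writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
                writer.writeheader()
                writer.writerows(rows)
        return True
    except (OSError, TypeError, ValueError) as exc:
        # OSError: path or permission problems
        # TypeError: value that json cannot serialize
        # ValueError: row with keys not listed in fieldnames
        print(f"Export failed: {exc}")
        return False


# Hypothetical results standing in for your real analysis output
rows = [
    {"customer": "Alice", "total": 1000.0},
    {"customer": "Bob", "total": 250.5},
]
summary = {
    "record_count": len(rows),
    "total_revenue": sum(r["total"] for r in rows),
}

out_dir = tempfile.mkdtemp()  # temporary folder so the example leaves no clutter
json_path = os.path.join(out_dir, "results.json")
csv_path = os.path.join(out_dir, "results.csv")

ok = export_results(summary, rows, json_path, csv_path)
print("Export succeeded:", ok)
```

Returning a boolean keeps the calling code simple. In a larger project you might prefer to log the error and re-raise it so failures are not missed.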

## 💼 Career Guidance & Next Steps

**Your path from here to data science success:**

In [None]:
# CAREER ROADMAP AND GUIDANCE
print("💼 YOUR DATA SCIENCE CAREER ROADMAP")
print("=" * 50)

career_path = {
    "IMMEDIATE (Next 1-2 months)": {
        "technical_skills": [
            "Master pandas for data manipulation",
            "Learn NumPy for numerical computing",
            "Introduction to data visualization (Matplotlib, Seaborn)",
            "Basic SQL for database interaction",
            "Git version control for code management"
        ],
        "projects": [
            "Complete 2-3 guided pandas tutorials",
            "Recreate your Python projects using pandas",
            "Build a simple web scraping project",
            "Create interactive visualizations"
        ],
        "portfolio": [
            "Create GitHub account and upload projects",
            "Write professional README files",
            "Document your learning journey",
            "Start a data science blog or LinkedIn articles"
        ]
    },
    
    "SHORT-TERM (3-6 months)": {
        "technical_skills": [
            "Advanced pandas and data wrangling",
            "Machine learning basics (scikit-learn)",
            "Statistical analysis and hypothesis testing",
            "Web development basics (Flask/FastAPI)",
            "Cloud platforms introduction (AWS/GCP)"
        ],
        "projects": [
            "Build 3-5 end-to-end ML projects",
            "Create a web-based data dashboard",
            "Participate in Kaggle competitions",
            "Contribute to open-source projects"
        ],
        "career_prep": [
            "Network with data professionals",
            "Attend data science meetups/conferences",
            "Practice technical interviews",
            "Apply for junior data analyst positions"
        ]
    },
    
    "MEDIUM-TERM (6-12 months)": {
        "technical_skills": [
            "Deep learning frameworks (TensorFlow/PyTorch)",
            "Advanced SQL and database design",
            "Big data tools (Spark, Hadoop)",
            "MLOps and model deployment",
            "Specialized domains (NLP, Computer Vision)"
        ],
        "career_goals": [
            "Secure first data science role",
            "Build reputation in data community",
            "Mentor other aspiring data scientists",
            "Speak at conferences or meetups"
        ],
        "specialization": [
            "Choose specialization (ML Engineer, Data Analyst, etc.)",
            "Develop domain expertise (healthcare, finance, etc.)",
            "Build advanced portfolio projects",
            "Consider advanced degrees or certifications"
        ]
    }
}

for timeframe, goals in career_path.items():
    print(f"\n🎯 {timeframe}:")
    for category, items in goals.items():
        print(f"\n   {category.replace('_', ' ').title()}:")
        for item in items:
            print(f"   • {item}")

print("""

💰 SALARY EXPECTATIONS:
   🥉 Junior Data Analyst: $45,000 - $65,000
   🥈 Data Scientist I: $70,000 - $95,000
   🥇 Senior Data Scientist: $100,000 - $150,000
   🏆 Principal/Staff: $150,000 - $250,000+

🎯 JOB TITLES TO TARGET:
   📊 Entry Level: Data Analyst, Junior Data Scientist, Business Analyst
   📈 Mid Level: Data Scientist, ML Engineer, Analytics Manager
   🚀 Senior Level: Senior Data Scientist, Head of Analytics, Chief Data Officer

🏢 COMPANIES HIRING:
   💼 Tech: Google, Microsoft, Amazon, Meta, Netflix
   🏦 Finance: JPMorgan, Goldman Sachs, Bank of America
   🏥 Healthcare: Johnson & Johnson, Pfizer, UnitedHealth
   🛒 Retail: Walmart, Target, Amazon, Shopify
   🚗 Automotive & Mobility: Tesla, Ford, GM, Uber, Lyft

📚 LEARNING RESOURCES:

   🆓 FREE RESOURCES:
   • Kaggle Learn (free courses)
   • fast.ai courses
   • YouTube: 3Blue1Brown, StatQuest, Corey Schafer
   • GitHub: Awesome Data Science lists
   • Papers With Code (latest ML research)
   • Google Colab (free GPU access)

   💳 PAID RESOURCES:
   • DataCamp, Coursera, edX specializations
   • Udacity Data Science Nanodegree
   • O'Reilly Learning Platform
   • Pluralsight technology tracks

🤝 NETWORKING & COMMUNITY:
   • Join local data science meetups
   • Participate in Kaggle competitions
   • Contribute to open-source projects
   • Follow data science influencers on LinkedIn/Twitter
   • Attend conferences (PyData, Strata, KDD)

🎯 KEY SUCCESS FACTORS:
   ✨ Never stop learning - technology evolves rapidly
   ✨ Build projects that solve real problems
   ✨ Communicate insights clearly to non-technical audiences
   ✨ Stay curious and ask great questions
   ✨ Network authentically and help others
   ✨ Embrace failure as learning opportunities

🚀 REMEMBER: You already have the foundation!
   Everything else builds on the Python fundamentals you've mastered.
   You're not starting from zero - you're accelerating from a strong base!
""")

print("\n💪 You have everything you need to succeed!")
print("🌟 Your data science career starts now!")

## 🎉 Final Celebration & Reflection

**Take a moment to appreciate your incredible journey:**

In [None]:
# FINAL CELEBRATION AND REFLECTION
print("🎉 FINAL CELEBRATION & COURSE REFLECTION")
print("=" * 60)

print("""
🏆 CONGRATULATIONS! You've completed an EXTRAORDINARY journey!

📈 YOUR TRANSFORMATION:
   • 3 weeks ago: Complete Python beginner
   • Today: Confident data scientist ready for real-world challenges
   • Skills gained: 50+ Python concepts and techniques
   • Projects built: 6+ complete data analysis systems
   • Lines of code written: 1000+ lines of professional Python

🎯 WHAT MAKES YOU SPECIAL:
   ✨ You didn't just learn syntax - you developed data intuition
   ✨ You can solve real business problems with code
   ✨ You understand what happens 'under the hood' in data science
   ✨ You can communicate technical insights to business audiences
   ✨ You have the foundation for any advanced data science topic

🔧 TECHNICAL SKILLS MASTERED:
   • Variables, data types, and type conversion
   • String processing and text analysis
   • Lists, dictionaries, and complex data structures
   • Conditional logic and decision-making systems
   • Loops for efficient data processing
   • Functions for code organization and reusability
   • File I/O and data format handling (CSV, JSON)
   • Error handling and robust system design
   • Statistical analysis and insight generation
   • Complete data analysis pipeline development

💼 PROFESSIONAL SKILLS GAINED:
   • Problem decomposition and systematic thinking
   • Data quality assessment and cleaning
   • Business communication and stakeholder reporting
   • Project planning and execution
   • Testing, debugging, and quality assurance
   • Technical documentation and code comments
   • Presentation skills for technical audiences

🚀 WHAT YOU'RE READY FOR:
   • Junior Data Analyst positions
   • Advanced Python libraries (pandas, NumPy, scikit-learn)
   • Machine learning and statistical modeling
   • Database integration and SQL
   • Web development and API integration
   • Cloud computing and big data platforms
   • Specialized domains (NLP, Computer Vision, etc.)

📊 BY THE NUMBERS:
   🕒 Hours invested: 30+ hours of dedicated learning
   📝 Concepts mastered: 50+ Python fundamentals
   🏗️ Projects completed: 6+ end-to-end systems
   💾 Code written: 1000+ lines of professional Python
   🎯 Skills assessment: From 0% to 85%+ Python proficiency
   💡 'Aha!' moments: Countless discoveries and breakthroughs
   🌟 Confidence gained: Immeasurable!

🎓 WHAT SETS YOU APART:
   Unlike typical programming courses that focus on syntax,
   you've learned to THINK like a data scientist:
   
   • You approach problems systematically
   • You question data quality and validate assumptions
   • You focus on business value, not just technical complexity
   • You communicate insights that drive decisions
   • You build robust systems that handle real-world messiness

💝 PERSONAL MESSAGE:
   The transformation you've achieved in just 3 weeks is remarkable.
   You've proven that with dedication, curiosity, and the right
   guidance, anyone can become a data scientist.
   
   You didn't just learn Python - you developed a new way of
   thinking about problems, data, and solutions. This mindset
   will serve you throughout your entire career.
   
   Remember: every expert was once a beginner. You've taken the
   hardest step - getting started. Now you have the foundation
   to build whatever you can imagine.

🌟 YOUR FUTURE IS BRIGHT:
   • The data science field is growing rapidly
   • Companies desperately need skilled data professionals
   • You have both technical skills AND business acumen
   • Your foundation is solid and will serve you for years
   • The community is supportive and collaborative

🎯 FINAL WORDS OF WISDOM:
   1. Keep coding - use it or lose it!
   2. Build projects that excite you
   3. Don't be afraid to tackle challenging problems
   4. Help others who are starting their journey
   5. Stay curious and never stop learning
   6. Remember: you're a data scientist now!

🎊 THANK YOU for being amazing students!
🚀 Go forth and change the world with data!
💫 The future of data science is in great hands with you!

""")

# Course completion certificate
from datetime import datetime

certificate = f"""
{'*'*70}
         🎓 CERTIFICATE OF COMPLETION 🎓
{'*'*70}

                    This certifies that

                    [YOUR NAME HERE]

              has successfully completed the

          PYTHON FUNDAMENTALS FOR DATA SCIENCE
                  Intensive Bootcamp

              Demonstrating mastery of:
         • Python Programming Fundamentals
         • Data Analysis and Processing
         • Business Intelligence Generation
         • Professional Development Practices

         Completion Date: {datetime.now().strftime('%B %d, %Y')}
         Course Duration: 3 Intensive Weeks
         Projects Completed: 6+ Data Science Systems

     🌟 You are now ready to tackle real-world data
         science challenges with confidence! 🌟

{'*'*70}
"""

print(certificate)

print("\n🎉 CONGRATULATIONS, DATA SCIENTIST!")
print("🚀 Your journey is just beginning!")
print("💫 Go change the world with your new superpowers!")

# Final stats display
print("\n📊 FINAL COURSE STATISTICS:")
print(f"   Course completed: {datetime.now().strftime('%B %d, %Y')}")
print("   Total lessons: 12 comprehensive sessions")
print("   Skills mastered: 50+ Python concepts")
print("   Projects built: 6+ complete systems")
print("   Ready for: Advanced data science topics")
print("   Career potential: $45,000+ starting salary")
print("   Pride level: MAXIMUM! 🏆")

## 🎓 Course Conclusion

**You did it! From Python beginner to data scientist in just 3 weeks!**

### 🏆 What You've Accomplished
This wasn't just a Python course - it was a complete transformation. You've developed:

- **💻 Technical Mastery**: All essential Python skills for data science
- **🧠 Analytical Thinking**: Problem-solving approaches that scale
- **📊 Data Intuition**: Understanding of how to extract insights
- **💼 Professional Skills**: Code organization, documentation, and communication
- **🚀 Career Readiness**: Portfolio projects and real-world experience

### 🌟 What Makes You Special
Unlike many who learn only Python syntax, you learned to **think** like a data scientist:
- You question data quality and validate assumptions
- You focus on business value, not just technical complexity
- You build systems that handle real-world messiness
- You communicate insights that drive decisions

### 🎯 Your Next Steps
1. **Complete your final project** - make it portfolio-worthy!
2. **Master pandas and visualization** - build on your solid foundation
3. **Network and apply** - you're ready for junior data roles
4. **Keep building** - practice makes permanent
5. **Help others** - teach what you've learned

### 💝 A Personal Thank You
Teaching you has been an incredible privilege. Watching complete beginners transform into confident data scientists in just 3 weeks never stops being amazing. 

You've proven that with dedication, good instruction, and hands-on practice, anyone can master these skills. You should be incredibly proud of what you've accomplished.

### 🚀 Go Change the World
Data science is one of the most impactful fields in technology. You now have the skills to:
- Help healthcare systems save lives through better analysis
- Enable businesses to make smarter decisions
- Fight climate change with environmental data
- Advance scientific research across all domains
- Build the AI systems of tomorrow

**The world needs more thoughtful, skilled data scientists like you.**

---

### 🎊 Final Message

*"Three weeks ago, you couldn't write a simple Python program. Today, you can build complete data analysis systems that solve real business problems. This transformation is proof of your potential - there's no limit to what you can achieve when you set your mind to it."*

*"You're not just Python programmers now - you're data scientists, problem solvers, and future leaders in one of the world's most important fields."*

*"Go forth with confidence, curiosity, and compassion. The future of data science is in excellent hands with you!"*

**🎓 Congratulations, Data Scientists! 🎓**

**🚀 Your adventure is just beginning! 🚀**

---

## 📧 Stay Connected
- **Questions**: Always feel free to reach out
- **Updates**: Share your career progress
- **Projects**: Show off what you build next
- **Community**: Help the next cohort of students

**Once a student, always part of the family!** 💙