# NumPy Exercise: Weather Data Analysis Lab 🌤️
**Duration:** 30-40 minutes  
**Difficulty:** Beginner to Intermediate

## 📋 Exercise Overview
You are a junior data analyst at WeatherTech Analytics. Your supervisor has given you a dataset containing weather information for 5 major cities over one year. Your task is to analyze this data using NumPy to answer specific business questions and demonstrate your array programming skills.

## 🎯 Learning Goals
- Apply array creation and inspection techniques
- Use advanced indexing and slicing for data extraction  
- Implement vectorized operations for performance
- Utilize broadcasting for multi-dimensional calculations

## 🔧 Setup and Data Generation (Section 1)
Execute this cell to load all necessary libraries and generate the weather dataset.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import time

# Set random seed for consistent results
np.random.seed(42)

# Dataset parameters
cities = ['New York', 'London', 'Tokyo', 'Sydney', 'Mumbai']
days = 365
n_cities = len(cities)

# Generate weather data (provided by supervisor)
def generate_weather_data():
    base_temps = np.array([12, 10, 15, 18, 27])
    day_of_year = np.arange(1, days + 1)
    seasonal_factor = np.sin(2 * np.pi * (day_of_year - 80) / 365)
    
    temperature = base_temps[:, np.newaxis] + 15 * seasonal_factor + np.random.normal(0, 3, (n_cities, days))
    humidity = 70 - 0.3 * temperature + np.random.normal(0, 10, (n_cities, days))
    humidity = np.clip(humidity, 20, 100)
    
    precip_base = np.random.exponential(2, (n_cities, days))
    seasonal_precip = 1 + 0.5 * np.sin(2 * np.pi * (day_of_year - 120) / 365)
    precipitation = precip_base * seasonal_precip
    
    return temperature, humidity, precipitation

# Load the data
temp_data, humidity_data, precip_data = generate_weather_data()

print("Weather data loaded successfully!")
print(f"Temperature data shape: {temp_data.shape}")
print(f"Cities: {cities}")
print(f"Data covers {days} days for {n_cities} cities")
print("\nFirst 5 days of temperature data for each city:")
print(temp_data[:, :5].round(1))

## 📊 Data Inspection and Basic Analysis (Section 2)

### Task 2.1: Array Properties
Your supervisor wants a quick overview of the dataset properties.

**🎯 Your Task:** Complete the function to analyze array characteristics.
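
As a warm-up, here is a minimal sketch of the inspection calls on a hypothetical 2 × 3 array (the name `toy` and its values are illustrative and are not part of the lab dataset). It shows how `shape`, `ndim`, and the `axis` argument relate to each other.

```python
import numpy as np

# Hypothetical toy data: 2 "cities" x 3 "days" (not the lab dataset)
toy = np.array([[10.0, 12.0, 14.0],
                [20.0, 22.0, 27.0]])

print(toy.shape)  # tuple with one size per axis
print(toy.ndim)   # number of axes

# axis=1 collapses the day axis, leaving one value per row (city)
row_means = toy.mean(axis=1)

# axis=0 collapses the city axis, leaving one value per column (day)
col_means = toy.mean(axis=0)

# argmax returns the position of the largest value, not the value itself
hottest_row = np.argmax(row_means)
print(row_means, col_means, hottest_row)
```

The same pattern should carry over to the `(5, 365)` arrays in this lab: reducing along `axis=1` gives one number per city.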

In [None]:
def analyze_dataset_properties(data_array, data_name):
    """
    Analyze and report basic properties of weather data array
    
    TODO: Fill in the missing parts using NumPy functions
    """
    print(f"\n=== {data_name} Dataset Analysis ===")
    
    # TODO: Print the array shape and number of dimensions
    shape = ___  # Your code here: use data_array.shape
    ndim = ___   # Your code here: use data_array.ndim
    print(f"Shape: {shape}, Dimensions: {ndim}")
    
    # TODO: Calculate and print min, max, mean, and standard deviation
    min_val = ___    # Your code here: use np.min()
    max_val = ___    # Your code here: use np.max()
    mean_val = ___   # Your code here: use np.mean()
    std_val = ___    # Your code here: use np.std()
    
    print(f"Range: {min_val:.2f} to {max_val:.2f}")
    print(f"Mean: {mean_val:.2f}, Std Dev: {std_val:.2f}")
    
    # TODO: Find which city has the highest average for this metric
    city_averages = ___  # Calculate mean along axis=1 (across days for each city)
    highest_city_index = ___  # Find index of maximum value using np.argmax()
    highest_city = cities[highest_city_index]
    
    print(f"Highest average {data_name.lower()}: {highest_city}")
    
    return city_averages

# TODO: Test your function with the temperature and humidity data
# Uncomment the lines below after completing the function:
# temp_averages = analyze_dataset_properties(temp_data, "Temperature")
# humidity_averages = analyze_dataset_properties(humidity_data, "Humidity")

### Task 2.2: Data Quality Check
Check for any unusual values in the dataset.

**🎯 Your Task:** Find and report potential data quality issues.

**💡 Hints:**
- Use boolean indexing: `array > threshold` creates a boolean array
- Count True values with `np.sum(boolean_array)` 
- For multiple conditions, use `&` (and) and `|` (or)
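
The hints above can be sketched on a small hypothetical array (`readings` and the threshold of 35 are made up for illustration). Note the parentheses around each comparison: `|` and `&` bind more tightly than `<` and `>`.

```python
import numpy as np

# Hypothetical readings, not taken from the lab dataset
readings = np.array([[-2.0, 5.0, 31.0],
                     [12.0, -7.5, 40.0]])

# A comparison yields a boolean array with the same shape as the input
below_zero = readings < 0

# True counts as 1 when summed, so this is the number of matches
n_below = np.sum(below_zero)

# Combine conditions element-wise; each comparison needs its own parentheses
out_of_range = (readings < 0) | (readings > 35)
n_out = np.sum(out_of_range)

# Share of all elements that match, as a percentage
share = n_out / readings.size * 100
print(n_below, n_out, share)
```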

In [None]:
def check_data_quality():
    """
    Check for unusual values that might indicate data quality issues
    """
    print("\n=== Data Quality Check ===")
    
    # TODO: Check for negative temperatures (unusual but possible)
    negative_temps = ___  # Create boolean array: temp_data < 0
    negative_count = ___  # Count how many: np.sum(negative_temps)
    print(f"Negative temperatures found: {negative_count}")
    
    # TODO: Check for humidity outside normal range (0-100%)
    invalid_humidity = ___  # Boolean array: (humidity_data < 0) | (humidity_data > 100)
    invalid_humidity_count = ___  # Count invalid values
    print(f"Invalid humidity values: {invalid_humidity_count}")
    
    # TODO: Find days with extreme precipitation (> 50mm)
    extreme_rain = ___  # Boolean array: precip_data > 50
    extreme_rain_count = ___  # Count extreme rain days
    print(f"Days with extreme precipitation (>50mm): {extreme_rain_count}")
    
    # TODO: Calculate what percentage of all data points are "extreme"
    total_data_points = ___  # Total across all three arrays: temp_data.size + humidity_data.size + precip_data.size
    extreme_percentage = (negative_count + invalid_humidity_count + extreme_rain_count) / total_data_points * 100
    print(f"Percentage of extreme values: {extreme_percentage:.2f}%")

# TODO: Run your quality check after completing the function:
# check_data_quality()

## 🔍 Advanced Indexing and Slicing (Section 3)

### Task 3.1: Seasonal Analysis
The marketing team needs seasonal weather patterns for campaign planning.

**🎯 Your Task:** Extract and analyze seasonal data using array slicing.
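
A minimal sketch of the slicing pattern, assuming a hypothetical 2 × 12 array with one column per month instead of one per day (the name `monthly` is illustrative). A contiguous season is a plain slice, while a season that wraps around the year end needs two slices joined along the column axis.

```python
import numpy as np

# Hypothetical data: 2 rows, 12 "monthly" columns holding the values 0..23
monthly = np.arange(24).reshape(2, 12)

# Contiguous block of columns: a plain slice, which is a view on the original
spring = monthly[:, 2:5]

# Wrap-around block: last column first, then the first two columns
winter = np.concatenate([monthly[:, 11:], monthly[:, :2]], axis=1)

print(spring.shape, winter.shape)
print(winter[0])

# Slices share memory with the source array; concatenate allocates a new one
print(np.shares_memory(spring, monthly), np.shares_memory(winter, monthly))
```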

In [None]:
def extract_seasonal_data(data_array, season_name):
    """
    Extract data for a specific season using array slicing
    
    Seasons defined as (approximate):
    - Winter: Dec, Jan, Feb = days [0:59] + [334:365] 
    - Spring: Mar, Apr, May = days [59:151] 
    - Summer: Jun, Jul, Aug = days [151:243]
    - Autumn: Sep, Oct, Nov = days [243:334]
    """
    
    if season_name.lower() == 'winter':
        # TODO: Combine winter months using np.concatenate
        # Hint: winter_data = np.concatenate([data_array[:, 334:365], data_array[:, 0:59]], axis=1)
        seasonal_data = ___  # Your code here
    elif season_name.lower() == 'spring':
        seasonal_data = ___  # Use data_array[:, 59:151]
    elif season_name.lower() == 'summer':
        seasonal_data = ___  # Use data_array[:, 151:243]
    elif season_name.lower() == 'autumn':
        seasonal_data = ___  # Use data_array[:, 243:334]
    else:
        raise ValueError(f"Unknown season: {season_name!r}")
    
    return seasonal_data

# TODO: Use your function to analyze temperature patterns
# summer_temps = extract_seasonal_data(temp_data, 'summer')
# winter_temps = extract_seasonal_data(temp_data, 'winter')

# print(f"Summer temperature - Mean: {np.mean(summer_temps):.1f}°C")
# print(f"Winter temperature - Mean: {np.mean(winter_temps):.1f}°C")

# TODO: Calculate seasonal difference and find biggest swing city
# seasonal_difference = np.mean(summer_temps) - np.mean(winter_temps)
# city_summer_means = np.mean(summer_temps, axis=1)
# city_winter_means = np.mean(winter_temps, axis=1)
# city_seasonal_swings = city_summer_means - city_winter_means
# biggest_swing_city = np.argmax(city_seasonal_swings)

# print(f"Summer-Winter difference: {seasonal_difference:.1f}°C")
# print(f"City with biggest seasonal swing: {cities[biggest_swing_city]}")

### Task 3.2: Extreme Weather Detection
Find days with extreme weather conditions using boolean indexing.

**🎯 Your Task:** Identify extreme weather events for risk assessment.

**💡 Hints:**
- Use `np.percentile(array, 95)` for 95th percentile
- Boolean arrays can be summed: `np.sum(boolean_array)`
- For city-specific analysis: `boolean_array[city_index, :]`
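
The following sketch applies these hints to a hypothetical 2 × 5 array (`values` is illustrative, not lab data). It assumes the default behaviour of `np.percentile`, which flattens the array and interpolates linearly between neighbouring values.

```python
import numpy as np

# Hypothetical data: 2 rows ("cities") x 5 columns ("days")
values = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                   [6.0, 7.0, 8.0, 9.0, 10.0]])

# Thresholds computed over the whole (flattened) array
high = np.percentile(values, 90)
low = np.percentile(values, 10)

above = values > high
below = values < low

# Per-row counts: how many flagged days each city has
above_per_row = above.sum(axis=1)

# Combine masks so a cell is counted once even if several conditions hold,
# then count flagged rows per column (per day)
either = above | below
flagged_per_col = either.sum(axis=0)

# Position of the first column with the most flagged rows
worst_col = np.argmax(flagged_per_col)
print(high, low, above_per_row, flagged_per_col, worst_col)
```

When several columns tie, `np.argmax` returns the first one, which is worth keeping in mind when reporting a single "worst" day.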

In [None]:
def find_extreme_weather_days():
    """
    Use boolean indexing to find days with extreme conditions
    """
    print("\n=== Extreme Weather Detection ===")
    
    # TODO: Define extreme conditions using boolean arrays
    # Very hot: temperature > 95th percentile
    hot_threshold = np.percentile(temp_data, 95)
    very_hot_days = ___  # temp_data > hot_threshold
    
    # Very cold: temperature < 5th percentile  
    cold_threshold = np.percentile(temp_data, 5)
    very_cold_days = ___  # temp_data < cold_threshold
    
    # Very humid: humidity > 85%
    very_humid_days = ___  # humidity_data > 85
    
    # Heavy rain: precipitation > 90th percentile
    rain_threshold = np.percentile(precip_data, 90)
    heavy_rain_days = ___  # precip_data > rain_threshold
    
    # TODO: For each city, count extreme weather days
    for i, city in enumerate(cities):
        # Count extreme days for this city (use boolean array slicing)
        hot_count = ___      # np.sum(very_hot_days[i, :])
        cold_count = ___     # np.sum(very_cold_days[i, :])
        humid_count = ___    # np.sum(very_humid_days[i, :])
        rain_count = ___     # np.sum(heavy_rain_days[i, :])
        
        total_extreme = hot_count + cold_count + humid_count + rain_count
        
        print(f"{city}: {total_extreme} extreme events ({hot_count} hot, {cold_count} cold, "
              f"{humid_count} humid, {rain_count} rainy)")
    
    # TODO: Find the single worst weather day (most cities with at least one extreme condition)
    # Hint: Combine the four boolean arrays with |, then sum across cities (axis=0) for each day
    daily_extreme_count = ___  # np.sum(very_hot_days | very_cold_days | very_humid_days | heavy_rain_days, axis=0)
    worst_day = ___            # np.argmax(daily_extreme_count)
    
    print(f"\nWorst weather day: Day {worst_day + 1} ({daily_extreme_count[worst_day]} cities affected)")
    
    return very_hot_days, very_cold_days, very_humid_days, heavy_rain_days

# TODO: Run your extreme weather analysis:
# extreme_conditions = find_extreme_weather_days()

## ⚡ Universal Functions (Ufuncs) and Vectorization (Section 4a)

### Task 4.1: Temperature Unit Conversions
The international team needs temperature data in different units.

**🎯 Your Task:** Create efficient temperature conversion functions using vectorized operations.

**📚 Workshop Connection:** This demonstrates universal functions (ufuncs) - the foundation of NumPy's speed and efficiency.
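
To illustrate what "vectorized" means here, the sketch below converts hypothetical wind speeds from km/h to m/s (the array `wind_kmh` is made up and unrelated to the lab data). Arithmetic operators on arrays dispatch to ufuncs such as `np.divide`, so no explicit loop is needed.

```python
import numpy as np

# Hypothetical wind speeds in km/h (illustrative values only)
wind_kmh = np.array([[36.0, 72.0],
                     [18.0, 90.0]])

# The operator applies to every element at once and keeps the shape
wind_ms = wind_kmh / 3.6

# The operator form and the explicit ufunc call give the same result
same = np.array_equal(wind_kmh / 3.6, np.divide(wind_kmh, 3.6))

print(isinstance(np.divide, np.ufunc))
print(wind_ms.shape, same)
```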

In [None]:
def convert_temperatures(celsius_data):
    """
    Convert temperature data to Fahrenheit and Kelvin using vectorized operations
    """
    
    # TODO: Convert to Fahrenheit: F = C × 9/5 + 32
    fahrenheit_data = ___  # celsius_data * 9/5 + 32 (works on entire array!)
    
    # TODO: Convert to Kelvin: K = C + 273.15  
    kelvin_data = ___      # celsius_data + 273.15
    
    return fahrenheit_data, kelvin_data

# TODO: Test your conversion functions
# temp_f, temp_k = convert_temperatures(temp_data)

# print("Temperature Conversion Results:")
# print(f"Celsius range: {np.min(temp_data):.1f}°C to {np.max(temp_data):.1f}°C")
# print(f"Fahrenheit range: {np.min(temp_f):.1f}°F to {np.max(temp_f):.1f}°F") 
# print(f"Kelvin range: {np.min(temp_k):.1f}K to {np.max(temp_k):.1f}K")

### Task 4.2: Weather Comfort Index Calculation
Calculate a custom comfort index using multiple weather factors.

**🎯 Your Task:** Create a vectorized comfort index calculation.

**💡 Hints:**
- Use `np.abs()` for absolute values
- Use `np.maximum(0, array)` to ensure no negative values
- Use `np.clip(array, min_val, max_val)` to constrain ranges
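
A short sketch of the three helpers on hypothetical inputs (all values are illustrative). One point that is easy to miss: `np.maximum` compares element by element, whereas `np.max` reduces an array to its largest value.

```python
import numpy as np

# Absolute deviation from a reference value, element by element
deviation = np.array([-3.0, 0.5, 4.0])
abs_dev = np.abs(deviation)

# Only the part above a threshold of 65 counts; values below it become 0
levels = np.array([50.0, 70.0, 90.0])
excess = np.maximum(0, levels - 65)

# Constrain raw scores to the closed interval [0, 100]
raw = np.array([-20.0, 55.0, 130.0])
bounded = np.clip(raw, 0, 100)

print(abs_dev, excess, bounded)
```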

In [None]:
def calculate_comfort_index(temp, humidity, precipitation):
    """
    Calculate weather comfort index using vectorized operations
    
    Comfort formula:
    - Start with base score of 100
    - Subtract penalty for temperature deviation from 22°C: |temp - 22| × 2
    - Subtract penalty for high humidity: max(0, humidity - 65) × 0.5  
    - Subtract penalty for precipitation: precipitation × 3
    - Final score should be between 0 and 100
    """
    
    # TODO: Calculate temperature penalty (deviation from ideal 22°C)
    temp_penalty = ___  # np.abs(temp - 22) * 2
    
    # TODO: Calculate humidity penalty (anything above 65% is uncomfortable)
    humidity_penalty = ___  # np.maximum(0, humidity - 65) * 0.5
    
    # TODO: Calculate precipitation penalty
    precip_penalty = ___  # precipitation * 3
    
    # TODO: Calculate final comfort score
    comfort_scores = ___  # 100 - temp_penalty - humidity_penalty - precip_penalty
    
    # TODO: Ensure scores stay between 0 and 100
    comfort_scores = ___  # np.clip(comfort_scores, 0, 100)
    
    return comfort_scores

# TODO: Calculate comfort scores for all cities and days
# comfort_index = calculate_comfort_index(temp_data, humidity_data, precip_data)

# TODO: Find the most and least comfortable cities on average
# avg_comfort = np.mean(comfort_index, axis=1)  # Average across days for each city
# most_comfortable_city = np.argmax(avg_comfort)
# least_comfortable_city = np.argmin(avg_comfort)

# print(f"Most comfortable city: {cities[most_comfortable_city]} (avg: {avg_comfort[most_comfortable_city]:.1f})")
# print(f"Least comfortable city: {cities[least_comfortable_city]} (avg: {avg_comfort[least_comfortable_city]:.1f})")

# TODO: Find the most comfortable day across all cities
# daily_avg_comfort = np.mean(comfort_index, axis=0)  # Average across cities for each day
# best_day = np.argmax(daily_avg_comfort)
# print(f"Most comfortable day overall: Day {best_day + 1} (comfort: {daily_avg_comfort[best_day]:.1f})")

## 🚀 Broadcasting and Advanced Operations (Section 4b)

### Task 4b.1: Cross-City Weather Comparison
Compare weather patterns between all city pairs using broadcasting.

**🎯 Your Task:** Use broadcasting to create comparison matrices.

**📚 Workshop Connection:** Broadcasting enables operations between arrays of different shapes - a key concept that makes ufuncs so powerful.

**💡 Hints:**
- For broadcasting: try `array[:, np.newaxis] - array` to create a matrix
- Use `np.fill_diagonal(matrix, value)` to set diagonal values
- Use `np.unravel_index()` to convert flat index to 2D coordinates
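
The three hints fit together roughly as in the sketch below, which uses a hypothetical three-element vector `v` rather than the city averages. The diagonal is set to `np.inf` here as one way of excluding self-comparisons; that choice is an assumption of this sketch.

```python
import numpy as np

# Hypothetical per-item averages (illustrative values only)
v = np.array([1.0, 4.0, 4.5])

# (3, 1) minus (3,) broadcasts to (3, 3); diff[i, j] == v[i] - v[j]
diff = v[:, np.newaxis] - v

# Work on absolute differences and mask the diagonal (item vs itself)
abs_diff = np.abs(diff)
np.fill_diagonal(abs_diff, np.inf)  # modifies abs_diff in place

# argmin works on the flattened array, so convert back to (row, col)
flat_idx = np.argmin(abs_diff)
i, j = np.unravel_index(flat_idx, abs_diff.shape)

print(diff.shape, (i, j), abs_diff[i, j])
```

Because the matrix is symmetric, the pair appears twice; `np.argmin` reports the first occurrence in row-major order.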

In [None]:
def create_city_comparison_matrix():
    """
    Create matrices comparing weather metrics between all city pairs
    """
    
    # TODO: Calculate average temperature for each city
    avg_temps = ___  # np.mean(temp_data, axis=1) - shape should be (5,)
    
    # TODO: Use broadcasting to create temperature difference matrix
    # Hint: reshape to (5, 1) and subtract the (5,) array from it to get a (5, 5) matrix
    temp_diff_matrix = ___  # avg_temps[:, np.newaxis] - avg_temps
    
    print("Temperature differences between cities (rows - columns):")
    print("Cities:", [city[:3] for city in cities])
    
    # Display the matrix
    for i, city in enumerate(cities):
        row_values = [f"{temp_diff_matrix[i, j]:5.1f}" for j in range(len(cities))]
        print(f"{city[:3]}: [{', '.join(row_values)}]")
    
    # TODO: Find which two cities have the most similar average temperatures
    # Hint: Look for the smallest off-diagonal absolute difference
    abs_diff = ___  # np.abs(temp_diff_matrix)
    # Set the diagonal to infinity so same-city comparisons are ignored
    np.fill_diagonal(abs_diff, np.inf)
    min_diff_idx = ___  # np.argmin(abs_diff)
    
    city1_idx, city2_idx = np.unravel_index(min_diff_idx, abs_diff.shape)
    print(f"\nMost similar temperatures: {cities[city1_idx]} and {cities[city2_idx]}")
    print(f"Temperature difference: {abs_diff[city1_idx, city2_idx]:.2f}°C")
    
    return temp_diff_matrix

# TODO: Run your comparison analysis:
# comparison_matrix = create_city_comparison_matrix()

### Task 4b.2: Aggregations and Performance Optimization
Compare vectorized vs loop-based operations to see NumPy's performance benefits.

**🎯 Your Task:** Implement the same calculation using loops and vectorization, then compare performance.

**📚 Workshop Connection:** This demonstrates how ufuncs, broadcasting, and aggregations work together for optimal performance.
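
Before timing the weather score, it may help to see the measurement pattern on a simpler, hypothetical calculation (the functions `capped_loop` and `capped_vectorized` and the random input are illustrative). This sketch uses `time.perf_counter()`, which is intended for measuring short intervals; absolute timings will vary from machine to machine.

```python
import time
import numpy as np

# Hypothetical input: 50 x 200 random values in [0, 1)
rng = np.random.default_rng(0)
x = rng.random((50, 200))

def capped_loop(a):
    """Cap each element at 0.5 and double it, one element at a time."""
    out = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = min(a[i, j], 0.5) * 2.0
    return out

def capped_vectorized(a):
    """Same calculation expressed with whole-array operations."""
    return np.minimum(a, 0.5) * 2.0

t0 = time.perf_counter()
slow = capped_loop(x)
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
fast = capped_vectorized(x)
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.6f}s, vectorized: {vec_time:.6f}s")
print("results match:", np.allclose(slow, fast))
```

A single run is noisy; for more stable numbers, repeating each measurement several times (for example with the standard-library `timeit` module) is a common approach.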

In [None]:
def weather_score_with_loops(temp, humidity, precip):
    """
    Calculate weather scores using Python loops (SLOW METHOD)
    
    Score = (temp × 0.4) + (100 - humidity) × 0.3 + (20 - min(precip, 20)) × 0.3
    """
    scores = np.zeros_like(temp)
    
    # TODO: Use nested loops to calculate scores
    for i in range(temp.shape[0]):        # Loop through cities
        for j in range(temp.shape[1]):    # Loop through days
            # TODO: Calculate score for temp[i,j], humidity[i,j], precip[i,j]
            temp_score = ___      # temp[i,j] * 0.4
            humidity_score = ___  # (100 - humidity[i,j]) * 0.3
            precip_score = ___    # (20 - min(precip[i,j], 20)) * 0.3
            
            scores[i, j] = temp_score + humidity_score + precip_score
    
    return scores

def weather_score_vectorized(temp, humidity, precip):
    """
    Calculate weather scores using vectorized operations (FAST METHOD)
    Same formula as above, but using NumPy operations
    """
    
    # TODO: Calculate the same scores using vectorized operations
    temp_score = ___      # temp * 0.4
    humidity_score = ___  # (100 - humidity) * 0.3
    precip_score = ___    # (20 - np.minimum(precip, 20)) * 0.3
    
    scores = temp_score + humidity_score + precip_score
    return scores

# TODO: Performance comparison (uncomment to test)
# print("\n=== Performance Comparison ===")

# # Use a subset of data for speed
# subset_temp = temp_data[:, :100]
# subset_humidity = humidity_data[:, :100]  
# subset_precip = precip_data[:, :100]

# # Time the loop version (perf_counter is a high-resolution timer for short intervals)
# start_time = time.perf_counter()
# scores_loops = weather_score_with_loops(subset_temp, subset_humidity, subset_precip)
# loop_time = time.perf_counter() - start_time

# # Time the vectorized version
# start_time = time.perf_counter()
# scores_vectorized = weather_score_vectorized(subset_temp, subset_humidity, subset_precip)
# vectorized_time = time.perf_counter() - start_time

# print(f"Loop method: {loop_time:.4f} seconds")
# print(f"Vectorized method: {vectorized_time:.4f} seconds") 
# print(f"Speedup: {loop_time/vectorized_time:.1f}x faster!")

# # TODO: Verify both methods agree within floating-point tolerance
# results_match = np.allclose(scores_loops, scores_vectorized)
# print(f"Results match: {results_match}")

## 🏆 Final Challenge - Comprehensive Weather Analytics (Section 5)

### Task 5.1: Comprehensive City Ranking System
Put everything together to create a city ranking system.

**🎯 Your Task:** Create a comprehensive ranking system using all your NumPy skills: ufuncs, broadcasting, aggregations, and advanced indexing.

**📚 Workshop Connection:** This integrates all concepts from the training - the perfect capstone exercise!
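
The weighting and ranking steps can be sketched on a hypothetical 3 × 2 score table (the scores and weights below are made up for illustration). It shows two equivalent ways to compute a weighted total and how `np.argsort` produces a descending ranking.

```python
import numpy as np

# Hypothetical scores: 3 items (rows) x 2 criteria (columns)
scores = np.array([[80.0, 60.0],
                   [50.0, 90.0],
                   [70.0, 70.0]])
weights = np.array([0.75, 0.25])  # one weight per criterion, summing to 1

# Matrix-vector product: one weighted total per row
overall = scores @ weights

# Same result via broadcasting: (3, 2) * (2,) then sum over the criteria axis
overall_bc = (scores * weights).sum(axis=1)

# argsort sorts ascending, so reverse it to put the best item first
order = np.argsort(overall)[::-1]

for rank, idx in enumerate(order, start=1):
    print(rank, idx, overall[idx])
```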

In [None]:
def create_weather_ranking_system():
    """
    Create a comprehensive city ranking system using multiple weather criteria
    """
    print("\n=== FINAL CHALLENGE: City Weather Ranking ===")
    
    # TODO: Create scoring criteria (each should be 0-100 scale)
    scores = np.zeros((len(cities), 4))  # 4 criteria
    
    # Criterion 1: Temperature comfort (penalty for deviation from 20°C)
    avg_temps = ___  # np.mean(temp_data, axis=1)
    temp_deviations = np.abs(avg_temps - 20)
    scores[:, 0] = ___  # 100 - temp_deviations * 5 (adjust multiplier as needed)
    
    # Criterion 2: Humidity comfort (penalty for deviation from 60%)
    avg_humidity = ___  # np.mean(humidity_data, axis=1)
    humidity_deviations = np.abs(avg_humidity - 60)
    scores[:, 1] = ___  # 100 - humidity_deviations * 2
    
    # Criterion 3: Low precipitation (less rain = higher score)
    avg_precip = ___  # np.mean(precip_data, axis=1)
    scores[:, 2] = ___  # 100 - avg_precip * 10 (adjust as needed)
    
    # Criterion 4: Weather stability (lower temperature variance = higher score)
    temp_variance = ___  # np.var(temp_data, axis=1)
    scores[:, 3] = ___  # 100 - temp_variance * 2
    
    # Ensure all scores are between 0 and 100
    scores = np.clip(scores, 0, 100)
    
    # TODO: Calculate weighted overall score using broadcasting
    weights = np.array([0.3, 0.25, 0.25, 0.2])  # Adjust weights as desired
    overall_scores = ___  # np.dot(scores, weights) or use broadcasting
    
    # TODO: Create city rankings (highest score = rank 1)
    ranking_indices = ___  # np.argsort(overall_scores)[::-1] (descending order)
    
    print("\n🏆 FINAL CITY WEATHER RANKINGS 🏆")
    print("="*50)
    
    for rank, city_idx in enumerate(ranking_indices):
        city = cities[city_idx]
        score = overall_scores[city_idx]
        temp = np.mean(temp_data[city_idx])
        humidity = np.mean(humidity_data[city_idx])
        precip = np.mean(precip_data[city_idx])
        
        print(f"{rank + 1}. {city:10} | Score: {score:5.1f}/100")
        print(f"   📊 Temp: {temp:5.1f}°C | Humidity: {humidity:4.1f}% | Rain: {precip:4.1f}mm/day")
        print()
    
    return overall_scores, ranking_indices

# TODO: Run your final ranking system:
# final_scores, city_rankings = create_weather_ranking_system()

### Visualization
Create a simple chart showing the city rankings.

In [None]:
# TODO: Create a simple visualization (uncomment after completing the ranking system)
# plt.figure(figsize=(10, 6))
# plt.bar(range(len(cities)), final_scores, color=['skyblue', 'lightcoral', 'lightgreen', 'gold', 'plum'])
# plt.title('City Weather Comfort Rankings', fontsize=16, fontweight='bold')
# plt.xlabel('Cities')
# plt.ylabel('Overall Weather Score')
# plt.xticks(range(len(cities)), cities, rotation=45)
# plt.grid(axis='y', alpha=0.3)

# # Add score labels on bars
# for i, score in enumerate(final_scores):
#     plt.text(i, score + 1, f'{score:.1f}', ha='center', va='bottom', fontweight='bold')

# plt.tight_layout()
# plt.show()

# print("🎉 Congratulations! You've completed the Weather Analytics challenge!")

## ✅ Exercise Summary and Self-Check

### Check Your Understanding:
After completing the exercises, you should be able to answer:

1. **Array Creation & Inspection**: What's the difference between `array.shape` and `array.ndim`?
2. **Indexing**: Why do we use `array[:, indices]` instead of `array[indices]` for selecting columns?
3. **Vectorization**: How much faster were your vectorized operations compared to loops?
4. **Broadcasting**: What shape requirements must be met for broadcasting to work?
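
For question 4, one way to check your answer is to let NumPy report the result shape without building any arrays. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the shapes mirror those used in this lab.

```python
import numpy as np

# Shapes are compared from the trailing axis backwards; each pair of sizes
# must be equal, or one of them must be 1 (a missing leading axis counts as 1)
pairwise = np.broadcast_shapes((5, 1), (5,))
per_day = np.broadcast_shapes((5, 365), (365,))

# (5, 365) with (5,) fails: trailing sizes 365 and 5 differ and neither is 1
try:
    np.broadcast_shapes((5, 365), (5,))
    compatible = True
except ValueError:
    compatible = False

# Adding an axis turns (5,) into (5, 1), which is compatible with (5, 365)
fixed = np.broadcast_shapes((5, 365), (5, 1))

print(pairwise, per_day, compatible, fixed)
```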

### Expected Performance Results:
- Vectorized operations should typically be **10-100x faster** than loops; the exact ratio depends on array size and hardware, and single short runs are noisy
- Memory usage should be efficient (no unnecessary copies)
- All boolean indexing should work without explicit loops

### Extension Challenges (If You Finish Early):
1. Add error handling for invalid weather data
2. Implement a weather prediction algorithm using historical patterns
3. Create additional visualizations showing seasonal trends
4. Calculate correlations between different weather metrics across cities

---

## 📝 Submission Checklist

Before you finish, ensure you have:
- [ ] Completed all TODO items in the code
- [ ] Verified your functions work with the provided test cases
- [ ] Achieved significant performance improvements with vectorization
- [ ] Successfully created the final city ranking system
- [ ] Generated the visualization chart

**Time Target**: 30-40 minutes total
**Key Skills Demonstrated**: Array operations, indexing, vectorization, broadcasting, performance optimization

---

*This exercise was designed to simulate real-world data analysis tasks using NumPy's core features. The skills you've practiced here are directly applicable to data science, scientific computing, and machine learning projects!*