In [2]:
import csv
from datetime import datetime
from collections import defaultdict

# Simulate Map step
year_data = defaultdict(lambda: {"max_sum": 0, "min_sum": 0, "count": 0})

with open("weather.csv", "r") as f:
    reader = csv.reader(f)
    next(reader)  # Skip header
    for row in reader:
        try:
            date = row[0]
            temp_max = float(row[2])
            temp_min = float(row[3])
            year = datetime.strptime(date, "%Y-%m-%d").year

            year_data[year]["max_sum"] += temp_max
            year_data[year]["min_sum"] += temp_min
            year_data[year]["count"] += 1
        except (ValueError, IndexError):
            # Skip rows with missing, non-numeric, or malformed fields
            continue

# Simulate Reduce step
hottest_year = None
coolest_year = None
max_avg_temp = float("-inf")
min_avg_temp = float("inf")

for year, data in year_data.items():
    avg_max = data["max_sum"] / data["count"]
    avg_min = data["min_sum"] / data["count"]

    if avg_max > max_avg_temp:
        max_avg_temp = avg_max
        hottest_year = year

    if avg_min < min_avg_temp:
        min_avg_temp = avg_min
        coolest_year = year

# Output
print(f"Hottest Year: {hottest_year} with avg max temperature {max_avg_temp:.2f}°C")
print(f"Coolest Year: {coolest_year} with avg min temperature {min_avg_temp:.2f}°C")


Hottest Year: 2015 with avg max temperature 17.43°C
Coolest Year: 2012 with avg min temperature 7.29°C


Here's a **line-by-line explanation** of the code above, one block at a time:

---

### 📝 Code Breakdown

```python
import csv
from datetime import datetime
from collections import defaultdict
```
- ✅ We import:
  - `csv`: to read the CSV file.
  - `datetime`: to extract the year from dates.
  - `defaultdict`: makes creating and updating dictionary entries easier.

---

```python
# Simulate Map step
year_data = defaultdict(lambda: {"max_sum": 0, "min_sum": 0, "count": 0})
```
- ✅ We create a `defaultdict` called `year_data`.
- The first time a `year` is looked up, the `lambda` creates a fresh dictionary for it with every value set to 0:
  - `max_sum`: running total of the daily max temperatures for that year.
  - `min_sum`: running total of the daily min temperatures for that year.
  - `count`: number of rows (days) read for that year.
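
To see the auto-creation in action, here is a minimal sketch. The temperature value is made up for illustration and does not come from `weather.csv`.

```python
from collections import defaultdict

# Each missing key triggers the factory, which returns a brand-new dict.
year_data = defaultdict(lambda: {"max_sum": 0, "min_sum": 0, "count": 0})

# 2012 is not in the dict yet; indexing creates its zeroed entry first.
year_data[2012]["max_sum"] += 12.8   # hypothetical reading
year_data[2012]["count"] += 1

# A different year gets its own independent dict.
year_data[2013]["count"] += 1

print(dict(year_data))
```

Without `defaultdict`, the loop would need an explicit `if year not in year_data:` check before every update.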

---

```python
with open("weather.csv", "r") as f:
    reader = csv.reader(f)
    next(reader)  # Skip header
```
- ✅ Open the file `weather.csv` for reading.
- Create a `csv.reader` object to read it row by row.
- Skip the first line (header: `date,precipitation,temp_max,...`).
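
The same pattern can be tried without a file on disk. This sketch uses `io.StringIO` as a stand-in for `weather.csv`; the rows are hypothetical, and `temp_min` as the fourth column name is an assumption.

```python
import csv
import io

# Hypothetical sample standing in for weather.csv.
sample = io.StringIO(
    "date,precipitation,temp_max,temp_min\n"
    "2012-01-01,0.0,12.8,5.0\n"
    "2012-01-02,10.9,10.6,2.8\n"
)

reader = csv.reader(sample)
header = next(reader)   # consumes the first row so the loop never sees it
rows = list(reader)     # every remaining row is a list of strings

print(header)
print(rows[0])
```

Note that every field comes back as a string, which is why the main code calls `float()` on the temperature columns.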

---

```python
    for row in reader:
        try:
            date = row[0]
            temp_max = float(row[2])
            temp_min = float(row[3])
```
- ✅ Loop through each row in the CSV file.
- Extract:
  - `date` → column 0.
  - `temp_max` → column 2 (converted to float).
  - `temp_min` → column 3 (converted to float).
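
Positional indexes like `row[2]` break silently if the column order changes. As an alternative (not what the code above does), `csv.DictReader` looks fields up by header name. This is a sketch; the sample row and the `temp_min` column name are assumptions.

```python
import csv
import io

# Hypothetical sample; the temp_min column name is an assumption.
sample = io.StringIO(
    "date,precipitation,temp_max,temp_min\n"
    "2012-01-01,0.0,12.8,5.0\n"
)

parsed = []
for row in csv.DictReader(sample):   # the header row becomes the dict keys
    parsed.append((row["date"], float(row["temp_max"]), float(row["temp_min"])))

print(parsed)
```

`DictReader` reads the header itself, so no `next(reader)` call is needed.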

---

```python
            year = datetime.strptime(date, "%Y-%m-%d").year
```
- ✅ Parse the `date` string into a `datetime` object using the format `%Y-%m-%d` (for example `2012-01-31`).
- Read its `.year` attribute to get the **year** as an integer (like `2012`).
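
A quick sketch of what `strptime` does with a matching and a non-matching date string. Both strings are made-up examples.

```python
from datetime import datetime

# A string in the expected format parses cleanly.
parsed = datetime.strptime("2012-03-15", "%Y-%m-%d")
year = parsed.year

# A string in another layout raises ValueError instead of guessing.
try:
    datetime.strptime("15/03/2012", "%Y-%m-%d")
    bad_date_rejected = False
except ValueError:
    bad_date_rejected = True

print(year, bad_date_rejected)
```

In the main loop, a `ValueError` like this one is what causes a row with a malformed date to be skipped.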

---

```python
            year_data[year]["max_sum"] += temp_max
            year_data[year]["min_sum"] += temp_min
            year_data[year]["count"] += 1
```
- ✅ For this `year`:
  - Add this row's max temp to the running max total.
  - Add this row's min temp to the running min total.
  - Increase the row (day) count by 1.
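
The code above folds the "map" and part of the "reduce" into one loop. For comparison, here is a sketch that keeps the two phases apart, which is closer to how a MapReduce job is usually described. The function names `map_row` and `reduce_pairs` and the sample rows are hypothetical.

```python
from collections import defaultdict

def map_row(row):
    """Map: turn one CSV row into a (key, value) pair."""
    year = int(row[0][:4])  # assumes ISO dates such as "2012-01-01"
    return year, (float(row[2]), float(row[3]), 1)

def reduce_pairs(pairs):
    """Reduce: sum the values that share a key."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for year, (tmax, tmin, n) in pairs:
        totals[year][0] += tmax
        totals[year][1] += tmin
        totals[year][2] += n
    return dict(totals)

# Hypothetical rows: date, precipitation, temp_max, temp_min
rows = [
    ["2012-01-01", "0.0", "10.0", "4.0"],
    ["2012-01-02", "0.0", "14.0", "6.0"],
    ["2013-01-01", "0.0", "9.0", "1.0"],
]
totals = reduce_pairs(map_row(r) for r in rows)
print(totals)
```

Because sums and counts can be added in any order, partial totals from different chunks of the file could be merged later, which is why the code tracks sums and counts rather than running averages.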

---

```python
        except (ValueError, IndexError):
            continue
```
- ✅ If a row can't be parsed (for example a missing or non-numeric value, or a malformed date), **skip** that row and move on to the next one.
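
To make the skipping behavior concrete, this sketch shows the two exceptions a malformed row most commonly raises in this kind of parsing. The helper `parse_row` and the sample rows are hypothetical.

```python
def parse_row(row):
    """Return (temp_max, temp_min), or None if the row is unusable."""
    try:
        return float(row[2]), float(row[3])
    except (ValueError, IndexError):
        return None

good = parse_row(["2012-01-01", "0.0", "12.8", "5.0"])
blank = parse_row(["2012-01-02", "0.0", "", "5.0"])   # float("") raises ValueError
short = parse_row(["2012-01-03", "0.0"])              # row[2] raises IndexError

print(good, blank, short)
```

Catching only these exception types keeps real bugs, such as a misspelled variable name, from being hidden along with the bad rows.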

---

```python
# Simulate Reduce step
hottest_year = None
coolest_year = None
max_avg_temp = float("-inf")
min_avg_temp = float("inf")
```
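- ✅ Prepare variables to store the final answers:
  - `hottest_year` and `coolest_year` start as `None` until a year is found.
  - `max_avg_temp` starts at negative infinity and `min_avg_temp` at positive infinity, so the first year's averages always replace them.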
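
Why infinity rather than `0`? A small sketch with made-up sub-zero averages shows the difference.

```python
temps = [-3.5, -12.0, -7.25]   # hypothetical averages, all below zero

best_from_zero = 0.0               # naive starting value
best_from_neg_inf = float("-inf")  # starting value used in the code above

for t in temps:
    if t > best_from_zero:
        best_from_zero = t
    if t > best_from_neg_inf:
        best_from_neg_inf = t

print(best_from_zero, best_from_neg_inf)
```

Starting from `0.0`, no value ever wins and the result is a temperature that never occurred; starting from negative infinity, the true maximum `-3.5` is found.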

---

```python
for year, data in year_data.items():
    avg_max = data["max_sum"] / data["count"]
    avg_min = data["min_sum"] / data["count"]
```
- ✅ For each `year`, calculate:
  - `avg_max`: average max temperature.
  - `avg_min`: average min temperature.

---

```python
    if avg_max > max_avg_temp:
        max_avg_temp = avg_max
        hottest_year = year
```
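- ✅ If this year’s average max temperature beats the **highest so far**, store it in `max_avg_temp` and set `hottest_year` to this year. Because the comparison is a strict `>`, a tie keeps the earlier year.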

---

```python
    if avg_min < min_avg_temp:
        min_avg_temp = avg_min
        coolest_year = year
```
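- ✅ If this year’s average min temperature is below the **lowest so far**, store it in `min_avg_temp` and set `coolest_year` to this year. "Coolest" here means the lowest average *daily minimum*, not the lowest overall mean.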
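
The two `if` blocks are a hand-written search for a maximum and a minimum. As an alternative sketch (not the code above), Python's built-in `max()` and `min()` with a `key` function do the same job. The `averages` dictionary and its values are hypothetical.

```python
# Hypothetical per-year averages, as the reduce loop would compute them.
averages = {
    2012: {"avg_max": 15.0, "avg_min": 7.0},
    2013: {"avg_max": 16.0, "avg_min": 8.0},
    2014: {"avg_max": 17.0, "avg_min": 8.5},
}

# Iterating a dict yields its keys; key= decides what each year is ranked by.
hottest_year = max(averages, key=lambda y: averages[y]["avg_max"])
coolest_year = min(averages, key=lambda y: averages[y]["avg_min"])

print(hottest_year, coolest_year)
```

One difference worth knowing: `max()` and `min()` raise `ValueError` on an empty dictionary, whereas the explicit loop simply leaves both years as `None`.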

---

```python
# Output
print(f"Hottest Year: {hottest_year} with avg max temperature {max_avg_temp:.2f}°C")
print(f"Coolest Year: {coolest_year} with avg min temperature {min_avg_temp:.2f}°C")
```
- ✅ Finally, print the **hottest** and **coolest** years with their average temperatures.
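
The `:.2f` inside the braces is a format specifier that rounds to two decimal places. A short sketch with a made-up value, plus what the same specifier produces if no valid rows were read and the sentinel is still in place:

```python
avg = 12.3456   # hypothetical average

line = f"avg max temperature {avg:.2f}°C"

# If the file had no usable rows, max_avg_temp would still be -inf.
empty = f"{float('-inf'):.2f}"

print(line)
print(empty)
```

So an output of `None` and `-inf` is a sign that every row was skipped, which is worth checking before trusting the result.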

---

Let me know if you'd like a diagram to explain the flow or a version that also finds the hottest **day** instead of **year**!