# Weather-Based Plant Recommendation Engine

Name: Zihan

### Workflow Overview

This notebook implements the core logic for a weather-based plant recommendation system. The end-to-end process is as follows:

1.  **Setup and Data Loading:**
    * Import the required Python libraries (`requests`, `pandas`, `numpy`, `json`, `random`; the database version also uses `mysql.connector`).
    * Load the pre-processed plant dataset (`Table13_GeneralPlantListforRecommendation`) into a pandas DataFrame, either from a local CSV export (path set in `TABLE13_PATH`) or directly from the MySQL database.
    * Parse the `sunlight` column, stored as a JSON string, back into a Python list for easier processing. The database version also converts the 0/1 `drought_tolerant` column to booleans.

2.  **Weather Data Fetching and Aggregation:**
    * A function, `get_and_aggregate_weather_data`, is defined to handle interactions with the Open-Meteo API.
    * Given a specific `latitude` and `longitude`, this function fetches a 16-day forecast for six key weather variables: minimum/maximum temperature, sunshine duration, max UV index, precipitation sum, and mean relative humidity.
    * It then aggregates this daily data into a summarized dictionary of six key metrics (e.g., `extreme_min_temp`, `avg_sunshine_duration`), handling potential `null` values and rounding the results to two decimal places.

3.  **Core Matching Logic Definition:**
    * The main recommendation "brain" is a function called `is_plant_suitable`.
    * This function takes the aggregated weather dictionary and a single plant's data (a DataFrame row) as input.
    * It applies four rules in sequence; a plant is rejected as soon as one rule fails:
        * **Temperature Survival:** Compares the forecast's extreme minimum temperature against the plant's absolute minimum survival temperature.
        * **Sunlight Needs:** Checks whether the average sunshine duration and UV index match the plant's requirement for full sun, part shade, or full shade.
        * **Watering Needs:** Matches the average daily precipitation with the plant's preference for frequent, average, or minimal watering.
        * **Drought Tolerance:** Uses the average relative humidity to assess if a non-drought-tolerant plant can survive in a dry-air environment.

4.  **Execution and Output:**
    * The main script sets the target coordinates.
    * It calls the weather function to get the summarized forecast.
    * It applies the `is_plant_suitable` function to every plant in the DataFrame to filter out unsuitable options.
    * The resulting list of suitable plant IDs is then **randomly shuffled** to ensure variety in the presentation.
    * Finally, the notebook outputs the two key results: the summarized weather information and the randomized list of recommended plant IDs.

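## Step 1 (Local Version): Import Libraries and Load Plant Data
First, we import all the required libraries and load Table13, which we created earlier. A key step is converting the `sunlight` column from a JSON string back into a Python list so that the matching logic can test which light conditions a plant accepts.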

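```python
import requests
import pandas as pd
import numpy as np
import json
import random
from IPython.display import display

# Path to the Table13 CSV file
TABLE13_PATH = "Table13.csv"

# Load the plant data
try:
    df_plants = pd.read_csv(TABLE13_PATH)
    # Key step: convert the sunlight column from JSON strings back into lists
    df_plants['sunlight'] = df_plants['sunlight'].apply(json.loads)
    print("Plant data (Table13) loaded and prepared successfully!")
    print(f"Loaded {len(df_plants)} plants.")
    # An expression inside try/except is not shown automatically, so call display() explicitly
    display(df_plants.head())
except FileNotFoundError:
    print(f"Error: plant data file not found; check that the path is correct: {TABLE13_PATH}")
```

```text
Plant data (Table13) loaded and prepared successfully!
Loaded 1008 plants.
```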
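Parsing alone leaves the labels exactly as stored, and the database preview further down shows at least one entry written as `Full sun` (plant 401). Exact membership checks such as `'full sun' in sunlight_needs` would miss it. A minimal sketch that parses and normalises the labels in one pass; `parse_sunlight` is a hypothetical helper and the two sample rows are made up to mirror the column's format:

```python
import json

import pandas as pd


def parse_sunlight(raw):
    """Parse a JSON-encoded sunlight list and normalise each label (hypothetical helper)."""
    # Values that are already lists (e.g. the cell was re-run) are used as-is.
    values = json.loads(raw) if isinstance(raw, str) else raw
    # Trim and lower-case so 'Full sun ' and 'full sun' compare equal.
    return [str(v).strip().lower() for v in values]


# Hypothetical sample rows in the same format as Table13's sunlight column.
sample = pd.DataFrame({"sunlight": ['["full sun", "part shade"]', '["Full sun", "part shade"]']})
sample["sunlight"] = sample["sunlight"].apply(parse_sunlight)
print(sample["sunlight"].tolist())
```

Applied as `df_plants['sunlight'].apply(parse_sunlight)`, this would replace the plain `json.loads` call in either version of Step 1.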


## Step 1 (DB Version): Load Plant Data from the MySQL Database

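```python
# ==============================================================================
# Step 1: Import libraries, connect to the database, and load the plant data
# ==============================================================================

# Import the required libraries
import requests
import pandas as pd
import numpy as np
import json
import os
import random
import mysql.connector  # MySQL driver (mysql-connector-python)
from mysql.connector import Error
from IPython.display import display

# --- 1. Database connection settings ---
# Do not hard-code the password: it is read from an environment variable.
# PLANTX_DB_PASSWORD is an assumed variable name; set it before running this cell.
db_config = {
    'host': 'database-plantx.cqz06uycysiz.us-east-1.rds.amazonaws.com',
    'user': 'zihan',
    'password': os.environ['PLANTX_DB_PASSWORD'],
    'database': 'FIT5120_PlantX_Database',
    'use_pure': True,
    'charset': 'utf8mb4'
}

# --- 2. Load the data from the database ---
connection = None
df_plants = pd.DataFrame()  # empty fallback so later cells do not fail

try:
    # Open the database connection
    connection = mysql.connector.connect(**db_config)
    if connection.is_connected():
        print("Connected to the MySQL database.")

        # SQL query for the recommendation table
        query = "SELECT * FROM Table13_GeneralPlantListforRecommendation;"

        # Read the query result straight into a DataFrame.
        # Recent pandas versions emit a UserWarning for DBAPI connections other than
        # sqlite3 (SQLAlchemy is the supported route), but the query still runs.
        df_plants = pd.read_sql(query, connection)

        print(f"Loaded {len(df_plants)} plants from the database.")

except Error as e:
    print(f"Error while loading data from MySQL: {e}")

finally:
    # Close the database connection
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection closed.")

# --- 3. Post-processing (same as the local version) ---
if not df_plants.empty:
    # Key step: convert the sunlight column from JSON strings back into lists
    df_plants['sunlight'] = df_plants['sunlight'].apply(json.loads)

    # Convert the 0/1 values stored in MySQL to True/False so the matching logic is unambiguous
    df_plants['drought_tolerant'] = df_plants['drought_tolerant'].astype(bool)

    print("\nPlant DataFrame is ready!")
    display(df_plants.head())  # in Jupyter, display() renders a nicer table than print()
else:
    print("\nFailed to load plant data; later steps may not run.")
```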

```text
Connected to the MySQL database.
Loaded 1008 plants from the database.
MySQL connection closed.

Plant DataFrame is ready!
```


| Unnamed: 0 | general_plant_id | sunlight | watering | drought_tolerant | absolute_min_temp_c |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 398 | [full sun, part shade] | Average | True | -17.8 |
| 1 | 399 | [full sun, part shade] | Average | False | -23.3 |
| 2 | 400 | [full sun, part shade] | Average | True | -28.9 |
| 3 | 401 | [Full sun, part shade] | Average | False | -23.3 |
| 4 | 402 | [full sun, part shade] | Average | True | -28.9 |
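Recent pandas versions emit a `UserWarning` when `pd.read_sql` receives a DB-API connection other than `sqlite3`, such as the `mysql.connector` connection used in this step; SQLAlchemy is the officially supported route. If adding SQLAlchemy is not wanted, one alternative is to run the query through a standard DB-API 2.0 cursor and take the column names from `cursor.description`. The sketch below assumes that approach and uses an in-memory `sqlite3` database as a stand-in for MySQL, since both drivers follow DB-API 2.0; `query_to_dataframe` is a hypothetical helper name and the two rows are sample data.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(connection, query, params=None):
    """Run a query through a DB-API 2.0 cursor and return a DataFrame (hypothetical helper)."""
    cursor = connection.cursor()
    try:
        if params is None:
            cursor.execute(query)
        else:
            cursor.execute(query, params)
        rows = cursor.fetchall()
        # cursor.description has one entry per result column; item 0 is the column name.
        columns = [col[0] for col in cursor.description]
    finally:
        cursor.close()
    return pd.DataFrame(rows, columns=columns)


# In-memory SQLite stands in for the MySQL database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (general_plant_id INTEGER, watering TEXT, drought_tolerant INTEGER)")
conn.executemany("INSERT INTO plants VALUES (?, ?, ?)", [(398, "Average", 1), (399, "Average", 0)])
df = query_to_dataframe(conn, "SELECT * FROM plants")
df["drought_tolerant"] = df["drought_tolerant"].astype(bool)  # 0/1 -> False/True, as in Step 1
conn.close()
print(df)
```

With MySQL, the same call would be `query_to_dataframe(connection, query)` inside the existing `try` block.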


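## Step 2: Define the Weather Fetching and Aggregation Function
This function calls the Open-Meteo API, fetches the weather data, and aggregates it according to the pre-designed logic. It takes particular care with the `null` values that may appear at the end of the forecast.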

```python
def get_and_aggregate_weather_data(latitude, longitude):
    """
    Fetch the 16-day forecast for the given coordinates and aggregate it into the six core metrics.
    Returns a dict of metrics, or None if the API request fails.
    """
    # Build the API request URL
    api_url = (
        f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}"
        "&daily=precipitation_sum,sunshine_duration,uv_index_max,temperature_2m_max,"
        "temperature_2m_min,relative_humidity_2m_mean&timezone=auto&forecast_days=16"
    )

    # Send the API request
    try:
        response = requests.get(api_url, timeout=30)  # avoid hanging indefinitely
        response.raise_for_status()  # raise on HTTP errors such as 404 or 500
        weather_data = response.json()['daily']
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
        return None

    # --- Aggregate the data ---
    # Helper that drops null (None) values from a list
    def clean_list(data_list):
        return [item for item in data_list if item is not None]

    # Clean every data list
    temp_min_list = clean_list(weather_data['temperature_2m_min'])
    temp_max_list = clean_list(weather_data['temperature_2m_max'])
    sunshine_list = clean_list(weather_data['sunshine_duration'])
    uv_list = clean_list(weather_data['uv_index_max'])
    precipitation_list = clean_list(weather_data['precipitation_sum'])
    humidity_list = clean_list(weather_data['relative_humidity_2m_mean'])

    # Compute the six aggregated metrics (None if every value was null)
    aggregated_weather = {
        'extreme_min_temp': np.min(temp_min_list) if temp_min_list else None,
        'extreme_max_temp': np.max(temp_max_list) if temp_max_list else None,
        'avg_sunshine_duration': (np.mean(sunshine_list) / 3600) if sunshine_list else None,  # seconds -> hours
        'avg_max_uv_index': np.mean(uv_list) if uv_list else None,
        'avg_daily_precipitation': np.mean(precipitation_list) if precipitation_list else None,
        'avg_relative_humidity': np.mean(humidity_list) if humidity_list else None
    }

    # Round every float to two decimal places before returning
    # (np.float64 is a subclass of float, so NumPy results are rounded too)
    formatted_weather = {
        key: round(value, 2) if isinstance(value, float) else value
        for key, value in aggregated_weather.items()
    }

    return formatted_weather
```
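The aggregation half of this function does not need the network, so splitting it out makes it testable offline. A minimal sketch, assuming the input has the same shape as the `daily` block of the Open-Meteo response; `aggregate_daily` is a hypothetical helper, it uses the standard library instead of NumPy, and the three-day sample below is invented for illustration:

```python
from statistics import fmean


def aggregate_daily(daily):
    """Aggregate an Open-Meteo-style 'daily' dict into the six metrics (hypothetical helper)."""

    def clean(key):
        # Drop missing days, which the API returns as None (JSON null).
        return [v for v in daily.get(key, []) if v is not None]

    def summarise(values, fn, scale=1.0):
        # None when every value was null; otherwise apply fn, rescale, round to 2 dp.
        return round(fn(values) / scale, 2) if values else None

    return {
        "extreme_min_temp": summarise(clean("temperature_2m_min"), min),
        "extreme_max_temp": summarise(clean("temperature_2m_max"), max),
        # sunshine_duration is reported in seconds; divide by 3600 to get hours.
        "avg_sunshine_duration": summarise(clean("sunshine_duration"), fmean, 3600),
        "avg_max_uv_index": summarise(clean("uv_index_max"), fmean),
        "avg_daily_precipitation": summarise(clean("precipitation_sum"), fmean),
        "avg_relative_humidity": summarise(clean("relative_humidity_2m_mean"), fmean),
    }


# Hypothetical three-day sample; the last day is null, as can happen at the end of a forecast.
sample_daily = {
    "temperature_2m_min": [8.1, 6.4, None],
    "temperature_2m_max": [21.0, 28.3, None],
    "sunshine_duration": [36000.0, 28800.0, None],
    "uv_index_max": [5.0, 4.5, None],
    "precipitation_sum": [0.0, 0.6, None],
    "relative_humidity_2m_mean": [70, 64, None],
}
print(aggregate_daily(sample_daily))
```

`get_and_aggregate_weather_data` could then return `aggregate_daily(weather_data)` after the request succeeds.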

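## Step 3: Define the Core Matching Function
This function is the "brain" of the recommendation system. It encodes all the pre-designed matching rules and decides whether a single plant suits the aggregated weather conditions.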

```python
def is_plant_suitable(agg_weather, plant_row):
    """
    Decide whether a plant suits the aggregated weather conditions.
    Returns True (suitable) or False (not suitable).
    """
    # A metric is None when every forecast value was null; it cannot be compared, so reject.
    if any(value is None for value in agg_weather.values()):
        return False

    # Rule 1: survival floor (hard filter)
    if agg_weather['extreme_min_temp'] < plant_row['absolute_min_temp_c']:
        return False

    # Rule 2: sunlight needs (duration + intensity)
    sun_duration = agg_weather['avg_sunshine_duration']  # hours per day
    uv_index = agg_weather['avg_max_uv_index']
    # Normalise case and whitespace: the data contains variants such as 'Full sun'
    sunlight_needs = [s.strip().lower() for s in plant_row['sunlight']]

    sun_duration_ok = False
    if 'full sun' in sunlight_needs and sun_duration >= 6:
        sun_duration_ok = True
    # Parentheses are required: `and` binds more tightly than `or`
    if ('part shade' in sunlight_needs or 'part sun/part shade' in sunlight_needs) and (3 <= sun_duration < 6):
        sun_duration_ok = True
    if 'full shade' in sunlight_needs and sun_duration < 3:
        sun_duration_ok = True

    if not sun_duration_ok:
        return False

    # Intensity refinement
    if uv_index > 8 and sunlight_needs == ['part shade']:
        return False  # UV too strong for a plant that only tolerates part shade
    if uv_index < 3 and sunlight_needs == ['full sun']:
        return False  # UV too weak to satisfy a full-sun plant

    # Rule 3: watering needs (precipitation in mm/day)
    precipitation = agg_weather['avg_daily_precipitation']
    watering_needs = plant_row['watering']

    if watering_needs == 'Frequent' and precipitation < 3: return False
    if watering_needs == 'Minimal' and precipitation > 5: return False
    if watering_needs == 'Average' and not (1 <= precipitation <= 8): return False

    # Rule 4: drought tolerance (relative humidity in %)
    humidity = agg_weather['avg_relative_humidity']
    is_drought_tolerant = plant_row['drought_tolerant']

    if not is_drought_tolerant and humidity < 40: return False

    # All checks passed
    return True
```
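In the sunlight-duration rule, the grouping of `or` and `and` decides the result, because Python's `and` binds more tightly than `or`. A minimal illustration with a hypothetical part-shade-only plant under a sunny 9-hour forecast (both values are made up):

```python
# Hypothetical plant that only tolerates part shade, with a sunny forecast.
sunlight_needs = ["part shade"]
sun_duration = 9.0

# Without parentheses this parses as A or (B and C): the range check is skipped when A is True.
ungrouped = "part shade" in sunlight_needs or "part sun/part shade" in sunlight_needs and (3 <= sun_duration < 6)

# With parentheses the range check applies to both labels: (A or B) and C.
grouped = ("part shade" in sunlight_needs or "part sun/part shade" in sunlight_needs) and (3 <= sun_duration < 6)

print(ungrouped, grouped)  # True False
```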

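## Step 4: Run the Full Pipeline and Output the Results
We specify a latitude and longitude, call the function to fetch and aggregate the weather, apply the matching function to filter the plants, and finally print the two results we need.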

### Test Locations for Plant Recommendation

To better test the recommendation system, here are several locations in Melbourne and regional Victoria with different geographical and climatic conditions. This allows for testing the matching logic's performance in various scenarios (e.g., coastal, inland, high-altitude).

| Location | Area/Climate Notes | Latitude | Longitude |
| :--- | :--- | :--- | :--- |
| **Melbourne CBD** | Melbourne City Centre, urban heat island effect | `-37.8136` | `144.9631` |
| **Geelong** | City of Geelong, Victoria's second-largest city, on the western bay | `-38.1471` | `144.3603` |
| **Mornington** | Mornington Peninsula, coastal area, mild climate | `-38.2167` | `145.0333` |
| **Olinda** | Dandenong Ranges, high-altitude area, generally cooler and wetter | `-37.8500` | `145.3667` |
| **Bendigo** | City of Bendigo, inland Victoria, generally drier, hot summers, cold winters | `-36.7570` | `144.2794` |
| **Mildura** | Mildura, northwestern Victoria, agricultural area, hot and dry climate | `-34.1847` | `142.1587` |

To use these samples, change the `latitude` and `longitude` variables in the coordinate cell below, then re-run it and the execution cell that follows to see the recommendations for that location.
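Instead of editing coordinates by hand, the six test locations can be kept in a dictionary and looped over. The sketch below is an assumption about how that could look: `recommend_for_locations` is a hypothetical helper that takes the weather fetcher and the matching rule as arguments (in the notebook these would be `get_and_aggregate_weather_data` and `is_plant_suitable`), so it can be exercised here offline with stand-in functions and a two-row toy DataFrame.

```python
import pandas as pd

# Coordinates copied from the test-location table above.
TEST_LOCATIONS = {
    "Melbourne CBD": (-37.8136, 144.9631),
    "Geelong": (-38.1471, 144.3603),
    "Mornington": (-38.2167, 145.0333),
    "Olinda": (-37.8500, 145.3667),
    "Bendigo": (-36.7570, 144.2794),
    "Mildura": (-34.1847, 142.1587),
}


def recommend_for_locations(locations, plants, fetch_weather, is_suitable):
    """Return {location name: [suitable plant ids]} (hypothetical helper)."""
    results = {}
    for name, (lat, lon) in locations.items():
        weather = fetch_weather(lat, lon)
        if weather is None or plants.empty:
            results[name] = []  # weather request failed or no plant data
            continue
        mask = plants.apply(lambda row: is_suitable(weather, row), axis=1)
        results[name] = plants.loc[mask, "general_plant_id"].tolist()
    return results


# Stand-ins so the loop runs offline; the fake forecast simply varies with latitude.
def fake_fetch(lat, lon):
    return {"extreme_min_temp": 2.0 if lat > -37.0 else 6.0}


def fake_rule(weather, row):
    return weather["extreme_min_temp"] >= row["absolute_min_temp_c"]


plants = pd.DataFrame({"general_plant_id": [398, 399], "absolute_min_temp_c": [-17.8, 4.0]})
results = recommend_for_locations(TEST_LOCATIONS, plants, fake_fetch, fake_rule)
print(results)
```

In the notebook, the call would be `recommend_for_locations(TEST_LOCATIONS, df_plants, get_and_aggregate_weather_data, is_plant_suitable)`, which sends one API request per location.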

```python
# --- Step 1: set the target coordinates (Melbourne CBD as the example) ---
latitude = -37.8136
longitude = 144.9631

# --- Alternative: Olinda in the Dandenong Ranges (uncomment to use) ---
# latitude = -37.8500
# longitude = 145.3667
```

```python
# --- Step 2: fetch and aggregate the weather data ---
print(f"Fetching weather data for ({latitude}, {longitude})...")
aggregated_weather_info = get_and_aggregate_weather_data(latitude, longitude)

# --- Step 3: filter the suitable plants ---
# Also skip an empty DataFrame (e.g. a failed database load), where apply() would not return a mask
if aggregated_weather_info and 'df_plants' in locals() and not df_plants.empty:
    # Apply the matching function to every row with .apply
    is_suitable_series = df_plants.apply(
        lambda row: is_plant_suitable(aggregated_weather_info, row),
        axis=1
    )

    # IDs of the plants that passed every rule
    suitable_plant_ids = df_plants[is_suitable_series]['general_plant_id'].tolist()

    # Shuffle the ID list in place so the presentation varies between runs
    random.shuffle(suitable_plant_ids)

    # --- Step 4: print the final results ---
    print("\n" + "="*50)
    print("      Aggregated weather for the next 16 days      ")
    print("="*50)
    for key, value in aggregated_weather_info.items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")

    print("\n" + "="*50)
    print("      Suitable plant IDs      ")
    print("="*50)
    if suitable_plant_ids:
        print(suitable_plant_ids)
    else:
        print("No particularly suitable plants were found for the upcoming weather.")
    print("="*50)

else:
    print("\nCannot make recommendations: the weather request failed or the plant data was not loaded.")
```

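```text
Fetching weather data for (-37.8136, 144.9631)...

==================================================
      Aggregated weather for the next 16 days
==================================================
extreme_min_temp: 6.40
extreme_max_temp: 28.30
avg_sunshine_duration: 8.89
avg_max_uv_index: 4.73
avg_daily_precipitation: 0.29
avg_relative_humidity: 67.13

==================================================
      Suitable plant IDs
==================================================
[1174, 1167, 694, 1169, 1182, 1184, 1178, 725, 472, 1256, 979, 1339, 847, 1186, 540, 1337, 1151, 781, 975, 1014, 1152, 606, 1235, 1236, 728, 443, 985, 1223, 972, 833, 908, 721, 779, 1181, 1166, 1237, 803, 439, 613, 612, 986, 692, 1165, 1162, 1098, 610, 1175, 802, 607, 1096, 726, 1025, 617, 1176, 492, 1183, 618, 1171, 568, 1170, 848, 1153, 611, 651, 1173, 1168, 1021, 1179, 849]
==================================================
```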
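`random.shuffle` reorders the list in place using the module-level generator, so every run prints a different order, which is the intended variety. When comparing runs or writing tests, a seeded `random.Random` instance can reproduce an order without touching the global random state. A minimal sketch, where `shuffled_ids` is a hypothetical helper and the seed value is arbitrary:

```python
import random


def shuffled_ids(plant_ids, seed=None):
    """Return a shuffled copy of plant_ids; a seed makes the order reproducible (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    ids = list(plant_ids)      # copy so the caller's list keeps its order
    rng.shuffle(ids)
    return ids


ids = [1174, 1167, 694, 1169, 1182]
a = shuffled_ids(ids, seed=42)
b = shuffled_ids(ids, seed=42)
print(a == b, sorted(a) == sorted(ids))  # True True
```

Passing `seed=None` (the default) keeps the run-to-run variety of the original `random.shuffle` call.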
