# Dataset

## Dataset Overview
The dataset contains time-series sensor readings (such as light intensity, temperature, and humidity) along with a categorical label called `Plant_Health_Status`. Our goal is to understand the relationships among these features, visualize patterns, and explore whether the data forms clusters that reflect different plant health conditions.

## Dataset Quick Info
Here we load the dataset and view the column types, the first few rows, and the statistical summary (mean, std, min, max, etc.) for each numeric column. This helps confirm which features are numeric and gives us a sense of their scales and ranges before preprocessing.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("./data/plant_health_data.csv")
data.info()  # prints its summary directly and returns None, so display() is not needed
display(data.head())
display(data.describe())
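
Because `Plant_Health_Status` is categorical, it can help to list the numeric and non-numeric columns explicitly and to count how many rows fall under each label. The following is a minimal sketch on a toy frame, not the real dataset: the sensor column names, the values, and the label strings are all hypothetical.

```python
import pandas as pd

# Toy stand-in for the real dataset; column names (other than
# Plant_Health_Status), values, and label strings are hypothetical.
toy = pd.DataFrame({
    "Light_Intensity": [310.5, 450.2, 120.9, 280.0],
    "Temperature": [22.1, 25.4, 19.8, 23.0],
    "Plant_Health_Status": ["Healthy", "Healthy", "High Stress", "Moderate Stress"],
})

# Split columns by dtype so later steps only scale numeric features
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Count rows per label to spot class imbalance early
label_counts = toy["Plant_Health_Status"].value_counts()

print("Numeric:", numeric_cols)
print("Non-numeric:", non_numeric_cols)
print(label_counts)
```

On the real data, the same calls can be applied to `data` in place of `toy`.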

## Data Quality Check
We check for missing values by counting the `null` entries in each column.

In [None]:
print(data.isnull().sum())

From the result above, it does not seem like there are any missing values in the dataset. As a safeguard, we still drop any rows that contain missing values, which leaves the data unchanged when there are none.

In [None]:
# Drop missing values if any
data_clean = data.dropna()
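
To show what the check and the drop do when values are actually missing, here is a small sketch on a toy frame. The column names and values are hypothetical and chosen only so that some entries are missing.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing entries (hypothetical values)
toy = pd.DataFrame({
    "Temperature": [22.1, np.nan, 19.8, 23.0],
    "Humidity": [55.0, 60.2, np.nan, 48.7],
})

# Per-column count of missing entries
missing_per_column = toy.isnull().sum()

# dropna() removes every row that has at least one missing value
toy_clean = toy.dropna()
rows_dropped = len(toy) - len(toy_clean)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

Comparing the row counts before and after `dropna()` is a quick way to confirm how much data the cleaning step removed.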

## Data Visualization & Dimensionality Reduction (t-SNE)
We standardize the numeric features and apply t-SNE, a nonlinear dimensionality reduction technique, to project them into two dimensions. Coloring the projected points by health status helps us see the structure of the data and the relationships among data points.

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import seaborn as sns

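# Use the cleaned data so that no NaN values reach t-SNE
numeric_cols = data_clean.select_dtypes(include="number").columns
X = data_clean[numeric_cols]

# Standardize to zero mean and unit variance so no feature dominates the distances
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# t-SNE
tsne = TSNE(n_components=2, perplexity=30, max_iter=1000, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)
# Keep the embedding in a separate frame so that re-running this cell
# does not feed TSNE1/TSNE2 back in as numeric features
plot_data = data_clean.assign(TSNE1=X_tsne[:, 0], TSNE2=X_tsne[:, 1])

plt.figure(figsize=(8, 6))
sns.scatterplot(
    data=plot_data,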
    x="TSNE1",
    y="TSNE2",
    hue="Plant_Health_Status",
    palette="deep",
    s=80,
    alpha=0.9
)
plt.title("t-SNE Projection of Plant Health Data")
plt.legend(title="Health Status")
plt.grid(True)
plt.show()
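
The shape of a t-SNE plot depends on the `perplexity` setting, so it is often worth computing the projection for a few values before reading much into the clusters. The sketch below does this on synthetic data, not the plant dataset: the three Gaussian blobs, the sample sizes, and the perplexity values tried are all assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in: three Gaussian blobs in 5 dimensions, 30 points each
centers = rng.normal(scale=5.0, size=(3, 5))
X_demo = np.vstack([c + rng.normal(size=(30, 5)) for c in centers])
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Fit one embedding per perplexity; perplexity must stay below the sample count
embeddings = {}
for perplexity in (5, 15, 30):
    tsne_demo = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embeddings[perplexity] = tsne_demo.fit_transform(X_demo_scaled)
    print(perplexity, embeddings[perplexity].shape)
```

Each entry in `embeddings` can then be plotted in the same way as the projection above. Groupings that persist across several perplexity values are more likely to reflect real structure than those that appear at only one setting.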

## Correlation / Feature Extraction

## Dimensionality Reduction (PCA)