# Vertical Farming

## Reinforcement Learning for Farming Operations

The goal of this project is to use Reinforcement Learning (RL) to automate and optimize farming operations.

### Dataset

The key columns in the dataset include:

- Cube ID: Likely an identifier for the sensor or location where the data was collected.
- Timestamp: The time at which the data was recorded (e.g., 2016-01-01 00:00:01).
- Temperature Layer A and Temperature Layer B: Temperature readings from two different layers (e.g., soil layers or greenhouse zones).
- Humidity Layer A and Humidity Layer B: Humidity readings from two different layers.
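- Door: A binary or numeric value indicating the state of a door (e.g., open or closed).
- Derived features: Hour, DayOfWeek and Month (from Timestamp), Temperature Diff and Humidity Diff (Layer A minus Layer B in the sample rows), and rolling averages of Temperature Layer A and Humidity Layer A.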
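The calendar and difference features can be recovered from the raw readings. The cleaning step that produced `cleaned_cubes.csv` is not shown, so this is a sketch that assumes each Diff column is Layer A minus Layer B. That assumption matches the sample rows under "Load and Inspect Data". The rolling averages are left out because their window size is not stated.

```python
import pandas as pd

# Two raw rows copied from the sample output of vertical_farm.head()
raw = pd.DataFrame({
    "Timestamp": ["2016-01-01 00:00:01", "2016-01-01 00:00:02"],
    "Temperature Layer A": [21.721156, 21.721156],
    "Temperature Layer B": [22.734969, 25.711899],
    "Humidity Layer A": [9.374497, 9.374497],
    "Humidity Layer B": [9.404221, 9.404221],
})

# Calendar features extracted from the timestamp
ts = pd.to_datetime(raw["Timestamp"])
raw["Hour"] = ts.dt.hour
raw["DayOfWeek"] = ts.dt.dayofweek  # Monday = 0, so Friday 2016-01-01 -> 4
raw["Month"] = ts.dt.month

# Assumed definition of the difference features: Layer A minus Layer B
raw["Temperature Diff"] = raw["Temperature Layer A"] - raw["Temperature Layer B"]
raw["Humidity Diff"] = raw["Humidity Layer A"] - raw["Humidity Layer B"]

print(raw[["Hour", "DayOfWeek", "Month", "Temperature Diff", "Humidity Diff"]].round(6))
```

The resulting values match the corresponding columns in the sample output (for example, a Temperature Diff of -3.990743 for Cube 95).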

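```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
# Classic gym API; stable-baselines3 >= 2.0 targets gymnasium and needs the
# shimmy compatibility package to wrap classic gym environments
import gym
from gym import spaces
import numpy as np
import pandas as pd
```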

## 1. Load and Inspect Data

```python
vertical_farm = pd.read_csv('cleaned_cubes.csv')
vertical_farm.head()
```

| Unnamed: 0 | Cube ID | Timestamp | Temperature Layer A | Temperature Layer B | Door | Humidity Layer A | Humidity Layer B | Hour | DayOfWeek | Month | Temperature Diff | Humidity Diff | Temperature Layer A Rolling Avg | Humidity Layer A Rolling Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49 | 2016-01-01 00:00:01 | 21.721156 | 22.734969 | 0.0 | 9.374497 | 9.404221 | 0 | 4 | 1 | -1.013812 | -0.029724 | 21.721156 | 9.374497 |
| 1 | 95 | 2016-01-01 00:00:02 | 21.721156 | 25.711899 | 0.0 | 9.374497 | 9.404221 | 0 | 4 | 1 | -3.990743 | -0.029724 | 21.721156 | 9.374497 |
| 2 | 48 | 2016-01-01 00:00:02 | 21.721156 | 22.734969 | 0.0 | 9.374497 | 9.404221 | 0 | 4 | 1 | -1.013812 | -0.029724 | 21.721156 | 9.374497 |
| 3 | 55 | 2016-01-01 00:00:02 | 21.721156 | 22.734969 | 0.0 | 9.374497 | 8.594411 | 0 | 4 | 1 | -1.013812 | 0.780086 | 21.721156 | 9.374497 |
| 4 | 90 | 2016-01-01 00:00:03 | 21.721156 | 22.734969 | 0.0 | 9.374497 | 9.404221 | 0 | 4 | 1 | -1.013812 | -0.029724 | 21.721156 | 9.374497 |


**Reinforcement Learning (RL) requires defining states, actions, and rewards**

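This project uses the Proximal Policy Optimization (PPO) algorithm from the stable_baselines3 library. In the environment below, the state is the vector of numeric columns in the current row, the action is one of three discrete choices (e.g., water, fertilize, do nothing), and the reward is +1 whenever Temperature Layer A is between 20 and 25 and Humidity Layer A is between 40 and 60. With `verbose=1`, the library prints performance metrics and hyperparameters after each training iteration.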
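For reference, PPO updates the policy by maximizing the clipped surrogate objective from the original PPO paper, which is where the clip_range and clip_fraction entries in the training log come from:

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]
$$

Here $\hat{A}_t$ is the advantage estimate and $\epsilon$ is clip_range (0.2 by default in stable_baselines3, as in the log below). clip_fraction is the share of samples with $|r_t(\theta) - 1| > \epsilon$, and stable_baselines3 minimizes the negative of $L^{\text{CLIP}}$, which it logs as policy_gradient_loss.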

```python
# Define a custom environment for farming operations
class FarmingEnv(gym.Env):
    def __init__(self, data):
        super().__init__()
        self.data = data
        self.current_step = 0

        # Exclude non-numeric columns (e.g., 'Timestamp') from the state
        self.state_columns = self.data.select_dtypes(include=[np.number]).columns
        self.state = self.data[self.state_columns].iloc[self.current_step]

        # Define action and observation spaces
        self.action_space = spaces.Discrete(3)  # Example: 3 actions (e.g., water, fertilize, do nothing)
        # Unbounded box: the numeric columns include IDs, row indices and
        # negative differences, which a 0-100 range would not contain
        self.observation_space = spaces.Box(
            low=-np.inf,
            high=np.inf,
            shape=(len(self.state_columns),),
            dtype=np.float32
        )

    def _get_obs(self):
        # Cast to float32 so observations match the observation space dtype
        return self.state.values.astype(np.float32)

    def reset(self):
        self.current_step = 0
        self.state = self.data[self.state_columns].iloc[self.current_step]
        return self._get_obs()

    def step(self, action):
        # NOTE: the action does not change the data; each step replays the next logged row
        reward = self.calculate_reward(action)
        self.current_step += 1
        done = self.current_step >= len(self.data) - 1
        self.state = self.data[self.state_columns].iloc[self.current_step]
        return self._get_obs(), reward, done, {}

    def calculate_reward(self, action):
        # Reward for maintaining optimal temperature and humidity (the action is not used)
        reward = 0
        if 20 <= self.state['Temperature Layer A'] <= 25 and 40 <= self.state['Humidity Layer A'] <= 60:
            reward += 1
        return reward

# Create the vectorized environment; the factory builds a fresh FarmingEnv
env = DummyVecEnv([lambda: FarmingEnv(vertical_farm)])

# Train a PPO model
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

# Save the model
model.save("farming_rl_model")
```



```text
Using cpu device
-----------------------------
| time/              |      |
|    fps             | 79   |
|    iterations      | 1    |
|    time_elapsed    | 25   |
|    total_timesteps | 2048 |
-----------------------------
----------------------------------------
| time/                   |            |
|    fps                  | 79         |
|    iterations           | 2          |
|    time_elapsed         | 51         |
|    total_timesteps      | 4096       |
| train/                  |            |
|    approx_kl            | 0.02019383 |
|    clip_fraction        | 0.15       |
|    clip_range           | 0.2        |
|    entropy_loss         | -1.09      |
|    explained_variance   | -42.9      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0489    |
|    n_updates            | 10         |
|    policy_gradient_loss | -0.00564   |
|    value_loss           | 0.00249    |
----------------------------------------
```

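## Observations and Insights

The log above is an excerpt covering the first two of the five training iterations (4,096 of the 10,240 timesteps), so several values quoted below do not appear in it.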

**Stable Training**:
- The approx_kl values are small (e.g., 0.0098), which indicates that the policy is not changing drastically between updates. This is a good sign of stable training.
- The clip_fraction values are moderate (e.g., 0.0983): only about 10% of the samples in an update had their probability ratio clipped, so the PPO clipping mechanism constrains updates without dominating them.
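To make these two metrics concrete, here is a small NumPy sketch of how they can be computed for one minibatch, following the same formulas stable_baselines3's PPO uses: the ratio of new to old action probabilities, the clipped surrogate loss, the `(r - 1) - log r` KL estimator, and the share of ratios outside `[1 - clip_range, 1 + clip_range]`. The function name `ppo_diagnostics` and the toy minibatch are hypothetical.

```python
import numpy as np

def ppo_diagnostics(new_log_prob, old_log_prob, advantages, clip_range=0.2):
    """Clipped policy loss, approx_kl and clip_fraction for one minibatch."""
    log_ratio = new_log_prob - old_log_prob
    ratio = np.exp(log_ratio)  # r_t = pi_new(a|s) / pi_old(a|s)

    # Clipped surrogate objective; the loss is its negative because it is minimized
    unclipped = advantages * ratio
    clipped = advantages * np.clip(ratio, 1 - clip_range, 1 + clip_range)
    policy_loss = -np.mean(np.minimum(unclipped, clipped))

    # KL estimator mean((r - 1) - log r): non-negative, 0 when the policy is unchanged
    approx_kl = np.mean((ratio - 1) - log_ratio)

    # Share of samples whose ratio fell outside [1 - clip_range, 1 + clip_range]
    clip_fraction = np.mean(np.abs(ratio - 1) > clip_range)
    return policy_loss, approx_kl, clip_fraction

# Toy minibatch: old policy uniform over 3 actions, new/old ratios 1.0, 1.1, 1.3, 0.7
old = np.log(np.full(4, 1 / 3))
new = np.log(np.array([1.0, 1.1, 1.3, 0.7]) / 3)
adv = np.array([1.0, -1.0, 1.0, 1.0])

loss, kl, frac = ppo_diagnostics(new, old, adv)
print(round(loss, 4), round(kl, 4), frac)
```

Two of the four ratios leave the `[0.8, 1.2]` band, so clip_fraction is 0.5, and the policy loss comes out negative simply because the mean clipped advantage term is positive.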

**Potential Issues**:

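The explained_variance is negative (e.g., -4.17), which means the value function predicts returns worse than simply predicting their mean. This could be due to:
- A poorly designed reward function. The reward requires Humidity Layer A to lie between 40 and 60, while the sample rows show values around 9.4, so the agent may rarely or never receive a reward. Returns with almost no variance make explained_variance unstable.
- Insufficient training data or episodes.
- A mismatch between the environment dynamics and the model's assumptions: step() replays the next logged row regardless of the action, so the agent's choices do not affect the state or the reward.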
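A small NumPy sketch of the explained-variance formula (it mirrors the `explained_variance(y_pred, y_true)` helper in stable_baselines3) shows why sparse rewards can push it far below zero: when the returns barely vary, even small value-prediction errors dominate the ratio. The toy returns and value predictions are hypothetical.

```python
import numpy as np

def explained_variance(y_pred, y_true):
    """1 - Var[y_true - y_pred] / Var[y_true]; NaN when the returns are constant."""
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y

# Sparse returns: only one of four steps carries any reward signal
returns = np.array([0.0, 0.0, 0.0, 0.1])
# A value function that is off by only +/-0.1 at every step
values = np.array([0.1, -0.1, 0.1, -0.1])

print(explained_variance(values, returns))               # about -8: worse than predicting the mean
print(explained_variance(np.full(4, 0.1), np.zeros(4)))  # nan: zero-variance returns
```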

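The loss values are negative. In PPO this is not a problem by itself: the logged loss combines the clipped policy-gradient term, which is frequently negative, with the weighted value loss and entropy terms, so its sign carries little information. The value_loss and explained_variance are more useful indicators.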

**Exploration**: 
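- The entropy_loss values (e.g., -1.07) are close to -ln 3 ≈ -1.099, the value for a uniform distribution over the three actions, so the policy is still choosing actions almost at random. entropy_loss is the negative of the policy entropy: as it rises toward 0, the policy becomes more deterministic and explores less.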
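The reference point for three actions follows from the entropy of a categorical distribution, H(p) = -Σ p_i ln p_i, which is largest (ln 3 ≈ 1.0986) when all actions are equally likely. stable_baselines3 logs entropy_loss as the negative mean entropy, so a minimal check looks like this (the two example distributions are illustrative):

```python
import numpy as np

def categorical_entropy(probs):
    """Entropy in nats of a discrete action distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # treat 0 * log 0 as 0
    return -np.sum(p * np.log(p))

uniform = categorical_entropy([1 / 3, 1 / 3, 1 / 3])  # maximum for 3 actions
peaked = categorical_entropy([0.9, 0.05, 0.05])       # mostly picks one action

# entropy_loss as logged is -entropy
print(round(-uniform, 4), round(-peaked, 4))  # -1.0986 -0.3944
```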

## Recommendations

Check the Reward Function:
- Ensure the reward function is well-designed and provides meaningful feedback to the agent.
- Avoid sparse rewards or rewards that are too small in magnitude.
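One quick way to check the reward function against the data is to evaluate it on real rows. This sketch applies the bands from calculate_reward to the five sample rows and compares them with a hypothetical dense variant (`shaped_reward`, not part of the original project) that penalizes the distance to the target bands instead of paying only inside them.

```python
# Temperature / Humidity Layer A values from the five sample rows
temps = [21.721156] * 5
hums = [9.374497] * 5

def sparse_reward(temp, hum):
    # Same bands as FarmingEnv.calculate_reward: +1 only inside both ranges
    return 1.0 if 20 <= temp <= 25 and 40 <= hum <= 60 else 0.0

def shaped_reward(temp, hum, temp_band=(20.0, 25.0), hum_band=(40.0, 60.0)):
    # HYPOTHETICAL dense alternative: 0 inside both bands, otherwise the negative
    # distance to the nearest band edge (raw units; scaling is left open)
    def gap(x, lo, hi):
        return max(lo - x, 0.0, x - hi)
    return -(gap(temp, *temp_band) + gap(hum, *hum_band))

sparse = [sparse_reward(t, h) for t, h in zip(temps, hums)]
shaped = [shaped_reward(t, h) for t, h in zip(temps, hums)]
print(sum(sparse) / len(sparse))  # share of sample rows that earn the +1
print(shaped[0])                  # about -30.6: humidity sits ~30.6 below the band
```

Neither version depends on the action: because the environment replays logged rows, the reward sequence is fixed in advance. Denser feedback only helps once the action can influence the next state or the reward. The humidity band may also need to match the units of Humidity Layer A, which the dataset description does not state.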

Increase Training Time:
- The model has only completed 10,240 timesteps. Consider training for more timesteps (e.g., 100,000 or more) to allow the model to learn better.
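The 10,240 figure comes from how PPO collects data: each iteration gathers a full rollout of `n_steps` transitions per environment (2,048 by default in stable_baselines3, matching the total_timesteps increments in the log), and learn() keeps collecting whole rollouts until it reaches `total_timesteps`. A quick sketch of that arithmetic (the helper name `timesteps_run` is hypothetical):

```python
import math

def timesteps_run(total_timesteps, n_steps=2048, n_envs=1):
    """Whole PPO rollouts needed to reach total_timesteps, and the steps actually collected."""
    per_iteration = n_steps * n_envs
    iterations = math.ceil(total_timesteps / per_iteration)
    return iterations, iterations * per_iteration

print(timesteps_run(10_000))   # (5, 10240) for the run above
print(timesteps_run(100_000))  # (49, 100352)
```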

Tune Hyperparameters:
- Experiment with different values for clip_range, learning_rate, and ent_coef (the entropy coefficient) to improve performance.
- For example, try reducing the learning_rate to 0.0001 or increasing the clip_range to 0.3.
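In stable_baselines3 these settings are constructor arguments of `PPO`. A hypothetical starting point for one run of a sweep, with the library defaults noted for comparison (the 0.01 entropy coefficient is an illustrative value, not a tuned one):

```python
# Hypothetical settings for one run of a hyperparameter sweep
ppo_kwargs = {
    "learning_rate": 1e-4,  # default 3e-4 (0.0003 in the log)
    "clip_range": 0.3,      # default 0.2
    "ent_coef": 0.01,       # default 0.0; adds an entropy bonus to the loss
    "n_steps": 2048,        # default rollout length per environment
}

# With the environment and imports from the training cell:
# model = PPO("MlpPolicy", env, verbose=1, **ppo_kwargs)
# model.learn(total_timesteps=100_000)
print(ppo_kwargs)
```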

Evaluate the Environment:
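- Ensure the environment provides meaningful observations and rewards, and that the chosen action actually affects the next state or the reward.
- Debug the environment by stepping through a few episodes manually (stable_baselines3 also provides check_env in stable_baselines3.common.env_checker) to confirm it behaves as expected.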

Monitor Progress:
- Continue monitoring the training logs to ensure explained_variance rises toward 1 and value_loss decreases; the sign of the total loss is not a useful target.

## Next Steps

1. Run the training for more timesteps and observe if the metrics improve.

2. If explained_variance remains negative, revisit the reward function and environment design.

## Conclusion

This project has the potential to significantly improve farming efficiency and sustainability by automating decision-making and optimizing resource usage. By reducing waste and maximizing crop yield, such systems can contribute to more sustainable agricultural practices, addressing global challenges like food security and resource scarcity.

Overall, this project serves as a strong foundation for applying AI and machine learning to agriculture, showcasing the transformative potential of these technologies in real-world applications.