<a href="https://colab.research.google.com/github/Magaton1010/Python_Analysis/blob/main/Tomato_Yield_Prediction_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tomato Yield Prediction Project



## Table of Contents

- [Introduction](#introduction)
- [Data](#data)
- [Methodology](#methodology)
- [Results](#results)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)

## Introduction

This project focuses on predicting tomato yield using nutrition rate levels as predictors. The goal is to develop a predictive model that can assist in optimizing crop management strategies.

## Data

We used datasets from three different cultivation sites, each containing information on nutrition rate and total yield (TY). The datasets were preprocessed and split into training and testing sets (80% training, 20% testing) for model development and evaluation.

## Methodology

We fit a Linear Regression model (scikit-learn's `LinearRegression`) to predict total yield from the nutrition predictors. A separate model was trained and evaluated for each site.
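
As a rough illustration, assuming the two-predictor column layout used in the notebook code (`nutrition1` and `nutrition2`), each per-site model has the form below. The symbols $\beta_0$, $\beta_1$, and $\beta_2$ are notation introduced here for the intercept and coefficients, which are estimated separately for each site by least squares:

$$
\widehat{TY} = \beta_0 + \beta_1 \cdot \text{nutrition1} + \beta_2 \cdot \text{nutrition2}
$$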

## Results

The performance of each site's model was assessed on its held-out test set using Mean Squared Error (MSE) and R-squared (R²) metrics. The model's predictions were visualized through scatter plots, comparing actual and predicted total yields.
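
To make the two metrics concrete, here is a minimal sketch that computes them directly from their definitions with NumPy. The helper name `mse_and_r2` is hypothetical and the yield values are made up for illustration; they are not project data. The results should agree with scikit-learn's `mean_squared_error` and `r2_score`.

```python
import numpy as np


def mse_and_r2(y_true, y_pred):
    """Return (MSE, R-squared) computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    residuals = y_true - y_pred
    # MSE: average squared difference between actual and predicted values
    mse = np.mean(residuals ** 2)

    # R-squared: 1 minus (residual sum of squares / total sum of squares)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2


# Made-up yields, for illustration only (not project data)
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

mse, r2 = mse_and_r2(actual, predicted)
print(f"MSE: {mse:.2f}, R-squared: {r2:.2f}")  # MSE: 0.50, R-squared: 0.90
```

A lower MSE means smaller prediction errors in squared yield units, while an R² closer to 1 means the model explains more of the variation in total yield.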


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load datasets for each site
site1_data = pd.read_csv('site1_dataset.csv')
site2_data = pd.read_csv('site2_dataset.csv')
site3_data = pd.read_csv('site3_dataset.csv')

# Assuming the columns in each dataset are 'TY', 'nutrition1', and 'nutrition2'
# For each site, extract features (nutrition1 and nutrition2) and target variable (Total Yield, 'TY')
X_site1 = site1_data[['nutrition1', 'nutrition2']]
y_site1 = site1_data['TY']

X_site2 = site2_data[['nutrition1', 'nutrition2']]
y_site2 = site2_data['TY']

X_site3 = site3_data[['nutrition1', 'nutrition2']]
y_site3 = site3_data['TY']

# Split data for each site into training and testing sets
X_train_site1, X_test_site1, y_train_site1, y_test_site1 = train_test_split(X_site1, y_site1, test_size=0.2, random_state=42)
X_train_site2, X_test_site2, y_train_site2, y_test_site2 = train_test_split(X_site2, y_site2, test_size=0.2, random_state=42)
X_train_site3, X_test_site3, y_train_site3, y_test_site3 = train_test_split(X_site3, y_site3, test_size=0.2, random_state=42)

# Initialize and train Linear Regression models for each site
model_site1 = LinearRegression()
model_site1.fit(X_train_site1, y_train_site1)

model_site2 = LinearRegression()
model_site2.fit(X_train_site2, y_train_site2)

model_site3 = LinearRegression()
model_site3.fit(X_train_site3, y_train_site3)

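# Make predictions on each site's held-out test set and report MSE and R-squared
def evaluate_model(model, X_test, y_test, site_name):
    """Print the test-set MSE and R-squared for one site's fitted model."""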
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    print(f"Site: {site_name}")
    print(f"Mean Squared Error: {mse:.2f}")
    print(f"R-squared: {r2:.2f}")
    print()

evaluate_model(model_site1, X_test_site1, y_test_site1, "Site 1")
evaluate_model(model_site2, X_test_site2, y_test_site2, "Site 2")
evaluate_model(model_site3, X_test_site3, y_test_site3, "Site 3")
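
The cell above imports matplotlib but does not draw the actual-versus-predicted scatter plots described in the Results section. The following is a minimal sketch of one way to produce them. `plot_actual_vs_predicted` is a hypothetical helper, not part of the original notebook, and the values passed to it here are placeholders rather than project data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(y_true, y_pred, site_name, ax=None):
    """Scatter actual vs. predicted total yield with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))

    ax.scatter(y_true, y_pred, alpha=0.7)

    # Points on the dashed diagonal are perfect predictions
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")

    ax.set_xlabel("Actual total yield (TY)")
    ax.set_ylabel("Predicted total yield (TY)")
    ax.set_title(f"{site_name}: actual vs. predicted")
    return ax


# Placeholder values, for illustration only (not project data)
ax = plot_actual_vs_predicted(
    [10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 16.0], "Site 1"
)
```

In the notebook, the same helper could be called with the real test data, for example `plot_actual_vs_predicted(y_test_site1, model_site1.predict(X_test_site1), "Site 1")`, followed by `plt.show()`.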