# Yield Prediction with Model Training and Comparison

## Introduction

In this notebook, we train two models to predict agricultural yield (linear regression and logistic regression), tune hyperparameters with a grid search, compare the results, and select the better model.

## Import Libraries

In [ ]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, accuracy_score, classification_report
import joblib

## Load Dataset

In [ ]:
df = pd.read_csv('../data/crop_yield.csv')
df.head()

## Data Preprocessing

In [ ]:
df.dropna(inplace=True)  # Drop missing values
X = df.drop('yield', axis=1)  # Features
y = df['yield']  # Target variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
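If the dataset contains categorical columns, scikit-learn estimators will not accept them as raw strings. The following is a minimal sketch of one way to handle that, using a small made-up frame rather than the real `crop_yield.csv`. The column names `crop` and `rainfall_mm` and all values are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for crop_yield.csv; column names and values are hypothetical.
toy = pd.DataFrame({
    'crop': ['wheat', 'rice', 'maize', 'wheat', 'rice',
             'maize', 'wheat', 'rice', 'maize', 'wheat'],
    'rainfall_mm': [500.0, 1200.0, 800.0, 450.0, None,
                    760.0, 520.0, 1150.0, 790.0, 480.0],
    'yield': [3.1, 4.5, 5.2, 2.9, 4.4, 5.0, 3.3, 4.6, 5.1, 3.0],
})

toy = toy.dropna()  # Drop rows with missing values, as the notebook does

# One-hot encode the string column so every feature is numeric
X_toy = pd.get_dummies(toy.drop('yield', axis=1), columns=['crop'])
y_toy = toy['yield']

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)
print(X_toy.columns.tolist())
print(X_tr.shape, X_te.shape)
```

Whether this step is needed depends on what the real dataset contains. If every feature is already numeric, the original cell is sufficient.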

## Model Training

In [ ]:
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)
mse_linear = mean_squared_error(y_test, y_pred_linear)
print(f'Linear Regression MSE: {mse_linear}')
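MSE is expressed in squared units of the target, which can be hard to interpret on its own. As a rough sketch, the same fit can also be summarized with RMSE (same units as the target) and R². The data below is synthetic and only illustrates the calls; it is not the crop yield dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption for illustration only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

mse = mean_squared_error(y_te, y_hat)
rmse = np.sqrt(mse)         # Same units as the target
r2 = r2_score(y_te, y_hat)  # Fraction of target variance explained
print(f'MSE: {mse:.4f}  RMSE: {rmse:.4f}  R^2: {r2:.4f}')
```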

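In [ ]:
# LogisticRegression is a classifier and rejects a continuous target, so we predict
# a binary label: yield above the training median (this threshold is an assumption).
y_train_class = (y_train > y_train.median()).astype(int)
y_test_class = (y_test > y_train.median()).astype(int)
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(X_train, y_train_class)
y_pred_logistic = logistic_model.predict(X_test)
accuracy_logistic = accuracy_score(y_test_class, y_pred_logistic)
print(f'Logistic Regression Accuracy: {accuracy_logistic}')

## Hyperparameter Optimization

In [ ]:
param_grid = {'C': [0.1, 1, 10], 'solver': ['liblinear', 'saga']}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid_search.fit(X_train, y_train_class)  # Same binary labels as the cell above
print(f'Best parameters for Logistic Regression: {grid_search.best_params_}')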
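The grid search above tunes a classifier. For a continuous yield target, the same `GridSearchCV` pattern can be applied to a regressor. The sketch below swaps in `Ridge`, which the notebook does not use, and scores candidates with `neg_mean_squared_error`. The data is synthetic and the `alpha` grid is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (an assumption for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4))
y_demo = X_demo @ np.array([1.5, 0.0, -2.0, 0.7]) + rng.normal(scale=0.2, size=150)

# Hypothetical grid over the regularization strength
ridge_grid = {'alpha': [0.01, 0.1, 1.0, 10.0]}
ridge_search = GridSearchCV(
    Ridge(), ridge_grid, cv=5, scoring='neg_mean_squared_error'
)
ridge_search.fit(X_demo, y_demo)

# scikit-learn maximizes scores, so MSE is reported negated; flip the sign back
best_cv_mse = -ridge_search.best_score_
print(f'Best alpha: {ridge_search.best_params_["alpha"]}')
print(f'Cross-validated MSE: {best_cv_mse:.4f}')
```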

## Model Comparison

In [ ]:
results = {'Linear Regression MSE': mse_linear, 'Logistic Regression Accuracy': accuracy_logistic}
results
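Selecting a best model is easier when every candidate is scored with the same metric. The sketch below compares two regressors by 5-fold cross-validated MSE, keeps the one with the lower error, and saves it with `joblib`. `RandomForestRegressor`, the synthetic data, and the file name `best_model.joblib` are all assumptions for illustration; the notebook does not use them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic, linearly generated data (an assumption for illustration only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 3))
y_demo = X_demo @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.1, size=120)

candidates = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=50, random_state=42),
}

# Score every candidate with the same metric: 5-fold cross-validated MSE
cv_mse = {
    name: -cross_val_score(
        est, X_demo, y_demo, cv=5, scoring='neg_mean_squared_error'
    ).mean()
    for name, est in candidates.items()
}

best_name = min(cv_mse, key=cv_mse.get)  # Lower MSE is better
best_model = candidates[best_name].fit(X_demo, y_demo)

# Persist the selected model, then reload it to confirm the round trip
model_path = os.path.join(tempfile.mkdtemp(), 'best_model.joblib')
joblib.dump(best_model, model_path)
reloaded = joblib.load(model_path)
print(best_name, cv_mse)
```

On real data the ranking may differ. The point of the sketch is only that both models are judged on one scale before one is saved.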

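## Conclusion

The results above report test-set MSE for linear regression and test-set accuracy for logistic regression. These metrics measure different things (lower MSE is better, higher accuracy is better), so deciding which model performs better for predicting agricultural yield requires evaluating both on a common task and metric.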