## Data Description

The dataset used in this analysis contains information related to paddy cultivation. It consists of the following columns:
- `Temperature`: Average temperature (in Celsius).
- `Avg rain(mm)`: Average rainfall (in millimeters).
- `Fertilizer`: Amount of fertilizer used. The code below selects this column as `'Fertilizer '`, with a trailing space.
- `RAINFED(Hect)`: Rainfed area (in hectares).
- `RAINFED Yeild(Kg)`: Yield of rainfed paddy (in kilograms).


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

#### Load the data from an Excel sheet (the file is in the data folder; adjust the path as needed)

In [None]:
data = pd.read_excel('paddy.xlsx')

#### Define the independent variables (features) and the dependent variable (target)

In [None]:
X = data[['Temperature', 'Avg rain(mm)', 'Fertilizer ', 'RAINFED(Hect)']]
y = data['RAINFED Yeild(Kg)']
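The feature list above refers to the fertilizer column as `'Fertilizer '`, with a trailing space, so the selection only works if the header is spelled exactly that way. A minimal sketch for spotting such headers follows; the helper name `headers_with_stray_whitespace` is hypothetical, and the column list is copied from the selection above rather than read from the spreadsheet.

```python
def headers_with_stray_whitespace(columns):
    """Return the column names that start or end with whitespace."""
    return [name for name in columns if name != name.strip()]

# Column names as they are used in this notebook's feature and target selection.
columns = ['Temperature', 'Avg rain(mm)', 'Fertilizer ', 'RAINFED(Hect)', 'RAINFED Yeild(Kg)']

suspect = headers_with_stray_whitespace(columns)
print(suspect)
```

With the loaded DataFrame, `list(data.columns)` could be passed instead of the hard-coded list. If the headers were normalized with `data.columns = data.columns.str.strip()`, the feature list would then need `'Fertilizer'` without the space.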

#### Split the data into training and testing sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
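To illustrate what an 80/20 split with a fixed seed means, here is a rough sketch using only the standard library. It is not scikit-learn's implementation and will not reproduce the same rows as `train_test_split`; the function name `split_indices` and the row count of 10 are assumptions for the example.

```python
import math
import random

def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and cut off a test portion."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # same seed, same shuffle every run
    n_test = math.ceil(n_rows * test_size)    # number of rows held out for testing
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))
```

Fixing the seed is what makes the split repeatable, which is the role `random_state=42` plays in the call above.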


#### Create and train the linear regression model

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()
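`LinearRegression` fits ordinary least squares: it chooses the coefficients and intercept that minimize the sum of squared differences between observed and predicted values. As a simplified illustration, the sketch below solves the one-feature case in closed form. The model above uses four features, so this is an analogy rather than the same computation, and the data points are made up.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_ols(xs, ys)
print(slope, intercept)
```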

#### Use the trained model to make predictions

In [None]:
predictions = model.predict(X_test)

#### Print the model's coefficients and intercept

In [None]:
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Coefficients: [-1.17361588e+02  2.62442795e+00  5.39719178e-02 -2.09136339e-02]
Intercept: 5483.521027751586
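The coefficients print as a bare array, in the same order as the columns of `X`. Pairing them with the feature names makes the output easier to read. The sketch below hard-codes the printed values and the feature list from this notebook so that it runs on its own.

```python
# Feature names in the order used to build X, and the coefficients printed above.
feature_names = ['Temperature', 'Avg rain(mm)', 'Fertilizer ', 'RAINFED(Hect)']
coefficients = [-1.17361588e+02, 2.62442795e+00, 5.39719178e-02, -2.09136339e-02]

named = dict(zip(feature_names, coefficients))
for name, value in named.items():
    print(f"{name!r}: {value:+.4f}")
```

Each coefficient is the change in predicted yield for a one-unit increase in that feature with the others held fixed. For instance, the fitted model associates one additional degree Celsius with a predicted yield about 117 kg lower. This describes the fitted model, not necessarily a causal effect.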


#### Refit the model to the training data (repeats the earlier fit; the result is unchanged)

In [None]:
model.fit(X_train, y_train)

LinearRegression()

#### Make predictions on the test data

In [None]:
y_pred = model.predict(X_test)

#### Evaluate the model

In [None]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

In [None]:
print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 1318883.1489845559
R-squared: 0.2244992057784787
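The mean squared error is in squared kilograms, which is hard to read directly. Its square root (RMSE) is in the same units as the yield. The sketch below takes the root of the printed MSE and also spells out the R-squared formula, 1 - SS_res / SS_tot, on a small made-up example; the function name `r_squared` and the toy values are assumptions for illustration.

```python
import math

# MSE as printed above; its square root is in kilograms, like the target.
mse = 1318883.1489845559
rmse = math.sqrt(mse)
print("RMSE:", rmse)

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical values, only to show the formula at work.
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.0, 2.0, 3.0, 5.0]
toy_r2 = r_squared(toy_true, toy_pred)
print("Toy R-squared:", toy_r2)
```

The RMSE comes out at roughly 1148 kg. An R-squared of about 0.22 means the model accounts for roughly 22% of the variance in the test-set yield.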


#### Predict paddy yield for a new set of input features

In [None]:
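# Use a DataFrame with the training column names so scikit-learn does not warn about missing feature names
new_data = pd.DataFrame([[30.3, 42.4, 30569.661, 1596]], columns=X.columns)  # Replace with your own values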
predicted_yield = model.predict(new_data)

print("Predicted Paddy Yield:", predicted_yield[0])

Predicted Paddy Yield: 3655.265733565367
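The prediction above can be checked by hand: a linear model's output is the intercept plus the sum of each coefficient times its feature value. The sketch below uses the coefficients and intercept as printed earlier, so it agrees with the model's output only up to the rounding of those printed values.

```python
# Coefficients and intercept as printed earlier in this notebook.
coefficients = [-1.17361588e+02, 2.62442795e+00, 5.39719178e-02, -2.09136339e-02]
intercept = 5483.521027751586

# Input values in the same order as the features:
# Temperature, Avg rain(mm), Fertilizer, RAINFED(Hect)
features = [30.3, 42.4, 30569.661, 1596]

manual_prediction = intercept + sum(c * x for c, x in zip(coefficients, features))
print("Manual prediction:", manual_prediction)
```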


#### Create a scatter plot

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(data['Temperature'], data['RAINFED Yeild(Kg)'], alpha=0.5)
plt.title('Scatter Plot of Temperature vs. Rainfed Yield (kg)')
plt.xlabel('Temperature (°C)')
plt.ylabel('Rainfed Yield (kg)')
plt.grid(True)
plt.show()
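A scatter plot shows the relationship visually. A Pearson correlation coefficient puts a number on how linear it is. The sketch below computes it from scratch on made-up temperature and yield values; the function name `pearson_r` and the data are hypothetical and are not taken from the paddy dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values in which yield falls steadily as temperature rises.
temperatures = [28.0, 29.0, 30.0, 31.0]
yields = [4000.0, 3800.0, 3600.0, 3400.0]

r = pearson_r(temperatures, yields)
print("Pearson r:", r)
```

On the real data, pandas offers the same measure through `data['Temperature'].corr(data['RAINFED Yeild(Kg)'])`.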