## Data Description

The dataset used in this analysis contains information related to paddy cultivation. It consists of the following columns:
- `Temperature`: Average temperature (in Celsius).
- `Avg rain(mm)`: Average rainfall (in millimeters).
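- `Fertilizer`: Amount of fertilizer used. The spreadsheet header carries a trailing space, so the code selects this column as `'Fertilizer '`.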
- `RAINFED(Hect)`: Rainfed area (in hectares).
- `RAINFED Yeild(Kg)`: Yield of rainfed paddy (in kilograms).
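Spreadsheet headers often carry stray whitespace, and this notebook's code selects the fertilizer column as `'Fertilizer '` with a trailing space. The sketch below shows one way to spot such headers before selecting columns. The helper name `find_untrimmed_columns` is hypothetical, and the header list is copied from the column names used in this notebook rather than read from the file.

```python
def find_untrimmed_columns(columns):
    """Map each column name with leading/trailing whitespace to its trimmed form."""
    return {name: name.strip() for name in columns if name != name.strip()}


# Headers as they appear in this notebook's code; in practice pass list(data.columns).
headers = ['Temperature', 'Avg rain(mm)', 'Fertilizer ', 'RAINFED(Hect)', 'RAINFED Yeild(Kg)']
rename_map = find_untrimmed_columns(headers)
print(rename_map)  # {'Fertilizer ': 'Fertilizer'}
```

Passing the result to `data.rename(columns=rename_map)` would normalize the headers. If you do that, the feature selection must then use `'Fertilizer'` without the trailing space.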


In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

#### Load data from an Excel sheet (the file is in the data folder; adjust the path as needed)

In [2]:
data = pd.read_excel('paddy.xlsx')

#### Define the independent variables (features) and the dependent variable (target)

In [3]:
X = data[['Temperature', 'Avg rain(mm)', 'Fertilizer ', 'RAINFED(Hect)']]
y = data['RAINFED Yeild(Kg)']

#### Split the data into training and testing sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
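To make the hold-out split concrete, here is a minimal NumPy sketch of a shuffled 80/20 split. It illustrates the idea only: the function name `shuffle_split_indices` is hypothetical, and the permutation it draws is not the one `train_test_split` produces for `random_state=42`, so the selected rows will differ from the notebook's.

```python
import math

import numpy as np


def shuffle_split_indices(n_samples, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) from one seeded shuffle of row positions."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    # Round the test share up so that small datasets still get a test row.
    n_test = math.ceil(test_size * n_samples)
    return order[n_test:], order[:n_test]


train_idx, test_idx = shuffle_split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Fixing the seed makes the split reproducible, which is the role `random_state=42` plays in the cell above.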


#### Create and train the linear regression model

In [5]:
model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()

In [6]:
# Use the trained model to make predictions
predictions = model.predict(X_test)

In [7]:
# Print the model's coefficients and intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Coefficients: [-1.17361588e+02  2.62442795e+00  5.39719178e-02 -2.09136339e-02]
Intercept: 5483.521027751586


In [12]:
# Make predictions on the test data
y_pred = model.predict(X_test)

In [18]:
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)


In [23]:
print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 1318883.1489845559
R-squared: 0.2244992057784787
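The MSE is expressed in squared kilograms, which is hard to interpret directly. The sketch below spells out the standard definitions of MSE and R-squared in plain Python and converts the reported MSE to a root mean squared error (RMSE) in kilograms. The function names and toy values are illustrative and do not come from the paddy dataset.

```python
import math


def mse_manual(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values (not from the paddy dataset) to check the definitions.
toy_true = [1.0, 2.0, 3.0]
toy_pred = [1.0, 2.0, 4.0]
print(mse_manual(toy_true, toy_pred), r2_manual(toy_true, toy_pred))

# RMSE derived from the MSE printed by the notebook.
reported_mse = 1318883.1489845559
rmse = math.sqrt(reported_mse)
print(round(rmse, 2))
```

On the notebook's test set, this puts the typical prediction error at roughly 1,148 kg. The R-squared of about 0.22 means the model explains about 22% of the variance in held-out yield.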


In [26]:
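# Predict paddy yield for a new set of input features (replace with your own values).
# A DataFrame with the training column names avoids scikit-learn's feature-name warning.
new_data = pd.DataFrame([[30.3, 42.4, 30569.661, 1596]], columns=X.columns)
predicted_yield = model.predict(new_data)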

print("Predicted Paddy Yield:", predicted_yield[0])

Predicted Paddy Yield: 3655.265733565367
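A linear regression prediction is the intercept plus the sum of each coefficient times its feature value. As a sanity check, the sketch below recomputes the prediction by hand from the coefficients and intercept printed earlier. The coefficients are copied at their printed precision, so the result matches the model output only approximately.

```python
# Coefficients and intercept as printed by the notebook (rounded to printed precision).
# Feature order: Temperature, Avg rain(mm), Fertilizer, RAINFED(Hect).
coefficients = [-1.17361588e+02, 2.62442795e+00, 5.39719178e-02, -2.09136339e-02]
intercept = 5483.521027751586

# The same input values used for the prediction above.
features = [30.3, 42.4, 30569.661, 1596]

manual_prediction = intercept + sum(c * x for c, x in zip(coefficients, features))
print(round(manual_prediction, 2))  # 3655.27
```

The sign of each coefficient shows the direction of its contribution: here the temperature term lowers the predicted yield, while the rainfall and fertilizer terms raise it.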


In [29]:
import pickle

In [32]:
with open('model_pickle','wb') as f:
    pickle.dump(model,f)

In [33]:
with open('model_pickle','rb') as f:
    mp = pickle.load(f)

In [34]:
mp.predict(new_data)

array([3655.26573357])

In [5]:
# Create a scatter plot of temperature against rainfed yield.
# Requires `data` from the pd.read_excel cell; run that cell first in a fresh kernel.
plt.figure(figsize=(8, 6))
plt.scatter(data['Temperature'], data['RAINFED Yeild(Kg)'], alpha=0.5)
plt.title('Temperature vs. Rainfed Yield')
plt.xlabel('Temperature (°C)')
plt.ylabel('Rainfed Yield (kg)')
plt.grid(True)
plt.show()

NameError: name 'data' is not defined

<Figure size 576x432 with 0 Axes>