# Crop Yield Prediction #

This project focuses on predicting crop yield from a combination of soil properties, weather conditions, and farming practices. The dataset contains soil measurements (nitrogen, phosphorus, and potassium levels, pH, moisture, organic carbon, and soil type), environmental factors (temperature, rainfall, humidity, sunlight hours, wind speed, and altitude), categorical variables describing crop type, season, region, and irrigation method, and the quantities of fertilizer and pesticide applied.
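The column groups described above can be written down once and reused in later preprocessing steps. The sketch below takes the column names from the `data.head(10)` output further down; the group names and the assignment of each column to a group (for example, treating `Crop_Type` as a farming practice) are my own illustrative choices, not something defined by the dataset.

```python
import pandas as pd

# Illustrative grouping of the dataset's columns; the grouping itself is an assumption.
FEATURE_GROUPS = {
    "soil": ["N", "P", "K", "Soil_pH", "Soil_Moisture", "Soil_Type", "Organic_Carbon"],
    "weather": ["Temperature", "Humidity", "Rainfall", "Sunlight_Hours", "Wind_Speed"],
    "location_and_timing": ["Region", "Altitude", "Season"],
    "farming_practice": ["Crop_Type", "Irrigation_Type", "Fertilizer_Used", "Pesticide_Used"],
}
TARGET = "Crop_Yield_ton_per_hectare"


def check_feature_groups(df: pd.DataFrame) -> list:
    """Return the grouped columns (and target) missing from df; an empty list means all are present."""
    grouped = [col for cols in FEATURE_GROUPS.values() for col in cols]
    return [col for col in grouped + [TARGET] if col not in df.columns]
```

Calling `check_feature_groups(data)` right after loading the CSV would flag a renamed or dropped column before any modelling code relies on it.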

The goal of this project is to build a machine learning regression model that learns the relationships between these factors and crop yield, measured in tons per hectare. Through exploratory data analysis, feature engineering, and model evaluation, I aim to identify the factors that most influence crop productivity and to develop a model that generalizes well across different crops and regions. This work is intended to simulate a real-world agricultural yield prediction system that could support data-driven decision-making in smart farming and agricultural planning.
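As a reference point for the evaluation step, here is a minimal baseline sketch: ordinary least squares on one-hot encoded features, scored on a random hold-out split with mean absolute error (MAE) and R². It uses only NumPy and pandas and runs on a small synthetic frame; the `demo` data, its yield formula, and the helper name `fit_linear_baseline` are invented for illustration. The model actually used later in the project may differ, for example a tree-based regressor.

```python
import numpy as np
import pandas as pd


def fit_linear_baseline(df: pd.DataFrame, target: str, test_frac: float = 0.2, seed: int = 0):
    """Fit an OLS baseline on one-hot encoded features; return (MAE, R^2) on a hold-out split."""
    # One-hot encode string columns; numeric columns pass through unchanged.
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=float)
    y = df[target].to_numpy(dtype=float)

    # Random train/test split over row positions.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = max(1, int(len(df) * test_frac))
    test, train = idx[:n_test], idx[n_test:]

    # Prepend an intercept column and solve the least-squares problem on the training rows.
    A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    pred = A[test] @ coef

    mae = float(np.mean(np.abs(y[test] - pred)))
    ss_res = float(np.sum((y[test] - pred) ** 2))
    ss_tot = float(np.sum((y[test] - y[test].mean()) ** 2))
    return mae, 1.0 - ss_res / ss_tot


# Synthetic data: invented for illustration, not drawn from crop-yield.csv.
rng = np.random.default_rng(42)
n = 500
demo = pd.DataFrame({
    "Rainfall": rng.uniform(400, 2500, n),
    "N": rng.integers(20, 160, n),
    "Crop_Type": rng.choice(["Rice", "Wheat", "Sugarcane"], n),
})
# Hypothetical target: crop-specific offset + a rainfall effect + Gaussian noise.
offset = demo["Crop_Type"].map({"Rice": 8.0, "Wheat": 6.0, "Sugarcane": 70.0})
demo["Crop_Yield_ton_per_hectare"] = offset + 0.002 * demo["Rainfall"] + rng.normal(0, 1, n)

mae, r2 = fit_linear_baseline(demo, "Crop_Yield_ton_per_hectare")
print(f"MAE={mae:.2f} t/ha, R2={r2:.3f}")
```

Any more complex model can then be judged by how much it improves on this baseline's MAE and R² on the same split.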

### - Import Libraries ###

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# Load the dataset (path is relative to the notebook's folder)
data = pd.read_csv("../data/crop-yield.csv")
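A possible variation on the loading step is to declare the string columns as pandas `category` dtype when reading the file, which makes them explicit for later encoding and usually reduces memory use. The sketch below reads two rows copied from the `data.head(10)` output through `io.StringIO` in place of the real file; the helper name `load_crop_data` and the choice of which columns to treat as categorical are assumptions.

```python
import io

import pandas as pd

# Two rows copied from the data.head(10) output, standing in for ../data/crop-yield.csv.
SAMPLE_CSV = """N,P,K,Soil_pH,Soil_Moisture,Soil_Type,Organic_Carbon,Temperature,Humidity,Rainfall,Sunlight_Hours,Wind_Speed,Region,Altitude,Season,Crop_Type,Irrigation_Type,Fertilizer_Used,Pesticide_Used,Crop_Yield_ton_per_hectare
132,62,22,6.35,59.78,Clay,0.43,22.97,53.89,1305.68,7.73,15.96,Central,36,Rabi,Maize,Canal,223.48,23.36,11.42
122,71,66,5.98,25.54,Sandy,0.65,17.0,76.9,1942.05,9.25,12.6,North,1561,Rabi,Potato,Canal,161.54,4.42,23.19
"""

# Assumed set of categorical columns (the string-valued columns in the preview).
CATEGORICAL_COLS = ["Soil_Type", "Region", "Season", "Crop_Type", "Irrigation_Type"]


def load_crop_data(source) -> pd.DataFrame:
    """Read the CSV, storing the listed string columns as 'category' dtype."""
    return pd.read_csv(source, dtype={col: "category" for col in CATEGORICAL_COLS})


df = load_crop_data(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

With the real file, the call would be `load_crop_data("../data/crop-yield.csv")`.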

In [5]:
data.head(10)

| | N | P | K | Soil_pH | Soil_Moisture | Soil_Type | Organic_Carbon | Temperature | Humidity | Rainfall | Sunlight_Hours | Wind_Speed | Region | Altitude | Season | Crop_Type | Irrigation_Type | Fertilizer_Used | Pesticide_Used | Crop_Yield_ton_per_hectare |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 132 | 62 | 22 | 6.35 | 59.78 | Clay | 0.43 | 22.97 | 53.89 | 1305.68 | 7.73 | 15.96 | Central | 36 | Rabi | Maize | Canal | 223.48 | 23.36 | 11.42 |
| 1 | 122 | 71 | 66 | 5.98 | 25.54 | Sandy | 0.65 | 17.0 | 76.9 | 1942.05 | 9.25 | 12.6 | North | 1561 | Rabi | Potato | Canal | 161.54 | 4.42 | 23.19 |
| 2 | 44 | 35 | 104 | 8.07 | 25.87 | Sandy | 0.79 | 25.52 | 44.78 | 2216.2 | 8.5 | 15.63 | North | 1870 | Rabi | Rice | Rainfed | 184.62 | 6.29 | 7.94 |
| 3 | 136 | 96 | 113 | 4.83 | 42.97 | Silt | 0.45 | 18.59 | 31.89 | 607.18 | 8.75 | 5.49 | East | 765 | Kharif | Sugarcane | Rainfed | 274.02 | 2.72 | 72.53 |
| 4 | 101 | 34 | 42 | 5.84 | 48.01 | Silt | 0.69 | 22.74 | 46.27 | 483.47 | 8.0 | 7.44 | Central | 1143 | Zaid | Wheat | Rainfed | 72.69 | 15.37 | 6.72 |
| 5 | 50 | 29 | 22 | 6.87 | 32.73 | Silt | 1.2 | 13.88 | 68.91 | 1993.65 | 10.17 | 11.25 | East | 1739 | Kharif | Rice | Canal | 335.8 | 3.8 | 8.67 |
| 6 | 132 | 83 | 148 | 7.46 | 40.98 | Silt | 0.92 | 14.92 | 87.21 | 2433.33 | 10.28 | 13.82 | East | 1360 | Rabi | Potato | Canal | 301.54 | 2.84 | 26.96 |
| 7 | 151 | 91 | 86 | 7.58 | 26.39 | Sandy | 0.85 | 28.42 | 53.74 | 1499.4 | 8.24 | 17.7 | North | 1348 | Kharif | Rice | Rainfed | 317.16 | 19.71 | 9.51 |
| 8 | 104 | 65 | 90 | 4.96 | 21.8 | Silt | 0.86 | 26.96 | 77.85 | 1881.33 | 9.12 | 2.16 | East | 54 | Rabi | Cotton | Sprinkler | 253.49 | 17.82 | 7.01 |
| 9 | 117 | 90 | 86 | 7.21 | 26.91 | Clay | 1.29 | 15.14 | 42.03 | 1045.25 | 7.24 | 10.21 | North | 57 | Rabi | Sugarcane | Sprinkler | 231.33 | 21.83 | 75.1 |
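Even in these ten rows, the two sugarcane rows (72.53 and 75.1 t/ha) sit far above the other crops, which suggests that crop type may dominate the target. Ten rows are far too few to draw conclusions, but the same check scales directly to the full dataset. The sketch below reproduces the crop type and yield columns from the preview above; the variable names are illustrative.

```python
import pandas as pd

# Crop type and yield for the ten rows shown by data.head(10).
head_rows = pd.DataFrame({
    "Crop_Type": ["Maize", "Potato", "Rice", "Sugarcane", "Wheat",
                  "Rice", "Potato", "Rice", "Cotton", "Sugarcane"],
    "Crop_Yield_ton_per_hectare": [11.42, 23.19, 7.94, 72.53, 6.72,
                                   8.67, 26.96, 9.51, 7.01, 75.1],
})

# Mean yield and row count per crop, highest mean first.
# On the full dataset, replace head_rows with data.
yield_by_crop = (
    head_rows.groupby("Crop_Type")["Crop_Yield_ton_per_hectare"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(yield_by_crop.round(2))
```

If the full-data breakdown shows the same spread, a model may need to capture crop type explicitly, for example through one-hot encoding or separate per-crop models.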


In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 20 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   N                           10000 non-null  int64  
 1   P                           10000 non-null  int64  
 2   K                           10000 non-null  int64  
 3   Soil_pH                     10000 non-null  float64
 4   Soil_Moisture               10000 non-null  float64
 5   Soil_Type                   10000 non-null  object 
 6   Organic_Carbon              10000 non-null  float64
 7   Temperature                 10000 non-null  float64
 8   Humidity                    10000 non-null  float64
 9   Rainfall                    10000 non-null  float64
 10  Sunlight_Hours              10000 non-null  float64
 11  Wind_Speed                  10000 non-null  float64
 12  Region                      10000 non-null  object 
 13  Altitude                    1000
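The `data.info()` listing mixes numeric columns (`int64`, `float64`) with `object` columns such as `Soil_Type` and `Region`, and the two kinds need different preprocessing. A minimal sketch of separating them by dtype follows. The helper name `split_columns` is hypothetical, and the tiny `demo` frame reuses values from the first two preview rows.

```python
import pandas as pd


def split_columns(df: pd.DataFrame, target: str):
    """Split feature names into numeric and non-numeric lists, excluding the target."""
    features = df.drop(columns=[target])
    numeric = features.select_dtypes(include="number").columns.tolist()
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical


# Small frame built from the first two preview rows (subset of columns).
demo = pd.DataFrame({
    "N": [132, 122],
    "Soil_pH": [6.35, 5.98],
    "Soil_Type": ["Clay", "Sandy"],
    "Region": ["Central", "North"],
    "Crop_Yield_ton_per_hectare": [11.42, 23.19],
})
numeric_cols, categorical_cols = split_columns(demo, "Crop_Yield_ton_per_hectare")
print(numeric_cols, categorical_cols)
```

On the real data, `split_columns(data, "Crop_Yield_ton_per_hectare")` would produce the two lists to feed into scaling and encoding steps respectively.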

In [7]:
# Missing values per column. A notebook cell only renders its last expression,
# so wrap this line in print() or display() to see the counts.
data.isnull().sum()
# Number of fully duplicated rows (the value rendered below)
data.duplicated().sum()


np.int64(0)
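The `np.int64(0)` result means there are no fully duplicated rows. Because a notebook cell only renders its last expression, the `data.isnull().sum()` result in this cell is computed but never displayed. One way to see both checks at once is to return them together. The helper below (a hypothetical name, `data_quality_report`) sketches this on a tiny invented frame with one missing value and one duplicated row; on the real data it would be called as `data_quality_report(data)`.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> dict:
    """Collect missing-value and duplicate-row counts into one returned object."""
    missing = df.isnull().sum()
    return {
        # Only columns that actually contain gaps, as plain Python ints.
        "missing_by_column": {col: int(n) for col, n in missing[missing > 0].items()},
        "total_missing": int(missing.sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Invented example: row 2 duplicates row 0, and row 3 has a missing N value.
demo = pd.DataFrame({
    "N": [132, 122, 132, None],
    "Region": ["Central", "North", "Central", "East"],
})
report = data_quality_report(demo)
print(report)
```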