# DEVELOPING PREDICTIVE MODELS FOR CROP YIELD ESTIMATION IN INDIA.

## Introduction.
Crop yield estimation is critical for ensuring food security and optimizing agricultural practices. Digital Green has collected a comprehensive dataset from several districts in India, focusing on the factors that influence rice crop yield. This project aims to develop predictive models for crop yield estimation using this dataset. A reliable crop yield model could benefit Indian agriculture and serve as a template for smallholder farming elsewhere. Accurate yield predictions help smallholder farmers make informed planting and resource allocation decisions, which can in turn reduce poverty and malnutrition and improve food security.

As climate change intensifies, adaptive farming practices become crucial, which makes precise yield predictions even more valuable. Solutions developed here can support sustainable agriculture and contribute to a stable food supply for the world's growing population.

## Business Understanding.
India is a densely populated country in which a large share of the population are small-scale farmers who rely on agriculture for both income and food. It is therefore crucial for the government of India to help small-scale farmers maximize their food production. Rice is a staple food in India, cultivated by a majority of farmers because the country's climate suits it well.



## Problem Statement.
Accurate crop yield estimation is essential for farmers, policymakers, and the agricultural sector. It helps in making informed decisions regarding planting, resource allocation, and food distribution. This project aims to address the challenge of predicting rice crop yields based on multiple factors, including fertilizer usage, seedling quantity, land preparation methods, and irrigation techniques.

## Objective.
The primary objective of this project is to develop accurate predictive models for estimating rice crop yields from the Digital Green dataset. These models will assist farmers and other stakeholders in optimizing agricultural practices and improving food production.

## Data Understanding.
The dataset was downloaded from Zindi and was collected through a survey conducted across multiple districts in India. It covers a range of factors that could affect rice yield, including the type and amount of fertilizer used, the quantity of seedlings planted, land preparation methods, and irrigation techniques. The dataset comprises more than 5000 data points, each with more than 40 features; the training file read below contains 3,870 rows and 44 columns.
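
A quick way to check the size and completeness of a survey table like this one is to list each column's dtype and missing-value count. The following is a minimal sketch, not part of the original notebook: the `summarize` helper is hypothetical, and the toy frame only reuses a few values from the first rows of the training file so that the block runs on its own.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing percentage."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    # Columns with the most missing values come first
    return summary.sort_values("n_missing", ascending=False)


# Toy stand-in for the survey data (illustrative subset of columns and rows)
toy = pd.DataFrame({
    "District": ["Nalanda", "Nalanda", "Gaya"],
    "CultLand": [45, 26, 10],
    "Harv_hand_rent": [None, 3.0, 480.0],
})

print(toy.shape)       # (rows, columns)
print(summarize(toy))
```

On the real data, `summarize(train)` would give the same kind of overview once `train` has been loaded.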

### Importing Libraries.

```python
import pandas as pd
```

### Reading the Data.

```python
# Read the training data
train = pd.read_csv("Train (1).csv")
train.head()
```

| | ID | District | Block | CultLand | CropCultLand | LandPreparationMethod | CropTillageDate | CropTillageDepth | CropEstMethod | RcNursEstDate | ... | Harv_method | Harv_date | Harv_hand_rent | Threshing_date | Threshing_method | Residue_length | Residue_perc | Stubble_use | Acre | Yield |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID_GTFAC7PEVWQ9 | Nalanda | Noorsarai | 45 | 40 | TractorPlough FourWheelTracRotavator | 2022-07-20 | 5 | Manual_PuddledRandom | 2022-06-27 | ... | machine | 2022-11-16 | | 2022-11-16 | machine | 30 | 40 | plowed_in_soil | 0.3125 | 600 |
| 1 | ID_TK40ARLSPOKS | Nalanda | Rajgir | 26 | 26 | WetTillagePuddling TractorPlough FourWheelTrac... | 2022-07-18 | 5 | Manual_PuddledRandom | 2022-06-20 | ... | hand | 2022-11-25 | 3.0 | 2022-12-24 | machine | 24 | 10 | plowed_in_soil | 0.3125 | 600 |
| 2 | ID_1FJY2CRIMLZZ | Gaya | Gurua | 10 | 10 | TractorPlough FourWheelTracRotavator | 2022-06-30 | 6 | Manual_PuddledRandom | 2022-06-20 | ... | hand | 2022-12-12 | 480.0 | 2023-01-11 | machine | 30 | 10 | plowed_in_soil | 0.148148 | 225 |
| 3 | ID_I3IPXS4DB7NE | Gaya | Gurua | 15 | 15 | TractorPlough FourWheelTracRotavator | 2022-06-16 | 6 | Manual_PuddledRandom | 2022-06-17 | ... | hand | 2022-12-02 | 240.0 | 2022-12-29 | hand | 26 | 10 | plowed_in_soil | 0.222222 | 468 |
| 4 | ID_4T8YQWXWHB4A | Nalanda | Noorsarai | 60 | 60 | TractorPlough WetTillagePuddling | 2022-07-19 | 4 | Manual_PuddledRandom | 2022-06-21 | ... | machine | 2022-11-30 | | 2022-12-02 | machine | 24 | 40 | plowed_in_soil | 0.46875 | 550 |
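
The preview suggests two preprocessing steps that may be useful before modelling: the date columns are read as plain strings by default, and `LandPreparationMethod` appears to hold several methods separated by spaces. The sketch below illustrates one way to handle both. It is not the notebook's own method: the space-separated interpretation is an assumption, the derived column name `NurseryToTillageDays` and the `LandPrep_` prefix are hypothetical, and the small `sample` frame copies three complete rows from the preview so that the block is self-contained.

```python
import pandas as pd

# Three rows copied from the preview (index 0, 2 and 4)
sample = pd.DataFrame({
    "LandPreparationMethod": [
        "TractorPlough FourWheelTracRotavator",
        "TractorPlough FourWheelTracRotavator",
        "TractorPlough WetTillagePuddling",
    ],
    "CropTillageDate": ["2022-07-20", "2022-06-30", "2022-07-19"],
    "RcNursEstDate": ["2022-06-27", "2022-06-20", "2022-06-21"],
})

# Convert the date strings to datetime64 so date arithmetic works
for col in ["CropTillageDate", "RcNursEstDate"]:
    sample[col] = pd.to_datetime(sample[col], format="%Y-%m-%d")

# Hypothetical derived feature: days from nursery establishment to tillage
sample["NurseryToTillageDays"] = (
    sample["CropTillageDate"] - sample["RcNursEstDate"]
).dt.days

# Assumption: each space-separated token is one land preparation method,
# so the column can be expanded into one 0/1 indicator per method
methods = sample["LandPreparationMethod"].str.get_dummies(sep=" ")
sample = pd.concat([sample, methods.add_prefix("LandPrep_")], axis=1)

print(sample[["NurseryToTillageDays"] + [c for c in sample if c.startswith("LandPrep_")]])
```

Indicator columns keep every method a farmer reported, whereas a single categorical encoding would treat each distinct combination of methods as its own category.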


```python
train.info()
```

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3870 entries, 0 to 3869
Data columns (total 44 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   ID                                  3870 non-null   object 
 1   District                            3870 non-null   object 
 2   Block                               3870 non-null   object 
 3   CultLand                            3870 non-null   int64  
 4   CropCultLand                        3870 non-null   int64  
 5   LandPreparationMethod               3870 non-null   object 
 6   CropTillageDate                     3870 non-null   object 
 7   CropTillageDepth                    3870 non-null   int64  
 8   CropEstMethod                       3870 non-null   object 
 9   RcNursEstDate                       3787 non-null   object 
 10  SeedingSowingTransplanting          3870 non-null   object 
 11  SeedlingsPerPit                     3581 no