
Erdos Institute Data Science Bootcamp – May 2022 Project


akshaysuresh1/may22-barrel


Team Barrel

Topic: Predicting Fertilizer Input for Rice Cultivation in India

Team members:

This project was completed as part of the Erdos Institute Data Science Bootcamp, May 2022. Special mention to James Bramante for mentoring us throughout the duration of the bootcamp.

5-minute video presentation: MP4
Presentation slides: PDF
Executive summary: PDF


Project Goal

Home to over 1.38 billion people, India is tackling a severe hunger crisis. Although the country has achieved self-sufficiency in grain production, nearly 14% of its population is still undernourished. India's agriculture is predominantly rural, and widespread poverty, low literacy rates, and poor infrastructure raise questions about its sustainability. Indiscriminate fertilizer use has also led to significant irregularity in crop production despite consistent agricultural subsidies.

With the current global shortage of fertilizers, precision farming is vital for eliminating redundant costs and directing resources toward equitable food access for all communities. Here, we aim to support policymakers with models that predict the fertilizer consumption (nitrogen, phosphorus, and potash) required to achieve a specified rice yield.

Methodology

Rice is a hardy crop capable of thriving in a variety of soils, including loams, silts, and gravel. Using up to 26 years of district-level rice cultivation data (cropped area, yield, irrigated area) and environmental data (temperature, precipitation, wind speed, evapotranspiration), we carried out our analysis in two key steps.

  1. Firstly, we grouped districts with similar ecological parameters into clusters. To do so, we experimented with two unsupervised learning approaches, namely, K-means and hierarchical clustering.

  2. Within each cluster, we regressed historical NPK consumption data against rice yield. Here, we trialed simple linear regression, random forest regression, and support vector regression.
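The two steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's code: it assumes a DataFrame with one row per district-year and hypothetical column names (`district`, `temp`, `precip`, `wind`, `et`, `yield_t_ha`, `n_kg_ha`, `p_kg_ha`, `k_kg_ha`), assumes each district's environmental features are averaged over all years before clustering, and uses K-means with 6 clusters and plain linear regression as stand-ins for whichever variants are chosen.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical column names; the real dataset's schema may differ.
ENV_COLS = ["temp", "precip", "wind", "et"]
NUTRIENTS = ["n_kg_ha", "p_kg_ha", "k_kg_ha"]


def cluster_districts(df: pd.DataFrame, n_clusters: int = 6, seed: int = 0) -> pd.Series:
    """Step 1: average each district's environmental features over all years,
    standardize them, and assign each district to a K-means cluster."""
    env = df.groupby("district")[ENV_COLS].mean()
    X = StandardScaler().fit_transform(env)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return pd.Series(labels, index=env.index, name="cluster")


def fit_per_cluster(df: pd.DataFrame, clusters: pd.Series) -> dict:
    """Step 2: within each cluster, regress each nutrient input (response)
    on rice yield (predictor), keyed by (cluster, nutrient)."""
    data = df.join(clusters, on="district")
    models = {}
    for c, group in data.groupby("cluster"):
        X = group[["yield_t_ha"]].to_numpy()
        for nutrient in NUTRIENTS:
            models[(c, nutrient)] = LinearRegression().fit(X, group[nutrient].to_numpy())
    return models
```

Because yield is the predictor, a fitted model maps a target yield to an estimated input, e.g. `models[(c, "n_kg_ha")].predict([[3.5]])` for a hypothetical target of 3.5 t/ha in cluster `c`.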

Clustering

Based on environmental variables, both K-means and hierarchical clustering favor the grouping of Indian districts into 6 rice-growing clusters. Here is a map of India showing the spatial grouping of districts generated by our hierarchical clustering algorithm.
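The README does not state how the six-cluster solution was selected. One common approach, offered here only as an assumption, is to score candidate cluster counts with the silhouette coefficient for both K-means and Ward-linkage hierarchical clustering, and then measure how closely the two partitions agree using the adjusted Rand index. In this sketch `X` is assumed to be a standardized district-by-feature matrix, and the candidate range of k is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score


def _labels(X, k, seed=0):
    """Cluster labels from K-means and from Ward hierarchical clustering."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    hc = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    return km, hc


def silhouette_by_k(X, ks=range(2, 11), seed=0):
    """Silhouette score (higher is better) for each method at each candidate k."""
    scores = {}
    for k in ks:
        km, hc = _labels(X, k, seed)
        scores[k] = {"kmeans": silhouette_score(X, km), "ward": silhouette_score(X, hc)}
    return scores


def agreement(X, k, seed=0):
    """Adjusted Rand index between the two partitions (1.0 means identical)."""
    km, hc = _labels(X, k, seed)
    return adjusted_rand_score(km, hc)
```

Picking the k that maximizes the silhouette for each method, and checking that the two methods also agree at that k, is one way to arrive at a statement like "both methods favor 6 clusters".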

We note that the above map bears some visual resemblance to the Köppen-Geiger climate classification map of India. However, we caution readers against detailed comparisons between these maps, as our algorithms additionally incorporate soil-dependent features such as surface runoff and evapotranspiration.

Modeling

For every cluster, we independently regressed its nitrogen, phosphorus, and potash fertilizer inputs per unit area against rice yield. Using an 80-20 train-test split, we evaluated model performance on the test data with the symmetric mean absolute percentage error (SMAPE). As shown below, support vector regression marginally outperforms the other models, achieving a slightly lower SMAPE.
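The evaluation can be sketched as below. The README does not give its exact SMAPE formula, so this assumes the common variant SMAPE = (100/n) * sum(|F - A| / ((|A| + |F|) / 2)), which ranges from 0 (perfect) to 200, with terms where both actual A and forecast F are zero counted as zero error. The model hyperparameters (`n_estimators=100`, `C=10.0`, an RBF kernel, feature scaling before SVR) and the function names are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200).
    Terms where both actual and predicted values are 0 count as 0 error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    diff = np.abs(y_pred - y_true)
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()


def compare_models(X, y, seed=0):
    """Fit each candidate on an 80-20 split and return its test-set SMAPE.
    X: rice yield (n_samples x 1); y: one nutrient input per unit area."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    candidates = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        # SVR is sensitive to feature scale, so standardize the input first.
        "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    }
    return {
        name: smape(y_te, model.fit(X_tr, y_tr).predict(X_te))
        for name, model in candidates.items()
    }
```

Running `compare_models` once per cluster and nutrient gives a table of test-set SMAPE values in which the lowest entry marks the best-performing model.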

Future Work

Future extensions to our model will incorporate soil nutrient data, solar irradiance data, and knowledge of off-season farming practices (e.g., crop rotation) to improve the accuracy of our estimated fertilizer inputs.

Troubleshooting

Please submit an issue to voice any problems or requests. Suggestions that will help improve our data analyses are always welcome.
