# Pohora.LK - Machine Learning for Fertilizer Recommendation

The **Pohora.LK** project uses machine learning to tackle a common agricultural problem: deciding which fertilizer to apply to a crop, based on soil and environmental conditions such as soil nitrogen content and humidity.

This notebook explores a dataset available on [Kaggle](https://www.kaggle.com/datasets/shankarpriya2913/crop-and-soil-dataset) and uses it to train a machine learning model that could form part of a larger application for farmers and agriculturists.

## 1. Getting Started

This notebook uses the following dependencies:

1. Jupyter - to run the notebook
2. Scikit-Learn - to build, train and evaluate machine learning models
3. NumPy - to handle numerical computations in the data
4. Pandas - to load and manipulate the data as required
5. Matplotlib - to perform various visualizations on the data

These dependencies can be installed with:

```sh
pip install jupyter scikit-learn numpy pandas matplotlib
```

In [1]:
# Import dependencies

import sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Next, we can load the dataset from the CSV file as a Pandas **dataframe**.

In [15]:
# Load data from CSV file into a dataframe named `df`
df = pd.read_csv("../data/fertilizer.csv")

# Inspect the first 5 rows (a.k.a. the head) of the dataframe
df.head()

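|   | Temparature | Humidity | Moisture | Soil Type | Crop Type | Nitrogen | Potassium | Phosphorous | Fertilizer Name |
|---|-------------|----------|----------|-----------|-----------|----------|-----------|-------------|-----------------|
| 0 | 26.0 | 52.0 | 38.0 | Sandy | Maize | 37 | 0 | 0 | Urea |
| 1 | 29.0 | 52.0 | 45.0 | Loamy | Sugarcane | 12 | 0 | 36 | DAP |
| 2 | 34.0 | 65.0 | 62.0 | Black | Cotton | 7 | 9 | 30 | 14-35-14 |
| 3 | 32.0 | 62.0 | 34.0 | Red | Tobacco | 22 | 0 | 20 | 28-28 |
| 4 | 28.0 | 54.0 | 46.0 | Clayey | Paddy | 35 | 0 | 0 | Urea |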
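
Beyond `head()`, a quick structural check helps confirm the column types before any modelling. The sketch below is an illustration rather than part of the original notebook: it inlines the five rows shown above as a stand-in for `../data/fertilizer.csv` (the names `sample_csv` and `sample_df` are hypothetical), then prints the shape, the dtypes and the missing-value counts.

```python
import io

import pandas as pd

# Assumption: these five rows (copied from the output above) stand in for
# the real CSV file, so the snippet runs without the dataset on disk.
sample_csv = io.StringIO(
    "Temparature,Humidity,Moisture,Soil Type,Crop Type,"
    "Nitrogen,Potassium,Phosphorous,Fertilizer Name\n"
    "26.0,52.0,38.0,Sandy,Maize,37,0,0,Urea\n"
    "29.0,52.0,45.0,Loamy,Sugarcane,12,0,36,DAP\n"
    "34.0,65.0,62.0,Black,Cotton,7,9,30,14-35-14\n"
    "32.0,62.0,34.0,Red,Tobacco,22,0,20,28-28\n"
    "28.0,54.0,46.0,Clayey,Paddy,35,0,0,Urea\n"
)
sample_df = pd.read_csv(sample_csv)

# Number of rows and columns
print(sample_df.shape)

# Data type pandas inferred for each column
print(sample_df.dtypes)

# Count of missing values per column
print(sample_df.isna().sum())
```

On the full dataset, the same three calls applied to `df` would show whether any column needs type conversion or missing-value handling.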


At a glance, we can see that the dataset contains the following columns of data:

1. **Temparature** - the surrounding temperature of the crop (feature variable)
2. **Humidity** - the surrounding humidity of the crop (feature variable)
3. **Moisture** - the moisture in the crop's soil (feature variable)
4. **Soil Type** - the type of soil the crop has been planted in (feature variable)
5. **Crop Type** - the type of crop planted (feature variable)
6. **Nitrogen** - the nitrogen content of the soil (feature variable)
7. **Potassium** - the potassium content of the soil (feature variable)
8. **Phosphorous** - the phosphorus content of the soil (feature variable)
9. **Fertilizer Name** - the name of the fertilizer recommended for these conditions (target variable)
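
The feature/target split described in this list can be sketched as follows. This is a minimal illustration, not the notebook's own code: `sample_df` is a hypothetical dataframe built from the five rows shown earlier, and the names `X`, `y`, `categorical_features` and `numeric_features` are assumptions chosen for the example.

```python
import pandas as pd

# Assumption: a small stand-in dataframe built from the five rows shown above
sample_df = pd.DataFrame(
    {
        "Temparature": [26.0, 29.0, 34.0, 32.0, 28.0],
        "Humidity": [52.0, 52.0, 65.0, 62.0, 54.0],
        "Moisture": [38.0, 45.0, 62.0, 34.0, 46.0],
        "Soil Type": ["Sandy", "Loamy", "Black", "Red", "Clayey"],
        "Crop Type": ["Maize", "Sugarcane", "Cotton", "Tobacco", "Paddy"],
        "Nitrogen": [37, 12, 7, 22, 35],
        "Potassium": [0, 0, 9, 0, 0],
        "Phosphorous": [0, 36, 30, 20, 0],
        "Fertilizer Name": ["Urea", "DAP", "14-35-14", "28-28", "Urea"],
    }
)

target_column = "Fertilizer Name"

# Everything except the target is a feature
X = sample_df.drop(columns=[target_column])
y = sample_df[target_column]

# The two text columns are categorical; the rest are numeric
categorical_features = ["Soil Type", "Crop Type"]
numeric_features = [c for c in X.columns if c not in categorical_features]

print("Features:", list(X.columns))
print("Numeric:", numeric_features)
print("Categorical:", categorical_features)
print("Target classes:", sorted(y.unique()))
```

Keeping the categorical and numeric column names in separate lists makes it easier to apply different preprocessing to each group later, for example encoding the text columns before training.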

Before analyzing the dataset, we'll rename the `Temparature` column to fix its typo.

In [16]:
# Fix typo in `Temparature` column
df.rename(
    columns={"Temparature": "Temperature"},
    errors="raise",
    inplace=True,
)

df.head()

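|   | Temperature | Humidity | Moisture | Soil Type | Crop Type | Nitrogen | Potassium | Phosphorous | Fertilizer Name |
|---|-------------|----------|----------|-----------|-----------|----------|-----------|-------------|-----------------|
| 0 | 26.0 | 52.0 | 38.0 | Sandy | Maize | 37 | 0 | 0 | Urea |
| 1 | 29.0 | 52.0 | 45.0 | Loamy | Sugarcane | 12 | 0 | 36 | DAP |
| 2 | 34.0 | 65.0 | 62.0 | Black | Cotton | 7 | 9 | 30 | 14-35-14 |
| 3 | 32.0 | 62.0 | 34.0 | Red | Tobacco | 22 | 0 | 20 | 28-28 |
| 4 | 28.0 | 54.0 | 46.0 | Clayey | Paddy | 35 | 0 | 0 | Urea |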
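
One consequence of `errors="raise"` is worth illustrating: once the column has been renamed, running the same rename again raises a `KeyError`, because `Temparature` no longer exists. The sketch below shows this on a hypothetical two-column dataframe (`sample_df`, `renamed` and `rerun_failed` are names assumed for the example) and uses the non-in-place form of `rename`, which returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

# Assumption: a tiny stand-in dataframe with the misspelled column
sample_df = pd.DataFrame(
    {"Temparature": [26.0, 29.0], "Humidity": [52.0, 52.0]}
)

# Without inplace=True, rename returns a new dataframe
renamed = sample_df.rename(
    columns={"Temparature": "Temperature"},
    errors="raise",
)
print(list(renamed.columns))

# Renaming a second time fails, since the old label is gone
rerun_failed = False
try:
    renamed.rename(columns={"Temparature": "Temperature"}, errors="raise")
except KeyError as exc:
    rerun_failed = True
    print("Second rename raised KeyError:", exc)
```

In a notebook this means the rename cell will fail if it is re-run without reloading the data first, which is a useful signal that the dataframe is already in the corrected state.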
