# Crop-Fertiliser Recommendation System

### Step 1: Data Understanding & Initial Preprocessing
This notebook covers:
- Importing datasets
- Viewing basic structure and records
- Checking for missing values and duplicates
- Summary statistics

In [1]:
import pandas as pd
import numpy as np

In [5]:
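# Load the crop and fertiliser datasets
# Note: per the head() output, crop.csv carries the 'Fertilizer Name' column,
# while fertiliser.csv carries the crop 'Label' column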
crop_df = pd.read_csv('crop.csv')
fertiliser_df = pd.read_csv('fertiliser.csv')
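
CSV headers sometimes carry stray leading or trailing whitespace, which makes later column lookups fail silently. The following is a minimal sketch of a loader that strips it, not the method used in this notebook. The helper name `load_csv` and the in-memory sample are hypothetical; the trailing space in `"Humidity "` is an assumption made for illustration.

```python
import io
import pandas as pd

def load_csv(source) -> pd.DataFrame:
    """Read a CSV and strip stray whitespace from the column names."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()
    return df

# In-memory stand-in for a file on disk (hypothetical sample);
# note the trailing space after "Humidity" in the header row.
raw = io.StringIO("Temperature,Humidity ,Moisture\n26,52,38\n29,52,45\n")
demo_df = load_csv(raw)
print(list(demo_df.columns))
```

The same helper accepts a file path, so `load_csv('crop.csv')` could replace the direct `pd.read_csv` call if the headers turn out to need cleaning.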

In [6]:
# Display first 5 records
print('Crop Dataset:')
print(crop_df.head())

print('\nFertiliser Dataset:')
print(fertiliser_df.head())

Crop Dataset:
   Temperature  Humidity   Moisture Soil Type  Crop Type  Nitrogen  Potassium  \
0           26         52        38     Sandy      Maize        37          0   
1           29         52        45     Loamy  Sugarcane        12          0   
2           34         65        62     Black     Cotton         7          9   
3           32         62        34       Red    Tobacco        22          0   
4           28         54        46    Clayey      Paddy        35          0   

   Phosphorous Fertilizer Name  
0            0            Urea  
1           36             DAP  
2           30        14-35-14  
3           20           28-28  
4            0            Urea  

Fertiliser Dataset:
    N   P   K  Temperature   Humidity        Ph    Rainfall Label
0  90  42  43    20.879744  82.002744  6.502985  202.935536  rice
1  85  58  41    21.770462  80.319644  7.038096  226.655537  rice
2  60  55  44    23.004459  82.320763  7.840207  263.964248  rice
3  74  35  40   

In [7]:
# Dataset info
# info() prints its report directly and returns None, so it is not wrapped in print()
print('Crop Dataset Info:')
crop_df.info()

print('\nFertiliser Dataset Info:')
fertiliser_df.info()

Crop Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99 entries, 0 to 98
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Temperature      99 non-null     int64 
 1   Humidity         99 non-null     int64 
 2   Moisture         99 non-null     int64 
 3   Soil Type        99 non-null     object
 4   Crop Type        99 non-null     object
 5   Nitrogen         99 non-null     int64 
 6   Potassium        99 non-null     int64 
 7   Phosphorous      99 non-null     int64 
 8   Fertilizer Name  99 non-null     object
dtypes: int64(6), object(3)
memory usage: 7.1+ KB

Fertiliser Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   N            2200 non-null   int64  
 1   P            2200 non-null   int64  
 2   K            2200 non-nu

In [8]:
# Check for missing values
# Each Series is printed on its own so the first row is not pushed out of alignment
print('Missing values in crop dataset:')
print(crop_df.isnull().sum())
print('\nMissing values in fertiliser dataset:')
print(fertiliser_df.isnull().sum())

Missing values in crop dataset:
Temperature        0
Humidity           0
Moisture           0
Soil Type          0
Crop Type          0
Nitrogen           0
Potassium          0
Phosphorous        0
Fertilizer Name    0
dtype: int64

Missing values in fertiliser dataset:
N              0
P              0
K              0
Temperature    0
Humidity       0
Ph             0
Rainfall       0
Label          0
dtype: int64
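
Both datasets are complete here, but raw counts are harder to judge on larger or messier data. As a minimal sketch, the check above could be extended to report percentages as well. The helper name `missing_report` and the toy frame with deliberately inserted gaps are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most gaps first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)

# Toy frame (hypothetical values) with one gap in each of two columns
toy = pd.DataFrame({
    "Temperature": [26, 29, np.nan, 32],
    "Soil Type": ["Sandy", None, "Black", "Red"],
    "Nitrogen": [37, 12, 7, 22],
})
report = missing_report(toy)
print(report)
```

On the real data, `missing_report(crop_df)` would be expected to show zeros throughout, matching the output above.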


In [9]:
# Check for duplicates
print('Duplicates in crop dataset:', crop_df.duplicated().sum())
print('Duplicates in fertiliser dataset:', fertiliser_df.duplicated().sum())

Duplicates in crop dataset: 0
Duplicates in fertiliser dataset: 0
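
Neither dataset contains duplicates, so no clean-up is needed at this step. For reference, the sketch below shows one way duplicates could be inspected and removed if they did appear. The toy frame is hypothetical, with one row repeated on purpose.

```python
import pandas as pd

# Toy frame (hypothetical values); row 2 repeats row 0 exactly
toy = pd.DataFrame({
    "Soil Type": ["Sandy", "Loamy", "Sandy", "Black"],
    "Crop Type": ["Maize", "Sugarcane", "Maize", "Cotton"],
    "Nitrogen": [37, 12, 37, 7],
})

# keep=False flags every member of a duplicate group, not only the repeats,
# which makes it easier to inspect the rows side by side
dupes = toy[toy.duplicated(keep=False)]
print(dupes)

# drop_duplicates() keeps the first occurrence of each row by default
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(toy), "->", len(deduped))
```

Note that `duplicated().sum()` with the default `keep='first'` counts only the repeats, so a single duplicated pair is reported as 1, not 2.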


In [10]:
# Summary statistics
# Each table is printed on its own so the header row stays aligned with the values
print('Crop Dataset Description:')
print(crop_df.describe())
print('\nFertiliser Dataset Description:')
print(fertiliser_df.describe())

Crop Dataset Description:
       Temperature  Humidity    Moisture   Nitrogen  Potassium  Phosphorous
count    99.000000  99.000000  99.000000  99.000000  99.000000    99.000000
mean     30.282828  59.151515  43.181818  18.909091   3.383838    18.606061
std       3.502304   5.840331  11.271568  11.599693   5.814667    13.476978
min      25.000000  50.000000  25.000000   4.000000   0.000000     0.000000
25%      28.000000  54.000000  34.000000  10.000000   0.000000     9.000000
50%      30.000000  60.000000  41.000000  13.000000   0.000000    19.000000
75%      33.000000  64.000000  50.500000  24.000000   7.500000    30.000000
max      38.000000  72.000000  65.000000  42.000000  19.000000    42.000000

Fertiliser Dataset Description:
                 N            P            K  Temperature     Humidity  \
count  2200.000000  2200.000000  2200.000000  2200.000000  2200.000000   
mean     50.551818    53.362727    48.149091    25.616244    71.481779   
std      36.917334    32.985883  
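
By default, `describe()` summarises only the numeric columns, so `Soil Type`, `Crop Type`, `Fertilizer Name` and `Label` do not appear in the tables above. The sketch below shows one way the text columns could be summarised as well. It uses only the first five crop records shown in the `head()` output earlier, so the counts are illustrative and do not describe the full dataset.

```python
import pandas as pd

# First five rows of the crop dataset, copied from the head() output
crop_sample = pd.DataFrame({
    "Soil Type": ["Sandy", "Loamy", "Black", "Red", "Clayey"],
    "Crop Type": ["Maize", "Sugarcane", "Cotton", "Tobacco", "Paddy"],
    "Fertilizer Name": ["Urea", "DAP", "14-35-14", "28-28", "Urea"],
    "Nitrogen": [37, 12, 7, 22, 35],
})

# Drop the numeric columns first; describe() then reports
# count, unique, top and freq for the remaining text columns
cat_summary = crop_sample.select_dtypes(exclude="number").describe()
print(cat_summary)

# Class balance of the target column
fert_counts = crop_sample["Fertilizer Name"].value_counts()
print(fert_counts)
```

Running the same two calls on `crop_df` and `fertiliser_df` would show how many distinct soil types, crops and fertilisers each dataset contains and whether the target classes are balanced.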