# CROP YIELD PREDICTION


**The goal of this notebook is to predict crop yield using the data in the provided dataset, which is taken from Crop Yield Prediction.**

<img src = "https://img.in-part.com/resize?stripmeta=true&noprofile=true&quality=95&url=https%3A%2F%2Fs3-eu-west-1.amazonaws.com%2Fassets.in-part.com%2Ftechnologies%2Fheader-images%2F2aVv2twTYW9qZGGhPrxw_AdobeStock_241906053.jpeg&width=1200&height=820" width = "700" height = "500">

### Data Dictionary
| Column Name | Description |
|-------------|-------------|
| Rain Fall (mm) | Rainfall in millimeters |
| Temperature (C) | Temperature in Celsius |
| Fertilizer (kg) | Fertilizer in kilograms |
| Nitrogen (N) | Nitrogen (macronutrient) |
| Phosphorus (P) | Phosphorus (macronutrient) |
| Potassium (K) | Potassium (macronutrient) |
| Yield (Q/acre) | Crop yield in quintals per acre |
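
The yield column is expressed in quintals per acre. As a quick sanity check on what these numbers mean, the sketch below converts them to kilograms per hectare. It assumes the metric quintal (100 kg) and the international acre (0.40468564224 ha); neither convention is stated in the dataset, and the helper name is hypothetical.

```python
# Assumed conventions (not stated in the dataset):
# 1 quintal = 100 kg, 1 acre = 0.40468564224 ha
KG_PER_QUINTAL = 100.0
HECTARES_PER_ACRE = 0.40468564224


def quintals_per_acre_to_kg_per_ha(q_per_acre: float) -> float:
    """Convert a yield in quintals per acre to kilograms per hectare."""
    return q_per_acre * KG_PER_QUINTAL / HECTARES_PER_ACRE


# The highest yield in this dataset is 12 Q/acre
print(round(quintals_per_acre_to_kg_per_ha(12.0)))  # 2965
```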

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_excel("crop yield data sheet.xlsx")
df.head()

|   | Rain Fall (mm) | Fertilizer | Temperatue | Nitrogen (N) | Phosphorus (P) | Potassium (K) | Yeild (Q/acre) |
|---|---|---|---|---|---|---|---|
| 0 | 1230.0 | 80.0 | 28 | 80.0 | 24.0 | 20.0 | 12.0 |
| 1 | 480.0 | 60.0 | 36 | 70.0 | 20.0 | 18.0 | 8.0 |
| 2 | 1250.0 | 75.0 | 29 | 78.0 | 22.0 | 19.0 | 11.0 |
| 3 | 450.0 | 65.0 | 35 | 70.0 | 19.0 | 18.0 | 9.0 |
| 4 | 1200.0 | 80.0 | 27 | 79.0 | 22.0 | 19.0 | 11.0 |
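
The column headers in the file do not exactly match the data dictionary: the file spells `Temperatue` and `Yeild (Q/acre)`. This notebook keeps the file's spelling throughout. If you prefer the dictionary's names, a minimal sketch of a rename could look like the following; the mapping `COLUMN_FIXES` is hypothetical and not part of the original notebook, and applying it would require updating every later reference to the old names.

```python
import pandas as pd

# Hypothetical mapping from the file's misspelled headers to corrected names
COLUMN_FIXES = {
    "Temperatue": "Temperature (C)",
    "Yeild (Q/acre)": "Yield (Q/acre)",
}

# Two rows copied from the df.head() output
sample = pd.DataFrame({
    "Rain Fall (mm)": [1230.0, 480.0],
    "Fertilizer": [80.0, 60.0],
    "Temperatue": [28, 36],
    "Nitrogen (N)": [80.0, 70.0],
    "Phosphorus (P)": [24.0, 20.0],
    "Potassium (K)": [20.0, 18.0],
    "Yeild (Q/acre)": [12.0, 8.0],
})

# rename() returns a new DataFrame; `sample` itself is left unchanged
renamed = sample.rename(columns=COLUMN_FIXES)
print(list(renamed.columns))
```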


## Data Preprocessing

In [3]:
# Checking the shape of the dataset
df.shape

(109, 7)

In [4]:
# Checking the data types of the columns
df.dtypes

Rain Fall (mm)    float64
Fertilizer        float64
Temperatue         object
Nitrogen (N)      float64
Phosphorus (P)    float64
Potassium (K)     float64
Yeild (Q/acre)    float64
dtype: object

Here, the temperature column has the `object` data type, so it needs to be converted to `float`. First, though, I will check the values in the column.
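
Why would a temperature column end up as `object` at all? A pandas column holds a single dtype, so one non-numeric entry is enough to make pandas fall back to `object` for the whole column, while numbers mixed with missing values still fit in `float64`. A small illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Numbers plus a missing value still fit in float64
clean = pd.Series([28, 36, np.nan])
# A single stray string forces the whole Series to object dtype
dirty = pd.Series([28, 36, np.nan, ":"])
print(clean.dtype, dirty.dtype)  # float64 object
```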

In [6]:
df['Temperatue'].unique()

array([28, 36, 29, 35, 27, 34, 37, 39, 26, 38, 24, 25, 40, nan, ':'],
      dtype=object)

The column contains an invalid value, `":"`. I need to remove this value before converting the column to the `float` data type.
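
An alternative to filtering out the bad row (not the approach this notebook takes) is `pd.to_numeric` with `errors="coerce"`, which turns anything that cannot be parsed, such as `":"`, into `NaN`. That row would then be kept and filled by the median imputation further down instead of being dropped. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

temps = pd.Series([28, 36, np.nan, ":"], dtype=object)
# errors="coerce" converts unparseable values to NaN instead of raising
converted = pd.to_numeric(temps, errors="coerce")
print(converted.dtype)         # float64
print(converted.isna().sum())  # 2: the original NaN plus the coerced ':'
```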

In [7]:
# Dropping ":" from the Temperatue column
df = df[df['Temperatue'] != ':']
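
Note that this comparison keeps missing values: `NaN != ':'` evaluates to `True`, so rows where `Temperatue` is empty survive the filter and show up later in the null check. A toy check on made-up values:

```python
import numpy as np
import pandas as pd

temps = pd.Series([28, np.nan, ":"], dtype=object)
mask = temps != ":"
print(mask.tolist())  # [True, True, False]: the NaN row is kept
```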

In [8]:
# Converting the Temperatue column to float
df['Temperatue'] = df['Temperatue'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Temperatue'] = df['Temperatue'].astype(float)
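
This `SettingWithCopyWarning` appears because `df` was produced by boolean filtering in the previous cell, so pandas cannot tell whether assigning to one of its columns is meant to modify the original frame. The conversion itself still takes effect on `df`. A common way to avoid the warning at its source is to take an explicit `.copy()` when filtering; the sketch below shows this on made-up data and is not a re-run of this notebook.

```python
import warnings

import numpy as np
import pandas as pd

raw = pd.DataFrame({"Temperatue": [28, 36, ":", np.nan]}, dtype=object)

# .copy() gives the filtered frame its own data, so later column
# assignments are unambiguous and do not trigger SettingWithCopyWarning
filtered = raw[raw["Temperatue"] != ":"].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    filtered["Temperatue"] = filtered["Temperatue"].astype(float)

# Look for the warning by name so the check works across pandas versions
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(filtered["Temperatue"].dtype, len(copy_warnings))  # float64 0
```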


In [9]:
# Checking for null values
df.isnull().sum()

Rain Fall (mm)    9
Fertilizer        9
Temperatue        9
Nitrogen (N)      9
Phosphorus (P)    9
Potassium (K)     9
Yeild (Q/acre)    9
dtype: int64
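
Every column has exactly 9 missing values, which hints that these may be 9 entirely empty rows (for example, blank rows in the spreadsheet) rather than scattered gaps. That is worth confirming before imputing. A hedged sketch of the check, run here on a made-up frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "Rain Fall (mm)": [1230.0, np.nan, 480.0],
    "Yeild (Q/acre)": [12.0, np.nan, 8.0],
})
# True for rows in which every column is missing
all_missing = frame.isnull().all(axis=1)
print(int(all_missing.sum()))  # 1
```

On this notebook's `df`, `df.isnull().all(axis=1).sum()` returning 9 would support the blank-row explanation.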

Replacing missing values with the median of each column.

In [10]:
# Replacing missing values with the median of each column
for col in df.columns:
    df[col] = df[col].fillna(df[col].median())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col] = df[col].fillna(df[col].median())
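
The cell fills each column with that column's own median. Because `DataFrame.median()` returns one median per column as a Series indexed by column name, and `fillna` aligns such a Series to the column labels, the same result can be written in one line. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "Fertilizer": [80.0, 60.0, np.nan, 75.0],
    "Temperatue": [28.0, np.nan, 29.0, 35.0],
})
# frame.median() -> Series {"Fertilizer": 75.0, "Temperatue": 29.0};
# fillna uses it to fill each column's NaNs with that column's median
filled = frame.fillna(frame.median())
print(filled.isnull().sum().sum())  # 0
```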


In [11]:
# Descriptive Statistics
df.describe()

|   | Rain Fall (mm) | Fertilizer | Temperatue | Nitrogen (N) | Phosphorus (P) | Potassium (K) | Yeild (Q/acre) |
|---|---|---|---|---|---|---|---|
| count | 108.0 | 108.0 | 108.0 | 108.0 | 108.0 | 108.0 | 108.0 |
| mean | 874.814815 | 67.990741 | 32.111111 | 70.759259 | 21.12037 | 18.138889 | 9.046296 |
| std | 391.818744 | 9.616473 | 5.277944 | 6.390516 | 1.868167 | 1.758601 | 1.88146 |
| min | 400.0 | 50.0 | 24.0 | 59.0 | 18.0 | 15.0 | 5.5 |
| 25% | 450.0 | 60.0 | 28.0 | 65.0 | 20.0 | 16.0 | 7.0 |
| 50% | 1150.0 | 70.0 | 29.0 | 71.0 | 21.0 | 19.0 | 9.0 |
| 75% | 1226.25 | 77.0 | 38.0 | 76.25 | 23.0 | 19.0 | 11.0 |
| max | 1300.0 | 80.0 | 40.0 | 80.0 | 25.0 | 22.0 | 12.0 |
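
One consequence of median imputation shows up in the `50%` row: adding copies of a column's median to that column does not move its median, so each value in this row is the same median that was used to fill the gaps. A small check using only the standard library, with illustrative values rather than the real column:

```python
from statistics import median

# Illustrative observed temperatures (not the real column)
observed = [28.0, 36.0, 29.0, 35.0, 27.0]
fill_value = median(observed)

# Impute three missing entries with the median
imputed = observed + [fill_value] * 3
print(fill_value, median(imputed))  # 29.0 29.0
```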


In [12]:
df.head()

|   | Rain Fall (mm) | Fertilizer | Temperatue | Nitrogen (N) | Phosphorus (P) | Potassium (K) | Yeild (Q/acre) |
|---|---|---|---|---|---|---|---|
| 0 | 1230.0 | 80.0 | 28.0 | 80.0 | 24.0 | 20.0 | 12.0 |
| 1 | 480.0 | 60.0 | 36.0 | 70.0 | 20.0 | 18.0 | 8.0 |
| 2 | 1250.0 | 75.0 | 29.0 | 78.0 | 22.0 | 19.0 | 11.0 |
| 3 | 450.0 | 65.0 | 35.0 | 70.0 | 19.0 | 18.0 | 9.0 |
| 4 | 1200.0 | 80.0 | 27.0 | 79.0 | 22.0 | 19.0 | 11.0 |


## Exploratory Data Analysis


### Rainfall Distribution