# Bhutan Crop Price Prediction
This program analyzes potato crop data from past years (2015 to 2021) and then predicts the price per kg.

The dataset covers 20 Dzongkhags and 20 Gewogs (one Gewog from each Dzongkhag). The only crop listed is Potatoes for the Winter season. The other parameters in the dataset are the cultivated area in acres, the potato yield in kg and the price per kg.

## Importing Necessary Modules and Dataset

```python
import numpy as np
import pandas as pd

# Load the crop data as a pandas DataFrame
crop_df = pd.read_csv('Dataset/Full_Crop_Dataset.csv')
```

## Analyzing and Formatting Dataset

The dataset contains text columns. Because these features will be used for analysis, we need to convert them to numbers.
For simplicity, we assign the numbers manually, for example 1 to 20 for the Dzongkhag names.

We need to convert Dzongkhag, Gewog, Season and Crop.

```python
crop_df.head()
```

|   | Dzongkhag | Gewog | Year | Season | Crop | CultivatedArea_Acre | Yeild_Kg | Price_Per_Kg |
|---|---|---|---|---|---|---|---|---|
| 0 | Samtse | Norbugang | 2015 | Winter | Potatoes | 50 | 20000 | 15 |
| 1 | Haa | Uesu | 2015 | Winter | Potatoes | 75 | 25000 | 17 |
| 2 | Chhukha | Gedu | 2015 | Winter | Potatoes | 100 | 100000 | 14 |
| 3 | Paro | Shaba | 2015 | Winter | Potatoes | 150 | 500000 | 16 |
| 4 | Thimphu | Genekha | 2015 | Winter | Potatoes | 40 | 22000 | 15 |


```python
crop_df.describe()
```

|       | Year | CultivatedArea_Acre | Yeild_Kg | Price_Per_Kg |
|---|---|---|---|---|
| count | 160.0 | 160.0 | 160.0 | 160.0 |
| mean | 2017.875 | 197.84375 | 113119.4125 | 15.0 |
| std | 1.9058 | 89.248332 | 115925.646584 | 2.052763 |
| min | 2015.0 | 15.0 | 10000.0 | 12.0 |
| 25% | 2016.75 | 115.0 | 60391.75 | 13.0 |
| 50% | 2017.5 | 201.5 | 93811.5 | 15.0 |
| 75% | 2019.25 | 273.0 | 133432.25 | 17.0 |
| max | 2021.0 | 350.0 | 830000.0 | 20.0 |


```python
# Check the unique values (and their counts) of Dzongkhag, Gewog, Year, Season and Crop
print('Dzongkhag : ', crop_df['Dzongkhag'].unique(), crop_df['Dzongkhag'].nunique())

print('Gewog : ', crop_df['Gewog'].unique(), crop_df['Gewog'].nunique())

print('Year : ', crop_df['Year'].unique(), crop_df['Year'].nunique())

print('Season : ', crop_df['Season'].unique(), crop_df['Season'].nunique())

print('Crop : ', crop_df['Crop'].unique(), crop_df['Crop'].nunique())
```

```
Dzongkhag :  ['Samtse' 'Haa' 'Chhukha' 'Paro' 'Thimphu' 'Gasa' 'Punakha' 'Wangdue'
 'Dagana' 'Tsirang' 'Sarpang' 'Zhemgang' 'Trongsa' 'Bumthang' 'Lhuntse'
 'Mongar' 'Trashiyangtse' 'Trashigang' 'Pemagatshel' 'Samdrup Jongkhar'] 20
Gewog :  ['Norbugang' 'Uesu' 'Gedu' 'Shaba' 'Genekha' 'Lunana' 'Khuruthang' 'Bajo'
 'Drujeygang' 'Mendrelgang' 'Gelephu' 'Panbang' 'Karsong' 'Jakar'
 'Shabling' 'Gyelpoizhing' 'Taripe' 'Khaling' 'Yurung' 'Jomotshangkha'] 20
Year :  [2015 2016 2017 2018 2019 2020 2021] 7
Season :  ['Winter'] 1
Crop :  ['Potatoes'] 1
```
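The dataset description says each Gewog belongs to exactly one Dzongkhag. A quick way to confirm this is a `groupby` check. The sketch below is a minimal version, assuming you would run the same check on the full `crop_df`. To stay self-contained, it rebuilds only the five rows shown by `head()` above.

```python
import pandas as pd

# Text columns of the first five rows shown by crop_df.head()
sample_df = pd.DataFrame({
    'Dzongkhag': ['Samtse', 'Haa', 'Chhukha', 'Paro', 'Thimphu'],
    'Gewog': ['Norbugang', 'Uesu', 'Gedu', 'Shaba', 'Genekha'],
})

# Count how many distinct Dzongkhags each Gewog appears with; it should always be 1
dzongkhags_per_gewog = sample_df.groupby('Gewog')['Dzongkhag'].nunique()
one_to_one = bool(dzongkhags_per_gewog.eq(1).all())
print('Each Gewog belongs to one Dzongkhag:', one_to_one)
```

On the full data, a `False` result would mean the same Gewog name was recorded under more than one Dzongkhag.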


```python
# Only one crop in the dataset
crop_df['Crop'] = crop_df['Crop'].map({'Potatoes': 1})

# Only one season in the dataset
crop_df['Season'] = crop_df['Season'].map({'Winter': 1})

# 20 Dzongkhags and 20 Gewogs, numbered 1 to 20 in order of appearance
crop_df['Dzongkhag'] = crop_df['Dzongkhag'].map({
    'Samtse': 1, 'Haa': 2, 'Chhukha': 3, 'Paro': 4, 'Thimphu': 5,
    'Gasa': 6, 'Punakha': 7, 'Wangdue': 8, 'Dagana': 9, 'Tsirang': 10,
    'Sarpang': 11, 'Zhemgang': 12, 'Trongsa': 13, 'Bumthang': 14, 'Lhuntse': 15,
    'Mongar': 16, 'Trashiyangtse': 17, 'Trashigang': 18, 'Pemagatshel': 19, 'Samdrup Jongkhar': 20,
})
crop_df['Gewog'] = crop_df['Gewog'].map({
    'Norbugang': 1, 'Uesu': 2, 'Gedu': 3, 'Shaba': 4, 'Genekha': 5,
    'Lunana': 6, 'Khuruthang': 7, 'Bajo': 8, 'Drujeygang': 9, 'Mendrelgang': 10,
    'Gelephu': 11, 'Panbang': 12, 'Karsong': 13, 'Jakar': 14, 'Shabling': 15,
    'Gyelpoizhing': 16, 'Taripe': 17, 'Khaling': 18, 'Yurung': 19, 'Jomotshangkha': 20,
})
```

```python
# Check that the conversion worked
crop_df.head()
```

|   | Dzongkhag | Gewog | Year | Season | Crop | CultivatedArea_Acre | Yeild_Kg | Price_Per_Kg |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 2015 | 1 | 1 | 50 | 20000 | 15 |
| 1 | 2 | 2 | 2015 | 1 | 1 | 75 | 25000 | 17 |
| 2 | 3 | 3 | 2015 | 1 | 1 | 100 | 100000 | 14 |
| 3 | 4 | 4 | 2015 | 1 | 1 | 150 | 500000 | 16 |
| 4 | 5 | 5 | 2015 | 1 | 1 | 40 | 22000 | 15 |
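Typing the mapping dictionaries by hand works for 20 names but is easy to get wrong. `Series.map` silently returns `NaN` for any name missing from the dictionary, such as a misspelling. A minimal alternative sketch uses `pd.factorize`, which numbers values in order of first appearance starting at 0. Adding 1 gives the same 1-to-20 codes as the manual mapping, because the dictionaries above follow the order returned by `unique()`. The sample uses the first five Dzongkhag names from `head()`. `code_of` is an illustrative name for the saved lookup table.

```python
import pandas as pd

# First five Dzongkhag names from crop_df.head()
names = pd.Series(['Samtse', 'Haa', 'Chhukha', 'Paro', 'Thimphu'])

# Manual mapping (as in the notebook) versus automatic numbering
manual_codes = names.map({'Samtse': 1, 'Haa': 2, 'Chhukha': 3, 'Paro': 4, 'Thimphu': 5})
codes, uniques = pd.factorize(names)
auto_codes = pd.Series(codes + 1)  # factorize starts at 0; shift so codes start at 1

print(manual_codes.tolist())
print(auto_codes.tolist())

# A misspelled name is turned into NaN by .map(), so check for missing values
typo = pd.Series(['Samtse', 'Samste']).map({'Samtse': 1})
print('Unmapped values:', int(typo.isna().sum()))

# Keep the uniques as a lookup table so new data is encoded the same way later
code_of = {name: i for i, name in enumerate(uniques, start=1)}
```

`factorize` codes depend on row order. So for new rows, encode with the saved `code_of` table rather than calling `factorize` again.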


## Selecting Data for Modelling

The analysis and conversion above are needed only if your data is in a raw (text) format. If your dataset is already numeric, you can start here.

```python
# Select the prediction target: we want to predict the crop price
y = crop_df.Price_Per_Kg

# Choose the features (columns) used for prediction
crop_df.columns  # lists all columns of the DataFrame

crop_features = ['Dzongkhag', 'Gewog', 'Year', 'Season', 'Crop',
                 'CultivatedArea_Acre', 'Yeild_Kg']

X = crop_df[crop_features]
```
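The `unique()` check showed that `Season` and `Crop` each hold a single value, so after encoding they are constant columns. A decision tree can never split on a constant column. Keeping them does no harm, but they add nothing to the model. Below is a small sketch of dropping such columns, using the encoded rows from `head()`. `Year` is left out of this five-row sample because all five rows are from 2015, which is not true of the full data.

```python
import pandas as pd

# Encoded feature values of the first five rows from crop_df.head()
# (Year omitted: all five sample rows are from 2015)
X_sample = pd.DataFrame({
    'Dzongkhag': [1, 2, 3, 4, 5],
    'Gewog': [1, 2, 3, 4, 5],
    'Season': [1, 1, 1, 1, 1],
    'Crop': [1, 1, 1, 1, 1],
    'CultivatedArea_Acre': [50, 75, 100, 150, 40],
    'Yeild_Kg': [20000, 25000, 100000, 500000, 22000],
})

# Columns with a single distinct value cannot help the tree split the data
constant_cols = [col for col in X_sample.columns if X_sample[col].nunique() <= 1]
X_reduced = X_sample.drop(columns=constant_cols)

print('Dropped:', constant_cols)
print('Kept:', list(X_reduced.columns))
```

If more crops or seasons are added to the dataset later, these columns become informative and should be kept.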



## Building Model

We use a decision tree regressor from scikit-learn and fit it on the features and the target variable.

```python
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Split into training and validation data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Define the model. Setting random_state gives the same results on every run
crop_model = DecisionTreeRegressor(random_state=1)

# Fit the model
crop_model.fit(train_X, train_y)
```

```
DecisionTreeRegressor(random_state=1)
```
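`train_test_split` is called without `test_size`, so scikit-learn applies its default and holds out 25% of the rows for validation. With the 160 rows reported by `describe()`, that is 40 validation rows and 120 training rows. This matches the 40 validation prices in the output of the next step. The sketch below checks this arithmetic. It uses a placeholder 160-row frame whose values are made up; only the row count matters.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the crop dataset (160);
# the values are arbitrary and only used to show the split sizes
X_demo = pd.DataFrame({'feature': np.arange(160)})
y_demo = pd.Series(np.arange(160))

# No test_size given: scikit-learn holds out 25% of the rows by default
train_X, val_X, train_y, val_y = train_test_split(X_demo, y_demo, random_state=1)
print(len(train_X), len(val_X))  # 120 40
```

To use a different proportion, pass `test_size` explicitly, for example `test_size=0.2`.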

### Making Predictions

```python
# Show the actual validation prices
print(val_y)

# Make validation predictions and calculate the mean absolute error
val_predictions = crop_model.predict(val_X)
print("Predicted Price : ", val_predictions)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE: {:,.0f}".format(val_mae))
```

```
29     12
42     17
14     18
91     16
81     15
19     14
44     18
11     16
40     13
97     14
89     12
94     12
73     18
105    16
59     14
90     16
66     12
54     14
108    16
100    13
154    14
35     17
51     17
5      20
84     14
31     17
16     15
145    14
143    18
93     17
127    16
103    13
111    16
131    17
58     12
102    15
47     12
69     12
33     12
56     13
Name: Price_Per_Kg, dtype: int64
Predicted Price :  [16. 14. 17. 17. 18. 12. 16. 13. 16. 13. 13. 17. 17. 16. 12. 16. 14. 15.
 13. 12. 14. 13. 18. 15. 13. 18. 17. 13. 13. 12. 17. 13. 15. 14. 12. 18.
 18. 18. 18. 12.]
Validation MAE: 2
```
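Prices span only 12 to 20 per kg, with a standard deviation of about 2.05 (see `describe()` above). A validation MAE of about 2 therefore needs context. A simple sanity check compares it with a baseline that always predicts 15.0, the overall mean price reported by `describe()`. The sketch below copies the 40 actual and predicted prices from the output above. On these values, the tree's MAE is 2.275 and the constant baseline's is 1.875, which suggests the current model does not yet beat a trivial guess. The 15.0 mean is computed over all 160 rows, including the validation rows, so this baseline is slightly optimistic. A stricter version is scikit-learn's `DummyRegressor(strategy='mean')` fitted on `train_y`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Actual validation prices (val_y) and tree predictions, copied from the output above
actual = np.array([12, 17, 18, 16, 15, 14, 18, 16, 13, 14, 12, 12, 18, 16, 14, 16, 12, 14, 16, 13,
                   14, 17, 17, 20, 14, 17, 15, 14, 18, 17, 16, 13, 16, 17, 12, 15, 12, 12, 12, 13])
predicted = np.array([16, 14, 17, 17, 18, 12, 16, 13, 16, 13, 13, 17, 17, 16, 12, 16, 14, 15, 13, 12,
                      14, 13, 18, 15, 13, 18, 17, 13, 13, 12, 17, 13, 15, 14, 12, 18, 18, 18, 18, 12])

# Baseline: always predict the mean price of 15.0 reported by describe()
baseline = np.full_like(actual, 15.0, dtype=float)

tree_mae = mean_absolute_error(actual, predicted)
baseline_mae = mean_absolute_error(actual, baseline)
print(f"Decision tree MAE: {tree_mae:.3f}")      # 2.275
print(f"Baseline MAE:      {baseline_mae:.3f}")  # 1.875
```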


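## Saving the Model to a Pickle File for Later Use

The pickle file lets us load the trained model later and make predictions without retraining it. Other serialization options exist for different purposes. For example, scikit-learn's documentation suggests `joblib`, which is more efficient for models that hold large NumPy arrays. Only load pickle files from sources you trust, because unpickling can execute arbitrary code.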

```python
import pickle

# Save the trained model
with open("crop_model.pkl", "wb") as f:
    pickle.dump(crop_model, f)
```

## Reading Pickle File and Predicting


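```python
import pickle

import pandas as pd

# Load the trained model from the pickle file
with open("crop_model.pkl", "rb") as f:
    model = pickle.load(f)

# Provide data in this column order:
# ['Dzongkhag', 'Gewog', 'Year', 'Season', 'Crop', 'CultivatedArea_Acre', 'Yeild_Kg']
# Our test data: ['Sarpang', 'Gelephu', 2021, 'Winter', 'Potatoes', 50, 2000]
crop_features = ['Dzongkhag', 'Gewog', 'Year', 'Season', 'Crop',
                 'CultivatedArea_Acre', 'Yeild_Kg']
random_data = [11, 11, 2021, 1, 1, 50, 2000]

# Wrap the row in a DataFrame with the training column names so that
# scikit-learn does not warn about missing feature names
prediction = model.predict(pd.DataFrame([random_data], columns=crop_features))

print('Your Predicted Price is : ', prediction)
```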

```
Your Predicted Price is :  [15.]
```
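Looking up codes such as `11` for Sarpang by hand is error-prone. The sketch below shows a hypothetical helper, `encode_row`, that is not part of the original notebook. It rebuilds the 1-to-20 numbering from name lists kept in the same order as the mapping dictionaries. It then returns a one-row DataFrame in the column order the model expects. It assumes Season and Crop are always Winter and Potatoes, as in this dataset.

```python
import pandas as pd

# Same order as the notebook's mapping dictionaries, so codes run from 1 to 20
DZONGKHAGS = ['Samtse', 'Haa', 'Chhukha', 'Paro', 'Thimphu', 'Gasa', 'Punakha', 'Wangdue',
              'Dagana', 'Tsirang', 'Sarpang', 'Zhemgang', 'Trongsa', 'Bumthang', 'Lhuntse',
              'Mongar', 'Trashiyangtse', 'Trashigang', 'Pemagatshel', 'Samdrup Jongkhar']
GEWOGS = ['Norbugang', 'Uesu', 'Gedu', 'Shaba', 'Genekha', 'Lunana', 'Khuruthang', 'Bajo',
          'Drujeygang', 'Mendrelgang', 'Gelephu', 'Panbang', 'Karsong', 'Jakar', 'Shabling',
          'Gyelpoizhing', 'Taripe', 'Khaling', 'Yurung', 'Jomotshangkha']
DZONGKHAG_CODE = {name: i for i, name in enumerate(DZONGKHAGS, start=1)}
GEWOG_CODE = {name: i for i, name in enumerate(GEWOGS, start=1)}
FEATURES = ['Dzongkhag', 'Gewog', 'Year', 'Season', 'Crop', 'CultivatedArea_Acre', 'Yeild_Kg']


def encode_row(dzongkhag, gewog, year, area_acre, yield_kg):
    """Hypothetical helper: turn readable inputs into a one-row feature DataFrame."""
    row = [
        DZONGKHAG_CODE[dzongkhag],  # raises KeyError for unknown or misspelled names
        GEWOG_CODE[gewog],
        year,
        1,  # Season: Winter is the only season in the dataset
        1,  # Crop: Potatoes is the only crop in the dataset
        area_acre,
        yield_kg,
    ]
    return pd.DataFrame([row], columns=FEATURES)


new_row = encode_row('Sarpang', 'Gelephu', 2021, 50, 2000)
print(new_row.iloc[0].tolist())  # [11, 11, 2021, 1, 1, 50, 2000]
```

You can pass the result directly to `model.predict(new_row)`. Unlike `.map()`, a misspelled name fails loudly instead of becoming `NaN`.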


## Next Steps

This is a simple model built on a small dataset with a basic algorithm.

Test it on a real, larger dataset, and experiment with other algorithms to get better results.
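One way to experiment with other algorithms is to compare several regressors using k-fold cross-validation instead of a single train/validation split. The comparison should include a `DummyRegressor` baseline that always predicts the training mean. The sketch below uses `cross_val_score` with 5 folds and MAE scoring. The random data is only a stand-in with the same shape as the crop features (160 rows, 7 columns). Replace `X_demo` and `y_demo` with the real `X` and `y`. `RandomForestRegressor` is just one illustrative candidate, not a recommendation from this notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Stand-in data with the same shape as the crop features; replace with X and y
rng = np.random.default_rng(1)
X_demo = rng.integers(1, 21, size=(160, 7))
y_demo = rng.integers(12, 21, size=160)

models = {
    'Mean baseline': DummyRegressor(strategy='mean'),
    'Decision tree': DecisionTreeRegressor(random_state=1),
    'Random forest': RandomForestRegressor(n_estimators=100, random_state=1),
}

results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so MAE is reported as a negative number
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='neg_mean_absolute_error')
    results[name] = -scores.mean()
    print(f"{name}: MAE {results[name]:.2f}")
```

A model is only worth keeping if its cross-validated MAE is clearly lower than the mean baseline's.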