# Domain Understanding : SWIR and NDVI

Normalized Difference Vegetation Index (NDVI) quantifies vegetation by measuring the difference between near-infrared (which vegetation strongly reflects) and red light (which vegetation absorbs).

NDVI always ranges from -1 to +1:

- Negative values: most likely water.
- Values close to +1: dense green leaves.
- Values close to zero: no green leaves; could even be an urbanized area.

To calculate NDVI, combine the NIR and red channels as follows:
NDVI = (NIR-Red) / (NIR+Red)
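
As a minimal sketch of the formula, the snippet below computes NDVI with NumPy. The function name `ndvi` and the zero-denominator guard are illustrative choices, not part of the original notebook. The sample reflectances are the `3_B8:NIR` and `3_B4:Red` values of the first row in the data preview.

```python
import numpy as np

def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), element-wise."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Guard against division by zero: pixels with no signal become NaN.
    safe_total = np.where(total == 0, np.nan, total)
    return (nir - red) / safe_total

# First preview row, third acquisition: NIR = 4360, Red = 635
print(round(float(ndvi(4360, 635)), 6))
```

The result agrees with the `NDVI 3` value stored for that row (0.745746), which suggests the NDVI columns in the dataset were derived from the `B8:NIR` and `B4:Red` bands of the same acquisition.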

Near-infrared (NIR) sensors are extremely important for ecology because healthy plants reflect NIR strongly: the internal cell structure of their leaves scatters these wavelengths back toward the sky. NIR can be used for vegetation monitoring, crop-stress detection, and similar tasks. Comparing it with other bands yields indices such as NDVI, which let us measure plant health more precisely than visible greenness alone. NIR sensors, however, tell us little about geology or rock types.

SWIR (short-wave infrared) sensors are useful for telling wet earth from dry earth, and for geology: rocks and soils that look similar in other bands often contrast strongly in SWIR. SWIR discriminates the moisture content of soil and vegetation and can penetrate thin clouds.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import xgboost as xgb
warnings.filterwarnings("ignore")
warnings.filterwarnings(module='sklearn*', action='ignore', category=DeprecationWarning)

from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [3]:
dataframe  = pd.read_csv('Crop Classification 2.csv', encoding = 'utf-8')
print(dataframe.shape)
dataframe.head()

(18834, 35)


Unnamed: 0,UID,Crop Name,B11:SWIR,B12:SWIR,B2:Blue,B3:Green,B4:Red,B5:Vegetation red edge,B6:Vegetation red edge,B7:Vegetation red edge,...,3_B3:Green,3_B4:Red,3_B5:Vegetation red edge,3_B6:Vegetation red edge,3_B7:Vegetation red edge,3_B8:NIR,3_B8A:Narrow NIR,NDVI 1,NDVI 2,NDVI 3
0,1,Wheat,2055,1257,1080,1017,644,1276,2898,3431,...,1011,635,1224,3121,4009,4360,4338,0.713714,0.753436,0.745746
1,2,Potato,1786,1026,1058,1031,738,1202,2818,3390,...,1093,696,1225,4009,5141,5153,5381,0.638678,0.756262,0.762011
2,3,Rapeseed,1415,657,1052,1138,695,1328,4092,4921,...,1187,738,1356,3917,4750,4566,4942,0.743873,0.714617,0.721719
3,4,Wheat,2041,1229,1097,1099,796,1324,2940,3619,...,1039,680,1140,3222,4159,4655,4395,0.668954,0.756324,0.74508
4,5,Rapeseed,1944,1272,1090,1166,825,1490,3533,4194,...,1083,725,1339,2946,3576,3930,3852,0.717804,0.651241,0.688507


## Data Preprocessing

In [4]:
pd.isnull(dataframe).sum()

UID                         0
Crop Name                   0
B11:SWIR                    0
B12:SWIR                    0
B2:Blue                     0
B3:Green                    0
B4:Red                      0
B5:Vegetation red edge      0
B6:Vegetation red edge      0
B7:Vegetation red edge      0
B8:NIR                      0
B8A:Narrow NIR              0
2_B11:SWIR                  0
2_B12:SWIR                  0
2_B2:Blue                   0
2_B3:Green                  0
2_B4:Red                    0
2_B5:Vegetation red edge    0
2_B6:Vegetation red edge    0
2_B7:Vegetation red edge    0
2_B8:NIR                    0
2_B8A:Narrow NIR            0
3_B11:SWIR                  0
3_B12:SWIR                  0
3_B2:Blue                   0
3_B3:Green                  0
3_B4:Red                    0
3_B5:Vegetation red edge    0
3_B6:Vegetation red edge    0
3_B7:Vegetation red edge    0
3_B8:NIR                    0
3_B8A:Narrow NIR            0
NDVI 1                      0
NDVI 2    

=========================================================================================================

=> There are no null values in the dataset.
Hence, we do not need to perform data imputation.

As part of data preprocessing, we will scale the features before model building.

=========================================================================================================
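
Standardization rescales each feature to zero mean and unit variance, using statistics learned from the training set only. The sketch below reproduces that idea in plain NumPy. `standardize` is a hypothetical helper written for illustration, and the toy values are the `B11:SWIR` and `B4:Red` entries of the first preview rows. The notebook itself relies on `StandardScaler`.

```python
import numpy as np

def standardize(train, test):
    """Scale both arrays with the mean and std of the training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)      # population std (ddof=0), which StandardScaler also uses
    std[std == 0] = 1.0          # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std

# Toy split: three "training" rows and one "test" row
train = np.array([[2055.0, 644.0], [1786.0, 738.0], [1415.0, 695.0]])
test = np.array([[2041.0, 796.0]])
train_scaled, test_scaled = standardize(train, test)
print(train_scaled.std(axis=0))
print(test_scaled)
```

Reusing the training statistics on the test rows keeps information about the test set out of the preprocessing step.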

In [5]:
# Checking Crop wise Data frequency 
print('Crop record frequency :')
print(dataframe['Crop Name'].value_counts(normalize=True))
print(' ')
print('Crop record count :')
print(dataframe['Crop Name'].value_counts())

Crop record frequency :
Wheat        0.304874
Cumin        0.248009
Jowar        0.125518
Castor       0.082617
Maize        0.079696
Rapeseed     0.074652
Gram         0.056387
Fennel       0.019911
Sugarcane    0.005044
Tobacco      0.002336
Potato       0.000956
Name: Crop Name, dtype: float64
 
Crop record count :
Wheat        5742
Cumin        4671
Jowar        2364
Castor       1556
Maize        1501
Rapeseed     1406
Gram         1062
Fennel        375
Sugarcane      95
Tobacco        44
Potato         18
Name: Crop Name, dtype: int64


# Approach 1

## Step 1 - Dropping Crop Categories that have less than 5 % frequency

=========================================================================================================

=> Very few records of these crop categories are present in the dataset, so we cannot learn reliable decision rules for them.

Moreover, these records can act as noise and hurt the detection of the major crop categories.

Hence, it seems appropriate to remove them from the dataset.
We will see later how the models react when these records are present.

=========================================================================================================

In [6]:
# Remove crop categories with less than 5 % frequency

rare_crops = ['Potato', 'Tobacco', 'Sugarcane', 'Fennel']
dataframe_5per = dataframe[~dataframe['Crop Name'].isin(rare_crops)]

print('Crop record frequency')
print(dataframe_5per['Crop Name'].value_counts(normalize=True))

Crop record frequency
Wheat       0.313736
Cumin       0.255218
Jowar       0.129166
Castor      0.085018
Maize       0.082013
Rapeseed    0.076822
Gram        0.058026
Name: Crop Name, dtype: float64
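
The category names above are picked by hand from the frequency table. A threshold-based filter is one way to generalize this. The sketch below is an assumption about how that could look: `drop_rare_classes` is a hypothetical helper, and the toy counts are made up.

```python
import pandas as pd

def drop_rare_classes(df, label_col, min_freq):
    """Keep only rows whose class makes up at least `min_freq` of the data."""
    freq = df[label_col].value_counts(normalize=True)
    keep = freq[freq >= min_freq].index
    return df[df[label_col].isin(keep)]

# Made-up class counts, loosely imitating the imbalance in the crop data
counts = {'Wheat': 57, 'Cumin': 47, 'Jowar': 24, 'Gram': 11, 'Fennel': 4, 'Potato': 1}
toy = pd.DataFrame({'Crop Name': [c for c, n in counts.items() for _ in range(n)]})

filtered = drop_rare_classes(toy, 'Crop Name', min_freq=0.05)
print(sorted(filtered['Crop Name'].unique()))
```

With a helper like this, the 5 % and 7 % variants differ only in the `min_freq` argument.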


In [7]:
# Splitting Independent and Dependent Variables
x_5per = dataframe_5per.iloc[:, 2:35].values
y_5per = dataframe_5per.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(x_5per, y_5per, test_size=0.2, random_state=0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [8]:
# DecisionTreeClassifier

#Loop to identify optimum tree_depth
#for i in range (7, 13):
    
dtree_model = DecisionTreeClassifier(max_depth=10).fit(X_train, y_train)   #Optimum Value for max_depth : 10
dtree_predictions = dtree_model.predict(X_test)

print('Accuracy : ', accuracy_score(y_test, dtree_predictions))

('Accuracy : ', 0.5544933078393881)
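
The commented-out loop appears to pick `max_depth` by checking accuracy for each candidate value. One alternative, sketched here under the assumption that scikit-learn's `GridSearchCV` is acceptable, is to cross-validate on the training data so the test set is not used for tuning. The synthetic dataset is only a stand-in for the crop features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crop data (illustrative only)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={'max_depth': list(range(7, 13))},  # same range as the commented loop
    cv=3,
    scoring='accuracy',
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern would apply to the random forest and XGBoost parameter grids further down.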


In [9]:
# RandomForestClassifier

#Loop to identify optimum values of estimators + tree_depth
#i_array = [130, 140, 150, 160]
#j_array = [40, 45, 50]

#for i in i_array:
#    print(' ')
#    for j in j_array:
rf_clf = RandomForestClassifier(n_estimators=140, max_depth=45, random_state=0) #Optimum Values: estimators=140 & max_depth=45
rf_clf.fit(X_train, y_train)
rf_predictions = rf_clf.predict(X_test)
   
print('Accuracy : ', accuracy_score(y_test, rf_predictions))

('Accuracy : ', 0.6309751434034416)


In [10]:
# XGBoost Classifier
# Loop to identify optimum values of estimators + learning_rate
#i_array = [100, 500, 1000]
#j_array = [0.08, 0.09, 0.1]

#for i in i_array:
#    print(' ')
#    for j in j_array:
xg_cl = xgb.XGBClassifier(objective= "multi:softprob", 
                          n_estimators=1000, 
                          learning_rate=0.1, 
                          seed=123,
                          max_depth=7)

xg_cl.fit(X_train, y_train)

preds = xg_cl.predict(X_test)
        
print('Accuracy : ', accuracy_score(y_test, preds))

('Accuracy : ', 0.6724938541382136)


=========================================================================================================

=> Dataset with Crop categories greater than 5 % frequency

Decision Tree Classifier : 55.44 %
Random Forest Classifier : 63.09 %
XGBoost Classifier : 67.24 %

RandomForestClassifier (ensemble learning) and the XGBoost classifier clearly perform better than DecisionTreeClassifier.

Hence, we will use RandomForestClassifier and the XGBoost classifier from now on.

=========================================================================================================

## Step 2 - Dropping Crop Categories that have less than 7 % frequency

In [11]:
# Remove Crop categories with less than 7 % frequency

print('Crop record frequency :')
print(dataframe['Crop Name'].value_counts(normalize=True))

dataframe_7per = dataframe_5per[dataframe_5per['Crop Name'] != 'Gram']

Crop record frequency :
Wheat        0.304874
Cumin        0.248009
Jowar        0.125518
Castor       0.082617
Maize        0.079696
Rapeseed     0.074652
Gram         0.056387
Fennel       0.019911
Sugarcane    0.005044
Tobacco      0.002336
Potato       0.000956
Name: Crop Name, dtype: float64


In [12]:
# Splitting Independent and Dependent Variables

x_7per = dataframe_7per.iloc[:, 2:35].values
y_7per = dataframe_7per.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(x_7per, y_7per, test_size=0.2, random_state=0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [13]:
# RandomForestClassifier

rf_clf = RandomForestClassifier(n_estimators=180, max_depth=45, random_state=0) #Optimum Values : estimators=180 & max_depth=45
rf_clf.fit(X_train, y_train)
rf_predictions = rf_clf.predict(X_test)
    
print('Accuracy : ', accuracy_score(y_test, rf_predictions))

('Accuracy : ', 0.673723897911833)


In [14]:
xg_cl = xgb.XGBClassifier(objective= "multi:softprob", 
                                  n_estimators=1000, 
                                  learning_rate=0.1, 
                                  seed=123,
                                  max_depth=7)

xg_cl.fit(X_train, y_train)

preds = xg_cl.predict(X_test)
        
print('Accuracy : ', accuracy_score(y_test, preds))

('Accuracy : ', 0.7033062645011601)


In [15]:
# Creates a confusion matrix
cm = confusion_matrix(y_test, preds)

# Transform to df for easier plotting
cm_df = pd.DataFrame(cm)

cm_df.style.background_gradient(cmap='coolwarm')

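|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 187 | 34 | 15 | 11 | 4 | 59 |
| 1 | 12 | 743 | 37 | 15 | 2 | 127 |
| 2 | 16 | 90 | 241 | 31 | 2 | 83 |
| 3 | 14 | 53 | 51 | 77 | 3 | 75 |
| 4 | 15 | 18 | 0 | 4 | 240 | 16 |
| 5 | 15 | 149 | 51 | 15 | 6 | 937 |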
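
A single accuracy figure hides how unevenly the classes are recognized. As a sketch, per-class recall and precision can be read straight off the confusion matrix. Rows are true classes and columns are predicted classes, following scikit-learn's convention. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the XGBoost run above (rows: true class, columns: predicted)
cm = np.array([
    [187,  34,  15, 11,   4,  59],
    [ 12, 743,  37, 15,   2, 127],
    [ 16,  90, 241, 31,   2,  83],
    [ 14,  53,  51, 77,   3,  75],
    [ 15,  18,   0,  4, 240,  16],
    [ 15, 149,  51, 15,   6, 937],
])

accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)      # share of each true class predicted correctly
precision = np.diag(cm) / cm.sum(axis=0)   # share of each predicted class that is correct

print(round(accuracy, 4))
print(recall.round(3))
print(precision.round(3))
```

The trace-based accuracy reproduces the reported 0.7033. Class index 3 has by far the lowest recall (77 of 273 samples). Assuming `confusion_matrix` was called without an explicit `labels` argument, classes are ordered by sorted label, so index 3 would correspond to Maize.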


=========================================================================================================

=> Dataset with Crop categories greater than 7 % frequency

Random Forest Classifier : 67.37 %
XGBoost Classifier : 70.33 %

=========================================================================================================

=========================================================================================================

After dropping the low-frequency crop categories, accuracy on the remaining categories increased.

=> RandomForestClassifier accuracy increased from 63.09 % to 67.37 %

=> XGBoost Classifier accuracy increased from 67.24 % to 70.33 %

However, dropping even more crop categories just to achieve higher accuracy is not advisable.


So, on the records with greater than 7 % frequency, let us apply feature selection.

=========================================================================================================

## Step 3 - Feature Selection - Dropping Highly Correlated Columns

In [16]:
# Correlation Matrix (numeric columns only; 'Crop Name' is a text label and is left out)

df_Corr = dataframe_7per[['B11:SWIR', 'B12:SWIR', 'B2:Blue', 'B3:Green', 'B4:Red', 'B5:Vegetation red edge', 
                          'B6:Vegetation red edge', 'B7:Vegetation red edge', 'B8:NIR', 'B8A:Narrow NIR', 'NDVI 1']]

corr = df_Corr.corr()

corr.style.background_gradient(cmap='coolwarm')

Unnamed: 0,B11:SWIR,B12:SWIR,B2:Blue,B3:Green,B4:Red,B5:Vegetation red edge,B6:Vegetation red edge,B7:Vegetation red edge,B8:NIR,B8A:Narrow NIR,NDVI 1
B11:SWIR,1.0,0.933718,0.583836,0.732848,0.757565,0.829988,0.136881,0.0485603,0.0546999,0.0742981,-0.366222
B12:SWIR,0.933718,1.0,0.679042,0.747001,0.856879,0.826637,-0.139019,-0.231784,-0.220184,-0.210609,-0.617325
B2:Blue,0.583836,0.679042,1.0,0.913556,0.878879,0.804541,-0.109045,-0.206904,-0.216806,-0.209375,-0.646406
B3:Green,0.732848,0.747001,0.913556,1.0,0.898925,0.930595,0.173692,0.0611205,0.0506569,0.0551443,-0.47939
B4:Red,0.757565,0.856879,0.878879,0.898925,1.0,0.91099,-0.164414,-0.276596,-0.29488,-0.278074,-0.761349
B5:Vegetation red edge,0.829988,0.826637,0.804541,0.930595,0.91099,1.0,0.171763,0.0474927,0.0363038,0.0464441,-0.487681
B6:Vegetation red edge,0.136881,-0.139019,-0.109045,0.173692,-0.164414,0.171763,1.0,0.985219,0.958007,0.978884,0.718575
B7:Vegetation red edge,0.0485603,-0.231784,-0.206904,0.0611205,-0.276596,0.0474927,0.985219,1.0,0.974117,0.993979,0.791058
B8:NIR,0.0546999,-0.220184,-0.216806,0.0506569,-0.29488,0.0363038,0.958007,0.974117,1.0,0.975262,0.816513
B8A:Narrow NIR,0.0742981,-0.210609,-0.209375,0.0551443,-0.278074,0.0464441,0.978884,0.993979,0.975262,1.0,0.79407


=========================================================================================================

=> Conclusions from the correlation matrix:

(1) 'B11:SWIR' and 'B12:SWIR' are highly correlated.
(2) 'B6:Vegetation red edge', 'B7:Vegetation red edge', 'B8:NIR' and 'B8A:Narrow NIR' are highly correlated.

Let's keep => 'B11:SWIR' and 'B8:NIR' (as NDVI is calculated from NIR and Red).
And drop => 'B12:SWIR', 'B6:Vegetation red edge', 'B7:Vegetation red edge' and 'B8A:Narrow NIR'.

=========================================================================================================
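
The groups above were read off the heat map by eye. As a sketch, highly correlated pairs can also be listed programmatically. `correlated_pairs` and the 0.95 threshold are assumptions made for illustration, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def correlated_pairs(df, threshold=0.95):
    """List column pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # Masked (NaN) entries, if any remain, fail the comparison and are dropped here.
    return pairs[pairs > threshold].sort_values(ascending=False)

# Synthetic bands: 'b' nearly duplicates 'a', 'c' is independent (illustrative only)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.05, size=200),
    'c': rng.normal(size=200),
})
result = correlated_pairs(toy)
print(result)
```

From each reported pair, one column can be kept and the other dropped, which is what the manual selection above does.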

In [17]:
useless= ['B12:SWIR', 'B6:Vegetation red edge', 'B7:Vegetation red edge', 'B8A:Narrow NIR', 
          '2_B12:SWIR', '2_B6:Vegetation red edge', '2_B7:Vegetation red edge', '2_B8A:Narrow NIR',
          '3_B12:SWIR', '3_B6:Vegetation red edge', '3_B7:Vegetation red edge', '3_B8A:Narrow NIR']

dataframe2 = dataframe_7per.copy()

dataframe2.head()

dataframe2= dataframe2.drop(useless, axis = 1)
print(len(dataframe2.columns))
print(dataframe2.columns)

23
Index([u'UID', u'Crop Name', u'B11:SWIR', u'B2:Blue', u'B3:Green', u'B4:Red',
       u'B5:Vegetation red edge', u'B8:NIR', u'2_B11:SWIR', u'2_B2:Blue',
       u'2_B3:Green', u'2_B4:Red', u'2_B5:Vegetation red edge', u'2_B8:NIR',
       u'3_B11:SWIR', u'3_B2:Blue', u'3_B3:Green', u'3_B4:Red',
       u'3_B5:Vegetation red edge', u'3_B8:NIR', u'NDVI 1', u'NDVI 2',
       u'NDVI 3'],
      dtype='object')


In [18]:
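# Splitting Independent and Dependent Variables
# Note: columns 2..19 are the 18 remaining band columns. This slice stops before
# 'NDVI 1', 'NDVI 2' and 'NDVI 3' (columns 20..22), so NDVI is not used in this run.
x = dataframe2.iloc[:, 2:20].values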
y = dataframe2.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [19]:
# RandomForestClassifier
#Optimum Values : estimators=180 & max_depth=45

rf_clf = RandomForestClassifier(n_estimators=180, max_depth=45, random_state=0)   
rf_clf.fit(X_train, y_train)
rf_predictions = rf_clf.predict(X_test)
    
print('Accuracy : ', accuracy_score(y_test, rf_predictions))

('Accuracy : ', 0.6661832946635731)


In [20]:
xg_cl = xgb.XGBClassifier(objective= "multi:softprob", 
                          n_estimators=1000, 
                          learning_rate=0.1, 
                          seed=123,
                          max_depth=7)

xg_cl.fit(X_train, y_train)

preds = xg_cl.predict(X_test)
        
print('Accuracy : ', accuracy_score(y_test, preds))

('Accuracy : ', 0.6844547563805105)


=========================================================================================================

=> Dataset containing records of Crops with more than 7 % frequency :

Without Feature Selection :

(1) Random Forest Classifier : 67.37 %

(2) XGBoost Classifier : 70.33 %
                            
With Feature Selection : 

(1) Random Forest Classifier : 66.61 %

(2) XGBoost Classifier : 68.44 %                            


=> Accuracy decreases by 1-2 % after we apply feature selection.

Two things changed at once in this run, though: the correlated bands were dropped, and the slice `iloc[:, 2:20]` in In [18] also leaves out the three NDVI columns. Either change may explain the drop, so for this dataset we keep all columns, correlated or not.

Let's try Principal Component Analysis to check whether it gives better model performance.

=========================================================================================================

# Approach 2

## Principal Component Analysis - Dimensionality Reduction

## [ Considering whole Dataset ]

=========================================================================================================

The main idea of principal component analysis (PCA) is to reduce the dimensionality of a dataset made up of many correlated variables while retaining as much of its variation as possible.

It does this by transforming the variables into a new set of variables, the principal components (PCs), which are orthogonal and ordered so that each successive component retains less of the variation present in the original variables.

=========================================================================================================
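
To make the description concrete, here is a minimal NumPy sketch of PCA via the singular value decomposition. It is not how scikit-learn's `PCA` class is invoked in this notebook. `pca_svd` is a hypothetical helper and the data is synthetic.

```python
import numpy as np

def pca_svd(X):
    """Return principal axes (rows) and explained-variance ratios, largest first."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2 / (len(X) - 1)              # variance along each principal axis
    return vt, variance / variance.sum()

# Two strongly correlated synthetic features plus one independent one
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([
    base,
    2 * base + rng.normal(scale=0.1, size=500),
    rng.normal(size=500),
])
components, ratio = pca_svd(X)
print(ratio.round(3))
print(np.allclose(components @ components.T, np.eye(3)))  # axes are orthonormal
```

Because the first two features move together, most of the variance ends up in the first component, and the ratios come out in decreasing order.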

In [21]:
# Splitting Independent and Dependent Variables
X = dataframe.iloc[:, 2:35].values
y = dataframe.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [22]:
# Creating PCA Object
from sklearn.decomposition import PCA
pca = PCA(n_components = None)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
explained_variance = pca.explained_variance_ratio_
print(len(list(explained_variance)))

print(sum(explained_variance[0:10]))    #first 10 PCs explain 98.68 % of the variance

list(explained_variance)

33
0.9868793836973294


[0.31495056317663034,
 0.26232690901276023,
 0.16709838123403856,
 0.10437887600801797,
 0.07473423190795096,
 0.02604707825000772,
 0.019197566855010353,
 0.00805696128451673,
 0.005547383236222875,
 0.004541432732173742,
 0.0025449510221778864,
 0.0015505720527917424,
 0.0014219370064909977,
 0.0012959915256157528,
 0.00113973699906862,
 0.0008008648682779804,
 0.000736989473076751,
 0.0006631124198804289,
 0.0005642043107659235,
 0.0003746760560584836,
 0.0003117750157203712,
 0.00029068536332230236,
 0.0002575936294467947,
 0.00022372528711290458,
 0.0001980328087143595,
 0.00017374344572193274,
 0.00012667132768238136,
 0.00012038321740322912,
 0.00010452236565874256,
 8.61063119101056e-05,
 6.74522078756977e-05,
 4.084865545894498e-05,
 2.6040932438214122e-05]

=========================================================================================================

Here, the first 10 PCs explain 98.68 % of the variance.

Let's use the first 10 PCs to train the models.

=========================================================================================================
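
The choice of 10 components can be checked from the cumulative explained variance. The sketch below copies the first ten ratios from the output above. `components_needed` is a hypothetical helper, and the 95 % target is only an example threshold.

```python
import numpy as np

# First ten explained-variance ratios, copied from the output above
ratios = np.array([
    0.31495056317663034, 0.26232690901276023, 0.16709838123403856,
    0.10437887600801797, 0.07473423190795096, 0.02604707825000772,
    0.019197566855010353, 0.00805696128451673, 0.005547383236222875,
    0.004541432732173742,
])
cumulative = np.cumsum(ratios)

def components_needed(cumulative, target):
    """Smallest number of leading PCs whose cumulative ratio reaches `target`."""
    return int(np.searchsorted(cumulative, target) + 1)

print(cumulative.round(4))
print(components_needed(cumulative, 0.95))
```

The last cumulative value reproduces the 98.68 % quoted above. A 95 % target would already be met with 7 components, so a smaller number than 10 could also be tried.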

In [23]:
# Keep the first 10 principal components (the inputs were already rotated by the full PCA above)
pca = PCA(n_components = 10)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

# RandomForestClassifier
#Optimum Values : estimators=180 & max_depth=45

rf_clf = RandomForestClassifier(n_estimators=180, max_depth=45, random_state=0)
rf_clf.fit(X_train, y_train)
rf_predictions = rf_clf.predict(X_test)
    
print('Accuracy : ', accuracy_score(y_test, rf_predictions))

('Accuracy : ', 0.575789753119193)


In [24]:
xg_cl = xgb.XGBClassifier(objective= "multi:softprob", 
                                  n_estimators=1000, 
                                  learning_rate=0.1, 
                                  seed=123,
                                  max_depth=7)

xg_cl.fit(X_train, y_train)

preds = xg_cl.predict(X_test)
        
print('Accuracy : ', accuracy_score(y_test, preds))

('Accuracy : ', 0.5694186355189806)


=========================================================================================================

We retained 98.68 % of the dataset's variance, yet PCA gives us only about 57 % accuracy.

(1) Random Forest Classifier : 57.57 %

(2) XGBoost Classifier : 56.94 %


Let's treat the imbalanced dataset with "SMOTE".

=========================================================================================================

# Approach 3

## SMOTE - Synthetic Minority Oversampling Technique

=========================================================================================================

SMOTE synthetically generates new minority instances between existing instances.

The new instances are not mere copies of existing minority cases. Instead, for each selected minority sample the algorithm looks up its nearest neighbours of the same class in feature space and generates a new instance that combines the features of the sample with those of one of its neighbours.

This approach increases the number of samples available to the minority class and makes its decision region more general. SMOTE takes the entire dataset as input, but it increases the share of the minority cases only.

=========================================================================================================
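
The interpolation step can be sketched in a few lines of NumPy. This is a simplified illustration of the idea, not the imbalanced-learn implementation. `smote_like`, its parameters and the sample points are all made up for this example.

```python
import numpy as np

def smote_like(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic rows by interpolating towards near neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))        # random minority sample
        nb = neighbours[base, rng.integers(k)]    # one of its k nearest neighbours
        gap = rng.random()                        # position along the segment, in [0, 1)
        synthetic[i] = minority[base] + gap * (minority[nb] - minority[base])
    return synthetic

# Five made-up minority samples in a 2-D feature space
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.8]])
new_rows = smote_like(minority, n_new=10)
print(new_rows.shape)
```

Every synthetic row lies on a line segment between two real minority samples, which is why the new points stay inside the region the class already occupies.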

In [25]:
# Splitting Independent and Dependent Variables
X = dataframe.iloc[:, 2:35].values
y = dataframe.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, stratify = y, random_state = 0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train.shape)

(15067L, 33L)


In [26]:
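from imblearn.over_sampling import SMOTE
# Oversample only the smallest class. `sampling_strategy` and `fit_resample` are the
# current imbalanced-learn names for the older `ratio` argument and `fit_sample` method.
smote = SMOTE(sampling_strategy='minority')

x_sm, y_sm = smote.fit_resample(X_train, y_train)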
print(x_sm.shape)

(19646L, 33L)


=========================================================================================================

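SMOTE added 4,579 synthetic records (19,646 - 15,067) to the training set. With the 'minority' strategy, only the smallest crop category (Potato in this dataset) is oversampled.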

=========================================================================================================

In [27]:
# RandomForestClassifier
#Optimum Values identified for estimators = 140 and max_depth = 45

#class_weight for improving model performance

rf_clf = RandomForestClassifier(n_estimators=140, max_depth=45, random_state=0, class_weight = 'balanced')
rf_clf.fit(x_sm, y_sm)
rf_predictions = rf_clf.predict(X_test)
    
print('Accuracy : ', accuracy_score(y_test, rf_predictions))

('Accuracy : ', 0.6179984072206)


## XGBoost Classifier

In [28]:

xg_cl = xgb.XGBClassifier(objective= "multi:softprob", 
                          n_estimators=1000, 
                          learning_rate=0.1, 
                          seed=123,
                          max_depth=7)

xg_cl.fit(x_sm, y_sm)

preds = xg_cl.predict(X_test)
        
print('Accuracy : ', accuracy_score(y_test, preds))

('Accuracy : ', 0.64719936288824)


=========================================================================================================

SMOTE gives us accuracy of : 
    
(1) Random Forest Classifier : 61.79 %
    
(2) XGBoost Classifier : 64.71 %

=========================================================================================================

# Combined Approach

=========================================================================================================

Let's combine the strengths of the approaches above:

(1) Dataset with greater than 7% crop category frequencies

(2) All columns of Dataset

(3) SMOTE

(4) XGBoost

=========================================================================================================

In [29]:
# Splitting Independent and Dependent Variables
x_7per = dataframe_7per.iloc[:, 2:35].values
y_7per = dataframe_7per.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(x_7per, y_7per, test_size=0.2, random_state=0)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#SMOTE (current imbalanced-learn API: `sampling_strategy`, `fit_resample`)
from imblearn.over_sampling import SMOTE
smote = SMOTE(sampling_strategy='minority')
x_sm, y_sm = smote.fit_resample(X_train, y_train)

#XGB Classifier
xg_cl = xgb.XGBClassifier(objective= "multi:softprob", 
                                  n_estimators=1000, 
                                  learning_rate=0.1, 
                                  seed=123,
                                  max_depth=8)

xg_cl.fit(x_sm, y_sm)

preds = xg_cl.predict(X_test)
        
print('Accuracy : ', accuracy_score(y_test, preds))

('Accuracy : ', 0.7093967517401392)


=========================================================================================================

That is roughly 71 % accuracy.

I tuned the model parameters as far as I could.

This is the best result I obtained.

=========================================================================================================