# **Crop type recommendation**

We will build a support vector machine (SVM) model that recommends a crop type based on soil and environmental information. The model will then be deployed to production using Google Cloud and Streamlit.

## **Dataset:**

The dataset is called "Crop Recommendation Dataset" and is publicly available on Kaggle. You can find it at the following link:

[Dataset Kaggle](https://www.kaggle.com/atharvaingle/crop-recommendation-dataset?select=Crop_recommendation.csv)

## **Independent variables:**

*   Nitrogen
*   Phosphorus
*   Potassium
*   Temperature
*   Humidity
*   pH
*   Rainfall


# 1. Installing scikit-learn

In [85]:
!pip install scikit-learn==0.24.1
!pip install xlrd

Collecting xlrd
  Downloading xlrd-2.0.1-py2.py3-none-any.whl (96 kB)
Installing collected packages: xlrd
Successfully installed xlrd-2.0.1


# 2. Importing libraries

In [75]:
import numpy as np
from sklearn import svm
import pandas as pd
import matplotlib.pyplot as plt
import pickle
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import sklearn
import seaborn as sns
sklearn.__version__

'0.24.1'

In [76]:
!python3 --version

Python 3.9.2


# 3. Loading the dataset

In [86]:
df = pd.read_excel('./data.xls')
df.head(10)

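|   | STG | SCG | STR | LPR | PEG | UNS |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | very_low |
| 1 | 0.08 | 0.08 | 0.1 | 0.24 | 0.9 | High |
| 2 | 0.06 | 0.06 | 0.05 | 0.25 | 0.33 | Low |
| 3 | 0.1 | 0.1 | 0.15 | 0.65 | 0.3 | Middle |
| 4 | 0.08 | 0.08 | 0.08 | 0.98 | 0.24 | Low |
| 5 | 0.09 | 0.15 | 0.4 | 0.1 | 0.66 | Middle |
| 6 | 0.1 | 0.1 | 0.43 | 0.29 | 0.56 | Middle |
| 7 | 0.15 | 0.02 | 0.34 | 0.4 | 0.01 | very_low |
| 8 | 0.2 | 0.14 | 0.35 | 0.72 | 0.25 | Low |
| 9 | 0.0 | 0.0 | 0.5 | 0.2 | 0.85 | High |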


In [87]:
df.describe()

|   | STG | SCG | STR | LPR | PEG |
|---|---|---|---|---|---|
| count | 403.0 | 403.0 | 403.0 | 403.0 | 403.0 |
| mean | 0.353141 | 0.35594 | 0.457655 | 0.431342 | 0.45636 |
| std | 0.212018 | 0.215531 | 0.246684 | 0.257545 | 0.266775 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.2 | 0.2 | 0.265 | 0.25 | 0.25 |
| 50% | 0.3 | 0.3 | 0.44 | 0.33 | 0.4 |
| 75% | 0.48 | 0.51 | 0.68 | 0.65 | 0.66 |
| max | 0.99 | 0.9 | 0.95 | 0.99 | 0.99 |


In [89]:
df.columns = [each.strip() for each in df.columns]
df.columns

Index(['STG', 'SCG', 'STR', 'LPR', 'PEG', 'UNS'], dtype='object')

In [90]:
df.UNS.value_counts()

Low         129
Middle      122
High        102
Very Low     26
very_low     24
Name: UNS, dtype: int64

In [91]:
df.UNS = [each.lower().replace("very low","very_low") for each in df.UNS]
df.UNS.value_counts()

low         129
middle      122
high        102
very_low     50
Name: UNS, dtype: int64
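The label cleanup above can also be written with pandas' vectorized string methods. The following is a minimal sketch on a hypothetical Series (the values are made up to mimic the inconsistent spellings, they are not taken from the dataset). Note that it replaces every space with an underscore, which is slightly broader than the `"very low"` replacement used in the cell above.

```python
import pandas as pd

# Hypothetical labels mimicking the inconsistent spellings seen above
labels = pd.Series(["Low", "Very Low", "very_low", "High", "Middle", "Very Low"])

# Trim whitespace, lowercase, and unify the separator in one vectorized pass
normalized = (
    labels.str.strip()
          .str.lower()
          .str.replace(" ", "_", regex=False)
)

# Each class should now appear under a single spelling
print(normalized.value_counts())
```

Checking `value_counts()` again after the cleanup, as the notebook does, is a cheap way to confirm that no variant spelling survived.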

In [94]:
Y = df.UNS
X = df.drop(["UNS"],axis= 1)

In [95]:
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier,VotingClassifier
from sklearn.metrics import confusion_matrix,classification_report
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.metrics import accuracy_score

In [96]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.8, test_size=0.2, random_state=100)

In [97]:
X_train.shape

(322, 5)

In [98]:
pd.Series(y_train).value_counts(normalize = True)

low         0.313665
middle      0.304348
high        0.260870
very_low    0.121118
Name: UNS, dtype: float64
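Because the classes are imbalanced, one option is to ask `train_test_split` to preserve the class proportions in both subsets through its `stratify` parameter. The sketch below uses synthetic stand-in data (the names `X_demo` and `y_demo` and the class counts are assumptions for illustration, not the notebook's data) and is not what the split above does.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 rows, same column names as the notebook
X_demo = pd.DataFrame(rng.random((200, 5)), columns=["STG", "SCG", "STR", "LPR", "PEG"])
y_demo = pd.Series(
    ["low"] * 80 + ["middle"] * 60 + ["high"] * 40 + ["very_low"] * 20, name="UNS"
)

# stratify=y_demo keeps the class proportions equal in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=100, stratify=y_demo
)

print(y_tr.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```

With stratification, the smallest class is guaranteed to be represented in the test set in proportion to its overall share, which makes the test accuracy less sensitive to the random seed.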

# 4. Fitting the SVM model

In [99]:
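# Alternative configuration (degree-2 polynomial kernel); it is never fitted
# and is replaced by the model created in the next cell
model = svm.SVC(kernel='poly', degree=2, gamma='scale')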

In [100]:
# Create the SVM classifier with a linear kernel and train it
model = svm.SVC(kernel='linear', C=100).fit(X_train, y_train)
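The kernel and `C` above are fixed by hand. The notebook imports `GridSearchCV` and `StratifiedKFold` without using them, so here is a minimal sketch of how they could select these hyperparameters by cross-validation. The data comes from `make_classification` and the parameter grid is an assumption chosen for illustration; neither reflects the notebook's dataset or a tuned configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic 4-class problem with 5 features (stand-in for the real data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0
)

# Hypothetical search space: three kernels and three regularization strengths
param_grid = {"kernel": ["linear", "rbf", "poly"], "C": [1, 10, 100]}

# Stratified folds keep the class balance similar across folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(SVC(gamma="scale"), param_grid, cv=cv, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is a model refit on all the data passed to `fit`, so it can be pickled in the same way as `model`.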

In [105]:
# Save the model to the working directory
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(model, file)

In [102]:
# Load the model
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'rb') as file:
    model = pickle.load(file)
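A pickled scikit-learn model is generally only safe to load with the same library version that produced it, which matters when the file is moved to a deployment environment. The sketch below stores the version next to the model and verifies the round trip. The payload layout (a dict with `model` and `sklearn_version` keys) and the stand-in data are assumptions for illustration; an in-memory buffer replaces the file so the example leaves nothing on disk.

```python
import io
import pickle

import numpy as np
import sklearn
from sklearn.svm import SVC

# Stand-in training data: two classes determined by the last feature
rng = np.random.default_rng(0)
X_demo = rng.random((60, 5))
y_demo = np.where(X_demo[:, 4] > 0.5, "high", "low")

clf = SVC(kernel="linear", C=100).fit(X_demo, y_demo)

# Bundle the model with the library version it was trained with
payload = {"model": clf, "sklearn_version": sklearn.__version__}

# Serialize to an in-memory buffer, then read it back
buffer = io.BytesIO()
pickle.dump(payload, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# Warn if the loading environment uses a different scikit-learn version
if restored["sklearn_version"] != sklearn.__version__:
    print("Warning: model was trained with scikit-learn", restored["sklearn_version"])

# The restored model should reproduce the original predictions exactly
same = np.array_equal(clf.predict(X_demo), restored["model"].predict(X_demo))
print(same)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.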

# 5. Model performance

In [103]:
# Compute the mean accuracy on the test data
score = model.score(X_test, y_test)
print(score)

0.9629629629629629
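A single accuracy value hides which classes are confused with each other. The notebook imports `confusion_matrix` and `classification_report` but does not call them, so the following sketch shows how they could be used. The `y_true` and `y_pred` arrays are hypothetical values written by hand, not predictions from the model above.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Fixed label order so rows and columns are easy to read
labels = ["very_low", "low", "middle", "high"]

# Hypothetical true and predicted labels (illustration only)
y_true = np.array(["low", "low", "middle", "high", "very_low", "middle", "high", "low"])
y_pred = np.array(["low", "middle", "middle", "high", "very_low", "middle", "high", "low"])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class precision, recall, and F1
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```

In the notebook, the same calls would take `y_test` and `model.predict(X_test)` as arguments.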


# 6. Testing with a new sample

In [104]:
x_in = np.asarray([0.08, 0.08 ,0.1, 0.24,0.9]).reshape(1,-1)
predicts = model.predict(x_in)
predicts[0]

'high'
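Passing a bare array relies on the caller remembering the column order used during training. For a deployed app, a small helper that takes named values can make that order explicit. This is a sketch under assumptions: `predict_one`, `FEATURES`, and `demo_model` are hypothetical names, and the stand-in model is trained on random data rather than on the notebook's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Column order used at training time (taken from the notebook's columns)
FEATURES = ["STG", "SCG", "STR", "LPR", "PEG"]

def predict_one(model, sample):
    """Predict the class for one sample given as a {feature: value} dict."""
    missing = [name for name in FEATURES if name not in sample]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Build a single-row frame with the columns in training order
    row = pd.DataFrame([[sample[name] for name in FEATURES]], columns=FEATURES)
    return model.predict(row)[0]

# Stand-in model trained on random data (illustration only)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.random((100, 5)), columns=FEATURES)
y_demo = np.where(X_demo["PEG"] > 0.5, "high", "low")
demo_model = SVC(kernel="linear", C=100).fit(X_demo, y_demo)

result = predict_one(
    demo_model, {"STG": 0.08, "SCG": 0.08, "STR": 0.1, "LPR": 0.24, "PEG": 0.9}
)
print(result)
```

Raising an error for missing features fails fast on incomplete form input instead of silently predicting from a malformed row.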