### Weather Prediction

The classification model built in this notebook uses Seattle weather data to predict the weather type for a given set of conditions: drizzle, fog, rain, snow, or sun.

Reference: https://www.kaggle.com/datasets/ananthr1/weather-prediction

In [3]:
# Import the required libraries and modules

import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [4]:
# Read the data

df = pd.read_csv('seattle-weather.csv')
df.head(10)

Unnamed: 0,date,precipitation,temp_max,temp_min,wind,weather
0,2012-01-01,0.0,12.8,5.0,4.7,drizzle
1,2012-01-02,10.9,10.6,2.8,4.5,rain
2,2012-01-03,0.8,11.7,7.2,2.3,rain
3,2012-01-04,20.3,12.2,5.6,4.7,rain
4,2012-01-05,1.3,8.9,2.8,6.1,rain
5,2012-01-06,2.5,4.4,2.2,2.2,rain
6,2012-01-07,0.0,7.2,2.8,2.3,rain
7,2012-01-08,0.0,10.0,2.8,2.0,sun
8,2012-01-09,4.3,9.4,5.0,3.4,rain
9,2012-01-10,1.0,6.1,0.6,3.4,rain


In [5]:
# Inspect the dataframe's columns, dtypes, and non-null counts

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1461 entries, 0 to 1460
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   date           1461 non-null   object 
 1   precipitation  1461 non-null   float64
 2   temp_max       1461 non-null   float64
 3   temp_min       1461 non-null   float64
 4   wind           1461 non-null   float64
 5   weather        1461 non-null   object 
dtypes: float64(4), object(2)
memory usage: 68.6+ KB
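
The `date` column is stored as `object` (plain strings). If it were to be used as a feature, one option is to parse it and derive calendar fields. The following is a minimal sketch on a small hypothetical sample, not on the dataset itself; the `month` and `day_of_year` column names are assumptions introduced for illustration.

```python
import pandas as pd

# Small hypothetical sample using the same 'date' format as the dataset
sample = pd.DataFrame({"date": ["2012-01-01", "2012-07-15", "2015-12-30"]})

# Parse the strings into datetime values
sample["date"] = pd.to_datetime(sample["date"])

# Derive calendar features (hypothetical column names)
sample["month"] = sample["date"].dt.month
sample["day_of_year"] = sample["date"].dt.dayofyear

print(sample.dtypes)
print(sample)
```

Calendar fields such as these repeat every year, so a model can reuse them on unseen dates, whereas each raw date string occurs only once.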


### Data preprocessing

With the data loaded, the next step is to preprocess it so that it is ready for the prediction model.

In [6]:
# Check for null values in the dataframe

df.isnull().sum()

date             0
precipitation    0
temp_max         0
temp_min         0
wind             0
weather          0
dtype: int64

In [7]:
# Select the columns used to train the model

filtered_df = df[['precipitation','temp_max','temp_min','wind']]

In [8]:
# Normalize the features to the [0, 1] range

scaler = preprocessing.MinMaxScaler()
scaler_data = scaler.fit_transform(filtered_df)
scaled_df = pd.DataFrame(scaler_data, columns=filtered_df.columns)
scaled_df

Unnamed: 0,precipitation,temp_max,temp_min,wind
0,0.000000,0.387097,0.476378,0.472527
1,0.194991,0.327957,0.389764,0.450549
2,0.014311,0.357527,0.562992,0.208791
3,0.363148,0.370968,0.500000,0.472527
4,0.023256,0.282258,0.389764,0.626374
...,...,...,...,...
1456,0.153846,0.161290,0.346457,0.274725
1457,0.026834,0.177419,0.346457,0.098901
1458,0.000000,0.236559,0.303150,0.241758
1459,0.000000,0.193548,0.240157,0.329670
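
`MinMaxScaler` maps each column value `v` to `(v - min) / (max - min)`. In the cell above the scaler is fitted on the full dataset, before the train/test split. A common alternative is to fit it on the training rows only and reuse those statistics for the test rows. The sketch below illustrates that alternative on synthetic values (the `features` array is a hypothetical stand-in for the four numeric columns).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the four numeric columns (hypothetical values)
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 30.0, size=(100, 4))

train, test = train_test_split(features, test_size=0.3, random_state=42)

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learn min/max from the training rows only
test_scaled = scaler.transform(test)        # reuse the training min/max

# Manual check of the formula on the training split
col_min = train.min(axis=0)
col_max = train.max(axis=0)
manual = (train - col_min) / (col_max - col_min)
print(np.allclose(train_scaled, manual))
```

With this ordering, test values can fall slightly outside [0, 1], which is expected: the test set plays the role of unseen data.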


In [9]:
# Join the scaled numeric features back to the original dataframe

columns = scaled_df.columns

original_columns = df.columns.difference(columns)
df = df[original_columns]

df = pd.concat([df, scaled_df[columns]], axis=1)
df

Unnamed: 0,date,weather,precipitation,temp_max,temp_min,wind
0,2012-01-01,drizzle,0.000000,0.387097,0.476378,0.472527
1,2012-01-02,rain,0.194991,0.327957,0.389764,0.450549
2,2012-01-03,rain,0.014311,0.357527,0.562992,0.208791
3,2012-01-04,rain,0.363148,0.370968,0.500000,0.472527
4,2012-01-05,rain,0.023256,0.282258,0.389764,0.626374
...,...,...,...,...,...,...
1456,2015-12-27,rain,0.153846,0.161290,0.346457,0.274725
1457,2015-12-28,rain,0.026834,0.177419,0.346457,0.098901
1458,2015-12-29,fog,0.000000,0.236559,0.303150,0.241758
1459,2015-12-30,sun,0.000000,0.193548,0.240157,0.329670


In [10]:
# Reorder the dataframe columns

df = df[['date', 'precipitation', 'temp_max', 'temp_min', 'wind', 'weather']]
df

Unnamed: 0,date,precipitation,temp_max,temp_min,wind,weather
0,2012-01-01,0.000000,0.387097,0.476378,0.472527,drizzle
1,2012-01-02,0.194991,0.327957,0.389764,0.450549,rain
2,2012-01-03,0.014311,0.357527,0.562992,0.208791,rain
3,2012-01-04,0.363148,0.370968,0.500000,0.472527,rain
4,2012-01-05,0.023256,0.282258,0.389764,0.626374,rain
...,...,...,...,...,...,...
1456,2015-12-27,0.153846,0.161290,0.346457,0.274725,rain
1457,2015-12-28,0.026834,0.177419,0.346457,0.098901,rain
1458,2015-12-29,0.000000,0.236559,0.303150,0.241758,fog
1459,2015-12-30,0.000000,0.193548,0.240157,0.329670,sun


In [11]:
# Separate the features (x) from the target (y)

x = df[['precipitation', 'temp_max', 'temp_min', 'wind']].values
y = df['weather'].values  # 1-D array, the target shape scikit-learn expects

# Split the data into training and test sets

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 42)
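
The split above is purely random. When some classes are rare, a stratified split keeps the class proportions roughly equal in both subsets. The following is a minimal sketch using the `stratify` parameter of `train_test_split`; the labels and their counts are hypothetical, not taken from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 'rain', 15 'sun', 5 'snow'
labels = np.array(["rain"] * 80 + ["sun"] * 15 + ["snow"] * 5)
features = np.arange(len(labels)).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# Each class keeps approximately its original share in the test split
classes, counts = np.unique(y_te, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```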

In [12]:
!pip3 install pycaret[full]



ERROR: Could not install packages due to an OSError: [WinError 5] Acesso negado: 'C:\\Users\\Inteli\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\~yzmq.libs\\libsodium-ac42d648.dll'
Check the permissions.





In [13]:
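# Import only the PyCaret functions used in this notebook
from pycaret.classification import (
    setup,
    compare_models,
    evaluate_model,
    save_model,
    create_api,
    create_docker,
)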

In [14]:
s = setup(
  data=df,
  target = "weather",
  session_id = 123
)

Unnamed: 0,Description,Value
0,Session id,123
1,Target,weather
2,Target type,Multiclass
3,Target mapping,"drizzle: 0, fog: 1, rain: 2, snow: 3, sun: 4"
4,Original data shape,"(1461, 6)"
5,Transformed data shape,"(1461, 6)"
6,Transformed train set shape,"(1022, 6)"
7,Transformed test set shape,"(439, 6)"
8,Numeric features,4
9,Categorical features,1


In [15]:
melhor_modelo = compare_models()


Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
ridge,Ridge Classifier,0.7153,0.0,0.7153,0.6332,0.6687,0.4932,0.5063,0.183
lr,Logistic Regression,0.7094,0.8538,0.7094,0.6249,0.6611,0.4825,0.4959,1.603
svm,SVM - Linear Kernel,0.5636,0.0,0.5636,0.6209,0.4918,0.2544,0.3203,0.227
knn,K Neighbors Classifier,0.5069,0.6043,0.5069,0.6205,0.4008,0.141,0.2414,0.261
ada,Ada Boost Classifier,0.4384,0.5,0.4384,0.1922,0.2672,0.0,0.0,0.269
dummy,Dummy Classifier,0.4364,0.5,0.4364,0.1905,0.2652,0.0,0.0,0.255
et,Extra Trees Classifier,0.1977,0.6551,0.1977,0.1358,0.1554,-0.0853,-0.1167,0.359
rf,Random Forest Classifier,0.1233,0.537,0.1233,0.0747,0.0864,-0.2245,-0.3028,0.386
lightgbm,Light Gradient Boosting Machine,0.0891,0.1484,0.0891,0.0428,0.0539,-0.256,-0.3522,1.198
gbc,Gradient Boosting Classifier,0.0833,0.1408,0.0833,0.0445,0.0566,-0.5581,-0.5844,0.631
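
The Dummy Classifier row works as a baseline. Assuming it always predicts the most frequent class, its accuracy equals that class's share of the data and its Kappa is 0, which matches the 0.4364 and 0.0 reported above. Models listed below that row perform worse than this baseline. One thing worth checking is whether the `date` column, which the setup summary counts as a categorical feature, contributes to that. The sketch below reproduces the baseline behaviour on hypothetical labels with scikit-learn.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical labels in which 'rain' is the most frequent class (44 of 100)
y_true = np.array(["rain"] * 44 + ["sun"] * 40 + ["fog"] * 10 + ["drizzle"] * 6)
x_dummy = np.zeros((len(y_true), 1))  # the baseline ignores the features

baseline = DummyClassifier(strategy="most_frequent").fit(x_dummy, y_true)
y_pred = baseline.predict(x_dummy)

accuracy = accuracy_score(y_true, y_pred)   # share of the majority class
kappa = cohen_kappa_score(y_true, y_pred)   # agreement beyond chance
print(accuracy, kappa)
```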


In [16]:
melhor_modelo

In [17]:
evaluate_model(melhor_modelo)


In [18]:
# Train a Ridge classifier on the manual split and report its test accuracy
from sklearn.linear_model import RidgeClassifier

ridge_classifier = RidgeClassifier()
ridge_classifier.fit(x_train, y_train)
ridge_classifier.score(x_test, y_test)

0.7107061503416856
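
`score` returns overall accuracy only. With several weather classes, a per-class breakdown shows which ones the model confuses. The following is a minimal sketch using `confusion_matrix` and `classification_report` from `sklearn.metrics`; it runs on synthetic three-class data (a hypothetical stand-in), so the numbers it prints say nothing about the weather model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the weather features (hypothetical)
x_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42
)

clf = RidgeClassifier().fit(x_tr, y_tr)
y_pred = clf.predict(x_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_pred)
print(cm)

# Precision, recall, and F1 for each class
print(classification_report(y_te, y_pred, zero_division=0))
```

The same two calls could be applied to `y_test` and `ridge_classifier.predict(x_test)` from the cell above.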

In [19]:
import pickle

In [20]:
# Save the scikit-learn model; the context manager closes the file handle
with open('model.pkl', 'wb') as f:
    pickle.dump(ridge_classifier, f)
save_model(melhor_modelo, 'ridgeClassifier')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=FastMemory(location=C:\Users\Inteli\AppData\Local\Temp\joblib),
          steps=[('label_encoding',
                  TransformerWrapperWithInverse(exclude=None, include=None,
                                                transformer=LabelEncoder())),
                 ('numerical_imputer',
                  TransformerWrapper(exclude=None,
                                     include=['precipitation', 'temp_max',
                                              'temp_min', 'wind'],
                                     transformer=SimpleImputer(add_indicator=False,
                                                               copy=Tr...
                                     transformer=TargetEncoder(cols=['date'],
                                                               drop_invariant=False,
                                                               handle_missing='return_nan',
                                                               handle_unknown='value
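
A pickled estimator can be restored with `pickle.load`. The sketch below shows the save and load round trip on a small model trained on synthetic data; the temporary path and the `model_demo.pkl` file name are hypothetical, chosen so that the example does not overwrite `model.pkl`.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# Synthetic data standing in for the scaled weather features (hypothetical)
x_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)
model = RidgeClassifier().fit(x_demo, y_demo)

# Hypothetical temporary location for the demo file
path = os.path.join(tempfile.mkdtemp(), "model_demo.pkl")

with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored estimator should reproduce the original predictions
same = bool((restored.predict(x_demo) == model.predict(x_demo)).all())
print(same)
```

Note that `model.pkl` holds only the classifier. Because `ridge_classifier` was trained on min-max scaled features, new inputs would presumably need the same scaling before prediction, and the fitted `scaler` is not saved in this notebook.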

In [21]:
create_api(melhor_modelo, 'api')

API successfully created. This function only creates a POST API, it doesn't run it automatically. To run your API, please run this command --> !python api.py
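
Once `api.py` is running, the model can be queried with a POST request. The following is only a sketch of how such a request could be built with the standard library. The host, port, and `/predict` route are assumptions that should be checked against the generated `api.py`, and the payload reuses the first scaled row shown earlier in this notebook.

```python
import json
import urllib.request

# Assumed endpoint: host, port, and route are not confirmed by this notebook
url = "http://127.0.0.1:8000/predict"

# Input row with the feature columns passed to setup(); values are on the 0-1 scale
payload = {
    "date": "2012-01-01",
    "precipitation": 0.0,
    "temp_max": 0.387097,
    "temp_min": 0.476378,
    "wind": 0.472527,
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out because it requires the API to be running
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
print(request.get_method(), request.full_url)
```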


In [23]:
create_docker('api')

Writing requirements.txt
Writing Dockerfile
Dockerfile and requirements.txt successfully created.
    To build image you have to run --> !docker image build -f "Dockerfile" -t IMAGE_NAME:IMAGE_TAG .
            
