## Decision Tree Classification Using Scikit-Learn

### Precipitation Prediction in LA

*Predicted value: ***HIGH_PRCP*** = 0 (False) or 1 (True).* *It indicates whether a day's precipitation is at or below (0) or above (1) the average precipitation.*
- ***PRCP:*** Precipitation (tenths of mm)
- ***TMAX:*** Maximum Temperature (tenths of degrees C)
- ***TMIN:*** Minimum Temperature (tenths of degrees C)
- ***AWND:*** Average Daily Wind Speed (tenths of meters per second)
- ***WDF2, WDF5:*** Direction of Fastest 2-minute and 5-second Wind (degrees)
- ***WSF2, WSF5:*** Fastest 2-minute and 5-second Wind Speed (tenths of meters per second)

*Note: the units listed above appear to follow the metric GHCN-Daily convention. The values in this file (for example, TMAX between 52 and 111) suggest it was exported in US customary units instead (inches, degrees F, miles per hour).*

In [73]:
# Imports
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [74]:
# Read Data Set
df = pd.read_csv('Data Set/weather.csv')
print(df.count())
df.head()

STATION    1827
NAME       1827
DATE       1827
AWND       1823
PGTM         62
PRCP       1827
TAVG          0
TMAX       1827
TMIN       1827
WDF2       1822
WDF5       1809
WSF2       1823
WSF5       1809
WT01        551
WT02         39
WT08        861
dtype: int64


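|   | STATION | NAME | DATE | AWND | PGTM | PRCP | TAVG | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT02 | WT08 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USW00093134 | LOS ANGELES DOWNTOWN USC, CA US | 2016-01-01 | 2.46 | NaN | 0.0 | NaN | 64 | 43 | 10.0 | 30.0 | 8.1 | 11.0 | NaN | NaN | 1.0 |
| 1 | USW00093134 | LOS ANGELES DOWNTOWN USC, CA US | 2016-01-02 | 2.01 | NaN | 0.0 | NaN | 65 | 47 | 270.0 | 30.0 | 6.0 | 8.9 | NaN | NaN | NaN |
| 2 | USW00093134 | LOS ANGELES DOWNTOWN USC, CA US | 2016-01-03 | 0.67 | NaN | 0.0 | NaN | 62 | 44 | 150.0 | 150.0 | 10.1 | 14.1 | NaN | NaN | NaN |
| 3 | USW00093134 | LOS ANGELES DOWNTOWN USC, CA US | 2016-01-04 | 1.34 | NaN | 0.01 | NaN | 69 | 55 | 270.0 | 280.0 | 8.1 | 14.1 | NaN | NaN | NaN |
| 4 | USW00093134 | LOS ANGELES DOWNTOWN USC, CA US | 2016-01-05 | 2.46 | NaN | 1.61 | NaN | 59 | 49 | 140.0 | 140.0 | 10.1 | 16.1 | 1.0 | 1.0 | NaN |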


In [75]:
# Drop Sparse Columns and Non-Numeric Identifiers
df = df.drop(columns=['PGTM', 'TAVG', 'WT01', 'WT02', 'WT08', 'STATION', 'NAME', 'DATE'])

In [76]:
df.count()

AWND    1823
PRCP    1827
TMAX    1827
TMIN    1827
WDF2    1822
WDF5    1809
WSF2    1823
WSF5    1809
dtype: int64
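
Several of the dropped columns are the sparse ones in the `count()` output (PGTM has 62 of 1827 values, TAVG has none). As a minimal sketch, the same choice could be made from the missing-value fraction rather than by hand. The toy frame, the `max_missing` cutoff of 0.5, and the variable names below are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the weather data (values are made up).
toy = pd.DataFrame({
    "AWND": [2.46, 2.01, np.nan, 1.34],
    "PGTM": [np.nan, np.nan, np.nan, 1200.0],
    "TAVG": [np.nan, np.nan, np.nan, np.nan],
    "TMAX": [64, 65, 62, 69],
})

# Fraction of missing values per column.
missing_fraction = toy.isna().mean()

# Hypothetical cutoff: drop columns where more than half the values are missing.
max_missing = 0.5
sparse_columns = missing_fraction[missing_fraction > max_missing].index.tolist()
reduced = toy.drop(columns=sparse_columns)

print(missing_fraction)
print("Dropped:", sparse_columns)
```

Identifier columns such as STATION, NAME, and DATE are complete, so a rule like this would not remove them; they would still need to be dropped explicitly.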

In [77]:
# Columns
df.columns

Index(['AWND', 'PRCP', 'TMAX', 'TMIN', 'WDF2', 'WDF5', 'WSF2', 'WSF5'], dtype='object')

In [78]:
# Drop Missing Values
df = df.dropna()
print(df.count())

AWND    1809
PRCP    1809
TMAX    1809
TMIN    1809
WDF2    1809
WDF5    1809
WSF2    1809
WSF5    1809
dtype: int64
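
Dropping rows costs only 18 of 1827 observations here, so `dropna()` is a reasonable choice. For data with more gaps, one possible alternative is to fill missing values instead. The sketch below uses scikit-learn's `SimpleImputer` with a median strategy on a made-up frame; the median strategy is an assumption, not something the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame with gaps (values are made up).
toy = pd.DataFrame({
    "AWND": [2.0, np.nan, 1.0, 3.0],
    "WSF5": [11.0, 9.0, np.nan, 14.0],
})

# Replace each missing value with the median of its column.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(
    imputer.fit_transform(toy), columns=toy.columns, index=toy.index
)

print(filled)
```

In a full workflow the imputer would normally be fitted on the training split only and then applied to the test split, so that test data does not influence the fill values.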


In [79]:
# Describe
df.describe()

|   | AWND | PRCP | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 |
|---|---|---|---|---|---|---|---|---|
| count | 1809.0 | 1809.0 | 1809.0 | 1809.0 | 1809.0 | 1809.0 | 1809.0 | 1809.0 |
| mean | 1.674837 | 0.035959 | 76.525705 | 58.132117 | 254.051962 | 256.428966 | 9.078773 | 14.721669 |
| std | 0.910097 | 0.190752 | 9.133809 | 7.220202 | 54.937611 | 55.571902 | 2.008258 | 3.622257 |
| min | 0.0 | 0.0 | 52.0 | 38.0 | 10.0 | 10.0 | 2.9 | 6.0 |
| 25% | 1.12 | 0.0 | 70.0 | 53.0 | 260.0 | 270.0 | 8.1 | 12.1 |
| 50% | 1.57 | 0.0 | 76.0 | 58.0 | 270.0 | 270.0 | 8.9 | 14.1 |
| 75% | 2.01 | 0.0 | 83.0 | 64.0 | 270.0 | 280.0 | 10.1 | 16.1 |
| max | 8.05 | 2.67 | 111.0 | 79.0 | 360.0 | 360.0 | 21.0 | 34.0 |


In [80]:
# Create Target Value: 1 if PRCP is above the mean PRCP, else 0
df['HIGH_PRCP'] = (df['PRCP'] > df['PRCP'].mean()).astype(int)
df.head()

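|   | AWND | PRCP | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 | HIGH_PRCP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.46 | 0.0 | 64 | 43 | 10.0 | 30.0 | 8.1 | 11.0 | 0 |
| 1 | 2.01 | 0.0 | 65 | 47 | 270.0 | 30.0 | 6.0 | 8.9 | 0 |
| 2 | 0.67 | 0.0 | 62 | 44 | 150.0 | 150.0 | 10.1 | 14.1 | 0 |
| 3 | 1.34 | 0.01 | 69 | 55 | 270.0 | 280.0 | 8.1 | 14.1 | 0 |
| 4 | 2.46 | 1.61 | 59 | 49 | 140.0 | 140.0 | 10.1 | 16.1 | 1 |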
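
The `describe()` output shows a PRCP mean of about 0.036 with a 75th percentile of 0.0, so fewer than a quarter of the days can be above average and the two classes are unbalanced. A small sketch of how the class shares could be checked follows; the toy PRCP values are made up and the `class_share` name is an assumption.

```python
import pandas as pd

# Toy precipitation values: mostly dry days with one wet day (made up).
toy = pd.DataFrame({"PRCP": [0.0, 0.0, 0.0, 0.01, 1.61, 0.0, 0.0, 0.0]})

# Same rule as the notebook: 1 if above the mean, else 0.
toy["HIGH_PRCP"] = (toy["PRCP"] > toy["PRCP"].mean()).astype(int)

# Share of each class among all rows.
class_share = toy["HIGH_PRCP"].value_counts(normalize=True)
print(class_share)
```

Running the same `value_counts(normalize=True)` call on `df['HIGH_PRCP']` would show how rare the positive class is in the real data, which matters when interpreting accuracy later.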


In [81]:
# Feature Selection
feature = ['AWND', 'TMAX','TMIN','WDF2','WDF5', 'WSF2', 'WSF5']
# Target Selection
target = ['HIGH_PRCP']

In [82]:
# Values
x = df[feature]
y = df[target]

In [83]:
# Split Train and Test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=100)
print(x_train)
print(y_train)

      AWND  TMAX  TMIN   WDF2   WDF5  WSF2  WSF5
48    2.24    68    56  270.0  280.0   8.9  13.0
1123  0.45    75    50  270.0  270.0   6.9  11.0
1204  1.57    73    57  270.0  280.0  13.0  17.0
1700  0.89    89    67  260.0  270.0  10.1  15.0
567   1.79    85    67  270.0  280.0  10.1  15.0
...    ...   ...   ...    ...    ...   ...   ...
53    2.01    83    55  270.0  270.0   8.1  13.0
352   0.67    67    42  270.0  270.0   4.9   8.1
79    1.34    72    57  270.0  270.0   8.9  13.0
798   1.79    72    55  270.0  280.0   6.9  12.1
1560  5.82    57    52   80.0   90.0  10.1  15.0

[1212 rows x 7 columns]
      HIGH_PRCP
48            1
1123          0
1204          0
1700          0
567           0
...         ...
53            0
352           0
79            0
798           0
1560          1

[1212 rows x 1 columns]
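
With an unbalanced target, a purely random split can leave the train and test sets with somewhat different shares of positive days. One option, shown here as a sketch on synthetic data, is the `stratify` argument of `train_test_split`. The synthetic arrays and the 10% positive rate are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 features, 10% positives (made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([1] * 20 + [0] * 180)

# stratify=y keeps the positive share roughly equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=100, stratify=y
)

print("Positive share (train):", y_train.mean())
print("Positive share (test): ", y_test.mean())
```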


In [84]:
# Model
model = DecisionTreeClassifier(max_leaf_nodes=20, random_state=0)
type(model)

sklearn.tree._classes.DecisionTreeClassifier
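
The value `max_leaf_nodes=20` is fixed by hand in the cell above. As a hedged sketch, a small cross-validated search could compare a few candidate values instead. The candidate grid, the `balanced_accuracy` scoring choice, and the synthetic data from `make_classification` are all assumptions, not the notebook's method.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, unbalanced stand-in for the weather features (made up).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# Try a few tree sizes with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_leaf_nodes": [5, 10, 20, 40]},
    scoring="balanced_accuracy",
    cv=5,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Balanced accuracy averages the recall of both classes, so it is less flattering than plain accuracy when one class dominates.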

In [85]:
# Fit the Model
model.fit(x_train, y_train)
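
Once a tree is fitted, it can be inspected to see which features drive the splits. The sketch below fits a tree on synthetic data and uses `feature_importances_` and `export_text` from scikit-learn. The data is made up and only borrows the notebook's feature names as labels, so the printed importances say nothing about the real weather data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Feature names borrowed from the notebook; the data itself is synthetic.
feature_names = ["AWND", "TMAX", "TMIN", "WDF2", "WDF5", "WSF2", "WSF5"]
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

tree = DecisionTreeClassifier(max_leaf_nodes=20, random_state=0).fit(X, y)

# Impurity-based importance of each feature (the values sum to 1).
importances = pd.Series(
    tree.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)

# Text rendering of the top levels of the tree.
rules = export_text(tree, feature_names=feature_names, max_depth=2)
print(rules)
```

In the notebook itself, the equivalent calls would take `model` and the `feature` list in place of `tree` and `feature_names`.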

In [91]:
# Prediction
y_prediction = model.predict(x_test)

# Accuracy
acc = accuracy_score(y_true=y_test, y_pred=y_prediction)
print("Accuracy➡️ {}".format(acc))
print('Predicted Values➡️',y_prediction[:10])
print('Real Values⬇️\n',y_test[:10])

Accuracy➡️ 0.9229480737018425
Predicted Values➡️ [0 0 0 1 0 0 1 0 0 0]
Real Values⬇️
       HIGH_PRCP
134           0
455           0
960           0
31            0
512           0
1066          0
1110          1
495           0
57            0
176           0
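
Because most days fall in the negative class, an accuracy of about 0.92 is easier to judge next to a baseline that always predicts the majority class. The following sketch compares a tree with scikit-learn's `DummyClassifier` and prints a confusion matrix. It runs on synthetic data with an assumed 10% positive rate, so its numbers are not results for the weather data.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, unbalanced stand-in data (made up).
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=100, stratify=y
)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
tree = DecisionTreeClassifier(max_leaf_nodes=20, random_state=0).fit(X_train, y_train)

baseline_pred = baseline.predict(X_test)
tree_pred = tree.predict(X_test)

baseline_acc = accuracy_score(y_test, baseline_pred)
tree_acc = accuracy_score(y_test, tree_pred)
baseline_recall = recall_score(y_test, baseline_pred)
tree_recall = recall_score(y_test, tree_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, tree_pred)

print("Baseline accuracy:", baseline_acc, "recall:", baseline_recall)
print("Tree accuracy:    ", tree_acc, "recall:", tree_recall)
print(cm)
```

The baseline never predicts a positive day, so its recall is zero even though its accuracy is high. Recall and the confusion matrix therefore show whether the tree actually finds the high-precipitation days.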
