# Decision Tree (Play Tennis Prediction)

## Explanation

### Objective

Create a machine learning model that predicts whether a person will play tennis, using the Decision Tree algorithm.

### Column Descriptions

- ***Outlook*** = Describes the general weather condition for the day. Possible values are Sunny, Overcast, and Rain.
- ***Temperature*** = Indicates the temperature level of the day. Possible values are Hot, Mild, and Cool.
- ***Humidity*** = Represents the humidity level in the atmosphere. Possible values are High or Normal.
- ***Wind*** = Describes the wind strength on the day. Possible values are Weak or Strong.
- ***PlayTennis*** = Target variable indicating whether tennis was played on that day. Possible values are Yes or No.

## A. Data Preparation

### A.1 Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### A.2 Load Data

In [2]:
try:
    df = pd.read_csv("playtennis.csv")
    print("Data loaded successfully")
except FileNotFoundError:
    # Re-raise so later cells do not run without df.
    print("Failed to load data")
    raise

Data loaded successfully


### A.3 Viewing Data Dimensions

In [3]:
df.shape

(200, 5)

### A.4 Viewing Data Information

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Outlook      200 non-null    object
 1   Temperature  200 non-null    object
 2   Humidity     191 non-null    object
 3   Wind         190 non-null    object
 4   PlayTennis   200 non-null    object
dtypes: object(5)
memory usage: 7.9+ KB


### A.5 Viewing Data Statistics

In [6]:
# df.describe().T.style.format("{:.4f}").background_gradient(cmap='flare')
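The styled call above is commented out, presumably because every column has dtype `object` and the numeric format `"{:.4f}"` cannot be applied to strings. As a minimal sketch, `describe(include="all")` still gives a useful summary of categorical columns (count, unique, top, freq). The `toy` frame below is made up for illustration and is not the real `playtennis.csv`.

```python
import pandas as pd

# Toy stand-in for the real data; values are illustrative only.
toy = pd.DataFrame({
    "Outlook": ["Sunny", "Rain", "Rain", "Overcast"],
    "Wind": ["Weak", "Weak", "Strong", None],
})

# For string columns, describe() reports the non-null count, the number of
# distinct values, the most frequent value (top) and its frequency (freq).
summary = toy.describe(include="all").T
print(summary)
```

On the real data, the same call on `df` would show at a glance which category dominates each column and which columns have fewer than 200 non-null entries.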

### A.6 Viewing the First 5 and Last 5 Rows

In [7]:
df.head()

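| | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| 0 | Sunny | Hot | High | Weak | No |
| 1 | Overcast | Mild | Normal | Strong | No |
| 2 | Overcast | Cool | Normal | NaN | Yes |
| 3 | Rain | Mild | High | Strong | No |
| 4 | Rain | Hot | Normal | Weak | Yes |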


In [8]:
df.tail()

| | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| 195 | Rain | Mild | High | Weak | Yes |
| 196 | Overcast | Hot | High | Strong | Yes |
| 197 | Sunny | Mild | Normal | Strong | Yes |
| 198 | Rain | Mild | High | Weak | Yes |
| 199 | Rain | Cool | High | Weak | No |


### A.7 Viewing Duplicated Data

In [9]:
df.duplicated().sum()

np.int64(141)

#### Duplicates Are Kept: With Only Five Categorical Columns, Repeated Rows Are Expected, and Dropping 141 of 200 Rows Would Discard the Frequency Patterns the Decision Tree Learns From
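A quick count shows why duplicates are unavoidable here. The sketch below multiplies the number of categories per column, taken from the column descriptions; it ignores missing values, which add a few extra combinations before imputation.

```python
from math import prod

# Number of possible values per column, from the column descriptions.
cardinalities = {
    "Outlook": 3,
    "Temperature": 3,
    "Humidity": 2,
    "Wind": 2,
    "PlayTennis": 2,
}

n_rows = 200  # from df.shape

# Upper bound on distinct complete rows, and the resulting lower bound
# on the number of rows that must repeat an earlier one.
max_distinct_rows = prod(cardinalities.values())
min_duplicates = max(0, n_rows - max_distinct_rows)

print(max_distinct_rows, min_duplicates)
```

With at most 72 distinct complete rows, a 200-row table must contain many repeats, so a high `duplicated()` count reflects the small feature space rather than a data-entry problem.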

### A.8 Viewing Missing Data

In [10]:
df.isna().sum()

Outlook         0
Temperature     0
Humidity        9
Wind           10
PlayTennis      0
dtype: int64

In [12]:
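nan_cols = ['Humidity', 'Wind']

# Fill missing values with each column's most frequent value (the mode).
for col in nan_cols:
    mode_value = df[col].mode()[0]
    # Assign back rather than calling fillna(inplace=True) on a column
    # selection, which newer pandas versions warn about.
    df[col] = df[col].fillna(mode_value)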

In [14]:
df.isna().sum()

Outlook        0
Temperature    0
Humidity       0
Wind           0
PlayTennis     0
dtype: int64
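One detail of mode imputation worth knowing: `Series.mode()` returns all tied values in sorted order, so taking element `[0]` silently picks the alphabetically first one when two categories are equally frequent. A minimal sketch on a made-up series (not the real `Wind` column):

```python
import pandas as pd

# Illustrative series where "Weak" and "Strong" are tied at two each.
wind = pd.Series(["Weak", "Strong", None, "Strong", "Weak"])

modes = wind.mode()            # tied modes, returned in sorted order
fill_value = modes.iloc[0]     # first of the tied values
filled = wind.fillna(fill_value)

print(list(modes), fill_value)
```

If a tie matters for the analysis, it may be worth checking `len(df[col].mode())` before imputing.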

### A.9 Viewing Outlier Data

#### No Outlier Check Is Needed Because All Columns Are Categorical
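The closest categorical analogue to an outlier check is looking for values outside the documented categories, such as typos or inconsistent casing. The sketch below assumes the allowed values listed in the column descriptions; the helper `unexpected_values` and the `toy` frame are hypothetical and only for illustration.

```python
import pandas as pd

# Allowed categories, taken from the column descriptions.
allowed = {
    "Outlook": {"Sunny", "Overcast", "Rain"},
    "Temperature": {"Hot", "Mild", "Cool"},
    "Humidity": {"High", "Normal"},
    "Wind": {"Weak", "Strong"},
    "PlayTennis": {"Yes", "No"},
}

def unexpected_values(frame, allowed_values):
    """Return {column: set of values not in the allowed set}, ignoring NaN."""
    report = {}
    for col, valid in allowed_values.items():
        seen = set(frame[col].dropna().unique())
        extra = seen - valid
        if extra:
            report[col] = extra
    return report

# Made-up rows with one deliberately misspelled category.
toy = pd.DataFrame({
    "Outlook": ["Sunny", "rain"],
    "Temperature": ["Hot", "Mild"],
    "Humidity": ["High", "Normal"],
    "Wind": ["Weak", "Strong"],
    "PlayTennis": ["No", "Yes"],
})

print(unexpected_values(toy, allowed))
```

An empty dictionary from `unexpected_values(df, allowed)` would support the statement that no further cleaning is needed before encoding.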

## B. Data Preprocessing

### B.1 Mapping Feature

In [15]:
outlook_map = {'Sunny': 0, 'Overcast': 1, 'Rain': 2}
temperature_map = {'Hot': 0, 'Mild': 1, 'Cool': 2}
humidity_map = {'High': 0, 'Normal': 1}
wind_map = {'Weak': 0, 'Strong': 1}

df['Outlook'] = df['Outlook'].map(outlook_map)
df['Temperature'] = df['Temperature'].map(temperature_map)
df['Humidity'] = df['Humidity'].map(humidity_map)
df['Wind'] = df['Wind'].map(wind_map)

In [16]:
df.head()

| | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | No |
| 1 | 1 | 1 | 1 | 1 | No |
| 2 | 1 | 2 | 1 | 0 | Yes |
| 3 | 2 | 1 | 0 | 1 | No |
| 4 | 2 | 0 | 1 | 0 | Yes |
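When `Series.map` is given a dictionary, any value that is not a key becomes `NaN` without raising an error. A small sketch of a post-mapping check, using the same `wind_map` as above on a made-up series that contains one misspelled entry:

```python
import pandas as pd

wind_map = {"Weak": 0, "Strong": 1}

# Illustrative input; "strong" (lowercase) is not a key in wind_map.
wind = pd.Series(["Weak", "Strong", "strong"])

encoded = wind.map(wind_map)

# Rows that failed to map show up as NaN in the encoded series.
unmapped = wind[encoded.isna()]
print(unmapped.tolist())
```

Running `df.isna().sum()` again after the mapping cell is a cheap way to confirm that every category in the real data was covered by the dictionaries.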


### B.2 Mapping Label

In [17]:
target_map = {'No': 0, 'Yes': 1}

df['PlayTennis'] = df['PlayTennis'].map(target_map)

In [18]:
df.head()

| | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 1 | 0 | 1 |
| 3 | 2 | 1 | 0 | 1 | 0 |
| 4 | 2 | 0 | 1 | 0 | 1 |


## C. Exploratory Data Analysis