# DECISION TREE

#### A decision tree is a supervised machine learning algorithm used for classification and regression tasks. It works by recursively splitting the dataset on feature conditions, choosing each split so that the resulting subsets are purer with respect to the target, which produces a tree-like structure.
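To make "splitting on feature conditions" concrete, here is a minimal sketch of how a candidate split can be scored with Gini impurity, which is the default criterion of scikit-learn's `DecisionTreeClassifier`. The helper names `gini` and `split_impurity` are illustrative assumptions, not library functions, and the labels are the targets of the five rows shown by `df.head()` further down.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted average impurity of the two sides of a split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

# Targets of the five google rows shown by df.head().
labels = [0, 0, 1, 1, 0]
parent = gini(labels)

# Candidate split: "job is business manager" vs. every other job.
by_job = split_impurity([1, 1], [0, 0, 0])

# Candidate split: "degree is bachelors" vs. "degree is masters".
by_degree = split_impurity([0, 1, 0], [0, 1])

print(parent, by_job, by_degree)
```

On these five rows the job-based split separates the two classes completely, while the degree-based split barely lowers the impurity, so a tree grown on them would be expected to test the job first.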

#### Key Components of a Decision Tree:
- Root Node - The starting point of the tree, representing the entire dataset.
- Internal Nodes - Decision points where the dataset is split based on feature conditions.
- Branches - Connections between nodes showing possible outcomes of a decision.
- Leaf Nodes - Terminal nodes that hold the final output (a class label for classification, a numeric value for regression).
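The four components above can be sketched as a small data structure. This is a hand-built, hypothetical tree meant only to show how a root node, branches and leaves fit together; the `Node` class and `predict_one` function are assumptions made for illustration and are not how scikit-learn stores its trees internally. The feature name `job_n` is the encoded job column created later in this notebook.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # column tested at a root/internal node
    threshold: Optional[float] = None  # go left if value <= threshold
    left: Optional["Node"] = None      # branch taken when the test is true
    right: Optional["Node"] = None     # branch taken when the test is false
    prediction: Optional[int] = None   # set only on leaf nodes

def predict_one(node, row):
    """Walk from the root to a leaf, following one branch per decision."""
    while node.prediction is None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction

# Hand-built, hypothetical tree: one root test and two leaves.
root = Node(feature="job_n", threshold=0.5,
            left=Node(prediction=1), right=Node(prediction=0))

print(predict_one(root, {"job_n": 0}))
print(predict_one(root, {"job_n": 2}))
```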

In [1]:
import pandas as pd 

In [2]:
df = pd.read_csv('salaries.csv')
df.head()

Unnamed: 0,company,job,degree,salary_more_then_100k
0,google,sales executive,bachelors,0
1,google,sales executive,masters,0
2,google,business manager,bachelors,1
3,google,business manager,masters,1
4,google,computer programmer,bachelors,0


In [3]:
inputs = df.drop(['salary_more_then_100k'], axis='columns')  # features (X)
target = df['salary_more_then_100k']  # label (y)

In [4]:
inputs.head()

Unnamed: 0,company,job,degree
0,google,sales executive,bachelors
1,google,sales executive,masters
2,google,business manager,bachelors
3,google,business manager,masters
4,google,computer programmer,bachelors


In [5]:
target.head()

0    0
1    0
2    1
3    1
4    0
Name: salary_more_then_100k, dtype: int64

## Now convert the string columns to numbers using LabelEncoder

In [6]:
from sklearn.preprocessing import LabelEncoder

In [7]:
le_company = LabelEncoder()
le_job = LabelEncoder()
le_degree = LabelEncoder()

In [8]:
inputs['company_n'] = le_company.fit_transform(inputs['company'])
inputs['job_n'] = le_job.fit_transform(inputs['job'])
inputs['degree_n'] = le_degree.fit_transform(inputs['degree'])

In [10]:
inputs.head()

Unnamed: 0,company,job,degree,company_n,job_n,degree_n
0,google,sales executive,bachelors,2,2,0
1,google,sales executive,masters,2,2,1
2,google,business manager,bachelors,2,0,0
3,google,business manager,masters,2,0,1
4,google,computer programmer,bachelors,2,1,0
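The codes in the output above follow from how `LabelEncoder` works: it sorts the unique labels and uses each label's position as its code. A minimal sketch, using only the three job titles visible in the rows above (the variable names `jobs`, `le` and `mapping` are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

jobs = ["sales executive", "business manager", "computer programmer"]

le = LabelEncoder()
codes = le.fit_transform(jobs)

# classes_ holds the sorted unique labels; a label's position is its code.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps codes back to the original strings.
print(list(le.inverse_transform([2, 0, 1])))
```

Note that the scikit-learn documentation describes `LabelEncoder` as intended for target values; for input features, `OrdinalEncoder` is the documented alternative and can encode several columns in one call. The per-column approach used in this notebook still works for a small example like this one.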


### Now we have to drop the original text columns company, job and degree

In [14]:
final_inputs = inputs.drop(['company','job','degree'], axis = 'columns')
final_inputs.head()

Unnamed: 0,company_n,job_n,degree_n
0,2,2,0
1,2,2,1
2,2,0,0
3,2,0,1
4,2,1,0


### Now let's split the dataset into training and test sets

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
# Hold out 20% of the rows for testing; no random_state is set, so the split differs between runs
X_train, X_test, y_train, y_test = train_test_split(final_inputs, target, test_size=0.2)
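Because the split is random, rerunning the notebook can give different test rows and a different score. A sketch of a reproducible, class-balanced split is shown below, assuming 16 made-up rows that only borrow the column names of `final_inputs`; the values are placeholders, not the contents of `salaries.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 16 rows, same column names as final_inputs.
X = pd.DataFrame({
    "company_n": [0, 1, 2, 0] * 4,
    "job_n": [0, 1, 2, 1] * 4,
    "degree_n": [0, 1] * 8,
})
y = pd.Series([0, 1] * 8, name="salary_more_then_100k")

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of 16 rows is 3.2, rounded up to 4 test rows
    random_state=42,   # fixed seed: the same rows are chosen on every run
    stratify=y,        # keep the 0/1 ratio similar in both subsets
)
print(len(X_train), len(X_test))
```

The seed value 42 is arbitrary; any fixed integer makes the split repeatable.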

### After splitting the dataset, let's train our model using a decision tree

In [22]:
from sklearn.tree import DecisionTreeClassifier

In [23]:
model = DecisionTreeClassifier()

In [28]:
model.fit(X_train,y_train)

In [29]:
model.predict(X_test)

array([1, 1, 1, 0])

In [30]:
y_test

12    1
8     0
10    1
5     1
Name: salary_more_then_100k, dtype: int64

In [31]:
model.score(X_test,y_test)

0.5
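For a classifier, `score` returns the accuracy, that is, the fraction of test rows predicted correctly. The value above can be checked by hand from the two outputs already shown; this sketch simply copies those four predictions and four true labels into arrays (the names `predicted`, `actual` and `manual` are illustrative).

```python
import numpy as np
from sklearn.metrics import accuracy_score

predicted = np.array([1, 1, 1, 0])  # output of model.predict(X_test) above
actual = np.array([1, 0, 1, 1])     # values of y_test above

# Accuracy is the share of positions where prediction and truth agree.
manual = float((predicted == actual).mean())
print(manual, accuracy_score(actual, predicted))
```

With only four test rows, each row is worth 25 percentage points, so this score is a very rough estimate and will change with the random split.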

In [32]:
# company_n=2 (google), job_n=2 (sales executive), degree_n=1 (masters)
model.predict([[2, 2, 1]])

array([0])

In [33]:
# company_n=2 (google), job_n=0 (business manager), degree_n=1 (masters)
model.predict([[2, 0, 1]])

array([1])
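To see which rules a fitted tree has learned, scikit-learn provides `export_text`. The sketch below is a stand-alone illustration: it trains a separate tree on only the five encoded rows shown by `final_inputs.head()`, so its rules are not necessarily those of the `model` fitted above on the full training set.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The five encoded rows shown by final_inputs.head() and their targets.
X = pd.DataFrame(
    [[2, 2, 0], [2, 2, 1], [2, 0, 0], [2, 0, 1], [2, 1, 0]],
    columns=["company_n", "job_n", "degree_n"],
)
y = [0, 0, 1, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned rules as indented text, one line per node.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)

# Querying with a DataFrame keeps the column names attached to the values.
query = pd.DataFrame([[2, 0, 1]], columns=X.columns)
print(tree.predict(query))
```

Passing a DataFrame with the training column names to `predict`, rather than a bare nested list, also avoids the feature-name warning that recent scikit-learn versions emit when a model was fitted on a DataFrame.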