**A decision tree is a supervised machine learning method (the training data specifies both the inputs and their corresponding outputs) in which the data is repeatedly split according to the value of a chosen feature. A tree can be described by two kinds of entities: decision nodes and leaves. The data is split at the decision nodes, and the leaves represent the final decisions or outcomes.**
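
To make the node and leaf vocabulary concrete, here is a minimal sketch of a hand-written tree stored as nested dictionaries. The rules, the `toy_tree` structure, and the `classify` helper are all invented for illustration; they are not learned from any data and are not how scikit-learn stores its trees.

```python
# A hand-written toy tree (rules invented for illustration, not learned from data).
toy_tree = {
    "feature": "degree",              # decision node: which attribute to test
    "branches": {
        "masters": {"leaf": 1},       # leaf: final prediction
        "bachelors": {
            "feature": "job",         # another decision node
            "branches": {
                "business manager": {"leaf": 1},
                "sales executive": {"leaf": 0},
            },
        },
    },
}


def classify(node, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while "leaf" not in node:
        node = node["branches"][sample[node["feature"]]]
    return node["leaf"]


print(classify(toy_tree, {"degree": "bachelors", "job": "sales executive"}))  # 0
```

Each dictionary with a `"feature"` key plays the role of a decision node, and each dictionary with a `"leaf"` key is a leaf.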

`Pandas is built on top of NumPy, which handles the numerical computation, and it integrates with matplotlib for data visualization. Pandas wraps functionality from both libraries, letting you use many NumPy and matplotlib features with minimal code.`


`The read_csv() function in pandas reads a CSV file into a DataFrame. Its header parameter defines which row is used to name the columns of the DataFrame; it expects an int or a list of ints (or None if the file has no header row). The default behaves like header=0, which means that the first row of the CSV file is used as the column names.`
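
The effect of the header parameter can be seen with a small sketch. The CSV text below is a hypothetical in-memory stand-in for a file on disk, so the example runs without any external file.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = (
    "Company,Job,Degree\n"
    "google,sales executive,bachelors\n"
    "google,business manager,masters\n"
)

# Default behavior (like header=0): the first row supplies the column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: no row is treated as a header, so columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.columns.tolist())      # ['Company', 'Job', 'Degree']
print(len(with_header), len(no_header))  # 2 3
```

With header=None the former header line becomes an ordinary data row, which is why the second frame has one more row.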

In [54]:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/DecisionTreeFile.csv')
df.head(6)

Unnamed: 0,Company,Job,Degree,Salary_100K
0,google,sales executive,bachelors,0
1,google,sales executive,masters,0
2,google,business manager,bachelors,1
3,google,business manager,masters,1
4,google,computer programmer,bachelors,0
5,google,computer programmer,masters,1


`The drop() method removes the specified labels from rows or columns. You can remove rows or columns either by giving label names together with the corresponding axis, or by naming them directly through the index or columns parameters. With a MultiIndex, labels on different levels can be removed by specifying the level.`
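
As a short sketch of the calling styles described above, the following uses a hypothetical two-row frame (`demo`) rather than the dataset from this notebook.

```python
import pandas as pd

# Hypothetical two-row frame, invented for illustration.
demo = pd.DataFrame(
    {
        "Company": ["google", "abc pharma"],
        "Degree": ["bachelors", "masters"],
        "Salary_100K": [0, 1],
    }
)

# Three equivalent ways to drop a column; each returns a new DataFrame.
a = demo.drop("Salary_100K", axis="columns")
b = demo.drop("Salary_100K", axis=1)
c = demo.drop(columns=["Salary_100K"])

# Dropping a row by its index label.
without_first_row = demo.drop(index=0)

print(a.columns.tolist())              # ['Company', 'Degree']
print("Salary_100K" in demo.columns)   # True: the original frame is unchanged
```

Because drop() returns a new object by default, the result has to be assigned to a variable, as the notebook does with `newdf`.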

In [55]:
newdf = df.drop('Salary_100K', axis='columns')  # feature columns only (target removed)
newdf

Unnamed: 0,Company,Job,Degree
0,google,sales executive,bachelors
1,google,sales executive,masters
2,google,business manager,bachelors
3,google,business manager,masters
4,google,computer programmer,bachelors
5,google,computer programmer,masters
6,abc pharma,sales executive,masters
7,abc pharma,computer programmer,bachelors
8,abc pharma,business manager,bachelors
9,abc pharma,business manager,masters


In [56]:
target = df['Salary_100K'].values
target

array([0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

`The sklearn.preprocessing package includes a number of popular utility methods and transformer classes for converting raw feature vectors into a format appropriate for downstream estimators.`

`Label encoding can be performed in Python with scikit-learn. LabelEncoder converts the levels of a categorical variable into integer codes ranging from 0 to n_classes-1, where n_classes is the number of distinct labels.`
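
A minimal sketch of what LabelEncoder produces, using only two company names. The codes depend on the set of labels the encoder has seen (they are assigned in sorted order), so the numbers here need not match those in the notebook's full dataset.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["google", "abc pharma", "google", "abc pharma"])

# classes_ holds the distinct labels in sorted order; a label's position is its code.
print(encoder.classes_.tolist())                    # ['abc pharma', 'google']
print(codes.tolist())                               # [1, 0, 1, 0]

# inverse_transform maps codes back to the original labels.
print(encoder.inverse_transform([0, 1]).tolist())   # ['abc pharma', 'google']
```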

In [57]:
from sklearn.preprocessing import LabelEncoder
object_company = LabelEncoder()
object_job = LabelEncoder()
object_degree = LabelEncoder()


`The fit_transform() method fits the encoder to the input data and transforms that same data in a single call.`
`Calling fit() and then transform() separately gives the same result; fit_transform() is simply the more convenient form when both steps are needed on the same data.`
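
The relationship between the one-step and two-step forms can be checked directly. This is a small sketch on a hypothetical list of degrees, not on the notebook's DataFrame.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical values, invented for illustration.
degrees = ["bachelors", "masters", "masters", "bachelors"]

# One step: learn the mapping and apply it to the same data.
one_step = LabelEncoder().fit_transform(degrees)

# Two steps: learn the mapping first, then apply it.
two_step_encoder = LabelEncoder()
two_step_encoder.fit(degrees)
two_step = two_step_encoder.transform(degrees)

print(np.array_equal(one_step, two_step))                # True

# A fitted encoder can later encode new rows with the same mapping.
print(two_step_encoder.transform(["masters"]).tolist())  # [1]
```

Keeping the fitted encoder around matters when new samples have to be encoded with the same mapping before calling predict().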

In [58]:
newdf['Company_n'] = object_company.fit_transform(newdf['Company'])
newdf['Job_n'] = object_job.fit_transform(newdf['Job'])
# Use the dedicated degree encoder so object_job keeps its job mapping
newdf['Degree_n'] = object_degree.fit_transform(newdf['Degree'])

In [59]:
newdf.head(16)  #dataset is of 16 rows.

Unnamed: 0,Company,Job,Degree,Company_n,Job_n,Degree_n
0,google,sales executive,bachelors,2,2,0
1,google,sales executive,masters,2,2,1
2,google,business manager,bachelors,2,0,0
3,google,business manager,masters,2,0,1
4,google,computer programmer,bachelors,2,1,0
5,google,computer programmer,masters,2,1,1
6,abc pharma,sales executive,masters,0,2,1
7,abc pharma,computer programmer,bachelors,0,1,0
8,abc pharma,business manager,bachelors,0,0,0
9,abc pharma,business manager,masters,0,0,1


In [60]:
newdf_n = newdf.drop(['Company','Job','Degree'], axis = 'columns')
newdf_n

Unnamed: 0,Company_n,Job_n,Degree_n
0,2,2,0
1,2,2,1
2,2,0,0
3,2,0,1
4,2,1,0
5,2,1,1
6,0,2,1
7,0,1,0
8,0,0,0
9,0,0,1


`Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to build a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.`

*DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1]) classification and multiclass (where the labels are [0, …, K-1]) classification.*
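
To give a feel for how a split is chosen, here is a rough sketch of the Gini impurity, which is the default split criterion of DecisionTreeClassifier. The helper functions `gini` and `weighted_gini` and the label lists are written for illustration only; scikit-learn's own implementation is more involved.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical labels, not taken from the dataset above.
parent = [0, 0, 1, 1]
print(gini(parent))                    # 0.5
print(weighted_gini([0, 0], [1, 1]))   # 0.0 (perfect split)
print(weighted_gini([0, 1], [0, 1]))   # 0.5 (uninformative split)
```

A split that lowers the weighted impurity the most is preferred, which is why the first candidate split above would be chosen over the second.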

In [61]:
from sklearn import tree
model = tree.DecisionTreeClassifier()
model.fit(newdf_n, target)

DecisionTreeClassifier()

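*The score() method returns the mean accuracy of the model on the given data and labels, that is, the fraction of samples it classifies correctly. Because the model is scored here on the same data it was trained on, a result of 100% does not by itself show how well it generalizes to unseen data.*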

In [62]:
model.score(newdf_n, target) * 100

100.0
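
For a classifier, score() reports the same number as accuracy_score applied to the model's predictions. The sketch below shows this on a held-out split. The feature rows and labels are hypothetical and invented for illustration, and the 25% test size is an arbitrary choice.

```python
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical encoded features and labels, invented for illustration.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
     [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# Hold out a quarter of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# score() is the fraction of correct predictions, the same as accuracy_score.
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy == accuracy_score(y_test, clf.predict(X_test)))  # True
print(train_accuracy, test_accuracy)
```

Comparing the training and held-out accuracies gives a more honest picture than the training score alone.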

`The predict() method of a fitted scikit-learn model predicts the labels for the given samples. It takes a single argument, the feature data to predict on, with one row per sample and the columns in the same order used for training (here Company_n, Job_n, Degree_n).`

In [63]:
model.predict([[1,0,0]])

  "X does not have valid feature names, but"


array([1])
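
The feature-names warning in the output above appears when a model fitted on a DataFrame is given a plain list at prediction time. One way to avoid it, sketched here on a small hypothetical training frame that reuses the column names from this notebook, is to wrap the query in a DataFrame with matching columns.

```python
import pandas as pd
from sklearn import tree

# Hypothetical training frame reusing the notebook's column names.
features = pd.DataFrame(
    {"Company_n": [2, 2, 0, 0], "Job_n": [2, 0, 2, 0], "Degree_n": [0, 1, 1, 0]}
)
labels = [0, 1, 0, 0]

clf = tree.DecisionTreeClassifier(random_state=0).fit(features, labels)

# A DataFrame with matching column names keeps scikit-learn from
# warning about missing feature names.
query = pd.DataFrame([[2, 0, 1]], columns=features.columns)
print(clf.predict(query))
```

The prediction itself is the same either way; the DataFrame only makes the column order explicit.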