# Decision Tree Algorithm
A decision tree is a popular supervised learning method for classification and regression tasks. It partitions the feature space into a set of regions by repeatedly splitting on feature values, and within each region it predicts the target as the majority class (classification) or the average value (regression) of the training samples that fall there. A key practical consideration:
* Decision trees are prone to overfitting, especially when the tree is deep and complex. Pruning, limiting the tree depth, and ensemble methods such as Random Forest or Gradient Boosting can help mitigate overfitting and improve generalization.
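To make the splitting step concrete, here is a minimal sketch of how a single split can be chosen with Gini impurity (the default `criterion` of scikit-learn's `DecisionTreeClassifier`). The helper names `gini` and `best_threshold` and the toy data are illustrative assumptions; this is a simplified one-feature search, not scikit-learn's implementation.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    """Try every midpoint between sorted unique values of x and return the
    threshold whose two children have the lowest weighted Gini impurity."""
    values = np.unique(x)
    best_t, best_score = None, np.inf
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return float(best_t), float(best_score)

# Toy data (made up): one numeric feature, e.g. a 1-5 rating, and a binary label.
x = np.array([1, 2, 2, 3, 4, 5, 5, 4])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(gini(y))               # 0.5 for a 50/50 class mix
print(best_threshold(x, y))  # (3.5, 0.0): x <= 3.5 separates the classes perfectly
```

A full tree applies this search recursively to each child region until a stopping rule (such as a maximum depth) is reached, which is exactly where the overfitting controls above come in.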

Import the necessary libraries


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Read the dataset

In [4]:
df = pd.read_csv('cust_satisfaction.csv')
df

| | Gender | Customer Type | Type of Travel | Class | satisfaction | Age | Flight Distance | Inflight entertainment | Baggage handling | Cleanliness | Departure Delay in Minutes | Arrival Delay in Minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Loyal Customer | Personal Travel | Eco Plus | neutral or dissatisfied | 13 | 460 | 5 | 4 | 5 | 25 | 18.0 |
| 1 | Male | disloyal Customer | Business travel | Business | neutral or dissatisfied | 25 | 235 | 1 | 3 | 1 | 1 | 6.0 |
| 2 | Female | Loyal Customer | Business travel | Business | satisfied | 26 | 1142 | 5 | 4 | 5 | 0 | 0.0 |
| 3 | Female | Loyal Customer | Business travel | Business | neutral or dissatisfied | 25 | 562 | 2 | 3 | 2 | 11 | 9.0 |
| 4 | Male | Loyal Customer | Business travel | Business | satisfied | 61 | 214 | 3 | 4 | 3 | 0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 103899 | Female | disloyal Customer | Business travel | Eco | neutral or dissatisfied | 23 | 192 | 2 | 4 | 2 | 3 | 0.0 |
| 103900 | Male | Loyal Customer | Business travel | Business | satisfied | 49 | 2347 | 5 | 5 | 4 | 0 | 0.0 |
| 103901 | Male | disloyal Customer | Business travel | Business | neutral or dissatisfied | 30 | 1995 | 4 | 4 | 4 | 7 | 14.0 |
| 103902 | Female | disloyal Customer | Business travel | Eco | neutral or dissatisfied | 22 | 1000 | 1 | 1 | 1 | 0 | 0.0 |


Display a summary of the DataFrame (column dtypes and non-null counts)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103904 entries, 0 to 103903
Data columns (total 12 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   Gender                      103904 non-null  object 
 1   Customer Type               103904 non-null  object 
 2   Type of Travel              103904 non-null  object 
 3   Class                       103904 non-null  object 
 4   satisfaction                103904 non-null  object 
 5   Age                         103904 non-null  int64  
 6   Flight Distance             103904 non-null  int64  
 7   Inflight entertainment      103904 non-null  int64  
 8   Baggage handling            103904 non-null  int64  
 9   Cleanliness                 103904 non-null  int64  
 10  Departure Delay in Minutes  103904 non-null  int64  
 11  Arrival Delay in Minutes    103594 non-null  float64
dtypes: float64(1), int64(6), object(5)
memory usage: 9.5+ MB


Count the missing (null) values in each column with `isnull().sum()`. Only `Arrival Delay in Minutes` has missing values (310).

In [7]:
df.isnull().sum()

Gender                          0
Customer Type                   0
Type of Travel                  0
Class                           0
satisfaction                    0
Age                             0
Flight Distance                 0
Inflight entertainment          0
Baggage handling                0
Cleanliness                     0
Departure Delay in Minutes      0
Arrival Delay in Minutes      310
dtype: int64
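The notebook does not go on to handle the 310 missing arrival delays. Here is a minimal sketch of two common options, dropping incomplete rows or filling the gap with the column median, on a small stand-in DataFrame whose values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the two delay columns, with two missing arrival delays.
demo = pd.DataFrame({
    "Departure Delay in Minutes": [25, 1, 0, 11, 0],
    "Arrival Delay in Minutes": [18.0, np.nan, 0.0, 9.0, np.nan],
})

# Option 1: drop every row that has any missing value.
dropped = demo.dropna()

# Option 2: fill missing arrival delays with the column median,
# which is less sensitive than the mean to a few very long delays.
filled = demo.copy()
median_delay = filled["Arrival Delay in Minutes"].median()
filled["Arrival Delay in Minutes"] = filled["Arrival Delay in Minutes"].fillna(median_delay)

print(len(dropped))                 # 3
print(filled.isnull().sum().sum())  # 0
```

With only 310 of roughly 104k rows affected, either choice is unlikely to change the picture much; filling keeps every row for training.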

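Count the duplicate rows in the dataset. This cell (In [10]) was run after `drop_duplicates` (In [9]), which is why it reports 0; the row count falling from 103,904 to 103,732 shows that 172 duplicate rows were removed.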

In [10]:
df.duplicated().sum() 

0

Remove duplicate rows from the DataFrame with `drop_duplicates`

In [9]:
df.drop_duplicates(inplace=True)  
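Counting duplicates before dropping them makes the number of removed rows explicit, regardless of the order in which cells are run. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up rows: the first two are identical.
demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Age": [13, 13, 26, 61],
})

rows_before = len(demo)
n_dupes = demo.duplicated().sum()  # rows identical to an earlier row
demo = demo.drop_duplicates()      # keeps the first occurrence by default
rows_after = len(demo)

print(n_dupes, rows_before - rows_after)  # 1 1
```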


The `shape` attribute returns the dimensions of the DataFrame as (rows, columns)

In [11]:
df.shape


(103732, 12)

Select only the categorical (object dtype) columns

In [12]:
cat_col = df.select_dtypes(include='O')
cat_col.head()


| | Gender | Customer Type | Type of Travel | Class | satisfaction |
|---|---|---|---|---|---|
| 0 | Male | Loyal Customer | Personal Travel | Eco Plus | neutral or dissatisfied |
| 1 | Male | disloyal Customer | Business travel | Business | neutral or dissatisfied |
| 2 | Female | Loyal Customer | Business travel | Business | satisfied |
| 3 | Female | Loyal Customer | Business travel | Business | neutral or dissatisfied |
| 4 | Male | Loyal Customer | Business travel | Business | satisfied |
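scikit-learn's `DecisionTreeClassifier` expects numeric input, so these string columns need encoding before training. Here is a minimal sketch, assuming one-hot encoding with `pd.get_dummies` for the features and a 0/1 mapping for `satisfaction`; ordinal encoding is an equally valid alternative for trees.

```python
import pandas as pd

# Made-up rows using three of the notebook's categorical columns.
demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Class": ["Eco Plus", "Business", "Business"],
    "satisfaction": ["neutral or dissatisfied", "neutral or dissatisfied", "satisfied"],
})

# Binary target: 1 = satisfied, 0 = neutral or dissatisfied.
y = (demo["satisfaction"] == "satisfied").astype(int)

# One-hot encode the remaining categorical features (one 0/1 column per category).
X = pd.get_dummies(demo.drop(columns="satisfaction"), dtype=int)

print(list(X.columns))
# ['Gender_Female', 'Gender_Male', 'Class_Business', 'Class_Eco Plus']
print(y.tolist())  # [0, 0, 1]
```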


Split the data into two groups by `Customer Type`

In [14]:

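# cat_col holds only the object columns, so numeric features are not carried into these subsets
loyal_customer = cat_col[cat_col['Customer Type'] == 'Loyal Customer']
disloyal_Customer = cat_col[cat_col['Customer Type'] == 'disloyal Customer']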

Randomly sample 21,000 loyal customers with `sample` so that the two customer types are closer in size (undersampling the majority group)

In [15]:
loyal_cust=loyal_customer.sample(21000) 

Combine the sampled loyal customers with all disloyal customers into a balanced dataset

In [16]:
balance_df = pd.concat([loyal_cust, disloyal_Customer], axis=0)  # stack the two groups row-wise
balance_df.head()
balance_df.shape

(39954, 5)
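The `sample(21000)` call draws a different random subset on every run, so results are not reproducible. A minimal sketch of a reusable undersampling step that fixes `random_state`; the helper name `undersample` and the toy data are hypothetical.

```python
import pandas as pd

def undersample(df, col, n_per_class, random_state=42):
    """Hypothetical helper: randomly keep at most n_per_class rows for each
    value of `col`. Smaller groups are kept whole (no replacement)."""
    parts = [
        group.sample(min(len(group), n_per_class), random_state=random_state)
        for _, group in df.groupby(col)
    ]
    return pd.concat(parts, axis=0)

# Made-up data: 8 loyal and 3 disloyal customers.
demo = pd.DataFrame({
    "Customer Type": ["Loyal Customer"] * 8 + ["disloyal Customer"] * 3,
    "Age": range(11),
})
balanced = undersample(demo, "Customer Type", n_per_class=4)
print(balanced["Customer Type"].value_counts().to_dict())
# {'Loyal Customer': 4, 'disloyal Customer': 3}
```

Because `random_state` is fixed, calling the helper twice returns the same rows, which keeps later accuracy figures comparable between runs.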

Use `select_dtypes(include='O')` to select the object (categorical) columns

In [17]:
cat_col = balance_df.select_dtypes(include='O') 
cat_col.head()
cat_col.shape

(39954, 5)

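Use `select_dtypes(exclude='O')` to select the non-object (numeric) columns. Because `balance_df` was built from `cat_col`, it has no numeric columns, so the output below contains only the row index.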

In [18]:
num_value = balance_df.select_dtypes(exclude='O')
num_value.head()

88350
64155
39217
5440
22138


In [20]:
final_df = pd.concat([cat_col, num_value], axis=1)  # join column-wise on the shared row index
final_df.head()

| | Gender | Customer Type | Type of Travel | Class | satisfaction |
|---|---|---|---|---|---|
| 88350 | Female | Loyal Customer | Business travel | Eco | neutral or dissatisfied |
| 64155 | Male | Loyal Customer | Business travel | Eco | neutral or dissatisfied |
| 39217 | Male | Loyal Customer | Business travel | Business | satisfied |
| 5440 | Female | Loyal Customer | Personal Travel | Eco | neutral or dissatisfied |
| 22138 | Male | Loyal Customer | Personal Travel | Eco | neutral or dissatisfied |
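The `axis` argument of `pd.concat` decides whether frames are stacked as extra rows (`axis=0`) or placed side by side as extra columns aligned on the index (`axis=1`). For recombining the categorical and numeric parts of the same rows, column-wise joining is what is needed. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up categorical and numeric parts that share the same row index.
cats = pd.DataFrame({"Gender": ["Male", "Female"]}, index=[88350, 64155])
nums = pd.DataFrame({"Age": [40, 31]}, index=[88350, 64155])

stacked = pd.concat([cats, nums], axis=0)  # rows appended; unmatched cells become NaN
joined = pd.concat([cats, nums], axis=1)   # columns side by side, aligned on index

print(stacked.shape)  # (4, 2)
print(joined.shape)   # (2, 2)
```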


In [19]:
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
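The notebook stops after creating the classifier. A minimal sketch of one possible continuation (encode, split, fit, evaluate) follows. Because `cust_satisfaction.csv` is not available here, it builds a synthetic frame that reuses a few of the notebook's column names; the values, the labeling rule, and `max_depth=5` are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in using some of the notebook's column names (values are made up).
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame({
    "Customer Type": rng.choice(["Loyal Customer", "disloyal Customer"], n),
    "Class": rng.choice(["Business", "Eco", "Eco Plus"], n),
    "Inflight entertainment": rng.integers(1, 6, n),
    "Cleanliness": rng.integers(1, 6, n),
})
# Made-up labeling rule so the tree has a pattern to learn.
data["satisfaction"] = np.where(
    (data["Class"] == "Business") & (data["Inflight entertainment"] >= 4),
    "satisfied", "neutral or dissatisfied")

# Encode: one-hot features, 0/1 target.
X = pd.get_dummies(data.drop(columns="satisfaction"), dtype=int)
y = (data["satisfaction"] == "satisfied").astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# max_depth caps tree growth, one of the overfitting controls mentioned at the top.
dtc = DecisionTreeClassifier(max_depth=5, random_state=42)
dtc.fit(X_train, y_train)
acc = accuracy_score(y_test, dtc.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

The synthetic label is a clean two-feature rule, so the tree recovers it almost perfectly; accuracy on the real survey data will be lower and should be compared across several `max_depth` values.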

