## Decision Tree

* A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

* Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.
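To make the "only conditional control statements" point concrete, here is a minimal sketch of a hand-written tree expressed as nested `if` statements. The function name, the features (`outlook`, `humidity`, `windy`) and the threshold of 70 are all hypothetical and chosen only for illustration; a learned tree would derive them from data.

```python
def decide_play_outside(outlook: str, humidity: float, windy: bool) -> str:
    """A hand-written decision tree: each `if` is an internal node, each return is a leaf."""
    if outlook == "sunny":
        # Internal node on the "sunny" branch: test humidity (threshold is made up).
        if humidity > 70:
            return "stay in"
        return "play"
    if outlook == "rainy":
        # Internal node on the "rainy" branch: test wind.
        if windy:
            return "stay in"
        return "play"
    # Any other outlook (e.g. "overcast") goes straight to a leaf.
    return "play"


print(decide_play_outside("sunny", 85, False))  # stay in
print(decide_play_outside("rainy", 60, False))  # play
```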

#### Types

1. **Classification Tree**: the predicted outcome is the class to which the data belongs.

2. **Regression Tree**: the predicted outcome is a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
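The two types differ mainly in what a leaf returns. As a rough sketch, assuming the common convention that a classification leaf predicts the majority class and a regression leaf predicts the mean target of the training rows that reach it (the helper names and toy values below are hypothetical):

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common class among the training rows in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(targets):
    """Predict the mean target value of the training rows in the leaf."""
    return mean(targets)


print(classification_leaf(["spam", "ham", "spam"]))   # spam
print(regression_leaf([200_000, 250_000, 300_000]))   # 250000
```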

#### Decision Tree implementations differ primarily along these axes:

1. the **splitting criterion** (i.e., how the impurity of a node is measured, e.g. entropy, Gini index, or variance)

2. whether it builds models for **regression** (continuous variables) or **classification** (discrete variables, e.g., a class label)

3. the technique used to eliminate or reduce **over-fitting** (e.g. pruning)

4. whether it can handle incomplete data

#### Family of Decision Tree Learning Algorithms

* ID3 (Iterative Dichotomiser 3)
* C4.5 (successor of ID3)
* C5.0
* CART (Classification And Regression Tree)
* CHAID (CHi-squared Automatic Interaction Detector): performs multi-level splits when computing classification trees.
* MARS (Multivariate Adaptive Regression Splines): extends decision trees to handle numerical data better.

## ID3 (Iterative Dichotomiser 3)

* Dichotomisation means dividing into two completely opposite things. At each step the algorithm divides the attributes into two groups, the most dominant attribute and the rest, to construct a tree.
* To do so, **it calculates the entropy and the information gain of each attribute.**
* The attribute with the highest information gain is the most dominant one and is placed on the tree as a decision node. Entropy and gain scores are then recalculated among the remaining attributes to find the next most dominant attribute.
* This procedure continues until a decision is reached for each branch, which is why the algorithm is called Iterative Dichotomiser.
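The entropy and information gain calculation described above can be sketched in plain Python. This is an illustrative sketch, not a full ID3 implementation: it only scores attributes for one node, and the toy `rows`, `labels` and attribute names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the node minus the weighted entropy after splitting on `attribute`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        # One group of labels per distinct value of the attribute (multiway split).
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
most_dominant = max(gains, key=gains.get)
print(gains, most_dominant)
```

In a full tree builder, the same scoring would be repeated inside each branch on the rows that reach it, using only the attributes not yet placed on that path.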

**Drawbacks:** 
* Attributes must be nominal values
* The dataset must not include missing data
* Prone to overfitting

## C4.5

C4.5 is a successor of ID3.

The new features added were: 
1. accepts both continuous and discrete features
2. handles incomplete data points
3. reduces over-fitting with a bottom-up technique usually known as "pruning"
4. different weights can be applied to the features that comprise the training data.
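One way a tree learner can accept a continuous feature is to turn it into a binary test `value <= threshold`. The sketch below tries the midpoints between consecutive distinct sorted values and keeps the one with the highest information gain. It is a simplified illustration under stated assumptions, not Quinlan's exact procedure (C4.5 scores splits with gain ratio, for instance), and the `ages` data is hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical toy data: everyone older than 25 belongs to class "yes".
ages = [22, 25, 47, 52, 46, 56]
bought = ["no", "no", "yes", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought)
print(threshold, round(gain, 4))
```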

## C5.0

**C4.5 improvements over ID3:**
* Can work with both discrete and continuous attributes
* Tolerates missing attribute values
* Pruning trees (replacing irrelevant branches with leaf nodes)

**C5.0 improvements over C4.5:**
* Several orders of magnitude faster
* Memory efficiency
* Smaller decision trees
* Boosting (more accuracy)
* Ability to weight different attributes
* Winnowing (reducing noise)

## CART
### Classification And Regression Trees

The CART implementation is very similar to C4.5.

* ID3 and C4.5 can produce multiway splits, whereas **CART always produces binary splits**.
* ID3 uses information gain and C4.5 uses gain ratio as the splitting criterion; CART uses the Gini index (for classification).
* CART is not significantly impacted by outliers in the input variables.
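As a small illustration of the Gini index mentioned above, the sketch below computes the Gini impurity of a node and the weighted impurity of a binary split. The helper names and the toy label lists are hypothetical; a real CART implementation would evaluate this for every candidate split and keep the one with the lowest weighted impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_of_split(left, right):
    """Weighted Gini impurity of the two children of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical binary split of six rows.
left = ["yes", "yes", "yes", "no"]
right = ["no", "no"]

parent_impurity = gini(left + right)
split_impurity = gini_of_split(left, right)
print(parent_impurity, split_impurity)
```

A lower weighted impurity after the split than in the parent means the split made the child nodes more homogeneous.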


The main elements of CART (and any decision tree algorithm) are:

1. Rules for splitting data at a node based on the value of one variable
2. Stopping rules for deciding when a branch is terminal and can be split no more
3. Finally, a prediction for the target variable in each terminal node.

**Drawbacks:**
* Binary splits can lead to a large tree whose rules are not easy to interpret.
* Overgrowing the tree often leads to overfitting (pruning is used to handle this).

## CHAID
### Chi-Square Automatic Interaction Detector

* In CART, an independent variable can be binary (0/1, Yes/No, Married/Unmarried, Male/Female, etc.) or continuous (e.g. salary, age, height), whereas in CHAID it can be a multi-category variable (type of house: apartment, villa, bungalow; mode of transport: bus, car, train).
* In CART, the dependent variable can be binary or continuous, whereas in CHAID it can have more than two categories or be continuous.
* CART uses the Gini index as its splitting measure, whereas CHAID uses the chi-square test or the F test.
* A key difference between the two models is that CART produces binary splits (each node has exactly two branches), whereas CHAID can produce more than two branches from a single root or parent node.

* **Pre-Pruning:** A node is only split if a significance criterion is fulfilled. Thus CHAID tries to prevent overfitting right from the start (it only splits if there is a significant association).

* Popular in marketing research, for example in market segmentation.

#### Mechanism in CHAID:

* At each split, the algorithm looks for the predictor variable that, if split, best "explains" the categorical response variable. To decide whether to create a split on this variable, CHAID tests the hypothesis of independence between the split variable and the categorical response (using the chi-squared test for independence). If, at a pre-specified significance level, the test cannot reject independence, the algorithm stops growing the tree at that node. Otherwise the split is created and the search for the next best split continues.

* In contrast, the CART algorithm chooses a split based on how homogeneous the resulting child nodes are, and later reconsiders the split during pruning to guard against over-fitting.
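The chi-squared test for independence that drives the CHAID split decision can be sketched as follows. This is a simplified illustration: it computes only the test statistic for one predictor and compares it with a standard critical value at the 0.05 level, leaving out the category merging and multiplicity adjustments that a full CHAID implementation performs. The contingency table is hypothetical, and every row and column total is assumed to be non-zero.

```python
# Standard chi-squared critical values at significance level 0.05.
CRITICAL_VALUE_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    Rows are predictor categories, columns are response classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if predictor and response were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: house type (apartment, villa, bungalow) vs. response (yes, no).
table = [
    [30, 10],
    [20, 20],
    [10, 30],
]
statistic, dof = chi_squared_statistic(table)
should_split = statistic > CRITICAL_VALUE_05[dof]
print(statistic, dof, should_split)
```

When the statistic exceeds the critical value, independence is rejected and the node is split on that predictor; otherwise growth stops at that node.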

**It appears to me that CHAID is most useful for analysis, whereas CART is more suitable for prediction.**
* CHAID should be used when the goal is to describe or understand the relationship between a response variable and a set of explanatory variables, whereas CART is better suited for creating a model that has high prediction accuracy of new cases.

## MARS
### Multivariate Adaptive Regression Splines

A Multivariate Adaptive Regression Splines (MARS) model is a regression model that is constructed automatically using an adaptive spline algorithm: it partitions the data and fits a linear regression model on each partition. MARS provides a good stepping stone into nonlinear modelling and is closely related to multiple regression techniques. MARS can be seen as an adaptation of CART that allows additive terms to be entered into the model.

In order to construct a CART decision tree, there are two main steps –
* Grow a large tree
* Prune the large tree

##### The process for constructing a MARS model is similar and involves two steps –
* Forward step (add terms to the model by generating candidate basis functions)
    * The forward stage generates basis functions and adds them to the model. As in a decision tree, each value of each input variable in the training dataset is considered as a candidate for a basis function.
    * Functions are always added in pairs, the left and right versions of the piecewise linear function at the same split point. A generated pair is only added to the model if it reduces the error made by the overall model.
* Backward step (delete basis functions from the model)
    * The backward stage selects functions to delete from the model, one at a time. A function is only removed if doing so has no impact on performance (neutral) or improves predictive performance.
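The paired piecewise linear functions used in the forward step are usually called hinge functions. The sketch below builds one left/right pair around a split point (knot) and evaluates a tiny additive model made of them. It illustrates the basis functions only, not the forward or backward search itself, and the knot, intercept and coefficients are hypothetical values rather than fitted ones.

```python
def hinge_pair(knot):
    """Return the (right, left) hinge functions for one split point."""

    def right(x):
        return max(0.0, x - knot)  # active only when x is above the knot

    def left(x):
        return max(0.0, knot - x)  # active only when x is below the knot

    return right, left


def mars_predict(x, intercept, terms):
    """Evaluate an additive model: intercept plus a weighted sum of basis functions."""
    return intercept + sum(coef * basis(x) for coef, basis in terms)


# Hypothetical model: slope +2 above the knot at 5, slope +1 towards it from below.
right, left = hinge_pair(knot=5.0)
terms = [(2.0, right), (-1.0, left)]

for x in (3.0, 5.0, 8.0):
    print(x, mars_predict(x, 10.0, terms))
```

Because each hinge is zero on one side of the knot, the two coefficients give the model a different slope on each side, which is how MARS approximates nonlinear relationships with straight-line pieces.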

#### Features of Multivariate Adaptive Regression Splines (MARS)

* Automatic variable selection: MARS automatically selects the variables used in the model via the forward and backward steps.
* Automatic nonlinear modelling: MARS automatically models nonlinear functions via piecewise linear approximations.
* Automatic variable interaction: MARS determines interactions between predictor variables.
* Automatic missing value handling: MARS handles missing values with nested variable techniques.
* Automatic numeric and categorical predictor handling: MARS handles both categorical and numeric predictors directly.

#### Use of Multivariate Adaptive Regression Splines (MARS)

* Given a target variable and a set of candidate predictor variables, MARS automates much of model development.
* MARS searches rapidly through a large space of candidate models to identify a good solution.
* MARS combines regression with a search for nonlinearities in the data, which helps to maximise the predictive accuracy of the model.
* MARS has useful features for effectively reducing the number of terms in a model.
* MARS can automatically select and transform variables and can identify potential interactions between variables.

#### MARS Python 

In [1]:
!pip install sklearn-contrib-py-earth

Collecting sklearn-contrib-py-earth
  Using cached sklearn-contrib-py-earth-0.1.0.tar.gz (1.0 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: sklearn-contrib-py-earth
  Building wheel for sklearn-contrib-py-earth (setup.py): started
  Building wheel for sklearn-contrib-py-earth (setup.py): finished with status 'error'
  Running setup.py clean for sklearn-contrib-py-earth
Failed to build sklearn-contrib-py-earth
Installing collected packages: sklearn-contrib-py-earth
  Running setup.py install for sklearn-contrib-py-earth: started
  Running setup.py install for sklearn-contrib-py-earth: finished with status 'error'


  error: subprocess-exited-with-error
  
  python setup.py bdist_wheel did not run successfully.
  exit code: 1
  
  [73 lines of output]
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.9
  creating build\lib.win-amd64-3.9\pyearth
  copying pyearth\earth.py -> build\lib.win-amd64-3.9\pyearth
  copying pyearth\export.py -> build\lib.win-amd64-3.9\pyearth
  copying pyearth\_version.py -> build\lib.win-amd64-3.9\pyearth
  copying pyearth\__init__.py -> build\lib.win-amd64-3.9\pyearth
  creating build\lib.win-amd64-3.9\pyearth\test
  copying pyearth\test\testing_utils.py -> build\lib.win-amd64-3.9\pyearth\test
  copying pyearth\test\test_earth.py -> build\lib.win-amd64-3.9\pyearth\test
  copying pyearth\test\test_export.py -> build\lib.win-amd64-3.9\pyearth\test
  copying pyearth\test\test_forward.py -> build\lib.win-amd64-3.9\pyearth\test
  copying pyearth\test\test_knot_search.py -> build\lib.win-amd64-3.9\pyearth\test
  copying 

In [2]:
!python --version

Python 3.9.5


In [3]:
pip install pyearth

Note: you may need to restart the kernel to use updated packages.Collecting pyearth
  Using cached pyearth-0.1.20-py2.py3-none-any.whl (75 kB)
Collecting cartopy
  Using cached Cartopy-0.21.0.tar.gz (10.9 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting netCDF4
  Using cached netCDF4-1.6.1-cp39-cp39-win_amd64.whl (5.2 MB)
Collecting gdal
  Using cached GDAL-3.5.2.tar.gz (756 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting pyshp>=2.1
  Using cached pyshp-2.3.1-py2.py3-none-any.whl (46 kB)
Collecting shapely<2,>=1.6.4
  Downloading Shapely-1.8.5.post1-cp39-cp39-win_amd64.whl (1.3 MB)
     ---------------------------------------- 

  error: subprocess-exited-with-error
  
  Building wheel for cartopy (pyproject.toml) did not run successfully.
  exit code: 1
  
  [279 lines of output]
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-39
  creating build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\crs.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\geodesic.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\img_transform.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\util.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\vector_transform.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\_epsg.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\_version.py -> build\lib.win-amd64-cpython-39\cartopy
  copying lib\cartopy\__init__.py -> build\lib.win-amd64-cpython-39\cartopy
  creating build\lib.win-amd64-cpython-39\cartopy\feature
  copying li

In [4]:
# check that py-earth (imported as `pyearth`) is installed
import pyearth
from pyearth import Earth
# display version
# print(pyearth.__version__)

ModuleNotFoundError: No module named 'pyearth'

In [None]:
# earth.get_versions()

In [None]:
# pip install cartopy

In [None]:
# pip install pyearth

In [None]:
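# requires a working py-earth installation (the install attempts above failed)
from pyearth import Earth

# define the model
model = Earth()

# fit the model on the training dataset (X: feature matrix, y: target vector)
model.fit(X, y)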

In [None]:
#Xnew = 
# make a prediction
yhat = model.predict(Xnew)

# print a summary of the fit model
print(model.summary())

## Conclusion

All tree methods tend to require large sample sizes for stability; multiway splits more so than binary ones. Bagging and boosting may alleviate some of this instability, at the cost of more difficult interpretation.

Reference Links:

* https://r-forge.r-project.org/projects/chaid/
* https://machinelearningcatalogue.com/algorithm/alg_decision-tree.html
* https://qsutra.com/explore/knowledge-base/multivariate-adaptive-regression-splines/
* https://machinelearningmastery.com/multivariate-adaptive-regression-splines-mars-in-python/
* https://towardsdatascience.com/mars-multivariate-adaptive-regression-splines-how-to-improve-on-linear-regression-e1e7a63c5eae

In [None]:
- Required links


In [None]:
# package mismatch error

In [None]:
# error installing module pyearth