# Project: _Project Name_

## *Type of ML project*

![](logo.jpg)

----
## Table of Contents

- [Getting Started](#Getting-Started)
    - [Version Check-In](#Version-Check-In)
- [Data Exploration](#Data-Exploration)
    - [Feature Statistics](#Feature-Statistics)
        - [Feature Describe](#Feature-Describe)
        - [Feature Skew](#Feature-Skew)
        - [Class Distribution](#Class-Distribution)
    - [Feature Visualization](#Feature-Visualization)
        - [Feature Spread](#Feature-Spread)
        - [Feature Distribution](#Feature-Distribution)
        - [Feature Comparison](#Feature-Comparison)
        - [Feature Correlation](#Feature-Correlation)
- [Data Engineering](#Data-Engineering)
    - [Observation Cleaning](#Observation-Cleaning)
        - [Handling Missing Values](#Handling-Missing-Values)
        - [Handling Duplicates](#Handling-Duplicates)
    - [Dimensionality Reduction](#Dimensionality-Reduction)
        - [Extra-Trees Classifier](#Extra-Trees-Classifier)
        - [Random Forest Classifier](#Random-Forest-Classifier)
        - [AdaBoost Classifier](#AdaBoost-Classifier)
        - [Gradient Boosting Classifier](#Gradient-Boosting-Classifier)
    - [Feature Scaling](#Feature-Scaling)
    - [Train-Test Split](#Train-Test-Split)
- [Model Evaluations](#Model-Evaluations)
    - [Model 1](#Model-1)
    - [Model 2](#Model-2)
    - [Model 3](#Model-3)
    - [Model 4](#Model-4)
    - [Model 5](#Model-5)
    - [Model 6](#Model-6)
    - [Choosing Model](#Choosing-Model)
- [Testing Model](#Testing-Model)
- [Conclusion](#Conclusion)
- [Notes](#Notes)
-----
-----

## Getting Started
_Give the necessary, important information about this project in brief._

### Version Check-In

In [None]:
# Importing the libraries required for the project
import sys # for the Python version
import numpy as np # for scientific computing
import pandas as pd # for data analysis
import matplotlib # for visualization
import seaborn as sns # for visualization
import sklearn # machine learning library
import tensorflow as tf # deep learning framework

In [None]:
print('Python: {}'.format(sys.version))  # Python version
print('numpy: {}'.format(np.__version__))  # Numpy version
print('pandas: {}'.format(pd.__version__))  # Pandas version
print('matplotlib: {}'.format(matplotlib.__version__))  # Matplotlib version
print('seaborn: {}'.format(sns.__version__))  # seaborn version
print('sklearn: {}'.format(sklearn.__version__))  # sklearn version
print('tensorflow: {}'.format(tf.__version__))  # tensorflow version

In [None]:
# No warning of any kind please!
import warnings
# will ignore any warnings
warnings.filterwarnings("ignore")

------
------

## Data Exploration

### Feature Statistics

#### Feature Describe


#### Feature Skew
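
A minimal sketch of how skewness could be checked, assuming the dataset is loaded into a pandas DataFrame. The `demo` frame and its column names are hypothetical stand-ins for the project data.

```python
import pandas as pd

# Hypothetical stand-in for the project data
demo = pd.DataFrame({
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "right_skewed": [1.0, 1.0, 1.0, 2.0, 10.0],
})

# Sample skewness per numeric column: about 0 for symmetric data,
# positive when the long tail is on the right
feature_skew = demo.skew()
print(feature_skew)
```

Features with a large absolute skew are candidates for a transformation (for example a log transform) before modelling.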

#### Class Distribution

Let's take a look at how each class is distributed.
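
A minimal sketch of one way to count the classes, assuming the target lives in a single column of a pandas DataFrame. The `demo` frame and the column name `target` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the project data; "target" is an assumed column name
demo = pd.DataFrame({"target": ["a", "a", "a", "b", "b", "c"]})

# Absolute number of observations per class
class_counts = demo["target"].value_counts()

# Share of each class, useful for spotting imbalance
class_share = demo["target"].value_counts(normalize=True).round(3)

print(pd.DataFrame({"count": class_counts, "share": class_share}))
```

A strongly uneven share is one reason to report the macro F1 score alongside accuracy, as the evaluation function in this notebook does.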

-------

### Feature Visualization

#### Feature Spread

#### Feature Distribution

#### Feature Comparison

#### Feature Correlation

--------
---------

## Data Engineering

### Observation Cleaning

#### Handling Missing Values
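
A minimal sketch of two common options, dropping or imputing, assuming the data is a pandas DataFrame. The `demo` frame is a hypothetical stand-in, and median imputation is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the project data
demo = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0, 4.0],
    "f2": [10.0, 20.0, np.nan, np.nan],
})

# Count missing values per feature
missing_counts = demo.isna().sum()
print(missing_counts)

# Option 1: drop every observation that has at least one missing value
dropped = demo.dropna()

# Option 2: impute with the column median instead of dropping
imputed = demo.fillna(demo.median())
```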

#### Handling Duplicates
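
A minimal sketch of how exact duplicate rows could be found and removed, assuming the data is a pandas DataFrame. The `demo` frame is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in for the project data
demo = pd.DataFrame({
    "f1": [1, 1, 2, 3],
    "f2": ["x", "x", "y", "z"],
})

# Number of rows that exactly repeat an earlier row
n_duplicates = int(demo.duplicated().sum())
print("Duplicate rows:", n_duplicates)

# Keep the first occurrence of each row and renumber the index
deduplicated = demo.drop_duplicates().reset_index(drop=True)
```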

------

### Dimensionality Reduction

#### Extra-Trees Classifier

In [None]:
# importing model for feature importance
from sklearn.ensemble import ExtraTreesClassifier

# instantiating the model
model = ExtraTreesClassifier(random_state = 53)

# assigning all features (every column except the last) to 'X'
X = data.iloc[:,:-1]
# assigning the target variable to 'y' (fill in the target column name)
y = data['']

# training the model
model.fit(X, y)

# extracting feature importance from model and making a dataframe of it in descending order
ETC_feature_importances = pd.DataFrame(model.feature_importances_, index = X.columns, columns=['ETC']).sort_values('ETC', ascending=False)

# removing traces of this model
model = None

# show top 10 features
ETC_feature_importances.head(10)

#### Random Forest Classifier

In [None]:
# importing model for feature importance
from sklearn.ensemble import RandomForestClassifier

# instantiating the model
model = RandomForestClassifier(random_state = 53)

# training the model
model.fit(X, y)

# extracting feature importance from model and making a dataframe of it in descending order
RFC_feature_importances = pd.DataFrame(model.feature_importances_, index = X.columns, columns=['RFC']).sort_values('RFC', ascending=False)

# removing traces of this model
model = None

# show top 10 features
RFC_feature_importances.head(10)

#### AdaBoost Classifier

In [None]:
# importing model for feature importance
from sklearn.ensemble import AdaBoostClassifier

# instantiating the model
model = AdaBoostClassifier(random_state = 53)

# training the model
model.fit(X, y)

# extracting feature importance from model and making a dataframe of it in descending order
ADB_feature_importances = pd.DataFrame(model.feature_importances_, index = X.columns, columns=['ADB']).sort_values('ADB', ascending=False)

# removing traces of this model
model = None

# show top 10 features
ADB_feature_importances.head(10)

#### Gradient Boosting Classifier

In [None]:
# importing model for feature importance
from sklearn.ensemble import GradientBoostingClassifier

# instantiating the model
model = GradientBoostingClassifier(random_state = 53)

# training the model
model.fit(X, y)

# extracting feature importance from model and making a dataframe of it in descending order
GBC_feature_importances = pd.DataFrame(model.feature_importances_, index = X.columns, columns=['GBC']).sort_values('GBC', ascending=False)

# removing traces of this model
model = None

# show top 10 features
GBC_feature_importances.head(10)
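
The four cells above each produce their own ranking. One possible way to turn them into a reduced feature set is to put the importances side by side and rank by their mean. The sketch below does this on synthetic data from `make_classification`; the names `demo_X`, `top_k` and `X_reduced`, the smaller `n_estimators`, and the mean-based ranking itself are assumptions for illustration, not a method this template prescribes.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)

# Synthetic stand-in for the project data (hypothetical feature names)
features, target = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=53
)
demo_X = pd.DataFrame(features, columns=[f"f{i}" for i in range(8)])

models = {
    "ETC": ExtraTreesClassifier(n_estimators=25, random_state=53),
    "RFC": RandomForestClassifier(n_estimators=25, random_state=53),
    "ADB": AdaBoostClassifier(n_estimators=25, random_state=53),
    "GBC": GradientBoostingClassifier(n_estimators=25, random_state=53),
}

# One column of importances per model, indexed by feature name
importances = pd.DataFrame(
    {name: m.fit(demo_X, target).feature_importances_ for name, m in models.items()},
    index=demo_X.columns,
)

# Rank features by their mean importance across the four models
importances["mean"] = importances.mean(axis=1)
importances = importances.sort_values("mean", ascending=False)

# Keep only the top_k features (top_k is an arbitrary choice here)
top_k = 4
X_reduced = demo_X[importances.index[:top_k]]
print(importances.round(3))
```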

---------

### Feature Scaling
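
A minimal sketch of standardization with scikit-learn's `StandardScaler`. The arrays `demo_train` and `demo_test` are hypothetical stand-ins. The sketch fits the scaler on training data only, which assumes the train-test split has already been made; that ordering is an assumption of this example, chosen so that no statistics from the test set leak into training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in features on very different scales
demo_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
demo_test = np.array([[2.0, 400.0]])

scaler = StandardScaler()

# Learn mean and standard deviation from the training data only
train_scaled = scaler.fit_transform(demo_train)

# Reuse the training statistics on unseen data
test_scaled = scaler.transform(demo_test)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```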

--------

### Train-Test Split
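
The evaluation and testing cells later in this notebook expect `X_train`, `X_test`, `y_train` and `y_test` to exist. A minimal sketch of how they could be created with `train_test_split`, using synthetic data in place of the project's `X` and `y`; the 20% test size and the stratified split are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project's X and y
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=53)

# Hold out 20% for testing; stratify keeps class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=53
)
print(X_train.shape, X_test.shape)
```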

-------
-------

## Model Evaluations

In [None]:
# defining a function for training models and measuring performance

# to measure performance
from sklearn.model_selection import cross_val_score

# for calculating time elapsed
import time

# function
def model_evaluation(clf):

    # records time
    t_start = time.time()
    # classifier learning the model
    clf = clf.fit(X_train, y_train)
    # records time
    t_end = time.time()


    # records time
    c_start = time.time()
    # 10-fold cross-validation on the training data gives the performance measures
    accuracy = cross_val_score(clf, X_train, y_train, cv = 10, scoring = 'accuracy')
    f1_scores = cross_val_score(clf, X_train, y_train, cv = 10, scoring = 'f1_macro')
    # records the time
    c_end = time.time()


    # calculating the mean accuracy and f1 over all 10 folds, as a percentage rounded to two decimal places
    acc_mean = np.round(accuracy.mean() * 100, 2)
    f1_mean = np.round(f1_scores.mean() * 100, 2)


    # subtracts the start time from the end time to give the time taken in seconds,
    # divides by 60 to convert to minutes and rounds the answer to three decimal places
    # time in training
    t_time = np.round((t_end - t_start) / 60, 3)
    # time for evaluating scores
    c_time = np.round((c_end - c_start) / 60, 3)


    # removing traces of classifier
    clf = None


    # prints the performance measures and timings of the classifier
    print("The accuracy score of this classifier on our training set is", acc_mean,"% and f1 score is", f1_mean,"% taking", t_time,"minutes to train and", c_time,
          "minutes to evaluate cross validation and metric scores.")

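The function above runs cross-validation twice, once per metric, and prints its results. A hedged alternative sketch: `cross_validate` can score several metrics in a single pass, and returning a dictionary makes it easier to fill in a comparison table afterwards. The helper name `evaluate_once`, the synthetic data and the `LogisticRegression` example are assumptions, not part of this template.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for X_train and y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=53)


def evaluate_once(clf, X, y, cv=10):
    """Cross-validate clf once and return mean accuracy and macro F1 in percent."""
    start = time.time()
    scores = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
    elapsed_min = (time.time() - start) / 60
    return {
        "accuracy": round(scores["test_accuracy"].mean() * 100, 2),
        "f1": round(scores["test_f1_macro"].mean() * 100, 2),
        "eval_minutes": round(elapsed_min, 3),
    }


result = evaluate_once(LogisticRegression(max_iter=1000), X_demo, y_demo)
print(result)
```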
---------

### Model 1

### Model 2

### Model 3

### Model 4

### Model 5

### Model 6

------

### Choosing Model

Of the six models evaluated above and the benchmark model, which performs best? Let's compare all of their scores in the table below:

| Model | Accuracy (%) | F1 Score (%) | Train Time (min) | Evaluation Time (min) |
| ----- | ------------ | ------------ | ---------------- | --------------------- |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |

------
------

## Testing Model

In [None]:
# importing the classifier and the evaluation metrics for measuring model performance
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# defining the best chosen classifier
clf = RandomForestClassifier(n_estimators = 50, random_state = 53)

# training our model
clf = clf.fit(X_train, y_train)

# predicting unseen data
predict = clf.predict(X_test)

# calculating accuracy
accuracy = accuracy_score(y_test, predict)

# calculating f1 score (stored as 'f1' so the imported f1_score function is not overwritten)
f1 = f1_score(y_test, predict, average = 'macro')

# taking percentage and rounding to 3 places
accuracy = np.round(accuracy * 100, 3)
f1 = np.round(f1 * 100, 3)

# cleaning traces
clf = None

# results
print("The accuracy score of our final model Random Forest Classifier on our testing set is", accuracy,"% and f1 score is", f1,"%.")

------
------

## Conclusion

------
------

## Notes

------
------