## AIML Module Project - ENSEMBLE TECHNIQUES

- Learner support material

1. Import and warehouse data: 
    - Import all the given datasets from the MySQL server. Explore shape and size.
    - Merge all datasets into one and explore the final shape and size.

2. Data cleansing: 
    - Missing value treatment
    - Convert categorical attributes to numerical ones using relevant functional knowledge
    - Drop attribute/s if required using relevant functional knowledge
    - Automate all the above steps

3. Data analysis & visualisation: 
    - Perform detailed statistical analysis on the data.
    - Perform a detailed univariate, bivariate and multivariate analysis with appropriate detailed comments after each analysis. 

4. Data pre-processing: 
    - Segregate predictors vs target attributes
    - Check for target balancing and fix it if found imbalanced.
    - Perform train-test split.
    - Check if the train and test data have similar statistical characteristics when compared with original data.

5. Model training, testing and tuning: 
    - Train and test all ensemble models taught in the learning module.
        - Suggestion: Use the standard ensembles available. You can also design your own ensemble technique using weak classifiers.
    - Display the classification accuracies for train and test data.
    - Apply all the possible tuning techniques to train the best model for the given data. 
        - Suggestion: Try all possible hyperparameter combinations to extract the best accuracies.
    - Display and compare all the models designed with their train and test accuracies.
    - Select the final best trained model along with your detailed comments for selecting this model. 
    - Pickle the selected model for future use.

6. GUI development: 
    - Design a clickable GUI desktop application or web service application.
    - This GUI should allow the user to input all future values and, on a click, pass these values to the trained model above to get a prediction.
    - It should display the prediction.

7. Conclusion and improvisation: 
    - Write your conclusion on the results.
    - Give detailed suggestions for improving the quality, quantity, variety, velocity, veracity, etc. of the data points collected by the telecom operator, so that better data analysis can be performed in future.

#### 1. Import and warehouse data: 

    - Import all the given datasets from the MySQL server.
    - Explore shape and size.
    - Merge all datasets into one and explore the final shape and size.
      - Reference link: https://dev.mysql.com/doc/connector-python/en/connector-python-examples.html
      - Recommended steps
          1. Download and install MySQL
          2. Download and install MySQL Workbench
          3. Use the Table Data Import Wizard in MySQL Workbench to import the datasets (it reads CSV/JSON, so save Excel files as CSV first)
          4. Install and import Python's MySQL connector library
          5. Establish a connection from Python to MySQL
          6. Import both MySQL tables as pandas DataFrames
          7. Merge the pandas DataFrames into one single DataFrame
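The recommended steps above can be sketched in Python as follows. This is a minimal sketch, not a reference solution. It assumes the `mysql-connector-python` package, placeholder credentials, and that both tables share a hypothetical key column (such as `customerID`) with one row per customer. pandas officially supports SQLAlchemy connectables and `sqlite3` connections in `read_sql`, so a raw `mysql.connector` connection may trigger a `UserWarning`, although the query typically still runs.

```python
import pandas as pd


def connect_mysql(host, user, password, database):
    """Open a MySQL connection (needs: pip install mysql-connector-python)."""
    # Lazy import: the helpers below also work with other DB-API connections
    import mysql.connector

    return mysql.connector.connect(
        host=host, user=user, password=password, database=database
    )


def load_tables(conn, table_names):
    """Read each table into a DataFrame and print its shape and size."""
    frames = {}
    for name in table_names:
        # Table names are interpolated directly, so they must be trusted identifiers
        df = pd.read_sql(f"SELECT * FROM {name}", conn)
        print(f"{name}: shape={df.shape}, size={df.size}")
        frames[name] = df
    return frames


def merge_tables(left, right, key):
    """Inner-join two DataFrames on a shared key and report the merged shape and size."""
    # validate= raises a MergeError if the key is not unique on both sides
    merged = left.merge(right, on=key, how="inner", validate="one_to_one")
    print(f"merged: shape={merged.shape}, size={merged.size}")
    return merged
```

Usage with placeholder names: `conn = connect_mysql("localhost", "root", "<password>", "<database>")`, then `frames = load_tables(conn, ["table_1", "table_2"])` and `merge_tables(frames["table_1"], frames["table_2"], key="customerID")`. Choose an inner join when only customers present in both tables are useful. Use an outer join if you want to inspect rows that fail to match.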

#### 2. Data cleansing: 
    - Missing value treatment
        - Reference link 1: https://www.kaggle.com/dansbecker/handling-missing-values
        - Reference link 2: https://towardsdatascience.com/data-cleaning-with-python-and-pandas-detecting-missing-values-3e9c6ebcf78b
        - Reference link 3: https://towardsdatascience.com/how-to-deal-with-missing-data-in-python-1f74a9112d93
    - Convert categorical attributes to numerical ones using relevant functional knowledge
        - Label encoder
        - Dummy variables (pandas `get_dummies` function)
        - One-hot encoding
        - Functional replacement using a dictionary
    - Drop attribute/s if required using relevant functional knowledge
    - Automate all the above steps
        - Put all/some of the above steps under a function
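The cleaning steps above can be automated in a single function. The sketch below is minimal and makes several assumptions. Blank strings are treated as missing. Numeric gaps get the median and categorical gaps get the mode. Any column not mapped by a dictionary is one-hot encoded. The `clean_data` name and the column names in the test data (`customerID`, `gender`, `Churn`) are hypothetical placeholders for the real dataset's columns.

```python
import numpy as np
import pandas as pd


def clean_data(df, drop_cols=(), value_maps=None):
    """Hypothetical automated cleaning: drop, impute, map and one-hot encode."""
    # 1. Drop attributes judged irrelevant (e.g. a unique ID column)
    df = df.drop(columns=list(drop_cols), errors="ignore").copy()

    # 2. Treat blank or whitespace-only strings as missing values
    df = df.replace(r"^\s*$", np.nan, regex=True)

    # 3. Missing value treatment: median for numeric, mode for categorical
    for col in df.columns:
        if df[col].isna().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                df[col] = df[col].fillna(df[col].mode().iloc[0])

    # 4. Functional replacement using a dictionary (e.g. Yes/No -> 1/0)
    for col, mapping in (value_maps or {}).items():
        df[col] = df[col].map(mapping)

    # 5. One-hot encode every remaining non-numeric column
    text_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    return pd.get_dummies(df, columns=text_cols, drop_first=True, dtype=int)
```

Setting `drop_first=True` drops one dummy per category, which avoids perfectly collinear columns. This matters mainly for linear models. For tree-based ensembles, keeping every dummy is equally acceptable.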

#### 3. Data analysis & visualisation: 

    - Perform detailed statistical analysis on the data.
        - Use statistical summary to analyse the feature patterns
        - Use any other statistical/formula based graphs/visualisations
    - Perform a detailed univariate, bivariate and multivariate analysis with appropriate detailed comments after each analysis. 
        - Identify the top individual features that are functionally/numerically relevant to the target and perform univariate analysis.
        - Identify the top individual features that are functionally/numerically relevant to the target and perform bivariate analysis.
        - Identify the top individual features that are functionally/numerically relevant to the target and perform multivariate analysis.

        - Hint: Use the power of plotly - https://plotly.com/python/
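Each level of analysis can start from a small pandas helper. This is a sketch under some assumptions. It expects the target to be numeric already (for example, churn mapped to 0/1 during cleaning), and every function name is hypothetical. The plotly helper imports `plotly.express` only when called, so the other helpers work without plotly installed.

```python
import pandas as pd


def univariate_summary(df):
    """Statistical summary plus skewness for every numeric feature."""
    summary = df.describe().T
    summary["skew"] = df.select_dtypes("number").skew()
    return summary


def target_rate_by_category(df, feature, target):
    """Bivariate view: share of each target class within each category of a feature."""
    return pd.crosstab(df[feature], df[target], normalize="index")


def correlation_with_target(df, target):
    """Multivariate screen: numeric features ranked by absolute correlation with the target."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


def plot_feature_vs_target(df, feature, target):
    """Interactive histogram of a feature, coloured by target (needs: pip install plotly)."""
    import plotly.express as px

    return px.histogram(df, x=feature, color=target, barmode="group")
```

Correlation only captures linear, pairwise relationships. Treat `correlation_with_target` as a first screen for choosing features to plot, not as a final ranking.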

#### 4. Data pre-processing: 
    - Segregate predictors vs target attributes
    - Check for target balancing and fix it if found imbalanced.
        - Option 1: Up-sampling
        - Option 2: Down-sampling
        - Option 3: No change
        - Reference: Kaggle notebook by a Great Learning AIML alumnus - https://www.kaggle.com/saurav9786/feature-engineering-up-and-down-sampling
        - Reference link: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
    - Perform train-test split.
    - Check if the train and test data have similar statistical characteristics when compared with original data.
        - Compare and check if ORIGINAL vs TRAIN vs TEST datasets have similar statistical characteristics
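The pre-processing steps above can be sketched with scikit-learn's `train_test_split` and `resample`. The list places balancing before the split, but this sketch deliberately reverses that order. It splits first, with stratification, and then up-samples only the training set, so duplicated minority rows cannot end up in both train and test. The split ratio, random seed and function names are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def stratified_split(df, target, test_size=0.3, random_state=1):
    """Segregate predictors from the target, report balance, then split with stratification."""
    X, y = df.drop(columns=[target]), df[target]
    print("Target distribution:\n", y.value_counts(normalize=True))
    # stratify=y keeps the class ratio the same in train and test
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=random_state)


def upsample_minority(X_train, y_train, random_state=1):
    """Up-sample every minority class (with replacement) to the majority class size."""
    target = y_train.name
    train = pd.concat([X_train, y_train], axis=1)
    counts = y_train.value_counts()
    majority_label = counts.idxmax()
    parts = [train[train[target] == majority_label]]
    for label in counts.index.drop(majority_label):
        parts.append(resample(train[train[target] == label], replace=True,
                              n_samples=int(counts.max()), random_state=random_state))
    # Shuffle so the duplicated minority rows are not grouped together
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    return balanced.drop(columns=[target]), balanced[target]


def compare_stats(original, train, test):
    """Side-by-side mean and std of numeric columns: ORIGINAL vs TRAIN vs TEST."""
    parts = {name: frame.describe().T[["mean", "std"]]
             for name, frame in [("original", original), ("train", train), ("test", test)]}
    return pd.concat(parts, axis=1)
```

Run `compare_stats` on the predictors before up-sampling, for example `compare_stats(X, X_train, X_test)`. The duplicated rows added by `upsample_minority` deliberately shift the training statistics.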

#### 5. Model training, testing and tuning: 
    - Train and test all ensemble models taught in the learning module.
        - Suggestion: Use the standard ensembles available. You can also design your own ensemble technique using weak classifiers.

        Hint:

        Recommended: Use standard ensemble techniques

        +

        Extra mile: Design and train individual algorithms such as logistic regression (LOGR), SVM, KNN and Naive Bayes (NB). Pass the test data through each one, collect the predicted targets and take a majority vote; the majority class is the final predicted target.

    - Display the classification accuracies for train and test data.
    - Apply all the possible tuning techniques to train the best model for the given data.
        - Suggestion: Try all possible hyperparameter combinations to extract the best accuracies.
        Hint: In Jupyter, press SHIFT + TAB with the cursor on an sklearn classifier to see all details about its hyperparameters.
    - Display and compare all the models designed with their train and test accuracies.
    - Select the final best trained model along with your detailed comments for selecting this model. 
    - Pickle the selected model for future use.
        - Reference link: https://www.geeksforgeeks.org/understanding-python-pickling-example/
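The training and comparison steps above, including the "extra mile" voting idea, can be sketched with scikit-learn. `VotingClassifier(voting="hard")` implements exactly the majority vote described in the hint. The choice of models, the default hyperparameters and the scaling pipelines for LOGR/SVM/KNN are assumptions, as are the helper names `build_models` and `compare_models`.

```python
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=1):
    """Standard ensembles plus a hand-built hard-voting ensemble."""
    # Distance/margin-based learners are scaled; Naive Bayes is left as-is
    voters = [
        ("logr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("nb", GaussianNB()),
    ]
    return {
        "Bagging": BaggingClassifier(DecisionTreeClassifier(), random_state=random_state),
        "RandomForest": RandomForestClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "GradientBoosting": GradientBoostingClassifier(random_state=random_state),
        # Each voter predicts a class; the majority class is the final prediction
        "HardVoting": VotingClassifier(estimators=voters, voting="hard"),
    }


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit every model and tabulate train and test accuracy, best test accuracy first."""
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        rows.append({"model": name,
                     "train_acc": model.score(X_train, y_train),
                     "test_acc": model.score(X_test, y_test)})
    return pd.DataFrame(rows).sort_values("test_acc", ascending=False).reset_index(drop=True)
```

A large gap between `train_acc` and `test_acc` (for example, a fully grown random forest scoring near 1.0 on train) usually signals overfitting. Such a gap is worth commenting on when you select the final model.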

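Tuning and pickling can be sketched with `GridSearchCV`, which exhaustively tries every combination in a grid, and with the standard `pickle` module. The random forest, the grid values and the 5-fold cross-validation are illustrative assumptions, and the helper names are hypothetical. A real grid should be built from the hyperparameters that SHIFT + TAB reveals for the chosen model.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X_train, y_train, param_grid=None, random_state=1):
    """Exhaustive grid search; returns the best model refitted on all training data."""
    if param_grid is None:
        # Illustrative grid only: 2 x 2 x 2 = 8 combinations
        param_grid = {
            "n_estimators": [50, 100],
            "max_depth": [None, 5],
            "min_samples_leaf": [1, 5],
        }
    search = GridSearchCV(RandomForestClassifier(random_state=random_state),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    print("best params:", search.best_params_, "cv accuracy:", round(search.best_score_, 3))
    return search.best_estimator_


def save_model(model, path):
    """Pickle the selected model for future use."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled model; only unpickle files you trust, as loading can run code."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

When the grid becomes too large to search exhaustively, scikit-learn's `RandomizedSearchCV` samples a fixed number of combinations instead.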
#### 6. GUI development: 
    - Design a clickable GUI desktop application or web service application.
    - This GUI should allow the user to input all future values and, on a click, pass these values to the trained model above to get a prediction.
    - It should display the prediction.

# Import library
import tkinter as tk

# App window
root = tk.Tk()
root.title("AIML - EST MODULE PROJECT - GUI")

# Static text in row 0, column 0
text = tk.Label(root, text="TKINTER TUTORIAL - KRISHNAV DAVE")
text.grid(column=0, row=0)


def button_function():
    """Change the label text when the button is clicked."""
    text.configure(text="GUI REACTION TO THE BUTTON !!")


# Button in row 1, column 1
button = tk.Button(root, text="Click Me", command=button_function)
button.grid(column=1, row=1)

# Start the event loop; this blocks until the window is closed
root.mainloop()
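The tutorial cell only reacts to a click. The form the brief asks for also needs one input per feature and a prediction on click. The sketch below makes three assumptions. The model was pickled as in step 5, every feature is already numeric after cleaning, and the user types the encoded values. `predict_from_inputs` and `build_gui` are hypothetical helper names. The Tkinter import sits inside `build_gui`, so the prediction helper can be used and tested without a display.

```python
import pandas as pd


def predict_from_inputs(model, feature_names, raw_values):
    """Convert the text typed into the form to floats and return one prediction."""
    row = pd.DataFrame([[float(value) for value in raw_values]], columns=feature_names)
    return model.predict(row)[0]


def build_gui(model, feature_names):
    """Build a Tkinter form with one Entry per feature and a Predict button."""
    import tkinter as tk

    root = tk.Tk()
    root.title("AIML - EST MODULE PROJECT - GUI")
    entries = []
    for row, name in enumerate(feature_names):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
        entry = tk.Entry(root)
        entry.grid(row=row, column=1)
        entries.append(entry)

    result = tk.Label(root, text="Prediction: -")
    result.grid(row=len(feature_names) + 1, column=0, columnspan=2)

    def on_predict():
        # float() raises ValueError on empty or non-numeric input
        try:
            prediction = predict_from_inputs(model, feature_names, [e.get() for e in entries])
            result.configure(text=f"Prediction: {prediction}")
        except ValueError:
            result.configure(text="Please enter a numeric value in every field")

    tk.Button(root, text="Predict", command=on_predict).grid(
        row=len(feature_names), column=0, columnspan=2)
    return root
```

To use it, load the pickled model with `pickle.load` and call `build_gui(model, list(X_train.columns)).mainloop()`. Here `X_train` stands for the training predictors and is a placeholder. Passing the training column names keeps the input order identical to the order the model was fitted on.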


#### 7. Conclusion and improvisation: 
    - Write your conclusion on the results.
    - Give detailed suggestions for improving the quality, quantity, variety, velocity, veracity, etc. of the data points collected by the telecom operator, so that better data analysis can be performed in future.
        - Think: Was the data enough?
                 Was the data relevant?
                 If more data could have been collected, which data would have enhanced your model's performance?

### Some pointers: 

- Learning and innovation have no boundaries; they are directly proportional to each other.
- Put your best possible effort and analysis into solving the problem above.
- Highlight all of your assumptions and decisions alongside your code, to help the moderator trace your thinking while evaluating your code and approach.