# Advanced Classification Solution
### EDSA - Climate Change Belief Analysis 2022 
#### RecycleStats Solutions - Team 12 EDSA

© Explore Data Science Academy

## Objectives

Many companies are built around lessening one’s environmental impact or carbon footprint. They offer products and services that are environmentally friendly and sustainable, in line with their values and ideals. They would like to determine how people perceive climate change and whether or not they believe it is a real threat. This would add to their market research efforts in gauging how their product/service may be received.

Hence, we will create a machine learning model that classifies whether or not a person believes in climate change, based on their novel tweet data.

Providing an accurate and robust solution to this task gives companies access to a broad base of consumer sentiment, spanning multiple demographic and geographic categories - thus increasing their insights and informing future marketing strategies.

## Outline
We will cover the following areas in order to achieve this objective:
- Importing Packages
- Collect Data
    - Load Data
    - Review Data to gain insight into the nature of the Dataset
- Clean Data
    - Ensure Correct Formatting
    - Text Cleaning (Removing Noise, Tokenisation, Stemming, Removing Stop Words, Lemmatisation)
    - Text Feature Extraction 
        - n-grams
        - Bag of words
    - Name file appropriately
- Exploratory Data Analysis (EDA)
    - Missing Data Check
    - Data Range Check
    - Check for Outliers
    - Collinearity & Multicollinearity
- Data Engineering
    - Standardization
    - Feature Selection
    - Train/Test Split
- Modeling
- Model Performance
    - Model Testing
    - Model Selection
- Model Explanations
- Model Deployment
    - To GitHub/Kaggle/Comet/AWS EC2
- Conclusion

## Introduction

...........................

So let's get started.

<a id="cont"></a>
## Table of Contents

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Collect Data</a>

<a href=#three>3. Clean Data</a>

<a href=#four>4. Exploratory Data Analysis (EDA)</a>

<a href=#five>5. Data Engineering</a>

<a href=#six>6. Modeling</a>

<a href=#seven>7. Model Performance</a>

<a href=#eight>8. Model Explanations</a>

<a href=#nine>9. Model Deployment</a>

<a href=#ten>10. Conclusion</a>

<a href=#eleven>11. Recommendation</a>

<a href=#ref>Reference Document Links</a>

<a id="one"></a>
## 1. Importing Packages
<a href=#cont>Back to Table of Contents</a>

In this section we import, and briefly discuss, the libraries used throughout our analysis and modelling.

#### 1.1 Ensure you've got NLTK Corpora installed
Some of the `nltk` text processing steps involve a lookup operation. For example, to find all [stopwords](https://www.geeksforgeeks.org/removing-stop-words-nltk-python/) in a given string of text, we need a list of all possible stopwords in the English language to look them up against. Such a list is referred to as a [corpus](https://en.wikipedia.org/wiki/Text_corpus). Therefore, first download the corpora we are going to use if you don't have them installed; otherwise you may get a lookup error! Watch out specifically for the `tokenize` and `stopwords` sections. Not to worry: we can easily avoid these errors by downloading the [corpora](http://www.nltk.org/nltk_data/) using the `nltk` downloader tool:

In [None]:
# If you don't have the NLTK corpora installed, remove the hashtag below and run this cell;
# nltk.download() opens the interactive NLTK downloader.
import nltk

#nltk.download()

You should see this pop-up box. 

**NOTE:** the box might pop up in the background, in which case you should use `alt + tab` to switch to the downloader window.

<img src="https://github.com/Explore-AI/Pictures/blob/master/nltk_downloader.png?raw=true" width=50%/> 

Use it to navigate to the items we need to download:
- stopwords corpus (Corpora tab)
- punkt tokenizer models (Models tab)
- wordnet corpus (Corpora tab), needed by `WordNetLemmatizer`

Navigate to these, click the download button, and exit the downloader when finished.
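As an alternative to the GUI, a small check like the one below reports which resources are still missing. This is a sketch: the helper name `missing_nltk_resources` is hypothetical, and including `wordnet` assumes `WordNetLemmatizer` will be used. The resource paths are the ones `nltk.data.find` looks up.

```python
import nltk

# Package id (as accepted by nltk.download) -> resource path (as accepted by nltk.data.find).
# "wordnet" is included on the assumption that WordNetLemmatizer will be used.
REQUIRED_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
    "wordnet": "corpora/wordnet",
}


def missing_nltk_resources(required=REQUIRED_RESOURCES):
    """Return the package ids in `required` that nltk cannot find locally."""
    missing = []
    for package_id, resource_path in required.items():
        try:
            nltk.data.find(resource_path)
        except LookupError:
            # nltk raises LookupError when the resource is on none of the nltk.data.path entries
            missing.append(package_id)
    return missing
```

Each returned id can be passed to `nltk.download(package_id, quiet=True)` to fetch only what is missing, without opening the downloader window. Recent NLTK releases may ask for `punkt_tab` instead of `punkt` when `word_tokenize` runs; the `LookupError` message names the exact resource to download.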

#### 1.2 Loading Libraries

In [None]:
# All imports are done here

# Libraries for data loading, data manipulation and data visualisation
import pandas as pd                                                   # for loading CSV data
import numpy as np                                                    # for mathematical operations
import matplotlib.pyplot as plt                                       # for graphical representation
# render matplotlib figures inside the notebook
%matplotlib inline
import seaborn as sns                                                 # for specialised statistical plots
import re                                                             # regular expressions for text cleaning
sns.set()                                                             # set plot style

# Libraries for data preparation
from nltk.corpus import stopwords                                     # stop-word lists (needs the stopwords corpus)
import string                                                         # string.punctuation for punctuation removal
from nltk.tokenize import word_tokenize, TreebankWordTokenizer        # tokenisers (word_tokenize needs punkt)
from nltk.stem import SnowballStemmer, PorterStemmer, LancasterStemmer  # stemmers
from nltk.stem import WordNetLemmatizer                               # lemmatiser (needs the wordnet corpus)
from nltk.util import ngrams                                          # n-gram generation
from statsmodels.graphics.correlation import plot_corr                # to plot a correlation heatmap
from sklearn.preprocessing import StandardScaler                      # for standardising features

# Libraries for Model Building
from sklearn.model_selection import train_test_split                  # to split the data into training and testing sets

# Libraries for calculating performance metrics


# Libraries to Save/Restore Models
import pickle                                                         # to save and restore trained models

# Setting global constants to ensure notebook results are reproducible


<a id="two"></a>
## 2. Collect Data
<a href=#cont>Back to Table of Contents</a>

In [None]:
# Load Data
#df = pd.read_csv()

In [None]:
# View Dataset
#df.head()

#### A brief overview of the dataset

............

<a id="three"></a>
## 3. Clean Data
<a href=#cont>Back to Table of Contents</a>

Let's get the data and clean it up a bit

#### 3.1 Format Review

#### 3.2 Text Cleaning

In [None]:
# Removing Noise
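A minimal noise-removal sketch for tweets, assuming that "noise" means HTML entities, a leading `RT` retweet marker, URLs and `@mentions`; hashtag words are kept and only the `#` is dropped. The function name `remove_noise` and all patterns are illustrative choices rather than a fixed recipe.

```python
import html
import re

# Illustrative patterns; what counts as "noise" is an assumption about this dataset.
RETWEET_PATTERN = re.compile(r"^RT\s+")            # leading retweet marker
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
HASHTAG_PATTERN = re.compile(r"#(\w+)")            # keep the word, drop the '#'
WHITESPACE_PATTERN = re.compile(r"\s+")


def remove_noise(text):
    """Strip common tweet noise and return lower-cased text."""
    text = html.unescape(text)                      # e.g. '&amp;' -> '&'
    text = RETWEET_PATTERN.sub("", text)
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    text = HASHTAG_PATTERN.sub(r"\1", text)
    text = WHITESPACE_PATTERN.sub(" ", text).strip()  # collapse gaps left by removals
    return text.lower()
```

For example, `remove_noise("RT @user: Climate change is real! https://t.co/abc #ClimateAction")` returns `": climate change is real! climateaction"`; the leftover colon is left for punctuation removal. On a DataFrame this would be applied column-wise, e.g. `df["message"].apply(remove_noise)`, where `message` is a hypothetical column name.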


In [None]:
# Remove Punctuation
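One way to remove punctuation is a translation table built from `string.punctuation`. This is a sketch; the function name is hypothetical.

```python
import string

# Translation table that maps every character in string.punctuation to None.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text):
    """Delete ASCII punctuation characters from `text`."""
    return text.translate(PUNCTUATION_TABLE)
```

`string.punctuation` only covers ASCII, so curly quotes such as `’` and emoji survive this step. It also deletes apostrophes (`it's` becomes `its`), so if contractions matter, tokenise before stripping punctuation.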

In [None]:
# Tokenisation
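A tokenisation sketch using NLTK's `TreebankWordTokenizer`. Unlike `word_tokenize`, which first splits sentences with the punkt models, the Treebank tokenizer is rule-based and needs no download. It also splits contractions, so `it's` becomes `it` and `'s`. The wrapper name `tokenize` is hypothetical.

```python
from nltk.tokenize import TreebankWordTokenizer

# Regex-based tokenizer: no punkt models or other corpora are required.
treebank = TreebankWordTokenizer()


def tokenize(text):
    """Split a single tweet into a list of word tokens."""
    return treebank.tokenize(text)
```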

In [None]:
# Stemming
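Stemming chops words down to a crude root that need not be a real word (both `believes` and `believe` become `believ`). A sketch using NLTK's Snowball stemmer by default, with Porter as an alternative; neither needs a downloaded corpus, and `LancasterStemmer` is noticeably more aggressive. The helper name is hypothetical.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

# Neither stemmer needs any downloaded corpus.
porter = PorterStemmer()
snowball = SnowballStemmer("english")


def stem_tokens(tokens, stemmer=snowball):
    """Reduce each token to its stem with the given nltk stemmer."""
    return [stemmer.stem(token) for token in tokens]
```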

In [None]:
# Lemmatization

In [None]:
# Remove StopWords
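A stop-word removal sketch. The stop-word list is passed in, so it works with NLTK's list once the `stopwords` corpus is downloaded (`stopwords.words("english")`). NLTK's English list includes negations such as `no` and `not`, which can flip the stance of a tweet, so this hypothetical helper keeps them by default; that default is an assumption worth testing against model performance.

```python
def remove_stopwords(tokens, stop_words, keep=frozenset({"no", "not"})):
    """Drop tokens that appear in `stop_words`, except those listed in `keep`.

    Comparison is case-insensitive; the original casing of kept tokens is preserved.
    """
    drop = {word.lower() for word in stop_words} - {word.lower() for word in keep}
    return [token for token in tokens if token.lower() not in drop]
```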


#### 3.3 Text Feature Extraction
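The outline lists n-grams and bag of words under feature extraction. A sketch that counts unigrams and bigrams of a token list with `nltk.util.ngrams` and `collections.Counter`; the function name `bag_of_ngrams` is hypothetical.

```python
from collections import Counter

from nltk.util import ngrams


def bag_of_ngrams(tokens, n_values=(1, 2)):
    """Count every n-gram of the requested sizes in a token list.

    Each n-gram is joined with spaces so unigrams and bigrams share one Counter.
    """
    counts = Counter()
    for n in n_values:
        counts.update(" ".join(gram) for gram in ngrams(tokens, n))
    return counts
```

For the tokens `["climate", "change", "is", "real", "climate", "change"]`, the bigram `climate change` is counted twice.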

#### 3.4 Rearrange File Appropriately

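**NOTE:** if we apply `CountVectorizer`, we may be able to skip the manual steps from tokenisation through to rearranging the file, because `CountVectorizer` lower-cases and tokenises the text itself, and can also drop stop words and build n-grams.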

<a id="four"></a>
## 4. Exploratory Data Analysis (EDA)
<a href=#cont>Back to Table of Contents</a>



#### 4.1 Missing Data Check
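A missing-data sketch with pandas, reporting the count and percentage of nulls per column. The toy frame and its column names are hypothetical.

```python
import pandas as pd


def missing_data_report(df):
    """Count and percentage of missing values per column."""
    missing = df.isnull().sum()
    return pd.DataFrame({
        "missing": missing,
        "percent": 100 * missing / len(df),
    })


# Hypothetical toy frame; the column names are assumptions, not the real dataset.
toy = pd.DataFrame({
    "message": ["climate change is real", None, "what a hoax"],
    "sentiment": [1, 0, None],
})
report = missing_data_report(toy)
```

After text cleaning, it is also worth checking for tweets that became empty strings, e.g. `df["message"].str.strip().eq("").sum()` (hypothetical column name), since pandas does not count those as missing.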

#### 4.2 Data Range Check
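For text data, a range check mostly means looking at tweet lengths and confirming every label belongs to the expected set. A sketch; the column names and label set are parameters because they are not specified here, and the function name is hypothetical.

```python
import pandas as pd


def range_check(df, text_col, label_col, expected_labels):
    """Summarise tweet lengths and list labels outside `expected_labels`."""
    lengths = df[text_col].str.len()
    unexpected = df.loc[~df[label_col].isin(expected_labels), label_col]
    return {
        "min_length": int(lengths.min()),
        "max_length": int(lengths.max()),
        "unexpected_labels": sorted(unexpected.unique().tolist()),
    }
```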

#### 4.3 Check for Outliers

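A sketch that flags unusually short or long tweets with the interquartile-range (IQR) rule: values below Q1 - k*IQR or above Q3 + k*IQR count as outliers, with the common choice k = 1.5. The helper name is hypothetical.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

Applied to tweet lengths this would be `iqr_outliers(df["message"].str.len())` (hypothetical column). A flagged tweet is not necessarily an error, so inspect before dropping.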

#### 4.4 Collinearity & Multicollinearity

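Collinearity only applies to numeric features, for example engineered counts such as characters, words or hashtags per tweet (hypothetical features, not part of the raw data described here). A sketch that lists pairs of numeric columns whose absolute Pearson correlation reaches a threshold; the function name and the 0.9 default are assumptions.

```python
import pandas as pd


def highly_correlated_pairs(df, threshold=0.9):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes("number").corr()   # non-numeric columns such as text are ignored
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):      # upper triangle only, no self-pairs
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs
```

To visualise the full matrix, `plot_corr` can be given the correlation values, e.g. `corr = df.select_dtypes("number").corr()` followed by `plot_corr(corr.values, xnames=list(corr.columns))`.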
<a id="five"></a>
## 5. Data Engineering
<a href=#cont>Back to Table of Contents</a>



#### 5.1 Standardization
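`StandardScaler` cannot centre a sparse matrix such as a bag-of-words output, so a sketch for that case uses `with_mean=False`, which divides each column by its standard deviation and keeps the matrix sparse. The count matrix is made up.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import StandardScaler

# Tiny made-up count matrix standing in for a bag-of-words feature matrix.
X_counts = csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [1, 1, 1]], dtype=float))

# with_mean=False scales each column by its standard deviation without centring,
# which keeps the matrix sparse (centring would fill in the zeros).
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X_counts)
```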

#### 5.2 Feature Selection
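A feature-selection sketch using the chi-squared test via `SelectKBest`, which keeps the `k` features most associated with the label. Chi-squared is one assumed option among several; it requires non-negative features such as word counts. The data is made up.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Made-up non-negative count features (chi2 requires non-negative values) and labels.
X = np.array([
    [3, 0, 1, 0],
    [2, 0, 0, 1],
    [0, 4, 1, 0],
    [0, 3, 0, 1],
])
y = np.array([1, 1, 0, 0])

selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
kept_columns = selector.get_support(indices=True)   # indices of the retained features
```

Here columns 0 and 1, whose counts differ most between the two classes, are kept.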


#### 5.3 Train/Test Split
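A split sketch using `train_test_split`. `stratify=y` keeps class proportions similar in both splits, which helps when some sentiment classes are rare; the 80/20 split and the `random_state` value are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up features and balanced binary labels, for illustration only.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# stratify=y preserves the class balance; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```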

<a id="six"></a>
## 6. Modeling
<a href=#cont>Back to Table of Contents</a>
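One possible baseline, assuming a bag-of-words representation fed to logistic regression, wrapped in a scikit-learn `Pipeline` so the vectoriser is fitted only on training data. The tweets and binary labels are made up for brevity; `LogisticRegression` also handles more than two classes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up training tweets and labels (1 = believes, 0 = does not), illustration only.
train_tweets = [
    "climate change is real and urgent",
    "we must act on global warming now",
    "climate change is a hoax",
    "global warming is a scam",
]
train_labels = [1, 1, 0, 0]

model = Pipeline([
    ("bow", CountVectorizer(ngram_range=(1, 2))),   # bag of words with unigrams and bigrams
    ("clf", LogisticRegression(max_iter=1000)),     # linear classifier
])
model.fit(train_tweets, train_labels)
predictions = model.predict(["is climate change a hoax"])
```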



<a id="seven"></a>
## 7. Model Performance
<a href=#cont>Back to Table of Contents</a>



#### 7.1 Model Testing
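A scoring sketch with scikit-learn metrics. Macro-averaged F1 weights every class equally, which suits imbalanced classes; whether it matches the competition's official metric should be confirmed on the Kaggle page listed under Reference Links. The labels are made up.

```python
from sklearn.metrics import classification_report, f1_score

# Made-up true and predicted labels, for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]

# Macro averaging takes the unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(y_true, y_pred, average="macro")
report = classification_report(y_true, y_pred)   # per-class precision, recall and F1 as text
```

For these labels, F1 is 0.8 for class 0 and about 0.667 for class 1, so the macro F1 is about 0.733.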


#### 7.2 Model Selection
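A model-selection sketch comparing candidate pipelines by mean macro F1 over stratified cross-validation folds. The two candidates (logistic regression and multinomial naive Bayes) and the made-up data are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up tweets and labels; in practice these would be the real training split.
tweets = [
    "climate change is real", "act now on global warming", "science says warming is real",
    "climate change is a hoax", "global warming is a scam", "the climate hoax continues",
]
labels = [1, 1, 1, 0, 0, 0]

# Candidate models, chosen here only for illustration.
candidates = {
    "logistic_regression": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
    "multinomial_nb": make_pipeline(CountVectorizer(), MultinomialNB()),
}

# For classifiers, cv=3 uses stratified folds; each candidate gets its mean macro F1.
scores = {
    name: cross_val_score(pipe, tweets, labels, cv=3, scoring="f1_macro").mean()
    for name, pipe in candidates.items()
}
best_name = max(scores, key=scores.get)
```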


<a id="eight"></a>
## 8. Model Explanations
<a href=#cont>Back to Table of Contents</a>



<a id="nine"></a>
## 9. Model Deployment
<a href=#cont>Back to Table of Contents</a>

#### To GitHub / To Comet / AWS EC2

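Whatever the deployment target, the trained pipeline usually has to be saved and restored first. A sketch using `pickle`, serialising the whole pipeline so the vectoriser travels with the classifier; the model and data are made up.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up data; in practice this would be the selected, fully trained pipeline.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(["climate change is real", "climate change is a hoax"], [1, 0])

# Serialise the whole pipeline (vectoriser + classifier) so the deployed app applies
# exactly the same preprocessing as training; here the round trip happens in memory.
payload = pickle.dumps(model)
restored = pickle.loads(payload)
```

To write to disk instead, `pickle.dump(model, f)` on a file opened with `"wb"` works the same way. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.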

<a id="ten"></a>
## 10. Conclusion
<a href=#cont>Back to Table of Contents</a>

...........................

<a id="eleven"></a>
## 11. Recommendation
<a href=#cont>Back to Table of Contents</a>

<a id="ref"></a>
## Reference Links
<a href=#cont>Back to Table of Contents</a>

* [GitHub Collab Ref](https://github.com/)
* [Comet Collab Ref](https://www.comet.ml/)
* [Kaggle Collab Ref](https://www.kaggle.com/c/edsa-climate-change-belief-analysis-2022/overview)