# Classification Predict Student Solution

© Explore Data Science Academy

---
### Honour Code

I {**Team NM3**}, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the [EDSA honour code](https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.

Team Members
1. Edidiong Udofia
2. Hlawulekani Rikhotso
3. Lesego Maponyane
4. Boitemogelo Tagane
5. Priscila Vhafuniwa Ndou
6. Fransisca Onyinyechukwu iloh
### Predict Overview: Climate Change Challenge

Many companies are built around lessening one’s environmental impact or carbon
footprint. They offer products and services that are environmentally friendly and
sustainable, in line with their values and ideals. They would like to determine how
people perceive climate change and whether or not they believe it is a real threat.
This would add to their market research efforts in gauging how their
product/service may be received.

With this context, EDSA is challenging you during the Classification Sprint with the
task of creating a Machine Learning model that is able to classify whether or not a
person believes in climate change, based on their novel tweet data.
Providing an accurate and robust solution to this task gives companies access to a
broad base of consumer sentiment, spanning multiple demographic and
geographic categories - thus increasing their insights and informing future
marketing strategies.

![gettyimages-586087414-2048x2048-smaller-scaled.jpg](attachment:gettyimages-586087414-2048x2048-smaller-scaled.jpg)

<a id="cont"></a>

## Table of Contents

<a href=#one>1. Introduction</a>

<a href=#two>2. Problem Statement</a>

<a href=#three>3. Importing Packages</a>

<a href=#four>4. Loading the Data</a>

<a href=#five>5. Pre-processing</a>

<a href=#six>6. Exploratory Data Analysis (EDA)</a>

<a href=#seven>7. Data Engineering</a>

<a href=#eight>8. Modelling</a>

<a href=#nine>9. Model Performance</a>

<a href=#ten>10. Model Explanations</a>

<a href=#eleven>11. Conclusion</a>


 <a id="one"></a>
## 1. Introduction

In machine learning, sentiment analysis relies on classification methods to infer the stance or emotion expressed in text. Like linear regression, these methods learn a mapping from input features to an output; the difference is that the output is a discrete class label rather than a continuous value. Examples include support vector machines, neural networks, and transformer models.

Where linear regression relates numerical variables to a numerical target, a sentiment model first has to turn text into numerical features and then identify the sentiment of each statement. For instance, a model may use word embeddings or a transformer architecture to capture the context that distinguishes positive, negative, and neutral sentiment.

These models are therefore not single closed-form equations but algorithms that learn linguistic patterns from labelled examples. The aim is to identify the sentiment expressed, whether joy, discontent, or neutrality, so that useful information about opinion can be extracted from large volumes of text.





![SEC_165384176-94a7.webp](attachment:SEC_165384176-94a7.webp)

<a id="two"></a>
## 2. Problem Statement
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Problem Statement ⚡ |
| :--------------------------- |
| In this section you are required to introduce and elaborate on the problem statement or challenge you are required to solve. |


Given a collection of tweets authored by an individual, develop a machine learning model to predict the individual's belief in climate change. The model should be able to identify patterns in the individual's language and sentiment that are indicative of their stance on climate change. The model should be trained on a large dataset of labeled tweets to ensure its generalizability and accuracy.

Potential Applications:

The proposed model could have various applications, including:

- Understanding public opinion on climate change: The model could be used to analyze large volumes of social media data to understand the general public's sentiment towards climate change.

- Identifying individuals with strong opinions on climate change: The model could be used to identify individuals who hold strong opinions on climate change, either for or against, for further research or targeted communication campaigns.

- Analyzing the effectiveness of climate change communication: The model could be used to analyze the effectiveness of different communication strategies in influencing public opinion on climate change.


<a id="three"></a>
## 3. Importing Packages
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Importing Packages ⚡ |
| :--------------------------- |
| In this section you are required to import, and briefly discuss, the libraries that will be used throughout your analysis and modelling. |

---

In [None]:
# Libraries for data loading, data manipulation and data visualisation
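import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Libraries for data preparation and model building
# (assumed choices; adjust to the models actually used)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Setting global constants to ensure notebook results are reproducible
PARAMETER_CONSTANT = 42  # assumed seed value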

<a id="four"></a>
## 4. Loading the Data
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Loading the data ⚡ |
| :--------------------------- |
| In this section you are required to load the data from the `df_train` file into a DataFrame. |

---

In [None]:
df = pd.read_csv("df_train.csv")  # assumes pandas is imported as pd and the file is a CSV in the working directory

<a id="five"></a>
## 5. Pre-processing
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Pre-processing ⚡ |
| :--------------------------- |
| In this section you are required to make the raw data suitable for modelling. |

In [None]:
# Pre-process the data
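
As one possible starting point for this step, the sketch below normalises raw tweet text using only the standard library. The function name `clean_tweet` and the placeholder tokens `url` and `user` are assumptions made for illustration; they are not part of the predict brief.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
NON_WORD_PATTERN = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, mask URLs and mentions, and strip punctuation."""
    text = text.lower()
    text = URL_PATTERN.sub(" url ", text)       # replace links with a token
    text = MENTION_PATTERN.sub(" user ", text)  # replace @handles with a token
    text = text.replace("#", " ")               # keep the hashtag word, drop '#'
    text = NON_WORD_PATTERN.sub(" ", text)      # remove remaining punctuation
    return " ".join(text.split())               # collapse repeated whitespace


print(clean_tweet("RT @NASA: Climate change is REAL! https://t.co/abc #ActOnClimate"))
```

Masking URLs and handles instead of deleting them keeps the signal that a tweet contained a link or a mention while removing the high-cardinality strings themselves.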

<a id="six"></a>
## 6. Exploratory Data Analysis (EDA)
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Exploratory data analysis ⚡ |
| :--------------------------- |
| In this section, you are required to perform an in-depth analysis of all the variables in the DataFrame. |

---


![1_Ra02AqsQlC0KV229EvM98g.jpg](attachment:1_Ra02AqsQlC0KV229EvM98g.jpg)

In [None]:
# look at data statistics

In [None]:
# plot relevant feature interactions

In [None]:
# evaluate correlation

In [None]:
# have a look at feature distributions
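
Before modelling, it helps to check how balanced the target classes are, because heavy imbalance affects which evaluation metric is meaningful. The sketch below counts labels with the standard library; the label values and the function name `class_distribution` are hypothetical placeholders for whatever the training data actually contains.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share)} ordered by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}


# Hypothetical labels: 1 = believes in climate change, 0 = does not.
sample_labels = [1, 1, 0, 1, 0, 1, 1, 1]
for label, (count, share) in class_distribution(sample_labels).items():
    print(f"class {label}: {count} tweets ({share:.1%})")
```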

<a id="seven"></a>
## 7. Data Engineering
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Data engineering ⚡ |
| :--------------------------- |
| In this section you are required to: clean the dataset, and possibly create new features - as identified in the EDA phase. |

---

![cb-dataengineervsscientist-060421-duplicate.png](attachment:cb-dataengineervsscientist-060421-duplicate.png)

In [None]:
# remove missing values/ features

In [None]:
# create new features

In [None]:
# engineer existing features
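
New features for tweets are often simple counts derived from the raw text. The following is a minimal sketch of that idea; the feature names and the function `tweet_features` are assumptions chosen for illustration, and whether any of them help should be checked against the EDA.

```python
import re


def tweet_features(text: str) -> dict:
    """Derive a few hand-crafted features from one raw (uncleaned) tweet."""
    words = text.split()
    return {
        "n_words": len(words),
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        "is_retweet": text.startswith("RT "),
    }


print(tweet_features("RT @NASA: Climate change is REAL! https://t.co/abc #ActOnClimate"))
```

These features need the raw text, so they would have to be computed before any cleaning step strips hashtags, handles, or links.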

<a id="eight"></a>
## 8. Modelling
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Modelling ⚡ |
| :--------------------------- |
| In this section, you are required to create one or more classification models that are able to accurately predict whether or not a person believes in climate change, based on their tweets. |

---

![1682017713026.png](attachment:1682017713026.png)

In [None]:
# split data
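
A holdout split only supports fair comparison if it is reproducible. The sketch below shows one way to do this with a seeded shuffle of row indices. The function name `train_test_split_indices`, the 20% test fraction and the seed value are assumptions, and the split is not stratified, which may matter if the classes are imbalanced.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG leaves global state untouched
    n_test = max(1, round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(10)
print(len(train_idx), len(test_idx))
```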

In [None]:
# create targets and features dataset

In [None]:
# create one or more ML models

In [None]:
# evaluate one or more ML models
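
To make the fit and predict steps concrete, here is a dependency-free sketch of a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. It is an illustrative baseline rather than the model chosen in this notebook; the class name `NaiveBayesText`, the toy tweets and the 0/1 labels are all hypothetical, and in practice a library implementation would normally be preferred.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label)
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for token in tokens:
            if token in self.vocab:  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, texts):
        return [
            max(self.class_counts, key=lambda c: self._log_score(t.lower().split(), c))
            for t in texts
        ]


# Hypothetical toy data: 1 = believes in climate change, 0 = does not.
train_texts = [
    "climate change is real and urgent",
    "we must act on global warming now",
    "climate change is a hoax",
    "global warming is fake news",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict(["warming is real and urgent", "total hoax and fake"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the add-one term prevents a single unseen word from forcing a class probability to zero.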

<a id="nine"></a>
## 9. Model Performance
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model performance ⚡ |
| :--------------------------- |
| In this section you are required to compare the relative performance of the various trained ML models on a holdout dataset and comment on which model is best and why. |

---

![1_GmmzvXzqwkeX50HHSNzANg.png](attachment:1_GmmzvXzqwkeX50HHSNzANg.png)

In [None]:
# Compare model performance

In [None]:
# Choose best model and motivate why it is the best choice
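
When comparing classifiers on a holdout set, accuracy alone can mislead if the classes are imbalanced, so per-class precision, recall and F1 are commonly reported as well. The sketch below computes them from scratch. The function names and the example label vectors are hypothetical, and the use of macro-averaged F1 as the comparison metric is an assumption, not something the predict brief specifies.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical holdout labels and predictions from two candidate models.
y_true = [1, 1, 1, 0, 0, 1]
model_a = [1, 1, 1, 1, 1, 1]  # always predicts the majority class
model_b = [1, 1, 0, 0, 0, 1]
for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(macro_f1(y_true, y_pred), 3))
```

In this toy case the majority-class model scores zero F1 on the minority class, which macro averaging exposes even though its accuracy looks moderate.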

<a id="ten"></a>
## 10. Model Explanations
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Model explanation ⚡ |
| :--------------------------- |
| In this section, you are required to discuss how the best performing model works in a simple way so that both technical and non-technical stakeholders can grasp the intuition behind the model's inner workings. |

---

In [None]:
# discuss chosen method's logic

<a id="eleven"></a>
## 11. Conclusion
<a class="anchor" id="1.1"></a>
<a href=#cont>Back to Table of Contents</a>

---
    
| ⚡ Description: Conclusion ⚡ |
| :--------------------------- |
| In this section, you are required to conclude your findings and the project as a whole. |

---

In [None]:
# Conclusion