## Problem statement
The problem we aim to solve here is predicting the Fire Weather Index (FWI) for the Bejaia and Sidi Bel-abbes regions in Algeria. The FWI is an important measure of fire danger, which indicates the likelihood of forest fires based on weather conditions. By using regression algorithms, we want to build a mathematical model that can understand how different weather factors (like temperature, humidity, wind speed, and rain) and FWI components (FFMC, DMC, DC, ISI, BUI) influence the FWI.

To achieve this, we will use the available historical data to train the regression models. This data contains information about weather conditions and corresponding FWI values for various days from June to September in 2012. Once the models are trained, we can use them to make predictions about the FWI for future dates based on the expected weather conditions.

The ultimate aim is to have accurate models that can help us predict the Fire Weather Index, which can be valuable for fire management and prevention strategies in these regions of Algeria.

## Dataset information
The dataset contains 244 instances covering two regions of Algeria: the Bejaia region in the northeast and the Sidi Bel-abbes region in the northwest.

There are 122 instances for each region.

The data covers the period from June 2012 to September 2012. The dataset includes 11 attributes and 1 output attribute (class). The 244 instances are classified into fire (138 instances) and not fire (106 instances) classes.

Dataset columns:

**Date** : (DD/MM/YYYY) Day, month ('june' to 'september'), year (2012)

Weather data observations:

**Temp** : temperature noon (temperature max) in Celsius degrees: 22 to 42

**RH** : Relative Humidity in %: 21 to 90

**Ws** : Wind speed in km/h: 6 to 29

**Rain** : total day in mm: 0 to 16.8

FWI components:

**Fine Fuel Moisture Code (FFMC)** index from the FWI system: 28.6 to 92.5

**Duff Moisture Code (DMC)** index from the FWI system: 1.1 to 65.9

**Drought Code (DC)** index from the FWI system: 7 to 220.4

**Initial Spread Index (ISI)** index from the FWI system: 0 to 18.5

**Buildup Index (BUI)** index from the FWI system: 1.1 to 68

**Fire Weather Index (FWI)** Index: 0 to 31.1

**Classes**: two classes, namely Fire and not Fire



**I have already completed the Exploratory Data Analysis (EDA) and feature engineering in my previous notebook. Now, I am using the cleaned dataset from that notebook to make predictions and solve the given problem.** 

You can download the cleaned or processed dataset from the provided link below.

### Algerian Forest Fires Processed Dataset
- Algerian Forest Fires Processed Dataset :👉  **[Link](https://www.kaggle.com/datasets/sudhanshu432/algerian-forest-fires-cleaned-dataset)**

### Notebook for Exploratory Data Analysis and Feature Engineering Applied On Algerian Forest Fires Dataset
- Here is the notebook we used earlier to analyze and improve the Algerian Forest Fires dataset through Exploratory Data Analysis (EDA) and Feature Engineering (FE) :👉 **[Link](https://www.kaggle.com/code/sudhanshu432/eda-and-fe-algerian-forest-fires-dataset)**


## Regression
Regression is a valuable and widely used tool in the world of data science and machine learning. It empowers us to explore and predict the connections between multiple factors. In simpler terms, regression allows us to uncover a mathematical equation that links one factor (the thing we want to figure out) with one or more other factors (the things we think influence it).

Think about having a bag full of various fruits, and you're curious about how the weight of a fruit is related to its size. With regression, you can discover a formula that explains how the weight changes when the size varies. This formula becomes your guide for making predictions about a fruit's weight based on its size. In essence, regression helps us unravel the mysteries hidden within our data and make informed forecasts.
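To make the fruit example concrete, here is a minimal sketch that fits a straight line with NumPy. The sizes and weights are made-up numbers chosen only for illustration; they are not part of the forest fires dataset.

```python
import numpy as np

# Hypothetical measurements: fruit diameter (cm) and weight (g).
size = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
weight = np.array([80.0, 100.0, 120.0, 140.0, 160.0])

# Fit weight = slope * size + intercept by least squares.
slope, intercept = np.polyfit(size, weight, deg=1)

# Use the fitted formula to predict the weight of a 6.5 cm fruit.
predicted = slope * 6.5 + intercept
print(f"slope: {slope:.2f} g per cm")
print(f"predicted weight at 6.5 cm: {predicted:.1f} g")
```

The fitted slope and intercept are the "formula" described above: once we have them, predicting a new fruit's weight is a single multiplication and addition.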






## The Importance of Regression Analysis

Imagine you're in a neighborhood where you know the prices of houses, and you're eyeing a new house. You're curious about how much that house should cost based on its characteristics like size, the number of rooms, and where it's located. That's where regression comes into play. It's like having a crystal ball that can predict the house's price for you.

But regression isn't just for house hunting; it's a powerful tool for uncovering connections between things. Let's say you're a student wondering if the time you spend studying has a real impact on your exam scores. Regression can dig into your study habits and show you if there's a strong link between hitting the books and acing those tests.

In the real world, things get complicated. There are tons of variables at play, and it's not always clear which ones truly matter. Regression helps cut through the confusion. It's like being a detective, finding the important clues in a sea of information. It helps businesses, like your favorite store, predict how much of a product they'll need so you never run out of your must-haves.

In a nutshell, regression is your trusty sidekick when you want to understand how things are connected, make predictions, and make smart decisions based on data. It's the tool that guides you through the maze of information in our data-driven world.

In summary, regression is a valuable tool for:
1. **Prediction:** Let's say you have information about the price of houses in a neighborhood and want to know the price of a new house. Regression helps you make a prediction based on the features of the new house, such as its size, number of rooms, and location.
2. **Understanding Relationships:** Regression helps us understand how different factors influence each other. For example, you might want to know how the amount of time spent studying affects exam scores. Regression can show you if there is a strong relationship between study time and scores.
3. **Identifying Important Factors:** In complex situations with many variables, regression helps us figure out which factors have a significant impact on the outcome. It helps us separate the essential factors from the ones that don't matter much.
4. **Decision Making:** Organizations and businesses use regression to make informed decisions. For instance, a company might use regression to predict customer demand for a product, helping them plan their production and inventory efficiently.


Regression is a valuable tool for understanding relationships between variables and making predictions. Its need arises whenever we want to understand, predict, or make decisions based on data and the relationships between different factors.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df=pd.read_csv('/kaggle/input/algerian-forest-fires-cleaned-dataset/Algerian_forest_fires_cleaned_dataset.csv')

In [None]:
df.head()

In [None]:
df.columns

In [None]:
## Drop the day, month and year columns
df.drop(['day','month','year'],axis=1,inplace=True)

In [None]:
df.head()

In [None]:
df['Classes'].value_counts()

## Encoding

In [None]:
## Encoding
df['Classes']=np.where(df['Classes'].str.contains("not fire"),0,1)
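The encoding step above can be tried on a small hypothetical series. The labels below (including the trailing spaces) are invented for illustration; whether the real `Classes` column contains such whitespace is an assumption, but matching with `str.contains` works either way.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, some with stray whitespace around the text.
classes = pd.Series(["not fire   ", "fire   ", "not fire", "fire"])

# Any label containing "not fire" becomes 0; everything else becomes 1.
encoded = np.where(classes.str.contains("not fire"), 0, 1)
print(encoded.tolist())  # [0, 1, 0, 1]
```

Note that the check looks for "not fire" and not for "fire", because the text "fire" appears in both labels.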

In [None]:
df.tail()

In [None]:
df['Classes'].value_counts()

## Independent And dependent features

In [None]:
## Independent And dependent features
X=df.drop('FWI',axis=1)
y=df['FWI']

In [None]:
X.head()

In [None]:
y

## Train Test Split

In [None]:
#Train Test Split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=42)

In [None]:
X_train.shape,X_test.shape

## Correlation:

Correlation is like a math tool that helps us see if two things are connected.

Imagine you have data on two things: how much you study each day and how well you do on your exams. Correlation helps us figure out if studying more or less has anything to do with getting higher or lower scores on your tests.

**Correlation can take three forms:**

#### 1. Positive Correlation:
This is when studying more is connected to getting better exam scores. So, if you spend more time studying, you usually get higher test scores. Think of it like this: more study, better grades.

#### 2. Negative Correlation: 
Here, it's the opposite: as one thing goes up, the other tends to go down. In our example, it would mean that more study time goes together with lower scores, and less study time with higher scores.

#### 3. No Correlation: 
This simply means that there's no clear link between how much you study and how well you do on exams. Studying more or less doesn't seem to consistently change your scores.

Understanding correlation is helpful because it shows us patterns. For example:

- If we find a positive correlation, it means more study time tends to go together with better grades.

- If there's a negative correlation, it means more study time tends to go together with lower grades, which would be a reason to look at how we study or what else affects the scores.

Remember, correlation doesn't prove that one thing causes another. It just tells us they're connected in some way. So, it's a useful tool for making smart decisions and seeing how things relate to each other.
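As a small illustration, the sketch below computes the Pearson correlation coefficient by hand for made-up study data and checks it against `np.corrcoef`. The hours and scores are hypothetical values, not taken from any dataset.

```python
import numpy as np

# Hypothetical data: hours studied per day and exam score.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 65.0, 70.0, 78.0])

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    return float((x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum()))

r_pos = pearson(hours, score)    # scores rise with hours: close to +1
r_neg = pearson(hours, -score)   # flipped scores fall with hours: close to -1

print(round(r_pos, 3), round(r_neg, 3))
print(np.isclose(r_pos, np.corrcoef(hours, score)[0, 1]))
```

The coefficient always lies between -1 and +1. Values near zero correspond to the "no correlation" case.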

## What is Feature Selection? 

Imagine you're trying to cook a delicious meal, and you have a lot of ingredients in your kitchen. Some ingredients make your dish taste amazing, while others don't add much flavor. Feature selection is like choosing the best ingredients for your recipe.

In feature selection for a regression problem (predicting a number), you have a bunch of characteristics or features for each thing you're trying to predict. You want to pick the most important ones that actually help you make good predictions, without making things too complicated.

Here are some ways to do this:

1. Univariate Feature Selection: You look at each feature by itself and see how well it's connected to what you're trying to predict. If it's not very useful, you leave it out.

2. Recursive Feature Elimination (RFE): You start with all your features and then keep removing the less important ones one by one until you're left with only the most important ones.

3. LASSO: This is like giving a penalty to some features, making them less important. It encourages the model to ignore features that aren't very helpful.

4. Feature Importance from Tree-based Models: You use a special kind of model to figure out which features have the most impact. It's like asking a tree which ingredients are the most important for your recipe.

5. Correlation and Domain Knowledge: You check if some features are related to each other or if you already know some features are important based on what you know about the problem. This helps you choose the right ingredients based on common sense.

So, feature selection is like being a smart chef in the kitchen. You only use the ingredients that make your dish taste the best, and you leave out the ones that don't make a difference. This way, you end up with a simpler and tastier meal, or in this case, a better regression model.




We are using the **Correlation and Domain Knowledge** technique for our problem.

## Feature Selection based on correlation

In [None]:
## Feature Selection based on correlation
X_train.corr()

## Observations:

**Positive Correlation:**

Temperature has a strong positive correlation with FFMC (0.69) and ISI (0.62). FFMC, DMC, DC, ISI, BUI, and FWI are positively correlated with each other, with correlation coefficients ranging from 0.58 to 0.76.

**Negative Correlation:**

Temperature has a strong negative correlation with RH (-0.65). RH is negatively correlated with FFMC, DMC, DC, ISI, BUI, and FWI, with correlation coefficients ranging from -0.68 to -0.58. Rain is negatively correlated with FFMC, DMC, DC, ISI, BUI, and FWI, but the correlation is relatively weak (between -0.37 and -0.04).

**Weak Correlation:**

The correlation between Ws (wind speed) and other variables is weak, with coefficients ranging from -0.18 to 0.07. The correlation between the 'Region' variable and other variables is also weak, with coefficients ranging from -0.40 to 0.26.

Let's visualize the correlations using the heatmap below.

## Feature Selection

In [None]:
## Check for multicollinearity
plt.figure(figsize=(12,10))
corr=X_train.corr()
sns.heatmap(corr,annot=True)


In [None]:
X_train.corr()

In [None]:
def correlation(dataset, threshold):
    col_corr = set()
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold:
                colname = corr_matrix.columns[i]
                col_corr.add(colname)
    return col_corr

In [None]:
## threshold--Domain expertise
corr_features=correlation(X_train,0.85)

We set the threshold to 0.85 because we are looking for features that are very strongly correlated with each other. Such features can be redundant in our model, and removing one of each pair may help the model's accuracy and makes it easier to interpret the relationships between the features and the target variable. The threshold is a way to identify which features are so strongly related that we might not need both of them in our regression model.

In [None]:
corr_features

The function returns the features it flags as highly correlated with another feature (here 'BUI' and 'DC'), so they might be redundant in our regression model. To avoid multicollinearity issues, the next cell drops every flagged feature in `corr_features` from both the training and the test set.

In [None]:
## drop features when correlation is more than 0.85
X_train.drop(corr_features,axis=1,inplace=True)
X_test.drop(corr_features,axis=1,inplace=True)
X_train.shape,X_test.shape

## Multicollinearity


Multicollinearity is like having two or more ingredients in a recipe that taste almost the same or do similar things. It's when some features in your data are so related that it confuses a regression model, like having two chefs doing the same cooking task together, and you can't tell whose effort made the dish taste a certain way.

For example, let's say you have two ingredients, 'BUI' and 'DC,' and they are very similar in how they affect your dish. Their correlation is above the 0.85 threshold you set, which means they are strongly connected.

Now, if you use both 'BUI' and 'DC' in your recipe (or regression model), it's like having two chefs in the kitchen doing the same job. The model can't figure out who is more responsible for the final taste (or the target variable).

To avoid this confusion, you remove the redundant ingredients from your recipe. In this notebook, that means dropping every feature that the `correlation` function flagged (the contents of `corr_features`), so the model can focus on the unique contribution of each remaining feature. It's like having just one chef doing each cooking job, so you can clearly see the impact of each ingredient.
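To see which column the correlation filter actually flags, here is a small sketch on a toy DataFrame. The `correlation` function is the same one used in this notebook; the column names `a`, `a_copy` and `b` and the random data are hypothetical.

```python
import numpy as np
import pandas as pd

def correlation(dataset, threshold):
    """Return columns whose |correlation| with an earlier column exceeds threshold."""
    col_corr = set()
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold:
                col_corr.add(corr_matrix.columns[i])
    return col_corr

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.05, size=200),  # nearly a duplicate of "a"
    "b": rng.normal(size=200),                       # unrelated feature
})

flagged = correlation(toy, 0.85)
print(flagged)  # {'a_copy'}
```

Because the inner loop only compares a column with the columns before it, the later column of a correlated pair is the one that gets flagged. If a feature is strongly correlated with several earlier features, more than one column can end up in the returned set.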

## What is Feature Scaling 

**Feature scaling** is the technique of bringing all continuous features to the same scale. If we don't, the model tends to give more weight to features with larger numeric values and less weight to features with smaller values, irrespective of their units.

For example, student A got 60 out of 100 in subject 1, 120 out of 150 in subject 2, and 180 out of 200 in subject 3.

After rescaling each mark to a maximum of 10, student A got 6 out of 10 in subject 1, 8 out of 10 in subject 2, and 9 out of 10 in subject 3.


### Why Feature Scaling

Machine learning algorithms that are trained with gradient descent (for example linear regression, logistic regression, and neural networks fitted this way) work better with scaled data. The difference in ranges of features will cause different step sizes for each feature. To ensure that the gradient descent moves smoothly towards the minima and that the steps are updated at the same rate for all the features, we scale the data before feeding it to the model. Scaling also matters for the regularized models used later (Lasso, Ridge, ElasticNet), because their penalty treats all coefficients alike, while the size of a coefficient depends on the scale of its feature.

## What is Standardization
Standardization is a scaling technique in which the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resulting distribution has a unit standard deviation.
$$X_{new}=\frac{x_i-X_{mean}}{\sigma}$$
where $\sigma$ is the standard deviation of the attribute.
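The formula can be checked by hand with NumPy. This is a minimal sketch on hypothetical values; it uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

# Hypothetical feature values (for example, temperatures in Celsius).
x = np.array([22.0, 28.0, 31.0, 35.0, 42.0])

mean = x.mean()
std = x.std()  # population standard deviation (ddof=0)

# Subtract the mean, then divide by the standard deviation.
x_new = (x - mean) / std

print(x_new.round(3))
print(np.isclose(x_new.mean(), 0.0), np.isclose(x_new.std(), 1.0))
```

After the transformation the values have mean 0 and standard deviation 1, while their order and relative spacing stay the same.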

## Implementing Feature Scaling Or Standardization using Sklearn

In [None]:
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)

In [None]:
X_train_scaled

## Box Plots To understand Effect Of Standard Scaler

In [None]:
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
sns.boxplot(data=X_train)
plt.title('X_train Before Scaling')
plt.subplot(1, 2, 2)
sns.boxplot(data=X_train_scaled)
plt.title('X_train After Scaling')

**Before Scaling (Left Plot):** The left plot shows the distribution of the features in the original training dataset (X_train) before scaling. Each feature is represented by a box. The box represents the interquartile range (IQR), which is the middle 50% of the data. The line inside the box represents the median value, and the "whiskers" extend to the most extreme values that lie within 1.5 times the IQR of the box edges. Any points beyond the whiskers are considered outliers.

This plot allows us to see how the data is spread out for each feature. If a box is short, the data values are close together, and if it is tall, the data values are more spread out.

**After Scaling (Right Plot):** The right plot shows the distribution of the features in the training dataset (X_train_scaled) after scaling. The features have been scaled to have zero mean and unit variance.

By scaling the features, we put them all on a comparable scale, so no feature dominates the model only because of its units. It also helps certain machine learning algorithms converge faster and can improve model performance.

## Performance Metrics

**Mean Absolute Error (MAE):**
The Mean Absolute Error measures how close a model's predictions are to the actual values. It is the average of the absolute differences between the predicted and actual values, so it ignores whether each error is positive or negative. In simpler terms, the MAE tells us, on average, how far off our predictions are.

For example, suppose we are predicting house prices in thousands of dollars, and the MAE is 1.7. This means that, on average, our predictions differ from the actual house prices by $1.7k. The lower the MAE, the more accurate the model's predictions.
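The MAE is easy to compute by hand. The sketch below uses made-up actual and predicted values and should give the same number as `sklearn.metrics.mean_absolute_error` on the same inputs.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Absolute errors are 0.5, 0.0, 1.5 and 1.0; their average is the MAE.
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 0.75
```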

**R2 Score:**
The R2 Score, also known as the coefficient of determination, measures how well the model explains the variance in the data. A value of 1 means the model perfectly explains the variance, and 0 means it explains none of it (it is no better than always predicting the mean). The score usually falls between 0 and 1, but it can be negative when the model performs worse than predicting the mean.

In simpler terms, the R2 Score tells us how much of the variation in the actual values is captured by the model's predictions. A higher R2 Score (closer to 1) indicates that the model is doing a good job of explaining the variability in the data, and its predictions are in line with the actual values.

For example, if the R2 Score is 0.90, it means that about 90% of the variation in the actual values can be explained by the model's predictions. This is a good R2 Score, showing that the model is capturing a significant amount of the data's variability and making accurate predictions.
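Here is a minimal sketch of the R2 computation on hypothetical values. It compares the model's squared errors with the squared errors of always predicting the mean; the last case shows that the value drops below zero when predictions are worse than that baseline.

```python
import numpy as np

def r2(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

good = np.array([1.1, 1.9, 3.2, 3.8])   # close to the actual values
baseline = np.full(4, y_true.mean())    # always predicts the mean
bad = np.array([4.0, 3.0, 2.0, 1.0])    # predicts the reverse order

print(round(r2(y_true, good), 2))      # 0.98
print(round(r2(y_true, baseline), 2))  # 0.0
print(round(r2(y_true, bad), 2))       # -3.0
```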

## Linear Regression Model

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
linreg=LinearRegression()
linreg.fit(X_train_scaled,y_train)
y_pred=linreg.predict(X_test_scaled)
mae=mean_absolute_error(y_test,y_pred)
score=r2_score(y_test,y_pred)
print("Mean absolute error: ", mae)
print("R2 Score: ", score)

## Lasso Regression

In [None]:
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
lasso=Lasso()
lasso.fit(X_train_scaled,y_train)
y_pred=lasso.predict(X_test_scaled)
mae=mean_absolute_error(y_test,y_pred)
score=r2_score(y_test,y_pred)
print("Mean absolute error: ", mae)
print("R2 Score: ", score)

## Ridge Regression model

In [None]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
ridge=Ridge()
ridge.fit(X_train_scaled,y_train)
y_pred=ridge.predict(X_test_scaled)
mae=mean_absolute_error(y_test,y_pred)
score=r2_score(y_test,y_pred)
print("Mean absolute error: ", mae)
print("R2 Score: ", score)

## Elasticnet Regression

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
elastic=ElasticNet()
elastic.fit(X_train_scaled,y_train)
y_pred=elastic.predict(X_test_scaled)
mae=mean_absolute_error(y_test,y_pred)
score=r2_score(y_test,y_pred)
print("Mean absolute error: ", mae)
print("R2 Score: ", score)

## Top Performing models
**Ridge Regression:**

- Mean Absolute Error (MAE): 0.498
- R2 Score: 0.988

**Linear Regression:**

- Mean Absolute Error (MAE): 0.482
- R2 Score: 0.989

## Conclusion
In both cases, the R2 Score is quite high, which means that both models do a good job of explaining the variation in the data and making accurate predictions.

However, when it comes to the Mean Absolute Error (MAE), we can see that Linear Regression has a slightly lower value (0.482) compared to Ridge Regression (0.498).

Considering both the R2 Score and MAE, the Linear Regression model performs slightly better than the Ridge Regression model on this test set. 
Even so, **we would prefer the Ridge Regression model for making predictions, because its regularization penalty can help reduce overfitting.**
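To illustrate why the Ridge penalty helps, here is a small sketch that solves the ridge problem in closed form with NumPy on synthetic data with two nearly identical features. It is a simplified illustration, not how the models in this notebook were fitted: it has no intercept, and the data and the `fit` helper are hypothetical. The value `alpha=1.0` matches the default of scikit-learn's `Ridge`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: x2 is almost a copy of x1 (strong multicollinearity).
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

def fit(X, y, alpha):
    """Closed-form minimizer of ||y - Xw||^2 + alpha * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

w_ols = fit(X, y, alpha=0.0)    # ordinary least squares
w_ridge = fit(X, y, alpha=1.0)  # ridge with a penalty on coefficient size

print("OLS coefficients:  ", w_ols.round(2))
print("Ridge coefficients:", w_ridge.round(2))
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```

With nearly duplicate features, the unpenalized coefficients can become large and unstable, while the penalty keeps the ridge coefficients smaller. That shrinkage is what makes Ridge less prone to overfitting.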

## Pickling
Python's pickle module is used for serialising and de-serialising a Python object structure. Most Python objects can be pickled so that they can be saved on disk. Pickle first “serialises” the object, converting it (a list, dict, fitted model, etc.) into a byte stream, and then writes that stream to a file. The idea is that this byte stream contains all the information necessary to reconstruct the object in another Python script.

In [None]:
import pickle
# Use context managers so the files are closed after writing
with open('scaler.pkl','wb') as f:
    pickle.dump(scaler,f)
with open('ridge.pkl','wb') as f:
    pickle.dump(ridge,f)
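To check that pickled objects can be restored and used for prediction, here is a self-contained sketch of a round trip. It fits a small scaler and Ridge model on synthetic data (an assumption made only so the example runs on its own) and serializes them in memory with `pickle.dumps` and `pickle.loads`.

```python
import pickle

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic training data, used only for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

scaler = StandardScaler().fit(X)
ridge = Ridge().fit(scaler.transform(X), y)

# Serialize both objects to bytes and restore them again.
scaler_restored = pickle.loads(pickle.dumps(scaler))
ridge_restored = pickle.loads(pickle.dumps(ridge))

# New data must go through the restored scaler before prediction.
X_new = rng.normal(size=(2, 3))
before = ridge.predict(scaler.transform(X_new))
after = ridge_restored.predict(scaler_restored.transform(X_new))
print(np.allclose(before, after))  # True
```

With the files saved above, the same idea applies: open `scaler.pkl` and `ridge.pkl` in `'rb'` mode, call `pickle.load` on each, and transform new inputs with the loaded scaler before calling `predict`. Only unpickle files that come from a source you trust.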

## Thank you for checking out this notebook!

That concludes our exploration in this Jupyter Notebook. If you found this notebook helpful or insightful, I would greatly appreciate it if you could **upvote it.**

Thank you once again for your time and consideration. If you have any feedback or suggestions, feel free to leave a comment. I'm always open to learning and improving.

 
 
 
 

Thanks & Regards,

Sudhanshu Kumar