# Weather Forecasting with Machine Learning

In this notebook, we will explore weather forecasting using machine learning techniques. We'll analyze historical temperature data, perform exploratory data analysis, visualize patterns, and build a predictive model to forecast future temperatures.

The project will cover:
1. Data loading and preprocessing
2. Exploratory data analysis and visualization
3. Clustering analysis to identify temperature patterns
4. Seasonal weather analysis
5. Building a machine learning model for temperature forecasting
6. Predicting temperatures for future months

Let's begin by importing the necessary libraries and loading our dataset.


In [None]:
# Library Imports
import numpy as np # For Linear Algebra
import pandas as pd # To Work With Data
# for visualizations
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime # Time Series analysis.


## Data Loading and Preprocessing

Now we'll load our weather dataset and perform initial preprocessing steps to prepare it for analysis.


In [None]:
# Data Loading
# The first CSV column is an unnamed index, so read it as the DataFrame index
df = pd.read_csv("Weather.csv", index_col=0)

# View the first 5 rows (head() returns 5 rows by default)
df.head()


## Data Preprocessing

Now we'll transform our data into a format suitable for time series analysis by melting the dataframe and creating date attributes.


In [None]:
# Reshape from wide (one column per month) to long (one row per year-month pair)
df1 = pd.melt(df, id_vars='YEAR', value_vars=df.columns[1:])

# Create a Date column by combining the month abbreviation and the year
df1['Date'] = pd.to_datetime(df1['variable'] + ' ' + df1['YEAR'].astype(str), format='%b %Y')

# Rename columns for clarity
df1.columns = ['Year', 'Month', 'Temperature', 'Date']
df1.sort_values(by='Date', inplace=True)  # chronological order for the time series
df1.head()
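
To make the reshaping step concrete, here is a minimal sketch on a made-up table with two years and two months. The `wide` and `tidy` names and the temperature values are illustrative assumptions, not rows from `Weather.csv`.

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per month
wide = pd.DataFrame({"YEAR": [2000, 2001], "JAN": [20.0, 21.0], "FEB": [22.0, 23.0]})

# Wide -> long: every (year, month) pair becomes its own row
tidy = wide.melt(id_vars="YEAR", var_name="Month", value_name="Temperature")

# Combine month abbreviation and year into a real datetime, then sort chronologically
tidy["Date"] = pd.to_datetime(tidy["Month"] + " " + tidy["YEAR"].astype(str), format="%b %Y")
tidy = tidy.sort_values("Date").reset_index(drop=True)
print(tidy)
```

After sorting, the rows alternate between months within each year, which is the order a time series plot needs.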


## Exploratory Data Analysis and Visualization

Let's visualize our temperature data to understand patterns and trends over time.


In [None]:
# Temperature through time visualization
fig = go.Figure(layout = go.Layout(yaxis=dict(range=[0, df1['Temperature'].max()])))
fig.add_trace(go.Scatter(x=df1['Date'], y=df1['Temperature']))
fig.update_layout(title='Temperature Through Time',
                 xaxis_title='Time', yaxis_title='Temperature')
fig.update_layout(xaxis=go.layout.XAxis(
    rangeselector=dict(
        buttons=list([dict(label="Whole View", step="all"),
                     dict(count=1,label="One Year View",step="year")
                     ])),
    rangeslider=dict(visible=True),type="date"))
fig.show()


In [None]:
# Monthly temperature patterns
fig = px.box(df1, 'Month', 'Temperature')
fig.update_layout(title='Warmest, Coldest and Median Monthly Temperature')
fig.show()


## Clustering Analysis

Next, we'll use K-means clustering to identify patterns in our temperature data. First, we'll determine the optimal number of clusters.


In [None]:
# Clustering analysis
from sklearn.cluster import KMeans
sse = []
target = df1['Temperature'].to_numpy().reshape(-1,1)
num_clusters = list(range(1, 10))

for k in num_clusters:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)  # fixed seed for a reproducible curve
    km.fit(target)
    sse.append(km.inertia_)

fig = go.Figure(data=[
    go.Scatter(x = num_clusters, y=sse, mode='lines'),
    go.Scatter(x = num_clusters, y=sse, mode='markers')
])

fig.update_layout(title="Elbow Plot: SSE vs. Number of Clusters",
                 xaxis_title="Number of Clusters",
                 yaxis_title="Sum of Squared Distances (inertia)",
                 showlegend=False)
fig.show()
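
The quantity plotted above, `inertia_`, is the sum of squared distances from each point to its nearest cluster center. The sketch below computes that value with a simplified one-dimensional version of Lloyd's algorithm in plain Python. It is not scikit-learn's implementation: the `kmeans_1d_sse` helper, its evenly spaced initialization, and the `temps` values are all assumptions made for illustration.

```python
def kmeans_1d_sse(values, k, n_iter=50):
    """Run a simplified 1-D Lloyd's algorithm and return the final SSE."""
    data = sorted(values)
    n = len(data)
    # Deterministic initialization: k evenly spaced points from the sorted data
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: put each point in the group of its nearest center
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        # Update step: move each center to the mean of its group
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    # SSE: squared distance from each point to its closest center
    return sum(min((x - c) ** 2 for c in centers) for x in data)


# Hypothetical readings forming three tight groups
temps = [10, 11, 12, 20, 21, 22, 30, 31, 32]
sse_by_k = {k: kmeans_1d_sse(temps, k) for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE falls steeply until k = 3 and changes little afterwards, which is the "elbow" to look for in the plot.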


In [None]:
# Apply KMeans with 3 clusters
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(df1['Temperature'].to_numpy().reshape(-1,1))
df1.loc[:,'Temp Labels'] = km.labels_
fig = px.scatter(df1, 'Date', 'Temperature', color='Temp Labels')
fig.update_layout(title = "Temperature clusters.",
                xaxis_title="Date", yaxis_title="Temperature")
fig.show()


In [None]:
# Frequency distribution of temperature
fig = px.histogram(x=df1['Temperature'], nbins=200, histnorm='')
fig.update_layout(title='Frequency chart of temperature readings:',
                xaxis_title='Temperature', yaxis_title='Count')
fig.show()


## Yearly and Monthly Temperature Analysis

Now let's analyze yearly average temperatures and monthly patterns over time.


In [None]:
# Yearly average temperature analysis
df['Yearly Mean'] = df.iloc[:,1:].mean(axis=1)  # axis=1 averages across the month columns of each row
fig = go.Figure(data=[
    go.Scatter(name='Yearly Temperatures', x=df['YEAR'], y=df['Yearly Mean'], mode='lines'),
    go.Scatter(name='Yearly Temperatures', x=df['YEAR'], y=df['Yearly Mean'], mode='markers')
])
fig.update_layout(title='Yearly Mean Temperature:',
                xaxis_title='Time', yaxis_title='Temperature')
fig.show()


In [None]:
# Monthly temperatures through history
fig = px.line(df1, 'Year', 'Temperature', facet_col='Month', facet_col_wrap=4)
fig.update_layout(title='Monthly temperature throughout history:')
fig.show()


## Seasonal Weather Analysis

Let's analyze temperature patterns by seasons to better understand yearly climate cycles.


In [None]:
# Seasonal Weather Analysis
df['Winter'] = df[['DEC', 'JAN', 'FEB']].mean(axis=1)
df['Summer'] = df[['MAR', 'APR', 'MAY']].mean(axis=1)
df['Monsoon'] = df[['JUN', 'JUL', 'AUG', 'SEP']].mean(axis=1)
df['Autumn'] = df[['OCT', 'NOV']].mean(axis=1)
seasonal_df = df[['YEAR', 'Winter', 'Summer', 'Monsoon', 'Autumn']]
seasonal_df = pd.melt(seasonal_df, id_vars='YEAR', value_vars=['Winter', 'Summer', 'Monsoon', 'Autumn'])
seasonal_df.columns=['Year', 'Season', 'Temperature']

fig = px.scatter(seasonal_df, 'Year', 'Temperature', facet_col='Season')
fig.update_layout(title='Seasonal mean temperatures throughout years:')
fig.show()
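
The same season grouping can also be written as a mapping from season name to month columns, which keeps the definitions in one place. This is a minimal sketch under assumptions: the `SEASONS` dictionary mirrors the grouping used above, while the `toy` frame and its values (JAN = 1.0 through DEC = 12.0) are invented for illustration.

```python
import pandas as pd

# Season definitions matching the cell above
SEASONS = {
    "Winter": ["DEC", "JAN", "FEB"],
    "Summer": ["MAR", "APR", "MAY"],
    "Monsoon": ["JUN", "JUL", "AUG", "SEP"],
    "Autumn": ["OCT", "NOV"],
}

months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Hypothetical single-year table: JAN=1.0, FEB=2.0, ..., DEC=12.0
toy = pd.DataFrame({"YEAR": [2000], **{m: [float(i)] for i, m in enumerate(months, start=1)}})

# Row-wise mean over each season's month columns
seasonal = pd.DataFrame({"YEAR": toy["YEAR"]})
for name, cols in SEASONS.items():
    seasonal[name] = toy[cols].mean(axis=1)
print(seasonal)
```

Note that with this grouping, DEC is averaged with JAN and FEB of the same calendar year rather than with the following January and February.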


## Animation Visualization

Let's create an animated visualization to see how temperature patterns change over the years.


In [None]:
# Animation visualization
px.scatter(df1, 'Month', 'Temperature', size='Temperature', animation_frame='Year')


## Weather Forecasting with Machine Learning

Now we'll build a machine learning model to forecast future temperature values. We'll use a Decision Tree Regressor as our model since the data shows non-linear patterns.


In [None]:
# Import required libraries for machine learning
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Prepare data for modeling
df2 = df1[['Year', 'Month', 'Temperature']].copy()
df2 = pd.get_dummies(df2)
y = df2[['Temperature']]
X = df2.drop(columns='Temperature')

# Create and train model
dtr = DecisionTreeRegressor()
train_x, test_x, train_y, test_y = train_test_split(X,y,test_size=0.2)
dtr.fit(train_x, train_y)
pred = dtr.predict(test_x)
r2_score(test_y, pred)
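
For reference, the R² score compares the model's squared error with that of always predicting the mean: R² = 1 - SS_res / SS_tot. The sketch below computes it by hand for a single target. The `r2` helper and the `actual` and `predicted` values are hypothetical and only illustrate the formula.

```python
def r2(y_true, y_pred):
    """Coefficient of determination for a single target."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical temperatures and predictions
actual = [20.0, 22.0, 24.0, 26.0]
predicted = [21.0, 22.0, 23.0, 26.0]
score = r2(actual, predicted)
print(score)
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values mean worse than the mean.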


## Forecasting for 2018

Now that we have a trained model with good accuracy, let's use it to forecast temperatures for 2018.


In [None]:
# Forecast for 2018
# Reuse the 2017 rows as a template for the 12 months, then set the year to 2018
next_year = df1.loc[df1['Year'] == 2017, ['Year', 'Month']].copy()
next_year['Year'] = 2018
# One-hot encode and align the columns with the training features
next_year_X = pd.get_dummies(next_year).reindex(columns=X.columns, fill_value=0)
pred_2018 = dtr.predict(next_year_X)

# Format the forecast results
temp_2018 = pd.DataFrame({'Month': next_year['Month'].to_numpy(),
                          'Temperature': pred_2018,
                          'Year': 2018})
temp_2018
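
A model trained on one-hot encoded features expects the same columns, in the same order, at prediction time. The following sketch shows one way to enforce that with `DataFrame.reindex`. The `train` and `new` frames are made-up examples, and passing `dtype=int` to `pd.get_dummies` is a choice made here so the output is easy to read.

```python
import pandas as pd

# Hypothetical training rows covering three months
train = pd.DataFrame({"Year": [2016, 2016, 2017], "Month": ["JAN", "FEB", "MAR"]})
X_train = pd.get_dummies(train, dtype=int)

# New data containing a single month: encoding it alone yields only Month_FEB
new = pd.DataFrame({"Year": [2018], "Month": ["FEB"]})

# Align with the training columns; dummy columns missing from `new` are filled with 0
X_new = pd.get_dummies(new, dtype=int).reindex(columns=X_train.columns, fill_value=0)
print(X_new)
```

Without the `reindex` step, `X_new` would have fewer columns than `X_train` whenever the new rows do not cover every month.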


## Conclusion

In this project, we've successfully analyzed historical temperature data and built a machine learning model to forecast future temperatures. 

We performed:
1. Data preprocessing and exploration
2. Visualization of temperature trends over time
3. Clustering analysis to identify temperature patterns
4. Seasonal weather analysis
5. Machine learning model training with a Decision Tree Regressor
6. Temperature forecasting for 2018

The model achieved a high R² score on the held-out 20% test split, indicating good predictive performance on that split. This approach demonstrates how machine learning can be applied to weather forecasting tasks.
