<a href="https://colab.research.google.com/github/Kunal-s-git/Kunal-s-git/blob/main/Indian_Weather_Predictor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Just a teaser; run all the cells to render it.

In [None]:
m2

#Description
This dataset provides real-time weather information for major cities in India, with records starting from August 29, 2023. Unlike forecast data, it offers a comprehensive set of features that reflect the current weather conditions.
It provides more than 40 features, including temperature, wind, pressure, precipitation, humidity, visibility, and air quality measurements. This dataset is a valuable resource for analyzing India's present weather trends and exploring the relationships between various weather parameters.

Dataset : https://www.kaggle.com/datasets/nelgiriyewithana/indian-weather-repository-daily-snapshot



---



#Potential use cases
**Weather trend analysis**: Analyze historical weather data to identify long-term patterns and trends.

**Geospatial analysis**: Explore geographical variations in weather conditions across different regions.

**Weather condition correlations**: Investigate relationships between various weather parameters and their effects on each other.

**Air quality impact**: Study the impact of weather conditions on air quality measurements.

**Celestial events analysis**: Examine correlations between celestial events and weather phenomena.

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import folium
from folium import Choropleth,Circle,Marker
from folium.plugins import HeatMap

Upload the dataset and create the DataFrame.

In [None]:
weather=pd.read_csv('/content/IndianWeatherRepository.csv')

In [None]:
weather.info()  # info() prints its summary itself; wrapping it in print() also prints "None"

Removing the redundant features.

In [None]:
weather.drop(['last_updated_epoch','last_updated','temperature_fahrenheit','wind_mph',
              'pressure_in','precip_in','feels_like_fahrenheit','visibility_miles',
              'gust_mph'],axis=1,inplace=True)
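
Most of the dropped columns are imperial-unit duplicates of metric columns that are kept. As a minimal sketch of how that redundancy could be verified before dropping, the block below converts Celsius to Fahrenheit on a toy frame and compares the result with the stored column. The three rows are made-up values, not taken from the dataset; only the column names mirror it.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names mirror the dataset
toy = pd.DataFrame({
    'temperature_celsius': [20.0, 25.0, 30.5],
    'temperature_fahrenheit': [68.0, 77.0, 86.9],
})

# Convert Celsius to Fahrenheit and compare with the stored Fahrenheit column
converted = toy['temperature_celsius'] * 9 / 5 + 32
is_redundant = bool(np.allclose(converted, toy['temperature_fahrenheit'], atol=0.1))
print(f'Fahrenheit column is redundant: {is_redundant}')
```

The same check could be repeated for the other unit pairs (for example `wind_kph` and `wind_mph`) with the matching conversion factor.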

Viewing the distribution of each numeric feature in the dataset.

In [None]:
weather.hist(bins=10,figsize=(20,15))
plt.show()

Plotting the correlation matrix.

In [None]:
# numeric_only=True skips text columns, which raise an error in pandas 2.x
cor=weather.corr(numeric_only=True).round(3)
plt.figure(figsize=(15,15))
heatmap=sns.heatmap(cor,annot=True,cmap='BrBG',linewidths=0.1)
plt.show()
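
A full heatmap can be hard to read with this many columns. As a hedged sketch, the block below ranks features by the absolute value of their correlation with the target. It runs on synthetic data: the values and the relationship between pressure and temperature are invented for illustration, and only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
pressure = rng.normal(1010, 5, n)

# Synthetic data: temperature is built to fall as pressure rises
toy = pd.DataFrame({
    'pressure_mb': pressure,
    'humidity': rng.uniform(20, 90, n),
    'temperature_celsius': 30 - 0.8 * (pressure - 1010) + rng.normal(0, 1, n),
    'condition_text': ['Sunny'] * n,  # non-numeric column, skipped below
})

# Correlation of every numeric feature with the target, strongest first
target_corr = (toy.corr(numeric_only=True)['temperature_celsius']
               .drop('temperature_celsius')
               .sort_values(key=abs, ascending=False))
print(target_corr.round(3))
```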

#Objective:
We will build a model that predicts the temperature in Celsius using different techniques and determine which one works best.

Importing the required libraries.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split as tts
from sklearn.metrics import r2_score  # f1_score is a classification metric and is not used here
from sklearn.preprocessing import StandardScaler

Selecting the features for the model.

In [None]:
X=weather[['latitude','longitude','wind_kph','wind_degree','pressure_mb',
           'precip_mm','humidity','cloud','uv_index','gust_kph',]]
y=weather['temperature_celsius']
print(f'Shape of features : {X.shape}')
print(f'Shape of target : {y.shape}')

Shape of features : (105708, 10)
Shape of target : (105708,)


Splitting the data for training and testing.

In [None]:
xtrain,xtest,ytrain,ytest=tts(X,y,test_size=0.2,shuffle=True,random_state=10 )
print(f'Shape of training data : {xtrain.shape}',f'Shape of testing data : {xtest.shape}',)

Shape of training data : (84566, 10) Shape of testing data : (21142, 10)




---


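First, we will start with a simple Linear Regression model and check its score on the test data. For a regressor, `score` returns the coefficient of determination (R²), not a classification accuracy.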

In [None]:
lr_model=LinearRegression()
lr_model.fit(xtrain,ytrain)
# ypred=lr_model.predict(xtest)
print(f'Model\'s accuracy : {lr_model.score(xtest,ytest).round(2)*100}%')

Model's accuracy : 73.0%
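
The percentage printed above is the R² returned by `score`, scaled by 100. A minimal sketch on synthetic data (the features, coefficients and noise level are all invented) shows that `score` matches `r2_score`, and reports the mean absolute error next to it, which is expressed in the units of the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem: a linear signal plus Gaussian noise
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(500, 3))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.5, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=10)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

r2 = r2_score(y_te, pred)              # same value as model.score(X_te, y_te)
mae = mean_absolute_error(y_te, pred)  # average absolute error in target units
print(f'R^2: {r2:.3f}  MAE: {mae:.3f}')
```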


With an R² of 0.73, the Linear Regression model works reasonably well. But we are working with real-world data that is updated daily, so we want the model to perform better.
Let's try a different regressor now.


---



Now we will be using the Random Forest model to perform predictions.

In [None]:
rf_model=RandomForestRegressor(n_estimators=100)
rf_model.fit(xtrain,ytrain)
ypred=rf_model.predict(xtest)
print(f'Model\'s accuracy : {rf_model.score(xtest,ytest).round(2)*100}%')

Model's accuracy : 94.0%


An R² of 0.94 is a strong result for predicting the temperature. The remaining 6% of unexplained variance is probably due to noise in the data or to factors the selected features do not capture; the particular random split can also shift the score slightly.
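
One way to check how much the score depends on the particular random split is k-fold cross-validation, which repeats the train/test procedure over several splits. The sketch below uses synthetic data and a smaller forest to keep it quick; the data, the fold count and the forest size are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic non-linear regression problem
rng = np.random.default_rng(2)
X_toy = rng.uniform(-3, 3, size=(400, 2))
y_toy = np.sin(X_toy[:, 0]) + 0.5 * X_toy[:, 1] + rng.normal(0, 0.1, 400)

# Five shuffled folds; each fold is used once as the test set
cv = KFold(n_splits=5, shuffle=True, random_state=10)
forest = RandomForestRegressor(n_estimators=50, random_state=10)
scores = cross_val_score(forest, X_toy, y_toy, cv=cv, scoring='r2')

print(f'R^2 per fold: {scores.round(3)}')
print(f'mean={scores.mean():.3f}  std={scores.std():.3f}')
```

A small standard deviation across folds would suggest that the split has little influence on the score.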

Now let's see the important features for predicting the temperature.

In [None]:
imp_ft=pd.Series(rf_model.feature_importances_,index=xtrain.columns).sort_values(ascending=False)
print(imp_ft)

pressure_mb    0.531967
latitude       0.265459
longitude      0.072991
humidity       0.045897
wind_degree    0.025605
cloud          0.020817
gust_kph       0.015977
wind_kph       0.012487
uv_index       0.005902
precip_mm      0.002897
dtype: float64


In [None]:
plt.figure(figsize=(12,12))
plt.plot(imp_ft,marker='*')
plt.show()

From this graph, **pressure, latitude, longitude, and humidity** appear to be the most important features for predicting the temperature.
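
Impurity-based importances from a Random Forest are computed on the training data and can favour some features over others, so a second opinion is useful. The sketch below uses scikit-learn's `permutation_importance` on held-out data as a cross-check. The data is synthetic and the feature names `strong`, `weak` and `noise` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: one strong feature, one weak feature, one pure-noise feature
rng = np.random.default_rng(3)
toy = pd.DataFrame({
    'strong': rng.normal(size=600),
    'weak': rng.normal(size=600),
    'noise': rng.normal(size=600),
})
y_toy = 3 * toy['strong'] + 0.5 * toy['weak'] + rng.normal(0, 0.2, 600)

X_tr, X_te, y_tr, y_te = train_test_split(toy, y_toy, test_size=0.25, random_state=10)
forest = RandomForestRegressor(n_estimators=50, random_state=10).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in R^2
result = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=10)
perm_imp = pd.Series(result.importances_mean, index=toy.columns).sort_values(ascending=False)
print(perm_imp.round(3))
```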


---



Now let's move to the last predictive model: the famous neural network.
For that we will standardize the data.

In [None]:
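scale=StandardScaler()
xtrain_=scale.fit_transform(xtrain)  # learn mean and std from the training data only
xtest_=scale.transform(xtest)        # reuse the training statistics; refitting on test data leaks information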
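
Standardization should learn its mean and standard deviation from the training split and apply those same values to the test split. A minimal sketch on synthetic, pressure-like numbers (the values are invented) makes the behaviour visible: the test data is shifted relative to the training data, and the shared scaler preserves that shift.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic single-feature splits; the test data is shifted upwards by about 5
rng = np.random.default_rng(4)
train = rng.normal(loc=1000.0, scale=5.0, size=(200, 1))
test = rng.normal(loc=1005.0, scale=5.0, size=(50, 1))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # applies the train mean and std

print(f'train mean after scaling: {train_scaled.mean():.3f}')
print(f'test mean after scaling:  {test_scaled.mean():.3f}')
```

Fitting a separate scaler on the test split would centre it at zero as well and hide the shift from the model.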

Using TensorFlow's Keras to build a Sequential model with 5 hidden layers.

In [None]:
shape=xtrain.shape[1]
early_stopping=EarlyStopping(min_delta=0.005,patience=10,restore_best_weights=True)
tf_model=keras.Sequential([
    layers.BatchNormalization(input_shape=[shape]),
    layers.Dense(1024,'relu'),
    layers.Dense(512,'relu',),
    layers.Dropout(0.3),layers.BatchNormalization(),
    layers.Dense(256,'relu'),
    layers.Dropout(0.3),layers.BatchNormalization(),
    layers.Dense(64,'relu'),
    layers.Dropout(0.2),layers.BatchNormalization(),
    layers.Dense(32,'relu'),
    layers.Dropout(0.1),layers.BatchNormalization(),
    layers.Dense(1)
])
# 'accuracy' is a classification metric; for regression, track RMSE alongside the MAE loss
tf_model.compile(loss='mae',optimizer='adam',metrics=[keras.metrics.RootMeanSquaredError(name='rmse')])
#tf_model.summary()

In [None]:
hist=tf_model.fit(xtrain_,ytrain,validation_data=(xtest_,ytest),batch_size=256,epochs=200,callbacks=[early_stopping],verbose=0)

In [None]:
hist_df=pd.DataFrame(hist.history)
hist_df.loc[:,['loss','val_loss']].plot()
hist_df.loc[:,['rmse','val_rmse']].plot()
print('Minimum validation loss (MAE) : {}'.format(hist_df['val_loss'].min()))
print('Minimum validation RMSE : {}'.format(hist_df['val_rmse'].min()))
print('Minimum training RMSE : {}'.format(hist_df['rmse'].min()))

After seeing the results of the TensorFlow model, we can say that:


*   The model seems to generalise reasonably well but is definitely underfitting.

I spent a lot of time trying to make the model work, but it still does not perform well.
Its loss and metrics are also poor, so we won't be using this model for predictions.

---




Let's define a function that predicts the temperature for new data.

In [None]:
def whats_the_temp(new_data):
  """Print the temperature predicted by the Random Forest model."""
  temp=rf_model.predict(new_data)
  print(f'Temperature is {temp} celsius')

In [None]:
example=xtrain[1:2]
new_data=None  # stays None unless the user enters values
print('Do you want to Predict temperature for new data :\t1.Yes\t2.No\n')
ans=input().strip().lower()
if ans in ('yes','y','1'):
  print('Please enter your details in respective order')
  print('Example:\n',example,'\t\t *don\'t enter brackets*')
  raw=input('Separate values with space:')
  # float() instead of eval(): eval would execute whatever code the user types
  new_data=pd.DataFrame([[float(i) for i in raw.split()]],columns=xtrain.columns)

Do you want to Predict temperature for new data :	1.Yes	2.No

no


In [None]:
print(f'Actual temperature of example : {ytrain[1:2]} celsius')  #first no is the index
print('Temperature predicted by the model : ')
whats_the_temp(example)

Actual temperature of example : 22587    24.6
Name: temperature_celsius, dtype: float64 celsius
Temperature predicted by the model : 
Temperature is [24.724] celsius


The actual and predicted temperatures are close. Note that this example is taken from the training set, so it is only a sanity check; the test-set score above is the better measure of how well the model performs.

In [None]:
if new_data is not None:  # skip when the user answered "no"
  whats_the_temp(new_data)
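
User input for a prediction needs one value per feature, in the order the model was trained on. The helper below is a hypothetical sketch (the name `parse_features` and the `FEATURES` constant are not part of the notebook) that converts a space-separated string with `float` and checks the number of values before anything reaches the model.

```python
# Feature order used when training the model in this notebook
FEATURES = ['latitude', 'longitude', 'wind_kph', 'wind_degree', 'pressure_mb',
            'precip_mm', 'humidity', 'cloud', 'uv_index', 'gust_kph']

def parse_features(raw, n_features=len(FEATURES)):
    """Turn 'v1 v2 ...' into [[v1, v2, ...]]; raise ValueError on bad input."""
    values = [float(token) for token in raw.split()]  # float() rejects non-numeric text
    if len(values) != n_features:
        raise ValueError(f'expected {n_features} values, got {len(values)}')
    return [values]

# Illustrative values only, not a row from the dataset
row = parse_features('28.57 77.07 10.1 270 1008 0 60 25 6 15.5')
print(row)
```

The returned nested list has the shape `predict` expects for a single sample.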



---

Now let's do some **Geospatial Analysis**.

In [None]:
# points_from_xy expects x=longitude and y=latitude; EPSG:4326 is plain latitude/longitude in degrees
weather=gpd.GeoDataFrame(weather,geometry=gpd.points_from_xy(weather.longitude,weather.latitude),crs='EPSG:4326')
weather.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 105708 entries, 0 to 105707
Data columns (total 34 columns):
 #   Column                        Non-Null Count   Dtype   
---  ------                        --------------   -----   
 0   country                       105708 non-null  object  
 1   location_name                 105708 non-null  object  
 2   region                        105708 non-null  object  
 3   latitude                      105708 non-null  float64 
 4   longitude                     105708 non-null  float64 
 5   timezone                      105708 non-null  object  
 6   temperature_celsius           105708 non-null  float64 
 7   condition_text                105708 non-null  object  
 8   wind_kph                      105708 non-null  float64 
 9   wind_degree                   105708 non-null  int64   
 10  wind_direction                105708 non-null  object  
 11  pressure_mb                   105708 non-null  float64 
 12  precip_mm             

In [None]:
ind_temp=weather[['latitude','longitude','temperature_celsius','geometry',]].set_index('temperature_celsius').copy()
loc_=[28.571101,77.074686]  #location as [latitude, longitude]
print(ind_temp.head())

                     latitude  longitude                   geometry
temperature_celsius                                                
27.5                    24.57      77.72  POINT (77.72000 24.57000)
27.5                    23.33      77.80  POINT (77.80000 23.33000)
26.3                    22.07      78.93  POINT (78.93000 22.07000)
25.6                    21.86      77.93  POINT (77.93000 21.86000)
27.2                    22.75      77.72  POINT (77.72000 22.75000)


In [None]:

m1=folium.Map(location=loc_,tiles='openstreetmap',zoom_start=7)
Marker([loc_[0],loc_[1]]).add_to(m1)
HeatMap(data=ind_temp[['latitude','longitude',]],radius=10).add_to(m1)
m1
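
The heat map above is built from coordinates only, so it shows where records are concentrated rather than how warm each place is. folium's `HeatMap` also accepts rows of `[latitude, longitude, weight]`. The sketch below prepares such rows with pandas, using the mean temperature per location scaled to the range 0 to 1. The four rows of data are made up, and min-max scaling is just one possible weighting.

```python
import pandas as pd

# Made-up readings for two locations
toy = pd.DataFrame({
    'latitude': [28.57, 28.57, 19.08, 19.08],
    'longitude': [77.07, 77.07, 72.88, 72.88],
    'temperature_celsius': [30.0, 32.0, 27.0, 29.0],
})

# Mean temperature per unique location
mean_temp = toy.groupby(['latitude', 'longitude'], as_index=False)['temperature_celsius'].mean()

# Min-max scale the means so the weights fall between 0 and 1
t = mean_temp['temperature_celsius']
mean_temp['weight'] = (t - t.min()) / (t.max() - t.min())

# Rows of [latitude, longitude, weight], ready to pass as HeatMap(data=heat_rows)
heat_rows = mean_temp[['latitude', 'longitude', 'weight']].values.tolist()
print(heat_rows)
```

With this scaling the coldest location gets a weight of 0 and contributes nothing to the map, so a small offset may be preferable in practice.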

In [None]:
plot_dic=ind_temp.index.value_counts()
m2=folium.Map(location=loc_,tiles='openstreetmap',zoom_start=7)
Choropleth(geo_data=ind_temp.__geo_interface__,data=plot_dic,key_on='feature.id',
           fill_color='YlGnBu',legend_name='Temperature variation in INDIA').add_to(m2)
m2



---


**With this we made a model that predicts the temperature in INDIA
using MACHINE LEARNING techniques.**

---

