# Logistic Regression Analysis

### This notebook covers data loading, preprocessing, exploratory data analysis, feature selection, model training, evaluation, result interpretation, and prediction for a new input.

**Introduction**
This project aims to predict whether it will rain tomorrow in Australia based on historical weather data using a logistic regression model.

**Dataset context**
This dataset, sourced from Kaggle, contains about 10 years of daily weather observations from numerous Australian weather stations.

`RainTomorrow` is the target variable to predict. It answers the question: did it rain the next day, Yes or No?
The value is Yes if the rainfall for that day was 1 mm or more.

https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package
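
To make the target definition concrete, here is a minimal sketch, on made-up data, of how a next-day label of this kind could be derived from daily rainfall. The `Rainfall` column name, the toy values, and the `add_rain_tomorrow` helper are assumptions for illustration; the dataset already ships with `RainTomorrow` filled in.

```python
import pandas as pd

def add_rain_tomorrow(daily: pd.DataFrame, threshold_mm: float = 1.0) -> pd.DataFrame:
    """Label each day 'Yes' if the NEXT day's rainfall is >= threshold_mm.

    Hypothetical helper: assumes one row per day, sorted by date, with a
    'Rainfall' column in millimetres.
    """
    out = daily.copy()
    next_day = out["Rainfall"].shift(-1)  # rainfall recorded on the following day
    out["RainTomorrow"] = (next_day >= threshold_mm).map({True: "Yes", False: "No"})
    # The last day has no following observation, so its label is unknown
    out.loc[next_day.isna(), "RainTomorrow"] = pd.NA
    return out

# Made-up rainfall values for four consecutive days
toy = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=4, freq="D"),
    "Rainfall": [0.0, 2.4, 0.6, 1.0],
})
labelled = add_rain_tomorrow(toy)
print(labelled)
```

With several stations in one file, the shift would need to be applied per station rather than over the whole frame.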

In [None]:
# Import necessary libraries for data analysis and modeling

import pandas as pd # For data manipulation and analysis
import numpy as np # For numerical computations
import matplotlib.pyplot as plt # For data visualisation
import seaborn as sns # For enhanced data visualisation

from sklearn.model_selection import train_test_split  # For splitting data into training and testing sets
from sklearn.preprocessing import StandardScaler  # For feature standardisation
from sklearn.impute import SimpleImputer  # For handling missing values
from sklearn.linear_model import LogisticRegression  # For logistic regression model
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report  # For model evaluation

In [None]:
# Load the dataset (the CSV file must be in the working directory)
df = pd.read_csv('weatherAUS 3.csv')

df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
# Identify columns with missing values
columns_with_missing = df.columns[df.isnull().any()]

# Handle missing values
for column in columns_with_missing:
    # Numerical columns: impute with mean
    if df[column].dtype == 'float64':
        imputer = SimpleImputer(strategy='mean')
    # Categorical columns: impute with mode
    else:
        imputer = SimpleImputer(strategy='most_frequent')
    # fit_transform returns a 2-D array of shape (n_rows, 1); flatten it before assigning
    df[column] = imputer.fit_transform(df[[column]]).ravel()

# Verify that all missing values have been handled
print(df.isnull().sum())
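
The same mean/mode strategy can also be written with plain pandas. The sketch below is one possible equivalent, not the notebook's own method: the helper name `impute_mean_mode` is hypothetical, and the toy frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

def impute_mean_mode(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric NaNs with the column mean and other NaNs with the column mode."""
    out = frame.copy()
    for column in out.columns[out.isnull().any()]:
        if pd.api.types.is_numeric_dtype(out[column]):
            out[column] = out[column].fillna(out[column].mean())
        else:
            # mode() can return several values on ties; take the first
            out[column] = out[column].fillna(out[column].mode().iloc[0])
    return out

# Made-up frame with one numeric and one categorical gap
toy = pd.DataFrame({
    "MinTemp": [10.0, np.nan, 14.0],
    "RainToday": ["No", "No", None],
})
filled = impute_mean_mode(toy)
print(filled)
```

Either way, statistics computed on the full dataset include rows that later land in the test set; a stricter workflow would compute the fill values on the training split only.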

In [None]:
# Count rows whose Date value has already appeared earlier in the data
df['Date'].duplicated().sum()
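
Because the data covers many weather stations, the same date is expected to appear once per station, so a count of duplicated dates will be large even when no observation is repeated. A minimal sketch of a per-station check follows, assuming the station is stored in a column named `Location` (an assumption; adjust to the actual column name). The toy frame is made up.

```python
import pandas as pd

# Made-up observations: two stations, with one station-day entered twice
toy = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"],
    "Location": ["Albury", "Sydney", "Albury", "Sydney", "Sydney"],
})

# Dates repeat across stations, so this count is inflated
dup_dates = toy["Date"].duplicated().sum()

# Only rows repeating the same (Date, Location) pair are true duplicates
dup_station_days = toy.duplicated(subset=["Date", "Location"]).sum()

print(dup_dates, dup_station_days)
```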

**Exploratory Data Analysis (EDA)**

In [None]:
df['RainTomorrow'].value_counts()

In [None]:
# Visualise the distribution of the target variable
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='RainTomorrow')
plt.title('Distribution of Rain Tomorrow')
plt.xlabel('Rain Tomorrow')
plt.ylabel('Count')
plt.show()
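
Raw counts can hide how skewed the target is. As a rough sketch, the class proportions and the accuracy of always predicting the majority class give a baseline that a trained model should beat. The toy labels below are made up, and the helper name `majority_baseline` is hypothetical.

```python
import pandas as pd

def majority_baseline(target: pd.Series) -> tuple[str, float]:
    """Return the most frequent class and the accuracy of always predicting it."""
    proportions = target.value_counts(normalize=True)
    return proportions.idxmax(), float(proportions.max())

# Made-up labels: three dry days for every wet day
toy_target = pd.Series(["No", "No", "No", "Yes"], name="RainTomorrow")
label, baseline_acc = majority_baseline(toy_target)
print(label, baseline_acc)
```

On the real data the same call would be `majority_baseline(df['RainTomorrow'])`.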