# Electricity Demand Data Analysis and Processing

This project involves integrating multiple CSV and JSON files containing electricity demand and weather data. The objectives are to clean and preprocess the data, detect and handle outliers, perform exploratory data analysis (EDA), and build a regression model to predict electricity demand. The final deliverables are a cleaned CSV file and this Jupyter Notebook documenting the entire process.


## Environment Setup

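The following libraries are required for this project:
- pandas
- matplotlib
- seaborn
- scikit-learn
- statsmodels
- scipy

The `json` and `os` modules are also used, but both are part of the Python standard library and need no installation.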

You can install the third-party packages with pip:

`pip install pandas matplotlib seaborn scikit-learn statsmodels scipy`

The notebook starts with the following imports:

```python
# Standard library
import os
import json

# Data handling and visualization
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Statistics and time-series analysis
from scipy.stats import skew, kurtosis, zscore
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Modeling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
```


## Data Integration

The project data is split across multiple CSV files (weather data) and JSON files (electricity demand data). In this step, the files of each type are merged into a single CSV file (one for weather, one for demand) for further processing.
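A minimal sketch of this merge step is shown below. It assumes that every weather CSV shares the same columns and that each demand JSON file contains a flat list of record objects; if the real files nest their records (for example under an API response key), the result of `json.load` would need to be unwrapped first. The function names are hypothetical.

```python
import json

import pandas as pd


def combine_csv_files(paths):
    """Stack several CSV files with identical columns into one DataFrame."""
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


def combine_json_files(paths):
    """Stack several JSON files into one DataFrame.

    Assumption: each file holds a list of flat record objects,
    e.g. [{"period": "...", "value": 123}, ...].
    """
    frames = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            records = json.load(handle)
        # json_normalize turns a list of dicts into one row per record.
        frames.append(pd.json_normalize(records))
    return pd.concat(frames, ignore_index=True)
```

Usage would look like `combine_csv_files(sorted(Path("weather_raw_data").glob("*.csv"))).to_csv("weather_data.csv", index=False)` (with `from pathlib import Path`), where the directory and output file names are placeholders for the actual data layout.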

## Data Preprocessing and Cleaning

In this step, we:
- Drop columns with more than 50% missing values.
- Fill missing values using median imputation for numeric columns and mode imputation for categorical columns.
- Convert the 'period' column to a datetime format and extract features such as hour, day, month, etc.
- Remove duplicate rows.
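The four cleaning steps can be sketched as a single function. This is a minimal illustration rather than the notebook's exact code: the 50% threshold comes from the list above, while the function name `clean_dataframe`, the `time_col` parameter, and the derived columns (`hour`, `day`, `month`, `dayofweek`) are assumptions. All non-numeric columns, including `period` before it is parsed, are treated as categorical for mode imputation.

```python
import pandas as pd


def clean_dataframe(df, missing_threshold=0.5, time_col="period"):
    """Apply the cleaning steps listed above and return a new DataFrame."""
    # 1. Drop columns whose fraction of missing values exceeds the threshold.
    missing_fraction = df.isna().mean()
    df = df.loc[:, missing_fraction <= missing_threshold].copy()

    # 2a. Median imputation for numeric columns.
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # 2b. Mode imputation for every remaining (non-numeric) column.
    for col in df.columns.difference(numeric_cols):
        modes = df[col].mode(dropna=True)
        if not modes.empty:
            df[col] = df[col].fillna(modes.iloc[0])

    # 3. Parse timestamps and derive calendar features.
    df[time_col] = pd.to_datetime(df[time_col], errors="coerce")
    df["hour"] = df[time_col].dt.hour
    df["day"] = df[time_col].dt.day
    df["month"] = df[time_col].dt.month
    df["dayofweek"] = df[time_col].dt.dayofweek

    # 4. Remove exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)
```

Imputing before de-duplicating means rows that differed only in which values were missing can collapse into duplicates after filling, which is the order the list above describes.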


## Exploratory Data Analysis (EDA)

This section uses visualizations and statistical summaries to examine the distribution of each variable, the relationships between variables, and trends over time.
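The statistical summaries can be computed with the `skew`, `kurtosis`, and `zscore` functions imported from `scipy.stats`. The sketch below also shows one common way to handle outliers: dropping rows whose absolute z-score exceeds 3. That cut-off and the helper names are assumptions; the notebook may detect or treat outliers differently (for example with an IQR rule, or by capping values instead of dropping them).

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew, zscore


def summarize_distribution(series):
    """Return basic shape statistics for one numeric column."""
    values = series.dropna().to_numpy(dtype=float)
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values, ddof=1)),
        "skewness": float(skew(values)),
        # Fisher definition: a normal distribution has excess kurtosis 0.
        "excess_kurtosis": float(kurtosis(values)),
    }


def drop_zscore_outliers(df, column, threshold=3.0):
    """Drop rows whose |z-score| in `column` exceeds `threshold`.

    Rows with a missing value in `column` get a NaN score and are dropped too.
    """
    scores = zscore(df[column].to_numpy(dtype=float), nan_policy="omit")
    keep = np.abs(scores) <= threshold
    return df.loc[keep].reset_index(drop=True)
```

A positive skewness on the demand column indicates a long right tail (occasional demand spikes), which is one reason to inspect outliers before fitting a linear model.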


## Regression Modeling

We use a linear regression model to predict electricity demand based on time-based features extracted from the 'period' column. The dataset is split into training and testing sets, and the model's performance is evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R² score.
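A minimal sketch of this pipeline follows. The target column name `value`, the four calendar features, the 80/20 split, and the helper names are assumptions for illustration. RMSE is computed as the square root of MSE with `np.sqrt`, because the `squared=False` option of `mean_squared_error` was deprecated and later removed in recent scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

FEATURES = ["hour", "day", "month", "dayofweek"]


def add_time_features(df, time_col="period"):
    """Derive calendar features from a datetime column."""
    out = df.copy()
    stamps = pd.to_datetime(out[time_col])
    out["hour"] = stamps.dt.hour
    out["day"] = stamps.dt.day
    out["month"] = stamps.dt.month
    out["dayofweek"] = stamps.dt.dayofweek
    return out


def fit_demand_model(df, target="value", test_size=0.2):
    """Fit a linear regression on time features and report MSE, RMSE, and R²."""
    data = add_time_features(df)
    X = data[FEATURES]
    y = data[target]
    # shuffle=False keeps the test rows after the training rows in time,
    # assuming df is sorted by period.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    metrics = {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "r2": float(r2_score(y_test, predictions)),
    }
    return model, metrics
```

With `shuffle=False` the test set is the most recent 20% of rows, so the model is evaluated on timestamps it has not seen interleaved with training data. A random split (`shuffle=True` with a fixed `random_state`) can report a more optimistic score on time-series data.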


## Final Submission

The final deliverables for this project are:
- **final_cleaned_data.csv:** The cleaned and processed dataset.
- **project.ipynb:** This Jupyter Notebook documenting the entire data processing, analysis, and modeling workflow.
