
# **Energy Consumption Data Analysis**

## Description
This notebook applies data analysis methods to energy consumption data in order to examine consumption patterns, identify optimization opportunities, and recommend strategies for improving energy efficiency.

## Interest
The goal is to help promote energy efficiency and reduce energy costs for businesses and households.

## Motivation
Energy efficiency matters both economically and environmentally: it lowers energy costs and reduces the carbon footprint.

## Tools
Python, R, Pandas, Matplotlib, and Tableau, used for data analysis, visualization, and communicating results to stakeholders.


In [1]:
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import requests


### Loading the energy consumption data

To fetch data directly from Kaggle, a plain `requests.get` call on the dataset URL is not enough: Kaggle requires authentication, and datasets are accessed programmatically through its API.

1. Install Kaggle API:

First, we need to install the Kaggle API client using pip:

In [24]:

pip install kaggle


Collecting kaggle
  Downloading kaggle-1.6.14.tar.gz (82 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting certifi>=2023.7.22 (from kaggle)
  Obtaining dependency information for certifi>=2023.7.22 from https://files.pythonhosted.org/packages/5b/11/1e78951465b4a225519b8c3ad29769c49e0d8d157a070f681d5b

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
transformers 2.1.1 requires sentencepiece, which is not installed.

[notice] A new release of pip is available: 23.2.1 -> 24.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip


***Note: Make sure your Kaggle credentials are set up before downloading: place your `kaggle.json` file in the `~/.kaggle/` directory.***
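
Before calling the Kaggle API, it can help to confirm that the credentials file is actually in place. The following is a minimal sketch, not part of the Kaggle client: `check_kaggle_credentials` is a hypothetical helper that only looks for `kaggle.json` in the given directory (default `~/.kaggle/`), checks that it contains the `username` and `key` fields, and flags loose file permissions on POSIX systems.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_credentials(config_dir=None):
    """Return a list of problems found with kaggle.json (empty list if none).

    Hypothetical helper for illustration; it does not contact Kaggle.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    cred_file = config_dir / "kaggle.json"

    if not cred_file.is_file():
        return [f"{cred_file} not found"]

    try:
        creds = json.loads(cred_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"{cred_file} is not valid JSON"]
    if not isinstance(creds, dict):
        return [f"{cred_file} does not contain a JSON object"]

    problems = []
    # The API token file is expected to hold a username and a key.
    for field in ("username", "key"):
        if not creds.get(field):
            problems.append(f"missing '{field}' in {cred_file}")

    # On POSIX systems, the file should not be accessible to group or others.
    if os.name == "posix" and stat.S_IMODE(cred_file.stat().st_mode) & 0o077:
        problems.append(f"{cred_file} is accessible to other users; consider chmod 600")

    return problems
```

Calling `check_kaggle_credentials()` with no arguments inspects the default location; an empty list means nothing obvious is wrong with the file itself.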

In [None]:
dataset_url = "https://www.kaggle.com/datasets/pranjalverma08/energy-dataset-countrywise-19002021?select="


In [13]:
# Ask for the country name
country = input("Please enter the country name: ")

# Define the Kaggle dataset slug (owner/dataset-name) and the file name
dataset = "pranjalverma08/energy-dataset-countrywise-19002021"
file_name = country + "_energy_data.csv"  # assumed naming; must match a file in the dataset

# Use the Kaggle CLI to download the file (os.system needs a command, not a URL)
exit_code = os.system(f'kaggle datasets download -d {dataset} -f "{file_name}" --unzip')
if exit_code != 0:
    raise RuntimeError(f"Kaggle download failed with exit code {exit_code}")

# Load the dataset into a pandas DataFrame
energy_data = pd.read_csv(file_name)

# Display the first few rows of the dataset
print(energy_data.head())


Please enter the country name: Algeria


ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 2
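
A `ParserError` like this one often means that the file on disk is not the CSV that `read_csv` expects, for example an HTML page saved under the `.csv` name. That cause is an assumption here; the error message alone does not confirm it. A minimal sketch for looking at the raw file before parsing it, where `peek_lines` and `looks_like_html` are hypothetical helper names:

```python
from itertools import islice


def peek_lines(path, n=10):
    """Return the first n lines of a text file without parsing it."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


def looks_like_html(lines):
    """Heuristic: does the file start like an HTML document rather than a CSV?"""
    head = "".join(lines).lstrip().lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Example usage in the notebook (file_name is defined in the cell above):
# first_lines = peek_lines(file_name)
# print("\n".join(first_lines))
# print("Looks like HTML:", looks_like_html(first_lines))
```

If the first lines turn out to be markup or binary data, the download step is the place to look rather than the `read_csv` arguments.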


In [6]:
type(file_name)

str

### Exploratory Data Analysis (EDA)

In [None]:

# Summary statistics
energy_data.describe()


In [None]:

# Checking for missing values
energy_data.isnull().sum()
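
Raw missing-value counts are easier to act on when shown next to the share of rows they represent. The sketch below uses a small made-up DataFrame, not the Kaggle data; `missing_summary` is a hypothetical helper and the column names are placeholders.

```python
import pandas as pd


def missing_summary(df):
    """Return count and percentage of missing values for columns that have any."""
    summary = pd.DataFrame({
        "missing": df.isnull().sum(),
        "percent": (df.isnull().mean() * 100).round(1),
    })
    # Keep only columns with gaps, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "consumption": [1.0, None, 3.0, None],
    "population": [10, 11, None, 13],
})
report = missing_summary(toy)
print(report)
```

Applied to the notebook's data, the call would be `missing_summary(energy_data)`.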


In [None]:

# Visualizing energy consumption over time
# ('date' and 'consumption' are assumed column names; adjust them to the dataset's actual columns)
plt.figure(figsize=(10,6))
plt.plot(energy_data['date'], energy_data['consumption'])
plt.title('Energy Consumption Over Time')
plt.xlabel('Date')
plt.ylabel('Energy Consumption')
plt.show()


### Advanced Analysis

In [None]:

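# Identifying consumption patterns using clustering
import seaborn as sns  # needed for the scatter plot below
from sklearn.cluster import KMeans

# Assuming the data has been preprocessed and relevant features selected
# ('feature1', 'feature2', 'feature3' are placeholder column names)
features = energy_data[['feature1', 'feature2', 'feature3']]

# Applying KMeans clustering (random_state is fixed so the labels are reproducible)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(features)
energy_data['Cluster'] = kmeans.labels_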

# Visualizing the clusters
plt.figure(figsize=(10,6))
sns.scatterplot(x='feature1', y='feature2', hue='Cluster', data=energy_data, palette='viridis')
plt.title('Energy Consumption Clusters')
plt.show()
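
K-means relies on Euclidean distance, so a feature measured on a large scale can dominate the clusters. Standardizing the features first is one common way to handle this; it is an addition here, not something the notebook above does. A minimal sketch on synthetic data, where `cluster_scaled`, the column names, and the toy values are all illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_scaled(df, columns, n_clusters=3, seed=0):
    """Standardize the selected columns, then return K-means cluster labels."""
    scaled = StandardScaler().fit_transform(df[columns])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


# Synthetic data: two well-separated groups, with features on very different scales.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "feature1": np.concatenate([rng.normal(0, 0.05, 20), rng.normal(1, 0.05, 20)]),
    "feature2": np.concatenate([rng.normal(0, 10, 20), rng.normal(1000, 10, 20)]),
})
labels = cluster_scaled(toy, ["feature1", "feature2"], n_clusters=2)
toy["Cluster"] = labels
```

With the notebook's data, the equivalent call would be `cluster_scaled(energy_data, ['feature1', 'feature2', 'feature3'])`, once the placeholder column names are replaced with real ones.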
