### Prediction of Electric Power Consumption in an Individual Household


##### import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

##### Read the data

In [None]:
df = pd.read_csv('../input/electric-power-consumption-data-set/household_power_consumption.txt', sep=';',
                  parse_dates={'dt' : ['Date', 'Time']}, infer_datetime_format=True, 
                 low_memory=False, na_values=['nan','?'],index_col='dt')

### 1. The data contain 'nan' and '?' as strings. Both are converted to NumPy NaN at the import stage (above) and treated the same.

### 2. The two columns 'Date' and 'Time' are merged into a single datetime index, 'dt'.
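
On recent pandas versions, `infer_datetime_format` and the dict form of `parse_dates` emit deprecation warnings. A minimal sketch of an equivalent import, assuming the file stores dates as dd/mm/yyyy and times as HH:MM:SS. The three sample rows and the names `sample`, `raw` and `demo` are illustrative, not part of the notebook:

```python
import io
import pandas as pd

# Illustrative rows in the same ';'-separated layout (assumed format: dd/mm/yyyy;HH:MM:SS)
sample = io.StringIO(
    "Date;Time;Global_active_power;Voltage\n"
    "16/12/2006;17:24:00;4.216;234.840\n"
    "16/12/2006;17:25:00;?;?\n"
    "16/12/2006;17:26:00;5.374;233.290\n"
)

# '?' and 'nan' are read as NaN, so the measurement columns stay numeric
raw = pd.read_csv(sample, sep=';', na_values=['nan', '?'])

# Merge 'Date' and 'Time' into one datetime column with an explicit format
raw['dt'] = pd.to_datetime(raw['Date'] + ' ' + raw['Time'], format='%d/%m/%Y %H:%M:%S')
demo = raw.drop(columns=['Date', 'Time']).set_index('dt')
print(demo)
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates.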

In [None]:
# Check the shape of dataset (No. of rows and No. of Columns)

df.shape

In total there are 2075259 observations (rows). Counting the merged datetime as one column, there are 8 columns; with 'dt' used as the index, 7 feature columns remain.

In [None]:
# Check top five records of dataframe

df.head()

In [None]:
# Check last five records of dataframe

df.tail()

##### Working with Missing Data

##### The dataset contains some missing values in the measurements (nearly 1.25% of the total rows). 

In [None]:
# Check Missing Values

df.isnull().sum()

In total, 25979 rows contain null values.
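
As a quick check of the "nearly 1.25%" figure, the share of missing rows can be computed from the two counts reported above. This is plain arithmetic, and the variable names are illustrative:

```python
total_rows = 2075259   # rows reported by df.shape
null_rows = 25979      # null rows reported by df.isnull().sum()

missing_pct = 100 * null_rows / total_rows
print(f"{missing_pct:.2f}% of rows are missing")
```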

There are multiple ways to handle and fill missing values, such as the mean, the median, forward fill, backward fill, scikit-learn imputers, etc.

For this problem, we use forward fill. Reason: the records with null values lie between valid readings, and the power used is recorded every minute for almost four years. ffill() propagates the last valid observation forward into the following null records.
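
A small sketch of the difference between forward fill and backward fill on a toy per-minute series (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy per-minute series with a two-minute gap
s = pd.Series(
    [1.0, np.nan, np.nan, 4.0],
    index=pd.date_range('2007-01-01 00:00', periods=4, freq='min'),
)

forward = s.ffill()    # gap takes the last valid value before it
backward = s.bfill()   # gap takes the next valid value after it
print(pd.DataFrame({'original': s, 'ffill': forward, 'bfill': backward}))
```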

In [None]:
# Forward-fill missing values down the rows (axis=0), modifying the original dataframe in place

df.ffill(axis=0,inplace=True)

In [None]:
# Cross check whether all missing values are filled

df.isnull().sum()

##### Analysis
1. Weekly
2. Monthly
3. Quarterly
4. Yearly

##### Sub Datasets
1. Power Consumption 
2. Sub metering
3. Global Reactive, Global Active and Global Intensity

In [None]:
# Creating Target Variable

eq1 = (df['Global_active_power']*1000/60) 
eq2 = df['Sub_metering_1'] + df['Sub_metering_2'] + df['Sub_metering_3']
df['power_consumption'] = eq1 - eq2
df.head()
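
The target formula can be read as follows: a minute-averaged reading in kilowatts times 1000/60 gives the watt-hours used in that minute, and subtracting the three sub-meterings leaves the energy not covered by them. A minimal worked sketch; the function name `unmetered_wh` and the sample values are illustrative:

```python
def unmetered_wh(global_active_kw, sub1_wh, sub2_wh, sub3_wh):
    """Watt-hours used in one minute by equipment outside sub-meterings 1 to 3."""
    total_wh = global_active_kw * 1000 / 60   # kW averaged over one minute -> Wh
    return total_wh - (sub1_wh + sub2_wh + sub3_wh)

# Example: 4.216 kW with sub-meterings of 0, 1 and 17 Wh
example = unmetered_wh(4.216, 0, 1, 17)
print(round(example, 4))
```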

### Creating two more columns from the index: separate Date and Time columns
##### With the new 'Date' column it is easier to group the data, which simplifies visualization and helps in understanding the data

In [None]:
df['Date'] = df.index.date
df['time'] = df.index.time

In [None]:
# Converting the Date datatype from object to datetime

df['Date'] = pd.to_datetime(df['Date'])

In [None]:
# Checking the data types of all columns

df.info()

##### From 2006-12-16 to 2006-12-31 > 16 Days
##### From 2007-01-01 to 2007-12-31 > 365 Days
##### From 2008-01-01 to 2008-12-31 > 366 Days
##### From 2009-01-01 to 2009-12-31 > 365 Days
##### From 2010-01-01 to 2010-11-26 > 330 Days

##### Total Days: 1442 days
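
The day counts above can be verified with the standard library. A short sketch, using the start and end dates listed above (the variable names are illustrative):

```python
from datetime import date

# First and last day covered in each calendar year
spans = {
    2006: (date(2006, 12, 16), date(2006, 12, 31)),
    2007: (date(2007, 1, 1), date(2007, 12, 31)),
    2008: (date(2008, 1, 1), date(2008, 12, 31)),
    2009: (date(2009, 1, 1), date(2009, 12, 31)),
    2010: (date(2010, 1, 1), date(2010, 11, 26)),
}

# +1 because both the first and the last day are included
days = {year: (end - start).days + 1 for year, (start, end) in spans.items()}
total_days = sum(days.values())
print(days, total_days)
```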

As we have only 16 days of data for 2006, including them could distort a yearly analysis: 16 days do not show how consumption behaves across the year. We will remove the 2006 records and analyse the remaining four years.

The reason for the missing 2006 data could be that data collection only started on 16 December 2006, or that earlier data was lost or discarded, for example because some required features were missing.

In [None]:
# filter out 2006 data, only keep data post 2006
df = df[df.index.year>2006]

In [None]:
# Printing first five records of dataframe
df.head()

In [None]:
# printing No. of rows and No. of columns
df.shape

### We will create sub datasets from original dataset
##### As we have data for each minute of each day, we group the data by date to get one row per day (all per-minute readings of the same date are summed)

In [None]:
# Grouping the entire data by Date
# numeric_only=True skips the non-numeric 'time' column, which cannot be summed

df_data = df.groupby(['Date']).sum(numeric_only=True)
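
A toy sketch of the day-wise grouping: four per-minute readings that straddle midnight collapse into two daily totals. The frame `demo` and its values are illustrative; `numeric_only=True` is used here so that the object-typed `time` column is left out of the sum:

```python
import pandas as pd

# Four per-minute readings: two before midnight, two after
idx = pd.date_range('2007-01-01 23:58', periods=4, freq='min')
demo = pd.DataFrame({'power_consumption': [10.0, 20.0, 30.0, 40.0]}, index=idx)
demo['Date'] = pd.to_datetime(demo.index.date)
demo['time'] = demo.index.time

# One row per calendar day, summing only the numeric columns
daily = demo.groupby('Date').sum(numeric_only=True)
print(daily)
```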

In [None]:
# Check whether the data is grouped day-wise

df_data.head()

In [None]:
# check No. of rows and No. of columns

df_data.shape

### Create all three Sub-DataFrame from original dataframe.

##### Power_consumption: It represents the active energy consumed every day (in watt hour) in the household by electrical equipment not measured in sub-meterings 1, 2 and 3.


In [None]:
# Creating sub-dataframe of power consumption (measured in watt-hour)

##### Every 1 Watt-hour = 0.001 Kilowatt-hour. Example: 25000 Watt-hour = 25000 multiplied by 0.001 = 25 Kilowatt-hour.

df_power_consumption = df_data[['power_consumption']]

In [None]:
# Check top five records

df_power_consumption.head()

##### sub_metering_1:  It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).

##### sub_metering_2: It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.

##### sub_metering_3:  It corresponds to an electric water-heater and an air-conditioner.

In [None]:
# Creating sub-dataframe of sub-metering 1, sub-metering 2, and sub-metering 3 (measured in watt-hour)

df_sub_meterings = df_data[['Sub_metering_1','Sub_metering_2','Sub_metering_3']]

In [None]:
# Check first five records

df_sub_meterings.head()

In [None]:
# Creating sub-dataframe of Global_active_power, Global_reactive_power, and Global_intensity 
# (Global_active_power and Global_reactive_power measured in kilowatt whereas, Global_intensity measured in Ampere)
# kilowatt = (ampere * volt) / 1000


df_active_reactive = df_data[['Global_active_power','Global_reactive_power','Global_intensity']]

In [None]:
# Check first five records

df_active_reactive.head()

### Analysis of Power Consumption Yearly

In [None]:
# Checking Statistical summary of power consumption yearly

df_power_consumption.groupby(df_power_consumption.index.year).describe()

### Observation: 
1. The 2006 data (only 16 days) was removed above, so the summary covers 2007 to 2010. 2010 has 330 days of data (up to 2010-11-26) against 365 or 366 days for the other years, so its yearly totals are not directly comparable with the other years.

### We will visualize the power consumption column using bar charts

##### Four types of aggregation (Sum, Max, Min, Mean), each for Weekly, Monthly, Quarterly and Yearly Analysis.
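
The four aggregations can also be computed in one call with `agg`, which gives a compact table to read next to the bar charts. A minimal sketch on made-up daily values (the frame `toy` is illustrative, not the notebook's data):

```python
import pandas as pd

# Two illustrative daily totals per year
idx = pd.to_datetime(['2007-01-01', '2007-06-01', '2008-01-01', '2008-06-01'])
toy = pd.DataFrame({'power_consumption': [10.0, 30.0, 20.0, 60.0]}, index=idx)

# Sum, max, min and mean per year in a single table
summary = toy.groupby(toy.index.year)['power_consumption'].agg(['sum', 'max', 'min', 'mean'])
print(summary)
```

The same pattern works with `toy.index.month` or `toy.index.quarter` as the grouping key.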

In [None]:
# Yearly - Total watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.year).sum().plot(kind="bar",xlabel='Year',ylabel='Readings in watt-hour',title="Yearly - Total watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Yearly - Maximum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.year).max().plot(kind="bar",xlabel='Year',ylabel='Readings in watt-hour',title="Yearly - Maximum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Yearly - Minimum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.year).min().plot(kind="bar",xlabel='Year',ylabel='Readings in watt-hour',title="Yearly - Minimum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Yearly - Average watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.year).mean().plot(kind="bar",xlabel='Year',ylabel='Readings in watt-hour',title="Yearly - Average watt-hour for Power Consumption", figsize=(16,6))

### Observation for Yearly Power Consumption (in watt-hour)

(2006 is excluded)

1. Total yearly power consumption ranges from 4000000 to ~5500000 watt-hour, i.e., 4000 to ~5500 kWh
2. The highest power consumption was in 2007
3. The lowest power consumption was in 2010
4. Average power consumption is almost the same across all years, ranging from ~12000 Wh to ~14000 Wh

In [None]:
# Checking Statistical summary of power consumption monthly

df_power_consumption.groupby(df_power_consumption.index.month).describe()

In [None]:
# Monthly - Total watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.month).sum().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Total watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Monthly - Average watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.month).mean().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Average watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Monthly - Minimum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.month).min().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Minimum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Monthly - Maximum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.month).max().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Maximum watt-hour for Power Consumption", figsize=(16,6))

### Observation for Monthly Power Consumption (in watt-hour)


1. Total power consumption ranges from 800000 to ~2300000 watt-hour, i.e., 800 to ~2300 kWh
2. The minimum power consumption across all years, around 2000 Wh, occurs in September
3. The maximum power consumption across all years, around 61000 Wh, occurs in December
4. Average power consumption across the 12 months ranges from ~7064 Wh to ~20000 Wh

We can clearly see that less power is consumed in June, July, August and September, whereas more power is consumed in December, January and February.

In the Monthly Average graph, power consumption starts decreasing after March and keeps falling until August or September, after which it gradually increases.

In [None]:
# Checking Statistical summary of power consumption quarterly

df_power_consumption.groupby(df_power_consumption.index.quarter).describe()

In [None]:
# Quarterly - Total watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.quarter).sum().plot(kind="bar",xlabel='Quarters',ylabel='Readings in watt-hour',title="Quarterly - Total watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Quarterly - Maximum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.quarter).max().plot(kind="bar",xlabel='Quarters',ylabel='Readings in watt-hour',title="Quarterly - Maximum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Quarterly - Minimum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.quarter).min().plot(kind="bar",xlabel='Quarters',ylabel='Readings in watt-hour',title="Quarterly - Minimum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Quarterly - Average watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.quarter).mean().plot(kind="bar",xlabel='Quarters',ylabel='Readings in watt-hour',title="Quarterly - Average watt-hour for Power Consumption", figsize=(16,6))

### Observation for Quarterly Power Consumption (in watt-hour)

1. The minimum power consumption across all years, around 2100 Wh, occurs in the third quarter
2. The maximum power consumption across all years, around 61000 Wh, occurs in the fourth quarter (the December maximum seen in the monthly analysis)
3. Average power consumption in the second and third quarters is lower than in the first and fourth quarters, which is consistent with the monthly observation above.

In [None]:
# Checking Statistical summary of power consumption weekly

df_power_consumption.groupby(df_power_consumption.index.isocalendar().week).describe()

In [None]:
# Weekly - Total watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.isocalendar().week).sum().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',title="Weekly - Total watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Weekly - Maximum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.isocalendar().week).max().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',title="Weekly - Maximum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Weekly - Minimum watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.isocalendar().week).min().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',title="Weekly - Minimum watt-hour for Power Consumption", figsize=(16,6))

In [None]:
# Weekly - Average watt-hour for Power Consumption

df_power_consumption.groupby(df_power_consumption.index.isocalendar().week).mean().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',title="Weekly - Average watt-hour for Power Consumption", figsize=(16,6))
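
The `DatetimeIndex.week` attribute was removed in pandas 2.0; the ISO week number is available through `isocalendar().week` instead. A minimal sketch of grouping by ISO week on made-up daily values (the frame `toy` is illustrative):

```python
import pandas as pd

# Monday and Sunday of ISO week 9 in 2007, plus the Monday of week 10
idx = pd.to_datetime(['2007-02-26', '2007-03-04', '2007-03-05'])
toy = pd.DataFrame({'power_consumption': [1.0, 2.0, 4.0]}, index=idx)

# ISO week number per row, cast to plain integers for a simple group key
week = toy.index.isocalendar().week.astype(int)
weekly = toy.groupby(week).sum()
print(weekly)
```

Note that grouping by week number alone pools the same week across all years, which is what the weekly bar charts in this notebook show.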

In [None]:
# analysis of week 9 for all 4 years
data = df_power_consumption[df_power_consumption.index.isocalendar().week == 9]
print(data.groupby(data.index.year).mean())

In [None]:
# analysis of week 8 for all 4 years
data_2 = df_power_consumption[df_power_consumption.index.isocalendar().week == 8]
print(data_2.groupby(data_2.index.year).mean())

In [None]:
data.index

In [None]:
# creating separate sub dataframes of week 9 for each year

week9_2007 = df_data[(df_data.index >= '2007-02-26') & (df_data.index <= '2007-03-04')]
week9_2008 = df_data[(df_data.index >= '2008-02-25') & (df_data.index <= '2008-03-02')]
week9_2009 = df_data[(df_data.index >= '2009-02-23') & (df_data.index <= '2009-03-01')]
week9_2010 = df_data[(df_data.index >= '2010-03-01') & (df_data.index <= '2010-03-07')]
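
The date ranges used above are the Monday-to-Sunday bounds of ISO week 9 in each year. A short sketch that derives them with the standard library (Python 3.8+ for `date.fromisocalendar`; the helper name `iso_week_bounds` is illustrative):

```python
from datetime import date, timedelta

def iso_week_bounds(year, week):
    """Return (Monday, Sunday) of the given ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)

for year in (2007, 2008, 2009, 2010):
    start, end = iso_week_bounds(year, 9)
    print(year, start, end)
```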

In [None]:
# Check entire data of week 9 of year 2007

week9_2007

In [None]:
# Check entire data of week 9 of year 2009

week9_2009

In [None]:
# Check entire data of week 9 of year 2008

week9_2008

In [None]:
# check statistical summary of week 9 for year 2007

week9_2007.describe()

In [None]:
# check statistical summary of week 9 for year 2009

week9_2009.describe()

In [None]:
week9_2007.sum()

In [None]:
week9_2007.mean()

In [None]:
# check entire data of week 9 for year 2010

week9_2010

## Observation for Week 9

1. From the statistical summaries above, the Global Active Power readings for week 9 of 2007 and 2008 are less than half of those for 2009 and 2010. Similarly, sub-metering 1 reads 0 for week 9 of 2007 and 2008, whereas most readings are available for 2009 and 2010. There is also a noticeable change in the sub-metering 3 readings for 2007 and 2008 compared with 2009 and 2010.

2. There could be a few explanations for these figures, since the graph shows that power consumption is low in week 9.

- First, the data collected for that week may be incorrect, or there may have been a campaign in the region to save as much electricity as possible for a week

- Second, an incident at the regional power station may have caused the supply to fluctuate many times a day

- Third, some members of the household may have been away for a couple of days

- Fourth, if the data comes from a cold region, a sudden change in weather or temperature for a week may have reduced energy consumption

### Observation for Weekly Power Consumption (in watt-hour)

1. Minimum power consumption occurs between the 28th and 35th weeks
2. Maximum power consumption occurs in roughly the first 12 weeks and in the 50th to 52nd weeks
3. Average power consumption is lower from the 20th week to about the 38th week than in the other weeks.

#### We completed Analysis for Power Consumption dataset

-------------------------------------------------------------------------------------------


## Let's start with Sub-metering analysis for Yearly, Quarterly, Monthly and Weekly

In [None]:
# Yearly - Maximum watt-hour for sub_meterings

df_sub_meterings.groupby(df_sub_meterings.index.year).max().plot(kind="bar",xlabel='Year',ylabel='Readings in watt-hour',title="Yearly - Maximum watt-hour for sub_meterings",figsize=(16,6))

In [None]:
# Yearly - Average watt-hour for sub_meterings

df_sub_meterings.groupby(df_sub_meterings.index.year).mean().plot(kind="bar",xlabel='Year',ylabel='Readings in watt-hour',title="Yearly - Average watt-hour for sub_meterings",figsize=(16,6))

### Observation for Yearly Sub-Meterings: 1, 2, 3 (in watt-hour)

(2006 is excluded)

1. Sub-metering 3 consumes the most energy in all years, followed by sub-metering 2 and then sub-metering 1
2. Energy consumption of sub-metering 3 (electric water heater and air conditioner) increases every year
3. Sub-metering 2 covers the laundry room (a washing machine, a tumble drier, a refrigerator and a light), and sub-metering 1 covers the kitchen (mainly a dishwasher, an oven and a microwave)

Possible reasons: the rise in sub-metering 3 might be due to warmer or more humid weather; a household with more toddlers, teenagers or young adults would use the appliances on sub-metering 2 more; and sub-metering 1 may be low because not all kitchen appliances run on electricity (the hot plates are gas powered).

In [None]:
# Monthly - Total watt-hour for sub_meterings

df_sub_meterings.groupby(df_sub_meterings.index.month).sum().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Total watt-hour for sub_meterings",figsize=(16,6))

In [None]:
# Monthly - Maximum watt-hour for sub_meterings

df_sub_meterings.groupby(df_sub_meterings.index.month).max().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Maximum watt-hour for sub_meterings",figsize=(16,6))

In [None]:
# Monthly - Average watt-hour for sub_meterings

df_sub_meterings.groupby(df_sub_meterings.index.month).mean().plot(kind="bar",xlabel='Month',ylabel='Readings in watt-hour',title="Monthly - Average watt-hour for sub_meterings",figsize=(16,6))

### Observation for Monthly Sub-Meterings: 1, 2, 3 (in watt-hour)

1. Energy consumption of sub-metering 3 is comparatively low in July and August
2. Sub-metering 1 and 2 consume almost the same amount of energy on average, less than sub-metering 3. Consumption increases from around September until January and February

In [None]:
# Quarterly - Total watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.quarter).sum().plot(kind="bar",xlabel='Quarter',ylabel='Readings in watt-hour',figsize=(16,6),title="Quarterly - Total watt-hour for sub-meterings")

## Sub-metering 3 consumes about 4000 kilowatt-hour in a quarter: 4.0 * 1e6 watt-hour = 4000000 * 0.001 = 4000 kilowatt-hour
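
The conversion behind this figure, as a tiny sketch. The helper name `wh_to_kwh` is illustrative, and 4.0e6 is the value read off the chart's 1e6 axis scale:

```python
def wh_to_kwh(watt_hours):
    """Convert watt-hours to kilowatt-hours (1 kWh = 1000 Wh)."""
    return watt_hours / 1000

print(wh_to_kwh(4.0e6))   # quarterly total of sub-metering 3 read off the chart
print(wh_to_kwh(25000))   # the 25000 Wh example used earlier
```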

In [None]:
# Quarterly - Maximum watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.quarter).max().plot(kind="bar",xlabel='Quarter',ylabel='Readings in watt-hour',figsize=(16,6),title="Quarterly - Maximum watt-hour for sub-meterings")

In [None]:
# Quarterly - Average watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.quarter).mean().plot(kind="bar",xlabel='Quarter',ylabel='Readings in watt-hour',figsize=(16,6),title="Quarterly - Average watt-hour for sub-meterings")

### Observation for Quarterly Sub-Meterings: 1, 2, 3 (in watt-hour)

1. Consumption of sub-metering 1 and 2 is almost the same in the second quarter, whereas sub-metering 3 consumes less energy in the 2nd and 3rd quarters than in the 1st and 4th quarters.
2. Average consumption of sub-metering 1, 2 and 3 is almost the same in the 1st and 4th quarters.

In [None]:
# Weekly - Total watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.isocalendar().week).sum().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',figsize=(16,6),title="Weekly - Total watt-hour for sub-meterings")

In [None]:
# Weekly - Maximum watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.isocalendar().week).max().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',figsize=(16,6),title="Weekly - Maximum watt-hour for sub-meterings")

In [None]:
# Weekly - Minimum watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.isocalendar().week).min().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',figsize=(16,6),title="Weekly - Minimum watt-hour for sub-meterings")

In [None]:
# Weekly - Average watt-hour for sub-meterings

df_sub_meterings.groupby(df_sub_meterings.index.isocalendar().week).mean().plot(kind="bar",xlabel='Week',ylabel='Readings in watt-hour',figsize=(16,6),title="Weekly - Average watt-hour for sub-meterings")

### Observation for Weekly Sub-Meterings: 1, 2, 3 (in watt-hour)

On average, energy consumption of all sub-meterings remains high in almost all weeks except the 28th to 33rd weeks


We completed Analysis for Sub-Meterings dataset

------------------------------------------------------------------------------


### Let's start with Global_active_power, Global_reactive_power and Global Intensity analysis for Yearly, Quarterly, Monthly and Weekly

(Global_active_power and Global_reactive_power measured in kilowatt whereas, Global_intensity measured in Ampere)

kilowatt = (ampere * volt) / 1000
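
A worked instance of this formula, with one caveat: current times voltage is strictly the apparent power (kVA), which equals the active power in kilowatts only when the power factor is 1. The helper name and the sample values below are illustrative:

```python
def apparent_power_kva(amperes, volts):
    """Current times voltage, scaled from volt-amperes to kilovolt-amperes."""
    return amperes * volts / 1000

# Example: 18.4 A at 234.84 V
example_kva = apparent_power_kva(18.4, 234.84)
print(round(example_kva, 3))
```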

In [None]:
# Yearly - Total Kilowatt-hour for Global Active_Reactive_Intensity

df_active_reactive.groupby(df_active_reactive.index.year).sum().plot(kind="bar",xlabel='Year',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Yearly - Total Kilowatt-hour for Global Active_Reactive_Intensity")

In [None]:
# Yearly - Maximum Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.year).max().plot(kind="bar",xlabel='Year',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Yearly - Maximum Kilowatt-hour for active_reactive")

In [None]:
# Yearly - Minimum Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.year).min().plot(kind="bar",xlabel='Year',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Yearly - Minimum Kilowatt-hour for active_reactive")

In [None]:
# Yearly - Average Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.year).mean().plot(kind="bar",xlabel='Year',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Yearly - Average Kilowatt-hour for active_reactive")

### Observation for Yearly Global Active Power, Global Reactive Power and Global Intensity (in kilowatt-hour)

(2006 is excluded)

1. From the Yearly - Total graph, the ratio of Global Active to Global Reactive Power is about 6:1, which means that for every 6 kWh of Global Active Power there is about 1 kWh of Global Reactive Power

2. From the Yearly - Maximum graph, the maximum real (active) power consumption, i.e. the total active power drawn by the household including the sub-metered appliances, occurred in 2007.

3. From the Yearly - Minimum graph, the minimum real (active) power consumption occurred in 2010.

However, more than a month of data is missing at the end of 2010. Had that data been available, the real power consumption for 2010 would probably have been close to that of 2008 or 2009.

4. From the Yearly - Average graph, the average real (active) power consumption is nearly the same across all years.
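
One point to keep in mind when reading these charts: the readings are per-minute values in kilowatts, so summing them gives kilowatt-minutes, and dividing by 60 gives kilowatt-hours. A minimal sketch, assuming one reading per minute; the helper name and the 1.2 kW figure are illustrative:

```python
def summed_kw_to_kwh(sum_of_minute_kw):
    """Convert a sum of per-minute kW readings (kilowatt-minutes) to kWh."""
    return sum_of_minute_kw / 60

# Example: a constant 1.2 kW load for a full day of 1440 minutes
daily_sum = 1.2 * 1440
daily_kwh = summed_kw_to_kwh(daily_sum)
print(round(daily_kwh, 2))
```

Global Intensity is measured in amperes, so this conversion applies only to the two power columns.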

In [None]:
# Month - Total Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.month).sum().plot(kind="bar",xlabel='Month',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Month - Total Kilowatt-hour for active_reactive")


In [None]:
# Month - Maximum Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.month).max().plot(kind="bar",xlabel='Month',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Month - Maximum Kilowatt-hour for active_reactive")


In [None]:
# Month - Minimum Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.month).min().plot(kind="bar",xlabel='Month',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Month - Minimum Kilowatt-hour for active_reactive")

In [None]:
# Month - Average Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.month).mean().plot(kind="bar",xlabel='Month',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Month - Average Kilowatt-hour for active_reactive")


### Observation for Monthly Global Active Power, Global Reactive Power and Global Intensity (in kilowatt-hour)

1. Maximum consumption of real energy is seen in the 12th month (i.e. December)
2. Minimum consumption of real energy is seen in the 9th month (i.e. September)
3. On average, less real power is consumed from June to September, after which consumption increases for the remaining months

In [None]:
# Quarterly - Total Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.quarter).sum().plot(kind="bar",xlabel='Quarter',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Quarterly - Total Kilowatt-hour for active_reactive")

In [None]:
# Quarterly - Maximum Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.quarter).max().plot(kind="bar",xlabel='Quarter',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Quarterly - Maximum Kilowatt-hour for active_reactive")

In [None]:
# Quarterly - Minimum Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.quarter).min().plot(kind="bar",xlabel='Quarter',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Quarterly - Minimum Kilowatt-hour for active_reactive")

In [None]:
# Quarterly - Average Kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.quarter).mean().plot(kind="bar",xlabel='Quarter',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Quarterly - Average Kilowatt-hour for active_reactive")

### Observation for Quarterly Global Active Power, Global Reactive Power and Global Intensity (in kilowatt-hour)

1. Global Active Power, Global Reactive Power and Global Intensity are almost the same in the first and fourth quarters, and all three are lower in the 2nd and 3rd quarters than in the 1st and 4th quarters.

From the above visualizations we can conclude that an increase or decrease in Global Reactive Power is accompanied by a drastic change in Global Intensity.

In [None]:
# Weekly - Total kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.isocalendar().week).sum().plot(kind="bar",xlabel='Week',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Weekly - Total kilowatt-hour for active_reactive")

In [None]:
# Weekly - Maximum kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.isocalendar().week).max().plot(kind="bar",xlabel='Week',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Weekly - Maximum kilowatt-hour for active_reactive")

In [None]:
# Weekly - Minimum kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.isocalendar().week).min().plot(kind="bar",xlabel='Week',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Weekly - Minimum kilowatt-hour for active_reactive")

In [None]:
# Weekly - Average kilowatt-hour for active_reactive

df_active_reactive.groupby(df_active_reactive.index.isocalendar().week).mean().plot(kind="bar",xlabel='Week',ylabel='Readings in kilowatt-hour',figsize=(16,6),title="Weekly - Average kilowatt-hour for active_reactive")

### Observation for Weekly Global Active Power, Global Reactive Power and Global Intensity (in kilowatt-hour)

On average, Global Active Power, Global Reactive Power and Global Intensity remain high in almost all weeks except the 28th to 33rd weeks

#### Summary
1. From all the plots above, we can see that the data is seasonal and appears to be non-stationary
2. Energy consumption is highest mostly in the 1st and 4th quarters; it starts falling in the 2nd quarter and keeps decreasing until the 3rd quarter.
3. Possible reasons: in winter or summer, power consumption rises because of the water heater or AC, or because of festival celebrations at home with relatives and family members. During the rainy season there may be power cuts due to heavy rainfall, less use of AC, or family outings.


In [None]:
# Pair plot to see the relationship between variables in dataset

sns.pairplot(data=df_sub_meterings,kind="scatter")

#### Summary
1. Sub-metering 1 and 2 are right-skewed: most values are clustered at the left of the distribution while the right tail is longer. Sub-metering 3 is approximately normally distributed
2. There is some positive relationship between all three sub-meterings
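
The direction of skew can also be checked numerically with `Series.skew()`: a positive value means a longer right tail, and a value near zero means a roughly symmetric distribution. A small sketch on made-up values (not the notebook's data):

```python
import pandas as pd

# Most values near zero with a few large ones: longer right tail
right_tailed = pd.Series([0, 0, 0, 0, 1, 2, 10], dtype=float)
# Evenly spread values: symmetric
symmetric = pd.Series([1, 2, 3, 4, 5], dtype=float)

print(right_tailed.skew(), symmetric.skew())
```

Applied to the notebook's data, `df_sub_meterings.skew()` would give one value per sub-metering column.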

In [None]:
sns.pairplot(data=df_active_reactive,kind="scatter")

#### Summary
1. Global Active Power, Global Reactive Power and Global Intensity are almost normally distributed
2. Global Active Power and Global Intensity have a positive linear relationship
3. Global Reactive Power does not have much of a relationship with Global Intensity or Global Active Power
4. The week 9 analysis also showed why energy consumption was lower in 2007 and 2008, which will help when evaluating the forecasts.
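
The relationships read off the pair plot can be quantified with a Pearson correlation matrix. A minimal sketch on made-up values chosen to mimic the pattern described above (a strong active/intensity link and a weak reactive link); the frame `toy` is illustrative only:

```python
import pandas as pd

toy = pd.DataFrame({
    'Global_active_power': [1.0, 2.0, 3.0, 4.0, 5.0],
    'Global_reactive_power': [0.2, 0.1, 0.3, 0.1, 0.2],
    'Global_intensity': [4.2, 8.6, 12.8, 17.0, 21.4],
})

# Pearson correlation between every pair of columns
corr = toy.corr()
print(corr.round(2))
```

On the notebook's data the equivalent call would be `df_active_reactive.corr()`.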


- Having connected the dots and found most of the underlying patterns and trends in the data, we are ready to move to the next step, model building. Before that, we need to check whether the data is stationary or non-stationary.

- If it is non-stationary, we convert it to stationary and check whether it has a cyclic or seasonal trend; then we can use the data in our three models: ARIMA, SARIMA and SARIMAX.

- After evaluating which model performs better on our data, we will finalize the model and conclude.
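
One common way to convert a non-stationary series to a stationary one is differencing. The sketch below is an illustration on a synthetic daily series with a linear trend and a 7-day pattern, not the notebook's data, and the 7-day period is an assumption made for the example:

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend plus a repeating 7-day pattern
idx = pd.date_range('2007-01-01', periods=28, freq='D')
trend = np.arange(28, dtype=float)
weekly_pattern = np.tile([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], 4)
series = pd.Series(trend + weekly_pattern, index=idx)

# First difference removes the trend but keeps the weekly pattern
first_diff = series.diff().dropna()
# Seasonal difference (lag 7) removes both the trend and the weekly pattern
seasonal_diff = series.diff(7).dropna()
print(seasonal_diff.unique())
```

A formal stationarity check, for example an augmented Dickey-Fuller test, would normally be run before and after differencing.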