# Individual Household Electric Power Consumption
The dataset contains measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. 

Different electrical quantities and some sub-metering values are available.

### Variable Information

1. date: date in format dd/mm/yyyy

2. time: time in format hh:mm:ss

3. global_active_power: household global minute-averaged active power (in kilowatt)

4. global_reactive_power: household global minute-averaged reactive power (in kilowatt)

5. voltage: minute-averaged voltage (in volt)

6. global_intensity: household global minute-averaged current intensity (in ampere)

7. sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).

8. sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.

9. sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
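
The power columns are in kilowatts while the sub-meterings are in watt-hours, so comparing them needs a unit conversion. The sketch below is illustrative only: it assumes each sub-metering value is the energy recorded during that one-minute sample, and `unmetered_wh` is a hypothetical helper name, not part of the dataset or of pandas. The sample values are the 16/12/2006 17:24:00 reading.

```python
def unmetered_wh(global_active_power_kw, sub_1_wh, sub_2_wh, sub_3_wh):
    """Active energy (Wh) used in one minute by equipment outside the three sub-meters."""
    # kW averaged over one minute -> Wh consumed in that minute:
    # kW * 1000 W/kW * (1/60) h
    total_wh = global_active_power_kw * 1000 / 60
    return total_wh - sub_1_wh - sub_2_wh - sub_3_wh


# Values from the 16/12/2006 17:24:00 reading
print(round(unmetered_wh(4.216, 0.0, 1.0, 17.0), 2))  # 52.27
```

Under these assumptions, roughly 52 Wh of the 70 Wh consumed in that minute came from equipment not covered by any sub-meter.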

# Import the necessary libraries
We will import the libraries needed for this task: pandas, NumPy, and Matplotlib.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


## Data Cleaning using Python
In this step, we will:
   - Load the CSV file
   - Explore the dataset
   - Perform data cleaning
   - Display the first 30 rows of the cleaned dataset

#### Data loading

The data is stored in a semicolon-delimited .txt file, which we will load and convert to CSV using pandas.

In [3]:
df = pd.read_csv('household_power_consumption.txt', delimiter=';')

# Save the csv file
df.to_csv('household_power_consumption.csv', index=False)
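
The `object` dtypes reported further down suggest that some measurement fields contain non-numeric text. As a hedged alternative, the sketch below assumes missing readings are marked with `?` (an assumption about the raw file, not something this notebook states) and lets `read_csv` turn them into NaN while loading. The inline `raw` sample is hypothetical and stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same semicolon-delimited layout; the "?" marker
# for missing readings is an assumption about the raw file.
raw = io.StringIO(
    "Date;Time;Global_active_power;Voltage;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;234.840;17.000\n"
    "16/12/2006;17:25:00;?;?;\n"
)

sample = pd.read_csv(
    raw,
    sep=";",
    na_values=["?"],   # treat "?" as NaN instead of text
    low_memory=False,  # infer each column's dtype from the whole file at once
)

print(sample.dtypes)
```

With the placeholder mapped to NaN, the measurement columns are parsed as floats instead of text, so no separate conversion step is needed later.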



### Data Exploration
We explore the dataset by viewing its first rows, listing its columns, summarizing it, and checking for missing values and data types.

In [4]:
df.head()

| | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/12/2006 | 17:24:00 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 |
| 1 | 16/12/2006 | 17:25:00 | 5.36 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 |
| 2 | 16/12/2006 | 17:26:00 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 |
| 3 | 16/12/2006 | 17:27:00 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 |
| 4 | 16/12/2006 | 17:28:00 | 3.666 | 0.528 | 235.68 | 15.8 | 0.0 | 1.0 | 17.0 |


Let's see which columns this dataset has.

In [5]:
df.columns

Index(['Date', 'Time', 'Global_active_power', 'Global_reactive_power',
       'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2',
       'Sub_metering_3'],
      dtype='object')

##### .describe()
`describe()` generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values.

In [6]:
df.describe()
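
By default, `describe()` summarizes only numeric columns when a frame mixes numeric and text data, so columns read as `object` are left out. The sketch below shows the difference on a hypothetical three-row sample that mimics the mix of dtypes in this dataset. The `?` entry is an assumed placeholder for a missing reading.

```python
import pandas as pd

# Hypothetical sample: one column read as text (object), one read as float.
sample = pd.DataFrame({
    "Global_active_power": ["4.216", "5.360", "?"],
    "Sub_metering_3": [17.0, 16.0, None],
})

# Default: numeric columns only, NaN values excluded from the statistics
numeric_summary = sample.describe()

# include="all": also reports count/unique/top/freq for text columns
full_summary = sample.describe(include="all")

print(numeric_summary)
print(full_summary)
```

Once the text columns are converted to numbers, the default call covers them as well.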

##### Check for null values

In [7]:
df.isnull()

| | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2075254 | False | False | False | False | False | False | False | False | False |
| 2075255 | False | False | False | False | False | False | False | False | False |
| 2075256 | False | False | False | False | False | False | False | False | False |
| 2075257 | False | False | False | False | False | False | False | False | False |
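
An element-wise True/False frame is hard to read at two million rows, and only its first and last rows are displayed. A per-column count is usually more informative. The sketch below uses a hypothetical sample and also counts a `?` placeholder, which is an assumption about how the raw file marks missing readings. `isnull()` does not flag such text markers.

```python
import pandas as pd

# Hypothetical sample: "?" stands in for a missing reading stored as text.
sample = pd.DataFrame({
    "Voltage": ["234.84", "?", "233.29"],
    "Sub_metering_3": [17.0, None, 17.0],
})

# True NaN/None values per column
missing_per_column = sample.isnull().sum()

# Text placeholders per column (assumed "?" marker); isnull() does not see these
placeholder_per_column = (sample == "?").sum()

print(missing_per_column)
print(placeholder_per_column)
```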


In [8]:
df.dtypes

Date                      object
Time                      object
Global_active_power       object
Global_reactive_power     object
Voltage                   object
Global_intensity          object
Sub_metering_1            object
Sub_metering_2            object
Sub_metering_3           float64
dtype: object
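
Every column except `Sub_metering_3` was read as `object`, so the measurements need converting before any numeric analysis. The following is a minimal sketch of one way to do that, not necessarily the cleaning applied later. The sample frame and the `Datetime` column name are hypothetical, and the `?` entry is an assumed placeholder for a missing reading.

```python
import pandas as pd

# Hypothetical two-row sample with the same kinds of columns as the dataset.
sample = pd.DataFrame({
    "Date": ["16/12/2006", "16/12/2006"],
    "Time": ["17:24:00", "17:25:00"],
    "Global_active_power": ["4.216", "?"],
    "Sub_metering_3": [17.0, None],
})

# Combine date and time text and parse it day-first (dd/mm/yyyy hh:mm:ss)
sample["Datetime"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%d/%m/%Y %H:%M:%S"
)

# Convert text to numbers; entries that cannot be parsed become NaN
sample["Global_active_power"] = pd.to_numeric(
    sample["Global_active_power"], errors="coerce"
)

print(sample.dtypes)
```

Passing an explicit `format` avoids ambiguity between day-first and month-first dates.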