<a href="https://colab.research.google.com/github/Manish927/EDA-Data-Science/blob/feat/analyze_household_peak_usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

You are working as an energy analyst in a smart city initiative. You are provided with hourly electricity consumption data for 10 households (observations) over a 24-hour period. Each household’s usage is recorded as 24 hourly readings (features from 'Hour_0' to 'Hour_23'), representing electricity consumed in kilowatt-hours (kWh) in those hours.

Implement a Python function analyze_household_peak_usage(household_id) to analyse the data for a specific household and identify:
• Which part of the day had the highest average energy consumption
• Which specific hour(s) in that segment had consumption above the segment's average

The 24-hour day is segmented as follows:
• Hours 0 to 5 (columns 'Hour_0' to 'Hour_5', 12 AM to 5:59 AM) -> Late Night/Early Morning
• Hours 6 to 11 (columns 'Hour_6' to 'Hour_11', 6 AM to 11:59 AM) -> Morning
• Hours 12 to 17 (columns 'Hour_12' to 'Hour_17', 12 PM to 5:59 PM) -> Afternoon
• Hours 18 to 23 (columns 'Hour_18' to 'Hour_23', 6 PM to 11:59 PM) -> Evening/Night
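The segmentation and the two requested quantities can be stated compactly. The notation here is introduced for illustration only: $x_h$ is a household's reading in column 'Hour_h', $S_k$ is segment $k$ (0 = Late Night/Early Morning, 1 = Morning, 2 = Afternoon, 3 = Evening/Night), so hour $h$ falls in segment $\lfloor h/6 \rfloor$. "Above the average" is read as strictly greater, which matches the `>` used in the solution code; the task does not say how to break a tie between equal segment averages.

```latex
\begin{aligned}
S_k &= \{6k,\ 6k+1,\ \dots,\ 6k+5\}, \quad k = 0, 1, 2, 3 \\
\bar{x}_k &= \frac{1}{6} \sum_{h \in S_k} x_h \\
k^* &= \arg\max_{k} \bar{x}_k \\
H &= \{\, h \in S_{k^*} : x_h > \bar{x}_{k^*} \,\}
\end{aligned}
```

'Peak Segment' is the name of $S_{k^*}$, and 'High Usage Hours' is $H$ listed in increasing order.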

Dataset Description
• 'Household_ID': Unique identifier for a household (str)
• 'Hour_0', 'Hour_1', ..., 'Hour_23': Hourly electricity consumption values in kWh (int)

Input Format
• The household ID (str)

Output Format
• A dictionary with the following two keys:
  • 'Peak Segment': one of 'Late Night/Early Morning', 'Morning', 'Afternoon', 'Evening/Night'
  • 'High Usage Hours': a list of hour numbers (int) parsed from the hour column names; for instance, 'Hour_13' becomes 13
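Turning a column name into its hour number is a one-line parse. A minimal sketch, assuming every hour column follows the 'Hour_<n>' pattern guaranteed by the constraints; `hour_from_column` is a hypothetical helper name, not part of the task:

```python
def hour_from_column(column_name: str) -> int:
    """Parse the hour number from a column name such as 'Hour_13'."""
    # partition() splits at the first underscore: ('Hour', '_', '13')
    prefix, _, number = column_name.partition('_')
    if prefix != 'Hour' or not number.isdigit():
        raise ValueError(f"not an hour column: {column_name!r}")
    return int(number)


# Example: map a few column names to their hour numbers
columns = ['Hour_0', 'Hour_13', 'Hour_23']
hours = [hour_from_column(c) for c in columns]
print(hours)  # [0, 13, 23]
```

Raising on anything else (for example 'Household_ID') keeps the ID column from being mistaken for an hour.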

Constraints
• The input household ID is one of 'Household_1' to 'Household_10' (inclusive)
• The data contains no corrupt or missing values and no duplicate observations
• All values other than the household ID are numeric
• Household IDs are strings
• Dataset columns are consistently named from 'Hour_0' to 'Hour_23'
• No reordering or rearrangement of rows or columns is done
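The constraints can be checked mechanically before relying on them. The sketch below is an illustration, not part of the original task: it embeds the DATA block shown further down so it runs without the CloudFront URL, and `CSV_TEXT`, `HOUR_COLUMNS`, `EXPECTED_IDS` and `constraint_violations` are hypothetical names introduced here.

```python
import io

import pandas as pd

# The DATA block from this notebook, embedded so the check runs offline
CSV_TEXT = """Household_ID,Hour_0,Hour_1,Hour_2,Hour_3,Hour_4,Hour_5,Hour_6,Hour_7,Hour_8,Hour_9,Hour_10,Hour_11,Hour_12,Hour_13,Hour_14,Hour_15,Hour_16,Hour_17,Hour_18,Hour_19,Hour_20,Hour_21,Hour_22,Hour_23
Household_1,7,4,8,5,7,3,7,8,5,4,8,8,3,6,5,2,8,6,2,5,1,6,9,1
Household_2,3,7,4,9,3,5,3,7,5,9,7,2,4,9,2,9,5,2,4,7,8,3,1,4
Household_3,2,8,4,2,6,6,4,6,2,2,4,8,7,9,8,5,2,5,8,9,9,1,9,7
Household_4,9,8,1,8,8,3,1,8,3,3,1,5,7,9,7,9,8,2,1,7,7,8,5,3
Household_5,8,6,3,1,3,5,3,1,5,7,7,9,3,7,1,4,4,5,7,7,4,7,3,6
Household_6,2,9,5,6,4,7,9,7,1,1,9,9,4,9,3,7,6,8,9,5,1,3,8,6
Household_7,8,9,4,1,1,4,7,2,3,1,5,1,8,1,1,2,2,6,7,5,1,1,3,2
Household_8,5,6,7,4,7,8,1,6,8,5,4,2,6,6,1,9,6,3,4,4,3,3,3,4
Household_9,7,4,9,1,8,7,2,8,1,9,9,2,7,3,7,9,4,1,2,1,5,5,7,9
Household_10,9,3,3,3,4,8,6,8,1,8,4,1,8,4,6,8,4,3,9,3,9,2,2,2
"""

HOUR_COLUMNS = [f'Hour_{h}' for h in range(24)]
EXPECTED_IDS = [f'Household_{i}' for i in range(1, 11)]


def constraint_violations(frame: pd.DataFrame) -> list[str]:
    """Return a description of every stated constraint that `frame` breaks."""
    problems = []
    # Columns must be the ID followed by Hour_0 .. Hour_23, in that order
    if list(frame.columns) != ['Household_ID'] + HOUR_COLUMNS:
        problems.append('unexpected column names or order')
        return problems  # the remaining checks assume the hour columns exist
    if frame.isna().any().any():
        problems.append('missing values')
    if frame.duplicated().any():
        problems.append('duplicate observations')
    # IDs must be exactly Household_1 .. Household_10, in the original row order
    if frame['Household_ID'].tolist() != EXPECTED_IDS:
        problems.append('unexpected household IDs')
    if not all(pd.api.types.is_numeric_dtype(frame[c]) for c in HOUR_COLUMNS):
        problems.append('non-numeric hour readings')
    return problems


df = pd.read_csv(io.StringIO(CSV_TEXT))
print(constraint_violations(df))  # []
```

Returning a list of problems rather than asserting keeps the check usable on other files, where several violations may need reporting at once.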

Testcases
Testcase 1
Input
Household_3

Expected Output
{'Peak Segment': 'Evening/Night', 'High Usage Hours': [18, 19, 20, 22]}

Testcase 2
Input
Household_9

Expected Output
{'Peak Segment': 'Late Night/Early Morning', 'High Usage Hours': [0, 2, 4, 5]}
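The expected outputs can be reproduced by hand. For Household_3 the segment averages are 28/6 ≈ 4.67, 26/6 ≈ 4.33, 36/6 = 6.00 and 43/6 ≈ 7.17, so Evening/Night is the peak; hours 18, 19, 20 and 22 (readings 8, 9, 9, 9) exceed 7.17, while hour 23 (reading 7) does not. For Household_9 the averages are 6.00, 31/6, 31/6 and 29/6, so Late Night/Early Morning wins and hours 0, 2, 4 and 5 (readings 7, 9, 8, 7) exceed 6. A plain-Python sketch of the same arithmetic, using the fact that hour h belongs to segment h // 6; `READINGS` and `peak_usage` are hypothetical names, and the readings are copied from the DATA block:

```python
# Hourly readings for the two test households, copied from the DATA block
READINGS = {
    'Household_3': [2, 8, 4, 2, 6, 6, 4, 6, 2, 2, 4, 8, 7, 9, 8, 5, 2, 5, 8, 9, 9, 1, 9, 7],
    'Household_9': [7, 4, 9, 1, 8, 7, 2, 8, 1, 9, 9, 2, 7, 3, 7, 9, 4, 1, 2, 1, 5, 5, 7, 9],
}

SEGMENT_NAMES = ['Late Night/Early Morning', 'Morning', 'Afternoon', 'Evening/Night']


def peak_usage(readings):
    """Return the peak segment and its above-average hours for 24 readings."""
    # Hour h belongs to segment h // 6, so segment k covers hours 6k .. 6k+5
    sums = [0] * 4
    for h, value in enumerate(readings):
        sums[h // 6] += value
    means = [s / 6 for s in sums]
    # list.index() finds the first maximum, i.e. the earliest segment on a tie
    peak = means.index(max(means))
    high = [h for h in range(6 * peak, 6 * peak + 6) if readings[h] > means[peak]]
    return {'Peak Segment': SEGMENT_NAMES[peak], 'High Usage Hours': high}


for household_id, readings in READINGS.items():
    print(household_id, peak_usage(readings))
```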



DATA:
Household_ID,Hour_0,Hour_1,Hour_2,Hour_3,Hour_4,Hour_5,Hour_6,Hour_7,Hour_8,Hour_9,Hour_10,Hour_11,Hour_12,Hour_13,Hour_14,Hour_15,Hour_16,Hour_17,Hour_18,Hour_19,Hour_20,Hour_21,Hour_22,Hour_23
Household_1,7,4,8,5,7,3,7,8,5,4,8,8,3,6,5,2,8,6,2,5,1,6,9,1
Household_2,3,7,4,9,3,5,3,7,5,9,7,2,4,9,2,9,5,2,4,7,8,3,1,4
Household_3,2,8,4,2,6,6,4,6,2,2,4,8,7,9,8,5,2,5,8,9,9,1,9,7
Household_4,9,8,1,8,8,3,1,8,3,3,1,5,7,9,7,9,8,2,1,7,7,8,5,3
Household_5,8,6,3,1,3,5,3,1,5,7,7,9,3,7,1,4,4,5,7,7,4,7,3,6
Household_6,2,9,5,6,4,7,9,7,1,1,9,9,4,9,3,7,6,8,9,5,1,3,8,6
Household_7,8,9,4,1,1,4,7,2,3,1,5,1,8,1,1,2,2,6,7,5,1,1,3,2
Household_8,5,6,7,4,7,8,1,6,8,5,4,2,6,6,1,9,6,3,4,4,3,3,3,4
Household_9,7,4,9,1,8,7,2,8,1,9,9,2,7,3,7,9,4,1,2,1,5,5,7,9
Household_10,9,3,3,3,4,8,6,8,1,8,4,1,8,4,6,8,4,3,9,3,9,2,2,2


# Loading the data (do not edit)
import pandas as pd
filename = 'https://d3ejq4mxgimsmf.cloudfront.net/hourly_energy_usage-9eb4bd3748964f8da8d95b54df732b1d.csv'
df = pd.read_csv(filename)

def analyze_household_peak_usage(household_id):
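    """Return the peak segment and its above-average hours for one household."""
    # Select the row for the given household (IDs are unique, so take the first match)
    row = df[df['Household_ID'] == household_id].iloc[0]

    # Define the four 6-hour segments by hour number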
    segments = {
        'Late Night/Early Morning': list(range(0, 6)),
        'Morning': list(range(6, 12)),
        'Afternoon': list(range(12, 18)),
        'Evening/Night': list(range(18, 24))
    }

    # Calculate the mean usage for each segment
    segment_means = {}
    for segment, hours in segments.items():
        values = [row[f'Hour_{h}'] for h in hours]
        segment_means[segment] = sum(values) / len(values)

    # Identify the peak segment (on a tie, max() keeps the earliest segment)
    peak_segment = max(segment_means, key=segment_means.get)
    peak_avg = segment_means[peak_segment]

    # Keep the hours in the peak segment whose usage is strictly above its average
    high_usage_hours = [
        h for h in segments[peak_segment]
        if row[f'Hour_{h}'] > peak_avg
    ]

    return {'Peak Segment': peak_segment,
            'High Usage Hours': high_usage_hours}


# Input and output processing (do not edit)
print(analyze_household_peak_usage(input()))

Household_3
{'Peak Segment': 'Evening/Night', 'High Usage Hours': [18, 19, 20, 22]}
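Because every household is scored the same way, the whole table can also be processed at once. This is an alternative sketch, not the original solution: it assumes NumPy is available, embeds the DATA block so it runs without the CloudFront URL, and introduces the hypothetical names `CSV_TEXT`, `summarize` and `results`. Reshaping the 24 hour columns into a (households, 4, 6) array is valid only because the columns run in order from Hour_0 to Hour_23, as the constraints guarantee.

```python
import io

import numpy as np
import pandas as pd

# The DATA block from this notebook, embedded so the sketch runs offline
CSV_TEXT = """Household_ID,Hour_0,Hour_1,Hour_2,Hour_3,Hour_4,Hour_5,Hour_6,Hour_7,Hour_8,Hour_9,Hour_10,Hour_11,Hour_12,Hour_13,Hour_14,Hour_15,Hour_16,Hour_17,Hour_18,Hour_19,Hour_20,Hour_21,Hour_22,Hour_23
Household_1,7,4,8,5,7,3,7,8,5,4,8,8,3,6,5,2,8,6,2,5,1,6,9,1
Household_2,3,7,4,9,3,5,3,7,5,9,7,2,4,9,2,9,5,2,4,7,8,3,1,4
Household_3,2,8,4,2,6,6,4,6,2,2,4,8,7,9,8,5,2,5,8,9,9,1,9,7
Household_4,9,8,1,8,8,3,1,8,3,3,1,5,7,9,7,9,8,2,1,7,7,8,5,3
Household_5,8,6,3,1,3,5,3,1,5,7,7,9,3,7,1,4,4,5,7,7,4,7,3,6
Household_6,2,9,5,6,4,7,9,7,1,1,9,9,4,9,3,7,6,8,9,5,1,3,8,6
Household_7,8,9,4,1,1,4,7,2,3,1,5,1,8,1,1,2,2,6,7,5,1,1,3,2
Household_8,5,6,7,4,7,8,1,6,8,5,4,2,6,6,1,9,6,3,4,4,3,3,3,4
Household_9,7,4,9,1,8,7,2,8,1,9,9,2,7,3,7,9,4,1,2,1,5,5,7,9
Household_10,9,3,3,3,4,8,6,8,1,8,4,1,8,4,6,8,4,3,9,3,9,2,2,2
"""

SEGMENT_NAMES = ['Late Night/Early Morning', 'Morning', 'Afternoon', 'Evening/Night']

df = pd.read_csv(io.StringIO(CSV_TEXT)).set_index('Household_ID')

# Shape (households, 4 segments, 6 hours): hour h lands at [:, h // 6, h % 6]
usage = df[[f'Hour_{h}' for h in range(24)]].to_numpy().reshape(len(df), 4, 6)
segment_means = usage.mean(axis=2)   # (households, 4)
peak = segment_means.argmax(axis=1)  # argmax returns the first maximum on a tie


def summarize(i):
    """Build the output dictionary for the household in row i."""
    k = peak[i]
    # Offsets within the segment whose reading beats the segment mean, shifted to hours
    above = np.flatnonzero(usage[i, k] > segment_means[i, k]) + 6 * k
    return {'Peak Segment': SEGMENT_NAMES[k], 'High Usage Hours': above.tolist()}


results = {household_id: summarize(i) for i, household_id in enumerate(df.index)}
print(results['Household_3'])  # {'Peak Segment': 'Evening/Night', 'High Usage Hours': [18, 19, 20, 22]}
```

The row-by-row function is easier to read for a single lookup; the array version pays off when all ten households (or many more) are needed at once.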
