<a href="https://colab.research.google.com/github/dineshdinz12/prasunethon_extra/blob/main/ElectricityPridict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Explanation:**

**Random Data Generation:**

Using NumPy's `np.random.randint()` function, random integer values are generated for each type of electricity consumption (domestic, commercial, industrial, public, and others). The upper bound of each range is exclusive.

**Total Consumption Calculation:**

The total consumption is computed by summing all five consumption types for each row.

**DataFrame Creation:**

All generated data is stored in a pandas DataFrame.

**Output:**

The first few rows of the DataFrame are printed to verify the dataset's structure, and the dataset is then saved as a CSV file named `electricity_consumption_chennai.csv` in the current working directory.

In [None]:
import pandas as pd
import numpy as np

# Define the number of samples (rows) for the dataset
num_samples = 100

# Generate random data for electricity consumption
np.random.seed(0)  # For reproducibility

data = {
    'City Name': ['Chennai'] * num_samples,
    'Consumption_Domestic': np.random.randint(800, 1500, num_samples),
    'Consumption_Commercial': np.random.randint(600, 1000, num_samples),
    'Consumption_Industry': np.random.randint(400, 600, num_samples),
    'Consumption_Public': np.random.randint(200, 400, num_samples),
    'Consumption_Others': np.random.randint(100, 200, num_samples)
}

# Calculate total consumption as the sum of all types
data['Total_Consumption'] = (data['Consumption_Domestic'] +
                             data['Consumption_Commercial'] +
                             data['Consumption_Industry'] +
                             data['Consumption_Public'] +
                             data['Consumption_Others'])

# Create a pandas DataFrame
df = pd.DataFrame(data)

# Display the first few rows of the dataframe
print(df.head())

# Save the dataset to a CSV file
df.to_csv('electricity_consumption_chennai.csv', index=False)
print("Dataset saved as electricity_consumption_chennai.csv")


  City Name  Consumption_Domestic  Consumption_Commercial  \
0   Chennai                  1484                     787   
1   Chennai                  1359                     730   
2   Chennai                  1429                     977   
3   Chennai                   992                     698   
4   Chennai                  1159                     662   

   Consumption_Industry  Consumption_Public  Consumption_Others  \
0                   467                 292                 138   
1                   403                 243                 139   
2                   435                 283                 108   
3                   589                 377                 113   
4                   597                 241                 107   

   Total_Consumption  
0               3168  
1               2874  
2               3232  
3               2769  
4               2766  
Dataset saved as electricity_consumption_chennai.csv
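
Before moving on, it can help to confirm that the generated columns respect their ranges and that the total really is the row-wise sum. The sketch below regenerates the data in memory with the same seed; the helper `validate_consumption` and the `bounds` dictionary are illustrative names that are not part of the notebook.

```python
import numpy as np
import pandas as pd

# (low, high) bounds as passed to np.random.randint; high is exclusive
bounds = {
    'Consumption_Domestic': (800, 1500),
    'Consumption_Commercial': (600, 1000),
    'Consumption_Industry': (400, 600),
    'Consumption_Public': (200, 400),
    'Consumption_Others': (100, 200),
}

def validate_consumption(df, bounds):
    """Return True if each component is within its bounds and the total is the row-wise sum."""
    in_range = all(
        df[name].between(low, high - 1).all() for name, (low, high) in bounds.items()
    )
    total_ok = (df['Total_Consumption'] == df[list(bounds)].sum(axis=1)).all()
    return bool(in_range and total_ok)

np.random.seed(0)  # same seed as the generation cell
check_df = pd.DataFrame(
    {name: np.random.randint(low, high, 100) for name, (low, high) in bounds.items()}
)

# Build the total by explicit addition, mirroring the generation cell
check_df['Total_Consumption'] = (check_df['Consumption_Domestic'] +
                                 check_df['Consumption_Commercial'] +
                                 check_df['Consumption_Industry'] +
                                 check_df['Consumption_Public'] +
                                 check_df['Consumption_Others'])

print("Dataset is consistent:", validate_consumption(check_df, bounds))
```

The same helper could be pointed at the DataFrame loaded from the CSV file, which would also catch accidental edits to the saved data.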


**Step 1:** Preprocess the Data

First, load the dataset, separate the features from the target, and standardize the features:

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv('electricity_consumption_chennai.csv')

# Separate features and target variable
X = df[['Consumption_Domestic', 'Consumption_Commercial', 'Consumption_Industry',
        'Consumption_Public', 'Consumption_Others']]
y = df['Total_Consumption']

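# Standardize the features (zero mean, unit variance).
# Note: the scaler is fitted on the full dataset before the split in Step 2,
# so test-set statistics influence the scaling; for a real evaluation,
# fit the scaler on the training rows only.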
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Print the first few rows of X and y to verify
print("Features (X):\n", X.head())
print("\nTarget (y):\n", y.head())


Features (X):
    Consumption_Domestic  Consumption_Commercial  Consumption_Industry  \
0                  1484                     787                   467   
1                  1359                     730                   403   
2                  1429                     977                   435   
3                   992                     698                   589   
4                  1159                     662                   597   

   Consumption_Public  Consumption_Others  
0                 292                 138  
1                 243                 139  
2                 283                 108  
3                 377                 113  
4                 241                 107  

Target (y):
 0    3168
1    2874
2    3232
3    2769
4    2766
Name: Total_Consumption, dtype: int64


**Step 2:** Split Data into Training and Testing Sets

Split the dataset into training and testing sets (80% training, 20% testing):

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])


Training set size: 80
Testing set size: 20
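
An alternative ordering is to split the raw features first and fit the scaler on the training rows only, so that no test-set statistics leak into preprocessing. The following is a minimal sketch on stand-in random data of the same shape as the notebook's feature matrix; `X_demo`, `y_demo`, and `demo_scaler` are hypothetical names used only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: 100 rows, 5 feature columns, target is the row-wise sum
rng = np.random.RandomState(0)
X_demo = rng.randint(100, 1500, size=(100, 5)).astype(float)
y_demo = X_demo.sum(axis=1)

# Split the unscaled features first ...
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# ... then learn the mean and standard deviation from the training rows only
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)  # reuses the training statistics

print("Scaled training shape:", X_tr_scaled.shape)
print("Scaled testing shape:", X_te_scaled.shape)
```

With this ordering the test rows are transformed using parameters they did not contribute to, which is closer to how the model would be applied to unseen data.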


**Step 3:** Train a Regression Model

Train a linear regression model on the training data and evaluate it on the test set:

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("R-squared (R2) Score:", r2)


Mean Squared Error (MSE): 6.203854594147707e-26
R-squared (R2) Score: 1.0
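
To check what the model has learned, the fitted coefficients can be inspected. The sketch below is an illustration on stand-in random data, not the notebook's own cell: it fits `LinearRegression` on unscaled features whose target is their row-wise sum, in which case each coefficient should come out close to 1 and the intercept close to 0. The names `X_raw`, `y_sum`, and `lin` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in data: five unscaled feature columns and a target equal to their sum
rng = np.random.RandomState(0)
X_raw = rng.randint(100, 1500, size=(100, 5)).astype(float)
y_sum = X_raw.sum(axis=1)

lin = LinearRegression().fit(X_raw, y_sum)

# Each coefficient is the weight the model assigns to one feature column
print("Coefficients:", lin.coef_)
print("Intercept:", lin.intercept_)
```

When the features are standardized first, as in Step 1, the same relationship still holds, but each coefficient equals the standard deviation of its column and the intercept equals the mean of the target.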


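**Step 4:** Output Analysis

Compare the predicted and actual values for the first ten test rows. An exact match is expected here rather than a sign of a strong model, since `Total_Consumption` was constructed as the sum of the five features: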

In [None]:
# Print some predictions and actual values for comparison
comparison_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print("\nComparison of Actual vs Predicted Values:\n", comparison_df.head(10))



Comparison of Actual vs Predicted Values:
     Actual  Predicted
83    2678     2678.0
53    3142     3142.0
70    3492     3492.0
45    3233     3233.0
44    2719     2719.0
39    2714     2714.0
22    3169     3169.0
80    2375     2375.0
10    3218     3218.0
0     3168     3168.0
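
Because the printed predictions are rounded for display, an explicit error column can make any small residuals visible. A minimal sketch, using three rows copied from the table above as stand-in values (`comparison` and `Abs_Error` are illustrative names):

```python
import numpy as np
import pandas as pd

# First three rows of the comparison table, keyed by their original row index
actual = pd.Series([2678, 3142, 3492], index=[83, 53, 70])
predicted = np.array([2678.0, 3142.0, 3492.0])

comparison = pd.DataFrame({'Actual': actual, 'Predicted': predicted})

# Absolute difference between each actual value and its prediction
comparison['Abs_Error'] = (comparison['Actual'] - comparison['Predicted']).abs()
max_abs_error = comparison['Abs_Error'].max()

print(comparison)
print("Largest absolute error:", max_abs_error)
```

Applied to the full `y_test` and `y_pred` from Step 3, the same column would show residuals on the order of floating-point rounding error, consistent with the reported MSE.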


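**Step 5:** Cross-Validate the Model

Run 5-fold cross-validation on the full dataset. For a regressor, `cross_val_score` reports the R-squared score of each fold by default:

In [None]: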
from sklearn.model_selection import cross_val_score

# Perform cross-validation
cv_scores = cross_val_score(model, X_scaled, y, cv=5)

print("Cross-Validation Scores:", cv_scores)
print("Mean Cross-Validation Score:", np.mean(cv_scores))


Cross-Validation Scores: [1. 1. 1. 1. 1.]
Mean Cross-Validation Score: 1.0
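
One possible refinement is to wrap the scaler and the model in a scikit-learn pipeline, so that the scaler is refitted on the training portion of every fold. The sketch below again uses stand-in random data with a sum target; `pipe`, `folds`, and `pipe_scores` are hypothetical names, and the shuffled `KFold` is a choice made for the example rather than something the notebook specifies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: five unscaled feature columns and a target equal to their sum
rng = np.random.RandomState(0)
X_raw = rng.randint(100, 1500, size=(100, 5)).astype(float)
y_sum = X_raw.sum(axis=1)

# Scaling and regression are fitted together inside each training fold
pipe = make_pipeline(StandardScaler(), LinearRegression())

# Shuffled 5-fold split with a fixed seed for reproducibility
folds = KFold(n_splits=5, shuffle=True, random_state=42)
pipe_scores = cross_val_score(pipe, X_raw, y_sum, cv=folds, scoring='r2')

print("R-squared per fold:", pipe_scores)
print("Mean R-squared:", np.mean(pipe_scores))
```

On data where the target is an exact sum of the features the scores stay at 1.0 either way; the pipeline matters more once the target contains noise or the preprocessing depends strongly on the data.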
