### **Step 1: Define the Environment**

The environment models the solar power system: the solar panel, the battery, and their interactions, with **time of day** included in the state.

1. **State (What the agent knows at any moment):**
   The state includes all the information the agent uses to make decisions:

   - **Current solar power output** (e.g., in kW).
   - **Current battery charge level** (e.g., percentage or kWh).
   - **Current time of the day** (e.g., hour, or normalized value between 0–1).
   - **Optional:** Current power demand/output (if the system balances demand directly).

   Example state:  
   `[solar_power=2.5 kW, battery_level=70%, time=12:00]`

2. **Actions (What the agent can do):**

   - **Charge the battery** (store energy in the battery).
   - **Discharge the battery** (release energy from the battery).
   - **Do nothing** (leave the system unchanged).

3. **Reward (How the agent learns):**
   Assign rewards to encourage efficient energy management:
   - **Positive reward**:
     - Charging during times of high solar power (e.g., midday).
     - Discharging to meet constant output when solar is low (e.g., at night).
     - Avoiding overcharge or deep discharge of the battery.
   - **Negative reward (penalty)**:
     - Overcharging the battery (exceeding max capacity).
     - Draining the battery to zero (causing inefficiency or downtime).
     - Wasting solar power by not charging when solar power is abundant.
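
The state, actions, and reward rules above can be sketched in a few lines of Python. This is only a minimal illustration, not the environment used later: the names `State`, `Action`, and `step`, the battery parameters, and the reward magnitudes are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    CHARGE = 0
    DISCHARGE = 1
    IDLE = 2


@dataclass
class State:
    solar_kw: float     # current solar power output
    battery_kwh: float  # current battery charge level
    time_norm: float    # time of day, normalized to the range 0-1


# Hypothetical parameters, chosen only for illustration.
CAPACITY_KWH = 100.0
RATE_KW = 20.0
STEP_HOURS = 0.25  # one 15-minute step


def step(state: State, action: Action) -> Tuple[float, float]:
    """Return (new battery level in kWh, reward) for one time step."""
    delta = 0.0
    if action == Action.CHARGE:
        # Charging is limited by both the charge rate and the available solar power.
        delta = min(RATE_KW, state.solar_kw) * STEP_HOURS
    elif action == Action.DISCHARGE:
        delta = -RATE_KW * STEP_HOURS

    target = state.battery_kwh + delta
    new_level = min(max(target, 0.0), CAPACITY_KWH)

    reward = 0.0
    if target > CAPACITY_KWH:
        reward = -1.0  # penalty: overcharging
    elif target <= 0.0 and action == Action.DISCHARGE:
        reward = -1.0  # penalty: draining the battery to zero
    elif action == Action.CHARGE and delta > 0.0:
        reward = 1.0   # reward: storing available solar energy
    elif action == Action.DISCHARGE and state.solar_kw == 0.0:
        reward = 1.0   # reward: supplying output when there is no solar
    elif (action == Action.IDLE and state.solar_kw > 0.0
          and state.battery_kwh < CAPACITY_KWH):
        reward = -0.5  # penalty: wasting solar power
    return new_level, reward


level, reward = step(State(solar_kw=10.0, battery_kwh=50.0, time_norm=0.5), Action.CHARGE)
print(level, reward)
```

A real environment would also need to advance the time, read the next solar value from the data, and scale the rewards; those parts are left out here.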

---


In [1]:
import pandas as pd

# use a raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"smartBMS\smartBMS\Plant_2_Generation_Data.csv")
df


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,DC_POWER,AC_POWER,DAILY_YIELD,TOTAL_YIELD
0,2020-05-15 00:00:00,4136001,4UPUqMRk7TRMgml,0.0,0.0,9425.000000,2.429011e+06
1,2020-05-15 00:00:00,4136001,81aHJ1q11NBPMrL,0.0,0.0,0.000000,1.215279e+09
2,2020-05-15 00:00:00,4136001,9kRcWv60rDACzjR,0.0,0.0,3075.333333,2.247720e+09
3,2020-05-15 00:00:00,4136001,Et9kgGMDl729KT4,0.0,0.0,269.933333,1.704250e+06
4,2020-05-15 00:00:00,4136001,IQ2d7wF4YD8zU1Q,0.0,0.0,3177.000000,1.994153e+07
...,...,...,...,...,...,...,...
67693,2020-06-17 23:45:00,4136001,q49J1IKaHRwDQnt,0.0,0.0,4157.000000,5.207580e+05
67694,2020-06-17 23:45:00,4136001,rrq4fwE8jgrTyWY,0.0,0.0,3931.000000,1.211314e+08
67695,2020-06-17 23:45:00,4136001,vOuJvMaM2sgwLmb,0.0,0.0,4322.000000,2.427691e+06
67696,2020-06-17 23:45:00,4136001,xMbIugepa2P7lBB,0.0,0.0,4218.000000,1.068964e+08


In [2]:
import pandas as pd

def normalize_time_from_datetime(datetime_value):
    # Extract the time part
    total_seconds = (
        datetime_value.hour * 3600
        + datetime_value.minute * 60
        + datetime_value.second
    )
    # Normalize the time
    normalized_time = total_seconds / 86400  # Total seconds in a day
    return normalized_time



In [3]:
# Example usage
datetime_value = pd.to_datetime("2024-12-21 00:00:01")
normalized = normalize_time_from_datetime(datetime_value)
print(f"Normalized time: {normalized}")


Normalized time: 1.1574074074074073e-05
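
The same normalization can be applied to a whole column at once instead of one timestamp at a time. The sketch below assumes a column of timestamp strings like the `DATE_TIME` column above; the helper name `normalize_time_column` and the sample values are made up for the example.

```python
import pandas as pd


def normalize_time_column(times: pd.Series) -> pd.Series:
    """Map each timestamp to its fraction of the day, in the range [0, 1)."""
    t = pd.to_datetime(times)
    seconds = t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second
    return seconds / 86400  # total seconds in a day


example = pd.Series([
    "2020-05-20 00:00:00",
    "2020-05-20 12:00:00",
    "2020-05-20 23:45:00",
])
normalized_column = normalize_time_column(example)
print(normalized_column.tolist())
```

Midnight maps to 0.0 and noon to 0.5, so the value can be used directly as the time component of the state.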


In [4]:
import numpy as np
np.random.randint(0,566)

280

In [5]:
df_cleaned = df[["DATE_TIME","DC_POWER"]]
# # label-encode the SOURCE_KEY column
# df_cleaned["SOURCE_KEY"] = df_cleaned["SOURCE_KEY"].astype('category').cat.codes
df_cleaned

Unnamed: 0,DATE_TIME,DC_POWER
0,2020-05-15 00:00:00,0.0
1,2020-05-15 00:00:00,0.0
2,2020-05-15 00:00:00,0.0
3,2020-05-15 00:00:00,0.0
4,2020-05-15 00:00:00,0.0
...,...,...
67693,2020-06-17 23:45:00,0.0
67694,2020-06-17 23:45:00,0.0
67695,2020-06-17 23:45:00,0.0
67696,2020-06-17 23:45:00,0.0


In [6]:
# group df_cleaned by DATE_TIME and sum DC_POWER across all inverters (SOURCE_KEY values)
df_cleaned_grouped = df_cleaned.groupby(["DATE_TIME"]).sum()
df_cleaned_grouped

DATE_TIME,DC_POWER
2020-05-15 00:00:00,0.0
2020-05-15 00:15:00,0.0
2020-05-15 00:30:00,0.0
2020-05-15 00:45:00,0.0
2020-05-15 01:00:00,0.0
...,...
2020-06-17 22:45:00,0.0
2020-06-17 23:00:00,0.0
2020-06-17 23:15:00,0.0
2020-06-17 23:30:00,0.0


In [7]:
# make the DATE_TIME column just a column and not an index 
df_cleaned_grouped.reset_index(inplace=True)
df_cleaned_grouped

Unnamed: 0,DATE_TIME,DC_POWER
0,2020-05-15 00:00:00,0.0
1,2020-05-15 00:15:00,0.0
2,2020-05-15 00:30:00,0.0
3,2020-05-15 00:45:00,0.0
4,2020-05-15 01:00:00,0.0
...,...,...
3254,2020-06-17 22:45:00,0.0
3255,2020-06-17 23:00:00,0.0
3256,2020-06-17 23:15:00,0.0
3257,2020-06-17 23:30:00,0.0


In [8]:
# create a DATE column holding only the date part of DATE_TIME
# df_cleaned_grouped["TIME"] = pd.to_datetime(df_cleaned_grouped["DATE_TIME"]).dt.time
df_cleaned_grouped["DATE"] = pd.to_datetime(df_cleaned_grouped["DATE_TIME"]).dt.date
# rename the DATE_TIME column to TIME (it still holds the full timestamp)
df_cleaned_grouped.rename(columns={"DATE_TIME":"TIME"},inplace=True)
df_cleaned_grouped

Unnamed: 0,TIME,DC_POWER,DATE
0,2020-05-15 00:00:00,0.0,2020-05-15
1,2020-05-15 00:15:00,0.0,2020-05-15
2,2020-05-15 00:30:00,0.0,2020-05-15
3,2020-05-15 00:45:00,0.0,2020-05-15
4,2020-05-15 01:00:00,0.0,2020-05-15
...,...,...,...
3254,2020-06-17 22:45:00,0.0,2020-06-17
3255,2020-06-17 23:00:00,0.0,2020-06-17
3256,2020-06-17 23:15:00,0.0,2020-06-17
3257,2020-06-17 23:30:00,0.0,2020-06-17


In [9]:
# df_cleaned_grouped.drop(columns=["DATE_TIME"],inplace=True)
df_cleaned_grouped

Unnamed: 0,TIME,DC_POWER,DATE
0,2020-05-15 00:00:00,0.0,2020-05-15
1,2020-05-15 00:15:00,0.0,2020-05-15
2,2020-05-15 00:30:00,0.0,2020-05-15
3,2020-05-15 00:45:00,0.0,2020-05-15
4,2020-05-15 01:00:00,0.0,2020-05-15
...,...,...,...
3254,2020-06-17 22:45:00,0.0,2020-06-17
3255,2020-06-17 23:00:00,0.0,2020-06-17
3256,2020-06-17 23:15:00,0.0,2020-06-17
3257,2020-06-17 23:30:00,0.0,2020-06-17


In [10]:
# label-encode the DATE column (0 = first day, 1 = second day, ...)
df_cleaned_grouped["DATE"] = df_cleaned_grouped["DATE"].astype('category').cat.codes
df_cleaned_grouped

Unnamed: 0,TIME,DC_POWER,DATE
0,2020-05-15 00:00:00,0.0,0
1,2020-05-15 00:15:00,0.0,0
2,2020-05-15 00:30:00,0.0,0
3,2020-05-15 00:45:00,0.0,0
4,2020-05-15 01:00:00,0.0,0
...,...,...,...
3254,2020-06-17 22:45:00,0.0,33
3255,2020-06-17 23:00:00,0.0,33
3256,2020-06-17 23:15:00,0.0,33
3257,2020-06-17 23:30:00,0.0,33


In [11]:
# get the number of unique values in Date column
df_cleaned_grouped["DATE"].nunique()

34

In [12]:
# select the rows for date code 5 (one day of data)
data_for_epi_date=df_cleaned_grouped[df_cleaned_grouped["DATE"] == 5].reset_index(drop=True)
data_for_epi_date

Unnamed: 0,TIME,DC_POWER,DATE
0,2020-05-20 00:00:00,0.0,5
1,2020-05-20 00:15:00,0.0,5
2,2020-05-20 00:30:00,0.0,5
3,2020-05-20 00:45:00,0.0,5
4,2020-05-20 01:00:00,0.0,5
...,...,...,...
91,2020-05-20 22:45:00,0.0,5
92,2020-05-20 23:00:00,0.0,5
93,2020-05-20 23:15:00,0.0,5
94,2020-05-20 23:30:00,0.0,5
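
Selecting a single day by its code suggests how an episode could be drawn at random, one day per episode. The sketch below is an assumption about how that might look, not code from this notebook: `sample_episode` is a hypothetical helper and `toy` is a made-up frame with the same `TIME`, `DC_POWER`, and `DATE` columns.

```python
import numpy as np
import pandas as pd


def sample_episode(df: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    """Pick one DATE code at random and return that day's rows, re-indexed from 0."""
    code = rng.choice(df["DATE"].unique())
    return df[df["DATE"] == code].reset_index(drop=True)


# Made-up data: four days with two readings each.
toy = pd.DataFrame({
    "TIME": [
        "2020-05-15 00:00:00", "2020-05-15 12:00:00",
        "2020-05-16 00:00:00", "2020-05-16 12:00:00",
        "2020-05-17 00:00:00", "2020-05-17 12:00:00",
        "2020-05-18 00:00:00", "2020-05-18 12:00:00",
    ],
    "DC_POWER": [0.0, 5.0, 0.0, 6.0, 0.0, 7.0, 0.0, 8.0],
    "DATE": [0, 0, 1, 1, 2, 2, 3, 3],
})

episode = sample_episode(toy, np.random.default_rng(seed=0))
print(episode)
```

Passing a seeded `np.random.Generator` keeps the choice of day reproducible between runs.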


In [13]:
time_in_time = pd.to_datetime(data_for_epi_date.iloc[95]["TIME"])
time_in_time

Timestamp('2020-05-20 23:45:00')

In [14]:
normalize_time_from_datetime(time_in_time)

0.9895833333333334

In [15]:
data_for_epi_date

Unnamed: 0,TIME,DC_POWER,DATE
0,2020-05-20 00:00:00,0.0,5
1,2020-05-20 00:15:00,0.0,5
2,2020-05-20 00:30:00,0.0,5
3,2020-05-20 00:45:00,0.0,5
4,2020-05-20 01:00:00,0.0,5
...,...,...,...
91,2020-05-20 22:45:00,0.0,5
92,2020-05-20 23:00:00,0.0,5
93,2020-05-20 23:15:00,0.0,5
94,2020-05-20 23:30:00,0.0,5


In [16]:
# keep only the rows between 07:00:00 and 17:00:00 (inclusive)
time_of_day = pd.to_datetime(df_cleaned_grouped["TIME"]).dt.time
start_time = pd.to_datetime("07:00:00").time()
end_time = pd.to_datetime("17:00:00").time()
df_cleaned_grouped = df_cleaned_grouped[(time_of_day >= start_time) & (time_of_day <= end_time)]
df_cleaned_grouped

Unnamed: 0,TIME,DC_POWER,DATE
28,2020-05-15 07:00:00,6733.253333,0
29,2020-05-15 07:15:00,8379.640000,0
30,2020-05-15 07:30:00,10109.332381,0
31,2020-05-15 07:45:00,12052.231429,0
32,2020-05-15 08:00:00,12379.565238,0
...,...,...,...
3227,2020-06-17 16:00:00,6581.850952,33
3228,2020-06-17 16:15:00,6512.576667,33
3229,2020-06-17 16:30:00,6162.166190,33
3230,2020-06-17 16:45:00,4105.105714,33
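
An alternative way to express the same daytime filter is pandas' `DataFrame.between_time`, which works on a `DatetimeIndex` and includes both endpoints by default. The sketch below uses a made-up single day of 15-minute readings rather than the plant data.

```python
import pandas as pd

# Made-up data: one day of 15-minute readings (96 rows).
times = pd.date_range("2020-05-20", periods=96, freq="15min")
toy = pd.DataFrame({"TIME": times, "DC_POWER": range(96)})

# between_time needs the timestamps in the index, so set it and restore it afterwards.
daytime = (
    toy.set_index("TIME")
       .between_time("07:00", "17:00")
       .reset_index()
)
print(len(daytime), daytime["TIME"].iloc[0], daytime["TIME"].iloc[-1])
```

For this toy day the result holds 41 rows, from 07:00 through 17:00.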


In [17]:
# get  the min max and mean of the DC_POWER
min_dc_power = df_cleaned_grouped["DC_POWER"].min()
max_dc_power = df_cleaned_grouped["DC_POWER"].max()
mean_dc_power = df_cleaned_grouped["DC_POWER"].mean()
min_dc_power, max_dc_power, mean_dc_power

(0.0, 26630.506666666668, np.float64(11586.216424036063))

In [18]:
# save the cleaned data to a csv file
# use a raw string so the backslashes in the Windows path are not treated as escape sequences
df_cleaned_grouped.to_csv(r"solar_plant_gym_env\envs\Plant_2_Generation_Data_cleaned.csv", index=False)


In [19]:
# get the lengths of the consecutive runs of zeros in the DC_POWER column
zero_lengths = []
zero_length = 0
for dc_power in df_cleaned_grouped["DC_POWER"]:
    if dc_power == 0:
        zero_length += 1
    else:
        if zero_length != 0:
            zero_lengths.append(zero_length)
            zero_length = 0
# also record a run of zeros that reaches the end of the data
if zero_length != 0:
    zero_lengths.append(zero_length)

zero_lengths

[1, 2, 1, 2, 1, 1, 1]

In [20]:
# get the max, min and average length of the zero runs
max_zero_length = max(zero_lengths)
min_zero_length = min(zero_lengths)
average_zero_length = sum(zero_lengths)/len(zero_lengths)
max_zero_length, min_zero_length, average_zero_length

(2, 1, 1.2857142857142858)

In [21]:
# multiply each of max, min and average by 15 to get the duration in minutes, then divide by 60 to get hours
max_zero_length_hours = max_zero_length*15/60
min_zero_length_hours = min_zero_length*15/60
average_zero_length_hours = average_zero_length*15/60
max_zero_length_hours, min_zero_length_hours, average_zero_length_hours

(0.5, 0.25, 0.32142857142857145)
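
Because the nighttime rows were filtered out, a run of zeros at the end of one day and another at the start of the next day would be counted as a single run by a loop over the whole column. The sketch below shows one way to count runs per day instead, using `itertools.groupby`; the helper names and the toy data are made up for the example.

```python
from itertools import groupby

import pandas as pd


def zero_run_lengths(power) -> list:
    """Lengths of the consecutive runs of zeros in a sequence of power values."""
    return [
        sum(1 for _ in run)
        for is_zero, run in groupby(power, key=lambda p: p == 0)
        if is_zero
    ]


def zero_runs_per_day(df: pd.DataFrame) -> list:
    """Count zero runs within each DATE separately so runs never span two days."""
    runs = []
    for _, day in df.groupby("DATE"):
        runs.extend(zero_run_lengths(day["DC_POWER"]))
    return runs


# Made-up data: day 0 ends with zeros and day 1 starts with a zero.
toy = pd.DataFrame({
    "DATE": [0, 0, 0, 0, 1, 1, 1, 1],
    "DC_POWER": [5.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0],
})

print(zero_run_lengths(toy["DC_POWER"]))  # runs over the whole column
print(zero_runs_per_day(toy))             # runs counted day by day
```

On the toy data the whole-column count merges the zeros across the day boundary into one run of 4, while the per-day count reports runs of 3 and 1 for that stretch.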

In [25]:
new_df = pd.read_excel("Vydexa lanka data Irradiation Vs Active powerNov 01-Dec01.xlsx")
new_df

Unnamed: 0,Timestamp,Power_Output,Solar_Irradiance
0,2023-11-01 06:00:00,-0.024284,8.101851
1,2023-11-01 06:01:00,-0.024284,8.101851
2,2023-11-01 06:02:00,-0.037689,9.259259
3,2023-11-01 06:03:00,-0.037689,9.259259
4,2023-11-01 06:04:00,-0.037689,9.259259
...,...,...,...
43916,2023-12-01 17:56:00,-0.032078,0.000000
43917,2023-12-01 17:57:00,-0.032078,0.000000
43918,2023-12-01 17:58:00,-0.032078,0.000000
43919,2023-12-01 17:59:00,-0.032078,0.000000


In [26]:
# rename the columns Timestamp to TIME and Power_Output to DC_POWER
new_df.rename(columns={"Timestamp":"TIME","Power_Output":"DC_POWER"},inplace=True)
# drop the Solar_Irradiance column
new_df.drop(columns=["Solar_Irradiance"],inplace=True)
# create a DATE column from the TIME column
new_df["DATE"] = pd.to_datetime(new_df["TIME"]).dt.date
# label-encode the DATE column
new_df["DATE"] = new_df["DATE"].astype('category').cat.codes
# keep only the rows between 07:00:00 and 17:00:00 (inclusive)
time_of_day = pd.to_datetime(new_df["TIME"]).dt.time
start_time = pd.to_datetime("07:00:00").time()
end_time = pd.to_datetime("17:00:00").time()
new_df = new_df[(time_of_day >= start_time) & (time_of_day <= end_time)]
new_df

Unnamed: 0,TIME,DC_POWER,DATE
60,2023-11-01 07:00:00,0.894147,0
61,2023-11-01 07:01:00,0.894147,0
62,2023-11-01 07:02:00,0.894147,0
63,2023-11-01 07:03:00,0.894147,0
64,2023-11-01 07:04:00,0.894147,0
...,...,...,...
43856,2023-12-01 16:56:00,-0.032078,30
43857,2023-12-01 16:57:00,-0.032078,30
43858,2023-12-01 16:58:00,-0.032078,30
43859,2023-12-01 16:59:00,-0.032078,30


In [27]:
# convert the negative values in DC_POWER to 0
new_df.loc[new_df["DC_POWER"] < 0, "DC_POWER"] = 0
new_df

Unnamed: 0,TIME,DC_POWER,DATE
60,2023-11-01 07:00:00,0.894147,0
61,2023-11-01 07:01:00,0.894147,0
62,2023-11-01 07:02:00,0.894147,0
63,2023-11-01 07:03:00,0.894147,0
64,2023-11-01 07:04:00,0.894147,0
...,...,...,...
43856,2023-12-01 16:56:00,0.000000,30
43857,2023-12-01 16:57:00,0.000000,30
43858,2023-12-01 16:58:00,0.000000,30
43859,2023-12-01 16:59:00,0.000000,30


In [29]:
# multiply DC_POWER by 1000 to convert it to kilowatts (this assumes the source values are in megawatts)
# work on an explicit copy so pandas does not raise SettingWithCopyWarning on the filtered frame
new_df = new_df.copy()
new_df["DC_POWER"] = new_df["DC_POWER"]*1000


In [30]:
# get the min max and mean of the DC_POWER
min_dc_power = new_df["DC_POWER"].min()
max_dc_power = new_df["DC_POWER"].max()
mean_dc_power = new_df["DC_POWER"].mean()
min_dc_power, max_dc_power, mean_dc_power

(0.0, 11879.23, np.float64(4074.3331025709845))

In [31]:
# save new_df to CSV: one copy under solar_plant_gym_env\envs and one in the working directory
# (raw string so the backslashes in the Windows path are not treated as escape sequences)
new_df.to_csv(r"solar_plant_gym_env\envs\Vydexa_lanka_data_cleaned.csv", index=False)
new_df.to_csv("Vydexa_lanka_data_cleaned.csv", index=False)


# Final Suggestions:

- ### Constant Output Power:

  `5,000 kW`
  (A balanced, stable value near the average solar input).

- ### Battery Capacity:

  `86 MWh`
  (Enough to handle nighttime and cloudy conditions, with a safety margin).

- ### Battery Charge/Discharge Rate:
  ~`22 MW`
  (To handle the maximum power discrepancy efficiently).
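
The notebook does not show how these figures were derived. The sketch below is one possible sanity check, not the original calculation. It assumes the Plant 2 peak of about 26,630.5 kW from the min/max/mean cell, and it assumes that no solar power is available during the 14 hours outside the retained 07:00 to 17:00 window.

```python
# Values taken from this notebook.
MAX_SOLAR_KW = 26630.5       # Plant 2 peak DC power (rounded)
CONSTANT_OUTPUT_KW = 5000.0  # suggested constant output

# Assumption for illustration: no solar input from 17:00 to 07:00.
HOURS_WITHOUT_SOLAR = 14.0

# Largest surplus the battery would have to absorb at peak generation.
peak_surplus_mw = (MAX_SOLAR_KW - CONSTANT_OUTPUT_KW) / 1000

# Energy needed to hold the constant output through the hours without solar.
overnight_energy_mwh = CONSTANT_OUTPUT_KW * HOURS_WITHOUT_SOLAR / 1000

print(f"Peak surplus: {peak_surplus_mw:.1f} MW")
print(f"Overnight energy: {overnight_energy_mwh:.1f} MWh")
```

Under these assumptions the peak surplus is about 21.6 MW, just under the suggested 22 MW rate, and the overnight energy is 70 MWh, which leaves roughly 16 MWh of the suggested 86 MWh as margin.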
