Length mismatch: Expected axis has 23 elements, new values have 24 elements [related to DST switch?] #331

Closed
fgenoese opened this issue May 17, 2024 · 10 comments

fgenoese commented May 17, 2024

The following code raises the error in the title (length mismatch). It only occurs when start and end have different UTC offsets (e.g. across a daylight saving time switch). I am using the latest entsoe-py. Can somebody confirm?

import pandas as pd
import entsoe

client = entsoe.EntsoePandasClient(api_key='XXXXXXXXXXXXXXXXXXXX')
start=pd.Timestamp('20240331', tz='Europe/Berlin')
end=pd.Timestamp('20240401', tz='Europe/Berlin')
neighbour = 'FR'
origin = 'IT'
df = client.query_net_transfer_capacity_dayahead(neighbour, origin, start=start, end=end)

@fgenoese

Appears to be linked to the data (see raw output below). I was not expecting to see two periods for a single day. But the actual problem seems to be that the first period has 2 points and the second has 22 points, making it 24 in total. But the library correctly expects that Mar 31 should have had 23 hours due to the shift to DST. Will open a ticket on the TP.

<timeseries>
   <mrid>1</mrid>
   <businesstype>A27</businesstype>
   <in_domain.mrid codingscheme="A01">10YIT-GRTN-----B</in_domain.mrid>
   <out_domain.mrid codingscheme="A01">10YFR-RTE------C</out_domain.mrid>
   <quantity_measure_unit.name>MAW</quantity_measure_unit.name>
   <curvetype>A01</curvetype>
   <period>
      <timeinterval>
         <start>2024-03-31T00:00Z</start>
         <end>2024-03-31T02:00Z</end>
      </timeinterval>
      <resolution>PT60M</resolution>
      <point>
         <position>1</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>2</position>
         <quantity>2400</quantity>
      </point>
   </period>
   <period>
      <timeinterval>
         <start>2024-03-31T03:00Z</start>
         <end>2024-04-01T00:00Z</end>
      </timeinterval>
      <resolution>PT60M</resolution>
      <point>
         <position>1</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>2</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>3</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>4</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>5</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>6</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>7</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>8</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>9</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>10</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>11</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>12</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>13</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>14</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>15</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>16</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>17</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>18</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>19</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>20</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>21</position>
         <quantity>2400</quantity>
      </point>
   </period>
</timeseries>

@milannnnn

I am running into similar issues with the latest version of the library (0.6.7):

from os import getenv

import pandas as pd
from entsoe import EntsoePandasClient

start = pd.Timestamp("2023-03-01T00:00:00+02:00")
end = pd.Timestamp("2023-04-02T00:00:00+02:00")
client = EntsoePandasClient(api_key=getenv("ENTSOE_API_KEY"))
df_transferred_capacity = client.query_net_transfer_capacity_dayahead(
    start=start,
    end=end,
    country_code_from="NL",
    country_code_to="DK_1",
)

The error I'm getting:

ValueError: Length mismatch: Expected axis has 23 elements, new values have 24 elements
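
For context, this is the message pandas raises when an index of one length is assigned to a Series of another length. Below is a minimal sketch that reproduces it without calling the API. It assumes (not verified against the library source here) that the parser ends up assigning a 24-entry hourly index to 23 parsed points; the values are placeholders.

```python
import pandas as pd

# 23 parsed points (placeholder values, not real API output)
series = pd.Series([2400.0] * 23)

# 24 hourly timestamps spanning one full UTC day
index = pd.date_range("2024-03-31T00:00Z", periods=24, freq=pd.Timedelta(hours=1))

try:
    # Assigning an index whose length differs from the data raises ValueError
    series.index = index
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

The first number in the message is the length of the existing data and the second is the length of the index being assigned.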

fboerman commented Jun 1, 2024

@fgenoese did you receive anything back from your ticket?

fgenoese commented Jun 1, 2024

Seems to be an error on their side; they'll try to fix it in the next TP release, which is expected in mid-June. I'll keep this issue open for now, in case we have to adapt our library here as well.

fboerman commented Jun 1, 2024

great thanks!

milannnnn commented Jun 3, 2024

Looking at the response, it does appear that there is one data point missing from their end (the first period in the time series has values for 00:00 and 01:00, but the second period starts from 03:00, so the 02:00 value is missing):

import requests

session = requests.Session()  # assumption: a plain requests session
url = 'https://web-api.tp.entsoe.eu/api'
params = {
   "contract_MarketAgreement.Type": "A01",
   "documentType": "A61",
   "in_Domain": "10YIT-GRTN-----B",
   "out_Domain": "10YFR-RTE------C",
   "periodEnd": "202403312200",
   "periodStart": "202403302300",
   "securityToken": ...
}
response = session.get(url=url, params=params)

print(response.text)
# ---------------------------------------------------
<TimeSeries>
	...
		<Period>
			<timeInterval>
				<start>2024-03-31T00:00Z</start>
				<end>2024-03-31T02:00Z</end>
			</timeInterval>
			<resolution>PT60M</resolution>
				<Point>
					<position>1</position>
					<quantity>2400</quantity>
				</Point>
				<Point>
					<position>2</position>
					<quantity>2400</quantity>
				</Point>
		</Period>
		<Period>
			<timeInterval>
				<start>2024-03-31T03:00Z</start>
				<end>2024-04-01T00:00Z</end>
			</timeInterval>
			<resolution>PT60M</resolution>
				<Point>
					<position>1</position>
					<quantity>2400</quantity>
				</Point>
				...
		</Period>
</TimeSeries>
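
The missing hour can also be found mechanically by comparing the end of each period with the start of the next. This is a rough sketch using only the two time intervals shown in the response; `find_gaps` is a made-up helper name, not part of entsoe-py.

```python
import pandas as pd

# (start, end) of each <Period>, copied from the response above
periods = [
    ("2024-03-31T00:00Z", "2024-03-31T02:00Z"),
    ("2024-03-31T03:00Z", "2024-04-01T00:00Z"),
]

def find_gaps(periods):
    """Return (gap_start, gap_end) pairs between consecutive periods."""
    parsed = sorted((pd.Timestamp(s), pd.Timestamp(e)) for s, e in periods)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(parsed, parsed[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps

gaps = find_gaps(periods)

# Total number of hours actually covered by the periods
covered_hours = sum(
    int((pd.Timestamp(e) - pd.Timestamp(s)) / pd.Timedelta(hours=1))
    for s, e in periods
)

print(gaps)           # one gap: 02:00Z to 03:00Z on 2024-03-31
print(covered_hours)  # 23
```

The two intervals cover 23 hours, while the span from the first start to the last end is 24 hours, which is consistent with the 23 and 24 in the error message.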

Also, looking at the way the time-series objects are structured (made up of potentially multiple periods), this issue could be handled by parsing the individual periods into pandas series, and then concatenating those (instead of directly parsing the time series objects):

# rename the original _parse_crossborder_flows_timeseries() to _parse_crossborder_flows_period()
def _parse_crossborder_flows_period(soup):
    """
    Parameters
    ----------
    soup : bs4.element.tag

    Returns
    -------
    pd.Series
    """
    positions = []
    flows = []
    for point in soup.find_all('point'):
        positions.append(int(point.find('position').text))
        flows.append(float(point.find('quantity').text))

    series = pd.Series(index=positions, data=flows)
    series = series.sort_index()
    series.index = _parse_datetimeindex(soup)#[:len(series)]
    return series

# create a new _parse_crossborder_flows_timeseries method (that aggregates individual periods)
def _parse_crossborder_flows_timeseries(soup):
    series = [
        _parse_crossborder_flows_period(soup_period)
        for soup_period in soup.find_all('period')
    ]
    return pd.concat(series)

This approach should work even with missing data (the example stated in this issue), and I believe it should not affect the rest of the data.
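
To illustrate the per-period idea outside the library, here is a self-contained sketch using only the standard library XML parser and pandas. It is not the entsoe-py implementation: the XML sample is trimmed and partly hypothetical (the second period is shortened), the function names are made up, and an hourly resolution is assumed rather than read from `<resolution>`.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed, partly hypothetical sample in the shape of the response above
XML = """
<TimeSeries>
  <Period>
    <timeInterval><start>2024-03-31T00:00Z</start><end>2024-03-31T02:00Z</end></timeInterval>
    <resolution>PT60M</resolution>
    <Point><position>1</position><quantity>2400</quantity></Point>
    <Point><position>2</position><quantity>2400</quantity></Point>
  </Period>
  <Period>
    <timeInterval><start>2024-03-31T03:00Z</start><end>2024-03-31T05:00Z</end></timeInterval>
    <resolution>PT60M</resolution>
    <Point><position>1</position><quantity>2400</quantity></Point>
    <Point><position>2</position><quantity>2400</quantity></Point>
  </Period>
</TimeSeries>
"""

def parse_period(period):
    """Build a Series for one <Period>, indexed from that period's own start."""
    start = pd.Timestamp(period.findtext("timeInterval/start"))
    step = pd.Timedelta(hours=1)  # assumption: resolution is always PT60M
    data = {
        start + (int(p.findtext("position")) - 1) * step: float(p.findtext("quantity"))
        for p in period.findall("Point")
    }
    return pd.Series(data).sort_index()

def parse_timeseries(root):
    """Concatenate the per-period Series of one <TimeSeries>."""
    return pd.concat([parse_period(p) for p in root.findall("Period")])

result = parse_timeseries(ET.fromstring(XML.strip()))
print(result)
```

Because each period is indexed from its own start, the 02:00 UTC slot simply stays absent from the result instead of causing a length mismatch.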

fgenoese commented Jun 3, 2024

> Looking at the response, it does appear that there is one data point missing from their end (the first period in the time series has values for 00:00 and 01:00, but the second period starts from 03:00, so the 02:00 value is missing):

Isn't that the expected behaviour for 31st of March 2024? When switching to DST, we skip 1 hour.

@milannnnn

> Isn't that the expected behaviour for 31st of March 2024? When switching to DST, we skip 1 hour.

It would be in local time, but ENTSO-E reports these values in UTC, which has no DST, so no value should be missing.

print(pd.Timestamp("2024-03-31T00:00Z").tz_convert("Europe/Berlin"))
# 2024-03-31 01:00:00+01:00
print(pd.Timestamp("2024-04-01T00:00Z").tz_convert("Europe/Berlin"))
# 2024-04-01 02:00:00+02:00
print(pd.Timestamp("2024-04-01T00:00Z") - pd.Timestamp("2024-03-31T00:00Z"))
# 1 days 00:00:00 (24 hours)
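
The same point can be checked with plain interval arithmetic: the local calendar day loses an hour, the UTC day does not. This is a small sketch using only timestamps from this thread; `hours_between` is a made-up helper name.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

def hours_between(start, end):
    """Number of whole hours of elapsed time between two tz-aware timestamps."""
    return int((end - start) / HOUR)

# Local calendar day in Europe/Berlin across the DST switch
local_hours = hours_between(
    pd.Timestamp("2024-03-31", tz="Europe/Berlin"),
    pd.Timestamp("2024-04-01", tz="Europe/Berlin"),
)

# The UTC day covered by the two periods in the response
utc_hours = hours_between(
    pd.Timestamp("2024-03-31T00:00Z"),
    pd.Timestamp("2024-04-01T00:00Z"),
)

print(local_hours)  # 23
print(utc_hours)    # 24
```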

fgenoese commented Jun 6, 2024

The TP applied a fix, and the error no longer occurs on the entsoe-py side, so I will close the issue for now.

This is the raw output after their fix:

	<period.timeInterval>
		<start>2024-03-30T23:00Z</start>
		<end>2024-03-31T22:00Z</end>
	</period.timeInterval>
	<TimeSeries>
		<mRID>1</mRID>
		<businessType>A27</businessType>
		<in_Domain.mRID codingScheme="A01">10YIT-GRTN-----B</in_Domain.mRID>
		<out_Domain.mRID codingScheme="A01">10YFR-RTE------C</out_Domain.mRID>
		<quantity_Measure_Unit.name>MAW</quantity_Measure_Unit.name>
		<curveType>A01</curveType>
		<Period>
			<timeInterval>
				<start>2024-03-30T23:00Z</start>
				<end>2024-03-31T22:00Z</end>
			</timeInterval>
			<resolution>PT60M</resolution>
			<Point>
				<position>1</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>2</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>3</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>4</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>5</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>6</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>7</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>8</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>9</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>10</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>11</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>12</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>13</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>14</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>15</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>16</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>17</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>18</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>19</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>20</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>21</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>22</position>
				<quantity>2400</quantity>
			</Point>
			<Point>
				<position>23</position>
				<quantity>2400</quantity>
			</Point>
		</Period>
	</TimeSeries>
</Publication_MarketDocument>

@fgenoese fgenoese closed this as completed Jun 6, 2024
fboerman commented Jun 6, 2024 via email
