# ERNC Pro-Rata
This program recalculates curtailment for the system, based on the methodology proposed by the NTCyO.

## DATA READING
The data must be extracted from the accdb in net power (not because of the power values themselves, but because of the untruncated marginal costs). The data to extract are:
1. Generation of each plant.
2. Generation profile of each plant.
3. Bus associated with each plant.
4. Marginal costs for each bus.
5. Curtailment per plant (per bus may be sufficient).
6. Maximum capacity.
7. Available generation.
8. Operating status.

To work out the best query against the system without having to deal with the problems of MS Access, the tables are loaded into DuckDB and jupysql is used to test the SQL.

In [81]:
import polars as pl
import duckdb as duck

from pathlib import Path
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy import (
    engine,
    create_engine,
    inspect
)

path_prg = Path(r"../data/Model PRGdia_Full_Definitivo Solution.accdb").absolute()

if not path_prg.exists():
    raise FileNotFoundError(f"Path: {path_prg} does not exist.")

connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    rf"DBQ={path_prg.as_posix()};"
    r"ExtendedAnsiSQL=1;"
)
connection_url = engine.URL.create(
    "access+pyodbc",
    query={"odbc_connect": connection_string}
)

# create_engine is lazy (no connection is opened yet), so both handles are
# created before the try block; this keeps them defined when `finally` runs.
prg_engine = create_engine(connection_url)
conn = duck.connect("PCP.duckdb")
#conn.execute("CREATE SCHEMA IF NOT EXISTS bronze")

try:
    tables = inspect(prg_engine).get_table_names()

    for table in tables:
        print(f"working on table: {table}...")
        df = pl.read_database(query=f"SELECT * FROM {table}", connection=prg_engine)
        # DuckDB finds the local variable `df` by name when it appears in the SQL
        conn.execute(f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM df")

except SQLAlchemyError as e:
    print(f"Error: {e}")

finally:
    conn.close()
    prg_engine.dispose()

working on table: t_attribute...
working on table: t_attribute_data...
working on table: t_band...
working on table: t_category...
working on table: t_class...
working on table: t_class_group...
working on table: t_collection...
working on table: t_config...
working on table: t_custom_column...
working on table: t_data_0...
working on table: t_data_1...
working on table: t_data_2...
working on table: t_data_3...
working on table: t_data_4...
working on table: t_data_6...
working on table: t_data_7...
working on table: t_data_current...
working on table: t_key...
working on table: t_key_index...
working on table: t_membership...
working on table: t_memo_object...
working on table: t_model...
working on table: t_object...
working on table: t_object_meta...
working on table: t_period_0...
working on table: t_period_1...
working on table: t_period_2...
working on table: t_period_3...
working on table: t_

## DUCKDB REVIEW
With the data loaded into the database, we start looking at how to build the best query.

In [82]:
# Load the SQL extension and open the database.
import duckdb

conn_pcp = duckdb.connect("pcp.duckdb")

# load the jupysql extension and register the connection
%load_ext sql
%sql conn_pcp --alias duck

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [88]:
# Close the connections; run this once the review is finished
%sql --close duck
conn_pcp.close()

The classic query for extracting generation, slightly modified to pull the other properties in one go; there is no need for one query per data item.

In [87]:
%%sql
SELECT 
    t_child.name AS generator,
    t_property.name AS property,
    t_period_0.datetime,
    t_data_0.key_id AS data_key,
    t_data_0.period_id AS data_period,
    t_data_0.value,
FROM ((((((((t_membership
INNER JOIN t_collection ON t_membership.collection_id = t_collection.collection_id)
INNER JOIN t_object AS t_parent ON t_membership.parent_object_id = t_parent.object_id)
INNER JOIN t_object AS t_child ON t_membership.child_object_id = t_child.object_id)
INNER JOIN t_property ON t_collection.collection_id = t_property.collection_id)
INNER JOIN t_key ON t_membership.membership_id = t_key.membership_id AND t_property.property_id = t_key.property_id)
INNER JOIN t_data_0 ON t_key.key_id = t_data_0.key_id)
INNER JOIN t_phase_3 ON t_data_0.period_id = t_phase_3.period_id)
INNER JOIN t_period_0 ON t_phase_3.interval_id = t_period_0.interval_id)
INNER JOIN t_category ON t_child.category_id = t_category.category_id
WHERE t_collection.collection_id = 1 AND t_property.property_id IN (1, 28, 200, 219) AND t_category.category_id IN (95, 96, 99, 100)

generator,property,datetime,data_key,data_period,value
EL_MAITEN_EO,Max Capacity,2024-01-11 00:00:00,10428,1,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 01:00:00,10428,2,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 02:00:00,10428,3,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 03:00:00,10428,4,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 04:00:00,10428,5,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 05:00:00,10428,6,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 06:00:00,10428,7,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 07:00:00,10428,8,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 08:00:00,10428,9,9.0
EL_MAITEN_EO,Max Capacity,2024-01-11 09:00:00,10428,10,9.0


Similar to the previous one, a query that extracts only the records with negative marginal costs; the rest is not needed.

In [86]:
%%sql
SELECT 
    t_child.name AS node,
    t_period_0.datetime,
    t_data_0.key_id AS data_key,
    t_data_0.period_id AS data_period,
    t_data_0.value AS marginal_cost,
FROM ((((((((t_membership
INNER JOIN t_collection ON t_membership.collection_id = t_collection.collection_id)
INNER JOIN t_object AS t_parent ON t_membership.parent_object_id = t_parent.object_id)
INNER JOIN t_object AS t_child ON t_membership.child_object_id = t_child.object_id)
INNER JOIN t_property ON t_collection.collection_id = t_property.collection_id)
INNER JOIN t_key ON t_membership.membership_id = t_key.membership_id AND t_property.property_id = t_key.property_id)
INNER JOIN t_data_0 ON t_key.key_id = t_data_0.key_id)
INNER JOIN t_phase_3 ON t_data_0.period_id = t_phase_3.period_id)
INNER JOIN t_period_0 ON t_phase_3.interval_id = t_period_0.interval_id)
INNER JOIN t_category ON t_child.category_id = t_category.category_id
WHERE t_collection.collection_id = 245 AND t_property.property_id = 1233 AND t_data_0.value < 0

node,datetime,data_key,data_period,marginal_cost
ElPenon110,2024-01-11 09:00:00,29602,10,-4.9626358250636
Francisco220,2024-01-11 10:00:00,29637,11,-1.86184524305163
ElPenon110,2024-01-11 11:00:00,29602,12,-0.216977423477753
ElPenon110,2024-01-11 12:00:00,29602,13,-0.263556176073346
ElPenon110,2024-01-11 13:00:00,29602,14,-0.215534871550137
ElPenon110,2024-01-11 14:00:00,29602,15,-0.19682939692686
ElPenon110,2024-01-11 15:00:00,29602,16,-0.183725258261048
ElPenon110,2024-01-11 16:00:00,29602,17,-0.174420067605554
ElPenon110,2024-01-11 17:00:00,29602,18,-0.156235601290092
ElPenon110,2024-01-11 18:00:00,29602,19,-0.289653561310179


A tidy query using CTEs to extract the relationship between bus and generator. Unfortunately MS Access does not support CTEs, so the query is reformulated in the .sql file.

In [85]:
%%sql
WITH node_obj AS (
    SELECT 
        t_object.object_id AS node_id,
        t_object.name AS node,
    FROM t_object
    INNER JOIN t_class ON t_object.class_id = t_class.class_id
    WHERE t_class.name = 'Node'
), gen_obj AS (
    SELECT 
        t_object.object_id AS gen_id,
        t_object.name AS generator,
    FROM t_object
    INNER JOIN t_class ON t_object.class_id = t_class.class_id
    WHERE t_class.name = 'Generator' AND t_object.category_id IN (95, 96, 99, 100)
)

SELECT
    node_obj.node,
    gen_obj.generator,
FROM t_membership
INNER JOIN node_obj ON t_membership.child_object_id = node_obj.node_id
INNER JOIN gen_obj ON t_membership.parent_object_id = gen_obj.gen_id
WHERE t_membership.collection_id = 12

node,generator
Andes220,SOL_DEL_NORTE_ANDES_FV
Angamos220,TALLADO_FV
Arica066,PAMPA_CAMARONES_FV
Cachiyuyal220,PAMPA_SOLAR_NORTE_FV
Capricornio110,CAPRICORNIO_FV
Cardones110,VALLE_SOLAR_OESTE_FV
Cardones220,VALLE_ESCONDIDO_FV
Condores220,WILLKA_FV
CPinto220,SAN_ANDRES_FV
Crucero220,LAS_SALINAS_FV
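
The reformulated .sql file is not shown in this notebook. As a rough sketch of what such a rewrite could look like, the example below moves each CTE into a derived table (an inline subquery) and checks that both forms return the same rows. It uses the standard-library `sqlite3` module and a made-up three-table schema, so the table contents, object names and the `cte_sql` / `derived_sql` variables are illustrative assumptions, not the actual Access query.

```python
import sqlite3

# Toy in-memory schema that mimics only the columns used by the CTE query.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t_class (class_id INTEGER, name TEXT);
    CREATE TABLE t_object (object_id INTEGER, class_id INTEGER, name TEXT);
    CREATE TABLE t_membership (
        collection_id INTEGER, parent_object_id INTEGER, child_object_id INTEGER
    );
    INSERT INTO t_class VALUES (1, 'Generator'), (2, 'Node');
    INSERT INTO t_object VALUES
        (10, 1, 'GEN_A'), (11, 1, 'GEN_B'), (20, 2, 'Node1'), (21, 2, 'Node2');
    INSERT INTO t_membership VALUES (12, 10, 20), (12, 11, 21), (99, 10, 21);
""")

# CTE form, same shape as the DuckDB query above (category filter omitted).
cte_sql = """
WITH node_obj AS (
    SELECT o.object_id AS node_id, o.name AS node
    FROM t_object AS o INNER JOIN t_class AS c ON o.class_id = c.class_id
    WHERE c.name = 'Node'
), gen_obj AS (
    SELECT o.object_id AS gen_id, o.name AS generator
    FROM t_object AS o INNER JOIN t_class AS c ON o.class_id = c.class_id
    WHERE c.name = 'Generator'
)
SELECT node_obj.node, gen_obj.generator
FROM t_membership AS m
INNER JOIN node_obj ON m.child_object_id = node_obj.node_id
INNER JOIN gen_obj ON m.parent_object_id = gen_obj.gen_id
WHERE m.collection_id = 12
ORDER BY node_obj.node
"""

# Same logic with each CTE inlined as a derived table.
# Access additionally expects multiple joins to be nested in parentheses,
# as in the earlier queries of this notebook; that nesting is omitted here.
derived_sql = """
SELECT node_obj.node, gen_obj.generator
FROM t_membership AS m
INNER JOIN (
    SELECT o.object_id AS node_id, o.name AS node
    FROM t_object AS o INNER JOIN t_class AS c ON o.class_id = c.class_id
    WHERE c.name = 'Node'
) AS node_obj ON m.child_object_id = node_obj.node_id
INNER JOIN (
    SELECT o.object_id AS gen_id, o.name AS generator
    FROM t_object AS o INNER JOIN t_class AS c ON o.class_id = c.class_id
    WHERE c.name = 'Generator'
) AS gen_obj ON m.parent_object_id = gen_obj.gen_id
WHERE m.collection_id = 12
ORDER BY node_obj.node
"""

cte_rows = con.execute(cte_sql).fetchall()
derived_rows = con.execute(derived_sql).fetchall()
con.close()
print(cte_rows == derived_rows)
```

Comparing the two result sets on a small fixture like this is a cheap way to confirm that a hand-rewritten query still matches the CTE version before running it against Access.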


## Putting the information together
With the SQL queries built, we now switch to using only polars, to avoid depending on another library, `DuckDB` (much as I like this database).

In [125]:
import polars as pl

from pathlib import Path
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy import (
    engine,
    create_engine,
)

# Read the bus-generator SQL
path_sql_node = Path(r"../poc_prorrataerv/sql/gen_node.sql").absolute()
sql_node = path_sql_node.read_text()

# Read the SQL for the generation data
path_sql_gen = Path(r"../poc_prorrataerv/sql/gen_data.sql").absolute()
sql_gen = path_sql_gen.read_text()

# Read the SQL for buses with marginal costs below 0
path_sql_cmg = Path(r"../poc_prorrataerv/sql/cmg_data.sql").absolute()
sql_cmg = path_sql_cmg.read_text()

# Start loading the data into dataframes
path_prg = Path(r"../data/Model PRGdia_Full_Definitivo Solution.accdb").absolute()

if not path_prg.exists():
    raise FileNotFoundError(f"Path: {path_prg} does not exist.")

connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    rf"DBQ={path_prg.as_posix()};"
    r"ExtendedAnsiSQL=1;"
)
connection_url = engine.URL.create(
    "access+pyodbc",
    query={"odbc_connect": connection_string}
)

# create_engine is lazy (no connection is opened yet), so it is created before
# the try block; this keeps `prg_engine` defined when `finally` runs.
prg_engine = create_engine(connection_url)

try:
    df_nodes = pl.read_database(query=sql_node, connection=prg_engine)
    df_gen = pl.read_database(query=sql_gen, connection=prg_engine)
    df_cmg = pl.read_database(query=sql_cmg, connection=prg_engine)

except SQLAlchemyError as e:
    print(f"Error: {e}")

finally:
    prg_engine.dispose()


In [170]:
# Read the additional PMGD data
path_pmgd = Path(r"W:/41 Dpto Pronosticos/Vertimiento_ERNC/Lista_PMGDs.xlsx").absolute()
df_pmgd = pl.read_excel(
    source=path_pmgd,
    sheet_name="Hoja1",
    xlsx2csv_options={"skip_empty_lines": True},
    read_csv_options={"new_columns": ["Nombre_CDC","Centrales"]},
)

# Read the list of vetoed (excluded) plants
path_vetados = Path(r"R:/Aplicaciones/Prorrateo_Vertimiento/Centrales_Vetadas.xlsx").absolute()
df_vetados = pl.read_excel(
    source=path_vetados,
    sheet_name="Hoja1",
    xlsx2csv_options={"skip_empty_lines": True},
    read_csv_options={"new_columns": ["Centrales"]},
)

In [189]:
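# Sanity check: vetoed or PMGD plants that report curtailment while generating;
# their total curtailment per generator is exported to errores.csv.
(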
    df_gen
    .filter(
        pl.col("generator").is_in(
            pl.concat(
                [df_vetados["Centrales"].unique(),
                 df_pmgd["Centrales"].unique()]
            )
        )
    )
    .pivot(
        values="value",
        columns="property",
        index=["generator", "datetime"]
    )
    .filter(
        pl.col("Units Generating") == 1,
        pl.col("Capacity Curtailed") != 0,
    )
    .select(
        pl.exclude("Units Generating")
    )
    .group_by(pl.col("generator").alias("Generator"))
    .agg(pl.col("Capacity Curtailed").sum().alias("Total Capacity Curtailed"))
    .write_csv("errores.csv")
)

In [175]:
df_gen_pivot = (
    df_gen
    .filter(
        ~pl.col("generator").is_in(df_vetados["Centrales"].unique()),
        ~pl.col("generator").is_in(df_pmgd["Centrales"].unique()),
    )
    .pivot(
        values="value",
        columns="property",
        index=["generator", "datetime"]
    )
    .filter(
        pl.col("Units Generating") == 1,
    )
    .select(
        pl.exclude("Units Generating")
    )
)
df_gen_pivot

generator,datetime,Generation,Capacity Curtailed,Max Capacity,Available Capacity
str,datetime[μs],f64,f64,f64,f64
"""ANCOA""",2024-01-11 00:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 01:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 02:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 03:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 04:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 05:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 06:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 07:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 08:00:00,23.2,0.0,24.656,23.2
"""ANCOA""",2024-01-11 09:00:00,23.2,0.0,24.656,23.2


In [None]:
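def calc_prorrata(df: pl.DataFrame) -> pl.DataFrame:
    """Add a "Prorrata" column: each row's share of the total curtailment.

    Expects `df` to already contain "Capacity Curtailed" and
    "Total Capacity Curtailed" columns.
    """
    # Keep only the rows that actually have curtailment
    df = df.filter(pl.col("Capacity Curtailed") != 0)
    df = df.with_columns(
        # Guard kept from the original; after the filter above it never fires
        pl.when(pl.col("Capacity Curtailed") == 0)
        .then(pl.lit(0.0))
        .otherwise(
            pl.col("Capacity Curtailed") / pl.col("Total Capacity Curtailed")
        )
        .alias("Prorrata")
    )
    return df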

In [204]:
data = (
    df_cmg
    .join(df_nodes, on="node", how="inner")
    .join(df_gen_pivot, on=["generator","datetime"], how="inner")
    .with_columns(
        (
            pl.col("Max Capacity") / pl.col("Max Capacity").sum().over("datetime")
        ).alias("Factor_ERNC"),
        (
            pl.col("Capacity Curtailed").sum().over("datetime") * pl.col("Max Capacity") / pl.col("Max Capacity").sum().over("datetime")
        ).alias("Prorrata_Curt"),
    )
    .with_columns(
        (
            pl.col("Available Capacity") - pl.col("Prorrata_Curt")
        ).alias("Prorrata_Gen"),
    )
)
data

node,datetime,generator,Generation,Capacity Curtailed,Max Capacity,Available Capacity,Factor_ERNC,Prorrata_Curt,Prorrata_Gen
str,datetime[μs],str,f64,f64,f64,f64,f64,f64,f64
"""Linares154""",2024-01-11 09:00:00,"""ANCOA""",23.2,0.0,24.656,23.2,0.002409,0.182818,23.017182
"""AJahuel220""",2024-01-11 09:00:00,"""CARENA""",2.75,0.0,10.0,2.75,0.000977,0.074148,2.675852
"""Polpaico220""",2024-01-11 09:00:00,"""CHACABUQUITO""",17.4,0.0,25.7,17.4,0.002511,0.19056,17.20944
"""Arica066""",2024-01-11 15:00:00,"""CHAPIQUINA""",3.5,0.0,10.14,3.5,0.001456,1.529139,1.970861
"""Arica066""",2024-01-11 16:00:00,"""CHAPIQUINA""",3.5,0.0,10.14,3.5,0.001456,1.712279,1.787721
"""Arica066""",2024-01-11 17:00:00,"""CHAPIQUINA""",3.5,0.0,10.14,3.5,0.001456,1.473143,2.026857
"""Arica066""",2024-01-11 18:00:00,"""CHAPIQUINA""",3.5,0.0,10.14,3.5,0.001456,0.486272,3.013728
"""Itahue154""",2024-01-11 09:00:00,"""CONVENTO_VIEJO…",14.5,0.0,19.4,14.5,0.001895,0.143846,14.356154
"""Tinguiririca15…",2024-01-11 09:00:00,"""EL_PASO""",57.0,0.0,60.0,57.0,0.005862,0.444886,56.555114
"""Florida110""",2024-01-11 09:00:00,"""FLORIDA_2""",11.0,0.0,28.0,11.0,0.002736,0.207613,10.792387
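
To make the arithmetic behind `Factor_ERNC`, `Prorrata_Curt` and `Prorrata_Gen` easier to follow, here is a minimal pure-Python sketch of the same three expressions for a single hour. The function name `prorrata_by_max_capacity`, the plant names and all numbers are made up for illustration; they are not taken from the PRG database.

```python
def prorrata_by_max_capacity(units):
    """Split one hour's total curtailment in proportion to Max Capacity.

    `units` maps a generator name to a dict with "max_capacity",
    "available" and "curtailed" values for that hour.
    """
    total_max = sum(u["max_capacity"] for u in units.values())
    total_curtailed = sum(u["curtailed"] for u in units.values())
    result = {}
    for name, u in units.items():
        factor = u["max_capacity"] / total_max   # Factor_ERNC
        curt = total_curtailed * factor          # Prorrata_Curt
        result[name] = {
            "factor": factor,
            "prorrata_curt": curt,
            "prorrata_gen": u["available"] - curt,  # Prorrata_Gen
        }
    return result


# Hypothetical hour with three made-up plants
hour = {
    "PLANT_A": {"max_capacity": 50.0, "available": 40.0, "curtailed": 10.0},
    "PLANT_B": {"max_capacity": 30.0, "available": 25.0, "curtailed": 0.0},
    "PLANT_C": {"max_capacity": 20.0, "available": 15.0, "curtailed": 0.0},
}
shares = prorrata_by_max_capacity(hour)
for name, s in shares.items():
    print(name, round(s["factor"], 3), round(s["prorrata_curt"], 3), round(s["prorrata_gen"], 3))
```

In this toy hour all 10 units of curtailment originally fall on one plant, and the re-calculation spreads them over the three plants according to their share of installed capacity.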


In [202]:
(
    data
    .group_by("datetime")
    .agg(
        pl.col("Factor_ERNC").sum().alias("Factor_ERNC"),
        pl.col("Prorrata_Curt").sum().alias("Total_Curtailed"),
        pl.col("Generation").sum().alias("Total_Gen"),
        pl.col("Prorrata_Gen").sum().alias("Total_Gen_Prorrata"),
        pl.col("Prorrata_Gen").min().alias("Min_Gen_Prorrata"),
    )
    # group_by does not guarantee output order, so sort after aggregating
    .sort(by="datetime")
)

datetime,Factor_ERNC,Total_Curtailed,Total_Gen,Total_Gen_Prorrata,Min_Gen_Prorrata
datetime[μs],f64,f64,f64,f64,f64
2024-01-11 09:00:00,1.0,75.889864,5192.164816,5192.164816,-1.230249
2024-01-11 10:00:00,1.0,463.813237,1267.639286,1267.639286,-16.156788
2024-01-11 11:00:00,1.0,664.503039,3819.634563,3819.634563,-16.384868
2024-01-11 12:00:00,1.0,756.976858,3940.779408,3940.779408,-16.199442
2024-01-11 13:00:00,1.0,961.410298,3939.977646,3939.977646,-17.370719
2024-01-11 14:00:00,1.0,1134.514926,3917.010157,3917.010157,-16.555661
2024-01-11 15:00:00,1.0,1050.451622,4154.263024,4154.263024,-5.531484
2024-01-11 16:00:00,1.0,1176.260922,4034.743655,4034.743655,-10.298306
2024-01-11 17:00:00,1.0,1011.984914,4063.363033,4063.363033,-2.506424
2024-01-11 18:00:00,1.0,334.047598,4092.832172,4092.832172,-0.140733
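
In the summary above, `Total_Gen` and `Total_Gen_Prorrata` coincide in every hour while `Min_Gen_Prorrata` is negative. One plausible reading, assuming that Generation equals Available Capacity minus Capacity Curtailed (an assumption, although the sample rows shown earlier are consistent with it), is that the re-allocation conserves the hourly total but assigns some plants more curtailment than they have available. The sketch below reproduces that situation with made-up rows; `summarize_hour` and the plant names are hypothetical.

```python
def summarize_hour(rows):
    """Aggregate one hour of allocation results and flag negative generation.

    Each row is a dict with "generator", "generation", "prorrata_curt" and
    "prorrata_gen", mirroring the columns of `data`.
    """
    return {
        "total_curtailed": sum(r["prorrata_curt"] for r in rows),
        "total_gen": sum(r["generation"] for r in rows),
        "total_gen_prorrata": sum(r["prorrata_gen"] for r in rows),
        "min_gen_prorrata": min(r["prorrata_gen"] for r in rows),
        "negative_generators": [
            r["generator"] for r in rows if r["prorrata_gen"] < 0
        ],
    }


# Made-up hour: 10 units of curtailment split 5/3/2 by capacity share.
# PLANT_C has only 1 unit available, so its allocated generation goes negative.
rows = [
    {"generator": "PLANT_A", "generation": 30.0, "prorrata_curt": 5.0, "prorrata_gen": 35.0},
    {"generator": "PLANT_B", "generation": 30.0, "prorrata_curt": 3.0, "prorrata_gen": 27.0},
    {"generator": "PLANT_C", "generation": 1.0, "prorrata_curt": 2.0, "prorrata_gen": -1.0},
]
summary = summarize_hour(rows)
print(summary)
```

Listing the generators with negative `Prorrata_Gen` per hour, rather than only the minimum, shows which plants the capacity-based split cannot be applied to as-is.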
