Commit

Refactor io into reading and processing and create table tree structure (#607)

The function that creates the graph and related structures from a CSV folder has been split into
two functions. The first reads the CSV folder into a new TableTree structure.
The second processes the TableTree structure into the graph and related structures.
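
In rough outline, the new flow is sketched below (the input folder path is illustrative; any folder following the expected CSV schema works):

```julia
using TulipaEnergyModel

input_dir = "test/inputs/Tiny"  # illustrative path

# Step 1: read the CSV files into the new TableTree structure.
table_tree = create_input_dataframes_from_csv_folder(input_dir)

# Step 2: process the TableTree into the graph and related structures.
graph, representative_periods, timeframe = create_internal_structures(table_tree)
```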
abelsiqueira committed Apr 26, 2024
1 parent 49a243d commit 29814b7
Showing 6 changed files with 146 additions and 68 deletions.
23 changes: 21 additions & 2 deletions docs/src/how-to-use.md
@@ -171,10 +171,29 @@ It hides the complexity behind the energy problem, making the usage more friendly

The `EnergyProblem` can also be constructed using the minimal constructor below.

- `EnergyProblem(graph, representative_periods, timeframe)`: Constructs a new `EnergyProblem` object with the given graph, representative periods, and timeframe. The `constraints_partitions` field is computed from the `representative_periods`, and the other fields are initialized with default values.
- `EnergyProblem(table_tree)`: Constructs a new `EnergyProblem` object with the given [`table_tree`](@ref TableTree) object. The `graph`, `representative_periods`, and `timeframe` are computed using `create_internal_structures`. The `constraints_partitions` field is computed from the `representative_periods`, and the other fields are initialized with default values.

See the [basic example tutorial](@ref basic-example) to see how these can be used.
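
For example, a minimal sketch of the `table_tree`-based construction (the input folder is illustrative):

```julia
using TulipaEnergyModel

table_tree = create_input_dataframes_from_csv_folder("test/inputs/Tiny")  # illustrative folder

# This constructor calls create_internal_structures(table_tree) internally.
energy_problem = EnergyProblem(table_tree)
```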

### TableTree

To store and move the data around, we use [DataFrames](https://dataframes.juliadata.org) and a tree-like structure (`TableTree`) that links these tables together.
Each field of this structure is a NamedTuple. Its fields are listed below, followed by a short access sketch:

- `static`: Stores the data that does not vary within a year. Its fields are:
  - `assets`: Assets data.
  - `flows`: Flows data.
- `profiles`: Stores the profile data, indexed by:
  - `assets`: Dictionary with the references to assets' profiles, indexed by period type (`"rep-periods"` or `"timeframe"`).
  - `flows`: References to flows' profiles for representative periods.
  - `data`: Actual profile data. Dictionary of dictionaries indexed by period type and then by profile name.
- `partitions`: Stores the partitions data, indexed by:
  - `assets`: Dictionary with the specification of the assets' partitions, indexed by period type.
  - `flows`: Specification of the flows' partitions for representative periods.
- `periods`: Stores the periods data, indexed by:
  - `rep_periods`: Representative periods data.
  - `mapping`: Mapping between the timeframe periods and the representative periods.
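
A minimal access sketch, assuming `table_tree` was created with `create_input_dataframes_from_csv_folder`:

```julia
# Static asset and flow tables (DataFrames).
table_tree.static.assets
table_tree.static.flows

# Profile references per period type ("rep-periods" or "timeframe") and the actual profile data.
table_tree.profiles.assets["rep-periods"]
table_tree.profiles.flows
table_tree.profiles.data["rep-periods"]

# Partition specifications and period information.
table_tree.partitions.assets["timeframe"]
table_tree.partitions.flows
table_tree.periods.rep_periods
table_tree.periods.mapping
```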

### Graph

The energy problem is defined using a graph.
@@ -185,7 +204,7 @@ Using MetaGraphsNext we can define a graph with metadata, i.e., associate data w
Furthermore, we can define the labels of each asset as keys to access the elements of the graph.
The assets in the graph are of type [GraphAssetData](@ref), and the flows are of type [GraphFlowData](@ref).

The graph can be created using the [`create_graph_and_representative_periods_from_csv_folder`](@ref) function, or it can be accessed from an [EnergyProblem](@ref).
The graph can be created using the [`create_internal_structures`](@ref) function, or it can be accessed from an [EnergyProblem](@ref).

See how to use the graph in the [graph tutorial](@ref graph-tutorial).
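
As a short sketch (assuming `MetaGraphsNext` is loaded and `table_tree` comes from the previous section):

```julia
using MetaGraphsNext

graph, representative_periods, timeframe = create_internal_structures(table_tree)

# All asset labels and all flows as (u, v) pairs of asset labels.
assets = collect(MetaGraphsNext.labels(graph))
flows = collect(MetaGraphsNext.edge_labels(graph))

# Metadata lookup by label: GraphAssetData for an asset, GraphFlowData for a flow.
asset_data = graph[first(assets)]
u, v = first(flows)
flow_data = graph[u, v]
```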

10 changes: 8 additions & 2 deletions docs/src/tutorials.md
@@ -83,15 +83,21 @@ energy_problem.objective_value, energy_problem.termination_status
### Manually creating all structures without EnergyProblem

For additional control, it might be desirable to use the internal structures of `EnergyProblem` directly.
This can be error-prone, but it is slightly more efficient.
This can be error-prone, so use it with care.
The full description for these structures can be found in [Structures](@ref).

```@example manual
using TulipaEnergyModel
input_dir = "../../test/inputs/Tiny" # hide
# input_dir should be the path to Tiny
graph, representative_periods, timeframe = create_graph_and_representative_periods_from_csv_folder(input_dir)
table_tree = create_input_dataframes_from_csv_folder(input_dir)
```

The `table_tree` contains all tables in the folder, which are then processed into the internal structures below:

```@example manual
graph, representative_periods, timeframe = create_internal_structures(table_tree)
```

We also need a time partition for the constraints to create the model.
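
As a hedged sketch of that step (the qualified name is used in case the helper is not exported):

```julia
constraints_partitions =
    TulipaEnergyModel.compute_constraints_partitions(graph, representative_periods)
```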
126 changes: 69 additions & 57 deletions src/io.jl
@@ -1,5 +1,6 @@
export create_energy_problem_from_csv_folder,
create_graph_and_representative_periods_from_csv_folder,
create_input_dataframes_from_csv_folder,
create_internal_structures,
save_solution_to_file,
compute_assets_partitions!,
compute_flows_partitions!
@@ -14,15 +15,14 @@ the `EnergyProblem` structure.
Set `strict = true` to error if assets are missing from partition data.
"""
function create_energy_problem_from_csv_folder(input_folder::AbstractString; strict = false)
graph, representative_periods, timeframe =
create_graph_and_representative_periods_from_csv_folder(input_folder; strict = strict)
return EnergyProblem(graph, representative_periods, timeframe)
table_tree = create_input_dataframes_from_csv_folder(input_folder; strict = strict)
return EnergyProblem(table_tree)
end

"""
graph, representative_periods, timeframe = create_graph_and_representative_periods_from_csv_folder(input_folder; strict = false)
table_tree = create_input_dataframes_from_csv_folder(input_folder; strict = false)
Returns the `graph` structure that holds all data, and the `representative_periods` array.
Returns the `table_tree::TableTree` structure that holds all data.
Set `strict = true` to error if assets are missing from partition data.
The following files are expected to exist in the input folder:
@@ -39,48 +39,31 @@ The following files are expected to exist in the input folder:
- `profiles-rep-periods-<type>.csv`: Following the schema `schemas.rep_periods.profiles_data`.
- `rep-periods-data.csv`: Following the schema `schemas.rep_periods.data`.
- `rep-periods-mapping.csv`: Following the schema `schemas.rep_periods.mapping`.
The returned structures are:
- `graph`: a MetaGraph with the following information:
+ `labels(graph)`: All assets.
+ `edge_labels(graph)`: All flows, in pair format `(u, v)`, where `u` and `v` are assets.
+ `graph[a]`: A [`TulipaEnergyModel.GraphAssetData`](@ref) structure for asset `a`.
+ `graph[u, v]`: A [`TulipaEnergyModel.GraphFlowData`](@ref) structure for flow `(u, v)`.
- `representative_periods`: An array of
[`TulipaEnergyModel.RepresentativePeriod`](@ref) ordered by their IDs.
- `timeframe`: Information of
[`TulipaEnergyModel.Timeframe`](@ref).
"""
function create_graph_and_representative_periods_from_csv_folder(
input_folder::AbstractString;
strict = false,
)
function create_input_dataframes_from_csv_folder(input_folder::AbstractString; strict = false)
df_assets_data = read_csv_with_implicit_schema(input_folder, "assets-data.csv")
df_flows_data = read_csv_with_implicit_schema(input_folder, "flows-data.csv")
df_rep_period = read_csv_with_implicit_schema(input_folder, "rep-periods-data.csv")
df_rep_periods = read_csv_with_implicit_schema(input_folder, "rep-periods-data.csv")
df_rp_mapping = read_csv_with_implicit_schema(input_folder, "rep-periods-mapping.csv")

df_assets_profiles = Dict(
profile_type =>
read_csv_with_implicit_schema(input_folder, "assets-$profile_type-profiles.csv") for
profile_type in ["timeframe", "rep-periods"]
period_types = ["rep-periods", "timeframe"]

dfs_assets_profiles = Dict(
period_type =>
read_csv_with_implicit_schema(input_folder, "assets-$period_type-profiles.csv") for
period_type in period_types
)
df_flows_profiles =
read_csv_with_implicit_schema(input_folder, "flows-rep-periods-profiles.csv")
df_assets_partitions = Dict(
"timeframe" =>
read_csv_with_implicit_schema(input_folder, "assets-timeframe-partitions.csv"),
"rep-periods" =>
read_csv_with_implicit_schema(input_folder, "assets-rep-periods-partitions.csv"),
dfs_assets_partitions = Dict(
period_type =>
read_csv_with_implicit_schema(input_folder, "assets-$period_type-partitions.csv")
for period_type in period_types
)
df_flows_partitions =
read_csv_with_implicit_schema(input_folder, "flows-rep-periods-partitions.csv")

df_profiles = Dict(
dfs_profiles = Dict(
period_type => Dict(
begin
regex = "profiles-$(period_type)-(.*).csv"
@@ -90,13 +73,13 @@ function create_graph_and_representative_periods_from_csv_folder(
key => value
end for filename in readdir(input_folder) if
startswith("profiles-$period_type-")(filename)
) for period_type in ["rep-periods", "timeframe"]
) for period_type in period_types
)

# Error if partition data is missing assets (if strict)
if strict
missing_assets =
setdiff(df_assets_data[!, :name], df_assets_partitions["rep-periods"][!, :asset])
setdiff(df_assets_data[!, :name], dfs_assets_partitions["rep-periods"][!, :asset])
if length(missing_assets) > 0
msg = "Error: Partition data missing for these assets: \n"
for a in missing_assets
@@ -108,24 +91,53 @@ function create_graph_and_representative_periods_from_csv_folder(
end
end

# Sets and subsets that depend on input data
table_tree = TableTree(
(assets = df_assets_data, flows = df_flows_data),
(assets = dfs_assets_profiles, flows = df_flows_profiles, data = dfs_profiles),
(assets = dfs_assets_partitions, flows = df_flows_partitions),
(rep_periods = df_rep_periods, mapping = df_rp_mapping),
)

return table_tree
end

"""
graph, representative_periods, timeframe = create_internal_structures(table_tree)
Return the `graph`, `representative_periods`, and `timeframe` structures given the input `table_tree::TableTree` structure.
The details of these structures are:
- `graph`: a MetaGraph with the following information:
+ `labels(graph)`: All assets.
+ `edge_labels(graph)`: All flows, in pair format `(u, v)`, where `u` and `v` are assets.
+ `graph[a]`: A [`TulipaEnergyModel.GraphAssetData`](@ref) structure for asset `a`.
+ `graph[u, v]`: A [`TulipaEnergyModel.GraphFlowData`](@ref) structure for flow `(u, v)`.
- `representative_periods`: An array of
[`TulipaEnergyModel.RepresentativePeriod`](@ref) ordered by their IDs.
- `timeframe`: Information of
[`TulipaEnergyModel.Timeframe`](@ref).
"""
function create_internal_structures(table_tree::TableTree)
# TODO: Depending on the outcome of issue #294, this can be done more efficiently with DataFrames, e.g.,
# combine(groupby(df_rp_mapping, :rep_period), :weight => sum => :weight)
# combine(groupby(table_tree.periods.mapping, :rep_period), :weight => sum => :weight)

# Create a dictionary of weights and populate it.
weights = Dict{Int,Dict{Int,Float64}}()
for sub_df in DataFrames.groupby(df_rp_mapping, :rep_period)
for sub_df in DataFrames.groupby(table_tree.periods.mapping, :rep_period)
rp = first(sub_df.rep_period)
weights[rp] = Dict(Pair.(sub_df.period, sub_df.weight))
end

representative_periods = [
RepresentativePeriod(weights[row.id], row.num_timesteps, row.resolution) for
row in eachrow(df_rep_period)
row in eachrow(table_tree.periods.rep_periods)
]

timeframe = Timeframe(maximum(df_rp_mapping.period), df_rp_mapping)
timeframe = Timeframe(maximum(table_tree.periods.mapping.period), table_tree.periods.mapping)

asset_data = [
row.name => GraphAssetData(
@@ -147,7 +159,7 @@ function create_graph_and_representative_periods_from_csv_folder(
row.initial_storage_capacity,
row.initial_storage_level,
row.energy_to_power_ratio,
) for row in eachrow(df_assets_data)
) for row in eachrow(table_tree.static.assets)
]

flow_data = [
@@ -164,11 +176,11 @@ function create_graph_and_representative_periods_from_csv_folder(
row.initial_export_capacity,
row.initial_import_capacity,
row.efficiency,
) for row in eachrow(df_flows_data)
) for row in eachrow(table_tree.static.flows)
]

num_assets = length(asset_data)
name_to_id = Dict(name => i for (i, name) in enumerate(df_assets_data.name))
name_to_id = Dict(name => i for (i, name) in enumerate(table_tree.static.assets.name))

_graph = Graphs.DiGraph(num_assets)
for flow in flow_data
@@ -181,7 +193,7 @@ function create_graph_and_representative_periods_from_csv_folder(
for a in MetaGraphsNext.labels(graph)
compute_assets_partitions!(
graph[a].rep_periods_partitions,
df_assets_partitions["rep-periods"],
table_tree.partitions.assets["rep-periods"],
a,
representative_periods,
)
@@ -190,19 +202,19 @@ function create_graph_and_representative_periods_from_csv_folder(
for (u, v) in MetaGraphsNext.edge_labels(graph)
compute_flows_partitions!(
graph[u, v].rep_periods_partitions,
df_flows_partitions,
table_tree.partitions.flows,
u,
v,
representative_periods,
)
end

# For timeframe, only the assets where is_seasonal is true are selected
for row in eachrow(df_assets_data)
for row in eachrow(table_tree.static.assets)
if row.is_seasonal
# Search for this row in the df_assets_partitions and error if it is not found
# Search for this row in the table_tree.partitions.assets and error if it is not found
found = false
for partition_row in eachrow(df_assets_partitions["timeframe"])
for partition_row in eachrow(table_tree.partitions.assets["timeframe"])
if row.name == partition_row.asset
graph[row.name].timeframe_partitions = _parse_rp_partition(
Val(partition_row.specification),
@@ -220,11 +232,11 @@ function create_graph_and_representative_periods_from_csv_folder(
end
end

for asset_profile_row in eachrow(df_assets_profiles["rep-periods"]) # row = asset, profile_type, profile_name
for asset_profile_row in eachrow(table_tree.profiles.assets["rep-periods"]) # row = asset, profile_type, profile_name
gp = DataFrames.groupby( # 3. group by RP
filter(
row -> row.profile_name == asset_profile_row.profile_name, # 2. Filter profile_name
df_profiles["rep-periods"][asset_profile_row.profile_type], # 1. Get the profile of given type
table_tree.profiles.data["rep-periods"][asset_profile_row.profile_type], # 1. Get the profile of given type
),
:rep_period,
)
@@ -236,11 +248,11 @@ function create_graph_and_representative_periods_from_csv_folder(
end
end

for flow_profile_row in eachrow(df_flows_profiles)
for flow_profile_row in eachrow(table_tree.profiles.flows)
gp = DataFrames.groupby(
filter(
row -> row.profile_name == flow_profile_row.profile_name,
df_profiles["rep-periods"][flow_profile_row.profile_type],
table_tree.profiles.data["rep-periods"][flow_profile_row.profile_type],
),
:rep_period,
)
@@ -252,10 +264,10 @@ function create_graph_and_representative_periods_from_csv_folder(
end
end

for asset_profile_row in eachrow(df_assets_profiles["timeframe"]) # row = asset, profile_type, profile_name
for asset_profile_row in eachrow(table_tree.profiles.assets["timeframe"]) # row = asset, profile_type, profile_name
df = filter(
row -> row.profile_name == asset_profile_row.profile_name, # 2. Filter profile_name
df_profiles["timeframe"][asset_profile_row.profile_type], # 1. Get the profile of given type
table_tree.profiles.data["timeframe"][asset_profile_row.profile_type], # 1. Get the profile of given type
)
graph[asset_profile_row.asset].timeframe_profiles[asset_profile_row.profile_type] = df.value
end
47 changes: 43 additions & 4 deletions src/structures.jl
@@ -4,6 +4,42 @@ export GraphAssetData,
const TimestepsBlock = UnitRange{Int}
const PeriodsBlock = UnitRange{Int}

const PeriodType = String
const TableNodeStatic = @NamedTuple{assets::DataFrame, flows::DataFrame}
const TableNodeProfiles = @NamedTuple{
assets::Dict{PeriodType,DataFrame},
flows::DataFrame,
data::Dict{PeriodType,Dict{Symbol,DataFrame}},
}
const TableNodePartitions = @NamedTuple{assets::Dict{PeriodType,DataFrame}, flows::DataFrame}
const TableNodePeriods = @NamedTuple{rep_periods::DataFrame, mapping::DataFrame}

"""
Structure to hold the tabular data.
## Fields
- `static`: Stores the data that does not vary within a year. Its fields are:
  - `assets`: Assets data.
  - `flows`: Flows data.
- `profiles`: Stores the profile data, indexed by:
  - `assets`: Dictionary with the references to assets' profiles, indexed by period type (`"rep-periods"` or `"timeframe"`).
  - `flows`: References to flows' profiles for representative periods.
  - `data`: Actual profile data. Dictionary of dictionaries indexed by period type and then by profile name.
- `partitions`: Stores the partitions data, indexed by:
  - `assets`: Dictionary with the specification of the assets' partitions, indexed by period type.
  - `flows`: Specification of the flows' partitions for representative periods.
- `periods`: Stores the periods data, indexed by:
  - `rep_periods`: Representative periods data.
  - `mapping`: Mapping between the timeframe periods and the representative periods.
"""
struct TableTree
static::TableNodeStatic
profiles::TableNodeProfiles
partitions::TableNodePartitions
periods::TableNodePeriods
end

"""
Structure to hold the data of the timeframe.
"""
@@ -197,6 +233,7 @@ It hides the complexity behind the energy problem, making the usage more friendly
See the [basic example tutorial](@ref basic-example) to see how these can be used.
"""
mutable struct EnergyProblem
table_tree::TableTree
graph::MetaGraph{
Int,
SimpleDiGraph{Int},
@@ -221,15 +258,17 @@ mutable struct EnergyProblem
time_solve_model::Float64

"""
EnergyProblem(graph, representative_periods, timeframe)
EnergyProblem(dfs_input)
Constructs a new EnergyProblem object with the given graph, representative periods, and timeframe. The `constraints_partitions` field is computed from the `representative_periods`,
and the other fields are `nothing` or set to default values.
Constructs a new EnergyProblem object from the input dataframes.
This will call [`create_internal_structures`](@ref).
"""
function EnergyProblem(graph, representative_periods, timeframe)
function EnergyProblem(dfs_input)
graph, representative_periods, timeframe = create_internal_structures(dfs_input)
constraints_partitions = compute_constraints_partitions(graph, representative_periods)

return new(
dfs_input,
graph,
representative_periods,
constraints_partitions,
