# Attributed C-Sets as a Data Structure

This notebook is based on the paper
[Categorical Data Structures for Technical Computing](https://arxiv.org/pdf/2106.04703.pdf).

An Attributed C-Set, ACSet for short, is a data structure that aims to bridge the
divide between combinatorial and atomic data.

Data can be stored in many different formats, such as SQL tables, NoSQL databases, data frames, and so on.
These different formats make it difficult to analyze the data directly: even a simple task, such as computing a mean grouped by
a specific attribute, requires a different set of commands for each data format.
To avoid dealing with every possible variation, most data analysis starts by converting the dataset into
a data frame.

The choice of the data frame as the central data structure is understandable, since most analyses
deal with data that can be thought of as single observations (rows) made up of many features (columns).
Yet there are many common scenarios where this data structure is not the most natural one. Perhaps the
clearest example is a graph. Here, the main aspect of the data is not its "atomic" nature but its relational
information ("which nodes are connected").

Relational databases (SQL) can bridge this divide, but they are usually too rigid: they tend to be
part of a monolithic system with its own language, which is not always straightforward to integrate
with general-purpose programming languages such as Julia.

ACSets are the solution proposed by Evan Patterson, Owen Lynch, and James Fairbanks.
They are an efficient in-memory implementation of categorical databases, which encompass data structures
such as data frames, graphs, and more, and thereby address the combinatorial vs. atomic data representation problem.

In the implementation of ACSets, combinatorial data is always represented by integers, while the types of atomic data
are type parameters that can be instantiated with Julia types (for example, `Float64` in `RoadMap{Float64}`).
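
The split described above can be sketched as follows. This is a minimal illustration, assuming the same Catlab API used in the cells of this notebook (`@present`, `@acset_type`, `add_part!`, `add_parts!`, `subpart`). The names `SchTaggedGraph`, `TaggedGraph`, `Tag`, and `tag` are hypothetical and chosen only for this example. The same schema is instantiated with two different attribute types, while `src` and `tgt` hold integer part IDs in both cases.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Hypothetical schema (illustration only): a graph whose vertices carry a tag.
@present SchTaggedGraph(FreeSchema) begin
  (V,E)::Ob
  (src,tgt)::Hom(E,V)
  Tag::AttrType
  tag::Attr(V,Tag)
end

@acset_type TaggedGraph(SchTaggedGraph, index=[:src,:tgt])

# Instantiate the attribute type parameter with String...
g_str = TaggedGraph{String}()
add_parts!(g_str, :V, 2, tag=["a", "b"])
add_part!(g_str, :E, src=1, tgt=2)

# ...and, for the same schema, with Symbol.
g_sym = TaggedGraph{Symbol}()
add_parts!(g_sym, :V, 2, tag=[:a, :b])

# Combinatorial data: the source of edge 1 is an integer vertex ID.
edge_source = subpart(g_str, 1, :src)

# Atomic data: the value has whatever Julia type was supplied above.
tag_str = subpart(g_str, 1, :tag)
tag_sym = subpart(g_sym, 2, :tag)
```

Only the type parameter changes between `g_str` and `g_sym`; the schema and the integer representation of `src` and `tgt` stay the same.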

In [6]:
using Pkg
Pkg.activate(".")
using Catlab,Catlab.CategoricalAlgebra

[32m[1m  Activating[22m[39m project at `~/MEGA/EMAp/Mathematical-Short-Notes/Fields/Category-Theory/notebooks`


In [56]:
@present TheoryRoadMap(FreeSchema) begin
    (V,E)::Ob
    (src,tgt)::Hom(E,V)
    T::AttrType
    (x,y)::Attr(V,T)
    length::Attr(E,T)
end

@acset_type RoadMap(TheoryRoadMap, index=[:src,:tgt])

function make_path(coords::Vector{Tuple{Float64, Float64}})
    # Create an empty roadmap
    path = RoadMap{Float64}()
    # This is a convenient function that calculates the Euclidean distance between two
    # vertices in the road map. Notice that we can reference attributes using indexing
    # and that the system knows that these attributes belong to vertices, not edges.
    dist(i,j) = sqrt((path[i,:x] - path[j,:x])^2 + (path[i,:y] - path[j,:y])^2)
    x, y = coords[1]
    # add_part! mutates path to add a part, returning the index of the added part.
    # The named arguments to this function assign the attributes of that part.
    src = add_part!(path, :V, x=x, y=y)
    for i in 2:length(coords)
        x, y = coords[i]
        tgt = add_part!(path, :V, x=x, y=y)
        add_part!(path, :E, src=src, tgt=tgt, length=dist(src,tgt))
        src = tgt
    end
    path
end
    
make_path([(x[1],x[2]) for x in eachrow(rand(10,2))])

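| V | x | y |
|---|---|---|
| 1 | 0.903047 | 0.741224 |
| 2 | 0.384085 | 0.459403 |
| 3 | 0.698849 | 0.511971 |
| 4 | 0.596281 | 0.360679 |
| 5 | 0.510332 | 0.118437 |
| 6 | 0.291815 | 0.571631 |
| 7 | 0.922333 | 0.752791 |
| 8 | 0.530991 | 0.763886 |
| 9 | 0.672866 | 0.233335 |
| 10 | 0.568758 | 0.8135 |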

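| E | src | tgt | length |
|---|---|---|---|
| 1 | 1 | 2 | 0.590547 |
| 2 | 2 | 3 | 0.319123 |
| 3 | 3 | 4 | 0.182783 |
| 4 | 4 | 5 | 0.257037 |
| 5 | 5 | 6 | 0.503124 |
| 6 | 6 | 7 | 0.656027 |
| 7 | 7 | 8 | 0.391499 |
| 8 | 8 | 9 | 0.549193 |
| 9 | 9 | 10 | 0.589432 |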
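
Once a road map has been built, its tables can be queried with the same indexing syntax that `make_path` uses. The following is a minimal sketch, not part of the original notebook. It restates the schema under the hypothetical names `SchDemoRoadMap` and `DemoRoadMap` so that the block runs on its own, and it uses fixed coordinates (two legs of a 3-4-5 right triangle) instead of random ones so the values are easy to check by hand. The variable names are likewise illustrative.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema as the road map above, restated under hypothetical names
# so this block is self-contained.
@present SchDemoRoadMap(FreeSchema) begin
  (V,E)::Ob
  (src,tgt)::Hom(E,V)
  T::AttrType
  (x,y)::Attr(V,T)
  length::Attr(E,T)
end

@acset_type DemoRoadMap(SchDemoRoadMap, index=[:src,:tgt])

# Three vertices: (0,0) -> (3,0) -> (3,4).
roads = DemoRoadMap{Float64}()
add_parts!(roads, :V, 3, x=[0.0, 3.0, 3.0], y=[0.0, 0.0, 4.0])

# Add the two edges, computing each length from the vertex coordinates.
for (s, t) in [(1, 2), (2, 3)]
    len = hypot(roads[s, :x] - roads[t, :x], roads[s, :y] - roads[t, :y])
    add_part!(roads, :E, src=s, tgt=t, length=len)
end

n_vertices   = nparts(roads, :V)              # number of rows in the V table
edge_lengths = subpart(roads, :length)        # whole attribute column of E
total_length = sum(edge_lengths)              # aggregate over that column
last_vertex  = roads[nparts(roads, :E), :tgt] # target vertex of the final edge
```

Here `subpart(roads, :length)` plays the role of selecting a data frame column, while `roads[e, :tgt]` follows the combinatorial structure from an edge to a vertex.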


In [10]:
# Write down the schema for a weighted graph
@present TheoryWeightedGraph(FreeSchema) begin
  V::Ob
  E::Ob
  src::Hom(E,V)
  tgt::Hom(E,V)
  T::AttrType
  weight::Attr(E,T)
end

# Construct the type used to store acsets on the previous schema
# We *index* src and tgt, which means that we store not only
# the forwards map, but also the backwards map.
@acset_type WeightedGraph(TheoryWeightedGraph, index=[:src,:tgt])

# Construct a weighted graph, with floats as edge weights
g = @acset WeightedGraph{Float64} begin
  V = 4
  E = 5
  src = [1,1,1,2,3]
  tgt = [2,3,4,4,4]
  weight = [7.2, 9.3, 9.4, 0.1, 42.0]
end

| E | src | tgt | weight |
|---|---|---|---|
| 1 | 1 | 2 | 7.2 |
| 2 | 1 | 3 | 9.3 |
| 3 | 1 | 4 | 9.4 |
| 4 | 2 | 4 | 0.1 |
| 5 | 3 | 4 | 42.0 |
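
The comment about indexing `src` and `tgt` can be made concrete with `incident`, which looks up the backwards map. The sketch below is an illustration under stated assumptions: it rebuilds the same weighted graph under the hypothetical names `SchDemoWeighted` and `DemoWeighted` so that it runs standalone, and the variable names are chosen only for this example.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema as the weighted graph above, restated under hypothetical names.
@present SchDemoWeighted(FreeSchema) begin
  (V,E)::Ob
  (src,tgt)::Hom(E,V)
  T::AttrType
  weight::Attr(E,T)
end

@acset_type DemoWeighted(SchDemoWeighted, index=[:src,:tgt])

h = @acset DemoWeighted{Float64} begin
  V = 4
  E = 5
  src = [1,1,1,2,3]
  tgt = [2,3,4,4,4]
  weight = [7.2, 9.3, 9.4, 0.1, 42.0]
end

# Forwards map: the target vertex of edge 4.
tgt_of_4 = subpart(h, 4, :tgt)

# Backwards map: all edges whose target is vertex 4.
in_edges = collect(incident(h, 4, :tgt))

# Combine both directions: total weight of the edges entering vertex 4.
in_weight = sum(subpart(h, in_edges, :weight))

# Edges leaving vertex 1.
out_edges = collect(incident(h, 1, :src))
```

Without the index, a lookup such as `incident(h, 4, :tgt)` would be expected to scan the whole `tgt` column; storing the backwards map trades some memory for faster lookups of this kind.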


RoadMap