# Attributed C-Sets as Data Structure

This notebook is based on the paper
[Categorical Data Structures for Technical Computing](https://arxiv.org/pdf/2106.04703.pdf).

An Attributed C-Set, ACSet for short, is a data structure that aims to bridge the
divide between combinatorial and atomic data.

Data can be stored in many different formats, such as SQL tables, NoSQL databases, data frames, and so on.
These different formats make it difficult to analyze the data directly, since simple tasks, such as computing the mean
after grouping the data by a specific attribute, require a different set of commands for each data format.
To avoid dealing with each possible variation, most data analyses start by converting the dataset into
a data frame.

The choice of the data frame as the central data structure is understandable, since most analyses
deal with data that can be thought of as individual observations (rows) described by many features (columns).
Yet there are many common scenarios where this data structure is not the most natural one. Perhaps the
clearest example is graphs. Here, the main aspect of the data is not its "atomic" nature, but its relational
information ("which nodes are connected").

Relational databases (SQL) can deal with this divide, but they are usually too rigid, since they are typically
part of a monolithic system with its own language, which is not always straightforward to integrate
with general-purpose programming languages such as Julia.

ACSets are the solution proposed by Evan Patterson, Owen Lynch, and James Fairbanks.
They are an efficient in-memory implementation of categorical databases, which encompasses data structures
such as data frames, graphs, and more, thereby addressing the combinatorial vs. atomic data representation problem.

In the implementation of ACSets, combinatorial data is always represented by integers, while the types of atomic
data are given by type parameters, which can be arbitrary Julia types.
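
To make the split concrete, here is a minimal plain-Julia sketch of the idea, without Catlab. The type `GraphSketch` and the function `out_neighbors` are hypothetical names used only for illustration; they are not part of Catlab's API. The combinatorial part is a pair of `Int` columns pointing into the vertex set `1:nv`, and the atomic part is a column whose element type is the type parameter `T`.

```julia
# Hypothetical illustration of the ACSet storage idea; this is NOT Catlab's API.
struct GraphSketch{T}
    nv::Int            # vertices are simply the integers 1:nv
    src::Vector{Int}   # combinatorial data: source vertex of each edge
    tgt::Vector{Int}   # combinatorial data: target vertex of each edge
    weight::Vector{T}  # atomic data: one value of the Julia type T per edge
end

# Targets of all edges leaving vertex v, found by scanning the src column.
out_neighbors(g::GraphSketch, v::Int) = [g.tgt[e] for e in eachindex(g.src) if g.src[e] == v]

gs = GraphSketch{Float64}(3, [1, 1, 2], [2, 3, 3], [0.5, 1.5, 2.0])
out_neighbors(gs, 1)   # [2, 3]
```

Choosing another type for `T`, e.g. `GraphSketch{String}`, changes only the atomic column; the integer columns that encode the graph structure stay the same.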

In [6]:
using Pkg
Pkg.activate(".")
using Catlab,Catlab.CategoricalAlgebra

[32m[1m  Activating[22m[39m project at `~/MEGA/EMAp/Mathematical-Short-Notes/Fields/Category-Theory/notebooks`


## Example 1 - RoadMap

This first example comes directly from the paper. The idea here is to create an ACSet that stores
information about roads, where vertices (`V`) are intersections, edges (`E`) are the roads between intersections,
and `length` is the actual distance between the two intersections that a road connects.

In [59]:
@present TheoryRoadMap(FreeSchema) begin
    (V,E)::Ob
    (src,tgt)::Hom(E,V)
    T::AttrType
    (x,y)::Attr(V,T)
    length::Attr(E,T)
end

@acset_type RoadMap(TheoryRoadMap, index=[:src,:tgt])

RoadMap

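First we define the data schema, i.e. what our tables are and how they relate to each other. This is similar to
what we have in relational databases, but in categorical form. In our example, we want two tables:
one is a list of vertices with their coordinates, and the other is a list of edges, which stores
the source and target vertices of each edge together with its length.

Hence, `(V,E)::Ob` states that we have two objects (tables), `V` and `E`.
The line `(src,tgt)::Hom(E,V)` declares two morphisms from `E` to `V`, which play the role of foreign keys:
each edge points to its source and target vertex.
The line `T::AttrType` indicates that our data has an attribute type `T`, a placeholder for the Julia type
of the atomic data, which is only fixed when the ACSet is instantiated (e.g. `RoadMap{Float64}`).
Next, `(x,y)::Attr(V,T)` declares the coordinate columns `x` and `y` of the vertex table, with values in `T`,
and `length::Attr(E,T)` does the same for the length of each edge.
Finally, `index=[:src,:tgt]` asks `@acset_type` to also store the inverse of `src` and `tgt`, so that
looking up the edges incident to a vertex is fast.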
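
As a sketch of what the schema means formally (following the categorical view of the paper linked above), an instance $X$ of `TheoryRoadMap` with `T = Float64` amounts to two finite sets of parts, two functions between them, and three attribute functions into the Julia type:

$$
\begin{aligned}
X(V) &= \{1, \dots, n_V\}, & X(E) &= \{1, \dots, n_E\}, \\
X(\mathrm{src}),\, X(\mathrm{tgt}) &: X(E) \to X(V), & X(\mathrm{length}) &: X(E) \to \texttt{Float64}, \\
X(x),\, X(y) &: X(V) \to \texttt{Float64}.
\end{aligned}
$$

Since the parts of each object are just the integers $1, \dots, n$, the functions `src` and `tgt` end up stored as `Int` vectors, while `x`, `y`, and `length` are stored as vectors of `Float64`.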

In [60]:
function make_path(coords::Vector{Tuple{Float64, Float64}})
    # Create an empty roadmap
    path = RoadMap{Float64}()
    # This is a convenient function that calculates the Euclidean distance between two
    # vertices in the road map. Notice that we can reference attributes using indexing
    # and that the system knows that these attributes belong to vertices, not edges.
    dist(i,j) = sqrt((path[i,:x] - path[j,:x])^2 + (path[i,:y] - path[j,:y])^2)
    x, y = coords[1]
    # add_part! mutates path to add a part, returning the index of the added part.
    # The named arguments to this function assign the attributes of that part.
    src = add_part!(path, :V, x=x, y=y)
    for i in 2:length(coords)
        x, y = coords[i]
        tgt = add_part!(path, :V, x=x, y=y)
        add_part!(path, :E, src=src, tgt=tgt, length=dist(src,tgt))
        src = tgt
    end
    path
end
    
ac = make_path([(x[1],x[2]) for x in eachrow(rand(10,2))])

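| V | x | y |
|---|---|---|
| 1 | 0.133569 | 0.470923 |
| 2 | 0.987304 | 0.413828 |
| 3 | 0.733985 | 0.325527 |
| 4 | 0.367374 | 0.92214 |
| 5 | 0.465141 | 0.0182601 |
| 6 | 0.506971 | 0.523084 |
| 7 | 0.11325 | 0.593782 |
| 8 | 0.882649 | 0.681493 |
| 9 | 0.194184 | 0.222629 |
| 10 | 0.466333 | 0.749666 |

| E | src | tgt | length |
|---|---|---|---|
| 1 | 1 | 2 | 0.855642 |
| 2 | 2 | 3 | 0.268267 |
| 3 | 3 | 4 | 0.700251 |
| 4 | 4 | 5 | 0.909153 |
| 5 | 5 | 6 | 0.506554 |
| 6 | 6 | 7 | 0.400019 |
| 7 | 7 | 8 | 0.774382 |
| 8 | 8 | 9 | 0.827369 |
| 9 | 9 | 10 | 0.593155 |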
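
`make_path` adds one part at a time with `add_part!`. When all the data is known up front, `add_parts!` adds several parts in one call, taking the number of parts and one vector per morphism or attribute. Below is a sketch, assuming the same `FreeSchema`/`@acset_type` Catlab API used in this notebook (names may differ in other releases). The coordinates are made up so the result is reproducible, and the schema is repeated so the block runs on its own.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema as above, repeated so that this block runs on its own.
@present TheoryRoadMap(FreeSchema) begin
    (V,E)::Ob
    (src,tgt)::Hom(E,V)
    T::AttrType
    (x,y)::Attr(V,T)
    length::Attr(E,T)
end
@acset_type RoadMap(TheoryRoadMap, index=[:src,:tgt])

# Made-up coordinates: a path (0,0) -> (3,0) -> (3,4).
xs = [0.0, 3.0, 3.0]
ys = [0.0, 0.0, 4.0]
srcs = [1, 2]
tgts = [2, 3]
# Euclidean length of each edge, computed like `dist` in `make_path`.
lens = [hypot(xs[t] - xs[s], ys[t] - ys[s]) for (s, t) in zip(srcs, tgts)]

roads = RoadMap{Float64}()
add_parts!(roads, :V, 3, x=xs, y=ys)                      # three intersections in one call
add_parts!(roads, :E, 2, src=srcs, tgt=tgts, length=lens)  # two roads in one call

total = sum(subpart(roads, :length))   # 3.0 + 4.0 = 7.0
```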


In [90]:
subpart(ac, 2, [:src, :x]) # Get source vertex from edge 2 and take :x attribute.

0.9873035294420925
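
The composite path `[:src, :x]` follows the morphism `src` and then reads the attribute `x`. A sketch of the same lookup done in two explicit steps, assuming the Catlab API used in this notebook; the small road map below is made up, and the schema is repeated so the block runs on its own:

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema as above, repeated so that this block runs on its own.
@present TheoryRoadMap(FreeSchema) begin
    (V,E)::Ob
    (src,tgt)::Hom(E,V)
    T::AttrType
    (x,y)::Attr(V,T)
    length::Attr(E,T)
end
@acset_type RoadMap(TheoryRoadMap, index=[:src,:tgt])

# Small made-up road map: (0,0) -> (3,0) -> (3,4).
roads = RoadMap{Float64}()
add_parts!(roads, :V, 3, x=[0.0, 3.0, 3.0], y=[0.0, 0.0, 4.0])
add_parts!(roads, :E, 2, src=[1, 2], tgt=[2, 3], length=[3.0, 4.0])

v = subpart(roads, 2, :src)                  # step 1: edge 2 starts at vertex 2
x_two_steps = subpart(roads, v, :x)          # step 2: x-coordinate of vertex 2, i.e. 3.0
x_composite = subpart(roads, 2, [:src, :x])  # the same lookup as a single composite path
```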

In [87]:
ACSetInterface.tables(ac)

(V = Catlab.CSetDataStructures.StructACSetTable{RoadMap{Float64}, :V} with 10 rows, 2 columns, and an unknown schema.,
 E = Catlab.CSetDataStructures.StructACSetTable{RoadMap{Float64}, :E} with 9 rows, 3 columns, and an unknown schema.,)

In [84]:
ac.homs.tgt  # reads an internal storage field; `subpart(ac, :tgt)` is the public way to get this column

9-element Vector{Int64}:
  2
  3
  4
  5
  6
  7
  8
  9
 10
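
Reading `ac.homs.tgt` goes through the internal storage of the ACSet. A sketch of the public accessors, assuming the Catlab API used in this notebook: `subpart(roads, :tgt)` returns the whole `tgt` column, and since `src` and `tgt` were listed in `index=[...]`, `incident(roads, v, :tgt)` can answer "which edges end at vertex `v`?" from the stored inverse map instead of scanning the column. The road map is made up and the schema is repeated so the block runs on its own.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema as above, repeated so that this block runs on its own.
@present TheoryRoadMap(FreeSchema) begin
    (V,E)::Ob
    (src,tgt)::Hom(E,V)
    T::AttrType
    (x,y)::Attr(V,T)
    length::Attr(E,T)
end
@acset_type RoadMap(TheoryRoadMap, index=[:src,:tgt])

# Small made-up road map: (0,0) -> (3,0) -> (3,4).
roads = RoadMap{Float64}()
add_parts!(roads, :V, 3, x=[0.0, 3.0, 3.0], y=[0.0, 0.0, 4.0])
add_parts!(roads, :E, 2, src=[1, 2], tgt=[2, 3], length=[3.0, 4.0])

tgt_column = subpart(roads, :tgt)      # whole column: [2, 3]
into_3 = incident(roads, 3, :tgt)      # edges ending at vertex 3: [2]
out_of_1 = incident(roads, 1, :src)    # edges starting at vertex 1: [1]
```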

## More complex examples

In [109]:
@present PASchema(FreeSchema) begin
    (Authors,Papers, Authorship)::Ob
    (p)::Hom(Authorship,Papers)
    (a)::Hom(Authorship,Authors)
    (T,N)::AttrType
    name::Attr(Authors,N)
    title::Attr(Papers,N)
    year::Attr(Papers,T)
end

@acset_type APA(PASchema, index=[:p,:a])

APA

In [110]:
ac = @acset APA{Real,String} begin
    Authors = 2
    Papers = 2
    Authorship = 3
    p = [1,2,2]
    a = [1,1,2]
    name = ["A","B"]
    title = ["Paper1","Paper2"]
    year  = [2000,2001]
end

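| Authors | name |
|---|---|
| 1 | A |
| 2 | B |

| Papers | title | year |
|---|---|---|
| 1 | Paper1 | 2000 |
| 2 | Paper2 | 2001 |

| Authorship | p | a |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |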
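
In this schema, `Authorship` works as a join table: each of its rows links one paper (through `p`) to one author (through `a`), which is how a many-to-many relation is encoded using only functions. A sketch of a query over it, assuming the Catlab API used in this notebook; `titles_by` is a hypothetical helper, not a Catlab function, and the schema and data are repeated so the block runs on its own.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema and data as above, repeated so that this block runs on its own.
@present PASchema(FreeSchema) begin
    (Authors, Papers, Authorship)::Ob
    p::Hom(Authorship, Papers)
    a::Hom(Authorship, Authors)
    (T, N)::AttrType
    name::Attr(Authors, N)
    title::Attr(Papers, N)
    year::Attr(Papers, T)
end
@acset_type APA(PASchema, index=[:p, :a])

db = @acset APA{Real,String} begin
    Authors = 2
    Papers = 2
    Authorship = 3
    p = [1, 2, 2]
    a = [1, 1, 2]
    name = ["A", "B"]
    title = ["Paper1", "Paper2"]
    year = [2000, 2001]
end

# Hypothetical helper: titles of all papers written by the author with the given name.
function titles_by(db, author_name)
    author = findfirst(==(author_name), subpart(db, :name))  # row of the author in Authors
    links = incident(db, author, :a)    # Authorship rows pointing at that author (uses the :a index)
    papers = subpart(db, links, :p)     # follow p from each of those rows to Papers
    return collect(subpart(db, papers, :title))
end

titles_by(db, "A")   # ["Paper1", "Paper2"]
titles_by(db, "B")   # ["Paper2"]
```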


In [111]:
# Write down the schema for a weighted graph
@present TheoryWeightedGraph(FreeSchema) begin
  V::Ob
  E::Ob
  src::Hom(E,V)
  tgt::Hom(E,V)
  T::AttrType
  weight::Attr(E,T)
end

# Construct the type used to store acsets on the previous schema
# We *index* src and tgt, which means that we store not only
# the forwards map, but also the backwards map.
@acset_type WeightedGraph(TheoryWeightedGraph, index=[:src,:tgt])

# Construct a weighted graph, with floats as edge weights
g = @acset WeightedGraph{Float64} begin
  V = 4
  E = 5
  src = [1,1,1,2,3]
  tgt = [2,3,4,4,4]
  weight = [7.2, 9.3, 9.4, 0.1, 42.0]
end

| E | src | tgt | weight |
|---|---|---|---|
| 1 | 1 | 2 | 7.2 |
| 2 | 1 | 3 | 9.3 |
| 3 | 1 | 4 | 9.4 |
| 4 | 2 | 4 | 0.1 |
| 5 | 3 | 4 | 42.0 |
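
Because `src` and `tgt` are indexed, aggregations over the edges around a vertex can use the stored index instead of scanning the edge table. A sketch, assuming the Catlab API used in this notebook, that computes the total weight entering each vertex; `in_weight` is a hypothetical helper, and the schema and graph are repeated so the block runs on its own.

```julia
using Catlab, Catlab.CategoricalAlgebra

# Same schema and graph as above, repeated so that this block runs on its own.
@present TheoryWeightedGraph(FreeSchema) begin
  V::Ob
  E::Ob
  src::Hom(E,V)
  tgt::Hom(E,V)
  T::AttrType
  weight::Attr(E,T)
end
@acset_type WeightedGraph(TheoryWeightedGraph, index=[:src,:tgt])

g = @acset WeightedGraph{Float64} begin
  V = 4
  E = 5
  src = [1,1,1,2,3]
  tgt = [2,3,4,4,4]
  weight = [7.2, 9.3, 9.4, 0.1, 42.0]
end

# Hypothetical helper: sum of the weights of the edges whose target is v
# (0.0 when no edge ends at v).
in_weight(g, v) = sum(subpart(g, incident(g, v, :tgt), :weight); init=0.0)

[in_weight(g, v) for v in parts(g, :V)]   # [0.0, 7.2, 9.3, 51.5]
```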


In [13]:
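e = @acset WeightedGraph{Float64} begin V=2; E=1; src=[1]; tgt=[2]; weight=[7.2] end  # one edge 1 → 2, weight 7.2
ϕ = ACSetTransformation(e, g, E=[1], V=[1,2])  # e's edge ↦ g's edge 1; vertices 1 ↦ 1, 2 ↦ 2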