# About tstree space

A *tstree* object created from the SMARTER dataset occupies a lot of space. Even
when the *REF* allele is provided as the ancestral allele and mutations are dropped
from the *tstree* object, the object still takes up a lot of space. So let's work
with the *tables* and inspect how much data is stored in them:

In [None]:
from pprint import pprint
import numpy as np

import tskit

from tskitetude import get_project_dir

In [None]:
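# Build the path to the tree sequence file piece by piece, then load it
trees_file = (
    get_project_dir()
    / "results-reference/background_samples/tsinfer"
    / "SMARTER-OA-OAR3-forward-0.4.10.focal.26.trees"
)
ts = tskit.load(str(trees_file))
ts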

Inspecting table nodes:

In [None]:
print(f"Nodes are {len(ts.tables.nodes)}")
pprint(ts.tables.nodes.asdict())
pprint(ts.tables.nodes[-1])

The *edges* table seems to be the largest data structure in the *tstree* object:

In [None]:
print(f"Edges are {len(ts.tables.edges)}")
pprint(ts.tables.edges.asdict())
pprint(ts.tables.edges[-1])
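
To compare tables by size rather than by row count, one option is to sum the bytes of each column array, such as those returned by `asdict()`. The snippet below is only a sketch on a synthetic stand-in for an edge table: the helper `column_bytes` and the `toy_edges` dictionary are hypothetical, the column dtypes are assumed (64-bit floats for `left`/`right`, 32-bit integers for `parent`/`child`), and the metadata columns are left out.

```python
import numpy as np


def column_bytes(columns):
    """Return {column name: size in bytes} for every NumPy array in `columns`."""
    return {
        name: values.nbytes
        for name, values in columns.items()
        if isinstance(values, np.ndarray)
    }


# Synthetic stand-in for an edge table with 1,000 rows (not real data)
n_rows = 1_000
toy_edges = {
    "left": np.zeros(n_rows, dtype=np.float64),
    "right": np.ones(n_rows, dtype=np.float64),
    "parent": np.zeros(n_rows, dtype=np.int32),
    "child": np.zeros(n_rows, dtype=np.int32),
    "metadata_schema": "",  # non-array entries are skipped by the helper
}

sizes = column_bytes(toy_edges)
total_bytes = sum(sizes.values())
print(sizes)
print(f"{total_bytes} bytes in total, {total_bytes / n_rows:.0f} bytes per row")
```

Under these assumptions each edge costs a fixed number of bytes, so the size of the table grows linearly with the number of edges.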

Now let's focus on a single SNP: get the tree at the position of the first SNP and
try to extract the related data from the tables:

In [None]:
POS = 209049
tree = ts.at(POS)
tree

Now get the genomic interval covered by this tree, then select the edges that lie between its left and right bounds:

In [None]:
interval = tree.interval
left_bound = interval.left
right_bound = interval.right

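# Take the edge table once and reuse it
edges = ts.tables.edges

# Keep only the edges whose span lies entirely within the tree interval
filtered_edges = edges[
    np.logical_and(edges.left >= left_bound, edges.right <= right_bound)]
filtered_edges[:10]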

In [None]:
len(filtered_edges)
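
Note that the mask above keeps only the edges whose span lies entirely inside the tree interval. Edges that start before the interval or end after it are excluded, even though they also cover it. The sketch below contrasts the two conditions on synthetic edge spans (the arrays and bounds are made up for illustration). Which one is appropriate depends on whether only the edges specific to this tree are wanted, or every edge that covers it.

```python
import numpy as np

# Synthetic edge spans (not real data): each edge covers [left, right)
left = np.array([0.0, 0.0, 100.0, 150.0, 300.0])
right = np.array([500.0, 120.0, 200.0, 180.0, 500.0])

# Hypothetical tree interval
left_bound, right_bound = 100.0, 200.0

# Edges lying entirely inside the interval (the condition used above)
contained = np.logical_and(left >= left_bound, right <= right_bound)

# Edges sharing at least one position with the interval
overlapping = np.logical_and(left < right_bound, right > left_bound)

print("contained:", np.flatnonzero(contained))
print("overlapping:", np.flatnonzero(overlapping))
```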

Can I select the nodes in the same way? The node table has no left and right
positions like the edge table does. However, the edge table tells me which nodes
appear as *parent* or *child* in the selected edges:

In [None]:
parents = set(filtered_edges.parent)
children = set(filtered_edges.child)

# Every node referenced by at least one of the selected edges
node_ids = parents.union(children)
print(f"Got {len(node_ids)} distinct nodes")
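
The same set of node IDs can also be built with NumPy, which returns a sorted array that can be used directly as an index into per-node columns. This is only a sketch on synthetic data: the `parent`, `child` and `node_time` arrays are made up, and `node_time` stands in for a node table column indexed by node ID.

```python
import numpy as np

# Synthetic parent/child columns of a few selected edges (not real data)
parent = np.array([5, 5, 6, 6, 7], dtype=np.int32)
child = np.array([0, 1, 2, 5, 6], dtype=np.int32)

# Sorted, unique IDs of every node referenced by these edges
node_ids = np.union1d(parent, child)

# Toy per-node column: one value per node ID, here for 8 nodes
node_time = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0])

# Integer-array indexing picks the rows of the selected nodes
selected_times = node_time[node_ids]

print(node_ids)
print(selected_times)
```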


In [None]:
tree.draw_svg(
    size=(800, 400),
    time_scale="log_time",
)