# Load built-in and ported datasets from TGB
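This tutorial shows how to load TGB datasets, built-in TGX datasets, and your own `.csv` datasets into a `tgx.Graph`, and how to subsample the resulting graph.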


In [None]:
import tgx

### Access TGB datasets
In order to load TGB datasets you should first install the TGB package:

`pip install py-tgb`

Then write the name of the dataset in the parentheses:

`tgx.tgb_data("name")`

The available dataset names are:

`tgbl-wiki`, `tgbl-review`, `tgbl-coin`, `tgbl-comment`, `tgbl-flight`

`tgbn-trade`, `tgbn-genre`, `tgbn-reddit`
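
Because a misspelled name only fails once the loader runs, it can help to check the name first. The sketch below is a hypothetical helper (`check_tgb_name` is not part of TGX or TGB) that validates a name against the list above and reports its group, following the TGB prefixes (`tgbl-` for link-level tasks, `tgbn-` for node-level tasks).

```python
# Names copied from the list above.
TGB_LINK_DATASETS = ("tgbl-wiki", "tgbl-review", "tgbl-coin", "tgbl-comment", "tgbl-flight")
TGB_NODE_DATASETS = ("tgbn-trade", "tgbn-genre", "tgbn-reddit")


def check_tgb_name(name: str) -> str:
    """Return "link" or "node" for a known TGB dataset name, else raise ValueError.

    Hypothetical helper for illustration; not a TGX or TGB function.
    """
    if name in TGB_LINK_DATASETS:
        return "link"
    if name in TGB_NODE_DATASETS:
        return "node"
    valid = ", ".join(TGB_LINK_DATASETS + TGB_NODE_DATASETS)
    raise ValueError(f"Unknown TGB dataset {name!r}; expected one of: {valid}")


task = check_tgb_name("tgbl-wiki")
print(task)
```

A name that passes the check can then be handed to the loader shown in the next cell.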

In [2]:
data_name = "tgbl-wiki" 
dataset = tgx.tgb_data(data_name)  # load a TGB dataset by name
ctdg = tgx.Graph(dataset)

raw file found, skipping download
Dataset directory is  /mnt/f/code/TGB/tgb/datasets/tgbl_wiki
loading processed file
Number of loaded edges: 157474
Number of unique edges:18257
Available timestamps:  152757


### Access other datasets

To load a built-in TGX dataset (from [Poursafaei et al. 2022](https://openreview.net/forum?id=1GVpwr2Tfdg)), replace `dataset_name` with the name of the dataset:

`tgx.builtin.dataset_name()`

The available dataset names are:

`mooc`, `uci`, `uslegis`, `unvote`, `untrade`, `flight`, `wikipedia`, `reddit`, `lastfm`, `contact`, `canparl`, `socialevo`, `enron`
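
When the dataset name comes from a variable (for example, when looping over several datasets), the loader can be looked up by name. The sketch below assumes that every name in the list is exposed as a zero-argument callable on one module, as `tgx.builtin.uci()` in the next cell suggests. `load_builtin` is a hypothetical helper, and a stand-in object replaces the real module so the sketch runs without TGX installed.

```python
from types import SimpleNamespace

# Names copied from the list above.
BUILTIN_NAMES = (
    "mooc", "uci", "uslegis", "unvote", "untrade", "flight", "wikipedia",
    "reddit", "lastfm", "contact", "canparl", "socialevo", "enron",
)


def load_builtin(module, name):
    """Call the loader named `name` on `module` (hypothetical helper).

    Assumes `module` exposes each dataset as a zero-argument callable.
    """
    if name not in BUILTIN_NAMES:
        raise ValueError(f"Unknown built-in dataset {name!r}")
    return getattr(module, name)()


# Stand-in for the real module, used only to make this sketch runnable.
fake_builtin = SimpleNamespace(uci=lambda: "uci placeholder")
result = load_builtin(fake_builtin, "uci")
print(result)
```

With TGX installed, passing `tgx.builtin` as `module` should be equivalent to calling the loader directly, provided the assumption above holds for the chosen name.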

In [3]:
dataset = tgx.builtin.uci()
ctdg = tgx.Graph(dataset)

Number of loaded edges: 59835
Number of unique edges:20296
Available timestamps:  58911


### Custom Datasets

You can load your own custom dataset from a `.csv` file and read it into a `tgx.Graph` object.

Let's start by loading a toy dataset into pandas and displaying its rows.

In [4]:
import pandas as pd
toy_fname = 'toy_data.csv'
df = pd.read_csv(toy_fname)
df

Unnamed: 0,time,source,destination
0,0,1,2
1,0,2,1
2,0,3,1
3,1,2,2
4,1,1,2
5,1,3,1
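
To follow along without the original file, a `toy_data.csv` with the same rows can be written with the standard library. This is a sketch under one assumption: the first header cell is empty, which is inferred from the `Unnamed: 0` label that pandas assigns to a column with no name. The actual file used in the tutorial may differ.

```python
import csv

# Rows as (row index, time, source, destination), copied from the output above.
rows = [
    (0, 0, 1, 2),
    (1, 0, 2, 1),
    (2, 0, 3, 1),
    (3, 1, 2, 2),
    (4, 1, 1, 2),
    (5, 1, 3, 1),
]


def write_toy_csv(path="toy_data.csv"):
    """Write the toy edge list to `path` and return the path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header: an unnamed index column followed by the three named columns.
        writer.writerow(["", "time", "source", "destination"])
        writer.writerows(rows)
    return path


write_toy_csv()
```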


In [5]:
from tgx.io.read import read_csv
# header: whether the file has a header row at the top
# index: whether the first column holds row indices
# t_col: which column holds the timestamps
edgelist = read_csv(toy_fname,
                    header=True,
                    index=False,
                    t_col=0)
tgx.Graph(edgelist=edgelist)

Number of loaded edges: 5
Number of unique edges: 4
Available timestamps:  2


<tgx.classes.graph.Graph at 0x7fde4755aca0>

### Subsampling graphs

To subsample a graph, follow these steps:

1. Discretize the data

2. Create a graph object from the data (G)

3. Subsample the graph with `tgx.utils.graph_utils.subsampling`

4. Create a new graph from the subsampled edge list
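
Step 1 refers to mapping fine-grained timestamps onto coarser intervals so that edges fall into snapshots. The sketch below illustrates the idea in plain Python with fixed-width bins. It is not TGX's implementation: the `discretize` function, the `(t, u, v)` tuple format, and the `bin_width` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def discretize(edges, bin_width):
    """Group (t, u, v) edges into fixed-width time bins (illustrative only)."""
    snapshots = defaultdict(list)
    for t, u, v in edges:
        # Integer division maps each timestamp to the index of its bin.
        snapshots[t // bin_width].append((u, v))
    return dict(snapshots)


# Hypothetical edges with timestamps in seconds, binned into 60-second snapshots.
edges = [(0, "a", "b"), (30, "b", "c"), (65, "a", "c"), (130, "c", "a")]
snapshots = discretize(edges, bin_width=60)
for bin_index, bin_edges in sorted(snapshots.items()):
    print(bin_index, bin_edges)
```

The two edges at `t=0` and `t=30` share bin 0, while the edges at `t=65` and `t=130` land in bins 1 and 2.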

In [6]:
from tgx.utils.graph_utils import subsampling

sub_edges = subsampling(ctdg, selection_strategy="random", N=1000)  # N is the number of nodes to sample
subgraph = tgx.Graph(edgelist=sub_edges)

Generate graph subsample...
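
For intuition, random node subsampling can be sketched in plain Python: pick N nodes uniformly at random, then keep only the edges between them. This shows one plausible strategy, not necessarily what `tgx.utils.graph_utils.subsampling` does internally (for example, it may keep edges with only one sampled endpoint). The `subsample_edges` function and the `(t, u, v)` tuple format are assumptions.

```python
import random


def subsample_edges(edges, n_nodes, seed=None):
    """Keep edges whose endpoints both lie in a random sample of n_nodes nodes."""
    rng = random.Random(seed)
    nodes = sorted({x for _, u, v in edges for x in (u, v)})
    if n_nodes >= len(nodes):
        # Nothing to drop: the sample would cover every node.
        return list(edges)
    kept = set(rng.sample(nodes, n_nodes))
    return [(t, u, v) for t, u, v in edges if u in kept and v in kept]


# Hypothetical edge list with integer node ids.
edges = [(0, 1, 2), (0, 2, 3), (1, 3, 4), (1, 4, 5), (2, 5, 1)]
sub = subsample_edges(edges, n_nodes=3, seed=0)
print(sub)
```

Keeping only edges with both endpoints sampled yields an induced subgraph, which can be much sparser than the original when N is small relative to the number of nodes.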
