<a href="https://colab.research.google.com/github/amazzoli/ComponentSystemsData/blob/main/tutorials/1_Load_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 1: Import and inspect a system

Here we load a component system in a Colab notebook and look at its three main features: the **object** table, the **component** table and the **count matrix**.

### Setting up the colab notebook

In [1]:
%%bash
# Clone the repository into the Colab folder structure
git clone https://github.com/amazzoli/ComponentSystemsData.git

Cloning into 'ComponentSystemsData'...


In [2]:
import numpy as np
import pandas as pd
import sys

# Importing a module that helps in loading and performing basic operations with
# the component system
sys.path.append('/content/ComponentSystemsData/py_utils/')
import comp_sys as cs

### Metadata table

We first import the **metadata table** (https://github.com/amazzoli/ComponentSystemsData/blob/main/metadata.tsv) listing the available systems.

In [3]:
metadata = cs.read_metadata()
metadata

label,category,n_objects,n_components,total_size,description,link
books_gutenberg,Linguistics,3035,447232,211892492,Books from the Gutenberg Project,https://www.gutenberg.org/
legos,Buildings,33530,91248,4218346,Lego sets and their composition in colored bricks,https://rebrickable.com/
proteomes_bacteria,Proteomes,8346,14693,38554091,Bacterial genomes and their composition in pro...,https://www.uniprot.org/


### Loading a system

Using `load_system` from the `comp_sys` module, you can create a variable that contains all the features of the desired system.
Choose the system to load by passing one of the labels listed in the **metadata table**.




In [4]:
system = cs.load_system('legos')
print(system)

"legos" component-system with 33530 objects and 91051 components.


### The object table

The **object table** lists all the objects (realizations) in the dataset, together with additional information.
In the case of legos, each object is a set and we have, for example, its year of release and its themes.

Some fields are common across all datasets:

- **sparse_id**: column index of the (sparse) count matrix
- **size**: total number of components in the object
- **vocabulary**: number of different components in the object



In [5]:
system.objects

sparse_id,set_id,name,year,n_themes,theme1,theme2,theme3,size,vocabulary
12800,7349-1,Skater Boy,2004.0,3,Duplo,Town,Legoville,2,2
17890,Sp12-1,Wheel Bearings for Locomotives,1977.0,2,Service Packs,Train,,2,1
20544,fig-002544,BB-8,,1,Minifigure,,,2,2
26667,fig-008787,"Baby - Bright Light Blue Body, Bib",,1,Minifigure,,,2,2
12842,7450-1,Stretchy,2003.0,2,Duplo,Little Robots,,2,2
...,...,...,...,...,...,...,...,...,...
13058,75192-1,Millennium Falcon,2017.0,2,Star Wars,Ultimate Collector Series,,7691,704
328,10276-1,Colosseum,2020.0,1,Icons,,,9125,220
344,10294-1,Titanic,2021.0,1,Icons,,,9185,692
360,10307-1,Eiffel Tower,2022.0,1,Icons,,,10063,277


### The component table

The **component table** lists all the components in the dataset with additional information.
In the case of lego bricks we have, for example, the color and the category.

Some fields are common across all datasets:

- **sparse_id**: row index of the (sparse) count matrix
- **abundance**: total count of the component across the whole dataset
- **occurrence**: number of objects in which the component appears

In [6]:
system.components

sparse_id,name,part_id,color_id,color,category,material,abundance,occurrence
0,Sticker Sheet for Set 663-1,003381,9999,[No Color/Any Color],Stickers,Plastic,1,1
69418,Slope 30° 1 x 2 x 2/3 with White '04' print,85984pr0044,4,Red,Bricks Sloped,Plastic,1,1
69421,"Slope 30° 1 x 2 x 2/3 with Black/Red Stripe, A...",85984pr0047,72,Dark Bluish Gray,Bricks Sloped,Plastic,1,1
69424,Slope 30° 1 x 2 x 2/3 with Black/Gold Lamborgh...,85984pr0050,15,White,Bricks Sloped,Plastic,1,1
69430,Slope 30° 1 x 2 x 2/3 with White 'TAXI' print,85984pr9996,0,Black,Bricks Sloped,Plastic,1,1
...,...,...,...,...,...,...,...,...
21208,Plate 1 x 2,3023,0,Black,Plates,Plastic,16633,2643
62545,Technic Pin Long with Friction Ridges Lengthwi...,6558,1,Blue,Technic Pins,Plastic,16701,923
44849,Technic Axle Pin with Friction Ridges Lengthwise,43093,1,Blue,Technic Pins,Plastic,20739,2222
58559,Technic Pin with Friction Ridges Lengthwise wi...,61332,0,Black,Technic Pins,Plastic,22535,814
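
The common fields of the two tables can be read as row and column summaries of the count matrix. The sketch below illustrates this on a toy matrix, assuming the orientation stated above (rows are components, columns are objects). The toy data and variable names are made up for illustration, and this is not necessarily how `comp_sys` computes these fields.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Toy count matrix: 3 components (rows) x 4 objects (columns).
counts = lil_matrix((3, 4), dtype=np.int64)
counts[0, 0] = 2
counts[0, 1] = 1
counts[1, 1] = 3
counts[2, 1] = 1
counts[2, 3] = 5

csr = counts.tocsr()  # CSR is convenient for sums and non-zero counts

# Per-object fields: one value per column
size = np.asarray(csr.sum(axis=0)).ravel()       # total number of components in each object
vocabulary = csr.getnnz(axis=0)                  # number of different components in each object

# Per-component fields: one value per row
abundance = np.asarray(csr.sum(axis=1)).ravel()  # total count of each component in the dataset
occurrence = csr.getnnz(axis=1)                  # number of objects containing each component

print(size, vocabulary, abundance, occurrence)
```

Note that the third toy object is empty, so both its size and its vocabulary are zero.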


### The count matrix

The matrix is stored in a sparse representation to save memory, specifically as a list-of-lists (LIL) matrix (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html).

In [7]:
system.sparse_mat

<List of Lists sparse matrix of dtype 'int64'
	with 1240234 stored elements and shape (91051, 33530)>

The matrix can be converted to a NumPy array. Be careful, however: the dense array stores every zero explicitly and therefore uses much more memory.

In [8]:
system.sparse_mat.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
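
To get a feeling for the cost of the dense conversion, the nominal size of the array can be estimated from the shape, dtype and number of stored elements reported above. This is a back-of-the-envelope sketch, and the variable names are chosen here for illustration.

```python
import numpy as np

# Shape, dtype and stored-element count as reported for the legos matrix above.
n_components, n_objects = 91051, 33530
stored_elements = 1240234
itemsize = np.dtype('int64').itemsize  # bytes per matrix entry

# A dense array allocates one entry per (component, object) pair.
dense_bytes = n_components * n_objects * itemsize
# Fraction of entries that are actually non-zero.
density = stored_elements / (n_components * n_objects)

print(f"Dense array: {dense_bytes / 1e9:.1f} GB")
print(f"Non-zero fraction: {density:.4%}")
```

The dense array nominally needs about 24 GB, while fewer than one entry in a thousand is non-zero, which is why the sparse representation is the default.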

The sparse LIL matrix can be indexed much like a normal array: the first index is the component (row) and the second is the object (column).
Below we choose an object and a component and check how many instances of that component are in the object.

In [9]:
comp_id, obj_id = 21240, 13058
comp_name = system.components.loc[comp_id, 'name']
obj_name = system.objects.loc[obj_id, 'name']
n = system.sparse_mat[comp_id, obj_id]
print(f'Number of "{comp_name}" in "{obj_name}": {n}')

Number of "Plate 1 x 2" in "Millennium Falcon": 243
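
Besides single entries, whole rows and columns can be sliced in the same way. The following minimal sketch uses a toy `lil_matrix` with the same orientation as the count matrix. The matrix `toy` and its values are invented for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Toy matrix with the same orientation: rows are components, columns are objects.
toy = lil_matrix((3, 4), dtype=np.int64)
toy[0, 1] = 2   # component 0 appears twice in object 1
toy[2, 1] = 7
toy[2, 3] = 1

single = toy[2, 1]                        # one count, returned as a scalar
obj_column = toy[:, 1].toarray().ravel()  # counts of every component in object 1
comp_row = toy[2, :].toarray().ravel()    # counts of component 2 in every object

print(single, obj_column, comp_row)
```

Slices are returned as sparse matrices, so only the selected row or column is converted to a dense array here, not the whole matrix.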


### Bag of components

The method `comps_in_obj` combines the sparse matrix with the component table to list the components of a given object, together with their counts.

In [12]:
obj_id = 13058
print('Listing the components in ' + system.objects.loc[obj_id]['name'])

system.comps_in_obj(obj_id)

Listing the components in Millennium Falcon


sparse_id,name,part_id,color_id,color,category,material,count
16119,Technic Pin with Friction Ridges Lengthwise an...,2780,0,Black,Technic Pins,Plastic,280
21240,Plate 1 x 2,3023,28,Dark Tan,Plates,Plastic,243
62545,Technic Pin Long with Friction Ridges Lengthwi...,6558,1,Blue,Technic Pins,Plastic,139
5881,Tile Special 1 x 1 with Clip with Rounded Edges,15712,72,Dark Bluish Gray,Tiles Special,Plastic,139
21048,Plate 2 x 3,3021,71,Light Bluish Gray,Plates,Plastic,105
...,...,...,...,...,...,...,...
79229,Torso Open Jacket with Pockets and White Shirt...,973c05h02pr3914,272,Dark Blue,Minifig Upper Body,Plastic,1
79494,"Torso Jacket, Open over White Shirt Print, Dar...",973c07h02pr3175,308,Dark Brown,Minifig Upper Body,Plastic,1
79519,"Torso, Dark Brown Arms and Hands [Plain]",973c07h07,308,Dark Brown,Minifig Upper Body,Plastic,1
80277,"Torso Open Neck Shirt, Dark Tan Tied Robe, Dar...",973c14h02pr3145,71,Light Bluish Gray,Minifig Upper Body,Plastic,1
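
A lookup of this kind can be sketched in a few lines. The function `comps_in_obj_sketch` below is a hypothetical stand-in written for illustration, not the implementation in `comp_sys`. It assumes that the component table is indexed by `sparse_id`, that this index matches the rows of the matrix, and that the result is sorted by decreasing count as in the output above.

```python
import numpy as np
import pandas as pd
from scipy.sparse import lil_matrix

def comps_in_obj_sketch(components, sparse_mat, obj_id):
    """Hypothetical stand-in for `comps_in_obj`: components of one object with their counts."""
    column = sparse_mat[:, obj_id].toarray().ravel()  # counts of all components in the object
    present = np.flatnonzero(column)                  # row indices with a non-zero count
    table = components.loc[present].copy()            # matching rows of the component table
    table['count'] = column[present]
    return table.sort_values('count', ascending=False)

# Toy data: 3 components x 2 objects, component table indexed by sparse_id.
components = pd.DataFrame(
    {'name': ['Plate 1 x 2', 'Plate 2 x 3', 'Technic Pin']},
    index=pd.Index([0, 1, 2], name='sparse_id'),
)
mat = lil_matrix((3, 2), dtype=np.int64)
mat[0, 0] = 4
mat[2, 0] = 9
mat[1, 1] = 1

result = comps_in_obj_sketch(components, mat, obj_id=0)
print(result)
```

The same idea, applied to a row of the matrix and the object table, would give the objects that contain a given component.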


### Objects of a component

The method `objs_of_comp` combines the sparse matrix with the object table to list the objects in which a given component appears, together with their counts.

In [15]:
comp_id = 22574
print('Listing the objects of ' + system.components.loc[comp_id]['name'])

system.objs_of_comp(comp_id)

Listing the objects of Bar 4L (Lightsaber Blade / Wand)


sparse_id,set_id,name,year,n_themes,theme1,theme2,theme3,count
402,10355-1,Blacktron Renegade,2025.0,1,Icons,,,8
6820,42127-1,The Batman - Batmobile,2021.0,1,Technic,,,7
13626,76131-1,Avengers Compound Battle,2019.0,2,Super Heroes Marvel,Avengers,,7
13214,75336-1,Inquisitor Transport Scythe,2022.0,1,Star Wars,,,6
9352,60008-1,Museum Break-in,2013.0,2,City,Police,,6
...,...,...,...,...,...,...,...,...
16421,9500-1,Sith Fury-Class Interceptor,2012.0,1,Star Wars,,,1
16521,9526-1,Palpatine's Arrest,2012.0,1,Star Wars,,,1
16428,9515-1,The Malevolence,2012.0,1,Star Wars,,,1
17311,COMCON007-1,Collectible Display Set 5,2009.0,1,Star Wars,,,1
