# Introduction

This tutorial illustrates how to use *ObjTables* to merge multiple datasets into a single dataset, as well as how to cut a dataset into multiple pieces.

##### Composition or merging
Composition, or merging, is a powerful mechanism for teams to build datasets collaboratively. Briefly, *ObjTables* merges datasets by representing each dataset as a graph in which each node is an object (an instance of `obj_tables.Model`) and each edge is a relationship between two objects (a value of one of the `obj_tables.*To*Attribute` relational attributes, such as `obj_tables.ManyToOneAttribute`), identifying the nodes that the datasets have in common, and taking the union of their edges. Datasets can be merged using `obj_tables.Model.merge`.
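The graph view of merging can be sketched in a few lines of plain Python. This is a toy illustration, not how `obj_tables.Model.merge` is implemented: it assumes that each object is identified by a hypothetical `(class name, primary key)` pair, so the Apple nodes of the two datasets collapse into a single node and the union of the edges links both executives to it.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding: a node is identified by (class name, primary key),
# e.g. ("Company", "Apple"), so equal keys in two datasets mean "the same object".
Node = Tuple[str, str]
Edge = FrozenSet[Node]  # an undirected relationship between two nodes


def edge(a: Node, b: Node) -> Edge:
    """Build an undirected edge between two nodes."""
    return frozenset((a, b))


def merge_graphs(nodes_a: Set[Node], edges_a: Set[Edge],
                 nodes_b: Set[Node], edges_b: Set[Edge]) -> Tuple[Set[Node], Set[Edge]]:
    """Merge two graphs: nodes with the same key collapse into one; edges are unioned."""
    return nodes_a | nodes_b, edges_a | edges_b


apple = ("Company", "Apple")
cook = ("Person", "Tim Cook")
williams = ("Person", "Jeff Williams")

# The CEO dataset and the COO dataset both contain Apple, each with one employee
ceo_nodes, ceo_edges = {apple, cook}, {edge(cook, apple)}
coo_nodes, coo_edges = {apple, williams}, {edge(williams, apple)}

nodes, edges = merge_graphs(ceo_nodes, ceo_edges, coo_nodes, coo_edges)
apple_employees = sorted(key for (cls, key) in nodes
                         if cls == "Person" and edge((cls, key), apple) in edges)
print(apple_employees)  # ['Jeff Williams', 'Tim Cook']
```

The merged graph has three nodes rather than four because both datasets refer to the same Apple node.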

For example, [WC-Lang](https://github.com/KarrLab/wc_lang) uses this feature to enable modelers to merge multiple submodels (e.g., submodels of metabolism, transcription, and translation) into a single model (e.g., a model of an entire cell).

##### Decomposition or cutting
Similarly, decomposition, or cutting, is a useful technique for splitting datasets into smaller, more manageable pieces. Briefly, *ObjTables* can generate a separate dataset for each child of an object by representing the dataset (the object, its children, and all other objects related to them) as a graph, removing one or more edges, and identifying the connected subgraphs that remain.
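The cutting step described above (remove edges, then find the connected subgraphs) can be sketched with a breadth-first search. This is again a toy illustration in plain Python, not ObjTables' implementation; the `connected_components` helper is hypothetical, and the nodes are a small slice of this tutorial's address book.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges, cut):
    """Return the connected subgraphs that remain after removing the edges in `cut`."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if (a, b) in cut or (b, a) in cut:
            continue  # this relationship is cut, so it does not connect a and b
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(nodes):  # sorted only to make the output order deterministic
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# A small slice of the address book: one book, two companies, and their CEOs
nodes = ["book", "Apple", "Google", "Tim Cook", "Sundar Pichai"]
edges = [("book", "Apple"), ("book", "Google"),
         ("book", "Tim Cook"), ("book", "Sundar Pichai"),
         ("Tim Cook", "Apple"), ("Sundar Pichai", "Google")]

# Cut every edge that touches the address book
cut = {e for e in edges if "book" in e}
components = connected_components(nodes, edges, cut)
print([sorted(c) for c in components])
# [['Apple', 'Tim Cook'], ['Google', 'Sundar Pichai'], ['book']]
```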

Because there are many ways to decompose a dataset, *ObjTables* makes it easy to define each desired decomposition by specifying the set of edges that should be retained (the complement of the set of edges that should be cut). To define the desired decompositions, set `obj_tables.Model.Meta.children` in each class to a dictionary that maps the name of each decomposition to a tuple of the names of that class's relational attributes that should not be cut. These names can include related names, such as `employees` for `Company`. These decompositions can be executed using `obj_tables.RelatedAttribute.cut(kind=key)`.
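To see how the retained attributes in `Meta.children` keep the pieces apart, the following plain-Python sketch mirrors the `children` settings of the address book schema defined below and collects everything reachable from one company through retained attributes only. The `Obj` class and the `link` and `collect` helpers are hypothetical stand-ins for model instances and for ObjTables' own traversal, not part of the `obj_tables` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # eq=False: objects compare and hash by identity, like distinct model instances
class Obj:
    """Hypothetical stand-in for a model instance."""
    class_name: str                            # e.g. "Company"
    key: str                                   # primary attribute value
    related: Dict[str, List["Obj"]] = field(default_factory=dict)


# Retained (uncut) attributes per class for the 'company_address_book' kind,
# copied from the `children` settings of the schema
CHILDREN: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "Address": {"company_address_book": ("company", "people")},
    "Company": {"company_address_book": ("address_book", "employees", "address")},
    "Person": {"company_address_book": ("address_book", "company", "address")},
    "AddressBook": {"company_address_book": ()},
}


def link(a: Obj, attr_a: str, b: Obj, attr_b: str) -> None:
    """Relate a and b in both directions (attribute on a, related name on b)."""
    a.related.setdefault(attr_a, []).append(b)
    b.related.setdefault(attr_b, []).append(a)


def collect(root: Obj, kind: str) -> List[Obj]:
    """Return every object reachable from root through retained attributes only."""
    seen, stack, piece = {root}, [root], []
    while stack:
        obj = stack.pop()
        piece.append(obj)
        for attr in CHILDREN[obj.class_name][kind]:
            for other in obj.related.get(attr, []):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return piece


book = Obj("AddressBook", "tech")
apple, google = Obj("Company", "Apple"), Obj("Company", "Google")
apple_address = Obj("Address", "10600 N Tantau Ave")
google_address = Obj("Address", "1600 Amphitheatre Pkwy")
people = {"Tim Cook": (apple, apple_address),
          "Jeff Williams": (apple, apple_address),
          "Sundar Pichai": (google, google_address)}

for company, address in ((apple, apple_address), (google, google_address)):
    link(book, "companies", company, "address_book")
    link(company, "address", address, "company")
for name, (company, address) in people.items():
    person = Obj("Person", name)
    link(book, "people", person, "address_book")
    link(person, "company", company, "employees")
    link(person, "address", address, "people")

apple_piece = collect(apple, "company_address_book")
print(sorted(obj.key for obj in apple_piece))
# ['10600 N Tantau Ave', 'Apple', 'Jeff Williams', 'Tim Cook', 'tech']
```

Because `AddressBook` retains no attributes (`()`), the traversal reaches the shared address book but never continues from it, so Google and Sundar Pichai stay out of Apple's piece.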

For example, [WC-Lang](https://github.com/KarrLab/wc_lang) uses this feature to enable modelers to split a model into multiple submodels, which are easier to simulate, calibrate, and validate.

##### Example used in this tutorial
This tutorial uses an address book of the CEOs and COOs of technology companies as an example.

# Define a schema for an address book

First, as described in [Tutorial 1](1.%20Building%20and%20visualizing%20schemas.ipynb), use *ObjTables* to define a schema for an address book.

```python
import enum
import obj_tables


# Define classes to represent companies, people, and their addresses
class Address(obj_tables.Model):
    street = obj_tables.StringAttribute(unique=True, primary=True, verbose_name='Street')
    city = obj_tables.StringAttribute(verbose_name='City')
    state = obj_tables.StringAttribute(verbose_name='State')
    zip_code = obj_tables.StringAttribute(verbose_name='Zip code')
    country = obj_tables.StringAttribute(verbose_name='Country')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.multiple_cells
        attribute_order = ('street', 'city', 'state', 'zip_code', 'country',)
        verbose_name = 'Address'
        verbose_name_plural = 'Addresses'
        # Decomposition name -> relational attributes to retain (not cut)
        children = {
            'company_address_book': ('company', 'people',),
        }

class Company(obj_tables.Model):
    name = obj_tables.StringAttribute(unique=True, primary=True, verbose_name='Name')
    url = obj_tables.UrlAttribute(verbose_name='URL')
    address = obj_tables.OneToOneAttribute(Address, related_name='company', verbose_name='Address')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.column
        attribute_order = ('name', 'url', 'address',)
        verbose_name = 'Company'
        verbose_name_plural = 'Companies'
        children = {
            'company_address_book': ('address_book', 'employees', 'address',),
        }


class PersonType(str, enum.Enum):
    family = 'family'
    friend = 'friend'
    business = 'business'


class Person(obj_tables.Model):
    name = obj_tables.StringAttribute(unique=True, primary=True, verbose_name='Name')
    type = obj_tables.EnumAttribute(PersonType, verbose_name='Type')
    company = obj_tables.ManyToOneAttribute(Company, related_name='employees', verbose_name='Company')
    email_address = obj_tables.EmailAttribute(verbose_name='Email address')
    phone_number = obj_tables.StringAttribute(verbose_name='Phone number')
    address = obj_tables.ManyToOneAttribute(Address, related_name='people', verbose_name='Address')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.row
        attribute_order = ('name', 'type', 'company', 'email_address', 'phone_number', 'address',)
        verbose_name = 'Person'
        verbose_name_plural = 'People'
        children = {
            'company_address_book': ('address_book', 'company', 'address',),
        }


class AddressBook(obj_tables.Model):
    id = obj_tables.StringAttribute(unique=True, primary=True, verbose_name='Id')
    companies = obj_tables.OneToManyAttribute(Company, related_name='address_book')
    people = obj_tables.OneToManyAttribute(Person, related_name='address_book')

    class Meta(obj_tables.Model.Meta):
        table_format = obj_tables.TableFormat.column
        attribute_order = ('id', 'companies', 'people')
        verbose_name = 'Address book'
        verbose_name_plural = 'Address books'
        # Retain no attributes, so each company's piece stops at the address book
        children = {
            'company_address_book': (),
        }
```

# Create two address books: one of the CEOs and one of the COOs of technology companies

##### Use the address book schema to build an address book of the CEOs of technology companies

```python
# Tim Cook of Apple
apple = Company(name='Apple',
                url='https://www.apple.com/',
                address=Address(street='10600 N Tantau Ave',
                                city='Cupertino',
                                state='CA',
                                zip_code='95014',
                                country='US'))
cook = Person(name='Tim Cook',
              type=PersonType.business,
              company=apple,
              email_address='tcook@apple.com',
              phone_number='408-996-1010',
              address=apple.address)

# Reed Hastings of Netflix
netflix = Company(name='Netflix',
                  url='https://www.netflix.com/',
                  address=Address(street='100 Winchester Cir',
                                  city='Los Gatos',
                                  state='CA',
                                  zip_code='95032',
                                  country='US'))
hastings = Person(name='Reed Hastings',
                  type=PersonType.business,
                  company=netflix,
                  email_address='reed.hastings@netflix.com',
                  phone_number='408-540-3700',
                  address=netflix.address)

# Sundar Pichai of Google
google = Company(name='Google',
                 url='https://www.google.com/',
                 address=Address(street='1600 Amphitheatre Pkwy',
                                 city='Mountain View',
                                 state='CA',
                                 zip_code='94043',
                                 country='US'))
pichai = Person(name='Sundar Pichai',
                type=PersonType.business,
                company=google,
                email_address='sundar@google.com',
                phone_number='650-253-0000',
                address=google.address)

# Mark Zuckerberg of Facebook
facebook = Company(name='Facebook',
                   url='https://www.facebook.com/',
                   address=Address(street='1 Hacker Way #15',
                                   city='Menlo Park',
                                   state='CA',
                                   zip_code='94025',
                                   country='US'))
zuckerberg = Person(name='Mark Zuckerberg',
                    type=PersonType.business,
                    company=facebook,
                    email_address='zuck@fb.com',
                    phone_number='650-543-4800',
                    address=facebook.address)

# Collect the companies and CEOs into a single address book
ceos = AddressBook(
    id='tech',
    companies=[apple, facebook, google, netflix],
    people=[zuckerberg, hastings, pichai, cook],
)
```

##### Use the address book schema to build an address book of the COOs of technology companies

```python
# Jeff Williams of Apple
apple = Company(name='Apple',
                url='https://www.apple.com/',
                address=Address(street='10600 N Tantau Ave',
                                city='Cupertino',
                                state='CA',
                                zip_code='95014',
                                country='US'))
williams = Person(name='Jeff Williams',
                  type=PersonType.business,
                  company=apple,
                  email_address='jwilliams@apple.com',
                  phone_number='408-996-1010',
                  address=apple.address)

# Sheryl Sandberg of Facebook
facebook = Company(name='Facebook',
                   url='https://www.facebook.com/',
                   address=Address(street='1 Hacker Way #15',
                                   city='Menlo Park',
                                   state='CA',
                                   zip_code='94025',
                                   country='US'))
sandberg = Person(name='Sheryl Sandberg',
                  type=PersonType.business,
                  company=facebook,
                  email_address='sheryl@fb.com',
                  phone_number='650-543-4800',
                  address=facebook.address)

# Bret Taylor of Salesforce
salesforce = Company(name='Salesforce',
                     url='https://www.salesforce.com/',
                     address=Address(street='415 Mission Street, 3rd Floor',
                                     city='San Francisco',
                                     state='CA',
                                     zip_code='94105',
                                     country='US'))
taylor = Person(name='Bret Taylor',
                type=PersonType.business,
                company=salesforce,
                email_address='btaylor@gmail.com',
                phone_number='667-6389',
                address=salesforce.address)

# Collect the companies and COOs into a single address book
coos = AddressBook(
    id='tech',
    companies=[apple, facebook, salesforce],
    people=[sandberg, taylor, williams],
)
```

# Merge the CEOs and COOs into a single address book

##### Create a merged address book by copying the CEO address book and merging the COOs into the copy

```python
# Copy the CEO address book, then merge the COO address book into the copy
execs = ceos.copy()
execs.merge(coos)
```

##### Check that the companies have been merged

```python
assert sorted(co.name for co in execs.companies) == ['Apple', 'Facebook', 'Google', 'Netflix', 'Salesforce']
```

##### Check that the CEOs and COOs of Apple and Facebook are joined to the merged companies

```python
apple = execs.companies.get_one(name='Apple')
assert sorted(person.name for person in apple.employees) == ['Jeff Williams', 'Tim Cook']
```

```python
facebook = execs.companies.get_one(name='Facebook')
assert sorted(person.name for person in facebook.employees) == ['Mark Zuckerberg', 'Sheryl Sandberg']
```

# Cut the merged address book into separate books for each company

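```python
# Cut the companies using the 'company_address_book' decomposition and get the address book of each piece
company_address_books = [co.address_book for co in execs.companies.cut(kind='company_address_book')]
```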

##### Print the name of the company in each book

```python
print([address_book.companies[0].name for address_book in company_address_books])
```

```
['Apple', 'Facebook', 'Google', 'Netflix', 'Salesforce']
```

##### Check that the book for Apple has one company and two people

```python
apple_address_book = next(book for book in company_address_books if book.companies[0].name == 'Apple')
assert len(apple_address_book.companies) == 1
assert len(apple_address_book.people) == 2
apple = apple_address_book.companies.get_one(name='Apple')
assert sorted(person.name for person in apple_address_book.people) == ['Jeff Williams', 'Tim Cook']
assert sorted(person.name for person in apple.employees) == ['Jeff Williams', 'Tim Cook']
```

##### Check that the book for Google has one company and one person

```python
google_address_book = next(book for book in company_address_books if book.companies[0].name == 'Google')
assert len(google_address_book.companies) == 1
assert len(google_address_book.people) == 1
google = google_address_book.companies.get_one(name='Google')
assert [person.name for person in google_address_book.people] == ['Sundar Pichai']
assert [person.name for person in google.employees] == ['Sundar Pichai']
```