# DataFrame Conversions

This notebook demonstrates how to [convert](https://nexus-forge.readthedocs.io/en/latest/interaction.html#converting) a [Resource](https://nexus-forge.readthedocs.io/en/latest/interaction.html#resource) to a pandas DataFrame and vice versa.

In [1]:
from kgforge.core import KnowledgeGraphForge

A configuration file is needed to create a KnowledgeGraphForge session. One can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb).

In [None]:
forge = KnowledgeGraphForge("../../configurations/forge.yml")

## Imports

In [3]:
import pandas as pd
import numpy as np

In [4]:
from kgforge.core import Resource

## List of Resources to DataFrame

In [5]:
address = Resource(type="PostalAddress", country="Switzerland", locality="Geneva")

In [6]:
jane = Resource(type="Person", name="Jane Doe", address=address, email="(missing)")

In [7]:
john = Resource(type="Person", name="John Smith", email="john.smith@epfl.ch")

In [8]:
persons = [jane, john]

In [9]:
forge.register(persons)

<count> 2
<action> _register_many
<succeeded> True


In [10]:
forge.as_json(jane)

{'id': 'https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/e39679b0-7f12-4aa9-b21e-b8cf96c2ae1a',
 'type': 'Person',
 'address': {'type': 'PostalAddress',
  'country': 'Switzerland',
  'locality': 'Geneva'},
 'email': '(missing)',
 'name': 'Jane Doe'}

In [11]:
forge.as_json(john)

{'id': 'https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/8e05ba96-0eff-45eb-85d5-c8520a44b9ec',
 'type': 'Person',
 'email': 'john.smith@epfl.ch',
 'name': 'John Smith'}

In [12]:
john._store_metadata

{'id': 'https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/8e05ba96-0eff-45eb-85d5-c8520a44b9ec',
 '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json',
 '_createdAt': '2022-03-22T11:06:58.666Z',
 '_createdBy': 'https://sandbox.bluebrainnexus.io/v1/realms/github/users/mfsy',
 '_deprecated': False,
 '_incoming': 'https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/8e05ba96-0eff-45eb-85d5-c8520a44b9ec/incoming',
 '_outgoing': 'https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/8e05ba96-0eff-45eb-85d5-c8520a44b9ec/outgoing',
 '_project': 'https://sandbox.bluebrainnexus.io/v1/projects/github-users/mfsy',
 '_rev': 1,
 '_schemaProject': 'https://sandbox.bluebrainnexus.io/v1/projects/github-users/mfsy',
 '_self': 'https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/8e05ba96-0eff-45eb-85d5-c8520a44b9ec',
 '_updatedAt': '2022-03-22T11:06:58.666Z',
 '_updatedBy': 'https://sandbox.bluebrainnexus.io/v1/realms

In [13]:
forge.as_dataframe(persons)

| | id | type | address.type | address.country | address.locality | email | name |
|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |


The `na` parameter specifies which values (here `'(missing)'`) should be replaced by `NaN`.

In [14]:
forge.as_dataframe(persons, na="(missing)")

| | id | type | address.type | address.country | address.locality | email | name |
|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | PostalAddress | Switzerland | Geneva | NaN | Jane Doe |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |
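
For comparison, the same placeholder-to-`NaN` substitution can be sketched with pandas alone. This is not how `forge.as_dataframe` is implemented; the small frame below is a hypothetical stand-in for the `email` and `name` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the `email` and `name` columns above.
df = pd.DataFrame({
    "email": ["(missing)", "john.smith@epfl.ch"],
    "name": ["Jane Doe", "John Smith"],
})

# Replace every cell equal to the placeholder string with NaN.
cleaned = df.replace("(missing)", np.nan)

# True where the placeholder was replaced, False elsewhere.
print(cleaned["email"].isna().tolist())
```

Only cells that match the placeholder exactly are affected; the other columns are left untouched.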


The `nesting` parameter sets the string used in column names to separate nested keys. The default is a dot (`.`).

In [15]:
forge.as_dataframe(persons, nesting="__")

| | id | type | address__type | address__country | address__locality | email | name |
|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |
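
The effect of a nesting separator can also be reproduced on plain dictionaries with `pandas.json_normalize` and its `sep` argument. The sketch below is a pandas-only analogy, not the forge implementation; the `records` list is a hand-written stand-in for the JSON form of the two persons.

```python
import pandas as pd

# Hand-written records resembling the JSON form of the persons above.
records = [
    {
        "type": "Person",
        "name": "Jane Doe",
        "address": {
            "type": "PostalAddress",
            "country": "Switzerland",
            "locality": "Geneva",
        },
    },
    {"type": "Person", "name": "John Smith"},
]

# Nested keys are joined to their parent key with the given separator.
flat = pd.json_normalize(records, sep="__")

print(sorted(flat.columns))
```

Rows without an `address` get `NaN` in the flattened address columns, as in the output above.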


Setting `expanded=True` shows field names and values in their expanded form, according to the JSON-LD context.

In [16]:
forge.as_dataframe(persons, expanded=True)

| | @id | @type | http://schema.org/address.@type | http://schema.org/address.https://neuroshapes.org/country | http://schema.org/address.https://neuroshapes.org/locality | http://schema.org/email | http://schema.org/name |
|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | http://schema.org/Person | https://neuroshapes.org/PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | http://schema.org/Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |
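
To give an intuition for the expanded column names, here is a toy sketch that rewrites each segment of a column name through a term-to-IRI mapping. The `CONTEXT` dictionary and the `expand_column` helper are hypothetical, chosen only to mirror the columns shown above; a real JSON-LD processor does considerably more (it also expands values such as types, for instance), and this is not how nexus-forge performs expansion.

```python
import pandas as pd

# Hypothetical term-to-IRI mapping, modelled on the expanded columns above.
CONTEXT = {
    "id": "@id",
    "type": "@type",
    "address": "http://schema.org/address",
    "email": "http://schema.org/email",
    "name": "http://schema.org/name",
    "country": "https://neuroshapes.org/country",
    "locality": "https://neuroshapes.org/locality",
}


def expand_column(column: str, sep: str = ".") -> str:
    """Expand each separator-delimited segment of a column name via CONTEXT."""
    return sep.join(CONTEXT.get(part, part) for part in column.split(sep))


# An empty frame is enough to illustrate the renaming of columns.
df = pd.DataFrame(columns=["id", "type", "address.type", "address.country", "email"])
expanded = df.rename(columns=expand_column)

print(list(expanded.columns))
```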


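Setting `store_metadata=True` adds the store metadata of each Resource (the fields shown by `john._store_metadata` above) as extra columns.

In [17]: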
forge.as_dataframe(persons, store_metadata=True)

| | id | type | address.type | address.country | address.locality | email | name | _constrainedBy | _createdAt | _createdBy | _deprecated | _incoming | _outgoing | _project | _rev | _schemaProject | _self | _updatedAt | _updatedBy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe | https://bluebrain.github.io/nexus/schemas/unco... | 2022-03-22T11:06:58.705Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2022-03-22T11:06:58.705Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |
| 1 | https://sandbox.bluebrainnexus.io/v1/resources... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith | https://bluebrain.github.io/nexus/schemas/unco... | 2022-03-22T11:06:58.666Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... | False | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/resources... | https://sandbox.bluebrainnexus.io/v1/projects/... | 1 | https://sandbox.bluebrainnexus.io/v1/projects/... | https://sandbox.bluebrainnexus.io/v1/resources... | 2022-03-22T11:06:58.666Z | https://sandbox.bluebrainnexus.io/v1/realms/gi... |


## DataFrame to list of Resources

In [41]:
data = pd.DataFrame([
    {
        "type": "Person",
        "address.type": "PostalAddress",
        "address.country": "Switzerland",
        "address.locality": "Geneva",
        "email": "(missing)",
        "name": "Jane Doe",
    },
    {
        "type": "Person",
        "address.type": np.nan,
        "address.country": np.nan,
        "address.locality": np.nan,
        "email": "john.smith@epfl.ch",
        "name": "John Smith",
    }
])

In [42]:
data

| | type | address.type | address.country | address.locality | email | name |
|---|---|---|---|---|---|---|
| 0 | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |


In [43]:
resources = forge.from_dataframe(data)

In [44]:
address = Resource(type="PostalAddress", country="Switzerland", locality="Geneva")

In [45]:
jane = Resource(type="Person", name="Jane Doe", address=address, email="(missing)")

In [46]:
john = Resource(type="Person", name="John Smith", email="john.smith@epfl.ch")

In [47]:
persons = [jane, john]

In [48]:
resources == persons

True
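
The comparison above shows that dotted columns become nested Resources and that `NaN` cells are dropped. The same idea can be sketched on plain dictionaries; the `unflatten` helper below is hypothetical and only illustrates the principle, not the code path used by `forge.from_dataframe`.

```python
import numpy as np
import pandas as pd


def unflatten(row: dict, sep: str = ".") -> dict:
    """Rebuild a nested dict from flat keys, skipping NaN cells."""
    nested: dict = {}
    for key, value in row.items():
        if pd.isna(value):
            continue  # a missing cell produces no key at all
        *parents, leaf = key.split(sep)
        target = nested
        for parent in parents:
            # Walk down (or create) the intermediate dictionaries.
            target = target.setdefault(parent, {})
        target[leaf] = value
    return nested


frame = pd.DataFrame([
    {"type": "Person", "address.country": "Switzerland", "name": "Jane Doe"},
    {"type": "Person", "address.country": np.nan, "name": "John Smith"},
])

records = [unflatten(row) for row in frame.to_dict(orient="records")]
print(records)
```

In this sketch the second record has no `address` key at all, which matches John Smith having no address in the Resources above.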

In [49]:
resources_na = forge.from_dataframe(data, na="(missing)")

In [50]:
print(resources[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    email: (missing)
    name: Jane Doe
}


In [51]:
print(resources_na[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    name: Jane Doe
}


In [52]:
resources_nesting = forge.from_dataframe(data, nesting=".")

In [53]:
print(resources_nesting[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    email: (missing)
    name: Jane Doe
}


In [54]:
data = pd.DataFrame([
    {
        "type": "Person",
        "address_type": "PostalAddress",
        "address_country": "Switzerland",
        "address_locality": "Geneva",
        "email": "(missing)",
        "name": "Jane Doe",
    },
    {
        "type": "Person",
        "address_type": np.nan,
        "address_country": np.nan,
        "address_locality": np.nan,
        "email": "john.smith@epfl.ch",
        "name": "John Smith",
    }
])

In [55]:
resources_nesting = forge.from_dataframe(data, nesting="_")

In [56]:
print(resources_nesting[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    email: (missing)
    name: Jane Doe
}
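
A single underscore is a common character in column names, so it can be an ambiguous separator. The sketch below shows what a naive split on the separator does to a column that merely contains an underscore. Both the `split_column` helper and the `given_name` column are hypothetical, and whether nexus-forge splits column names in exactly this way is an assumption not confirmed by this notebook.

```python
def split_column(column: str, sep: str) -> list:
    """Split a flat column name into its nesting path (naive approach)."""
    return column.split(sep)


# Intended nesting: `country` inside `address`.
intended = split_column("address_country", "_")

# Unintended nesting: a plain column name that happens to contain "_".
unintended = split_column("given_name", "_")

# A longer separator leaves single underscores alone.
safer = split_column("given_name", "__")

print(intended, unintended, safer)
```

Under this assumption, a separator that cannot occur inside a field name (such as `__` or the default `.`) avoids the ambiguity.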
