This repository has been archived by the owner on Jan 22, 2020. It is now read-only.

New features: from_package() and add_entity() #4

Open
lapidus opened this issue Mar 4, 2019 · 2 comments

Comments


lapidus commented Mar 4, 2019

@miroli, regarding new functionality for f2p, the ddf-utils recipe documentation could provide some inspiration, particularly its section on "ingredients":
https://ddf-utils.readthedocs.io/en/latest/recipe.html

cooking:
    concepts:
        # procedures for concepts here
    entities:
        # procedures for entities here
    datapoints:
        # procedures for datapoints here

But at the same time, I think it is good that f2p stays lean and focused on the most common workflows we see with pandas.

I guess what I'm primarily looking for is a way to more quickly define the basics of a dataset (what was previously in the f2p constructor).

For example, let's say I already have a dataset and would like to just add one more file. In pseudocode:

f2p.from_package("path")
f2p.add_data(data=df, concepts=concepts)
f2p.to_package("path")

Or my simpler use case, where I would like to create a dataset with only some concepts and entities (no datapoints):


concepts = [
    {
        'concept': 'country',
        'concept_type': 'entity_domain'
    },
    {
        'concept': 'some-other-country-metadata',
        'concept_type': 'string'
    }
]

f2p = Frame2Package()
f2p.add_concepts(concepts=concepts)
f2p.add_entity(concept="country", data=dataFrame)  # proposed method
f2p.to_package("path")
Collaborator

miroli commented Mar 5, 2019

I will look into updating an existing package. As to the second request, I believe this functionality is already available. The update_entity method can accept non-existing entities, and simply adds them to the entities list if that's the case.

See the following example.

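import pandas as pd
from frame2package import Frame2Package  # import path assumed

# Concept definitions: one entity domain and one string property
concepts = [
    {
        'concept': 'country',
        'concept_type': 'entity_domain'
    },
    {
        'concept': 'capital',
        'concept_type': 'string'
    }
]

# Entity data: one row per country
df = pd.DataFrame([
    {
        'country': 'sweden',
        'capital': 'Stockholm'
    },
    {
        'country': 'norway',
        'capital': 'Oslo'
    },
])

f2p = Frame2Package()
f2p.add_concepts(concepts=concepts)
# 'country' does not exist yet, so update_entity adds it
f2p.update_entity(name='country', data=df, id='country')
f2p.to_package('ddf')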

Collaborator

miroli commented Mar 6, 2019

As to the first point, I'm thinking it would probably be sufficient to add an .append_to_package method that:

  1. Adds any new concepts to ddf--concepts.csv.
  2. Adds one or more data files if they don't yet exist in the package.
  3. Adds new entity files or appends to existing ones.
  4. Recreates the JSON schema and updates datapackage.json.
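
The steps above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the actual implementation: `append_rows` and `append_to_package` are hypothetical helpers and not part of Frame2Package, the `ddf--entities--<domain>.csv` file name is assumed from the usual DDF naming, step 2 (datapoint files) is left out, and step 4 is reduced to re-listing the CSV files in datapackage.json without regenerating the schema.

```python
import csv
import json
import tempfile
from pathlib import Path


def append_rows(path, new_rows, key):
    """Merge new_rows into the CSV at path, creating the file if needed.

    Rows are keyed on `key`; a new row updates an existing row with the same key.
    """
    rows = {}
    if path.exists():
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                rows[row[key]] = row
    for row in new_rows:
        rows[row[key]] = {**rows.get(row[key], {}), **row}
    # Union of all column names, in first-seen order.
    fields = list(dict.fromkeys(col for row in rows.values() for col in row))
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows.values())
    return list(rows.values())


def append_to_package(package_dir, concepts=(), entities=None):
    """Hypothetical sketch of the proposed method (steps 1, 3 and part of 4)."""
    package_dir = Path(package_dir)
    package_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: add any new concepts to ddf--concepts.csv.
    append_rows(package_dir / "ddf--concepts.csv", concepts, key="concept")
    # Step 3: add new entity files or append to existing ones.
    for domain, rows in (entities or {}).items():
        append_rows(package_dir / f"ddf--entities--{domain}.csv", rows, key=domain)
    # Step 4 (simplified): re-list all DDF CSV files in datapackage.json.
    resources = [
        {"name": p.stem, "path": p.name}
        for p in sorted(package_dir.glob("ddf--*.csv"))
    ]
    (package_dir / "datapackage.json").write_text(
        json.dumps({"resources": resources}, indent=2)
    )
    return resources


# Usage: create a package, then append one more concept and one more entity row.
with tempfile.TemporaryDirectory() as tmp:
    append_to_package(
        tmp,
        concepts=[{"concept": "country", "concept_type": "entity_domain"}],
        entities={"country": [{"country": "sweden", "capital": "Stockholm"}]},
    )
    resources = append_to_package(
        tmp,
        concepts=[{"concept": "capital", "concept_type": "string"}],
        entities={"country": [{"country": "norway", "capital": "Oslo"}]},
    )
    print([r["path"] for r in resources])
```

Keying every file on its identifier column means calling the helper twice with the same rows leaves the package unchanged, which seems like the behaviour one would want from an append operation.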

Do you think that would do it, @lapidus?


2 participants