Ability to distinguish different types of content and content that belongs to different institutions #478

Open
uconnjeustis opened this issue Jan 6, 2017 · 20 comments
Labels
Subject: Multi-tenancy related to having content from multiple Drupal sites in one system Type: use case proposes a new feature or function for the software using user-first language.

Comments

@uconnjeustis

This use case is slightly out of scope for Islandora-CLAW/CLAW#396, so it was moved to this new issue.

In the Islandora Metadata Interest Group, a discussion was started on OAI-PMH support. In addition to some wanted features, the idea of namespaces came up. Our use case is different from @rosiel's, so I wanted to add it here.

| Use Type | Description |
| --- | --- |
| Title (Goal) | Ability to distinguish and/or assign content to multiple institutions |
| Primary Actor | Sysadmin, Repository Admin, Repository curators |
| Scope | Islandora Site Architecture |
| Level | Medium? |
| Story | Currently, the Connecticut Digital Archive (CTDA) works with over 40 institutions that add and manage content in the repository and in multiple sites. To distinguish one institution's content from another's, CTDA implements namespaces. Each institution has a namespace that is a range. For example, 20002-29999 is the namespace range for UConn Archives & Special Collections (UConn ASC). The reason for this is that UConn ASC can have general content in the 20002 namespace, research data in 20003, and university records in 20004. Each institution has such a range, in which the first one or two digits never change. We use namespaces not only to distinguish content from different institutions and, within an institution, different types of content; namespaces are also used on various sites. For example, we have a site for UConn ASC and CT State Library. For CTDA, we really need an easy way to ensure that institutions and users can quickly determine whether the content is theirs. Namespaces allow us to do that, especially as they appear in the PID, in the URL, etc. Going forward, we need a way to ensure these institutional distinctions remain in place and can be continued in such a way that non-technical volunteers can easily assign content to a particular institution. |

@ajs6f

ajs6f commented Jan 6, 2017

Is this addressable via LDP containment? If so, that would be the most natural idiom.

@uconnjeustis
Author

We use namespaces to do this currently. What makes this easy is that we can quickly identify from the PID which institution owns the content. For any SPARQL or Solr queries, it's possible to filter by namespace, which is great. Recently, I searched for OBJ size by namespace. This was to determine how much each institution had uploaded (of course, only in terms of the latest OBJ datastream size). Also, namespaces are convenient when sorting through harvest results. With those results, you can sort by namespace and produce nice reports for institutions that want an inventory of their metadata at the end of the year.
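
As a rough illustration of the per-namespace size report described above, here is a minimal Python sketch. It assumes PIDs have the form `namespace:id` with a numeric namespace; the sample records, the sizes, and the `institution_for` and `obj_size_by_institution` helpers are hypothetical. Only the UConn ASC range (20002-29999) comes from the use case.

```python
from collections import defaultdict

# Hypothetical lookup table of namespace ranges. Only the UConn ASC range
# (20002-29999) is taken from the use case; a real table would list every
# institution.
NAMESPACE_RANGES = {
    "UConn ASC": range(20002, 30000),
}


def institution_for(pid: str) -> str:
    """Return the institution whose namespace range contains the PID's namespace."""
    namespace = int(pid.split(":", 1)[0])
    for name, ns_range in NAMESPACE_RANGES.items():
        if namespace in ns_range:
            return name
    return "unknown"


def obj_size_by_institution(records):
    """Sum OBJ sizes (in bytes) per institution from (pid, size) pairs."""
    totals = defaultdict(int)
    for pid, size in records:
        totals[institution_for(pid)] += size
    return dict(totals)


# Made-up sample data: (PID, size of latest OBJ datastream in bytes).
records = [("20002:1", 1000), ("20003:7", 2500), ("30001:2", 400)]
print(obj_size_by_institution(records))
```

The same grouping could be done inside a SPARQL or Solr query; the sketch only shows why an identifier that carries the namespace makes this kind of report easy.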

@ajs6f

ajs6f commented Jan 7, 2017

This may not be a good solution for the long-term or large scale. It's better to use opaque identifiers. Wouldn't your needs be met by using a property the values of which would partition your repository by institution?

@uconnjeustis
Author

@ajs6f What do you mean by property the values?

@ajs6f

ajs6f commented Jan 9, 2017

...a property, the values of which.... (The values of that property) would partition...

@acoburn
Contributor

acoburn commented Jan 9, 2017

I believe that what is meant is something like this -- resources each use opaque identifiers (a very good idea) but then have a property that points to the institution managing that resource (there may be a more appropriate property, but this is an example):

</ae5e022f87f74c9a717> dcterms:isPartOf <info:repository/uconn> .

and:

</c03825dc32fab94c439ca> dcterms:isPartOf <info:repository/amherst> .

Or, simply via LDP containment:

</uconn/ae5e022f87f74c9a717>
</amherst/c03825dc32fab94c439ca>
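
To make the two options concrete, here is a minimal Python sketch of how a client might read the institution back in each case. It models triples as plain tuples rather than using an RDF library, and the helper names `institutions_by_property` and `institution_by_containment` are hypothetical; the identifiers are the illustrative ones from the examples above.

```python
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Triples as plain (subject, predicate, object) tuples, mirroring the
# property-based examples above.
triples = [
    ("/ae5e022f87f74c9a717", DCTERMS_IS_PART_OF, "info:repository/uconn"),
    ("/c03825dc32fab94c439ca", DCTERMS_IS_PART_OF, "info:repository/amherst"),
]


def institutions_by_property(triples, resource):
    """Collect every institution a resource points to via dcterms:isPartOf."""
    return {o for s, p, o in triples if s == resource and p == DCTERMS_IS_PART_OF}


def institution_by_containment(path):
    """Read the institution from the first segment of a containment path."""
    return path.strip("/").split("/")[0]


print(institutions_by_property(triples, "/ae5e022f87f74c9a717"))
print(institution_by_containment("/uconn/ae5e022f87f74c9a717"))
```

With the property approach the identifier stays opaque and the answer comes from the metadata; with containment the answer is visible in the path itself, much as it is in a namespaced PID.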

@DiegoPino
Contributor

@acoburn I kinda like the idea of <> dcterms:isPartOf <info:repository/amherst> . which really could be any object property depending on each use case, but it could also be done as simply as a data property (a.k.a. a string), like flagging or tagging your resources: <> someont:inspaceof "amherst", right?
For another use case:
What about using WebAC for the same purpose? It would be an AuthZ-based alternative in addition to an extra property: by making good use of agents and groups, you get the accessible/not accessible results and also make use of "automagic" filtering by Fedora 4. Currently, in Islandora, namespaces are also used to exclude resources. My 2 cents.

@ajs6f

ajs6f commented Jan 9, 2017

Yeah, this is the idea (I'm not totally sure that dcterms:isPartOf is the best choice here, but that's not important). Either a property or LDP containment. The advantage of the LDP containment is that it is connected with authZ via WebAC. Then there isn't much need for a property.

@uconnjeustis
Author

Thanks for the clarification, @ajs6f and @acoburn. I could be way off on this, but it seems that LDP containment might be the better way to go. Or is it better to have this information in more than one place? I'm not sure anyone would want to duplicate this information, but I just thought I'd ask the question anyway.

@ajs6f

ajs6f commented Jan 9, 2017

I'd be inclined to LDP containment. We haven't worked out a complete scheme for multitenancy from the Fedora side, but I don't think there is much question that it will pivot on LDP.

@ajs6f

ajs6f commented Jan 9, 2017

@ruebot Do you want to take this up on a CLAW call or too early?

@acoburn
Contributor

acoburn commented Jan 9, 2017

+1 on using LDP containment. That will also be much easier to make work with WebAC.

@dannylamb
Contributor

@ajs6f We're going to have to figure out some basic multitenancy scheme for fedora at some point. Both @rosiel's and this use case imply it.

And containment feels like a pretty natural way to attempt this.

And anyone can feel free to slap this onto the CLAW call agenda if they'd like. I think it'll eventually bring us to the conundrum we've had around translating 'islandora:root' from Fedora 3 to 4, and how it would work with multisites and multitenancy.

@ruebot
Member

ruebot commented Jan 10, 2017

@ajs6f what @dannylamb said:

anyone can feel free to slap this onto the CLAW call agenda if they'd like

@rosiel
Member

rosiel commented Jan 10, 2017

The main difference that I see between a property vs. LDP containment is that you can have

</ae5e022f87f74c9a717> dcterms:isPartOf <info:repository/uconn> .
</ae5e022f87f74c9a717> dcterms:isPartOf <info:repository/amherst> . 

But you probably can't have

</uconn/ae5e022f87f74c9a717>
</amherst/ae5e022f87f74c9a717>

LDP containment would therefore be more similar to the existing namespace method, and if it integrates with WebAC, all the better.

@dannylamb in this context what do you mean by "multitenancy"?

@uconnjeustis
Author

LDP containment does seem to be similar to the namespace method, as it can distinguish content by institution. Please forgive my ignorance... but to further distinguish different types of content within an institution, would it be possible to have something like...

</uconn/general/ae5e022f87f74c9a717>
</uconn/researchdata/ae5e022f87f74c9a717>
</uconn/univrecords/ae5e022f87f74c9a717>
</barnum/general/ae5e022f87f74c9a717>

@ajs6f

ajs6f commented Jan 10, 2017

@uconnjeustis Yes, absolutely. That's just the sort of thing for which LDP is meant to be used.

@rosiel Careful-- you're wrong in that you certainly can have resources in more than one container (via Direct and Indirect container action), but you're right that they can't have more than one URI in a given API instance. It's a bit confusing that way.

@ajs6f

ajs6f commented Jan 10, 2017

@uconnjeustis One point to consider-- if you want to make the best use of LDP for that kind of problem, try to stick to classifications/categorizations that have a partitioning quality; i.e. for which each resource belongs to one and only one container. You can do more complicated things, certainly, but you start to slip towards a point of complexity at which you would do better to use a multivalued property. And consider the interaction of the various systems of categories. It's a design choice for which we need to look at a specific use case to make an informed decision.

For example, let's say that your various resources are never owned by more than one institution. Then using LDP to put them in containers-by-institution is a great idea:

</uconn/researchdata/ae5e022f87f74c9a717>
</uconn/projectX/ae5e022f87f74c9a717>
</uconn/projectY/ae5e022f87f74c9a717>
</barnum/researchdata/ae5e022f87f74c9a717>

But let's say that you want to be able to search across all research data at once, and that data in some specific project is also considered research data for the purposes of that search. Then you might do better to put that information (a type of resource) into a property, like a literal or an rdf:type.

Fedora 4 offers you a much more flexible and powerful set of techniques and possible practices for data modeling, but data modeling is still work and it's still the heart of what it is to "do Fedora", so look forward to it!
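
The split described above (containers for the partitioning category, a property for the cross-cutting one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Resource` class, the short IDs, and the type names such as `ResearchData` are hypothetical, and a real repository would answer these questions through its own query or index layer.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    path: str  # containment path, e.g. /institution/container/id
    types: set = field(default_factory=set)  # cross-cutting categories

    @property
    def institution(self) -> str:
        """Institution is the first path segment: each resource has exactly one."""
        return self.path.strip("/").split("/")[0]


# Hypothetical resources: project X data also counts as research data.
resources = [
    Resource("/uconn/researchdata/a1", {"ResearchData"}),
    Resource("/uconn/projectX/b2", {"ResearchData", "ProjectX"}),
    Resource("/uconn/projectY/c3", {"ProjectY"}),
    Resource("/barnum/researchdata/d4", {"ResearchData"}),
]


def owned_by(resources, institution):
    """Partitioning question: answered from the containment path alone."""
    return [r.path for r in resources if r.institution == institution]


def with_type(resources, type_name):
    """Cross-cutting question: answered from the multivalued type property."""
    return [r.path for r in resources if type_name in r.types]


print(owned_by(resources, "barnum"))
print(with_type(resources, "ResearchData"))
```

Here the search for all research data picks up the project X resource even though it does not live under a `researchdata` container, which is the case where a property fits better than containment.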

@dannylamb
Contributor

@rosiel I'm using 'multitenancy' to describe when more than one group/organization is sharing a single Fedora.

@dannylamb
Contributor

Linking to #926

@kstapelfeldt kstapelfeldt added Subject: Multi-tenancy related to having content from multiple Drupal sites in one system Type: use case proposes a new feature or function for the software using user-first language. and removed Multi-tenancy labels Sep 25, 2021
9 participants