Merge pull request #787 from aaxelb/eng-2100--docs
[ENG-2100] docs: delete old, add new
Showing 37 changed files with 313 additions and 2,456 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
# Architecture of SHARE/Trove | ||
|
||
This document is a starting point and reference to familiarize yourself with this codebase. | ||
|
||
## Bird's eye view | ||
In short, SHARE/Trove takes metadata records (in any supported input format), | ||
ingests them, and makes them available in any supported output format. | ||
``` | ||
┌───────────────────────────────────────────┐ | ||
│ Ingest │ | ||
│ ┌──────┐ │ | ||
│ ┌─────────────────────────┐ ┌──►Format├─┼────┐ | ||
│ │ Normalize │ │ └──────┘ │ │ | ||
│ │ │ │ │ ▼ | ||
┌───────┐ │ │ ┌─────────┐ ┌────────┐ │ │ ┌──────┐ │ save as | ||
│Harvest├─┬─┼─┼─►Transform├──►Regulate├─┼─┬─┼──►Format├─┼─┬─►FormattedMetadataRecord | ||
└───────┘ │ │ │ └─────────┘ └────────┘ │ │ │ └──────┘ │ │ | ||
│ │ │ │ │ . │ │ ┌───────┐ | ||
│ │ └─────────────────────────┘ │ . │ └──►Indexer│ | ||
│ │ │ . │ └───────┘ | ||
│ └─────────────────────────────┼─────────────┘ some formats also | ||
│ │ indexed separately | ||
▼ ▼ | ||
save as save as | ||
RawDatum NormalizedData | ||
``` | ||
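As a rough illustration of the diagram above, the ingest step can be read as function composition: transform, then regulate, then one formatted output per supported format. This is a minimal sketch, not the code in `share/`; `IngestResult`, `ingest`, and the toy transform/regulate/format callables are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class IngestResult:
    """Hypothetical container mirroring the three things the diagram saves."""
    raw_datum: str              # what Harvest fetched, kept as-is
    normalized: dict            # output of Transform followed by Regulate
    formatted: Dict[str, str]   # one formatted record per output format


def ingest(
    raw_datum: str,
    transform: Callable[[str], dict],
    regulate: Callable[[dict], dict],
    formatters: Dict[str, Callable[[dict], str]],
) -> IngestResult:
    # Normalize = Transform, then Regulate.
    normalized = regulate(transform(raw_datum))
    # Format once per supported output format.
    formatted = {name: fmt(normalized) for name, fmt in formatters.items()}
    return IngestResult(raw_datum, normalized, formatted)


# Toy stand-ins (assumptions for illustration only).
def transform_csv_line(raw: str) -> dict:
    title, author = raw.split(",", 1)
    return {"title": title, "author": author}


def regulate_strip_whitespace(normalized: dict) -> dict:
    # A "rule applied to every normalized datum", whatever its source.
    return {key: value.strip() for key, value in normalized.items()}


formatters = {
    "plain": lambda n: f"{n['title']} by {n['author']}",
    "upper_title": lambda n: n["title"].upper(),
}

result = ingest(
    "  A Study of Things , Ada Lovelace ",
    transform_csv_line,
    regulate_strip_whitespace,
    formatters,
)
print(result.formatted["plain"])
```

Passing the stages in as callables keeps the sketch independent of any particular source format, which is the property the diagram emphasizes: only the transform stage needs to know what the raw data looked like.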
|
||
## Code map | ||
|
||
A brief look at important areas of code as they happen to exist now. | ||
|
||
### Static configuration | ||
|
||
`share/schema/` describes the "normalized" metadata schema/format that all | ||
metadata records are converted into when ingested. | ||
|
||
`share/sources/` describes a starting set of metadata sources that the system | ||
could harvest metadata from -- these will be put in the database and can be | ||
updated or added to over time. | ||
|
||
`project/settings.py` describes system-level settings which can be set by | ||
environment variables (and their default values), as well as settings | ||
which cannot. | ||
|
||
`share/models/` describes the data layer using the [Django](https://www.djangoproject.com/) ORM. | ||
|
||
`share/subjects.yaml` describes the "central taxonomy" of subjects allowed | ||
in `Subject.name` fields of `NormalizedData`. | ||
|
||
### Harvest and ingest | ||
|
||
`share/harvest/` and `share/harvesters/` describe how metadata records | ||
are pulled from other metadata repositories. | ||
|
||
`share/transform/` and `share/transformers/` describe how raw data (possibly | ||
in any format) are transformed to the "normalized" schema. | ||
|
||
`share/regulate/` describes rules which are applied to every normalized datum, | ||
regardless of where it originally came from or what format it was in. | ||
|
||
`share/metadata_formats/` describes how a normalized datum can be formatted | ||
into any supported output format. | ||
|
||
`share/tasks/` runs the harvest/ingest pipeline and stores each task's status | ||
(including debugging info, if errored) as a `HarvestJob` or `IngestJob`. | ||
|
||
### Outward-facing views | ||
|
||
`share/search/` describes how the search indexes are structured, managed, and | ||
updated when new metadata records are introduced -- this provides a view for | ||
discovering items based on arbitrary search criteria. | ||
|
||
`share/oaipmh/` describes the [OAI-PMH](https://www.openarchives.org/OAI/openarchivesprotocol.html) | ||
view for harvesting metadata from SHARE/Trove in bulk. | ||
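For readers unfamiliar with OAI-PMH, a harvest request is an HTTP GET with a `verb` query parameter plus verb-specific arguments. The sketch below only builds request URLs and makes no network calls. The base URL is an assumption made for illustration, so check the deployment you are targeting; `ListRecords`, `metadataPrefix`, `from`, and the `oai_dc` format come from the OAI-PMH specification itself.

```python
from urllib.parse import urlencode

# Assumption: the endpoint location is illustrative, not confirmed by this document.
OAI_PMH_BASE_URL = "https://share.osf.io/oai-pmh/"


def build_oai_pmh_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{OAI_PMH_BASE_URL}?{query}"


# Every OAI-PMH repository is required to support the `oai_dc` metadata format.
list_records_url = build_oai_pmh_url("ListRecords", metadataPrefix="oai_dc")

# `from` is a Python keyword, so it has to be passed through a dict.
incremental_url = build_oai_pmh_url(
    "ListRecords", metadataPrefix="oai_dc", **{"from": "2021-07-01"}
)

print(list_records_url)
print(incremental_url)
```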
|
||
`api/` describes a mostly REST-ful API that's useful for inspecting records for | ||
a specific item of interest. | ||
|
||
### Internals | ||
|
||
`share/admin/` is a Django app for administrative access to the SHARE database | ||
and pipeline logs. | ||
|
||
`osf_oauth2_adapter/` is a Django app to support logging in to SHARE via OSF. | ||
|
||
### Testing | ||
|
||
`tests/` are tests. | ||
|
||
## Cross-cutting concerns | ||
|
||
### Immutable metadata | ||
|
||
Metadata records at all stages of the pipeline (`RawDatum`, `NormalizedData`, | ||
`FormattedMetadataRecord`) should be considered immutable -- any updates | ||
result in a new record being created, not an old record being altered. | ||
|
||
Multiple records which describe the same item/object are grouped by a | ||
"source-unique identifier" or "suid" -- essentially a two-tuple | ||
`(source, identifier)` that uniquely and persistently identifies an item in | ||
the source repository. In most outward-facing views, default to showing only | ||
the most recent record for each suid. | ||
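The two ideas above (records are never altered, and views show the latest record per suid) can be sketched with an append-only list. This is an in-memory illustration only, not the Django models in `share/models/`; `Suid`, `Record`, `AppendOnlyStore`, and the `version` field are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple


class Suid(NamedTuple):
    """Source-unique identifier: the (source, identifier) two-tuple."""
    source: str
    identifier: str


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Record:
    suid: Suid
    version: int    # hypothetical ordering field for this sketch
    metadata: str


class AppendOnlyStore:
    """Updates append a new record; existing records are never modified."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def save(self, suid: Suid, metadata: str) -> Record:
        version = sum(1 for r in self._records if r.suid == suid) + 1
        record = Record(suid, version, metadata)
        self._records.append(record)
        return record

    def latest_by_suid(self) -> Dict[Suid, Record]:
        latest: Dict[Suid, Record] = {}
        for record in self._records:
            # Records are stored in insertion order, so later ones win.
            latest[record.suid] = record
        return latest


store = AppendOnlyStore()
item = Suid("example-repo", "doi:10.1234/abc")
first = store.save(item, "Draft title")
second = store.save(item, "Final title")

# The older record still exists unchanged; the default view shows the newer one.
assert first.metadata == "Draft title"
assert store.latest_by_suid()[item] is second
```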
|
||
## Why this? | ||
Inspired by [this writeup](https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html) | ||
and [this example architecture document](https://github.com/rust-analyzer/rust-analyzer/blob/d7c99931d05e3723d878bea5dc26766791fa4e69/docs/dev/architecture.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,71 +1,7 @@ | ||
# CONTRIBUTING | ||
|
||
## Style Guide | ||
|
||
In the following templates, `TYPE` may be any of `Task`, `Bug`, `Feature`, `Improvement`, or `Quick`. | ||
|
||
### Commit Messages | ||
|
||
Commit messages should be formatted as: | ||
|
||
``` | ||
[SHARE-###][TYPE] Brief description | ||
* More details about the code changes. | ||
* Formatted as a bulleted list | ||
* If you have a really long line, wrap it | ||
at 80 characters and line up with the first | ||
letter, not the bullet point. | ||
``` | ||
|
||
Here are some excellent commit messages, for reference. | ||
* https://github.com/CenterForOpenScience/SHARE/commit/0fe503f0dc5f90da366246086ae76ee5281843cf | ||
* https://github.com/CenterForOpenScience/SHARE/commit/226bac6a9010cde6aed7ac037c9186ac889b5132 | ||
* https://github.com/CenterForOpenScience/SHARE/commit/0e02dbb9d06920623e0dfb6a32fd1b38771de74b | ||
|
||
### Pull Requests | ||
|
||
Titles should be formatted as `[SHARE-###][TYPE] Brief description` | ||
|
||
Here are some excellent pull requests, for reference. | ||
* https://github.com/CenterForOpenScience/SHARE/pull/658 | ||
* https://github.com/CenterForOpenScience/SHARE/pull/642 | ||
|
||
### Code | ||
|
||
#### Docstrings | ||
|
||
Python docstrings should follow the [Google docstring style guide](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html). | ||
|
||
To easily distinguish them, docstrings should use triple double-quotes, `"""`, and large strings should use triple single-quotes, `'''` | ||
|
||
## Reporting Issues | ||
|
||
If you find a bug in osf.io or would like to propose a new feature, please file an issue report in CenterForOpenScience/osf.io. Below we have some information on how to best report the issue, but if you’re short on time or new to this, don’t worry! We really want to know about the problem, so go ahead and report it. If you do this a lot, or you just want to know how to make it easier for us to find and fix the problem, keep reading. | ||
|
||
If you would like to report a security issue, please email contact@cos.io for instructions on how to report the security issue. Do not include details of the issue in that email. | ||
|
||
### Quick link | ||
[Submit an issue](https://github.com/CenterForOpenScience/SHARE/issues/new?body=Steps%0A-------%0A1.%20%0A%0AExpected%0A------------%0A%0AActual%0A--------%0A) | ||
using that link and you will have a handy template to save you a little time in your issue reporting. | ||
|
||
### How to make the best issue | ||
-------------------------- | ||
|
||
First, please make sure that the issue has not already been reported by searching through the issue archives. | ||
|
||
When submitting an issue, be as descriptive as possible: | ||
* What you did (step by step) | ||
* Where does this happen on SHARE? | ||
* What you expected | ||
* What actually happened | ||
* Check the JavaScript console in the browser (e.g. In Chrome go to View → Developer → JavaScript console) and report errors | ||
* If it's an issue with staging, report whether or not it also occurs on production | ||
* If an error was generated, report what time it occurred, and the specific URL. | ||
* Potential causes | ||
* Suggest a solution | ||
* What will it look like when this issue is resolved? | ||
|
||
Include pictures (e.g., in OSX press Cmd+Shift+4 to draw a box to screenshot) | ||
|
||
TODO: how do we want to guide community contributors? | ||
|
||
For now, if you're interested in contributing to SHARE/Trove, feel free to | ||
[open an issue on GitHub](https://github.com/CenterForOpenScience/SHARE/issues) | ||
and start a conversation. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# "What is this, even?" | ||
|
||
Imagine a vast, public library full of the outputs and results of some scientific | ||
research -- shelves full of articles, preprints, datasets, data analysis plans, | ||
and so on. | ||
|
||
You can think of SHARE/Trove as that library's card catalog. | ||
|
||
## "...What is a card catalog?" | ||
|
||
A [card catalog](https://en.wikipedia.org/wiki/Card_catalog) is that weird, cool cabinet you might see at the front of a | ||
library with a bunch of tiny drawers full of index cards -- each index card | ||
contains information about some item on the library shelves. | ||
|
||
The card catalog is where you go when you want to: | ||
- locate a specific item in the library | ||
- discover items related to a specific topic, author, or other keywords | ||
- make a new item easily discoverable by others | ||
|
||
## "OK but what 'library' is this?" | ||
As of July 2021, SHARE/Trove contains metadata on over 4.5 million items originating from: | ||
- [OSF](https://osf.io) (including OSF-hosted Registries and Preprint Providers) | ||
- [RePEc](http://repec.org) | ||
- [arXiv](https://arxiv.org) | ||
- [ClinicalTrials.gov](https://clinicaltrials.gov) | ||
- ...and more! | ||
|
||
Updates from OSF are reflected within seconds, while updates from third-party sources are | ||
harvested once daily. | ||
|
||
## "How can I use it?" | ||
|
||
You can search the full SHARE/Trove catalog at | ||
[share.osf.io/discover](https://share.osf.io/discover). | ||
|
||
Other search pages can also be built on SHARE/Trove, showing only a specific | ||
collection of items. For example, [OSF Preprints](https://osf.io/preprints/discover) | ||
and [OSF Registries](https://osf.io/registries/discover) show only preprints | ||
and registrations, respectively, which are hosted on OSF infrastructure. | ||
|
||
To learn about using the API (instead of a user interface), see | ||
[USING-THE-API.md](./USING-THE-API.md) |
This file was deleted.