migrate the site objects to LazyCollection instances
pjc09h committed Jun 9, 2024
1 parent edca738 commit 5915b29
Showing 67 changed files with 929 additions and 1,158 deletions.
111 changes: 107 additions & 4 deletions CONTRIBUTING.md
@@ -8,18 +8,121 @@ The application as a whole is object-oriented at a rather low level where object
The custom model system has a sensible amount of helpers but you're responsible for many things:

- custom CRUD operations, e.g., transparently handling JSON and arrays
- defining relationships to active, e.g., `$request->loadTorrentGroups()`
- defining relationships to active, e.g., `$collage->torrentGroups()`
- all database logic, especially for linking tables and API integrations

The core objects all follow the [JSON:API specification format](https://jsonapi.org/format/1.2/) from instantiation.
Relationships can be loaded one at a time, with `["id" => "string", "type" => "string"]` pairs the default.
These objects are available to supported clients (e.g., `$app->env->executionContext`) to use as needed.

## Request timeline breakdown
### Request timeline breakdown

A typical request starts in `/public/index.php` to bootstrap the correct client.
Web and API requests each have their own bootstrap logic (so does the CLI).
In either case, the application makes more checks and starts the Flight router.
This maps routes to `require` statements, e.g., `/sections/torrentGroups/browse.php`.
These files call methods, e.g., `Gazelle\TorrentGroups->loadTorrents()` to query data.
This data goes to either a Twig template or a JSON response; both use JSON:API objects.
These files call methods, e.g., `Gazelle\TorrentGroups->torrents()` to query data.
This data goes to either a Twig template or a JSON response; both use JSON:API objects.

## Object API

There is a set of core objects in `/app/Models` that are implemented as Laravel `LazyCollection` instances.
This has the dual benefit of making the objects memory efficient and also making them immutable (to prevent ORM creep).
The API generally follows Laravel method naming conventions, e.g., `updateOrCreate()`, but without magic.
Here's a simple example of how you'd work with an object:

```php
$id ??= null;

# read on instantiation
$torrentGroup = new Gazelle\TorrentGroups($id);
!d($torrentGroup);

# relationships are always available
$torrents = $torrentGroup->torrents();
!d($torrents);

# create or update any or all data
$data = [
"title" => "new title",
"subject" => "new subject",
"object" => "new object",
];

# get the new object back
$newTorrentGroup = $torrentGroup->updateOrCreate($data);
!d($newTorrentGroup);

# use any laravel method on the instance
# https://laravel.com/docs/master/collections#the-enumerable-contract
```

### Attributes

Attributes hold the main metadata of the object, a `LazyCollection` based on the JSON:API specification.
JSON is deserialized on read, but the raw strings can always be obtained with, e.g., `$creator->attributes->raw("concepts")`.
Remember that `LazyCollection` instances are immutable so this code won't work:

```php
$id ??= null;

$literature = new Gazelle\Literature($id);
$literature->title = "new title";

$literature->save();
```
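
The decode-on-read behavior can be sketched without Laravel. This is a minimal illustration only: `AttributeBag` is a hypothetical stand-in backed by a plain array, not the `Gazelle\LazyCollection` class, though its `__get()` and `raw()` methods mirror the ones added in `app/LazyCollection.php`.

```php
<?php

declare(strict_types=1);

# hypothetical stand-in for illustration only, not Gazelle\LazyCollection
final class AttributeBag
{
    public function __construct(private readonly array $items)
    {
    }

    # decode json strings on read, otherwise return the stored value
    public function __get(string $key): mixed
    {
        $value = $this->items[$key] ?? null;

        if (is_string($value)) {
            $decoded = json_decode($value, true);
            if (json_last_error() === JSON_ERROR_NONE) {
                return $decoded;
            }
        }

        return $value;
    }

    # return the stored value untouched
    public function raw(string $key): mixed
    {
        return $this->items[$key] ?? null;
    }
}

$attributes = new AttributeBag([
    "name" => "Ada Lovelace",
    "concepts" => '["mathematics","computing"]',
]);

$decoded = $attributes->concepts;    # decoded into a php array
$raw = $attributes->raw("concepts"); # the original json string
$name = $attributes->name;           # not valid json, returned as stored
```

One side effect worth knowing: any string that happens to be valid JSON is decoded, so a stored `"123"` would come back as the integer `123` through `__get()`. Use `raw()` when the exact string matters.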

### Relationships

Relationships are implemented as reciprocal 1 : 1 "links" stored in one table for each object, e.g., `creators_links`.
There are no concepts of ownership, one-to-one vs. one-to-many, or any kind of hierarchy or definition involved.

As a result, the metadata ecosystem is flat so it's possible to call, e.g., `organizations()` from any object.
This lazily returns either an array of objects or an empty array, ready for loops and compatible with JSON:API.

Each object is also primed with dehydrated `relationships`, e.g., `["id" => "666", "type" => "torrents"]`.
Twig also treats dot notation as method calls, so `{{ torrentGroup.torrents }}` works like `$torrentGroup->torrents()`.
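
As a rough sketch of what "reciprocal" means here, the snippet below resolves links from either end and returns JSON:API-style identifier pairs. The in-memory `$links` array and the `related()` helper are assumptions made up for illustration; the real link tables (e.g., `creators_links`) and their columns aren't shown in this document.

```php
<?php

declare(strict_types=1);

# hypothetical helper: find everything of type $wanted linked to ($type, $id)
function related(array $links, string $type, string $id, string $wanted): array
{
    $found = [];

    foreach ($links as [$left, $right]) {
        # a link has no owner, so check it from both ends
        foreach ([[$left, $right], [$right, $left]] as [$from, $to]) {
            if ($from["type"] === $type && $from["id"] === $id && $to["type"] === $wanted) {
                $found[] = $to;
            }
        }
    }

    return $found;
}

# hypothetical link rows, each one a pair of resource identifiers
$links = [
    [["id" => "1", "type" => "torrentGroups"], ["id" => "7", "type" => "creators"]],
    [["id" => "7", "type" => "creators"], ["id" => "3", "type" => "organizations"]],
];

$creators = related($links, "torrentGroups", "1", "creators");
$groups = related($links, "creators", "7", "torrentGroups");
$none = related($links, "torrentGroups", "1", "organizations");
```

The same row answers both "which creators belong to this torrent group" and "which torrent groups belong to this creator," and an unrelated type yields an empty array that is safe to loop over.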

## Search, metadata, and indexing

The search engine is Manticore and its indexing strategy is conceptually similar to "links."
It attempts to index every attribute of every object associated with each index, uniquely prefixed.
This massive amount of data is filtered into specific form fields and matched against user inputs.
At the cost of some boilerplate in code, it allows for searches like this:

```php
# map of search form fields => index fields
private array $fieldMaps = [
"shared" => [
"creators" => ["creators_openAlexId", "creators_orcid", "creators_scopusId", "creators_semanticScholarId", "creators_name", "creators_slug", "creators_aliases"],
"literature" => ["literature_doi", "literature_openAlexId", "literature_semanticScholarId", "literature_title", "literature_slug", "literature_bibtex", "literature_abstract"],

"workgroups" => ["torrentGroups_workgroup", "creators_affiliations", "creators_affiliationsOverTime", "organizations_grid", "organizations_openAlexId", "organizations_rorId", "organizations_wikidataId", "organizations_name", "organizations_slug", "organizations_acronym", "organizations_reverseGeocode"],
"locations" => ["torrentGroups_location", "organizations_latitude", "organizations_longitude", "organizations_reverseGeocode", "organizations_country", "organizations_state", "organizations_city", "organizations_postalCode"],

# etc., as broad or granular as desired, for any attribute
],
];
```
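
To make the mapping concrete, here is a hedged sketch of how one form field could expand into a Manticore multi-field match expression using the `@(field1,field2)` operator. The `matchClause()` helper and the trimmed-down `$fieldMap` are hypothetical, and the sketch assumes `$term` has already been escaped for full-text search.

```php
<?php

declare(strict_types=1);

# hypothetical helper: expand one form field into a multi-field match clause
# assumes $term is already escaped for the full-text query syntax
function matchClause(array $fieldMap, string $formField, string $term): ?string
{
    $fields = $fieldMap[$formField] ?? [];
    if ($fields === []) {
        # unknown form field, nothing to search
        return null;
    }

    return "@(" . implode(",", $fields) . ") " . $term;
}

# a shortened map using index field names from the example above
$fieldMap = [
    "creators" => ["creators_name", "creators_slug"],
    "literature" => ["literature_title", "literature_abstract"],
];

$clause = matchClause($fieldMap, "creators", "lovelace");
$missing = matchClause($fieldMap, "unknown", "lovelace");
```

Keeping the map as data means a form field can be made broader or narrower by editing an array rather than the query-building code.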

### Remote metadata

A significant amount of metadata for objects other than torrents, collages, and requests is acquired programmatically.
These objects are creators, literature, publications, and organizations and are called the "ecosystem."
The data comes from a variety of sources including OpenAlex, Crossref, Semantic Scholar, ROR, Google, etc.

To prevent public API abuse and the infinite growth of metadata, objects have a `degreesOfSeparation` and a `failCount`.
The `degreesOfSeparation` threshold is configurable (default `6`) and measures how distantly related an object is to a piece of user-generated content (UGC).
The `failCount` (default `3`) determines how many API error responses, e.g., `404`, to tolerate for the content.
Automated queries for remote metadata cease once either threshold is crossed for any particular object.
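
The two thresholds can be read as a single guard. The sketch below is an assumption about the semantics, not the project's code: `shouldQueryRemote()` is a hypothetical name, and "crossed" is taken to mean strictly greater than the configured limit.

```php
<?php

declare(strict_types=1);

# hypothetical guard: keep querying only while both values are within limits
# assumes "crossed" means strictly greater than the configured threshold
function shouldQueryRemote(
    int $degreesOfSeparation,
    int $failCount,
    int $maxDegrees = 6,
    int $maxFails = 3
): bool {
    return $degreesOfSeparation <= $maxDegrees && $failCount <= $maxFails;
}

$fresh = shouldQueryRemote(2, 0);      # close to ugc, no failures
$atLimit = shouldQueryRemote(6, 3);    # on both limits, still allowed
$tooDistant = shouldQueryRemote(7, 0); # too many degrees of separation
$tooFlaky = shouldQueryRemote(1, 4);   # too many api errors
```

If the real comparison is inclusive instead, only the two operators change.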

### Typeahead search

Typeahead search (autocomplete) is implemented with [`corejavascript/typeahead.js`](https://github.com/corejavascript/typeahead.js).
Bloodhound uses a single remote data source that's an internal API endpoint.
It calls `Gazelle\Autocomplete->fetch()` and returns `[ ["id" => string, "text" => string, "openAlexId" => string, "isLocal" => bool] ]`.
Remote data is fetched by default, but this can be disabled with a flag.

## Utilities and developer tools

todo
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright (c) 2023 Omics Tools LLC <hello@torrents.bio>
Copyright (c) 2024 Omics Tools LLC <hello@torrents.bio>

Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
59 changes: 39 additions & 20 deletions app/Database.php
@@ -304,47 +304,54 @@ public function determineId(int|string $id, bool $insteadExtractId = false): str
$app = App::go();

# cast to string
$id = strval($id);

# openAlex
$good = preg_match("/{$app->env->regexOpenAlex}/", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "openAlexId" : $matches[0]);
}
$id = urldecode(strval($id));

# doi
$good = preg_match("/{$app->env->regexDoi}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "doi" : $matches[0]);
}

# orcid
$good = preg_match("/{$app->env->regexOrcid}/i", $id, $matches);
# info hash
$good = preg_match("/{$app->env->regexInfoHash}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "orcid" : $matches[0]);
return (!$insteadExtractId ? "info_hash" : $matches[0]);
}

# issn
$good = preg_match("/{$app->env->regexIssn}/", $id, $matches);
$good = preg_match("/{$app->env->regexIssn}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "issn" : $matches[0]);
}

# rorId
$good = preg_match("/{$app->env->regexRor}/", $id, $matches);
# openAlex
$good = preg_match("/{$app->env->regexOpenAlex}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "openAlexId" : $matches[0]);
}

# orcid
$good = preg_match("/{$app->env->regexOrcid}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "orcid" : $matches[0]);
}

# rorId: collides with id
$good = preg_match("/{$app->env->regexRor}/i", $id, $matches);
if ($good && strlen($id) !== 18) {
return (!$insteadExtractId ? "rorId" : $matches[0]);
}

# default id
return (!$insteadExtractId ? "id" : $id);

/*
# normal numeric id
if (is_int($id) || is_numeric($id)) {
$good = preg_match("/{$app->env->regexShortUuid}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "id" : $id);
}
# default slug
return (!$insteadExtractId ? "slug" : $id);

/*
# https://ihateregex.io/expr/uuid/
if (is_string($id) && strlen($id) === 36 && preg_match("/{$app->env->regexUuid}/iD", $id)) {
return "uuid";
@@ -354,6 +361,18 @@ public function determineId(int|string $id, bool $insteadExtractId = false): str
if (Text::isBinary($id) && strlen($id) === 16) {
return "uuid";
}
# semantic scholar author
$good = preg_match("/{$app->env->regexSemanticScholarAuthor}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "semanticScholarId" : $matches[0]);
}
# semantic scholar paper
$good = preg_match("/{$app->env->regexSemanticScholarPaper}/i", $id, $matches);
if ($good) {
return (!$insteadExtractId ? "semanticScholarId" : $matches[0]);
}
*/
}

@@ -386,8 +405,8 @@ public function fullId(int|string $id): string
$uniqueId = $this->extractId($id);

return match ($column) {
"openAlexId" => "https://openalex.org/{$uniqueId}",
"doi" => "https://doi.org/{$uniqueId}",
"openAlexId" => "https://openalex.org/{$uniqueId}",
"orcid" => "https://orcid.org/{$uniqueId}",
"rorId" => "https://ror.org/{$uniqueId}",
default => strval($id),
@@ -743,7 +762,7 @@ public function upsert(string $table, array $data = []): array

# it was updated, resolve a key from the data
foreach ($data as $key => $value) {
if (in_array(strtolower(strval($key)), ["id", "uuid", "slug"])) {
if (in_array(strtolower(strval($key)), ["id"])) {
$column = $this->determineId($value);
$query = "select * from {$table} where {$column} = ?";
return $this->row($query, [$value], ["hostname" => "source"]);
3 changes: 2 additions & 1 deletion app/Http.php
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,8 @@
class Http
{
# cookie params
private static string $cookiePrefix = "__Secure-";
private static string $cookiePrefix = "";
#private static string $cookiePrefix = "__Secure-";
private static string $cookieDuration = "tomorrow";


114 changes: 114 additions & 0 deletions app/LazyCollection.php
@@ -0,0 +1,114 @@
<?php

declare(strict_types=1);


/**
* LazyCollection
*
* Laravel LazyCollection wrapper intended for site objects.
* This makes all objects immutable, something to remember.
*
* @see https://laravel.com/docs/master/collections
* @see https://github.com/spatie/laravel-collection-macros
*/

namespace Gazelle;

class LazyCollection extends \Illuminate\Support\LazyCollection
{
/**
* __get
*
* @param mixed $key the key to get
* @return mixed the value of the key
*
* @see https://laravel.com/docs/master/collections#method-get
*/
public function __get(mixed $key): mixed
{
# native laravel function
$value = $this->get($key);

# try to decode any json fields that might be present
# (because custom reading is tedious and no longer works)
if (is_string($value)) {
$good = json_decode($value, true);
if (json_last_error() === JSON_ERROR_NONE) {
return $good;
}
}

return $value;
}


/**
* raw
*
* @param mixed $key the key to get
* @return mixed the value of the key
*
* @see https://laravel.com/docs/master/collections#method-get
*/
public function raw(mixed $key): mixed
{
return $this->get($key);
}


/**
* __isset
*
* @param mixed $key the key to check
* @return bool whether the key is set
*
* @see https://laravel.com/docs/master/collections#method-has
*/
public function __isset(mixed $key): bool
{
return $this->has($key);
}


/**
* __unset
*
* @param mixed $key the key to unset
* @return void
*
* @see https://laravel.com/docs/master/collections#method-forget
*/
public function __unset(mixed $key): void
{
$this->forget($key);
}


/**
* toArray
*
* Recursively convert the collection to an array.
*
* @return array
*/
public function toArray(): array
{
$array = [];

foreach ($this as $key => $value) {
if ($value instanceof self) {
$array[$key] = $value->toArray();
} else {
$array[$key] = $value;
}
}

return $array;

/*
# the old standby, just in case
return json_decode($this->toJson(), true);
*/
}
} # class