Atomizing is about turning non-atomic data into atomic data. This (often) means converting some existing file into JSON-AD, and then sending it / publishing it to an Atomic Server Importer #390.
Ideally, we'd have one application (the Atomizer) that can:
run as a CLI. Pipe to your locally running Atomic Server to have a highly performant importer.
run on the server. Upload a file / link to a file and the server will automatically perform conversions / atomizations. Or atomize already uploaded files automagically.
run in the browser. Let the JS client perform imports. Highly flexible, can even ask user for extra input when needed.
That will be able to:
Recognize files and file types
Convert them to JSON-AD
Send them to an Atomic Server
Considerations:
How do we deal with changes to files? I suppose we'd create new commits that typically never remove any properties, but do overwrite them.
Should users be able to manually re-trigger extracting data from a source file? Does this need to be an endpoint?
The function signature seems to be very simple: File + Parent in, JSON-AD out.
I suppose the Atomizer needs to know the parent: where the resource needs to go. It also needs to know how to upload the file. This can be extracted from the parent URL, e.g. example.com/upload
How do we deal with changes to JSON-AD? Let's say a user edits the location on an image, which originated from the EXIF data. The user might expect this would update the values on the image file. However, it does not do this - it only updates the AD resource. This could definitely be confusing. We could solve this by adding write capabilities, but that would definitely make things far more complicated. Another solution is to just not allow updates to metadata.
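The "File + Parent in, JSON-AD out" signature mentioned above could look roughly like the sketch below. Everything here is an assumption for illustration: `SourceFile`, `Resource` and `atomize` are hypothetical names, the property keys are short placeholders rather than real Atomic Data property URLs, and the output is a plain map instead of serialized JSON-AD.

```rust
use std::collections::BTreeMap;

/// Hypothetical input type: the raw bytes of a file plus its name.
pub struct SourceFile {
    pub name: String,
    pub bytes: Vec<u8>,
}

/// Hypothetical stand-in for a JSON-AD resource: a subject URL plus
/// property-value pairs. A real implementation would use property URLs as keys.
#[derive(Debug, PartialEq)]
pub struct Resource {
    pub subject: String,
    pub propvals: BTreeMap<String, String>,
}

/// File + Parent in, resources out.
/// This fallback only records generic file metadata; it returns a Vec because
/// one source file might yield more than one resource.
pub fn atomize(file: &SourceFile, parent: &str) -> Vec<Resource> {
    let mut propvals = BTreeMap::new();
    // Placeholder keys, not actual Atomic Data properties.
    propvals.insert("parent".to_string(), parent.to_string());
    propvals.insert("filename".to_string(), file.name.clone());
    propvals.insert("filesize".to_string(), file.bytes.len().to_string());

    // Derive the subject from the parent URL and the file name.
    let subject = format!("{}/{}", parent.trim_end_matches('/'), file.name);
    vec![Resource { subject, propvals }]
}
```

Keeping the function pure (no network calls) would let the same code run in the CLI, on the server, and later in WASM; uploading the result is then a separate step.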
Implementation
I think a sensible technological approach is to write all of this in Rust, as a new crate inside this repo. Because it's Rust, we can easily embed it in Atomic Server. Also, we can still (later) compile it to WASM and run it in the browser.
I'd like users to register handlers for various file types as plugins. Each handler can be registered for a specific MIME type, and has a handler function that reads a bunch of bytes and creates one (or more?) resources.
Therefore it might make sense to have a bunch of plugins that do this.
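A minimal sketch of such a handler registry, assuming handlers are plain functions keyed by MIME type. `Registry`, `Handler`, `Resource` and `plaintext_handler` are hypothetical names, not an existing API in this repo, and resources are simplified to string maps.

```rust
use std::collections::{BTreeMap, HashMap};

/// Simplified resource: property name -> value (an assumption for this sketch).
pub type Resource = BTreeMap<String, String>;

/// A handler reads a bunch of bytes and creates one or more resources.
pub type Handler = fn(&[u8]) -> Vec<Resource>;

/// Maps MIME types to the handler registered for them.
#[derive(Default)]
pub struct Registry {
    handlers: HashMap<String, Handler>,
}

impl Registry {
    /// Registers (or replaces) the handler for a MIME type.
    pub fn register(&mut self, mime: &str, handler: Handler) {
        self.handlers.insert(mime.to_string(), handler);
    }

    /// Runs the matching handler. Returns None when no handler is registered,
    /// so the caller can fall back to uploading the file as-is.
    pub fn atomize(&self, mime: &str, bytes: &[u8]) -> Option<Vec<Resource>> {
        self.handlers.get(mime).map(|handler| handler(bytes))
    }
}

/// Example handler: stores the (lossily decoded) text in a single resource.
pub fn plaintext_handler(bytes: &[u8]) -> Vec<Resource> {
    let mut resource = Resource::new();
    resource.insert("text".to_string(), String::from_utf8_lossy(bytes).into_owned());
    vec![resource]
}
```

Function pointers keep the sketch small; a trait object (`Box<dyn Fn(...)>`) would be the more flexible choice if plugins need to carry state.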
MIME type recognition. There are tools that help to identify the file type. A notable one is libmagic, and its Rust wrapper magic. A lightweight alternative that only uses file extensions is mime_guess.
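To illustrate the two strategies (content sniffing versus extension lookup), here is a std-only sketch. It is a hand-rolled stand-in and deliberately does not use the APIs of libmagic, magic or mime_guess; the function names and the small set of supported types are assumptions.

```rust
use std::path::Path;

/// Content-based detection: checks well-known leading "magic" bytes.
pub fn sniff(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("image/png")
    } else if bytes.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else if bytes.starts_with(&[0xFFu8, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else {
        None
    }
}

/// Extension-based detection: cheap, but trusts the file name.
pub fn by_extension(name: &str) -> Option<&'static str> {
    let ext = Path::new(name).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "md" => Some("text/markdown"),
        "csv" => Some("text/csv"),
        "txt" => Some("text/plain"),
        "pdf" => Some("application/pdf"),
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        _ => None,
    }
}

/// Prefers the content, falls back to the extension, and finally to a
/// generic binary type (which would mean: upload as a plain file).
pub fn detect(name: &str, bytes: &[u8]) -> &'static str {
    sniff(bytes)
        .or_else(|| by_extension(name))
        .unwrap_or("application/octet-stream")
}
```

Text formats such as Markdown or CSV have no magic bytes, which is why the extension fallback is still needed even when content sniffing is available.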
Filetypes / data types to atomize:
Files. We already have the File class, which describes a file with some size and some filetype. If the Atomizer does not know what to do with a file, it will simply upload it as-is: as a file.
Plaintext files. Anything that's a programming language file, or other text file, can be converted into this.
Markdown. These can be converted to Articles. We still need a proper model for this.
Documents. PDF, Word, etc. can be converted to plaintext, which makes them searchable. Crates: pdf-extract
Image. Similar to Files, but with extra data, such as EXIF location / camera / aperture / ISO, etc. These can be transformed to smaller images using an endpoint (see Image endpoint (resize, crop) #257).
CSV. Parse the headers and convert it to a Table, with Class + Properties.
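As an example of one of these conversions, the CSV case could be sketched as follows: the header row becomes the list of properties and each following line becomes a row keyed by those properties. `Table` and `atomize_csv` are hypothetical names, and the parser is deliberately naive (no quoted fields or escaped commas); a real implementation would likely use a CSV crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical table model: property names taken from the header,
/// plus one map per data row.
pub struct Table {
    pub properties: Vec<String>,
    pub rows: Vec<BTreeMap<String, String>>,
}

/// Parses the header line into properties and every other line into a row.
/// Returns None for input without a header line.
/// Naive on purpose: splits on ',' and does not handle quoting.
pub fn atomize_csv(input: &str) -> Option<Table> {
    let mut lines = input.lines().filter(|line| !line.trim().is_empty());

    let properties: Vec<String> = lines
        .next()?
        .split(',')
        .map(|header| header.trim().to_string())
        .collect();

    // Pair each cell with the property at the same position.
    // Extra cells are dropped; missing cells leave the property unset.
    let rows: Vec<BTreeMap<String, String>> = lines
        .map(|line| {
            properties
                .iter()
                .cloned()
                .zip(line.split(',').map(|cell| cell.trim().to_string()))
                .collect()
        })
        .collect();

    Some(Table { properties, rows })
}
```

In the model described above, `properties` would map to the Properties of a generated Class and each row to a resource that is an instance of that Class.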
Inspiration: