Research Obsidian dataview approach to a markdown db #5

Closed
8 of 11 tasks
Tracked by #3
rufuspollock opened this issue Mar 12, 2023 · 5 comments

@rufuspollock
Member

rufuspollock commented Mar 12, 2023

Obsidian Dataview contains a sophisticated markdown DB index. It's open source, and we could learn from it or even reuse some of it.

In-progress notes about Obsidian, where we could include these findings: https://datahub.io/notes/obsidian

Acceptance

The core question we want to answer:

We have researched how Dataview (https://github.com/blacksmithgu/obsidian-dataview) works, specifically:

  • What is the "database" structure?
  • What is indexed? e.g. tags, tasks (where is code for this - see next item)
  • What code does the indexing/parsing?
    • tags extraction e.g. #tag-name
    • tasks extraction
    • ...
  • What is the query API?
  • What is the query language?
  • What is the code for converting queries to db access?
@rufuspollock
Member Author

@demenech please dump any notes you have so far and then let's pause this for now.

@demenech
Member

demenech commented Mar 22, 2023

Dump - 2023-03-22 (not reviewed)

Notes

What's the database structure?

TL;DR:

  • There's a PageMetadata class that contains fields that describe an indexed file, such as:
    • path
    • title
    • tags
    • links (All OUTGOING links (including embeds, header + block links) in this file.)
    • lists
    • frontmatter
  • Some of the fields are not primitive types, i.e. links and lists.
    • Lists is an array of ListItem
      • ListItem is mainly described by:
        • symbol (e.g '*', '1.')
        • line (the line that the list item starts on)
        • line count
        • text
        • tags and links
        • task related metadata (status, checked, completed)
    • Link is mainly described by:
      • path (the path of the file it points to)
      • subpath (the block or header this link points to within a file)
      • type ("file" | "header" | "block")
      • display (the display name associated with the link) and embed (whether the link is embedded or not)

The DB operations are defined in persister.ts (https://github.com/blacksmithgu/obsidian-dataview/blob/master/src/data-import/persister.ts). Based on the following code, it seems that the PageMetadata class defines the metadata entity:

/** Load file metadata by path. */
public async loadFile(path: string): Promise<Cached<Partial<PageMetadata>> | null | undefined> {
    return this.persister.getItem(this.fileKey(path)).then(raw => {
        let result = raw as any as Cached<Partial<PageMetadata>>;
        if (result) result.data = Transferable.value(result.data);
        return result;
    });
}

The PageMetadata class (https://github.com/blacksmithgu/obsidian-dataview/blob/master/src/data-model/markdown.ts#L10) contains the following fields (note that some behave as "calculated fields"):

/** All extracted markdown file metadata obtained from a file. */
export class PageMetadata {
    /** The path this file exists at. */
    public path: string;
    /** Obsidian-provided date this page was created. */
    public ctime: DateTime;
    /** Obsidian-provided date this page was modified. */
    public mtime: DateTime;
    /** Obsidian-provided size of this page in bytes. */
    public size: number;
    /** The day associated with this page, if relevant. */
    public day?: DateTime;
    /** The first H1/H2 header in the file. May not exist. */
    public title?: string;
    /** All of the fields contained in this markdown file - both frontmatter AND in-file links. */
    public fields: Map<string, Literal>;
    /** All of the exact tags (prefixed with '#') in this file overall. */
    public tags: Set<string>;
    /** All of the aliases defined for this file. */
    public aliases: Set<string>;
    /** All OUTGOING links (including embeds, header + block links) in this file. */
    public links: Link[];
    /** All list items contained within this page. Filter for tasks to get just tasks. */
    public lists: ListItem[];
    /** The raw frontmatter for this document. */
    public frontmatter: Record<string, Literal>;

    public constructor(path: string, init?: Partial<PageMetadata>) {
       
      ...
    
    }

    /** The name (based on path) of this file. */
    public name(): string {
        return getFileTitle(this.path);
    }

    /** The containing folder (based on path) of this file. */
    public folder(): string {
        return getParentFolder(this.path);
    }

    /** The extension of this file (likely 'md'). */
    public extension(): string {
        return getExtension(this.path);
    }

    /** Return a set of tags AND all of their parent tags (so #hello/yes would become #hello, #hello/yes). */
    public fullTags(): Set<string> {
        let result = new Set<string>();
        for (let tag of this.tags) {
            for (let subtag of extractSubtags(tag)) result.add(subtag);
        }

        return result;
    }

    /** Convert all links in this file to file links. */
    public fileLinks(): Link[] {
        // We want to make them distinct, but where links are not raw links we
        // now keep the additional metadata.
        let distinctLinks = new Set<Link>(this.links);
        return Array.from(distinctLinks);
    }
}

Note that tags are typed as Set<string>, while links and lists have their own types.
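
The fullTags() method above relies on extractSubtags, which is not shown in this dump. As a rough sketch of the behaviour its doc comment describes (#hello/yes becomes #hello and #hello/yes), something like the following would do. The name expandTag and the array-based fullTags wrapper are hypothetical and written for illustration; this is not Dataview's actual implementation.

```typescript
// Hypothetical helper (not Dataview's extractSubtags): expand a nested tag
// into itself plus all of its parent tags.
function expandTag(tag: string): string[] {
    const body = tag.charAt(0) === "#" ? tag.slice(1) : tag;
    const parts = body.split("/").filter(part => part.length > 0);
    const result: string[] = [];
    for (let i = 1; i <= parts.length; i++) {
        result.push("#" + parts.slice(0, i).join("/"));
    }
    return result;
}

// Same idea as PageMetadata.fullTags(), but over a plain array of tags.
function fullTags(tags: string[]): Set<string> {
    const result = new Set<string>();
    tags.forEach(tag => expandTag(tag).forEach(sub => result.add(sub)));
    return result;
}

// Contains "#hello", "#hello/yes" and "#project".
const expanded = fullTags(["#hello/yes", "#project"]);
console.log(Array.from(expanded));
```

Storing the expanded set makes "all pages under #hello" a simple membership check, at the cost of a slightly larger index.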

Link (https://github.com/blacksmithgu/obsidian-dataview/blob/master/src/data-model/value.ts#L416) is defined as:

/** The Obsidian 'link', used for uniquely describing a file, header, or block. */
export class Link {
    /** The file path this link points to. */
    public path: string;
    /** The display name associated with the link. */
    public display?: string;
    /** The block ID or header this link points to within a file, if relevant. */
    public subpath?: string;
    /** Is this link an embedded link (!)? */
    public embed: boolean;
    /** The type of this link, which determines what 'subpath' refers to, if anything. */
    public type: "file" | "header" | "block";
  
    ...
}
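
To make the Link fields concrete, here is a minimal sketch that turns an Obsidian-style wikilink string into an object with the same field names. LinkLike and parseWikilink are hypothetical names, the regular expression only covers the common [[path#subpath|display]] shapes, and dropping the leading ^ from block ids is an assumption; Dataview's own link parsing may differ.

```typescript
// Simplified stand-in for the Link fields listed above (a plain object, not the real class).
interface LinkLike {
    path: string;
    display?: string;
    subpath?: string;
    embed: boolean;
    type: "file" | "header" | "block";
}

// Hypothetical parser for wikilinks such as ![[Note#Heading|Label]].
function parseWikilink(raw: string): LinkLike | null {
    const match = /^(!)?\[\[([^\]|#]*)(?:#([^\]|]*))?(?:\|([^\]]*))?\]\]$/.exec(raw.trim());
    if (!match) return null;

    const bang = match[1];
    const path = match[2];
    const sub = match[3];
    const display = match[4];

    const link: LinkLike = { path: path, embed: bang === "!", type: "file" };
    if (display) link.display = display;
    if (sub) {
        // Assumption: "#^id" points at a block, any other "#..." at a header.
        if (sub.charAt(0) === "^") {
            link.type = "block";
            link.subpath = sub.slice(1);
        } else {
            link.type = "header";
            link.subpath = sub;
        }
    }
    return link;
}

console.log(parseWikilink("![[Projects/Plan#^abc123|the plan]]"));
console.log(parseWikilink("[[Note#Heading]]"));
```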

And lists are defined as arrays of ListItem (https://github.com/blacksmithgu/obsidian-dataview/blob/master/src/data-model/markdown.ts#L164):

/** A list item inside of a list. */
export class ListItem {
    /** The symbol ('*', '-', '1.') used to define this list item. */
    symbol: string;
    /** A link which points to this task, or to the closest block that this task is contained in. */
    link: Link;
    /** A link to the section that contains this list element; could be a file if this is not in a section. */
    section: Link;
    /** The text of this list item. This may be multiple lines of markdown. */
    text: string;
    /** The line that this list item starts on in the file. */
    line: number;
    /** The number of lines that define this list item. */
    lineCount: number;
    /** The line number for the first list item in the list this item belongs to. */
    list: number;
    /** Any links contained within this list item. */
    links: Link[];
    /** The tags contained within this list item. */
    tags: Set<string>;
    /** The raw Obsidian-provided position for where this task is. */
    position: Pos;
    /** The line number of the parent list item, if present; if this is undefined, this is a root item. */
    parent?: number;
    /** The line numbers of children of this list item. */
    children: number[];
    /** The block ID for this item, if one is present. */
    blockId?: string;
    /** Any fields defined in this list item. For tasks, this includes fields underneath the task. */
    fields: Map<string, Literal[]>;

    task?: {
        /** The text in between the brackets of the '[ ]' task indicator ('[X]' would yield 'X', for example.) */
        status: string;
        /** Whether or not this task has been checked in any way (it's status is not empty/space). */
        checked: boolean;
        /** Whether or not this task was completed; derived from 'status' by checking if the field 'X' or 'x'. */
        completed: boolean;
        /** Whether or not this task and all of it's subtasks are completed. */
        fullyCompleted: boolean;
    };

    ...
}
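
The doc comments on the task fields describe how checked and completed derive from status. A small sketch of that derivation, assuming a single-character marker in the brackets; TaskInfo and parseTaskMarker are hypothetical names, and fullyCompleted is left out because it needs the child items:

```typescript
interface TaskInfo {
    status: string;
    checked: boolean;
    completed: boolean;
}

// Hypothetical helper: read "- [x] text" style lines.
// Returns null for list items that are not tasks.
function parseTaskMarker(line: string): TaskInfo | null {
    const match = /^\s*(?:[-*+]|\d+\.)\s+\[(.)\](?:\s|$)/.exec(line);
    if (!match) return null;
    const status = match[1];
    return {
        status: status,
        // Checked in any way: the status is not empty/space.
        checked: status.trim() !== "",
        // Completed: the status is exactly 'x' or 'X'.
        completed: status === "x" || status === "X",
    };
}

console.log(parseTaskMarker("- [ ] buy milk"));    // unchecked
console.log(parseTaskMarker("- [/] in progress")); // checked, not completed
console.log(parseTaskMarker("1. [X] shipped"));    // checked and completed
```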

What is indexed? e.g. tags, tasks (where is code for this - see next item)

What code does the indexing/parsing?

pages and markdown files

tags extraction e.g. #tag-name

tasks extraction

What's the query API?

DataviewAPI query function (https://github.com/blacksmithgu/obsidian-dataview/blob/81ba6a0dd31d6562de852144112922bb33e084d9/src/api/plugin-api.ts#L264):

public async query(
    source: string | Query,
    originFile?: string,
    settings?: QueryApiSettings
): Promise<Result<QueryResult, string>> {
  
  ...
}
  

This function calls parseQuery (https://github.com/blacksmithgu/obsidian-dataview/blob/81ba6a0dd31d6562de852144112922bb33e084d9/src/query/parse.ts#L191) with the source argument.
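
Since query resolves to a Result<QueryResult, string> rather than throwing, callers have to branch on success or failure. The sketch below shows that pattern with a locally defined Result type and a fake query function. Both are assumptions made for illustration: the successful/value/error shape should be checked against Dataview's own Result type, and fakeQuery stands in for the real API, which is only available inside Obsidian.

```typescript
// Assumed shape of a Result; verify against Dataview's source before relying on it.
type Result<T, E> =
    | { successful: true; value: T }
    | { successful: false; error: E };

// Hypothetical stand-in for query(): it only accepts sources that start with LIST.
async function fakeQuery(source: string): Promise<Result<string[], string>> {
    if (source.trim().toUpperCase().indexOf("LIST") !== 0) {
        return { successful: false, error: "unsupported query: " + source };
    }
    return { successful: true, value: ["notes/a.md", "notes/b.md"] };
}

// Branch on the discriminant instead of using try/catch.
function summarize(result: Result<string[], string>): string {
    if (result.successful === true) {
        return result.value.length + " rows";
    } else {
        return "error: " + result.error;
    }
}

fakeQuery("LIST FROM #project").then(r => console.log(summarize(r)));
fakeQuery("DROP everything").then(r => console.log(summarize(r)));
```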

TODO

@rufuspollock transferred this issue from another repository on Apr 28, 2023
@rufuspollock changed the title from "Research Obsidian dataview approach to a markdowndb" to "Research Obsidian dataview approach to a markdown db" on Nov 9, 2023
@rufuspollock mentioned this issue on Nov 10, 2023 (12 tasks)
@rufuspollock
Member Author

Our intuition (though we should check) is that Dataview can't be reused outside Obsidian. However, we can take inspiration from it.

@rufuspollock
Member Author

As we suspected, this seems to depend on Obsidian, so I don't think it can be reused directly.

I installed the plugin and tried using it:

import { getAPI } from "obsidian-dataview";

const api = getAPI();

and got the following error:

node:internal/modules/cjs/loader:1060
  const err = new Error(message);
              ^

Error: Cannot find module 'obsidian'
Require stack:
- .../node_modules/obsidian-dataview/lib/index.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1060:15)
    at Module._load (node:internal/modules/cjs/loader:905:27)
    at Module.require (node:internal/modules/cjs/loader:1127:19)
    at require (node:internal/modules/helpers:112:18)
    at Object.<anonymous> (.../dataview-experiment/node_modules/obsidian-dataview/lib/index.js:5:1)
    at Module._compile (node:internal/modules/cjs/loader:1246:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1300:10)
    at Module.load (node:internal/modules/cjs/loader:1103:32)
    at Module._load (node:internal/modules/cjs/loader:942:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:168:29) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '.../dataview-experiment/node_modules/obsidian-dataview/lib/index.js'
  ]
}

Node.js v19.5.0

The cause is the following line at the top of index.js:

require('obsidian');

@rufuspollock
Member Author

FIXED.
