Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to reduce router development and maintenance costs by using standardized manifest files #14685

Closed
1 task done
DIYgod opened this issue Mar 6, 2024 · 20 comments · Fixed by #14718
Closed
1 task done
Labels
core enhancement RSS enhancement New feature or request to existing RSS

Comments

@DIYgod
Copy link
Owner

DIYgod commented Mar 6, 2024

What feature is it?

To add a new route, developers currently have to create three files in the namespace: maintainer.ts, radar.ts, and router.ts, as well as manually adding documents. This process is cumbersome, non-standardized, error-prone, and challenging for review. To address this issue, a new development and maintenance approach is suggested: simply include a manifest.ts file with all necessary information in a fixed format within the namespace. The RSSHub core will then read this manifest file to load routes. Moreover, automated scripts will generate the required markdown files for documentation purposes.

What problem does this feature solve?

As mentioned above

Additional description

Update2: Implement here #14718

Update1: #14685 (comment)

(Outdated) I propose a preliminary manifest format here:

type Route = {
    name: string;
    url?: string;
    maintainers: string[];
    handler: string;
    example: string;
    parameters?: Record<string, string>;
    description?: string;
    categories: string[];

    features: {
        requireConfig?: string[];
        requirePuppeteer?: boolean;
        antiCrawler?: boolean;
        supportRadar?: boolean;
        supportBT?: boolean;
        supportPodcast?: boolean;
        supportScihub?: boolean;
    };
    radar: {
        source: string[];
        target: string;
    };
};

type Manifest = {
    name: string;
    url?: string;
    categories: string[];
    description?: string;

    routes: Record<string, Route>;
};

For example:

export default {
    name: 'GitHub',
    url: 'https://github.com',
    categories: ['programming'],
    description: `
:::tip
GitHub provides some official RSS feeds:

-   Repo releases: 'https://github.com/:owner/:repo/releases.atom'
-   Repo commits: 'https://github.com/:owner/:repo/commits.atom'
-   User activities: 'https://github.com/:user.atom'
-   Private feed: 'https://github.com/:user.private.atom?token=:secret' (You can find **Subscribe to your news feed** in [dashboard](https://github.com) page after login)
-   Wiki history: 'https://github.com/:owner/:repo/wiki.atom'
:::
    `,

    routes: {
        '/trending/:since/:language/:spoken_language?': {
            name: 'Trending',
            url: 'https://github.com/trending',
            maintainers: ['DIYgod', 'jameschensmith'],
            handler: './trending',
            example: 'https://rsshub.app/github/trending/daily/javascript',
            parameters: {
                since: 'time frame, available in [Trending page](https://github.com/trending/javascript?since=monthly) \'s URL, possible values are: `daily`, `weekly` or `monthly`',
                language: 'the feed language, available in [Trending page](https://github.com/trending/javascript?since=monthly) \'s URL, don\'t filter option is `any`',
                spoken_language: 'natural language, available in [Trending page](https://github.com/trending/javascript?since=monthly) \'s URL',
            },
            description: '',
            features: {
                requireConfig: ['GITHUB_ACCESS_TOKEN'],
                requirePuppeteer: false,
                antiCrawler: false,
                supportRadar: true,
                supportBT: false,
                supportPodcast: false,
                supportScihub: false,
            },

            radar: {
                source: 'github.com/trending',
                target: '/trending/daily',
            },

            // Override the parent's configuration
            categories: ['programming'],
        }
    },
};

This is not a duplicated feature request or new RSS proposal

@DIYgod DIYgod added RSS enhancement New feature or request to existing RSS core enhancement labels Mar 6, 2024
@DIYgod DIYgod changed the title Lower routes development and maintenance expenses by using standardized configuration files Proposal to reduce router development and maintenance costs by using standardized configuration files Mar 6, 2024
@SettingDust
Copy link
Contributor

SettingDust commented Mar 7, 2024

How about manifest.ts or metadata.ts?
Support this since split into the three files is unnecessary.

@DIYgod DIYgod changed the title Proposal to reduce router development and maintenance costs by using standardized configuration files Proposal to reduce router development and maintenance costs by using standardized manifest files Mar 7, 2024
@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

How about manifest.ts or metadata.ts? Support this since split into the three files is unnecessary.

manifest is a good idea, I have updated my content

@NeverBehave
Copy link
Collaborator

NeverBehave commented Mar 7, 2024

The reason behind file splits is because we have cold start issue before: every time when there is a need to fetch and compile artifacts based on routes info, or development purposes, we will have to loads all codes even though only small portion of them is useful. It is time consuming to read these files(IO bound) and parse them (as JS file, CPU bound) so I break them as smaller parts based on their usage.

From what I observe so far, at least radar.js and maintainer.js are just objects and should be just fine given parsing should be fast nowadays. At that time I was thinking if performance was still an issue, since maintainer.js is just simple an js object and could be even treated as json file.

Happy to see this change takes place 🥳

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

These three configuration files are significant improvements over previous practices, providing great convenience for current updates. Thanks to @NeverBehave for this innovation.

The most important area that urgently needs improvement now is documents, which have caused the greatest inconvenience and lack of standardization, and are also the biggest obstacle to this proposal implementation.

@SettingDust
Copy link
Contributor

SettingDust commented Mar 7, 2024

satisfies from TS 5(?) is useful for better code completion

export default {} satisfies RouterManifest

I think it should be in doc and not related to this proposal. Proposed here as an enhancement.

@NeverBehave
Copy link
Collaborator

NeverBehave commented Mar 7, 2024

@DIYgod Thanks for the kind word, and the docs part is annoying indeed.

I have some preliminary thoughts of the current proposal:

  1. For routes, I think it would be better to use name or path as key as most of the time we probably needs to lookup this array e.g. find maintainers, and if we do want to iterate over them, and also it prevents developer accidentally creates the same path if the list grows longer, easier to identify changes.
  2. Multi-language support: it looks like we are embedding markdown into manifest.ts directly, how do we handle en/cn docs generation?

Instead of coupling these markdowns in manifest.ts, I think we could still separate this into different file, say docs.ts

{
  description: {
    "en-us": `some description about this route`
    // if a specific route translate is missing, show corresponded key instead. 
    // However I think there are better way/lib to handle this, just build the right key, not sure if we can enforce this as type check
  },
  "routes.<path>.parameters.since": {
    "en-us": `...`,
    "zh-cn": `...`
  },
}

When building the docs, we can group all of these keys with their prefix, say someroute/docs.ts and we will append folder name as namespace someroute.description and feed to generator, and we just reference them with full path so there is no need to compile each of them separately and then concat them. We can simply pre-populate docs given a list of routes' name and replace all at once.

Still I think multi line markdown in ts is a bit tricky to handle but it should be fine...?


The following might be a bit nitpicks :P so just leave here for some ideas.

  1. Since we are also includes maintainers, maybe it is a good time to switch to handles. An example would be https://github.com/NixOS/nixpkgs/blob/master/maintainers/maintainer-list.nix, where each maintainer register themselves into a central source of truth and use handles to represent themselves, so if they changed their username or switch account we could handle these case better. We could also grow other attributes like teams. However for now we could just use github username as default handle name would be username most of the time.
  2. For options like requirePuppeteer, antiCrawler, etc. I think it would be better to put them under options. It might be helpful for type checks as we can create subtype(?)
  3. Instead of having a top level categories: ['programming'],(to me it is acting as defaultCategories?), maybe we should calculate this attributes based on routes? If we provide override options. Not entirely sure how this would work -- if this is used to place routes to the correct category, it has to be under exactly one(1) categories?

@SettingDust
Copy link
Contributor

Instead of coupling these markdowns in manifest.ts, I think we could still separate this into different file, say docs.ts

I think if the doc is split into a file. tsx/mdx is better than the js object. Just render them with the manifest/metadata

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

@NeverBehave

  1. Good idea
  2. I am worried that multi-language will increase complexity. I want to keep it as simple as possible. My ideal state is for multi-language websites to use English, and single-language websites to use English or the corresponding language. In addition, using translation service instead of manual maintenance is also a better method. We can only retain one language, and other languages can be automatically translated.
  3. I want to strongly bind the document with the route. If we separate a docs.ts, I am worried that there will be inconsistencies with the route during development. And I want the markdown file to be completely generated by scripts rather than need manually pre-populate. Indeed, I also found that it is not convenient to write markdown in ts files, but I don't have a better idea.
  4. Same as above, I want to keep it as simple as possible. I think the GitHub handle is sufficient for current use.
  5. Good idea
  6. Yes, if the route does not define a category, use the parent's category. The calculation is still based on the route. It can be multiple because some routes simultaneously meet multiple categories, such as "Bilibili Anime" can be both Multimedia and ACG. When rendering, it will be placed in two documents.

@NeverBehave
Copy link
Collaborator

I understand the fact that we want to keep it simple as we goes, while from my pov, if we strip out this flexibility that allow maintainer/community to add custom notes specific to their language it would a great setback.

With that being said, there are two things to add here:

  • a language attr 'lang' so we only compile route to correspond language. I am not entirely sure how's the reading experience if we go with mixed language docs, direct mixing probably not what we want
  • We could also use 'manifest.tsx, a separate file to keep all markdown/multiline text attributes, and merge them to a single object when processing it?

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

I think it's a good idea to use separate files for different languages, because the name and url may vary depending on the language. In this way, we can have multiple manifest files manifest.[lang].ts, similar to subtitle files.

@Songkeys
Copy link
Contributor

Songkeys commented Mar 7, 2024

If we just export some customized "middleare" functions to define manifest metadata and write handler logic at the same time so that we don't need many manifest files, and also ensure the typings easily.

Example:

// index.ts
import { defineRootRoute } from "common"
import trending from "./trending"

export default defineRootRoute({
  name: "GitHub",
  url: "https://github.com/",
  description: "...",
  // ... other basic metadata,
  routes: [
    trending
  ]
})
// trending.ts
import { defineSubRoute } from "common"

export default defineSubRoute({
  name: 'Trending',
  description: "...",
  url: "/trending/:since/:language/:spoken_language?"
  // ... other basic metadata,
  route: "./trending",
  parameters: {
    since: {
      type: "string",
      description: "time frame, available in [Trending page](https://github.com/trending/javascript?since=monthly) \'s URL, possible values are: `daily`, `weekly` or `monthly`",
    },
    // ... we could even use "zod" to define the parameters for more strict typings
  },
  handler: async (ctx) => {
     // ...
  }
})

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

@Songkeys

Although there are not many manifest files, there are many index files, the difference is not very obvious 🤣

It is a good idea to use the factory method to ensure type correctness.

Defining manifest in each route module seems good, but it is not a good idea in the special scenario of RSSHub, because this requires RSSHub to load all route files. In the current situation where there are too many routes, it will cause significant performance issues (RSSHub still has to use require for lazy loading routes now 🤣)

@Songkeys
Copy link
Contributor

Songkeys commented Mar 7, 2024

because this requires RSSHub to load all route files

Yes, in this case, loading all modules could be time-consuming. The current dynamic loading method is more ideal. We could actually keep this dynamic loading with the factory methods, e.g., do not relative importing routes or use manual dynamic importing:

// ...
  routes: [() => import("./trending")]
// ...

But it doesn't make anything more beneficial than it is now.


I was also thinking that this docs+logic merging could avoid writing duplicated information. Since we still have to load all the files when generating documents anyway, what do you think of the way to merge the manifest and handler function?

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

@Songkeys But the problem is, if the route module is not loaded, the route information cannot be obtained, and the path of the route cannot be obtained to initialize Hono (Hono needs to pass in all paths at the beginning and cannot be passed dynamically). The problem lies not in index.ts but in trending.ts.

I think it's a good idea to define a factory function, and the handler can also be included in this factory function.

I think it's okay to load all files when generating documents, even if loading all routes is not a problem, because it is one-time and does not need to run on users' machines that may have poor performance.

@SettingDust
Copy link
Contributor

I think satisfies is enough for typings.

The imported routes shall be asynchronous iterable/generator. The performance should be adequate.

@Songkeys
Copy link
Contributor

Songkeys commented Mar 7, 2024

if the route module is not loaded, the route information cannot be obtained, and the path of the route cannot be obtained to initialize Hono

I think it's okay to load all files when generating documents

So, I think it's also okay to generate the route information when pushing every commit? This is like the idea of caching the result instead of dynamically browsing and loading the directories every time when starting up.

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

@SettingDust Performance issues refer to the need to load a lot of module during initialization and running, rather than during the request process.

@Songkeys This is feasible, but I doubt if it's worth it. Imagine a developer who wants to add a route. He needs to first add the module, then execute this generation script before debugging and running so that Hono can load his newly added route correctly.

@Songkeys
Copy link
Contributor

Songkeys commented Mar 7, 2024

Imagine a developer who wants to add a route. He needs to first add the module, then execute this generation script before debugging and running so that Hono can load his newly added route correctly.

If this generating step only takes a few seconds, I think it's okay. But I'm not sure. The whole idea is to avoid writing verbose route information (i.e. some repeated metadata in splitted files). It's more like a tradeoff - writing less code means generating more code in dev-stage or spending more time figuring out the result in prod-stage.

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 7, 2024

If this generating step only takes a few seconds, I think it's okay. But I'm not sure. The whole idea is to avoid writing verbose route information (i.e. some repeated metadata in splitted files). It's more like a tradeoff - writing less code means generating more code in dev-stage or spending more time figuring out the result in prod-stage.

I came up with a good solution. We can import all modules in the development environment, and use cached results in the production environment. This is great as it can avoid confusion when there are too many routes under the namespace.

@DIYgod
Copy link
Owner Author

DIYgod commented Mar 8, 2024

@NeverBehave @SettingDust @Songkeys @TonyRL Hey guys, based on the suggestions above, I have implemented a minimal version here. Please help check if it looks reasonable. I will migrate all routes later today.

The cache routing path for the production environment and document generation are not yet completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core enhancement RSS enhancement New feature or request to existing RSS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants