A high-performance NodeJS service utilizing TypeScript and Vite for reliable, server-side fetching and persistent storage of web content, specifically engineered for AI agents.
This service acts as a robust backend component, enabling AI agents to efficiently ingest and manage web content. It's built with a focus on scalability, performance, and maintainability, leveraging modern web technologies.
mermaid
graph TD
    A[AI Agent Request] --> B(Service API Gateway)
    B --> C{WebContent Ingestion Service}
    C --> D["Server-Side Fetching Engine (TypeScript/Vite)"]
    D --> E{"Data Storage (e.g., DB, S3)"}
    E --> F[Content Indexing/Retrieval]
    C --> G(Error Handling & Logging)
    G --> H[Monitoring System]
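The flow in the diagram can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the service's actual implementation: `StoredPage`, `Fetcher`, `ContentStore`, `InMemoryStore`, and `ingest` are hypothetical names chosen to mirror the diagram's boxes.

```typescript
// Hypothetical names throughout; they mirror the diagram, not the project's real code.

interface StoredPage {
  url: string;
  status: number;
  body: string;
  fetchedAt: string; // ISO 8601 timestamp
}

// "Server-Side Fetching Engine": anything that turns a URL into a status and a body.
type Fetcher = (url: string) => Promise<{ status: number; body: string }>;

// "Data Storage": a minimal persistence interface that a DB or S3 adapter could implement.
interface ContentStore {
  save(page: StoredPage): Promise<void>;
  get(url: string): Promise<StoredPage | undefined>;
}

// In-memory stand-in for the storage layer, useful for tests and local runs.
class InMemoryStore implements ContentStore {
  private pages = new Map<string, StoredPage>();

  async save(page: StoredPage): Promise<void> {
    this.pages.set(page.url, page);
  }

  async get(url: string): Promise<StoredPage | undefined> {
    return this.pages.get(url);
  }
}

type IngestResult =
  | { ok: true; page: StoredPage }
  | { ok: false; url: string; error: string };

// "WebContent Ingestion Service": fetch, persist, and route failures to the logger.
async function ingest(
  url: string,
  fetcher: Fetcher,
  store: ContentStore,
  log: (message: string) => void = () => {},
): Promise<IngestResult> {
  try {
    const { status, body } = await fetcher(url);
    if (status < 200 || status >= 300) {
      throw new Error(`unexpected status ${status}`);
    }
    const page: StoredPage = { url, status, body, fetchedAt: new Date().toISOString() };
    await store.save(page);
    return { ok: true, page };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    log(`ingest failed for ${url}: ${error}`);
    return { ok: false, url, error };
  }
}
```

Passing the fetcher and store as arguments keeps the pipeline testable without network access. In a real deployment the fetcher could wrap the global `fetch` that ships with recent Node.js versions, for example `async (url) => { const r = await fetch(url); return { status: r.status, body: await r.text() }; }`.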
- Project Overview
- Architecture
- Table of Contents
- AI Agent Directives
- Tech Stack
- Development Setup
- Scripts
- Principles
- License
Expand for Agent Directives
- Role: You are a Senior Principal Software Architect and Master Technical Copywriter with 40+ years of elite industry experience. You operate with absolute precision, enforcing FAANG-level standards and the wisdom of "Managing the Unmanageable."
- Context: Current Date is December 2025. You are building for the 2026 standard.
- Output Standard: Deliver EXECUTION-ONLY results. No plans, no "reporting"; only executed code, updated docs, and applied fixes.
- Philosophy: "Zero-Defect, High-Velocity, Future-Proof."
- SPEECH-TO-TEXT INTERPRETATION PROTOCOL:
  - Context: User inputs may contain phonetic errors (homophones, typos).
  - Semantic Correction: STRICTLY FORBIDDEN from executing literal typos. You must INFER technical intent based on the project context.
  - Logic Anchor: Treat the `README.md` as the Single Source of Truth (SSOT).
- MANDATORY MCP INSTRUMENTATION:
  - No Guessing: Do not hallucinate APIs.
  - Research First: Use `linkup`/`brave` to search for December 2025 Industry Standards, Security Threats, and 2026 UI Trends.
  - Validation: Use `docfork` to verify every external API signature.
  - Reasoning: Engage `clear-thought-two` to architect complex flows before writing code.
Directives: Detect the project type and apply the Apex Toolchain. This repository, AgentData-WebContent-Ingestion-Service-NodeJS, is a NodeJS/TypeScript service.
- PRIMARY SCENARIO A: WEB / APP / GUI (Modern Frontend)
  - Stack: TypeScript 6.x (Strict), Vite 7 (Rolldown), TailwindCSS v4.0, Tauri v2.x.
  - Lint/Test: Biome 2.0 (Speed) + Vitest 2.0 (Unit) + Playwright 2.0 (E2E).
  - Architecture: Feature-Sliced Design (FSD).
- SECONDARY SCENARIO B: SYSTEMS / PERFORMANCE (Low Level) - Not directly applicable but informs backend reliability
  - Stack: Rust (Cargo) or Go (Modules).
  - Lint: Clippy / GolangCI-Lint.
  - Architecture: Hexagonal Architecture (Ports & Adapters).
- TERTIARY SCENARIO C: DATA / AI / SCRIPTS (Python) - Not applicable for this project's primary function.
  - Stack: uv (Manager), Ruff (Linter), Pytest (Test).
  - Architecture: Modular Monolith or Microservices.
A high-performing name must instantly communicate Product, Function, Platform and Type.
Formula: <Product-Name>-<Primary-Function>-<Platform>-<Type>
Format: Title-Case-With-Hyphens (e.g., ChatFlow-AI-Powered-Real-Time-Chat-Web-App or ZenRead-Book-Reader-CLI-Tool).
Rules:
- Length: 3 to 10 words.
- Keywords: MUST include high-volume terms.
- Forbidden: NO numbers, NO emojis, NO underscores, NO generic words ("app", "tool") without qualifiers.
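The mechanical parts of these naming rules can be checked automatically. The sketch below is one possible validator; `checkRepoName` and `NameCheck` are hypothetical names. It covers only word count, Title-Case-With-Hyphens, and the bans on numbers, underscores, and non-ASCII characters such as emojis. It makes no attempt to judge "high-volume" keywords or unqualified generic words, which need human review.

```typescript
// Hypothetical helper: checks a repository name against the mechanical naming rules.
interface NameCheck {
  valid: boolean;
  problems: string[];
}

function checkRepoName(name: string): NameCheck {
  const problems: string[] = [];
  const words = name.split("-");

  // Length: 3 to 10 hyphen-separated words.
  if (words.length < 3 || words.length > 10) {
    problems.push(`expected 3 to 10 words, found ${words.length}`);
  }

  // Forbidden: numbers, underscores, and non-ASCII characters (which covers emojis).
  if (/[0-9]/.test(name)) problems.push("contains a number");
  if (name.includes("_")) problems.push("contains an underscore");
  if (/[^\x00-\x7F]/.test(name)) problems.push("contains a non-ASCII character");

  // Format: every word starts with an uppercase letter and holds letters only,
  // so acronyms such as "AI" or "CLI" and compounds such as "NodeJS" pass.
  for (const word of words) {
    if (!/^[A-Z][A-Za-z]*$/.test(word)) {
      problems.push(`word "${word}" is not Title-Case`);
    }
  }

  return { valid: problems.length === 0, problems };
}
```

Under these checks, both example names from the Format rule pass, as does this repository's own name, `AgentData-WebContent-Ingestion-Service-NodeJS`.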
- SOLID Principles: Adhere strictly to Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion.
- DRY (Don't Repeat Yourself): Abstract common logic into reusable modules and utilities.
- YAGNI (You Ain't Gonna Need It): Implement features based on current needs, avoiding premature complexity.
- Modularity: Employ a clear, modular structure (e.g., Feature-Sliced Design for frontend-heavy aspects, or standard service/module separation for backend).
- Unit Testing: Comprehensive unit tests using Vitest 2.0. Aim for >90% code coverage.
- Integration Testing: Utilize Vitest for testing interactions between modules.
- End-to-End (E2E) Testing: Implement E2E tests with Playwright 2.0 to simulate real user flows and external interactions.
- Linting & Formatting: Enforce code quality and consistency using Biome 2.0. All code must pass linting checks.
- Type Safety: Leverage TypeScript 6.x Strict Mode to eliminate type-related runtime errors.
- Vulnerability Scanning: Integrate automated vulnerability scanning tools (e.g., `npm audit`, Snyk) into CI.
- Dependency Management: Keep all dependencies updated to their latest secure versions.
- Secrets Management: Utilize environment variables and secure secrets management solutions. NEVER hardcode secrets.
- Rate Limiting: Implement appropriate rate limiting for external API calls and internal endpoints.
- Input Validation: Rigorously validate all incoming data to prevent injection attacks and other vulnerabilities.
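The rate-limiting directive does not name an algorithm. A token bucket is one common choice, sketched here as an assumption rather than the project's method. `TokenBucket` and its parameters are hypothetical, and the clock is injected so the behavior can be tested deterministically.

```typescript
// Hypothetical token-bucket limiter: allows bursts up to `capacity`,
// then a sustained rate of `refillPerSecond` requests per second.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number; // clock in milliseconds, injectable for tests
  private tokens: number;
  private lastRefill: number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Returns true and consumes one token if the call is allowed, false otherwise.
  tryRemove(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

One bucket per upstream host would limit outbound fetches, and one bucket per client key would limit inbound requests. When `tryRemove()` returns false, the caller can delay the fetch or reject the request.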
- CI/CD: Automate build, test, and deployment pipelines using GitHub Actions.
- Containerization: Consider Docker for consistent deployment environments.
- Infrastructure as Code (IaC): Use tools like Terraform or Pulumi for managing cloud infrastructure.
- Observability: Implement comprehensive logging, metrics, and tracing.
- Contributing Guidelines: Adhere to the `CONTRIBUTING.md` file.
- Code of Conduct: Maintain a respectful and inclusive environment as outlined in `CODE_OF_CONDUCT.md` (if applicable).
- README: Keep the `README.md` as the central source of truth.
- Code Comments: Use JSDoc or similar for complex logic and public APIs.
- Architecture Decision Records (ADRs): Document significant architectural decisions.
- Alerting: Configure alerts for critical system failures and performance degradation.
- Health Checks: Implement `/health` endpoints for monitoring.
- Performance Monitoring: Continuously monitor performance metrics and optimize bottlenecks.
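The directives ask for `/health` endpoints without specifying a response shape or an HTTP framework. The sketch below assumes a simple convention (HTTP 200 when every dependency check passes, 503 otherwise) and keeps the logic framework-agnostic. `HealthCheck`, `HealthReport`, and `buildHealthReport` are hypothetical names.

```typescript
// Hypothetical shape: the source only says to expose /health endpoints.
type HealthCheck = () => Promise<boolean>;

interface HealthReport {
  status: 200 | 503;
  body: { healthy: boolean; checks: Record<string, boolean> };
}

// Runs all checks in parallel; a check that throws counts as failed.
async function buildHealthReport(checks: Record<string, HealthCheck>): Promise<HealthReport> {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      let ok = false;
      try {
        ok = await check();
      } catch {
        ok = false;
      }
      return [name, ok] as const;
    }),
  );

  const outcome: Record<string, boolean> = {};
  for (const [name, ok] of entries) {
    outcome[name] = ok;
  }

  const healthy = entries.every(([, ok]) => ok);
  return { status: healthy ? 200 : 503, body: { healthy, checks: outcome } };
}
```

A route handler for `/health` would call `buildHealthReport` with checks for storage and any other dependencies, then send `report.status` with `JSON.stringify(report.body)`.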
- Language: TypeScript 6.x
- Runtime: NodeJS 20.x
- Build Tool/Bundler: Vite 7
- Styling: Tailwind CSS v4.0
- Native Integration (if applicable): Tauri v2.x
- Linting & Formatting: Biome 2.0
- Testing: Vitest 2.0 (Unit/Integration), Playwright 2.0 (E2E)
- Package Manager: npm / yarn / pnpm
- Clone the repository:

      git clone https://github.com/chirag127/AgentData-WebContent-Ingestion-Service-NodeJS.git
      cd AgentData-WebContent-Ingestion-Service-NodeJS

- Install Node.js dependencies:

      npm install

  (Or use `yarn install` or `pnpm install` if preferred)

- Configure environment variables: Create a `.env` file based on `.env.example` and populate it with your API keys and configuration.

- Run Biome for initial linting and formatting:

      npm run lint -- --write
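Once the `.env` file is in place, it helps to validate its values at startup so that a missing key fails fast rather than at the first request. The sketch below shows one way to do that. The variable names (`PORT`, `STORAGE_URL`, `FETCH_TIMEOUT_MS`) and their defaults are assumptions made for illustration; the real names are whatever `.env.example` defines.

```typescript
// Hypothetical variable names; the real ones are defined in .env.example.
interface ServiceConfig {
  port: number;
  storageUrl: string;
  fetchTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Returns the variable's value or throws if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Parses a string as a positive integer or throws with the variable name.
function parsePositiveInt(raw: string, name: string): number {
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return parsed;
}

// Builds a typed config object from an environment map such as process.env.
function loadConfig(env: Env): ServiceConfig {
  return {
    port: parsePositiveInt(env["PORT"] ?? "3000", "PORT"),
    storageUrl: requireVar(env, "STORAGE_URL"),
    fetchTimeoutMs: parsePositiveInt(env["FETCH_TIMEOUT_MS"] ?? "10000", "FETCH_TIMEOUT_MS"),
  };
}
```

At startup the service would call `loadConfig(process.env)`. Node.js 20.6 and later can populate `process.env` from the file directly with `node --env-file=.env`, which avoids an extra dependency.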
| Script | Description |
|---|---|
| `npm run dev` | Starts the development server with hot-reloading. |
| `npm run build` | Builds the production-ready application. |
| `npm run preview` | Locally previews the production build. |
| `npm run test` | Runs unit and integration tests with Vitest. |
| `npm run test:e2e` | Runs end-to-end tests with Playwright. |
| `npm run lint` | Runs Biome linter and formatter. |
| `npm run lint:fix` | Runs Biome linter and formatter, attempting to fix issues. |
- SOLID: Strict adherence to SOLID design principles.
- DRY: Don't Repeat Yourself; abstract common logic.
- YAGNI: You Ain't Gonna Need It; avoid premature optimization/features.
- TypeScript First: Embrace strong typing for robust code.
- Test-Driven Development (TDD): Write tests before or alongside implementation.
This project is licensed under the CC BY-NC 4.0 License - see the LICENSE file for details.