From d81a59499c4392be72f1ffb0ec7bad54788f3c08 Mon Sep 17 00:00:00 2001
From: Jérémie Tarot (@silopolis)
Date: Fri, 24 Nov 2023 02:47:01 +0100
Subject: [PATCH 1/2] reqs: project requirements initial commit

---
 docs/project/requirements/.pages                |   6 +
 docs/project/requirements/index.md              | 121 +-----
 docs/project/requirements/key_components.md     | 192 ++++++++++
 .../project/requirements/tech_requirements.md   | 359 ++++++++++++++++++
 docs/project/requirements/user_stories.md       | 149 ++++++++
 5 files changed, 712 insertions(+), 115 deletions(-)
 create mode 100644 docs/project/requirements/.pages
 create mode 100644 docs/project/requirements/key_components.md
 create mode 100644 docs/project/requirements/tech_requirements.md
 create mode 100644 docs/project/requirements/user_stories.md

diff --git a/docs/project/requirements/.pages b/docs/project/requirements/.pages
new file mode 100644
index 0000000..01e0129
--- /dev/null
+++ b/docs/project/requirements/.pages
@@ -0,0 +1,6 @@
title: Project requirements
nav:
  - index.md
  - user_stories.md
  - tech_requirements.md
  - key_components.md

diff --git a/docs/project/requirements/index.md b/docs/project/requirements/index.md
index c266377..d830c29 100644
--- a/docs/project/requirements/index.md
+++ b/docs/project/requirements/index.md
@@ -1,121 +1,12 @@

Added (new index.md content):

# Project requirements

A DevOps project implementing a containerized microservices application architecture in the cloud requires a detailed study and careful elicitation of the functional and non-functional requirements.

We start by gathering needs through user stories, from which we extract technical requirements. Finally, before specifying and designing the target architecture, we focus on the specific requirements of the system's key components.

[Architectural and Technical User Stories](user_stories.md)

[Architectural and Technical Requirements](tech_requirements.md)

[Specific Requirements for Key Components](key_components.md)

Removed (previous index.md content):

---
tags:
  - requirements
  - specifications
---

# Requirements and specifications

## Project context

The DevOps system administrator automates the deployment of infrastructure on a private, public, or hybrid cloud.

Business sectors:

- an IT services company (ESN)
- a specialized ESN offering cloud hosting services (cloud provider)
- a software vendor
- the IT department (DSI) of a company with a dedicated software development unit

Accessible positions:

- DevOps engineer
- DevOps SysOps
- DevOps system engineer
- cloud engineer
- cloud developer

### Context of the professional certification exam (RNCP)

Company infrastructure is increasingly deployed in the cloud: 32% of companies had deployed part of their infrastructure in a public cloud in 2018, versus 97% in 2021.
According to Flexera's annual "State of the Cloud Report", in 2021 80% of companies managed their infrastructure in a hybrid cloud combining public and private clouds.
This evolution forces system teams, who must automate deployment tasks, to acquire new skills.
In parallel, development teams have adopted Agile methods, which let them release new versions of their applications very frequently.
Putting applications into production used to require a lengthy procedure followed by the system teams, known as Ops (for operations).
Release-to-production times needed to be shortened, just as development times for new versions had been.
The DevOps approach, which aims to ease collaboration between "Dev" (developers) and "Ops", pursues this goal.
80% of companies had adopted the DevOps approach by 2021, up from 41% in 2017
(source: IDC EMEA Cloud and Infrastructure Services statistics, February 2021).
DevOps system administrators work on deploying cloud infrastructure and putting applications into production, and they are in high demand on the job market.
The job requires a good knowledge of operating systems, virtualization, and automation scripting, as well as mastery of many specific tools.
It calls for a particular set of skills and knowledge spanning script development, the use of various platforms, system administration, testing methodology, and collaboration with other technical teams.
The professional title "Administrateur système DevOps" (level 6) brings together the skills needed for the job.
It is structured in three skill blocks:

* Automate the deployment of an infrastructure in the cloud
* Deploy an application continuously
* Supervise the deployed services

It meets companies' needs and supports the professional development of working system administrators who want to acquire new skills.


### Definition of the typical job and its working conditions (RNCP)

The DevOps system administrator automates the deployment of infrastructure on a private, public, or hybrid cloud.
When working for a software vendor or in an IT department alongside development teams, the DevOps system administrator deploys applications continuously.
They supervise the deployed services and handle the alerts they raise.
To automate infrastructure deployment in the cloud, the DevOps system administrator automates server creation with scripts, configures the servers and connects them together, then uses a platform such as Ansible to configure and control the deployment.
When tasked with deploying an application continuously, in collaboration with the development teams, they prepare test and pre-production environments.
They prepare the various data servers and their associated storage, as well as the containers that will host the application.
They then migrate the data and deploy the application to the pre-production environment.
They communicate continuously with the development team to fix the defects discovered during the various test phases.
Using a platform such as Kubernetes, they deploy the application and its successive updates to the production environment.
The DevOps system administrator supervises the infrastructure and applications they have deployed; to do so, they define the indicators to monitor, then install and configure a monitoring solution.
When they notice an anomaly or an alert is raised, they fix the problem or have it fixed.
To solve a configuration problem, understand the cause of a malfunction, or install a new tool, they participate in professional community forums, sometimes in English.
Most technical documentation is written in English; the DevOps system administrator must be able to read it to find the information they are looking for and correctly interpret the advice it gives. This corresponds to level B2 of the European framework for reading comprehension.
They will occasionally need to ask or answer questions on English-language user forums; level B1 of the European framework for written expression is sufficient.
They use a logical approach to diagnose the cause of a malfunction and remedy it, and they keep their skills current through active technology watch.
The job requires mastery of many tools and languages as well as an understanding of abstract concepts.
The DevOps system administrator is in contact with the development teams, their technical lead, the network and security teams, the hosting providers, and the professional communities of the tools they use.
They work in an IT services company (ESN), for a cloud operator, for a software vendor, or in the IT department of a large company.
They work in a team under the responsibility of their company's technical lead or chief information officer.
In some cases, the job is done entirely remotely.


## Project perimeter

Targeted activities: the project perimeter covers the activities of the RNCP job definition above — continuous application deployment in collaboration with the development teams, automated provisioning and configuration of cloud infrastructure, preparation of test and pre-production environments, data migration and pre-production deployment, production deployment and updates with a platform such as Kubernetes, and supervision of the deployed infrastructure and applications, including the handling of raised alerts.
## Project objectives

* Acquire the technical and methodological knowledge and know-how necessary to work as a DevOps/Cloud administrator, engineer, or architect.
* Validate the French certification ["Administrateur Système DevOps" RNCP36061](https://www.francecompetences.fr/recherche/rncp/36061/).
* Validate the [DevUniversity](https://www.devuniversity.com/) ["DevOps engineer"](https://www.devuniversity.com/formation-devops) training in its Bootcamp format.

diff --git a/docs/project/requirements/key_components.md b/docs/project/requirements/key_components.md
new file mode 100644
index 0000000..ecc6f0d
--- /dev/null
+++ b/docs/project/requirements/key_components.md
@@ -0,0 +1,192 @@

# Specific Requirements for Key Components


Once the functional and non-functional requirements have been extracted from the user stories, and before specifying and designing the target architecture, it is important to detail the individual functionalities, configurations, and integrations needed, focusing on the specific requirements of each key component in the stack.

Since the stack includes Traefik, FastAPI, PostgreSQL, Kubernetes, EKS, EBS, Helm, Ansible, Terraform, GitHub, GitHub Actions, Prometheus, Grafana, and ELK, the outline below covers the specific requirements for each component.


## Traefik (Load Balancer/Reverse Proxy)

- **Traffic Routing Rules**: Define rules for routing traffic to the various microservices (sketched below).
  - Specify **dynamic routing** configurations to handle incoming requests and direct them to the appropriate FastAPI services.
  - Implement **load balancing** strategies to distribute traffic efficiently across microservices.
  - Set up **middleware** for additional functionality such as rate limiting or access control.
- **SSL/TLS Configuration**: Requirements for HTTPS support and SSL/TLS certificate management.
  - **Automate SSL/TLS certificate issuance and renewal**, possibly integrating with *Let's Encrypt* for free certificates.
  - **Ensure secure TLS configuration** to prevent common vulnerabilities and support modern encryption standards.
- **Integration with Kubernetes**: How Traefik integrates with the Kubernetes cluster for dynamic routing.
  - Configure Traefik to work seamlessly with Kubernetes, using **Ingress resources for route definition**.
  - **Set up Traefik as an Ingress controller** to automatically discover and manage routing rules based on Kubernetes services and deployments.
- **Integration points with FastAPI and PostgreSQL**: document how requests flow from Traefik to the FastAPI services and, through them, to PostgreSQL.
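To make these routing requirements concrete, here is a minimal sketch using Traefik's Kubernetes CRDs. The hostname, service name, namespace, and the `letsencrypt` certificate resolver are placeholders, and the `traefik.io/v1alpha1` API group assumes Traefik v2.10 or later (older releases use `traefik.containo.us/v1alpha1`):

```yaml
# Rate-limiting middleware, per the middleware requirement above.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: api-ratelimit
  namespace: app
spec:
  rateLimit:
    average: 100   # sustained requests per second
    burst: 50
---
# Route HTTPS traffic for the API host to a FastAPI service.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: fastapi-api
  namespace: app
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`) && PathPrefix(`/api`)
      kind: Rule
      middlewares:
        - name: api-ratelimit
      services:
        - name: fastapi-svc    # placeholder Kubernetes Service
          port: 80
  tls:
    certResolver: letsencrypt  # assumes an ACME resolver configured in Traefik
```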
## FastAPI (Web Framework)

- **API Endpoints**: Specify the RESTful API endpoints and their functionality.
- **Authentication and Authorization**: Implement security mechanisms for API access.
- **Data Validation and Serialization**: Requirements for input validation and response data formats.


## PostgreSQL (Database)

- **Database Schema**: Define the database schema, including tables, relationships, and indexing strategies.
- **Data Storage and Retrieval**: Requirements for efficient data storage and retrieval mechanisms.
- **Backup and Recovery**: Strategies for database backup, retention policies, and disaster recovery.


## Kubernetes (Container Orchestration)

- **Cluster Configuration**: Define the setup and configuration of the Kubernetes cluster.
  - **Node Configuration**: Decide on the number, type, and size of nodes (CPU and memory specifications) based on the expected workload and scaling requirements.
  - **Master Node Setup**: Configure the control plane, including a high-availability setup if required.
  - **Network Policies**: Define network policies for pod-to-pod communication and external access, ensuring secure and controlled network access within the cluster.

- **Resource Allocation and Scaling**: Requirements for resource management and autoscaling (sketched after this section).
  - **Resource Requests and Limits**: Specify resource requests and limits for each pod to optimize utilization and prevent resource starvation.
  - **Resource Quotas**: Define quotas per namespace to manage consumption of CPU and memory across the cluster.
  - **Auto-Scaling Policies**: Set up the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler to scale workloads and nodes dynamically based on CPU or memory usage.

- **Pod and Service Definitions**: Specific configurations for pods and services.
  - **Pod Configuration**: Define pod specifications, including container images, environment variables, volume mounts, and health checks.
  - **Service Discovery**: Set up Kubernetes services for internal load balancing and service discovery, choosing service types (ClusterIP, NodePort, LoadBalancer) as needed.
  - **Ingress Controllers**: Configure Ingress controllers to manage external access to services, including routing rules and SSL termination.

- **Security and Compliance**
  - **Role-Based Access Control (RBAC)**: Implement RBAC for secure, granular access control to Kubernetes resources.
  - **Security Contexts**: Define security contexts for pods and containers to restrict privileges and control access to resources.
  - **Compliance Checks**: Ensure the cluster configuration complies with relevant security standards and best practices.

- **Backup and Disaster Recovery**
  - **Etcd Backup**: Plan for regular backups of the Kubernetes etcd database, which stores the state of the cluster.
  - **Disaster Recovery Plan**: Develop a disaster recovery strategy for the cluster, covering scenarios such as control-plane failure and data loss.
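A minimal sketch of the resource-allocation and autoscaling requirements above — the `app` namespace, the image, and the sizing numbers are illustrative assumptions, not prescriptions:

```yaml
# A FastAPI Deployment with explicit requests/limits so scheduling,
# quotas, and the HPA can work from real resource figures.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-api
  namespace: app
spec:
  replicas: 2
  selector:
    matchLabels: {app: fastapi-api}
  template:
    metadata:
      labels: {app: fastapi-api}
    spec:
      containers:
        - name: api
          image: ghcr.io/example/fastapi-api:1.0.0  # placeholder image
          ports:
            - containerPort: 8000
          resources:
            requests: {cpu: 250m, memory: 256Mi}
            limits: {cpu: "1", memory: 512Mi}
---
# Scale that Deployment between 2 and 10 replicas on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-api
  namespace: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```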
## Amazon EKS (Managed Kubernetes Service)

- **Cluster Management**: Specific requirements for managing the EKS cluster.
  - **Version Management**: Choose the Kubernetes version and define an upgrade policy that balances new features and stability.
  - **IAM Integration**: Configure AWS IAM roles and policies for EKS, granting the permissions needed for cluster management and AWS resource access.
  - **Logging and Monitoring**: Integrate with Amazon CloudWatch for logging and monitoring of the EKS cluster.

- **Integration with AWS Services**: How EKS integrates with other AWS services.
  - **VPC Configuration**: Run the EKS cluster inside a VPC for network isolation; set up subnets, NAT gateways, and route tables as required.
  - **Elastic Load Balancing (ELB)**: Use ELB to distribute external traffic to Kubernetes services.
  - **Amazon RDS Integration**: If needed, plan for integration with Amazon RDS for managed database services.

- **Scalability and Performance**
  - **EKS Auto Scaling**: Use EKS-specific autoscaling features to manage the scaling of worker nodes.
  - **Performance Optimization**: Optimize cluster performance following AWS best practices, considering aspects such as pod density and network throughput.

- **Security and Compliance**
  - **EKS Security Groups**: Define and manage security groups controlling access to EKS worker nodes.
  - **AWS Compliance Features**: Use AWS compliance tooling such as AWS Config and AWS Trusted Advisor.

- **Cost Management**
  - **Cost Optimization**: Implement strategies for cost-effective resource utilization in EKS, such as spot instances for worker nodes.
  - **Budget Tracking**: Set up AWS Budgets to monitor and manage the costs of the EKS cluster.


## Amazon EBS (Elastic Block Store)

- **Volume Management**: Requirements for persistent storage volumes in Kubernetes.
  - Define persistent volume claims, specifying size and throughput needs.
  - Plan for dynamic provisioning of EBS volumes using Kubernetes storage classes.
- **Performance Specifications**: Specific performance needs, such as IOPS or throughput.
  - Specify throughput and IOPS requirements based on the application's storage profile.
  - Plan for high availability and disaster recovery using EBS snapshots and replication.


## Helm (Kubernetes Package Manager)

- **Chart Development**: Requirements for developing Helm charts for application deployment (a parameterization sketch follows this section).
  - Develop a Helm chart for each microservice, with templates for deployments, services, and other Kubernetes resources.
  - Parameterize the charts so that environment-specific configuration stays flexible.
- **Chart Repository Management**: How and where Helm charts will be stored and managed.
  - Decide on the repository for storing and managing charts (e.g., an internal repository or Artifact Hub, the successor to Helm Hub).
  - Implement versioning and release management strategies for the charts.
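To make the parameterization requirement concrete, here is a minimal sketch; the chart layout, value names, registry path, and hostname are assumptions for illustration:

```yaml
# values.yaml — defaults, overridden per environment.
replicaCount: 2
image:
  repository: ghcr.io/example/fastapi-api   # placeholder registry/repo
  tag: "1.0.0"
ingress:
  host: api.example.com                      # placeholder hostname
---
# templates/deployment.yaml (excerpt) — consumes the values above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-api
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels: {app: "{{ .Release.Name }}-api"}
  template:
    metadata:
      labels: {app: "{{ .Release.Name }}-api"}
    spec:
      containers:
        - name: api
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```

An environment-specific file such as `values.prod.yaml` then overrides only what differs, e.g. `helm upgrade --install api ./chart -f values.prod.yaml`.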
## Ansible (Configuration Management)

- **Automation Scripts**: Specific scripts for automating server and application configuration.
  - Write Ansible playbooks that automate the setup and configuration of servers and other infrastructure components.
  - Include tasks for installing and configuring the necessary software and dependencies.
- **Role-Based Tasks**: Define roles and tasks within Ansible for modular configuration management.
  - Organize playbooks into roles for modularity, separating concerns such as database setup and web server configuration.
  - Ensure idempotency in Ansible roles so playbooks can be rerun safely.


## Terraform (Infrastructure as Code)

- **Infrastructure Provisioning**: Define the infrastructure resources to be managed as code.
  - Write Terraform configurations to automate the provisioning of cloud resources such as EC2 instances, VPCs, and security groups.
  - Plan the integration of Terraform with the cloud provider, focusing on the AWS resources behind EKS and EBS.
- **State Management**: Strategies for managing and storing Terraform state.
  - Configure Terraform state storage and locking, for example using an S3 bucket with DynamoDB for state locking.
  - Apply state management best practices so state can be shared safely in a team environment.


## GitHub (Version Control System)

- **Repository Structure**: Define the structure and branching strategy for code repositories.
  - Define a clear repository structure, including directory layout and a branching model (e.g., Git Flow, GitHub Flow).
  - Establish guidelines for commit messages, pull requests, and code reviews.
- **Access Controls**: Requirements for repository access and security.
  - Set up repository access controls, defining roles and permissions for team members.
  - Implement branch protection rules to enforce code quality and review processes.


## GitHub Actions (CI/CD)

- **Automated Workflows**: Define workflows for continuous integration and continuous deployment.
  - Design continuous integration workflows covering code checkout, build, test, and analysis.
  - Set up continuous deployment pipelines that automate application deployment to Kubernetes.
- **Integration Testing**: Requirements for automated testing within pipelines.
  - Integrate automated testing into the CI/CD pipeline, including unit tests, integration tests, and possibly end-to-end tests.
  - Make test reports and feedback easily accessible to developers.


## Prometheus (Monitoring)

- **Metric Collection**: Define the metrics to be collected for monitoring system and application performance.
  - Define the key performance metrics Prometheus should collect, relevant to application and infrastructure health.
  - Configure Prometheus exporters in the Kubernetes cluster to collect metrics from nodes, pods, and services.
- **Alert Rules**: Set up rules for triggering alerts based on specific metric thresholds (sketched below).
  - Define alerting rules for critical conditions that require immediate attention or intervention.
  - Integrate alerts with notification systems such as email, Slack, or PagerDuty to ensure a timely response.
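As a sketch of such an alerting rule — assuming the FastAPI services export a request counter named `http_requests_total` with a `status` label, a common convention that this document does not itself fix:

```yaml
# prometheus-rules.yaml — fire when the 5xx error ratio stays above 5%.
groups:
  - name: api-availability
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m                # require the condition to persist
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of API requests are failing"
```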
## ELK Stack (Logging and Monitoring)

- **Log Collection**: Specify the logs to be collected and their format.
  - Define the types of logs to collect (e.g., application logs, system logs, access logs) and their formats.
  - Set up Filebeat or a similar log shipper on servers to forward logs to the ELK stack.
- **Elasticsearch Configuration**: Define requirements for data indexing and searching.
  - Design the Elasticsearch cluster for efficient indexing and querying, considering factors such as shard size and index lifecycle management.
  - Establish policies for log retention, archiving, and purging to balance storage needs and accessibility.
- **Kibana Dashboards**: Requirements for dashboards that surface monitoring insights.
  - Create Kibana dashboards for real-time monitoring and visualization of logs and metrics.
  - Customize dashboards for different stakeholders, such as operational dashboards for system administrators and business analytics for management.

diff --git a/docs/project/requirements/tech_requirements.md b/docs/project/requirements/tech_requirements.md
new file mode 100644
index 0000000..651f961
--- /dev/null
+++ b/docs/project/requirements/tech_requirements.md
@@ -0,0 +1,359 @@

# Architectural and Technical Requirements


## Functional Requirements


### Containerized Microservices Architecture (Kubernetes)

- **Requirement**: Implement a scalable and resilient containerized microservices architecture.
- **Details**:
  - Use Kubernetes for container orchestration and management.
  - Design for high availability and fault tolerance.
  - Enable service discovery and dynamic routing.


### Automated CI/CD Pipeline (GitHub Actions)

- **Requirement**: Automate testing, building, and deployment processes.
- **Details**:
  - Use GitHub Actions for continuous integration and deployment.
  - Set up automated testing, including unit, integration, and end-to-end tests.
  - Automate deployment to the different environments (staging, production).


### High-Performance Asynchronous API Service (FastAPI)

- **Requirement**: Develop APIs capable of handling high concurrency.
- **Details**:
  - Use FastAPI for its asynchronous features.
  - Ensure APIs are scalable and handle simultaneous requests efficiently.
  - Implement proper error handling and validation in API endpoints.


### Database High Availability (PostgreSQL)

- **Requirement**: Ensure data integrity and availability with database replication and failover.
- **Details**:
  - Configure PostgreSQL for replication.
  - Set up failover mechanisms that switch to a standby database if the primary fails.
  - Test failover regularly to confirm data integrity and availability.


### Dynamic Resource Allocation (Kubernetes)

- **Requirement**: Implement pod autoscaling based on usage metrics.
- **Details**:
  - Use the Kubernetes Horizontal Pod Autoscaler to adjust the number of pods in a deployment.
  - Monitor CPU and memory usage to trigger scaling.
  - Ensure autoscaling does not disrupt ongoing transactions or operations.


### Comprehensive Application Testing

- **Requirement**: Integrate end-to-end testing tools in the CI/CD pipeline (a workflow sketch follows).
- **Details**:
  - Select and integrate testing tools such as Selenium or Cypress.
  - Develop a suite of end-to-end tests that mimic real user interactions.
  - Automate test execution as part of the CI/CD pipeline.
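A minimal sketch of such a pipeline. The job layout, Python version, linter choice (ruff), and registry path are assumptions about this project, not fixed requirements:

```yaml
# .github/workflows/ci.yaml
name: ci
on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      # Static analysis and unit tests, per the quality-gate requirement.
      - run: pip install ruff pytest
      - run: ruff check .
      - run: pytest

  build-and-push:
    needs: test              # only build images from code that passed the tests
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - uses: actions/checkout@v4
      - run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
      - run: docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
```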
### API Documentation Automation

- **Requirement**: Ensure API documentation is auto-generated and stays current with code changes.
- **Details**:
  - Use tools such as Swagger for API documentation.
  - Set up documentation to update automatically on code changes.
  - Ensure documentation is clear, accurate, and easily accessible.


### Data Persistence (EBS with Kubernetes)

- **Requirement**: Implement persistent storage solutions for Kubernetes.
- **Details**:
  - Integrate Amazon EBS for persistent data storage.
  - Configure persistent volume claims in Kubernetes for stateful applications.
  - Manage data backup and recovery processes.


### Kubernetes Resource Management (Helm)

- **Requirement**: Simplify and manage Kubernetes deployments using Helm.
- **Details**:
  - Develop Helm charts for each microservice.
  - Parameterize Helm charts for flexibility across environments.
  - Manage Helm chart versions and releases.


### Zero-Downtime Deployment

- **Requirement**: Implement deployment strategies that avoid downtime.
- **Details**:
  - Use Kubernetes rolling updates for gradual deployment.
  - Explore blue-green deployment strategies to reduce risk.
  - Ensure a seamless transition between old and new versions without affecting users.


These functional requirements provide a clear roadmap for developing and managing the microservices architecture, covering the core aspects of system functionality from infrastructure setup to application deployment and operation.


## Non-Functional Requirements

The non-functional requirements describe how the system should operate and the qualities it should possess. They are crucial for the system's reliability, security, and efficiency.


### SSL/TLS Integration (Traefik)

- **Requirement**: Ensure secure communication via SSL/TLS.
- **Details**:
  - Automate SSL/TLS certificate management and renewal.
  - Use strong cipher suites and modern TLS protocols.
  - Terminate SSL/TLS in Traefik for encrypted traffic handling.


### Code Quality Assurance (CI Pipeline)

- **Requirement**: Maintain high standards of code quality.
- **Details**:
  - Integrate static code analysis tools into the CI pipeline.
  - Enforce coding standards and perform automated code reviews.
  - Set up quality gates that prevent merging substandard code.


### Kubernetes Cluster Monitoring (Prometheus)

- **Requirement**: Implement comprehensive monitoring of the Kubernetes cluster.
- **Details**:
  - Use Prometheus to capture detailed metrics of the Kubernetes cluster.
  - Set up Grafana dashboards to visualize performance data.
  - Configure alerts for abnormal metrics or system behavior.


### Effective Log Management (ELK Stack)

- **Requirement**: Handle and analyze logs efficiently.
- **Details**:
  - Deploy Elasticsearch, Logstash, and Kibana (ELK) for log management.
  - Implement log rotation and archiving strategies.
  - Ensure real-time log analysis and accessibility.


### Consistent Infrastructure Provisioning (Terraform)

- **Requirement**: Achieve consistent and repeatable cloud infrastructure setup.
- **Details**:
  - Use Terraform to script the provisioning of AWS resources.
  - Apply Infrastructure as Code (IaC) practices for consistency.
  - Maintain Terraform state files to track and manage infrastructure changes.


### Enhanced Kubernetes Security (Network Policies)

- **Requirement**: Implement robust network security within Kubernetes (sketched below).
- **Details**:
  - Define and enforce network policies controlling pod-to-pod communication.
  - Secure ingress and egress traffic at the pod level.
  - Regularly audit and update network policies to address new security needs.
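A minimal sketch of such a policy, assuming placeholder labels `app: fastapi-api` and `app: postgresql` in an `app` namespace: only the API pods may reach the database, and only on the PostgreSQL port.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-ingress-from-api-only
  namespace: app
spec:
  podSelector:
    matchLabels:
      app: postgresql        # the pods being protected
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: fastapi-api   # the only allowed client pods
      ports:
        - protocol: TCP
          port: 5432
```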
### Routine Performance Benchmarking

- **Requirement**: Establish performance benchmarks and regular evaluations.
- **Details**:
  - Develop benchmark tests that measure system performance against defined metrics.
  - Schedule regular performance testing to detect degradation or improvements.
  - Use benchmark results to guide performance optimization efforts.


### GDPR Compliance in Logging/Monitoring

- **Requirement**: Adhere to GDPR and other data protection regulations.
- **Details**:
  - Implement data anonymization and encryption in logging.
  - Ensure data retention policies comply with GDPR.
  - Regularly review and update compliance measures as regulations evolve.


### Cloud Resource Cost Monitoring (AWS Tools)

- **Requirement**: Optimize and monitor cloud resource usage and costs.
- **Details**:
  - Use AWS cost management tools to track resource usage.
  - Implement cost-saving strategies, such as spot instances.
  - Set up alerts for budget overruns or unexpected cost spikes.


### DNS Routing and Service Discovery (Kubernetes)

- **Requirement**: Efficiently manage service accessibility and DNS routing.
- **Details**:
  - Configure internal DNS for service discovery within Kubernetes.
  - Set up external DNS routing for access to services from outside the cluster.
  - Ensure DNS routing configurations are scalable and resilient.


Together with the functional requirements above, these non-functional requirements cover security, scalability, performance, compliance, and cost-efficiency. The remaining requirements below address cross-cutting operational concerns.


### Secure Admin Access (Bastion and VPN)

- **Requirement**: Implement secure and controlled administrative access mechanisms.
- **Details**:
  - **Bastion Hosts and VPN**: Use bastion hosts and VPNs to create secure pathways for admin access.
  - **Audit Trails**: Keep comprehensive logs of all access and actions taken during administrative sessions for security audits.
  - **Security Reviews and Assessments**: Regularly assess and update security measures to protect against new threats.


### Technical Content Management (Docs as Code, DocOps)

- **Requirement**: Manage technical documentation efficiently using Docs as Code and DocOps methodologies (a publishing sketch follows this section).
- **Details**:
  - **Version Control Integration**: Manage documentation changes and history in a version control system.
  - **Automated Publishing**: Set up automated processes for building and deploying documentation.
  - **Collaboration Tools**: Use tools that support collaborative writing and reviewing.
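As a sketch of the automated-publishing requirement — assuming the site is built with MkDocs (the `.pages` navigation file added in this patch suggests the awesome-pages plugin) and published to GitHub Pages; the theme and plugin list are assumptions:

```yaml
# .github/workflows/docs.yaml
name: docs
on:
  push:
    branches: [main]
    paths: ["docs/**", "mkdocs.yml"]

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: write          # gh-deploy pushes to the gh-pages branch
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - run: pip install mkdocs-material mkdocs-awesome-pages-plugin
      - run: mkdocs gh-deploy --force
```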
### GitOps

- **Requirement**: Apply GitOps principles to manage and automate the cloud-native application lifecycle.
- **Details**:
  - **Repository Best Practices**: Enforce best practices for repository management, including branching and merging strategies and access controls.
  - **Change Management**: Apply infrastructure and application changes through pull requests, ensuring review and improving consistency and traceability.
  - **Backup and Recovery of Repositories**: Implement strategies for backing up Git repositories and recovering quickly from data loss.


### Event Monitoring and Alerting

- **Requirement**: Set up a comprehensive event monitoring and alerting system.
- **Details**:
  - **Real-Time Event Tracking**: Monitor system events in real time for immediate response.
  - **Configurable Alerts**: Offer customizable alert thresholds and configurable alerting mechanisms for different severity levels and event types.
  - **Integration with Notification Systems**: Integrate seamlessly with communication tools for prompt alert dissemination.


### API Endpoint Routing (Traefik)

- **Requirement**: Efficiently manage and route API endpoints using Traefik.
- **Details**:
  - **Dynamic Load Balancing**: Distribute API traffic to optimize load, performance, and resource utilization.
  - **High Availability**: Ensure high availability of the API routing layer to prevent downtime.
  - **Dynamic Configuration**: Allow routing rules to change dynamically without disrupting service.


### Terraform State Management

- **Requirement**: Ensure secure and efficient management of Terraform state.
- **Details**:
  - **State Locking**: Implement state locking to prevent conflicts during concurrent operations.
  - **State Security**: Secure Terraform state files, especially in remote storage scenarios, to protect sensitive data.
  - **Backup and Recovery**: Automate state file backups and set up a clear recovery mechanism in case of corruption or loss.


### Artifact/Container Registry

- **Requirement**: Maintain a secure and efficient artifact and container registry.
- **Details**:
  - **Registry Security**: Apply strong security measures to the registry, including access controls and vulnerability scanning.
  - **High Availability and Scalability**: Ensure the registry handles high load and is resilient to failures.
  - **CI/CD Integration**: Integrate with CI/CD pipelines for automated pushing and pulling of artifacts and containers.


### IP Address Management

- **Requirement**: Automate and optimize the management of IP addresses.
- **Details**:
  - **Automated IP Allocation**: Automate IP address allocation to reduce conflicts and use the address space efficiently.
  - **Tracking and Reporting**: Track IP address usage and generate reports for audit and management purposes.
  - **Network Service Integration**: Integrate seamlessly with existing network infrastructure such as DNS and DHCP.


### Identity and Access Management (IAM)

- **Requirement**: Implement a comprehensive identity and access management solution (an RBAC sketch follows this section).
- **Details**:
  - **Role-Based Access Control (RBAC)**: Provide granular access controls based on roles and responsibilities.
  - **Single Sign-On (SSO)**: Simplify user access across multiple platforms and services.
  - **Audit and Compliance**: Maintain detailed logs of access and changes for security audits and regulatory compliance.
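A minimal sketch of the in-cluster side of the RBAC requirement — the namespace, role name, and `ops-team` group are placeholders; mapping that group to real identities (e.g., via AWS IAM on EKS) is environment-specific:

```yaml
# Allow a group to manage Deployments in one namespace, nothing more.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deploy-operator
  namespace: app
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deploy-operator-binding
  namespace: app
subjects:
  - kind: Group
    name: ops-team                     # placeholder group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deploy-operator
  apiGroup: rbac.authorization.k8s.io
```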
### Requirements Documentation and Traceability

- **Requirement**: Maintain comprehensive and traceable documentation of all system requirements.
- **Details**:
  - **Documentation Practices**: Use tools and practices that support clear documentation of requirements, including user stories, use cases, and diagrams.
  - **Traceability Matrix**: Develop a traceability matrix linking requirements to their implementation and testing artifacts.
  - **Version Control**: Track changes to requirements over time in a version control system.


### Service Discovery and Service Mesh

- **Requirement**: Implement efficient service discovery mechanisms and a service mesh architecture.
- **Details**:
  - **Service Registry**: Use a service registry for dynamic service discovery in the microservices architecture.
  - **Service Mesh Implementation**: Deploy a service mesh solution such as Istio or Linkerd to manage service-to-service communication.
  - **Resilience and Observability**: Ensure the service mesh provides enhanced resilience and observability features.


### Node and Container Hardening

- **Requirement**: Harden nodes and containers to strengthen security.
- **Details**:
  - **Minimal Base Images**: Use minimal, trusted base images for containers to reduce the attack surface.
  - **Security Benchmarks**: Apply industry-standard security benchmarks and guidelines for node and container hardening.
  - **Regular Security Scans**: Conduct regular security scans and vulnerability assessments.


### Network Segmentation/Separation

- **Requirement**: Implement network segmentation to improve security and manageability.
- **Details**:
  - **Segmentation Policies**: Define clear policies for network segmentation, isolating the system's different components.
  - **Access Control**: Enforce segmentation rules with firewalls and access control lists.
  - **Monitoring and Logging**: Monitor network segments for unusual activity and keep logs for security and troubleshooting.


### Backup

- **Requirement**: Establish a robust backup strategy for all critical components and data.
- **Details**:
  - **Backup Policies**: Define policies for the frequency, scope, and methods of backups.
  - **Automated Backup Solutions**: Automate the backup of data and configurations.
  - **Backup Testing**: Test backups regularly to verify data integrity and recoverability.


### Disaster Recovery

- **Requirement**: Develop and implement a comprehensive disaster recovery plan.
- **Details**:
  - **Recovery Objectives**: Define clear Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
  - **Disaster Recovery Drills**: Conduct regular drills to test and refine the disaster recovery process.
  - **Offsite Storage**: Keep critical backups offsite to guard against local disasters.


### PKI and Certificate Authority

- **Requirement**: Establish and operate a Public Key Infrastructure (PKI) and Certificate Authority for digital certificate management (sketched below).
- **Details**:
  - **Certificate Lifecycle Management**: Implement processes for issuing, renewing, and revoking certificates.
  - **Security and Compliance**: Ensure the PKI adheres to industry security standards and compliance requirements.
  - **Integration with Services**: Integrate certificate management seamlessly with the services that require encryption and authentication.
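cert-manager is one common way to automate the certificate lifecycle in Kubernetes; it is not mandated by this document, so treat the following as an assumption-laden sketch (placeholder e-mail, Let's Encrypt as the CA, Traefik reused for HTTP-01 challenges):

```yaml
# A cluster-wide ACME issuer handling issuance and renewal automatically.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com              # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-account-key     # where the ACME account key is stored
    solvers:
      - http01:
          ingress:
            class: traefik              # reuse Traefik to answer challenges
```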
diff --git a/docs/project/requirements/user_stories.md b/docs/project/requirements/user_stories.md
new file mode 100644
index 0000000..3337fd0
--- /dev/null
+++ b/docs/project/requirements/user_stories.md
@@ -0,0 +1,149 @@

# Architectural and Technical User Stories


## Functional user stories

These stories capture the functional requirements pertinent to the architecture and operation of a robust, scalable, and secure microservices application in a cloud environment.

They should provide a solid foundation for the development, deployment, and maintenance of the system, ensuring that all critical aspects are addressed.


### Application user

- As an application user, I expect the system to remain available no matter what problems it may face.

- As an application user, I expect the system to scale automatically during peak usage times to maintain performance and availability.


### Architects & DevOps

- As a system architect, I want to design a scalable microservices architecture so that the application can handle varying loads efficiently.

- As a DevOps engineer, I need to automate server provisioning using Terraform scripts to ensure a consistent and repeatable cloud environment setup.

- As a cloud engineer, I need to integrate Kubernetes with Amazon EKS and other AWS services for enhanced functionality and compliance with AWS best practices.

- As a DevOps team lead, I want to establish a continuous feedback loop with the development team to iteratively improve the system based on real-time usage data.


### Ops

- As a system administrator, I want to automate server creation using scripts and IaC tools and techniques so that infrastructure setup is efficient (fast, unattended), reliable (tested), and repeatable.

- As a network administrator, I need to configure Traefik as a reverse proxy to efficiently route incoming requests to the appropriate FastAPI services.

- As a Kubernetes operator, I want to configure Kubernetes clusters with appropriate resource allocation and scaling policies to manage application deployment effectively.

- As an infrastructure manager, I need to ensure high availability and disaster recovery capabilities for the Kubernetes cluster and its workloads.

- As a performance analyst, I want to track and optimize the resource utilization of the cloud infrastructure to maintain cost-effectiveness.

- As a system administrator, I want to implement Prometheus and Grafana to monitor system performance and raise alerts on anomalies.

- As a system administrator, I need to configure the ELK stack for efficient log collection, analysis, and visualization.


### Devs & DBAs

- As a developer, I need a continuous integration process that automatically tests the latest code changes with static analysis and unit testing tools, ensuring code quality.

- As a developer, I need a continuous deployment process that automatically deploys the latest code changes to a test environment, ensuring rapid feedback and iteration.

- As a developer, I want to use Helm charts for easy deployment and management of Kubernetes resources.

- As a backend developer, I want to build RESTful APIs with FastAPI that are well documented and easy for front-end developers to consume.

- As a database administrator, I need to design a normalized PostgreSQL database schema that supports efficient data retrieval and storage.
### Testing & QA

- As a CI/CD engineer, I need to set up GitHub Actions workflows that automate the build, test, and deployment processes of the application.

- As a QA engineer, I want to integrate automated testing in the CI/CD pipeline to ensure code quality and reliability.


### Technical Content Creators & UX/DX

- As a documentation writer, I need to create comprehensive user guides and API documentation for the system to facilitate easy usage and maintenance.


### Compliance & Security Specialists

- As a security specialist, I want to implement robust security measures in the infrastructure setup to protect against vulnerabilities and attacks.

- As a security analyst, I need to enforce role-based access control in Kubernetes to provide secure and restricted access to the cluster resources.

- As a compliance officer, I need to ensure that the entire application stack, including all tools and processes, adheres to relevant industry standards and regulations.


## Non-functional user stories

These user stories focus on the specifics of the technology stack: how each component serves particular technical needs and contributes to the overall functionality and efficiency of the system.


### Application user

- As an application user, I expect quick and seamless delivery of updates, without downtime, through rolling updates or blue-green deployments in Kubernetes.


### Architects and DevOps

- As a system architect, I want to design a containerized application architecture using Kubernetes to ensure scalability and resilience.

- As a DevOps engineer, I need to implement a CI/CD pipeline using GitHub Actions that automates testing, building, and deployment of microservices.

- As a cloud engineer, I need to use Terraform to script the provisioning of AWS resources like EKS clusters, ensuring infrastructure consistency.

- As a DevOps team lead, I want to establish a performance benchmarking routine for the application to ensure it meets the desired performance criteria.

- As an infrastructure manager, I need to integrate EBS for persistent storage in Kubernetes, ensuring data persistence across pod restarts.


### Ops

- As a network administrator, I need to set up internal and external DNS routing in Kubernetes to efficiently manage service discovery and accessibility.

- As a Kubernetes operator, I want to set up pod autoscaling based on CPU and memory usage metrics to handle load dynamically.

- As a system administrator, I want to set up Prometheus for detailed monitoring of Kubernetes cluster metrics and application performance indicators.

- As a logging manager, I need to configure log rotation and archiving strategies in the ELK stack to manage log data effectively.

- As a performance analyst, I want to monitor the cost of cloud resources using AWS cost management tools and optimize where necessary.


### Devs & DBAs

- As a backend developer, I want to use FastAPI to create asynchronous APIs that can handle high volumes of concurrent requests.

- As a database administrator, I need to configure PostgreSQL replication and failover mechanisms to ensure data integrity and availability.

- As a developer, I want to use Helm to manage Kubernetes resource complexity and simplify the deployment process.


### Testing & QA

- As a CI/CD engineer, I need to integrate static code analysis tools in the CI pipeline to enforce code quality standards.
- As a QA engineer, I want to implement end-to-end testing using tools like Selenium or Cypress in the CI/CD pipeline for thorough application testing.


### Compliance & Security Specialists

- As a security specialist, I want to integrate SSL/TLS in Traefik for secure communication and data protection.

- As a security analyst, I need to set up network policies in Kubernetes to restrict traffic flow between pods, enhancing network security.

- As a compliance officer, I need to ensure logging and monitoring setups comply with GDPR and other relevant data protection regulations.


### Technical Content Creators & UX/DX

- As a documentation writer, I need to use Swagger or similar tools to auto-generate API documentation that stays up-to-date with code changes.

From c663b004cf77c491d5edd2bbb6d6bc989c200b1b Mon Sep 17 00:00:00 2001
From: JoAngel8
Date: Fri, 1 Dec 2023 12:56:32 +0000
Subject: [PATCH 2/2] proj: presentation application

---
 docs/project/overview/pres_app.md | 41 +++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 docs/project/overview/pres_app.md

diff --git a/docs/project/overview/pres_app.md b/docs/project/overview/pres_app.md
new file mode 100644
index 0000000..beb44d5
--- /dev/null
+++ b/docs/project/overview/pres_app.md
@@ -0,0 +1,41 @@

# Presentation of the application

Here is a brief description of the application, the starting point of the DevOps project.

**Description of the files and content**

The application is currently a FastAPI implementation designed to run in a Dockerized environment.
It uses Traefik as a reverse proxy and SSL certificate manager, which strengthens the security of data exchanges. User data is managed with a PostgreSQL database, ensuring reliability and performance. This centralized approach simplifies management and deployment on a single machine or server.

## Overview of the application components

The components can be described as follows (a wiring sketch follows this list):

* FastAPI is a modern, well-regarded web framework for building APIs with Python, offering high performance and easy, intuitive coding. Data validation and serialization are handled automatically, reducing manual work and the risk of errors. FastAPI supports asynchronous programming, allowing efficient handling of concurrent requests, which is particularly useful for I/O-bound operations or for services that must handle many simultaneous connections.
* Docker containerizes the application for simplified deployment and management. The code is structured and designed for deployment via Docker.
* Traefik plays a dual role. On the one hand, it acts as a reverse proxy, handling routing and directing requests to the right application services. On the other, it manages SSL certificates, an essential feature for securing communications and data exchanges over the Internet. This integration shows attention to securing and optimizing network traffic.
* PostgreSQL manages and stores the user data. It is known for its robustness, reliability, stability, performance, and SQL standards compliance.

Overall, the codebase shows consistent technology choices that integrate well with one another.
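A minimal Compose sketch of how these pieces might be wired together on a single host, as described above — the domain, credentials, and image versions are placeholders, not values taken from the actual codebase:

```yaml
# docker-compose.yaml — Traefik terminates TLS and routes to the API,
# which talks to PostgreSQL on the internal network.
version: "3.8"
services:
  traefik:
    image: traefik:v2.10
    command:
      - --providers.docker=true
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=ops@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
    ports: ["443:443"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
  api:
    build: .                              # the FastAPI application
    labels:
      - traefik.http.routers.api.rule=Host(`api.example.com`)
      - traefik.http.routers.api.entrypoints=websecure
      - traefik.http.routers.api.tls.certresolver=le
    environment:
      DATABASE_URL: postgresql://app:secret@db:5432/app   # placeholder creds
    depends_on: [db]
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data   # persist user data across restarts
volumes:
  pgdata:
  letsencrypt:
```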
## Description of the monolithic architecture

In its original form, the application is structured as a monolithic system, meaning that all of its functional components — database, data processing, user interface — live in a single source-code repository. This centralized architecture simplifies management and deployment.

**Advantages**

This architecture offers initial advantages such as simplicity of development and deployment. By concentrating all functions in a single place, coordination between components is intrinsically simplified, reducing the complexity of communication between modules.

**Limits of the monolithic architecture**

However, this architecture has limits, particularly in terms of scalability and flexibility. As needs and features evolve, the monolithic system can become heavy and hard to maintain. Every update or change requires redeploying the entire application, which increases the risk of errors and downtime. Performance can also suffer as the application grows: the monolithic system becomes less responsive and harder to optimize. Security can become a concern as well, since a flaw in one component could potentially compromise the whole system.


## Evolution toward the cloud and microservices

**Transition: rethinking the architecture for the cloud and microservices**

Faced with these challenges, we will rethink the application's architecture to better meet modern requirements for scalability, performance, and security. It is in this context — evolving the application toward a microservices-oriented architecture deployed in a cloud environment — that our project takes shape.

**Objective**

The project will therefore focus on migrating the application to the cloud, using the DevOps methods learned during the course. Although the current monolithic structure will be kept, the goal is to optimize its deployment and management in the cloud by exploiting the advantages of DevOps technologies and practices, without necessarily undertaking a complete restructuring or developing new features, given the time constraints.