Deployment Plan and Infrastructure
graph TD
%% --- STYLING ---
classDef plain fill:#fff,stroke:#333,stroke-width:1px;
classDef box fill:#e1f5fe,stroke:#01579b,stroke-width:1px;
%% --- NODES (HARDWARE) ---
%% Node 1: User's PC
subgraph UserPC ["<<device>> User's PC"]
subgraph Browser ["<<execution environment>> Web Browser"]
WebApp["<<artifact>> <br> Web App Bundle <br> (HTML, CSS, JS)"]
end
end
%% Node 2: iOS Device
subgraph iOSDevice ["<<device>> iOS Device"]
iOSApp["<<artifact>> <br> Native iOS App <br> (.ipa)"]
end
%% Node 3: Android Device
subgraph AndroidDevice ["<<device>> Android Device"]
AndroidApp["<<artifact>> <br> Native Android App <br> (.apk)"]
end
%% --- DISTRIBUTION & ENTRY POINT ---
AppStore["<<cloud>> Apple App Store"]
PlayStore["<<cloud>> Google Play Store"]
Domain["domain.com <br> (DNS / HTTPS)"]
%% Node 4: Linux Server
subgraph LinuxServer ["<<device>> Linux Server"]
%% Component: Firewall
Firewall["<<component>> Firewall <br> (Ports 80/443 Open)"]
%% Runtime Environment: Kubernetes
%% Added <br> to title to prevent overlap with the node below
subgraph K8s ["<<execution environment>> <br> Kubernetes Cluster"]
%% Artifact 1: Reverse Proxy
RevProxy["<<artifact>> <br> Reverse Proxy Pod <br>(Traefik / Nginx)"]
%% Artifact 2: Backend Service (with 3 replicas)
subgraph BackendService ["Backend Service (ReplicaSet)"]
direction TB
Hono1["<<artifact>> <br> Hono Container 1"]
Hono2["<<artifact>> <br> Hono Container 2"]
Hono3["<<artifact>> <br> Hono Container 3"]
end
%% Artifact 3: Database
DB[("<<artifact>> <br> PostgreSQL Pod <br> (Persistent Volume)")]
end
end
%% --- COMMUNICATION PATHS & RELATIONSHIPS ---
%% 1. Distribution Path (Deployment)
AppStore -. "Deploy" .-> iOSApp
PlayStore -. "Deploy" .-> AndroidApp
%% 2. Client Traffic to Domain
%% Extended dashes (----) create more vertical space for labels
Browser -- "HTTPS:443 <br> (Get Static Assets)" ----> Domain
WebApp -- "HTTPS:443 <br> (JSON API)" ----> Domain
iOSApp -- "HTTPS:443 <br> (JSON API)" ----> Domain
AndroidApp -- "HTTPS:443 <br> (JSON API)" ----> Domain
%% 3. External Server Path (Internet to Server)
Domain -- "HTTPS:443" --> Firewall
Firewall -- "Forward Port 80" --> RevProxy
%% 4. Internal Server Path: Proxy to Backend
RevProxy -- "HTTP (Load Balanced)" --> BackendService
%% 5. Internal Server Path: Backend to DB
BackendService -- "TCP:5432 <br> (SQL)" --> DB
graph TD
%% --- STYLING ---
classDef plain fill:#fff,stroke:#333,stroke-width:1px;
classDef box fill:#e1f5fe,stroke:#01579b,stroke-width:1px;
classDef layer fill:#f9f9f9,stroke:#bbb,stroke-width:1px,stroke-dasharray: 5 5;
%% --- NODES (HARDWARE) ---
%% Node 1: User's PC
subgraph UserPC ["<<device>> User's PC"]
subgraph Browser ["<<execution environment>> Web Browser"]
WebApp["<<artifact>> <br> Web App Bundle <br> (HTML, CSS, JS)"]
end
end
%% Node 2: iOS Device
subgraph iOSDevice ["<<device>> iOS Device"]
iOSApp["<<artifact>> <br> Native iOS App <br> (.ipa)"]
end
%% Node 3: Android Device
subgraph AndroidDevice ["<<device>> Android Device"]
AndroidApp["<<artifact>> <br> Native Android App <br> (.apk)"]
end
%% --- DISTRIBUTION & ENTRY POINT ---
AppStore["<<cloud>> Apple App Store"]
PlayStore["<<cloud>> Google Play Store"]
Cloudflare["<<cloud>> Cloudflare Network <br> (aero.ngrenier.com, n8n.ngrenier.com)"]
%% Node 4: Linux Server
subgraph LinuxServer ["<<device>> Linux Server"]
%% Runtime Environment: Docker
subgraph Docker ["<<execution environment>> <br> Docker Engine"]
subgraph Ingress ["Ingress Layer"]
Cloudflared["<<artifact>> <br> Cloudflared Tunnel <br>(aero_tunnel)"]
end
subgraph Apps ["Application Layer"]
App["<<artifact>> <br> App Container <br> (aero_app)"]
N8n["<<artifact>> <br> n8n Container <br> (aero_n8n_prod)"]
end
subgraph Utilities ["Utility Layer"]
Cron["<<artifact>> <br> Cron Container <br> (aero_cron)"]
Watchtower["<<artifact>> <br> Watchtower Container <br> (aero_watchtower)"]
end
subgraph Data ["Data Layer"]
DB[("<<artifact>> <br> PostgreSQL Container <br> (postgres_data)")]
end
%% Apply styling to sub-layers to make them distinct but subtle
class Ingress,Apps,Utilities,Data layer;
end
end
%% --- COMMUNICATION PATHS & RELATIONSHIPS ---
%% 1. Distribution Path (Deployment)
AppStore -. "Deploy" .-> iOSApp
PlayStore -. "Deploy" .-> AndroidApp
%% 2. Client Traffic to Cloudflare
Browser -- "HTTPS:443" ----> Cloudflare
iOSApp -- "HTTPS:443" ----> Cloudflare
AndroidApp -- "HTTPS:443" ----> Cloudflare
%% 3. Secure Tunnel Path
Cloudflare -- "Secure Outbound Tunnel" <--> Cloudflared
%% 4. Internal Server Path: Tunnel to Services
Cloudflared -- "HTTP Route <br> (aero)" --> App
Cloudflared -- "HTTP Route <br> (n8n)" --> N8n
%% 5. Internal Server Path: Services to DB
App -- "TCP:5432 (SQL)" --> DB
N8n -- "TCP:5432 (SQL)" --> DB
Cron -- "TCP:5432 (SQL)" --> DB
%% 6. Watchtower Monitoring (Implicit Docker Socket Connection)
Watchtower -. "Updates" .-> App
Watchtower -. "Updates" .-> Cron
As the system operates, certain tables (logs and historical records) keep growing without bound. To limit that growth and avoid performance problems, we manage database growth with the following methods.
To make clear how long records are kept, we define a data retention policy for each category of data:
- Flight logs (also known as system logs): Retained for 12 months.
- Audit records (based on legal/compliance guidelines): Retained per statutory/organizational requirements (minimum 12 months unless otherwise communicated).
- Derived or temporary records: Deleted when they are no longer required.

Old records that no longer serve a purpose in the daily functions of the business are considered "archival."
Cleanup is automated: cron or worker jobs run at predefined intervals, usually during off-peak hours, to limit the workload on the database.
For example, in PostgreSQL the background job that deletes expired flight logs runs the following statement: `DELETE FROM flight_logs WHERE created_at < NOW() - INTERVAL '1 year';`
PostgreSQL considerations:
- Once the data has been deleted, `VACUUM` (or autovacuum) reclaims the space held by the deleted rows so that it can be reused.
- Cleanup jobs on tables that hold large amounts of data are designed to delete in smaller batches rather than in one statement. This helps prevent long-running locks.
Together, these methods keep database growth predictable and make the database easier to scale.
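The cleanup described above can be sketched as a small batched job. This is a minimal illustration under stated assumptions, not the project's actual job: the `id` column, the batch size, and the 365-day reading of "12 months" are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the sketch runs without a server. With a PostgreSQL driver such as psycopg, the `?` placeholders would be written as `%s`.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete flight_logs rows older than `cutoff`, one small batch per transaction."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM flight_logs WHERE id IN ("
            " SELECT id FROM flight_logs WHERE created_at < ? LIMIT ?)",
            (cutoff.isoformat(), batch_size),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Demo data: 10 expired rows and 3 recent rows (illustrative values only).
now = datetime(2026, 4, 13)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
old = [((now - timedelta(days=400 + i)).isoformat(),) for i in range(10)]
recent = [((now - timedelta(days=i)).isoformat(),) for i in range(3)]
conn.executemany("INSERT INTO flight_logs (created_at) VALUES (?)", old + recent)
conn.commit()

deleted = delete_in_batches(conn, now - timedelta(days=365), batch_size=4)
remaining = conn.execute("SELECT COUNT(*) FROM flight_logs").fetchone()[0]
print(deleted, remaining)
```

Committing after every batch keeps each transaction short, which is the point of batching: other sessions are never blocked for the duration of the whole cleanup.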
If data older than the retention period is still historically valuable, for example for analytics or reporting, you can archive it or export it to cold storage (such as object storage or a data warehouse) before deleting it. Archived data is stored outside the primary transactional database and is read-only. This keeps the production database optimized for active workloads.
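One way to picture the archive-then-delete step is the sketch below. It is only an illustration: the `flight_id` and `message` columns, the CSV format, and the in-memory SQLite database (a stand-in for PostgreSQL) are assumptions, and in practice `out` would be a file that is then uploaded to whatever cold storage is chosen.

```python
import csv
import io
import sqlite3


def archive_then_delete(conn, cutoff_iso, out):
    """Export rows older than the cutoff to `out` as CSV, then delete exactly those rows."""
    rows = conn.execute(
        "SELECT id, flight_id, created_at, message FROM flight_logs "
        "WHERE created_at < ? ORDER BY id",
        (cutoff_iso,),
    ).fetchall()
    writer = csv.writer(out)
    writer.writerow(["id", "flight_id", "created_at", "message"])
    writer.writerows(rows)
    # Delete by primary key so that only exported rows are removed.
    conn.executemany("DELETE FROM flight_logs WHERE id = ?", [(r[0],) for r in rows])
    conn.commit()
    return len(rows)


# Demo with illustrative rows: two expired, one recent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_logs ("
    " id INTEGER PRIMARY KEY, flight_id INTEGER, created_at TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO flight_logs (flight_id, created_at, message) VALUES (?, ?, ?)",
    [
        (1, "2024-12-01T10:00:00", "departed"),
        (1, "2024-12-01T14:00:00", "arrived"),
        (2, "2026-03-30T09:00:00", "delayed"),
    ],
)
conn.commit()

archive = io.StringIO()
archived = archive_then_delete(conn, "2025-04-13T00:00:00", archive)
remaining = conn.execute("SELECT COUNT(*) FROM flight_logs").fetchone()[0]
print(archived, remaining)
```

Deleting by the exported primary keys, rather than re-running the date filter, avoids removing a row that was never written to the archive.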
All indexing decisions are made and refined for PostgreSQL. Indexes are essential for query performance, but each one costs storage and slows down writes. Our indexing strategy therefore aims to balance read efficiency against write performance.
- Only index columns that are used frequently, for example:
  - Columns used in `WHERE` clauses to locate data
  - Columns used in `JOIN` conditions to combine tables
  - Columns used to order records in an `ORDER BY` clause
- Do not over-index tables that receive heavy write activity, such as log tables.
- A few well-chosen indexes usually help more than many unused ones.
- Every table must have a primary key, which PostgreSQL indexes automatically.
- All foreign keys used in a `JOIN` should be indexed to maximize the performance of the related queries.

For example:
- `flight_id` is a foreign key in `flight_logs`, which is a joined table.
- `user_id` is a foreign key in any user-related table.
Tables that grow over time (for example, logs and events) typically have their timestamp columns indexed.
- An index on the `created_at` timestamp allows:
  - Faster retrieval of recent records.
  - More efficient cleanup and archival tasks.

For example (PostgreSQL):
- Create a B-tree index on `(created_at)` for `flight_logs`.
- PostgreSQL can then use this index for range queries and delete operations that filter on `created_at`.
Composite indexes are defined when queries often filter on several columns together. For example, when filtering the logs of one flight that were created during a particular period, the columns `(flight_id, created_at)` can form a composite index.
The key point is that composite indexes must be based on actual, repeated query patterns, not on assumptions about which indexes might be helpful.
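The timestamp and composite indexes discussed above could be created and checked as in the following sketch. The index names, the `message` column, and the sample parameter values are hypothetical. The `CREATE INDEX` statements are also valid in PostgreSQL, where they create B-tree indexes by default, but the plan check here uses SQLite's `EXPLAIN QUERY PLAN` as a local stand-in for PostgreSQL's `EXPLAIN`, so the planner output will look different on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_logs ("
    " id INTEGER PRIMARY KEY, flight_id INTEGER, created_at TEXT, message TEXT)"
)
# Single-column index for time-range queries and cleanup jobs.
conn.execute("CREATE INDEX idx_flight_logs_created_at ON flight_logs (created_at)")
# Composite index for "logs of one flight within a period".
conn.execute(
    "CREATE INDEX idx_flight_logs_flight_created ON flight_logs (flight_id, created_at)"
)


def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


range_plan = plan("SELECT * FROM flight_logs WHERE created_at < ?", ("2025-04-13",))
flight_plan = plan(
    "SELECT * FROM flight_logs WHERE flight_id = ? AND created_at >= ?",
    (42, "2026-01-01"),
)
print(range_plan)
print(flight_plan)
```

Reading the plan before and after adding an index is a cheap way to confirm that a query pattern actually benefits from it.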
Some common practices for monitoring query performance are:
- Use PostgreSQL's `EXPLAIN` and `EXPLAIN ANALYZE` commands to evaluate query performance periodically.
- Queries that perform poorly or run frequently may need to be optimized, or may warrant adding or modifying indexes.
- Remove unused or low-value indexes to reduce overhead on insert, update, and delete operations.
- Use the `pg_stat_user_tables` and `pg_stat_user_indexes` statistics views in PostgreSQL to guide index tuning.
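As a rough sketch of how `pg_stat_user_indexes` can guide index removal, the helper below filters rows from that view for indexes that have never been scanned. The query uses the view's real `relname`, `indexrelname`, and `idx_scan` columns; the helper name, the decision to skip `_pkey` indexes, and the sample rows are assumptions made for illustration, not measurements from this system.

```python
# Query to run against PostgreSQL; each result row is (table, index, scan count).
INDEX_USAGE_SQL = """
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
"""


def unused_indexes(rows, skip_suffixes=("_pkey",)):
    """Return (table, index) pairs with zero scans, skipping primary-key indexes."""
    return [
        (table, index)
        for table, index, scans in rows
        if scans == 0 and not index.endswith(skip_suffixes)
    ]


# Made-up rows in the shape the query returns.
sample_rows = [
    ("flight_logs", "flight_logs_pkey", 0),
    ("flight_logs", "idx_flight_logs_created_at", 1520),
    ("flight_logs", "idx_flight_logs_message", 0),
]
candidates = unused_indexes(sample_rows)
print(candidates)
```

A zero scan count only means the index has not been used since statistics were last reset, so candidates are worth reviewing rather than dropping automatically.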
- All schema changes are version controlled and applied to every environment in the same manner.
- Backups of the database are created on a schedule to prevent data loss.
- All destructive operations, such as bulk deletes and cleanup jobs, are recorded and tracked.
This is our initial, ongoing release phase focused on our primary stakeholder.
- At each "signoff" point (e.g., after completing a major feature like "User Login" or "Dashboard"), we deploy the latest build to a private server.
- Our goal is to get formal "Go / No-Go" approval from our stakeholder on a feature-by-feature basis.
- Process:
  - Complete a development milestone.
  - Deploy to our private environment.
  - Conduct a live demo with the stakeholder.
  - Stakeholder provides feedback and signs off.
- This process ensures we are always aligned with the project's business goals.
Once R1 is fully signed off, we deploy R2 to a trusted group for real-world testing.
- This release serves to identify critical bugs, gather initial usability feedback, and test the application on a variety of real devices (e.g., different phone models, browsers) before exposing it to real clients.
- We will release to 12 users:
  - 1 Stakeholder: To observe the test and confirm the flow.
  - 5 Novice Users (Family): To test for intuitive design and ease of use.
  - 6 Power Users (Friends): To test for advanced features and potential edge cases.
- We gave the testers specific tasks (e.g., "Start a claim", "Create an account"). This allowed us to observe them, record issues in real time, and prioritize all feedback in a post-deployment meeting.
- We used Sentry to automatically monitor and uncover backend/frontend errors that users might not have seen or reported.
- All critical and high-priority feedback was fixed before planning R3.
To keep the project on track and meet high quality standards, we have decided to push the User Feedback Phase to Release 3. By moving to the Maze platform, we are replacing small-group observations with a more advanced, data-driven approach that allows for more precise testing and usability metrics.
After all critical and high-priority feedback from R2 is fixed and signed off, we will proceed with the final release. This phase scales our deployment to real, unbiased users to validate our core logic and test for edge cases.
For the final release, we will use the Maze platform to build a user feedback system that collects user data for our evaluation process. This allows us to conduct more extensive evaluations than simple "bug hunting" tests.
Testing Methodology: Users are assigned specific, mission-critical tasks (e.g., "Submit a manual claim"). The system measures user performance with these metrics:
- Success Rate: Percentage of users who completed the task without dropping off.
- Completion Time: How long it takes a user to complete an assigned task.
- Ease of Use Score: After each mission, users provide a rating on a scale of 1-5.
- Knowledge Retention: Using multiple-choice follow-up questions to ensure the UI effectively educates the user.
Continuous Improvement Loop: The Maze results show which UI elements need final touch-ups before the Full Release. Any flow with a high drop-off rate or low "Ease of Use" score is flagged for immediate revision.
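The metrics and flagging rule above can be sketched as a small calculation. This is not Maze's API: the `Attempt` record, the `summarize` helper, the thresholds (`min_success`, `min_ease`), and the sample results are all hypothetical, chosen only to show how success rate, completion time, and ease-of-use scores could be combined once the results are exported.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    task: str
    completed: bool   # False means the user dropped off
    seconds: float    # time spent on the task
    ease: int         # "Ease of Use" rating, 1-5


def summarize(attempts, min_success=0.8, min_ease=3.5):
    """Per-task success rate, mean completion time, mean ease score, and a revision flag."""
    by_task = {}
    for attempt in attempts:
        by_task.setdefault(attempt.task, []).append(attempt)
    report = {}
    for task, items in by_task.items():
        done = [a for a in items if a.completed]
        success = len(done) / len(items)
        avg_ease = mean(a.ease for a in items)
        report[task] = {
            "success_rate": success,
            # Completion time is averaged over completed attempts only.
            "avg_seconds": mean(a.seconds for a in done) if done else None,
            "avg_ease": avg_ease,
            "flagged": success < min_success or avg_ease < min_ease,
        }
    return report


# Illustrative results, not real test data.
results = [
    Attempt("Submit a manual claim", True, 60, 4),
    Attempt("Submit a manual claim", True, 80, 5),
    Attempt("Submit a manual claim", True, 100, 3),
    Attempt("Submit a manual claim", False, 45, 2),
    Attempt("Create an account", True, 30, 5),
    Attempt("Create an account", True, 50, 4),
]
report = summarize(results)
print(report)
```

Keeping the thresholds as parameters makes it easy to agree on what counts as a "high drop-off rate" or a "low" score before the results come in.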
- We will release to 15-20 stakeholder-trusted clients.
- To validate the end-to-end claim submission process with real-world flight data. This phase is critical for testing our business logic and compensation calculations.
- We will use the in-app "Shake" tool for users to easily send bug and crash reports. We will also maintain a direct communication channel with this group for detailed feedback.
- We will recruit 50-100 testers through social media for an open beta. There are plenty of Facebook groups and Reddit airline subreddits that would appreciate the invitation.
- To test the application at a wider scale and, most importantly, to uncover edge cases (e.g., multi-leg flights, different airline data formats, user data entry errors) that our smaller test groups may have missed.
- In addition to "Shake," we will send a Google Form survey to all beta testers to gather broader opinions and ideas for future improvements.
The Updated Plan:
- Consolidated Testing: Based on the professor's guidance, we replaced the two-round approach with one comprehensive, large-scale user testing phase to ensure higher data quality.
- Advanced Tooling (Maze): We moved away from basic "Shake" reports and adopted Maze, a more advanced user testing platform. This allowed us to gather deep behavioral insights (path analysis, success rates, and task completion times) that go beyond simple bug reporting.
- Dual-Feedback Loop:
  - Poster Presentation (April 2nd): We created a simplified Google Form during our poster presentation for instant feedback from attendees.
  - Extended Network: We distributed both surveys to friends, family, and peers to capture a wider range of usability perspectives.
As our application is transactional (users come for a specific purpose), a traditional percentage-based rollout is not necessary. Once the Beta phase is complete and all critical bugs are fixed, we will execute a "Big Bang" release.
- This makes the app publicly available on our domain and in the App/Play Stores. This meets the stakeholder's plan to "go forward" with a general release, and we will continue to monitor Sentry and "Shake" for post-launch issues.
The Updated Plan:
- Platform Focus: We decided to focus our deployment exclusively on the Web Application.
- Mobile Compatibility: Our codebase remains mobile-ready, providing the flexibility to deploy to the App/Play Stores in the future.
- Post-Graduation Potential: Although there is no longer a formal stakeholder, the team is considering maintaining the product after graduation. This release serves as a V1 that keeps the door open for future growth or a potential transition into a standalone project.
| Date | Version | Type | Iteration | Status |
|---|---|---|---|---|
| 02-09-2026 | v2.1.0 | Infrastructure & Security Hardening | Iteration 9 | Completed |
| 02-23-2026 | v2.3.1 | Initial deployment - Release 3 | Iteration 10 | Completed |
| 03-08-2026 | v2.3.1 | End of Iteration Deployment | Iteration 11 | Completed |
| 03-22-2026 | v2.4.0 | End of Iteration Deployment | Iteration 12 | Completed |
| 04-01-2026 | v2.4.5 | Poster Presentation Deployment | Iteration 13 | Completed |
| 04-13-2026 | v3.0.0 | Final Deployment - Release 3 | Iteration 13 | Active |