Assumptions

Acquiring bank

The exercise text states:

It also performs some validation of the card information and then sends the payment details to the appropriate 3rd party organization for processing.
1. I assume that the Acquiring bank is a component of the information system, not the real bank.
2. I assume that the third party is a legal entity that liaises between the Acquiring bank and the Issuing bank, and potentially a Custodian, to complete the payment if it is eligible.
Determination of Banks
In the payment workflow, we should determine the Issuing bank and the Acquiring bank.
1. Issuing bank
  After some research, I learned that the Issuing bank can be determined from the card number.
2. Acquiring bank
  
  The exercise text does not mention a Merchant identifier in the payment request:
  
  A payment request should include appropriate fields such as the card number, expiry month/date, amount, currency, and cvv.
  
  The question is: how can the system forward a Payment Request to the Merchant's Acquiring bank? The exercise text gives no way to determine it. One way to determine the Acquiring bank is to associate it with the Merchant.
This can be done when we onboard a Merchant. The idea is:
- Store the Merchant to Acquiring bank mapping.
- Include the Merchant id in the PaymentRequest, so that the Gateway can route the request to the Merchant's bank.
I added the Merchant's id to the Payment Request.
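
The routing idea above can be sketched as follows. This is only an illustration under stated assumptions, not the code of the solution: `MerchantBankRouter`, `Onboard` and `TryResolveBank` are hypothetical names, and the Acquiring bank is represented by a plain string.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: maps each onboarded Merchant to its Acquiring bank.
public sealed class MerchantBankRouter
{
    private readonly ConcurrentDictionary<Guid, string> _bankByMerchant =
        new ConcurrentDictionary<Guid, string>();

    // Called once, when a Merchant is onboarded.
    public void Onboard(Guid merchantId, string acquiringBankName)
    {
        _bankByMerchant[merchantId] = acquiringBankName;
    }

    // Called for each PaymentRequest; returns false for an unknown Merchant.
    public bool TryResolveBank(Guid merchantId, out string acquiringBankName)
    {
        return _bankByMerchant.TryGetValue(merchantId, out acquiringBankName);
    }
}
```

A request carrying an unknown Merchant id can then be rejected before any bank is contacted.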

Acquiring bank is stubbed in the solution.

Architecture

CQRS

CQRS is chosen for these reasons:
I am asked to develop two features: payment request and payment details retrieval.

They should be scaled differently.
Processing payments is the core function of the Gateway, and the company's revenue probably comes mainly from the number and amount of transactions handled by the platform. Payment request handling should not be disturbed by a flood of payment details queries issued for reporting purposes.

Implementation

Three components:

Write API: Handles payment requests and saves them to the write model as events (events are explained in the EventSourcing section).
Read Projector: Projects the write model to a read model that fits the payment details retrieval requirement.
Read API: Serves payment retrieval queries.

Here we have only the one read model that is asked for. In the real world we would probably have many of them, for performance. For example, since the company's revenue comes from transaction volume, we can imagine a read model that gives a live PnL view (all Merchants consolidated or segregated) as transactions go on.

The Write and Read APIs can scale to multiple instances. If we want to scale the Read Projector, we should ensure that instances consume messages as competing consumers and that events are still processed in the order of their sequence numbers.
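
The ordering constraint on the Read Projector can be illustrated with a small sketch. It is an assumption-laden example, not the projector of the solution: `OrderedProjector` is a hypothetical name, events are reduced to a string payload, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: applies events strictly in sequence-number order,
// buffering any event that arrives ahead of its turn.
public sealed class OrderedProjector
{
    private readonly SortedDictionary<long, string> _pending =
        new SortedDictionary<long, string>();
    private readonly Action<string> _apply;
    private long _nextSequence;

    public OrderedProjector(long firstSequence, Action<string> apply)
    {
        _nextSequence = firstSequence;
        _apply = apply;
    }

    public void Receive(long sequence, string payload)
    {
        // Already projected: ignore the redelivery.
        if (sequence < _nextSequence) return;

        _pending[sequence] = payload;

        // Drain every event that is now contiguous with the last applied one.
        while (_pending.TryGetValue(_nextSequence, out var next))
        {
            _pending.Remove(_nextSequence);
            _apply(next);
            _nextSequence++;
        }
    }
}
```

With competing consumers, events may reach a projector out of order or more than once; buffering by sequence number keeps the read model consistent in both cases.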

! Disclaimer: In the real world, the three components above should be hosted in 3 separate processes, for easy scaling. For the sake of simplicity of the exercise, I have implemented neither external storage (events and read model) nor an external message bus, so it is difficult to separate them into different processes.

Still, you can see the embryonic form of the 3 processes.

EventSourcing

For a Gateway that handles sensitive financial transactions every second, it is critical to have a full audit trail of what has happened.

Event sourcing also helps in building CQRS: since events record all information chronologically, we can always construct diverse read models from them.

Hexagonal

The motivation for the Hexagonal architecture is very general and is not repeated here.

Design

Command handling asynchrony
PaymentRequests are handled asynchronously in order to manage:
- an unreliable network, and unknown bank API availability and latency
- bursts and back pressure: if we handle a PaymentRequest synchronously, the network and potentially long bank response times may make our Gateway suffer from high I/O waiting. The system gets congested, I/O threads may starve, and Kestrel eventually rejects new connections.
When a PaymentRequest arrives, the Gateway immediately creates a Payment resource. Request forwarding and bank response handling are done asynchronously. The Gateway returns HTTP status 202 Accepted along with a resource identifier in the Location header. The Merchant can follow up on the payment by polling the given address.

In the real world, we could consider long polling, Server-Sent Events or webhooks.

To be pragmatic, we could also handle requests in a smarter way: if the Gateway gets a response from the bank within 50 ms, it returns 201 Created with the final Payment status, Accepted or Rejected (by the bank); otherwise it returns 202 Accepted. This needs a bit more thread synchronization.
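
The 201-or-202 decision described above could look like the sketch below. It is not implemented in the solution; `HybridResponse` and `ChooseStatusCodeAsync` are hypothetical names, and the bank call is represented by an arbitrary `Task`.

```csharp
using System;
using System.Threading.Tasks;

public static class HybridResponse
{
    // Returns 201 if the bank call finishes within the grace period, otherwise 202.
    // A faulted bank call also counts as finished; the caller still has to
    // inspect its outcome before building the 201 response body.
    public static async Task<int> ChooseStatusCodeAsync(Task bankCall, TimeSpan gracePeriod)
    {
        var finished = await Task.WhenAny(bankCall, Task.Delay(gracePeriod));
        return finished == bankCall ? 201 : 202;
    }
}
```

For example, `await HybridResponse.ChooseStatusCodeAsync(bankCall, TimeSpan.FromMilliseconds(50))` gives the status code to return, while the bank call itself keeps running in the background when 202 is chosen.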
Link bank response to Gateway space
For the reporting requirement, the Gateway API client queries Payment details using the bank's payment id.

If we handled requests synchronously, we would not need any additional id: the BankPaymentId coming back from the acquiring bank would suffice.

Handling payment requests asynchronously introduces the complexity of storing a link id that ties the initial PaymentRequest to the AcquiringBankResponse.

Once the bank response comes back, the Gateway should find the payment that originated it and store the link.

I called this link id GatewayPaymentId. Besides linking, it is also the internal unique identifier of the Payment. Thus, the stored link is the tuple { AcquiringBankPaymentId, GatewayPaymentId }.
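
A minimal in-memory sketch of such a link store is shown below. `PaymentIdLinks`, `TryLink` and `TryFindGatewayId` are hypothetical names, and both ids are plain `System.Guid` values rather than the domain structs of the solution.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch of the { AcquiringBankPaymentId, GatewayPaymentId } link store.
public sealed class PaymentIdLinks
{
    private readonly ConcurrentDictionary<Guid, Guid> _gatewayIdByBankId =
        new ConcurrentDictionary<Guid, Guid>();

    // Returns false if the bank payment id is already linked,
    // i.e. the bank sent the same id for two different payments.
    public bool TryLink(Guid acquiringBankPaymentId, Guid gatewayPaymentId)
        => _gatewayIdByBankId.TryAdd(acquiringBankPaymentId, gatewayPaymentId);

    // Used when payment details are queried by the bank's payment id.
    public bool TryFindGatewayId(Guid acquiringBankPaymentId, out Guid gatewayPaymentId)
        => _gatewayIdByBankId.TryGetValue(acquiringBankPaymentId, out gatewayPaymentId);
}
```

Keying the store by the bank's id makes the lookup direct and lets a conflicting bank payment id be detected at the moment the link is stored.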
Ids: three types of ids
- Payment request id: Unique identifier of the payment assigned by the merchant. It is part of the payment request payload. Cf. C# struct PaymentRequestId. In the real world, each Merchant will send its own format of unique request identifier, which we should adapt to the Gateway's format. For simplicity of the exercise, I used System.Guid.
- Gateway payment id: Unique identifier of the payment in the Gateway's internal system. Cf. C# struct Domain.GatewayPaymentId.
- Acquiring bank payment id: Unique identifier returned by the acquiring bank. Cf. C# struct Domain.AcquiringBankPaymentId. In the real world, each Acquiring bank will send its own unique identifier, which we should adapt to the Gateway's format. For simplicity of the exercise, I used System.Guid.
Entity:
A Payment represents a financial transaction made with a bank payment card. A Payment can fail or succeed.
Anti corruption:
- Never put HTTP DTOs or external libraries into the Domain, and never expose domain types over HTTP.
- Always adapt from one world to the other.
Event structure: flat
No embedded types, to ease event versioning.
Simulate I/O: avoid blocking thread pool threads while waiting for I/O.
Anti Corruption
Never leak external libraries (the acquiring banks' ones) into the Domain Entity / Aggregate; map instead.
Storage
For the sake of simplicity of the exercise, I used in-memory implementations for:
- Write models storage: Event Store
- Message bus
- Read models storage
In the real world, we should of course use external storage and an external message bus, for cluster configurations.

For storing events, we may use EventStore (native events), Azure Blob Storage (which needs some code to serve as an event store), or something else.

For the message bus: RabbitMQ, Azure Service Bus, ...

For read models: choose a suitable SQL or NoSQL storage.

API

If you use Restlet Client, you can import payment-gateway-apis.json (in the root folder), to view all APIs with examples. Otherwise, please use provided swagger.

Public API

Request a payment:

POST api/Payments
Endpoint to send a payment request.

Request example:

How to get an onboarded Merchant id? Cf. Private API

{
   "requestId": "ccd8af8e-5a27-40dc-93c5-f19e78984391",
   "merchantId": "2d0ae468-7ac9-48f4-be3f-73628de3600e",
   "card":{
       "number": "4524 4587 5698 1200",
       "Expiry": "05/19",
       "Cvv": "321"
   },
   "amount": {
       "currency": "EUR",
       "value": 42.66
   }
}

Response example:

202 Accepted

{
  "gatewayPaymentId": "41b49021-98a2-41cf-80dc-6f87382322f8",
  "acquiringBankPaymentId": null,
  "status": "Pending",
  "requestId": "ccd8af8e-5a27-40dc-93c5-f19e78984391",
  "approved": null
}

with the Location header.

400 Bad Request with the invalidity details, if the request is invalid

{
  "type": "https://tools.ietf.org/html/rfc7231#section-6.5.1",
  "title": "Invalid request",
  "status": 400,
  "detail": "Invalid card CVV"
}

Get payment and payment details:

GET api/Payments/{gateWayPaymentId}
Endpoint to retrieve the payment status. The write controller redirects to this controller (why? Cf. Command handling asynchrony).

Response example:

approved is the boolean indicating whether the payment is accepted by the bank.

status gives the reason/description behind the approved boolean. It can be Success, RejectedByBank, Timeout, ReceivedDuplicatedBankPaymentIdFailure or BankUnavailable.

5 possible statuses

a) Success:

{
  "gatewayPaymentId": "f72d3230-d08c-409f-a03b-c2872b7f762f",
  "acquiringBankPaymentId": "b49739f0-c193-49de-967f-fdbb1d8f7218",
  "status": "Success",
  "requestId": "bab81817-8f09-4c32-b1e0-e76b40039ec1",
  "approved": true
}

acquiringBankPaymentId should be used for further querying of payment details. See below.

b) Rejected:

{
  "gatewayPaymentId": "41b49021-98a2-41cf-80dc-6f87382322f8",
  "acquiringBankPaymentId": "0bfa5d5b-8742-459f-94c9-484d61ad6093",
  "status": "RejectedByBank",
  "requestId": "ccd8af8e-5a27-40dc-93c5-f19e78984391",
  "approved": false
}

c) Timeout: obsolete, since the circuit breaker buffers timed-out requests.

{
   "gatewayPaymentId": "68e56457-d7b9-4c88-9f42-1075a8d18d13",
   "acquiringBankPaymentId": null,
   "status": "Timeout",
   "requestId": "8e8bcc4a-3fe7-4834-8257-eb8aaa948af3",
   "approved": false
}

In production code, the bank latency is random between 0 and 4 sec, and the timeout is set to 2 sec.

d) ReceivedDuplicatedBankPaymentIdFailure

   {
      "gatewayPaymentId": "67fda8d9-008e-4751-9f14-7e41a464a3e8",
      "acquiringBankPaymentId": null,
      "status": "ReceivedDuplicatedBankPaymentIdFailure",
      "requestId": "79ed0d59-1833-4c6f-80f0-840d91dc9734",
      "approved": false
   }

e) Unable to connect to bank

 {
    "gatewayPaymentId": "a05d1562-1d6e-4881-932f-bd73037e5e5a",
    "acquiringBankPaymentId": null,
    "status": "BankUnavailable",
    "requestId": "841054ad-1bf6-4e19-886f-e4d7b10df766",
    "approved": false
 }

GET api/PaymentsDetails/{acquiringBankPaymentId}
Endpoint to retrieve payment details

Response example:

{
  "status": "RejectedByBank",
  "acquiringBankPaymentId": "0bfa5d5b-8742-459f-94c9-484d61ad6093",
  "card":{
  "number": "4524 XXXX XXXX XXXX",
  "expiry": "05/19",
  "cvv": "321"
  },
  "approved": false
}
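
The card number in the response above is masked, with only the first four digits left readable. A masking function producing that shape could be sketched as follows; `CardMasker` is a hypothetical name and this is not necessarily how the solution does it.

```csharp
using System.Text;

public static class CardMasker
{
    // Keeps the first four digits and replaces every following digit with 'X'.
    // Non-digit characters such as spaces are kept as they are.
    public static string Mask(string cardNumber)
    {
        var masked = new StringBuilder(cardNumber.Length);
        var digitsSeen = 0;
        foreach (var c in cardNumber)
        {
            if (char.IsDigit(c))
            {
                digitsSeen++;
                masked.Append(digitsSeen <= 4 ? c : 'X');
            }
            else
            {
                masked.Append(c);
            }
        }
        return masked.ToString();
    }
}
```

`CardMasker.Mask("4524 4587 5698 1200")` returns `"4524 XXXX XXXX XXXX"`.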

Switch out for a real bank
A specific adapter, e.g. MyBankAdapter, should be implemented for the domain port PaymentGateway.Domain.IAdaptToBank.

Private API

For the code reviewer's convenience, some private endpoints are exposed. They are:

GET api/Merchants
Returns all merchants. The merchant id will be useful when you construct your PaymentRequest.

Response example:

[
   {
   "id": "2d0ae468-7ac9-48f4-be3f-73628de3600e",
   "name": "Amazon"
   },
   {
   "id": "06c6116f-1d4e-44d3-ae9f-8df90f991a52",
   "name": "Apple"
   },
   {
   "id": "8d443f3b-55a3-4931-ba4a-3fa771bb1066",
   "name": "FailFromThe2ndPaymentMerchant"
   }
]

FailFromThe2ndPaymentMerchant's id can be used to test a faulty bank that always sends back the same bank payment id.

GET api/AcquiringBankPaymentsIds
Returns all Acquiring banks' payment ids.

Response example:

[
  "593b4d51-8e5c-4ecc-a2b8-1946c9048275",
  "027f704c-531d-4bd7-bfda-09817926db49"
]

GET api/GatewayPaymentsIds

Returns all Gateway payment ids.

Response example:

[
  "b541bda3-a0da-46f4-b51a-5d3673c0fd93",
  "63bcff37-364a-4f83-904a-2b9a339d2e4f"
]

Configuration

Retries and timeouts
When we call the bank API, a timeout can happen. The system tries three times (with an increasing wait time before each retry). If it still fails in the end, we consider that the payment has definitively timed out. For better demo effect, I configured:
```
"AppSettings": {
     "TimeoutInMilliseconds": 2000,
     "MaxBankLatencyInMilliseconds": 4000  
 }
```
The very specific behavior "during retries, if it times out once, it times out always" is purely for a better demo experience, i.e. to see a timeout without submitting a lot of payment requests. Cf. RandomDelayProvider.cs
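
The retry policy described in this section (a limited number of attempts, with a wait that grows before each retry) can be sketched as below. The helper name `Retry.WithIncreasingDelayAsync` and the linear growth of the delay are assumptions made for illustration; the solution may use a different schedule or a library.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Runs the operation up to maxAttempts times, waiting longer after each timeout.
    // The TimeoutException of the last attempt is propagated to the caller.
    public static async Task<T> WithIncreasingDelayAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan baseDelay)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Wait baseDelay, then 2 x baseDelay, ... before the next attempt.
                var wait = TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * attempt);
                await Task.Delay(wait);
            }
        }
    }
}
```

With `maxAttempts` set to 3, a call that times out three times in a row ends with a `TimeoutException`, which the caller can translate into the Timeout status.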
Call bank API synchronously or asynchronously
For production code, use API; for testing, use Tests:
```
"AppSettings": {
     "Executor": "API"
 }
```

SLA

A PaymentRequestId will be handled once and only once.

Tradeoff

When an identical PaymentRequest is submitted more than once, we have two options:
1. Idempotency: remind the API client that the payment has already been created and is available at its location.
2. Reject the duplicated PaymentRequest.
I chose the 2nd.

Performance

For a Payment Gateway, what is important:

High availability
Throughput
Low latency
Scalability

Resilience

1. The async command handling

It better resists burst situations, as explained in Command handling asynchrony.

2. Retry and Circuit breaker

When the bank API is unavailable or unusually slow, there is no reason to keep consuming I/O resources and create internal back pressure on the Gateway. After several retries, the Gateway should open the circuit and buffer the temporarily failed PaymentRequests in a queue.

We should also have a dedicated thread to reprocess these pending requests.
What is naively implemented now is: when a following request succeeds in contacting the bank API, the circuit is closed.

Edge case: if, after the circuit breaker has closed, a dequeued PaymentRequest fails again for whatever reason, it is enqueued to the buffer again.

Circuit breaking concerns a specific bank, so the implementation uses one circuit breaker per AcquiringBank.
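
The "one circuit breaker per bank" idea can be sketched as follows. This is a simplified illustration, not the solution's implementation: `CircuitBreaker`, `CircuitBreakers` and the failure threshold are hypothetical, banks are identified by a string, and the buffering queue is left out.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: opens after a number of consecutive failures and,
// like the naive behavior described above, closes on the next success.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly object _gate = new object();
    private int _consecutiveFailures;

    public CircuitBreaker(int failureThreshold)
    {
        _failureThreshold = failureThreshold;
    }

    public bool IsOpen
    {
        get { lock (_gate) { return _consecutiveFailures >= _failureThreshold; } }
    }

    public void RecordFailure() { lock (_gate) { _consecutiveFailures++; } }

    public void RecordSuccess() { lock (_gate) { _consecutiveFailures = 0; } }
}

// One breaker per Acquiring bank, created lazily on first use.
public sealed class CircuitBreakers
{
    private readonly ConcurrentDictionary<string, CircuitBreaker> _byBank =
        new ConcurrentDictionary<string, CircuitBreaker>();
    private readonly int _failureThreshold;

    public CircuitBreakers(int failureThreshold)
    {
        _failureThreshold = failureThreshold;
    }

    public CircuitBreaker For(string acquiringBank)
        => _byBank.GetOrAdd(acquiringBank, _ => new CircuitBreaker(_failureThreshold));
}
```

Because each bank has its own breaker, an outage at one bank does not stop payments routed to the others.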

An arbitrary decision is that we do not buffer payment requests if we cannot connect to the bank (after retries).

We could also expose a /health endpoint to merchants. The health would depend on the Gateway to bank connection status, and we would ask merchants to stop calling us while the bank API is unavailable. Alternatively, we could buffer the payment requests that cannot be sent out. It is a question of tradeoff.

Tests

I have done in the solution:

throughput tests
latency tests
tests of how the application copes under burst conditions
1. a large number of clients launched in parallel requesting payments
2. a large number of clients and a large number of payment details, with parallel queries on combinations of the two

In the real world, the tests above need to fit realistic production scenarios.

When IGenerateBankPaymentId is configured as NoDelay, the performance figures are in Performance.xlsx.

For read payments, 93100 parallel requests seem to be the limit of the system. We can configure a proper maximum of parallel connections in Kestrel.

Nevertheless, under burst situations:

The API does not crash
When some clients are disconnected because their connections are rejected, other read/write operations continue to work well

 .ConfigureKestrel((context, options) =>
   {
      options.Limits.MaxConcurrentConnections = 10_000;
      options.Limits.MaxConcurrentUpgradedConnections = 1000;
   })

For performance, the handling of all incoming requests is offloaded to thread pool threads.
To resist bursts, we can add a request timeout to the Kestrel configuration. We can also scale the server instances using a Kubernetes or Swarm cluster, which helps to achieve high availability.

Memory consumption is due to the in-memory cache in my system. In the real world, specific caching might be considered when the absence of caching causes unacceptable latency. Consider caching only when necessary, because it introduces two problems: 1) synchronization; 2) a large memory footprint triggers GC, causing latency overhead.

To run performance tests:

Go to the API csproj folder
Run:
```
dotnet publish -c Release -r win10-x64
```
Run the tests in PaymentGateway.Write.PerformanceTests and PaymentGateway.Read.PerformanceTests

Further: with more time, I would also do the following:

Endurance / Soak testing
Test individual components: currently my Read Projector is not performance tested
Test components hosted in cluster
Integrate performance testing to CI
Monitor production systems: metrics and perf indicators should be monitored

Unit and Acceptance tests

The code is entirely test driven.

Coverage

Excluding the performance tests assembly, code coverage is 83.05% (report in PaymentGateway.coveragexml in the root folder).

Code not covered:

API bootstrap
Some infrastructure code borrowed from Greg Young's git repository
Some randomness generation used only in production (acceptance tests use deterministic behavior)
Properties in the acquiring bank stubs, which are there just to show the design
Guid ids generator

Edge cases

The bank sends a payment id that conflicts with a previously received one. Handling this is not required, as per:

We should assume that a bank response returns a unique identifier

But I still implemented and tested it. In this situation, we should consider that the payment is in an unknown state. There are at least two possibilities:
- The bank accepted the second, conflicting payment, but sent a PaymentId already used.
- The bank never processed the second payment request and instead just resent the payment status of the very first one.
This would be a production incident and hence should be investigated.

Prerequisite for building the solution in Visual Studio

Ensure that you have .NET Core 2.2 SDK installed.

For compatibility with Visual Studio 2017 (which I am actually using), please use https://dotnet.microsoft.com/download/thank-you/dotnet-sdk-2.2.107-windows-x64-installer

Improvements

Here are some improvements that should definitely be made:

I use the Merchant id to determine its Acquiring bank (cf. Assumptions), and it is part of the Payment Request payload. This is not safe. More generally, the exchange of messages between the Gateway and a Merchant is not protected by authentication.

In the real world, we should authenticate so that the Gateway knows which Merchant it is talking to. This can be achieved as follows:
1. When we onboard a Merchant, we distribute a secret to the Merchant in a very safe manner.
2. In all exchanges between the Merchant and the Gateway, the secret key should be included in the HTTP 'Authorization' header.
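
A very small sketch of the secret check is given below, assuming the raw secret is sent as the whole value of the Authorization header. `MerchantAuthenticator` is a hypothetical name, and keeping raw secrets in memory is a simplification; a real system would rather store a hash or use a standard scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public sealed class MerchantAuthenticator
{
    private readonly Dictionary<Guid, byte[]> _secretByMerchant =
        new Dictionary<Guid, byte[]>();

    // Called at onboarding, when the secret is distributed to the Merchant.
    public void Register(Guid merchantId, string secret)
        => _secretByMerchant[merchantId] = Encoding.UTF8.GetBytes(secret);

    // Returns the Merchant whose secret matches the header value, or null.
    public Guid? Authenticate(string authorizationHeaderValue)
    {
        if (string.IsNullOrEmpty(authorizationHeaderValue)) return null;

        var presented = Encoding.UTF8.GetBytes(authorizationHeaderValue);
        foreach (var entry in _secretByMerchant)
        {
            // Constant-time comparison avoids leaking the secret through timing.
            if (CryptographicOperations.FixedTimeEquals(presented, entry.Value))
                return entry.Key;
        }
        return null;
    }
}
```

The Merchant id returned by `Authenticate` can then replace the merchantId sent in the payload, so that a client cannot submit payments on behalf of another Merchant.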
All simulated async I/O should add timeout cancellation.

Go further

Retrieving a payment’s details API
The exercise text states a basic requirement:

The second requirement for the payment gateway is to allow a merchant to retrieve details of a previously made payment using its identifier.

In real world, we may consider adding:
1. Query for a time window
2. Query pagination (consider whether unbounded queries are allowed; deal with manageable chunks)
3. Other filters
To support queries for a time window, I would add a payment timestamp to both my events and read models.
Require PaymentRequest Smart Batching

The motivations are:
- A merchant, say Amazon, may receive 50,000 payment requests per second from shoppers. Batching 5000 requests is an option, because a shopper doesn't care about 1 s of delay.
- Our Gateway will consume fewer resources, which improves performance.
A combination of time window and number of requests can be used to size the Smart Batching.
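
Sizing a batch by both a time window and a number of requests could be sketched like this. `SmartBatcher` is a hypothetical, non-thread-safe illustration; the current time is passed in explicitly so the time window can be tested, and a real implementation would also need a timer to flush a batch when no further request arrives.

```csharp
using System;
using System.Collections.Generic;

public sealed class SmartBatcher<T>
{
    private readonly int _maxSize;
    private readonly TimeSpan _maxAge;
    private readonly Action<IReadOnlyList<T>> _send;
    private readonly List<T> _items = new List<T>();
    private DateTime _openedAt;

    public SmartBatcher(int maxSize, TimeSpan maxAge, Action<IReadOnlyList<T>> send)
    {
        _maxSize = maxSize;
        _maxAge = maxAge;
        _send = send;
    }

    // Adds an item and sends the batch when it is full or older than the time window.
    public void Add(T item, DateTime now)
    {
        if (_items.Count == 0) _openedAt = now;
        _items.Add(item);
        if (_items.Count >= _maxSize || now - _openedAt >= _maxAge) Flush();
    }

    // Sends whatever has been collected so far; does nothing for an empty batch.
    public void Flush()
    {
        if (_items.Count == 0) return;
        var batch = _items.ToArray();
        _items.Clear();
        _send(batch);
    }
}
```

With the figures mentioned above, a Merchant could use a maximum size of 5000 and a window of 1 s, so that a batch leaves as soon as either limit is reached.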

Open source used:

The event sourcing infrastructure is borrowed from Greg Young's git repository
