# `Bixi-Api`

`Bixi-Api` is a REST API that gives users access to historic Bixi data, which we collected and processed for easy analysis. This notebook documents the design decisions behind this API.

## Functional requirements

- Calculate and return average Bixi trip duration for a specified time frame.

## Non-functional requirements

None were formally specified. For this exploratory development, we assume the following minimums:
- Performance: Aggregations over long periods should complete within 5 seconds.
- Maintainability: Adhere to RESTful API conventions and provide documentation.
- Error Handling: Offer clear error messages for invalid inputs or system errors.
- Logging and Monitoring: Record API usage and errors; enable live monitoring.

## Architecture: `REST API`

Our frontend application needs to query Bixi data. `REST APIs` are a common choice for connecting frontends to databases because they are straightforward to implement, secure, and easy to maintain, and they are supported by a wide range of frameworks and an active community.

`GraphQL` is an alternative API architecture specifically designed for querying data. According to an [AWS post](https://aws.amazon.com/compare/the-difference-between-graphql-and-rest/), `GraphQL` is preferable when bandwidth is limited, when multiple data sources are involved, and when client requests vary widely. However, our application is relatively simple, with a gradually expanding set of defined data queries, which makes `REST` the preferable choice. This decision also keeps us consistent with our existing development practices, since our predecessors used `REST`.

## Language: `Python`

We selected `Python` as our primary language, largely due to our team's familiarity with it. Python is a mature, community-supported language that integrates seamlessly into our AWS ecosystem. Key drawbacks include the Global Interpreter Lock (GIL) and lower performance compared to some other languages. Nevertheless, for a small-scale API running in an AWS Lambda function, ease of use and familiarity were the decisive factors.

We also considered `TypeScript` and `Go` but ultimately chose not to use them, mainly because of our greater familiarity and comfort with Python.

## Framework: `FastAPI`

We opted for `FastAPI` due to its popularity, ease of setup and usage, and extensive support. It automatically provides API documentation (Swagger) without needing extra code. FastAPI simplifies defining endpoint parameter constraints and error handling. It fits well into our serverless architecture using the `Mangum` wrapper.

The alternatives we considered were using no framework (just Lambda functions) and `Flask`. The no-framework approach posed challenges in routing and error management. While `Flask` is effective, we chose `FastAPI` for its out-of-the-box Swagger integration.

## Endpoints

![routes](./img/routes.png)

The main route for UC006 is `/trips/duration/average`. The parameters `min_start_time` and `max_start_time` must be given in milliseconds, consistent with how the data is formatted. The route returns the average trip duration and the trip count. If the request is malformed, it returns a `400 error` with details about the problem.
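The parameter checks for the average-duration route can be sketched independently of the framework. The example below is a minimal illustration, not the project's actual code: the names `ApiError` and `validate_time_range` are hypothetical, treating the values as non-negative integers is an assumption, and the one-year limit is assumed to mean 365 days.

```python
from dataclasses import dataclass
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000
# Assumption: the "one year" limit is interpreted as 365 days.
MAX_RANGE_MS = 365 * MS_PER_DAY


@dataclass
class ApiError:
    """Hypothetical error payload: an HTTP status code plus a readable detail."""
    status_code: int
    detail: str


def validate_time_range(min_start_time: int, max_start_time: int) -> Optional[ApiError]:
    """Return an ApiError for the first problem found, or None if the range is valid."""
    for name, value in (("min_start_time", min_start_time),
                        ("max_start_time", max_start_time)):
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            return ApiError(400, f"{name} must be a non-negative integer (milliseconds)")
    if min_start_time >= max_start_time:
        return ApiError(400, "min_start_time must be earlier than max_start_time")
    if max_start_time - min_start_time > MAX_RANGE_MS:
        return ApiError(400, "the requested time frame must not exceed one year")
    return None
```

A controller could call `validate_time_range` before touching the database and turn any returned `ApiError` into the HTTP response.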

The other `/trips` routes provide information on the date range available for querying. Each returns the corresponding trip, which adds context to the data.

The `/stations` routes provide information about station locations.

The `/health` route is a standard health check.

The `/docs` route serves the Swagger documentation, from which requests can be sent to the live API.

The API returns a `404 error` for requests to any route not listed above and a `500 error` if an unexpected internal error occurs.

## Backend patterns

We used the MVC and Repository patterns. In FastAPI, the data presentation (View) is taken care of automatically. We have a controller that manages the app logic, models that format the data, and repositories that deal with the database queries.
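As a rough illustration of how these pieces relate, the sketch below separates a model, a repository interface, and a controller. All class and method names are hypothetical, and the in-memory repository stands in for the real database layer, which this document does not describe. Treating the time range as half-open (start inclusive, end exclusive) is also an assumption.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DurationAggregate:
    """Model: the shape of the data returned to the caller."""
    average_duration: float
    trip_count: int


class TripRepository(Protocol):
    """Repository interface: hides how trips are stored and queried."""

    def duration_sum_and_count(self, min_start: int, max_start: int) -> tuple[int, int]:
        ...


class InMemoryTripRepository:
    """Stand-in repository for illustration; a real one would query the database."""

    def __init__(self, trips: list[tuple[int, int]]) -> None:
        self._trips = trips  # each item is (start_time_ms, duration)

    def duration_sum_and_count(self, min_start: int, max_start: int) -> tuple[int, int]:
        # Assumption: the range is half-open, [min_start, max_start).
        durations = [d for start, d in self._trips if min_start <= start < max_start]
        return sum(durations), len(durations)


class TripController:
    """Controller: application logic, independent of the storage details."""

    def __init__(self, repository: TripRepository) -> None:
        self._repository = repository

    def average_duration(self, min_start: int, max_start: int) -> DurationAggregate:
        total, count = self._repository.duration_sum_and_count(min_start, max_start)
        average = total / count if count else 0.0
        return DurationAggregate(average_duration=average, trip_count=count)
```

Because the controller depends only on the `TripRepository` interface, the storage backend can be swapped or faked in tests without changing the application logic.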

## Average duration calculation and addressing performance

We improved the performance of the average calculation through `range limiting`, `parallelization`, and `caching`. We restricted the queryable time frame to `one year`. By default, we divide the requested time frame into `one-day` segments. For each segment, we query the database in parallel for the aggregated duration and trip count. Once all results are in, we compute the total trip count and the overall average. If this is the first time the query has been made, we cache the result before returning it to the caller.

![average](./img/average.png)
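The segment, query, and combine flow described above can be sketched as follows. This is a simplified illustration under several assumptions: `query_segment` is a hypothetical callable standing in for the database query, threads are used for parallelism, and the cache is a plain dictionary keyed on the full requested range. Note that each segment contributes a duration sum and a trip count rather than a per-segment average, so the overall average stays correct when segments hold different numbers of trips.

```python
from concurrent.futures import ThreadPoolExecutor

MS_PER_DAY = 24 * 60 * 60 * 1000


def split_into_segments(min_start, max_start, segment_ms=MS_PER_DAY):
    """Split [min_start, max_start) into consecutive segments of at most segment_ms."""
    segments = []
    start = min_start
    while start < max_start:
        end = min(start + segment_ms, max_start)
        segments.append((start, end))
        start = end
    return segments


class AverageDurationService:
    def __init__(self, query_segment, max_workers=16):
        # query_segment(start, end) -> (duration_sum, trip_count); hypothetical callable.
        self._query_segment = query_segment
        self._max_workers = max_workers
        self._cache = {}  # assumption: cache keyed on the full requested range

    def average(self, min_start, max_start):
        key = (min_start, max_start)
        if key in self._cache:
            return self._cache[key]

        segments = split_into_segments(min_start, max_start)
        # Query all segments in parallel; map preserves the segment order.
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            partials = list(pool.map(lambda seg: self._query_segment(*seg), segments))

        total_duration = sum(duration for duration, _ in partials)
        trip_count = sum(count for _, count in partials)
        result = {
            "average_duration": total_duration / trip_count if trip_count else None,
            "trip_count": trip_count,
        }
        self._cache[key] = result  # first time this range is seen: store before returning
        return result
```

A second call with the same range returns the cached result without querying any segment again.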
