ImDoubD/Stimuler-Task

Database Used

I have used PostgreSQL to complete this task for now, but the ideal database for this use case would be Apache Cassandra. Cassandra is commonly used for handling very large datasets. While PostgreSQL provides vertical scaling and indexing for optimized queries, Cassandra scales both vertically and horizontally. Cassandra is built for distributed systems, and in this task I have used batched processing so that writes hit the database per batch rather than per request, a pattern Cassandra would be ideal for. Cassandra spreads data across multiple nodes, allowing easy scaling without compromising performance. Since it can handle high write loads with minimal latency, utterances generated by millions of users, each producing multiple error data points, can be recorded quickly. Its peer-to-peer architecture also ensures fast data access and maintains low latency, which is ideal for real-time responses.

P.S.: I have used PostgreSQL for this task as I have not yet worked with Cassandra.

Database Schema

I have used 5 tables:

  • Users table: user_id, email, name, created_at. Records user details.
  • Conversations table: conversation_id, user_id (foreign key to the users table), started_at. Records a user's whole conversation/chat and is related to the utterances and errors tables.
  • Utterances table: utterance_id, user_id (foreign key), conversation_id (foreign key), content, timestamp. Records the user's individual messages within a particular conversation.
  • Errors table: user_id, conversation_id, utterance_id, error_category, error_subcategory, timestamp. Records the user's error points in an utterance.
    [*Currently, no API in the code uses this table; it was designed with the intention of recording utterance analysis from an LLM or AI model and is kept for future use.]
  • Error Frequency table: user_id (foreign key), error_category, error_subcategory, frequency. Stores the frequency of the user's error points; it is associatively related to the users table.

Since the task calls for low-latency requests, a structure that stores the result data itself was needed; the error frequency table serves as that structure, so the required response can be generated without any complex queries (see the schema sketch below).
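As a rough illustration, here is a minimal SQLAlchemy sketch of the users and error_frequency tables that the exercise flow reads and writes. The class names, column defaults, and the surrogate primary key on error_frequency are assumptions and may differ from the actual models in the repository.

```python
# Illustrative sketch only: assumes SQLAlchemy declarative models and UUID keys.
import uuid
from sqlalchemy import Column, String, Integer, DateTime, ForeignKey, func
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    user_id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    email = Column(String, unique=True, nullable=False)
    name = Column(String, nullable=False)
    created_at = Column(DateTime, server_default=func.now())

class ErrorFrequency(Base):
    __tablename__ = "error_frequency"
    id = Column(Integer, primary_key=True, autoincrement=True)  # assumed surrogate key
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.user_id"), index=True)
    error_category = Column(String, nullable=False)
    error_subcategory = Column(String, nullable=False)
    frequency = Column(Integer, default=0)
```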

Code design

  • I used a Redis cache layer to precompute the frequencies of the error points; after precomputation the updates are grouped into batches, so writes to the database are not done on every request but in bulk per batch (a sketch of the three functions involved follows this list).

  • When an error occurs, the update_error_frequency function first checks Redis for an existing frequency count (redis_cache.get(key)), avoiding a direct database query unless the cache is empty. This ensures sub-millisecond latency in most cases, keeping the API response time low.

  • Frequent writes to a database can introduce high latency and increase load, especially when millions of users are sending requests. To prevent this, the code accumulates changes in Redis for a short interval (10 minutes, defined by batch_interval).

  • The queue_batch_update function increments error frequencies in Redis and queues them for batch processing. It does not update the database immediately but stores the frequency count in Redis for later aggregation. This reduces the number of database writes, improving performance and scalability.

  • The process_batch_update function processes the accumulated error data periodically by reading from Redis and updating the database in one go. This minimizes the number of database transactions, further reducing load and improving performance under high traffic.
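Below is a minimal sketch of how these three functions could fit together, assuming a redis-py client. The key format, the batch set name, and the db_lookup/upsert_error_frequency helpers are illustrative assumptions, not the repository's exact code; the point is the cache-then-batch pattern described above.

```python
# Sketch of the cache-then-batch flow, assuming a redis-py client.
# Key names, helpers, and the db_session object are illustrative.
import redis

redis_cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
BATCH_SET = "pending_error_keys"   # keys waiting for the next bulk write (assumed name)
batch_interval = 600               # 10 minutes; used by the scheduler that calls process_batch_update

def update_error_frequency(user_id: str, category: str, subcategory: str) -> int:
    """Return the current frequency, preferring Redis over the database."""
    key = f"errfreq:{user_id}:{category}:{subcategory}"
    cached = redis_cache.get(key)
    if cached is None:
        # Cache miss: fall back to the database once, then seed the cache.
        row = db_lookup(user_id, category, subcategory)   # hypothetical helper
        redis_cache.set(key, row.frequency if row else 0)
    return queue_batch_update(key)

def queue_batch_update(key: str) -> int:
    """Increment the counter in Redis and mark the key for the next batch."""
    new_value = redis_cache.incr(key)
    redis_cache.sadd(BATCH_SET, key)
    return new_value

def process_batch_update(db_session) -> None:
    """Flush all pending counters to PostgreSQL in one pass, run every batch_interval."""
    for key in redis_cache.smembers(BATCH_SET):
        _, user_id, category, subcategory = key.split(":")
        frequency = int(redis_cache.get(key) or 0)
        upsert_error_frequency(db_session, user_id, category,
                               subcategory, frequency)     # hypothetical helper
    db_session.commit()
    redis_cache.delete(BATCH_SET)   # counters stay cached; only the pending set is cleared
```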

Another endpoint, /simulate-and-generate, stores the resulting error frequencies by using these three methods (update_error_frequency, queue_batch_update, process_batch_update); the /generate-exercise endpoint then reads the result from the error_frequency table.
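As a rough illustration of the wiring (again, not the repository's exact code), the two endpoints could reuse the functions sketched above like this; fetch_error_frequencies is a hypothetical helper that reads the error_frequency table:

```python
# Hypothetical FastAPI wiring for the two endpoints, reusing the functions above.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ErrorPoint(BaseModel):
    errorCategory: str
    errorSubCategory: str

class SimulateRequest(BaseModel):
    user_id: str
    conversation_id: str
    utterance_id: str
    errors: list[ErrorPoint]

@app.post("/simulate-and-generate")
def simulate_and_generate(payload: SimulateRequest):
    # Each error point only touches Redis here; PostgreSQL is written in batches later.
    for err in payload.errors:
        update_error_frequency(payload.user_id, err.errorCategory, err.errorSubCategory)
    return {"status": "queued"}   # illustrative response shape

@app.get("/generate-exercise")
def generate_exercise(user_id: str):
    # Reads the precomputed counts from the error_frequency table, with no complex queries.
    return fetch_error_frequencies(user_id)   # hypothetical helper
```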

Requirements

  • fastapi
  • uvicorn
  • sqlalchemy
  • psycopg-binary
  • pydantic
  • python-dotenv
  • redis

How to run the application

  • Clone the GitHub repository
  • Pip install all the requirements listed above in a virtual environment
  • Use the command uvicorn main:app --reload to start the server
  • Hit the APIs in this order: POST http://127.0.0.1:8000/users/ : to create a user.
    sample input: { "username": "anant", "email": "anant.doe@example.com" }
  • Then POST http://127.0.0.1:8000/conversations/ : to create a conversation.
    sample input: { "user_id":"01485c6d-e5a9-4050-903e-bafb49b19014" }
  • Then POST http://127.0.0.1:8000/utterances/ : to create an utterance.
    sample input: { "conversation_id": "f0c8623a-f992-4421-a373-bd151d83ef9c", "user_id": "01485c6d-e5a9-4050-903e-bafb49b19014", "content": "it is wrong not a sentence but is it." }
  • Then POST http://127.0.0.1:8000/simulate-and-generate : to process the error points.
    sample input: { "user_id": "01485c6d-e5a9-4050-903e-bafb49b19014", "conversation_id": "f0c8623a-f992-4421-a373-bd151d83ef9c", "utterance_id": "82073ac7-1582-4fc1-ac68-7367f7eb2238", "errors": [ { "errorCategory": "Grammar", "errorSubCategory": "Subject-Verb Agreement" }, { "errorCategory": "Grammar", "errorSubCategory": "Pronouns" }, { "errorCategory": "Vocabulary", "errorSubCategory": "Word Choice" } ] }
  • Finally, hit the GET endpoint http://127.0.0.1:8000/generate-exercise with user_id as a query parameter (a scripted walkthrough of this sequence follows the list).
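The same sequence can be scripted; the snippet below is a rough sketch assuming the requests library (not listed in the requirements) and a locally running server. The response field names (user_id, conversation_id, utterance_id) are assumptions about the API's JSON responses.

```python
# Illustrative walkthrough of the API sequence; response field names are assumed.
import requests

BASE = "http://127.0.0.1:8000"

user = requests.post(f"{BASE}/users/", json={
    "username": "anant", "email": "anant.doe@example.com"}).json()
conv = requests.post(f"{BASE}/conversations/", json={
    "user_id": user["user_id"]}).json()
utt = requests.post(f"{BASE}/utterances/", json={
    "conversation_id": conv["conversation_id"],
    "user_id": user["user_id"],
    "content": "it is wrong not a sentence but is it."}).json()

requests.post(f"{BASE}/simulate-and-generate", json={
    "user_id": user["user_id"],
    "conversation_id": conv["conversation_id"],
    "utterance_id": utt["utterance_id"],
    "errors": [
        {"errorCategory": "Grammar", "errorSubCategory": "Subject-Verb Agreement"},
        {"errorCategory": "Grammar", "errorSubCategory": "Pronouns"},
        {"errorCategory": "Vocabulary", "errorSubCategory": "Word Choice"},
    ],
})

exercise = requests.get(f"{BASE}/generate-exercise", params={"user_id": user["user_id"]})
print(exercise.json())
```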
