alokrkmv/Eluvio_coding_challange_Option_2


Problem Statement

Imagine you have a program that needs to look up information about items using their item ID, often in large batches.

Unfortunately, the only API available for returning this data takes one item at a time, which means you will have to perform one query per item. Additionally, the API is limited to five simultaneous requests. Any additional requests will be served with HTTP 429 (too many requests).

Write a client utility for your program to use that will retrieve the information for all given IDs as quickly as possible without triggering the simultaneous requests limit, and without performing unnecessary queries for item IDs that have already been seen.

API Usage:

GET https://challenges.qluv.io/items/:id

Required headers:

Authorization: Base64(:id)

Example:

curl https://challenges.qluv.io/items/cRF2dvDZQsmu37WGgK6MTcL7XjH -H "Authorization: Y1JGMmR2RFpRc211MzdXR2dLNk1UY0w3WGpI"


My Approach

To solve this problem, I broke it down into three major chunks and then tackled them one by one:

  1. The most basic part of the problem statement requires performing a GET request and parsing the response. I wrote a wrapper function to perform GET requests using Go's standard HTTP package. The function also contains a handler capable of dealing with various kinds of errors such as failed requests, too many requests (429), and an invalid response body. If everything goes right, it parses the response body and returns the response. A minimal sketch of this wrapper is included after this list.
  2. After creating the handler, I wrote another function capable of performing simultaneous API requests to the given endpoint. To achieve this I used goroutines and managed them with wait groups. Moreover, as stated in the problem statement, the API can handle only 5 simultaneous requests at a time, so I spawned the goroutines in groups of five to avoid 429 errors while still getting maximum throughput.
  3. Last but not least, as required by the problem statement, I added logic to prevent API calls for duplicate ids. For this I used a Redis caching layer. The reason for choosing Redis over a local data structure such as a map or an array is that Redis is a high-performance, persistent data store that keeps data even after the program finishes executing. This avoids API calls for already fetched ids not only within a single run but also across subsequent runs of the program.
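
A minimal, simplified sketch of the GET wrapper from point 1 is shown below. It is illustrative only: the signature is assumed from the way get_request_handler is called in the fetcher snippet further down (response body, a 429 flag, and an error), and the full version in the repository also handles invalid response bodies. The Authorization header is simply the Base64 encoding of the item id, as the API requires.

import (
	"encoding/base64"
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

// Sketch of a GET wrapper: builds the Authorization header as Base64(id),
// flags 429 responses, and returns the raw response body.
func get_request_handler(url string) (interface{}, bool, error) {
	// The item id is the last path segment of the URL
	parts := strings.Split(url, "/")
	id := parts[len(parts)-1]

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, false, err
	}
	// Required header: Authorization: Base64(:id)
	req.Header.Set("Authorization", base64.StdEncoding.EncodeToString([]byte(id)))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, false, err
	}
	defer resp.Body.Close()

	// The API serves requests beyond the simultaneous limit with HTTP 429
	if resp.StatusCode == http.StatusTooManyRequests {
		return nil, true, fmt.Errorf("received 429 for %s", url)
	}

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, false, err
	}
	return body, false, nil
}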

Why Go?

The soul of this project is handling concurrent calls, which is why I chose Go for this task. Go is a language built for concurrency. Goroutines, channels, and wait groups provide an elegant and highly efficient way of handling concurrency compared to the threads and processes used by most other languages like Python and Java. Handling race conditions with Go's built-in mutex implementation is also quite convenient. Considering all these aspects, I felt that Go would be an ideal choice for this project.

A brief code snippet of the concurrent data fetcher function is provided below.

Talk is cheap. Show me the code.

// Function to fetch data from the API concurrently
func GetConcurrentData(urls []string) (final_res map[string]string, meta_map map[string]interface{}) {
	// This map contains the final result
	final_res = make(map[string]string)
	// These variables collect various metadata about the run
	var failed_ids []string
	var successful_ids []string
	var duplicate_ids_count int
	var duplicate_id []string
	var count_429 int
	wg := sync.WaitGroup{}
	i := 0
	for i < len(urls) {
		var spliced_url []string
		// Slicing the array of ids into groups of 5 as the API can't handle more than
		// five concurrent requests at a time.
		// This prevents the API from returning 429 errors and also ensures
		// maximum throughput from the API.
		if i+5 < len(urls) {
			spliced_url = urls[i : i+5]
		} else {
			spliced_url = urls[i:]
		}

		// Filtering any duplicate ids to avoid unwanted goroutine spawns
		spliced_url, duplicate_count, duplicate_ids := helpers.RemoveDuplicateValues(spliced_url)
		duplicate_ids_count = duplicate_ids_count + duplicate_count
		duplicate_id = append(duplicate_id, duplicate_ids...)
		for _, url := range spliced_url {
			// If the result is already present in redis cache then serve from cache
			// This will prevent API calls for duplicate ids
			val, err := cache.Get(context.TODO(), url).Result()
			if err == nil {
				final_res[url] = val
				duplicate_ids_count++
				continue
			}
			wg.Add(1)
			// Initialize go routines for concurrent calls.
			go func(url string) {
				defer wg.Done()

				res, is_429, err := get_request_handler(url)
				if is_429 {
					// Mutex lock prevents a race condition on the shared counter
					mutex.Lock()
					count_429++
					mutex.Unlock()
				}
				if err != nil {
					log.Println(err)
					// Adding mutex lock to prevent race condition
					mutex.Lock()
					failed_ids = append(failed_ids, url)
					mutex.Unlock()
					return
				}
				// Adding mutex lock to prevent race condition while writing the result to the map
				mutex.Lock()
				final_res[url] = string(res.([]uint8))
				mutex.Unlock()

				// Once the data is fetched, cache it for one hour in the redis cache so that
				// repeated API calls for duplicate ids can be avoided.
				// The one-hour TTL means any update in the response for the same id
				// is reflected after an hour. This value can be changed based on the
				// actual scenario.

				// We don't need a mutex lock while writing to the redis cache
				// as redis handles concurrent reads and writes on its own.

				err = cache.Set(context.TODO(), url, string(res.([]uint8)), time.Hour).Err()
				if err != nil {
					log.Println(err)
				}
				mutex.Lock()
				successful_ids = append(successful_ids, url)
				mutex.Unlock()
			}(url)

		}
		wg.Wait()
		i = i + 5
	}
	// Storing unique duplicate ids
	duplicate_id, _, _ = helpers.RemoveDuplicateValues(duplicate_id)

	meta_map = map[string]interface{}{
		"total_ids":            urls,
		"total_ids_count":      len(urls),
		"failed_ids":           failed_ids,
		"successful_ids":       successful_ids,
		"duplicate_ids_count":  duplicate_ids_count,
		"unique_duplicate_ids": duplicate_id,
		"count_429":            count_429,
		"number_of_failed_ids": len(failed_ids),
		"number_of_api_calls":  len(successful_ids),
	}
	return final_res, meta_map
}
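
The snippet above relies on a package-level redis client (cache) and a mutex that are defined elsewhere in the repository. Roughly, and assuming the go-redis v8 client with a Redis server on the default localhost:6379 port, those declarations would look like this:

import (
	"sync"

	"github.com/go-redis/redis/v8"
)

var (
	// Redis client used as the caching layer; assumes a local server on port 6379
	cache = redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	// Mutex guarding concurrent writes to the shared maps and slices
	mutex sync.Mutex
)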

Steps to run the project locally

These steps have only been tested on Linux (Debian). As the project is independent of the operating system, it should run fine on any OS, although the setup process may vary.

  1. Clone the project to your local machine using git clone https://github.com/alokrkmv/Eluvio_coding_challange_Option_2.git
  2. Set up the Go environment on your local machine. Check out the official documentation for this: Download and install Go.
  3. Once Go is set up on your machine, set the project GOPATH using the following steps:
    • Go to the root folder of the project (Eluvio_coding_challange_Option_2) and use the pwd command to get the current path of the root folder.
    • Now export the path obtained in the previous step as GOPATH using export GOPATH=<path_obtained_from_pwd>
    • Verify the GOPATH using echo $GOPATH
  4. Once GOPATH is properly set up, install all the dependencies by running bash install_dependecies.sh from the root folder.
  5. This project uses Redis as a caching layer, so a running Redis server is required on the local machine. To set up the Redis server you can use either of the following two ways.
    • Setting up the Redis server using Docker (recommended)
      • Set up Docker on your local machine by following the official documentation: setup docker for ubuntu
      • Pull the Redis container using docker pull redis. This will pull the official Redis image to your local machine.
      • Bring the container up on port 6379 using docker run --name redis-test-instance -p 6379:6379 -d redis
      • Get the container id using docker ps
      • Exec inside the container using docker exec -it container_id bash
      • Once inside the container, start the Redis client using redis-cli
      • Now you can see any changes happening in the cache in real time.
    • Setting up the Redis server without Docker (not recommended)
      • Install Redis on your local machine by following the official documentation: getting started with redis
      • Once Redis is successfully installed on your machine, start the Redis server using redis-server
      • Start the Redis client in a separate terminal using redis-cli.
      • Now you can see any changes happening in the cache in real time.
  6. Once the setup is complete, cd into src/github.com/alokrkmv and then execute go run main.go
  7. If everything goes right you will see the message Program Executed Successfully in the console. In case of any error, an error message will pop up.
  8. If the execution goes right, two files, output.json and meta_data.json, will be generated inside src/github.com/alokrkmv/writer
  9. output.json will have the response data for all the unique request ids. This is the final output of the program.
  10. meta_data.json will have various metadata like total run time, number of duplicate calls prevented, list of failed ids, number of 429 responses received, etc. This data can help in further analysis of the program. A rough sketch of this overall flow follows this list.
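
For context, the overall flow described in steps 6-10 looks roughly like the sketch below. This is a hypothetical illustration rather than the repository's actual main.go (the real entry point, its list of input ids, and the writer directory may differ), but it shows how the fetcher's two return values map onto output.json and meta_data.json.

package main

import (
	"encoding/json"
	"io/ioutil"
	"log"
)

func main() {
	// Item ids are turned into full API URLs before being handed to the fetcher.
	// The example id comes from the problem statement above.
	ids := []string{"cRF2dvDZQsmu37WGgK6MTcL7XjH"}
	urls := make([]string, 0, len(ids))
	for _, id := range ids {
		urls = append(urls, "https://challenges.qluv.io/items/"+id)
	}

	// GetConcurrentData is assumed to be in the same package for this sketch;
	// in the repository it would be imported from its own package.
	final_res, meta_map := GetConcurrentData(urls)

	// output.json holds the responses, meta_data.json the run statistics
	out, err := json.MarshalIndent(final_res, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	meta, err := json.MarshalIndent(meta_map, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	if err := ioutil.WriteFile("output.json", out, 0644); err != nil {
		log.Fatal(err)
	}
	if err := ioutil.WriteFile("meta_data.json", meta, 0644); err != nil {
		log.Fatal(err)
	}
	log.Println("Program Executed Successfully")
}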

Results

I have added sample snippets of both the output.json and meta_data.json files below.

  1. output.json

Output Snippet

  2. meta_data.json

Metadata Snippet
