# Multiple API Calls
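* How can multiple API calls be made efficiently?
* Multiprocessing (Ray) vs. asynchronous I/O (asyncio/aiohttp)
* In the timings below, the asynchronous approach **is more efficient** than multiprocessing. API calls are I/O-bound: the program mostly waits on the network, and non-blocking requests let those waits overlap (keywords: asynchronous, non-blocking, I/O-bound)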
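
To make the I/O-bound argument concrete, here is a minimal sketch that needs no network access. `fake_api_call` is a hypothetical stand-in for an HTTP request (it only sleeps), and the 0.2 second delay is an arbitrary assumption. It compares awaiting the calls one at a time with overlapping them via `asyncio.gather`.

```python
import asyncio
import time


async def fake_api_call(delay: float = 0.2) -> str:
    # Hypothetical stand-in for an HTTP request: the coroutine only waits,
    # as an I/O-bound call does while the server is responding.
    await asyncio.sleep(delay)
    return "ok"


async def run_sequential(n_calls: int) -> float:
    # Await each call before starting the next one.
    start = time.perf_counter()
    for _ in range(n_calls):
        await fake_api_call()
    return time.perf_counter() - start


async def run_concurrent(n_calls: int) -> float:
    # Start all calls at once so their waiting periods overlap.
    start = time.perf_counter()
    await asyncio.gather(*(fake_api_call() for _ in range(n_calls)))
    return time.perf_counter() - start


sequential_seconds = asyncio.run(run_sequential(5))
concurrent_seconds = asyncio.run(run_concurrent(5))
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

The sequential run should take about five times the single-call delay, while the concurrent run should stay close to one delay. Inside Jupyter, `asyncio.run` needs `nest_asyncio.apply()` first, as in the asyncio section of this notebook.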

In [1]:
import os
import requests
import time

In [2]:
!lscpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) CPU @ 2.00GHz
Stepping:            3
CPU MHz:             2000.144
BogoMIPS:            4000.28
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            39424K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3

## One-by-One

In [2]:
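def generate_chatgpt(messages, params=None):
    # OpenAI API key, read from the environment
    openai_api_key = os.getenv("OPENAI_API_KEY")
    
    # Send the chat completion request
    res = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer "+openai_api_key,
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-3.5-turbo",
            "messages": messages,
            **(params or {})
        }
    )
    
    # On error, raise instead of returning the message string:
    # the caller extends a list with the result, which would split a string into characters
    if res.status_code != 200:
        raise RuntimeError(res.json()["error"]["message"])
    
    # One generated text per returned choice
    return [msg["message"]["content"] for msg in res.json()["choices"]]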

In [3]:
def measure_time(n_calls):
    start = time.time()

    generations = []
    for _ in range(n_calls):
        generations.extend(generate_chatgpt(messages=[
            # Prompt (Korean): "Tell me about My Melody"
            {"role": "user", "content": "마이멜로디에 대해 소개해 줘"}
        ]))

    end = time.time()
    elapsed_time = end - start

    print(len(generations), "Generations")
    print("Execution time:", elapsed_time, "seconds")

In [4]:
measure_time(n_calls=1)

1 Generations
Execution time: 19.57789921760559 seconds


In [5]:
measure_time(n_calls=3)

3 Generations
Execution time: 49.191596031188965 seconds


In [6]:
measure_time(n_calls=8)

8 Generations
Execution time: 144.65509366989136 seconds


In [7]:
measure_time(n_calls=16)

16 Generations
Execution time: 349.0199410915375 seconds


## Multiprocessing with Ray
* Ray Core documentation: https://docs.ray.io/en/latest/ray-core/walkthrough.html

In [2]:
import ray

ray.init()

2023-04-23 15:02:52,053	INFO worker.py:1553 -- Started a local Ray instance.


Python version: 3.10.9
Ray version: 2.3.1


In [3]:
@ray.remote
def generate_chatgpt(messages, params=None):
    # OpenAI API key, read from the environment
    openai_api_key = os.getenv("OPENAI_API_KEY")
    
    # Send the chat completion request
    res = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={
            "Authorization": "Bearer "+openai_api_key,
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-3.5-turbo",
            "messages": messages,
            **(params or {})
        }
    )
    
    # On error, raise so that every returned result has the same type (a list of strings)
    if res.status_code != 200:
        raise RuntimeError(res.json()["error"]["message"])
    
    # One generated text per returned choice
    return [msg["message"]["content"] for msg in res.json()["choices"]]

In [4]:
def measure_time(n_calls):
    start = time.time()

    generations = []
    for _ in range(n_calls):
        generations.append(generate_chatgpt.remote(messages=[
            # Prompt (Korean): "Tell me about My Melody"
            {"role": "user", "content": "마이멜로디에 대해 소개해 줘"}
        ]))
    generations = ray.get(generations)

    end = time.time()
    elapsed_time = end - start

    print(len(generations), "Generations")
    print("Execution time:", elapsed_time, "seconds")

In [5]:
measure_time(n_calls=1)

1 Generations
Execution time: 19.4450044631958 seconds


In [6]:
measure_time(n_calls=3)

3 Generations
Execution time: 21.497418642044067 seconds


In [7]:
measure_time(n_calls=8)

8 Generations
Execution time: 44.92633819580078 seconds


In [8]:
measure_time(n_calls=16)

16 Generations
Execution time: 104.43220949172974 seconds


In [9]:
measure_time(n_calls=100)

100 Generations
Execution time: 521.1891481876373 seconds


In [10]:
ray.shutdown()
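
One plausible reading of the Ray timings above: a Ray task reserves one CPU by default, so on the 4-CPU machine shown by `lscpu` only about four requests are in flight at a time. The sketch below is a rough back-of-the-envelope model, not a description of Ray's scheduler. The slot count and the 20 seconds per request are assumptions, and the measured values are rounded from the runs above.

```python
import math

# Assumption: one Ray task per logical CPU (the default num_cpus=1 per task),
# and 4 logical CPUs as reported by lscpu.
ASSUMED_PARALLEL_SLOTS = 4
# Assumption: each request takes about 20 seconds, close to the single-call timing.
ASSUMED_SECONDS_PER_CALL = 20.0

# Measured wall-clock times in seconds, rounded from the Ray runs above.
measured = {1: 19.4, 3: 21.5, 8: 44.9, 16: 104.4, 100: 521.2}


def estimate_seconds(n_calls: int) -> float:
    # Requests are processed in "waves" of at most ASSUMED_PARALLEL_SLOTS.
    waves = math.ceil(n_calls / ASSUMED_PARALLEL_SLOTS)
    return waves * ASSUMED_SECONDS_PER_CALL


estimates = {n: estimate_seconds(n) for n in measured}
for n, seconds in measured.items():
    print(f"{n:>3} calls: estimated {estimates[n]:6.1f}s, measured {seconds:6.1f}s")
```

The estimates track the measurements reasonably well, which suggests the Ray runs were limited by the number of worker slots rather than by the API itself.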

## Asynchronous with asyncio/aiohttp
* asyncio documentation: https://docs.python.org/3/library/asyncio.html
* aiohttp documentation: https://docs.aiohttp.org/en/stable/

In [2]:
import asyncio
import aiohttp
import nest_asyncio

# Patch asyncio so that asyncio.run() works inside Jupyter's already-running event loop
nest_asyncio.apply()

In [3]:
async def generate_chatgpt(messages, params=None):
    # Headers
    headers = {
        # OpenAI API key, read from the environment
        "Authorization": "Bearer "+os.getenv("OPENAI_API_KEY"),
        "Content-Type": "application/json"
    }
    
    # Request body
    data = {
        "model": "gpt-3.5-turbo",
        "messages": messages,
        **(params or {})
    }
    
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.post("https://api.openai.com/v1/chat/completions", json=data) as res:
            # Awaiting the body yields control, so other requests can progress meanwhile
            result = await res.json()
            
            # On error, raise so that every returned result has the same type (a list of strings)
            if res.status != 200:
                raise RuntimeError(result["error"]["message"])
            
            # One generated text per returned choice
            return [msg["message"]["content"] for msg in result["choices"]]

In [4]:
async def main(n_calls):
    # Prompt (Korean): "Tell me about My Melody"
    messages = [{"role": "user", "content": "마이멜로디에 대해 소개해 줘"}]
    
    tasks = [asyncio.create_task(generate_chatgpt(messages=messages)) for _ in range(n_calls)]
    
    generations = await asyncio.gather(*tasks)
    
    return generations
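
`main` starts every request at once. If the API enforces rate limits, it may help to cap the number of requests in flight. The following is a minimal sketch using `asyncio.Semaphore`. `fake_generate`, `bounded_main`, and the cap of 10 are hypothetical and not part of the original notebook, and the API call is replaced by a short sleep so the sketch runs offline.

```python
import asyncio

# Hypothetical cap on in-flight requests; tune it to the API's rate limits.
MAX_CONCURRENCY = 10


async def fake_generate(index: int) -> list:
    # Stand-in for generate_chatgpt so the sketch runs without network access.
    await asyncio.sleep(0.05)
    return [f"generation {index}"]


async def bounded_main(n_calls: int, max_concurrency: int = MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak_in_flight = 0

    async def limited_call(index: int) -> list:
        nonlocal in_flight, peak_in_flight
        # At most max_concurrency coroutines get past this point at once.
        async with semaphore:
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
            try:
                return await fake_generate(index)
            finally:
                in_flight -= 1

    # gather returns results in the order the calls were submitted.
    results = await asyncio.gather(*(limited_call(i) for i in range(n_calls)))
    return results, peak_in_flight


results, peak = asyncio.run(bounded_main(n_calls=25))
print(len(results), "results, peak concurrency:", peak)
```

To use the same pattern with the real API, the body of `limited_call` would await `generate_chatgpt(messages=messages)` instead of `fake_generate`.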

In [5]:
def measure_time(n_calls):
    start = time.time()

    generations = asyncio.run(main(n_calls=n_calls))

    end = time.time()
    elapsed_time = end - start

    print(len(generations), "Generations")
    print("Execution time:", elapsed_time, "seconds")

In [6]:
measure_time(n_calls=1)

1 Generations
Execution time: 17.0317063331604 seconds


In [7]:
measure_time(n_calls=3)

3 Generations
Execution time: 19.35207772254944 seconds


In [8]:
measure_time(n_calls=8)

8 Generations
Execution time: 29.192562580108643 seconds


In [9]:
measure_time(n_calls=16)

16 Generations
Execution time: 38.55629849433899 seconds


In [10]:
measure_time(n_calls=100)

100 Generations
Execution time: 47.21856093406677 seconds
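
The three sets of timings can be put side by side with a little arithmetic. The values below are rounded from the runs in this notebook, and `speedup` is a hypothetical helper added only for this comparison. Each configuration was run once, so the ratios are indicative rather than precise.

```python
# Wall-clock times in seconds, rounded from the runs in this notebook.
timings = {
    "one-by-one": {1: 19.58, 3: 49.19, 8: 144.66, 16: 349.02},
    "ray": {1: 19.45, 3: 21.50, 8: 44.93, 16: 104.43, 100: 521.19},
    "asyncio": {1: 17.03, 3: 19.35, 8: 29.19, 16: 38.56, 100: 47.22},
}


def speedup(baseline: str, other: str, n_calls: int) -> float:
    # How many times faster `other` was than `baseline` for the same call count.
    return timings[baseline][n_calls] / timings[other][n_calls]


ray_vs_sequential = speedup("one-by-one", "ray", 16)
asyncio_vs_sequential = speedup("one-by-one", "asyncio", 16)
asyncio_vs_ray = speedup("ray", "asyncio", 100)

print(f"16 calls, Ray vs one-by-one:      {ray_vs_sequential:.1f}x")
print(f"16 calls, asyncio vs one-by-one:  {asyncio_vs_sequential:.1f}x")
print(f"100 calls, asyncio vs Ray:        {asyncio_vs_ray:.1f}x")
```

On these numbers, Ray is roughly 3x faster than the one-by-one loop at 16 calls, asyncio roughly 9x, and at 100 calls asyncio is roughly 11x faster than Ray.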
