Support ARM-based AWS instances #1528

Closed
deliahu opened this issue Nov 3, 2020 · 6 comments · Fixed by #2268

Labels: enhancement (New feature or request), research (Determine technical constraints), timecapped (Assigned a limited amount of time)
Milestone: v0.37

Comments

deliahu (Member) commented Nov 3, 2020

Notes

Only the containers that run on worker nodes need to be compiled for ARM (a build sketch follows the list):

  1. Dequeuer - add target OS/arch args to the Dockerfile and build with docker buildx.
  2. Enqueuer - add target OS/arch args to the Dockerfile and build with docker buildx.
  3. Proxy - add target OS/arch args to the Dockerfile and build with docker buildx.
  4. Async gateway - add target OS/arch args to the Dockerfile and build with docker buildx.
  5. Fluent Bit - build with docker buildx.
  6. Node exporter - build with docker buildx.
  7. Kube RBAC proxy - doesn't have an arm64 version, but we can build one.
  8. Kubexit - need to enable the fork to build an arm64 version.
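For items 1-4, a minimal sketch of what the Dockerfile change and buildx invocation might look like (the paths, Go package, and image tag below are hypothetical, not taken from the Cortex repo). BuildKit populates TARGETOS/TARGETARCH automatically for each platform in the build:

    # Dockerfile sketch: multi-stage build that cross-compiles a Go binary
    FROM golang:1.16 AS builder
    ARG TARGETOS
    ARG TARGETARCH
    WORKDIR /src
    COPY . .
    # Cross-compile for the platform buildx is currently targeting
    RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /app ./cmd/proxy

    FROM alpine:3.13
    COPY --from=builder /app /app
    ENTRYPOINT ["/app"]

    # Build and push a single multi-arch manifest (tag is a placeholder)
    docker buildx build --platform linux/amd64,linux/arm64 -t example.com/cortexlabs/proxy:latest --push .

The same buildx invocation should cover items 5 and 6, since those images only need to be rebuilt per platform rather than modified.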
@deliahu deliahu added the enhancement New feature or request label Nov 3, 2020
@deliahu deliahu added this to To prioritize in Cortex via automation Nov 3, 2020
imagine3D-ai commented

> Just the containers that run on worker nodes need to be compiled for ARM:

  • fluentd has an ARM build
  • cloudwatch-agent doesn't seem to have an ARM build
  • image-downloader containers will need to be updated
  • Modifications will likely need to be made to the API pod containers

What is the timeline on these enhancements?

deliahu (Member, Author) commented Nov 4, 2020

@imagine3D-ai we don't currently have a timeline for ARM instance support. Which instance type are you hoping to use, and is cost reduction your only motivation for using it (and if so, how much would it save you)?

imagine3D-ai commented

Cost is not my only motivation (although c6g.medium is cheaper than t3.medium and more powerful); Compute Optimized instances seem to be more powerful and better suited to machine learning inference than T3 instances.

deliahu (Member, Author) commented Nov 4, 2020

@imagine3D-ai each model behaves a bit differently: some lend themselves to machines with more memory relative to CPU, and others to more/faster CPU relative to memory. The latest "Compute Optimized" non-ARM instances are the c5 and c5a series. "large" is the smallest size for those (as opposed to "medium"), but since you can serve multiple APIs, or multiple replicas of the same API, on a single instance, a larger instance type will not be more expensive if you have multiple APIs or multiple replicas of one API.
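As a rough bin-packing illustration (prices are approximate us-east-1 on-demand rates from around this time, not quoted in this thread): two replicas that each occupy a t3.medium (~$0.0416/hr each, ~$0.083/hr total) cost roughly the same as both replicas packed onto a single c5.large (~$0.085/hr), so the larger instance type adds essentially no per-replica cost.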

@deliahu deliahu removed this from To prioritize in Cortex Nov 26, 2020
sevro commented Mar 1, 2021

Are the required enhancements listed here the same for, say, running the realtime API locally on a Jetson? I am considering taking a swing at this rather than using another model server; I would much rather use Cortex.

The CLI fails to run at all, so I guess that would need to be fixed as well:

datenstrom@ant:~$ cortex
Traceback (most recent call last):
  File "/home/datenstrom/.local/bin/cortex", line 8, in <module>
    sys.exit(run())
  File "/home/datenstrom/.local/lib/python3.6/site-packages/cortex/binary/__init__.py", line 32, in run
    process = subprocess.run([get_cli_path()] + sys.argv[1:], cwd=os.getcwd())
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/home/datenstrom/.local/lib/python3.6/site-packages/cortex/binary/cli'
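A plausible diagnosis, assuming an aarch64 Jetson (this is an assumption, not confirmed in the thread): "Exec format error" usually means the bundled cli binary was compiled for a different architecture than the host, e.g. x86_64. One way to check, using the path from the traceback:

    # inspect the packaged CLI binary's architecture
    file /home/datenstrom/.local/lib/python3.6/site-packages/cortex/binary/cli
    # an x86_64-only build would report 'ELF 64-bit LSB executable, x86-64'
    uname -m   # should print aarch64 on a Jetson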

vishalbollu (Contributor) commented

The Cortex features that used to manage docker container deployments (also referred to as Cortex local) have been deprecated and are no longer supported. We happened to build a model server along our journey to building a distributed model inference cluster; creating a model server isn't our primary focus.

That said, if you would like to adapt Cortex local to a different architecture, you can take a look at Cortex v0.25, which is the last version of Cortex with local support. The requirements listed in this ticket pertain to making the different components of the Cortex cluster ARM-compatible before ARM instances can be supported. Off the top of my head, Cortex local relies on Docker, and you may have to recompile the cortex Go binary for your architecture as well.
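If you do try recompiling, a minimal cross-compile sketch for an arm64 Jetson (the package path is a guess; check where the CLI's main package lives in the v0.25 tree):

    # from a checkout of cortexlabs/cortex at v0.25
    GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -o bin/cortex ./cli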

@vishalbollu vishalbollu added research Determine technical constraints timecapped Assigned a limited amount of time labels Jun 8, 2021
@deliahu deliahu added this to the v0.37 milestone Jun 22, 2021