Optional flags:
* `--sharding_config=<path>` Uses an alternative sharding config instead of the ones in the default_shardings directory.

# Run the server with Ray
Below are the steps to run the server with Ray:
1. SSH to the Cloud multi-host TPU VM (a v5e-16 TPU VM).
2. Complete steps 2 to 5 in the Outline.
3. Set up the Ray cluster.
4. Run the server with Ray.

## Setup Ray Cluster
Log in to the host 0 VM and start the Ray head with the command below:

```bash
ray start --head
```
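The `$ip` and `$port` used on the other hosts come from the head: `ray start --head` prints the exact `ray start --address='...'` line for workers to run. If you need to look up the head VM's internal IP yourself, one option (assuming a standard Linux image) is:

```bash
# On host 0: list the VM's internal IP addresses; the first entry is
# typically the one the worker hosts should connect to.
hostname -I
```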
Log in to each of the other host VMs and join them to the cluster as workers with the command below:

```bash
ray start --address='$ip:$port'
```

Note: Get the address IP and port from the output of the Ray head command.
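As an optional sanity check (not part of the original steps), running `ray status` on any host shows the nodes Ray can see; for a v5e-16 slice you would expect one entry per host VM, which I'm assuming is 4 hosts of 4 chips each:

```bash
# Run on any host after all VMs have joined.
# Expect one node entry per host VM (assumed: 4 hosts for a v5e-16 slice).
ray status
```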
## Run the server with Ray

Here is an example of running the server with Ray for the llama2 7B model:

```bash
python run_server_with_ray.py --tpu_chips=16 --model_name=$model_name --size=7b --batch_size=96 --max_cache_length=2048 --quantize_weights=$quantize --quantize_type=$quantize_type --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --sharding_config="default_shardings/llama.yaml"
```
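The shell variables in the command above (`$model_name`, `$quantize`, `$quantize_type`, `$output_ckpt_dir`, `$tokenizer_path`) are not defined in this section; the values below are purely illustrative placeholders, presumably coming from the earlier checkpoint-conversion step, not prescribed settings:

```bash
# Illustrative placeholders only; use the values from your own checkpoint conversion.
export model_name=llama-2                                # hypothetical model name
export quantize=True                                     # whether to quantize weights and KV cache
export quantize_type=int8_per_channel                    # hypothetical quantization type
export output_ckpt_dir=/path/to/converted/checkpoints    # from the conversion step
export tokenizer_path=/path/to/tokenizer.model           # SentencePiece tokenizer file
```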
# Run benchmark
Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)