FIX: Many fixes for huggingface and llama2 inference #335
Conversation
@@ -5,7 +5,7 @@ module Completions
   module Dialects
     class Llama2Classic
       def self.can_translate?(model_name)
-        "Llama2-*-chat-hf" == model_name
+        %w[Llama2-*-chat-hf Llama2-chat-hf].include?(model_name)
This matches the style used by our summarization strategy setting.
lib/completions/endpoints/base.rb (outdated)
@@ -90,6 +90,7 @@ def perform_completion!(prompt, user, model_params = {})

             begin
               partial = extract_completion_from(raw_partial)
+              next if partial.blank?
You get a blank partial from a non-blank raw_partial when streaming and receiving a special response, like the last message in a stream from HuggingFace.
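For illustration, a minimal standalone Ruby sketch (not the plugin's actual code) of why the guard is needed: the last streamed message carries a special end-of-stream token whose text is empty, so extracting it yields a blank partial from a non-blank chunk.

```ruby
require "json"

# Hypothetical extractor mirroring the shape of a TGI-style stream message;
# the final message has an empty token text plus the full generated_text.
def extract_text(raw_partial)
  parsed = JSON.parse(raw_partial.delete_prefix("data:"), symbolize_names: true)
  parsed.dig(:token, :text).to_s
end

chunks = [
  'data:{"token":{"text":"Hello"}}',
  'data:{"token":{"text":" world"}}',
  'data:{"token":{"text":"","special":true},"generated_text":"Hello world"}',
]

chunks.each do |raw_partial|
  partial = extract_text(raw_partial)
  next if partial.empty? # the check this PR adds (blank? in Rails), skipping the empty final partial

  print partial
end
# prints "Hello world"
```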
@@ -5,7 +5,7 @@ module Completions
   module Endpoints
     class HuggingFace < Base
       def self.can_contact?(model_name)
-        %w[StableBeluga2 Upstage-Llama-2-*-instruct-v2 Llama2-*-chat-hf].include?(model_name)
+        %w[StableBeluga2 Upstage-Llama-2-*-instruct-v2 Llama2-*-chat-hf Llama2-chat-hf].include?(model_name)
This matches the style used by our summarization strategy setting.
-        URI(SiteSetting.ai_hugging_face_api_url).tap do |uri|
-          uri.path = @streaming_mode ? "/generate_stream" : "/generate"
-        end
+        URI(SiteSetting.ai_hugging_face_api_url)
Moving to headers instead of the path makes this compatible with both self-hosting and the hosted API.
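For a rough sense of the change, here is a sketch with assumed values (not the PR's exact code): instead of mutating the URI path per mode, the endpoint keeps one base URI and expresses the streaming choice in the request itself.

```ruby
require "uri"

# Hypothetical settings for illustration only.
api_url = "https://tgi.example.com"
streaming_mode = true

# Before: the mode was encoded in the path, which only works when you control
# the server routes (a self-hosted TGI exposes /generate and /generate_stream).
old_uri = URI(api_url).tap { |uri| uri.path = streaming_mode ? "/generate_stream" : "/generate" }

# After: a single URI for both modes; whether to stream is signalled in the
# request (payload flag or header), which a hosted endpoint can also accept.
new_uri = URI(api_url)
payload = { inputs: "Hello", parameters: { max_new_tokens: 32 } }
payload[:stream] = true if streaming_mode # assumed flag, shown only to convey the idea

puts old_uri # => https://tgi.example.com/generate_stream
puts new_uri # => https://tgi.example.com
```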
         payload[:parameters][:max_new_tokens] = token_limit - prompt_size(prompt)

+        if @streaming_mode
Moving to headers instead of the path makes this compatible with both self-hosting and the hosted API.
@@ -56,15 +58,15 @@ def extract_completion_from(response_raw)

             parsed.dig(:token, :text).to_s
           else
-            parsed[:generated_text].to_s
+            parsed[0][:generated_text].to_s
A consequence of the move to the header for streaming is this slight change in the response format.
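To make the format change concrete (illustrative payloads, not captured from the API): the non-streaming body is now an array with a single element, so the generated text lives at index 0.

```ruby
require "json"

# Illustrative payloads only.
old_body = '{"generated_text": "Hello world"}'   # shape the old code expected
new_body = '[{"generated_text": "Hello world"}]' # shape handled after this change

puts JSON.parse(old_body, symbolize_names: true)[:generated_text]
puts JSON.parse(new_body, symbolize_names: true)[0][:generated_text]
# Calling parsed[:generated_text] on the new array form would raise a TypeError.
```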
           end
         end

         def partials_from(decoded_chunk)
           decoded_chunk
             .split("\n")
             .map do |line|
-              data = line.split("data: ", 2)[1]
+              data = line.split("data:", 2)[1]
This bug was introduced when porting from the old inference class.
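A quick illustration of the bug, using a sample line only: when the server emits `data:` with no space after the colon, splitting on `"data: "` never matches and the payload comes back nil.

```ruby
line = 'data:{"token":{"text":"Hi"}}' # sample streamed line, no space after the colon

p line.split("data: ", 2)[1] # => nil, the old separator with a trailing space never matches
p line.split("data:", 2)[1]  # => "{\"token\":{\"text\":\"Hi\"}}", the fixed separator
```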
@@ -22,11 +22,6 @@ def self.perform!(
       raise CompletionFailed if model.blank?

-      url = URI(SiteSetting.ai_hugging_face_api_url)
-      if block_given?
Moving to headers instead of the path makes this compatible with both self-hosting and the hosted API.
@@ -45,6 +40,10 @@ def self.perform!(
       parameters[:max_new_tokens] = token_limit - prompt_size
       parameters[:temperature] = temperature if temperature
       parameters[:repetition_penalty] = repetition_penalty if repetition_penalty

+      if block_given?
Moving to headers instead of the path makes this compatible with both self-hosting and the hosted API.
       parsed_response = JSON.parse(response_body, symbolize_names: true)

       log.update!(
         raw_response_payload: response_body,
         request_tokens: tokenizer.size(prompt),
-        response_tokens: tokenizer.size(parsed_response[:generated_text]),
+        response_tokens: tokenizer.size(parsed_response.first[:generated_text]),
This is a consequence of moving to headers for streaming.