GPU Installation #1624

Open

jaysunl opened this issue May 16, 2024 · 28 comments

@jaysunl

jaysunl commented May 16, 2024

Could I get specific instructions for the GPU installation of the tool? I have followed the installation steps, but it still says no GPU detected. I have the following GPU on my system: NVIDIA Corporation GA102GL [A10G] (rev a1), and my nvcc --version output is
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

I also tried to install the CUDA build of PyTorch, but when I run
import torch
print(torch.cuda.is_available())

it still says False and when I try to run a model, it still says no GPU detected. Any guidance would be appreciated.

@pseudotensor
Collaborator

It's a good start that you checked that torch condition. We only really support CUDA 12.1 and above at this point, so the problem may be that your installation is built against the old CUDA toolkit. Our instructions for installing CUDA toolkit 12.1 are easy to follow, but you'll also need drivers that are compatible with it.
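For reference, a quick way to cross-check driver, toolkit, and PyTorch (a minimal sketch; the cu121 index URL below is the standard PyTorch wheel index, nothing h2oGPT-specific):

nvidia-smi          # driver version and the highest CUDA version it supports
nvcc --version      # installed CUDA toolkit
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# if torch.version.cuda prints None or 11.x, reinstall the CUDA 12.1 build of PyTorch, e.g.:
pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu121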

@jaysunl
Author

jaysunl commented May 21, 2024

OK, I was actually able to install the GPU version of PyTorch. But launching the actual interface takes forever. I am running this command:
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_8bit=True --langchain_mode=UserData --user_path=/some/path

but the output gets stuck at this:
soundfile, librosa, and wavio not installed, disabling STT
soundfile, librosa, and wavio not installed, disabling TTS
Using Model h2oai/h2ogpt-oig-oasst1-512-6_9b

with nothing after it. It has been like that for at least an hour. Any tips?

@pseudotensor
Collaborator

I recommend not using h2oai/h2ogpt-oig-oasst1-512-6_9b as a model; instead, start with a GGUF model like https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF. Also, make sure to pass --verbose=True to get more info.

python generate.py --base_model=llama --model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct --max_seq_len=8192 --verbose=True
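If your shell treats the ?download=true suffix as a glob pattern (which typically shows up as a "no match" error), quote the URL; a minimal variant of the same command:

python generate.py --base_model=llama --model_path_llama='https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true' --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct --max_seq_len=8192 --verbose=True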

@jaysunl
Author

jaysunl commented May 21, 2024

Sorry, I'm getting a "Python: no match" error when I enter this command, even though I copied it exactly as is. I'm using Python 3.10.12.

I also tried this (without the ?download=true at the end of model_path_llama) and it gets stuck at this:

soundfile, librosa, and wavio not installed, disabling STT
soundfile, librosa, and wavio not installed, disabling TTS
Using Model llama
Generating model with params:
load_8bit: False
load_4bit: False
low_bit_mode: 1
load_half: True
use_flash_attention_2: False
load_gptq: 
use_autogptq: False
load_awq: 
load_exllama: False
use_safetensors: False
revision: None
use_gpu_id: True
base_model: llama
tokenizer_base_model: meta-llama/Meta-Llama-3-8B-Instruct
lora_weights: 
gpu_id: 0
compile_model: None
use_cache: None
inference_server: 
regenerate_clients: True
regenerate_gradio_clients: False
prompt_type: llama2
prompt_dict: {'promptA': '', 'promptB': '', 'PreInstruct': "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n", 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', '</s>'], 'chat_sep': ' ', 'chat_turn_sep': ' </s>', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.", 'can_handle_system_prompt': True}
system_prompt: auto
allow_chat_system_prompt: True
llamacpp_path: llamacpp_path
llamacpp_dict: {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': 'https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf', 'model_name_gptj': '', 'model_name_gpt4all_llama': '', 'model_name_exllama_if_no_config': ''}
exllama_dict: {}
gptq_dict: {}
attention_sinks: False
sink_dict: {}
truncation_generation: False
hf_model_dict: {}
force_seq2seq_type: False
force_t5_type: False
model_lock: None
model_lock_columns: None
model_lock_layout_based_upon_initial_visible: False
fail_if_cannot_connect: False
temperature: 0.0
top_p: 1.0
top_k: 1
penalty_alpha: 0
num_beams: 1
repetition_penalty: 1.07
num_return_sequences: 1
do_sample: False
seed: 0
max_new_tokens: 1024
min_new_tokens: 0
early_stopping: False
max_time: 600
memory_restriction_level: 0
debug: False
save_dir: None
local_files_only: False
resume_download: True
use_auth_token: False
trust_remote_code: True
rope_scaling: {}
max_seq_len: 8192
max_output_seq_len: None
offload_folder: offline_folder
src_lang: English
tgt_lang: Russian
prepare_offline_level: 0
cli: False
cli_loop: True
gradio: True
openai_server: True
openai_port: 5000
gradio_offline_level: 0
server_name: 0.0.0.0
share: True
open_browser: False
close_button: True
shutdown_via_api: False
root_path: 
ssl_verify: True
ssl_keyfile: None
ssl_certfile: None
ssl_keyfile_password: None
chat: True
chat_conversation: []
text_context_list: []
stream_output: True
async_output: True
num_async: 3
show_examples: False
verbose: True
h2ocolors: True
dark: False
height: 600
render_markdown: True
show_lora: True
show_llama: True
show_gpt4all: False
login_mode_if_model0: False
block_gradio_exit: True
concurrency_count: 1
api_open: False
allow_api: True
system_api_open: False
input_lines: 1
gradio_size: None
show_copy_button: True
large_file_count_mode: False
gradio_ui_stream_chunk_size: 0
gradio_ui_stream_chunk_min_seconds: 0.2
gradio_ui_stream_chunk_seconds: 2.0
gradio_api_use_same_stream_limits: True
gradio_upload_to_chatbot: False
gradio_upload_to_chatbot_num_max: 2
gradio_errors_to_chatbot: True
pre_load_embedding_model: True
embedding_gpu_id: auto
auth: None
auth_filename: auth.json
auth_access: open
auth_freeze: False
auth_message: None
google_auth: False
guest_name: guest
enforce_h2ogpt_api_key: False
enforce_h2ogpt_ui_key: False
h2ogpt_api_keys: []
h2ogpt_key: None
extra_allowed_paths: []
blocked_paths: []
max_max_time: 1200
max_max_new_tokens: 1024
max_visible_models: None
visible_ask_anything_high: True
visible_visible_models: True
visible_submit_buttons: True
visible_side_bar: True
visible_doc_track: True
visible_chat_tab: True
visible_doc_selection_tab: True
visible_doc_view_tab: True
visible_chat_history_tab: True
visible_expert_tab: True
visible_models_tab: True
visible_system_tab: True
visible_tos_tab: False
visible_login_tab: True
visible_hosts_tab: False
chat_tables: False
visible_h2ogpt_links: True
visible_h2ogpt_qrcode: True
visible_h2ogpt_logo: True
visible_chatbot_label: True
visible_all_prompter_models: False
visible_curated_models: True
actions_in_sidebar: False
document_choice_in_sidebar: True
enable_add_models_to_list_ui: False
max_raw_chunks: 1000000
pdf_height: 800
avatars: True
add_disk_models_to_ui: True
page_title: h2oGPT
model_label_prefix: h2oGPT
favicon_path: None
visible_ratings: False
reviews_file: None
sanitize_user_prompt: False
sanitize_bot_response: False
extra_model_options: []
extra_lora_options: []
extra_server_options: []
score_model: 
verifier_model: None
verifier_tokenizer_base_model: None
verifier_inference_server: None
eval_filename: None
eval_prompts_only_num: 0
eval_prompts_only_seed: 1234
eval_as_output: False
langchain_mode: UserData
user_path: None
langchain_modes: ['UserData', 'MyData', 'LLM', 'Disabled']
langchain_mode_paths: {'UserData': None}
langchain_mode_types: {'UserData': 'shared', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': ''}
detect_user_path_changes_every_query: False
update_selection_state_from_cli: True
langchain_action: Query
langchain_agents: []
force_langchain_evaluate: False
visible_langchain_actions: ['Query', 'Summarize', 'Extract']
visible_langchain_agents: ['Collection', 'Python', 'CSV', 'Pandas', 'JSON', 'SMART', 'AUTOGPT']
document_subset: Relevant
document_choice: ['All']
document_source_substrings: []
document_source_substrings_op: and
document_content_substrings: []
document_content_substrings_op: and
use_llm_if_no_docs: True
load_db_if_exists: True
keep_sources_in_context: False
db_type: chroma
use_openai_embedding: False
use_openai_model: False
hf_embedding_model: hkunlp/instructor-large
migrate_embedding_model: False
auto_migrate_db: False
cut_distance: 1.64
answer_with_sources: True
append_sources_to_answer: False
append_sources_to_chat: True
show_accordions: True
top_k_docs_max_show: 10
show_link_in_sources: True
langchain_instruct_mode: True
pre_prompt_query: None
prompt_query: None
pre_prompt_summary: None
prompt_summary: None
hyde_llm_prompt: None
add_chat_history_to_context: True
add_search_to_context: False
context: 
iinput: 
allow_upload_to_user_data: True
reload_langchain_state: True
allow_upload_to_my_data: True
enable_url_upload: True
enable_text_upload: True
enable_sources_list: True
chunk: True
chunk_size: 512
top_k_docs: 10
docs_ordering_type: best_near_prompt
min_max_new_tokens: 512
max_input_tokens: -1
max_total_input_tokens: -1
docs_token_handling: split_or_merge
docs_joiner: '\n\n'
hyde_level: 0
hyde_template: None
hyde_show_only_final: False
hyde_show_intermediate_in_accordion: True
doc_json_mode: False
metadata_in_context: auto
auto_reduce_chunks: True
max_chunks: 100
headsize: 50
n_jobs: 32
n_gpus: 1
clear_torch_cache_level: 1
use_unstructured: True
use_playwright: False
use_selenium: False
use_scrapeplaywright: False
use_scrapehttp: False
use_pymupdf: auto
use_unstructured_pdf: auto
use_pypdf: auto
enable_pdf_ocr: auto
enable_pdf_doctr: auto
try_pdf_as_html: auto
enable_ocr: False
enable_doctr: True
enable_pix2struct: False
enable_captions: True
enable_llava: True
enable_transcriptions: True
pre_load_image_audio_models: False
caption_gpu: True
caption_gpu_id: auto
captions_model: Salesforce/blip-image-captioning-base
doctr_gpu: True
doctr_gpu_id: auto
llava_model: None
llava_prompt: auto
image_file: None
image_control: None
response_format: text
guided_json: 
guided_regex: 
guided_choice: 
guided_grammar: 
asr_model: openai/whisper-medium
asr_gpu: True
asr_gpu_id: auto
asr_use_better: True
asr_use_faster: False
enable_stt: False
stt_model: openai/whisper-base.en
stt_gpu: True
stt_gpu_id: auto
stt_continue_mode: 1
enable_tts: False
tts_gpu: True
tts_gpu_id: auto
tts_model: microsoft/speecht5_tts
tts_gan_model: microsoft/speecht5_hifigan
tts_coquiai_deepspeed: True
tts_coquiai_roles: {}
chatbot_role: None
speaker: None
tts_language: autodetect
tts_speed: 1.0
tts_action_phrases: []
tts_stop_phrases: []
sst_floor: 100
enable_image: False
visible_image_models: []
image_gpu_ids: []
enable_llava_chat: False
jq_schema: .[]
extract_frames: 10
max_quality: False
enable_heap_analytics: True
heap_app_id: 1680123994
roles_state0: {}
base_model0: llama
enable_imagegen: False
enable_imagechange: False
enable_imagestyle: False
is_hf: False
is_gpth2oai: False
is_public: False
admin_pass: None
raise_generate_gpu_exceptions: True
h2ogpt_pid: 13294
lmode: wiki_full
all_inference_server: None
n_gpus1: 1
gpu_ids: [0]
model_lower: llama
model_lower0: llama
first_para: False
text_limit: None
caption_loader: None
doctr_loader: None
pix2struct_loader: None
asr_loader: None
image_audio_loaders_options0: ['Caption']
image_audio_loaders_options: ['Caption', 'CaptionBlip2', 'Pix2Struct']
pdf_loaders_options0: ['PyPDF']
pdf_loaders_options: ['Unstructured', 'PyPDF', 'TryHTML']
url_loaders_options0: ['Unstructured']
url_loaders_options: ['Unstructured', 'ScrapeWithHttp']
jq_schema0: .[]
extract_frames0: 10
image_audio_loaders: ['Caption']
pdf_loaders: ['PyPDF']
url_loaders: ['Unstructured']
placeholder_instruction: 
placeholder_input: 
examples: [['Translate English to French', 'Good morning', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Give detailed answer for whether Einstein or Newton is smarter.', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Explain in detailed list, all the best practices for coding in python.', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Create a markdown table with 3 rows for the primary colors, and 2 columns, with color name and hex codes.', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Translate to German:  My name is Arthur', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ["Please answer to the following question. Who is going to be the next Ballon d'or?", '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Can Geoffrey Hinton have a conversation with George Washington? 
Give the rationale before answering.', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Please answer the following question. What is the boiling point of Nitrogen?', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Answer the following yes/no question. Can you write a whole Haiku in a single tweet?', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Simplify the following expression: (False or False and True). Explain your answer.', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ["Premise: At my age you will probably have learnt one lesson. Hypothesis:  It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['The square root of x is the cube root of y. What is y to the power of 2, if x = 4?', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Answer the following question by reasoning step by step.  
The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['def area_of_rectangle(a: float, b: float):\n    """Return the area of the rectangle."""', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['# a function in native python:\ndef mean(a):\n    return sum(a)/len(a)\n\n# the same function using numpy:\nimport numpy as np\ndef mean(a):', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['X = np.random.randn(100, 100)\ny = np.random.randint(0, 1, 100)\n\n# fit random forest classifier with 20 estimators', '', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', ''], ['Jeff: Can I train a ? Transformers model on Amazon SageMaker?\nPhilipp: Sure you can use the new Hugging Face Deep Learning Container.\nJeff: ok.\nJeff: and how can I get started?\nJeff: where can I find documentation?\nPhilipp: ok, ok you can find everything here. https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face', 'Summarize', '', True, 'llama2', None, 0.0, 1.0, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'Disabled', True, 'Query', [], 10, True, 512, 'Relevant', [], [], 'and', [], 'and', None, None, None, None, None, 'auto', ['Caption'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', None, None, None, False, None, None, 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, None, False, False, 'auto', 'None', 'None', 'autodetect', 1.0, None, None, 'text', '', '', '', '']]
task_info: No task
git_hash: 6c284d58026992bdc9ec20f1be03fac0974d9a42
visible_models: None
Command: generate.py --base_model=llama --model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct --langchain_mode=UserData --share=True --max_seq_len=8192 --verbose=True
Hash: 6c284d58026992bdc9ec20f1be03fac0974d9a42

@pseudotensor
Collaborator

Can you find the PID of the process, and when it gets stuck, run kill -s SIGUSR1 PID from a separate terminal and share the full traceback?
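For example (a sketch; some shells' built-in kill may not accept the SIG-prefixed name, in which case /bin/kill or kill -USR1 should work):

ps aux | grep generate.py     # find the PID of the hung process
/bin/kill -s SIGUSR1 PID      # replace PID; h2oGPT should then print a full traceback to its console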

@jaysunl
Author

jaysunl commented May 23, 2024

It says "SIGUSR1: Unknown signal; kill -l lists signals." I replaced PID with the actual PID, and that is what it said.

@pseudotensor
Collaborator

What platform are you using? Linux, Windows, Mac?

@jaysunl
Author

jaysunl commented May 23, 2024 via email

@pseudotensor
Collaborator

OK, that's very odd; not sure about that. A normal Linux PC has that signal, so you must have something unusual.

Try downloading the FILE separately and placing it in the llamacpp_path folder, then start h2oGPT with --base_model=llama --llamacpp_path=FILE.
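For example, a minimal sketch (assuming wget; adjust the filename to the GGUF you actually downloaded):

mkdir -p llamacpp_path
wget -O llamacpp_path/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf \
    'https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf'
# pass the file as suggested above; alternatively keep the default llamacpp_path dir and pass --model_path_llama=<filename>
python generate.py --base_model=llama --llamacpp_path=llamacpp_path/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf --verbose=True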

@jaysunl
Author

jaysunl commented May 24, 2024

It still stops at the same place as above. The thing is, when I try launching it on CPU, it works, but when I set the CUDA_VISIBLE_DEVICES env variable to enable the GPU, it fails and just doesn't continue after it says Using Model llama.

Edit: I did some debugging and found that it gets stuck at this line in h2ogpt/src/gen.py (line 1956 at e11d4c7):

hf_embedding_model = dict(name=hf_embedding_model,

@pseudotensor
Collaborator

pseudotensor commented May 24, 2024

So it hangs while getting or initializing the embedding model, which should be fairly trivial. It's not clear why, and it's really hard to debug without SIGUSR1.

Have you tried the Docker installation, to rule out problems with your local install?
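One way to check whether the embedding model load itself is the problem (a hypothetical repro outside h2oGPT, using a small sentence-transformers model just for the test):

python -c "from langchain_community.embeddings import HuggingFaceEmbeddings; e = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2'); print(len(e.embed_query('test')))"

If that also hangs, the issue is in the embedding stack (sentence-transformers / model download) rather than in h2oGPT itself.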

@jaysunl
Author

jaysunl commented May 28, 2024

Sorry for the late reply. I just tried the Docker version and encountered an out-of-space error when running docker build -t h2ogpt .

--> 39936edc8bac
Step 15/25 : RUN chmod -R a+rwx /workspace
---> Running in 216f25a14b33
write /workspace/llamacpp_path/zephyr-7b-beta.Q5_K_M.gguf: no space left on device

How much space is this expected to use?

@pseudotensor
Collaborator

When you build a docker image, make sure the local path is clean. E.g., I recommend making a separate clone of the repo and building from inside it.
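For example, a minimal sketch (the clone directory name is arbitrary):

git clone https://github.com/h2oai/h2ogpt.git h2ogpt-clean
cd h2ogpt-clean
docker build -t h2ogpt .
docker system df    # shows how much disk space images, containers, and build cache are using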

@jaysunl
Author

jaysunl commented May 30, 2024

OK, I managed to build the container. I ran this example from the README:

mkdir -p ~/.cache
mkdir -p ~/save
mkdir -p ~/user_path
mkdir -p ~/db_dir_UserData
mkdir -p ~/users
mkdir -p ~/db_nonusers
mkdir -p ~/llamacpp_path
mkdir -p ~/h2ogpt_auth
echo '["key1","key2"]' > ~/h2ogpt_auth/h2ogpt_api_keys.json
export GRADIO_SERVER_PORT=7860
export OPENAI_SERVER_PORT=5000
docker run \
       --gpus all \
       --runtime=nvidia \
       --shm-size=2g \
       -p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
       -p $OPENAI_SERVER_PORT:$OPENAI_SERVER_PORT \
       --rm --init \
       --network host \
       -v /etc/passwd:/etc/passwd:ro \
       -v /etc/group:/etc/group:ro \
       -u `id -u`:`id -g` \
       -v "${HOME}"/.cache:/workspace/.cache \
       -v "${HOME}"/save:/workspace/save \
       -v "${HOME}"/user_path:/workspace/user_path \
       -v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
       -v "${HOME}"/users:/workspace/users \
       -v "${HOME}"/db_nonusers:/workspace/db_nonusers \
       -v "${HOME}"/llamacpp_path:/workspace/llamacpp_path \
       -v "${HOME}"/h2ogpt_auth:/workspace/h2ogpt_auth \
       -e GRADIO_SERVER_PORT=$GRADIO_SERVER_PORT \
       gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.0 /workspace/generate.py \
          --base_model=HuggingFaceH4/zephyr-7b-beta \
          --use_safetensors=True \
          --prompt_type=zephyr \
          --save_dir='/workspace/save/' \
          --auth_filename='/workspace/h2ogpt_auth/auth.json' \
          --h2ogpt_api_keys='/workspace/h2ogpt_auth/h2ogpt_api_keys.json' \
          --use_gpu_id=False \
          --user_path=/workspace/user_path \
          --langchain_mode="LLM" \
          --langchain_modes="['UserData', 'LLM']" \
          --score_model=None \
          --max_max_new_tokens=2048 \
          --max_new_tokens=1024 \
          --use_auth_token="${HUGGING_FACE_HUB_TOKEN}" \
          --openai_port=$OPENAI_SERVER_PORT

But it seems to be hanging after this output:

WARNING: Published ports are discarded when using host network mode
Using Model huggingfaceh4/zephyr-7b-beta
fatal: not a git repository (or any of the parent directories): .git

Does it have anything to do with git_hash.txt? I am running this in the h2ogpt directory.

Also, when I try to make a db in the container prior to running the bot, it hangs after this output:

100%|██████████| 181/181 [02:31<00:00,  1.19it/s]
Exceptions: 0/17294 []

@jaysunl
Author

jaysunl commented May 30, 2024

Actually never mind, I resolved the issue. Thank you so much for your help!

One thing I'm curious about: if I have multiple doc source directories, how would I launch multiple chatbots? Do I need to run multiple generate.py commands? Would they all be connected to one container? I'm using docker-compose, for reference.

@pseudotensor
Collaborator

Yes, if you made multiple collections but want each to be served separately, you can build them with make_db and then launch a separate h2oGPT for each, using CLI options like those used here: https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#multiple-embeddings-and-sources

You can use a TEI server to share the embeddings if you want to save GPU memory and get better speed. The FAQ describes how gpt.h2o.ai is set up using a TEI server: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#text-embedding-inference-server
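For example, a rough sketch of two separate instances, one per collection (the paths and collection names here are hypothetical; see the README_LangChain link above for the exact options):

GRADIO_SERVER_PORT=7860 python generate.py --base_model=llama \
       --langchain_mode=UserData --langchain_modes="['UserData']" \
       --langchain_mode_paths="{'UserData': '/data/docs_a'}" &
GRADIO_SERVER_PORT=7861 python generate.py --base_model=llama \
       --langchain_mode=UserData2 --langchain_modes="['UserData2']" \
       --langchain_mode_paths="{'UserData2': '/data/docs_b'}" &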

@jaysunl
Author

jaysunl commented Jun 5, 2024

OK, sounds good, thanks! I am also receiving this error

h2ogpt_1  | WARNING:matplotlib:Matplotlib created a temporary cache directory at /tmp/matplotlib-l50c0tvu because the default path (/workspace/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
h2ogpt_1  | There was a problem when trying to write in your cache folder (/workspace/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
h2ogpt_1  | Using Model huggingfaceh4/zephyr-7b-beta
h2ogpt_1  | fatal: not a git repository (or any of the parent directories): .git
h2ogpt_1  | Traceback (most recent call last):
h2ogpt_1  |   File "/workspace/generate.py", line 20, in <module>
h2ogpt_1  |     entrypoint_main()
h2ogpt_1  |   File "/workspace/generate.py", line 16, in entrypoint_main
h2ogpt_1  |     H2O_Fire(main)
h2ogpt_1  |   File "/workspace/src/utils.py", line 72, in H2O_Fire
h2ogpt_1  |     fire.Fire(component=component, command=args)
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
h2ogpt_1  |     component_trace = _Fire(component, args, parsed_flag_args, context, name)
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
h2ogpt_1  |     component, remaining_args = _CallAndUpdateTrace(
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
h2ogpt_1  |     component = fn(*varargs, **kwargs)
h2ogpt_1  |   File "/workspace/src/gen.py", line 1886, in main
h2ogpt_1  |     model=get_embedding(use_openai_embedding, hf_embedding_model=hf_embedding_model,
h2ogpt_1  |   File "/workspace/src/gpt_langchain.py", line 544, in get_embedding
h2ogpt_1  |     embedding = HuggingFaceEmbeddings(model_name=hf_embedding_model, model_kwargs=model_kwargs)
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 72, in __init__
h2ogpt_1  |     self.client = sentence_transformers.SentenceTransformer(
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 87, in __init__
h2ogpt_1  |     snapshot_download(model_name_or_path,
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/sentence_transformers/util.py", line 476, in snapshot_download
h2ogpt_1  |     os.makedirs(nested_dirname, exist_ok=True)
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/os.py", line 215, in makedirs
h2ogpt_1  |     makedirs(head, exist_ok=exist_ok)
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/os.py", line 215, in makedirs
h2ogpt_1  |     makedirs(head, exist_ok=exist_ok)
h2ogpt_1  |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/os.py", line 225, in makedirs
h2ogpt_1  |     mkdir(name, mode)
h2ogpt_1  | PermissionError: [Errno 13] Permission denied: '/workspace/.cache/torch'

after running docker-compose up -d --build. My docker-compose.yml looks like this

version: '3'

services:
  h2ogpt:
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    shm_size: '2gb'
    ports:
      - '${H2OGPT_PORT}:7860'
    volumes:
      #- cache:/workspace/.cache
      #- save:/workspace/save
      - ../workspace/.cache:/workspace/.cache
      - ../workspace/save:/workspace/save
      - ../workspace/user_path:/workspace/user_path
      - ../workspace/db_dir_UserData:/workspace/db_dir_UserData
      - ../workspace/users:/workspace/users
      - ../workspace/db_nonusers:/workspace/db_nonusers
      - ../workspace/llamacpp_path:/workspace/llamacpp_path
      - ../workspace/h2ogpt_auth:/workspace/h2ogpt_auth
    command: |
      /workspace/generate.py
      --base_model=HuggingFaceH4/zephyr-7b-beta
      --hf_embedding_model=sentence-transformers/all-MiniLM-L6-v2
      --score_model=None
      --use_gpu_id=0
      --langchain_mode='UserData'
      --langchain_modes=['UserData2']
      --enable_tts=False
      --enable_stt=False
      --enable_transcriptions=False
      --max_seq_len=2048
      --load_4bit=True
      --share=True
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

volumes:
  cache:
  save:

Any workarounds?

@pseudotensor
Collaborator

My guess is that you didn't create the directories as your user before they were created as root by the docker image. I've seen this error in that case. If you already have the dirs, please remove them.

Then do as user:

mkdir -p $HOME/.cache/huggingface/hub
mkdir -p $HOME/.triton/cache/
mkdir -p $HOME/.config/vllm

Then run docker run so that those paths are mapped, e.g.:

port=5003
tokens=8192
docker run -d --restart=always \
    --runtime=nvidia \
    --gpus '"device=6"' \
    --shm-size=10.24gb \
    -p $port:$port \
        -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/   -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    --name danube2chat \
    vllm/vllm-openai:latest \
        --port=$port \
        --host=0.0.0.0 \
        --model=h2oai/h2o-danube2-1.8b-chat \
        --seed 1234 \
        --trust-remote-code \
        --tensor-parallel-size=1 \
        --max-num-batched-tokens 163840 \
        --max-model-len=$tokens \
        --download-dir=/home/ubuntu/.cache/huggingface/hub &>> logs.vllm_server.danube2chat.txt

@jaysunl
Author

jaysunl commented Jun 7, 2024

Sorry for all the questions, but I can't seem to get any of those working. I'm also now getting errors where the tool is unable to ingest any documents:

 Download simple failed: Invalid URL '/tmp/gradio/361098a8271e9b9cbd110b0a3bc16f886470e713/16ffc_EDA_tools_certificate_report_DBUS0203020_1.pdf': No scheme supplied. Perhaps you meant https:///tmp/gradio/361098a8271e9b9cbd110b0a3bc16f886470e713/16ffc_EDA_tools_certificate_report_DBUS0203020_1.pdf?, trying other means
h2ogpt_ctx | INFO:     172.31.35.37:0 - "GET /queue/data?session_hash=oi9qxivpotp HTTP/1.1" 200 OK
h2ogpt_ctx | ERROR:langchain_community.document_loaders.url:Error fetching or processing http:///tmp/gradio/361098a8271e9b9cbd110b0a3bc16f886470e713/16ffc_EDA_tools_certificate_report_DBUS0203020_1.pdf, exception: Invalid URL 'http:///tmp/gradio/361098a8271e9b9cbd110b0a3bc16f886470e713/16ffc_EDA_tools_certificate_report_DBUS0203020_1.pdf': No host supplied
h2ogpt_ctx | Failed to ingest /tmp/gradio/361098a8271e9b9cbd110b0a3bc16f886470e713/16ffc_EDA_tools_certificate_report_DBUS0203020_1.pdf due to Traceback (most recent call last):
h2ogpt_ctx |   File "/workspace/src/gpt_langchain.py", line 5274, in path_to_doc1
h2ogpt_ctx |     res = file_to_doc(file,
h2ogpt_ctx |   File "/workspace/src/gpt_langchain.py", line 4531, in file_to_doc
h2ogpt_ctx |     docs1a = asyncio.run(PlaywrightURLLoader(urls=final_urls).aload())
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/asyncio/runners.py", line 44, in run
h2ogpt_ctx |     return loop.run_until_complete(main)
h2ogpt_ctx |   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/document_loaders/url_playwright.py", line 199, in aload
h2ogpt_ctx |     return [doc async for doc in self.alazy_load()]
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/document_loaders/url_playwright.py", line 199, in <listcomp>
h2ogpt_ctx |     return [doc async for doc in self.alazy_load()]
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/document_loaders/url_playwright.py", line 211, in alazy_load
h2ogpt_ctx |     browser = await p.chromium.launch(headless=self.headless, proxy=self.proxy)
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/playwright/async_api/_generated.py", line 13957, in launch
h2ogpt_ctx |     await self._impl_obj.launch(
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/playwright/_impl/_browser_type.py", line 94, in launch
h2ogpt_ctx |     Browser, from_channel(await self._channel.send("launch", params))
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 59, in send
h2ogpt_ctx |     return await self._connection.wrap_api_call(
h2ogpt_ctx |   File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
h2ogpt_ctx |     raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
h2ogpt_ctx | playwright._impl._errors.Error: BrowserType.launch: Executable doesn't exist at /workspace/.cache/ms-playwright/chromium-1117/chrome-linux/chrome
h2ogpt_ctx | ╔════════════════════════════════════════════════════════════╗
h2ogpt_ctx | ║ Looks like Playwright was just installed or updated.       ║
h2ogpt_ctx | ║ Please run the following command to download new browsers: ║
h2ogpt_ctx | ║                                                            ║
h2ogpt_ctx | ║     playwright install                                     ║
h2ogpt_ctx | ║                                                            ║
h2ogpt_ctx | ║ <3 Playwright Team                                         ║
h2ogpt_ctx | ╚════════════════════════════════════════════════════════════╝

Can I get a step by step example from you if possible that uses Docker setup with document ingestion? I followed the ones on the README docs given but everytime I am met with an error. Thank you.

@pseudotensor
Collaborator

Hi, can you give me your startup command and an example URL you are trying to provide (or one that fails in the same way)?

@pseudotensor
Collaborator

That playwright step should already have been done in the docker or local install, via:

playwright install --with-deps

But can you try, in the expert settings, disabling playwright and forcing only unstructured to be used (see the CLI sketch after the code excerpt below)?

i.e. the related code in h2ogpt/src/gpt_langchain.py, lines 4526 to 4542 at 5b48852:

if do_unstructured or use_unstructured:
    docs1a = UnstructuredURLLoader(urls=final_urls, headers=dict(ssl_verify="False")).load()
    docs1a = [x for x in docs1a if
              x.page_content and x.page_content != '403 Forbidden' and not x.page_content.startswith(
                  'Access Denied')]
    add_parser(docs1a, 'UnstructuredURLLoader')
    docs1.extend(docs1a)
if len(docs1) == 0 and have_playwright or do_playwright:
    # then something went wrong, try another loader:
    from langchain_community.document_loaders import PlaywrightURLLoader
    docs1a = asyncio.run(PlaywrightURLLoader(urls=final_urls).aload())
    # docs1 = PlaywrightURLLoader(urls=[file]).load()
    docs1a = [x for x in docs1a if
              x.page_content and x.page_content != '403 Forbidden' and not x.page_content.startswith(
                  'Access Denied')]
    add_parser(docs1a, 'PlaywrightURLLoader')
    docs1.extend(docs1a)
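From the CLI, the equivalent would be to set the loader flags that appear in the startup log above (a sketch, not the only way to do it):

python generate.py ... --use_playwright=False --use_unstructured=True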

@pseudotensor
Collaborator

It's also possible I broke something very recently w.r.t. name handling. I'll check soon.

@pseudotensor
Collaborator

A few things I just tried worked:

In the UI, for "Ask or Ingest", I put in these URLs and they all worked after clicking ingest:

www.cnn.com
https://cdn.openai.com/papers/whisper.pdf

Or are you not giving a URL, but instead trying to upload a PDF?

Please provide some details about what you are doing.

@jaysunl
Author

jaysunl commented Jun 7, 2024

I am trying to upload PDFs. Actually, I just resolved the document loading issues, but I'm having a few other issues related to the ones I've been getting. First, making the database with the Docker command. I am running the following

mkdir -p ~/.cache
mkdir -p ~/save
mkdir -p ~/user_path
mkdir -p ~/db_dir_UserData
docker run \
       --gpus all \
       --runtime=nvidia \
       --shm-size=2g \
       --rm --init \
       --network host \
       -v /etc/passwd:/etc/passwd:ro \
       -v /etc/group:/etc/group:ro \
       -u `id -u`:`id -g` \
       -v "${HOME}"/.cache:/workspace/.cache \
       -v "${HOME}"/save:/workspace/save \
       -v "${HOME}"/user_path:/workspace/user_path \
       -v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
       gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.0 /workspace/src/make_db.py --verbose

which is straight from the documentation, and I made sure the directories I made were writable. I put my documents under ~/user_path (181 PDFs). It manages to ingest them, but this is all that prints:

100%|██████████| 181/181 [02:25<00:00,  1.24it/s]
Exceptions: 0/17294 []
Loading and creating db

But nothing else happens after that. So I'm assuming no db has been made since the command did not terminate.

The second issue I'm getting is still the permission issue. I'm using this command from the documentation for running the tool with a generated db (after running src/make_db.py):

mkdir -p ~/.cache
mkdir -p ~/save
mkdir -p ~/user_path
mkdir -p ~/db_dir_UserData
mkdir -p ~/users
mkdir -p ~/db_nonusers
mkdir -p ~/llamacpp_path
docker run \
       --gpus '"device=0"' \
       --runtime=nvidia \
       --shm-size=2g \
       -p 7860:7860 \
       --rm --init \
       --network host \
       -v /etc/passwd:/etc/passwd:ro \
       -v /etc/group:/etc/group:ro \
       -u `id -u`:`id -g` \
       -v "${HOME}"/.cache:/workspace/.cache \
       -v "${HOME}"/save:/workspace/save \
       -v "${HOME}"/user_path:/workspace/user_path \
       -v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
       -v "${HOME}"/users:/workspace/users \
       -v "${HOME}"/db_nonusers:/workspace/db_nonusers \
       -v "${HOME}"/llamacpp_path:/workspace/llamacpp_path \
       gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.0 /workspace/generate.py \
          --base_model=h2oai/h2ogpt-4096-llama2-7b-chat \
          --use_safetensors=True \
          --prompt_type=llama2 \
          --save_dir='/workspace/save/' \
          --use_gpu_id=False \
          --score_model=None \
          --max_max_new_tokens=2048 \
          --max_new_tokens=1024 \
          --langchain_mode=LLM

I get the following output

WARNING: Published ports are discarded when using host network mode
Using Model h2oai/h2ogpt-4096-llama2-7b-chat
fatal: not a git repository (or any of the parent directories): .git

but nothing else after that. I will see if I can reproduce the permission error, because I saw it quite a few times over the past few days.

@pseudotensor
Collaborator

After make_db.py ingests the PDFs, it needs to embed the data into the database. I assume you have GPUs, so it would use a GPU to do the embedding. The speed depends upon your GPUs, the embedding model, etc. This is what the line Loading and creating db refers to.

You should see intense activity on the GPU used for embedding.
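You can watch this in a separate terminal, e.g. (a generic check, nothing h2oGPT-specific):

watch -n 1 nvidia-smi    # GPU utilization and memory should climb while the embedding runs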

@pseudotensor
Collaborator

These:

WARNING: Published ports are discarded when using host network mode
fatal: not a git repository (or any of the parent directories): .git

are ignorable.

But if it's hanging at:

Using Model h2oai/h2ogpt-4096-llama2-7b-chat

then there's some issue with the model. It could be downloading the model (it should show that, though), or there could be a network issue talking to HF, etc.
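One way to rule out download or network problems (a sketch, assuming a recent huggingface_hub CLI and, if needed, HUGGING_FACE_HUB_TOKEN set):

huggingface-cli download h2oai/h2ogpt-4096-llama2-7b-chat    # pre-fetches the model into ~/.cache/huggingface/hub

If that completes, the hang is more likely in model loading than in the download.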

@pseudotensor
Collaborator

pseudotensor commented Jun 7, 2024

I ran your command on about 290 PDFs in the ~/user_path.

During ingestion, you'll see something like the screenshot below, using all cores efficiently, since pymupdf is the default PDF parser, it uses only CPU, and the backup parsers are used only if it totally fails.

After it becomes less busy, it means it's working on the last remaining files in some batch, so it's not as parallel if the PDFs are quite different from each other.

If some PDF fails with pymupdf, it will fall back to the unstructured loader using Tesseract, which can be very slow. You can run with --use_unstructured_pdf=False --enable_pdf_ocr=False to avoid that.

[screenshot: CPU usage across all cores during PDF ingestion]

@pseudotensor
Collaborator

pseudotensor commented Jun 7, 2024

Once the GPU kicks in, it'll look like the screenshot below.

It takes a while with instructor-large, which is the default embedding model for the GPU case. It would be faster with a smaller embedding model, like BAAI/bge-small-en-v1.5, e.g.:

mkdir -p ~/.cache
mkdir -p ~/save
mkdir -p ~/user_path
mkdir -p ~/db_dir_UserData
docker run \
       --gpus all \
       --runtime=nvidia \
       --shm-size=2g \
       --rm --init \
       --network host \
       -v /etc/passwd:/etc/passwd:ro \
       -v /etc/group:/etc/group:ro \
       -u `id -u`:`id -g` \
       -v "${HOME}"/.cache:/workspace/.cache \
       -v "${HOME}"/save:/workspace/save \
       -v "${HOME}"/user_path:/workspace/user_path \
       -v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
       gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.1 /workspace/src/make_db.py --verbose --use_unstructured_pdf=False --enable_pdf_ocr=False --hf_embedding_model=BAAI/bge-small-en-v1.5

When you then run generate.py, make sure to pass --cut_distance=10000.
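For example, a sketch of the matching launch (assuming the same embedding model should also be passed at serve time so it matches the db):

python generate.py --base_model=HuggingFaceH4/zephyr-7b-beta \
       --hf_embedding_model=BAAI/bge-small-en-v1.5 \
       --cut_distance=10000 \
       --langchain_mode=UserData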

[screenshot: GPU utilization during embedding]
