Using a conda env instead of venv #1088

lopezjuanma96 · 2023-06-27T17:37:30Z

Hi, is there a way I can set up the repo to run from a conda environment instead of creating a venv? I'm running on a limited Linux server and cannot change the Python version to > 3.10.6 ... I tried creating the environment and installing requirements_linux.txt but I get the error:

ERROR: Invalid requirement: 'torch==2.0.1+cu118 torchvision==0.15.2+cu118' (from line 1 of requirements_linux.txt)

And although I can install requirements.txt, I still get errors when running gui.sh.
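The `Invalid requirement` message above is consistent with pip's requirements-file format, which expects one requirement specifier per line; here two specifiers share line 1. A minimal Python sketch of a workaround, assuming the offending line holds only space-separated specifiers optionally followed by pip options (the helper name `split_requirement_line` is hypothetical and not part of kohya_ss):

```python
import shlex


def split_requirement_line(line: str) -> list[str]:
    """Split a line holding several requirement specifiers into one per line.

    Tokens before the first option (anything starting with "-") are treated
    as specifiers. The option and everything after it are kept together as a
    single trailing line, since pip options may take a value.
    """
    tokens = shlex.split(line, comments=True)
    specifiers = []
    for index, token in enumerate(tokens):
        if token.startswith("-"):
            # Keep e.g. "--extra-index-url <url>" together on its own line.
            return specifiers + [" ".join(tokens[index:])]
        specifiers.append(token)
    return specifiers


if __name__ == "__main__":
    broken = "torch==2.0.1+cu118 torchvision==0.15.2+cu118"
    print("\n".join(split_requirement_line(broken)))
```

Writing the returned lines back in place of the original line should give pip a file it can parse; installing that first line separately, as described later in this thread, achieves the same thing by hand.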


bmaltais · 2023-06-27T21:29:40Z

Technically you could install everything in conda and then use python kohya_gui.py to start the GUI... That would allow you to run without using a venv.

bmaltais · 2023-06-27T21:30:50Z

Hummm... Maybe not... I think some of the scripts might actually point to a local venv folder... I am not sure...

lopezjuanma96 · 2023-06-27T21:58:09Z

Thanks for the quick response, I'm AFK right now but I'll test again tomorrow, would you mind if I sent you what issues I come up against here while I try doing that?

lopezjuanma96 · 2023-06-28T15:00:09Z

I've been doing some tests and managed to find out why pip install -r requirements_linux.txt was failing. It seems the torch line was breaking it, so I extracted the first line, installed it separately, and then installed the rest of requirements_linux.txt.
I could then start the server with python kohya_gui.py and load settings, but when I tried running a training or even printing the train command I got the error:

Traceback (most recent call last):
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/gradio/routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/jmlopez/stable-diffusion/kohya_ss/dreambooth_gui.py", line 373, in train_model
    output_message(
  File "/home/jmlopez/stable-diffusion/kohya_ss/library/common_gui.py", line 84, in output_message
    msgbox(msg=msg, title=title)
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/derived_boxes.py", line 230, in msgbox
    return buttonbox(msg=msg,
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 95, in buttonbox
    bb = ButtonBox(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 147, in __init__
    self.ui = GUItk(msg, title, choices, images, default_choice, cancel_choice, self.callback_ui)
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 263, in __init__
    self.boxRoot = tk.Tk()
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/tkinter/__init__.py", line 2299, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

So I also tried running training from my Windows PC and copying the accelerate command. I ran that copied command on Windows and it worked properly, but when I try to run it on this Linux server I get an error too.
The command is something like this:

accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --v2 --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" --train_data_dir="IMAGE_DIR" --resolution=512,512 --output_dir="MODEL_DIR" --logging_dir="LOGGING_DIR" --save_model_as=safetensors --output_name="MODEL_NAME" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="42650" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale

and this is the error:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/jmlopez/miniconda3/envs/kohya/bin/accelerate:8 in <module>             │
│                                                                              │
│   5 from accelerate.commands.accelerate_cli import main                      │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])     │
│ ❱ 8 │   sys.exit(main())                                                     │
│   9                                                                          │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/accelerate_cli.py:45 in main                                        │
│                                                                              │
│   42 │   │   exit(1)                                                         │
│   43 │                                                                       │
│   44 │   # Run                                                               │
│ ❱ 45 │   args.func(args)                                                     │
│   46                                                                         │
│   47                                                                         │
│   48 if __name__ == "__main__":                                              │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/launch.py:1000 in launch_command                                    │
│                                                                              │
│    997 │   warned = []                                                       │
│    998 │   # Get the default from the config file.                           │
│    999 │   if args.config_file is not None or os.path.isfile(default_config_ │
│ ❱ 1000 │   │   defaults = load_config_from_file(args.config_file)            │
│   1001 │   │   if (                                                          │
│   1002 │   │   │   not args.multi_gpu                                        │
│   1003 │   │   │   and not args.tpu                                          │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/config/config_args.py:64 in load_config_from_file                   │
│                                                                              │
│    61 │   │   │   │   config_class = ClusterConfig                           │
│    62 │   │   │   else:                                                      │
│    63 │   │   │   │   config_class = SageMakerConfig                         │
│ ❱  64 │   │   │   return config_class.from_yaml_file(yaml_file=config_file)  │
│    65                                                                        │
│    66                                                                        │
│    67 @dataclass                                                             │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/config/config_args.py:122 in from_yaml_file                         │
│                                                                              │
│   119 │   │   if "dynamo_backend" not in config_dict:                        │
│   120 │   │   │   config_dict["dynamo_backend"] = DynamoBackend.NO           │
│   121 │   │                                                                  │
│ ❱ 122 │   │   return cls(**config_dict)                                      │
│   123 │                                                                      │
│   124 │   def to_yaml_file(self, yaml_file):                                 │
│   125 │   │   with open(yaml_file, "w", encoding="utf-8") as f:              │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: ClusterConfig.__init__() got an unexpected keyword argument 'tpu_env'

I am trying to follow the Traceback of the error but I can't seem to find anything, maybe you can see what to do or at least where to start looking.
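The last frame of the traceback is `cls(**config_dict)`, so every key loaded from the YAML file is passed as a keyword argument to the config class, and any key that class does not declare raises this `TypeError`. A small sketch of that failure mode, using a stand-in dataclass (`DemoConfig` and `unknown_keys` are hypothetical names for illustration, not accelerate's API):

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Stand-in for a config class; the real one has many more fields."""
    compute_environment: str = "LOCAL_MACHINE"
    mixed_precision: str = "no"


def unknown_keys(config_cls, config_dict):
    """Return the keys in config_dict that config_cls does not declare."""
    declared = {f.name for f in fields(config_cls)}
    return sorted(set(config_dict) - declared)


if __name__ == "__main__":
    loaded = {"mixed_precision": "bf16", "tpu_env": []}
    print(unknown_keys(DemoConfig, loaded))
    try:
        DemoConfig(**loaded)
    except TypeError as exc:
        # Same shape of error as in the traceback above.
        print(exc)
```

One plausible cause of such a mismatch is a config file written by a different accelerate version than the one installed in the active environment.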

bmaltais · 2023-06-28T20:20:06Z

Try starting the GUI with: python kohya_gui.py --headless
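The earlier traceback ends in `_tkinter.TclError: no display name and no $DISPLAY environment variable`: easygui's `msgbox` tries to open a Tk window, which on an X11 Linux system needs `DISPLAY` to be set. A minimal sketch of the kind of guard a headless mode implies, assuming a hypothetical `notify` helper (this is not the actual kohya_ss implementation):

```python
import os


def can_open_tk_window(environ=None) -> bool:
    """Return True if DISPLAY is set, which Tk needs on X11 systems."""
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


def notify(message: str, headless: bool = False, environ=None) -> str:
    """Route a message to a dialog or the console; returns the route used.

    The dialog branch is only named here, not opened, so the sketch stays
    runnable on a server without a display.
    """
    if headless or not can_open_tk_window(environ):
        print(message)
        return "console"
    return "dialog"  # a real app would call its GUI message box here


if __name__ == "__main__":
    print(notify("Training finished", headless=True))
```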

lopezjuanma96 · 2023-06-29T15:59:16Z

The --headless fix worked fine: I could both "print training command" and "start training", but I got the same 'tpu_env' error as when I ran the command directly. Would you know where that could come from?

lopezjuanma96 · 2023-07-11T18:33:26Z

UPDATE:

I was able to run kohya_gui.py without the --headless parameter by properly referring to the venv python. For some reason the server was using the conda environment by default, even with an independent Python installation, probably something to do with the ~/.bashrc initialization. In any case, I was able to run it using venv/bin/python kohya_gui.py (venv referring to the venv created during setup).
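To check which interpreter a shell actually picks up, something like the following may help. It is a sketch relying only on the standard library; `CONDA_PREFIX` is the variable conda sets on activation, and the function name is hypothetical:

```python
import os
import sys


def describe_interpreter(environ=None) -> dict:
    """Report which Python is running and which environment it belongs to."""
    env = os.environ if environ is None else environ
    return {
        "executable": sys.executable,
        # In a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by `conda activate`; absent when no conda env is active.
        "conda_prefix": env.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running it once with plain `python` and once with `venv/bin/python` shows whether the two resolve to different interpreters.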

As for the 'tpu_env' error, I followed some instructions in #564 and modified the default accelerate configuration YAML file at ~/.cache/huggingface/accelerate/default_config.yaml. This could probably have been avoided by configuring accelerate manually with venv/bin/accelerate config, but that raised errors on my end during the configuration process; it might work for someone else.
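The exact edit from #564 is not quoted in this thread. Assuming it amounts to deleting the top-level keys that the installed accelerate rejects (here `tpu_env`), a line-based sketch with no YAML library could look like this; `drop_top_level_keys` is a hypothetical helper, and it is worth backing up the file before overwriting it:

```python
def drop_top_level_keys(yaml_text: str, unwanted: set) -> str:
    """Remove the named top-level keys (and their nested lines) from YAML text.

    Only handles simple block-style YAML, which is enough for a flat config.
    """
    kept = []
    skipping = False
    for line in yaml_text.splitlines():
        # A top-level key starts in column 0 and is not a list item.
        top_level = bool(line) and not line[0].isspace() and not line.startswith("-")
        if top_level:
            key = line.split(":", 1)[0].strip()
            skipping = key in unwanted
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Sample text only; the real file lives at
    # ~/.cache/huggingface/accelerate/default_config.yaml
    sample = "mixed_precision: bf16\ntpu_env: []\nnum_processes: 1\n"
    print(drop_top_level_keys(sample, {"tpu_env"}), end="")
```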

ballerburg9005 · 2024-06-09T16:14:51Z

I am on Arch Linux, so my Python version is 3.11.8, and using conda can fix this.

So I did this to make it run:

conda create -n kohya python=3.10.9
conda activate kohya
conda install pylint-venv
pip install -r requirements.txt

pip install xformers
pip install bitsandbytes

python kohya_gui.py --headless

None of the setup scripts and such work, but they are not required.

Nixellion · 2024-06-29T07:58:18Z

Yeah, IMO, all AI/ML projects should start using conda by default, because of all the version incompatibilities between Python, torch, CUDA, etc. Without conda it's practically impossible to install two AI projects that use different versions of CUDA, for example. Even with venv it's complicated. I don't think you can have both 11.x and 12.x CUDA versions installed on one system. And even on servers, spamming containers for every task is often tedious.

Conda solves all of these problems in a few lines. I don't understand why people spend so much time writing complicated bash install scripts when a few conda commands can replace 90% of that code. @ballerburg9005 literally just showed all the commands one has to run to install it. And it's completely system agnostic: it will work on Windows and Linux alike. Not sure about macOS, probably as well. So you just need to install miniconda, then run these commands.

And miniconda is there as well, you don't need to install the full fat anaconda.

I didn't like it myself at first, did not understand the benefits, but now I definitely do.

@ballerburg9005 thanks for the commands. I would however use python -m pip install ... just to be safe and make sure the correct python and pip are used.
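The point of `python -m pip` is that pip runs under whichever interpreter `python` resolves to, rather than whichever `pip` script happens to be first on `PATH`. The same idea from inside Python, as a sketch (the helper name is hypothetical):

```python
import sys


def pip_command(*args: str) -> list:
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Show the command that would install into this interpreter's environment.
    print(" ".join(pip_command("install", "-r", "requirements.txt")))
```

The resulting list can be passed to `subprocess.run` if the install should be driven from a script.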

You also need to install CUDA though, and requirements_linux.txt has torch install lines that don't work, so the full set for me is:

# Install miniconda as per https://docs.anaconda.com/miniconda/
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
conda create -n kohya python=3.10.9
conda activate kohya
conda install pylint-venv
conda install -y -c "nvidia/label/cuda-11.8.0" cuda
python -m pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 xformers==0.0.23.post1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python -m pip install xformers bitsandbytes

python kohya_gui.py --headless --listen 0.0.0.0

It launched. Did not yet try training anything with it, will do soon.
Added --listen flag to make it available on LAN from other PCs.

EDIT: Yeah, it works, but it shows a bitsandbytes missing-libraries error. That could probably be fixed, but I ended up installing normally in a separate LXC container. I was getting CUDA OOM errors and only realized it was just a mistake in the config (batch size 2 instead of 1) after I had already swapped to a non-conda install. Not sure if the bitsandbytes warnings were of any real significance.
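When comparing a conda install against a regular one, a quick inventory of what actually landed in the environment can help. A sketch using only `importlib.metadata` from the standard library; the package list is taken from the commands in this comment and the function name is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    names = ["torch", "torchvision", "xformers", "bitsandbytes"]
    for name, found in installed_versions(names).items():
        print(f"{name}: {found or 'not installed'}")
```

For wheels installed from the cu118 index, the reported version would be expected to carry the `+cu118` suffix, which makes a CPU-only or mismatched build easier to spot.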

bmaltais closed this as completed Jan 29, 2024
