Using a conda env instead of venv #1088

Closed
lopezjuanma96 opened this issue Jun 27, 2023 · 9 comments

Comments

@lopezjuanma96

Hi, is there a way I can set up the repo to run from a conda environment instead of creating a venv? I'm running on a limited Linux server and cannot change the Python version to > 3.10.6 ... I tried creating the environment and installing requirements_linux.txt but I get the error:

ERROR: Invalid requirement: 'torch==2.0.1+cu118 torchvision==0.15.2+cu118' (from line 1 of requirements_linux.txt)

And although I can install requirements.txt, I still get errors when running gui.sh
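
A possible reading of the error above: pip treats each line of a requirements file as a single requirement, so two pinned packages on one line are rejected. The sketch below is only an illustration of that idea. `split_requirement_line` is a hypothetical helper, not part of kohya_ss or pip, and it assumes any pip options on the line come after the package specifiers.

```python
import re

# Matches a "#" comment that starts the line or follows whitespace.
_COMMENT = re.compile(r"(^|\s+)#.*$")


def split_requirement_line(line: str) -> tuple[list[str], list[str]]:
    """Split one line into (requirement specifiers, trailing pip options).

    Hypothetical helper for illustration; assumes options (tokens starting
    with "-") come after all requirement specifiers on the line.
    """
    tokens = _COMMENT.sub("", line).split()
    for index, token in enumerate(tokens):
        if token.startswith("-"):
            return tokens[:index], tokens[index:]
    return tokens, []


reqs, opts = split_requirement_line(
    "torch==2.0.1+cu118 torchvision==0.15.2+cu118"
)
print(reqs)  # each entry can be handed to pip as its own requirement
```

Each returned specifier could then be installed on its own, or written back to the file one per line.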

@bmaltais
Owner

Technically you could install everything in conda and then use python kohya_gui.py to start the GUI... That would allow you to run without using a venv

@bmaltais
Owner

Hummm... Maybe not... I think some of the scripts might actually point to a local venv folder... I am not sure...

@lopezjuanma96
Author

Thanks for the quick response, I'm AFK right now but I'll test again tomorrow, would you mind if I sent you what issues I come up against here while I try doing that?

@lopezjuanma96
Author

I've been doing some tests, and I managed to find out why pip install -r requirements_linux.txt was failing. It seems the torch line was breaking it, so I extracted the first line, installed it separately, and then installed requirements_linux.txt.
I could then start the server with python kohya_gui.py and load settings, but when I tried running a training or even printing the train command I got the error:

Traceback (most recent call last):
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/gradio/routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/gradio/blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/gradio/blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/jmlopez/stable-diffusion/kohya_ss/dreambooth_gui.py", line 373, in train_model
    output_message(
  File "/home/jmlopez/stable-diffusion/kohya_ss/library/common_gui.py", line 84, in output_message
    msgbox(msg=msg, title=title)
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/derived_boxes.py", line 230, in msgbox
    return buttonbox(msg=msg,
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 95, in buttonbox
    bb = ButtonBox(
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 147, in __init__
    self.ui = GUItk(msg, title, choices, images, default_choice, cancel_choice, self.callback_ui)
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 263, in __init__
    self.boxRoot = tk.Tk()
  File "/home/jmlopez/miniconda3/envs/kohya/lib/python3.10/tkinter/__init__.py", line 2299, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

So I also tried running training from my Windows PC and copying the accelerate command. I ran that copied command in Windows and it worked properly, but when trying to run it on this Linux server I get an error too.
The command is something like this:

accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --v2 --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" --train_data_dir="IMAGE_DIR" --resolution=512,512 --output_dir="MODEL_DIR" --logging_dir="LOGGING_DIR" --save_model_as=safetensors --output_name="MODEL_NAME" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="42650" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale

and this is the error:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/jmlopez/miniconda3/envs/kohya/bin/accelerate:8 in <module>             │
│                                                                              │
│   5 from accelerate.commands.accelerate_cli import main                      │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])     │
│ ❱ 8 │   sys.exit(main())                                                     │
│   9                                                                          │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/accelerate_cli.py:45 in main                                        │
│                                                                              │
│   42 │   │   exit(1)                                                         │
│   43 │                                                                       │
│   44 │   # Run                                                               │
│ ❱ 45 │   args.func(args)                                                     │
│   46                                                                         │
│   47                                                                         │
│   48 if __name__ == "__main__":                                              │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/launch.py:1000 in launch_command                                    │
│                                                                              │
│    997 │   warned = []                                                       │
│    998 │   # Get the default from the config file.                           │
│    999 │   if args.config_file is not None or os.path.isfile(default_config_ │
│ ❱ 1000 │   │   defaults = load_config_from_file(args.config_file)            │
│   1001 │   │   if (                                                          │
│   1002 │   │   │   not args.multi_gpu                                        │
│   1003 │   │   │   and not args.tpu                                          │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/config/config_args.py:64 in load_config_from_file                   │
│                                                                              │
│    61 │   │   │   │   config_class = ClusterConfig                           │
│    62 │   │   │   else:                                                      │
│    63 │   │   │   │   config_class = SageMakerConfig                         │
│ ❱  64 │   │   │   return config_class.from_yaml_file(yaml_file=config_file)  │
│    65                                                                        │
│    66                                                                        │
│    67 @dataclass                                                             │
│                                                                              │
│ /home/jmlopez/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/ │
│ commands/config/config_args.py:122 in from_yaml_file                         │
│                                                                              │
│   119 │   │   if "dynamo_backend" not in config_dict:                        │
│   120 │   │   │   config_dict["dynamo_backend"] = DynamoBackend.NO           │
│   121 │   │                                                                  │
│ ❱ 122 │   │   return cls(**config_dict)                                      │
│   123 │                                                                      │
│   124 │   def to_yaml_file(self, yaml_file):                                 │
│   125 │   │   with open(yaml_file, "w", encoding="utf-8") as f:              │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: ClusterConfig.__init__() got an unexpected keyword argument 'tpu_env'

I am trying to follow the Traceback of the error but I can't seem to find anything, maybe you can see what to do or at least where to start looking.

@bmaltais
Owner

Try starting the GUI with: python kohya_gui.py --headless
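
The first traceback ends in tkinter failing because no $DISPLAY is set, which is what the --headless flag works around. As a rough sketch, a launcher could pick the flag automatically. `build_gui_command` is a hypothetical helper, not something kohya_ss provides, and it assumes a Linux host where tkinter dialogs need an X display.

```python
import os
import sys


def build_gui_command(environ=None) -> list[str]:
    """Return the launch command, adding --headless when no X display is set."""
    env = os.environ if environ is None else environ
    command = [sys.executable, "kohya_gui.py"]
    # Assumption: on Linux, the easygui/tkinter dialogs need $DISPLAY;
    # without it they raise the TclError shown in the traceback.
    if not env.get("DISPLAY"):
        command.append("--headless")
    return command


print(build_gui_command())
```

The returned list can be passed to subprocess.run from the kohya_ss directory.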

@lopezjuanma96
Author

The --headless fix worked fine, I could both "print training command" and "start training", but I got the same 'tpu_env' error as when I ran the command directly. Would you know where that could be coming from?

@lopezjuanma96
Author

UPDATE:

I was able to run kohya_gui.py without the headless parameter by properly referring to the venv python. For some reason the server I was using picked up the conda environment by default, even with an independent python installation, probably something to do with the ~/.bashrc initialization. In any case, I was able to run it using venv/bin/python kohya_gui.py (venv referring, obviously, to the venv created during the setup execution).
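
To check which interpreter a shell is really picking up, something like the sketch below may help. `describe_interpreter` is a hypothetical helper written for this thread. It relies on sys.prefix differing from sys.base_prefix inside a venv, and on conda setting CONDA_PREFIX when an environment is activated.

```python
import os
import sys


def describe_interpreter(prefix=sys.prefix, base_prefix=sys.base_prefix, environ=None) -> str:
    """Classify the running interpreter as "venv", "conda", or "other"."""
    env = os.environ if environ is None else environ
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the installation the venv was created from.
    if prefix != base_prefix:
        return "venv"
    # conda exports CONDA_PREFIX on activation; compare it with our prefix.
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix and os.path.realpath(conda_prefix) == os.path.realpath(prefix):
        return "conda"
    return "other"


print(sys.executable, describe_interpreter())
```

Running it once with plain python and once with venv/bin/python should show whether the two commands resolve to different environments.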

As for the 'tpu_env' error, I followed some instructions on #564 and modified the default accelerate configuration YAML file at ~/.cache/huggingface/accelerate/default_config.yaml. This could probably have been avoided by configuring accelerate manually with venv/bin/accelerate config, but that was raising errors on my end during the configuration process. It might work for someone else.
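
The thread does not say which change #564 recommends. One plausible edit, assumed here, is dropping the key named in the TypeError so that the installed accelerate version no longer sees it. The sketch below does that with plain text filtering. `drop_top_level_keys` is a hypothetical helper, and it only understands simple top-level keys, so back up the file before trying anything like it.

```python
def drop_top_level_keys(yaml_text: str, keys: set[str]) -> str:
    """Remove the given top-level keys (and any indented block under them)."""
    kept = []
    skipping = False
    for line in yaml_text.splitlines():
        # A top-level line is non-empty and starts without indentation.
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level:
            key = line.split(":", 1)[0].strip()
            skipping = key in keys
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up sample content, not a real accelerate config.
sample = "compute_environment: LOCAL_MACHINE\ntpu_env: []\nuse_cpu: false\n"
print(drop_top_level_keys(sample, {"tpu_env"}))
```

To apply it, read ~/.cache/huggingface/accelerate/default_config.yaml, pass its text through the function, and write the result back.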

@ballerburg9005

ballerburg9005 commented Jun 9, 2024

I am on Arch Linux, so my Python version is 3.11.8, and using conda can fix this.

So I did this to make it run:

conda create -n kohya python=3.10.9
conda activate kohya
conda install pylint-venv
pip install -r requirements.txt

pip install xformers
pip install bitsandbytes

python kohya_gui.py --headless

None of the setup scripts and such work, but they are not required.

@Nixellion

Nixellion commented Jun 29, 2024

Yeah, IMO, all AI/ML projects should start using conda by default, because of all the version incompatibilities between python, torch, cuda, etc. Without using conda it's practically impossible to install 2 AI projects that use different versions of cuda, for example. Even with venv it's complicated. I don't think you can have both 11.x and 12.x CUDA versions installed on one system. And even on servers, spamming containers for every task is often tedious.

Conda solves all of these problems in a few lines. I don't understand why people spend so much time writing complicated bash install scripts when a few conda commands can replace 90% of that code. Literally @ballerburg9005 just showed all the commands one has to run to install it. And it's completely system agnostic, it will work on Windows and Linux alike. Not sure about macOS, probably as well. So you just need to install miniconda, then run these commands.

And miniconda is there as well, you don't need to install the full fat anaconda.

I didn't like it myself at first, did not understand the benefits, but now I definitely do.

@ballerburg9005 thanks for the commands. I would however use python -m pip install ... just to be safe and make sure the correct python and pip are used.
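
The same idea applies when installing from a script: building the command from sys.executable ties pip to the interpreter that is actually running. This is a minimal sketch, and `pip_install_command` and `pip_install` are hypothetical helper names.

```python
import subprocess
import sys


def pip_install_command(*packages: str) -> list[str]:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Run the install; raises CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_install_command(*packages))


# Only prints the command; call pip_install(...) to actually install.
print(pip_install_command("xformers", "bitsandbytes"))
```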

You also need to install cuda though, and requirements_linux.txt has torch install lines that don't work, so the full set for me is:

# Install miniconda as per https://docs.anaconda.com/miniconda/
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
conda create -n kohya python=3.10.9
conda activate kohya
conda install pylint-venv
conda install -y -c "nvidia/label/cuda-11.8.0" cuda
python -m pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 xformers==0.0.23.post1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
python -m pip install -r requirements.txt
python -m pip install xformers bitsandbytes

python kohya_gui.py --headless --listen 0.0.0.0

It launched. Did not yet try training anything with it, will do soon.
Added --listen flag to make it available on LAN from other PCs.

EDIT: Yeah it works, however it shows a bitsandbytes missing libraries error. Could probably fix that. But I ended up installing normally in a separate LXC container. I was getting CUDA OOM errors and only realized it was just a mistake in config (batch size 2 instead of 1) after I had already swapped it for a non-conda install. Not sure if the bitsandbytes warnings were of any real significance.
