Pretrained models config files #45
You can also fetch this configuration with the following code:

```python
import transformers

model_config = transformers.PretrainedConfig.from_pretrained("zxhezexin/openlrm-obj-base-1.1")
print(model_config)
```
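If you need the raw files rather than the parsed config, they can also be pulled from the Hub directly. A minimal sketch using huggingface_hub; the file names are assumptions (config.json is implied by the snippet above, model.safetensors by the saver section further down) and should be adjusted to whatever the repo actually contains:

```python
from huggingface_hub import hf_hub_download

repo_id = "zxhezexin/openlrm-obj-base-1.1"

# Assumed file names; check the repo's file list if these differ.
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors")

print(config_path)
print(weights_path)
```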
Personally, for pretraining I've changed the code to load the pretrained model directly:

```python
from openlrm.utils.hf_hub import wrap_model_hub

class LRMTrainer(Trainer):
    ...
    def _build_model(self, cfg):
        assert (
            cfg.experiment.type == "lrm"
        ), f"Config type {cfg.experiment.type} does not match with runner {self.__class__.__name__}"
        from openlrm.models import ModelLRM
        # Wrap the model class so weights can be loaded straight from the Hugging Face Hub.
        model_class = wrap_model_hub(ModelLRM)
        model = model_class.from_pretrained(cfg.experiment.pretrained)
        return model
```

You can replace the original `_build_model` with this.
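As a quick sanity check before launching a full run, the same wrapper can be used standalone to confirm that the checkpoint actually downloads and loads. A minimal sketch, assuming ModelLRM is a regular torch.nn.Module and using the model id mentioned above (swap in whichever release you are fine-tuning from):

```python
from openlrm.models import ModelLRM
from openlrm.utils.hf_hub import wrap_model_hub

# Wrap the model class so that from_pretrained() can pull weights from the Hub.
model_class = wrap_model_hub(ModelLRM)
model = model_class.from_pretrained("zxhezexin/openlrm-obj-base-1.1")

# Rough size check to confirm the weights were actually loaded.
n_params = sum(p.numel() for p in model.parameters())
print(f"Loaded {n_params / 1e6:.1f}M parameters")
```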
@da2r-20 This is amazing! Thanks!
Hi @ZexinHe, thank you for your advice.

**Training data**

There are 1000 custom glb files, all processed through train-sample.yaml:

```yaml
experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 128
    transformer_dim: 1024
    transformer_layers: 16
    transformer_heads: 16
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg
    encoder_feat_dim: 768
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/home/ubuntu/training-tokyo/OpenLRM/views"
            meta_path:
                train: "/home/ubuntu/training-tokyo/OpenLRM/train_uids.json"
                val: "/home/ubuntu/training-tokyo/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 1008 # higher resolution
    render_image:
        low: 512 # higher resolution
        high: 1008 # higher resolution
        region: 64
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16 # REPLACE THIS BASED ON GPU TYPE
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 3 # reduced it because of the CUDA OOM error
    accum_steps: 1
    epochs: 2000 # modified it for overfitting
    debug_global_steps: null

val:
    batch_size: 2 # modified
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/home/ubuntu/training-tokyo/OpenLRM/model.safetensors" # this refers to "zxhezexin/openlrm-mix-large-1.1"
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4
compile:
    suppress_errors: true
    print_specializations: true
    disable: true
```

**Training result**

```
[TRAIN STEP]loss=0.21, loss_pixel=0.0265, loss_perceptual=0.184, loss_tv=0.424, lr=3.04e-13: 100%|█| 60000/60000 [15:40:28<00:00, 1.06s/it]
```

As you can see above, the loss value is too high, and the inference result based on this checkpoint model is not good.

**Previous trained inference result**

I really need to increase the texture resolution.
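As a side note on reading that log line: assuming the reported total is simply the weighted sum of the logged components using the weights from the config above, the numbers are self-consistent and the total is dominated by the perceptual term:

```python
# Recombine the logged components with the configured loss weights
# (assumption: the trainer logs the weighted sum as "loss").
pixel_weight, perceptual_weight, tv_weight = 1.0, 1.0, 5e-4
loss_pixel, loss_perceptual, loss_tv = 0.0265, 0.184, 0.424

total = (pixel_weight * loss_pixel
         + perceptual_weight * loss_perceptual
         + tv_weight * loss_tv)
print(round(total, 4))  # 0.2107, matching the logged loss=0.21
```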
Hi boss, can I ask a question related to fine-tuning?
Hi @joshkiller, I'm very new to AI, not an expert. However, I'd be happy to help with anything I can!
I was wondering whether someone can fine-tune a model without changing its general behavior. For example, I find that a model like Stable Diffusion sometimes generates images that can't be used for 3D reconstruction. How, and with what kind of data, can we remedy that problem so that the model generates only complete, single objects? I'm doing my master's internship on a text-to-3D pipeline.
Usually what you are describing can be achieved with LoRA and its derivatives. Text-to-3D is a different yet related task, and there are other models available for it. If you want to create data using OpenLRM you could, say, use image-text pairs and get the 3D representation using OpenLRM.
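For reference, a minimal LoRA sketch using the peft library on a toy torch module; the module names, rank, and other hyperparameters are placeholders and would need to be mapped onto the attention layers of whichever model you actually fine-tune:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy stand-in for whatever base model you want to adapt (e.g. a diffusion UNet).
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn_q = nn.Linear(256, 256)
        self.attn_v = nn.Linear(256, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, x):
        return self.out(self.attn_q(x) + self.attn_v(x))

# Only the named target modules receive low-rank adapters; everything else stays
# frozen, which is what preserves the base model's general behavior.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["attn_q", "attn_v"])
model = get_peft_model(ToyModel(), config)
model.print_trainable_parameters()
```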
Thanks a lot for your answers. I will try to delve into LoRA more deeply than I have so far.
@hayoung-jeremy I'm also trying to finetune the same model. @ZexinHe I've also noticed that the original paper uses perceptual_weight=2.0; training with this weight didn't improve my results, though.
Hi, could you provide the configuration files that you used to train the models that are made available on Hugging Face? I noticed that the one available in the repo refers to the small model, but I would like to try finetuning the base and large models.