
AnimateAnyone_unofficial

Unofficial implementation of Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

  • Pre-trained model: stable diffusion 1.5

  • Resolution: 512

  • Batch size: 2

  • GPU: single A6000 48G

  • Training time: 12 hours, global iterations: 37,800

  • Training time: 2 days, global iterations: 127,400

  • Training time: 2.5 days, global iterations: 180,000

  • Training is still in progress...

So far, after 180,000 training iterations, this unofficial implementation still does not appear to learn the human skeleton correctly. Sometimes it even fails to generate a human figure at all and produces only the background, which tends to follow the style of the reference image.

😄😄🚀🚀 Since no official source code has been released, this unofficial implementation has not been thoroughly validated and many details remain to be verified. We welcome collaboration from the community to implement and refine this algorithm together!

Description


This repo mainly re-implements AnimateAnyone based on the official ControlNet repository.

Getting Started

Prerequisites

  • Linux or macOS
  • NVIDIA GPU + CUDA CuDNN
  • Python 3

Installation

  • Clone the repository:
git clone https://github.com/MingtaoGuo/AnimateAnyone_unofficial.git
cd AnimateAnyone_unofficial
  • Dependencies:
    We recommend running this repository with Anaconda. All dependencies needed to define the environment are provided in environment.yaml, as shown below.
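A typical setup looks like the following; the environment name here is an assumption, so use whatever name environment.yaml actually defines:

conda env create -f environment.yaml
conda activate animateanyone   # hypothetical name; check environment.yaml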

First stage training

  • Download the pre-trained Stable Diffusion checkpoint v1-5-pruned.ckpt and place it under ./models/.

  • Extract the CLIP vision embedder weights:

python tool_get_visionclip.py
  • Copy weights from the pretrained Stable Diffusion model to ReferenceNet (a minimal sketch of this step follows the list below):
python tool_add_reference.py ./models/v1-5-pruned.ckpt ./models/reference_sd15_ini.ckpt
  • Preprocess the video dataset (video decoding and human skeleton extraction); see the decoding sketch after the directory layout below:
python tool_get_pose.py --mp4_path Dataset/fashion_mp4/ \
                        --save_frame_path Dataset/fashion_png/ \
                        --save_pose_path Dataset/fashion_pose/
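tool_add_reference.py initializes ReferenceNet from the pretrained Stable Diffusion weights, in the same spirit as ControlNet's tool_add_control.py. A minimal sketch of the idea (the actual script builds the full model from its config and maps keys onto the ReferenceNet prefix, which is omitted here):

import torch

sd_path = "./models/v1-5-pruned.ckpt"
out_path = "./models/reference_sd15_ini.ckpt"

# Stable Diffusion checkpoints keep the weights under the "state_dict" key.
pretrained = torch.load(sd_path, map_location="cpu")["state_dict"]

# Seed ReferenceNet with the pretrained tensors; because ReferenceNet mirrors
# the SD 1.5 UNet, its parameters can be copied one-to-one wherever the key
# names match. Parameters with no pretrained counterpart keep their random init.
reference_state = {name: weight.clone() for name, weight in pretrained.items()}

torch.save({"state_dict": reference_state}, out_path)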

Dataset Organization Structure

Dataset
  ├── fashion_mp4
      ├── 1.mp4
      ├── 2.mp4
       ...
  ├── fashion_png
      ├── 1.mp4
          ├── 1.png
          ├── 2.png
           ...
      ├── 2.mp4
          ├── 1.png
          ├── 2.png
             ...
         ...
  ├── fashion_pose
      ├── 1.mp4
          ├── 1.png
          ├── 2.png
           ...
      ├── 2.mp4
          ├── 1.png
          ├── 2.png
             ...
         ...
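For context, here is a minimal sketch of the video-decoding half of tool_get_pose.py that produces the fashion_png layout above; the matching skeleton images under fashion_pose additionally require a pose detector (e.g. OpenPose), which is omitted here:

import os
import cv2

def decode_videos(mp4_dir="Dataset/fashion_mp4/", png_dir="Dataset/fashion_png/"):
    for video in os.listdir(mp4_dir):
        # One sub-directory per clip, named after the mp4 file (e.g. fashion_png/1.mp4/).
        out_dir = os.path.join(png_dir, video)
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(os.path.join(mp4_dir, video))
        idx = 1
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Frames are numbered from 1 to mirror the layout shown above.
            cv2.imwrite(os.path.join(out_dir, f"{idx}.png"), frame)
            idx += 1
        cap.release()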
  • Training 🚀
python tutorial_train_animate.py
  • Custom Dataset
import os
import cv2
import numpy as np
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, path="Dataset/"):
        self.path = path
        self.videos = os.listdir(os.path.join(path, "fashion_png"))

    def __len__(self):
        # Each epoch samples 10 random frame pairs per video.
        return len(self.videos) * 10

    def __getitem__(self, idx):
        # Pick a random video, then two distinct frames: one reference, one target.
        video_name = np.random.choice(self.videos)
        frame_dir = os.path.join(self.path, "fashion_png", video_name)
        frames = np.random.choice(os.listdir(frame_dir), size=2, replace=False)
        ref_frame, tgt_frame = frames[0], frames[1]

        # Reference frame, normalized to [-1, 1].
        ref_bgr = cv2.imread(os.path.join(frame_dir, ref_frame))
        ref_rgb = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2RGB)
        ref_rgb = (ref_rgb.astype(np.float32) / 127.5) - 1.0

        # Target frame, normalized to [-1, 1].
        tgt_bgr = cv2.imread(os.path.join(frame_dir, tgt_frame))
        tgt_rgb = cv2.cvtColor(tgt_bgr, cv2.COLOR_BGR2RGB)
        tgt_rgb = (tgt_rgb.astype(np.float32) / 127.5) - 1.0

        # Skeleton image of the target frame, normalized to [0, 1].
        skt_bgr = cv2.imread(os.path.join(self.path, "fashion_pose", video_name, tgt_frame))
        skt_rgb = cv2.cvtColor(skt_bgr, cv2.COLOR_BGR2RGB)
        skt_rgb = skt_rgb.astype(np.float32) / 255.0

        return dict(target=tgt_rgb, vision=ref_rgb, reference=ref_rgb, skeleton=skt_rgb)
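A quick sketch of how this dataset could be consumed, assuming the batch size of 2 listed above (tutorial_train_animate.py may wire things up differently):

from torch.utils.data import DataLoader

dataset = MyDataset(path="Dataset/")
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=4)

batch = next(iter(loader))
# Each entry is a float tensor of shape (2, H, W, 3):
# target / reference / vision are in [-1, 1], skeleton is in [0, 1].
print(batch["target"].shape, batch["skeleton"].shape)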

Author

Mingtao Guo E-mail: gmt798714378 at hotmail dot com

Acknowledgement

We are very grateful to the authors of the official ControlNet repository.

Reference

[1]. Hu, Li, et al. "Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation." arXiv preprint arXiv:2311.17117 (2023).
