<a href="https://colab.research.google.com/github/MingxiaGuo/Artifical_Intelligence/blob/main/ER_NeRF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ER-NeRF

Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

ICCV23 | [Github](https://github.com/Fictionarry/ER-NeRF) | [Paper](https://openaccess.thecvf.com/content/ICCV2023/html/Li_Efficient_Region-Aware_Neural_Radiance_Fields_for_High-Fidelity_Talking_Portrait_Synthesis_ICCV_2023_paper.html) | [Project](https://github.com/Fictionarry/ER-NeRF) | [ArXiv](https://arxiv.org/abs/2307.09323) | [Video](https://arxiv.org/abs/2307.09323)

## 1. Clone repo

In [None]:
!git clone https://github.com/Fictionarry/ER-NeRF.git
%cd ER-NeRF

Cloning into 'ER-NeRF'...
remote: Enumerating objects: 344, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 344 (delta 31), reused 17 (delta 8), pack-reused 284
Receiving objects: 100% (344/344), 24.98 MiB | 31.74 MiB/s, done.
Resolving deltas: 100% (152/152), done.


## 2. Download models

In [None]:
# Prepare face-parsing model.
!wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
# Prepare the 3DMM model for head pose estimation.
!mkdir -p data_utils/face_tracking/3DMM
!wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
!wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
!wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
!wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
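
The five downloads above share one pattern: a path under `data_util/` in the AD-NeRF repository with `?raw=true` appended, saved to the same relative path under `data_utils/` in ER-NeRF. A minimal sketch of that mapping follows. The helper name `adnerf_downloads` is hypothetical, and the code only builds the (URL, destination) pairs; it downloads nothing.

```python
# Hypothetical helper: builds (url, destination) pairs for the AD-NeRF assets
# fetched by the wget commands above. It performs no network access.
ADNERF_BASE = "https://github.com/YudongGuo/AD-NeRF/blob/master/data_util"

# Relative paths copied from the wget commands in this cell.
ASSETS = [
    "face_parsing/79999_iter.pth",
    "face_tracking/3DMM/exp_info.npy",
    "face_tracking/3DMM/keys_info.npy",
    "face_tracking/3DMM/sub_mesh.obj",
    "face_tracking/3DMM/topology_info.npy",
]


def adnerf_downloads(base=ADNERF_BASE, assets=ASSETS, dest_root="data_utils"):
    """Return a list of (url, local_path) tuples mirroring the wget commands."""
    return [(f"{base}/{rel}?raw=true", f"{dest_root}/{rel}") for rel in assets]


for url, dest in adnerf_downloads():
    print(dest, "<-", url)
```

Keeping the list in one place makes it easier to see which file is missing when one of the downloads fails.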


In [None]:
# Prepare 3DMM model from Basel Face Model 2009
from google.colab import drive
drive.mount('/content/drive')
!cp /content/drive/MyDrive/Colab\ Notebooks/01_MorphableModel.mat data_utils/face_tracking/3DMM/01_MorphableModel.mat
%cd /content/ER-NeRF/data_utils/face_tracking
!python convert_BFM.py

Mounted at /content/drive
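
Because `convert_BFM.py` runs immediately after the copy from Drive, it can help to confirm first that the files this section places in the 3DMM folder are actually present. The sketch below lists any that are absent or empty (a failed `wget -O` can leave an empty file behind). The function name `missing_3dmm_files` is hypothetical, and the folder path is assumed from the commands above.

```python
from pathlib import Path

# File names taken from the download and copy commands in this section.
EXPECTED_3DMM_FILES = [
    "01_MorphableModel.mat",
    "exp_info.npy",
    "keys_info.npy",
    "sub_mesh.obj",
    "topology_info.npy",
]


def missing_3dmm_files(folder):
    """Return the expected file names that are absent or empty in `folder`."""
    folder = Path(folder)
    missing = []
    for name in EXPECTED_3DMM_FILES:
        path = folder / name
        # Treat zero-byte files as missing, since they indicate a failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Assumed location, based on the paths used in the cells above.
print(missing_3dmm_files("/content/ER-NeRF/data_utils/face_tracking/3DMM"))
```

An empty list means every expected file was found with non-zero size.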


## 3. Install dependencies

In [None]:

# Return to the repository root so that requirements.txt can be found.
%cd /content/ER-NeRF
!wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.10.0-1-Linux-x86_64.sh
!bash ./Miniconda3-py310_23.10.0-1-Linux-x86_64.sh -b -f -p /usr/local
import sys
sys.path.append('/usr/local/lib/python3.10/site-packages/')
!conda create --name ERNeRF python=3.10 -y
# Note: each `!` line runs in its own subshell, so this activation does not carry over to the lines below.
!source activate ERNeRF
!conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch -y
!apt-get install -y portaudio19-dev
!apt-get install -y python3-all-dev
!pip install -r requirements.txt
!pip install "git+https://github.com/facebookresearch/pytorch3d.git"
!pip install tensorflow-gpu==2.8.0

## 4. Optional: Prepare dataset

In [None]:
# Demo Datasets
!mkdir -p data/obama
!wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4

## 5. Pre-processing video/audio

In [None]:
%cd /content/ER-NeRF
# Pre-process video & audio
# 5.1 Put the training video under data/<ID>/<ID>.mp4.
#     The video must be 25 FPS, and every frame must contain the talking person.
#     The resolution should be about 512x512 and the duration about 1-5 min.
# 5.2 Run the script to process the video (may take several hours).
#     The demo ID `obama` is used here; replace it with your own <ID>.
#     A literal <ID> would be parsed by the shell as a redirection.
!python data_utils/process.py data/obama/obama.mp4

# 5.3 Obtain AU45 for eye blinking.
#     Run FeatureExtraction in OpenFace, then rename the output CSV file and move it to data/<ID>/au.csv.

#  Pre-process audio
# !python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/<name>.npy
# Borrowed from GeneFace. English pre-trained.
# !python data_utils/hubert.py --wav data/<name>.wav # save to data/<name>_hu.npy
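
The video requirements listed above (25 FPS, roughly 512x512, about 1-5 minutes) can be checked before starting the multi-hour processing step. The following is a minimal sketch, assuming `ffprobe` is available on the system. The function names `probe_video` and `check_video` are hypothetical, and the 64-pixel tolerance around 512 is an arbitrary choice, not something ER-NeRF specifies.

```python
import json
import subprocess
from fractions import Fraction


def probe_video(path):
    """Read frame rate, size and duration of the first video stream via ffprobe."""
    cmd = [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=r_frame_rate,width,height:format=duration",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    info = json.loads(out)
    stream = info["streams"][0]
    return {
        # r_frame_rate is a ratio string such as "25/1".
        "fps": float(Fraction(stream["r_frame_rate"])),
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        "duration": float(info["format"]["duration"]),
    }


def check_video(meta, size_tolerance=64):
    """Return warnings for metadata outside the guidelines stated above."""
    warnings = []
    if abs(meta["fps"] - 25.0) > 0.01:
        warnings.append(f"frame rate is {meta['fps']:.2f} FPS, expected 25")
    # "About 512x512": the tolerance is an assumption made for this sketch.
    if (abs(meta["width"] - 512) > size_tolerance
            or abs(meta["height"] - 512) > size_tolerance):
        warnings.append(
            f"resolution is {meta['width']}x{meta['height']}, expected about 512x512"
        )
    # "About 1-5 min" interpreted as 60 to 300 seconds.
    if not 60 <= meta["duration"] <= 300:
        warnings.append(f"duration is {meta['duration']:.0f} s, expected about 60-300 s")
    return warnings


# Example with hand-written metadata; no video file is needed for this call.
print(check_video({"fps": 29.97, "width": 512, "height": 512, "duration": 120.0}))
```

For a real file, `check_video(probe_video("data/obama/obama.mp4"))` would combine the two steps. The check cannot verify that every frame contains the talking person.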

## 7. Training

In [None]:
# 7.1 train (head and lpips finetune, run in sequence)
!python main.py data/obama/ --workspace trial_obama/ -O --iters 100000
!python main.py data/obama/ --workspace trial_obama/ -O --iters 125000 --finetune_lips --patch_size 32

# 7.2 train (torso)
# Replace <head>.pth with the path to the latest checkpoint in trial_obama before running;
# left as-is, the shell treats < and > as redirections.
!python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000
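
Finding the latest head checkpoint for the torso step can be automated. The sketch below assumes that "latest" means the most recently modified `.pth` file, and it searches the workspace recursively because the checkpoint subfolder layout is not stated here. The helper name `latest_checkpoint` is hypothetical.

```python
from pathlib import Path


def latest_checkpoint(workspace):
    """Return the most recently modified .pth file under `workspace`, or None."""
    candidates = list(Path(workspace).rglob("*.pth"))
    if not candidates:
        return None
    # Assumption: the newest file by modification time is the latest checkpoint.
    return max(candidates, key=lambda p: p.stat().st_mtime)


head_ckpt = latest_checkpoint("trial_obama")
print(head_ckpt)
```

In a notebook cell, the result can then be passed to the command through IPython's variable substitution, for example `--head_ckpt {head_ckpt}` on a `!` line.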

## 8. Testing

In [None]:
# test on the test split
!python main.py data/obama/ --workspace trial_obama/ -O --test # only render the head and use GT image for torso
!python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test # render both head and torso

## 9. Inference with target audio

In [None]:
# Adding "--smooth_path" may reduce head jitter, at the cost of following the original pose less accurately.
# Replace <audio>.npy with the path to the extracted audio features before running.
!python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --aud <audio>.npy