


# <center><font size = 8><span style="color:#A33327;font-family:'Times New Roman'"> 💬Text to 3D Images (Point-E) 🏍</span></font></center>

# <center><font size = 3><span style="color:#422711"> <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:200%;text-align:center;border-radius:100px 10px;">INTRODUCTION</p>   </span></font></center>
 
<font size = 5><span style="color:#A33327;font-family:'Times New Roman'">Notebook Overview : </span></font>

This notebook demonstrates OpenAI's Point-E model, which generates 3D point clouds from text prompts. Point-E chains a text-to-image model with an image-to-3D model, trained on large datasets of (text, image) pairs and (image, 3D) pairs, respectively.

**Key Highlights:**
- Introduction to the Point-E model.
- Inference examples showcasing text-to-3D object generation.
- Libraries used: Point-E, PyTorch, Plotly, and tqdm.


## **How Point-E Works:**
The Point-E model generates 3D objects from text prompts by following these steps:
1. **Synthetic View Generation:** Creates an image based on the text caption.
2. **Coarse Point Cloud Production:** Generates a low-resolution (1,024 points) point cloud based on the synthetic view.
3. **Fine Point Cloud Production:** Refines the point cloud to a higher resolution (4,096 points) using the initial point cloud and the synthetic view.

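The model assumes that the synthetic view captures the relevant information from the text caption. Note that this notebook loads the `base40M-textvec` checkpoint, which conditions the point cloud model directly on the text prompt, so no synthetic view is rendered in the cells below.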
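
The three steps above can be sketched as a plain data flow. The snippet below is a toy illustration only: `render_synthetic_view`, `sample_coarse_cloud`, and `upsample_cloud` are hypothetical stand-ins that return random numbers, not Point-E functions, and the 64x64 view size and the six values per point (x, y, z, R, G, B) are assumptions made for the example. What it does show is the point budget: the coarse stage yields 1,024 points, and the upsampling stage adds the remaining 3,072 to reach 4,096.

```python
import random

COARSE_POINTS = 1024   # size of the low-resolution cloud
FINE_POINTS = 4096     # size of the final cloud
VALUES_PER_POINT = 6   # assumed layout: x, y, z, R, G, B


def render_synthetic_view(caption, rng, size=64):
    """Hypothetical text-to-image step: returns a random size x size RGB image."""
    return [[[rng.random() for _ in range(3)] for _ in range(size)] for _ in range(size)]


def random_points(count, rng):
    """Draw `count` random points; a real model would denoise them instead."""
    return [[rng.gauss(0.0, 1.0) for _ in range(VALUES_PER_POINT)] for _ in range(count)]


def sample_coarse_cloud(view, rng):
    """Hypothetical base model: a real one would condition on `view`."""
    return random_points(COARSE_POINTS, rng)


def upsample_cloud(coarse, view, rng):
    """Hypothetical upsampler: generates only the missing points and appends them."""
    extra = random_points(FINE_POINTS - len(coarse), rng)
    return coarse + extra


rng = random.Random(0)
view = render_synthetic_view("a red motorcycle", rng)
coarse = sample_coarse_cloud(view, rng)
fine = upsample_cloud(coarse, view, rng)
print(len(coarse), len(fine))  # 1024 4096
```

This split is why the sampler later in the notebook requests `1024` and then `4096 - 1024` points rather than `1024` and `4096`.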
    
<img src = https://gameworldobserver.com/wp-content/uploads/2022/12/point-e-process.jpg>   



<font size = 3><span style = "color:#3A3E59;font-family:'Times New Roman'">OpenAI trained these models on a dataset of several million 3D models.</span></font>
<img src = https://raw.githubusercontent.com/openai/point-e/main/point_e/examples/paper_banner.gif>

<a id='top'></a>
# <center><font size = 3><span style="color:#422711"> <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:200%;text-align:center;border-radius:100px 10px;">TABLE OF CONTENTS</p>   </span></font></center>

1. [Imports](#0)
2. [Hyperparameters](#1)
3. [Helper Functions](#2)
4. [Model Setup](#3)
5. [Sampler Configuration](#4)
6. [Examples](#5)
   - [A Red Motorcycle](#red-motorcycle)
   - [A Red Santa Hat](#red-santa-hat)
   - [Purple Headphones](#purple-headphones)
   - [A Realistic 3D Rendering of a Corgi](#corgi)
   - [A Blue Mug](#blue-mug)
   - [A Robot](#robot)
   - [An Elaborate Fountain](#fountain)
   - [An Orange and White Traffic Cone](#traffic-cone)
   - [A 3D Printable Gear](#gear)
7. [References](#7)

In [1]:
!git clone https://github.com/openai/point-e
%cd point-e
!pip install -e .
from IPython.display import clear_output
clear_output()

<a id="0"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">1. IMPORTS 📂</p>

In [2]:
import torch

from tqdm.auto import tqdm

import plotly.graph_objects as go

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.download import load_checkpoint
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.util.plotting import plot_point_cloud

<a id="1"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">2. HYPERPARAMETERS 🔨</p>

In [3]:
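class config:
    # Use the GPU when one is available; sampling on CPU is much slower.
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Text-conditioned base model checkpoint.
    BASE_NAME = "base40M-textvec"
    # One guidance scale per stage: base model first, then upsampler.
    GUIDANCE_SCALE = [3.0, 0.0]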
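
`GUIDANCE_SCALE` holds one value per stage: 3.0 for the base model and 0.0 for the upsampler. As a rough illustration of what such a scale does, the sketch below uses the common classifier-free guidance formula, which extrapolates from an unconditional prediction toward a prompt-conditioned one. That Point-E's sampler combines predictions in exactly this way is an assumption here, and `guide` is a hypothetical helper, not a library function. My reading is also that the library treats a scale of 0.0 as "guidance off" for that stage instead of plugging 0.0 into the formula; that is worth checking against the Point-E source.

```python
def guide(cond, uncond, scale):
    """Classifier-free guidance on two predictions given as flat lists of floats.

    scale = 1.0 returns the conditional prediction unchanged; larger values
    push the result further away from the unconditional prediction.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # toy prediction with the text prompt
uncond = [0.5, 1.0]  # toy prediction without the prompt

print(guide(cond, uncond, 1.0))  # [1.0, 2.0]
print(guide(cond, uncond, 3.0))  # [2.0, 4.0]
```

Higher scales generally trade sample diversity for closer adherence to the prompt.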

<a id="2"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">3. HELPER FUNCTIONS</p>

In [4]:
def plot_3d(pc):
    """Return an interactive Plotly 3D scatter of a point cloud, colored by its R, G, B channels."""
    fig = go.Figure(
        data=[
            go.Scatter3d(
                x=pc.coords[:, 0], y=pc.coords[:, 1], z=pc.coords[:, 2],
                mode="markers",
                marker=dict(
                    size=2,
                    color=["rgb({},{},{})".format(r, g, b) for r, g, b in zip(pc.channels["R"], 
                                                                              pc.channels["G"], 
                                                                              pc.channels["B"])]
                )
            )
        ],
        layout=dict(
            scene=dict(
                xaxis=dict(visible=False),
                yaxis=dict(visible=False),
                zaxis=dict(visible=False)
            )
        )
    )
    return fig

<a id="3"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">4. MODEL SETUP 🔧</p>

In [5]:
print("Creating base model...\n")
base_model = model_from_config(MODEL_CONFIGS[config.BASE_NAME], config.DEVICE)
base_model.eval()
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[config.BASE_NAME])

print("Creating upsample model...\n")
upsampler_model = model_from_config(MODEL_CONFIGS["upsample"], config.DEVICE)
upsampler_model.eval()
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["upsample"])

print("Downloading base checkpoint...\n")
base_model.load_state_dict(load_checkpoint(config.BASE_NAME, config.DEVICE))

print("Downloading upsampler checkpoint...\n")
upsampler_model.load_state_dict(load_checkpoint("upsample", config.DEVICE))

Creating base model...

Creating upsample model...

Downloading base checkpoint...

Downloading upsampler checkpoint...



<All keys matched successfully>

<a id="4"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">5. SAMPLER CONFIGURATION 🎯</p>

In [6]:
sampler = PointCloudSampler(
    device=config.DEVICE,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],  # 1,024 coarse points, then 3,072 more from the upsampler
    aux_channels=["R", "G", "B"],
    guidance_scale=config.GUIDANCE_SCALE,
    model_kwargs_key_filter=("texts", "")
)
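
The `model_kwargs_key_filter=("texts", "")` argument deserves a note. A plausible reading, and it is only my reading, not something this notebook confirms, is that each entry is a comma-separated allow-list of keyword arguments for the corresponding stage, with `"*"` meaning "pass everything". Under that assumption the base model receives the text prompt and the upsampler receives no extra keyword arguments. The hypothetical `filter_kwargs` helper below sketches that behavior in plain Python.

```python
def filter_kwargs(model_kwargs, key_filter):
    """Keep only the keyword arguments that a stage is allowed to see.

    Assumed semantics: "*" passes everything, otherwise `key_filter` is a
    comma-separated list of permitted keys (an empty string permits none).
    """
    if key_filter == "*":
        return dict(model_kwargs)
    allowed = set(key_filter.split(","))
    return {k: v for k, v in model_kwargs.items() if k in allowed}


model_kwargs = {"texts": ["a red motorcycle"]}
base_kwargs = filter_kwargs(model_kwargs, "texts")
upsampler_kwargs = filter_kwargs(model_kwargs, "")

print(base_kwargs)       # {'texts': ['a red motorcycle']}
print(upsampler_kwargs)  # {}
```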

<a id="5"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">6. EXAMPLES</p>

<a id="6.1"></a>
<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">A Red Motorcycle <a name="red-motorcycle"></a></p>

In [7]:
prompt = 'a RED motorcycle'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]
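
The sampling loop in the cell above is repeated verbatim for every prompt in this section. One way to avoid the repetition is a small wrapper such as the one sketched below. `text_to_point_cloud` is a hypothetical helper name, not part of Point-E; it only calls the two sampler methods already used in this notebook, `sample_batch_progressive` and `output_to_point_clouds`, with the same arguments.

```python
def text_to_point_cloud(sampler, prompt):
    """Run the sampler to completion and return the point cloud for `prompt`."""
    samples = None
    for step in sampler.sample_batch_progressive(
        batch_size=1, model_kwargs=dict(texts=[prompt])
    ):
        samples = step  # keep only the latest intermediate result
    if samples is None:
        raise RuntimeError("the sampler did not yield any samples")
    return sampler.output_to_point_clouds(samples)[0]


# Usage inside this notebook, with the `sampler` and `plot_3d` defined earlier:
# plot_3d(text_to_point_cloud(sampler, "a RED motorcycle"))
```

The helper leaves out the `tqdm` progress bar to stay dependency-free; wrapping the generator in `tqdm(...)` as in the original cells restores it.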

<a id="6.2"></a>
<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">A Red Santa Hat <a name="red-santa-hat"></a></p>

In [8]:
prompt = 'a RED santa hat'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<a id="6.3"></a>
<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">Purple Headphones <a name="purple-headphones"></a></p>

In [9]:
prompt = 'PURPLE headphones'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<a id="6.5"></a>
<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">A Realistic 3D Rendering of a Corgi <a name="corgi"></a></p>

In [10]:
prompt = 'a very realistic 3D rendering of a corgi'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">A Blue Mug <a name="blue-mug"></a></p>

In [16]:
prompt = 'a BLUE mug'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">A Robot <a name="robot"></a></p>

In [17]:
prompt = 'a robot'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">An Elaborate Fountain <a name="fountain"></a></p>

In [13]:
prompt = 'an elaborate fountain'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">An Orange and White Traffic Cone <a name="traffic-cone"></a></p>

In [14]:
prompt = 'ORANGE and WHITE traffic cone'
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<p style="background-color:#C6C6C6;font-family:newtimeroman;color:#A33327;font-size:140%;text-align:center;border-radius:200px 10px;">A 3D Printable Gear <a name="gear"></a></p>

In [15]:
prompt = "a 3D printable gear, a single gear 3 inches in diameter and half inch thick"
samples = None
for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
plot_3d(pc)

0it [00:00, ?it/s]

<a id="7"></a>
# <p style="background-color:#A33327;font-family:newtimeroman;color:#fcf6f6;font-size:140%;text-align:center;border-radius:200px 10px;">7. REFERENCES 📃</p>


- [OpenAI's Point-E Paper](https://arxiv.org/abs/2212.08751)
- [Point-E GitHub Repository](https://github.com/openai/point-e)
- [YouTube Video on Point-E](https://youtu.be/RJTW0krpU5A)
- [Text to Point Cloud Example](https://github.com/openai/point-e/blob/main/point_e/examples/text2pointcloud.ipynb)

<b><center><font size = 3><span style="color:#917164"> **Thank you for reading! 😊**  
<br>If you have any suggestions or feedback, please let me know.
</span></font></center></b>