<a href="https://colab.research.google.com/github/cindysteward/Cindy-Steward-Portfolio/blob/main/ai_faces_and_datasets_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Face Generation and Data Set Creation through Stable Diffusion**

---




Here I use Stable Diffusion v1.4, a latent **text-to-image** (**TTI**) diffusion model, to generate random images of faces and store them in a NumPy file, creating a dataset intended for machine learning.

The code is split into several sections to make debugging and demonstration easier.

## **Initial Set-Up**

In [None]:
#@title Installing Huggingface and Diffusers
#Install the necessary libraries.
!pip install -qq huggingface_hub #needed to log in to Hugging Face and use Stable Diffusion.
!pip install -qq "ipywidgets>=7,<8" #lets the login widget show up in Google Colab.
!pip install -qq "diffusers[training]==0.4.0" transformers scipy ftfy accelerate #pin a single diffusers version so the installs cannot conflict.

In [None]:
from huggingface_hub import notebook_login #enables us to use the huggingface repository.
notebook_login()

In [None]:
#@title Importing Libraries

#Import the libraries needed to use Stable Diffusion.
#Not all of them are used directly below, but the notebook kept failing on my machine unless all of them were imported.

import os
import random
import cv2

import numpy
import torch

import PIL
from accelerate import Accelerator
from diffusers import AutoencoderKL, DDPMScheduler, PNDMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer

In [None]:
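#Path to the saved access token, which allows access to the Hugging Face repository.
#The pipeline below picks the token up through use_auth_token=True, so this variable is not passed to it.
YOUR_TOKEN="/root/.huggingface/token"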

In [None]:
#Import the pipeline class so we can run inference with the Stable Diffusion model.
from diffusers import StableDiffusionPipeline

In [None]:
#@title Set up pipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, use_auth_token=True)
#Since GPU RAM may be limited, the pipeline is loaded in float16 precision instead of the default float32 precision.

#Move the pipeline to an available GPU. In Google Colab, first change the runtime type
#and set the hardware accelerator to GPU. Then use CUDA to move the pipeline to that GPU.
pipe.to("cuda")
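
To make the float16 choice concrete, the following minimal sketch compares the memory taken by the same number of values in 32-bit and 16-bit floating point. It uses NumPy rather than the pipeline itself, and `n_values` is an arbitrary number picked for illustration, not the actual parameter count of Stable Diffusion.

```python
import numpy

# Hypothetical number of values, chosen only for illustration.
n_values = 1_000_000

weights_fp32 = numpy.zeros(n_values, dtype=numpy.float32)
weights_fp16 = weights_fp32.astype(numpy.float16)

# nbytes reports the size of the array data in bytes.
print(weights_fp32.nbytes)  # 4 bytes per value
print(weights_fp16.nbytes)  # 2 bytes per value

ratio = weights_fp32.nbytes / weights_fp16.nbytes
print(ratio)
```

The same halving applies to the model weights, which is why the half-precision weights are easier to fit on a small Colab GPU.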

In [None]:
from google.colab import drive
drive.mount('/content/drive') #allow google colab to access drive, so we can save files.

Mounted at /content/drive


## **Generate Faces for Dataset**

In [None]:
#@title Generate Images
num_images = 3 #the number of images you want to generate for the dataset. This can be changed.
prompt = "a face on a passport photo" #a prompt that defines what we want to generate.
#I chose passport photos because deepfakes are used to pose as someone else, for example to deceive people or trading and crypto platforms.
os.makedirs("/content/face_data", exist_ok=True) #create the output folder if it does not exist yet.
count = 0 #counts how many images have been generated so far.
#Alternatively, the images could have been made in one single grid. However, we want to save the images separately for the dataset.
while count < num_images: #the while loop runs until the requested number of images has been generated.
  image = pipe(prompt).images[0] #the pipeline output holds a list of PIL images; we take the first one.
  display(image) #let's display the image we generated!
  count+=1 #count starts at 0 and is increased before saving, so the files are numbered 1 to num_images.
  image.save(f"/content/face_data/face_passport_photo{count}.png") #save each image in the face_data folder.
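
The counting logic above can also be expressed with `range`, which avoids updating a counter by hand. The sketch below only builds the file paths so that it runs without a GPU; `build_output_paths` is a hypothetical helper written for this illustration and is not part of the notebook.

```python
import os

def build_output_paths(folder, num_images, stem="face_passport_photo"):
    """Return one .png path per image, numbered from 1 to num_images."""
    return [os.path.join(folder, f"{stem}{i}.png") for i in range(1, num_images + 1)]

paths = build_output_paths("/content/face_data", 3)
for p in paths:
    print(p)
```

In the generation cell, each of these paths would be the argument to `image.save(...)` for one generated image.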

## **Data Set Creation**

Here I create the dataset as a NumPy file: the generated images are collected in a list, converted to an array, and saved to a `.npy` file.

In [None]:
path = '/content/face_data' #define the path of where we want our dataset to be saved.

In [None]:
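face_data = [] #create an empty list.
for filename in sorted(os.listdir(path)):
    if not filename.endswith(".png"): #skip anything that is not a generated image, such as the .npy file saved below.
        continue
    pic = cv2.imread(os.path.join(path, filename)) #read the image with the cv2 module as an array of pixels (BGR channel order).
    face_data.append(pic) #store the pixel data of each image, not just its file name.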

In [None]:
#converting the list to numpy array and saving it to a file
numpy.save(os.path.join(path,'FacePassportDataSet.npy'),numpy.array(face_data))

In [None]:
#here we load the numpy file to see the contents of the dataset
numpy.load('/content/face_data/FacePassportDataSet.npy', mmap_mode=None, allow_pickle=True, fix_imports=True, encoding='ASCII')
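
As a quick check of what such a dataset file contains, here is a minimal sketch that runs without the generated images. It assumes every image has the same size, uses small random arrays as stand-ins for the faces, and writes to a temporary folder; the 64x64 size and the file name `demo_dataset.npy` are made up for the example.

```python
import os
import tempfile

import numpy

# Stand-ins for three generated images: height x width x 3 colour channels, 8-bit.
rng = numpy.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(64, 64, 3), dtype=numpy.uint8) for _ in range(3)]

# Stacking equally sized images gives one array of shape (n_images, height, width, 3).
dataset = numpy.array(fake_images)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "demo_dataset.npy")
    numpy.save(file_path, dataset)
    loaded = numpy.load(file_path)  # a plain numeric array loads without allow_pickle

print(loaded.shape)
print(loaded.dtype)
```

If the images had different sizes, they could not be stacked into one regular array like this, so resizing them to a common shape first would be the safer route.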