## AmirMohammad_Ebrahiminasab - Real-Time-Facial-Recognition-Using-Fine_Tuned_VIT-Model

In [3]:
import cv2

In [4]:
import torch

### Loading the ViT architecture so that its weights can be replaced with those of the previously fine-tuned model

In [36]:
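from transformers import ViTForImageClassification, ViTImageProcessor

# Pretrained ViT backbone with a new 7-class classification head
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224-in21k', num_labels=7)
# ViTImageProcessor replaces the deprecated ViTFeatureExtractor; the variable name is kept for the cells below
feature_extractor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')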

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Saving the model's state dict locally so the weights do not have to be downloaded each time

In [46]:
torch.save(model.state_dict(), "C:\\Users\\janja\\Downloads\\pre_VIT_model.pth")

### Replacing the weights

In [34]:
# Load the fine-tuned state dict on the CPU; weights_only=True restricts unpickling to tensors and plain containers
add = torch.load("C:\\Users\\janja\\Downloads\\VIT_model.pth", map_location=torch.device('cpu'), weights_only=True)


In [37]:
model.load_state_dict(add)

<All keys matched successfully>
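
The message `<All keys matched successfully>` confirms that the checkpoint and the model agree on every parameter name. When they do not, it can help to compare the two key sets before loading. The helper below is a minimal sketch: `diff_state_dict_keys` is a hypothetical name, and plain dictionaries stand in for the real state dicts, which are mappings from parameter names to tensors.

```python
def diff_state_dict_keys(expected, loaded):
    """Return (missing, unexpected) parameter names as sorted lists.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names the checkpoint has but the model does not expect
    """
    expected_keys = set(expected)
    loaded_keys = set(loaded)
    missing = sorted(expected_keys - loaded_keys)
    unexpected = sorted(loaded_keys - expected_keys)
    return missing, unexpected


# Toy stand-ins for model.state_dict() and the loaded checkpoint
model_keys = {"classifier.weight": None, "classifier.bias": None}
checkpoint_keys = {"classifier.weight": None, "head.bias": None}

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print(missing)     # ['classifier.bias']
print(unexpected)  # ['head.bias']
```

With the objects in this notebook, the call would be `diff_state_dict_keys(model.state_dict(), add)`.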

In [39]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
# Make inference mode explicit (disables dropout)
model.eval()

In [40]:
from PIL import Image

In [43]:
emotions = ['neutral', 'sad', 'fear', 'surprise', 'happy', 'disgust', 'angry']
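
The list index is what ties a predicted class ID to a label, so the order has to match the label encoding used during fine-tuning (assumed here, since the training code is not part of this notebook). A small sketch of making that mapping explicit follows; `id2label`, `label2id`, and `label_for` are illustrative names.

```python
emotions = ['neutral', 'sad', 'fear', 'surprise', 'happy', 'disgust', 'angry']

# Explicit two-way mapping between class IDs and label names
id2label = dict(enumerate(emotions))
label2id = {label: idx for idx, label in id2label.items()}


def label_for(class_id):
    """Return the label for a class ID, or 'unknown' if it is out of range."""
    return id2label.get(class_id, "unknown")


print(label_for(4))       # happy
print(label2id["angry"])  # 6
print(label_for(7))       # unknown
```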

### Detecting faces with a Haar cascade, classifying each face's expression with the fine-tuned model, and displaying the result with OpenCV

In [47]:
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Initialize the webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    # Stop if the webcam did not return a frame
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces in the frame
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    for (x, y, w, h) in faces:
        # Extract the face region before drawing, so the box outline is not part of the crop
        face = frame[y:y + h, x:x + w]
        face_resized = cv2.resize(face, (224, 224))

        # Draw rectangle around the face
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

        # Convert BGR (OpenCV) to RGB, preprocess, and move the tensors to the model's device
        pil_image = Image.fromarray(cv2.cvtColor(face_resized, cv2.COLOR_BGR2RGB))
        inputs = feature_extractor(images=pil_image, return_tensors="pt").to(device)

        # Forward pass through ViT model
        with torch.no_grad():
            outputs = model(**inputs)
            logits = outputs.logits

        # Get the predicted class
        predicted_class = torch.argmax(logits, dim=-1).item()

        # Display the predicted class label on the frame
        cv2.putText(frame, f"Class: {emotions[predicted_class]}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36, 255, 12), 2)

    # Show the frame with detection
    cv2.imshow('Real-Time Face Detection', frame)

    # Break the loop when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close windows
cap.release()
cv2.destroyAllWindows()
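
The loop above displays only the arg-max class. To also show how confident the model is, the logits can be converted to probabilities with a softmax. The sketch below uses plain Python lists so that it runs without a webcam or a model; `softmax` and `top_prediction` are hypothetical helper names, and the example logits are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


emotions = ['neutral', 'sad', 'fear', 'surprise', 'happy', 'disgust', 'angry']
example_logits = [0.1, -1.2, 0.3, 0.0, 2.5, -0.7, 0.4]  # made-up values

label, prob = top_prediction(example_logits, emotions)
print(f"{label}: {prob:.2f}")
```

Inside the loop, `logits[0].tolist()` would supply the scores for one face; `torch.softmax(logits, dim=-1)` computes the same probabilities directly on the tensor.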