In [1]:
pip install face_recognition

Collecting face_recognition
  Downloading face_recognition-1.3.0-py2.py3-none-any.whl.metadata (21 kB)
Collecting face-recognition-models>=0.3.0 (from face_recognition)
  Downloading face_recognition_models-0.3.0.tar.gz (100.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.1/100.1 MB 8.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Downloading face_recognition-1.3.0-py2.py3-none-any.whl (15 kB)
Building wheels for collected packages: face-recognition-models
  Building wheel for face-recognition-models (setup.py) ... done
  Created wheel for face-recognition-models: filename=face_recognition_models-0.3.0-py2.py3-none-any.whl size=100566162 sha256=480846fae1d7a5f9c31650d1522835984c7c9db43383c78b22db8f0f3aa69b35
  Stored in directory: /root/.cache/pip/wheels/7a/eb/cf/e9eced74122b679557f597bb7c8e4c739cfcac526db1fd523d
Successfully built face-recognition-models
Installing collected packages: face-recog

# Workflow Breakdown Brief

**1. Setting Up:**

*  **Libraries:** The script imports necessary libraries like `pandas` for data manipulation, `requests` for downloading videos, `cv2` for computer vision tasks, `face_recognition` for face detection and encoding, `json` for handling JSON data, and `numpy` for numerical computations.
*  **Known Encodings:** It defines the path of a JSON file (`encoding_file`) that stores the known face encodings, together with functions to save and load them (`save_encodings_json` and `load_encodings_json`). NumPy arrays are converted to lists on save and back to arrays on load, because JSON cannot serialize arrays directly.
*  **Downloading Videos:** The `download_video` function streams each video from the URL specified in the Excel file and writes it to a local file.

**2. Processing Videos:**

*  **Read Excel Data:** The script reads an Excel file (`excel_file`) into a pandas DataFrame and expects the columns "Video URL" and "Performance" (the latter is read but only used later, in the analysis).
*  **Iterate Through Videos:** It loops through each row in the DataFrame.
  * **Download and Open Video:** For each video URL, it downloads the video and opens a video capture object (`cap`) using OpenCV's `cv2.VideoCapture`.
  * **Initialize Counters:** It initializes counters for unique faces detected in the entire process (`unique_faces_count`) and new faces detected in the current video (`new_faces_count`). It also creates an empty list (`faces_in_video`) to store detected faces for the current video and a set (`current_video_faces`) to track unique faces within the video.
  * **Process Video Frames:** It enters a loop that reads video frames one by one using `cap.read()`.
    * **Check for Success:** It checks whether the frame was read successfully. If not, it prints either an end-of-video message or a read-error message and breaks out of the loop in both cases.
    * **Resize and Convert Frame:** It scales the frame to 45% of its original width and height and converts it from OpenCV's BGR channel order to RGB, which `face_recognition` expects.
    * **Face Detection:** The core face detection part uses `face_recognition.face_locations` to identify face locations within the frame using a pre-trained CNN model ("cnn").
    * **No Faces Detected:** If no faces are found in the frame, the loop continues to the next frame.
    * **Face Encoding:** For each detected face location, it extracts facial features (encodings) using `face_recognition.face_encodings`.

**3. Face Matching and Analysis:**

*  **Matching with Known Faces:** If there are known face encodings (`encodeDictKnown`), it iterates through each detected face and its encoding:
    * **Face Distance Calculation:** It calculates the "face distance" between the current face encoding and all known encodings using `face_recognition.face_distance`. This is a Euclidean distance, so it measures dissimilarity: the smaller the distance, the more similar the two faces.
    * **Matching Threshold:** It finds the minimum distance (closest match) and compares it to a threshold (0.60 in this case). If the distance is lower than the threshold, it considers it a match.
    * **Matched Name Retrieval:** If a match is found, it retrieves the name associated with the closest known encoding.
*  **New Face Detection:**
    * **No Match:** If no match is found, it signifies a new face. It creates a unique name for the person (e.g., "person_X") and adds the new face encoding to the `encodeDictKnown` dictionary.
    * **Updating Counters:** It increments the `unique_faces_count` (total unique faces) and `new_faces_count` (new faces in this video).
*  **Tracking Faces in Video:** It adds the detected name (matched or unique) to the `current_video_faces` set to avoid duplicates within the current video. It then appends the name to the `faces_in_video` list.
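
The match-or-register logic described above can be sketched without any image processing. This is a minimal illustration, not the notebook's code: `match_or_register` is a hypothetical helper, the 3-dimensional vectors are made-up stand-ins for the real 128-dimensional encodings, and `math.dist` plays the role of `face_recognition.face_distance`.

```python
import math

THRESHOLD = 0.60  # same cutoff the notebook uses


def match_or_register(encoding, known, threshold=THRESHOLD):
    """Return (name, is_new) for one face encoding.

    If the closest known encoding is nearer than `threshold`, reuse its name.
    Otherwise store the encoding under a fresh "person_N" name.
    """
    if known:
        # Euclidean distance to every known encoding; smaller means more similar.
        distances = {name: math.dist(encoding, ref) for name, ref in known.items()}
        best_name = min(distances, key=distances.get)
        if distances[best_name] < threshold:
            return best_name, False
    new_name = f"person_{len(known) + 1}"
    known[new_name] = list(encoding)
    return new_name, True


known = {}  # toy replacement for encodeDictKnown
first = match_or_register([0.10, 0.20, 0.30], known)   # nothing known yet
second = match_or_register([0.12, 0.21, 0.29], known)  # very close to the first
third = match_or_register([0.90, 0.80, 0.70], known)   # far from the first
print(first, second, third)
```

Because only the first encoding seen for a person is stored, every later frame is compared against that single reference, which is one reason the threshold choice matters.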

**4. Saving and Reporting:**

*  **Save Encodings:** After processing all frames in the video, it saves the updated `encodeDictKnown` dictionary (including potentially new faces) back to the JSON file.
*  **New Faces and Detected People:** It appends the `new_faces_count` and a comma-separated string of detected faces (`faces_in_video`) for the current video to separate lists (`new_faces_detected` and `people_detected`).
*  **Video Cleanup:** It releases the video capture resources and deletes the downloaded video file to save space.
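*  **Reporting:** It prints a message indicating the video has been processed and deleted, along with the running total of unique faces known so far. Note that `unique_faces_count` starts from the number of stored encodings, so the figure is cumulative across videos rather than a per-video count.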

**5. Updating Excel Data:**

* **Length Check:** The script ensures the lengths of `new_faces_detected` and `people_detected` lists match the number of rows in the DataFrame.
* **Adding New Columns:** If the lengths match, it adds two new columns to the DataFrame:
    * "People Detected": This column stores the comma-separated list of faces detected in each video (from `people_detected`).
    * "Number of New Faces Detected": This column stores the count of new faces detected in each video (from `new_faces_detected`).
* **Mismatch Handling:** If the lengths don't match, it prints an error message indicating an issue during face detection.
* **Save Updated Excel:** Finally, it creates a new Excel file with the updated data (`output_excel_file`) using the pandas `to_excel` method.
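
The result lists stay aligned with the DataFrame only if every row contributes exactly one entry, including rows whose video could not be opened. A minimal sketch of that idea follows; `collect_results` and `fake_process` are hypothetical names used for illustration, and `fake_process` merely simulates the per-video loop.

```python
def collect_results(rows, process):
    """Run `process` on each row and append exactly one result per row."""
    new_faces_detected, people_detected = [], []
    for row in rows:
        result = process(row)
        if result is None:
            # Video could not be opened: record placeholders instead of skipping.
            new_faces_detected.append(0)
            people_detected.append("")
            continue
        names, new_count = result
        new_faces_detected.append(new_count)
        people_detected.append(", ".join(names))
    return new_faces_detected, people_detected


def fake_process(row):
    # Hypothetical stand-in for the per-video work; None simulates a failed open.
    return None if row == "broken" else (["person_1"], 1)


rows = ["ok", "broken", "ok"]
counts, people = collect_results(rows, fake_process)
print(counts, people)
```

With placeholders in place, both lists always have one entry per row, so the length check passes even when some videos fail.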

**Additional Notes:**

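* The script uses the pre-trained CNN face detector (`model="cnn"`) that the `face_recognition` library exposes through dlib. It is more accurate than the library's default HOG detector but considerably slower without a GPU.
* The threshold value (0.60) for considering a face a match can be adjusted. Lower values make matching stricter (fewer false matches, but the same person may be registered more than once); higher values make it more lenient.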
* The script demonstrates a basic implementation of face recognition with known encodings stored in a JSON file. More advanced approaches might involve database storage and retrieval of encodings.



In [22]:
import pandas as pd
import requests
import cv2
import face_recognition
import json
import numpy as np
import os

# File path to save known encodings
encoding_file = "/content/known_faces.json"

# Function to save encodings with names to a JSON file
def save_encodings_json(encodings_dict, file_path):
    # Convert numpy arrays to lists for JSON serialization
    encodings_dict_serializable = {key: encoding.tolist() for key, encoding in encodings_dict.items()}
    with open(file_path, 'w') as f:
        json.dump(encodings_dict_serializable, f)
    print(f"Encodings saved to {file_path}")

# Function to load encodings from a JSON file
def load_encodings_json(file_path):
    try:
        with open(file_path, 'r') as f:
            encodings_dict = json.load(f)
        print(f"Encodings loaded from {file_path}")
        # Convert lists back to numpy arrays
        encodings_dict = {key: np.array(value) for key, value in encodings_dict.items()}
        return encodings_dict
    except FileNotFoundError:
        print(f"No encoding file found at {file_path}. Starting with an empty dictionary.")
        return {}

# Download video from URL
def download_video(url, filename):
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
        print(f"Downloaded video from {url}")
    else:
        print(f"Failed to download video from {url}")

# Load known encodings if available
encodeDictKnown = load_encodings_json(encoding_file)

# If encodeDictKnown is not a dictionary, initialize it as one
if not isinstance(encodeDictKnown, dict):
    encodeDictKnown = {}

# Read Excel file
excel_file = "/content/Assignment Data.xlsx"
df = pd.read_excel(excel_file)

# List to store new faces detected for each video
new_faces_detected = []
people_detected = []

# Process each row in the DataFrame (assuming columns: 'Video URL' and 'Performance')
for index, row in df.iterrows():
    video_url = row['Video URL']
    performance = row['Performance']  # You can use this if needed for analysis

    # Generate a filename based on the URL (you can customize this)
    video_filename = f"video_{index}.mp4"

    # Download the video
    download_video(video_url, video_filename)

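    # Initialize video capture
    cap = cv2.VideoCapture(video_filename)

    if not cap.isOpened():
        print(f"Error: Could not open video file {video_filename}.")
        # Record placeholders so the result lists stay aligned with the DataFrame rows
        new_faces_detected.append(0)
        people_detected.append('')
        continue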

    unique_faces_count = len(encodeDictKnown)  # Initialize counter with existing encodings
    new_faces_count = 0  # Counter for new faces in the current video
    faces_in_video = []  # To keep track of faces detected in the current video

    # Persistent set for tracking detected faces in the video
    current_video_faces = set()

    while True:
        success, img = cap.read()

        if not success:
            if cap.get(cv2.CAP_PROP_POS_FRAMES) >= cap.get(cv2.CAP_PROP_FRAME_COUNT):
                print(f"Video {video_filename} has ended.")
            else:
                print("Error occurred while reading the video.")
            break

        imgS = cv2.resize(img, (0, 0), None, 0.45, 0.45)
        imgS = cv2.cvtColor(imgS, cv2.COLOR_BGR2RGB)

        faceCurFrame = face_recognition.face_locations(imgS, model="cnn")

        if len(faceCurFrame) == 0:
            continue

        encodeCurFrame = face_recognition.face_encodings(imgS, faceCurFrame)

        for faceloc, encoding in zip(faceCurFrame, encodeCurFrame):
            is_match = False
            matched_name = None  # Variable to store the name of the matched person

            # Check if there are any known encodings
            if encodeDictKnown:
                # Compare the current encoding with known encodings
                face_distances = face_recognition.face_distance(list(encodeDictKnown.values()), encoding)
                min_distance = min(face_distances)  # Find the closest match
                if min_distance < 0.60:  # Threshold for considering it a match
                    matched_index = np.argmin(face_distances)  # Index of the closest match
                    matched_name = list(encodeDictKnown.keys())[matched_index]  # Retrieve the name
                    is_match = True

            if is_match:
                # Add matched name to the current video set if not already added
                if matched_name not in current_video_faces:
                    current_video_faces.add(matched_name)  # Add to the set of detected faces
                    faces_in_video.append(matched_name)  # Append to the list of faces for the video
                    print(f"Matched with {matched_name} (distance: {min_distance:.2f})")
            else:
                # If no match, create a new unique name for the person
                unique_name = f"person_{unique_faces_count + 1}"
                encodeDictKnown[unique_name] = encoding
                unique_faces_count += 1
                new_faces_count += 1
                current_video_faces.add(unique_name)  # Add to the set of detected faces
                faces_in_video.append(unique_name)  # Append to the list of faces for the video
                print(f"New unique face detected at location {faceloc}! Total unique faces: {unique_faces_count}")

    # Save updated encodings to the JSON file after processing the current video
    save_encodings_json(encodeDictKnown, encoding_file)

    # Add the number of new faces detected and the faces detected list to the respective lists
    new_faces_detected.append(new_faces_count)
    people_detected.append(', '.join(faces_in_video))  # Join faces with comma

    # Release video capture resources and delete the video to save space
    cap.release()
    os.remove(video_filename)
    print(f"Video {video_filename} processed and deleted.")

    print(f"Total number of unique faces detected in {video_filename}: {unique_faces_count}")

# Ensure lengths match before adding columns
if len(new_faces_detected) == len(df) and len(people_detected) == len(df):
    # Add the new columns to the DataFrame
    df['People Detected'] = people_detected
    df['Number of New Faces Detected'] = new_faces_detected
else:
    print("Mismatch in lengths. Something went wrong while detecting faces.")

# Create the new Excel file with the updated data
output_excel_file = "updated_video_performance_with_people.xlsx"
df.to_excel(output_excel_file, index=False)

print(f"Updated Excel file saved as {output_excel_file}")


No encoding file found at /content/known_faces.json. Starting with an empty dictionary.
Downloaded video from https://fgimagestorage.blob.core.windows.net/facebook-assets/hd-999607261342550
New unique face detected at location (293, 176, 333, 137)! Total unique faces: 1
Video video_0.mp4 has ended.
Encodings saved to /content/known_faces.json
Video video_0.mp4 processed and deleted.
Total number of unique faces detected in video_0.mp4: 1
Downloaded video from https://fgimagestorage.blob.core.windows.net/facebook-assets/hd-997580728807604
Video video_1.mp4 has ended.
Encodings saved to /content/known_faces.json
Video video_1.mp4 processed and deleted.
Total number of unique faces detected in video_1.mp4: 1
Downloaded video from https://fgimagestorage.blob.core.windows.net/facebook-assets/hd-992418235673669
New unique face detected at location (384, 156, 441, 100)! Total unique faces: 2
New unique face detected at location (397, 148, 437, 109)! Total unique faces: 3
New unique face detec

# 52 Unique Faces / Influencers Detected in Total

# Analysis Part

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_excel(r"/content/updated_video_performance_with_people.xlsx")

In [9]:
df['People Detected'] = df['People Detected'].fillna('')
df['People Detected'] = df['People Detected'].apply(lambda x: [name.strip() for name in x.split(',') if name])
df_exploded = df.explode('People Detected')

In [10]:
df_exploded

Unnamed: 0,Performance,Video URL,People Detected,Number of New Faces Detected
0,1.106000,https://fgimagestorage.blob.core.windows.net/f...,person_1,1
1,2.244700,https://fgimagestorage.blob.core.windows.net/f...,,0
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_2,4
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_3,4
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_4,4
...,...,...,...,...
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_8,0
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_4,0
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_48,0
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_2,0


In [20]:
df_exploded['People Detected'] = df_exploded['People Detected'].fillna('')
df_exploded = df_exploded[df_exploded['People Detected'] != ""]


Unnamed: 0,Performance,Video URL,People Detected,Number of New Faces Detected
0,1.106000,https://fgimagestorage.blob.core.windows.net/f...,person_1,1
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_2,4
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_3,4
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_4,4
2,2.012600,https://fgimagestorage.blob.core.windows.net/f...,person_5,4
...,...,...,...,...
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_7,0
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_8,0
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_4,0
266,0.156167,https://fgimagestorage.blob.core.windows.net/f...,person_48,0


In [13]:
stats = df_exploded.groupby('People Detected').agg(
    Mean_Performance=('Performance', 'mean'),
    Median_Performance=('Performance', 'median'),
    Std_Deviation=('Performance', 'std'),
    Appearances=('Performance', 'count')
).reset_index()
stats

Unnamed: 0,People Detected,Mean_Performance,Median_Performance,Std_Deviation,Appearances
0,person_1,1.062443,0.976586,0.620232,89
1,person_10,0.911185,0.912767,0.627385,17
2,person_11,0.895274,0.6777,0.618856,11
3,person_12,0.854684,0.741103,0.628333,19
4,person_13,0.944641,0.826,0.57497,7
5,person_14,1.011952,0.695585,0.967461,5
6,person_15,0.912431,0.720853,0.602291,20
7,person_16,1.369262,1.5304,0.718496,9
8,person_17,1.246366,1.241732,0.938958,4
9,person_18,1.059367,1.070043,0.639565,17


The table above summarizes the performance statistics for each detected influencer, based on the videos in which they appear. The metrics can be interpreted and used as follows:

### Key Metrics:
1. **Mean Performance:** The average `Performance` (engagement) of the videos in which the influencer was detected.
2. **Median Performance:** The middle value of those performances. It is less sensitive to outliers than the mean, so a large gap between the two hints at a few unusually strong or weak videos.
3. **Standard Deviation (Std_Deviation):** Reflects the consistency of the influencer's performance. Lower values suggest consistent engagement.
4. **Appearances:** The number of videos in which the influencer was detected. The more appearances, the more reliable the other three metrics are.
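
To make the four metrics concrete, here is a small sketch using only the standard library. The `performance` values are invented for illustration and include one outlier; `statistics.stdev` computes the sample standard deviation, which is also what pandas' `std` does by default.

```python
import statistics

# Made-up performance values for one influencer, with a single outlier (3.6)
performance = [0.4, 0.9, 1.0, 1.1, 3.6]

mean = statistics.mean(performance)      # pulled upward by the outlier
median = statistics.median(performance)  # unaffected by the outlier
std = statistics.stdev(performance)      # sample standard deviation (n - 1)
appearances = len(performance)

print(f"mean={mean:.2f} median={median:.2f} std={std:.2f} n={appearances}")
```

The mean lands well above the median here, which is the pattern to watch for when one viral video inflates an influencer's average.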

---

### Suggestions to Choose Influencers:
1. **Consistency and High Average Performance:**
   - Look for influencers with high **Mean Performance** and low **Std_Deviation.**
   - Example: `person_42` (Mean Performance: 1.09, Std_Deviation: 0.46) and `person_24` (Mean Performance: 1.13, Std_Deviation: 0.44).

2. **Sufficient Appearances:**
   - Avoid influencers with very few appearances as their performance metrics may not be reliable.
   - Example: Focus on influencers with at least 10 appearances, such as `person_1` (89 appearances) or `person_2` (90 appearances).

3. **Balanced Strategy:**
   - Combine influencers with high performance and those who are consistent across campaigns for a balanced approach.
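
As a rough sketch of the "sufficient appearances" rule, the snippet below filters the ten rows shown in the stats table and sorts what remains by mean performance. The tuple layout and the name `MIN_APPEARANCES` are assumptions made for this illustration; the cutoff of 10 comes from the suggestion above.

```python
# (name, mean, std, appearances) copied from the rows shown in the stats table
rows = [
    ("person_1", 1.062443, 0.620232, 89),
    ("person_10", 0.911185, 0.627385, 17),
    ("person_11", 0.895274, 0.618856, 11),
    ("person_12", 0.854684, 0.628333, 19),
    ("person_13", 0.944641, 0.574970, 7),
    ("person_14", 1.011952, 0.967461, 5),
    ("person_15", 0.912431, 0.602291, 20),
    ("person_16", 1.369262, 0.718496, 9),
    ("person_17", 1.246366, 0.938958, 4),
    ("person_18", 1.059367, 0.639565, 17),
]

MIN_APPEARANCES = 10  # hypothetical constant for the cutoff suggested above

# Keep only influencers with enough appearances for the statistics to be meaningful
reliable = [r for r in rows if r[3] >= MIN_APPEARANCES]

# Highest mean first; ties broken by lower standard deviation
reliable.sort(key=lambda r: (-r[1], r[2]))
names = [r[0] for r in reliable]
print(names)
```

Note that `person_16` has the highest mean in these rows but drops out with only 9 appearances, which shows how sensitive the shortlist is to the chosen cutoff.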

---

### Actionable Steps:
1. **Rank Influencers:**
   - Create a ranking by weighting the metrics (e.g., Mean Performance contributes 50%, Median 30%, and Std_Deviation consistency 20%).
   - Normalize values for proper comparison.

2. **Visualize Results:**
   - Plot influencers’ Mean Performance against Std_Deviation for easy identification of high-performing, consistent influencers.
   - Use a bubble chart to include **Appearances** as bubble size.

3. **Filter Low Performers:**
   - Exclude influencers with Mean Performance below a threshold (e.g., 0.7) or Std_Deviation above a threshold (e.g., 0.7).
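
One possible reading of the ranking and filtering steps is sketched below. It assumes min-max normalization, treats consistency as one minus the normalized standard deviation, and applies the example thresholds before normalizing; all of these are illustrative choices rather than a prescribed method. The five entries are taken from the stats table, and `min_max` is a helper defined here.

```python
def min_max(values):
    """Scale values to [0, 1]; return zeros if they are all equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# name -> (mean, median, std), taken from rows of the stats table
stats = {
    "person_1": (1.062443, 0.976586, 0.620232),
    "person_13": (0.944641, 0.826, 0.57497),
    "person_14": (1.011952, 0.695585, 0.967461),
    "person_16": (1.369262, 1.5304, 0.718496),
    "person_18": (1.059367, 1.070043, 0.639565),
}
MIN_MEAN, MAX_STD = 0.7, 0.7  # example thresholds from the text

# Step 1: drop low or inconsistent performers
kept = {n: s for n, s in stats.items() if s[0] >= MIN_MEAN and s[2] <= MAX_STD}

# Step 2: normalize each metric across the remaining influencers
names = list(kept)
mean_n = min_max([kept[n][0] for n in names])
median_n = min_max([kept[n][1] for n in names])
std_n = min_max([kept[n][2] for n in names])

# Step 3: weighted score (50% mean, 30% median, 20% consistency)
scores = {
    n: 0.5 * m + 0.3 * md + 0.2 * (1 - sd)
    for n, m, md, sd in zip(names, mean_n, median_n, std_n)
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Filtering before normalizing keeps excluded outliers from stretching the scales, but it also means scores change whenever the thresholds do, so the ranking should be read relative to the filtered group.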

