Open this notebook in Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Riminder/hrflow-cookbook/blob/feature/fake-page-integration-for-non-indexed-profiles/examples/%5BScoring%5D%20Fake_Page_Integration_for_Non_Indexed_Profiles.ipynb)

##### Copyright 2023 HrFlow's AI Research Department

Licensed under the Apache License, Version 2.0 (the "License");

In [None]:
# Copyright 2023 HrFlow's AI Research Department. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

**Welcome to this Google Colaboratory tutorial for developers!**
This notebook is designed to help you tackle the challenge of **non-indexed profiles** by enhancing resumes that lack personal information. **In just 3 simple steps**, you'll learn how to add a customized fake page with placeholder personal details to ensure resumes can be properly indexed and processed.

Here’s a quick overview of the notebook's workflow:

1. **🛠 Upload Resume:** Upload the resume to which you want to add a fake page.
2. **👷 Generate Fake Page:** Add a new page to the resume containing personal information.
3. **📝 Upload to HrFlow Source:** Send the resume to a specified HrFlow Source.

# Getting Started

In [None]:
!pip install --quiet hrflow pymupdf faker

In [None]:
from getpass import getpass

import fitz  # PyMuPDF
from faker import Faker
from hrflow import Hrflow
from IPython.display import Image

# Credentials are read interactively so they are never stored in the notebook
API_SECRET = getpass("HrFlow API secret: ")
API_USER = getpass("HrFlow API user email: ")
SOURCE_KEY = getpass("HrFlow source key: ")


client = Hrflow(api_secret=API_SECRET, api_user=API_USER)

# 1. **🛠 Upload Resume:**

In [None]:
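from google.colab import files

# files.upload() returns a dict mapping each uploaded filename to its bytes;
# keep the content of the first uploaded file
binary_cv = list(files.upload().values())[0]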
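
Because the next step opens the upload as a PDF, it can help to check the bytes first. The sketch below is a minimal check under stated assumptions: `looks_like_pdf` is a hypothetical helper (not part of this notebook or of any library) that only looks for the `%PDF-` marker that PDF files normally start with.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes contain the PDF header marker near the start.

    Hypothetical helper for illustration; it only inspects the header and
    does not validate the rest of the file.
    """
    # PDF files normally begin with "%PDF-"; search a small window in case
    # a few stray bytes precede the header.
    return b"%PDF-" in data[:1024]


print(looks_like_pdf(b"%PDF-1.7\n..."))         # True
print(looks_like_pdf(b"PK\x03\x04 not a pdf"))  # False
```

In this notebook it could be called as `looks_like_pdf(binary_cv)` before generating the fake page.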

# 2. **👷 Generate Fake Page:**

**Using a dictionary of personal information, this step adds a new page to the resume that lists these details clearly.** The page can be placed at the start of the document (`position="prefix"`) or at the end (`position="append"`, the default).


In [None]:
fake = Faker()

fake_profile_info = {
    "full_name": fake.name(),
    "email": fake.email(),
    "phone": fake.phone_number(),
    "address": {"text": fake.address()},
}
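
The page generator defined in the next cell turns each key of this dictionary into a label and, for nested values such as `address`, displays the `text` entry. The sketch below illustrates that mapping in plain Python; `format_field` is a hypothetical helper and the sample values are made up.

```python
def format_field(key, value):
    """Return the (label, display_text) pair for one profile field.

    Hypothetical helper that mirrors the label and value handling used
    when the fake page is drawn.
    """
    # "full_name" -> "Full name:"
    label = key.replace("_", " ").capitalize() + ":"
    # Nested fields keep their display text under the "text" key
    if isinstance(value, dict):
        value = value.get("text", "N/A")
    # Empty or missing values are shown as "N/A"
    text = str(value) if value else "N/A"
    return label, text


sample = {
    "full_name": "Jane Doe",  # made-up values for illustration
    "address": {"text": "1 Example Street"},
    "phone": "",
}
for key, value in sample.items():
    print(format_field(key, value))
```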


In [None]:
X_MARGIN = 72
Y_MARGIN = 50
TITLE_FONTSIZE = 14
LABEL_FONTSIZE = 12
VALUE_FONTSIZE = 12
LINE_SPACING = 15
WIDTH = 595
HEIGHT = 842

def wrap_text(text, font_size, max_width, x_margin):
    """
      Splits a block of text into multiple lines to fit within a specified width for PDF rendering.

      Inputs:
          - text (str): The text to be wrapped.
          - font_size (int): Font size to use for calculating text width.
          - max_width (float): Maximum width of the text area.
          - x_margin (float): Margin from the left edge of the page to the start of the text.

      Outputs:
          - lines (list of str): List of strings, each representing a line that fits within the specified width.
    """
    words = text.split()
    if not words:
        # Nothing to wrap (empty or whitespace-only text)
        return []
    lines = []
    current_line = words[0]

    for word in words[1:]:
        test_line = f"{current_line} {word}"
        text_width = fitz.get_text_length(test_line, fontsize=font_size, fontname="helv")
        if text_width <= max_width-x_margin:
            current_line = test_line
        else:
            lines.append(current_line)
            current_line = word

    lines.append(current_line)
    return lines


def add_fake_page(file, profile_dict_fake_fields, position="append"):
    """
      Generates a new PDF page with fake personal information and inserts it at the start or end of an existing document.

      Inputs:
          - file (bytes): The binary content of the existing PDF file to which the new page will be added.
          - profile_dict_fake_fields (dict): Dictionary containing fake personal information. Keys represent field names, and values are strings or dictionaries.
          - position (str): Specifies where to insert the fake page, either "append" (default) or "prefix".

      Outputs:
          - fake_binary_cv (bytes): The binary content of the PDF document with the added page.
    """
    # Define page margins
    x_margin = X_MARGIN
    y_margin = Y_MARGIN
    # Define font sizes and line spacing
    title_fontsize = TITLE_FONTSIZE
    label_fontsize = LABEL_FONTSIZE
    value_fontsize = VALUE_FONTSIZE
    line_spacing = LINE_SPACING


    # Open the PDF from its in-memory bytes
    pdf_document = fitz.open(stream=file, filetype="pdf")

    if pdf_document.page_count > 0:
        reference_page = pdf_document[0]
        width, height = reference_page.rect.width, reference_page.rect.height
    else:
        width, height = WIDTH, HEIGHT

    new_page = pdf_document.new_page(-1 if position == "append" else 0, width=width, height=height)

    max_width = width - 2 * x_margin

    # Title
    y_offset = y_margin
    new_page.insert_text(
        point=(x_margin, y_offset),
        text="Personal Information",
        fontsize=title_fontsize,
        fontname="helv",
        color=(0, 0, 1)  # Blue
    )
    y_offset += 2 * line_spacing

    for key, value in profile_dict_fake_fields.items():
        label = key.replace("_", " ").capitalize() + ":"

        # Nested fields (e.g. "address") keep their display text under "text"
        if isinstance(value, dict):
            value = value.get('text', 'N/A')

        value = str(value) if value else "N/A"

        # Add label
        new_page.insert_text(
            point=(x_margin, y_offset),
            text=label,
            fontsize=label_fontsize,
            fontname="helv",
            color=(0, 0, 0)
        )

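        # Wrap the value so it fits between its start position (x_margin + 150)
        # and the right margin (width - x_margin), then add each line
        wrapped_lines = wrap_text(value, value_fontsize, width - x_margin, x_margin + 150)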
        for line in wrapped_lines:
            new_page.insert_text(
                point=(x_margin + 150, y_offset),
                text=line,
                fontsize=value_fontsize,
                fontname="helv",
                color=(0.2, 0.2, 0.2)
            )
            y_offset += line_spacing


        y_offset += line_spacing

        if y_offset + line_spacing > height - y_margin:
            print("Warning: Not enough space on the page. Truncation may occur.")
            break

    fake_binary_cv = pdf_document.tobytes()
    pdf_document.close()
    return fake_binary_cv
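
`wrap_text` uses a greedy strategy: it keeps adding words to the current line until the next word would exceed the available width. The sketch below shows the same idea without PyMuPDF by taking the width measure as a parameter. `greedy_wrap` and `measure` are hypothetical names, and the character count stands in for the rendered width that `fitz.get_text_length` returns.

```python
def greedy_wrap(text, limit, measure=len):
    """Greedily pack words into lines whose measured width is <= limit."""
    words = text.split()
    if not words:
        return []
    lines = []
    current = words[0]
    for word in words[1:]:
        candidate = f"{current} {word}"
        if measure(candidate) <= limit:
            # The word still fits: extend the current line
            current = candidate
        else:
            # The word does not fit: close the line and start a new one
            lines.append(current)
            current = word
    lines.append(current)
    return lines


print(greedy_wrap("12 Example Road Springfield", limit=15))
# ['12 Example Road', 'Springfield']
```

As in `wrap_text`, a single word wider than the limit is kept on its own line rather than split.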

In [None]:
# Add the fake page as the first page of the resume
fake_binary_cv = add_fake_page(binary_cv, fake_profile_info, position="prefix")

In [None]:
# Render the first page of the modified PDF as an image
fake_cv = fitz.open("pdf", fake_binary_cv)

first_page_img = fake_cv[0].get_pixmap().tobytes()

Image(first_page_img)

# 3. **📝 Upload to HrFlow Source:**

In [None]:
client.profile.parsing.add_file(SOURCE_KEY, profile_file=fake_binary_cv, reference='resume_with_fake_page')

In [None]:
# If the source you are using operates synchronously, you can use the code below to verify the content.
# For asynchronous sources, parsing the CV could take some time.

output = client.profile.storing.get(SOURCE_KEY, reference='resume_with_fake_page')['data']

print("Full Name:", output['info']['full_name'])
print("Email:", output['info']['email'])
print("Phone:", output['info']['phone'])
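
For asynchronous sources, the profile may not be available right after the upload. The sketch below polls until it is ready. `wait_for_profile` is a hypothetical helper, and the response shape it checks (a dict with a `code` of 200 and a non-empty `data` entry) is an assumption that should be verified against the actual API responses.

```python
import time


def wait_for_profile(fetch, attempts=10, delay=5.0):
    """Call fetch() until it returns a ready response or attempts run out.

    Assumption: a ready response is a dict with code 200 and non-empty
    "data". Returns the "data" payload, or None if it never became ready.
    """
    for attempt in range(attempts):
        response = fetch()
        if response.get("code") == 200 and response.get("data"):
            return response["data"]
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return None


# Stand-in for the real API call: not ready twice, then ready.
fake_responses = iter([
    {"code": 400, "message": "not found"},
    {"code": 400, "message": "not found"},
    {"code": 200, "data": {"info": {"full_name": "Jane Doe"}}},
])
profile = wait_for_profile(lambda: next(fake_responses), attempts=5, delay=0)
print(profile["info"]["full_name"])
```

In this notebook, `fetch` could be `lambda: client.profile.storing.get(SOURCE_KEY, reference='resume_with_fake_page')`.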