# Search assistant for onboarding new hires

Search technologies can significantly assist in the onboarding process for new employees, making it more efficient and effective by providing:

- Quick Access to Information: New employees often have numerous questions and need access to various resources, such as company policies, employee handbooks, training materials, and frequently asked questions. A search function allows them to quickly find relevant information without relying solely on human assistance.
- Self-Service Onboarding: Search tools enable new hires to navigate the onboarding process at their own pace. They can search for specific topics, departments, or keywords to find the information they need, reducing the burden on HR or support staff.

Adding a reranking step to your search pipeline can improve the relevance of the results returned for a query.

In this notebook, you'll learn about:
- Reranking semantic search results
- Reranking semi-structured data
- Reranking tabular data
- Multilingual reranking


## Setup

Install the required packages and create the Bedrock Agent Runtime client.

In [104]:
!pip install boto3 --upgrade

import boto3
import json
import pandas as pd
from io import StringIO
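# Use the session's default region, falling back to us-west-2 if none is configured.
# The rerank model must be available in whichever region is used.
region = boto3.Session().region_name or 'us-west-2'

# Create the client in the same region that the model ARN below refers to
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=region)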

modelId = "cohere.rerank-v3-5:0"
model_package_arn = f"arn:aws:bedrock:{region}::foundation-model/{modelId}"



## Reranking search results

Reranking takes a single call to the `rerank` API of the Bedrock Agent Runtime client.

We have a list of documents that serves as an FAQ, but the list is not ordered by relevance to the user's query.

In [96]:
# Define the documents
documents = [
    "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward.",
    "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.",
    "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance.",
    "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year.",
    "Our mission is to innovate and create cutting-edge technology solutions that empower businesses and enhance user experiences.",
    "You can access your work email through the company's email platform, which is typically provided during the onboarding process.",
    "Essential tools include our project management software, communication platforms, and version control systems.",
    "Time-off requests are submitted through the HR portal, where you can also track your approved leave.",
    "We foster a collaborative and inclusive culture, encouraging open communication and a healthy work-life balance.",
    "Training materials and modules are accessible through our learning management system, which you'll receive access to during onboarding.",
    "Performance goals are outlined in your role-specific documentation and discussed during your performance review meetings.",
    "Regular company-wide updates are shared via email and our internal communication channels.",
    "We offer mentorship programs, online courses, and industry conferences to support your professional growth.",
    "Sensitive data should be accessed and handled with care, following our data security guidelines and training materials.",
    "Our company is committed to fostering an inclusive environment, and our policies are outlined in the employee handbook.",
    "You can volunteer for committees, join employee resource groups, or participate in company-wide events.",
    "Role transfers are handled through the HR department, who will guide you through the necessary steps.",
    "Benefit details and enrollment processes are provided during onboarding and can be accessed through the HR portal.",
    "Expense reimbursement is managed through an online system, where you can submit and track your expenses.",
    "You can report issues or provide feedback through our internal communication platforms or directly to your manager.",
    "Remote work policies are outlined in the company's work-from-home guidelines, which you'll receive during onboarding.",
    "Technical support is available through our help desk, where you can submit tickets for assistance.",
    "Travel policies, including expense guidelines and booking procedures, are detailed in the employee handbook.",
    "Community engagement opportunities are promoted through our social responsibility initiatives and volunteer programs."
]

In [97]:
def rerank_text(text_query, text_sources, num_results, model_package_arn):
    """Rerank text_sources against text_query and return the top num_results results."""
    response = bedrock_agent_runtime.rerank(
        queries=[
            {
                "type": "TEXT",
                "textQuery": {
                    "text": text_query
                }
            }
        ],
        sources=text_sources,
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": num_results,
                "modelConfiguration": {
                    "modelArn": model_package_arn,
                }
            }
        }
    )
    return response['results']

In [98]:
text_sources = []
for text in documents:
    text_sources.append({
        "type": "INLINE",
        "inlineDocumentSource": {
            "type": "TEXT",
            "textDocument": {
                "text": text,
            }
        }
    })

In [101]:
# Add the user query
query = "Can I work remotely for 3 days a week"

response = rerank_text(query, text_sources, 5, model_package_arn)
for i in response:
    print("\tIndex:{} Relevance:{}\t{}".format(i['index'],i['relevanceScore'],documents[i['index']]).replace("\n", " "))
    print ("\n")

	Index:20 Relevance:0.2174043506383896	Remote work policies are outlined in the company's work-from-home guidelines, which you'll receive during onboarding.


	Index:1 Relevance:0.1998075246810913	Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.


	Index:15 Relevance:0.1214878261089325	You can volunteer for committees, join employee resource groups, or participate in company-wide events.


	Index:14 Relevance:0.07031037658452988	Our company is committed to fostering an inclusive environment, and our policies are outlined in the employee handbook.


	Index:7 Relevance:0.06570681184530258	Time-off requests are submitted through the HR portal, where you can also track your approved leave.
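
The scores above drop off quickly after the first two results, so an application may want to discard low-scoring entries before showing them to the new hire. The following is a minimal sketch of that idea, not part of the Rerank API: `filter_results` is a hypothetical helper, the `0.15` cutoff is an arbitrary assumption, and `sample_results` copies the `index` and `relevanceScore` values printed above (rounded to four decimals) so the snippet runs without calling Bedrock.

```python
def filter_results(results, documents, min_score):
    """Return (score, document) pairs whose relevanceScore is at least min_score.

    `results` is expected to have the shape returned by rerank_text:
    a list of dicts with 'index' and 'relevanceScore' keys.
    """
    return [
        (r["relevanceScore"], documents[r["index"]])
        for r in results
        if r["relevanceScore"] >= min_score
    ]


# Values copied from the output above; scores rounded for readability.
sample_results = [
    {"index": 20, "relevanceScore": 0.2174},
    {"index": 1, "relevanceScore": 0.1998},
    {"index": 15, "relevanceScore": 0.1215},
    {"index": 14, "relevanceScore": 0.0703},
    {"index": 7, "relevanceScore": 0.0657},
]

# A dict keyed by document index stands in for the full `documents` list.
sample_documents = {
    20: "Remote work policies are outlined in the company's work-from-home guidelines, which you'll receive during onboarding.",
    1: "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.",
    15: "You can volunteer for committees, join employee resource groups, or participate in company-wide events.",
    14: "Our company is committed to fostering an inclusive environment, and our policies are outlined in the employee handbook.",
    7: "Time-off requests are submitted through the HR portal, where you can also track your approved leave.",
}

# The 0.15 threshold is an assumption chosen for this example only.
kept = filter_results(sample_results, sample_documents, min_score=0.15)
for score, text in kept:
    print(f"{score:.4f}\t{text}")
```

With this cutoff only the two remote-work documents remain. A suitable threshold depends on the model and the data, so it is worth tuning on real queries.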




## Reranking semi-structured data

The Rerank model supports multi-aspect and semi-structured data like emails, invoices, JSON documents, code, and tables.

In the following example, we'll use email data. It is semi-structured data that contains a number of fields: `from`, `to`, `date`, `subject`, and `text`.

Suppose the new hire now wants to find their action items for the first week. Let's assume we have a list of emails retrieved from the email provider's API.

In [103]:
# Define the data
emails = [
    {"from": "hr@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "A Warm Welcome to CompanyXYZ!", "text": "We are delighted to welcome you to the team! As you embark on your journey with us, you'll find attached an agenda to guide you through your first week."},
    {"from": "it@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "Setting Up Your IT Needs", "text": "Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts."},
    {"from": "john@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "First Week Check-In", "text": "Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!"}
]

text_sources = []
for text in emails:
    text_sources.append({
        "type": "INLINE",
        "inlineDocumentSource": {
            "type": "JSON",
            "jsonDocument": text
        }
    })

query = "What are my action items for week1?"

response = rerank_text(query, text_sources, 3, model_package_arn)
for i in response:
    print("\tIndex:{} Relevance:{}\t{}".format(i['index'],i['relevanceScore'],emails[i['index']]['text']).replace("\n", " "))
    print ("\n")

	Index:0 Relevance:0.2867535948753357	We are delighted to welcome you to the team! As you embark on your journey with us, you'll find attached an agenda to guide you through your first week.


	Index:2 Relevance:0.09654402732849121	Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!


	Index:1 Relevance:0.02798897586762905	Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.
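
The only difference between this cell and the earlier plain-text example is the shape of each source entry. A small helper can pick the right shape from the Python type. This is a sketch: `make_inline_source` is a hypothetical name, and the two dictionary layouts are the ones already used in this notebook.

```python
def make_inline_source(item):
    """Wrap a str as a TEXT source and a dict as a JSON source."""
    if isinstance(item, str):
        inline = {"type": "TEXT", "textDocument": {"text": item}}
    elif isinstance(item, dict):
        inline = {"type": "JSON", "jsonDocument": item}
    else:
        raise TypeError(f"Unsupported source type: {type(item).__name__}")
    return {"type": "INLINE", "inlineDocumentSource": inline}


# Shortened sample inputs, for illustration only.
email = {"from": "john@co1t.com", "subject": "First Week Check-In"}
text_source = make_inline_source("Time-off requests are submitted through the HR portal.")
json_source = make_inline_source(email)

print(text_source["inlineDocumentSource"]["type"])  # TEXT
print(json_source["inlineDocumentSource"]["type"])  # JSON
```

With such a helper, the loop above could be written as `text_sources = [make_inline_source(e) for e in emails]`.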




## Reranking tabular data

Many enterprises rely on tabular data, such as relational databases, CSV files, and Excel spreadsheets. To rerank tabular data, convert each row into a JSON record and pass the records to the Rerank endpoint as JSON documents.

Here's an example of reranking a CSV file that contains employee information.

In [106]:

# Create a demo CSV file
data = """name,role,join_date,email,status
Rebecca Lee,Senior Software Engineer,2024-07-01,rebecca@co1t.com,Full-time
Emma Williams,Product Designer,2024-06-15,emma@co1t.com,Full-time
Michael Jones,Marketing Manager,2024-05-20,michael@co1t.com,Full-time
Amelia Thompson,Sales Representative,2024-05-20,amelia@co1t.com,Part-time
Ethan Davis,Product Designer,2024-05-25,ethan@co1t.com,Contractor"""
data_csv = StringIO(data)

# Load the CSV file
df = pd.read_csv(data_csv)
df.head()

# Serialize the rows to a JSON string, one record per row
employees = df.to_json(orient='records')

# Parse the JSON string back into a list of dicts, one per employee
emp_json_data = json.loads(employees)

text_sources = []
for text in emp_json_data:
    text_sources.append({
        "type": "INLINE",
        "inlineDocumentSource": {
            "type": "JSON",
            "jsonDocument": text
        }
    })
    
# Add the user query
query = "Any other product designers who joined recently?"

# Rerank the documents
response = rerank_text(query, text_sources, 3, model_package_arn)
for i in response:
    print("\tIndex:{} Relevance:{}\t{}".format(i['index'],i['relevanceScore'],emp_json_data[i['index']]).replace("\n", " "))
    print ("\n")

	Index:1 Relevance:0.7948475480079651	{'name': 'Emma Williams', 'role': 'Product Designer', 'join_date': '2024-06-15', 'email': 'emma@co1t.com', 'status': 'Full-time'}


	Index:4 Relevance:0.7900289297103882	{'name': 'Ethan Davis', 'role': 'Product Designer', 'join_date': '2024-05-25', 'email': 'ethan@co1t.com', 'status': 'Contractor'}


	Index:2 Relevance:0.21260815858840942	{'name': 'Michael Jones', 'role': 'Marketing Manager', 'join_date': '2024-05-20', 'email': 'michael@co1t.com', 'status': 'Full-time'}
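
If pandas is not available, the same list of records can be built with the standard library. The sketch below is an alternative to the `to_json`/`json.loads` round trip above, not what this notebook runs; it reuses two rows of the demo CSV, and `csv_to_records` is a hypothetical name. Every value comes back as a string, which matches this example because pandas also reads these columns as strings.

```python
import csv
from io import StringIO


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header."""
    return [dict(row) for row in csv.DictReader(StringIO(csv_text))]


# Two rows taken from the demo CSV above.
data = """name,role,join_date,email,status
Emma Williams,Product Designer,2024-06-15,emma@co1t.com,Full-time
Ethan Davis,Product Designer,2024-05-25,ethan@co1t.com,Contractor"""

records = csv_to_records(data)

# Wrap each record in the JSON source layout used elsewhere in this notebook.
sources = [
    {"type": "INLINE", "inlineDocumentSource": {"type": "JSON", "jsonDocument": r}}
    for r in records
]

print(records[0])
```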




## Multilingual reranking

The Rerank endpoint also supports multilingual semantic search, so the query and the documents do not have to be in the same language.

In the example below, we rerank the same English FAQ documents against an Arabic query.

In [107]:
text_sources = []
for text in documents:
    text_sources.append({
        "type": "INLINE",
        "inlineDocumentSource": {
            "type": "TEXT",
            "textDocument": {
                "text": text,
            }
        }
    })
    
query = "كيف يتم إجراء مراجعات الأداء" #How are performance reviews conducted?

response = rerank_text(query, text_sources, 5, model_package_arn)
for i in response:
    print("\tIndex:{} Relevance:{}\t{}".format(i['index'],i['relevanceScore'],documents[i['index']]).replace("\n", " "))
    print ("\n")


	Index:3 Relevance:0.4832344651222229	Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year.


	Index:10 Relevance:0.32701295614242554	Performance goals are outlined in your role-specific documentation and discussed during your performance review meetings.


	Index:19 Relevance:0.18124903738498688	You can report issues or provide feedback through our internal communication platforms or directly to your manager.


	Index:6 Relevance:0.07069435715675354	Essential tools include our project management software, communication platforms, and version control systems.


	Index:15 Relevance:0.05651179328560829	You can volunteer for committees, join employee resource groups, or participate in company-wide events.
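
One way to sanity-check a cross-lingual ranking is to confirm that the documents you expect appear near the top. The sketch below assumes, as a manual judgment, that documents 3 and 10 (the two that mention performance reviews) are the relevant ones; `hits_in_top_k` is a hypothetical helper, and `arabic_results` copies the indices printed above.

```python
def hits_in_top_k(results, expected_indices, k):
    """Return the expected indices that appear among the first k results."""
    top_k = [r["index"] for r in results[:k]]
    return [i for i in top_k if i in expected_indices]


# Indices copied from the output above, in ranked order.
arabic_results = [{"index": 3}, {"index": 10}, {"index": 19}, {"index": 6}, {"index": 15}]

# Assumption: these are the documents a human would consider relevant.
expected = {3, 10}

found = hits_in_top_k(arabic_results, expected, k=2)
print(found)  # [3, 10]
```

Here both expected documents occupy the top two positions, even though the query is in Arabic and the documents are in English.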




## Conclusion

In this notebook, we learned how Rerank can refine search results based on their semantic relevance to the search query. Reranking can be applied to semi-structured data, tabular data, multilingual data, and more.