# Extracting Structured Information from Unstructured Data

Sales representatives often have to read through call transcripts, extract the relevant details, and manually enter them into Salesforce tables, which is time-consuming. With You.com's APIs, it's possible to automate the extraction of structured information from unstructured data such as call transcripts. The following sections of this notebook demonstrate how the APIs answer questions based on the transcript and, when the transcript does not contain enough information, search the web to find the most relevant answer.

### Setup environment

In [196]:
# The YDC_API_KEY and OPENAI_API_KEY should be defined in a .env file
# Let's load the API keys in from the .env file
import dotenv

dotenv.load_dotenv(".env", override=True)

True
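`load_dotenv` returns `True` as soon as at least one variable is set, so the output above does not confirm that both keys are present. A minimal sketch of an explicit check follows; the helper name `missing_keys` is hypothetical and not part of the original notebook.

```python
import os

# Key names taken from the comment in the setup cell above
REQUIRED_KEYS = ("YDC_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required keys that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]
```

Calling `missing_keys()` right after loading the `.env` file and raising an error when the returned list is non-empty surfaces a misconfigured environment before any API request is made.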

### Sample Transcript

The following is a sample transcript of a call between Alex, a sales representative from Tech Solutions, and Taylor, an IT Analyst at You.com. Alex has booked the call with Taylor to understand the pain points Taylor's company might be facing.

In [197]:
transcript = """
Sales Representative: Hi, this is Alex from Tech Solutions. Is this Taylor?

Prospect: Yes, this is Taylor.

Sales Representative: Great! How are you today, Taylor?

Prospect: I'm doing well, thanks. How about you?

Sales Representative: I'm doing well, thank you! If you don't mind me asking, Taylor,  what is your role at You.com?

Prospect: I'm an IT Analyst at You.com. I manage the IT infrastructure and help resolve any technical issues that arise.

Sales Representative: That's great! I appreciate you taking the time to speak with me. 
The reason for my call is that I noticed youdotcom has been growing rapidly, and I wanted to see 
if there's a way we can help streamline some of your IT processes. Do you have a few minutes to discuss this?

Prospect: Sure, I can spare a few minutes.

Sales Representative: Excellent. To give you a bit of context, at Tech Solutions, we specialize in 
providing comprehensive IT management software that helps companies like yours save time and reduce costs. 
Can you tell me a bit about your current IT setup and any challenges you might be facing?

Prospect: Well, we have a small team managing our IT infrastructure, and sometimes it gets overwhelming.
We’ve been experiencing a few issues with network downtime and managing our software licenses.

Sales Representative: I understand. Network downtime and software management can definitely be challenging. 
How much downtime are you currently experiencing, and how is it impacting your operations?

Prospect: We’ve had a few incidents in the past month, resulting in a couple of hours of downtime each time. 
It's affecting our productivity and causing some frustration among the team.

Sales Representative: That sounds frustrating indeed. At Tech Solutions, our software is designed to minimize
downtime by proactively monitoring your network and automating many of the manual tasks that can lead to errors and delays. 
Additionally, our license management feature ensures that all your software licenses are up-to-date and compliant, 
reducing the risk of unexpected downtime. Would a solution like this be of interest to you?

Prospect: It does sound helpful. How does it work exactly?

Sales Representative: Great question! Our software integrates seamlessly with your existing infrastructure.
It provides real-time monitoring and alerts for any potential issues, allowing your team to address them before
they cause downtime. It also automates software updates and license renewals, so you don’t have to worry about them manually.
Would you be interested in seeing a demo of how this works?

Prospect: Yes, a demo would be useful.

Sales Representative: Fantastic! I can schedule a demo for you with one of our specialists.
They can walk you through the features and show you how it can specifically benefit You.com.
Does tomorrow at 2 PM work for you?

Prospect: Tomorrow at 2 PM works for me.

Sales Representative: Perfect. I’ll send you a calendar invite with all the details.
Before we wrap up, do you have any other questions or concerns that you'd like to address?

Prospect: Not at the moment, but I’m looking forward to the demo.

Sales Representative: Great to hear. Thanks again for your time, Taylor. 
I’ll send over the invite shortly, and I look forward to speaking with you tomorrow.

Prospect: Thank you, Alex. Talk to you then.

Sales Representative: Take care!
"""

After the call, Alex wants to update the Salesforce table with how the call went. The following is the list of questions Alex wants answered. Note that not all of them can be answered from the transcript alone; some require searching the web for relevant information.

In [188]:
questions = """
what is the name of the prospect?
what is the prospect's role?
what is the name of the company?
what is the company's website URL?
what is company's LinkedIn URL?
what is the market segment the prospect operates in?
what are the pain points expressed by the prospect?
what are the follow up actions?
what is the overall sentiment of the conversation?
"""

### Extract structured information

Given a transcript, the following function extracts structured information and answers specific questions about it.
If the answer to a question is not in the transcript, You.com searches the web for a relevant answer.

In [189]:
import requests
import os

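def get_structural_info(transcript, questions, mode="research"):
    """Ask the You.com chat API to answer `questions` about `transcript`.

    `mode` selects the endpoint path; this notebook uses "research" (default) and "smart".
    """
    # The API key is read from the environment loaded in the setup cell
    headers = {'x-api-key': os.environ['YDC_API_KEY']}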
    endpoint = f"https://chat-api.you.com/{mode}"

    params = {"query": f"""
            You are a helpful assistant who is given a transcript between a sales representative and a prospect.
            Your task is to extract the information from the transcript as well as the web to answer the following questions:
            {questions}

            The transcript you need to analyze is as follows:
            {transcript}

            Convert relative time references to absolute time references.
            Return the extracted information as a JSON object.
            """
        }
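    response = requests.get(endpoint, params=params, headers=headers)
    # Fail early on HTTP errors (e.g. an invalid API key) instead of parsing an error body
    response.raise_for_status()
    return response.json()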
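The prompt asks the model to convert relative time references to absolute ones, but the sample transcript never states when the call took place. One possible approach, sketched here as an assumption rather than as part of the original notebook, is to prepend the call date to the transcript before sending it. The helper `with_call_date` and the `Call date:` label are hypothetical.

```python
from datetime import date

def with_call_date(transcript: str, call_date: date) -> str:
    """Prefix the transcript with the call date in ISO format (YYYY-MM-DD)."""
    return f"Call date: {call_date.isoformat()}\n{transcript}"

# Hypothetical usage (the date below is illustrative only):
# get_structural_info(with_call_date(transcript, date(2024, 6, 3)), questions, mode="smart")
```

Whether the model then resolves a phrase like "tomorrow at 2 PM" into a concrete date depends on the model, so the result should still be checked.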

### Results
Display the answers to Alex's questions:

In [195]:
structural_info = get_structural_info(transcript, questions, mode="smart")
print(structural_info["answer"])

{
  "name": "Taylor",
  "prospect_role": "IT Analyst",
  "company_name": "You.com",
  "company_website": "https://www.you.com",
  "company_linkedin": "https://www.linkedin.com/company/you-com",
  "market_segment": "IT Solutions",
  "pain_points": [
    "Network downtime",
    "Managing software licenses"
  ],
  "follow_up_actions": [
    "Schedule a demo for tomorrow at 2 PM"
  ],
  "sentiment": "Positive"
}


Note that information such as `company_website` and `company_linkedin` is not available in the transcript; these values come from the web search. The API can also answer questions such as the overall sentiment of the conversation by leveraging the underlying foundation LLMs.
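The `answer` field is printed as text, so it has to be parsed before its values can be written into table columns. The sketch below assumes the answer is a string containing a single JSON object, possibly surrounded by extra prose or markdown fencing. Both helper names are hypothetical, and joining list values with `"; "` is just one way to fit them into a flat record.

```python
import json

def parse_answer(answer: str) -> dict:
    """Parse the span from the first '{' to the last '}' of `answer` as JSON."""
    start = answer.find("{")
    end = answer.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in answer")
    return json.loads(answer[start:end + 1])

def flatten_record(info: dict) -> dict:
    """Join list values into '; '-separated strings; leave other values unchanged."""
    return {
        key: "; ".join(map(str, value)) if isinstance(value, list) else value
        for key, value in info.items()
    }
```

With the notebook's variables, `flatten_record(parse_answer(structural_info["answer"]))` would yield one flat dictionary per call, assuming the model returned valid JSON.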