In [1]:
%reload_ext autoreload
%autoreload 2

In [2]:

from typing import List, Optional, AsyncGenerator
from datetime import datetime
from pydantic import Field
from core import BaseCall, Msg, system_msg, user_msg
import pandas as pd
from sampling import WeightedSampler, fetch_joined_data
from query_planner import QueryPlan, SearchQuery

# Fetch the data
df = await fetch_joined_data()

# Initialize sampler with the data
sampler = WeightedSampler(
    df=df.drop(columns=['other_speakers']),
    date_column='meeting_timestamp',
    decay_factor=0.2
)

Building vector index...
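
The meaning of `decay_factor=0.2` is not documented in this notebook. The `recency_score` values printed in later outputs are consistent with exponential decay at 0.2 per calendar day: two rows dated 2024-10-09 and 2024-10-18 differ by a factor of about e^1.8, roughly 6.05. The sketch below illustrates that reading only. `recency_scores` is a hypothetical helper, and the normalization to a sum of 1 is an assumption, not `WeightedSampler`'s actual code.

```python
import math
from datetime import datetime

def recency_scores(timestamps, decay_factor=0.2, now=None):
    """Hypothetical: exp(-decay_factor * age in calendar days), normalized to sum to 1."""
    now = now or max(timestamps)
    raw = [
        math.exp(-decay_factor * (now.date() - ts.date()).days)
        for ts in timestamps
    ]
    total = sum(raw)
    return [r / total for r in raw]

# Two meeting timestamps nine calendar days apart
stamps = [datetime(2024, 10, 9, 17, 37), datetime(2024, 10, 18, 15, 23)]
older, newer = recency_scores(stamps)
print(newer / older)  # about 6.05, i.e. e**1.8
```

Under this reading a note loses about 18% of its recency weight per day, so notes older than a few weeks contribute almost nothing in `mode='recency'`.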



In [3]:
from typing import List
from pydantic import Field
from core import BaseCall

class SourcesResponse(BaseCall):
    relevant_sources: List[int] = Field(
        description="List of the indices of the relevant sources from the context, at least 10",
        default_factory=list
    )

In [4]:
from typing import List
from pydantic import Field
from core import BaseCall

class SearchResponse(BaseCall):
    response: str = Field(
        description="A comprehensive analysis drawing from the provided context. Direct and concise answer."
    )
    relevant_sources: List[int] = Field(
        description="List of the indices of the relevant sources from the context, at least 10",
        default_factory=list
    )


In [5]:
def get_similar_entries(sampler: WeightedSampler, relevant_df: pd.DataFrame, n_samples: int = 100) -> pd.DataFrame:
    """Find entries similar to the relevant sources, excluding the sources themselves."""

    # Build one query string per relevant source from its non-null text fields
    text_columns = ['topic_name', 'summary', 'details']
    query_texts = []
    for _, row in relevant_df.iterrows():
        query_parts = [str(row[col]) for col in text_columns if pd.notna(row[col])]
        query_texts.append(' '.join(query_parts))

    # Sample using combined scoring, weighted mostly toward similarity
    similar_results = sampler.sample(
        query=query_texts,
        n_samples=n_samples,
        mode='combined',
        recency_weight=0.1,
        similarity_weight=0.8,
        filter_weight=0.1
    )

    # Remove the original sources from results.
    # Note: this matches on index labels, so relevant_df must carry the same
    # labels as the sampler's DataFrame; a reset_index(drop=True) upstream breaks the match.
    similar_results = similar_results[~similar_results.index.isin(relevant_df.index)]

    return similar_results
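
How `mode='combined'` mixes the three weights is not shown in this notebook. The `combined_score` column in the later output is consistent with a weighted sum of the three component scores divided by a constant, for example so that the scores sum to 1 and can serve as sampling probabilities. A minimal sketch of that reading follows; `combined_scores` is a hypothetical helper and the normalization is an assumption.

```python
def combined_scores(rows, recency_weight=0.1, similarity_weight=0.8, filter_weight=0.1):
    """Hypothetical: weighted sum of component scores, normalized to sum to 1."""
    if not rows:
        return []
    raw = [
        recency_weight * r["recency_score"]
        + similarity_weight * r["similarity_score"]
        + filter_weight * r["filter_score"]
        for r in rows
    ]
    total = sum(raw)
    if total == 0:
        # Fall back to a uniform distribution when every raw score is zero
        return [1.0 / len(rows)] * len(rows)
    return [x / total for x in raw]

# Component scores rounded from two rows of the similar_entries output
rows = [
    {"recency_score": 2.78e-05, "similarity_score": 0.610468, "filter_score": 1.0},
    {"recency_score": 5.59e-04, "similarity_score": 0.096198, "filter_score": 1.0},
]
scores = combined_scores(rows)
print(scores[0] / scores[1])  # about 3.32
```

The printed ratio of about 3.32 matches the ratio of the two rows' `combined_score` values (0.000561 and 0.000169), which is what motivates this reading.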


In [6]:

# Sample recent data
sampled_df = sampler.sample(
    n_samples=500,
    mode='recency',
    recency_weight=1.0,
    similarity_weight=0.0
).sort_values('meeting_timestamp', ascending=True).reset_index(drop=True)

In [7]:
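# similar_entries is only defined in cell In [19] further down; run that cell first,
# otherwise this line raises the NameError shown here.
sampled_df = pd.concat([sampled_df, similar_entries]).drop_duplicates()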


NameError: name 'similar_entries' is not defined

In [10]:
len(sampled_df)

500

In [11]:
user_query = "user feedback of vexa, be specific, list specific users"

In [12]:
context_str = "\n".join([
    f"[{i}] {row['meeting_timestamp'].strftime('%Y-%m-%d')}, "
    f"{row['speaker_name']}, "
    f"{row['topic_name']}, "
    f"{row['details']}"
    for i, (_, row) in enumerate(sampled_df.iterrows())
])

In [13]:
from core import count_tokens
count_tokens(context_str)

23251
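
At roughly 23k tokens for 500 notes, the context may need trimming for models with a smaller window. Below is a minimal sketch that keeps whole notes until a budget is reached. `pack_lines` and the whitespace-based counter are stand-ins introduced for illustration. In the notebook, `count_tokens` from `core` could be passed instead, assuming it takes a string and returns an int as the call above suggests.

```python
from typing import Callable, List

def pack_lines(lines: List[str], max_tokens: int, count: Callable[[str], int]) -> List[str]:
    """Keep lines in order until adding the next one would exceed max_tokens."""
    kept: List[str] = []
    used = 0
    for line in lines:
        cost = count(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return kept

def whitespace_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word
    return len(text.split())

notes = ["a b c", "d e", "f g h i"]
print(pack_lines(notes, max_tokens=5, count=whitespace_count))  # ['a b c', 'd e']
```

Because `sampled_df` is sorted oldest first, truncating from the end drops the most recent notes. Reverse the list before packing if recent notes matter most.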

In [15]:
messages = [
    system_msg("""You are Vexa, a helpful search assistant trained by Vexa AI. Your task is to deliver comprehensive 
analytical reports drawing from the given context. Your analysis must be detailed, well-structured, and written in 
a professional business report style.

CRITICAL RULES:
- Structure your response in clear sections
- Provide specific details, dates, and speaker attributions
- Include relevant statistics or patterns if present
- Synthesize information across multiple meetings
- Rate context_sufficiency based on depth and breadth of available information
- Select relevant_sources based on information significance, not just mention
- If information is missing or unclear, explicitly state what's not covered

REPORT STRUCTURE:
1. Key Findings (2-3 main points)
2. Detailed Analysis (by subtopic or chronologically)
3. Notable Quotes or Specific Examples
4. Patterns or Trends (if applicable)
5. Gaps or Limitations in Available Information"""),
    
    user_msg(f"""Below are numbered meeting notes. Each note contains:
- Date
- Speaker
- Topic
- Details

Numbered Context:
{context_str}

Question: {user_query}

Provide a comprehensive analysis using only information from these notes. Include specific details, dates, and speaker attributions.""")
]

In [15]:
async for response in SearchResponse.call_stream(messages):
    pass
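
The loop above discards intermediate chunks and relies on the loop variable keeping the last yielded value. The sketch below shows the same pattern with a stand-in async generator. `fake_stream` and `last_item` are hypothetical, and the sketch assumes, as the loop implies, that `call_stream` yields progressively more complete objects. It also covers an empty stream, which would otherwise leave `response` undefined.

```python
import asyncio
from typing import AsyncIterator, Optional, TypeVar

T = TypeVar("T")

async def last_item(stream: AsyncIterator[T]) -> Optional[T]:
    """Drain an async iterator and return the final item, or None if it was empty."""
    last: Optional[T] = None
    async for item in stream:
        last = item
    return last

async def fake_stream():
    # Stand-in for a streaming call that yields partial results
    for partial in ["The", "The user", "The user feedback"]:
        await asyncio.sleep(0)
        yield partial

final = asyncio.run(last_item(fake_stream()))
print(final)  # The user feedback
```

Inside a notebook, where an event loop is already running, use `final = await last_item(fake_stream())` instead of `asyncio.run`.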

In [16]:
from IPython.display import Markdown
Markdown(response.response)

The user feedback for Vexa highlights both positive experiences and areas for improvement. Users appreciate the clean interface and the ability to generate useful summaries from meeting transcripts. However, there are concerns regarding the functionality of certain features, such as button clickability and the effectiveness of the transcription process. Specific users mentioned include Robert Hangu, who noted the time-saving benefits of using Vexa alongside ChatGPT, and Dmitriy Grankin, who emphasized the need for better filtering and search capabilities based on user requests. Overall, while the product shows promise, ongoing adjustments and enhancements are necessary to meet user expectations fully.

In [18]:
sampled_df.iloc[response.relevant_sources]

Unnamed: 0,summary_index,summary,details,referenced_text,topic_name,topic_type,meeting_id,meeting_timestamp,speaker_name,filter_score,recency_score,similarity_score,combined_score
196,3,Robert outlined his workflow for using Vexa in...,He described how he uses Vexa to transcribe ca...,Robert Hangu: So what I ended up doing was I ...,Robert's workflow using Vexa and ChatGPT,task,4ff12d56-f520-4391-a283-03d6365b49e6,2024-09-09 10:46:17.724000,Robert Hangu,1.0,1e-06,0.0,0.000313
203,8,Vexa is a real-time assistant for meetings tha...,Robert Hangu provides positive feedback about ...,Robert Hangu: Let me just sign in so that we ...,Vexa,product,4ff12d56-f520-4391-a283-03d6365b49e6,2024-09-09 10:46:17.724000,Robert Hangu,1.0,1e-06,0.0,0.000313
198,6,"Dmitry Grankin is a speaker in the meeting, di...","Dmitry is involved in the development of Vexa,...",Dmitry Grankin: used virxa for a few times ye...,Dmitry Grankin,person,4ff12d56-f520-4391-a283-03d6365b49e6,2024-09-09 10:46:17.724000,Dmitry Grankin,1.0,1e-06,0.0,0.000313
481,2,"Dmitriy provided an update on his startup VEX,...","Currently, VEX has around 200 free users. Dmit...",Sergio Goriachev: Как дела с VEX? Сколько кли...,VEX User Growth and Feedback,update,2647ed9b-411e-46dc-ba53-c08ef2887479,2024-10-18 15:23:41.975999,Dmitriy Grankin,1.0,0.003379,0.0,0.000316
230,6,Olga plans to create targeted Google Ads campa...,Olga aims to develop Google Ads campaigns that...,"Olga Nemirovskaya: Для Google Ads, то, что я ...",Google Ads campaign planning,action plan,d06b1c93-f225-40cd-8448-e9efd790336e,2024-09-10 19:05:27.756000,Olga Nemirovskaya,1.0,2e-06,0.0,0.000313
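
`iloc` raises `IndexError` if the model returns a position outside the sampled frame, negative positions silently wrap around to the end, and repeated positions produce repeated rows. A small guard can filter the list before the lookup. It is sketched here with a hypothetical `valid_indices` helper:

```python
from typing import Iterable, List

def valid_indices(indices: Iterable[int], n_rows: int) -> List[int]:
    """Keep in-range integer positions, dropping duplicates but preserving order."""
    seen = set()
    kept: List[int] = []
    for i in indices:
        is_int = isinstance(i, int) and not isinstance(i, bool)
        if is_int and 0 <= i < n_rows and i not in seen:
            seen.add(i)
            kept.append(i)
    return kept

print(valid_indices([196, 203, 203, 999, -1], n_rows=500))  # [196, 203]
```

In the notebook this would be used as `sampled_df.iloc[valid_indices(response.relevant_sources, len(sampled_df))]`.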


In [19]:

# Usage example
relevant_sources = sampled_df.iloc[response.relevant_sources]
similar_entries = get_similar_entries(sampler, relevant_sources, n_samples=500)
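
Because `sampled_df` was rebuilt with `reset_index(drop=True)`, the index labels of `relevant_sources` are positions in the sample, not the labels used by the sampler's DataFrame. The index-based exclusion inside `get_similar_entries` may therefore not remove the intended rows. One alternative, sketched below with toy data, is to exclude by a content key. Treating `(meeting_id, summary_index)` as a unique key is an assumption based on the columns visible in the outputs, and `exclude_by_key` is a hypothetical helper.

```python
import pandas as pd

def exclude_by_key(candidates: pd.DataFrame, exclude: pd.DataFrame,
                   keys=("meeting_id", "summary_index")) -> pd.DataFrame:
    """Drop rows of `candidates` whose key columns match any row of `exclude`."""
    keys = list(keys)
    cand_keys = pd.MultiIndex.from_frame(candidates[keys])
    excl_keys = pd.MultiIndex.from_frame(exclude[keys])
    return candidates[~cand_keys.isin(excl_keys)]

# Toy frames: index labels differ, but the second candidate shares a key with `exclude`
candidates = pd.DataFrame(
    {"meeting_id": ["a", "a", "b"], "summary_index": [3, 8, 2], "topic_name": ["x", "y", "z"]},
    index=[405, 1664, 2095],
)
exclude = pd.DataFrame({"meeting_id": ["a"], "summary_index": [8]}, index=[0])

print(exclude_by_key(candidates, exclude).index.tolist())  # [405, 2095]
```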

In [20]:
similar_entries

Unnamed: 0,summary_index,summary,details,referenced_text,topic_name,topic_type,meeting_id,meeting_timestamp,speaker_name,filter_score,recency_score,similarity_score,combined_score
405,9,Vexa is an AI assistant designed to facilitate...,Dmitriy explains how Vexa can transcribe meeti...,"Dmitriy Grankin: Tá no início, eu estou testa...",Vexa,product,1c084be2-5c19-4b05-95d7-b337219acf75,2024-09-24 15:36:13.840000,Dmitriy Grankin,1.0,2.780953e-05,0.610468,0.000561
1664,13,Vexa is a meeting transcription tool that prov...,Dmitry highlights its unique feature of not ha...,,Vexa,product,247a9f4e-1182-401e-bf61-95a6c3a12137,2024-08-30 10:29:05.146000,Dmitry Grankin,1.0,1.873791e-07,0.606071,0.000558
2095,0,"The team discussed the features of VEXA, a rea...",VEXA aims to assist users in real-time by prov...,"Dmitriy Grankin: Вот мы это стартап сольды, с...",Functionality of VEXA,discussion,f0d5f231-b866-4cb4-b59d-73899f2e0dc9,2024-09-04 15:00:03.180000,Dmitriy Grankin,1.0,5.093493e-07,0.597101,0.000551
1258,8,"VEX is a startup mentioned by Dmitriy Grankin,...",Dmitriy mentioned that VEX has around 200 free...,"Dmitriy Grankin: Ну, у VEX количество пользов...",VEX,company,2647ed9b-411e-46dc-ba53-c08ef2887479,2024-10-18 15:23:41.975999,Dmitriy Grankin,1.0,3.379148e-03,0.586021,0.000543
2502,12,Vexa is a product being discussed in the meeti...,The team is considering how to improve user ac...,Olga Nemirovskaya: Вот. Можно ответить так.,Vexa,product,0eceb16a-7732-4e91-931b-ab588771dd41,2024-09-06 19:03:43.560000,Olga Nemirovskaya,1.0,7.598599e-07,0.577688,0.000536
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2414,6,Обсуждение концепции нового лендинга для повыш...,Обсуждение необходимости обновления лендинга с...,"Dmitrii Bashkirov: Вот. В целом, конечно, над...",Обсуждение концепции лендинга,discussion,fa6923af-1655-4564-bc97-6f436e0e664e,2024-09-06 14:38:04.040000,Дмитрий Гранкин,1.0,7.598599e-07,0.110736,0.000180
2215,2,"Обсуждение проблемы с транскриптами, которые н...","Транскрипты, полученные из Redis, имеют пробле...",: У нас почему-то по результатам SingData час...,Транскрипты из Redis,concern,c244ce5e-7145-4c6d-b16d-b1766f04e9bf,2024-09-05 12:19:08.931999,nan (4),1.0,6.221207e-07,0.109795,0.000179
2694,19,Delaware is mentioned as the location where on...,The conversation touches on the logistical asp...,Eugene Tartakovsky: Если он работает из короб...,Delaware,location,081d04c6-827d-4375-af68-9f8fbd50b707,2024-09-10 12:02:21.644000,Dmitry Grankin,1.0,1.691099e-06,0.103847,0.000175
1001,0,The discussion revolves around the relevance o...,The speaker questions the necessity of grillin...,: тестирую тестирую да какой смысл делать гри...,Purpose of grilling during a gathering,idea,ecadb973-6e3d-4c14-aab7-a0951957b2b1,2024-10-09 17:37:05.070000,,1.0,5.585694e-04,0.096198,0.000169


In [158]:
from core import generic_call_stream

In [122]:
from IPython.display import Markdown
Markdown(output)

Matt Lewis is mentioned in the context of considering a visit to Bali, where his friends are involved in business [2024-08-30]. There are no further details provided about his role or background in the meeting notes.