## Setup and init

### Environment and dependencies

1. Create a virtual environment, e.g., `python -m venv .venv`, and activate it.
1. Install dependencies with `pip install -r requirements-dev.txt`.
1. Copy `.env.example` to `.env`.
   1. Fill in the required values for the target Cosmos DB that holds the conversation data you're interested in.

## Get conversation data

In [None]:
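# Helper functions for querying and shaping the conversation history
from history import *
from dotenv import load_dotenv
import pandas as pd
import json

# Load the Cosmos DB settings from .env into the environment
load_dotenv()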

### Date filtering

Results can be filtered by start date, end date, or both. Providing both returns the conversations between the two dates, inclusive. Passing `None` for the start date returns everything up to the end date, and passing `None` for the end date returns everything from the start date onward. Passing `None` for both returns all results.

Note that dates can be simple, like `2024-01-30`, but can also be more targeted by providing time details, like `2024-01-30T10:15:00Z`.
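
The filtering rules described above can be illustrated on a small in-memory DataFrame. This is only a sketch of the semantics, not the implementation in the `history` module: `filter_by_date` and the `sample` data are hypothetical names introduced for illustration, and the real query runs against Cosmos DB.

```python
import pandas as pd


def filter_by_date(frame, start_date=None, end_date=None, column="timestamp"):
    """Hypothetical helper: keep rows where `column` lies in [start_date, end_date].

    A bound set to None is simply not applied.
    """
    mask = pd.Series(True, index=frame.index)
    if start_date is not None:
        mask &= frame[column] >= start_date
    if end_date is not None:
        mask &= frame[column] <= end_date
    return frame[mask]


# Made-up timestamps, one per day around the range used in this notebook
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-28 09:00",
        "2024-01-29 12:00",
        "2024-01-30 00:00",
        "2024-01-31 08:30",
    ])
})

both = filter_by_date(sample, pd.to_datetime("2024-01-29"), pd.to_datetime("2024-01-30"))
after = filter_by_date(sample, start_date=pd.to_datetime("2024-01-29"))
before = filter_by_date(sample, end_date=pd.to_datetime("2024-01-30"))
everything = filter_by_date(sample)

print(len(both), len(after), len(before), len(everything))
```

In this sketch a date-only bound such as `2024-01-30` is parsed as midnight at the start of that day, so a row stamped later on 30 January would fall outside the range. If the real query behaves the same way, add a time component to the end date to include the whole day.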

In [2]:
# Get conversations for a date range (also supports only one date being provided)
start_date = pd.to_datetime("2024-01-29")
end_date = pd.to_datetime("2024-01-30")

# Uncomment to use None for start_date and end_date to get all conversations
# start_date = None
# end_date = None

### Query

In [5]:
dataset = get_conversations(start_date=start_date, end_date=end_date)
df = pd.DataFrame(dataset)

df.head(3)

| | id | timestamp | response_timestamp | user_query | conversation_id | context | chat_response |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | c515b325-31d8-4902-8eb0-bfdf8aa91551 | 2024-01-29T23:59:31.206075 | 2024-01-29T23:59:54.315737 | {'id': 'ecccf511-35f3-a6e8-2881-02af2addf17a',... | ecccf511-35f3-a6e8-2881-02af2addf17a | {'citations': [{'content': 'Microsoft Research... | {'id': '', 'model': 'gpt-4', 'created': 170657... |
| 1 | 030777f3-f05a-4177-bd8a-57ff4f418262 | 2024-01-29T23:57:41.405727 | 2024-01-29T23:58:13.572655 | {'id': 'fcdd1c9d-dff4-73c4-a137-9dba99990983',... | fcdd1c9d-dff4-73c4-a137-9dba99990983 | {'citations': [{'content': '. GPT -4 and the ... | {'id': '', 'model': 'gpt-4', 'created': 170657... |
| 2 | 6e8e6bd7-56e8-4abd-845e-a5e6649d8a8a | 2024-01-29T23:56:22.674996 | 2024-01-29T23:57:08.228601 | {'id': '508917c2-bc62-ecda-bb98-d6f940379334',... | c3cb145e-63be-cc87-7cba-4ca2f15d0f78 | {'citations': [{'content': 'Title: Research Fo... | {'id': '', 'model': 'gpt-4', 'created': 170657... |


### Extend with calculated columns

In [6]:
# Adds calculated columns to the DataFrame in place: 'user_input', 'answer', 'duration', 'turn_count'
extend_dataframe(df)
df.head(3)

| | id | timestamp | response_timestamp | user_query | conversation_id | context | chat_response | user_input | answer | duration | turn_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | c515b325-31d8-4902-8eb0-bfdf8aa91551 | 2024-01-29 23:59:31.206075 | 2024-01-29 23:59:54.315737 | {'id': 'ecccf511-35f3-a6e8-2881-02af2addf17a',... | ecccf511-35f3-a6e8-2881-02af2addf17a | {'citations': [{'content': 'Microsoft Research... | {'id': '', 'model': 'gpt-4', 'created': 170657... | What kind of problems is MSR's AI research try... | Microsoft Research's AI research is focused on... | 23.109662 | 1 |
| 1 | 030777f3-f05a-4177-bd8a-57ff4f418262 | 2024-01-29 23:57:41.405727 | 2024-01-29 23:58:13.572655 | {'id': 'fcdd1c9d-dff4-73c4-a137-9dba99990983',... | fcdd1c9d-dff4-73c4-a137-9dba99990983 | {'citations': [{'content': '. GPT -4 and the ... | {'id': '', 'model': 'gpt-4', 'created': 170657... | Can you summarize the key challenges tackled b... | The Microsoft Research Forum for this year add... | 32.166928 | 1 |
| 2 | 6e8e6bd7-56e8-4abd-845e-a5e6649d8a8a | 2024-01-29 23:56:22.674996 | 2024-01-29 23:57:08.228601 | {'id': '508917c2-bc62-ecda-bb98-d6f940379334',... | c3cb145e-63be-cc87-7cba-4ca2f15d0f78 | {'citations': [{'content': 'Title: Research Fo... | {'id': '', 'model': 'gpt-4', 'created': 170657... | what datasets were used in novel ways by Micro... | In the NeurIPS 2023 submissions, Microsoft res... | 45.553605 | 4 |

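The `duration` values in the output are consistent with the number of seconds between `timestamp` and `response_timestamp`. The sketch below reproduces that calculation on the first two rows. It is an assumption about how the column could be derived, not the code inside `extend_dataframe`, and `sample` is a hypothetical stand-in for `df`.

```python
import pandas as pd

# First two timestamp pairs copied from the query output
sample = pd.DataFrame({
    "timestamp": ["2024-01-29T23:59:31.206075", "2024-01-29T23:57:41.405727"],
    "response_timestamp": ["2024-01-29T23:59:54.315737", "2024-01-29T23:58:13.572655"],
})

# Parse the ISO strings into datetime values so they can be subtracted
for column in ("timestamp", "response_timestamp"):
    sample[column] = pd.to_datetime(sample[column])

# Elapsed time between the user's query and the response, in seconds
sample["duration"] = (sample["response_timestamp"] - sample["timestamp"]).dt.total_seconds()

print(sample["duration"].round(6).tolist())
```

How `turn_count` is derived is not visible from the output, so it is left out of this sketch.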

## Save to Excel

In [None]:
filename = 'chat_history.xlsx'

# Create a new DataFrame from the results
out_dataset = pd.DataFrame(df, columns=[
    'id', 
    'conversation_id', 
    'turn_count', 
    'timestamp', 
    'response_timestamp', 
    'duration', 
    'user_input', 
    'answer', 
    'context'
])

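# Write the selected columns to an Excel file, without the index column
# (pandas requires an Excel engine such as openpyxl to write .xlsx files)
output_file_path = filename
out_dataset.to_excel(output_file_path, index=False)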

out_dataset.head(4)
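
If a column such as `context` holds nested Python objects rather than strings (an assumption, since the notebook does not show the cell types), it can help to serialize those values to JSON text before exporting so that each cell contains a plain string. A minimal sketch, where `to_json_text`, `sample`, and `export_ready` are hypothetical names:

```python
import json

import pandas as pd

# Made-up rows with a nested 'context' value, standing in for out_dataset
sample = pd.DataFrame({
    "id": ["a1", "b2"],
    "context": [{"citations": [{"content": "first"}]}, {"citations": []}],
})


def to_json_text(value):
    """Return dicts and lists as JSON strings; leave other values unchanged."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False)
    return value


# Work on a copy so the original frame keeps its nested objects
export_ready = sample.copy()
export_ready["context"] = export_ready["context"].map(to_json_text)

print(export_ready["context"].tolist())
```

The JSON text can be turned back into Python objects later with `json.loads`.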