# Data generation pipeline


As context, the state receives information about the **company**, the **employees**, and the **project**.


Company information will contain:
- Company Overview
- Current Project
- Team Structure
- Technology Stack

Project information will contain:
- Project Overview
- Technology Stack
- Project Component

Employee information will contain:
- Name
- Role 
- Responsibilities
- Skills


TODO: 
- It might be useful to have a **project state** document that tracks progress.
- This information could be set or generated by another pipeline, but for now it is fine to keep it simple.



Steps: 
1. Define the meeting type (e.g., Sprint Planning, Backend Architecture Discussion, UI/UX Design Review, Project Status Update, Daily Standup)
2. Outline the key topics and structure for the meeting
3. Consider the participants and their roles
4. Determine the desired length of the meeting
5. Generate the transcript, ensuring natural conversation flow and relevant technical details
6. Review and refine the transcript if necessary
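
The steps above form a linear chain in which each step reads a shared state and adds one field to it. A minimal sketch of that idea in plain Python, without any LLM calls: `run_steps`, `choose_type`, and `outline_topics` are hypothetical names used only for illustration, and the returned strings are placeholders for model output.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def run_steps(state: State, steps: List[Step]) -> State:
    """Apply each step in order; every step reads and extends the shared state."""
    for step in steps:
        state = step(state)
    return state


# Hypothetical stand-ins for the LLM-backed steps.
def choose_type(state: State) -> State:
    return {**state, "meeting_type": "Daily Standup"}


def outline_topics(state: State) -> State:
    return {**state, "meeting_outline": f"Outline for a {state['meeting_type']}"}


result = run_steps({"company_data": "..."}, [choose_type, outline_topics])
print(result["meeting_outline"])
```

Each later step can rely on the fields written by earlier ones, which is why the order of the steps matters.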




**Meeting purpose generator:**

``` text
Prompt: Generate a brief description of the purpose of a meeting, based on the company and project information and the current project state. Since the company uses an agile methodology, the meeting type could be sprint planning, review, or retrospective.
```

**Meeting Type Selector**
``` text
Prompt: Given the following meeting types: [list of meeting types], select the most appropriate type for a meeting about [brief description of meeting purpose].
```

**Topic Outliner**
``` text
Prompt: For a [selected meeting type] meeting, generate an outline of key topics that should be discussed. Include at least [X] main points and [Y] sub-points for each.
```

**Participant Definer**
``` text
Prompt: Based on the [selected meeting type] and [topic outline], list the necessary participants for this meeting. For each participant, provide their name, role, and key responsibilities in the context of this project. Here is the list of participants and their information: [list of participants].
```

**Meeting Length Estimator**
``` text
Prompt: Considering the [topic outline] and [list of participants], estimate an appropriate length for this meeting in minutes. Provide a brief justification for your estimate. Average words per minute of conversation is 120. 
```


**Conversation Generator**
``` text
Prompt: Using the [topic outline], [list of participants], and [estimated meeting length], generate a realistic conversation transcript for this meeting. Ensure each participant contributes according to their role, and that the conversation flows naturally while covering all outlined topics. The transcript should reflect approximately [X] words per minute of conversation. 

IMPORTANT: The conversation must be annotated with the name of the person who is speaking.

Example:
####
[Sarah]: Good morning, everyone! Welcome to ...
[Alex]: Thanks, Sarah. In the last sprint, we ...
[Emily]: We also finalized the responsive ...
#### 
```
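
Because every line of the transcript is annotated as `[Name]: text`, the output can be split into speaker turns afterwards. A minimal sketch, assuming the model follows the format shown in the example: `parse_transcript` and `SPEAKER_LINE` are hypothetical helpers that are not part of the notebook, and the sample text is made up.

```python
import re
from typing import List, Tuple

# Matches lines such as "[Sarah]: Good morning, everyone!"
SPEAKER_LINE = re.compile(r"^\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.*)$")


def parse_transcript(transcript: str) -> List[Tuple[str, str]]:
    """Split a transcript into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((match.group("speaker"), match.group("text")))
        elif turns:
            # Unlabelled line: treat it as a continuation of the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
    return turns


sample = """[Sarah]: Good morning, everyone!
[Alex]: Thanks, Sarah.
In the last sprint, we shipped login.
[Emily]: We also finalized the layout."""

turns = parse_transcript(sample)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

Unlabelled lines that appear before the first speaker are dropped in this sketch; a stricter version could raise an error instead.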


In [23]:
import os
from dotenv import load_dotenv
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, List
import operator
from langgraph.checkpoint.sqlite import SqliteSaver
from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, AIMessage, ChatMessage
from langchain.callbacks.tracers import LangChainTracer
from langchain_anthropic import ChatAnthropic
from langchain import PromptTemplate

load_dotenv()
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
#model = ChatAnthropic(model="claude-3-haiku-20240307", anthropic_api_key=anthropic_api_key)
model = ChatAnthropic(model="claude-3-5-sonnet-20240620", anthropic_api_key=anthropic_api_key, max_tokens= 8192)
# Set up LangChain tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# LANGCHAIN_API_KEY is expected to be set in the environment or in the .env file loaded above; do not hard-code secrets in the notebook.
os.environ["LANGCHAIN_PROJECT"] = "data_generation"

# Utils

In [3]:
def initialize_data_generation_state(
    company_data: str,
    project_general: str,
    project_requirements: str,
    project_sprint_state: str,
    project_backlog: str,
    employee_profiles: str,
    meeting_history: str,
) -> DataGenerationState:
    return DataGenerationState(
        company_data=company_data,
        project_general=project_general,
        project_requirements=project_requirements,
        project_sprint_state=project_sprint_state,
        project_backlog=project_backlog,
        employee_profiles=employee_profiles,
        meeting_history=meeting_history,
        meeting_purpose="",
        meeting_type="",
        meeting_outline="",
        meeting_participants="",
        meeting_length="",
        transcript="",
    )

def load_markdown_to_str(file_path):
    with open(file_path, 'r', encoding='utf-8') as md_file:
        markdown_content = md_file.read()
    return markdown_content


# Agents

In [2]:
class DataGenerationState(TypedDict):
    company_data: str  # init
    project_general: str  # init
    project_requirements: str  # init; needs updating, could be taken from the project state
    project_sprint_state: str
    project_backlog: str
    employee_profiles: str  # init
    meeting_history: str  # init; needs updating, could be taken from the project state
    meeting_purpose: str  # [optional] generated by the 1st node
    meeting_type: str  # generated by the 2nd node
    meeting_outline: str  # generated by the 3rd node
    meeting_participants: str
    meeting_length: str
    transcript: str


In [4]:
MEETING_PURPOSE_GENERATOR_PROMPT ="""
You are the Scrum Master in a company. Your work is passed to a pipeline whose goal is to generate realistic meeting transcripts by simulating a small tech company.
Your responsibility is to write a brief description of the next meeting.
Here you will find information about the company that you advise as a Scrum Master: \n #### \n {company_data} \n #### \n
Here you will find information about the employees and their detailed profiles: \n #### \n {employee_profiles} \n #### \n
The company is currently working on this project: \n #### \n {project_general} \n #### \n
Here you will find the requirements that need to be fulfilled: \n #### \n {project_requirements} \n #### \n
Here you will find information about the overall project status and the state of the current sprint: \n #### \n {project_sprint_state} \n #### \n
Here you will find information about the current status of the backlog: \n #### \n {project_backlog} \n #### \n
Here you will find the past meetings that have taken place: \n #### \n {meeting_history} \n #### \n


As an experienced Scrum Master, your task is to help move the project forward. To achieve this, you need to decide what the team should discuss in their next meeting.

Generate a brief description of the purpose of the meeting,
based on the company and project information and the current project state.
Since the company uses an agile methodology, the meeting type could be one of:
[Sprint planning meeting, 
Daily Scrum meeting, 
Backlog Refinement, 
Sprint review meeting, 
Sprint retrospective meeting,
Technical Debt Meetings, 
Design or Architecture Sessions]
Refer back to the project state and the meeting history to see where the project stands right now.
Write the description so that the project can move forward.
If there are any technical considerations that need to be addressed, include them in your description.
State the name of the meeting and the date, with start and end times, when the meeting takes place.
"""

In [5]:
MEETING_TPYE_SELECTOR ="""
You are the Meeting Type Selector in a pipeline for generating realistic meeting transcripts.
Your responsibility is to determine the most appropriate type of meeting based on the given input.
You will be provided with a brief description of the meeting's purpose.

Description:
####
{meeting_purpose}
####
Choose from the following meeting types:
- Sprint planning meeting: Technical considerations are discussed when planning tasks for the upcoming sprint, including potential challenges and solutions.
- Daily Scrum meeting: While this is primarily for quick updates, team members can briefly mention technical challenges they're facing. More in-depth discussions are typically taken offline.
- Backlog Refinement: The team discusses and clarifies user stories, which often involves addressing technical aspects and potential solutions.
- Sprint review meeting: The team demonstrates completed work, which can lead to technical discussions about implementation details.
- Sprint retrospective meeting: Team members can bring up technical issues that affected the sprint and discuss ways to improve.
- Technical Debt Meetings: Some teams hold separate meetings to address technical debt and architectural concerns.
- Design or Architecture Sessions: These are ad-hoc meetings focused on solving specific technical problems or planning system architecture.
Respond only with the selected meeting type.
"""
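
The selector prompt asks for the bare meeting type, but a model may still add punctuation, a list marker, or a short sentence around it. A minimal sketch of a post-processing check, assuming the seven types listed in the prompt: `MEETING_TYPES` and `normalize_meeting_type` are hypothetical names that do not exist in the notebook.

```python
from typing import Optional

# The meeting types offered in the selector prompt.
MEETING_TYPES = [
    "Sprint planning meeting",
    "Daily Scrum meeting",
    "Backlog Refinement",
    "Sprint review meeting",
    "Sprint retrospective meeting",
    "Technical Debt Meetings",
    "Design or Architecture Sessions",
]


def normalize_meeting_type(response: str) -> Optional[str]:
    """Map a raw model response onto one known meeting type, or None."""
    cleaned = response.strip().strip("-.\"' ").lower()
    # First try an exact, case-insensitive match.
    for meeting_type in MEETING_TYPES:
        if cleaned == meeting_type.lower():
            return meeting_type
    # Otherwise accept the response only if it mentions exactly one known type.
    matches = [t for t in MEETING_TYPES if t.lower() in cleaned]
    return matches[0] if len(matches) == 1 else None


print(normalize_meeting_type("- daily scrum meeting."))
print(normalize_meeting_type("Lunch"))
```

Returning `None` for an unrecognized answer lets the caller decide whether to retry the model call or fall back to a default type.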

In [6]:
TOPIC_OUTLINER_PROMPT = """
You are the Topic Outliner in a meeting transcript generation pipeline. Your task is to create a structured flow or outline of the meeting. 
Take into consideration that the meeting will be a {meeting_type}.
The purpose of the meeting is the following: \n####\n {meeting_purpose} \n####\n

You can find information about the company that you advise as a Scrum Master: \n####\n {company_data} \n####\n
You can find information about the employees and their detailed profiles: \n####\n {employee_profiles} \n####\n 
You can find general information about the project that the team is currently working on: \n####\n {project_general} \n####\n
You can find information about the state of the project: \n####\n  {project_sprint_state} \n####\n
You can find information about the current state of the backlog: \n####\n  {project_backlog} \n####\n

Provide the outline and flow of the main topics, ideas, and technical problems that need to be discussed in the meeting, with a short description of each.
"""

In [7]:
PARTICIPANT_DEFINER_PROMPT = """
You are the Participant Definer in a meeting transcript generation pipeline.
Your role is to determine the necessary participants for the meeting based on the meeting type and topic outline.

The meeting type: {meeting_type}
The meeting purpose: {meeting_purpose}
The meeting outline: {meeting_outline}

When deciding who needs to attend the meeting, take the employee profiles into consideration.

Here you can find the detailed descriptions of the employees who can participate:
####
{employee_profiles}
####
Provide a list of the participants, including their names, roles, and key responsibilities relevant to the meeting topics.
"""


In [8]:
MEETING_LENGTH_ESTIMATOR = """
You are the Meeting Length Estimator in a meeting transcript generation pipeline.
Your job is to estimate an appropriate length for the meeting.
Take into consideration that the meeting will be a {meeting_type}.
The purpose of the meeting is the following: {meeting_purpose}.
This will be the outline of the meeting: {meeting_outline}

Note: the final output will be generated by a large language model that can respond with at most 8192 tokens, which is ≈ 5461 to 6301 words.
An average person speaks at a rate of about 125-150 words per minute in normal conversation. 
If we assume that about 70 percent of meeting time involves active speaking, the average pace is ~85-105 words per minute.
You can calculate the needed minutes with: total_length_in_words / average_pace
You can calculate the needed tokens with: meeting_length_in_minutes * average_pace * 1.5
When you decide how long the meeting will be, make sure that the transcript fits within the 8192-token response limit. 
If it does not, respond with: "MORE TURNS NEEDED"
"""
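
The arithmetic in this prompt can be checked directly. A minimal sketch using only the numbers stated in the prompt (85-105 words per minute, 1.5 tokens per word, an 8192-token response limit): `estimate_budget` and the constant names are hypothetical and not part of the notebook.

```python
MAX_OUTPUT_TOKENS = 8192  # response limit named in the prompt
TOKENS_PER_WORD = 1.5     # token-to-word ratio used in the prompt
PACE_WPM = (85, 105)      # effective words per minute from the prompt


def estimate_budget(minutes: float) -> dict:
    """Estimate the transcript size for a meeting of the given length."""
    low_words = minutes * PACE_WPM[0]
    high_words = minutes * PACE_WPM[1]
    high_tokens = high_words * TOKENS_PER_WORD
    return {
        "words": (low_words, high_words),
        "max_tokens_needed": high_tokens,
        "fits_single_response": high_tokens <= MAX_OUTPUT_TOKENS,
    }


print(estimate_budget(15))  # a 15-minute Daily Scrum
print(estimate_budget(60))  # a one-hour meeting
```

With these assumptions a 15-minute meeting needs at most 2362.5 tokens and fits in one response, while a 60-minute meeting needs up to 9450 tokens and does not.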


In [9]:
TRANSCRIPT_GENERATOR_PROMPT = """
You are the Conversation Generator, the final node in a meeting transcript generation pipeline.
Your task is to create a realistic meeting transcript based on all previous inputs.
You will receive the meeting type, topic outline, list of participants, and estimated meeting length.

Here is the necessary information; use it as context. Each section is separated by four hash characters (####). Treat these as delimiters. 
Information about the company: \n####\n {company_data} \n####\n
Information about the current project that the team is working on this contains general information about the project: \n####\n {project_general} \n####\n
Information about the employees and their detailed profile: \n####\n {employee_profiles} \n####\n
Information about the state of the project: \n####\n {project_sprint_state} \n####\n
Information about the current state of the backlog: \n####\n {project_backlog} \n####\n
Information about the past meetings that happened: \n####\n {meeting_history} \n####\n

Here is the necessary information about the transcript. Use this to generate the final transcript. 
Meeting type: {meeting_type}\n
Meeting purpose: \n{meeting_purpose}\n
Meeting outline: \n {meeting_outline}\n
Meeting participants - who actually takes part in the meeting: \n {meeting_participants} \n
Meeting estimated length: \n {meeting_length} \n


Generate a transcript that follows the topic outline and includes contributions from all participants according to their roles. 
Note: Only include those participants who have been listed as actual participants.
The transcript should be in a format where each speaker's name is in square brackets, followed by their dialogue. 
Ensure the conversation flows naturally and covers all outlined topics while maintaining realism and relevance to a software development project.
IMPORTANT: 
Be very verbose.
Respond only with the transcript.
When you have finished, say "FINISHED". This is very important! It will cost you a lot if you don't say that! 
"""

In [10]:
def meeting_purpose_node(state: DataGenerationState) -> DataGenerationState:
    formatted_prompt = MEETING_PURPOSE_GENERATOR_PROMPT.format(
        company_data=state.get("company_data"),
        employee_profiles=state.get("employee_profiles"),
        project_general=state.get("project_general"),
        project_requirements=state.get("project_requirements"),
        project_sprint_state=state.get("project_sprint_state"),
        project_backlog=state.get("project_backlog"),
        meeting_history=state.get("meeting_history"),
    )
    messages = [
        SystemMessage(content=formatted_prompt),
        HumanMessage(content="Generate a meeting purpose based on the provided information."),
    ]
    response = model.invoke(messages)
    state["meeting_purpose"] = response.content
    print(state["meeting_purpose"])
    return state

In [28]:
state = initialize_data_generation_state(
    company_data=load_markdown_to_str(file_path="../data/company-data/company-data.md"),
    project_general=load_markdown_to_str(file_path= "../data/company-data/current-project-general.md"),
    project_requirements=load_markdown_to_str(file_path="../data/company-data/current-project-requirements.md"),
    project_sprint_state=load_markdown_to_str(file_path="../data/company-data/current-project-sprint-state.md"),
    project_backlog=load_markdown_to_str(file_path="../data/company-data/current-project-backlog.md"),
    employee_profiles =load_markdown_to_str(file_path="../data/company-data/employee-profiles.md"),
    meeting_history =load_markdown_to_str(file_path="../data/company-data/current-meeting-history.md")
)
meeting_purpose_node(state=state)

Meeting Name: Daily Scrum Meeting

Date: June 20, 2023
Time: 9:00 AM - 9:15 AM

Purpose:
The purpose of this Daily Scrum meeting is to kick off the first day of Sprint 1 for the HealthTrack Pro project. As the sprint has just begun, this meeting will focus on ensuring all team members are aligned with their initial tasks and ready to start work. 

Key points to address:
1. Each team member will briefly share their plan for the day, focusing on the tasks they're starting with from the sprint backlog.
2. Discuss any immediate blockers or concerns, particularly regarding the project setup and core architecture that Alex Rodriguez is responsible for.
3. Confirm that everyone has access to necessary resources and development environments.
4. Briefly touch on the integration of the tech stack (React.js, Node.js, PostgreSQL) to identify any potential early challenges.
5. Ensure Liam Foster is on track to share the initial mockups by end of day, as this will be crucial for the frontend and bac

{'company_data': '# TechNova Solutions\n\n## Company Overview\nTechNova Solutions is a small, dynamic IT company specializing in web application development. With a team of 6 skilled professionals, they focus on creating innovative, user-friendly web solutions for small to medium-sized businesses.\n\n## Current Project: HealthTrack Pro\nTechNova is developing HealthTrack Pro, a comprehensive web application for personal health management. This application allows users to track their daily activities, nutrition, and health metrics, and provides insights and recommendations for a healthier lifestyle.\n\n## Team Structure\n1. ** Sarah Chen - Project Manager / Scrum Master**\n   - Oversees project progress, manages timelines, and facilitates communication\n   - Has a background in both frontend and backend development\n\n2. ** Alex Rodriguez - Senior Full-Stack Developer**\n   - Leads technical decisions and architecture design\n   - Proficient in both frontend and backend technologies\n\n

In [11]:

def meeting_type_node(state: DataGenerationState) -> DataGenerationState:
    formatted_prompt = MEETING_TPYE_SELECTOR.format(
        meeting_purpose=state.get("meeting_purpose"),
    )
    messages = [
        SystemMessage(content=formatted_prompt),
        HumanMessage(content="Generate a meeting type based on the provided information."),
    ]
    response = model.invoke(messages)
    state["meeting_type"] = response.content
    print(state["meeting_type"])
    return state


In [29]:
meeting_type_node(state=state)

Daily Scrum meeting



In [12]:
def topic_outliner_node(state: DataGenerationState) -> DataGenerationState:
    formatted_prompt = TOPIC_OUTLINER_PROMPT.format(
        meeting_type=state.get("meeting_type"),
        meeting_purpose=state.get("meeting_purpose"),
        company_data=state.get("company_data"),
        employee_profiles=state.get("employee_profiles"),
        project_general=state.get("project_general"),
        project_sprint_state=state.get("project_sprint_state"),
        project_backlog=state.get("project_backlog"),
    )
    messages = [
        SystemMessage(content=formatted_prompt),
        HumanMessage(content="Generate the meeting topics and outline based on the provided information."),
    ]
    response = model.invoke(messages)
    state["meeting_outline"] = response.content
    print(state["meeting_outline"])
    return state

In [30]:
topic_outliner_node(state)

Here's an outline for the Daily Scrum meeting based on the provided information:

1. Introduction and Sprint Kickoff (1 minute)
   - Sarah Chen welcomes the team to the first day of Sprint 1
   - Briefly remind the team of the sprint goal and duration

2. Individual Updates (8 minutes, ~1.5 minutes per person)
   a. Alex Rodriguez
      - Progress on project structure and core architecture setup
      - Any immediate technical decisions or challenges

   b. Emily Watson
      - Initial frontend tasks for user authentication
      - Plans for health dashboard component structure

   c. Michael Kim
      - Starting points for backend models and authentication system
      - Any early database design considerations

   d. Olivia Martinez
      - Outline of testing strategy and CI/CD pipeline setup
      - Any immediate security considerations for user authentication

   e. Liam Foster
      - Status update on initial mockups
      - Plans for user authentication and health dashboard desig


In [13]:
def meeting_length_estimator_node(state: DataGenerationState) -> DataGenerationState:
    formatted_prompt = MEETING_LENGTH_ESTIMATOR.format(
        meeting_type=state.get("meeting_type"),
        meeting_purpose=state.get("meeting_purpose"),
        meeting_outline=state.get("meeting_outline"),
    )
    messages = [
        SystemMessage(content=formatted_prompt),
        HumanMessage(content="Generate the meeting length based on the provided information."),
    ]
    response = model.invoke(messages)
    state["meeting_length"] = response.content
    print(state["meeting_length"])
    return state

In [31]:
meeting_length_estimator_node(state=state)

Based on the provided information and considerations, let's estimate the meeting length:

1. The scheduled time for the Daily Scrum meeting is 15 minutes (9:00 AM - 9:15 AM).
2. The outline suggests a well-structured meeting that covers all necessary points within this timeframe.
3. Given the nature of a Daily Scrum, especially on the first day of a sprint, 15 minutes is an appropriate duration to keep the meeting focused and efficient.

Calculating the potential transcript length:
- 15 minutes of meeting time
- Assuming 85-105 words per minute (lower end due to the structured nature of the meeting)
- 15 * 85 = 1,275 words (minimum estimate)
- 15 * 105 = 1,575 words (maximum estimate)

Token estimation:
- 1,575 words * 1.5 (token to word ratio) = 2,362.5 tokens

This falls well within the 8,192 token limit for the AI's response.

Therefore, the estimated meeting length is 15 minutes, which aligns with the scheduled time and should provide sufficient content for a comprehensive transcri


MORE TURNS NEEDED: we need to add a conditional edge so that, if "MORE TURNS NEEDED" appears in `meeting_length`, transcript generation continues over multiple turns.
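
One way to express that decision is a small routing function that inspects the state. A minimal sketch, not the notebook's implementation: `route_after_length_estimate` and the labels `"single_turn"` and `"multi_turn"` are hypothetical, and the commented wiring assumes a hypothetical node named `generate_long_transcript_node` that does not exist in this notebook.

```python
def route_after_length_estimate(state: dict) -> str:
    """Return a route label based on the length estimator's answer."""
    if "MORE TURNS NEEDED" in state.get("meeting_length", ""):
        return "multi_turn"
    return "single_turn"


# With LangGraph, a function like this could be registered roughly as follows
# (sketch only; the second target node is an assumption):
#
# workflow.add_conditional_edges(
#     "participant_definer_node",
#     route_after_length_estimate,
#     {
#         "single_turn": "generate_transcript_node",
#         "multi_turn": "generate_long_transcript_node",
#     },
# )

print(route_after_length_estimate({"meeting_length": "MORE TURNS NEEDED"}))
print(route_after_length_estimate({"meeting_length": "15 minutes"}))
```

A plain substring check is enough here because the estimator prompt asks for that exact phrase.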


In [14]:
def participant_definer_node(state: DataGenerationState) -> DataGenerationState:
    formatted_prompt = PARTICIPANT_DEFINER_PROMPT.format(
        meeting_type=state.get("meeting_type"),
        meeting_purpose=state.get("meeting_purpose"),
        meeting_outline=state.get("meeting_outline"),
        employee_profiles=state.get("employee_profiles"),
    )
    messages = [
        SystemMessage(content=formatted_prompt),
        HumanMessage(content="List the participants who need to be present at the meeting based on the provided information."),
    ]
    response = model.invoke(messages)
    state["meeting_participants"] = response.content
    print(state["meeting_participants"])
    return state

In [32]:
participant_definer_node(state=state)

Based on the meeting type (Daily Scrum), purpose, and outline, here's the list of participants who need to be present at the meeting:

1. Sarah Chen - Project Manager / Scrum Master
   - Key responsibilities: Facilitate the Daily Scrum meeting, ensure the team is aligned with sprint goals, address any immediate concerns or blockers, and promote collaboration.

2. Alex Rodriguez - Senior Full-Stack Developer
   - Key responsibilities: Report on project structure and core architecture setup, share any immediate technical decisions or challenges.

3. Emily Watson - Frontend Developer
   - Key responsibilities: Update on initial frontend tasks for user authentication and plans for health dashboard component structure.

4. Michael Kim - Backend Developer
   - Key responsibilities: Share progress on backend models and authentication system, discuss any early database design considerations.

5. Olivia Martinez - QA Engineer / DevOps Specialist
   - Key responsibilities: Provide updates on tes


In [81]:
state["transcript"] = ""
state

{'company_data': '# TechNova Solutions\n\n## Company Overview\nTechNova Solutions is a small, dynamic IT company specializing in web application development. With a team of 6 skilled professionals, they focus on creating innovative, user-friendly web solutions for small to medium-sized businesses.\n\n## Current Project: HealthTrack Pro\nTechNova is developing HealthTrack Pro, a comprehensive web application for personal health management. This application allows users to track their daily activities, nutrition, and health metrics, and provides insights and recommendations for a healthier lifestyle.\n\n## Team Structure\n1. ** Sarah Chen - Project Manager / Scrum Master**\n   - Oversees project progress, manages timelines, and facilitates communication\n   - Has a background in both frontend and backend development\n\n2. ** Alex Rodriguez - Senior Full-Stack Developer**\n   - Leads technical decisions and architecture design\n   - Proficient in both frontend and backend technologies\n\n

In [15]:
def generate_transcript_node(state: DataGenerationState) -> DataGenerationState:
    formatted_prompt = TRANSCRIPT_GENERATOR_PROMPT.format(
        company_data=state.get("company_data"),
        project_general=state.get("project_general"),
        employee_profiles=state.get("employee_profiles"),
        project_sprint_state=state.get("project_sprint_state"),
        project_backlog=state.get("project_backlog"),
        meeting_history=state.get("meeting_history"),
        meeting_type=state.get("meeting_type"),
        meeting_purpose=state.get("meeting_purpose"),
        meeting_outline=state.get("meeting_outline"),
        meeting_participants=state.get("meeting_participants"),
        meeting_length=state.get("meeting_length"),
    )
    messages = [
        SystemMessage(content=formatted_prompt),
        HumanMessage(content="Generate the transcript based on the provided information."),
    ]

    full_transcript = ""
    more_turns_needed = True
    turn_count = 0
    max_turns = 5  # hard cap so the loop ends even if "FINISHED" never appears

    while more_turns_needed and turn_count < max_turns:
        response = model.invoke(messages)
        turn_transcript = response.content

        full_transcript += turn_transcript

        if "FINISHED" in turn_transcript:
            # The model signalled that the transcript is complete.
            more_turns_needed = False
        else:
            # Feed the partial transcript back and ask the model to continue.
            messages.append(AIMessage(content=turn_transcript))
            messages.append(HumanMessage(content="Continue the transcript from where you left off."))

        turn_count += 1
        print(turn_count)

    state["transcript"] = full_transcript
    print(state["transcript"])
    return state
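
The continuation logic can be exercised without calling a model. A minimal sketch, assuming the same "FINISHED" sentinel and turn cap as the node above: `collect_turns` and `fake_model` are hypothetical stand-ins, and unlike the node, this version also removes the sentinel from the final text.

```python
from typing import Callable, List

SENTINEL = "FINISHED"


def collect_turns(generate: Callable[[List[str]], str], max_turns: int = 5) -> str:
    """Call `generate` until it emits the sentinel or the turn cap is reached."""
    history: List[str] = []
    for _ in range(max_turns):
        chunk = generate(history)
        history.append(chunk)
        if SENTINEL in chunk:
            break
    # Join all chunks and drop the sentinel so it does not leak into the data.
    return "".join(history).replace(SENTINEL, "").rstrip()


# Hypothetical stub standing in for model.invoke: it answers in two chunks.
def fake_model(history: List[str]) -> str:
    chunks = ["[Sarah]: Hello. ", "[Alex]: Hi. FINISHED"]
    return chunks[len(history)]


transcript = collect_turns(fake_model)
print(transcript)
```

Stripping the sentinel is a design choice of this sketch; whether the stored transcript should keep it depends on how the data is consumed downstream.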

In [33]:
generate_transcript_node(state=state)

1
[Sarah Chen]: Good morning, everyone! Welcome to our first Daily Scrum of Sprint 1 for the HealthTrack Pro project. I hope you're all excited to kick things off. Just a quick reminder, our sprint goal is to "Implement core user authentication and health dashboard functionality, laying the foundation for HealthTrack Pro's MVP." We've got two weeks ahead of us, so let's make them count. Let's go around and hear what everyone's planning to work on today. Alex, why don't you start us off?

[Alex Rodriguez]: Morning, team. Today, I'm focusing on setting up our project structure and core architecture. I've already created the basic React and Node.js project structures. I'm planning to set up our PostgreSQL database and configure the initial connection today. One thing I want to flag - I'm a bit concerned about how we'll handle data model scalability as our user base grows. I'll be looking into this as I work on the architecture.

[Sarah Chen]: Thanks, Alex. That's a good point about scalab


# Graph

In [16]:
def create_data_generation_workflow(checkpointer):
    workflow = StateGraph(DataGenerationState)

    workflow.add_node("meeting_purpose_node", meeting_purpose_node)
    workflow.add_node("meeting_type_node", meeting_type_node)
    workflow.add_node("topic_outliner_node", topic_outliner_node)
    workflow.add_node("meeting_length_estimator_node", meeting_length_estimator_node)
    workflow.add_node("participant_definer_node", participant_definer_node)
    workflow.add_node("generate_transcript_node", generate_transcript_node)

    workflow.add_edge("meeting_purpose_node", "meeting_type_node")
    workflow.add_edge("meeting_type_node", "topic_outliner_node")
    workflow.add_edge("topic_outliner_node", "meeting_length_estimator_node")
    workflow.add_edge("meeting_length_estimator_node", "participant_definer_node")
    workflow.add_edge("participant_definer_node", "generate_transcript_node")
    workflow.add_edge("generate_transcript_node", END)

    workflow.set_entry_point("meeting_purpose_node")

    return workflow.compile(checkpointer=checkpointer)

# Create and visualize the workflow
with SqliteSaver.from_conn_string(":memory:") as memory:
    graph = create_data_generation_workflow(memory)
    print(graph.get_graph().draw_mermaid())

%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
	__start__([<p>__start__</p>]):::first
	meeting_purpose_node(meeting_purpose_node)
	meeting_type_node(meeting_type_node)
	topic_outliner_node(topic_outliner_node)
	meeting_length_estimator_node(meeting_length_estimator_node)
	participant_definer_node(participant_definer_node)
	generate_transcript_node(generate_transcript_node)
	__end__([<p>__end__</p>]):::last
	__start__ --> meeting_purpose_node;
	generate_transcript_node --> __end__;
	meeting_length_estimator_node --> participant_definer_node;
	meeting_purpose_node --> meeting_type_node;
	meeting_type_node --> topic_outliner_node;
	participant_definer_node --> generate_transcript_node;
	topic_outliner_node --> meeting_length_estimator_node;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc
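
The workflow above is a straight chain, so its edge list follows directly from the order of the node names. A minimal sketch of that idea in plain Python, without LangGraph; `linear_edges` and the `"END"` string are hypothetical stand-ins for illustration and are not defined in this notebook:

```python
from typing import List, Tuple

# Node names in the order the notebook wires them together.
NODE_ORDER = [
    "meeting_purpose_node",
    "meeting_type_node",
    "topic_outliner_node",
    "meeting_length_estimator_node",
    "participant_definer_node",
    "generate_transcript_node",
]


def linear_edges(nodes: List[str], end: str = "END") -> List[Tuple[str, str]]:
    """Pair each node with its successor and route the last node to `end`."""
    if not nodes:
        return []
    pairs = list(zip(nodes, nodes[1:]))
    pairs.append((nodes[-1], end))
    return pairs


edges = linear_edges(NODE_ORDER)
for source, target in edges:
    print(f"{source} --> {target}")
```

Each resulting pair could be passed to `workflow.add_edge`, with the final target swapped for the real `END` constant. Deriving the edges from one list keeps the order in a single place when a node is added or moved.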



# Run Workflow

In [35]:
import os
import json
from datetime import datetime

def export_transcript(state, folder_path):
    # Create the folder first so that listing it below cannot fail
    os.makedirs(folder_path, exist_ok=True)

    # Number the transcript after the files already in the target folder
    files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]
    file_id = len(files) + 1
    transcript = state["transcript"]

    # Build the filename from the first line of the meeting purpose (the text after the last colon)
    filename = state["meeting_purpose"].split("\n")[0].split(":")[-1].replace(" ", "") + str(file_id) + ".txt"

    # Construct the full file path
    file_path = os.path.join(folder_path, filename)
    
    # Write the string to a text file
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(transcript)

def export_state(state, folder_path):
    # Create the folder first so that listing it below cannot fail
    os.makedirs(folder_path, exist_ok=True)

    # Number the log after the files already in the folder
    files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]
    file_id = len(files) + 1

    filename = f"state_log_{file_id}.json"
    
    # Construct the full file path
    file_path = os.path.join(folder_path, filename)
    
    # Write the state dict to a JSON file
    with open(file_path, 'w', encoding='utf-8') as f:
        json.dump(state, f, indent=4)

def update_meeting_history(state, file_path="../data/company-data/current-meeting-history.md"):
    # Extract relevant information from the state
    participants = state.get("meeting_participants", "")
    meeting_time = state.get("meeting_time", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    purpose = state.get("meeting_purpose", "")

    # Prepare the prompt for the OpenAI model
    prompt = f"""
    Create a concise summary of a meeting for a meeting history log. Use the following information:

    Name: 
    Participants: {participants}
    Time: {meeting_time}
    Purpose: {purpose}

    Format the summary as a single paragraph, starting with the date and time, followed by a brief description of the meeting and its outcomes. Keep it under 100 words.
    """

    # The system message sets the assistant's role; the human message carries the task
    messages = [
        SystemMessage("You are a professional assistant tasked with creating concise meeting summaries."),
        HumanMessage(prompt),
    ]
    response = model.invoke(messages)
    
    summary = response.content.strip()

    # Read the existing content of the file
    with open(file_path, 'r', encoding='utf-8') as f:
        existing_content = f.read()

    # Append the new summary to the existing content
    updated_content = f"{existing_content}\n\n{summary}"

    # Write the updated content back to the file
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(updated_content)

    print(f"Meeting history updated: {file_path}")




In [34]:
export_transcript(state=state, folder_path="../data/transcript-workflows")
export_state(state=state, folder_path="../data/logs")
update_meeting_history(state=state, file_path="../data/company-data/current-meeting-history.md")

Meeting history updated: ../data/company-data/current-meeting-history.md


In [38]:
def run_meeting_step():
    state = initialize_data_generation_state(
        company_data=load_markdown_to_str(file_path="../data/company-data/company-data.md"),
        project_general=load_markdown_to_str(file_path="../data/company-data/current-project-general.md"),
        project_requirements=load_markdown_to_str(file_path="../data/company-data/current-project-requirements.md"),
        project_sprint_state=load_markdown_to_str(file_path="../data/company-data/current-project-sprint-state.md"),
        project_backlog=load_markdown_to_str(file_path="../data/company-data/current-project-backlog.md"),
        employee_profiles=load_markdown_to_str(file_path="../data/company-data/employee-profiles.md"),
        meeting_history=load_markdown_to_str(file_path="../data/company-data/current-meeting-history.md"),
    )
    all_states = []
    
    with SqliteSaver.from_conn_string(":memory:") as memory:
        thread = {"configurable": {"thread_id": "1"}}
        graph = create_data_generation_workflow(memory)
        for s in graph.stream(state, thread):
            print(s)
            all_states.append(s)

        last_state = {key: value for key, value in all_states[-1].items()}
        export_transcript(state=last_state["generate_transcript_node"], folder_path="../data/transcript-workflows")
        export_state(state=last_state["generate_transcript_node"], folder_path="../data/logs")
        update_meeting_history(state=last_state["generate_transcript_node"], file_path="../data/company-data/current-meeting-history.md")
        return last_state
    
        

In [43]:
run_meeting_step()

Meeting Name: Sprint 2 Planning Meeting

Date: July 4, 2023
Time: 10:00 AM - 1:00 PM

Purpose:
The purpose of this Sprint 2 Planning Meeting is to define the goals and backlog for the upcoming sprint, addressing key challenges identified in Sprint 1 and continuing progress on the HealthTrack Pro MVP. The team will focus on:

1. Reviewing the product backlog and prioritizing items for Sprint 2, with emphasis on completing unfinished tasks from Sprint 1 and progressing with Basic Activity Tracking features.

2. Discussing and implementing more conservative estimation techniques to avoid overcommitment, as identified in the previous sprint retrospective.

3. Addressing technical considerations, including:
   - Strategies for resolving CI/CD pipeline issues
   - Prototyping solutions for data model scalability concerns
   - Planning the integration of third-party APIs, considering potential documentation delays

4. Defining clear acceptance criteria for each user story to improve task comp

{'generate_transcript_node': {'company_data': '# TechNova Solutions\n\n## Company Overview\nTechNova Solutions is a small, dynamic IT company specializing in web application development. With a team of 6 skilled professionals, they focus on creating innovative, user-friendly web solutions for small to medium-sized businesses.\n\n## Current Project: HealthTrack Pro\nTechNova is developing HealthTrack Pro, a comprehensive web application for personal health management. This application allows users to track their daily activities, nutrition, and health metrics, and provides insights and recommendations for a healthier lifestyle.\n\n## Team Structure\n1. ** Sarah Chen - Project Manager / Scrum Master**\n   - Oversees project progress, manages timelines, and facilitates communication\n   - Has a background in both frontend and backend development\n\n2. ** Alex Rodriguez - Senior Full-Stack Developer**\n   - Leads technical decisions and architecture design\n   - Proficient in both frontend

In [90]:
state2 = initialize_data_generation_state(
    company_data=load_markdown_to_str(file_path="../data/company-data/company-data.md"),
    project_general=load_markdown_to_str(file_path="../data/company-data/current-project-general.md"),
    project_requirements=load_markdown_to_str(file_path="../data/company-data/current-project-requirements.md"),
    project_sprint_state=load_markdown_to_str(file_path="../data/company-data/current-project-sprint-state.md"),
    project_backlog=load_markdown_to_str(file_path="../data/company-data/current-project-backlog.md"),
    employee_profiles=load_markdown_to_str(file_path="../data/company-data/employee-profiles.md"),
    meeting_history=load_markdown_to_str(file_path="../data/company-data/current-meeting-history.md"),
)
# Run the workflow
all_states = []
with SqliteSaver.from_conn_string(":memory:") as memory:
    thread = {"configurable": {"thread_id": "1"}}
    graph = create_data_generation_workflow(memory)
    for s in graph.stream(state2, thread):
        all_states.append(s)

last_state = {key: value for key, value in all_states[-1].items()}
export_transcript(last_state["generate_transcript_node"], "../data/transcript_workflow")



Meeting Name: Sprint Review and Retrospective

Date: Tomorrow, 10:00 AM - 12:00 PM

Purpose:
The purpose of this meeting is to conduct a combined Sprint Review and Retrospective for Sprint 4. We will review the progress made on the nutrition logging feature and performance optimization efforts. The team will demonstrate completed work, discuss challenges encountered, and gather feedback for future improvements. We'll also reflect on our processes and identify areas for team enhancement. 

Key points to address:
1. Demo the current state of the nutrition logging feature
2. Review progress on performance optimization, including caching implementation
3. Discuss challenges with the Nutritionix API integration and proposed solutions
4. Evaluate the effectiveness of our current sprint planning and task allocation
5. Identify any roadblocks or bottlenecks in our development process
6. Plan action items for addressing technical debt, particularly database query optimization
7. Prepare for the

In [130]:
updates_dict = {key: value for key, value in all_states[-1].items()}
print(type(updates_dict))
print(updates_dict.keys())



<class 'dict'>
dict_keys(['generate_transcript_node'])


"Meeting Name: Sprint Review and Retrospective\n\nDate: Tomorrow, 10:00 AM - 12:00 PM\n\nPurpose:\nThe purpose of this meeting is to conduct a combined Sprint Review and Retrospective for Sprint 4. We will review the progress made on the nutrition logging feature and performance optimization efforts. The team will demonstrate completed work, discuss challenges encountered, and gather feedback for future improvements. We'll also reflect on our processes and identify areas for team enhancement. \n\nKey points to address:\n1. Demo the current state of the nutrition logging feature\n2. Review progress on performance optimization, including caching implementation\n3. Discuss challenges with the Nutritionix API integration and proposed solutions\n4. Evaluate the effectiveness of our current sprint planning and task allocation\n5. Identify any roadblocks or bottlenecks in our development process\n6. Plan action items for addressing technical debt, particularly database query optimization\n7. 
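
As the output above shows, the last item collected from `graph.stream` is a dict with a single key, the name of the node that ran last. A small sketch of reading that item without hard-coding the key; `final_node_output` is a hypothetical helper and the sample updates are made-up placeholders shaped like the ones seen here:

```python
from typing import Any, Dict, List, Tuple


def final_node_output(updates: List[Dict[str, Any]]) -> Tuple[str, Any]:
    """Return (node_name, output) from the last streamed update."""
    if not updates:
        raise ValueError("no updates were streamed")
    last = updates[-1]
    if len(last) != 1:
        raise ValueError(f"expected exactly one node in the last update, got {list(last)}")
    node_name, output = next(iter(last.items()))
    return node_name, output


# Placeholder data, assumed to mirror the shape of the streamed updates.
sample_updates = [
    {"meeting_purpose_node": {"meeting_purpose": "Meeting Name: Sprint Review and Retrospective"}},
    {
        "generate_transcript_node": {
            "meeting_purpose": "Meeting Name: Sprint Review and Retrospective",
            "transcript": "[Sarah Chen]: Good morning, everyone!",
        }
    },
]

node_name, node_state = final_node_output(sample_updates)
print(node_name)
print(sorted(node_state.keys()))
```

Failing loudly on an empty or multi-key update makes it easier to notice if the graph is later changed so that the transcript node is no longer the last step.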

In [137]:
name = updates_dict["generate_transcript_node"]["meeting_purpose"].split("\n")[0].split(":")[-1].replace(" ", "") + ".txt"
print(name)

files = [f for f in os.listdir("../data/transcript_workflow") if os.path.isfile(os.path.join("../data/transcript_workflow", f))]
print(files)

SprintReviewandRetrospective.txt
['Sprint Review and Retrospective.txt']
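
The two names printed above differ only in whitespace, which is easy to miss when checking whether a transcript already exists. A hedged sketch of deriving the filename in one place while also dropping characters that are awkward in filenames; `transcript_filename` is a hypothetical helper, not something this notebook defines:

```python
import re


def transcript_filename(meeting_purpose: str, suffix: str = "", extension: str = ".txt") -> str:
    """Build a filename from the first line of a meeting purpose string."""
    first_line = meeting_purpose.split("\n")[0]
    # Keep the text after the last colon, e.g. "Meeting Name: X" -> " X"
    title = first_line.split(":")[-1]
    # Drop everything except ASCII letters and digits so the name is filesystem-safe
    compact = re.sub(r"[^A-Za-z0-9]", "", title)
    return f"{compact}{suffix}{extension}"


purpose = "Meeting Name: Sprint Review and Retrospective\n\nDate: Tomorrow, 10:00 AM - 12:00 PM"
print(transcript_filename(purpose))
print(transcript_filename(purpose, suffix="3"))
```

Using the same helper both when writing a transcript and when looking for existing files would keep the two names consistent.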
