# Automated Meeting Analysis System Proof-of-Concept

This notebook provides a complete proof-of-concept for an Automated Meeting Analysis System. It includes the following functionalities:

- **Audio Transcription:** Converts a meeting audio file into text using speech recognition.
- **(Optional) Speaker Diarization:** Placeholder for speaker separation (can be integrated with tools like `pyannote.audio`).
- **Transcript Summarization:** Generates a concise summary using transformer models.
- **Keyword Extraction:** Extracts the top keywords from the meeting transcript.
- **Sentiment Analysis:** Analyzes the overall sentiment of the conversation.

You can extend or customize each section for your project needs.

## 1. Setup and Import Libraries

This section imports the required libraries. Install the dependencies with pip first. The `transformers` pipelines also need a deep learning backend; PyTorch (`torch`) is assumed here:

```bash
pip install SpeechRecognition transformers torch nltk
# Optionally, for speaker diarization:
# pip install pyannote.audio
```

In [None]:
import os
import re
import speech_recognition as sr
from transformers import pipeline
import nltk
from nltk.corpus import stopwords
from collections import Counter

# Download required NLTK resources
nltk.download('stopwords')

def clean_text(text):
    """
    Clean and normalize text for keyword extraction.
    """
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    return text

print('Setup complete: Libraries imported and resources downloaded.')

## 2. Audio Transcription

This cell transcribes the meeting audio. Place your meeting audio file in the same directory as the notebook and name it `meeting_audio.wav`, or point the `audio_file` variable at a different path. `sr.AudioFile` reads WAV, AIFF, and FLAC files, so convert other formats (such as MP3) before running the cell.

In [None]:
# Initialize the speech recognizer
r = sr.Recognizer()

# Set the audio file path
audio_file = 'meeting_audio.wav'

transcript = ""

if os.path.exists(audio_file):
    with sr.AudioFile(audio_file) as source:
        print('Reading audio file...')
        audio = r.record(source)
    try:
        # recognize_google sends the audio to Google's Web Speech API, so it needs network access
        transcript = r.recognize_google(audio)
        print('Transcription complete:')
        print(transcript)
    except sr.UnknownValueError:
        print("Error: Speech Recognition could not understand the audio.")
    except sr.RequestError as e:
        print(f"Error: Could not request results; {e}")
else:
    print(f"Error: Audio file '{audio_file}' not found. Please add your audio file.")
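Sending a long recording in one request may fail or time out, so one option is to transcribe it in fixed-size windows. The sketch below only computes the windows; `audio_windows` and the 30-second default are hypothetical choices for illustration, and the total duration is assumed to be known already.

```python
def audio_windows(total_seconds, window_seconds=30.0):
    """Return (offset, duration) pairs that cover total_seconds without overlap."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it does not run past the end
        duration = min(window_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += window_seconds
    return windows


windows = audio_windows(75.0, window_seconds=30.0)
print(windows)

# Each pair could then be fed to the recognizer from the cell above, e.g.:
#     with sr.AudioFile(audio_file) as source:
#         audio = r.record(source, offset=offset, duration=duration)
#     text = r.recognize_google(audio)
```

Fixed windows can split a word at a boundary, so the joined transcript may contain small errors near each cut.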

## 3. (Optional) Speaker Diarization

Separating speakers makes per-speaker analysis possible. This section is a placeholder where you can integrate a speaker diarization tool such as [pyannote.audio](https://github.com/pyannote/pyannote-audio).

For demonstration purposes, we assume the transcript is from a single speaker. If you integrate diarization, you could split the transcript by speaker and analyze each segment separately.
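As a rough illustration of splitting a transcript by speaker, the sketch below assigns each transcript segment to the speaker turn it overlaps most. It assumes you already have timestamped transcript segments and speaker turns as plain tuples; the tuple layout, the `assign_speakers` helper, and the sample data are all hypothetical and not the output format of any particular diarization library.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """
    segments: list of (start, end, text) transcript pieces.
    turns:    list of (start, end, speaker) diarization turns.
    Returns a dict mapping each speaker to their concatenated text.
    """
    by_speaker = {}
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        by_speaker.setdefault(best_speaker, []).append(text)
    return {speaker: " ".join(texts) for speaker, texts in by_speaker.items()}


# Hypothetical sample data
turns = [(0.0, 5.0, "SPEAKER_00"), (5.0, 12.0, "SPEAKER_01")]
segments = [
    (0.0, 4.0, "Let's review the budget."),
    (4.5, 9.0, "Costs are up ten percent."),
    (9.0, 12.0, "We should cut travel."),
]
per_speaker = assign_speakers(segments, turns)
for speaker, text in per_speaker.items():
    print(f"{speaker}: {text}")
```

Each per-speaker string could then be passed through the summarization, keyword, and sentiment steps separately.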

## 4. Transcript Summarization

Using Hugging Face's Transformers summarization pipeline, we create a brief summary of the meeting transcript.

In [None]:
if transcript:
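    summarizer = pipeline("summarization")
    # Adjust max_length and min_length parameters as needed.
    # truncation=True cuts input that exceeds the model's maximum length,
    # so the tail of a long meeting is ignored rather than raising an error.
    summary = summarizer(transcript, max_length=130, min_length=30, do_sample=False, truncation=True)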
    print("\nMeeting Summary:")
    print(summary[0]['summary_text'])
else:
    print("Skipping summarization: No transcript available.")
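Summarization models accept only a bounded amount of input, so a full meeting transcript may not fit in one call. One possible workaround, sketched below, is to split the transcript into word-bounded chunks and summarize each one. The `chunk_words` helper and the 400-word default are assumptions for illustration; word count is only a rough proxy for the model's token limit.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = chunk_words("one two three four five", max_words=2)
print(chunks)

# With the summarizer from the cell above, the chunks could be combined like this:
#     partial = [summarizer(c, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]
#                for c in chunk_words(transcript)]
#     print(" ".join(partial))
```

Chunking on word counts ignores sentence boundaries, so a chunk may start mid-sentence; splitting on sentences first would be a natural refinement.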

## 5. Keyword Extraction

This cell extracts keywords by cleaning the transcript, removing stopwords, and calculating the frequency of remaining words. The top 5 words are displayed as keywords.

In [None]:
if transcript:
    text_clean = clean_text(transcript)
    words = text_clean.split()

    # Remove stopwords (build the set once; calling stopwords.words() per word is slow)
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word not in stop_words]

    # Count word frequencies
    word_freq = Counter(filtered_words)
    keywords = word_freq.most_common(5)
    
    print("\nTop Keywords:")
    for word, freq in keywords:
        print(f"{word}: {freq}")
else:
    print("Skipping keyword extraction: No transcript available.")
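To see the frequency approach on a small input, here is a self-contained sketch that runs the same steps (lowercase, strip punctuation, drop stopwords, count) on a sample sentence. The `top_keywords` helper, the tiny `SAMPLE_STOPWORDS` set, and the sample text are hypothetical stand-ins so the example runs without NLTK.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the notebook itself uses NLTK's English list
SAMPLE_STOPWORDS = {"the", "a", "is", "and", "to", "we", "on", "of"}


def top_keywords(text, stop_words, n=5):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    words = [w for w in cleaned.split() if w not in stop_words]
    return Counter(words).most_common(n)


sample = "The budget is tight. We need to review the budget and the timeline."
keywords_demo = top_keywords(sample, SAMPLE_STOPWORDS, n=2)
print(keywords_demo)
```

Here `budget` ranks first because it appears twice after filtering. Raw frequency treats every remaining word equally, so on real transcripts filler words that are not in the stopword list can still dominate.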

## 6. Sentiment Analysis

Using another Hugging Face pipeline, we analyze the overall sentiment of the meeting conversation. This can provide insights into the mood or tone of the discussion.

In [None]:
if transcript:
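    sentiment_analyzer = pipeline('sentiment-analysis')
    # truncation=True keeps long transcripts within the model's input limit;
    # only the beginning of a long transcript is scored.
    sentiment = sentiment_analyzer(transcript, truncation=True)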
    print("\nSentiment Analysis:")
    print(sentiment)
else:
    print("Skipping sentiment analysis: No transcript available.")
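A single sentiment call on a whole meeting can hide shifts in tone, so one option is to score smaller pieces and combine the results. The sketch below averages per-chunk results into one signed score. It assumes each result is a dict with `label` and `score` keys and that the labels are `POSITIVE` and `NEGATIVE`; check the labels of the model you actually load. The `overall_sentiment` helper and the sample scores are hypothetical.

```python
def overall_sentiment(results):
    """
    Combine per-chunk results into one signed score in [-1, 1].
    results: list of dicts such as {"label": "POSITIVE", "score": 0.9}.
    Returns None when there is nothing to combine.
    """
    if not results:
        return None
    # Positive chunks count as +score, everything else as -score
    signed = [
        r["score"] if r["label"].upper() == "POSITIVE" else -r["score"]
        for r in results
    ]
    mean = sum(signed) / len(signed)
    return {"label": "POSITIVE" if mean >= 0 else "NEGATIVE", "score": mean}


# Hypothetical per-chunk results
sample_results = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.6},
    {"label": "POSITIVE", "score": 0.6},
]
overall = overall_sentiment(sample_results)
print(overall)
```

Keeping the per-chunk list around also gives a simple sentiment-over-time series for later visualization.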

## 7. Final Thoughts

This proof-of-concept demonstrates how to automatically process and analyze a meeting recording. You can further enhance the project by:

- Integrating speaker diarization to separate speakers and perform per-speaker analysis.
- Adding more advanced NLP analytics (such as topic modeling or action item extraction).
- Creating a dashboard to visualize insights such as sentiment over time.

Feel free to extend this notebook to meet the full requirements of your project.