### **Step 1: Example Conversation in Cantonese (Discussing Living Issues)**
```
UserA: 最近租金又加咗，你嗰邊情況點呀？
UserB: 真係頂唔順，我而家諗緊要唔要搬返去屋企同父母住。
UserA: 我都明白，你覺得返去屋企會唔會舒服啲？
UserB: 未必啦，雖然慳到租金，但係始終冇自己嘅私人空間。
UserA: 係呀，呢個係一個問題。我諗住同室友傾下，睇下有冇可能一齊搬去平啲嘅地方。
UserC: 係喇，咁樣可能會好啲。你哋有冇諗住去邊區？
```

### **Step 2: Telegram Export Example in JSON Format**
If exported from Telegram, the `.json` file may look like this:
```json
[
  {
    "id": 1,
    "date": "2024-09-28T10:00:00",
    "from": "UserA",
    "text": "最近租金又加咗，你嗰邊情況點呀？"
  },
  {
    "id": 2,
    "date": "2024-09-28T10:05:00",
    "from": "UserB",
    "text": "真係頂唔順，我而家諗緊要唔要搬返去屋企同父母住。"
  },
  {
    "id": 3,
    "date": "2024-09-28T10:10:00",
    "from": "UserA",
    "text": "我都明白，你覺得返去屋企會唔會舒服啲？"
  },
  {
    "id": 4,
    "date": "2024-09-28T10:15:00",
    "from": "UserB",
    "text": "未必啦，雖然慳到租金，但係始終冇自己嘅私人空間。"
  },
  {
    "id": 5,
    "date": "2024-09-28T10:20:00",
    "from": "UserA",
    "text": "係呀，呢個係一個問題。我諗住同室友傾下，睇下有冇可能一齊搬去平啲嘅地方。"
  },
  {
    "id": 6,
    "date": "2024-09-28T10:25:00",
    "from": "UserC",
    "text": "係喇，咁樣可能會好啲。你哋有冇諗住去邊區？"
  }
]
```
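
The array above is a simplified shape. As far as I know, Telegram Desktop's JSON export wraps the messages in an object under a `messages` key, and a message's `text` may be a list mixing plain strings with formatted fragments. The following is a minimal sketch that accepts either shape, under those assumptions; `flatten_text` and `load_messages` are hypothetical helper names, not part of any library.

```python
import json

def flatten_text(text):
    """Return message text as one string.

    Assumption: `text` is either a plain string or a list whose items are
    strings or dicts carrying a "text" key (formatted fragments).
    """
    if isinstance(text, str):
        return text
    parts = []
    for item in text:
        parts.append(item if isinstance(item, str) else item.get("text", ""))
    return "".join(parts)

def load_messages(raw):
    """Accept either a bare list of messages or an object with a "messages" list."""
    data = json.loads(raw)
    messages = data["messages"] if isinstance(data, dict) else data
    return [
        {"from": m.get("from"), "date": m.get("date"), "text": flatten_text(m.get("text", ""))}
        for m in messages
        if m.get("text")  # skip entries with no text (e.g. service messages)
    ]

sample = (
    '{"messages": [{"id": 1, "date": "2024-09-28T10:00:00", "from": "UserA", '
    '"text": ["最近", {"type": "bold", "text": "租金"}, "又加咗"]}]}'
)
print(load_messages(sample))
```

The returned dictionaries keep the `from` and `text` keys that the Step 3 loop reads, so the rest of the pipeline would not need to change.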

### **Step 3: Input into the GPT-4 API**

```python
import openai
import json

# Initialize OpenAI API key
openai.api_key = "YOUR_API_KEY"

# Load the conversation JSON file
with open('telegram_conversation.json', 'r', encoding='utf-8') as file:
    conversation_data = json.load(file)

# Prepare the conversation text for input to the API: one "sender: text" line per message
conversation_text = ""
for message in conversation_data:
    conversation_text += f"{message['from']}: {message['text']}\n"

# Use the OpenAI API to analyze and label the conversation
# Note: openai.ChatCompletion is the interface of openai package versions before 1.0.
# In openai>=1.0 the equivalent call is client.chat.completions.create(...) on an
# OpenAI() client, and the reply is read as response.choices[0].message.content.
def analyze_conversation(conversation_text):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an assistant that analyzes and labels conversations about living issues."},
            {"role": "user", "content": conversation_text}
        ]
    )
    return response['choices'][0]['message']['content']

# Get analysis result
analysis_result = analyze_conversation(conversation_text)
print(analysis_result)

# Save the result to a text file for further analysis
with open('analysis_result.txt', 'w', encoding='utf-8') as output_file:
    output_file.write(analysis_result)
```
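
The system prompt in Step 3 does not say what form the answer should take, so the structure of the reply may vary between runs. One possible way to pin it down is to describe the expected JSON shape in the prompt. The sketch below only builds the `messages` list and makes no API call; `build_messages` and `OUTPUT_FIELDS` are hypothetical names, and the instruction wording is an untested assumption.

```python
import json

# Field names mirror the example JSON output in Step 4b; the wording of the
# instruction itself is an assumption, not a tested prompt.
OUTPUT_FIELDS = {
    "main_topic": "string",
    "subtopics": ["string"],
    "sentiment_analysis": [
        {"user_id": "string", "sentiment": "string", "total_word_count": "integer"}
    ],
    "engagement_analysis": {
        "unique_users_discussing": "integer",
        "total_mentions_of_topic": "integer",
        "repeat_count": "number",
        "time_span": "string",
    },
}

def build_messages(conversation_text):
    """Return a chat `messages` list that asks for JSON in a fixed shape."""
    system_prompt = (
        "You are an assistant that analyzes and labels conversations about living issues. "
        "Reply with a single JSON object and no other text, using this structure: "
        + json.dumps(OUTPUT_FIELDS, ensure_ascii=False)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": conversation_text},
    ]

messages = build_messages("UserA: 最近租金又加咗，你嗰邊情況點呀？\n")
print(messages[0]["content"])
```

The resulting list could be passed as the `messages` argument of the chat completion call in place of the inline list.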

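### **Step 4a: Example Output from GPT-4 Analysis**

The example below illustrates the output format only: its user IDs and counts appear to describe a larger chat and do not correspond to the six-message sample above.
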
**Labels:**
- Main Topic: Housing and Rent
- Subtopics:
  - Rising Rent Costs
  - Living with Parents
  - Privacy Concerns
  - Roommate Arrangements
  - Financial Planning

**Sentiment Analysis:**
- User_123: Concerned but proactive
- User_456: Frustrated and uncertain
- User_789: Neutral but interested

**Engagement Analysis:**
- Unique Users Discussing Main Topic: 50
- Total Mentions of Main Topic: 150
- Repeat Count (Total Mentions/Unique Users): 3.0
- Time Span: 2 weeks
- Total Word Count by User_123: 150 words
- Total Word Count by User_456: 120 words
- Total Word Count by User_789: 85 words
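
Figures such as unique users, repeat count (total mentions divided by unique users), and time span can also be computed directly from the exported messages, which gives a check against the numbers the model reports. This is a rough sketch under simplifying assumptions: every message counts as one mention of the topic, and length is measured in non-whitespace characters because Cantonese text has no spaces between words. `engagement_stats` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

def engagement_stats(messages):
    """Compute basic engagement figures from a list of message dicts.

    Assumptions: each message has "from", "date" (ISO 8601) and "text" keys,
    and every message counts as one mention of the topic.
    """
    if not messages:
        raise ValueError("messages must not be empty")
    per_user = Counter(m["from"] for m in messages)
    chars = Counter()
    for m in messages:
        # Count non-whitespace characters rather than splitting on spaces.
        chars[m["from"]] += sum(1 for ch in m["text"] if not ch.isspace())
    times = [datetime.fromisoformat(m["date"]) for m in messages]
    return {
        "unique_users": len(per_user),
        "total_mentions": len(messages),
        "repeat_count": len(messages) / len(per_user),
        "time_span": max(times) - min(times),
        "chars_by_user": dict(chars),
    }

sample = [
    {"from": "UserA", "date": "2024-09-28T10:00:00", "text": "最近租金又加咗"},
    {"from": "UserB", "date": "2024-09-28T10:05:00", "text": "真係頂唔順"},
    {"from": "UserA", "date": "2024-09-28T10:10:00", "text": "我都明白"},
]
stats = engagement_stats(sample)
print(stats)
```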

### **Step 4b: Example JSON Output from GPT-4**

```json
{
    "main_topic": "Housing and Rent",
    "subtopics": [
        "Rising Rent Costs",
        "Living with Parents",
        "Privacy Concerns",
        "Roommate Arrangements",
        "Financial Planning"
    ],
    "sentiment_analysis": [
        {
            "user_id": "UserA",
            "sentiment": "Concerned but proactive",
            "total_word_count": 45
        },
        {
            "user_id": "UserB",
            "sentiment": "Frustrated and uncertain",
            "total_word_count": 40
        }
    ],
    "engagement_analysis": {
        "unique_users_discussing": 2,
        "total_mentions_of_topic": 6,
        "repeat_count": 3.0,
        "time_span": "25 minutes"
    }
}
```
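
When the model does return JSON in this shape, it can be read with `json.loads` instead of regular expressions. Models sometimes wrap JSON in a Markdown code fence, so this sketch strips one if it is present. `parse_analysis` and `user_rows` are hypothetical helpers, and the key names are taken from the example above.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_analysis(raw):
    """Parse the model reply into a dict, tolerating a surrounding code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)
        # and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return json.loads(text)

def user_rows(analysis):
    """Flatten the per-user entries into rows suitable for a table."""
    return [
        {
            "User_ID": entry.get("user_id"),
            "Sentiment": entry.get("sentiment"),
            "Word_Count": entry.get("total_word_count"),
        }
        for entry in analysis.get("sentiment_analysis", [])
    ]

reply = (
    FENCE + "json\n"
    '{"main_topic": "Housing and Rent", "sentiment_analysis": '
    '[{"user_id": "UserA", "sentiment": "Concerned but proactive", "total_word_count": 45}]}\n'
    + FENCE
)
analysis = parse_analysis(reply)
print(analysis["main_topic"], user_rows(analysis))
```

The list returned by `user_rows` is a list of dictionaries, so it could be passed straight to `pd.DataFrame(...)` if pandas is in use.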

### **Step 5: Extract Data from the GPT-4 Text Output**

```python
import re

import pandas as pd

# Load the Markdown-style analysis text saved in Step 3 (format shown in Step 4a)
with open('analysis_result.txt', 'r', encoding='utf-8') as input_file:
    gpt_output = input_file.read()

# Topic labels
main_topic = re.search(r"Main Topic:\s*(.*)", gpt_output).group(1).strip()
subtopics = re.findall(r"- (.*)", gpt_output.split("Subtopics:")[1].split("**")[0].strip())

# Per-user sentiment lines, e.g. "- User_123: Concerned but proactive"
sentiment_section = gpt_output.split("**Sentiment Analysis:**")[1].split("**Engagement Analysis:**")[0]
user_sentiments = re.findall(r"- (User_\d+): (.*)", sentiment_section.strip())

# Engagement figures
engagement_section = gpt_output.split("**Engagement Analysis:**")[1]
unique_users = int(re.search(r"Unique Users Discussing Main Topic:\s*(\d+)", engagement_section).group(1))
total_mentions = int(re.search(r"Total Mentions of Main Topic:\s*(\d+)", engagement_section).group(1))
repeat_count = float(re.search(r"Repeat Count.*:\s*([\d.]+)", engagement_section).group(1))
# Allow extra words between "Time Span" and the colon (e.g. "Time Span in a month: 2 weeks")
time_span = re.search(r"Time Span[^:\n]*:\s*(.*)", engagement_section).group(1).strip()

user_word_counts = re.findall(r"Total Word Count by (User_\d+):\s*(\d+) words", engagement_section)

# Create a DataFrame to capture the per-user data
user_data = []
for user_id, sentiment in user_sentiments:
    # Match each user's word count to their sentiment entry (None if missing)
    word_count = next((int(count) for uid, count in user_word_counts if uid == user_id), None)
    user_data.append({
        'User_ID': user_id,
        'Sentiment': sentiment.strip(),
        'Word_Count': word_count
    })

user_df = pd.DataFrame(user_data)

# Build the main analysis DataFrame
main_analysis_df = pd.DataFrame({
    'Main Topic': [main_topic],
    'Subtopics': [', '.join(subtopics)],
    'Unique Users Discussing': [unique_users],
    'Total Mentions of Topic': [total_mentions],
    'Repeat Count': [repeat_count],
    'Time Span': [time_span],
})

# ace_tools exists only inside ChatGPT's code-execution sandbox, so print the tables instead
print("Main Analysis")
print(main_analysis_df)
print("\nUser Sentiments and Word Counts")
print(user_df)

# Showing counts of unique user engagement and word averages
print("\nTotal Unique Users Discussing the Topic:", unique_users)
print("\nRepeat Count (Mentions/User):", repeat_count)
print("\nUser Engagement Summary:")
print(user_df.describe())
```
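
The extraction code calls `.group(1)` directly on each `re.search` result, which raises `AttributeError` whenever the model words a line slightly differently and the pattern finds nothing. A small wrapper can return a default instead. This is one possible approach, and `extract_field` is a hypothetical helper name.

```python
import re

def extract_field(pattern, text, cast=str, default=None):
    """Return the first capture group of `pattern` in `text`, converted with `cast`.

    Falls back to `default` when the pattern does not match or the
    conversion fails, instead of raising.
    """
    match = re.search(pattern, text)
    if match is None:
        return default
    try:
        return cast(match.group(1).strip())
    except ValueError:
        return default

section = "- Unique Users Discussing Main Topic: 50\n- Time Span in a month: 2 weeks\n"
unique_users = extract_field(r"Unique Users Discussing Main Topic:\s*(\d+)", section, int)
time_span = extract_field(r"Time Span[^:\n]*:\s*(.*)", section)
total_mentions = extract_field(r"Total Mentions of Main Topic:\s*(\d+)", section, int, default=0)
print(unique_users, time_span, total_mentions)
```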