In this Jupyter Notebook, I created functions that read different types of text files: txt, docx, and pdf. Because I cannot build a file upload mechanism for the web app without first figuring out how to interact with the OpenAI API, I used the notebook to test reading text from multiple transcript files of different file types, turning them into a prompt, and sending that prompt to the OpenAI API for thematic analysis of the given text.

Working in the notebook made it easier to test individual functions before copying all the code into the OpenAI.py file.

The 'Transcripts' folder contains four subfolders: Txt, DocX, PDF, and Mixed, which holds a mixture of the three file types.

In [1]:
# import needed modules and functions
import os
from openai import OpenAI # pip install openai on terminal
from keys import open_ai_api_key # import OpenAI Key from keys.py
import docx # pip install python-docx on terminal
import PyPDF2  # pip install PyPDF2 on terminal

<h3>Read_txt Function</h3>

In [2]:
# Function to read .txt files
def read_txt(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

In [3]:
# Check if read_txt function works
print(read_txt("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/Txt/P1_Interview_Transcript.txt"))

Transcript P1_Interview
March 11, 2024, 7:05PM

Shristi  0:07
Ok.
OK, so we have started the recording and transcription.
I just move ahead and start with a few warm-up questions.
So can you tell us a little bit about yourself?

P1  0:21
Sure. Umm.
So my name is [--].
I'm currently a second-year master's student in the HCI program.
This is my second to last semester.
Umm before that I'm.
I'm from [a US state], but originally I was born in [an East Asian Country] and I went to [a US university] for mechanical engineering and I did that for I did Engineering for like 2 years.
Decided that it wasn't for me, so I wanted to pivot to something a little more creative and so here I am in the HCI program and my goal eventually is to be a UX designer.

Shristi   1:02
Umm, thank you.
So which social media platforms are using currently?

P1  1:10
Umm, right now I think my my biggest ones are probably like Reddit and Tiktok.

Shristi 1:22
And Tiktok.
OK umm so.
Umm.
So when did you for Snapchat, si

<h3>Read_docx Function</h3>

In [4]:
# Function to read .docx files
def read_docx(file_path):
    doc = docx.Document(file_path)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    return '\n'.join(full_text)

In [5]:
# Check if read_docx function works
print(read_docx("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/DocX/P1_Interview_Transcript.docx"))

Transcript P1_Interview
March 11, 2024, 7:05PM

Shristi  0:07
Ok.
OK, so we have started the recording and transcription.
I just move ahead and start with a few warm-up questions.
So can you tell us a little bit about yourself?

P1  0:21
Sure. Umm.
So my name is [--].
I'm currently a second-year master's student in the HCI program.
This is my second to last semester.
Umm before that I'm.
I'm from [a US state], but originally I was born in [an East Asian Country] and I went to [a US university] for mechanical engineering and I did that for I did Engineering for like 2 years.
Decided that it wasn't for me, so I wanted to pivot to something a little more creative and so here I am in the HCI program and my goal eventually is to be a UX designer.

Shristi   1:02
Umm, thank you.
So which social media platforms are using currently?

P1  1:10
Umm, right now I think my my biggest ones are probably like Reddit and Tiktok.

Shristi 1:22
And Tiktok.
OK umm so.
Umm.
So when did you for Snapchat, si

<h3>Read_pdf Function</h3>

In [6]:
import PyPDF2

# Function to read .pdf files using PyPDF2 and post-process the text
def read_pdf(file_path):
    pdf_text = ""
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        for page in reader.pages:
            # Guard against pages with no extractable text (e.g. scanned images)
            pdf_text += page.extract_text() or ""

    # Post-processing to remove unnecessary newlines
    lines = pdf_text.split('\n')
    processed_text = ' '.join([line.strip() for line in lines if line.strip()])
    
    return processed_text

In [15]:
# Check if read_pdf function works
print(read_pdf("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/PDF/P1_Interview_Transcript.pdf"))

Transcript P1_Interview March 11, 2024, 7:05PM Shristi 0:07 Ok. OK, so we have started the recording and transcription. I just move ahead and start with a few warm-up questions. So can you tell us a little bit about yourself? P1 0:21 Sure. Umm. So my name is [--]. I'm currently a second-year master's student in the HCI program. This is my second to last semester. Umm before that I'm. I'm from [a US state], but originally I was born in [an East Asian Country] and I went to [a US university] for mechanical engineering and I did that for I did Engineering for like 2 years. Decided that it wasn't for me, so I wanted to pivot to something a little more creative and so here I am in the HCI program and my goal eventually is to be a UX designer. Shristi 1:02 Umm, thank you. So which social media platforms are using currently? P1 1:10 Umm, right now I think my my biggest ones are probably like Reddit and Tiktok. Shristi 1:22 And Tiktok. OK umm so. Umm. So when did you for Snapchat, since we've 
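
The PDF output above comes back as one long line, because the post-processing step joins every line with a space and the speaker turns are no longer separated. One possible way to put the breaks back is sketched below. It assumes every speaker label is a single word followed by a timestamp such as `P1 0:21`; the `restore_speaker_turns` helper and its pattern are illustrative and not part of the original notebook. A phrase such as "at 3:30" inside an answer would be treated as a label too, so the result should be checked by eye.

```python
import re

# Assumed label shape: one word, whitespace, then an m:ss or mm:ss timestamp (e.g. "P1 0:21")
SPEAKER_LABEL = re.compile(r"\s+(\w+\s+\d{1,2}:\d{2})\s+")

def restore_speaker_turns(flat_text):
    """Put a blank line before each speaker label and a newline after it."""
    return SPEAKER_LABEL.sub(r"\n\n\1\n", flat_text)

# Short sample taken from the flattened PDF output
flat = (
    "Transcript P1_Interview March 11, 2024, 7:05PM Shristi 0:07 Ok. "
    "OK, so we have started the recording and transcription. P1 0:21 Sure. Umm."
)
restored = restore_speaker_turns(flat)
print(restored)
```

Whether this step is worth keeping depends on the analysis: the model may cope with the flattened text as it is, but clear turn boundaries make quotes easier to trace back to a speaker.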

<h3>Load Transcripts Function</h3>

In [18]:
# Function to load multiple transcript files (supporting txt, docx, pdf)
def load_transcripts(folder_path):
    transcripts = []
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if filename.endswith(".txt"):
            transcripts.append(read_txt(file_path))
        elif filename.endswith(".docx"):
            transcripts.append(read_docx(file_path))
        elif filename.endswith(".pdf"):
            transcripts.append(read_pdf(file_path))  
        else:
            print(f"Unsupported file format: {filename}")
    return transcripts
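
A variation on the loader above is sketched here under a few assumptions. `os.listdir` returns names in arbitrary order, so the sketch sorts them to keep the "Transcript 1, 2, ..." numbering stable between runs. It also lowercases the extension so that `.TXT` or `.PDF` files are not rejected. The name `load_transcripts_sorted`, the `readers` dictionary, and the `_read_plain` stand-in are hypothetical; in the notebook, `readers` would map to `read_txt`, `read_docx`, and `read_pdf`.

```python
import os
import tempfile

def load_transcripts_sorted(folder_path, readers):
    """Read every supported file in folder_path in alphabetical order.

    readers maps a lowercase extension (e.g. ".txt") to a function
    that takes a file path and returns the file's text.
    """
    transcripts = []
    for filename in sorted(os.listdir(folder_path)):
        file_path = os.path.join(folder_path, filename)
        if not os.path.isfile(file_path):
            continue  # skip subfolders
        extension = os.path.splitext(filename)[1].lower()
        reader = readers.get(extension)
        if reader is None:
            print(f"Unsupported file format: {filename}")
            continue
        transcripts.append(reader(file_path))
    return transcripts

def _read_plain(file_path):
    # Stand-in reader for the demo; the notebook would pass read_txt here
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

# Demo on a temporary folder so the sketch runs without the Transcripts folder
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, body in [("b.TXT", "second"), ("a.txt", "first"), ("notes.md", "skipped")]:
        with open(os.path.join(tmp_dir, name), "w", encoding="utf-8") as f:
            f.write(body)
    demo = load_transcripts_sorted(tmp_dir, {".txt": _read_plain})
```

Passing the readers in as a dictionary means a new file type can be supported by adding one entry instead of another `elif` branch.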

In [23]:
# Test if the function works with Txt folder
text = load_transcripts("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/Txt")
print(text)
print(len(text))

['\ufeffTranscript P2_Interview\nMarch  18, 2024, 12:10AM\n\n\nShristi started transcription\n\nKadir   0:15\nOK, perfect. So.\nCan you tell us a little bit about yourself, if possible?\n\nP2   0:25\nLittle about myself?\nKadir   0:27\nYes, yes.\n\nP2   0:27\nOK, so I\'m a graduate student here, here at the [a US university], and now graduating this year.\nSo I\'m doing my civil engineering masters in civil engineering with an emphasis on Geotechnical engineering. I think I have been using Snapchat for the past three years. If you want to know.\n\nKadir   0:50\nWow. Past three years, it\'s been quite a few kind of like experience probably it seems alright So then could you also tell me what social media platforms do you use other than, let\'s say, in this case, Snapchat?\n\nP2   1:10\nUh, so Instagram. Facebook, mostly on Instagram and mostly Instagram. Other than that, Facebook, sometimes with Twitter, and that\'s all. If LinkedIn counts as social media, I did that while I was searchi
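
The `'\ufeff'` at the start of the first transcript in the output above is a UTF-8 byte order mark (BOM) that some editors write at the top of a text file. Opening the file with `encoding='utf-8'` keeps it as a character, so it ends up in the prompt. A minimal sketch of one way to drop it uses Python's `utf-8-sig` codec; the `read_txt_without_bom` name is hypothetical and the demo file is created only for illustration.

```python
import os
import tempfile

def read_txt_without_bom(file_path):
    # 'utf-8-sig' decodes UTF-8 and discards a leading BOM if one is present
    with open(file_path, "r", encoding="utf-8-sig") as file:
        return file.read()

# Demo with a temporary file that starts with a BOM
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "wb") as f:
        f.write(b"\xef\xbb\xbfTranscript P2_Interview\n")

    with open(sample_path, "r", encoding="utf-8") as f:
        with_bom = f.read()        # BOM kept as '\ufeff'
    without_bom = read_txt_without_bom(sample_path)  # BOM removed
```

Files without a BOM read the same way under both codecs, so the switch should be safe for the other transcripts.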

In [24]:
# Test if the function works with DocX folder
text = load_transcripts("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/DocX")
print(text)
print(len(text))

["Transcript P2_Interview\nMarch  18, 2024, 12:10AM\n\n\nShristi started transcription\n\nKadir   0:15\nOK, perfect. So.\nCan you tell us a little bit about yourself, if possible?\n\nP2   0:25\nLittle about myself?\nKadir   0:27\nYes, yes.\n\nP2   0:27\nOK, so I'm a graduate student here, here at the [a US university], and now graduating this year.\nSo I'm doing my civil engineering masters in civil engineering with an emphasis on Geotechnical engineering. I think I have been using Snapchat for the past three years. If you want to know.\n\nKadir   0:50\nWow. Past three years, it's been quite a few kind of like experience probably it seems alright So then could you also tell me what social media platforms do you use other than, let's say, in this case, Snapchat?\n\nP2   1:10\nUh, so Instagram. Facebook, mostly on Instagram and mostly Instagram. Other than that, Facebook, sometimes with Twitter, and that's all. If LinkedIn counts as social media, I did that while I was searching for jobs

In [25]:
# Test if the function works with PDF folder
text = load_transcripts("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/PDF")
print(text)
print(len(text))

["Transcript P2_Interview March 18, 2024, 12:10AM Shristi started transcription Kadir 0:15 OK, perfect. So. Can you tell us a little bit about yourself, if possible? P2 0:25 Little about myself? Kadir 0:27 Yes, yes. P2 0:27 OK, so I'm a graduate student here, here at the [a US university], and now graduating this year. So I'm doing my civil engineering masters in civil engineering with an emphasis on Geotechnical engineering. I think I have been using Snapchat for the past three years. If you want to know. Kadir 0:50 Wow. Past three years, it's been quite a few kind of like experience probably it seems alright So then could you also tell me what social media platforms do you use other than, let's say, in this case, Snapchat? P2 1:10 Uh, so Instagram. Facebook, mostly on Instagram and mostly Instagram. Other than that, Facebook, sometimes with Twitter, and that's all. If LinkedIn counts as social media, I did that while I was searching for jobs, but not anymore. Kadir 1:28 So we get an 

In [26]:
# Test if the function works with Mixed folder containing txt, docx, and pdf files
text = load_transcripts("/Users/shristishrestha/Documents/Grad School/Fall24/HCI584/OpenAI_Test/Transcripts/Mixed")
print(text)
print(len(text))

['\ufeffTranscript P2_Interview\nMarch  18, 2024, 12:10AM\n\n\nShristi started transcription\n\nKadir   0:15\nOK, perfect. So.\nCan you tell us a little bit about yourself, if possible?\n\nP2   0:25\nLittle about myself?\nKadir   0:27\nYes, yes.\n\nP2   0:27\nOK, so I\'m a graduate student here, here at the [a US university], and now graduating this year.\nSo I\'m doing my civil engineering masters in civil engineering with an emphasis on Geotechnical engineering. I think I have been using Snapchat for the past three years. If you want to know.\n\nKadir   0:50\nWow. Past three years, it\'s been quite a few kind of like experience probably it seems alright So then could you also tell me what social media platforms do you use other than, let\'s say, in this case, Snapchat?\n\nP2   1:10\nUh, so Instagram. Facebook, mostly on Instagram and mostly Instagram. Other than that, Facebook, sometimes with Twitter, and that\'s all. If LinkedIn counts as social media, I did that while I was searchi

<h3> Generate Prompt </h3>

In [28]:
# Define function to generate a prompt for thematic analysis
def generate_prompt(transcripts):
    prompt = "Here are the transcripts:\n\n"
    
    for i, transcript in enumerate(transcripts):
        prompt += f"Transcript {i+1}:\n{transcript}\n\n"
    
    prompt += "Please provide the major themes along with their descriptions."
    
    return prompt

In [30]:
# Generate a prompt for thematic analysis
prompt = generate_prompt(text)
print(prompt)

You are an expert qualitative data analyst. Your task is to analyze the following interview transcripts and identify the major themes along with detailed descriptions. Here are the transcripts:

Transcript 1:
﻿Transcript P2_Interview
March  18, 2024, 12:10AM


Shristi started transcription

Kadir   0:15
OK, perfect. So.
Can you tell us a little bit about yourself, if possible?

P2   0:25
Little about myself?
Kadir   0:27
Yes, yes.

P2   0:27
OK, so I'm a graduate student here, here at the [a US university], and now graduating this year.
So I'm doing my civil engineering masters in civil engineering with an emphasis on Geotechnical engineering. I think I have been using Snapchat for the past three years. If you want to know.

Kadir   0:50
Wow. Past three years, it's been quite a few kind of like experience probably it seems alright So then could you also tell me what social media platforms do you use other than, let's say, in this case, Snapchat?

P2   1:10
Uh, so Instagram. Facebook, m

In [33]:
# Estimate the number of tokens
num_characters = len(prompt)
num_tokens = num_characters / 4  # Rule of thumb: 1 token ~ 4 characters of common English text
print("Number of Tokens:", num_tokens)

Number of Tokens: 23473.75
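
An estimate of about 23,000 tokens fits comfortably within the 128,000-token context window documented for gpt-4o-mini at the time of writing, but a larger set of transcripts might not. The sketch below reuses the same 4-characters-per-token rule of thumb to split transcripts into batches that stay under a chosen budget. `estimate_tokens` and `batch_transcripts` are hypothetical helpers, and the estimate is rough; a tokenizer library such as OpenAI's `tiktoken` would give exact counts.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate: about 4 characters per token for English text."""
    return len(text) / chars_per_token

def batch_transcripts(transcripts, max_tokens_per_batch):
    """Group transcripts so each batch stays under the estimated token budget.

    A single transcript larger than the budget still gets its own batch.
    """
    batches = []
    current = []
    current_tokens = 0
    for transcript in transcripts:
        tokens = estimate_tokens(transcript)
        # Start a new batch when adding this transcript would exceed the budget
        if current and current_tokens + tokens > max_tokens_per_batch:
            batches.append(current)
            current = []
            current_tokens = 0
        current.append(transcript)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches

# Three dummy transcripts of roughly 100 tokens each, with a 250-token budget
demo_batches = batch_transcripts(["a" * 400, "b" * 400, "c" * 400], max_tokens_per_batch=250)
print([len(batch) for batch in demo_batches])
```

Each batch could then be passed to `generate_prompt` separately, at the cost of merging the themes from several responses afterwards.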


In [37]:
# Instantiate the OpenAI API client using the key imported from keys.py
client = OpenAI(
    api_key=open_ai_api_key
)

In [49]:
# Define a function that sends prompt to OpenAI API for analysis using "gpt-4o-mini"
def analyze_transcripts_with_openai(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Using gpt-4o-mini model for thematic analysis
        messages=[
            # The system message is optional, but it is useful for setting the role the model should play (e.g. an HCI researcher).
            {"role": "system", "content": "You are an expert qualitative data analyst. Your task is to analyze the following interview transcripts and identify the major themes along with detailed descriptions."},
            # The user message carries the actual prompt
            {
                "role": "user",
                "content": prompt,
            },
        ],
        max_tokens=16384,  # Upper limit on generated (output) tokens; 16,384 is the maximum for gpt-4o-mini
        temperature=0.4,  # Lower values give more focused, repeatable output; higher values give more varied output
    )
    return response.choices[0].message.content
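
The request in the function above has three parts: the model name, a list of role-tagged messages, and the sampling settings. A minimal sketch of assembling those parts separately from the network call is shown below, so they can be inspected or tested without an API key. `build_chat_request` is a hypothetical helper, not part of the OpenAI library; its defaults simply mirror the values used in this notebook.

```python
def build_chat_request(prompt, system_message, model="gpt-4o-mini",
                       max_tokens=16384, temperature=0.4):
    """Return the keyword arguments for a chat completion request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},  # sets the model's role
            {"role": "user", "content": prompt},            # the actual question
        ],
        "max_tokens": max_tokens,    # cap on generated tokens
        "temperature": temperature,  # lower = more repeatable output
    }

demo_request = build_chat_request(
    "Here are the transcripts: ...",
    "You are an expert qualitative data analyst.",
)
# With the client defined in the notebook, the call would be:
# response = client.chat.completions.create(**demo_request)
```

Keeping the payload in one dictionary also makes it easy to log exactly what was sent when comparing results across prompts.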

In [50]:
result = analyze_transcripts_with_openai(prompt)

In [51]:
print(result)

Based on the transcripts provided, several major themes can be identified regarding users' experiences with Snapchat's MyAI feature. Below are the major themes along with detailed descriptions:

### 1. **User Engagement and Interaction Style**
   - **Description**: Users often approach MyAI with a conversational tone, treating it more like a friend or casual acquaintance rather than a formal AI tool. This is evident in how they greet the AI and how they express their prompts casually. For instance, P1 mentioned that they felt compelled to greet MyAI and engage in a friendly manner, which contrasts with their interactions with other AI models like ChatGPT, which they perceive as more formal and machine-like.
   - **Example**: P1 stated, "I felt like I could be more casual with, like talking to it. Like I didn't have to like form very formal prompts."

### 2. **Expectations vs. Reality**
   - **Description**: Users often have low expectations for MyAI, which influences their satisfaction