In [1]:
%pip install deepmultilingualpunctuation nltk


Collecting deepmultilingualpunctuation
  Downloading deepmultilingualpunctuation-1.0.1-py3-none-any.whl.metadata (4.0 kB)
Downloading deepmultilingualpunctuation-1.0.1-py3-none-any.whl (5.4 kB)
Installing collected packages: deepmultilingualpunctuation
Successfully installed deepmultilingualpunctuation-1.0.1


In [4]:
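import re

import nltk
from deepmultilingualpunctuation import PunctuationModel

# Sentence-tokenizer data for nltk.sent_tokenize (recent NLTK releases look up 'punkt_tab')
nltk.download('punkt')
nltk.download('punkt_tab')

# 1️⃣ Load the pretrained punctuation-restoration model (weights are downloaded on first use)
model = PunctuationModel()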

# 2️⃣ Raw interview transcript with no punctuation or capitalization.
# The commented-out line is an alternative (backend engineering) transcript to try.
#raw_text = """thanks for coming in today uh why dont we start with a quick intro can you tell me a bit about your background yeah sure um so ive been working as a backend engineer for a little over three years now mostly python and django but in my first job i also had some exposure to go working on small microservices before that i did my computer science degree i guess what i really enjoy is solving scaling problems like making things run faster when theres a ton of data okay thats good so lets warm up whats the difference between a process and a thread right so a process has its own memory space its like a totally separate program running a thread is lighter it lives inside a process and shares memory with other threads processes are more isolated but heavier and threads are faster to spin up but then you have to deal with synchronization issues mmhmm now imagine uh youre running a web service and suddenly people are complaining its super slow what do you check first first thing id look at the basics cpu memory usage disk io if those look fine id move on to the database maybe queries are slow or missing an index then id check traffic maybe theres a sudden spike oh and sometimes its network latency between services logging and metrics like in datadog or new relic help me narrow it down okay how would you lets say detect a cycle in a linked list uh id probably go with floyds cycle detection algorithm you know the slow and fast pointer approach one pointer moves step by step the other skips ahead if they ever meet then yeah that means theres a cycle got it now tell me how does a hash map work internally so keys get passed into a hash function which gives you an index in an array if two keys map to the same index thats a collision usually you fix that with chaining like a linked list at that slot or open addressing okay let me shift gears have you worked with cloud services yeah mostly aws ec2 for compute s3 for storage rds for relational databases and lambda for serverless stuff sometimes i used cloudwatch for monitoring can you describe a production issue you had to fix yeah sure so one time a dependency update broke our background worker suddenly tasks stopped processing i uh rolled back the change quickly then added alerts so wed know if that happened again later i patched the code to handle the new dependency properly that was stressful"""
raw_text = """hey nice to meet you today can you start by giving a short introduction about yourself yeah absolutely so im a frontend engineer with about two years of experience mainly working with react and typescript ive also worked a bit on performance optimization and design systems before that i studied information technology and i love working on user experience things that make products feel smooth and polished cool lets get into it can you explain the difference between rest and graphql sure so rest uses multiple endpoints each representing a resource graphql on the other hand has a single endpoint and the client can request exactly the data it needs so fewer round trips and more flexibility but it also comes with complexity in caching right good answer now suppose your application is loading really slow on the client side what are the things you would investigate first id open devtools and check the network tab see if there are large bundle sizes too many requests or blocking scripts then id look at render performance maybe excessive re renders or heavy components also lazy loading and code splitting can help reduce initial load time okay lets switch to data structures how do you implement a queue using two stacks oh yeah so you basically use one stack for enqueue operations push everything there and when dequeuing if the second stack is empty move all elements from the first stack to the second so the oldest element comes on top then pop from the second stack nice what about event loop in javascript how does it work so javascript is single threaded but it relies on an event loop to handle asynchronous operations tasks go to the call stack and async operations go to the callback queue or microtask queue the event loop checks if the call stack is free and pushes tasks back in allowing async execution okay makes sense what cloud services have you worked with mainly firebase for auth and firestore database also tried deploying to vercel and netlify for static sites not as much aws yet but im learning it cool can you tell me about a production bug you solved once yeah so we had this issue where a certain component would freeze the entire page turns out there was a huge json parsing happening on the main thread i moved the computation to a web worker so the ui stayed responsive and that fixed the problem"""

# 3️⃣ Restore punctuation
text_with_punct = model.restore_punctuation(raw_text)
print("✅ Punctuation restored!\n")
print(text_with_punct[:500], "...\n")

# 4️⃣ Sentence tokenization
sentences = nltk.sent_tokenize(text_with_punct)

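# 5️⃣ Detect question sentences (heuristic: a trailing '?' or a known opener).
# Topic-shift phrases such as "alright" and "let me" also start a new question block.
question_starters = [
    "what", "whats", "how", "why", "can you", "tell me", "do you", "have you",
    "describe", "okay how", "okay whats", "so what", "now imagine", "alright", "let me"
]

# Starters must match as whole words, so "however" is not read as "how"
starter_re = re.compile(r"^(?:" + "|".join(re.escape(q) for q in question_starters) + r")\b")

def is_question(sentence):
    s = sentence.lower().strip()
    if s.endswith('?'):
        return True
    # Drop punctuation so "okay, how ..." (comma added by the model) matches "okay how"
    return starter_re.match(re.sub(r"[^\w\s]", "", s)) is not None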

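# 6️⃣ Group into Q&A pairs: each question collects the non-question sentences
# that follow it; sentences before the first question are skipped.
qa_pairs = []
current_q = None
current_a = []

for sent in sentences:
    if is_question(sent):
        if current_q and current_a:
            # Close the previous pair before starting a new question
            qa_pairs.append((current_q, " ".join(current_a)))
            current_a = []
            current_q = sent
        elif current_q:
            # Back-to-back question sentences (a setup, then the actual ask):
            # merge them instead of discarding the earlier one
            current_q += " " + sent
        else:
            current_q = sent
    elif current_q:
        current_a.append(sent)

# Flush the final pair
if current_q and current_a:
    qa_pairs.append((current_q, " ".join(current_a)))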

# 7️⃣ Display results
for i, (q, a) in enumerate(qa_pairs, 1):
    print(f"\nQ{i}: {q}\nA{i}: {a}\n" + "-"*60)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
Device set to use cpu


✅ Punctuation restored!

hey, nice to meet you today. can you start by giving a short introduction about yourself? yeah, absolutely so. im a frontend engineer with about two years of experience, mainly working with react and typescript. ive also worked a bit on performance optimization and design systems. before that, i studied information technology and i love working on user experience- things that make products feel smooth and polished. cool, lets get into it. can you explain the difference between rest and graphql?  ...


Q1: can you start by giving a short introduction about yourself?
A1: yeah, absolutely so. im a frontend engineer with about two years of experience, mainly working with react and typescript. ive also worked a bit on performance optimization and design systems. before that, i studied information technology and i love working on user experience- things that make products feel smooth and polished. cool, lets get into it.
----------------------------------------------
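The Q&A grouping step is a pure heuristic, so it can be checked without downloading the punctuation model. The sketch below is a minimal, model-free version that assumes the sentences are already punctuated and split; `split_qa`, `STARTERS` and `STARTER_RE` are hypothetical names, and the starter list is a trimmed illustration rather than a tuned one. It matches starters as whole words (so "however" is not read as "how") and merges back-to-back question sentences instead of keeping only the last one.

```python
import re
from typing import List, Optional, Tuple

# Illustrative (hypothetical) starter list; multi-word openers like "okay how"
# are matched after punctuation is stripped, so "okay, how ..." still counts.
STARTERS = ["what", "whats", "how", "why", "can you", "tell me", "do you",
            "have you", "describe", "okay how", "okay whats", "now imagine"]
STARTER_RE = re.compile(r"^(?:" + "|".join(re.escape(s) for s in STARTERS) + r")\b")


def is_question(sentence: str) -> bool:
    """Heuristic: ends with '?' or opens with a starter as a whole word."""
    s = sentence.lower().strip()
    if s.endswith("?"):
        return True
    return STARTER_RE.match(re.sub(r"[^\w\s]", "", s)) is not None


def split_qa(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each question with the non-question sentences that follow it."""
    pairs: List[Tuple[str, str]] = []
    question: Optional[str] = None
    answer: List[str] = []
    for sent in sentences:
        if is_question(sent):
            if question and answer:
                pairs.append((question, " ".join(answer)))
                question, answer = sent, []
            elif question:
                question += " " + sent  # merge back-to-back question sentences
            else:
                question = sent
        elif question:
            answer.append(sent)  # sentences before the first question are dropped
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs


# Hand-punctuated sentences modeled on the transcript above (not model output).
sentences = [
    "hey, nice to meet you today.",
    "can you explain the difference between rest and graphql?",
    "sure, so rest uses multiple endpoints.",
    "however, graphql has a single endpoint.",
    "now imagine the page loads really slowly.",
    "what would you check first?",
    "id open devtools and check the network tab.",
]

for i, (q, a) in enumerate(split_qa(sentences), 1):
    print(f"Q{i}: {q}\nA{i}: {a}")
```

With these inputs the greeting is skipped, "however, ..." stays inside the first answer, and the setup sentence "now imagine ..." is kept together with the follow-up ask as the second question rather than being overwritten.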