In [1]:
from dotenv import load_dotenv
load_dotenv()

True

## Install libraries
```%pip install -q youtube-transcript-api langchain-community langchain-openai faiss-cpu tiktoken python-dotenv```

In [3]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

## Step 1a - Indexing (Document Ingestion)

In [4]:
video_id = "La385-EDmWw" # only the ID, not full URL
try:
    # Request the English transcript (the `languages` list is tried in order of preference)
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
    # Returns a list of dicts: [{"text": ..., "start": ..., "duration": ...}, ...]

    # Flatten it to plain text
    transcript = " ".join(chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")
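
Since the API call above expects only the video ID, a small helper can accept either a full URL or a bare ID. This is a minimal sketch using only the standard library; `extract_video_id` is a hypothetical helper name (not part of `youtube-transcript-api`), and it assumes the common `watch?v=`, `youtu.be/`, `/embed/` and `/shorts/` URL shapes.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input unchanged if it is already an ID."""
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        # No "https://" prefix: assume the caller already passed a bare ID.
        return url_or_id
    if parsed.netloc.lower().endswith("youtu.be"):
        # Short links keep the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    query_id = parse_qs(parsed.query).get("v")
    if query_id:
        # Standard links keep the ID in the query string: ...watch?v=<id>
        return query_id[0]
    # Fallback for /embed/<id> and /shorts/<id>: take the last path segment.
    return parsed.path.rstrip("/").split("/")[-1]


print(extract_video_id("https://www.youtube.com/watch?v=La385-EDmWw"))
```

The result can then be passed as `video_id` to the cell above.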

Software engineering is getting crushed by AI and Lead Code the platform that is used for software engineering interviews is slowly dying. Matter of fact, companies like GitLab, Buffer, and Zapier ditched Leap Code years ago. And recently, Snapchat announced that they've stopped Leap Code interviews entirely. So, these tech interviews aren't just changing, but the whole interview hiring process is going through a complete overhaul. So, in this video, let's talk about the death of lead code, the impact of AI, how software engineering interviews are going to look like, and what you need to do to prepare right now. And this is especially important if you're trying to break into tech or or even in tech because in some way, shape, or form, you will be affected by these changes. First, let's talk about the death of lead code. For those who don't know what lead code is, it's basically a website that has a lot of different brain puzzle coding questions, like using ones and zeros. What is the n

In [5]:
transcript_list

[{'text': 'Software engineering is getting crushed',
  'start': 0.08,
  'duration': 3.92},
 {'text': 'by AI and Lead Code the platform that is',
  'start': 1.839,
  'duration': 3.92},
 {'text': 'used for software engineering interviews',
  'start': 4.0,
  'duration': 3.6},
 {'text': 'is slowly dying. Matter of fact,',
  'start': 5.759,
  'duration': 3.681},
 {'text': 'companies like GitLab, Buffer, and',
  'start': 7.6,
  'duration': 4.64},
 {'text': 'Zapier ditched Leap Code years ago. And',
  'start': 9.44,
  'duration': 4.319},
 {'text': 'recently, Snapchat announced that',
  'start': 12.24,
  'duration': 3.279},
 {'text': "they've stopped Leap Code interviews",
  'start': 13.759,
  'duration': 3.52},
 {'text': 'entirely. So, these tech interviews',
  'start': 15.519,
  'duration': 3.52},
 {'text': "aren't just changing, but the whole",
  'start': 17.279,
  'duration': 3.521},
 {'text': 'interview hiring process is going',
  'start': 19.039,
  'duration': 3.761},
 {'text': 'through 
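
Flattening the transcript to plain text discards the `start` timestamps shown above. If timestamps are wanted later (for example, to cite where in the video an answer came from), one option is to merge snippets into short time windows before splitting. The sketch below is only an illustration on plain dicts; `group_snippets` and `window_seconds` are hypothetical names, not part of any library used in this notebook.

```python
def group_snippets(snippets, window_seconds):
    """Merge consecutive snippets whose start falls within `window_seconds` of the group's start."""
    groups = []
    for snippet in snippets:
        if groups and snippet["start"] - groups[-1]["start"] < window_seconds:
            # Still inside the current window: append the text to the open group.
            groups[-1]["text"] += " " + snippet["text"]
        else:
            # Window exceeded (or first snippet): open a new group at this start time.
            groups.append({"start": snippet["start"], "text": snippet["text"]})
    return groups


sample = [
    {"text": "Software engineering is getting crushed", "start": 0.08, "duration": 3.92},
    {"text": "by AI and Lead Code the platform that is", "start": 1.839, "duration": 3.92},
    {"text": "used for software engineering interviews", "start": 4.0, "duration": 3.6},
]
for group in group_snippets(sample, window_seconds=3):
    print(group["start"], group["text"])
```

Each group's `start` could then be stored as document metadata, assuming the splitter is given one document per group.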

## Step 1b - Indexing (Text Splitting)

In [6]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [7]:
len(chunks)

17

In [8]:
chunks[10]

Document(metadata={}, page_content='and the candidate gets paid. So this is the true win-win scenario that lead code was striving to be. Plus startup interviews they actually recommend that you use AI or any tools that you would have access to on the job because if it helps you write better software once again that\'s a win-win. So if you\'re trying to prepare for this new style of interview, forget grinding leap code and focus on building real things. Start by using AI tools like GitHub C-Pilot, Cursor or Warp to accelerate your coding workflow and work smarter. Use platforms like Replet or Versel to quickly develop full stack applications. But whatever you do, make sure to build projects that solve actual problems, not just another coding project for the sake of doing a coding project. I am tired of to-do list apps. Oh, but what if I have no idea what coding project I can do? Try this. Go to a local mom and pop store and ask them, "Is there any annoying, time-consuming things that yo
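
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width sliding window in plain Python. Note that this is not how `RecursiveCharacterTextSplitter` works internally (it splits on separators such as paragraphs and sentences first); the sketch only illustrates how overlap makes neighbouring chunks share text. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # The window already reached the end of the text; stop here.
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the same idea as the 200-character overlap used above: a sentence cut at a chunk boundary still appears whole in at least one chunk.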

## Step 1c & 1d - Indexing (Embedding Generation and Storing in Vector Store)

In [9]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(chunks, embeddings)

In [10]:
vector_store.index_to_docstore_id

{0: '953ee627-500a-48d4-a978-68836df13651',
 1: 'ad4d38e2-80e9-4f6a-bed5-5b40639d008c',
 2: 'c5a2ba9c-948f-4680-b363-b3a6f907ee9f',
 3: '7ddf9271-80b6-4055-870b-09e136b188c2',
 4: 'a675f4c5-f443-43ca-8090-db4285c984b2',
 5: '88b59ed8-61ad-44f9-90be-2143c2e2b724',
 6: '7f622005-1015-4ffe-9644-c9dacbcea493',
 7: '2b6be001-b23a-43d2-8912-04e6cf6af68e',
 8: 'd1a38023-416d-45f3-a92d-2198f77226b2',
 9: 'bc6a2a4a-8a26-448a-b909-0c8f1ded3422',
 10: '1880639b-248c-48f8-928d-c39daffdd08a',
 11: '67d67a17-4156-4661-bd8b-b0bd7324a34e',
 12: '1348569f-f231-4e2a-9cdb-b54a54fb2e81',
 13: '84d9875a-688a-41f7-9257-5188db75a71f',
 14: '1474ee21-d000-4918-a07c-0e35f49dbd80',
 15: '189e466c-28da-45c5-bb7c-bb7218d7edba',
 16: '67e392ef-d4b8-4847-b534-8ee7a5fd72d4'}

In [11]:
vector_store.get_by_ids(['189e466c-28da-45c5-bb7c-bb7218d7edba'])

[Document(id='189e466c-28da-45c5-bb7c-bb7218d7edba', metadata={}, page_content="latencies and throughput limitations. If you understood that joke, leave a comment down below. And if you're preparing for system design interviews, I recommend starting with groing the system design interview or designing data inensive applications. After that, try doing mock interviews on platforms like interviewing.io where you get that realworld Google Amazinesesque type interview experience. And once you're comfortable with that, try diving into real world system walkthroughs on YouTube channels like Bite My Go. And if you want a solid free resource to help you out with all of this, check the link down below in the description. I'll have something ready for you. Well, that's about all I have in this video. I really hope that you guys enjoyed it. And if you did, make sure to hit the like button, subscribe if you haven't already. If you're interested in my free tech newsletter, link will be down below in

## Step 2 - Retrieval

In [12]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [13]:
retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000235DE442EF0>, search_kwargs={'k': 4})

In [16]:
retriever.invoke('is dsa important now-a-days?')

[Document(id='1348569f-f231-4e2a-9cdb-b54a54fb2e81', metadata={}, page_content="A simple project could be building a Google calendar autosync tool, an event RSVP tracker, or even a custom GPT that helps draft emails and social posts. You can use Zapier, Open AI, and Google Appcript to create something pretty quickly in like a weekend. Again, it's not about the actual code that you write, but rather the problem-solving skills since that's exactly what hiring managers want to see. And maybe through your search of solving real world problems, you might accidentally create a startup. Just make sure it's not built to help people cheat. The second way interviews are changing is by using more system design. And this is an area where AI can't really do a lot of cheating for you because it involves a lot of human reflection and conversation. For those who don't know, system design is pretty much a type of interview that is traditionally given to senior level engineers. And the questions could g
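
Conceptually, a `similarity` search with `k=4` embeds the query and returns the four stored chunks whose vectors are closest to it. The sketch below shows that idea on hand-made 2-D vectors using Euclidean distance. The vectors and the names `top_k` and `toy_index` are made up for illustration; real embeddings have many more dimensions, and FAISS uses optimized index structures rather than a Python loop.

```python
import math


def top_k(query_vec, doc_vecs, k):
    """Return the IDs of the `k` vectors closest to `query_vec` by Euclidean distance."""
    scored = [(math.dist(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort()  # smallest distance first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings": purely illustrative values, not produced by any model.
toy_index = {
    "dsa": [1.0, 0.0],
    "system_design": [0.0, 1.0],
    "projects": [0.7, 0.7],
}
print(top_k([0.9, 0.1], toy_index, k=2))
# ['dsa', 'projects']
```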

## Step 3 - Augmentation

In [17]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables=['context', 'question']
)

In [26]:
question = "is the topic of dsa vs system design and project building discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [27]:
retrieved_docs

[Document(id='84d9875a-688a-41f7-9257-5188db75a71f', metadata={}, page_content="design is pretty much a type of interview that is traditionally given to senior level engineers. And the questions could go something like, if you had to design the architecture of a company like Netflix or Spotify, how would you go about it? Someone might start by asking, would you use AWS as your cloud provider? And then dive into choices around storage, latency, availability zones, and caching strategies. Then they might follow up with a scenario like, what would you do if there's a sudden traffic spike? to which someone would respond by saying you would scale horizontally using load balancers, autoscaling groups, and possibly a content delivery network to distribute the load efficiently. And although sure, you could technically cheat in this, it's much more difficult because it's more of a conversation where you're talking about the design of things, not just here's a question and here's an answer. Plus

In [28]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"design is pretty much a type of interview that is traditionally given to senior level engineers. And the questions could go something like, if you had to design the architecture of a company like Netflix or Spotify, how would you go about it? Someone might start by asking, would you use AWS as your cloud provider? And then dive into choices around storage, latency, availability zones, and caching strategies. Then they might follow up with a scenario like, what would you do if there's a sudden traffic spike? to which someone would respond by saying you would scale horizontally using load balancers, autoscaling groups, and possibly a content delivery network to distribute the load efficiently. And although sure, you could technically cheat in this, it's much more difficult because it's more of a conversation where you're talking about the design of things, not just here's a question and here's an answer. Plus, there are a lot of follow-ups and intricacies. For each design choice in\n\nm

In [29]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [30]:
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      design is pretty much a type of interview that is traditionally given to senior level engineers. And the questions could go something like, if you had to design the architecture of a company like Netflix or Spotify, how would you go about it? Someone might start by asking, would you use AWS as your cloud provider? And then dive into choices around storage, latency, availability zones, and caching strategies. Then they might follow up with a scenario like, what would you do if there's a sudden traffic spike? to which someone would respond by saying you would scale horizontally using load balancers, autoscaling groups, and possibly a content delivery network to distribute the load efficiently. And although sure, you could technically cheat in this, it's much more difficult because it's more of a conv
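
The augmentation step is essentially string substitution: the retrieved chunks are joined and placed into the template next to the question. A minimal plain-Python equivalent, assuming the same template wording as above (`TEMPLATE` and `build_prompt` are hypothetical names, and `str.format` stands in for `PromptTemplate`):

```python
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n\n"
    "{context}\n"
    "Question: {question}\n"
)


def build_prompt(chunk_texts, question):
    """Join the retrieved chunk texts and substitute them into the template."""
    context = "\n\n".join(chunk_texts)
    return TEMPLATE.format(context=context, question=question)


print(build_prompt(["first retrieved chunk", "second retrieved chunk"], "what is discussed?"))
```

Printing the result makes it easy to check exactly what text the model will receive before spending an API call.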

## Step 4 - Generation

In [31]:
client = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

In [32]:
answer = client.invoke(final_prompt)
print(answer.content)

Yes, the topic of system design and project building is discussed in this video. 

For system design, it explains that it is a type of interview traditionally given to senior-level engineers, involving questions about designing architectures for companies like Netflix or Spotify. It emphasizes the conversational nature of these interviews, where candidates must defend their design choices and handle follow-up questions.

For project building, it suggests simple projects like a Google calendar autosync tool or an event RSVP tracker, highlighting the importance of problem-solving skills over the actual code written. It mentions using tools like Zapier, OpenAI, and Google Apps Script to create projects quickly, and notes that through solving real-world problems, one might even create a startup.


## Building a Chain

In [33]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [34]:
def format_docs(retrieved_docs):
    context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return context_text

In [35]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [36]:
parallel_chain.invoke('why dsa is not enough?')

{'context': "solving logical puzzles on leak code, that does not mean they are a good software engineer. Imagine there's someone who can throw, catch, and run the football at a professional NFL level, but because they can only bench 215 lbs for some reason, versus me, if I can bench 225 lbs, but I can't catch a football for the life of me, I should not be allowed to make it into the NFL. But unfortunately, with the way the system is designed, tech companies care far more about your weightlifting, i.e. the leak code, than your ability to create software applications and be a good software engineer. or at least they used to and thankfully things are taking a change. So although for the most part big tech companies still care a lot about lead code on a micro level a lot of startups don't even use it. Matter of fact startups because they are super tight on their budget they can't afford to make a wrong hire. They can't afford to do brain teaser interviews. They want to assess you based on 

In [37]:
parser = StrOutputParser()

In [38]:
main_chain = parallel_chain | prompt | client | parser

In [39]:
main_chain.invoke('Can you summarize the video')

"The video discusses a controversial software called Interview Coder, created by a Columbia student named Roy Lee, which uses AI to help users cheat in technical coding interviews. The software generates solutions for coding problems by analyzing screenshots and is designed to prevent detection by moving the user's gaze around the screen. Although the creator achieved success in interviews at major tech companies like Amazon and Meta, he faced severe repercussions, including being blacklisted and expelled from Columbia after revealing his methods in a YouTube video. The video also offers resources for preparing for system design interviews and encourages viewers to engage with the content."

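To make the data flow of `main_chain` easier to follow, here is a plain-Python imitation of the two ideas it relies on: running several branches on the same input (`RunnableParallel`) and piping the output of one step into the next (the `|` operator). Everything here is a stand-in: `parallel`, `pipe`, `fake_retriever`, `fake_prompt` and `fake_llm` are hypothetical functions, and no LangChain code or API call is involved.

```python
def parallel(branches):
    """Run every branch on the same input and collect the results in a dict."""
    def run(value):
        return {name: fn(value) for name, fn in branches.items()}
    return run


def pipe(*steps):
    """Feed the output of each step into the next one, left to right."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


# Stand-ins for the real components (assumptions for illustration only).
def fake_retriever(question):
    return ["chunk about system design", "chunk about projects"]

def join_chunks(chunk_texts):
    return "\n\n".join(chunk_texts)

def fake_prompt(inputs):
    return f"{inputs['context']}\nQuestion: {inputs['question']}"

def fake_llm(prompt_text):
    return f"(answer based on {len(prompt_text)} characters of prompt)"


toy_chain = pipe(
    parallel({
        "context": pipe(fake_retriever, join_chunks),  # retriever | format_docs
        "question": lambda q: q,                       # RunnablePassthrough
    }),
    fake_prompt,
    fake_llm,
)
print(toy_chain("why dsa is not enough?"))
```

The question string enters once, fans out into `context` and `question`, and is merged back by the prompt step, which mirrors the shape of the chain above.
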
## Improvements

1. **UI-based enhancements**
2. **Evaluation**
    - a. Ragas  
    - b. LangSmith  
3. **Indexing**
    - a. Document Ingestion  
    - b. Text Splitting  
    - c. Vector Store  
4. **Retrieval**
    - a. *Pre-Retrieval*  
        - i. Query rewriting using LLM  
        - ii. Multi-query generation  
        - iii. Domain-aware routing  
    - b. *During Retrieval*  
        - i. MMR  
        - ii. Hybrid Retrieval  
        - iii. Reranking  
    - c. *Post-Retrieval*  
        - i. Contextual Compression  
5. **Augmentation**
    - a. Prompt Templating  
    - b. Answer grounding  
    - c. Context window optimization  
6. **Generation**
    - a. Answer with Citation  
    - b. Guardrails  
7. **System Design**
    - a. Multimodal  
    - b. Agentic  
    - c. Memory-based  
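
As an illustration of one retrieval item from the list, MMR (maximal marginal relevance) picks chunks that are relevant to the query but not near-duplicates of chunks already selected. The sketch below is a small pure-Python version on made-up 2-D vectors; `mmr`, `cosine` and `toy_vectors` are hypothetical names, and `lambda_mult` follows the usual convention where 1.0 means pure relevance and lower values favour diversity. In LangChain, the retriever can be created with `search_type="mmr"` instead of writing this by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily select `k` IDs, trading off query relevance against redundancy."""
    remaining = dict(doc_vecs)
    selected = []
    while remaining and len(selected) < k:
        def score(doc_id):
            relevance = cosine(query_vec, remaining[doc_id])
            # Highest similarity to anything already picked (0.0 for the first pick).
            redundancy = max(
                (cosine(remaining[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy vectors: "a" and "a_dup" are near-duplicates, "b" covers a different direction.
toy_vectors = {"a": [1.0, 0.0], "a_dup": [0.99, 0.05], "b": [0.6, 0.8]}
print(mmr([1.0, 0.1], toy_vectors, k=2, lambda_mult=0.5))  # ['a_dup', 'b']
print(mmr([1.0, 0.1], toy_vectors, k=2, lambda_mult=1.0))  # ['a_dup', 'a']
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favour of the more distinct chunk, which is the behaviour that can help broad questions such as "summarize the video" see more varied context.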
