Task 1: Articulate the problem and the user of your application
Hints:
Create a list of potential questions that your user is likely to ask!
What is the user’s job title, and what is the part of their job function that you’re trying to automate?
Can’t I do this with ChatGPT? 
Should I trust a machine instead of a real doctor? 
Deliverables
Write a succinct 1-sentence description of the problem
Write 1-2 paragraphs on why this is a problem for your specific user


ANSWER 1- Athletes recovering from injuries often receive static rehab plans that are hard to adapt to daily CrossFit training, leading to confusion, setbacks, or ineffective modifications.


ANSWER 2- Based on my experience, CrossFit athletes are highly engaged, performance-driven individuals who thrive on dynamic, constantly varied training. When they're injured, they’re often handed rigid rehab plans (typically in PDF form) that don’t account for the complexity of daily WODs, movement substitutions, or how their body feels on a given day. This situation forces them to either skip workouts entirely or attempt unsafe modifications without proper guidance, risking re-injury or slowed recovery.
This is especially frustrating because the athletes expect personalized, coach-like support. The inability to easily integrate their rehab plans into their ongoing training disrupts consistency and motivation.

Task 2: Propose a Solution
Now that you’ve defined a problem and a user, there are many possible solutions.
Choose one, and articulate it.
Task 2: Articulate your proposed solution
Hint:
Paint a picture of the “better world” that your user will live in. How will they save time, make money, or produce higher-quality output?
Recall the LLM Application stack we’ve discussed at length
✅ Deliverables
Write 1-2 paragraphs on your proposed solution. How will it look and feel to the user?
Describe the tools you plan to use in each part of your stack. Write one sentence on why you made each tooling choice.
LLM
Embedding Model
Orchestration
Vector Database
Monitoring
Evaluation
User Interface
(Optional) Serving & Inference
Where will you use an agent or agents? What will you use “agentic reasoning” for in your app?


ANSWER 1- This solution enables users to upload their clinical history or rehab plan PDF directly into the app, allowing their selected AI trainer to instantly understand their injury, limitations, and prescribed recovery movements. Using Retrieval-Augmented Generation (RAG), the AI reads and interprets the document, then dynamically tailors daily workout suggestions, scaling options, and movement substitutions based on both the rehab plan and the user’s current training goals. The experience is seamless: once the PDF is uploaded, users can simply ask questions like “What should I do today if I can’t squat?” or “Can I still train my upper body while rehabbing my knee?” and receive intelligent, personalized responses.

To the user, this feels like having a dedicated coach who not only remembers their injury but actively helps them train around it. Whether they’re following a gym’s programming or a custom track within the app, the AI trainer will continuously adjust recommendations to keep them progressing without compromising recovery. It bridges the gap between rehab and training, giving users confidence, clarity, and support throughout the healing process, without needing to interpret medical documents or guess what’s safe and what is not.


ANSWER 2:
LLM: gpt-4.1-mini
Embedding Model: text-embedding-3-small
Orchestration: LangChain + LangGraph
Vector Database: Qdrant
Monitoring: LangSmith
Evaluation: RAGAS
User Interface: Next.js


ANSWER 3- I use a single agent, PubMedCrossFitAgent, which draws on multiple data sources: PubMed scientific articles and local documents. Its agentic reasoning consists of deciding, based on the query, whether to search PubMed, the local documents, or both. It also reasons over the user profile, current workouts, and scientific data to produce personalized plans.
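
The routing decision described above could be sketched as follows. This is only an illustration, not the actual PubMedCrossFitAgent logic: the function name `route_query` and both keyword sets are hypothetical, and in the real app the LLM presumably makes this choice through tool selection rather than keyword matching.

```python
import re
from typing import List

# Hypothetical keyword sets, chosen only for illustration.
RESEARCH_TERMS = {"study", "studies", "research", "evidence", "trial"}
COACHING_TERMS = {"scale", "scaling", "substitute", "wod", "technique", "modify"}


def route_query(query: str) -> List[str]:
    """Return the data sources to search for a given user query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    sources = []
    if words & RESEARCH_TERMS:
        sources.append("pubmed")
    if words & COACHING_TERMS:
        sources.append("local_docs")
    # When no keyword matches, search both sources rather than neither.
    return sources or ["pubmed", "local_docs"]
```

A question with no matching keyword, such as "Can I train with a sore knee?", falls through to both sources, which mirrors the "PubMed, local docs, or both" behavior.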


Task 3: Dealing with the Data
You are an AI Systems Engineer. The AI Solutions Engineer has handed off the plan to you. Now you must identify some source data that you can use for your application.
Assume that you’ll be doing at least RAG (e.g., a PDF) with a general agentic search (e.g., a search API like Tavily or SERP).

Task 3: Collect data for (at least) RAG and choose (at least) one external API
Hint:
Ask other real people (ideally the people you’re building for!) what they think.
What are the specific questions that your user is likely to ask of your application? Write these down.
✅ Deliverables
Describe all of your data sources and external APIs, and describe what you’ll use them for.
Describe the default chunking strategy that you will use. Why did you make this decision?
[Optional] Will you need specific data for any other part of your application? If so, explain.


ANSWER 1- I’m using the following:
External APIs: 
PubMed: Scientific validation and research-backed recommendations
OpenAI: Intelligent synthesis and personalized responses
LangSmith: Production monitoring and optimization insights

Data sources:
Local PDFs: Practical CrossFit techniques and coaching guidance
Qdrant: Fast semantic search across local knowledge base
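
As an illustration of the PubMed source, the sketch below builds a search request URL. It assumes the app queries PubMed through the public NCBI E-utilities `esearch` endpoint, which the write-up does not specify; the function name `build_pubmed_search_url` and the example search term are hypothetical. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here; the app may use a wrapper library).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, max_results: int = 5) -> str:
    """Build an esearch URL that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # free-text search term
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_search_url("knee rehabilitation squat")
```

The IDs returned by such a search would then be used to fetch article abstracts for the agent to cite.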


ANSWER 2- I’m using RecursiveCharacterTextSplitter with 1000-character chunks and a 200-character overlap. Headers, footers, and tables of contents are removed from the documents, and the overlap reduces the chance that exercise instructions are cut off mid-sentence at a chunk boundary.
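
To show what the 1000/200 setting means in practice, here is a simplified sliding-window sketch. It is not RecursiveCharacterTextSplitter itself, which also prefers to break on separators such as paragraphs and sentences; the function name `chunk_text` is hypothetical and only the size and overlap arithmetic is illustrated.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts 800 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks


# A 2500-character stand-in for a cleaned document.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
```

With these defaults the last 200 characters of each chunk are repeated at the start of the next one, so an instruction that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.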


Task 4: Building a Quick End-to-End Agentic RAG Prototype
Task 4: Build an end-to-end Agentic RAG application using a production-grade stack and your choice of commercial off-the-shelf model(s)


✅ Deliverables
Build an end-to-end prototype and deploy it to a local endpoint
Task 5: Creating a Golden Test Data Set
You are an AI Evaluation & Performance Engineer. The AI Systems Engineer who built the initial RAG system has asked for your help and expertise in creating a "Golden Data Set" for evaluation.
Task 5: Generate a synthetic test data set to baseline an initial evaluation with RAGAS
✅ Deliverables
Assess your pipeline using the RAGAS framework including key metrics faithfulness, response relevance, context precision, and context recall. Provide a table of your output results.
What conclusions can you draw about the performance and effectiveness of your pipeline with this information?


ANSWER 1- 

(Insert here the table after evaluation)


ANSWER 2- We can conclude that the system is retrieving high-quality, relevant content that enables accurate and faithful responses.


Task 6: The Benefits of Advanced Retrieval
You are an AI Systems Engineer. The AI Evaluation and Performance Engineer has asked for your help in making stepwise improvements to the application. They heard that “as goes retrieval, so goes generation” and have asked for your expertise.
Task 6: Install an advanced retriever of your choosing in our Agentic RAG application.
✅ Deliverables
Describe the retrieval techniques that you plan to try and to assess in your application. Write one sentence on why you believe each technique will be useful for your use case.
Test a host of advanced retrieval techniques on your application.


ANSWER 1 and 2- I chose ensemble retrieval (semantic + BM25). In the RAGAS evaluation it scored 95.8% faithfulness and 94.4% context precision, the highest of all the techniques tested.
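
The sketch below illustrates how an ensemble of a semantic retriever and BM25 can merge two ranked lists. It assumes weighted reciprocal rank fusion (RRF), a common fusion method for ensemble retrievers; the function name, the equal weights, and the document IDs are hypothetical and not taken from the app.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[List[str]],
    weights: Sequence[float],
    k: int = 60,
) -> List[str]:
    """Merge ranked lists of document IDs using weighted RRF scores."""
    if len(rankings) != len(weights):
        raise ValueError("rankings and weights must have the same length")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents contribute more; k dampens the top ranks.
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the two retrievers, best match first.
semantic_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([semantic_hits, bm25_hits], weights=[0.5, 0.5])
```

Documents found by both retrievers ("a" and "b") rise to the top of the fused list, which is the intuition behind combining keyword and semantic search.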


Task 7: Assessing Performance
You are the AI Evaluation & Performance Engineer. It's time to assess all options for this product.
Task 7: Assess the performance of the naive agentic RAG application versus the applications with advanced retrieval tooling
✅ Deliverables
How does the performance compare to your original RAG application? Test the fine-tuned embedding model using the RAGAS framework to quantify any improvements. Provide results in a table.
Articulate the changes that you expect to make to your app in the second half of the course. How will you improve your application?


ANSWER 1:

Before:
(Insert here the table before evaluation)


After:
(Insert here the table after evaluation)


ANSWER 3: Although faithfulness improved considerably, answer relevancy still has room for improvement. I want to raise answer relevancy from 85% to at least 90%. To get there, I plan to refine the prompts so that responses align better with user intent, and to use LangSmith traces to identify slow queries and optimize retrieval speed.
Your Final Submission