### ❓ QUESTION #1:

Why do we want to support streaming? What about streaming is important, or useful?

Streaming is essential to a good user experience, especially in a production app. **A streaming app sends tokens back to the user as they are generated, which makes the application feel responsive.** Without streaming, the user has to wait for every token of the response to be generated before seeing any result. Depending on the underlying LLM and its latency, this can be a poor experience (especially with reasoning LLMs, which can generate many tokens before the final answer).
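The difference in perceived latency can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `generate_tokens` is a hypothetical stand-in for an LLM that produces one token every `delay_s` seconds, and none of the names below come from the application itself.

```python
import time
from typing import Iterator, List, Tuple


def generate_tokens(tokens: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields one token per `delay_s` seconds."""
    for token in tokens:
        time.sleep(delay_s)  # simulated per-token generation latency
        yield token


def respond_blocking(tokens: List[str], delay_s: float) -> str:
    """Non-streaming: nothing is returned until every token has been generated."""
    return "".join(generate_tokens(tokens, delay_s))


def compare_latency(
    tokens: List[str], delay_s: float = 0.01
) -> Tuple[float, float, str, str]:
    """Return (time to first streamed token, time to full blocking response,
    streamed text, blocking text). Assumes `tokens` is non-empty."""
    start = time.perf_counter()
    stream = generate_tokens(tokens, delay_s)
    first_token = next(stream)  # with streaming, the user already sees output here
    streaming_first = time.perf_counter() - start
    streamed_text = first_token + "".join(stream)

    start = time.perf_counter()
    blocking_text = respond_blocking(tokens, delay_s)
    blocking_total = time.perf_counter() - start
    return streaming_first, blocking_total, streamed_text, blocking_text


if __name__ == "__main__":
    first, total, _, _ = compare_latency(["tok "] * 50)
    print(f"streaming: first output after {first:.3f}s")
    print(f"blocking:  first output after {total:.3f}s")
```

Both modes produce the same text and take roughly the same total time. What changes is how long the user stares at an empty screen, and that wait grows with response length only in the blocking case.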

#### ❓ QUESTION #2: 

Why are we using User Session here? What about Python makes us need to use this? Why not just store everything in a global variable?

A user session object lets us store information that is specific to one user or one chat session. For example, if we had implemented a login mechanism, the user session object would hold information specific to each user, which we would not want other users to access.

*Concretely for this application, we create a `RetrievalAugmentedQAPipeline` parameterized by `vector_db`, which contains the indexed chunks of the file that the specific user uploaded. This pipeline is specific to each user and is stored on the user session object.* One can imagine a production scenario where different user sessions are served by different underlying pipelines.

A global variable doesn't allow this flexibility. A Python module-level global is shared by everything running in the same server process, so every user would see the same value, and one user's upload would overwrite another's.
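A minimal sketch of that failure mode, assuming two users who each upload a different file. `SessionStore` is a hypothetical, simplified stand-in for a per-user session object (it is not the framework's actual API), and the pipeline is represented by a placeholder string rather than a real `RetrievalAugmentedQAPipeline`.

```python
from typing import Any, Dict, Tuple


class SessionStore:
    """Hypothetical stand-in for a per-user session object: one dict per session id."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def set(self, session_id: str, key: str, value: Any) -> None:
        self._data.setdefault(session_id, {})[key] = value

    def get(self, session_id: str, key: str, default: Any = None) -> Any:
        return self._data.get(session_id, {}).get(key, default)


def simulate_uploads() -> Tuple[str, str, str]:
    """Two users upload different files; compare shared state with per-session state."""
    shared: Dict[str, Any] = {}  # plays the role of a module-level global
    sessions = SessionStore()

    for session_id, filename in [("alice", "paper_a.pdf"), ("bob", "paper_b.pdf")]:
        pipeline = f"pipeline({filename})"  # placeholder for a real pipeline object
        shared["pipeline"] = pipeline  # overwrites whatever was stored before
        sessions.set(session_id, "pipeline", pipeline)  # isolated per session

    return (
        shared["pipeline"],
        sessions.get("alice", "pipeline"),
        sessions.get("bob", "pipeline"),
    )


if __name__ == "__main__":
    shared_value, alice_value, bob_value = simulate_uploads()
    print("global:", shared_value)  # only the last upload survives
    print("alice: ", alice_value)
    print("bob:   ", bob_value)
```

With the shared dictionary, the first user would end up querying the second user's document. Keying the state by session keeps each pipeline attached to the user who created it.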

#### ❓ Discussion Question #1:

Upload a PDF file of the recent DeepSeek-R1 paper and ask the following questions:

1. What is RL and how does it help reasoning?
2. What is the difference between DeepSeek-R1 and DeepSeek-R1-Zero?
3. What is this paper about?

Does this application pass your vibe check? Are there any immediate pitfalls you're noticing?

The answers to the above-listed questions are in the following cell. They've been generated by uploading the [DeepSeek-R1 paper](https://arxiv.org/pdf/2501.12948) and asking the questions from the deployment at [this space](https://huggingface.co/spaces/deman539/PDFChatter).

1. What is RL and how does it help reasoning?

RL stands for Reinforcement Learning. It is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward. In the context of reasoning capabilities in language models, RL is used to incentivize and improve the model's reasoning patterns without relying on supervised fine-tuning (SFT). This approach allows the model to explore different reasoning behaviors and develop complex chain-of-thought processes, ultimately enhancing its performance on various reasoning tasks. The models, like DeepSeek-R1-Zero, have shown significant improvements in reasoning due to the pure RL process, demonstrating capabilities such as self-verification and reflection.

2. What is the difference between DeepSeek-R1 and DeepSeek-R1-Zero?

The main differences between DeepSeek-R1 and DeepSeek-R1-Zero are as follows:

General Capability: DeepSeek-R1 currently has limitations in tasks such as function calling, multi-turn interactions, complex role-playing, and JSON output, while DeepSeek-R1-Zero demonstrates stronger foundational capabilities and has shown improvement in reasoning tasks throughout its reinforcement learning (RL) training process.

Performance: DeepSeek-R1-Zero shows a steady enhancement in performance as tracked by metrics like the average pass@1 score on the AIME 2024 benchmark. In performance comparisons, DeepSeek-R1-Zero exceeds the performance of OpenAI-o1-0912, whereas DeepSeek-R1 is positioned as less capable in various tasks.

Self-evolution: The self-evolution process of DeepSeek-R1-Zero allows it to autonomously improve its reasoning capabilities directly from the base model without the influence of supervised fine-tuning. This approach provides clearer insights into its progression over time.

Readability and Output Format: DeepSeek-R1-Zero often produces outputs that may not be suitable for reading, mixing multiple languages or lacking formatting. In contrast, DeepSeek-R1 is designed to provide clearer, more reader-friendly outputs with a structured format that includes a summary at the end.

Overall, DeepSeek-R1-Zero appears to be a more advanced iteration with better performance and capabilities in reasoning tasks compared to DeepSeek-R1.


3. What is this paper about?

I don't know the answer.





In terms of the vibe check, a few drawbacks are immediately visible:

1. The simple RAG app can't semantically understand the whole paper. When we ask for the difference between DeepSeek-R1 and DeepSeek-R1-Zero, it describes properties of each model but doesn't compare and contrast the key difference between them (RL only vs. RL combined with SFT). This is understandable, since the RAG application retrieves the chunks most semantically similar to the query embedding, so the chunk size and chunking strategy play a huge role in this behavior. To get a better response, we would need to additionally index summaries of the paper, which are simpler to reason about.
2. When we ask the model what the paper is about, it doesn't know the answer. Once again, this is because the query embedding (of the question itself) can't find a good match among the indexed chunks. Some additional metadata about the paper would be required for a better response.
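
The second pitfall can be reproduced with a toy retriever. This is a rough sketch, not the application's retrieval code: it swaps real embeddings for a bag-of-words vector (an assumption made so the example runs without a model), and the chunk texts are made up for illustration. Real embeddings would give small non-zero similarities instead of exact zeros, but the effect is the same: a generic question has little in common with detail-level chunks until a summary is indexed alongside them.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9@-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Illustrative, made-up chunks standing in for detail-level pieces of a paper.
DETAIL_CHUNKS = [
    "reinforcement learning rewards improve reasoning",
    "deepseek-r1-zero mixes languages in its outputs",
    "pass@1 score on the aime 2024 benchmark",
]
# Hypothetical extra chunk: a short summary indexed next to the details.
SUMMARY_CHUNK = "this paper is about improving reasoning with reinforcement learning"

if __name__ == "__main__":
    query = "What is this paper about?"
    print(retrieve(query, DETAIL_CHUNKS))                    # no chunk matches
    print(retrieve(query, DETAIL_CHUNKS + [SUMMARY_CHUNK]))  # summary is retrieved
```

Indexing a summary (or other document-level metadata) gives broad questions something to match against, which is the fix suggested in both points above.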