# Assignment 2


This assignment focuses on depth estimation and photometric stereo. You are required to complete two tasks.

Please submit a PDF report that thoroughly documents your process and includes all output visualizations. Additionally, provide a Jupyter Notebook containing all the code and visualizations used in your analysis.


## Marking Criteria:
- Qualitative and quantitative analysis.
- Justification and documentation of design choices.
- Visualization and quantitative evaluation of results.
- Demonstration of critical thinking and problem-solving skills.

# **Part 2: RV and LLMs (10%)**  

## **Assessment Criteria**  
The objective of this assignment is to evaluate your knowledge of **Robot Vision**, not your ability to use LLMs. Your evaluation will be graded based on the following criteria:  

- **Completeness (30%)** – Testing all **three models** on all **five questions** and thoroughly reporting the results.  
- **Correctness of Evaluation (50%)** – Accurately assessing the responses given by the models and identifying mistakes.  
- **Writing Quality & Self-Reflection (20%)** – Analyzing the interaction process, identifying patterns of errors, and suggesting ways to mitigate incorrect outputs.  

Ensure that your report is well-structured and provides a detailed assessment of the models' performance. 

## **Objective**
Your task is to assess how effectively text-based Large Language Models (LLMs) can answer questions from the Robot Vision curriculum. You will evaluate the quality, consistency, and accuracy of the responses provided by the models. You are also encouraged (though not required) to experiment with different prompts to improve the models' responses.

## **Questions**  

You have been provided with **five predefined Robot Vision curriculum questions**, as assigned by the instructor. Your task is to answer each question using **all three specified LLMs**.

- **Important:** Each student has been assigned specific questions. **Please check your assigned question index in** `Assigned_Questions.csv`.

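- **Access the full list of questions here:** [https://aquasilver.notion.site/1b35ea0fa49f808db306cc99146e01c1?v=1b35ea0fa49f8085a8a1000cf20a888c](https://aquasilver.notion.site/1b35ea0fa49f808db306cc99146e01c1?v=1b35ea0fa49f8085a8a1000cf20a888c)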

## **LLMs**

You are required to use all three of the following Large Language Models (LLMs) to answer the five questions:  

- **ChatGPT**: [https://chat.openai.com/](https://chat.openai.com/)  
- **Anthropic Claude**: [https://claude.ai/](https://claude.ai/)  
- **DeepSeek**: [https://www.deepseek.com/](https://www.deepseek.com/)  

These models offer free versions. Whenever possible, select a variant optimized for reasoning.

## **Prompts**  

#### **Primary Prompt**  
The following example prompt can be used when asking questions:  

> *"Answer the following question. Provide a step-by-step solution and explanations."*  

#### **Prompt Customization (Optional)**  
You may modify the prompt to improve the response quality. Ensure that the model provides a detailed step-by-step solution along with clear explanations.  

---

### **Challenging the Response**  

Regardless of whether the response appears correct, you must challenge it to test whether the LLM changes its answer when questioned. You can use prompts such as:  

- *"I don’t think that is correct."*  
- *"Are you sure that is the correct answer?"*  
- *"The answer in my textbook is different."*  
- *"Is there a different answer and solution?"*  

#### **Purpose of Challenging Responses**  
By challenging each model’s response, you evaluate its robustness, confidence, and susceptibility to criticism. Note whether the model maintains or changes its original answer after being challenged. You are encouraged to develop your own ways to test the response.  


## **Your Task Involves the Following Key Aspects**  

## **1. Evaluating the Responses**  

For each response, assign a score based on the following criteria:  

#### **a) Correctness of Answer**  
- Does the model provide the correct final solution or conclusion?  
- **Scoring:**  
  - **1** – Incorrect  
  - **5** – Completely correct  

#### **b) Quality of Explanation**  
- Is the explanation logically coherent, detailed, and clear, regardless of correctness?  
- **Scoring:**  
  - **1** – Poor quality explanation  
  - **5** – Excellent clarity and reasoning  

#### **c) Consistency**  
- Did the model change its response after being challenged?  
- **Answer options:**  
  - **Yes** – The model changed its answer.  
  - **No** – The model maintained its original answer.  

Ensure that your evaluation is detailed, with justifications for the scores assigned.  
 

## **2. Identifying Mistakes**  

If the model makes a mistake, do the following:  

1. **Highlight the Error** – Clearly identify and describe the mistake in the model's response.  
2. **Explain the Issue** – Provide a brief explanation of why the answer is incorrect and what the correct response should be.  
3. **Save for the Report** – Document the mistake, along with your analysis, to include in your final report.  

Your report should systematically present these errors, discussing common patterns and possible reasons behind them.  


## **3. Comparing Models**  

For each of the five questions, follow these steps:  

1. **Evaluate Responses** – Compare the answers from all three models based on correctness, explanation quality, and consistency.  
2. **Select the Best Model** – Identify which model provided the most accurate and well-explained response for each question.  
3. **Justify Your Choice** – Provide a brief explanation of why you chose that model’s response as the best. Consider factors such as accuracy, clarity, depth of reasoning, and whether the model maintained its response when challenged.  

Your report should summarize the model comparisons and highlight any patterns in performance across different questions.  


## **4. Recording the Interactions**  

As part of your assignment submission, you must create a text document to record the conversations with the LLMs. For each question, document the following information:  

1. **Chatbot Used** – Specify which model was used (ChatGPT, Claude, or DeepSeek).  
2. **Original Prompt + Question** – Include the exact prompt and question you provided.  
3. **Original Response** – Copy the model’s initial response.  
4. **Challenge** – Note the challenge you posed to the model (e.g., questioning its correctness).  
5. **Response to Challenge** – Record whether and how the model modified its answer after being challenged.  

This text document must be submitted as part of your final assignment. Ensure that all interactions are clearly formatted for easy review and analysis.  

## **Example Format Template (Use Consistently)**  

### **Question # [Insert Question Number]**  

- **Chatbot Used:** [ChatGPT / Claude / DeepSeek]  

- **Original Prompt:**  
  *[Include the exact prompt used, along with the question]*  

- **Original Response:**  
  *[Copy the model’s initial response]*  

- **Challenge:**  
  *[State the exact challenge prompt you used]*  

- **Response to Challenge:**  
  *[Copy the model’s response after being challenged]*  

- **Score Original Answer (1-5):** [Rate correctness of the response]  
- **Score Explanation (1-5):** [Rate the clarity and reasoning of the explanation]  
- **Did the model change its answer? (Yes/No):**  
- **Mistakes Identified:** *[Briefly describe any mistakes found in the response]*  

---

Use this template consistently to maintain clarity and organization in your submission.  

# **Final Report Submission**  

The final assignment submission is a **written report (2000–3000 words)** that includes the following sections:  

## **1. Numerical Scores**  
For each of the five questions, include:  
- The evaluation scores from **all three models** (ChatGPT, Claude, DeepSeek).  
- The model that provided the **best response** for each question.  
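
One way to keep the numerical scores organized is to tabulate them programmatically. The sketch below is only an illustration: the `Score` type, the `best_model` tie-breaking rule (correctness first, then explanation quality, then whether the answer was kept under challenge), and all score values are assumptions made for this example, not part of the assignment. Replace the placeholder values with your own evaluations.

```python
from typing import NamedTuple

class Score(NamedTuple):
    correctness: int      # 1 (incorrect) to 5 (completely correct)
    explanation: int      # 1 (poor) to 5 (excellent)
    changed_answer: bool  # True if the model changed its answer when challenged

# Placeholder values for illustration only; these are not real results.
scores = {
    1: {"ChatGPT": Score(4, 4, False), "Claude": Score(4, 5, False), "DeepSeek": Score(4, 5, True)},
    2: {"ChatGPT": Score(5, 3, True), "Claude": Score(3, 4, False), "DeepSeek": Score(2, 2, False)},
}

def best_model(per_model):
    """Pick the top model: correctness first, then explanation, then consistency.

    This ranking rule is an assumption; adapt it to your own justification.
    """
    return max(
        per_model,
        key=lambda name: (
            per_model[name].correctness,
            per_model[name].explanation,
            not per_model[name].changed_answer,
        ),
    )

def summary_table(scores):
    """Render the scores as a Markdown table, one row per question and model."""
    lines = [
        "| Question | Model | Correctness | Explanation | Changed answer? | Best |",
        "|---|---|---|---|---|---|",
    ]
    for question, per_model in sorted(scores.items()):
        winner = best_model(per_model)
        for name, s in per_model.items():
            changed = "Yes" if s.changed_answer else "No"
            best = "Yes" if name == winner else ""
            lines.append(
                f"| {question} | {name} | {s.correctness} | {s.explanation} | {changed} | {best} |"
            )
    return "\n".join(lines)

print(summary_table(scores))
```

The printed table can be pasted into the report as is. The automatic "best" pick is only a starting point and still needs a written justification.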

## **2. Free-Text Analysis and Discussion**  
- Provide an overall assessment of how the models performed.  
- Identify and discuss any mistakes found in their responses.  
- Highlight particularly good answers or explanations.  
- Support your analysis with **examples from the recorded conversations** where appropriate.  

## **3. Reflection Questions**  
Answer the following questions using the provided scale:  

- **a. How well do you think LLMs answered the questions overall?**  
  **[1 – Poorly; 5 – Perfect]**  

- **b. How well do LLMs explain the answers overall?**  
  **[1 – Poorly; 5 – Perfect]**  

- **c. Do you trust LLMs more or less after this assignment?**  
  **[More / No change / Less]**  

- **d. Did this assignment help you understand Robot Vision better?** *(Optional)*  
  **[Yes / No, but it helped me understand LLMs / No]**  

- **e. Do you find LLM-based assignments more engaging than other forms of evaluation?** *(Optional)*  
  **[More engaging / Can’t say / Less engaging]**  

*Note:* Questions (d) and (e) are **optional** and will not be marked. They are included to help determine whether similar assignments should be used in the future.  

## **Submission Requirements**  
- Submit the recorded interactions along with the report as a PDF. 
- Ensure your report is **well-structured, clear, and analytical**.  

**IMPORTANT: Ensure that key points in your responses are in bold to make your document easier to review.**  

In [2]:
import numpy as np

# Encoder hidden states
encoder_states = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [1, 0, 1]
])

# Decoder current state
decoder_state = np.array([1, 1, 1])

def dot_product_attention(decoder_state, encoder_states):
    # Calculate attention scores (dot product between decoder state and each encoder state)
    attention_scores = np.dot(encoder_states, decoder_state)
    
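    # Apply softmax to get attention weights
    # (subtracting the max score avoids overflow and leaves the result unchanged)
    exp_scores = np.exp(attention_scores - np.max(attention_scores))
    attention_weights = exp_scores / np.sum(exp_scores)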
    
    # Calculate context representation (weighted sum of encoder states)
    context_representation = np.sum(encoder_states * attention_weights[:, np.newaxis], axis=0)
    
    return {
        'attention_scores': attention_scores,
        'attention_weights': attention_weights,
        'context_representation': context_representation
    }

# Compute attention
result = dot_product_attention(decoder_state, encoder_states)

print("Attention Scores:", result['attention_scores'])
print("Attention Weights:", result['attention_weights'])
print("Context Representation:", result['context_representation'])

Attention Scores: [ 6 15 24  2]
Attention Weights: [1.52281002e-08 1.23394574e-04 9.99876590e-01 2.78912384e-10]
Context Representation: [6.99962972 7.99962972 8.99962972]
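
For reference, the computation in the cell above can be written out as follows. The symbols are assumptions introduced here for illustration: $s$ is `decoder_state`, $h_i$ is the $i$-th row of `encoder_states`, $e_i$ are the attention scores, $\alpha_i$ the attention weights, and $c$ the context representation.

$$
e_i = s^\top h_i, \qquad
\alpha_i = \frac{\exp(e_i)}{\sum_{j} \exp(e_j)}, \qquad
c = \sum_{i} \alpha_i \, h_i
$$

With the values used above, $e = (6,\ 15,\ 24,\ 2)$, so the third encoder state dominates:

$$
\alpha_3 = \frac{1}{1 + e^{-9} + e^{-18} + e^{-22}} \approx 0.99988,
\qquad
c \approx h_3 = (7,\ 8,\ 9)
$$

This agrees with the printed output, where the context representation is within about $4 \times 10^{-4}$ of $(7, 8, 9)$ in each component.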
