In [2]:
import os
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from scraper import fetch_website_contents
from openai import OpenAI

In [3]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key) > 10:
    MODEL = 'gpt-5-nano'
    openai = OpenAI()
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    OLLAMA_BASE_URL = "http://localhost:11434/v1"
    # Bind the fallback client to the same name that stream_article() uses,
    # otherwise the class would raise a NameError when no OpenAI key is set
    openai = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')
    MODEL = "llama3.2"

API key looks good so far
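The key check above only accepts keys that begin with `sk-proj-`, so other OpenAI key formats (assumed here to include older keys that begin with plain `sk-`) would silently fall back to Ollama. A minimal sketch of the same decision as a pure function, with the hypothetical name `choose_backend`, makes the rule explicit and testable without any network call:

```python
from typing import Optional, Tuple

OLLAMA_BASE_URL = "http://localhost:11434/v1"


def choose_backend(api_key: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (base_url, model); a base_url of None means the default OpenAI endpoint.

    Hypothetical helper. Assumes any key starting with "sk-" (which also
    covers "sk-proj-") is an OpenAI key.
    """
    if api_key and api_key.startswith("sk-") and len(api_key) > 10:
        return None, "gpt-5-nano"
    # No usable key: use a local Ollama server on its default port
    return OLLAMA_BASE_URL, "llama3.2"
```

The result could feed a single client, for example `OpenAI(base_url=base_url, api_key=api_key if base_url is None else "ollama")`, so that both branches bind the same name.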


In [4]:
class ArticleSummarizer:
    def __init__(self, Article, url):
        self.Article = Article
        self.url = url

    def fetch_page_and_all_relevant_links(self, url):
        # Only the page at `url` is fetched; despite the method name, no links are followed
        contents = fetch_website_contents(url)
        result = f"## Landing Page:\n\n{contents}\n"

        return result

    summarizer_system_prompt = """
    When I provide you with the text of a research article fetched from any academic source
    (such as PubMed, arXiv, Elsevier, PNAS, Nature, Science, IEEE, Springer, Wiley, BMC, MDPI, or any other scholarly database),
    please follow these steps:

    1. **Read the full content** of the research article provided
    2. **Extract and analyze** all sections of the paper thoroughly
    3. **Format headings** in bold and at a larger heading level
    4. **Provide a comprehensive summary** that includes:

    - **Title and Authors**: The complete title and list of authors
    - **Publication Details**: Journal name, publication date, DOI (if available)
    - **Abstract Summary**: A concise overview of the abstract in your own words
    - **Introduction/Background**: The research problem, context, and motivation for the study
    - **Research Objectives**: The specific aims, hypotheses, or research questions
    - **Methodology**: 
        - Study design and approach
        - Sample size and participants (if applicable)
        - Data collection methods
        - Analysis techniques and tools used
    - **Key Findings/Results**: 
        - Main outcomes and discoveries
        - Statistical significance (if mentioned)
        - Data patterns and trends
    - **Discussion**: Interpretation of results and their implications
    - **Limitations**: Any constraints or weaknesses acknowledged by the authors
    - **Conclusions**: Final takeaways and the authors' main conclusions
    - **Future Directions**: Suggestions for future research (if mentioned)
    - **Significance**: Why this research matters to the field

    5. **Structure your summary** in a clear, organized format with appropriate headings and subheadings
    6. **Maintain academic accuracy** while making the content accessible and easy to understand
    7. If you cannot access the full text, clearly state what portions you were able to read and summarize accordingly

    Please provide the summary in a structured format that allows me to quickly grasp the essence of the research while also having
    access to detailed information about each component of the study.
    """

    def get_user_prompt(self):
        user_prompt = f"""
            You are looking at an article called: {self.Article}
            Here are the contents of the article's page.
            Use this information to build a summary of the article in markdown without code blocks.\n\n
            """
        user_prompt += self.fetch_page_and_all_relevant_links(self.url)
        # Keep only the first 5,000 characters; for a full paper this is just the beginning
        user_prompt = user_prompt[:5_000]
        return user_prompt

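    def stream_article(self):
        # Uses the module-level `openai` client and MODEL chosen in the setup cell
        stream = openai.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": self.summarizer_system_prompt},
                {"role": "user", "content": self.get_user_prompt()}
            ],
            stream=True
        )
        response = ""
        # Show an empty Markdown output, then re-render it as each chunk arrives
        display_handle = display(Markdown(""), display_id=True)
        for chunk in stream:
            response += chunk.choices[0].delta.content or ''
            update_display(Markdown(response), display_id=display_handle.display_id)
        # Return the full text so later cells can reuse the summary
        return response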
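The system prompt asks the model to say when it could not read the full text, but the plain slice in `get_user_prompt` gives the model no sign that anything was cut. A minimal sketch of a truncation that leaves a visible marker, using the hypothetical names `truncate_with_marker` and `TRUNCATION_MARKER`, assumes `max_chars` is larger than the marker itself:

```python
TRUNCATION_MARKER = "\n\n[Article text truncated here]"


def truncate_with_marker(prompt: str, max_chars: int = 5_000) -> str:
    """Cut `prompt` to at most max_chars, ending with a marker if anything was dropped.

    Hypothetical helper; assumes max_chars > len(TRUNCATION_MARKER).
    """
    if len(prompt) <= max_chars:
        return prompt
    # Reserve room for the marker so the total stays within max_chars
    keep = max(max_chars - len(TRUNCATION_MARKER), 0)
    return prompt[:keep] + TRUNCATION_MARKER
```

Inside `get_user_prompt`, the slice would then become `user_prompt = truncate_with_marker(user_prompt)`.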

In [5]:
if __name__ == "__main__":
    summarizer = ArticleSummarizer(
        Article="Deep Residual Learning for Image Recognition",
        url="https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf"
    )
    summarizer.stream_article()
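`stream_article` does two jobs at once: it accumulates the streamed text deltas and it re-renders the notebook output. A minimal sketch of just the accumulation step, with the hypothetical helpers `accumulate_stream` and `fake_chunk`, can be tested offline with objects shaped like chat-completion chunks (only the `choices[0].delta.content` path used in the class is assumed). Unlike the loop in the class, it also skips chunks whose `choices` list is empty, which the API can send, for example as a final usage-only chunk when usage reporting is enabled:

```python
from types import SimpleNamespace
from typing import Iterable


def accumulate_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    response = ""
    for chunk in chunks:
        # A chunk with no choices would raise IndexError on choices[0]
        if not chunk.choices:
            continue
        # Some deltas (e.g. the role-only first one or the final one) carry no content
        response += chunk.choices[0].delta.content or ""
    return response


def fake_chunk(text):
    """Build an object shaped like a streamed chunk, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [fake_chunk(None), fake_chunk("## Title"), fake_chunk("\nBody"), SimpleNamespace(choices=[])]
print(accumulate_stream(chunks))
```

In the notebook, the `update_display` call would stay inside the loop; separating it out only makes the text handling checkable on its own.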

## **Deep Residual Learning for Image Recognition** — Summary

### **Authors**
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

### **Publication Details**
- Source: arXiv preprint 1512.03385 (2015). 
- Conference: Presented at CVPR 2016.
- DOI: None for the arXiv version; the CVPR 2016 proceedings version is doi:10.1109/CVPR.2016.90.

---

## **Abstract Summary**
The paper introduces the idea of residual learning to tackle optimization difficulties in very deep neural networks. Rather than mapping an input x to a desired output H(x) directly, the network learns a residual function F(x) = H(x) − x, so that H(x) = F(x) + x. This simple reformulation, combined with shortcut connections that skip one or more layers, enables much deeper networks to be trained effectively. The authors propose residual blocks with identity shortcuts (and projection shortcuts when dimensions change) and demonstrate that networks with depths up to 152 layers outperform previous architectures on ImageNet and related benchmarks. They also introduce bottleneck blocks to build deeper networks efficiently. Overall, residual learning makes very deep CNNs practical and beneficial for image recognition.
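The reformulation H(x) = F(x) + x can be sketched in a few lines. This toy version swaps the paper's convolutions for fully connected layers on plain Python lists and omits biases and batch normalization, but keeps the paper's ordering of F(x) = W2 · ReLU(W1 · x) followed by a ReLU after the addition; `residual_block` and its helpers are illustrative names:

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """H(x) = ReLU(F(x) + x) with residual branch F(x) = W2 . ReLU(W1 . x)."""
    f = matvec(w2, relu(matvec(w1, x)))
    # Shortcut connection: add the unchanged input to the residual branch
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block([1.0, 2.0, 0.5], zeros, zeros))
```

With all weights at zero, F(x) = 0 and the block passes a non-negative input through unchanged, which is the sense in which an identity mapping is easy for a residual block to represent.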

---

## **Introduction / Background**
- Deep CNNs historically suffer from optimization difficulties as depth increases (the so-called degradation problem: as depth grows, accuracy saturates and then degrades, and because training error rises as well, the cause is not overfitting).
- Residual learning reframes the problem: instead of learning H(x) directly, the network learns F(x) that models the residual between the input and the target mapping, making it easier to optimize.
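- Skip connections (identity mappings) give information and gradients a direct path through the network without adding parameters or computation. The authors note that vanishing/exploding gradients are already largely handled by normalized initialization and batch normalization, and they argue that the degradation problem is an optimization difficulty that is unlikely to be caused by vanishing gradients.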
- This framework enables systematic construction of very deep networks with improved trainability and generalization.

---

## **Research Objectives**
- Develop a general residual learning framework that enables training of extremely deep networks (up to 152 layers) for image recognition.
- Evaluate the benefits of residual connections across architectures of varying depth (e.g., 34, 50, 101, 152 layers).
- Investigate design choices: identity vs projection shortcuts, bottleneck blocks, and their impact on performance and efficiency.
- Demonstrate strong performance on large-scale (ImageNet) and smaller-scale (CIFAR-10) datasets.

---

## **Methodology**

### Study Design and Approach
- Architectural blueprint: Residual blocks composed of convolutional layers with skip connections that add input x to the block’s output F(x).
- Block types:
  - Basic block (used in shallower nets such as ResNet-18/34): two 3x3 convolutions with an identity skip.
  - Bottleneck block (used for deeper nets): 1x1, 3x3, 1x1 convolutions to reduce and then restore dimensionality, enabling very deep networks with fewer parameters.
- Shortcuts:
  - Identity shortcuts when input and output dimensions match.
  - Projection shortcuts (via 1x1 convolution) when downsampling or changing dimensionality is needed.
- Training setup:
  - Large-scale ImageNet dataset (1000-class classification) with data augmentation (random crops, horizontal flips, etc.).
  - Convolutional layers followed by batch normalization and ReLU activations.
  - Global average pooling before the final fully connected classifier.
  - Stochastic gradient descent with a mini-batch size of 256, momentum 0.9, and weight decay 0.0001; the learning rate starts at 0.1 and is divided by 10 when the error plateaus.
- Evaluation:
  - Comparisons across depths on ImageNet: plain vs. residual networks at 18 and 34 layers, and bottleneck ResNets at 50, 101, and 152 layers.
  - Additional validation on CIFAR-10 to illustrate generality.
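The bottleneck's efficiency claim is easy to check with weight counts. This sketch ignores biases and batch-norm parameters and uses hypothetical helper names; it compares a basic block on 64 channels with a bottleneck block on 256 channels that narrows to 64, the pair the paper presents as having similar time complexity. The `shortcut` helper encodes the identity-versus-projection rule described above, assuming a projection whenever the channel count or stride changes:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution (biases and batch norm ignored)."""
    return k * k * c_in * c_out


def basic_block_params(channels: int) -> int:
    # Two 3x3 convolutions that keep the channel count
    return 2 * conv_params(3, channels, channels)


def bottleneck_block_params(channels: int, width: int) -> int:
    # 1x1 reduce -> 3x3 at the reduced width -> 1x1 restore
    return (conv_params(1, channels, width)
            + conv_params(3, width, width)
            + conv_params(1, width, channels))


def shortcut(c_in: int, c_out: int, stride: int) -> str:
    # Identity when shapes match; otherwise a 1x1 projection convolution
    return "identity" if c_in == c_out and stride == 1 else "projection (1x1 conv)"


print(basic_block_params(64), bottleneck_block_params(256, 64))
```

Here `basic_block_params(64)` gives 73,728 weights and `bottleneck_block_params(256, 64)` gives 69,632, so the bottleneck handles four times as many channels at a slightly lower weight count; a basic block at 256 channels would need 1,179,648.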

### Participants / Datasets
- ImageNet (large-scale, 1000-class) used to benchmark performance of varying depths.
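- CIFAR-10 used to further validate the residual approach on small images, with networks of up to 110 layers plus an exploratory 1202-layer model.
- PASCAL VOC and MS COCO used to test the learned representations on object detection.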

### Data Collection Methods and Analysis Tools
- No new data were collected; the experiments use public benchmarks with standard supervised training on GPUs.
- Analysis centered on top-1 / top-5 error rates and training convergence behavior across depths.
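The ImageNet comparisons are reported as top-1 and top-5 error. A minimal sketch of the metric, using a hypothetical helper `top_k_error` over plain lists of per-class scores, makes the definition explicit:

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_error(scores, labels, 1), top_k_error(scores, labels, 2))
```

With 1000-class scores and k=5 this is the top-5 error used on ImageNet.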

---

## **Key Findings / Results**

### Main Outcomes
- Residual learning enables effective training of very deep networks (up to 152 layers) that outperform their shallower counterparts and traditional deep CNNs.
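- On ImageNet, deeper ResNets (from 34 up to 152 layers) deliver better accuracy than shallower ones, whereas a plain 34-layer network has higher training error than a plain 18-layer one; this contrast shows that residual connections mitigate the degradation problem.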
- Bottleneck design enables deep architectures with more layers without an inordinate increase in parameters or computational cost.

### Statistical Significance / Data Patterns
- No formal statistical significance tests are reported; results are given as error rates, with residual networks clearly outperforming plain (non-residual) networks of the same depth on ImageNet.
- An ensemble of ResNets (including 152-layer models) achieved 3.57% top-5 error on the ImageNet test set and won first place in the ILSVRC 2015 classification task.
- Improvements were observed across both large-scale and smaller datasets, indicating the robustness of the residual framework.

### Notable Architectural Insights
- Identity shortcuts facilitate gradient flow and enable the network to learn residual functions that are easier to optimize.
- When downsampling or changing dimensionality, projection shortcuts maintain information flow and alignment between layers.
- The bottleneck block is particularly effective for very deep models, balancing depth with computational efficiency.
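The gradient-flow point can be stated with one line of calculus. As a sketch, write a block as H(x) = F(x) + x, as in the abstract, and ignore the ReLU that the paper applies after the addition; then, for a loss L:

$$
\frac{\partial H}{\partial x} = \frac{\partial F}{\partial x} + I,
\qquad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial H}\,\frac{\partial F}{\partial x} + \frac{\partial L}{\partial H}
$$

The second term carries the upstream gradient to x unchanged, so even when the residual branch's Jacobian is small, the signal does not vanish through the shortcut.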

---

## **Discussion**

- The introduction of residual connections fundamentally changes how very deep networks are optimized, making it feasible to train hundreds of layers without performance collapse.
- Residual learning shifts the optimization target from fitting entire mappings to fitting residuals, which are typically easier to learn when the desired function is close to the identity.
- The approach generalizes beyond ImageNet, benefiting various vision tasks and datasets, and has influenced a wide array of subsequent architectures.

---

## **Limitations**

- Computational and memory demands grow with depth, posing practical constraints for training extremely deep models on limited hardware.
- Gains are not unbounded: on CIFAR-10, a 1202-layer network had higher test error than the 110-layer one (7.93% vs. 6.43%) despite similar training error, which the authors attribute to overfitting on a small dataset.
- The effectiveness of residual blocks is demonstrated on standard benchmarks; applicability to other domains or non-visual tasks may require adaptations.

---

## **Conclusions**

- Deep residual learning makes extremely deep convolutional networks trainable and beneficial for image recognition.
- The residual framework, especially with identity and projection shortcuts and bottleneck blocks, enables systematic construction of very deep nets that achieve state-of-the-art results.
- This work has profoundly influenced subsequent deep learning research, establishing residual networks as a foundational architecture in computer vision and beyond.

---

## **Future Directions**

- Exploration of even deeper or more efficient residual architectures, alternative shortcut designs, and improved bottleneck configurations.
- Investigation into complementary techniques (e.g., normalization strategies, optimization tricks, data augmentation) to further improve training stability and performance.
- Application of residual learning principles to other modalities and tasks (e.g., segmentation, detection, video, and non-vision domains).

---

## **Significance**

- The paper introduced a simple yet powerful idea—residual connections—that dramatically improved the trainability and performance of very deep networks.
- ResNets became a foundational architecture, enabling deeper models to reach or surpass prior performance ceilings and influencing a broad range of subsequent models, libraries, and practical applications in computer vision and beyond.