# Medium Blog Summarizer - With OpenAI API

In [1]:
# imports
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI


In [2]:
# Load environment variables from a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key
# The whitespace check runs before the prefix check so that a key with
# leading spaces is reported as a whitespace problem, not a wrong prefix
if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!


In [3]:
openai = OpenAI()

In [4]:
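# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

# A class to represent a webpage
class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # Fail fast on slow or failing requests instead of parsing an error page
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string is None when the <title> tag is empty or has nested tags
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body>; avoid an AttributeError on None
            self.text = ""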
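
To see what the tag stripping in `Website.__init__` does without making a network call, here is a minimal sketch that applies the same BeautifulSoup steps to an inline HTML string. The `sample_html` string and the `extract_text` helper are illustrative assumptions and are not part of the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration
sample_html = """
<html>
  <head><title>Demo post</title></head>
  <body>
    <h1>Heading</h1>
    <script>console.log("tracking");</script>
    <style>h1 { color: red; }</style>
    <p>First paragraph.</p>
    <img src="figure.png">
  </body>
</html>
"""

def extract_text(html):
    """Return (title, visible text) using the same steps as Website.__init__."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    if soup.body is None:
        return title, ""
    # Remove elements that carry no readable text
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    return title, soup.body.get_text(separator="\n", strip=True)

title, text = extract_text(sample_html)
print(title)
print(text)
```

Only the heading and the paragraph survive; the script, style, and image elements are removed before the text is extracted.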

In [5]:

def user_prompt_for_blog(website):
    user_prompt = f"You are looking at a blog titled {website.title}"
    user_prompt += "\nThe contents of this blog are as follows; \
please provide a summary of this blog in markdown. \
If it references any research papers or GitHub links, please provide their links also.\n\n"
    user_prompt += website.text
    return user_prompt


def summarize_medium_blog(url, system_prompt, model="gpt-4o-mini"):
    website = Website(url)
    response = openai.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt_for_blog(website)},
        ],
    )
    return response.choices[0].message.content
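
The request assembled in `summarize_medium_blog` is a two-message list: a system message followed by a user message that carries the page text. The sketch below shows that structure with a stand-in page object, so it runs without calling the API. `build_messages`, `fake_site`, and `max_chars` are hypothetical names; the character cap is an assumption added here to illustrate one way of guarding against very long posts, and the notebook itself does not do this.

```python
from types import SimpleNamespace

def build_messages(website, system_prompt, max_chars=20000):
    """Build the chat messages list; max_chars is an illustrative cap on page text."""
    user_prompt = f"You are looking at a blog titled {website.title}\n"
    user_prompt += "Please provide a summary of this blog in markdown.\n\n"
    # Slicing keeps at most max_chars characters of the page text
    user_prompt += website.text[:max_chars]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Stand-in for a Website instance: only .title and .text are needed
fake_site = SimpleNamespace(title="Demo post", text="word " * 10000)
messages = build_messages(fake_site, "You are a scientific assistant.", max_chars=100)
print([m["role"] for m in messages])
print(len(messages[1]["content"]))
```

A character cap is a rough proxy for a token limit; a tokenizer-based count would be more precise if the model's context window becomes a concern.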




In [6]:
# System prompt
# Each continued line ends with a space so sentences do not run together
system_prompt_guru = "You are a scientific assistant. You have to help summarize a blog post. \
Please describe the most important findings and anything the author emphasizes. \
Please provide all the important links mentioned. \
Give the output in markdown."

# Blog URL
guru_blog = "https://gurudeep1998.medium.com/w-net-a-deep-model-for-fully-unsupervised-image-segmentation-reproduction-2651540eaed6"
# Alternative post to try; summarize_medium_blog expects a URL string, not a Website object
#guru_blog = "https://medium.com/write-a-catalyst/you-are-fired-now-80458d77205a"

In [7]:
display(Markdown(summarize_medium_blog(guru_blog, system_prompt_guru, "gpt-4o-mini")))

# Summary of "W-Net: A Deep Model for Fully Unsupervised Image Segmentation Reproduction" by Guru Deep Singh

## Introduction
The blog discusses the implementation of the paper "W-Net: A Deep Model for Fully Unsupervised Image Segmentation" by Xide Xia and Brian Kulis. The W-Net architecture consists of two U-net architectures for unsupervised image segmentation, facilitating the reconstruction of images from segmentations. The author and Nadine Duursma aimed to reproduce this model for a student assignment at Delft University of Technology.

## W-Net Model Architecture
- **Encoder**: Outputs image segmentations from original images.
- **Decoder**: Reconstructs images from segmentations.
- The architecture has 18 modules, 46 convolutional layers in total, with specific use of depth-wise separable convolutions in many modules for efficiency.

### Key Features
- Input images are resized to 224x224 pixels.
- Utilizes both ReLU and batch normalization.
- The model's final layers involve 1x1 convolutions to map feature vectors to the desired number of classes.

## Training Methodology
- The model was trained on the **PASCAL VOC2012** dataset using a batch size of 10.
- Two optimizers were used: one for the encoder (with Soft-N-cut loss) and another for the entire model (with reconstruction loss).
- The model implemented dropout to prevent overfitting and utilized a learning rate strategy involving adjustments after certain iterations.

## Testing
- The model was tested using **BSDS300** and **BSDS500** datasets, and performance was measured using Variation of Information (VI), Probabilistic Rand Index (PRI), and Segmentation Covering (SC).

## Reproduction of Results
The authors modified three existing GitHub repositories for their reproduction effort:
1. **Base Model Repository**: Provided foundational code (Repository [2]).
2. **N-cut Loss Repository**: Offered an efficient implementation of N-cut loss (Repository [3]).
3. **Metrics Evaluation Repository**: Facilitated performance metric calculations (Repository [4]).

## Results and Discussion
The authors plotted losses and visualized segmentations throughout training. They observed that while their losses were higher than those reported in the original paper, the visual output was satisfactory, showcasing improved segmentations.

### Challenges Encountered
Several key parameters were either missing or not clearly mentioned in the original paper, including:
- Model initialization
- Optimizer specifications
- Number of classes (K)

These omissions might have impacted the reproduction results.

## Conclusion
The reproduction of the W-Net model demonstrated promising results despite challenges stemming from incomplete information in the original paper. The authors noted that their visual results were recognizable and closely aligned with the input images, validating the effectiveness of the unsupervised segmentation approach.

## Important Links
- Original paper: [W-Net: A Deep Model for Fully Unsupervised Image Segmentation](https://arxiv.org/abs/1711.08506) [1].
- Repository [2]: [W-Net Implementation in Pytorch](https://github.com/fkodom/wnet-unsupervised-image-segmentation) [2].
- Repository [3]: [W-Net Unsupervised Image Segmentation](https://github.com/fkodom/wnet-unsupervised-image-segmentation) [3].
- Repository [4]: [BSD500-Segmentation-Evaluator](https://github.com/KuangHaofei/BSD500-Segmentation-Evaluator) [4].
- Quantifying Reproducibility: [A Step Toward Quantifying Independently Reproducible Machine Learning Research](http://arxiv.org/abs/1909.06674) [5].

---

This markdown summary captures the essence of the blog post along with key findings and relevant links to the sources discussed.