A study of deep-learning approaches to single-image super resolution (SISR), comparing a from-scratch SRCNN implementation against four published architectures (FSRCNN, ESPCN, EDSR, LapSRN) on the same input, at scale factors ranging from 2x to 8x.
The goal was not to set a new benchmark but to build a deep, hands-on understanding of how CNN-based super resolution actually works: the patch extraction, the upsampling strategies (bicubic pre-upsampling vs. learned deconvolution vs. sub-pixel convolution vs. progressive Laplacian pyramid), the trade-off between depth and runtime, and where each design wins or fails visually.
I wanted to answer one question for myself: when a paper says "our model produces sharper textures than bicubic interpolation," what does that actually look like at the pixel level, across architectures, on the same image?
So I trained SRCNN from scratch in Keras to internalize the full pipeline (HDF5 patch generation, sliding window over downscaled-then-rescaled inputs, MSE optimization), then ran four published architectures through OpenCV's dnn_superres module against identical inputs and compared the outputs to bicubic interpolation. The result is a like-for-like visual comparison across five very different design choices.
A butterfly image (Project Files/butterfly.png) is the canonical test image. For each architecture, the repo contains a three-panel comparison: original low-resolution input, bicubic-interpolated baseline, and the deep-learning super-resolved output.
| Architecture | Scale | Side-by-side output |
|---|---|---|
| SRCNN (trained from scratch) | 2x | Project Files/Image Super Resolution (SRCNN).png |
| FSRCNN | 3x | Project Files/Image Super Resolution (FSRCNN).png |
| ESPCN | 4x | Project Files/Image Super Resolution (ESPCN).png |
| EDSR | 4x | Project Files/Image Super Resolution (EDSR).png |
| LapSRN | 8x | Project Files/Image Super Resolution (LapSRN).png |
There is also a video demo (Video Results/SR_ESPCN_4x.mp4 vs Video Results/Bicubic.mp4 vs Video Results/Original.mp4) showing ESPCN running frame-by-frame on a short clip at 4x, alongside the bicubic baseline.
A short walkthrough video (Image Super Resolution Project Presentation.mp4) and the full project report (Image Super Resolution Project Report.pdf) are at the repo root.
The work is split across two notebooks.
1. Train SRCNN from scratch (Project Files/Notebooks/Image_Super_Resolution_SRCNN.ipynb)
The notebook downloads the UKBench100 dataset, builds HDF5 datasets of low-resolution / high-resolution patch pairs via sliding-window crops, defines the 3-layer SRCNN architecture in Keras, trains for 50 epochs, and applies the trained model to a test image. Open it in Colab or Jupyter and run cells top to bottom. A GPU is recommended for the training cell.
2. Run the four pretrained architectures (Project Files/Notebooks/Image_Super_Resolution_OpenCV.ipynb)
This notebook uses OpenCV's dnn_superres interface to load any of the four pretrained .pb models in Project Files/Networks/ and upscale a target image. Swap the model and img paths in the args dict to try a different architecture or input. Requires opencv-contrib-python==4.4.0.44.
    # inside the OpenCV notebook
    args = {
        "model": "Project Files/Networks/EDSR_x4.pb",  # or ESPCN_x4.pb, FSRCNN_x3.pb, LapSRN_x8.pb
        "img": "Project Files/butterfly.png",
    }

The pretrained .pb files live in Project Files/Networks/ and ship with the repo (total ~40 MB).
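For orientation, here is a minimal sketch of what the inference step can look like with `cv2.dnn_superres`. It is not copied from the notebook: the helper names `parse_model_path` and `upscale` are hypothetical, and the parsing assumes the `<NAME>_x<SCALE>.pb` naming used in Project Files/Networks/.

```python
import os


def parse_model_path(model_path):
    """Derive the (name, scale) pair that dnn_superres expects from a filename
    such as 'EDSR_x4.pb'. Assumes the '<NAME>_x<SCALE>.pb' naming convention."""
    stem = os.path.splitext(os.path.basename(model_path))[0]  # e.g. 'EDSR_x4'
    name, scale = stem.rsplit("_x", 1)
    return name.lower(), int(scale)


def upscale(model_path, image_path):
    """Return (bicubic, super_resolved) versions of the image at the model's scale."""
    import cv2  # needs opencv-contrib-python, which provides cv2.dnn_superres

    name, scale = parse_model_path(model_path)
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)
    sr.setModel(name, scale)  # e.g. ("edsr", 4)

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    super_resolved = sr.upsample(image)

    # Bicubic baseline at the same output size, for the side-by-side panel.
    bicubic = cv2.resize(
        image,
        (super_resolved.shape[1], super_resolved.shape[0]),
        interpolation=cv2.INTER_CUBIC,
    )
    return bicubic, super_resolved


print(parse_model_path("Project Files/Networks/EDSR_x4.pb"))  # ('edsr', 4)
```

Calling `upscale(args["model"], args["img"])` would then give the two right-hand panels of the three-panel comparison.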
What I learned from each design:
SRCNN (Dong et al., 2014). The original "deep learning for SR" baseline. Three conv layers (9x9, 1x1, 5x5, with 64 / 32 / 3 channels) on top of a bicubically pre-upsampled input. Trained with MSE loss, Adam (lr=0.001), 50 epochs, batch size 128, on 33x33 patches predicting 21x21 center crops. Simple, surprisingly effective, but slow because all conv operations run at high-resolution scale.
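The 33x33 to 21x21 relationship follows from unpadded convolutions: each k x k layer trims k - 1 pixels. The sketch below checks that arithmetic, enumerates sliding-window patch origins, and builds a three-layer Keras model with the hyperparameters listed above. It is an illustration, not the notebook's code: the stride of 14, the linear final layer, and the function names are my assumptions.

```python
def valid_output_size(size, kernels=(9, 1, 5)):
    """Spatial size after a stack of stride-1 convolutions with no padding."""
    for k in kernels:
        size -= k - 1
    return size


def sliding_window_origins(height, width, patch=33, stride=14):
    """Top-left corners of every full patch; stride=14 is an assumed value."""
    return [
        (y, x)
        for y in range(0, height - patch + 1, stride)
        for x in range(0, width - patch + 1, stride)
    ]


def build_srcnn(channels=3):
    """Three-layer SRCNN: 9x9 / 1x1 / 5x5 kernels with 64 / 32 / `channels` filters."""
    from tensorflow import keras  # imported lazily; only needed to build the model

    model = keras.Sequential([
        keras.Input(shape=(None, None, channels)),
        keras.layers.Conv2D(64, 9, padding="valid", activation="relu"),
        keras.layers.Conv2D(32, 1, padding="valid", activation="relu"),
        # Final layer left linear here; that choice is an assumption.
        keras.layers.Conv2D(channels, 5, padding="valid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model


# Each side of the target is trimmed by this many pixels relative to the input.
pad = (33 - valid_output_size(33)) // 2
print(valid_output_size(33), pad)  # 21 6
```

With a 6-pixel trim per side, the target for an input patch at origin `(y, x)` is the high-resolution crop at rows `y + 6` to `y + 27` and columns `x + 6` to `x + 27`.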
FSRCNN. Moves the upsampling to the end via a deconvolution layer, so all the expensive feature extraction happens in low-resolution space. Much faster than SRCNN at equivalent quality.
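A rough multiply-accumulate count shows why it pays to do feature extraction in low-resolution space. This sketch reuses SRCNN's layer shapes purely as a stand-in (it is not FSRCNN's real configuration), ignores the cost of the final deconvolution, and uses a hypothetical 100x100 input.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates for one stride-1 convolution over a height x width map."""
    return height * width * kernel * kernel * c_in * c_out


def stack_macs(height, width, layers):
    """Total multiply-accumulates for a stack of (kernel, c_in, c_out) layers."""
    return sum(conv_macs(height, width, k, ci, co) for k, ci, co in layers)


# (kernel, in_channels, out_channels): SRCNN's shapes, used only as a stand-in.
LAYERS = [(9, 3, 64), (1, 64, 32), (5, 32, 3)]

scale = 3
lr_h, lr_w = 100, 100  # hypothetical low-resolution input size

hr_cost = stack_macs(lr_h * scale, lr_w * scale, LAYERS)  # convolve after pre-upsampling
lr_cost = stack_macs(lr_h, lr_w, LAYERS)                  # convolve first, upsample last
print(hr_cost // lr_cost)  # 9, i.e. scale ** 2
```

The same layers cost scale squared times more when they run on a pre-upsampled image, which is the gap FSRCNN's design closes.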
ESPCN. Replaces deconvolution with a learned sub-pixel convolution (pixel shuffle) layer at the end. Best speed of the bunch, and the architecture I picked for the video demo because it can keep up with real-time inference on short clips.
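The pixel-shuffle step itself has no learned weights; it only rearranges r*r low-resolution channel maps into one high-resolution map. Here is a pure-Python sketch of that rearrangement for a single output channel. Frameworks ship it as a built-in op (for example `tf.nn.depth_to_space`); the function below is an illustration, not what OpenCV runs internally.

```python
def pixel_shuffle(channels, r):
    """Interleave r*r low-resolution maps (each H x W) into one (H*r) x (W*r) map.

    Output pixel (y*r + dy, x*r + dx) is read from channel dy*r + dx at (y, x).
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r channel maps")
    height, width = len(channels[0]), len(channels[0][0])
    out = [[0] * (width * r) for _ in range(height * r)]
    for dy in range(r):
        for dx in range(r):
            plane = channels[dy * r + dx]
            for y in range(height):
                for x in range(width):
                    out[y * r + dy][x * r + dx] = plane[y][x]
    return out


# Four 1x2 maps become one 2x4 image at r=2.
maps = [[[0, 4]], [[1, 5]], [[2, 6]], [[3, 7]]]
print(pixel_shuffle(maps, 2))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

Because the last convolution still runs at low resolution and the shuffle is a plain memory rearrangement, the upsampling adds almost no compute.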
EDSR. Wins on quality. Strips batch normalization out of the standard ResNet block (the EDSR authors report that normalizing features removes range flexibility the network needs and costs extra memory), then stacks many residual blocks. Heavy (the .pb file alone is 36 MB) but the visual improvement over bicubic at 4x is unmistakable.
LapSRN. Built for large scale factors. Reconstructs the high-resolution image progressively through a Laplacian pyramid (2x, then 4x, then 8x) instead of jumping straight to 8x, which keeps intermediate features meaningful. The 8x output in this repo is from LapSRN.
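To make the progressive schedule concrete, this small sketch lists the intermediate output sizes a 2x-per-level pyramid passes through. At each level LapSRN adds a predicted residual to a 2x-upsampled image; only the size bookkeeping is shown here, and the function name and the 32x48 input are hypothetical.

```python
import math


def pyramid_levels(height, width, scale):
    """Intermediate output sizes when upscaling by 2x per level up to `scale`."""
    levels = int(math.log2(scale))
    if 2 ** levels != scale:
        raise ValueError("scale must be a power of two")
    return [(height * 2 ** i, width * 2 ** i) for i in range(1, levels + 1)]


print(pyramid_levels(32, 48, 8))  # [(64, 96), (128, 192), (256, 384)]
```

An 8x result therefore comes out of three 2x stages, and the 2x and 4x sizes exist as intermediate outputs along the way.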
The training loss curve for SRCNN (Project Files/Training Loss.png) shows the expected smooth MSE decay over 50 epochs.
- Python, TensorFlow / Keras (SRCNN training)
- OpenCV dnn_superres (FSRCNN, ESPCN, EDSR, LapSRN inference)
- HDF5 via h5py for patch-pair storage during training
- NumPy, PIL, Matplotlib, imutils for I/O and visualization
This was framed as a qualitative comparative study, not a benchmark, so I did not compute PSNR/SSIM tables here. The deliverables are the per-architecture side-by-side PNGs in Project Files/Images/<NETWORK>/ (each folder has the bicubic baseline alongside the SR output for that network), the SRCNN training loss plot, and the ESPCN 4x video demo.
The qualitative finding I cared about: at 4x, EDSR's residual depth produces visibly cleaner edges than ESPCN's sub-pixel convolution on the same input, but ESPCN runs fast enough for real-time video while EDSR does not. At 8x, LapSRN's progressive pyramid avoids the smearing that a one-shot upsample would produce. All four deep models beat bicubic interpolation on edges and high-frequency texture.
.
├── Project Files/
│ ├── Notebooks/
│ │ ├── Image_Super_Resolution_SRCNN.ipynb # train SRCNN from scratch
│ │ └── Image_Super_Resolution_OpenCV.ipynb # run pretrained SR models
│ ├── Networks/ # pretrained .pb files
│ │ ├── EDSR_x4.pb
│ │ ├── ESPCN_x4.pb
│ │ ├── FSRCNN_x3.pb
│ │ └── LapSRN_x8.pb
│ ├── Images/ # per-network result PNGs
│ │ └── SRCNN/ ESPCN/ FSRCNN/ EDSR/ LapSRN/
│ ├── butterfly.png # canonical test image
│ ├── Training Loss.png # SRCNN MSE loss curve
│ └── Final Project Report.pdf
├── Video Results/
│ └── Original.mp4, Bicubic.mp4, SR_ESPCN_4x.mp4 # frame-by-frame ESPCN demo
├── Image Super Resolution Project Presentation.mp4
├── Image Super Resolution Project Report.pdf
└── Different Approaches towards Image Super Resolution.pptx
The original scope included a GAN-based approach (SRGAN / ESRGAN) for iterative perceptual refinement, which I cut for lack of time. That is the natural next step: a perceptual-loss model whose outputs look sharper to a human even when the MSE metric prefers the EDSR-style output.
MIT. See LICENSE.