Computer Vision (CS 763) - Spring 2018

Course Information

  • Instructor: Arjun Jain
  • Office: 216, CSE New Building
  • Email: ajain@cse DOT iitb DOT ac DOT in
  • Teaching Assistants: Rishabh Dabral, Safeer Afaque
  • Instructor Office Hours (in room 216 CSE New Building): Arjun is on campus only on Thursdays and Fridays. Meet him after class or schedule an appointment by email.

Please note that CS 663 is a hard prerequisite for this course.

Topics to be covered (tentative)

  • Camera geometry, camera calibration, vanishing points, important transformations, homographies
  • Image registration: RANSAC for point-matching, SIFT overview
  • Deep Learning in computer vision: the data-driven paradigm, feed-forward networks, back-propagation and the chain rule; CNNs and their building blocks, generative adversarial networks (GANs)
  • Deep Learning applications including face detection, CNN compression, siamese and triplet networks and applications to face recognition
  • Algorithms for: shape from shading, optical flow, Kanade-Lucas-Tomasi algorithm, applications of optical flow
  • Photometric stereo - deriving shape from multiple images of an object taken under different lighting conditions; applications to illumination invariant face recognition, face relighting
  • Stereo (geometric binocular): epipolar geometry and fundamental matrix, the correspondence problem and shape from stereo; structure from motion

Learning materials and textbooks

Grading Policy

  • Mid-sem exam: 20%
  • Final exam (cumulative): 20%
  • Assignments (five or six): 35% (all to be done in groups of 2-3 students)
  • Course project: 20% (to be done in the same group of 2-3 students)
  • Class participation: 5%
  • Course project work will be presented by the student group during a viva at the end of the course. During this viva, each student in the group will be questioned separately, not only on the project work but also on the assignments. Each student is expected to contribute to every assignment and to the course project.
  • Audit requirements: You must write both exams, submit all assignments and the project, and score at least 40% to get an AU.

Other Policies

  • Assignments will be given out (typically) once every two or three weeks. They must be submitted on or before the deadline. No late assignments will be accepted. The programming components of the assignments will typically involve MATLAB and Lua, so you must be willing to learn them quickly.
  • We will adopt a zero-tolerance policy against plagiarism and any other form of cheating. Just don't do it! In cases of plagiarism, givers and takers will both be considered equally responsible.
  • This course is (inherently) cumulative. The syllabus for the final exam will include everything taught during the semester.

Course Projects

[02/02/2018] Course projects have now been finalized.

Go to this link for the finalized list.


  • [12-Jan-18] Assignment 1 has been released. The due date for submission is Friday, January 26, 2018.
  • [27-Jan-18] Assignment 2 has been released. The due date for submission is Sunday, February 4, 2018.
  • [09-Feb-18] Assignment 3 has been released. The due date for submission is Wednesday, February 21, 2018. Corresponding Kaggle competition link
  • [06-Mar-18] Assignment 4 has been released. The due date for submission is Monday, March 19, 2018. Corresponding Kaggle competition link
  • [24-Mar-18] Assignment 5 on Tracking has been released. Due date: April 2, 2018. Download the necessary files from here
  • [11-Apr-18] Assignment 6 on Multiview Geometry has been released. Due date: April 19, 2018.

Lecture Schedule:

Date Topics Slides iTorch Notebooks Extra Reading
4th Jan. 2018
  • Introduction to computer vision, applications and course overview
    Slides -- --
    5th Jan. 2018 Camera Geometry
    • Homogeneous coordinates and projective geometry
    • Vanishing points, ideal line, point-line duality in P2
    • Important 2D and 3D transformations using homogeneous coordinates
    • Introduction to the pin-hole camera model
    Slides -- Homogeneous Representations of Points, Lines and Planes
    12th Jan. 2018
    • Modeling the pinhole camera analytically, intrinsic and extrinsic parameters
    • World, camera, image plane and sensor plane coordinate systems and transformations between them
    • Linear and non-linear (lens distortion) errors
    • Homography, planar world and pure rotation of the camera
    Slides -- --
    13th Jan. 2018
    • Iterative solutions for dealing with non-linear (lens distortion) errors
    • Normalized, ideal, Euclidean, affine and general camera models
    • Orthographic and weak-perspective camera models
    • Cross ratios and their applications
    • Camera calibration using DLT (known 3D control points)
    Slides -- Resource on SVD, how/why it can be used to solve equation systems of type Ax=0, |x|=1
    18th Jan. 2018
    • Zhang's camera calibration method, mention of a few DL based calibration methods
    Image Alignment
    • Image alignment: problem statement, physically and digitally corresponding points
    • Motion models and degrees of freedom; non-rigid/deformable/non-parametric image alignment
    • Control point based image alignment using least squares - derivation for pseudo-inverse
    • Introduction to the SIFT algorithm
    • Forward and reverse image warping - bilinear and nearest-neighbor interpolation
    • Mention of DL based image patch descriptors
    -- --
    19th Jan. 2018
    • Image alignment using image similarity measures: mean squared error, normalized cross-correlation
    • Concept of field of view in image alignment using image similarity measures
    • Monomodal and multimodal image alignment
    • Concept of joint histograms and behaviour of joint histograms in multi-modal image alignment
    • Concept of entropy and joint entropy, algorithm for multimodal registration by minimizing joint entropy
    • Aspects of image registration: 2D/3D, motion model, monomodal or multimodal
    • Application scenarios for image alignment: template matching, video stabilization, panorama generation, face recognition, 3D to 2D alignment
    Robust Methods in Computer Vision
    • Least squares problems and their relation to the Gaussian distribution on the noise
    • Examples of outliers in computer vision
    • Explanation of why the Gaussian distribution is unsuited to handling outliers
    • Introduction to the Laplacian distribution
    • The importance of heavy-tailed distributions in robust statistics
    • RANSAC (random sample consensus) algorithm
    -- --
    25th Jan. 2018 Recognizing images, objects, scenes (Prof. Suyash P. Awate)
    • Texture modeling and classification
    • Image classification, challenges
    • Bag of words model, dictionary learning
    • Defining image similarity, pyramid match kernel (PMK)
    -- --
    1st Feb. 2018 Recognizing images, objects, scenes (Prof. Suyash P. Awate)
    • Pyramid match kernel (PMK)
    • Kernel coding, local coding, vector quantization, sparse coding, LcLC
    -- --
    2nd Feb. 2018 Robust Methods in Computer Vision
    • RANSAC: time complexity and expected number of iterations
    • Using RANSAC for Homography estimation
    • Introduction to the Laplacian distribution
    • Mean versus median: L2 fit versus L1 fit
    • LMedS: Least Median of Squares
    Deep Learning for Computer Vision
    • History, introduction
    • Data driven paradigm
    • k-NN on CIFAR-10
    • Hyperparameters, choice of loss function, cross-validation
    KNN Matrix calculus reminder
    8th Feb. 2018
    • Softmax classifier, cross-entropy loss function, regularization
    • Optimization: vanilla gradient descent, stochastic gradient descent
    • Vanilla momentum, Nesterov momentum, AdaGrad, RMSProp, ADAM
    • Second-order optimization methods and their issues with deep learning
    • Good learning rate, learning rate decay
    Gradient Check ADAM, Nesterov
    DL optimization algorithms overview
    9th Feb. 2018
    • Feed forward, back-propagation
    • Fully connected layer
    • Activation functions: sigmoid, tanh, ReLU, LeakyReLU, ELU, etc.
    Linear Layer, ReLU --
    15th Feb. 2018
    • Convolutions: transposed, dilated, fully-connected as convolution, sliding window as convolution
    • Max-pooling, Dropout
    MaxPool, Convolution, Transposed convolution, Dropout Convolution arithmetic for deep learning
    16th Feb. 2018
    • SoftMax, Cross Entropy
    • Data Augmentation, hyperparameter selection
    • Weight initialization
    Cross Entropy, Weight Initialization --
    22nd Feb. 2018
    • ConvNet applications
    • ConvNet case studies
    -- --
    23rd Feb. 2018
    • RNNs, LSTMs
    • Visualizing and understanding ConvNets
    -- --
    8th March 2018
    • Visualizing and understanding ConvNets
    • Images that maximize ConvNet class scores, reconstructing images from ConvNet codes
    • Deep Dream, Neural Art, Adversarial Examples
    • Dimensionality reduction: siamese and triplet networks
    -- --
    9th March 2018
    • Other vision tasks: semantic segmentation, object localization, object detection, instance segmentation
    • R-CNN, Mask R-CNN
    • Autoencoders
    • Generative modeling: VAEs, GANs
    • Case studies: pix2pix, CycleGAN, UNIT
    MNIST Vanilla GAN --
    15th March 2018
    • Deep Reinforcement Learning (Prof. Shivaram Kalyanakrishnan)
    Slides -- --
    16th March 2018 Structure from Motion
    • Motion as a cue to inference of 3D structure from images
    • Motion factorization algorithm by Tomasi and Kanade for inference of (sparse) 3D structure of a fixed object being observed by a moving orthographic camera (or a rigidly moving object, being observed by a fixed orthographic camera)
    • Aspects of the above algorithm: Rank theorem, metric constraints for inference of motion parameters and 3D structure
    -- --
    22nd March 2018 Kanade-Lucas-Tomasi Feature Tracking (KLT)
    • Tracking feature-points from a template by estimating motion parameters.
    • Finding good features to track.
    -- Lucas-Kanade 20 Years On: A Unifying Framework
    23rd March 2018 Geometric Stereo
    • Orientation parameters for the camera pair and relative orientation.
    • Coplanarity constraint for corresponding points
    • Derivation and key properties of the Fundamental matrix
    -- --
    5th April 2018
    • Introduction to epipolar geometry
    • Essential matrix
    • Popular parameterizations for the relative orientation
    • Generating the normalized stereo case from arbitrary views
    -- --
    6th April 2018
    • Direct Solutions for Computing Fundamental and Essential Matrix
    • 8-point algorithm
    • Triangulation
    -- --
    12th April 2018
    • Absolute Orientation
    • Multi-View Geometry and Bundle Adjustment
    -- --
    19th April 2018
    • Shape from Shading: Introduction
    • Reflectance Models
    -- --
    20th April 2018
    • Photometric Stereo
    -- --
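
The 13th Jan. entry in the schedule points to a resource on using SVD to solve systems of type Ax=0, |x|=1, which is the step DLT-based calibration relies on. As a rough illustration only (not taken from the course material), the sketch below assumes NumPy; the helper name `solve_homogeneous` and the example matrix are hypothetical.

```python
import numpy as np


def solve_homogeneous(A):
    """Return a unit vector x that minimizes ||A x||.

    Hypothetical helper for illustration: the minimizer is the right
    singular vector of A that belongs to the smallest singular value.
    """
    # np.linalg.svd returns singular values in descending order,
    # so the last row of vt corresponds to the smallest one.
    _, _, vt = np.linalg.svd(A)
    return vt[-1]


# Made-up rank-2 matrix whose null space is spanned by (1, 1, 1).
A = np.array([
    [1.0, -1.0, 0.0],
    [0.0, 1.0, -1.0],
    [1.0, 0.0, -1.0],
])

x = solve_homogeneous(A)
print("x =", x)
print("residual ||A x|| =", np.linalg.norm(A @ x))
```

The solution is only defined up to sign, so `x` may come out as either (1, 1, 1)/sqrt(3) or its negative. With noisy data A usually has full rank, and the same vector is then the least-squares solution under the unit-norm constraint.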

