# Computer Vision, Lab 2: Homographies

In computer vision, the most important geometric transformations are
2D planar homographies, 3D Euclidean homographies, and 3D similarity homographies.

Although you will be limited to a single scene plane,
a 2D planar homography is the simplest way to get 3D information from a 2D image.

Today we'll boost our OpenCV programming skills and learn how to do the math we
learned in class using code.

## GUI programming in OpenCV

To display the results of your calculations, you'll need to be able to visualize
images from a camera or video and draw on them. In OpenCV, these capabilities use the
"HighGUI" and "ImgProc" libraries. For showing images, we use the following important
functions:

 - <code>imshow()</code>: Show an image in a window. If you use the same name as an existing window, the image will replace the old image in the same window.
 - <code>waitKey()</code>: Wait for the user to press a key, either indefinitely (delay of 0) or until a timer expires. HighGUI fetches and handles
   window events, including the redraw triggered by <code>imshow()</code>, inside <code>waitKey()</code>, so an image may not appear on screen until
   your main thread calls it. Call <code>waitKey()</code> periodically even when you don't need user input, for example once per frame in a capture
   loop. A delay of 1 ms is plenty of time for the display to update.
 - <code>destroyWindow()</code>: Destroy a target window.
 - <code>destroyAllWindows()</code>: Destroy all windows under the program's control.
 
Let's start with a simple version of our solution.

### Tip on loading files before starting

Depending on how your program is started when testing it, it will always have a specific working directory.

If you want to open files by filename only without a full path, you'll need to put them in the program's working directory
or change the working directory to point to where the files are.

**Windows Visual Studio C++**: Put resources such as images and videos in the same directory as your <code>.cpp</code> source code.

For example, suppose we create a project named <code>Samplelab2</code> under <code>C:\Users\alisa\source\repos</code>.
The <code>.cpp</code> file containing the main function is in
<code>C:\Users\alisa\source\repos\Samplelab2\Samplelab2</code>, so we should put the image file <code>lena.png</code> as below:

<img src="img/lab02-1.PNG" width="600"/>

By the way, we shouldn't use the image <tt>lena.png</tt>, even though it is convenient and ships with
the OpenCV source code. [Read some of the context in Wikipedia](https://en.wikipedia.org/wiki/Lenna).
The image comes from a pornographic magazine, *Playboy*, from the 1970s,
and its continued use in the image processing community is given as
an example of sexism in the sciences, reinforcing gender stereotypes.

So while we're being nostalgic, [here's a better image from the 1970s](img/sample.jpg).
Anyway, once you put it in the right place, your program can refer to the file without a full path:

    Mat srcImage = imread("sample.jpg");

However, if you run the executable from another directory, you'll have to put the resource in the directory you're running from.

**Python**: Use the same idea as above.

**Linux**: Find out how your IDE sets the working directory when you run, or put the resource in the build directory, or use a relative
path to run the executable from the directory where the resource is located.
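
If you are not sure which directory your program is running from, it can help to print it. Here is a minimal sketch that does that and builds an absolute path from a base directory. The helper name `resource_path` is made up for this example and is not part of OpenCV.

```python
import os

def resource_path(filename, base_dir=None):
    """Return an absolute path for `filename` inside `base_dir`.

    If `base_dir` is None, the current working directory is used, which is
    the directory a bare filename is resolved against.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.abspath(os.path.join(base_dir, filename))

if __name__ == '__main__':
    # Print where a bare filename such as 'sample.jpg' would be looked up
    print('Working directory:', os.getcwd())
    print('sample.jpg resolves to:', resource_path('sample.jpg'))
```

If the printed path is not where you put the file, either move the file or pass the right `base_dir`.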

### Show an image in C++

Here's some code to show an image. Get [<tt>sample.jpg</tt> from here](img/sample.jpg).

    #include <iostream>
    #include <opencv2/opencv.hpp> // This includes all of OpenCV. You could use just opencv2/highgui.hpp.

    using namespace cv;           // Without this you would have to prefix every OpenCV call with cv::
    using namespace std;          // Without this you would have to prefix every C++ standard library call with std::

    int main(int argc, char* argv[])
    {
        int iKey = -1;
        string sFilename = "sample.jpg";
        Mat matImage = imread(sFilename);
        if (matImage.empty())
        {
            cout << "No image to show" << endl;
            return 1;
        }
        imshow("Input image", matImage);
        // Wait up to 5s for a keypress
        iKey = waitKey(5000);
        cout << "Key output value: " << iKey << endl;    
        return 0;
    }

### Show an image in Python

Things are a bit simpler in Python:

    import cv2

    if __name__ == '__main__':
        path = 'sample.jpg'
        img = cv2.imread(path)
        if img is None:
            print('No image to show')
        else:
            cv2.imshow('Input image', img)
            # Wait up to 5s for a keypress
            cv2.waitKey(5000)

### Show a video in C++

Here's how to show a video in C++:

    #include <opencv2/opencv.hpp>
    #include <iostream>

    using namespace cv;
    using namespace std;

    // In C++, you can define constants using #define
    #define VIDEO_FILE "robot.mp4"
    #define ROTATE false

    int main(int argc, char** argv)
    {
        Mat matFrameCapture;
        Mat matFrameDisplay;
        int iKey = -1;

        // Open input video file
        VideoCapture videoCapture(VIDEO_FILE);
        if (!videoCapture.isOpened()) {
            cerr << "ERROR! Unable to open input video file " << VIDEO_FILE << endl;
            return -1;
        }

        // Capture loop
        while (iKey != int(' '))        // play video until user presses <space>
        {
            // Get the next frame
            videoCapture.read(matFrameCapture);
            if (matFrameCapture.empty())
            {
                // End of video file
                break;
            }

            // We can rotate the image easily if needed.
    #if ROTATE
            rotate(matFrameCapture, matFrameDisplay, RotateFlags::ROTATE_180);   //rotate 180 degree and put the image to matFrameDisplay
    #else
            matFrameDisplay = matFrameCapture;
    #endif

            float ratio = 480.0 / matFrameDisplay.rows;
            resize(matFrameDisplay, matFrameDisplay, cv::Size(), ratio, ratio, INTER_LINEAR); // resize image to 480p for showing

            // Display
            imshow(VIDEO_FILE, matFrameDisplay); // Show the image in window named "robot.mp4"
            iKey = waitKey(30); // Wait 30 ms to give a realistic playback speed
        }
        return 0;
    }

### Show a video in Python

Now let's do the same in Python:

    import cv2
    import numpy as np
    import sys

    VIDEO_FILE = 'robot.mp4'
    ROTATE = False

    if __name__ == '__main__':
    
        key = -1

        # Open input video file
        videoCapture = cv2.VideoCapture(VIDEO_FILE)
        if not videoCapture.isOpened():
            print('Error: Unable to open input video file', VIDEO_FILE)
            sys.exit('Unable to open input video file')

        width  = videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)   # float `width`
        height = videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float `height`

        # Capture loop 
        while (key != ord(' ')):        # play video until user presses <space>
            # Get the next frame
            _, matFrameCapture = videoCapture.read()
            if matFrameCapture is None:
                # End of video
                break

            # Rotate if needed
            if ROTATE:
                # cv2.rotate() returns only the rotated image
                matFrameDisplay = cv2.rotate(matFrameCapture, cv2.ROTATE_180)
            else:
                matFrameDisplay = matFrameCapture

            ratio = 480.0 / height
            dim = (int(width * ratio), int(height * ratio))
            # resize image to 480p for display
            matFrameDisplay = cv2.resize(matFrameDisplay, dim)

            # Show the image in window named "robot.mp4"
            cv2.imshow(VIDEO_FILE, matFrameDisplay)
            key = cv2.waitKey(30)


### In-Lab Exercises

1. Use <code>waitKey()</code> to wait for the user to press <tt>&lt;space&gt;</tt> to advance to the next frame or 'q' to quit. Check the [documentation for waitKey()](https://docs.opencv.org/4.3.0/d7/dfc/group__highgui.html#ga5628525ad33f52eab17feebcfba38bd7), change the delay parameter to 0 for an infinite wait, and perform the necessary action on a 'spacebar' or 'q' key.

2. Display the full 1080p or 720p frame from the video without making the display window too big for your desktop. Take a look at the [documentation for namedWindow()](https://docs.opencv.org/4.3.0/d7/dfc/group__highgui.html#ga5afdf8410934fd099df85c75b2e0888b) and figure out which flags you should use to set up your display window to be resizable but keep the aspect ratio and display the expanded GUI.

3. Next we probably want to give the user some useful information. Check out the [documentation for displayOverlay()](https://docs.opencv.org/4.3.0/dc/d46/group__highgui__qt.html#ga704e0387318cd1e7928e6fe17e81d6aa) and add some explanatory information for the user about frame number, total frames, and user control actions.

4. Last little detail: currently, if the user closes the window, the program doesn't exit. Modify your program to exit when the image display window is closed. Hint: try <code>cv::getWindowProperty(VIDEO_FILE, cv::WND_PROP_VISIBLE)</code>.
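
As a starting point for the first exercise, here is one possible way to turn the integer returned by `waitKey()` into an action. It is only a sketch: `action_for_key` and the action names are invented for this example. It assumes that `waitKey()` returns -1 when no key was pressed and that masking with `0xFF` is enough to recover the ASCII code, which is a common convention but may vary by platform.

```python
def action_for_key(raw_key):
    """Map the integer returned by cv2.waitKey() to an action name."""
    if raw_key < 0:
        # No key was pressed before the timeout expired
        return 'none'
    key = raw_key & 0xFF  # keep only the low byte (assumed to hold the ASCII code)
    if key == ord(' '):
        return 'next'
    if key == ord('q'):
        return 'quit'
    return 'ignore'
```

In the capture loop you could call `action_for_key(cv2.waitKey(0))` and read the next frame only when the result is `'next'`.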

## Getting four points for a homography

Now, we'd like to allow the user to select four points comprising a square in the real world then compute a rectifying homography for that square, thus rectifying the entire ground plane from the robot's point of view.

Check out the [documentation for setMouseCallback()](https://docs.opencv.org/4.3.0/d7/dfc/group__highgui.html#ga89e7806b0a616f6f1d502bd8c183ad3e). Experiment with it until you can get four mouse clicks without interfering with the other GUI functions such as pan/tilt/zoom. Check that you are getting the image coordinates rather than the window coordinates of the mouse clicks.
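
One case to watch for: if the window shows a resized copy of the frame, a click is reported in the resized image's pixel grid, not the original's. The sketch below converts between the two, assuming a uniform scale factor like the `ratio` used in the video examples. The function name `display_to_source` is hypothetical.

```python
def display_to_source(x, y, ratio):
    """Convert a click on a resized display image to source-image coordinates.

    `ratio` is the display scale factor (display size / source size).
    """
    if ratio <= 0:
        raise ValueError('ratio must be positive')
    return (x / ratio, y / ratio)
```

For a frame displayed at half size (`ratio = 0.5`), a click at (100, 50) corresponds to (200, 100) in the original frame.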

The steps for getting the four points for the homography are:
 1. Run the video capture loop.
 2. Press any key to pause playback and open a new window showing the paused frame.
 3. Click four points in the paused frame with the mouse.
 
Try the sample code below.

### C++ variables

Create <tt>Mat</tt> variables to store images and a vector for storing points, then declare a <tt>mouseHandler()</tt> function. To keep
things simple, use global variables:

    Mat matPauseScreen, matResult, matFinal;
    Point point;
    vector<Point> pts;
    int var = 0;
    int drag = 0;

    // Create mouse handler function
    void mouseHandler(int, int, int, int, void*);


### Python variables

    matResult = None
    matFinal = None
    matPauseScreen = None

    point = (-1, -1)
    pts = []
    var = 0 
    drag = 0


### C++ mouse handler

    // An OpenCV mouse handler function has 5 parameters:
    // the event type, the x and y coordinates, the event flags, and a user data pointer

    void mouseHandler(int event, int x, int y, int, void*)
    {
        if (var >= 4) // If we already have 4 points, do nothing
            return;
        if (event == EVENT_LBUTTONDOWN) // Left button pressed
        {
            drag = 1; // The mouse button is now held down
            matResult = matFinal.clone(); // Start drawing from a copy of the final image
            point = Point(x, y); // Remember the current mouse position
            if (var >= 1) // If at least one point has been stored, draw a line from the last one
            {
                line(matResult, pts[var - 1], point, Scalar(0, 255, 0, 255), 2); // Draw a green line with thickness 2
            }
            circle(matResult, point, 2, Scalar(0, 255, 0), -1, 8, 0); // Draw the current point in green
            imshow("Source", matResult); // Show the current drawing
        }
        if (event == EVENT_LBUTTONUP && drag) // Left button released
        {
            drag = 0; // No longer dragging
            pts.push_back(point);  // Add the current point to pts
            var++; // Increase the point count
            matFinal = matResult.clone(); // Copy the current drawing to the final image
            if (var >= 4) // If all four homography points have been selected
            {
                line(matFinal, pts[0], pts[3], Scalar(0, 255, 0, 255), 2); // Draw the closing line
                fillPoly(matFinal, pts, Scalar(0, 120, 0, 20), 8, 0); // Fill the polygon defined by the points

                setMouseCallback("Source", NULL, NULL); // Remove the mouse event handler
            }
            imshow("Source", matFinal);
        }
        if (drag) // While the button is held down
        {
            matResult = matFinal.clone(); // Start drawing from a copy of the final image
            point = Point(x, y); // Remember the current mouse position
            if (var >= 1) // If at least one point has been stored, draw a line from the last one
            {
                line(matResult, pts[var - 1], point, Scalar(0, 255, 0, 255), 2); // Draw a green line with thickness 2
            }
            circle(matResult, point, 2, Scalar(0, 255, 0), -1, 8, 0); // Draw the current point in green
            imshow("Source", matResult); // Show the current drawing
        }
    }

### Python mouse handler

    def mouseHandler(event, x, y, flags, param):
        global point, pts, var, drag, matFinal, matResult   # these globals are modified inside this function

        if (var >= 4):                           # if we already have 4 points, do nothing
            return
        if (event == cv2.EVENT_LBUTTONDOWN):     # left button pressed
            drag = 1                             # the mouse button is now held down
            matResult = matFinal.copy()          # start drawing from a copy of the final image
            point = (x, y)                       # remember the current mouse position
            if (var >= 1):                       # if at least one point has been stored, draw a line from the last one
                cv2.line(matResult, pts[var - 1], point, (0, 255, 0, 255), 2)    # draw a green line with thickness 2
            cv2.circle(matResult, point, 2, (0, 255, 0), -1, 8, 0)             # draw the current point in green
            cv2.imshow("Source", matResult)      # show the current drawing
        if (event == cv2.EVENT_LBUTTONUP and drag):  # left button released
            drag = 0                             # no longer dragging
            pts.append(point)                    # add the current point to pts
            var += 1                             # increase the point count
            matFinal = matResult.copy()          # copy the current drawing to the final image
            if (var >= 4):                                                      # if all four homography points have been selected
                cv2.line(matFinal, pts[0], pts[3], (0, 255, 0, 255), 2)   # draw the closing line
                cv2.fillConvexPoly(matFinal, np.array(pts, 'int32'), (0, 120, 0, 20))        # fill the polygon defined by the points
            cv2.imshow("Source", matFinal)
        if (drag):                                    # while the button is held down
            matResult = matFinal.copy()               # start drawing from a copy of the final image
            point = (x, y)                            # remember the current mouse position
            if (var >= 1):                            # if at least one point has been stored, draw a line from the last one
                cv2.line(matResult, pts[var - 1], point, (0, 255, 0, 255), 2)    # draw a green line with thickness 2
            cv2.circle(matResult, point, 2, (0, 255, 0), -1, 8, 0)         # draw the current point in green
            cv2.imshow("Source", matResult)           # show the current drawing

### C++ <tt>main()</tt>

    int main(int argc, char** argv)
    {
        Mat matFrameCapture;
        Mat matFrameDisplay;
        int key = -1;

        // --------------------- [STEP 1: Make video capture from file] ---------------------
        // Open input video file
        VideoCapture videoCapture(VIDEO_FILE);
        if (!videoCapture.isOpened()) {
            cerr << "ERROR! Unable to open input video file " << VIDEO_FILE << endl;
            return -1;
        }

        // Capture loop
        while (key < 0)        // play video until press any key
        {
            // Get the next frame
            videoCapture.read(matFrameCapture);
            if (matFrameCapture.empty()) {   // no more frame capture from the video
                // End of video file
                break;
            }
            cvtColor(matFrameCapture, matFrameCapture, COLOR_BGR2BGRA);

            // Rotate if needed; some videos are recorded upside down
    #if ROTATE
            rotate(matFrameCapture, matFrameCapture, RotateFlags::ROTATE_180);   // rotate 180 degrees in place
    #endif

            float ratio = 640.0 / matFrameCapture.cols;
            resize(matFrameCapture, matFrameDisplay, cv::Size(), ratio, ratio, INTER_LINEAR);

            // Display
            imshow(VIDEO_FILE, matFrameDisplay); // Show the image in window named "robot.mp4"
            key = waitKey(30);

            // --------------------- [STEP 2: pause the screen and show an image] ---------------------
            if (key >= 0)
            {
                matPauseScreen = matFrameCapture;  // transfer the current image to process
                matFinal = matPauseScreen.clone(); // clone image to final image
            }
        }

        // --------------------- [STEP 3: use mouse handler to select 4 points] ---------------------
        if (!matFrameCapture.empty())
        {
            var = 0;   // reset number of saving points
            pts.clear(); // reset all points
            namedWindow("Source", WINDOW_AUTOSIZE);  // create a window named "Source"
            setMouseCallback("Source", mouseHandler, NULL); // attach the mouse event handler "mouseHandler" to the window "Source"
            imshow("Source", matPauseScreen); // Show the image
            waitKey(0); // wait until any key is pressed
            destroyWindow("Source"); // destroy the window
        }
        else
        {
            cout << "The video ended before you paused it. Exiting." << endl;
            return 0;
        }

        return 0;
    }

### Python <tt>\_\_main\_\_</tt>:

    if __name__ == '__main__':
        # No global statement is needed here: this code already runs at module level
        key = -1

        # --------------------- [STEP 1: Make video capture from file] ---------------------
        # Open input video file
        videoCapture = cv2.VideoCapture(VIDEO_FILE)
        if not videoCapture.isOpened():
            print("ERROR! Unable to open input video file ", VIDEO_FILE)
            sys.exit('Unable to open input video file')

        width  = videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)   # float `width`
        height = videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float `height`

        # Capture loop 
        while (key < 0):        # play video until press any key
            # Get the next frame
            _, matFrameCapture = videoCapture.read()
            if matFrameCapture is None:   # no more frame capture from the video
                # End of video file
                break

            # Rotate if needed; some videos are recorded upside down
            if ROTATE:
                matFrameCapture = cv2.rotate(matFrameCapture, cv2.ROTATE_180)   # rotate 180 degrees in place, as in the C++ version
            matFrameDisplay = matFrameCapture

            ratio = 640.0 / width
            dim = (int(width * ratio), int(height * ratio))
            # resize the image to a width of 640 pixels for display
            matFrameDisplay = cv2.resize(matFrameDisplay, dim)

            # Show the image in window named "robot.mp4"
            cv2.imshow(VIDEO_FILE, matFrameDisplay)
            key = cv2.waitKey(30)

            # --------------------- [STEP 2: pause the screen and show an image] ---------------------
            if (key >= 0):
                matPauseScreen = matFrameCapture     # transfer the current image to process
                matFinal = matPauseScreen.copy()     # copy image to final image

        # --------------------- [STEP 3: use mouse handler to select 4 points] ---------------------
        if (matFrameCapture is not None):
            var = 0                                             # reset number of saving points
            pts.clear()                                         # reset all points
            cv2.namedWindow("Source", cv2.WINDOW_AUTOSIZE)      # create a window named "Source"
            cv2.setMouseCallback("Source", mouseHandler)        # attach the mouse event handler "mouseHandler" to the window "Source"
            cv2.imshow("Source", matPauseScreen)                # Show the image
            cv2.waitKey(0)                                      # wait until any key is pressed
            cv2.destroyWindow("Source")                         # destroy the window
        else:
            print("The video ended before you paused it. Exiting.")


### Sample result

<img src="img/lab02-2.PNG" width="800"/>

### In-Lab Exercise

Global variables are not good practice in C++ (or any programming language, as far as I know).

Figure out how to allocate the state information used by your mouse handler on the heap, and request
HighGUI to pass a pointer to the state information to your callback. Instead of

    setMouseCallback("Source", mouseHandler, NULL);

use

    setMouseCallback("Source", mouseHandler, pState);
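
The same idea carries over to Python, where the third argument of `cv2.setMouseCallback()` is handed back to the callback as its last parameter. The sketch below is one possible arrangement, not the required solution: `ClickState` and `mouse_handler` are hypothetical names, the drawing code is left out, and the two event constants are hard-coded copies of the OpenCV values so that the logic can be tried without a window.

```python
# These values mirror cv2.EVENT_LBUTTONDOWN and cv2.EVENT_LBUTTONUP; they are
# hard-coded only so the sketch runs without OpenCV. Use the cv2 constants
# in a real program.
EVENT_LBUTTONDOWN = 1
EVENT_LBUTTONUP = 4

class ClickState:
    """Hypothetical container for everything the mouse handler needs."""
    def __init__(self, max_points=4):
        self.max_points = max_points
        self.pts = []            # points confirmed so far
        self.drag = False        # True while the left button is held down
        self.point = (-1, -1)    # most recent position while dragging

def mouse_handler(event, x, y, flags, state):
    """Record one point per press/release pair until max_points is reached."""
    if len(state.pts) >= state.max_points:
        return
    if event == EVENT_LBUTTONDOWN:
        state.drag = True
    if state.drag:
        state.point = (x, y)     # follow the mouse while the button is down
    if event == EVENT_LBUTTONUP and state.drag:
        state.drag = False
        state.pts.append(state.point)
```

With OpenCV available, you would register it as `cv2.setMouseCallback("Source", mouse_handler, state)` after creating `state = ClickState()`. In C++ the equivalent is to cast the `void*` argument back to a pointer to your state struct inside the handler.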

### Calculate Homography

Now, given the four points that you've collected from the user, calculate a homography to a rectified square with a desired number of pixels per meter, e.g., 1000. The tiles in the video from last week are 60cm x 60cm.
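
To make the arithmetic concrete: a 0.6 m tile at 1000 pixels per meter becomes a 600-pixel square. The small sketch below generates the four target corners. `target_square` is a hypothetical helper, and the corner order (clockwise from top-left) is an assumption that has to match the order in which the user clicked the points.

```python
def target_square(side_m, px_per_m, origin=(0.0, 0.0)):
    """Corners of an axis-aligned square in pixels, clockwise from top-left."""
    side_px = side_m * px_per_m   # physical side length converted to pixels
    x0, y0 = origin
    return [(x0, y0),
            (x0 + side_px, y0),
            (x0 + side_px, y0 + side_px),
            (x0, y0 + side_px)]
```

For example, `target_square(0.6, 1000)` describes the rectified tile with its top-left corner at the origin.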

Note that OpenCV doesn't have a <code>null()</code> function like Matlab and Octave. Instead, you'll have to use the SVD operation to get the right singular vector associated with the smallest singular value of the design matrix. That vector is the last column of V or, equivalently, the last row of V<sup>T</sup>, which is the form OpenCV's SVD returns. Test that you get the same result from OpenCV's SVD operation and Octave's <code>null()</code> function.
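
Here is a rough sketch of the null-space approach, using NumPy's SVD as a stand-in for OpenCV's. The function name `homography_from_points` is made up for this example, and the row layout of the design matrix is one common convention that may differ in sign or ordering from the one used in class. It assumes the four points are in general position, so the last element of the solution is not zero.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate a 3x3 H with dst ~ H * src from four or more point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=np.float64)
    # np.linalg.svd returns V transposed; its last row is the right singular
    # vector for the smallest singular value, i.e. the null-space direction
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the arbitrary scale (assumes H[2, 2] != 0)
```

A quick sanity check is to multiply each source point (in homogeneous coordinates) by the result, divide by the third component, and compare with the corresponding target point.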

Once you've got a homography that works for the selected quadrilateral, you'll want to adjust it by incorporating a translation that maps the bounding box of the transformed image to a valid range starting at 0 for the uppermost Y coordinate and leftmost X coordinate.
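
One way to build that translation is to push the four image corners through the homography, take the minimum X and Y, and left-multiply by a translation that moves them to zero. The sketch below uses plain nested lists for the 3x3 matrices. `project_point` and `shift_to_origin` are hypothetical names, and the sketch assumes that none of the corners maps to or beyond the horizon (the third homogeneous coordinate stays positive).

```python
def project_point(H, x, y):
    """Apply a 3x3 homography (nested lists) to the point (x, y)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def shift_to_origin(H, width, height):
    """Return (T*H, (out_w, out_h)) so that the warped image starts at (0, 0)."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    mapped = [project_point(H, x, y) for x, y in corners]
    min_x = min(p[0] for p in mapped)
    min_y = min(p[1] for p in mapped)
    max_x = max(p[0] for p in mapped)
    max_y = max(p[1] for p in mapped)
    # Translation that moves the bounding box's top-left corner to the origin
    T = [[1.0, 0.0, -min_x],
         [0.0, 1.0, -min_y],
         [0.0, 0.0, 1.0]]
    TH = [[sum(T[i][k] * H[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    return TH, (max_x - min_x, max_y - min_y)
```

The second return value is the extent of the bounding box, which you can round up to choose the size of the output image.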

### Display rectified image

Once you've got that working, you'll want to display a rectified version of the original image in a second HighGUI window as we step through the video. There are two ways to do this: directly (manually) and using <code>cv::warpPerspective()</code>. For this lab's learning outcomes, it would be better for you to do it directly/manually using bilinear interpolation. This will give you a better understanding of how to render image transforms. In your own work later, go ahead and use <code>warpPerspective()</code> or whatever suits you.
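
To give an idea of what "manually" means here, the sketch below warps a small grayscale image stored as a list of rows. For every output pixel it maps back into the input with the inverse homography and samples with bilinear interpolation. The function names are hypothetical, and pure Python like this is far too slow for video; treat it as a reference for the logic rather than something to run on full frames.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y).

    Returns None if (x, y) lies outside the image.
    """
    h, w = len(img), len(img[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0   # fractional offsets inside the pixel cell
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp_inverse(img, H_inv, out_w, out_h, fill=0.0):
    """For each output pixel (u, v), look up the input at H_inv * (u, v, 1)."""
    out = [[fill] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            w = H_inv[2][0] * u + H_inv[2][1] * v + H_inv[2][2]
            if w == 0:
                continue   # point at infinity: leave the fill value
            x = (H_inv[0][0] * u + H_inv[0][1] * v + H_inv[0][2]) / w
            y = (H_inv[1][0] * u + H_inv[1][1] * v + H_inv[1][2]) / w
            value = bilinear_sample(img, x, y)
            if value is not None:
                out[v][u] = value
    return out
```

Iterating over output pixels and mapping backwards avoids the holes you would get by pushing input pixels forward through the homography.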

### Display original and rectified optical flows

Once you have the display of the original and ground-plane-rectified images working, add the optical flows from Lab 01 and render them in both images. This will be really useful.
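
A note on rectifying the flows: a homography is not linear in pixel coordinates, so you cannot simply multiply a displacement vector by H. One option is to map both endpoints of each flow vector and draw the line between the results. The sketch below assumes each flow is stored as a pair of (x, y) endpoints, which may not match your Lab 01 data structure. `project_point` and `rectify_flow` are hypothetical names.

```python
def project_point(H, x, y):
    """Apply a 3x3 homography (nested lists) to the point (x, y)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def rectify_flow(H, start, end):
    """Map both endpoints of a flow vector into the rectified image."""
    return project_point(H, start[0], start[1]), project_point(H, end[0], end[1])
```

The rectified endpoints can then be drawn on the rectified image the same way the original flow is drawn on the original frame.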

Here are examples of using <code>getPerspectiveTransform()</code> to get the homography and <tt>warpPerspective()</tt> to get the warped image.


#### C++

Put this code in your main function.

    if (pts.size() == 4)
    {
        Point2f src[4];
        for (int i = 0; i < 4; i++)
        {
            src[i].x = pts[i].x * 1.0;
            src[i].y = pts[i].y * 1.0;
        }
        Point2f reals[4];
        reals[0] = Point2f(800.0, 800.0);
        reals[1] = Point2f(1000.0, 800.0);
        reals[2] = Point2f(1000.0, 1000.0);
        reals[3] = Point2f(800.0, 1000.0);

        Mat homography_matrix = getPerspectiveTransform(src, reals);
        std::cout << "Estimated Homography Matrix is:" << std::endl;
        std::cout << homography_matrix << std::endl;

        // perspective transform operation using transform matrix
        cv::warpPerspective(matPauseScreen, matResult, homography_matrix, matPauseScreen.size(), cv::INTER_LINEAR);
        imshow("Source", matPauseScreen);
        imshow("Result", matResult);

        waitKey(0);
    }

#### Python

Here is equivalent Python code.

    if (len(pts) == 4):
        src = np.array(pts).astype(np.float32)

        reals = np.array([(800, 800),
                          (1000, 800),
                          (1000, 1000),
                          (800, 1000)], np.float32)

        homography_matrix = cv2.getPerspectiveTransform(src, reals)
        print("Estimated Homography Matrix is:")
        print(homography_matrix)

        # perspective transform operation using transform matrix

        h, w, ch = matPauseScreen.shape
        # Pass the interpolation mode by keyword: the fourth positional argument is dst, not flags
        matResult = cv2.warpPerspective(matPauseScreen, homography_matrix, (w, h), flags=cv2.INTER_LINEAR)
        matPauseScreen = cv2.resize(matPauseScreen, dim)
        cv2.imshow("Source", matPauseScreen)
        matResult = cv2.resize(matResult, dim)
        cv2.imshow("Result", matResult)

        cv2.waitKey(0)
        
You should get a result similar to this:

<img src="img/lab02-3.PNG" width="600"/>

Is this image correct? Let's see what we get during the lab session.

## Exercises

1. Calculate the homography manually using the SVD of the linear system design matrix similar to the null space solution from class.

2. Compute the warped image manually using the inverse of the homography and bilinear interpolation in the input image.

3. Reuse the homography from your last run: If your program finds a file <tt>homography.yml</tt> in the working directory,
   it should read the homography from that file and use it to display the transformed image. For this, you will have to learn
   how OpenCV stores data files in YML format using the <tt>FileStorage</tt> class. When the user selects four points in a frame,
   output the resulting homography to the data file and re-read that file when the program starts again.
   This way, the user only has to do the "calibration" once.

### What to turn in

Write a brief report and turn it in using Google Classroom before the next lab. Include your experience with both the in-lab exercises
and the final exercises.

Make a video showing the frames of the original video with optical flows side-by-side with the rectified image and rectified optical flows, put the video online, and point to it on the Piazza discussion board.