# Sum of squares

When the data contains the observations $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$ and the line $y=ax+b$ is fitted to the data, the error can be computed with the sum of squares formula $$\sum_{i=1}^{n}(y_i-(ax_i+b))^2.$$

For example, when the data is $(1,1),(3,2),(5,3)$ and the line is $y=x-1$ (i.e., $a=1$ and $b=-1$), the error is $$(1-(1-1))^2+(2-(3-1))^2+(3-(5-1))^2=1+0+1=2.$$

Implement a class `SquareSum` with the methods

-   `add(x, y)`: add an observation to the data
-   `calc(a, b)`: return the sum of squares error for the given line parameters

The time complexity of both methods should be $O(1)$.

In a file `squaresum.py`, implement a class `SquareSum` according to the following template:

In [None]:
class SquareSum:
    def __init__(self):
        pass  # TODO

    def add(self, x, y):
        pass  # TODO

    def calc(self, a, b):
        pass  # TODO

if __name__ == "__main__":
    s = SquareSum()
    s.add(1, 1)
    s.add(3, 2)
    s.add(5, 3)
    print(s.calc(1, 0)) # 5
    print(s.calc(1, -1)) # 2
    print(s.calc(0.5, 0.5)) # 0
    s.add(4, 2)
    print(s.calc(0.5, 0.5)) # 0.25

## Attempt 1

In [5]:
# Stores every observation; calc is O(n), so this does not yet meet the O(1) requirement.

class SquareSum:
    def __init__(self):
        self.coord = []

    # Time Complexity of O(1)
    def add(self, x, y):
        self.coord.append((x, y))

    # Time complexity O(n): loops over every stored observation
    def calc(self, a, b):
        total = 0
        for x, y in self.coord:
            total += (y - (a*x + b))**2
        return total

if __name__ == "__main__":
    s = SquareSum()
    s.add(1, 1)
    s.add(3, 2)
    s.add(5, 3)
    print(s.calc(1, 0)) # 5
    print(s.calc(1, -1)) # 2
    print(s.calc(0.5, 0.5)) # 0
    s.add(4, 2)
    print(s.calc(0.5, 0.5)) # 0.25

5
2
0.0
0.25


In [39]:
# Sanity check: expand (y - (a*x + b))**2 step by step; every line should print the same value.
y = 15
x = 4
a = 5
b = 3
print((y - (x*a + b))**2)
print((y**2 - 2*y*(a*x + b) + (a*x + b)**2))
print((y**2 - 2*y*(a*x + b) + (a*x + b) * (a*x + b)))
print((y**2 - 2*y*(a*x + b) + (a**2 * x**2)  + 2*a*x*b + b**2 ))
print((y**2 - 2*y*a*x - 2*y*b + (a**2 * x**2)  + 2*a*x*b + b**2 ))
print((y**2 - (2*x*y)*(a) - 2*y*(b) + x**2*(a**2)  + x*(2*a*b) + b**2 ))

64
64
64
64
64
64


In [3]:
# O(1) version, based on the per-point expansion
# y**2 - (2*x*y)*a - (2*y)*b + (x**2)*a**2 + x*(2*a*b) + b**2
class SquareSum:
    def __init__(self):
        self.n = 0        # number of observations (a counter replaces the stored list)
        self.x = 0        # sum of x
        self.y = 0        # sum of -2*y (sign and factor folded in)
        self.xy = 0       # sum of -2*x*y (sign and factor folded in)
        self.xsquare = 0  # sum of x**2
        self.ysquare = 0  # sum of y**2

    def add(self, x, y):
        self.n += 1
        self.x += x
        self.y -= 2*y
        self.xy -= 2*x*y
        self.xsquare += x**2
        self.ysquare += y**2

    def calc(self, a, b):
        return (self.ysquare + self.xy * a + self.y * b
                + self.xsquare * a**2 + self.x * 2 * a * b + self.n * b**2)

if __name__ == "__main__":
    s = SquareSum()
    s.add(1, 1)
    s.add(3, 2)
    s.add(5, 3)
    print(s.calc(1, 0)) # 5
    print(s.calc(1, -1)) # 2
    print(s.calc(0.5, 0.5)) # 0
    s.add(4, 2)
    print(s.calc(0.5, 0.5)) # 0.25

5
2
0.0
0.25


## Solution

An efficient solution can be found by expanding each term of the sum of squares formula: $$(y_i-(ax_i+b))^2 = a^2 x_i^2 + y_i^2 - 2a x_i y_i + 2ab x_i - 2b y_i + b^2.$$ The class stores the number of observations $n$ and the running sums of $x_i$, $y_i$, $x_i y_i$, $x_i^2$ and $y_i^2$. Since $a$ and $b$ do not depend on $i$, they factor out of each sum, so the sum of squares error can be computed in constant time for any parameters $a$ and $b$.
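
Summing the expansion over all observations makes the constant-time claim explicit. The following only spells out that step, using the same symbols as above:

$$\sum_{i=1}^{n}(y_i-(ax_i+b))^2 = a^2\sum_{i=1}^{n}x_i^2 + \sum_{i=1}^{n}y_i^2 - 2a\sum_{i=1}^{n}x_i y_i + 2ab\sum_{i=1}^{n}x_i - 2b\sum_{i=1}^{n}y_i + n b^2.$$

For the example data $(1,1),(3,2),(5,3)$ the sums are $\sum x_i=9$, $\sum y_i=6$, $\sum x_i y_i=22$, $\sum x_i^2=35$ and $\sum y_i^2=14$, so with $a=1$ and $b=-1$ the right-hand side is $35+14-44-18+12+3=2$, which matches the direct computation.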

In [None]:
class SquareSum:
    def __init__(self):
        self.n = 0
        self.xsum = 0
        self.ysum = 0
        self.xysum = 0
        self.x2sum = 0
        self.y2sum = 0
 
    def add(self, x, y):
        self.n += 1
        self.xsum += x
        self.ysum += y
        self.xysum += x*y
        self.x2sum += x*x
        self.y2sum += y*y
 
    def calc(self, a, b):
        return (a*a*self.x2sum + self.y2sum - 2*a*self.xysum
                + 2*a*b*self.xsum - 2*b*self.ysum + self.n*b*b)
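
One way to gain confidence in the closed form is to compare it against the direct $O(n)$ loop on random data. The sketch below is only an illustration: the helper names `brute_force` and `check_random` are hypothetical and not part of the exercise, and the parameters are `Fraction` values so that the comparison is exact rather than subject to floating-point rounding.

```python
import random
from fractions import Fraction


class SquareSum:
    """Constant-time version using the running sums from the expansion."""

    def __init__(self):
        self.n = 0
        self.xsum = 0
        self.ysum = 0
        self.xysum = 0
        self.x2sum = 0
        self.y2sum = 0

    def add(self, x, y):
        self.n += 1
        self.xsum += x
        self.ysum += y
        self.xysum += x*y
        self.x2sum += x*x
        self.y2sum += y*y

    def calc(self, a, b):
        return (a*a*self.x2sum + self.y2sum - 2*a*self.xysum
                + 2*a*b*self.xsum - 2*b*self.ysum + self.n*b*b)


def brute_force(points, a, b):
    """Direct O(n) evaluation of the sum of squares (hypothetical helper)."""
    return sum((y - (a*x + b))**2 for x, y in points)


def check_random(trials=1000, seed=0):
    """Return True if both methods agree exactly on random integer data (hypothetical helper)."""
    rng = random.Random(seed)
    s = SquareSum()
    points = []
    for _ in range(trials):
        x, y = rng.randint(-100, 100), rng.randint(-100, 100)
        points.append((x, y))
        s.add(x, y)
    # Exact rational parameters avoid floating-point rounding in the comparison.
    a, b = Fraction(1, 3), Fraction(-5, 7)
    return s.calc(a, b) == brute_force(points, a, b)


if __name__ == "__main__":
    print(check_random())
```

With float inputs of large magnitude the closed form adds and subtracts large terms that nearly cancel, so its result may differ slightly from the direct loop. Comparing with a small tolerance (or using exact types, as here) is the safer check in that case.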