# CMPE 547 HW 2

### 1) Visualize the dataset:

In [222]:
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

# Load the two coordinate columns, skipping the header row
x = np.loadtxt('data.txt', usecols=(0, 1), skiprows=1)

x1 = x[:, 0]
x2 = x[:, 1]

plt.figure()
plt.axis('equal')
plt.xlabel('x1')
plt.ylabel('x2')
plt.plot(x1,x2, '.')
plt.show()

[Figure: scatter plot of the dataset, x1 against x2]

### 2) Propose a generative model:

When plotted, the data resembles an ellipse. Taking the mean of the data gives an approximate centre:

In [182]:
meanX1 = sum(x1) / float(len(x1))
meanX2= (sum(x2) / float(len(x2)))
print ('Mean x1:', meanX1)
print ('Mean x2:', meanX2)

Mean x1: 4.89903
Mean x2: 4.85242


This gives the values above, so we can take $(5,5)$ as the centre of the ellipse. The plot also shows that the vertical extremes of the proposed ellipse lie close to $(5,2)$ and $(5,8)$, and the horizontal extremes close to $(4,5)$ and $(6,5)$. The semi-axes are therefore 1 along $x_1$ and 3 along $x_2$, and we can propose the following equation for our model:

$ \dfrac{(x_1-5)^2}{1^2} + \dfrac{(x_2-5)^2}{3^2} = 1 $ 
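The parameters above were read off the plot by eye. As a rough cross-check, the same quantities can be estimated numerically. The sketch below is not part of the original notebook: `estimate_axis_aligned_ellipse` is a hypothetical helper that assumes an axis-aligned ellipse, takes the coordinate-wise mean as the centre, and takes half the extent along each axis as the semi-axis. It is run on synthetic, noise-free points rather than on `data.txt`. With noisy data the extents would tend to overestimate the semi-axes.

```python
import numpy as np

def estimate_axis_aligned_ellipse(points):
    """Rough centre and semi-axes of an axis-aligned ellipse from 2-D points.

    Hypothetical helper (an assumption, not the notebook's method):
    centre = coordinate-wise mean, semi-axis = half the extent per coordinate.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    semi_axes = (points.max(axis=0) - points.min(axis=0)) / 2.0
    return center, semi_axes

# Synthetic, noise-free points on the proposed ellipse (not the homework data).
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
demo = np.column_stack((5.0 + 1.0 * np.cos(t), 5.0 + 3.0 * np.sin(t)))

center_est, semi_axes_est = estimate_axis_aligned_ellipse(demo)
print('centre:', center_est)
print('semi-axes:', semi_axes_est)
```

On the real dataset one would pass `np.column_stack((x1, x2))` instead of `demo`.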

### 3) Implement the generative model

Now that we have a basic model we can try improving it by first comparing it visually to our dataset.

In [254]:
import matplotlib as mpl
from matplotlib.patches import Ellipse

center = (5, 5)
# Ellipse expects the full width and height, i.e. twice the semi-axes 1 and 3
a = 2
b = 6
ell = Ellipse(xy=center, width=a, height=b, fc='none', ec='red')
fig, ax = plt.subplots()
plt.plot(x1,x2, '.')
ax.add_patch(ell)
plt.axis('equal')
ax.autoscale()

[Figure: scatter plot of the dataset with the proposed ellipse outlined in red]

The figure above shows how the ellipse fits our dataset. However, the data is noisy: the points deviate horizontally from the proposed ellipse. We can model this deviation as an error term and estimate its distribution. To do so, we calculate, for each point, the horizontal distance to the corresponding point on the ellipse. This gives a distribution for the error along the x1 axis. As there are no points beyond the vertical boundaries, calculating the horizontal error is sufficient.

To calculate the respective x1 value we'll use the equation below in terms of x2:

$x_1 = 5 \pm \sqrt{1-\dfrac{(x_2-5)^2}{9}} $
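Before applying this formula to the whole dataset, it can be sanity-checked on a few hand-picked heights. The sketch below is an illustration only: `ellipse_x1` is a hypothetical vectorized helper, and clipping the term under the square root at zero is an added safeguard (an assumption, not something the notebook does) so that a point slightly above or below the ellipse cannot raise a math domain error.

```python
import numpy as np

def ellipse_x1(x2, right_side, center=(5.0, 5.0), a=1.0, b=3.0):
    """x1 coordinate on the ellipse at height x2.

    right_side: boolean array-like; True selects the '+' branch (right half),
    False the '-' branch (left half). Hypothetical helper for illustration.
    """
    x2 = np.asarray(x2, dtype=float)
    # Term under the square root, clipped at 0 as a safeguard (assumption).
    inside = np.clip(1.0 - ((x2 - center[1]) / b) ** 2, 0.0, None)
    sign = np.where(np.asarray(right_side), 1.0, -1.0)
    return center[0] + sign * a * np.sqrt(inside)

# At the centre height the ellipse spans x1 = 4..6; at the top it collapses to x1 = 5.
check = ellipse_x1([5.0, 5.0, 8.0], [True, False, True])
print(check)
```

For the dataset, the horizontal distances would then be `ellipse_x1(x2, x1 > 5) - x1`.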

In [184]:
# Calculate the distance between each point and the ellipse on x1 axis.
import math
x1Dist = []

# Iterate over points
for i in range(len(x1)):
    # Find the x1 coordinate on the ellipse at this point's x2 coordinate.
    # If the point is to the right of the center (5,5), take the right branch; otherwise the left one.
    if x1[i]>5:
        x_1 = 5 + math.sqrt(1-((x2[i]-5)**2)/9)
    else:
        x_1 = 5 - math.sqrt(1-((x2[i]-5)**2)/9)
    distX1 = x_1-x1[i]
    x1Dist.append(distX1)
fig2 = plt.figure()
plt.hist(x1Dist, 10)

[Figure: histogram of the horizontal distances to the ellipse, 10 bins]

(array([  1.,   2.,   5.,  29.,  35.,  18.,   4.,   1.,   2.,   3.]),
 array([-0.44978725, -0.3444858 , -0.23918434, -0.13388289, -0.02858143,
         0.07672002,  0.18202148,  0.28732294,  0.39262439,  0.49792585,
         0.6032273 ]),
 <a list of 10 Patch objects>)

The figure above shows how the errors are distributed along the x1 axis. As the histogram looks approximately normal, we model the error with a normal distribution. We assume a mean of approximately 0 (the code below uses the sample mean of the distances) and a standard deviation of 0.2. Below, in orange, is a sample drawn from the normal distribution with these parameters.

In [238]:
mean = np.mean(x1Dist)
randoms = np.random.normal(mean, 0.2, len(x1Dist))

fig2 = plt.figure()

ax1= fig2.add_subplot(1, 1, 1)
ax1.hist(x1Dist, 10)
ax1.hist(randoms,10)

[Figure: overlaid histograms of the horizontal distances and of the normal sample, 10 bins each]

(array([  4.,   1.,   7.,  10.,  20.,  18.,  16.,  11.,   5.,   8.]),
 array([-0.463499  , -0.36977758, -0.27605616, -0.18233475, -0.08861333,
         0.00510808,  0.0988295 ,  0.19255092,  0.28627233,  0.37999375,
         0.47371517]),
 <a list of 10 Patch objects>)
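The standard deviation of 0.2 used above was chosen by eye. As a hedged alternative, both parameters could be estimated directly from the distances. The sketch below uses a hypothetical helper, `fit_normal`, which returns the sample mean and the sample standard deviation (with `ddof=1`). It is demonstrated on synthetic residuals drawn with a known standard deviation, not on the notebook's `x1Dist`.

```python
import numpy as np

def fit_normal(residuals):
    """Sample mean and sample standard deviation of the residuals.

    Hypothetical helper: assumes the residuals are roughly normal, as the
    histogram suggests, and estimates the two parameters from the sample.
    """
    residuals = np.asarray(residuals, dtype=float)
    mu = residuals.mean()
    sigma = residuals.std(ddof=1)  # ddof=1 gives the unbiased variance estimate
    return mu, sigma

# Synthetic residuals with known parameters (not the homework data).
rng = np.random.default_rng(0)
demo_residuals = rng.normal(0.0, 0.2, size=5000)

mu_est, sigma_est = fit_normal(demo_residuals)
print('mean:', mu_est, 'std:', sigma_est)
```

Calling `fit_normal(x1Dist)` would give estimates to compare against the assumed values of 0 and 0.2.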

Additionally, the x1 values of the dataset can be assumed to come from a uniform distribution. Combining these pieces, we can generate the data in the next step.

### 4) Visualize

As we'll be generating data in terms of x1, we need to infer the x2 data using the ellipse equation.

$x_2 = 5 \pm \sqrt{9-9(x_1-5)^2}$

In [250]:
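# x1 drawn uniformly over the horizontal extent of the ellipse
x1gen = np.random.uniform(4,6,len(x1))
# Gaussian error term with the assumed standard deviation
err = np.random.normal(0, 0.2, len(x1))
# The sign of each draw decides between the upper and lower half of the ellipse
ran = np.random.randn(len(x1))
x2gen = []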

for i in range(len(x1)):
    if(ran[i]<0):
        x2gen.append((5 + math.sqrt(9-9*(x1gen[i]-5)**2)) + err[i] )
    else:
        x2gen.append((5 - math.sqrt(9-9*(x1gen[i]-5)**2)) + err[i] )

plt.figure()
plt.axis('equal')
plt.xlabel('x1')
plt.ylabel('x2')
plt.plot(x1,x2, '.', label = 'original dataset')
plt.plot(x1gen,x2gen, '.r' , label='generated data')
plt.legend(loc='best');

[Figure: original dataset and generated data (red) plotted together, x1 against x2]
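The generation step can also be written without an explicit loop. The following is a minimal sketch under the same modelling assumptions as the cell above (uniform x1, a random choice between the upper and lower half, Gaussian error added to x2). `generate_ellipse_points` and its parameters are hypothetical names introduced here, and the clipping under the square root is an added safeguard against rounding.

```python
import numpy as np

def generate_ellipse_points(n, center=(5.0, 5.0), a=1.0, b=3.0,
                            sigma=0.2, seed=None):
    """Draw n noisy points around an axis-aligned ellipse (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # x1 is uniform over the horizontal extent of the ellipse.
    x1 = rng.uniform(center[0] - a, center[0] + a, size=n)
    # Pick the upper or lower half with equal probability.
    sign = rng.choice([-1.0, 1.0], size=n)
    # Clip guards against tiny negative values caused by rounding (assumption).
    inside = np.clip(1.0 - ((x1 - center[0]) / a) ** 2, 0.0, None)
    x2 = center[1] + sign * b * np.sqrt(inside)
    # As in the cell above, the Gaussian error is added to x2.
    x2 = x2 + rng.normal(0.0, sigma, size=n)
    return x1, x2

x1_demo, x2_demo = generate_ellipse_points(500, seed=1)
print(x1_demo[:3], x2_demo[:3])
```

With `a=1` and `b=3`, `b * sqrt(1 - (x1 - 5)**2)` equals the `sqrt(9 - 9*(x1 - 5)**2)` term used in the loop.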

### 5) Discuss comparison

Besides the visual comparison, we can divide the plot into grid cells and check whether the number of points in each cell is approximately equal for the two datasets. We can also examine quantitatively whether the two datasets are correlated.
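The grid idea could be sketched with `np.histogram2d`, which counts points per rectangular cell. The helpers below are hypothetical: the bin counts, the plotting extent, and the summary score (half the summed absolute difference of the cell fractions, which is 0 for identical grids and 1 for grids with no shared occupied cell) are all assumptions chosen for illustration, not results from the notebook.

```python
import numpy as np

def grid_counts(x1, x2, bins=(4, 6), extent=((3.0, 7.0), (1.0, 9.0))):
    """Number of points in each cell of a regular grid over `extent`."""
    counts, _, _ = np.histogram2d(x1, x2, bins=bins, range=extent)
    return counts

def grid_distance(counts_a, counts_b):
    """Half the summed absolute difference of the cell fractions.

    Assumes each grid contains at least one point.
    """
    frac_a = counts_a / counts_a.sum()
    frac_b = counts_b / counts_b.sum()
    return 0.5 * np.abs(frac_a - frac_b).sum()

# Synthetic points for demonstration (not the homework data).
rng = np.random.default_rng(0)
p1, p2 = rng.uniform(4, 6, 200), rng.uniform(2, 8, 200)
same_score = grid_distance(grid_counts(p1, p2), grid_counts(p1, p2))

# Two clouds in opposite corners share no occupied cell.
low = grid_counts(np.full(10, 3.5), np.full(10, 1.5))
high = grid_counts(np.full(10, 6.5), np.full(10, 8.5))
disjoint_score = grid_distance(low, high)
print(same_score, disjoint_score)
```

For the homework data, the comparison would be `grid_distance(grid_counts(x1, x2), grid_counts(x1gen, x2gen))`. A value near 0 suggests that the two point clouds occupy the grid similarly.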