1\. **2D minimization of a six-hump camelback function**

$$f(x,y) = \left(4-2.1x^2+\frac{x^4}{3} \right) x^2 +xy + (4y^2 -4)y^2$$

has multiple global and local minima.

- Find the global minima of this function
- How many global minima are there, and what is the function value at those points?
- What happens for an initial guess of $(x, y) = (0, 0)$?

Hints:

* Variables can be restricted to $-2 < x < 2$ and $-1 < y < 1$.
* Use `numpy.meshgrid()` and `pylab.imshow()` to find visually the regions.
* Use `scipy.optimize.minimize()`, optionally trying its optional arguments.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from scipy.optimize import curve_fit
from scipy.special import factorial
from scipy import stats
import pandas as pd

%matplotlib notebook

In [13]:
def camelback(x,y):
    return (4 - 2.1 * x**2 + (x**4)/3)*x**2 +x*y +(4*y**2 -4)*y**2

x=np.linspace(-2,2,300)
y=np.linspace(-1,1,300)

# optimization x
optimization = optimize.minimize(camelback,[1], args=(0,)) 
best_x = optimization.x

# optimization y
optimization = optimize.minimize(lambda x,y: camelback(y,x), [1], args=(0.5,))
best_y1 = optimization.x

optimization = optimize.minimize(lambda x,y: camelback(y,x), [1], args=(-0.5,))
best_y2 = optimization.x

print('Global minima coordinates:\n')
print(best_x,best_y1)
print(best_x,best_y2)


fig=plt.figure()
fig.suptitle('Six-hump Camelback function')
ax1=fig.add_subplot(1, 2, 2, projection='3d')
ax2=fig.add_subplot(1, 2, 1)

xx,yy=np.meshgrid(x,y)
zz=camelback(xx,yy)

ax1.plot_surface(xx,yy,zz,cmap=plt.cm.jet)
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_zlabel('f(x, y)')

ax2.imshow(camelback(xx,yy),cmap=plt.cm.jet, extent=[-2, 2, -1, 1],origin="lower")
ax2.scatter([best_x], [best_y1], c='r',marker='.');
ax2.scatter([best_x], [best_y2], c='r',marker='.');
ax2.set_xlabel('x')
ax2.set_ylabel('y')

Global minima coordinates:

[-6.86148911e-09] [-0.73649881]
[-6.86148911e-09] [0.73649867]


<IPython.core.display.Javascript object>

Text(0, 0.5, 'y')

In [14]:
#initial guess at [0,0]

# optimization x
optimization = optimize.minimize(camelback,[0], args=(0,)) 
best_x = optimization.x

# optimization y
optimization = optimize.minimize(lambda x,y: camelback(y,x), [0], args=(0,))
best_y1 = optimization.x

fig=plt.figure()
plt.imshow(camelback(xx,yy),cmap=plt.cm.jet, extent=[-2, 2, -1, 1],origin="lower")
plt.scatter([best_x], [best_y1], c='r',marker='.');

<IPython.core.display.Javascript object>

2\. **Curve fitting of temperature in Alaska** 

The temperature extremes in Alaska for each month, starting in January, are given by (in degrees Celcius):

max:  `17,  19,  21,  28,  33,  38, 37,  37,  31,  23,  19,  18`

min: `-62, -59, -56, -46, -32, -18, -9, -13, -25, -46, -52, -58`

* Plot these temperatures.
* Find a suitable a function that can describe min and max temperatures. 
* Fit this function to the data with `scipy.optimize.curve_fit()`.
* Plot the result. Is the fit reasonable? If not, why?
* Is the time offset for min and max temperatures the same within the fit accuracy?

In [15]:
month=np.array([i for i in range(1,13)])
t_max=(17,  19,  21,  28,  33,  38, 37,  37,  31,  23,  19,  18)
t_min=(-62, -59, -56, -46, -32, -18, -9, -13, -25, -46, -52, -58)

def f(x,a,b,c,d):
    return a*np.cos(x*b+c)+d

print('Fitting function expression: a*cos(x*b+c)+d \n')

popt_max, pcov_max = curve_fit(f, month, t_max)
print('Parametres for max temperature fit:\n', popt_max)
print("a = %.2f +- %.2f" % (popt_max[0], np.sqrt(pcov_max[0,0])))
print("b = %.2f +- %.2f" % (popt_max[1], np.sqrt(pcov_max[1,1])))
print("c = %.2f +- %.2f" % (popt_max[2], np.sqrt(pcov_max[2,2])))
print("d = %.2f +- %.2f" % (popt_max[3], np.sqrt(pcov_max[3,3])),'\n')

popt_min, pcov_min = curve_fit(f, month, t_min)
print('Parametres for min temperature fit:\n', popt_min)
print("a = %.2f +- %.2f" % (popt_min[0], np.sqrt(pcov_min[0,0])))
print("b = %.2f +- %.2f" % (popt_min[1], np.sqrt(pcov_min[1,1])))
print("c = %.2f +- %.2f" % (popt_min[2], np.sqrt(pcov_min[2,2])))
print("d = %.2f +- %.2f" % (popt_min[3], np.sqrt(pcov_min[3,3])),'\n')

x_range=np.linspace(1,12,100)
y_range=np.linspace(-70,40,100)

fig=plt.figure()
fig.suptitle('Temperature in Alaska')
ax=fig.add_subplot()

ax.scatter(month,t_max, marker='.',c='r',label='T max')
ax.plot(x_range, f(x_range,*popt_max), c='orange', label='max_T fit')

ax.scatter(month,t_min, marker='.',c='b',label='T min')
ax.plot(x_range, f(x_range,*popt_min), c='lightblue',label='min_T fit')

#ax.fill_between(x_range, y_range, where = y_range > f(x_range,*popt_min) & y_range<f(x_range,*popt_max), color='lightblue')

ax.legend(loc='best')
ax.set_xlabel('Month')
ax.set_ylabel('Temperature [C˚]')
fig.show()

Fitting function expression: a*cos(x*b+c)+d 

Parametres for max temperature fit:
 [-10.58169955  -0.59227423   7.12331894  27.94963524]
a = -10.58 +- 0.41
b = -0.59 +- 0.02
c = 7.12 +- 0.15
d = 27.95 +- 0.43 

Parametres for min temperature fit:
 [-25.14962061  -0.60132051  -5.13556143 -36.71257798]
a = -25.15 +- 1.31
b = -0.60 +- 0.02
c = -5.14 +- 0.16
d = -36.71 +- 1.19 



<IPython.core.display.Javascript object>

3\. **Fit the residues**

Read the `data/residuals_261.pkl` file. If you haven't it already, download it from here:

```bash
wget https://www.dropbox.com/s/3uqleyc3wyz52tr/residuals_261.pkl -P data/
```

The feature name "residual" contains the residuals (defined as $y_i - \hat{y}_i$) of a linear regression as a function of the independent variable "distances".

- Considering only the "residual" feature, create an histogram with the appropriate binning and display it.
- Set the appropriate Poisson uncertainty for each bin (thus, for each bin, $\sigma_i = \sqrt{n_i}$, where $n_i$ is the number of entries in each bin)
- By looking at the distribution of the residuals, define an appropriate function and fit it to the histogram of the residuals
- Perform a goodness-of-fit test. Is the p-value of the fit satisfactory?

In [5]:
#!wget https://www.dropbox.com/s/3uqleyc3wyz52tr/residuals_261.pkl -P data/


In [6]:
a=np.load('data/residuals_261.pkl',allow_pickle=True)
df=pd.DataFrame(data=a.item())

df

Unnamed: 0,residuals,distances
0,1.100000,16.0
1,-1.950000,6.3
2,-7.200000,3.4
3,-4.150000,8.4
4,-6.590000,1.7
...,...,...
11126,-0.760000,14.4
11127,0.380000,7.5
11128,0.083333,1.7
11129,0.166667,19.1


In [7]:
print('Max value:',df['residuals'].max())
print('Min value:',df['residuals'].min())
print('Unique values:',df['residuals'].nunique())


def f(x,a,b,c,d):
    return a*np.exp(-pow(x-b,2)/(2*pow(c,2)))+d
    
n_bins=150
x=df['residuals']

%matplotlib notebook
fig=plt.figure()
df['residuals'].plot(kind='hist',bins=n_bins, alpha=0.6,label='Residuals',color='deepskyblue' )

y,binEdges = np.histogram(x,bins=n_bins)
bincenters = 0.5*(binEdges[1:]+binEdges[:-1])
yerr = np.sqrt(y)
plt.bar(bincenters, y, width=0, yerr=yerr)

y_mask = np.where(y>0)
y_filtred=y[y_mask]
yerr_filtered=yerr[y_mask]

bincenters_filtred=bincenters[y_mask]


popt, pcov = curve_fit(f, bincenters, y) ####
x_range=np.linspace(df['residuals'].min(),df['residuals'].max(),1000) 
plt.plot(x_range, f(x_range,*popt), c='red', linestyle='dashed' ,label='Gaussian fit',)

plt.scatter(bincenters,y,marker='.',color='blue')

plt.xlabel('Residual value')
plt.xlim(-165,15)
plt.yscale('log')
plt.legend(loc='best')

plt.show()

print('Fit parametres:\n', popt, '\n')

ndof = n_bins - 1
chi2 = np.sum(((y_filtred - f(bincenters_filtred,*popt))**2) / yerr_filtered)
pvalue = 1. - stats.chi2.cdf(chi2, ndof)
print('chi2:',chi2)
print("p-value:", pvalue)



Max value: 11.32000000000005
Min value: -160.8499999999989
Unique values: 3361


<IPython.core.display.Javascript object>

Fit parametres:
 [ 8.04776601e+03 -2.15429938e-02  5.93579807e-01  5.17853915e+00] 

chi2: 8799.431160055497
p-value: 0.0


4\. **Temperatures in Munich**

Get the following data file:

```bash
https://www.dropbox.com/s/7gy9yjl00ymxb8h/munich_temperatures_average_with_bad_data.txt
```

which gives the temperature in Munich every day for several years.


Fit the following function to the data:

$$f(t) = a \cos(2\pi t + b)+c$$

where $t$ is the time in years.

- Make a plot of the data and the best-fit model in the range 2008 to 2012.

   - What are the best-fit values of the parameters?

   - What is the overall average temperature in Munich, and what are the typical daily average values predicted by the model for the coldest and hottest time of year?

   - What is the meaning of the $b$ parameter, and what physical sense does it have?


- Now fit the data with the function $g(x)$, which has 1 more parameter than $f(x)$.
$$g(x) = a \cos(2\pi b t + c)+d$$
   - What are the RSS for $f(x)$ and $g(x)$?
   - Use the Fisher F-test to determine whether the additional parameter is necessary.

In [8]:
#!wget https://www.dropbox.com/s/7gy9yjl00ymxb8h/munich_temperatures_average_with_bad_data.txt 

In [9]:
df1=pd.read_csv('munich_temperatures_average_with_bad_data.txt',names=['Time','Temperature'],delimiter=" ")

df2=df1.loc[df1['Time'].between(2008,2012, inclusive=True)].copy()
df2

Unnamed: 0,Time,Temperature
4748,2008.00274,-2.94444
4749,2008.00548,-2.66667
4750,2008.00821,-2.66667
4751,2008.01095,-2.00000
4752,2008.01369,-1.94444
...,...,...
6204,2011.98836,3.44444
6205,2011.99110,1.27778
6206,2011.99384,2.88889
6207,2011.99658,1.83333


In [10]:
def g(t,a,b,c):
    return a*np.cos(2*np.pi*t+b)+c #this is the function f(t)

popt_mun, pcov_mun = curve_fit(g, df2['Time'],df2['Temperature'])
t_range=np.linspace(df2['Time'].min(),df2['Time'].max(),1500) 



fig=plt.figure()
fig.suptitle('Temperature in Munich from 2008 to 2012')

plt.plot(df2['Time'],df2['Temperature'],label='Temperatures',c='lightblue')
plt.plot(t_range, g(t_range,*popt_mun), c='red', linestyle='dashed' ,label='Sinusoidal fit f(t)',)

plt.xlabel('Year')
plt.ylabel('Temperature [C˚]')
plt.legend(loc='best')

mean=np.mean(g(df2['Time'],*popt_mun)) #this should be the c parameter
maximal=np.max(g(df2['Time'],*popt_mun))
minimal=np.min(g(df2['Time'],*popt_mun))

print('Fit parametres:\n', popt_mun, '\n')
print('Annual average temperature:',mean,'C˚')#its actually c
print('Maximal annual temperature:',maximal,'C˚')
print('Minimal annual temperature:',minimal,'C˚')


<IPython.core.display.Javascript object>

Fit parametres:
 [-9.98813368 12.33302301  9.38411487] 

Annual average temperature: 9.384097739241064 C˚
Maximal annual temperature: 19.37219556156115 C˚
Minimal annual temperature: -0.6037388683203027 C˚


In [11]:
def k(t,a,b,c,d):
    return a*np.cos(2*np.pi*t*b+c)+d #this is the function g(t)

popt_mun1, pcov_mun1 = curve_fit(k, df2['Time'],df2['Temperature'])

fig=plt.figure()
fig.suptitle('Temperature in Munich from 2008 to 2012')

plt.plot(df2['Time'],df2['Temperature'],label='Temperatures',c='lightblue')
plt.plot(t_range, k(t_range,*popt_mun1), c='blue', linestyle='dashed' ,label='Sinusoidal fit g(t)')

plt.xlabel('Year')
plt.ylabel('Temperature [C˚]')
plt.legend(loc='best')

print('Fit parametres:\n', popt_mun1, '\n')

<IPython.core.display.Javascript object>

Fit parametres:
 [-9.98218217  1.00144393 -5.90317112  9.39812374] 



In [12]:
ssr_f = np.sum((df2['Temperature'] - g(df2['Time'],*popt_mun))**2)
ssr_g = np.sum((df2['Temperature'] - k(df2['Time'],*popt_mun1))**2)

tss = np.sum((np.mean(df2['Temperature']) - df2['Temperature'])**2)

rsq_f = 1 - ssr_f / tss
rsq_g = 1 - ssr_g / tss

print('SSR for f(t):',ssr_f)
print('SSR for g(t):',ssr_g,'\n')

print('R for f(t):',np.sqrt(rsq_f))
print('R for g(t):',np.sqrt(rsq_g),'\n')

def Ftest(ssr_1, ssr_2, ndof_1, ndof_2, nbins, verbose=False):
    F = ((ssr_1 - ssr_2)/(ndof_2 - ndof_1)) / (ssr_2/(nbins - ndof_2))
    CL = 1. - stats.f.cdf(F, ndof_2 - ndof_1, nbins - ndof_2)
    if verbose: print("CL: %.3f" % CL, ", additional parameter necessary:", "YES" if CL < 0.10 else "NO")
    return CL

ndof_f=2
ndof_g=3
N=len(df2['Temperature'])

print('F-Test:')
cl_f_vs_g = Ftest(ssr_f, ssr_g, ndof_f, ndof_g, N, verbose=True)

SSR for f(t): 34359.85859996652
SSR for g(t): 34352.794053978876 

R for f(t): 0.8243745993663526
R for g(t): 0.8244145541802156 

F-Test:
CL: 0.584 , additional parameter necessary: NO
