# Analysing determinants of wine price using wine review data

In this example, we are going to use a subset of a large [dataset of wine reviews](https://www.kaggle.com/zynicide/wine-reviews) to examine the relationship between critic-ascribed quality ("points") and the price of wines.

We will carry out the analysis using regression.

Using pandas, import the data into a dataframe and examine its first few rows.
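As a rough sketch of this step, the snippet below reads a CSV with `pd.read_csv` and shows the first rows with `head()`. The rows are made up for illustration and are read from an in-memory string so the snippet runs on its own. The filename in the comment is hypothetical, and the column names `points` and `price` are assumed to match those in your file.

```python
import io

import pandas as pd

# Made-up stand-in rows, NOT taken from the real dataset.
# With your own file you would instead call something like
# pd.read_csv("wine_reviews_subset.csv")  (hypothetical filename).
csv_source = io.StringIO(
    "points,price\n"
    "85,15.0\n"
    "88,22.0\n"
    "90,35.0\n"
    "92,48.0\n"
    "95,90.0\n"
    "97,150.0\n"
)

df = pd.read_csv(csv_source)

# head() returns the first five rows by default
print(df.head())
```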

Plot price against critic score ("points"), with points on the horizontal axis. To do this, use the `scatter` function from `matplotlib.pyplot`.
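One possible way to draw the scatter plot is sketched below. The dataframe here is a small made-up stand-in so that the snippet is self-contained; with the real data you would use your own `df`. The transparency setting and axis labels are illustrative choices rather than requirements.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data; replace with the dataframe you loaded above
df = pd.DataFrame(
    {
        "points": [85, 88, 90, 92, 95, 97],
        "price": [15.0, 22.0, 35.0, 48.0, 90.0, 150.0],
    }
)

# Points on the horizontal axis, price on the vertical axis;
# alpha < 1 makes overlapping markers easier to see
sc = plt.scatter(df["points"], df["price"], alpha=0.5)
plt.xlabel("points")
plt.ylabel("price")
# In a notebook the figure is displayed automatically;
# call plt.show() when running as a script.
```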

In this example, we are first going to assume a linear regression model of the form:

\begin{equation}
\text{price}_i = a + b \, \text{points}_i + \epsilon_i
\end{equation}

where $\epsilon_i$ represents an error term and $a$ and $b$ are parameters to estimate.
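For reference, "least squares" here means choosing the values of $a$ and $b$ that minimise the sum of squared differences between observed and fitted prices. Writing $n$ for the number of wines in the sample (a symbol introduced here for illustration), the quantity being minimised is:

\begin{equation}
S(a, b) = \sum_{i=1}^{n} \left( \text{price}_i - a - b \, \text{points}_i \right)^2
\end{equation}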

We can find the least squares estimates of the parameters using scikit-learn via the following code:

```python
from sklearn import linear_model

reg = linear_model.LinearRegression()

# scikit-learn expects 2D arrays, so reshape each column to shape (n, 1)
X = df["points"].values.reshape(-1, 1)
y = df["price"].values.reshape(-1, 1)

reg.fit(X, y)
a = reg.intercept_[0]  # estimated intercept
b = reg.coef_[0, 0]    # estimated slope
```

Run this code to obtain estimates of the parameters.
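If you want to sanity-check the estimates, the sketch below computes the slope and intercept of a simple linear regression from the standard closed-form expressions (slope = sample covariance divided by sample variance of the predictor) and compares them with `np.polyfit`. The arrays are made-up illustrative values, not the wine data, and the variable names are assumptions.

```python
import numpy as np

# Made-up illustrative values, not taken from the dataset
points = np.array([85, 88, 90, 92, 95, 97], dtype=float)
price = np.array([15, 22, 35, 48, 90, 150], dtype=float)

# Closed-form least squares estimates for one predictor
b = np.cov(points, price, ddof=1)[0, 1] / np.var(points, ddof=1)
a = price.mean() - b * points.mean()

# Cross-check with a degree-1 polynomial fit (returns slope first)
b_check, a_check = np.polyfit(points, price, 1)

print(a, b)
print(a_check, b_check)
```

The two pairs of numbers should agree up to floating-point error, and the same check can be applied to the values returned by scikit-learn.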

What does your model suggest is the average increase in wine price associated with a one-unit increase in points?
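As a hint, in a linear model the predicted values at two scores one point apart differ by exactly the slope. The sketch below illustrates this on made-up data (not the wine data); the scores 90 and 91 are arbitrary choices.

```python
import numpy as np
from sklearn import linear_model

# Made-up illustrative values, not taken from the dataset
points = np.array([85, 88, 90, 92, 95, 97], dtype=float).reshape(-1, 1)
price = np.array([15, 22, 35, 48, 90, 150], dtype=float)

reg = linear_model.LinearRegression().fit(points, price)
b = reg.coef_[0]  # slope (y is 1D here, so coef_ has shape (1,))

# Predicted prices at two scores that differ by one point
pred_90, pred_91 = reg.predict(np.array([[90.0], [91.0]]))

# The difference in predictions equals the slope
print(pred_91 - pred_90, b)
```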

Plot your estimated regression line on top of the data. To do this, follow these steps:

1. Create a vector of grid point values between 80 and 100 using `numpy.linspace`.
2. Calculate the predicted price for each grid point value using your estimates of $a$ and $b$.
3. Reuse your scatter plot code from above and run `plt.plot(points, price)` after your previous `plt.scatter(..)` command to overlay the regression line on top.
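The steps above can be sketched as follows. The observations are made-up stand-ins, and `np.polyfit` is used here only so that the snippet runs on its own; in the exercise you would use the `a` and `b` you obtained from scikit-learn. The grid size of 50 and the line colour are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in observations, not taken from the dataset
points_obs = np.array([85, 88, 90, 92, 95, 97], dtype=float)
price_obs = np.array([15, 22, 35, 48, 90, 150], dtype=float)

# Stand-in for the scikit-learn estimates (polyfit returns slope first)
b, a = np.polyfit(points_obs, price_obs, 1)

# Step 1: grid of point values between 80 and 100
points = np.linspace(80, 100, 50)

# Step 2: predicted price at each grid value
price = a + b * points

# Step 3: scatter plot of the data with the fitted line on top
plt.scatter(points_obs, price_obs, alpha=0.5)
(line,) = plt.plot(points, price, color="red")
plt.xlabel("points")
plt.ylabel("price")
```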