In [None]:
%matplotlib inline

# Polynomial Regression Example

We'll use a population dataset and, this time, fit the data using polynomial regression.


## Installation

First, let's install the required libraries, then import them:

In [None]:
%pip install matplotlib
%pip install numpy
%pip install scikit-learn
%pip install pandas

In [None]:
# Import the libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score

## Viewing the data

Next, let's see the data we're working with.

In [None]:
# Load the population dataset
populations = pd.read_csv("https://raw.githubusercontent.com/datasets/population/master/data/population.csv")

In [None]:
populations.head()

## Choosing our input

Let's use the year as our input feature and the population as the output feature.

Let's only look at Afghanistan for now.

In [None]:
# Create a variable called year and assign it the Year column of Afghanistan
# HINT: Afghanistan has a country code of AFG

# Create a variable called pop and assign it the Value column of Afghanistan


<details><summary>Click to cheat</summary>

```python
# Create a variable called year and assign it the Year column of Afghanistan
# HINT: Afghanistan has a country code of AFG
populations2 = populations[populations["Country Code"] == "AFG"]
year = populations2['Year'].to_numpy()
# Create a variable called pop and assign it the Value column of Afghanistan
pop = populations2['Value'].to_numpy()
```
</details>

## Observing our Dataset

Let's take a look at what the data looks like.

In [None]:
plt.scatter(year, pop)
plt.show()

## Creating the model

In [None]:
# choose a degree
# degree = 

# Create the polynomial features object


# Transform the input into polynomial features


# Create the linear model


# Train the model using the training sets


# Get the predictions


<details><summary>Click to cheat</summary>

```python
# choose a degree
degree = 3

# Create the polynomial features object
poly = PolynomialFeatures(degree=degree, include_bias=False)

# Transform the input into polynomial features
year_poly = poly.fit_transform(year.reshape(-1, 1))

# Create the linear model
model = LinearRegression()

# Train the model using the training sets
model.fit(year_poly, pop)

# Get the predictions
pop_pred = model.predict(year_poly)
```
</details>
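
To make the transform step concrete, here is a minimal sketch of what `PolynomialFeatures` produces for a single input column. The names `x_demo`, `poly_demo` and `x_demo_poly` are hypothetical and the values are toy numbers, not taken from the population dataset.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy inputs (illustrative values only)
x_demo = np.array([2.0, 3.0])

# degree=3 without the bias column gives the columns x, x^2, x^3
poly_demo = PolynomialFeatures(degree=3, include_bias=False)
x_demo_poly = poly_demo.fit_transform(x_demo.reshape(-1, 1))

# Each row holds the powers of one input: [2, 4, 8] and [3, 9, 27]
print(x_demo_poly)
```

Because the new columns are just powers of the input, an ordinary `LinearRegression` fitted on them yields a polynomial in the original variable.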

## Evaluating our model

We can print the raw numbers and plot the fit.

In [None]:
# The weights
print("Weights:", model.coef_)
# The bias
print("Bias:", model.intercept_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(pop, pop_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(pop, pop_pred))
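
The coefficient of determination printed above can also be computed by hand, which makes its meaning concrete. The sketch below assumes the usual definition, one minus the residual sum of squares divided by the total sum of squares. `r2_by_hand` and the toy arrays are hypothetical names introduced only for illustration.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Toy values, not from the population dataset
y_demo = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r2_by_hand(y_demo, y_demo)                       # perfect prediction
r2_mean = r2_by_hand(y_demo, np.full(4, y_demo.mean()))       # always predicting the mean
print(r2_perfect, r2_mean)
```

A perfect prediction gives 1, while always predicting the mean of the data gives 0.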

In [None]:
# Plot outputs
plt.scatter(year, pop, color="black")
plt.scatter(year, pop_pred, color="green")
plt.plot(year, pop_pred, color="blue", linewidth=1)

plt.xlabel("Year")
plt.ylabel("Population")

plt.show()

## Choosing our degree

Now let's be smarter and choose a degree using the Bayesian information criterion (BIC).
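
For reference, the `bic` helper in the next cell computes the following quantity, where $n$ is the number of data points, $k$ is the number of fitted parameters, and $\mathrm{SSR}$ is the sum of squared residuals:

$$
\mathrm{BIC} = k \ln(n) + n \ln(\mathrm{SSR})
$$

The form often quoted for least squares with Gaussian errors is $n \ln(\mathrm{SSR}/n) + k \ln(n)$. It differs from the expression above only by the constant $-n \ln(n)$, so for a fixed dataset both rank the models identically. Lower values are better: the first term penalises extra parameters and the second rewards a closer fit.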

In [None]:
# Our BIC function
def bic(n, k, ssr):
    return k * np.log(n) + n * np.log(ssr)

# Our squared error function
def squared_error(trueY, predY):
    return mean_squared_error(trueY, predY) * len(trueY)

# Let's create our array of BIC values
bics = []
for i in range(2, 20):
    # Create the model of degree i-1

    # Fit the data


    # Predict with the model


    # Calculate the BIC


bics = np.array(bics)
k = np.arange(2, 20, 1)

<details><summary>Click to cheat</summary>

```python
# Let's create our array of BIC values
bics = []
for i in range(2, 20):
    # Create the model of degree i-1
    poly = PolynomialFeatures(degree=i - 1, include_bias=False)
    year_poly = poly.fit_transform(year.reshape(-1, 1))
    model = LinearRegression()

    # Fit the data
    model.fit(year_poly, pop)

    # Predict with the model
    pop_pred = model.predict(year_poly)

    # Calculate the BIC
    ssr = squared_error(pop, pop_pred)
    bics.append(bic(len(pop), i, ssr))

bics = np.array(bics)
k = np.arange(2, 20, 1)

```
</details>
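
One caveat, offered as an assumption rather than something this notebook states: raising raw year values (around 2000) to high powers produces very large, nearly collinear columns, which can make the least-squares fit numerically unreliable at the higher degrees in the loop above. A possible mitigation is to standardise the years before expanding them. The sketch below uses only NumPy, a hypothetical `year_demo` array standing in for `year`, and a hypothetical helper `poly_columns`, and compares the condition numbers of the two feature matrices.

```python
import numpy as np

# Hypothetical stand-in for the notebook's `year` array
year_demo = np.arange(1960, 2020, dtype=float)

def poly_columns(x, degree):
    """Stack x, x**2, ..., x**degree as columns (no bias column)."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

# Standardise: subtract the mean and divide by the standard deviation
year_scaled = (year_demo - year_demo.mean()) / year_demo.std()

# A larger condition number means a less numerically stable least-squares problem
cond_raw = np.linalg.cond(poly_columns(year_demo, 5))
cond_scaled = np.linalg.cond(poly_columns(year_scaled, 5))
print("raw years:   ", cond_raw)
print("scaled years:", cond_scaled)
```

If the scaled version is used for fitting, the same scaling has to be applied to any new years before predicting.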

In [None]:
plt.close()
plt.plot(k, bics, color="blue", linewidth=1)

plt.xlabel("k")
plt.ylabel("BIC")

plt.show()
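
One way to read the plot programmatically is to take the `k` with the smallest BIC. This is a minimal sketch, assuming that lower BIC is better and that `k` counts the fitted parameters, so the degree is `k - 1` as in the loop above. `best_k` and the demo values are hypothetical and are not results from the dataset.

```python
import numpy as np

def best_k(k, bics):
    """Return the entry of k whose BIC is lowest."""
    return int(k[np.argmin(bics)])

# Made-up BIC values for illustration only
k_demo = np.arange(2, 6)
bics_demo = np.array([10.0, 7.5, 8.0, 9.0])

chosen_k = best_k(k_demo, bics_demo)
print("Best k:", chosen_k, "-> degree", chosen_k - 1)
```

With the notebook's own arrays, the call would be `best_k(k, bics)`.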