# Import Libraries
- requests to fetch the web page
- bs4 (Beautiful Soup) to scrape the table from the web page
- pandas to load the table into a dataframe
- sklearn (scikit-learn) to fit the linear regression model and calculate its error
- matplotlib to visualize the model
- scipy to test the correlation

In [15]:
import math
import requests
from bs4 import BeautifulSoup
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Understanding a Simple Weight Prediction Model
## Loading and Preparing the Data

**Load Webpage**

In [16]:
url = 'http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights'
page = requests.get(url)

**Extract Content**

In [17]:
soup = BeautifulSoup(page.content, 'html.parser')
tbl = soup.find("table",{"class":"wikitable"})

**Convert to DataFrame**

In [18]:
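from io import StringIO  # recent pandas versions deprecate passing a raw HTML string to read_html
height_weight_df = pd.read_html(StringIO(str(tbl)))[0][['Height(Inches)', 'Weight(Pounds)']]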

**Count Records**

In [19]:
num_records = height_weight_df.shape[0]
print(num_records)

200


**Place in x and y variables**

In [20]:
# scikit-learn expects 2-D arrays of shape (n_samples, n_features)
x = height_weight_df['Height(Inches)'].values.reshape(num_records, 1)
y = height_weight_df['Weight(Pounds)'].values.reshape(num_records, 1)

## Fitting a Linear Regression Model

In [21]:
model = linear_model.LinearRegression()
LR = model.fit(x,y)
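
For a single predictor, the least-squares fit has a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations in x, and the intercept follows from the means. The sketch below illustrates that arithmetic in plain Python. The function name `fit_simple_ols` and the five data points are made up for illustration; they are not part of the notebook or the SOCR table, and this is not a description of scikit-learn's internals.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical heights (inches) and weights (pounds), for illustration only.
toy_heights = [64.0, 66.0, 68.0, 70.0, 72.0]
toy_weights = [115.0, 122.0, 127.0, 135.0, 141.0]

intercept, slope = fit_simple_ols(toy_heights, toy_weights)
print(f"y_hat = {intercept:.2f} + {slope:.2f} * x1")  # y_hat = -93.00 + 3.25 * x1
```

On the same inputs, `LinearRegression` should return the same intercept and slope up to floating-point error, since both minimize the sum of squared residuals.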

**Generate Equation**

In [22]:
print(f"ŷ = {model.intercept_[0]} + {model.coef_[0][0]} x₁")

ŷ = -106.02770644878126 + 3.432676129271628 x₁
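
To make the printed equation concrete, the sketch below plugs a height into it. The coefficients are copied from the output above; the helper name `predict_weight` and the 68-inch example height are assumptions added for illustration.

```python
# Coefficients copied from the equation printed by the notebook.
INTERCEPT = -106.02770644878126  # pounds
SLOPE = 3.432676129271628        # pounds per inch of height


def predict_weight(height_in):
    """Predicted weight in pounds for a height in inches (hypothetical helper)."""
    return INTERCEPT + SLOPE * height_in


print(round(predict_weight(68.0), 1))  # 127.4
```

The slope reads as roughly 3.4 additional pounds per additional inch. The negative intercept is the line's value at a height of zero, far outside the observed heights, so it has no physical interpretation.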


**Compute Mean Absolute Error**

In [23]:
y_pred = model.predict(x)
mae = mean_absolute_error(y, y_pred)
print(mae)

7.758737380388219
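
Mean absolute error is the average of the absolute differences between observed and predicted values, so the figure above says the model's predictions miss by about 7.8 pounds on average. A minimal plain-Python sketch of the same calculation follows; `mae_by_hand` and the four value pairs are hypothetical and only illustrate the arithmetic.

```python
def mae_by_hand(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted weights (pounds), for illustration only.
observed = [120.0, 130.0, 125.0, 140.0]
predicted = [118.0, 133.0, 125.0, 135.0]

print(mae_by_hand(observed, predicted))  # (2 + 3 + 0 + 5) / 4 = 2.5
```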


**Plot Regression Line ± Error**

In [None]:
plt.figure(figsize=(12,12))
plt.rcParams.update({'font.size': 16})
plt.scatter(x, y,  color='black')
plt.plot(x, y_pred, color='blue', linewidth=3)
plt.plot(x, y_pred + mae, color='lightgray')
plt.plot(x, y_pred - mae, color='lightgray')
plt.title('Weight vs. Height with Fitted Line ± MAE')
plt.xlabel('Height(Inches)')
plt.ylabel('Weight(Pounds)')
plt.grid(True)
plt.show()

**Calculate Pearson's Correlation Coefficient**

In [24]:
corr, pval = pearsonr(x[:,0], y[:,0])
print(corr)

0.5568647346122995
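
Pearson's r is the sum of cross-deviations of the two variables divided by the square root of the product of their sums of squared deviations, which keeps it between -1 and 1. The sketch below computes it in plain Python; the function name `pearson_r` and the toy data are assumptions for illustration, not values from the SOCR table.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds), for illustration only.
toy_heights = [64.0, 66.0, 68.0, 70.0, 72.0]
toy_weights = [115.0, 122.0, 127.0, 135.0, 141.0]

r_toy = pearson_r(toy_heights, toy_weights)
print(round(r_toy, 3))
```

For a one-predictor linear regression, the square of r equals the fraction of variance in y that the fitted line explains, so the notebook's r of about 0.56 corresponds to roughly 31% of the variance in weight.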


**Test Significance (Two-tailed p-value)**

In [25]:
print(pval < 0.05)

True
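
One way to see why the p-value falls below 0.05 is to convert r into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)), which is the textbook test of zero correlation. The sketch below applies that formula to the r and record count printed earlier. It is an illustration under that assumption, not a reproduction of how `pearsonr` computes its p-value internally, and the helper name `t_statistic` is hypothetical.

```python
import math

# Values copied from the notebook's earlier outputs.
r = 0.5568647346122995
n = 200


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


t = t_statistic(r, n)
print(round(t, 2))  # 9.43
```

With 198 degrees of freedom, the two-tailed 5% critical value is about 1.97, so a t statistic near 9.4 is well past the threshold, consistent with the `True` printed above.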
