# Regression Analysis

We are going to use WNBA player statistics to learn about [linear regression](https://en.wikipedia.org/wiki/Linear_regression) as well as [interpolation](https://en.wikipedia.org/wiki/Interpolation) and [extrapolation](https://en.wikipedia.org/wiki/Extrapolation).

Let's start by importing the code libraries that we will need.

In [None]:
import pandas as pd
import plotly.express as px
import numpy as np
print('Libraries imported')

Now we can import the WNBA data and display the first few rows.

In [None]:
url = "https://raw.githubusercontent.com/Data-Dunkers/data/main/WNBA/player/wnba_player_stats_all.csv"
df = pd.read_csv(url)
df.head()

To visualize some data, let's create a scatterplot with minutes played (`MIN`) on the x-axis and points (`PTS`) on the y-axis.

In [None]:
px.scatter(df, x='MIN', y='PTS', title='Points versus Minutes Played', trendline='ols')

## Linear Regression

We can see a trend in the data: more minutes played correlates with more points scored. The trendline, calculated using the [ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) method, rises to the right.
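To make the ordinary least squares idea concrete, here is a minimal sketch of the closed-form calculation for a straight line. The helper name `ols_line` and the four sample values are made up for illustration; they are not taken from the WNBA data.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up values for illustration only, not real WNBA statistics
minutes = [10, 20, 30, 40]
points = [3, 8, 11, 18]
slope, intercept = ols_line(minutes, points)
print(f'slope = {slope:.2f}, y-intercept = {intercept:.2f}')
```

Passing `df['MIN']` and `df['PTS']` to this function should give roughly the same line as the plotted trendline, assuming neither column contains missing values.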

We can use the [polynomial fit](https://numpy.org/doc/stable/reference/generated/numpy.polynomial.polynomial.Polynomial.fit.html#numpy.polynomial.polynomial.Polynomial.fit) function from the [numpy](https://numpy.org) library to calculate the equation of the trendline.

In [None]:
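# Fit a degree-1 polynomial (a straight line); convert() maps the coefficients
# back to the original MIN scale, ordered from lowest degree: [intercept, slope]
b, m = np.polynomial.Polynomial.fit(df['MIN'], df['PTS'], 1).convert().coef
print(f'slope = {m}')
print(f'y-intercept = {b}')
print('equation of the line:')
print(f'PTS = {m:.2f} × MIN + {b:.2f}')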
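Once we have a slope and y-intercept, we can use the equation to predict points for a given number of minutes. A prediction inside the range of observed `MIN` values is interpolation; one outside that range is extrapolation and is usually less reliable. The sketch below shows one way this could look. The function name `predict_points` and all of the numbers are hypothetical placeholders, not results from the WNBA data.

```python
def predict_points(minutes, m, b, min_observed, max_observed):
    """Predict PTS from MIN and say whether MIN is inside the observed range."""
    predicted = m * minutes + b
    if min_observed <= minutes <= max_observed:
        kind = 'interpolation'
    else:
        kind = 'extrapolation'
    return predicted, kind

# Placeholder slope, intercept, and range for illustration only; in the
# notebook, use m, b, df['MIN'].min(), and df['MIN'].max() instead
m_example, b_example = 0.5, -2.0
range_low, range_high = 0, 1000
for minutes in (400, 1500):
    pts, kind = predict_points(minutes, m_example, b_example, range_low, range_high)
    print(f'{minutes} MIN -> {pts:.1f} PTS ({kind})')
```

Returning the label alongside the prediction is a deliberate choice here: it keeps the reminder that extrapolated values deserve extra caution attached to every number the function produces.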



## Questions

1. 
2. 
3. 