https://www.nrs.fs.usda.gov/pubs/rn/rn_ne311.pdf
https://meridian.allenpress.com/fpj/article/72/1/11/475647/Nowcasting-of-Lumber-Futures-Price-with-Google
https://www.bls.gov/news.release/ppi.nr0.htm


In [None]:
# Import the necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

In [None]:
url = 'https://raw.githubusercontent.com/jonathan-barrios/project_datasets/main/datasets/Lumber%20Futures%20Historical%20Data.csv'
df = pd.read_csv(url)

# EDA (exploratory data analysis)
Now that the data is loaded, let's take a look at the first few rows to get a feel for the dataset:

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
df.info()

# Clean and Wrangle Data
We can see that the dataset has a Date column and a Price column. The Date column is in the format YYYY-MM-DD, and the Price column is the price of lumber in dollars per thousand board feet.

Before we can build a linear regression model, we need to clean and prepare the data. First, let's convert the Date column to a datetime data type and the Price column to a numeric data type:

In [None]:
# Convert the Date column to a datetime data type
df['Date'] = pd.to_datetime(df['Date'])
# Remove the thousands-separator commas so the prices can be parsed as numbers
df['Price'] = df['Price'].str.replace(',', '', regex=False)
# Use pd.to_numeric instead of astype: the Price column has non-integer values,
# which caused astype to raise a ValueError
df['Price'] = pd.to_numeric(df['Price'])

In [None]:
# verify data type change
df.info()
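
To make the Price conversion easier to follow, here is a minimal sketch of the same comma removal and numeric parsing on a small made-up frame. The `sample` frame and its values are assumptions for illustration only and do not come from the lumber dataset.

```python
import pandas as pd

# Hypothetical sample: price strings with thousands separators (not real data)
sample = pd.DataFrame({'Price': ['1,234.50', '987.10', '1,005.00']})

# Strip the commas, then parse the remaining strings as floats
sample['Price'] = pd.to_numeric(sample['Price'].str.replace(',', '', regex=False))

print(sample['Price'].dtype)
print(sample['Price'].tolist())
```

Depending on how the file is quoted, passing `thousands=','` to `pd.read_csv` may achieve the same result at load time; that option is not used in this notebook.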

# Handle Missing Data

In [None]:
df.isnull().sum()

It looks like there are no missing values in the dataset. That's great!
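
Had the counts not all been zero, the gaps would need handling before modeling. The sketch below shows two common options on made-up data; the `sample` frame and its values are assumptions for illustration, and which option is appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing price (not real data)
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-02', '2023-01-03', '2023-01-04']),
    'Price': [400.0, np.nan, 410.0],
})

# Option 1: drop rows where the price is missing
dropped = sample.dropna(subset=['Price'])

# Option 2: sort by date and carry the last known price forward
filled = sample.sort_values('Date').ffill()

print(len(dropped))
print(filled['Price'].tolist())
```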


# Visualize Data
Now, let's plot the Price column to get a sense of how the lumber prices have changed over time:

In [None]:
sns.lineplot(x='Date', y='Price', data=df)

The default figure is small, so let's make it a little bigger:

In [None]:
ax = sns.lineplot(x='Date', y='Price', data=df)
ax.figure.set_size_inches(20, 7)

From the plot, it looks like lumber prices have been generally increasing over time, with some ups and downs along the way.
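
A visual impression of a trend can also be checked numerically. The sketch below compares yearly average prices on synthetic data; the `sample` frame and its values are made up for illustration, and in the notebook `df` would take its place.

```python
import pandas as pd

# Synthetic prices, two observations per year (not real data)
sample = pd.DataFrame({
    'Date': pd.to_datetime([
        '2019-03-01', '2019-09-01',
        '2020-03-01', '2020-09-01',
        '2021-03-01', '2021-09-01',
    ]),
    'Price': [350.0, 370.0, 400.0, 600.0, 900.0, 700.0],
})

# Average price per calendar year
yearly_mean = sample.groupby(sample['Date'].dt.year)['Price'].mean()

# True when each yearly average is at least as high as the previous one
is_increasing = bool(yearly_mean.is_monotonic_increasing)

print(yearly_mean)
print(is_increasing)
```

Averaging by year smooths out the short-term ups and downs, so the comparison reflects the longer-term direction rather than individual spikes.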

Next steps: Now that we have a sense of the data, we can move on to building a linear regression model.
