In [1]:
# Import numpy, pandas, and statsmodels
import pandas as pd
import numpy as np
import statsmodels.api as sm

In [14]:
# Load the data into a pandas DataFrame. Calculate summary statistics and make sure everything looks sensible.
df = pd.read_csv("retaildata.csv", sep='\t')
df.describe()

Unnamed: 0,dayofyear,product,price,units,month1,month2,month3,month4,month5,month6,month7,month8,month9,month10,month11,month12
count,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0
mean,183.0,1.5,46.507945,2980.834247,0.084932,0.076712,0.084932,0.082192,0.084932,0.082192,0.084932,0.084932,0.082192,0.084932,0.082192,0.084932
std,105.438271,0.500343,17.612156,927.817395,0.278971,0.266317,0.278971,0.274845,0.278971,0.274845,0.278971,0.278971,0.274845,0.278971,0.274845,0.278971
min,1.0,1.0,25.5,1759.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,92.0,1.0,30.0,2095.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,183.0,1.5,44.25,3203.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,274.0,2.0,65.0,3635.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,365.0,2.0,65.0,5860.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [16]:
df.head()

Unnamed: 0,dayofyear,product,price,units,month1,month2,month3,month4,month5,month6,month7,month8,month9,month10,month11,month12
0,1,1,30.0,2064,1,0,0,0,0,0,0,0,0,0,0,0
1,1,2,65.0,3616,1,0,0,0,0,0,0,0,0,0,0,0
2,2,1,30.0,2222,1,0,0,0,0,0,0,0,0,0,0,0
3,2,2,65.0,3454,1,0,0,0,0,0,0,0,0,0,0,0
4,3,1,30.0,2026,1,0,0,0,0,0,0,0,0,0,0,0


In [18]:
# Estimate a regression of units sold on price
y = df['units']
X = df['price']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  units   R-squared:                       0.740
Model:                            OLS   Adj. R-squared:                  0.740
Method:                 Least Squares   F-statistic:                     2071.
Date:                Fri, 11 Oct 2024   Prob (F-statistic):          4.63e-215
Time:                        16:42:02   Log-Likelihood:                -5531.8
No. Observations:                 730   AIC:                         1.107e+04
Df Residuals:                     728   BIC:                         1.108e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        873.3970     49.519     17.638      0.0

In [21]:
df.columns

Index(['dayofyear', 'product', 'price', 'units', 'month1', 'month2', 'month3',
       'month4', 'month5', 'month6', 'month7', 'month8', 'month9', 'month10',
       'month11', 'month12'],
      dtype='object')

In [23]:
# Regression of units on price, adjusting for day of year and product
y = df['units']
X = df[['price', 'dayofyear', 'product']]
X = sm.add_constant(X)
model_confounders = sm.OLS(y, X).fit()
print(model_confounders.summary())

                            OLS Regression Results                            
Dep. Variable:                  units   R-squared:                       0.819
Model:                            OLS   Adj. R-squared:                  0.818
Method:                 Least Squares   F-statistic:                     1092.
Date:                Fri, 11 Oct 2024   Prob (F-statistic):          1.55e-268
Time:                        16:55:55   Log-Likelihood:                -5400.3
No. Observations:                 730   AIC:                         1.081e+04
Df Residuals:                     726   BIC:                         1.083e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         49.2020     74.270      0.662      0.5

In [25]:
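# Regression adjusting for product and months.
# Note: all 12 month dummies plus a constant are perfectly collinear. The default
# pinv-based fit still runs and the price coefficient is unaffected, but the constant
# and the individual month coefficients are not separately interpretable.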
y = df['units']
X = df[[ 'product', 'price', 'month1', 'month2', 'month3',
       'month4', 'month5', 'month6', 'month7', 'month8', 'month9', 'month10',
       'month11', 'month12']]
X = sm.add_constant(X)
model_confounders = sm.OLS(y, X).fit()
print(model_confounders.summary())

                            OLS Regression Results                            
Dep. Variable:                  units   R-squared:                       0.924
Model:                            OLS   Adj. R-squared:                  0.923
Method:                 Least Squares   F-statistic:                     670.2
Date:                Fri, 11 Oct 2024   Prob (F-statistic):               0.00
Time:                        17:06:56   Log-Likelihood:                -5082.4
No. Observations:                 730   AIC:                         1.019e+04
Df Residuals:                     716   BIC:                         1.026e+04
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        386.1035     41.375      9.332      0.0

1. Verify Steve's estimate. Report the regression coefficient on price, from a regression of units on price, rounded to the nearest two decimal places.

- The coefficient on price in my regression of units on price is 45.31. This verifies Steve's estimate: each additional dollar of price is associated with about 45 more units sold, rounding to the nearest whole unit.

2. Brainstorm possible confounds. Report at least two possible confounds, not necessarily in the data. Explain why you think they are confounds.

- Season is a possible confound, because sales may be higher at certain times of the year, such as holidays like Christmas. During these periods customers are more inclined to buy regardless of price. If prices also tend to be higher in those periods, it can look as though price increases drive sales when the time of year is actually responsible.
- Promotional events are another possible confound, because advertising can affect both price and units sold. If prices are raised during periods of heavy marketing, the increase in sales could be due to the marketing rather than to the price increase itself.

3. Estimate the association between units and price, adjusting for product and for month of year. Report the regression coefficient on price, rounded to the nearest two decimal places.

- The coefficient on price in my regression is -25.73. This implies that, holding product and month fixed, each additional dollar of price is associated with 25.73 fewer units sold.

4. Explain why your answer in (3) is different from in (1)

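- The answer in (3) differs from (1) because (3) adjusts for product and month of year, which are confounders. The first regression did not account for them, so its price coefficient also picked up differences between products and between months. For example, in the first rows of the data product 2 is priced at 65 and sells roughly 3,500 units, while product 1 is priced at 30 and sells roughly 2,100, so comparing across products makes higher prices look like they go with higher sales. Adjusting for product and month compares prices within the same product and month, which gives a better picture of how price relates to sales.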
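
To illustrate how adjusting for product can flip the sign of the price coefficient, here is a minimal simulation sketch. It uses made-up data, not `retaildata.csv`: the prices, the sales levels, and the within-product slope of -25 are all assumptions chosen for illustration, and it uses numpy's least squares rather than statsmodels to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: product 2 is priced higher and also sells more,
# independent of small day-to-day price changes within each product.
product = rng.integers(1, 3, size=n)              # values 1 or 2
base_price = np.where(product == 1, 30.0, 65.0)   # assumed list prices
price = base_price + rng.normal(0, 3, size=n)     # small price variation
true_slope = -25.0                                # assumed within-product slope
units = (2000 + 2500 * (product == 2)
         + true_slope * (price - base_price)
         + rng.normal(0, 100, size=n))

def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

naive_slope = ols([price], units)[1]              # units ~ price
adjusted_slope = ols([price, product], units)[1]  # units ~ price + product

print(f"price only:      {naive_slope:.2f}")
print(f"price + product: {adjusted_slope:.2f}")
```

In this made-up setting the price-only slope comes out positive because it mostly reflects the gap between the two products, while the adjusted slope recovers a negative value close to the assumed -25.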

5. Describe a possible unmeasured confound (it's fine if it's one you mentioned in question 2). How do you think adjusting for it would affect your answer, if you could observe it?

- Customer sentiment could be an unmeasured confounding variable. If customers perceive a product as more valuable when its price increases, this could lead to an increase in sales regardless of the actual change in price. If sentiment could be measured and included in the model, it would likely reduce the estimated effect of price on units sold, because the model would then account for a factor other than price that affects sales.
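
One way to reason about the direction of the change is the standard omitted-variable-bias formula. The symbols here are illustrative assumptions, not quantities estimated in this analysis: $z$ stands for the unmeasured confound, $\beta_z$ for its association with units, and $\delta$ for the slope from regressing $z$ on price (with the other controls held fixed).

```latex
\begin{aligned}
\text{units} &= \beta_0 + \beta_p\,\text{price} + \beta_z\, z + \varepsilon \\
z &= \delta_0 + \delta\,\text{price} + \nu \\
\tilde{\beta}_p &= \beta_p + \beta_z\,\delta
\end{aligned}
```

Here $\tilde{\beta}_p$ is the price coefficient obtained when $z$ is left out. If $z$ raises sales ($\beta_z > 0$) and tends to be higher when price is higher ($\delta > 0$), the bias term $\beta_z\,\delta$ is positive, so adjusting for $z$ would move the price coefficient down. If the two signs differ, adjusting would move it up instead.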

6. Does your analysis indicate that the current pricing policy should be changed? If so, how? If not, why not?

- Since the adjusted coefficient on price (-25.73) is much lower than the original estimate (45.31), and negative rather than positive, I believe the analysis suggests that the pricing policy should be changed. One option would be to increase promotions during peak seasonal sales periods in order to drive sales without relying only on changes to the price.