# Bike Sharing Regression Assignment (Completed)

Regression Algorithms

Peer Graded Assignments

In [None]:
import pandas as pd
from pathlib import Path

# Load dataset
path = Path('/mnt/data/bike_assignment/day.csv')
df = pd.read_csv(path)
df['dteday'] = pd.to_datetime(df['dteday'])
print('Data shape:', df.shape)
display(df.head())
display(df[['temp','cnt']].describe())

In [None]:
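# Derive the month name from the date and store it as an ordered categorical
# so that months sort in calendar order rather than alphabetically.
df['month_name'] = df['dteday'].dt.month_name()
months_order = ['January','February','March','April','May','June','July','August','September','October','November','December']
df['month_name'] = pd.Categorical(df['month_name'], categories=months_order, ordered=True)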
display(df[['dteday','month_name','temp','cnt']].head(10))

In [None]:
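import statsmodels.formula.api as smf

# Model1: regress cnt on month alone, using treatment (dummy) coding with
# January as the reference level.
formula1 = 'cnt ~ C(month_name, Treatment(reference="January"))'
Model1 = smf.ols(formula=formula1, data=df).fit()
print(Model1.summary())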

### Model1 Results (Regression on a Single Categorical Predictor: `cnt ~ month_name`)

**R-squared (Model1):** 0.3906

**Reference month:** January (the intercept represents its predicted `cnt`).

**Predicted `cnt` for January (Model1, intercept):** 2176.339

**Predicted `cnt` for June (Model1):** 5772.367
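
With treatment coding, a month's prediction is the intercept plus that month's coefficient, and the reference month is the intercept alone. A minimal sketch of that arithmetic, using Model1 estimates reported in this notebook (the helper name `predict_model1` is hypothetical and only a subset of months is included):

```python
# Model1 estimates copied from the summary in this notebook (subset).
MODEL1_INTERCEPT = 2176.339  # January, the reference month
MODEL1_MONTH_EFFECTS = {     # differences from January
    "June": 3596.028,
    "December": 1227.468,
}

def predict_model1(month: str) -> float:
    """Hypothetical helper: intercept plus the month's dummy coefficient."""
    if month == "January":
        # The reference month has no dummy variable of its own.
        return MODEL1_INTERCEPT
    # Raises KeyError for months not listed in the subset above.
    return MODEL1_INTERCEPT + MODEL1_MONTH_EFFECTS[month]

january_pred = predict_model1("January")
june_pred = predict_model1("June")
print(f"January: {january_pred:.3f}  June: {june_pred:.3f}")
```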

**Interpretation of R-squared:**

- R-squared is the proportion of variance in `cnt` explained by the model; here month alone explains about 39% of it. Months capture seasonality, so a sizeable R-squared is expected, but it is far from 1 because day-to-day factors such as weather also affect ridership.
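
To make "proportion of variance explained" concrete, the sketch below computes R-squared by hand for a month-only model. It relies on the fact that an OLS model with only a categorical predictor predicts each group's mean. The counts are toy values invented for illustration, not rows from `day.csv`.

```python
from statistics import mean

# Toy ridership counts, invented for illustration (not from day.csv).
counts_by_month = {
    "January": [10.0, 14.0],
    "June": [30.0, 34.0],
}

all_counts = [c for counts in counts_by_month.values() for c in counts]
grand_mean = mean(all_counts)

# Total variation of the counts around the overall mean.
ss_total = sum((c - grand_mean) ** 2 for c in all_counts)

# A month-only OLS model predicts each month's mean, so the residual
# variation is the spread of counts around their own month's mean.
ss_resid = sum(
    (c - mean(counts)) ** 2
    for counts in counts_by_month.values()
    for c in counts
)

r_squared = 1 - ss_resid / ss_total
print(f"R-squared = {r_squared:.4f}")
```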


In [None]:
formula2 = 'cnt ~ temp + C(month_name, Treatment(reference="January"))'
Model2 = smf.ols(formula=formula2, data=df).fit()
print(Model2.summary())

### Model2 Results (Multiple Regression: `cnt ~ temp + month_name`)

**R-squared (Model2):** 0.4469

**Why R-squared changed from Model1:**

- R-squared rises from 0.3906 in Model1 to 0.4469 in Model2. Adding `temp` contributes explanatory power because temperature influences bike usage (warmer days tend to increase ridership), so `temp` explains variability in `cnt` beyond what month alone explains. Because R-squared can never decrease when a predictor is added to an OLS model, adjusted R-squared is the fairer basis for comparing the two models.
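
Plain R-squared cannot go down when a predictor is added, so a common complementary check is adjusted R-squared, which penalizes the extra parameter. The sketch below applies the standard formula to the two reported R-squared values. The observation count `N_OBS` is an assumption (the notebook prints the shape but does not record it here), and the function name is illustrative.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# ASSUMPTION: number of rows in day.csv; replace with df.shape[0].
N_OBS = 731

# Model1 has 11 month dummies; Model2 has those 11 plus temp.
adj_model1 = adjusted_r2(0.3906, N_OBS, 11)
adj_model2 = adjusted_r2(0.4469, N_OBS, 12)
print(f"Model1: {adj_model1:.4f}  Model2: {adj_model2:.4f}")
```

In the notebook itself, the fitted results expose the same quantity as `Model1.rsquared_adj` and `Model2.rsquared_adj`, which avoids having to assume the row count.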

**Comparison of month coefficients between Model1 and Model2:**

- The month coefficients shrink markedly once `temp` is added (for example, June falls from 3596.028 to 804.845) because `temp` is strongly correlated with month. In Model1 the month dummies absorbed the seasonal temperature effect; in Model2 each month coefficient is the difference from January at the same temperature. This shift is expected because month and temperature are confounded.
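
The size of the shift can be checked against the reported estimates. For OLS with month dummies, each Model1 month coefficient equals the Model2 coefficient plus the `temp` coefficient times the gap between that month's mean `temp` and January's mean `temp` (the omitted-variable identity). The sketch below inverts that relation to back out the implied temperature gaps. The variable names are illustrative, and the gaps are derived from the rounded coefficients listed in this notebook rather than read from the data.

```python
TEMP_COEF = 6235.144  # Model2 coefficient on temp

# (Model1 coefficient, Model2 coefficient) for a few months.
month_coefs = {
    "February": (478.960, 87.502),
    "June": (3596.028, 804.845),
    "July": (3387.339, 151.134),
}

# Identity being inverted:
#   Model1 coef = Model2 coef
#                 + TEMP_COEF * (mean temp of month - mean temp of January)
implied_temp_gap = {
    month: (m1 - m2) / TEMP_COEF
    for month, (m1, m2) in month_coefs.items()
}

for month, gap in implied_temp_gap.items():
    print(f"{month}: mean temp exceeds January's by about {gap:.3f}")
```

Comparing these implied gaps with `df.groupby('month_name')['temp'].mean()` is one way to confirm that the coefficient changes are driven by seasonal temperature.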

**Predicted `cnt` for January at temp=0.25 (Model2):** 2260.863
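
This value follows from the Model2 equation: intercept, plus the month's coefficient (zero for the reference month), plus the `temp` coefficient times the temperature. A minimal sketch using the estimates reported in this notebook (the helper name `predict_model2` is hypothetical and only two months are included):

```python
MODEL2_INTERCEPT = 702.077   # January at temp = 0
MODEL2_TEMP_COEF = 6235.144  # change in cnt per unit of temp
MODEL2_MONTH_EFFECTS = {     # subset; the reference month contributes 0
    "January": 0.0,
    "June": 804.845,
}

def predict_model2(month: str, temp: float) -> float:
    """Hypothetical helper: intercept + month effect + slope * temp."""
    return (
        MODEL2_INTERCEPT
        + MODEL2_MONTH_EFFECTS[month]
        + MODEL2_TEMP_COEF * temp
    )

jan_pred = predict_model2("January", 0.25)
print(f"January at temp=0.25: {jan_pred:.3f}")
```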


### Coefficient estimates (selected)

**Model1 coefficients:**

- Intercept: 2176.339
- C(month_name, Treatment(reference="January"))[T.February]: 478.960
- C(month_name, Treatment(reference="January"))[T.March]: 1515.919
- C(month_name, Treatment(reference="January"))[T.April]: 2308.561
- C(month_name, Treatment(reference="January"))[T.May]: 3173.435
- C(month_name, Treatment(reference="January"))[T.June]: 3596.028
- C(month_name, Treatment(reference="January"))[T.July]: 3387.339
- C(month_name, Treatment(reference="January"))[T.August]: 3488.081
- C(month_name, Treatment(reference="January"))[T.September]: 3590.178
- C(month_name, Treatment(reference="January"))[T.October]: 3022.887
- C(month_name, Treatment(reference="January"))[T.November]: 2070.845
- C(month_name, Treatment(reference="January"))[T.December]: 1227.468

**Model2 coefficients:**

- Intercept: 702.077
- C(month_name, Treatment(reference="January"))[T.February]: 87.502
- C(month_name, Treatment(reference="January"))[T.March]: 555.116
- C(month_name, Treatment(reference="January"))[T.April]: 852.313
- C(month_name, Treatment(reference="January"))[T.May]: 939.044
- C(month_name, Treatment(reference="January"))[T.June]: 804.845
- C(month_name, Treatment(reference="January"))[T.July]: 151.134
- C(month_name, Treatment(reference="January"))[T.August]: 544.234
- C(month_name, Treatment(reference="January"))[T.September]: 1220.567
- C(month_name, Treatment(reference="January"))[T.October]: 1473.028
- C(month_name, Treatment(reference="January"))[T.November]: 1242.968
- C(month_name, Treatment(reference="January"))[T.December]: 681.350
- temp: 6235.144

Note: In both models January is the reference level, so it has no coefficient of its own. The Intercept is January's baseline prediction (in Model2, January's prediction at temp=0).

## Submission

Saved notebook: `mod1_peer_review_Completed.ipynb`.

Please download, rename to `mod1_peer_review_<YourFirstName>_<YourLastName>.ipynb`, and submit to the course platform. 
