# BA4b: Train-Test Split for Regression (Bike Rentals Dataset)

**Objective:** Learn how to perform a basic train-test split using the `bike_rentals.csv` dataset and evaluate a linear regression model.

This is a foundational skill in predictive modeling. We'll use the number of bike rentals (`cnt`) as our target variable.

### Step 1: Import Required Libraries
We'll use `pandas` for data handling and `scikit-learn` for splitting and modeling.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

### Step 2: Load the Dataset
We load the bike rentals dataset, which contains 731 daily observations of rental counts and related features.

In [None]:
df = pd.read_csv('bike_rentals.csv')
df.head()
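
Before modeling, it helps to check column types and missing values, since `LinearRegression` accepts only numeric input without gaps. The sketch below runs that check on a tiny in-memory CSV. Everything in it is hypothetical: the values are made up, and apart from `instant`, `temp`, `hum`, and `cnt` the column names are assumptions, so the real file may look different.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the real file. The values are invented,
# and the `dteday` column name is an assumption for illustration only.
csv_text = """instant,dteday,temp,hum,cnt
1,2011-01-01,0.30,0.80,1000
2,2011-01-02,0.35,,800
3,2011-01-03,0.20,0.45,1300
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Columns that are not numeric cannot be passed to LinearRegression as-is.
non_numeric_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Missing values per column; LinearRegression raises an error on NaN.
missing_counts = sample.isna().sum()

print(sample.shape)
print("Non-numeric columns:", non_numeric_cols)
print(missing_counts[missing_counts > 0])
```

Running the same two checks on `df` shows whether any columns need to be dropped, encoded, or filled before Step 5.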

### Step 3: Define Features and Target

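- We drop `cnt` (the target) and `instant` (an index-like row identifier, not a measured feature).
- The remaining columns are used as features (`X`). `LinearRegression` needs every feature to be numeric and free of missing values, so drop or encode any text column (such as a date) and handle any gaps before fitting.
- `y` is the total number of rentals (`cnt`).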

In [None]:
X = df.drop(columns=['cnt', 'instant'])
y = df['cnt']
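
To see what the two lines above do, here is a minimal sketch on a made-up miniature frame. The values are hypothetical; only the column names `instant`, `temp`, `hum`, and `cnt` come from this notebook.

```python
import pandas as pd

# Hypothetical miniature frame with made-up values.
toy = pd.DataFrame({
    "instant": [1, 2, 3],
    "temp": [0.30, 0.35, 0.20],
    "hum": [0.80, 0.70, 0.45],
    "cnt": [1000, 800, 1300],
})

# drop() returns a new DataFrame; `toy` itself is left unchanged.
X_toy = toy.drop(columns=["cnt", "instant"])
y_toy = toy["cnt"]

print(list(X_toy.columns))
print(X_toy.shape, y_toy.shape)
```

The target must not remain among the features, otherwise the model would simply learn to copy it.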

### Step 4: Split into Training and Test Sets

We hold out 20% of the rows for testing (`test_size=0.2`) and train on the remaining 80%. Fixing `random_state` makes the shuffle, and therefore the split, reproducible across runs.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f'Training samples: {len(X_train)}')
print(f'Test samples: {len(X_test)}')
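
The following sketch illustrates two properties of the split on synthetic stand-in data with the same row count as our dataset (731): the resulting set sizes, and the effect of `random_state`. The arrays and the `_demo` names are placeholders, not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 731 rows, each row holding its own index.
X_demo = np.arange(731).reshape(-1, 1)
y_demo = np.arange(731)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(len(X_tr), len(X_te))

# Repeating the call with the same random_state reproduces the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_te, X_te2)

# No row appears in both the training set and the test set.
overlap = set(X_tr.ravel()) & set(X_te.ravel())
print(same_split, len(overlap))
```

With 731 rows this gives 584 training rows and 147 test rows, because scikit-learn rounds the test count up (0.2 × 731 = 146.2).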

### Step 5: Train a Linear Regression Model

Now we fit a `LinearRegression` model using the training data.

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)
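
After fitting, the learned weights are available as `coef_` and `intercept_`. As a sanity check of what `fit` does, this sketch uses synthetic data generated from a known linear rule (the rule and variable names are made up for illustration) and confirms that the model recovers it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 2))

# Known rule: y = 3*x0 - 2*x1 + 5, plus a little noise.
y_syn = 3.0 * X_syn[:, 0] - 2.0 * X_syn[:, 1] + 5.0 + rng.normal(scale=0.01, size=200)

demo_model = LinearRegression().fit(X_syn, y_syn)

# One coefficient per feature, plus a single intercept.
print(demo_model.coef_)
print(demo_model.intercept_)
```

On the bike data, pairing `model.coef_` with `X_train.columns` shows which features push the predicted rental count up or down.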

### Step 6: Predict and Evaluate the Model

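We make predictions on the test set and evaluate performance using the R² score (coefficient of determination). A score of 1.0 is a perfect fit, and a score of 0 is no better than always predicting the mean of the test targets.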

In [None]:
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f'R² Score on Test Set: {r2:.3f}')
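
For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on four made-up values and checks the result against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # squared prediction errors
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviations from the mean
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)
print(r2_score(y_true, y_hat))
```

Because nothing forces `ss_res` to be smaller than `ss_tot` on unseen data, R² on a test set can be negative.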

### Step 7: Try These

Here are some ways to extend this simple analysis:
- Try using only a subset of features (e.g., `temp`, `hum`, `windspeed`) and compare model performance.
- Add polynomial features to capture nonlinear relationships.
- Use a different model (e.g., `DecisionTreeRegressor`) and compare results.
- Explore how the model performs with different `test_size` values (e.g., 0.1, 0.3, 0.5).
- Visualize predictions vs actuals using a scatterplot.
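
As a starting point for two of these ideas, the sketch below compares `LinearRegression` with `DecisionTreeRegressor` across several `test_size` values. It runs on synthetic data with a made-up nonlinear relationship, and `max_depth=4` is an arbitrary choice, so the scores say nothing about the bike dataset. Swap in `X` and `y` from Step 3 to run the real comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with a mildly nonlinear, made-up relationship.
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 1, size=(731, 3))
y_syn = (
    4000 * X_syn[:, 0]
    - 3000 * (X_syn[:, 0] - 0.6) ** 2
    - 1000 * X_syn[:, 1]
    + rng.normal(scale=300, size=731)
)

results = {}
for test_size in (0.1, 0.3, 0.5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_syn, y_syn, test_size=test_size, random_state=42
    )
    estimators = (
        ("linear", LinearRegression()),
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=42)),
    )
    for name, est in estimators:
        est.fit(X_tr, y_tr)
        results[(name, test_size)] = r2_score(y_te, est.predict(X_te))

for (name, test_size), score in results.items():
    print(f"{name:<6} test_size={test_size}: R² = {score:.3f}")
```

Scores from a single split depend on which rows land in the test set, so small differences between models should be read with caution.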

Train-test split is simple but crucial; understanding it well prepares you for deeper validation strategies such as cross-validation.
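
As a preview of cross-validation, here is a minimal sketch using `cross_val_score` with 5 shuffled folds. The data is synthetic and the fold count is an arbitrary choice. Each fold serves once as the test set, so we get five R² scores instead of one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data from a made-up linear rule plus noise.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Shuffle before splitting into 5 folds; each fold is held out once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X_syn, y_syn, cv=cv, scoring="r2")

print(scores.round(3))
print(f"Mean R²: {scores.mean():.3f}")
```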