# Hypothesis Design

This notebook defines the statistical hypotheses tested in this phase of the project to answer business-relevant questions about ride demand in New York City.

## H1 - Difference in demand between weekdays and weekends

### Business question
Is average ride demand per zone-hour different on weekdays compared to weekends?

### Null hypothesis (H0)
The mean ride demand per zone-hour is the same for weekdays and weekends.

### Alternative hypothesis (H1)
The mean ride demand per zone-hour is different for weekdays and weekends.

### Variable
- Target variable: demand (rides per zone-hour)
- Grouping variable: is_weekend (0 = weekday, 1 = weekend)
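As a minimal sketch of how these two variables could be derived, the snippet below aggregates raw trips into zone-hour counts and adds the `is_weekend` flag. The input layout (a pickup timestamp string and a zone id per trip) and the sample rows are assumptions made for illustration, not the project's actual schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw trips: (pickup timestamp, pickup zone id).
trips = [
    ("2024-01-05 08:15:00", 161),  # Friday
    ("2024-01-05 08:40:00", 161),
    ("2024-01-05 09:05:00", 161),
    ("2024-01-06 08:20:00", 161),  # Saturday
    ("2024-01-06 08:55:00", 237),
]

def zone_hour_demand(trips):
    """Count rides per (zone_id, hour bucket) and flag weekend hours."""
    counts = Counter()
    for ts, zone_id in trips:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
        counts[(zone_id, hour)] += 1
    rows = []
    for (zone_id, hour), demand in sorted(counts.items()):
        rows.append({
            "zone_id": zone_id,
            "hour": hour,
            "demand": demand,
            # weekday() returns 5 for Saturday and 6 for Sunday.
            "is_weekend": int(hour.weekday() >= 5),
        })
    return rows

rows = zone_hour_demand(trips)
```

Zone-hours with no rides do not appear in this output. Whether they should be filled in with a demand of 0 changes the group means, so that choice is worth fixing before testing.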

### Planned test
Two-sided two-sample test comparing mean zone-hour demand between weekdays and weekends.
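One possible form of this test is sketched below: a two-sided comparison of means using the Welch (unequal-variance) standard error and a normal approximation for the p-value. The choice of Welch's statistic, the function name `welch_z_test`, and the sample values are assumptions for illustration; the notebook does not commit to a specific test.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_z_test(a, b):
    """Two-sided test of equal means using a normal approximation.

    Uses the Welch (unequal-variance) standard error. The normal
    approximation is only reasonable when both samples are large.
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = diff / se
    # Both tails count as evidence against H0.
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return diff, z, p_two_sided

# Made-up demand values, far smaller than a real sample.
weekday = [12, 15, 11, 14, 13, 16, 12, 15]
weekend = [9, 11, 8, 10, 12, 9, 10, 11]
diff, z, p = welch_z_test(weekday, weekend)
```

With samples this small a t distribution would be the appropriate reference; `scipy.stats.ttest_ind(a, b, equal_var=False)` is a common way to get that if SciPy is available.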

## H2 - Peak hour versus off-peak demand

### Business question
Is demand during peak hours significantly higher than during off-peak hours?

### Null hypothesis (H0)
The mean ride demand per zone-hour during peak hours is equal to that during off-peak hours.

### Alternative hypothesis (H1)
The mean ride demand per zone-hour during peak hours is higher than during off-peak hours.

### Variable
- Target variable: demand
- Grouping variable: peak period (peak vs off-peak)

Peak hours will be defined based on exploratory analysis.

### Planned test
One-sided (upper-tailed) two-sample test comparing mean zone-hour demand in peak versus off-peak hours.
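The sketch below shows how the one-sided version differs from a two-sided test: only the upper tail contributes to the p-value. The peak window in `PEAK_HOURS` is a placeholder, since the real definition is to come from exploratory analysis, and the records and the normal approximation are likewise assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

# Placeholder peak window; the real definition comes from exploratory analysis.
PEAK_HOURS = {7, 8, 9, 17, 18, 19}

def one_sided_peak_test(records):
    """Test whether mean peak demand exceeds mean off-peak demand.

    `records` is an iterable of (hour_of_day, demand) pairs.
    Uses a normal approximation with the Welch standard error.
    """
    peak = [d for h, d in records if h in PEAK_HOURS]
    off_peak = [d for h, d in records if h not in PEAK_HOURS]
    se = sqrt(variance(peak) / len(peak) + variance(off_peak) / len(off_peak))
    z = (mean(peak) - mean(off_peak)) / se
    # Upper-tail p-value: only a large positive z supports the alternative.
    p_upper = NormalDist().cdf(-z)
    return z, p_upper

# Made-up (hour_of_day, demand) pairs.
records = [(8, 20), (9, 18), (17, 22), (18, 19), (2, 5), (3, 4), (13, 9), (22, 8)]
z, p_upper = one_sided_peak_test(records)
```

If peak demand turned out lower than off-peak demand, `z` would be negative and `p_upper` would be close to 1, so this test cannot flag a difference in the unexpected direction.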

## H3 - Impact of weather on demand

### Business question
Does adverse weather affect ride demand?

### Null hypothesis (H0)
Mean ride demand per zone-hour is the same on adverse weather days and normal weather days.

### Alternative hypothesis (H1)
Mean ride demand per zone-hour differs between adverse weather days and normal weather days.

### Variable
- Target variable: demand
- Grouping variable: weather condition (adverse vs normal)

Weather information will be merged at the date or hour level.
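A date-level merge could look like the sketch below, which attaches a daily weather record to each zone-hour row and derives the adverse/normal flag. The weather fields, the precipitation threshold, and the name `is_adverse` are all hypothetical; the notebook does not yet define what counts as adverse weather.

```python
from datetime import datetime

# Hypothetical daily weather table keyed by ISO date.
daily_weather = {
    "2024-01-05": {"precip_mm": 0.0},
    "2024-01-06": {"precip_mm": 12.4},
}
ADVERSE_PRECIP_MM = 5.0  # placeholder threshold, not a project decision

zone_hours = [
    {"zone_id": 161, "hour": datetime(2024, 1, 5, 8), "demand": 14},
    {"zone_id": 161, "hour": datetime(2024, 1, 6, 8), "demand": 9},
    {"zone_id": 161, "hour": datetime(2024, 1, 7, 8), "demand": 11},  # no weather row
]

def attach_weather(zone_hours, daily_weather, threshold=ADVERSE_PRECIP_MM):
    """Left-join daily weather onto zone-hour rows and flag adverse days."""
    merged = []
    for row in zone_hours:
        weather = daily_weather.get(row["hour"].date().isoformat())
        # Keep unmatched rows with a None flag instead of silently dropping them.
        adverse = None if weather is None else int(weather["precip_mm"] >= threshold)
        merged.append({**row, "is_adverse": adverse})
    return merged

merged = attach_weather(zone_hours, daily_weather)
```

Keeping unmatched rows visible makes it easy to count how many zone-hours lack weather data before they are excluded from the test.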

## H4 - Spatial heterogeneity in demand

### Business question
Do different city zones exhibit significantly different average demand levels?

### Null hypothesis (H0)
All zones have the same mean ride demand per zone-hour.

### Alternative hypothesis (H1)
At least one zone has a different mean ride demand per zone-hour.

### Variable
- Target variable: demand
- Grouping variable: zone_id

### Planned test
One-way analysis of variance (ANOVA).
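To make the ANOVA computation concrete, the sketch below derives the F statistic from the between-group and within-group sums of squares, and also returns eta squared as an effect size. The function name, zone labels, and demand values are made up for illustration.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared

# Made-up zone-hour demand values for three hypothetical zones.
demand_by_zone = {
    "zone_a": [10, 12, 11, 13],
    "zone_b": [20, 22, 19, 21],
    "zone_c": [5, 7, 6, 6],
}
f_stat, df_b, df_w, eta_sq = one_way_anova(list(demand_by_zone.values()))
```

The p-value requires the F distribution, which the standard library does not provide; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.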

## H5 - Monthly demand differences

### Business question
Does average ride demand differ between months?

### Null hypothesis (H0)
Mean ride demand per zone-hour is equal across months.

### Alternative hypothesis (H1)
At least one month has a different mean ride demand per zone-hour.

### Variable
- Target variable: demand
- Grouping variable: month

### Planned test
One-way analysis of variance (ANOVA).

## Notes on assumptions

The proposed hypothesis tests assume:
- independence between zone-hour observations
- approximate normality of sample means due to large sample size
- comparable variance across groups

Given the very large sample size, even very small differences may reach statistical significance.
Effect sizes and confidence intervals will therefore be reported alongside p-values.
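For the two-group hypotheses, one way to report these quantities is sketched below: Cohen's d as the effect size and a normal-approximation confidence interval for the difference in means. The choice of Cohen's d, the function names, and the sample values are assumptions; other effect size measures may suit the final analysis better.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

def mean_diff_ci(a, b, level=0.95):
    """Normal-approximation confidence interval for mean(a) - mean(b)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z_crit * se, diff + z_crit * se

# Made-up demand values, for illustration only.
weekday = [12, 15, 11, 14, 13, 16, 12, 15]
weekend = [9, 11, 8, 10, 12, 9, 10, 11]
d = cohens_d(weekday, weekend)
low, high = mean_diff_ci(weekday, weekend)
```

Reporting the interval in rides per zone-hour keeps the result interpretable in business terms, independent of how small the p-value is.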