# Hotel Cancellation Prediction Project

## Dataset Management

### Current Dataset
- **Source**: Kaggle - Hotel Booking Demand Dataset
- **Local Path**: `data/hotel_bookings.csv`
- **Shape**: 119,390 rows × 32 columns (as printed by `df.shape` in the loading cell)

### For Future Kaggle Downloads

To download any Kaggle dataset directly into this project's `data/` folder, call the `download_kaggle_dataset` helper defined in the download cell:

```python
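# Example: download a different dataset
# (assumes download_kaggle_dataset is already defined and pandas is imported as pd)
data_path = download_kaggle_dataset("username/dataset-name")

# Then load one of the copied files ("filename.csv" is a placeholder)
df_new = pd.read_csv(f"{data_path}/filename.csv")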
```

### Benefits of This Approach
1. **Organized**: All datasets in one `data/` folder
2. **Reusable**: Function works for any Kaggle dataset
3. **Portable**: Dataset travels with your project
4. **Version Control**: Can track dataset changes if needed
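
One lightweight way to track dataset changes, as mentioned in the last point, is to record a checksum for each file in `data/` and compare it after every download. The sketch below is an assumption about how that could look, not part of the original notebook; `file_sha256` and `dataset_fingerprint` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large CSV files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_fingerprint(data_folder="data"):
    """Map each top-level file name in data_folder to its SHA-256 digest."""
    folder = Path(data_folder)
    return {
        entry.name: file_sha256(entry)
        for entry in sorted(folder.iterdir())
        if entry.is_file()
    }
```

Calling `dataset_fingerprint("data")` after a download and storing the result (for example in a small JSON file kept under version control) makes it possible to tell whether a later download changed any file.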


In [3]:
import kagglehub
import os
import shutil

def download_kaggle_dataset(dataset_name, project_root=None):
    """
    Download a Kaggle dataset and copy it to the project's data folder.
    
    Args:
        dataset_name (str): Kaggle dataset name in format 'username/dataset-name'
        project_root (str, optional): Path to project root. If None, uses the current working directory.
    
    Returns:
        str: Path to the local data folder
    """
    if project_root is None:
        project_root = os.getcwd()
    
    # Create data folder if it doesn't exist
    data_folder = os.path.join(project_root, 'data')
    os.makedirs(data_folder, exist_ok=True)
    
    # Download dataset from Kaggle
    print(f"Downloading dataset: {dataset_name}")
    kaggle_path = kagglehub.dataset_download(dataset_name)
    print(f"Kaggle download path: {kaggle_path}")
    
    # Copy top-level files to the local data folder (subdirectories are skipped)
    for file in os.listdir(kaggle_path):
        src = os.path.join(kaggle_path, file)
        dst = os.path.join(data_folder, file)
        if os.path.isfile(src):
            shutil.copy2(src, dst)
            print(f"Copied: {file}")
    
    print(f"Dataset files copied to: {data_folder}")
    return data_folder

# Download the hotel booking dataset
data_path = download_kaggle_dataset("jessemostipak/hotel-booking-demand")

Downloading dataset: jessemostipak/hotel-booking-demand
Kaggle download path: /Users/franciscoteixeirabarbosa/.cache/kagglehub/datasets/jessemostipak/hotel-booking-demand/versions/1
Copied: hotel_bookings.csv
Dataset files copied to: /Users/franciscoteixeirabarbosa/Dropbox/Random_scripts/predict_hotel_cancellations/data
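
The helper above copies only the files at the top level of the download folder, which is enough for this dataset because it consists of a single CSV. For a dataset that ships nested folders, a variant along the following lines could mirror the whole tree instead. This is a sketch under assumptions: `copy_dataset_tree` is a hypothetical name, and `dirs_exist_ok=True` requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_dataset_tree(source_dir, data_folder):
    """Copy everything under source_dir into data_folder, keeping subfolders.

    Returns the sorted relative paths of all files now present in data_folder.
    """
    target = Path(data_folder)
    # copytree recreates the directory structure; dirs_exist_ok lets it
    # merge into an existing data folder instead of raising an error.
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return sorted(
        path.relative_to(target).as_posix()
        for path in target.rglob("*")
        if path.is_file()
    )
```

Inside `download_kaggle_dataset`, the copy loop could then be replaced by a call such as `copy_dataset_tree(kaggle_path, data_folder)`.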


In [4]:
import pandas as pd

# Load the dataset from local data folder
df = pd.read_csv("data/hotel_bookings.csv")

print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
df.head()

Dataset shape: (119390, 32)
Columns: ['hotel', 'is_canceled', 'lead_time', 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month', 'stays_in_weekend_nights', 'stays_in_week_nights', 'adults', 'children', 'babies', 'meal', 'country', 'market_segment', 'distribution_channel', 'is_repeated_guest', 'previous_cancellations', 'previous_bookings_not_canceled', 'reserved_room_type', 'assigned_room_type', 'booking_changes', 'deposit_type', 'agent', 'company', 'days_in_waiting_list', 'customer_type', 'adr', 'required_car_parking_spaces', 'total_of_special_requests', 'reservation_status', 'reservation_status_date']


| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | NaN | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | 304.0 | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 |
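
The preview already shows missing `agent` and `company` values in the first rows, so a natural next check is to count missing values per column and look at the share of canceled bookings. The sketch below runs on a five-row sample rebuilt from the preview (only five of the 32 columns are included) so that it is self-contained; on the full data the same calls would be applied to `df`. The variable names are illustrative.

```python
import io

import pandas as pd

# Five rows taken from the preview above; empty fields are read as NaN.
sample_csv = io.StringIO(
    "hotel,is_canceled,lead_time,agent,company\n"
    "Resort Hotel,0,342,,\n"
    "Resort Hotel,0,737,,\n"
    "Resort Hotel,0,7,,\n"
    "Resort Hotel,0,13,304.0,\n"
    "Resort Hotel,0,14,240.0,\n"
)
sample = pd.read_csv(sample_csv)

# Number of missing values in each column.
missing_counts = sample.isna().sum()

# Fraction of rows with is_canceled == 1 (the prediction target).
cancel_rate = sample["is_canceled"].mean()

# Show only the columns that actually contain missing values.
print(missing_counts[missing_counts > 0])
print(f"Cancellation rate in sample: {cancel_rate:.2%}")
```

The figures from this sample describe only the first five rows and say nothing about the full dataset; replacing `sample` with `df` gives the real counts.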
