# Airbnb A/B Testing Project

# Project Aims
## Aims/Dashboard 1: Airbnb pricing analysis
*Analyze features of Airbnbs that are related to pricing*
### General pricing statistics:
- **Total listings:** *What is the total number of Airbnb listings in the dataset?*
- **Average daily price:** *What is the average Airbnb daily rate?*
- **Average service fee:** *What is the average service fee charged?*
- **Average total cost:** *What is the average total guest cost (price + service fee)?*
- **Price distribution:** *What are the median, quartiles, and range of prices?*
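The general statistics above can be computed in a few lines of pandas. This is a minimal sketch. It assumes the listings are in a DataFrame with the snake_case `price` and `service_fee` columns listed under Key Variables, stored either as numbers or as currency text such as `'$1,060'`. `parse_currency` and `pricing_summary` are hypothetical helper names.

```python
import pandas as pd


def parse_currency(series: pd.Series) -> pd.Series:
    """Convert values like '$1,060' or ' $96 ' to floats; unparseable values become NaN."""
    cleaned = series.astype(str).str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


def pricing_summary(df: pd.DataFrame) -> dict:
    """General pricing statistics (assumes `price` and `service_fee` columns)."""
    price = parse_currency(df["price"])
    fee = parse_currency(df["service_fee"])
    total = price + fee  # total guest cost per night: price + service fee
    return {
        "total_listings": len(df),
        "avg_price": price.mean(),
        "avg_service_fee": fee.mean(),
        "avg_total_cost": total.mean(),
        "median_price": price.median(),
        "q1_price": price.quantile(0.25),
        "q3_price": price.quantile(0.75),
        "min_price": price.min(),
        "max_price": price.max(),
    }
```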

### Geographic pricing:
#### Hypothesis: Manhattan properties command higher prices than other boroughs
- **Average price per neighborhood:** *Which neighborhood has the most expensive base rental rates?*
- **Price variance by neighborhood:** *Which neighborhoods have the highest price spreads?*
- **Service fees by property type/location:** *Do service fees vary systematically by property type or location?*
- **Average total price per neighborhood:** *Total cost heatmap: Where are the most/least expensive areas for guests?*
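The neighbourhood-level numbers can each come from a single groupby. This sketch assumes `price` and `service_fee` are already numeric and that the borough column is `neighbourhood_group` with the value `'Manhattan'`. The helper names are hypothetical, and standard deviation stands in for "price spread".

```python
import pandas as pd


def neighbourhood_price_table(df: pd.DataFrame) -> pd.DataFrame:
    """Per-neighbourhood pricing summary, most expensive base rate first."""
    return (
        df.assign(total_cost=df["price"] + df["service_fee"])
        .groupby("neighbourhood")
        .agg(
            listings=("price", "size"),
            avg_price=("price", "mean"),
            price_std=("price", "std"),  # price spread within the neighbourhood
            avg_service_fee=("service_fee", "mean"),
            avg_total_cost=("total_cost", "mean"),  # input for the total cost heatmap
        )
        .sort_values("avg_price", ascending=False)
    )


def manhattan_vs_rest(df: pd.DataFrame) -> pd.Series:
    """Mean price for Manhattan vs. all other boroughs combined."""
    label = df["neighbourhood_group"].eq("Manhattan").map(
        {True: "Manhattan", False: "Other boroughs"}
    )
    return df.groupby(label)["price"].mean()
```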

### Property features impact on pricing:
#### Hypothesis 1: Entire homes command a 40%+ premium over private rooms
#### Hypothesis 2: Newer properties (post-2010) have higher pricing power
- **Room type:** *How much more do entire homes cost vs. private rooms?*
- **Property age (construction year):** *Do newer properties command price premiums?*
- **Room type/Location interaction:** *How much more do entire homes cost vs. private rooms by location?*
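Expressing the room type premium as a percentage of the private-room mean maps directly onto the 40% threshold in Hypothesis 1. This sketch assumes numeric `price` and `construction_year`, plus the `room_type` labels `'Entire home/apt'` and `'Private room'` (Airbnb's usual labels, not confirmed against this dataset). Passing `by='neighbourhood_group'` gives the room type/location interaction.

```python
from typing import Optional

import pandas as pd

# Assumed room_type labels; adjust if the dataset uses different values.
ENTIRE, PRIVATE = "Entire home/apt", "Private room"


def entire_home_premium(df: pd.DataFrame, by: Optional[str] = None):
    """Percent premium of mean entire-home price over mean private-room price.

    by=None returns one number; by='neighbourhood_group' returns one premium
    per location (room type x location interaction).
    """
    if by is None:
        means = df.groupby("room_type")["price"].mean()
        return 100 * (means[ENTIRE] / means[PRIVATE] - 1)
    wide = df.groupby([by, "room_type"])["price"].mean().unstack("room_type")
    return 100 * (wide[ENTIRE] / wide[PRIVATE] - 1)  # NaN where a type is missing


def post_2010_price_gap(df: pd.DataFrame) -> float:
    """Mean price of properties built after 2010 minus mean price of the rest.

    Listings with a missing construction_year are excluded.
    """
    known = df.dropna(subset=["construction_year"])
    newer = known["construction_year"] > 2010
    return known.loc[newer, "price"].mean() - known.loc[~newer, "price"].mean()
```

A value of 40 or more from `entire_home_premium(df)` would be consistent with Hypothesis 1. This compares group means only and says nothing yet about statistical significance.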

### Host impact on pricing: 
#### Hypothesis 1: Professional hosts (5+ properties) price higher than casual hosts (1 property)
#### Hypothesis 2: Verified hosts can charge premium prices
- **Host portfolio size effect:** *Do multi-property hosts charge more per listing?*
    - Method: A/B Test: Professional vs. Casual Host Performance
        - Groups:
            - Control: calculated_host_listings_count = 1 (Casual hosts)
            - Treatment: calculated_host_listings_count >= 5 (Professional hosts)
        - Primary metric: Average daily price difference
        - Secondary metric: Total guest cost (price + service fee)
- **Verification premium:** *What price advantage does host verification provide?*
    - Method: A/B Test: Host Verification Premium Test
        - Groups:
            - Control: host_identity_verified = 'unconfirmed'
            - Treatment: host_identity_verified = 'verified'
        - Primary metric: Average daily price difference
        - Secondary metric: Total guest cost (price + service fee)
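Both host tests come down to labelling each listing with an arm and comparing means. This sketch uses the group definitions listed above. Hosts with 2-4 listings fall in neither arm and are excluded. `label_host_type` and `compare_groups` are hypothetical helpers, and `price`/`service_fee` are assumed to be numeric.

```python
import numpy as np
import pandas as pd


def label_host_type(listings_count: pd.Series) -> pd.Series:
    """'casual' for exactly 1 listing (control), 'professional' for 5+ (treatment).

    Hosts with 2-4 listings belong to neither arm and are left as NaN.
    """
    host_type = pd.Series(np.nan, index=listings_count.index, dtype=object)
    host_type.loc[listings_count == 1] = "casual"
    host_type.loc[listings_count >= 5] = "professional"
    return host_type


def compare_groups(df, group_col, control, treatment, metric="price") -> dict:
    """Mean of `metric` in each arm and the treatment-minus-control difference."""
    means = df.groupby(group_col)[metric].mean()  # NaN group labels are dropped
    return {
        "control_mean": means[control],
        "treatment_mean": means[treatment],
        "difference": means[treatment] - means[control],
    }
```

The verification test can reuse the same helper directly: `compare_groups(df, "host_identity_verified", "unconfirmed", "verified")`.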
        
### Policy impact on pricing:
#### Hypothesis 1: Instant bookable properties command a premium for convenience
#### Hypothesis 2: Stricter cancellation policies allow for higher pricing
- **Instant booking availability:** *Do instant bookable properties charge more?*
    - Method: A/B Test: Instant Booking Impact Analysis
        - Groups:
            - Control: instant_bookable = FALSE
            - Treatment: instant_bookable = TRUE
        - Primary metric: Average daily price difference
        - Secondary metric: Total guest cost
- **Cancellation policy strictness:** *How do flexible vs. strict policies affect pricing?*
    - Stricter cancellation policies shift booking risk from the host to the guest. Hosts who secure this protection face less revenue uncertainty and may also be able to charge a premium.
    - Method: A/B Test: Cancellation Policy Strategy Test
        - Groups:
            - Control: cancellation_policy = 'strict'
            - Treatment A: cancellation_policy = 'moderate'
            - Treatment B: cancellation_policy = 'flexible'
        - Primary metric: Average daily price by policy type
- **Minimum stay requirements:** *Is there an optimal minimum stay length for pricing?*
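The cancellation test has two treatment arms, so each arm is compared separately against the `'strict'` control. This sketch assumes numeric `price` and the lowercase policy labels shown in the groups. The minimum-stay buckets (1, 2-3, 4-7, 8-29, 30+ nights) are a hypothetical choice, not part of the project plan.

```python
import numpy as np
import pandas as pd


def policy_price_lift(df: pd.DataFrame, control: str = "strict") -> pd.DataFrame:
    """Mean price per cancellation policy and its gap to the control arm."""
    means = df.groupby("cancellation_policy")["price"].mean()
    return pd.DataFrame({
        "avg_price": means,
        "diff_vs_control": means - means[control],
        "pct_vs_control": 100 * (means / means[control] - 1),
    })


# Hypothetical minimum-stay buckets: (0,1], (1,3], (3,7], (7,29], (29,inf) nights.
STAY_BINS = [0, 1, 3, 7, 29, np.inf]
STAY_LABELS = ["1", "2-3", "4-7", "8-29", "30+"]


def price_by_min_stay(df: pd.DataFrame) -> pd.Series:
    """Mean price per minimum-stay bucket; values <= 0 fall outside every bucket."""
    buckets = pd.cut(df["minimum_nights"], bins=STAY_BINS, labels=STAY_LABELS)
    return df.groupby(buckets, observed=True)["price"].mean()
```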
   
   
## Aims/Dashboard 2: Airbnb reviews analysis
*Analyze features of Airbnbs that are related to reviews*
### General reviews statistics:
- **Total listings:** *What is the total number of Airbnb listings in the dataset?*
- **Total reviews:** *What is the total number of Airbnb reviews?*
- **Total active listings:** *How many listings have received a review in the last 3 months?*
- **Percentage active listings:** *What percentage of listings have received a review in the last 3 months?*
- **Average review rating:** *What is the average Airbnb review rating?* 
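A minimal sketch of these review statistics. It assumes a `last_review` date column (not listed under Key Variables) and numeric `number_of_reviews` and `review_rate_number`. "Active" is read as a last review within 3 months of a reference date. Because the snapshot date is not stated, the reference defaults to the latest `last_review` in the data, which is an assumption.

```python
import pandas as pd


def review_summary(df: pd.DataFrame, as_of=None) -> dict:
    """General review statistics; 'active' = last review within 3 months of `as_of`."""
    last = pd.to_datetime(df["last_review"], errors="coerce")
    as_of = last.max() if as_of is None else pd.Timestamp(as_of)
    active = last >= as_of - pd.DateOffset(months=3)  # NaT compares as False
    return {
        "total_listings": len(df),
        "total_reviews": int(df["number_of_reviews"].sum()),
        "active_listings": int(active.sum()),
        "pct_active": 100 * active.mean(),
        "avg_rating": df["review_rate_number"].mean(),
    }
```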

### Key success metrics:
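- **Number of reviews:** `number_of_reviews`, used as a proxy for booking activity
- **Review rating number:** `review_rate_number`, the listing's guest rating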

### Host impact on reviews: 
#### Hypothesis 1: Professional hosts receive more positive review activity
#### Hypothesis 2: Verified hosts receive higher review ratings
- **Host portfolio size effect:** *Do multi-property hosts maintain review quality across listings?*
    - Method: A/B Test: Professional vs. Casual Host Performance
        - Groups:
            - Control: calculated_host_listings_count = 1 (Casual hosts)
            - Treatment: calculated_host_listings_count >= 5 (Professional hosts)
        - Primary metric: Average review rating difference
        - Secondary metric: Number of reviews
- **Host verification impact:** *Does verification status affect guest review rating scores?*
    - Method: A/B Test: Host Verification Premium Test
        - Groups:
            - Control: host_identity_verified = 'unconfirmed'
            - Treatment: host_identity_verified = 'verified'
        - Primary metric: Average review rating difference
        - Secondary metric: Number of reviews

### Property features impact on reviews:
#### Hypothesis 1: Entire homes receive higher satisfaction ratings than shared spaces
#### Hypothesis 2: Newer properties receive more favorable reviews
- **Room type:** *Which room types generate the most positive guest feedback?*
- **Location (neighborhood):** *Do certain neighborhoods consistently receive better reviews?*
- **Property age (construction year):** *Are newer properties rated more favorably?*

### Policy impact on reviews:
#### Hypothesis 1: Instant booking reduces friction and improves guest satisfaction
#### Hypothesis 2: Flexible cancellation policies lead to better reviews
- **Instant booking availability:** *Does instant booking correlate with better review performance?*
    - Method: A/B Test: Instant Booking Impact Analysis
        - Groups:
            - Control: instant_bookable = FALSE
            - Treatment: instant_bookable = TRUE
        - Primary metric: Average review rating difference
        - Secondary metric: Number of reviews
- **Cancellation policy strictness:** *How do cancellation policies affect guest satisfaction?*
    - Method: A/B Test: Cancellation Policy Strategy Test
        - Groups:
            - Control: cancellation_policy = 'strict'
            - Treatment A: cancellation_policy = 'moderate'
            - Treatment B: cancellation_policy = 'flexible'
        - Primary metric: Average review rating difference by policy
        - Secondary metric: Number of reviews
- **Minimum stay requirements:** *What minimum stay requirements generate the best reviews?*


## Cross-Dashboard Analysis
### Price-Review Correlation
#### Hypothesis: Higher-priced properties receive better review ratings due to quality expectations
- **Method:** Correlation analysis between price tiers and review performance
- **Price Tiers:** Budget (under $100), Mid-range ($100 to under $200), Premium ($200 and above)
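A sketch of the tier analysis, assuming numeric `price` and `review_rate_number`. It bins prices with `pd.cut` using half-open intervals, so $100 and $200 each land in exactly one tier. It then reports the mean rating per tier and a Spearman rank correlation between tier order and rating. Spearman is one reasonable choice for an ordinal predictor, not necessarily the method the project will settle on. The Series `corr` call with `method="spearman"` needs SciPy installed.

```python
import numpy as np
import pandas as pd

TIER_LABELS = ["Budget", "Mid-range", "Premium"]


def price_tier(price: pd.Series) -> pd.Series:
    """Budget: < $100, Mid-range: $100 to < $200, Premium: >= $200 (half-open bins)."""
    return pd.cut(price, bins=[0, 100, 200, np.inf], right=False, labels=TIER_LABELS)


def tier_review_analysis(df: pd.DataFrame):
    """Rating count/mean per tier and Spearman correlation between tier order and rating."""
    tiers = price_tier(df["price"])
    by_tier = df.groupby(tiers, observed=True)["review_rate_number"].agg(["count", "mean"])
    valid = tiers.notna() & df["review_rate_number"].notna()
    tier_order = tiers.cat.codes[valid]  # 0 = Budget, 1 = Mid-range, 2 = Premium
    rho = tier_order.corr(df.loc[valid, "review_rate_number"], method="spearman")
    return by_tier, rho
```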

## Study Limitations
- **Temporal:** Snapshot data, no longitudinal trends
- **Causality:** Observational study, correlation not causation
- **Selection Bias:** Only includes listed properties, not delisted ones
- **Proxy Metrics:** Reviews as proxy for bookings/satisfaction

# Study Methods

## Dataset Overview
- **Source:** Airbnb Open Data (NYC)
- **Sample Size:** [Insert total number of listings]
- **Time Period:** [Insert data collection period]
- **Geographic Scope:** New York City (5 boroughs)

## Statistical Approach
### A/B Testing Framework
- **Significance Level:** α = 0.05
- **Minimum Sample Size:** 30 per group (to be refined with a power analysis)
- **Statistical Tests:** Two-sample t-tests for continuous variables, chi-square tests for categorical variables
- **Effect Size Calculation:** Cohen's d for practical significance
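The framework can be sketched with SciPy and the standard library. Three choices here are assumptions rather than the project's stated method. First, the sketch uses Welch's variant of the two-sample t-test (`equal_var=False`), which does not assume equal group variances. Second, Cohen's d uses the pooled standard deviation. Third, the sample-size helper uses the normal approximation $n = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$ with a default power of 0.8, which slightly underestimates the exact t-based answer.

```python
import math
from statistics import NormalDist

import numpy as np
from scipy import stats


def cohens_d(treatment, control) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    n1, n2 = len(t), len(c)
    pooled_var = ((n1 - 1) * t.var(ddof=1) + (n2 - 1) * c.var(ddof=1)) / (n1 + n2 - 2)
    return float((t.mean() - c.mean()) / math.sqrt(pooled_var))


def ab_test(treatment, control, alpha: float = 0.05) -> dict:
    """Welch two-sample t-test plus Cohen's d; NaNs are dropped from each group."""
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    t, c = t[~np.isnan(t)], c[~np.isnan(c)]
    res = stats.ttest_ind(t, c, equal_var=False)
    return {
        "n_treatment": len(t),
        "n_control": len(c),
        "diff": float(t.mean() - c.mean()),
        "t_stat": float(res.statistic),
        "p_value": float(res.pvalue),
        "significant": bool(res.pvalue < alpha),
        "cohens_d": cohens_d(t, c),
    }


def sample_size_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group for a two-sided test of effect size d."""
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)
```

For a medium effect (d = 0.5), `sample_size_per_group(0.5)` gives 63 per group, which is well above the 30-per-group floor.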

### Data Cleaning Criteria
- Exclude listings with missing price data
- Remove extreme outliers (prices >$1000 or <$10 per night)
- Filter for listings with at least 1 review for review analysis
- Include only listings active within the last 12 months
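The cleaning criteria can be sketched as one function for the shared filters, plus a separate subset for the review analysis. The sketch makes three assumptions. A `last_review` column exists. "Active" means a last review within 12 months of a reference date, which defaults to the latest review in the raw data. `number_of_reviews` counts reviews. If activity is measured by `last_review`, the 12-month filter already removes never-reviewed listings, so the at-least-one-review filter only matters if activity is defined some other way.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame, as_of=None) -> pd.DataFrame:
    """Apply the shared cleaning criteria (price parsing, outliers, 12-month activity)."""
    last_all = pd.to_datetime(df["last_review"], errors="coerce")
    as_of = last_all.max() if as_of is None else pd.Timestamp(as_of)

    out = df.copy()
    out["price"] = pd.to_numeric(
        out["price"].astype(str).str.replace(r"[$,\s]", "", regex=True), errors="coerce"
    )
    out = out.dropna(subset=["price"])          # exclude missing price data
    out = out[out["price"].between(10, 1000)]   # drop < $10 and > $1000 per night
    last = pd.to_datetime(out["last_review"], errors="coerce")
    return out[last >= as_of - pd.DateOffset(months=12)]  # active in the last 12 months


def review_subset(clean: pd.DataFrame) -> pd.DataFrame:
    """Listings with at least one review, for the review analysis only."""
    return clean[clean["number_of_reviews"] >= 1]
```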

## Key Variables
### Dependent Variables (Outcomes)
- **Pricing Analysis:** `price`, `service_fee`, total cost
- **Review Analysis:** `review_rate_number`, `number_of_reviews`

### Independent Variables (Predictors)
- **Geographic:** `neighbourhood_group`, `neighbourhood`
- **Property:** `room_type`, `construction_year`
- **Host:** `calculated_host_listings_count`, `host_identity_verified`
- **Policy:** `instant_bookable`, `cancellation_policy`, `minimum_nights`

# Code

## Set-up

### Load in packages

In [None]:
import pandas as pd
import numpy as np

### Load in data

In [None]:
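# Assumed filename for the NYC Airbnb Open Data export; adjust to the local path
airbnb_df = pd.read_csv('Airbnb_Open_Data.csv')

In [None]:
# view the first 10 rows
airbnb_df.head(10)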

Variable definitions are documented in [this spreadsheet](https://docs.google.com/spreadsheets/d/1b_dvmyhb_kAJhUmv81rAxl4KcXn0Pymz/edit?gid=1967362979#gid=1967362979).

## Data Quality Checks
- Verify price formatting consistency ($XXX vs XXX)
- Check for duplicate listings
- Validate geographic coordinates within NYC bounds
- Confirm review dates are logical (not future dates)
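These checks can be sketched as a single report that counts failing rows. The sketch assumes the columns `id`, `price`, `lat`, `long`, and `last_review`. The NYC bounding box is approximate and hand-picked, not an official boundary. Missing coordinates count as out of bounds, and `as_of` defaults to today's date.

```python
import pandas as pd

# Approximate NYC bounding box (hand-picked assumption, not an official boundary).
NYC_LAT = (40.49, 40.92)
NYC_LONG = (-74.26, -73.69)


def quality_report(df: pd.DataFrame, as_of=None) -> dict:
    """Counts of rows failing each data quality check."""
    price_text = df["price"].dropna().astype(str).str.strip()
    has_dollar = price_text.str.startswith("$")
    in_nyc = df["lat"].between(*NYC_LAT) & df["long"].between(*NYC_LONG)
    last = pd.to_datetime(df["last_review"], errors="coerce")
    as_of = pd.Timestamp.today() if as_of is None else pd.Timestamp(as_of)
    return {
        "price_with_dollar": int(has_dollar.sum()),
        "price_without_dollar": int((~has_dollar).sum()),
        "duplicate_ids": int(df["id"].duplicated().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "outside_nyc_bounds": int((~in_nyc).sum()),  # missing coordinates count as outside
        "future_review_dates": int((last > as_of).sum()),
    }
```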

- **Instant booking availability:** *Do instant bookable properties charge more?*
    - Method: A/B Test: Instant Booking Impact Analysis
    - Groups:
        - Control: instant_bookable = FALSE
        - Treatment: instant_bookable = TRUE
    - **Primary Metric:** Average daily price difference
    - **Secondary Metrics:** Price variance, market positioning
    - **Expected Effect Size:** 5-15% price difference

In [None]:
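-- A/B Test Analysis: Instant Booking
-- Strip '$', ',' and surrounding spaces from the price text before casting;
-- an explicit DECIMAL precision avoids engines that default to zero decimal places.
SELECT 
    instant_bookable AS test_group,
    COUNT(*) AS sample_size,
    AVG(CAST(TRIM(REPLACE(REPLACE(price, '$', ''), ',', '')) AS DECIMAL(10, 2))) AS avg_price,
    AVG(reviews_per_month) AS avg_reviews_monthly,
    AVG(review_rate_number) AS avg_rating
FROM airbnb_data 
WHERE instant_bookable IS NOT NULL 
GROUP BY instant_bookable;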
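To check the SQL result from inside the notebook, the same summary can be sketched in pandas. The sketch assumes the columns match the SQL query (`price` as currency text, `reviews_per_month`, `review_rate_number`). `instant_booking_summary` is a hypothetical name.

```python
import pandas as pd


def instant_booking_summary(df: pd.DataFrame) -> pd.DataFrame:
    """pandas version of the instant booking query: one row per instant_bookable value."""
    price = pd.to_numeric(
        df["price"].astype(str).str.replace(r"[$,\s]", "", regex=True), errors="coerce"
    )
    return (
        df.assign(price_num=price)
        .dropna(subset=["instant_bookable"])      # WHERE instant_bookable IS NOT NULL
        .groupby("instant_bookable")
        .agg(
            sample_size=("price_num", "size"),    # COUNT(*)
            avg_price=("price_num", "mean"),
            avg_reviews_monthly=("reviews_per_month", "mean"),
            avg_rating=("review_rate_number", "mean"),
        )
    )
```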

### Instant booking premium analysis
*Does instant booking allow for higher pricing?*

**Business insight:** Do instant-bookable properties command higher prices because they offer convenience, or do budget properties use instant booking to compete?
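One way to separate the two explanations is to look within price tiers. Suppose instant-bookable listings cluster in the Budget tier but show no premium inside each tier. That pattern fits the "budget competition" story better than the "convenience premium" story. This sketch assumes numeric `price` and a boolean `instant_bookable`, and uses the Budget/Mid-range/Premium cut-offs from the cross-dashboard analysis.

```python
import numpy as np
import pandas as pd


def instant_booking_by_tier(df: pd.DataFrame):
    """Share of instant-bookable listings per price tier, and mean price by tier x instant_bookable."""
    tiers = pd.cut(df["price"], bins=[0, 100, 200, np.inf], right=False,
                   labels=["Budget", "Mid-range", "Premium"])
    # Row-normalised: fraction of each tier that is / is not instant bookable.
    share = pd.crosstab(tiers, df["instant_bookable"], normalize="index")
    # Within-tier price gap between instant-bookable and other listings.
    within_tier = (
        df.groupby([tiers, df["instant_bookable"]], observed=True)["price"].mean().unstack()
    )
    return share, within_tier
```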