In [None]:
'''
INFO_511_ Application Exercise 02: Tucson Housing
Author: Todd Adams
Date: 04/06/2024
Description: We are answering questions related to the Tucson Housing dataset.
Note: I used VS Code and ChatGPT to help me write this code.
'''


**Exercise 1**

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Dataset
tucson_housing = pd.read_csv("data/tucson_housing.csv")

garage_types = ["Single Family", "Townhouse"]

# Create a new column 'garage' based on the 'type' column
tucson_housing['garage'] = tucson_housing['type'].isin(garage_types).map({True: 'Garage', False: 'No garage'})

In [None]:
# Plot: histogram of house prices faceted by garage status
sns.set(style="whitegrid")

g = sns.displot(
    data=tucson_housing,
    x="price",
    col="garage",
    hue="garage",
    bins=30,                # You can adjust bin count if needed
    palette="Set2",
    element="step",         # Try "step" or "bars"
    common_bins=True,       # Shared bins across facets
    height=5,
    aspect=1.2
)

# Add labels
g.set_axis_labels("Price ($)", "Count")
g.set_titles("{col_name}")
g.fig.suptitle("Distribution of House Prices in Tucson by Garage Type", fontsize=16, y=1.05)
plt.tight_layout()
plt.show()


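Houses classified as having a garage (here, the Single Family and Townhouse types) tend to have higher prices, as shown by the rightward shift of their distribution.  
Houses classified as having no garage cluster more often in the lower price ranges.  
This suggests that having a garage is associated with higher home prices in Tucson, though because garage status is inferred from housing type, the difference may reflect housing type as much as the garage itself.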
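
The visual comparison above could be backed by a simple numeric summary. The following is a minimal sketch, assuming the `garage` and `price` columns created earlier; the helper name `price_summary_by_garage` and the toy values are hypothetical and are not taken from the Tucson dataset.

```python
import pandas as pd


def price_summary_by_garage(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, median, and mean price for each garage group."""
    return (
        df.groupby("garage")["price"]
        .agg(["count", "median", "mean"])
        .sort_values("median", ascending=False)
    )


# Toy data for illustration only; these are NOT values from the Tucson dataset.
toy = pd.DataFrame({
    "garage": ["Garage", "Garage", "Garage", "No garage", "No garage"],
    "price": [300_000, 350_000, 400_000, 150_000, 250_000],
})

summary = price_summary_by_garage(toy)
print(summary)
```

On the real data, `price_summary_by_garage(tucson_housing)` would show whether the median price of the "Garage" group actually exceeds that of the "No garage" group. The median is used alongside the mean because house prices are usually right-skewed.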

**Exercise 2**

In [None]:
# Scatter plot of price vs area, colored by year_built
import statsmodels.api as sm  # provides the LOWESS smoother used below

plt.figure(figsize=(10, 6))

sns.scatterplot(
    data=tucson_housing,
    x="area",
    y="price",
    hue="year_built",
    palette="viridis",
    alpha=0.6
)

# Add LOWESS smooth line (nonparametric regression)
lowess = sm.nonparametric.lowess
smoothed = lowess(endog=tucson_housing['price'], exog=tucson_housing['area'])

# Plot the LOWESS curve
plt.plot(smoothed[:, 0], smoothed[:, 1], color="red", linewidth=2, label="LOWESS fit")

# Labels and title
plt.title("Relationship between House Area and Price\nConditioned by Year Built")
plt.xlabel("Area (sq ft)")
plt.ylabel("Price ($)")
plt.legend(title="Year Built", loc="upper left", bbox_to_anchor=(1, 1))
plt.tight_layout()
plt.show()


**Claim 1: Larger houses are priced higher**  
    The scatter plot shows a clear upward trend: as area increases, price tends to increase.  
    The LOWESS curve reinforces this trend, especially for houses under ~2500 sq ft.  
    This supports the idea that larger houses are generally more expensive.
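
The upward trend described in Claim 1 could also be quantified with a rank correlation, which does not assume the relationship is linear. This is a minimal sketch, assuming `area` and `price` columns as in the plot; the function name `rank_correlation` and the toy values are hypothetical. It computes Spearman's correlation as the Pearson correlation of the ranks, using only pandas.

```python
import pandas as pd


def rank_correlation(df: pd.DataFrame, x: str, y: str) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    clean = df[[x, y]].dropna()  # drop rows where either value is missing
    return clean[x].rank().corr(clean[y].rank())


# Toy data for illustration only; these are NOT values from the Tucson dataset.
toy = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "price": [150_000, 200_000, 260_000, 400_000, 390_000],
})

rho = rank_correlation(toy, "area", "price")
print(f"Spearman rho: {rho:.2f}")
```

A value near 1 on the real data (`rank_correlation(tucson_housing, "area", "price")`) would be consistent with the claim that larger houses tend to be priced higher.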

**Claim 2: Newer houses are priced higher**  
    The hue gradient shows that many higher-priced homes have lighter colors (in the viridis palette, lighter yellow-green shades indicate more recent years, while darker purple shades indicate older ones).  
    However, there are also some expensive older homes, suggesting that while newer houses often cost more, this is not universally true.

**Claim 3: Bigger and more expensive houses tend to be newer ones than smaller and cheaper ones**  
    There’s a cluster of large, high-priced houses in the upper-right with more recent construction dates.  
    Conversely, smaller, cheaper homes appear more frequently with older construction years.  
    This pattern supports the claim, though with some exceptions.
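
One way to check Claims 2 and 3 beyond reading colors off the scatter plot is to bin `year_built` into eras and compare typical area and price per era. This is a minimal sketch, assuming `year_built`, `area`, and `price` columns; the function name `summarize_by_era`, the bin edges, the era labels, and the toy values are all hypothetical choices for illustration.

```python
import pandas as pd


def summarize_by_era(df: pd.DataFrame, bins: list, labels: list) -> pd.DataFrame:
    """Median area and price for each construction era."""
    # pd.cut uses right-inclusive intervals by default, e.g. (1900, 1980]
    era = pd.cut(df["year_built"], bins=bins, labels=labels).rename("era")
    return df.groupby(era, observed=True)[["area", "price"]].median()


# Toy data for illustration only; these are NOT values from the Tucson dataset.
toy = pd.DataFrame({
    "year_built": [1965, 1975, 1990, 1995, 2010, 2018],
    "area": [1000, 1200, 1500, 1700, 2400, 2800],
    "price": [150_000, 170_000, 220_000, 260_000, 400_000, 480_000],
})

# Hypothetical era boundaries; adjust them to the range of the real data.
by_era = summarize_by_era(
    toy,
    bins=[1900, 1980, 2000, 2025],
    labels=["1980 or earlier", "1981-2000", "2001 or later"],
)
print(by_era)
```

If median area and median price both rise from the oldest to the newest era on the real data, that would support Claims 2 and 3. A flat or mixed pattern would suggest the exceptions noted above are more than occasional.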