# Initial Project Report

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# add the util_ directory to the import path so its modules can be imported
import sys
sys.path.append("./util_")

# # wrangle and eda files
# import acquire_

# import prepare_
# import explore_
# import hyp_test_
# import final_visuals_
# import model_

## Goal

- Predict the tax-assessed values of single-family properties.

- Find the key drivers of property value for single-family properties.

    - Why do some properties have a much higher value than others when they are located so close to each other?
    - Why are some properties valued so differently from others when they have nearly the same physical attributes but only differ in location?
    - Is having 1 bathroom worse for property value than having 2 bedrooms?
 

## Acquire

I am using the Zillow property data from the Codeup database.

- Query the following columns:
    - `bedroomcnt, bathroomcnt, calculatedfinishedsquarefeet, taxvaluedollarcnt, yearbuilt, taxamount, fips`

- 2,152,863 rows and 7 columns.
- 7 numeric and 0 object columns.
- 22,778 total nulls (about 1% of the data).
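
The summary figures above (shape, dtype counts, null count) can be reproduced with a few pandas calls. The sketch below is a minimal illustration on a small made-up frame, not the actual query result: the values in `df` are hypothetical, and only the column names come from the report.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the queried table; the values are made up.
df = pd.DataFrame({
    "bedroomcnt": [3.0, 4.0, np.nan, 2.0],
    "bathroomcnt": [2.0, 2.5, 1.0, np.nan],
    "calculatedfinishedsquarefeet": [1500.0, 2200.0, 900.0, 1100.0],
    "taxvaluedollarcnt": [350000.0, 520000.0, np.nan, 210000.0],
    "yearbuilt": [1955.0, 1978.0, 1940.0, 1962.0],
    "taxamount": [4200.0, 6100.0, 2500.0, 2900.0],
    "fips": [6037.0, 6059.0, 6111.0, 6037.0],
})

# Shape of the table.
n_rows, n_cols = df.shape

# Count numeric vs. object (string) columns.
n_numeric = df.select_dtypes(include="number").shape[1]
n_object = df.select_dtypes(include="object").shape[1]

# Missing cells in total, and rows that contain at least one missing cell.
total_nulls = int(df.isna().sum().sum())
rows_with_nulls = int(df.isna().any(axis=1).sum())

print(f"{n_rows} rows and {n_cols} columns")
print(f"{n_numeric} numeric and {n_object} object columns")
print(f"{total_nulls} null cells in {rows_with_nulls} rows")
```

Here `total_nulls` counts missing cells while `rows_with_nulls` counts affected rows; the two can differ when a row has more than one missing value.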

## Prepare

- Remove all nulls (1% of the data).
- Remove duplicated rows.
- Convert data types from float to int (`bedrooms`, `bathrooms`).
- Remove outliers.
- Replace the FIPS codes with county names and encode the `county` column.
- Split the data into train, validate, and test sets (`60/20/20` split).
- Scale the numeric categorical and continuous variables, and keep a copy of the original (unscaled) data frame.
    - `bedrooms, bathrooms, sqr_feet, year_built, tax_amount`
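
The preparation steps above could look roughly like the following. This is a sketch on synthetic data, not the project's `prepare_` module. It makes several assumptions the report does not state: the 1.5 × IQR rule for outliers, min-max scaling fitted on the train split only, one-hot encoding for `county`, and Ventura as the third county (standard FIPS code 6111; the report names only Los Angeles and Orange). The helper names `drop_iqr_outliers` and `min_max_scale` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; column names follow the data dictionary.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "bathrooms": rng.integers(1, 4, n).astype(float),
    "sqr_feet": rng.normal(1800, 400, n).round(),
    "tax_value": rng.normal(400_000, 90_000, n).round(),
    "fips": rng.choice([6037.0, 6059.0, 6111.0], n),
})
df.loc[0, "sqr_feet"] = np.nan                          # inject a null
df.loc[1, "sqr_feet"] = 50_000                          # inject an obvious outlier
df = pd.concat([df, df.iloc[[2]]], ignore_index=True)   # inject a duplicate row

# 1. Drop rows with nulls, then drop duplicated rows.
df = df.dropna().drop_duplicates()

# 2. Cast count columns from float to int.
#    (Casting would truncate half-bath values such as 5.5 if they were present.)
df[["bedrooms", "bathrooms"]] = df[["bedrooms", "bathrooms"]].astype(int)

# 3. Remove outliers; the 1.5 * IQR rule is an assumption, not the report's stated method.
def drop_iqr_outliers(frame, column, k=1.5):
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return frame[frame[column].between(q1 - k * iqr, q3 + k * iqr)]

df = drop_iqr_outliers(df, "sqr_feet")

# 4. Replace FIPS codes with county names, then one-hot encode the county column.
fips_to_county = {6037: "Los Angeles", 6059: "Orange", 6111: "Ventura"}
df["county"] = df["fips"].astype(int).map(fips_to_county)
df = pd.get_dummies(df.drop(columns="fips"), columns=["county"])

# 5. Shuffle once, then slice into 60/20/20 train/validate/test.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
n_train = int(0.6 * len(shuffled))
n_validate = int(0.2 * len(shuffled))
train = shuffled.iloc[:n_train]
validate = shuffled.iloc[n_train:n_train + n_validate]
test = shuffled.iloc[n_train + n_validate:]

# 6. Min-max scale using statistics from train only; the originals stay untouched.
scale_cols = ["bedrooms", "bathrooms", "sqr_feet"]
col_min = train[scale_cols].min()
col_range = train[scale_cols].max() - col_min

def min_max_scale(frame):
    scaled = frame.copy()
    scaled[scale_cols] = (frame[scale_cols] - col_min) / col_range
    return scaled

train_scaled = min_max_scale(train)
validate_scaled = min_max_scale(validate)
test_scaled = min_max_scale(test)
```

Fitting the scaler on `train` alone keeps information from the validate and test splits out of the model inputs, which is why `col_min` and `col_range` are computed before the other two splits are transformed.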

## Data Dictionary

| Column Name | Description |
| ----------- | ----------- |
| bedrooms | The number of bedrooms in the property. Bedrooms refer to individual rooms used primarily for sleeping and are typically found in residential properties. |
| bathrooms | The number of bathrooms in the property. Bathrooms refer to rooms containing a toilet, sink, and typically a bathtub or shower, used for personal hygiene. |
| sqr_feet | The total square footage of the property. Square footage is a measurement of the area covered by the property, indicating its size or living space. It is often used to estimate the property's value or to determine the price per square foot. |
| tax_value | The assessed value of the property for tax purposes. Tax value represents the estimated worth of the property as determined by the local tax authority. It is used to calculate property taxes. |
| year_built | The year in which the property was constructed or built. This indicates the age of the property and can be useful in assessing its condition or historical significance. |
| tax_amount | The amount of tax owed on the property. Tax amount refers to the actual dollar amount that needs to be paid in property taxes based on the assessed tax value and local tax rates. |
| county | The county where the property is located. County refers to a specific geographic region or administrative division within a state or country. It helps identify the property's location within a broader jurisdiction. |


## Explore

**Univariate Statistics**

- `bathrooms` and `bedrooms`: roughly normal, with some outliers.

- `county`: three categories, with Los Angeles having the largest proportion.
- `sqr_feet`: positive (right) skew, with outliers starting at about 3,500 square feet.
- `tax_amount`: bimodal distribution with two peaks (modes); outliers start at about 12,000 dollars.
- `tax_value`: bimodal distribution with two peaks (modes); outliers start at about 100,000 dollars. (This may be due to the `0` values in bedrooms and bathrooms.)
- `year_built`: roughly normal with some outliers; its peak is around 1955.

**Bivariate Statistics - Categorical**

- `bathrooms_vs_tax_value`: `0` bathrooms has the lowest average tax value, while `5.5` has the highest. Properties with `5` and `6` bathrooms have the highest tax values.
- `bedrooms_vs_tax_value`: `0` bedrooms has the lowest average tax value, while `5` has the highest. Properties with `5`, `6`, and `7` bedrooms have the highest tax values.
- `county_vs_tax_value`: Orange County has a slightly higher average than the other counties, but not by much; outliers might be affecting these averages. Orange County also has the highest tax value.

**Bivariate Statistics - Continuous**

- `sqr_feet_vs_tax_value`: there appears to be a strong positive linear relationship, but it loses strength as square footage increases. The relationship looks strongest around 1,500 square feet.
- `tax_amount_vs_tax_value`: there appears to be a moderate positive linear relationship, but it loses strength as the tax amount increases. The relationship looks strongest around 2,500 dollars.
- `year_built_vs_tax_value`: there appears to be a positive linear relationship. It looks strongest around 1950 but contains multiple small peaks.
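
The visual impressions of "strong" or "moderate" linear relationships can be checked numerically with Pearson's correlation coefficient. The sketch below computes it in plain Python so the arithmetic is visible; the `sqr_feet` and `tax_value` numbers are hypothetical and are not taken from the project data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: tax value rises with square footage, but flattens out.
sqr_feet = [900, 1200, 1500, 1800, 2400, 3000]
tax_value = [150_000, 210_000, 260_000, 300_000, 360_000, 400_000]

r = pearson_r(sqr_feet, tax_value)
print(f"Pearson r = {r:.3f}")
```

A value near 1 indicates a strong positive linear relationship. With a pandas data frame, `df["sqr_feet"].corr(df["tax_value"])` gives the same statistic.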

**Multivariate Statistics**

- `bathrooms_and_q`

### Stats Test

## Modeling

**RMSE for Tweedie Regressor (train and validation)**

- RMSE Training/In-Sample: 236386.03523479012
- RMSE Validation/Out-of-Sample: 236127.92329975966
- RMSE Difference: -258.11193503046525
- R2 Validation: 0.32762366310882374
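
For reference, the metrics reported above are computed as follows. This is a minimal sketch in plain Python: it does not refit the Tweedie regressor, the `actual` and `predicted` lists are hypothetical, and the helper names `rmse` and `r2` are illustrative. The only numbers taken from the report are the two RMSE values, used to show that the reported difference is validation minus training.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical toy values, only to exercise the two functions.
actual = [100, 200, 300]
predicted = [110, 190, 310]
toy_rmse = rmse(actual, predicted)
toy_r2 = r2(actual, predicted)

# RMSE values as reported above; the difference is validation minus training.
rmse_train = 236386.03523479012
rmse_validate = 236127.92329975966
rmse_difference = rmse_validate - rmse_train
print(f"RMSE difference: {rmse_difference:.2f}")
```

A negative difference means the validation error is slightly lower than the training error, so there is no sign of overfitting in these figures. The validation R2 of about 0.33 means the model explains roughly a third of the variance in tax value.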

## Recommendations