<a id="TableOfContents"></a>
# TABLE OF CONTENTS:
<li><a href='#imports'>Imports</a></li>
<li><a href="#Q1">Question 1</a></li>
<li><a href='#Q2'>Question 2</a></li>
<li><a href='#Q3'>Question 3</a></li>

Let's set up an example scenario to frame our regression exercises using the Zillow dataset.

As a Codeup data science graduate, you want to show off your skills to the Zillow data science team in hopes of getting an interview for a position you saw pop up on LinkedIn. You thought it might look impressive to build an end-to-end project in which you use some of their Kaggle data to predict property values from some of the available features; who knows, you might even do some feature engineering to blow them away. Your goal is to predict the values of single unit properties using the observations from 2017.

In these exercises, you will complete the first step toward the above goal: acquire and prepare the necessary Zillow data from the zillow database in the Codeup database server.

<a id='imports'></a>
# IMPORTS:
<li><a href='#TableOfContents'>Table of Contents</a></li>

In [1]:
# Vectorization and tables
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Stats
from scipy import stats

# Connect to sql server
import env

# .py files
import wrangle

<a id='Q1'></a>
# Question 1
<li><a href='#TableOfContents'>Table of Contents</a></li>

### 1. Acquire bedroomcnt, bathroomcnt, calculatedfinishedsquarefeet, taxvaluedollarcnt, yearbuilt, taxamount, and fips from the zillow database for all 'Single Family Residential' properties.

In [2]:
zillow = wrangle.acquire()

In [3]:
zillow.sample(5)

Unnamed: 0,bedroomcnt,bathroomcnt,calculatedfinishedsquarefeet,taxvaluedollarcnt,yearbuilt,taxamount,fips,propertylandusedesc
1531032,3.0,2.0,1234.0,56912.0,1952.0,736.13,6037.0,Single Family Residential
1885367,3.0,2.0,1400.0,176680.0,1955.0,2166.5,6037.0,Single Family Residential
1306871,3.0,2.0,1143.0,287662.0,1920.0,3544.41,6037.0,Single Family Residential
261445,3.0,2.0,1325.0,902000.0,1948.0,10942.65,6037.0,Single Family Residential
1985480,3.0,2.0,1323.0,51722.0,1943.0,1482.15,6037.0,Single Family Residential
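
The notebook does not show the body of `wrangle.acquire()`, so here is a minimal sketch of what such a function might do. The table names (`properties_2017`, `propertylandusetype`), the join key, and the function name `acquire_sketch` are assumptions for illustration, not taken from `wrangle.py`. An in-memory SQLite database with two rows stands in for the Codeup server so the cell runs anywhere; the first row copies the first sample row above and the second is made up. Against the real server, the connection would presumably be built from the credentials in `env` instead.

```python
import sqlite3

import pandas as pd

# Hypothetical query: table and key names are assumptions.
QUERY = """
    SELECT bedroomcnt, bathroomcnt, calculatedfinishedsquarefeet,
           taxvaluedollarcnt, yearbuilt, taxamount, fips, propertylandusedesc
    FROM properties_2017
    JOIN propertylandusetype USING (propertylandusetypeid)
    WHERE propertylandusedesc = 'Single Family Residential'
"""


def acquire_sketch(connection):
    """Return the requested columns for single family residential properties."""
    return pd.read_sql(QUERY, connection)


# Tiny in-memory stand-in for the zillow database (illustrative ids and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE propertylandusetype (
        propertylandusetypeid INTEGER, propertylandusedesc TEXT
    );
    INSERT INTO propertylandusetype VALUES
        (261, 'Single Family Residential'),
        (266, 'Condominium');
    CREATE TABLE properties_2017 (
        bedroomcnt REAL, bathroomcnt REAL, calculatedfinishedsquarefeet REAL,
        taxvaluedollarcnt REAL, yearbuilt REAL, taxamount REAL, fips REAL,
        propertylandusetypeid INTEGER
    );
    INSERT INTO properties_2017 VALUES
        (3, 2, 1234, 56912, 1952, 736.13, 6037, 261),
        (2, 1, 900, 300000, 1990, 3600.00, 6059, 266);
""")

sample = acquire_sketch(conn)
print(sample)
```

Filtering in SQL rather than in pandas keeps the other property types from ever being pulled across the network.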


<a id='Q2'></a>
# Question 2
<li><a href='#TableOfContents'>Table of Contents</a></li>

### 2. Using your acquired Zillow data, walk through the summarization and cleaning steps in your wrangle.ipynb file like we did above. You may handle the missing values however you feel is appropriate and meaningful; remember to document your process and decisions using markdown and code commenting where helpful.

### Issues To Fix:

- Column Types
    - bedroomcnt
        - Float to int (19 unique values, categorical)
        - 1-16, 18, 25
    - bathroomcnt
        - Float (38 unique values, categorical)?
        - 0-20, 32
        - 0.5-12.5, 14.5, 19.5
        - 1.75
    - taxvaluedollarcnt
        - Float (10580 unique values, categorical)?
    - yearbuilt
        - Float to int (153 unique values, categorical)?
    - fips
        - Float to object (3 unique values, categorical)
        - 6037, 6059, 6111
- Column Values
    - bedroomcnt
        - 11 na
        - Fill with mode
    - bathroomcnt
        - 11 na
        - Fill with mode
    - calculatedfinishedsquarefeet
        - 8484 na
        - Fill with mean
    - taxvaluedollarcnt
        - 493 na
        - Fill with mean
    - yearbuilt
        - 9337 na
        - Fill with ?
    - taxamount
        - 4442 na
        - Fill with mean
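
Counts like the ones in this list can be gathered in one pass. This is a minimal sketch on a small made-up frame: only the column names come from the notebook, and the values are illustrative. It puts each column's dtype, number of missing values, and number of unique values side by side.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the acquired data.
df = pd.DataFrame({
    "bedroomcnt": [3.0, 4.0, np.nan, 3.0],
    "yearbuilt": [1952.0, np.nan, 1955.0, 1955.0],
    "fips": [6037.0, 6059.0, 6111.0, 6037.0],
})

# One row per column: type, missing count, and distinct non-null values.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_missing": df.isna().sum(),
    "n_unique": df.nunique(),
})
print(summary)
```

A column with few unique values relative to its row count, such as `fips`, is a reasonable candidate to treat as categorical.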

In [4]:
# Fix bedroomcnt
# Fill na with mode (3.0)
# Change float to int type

# Fix bathroomcnt
# Fill na with mode (2.0)

# Fix calculatedfinishedsquarefeet
# Rename sqrft
# Fill na with mean (1862.9)

# Fix taxvaluedollarcnt
# Rename assessedvalue
# Fill na with mean(461896.2)

# Fix yearbuilt
# Fill na with mode (1955)
# Change float to int type

# Fix taxamount
# Fill na with mean (5634.87)

# Fix fips
# Rename county
# Codes to county names
# 6037 Los Angeles
# 6059 Orange
# 6111 Ventura

# Add state column
# All values are California

# Get dummy values for ONLY county

# Test wrangle.prepare() function
zillow = wrangle.prepare()
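
The comments in the cell above describe the plan, but the body of `wrangle.prepare()` is not shown. The following is a minimal sketch of those steps under stated assumptions: `prepare_sketch` is a hypothetical name, the input rows are made up, and the mode and mean are computed from whatever data is passed in rather than hard-coded to the values noted in the comments.

```python
import numpy as np
import pandas as pd

FIPS_TO_COUNTY = {6037: "Los Angeles", 6059: "Orange", 6111: "Ventura"}


def prepare_sketch(df):
    """Rename, impute, retype, and encode following the plan in the comments."""
    df = df.rename(columns={
        "calculatedfinishedsquarefeet": "sqrft",
        "taxvaluedollarcnt": "assessedvalue",
        "fips": "county",
    })

    # Count-like columns: fill missing values with the most common value.
    for col in ["bedroomcnt", "bathroomcnt", "yearbuilt"]:
        df[col] = df[col].fillna(df[col].mode()[0])

    # Continuous columns: fill missing values with the column mean.
    for col in ["sqrft", "assessedvalue", "taxamount"]:
        df[col] = df[col].fillna(df[col].mean())

    # Whole-number columns can become ints once the NaNs are gone.
    df["bedroomcnt"] = df["bedroomcnt"].astype(int)
    df["yearbuilt"] = df["yearbuilt"].astype(int)

    # FIPS codes to county names, plus a constant state column.
    df["county"] = df["county"].astype(int).map(FIPS_TO_COUNTY)
    df["state"] = "California"

    # Dummy columns for county only; the original county column is kept.
    dummies = pd.get_dummies(df[["county"]])
    return pd.concat([df, dummies], axis=1)


# Made-up rows with one gap per imputed column.
raw = pd.DataFrame({
    "bedroomcnt": [3.0, np.nan, 4.0, 3.0],
    "bathroomcnt": [2.0, 2.0, np.nan, 1.0],
    "calculatedfinishedsquarefeet": [1200.0, np.nan, 1800.0, 1500.0],
    "taxvaluedollarcnt": [100000.0, 200000.0, np.nan, 300000.0],
    "yearbuilt": [1955.0, 1955.0, 1960.0, np.nan],
    "taxamount": [1000.0, 2000.0, 3000.0, np.nan],
    "fips": [6037.0, 6059.0, 6111.0, 6037.0],
})

clean = prepare_sketch(raw)
print(clean.dtypes)
```

Imputing before casting matters here: a float column that still contains NaN cannot be converted to a plain integer dtype.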

In [5]:
# Verify dtypes changed
zillow.dtypes

bedroomcnt                                                  int64
bathroomcnt                                               float64
sqrft                                                     float64
assessedvalue                                             float64
yearbuilt                                                   int64
taxamount                                                 float64
county                                                     object
propertylandusedesc                                        object
state                                                      object
county_Los Angeles                                          uint8
county_Orange                                               uint8
county_Ventura                                              uint8
propertylandusedesc_Inferred Single Family Residential      uint8
propertylandusedesc_Single Family Residential               uint8
dtype: object

In [6]:
# Verify no nulls/na values
zillow.isna().sum()

bedroomcnt                                                0
bathroomcnt                                               0
sqrft                                                     0
assessedvalue                                             0
yearbuilt                                                 0
taxamount                                                 0
county                                                    0
propertylandusedesc                                       0
state                                                     0
county_Los Angeles                                        0
county_Orange                                             0
county_Ventura                                            0
propertylandusedesc_Inferred Single Family Residential    0
propertylandusedesc_Single Family Residential             0
dtype: int64

<a id='Q3'></a>
# Question 3
<li><a href='#TableOfContents'>Table of Contents</a></li>

### 3. Store all of the necessary functions to automate your process from acquiring the data to returning a cleaned dataframe with no missing values in your wrangle.py file. Name your final function wrangle_zillow.

In [7]:
print('\033[32mDAS IS COMPLETED JA\033[0m')

DAS IS COMPLETED JA


In [8]:
wrangle_zillow = wrangle.wrangle_zillow()
wrangle_zillow.sample()

Unnamed: 0,bedroomcnt,bathroomcnt,sqrft,assessedvalue,yearbuilt,taxamount,county,propertylandusedesc,state,county_Los Angeles,county_Orange,county_Ventura,propertylandusedesc_Inferred Single Family Residential,propertylandusedesc_Single Family Residential
221167,3,1.0,1224.0,193304.0,1953,2828.99,Los Angeles,Single Family Residential,California,1,0,0,0,1
