# Explore Exercises
Our Zillow scenario continues:

As a Codeup data science graduate, you want to show off your skills to the ```Zillow``` data science team in hopes of getting an interview for a position you saw pop up on LinkedIn. You thought it might look impressive to build an end-to-end project in which you use some of their Kaggle data to predict property values from the available features; who knows, you might even do some feature engineering to blow them away. Your goal is to predict the values of single-unit properties using the observations from ```2017```.

In these exercises, you will run through the stages of exploration as you continue to work toward the above goal.

In [1]:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# import preprocessing
from sklearn.preprocessing import MinMaxScaler 
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import QuantileTransformer

import seaborn as sns
import matplotlib.pyplot as plt
# Only works inside notebook
%matplotlib inline 

import QMCBT_wrangle as w
import QMCBT_explore_evaluate as ee
from env import user, password, host

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Pull up Table of Contents for wrangle.py
w.TOC()

ACQUIRE DATA
* get_db_url
* new_wrangle_zillow_2017
* get_wrangle_zillow_2017
* wrangle_zillow

PREPARE DATA
* clean_zillow_2017
* train_val_test_split
* split
* scale_data
* visualize_scaler


#### 1. As with ```encoded``` vs. ```unencoded``` data, we recommend exploring ```un-scaled``` data in your EDA process.

#### 2. Make sure to perform a ```train```, ```validate```, ```test``` split before you explore, and use only your train dataset to explore the relationships among independent variables and between independent variables and your target variable.
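A minimal sketch of one way to do the split, assuming a roughly 60/20/20 ratio and a fixed random seed. The helper name `split_data`, the ratio, and the synthetic column names are illustrative assumptions; this is not the `split` function listed in the wrangle module.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df, seed=42):
    """Return train, validate, test frames (about 60/20/20; the ratio is an assumption)."""
    # First carve off 20% of all rows as the final test set.
    train_validate, test = train_test_split(df, test_size=0.2, random_state=seed)
    # Then take 25% of the remainder (20% of the total) for validation.
    train, validate = train_test_split(train_validate, test_size=0.25, random_state=seed)
    return train, validate, test


# Synthetic stand-in for the prepared Zillow data; column names are hypothetical.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "square_feet": rng.normal(1800, 400, 1000),
    "bedrooms": rng.integers(1, 6, 1000),
})
demo["home_value"] = 200 * demo["square_feet"] + rng.normal(0, 50_000, 1000)

train, validate, test = split_data(demo)
print(train.shape, validate.shape, test.shape)
```

Only `train` should feed the plots and tests that follow; `validate` and `test` stay untouched until modeling so that nothing learned during exploration leaks into evaluation.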

#### 3. Write a function named ```plot_variable_pairs``` that accepts a ```dataframe``` as input and plots all of the pairwise relationships along with the regression line for each pair.

#### 4. Write a function named ```plot_categorical_and_continuous_vars``` that accepts your ```dataframe``` and the names of the ```columns``` that hold the ```continuous``` and ```categorical``` features and outputs 3 different plots for visualizing a categorical variable and a continuous variable.
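A hedged sketch of this function. The choice of box, violin, and bar plots is one reasonable trio rather than a required one, and the `county` column in the demo data is a hypothetical categorical feature.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_categorical_and_continuous_vars(df, categorical_cols, continuous_cols):
    """Draw three plots for every categorical/continuous column pair; return the figures."""
    figures = []
    for cat in categorical_cols:
        for cont in continuous_cols:
            fig, axes = plt.subplots(1, 3, figsize=(18, 5))
            sns.boxplot(data=df, x=cat, y=cont, ax=axes[0])     # median, quartiles, outliers
            sns.violinplot(data=df, x=cat, y=cont, ax=axes[1])  # full distribution shape
            sns.barplot(data=df, x=cat, y=cont, ax=axes[2])     # group means
            for ax, name in zip(axes, ("Box plot", "Violin plot", "Bar plot of means")):
                ax.set_title(name)
            fig.suptitle(f"{cont} by {cat}")
            figures.append(fig)
    return figures


# Synthetic demo data; column names and categories are hypothetical.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "county": rng.choice(["A", "B", "C"], size=300),
    "home_value": rng.normal(400_000, 80_000, 300),
})

figs = plot_categorical_and_continuous_vars(demo, ["county"], ["home_value"])
```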

#### 5. Save the functions you have written to create visualizations in your ```explore.py``` file. Rewrite your notebook code so that you are using the functions imported from this file.

#### 6. Use the functions you created above to explore your ```Zillow train``` dataset in your ```explore.ipynb``` notebook.

#### 7. Come up with some initial hypotheses based on your goal of predicting property value.

#### 8. Visualize all combinations of variables in some way.

#### 9. Run the appropriate statistical tests where needed.
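As an illustration, a hypothesis such as "square footage is linearly correlated with home value" pairs two continuous variables, so a correlation test fits. The sketch below uses `scipy.stats` on synthetic data; the column names, the data, and the 0.05 significance level are assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the train split; column names are hypothetical.
rng = np.random.default_rng(0)
train = pd.DataFrame({"square_feet": rng.normal(1800, 400, 500)})
train["home_value"] = 200 * train["square_feet"] + rng.normal(0, 50_000, 500)

# H0: there is no linear correlation between square_feet and home_value.
# Ha: there is a linear correlation between square_feet and home_value.
alpha = 0.05

# Pearson's r assumes a linear relationship and roughly normal variables.
r, p = stats.pearsonr(train["square_feet"], train["home_value"])

# Spearman's rho is a rank-based alternative when those assumptions are doubtful.
rho, p_spearman = stats.spearmanr(train["square_feet"], train["home_value"])

if p < alpha:
    print(f"Reject H0 (Pearson r = {r:.2f})")
else:
    print(f"Fail to reject H0 (Pearson r = {r:.2f})")
```

For a categorical feature against the continuous target, a t-test or ANOVA comparing group means would be the analogous choice.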

#### 10. What independent variables are correlated with the dependent variable, home value?
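One way to approach this question is to rank the numeric features by the strength of their correlation with the target. The helper `correlation_with_target` below is hypothetical, Spearman is chosen only as a default that tolerates skewed values, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def correlation_with_target(df, target, method="spearman"):
    """Correlate each numeric feature with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.drop(columns=target).corrwith(numeric[target], method=method)
    # Sort by absolute value so strong negative correlations rank highly too.
    return corr.sort_values(key=abs, ascending=False)


# Synthetic stand-in for the train split; column names are hypothetical.
rng = np.random.default_rng(0)
train = pd.DataFrame({"square_feet": rng.normal(1800, 400, 500)})
train["bedrooms"] = (train["square_feet"] / 600 + rng.normal(0, 0.7, 500)).round()
train["year_built"] = rng.integers(1950, 2016, 500)
train["home_value"] = 200 * train["square_feet"] + rng.normal(0, 50_000, 500)

ranked = correlation_with_target(train, "home_value")
print(ranked)
```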

#### 11. Which independent variables are correlated with other independent variables (bedrooms, bathrooms, year built, square feet)?
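A correlation matrix restricted to the independent variables, drawn as a heatmap, is one way to spot features that move together. This is a sketch on synthetic data; `plot_feature_correlations` and the column names are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_feature_correlations(df, target, method="spearman"):
    """Heatmap of pairwise correlations among numeric features, excluding the target."""
    corr = df.select_dtypes(include="number").drop(columns=target).corr(method=method)
    # Hide the diagonal and upper triangle, which only repeat the lower triangle.
    mask = np.triu(np.ones_like(corr, dtype=bool))
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
                cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation among independent variables")
    return corr, ax


# Synthetic stand-in for the train split; column names are hypothetical.
rng = np.random.default_rng(0)
train = pd.DataFrame({"square_feet": rng.normal(1800, 400, 500)})
train["bedrooms"] = (train["square_feet"] / 600 + rng.normal(0, 0.7, 500)).round()
train["year_built"] = rng.integers(1950, 2016, 500)
train["home_value"] = 200 * train["square_feet"] + rng.normal(0, 50_000, 500)

corr, ax = plot_feature_correlations(train, "home_value")
```

Pairs with high absolute correlation carry overlapping information, which is worth noting in your takeaways before feature selection.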

#### 12. Make sure to document your takeaways from visualizations and statistical tests as well as the decisions you make throughout your process.

#### 13. Explore your dataset with any other visualizations you think will be helpful.

# Bonus Exercise
In a separate notebook called ```explore_mall```, use the functions you have developed in this exercise with the ```mall_customers``` dataset in the ```Codeup database server```. You will need to write a SQL query to ```acquire``` your data. Make ```spending_score``` your target variable.
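A rough sketch of the acquire step. Everything specific here is an assumption to check against the actual server: the `mysql+pymysql` connection string format, the database name `mall_customers`, the table name `customers`, and the helper names themselves.

```python
import pandas as pd


def get_db_url(db_name, user, password, host):
    """Build a SQLAlchemy-style connection string (format is an assumption)."""
    return f"mysql+pymysql://{user}:{password}@{host}/{db_name}"


def acquire_mall_customers(user, password, host):
    """Read the customers table into a DataFrame (table name is an assumption)."""
    query = "SELECT * FROM customers"
    return pd.read_sql(query, get_db_url("mall_customers", user, password, host))


# Typical call, with credentials kept in a local env.py as in the imports above:
# from env import user, password, host
# mall = acquire_mall_customers(user, password, host)
```

After acquiring the data, split it before exploring, then reuse `plot_variable_pairs` and `plot_categorical_and_continuous_vars` with ```spending_score``` as the target.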