# Explore Exercises

<hr style="border:2px solid gray">

<b>Our Zillow scenario continues</b>:

As a Codeup data science graduate, you want to show off your skills to the Zillow data science team in hopes of getting an interview for a position you saw pop up on LinkedIn. You thought it might look impressive to build an end-to-end project in which you use some of their Kaggle data to predict property values using some of their available features; who knows, you might even do some feature engineering to blow them away. Your goal is to predict the values of single unit properties using the observations from 2017.

In these exercises, you will run through the stages of exploration as you continue to work toward the above goal.

1. As with encoded vs. unencoded data, we recommend exploring un-scaled data in your EDA process.

2. Make sure to perform a train, validate, test split before exploring, and use only your train dataset to explore the relationships between independent variables and other independent variables, or between independent variables and your target variable.

3. Write a function named plot_variable_pairs that accepts a dataframe as input and plots all of the pairwise relationships along with the regression line for each pair.

4. Write a function named plot_categorical_and_continuous_vars that accepts your dataframe and the names of the columns that hold the continuous and categorical features and outputs 3 different plots for visualizing a categorical variable and a continuous variable.

5. Save the functions you have written to create visualizations in your explore.py file. Rewrite your notebook code so that you are using the functions imported from this file.

6. Use the functions you created above to explore your Zillow train dataset in your explore.ipynb notebook.

7. Come up with some initial hypotheses based on your goal of predicting property value.

8. Visualize all combinations of variables in some way.

9. Run the appropriate statistical tests where needed.

10. What independent variables are correlated with the dependent variable, home value?

11. Which independent variables are correlated with other independent variables (bedrooms, bathrooms, year built, square feet)?

12. Make sure to document your takeaways from visualizations and statistical tests as well as the decisions you make throughout your process.

13. Explore your dataset with any other visualizations you think will be helpful.

<b>Bonus Exercise</b>
<br>
In a separate notebook called explore_mall, use the functions you have developed in this exercise with the mall_customers dataset in the Codeup database server. You will need to write a SQL query to acquire your data. Make spending_score your target variable.

<hr style="border:1px solid black">

In [1]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr, spearmanr

import env
import wrangle_new

In [2]:
df = wrangle_new.wrangle_zillow()

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2130214 entries, 4 to 2152862
Data columns (total 7 columns):
 #   Column      Dtype  
---  ------      -----  
 0   bedrooms    float64
 1   bathrooms   float64
 2   sqft        float64
 3   home_value  float64
 4   year_built  float64
 5   yearly_tax  float64
 6   fips        float64
dtypes: float64(7)
memory usage: 130.0 MB


In [5]:
train, validate, test = wrangle_new.split_clean_zillow()

In [6]:
train.shape, validate.shape, test.shape

((1192919, 7), (511252, 7), (426043, 7))

In [7]:
train.head()

| | bedrooms | bathrooms | sqft | home_value | year_built | yearly_tax | fips |
|---|---|---|---|---|---|---|---|
| 553191 | 4.0 | 2.0 | 1556.0 | 373090.0 | 1923.0 | 4579.25 | 6037.0 |
| 1209132 | 3.0 | 2.0 | 1513.0 | 74070.0 | 1937.0 | 986.55 | 6037.0 |
| 174634 | 4.0 | 2.0 | 2040.0 | 138000.0 | 1954.0 | 2553.05 | 6037.0 |
| 170584 | 3.0 | 2.0 | 1834.0 | 263870.0 | 1959.0 | 3139.76 | 6059.0 |
| 2001226 | 2.0 | 2.0 | 1225.0 | 335603.0 | 1975.0 | 3461.38 | 6059.0 |


<hr style="border:1px solid black">

### #1. As with encoded vs. unencoded data, we recommend exploring un-scaled data in your EDA process.

<hr style="border:1px solid black">

### #2. Make sure to perform a train, validate, test split before exploring, and use only your train dataset to explore the relationships between independent variables and other independent variables, or between independent variables and your target variable.
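The split in this notebook comes from `wrangle_new.split_clean_zillow()`, whose body is not shown here. The shapes printed earlier are consistent with a 20% test split followed by a 30% validate split of the remainder, so a minimal sketch of such a function might look like the following. The function name `split_data`, its parameters, and the seed are assumptions for illustration, not the actual contents of `wrangle_new`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df, test_size=0.2, validate_size=0.3, seed=123):
    """Split a dataframe into train, validate, and test sets.

    validate_size is a fraction of the non-test rows, not of the whole df.
    (Hypothetical helper; the real wrangle_new implementation may differ.)
    """
    train_validate, test = train_test_split(
        df, test_size=test_size, random_state=seed
    )
    train, validate = train_test_split(
        train_validate, test_size=validate_size, random_state=seed
    )
    return train, validate, test


# Tiny synthetic frame, used only to show the resulting proportions.
demo = pd.DataFrame({"sqft": range(100), "home_value": range(100)})
demo_train, demo_validate, demo_test = split_data(demo)
print(demo_train.shape, demo_validate.shape, demo_test.shape)
```

Only the train portion should be passed to the exploration functions, so that nothing learned from validate or test leaks into the hypotheses.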

<hr style="border:1px solid black">

### #3. Write a function named plot_variable_pairs that accepts a dataframe as input and plots all of the pairwise relationships along with the regression line for each pair.

<hr style="border:1px solid black">

### #4. Write a function named plot_categorical_and_continuous_vars that accepts your dataframe and the names of the columns that hold the continuous and categorical features and outputs 3 different plots for visualizing a categorical variable and a continuous variable.
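A minimal sketch of this function, assuming it takes one categorical and one continuous column name at a time, could draw a box plot, a violin plot, and a bar plot side by side. The choice of these three plot types, the `sample_size` and `seed` parameters, and the use of `fips` as the categorical column in the demo are all assumptions; the demo data is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_categorical_and_continuous_vars(
    df, cat_col, cont_col, sample_size=10_000, seed=123
):
    """Draw three views of a continuous variable grouped by a categorical one."""
    # Sample to keep the violin plot's density estimate fast on large data.
    sample = df.sample(n=min(len(df), sample_size), random_state=seed)
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    sns.boxplot(data=sample, x=cat_col, y=cont_col, ax=axes[0])
    sns.violinplot(data=sample, x=cat_col, y=cont_col, ax=axes[1])
    sns.barplot(data=sample, x=cat_col, y=cont_col, ax=axes[2])
    for ax, title in zip(axes, ["Box plot", "Violin plot", "Bar plot (mean)"]):
        ax.set_title(f"{title}: {cont_col} by {cat_col}")
    fig.tight_layout()
    return fig


# Synthetic demo data; in the notebook you would pass the train dataframe.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "fips": rng.choice([6037.0, 6059.0], size=200),
        "home_value": rng.uniform(50_000, 900_000, size=200),
    }
)
fig = plot_categorical_and_continuous_vars(demo, "fips", "home_value")
```

To cover several columns, the function can be called in a loop over each categorical and continuous pair.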

<hr style="border:1px solid black">

### #5. Save the functions you have written to create visualizations in your explore.py file. Rewrite your notebook code so that you are using the functions imported from this file.

<hr style="border:1px solid black">

### #6. Use the functions you created above to explore your Zillow train dataset in your explore.ipynb notebook.

<hr style="border:1px solid black">

### #7. Come up with some initial hypotheses based on your goal of predicting property value.

<hr style="border:1px solid black">

### #8. Visualize all combinations of variables in some way.

<hr style="border:1px solid black">

### #9. Run the appropriate statistical tests where needed.
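For two continuous variables, a correlation test is one reasonable choice. The sketch below uses `scipy.stats.spearmanr`, on the assumption that the relationships may be monotonic but not linear and that home values are unlikely to be normally distributed; `pearsonr` could be swapped in where those assumptions hold. The helper name `correlation_test`, the `alpha` default, and the demo data are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def correlation_test(df, x, y, alpha=0.05):
    """Run a Spearman rank correlation test between two columns.

    Null hypothesis: there is no monotonic relationship between x and y.
    """
    r, p = spearmanr(df[x], df[y])
    return {"r": float(r), "p": float(p), "reject_null": bool(p < alpha)}


# Synthetic demo: home_value rises monotonically (but not linearly) with sqft.
sqft = np.arange(1, 31)
demo = pd.DataFrame({"sqft": sqft, "home_value": sqft**2})
result = correlation_test(demo, "sqft", "home_value")
print(result)
```

With a train set this large, even very weak correlations will tend to produce tiny p-values, so the size of `r` deserves as much attention as whether the null is rejected.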

<hr style="border:1px solid black">

### #10. What independent variables are correlated with the dependent variable, home value?
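One way to answer this is to compute each feature's correlation with the target and rank the features by absolute strength. The sketch below is a hypothetical helper (`correlations_with_target` is not part of the original notebook) and uses Spearman correlation as an assumption; the demo values are made up.

```python
import pandas as pd


def correlations_with_target(df, target, method="spearman"):
    """Return each numeric feature's correlation with the target,
    ordered from strongest to weakest by absolute value."""
    corr = df.select_dtypes("number").corr(method=method)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Small synthetic frame, only to show the shape of the output.
demo = pd.DataFrame(
    {
        "sqft": [1000, 1100, 1200, 1300, 1400, 1500],
        "bedrooms": [2, 1, 4, 3, 6, 5],
        "year_built": [2000, 1990, 1995, 1970, 1980, 1960],
        "home_value": [100, 200, 300, 400, 500, 600],
    }
)
ranked = correlations_with_target(demo, "home_value")
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top instead of burying them at the bottom of the list.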

<hr style="border:1px solid black">

### #11. Which independent variables are correlated with other independent variables (bedrooms, bathrooms, year built, square feet)?
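A correlation heatmap restricted to the independent variables is one way to look for features that move together. The following is a sketch under assumptions: the helper name `plot_feature_correlations` is hypothetical, Spearman is chosen for the same reasons as elsewhere, and the demo data is random noise rather than Zillow data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_feature_correlations(df, features, method="spearman"):
    """Draw an annotated heatmap of correlations among the given features
    and return the correlation matrix."""
    corr = df[features].corr(method=method)
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(
        corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax
    )
    ax.set_title(f"{method.title()} correlation among features")
    return corr


# Synthetic demo data; in the notebook you would pass the train dataframe.
features = ["bedrooms", "bathrooms", "year_built", "sqft"]
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 4)), columns=features)
corr = plot_feature_correlations(demo, features)
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale comparable between plots, so a pale cell always means a weak correlation.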

<hr style="border:1px solid black">

### #12. Make sure to document your takeaways from visualizations and statistical tests as well as the decisions you make throughout your process.

<hr style="border:1px solid black">

### #13. Explore your dataset with any other visualizations you think will be helpful.