## Case Studies

Going forward, we will complete weekly data analytics case studies. Each one involves the three main steps of data analysis:

1. Exploratory Data Analysis
    * Visualize Distributions
    * Quantify Metrics
    * Univariate Relationships
    * Bivariate Relationships
2. Confirmatory/Diagnostic Data Analysis
    * Conduct hypothesis tests
    * Conduct model training
3. Prescriptive Data Analysis
    * Recommend a course of action based on your findings.


Occasionally, we will add extra challenges (dashboarding, SQL, BigQuery).

## Bank Data Analytics Case Study

We are data analysts at a Portuguese bank. Every year, we run a marketing campaign where we attempt to sell [term deposit subscriptions](https://www.investopedia.com/terms/t/termdeposit.asp) to our clientele and external consumers. We've collected features on 45,211 individuals in order to make data-driven decisions about the effectiveness of our marketing campaign. The features of these individuals are listed in "Data Documentation."

Complete the following prompts to create a comprehensive data analysis. Management does not currently have a specific goal for this analysis and is instead looking to your team for context.

General goals for each section will be listed. However, no pseudocode will be given; you must implement the code yourself by referring back to your notes and reading the documentation.

For each section, you will be given a list of pre-questions and post-questions to answer.

You will be working on this case study with your respective capstone group members. You are allowed to collaborate on code and responses. This is due 5/3.

## Data Documentation

For more information, read [Moro et al., 2011].

Input variables:
### Bank client data
1 - age (numeric)  
2 - job: type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")  
3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)  
4 - education (categorical: "unknown","secondary","primary","tertiary")  
5 - default: has credit in default? (binary: "yes","no")  
6 - balance: average yearly balance, in euros (numeric)   
7 - housing: has housing loan? (binary: "yes","no")  
8 - loan: has personal loan? (binary: "yes","no")  
### Related to the last contact of the current campaign
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")   
10 - day: last contact day of the month (numeric)  
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")  
12 - duration: last contact duration, in seconds (numeric)  
### Other attributes
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)  
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)  
15 - previous: number of contacts performed before this campaign and for this client (numeric)  
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")  
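
Because `pdays` uses -1 as a sentinel for "not previously contacted", treating it as an ordinary number can distort its mean and any coefficient fitted on it. The sketch below shows one possible way to separate the sentinel, assuming the data is already loaded into a pandas DataFrame with the column names above; the helper `split_pdays_sentinel` and the new `previously_contacted` column are hypothetical names chosen for illustration.

```python
import pandas as pd


# Hypothetical helper (not part of the assignment): separates the -1 sentinel in `pdays`
def split_pdays_sentinel(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 0/1 `previously_contacted` flag and -1 in `pdays` set to NaN."""
    out = df.copy()
    # Per the documentation, -1 means the client was not contacted in a previous campaign
    out["previously_contacted"] = (out["pdays"] != -1).astype(int)
    # mask() replaces values where the condition is True with NaN (the column becomes float)
    out["pdays"] = out["pdays"].mask(out["pdays"] == -1)
    return out


# Hypothetical toy rows, for illustration only
toy = pd.DataFrame({"pdays": [-1, 92, -1, 180]})
print(split_pdays_sentinel(toy))
```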

Output variable (desired target):
17 - y: has the client subscribed to a term deposit? (binary: "yes","no")  

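**Missing Attribute Values:** None (note, however, that several categorical columns, such as `job`, `education`, `contact`, and `poutcome`, use the level "unknown")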
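
Since "unknown" acts as a placeholder rather than a genuine category, it can be worth counting it before deciding how to treat it. A minimal sketch, assuming the data is loaded into a pandas DataFrame; the helper name `count_unknown` is hypothetical:

```python
import pandas as pd


# Hypothetical helper: counts the "unknown" placeholder level in each column
def count_unknown(df: pd.DataFrame) -> pd.Series:
    """Number of cells equal to "unknown" per column, for columns that contain any."""
    # Comparing a numeric column to a string yields False, so only text columns contribute
    counts = (df == "unknown").sum()
    return counts[counts > 0].sort_values(ascending=False)
```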

## Exploratory Data Analysis

Your first step will be to explore the data.

**Goals**
* Analyze distributions
* Visualize univariate relationships
* Visualize bivariate relationships

### Pre-Questions

Answer each question in the respective markdown block.

1. What is the problem we are trying to solve with this data analysis?

answer here

2. Which columns do you expect to be normally distributed?

answer here

3. What relationships do you expect in this dataset?

answer here

```python
import pandas as pd
import seaborn as sns

# load dataframe
```



```python
# explore metadata
```



```python
# explore univariate relationships
```
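
For the normality questions, numeric shape statistics can complement histograms: skewness and excess kurtosis near 0 are consistent with, but do not prove, a normal distribution. A minimal sketch using pandas' `skew()` and `kurt()` (pandas reports excess kurtosis, so a normal distribution scores about 0); the helper name `shape_summary` is hypothetical, and a formal test such as `scipy.stats.normaltest` is an alternative if SciPy is available:

```python
import pandas as pd


# Hypothetical helper: shape statistics for each numeric column
def shape_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Skewness and excess kurtosis for every numeric column (text columns are ignored)."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({"skew": numeric.skew(), "excess_kurtosis": numeric.kurt()})
```

A histogram per column, for example `sns.histplot(data=df, x="age")` with the DataFrame assumed to be named `df`, shows the same shape visually.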



```python
# explore bivariate relationships (focus on `y`)
```
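
Because `y` is binary, one way to relate it to a categorical feature is to compare the share of "yes" answers across that feature's levels, alongside each level's size. A minimal sketch, assuming `y` still holds the raw strings "yes"/"no"; the helper name `subscription_rate_by` is hypothetical:

```python
import pandas as pd


# Hypothetical helper: subscription rate per level of one categorical column
def subscription_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Share of rows with y == "yes" and the group size, for each level of `column`."""
    subscribed = df["y"].eq("yes")
    return (
        subscribed.groupby(df[column])
        .agg(rate="mean", n="size")  # mean of a boolean Series is the share of True
        .sort_values("rate", ascending=False)
    )
```

Group sizes matter here: a very high rate on a level with only a handful of clients is weak evidence.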



### Post-Questions

Answer each question in the respective markdown block.

1. What relationships did you visually identify in this dataset? Did you see any relationships that do not involve the target variable?

write answer here

2. What distributions did you see in this dataset? Were any normal? How did you confirm that they were normal?

write answer here

3. Did you notice any highly correlated variables? How could you tell that they were correlated?

write answer here
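
One way to check for highly correlated variables numerically is to scan the Pearson correlation matrix of the numeric columns for large absolute values. A minimal sketch; the helper name `high_correlation_pairs` and the 0.7 cutoff are illustrative assumptions, and Pearson correlation only captures linear relationships:

```python
import pandas as pd


# Hypothetical helper; the default 0.7 threshold is an arbitrary illustrative cutoff
def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Pairs of numeric columns whose absolute Pearson correlation is at least `threshold`."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    # Walk the upper triangle only, so each pair appears once and self-pairs are skipped
    pairs = {
        (cols[i], cols[j]): corr.iloc[i, j]
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if abs(corr.iloc[i, j]) >= threshold
    }
    return pd.Series(pairs, dtype=float)
```

Passing the same correlation matrix to `sns.heatmap` gives the visual version of this check.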

## Confirmatory Data Analysis

**Goals**
* Implement any needed transformations
* Generate a machine learning model that explains the relationship between `y` and the other variables.
* Explore levels of significance

### Pre-Questions

Answer each question in the respective markdown block.

1. What machine learning model would be best suited to predict the `y` variable?

write answer here

2. What transformations must you do before predicting the `y` variable? Why?

write answer here

```python
# do data transformations here
```
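
scikit-learn's `LogisticRegression` expects numeric inputs, so the yes/no columns and the multi-level categorical columns need encoding first. A minimal sketch, assuming the raw column names from the Data Documentation; mapping yes/no to 1/0 and one-hot encoding with `pd.get_dummies(..., drop_first=True)` is one reasonable choice rather than the only one, and the helper name `encode_features` is hypothetical:

```python
import pandas as pd

# Column groupings taken from the Data Documentation above
BINARY_COLS = ["default", "housing", "loan", "y"]
CATEGORICAL_COLS = ["job", "marital", "education", "contact", "month", "poutcome"]


# Hypothetical helper: numeric encoding for the columns listed above
def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Map yes/no columns to 1/0 and one-hot encode the multi-level categorical columns."""
    out = df.copy()
    for col in [c for c in BINARY_COLS if c in out.columns]:
        # Any value other than "yes"/"no" would become NaN here
        out[col] = out[col].map({"yes": 1, "no": 0})
    present = [c for c in CATEGORICAL_COLS if c in out.columns]
    # drop_first=True drops one level per column to avoid perfectly collinear dummies
    return pd.get_dummies(out, columns=present, drop_first=True, dtype=int)
```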



```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# make train/test split

# implement machine learning model
```
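
A minimal sketch of the split-and-fit step, assuming `X` holds already-encoded numeric features and `y` the 0/1 target. The 80/20 split, `random_state=42`, `stratify=y`, standardizing with `StandardScaler`, and `max_iter=1000` are illustrative choices, and the helper name `fit_logistic` is hypothetical. Note that `LogisticRegression` exposes coefficients through `coef_` but does not report p-values; for the significance questions, statsmodels' `Logit` reports them through the fitted result's `pvalues` attribute.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


# Hypothetical helper: stratified split, then a scaled logistic regression
def fit_logistic(X: pd.DataFrame, y: pd.Series, seed: int = 42):
    """Return the fitted pipeline, its coefficients, and the train/test splits."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    # Coefficients are per standard deviation of each feature, so magnitudes are comparable
    coefs = pd.Series(model[-1].coef_[0], index=X.columns).sort_values()
    return model, coefs, (X_train, X_test, y_train, y_test)
```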


```python
# explore coefficients
```



```python
from sklearn.metrics import accuracy_score

# calculate accuracy
```
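
Accuracy alone can look good when one class dominates: if most clients answered "no", always predicting "no" already scores highly. A minimal sketch comparing a fitted model with scikit-learn's `DummyClassifier` (a majority-class baseline), assuming train/test splits named `X_train`, `X_test`, `y_train`, `y_test`; the helper name `accuracy_vs_baseline` is hypothetical:

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score


# Hypothetical helper: puts model accuracy next to a trivial baseline
def accuracy_vs_baseline(model, X_train, y_train, X_test, y_test) -> dict:
    """Test-set accuracy of a fitted model and of an always-predict-the-majority baseline."""
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    return {
        "model": accuracy_score(y_test, model.predict(X_test)),
        "baseline": accuracy_score(y_test, baseline.predict(X_test)),
    }
```

A model that barely beats the baseline is not learning much; per-class recall (for example from `sklearn.metrics.classification_report`) shows whether it identifies any subscribers at all.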


### Post-Questions

Answer each question in the respective markdown block.

1. Which coefficients ended up being significant? Which did not?

write answer here

2. What is the overall accuracy of this model? Is it good?

write answer here

## Prescriptive Data Analysis

Answer each question in the respective markdown block.

1. Based on the results of this analysis, which individuals would you recommend the bank focus on?

write answer here