# Work-Plan
Answer these questions across three blog posts: steps 1-3, steps 4-6, and steps 7-8.

1. Frame the problem and look at the big picture.

2. Get the data.

3. Explore the data to gain insights.

4. Prepare data to better expose the underlying patterns to ML Algorithms.

5. Explore models and short-list the best ones.

6. Fine tune models and combine them into a great solution.

7. Present your solution.

8. Launch, monitor and maintain your system.



# Deliverables:

1. An exploration of the business that reports insights on business operations and product-market fit.
2. A report communicating the model's findings relevant to the business's health.
3. A model that predicts new customers' behaviors and purchases.
4. Recommendations on next quarter's strategy. (Order the data by Dt_Customer to simulate new data.)


In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector

In [3]:
#Get the data

raw_data = pd.read_csv('/marketing_campaign.csv', sep = ';')

drop_columns = ['ID', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp2', 'AcceptedCmp1', 'Z_CostContact','Z_Revenue', 'Response']
Customers = raw_data.drop(drop_columns, axis = 1)
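#sort_values returns a new frame, so assign it back (the order is chronological only if Dt_Customer holds ISO-formatted or parsed dates)
Customers = Customers.sort_values('Dt_Customer')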
Customers.set_index('Dt_Customer', inplace = True)

Full_Dataset_len = len(Customers)
Full_Dataset_len


FileNotFoundError: ignored

## ${\textbf{Split out the test set}}$

In [0]:
#To simulate new customers, I removed the last 10 percent of customers to sign up.
Customers_indeces = int(len(Customers)*0.9)
New_Customers = Customers.iloc[Customers_indeces:]
Customers = Customers.iloc[:Customers_indeces]

#Test that we successfully split the data
assert len(New_Customers) + len(Customers) == Full_Dataset_len
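
The split above only simulates new customers if the rows are in sign-up order. A minimal sketch of that idea, assuming the sign-up date is a parsed datetime column; the helper name `chronological_split` and the toy values are hypothetical and not part of the original notebook:

```python
import pandas as pd

def chronological_split(df, date_col, holdout_frac=0.1):
    """Return (older, newest), where newest holds the last holdout_frac of rows by date."""
    # Sort by sign-up date so the tail of the frame really is the most recent customers
    ordered = df.sort_values(date_col, kind="stable").reset_index(drop=True)
    cut = int(len(ordered) * (1 - holdout_frac))
    return ordered.iloc[:cut], ordered.iloc[cut:]

# Toy stand-in for the customer table (values invented for illustration)
toy = pd.DataFrame({
    "Dt_Customer": pd.to_datetime(
        ["2014-03-01", "2012-08-15", "2013-01-20", "2013-11-05", "2012-12-31",
         "2014-06-10", "2013-07-07", "2012-09-09", "2014-01-01", "2013-03-03"]
    ),
    "Income": [52000, 31000, 78000, 45000, 60000, 39000, 82000, 27000, 66000, 48000],
})

existing, new = chronological_split(toy, "Dt_Customer", holdout_frac=0.1)
```

Keeping the split in a function makes it easy to check that every held-out customer signed up no earlier than the customers used for training.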

## ${\textbf{Explore the data}}$

# Market Analysis

1. Who are our Customers?
2. How are they distributed based on their signup features?

Signup_features = ['Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer']

In [0]:
#Visualizing the Data 
plt.figure(figsize = (12,9))
s = sns.heatmap(Customers.select_dtypes(include = 'number').corr(),  #correlate numeric columns only
               annot = True, 
               cmap = 'viridis',
               vmin = -1, 
               vmax = 1)
s.set_yticklabels(s.get_yticklabels(), rotation = 0, fontsize = 12)
s.set_xticklabels(s.get_xticklabels(), rotation = 90, fontsize = 12)
plt.title('Correlation Heatmap')
plt.show()

In [0]:
#Customer Sign-up data 
Signup_features = ['Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer']
Customers.info()

In [0]:
Customers.isna().sum()

In [0]:
#Since it's only 17, let's check them manually to decide how to treat the missing values.
Customers[Customers.Income.isna()]

#Compare distributions in seaborn's kde

In [0]:
#Possible strategies for the pipeline:
#drop rows with missing income, treat missing values as their own category, impute the median, or estimate income with a regression.
#Kept as a named candidate for now; note that Income is numeric, so a string fill_value would not apply to it directly.
missing_imputer = SimpleImputer(strategy = 'constant', fill_value = 'missing')


#Preprocessing
#Flag rows with a missing income (1) versus a recorded income (0)
Customers['Income_missing'] = Customers.Income.isna().astype(int)
#Recode only the Marital_Status value, not the whole row
Customers['Marital_Status'] = Customers.Marital_Status.replace('Alone', 'Single')
Customers['Education'] = Customers.Education.astype('category')
Customers['Marital_Status'] = Customers.Marital_Status.astype('category')

#One-hot encode every column stored with the category dtype
ohe = OneHotEncoder()
col_transformer = make_column_transformer((ohe, make_column_selector(dtype_include = 'category')))

col_transformer.fit_transform(Customers)
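
Two of the strategies listed above, median imputation and a missing-value flag, can be combined with the one-hot encoding in a single transformer. This is a sketch on invented toy data rather than the notebook's final pipeline; the choice of `strategy="median"` with `add_indicator=True` is an assumption, not a decision taken in the original:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the customer table (values invented for illustration)
toy = pd.DataFrame({
    "Income": [52000.0, np.nan, 78000.0, 45000.0],
    "Kidhome": [0, 1, 0, 2],
    "Education": pd.Categorical(["Graduation", "PhD", "Master", "PhD"]),
    "Marital_Status": pd.Categorical(["Single", "Married", "Single", "Together"]),
})

preprocess = make_column_transformer(
    # Numeric columns: fill gaps with the median and append a 0/1 "was missing" column
    (SimpleImputer(strategy="median", add_indicator=True),
     make_column_selector(dtype_include="number")),
    # Categorical columns: one-hot encode, ignoring categories unseen at fit time
    (OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_include="category")),
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(toy)
```

The output holds the two numeric columns, one indicator column for `Income`, and one column per category level. Fitting the transformer on the older customers and only calling `transform` on the held-out ones keeps the median from leaking information from the test set.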


In [0]:
#Pre-process Education and Marital Status
Education = Customers.Education.value_counts().to_frame().reset_index()
Education
sns.countplot(x = 'Education', data = Customers)
plt.title('Customer Education Distribution')



In [0]:
sns.countplot(x = 'Marital_Status',data = Customers)
plt.title('Customer Marital Status Distribution')

# Insights:




# Business Analysis

1. Revenue per sector over the last two years
2. Predict complaining. Is recency related to complaining? How important is complaining, and what percentage of customers who complain rescind?

# Machine Learning Notebook
Prepare the Data:
 - 


In [0]:
#Replace missing values with a linear regression on the other features. Write a blog post about it.
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
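
One way the regression-based imputation could look is sketched below. The helper `impute_with_regression`, the choice of predictor, and the toy values are all hypothetical; the sketch also assumes the predictor columns themselves have no missing values:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def impute_with_regression(df, target, predictors):
    """Fill missing values of `target` with predictions from a linear model on `predictors`."""
    filled = df.copy()
    missing = filled[target].isna()
    # Only fit when there is something to fill and something to learn from
    if missing.any() and (~missing).any():
        model = LinearRegression()
        model.fit(filled.loc[~missing, predictors], filled.loc[~missing, target])
        filled.loc[missing, target] = model.predict(filled.loc[missing, predictors])
    return filled

# Toy data where Income falls by 2000 per birth year, so the missing value is recoverable
toy = pd.DataFrame({
    "Year_Birth": [1960, 1970, 1980, 1990, 1975],
    "Kidhome": [0, 1, 0, 2, 1],
    "Income": [80000.0, 60000.0, 40000.0, 20000.0, np.nan],
})

filled = impute_with_regression(toy, "Income", ["Year_Birth"])
```

The function returns a copy, so the original frame keeps its missing values and the imputed and raw distributions can still be compared.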


In [0]:
#Find a way to rank features to answer the best questions:

#Value we can generate with this data:
  #Cluster clients to decide our development strategy for the future based on new registrations.
    #Use incoming registration data to 
      # Predict amount spent on group of goods over next 2 years. (2-year-value)
      # Predict platform interaction behavior (NumWebVisits)
      # Predict if the new customers will complain? Why? How much are we losing because they will?
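
For the feature-ranking question in the notes above, one possible approach (an assumption, not something the notebook has settled on) is to fit a random forest and sort its impurity-based importances. The data here is synthetic and the target `spend` is a made-up stand-in for an amount-spent column:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic sign-up features; spend is driven mostly by Income and not at all by Teenhome
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "Income": rng.normal(50000, 15000, n),
    "Kidhome": rng.integers(0, 3, n),
    "Teenhome": rng.integers(0, 3, n),
})
spend = 0.01 * toy["Income"] - 50 * toy["Kidhome"] + rng.normal(0, 20, n)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(toy, spend)

# Importances sum to 1; a higher value means the feature was used for more variance reduction
ranking = pd.Series(forest.feature_importances_, index=toy.columns).sort_values(ascending=False)
```

Impurity-based importances can favor high-cardinality features, so treating the ranking as a first pass and checking it against a held-out score is a reasonable precaution.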

In [0]:
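# Deploy the new customers' test data as a CSV on GitHub, then fetch it with requests to practice web scraping and assess our predictions. pd.read_html only parses HTML tables, so a raw CSV is read with pd.read_csv instead (r = requests.get(url); df = pd.read_csv(io.StringIO(r.text), sep = ';'); return df)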
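
A sketch of how the fetch step might be organized, assuming the file is served as plain semicolon-separated text. The function names are hypothetical, no real URL is assumed, and `fetch_csv` is defined but not called here:

```python
import io
import pandas as pd

def parse_csv_text(text, sep=";"):
    """Turn raw CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(text), sep=sep)

def fetch_csv(url, sep=";", timeout=10):
    """Download CSV text from `url` and parse it (requires the requests package)."""
    import requests  # imported lazily so parse_csv_text works without it
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 404s and other HTTP errors
    return parse_csv_text(response.text, sep=sep)

# Offline check of the parsing step with invented rows
sample = "ID;Income;Dt_Customer\n1;52000;2014-03-01\n2;31000;2012-08-15\n"
frame = parse_csv_text(sample)
```

Separating download from parsing lets the parsing logic be tested without network access.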