#### OFM3 — OFM3 TASK 3: ASSOCIATION RULES AND LIFT ANALYSIS

<ul>
<li>Ryan L. Buchanan</li>
<li>Student ID:  001826691</li>
<li>Masters Data Analytics (12/01/2020)</li>
<li>Program Mentor:  Dan Estes</li>
<li>385-432-9281 (MST)</li>
<li>rbuch49@wgu.edu</li>
</ul>

#### Scenario 1
One of the most critical factors in customer relationship management that directly affects a company’s long-term profitability is understanding its customers. When a company can better understand its customer characteristics, it is better able to target products and marketing campaigns for customers, resulting in better profits for the company in the long term.

You are an analyst for a telecommunications company that wants to better understand the characteristics of its customers. You have been asked to perform a market basket analysis to analyze customer data to identify key associations of your customer purchases, ultimately allowing better business and strategic decision-making.

#### Part I: Research Question

#### <span style="color:green"><b>A1. Proposal of Question</b>:</span>
Which items, when offered at a discount alongside our services, might reduce customer churn?  That is, by analyzing a list of transactions, can we better understand which items would most endear us to customers if offered at a discount with our services?
This question will be answered using <b>market basket analysis</b>.

#### <span style="color:green"><b>A2. Defined Goal</b>:</span>
Stakeholders in the company will benefit by knowing, with some measure of confidence, which customers are at highest risk of churn because this will provide weight for decisions in marketing improved services to customers with these characteristics and past user experiences.
The goal of this data analysis is to present items for discount purchase to company stakeholders to consider when creating customer enticements and marketing promotions.  We will endeavor to help decision makers better understand which combinations of features (items in concert with telecom services) put their customers at lower risk of churning.

#### Part II: Market Basket Justification

#### <span style="color:green"><b>B1. Explanation of Market Basket</b>:</span>
As pointed out by Li, "\[m\]arket basket analysis is one of the key techniques used . . . to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions" <span style="color:orange">(Li, p. 1)</span>.  

This analysis proposes to identify which combinations of telecom peripherals and ICT tools customers prefer and purchase together most often.  We will try to identify those items purchased most often together and demonstrate the relationships between these different items.

We expect that we will discover an optimal combination of items to offer at discounts in coordination with our services.

Our plan for analysis includes: 
* Prepare the dataset
* Discover missing values
* Run the Apriori method to identify association rules
* Check the rules with highest values for confidence, support and lift
* Recommend a course of action following the results of our analysis

#### <span style="color:green"><b>B2. Transaction Example</b>:</span>
On quick inspection of the given dataset, transactions are easily distinguishable.  The very first transaction includes a large list of twenty items:
* Logitech M510 Wireless mouse	
* HP 63 Ink	
* HP 65 ink	
* nonda USB C to USB Adapter	
* 10ft iPHone Charger Cable	
* HP 902XL ink	
* Creative Pebble 2.0 Speakers	
* Cleaning Gel Universal Dust Cleaner	
* Micro Center 32GB Memory card	
* YUNSONG 3pack 6ft Nylon Lightning Cable	
* TopMate C5 Laptop Cooler pad	
* Apple USB-C Charger cable	
* HyperX Cloud Stinger Headset	
* TONOR USB Gaming Microphone	
* Dust-Off Compressed Gas 2 pack	
* 3A USB Type C Cable 3 pack 6FT	
* HOVAMP iPhone charger	
* SanDisk Ultra 128GB card	
* FEEL2NICE 5 pack 10ft Lighning cable	
* FEIYOLD Blue light Blocking Glasses

These twenty items were purchased together by one customer in a single transaction.

#### <span style="color:green"><b>B3. Market Basket Assumption</b>:</span>
One assumption of market basket analysis (MBA) is that useful determinations can be made by building association rules.  These rules, suggests Dr. Susan Sivek, "are just statements that connect an 'antecedent' item to a 'consequent' item. Association rules also do not imply causal relationships, only co-occurrence" <span style="color:orange">(Sivek, p. 1)</span>.

So, for instance, in our research proposal we would like to identify items that would be purchased before subscribing to a telecom service or, perhaps, items that would be used in coordination with telecom services.
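
To make the antecedent and consequent language concrete, the following is a minimal sketch of how support, confidence and lift can be computed for a single rule. The baskets and the helper names (`support`, `confidence`, `lift`) are hypothetical and written only for illustration; they are not drawn from the dataset or from any library used later in this notebook.

```python
# Toy baskets (hypothetical, not taken from the teleco dataset)
transactions = [
    {"mouse", "ink"},
    {"mouse", "ink", "cable"},
    {"mouse", "cable"},
    {"ink"},
    {"cable"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets holding the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Rule "mouse -> ink": antecedent {mouse}, consequent {ink}
print(support({"mouse", "ink"}, transactions))
print(confidence({"mouse"}, {"ink"}, transactions))
print(lift({"mouse"}, {"ink"}, transactions))
```

A lift above 1 indicates that the two items co-occur more often than their individual frequencies alone would suggest; as Sivek notes, this is co-occurrence, not causation.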

#### <span style="color:green"><b>C1. Transforming the Dataset</b>:</span>

In [2]:
# Standard data science imports
import numpy as np
import pandas as pd

# Visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
# Change color of Matplotlib font
import matplotlib as mpl

COLOR = 'white'
mpl.rcParams['text.color'] = COLOR
mpl.rcParams['axes.labelcolor'] = COLOR
mpl.rcParams['xtick.color'] = COLOR
mpl.rcParams['ytick.color'] = COLOR

In [4]:
# Increase Jupyter display cell-width
from IPython.display import display, HTML
display(HTML("<style>.container { width:75% !important; }</style>"))

In [5]:
# Ignore Warning Code
import warnings
warnings.filterwarnings('ignore')

In [6]:
# Load data set into Pandas dataframe
teleco_mba_df = pd.read_csv('data/teleco_market_basket.csv')

In [None]:
# Examine the features of the dataset
teleco_mba_df.columns

In [None]:
# Get an idea of dataset size
teleco_mba_df.shape

In [None]:
# Examine first few records of dataset
teleco_mba_df.head()

In [None]:
# View DataFrame info
teleco_mba_df.info()

In [None]:
# Get an overview of descriptive statistics
teleco_mba_df.describe()

In [None]:
# Get data types of features
teleco_mba_df.dtypes

In [None]:
# Discover missing data points within dataset
data_nulls = teleco_mba_df.isnull().sum()
print(data_nulls)

In [None]:
# Check for missing data & visualize missing values in dataset 

# Install appropriate library
!pip install missingno

# Importing the libraries
import missingno as msno

# Visualize missing values as a matrix (GeeksForGeeks, p. 1)
msno.matrix(teleco_mba_df);

In [None]:
# Drop only the records in which every value is missing
teleco_mba_df.dropna(how='all', inplace=True)

In [None]:
# Replace empty values with 0
teleco_mba_df.fillna(0, inplace=True)

In [None]:
# Get an idea of dataset size after changes
teleco_mba_df.shape

In [None]:
# Review changes to DataFrame
teleco_mba_df.head()

In [None]:
# Confirm no null values
teleco_mba_df.info()

In [None]:
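# Convert dataset into list format for use with Apriori algorithm
# (row and column counts are taken from the DataFrame rather than hard-coded)
n_rows, n_cols = teleco_mba_df.shape
teleco = []
for i in range(n_rows):
    teleco.append([str(teleco_mba_df.values[i, j]) for j in range(n_cols)])
teleco_cleaned_df = pd.DataFrame(teleco)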

In [None]:
# Review DataFrame
teleco_cleaned_df.head()

In [None]:
# Extract prepared dataset
teleco_cleaned_df.to_csv('data/teleco_market_basket_prepared.csv')
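
As an alternative to filling empty cells with a placeholder, the wide table can be turned into one list of items per transaction with the missing cells simply left out. The sketch below assumes a small stand-in DataFrame (`wide_df`, hypothetical) in place of the real file, and is meant only to illustrate the idea.

```python
import pandas as pd

# Hypothetical three-row stand-in for the wide basket file (one row per transaction)
wide_df = pd.DataFrame(
    [
        ["HP 63 Ink", "HP 65 ink", None],
        ["Apple USB-C Charger cable", None, None],
        [None, None, None],
    ]
)

# Keep only the non-missing cells of each row
transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in wide_df.itertuples(index=False, name=None)
]

# Skip rows that hold no items at all
transactions = [items for items in transactions if items]
print(transactions)
```

A list built this way contains only real item names, so no placeholder value such as `0` can appear as an item in the resulting rules.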

#### <span style="color:green"><b>C2. Code Execution</b>:</span>

In [7]:
# Load prepared dataset
teleco_mba_df = pd.read_csv('data/teleco_market_basket_prepared.csv')

In [8]:
# Generate association rules from Apriori algorithm
from apyori import apriori

# Train Apriori algorithm on the dataset
rule_list = apriori(teleco_mba_df, min_support = 0.003, min_confidence = 0.3, min_lift = 3, min_length = 2)

In [9]:
# Review generated rules
rule_list = list(rule_list)
print(rule_list[0])

RelationRecord(items=frozenset({' ', '0'}), support=0.047619047619047616, ordered_statistics=[OrderedStatistic(items_base=frozenset({' '}), items_add=frozenset({'0'}), confidence=1.0, lift=7.0), OrderedStatistic(items_base=frozenset({'0'}), items_add=frozenset({' '}), confidence=0.3333333333333333, lift=7.0)])


In [10]:
# Print number of rules
print(len(rule_list))

502
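
The items in the recorded output are single characters (`' '`, `'0'`, `':'`, `'U'`, ...) rather than product names, and every support value is 0.047619, which equals 1/21. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that iterating over a pandas DataFrame yields its column labels, so the algorithm may have treated the 21 labels (`Unnamed: 0` plus `0` through `19`) as transactions and their characters as items. The pure-Python sketch below checks whether that reading reproduces the recorded numbers; the helper `rule_stats` is hypothetical.

```python
# Column labels assumed to result from re-reading a CSV that was saved with its index:
# "Unnamed: 0" followed by "0" through "19"
labels = ["Unnamed: 0"] + [str(i) for i in range(20)]

# Treat each label as a "transaction" whose items are its characters
baskets = [set(label) for label in labels]


def support(items):
    """Fraction of label-baskets containing every character in `items`."""
    return sum(set(items) <= basket for basket in baskets) / len(baskets)


def rule_stats(lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    pair = support(set(lhs) | set(rhs))
    conf = pair / support(lhs)
    return pair, conf, conf / support(rhs)


for lhs, rhs in [(" ", "0"), (" ", ":")]:
    s, c, l = rule_stats({lhs}, {rhs})
    print(repr(lhs), "->", repr(rhs), round(s, 6), round(c, 6), round(l, 6))
```

Under this assumption the first rule comes out at support 0.047619, confidence 1.0 and lift 7.0, and the second at lift 21.0, matching the recorded records. If this is indeed the cause, passing a list of per-transaction item lists to the algorithm, and reading the prepared CSV without the saved index column, would be the likely remedy.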


In [11]:
# Transform results into DataFrame structure
teleco_results = pd.DataFrame(rule_list)

In [12]:
# View results list
teleco_results

Unnamed: 0,items,support,ordered_statistics
0,"( , 0)",0.047619,"[(( ), (0), 1.0, 7.0), ((0), ( ), 0.3333333333..."
1,"( , :)",0.047619,"[(( ), (:), 1.0, 21.0), ((:), ( ), 1.0, 21.0)]"
2,"( , U)",0.047619,"[(( ), (U), 1.0, 21.0), ((U), ( ), 1.0, 21.0)]"
3,"( , a)",0.047619,"[(( ), (a), 1.0, 21.0), ((a), ( ), 1.0, 21.0)]"
4,"( , d)",0.047619,"[(( ), (d), 1.0, 21.0), ((d), ( ), 1.0, 21.0)]"
...,...,...,...
497,"(a, m, , :, d, 0, e, n)",0.047619,"[(( ), (a, m, :, d, 0, e, n), 1.0, 21.0), ((0)..."
498,"(0, a, m, , d, U, e, n)",0.047619,"[(( ), (0, a, m, d, U, e, n), 1.0, 21.0), ((0)..."
499,"(a, m, , :, d, U, e, n)",0.047619,"[(( ), (a, m, :, d, U, e, n), 1.0, 21.0), ((:)..."
500,"(a, m, :, d, 0, U, e, n)",0.047619,"[((0), (a, m, :, d, U, e, n), 0.33333333333333..."


In [13]:
# Separate support into an individual Series
support = teleco_results.support

In [14]:
# Instantiate four empty lists to contain lhs, rhs, confidence and lift
first_values = []
second_values = []
third_values = []
fourth_values = []

In [15]:
# Create for loop to iterate over list 
for i in range(teleco_results.shape[0]):
    single_list = teleco_results['ordered_statistics'][i][0]
    first_values.append(list(single_list[0]))
    second_values.append(list(single_list[1]))
    third_values.append(single_list[2])
    fourth_values.append(single_list[3])

In [16]:
# Convert lists into DataFrame
lhs = pd.DataFrame(first_values)
rhs = pd.DataFrame(second_values)
confidence = pd.DataFrame(third_values, columns=['Confidence'])
lift = pd.DataFrame(fourth_values, columns=['lift'])

In [17]:
# Concatenate lists into single DataFrame
results_final = pd.concat([lhs, rhs, support, confidence, lift], axis=1)
# results_final.fillna(value=' ', inplace=True)

In [18]:
results_final

Unnamed: 0,0,0.1,1,2,3,4,5,6,7,support,Confidence,lift
0,,0,,,,,,,,0.047619,1.000000,7.0
1,,:,,,,,,,,0.047619,1.000000,21.0
2,,U,,,,,,,,0.047619,1.000000,21.0
3,,a,,,,,,,,0.047619,1.000000,21.0
4,,d,,,,,,,,0.047619,1.000000,21.0
...,...,...,...,...,...,...,...,...,...,...,...,...
497,,a,m,:,d,0,e,n,,0.047619,1.000000,21.0
498,,0,a,m,d,U,e,n,,0.047619,1.000000,21.0
499,,a,m,:,d,U,e,n,,0.047619,1.000000,21.0
500,0,a,m,:,d,U,e,n,,0.047619,0.333333,7.0


In [19]:
# Set column names
results_final.columns = ['lhs', 1, 2, 3, 'rhs', 1, 2, 3, 4, 'support', 'confidence', 'lift']
results_final_1 = results_final[['lhs', 'rhs', 'support', 'confidence', 'lift']]
results_final_1

Unnamed: 0,lhs,rhs,support,confidence,lift
0,,,0.047619,1.000000,7.0
1,,,0.047619,1.000000,21.0
2,,,0.047619,1.000000,21.0
3,,,0.047619,1.000000,21.0
4,,,0.047619,1.000000,21.0
...,...,...,...,...,...
497,,d,0.047619,1.000000,21.0
498,,d,0.047619,1.000000,21.0
499,,d,0.047619,1.000000,21.0
500,0,d,0.047619,0.333333,7.0
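
In the table above the `rhs` column is blank for the two-item rules, which seems to follow from the positional renaming placing the `rhs` label on a later consequent column instead of the first one. A sketch that joins each itemset into a single string avoids depending on column positions and also makes it simple to rank rules by lift. The two namedtuples below are stand-ins that mirror the record layout printed earlier (they are assumptions for illustration, not the library's own classes), and the sample records are hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the printed record layout (not imported from apyori)
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

# Hypothetical records used only to exercise the flattening logic
records = [
    RelationRecord(
        frozenset({"ink", "mouse"}),
        0.4,
        [
            OrderedStatistic(frozenset({"mouse"}), frozenset({"ink"}), 0.67, 1.11),
            OrderedStatistic(frozenset({"ink"}), frozenset({"mouse"}), 0.67, 1.11),
        ],
    ),
    RelationRecord(
        frozenset({"cable", "charger"}),
        0.2,
        [OrderedStatistic(frozenset({"charger"}), frozenset({"cable"}), 1.0, 1.67)],
    ),
]


def flatten(records):
    """One row per ordered statistic, with itemsets joined into strings."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append(
                {
                    "lhs": ", ".join(sorted(stat.items_base)),
                    "rhs": ", ".join(sorted(stat.items_add)),
                    "support": record.support,
                    "confidence": stat.confidence,
                    "lift": stat.lift,
                }
            )
    # Highest lift first
    return sorted(rows, key=lambda row: row["lift"], reverse=True)


top_rules = flatten(records)[:3]
for rule in top_rules:
    print(rule)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame` to obtain a table with one `lhs` and one `rhs` column per rule.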


In [20]:
# Visualize the list of rules
results = list(rule_list)
for i in results:
    print('\n')
    print(i)
    print('**********')



RelationRecord(items=frozenset({' ', '0'}), support=0.047619047619047616, ordered_statistics=[OrderedStatistic(items_base=frozenset({' '}), items_add=frozenset({'0'}), confidence=1.0, lift=7.0), OrderedStatistic(items_base=frozenset({'0'}), items_add=frozenset({' '}), confidence=0.3333333333333333, lift=7.0)])
**********


RelationRecord(items=frozenset({' ', ':'}), support=0.047619047619047616, ordered_statistics=[OrderedStatistic(items_base=frozenset({' '}), items_add=frozenset({':'}), confidence=1.0, lift=21.0), OrderedStatistic(items_base=frozenset({':'}), items_add=frozenset({' '}), confidence=1.0, lift=21.0)])
**********


RelationRecord(items=frozenset({' ', 'U'}), support=0.047619047619047616, ordered_statistics=[OrderedStatistic(items_base=frozenset({' '}), items_add=frozenset({'U'}), confidence=1.0, lift=21.0), OrderedStatistic(items_base=frozenset({'U'}), items_add=frozenset({' '}), confidence=1.0, lift=21.0)])
**********


RelationRecord(items=frozenset({' ', 'a'}), suppo

#### <span style="color:green"><b>C3. Association Rules Table</b>:</span>
<span style="color:red">Provide values for the support, lift, and confidence of the association rules table.</span>


#### <span style="color:green"><b>C4. Top Three Rules</b>:</span>
<span style="color:red">Identify the top three rules generated by the Apriori algorithm. Include a screenshot of the top rules along with their summaries.</span>


#### Part IV: Analysis
D. Perform the data analysis and report on the results by doing the following:

#### <span style="color:green"><b>D1. Significance of Support, Lift, and Confidence Summary</b></span>
<span style="color:red">Summarize the significance of support, lift, and confidence from the results of the analysis.</span>

#### <span style="color:green"><b>D2. Practical Significance of Findings</b></span>
<span style="color:red">Discuss the practical significance of the findings from the analysis.</span>

#### <span style="color:green"><b>D3. Course of Action</b></span>
<span style="color:red">Recommend a course of action for the real-world organizational situation from part A1 based on your results from part D1.</span>

It is critical that decision-makers and marketers understand that our predictor variables yield a relatively low accuracy score of 0.84 after scaling.  We should analyze the features that are common among customers leaving the company and attempt to reduce the likelihood of those features occurring with any given customer in the future.  This suggests that as a customer subscribes to more of the services the company provides, an additional port modem or online backup for example, they are less likely to leave the company.  Clearly, it is in the best interest of customer retention to provide customers with more services and to improve their experience with the company by helping them understand all the services available to them as subscribers, not simply mobile phone service.

#### <span style="color:green"><b>E. Panopto Recording</b></span>
 <span style="color:red">link</span>

#### <span style="color:green"><b>F. Web Sources</b></span>
* GeeksForGeeks. &ensp; (2019, July 4). &ensp; <i>Python | Visualize missing values (NaN) values using Missingno Library</i>. &ensp; GeeksForGeeks. &ensp; https://www.geeksforgeeks.org/python-visualize-missing-values-nan-values-using-missingno-library/
<br>
* Gupta, A. &ensp; (2021). &ensp; <i>Implementing Apriori algorithm in Python</i>. &ensp; GeeksForGeeks. &ensp; https://www.geeksforgeeks.org/implementing-apriori-algorithm-in-python/
<br>
* Kumar, V. &ensp; (2020, May 11). &ensp; <i>Hands-On Guide To Market Basket Analysis With Python Codes</i>. &ensp; AnalyticsIndiaMag.com. &ensp; https://analyticsindiamag.com/hands-on-guide-to-market-basket-analysis-with-python-codes/
<br>
* Intellipaat. &ensp; (2021). &ensp; <i>Introduction to Apriori Algorithm in Python</i>. &ensp; Intellipaat. &ensp; https://intellipaat.com/blog/data-science-apriori-algorithm/
<br>
* Umredkar, R. &ensp; (2020, November 30). &ensp; <i>Guide To Association Rule Mining From Scratch</i>. &ensp; AnalyticsIndiaMag.com. &ensp;  https://analyticsindiamag.com/guide-to-association-rule-mining-from-scratch/
<br>
* Yogesh. &ensp; (2018). &ensp; <i>Market Basket Analysis (Apriori) in Python</i>. &ensp; Kaggle. &ensp; https://www.kaggle.com/yugagrawal95/market-basket-analysis-apriori-in-python


#### <span style="color:green"><b>G. Sources</b></span>
* Li, S. &ensp; (2017, September 24). &ensp; <i>A Gentle Introduction on Market Basket Analysis — Association Rules</i>. &ensp; TowardsDataScience. &ensp; https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce
<br>
* Sivek, S. &ensp; (2020, November 16). &ensp; <i>Market Basket Analysis 101: Key Concepts</i>. &ensp; TowardsDataScience. &ensp; https://towardsdatascience.com/market-basket-analysis-101-key-concepts-1ddc6876cd00