# Task
Imagine you are working for a retail company, and you have access to a dataset containing customer transactions. Your task is to perform market basket analysis to uncover patterns in customer purchasing behavior. By identifying which products tend to be bought together, the company can make informed decisions to improve sales and customer satisfaction.

Tools:
• Data Analysis Tool: Python (using libraries like Pandas, etc.)
• Data Visualization Tool: Matplotlib, Seaborn
• Scikit-Learn
• Jupyter Notebook

High-Level Steps:
1. Data Preparation
2. Exploratory Data Analysis (EDA)
3. Market Basket Analysis
4. Visualization
5. Interpretation and Insights
6. Recommendations
7. Presentation

Deliverables:
• A well-documented Jupyter Notebook or report containing code and explanations.
• Visualizations that support your findings.
• A presentation or report for your mentorship group.
• A GitHub repository with documentation and code.

Objective:
The goal of this project is to introduce you to the concept of market basket analysis, which is a crucial aspect of data science in retail and e-commerce. You will learn how to extract valuable insights from transaction data, understand customer purchasing behaviour, and use this knowledge for business optimization.
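Market basket analysis is usually summarised with three metrics: support (how often an itemset occurs), confidence (how often the consequent appears when the antecedent does), and lift (confidence relative to the consequent's baseline support). The following is a minimal sketch of those definitions in plain Python. The baskets are made-up toy data, and the helper names `support`, `confidence` and `lift` are illustrative, not part of any library.

```python
# Toy baskets (made-up data, not taken from the project dataset)
baskets = [
    {"whole milk", "rolls/buns"},
    {"whole milk", "other vegetables"},
    {"whole milk", "rolls/buns", "soda"},
    {"other vegetables"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's own support; values above 1 suggest a positive association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


a, c = {"rolls/buns"}, {"whole milk"}
print(support(a | c, baskets))    # 0.5  (2 of 4 baskets contain both)
print(confidence(a, c, baskets))  # 1.0  (every rolls/buns basket also has whole milk)
print(lift(a, c, baskets))        # about 1.33 (1.0 / 0.75)
```

These are the same quantities that appear as the support, confidence and lift columns of an association-rules table.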


In [14]:
#import pandas library
import pandas as pd

In [15]:
# read the CSV file into a DataFrame
filepath = 'C:\\Users\\ODUNAYO\\Documents\\Data Science seedbuilder\\Datasets\\Market Basket Analysis.csv'
df = pd.read_csv(filepath)
df.head()

Unnamed: 0,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk


In [16]:
# creating a deep copy, independent of the original dataset, just in case
df_copy_deep = df.copy(deep=True)
df

Unnamed: 0,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk
...,...,...,...
38760,4471,08-10-2014,sliced cheese
38761,2022,23-02-2014,candy
38762,1097,16-04-2014,cake bar
38763,1510,03-12-2014,fruit/vegetable juice


In [17]:
df.describe()

Unnamed: 0,Member_number
count,38765.0
mean,3003.641868
std,1153.611031
min,1000.0
25%,2002.0
50%,3005.0
75%,4007.0
max,5000.0


In [18]:
# Count the missing values in each column
missing_values = df.isnull().sum()

# Names of the columns that contain at least one missing value
columns_with_missing_values = missing_values[missing_values > 0].index.tolist()

# Display the per-column counts
missing_values

Member_number      0
Date               0
itemDescription    0
dtype: int64

In [19]:
# drop exact duplicate rows and display the resulting DataFrame
df = df.drop_duplicates()
df

Unnamed: 0,Member_number,Date,itemDescription
0,1808,21-07-2015,tropical fruit
1,2552,05-01-2015,whole milk
2,2300,19-09-2015,pip fruit
3,1187,12-12-2015,other vegetables
4,3037,01-02-2015,whole milk
...,...,...,...
38760,4471,08-10-2014,sliced cheese
38761,2022,23-02-2014,candy
38762,1097,16-04-2014,cake bar
38763,1510,03-12-2014,fruit/vegetable juice


In [20]:
# Convert the Date column to datetime. The strings are day-first (DD-MM-YYYY),
# so the format is given explicitly to avoid swapping day and month.
# assign() returns a new DataFrame, which also avoids the SettingWithCopyWarning.
df = df.assign(Date=pd.to_datetime(df['Date'], format='%d-%m-%Y'))
df

Unnamed: 0,Member_number,Date,itemDescription
0,1808,2015-07-21,tropical fruit
1,2552,2015-01-05,whole milk
2,2300,2015-09-19,pip fruit
3,1187,2015-12-12,other vegetables
4,3037,2015-02-01,whole milk
...,...,...,...
38760,4471,2014-10-08,sliced cheese
38761,2022,2014-02-23,candy
38762,1097,2014-04-16,cake bar
38763,1510,2014-12-03,fruit/vegetable juice
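Strings such as 05-01-2015 are ambiguous between a day-first and a month-first reading, so it is worth checking how they were parsed. The following is a minimal sketch, using three sample values from the Date column, of parsing with an explicit format and confirming the round trip. The variable names `raw` and `parsed` are illustrative.

```python
import pandas as pd

# Three sample values copied from the Date column (DD-MM-YYYY)
raw = pd.Series(["21-07-2015", "05-01-2015", "01-02-2015"])

# Explicit day-first format: 05-01-2015 is read as 5 January 2015
parsed = pd.to_datetime(raw, format="%d-%m-%Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2015-07-21', '2015-01-05', '2015-02-01']

# Sanity check: formatting back to DD-MM-YYYY should reproduce the input exactly
round_trip_ok = (parsed.dt.strftime("%d-%m-%Y") == raw).all()
print(round_trip_ok)
```

The round-trip check is a cheap guard: if day and month had been swapped for any row, the formatted string would no longer match the original.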


In [22]:
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [24]:
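# Group the rows into baskets: one list of items per (member, date) pair
basket_encoded = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index(name='items')

# Join each basket into a single space-separated string
basket_encoded['items'] = basket_encoded['items'].apply(lambda x: ' '.join(x))

# One-hot encode by splitting on spaces. Caution: this also splits multi-word
# item names ("whole milk" becomes "whole" and "milk"), so the columns are
# single words rather than products. A separator that never occurs in an item
# name, such as '|', would keep each product intact.
basket_encoded = basket_encoded['items'].str.get_dummies(' ')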

# Generate frequent itemsets using Apriori algorithm
frequent_itemsets = apriori(basket_encoded, min_support=0.1, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Display the association rules
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])


    antecedents   consequents   support  confidence      lift
0       (whole)        (milk)  0.157923    1.000000  5.570737
1        (milk)       (whole)  0.157923    0.879747  5.570737
2       (other)  (vegetables)  0.122101    1.000000  4.494743
3  (vegetables)       (other)  0.122101    0.548813  4.494743
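The antecedents and consequents printed above are single words such as (whole) and (milk) rather than full product names, which is what splitting basket strings on spaces produces. The following is a minimal sketch, on made-up toy rows in the same long format, of a one-hot encoding that keeps item names intact by using `pd.crosstab`. The `toy` frame and the `basket` name are assumptions for illustration only.

```python
import pandas as pd

# Toy transactions in the same long format as the dataset (values are made up)
toy = pd.DataFrame({
    "Member_number": [1, 1, 2, 2, 3],
    "Date": ["01-01-2015"] * 5,
    "itemDescription": [
        "whole milk", "other vegetables",
        "whole milk", "rolls/buns",
        "other vegetables",
    ],
})

# One row per (member, date) basket, one boolean column per full item name.
# crosstab counts occurrences; astype(bool) turns counts into presence flags.
basket = pd.crosstab(
    [toy["Member_number"], toy["Date"]],
    toy["itemDescription"],
).astype(bool)

print(basket.columns.tolist())
# ['other vegetables', 'rolls/buns', 'whole milk']
print(basket.shape)
# (3, 3)
```

A boolean frame of this shape is the input that mlxtend's `apriori` expects, so it could stand in for `basket_encoded`. With full product names, item pairs are likely to have much lower support than single words do, so `min_support` would probably need to be lowered before any rules appear.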


