![PREDICTING BUYING DECISIONS](PREDICTING%20BUYING%20DECISIONS.png)


Florence Books is looking to expand its online book club, which gives its subscribers access to 2 visual e-books every month. Florence Books did a trial run with 50,000 customers, and after collecting the data, it wants to understand which customers are more likely to subscribe to the online book club by examining their purchase histories and book club subscriptions from the trial run. 

The columns in the dataset are:

- `acctnum`: customer’s unique account number
- `gender`: customer’s gender, M/NB or F
- `first`: months since the customer’s first purchase in offline stores
- `last`: months since the customer’s last purchase in offline stores (a.k.a. recency)
- `total_`: customer’s total dollars spent in offline stores (a.k.a. monetary)
- `book_`: customer’s dollars spent on books in offline stores
- `nonbook_`: customer’s dollars spent on non-book products in offline stores
- `purch`: total # of books customer purchased in offline stores
- `child`: # of children’s books customer purchased in offline stores
- `youth`: # of young adult books customer purchased in offline stores
- `cook`: # of cookbooks customer purchased in offline stores
- `do_it`: # of DIY books customer purchased in offline stores
- `refernce`: # of reference books customer purchased in offline stores
- `art`: # of art books customer purchased in offline stores
- `geog`: # of geography and travel books customer purchased in offline stores
- `subscribe`: customer’s online book club subscription, yes/no


![SQL](SQL.png)


We will start by getting to know the data through some basic statistical analysis.

In [1]:
df  <- read.csv('df.csv')


Q1) Of the 50,000 customers, what are the average and standard deviation of total offline $ spent?

In [2]:
install.packages('sqldf')
library('sqldf')
sqldf("SELECT AVG(total_) , STDEV(total_) FROM df")
# The average of total offline $ spent is $208.3183
# The standard deviation of total offline $ spent is $101.3573


AVG(total_),STDEV(total_)
<dbl>,<dbl>
208.3183,101.3573
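
As a cross-check on the SQL aggregates, the same summary can be computed in base R. The sketch below uses a small illustrative data frame (`toy`, a hypothetical name, not the full Florence Books data). `sd()` uses the sample (n - 1) formula; that this matches what `STDEV` returns through sqldf is an assumption here, not something the notebook confirms.

```r
# Hypothetical toy data standing in for df; values are illustrative only.
toy <- data.frame(total_ = c(357, 138, 172, 272, 149))

# Base R counterparts of AVG() and STDEV()
avg_total <- mean(toy$total_)
sd_total  <- sd(toy$total_)   # sample standard deviation (divides by n - 1)

print(c(mean = avg_total, sd = sd_total))
```

On the real data, `mean(df$total_)` and `sd(df$total_)` would be the corresponding calls.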


Q2) Of the 50,000 customers, what are the average and standard deviation of total # offline books purchased?


In [3]:
sqldf("SELECT AVG(purch) , STDEV(purch) FROM df")
# The average of total # offline books purchased is 3.8902
# The standard deviation of total # offline books purchased is 3.4763

AVG(purch),STDEV(purch)
<dbl>,<dbl>
3.89022,3.47627


Q3) Of the 50,000 customers, what are the average and standard deviation of months since last offline purchase?

In [4]:
sqldf("SELECT AVG(last) , STDEV(last) FROM df")
# The average of months since last offline purchase is 12.3582
# The standard deviation of months since last offline purchase is 8.1531

AVG(last),STDEV(last)
<dbl>,<dbl>
12.35816,8.153091


Q4) What is the total number of customers that subscribed to the online book club, by gender?

In [5]:
sqldf("SELECT COUNT(subscribe), gender
FROM df
WHERE subscribe = '1'
GROUP BY gender")

# total number of F customers that subscribed to the online book club = 2389
# total number of M/NB customers that subscribed to the online book club = 2133

COUNT(subscribe),gender
<int>,<chr>
2389,F
2133,M/NB


Q5) What % of F and M/NB customers subscribed to the book club?

In [6]:
sqldf("SELECT COUNT(acctnum) , gender
FROM df
GROUP BY gender")
# total count of F customers = 33302
# total count of M/NB customers = 16698

female.percentage <- (2389/33302)*100
female.percentage
# % of F customers who subscribed to the book club is 7.17%
male.percentage <- (2133/16698) * 100
male.percentage
# % of M/NB customers who subscribed to the book club is 12.77%


COUNT(acctnum),gender
<int>,<chr>
33302,F
16698,M/NB
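
The percentages above rely on counts typed in by hand. A minimal sketch of computing the rate per group directly is shown below, assuming `subscribe` is coded 0/1 as in the data preview. The `toy` data frame is made up for illustration, and the second part simply repeats the arithmetic with the counts reported in Q4 and Q5.

```r
# Hypothetical toy data; in the notebook this would be the full df.
toy <- data.frame(
  gender    = c("F", "F", "F", "F", "M/NB", "M/NB", "M/NB", "M/NB"),
  subscribe = c(1, 0, 0, 0, 1, 1, 0, 0)
)

# The mean of a 0/1 column is the share of 1s, so this is the rate per group.
rate_by_gender <- tapply(toy$subscribe, toy$gender, mean) * 100
print(round(rate_by_gender, 2))

# The same arithmetic using the counts reported in the query outputs
reported <- c(F = 2389 / 33302, `M/NB` = 2133 / 16698) * 100
print(round(reported, 2))
```

Computing the rate from the data avoids copying counts between cells, which is where transcription slips tend to creep in.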


In [7]:
df 


acctnum,gender,first,last,book_,nonbook_,total_,purch,child,youth,cook,do_it,refernce,art,geog,subscribe
<int>,<chr>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>
10001,M/NB,49,29,109,248,357,10,3,2,2,0,1,0,2,0
10002,M/NB,39,27,35,103,138,3,0,1,0,1,0,0,1,0
10003,F,19,15,25,147,172,2,0,0,2,0,0,0,0,0
10004,F,7,7,15,257,272,1,0,0,0,0,1,0,0,0
10005,F,15,15,15,134,149,1,0,0,1,0,0,0,0,0
10006,F,7,7,15,98,113,1,0,1,0,0,0,0,0,1
10007,M/NB,25,25,15,0,15,1,0,0,0,1,0,0,0,0
10008,M/NB,41,1,124,114,238,11,2,1,2,3,0,0,3,0
10009,F,65,5,130,288,418,11,0,2,3,2,0,3,1,1
10010,F,11,11,15,108,123,1,0,1,0,0,0,0,0,0


In [8]:
df$IsFemale <- with(df, ifelse(gender == 'F', '1', '0'))

In [9]:
set.seed(123)
pred <- lm(total_ ~ IsFemale + first + child + youth + cook + do_it + refernce + art + geog, data = df)
summary(pred)




Call:
lm(formula = total_ ~ IsFemale + first + child + youth + cook + 
    do_it + refernce + art + geog, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-153.021  -75.383   -0.288   75.390  151.728 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 149.62753    0.95088 157.358   <2e-16 ***
IsFemale1     0.67856    0.86483   0.785    0.433    
first        -0.03351    0.03831  -0.874    0.382    
child        15.26421    0.42152  36.212   <2e-16 ***
youth        15.41773    0.62667  24.603   <2e-16 ***
cook         15.65582    0.40568  38.592   <2e-16 ***
do_it        15.00801    0.58779  25.533   <2e-16 ***
refernce     14.51158    0.69844  20.777   <2e-16 ***
art          14.46004    0.63037  22.939   <2e-16 ***
geog         15.18183    0.52843  28.730   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 86.87 on 49990 degrees of freedom
Multiple R-squared:  0.2656,	Adjusted R-sq

This linear regression model aims to predict the value of `total_` by considering various predictors, including demographic factors and interests. Here's how to interpret the results for each variable:

The intercept, which is 149.62753, represents the expected value of `total_` when all independent variables (`IsFemale`, `first`, `child`, `youth`, `cook`, `do_it`, `refernce`, `art`, `geog`) are zero. This serves as a starting point for predictions and is statistically significant with a p-value `<2e-16`.

Regarding gender (`IsFemale1`), being female (coded as `1`) increases the predicted value of `total_` by approximately 0.67856, but this effect is not statistically significant (p = 0.433), suggesting it doesn't significantly impact `total_` when other variables are controlled.

The variable `first`, representing months since the customer's first purchase, has a coefficient of -0.03351. This implies that each additional month since the first purchase decreases the predicted value of `total_` by approximately $0.03, but the effect is not statistically significant (p = 0.382).

In contrast, the book-category counts (`child`, `youth`, `cook`, `do_it`, `refernce`, `art`, and `geog`) all have positive coefficients ranging from 14.46004 to 15.65582. Each additional book purchased in a category raises the predicted value of `total_` by that category's coefficient, roughly $14.46 to $15.66, and all are highly statistically significant (p `<2e-16`). Because `total_` includes the dollars spent on books, part of this effect is mechanical: buying one more book adds its price to total spending.

The model summary indicates that about 26.56% of the variability in `total_` can be explained by the predictors. The F-statistic of 2009 with a p-value `< 2.2e-16` suggests that the overall model is statistically significant.

In conclusion, while gender and customer tenure (`first`, months since the first purchase) do not significantly influence `total_`, the number of books purchased in each category (`child`, `youth`, `cook`, `do_it`, `refernce`, `art`, and `geog`) has a notable impact. This suggests that customers' purchasing behaviors and preferences for certain book categories play a significant role in determining their total spending at Florence Books.

# A logistic regression with variables representing recency, monetary, IsFemale, and all types of books purchased. 

In [10]:

pred2 <- glm(formula = subscribe ~ last + total_ + IsFemale + child + youth + cook + do_it + refernce + art + geog, family = binomial(link = "logit"), data = df)

summary(pred2)


Call:
glm(formula = subscribe ~ last + total_ + IsFemale + child + 
    youth + cook + do_it + refernce + art + geog, family = binomial(link = "logit"), 
    data = df)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.6001096  0.0520980 -30.713  < 2e-16 ***
last        -0.0947124  0.0027924 -33.918  < 2e-16 ***
total_       0.0011160  0.0001982   5.630 1.80e-08 ***
IsFemale1   -0.7607204  0.0357608 -21.272  < 2e-16 ***
child       -0.1862162  0.0172824 -10.775  < 2e-16 ***
youth       -0.1129745  0.0261087  -4.327 1.51e-05 ***
cook        -0.2703210  0.0171283 -15.782  < 2e-16 ***
do_it       -0.5391648  0.0269657 -19.994  < 2e-16 ***
refernce     0.2346876  0.0265583   8.837  < 2e-16 ***
art          1.1555840  0.0221439  52.185  < 2e-16 ***
geog         0.5742763  0.0186311  30.824  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 30355  on 

The logistic regression analysis reveals several key insights into the factors influencing customers' likelihood of subscribing to Florence Books' online book club. Firstly, customers who have purchased recently in offline stores are more inclined to subscribe: each additional month since the last purchase lowers the log-odds of subscribing by about 0.095. Higher total offline spending raises the odds of subscribing, although the effect per dollar is small (about 0.0011 in log-odds). Gender is also a significant predictor, with M/NB customers showing a higher propensity to subscribe than F customers. Furthermore, the types of books purchased play a crucial role: art, geography, and reference books are associated with a higher subscription likelihood, while children's, young adult, cooking, and DIY books are associated with a lower one. These findings collectively point toward targeting recent, higher-spending M/NB customers who buy art, geography, and reference books in Florence Books' promotional efforts for its online book club.
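
Logistic regression coefficients are on the log-odds scale, so they can be easier to read after exponentiating them into odds ratios. The sketch below does this with the estimates copied from the `summary(pred2)` output above; the vector name `coefs` is illustrative. With the fitted model in memory, `exp(coef(pred2))` should give the same numbers at full precision.

```r
# Coefficients copied (rounded) from the summary(pred2) output
coefs <- c(last = -0.0947124, total_ = 0.0011160, IsFemale1 = -0.7607204,
           child = -0.1862162, youth = -0.1129745, cook = -0.2703210,
           do_it = -0.5391648, refernce = 0.2346876, art = 1.1555840,
           geog = 0.5742763)

# exp(coefficient) = multiplicative change in the odds of subscribing
# for a one-unit increase in that predictor, holding the others fixed
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))

# Effect of an extra $100 of offline spending instead of an extra $1
print(round(exp(100 * coefs[["total_"]]), 3))
```

An odds ratio above 1 means higher odds of subscribing per unit increase, and one below 1 means lower odds.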

In [13]:
# predicting the probabilities of subscription, classifying customers, and profiling them into deciles based on their likelihood of subscribing to the online book club.
library(dplyr)
# predict probabilities
df$PredProb <- pred2 %>% predict(df, type = "response")
# predict class: if prob > 50% then subscribed, otherwise, not subscribed
df$IsSubscribe_Pred <- ifelse(df$PredProb>.5,1,0)
#calculate deciles of predicted probabilities, negative so that decile 1 has the highest values 
df$PredProbDecile<-ntile(-df$PredProb,10)
head(df)

Unnamed: 0_level_0,acctnum,gender,first,last,book_,nonbook_,total_,purch,child,youth,cook,do_it,refernce,art,geog,subscribe,IsFemale,PredProb,IsSubscribe_Pred,PredProbDecile
Unnamed: 0_level_1,<int>,<chr>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<int>,<chr>,<dbl>,<dbl>,<int>
1,10001,M/NB,49,29,109,248,357,10,3,2,2,0,1,0,2,0,0,0.020029,0,8
2,10002,M/NB,39,27,35,103,138,3,0,1,0,1,0,0,1,0,0,0.01660684,0,9
3,10003,F,19,15,25,147,172,2,0,0,2,0,0,0,0,0,1,0.01582522,0,9
4,10004,F,7,7,15,257,272,1,0,0,0,0,1,0,0,0,1,0.07687632,0,4
5,10005,F,15,15,15,134,149,1,0,0,1,0,0,0,0,0,1,0.02012333,0,8
6,10006,F,7,7,15,98,113,1,0,1,0,0,0,0,0,1,1,0.04694578,0,6
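
To see where a `PredProb` value comes from, the first row (account 10001) can be reproduced by hand from the coefficients reported by `summary(pred2)`. This is a sketch using the rounded coefficients from that output, so it should match the table only approximately; `coefs` and `x` are illustrative names, not objects from the notebook.

```r
# Rounded coefficients from summary(pred2)
coefs <- c(`(Intercept)` = -1.6001096, last = -0.0947124, total_ = 0.0011160,
           IsFemale1 = -0.7607204, child = -0.1862162, youth = -0.1129745,
           cook = -0.2703210, do_it = -0.5391648, refernce = 0.2346876,
           art = 1.1555840, geog = 0.5742763)

# Predictor values for account 10001 (gender M/NB, so IsFemale1 = 0)
x <- c(`(Intercept)` = 1, last = 29, total_ = 357, IsFemale1 = 0,
       child = 3, youth = 2, cook = 2, do_it = 0, refernce = 1,
       art = 0, geog = 2)

# Linear predictor (log-odds), then the inverse logit to get a probability
eta <- sum(coefs * x[names(coefs)])
p   <- plogis(eta)   # equivalent to 1 / (1 + exp(-eta))

print(round(c(log_odds = eta, probability = p), 4))
```

The result can be compared with the `PredProb` of 0.020029 shown for this account in the table above.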


## Interpretations and recommendations for promotional targeting:
 
### Top Decile (Decile 1):
Customers in this segment exhibit the highest predicted probabilities of subscribing to the online book club. They represent the most receptive audience and are primed for conversion. To capitalize on this segment:

- **Tailored Incentives:** Offer exclusive discounts, loyalty rewards, or bundled subscription packages to incentivize immediate action and reinforce brand loyalty.
- **VIP Treatment:** Provide personalized communications, early access to new releases, and priority customer support to enhance their subscription experience and foster long-term engagement.
- **Referral Programs:** Encourage enthusiastic subscribers to advocate for the book club by offering referral bonuses or hosting referral contests to expand the subscriber base organically.

### Bottom Decile (Decile 10):
Customers in this group have the lowest predicted probabilities of subscribing, presenting a challenge for conversion. However, with targeted efforts, they can still be engaged. Strategies for this segment include:

- **Reactivation Campaigns:** Deploy targeted emails with compelling offers, limited-time discounts, or free trials to reignite interest and encourage reconsideration.
- **Educational Content:** Provide informative content highlighting the value proposition of the book club, testimonials from satisfied subscribers, and success stories to address potential objections and build trust.
- **Follow-Up Surveys:** Seek feedback from non-subscribers to understand their reservations or barriers to subscription and tailor future marketing initiatives accordingly to address concerns effectively.

### Middle Deciles (Deciles 2-9):
Customers falling within these deciles exhibit varying levels of propensity to subscribe. To effectively target this diverse group:

- **Segment-Specific Messaging:** Customize promotional messages based on demographic characteristics, past purchase behavior, and interests to resonate with each subgroup.
- **Multichannel Engagement:** Utilize a mix of communication channels, including email, social media, and targeted advertisements, to reach customers where they are most receptive and reinforce brand visibility.
- **Progressive Nurturing:** Implement a graduated approach to lead nurturing: introduce the value proposition of the book club, provide relevant content, and step up incentive offerings over time to move customers closer to conversion.

### Overall Recommendations:
- **Continuous Iteration:** Regularly analyze customer feedback, campaign performance metrics, and predictive modeling insights to refine targeting strategies and optimize promotional initiatives.
- **Customer-Centric Approach:** Prioritize customer needs and preferences by delivering personalized experiences, fostering meaningful connections, and actively listening to feedback to drive engagement and retention.
- **Experimentation and Optimization:** Embrace a culture of experimentation by testing new messaging, offers, and promotional channels, and leveraging insights from A/B testing to iterate and refine marketing strategies over time.
- **Long-Term Relationship Building:** Focus on cultivating enduring relationships with subscribers beyond the initial conversion by delivering ongoing value, fostering community engagement, and adapting to evolving customer preferences to drive sustained growth and loyalty.

By implementing these tailored recommendations based on predictive decile analysis, Florence Books can optimize its promotional targeting efforts, drive higher subscription conversions, and foster lasting relationships with its online book club members.
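
Before acting on decile-based targeting, it helps to check that the deciles actually separate subscribers from non-subscribers. The sketch below builds a simple decile profile on simulated data (not the Florence Books data), following the notebook's code, where decile 1 holds the highest predicted probabilities. The names `pred_prob`, `actual`, and `profile` are hypothetical, and the `rank()` expression is a base R stand-in for `ntile(-PredProb, 10)`.

```r
set.seed(42)  # simulated data only, for illustration
n <- 1000
pred_prob <- runif(n, 0, 0.4)         # hypothetical predicted probabilities
actual    <- rbinom(n, 1, pred_prob)  # hypothetical 0/1 subscription outcomes

# Rank from highest to lowest score and cut into 10 equal-sized groups,
# so decile 1 contains the highest predicted probabilities
decile <- ceiling(rank(-pred_prob, ties.method = "first") * 10 / n)

profile <- data.frame(
  decile        = 1:10,
  customers     = as.vector(table(decile)),
  avg_pred_prob = as.vector(tapply(pred_prob, decile, mean)),
  actual_rate   = as.vector(tapply(actual, decile, mean))
)

# Lift: subscription rate in the decile relative to the overall rate
profile$lift <- profile$actual_rate / mean(actual)
print(round(profile, 3))
```

On the real data, the same table could be built from `df$PredProbDecile`, `df$PredProb`, and `df$subscribe`; a lift well above 1 in the first deciles would support prioritizing them.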



# Email Campaign 




Subject Line: Unlock a World of Literary Adventures with Our Online Book Club!

Hello [Recipient's Name],

Are you a book lover looking for a new way to explore the magical realms of literature? We have an exciting opportunity for you! Join Florence Books' Online Book Club and embark on an unforgettable journey through captivating stories and enriching experiences.

Imagine having a vast library of visually stunning e-books at your fingertips, carefully curated by our team of passionate bibliophiles. From timeless classics to contemporary bestsellers, our diverse collection has something for every reader to enjoy. Rediscover the joy of reading in a whole new way with e-books tailored to your unique tastes and preferences.

But don't just take our word for it! Hear what our satisfied members have to say:

"I've been a member for months, and I couldn't be happier! The quality of the e-books and the sense of community make it a truly enriching experience." - Emily

"I love being part of the book club! It's like having a book club meeting wherever I go, and I've discovered so many amazing reads." - Mark

As a valued customer, we're extending a special offer just for you. Sign up today, and receive an exclusive discount on your first month's subscription. Don't miss this chance to unlock a world of literary wonders and embark on a journey of discovery and imagination.

Ready to dive in? Click the link below to join our online book club and start your literary adventure today! Experience the magic of storytelling and connect with fellow book lovers from around the globe.

[Sign Up Now]

But that's not all! As a member of our online book club, you'll also enjoy:

- Access to member-only events and author Q&A sessions, where you can interact with your favorite authors and discuss their works in depth.
- Engaging discussions with fellow book enthusiasts, providing you with insights, recommendations, and perspectives on the books you love.
- Personalized book recommendations tailored to your interests and reading preferences, ensuring that you always have your next great read at your fingertips.

Happy reading!

Warmly,
The Florence Books Team

P.S. Don't let this opportunity slip away! Join our vibrant community of book lovers and unlock a world of literary adventures today.






Target Audience:

1. **Avid Readers (Top 10%):** Let's first focus on our most enthusiastic book lovers – the customers who are most likely to join our online book club based on their reading habits and interests.

2. **Casual Readers (Middle 80%):** Next, we'll reach out to those who have shown some interest in books but haven't yet taken the plunge into our book club. With a personalized approach, we can ignite their passion for reading and showcase the benefits of our community.

3. **Potential Readers (Bottom 10%):** Even those who haven't shown much interest in reading recently could be enticed by our irresistible offers and engaging content. Let's not leave any book lover behind and give them a chance to rediscover the joy of reading.

Timing:

- **Strategic Rollout:** To maximize engagement, we'll send the email at peak times when our customers are most likely to be browsing their inboxes, based on historical data and insights. Alternatively, we can stagger the send times to reach readers across different time zones and ensure optimal open rates.

- **Personalized Follow-up:** We'll tailor our follow-up emails based on each customer's interactions and behavior. For those who haven't opened the initial email, we'll send friendly reminders. And for those who have shown interest but haven't subscribed yet, we'll offer additional incentives to seal the deal.

Design:

- **Mobile-Friendly:** In today's on-the-go world, we'll ensure our email is optimized for mobile devices, allowing readers to enjoy a seamless experience no matter where they are.

- **Visually Captivating:** With stunning graphics, images, and a visually appealing layout, we'll capture the attention of our readers and immerse them in the world of literature.

- **Irresistible Call-to-Action:** A prominent and compelling CTA button will encourage readers to take the next step and join our online book club with just a click.

Metrics to Track:

1. **Open Rate:** We'll measure the percentage of recipients who open our email to gauge initial engagement and interest.

2. **Click-Through Rate (CTR):** By tracking the percentage of recipients who click on our CTA button, we'll know how effectively we're enticing readers to join our book club.

3. **Conversion Rate:** Monitoring the percentage of recipients who successfully complete the sign-up process will help us assess the overall effectiveness of our campaign.

4. **Subscriber Growth:** Evaluating the overall increase in our subscriber base after the email campaign will quantify its impact on our business growth.

5. **Engagement Metrics:** Post-subscription, we'll analyze metrics like e-book downloads, community participation, and retention rates to measure the long-term success and value of our online book club for our members.
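
The first three metrics above are simple ratios. A minimal sketch of the arithmetic follows, using made-up campaign counts (all four numbers are hypothetical) and taking each rate as a share of emails sent, which is one common convention; some teams instead divide clicks by opens.

```r
# Hypothetical campaign counts, for illustration only
sent      <- 5000
opened    <- 1500
clicked   <- 300
signed_up <- 60

# Each metric expressed as a percentage of emails sent
metrics <- c(
  open_rate       = opened / sent,
  click_through   = clicked / sent,
  conversion_rate = signed_up / sent
) * 100

print(round(metrics, 1))
```

Whichever denominator is chosen, it should be kept the same across deciles so that segment results are comparable.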

By implementing this targeted email campaign, Florence Books can leverage predictive analytics insights to drive conversions, foster customer engagement, and cultivate a vibrant and loyal online book club community.