<a id ='top'></a>

# 1.0 Personas and as-is Customer Journeys
The foundation of our personas uses the analysis of Lavazza’s Twitter followers’ descriptions  as a building block, especially for Andrea and Arianna. Marta, instead, was framed relying on our personal knowledge and experience, as she is close to us, and on survey-like questions to the friends of ours that matched her main features. The specific characterizations are based on empathization, done on several data sources (mostly: internal material, other insights obtained through Twitter Api, readings of Amazon reviews of several coffee products) as well as by itself. For example, we can use Facebook ads creation to create a query combining interest for aperitif or coffee to being a universitary student 18-25 woman in Italy, thus obtaining music interests for one of our personas.


Hereby we present three personas that we identified, and their as-is customer journeys. In a section 2.0, instead, we will try to figure out how these personas might interface with our product, thus building customer journeys for our kits.

## Index

##### Codes:


- [Tf-idf on users description](#tf-idf)


- [Pages liked by lavazza followers](#pages_liked)

##### Personas:

- [Arianna, the traveller](#Arianna)
    > Description
    >
    > Customer Journey as-is
    >
    > Market size
- [Andrea, the business man](#Andrea)
    > Description
    >
    > Customer Journey as-is
    >
    > Market size
- [Marta, the off-site student](#Marta)
    > Description
    >
    > Customer Journey as-is
    >
    > Market size

<a id = 'tf-idf'> </a> 

## Tf-idf on users descriptions

In this section we use information retrieval on the descriptions of Lavazza's followers. Notably, after some pre-processing and cleaning, we use tf-idf, which is a technique that allows to filter out non-relevant information. Indeed, it returns only words (or, in better NLP language, tokens) that are very frequent in some descriptions but don't occur in most of the descriptions, otherwise they would be just noise.

Given the preprocessing and given the nature and diversity of this dataset, running a tf-idf is a bit of an overkill, as we obtain the same information given by simply collecting term frequency; though we can easily obtain both at the same time, and it's extremely fast anyway, so there was no reason not to implement it.

From the info obtained, we framed 2 of our personas: Andrea and Arianna. Notably, we will be able to see a clear interest in travelling, as well as a general background in business-related jobs

In [7]:
''' loading db and libraries'''

import sqlite3
import re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import pandas as pd

# path
b_dir= r'C:/Users/Eugen/Documents/Uni/1 Marketing/API stuff' 

database_to_use = 'Brand_Followers.db'

# connect to the db
con = sqlite3.connect(database_to_use) 
cur = con.cursor()

''' loading db and libraries'''

In [6]:
def extract_descriptions(page, con):
    
    """ Connects to the DB and extracts to a list the descriptions of specified page users."""
    
    query="""
        SELECT description 
        FROM Users 
        WHERE twitter_page == '{}' 
        GROUP BY (user_id)  
        """.format(page)  #querying like this solves duplication issues
    
    # Reads query and takes column 'descriptions' transforming it to a list
    users_descriptions=pd.read_sql_query(query, con).description.to_list()
    print("we have {} user descriptions".format(len(users_descriptions))) # useless knowledge
    
    return users_descriptions


def bruteforce_clean(l):
    
    """keeps only words with latin letters ( --> no arab/idiograms/emojis)
    through regular expressions and returns a list."""
    
    # Strangely enough there are many hyperlinks to get rid of
    descr=[re.sub("http\S+", '', d).lower() for d in l if d] 
    
    # Keeps only these [a-zA-Z]
    only_latin_letters=[re.sub('[^a-zA-Z]+', ' ',d) for d in descr if d]
    
    # Creates a set of stopwords (every language in NLTK library)
    stopWords = set(stopwords.words())
    
    # Remove stopwords and lemmatize
    #clean=[" ".join([w.lemma_ for w in nlp(words) if w not in stopWords]) for words in only_latin_letters]
    # Removes stopwords
    clean=[" ".join([w for w in words.split() if w not in stopWords]) for words in only_latin_letters]
    print('We have {} non empty descriptions'.format(len(clean))) 
    
    return clean

In [7]:
# extract descriptions from Lavazza's followers 
page = "lavazzagroup"
descr = extract_descriptions(page, con)
descr

we have 13991 user descriptions


['http://t.co/yo2Caie6NV',
 'enlarge your beauty #Luminol https://t.co/6Rkb1lbBep',
 'Scuola 2.0 - Blogger-Social media observer. Comunicazione&SMM. Life lover (Opinioni mie, RT is not endorsement)',
 'Lavoro in televisione, non escludo in futuro di lavorare in altri piccoli elettrodomestici. Digital Area @ Mediaset. Add ➡️ https://t.co/HHW2Sfpcqa',
 'Human-Centered Design certified Facilitator by @ideorg & @plusacumen - Design Thinking, Design Sprint, Lego Serious Play, @lean event producer',
 'Presidente del #DigitalTransformation Institute. Docente (Sapienza e Carlo Bo). Giornalista (direttore di https://t.co/PvaGAiWyZS). Advisor per le Nazioni Unite',
 '🇨🇦 Flutist, conductor and publisher of https://t.co/Zfju8IeZBe',
 'Keep youself low but not too low',
 'Leicester bloke. Enjoys taking photos, getting outdoors, history, prehistory, good music, a good read & a spot of red wine. Recurve archer.',
 '#Aschaffenburg, https://t.co/wBkfYjyA9S, https://t.co/TxTsmLLoin,',
 'Blogger (non) po

In [8]:
# Clean
clean=bruteforce_clean(descr)
clean

We have 8809 non empty descriptions


['enlarge beauty luminol',
 'scuola blogger social media observer comunicazione smm life lover opinioni rt endorsement',
 'lavoro televisione escludo futuro lavorare altri piccoli elettrodomestici digital area mediaset add',
 'human centered design certified facilitator ideorg plusacumen design thinking design sprint lego serious play lean event producer',
 'presidente digitaltransformation institute docente sapienza carlo giornalista direttore advisor nazioni unite',
 'flutist conductor publisher',
 'keep youself low low',
 'leicester bloke enjoys taking photos getting outdoors history prehistory good music good read spot red wine recurve archer',
 'aschaffenburg',
 'blogger portatore verbo journalist designer bad tennis table player social media cars lover founder stylology',
 'partner sales quest mvp office microsoftteams headset geek blogger podcaster bds soz wol aufreisen ragnar nuggets wfh homeschooling',
 'entrepreneur kona co founder ceo ironman triathlete helping companies pro

In [9]:
# Only uniigrams. Not setting laguages for vectorizers because we have many, only analyzing words

vectorizer = CountVectorizer(analyzer='word', min_df=0.001, max_df=0.9)

X11 = vectorizer.fit_transform(clean)
print("Unigrams counts shape: {}".format(X11.shape))

tfidf_vectorizer = TfidfVectorizer(analyzer='word',
                                   min_df=0.001,
                                   max_df=0.9,   
                                   sublinear_tf=True) 

X12 = tfidf_vectorizer.fit_transform(clean)

# sets up df
df = pd.DataFrame(data={'word': vectorizer.get_feature_names(), 
                        'tf': X11.sum(axis=0).A1, 
                        'idf': tfidf_vectorizer.idf_,  
                        'tfidf': X12.sum(axis=0).A1  
                       })

#sorting dataframe and showing
df = df.sort_values(['tfidf', 'tf', 'idf'], ascending=False).reset_index(drop=True)
df.head(20)

Unigrams counts shape: (8809, 1344)


Unnamed: 0,word,tf,idf,tfidf
0,coffee,504,4.067486,141.443068
1,food,482,3.999143,119.5944
2,love,357,4.324741,112.034737
3,marketing,333,4.350301,97.579056
4,life,265,4.55819,92.743586
5,lover,232,4.671997,76.793141
6,italian,236,4.717667,73.031784
7,vita,145,5.149169,66.79673
8,digital,193,4.841896,63.827335
9,music,192,4.863287,62.756169


In [10]:
# Only bigrams. Not setting laguages for vectorizers because we have many, only analyzing words
vectorizer2 = CountVectorizer(analyzer='word', min_df=0.001, max_df=0.8, ngram_range=(2,2))

X21 = vectorizer2.fit_transform(clean) # Counts

print("Bi-grams - Counts shape: {}".format(X21.shape))

tfidf_vectorizer_2 = TfidfVectorizer(analyzer='word',  
                                   min_df=0.001,  
                                   max_df=0.8,   
                                   sublinear_tf=True,
                                   ngram_range=(2,2))  

X22 = tfidf_vectorizer_2.fit_transform(clean) #tfidf 

# sets up df
df2 = pd.DataFrame(data={'word': vectorizer2.get_feature_names(), 
                        'tf': X21.sum(axis=0).A1,  
                        'idf': tfidf_vectorizer_2.idf_,  #idf
                        'tfidf': X22.sum(axis=0).A1 
                       })

#sorting dataframe and showing
df2 = df2.sort_values(['tfidf', 'tf', 'idf'], ascending=False).reset_index(drop=True)
df2.head(20) 

Bi-grams - Counts shape: (8809, 79)


Unnamed: 0,word,tf,idf,tfidf
0,social media,96,5.519295,73.595681
1,food wine,41,6.345973,33.547477
2,digital marketing,36,6.472725,30.999132
3,made italy,31,6.617907,29.723646
4,co founder,29,6.751438,25.6226
5,coffee lover,24,6.864767,23.395884
6,official twitter,27,6.751438,22.453823
7,food beverage,20,7.03912,19.087723
8,food travel,21,6.9926,17.558621
9,italian food,19,7.08791,17.389676


<a id ='pages_liked'></a>
## Liked pages

- [back on top](#top)

In [77]:
from collections import Counter

def common_pages(df, page, n):
    
    """Inputs:
    -"page" of interest (eg 'lavazzagroup'), 
    -"n" most liked pages for the queried page users 
    - "con" connection to DB
    through regular expressions splits-extracts pages.
    returns a distionary with the n most common pages counts"""
    
    users_likes_list=df.pages_liked.to_list() #transform to list
    
    #for each user (with not empty pages_like field) we create a list where every element is a liked page [list of list of string]
    liked_reed= [user_likes.split(" _***_ ") for user_likes in users_likes_list if user_likes] 
    print(f"We have {len(liked_reed)} users with pages likes in '{page}'.\n") #how many users were not empty
    
    #instance counter and counts
    cnt=Counter()
    for i in liked_reed:
        cnt.update(i)
        
    return cnt.most_common(n) 

In [78]:
page = 'lavazzagroup'

query="""
SELECT pages_liked, description 
FROM Users 
WHERE twitter_page == '{}' 
GROUP BY user_id """.format(page)  #querying that solves eventual duplicate ids issues
df = pd.read_sql_query(query, con)


lavazza = common_pages(df, page, 50)
lavazza

We have 13984 users with pages likes in 'lavazzagroup'.



[('lavazzagroup', 13775),
 ('BarackObama', 4363),
 ('YouTube', 3523),
 ('repubblica', 3410),
 ('nytimes', 3394),
 ('Twitter', 3368),
 ('Corriere', 3222),
 ('Pontifex_it', 3191),
 ('SkyTG24', 3056),
 ('matteorenzi', 2852),
 ('lorenzojova', 2821),
 ('LaStampa', 2761),
 ('sole24ore', 2757),
 ('Agenzia_Ansa', 2627),
 ('Expo2015Milano', 2626),
 ('fattoquotidiano', 2454),
 ('CNN', 2446),
 ('BBCBreaking', 2442),
 ('BillGates', 2359),
 ('realDonaldTrump', 2315),
 ('instagram', 2315),
 ('RaiNews', 2312),
 ('ValeYellow46', 2300),
 ('radiodeejay', 2300),
 ('Fiorello', 2295),
 ('cnnbrk', 2290),
 ('redazioneiene', 2250),
 ('BBCWorld', 2233),
 ('TheEconomist', 2223),
 ('Google', 2213),
 ('Internazionale', 2206),
 ('NASA', 2169),
 ('robertosaviano', 2117),
 ('katyperry', 2113),
 ('WSJ', 2112),
 ('NicoSavi', 2023),
 ('reportrai3', 2010),
 ('NatGeo', 2007),
 ('ladygaga', 1990),
 ('Starbucks', 1980),
 ('illyIT', 1979),
 ('ilpost', 1972),
 ('dolcegabbana', 1946),
 ('Reuters', 1902),
 ('beppesevergnini', 

In [86]:
# We have constructed a somewhat slow function will
# allow us to get the most liked pages by a subset
# of Lavazza's followers who have certain words
# in their description, to better characterize and 
# understand our personas

import spacy
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

df2 = df[df.notnull()]["description"]

def filter_descriptions(df2, word1, word2, word3):
    i = 0
    l = []
    for description in df2.to_numpy():
        for word in nlp(str(description)):
            if word1 in word.lemma_.lower() or word2 in word.lemma_.lower() or word2 in word.lemma_.lower(): 
                l.append(i)
        i += 1
    print(len(l))
    return l

In [87]:
l = filter_descriptions(df2, "travel", "viaggi", "viaggiare")
df3 = df.loc[l]
common_pages(df3, page, 50)

# considering only Lavazza's followers who have travel related
# words in their descriptions, we notice 2 particularly 
# interesting results: about 1 on 4 likes TravelLeisure and/or
# luxury__travel

402
We have 401 users with pages likes in 'lavazzagroup'.



[('lavazzagroup', 382),
 ('BarackObama', 142),
 ('nytimes', 140),
 ('lonelyplanet', 132),
 ('Twitter', 111),
 ('repubblica', 105),
 ('BBCBreaking', 104),
 ('NatGeo', 104),
 ('instagram', 103),
 ('Expo2015Milano', 102),
 ('CNN', 101),
 ('BBCWorld', 99),
 ('SlowFoodItaly', 97),
 ('YouTube', 97),
 ('Corriere', 97),
 ('TripAdvisor', 96),
 ('CNTraveler', 93),
 ('TravelLeisure', 92),
 ('cnnbrk', 91),
 ('Starbucks', 87),
 ('lorenzojova', 86),
 ('WSJ', 85),
 ('NatGeoTravel', 83),
 ('TIME', 83),
 ('SkyTG24', 83),
 ('HuffPost', 83),
 ('NewYorker', 82),
 ('TheEconomist', 82),
 ('Forbes', 82),
 ('Google', 82),
 ('Pontifex_it', 82),
 ('ilpost', 81),
 ('katyperry', 81),
 ('TheEllenShow', 80),
 ('lonelyplanet_it', 80),
 ('cntraveller', 80),
 ('jamieoliver', 79),
 ('LaStampa', 79),
 ('foodandwine', 78),
 ('viaggiatori', 78),
 ('wireditalia', 77),
 ('Alitalia', 77),
 ('ilGamberoRosso', 76),
 ('Internazionale', 76),
 ('Agenzia_Ansa', 76),
 ('Italia', 75),
 ('BillGates', 75),
 ('sole24ore', 75),
 ('Reute

In [88]:
l = filter_descriptions(df2, "marketing", "manager", "business")
df3 = df.loc[l]
common_pages(df3, page, 50)

# considering business related folks.
# Some highlights include Expo2015Milano, spinozait, mashable, 
# TEDTalks, TechCrunch

548
We have 547 users with pages likes in 'lavazzagroup'.



[('lavazzagroup', 536),
 ('BarackObama', 227),
 ('repubblica', 193),
 ('nytimes', 184),
 ('sole24ore', 179),
 ('Twitter', 171),
 ('wireditalia', 167),
 ('Corriere', 164),
 ('YouTube', 161),
 ('Google', 159),
 ('ninjamarketing', 152),
 ('Forbes', 152),
 ('WIRED', 148),
 ('Internazionale', 148),
 ('Expo2015Milano', 147),
 ('SkyTG24', 146),
 ('matteorenzi', 145),
 ('instagram', 145),
 ('TheEconomist', 142),
 ('BillGates', 141),
 ('LaStampa', 137),
 ('spinozait', 135),
 ('mashable', 135),
 ('Agenzia_Ansa', 135),
 ('Pontifex_it', 133),
 ('fattoquotidiano', 131),
 ('BBCBreaking', 130),
 ('WSJ', 128),
 ('skande', 128),
 ('CNN', 126),
 ('TechCrunch', 126),
 ('hootsuite', 124),
 ('BBCWorld', 120),
 ('startup_italia', 119),
 ('Starbucks', 118),
 ('lorenzojova', 116),
 ('RiccardoLuna', 115),
 ('TIME', 114),
 ('Barilla', 114),
 ('SlowFoodItaly', 113),
 ('TEDTalks', 112),
 ('beppesevergnini', 112),
 ('RudyBandiera', 108),
 ('cnnbrk', 106),
 ('LinkedIn', 105),
 ('Reuters', 104),
 ('richardbranson', 

<a id ='Arianna'></a>

# <font color = #00B0F0 > Arianna, the traveller 
[Back on top](#top)
 
## <font color = #00B0F0 >  1.1	 Arianna’s description 
Personal description:
- Citizen of the world
-	Student and/or worker, usually engaged in several activities, whether they are for fun, for helping out others or to earn money for future travels
-	Lives in the present, but she likes to watch photos and remember past experiences as much as she likes to plan her future travels
-	Gets easily bored if alone and/or has nothing to do
-	Curios, open minded, always on the lookout for new experiences
-	Shares her day-to-day life on Facebook and/or Instagram
- She likes food and drinks


Travelling traits (These were identified also thanks to the work of [(Park et al., 2010)](https://www.researchgate.net/publication/232823986_Travel_Personae_of_American_Pleasure_Travelers_A_Network_Analysis), drawing the relevant combination that fitted to Arianna’s description, while keeping in mind that the study referred to the American market):
-	All Arounder
-	Sight Seeker
-   Food Traveller


Arianna as a coffee consumer:
-	Coffee is a perfect match to her life-enthusiast personality, energic and positive
-	Has a 360° appreciation for coffee
-	Coffee helps her ushering in an eventful day, while it is a social moment in the afternoon
-	Coffee helps her to get the most out of travels
-	She likes good coffee taste, but she’s used to adapting



Note: our analysis above showed that about 1 out of 4 of travellers who follow Lavazza are particularly interested in luxury. here we described a more general version of this persona, that can account also for the other travellers, but it will be important to keep in mind that a wealthier and more luxury oriented specification of this persona exists, among Lavazza's fanbase

## <font color = #00B0F0 > 1.2	Arianna’s Customer Journey as-is
We identified 2 relevant customer journeys for this persona, as she has different needs and pain-points when she travels compared to when she drinks coffee during her normal days. 
    
    
*Arianna day to day customer journey:*


Arianna’s use of coffee is guided by different drivers, but let’s start with the moment of purchase. Her user need in this case is, obviously, refilling her coffee powder stocks. To achieve this, she will take one of the following routes:


- **Supermarket**, she usually buys it here, since it’s more convenient, as she can buy it together with her usual grocery shopping. Sometimes, though, she can lose track of some things, leading to pain points, for example having to walk to the supermarket if she’s out of stock, or, possibly, having to renounce at it altogether. Usually, walking by itself isn’t a problem for a person like her, but having to do so for a contingency certainly isn’t appealing. An opportunity to address these pain points is an app/service which, using localization and other data, reminds her to buy coffee powder when at the supermarket and almost out of powder. Of course she may not like the idea of being ‘tracked’, so it would work better if it just pops out a reminder when she is likely to have finished her previously purchased coffee powder.
- **Online**,   buying online can potentially save her time, allow her to choose better and among more alternatives, and it can be more convenient as well. There are a lot of potential pain points though, for example there are longer timings involved she needs to be home when it is being delivered, there are expedition fees which can be annoying, as she might consider them as ‘wasted’ money, but there are also problems with setting the threshold for a free delivery or setting offers in general . Finally, she has some environmental sensitivity, so she may be bothered by excessive delivery packaging. As for the opportunities in this step, one needs to be careful with the offers and the free expedition thresholds, on top of sticking to essential packaging, possibly having cardboard of the right dimensions for most orders.

As for her use of coffee, we distinguish among 4 main potential moments:
-	**Morning**, she consumes coffee as part of her morning ritual, to kickstart her day. She may need to wake up early to be able to prepare it while being on time, something which she cares a lot about. An opportunity here would be to have her use a machine or prepare it the day before and keep it warm
-	**Afternoon 1/2**, she uses coffee after eating, as a pleasure and a habit. Analysing Lavazza internal data and focusing on the Italian market, we can see that 24% of people interviewed in a survey conducted in 2016 drank coffee in the afternoon. In particular, consumers reported drinking coffee during this time of the day for several reasons: 
    *	they want a boost of energy
    *	they want to take a break
    *	it is a habit
    *	they want to relax
    *	they want to spend quality time together with other people
-	**Afternoon 2/2**, she may take this coffee to relax, before resuming to her activity or switching to another one. Pain points include the fact that she may want to spend more time with this relaxing ritual, plus, after a bit, it can get old for a curios persona like her.
This goes without saying that it may be the 3d or 4th coffee of the day, so it can start to get unhealthy. 
Opportunities include the ideation of a product which combines coffee with novelty and a longer, more active ritual, while possibly reducing the coffee intake for this moment Drinks coffee after dinner, if she has to work 'til late
-	**Evening**, she may consume coffee if she has to work ‘till late, although there is the obvious pain point given by the fact that it interferes with her sleep


*Arianna customer journey during travel:*

Here she may want to taste a decent coffee product that helps her to reach the right mood and get the most out of otherwise filler moments. Often, coffee lovers can’t resist the urge of having a coffee, even while flying, despite it being of extremely low quality. Furthermore, an active persona as our Arianna is easily offset by unavoidable waiting times, resorting to social media self-amusement which quickly leads to boredom anyway. We will propose the customer journey for this scenario in a more compact way:

 During travel |<font color = #ED7D31 > Morning      | <font color = #ED7D31 >Before departure | <font color = #ED7D31 >During flight
--|:---------:|:-----------:|:----------:
<font color = #538135> User needs      |<font color = #538135> Wake up time may be pretty early, coffee to get in the mood |  <font color = #538135> Coffee to ease the pain of waiting  | <font color = #538135>Coffee to help time pass or to enjoy more the travelling moment
<font color = #FF0000>Pain points   | <font color = #FF0000>Need to wake up even earlier | <font color = #FF0000>Bad quality coffees if bought in-place |<font color = #FF0000> Bad tasting and costly coffees if bought in-place (e.g. due to the [bad water](https://www.thrillist.com/travel/nation/why-airplane-coffee-is-bad) in the airplane case )
<font color = #2F5496>Opportunities  | <font color = #2F5496>High quality coffee machines, to combine taste and speed | <font color = #2F5496>[Travel mugs](https://www.goodhousekeeping.com/travel-products/travel-coffee-mug-reviews/g785/best-travel-coffee-mugs/) |


## <font color = #00B0F0 > 1.3	Arianna’s Market size
    
*In order to estimate market sizes for our personas, we will bring up two or more estimates: some are broader and consider only quintessential characteristics of our personas, while the others restricts the market to people who are really similar to our pictured persona*

Considering 18-35 y.o. italian speakers living in Italy, who are who are frequent travellers and travel frequently abroad, interested in coffee, cocktails and airplane travels, we get an estimate of 360.000 people.
If we restrict also to those who have had a mobile device for more than 19 months, and who use Facebook via mobile device (this means that they are more likely to share photos taken with the smartphone on social media, being accustomed to its use on a mobile device), we get a market size estimate of 180.000 people. 
Restricting to women, 20-30 y.o., with interest in Glamour, we get an estimate of 36.000 people, while further restricting to interests in 'sweets' and 'handmade' brings the number down to 5.700.

While we used Facebook Api to get these results, one can obtain them from the [Facebook ads campaign section](https://www.facebook.com/adsmanager/creation?act=235630865#) as well. It suffices to [login](https://www.facebook.com/ads/audience-insights/people?act=264885008&age=18-&country=US), click on audience insights on the top left of the page, click on insertions, create an insertion using a campaign (which one can create on the fly), click on 'use an existing campaign', and with a couple more simple steps one can comfortably get a numerical estimate of the people who have certain behavours or interests. We still deemed important to be able to use the api, though. Indeed, as BA students, we need to be able to get results by coding. In particular, we created a function that, given some characteristics of the target users such as gender, age, interests, location and behaviors, returns the number of people corresponding to those characteristics

In [1]:
'''
Before importing the libraries, it is needed to update 2 scripts (adset.py and campaign.py).

In particular, to update the first script go to the following page and copy the entire script:
- https://github.com/facebook/facebook-python-business-sdk/blob/master/facebook_business/adobjects/adset.py
Then, using jupyter, go to the Anaconda3/Lib/site-packages/facebookads/adobjects folder 
and find the adset.py script. Open it, paste the one you copied from github and save it. 

The same procedure must be done for the campaign.py script.
- https://github.com/facebook/facebook-python-business-sdk/blob/master/facebook_business/adobjects/campaign.py 

It is also needed to have the script credentials.py containing
one's App ID, App Secret, personal Access Token and the ad account number 
in the same folder as this notebook

It's best to use the 'run' command instead of ctrl + enter, to quickly get
to the end of this very long cell
'''

from credentials import App_ID,App_Secret,Access_Token,Ad_Account_num
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount
from facebook_business import adobjects
from facebook_business.adobjects.targetingsearch import TargetingSearch
from facebookads.adobjects.adset import AdSet
from facebookads.adobjects.campaign import Campaign
from facebook_business.adobjects.adaccount import AdAccount
from facebook_business.adobjects.adaccountdeliveryestimate import AdAccountDeliveryEstimate
from facebook_business.adobjects.adaccount import AdAccount
from facebook_business.adobjects.adaccountdeliveryestimate import AdAccountDeliveryEstimate

# connect
FacebookAdsApi.init(app_id=App_ID,app_secret=App_Secret,access_token=Access_Token)
my_account = AdAccount(Ad_Account_num)
#print (my_account)

#We can retrieve the potential size of a market from the estimated reach of a virtual ad campaign. 
#We can select several characteristics of the target audience we want to select, 
#such as location, age, relationship_statuses, life_events, interests and so on

def get_country_code(country):
    
    """
    returns a list containing the country code
    """
    if country == None:
        return None
    params = {
        'q': country,
        'type': 'adgeolocation',
        'location_types': ['country'],
    }
    
    # get the response
    resp = TargetingSearch.search(params=params)
    
    # get the country code
    country_code = resp[0]['country_code']
    
    return [country_code]

def get_regions_id(regions_list:list):
    """
    Regions is a list containing the names of the regions you are interested in, use a list even if you want to select
    a single region. 
    
    A maximum of 200 regions can be provided to the function
    
    returns a dictionary containin gthe key and the name of the regions selected
    
    """
    if regions_list == None:
        return None
    #create an empty dictionary which will contain the final output 
    output = {}
    res_list = []

    for reg in regions_list:
        params = {
        'q': reg,
        'type': 'adgeolocation',
        'location_types': ['region'],
        }

        resp = TargetingSearch.search(params=params)

        # select the firts element in the list resp
        d = resp[0]
        
        # keep only the id and the name of the region
        keys = ["key", "name", 'country_code']
        result = { k: d[k] for k in keys}

        res_list.append(result)
#         print(result)
#         print()
        
    output['regions'] = res_list
    #return output
    return res_list
    
    
def get_cities_id(cities_list:list, radius = 10, distance_unit = 'kilometer'):
    """
    cities is a list containing the names of the cities you are interested in, use a list even if you want to select
    a single city
    
    radius selects the distance from the city you want to include
    
    distance_unit is set to default to km, the other option is 'mile'
    
    returns a dictionary containing the key and the name of the cities selected
    
    """
    if cities_list == None:
        return None
    #create an empty dictionary which will contain the final output 
    output = {}
    res_list = []

    for city in cities_list:
        params = {
        'q': city,
        'type': 'adgeolocation',
        'location_types': ['city'],
        }

        resp = TargetingSearch.search(params=params)

        # select the firts element in the list resp
        d = resp[0]

        keys = ["key", "name", 'country_code', "region", 'region_id']
        result = { k: d[k] for k in keys}

        res_list.append(result)
#         print(result)
#         print()
        
    output['cities'] = res_list
    #return output
    return res_list
    
    
def regions_in_same_country(country:str, regions_list:list):
    """
    returns true if all regions in regions_list are in the same country
    """
    
    # get the country_code
    country_code = get_country_code(country)[0]
    
    # get all the info regarding the regions
    regions = get_regions_id(regions_list)
    
    for r in regions:
        if r['country_code'] == country_code:
            pass
        else:
            return False
        
    # if it exit the for loop it means that all the regions belong to the same country    
    return True

def cities_in_same_country(country:str, cities_list:list):
    """
    returns true if all the cities in cities_list are in the same country
    """
    
    # get the country_code
    country_code = get_country_code(country)[0]
    
    # get all the info regarding the cities
    cities = get_cities_id(cities_list)
    
    for c in cities:
        if c['country_code'] == country_code:
            pass
        else:
            return False
        
    return True

def cities_in_same_regions(regions_list, cities_list):
    """
    returns true if all the cities in cities_list are in at least one of the regions
    """
    
    # get all the info regarding the regions
    regions = get_regions_id(regions_list)
    
    # get all the info regarding the cities
    cities = get_cities_id(cities_list)
    
    # get a set of all the regions from cities
    regions_from_cities = set()
    for c in cities:
        regions_from_cities.add(c['region'])
        
    # get a set of all the regions from regions
    regions_from_regions = set(regions_list)
    
    if regions_from_cities == regions_from_regions:
        return True
    
    else:
        return False

def extra_regions(regions_list, cities_list):
    """
    returns the regions that do not contain any city from cities_list
    """
    
    # get all the info regarding the regions
    regions = get_regions_id(regions_list)
    
    # get all the info regarding the cities
    cities = get_cities_id(cities_list)
    
    # get a set of all the regions from cities
    regions_from_cities = set()
    for c in cities:
        regions_from_cities.add(c['region'])
        
    # get a set of all the regions from regions
    regions_from_regions = set(regions_list)
    
    # keep the regions that do not contain any city
    regions_keep = regions_from_regions - regions_from_cities
    
    return list(regions_keep)

def extra_cities(regions_list, cities_list):
    """
    returns the cities that are not contained in any of the regions from regions_list
    """
    
    # get all the info regarding the regions
    regions = get_regions_id(regions_list)
    
    # get all the info regarding the cities
    cities = get_cities_id(cities_list)
    
    # get a set of all the regions from cities
    cities_keep = []
    regions_from_cities = set()
    for c in cities:
        
        if c['region'] not in set(regions_list):
            cities_keep.append(c['name'])
    
    return cities_keep

def combine_geo_locations(country:str, regions_list:list, cities_list:list):
    
    """
    combines the results of get_country_code, get_regions_id and get_cities_id in order to avoid overlaps
    IMPORTANT in the case of an overlap, the smaller of the 3 will be considered 
    """
    # if all the arguments are none, return none
    if (country == None) and (regions_list == None) and (cities_list == None):
        return None
    
    # regions if all the regions are within the country selected and cities_list is None,
    # cities if all the cities are within the regions_list
    output = {}
    
    if (country == None) and (regions_list != None) and (cities_list == None):
        output['regions'] = get_regions_id(regions_list)
        return output
    
    if (country == None) and (regions_list == None) and (cities_list != None):
        output['cities'] = get_cities_id(cities_list)
        return output
    
    # the output need to be a dictonary with key countries if both regions_list and cities_list are None
    if (regions_list == None) and (cities_list == None):
        output['countries'] = get_country_code(country)
        return output
    
    # if regions_list != None but cities_list == None i need to check if all the regions are in the same country
    if (regions_list != None) and (cities_list == None):
        if regions_in_same_country(country, regions_list):
            output['regions'] = get_regions_id(regions_list)
            return output
        #else:
    
    # if regions_list is None but cities_list is not, i have to chech if all the cities are in the same country
    if (regions_list == None) and (cities_list != None):
        
        # if all the cities are in the same country the output is a dictionary with cities as the only key
        if cities_in_same_country(country, cities_list):
            output['cities'] = get_cities_id(cities_list)
            return output
        #else:
        
    # if both regions_list and cities_list are different than None 
    if (regions_list != None) and (cities_list != None):
        
        # check if all the cities are in one one of the regions
        if cities_in_same_regions(regions_list, cities_list):
            output['cities'] = get_cities_id(cities_list)
            return output
            
        # if there are regions containing no cities from cities_list
        if len(extra_regions(regions_list, cities_list)) > 0:
            output['regions'] = get_regions_id(extra_regions(regions_list, cities_list)) 
            output['cities'] = get_cities_id(cities_list)
            return output
        
        # if there are cities not contained in regions_list but all the other cities are in the regions_list
        if len(extra_cities(regions_list, cities_list)) > 0:
            output['regions'] = get_regions_id(extra_regions(regions_list, cities_list)) 
            output['cities'] = get_cities_id(extra_cities(regions_list, cities_list))
            return output
        
def excluded_geo_locations(country_exclude:str, regions_list_exclude:list, cities_list_exclude:list):
    
    if  (country_exclude == None) and (regions_list_exclude== None) and (cities_list_exclude==None):
        return None
    
    output = combine_geo_locations(country_exclude, regions_list_exclude, cities_list_exclude)
    
    return output

def get_interest_id_valid(interest:str):
    """
    returns a dictionary containing the interest id and name of an interest  
    Important: it returns the result only for the first interest
    """
    
    params = {
    'type': 'adinterestvalid',
    'interest_list': [interest],
    }
    
    # get the answer
    resp = TargetingSearch.search(params=params)
    #print('resp is {}'.format(resp))
    
    # select the firts interest
    d = resp[0]
    if d['valid'] == False:
        print('Invalid interest')
        return
        
    #print('d is {}'.format(d))
    
    keys = ["id", "name"]
    result = { k: d[k] for k in keys}
    #result = {'id': d['id']}
    #print('result is {}'.format(result))
    #print(result)
    
    return {"interests":[result]}

def intersecate_interests(interests_list_intersecate):
    """
    intersecates the interests
    """
    
    # create a list where to store the final result
    inter_ids_list = []
    
    # iterate over the inderest ang get the relative ids
    if inter_ids_list != None:
        for i in interests_list_intersecate:
            inter_ids_list.append(get_interest_id_valid(str(i)))
            
    return inter_ids_list

def combine_interests(interests_list_combine:list):
    """
    combines the interets 
    """
    
    # the result needs to be a list of dictionaries
    # create a list where to store the final result
    output = []
    
    if interests_list_combine == None:
        return None
    
    for i in interests_list_combine:
        # get the dictionary containing the id and the name of the interest and append it to the output
        output.append(get_interest_id_valid(str(i))['interests'][0])
        
    return output

def add_interests(interests_list:list, interests_list_or:list):
    
    """
    returns a list containing both the combined and intersecated interests
    """
    if (interests_list == None) and (interests_list_or == None):
        return None
    
    elif (interests_list != None) and (interests_list_or == None):
        
        # first get the intersecated outputs
        output = intersecate_interests(interests_list)
        
        return output
    
    elif (interests_list == None) and (interests_list_or != None):
        
        d = {}
        d['interests'] = combine_interests(interests_list_or)
        return [d]
    
    elif (interests_list != None) and (interests_list_or != None):
    
        # first get the intersecated outputs
        output = intersecate_interests(interests_list) 

        # add the or-interests
        d = {}
        d['interests'] = combine_interests(interests_list_or)
        output.append(d)

        return output

# create a dictionary for all possible classes
ad_targeting_category_dict = {}

classes_list = 'behaviors demographics life_events industries income family_statuses user_device user_os'.split()

# iterate over the possible classes
for cl in classes_list:
    
    params = {
    'type' : 'adTargetingCategory',
    'class': cl
    }
    
    resp = TargetingSearch.search(params=params)
    
    ad_targeting_category_dict[cl] = {r['name']:r for r in resp}
    
def get_all_behaviors():
    """
    returns a list containing all the possible behaviors
    """
    
    return list(ad_targeting_category_dict['behaviors'].keys())

def get_behaviors_id(behaviors_list:list):
    
    if behaviors_list == None:
        return None
    
    res = []
    
    for b in behaviors_list:
        b_id = ad_targeting_category_dict['behaviors'][b]['id']
        res.append({'id':b_id})
        
    return res

def get_all_life_events():
    """
    returns a list containing all the possible life_events
    """
    
    return list(ad_targeting_category_dict['life_events'].keys())

def get_life_events_id(life_events_list:list):
    
    """
    IMPORTANT: if you pass more than one life_event the result will be the sum and not the intersection
    of those life events. This makes sense since many are mutually exlusive (such as Fidanzati da 6 mesi e 
    fidanzati da 1 anno)
    """
    if life_events_list == None:
        return None
    
    res = []
    
    for b in life_events_list:
        b_id = ad_targeting_category_dict['life_events'][b]['id']
        res.append({'id':b_id})
        
    return res

def get_all_industries():
    """
    returns a list containing all the possible industries
    """
    
    return list(ad_targeting_category_dict['industries'].keys())

def get_industries_id(industries_list:list):
    
    if industries_list == None:
        return None
    
    res = []
    
    for b in industries_list:
        b_id = ad_targeting_category_dict['industries'][b]['id']
        res.append({'id':b_id})
        
    return res

def get_all_family_statuses():
    """
    returns a list containing all the possible family_statuses
    """
    
    return list(ad_targeting_category_dict['family_statuses'].keys())

def get_family_statuses_id(family_statuses_list:list):
    
    if family_statuses_list == None:
        return None
    
    res = []
    
    for b in family_statuses_list:
        b_id = ad_targeting_category_dict['family_statuses'][b]['id']
        res.append({'id':b_id})
        
    return res

def get_all_relationship_statuses():
    """
    returns a dictionary containing the relationship statuses and the corresponding ids
    """
    
    # save all possible relationship statuses in a list
    rel_statuses = """ single in_relationship married engaged empty_space not_specified civil_union domestic_partnership 
                    open_relationship It's_complicated Separated Divorced Widowed""".split()
    
    # create a dictionary containing the relationship status and the corresponding id
    d = {rel_statuses[i]: i+1 for i in range(len(rel_statuses))}
    
    del d['empty_space']
    
    return d

def get_relationship_statuses_id(relationship_statuses_list:list):
    
    if relationship_statuses_list == None:
        return None
    
    res = []
    
    for rs in relationship_statuses_list:
        rs_id = get_all_relationship_statuses()[rs]
        res.append(rs_id)
        
    return res

def create_custom_location(min_population:int, max_population:int, country:str, custom_type = 'multi_city'):
    
    """
    IMPORTANT min_population needs to be at least 100000
    """
    # create a dictionary where to store the result
    res = {}
    
    # create an inner dictionary where to store the parameters
    params = {}
    params['custom_type'] = custom_type
    params['min_population'] = min_population
    params['max_population'] = max_population
    params['country'] = get_country_code(country)[0]
    
    res['custom_locations'] = [params]
    
    return res

def get_education_schools_id(school_name:str):
    """
    returns the id and the name of a university
    WARNING: in case of multiple universities with the same name it returns the first result
    """
    
    if school_name == None:
        return None
    
    params = {
        'q': school_name,
        'type': 'adeducationschool',
    }
    resp = TargetingSearch.search(params=params)
    
    # select the first result
    d = resp[0]
    
    # keep only the id and the name 
    keys = ["id", "name"]
    result = { k: d[k] for k in keys}
    return result

def combine_education_schools(schools_list:list):
    """
    combines multiple results from get_education_schools_id into a list
    """
    if schools_list == None:
        return None
    
    # create a list
    output = []
    
    for s in schools_list:
        output.append(get_education_schools_id(s))
        
    return output

def get_all_education_statuses():
    """
    returns a dictionary containing all the education statuses and the corresponding ids
    """
    
    # save all the possible statuses in a list
    edu_statuses = """HIGH_SCHOOL UNDERGRAD ALUM HIGH_SCHOOL_GRAD SOME_COLLEGE ASSOCIATE_DEGREE IN_GRAD_SCHOOL 
    SOME_GRAD_SCHOOL MASTER_DEGREE PROFESSIONAL_DEGREE DOCTORATE_DEGREE UNSPECIFIED SOME_HIGH_SCHOOL""".lower().split()
    
    # create a dictionary containing the relationship status and the corresponding id
    d = {edu_statuses[i]: i+1 for i in range(len(edu_statuses))}
    
    return d

def get_education_statuses_id(education_statuses_list:list):
    """
    returns a list containing the id of the education_statuses selected
    """
    
    if education_statuses_list == None:
        return None
    
    output = []
    
    for es in education_statuses_list:
        es_id = get_all_education_statuses()[es]
        output.append(es_id)
        
    return output

def get_education_majors_id(major:str):
    """
    returns the id and the name of a major
    WARNING: in case of multiple majors with the same name it returns the first result
    """
    
    params = {
        'q': major,
        'type': 'adeducationmajor',
    }

    resp = TargetingSearch.search(params=params)
    
    # select the first result
    d = resp[0]
    
    # keep only the id and the name 
    keys = ["id", "name"]
    result = { k: d[k] for k in keys}
    return result

def combine_education_majors(majors_list:list):
    """
    combines multiple results from get_education_majors_id into a list
    """
    if majors_list == None:
        return None
    
    # create empty list
    output = []
    
    for m in majors_list:
        output.append(get_education_majors_id(m))
    
    return output

def get_work_employers(employer:str, return_all = False):
    """
    returns the id and the name of an employer
    WARNING: in case of multiple employers with the same name it returns the first result
    """
    params = {
        'q': employer,
        'type': 'adworkemployer',
    }

    resp = TargetingSearch.search(params=params)
    
    if return_all == True:
        return resp
    
    else:
        # select the first result
        d = resp[0]

        # keep only the id and the name 
        keys = ["id", "name"]
        result = { k: d[k] for k in keys}
        return result
    
def combine_work_employers(employers_list:list):
    """
    combines multiple results from get_work_employers into a list
    """
    
    if employers_list == None:
        return None
    
    # create empty list
    output = []
    
    for e in employers_list:
        output.append(get_work_employers(e))
    
    return output

def get_work_positions(work_position:str, return_all = False):
    """
    returns the id and the name of a work_position
    WARNING: in case of multiple employers with the same name it returns the first result
    if return_all is set to false
    """
    
    params = {
        'q': work_position,
        'type': 'adworkposition',
    }

    resp = TargetingSearch.search(params=params)
    
    if len(resp) == 0:
        print('Work position not found, try again')
        return
    
    if return_all == True:
        return resp
    
    else:
        # select the first result
        d = resp[0]

        # keep only the id and the name 
        keys = ["id", "name"]
        result = { k: d[k] for k in keys}
        return result
    
def combine_work_positions(work_positions_list:list):
    """
    combines multiple results from get_work_positions into a list
    """
    
    if work_positions_list == None:
        return None
    
    # create empty list
    output = []
    
    for wp in work_positions_list:
        output.append(get_work_positions(wp))
    
    return output

def intersecate_behaviors(behaviors_list_intersecate:list):
    """
    intersecate behaviors and returns a list of dictionaries, all with the same key in 
    """
    
    if behaviors_list_intersecate == None:
        return None
    
    # create a list where to store the final result
    output = []
    
    # iterate over all the behaviors and get the relative id and create a dictionary with key 'behaviors'
    for b in behaviors_list_intersecate:
        temp_d = {}
        temp_d['behaviors'] = get_behaviors_id([b])
        output.append(temp_d)
        
    return output

def combine_behaviors(behaviors_list_combine:list):
    """
    returns a list of dictionaries containing all the behaviors id combined
    """
    output = []
    d = {}
    d['behaviors'] = get_behaviors_id(behaviors_list_combine)
    output.append(d)
    
    return output

def add_behaviors(behaviors_list_intersecate, behaviors_list_combine):
    """
    """
    
    if (behaviors_list_intersecate == None) and (behaviors_list_combine) == None:
        return None
    
    elif (behaviors_list_intersecate!= None) and (behaviors_list_combine) == None:
        #get only the intersecated behaviors
        output = intersecate_behaviors(behaviors_list_intersecate)
        return output
    
    elif (behaviors_list_intersecate== None) and (behaviors_list_combine) != None:
        
        return combine_behaviors(behaviors_list_combine)
    
    else:
        # first get the intersecated behaviors
        output = intersecate_behaviors(behaviors_list_intersecate)
        
        # the add the combined behaviors
        for i in combine_behaviors(behaviors_list_combine):
            output.append(i)
            
        return output
    
def create_flexible_spec(interests_list_intersecate, 
                         interests_list_combine, 
                         behaviors_list_intersecate, 
                         behaviors_list_combine):
    """
    combines the results from add_behaviors and add_interests and returns a list
    """
    output = []
    
    if (add_interests(interests_list_intersecate, interests_list_combine) == None) and (add_behaviors(behaviors_list_intersecate, behaviors_list_combine) == None):
        return None
    
    if (add_interests(interests_list_intersecate, interests_list_combine) == None):
        for b in add_behaviors(behaviors_list_intersecate, behaviors_list_combine):
            output.append(b)
        return output
    
    if add_behaviors(behaviors_list_intersecate, behaviors_list_combine) == None:
        for i in add_interests(interests_list_intersecate, interests_list_combine):
            output.append(i)
        return output
    
    #if add_interests(interests_list_intersecate, interests_list_combine) != None: 
    for i in add_interests(interests_list_intersecate, interests_list_combine):
        output.append(i)
        
    #elif add_behaviors(behaviors_list_intersecate, behaviors_list_combine) != None:
    for b in add_behaviors(behaviors_list_intersecate, behaviors_list_combine):
        output.append(b)

    return output

def get_market_size(Ad_Account_num, 
                    country: str, 
                    regions_list:None, 
                    cities_list:list, 
                    interests_list_intersecate:list, 
                    interests_list_combine:list,
                    age_min:int, 
                    age_max:int,
                    behaviors_list_intersecate:list,
                    behaviors_list_combine:list,
                    life_events_list:list,
                    industries_list:list,
                    family_statuses_list:list,
                    relationship_statuses_list:list,
                    schools_list:list,
                    education_statuses_list:list,
                    majors_list:list,
                    employers_list:list,
                    work_positions_list: list,
                    min_population:int, 
                    max_population:int,
                    custom_location = False,
                    gender_list = [1,2],
                    country_exclude = None,
                    regions_list_exclude = None,
                    cities_list_exclude = None
                   ):
    
    """
    returns the potential market size given the characteristics selected
    
    If you want to custom location you need to set custom_location to True
    """
                
    
    params = {
        'targeting_spec': {
            
            'geo_locations': combine_geo_locations(country, regions_list, cities_list),
            
            'excluded_geo_locations': excluded_geo_locations(country_exclude, regions_list_exclude, cities_list_exclude),
            
            'age_min':age_min,
            'age_max':age_max,

            #'flexible_spec': add_interests(interests_list, interests_list_or),
            'flexible_spec':create_flexible_spec(interests_list_intersecate, 
                                                    interests_list_combine, 
                                                    behaviors_list_intersecate, 
                                                    behaviors_list_combine),
            
            #'behaviors': get_behaviors_id(behaviors_list), 
            
            'life_events': get_life_events_id(life_events_list),
            
            'industries': get_industries_id(industries_list),
            
            'family_statuses': get_family_statuses_id(family_statuses_list),
            
            'relationship_statuses': get_relationship_statuses_id(relationship_statuses_list),
            
            'genders': gender_list,
            
            'education_schools': combine_education_schools(schools_list),
            
            'education_statuses': get_education_statuses_id(education_statuses_list),
            
            'education_majors': combine_education_majors(majors_list),
            
            'work_employers': combine_work_employers(employers_list),
            
            'work_positions': combine_work_positions(work_positions_list),
        }}
    
    #update geo_locations if custom_location is true
    if custom_location == True:
        params['targeting_spec']['geo_locations'] = create_custom_location(min_population, max_population, country)
        
    
    res = AdAccount(Ad_Account_num).get_reach_estimate(params=params)[0]
    
    estimate = res['users']
    
    print('The estimate for this target market is {}'.format(estimate))
    
    return estimate


ModuleNotFoundError: No module named 'credentials'

In [28]:
# set all the relevant parameters
interests_list_intersecate = ['Caffè','Cocktail','Viaggi in aereo']
behaviors_list_intersecate = ['Viaggiatori frequenti', 'Viaggiatori internazionali abituali']
gender_list = [1,2]

In [29]:
get_market_size(Ad_Account_num,
                country = 'Italy',
                regions_list = None,
                cities_list = None,           
                interests_list_intersecate = interests_list_intersecate, 
                interests_list_combine = None,
                age_min = 18,
                age_max = 35,
                behaviors_list_intersecate = behaviors_list_intersecate,
                behaviors_list_combine = None,
                life_events_list = None, 
                industries_list = None, 
                family_statuses_list = None, 
                relationship_statuses_list = None, 
                schools_list = None,
                education_statuses_list = None,
                majors_list = None,
                employers_list = None,
                work_positions_list = None,
                custom_location=False,
                min_population = 120000,
                max_population = 550000,
                gender_list=gender_list,
                country_exclude= None,
                regions_list_exclude=None,
                cities_list_exclude=None,
               )

The estimate for this target market is 370000


370000

In [30]:
interests_list_intersecate = ['Caffè','Cocktail','Viaggi in aereo', 'Glamour (rivista)', 'Dolce','Handmade']
behaviors_list_intersecate = ['Viaggiatori frequenti',
                  'Viaggiatori internazionali abituali',
                   'Accesso a Facebook (mobile): smartphone e tablet' ]

# Here we combine to obtain the info on who has been using a mobile device
# for more than 19 months
behaviors_list_combine = ['Usa un dispositivo mobile (più di 25 mesi)','Usa un dispositivo mobile (19-24 mesi)']
# only females
gender_list = [2]

In [31]:
get_market_size(Ad_Account_num,
                country = 'Italy',
                regions_list = None,
                cities_list = None,           
                interests_list_intersecate = interests_list_intersecate, 
                interests_list_combine = None,
                age_min = 20,
                age_max = 30,
                behaviors_list_intersecate = behaviors_list_intersecate,
                behaviors_list_combine = behaviors_list_combine,
                life_events_list = None, 
                industries_list = None, 
                family_statuses_list = None, 
                relationship_statuses_list = None, 
                schools_list = None,
                education_statuses_list = None,
                majors_list = None,
                employers_list = None,
                work_positions_list = None,
                custom_location=False,
                min_population = 120000,
                max_population = 550000,
                gender_list=gender_list,
                country_exclude= None,
                regions_list_exclude=None,
                cities_list_exclude=None,
               )

The estimate for this target market is 7100


7100

<a id ='Andrea'></a>

# <font color = #D09E00 > 	Andrea, the business man

[Back on top](#top)
## <font color = #D09E00 > 1.4	Andrea’s description
Andrea lives in a big city and shares a fancy flat with his girlfriend. He is a competitive hard worker, always putting in his best effort in everything he does. He is very busy, due to his overachieving nature, but he finds time to attend events and to be with his girlfriend or his friends, who he always tries to impress.
He is a sophisticated person, always seeking the pleasure of the senses; he likes photography and arts in general  and is interested in wine and good food.

We used the social media analysis work reported in (Final_WAS_Y&R_Lavazza_ContentStrategy_2019_040519.pdf, slide 44) to be sure that the mentioned interests are relevant

## <font color = #D09E00 > 1.5	Andrea’s customer journey as-is
Coffee is a part of his lifestyle and a reward for his sophisticated personality.
For him, coffee moments represent:
-	social opportunities to share with his friends or girlfriend, possibly in a cool place like the flagship store 
-	premium experiences in which to taste the amazing flavor of coffee. 

Therefore, he is both a pleasure and status seeker.
He tends to consume 2 coffees per day:

The **first coffee** is in the morning. Andrea wants to start the day with coffee taste in his mouth, and not just the coffee but the preparation as well is part of his morning ritual. 
He is aware that only if made by him, or by a qualified barista, a coffee can satisfy his taste, as he knows the right combination of ingredients and timings, on top of the procedure to give the coffee the particular features he is seeking.
For these reasons he gladly prepares the coffee by himself, enjoying choosing the perfect blend (among the wide variety of Lavazza blends he preserves in airtight containers) and doing all the steps his perfect coffee requires. 
In the morning coffee represents a moment of pleasure to be enjoyed, alone or with his girlfriend, in the peaceful atmosphere of his home. Neither the coffee from the automatic distributors at the workplace nor that taken in the nearest bar can be good substitutes. 

**Emotional driver**: pleasure seeking 

The **second coffee** is taken:

-	during the weekend afternoons, while spending pleasant social time surrounded by his friends or colleagues in a fancy place, possibly trying to impress them by choosing among different brands and varieties, thus showing his coffee expertise, but without missing out on the opportunity to upload some Instagram stories.
Functional drivers: status seeking. He has to show the coffee as a symbol, signaling his tastes and aesthetics.
Emotional drivers: need for interpersonal interactions 
-	during the week, in the post-dinner evenings spent at home. Here coffee represents a moment to be shared with his girlfriend, allowing him to the end the day on a positive note, while possibly using coffee also as a digestive in this occasion.

    **Functional drivers**: need for a digestive, 
    alternative for the coffee he could not have taken in the afternoon 

    **Emotional drivers**: ending the day in a pleasant way, having a sort of  ‘reward’ after a  busy day




**Main pain point in his general costumer journey**


The principal problem for Andrea is not having the opportunity to live to the fullest the afternoon coffee sharing/social moments. Indeed, during working afternoons, he can’t enjoy a premium and quality coffee experience with his friends of colleagues, as his best bet would be to call the serving from the nearest bar, or use the vendor machines during a break.


Therefore, while he is able to satisfy his ‘pleasure need’ by preparing good coffee at home both in the mornings and in the evenings, on the other hand, during the week, he needs some social moments during which high-quality coffee acts as a sort of symbol of his lifestyle, an opportunity to show his social status. Furthermore, we imagined some moments in which Andrea may have a coffee, but, looking at internal data (U&A italia 2016, slide 18), we can fairly rightly think that a certan number of Andreas isn't even able to get coffee during the week: workplace is, for many people, their only chance to have a coffee during the day, and it makes up for 30% of the outside-home consumption


This pain point in his customer journey is easily solved by the product we are proposing.
Indeed, our kit would allow him to live a reinvented coffee experience at a time of the afternoon usually dedicated to aperitives more than to coffees, so he could invite his friends or colleagues at home after work, offering them sophisticated coffee-based drinks.
In this way he would be able to impress them while spending some good time in company enjoying coffee at an otherwise unusual time.


Our product would also be a good alternative to the after-dinner coffee, since it offers him a way to end the days a bit differently, without having to give up on the coffee taste. 
In general, our product would be appealing to him for several reasons: first of all it brings together two otherwise different worlds, the aperitif and the coffee one, which he likes both. On top of this, it allows himself to live high status experiences with his friends, during which he can fully express his capabilities. 


## <font color = #D09E00 > 1.6	Andrea’s Market size

In accordance with the persona developed so far, we believe that our target is composed of those individuals who are interested not only in coffee and aperitifs but also in other activities such as photography and cooking. Moreover, they are engaged, between 26 and 35 years old, live in Italy and work in a field such as management, finance, sales, architecture/engineering or legal services. Using our function, we found out that there are about 2.700 people corresponding to all these criteria.

We have obtained a very low number. However, not everyone on Facebook tends to specify the kind of industry their job belongs to. Consequently, we have decided to re-estimate our market size, but this time without setting any restriction in terms of industries. Given this new criterion, we obtain an estimate of 68.000  individuals, a result showing a promising potential for this new product

In [5]:
# set the parameters for Andrea
interests_list_intersecate = ['Caffè','Aperitivo']
#industries_list = ['Aziende e finanza', 'Management', 'Vendite']
relationship_statuses_list = ['in_relationship', 'married','engaged','open_relationship']
gender_list = [1]

In [9]:
get_market_size(Ad_Account_num,
                country = 'Italy',
                regions_list = None,
                cities_list = None,           
                interests_list_intersecate = interests_list_intersecate, 
                interests_list_combine = None,
                age_min = 26,
                age_max = 35,
                behaviors_list_intersecate = None, 
                behaviors_list_combine=None,
                life_events_list = None, 
                industries_list = None, #industries_list, 
                family_statuses_list = None, 
                relationship_statuses_list = relationship_statuses_list, 
                schools_list = None,
                education_statuses_list = None,
                majors_list = None,
                employers_list = None,
                work_positions_list = None,
                custom_location=False,
                min_population = 120000,
                max_population = 550000,
                gender_list=gender_list,
                country_exclude= None,
                regions_list_exclude=None,
                cities_list_exclude=None,
               )

The estimate for this target market is 140000


140000

In [11]:
# set the parameters for Andrea
interests_list_intersecate = ['Caffè','Aperitivo','Cucina','Fotografia']
industries_list = ['Aziende e finanza', 'Management', 'Vendite']
relationship_statuses_list = ['in_relationship', 'married','engaged','open_relationship']
gender_list = [1]

In [12]:
get_market_size(Ad_Account_num,
                country = 'Italy',
                regions_list = None,
                cities_list = None,           
                interests_list_intersecate = interests_list_intersecate, 
                interests_list_combine = None,
                age_min = 26,
                age_max = 35,
                behaviors_list_intersecate = None,
                behaviors_list_combine=None,
                life_events_list = None, 
                industries_list = industries_list, 
                family_statuses_list = None, 
                relationship_statuses_list = relationship_statuses_list, 
                schools_list = None,
                education_statuses_list = None,
                majors_list = None,
                employers_list = None,
                work_positions_list = None,
                custom_location=False,
                min_population = 120000,
                max_population = 550000,
                gender_list=gender_list,
                country_exclude= None,
                regions_list_exclude=None,
                cities_list_exclude=None,
               )

The estimate for this target market is 1700


1700

<a id ='Marta'></a>

# <font color = #C45911> 	Marta, the off-site student
    
[Back on top](#top)
## <font color = #C45911 > 1.7	Marta’s description
Age: 18/25 years old. 

Personal description: 
-	Marta is an off-site university student, living in a new city.
-	She is committed to her studies and used to study ‘till late. 
-	She is socially active, always looking for social interactions, engaged on social networks, trend follower.
-	Her budget is not so high, she can’t frequently afford expensive night-outs, so she is used to home partying.
-	Status seeker: she is interested in other people’s opinion; she is looking for approval. 
-	Interested in art, culture, and music, especially Italian [pop-indie](https://www.facebook.com/ads/audience-insights/interests?act=235630865&age=18-25&country=IT&education=3&gender=1&interests=6003210597333) one ; she is a concert goer.

Marta as a coffee consumer:
-	She has started recently to drink coffee
-	She is too young to appreciate the more adult “coffee mood”-	Used to take coffee at university bar, or at the automatic distributor 
-	Coffee consumer more for necessity (study needs)


## <font color = #C45911 > 1.8 Marta’s customer journey as-is
*Marta’s coffee journey depends on her routine and on her daily timetable. She always drinks it in the morning and after lunch, while the one during the day is more aleatory. Marta has a strict routine; coffee is a complementary aspect of it. Also due to the fact that she is not a “veteran” of coffee consumption, she hasn’t developed yet a strong affection or belonging to a particular brand, her main driver is purely functional.*
    
    
#### *From Monday to Friday:*


The first coffee is in the morning; due to time constraints and since she cannot really afford a coffee machine, she often doesn’t have it at home. 
Therefore, she opts for two different paths: 
-	**Bar** -> Aside from other constraints, when she has time, this is her best call. Indeed she enjoys having a rounded experience, a satisfying moment for interpersonal interactions, the possibility of uploading IG stories, gaining higher status perception. She takes coffee without caring too much about brand or quality, the important thing is being with others. Her first choice is a macchiato or a cappuccino, not a basic espresso. Sometimes she has a complete breakfast also consuming food. In that case she enjoys the moment at the fullest.

-	**Vendor machine** -> faster, cheaper, satisfies the need for a pre-lecture boost of energy, but does not satisfy anything in the emotional sphere. In this case she likely had breakfast at home. 

Drivers:
-	**functional**: energy boost 
-	**emotional**: need for aggregation, improvement of her social image 

**The second coffee**:  the second coffee is taken after lunch. This coffee is totally functional driven, she needs to be mentally performing also after lunch, due to coming lessons or to her studying duties. 
Whether it is in a bar adjacent to the university, at home, or at the machines, coffee is taken according to the afternoon work. More than a ritual the attitude toward consumption is still a necessity.


Drivers:
-	**functional**: coffee is a recovery from lunch, a digestive 
-	**emotional**: it is a brief moment of peace before another session of work

**The bonus coffee**: An occasion for a mid-morning break between to lectures. It is a fast coffee, a chance to share a sparing moment with her class peers. She does not enjoy the coffee moment in general as it is merely functional.


#### *The weekend:*

Marta exploits weekend’s night to meet people and satisfy her social needs both Friday and Saturday night she meets friends. 
There are mainly three moments to analyse: aperitif, dinner, and pre-night.

As for the aperitif, if she decides to go out, she usually does not consume coffee: she prefers an Aperol spritz or Hugo, or even a beer if not a more sophisticated cocktail. She looks for good price, while she cares for being good looking as well, thus having something cool in her glass as well. She wants to take photos, make IG stories, and let people know she is living a great moment.


The alternative would be to chill at home and get a rest before the night out. In this case she would drink a beer with some friends. 


While aperitif and nights are the moments in which she goes out to have fun, due to budget constraints she sometimes organizes dinners at home with friends: she cannot afford restaurants (if not rarely), that she would otherwise prefer. 
Which function does coffee have in these moments? She usually prepares an after-dinner coffee with the moka, sharing it with her friends or house-mates, to be active during the pre-night and to start getting in the mood (an alternative to this moment could be home-made drinks with alcohol and energy drinks, but a slightly different costumer journey should be taken in consideration)


## <font color = #C45911 > 1.9	Marta’s Market size
    
Firstly we got a broad idea of off-site female students interested in aperitifs, cocktails or coffee, getting an estimate of 58000. Restricting to an interest both in coffe and aperitifs, we obtain an estimate of 16000


[Back on top](#top)

In [13]:
# set all the relevant parameters
interests_list_combine = ['Cappuccino', 'Caffè','Cocktail', 'Aperol', 'Aperitivo', 'Campari','Bacardi','drink']
education_statuses_list = ['undergrad', 'in_grad_school','master_degree','professional_degree']
life_events_list = ["Lontano dalla città di origine"]
gender_list = [2]

In [14]:
get_market_size(Ad_Account_num,
                country = 'Italy',
                regions_list = None,
                cities_list = None,           
                interests_list_intersecate = None, 
                interests_list_combine=  interests_list_combine,
                age_min = 18,
                age_max = 25,
                behaviors_list_intersecate = None,
                behaviors_list_combine = None,
                life_events_list = life_events_list, 
                industries_list = None, 
                family_statuses_list = None, 
                relationship_statuses_list = None, 
                schools_list = None,
                education_statuses_list = education_statuses_list,
                majors_list = None,
                employers_list = None,
                work_positions_list = None,
                custom_location=False,
                min_population = 120000,
                max_population = 550000,
                gender_list=gender_list,
                country_exclude= None,
                regions_list_exclude=None,
                cities_list_exclude=None,
               )

The estimate for this target market is 58000


58000

In [15]:
interests_list_intersecate = ['Caffè','Aperitivo',]
education_statuses_list = ['undergrad', 'in_grad_school','master_degree']
life_events_list = ["Lontano dalla città di origine"]
gender_list = [2]

In [16]:
get_market_size(Ad_Account_num,
                country = 'Italy',
                regions_list = None,
                cities_list = None,           
                interests_list_intersecate = interests_list_intersecate, 
                interests_list_combine = None,
                age_min = 18,
                age_max = 25,
                behaviors_list_intersecate = None,
                behaviors_list_combine= None,
                life_events_list = life_events_list, 
                industries_list = None, 
                family_statuses_list = None, 
                relationship_statuses_list = None, 
                schools_list = None,
                education_statuses_list = education_statuses_list,
                majors_list = None,
                employers_list = None,
                work_positions_list = None,
                custom_location=False,
                min_population = 120000,
                max_population = 550000,
                gender_list=gender_list,
                country_exclude= None,
                regions_list_exclude=None,
                cities_list_exclude=None,
               )

The estimate for this target market is 16000


16000