# MACBOOK VS WINDOWS ANALYSIS REPORT 

**Introduction:**

The age-old question of whether to buy a Mac or a Windows laptop has been bothering not only us but also the respective companies. Both companies put a tremendous amount of effort into studying customer buying behavior. Our team has been commissioned by Apple to analyze this behavior from the perspective of the Big Five personality traits as well as the Hult DNA traits (see Appendix A). The goal is to help Apple retain its customers and acquire customers who show a willingness to switch to Macbook. 

Our hypotheses are as follows: 
1. Macbook users have higher levels of extraversion, openness and thinking because they usually like retro designs, bold colors, and appreciate objects that express their individuality (Mac vs. PC People: Personality Traits & Aesthetic/Media Choices, 2009).
2. Windows users have higher levels of conscientiousness and agreeableness and a lower level in the thinking category; specifically men who are more down-to-earth, enjoy sports and are number-oriented.

**Insights and Recommendations Based on Psychology and Behaviors:**
 
**Big Five-based strategy:**  
 
*1. Customizable products and additional discounts for the Obsessive and Equanimous:* 

Both spontaneous and passionate customers who exhibit obsessive and equanimous traits prefer Macbook over Windows. These are people who are determined and either act on sudden impulse or have no particular interest in a specific product. However, both are willing to buy Windows rather than Macbook. This is because they both enjoy the ability to organize folders and have control over the computer itself; control is a behavior generally exhibited by people with strong obsessive tendencies, which can sometimes be mistaken for passion (Kelly, 2020). To increase their willingness to buy a Macbook, we must offer incentives that effectively persuade these individuals, either by prompting them to act on impulse or by giving them value. Therefore, incentives such as additional discounts on other Apple products, loyalty points, or customizable features at a discounted price might increase their retention or even acquisition. Incentives can be delivered via tailored email marketing or through customer service over chat or phone. 
 
*2. Sales calls to the Indecisive:*

Even though Indecisive customers are more likely to own a Macbook than a Windows laptop, they are eager to switch to Windows or Chromebook. These customers are slow in the decision-making process and don’t have a specific product in mind, so they go back and forth. This behavior makes them easy to detect, because we can track their activity on the website or mobile app once they accept cookies. To target these individuals, we can run A/B tests in which one variant lets them easily connect with a sales representative. Providing tech support followed by an email summarizing the topics discussed will alleviate the stress of having to decide. Meanwhile, on our end, we can track whether direct communication leads to acquisition.

**Hult DNA-based strategy:** 

*1) Boost brand presence through Social Media for Apathetic Trend Followers:*

According to our data, Apathetic Trend Followers seem to prefer Macbooks for their next computer. These individuals tend to focus on themselves and do not necessarily show feeling or interest towards anything. However, they do prefer Macbook because of its simplicity, design, and the impact of the brand (Surur, 2019). Being of an apathetic nature, they are also susceptible to hyped products. Moreover, our data suggest that the sample consists largely of millennials. Research suggests that millennials are the most impressionable, more susceptible to trends, and can permanently change their purchase behavior (Bona, 2021). During the pandemic, 60% of millennials claimed that social media advertising influenced their purchase decision (Fry, 2020). With this knowledge, we can target apathetic millennials, specifically those on the younger end of the spectrum, because our data show that 65% of young millennials are interested in owning a Macbook. Therefore, we can develop strong online campaigns that demonstrate our brand values and vision, and that connect with and nudge apathetic millennials worldwide. 

*2) Conduct A/B Testing for Payment plan options to Apathetic Realists:*

Apathetic Realists don’t seem to be interested in owning Macbooks as their next computer. Given the nature of their behavior, they are very cautious about their decisions. This suggests that price could be a significant factor in the lack of interest, since a Macbook is generally more expensive than a Windows laptop. They lean towards the conservative side and most likely don't like seeing large chunks of money leave their pockets. They often want to plan ahead and are rarely impulse shoppers (Caldwell, 2020). Therefore, we recommend A/B testing zero-interest installment plans to see whether they add value for this customer group. In fact, many companies have begun adopting this method (Dickler, 2020), and especially in a pandemic or post-pandemic world, this group may become even more cautious with their money.

3) It is important to note that when taking Hult DNA behaviors as a base, it is best to focus on acquiring customers rather than retaining them. As evident in our boxplots, the Hult DNA personas differ distinctly from one another. 

**Additional Insights:**

*1) Attract Customers with Affordable Trade-in options for South America:*

The percentage of people who currently use a Macbook is significantly higher in almost all regions except South America. However, when we analyze whether respondents are willing to buy a Macbook in the future, the percentage who want to buy one is higher in each region than the current share of users, meaning people who don’t use a Macbook are willing to switch. The lower ownership in South America is likely due to Apple products costing more there. For instance, in Brazil, a typical customer would need to work more than 6 months to afford a Macbook (Stangel, 2016). Therefore, with an emphasis on regions with the largest gap between current customers and willing-to-buy customers, we can create regional promotions such as affordable trade-ins. An affordable trade-in option will increase the eagerness to buy a Macbook among both current Mac users and aspiring Mac buyers.
 
*2) Promotion through sports for male customers:* 

Currently, 42% of males own a Macbook, compared with 61% of females. However, the share of males willing to buy a Macbook rises by 9%, which points to an evident decline in males' preference for Windows laptops. In comparison, females' preference for Macbooks stays almost the same. Therefore, it is better to focus on targeting males in future campaigns, as that may result in higher retention and customer acquisition. Campaigns can run at sports events or as commercials on weekends, when men have more time for leisure activities (Leadem, 2017).

**Conclusion:**

In conclusion, we have seen that traits have a huge impact on buying behaviors. We can confirm our first hypothesis, as personalization is important for users who exhibit the Big Five traits to promote their individuality. Meanwhile, we reject the second hypothesis, as Macbook users are also down-to-earth and thus will be susceptible to promotional plans such as payment installments. Lastly, although it is important to develop a marketing strategy for a specific persona, purchasing behaviors can also depend on other factors such as age, region and gender, as evident in the additional insights found. 

# CODE FOR INTERNAL ANALYTICS TEAM

## Load & Initial Exploration of Data

In [None]:
######################################
# IMPORT DATA & LIBRARIES 
######################################

# import libraries
import numpy             as np                          # mathematical essentials
import pandas            as pd                          # data science essentials
import matplotlib.pyplot as plt                         # fundamental data visualization
import seaborn           as sns                         # enhanced visualization
from sklearn.preprocessing import StandardScaler        # standard scaler
from sklearn.decomposition import PCA                   # pca
from scipy.cluster.hierarchy import dendrogram, linkage # dendrograms
from sklearn.cluster         import KMeans              # k-means clustering

# setting print options
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 100)

# read the feature-engineered file into Python
computer = pd.read_excel(io ='survey_data.xlsx')

In [None]:
# check data info
# print(computer.info())


# check missing values 
# print(computer.isnull().sum().any())

**Insights:**
- no missing values 
- 3 columns had duplicated questions; evident by the '.1' 
- some columns may need cleaning 
- need to drop object types and 'surveyID' to conduct PCA models 
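
As a rough illustration of the duplicated-question point above, the sketch below finds columns whose header ends in '.1' (the suffix pandas adds to a repeated header) and averages each with its original. The toy column names and values are made up for illustration and are not taken from the survey file.

```python
import pandas as pd

# toy stand-in for the survey data (hypothetical names and values)
toy = pd.DataFrame({
    'surveyID'                      : [1, 2, 3],
    'Encourage direct discussions'  : [4, 2, 5],
    'Encourage direct discussions.1': [2, 2, 3],
})

# pandas marks the second occurrence of a repeated header with '.1'
dup_cols = [col for col in toy.columns if col.endswith('.1')]

for dup in dup_cols:
    base = dup[:-2]                          # strip the '.1' suffix
    toy[base] = (toy[base] + toy[dup]) / 2   # average the two answers

# drop the duplicated columns once they are merged
toy = toy.drop(columns=dup_cols)
print(toy)
```

Matching on the header name rather than on a column position keeps the step working if the column order of the file changes.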

## Clean Dataset 

**1) REMOVE DUPLICATED COLUMNS**

In [None]:
# Step 1: combine duplicates and take the average 

# col: Respond effectively to multiple priorities
computer.iloc[:,55] = (computer.iloc[:,58] + computer.iloc[:,55])/2

# col: Take initiative even when circumstances, objectives, or rules aren't clear
computer.iloc[:,56] = (computer.iloc[:,59] + computer.iloc[:,56])/2

# col: Encourage direct and open discussions
computer.iloc[:,57] = (computer.iloc[:,60] + computer.iloc[:,57])/2


# Step 2: drop the duplicated columns  
computer = computer.drop(computer.columns[[58, 59, 60]], axis = 1) 

**2) REORGANIZE 'NATIONALITY' COLUMN** 

In [None]:
# define inputs that need to be cleaned 
nationality = {"Indian"      : "India",
               "indian"      : "India",
               "indian."     : "India",
               "INDIAN"      : "India",
               "Chinese"     : "China",
               "chinese"     : "China",
               "CHINA"       : "China",
               "china"       : "China",
               "German"      : "Germany",
               "Mexican"     : "Mexico",
               "mexican"     : "Mexico",
               "Peruvian"    : "Peru",
               "American"    : "USA",
               "Russian"     : "Russia",   
               "Norwegian"   : "Norway",  
               "Colombian"   : "Colombia",
               "colombian"   : "Colombia",
               "Taiwan"      : "Taiwan", 
               "Brazilian"   : "Brazil", 
               "Vietnamese"  : "Vietnam",
               "Thai"        : "Thailand",
               "Nigerian"    : "Nigeria",
               "nigerian"    : "Nigeria",
               "Turkish"     : "Turkey",    
               "Republic of Korea" : "South Korea",
               "Korea"       : "South Korea",
               "Indonesian"  : "Indonesia",
               "Italian"     : "Italy",
               "italian"     : "Italy",
               "Ghanaian"    : "Ghana",                 
               "British"     : "UK",             
               "ecuador"     : "Ecuador",
               "Ecuadorian"  : "Ecuador",
               "Dominican"    : "Dominican Republic", 
               "Dominican "   : "Dominican Republic", 
               "Filipino"    : "Philippines",  
               "Filipino "    : "Philippines",  
               "Congolese"   : "Congo",
               "Congolese (DR CONGO)" : "Congo",
               "canadian"    : "Canada",
               "Swiss"       : "Switzerland",
               "Czech"       : "Czech Republic",
               "Spanish"     : "Spain",
               "Belgian "    : "Belgium", 
               "Venezuelan"  : "Venezuela",
               "Ukrainian"   : "Ukraine",
               "Ukrainia"    : "Ukraine",
               "Pakistani"   : "Pakistan",
               "Kenyan"      : "Kenya",
               "Ugandan"     : "Uganda",
               "Costarrican" : "Costa Rica",
               "Portuguese"  : "Portugal",   
               "Italian and Spanish" : "Dual Nationality",
               "German/American" : "Dual Nationality",
               "British, Indian" : "Dual Nationality",  
               "prefer not to answer" : "Prefer not to answer"
    
}

# replace the matching strings (assign back so the change is kept)
computer.iloc[:,74] = computer.iloc[:,74].replace(nationality)

# checking to see all is good
# computer.iloc[:,74].value_counts()

In [None]:
# group nationalities by regions --> new col: 'Regions'

# cleaned nationalities that belong to each region
region_map = {
    'Asia'          : ['India', 'China', 'Russia', 'South Korea', 'Taiwan',
                       'Indonesia', 'Vietnam', 'Thailand', 'Philippines',
                       'Japan', 'Japanese', 'Kyrgyz', 'Pakistan', 'Turkey'],
    'Europe'        : ['Germany', 'Italy', 'Norway', 'Spain', 'Czech Republic',
                       'Belarus', 'Belgium', 'Switzerland', 'Ukraine', 'UK',
                       'Portugal'],
    'Africa'        : ['Africa', 'Nigeria', 'Ghana', 'Kenya', 'Uganda',
                       'Congo', 'Mauritius'],
    'North America' : ['Mexico', 'USA', 'Dominican Republic', 'Canada',
                       'Panama', 'Costa Rica'],
    'South America' : ['Peru', 'Colombia', 'Brazil', 'Ecuador', 'Venezuela'],
    'Global'        : ['Dual Nationality']}

# invert into a nationality --> region lookup
# (avoids reusing the name 'nationality', which holds the cleaning dictionary)
country_to_region = {country : region
                     for region, countries in region_map.items()
                     for country in countries}

# Create new column; anything not listed falls into 'Prefer not to answer'
computer['Regions'] = computer.iloc[ : , 74].map(country_to_region)\
                                            .fillna('Prefer not to answer')

# check results 
computer['Regions'].value_counts()

## Explore Dataset in Details

**1) LAPTOPS: CURRENT OWNERSHIP & FUTURE ASPIRATIONS**

In [None]:
# current laptop ownership 
computer['What laptop do you currently have?'].value_counts()

In [None]:
# want new laptop 
next_laptop = computer['What laptop would you buy in next assuming if all laptops cost the same?']

counts = next_laptop.value_counts()

percent100 = next_laptop.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'

pd.DataFrame({'counts': counts, 'percentage' : percent100})

**Insights:** 
- The share for 'Chromebook' is very small; treat it as an outlier

**2) LAPTOPS & AGE GROUPS**

In [None]:
# age x windows laptop: age distribution of all respondents
computer['What is your age?'].plot.hist(color = 'blue',alpha = 0.5)

# overlay the age distribution of Windows laptop owners
computer['What is your age?']\
[computer['What laptop do you currently have?']=='Windows laptop'].plot.hist(color = 'red',
                                                                             alpha = 0.5)

In [None]:
# age x macbook: age distribution of all respondents
computer['What is your age?'].plot.hist(color = 'blue',alpha = 0.5)

# overlay the age distribution of Macbook owners
computer['What is your age?']\
[computer['What laptop do you currently have?']=='Macbook'].plot.hist(color = 'yellow',
                                                                      alpha = 0.5)

In [None]:
# median age = 26
computer['What is your age?'].median()

In [None]:
# current laptop + age

# categorize younger and older millennials  
young_m = computer['What is your age?'][computer['What is your age?'] <= 26]
old_m = computer['What is your age?'][computer['What is your age?'] >  26]

# share of Macbook owners within each age group (split at the median age)
young = computer[(computer['What is your age?'] <= 26) \
    & (computer['What laptop do you currently have?']=='Macbook')]['surveyID'].count()/ \
    computer[(computer['What is your age?'] <= 26)]['surveyID'].count()

old = computer[(computer['What is your age?'] > 26) \
    & (computer['What laptop do you currently have?']=='Macbook')]['surveyID'].count()/ \
    computer[(computer['What is your age?'] > 26)]['surveyID'].count()

# display results
print(f"""
Macbook ownership
------------------
% of young millennials: {young.round(2)}
% of old millennials:   {old.round(2)}
"""
     )

In [None]:
# future laptop + age

# categorize younger and older millennials  
young_m = computer['What is your age?'][computer['What is your age?'] <= 26]
old_m = computer['What is your age?'][computer['What is your age?'] >  26]

# share who would buy a Macbook next, within each age group
young = computer[(computer['What is your age?'] <= 26) \
    & (computer['What laptop would you buy in next assuming if all laptops cost the same?']=='Macbook')]['surveyID'].count()/ \
    computer[(computer['What is your age?'] <= 26)]['surveyID'].count()

old = computer[(computer['What is your age?'] > 26) \
    & (computer['What laptop would you buy in next assuming if all laptops cost the same?']=='Macbook')]['surveyID'].count()/ \
    computer[(computer['What is your age?'] > 26)]['surveyID'].count()

# display results
print(f"""
Macbook as next laptop
----------------------
% of young millennials: {young.round(2)}
% of old millennials:   {old.round(2)}
"""
     )

**Insights:**
- Comparing the distributions: Macbook and Windows laptop owners share a similar age distribution 
- However, students at or under the median age of 26 seem to use Macbooks more than Windows laptops 
- This is further supported by the percentages calculated, where more young millennials own Macbooks than older millennials 
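
The ownership shares behind these insights can also be computed in one step with a group-wise mean. The sketch below is a minimal illustration on made-up rows; the short column names `age` and `laptop` are hypothetical stand-ins for the survey questions.

```python
import pandas as pd

# made-up rows that mimic two of the survey columns (hypothetical values)
toy = pd.DataFrame({
    'age'   : [22, 24, 26, 27, 30, 35],
    'laptop': ['Macbook', 'Macbook', 'Windows laptop',
               'Windows laptop', 'Macbook', 'Windows laptop'],
})

median_age = toy['age'].median()

# label each respondent relative to the median age
group = toy['age'].le(median_age).map({True: 'young', False: 'old'})

# the mean of a boolean column is the share of True values
mac_share = toy['laptop'].eq('Macbook').groupby(group).mean()
print(mac_share.round(2))
```

The same pattern extends to gender or region by swapping the grouping key, which avoids repeating the filter-and-count expression for every group.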

**3) LAPTOPS & GENDER GROUPS**

In [None]:
# current laptop by gender 

# females with Macbook 
female = computer[(computer['Gender']!="Male") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/ \
    computer[computer['Gender']!="Male"]['surveyID'].count()

# males with Macbook
male = computer[(computer['Gender']=="Male") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/ \
    computer[computer['Gender']=="Male"]['surveyID'].count()

# display results
print(f"""
Macbook ownership
------------------
% of female: {female.round(2)}
% of male:   {male.round(2)}
"""
     )


In [None]:
# desire for Macbook by gender 

# females who would buy a Macbook 
female = computer[(computer['Gender']!="Male") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/ \
    computer[computer['Gender']!="Male"]['surveyID'].count()

# males who would buy a Macbook 
male = computer[(computer['Gender']=="Male") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/ \
    computer[computer['Gender']=="Male"]['surveyID'].count()


# display results
print(f"""
Macbook as next laptop
----------------------
% of female: {female.round(2)}
% of male:   {male.round(2)}
"""
     )

**Insights:**
- In both cases, current and future, females prefer Macbook over Windows

**4) LAPTOP & REGIONS:**

In [None]:
# current laptop + regions 

# Asia with Macbook
as_m = computer[(computer['Regions']== "Asia") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="Asia"]['surveyID'].count()

# Europe with Macbook
e_m = computer[(computer['Regions']=="Europe") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="Europe"]['surveyID'].count()

# Africa with Macbook
af_m = computer[(computer['Regions']=="Africa") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="Africa"]['surveyID'].count()

# North America with Macbook
na_m = computer[(computer['Regions']=="North America") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="North America"]['surveyID'].count()

# South America with Macbook
sa_m = computer[(computer['Regions']=="South America") & \
    (computer['What laptop do you currently have?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="South America"]['surveyID'].count()

# display results
print(f"""
Percentage of People with Macbooks by Regions:
-----------------------------------------------
Asia with Macbook:           {as_m.round(2)}
Europe with Macbook:         {e_m.round(2)}
Africa with Macbook:         {af_m.round(2)}
North America with Macbook:  {na_m.round(2)}
South America with Macbook:  {sa_m.round(2)}
""")   

In [None]:
# aspiring laptop + regions 

# Asia with Macbook
as_m = computer[(computer['Regions']=="Asia") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="Asia"]['surveyID'].count()

# Europe with Macbook
e_m = computer[(computer['Regions']=="Europe") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="Europe"]['surveyID'].count()


# Africa with Macbook
af_m = computer[(computer['Regions']=="Africa") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="Africa"]['surveyID'].count()


# North America with Macbook
na_m = computer[(computer['Regions']=="North America") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="North America"]['surveyID'].count()


# South America with Macbook 
sa_m = computer[(computer['Regions']=="South America") & \
    (computer['What laptop would you buy in next assuming if all laptops cost the same?']== 'Macbook')]['surveyID'].count()/\
    computer[computer['Regions']=="South America"]['surveyID'].count()


print(f"""
Percentage of People who want Macbooks by Regions:
-----------------------------------------------
Asia with Macbook:           {as_m.round(2)}
Europe with Macbook:         {e_m.round(2)}
Africa with Macbook:         {af_m.round(2)}
North America with Macbook:  {na_m.round(2)}
South America with Macbook:  {sa_m.round(2)}
""")   

## Prepare Data for Modeling

In [None]:
# store list of 'demographic features' --> to drop for PCA 
demographic_features = ['surveyID',
                         'What laptop do you currently have?',
                         'What laptop would you buy in next assuming if all laptops cost the same?',
                         'What program are you in?',
                         'What is your age?',
                         'Gender',
                         'What is your nationality? ',
                         'What is your ethnicity?',
                         'Regions']
                                                                  

# dropping demographic features/ object types 
traits = computer.drop(demographic_features, axis = 1)

### BIG 5 TRAITS

The survey contains questions taken from the 'Big Five Personality Test'. The Big Five traits are five broad domains that define a person's overall personality based on individual differences (Cherry, 2020). To group questions according to the official test, a formula provided by Temple University was adopted. The formula takes into account that some questions are worded in the opposite direction to others, and is as follows:


*Note: number in brackets refers to the question number in the test sheet*

   - Extraversion      = 20 + (1) - (6) + (11) - (16) + (21) - (26) + (31) - (36) + (41) - (46) = ___
   

   - Agreeableness     = 14 - (2)  + (7)  - (12)  + (17)  - (22)  + (27)  - (32)  + (37)  +  (42)  + (47)  = ___


   - Conscientiousness = 14 + (3)  - (8)  + (13)  - (18)  + (23)  - (28)  + (33)  - (38)  + (43)  +  (48)  = ___


   - Neuroticism       = 38 - (4)  + (9)  - (14)  + (19)  - (24)  - (29)  - (34)  - (39)  - (44) - (49) = ___


   - Openness          = 8 + (5)  - (10)  + (15)  - (20)  + (25)  - (30)  + (35)  + (40)  +  (45)  + (50)  = ___
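
The formulas above can be read as a constant plus a signed sum of ten answers. The sketch below applies the Extraversion formula to two made-up respondents; the 1-5 answer scale and the `Q…` column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# constant and per-item signs for Extraversion, read off the formula:
# 20 + (1) - (6) + (11) - (16) + (21) - (26) + (31) - (36) + (41) - (46)
constant = 20
signs = np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1])

# two made-up respondents answering items 1, 6, ..., 46 (assumed 1-5 scale)
answers = pd.DataFrame([[3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
                        [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]],
                       columns=[f'Q{q}' for q in range(1, 47, 5)])

# multiply each column by its sign, then add up across the row
score = constant + answers.mul(signs, axis=1).sum(axis=1)
print(score.tolist())
```

A neutral respondent lands on the constant (20), while the second respondent, who agrees with every positively keyed item and disagrees with every reverse-keyed one, scores 40.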

In [None]:
# group columns and label according to traits

# EXTRAVERSION 
e = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
extraversion = traits.iloc[:,e]    

# AGREEABLENESS 
a = [1, 6, 11, 16, 21, 26, 31, 36, 41, 46]
agreeableness = traits.iloc[:,a]     
     
# CONSCIENTIOUSNESS 
c = [2, 7, 12, 17, 22, 27, 32, 37, 42, 47]
conscientiousness = traits.iloc[:,c]
          
# NEUROTICISM 
n = [3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
neuroticism = traits.iloc[:,n] 
         
# OPENNESS
o = [4, 9, 14, 19, 24, 29, 34, 39, 44, 49]
openness = traits.iloc[:,o]

In [None]:
# aggregate results using position in list 

# sum of all extraversion points 
sum_extraversion = 20 + extraversion.iloc[:,0] - extraversion.iloc[:,1]\
+ extraversion.iloc[:,2] - extraversion.iloc[:, 3] + extraversion.iloc[:, 4]\
- extraversion.iloc[:, 5] + extraversion.iloc[:, 6] - extraversion.iloc[:, 7]\
+ extraversion.iloc[:, 8] - extraversion.iloc[:, 9]

# sum of all agreeableness points 
sum_agreeableness = 14 - agreeableness.iloc[:,0] + agreeableness.iloc[:,1]\
- agreeableness.iloc[:,2] + agreeableness.iloc[:, 3] - agreeableness.iloc[:, 4]\
+ agreeableness.iloc[:, 5] - agreeableness.iloc[:, 6] + agreeableness.iloc[:, 7]\
+ agreeableness.iloc[:, 8] + agreeableness.iloc[:, 9]

# sum of all conscientiousness points 
sum_conscientiousness = 14 + conscientiousness.iloc[:,0]\
- conscientiousness.iloc[:,1] + conscientiousness.iloc[:,2]\
- conscientiousness.iloc[:, 3] + conscientiousness.iloc[:, 4]\
- conscientiousness.iloc[:, 5] + conscientiousness.iloc[:, 6]\
- conscientiousness.iloc[:, 7] + conscientiousness.iloc[:, 8]\
+ conscientiousness.iloc[:, 9]

# sum of all neuroticism points 
sum_neuroticism = 38 - neuroticism.iloc[:,0] + neuroticism.iloc[:,1]\
- neuroticism.iloc[:,2] + neuroticism.iloc[:, 3] - neuroticism.iloc[:, 4]\
- neuroticism.iloc[:, 5] - neuroticism.iloc[:, 6] - neuroticism.iloc[:, 7]\
- neuroticism.iloc[:, 8] - neuroticism.iloc[:, 9]

# sum of all openness points 
sum_openness = 8 + openness.iloc[:,0] - openness.iloc[:,1]\
+ openness.iloc[:,2] - openness.iloc[:, 3] + openness.iloc[:, 4]\
- openness.iloc[:, 5] + openness.iloc[:, 6] + openness.iloc[:, 7]\
+ openness.iloc[:, 8] + openness.iloc[:, 9]

In [None]:
# save Big 5 scores into one DataFrame 
big5 = pd.DataFrame(list(zip(sum_extraversion, sum_agreeableness, 
                             sum_conscientiousness, sum_neuroticism, 
                             sum_openness)),
                    columns=['EXTRAVERSION','AGREEABLENESS', 
                             'CONSCIENTIOUSNESS', 'NEUROTICISM', 'OPENNESS'])

### HULT DNA

Hult DNA is a set of cognitive-behavioral skills exhibited by every student at Hult. The 'DNA' comprises Thinking, Communicating and Team Building (Shaheem, 2019). To balance opposite-trait questions, the Big 5 formula was adopted and adjusted to best fit the Hult DNA questions. Questions were grouped as follows:
- Each group had 6 questions in total (5 positive behaviors and 1 negative behavior) 
- Each group was calculated as follows: Group = sum of all positive behaviors - 1 negative behavior = ___
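
A minimal sketch of this grouping rule, using made-up answers: the column names `pos_1` to `pos_5` and `neg_1` are hypothetical placeholders for the five positive behaviors and the one negative behavior in a group.

```python
import pandas as pd

# made-up answers: five positive behaviors followed by one negative behavior
group_items = pd.DataFrame({
    'pos_1': [4, 5], 'pos_2': [3, 5], 'pos_3': [4, 4],
    'pos_4': [5, 5], 'pos_5': [2, 4], 'neg_1': [1, 3],
})

# Group = sum of all positive behaviors - 1 negative behavior
positive = group_items.iloc[:, :5].sum(axis=1)
negative = group_items.iloc[:, 5]
group_score = positive - negative
print(group_score.tolist())
```

Placing the negative item last in each group, as the column lists in this notebook do, keeps the rule identical for Thinking, Communicating and Team Building.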


In [None]:
# group columns and label according to traits

# THINKING 
think = [50, 52, 53, 54, 65, 51]
thinking = traits.iloc[:,think]   

# COMMUNICATING 
com = [55, 56, 57, 63, 66, 58]
communicating = traits.iloc[:,com]     
     
# TEAM BUILDING 
team = [59, 60, 61, 64, 67, 62]
team_building = traits.iloc[:,team]

In [None]:
# aggregate results using position in list 

# sum of all thinking points 
sum_thinking = thinking.iloc[:,0] + thinking.iloc[:,1] + thinking.iloc[:,2]\
+ thinking.iloc[:, 3] + thinking.iloc[:, 4] - thinking.iloc[:, 5]

# sum of all communication points 
sum_communication = communicating.iloc[:,0] + communicating.iloc[:,1] + communicating.iloc[:,2]\
+ communicating.iloc[:, 3] + communicating.iloc[:, 4] - communicating.iloc[:, 5]

# sum of all team points 
sum_teambuilding = team_building.iloc[:,0] + team_building.iloc[:,1]\
+ team_building.iloc[:,2] + team_building.iloc[:, 3] + team_building.iloc[:, 4]\
- team_building.iloc[:, 5]

In [None]:
# save Hult DNA scores into one DataFrame 
hult = pd.DataFrame(list(zip(sum_thinking, sum_communication, sum_teambuilding)),
                    columns=['THINKING','COMMUNICATING', 'TEAM BUILDING '])

### Load User-Defined Functions 

In [None]:
########################################
# inertia
########################################
def interia_plot(data, max_clust = 50):
    """
PARAMETERS
----------
data      : DataFrame, data from which to build clusters. Dataset should be scaled
max_clust : int, upper bound (exclusive) on the number of clusters to check inertia for, default 50
    """

    ks = range(1, max_clust)
    inertias = []


    for k in ks:
        # INSTANTIATING a kmeans object
        model = KMeans(n_clusters = k)


        # FITTING to the data
        model.fit(data)


        # append each inertia to the list of inertias
        inertias.append(model.inertia_)



    # plotting ks vs inertias
    fig, ax = plt.subplots(figsize = (12, 8))
    plt.plot(ks, inertias, '-o')


    # labeling and displaying the plot
    plt.xlabel('number of clusters, k')
    plt.ylabel('inertia')
    plt.xticks(ks)
    plt.show()


########################################
# scree_plot
########################################
def scree_plot(pca_object, export = False):
    # building a scree plot

    # setting plot size
    fig, ax = plt.subplots(figsize=(10, 8))
    features = range(pca_object.n_components_)


    # developing a scree plot
    plt.plot(features,
             pca_object.explained_variance_ratio_,
             linewidth = 2,
             marker = 'o',
             markersize = 10,
             markeredgecolor = 'black',
             markerfacecolor = 'grey')


    # setting more plot options
    plt.title('Scree Plot')
    plt.xlabel('PCA feature')
    plt.ylabel('Explained Variance')
    plt.xticks(features)

    if export:
    
        # exporting the plot
        plt.savefig('./analysis_images/top_customers_correlation_scree_plot.png')
        
    # displaying the plot
    plt.show()

## PCA Models

### PCA - BIG 5 TRAITS

#### Scale Dataset

In [None]:
# explanatory variables should be scaled before a PCA analysis algorithm 

# INSTANTIATING a StandardScaler() object
scaler = StandardScaler()


# FITTING the scaler with the data
scaler.fit(big5)


# TRANSFORMING our data after fit
X_scaled = scaler.transform(big5)


# converting scaled data into a DataFrame
big5_scaled = pd.DataFrame(X_scaled)


# reattaching column names
big5_scaled.columns = big5.columns


# checking pre- and post-scaling variance
# (pd.np is no longer available in recent pandas; ddof = 0 matches the scaler)
print(big5.var(ddof = 0), '\n\n')
print(big5_scaled.var(ddof = 0))

#### Transform PCA Model

In [None]:
# INSTANTIATING a PCA object with no limit to principal components
pca = PCA(n_components = None,
            random_state = 802)


# FITTING and TRANSFORMING the scaled data
big5_pca = pca.fit_transform(big5_scaled)

In [None]:
# EVALUATE PCA ALGORITHM 

# component number counter
component_number = 0

# looping over each principal component (.explained_variance_ratio_)
for variance in pca.explained_variance_ratio_:
    component_number += 1
    
    print(f"PC {component_number} : {variance.round(3)}")
    

# checking to make sure PCA algorithm is good to proceed 
print(f"""Sum  : {(pca.explained_variance_ratio_).sum().round()}""")

**Insights:** The explained variance ratios sum to 1.0, which means the model is good to proceed.

#### Principal Components Analysis

**1) MAX MODEL:**

In [None]:
# VISUALIZING the pca 
scree_plot(pca_object = pca)

In [None]:
####################
### Max Model
####################

# transposing pca components
factor_loadings_df = pd.DataFrame(pca.components_.T)

# naming rows as original features
factor_loadings_df = factor_loadings_df.set_index(big5_scaled.columns)

# displaying max model
print(factor_loadings_df)

# saving to Excel to easily reduce number of principal components
# factor_loadings_df.to_excel('big5_customer_factor_loadings.xlsx')

**2) LIMITED MODEL:** 

Reduce model to contain reasonable number of principal components; following the 'elbow cut-off rule' or '80-20 rule'
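
One way to make the '80-20' cut-off concrete is to count how many leading components are needed before the cumulative explained variance reaches 80%. The sketch below is illustrative only: `components_for_threshold` is a hypothetical helper, and the example ratios are made-up values rather than output from this dataset.

```python
import numpy as np


def components_for_threshold(explained_variance_ratio, threshold = 0.80):
    """Return the smallest number of leading principal components whose
    cumulative explained variance ratio reaches the threshold."""
    cumulative = np.cumsum(explained_variance_ratio)

    # guard against thresholds the components can never reach
    if cumulative[-1] < threshold:
        raise ValueError("threshold exceeds total explained variance")

    # position of the first cumulative value at or above the threshold
    return int(np.argmax(cumulative >= threshold)) + 1


# made-up ratios for illustration; in practice pass pca.explained_variance_ratio_
example_ratios = [0.40, 0.30, 0.20, 0.10]
n_keep = components_for_threshold(example_ratios, threshold = 0.80)
print(f"Components needed: {n_keep}")
```

The result is only a starting point; the elbow in the scree plot may suggest a different cut-off.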

In [None]:
# INSTANTIATING a new model using the first four principal components
pca_4 = PCA(n_components = 4,
            random_state = 219)


# FITTING and TRANSFORMING the scaled Big 5 data
big_pca_4 = pca_4.fit_transform(big5_scaled)


# calling the scree_plot function
scree_plot(pca_object = pca_4)

In [None]:
##################
### Limited Model
##################

# transposing pca components
factor_loadings_4 = pd.DataFrame(pca_4.components_.T)

# naming rows as original features
factor_loadings_4 = factor_loadings_4.set_index(big5_scaled.columns)


# displaying limited model
print(factor_loadings_4.round(2))

### PCA - BIG 5 PERSONA 

#### Persona Development 

Considering any factor loading with a magnitude above 0.50, both positive and negative, in the reduced PCA model, the following personas were developed:



**PC 1: AMENABLE**    
- People who are curious about the world and others around them. 
- They are mindful of details, eager to learn, and easily persuaded if it means exploring and enjoying new horizons in a scheduled manner. 



**PC 2: EQUANIMOUS**
- People who are calm and even-tempered.
- They stay composed in the face of stimuli, events, or situations that tend to provoke reactions in others. 



**PC 3: OBSESSIVE**
- People who strive for the best in an emotional manner and therefore experience a tremendous amount of stress.
- Given how much they worry about, they tend to be more reserved, get upset easily, and become anxious when things fall out of place. 



**PC 4: INDECISIVE**
- People who see endless possibilities but have trouble making decisions.
- They tend to become 'unintentional' procrastinators because they have put off making decisions for so long that they run out of time.
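
The 0.50 cut-off used to build these personas can be applied programmatically before naming the components. The following is a minimal sketch, assuming a loadings table shaped like `factor_loadings_4` (features as rows, components as columns); `strong_loadings` is a hypothetical helper and the numbers are made up for illustration.

```python
import pandas as pd


def strong_loadings(loadings, cutoff = 0.50):
    """Map each component to the features whose absolute loading
    exceeds the cutoff, keeping the sign of each loading."""
    result = {}
    for component in loadings.columns:
        column = loadings[component]
        strong = column[column.abs() > cutoff]
        result[component] = strong.round(2).to_dict()
    return result


# made-up loadings for illustration only (not from the survey data)
example_loadings = pd.DataFrame(
    {'PC 1': [0.62, -0.10, 0.55],
     'PC 2': [0.05, -0.71, 0.20]},
    index = ['openness', 'neuroticism', 'conscientiousness'])

print(strong_loadings(example_loadings))
```

Keeping the sign matters here, because a strongly negative loading describes the opposite end of the trait.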

In [None]:
# naming each principal component
factor_loadings_4.columns = ['AMENABLE',
                             'EQUANIMOUS',
                             'OBSESSIVE',
                             'INDECISIVE'] 

# checking the result
factor_loadings_4.round(2)

#### Analyze Factor Loadings for Customers 

**STRATEGY TO IDENTIFY POTENTIAL CUSTOMER GROUPS:**

1) Looking at any principal component, there are some groups with very high or very low factor loadings 

2) Find the percentage of customers who exhibit or do not exhibit a specific trait, noting only the findings that can be of value

In [None]:
# PREPARE TO IDENTIFY GROUPS

# analyzing factor strengths per customer
big5_pca_reduced = pca_4.transform(big5_scaled)

# converting to a DataFrame
big5_pca_df = pd.DataFrame(big5_pca_reduced)

# renaming columns
big5_pca_df.columns = factor_loadings_4.columns

# checking the results
big5_pca_df.head(5)

In [None]:
# TRAIT: EQUANIMOUS

# share of customers who DO exhibit this psychology 
do = len(big5_pca_df['EQUANIMOUS'][big5_pca_df['EQUANIMOUS'] > 1.0]) / \
    len(big5_pca_df)

# share of customers who DO NOT exhibit this psychology 
do_not = len(big5_pca_df['EQUANIMOUS'][big5_pca_df['EQUANIMOUS'] < -1.0]) / \
        len(big5_pca_df)

print(f"""
Percentage of Customers: 
-----------------
Do exhibit this behavior:     {do:.1%}
Do not exhibit this behavior: {do_not:.1%}
""")

**Insights:**
- 20% of customers have a high level of equanimity
- 18% of customers exhibit the opposite psychology  
- Since the share of customers who exhibit equanimity is higher, we should consider this a potential customer group 
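
The same ±1.0 comparison can be wrapped in a small reusable function so that every trait is checked the same way. This is a sketch under assumptions: `share_beyond` is a hypothetical helper, and the scores are made-up values, not the survey results reported above.

```python
import pandas as pd


def share_beyond(scores, cutoff = 1.0):
    """Return the share of scores above +cutoff and below -cutoff."""
    high = (scores > cutoff).mean()
    low = (scores < -cutoff).mean()
    return float(high), float(low)


# made-up component scores for illustration only
example_scores = pd.Series([1.5, 0.2, -1.3, 0.9, 2.1,
                            -0.4, -1.1, 0.0, 1.2, -0.6])

high, low = share_beyond(example_scores)
print(f"Do exhibit this trait:     {high:.0%}")
print(f"Do not exhibit this trait: {low:.0%}")
```

In the notebook, the call would take a column such as `big5_pca_df['EQUANIMOUS']` in place of the example series.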

### PCA - HULT DNA

#### Scale Dataset

In [None]:
# explanatory variables should be scaled before developing a PCA analysis algorithm 

# INSTANTIATING a StandardScaler() object
scaler = StandardScaler()


# FITTING the scaler with the data
scaler.fit(hult)


# TRANSFORMING our data after fit
X_scaled = scaler.transform(hult)


# converting scaled data into a DataFrame
hult_scaled = pd.DataFrame(X_scaled)


# reattaching column names
hult_scaled.columns = hult.columns


# checking pre- and post-scaling variance
# (pd.np is no longer available in pandas; ddof = 0 gives the population variance)
print(hult.var(ddof = 0), '\n\n')
print(hult_scaled.var(ddof = 0))

#### Transform PCA Model

In [None]:
# INSTANTIATING a PCA object with no limit to principal components
pca = PCA(n_components = None,
            random_state = 802)


# FITTING and TRANSFORMING the scaled data
hult_pca = pca.fit_transform(hult_scaled)

In [None]:
# EVALUATE PCA ALGORITHM 

# component number counter
component_number = 0

# looping over each principal component (.explained_variance_ratio_)
for variance in pca.explained_variance_ratio_:
    component_number += 1
    
    print(f"PC {component_number} : {variance.round(3)}")
    

# checking to make sure PCA algorithm is good to proceed 
print(f"""Sum  : {(pca.explained_variance_ratio_).sum().round()}""")

**Insights:** The explained variance ratios sum to 1.0, which means the model is good to proceed.

#### Principal Component Analysis

**1) MAX MODEL:**

In [None]:
# VISUALIZING the pca 
scree_plot(pca_object = pca)

In [None]:
####################
### Max Model
####################

# transposing pca components
factor_loadings_df = pd.DataFrame(pca.components_.T)

# naming rows as original features
factor_loadings_df = factor_loadings_df.set_index(hult_scaled.columns)


# displaying max model
print(factor_loadings_df)

# saving to Excel to easily reduce number of principal components
# factor_loadings_df.to_excel('hult_customer_factor_loadings.xlsx')

**2) LIMITED MODEL:**

Reduce model to contain reasonable number of principal components; following the 'elbow cut-off rule' or '80-20 rule'

In [None]:
# INSTANTIATING a new model using the first two principal components
pca_2 = PCA(n_components = 2,
            random_state = 219)


# FITTING and TRANSFORMING the scaled Hult DNA data
big_pca_2 = pca_2.fit_transform(hult_scaled)


# calling the scree_plot function
scree_plot(pca_object = pca_2)

In [None]:
##################
### Limited Model
##################

# transposing pca components
factor_loadings_2 = pd.DataFrame(pca_2.components_.T)

# naming rows as original features
factor_loadings_2 = factor_loadings_2.set_index(hult_scaled.columns)


# displaying limited model
print(factor_loadings_2.round(2))

### PCA - HULT DNA PERSONA

#### Persona Development

Considering any factor loading with a magnitude above 0.50, both positive and negative, in the reduced PCA model, the following personas were developed:

**PC 1: APATHETIC** 
- People who exhibit a lack of interest in interacting with others and feel unmotivated by and uninterested in daily tasks. 
- Apathy is not a negative thing, but rather, it's the "conservation of energy" ("Apathy Is Misunderstood", 2016).
- People who are apathetic can ignore the noise and focus on things that matter most to them. 


**PC 2: CEREBRAL**
- People who are deep thinkers, making decisions based on intelligence instead of emotions.
- In the world of new ideas, they tend to be ahead.
- Their weak point is aiming for the best while being unable to bend or to persuade others to follow them. 


In [None]:
# naming each principal component
factor_loadings_2.columns = ['APATHETIC',
                             'CEREBRAL'] 

# checking the result
factor_loadings_2.round(2)

#### Analyze Factor Loadings for Customers

**STRATEGY TO IDENTIFY POTENTIAL CUSTOMER GROUPS:**

1) Looking at any principal component, there are some groups with very high or very low factor loadings

2) Find the percentage of customers who exhibit or do not exhibit a specific trait, noting only the findings that can be of value

In [None]:
# PREPARE TO IDENTIFY GROUPS

# analyzing factor strengths per customer
hult_pca_reduced = pca_2.transform(hult_scaled)

# converting to a DataFrame
hult_pca_df = pd.DataFrame(hult_pca_reduced)

# renaming columns
hult_pca_df.columns = factor_loadings_2.columns

# checking the results
hult_pca_df.head(5)

In [None]:
# BEHAVIOR: CEREBRAL

# share of customers who DO exhibit this behavior 
do = len(hult_pca_df['CEREBRAL'][hult_pca_df['CEREBRAL'] > 1.0]) / \
    len(hult_pca_df)

# share of customers who DO NOT exhibit this behavior 
do_not = len(hult_pca_df['CEREBRAL'][hult_pca_df['CEREBRAL'] < -1.0]) / \
        len(hult_pca_df)

print(f"""
Percentage of Customers: 
-----------------
Do exhibit this behavior:     {do:.1%}
Do not exhibit this behavior: {do_not:.1%}
""")

**Insights:**
- Only a very small percentage of customers fall at either extreme of this behavior
- For that reason, we will not focus on this potential customer group

## CLUSTERING

### CUSTOMER GROUPING - BIG 5 TRAIT

#### Prepare Data

In [None]:
# scaling before clustering 

# INSTANTIATING a StandardScaler() object
scaler = StandardScaler()


# FITTING the scaler with the data
scaler.fit(big5_pca_df)


# TRANSFORMING our data after fit
X_scaled_pca = scaler.transform(big5_pca_df)


# converting scaled data into a DataFrame
pca_scaled = pd.DataFrame(X_scaled_pca)


# reattaching column names
pca_scaled.columns = ['AMENABLE','EQUANIMOUS', 'OBSESSIVE', 'INDECISIVE'] 


# checking pre- and post-scaling variance
# (pd.np is no longer available in pandas; ddof = 0 gives the population variance)
print(big5_pca_df.var(ddof = 0), '\n\n')
print(pca_scaled.var(ddof = 0))

#### Begin Clustering

The steps taken were as follows:

1) Develop a dendrogram to visualize the potential number of clusters 

2) Plot to determine the number of clusters 

3) Store cluster centers into a data frame 

4) Name clusters for easy analysis
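
Step 2 relies on inertia, the within-cluster sum of squared distances that k-means tries to minimize. As a rough illustration of what such a plot measures, the sketch below computes that quantity by hand for a given cluster assignment. `within_cluster_ss` is a hypothetical helper and the points are made up; it is not the notebook's own plotting function.

```python
import numpy as np


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean (centroid)
    of the cluster it is assigned to."""
    points = np.asarray(points, dtype = float)
    labels = np.asarray(labels)

    total = 0.0
    for cluster in np.unique(labels):
        members = points[labels == cluster]
        centroid = members.mean(axis = 0)
        total += ((members - centroid) ** 2).sum()
    return float(total)


# made-up 2-D points forming two obvious groups
example_points = [[0, 0], [0, 2], [10, 10], [10, 12]]

one_cluster = within_cluster_ss(example_points, labels = [0, 0, 0, 0])
two_clusters = within_cluster_ss(example_points, labels = [0, 0, 1, 1])

print(f"Inertia with 1 cluster : {one_cluster}")
print(f"Inertia with 2 clusters: {two_clusters}")
```

A sharp drop followed by a flattening of this value as the number of clusters grows is the 'elbow' that guides the choice in step 2.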

In [None]:
# STEP 1: Develop a dendrogram to visualize the potential number of clusters

# grouping data based on Ward distance
# standard_mergings_ward = linkage(y = pca_scaled,  
                                 # method = 'ward',
                                 # optimal_ordering = True)


# setting plot size
# fig, ax = plt.subplots(figsize=(8, 8))

# developing a dendrogram
# dendrogram(Z = standard_mergings_ward,
           # leaf_rotation = 90,
           # leaf_font_size = 6)

# plt.show()

In [None]:
# STEP 2: Plot to determine the number of clusters

# calling the inertia_plot() function
# interia_plot(data = pca_scaled)

In [None]:
####################################
# CLUSTER: 4
####################################

# INSTANTIATING a k-Means object with four clusters
customers_k_pca = KMeans(n_clusters   = 4,
                        random_state = 219)


# fitting the object to the data
customers_k_pca.fit(pca_scaled)


# converting the clusters to a DataFrame
customers_kmeans_pca = pd.DataFrame({'Cluster': customers_k_pca.labels_})


# checking the results
print(customers_kmeans_pca.iloc[: , 0].value_counts())

In [None]:
# STEP 3: Store cluster centers into a data frame

# storing cluster centers
centroids_pca = customers_k_pca.cluster_centers_


# converting cluster centers into a DataFrame
centroids_pca_df = pd.DataFrame(centroids_pca)


# renaming principal components
centroids_pca_df.columns = ['AMENABLE','EQUANIMOUS', 'OBSESSIVE', 'INDECISIVE'] 


# checking results (clusters = rows, pc = columns)
centroids_pca_df.round(2)

**STEP 4: Naming clusters for easy analysis**


- Cluster 0: **INDIFFERENT** - calm with no strong ideology; most likely to go with the flow  
- Cluster 1: **PASSIONATE**  - open to discovering new things, with a tendency to become obsessive about them 
- Cluster 2: **SPONTANEOUS** - quick to make decisions, with no strings attached 
- Cluster 3: **TEMPERAMENTAL** - constantly switching moods, which hinders the decision-making process 

### FINAL OUTPUT  - BIG 5 TRAITS

#### Aggregate Results: Demographic Features + Big 5 + Clusters

In [None]:
# concatenating cluster memberships with principal components
clst_pca_df = pd.concat([customers_kmeans_pca,
                         big5_pca_df],
                         axis = 1)


# checking results
clst_pca_df

# concatenating demographic information with pca-clusters
final_pca_clust_df = pd.concat([computer.loc[ : , ['What laptop do you currently have?', 
                                                   'What laptop would you buy in next assuming if all laptops cost the same?']],
                                clst_pca_df],
                                axis = 1)


# renaming columns
final_pca_clust_df.columns = ['Current Laptop',
                              'Future Laptop',
                              'Cluster',
                              'Ameanable',
                              'Equanimous',
                              'Obsessive',
                              'Indecisive']

# renaming clusters 
cluster_names = {0 : 'Indifferent',
                 1 : 'Passionate',
                 2 : 'Spontaneous',
                 3 : 'Temperamental'}

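# assigning the result back avoids relying on in-place changes to a column selection
final_pca_clust_df['Cluster'] = final_pca_clust_df['Cluster'].replace(cluster_names)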


# adding a productivity step
big5_df = final_pca_clust_df

# checking results
big5_df.head(5)

#### Analyze Final Results

In [None]:
# CURRENT LAPTOPS + BIG 5
####################################

# setting figure size
fig, ax = plt.subplots(figsize = (15, 10))


# Ameanable
plt.subplot(2, 2, 1)
sns.boxplot(x = 'Current Laptop',
            y = 'Ameanable',
            hue = 'Cluster',
            data = big5_df)


# Equanimous 
plt.subplot(2, 2, 2)
sns.boxplot(x    = 'Current Laptop',
            y    = 'Equanimous',
            hue  = 'Cluster',
            data = big5_df)


# Obsessive 
plt.subplot(2, 2, 3)
sns.boxplot(x    = 'Current Laptop',
            y    = 'Obsessive',
            hue  = 'Cluster',
            data = big5_df)


# Indecisive 
plt.subplot(2, 2, 4)
sns.boxplot(x    = 'Current Laptop',
            y    = 'Indecisive',
            hue  = 'Cluster',
            data = big5_df)


# formatting and displaying all plots
plt.tight_layout()
plt.show()

**Insights**:
- Areas from which insights can be derived: 
    - Amenable (Indifferent and Spontaneous groups towards Macbook) 
    - Equanimous (Temperamental group towards Macbook) 
    - Obsessive (Indifferent and Spontaneous towards Macbook) 

In [None]:
# FUTURE LAPTOPS + BIG 5
####################################

# setting figure size
fig, ax = plt.subplots(figsize = (15, 10))


# Ameanable
plt.subplot(2, 2, 1)
sns.boxplot(x = 'Future Laptop',
            y = 'Ameanable',
            hue = 'Cluster',
            data = big5_df)


# Equanimous 
plt.subplot(2, 2, 2)
sns.boxplot(x    = 'Future Laptop',
            y    = 'Equanimous',
            hue  = 'Cluster',
            data = big5_df)


# Obsessive 
plt.subplot(2, 2, 3)
sns.boxplot(x    = 'Future Laptop',
            y    = 'Obsessive',
            hue  = 'Cluster',
            data = big5_df)


# Indecisive 
plt.subplot(2, 2, 4)
sns.boxplot(x    = 'Future Laptop',
            y    = 'Indecisive',
            hue  = 'Cluster',
            data = big5_df)


# formatting and displaying all plots
plt.tight_layout()
plt.show()

**Insights**:
- Areas from which insights can be derived: 
    -  Equanimous + Obsessive (Passionate & Spontaneous groups and Windows laptop)
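
The boxplots show how component scores are distributed, but not how many respondents in each cluster intend to change laptops. A cross-tabulation can complement them. The sketch below uses made-up rows whose column names mirror `big5_df`; the laptop labels ('Macbook', 'Windows laptop') and the `Switcher` column are assumptions for illustration.

```python
import pandas as pd

# made-up rows for illustration only; column names mirror big5_df
example_df = pd.DataFrame({
    'Cluster'       : ['Passionate', 'Passionate', 'Spontaneous',
                       'Spontaneous', 'Indifferent', 'Indifferent'],
    'Current Laptop': ['Macbook', 'Macbook', 'Macbook',
                       'Windows laptop', 'Windows laptop', 'Macbook'],
    'Future Laptop' : ['Windows laptop', 'Macbook', 'Windows laptop',
                       'Windows laptop', 'Macbook', 'Macbook']})

# share of each cluster choosing each future laptop (each row sums to 1)
future_share = pd.crosstab(index     = example_df['Cluster'],
                           columns   = example_df['Future Laptop'],
                           normalize = 'index')

# flagging respondents whose intended purchase differs from their current laptop
example_df['Switcher'] = (example_df['Current Laptop'] !=
                          example_df['Future Laptop'])
switch_rate = example_df.groupby('Cluster')['Switcher'].mean()

print(future_share.round(2))
print(switch_rate.round(2))
```

Read together, the two tables indicate which clusters lean towards each laptop and how much of that preference comes from respondents who would switch.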

### CUSTOMER GROUPING - HULT DNA

#### Prepare Data

In [None]:
# scaling before clustering 

# INSTANTIATING a StandardScaler() object
scaler = StandardScaler()


# FITTING the scaler with the data
scaler.fit(hult_pca_df)


# TRANSFORMING our data after fit
X_scaled_pca = scaler.transform(hult_pca_df)


# converting scaled data into a DataFrame
pca_scaled = pd.DataFrame(X_scaled_pca)


# reattaching column names
pca_scaled.columns = ['APATHETIC','CEREBRAL'] 


# checking pre- and post-scaling variance
# (pd.np is no longer available in pandas; ddof = 0 gives the population variance)
print(hult_pca_df.var(ddof = 0), '\n\n')
print(pca_scaled.var(ddof = 0))

#### Begin Clustering

The steps taken were as follows:

1) Develop a dendrogram to visualize the potential number of clusters

2) Plot to determine the number of clusters

3) Store cluster centers into a data frame

4) Name clusters for easy analysis

In [None]:
# STEP 1: Develop a dendrogram to visualize the potential number of clusters

# grouping data based on Ward distance
# standard_mergings_ward = linkage(y = pca_scaled,  
                                 # method = 'ward',
                                 # optimal_ordering = True)


# setting plot size
# fig, ax = plt.subplots(figsize=(8, 8))

# developing a dendrogram
# dendrogram(Z = standard_mergings_ward,
           # leaf_rotation = 90,
           # leaf_font_size = 6)

#plt.show()

In [None]:
# STEP 2: Plot to determine the number of clusters

# calling the inertia_plot() function
# interia_plot(data = pca_scaled)

In [None]:
####################################
# CLUSTER: 3
####################################

# INSTANTIATING a k-Means object with three clusters
customers_k_pca = KMeans(n_clusters   = 3,
                        random_state = 219)


# fitting the object to the data
customers_k_pca.fit(pca_scaled)


# converting the clusters to a DataFrame
customers_kmeans_pca = pd.DataFrame({'Cluster': customers_k_pca.labels_})


# checking the results
print(customers_kmeans_pca.iloc[: , 0].value_counts())

In [None]:
# STEP 3: Store cluster centers into a data frame

# storing cluster centers
centroids_pca = customers_k_pca.cluster_centers_


# converting cluster centers into a DataFrame
centroids_pca_df = pd.DataFrame(centroids_pca)


# renaming principal components
centroids_pca_df.columns = ['APATHETIC','CEREBRAL'] 


# checking results (clusters = rows, pc = columns)
centroids_pca_df.round(2)

**STEP 4: Naming clusters for easy analysis**

- Cluster 0: **TREND-FOLLOWER** - shows little curiosity, does not overthink, and follows the status quo 
- Cluster 1: **REALIST** - exhibits curiosity and thinks deeply about all aspects, weighing all pros and cons 
- Cluster 2: **AUTOGENICIST** - the 'Independent Thinker'

### FINAL OUTPUT - HULT DNA 

#### Aggregate Results: Demographic Features + Hult DNA + Clusters

In [None]:
# concatenating cluster memberships with principal components
clst_pca_df = pd.concat([customers_kmeans_pca,
                         hult_pca_df],
                         axis = 1)


# checking results
clst_pca_df

# concatenating demographic information with pca-clusters
final_pca_clust_df = pd.concat([computer.loc[ : , ['What laptop do you currently have?',
                                                   'What laptop would you buy in next assuming if all laptops cost the same?']],
                                clst_pca_df],
                                axis = 1)


# renaming columns
final_pca_clust_df.columns = ['Current Laptop',
                              'Future Laptop',
                              'Cluster',
                              'Apathetic',
                              'Cerebral']

# renaming clusters 
cluster_names = {0 : 'Trend-Follower',
                 1 : 'Realist',
                 2 : 'Autogenicist'}

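# assigning the result back avoids relying on in-place changes to a column selection
final_pca_clust_df['Cluster'] = final_pca_clust_df['Cluster'].replace(cluster_names)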


# adding a productivity step
hultdna_df = final_pca_clust_df

# checking results
hultdna_df.head(5)

#### Analyze Final Results

In [None]:
# CURRENT LAPTOPS + HULT DNA 
####################################

# setting figure size
fig, ax = plt.subplots(figsize = (15, 10))


# Apathetic
plt.subplot(2, 2, 1)
sns.boxplot(x = 'Current Laptop',
            y = 'Apathetic',
            hue = 'Cluster',
            data = hultdna_df)


# Cerebral 
plt.subplot(2, 2, 2)
sns.boxplot(x    = 'Current Laptop',
            y    = 'Cerebral',
            hue  = 'Cluster',
            data = hultdna_df)


# formatting and displaying all plots
plt.tight_layout()
plt.show()

**Insights**:
- No big difference between Macbook and Windows laptop users in either the 'Apathetic' or the 'Cerebral' boxplot 
- This leaves very little room for insight; therefore, we will focus on other areas

In [None]:
# FUTURE LAPTOPS + HULT DNA 
####################################

# setting figure size
fig, ax = plt.subplots(figsize = (15, 10))


# Apathetic
plt.subplot(2, 2, 1)
sns.boxplot(x = 'Future Laptop',
            y = 'Apathetic',
            hue = 'Cluster',
            data = hultdna_df)


# Cerebral 
plt.subplot(2, 2, 2)
sns.boxplot(x    = 'Future Laptop',
            y    = 'Cerebral',
            hue  = 'Cluster',
            data = hultdna_df)


# formatting and displaying all plots
plt.tight_layout()
plt.show()

**Insights**:
- Areas from which insights can be derived: 
    - Apathetic (Realist and Trend-Follower groups and Macbook) - perhaps, consider 'Chromebook' as a benchmark
    - Cerebral (Trend-Follower group towards Windows laptop)

# APPENDICES

**APPENDIX A**: Outline of Big Five Personality Traits and Hult DNA Criteria 

**What are the Big Five Personality traits?**
- Openness - open to new ideas and experiences.
- Conscientiousness - goal-directed, persistent, and organized.
- Extraversion - motivated by surroundings.
- Agreeableness - puts others' interests and needs ahead of their own
- Neuroticism - sensitive to stress and negative emotional triggers

**What are Hult DNA traits?**
- Thinking
    - Shows self-awareness
    - Embraces change
    - Demonstrates dynamic thinking
- Communicating
    - Speaks and listens skillfully
    - Influences confidently
    - Presents ideas effectively
- Team Building
    - Fosters collaborative relationships
    - Inspires productivity
    - Resolves conflict constructively


# REFERENCES 

"Apathy Is Misunderstood". (2016). Medium. Retrieved February 1, 2021, from https://circa-navigate.corsairs.network/apathy-is-misunderstood-f3855a5ae33a

Bialik, K. (2020, March 2). “How Millennials Compare with Prior Generations.” Pew Research Center's Social & Demographic Trends Project. Retrieved February 1, 2021, from www.pewsocialtrends.org/essay/millennial-life-how-young-adulthood-today-compares-with-prior-generations/

Bona, C. (2021, January 8). “How Marketers Can Win with Gen Z and Millennials Post-COVID-19.” BCG Global, BCG Global. Retrieved February 1, 2021, from www.bcg.com/publications/2020/how-marketers-can-win-with-gen-z-millennials-post-covid

Caldwell, M. (2020, December 26). “Is It Better to Take Financial Risks or Be Financially Conservative?”. The Balance. Retrieved February 1, 2021, from www.thebalance.com/should-i-be-financially-conservative-2385839

Francis, T., & Hoefel, F. (2020, December 16). 'True Gen': Generation Z and its implications for companies. Retrieved January 31, 2021, from https://www.mckinsey.com/industries/consumer-packaged-goods/our-insights/true-gen-generation-z-and-its-implications-for-companies

Fingas, J. (2021, January 31). “Chromebook Demand More than Doubled in 2020 Due to the Pandemic.” Engadget. Retrieved February 1, 2021, from www.engadget.com/chromebook-shipments-double-due-to-pandemic-193424657.html

Howard, C. (n.d.). Men are from PCs, women are from Macs. Retrieved January 31, 2021, from http://www.applematters.com/article/men_are_from_pcs_women_are_from_macs/index.html

Kelly, O. (2020). "Common Obsessive Behaviors Among People With OCD". Retrieved February 1, 2021, from https://www.verywellmind.com/what-are-common-obsessive-behaviors-2510679

"Mac vs PC People: Personality Traits & Aesthetic/Media Choices". (2009, November 24). Retrieved January 31, 2021, from https://kellynford.com/2009/11/24/mac-vs-pc-people-personality-traits-aestheticmedia-choices/

Nottrodt, J. (2020, April 20). "The Cheapest Places in the World to Buy Apple Devices". Retrieved January 31, 2021, from https://toomanyadapters.com/cheapest-places-world-buy-apple-devices/

Rahul, M. (2019, May 23). “[Survey] Most Students Prefer Macs over PCs.” MSPoweruser. Retrieved February 1, 2021, from mspoweruser.com/survey-most-students-prefer-macs-over-pcs/

Shaheem, S. (2019). "Why every leader needs a growth mindset. Hult International Business School". Retrieved February 1, 2021, from https://www.hult.edu/blog/why-every-leader-needs-growth-mindset/

Stangel, L. (2018). "A little less in Japan, way more in Brazil — here’s how much Apple's revamped Macbook will cost around the world". Retrieved February 1, 2021 from https://www.bizjournals.com/sanjose/news/2016/10/28/a-little-less-in-japan-way-more-in-brazil-here-s.html

"The Big Five Personality Test (BFPT)". (n.d.). Retrieved February 1, 2021, from https://sites.temple.edu/rtassessment/files/2018/10/Table_BFPT.pdf

"What Are the Big 5 Personality Traits?". (n.d.). Verywell Mind. Retrieved February 1, 2021, from https://www.verywellmind.com/the-big-five-personality-dimensions-2795422
