This notebook does the following:
1. Identifies villager dimensions that **indicate interests** (e.g. Personality, Hobby, Favorite Song).
2. Evaluates an **item's giftability** by exploring the columns in each catalog and aligning them with those interest dimensions.
3. Determines the **required next steps/to-dos for cleaning the data**.

In [1]:
import os
import sys
import pandas as pd
os.chdir('..') #change the current working directory to the parent directory
sys.path.append(os.getcwd()) #add that directory to the import path

In [2]:
from load_python import load_data
data = load_data(data_folder='data')

Loaded accessories with 222 rows and 22 columns.
Loaded achievements with 84 rows and 21 columns.
Loaded art with 70 rows and 26 columns.
Loaded bags with 96 rows and 20 columns.
Loaded bottoms with 726 rows and 20 columns.
Loaded construction with 236 rows and 7 columns.
Loaded dress-up with 913 rows and 22 columns.
Loaded fencing with 19 rows and 11 columns.
Loaded fish with 80 rows and 41 columns.
Loaded floors with 176 rows and 19 columns.
Loaded fossils with 73 rows and 14 columns.
Loaded headwear with 698 rows and 22 columns.
Loaded housewares with 3275 rows and 32 columns.
Loaded insects with 80 rows and 38 columns.
Loaded miscellaneous with 1307 rows and 31 columns.
Loaded music with 98 rows and 13 columns.
Loaded other with 353 rows and 15 columns.
Loaded photos with 3128 rows and 20 columns.
Loaded posters with 452 rows and 13 columns.
Loaded reactions with 44 rows and 5 columns.
Loaded recipes with 595 rows and 24 columns.
Loaded rugs with 132 rows and 19 columns.
Loaded sho

<h4>Identify villager dimensions that should be considered when recommending a gift</h4>

In [3]:
#Villagers
villagers_df = data['villagers'] #access the Villager data from the dictionary
print(villagers_df.columns) #retrieve the column names

Index(['Name', 'Species', 'Gender', 'Personality', 'Hobby', 'Birthday',
       'Catchphrase', 'Favorite Song', 'Style 1', 'Style 2', 'Color 1',
       'Color 2', 'Wallpaper', 'Flooring', 'Furniture List', 'Filename',
       'Unique Entry ID'],
      dtype='object')


**Personality**, **Hobby**, **Favorite Song**, **Style 1** and **Style 2** indicate a villager's interests.
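
As a rough sketch of how these interest columns could be profiled in one pass, the cell below counts distinct and missing values per column. It runs on a small toy frame standing in for `villagers_df`; the rows and the `summary` name are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Toy stand-in for data['villagers']; the rows are made up for illustration.
villagers_df = pd.DataFrame({
    "Name": ["Villager A", "Villager B", "Villager C"],
    "Personality": ["Lazy", "Normal", "Lazy"],
    "Hobby": ["Nature", "Music", "Nature"],
    "Favorite Song": ["Forest Life", "K.K. Soul", None],
    "Style 1": ["Simple", "Cute", "Cool"],
    "Style 2": ["Cool", "Simple", "Cool"],
})

interest_columns = ["Personality", "Hobby", "Favorite Song", "Style 1", "Style 2"]

# One row per interest column: number of distinct values and number of missing values.
summary = pd.DataFrame({
    "unique_values": villagers_df[interest_columns].nunique(),
    "missing": villagers_df[interest_columns].isna().sum(),
})
print(summary)
```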

<h4>Investigate Hobby</h4>

In [22]:
value_counts = villagers_df['Hobby'].value_counts()

print(value_counts)

Hobby
Nature       66
Fitness      66
Fashion      66
Play         65
Education    64
Music        64
Name: count, dtype: int64


<h4>Investigate Favorite Song</h4>

In [15]:
value_counts = villagers_df['Favorite Song'].value_counts()

print(value_counts)

Favorite Song
Forest Life      15
K.K. Soul        12
K.K. Cruisin'    11
K.K. Stroll      10
Neapolitan        9
                 ..
K.K. Lullaby      1
K.K. Jazz         1
K.K. Tango        1
K.K. Aria         1
K.K. Parade       1
Name: count, Length: 89, dtype: int64


In [17]:
#Investigate Music dataframe
music_df = data['music'] #access the Music data from the dictionary
print(music_df.columns) #retrieve the column names

Index(['Name', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size', 'Source',
       'Source Notes', 'Version', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')
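
Since the music dataframe has a `Name` column, one way to connect it to `Favorite Song` is to check whether every favorite song appears there. The cell below is a minimal sketch on toy data; "Unlisted Song" is a hypothetical value added only to show what a mismatch looks like.

```python
import pandas as pd

# Toy stand-ins for data['villagers'] and data['music']; rows are illustrative only.
villagers_df = pd.DataFrame({
    "Name": ["Villager A", "Villager B", "Villager C"],
    "Favorite Song": ["Forest Life", "K.K. Soul", "Unlisted Song"],  # last value is hypothetical
})
music_df = pd.DataFrame({"Name": ["Forest Life", "K.K. Soul", "K.K. Stroll"]})

# Flag each villager whose favorite song appears in the music catalog.
has_match = villagers_df["Favorite Song"].isin(music_df["Name"])

# Songs with no catalog entry may need cleaning (spelling, whitespace, punctuation).
unmatched_songs = sorted(villagers_df.loc[~has_match, "Favorite Song"].unique())
print(f"{has_match.sum()} of {len(villagers_df)} favorite songs found in music")
print("Unmatched:", unmatched_songs)
```

If the unmatched list is empty on the real data, `Favorite Song` can be matched against `Name` directly.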


<h4>Investigate Personalities</h4>

In [5]:
#How many personalities are there?
value_counts = villagers_df['Personality'].value_counts()

print(value_counts)

Personality
Lazy          60
Normal        59
Snooty        55
Cranky        55
Jock          55
Peppy         49
Smug          34
Big Sister    24
Name: count, dtype: int64


<h4>Investigate Style 1 & 2</h4>

In [19]:
#How many Styles are there?
value_counts = villagers_df['Style 1'].value_counts()

print(value_counts)

Style 1
Simple      118
Cool         68
Cute         63
Elegant      54
Active       50
Gorgeous     38
Name: count, dtype: int64
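
The count above covers only `Style 1`. A possible way to look at both style columns together is to stack them before counting, as sketched below on toy data (the rows are illustrative, not taken from the dataset).

```python
import pandas as pd

# Toy stand-in for data['villagers']; rows are made up for illustration.
villagers_df = pd.DataFrame({
    "Style 1": ["Simple", "Cute", "Cool"],
    "Style 2": ["Cool", "Simple", "Cool"],
})

# Stack both style columns into one Series so each style is counted
# regardless of whether it appears as Style 1 or Style 2.
all_styles = pd.concat([villagers_df["Style 1"], villagers_df["Style 2"]], ignore_index=True)
style_counts = all_styles.value_counts()
print(style_counts)
```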


Which columns in the other dataframes may indicate that an item aligns with a villager's Style 1 and/or Style 2?
1. Style (indicates the item's aesthetic)
2. Label Themes (related to Label's Fashion Challenges; indicates which theme/style/aesthetic the item belongs to)
3. HHA Concept 1 (the primary theme/aesthetic/style/concept as determined by the Happy Home Academy)
4. HHA Concept 2 (the secondary theme/aesthetic/style/concept as determined by the Happy Home Academy)

Identify the dataframes containing each of those columns.

In [6]:
#Check which dataframes contain the Style column
#create the function that'll return relevant dataframes
def get_dfs_with_column(data, column_name): 
    items = []
    for name, df in data.items():
        if column_name in df.columns:
            items.append(name)
    return items
#call the function
results = get_dfs_with_column(data, 'Style')
print('Dataframes with "Style" Column: ', results)

Dataframes with "Style" Column:  ['accessories', 'bags', 'bottoms', 'dress-up', 'headwear', 'shoes', 'socks', 'tops']


In [None]:
#Check which dataframes contain the Label Themes column
#reuse get_dfs_with_column, defined in the Style cell above
results = get_dfs_with_column(data, 'Label Themes')
print('Dataframes with "Label Themes" Column: ', results)

In [None]:
#Check which dataframes contain the HHA Concept 1 column
#reuse get_dfs_with_column, defined in the Style cell above
results = get_dfs_with_column(data, 'HHA Concept 1')
print('Dataframes with "HHA Concept 1" Column: ', results)

In [7]:
#Check which dataframes contain the HHA Concept 2 column
#reuse get_dfs_with_column, defined in the Style cell above
results = get_dfs_with_column(data, 'HHA Concept 2')
print('Dataframes with "HHA Concept 2" Column: ', results)

Dataframes with "HHA Concept 2" Column:  ['art', 'floors', 'housewares', 'miscellaneous', 'rugs', 'wall-mounted', 'wallpaper']


In [8]:
#Make sure all dataframes contain Unique Entry ID
#reuse get_dfs_with_column, defined in the Style cell above
results = get_dfs_with_column(data, 'Unique Entry ID')
print('Dataframes with "Unique Entry ID" Column: ', results)

Dataframes with "Unique Entry ID" Column:  ['accessories', 'achievements', 'art', 'bags', 'bottoms', 'construction', 'dress-up', 'fencing', 'fish', 'floors', 'fossils', 'headwear', 'housewares', 'insects', 'miscellaneous', 'music', 'other', 'photos', 'posters', 'reactions', 'recipes', 'rugs', 'shoes', 'socks', 'tools', 'tops', 'umbrellas', 'villagers', 'wall-mounted', 'wallpaper']
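
The per-column checks above could also be run in a single pass. The cell below is a sketch under assumptions: it uses a tiny toy `data` dictionary whose columns are illustrative, and `column_map` is a name introduced here, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the dictionary returned by load_data; columns are illustrative only.
data = {
    "tops": pd.DataFrame(columns=["Name", "Style", "Label Themes", "Unique Entry ID"]),
    "rugs": pd.DataFrame(columns=["Name", "HHA Concept 1", "HHA Concept 2", "Unique Entry ID"]),
    "music": pd.DataFrame(columns=["Name", "Unique Entry ID"]),
}

def get_dfs_with_column(data, column_name):
    """Return the names of the dataframes that contain column_name."""
    return [name for name, df in data.items() if column_name in df.columns]

columns_to_check = ["Style", "Label Themes", "HHA Concept 1", "HHA Concept 2", "Unique Entry ID"]

# Map each column of interest to the dataframes that contain it.
column_map = {col: get_dfs_with_column(data, col) for col in columns_to_check}
for col, names in column_map.items():
    print(f'Dataframes with "{col}" column: {names}')
```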


<h3>Cleaning To-Dos</h3>

1. Build out giftable_df (a catalog containing only the items relevant to this analysis).
2. Join accessories_df, art_df, bags_df, bottoms_df, dress_up_df, floors_df, headwear_df, housewares_df, miscellaneous_df, music_df, rugs_df, shoes_df, socks_df, tops_df, wall_mounted_df, and wallpaper_df on their Unique Entry ID columns.
3. Remove items from giftable_df if they have empty/null/NA values in the music Name, Style, Label Themes, and HHA Concept columns.
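
One possible shape for these steps is sketched below on toy data. It makes several assumptions that the notebook does not confirm: the catalogs are stacked with `pd.concat` rather than merged, since each catalog holds different items; a hypothetical `source_catalog` column records where each row came from; and a row is dropped only when all of its interest columns are empty, with music rows kept whenever they have a `Name`.

```python
import pandas as pd

# Toy stand-in for the dictionary returned by load_data; rows are illustrative only.
data = {
    "tops": pd.DataFrame({
        "Name": ["top a", "top b"],
        "Style": ["Cute", None],
        "Unique Entry ID": ["t1", "t2"],
    }),
    "rugs": pd.DataFrame({
        "Name": ["rug a"],
        "HHA Concept 1": ["Simple"],
        "Unique Entry ID": ["r1"],
    }),
    "music": pd.DataFrame({"Name": ["Forest Life"], "Unique Entry ID": ["m1"]}),
}
giftable_sources = ["tops", "rugs", "music"]
interest_columns = ["Style", "Label Themes", "HHA Concept 1", "HHA Concept 2"]

# Stack the catalogs and remember each row's origin (source_catalog is a hypothetical column).
frames = [data[name].assign(source_catalog=name) for name in giftable_sources]
giftable_df = pd.concat(frames, ignore_index=True, sort=False)

# Ensure every interest column exists, even if no toy catalog provided it.
for col in interest_columns:
    if col not in giftable_df.columns:
        giftable_df[col] = pd.NA

# Keep rows with at least one interest value, plus named music rows (assumed rule).
has_interest = giftable_df[interest_columns].notna().any(axis=1)
is_named_music = (giftable_df["source_catalog"] == "music") & giftable_df["Name"].notna()
giftable_df = giftable_df[has_interest | is_named_music].reset_index(drop=True)
print(giftable_df[["Name", "source_catalog", "Unique Entry ID"]])
```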

<h3>Gift Guide Planning</h3>

Include an item in the Gift Guide if:
- the music item's Name is the villager's Favorite Song
- the item's Style matches the villager's Style 1 or Style 2
- the item's Label Themes matches the villager's Style 1 or Style 2
- the item's HHA Concept 1 matches the villager's Style 1 or Style 2
- the item's HHA Concept 2 matches the villager's Style 1 or Style 2
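
The rules above could be expressed as a single filter, sketched here on toy data. This is only an illustration under assumptions: `gifts_for` and the `source_catalog` column are hypothetical names, the rules are treated as alternatives (an item qualifies if any one applies), and matching is exact after lowercasing, which may not suit columns that hold several values per cell.

```python
import pandas as pd

# Toy villager and catalog; values are made up for illustration.
villager = pd.Series({
    "Name": "Villager A",
    "Favorite Song": "K.K. Soul",
    "Style 1": "Cute",
    "Style 2": "Simple",
})
giftable_df = pd.DataFrame({
    "Name": ["K.K. Soul", "top a", "rug a", "hat a"],
    "source_catalog": ["music", "tops", "rugs", "headwear"],  # hypothetical column
    "Style": [None, "Cute", None, "Cool"],
    "Label Themes": [None, "Everyday", None, "Outdoorsy"],
    "HHA Concept 1": [None, None, "Simple", None],
    "HHA Concept 2": [None, None, None, None],
})

def gifts_for(villager, giftable_df):
    """Return the catalog rows that satisfy at least one Gift Guide rule."""
    styles = {str(villager["Style 1"]).lower(), str(villager["Style 2"]).lower()}

    # Rule 1: the music item's Name is the villager's Favorite Song.
    song_match = (giftable_df["source_catalog"] == "music") & (
        giftable_df["Name"] == villager["Favorite Song"]
    )

    # Remaining rules: any style-like column equals Style 1 or Style 2.
    style_match = pd.Series(False, index=giftable_df.index)
    for col in ["Style", "Label Themes", "HHA Concept 1", "HHA Concept 2"]:
        values = giftable_df[col].astype("string").str.lower()
        style_match |= values.isin(styles)

    return giftable_df[song_match | style_match]

result = gifts_for(villager, giftable_df)
print(result["Name"].tolist())
```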