<div id="qr" style="display:inline-block; align:left;vertical-align: middle;" >
  <h2>Google Trends</h2>
    <h4>Analysis of the trending search words on Google using pytrends</h4>
</div>
<div style="display:inline-block;vertical-align: middle;float:right;">
  <img style="float: right;" src='https://cdn.freebiesupply.com/logos/large/2x/gap-logo-png-transparent.png' width='100'>
</div>

The goal is to see what people are searching for on Google and to use that as a signal for current trends. The starting point is a set of product categories taken from the GAP website.

<h2> Table of Contents</h2>
<ol>
    <li><a href='#Import-Libraries'>Importing Libraries</a></li>
    <li><a href='#GAP-Product-Categories'>GAP Product Categories</a></li>
    <li><a href='#Pytrends'>Pytrends</a></li>
</ol>

## Import Libraries

In [None]:
# !pip install pytrends

In [2]:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import pandas as pd
from tqdm import tqdm
import time

## GAP Product Categories

<p>We first build a list of keywords by scraping the women's section of the GAP website catalogue as a sample.</p>

In [18]:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.gap.com/"
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

wait = WebDriverWait(driver, 10)

# Wait until the page has rendered before interacting with it
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.sitewide-13o7eu2')))

# Hover over the "Women" menu to open its category list
hover = driver.find_element(By.XPATH, "//a[@data-divisionname='Women']")
actions = ActionChains(driver)
actions.move_to_element(hover).perform()

In [19]:
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning

In [20]:
res = soup.find_all('a',attrs={'class':'catnav--item--link'})

In [21]:
categories = []
for i in tqdm(range(len(res))):
    categories.append(res[i].text)

100%|███████████████████████████████████████| 37/37 [00:00<00:00, 434703.78it/s]


In [22]:
# strip apostrophes and surrounding whitespace from each category name
categories = [c.replace("'", "").strip() for c in categories]
# remove irrelevant keywords
categories_to_remove = [
    'New Arrivals',
    'The CashSoft Shop',
    'The Vegan-Leather Shop',
    'The Fall Edit',
    'Editors Picks',
    'The Inspo Board',
    'The Work Shop',
    'GapBody',
    'Bras',
    'Underwear',
    'Matching Sets',
    'Sale',
    'The Petite Shop',
    'The Tall Shop',
    'Maternity',
    'Shop All Styles',
]
categories = [c for c in categories if c not in categories_to_remove]

In [23]:
categories

['Jeans',
 'Dresses',
 'Shirts & Tops',
 'T-Shirts & Tanks',
 'Pants',
 'Outerwear & Jackets',
 'Sweaters',
 'Sweatshirts & Sweatpants',
 'Shorts',
 'Swim',
 'Pajamas & Loungewear',
 'Leggings & Joggers',
 'Skirts',
 'GapFit Activewear',
 'GapFit Active Tops',
 'GapFit Leggings & Joggers',
 'GapFit Sports Bras',
 'Jewelry',
 'Shoes',
 'Beauty & Home',
 'Bags, Scarves, & More']

## Pytrends

<p>We use the categories above as seed keywords and look up their related top queries on Google Trends with pytrends.</p>

In [24]:
from pytrends.request import TrendReq

def get_related_queries(keyword):
    """Return the top related queries for a keyword as [query, score] rows."""
    pytrends = TrendReq(hl='en-US', tz=360)
    pytrends.build_payload([keyword], timeframe='today 5-y', geo='US')
    # Call related_queries() only once: each call sends a new request to Google
    top = pytrends.related_queries()[keyword]['top']
    if top is None:
        return []
    return list(top.values)


initial_keywords = categories

# Get related queries for each initial keyword
related_keywords = []
for keyword in initial_keywords:
    related_keywords.extend(get_related_queries(keyword))
    time.sleep(60)

# Collect the results in a DataFrame (duplicate keywords are not removed here)
df = pd.DataFrame(related_keywords, columns=['Keyword', 'Search Score'])
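
The loop above pauses with `time.sleep(60)` after every keyword, presumably to stay under Google's rate limits. A possible alternative is to retry a failed request with a growing delay. The sketch below is only an illustration: `call_with_backoff`, `flaky_fetch`, and the parameters `retries`, `base_delay` and `sleep` are hypothetical names, not part of pytrends, and the broad `except Exception` is an assumption made to keep the example independent of any specific error class.

```python
import time

def call_with_backoff(fetch, keyword, retries=3, base_delay=60, sleep=time.sleep):
    """Call fetch(keyword); on failure wait base_delay, 2*base_delay, ... and retry.

    `fetch` can be any callable that takes a keyword, for example the
    get_related_queries function defined in the notebook.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(keyword)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)

# Offline demonstration with a fake fetcher that fails twice, then succeeds
calls = []

def flaky_fetch(keyword):
    calls.append(keyword)
    if len(calls) < 3:
        raise RuntimeError("simulated rate limit")
    return [[f"black {keyword}", 100]]

waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_fetch, "jeans", base_delay=1, sleep=waits.append)
print(result, waits)
```

In the notebook this could be used as `call_with_backoff(get_related_queries, keyword)` inside the loop, keeping a short fixed pause between successful requests.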


In [25]:
df = pd.DataFrame(related_keywords, columns=['Keyword', 'Search Score'])
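
The DataFrame above keeps every returned row, so a query that is related to two seed keywords appears twice. A minimal sketch of one way to flatten the results, remember which seed each row came from, and drop repeated keywords is shown below. It assumes the result has the shape the notebook relies on (a dict keyed by keyword whose `'top'` entry is either `None` or a DataFrame with `query` and `value` columns). The function name `top_queries_to_frame`, the `Seed` column, and the mock data are illustrative only.

```python
import pandas as pd

def top_queries_to_frame(results):
    """Flatten {seed: {'top': DataFrame or None}} into one de-duplicated DataFrame."""
    frames = []
    for seed, tables in results.items():
        top = tables.get("top")
        if top is None or top.empty:
            continue  # no related queries for this seed
        frames.append(top.assign(Seed=seed))
    if not frames:
        return pd.DataFrame(columns=["Keyword", "Search Score", "Seed"])
    out = pd.concat(frames, ignore_index=True)
    out = out.rename(columns={"query": "Keyword", "value": "Search Score"})
    # Sort so that the highest-scoring row of each keyword is the one kept
    out = out.sort_values("Search Score", ascending=False)
    return out.drop_duplicates(subset="Keyword").reset_index(drop=True)

# Mock data standing in for real Google Trends responses
mock_results = {
    "Jeans": {"top": pd.DataFrame({"query": ["black jeans", "mens jeans"], "value": [100, 93]})},
    "Pants": {"top": pd.DataFrame({"query": ["black jeans", "cargo pants"], "value": [40, 100]})},
    "Swim": {"top": None},
}
tidy = top_queries_to_frame(mock_results)
print(tidy)
```

Keeping the highest score per keyword is one design choice among several; averaging the duplicates or keeping all rows with their seed would be equally defensible.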

### Next, we tag each keyword with the article, fabric and color it mentions so that each of these groups can be analysed separately

In [26]:
article=['jeans','pants','dresses','skirts','shirts','shorts','cargo']

fabric=['wool', 'cashmere', 'cotton', 'silk', 'denim', 'velvet', 'leather', 'satin', 'chiffon']

# first match wins, so the order of this list matters
colors=["black", "white", "gray", "navy", "brown", "beige", "red", "pink", "orange", "yellow", "green", "blue",
        "Purple", "Lavender", "Mauve", "Burgundy", "Maroon", "Teal", "Turquoise", "Mint Green", "Olive", "Khaki",
        "Coral", "Peach", "Salmon", "Aqua", "Sky Blue", "Sapphire", "Emerald", "Forest Green", "Lilac", "Violet",
        "Fuchsia", "Magenta", "Gold", "Silver", "Bronze", "Copper", "Rose Gold", "Charcoal", "Plum", "Royal Blue",
        "Mustard", "Ivory", "Cream", "Mint", "Champagne", "Indigo", "Pearl", "Lime Green", "Ruby", "Periwinkle",
        "Taupe", "Ballet Pink", "Crimson", "Sage Green", "Nude", "Cobalt Blue", "Tangerine", "Steel Gray",
        "Pistachio", "Rust", "Terracotta", "Denim Blue"]



In [27]:
df['Article'] = df['Keyword'].apply(lambda x: next((a for a in article if a.lower() in x.lower()), None))
df['Fabric'] = df['Keyword'].apply(lambda x: next((fab for fab in fabric if fab.lower() in x.lower()), None))
df['Color'] = df['Keyword'].apply(lambda x: next((color for color in colors if color.lower() in x.lower()), None))
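
The lambdas above use plain substring matching, so a keyword such as `featured` is tagged with the color `red` because the letters "red" occur inside it. A minimal sketch of a stricter alternative that matches whole words only, using regular-expression word boundaries, is given below. The helper name `first_whole_word_match` and the short `sample_colors` list are illustrative, not part of the notebook.

```python
import re

def first_whole_word_match(text, terms):
    """Return the first term that occurs in text as a whole word or phrase, else None."""
    lowered = text.lower()
    for term in terms:
        # \b anchors the term at word boundaries; re.escape guards special characters
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered):
            return term
    return None

sample_colors = ["black", "red", "Mint Green"]
print(first_whole_word_match("featured", sample_colors))          # None
print(first_whole_word_match("red dresses", sample_colors))       # red
print(first_whole_word_match("mint green skirt", sample_colors))  # Mint Green
```

It could be applied in the same way as the lambdas, for example `df['Keyword'].apply(lambda x: first_whole_word_match(x, colors))`. As with the original code, the first matching term in the list wins.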

### A score of 100 marks the most searched query; a score of 50 means the query is searched half as often as the top one
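
To make the relative scale concrete, the sketch below rescales a set of search counts so that the largest becomes 100. Google does not publish raw search volumes, so the counts here are made up purely for illustration, and `relative_scores` is a hypothetical helper rather than anything pytrends provides.

```python
def relative_scores(raw_counts):
    """Scale counts so the largest becomes 100 and the others are proportional."""
    peak = max(raw_counts.values())
    return {query: round(100 * count / peak) for query, count in raw_counts.items()}

# Hypothetical search counts, invented for this example
hypothetical_counts = {"black jeans": 2000, "mens jeans": 1860, "ripped jeans": 1000}
print(relative_scores(hypothetical_counts))
# {'black jeans': 100, 'mens jeans': 93, 'ripped jeans': 50}
```

Because the scale is relative to the top query of each seed keyword, a score of 50 under one seed does not necessarily represent the same search volume as a score of 50 under another.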

In [28]:
df

Unnamed: 0,Keyword,Search Score,Article,Fabric,Color
0,black jeans,100,jeans,,black
1,mens jeans,93,jeans,,
2,skinny jeans,81,jeans,,
3,men jeans,80,jeans,,
4,blue jeans,79,jeans,,blue
...,...,...,...,...,...
243,ulta,17,,,
244,home goods,16,,,
245,featured,12,,,red
246,sally beauty,9,,,


### We now take the mean score for each article, color and fabric to see which ones trend more than the others

In [35]:
df_jeans = df[df['Article']=='jeans']
df_jeans.to_csv('jeans_keywords.csv')

In [38]:
df_shorts = df[df['Article']=='shorts']
df_shorts.to_csv('shorts.csv')

In [39]:
df_dresses = df[df['Article']=='dresses']
df_dresses.to_csv('dresses.csv')

In [40]:
df_dresses

Unnamed: 0,Keyword,Search Score,Article,Fabric,Color
25,wedding dresses,100,dresses,,
27,prom dresses,54,dresses,,
28,women dresses,37,dresses,,
29,formal dresses,35,dresses,,
30,plus size dresses,35,dresses,,
31,bridesmaid dresses,31,dresses,,
32,dresses for women,30,dresses,,
33,long dresses,30,dresses,,
34,black dresses,28,dresses,,black
35,white dresses,26,dresses,,white


In [29]:
# Group by 'Article' and calculate the mean search score
article_df = df.groupby('Article')['Search Score'].mean().reset_index()

# Group by 'Fabric' and calculate the mean search score
fabric_df = df.groupby('Fabric')['Search Score'].mean().reset_index()

# Group by 'Color' and calculate the mean search score
color_df = df.groupby('Color')['Search Score'].mean().reset_index()
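
A group mean says little when it is based on only one or two keywords, so it can help to report the number of keywords next to each mean. The sketch below shows one way to do this with pandas named aggregation. The `sample` DataFrame is illustrative: the two jeans rows reuse scores shown in the notebook output, while the other rows are invented for the example.

```python
import pandas as pd

# Small illustrative frame with the same column names as the notebook's df
sample = pd.DataFrame({
    "Keyword": ["black jeans", "blue jeans", "leather pants", "cargo pants", "gold dresses"],
    "Search Score": [100, 79, 31, 60, 42],
    "Article": ["jeans", "jeans", "pants", "pants", "dresses"],
})

summary = (
    sample.groupby("Article")["Search Score"]
    .agg(mean_score="mean", n_keywords="count")  # mean and group size side by side
    .sort_values("mean_score", ascending=False)
    .reset_index()
)
print(summary)
```

The same pattern applies to the `Fabric` and `Color` columns by changing the column passed to `groupby`.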


In [30]:
article_df.sort_values('Search Score',ascending=False)

Unnamed: 0,Article,Search Score
1,jeans,52.318182
2,pants,43.391304
4,shorts,40.0
5,skirts,32.826087
0,dresses,27.708333
3,shirts,23.0


In [31]:
fabric_df.sort_values('Search Score',ascending=False)

Unnamed: 0,Fabric,Search Score
2,leather,31.0
1,denim,29.333333
0,cashmere,20.0
3,wool,15.0


In [32]:
color_df.sort_values('Search Score',ascending=False)

Unnamed: 0,Color,Search Score
2,black,54.428571
3,blue,53.5
0,Gold,42.5
8,white,40.0
1,Khaki,28.0
5,green,27.0
4,brown,21.0
7,red,20.6
6,navy,16.0


In [41]:
article_df.to_csv('article_mean_scores.csv')

In [43]:
color_df.to_csv('color_mean_scores.csv')

In [44]:
fabric_df.to_csv('fabric_mean_scores.csv')

<h3>In the sample data above, <i>black (colour), leather (fabric) and jeans (article)</i> have the highest mean search scores, which gives an indication of what people are searching for</h3>