### Prepping Data Challenge: What's Trendy? (week 36)
For this week's challenge, we wanted to use Google Trends to look back over the past couple of years and see what people were searching for. In particular, are the categories below still as popular now as they were in peak lockdown? How does the experience vary around the world? We'll be looking at:

- Pet adoption (who didn't want a furry work from home buddy?!)
- Online streamer (can one make money from playing video games?)
- Staycations (everyone's favourite word, right?)

### Requirements
- Input the data
- Calculate the overall average index for each search term
- Work out the earliest peak for each of these search terms
- For each year (1st September - 31st August), calculate the average index
- Classify each search term as either a Lockdown Fad or Still Trendy based on whether the average index has increased or decreased since last year
- Filter the countries so that only those with values for each search term remain
- For each search term, work out which country has the highest percentage
- Bring everything together into one dataset
- Output the data

I used part of <b>Arsene Xie</b>'s code.

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Input the data
with pd.ExcelFile('Wk36-Input.xlsx') as xl:
    tl = pd.read_excel(xl,'Timeline', header=2)\
        .melt(id_vars='Week', var_name='Search Term', value_name = 'index')
    cb = pd.read_excel(xl, 'Country Breakdown', header=2)\
        .melt(id_vars='Country', var_name='Search Term', value_name = 'index')

In [3]:
tl['Week'] = tl['Week'].dt.date
tl['Search Term'] = tl['Search Term'].str.replace('(:.*)','',regex = True)
cb['Search Term'] = cb['Search Term'].str.replace('(:.*)','',regex = True)
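
To show what the `(:.*)` pattern removes, here is a minimal sketch on made-up headers. The header text is an assumption in the style of a Google Trends export, not copied from the input file.

```python
import pandas as pd

# Hypothetical headers (assumed format, not taken from Wk36-Input.xlsx)
headers = pd.Series(['Pet adoption: (Worldwide)',
                     'Online streamer: (Worldwide)',
                     'Staycation: (Worldwide)'])

# Remove the first colon and everything after it, leaving only the search term
cleaned = headers.str.replace('(:.*)', '', regex=True)
print(cleaned.tolist())
```

If the real headers carry no colon, the pattern simply matches nothing and the names are left unchanged.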

In [4]:
cb.head()

Unnamed: 0,Country,Search Term,index
0,Hong Kong,Pet adoption,0.03
1,South Korea,Pet adoption,
2,Guernsey,Pet adoption,
3,Singapore,Pet adoption,0.11
4,Barbados,Pet adoption,


In [5]:
#Calculate the overall average index for each search term
tl['Average Index'] = round(tl['index'].groupby(tl['Search Term']).transform('mean'),1)

In [6]:
tl.head()

Unnamed: 0,Week,Search Term,index,Average Index
0,2016-09-04,Pet adoption,69,63.5
1,2016-09-11,Pet adoption,70,63.5
2,2016-09-18,Pet adoption,64,63.5
3,2016-09-25,Pet adoption,64,63.5
4,2016-10-02,Pet adoption,63,63.5


In [7]:
#Work out the earliest peak for each of these search terms
tl['peak index'] = tl['index'].groupby(tl['Search Term']).transform('max')
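#Rows at the peak keep their own week; all other rows get the latest week as a placeholder,
#so the group minimum below returns the earliest week on which the peak was reached
tl['peak index week'] = tl.apply(lambda x: x['Week'] if x['index']==x['peak index'] else max(tl['Week']), axis=1)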
tl['earliest peak'] = tl['peak index week'].groupby(tl['Search Term']).transform('min')
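
The earliest-peak logic can also be sketched by keeping only the peak rows and taking the minimum week per term. This is an alternative illustration on made-up data (the terms `A` and `B` and their values are hypothetical), not the approach used in the cell above.

```python
import pandas as pd

# Toy timeline: term A peaks at 50 twice, term B peaks at 7 twice
toy = pd.DataFrame({
    'Week': pd.to_datetime(['2020-01-05', '2020-01-12', '2020-01-19'] * 2),
    'Search Term': ['A', 'A', 'A', 'B', 'B', 'B'],
    'index': [10, 50, 50, 7, 3, 7],
})

# Peak value per search term, broadcast back to every row
peak = toy.groupby('Search Term')['index'].transform('max')

# Keep only the rows at the peak, then take the earliest week for each term
earliest_peak = toy[toy['index'] == peak].groupby('Search Term')['Week'].min()
print(earliest_peak)
```

When a term reaches its peak more than once, the first of those weeks is returned.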

In [8]:
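#For each year (1st September - 31st August), calculate the average index
#Label each week with the year in which its Sept-Aug period ends, then keep only the two most recent periods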
tl['year'] = tl['Week'].apply(lambda x: x.year+1 if x.month>=9 else x.year)
tl = tl[tl['year']>=max(tl['year'])-1].copy()

In [9]:
#Build a label for each Sept-Aug year (e.g. '2020/21 avg index'); these labels become the columns
#that are later compared to classify each search term as a Lockdown Fad or Still Trendy
tl['YearMeasure'] = tl['year'].apply(lambda x: f'{str(x-1)}/{str(x)[2:]} avg index')
tl = tl.drop(['Week', 'peak index week', 'year'], axis=1)
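
The year labelling in the two cells above can be checked on a pair of dates either side of the 1st September boundary. The helper names `reporting_year` and `year_label` are hypothetical and used only for this sketch.

```python
from datetime import date

def reporting_year(d):
    # Weeks from September onwards belong to the period ending the following August
    return d.year + 1 if d.month >= 9 else d.year

def year_label(year):
    # e.g. 2021 -> '2020/21 avg index'
    return f'{year - 1}/{str(year)[2:]} avg index'

for d in (date(2020, 8, 30), date(2020, 9, 6)):
    print(d, '->', year_label(reporting_year(d)))
```

A week in late August 2020 lands in the 2019/20 period, while the first week of September 2020 lands in 2020/21.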

In [10]:
#Filter the countries so that only those with values for each search term remain
#Note: that filter is not applied in this cell; for each search term the country with the highest
#value is picked from all rows (idxmax skips missing values)
cb = cb.loc[cb.groupby('Search Term')['index'].idxmax(), ['Search Term','Country']]
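
The requirement to keep only countries with a value for every search term is not applied in the cell above, which picks the top country from all rows. Below is a minimal sketch of one way that filter could be added, using made-up countries and values. Applying it to the real data may change which country comes out on top, so treat it as an illustration rather than as part of the solution shown here.

```python
import numpy as np
import pandas as pd

# Toy country breakdown: country Y has no value for term A
toy_cb = pd.DataFrame({
    'Country': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
    'Search Term': ['A', 'A', 'A', 'B', 'B', 'B'],
    'index': [0.03, np.nan, 0.11, 0.20, 0.50, 0.05],
})

# True on every row of a country only when all of that country's values are present
complete = toy_cb['index'].notna().groupby(toy_cb['Country']).transform('all')
filtered = toy_cb[complete]

# Country with the highest value for each search term, among complete countries only
top = filtered.loc[filtered.groupby('Search Term')['index'].idxmax(),
                   ['Search Term', 'Country']]
print(top)
```

In this toy data, country Y has the highest value for term B but is dropped because it has no value for term A, so X is reported instead.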

In [11]:
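#Calculate the average index for each Sept-Aug year, then classify each search term as Still Trendy
#when the 2020/21 average is at least the 2019/20 average, and as a Lockdown Fad otherwise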
output = tl.pivot_table(index=[c for c in tl.columns if c not in ['index','YearMeasure']], 
                             columns='YearMeasure', values='index', aggfunc='mean').reset_index()
output['Status'] = output.apply(lambda x: 'Still Trendy' if x['2020/21 avg index']>=x['2019/20 avg index'] else 'Lockdown Fad', axis=1)
output['2020/21 avg index'] = round(output['2020/21 avg index'],1)
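
The pivot-and-classify step can be sketched on a small made-up table. The values are hypothetical, and `np.where` is swapped in for the row-wise `apply` purely for brevity.

```python
import numpy as np
import pandas as pd

# Toy data: two weekly values per term per Sept-Aug year
toy = pd.DataFrame({
    'Search Term': ['A'] * 4 + ['B'] * 4,
    'YearMeasure': ['2019/20 avg index', '2019/20 avg index',
                    '2020/21 avg index', '2020/21 avg index'] * 2,
    'index': [40, 60, 20, 30, 10, 20, 30, 50],
})

# One row per search term, one column per year label, holding the mean index
yearly = toy.pivot_table(index='Search Term', columns='YearMeasure',
                         values='index', aggfunc='mean').reset_index()

# Still Trendy when the latest year's average has not fallen below the previous year's
yearly['Status'] = np.where(
    yearly['2020/21 avg index'] >= yearly['2019/20 avg index'],
    'Still Trendy', 'Lockdown Fad')
print(yearly)
```

Term A drops from an average of 50 to 25 and is labelled a Lockdown Fad, while term B rises from 15 to 40 and is labelled Still Trendy.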

In [12]:
#Bring everything together into one dataset
output = output.merge(cb, on='Search Term').rename(columns={'Country':'Country with highest percentage'})

In [13]:
output = output[['Search Term', 'Status', '2020/21 avg index', 'Average Index', 'peak index', 'earliest peak', 'Country with highest percentage']]

In [14]:
output.head(10)

Unnamed: 0,Search Term,Status,2020/21 avg index,Average Index,peak index,earliest peak,Country with highest percentage
0,Online streamer,Still Trendy,53.1,29.8,84,2020-11-29,Slovakia
1,Pet adoption,Lockdown Fad,66.5,63.5,100,2020-04-12,Australia
2,Staycation,Still Trendy,34.8,14.2,44,2020-07-19,Guernsey


In [15]:
#Output the data
output.to_csv('wk36-output.csv', index=False)