# Google Trends in AQL
In this notebook, we inspect the most prominent AQL queries within given time windows and compare them to the most frequent Google queries in the same time windows.
First, we look at the most frequent queries per year. We examine the years from 1999 up to 2022. The Google Trends data only covers 2004 to 2022, so the comparison is restricted to those years.

Let's load the annual top 25 queries from Google:

In [17]:
import pandas as pd

annual_google_trends = pd.read_csv("/mnt/ceph/storage/data-in-progress/data-teaching/theses/thesis-schneg/google_trends/google_trends_total.csv")

print(annual_google_trends['year'].unique())
print(annual_google_trends.columns)

[2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009
 2008 2007 2006 2005 2004]
Index(['query', 'score', 'year'], dtype='object')
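
Before comparing, it can help to check how many queries each year actually contains and whether the scores top out where expected. The following is a minimal sketch on a hypothetical toy table with the same columns (`query`, `score`, `year`); the helper name `summarize_years` and the toy values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical stand-in for the Google Trends table (same columns: query, score, year).
toy_trends = pd.DataFrame({
    "query": ["google", "youtube", "facebook", "weather", "news"],
    "score": [100, 80, 60, 100, 40],
    "year": [2012, 2012, 2012, 2013, 2013],
})


def summarize_years(trends: pd.DataFrame) -> pd.DataFrame:
    """Return one row per year with the number of queries and the maximum score."""
    return (
        trends.groupby("year")
        .agg(n_queries=("query", "size"), max_score=("score", "max"))
        .sort_index()
    )


summary = summarize_years(toy_trends)
print(summary)
```

Applied to `annual_google_trends`, a year with fewer than 25 rows would show up directly in the `n_queries` column.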


Now, we load the annual top 25 queries, the annual top 25 English queries, and the annual top 25 Google queries from the AQL:

In [49]:
annual_aql_trends = pd.read_parquet("/mnt/ceph/storage/data-in-progress/data-teaching/theses/thesis-schneg/analysis_data/analysis/aql-get-annual-top-queries-special")
annual_aql_trends_eng = pd.read_parquet("/mnt/ceph/storage/data-in-progress/data-teaching/theses/thesis-schneg/analysis_data/analysis/aql-get-annual-top-queries-english")
annual_aql_trends_google = pd.read_parquet("/mnt/ceph/storage/data-in-progress/data-teaching/theses/thesis-schneg/analysis_data/analysis/aql-get-annual-top-queries-google")


# Harmonize the column names with the Google Trends data
column_names = {'serp_query_text_url': 'query', 'count()': 'score'}
annual_aql_trends.rename(columns=column_names, inplace=True)
annual_aql_trends_eng.rename(columns=column_names, inplace=True)
annual_aql_trends_google.rename(columns=column_names, inplace=True)
# Total number of query occurrences covered by the annual top queries of the full AQL
print(annual_aql_trends['score'].sum())
aql_top_queries = {
    'aql_google': annual_aql_trends_google,
    'aql_english': annual_aql_trends_eng,
    'aql_native': annual_aql_trends,
}


83543021


Let's find out if we have intersections in the annual top 25 of both query logs:

In [50]:

# Transform counts into scores: the most frequent query of each year gets a score of 100,
# and all other queries of that year are scaled relative to it.
REFERENCE_TOTAL = 346310968  # hard-coded total used to express the counts as a percentage
sizes = {}
for key, data in aql_top_queries.items():
    print(f"AQL {key.upper()}:")
    # Sort the years so that the printed results are in chronological order
    years = sorted(data['year'].unique())
    # Record the raw total count and its share of the reference total before rescaling
    sizes[f"{key}_count"] = [data['score'].sum(), 100 * data['score'].sum() / REFERENCE_TOTAL]
    for year in years:
        in_year = data['year'] == year
        max_count = data.loc[in_year, 'score'].max()
        data.loc[in_year, 'score'] = round(data.loc[in_year, 'score'] / max_count * 100)

    for year in years:
        # The Google Trends data starts in 2004
        if year >= 2004:
            aql_queries = set(data[data['year'] == year]['query'])
            google_queries = set(annual_google_trends[annual_google_trends['year'] == year]['query'])
            # Queries that appear in both top lists of this year
            matches = google_queries.intersection(aql_queries)
            print(f"{year}: {matches}")

for key, value in sizes.items():
    print(f"{key}: {value}")


AQL AQL_GOOGLE:
2004: set()
2005: set()
2006: set()
2007: set()
2008: set()
2009: set()
2010: set()
2011: set()
2012: {'youtube', 'facebook', 'google'}
2013: {'youtube', 'yahoo', 'facebook', 'google'}
2014: set()
2015: set()
2016: set()
2017: set()
2018: set()
2019: set()
2020: set()
2021: set()
2022: set()
AQL AQL_ENGLISH:
2004: set()
2005: set()
2006: set()
2007: set()
2008: set()
2009: set()
2010: set()
2011: set()
2012: set()
2013: set()
2014: set()
2015: set()
2016: set()
2017: set()
2018: set()
2019: set()
2020: set()
2021: set()
2022: set()
AQL AQL_NATIVE:
2004: {'free'}
2005: set()
2006: set()
2007: {'video'}
2008: set()
2009: set()
2010: set()
2011: {'you'}
2012: {'google'}
2013: set()
2014: set()
2015: set()
2016: set()
2017: set()
2018: set()
2019: set()
2020: set()
2021: set()
2022: set()
aql_google_count: [16596750, 4.792441341332279]
aql_english_count: [2412914, 0.6967477853603528]
aql_native_count: [83543021, 24.123700581149368]
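
The per-year rescaling used in the cell above can also be written without an explicit loop. This is a minimal sketch, assuming a table with `score` and `year` columns; the helper name `normalize_scores` and the toy data are hypothetical. Unlike the loop above, it returns a copy and leaves the input untouched.

```python
import pandas as pd


def normalize_scores(counts: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which each year's most frequent query has a score of 100."""
    result = counts.copy()
    # Broadcast each year's maximum count back onto the rows of that year.
    yearly_max = result.groupby("year")["score"].transform("max")
    result["score"] = (result["score"] / yearly_max * 100).round()
    return result


# Hypothetical toy counts for two years.
toy_counts = pd.DataFrame({
    "query": ["a", "b", "c", "d"],
    "score": [200, 50, 30, 15],
    "year": [2004, 2004, 2005, 2005],
})
normalized = normalize_scores(toy_counts)
print(normalized)
```

Working on a copy keeps the raw counts available, which matters if the cell is run more than once.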


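As we can see, the annual top 25 queries of Google and the AQL rarely intersect. The `aql_google` subset only matches in 2012 and 2013 (on 'google', 'youtube', 'facebook', and 'yahoo'), the `aql_english` subset never matches, and `aql_native` shares a single query with Google in each of 2004, 2007, 2011, and 2012.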
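
To put a number on how small the overlap is, one option is to compute the shared-query count and the Jaccard similarity per year. The sketch below is an illustration only: the helper name `yearly_overlap` and the toy tables are hypothetical, and lowercasing and stripping the queries before comparison is an assumption that the original exact-match comparison does not make.

```python
import pandas as pd


def yearly_overlap(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Per year, count shared queries and compute the Jaccard similarity."""
    rows = []
    for year in sorted(set(left["year"]) & set(right["year"])):
        # Assumption: compare queries case-insensitively and without surrounding whitespace.
        left_queries = set(left.loc[left["year"] == year, "query"].str.lower().str.strip())
        right_queries = set(right.loc[right["year"] == year, "query"].str.lower().str.strip())
        shared = left_queries & right_queries
        union = left_queries | right_queries
        rows.append({"year": year, "shared": len(shared), "jaccard": len(shared) / len(union)})
    return pd.DataFrame(rows)


# Hypothetical toy top lists for two sources.
toy_aql = pd.DataFrame({
    "query": ["Google", "youtube", "maps", "yahoo"],
    "year": [2012, 2012, 2012, 2013],
})
toy_google = pd.DataFrame({
    "query": ["google ", "facebook", "youtube", "weather"],
    "year": [2012, 2012, 2012, 2013],
})
overlap = yearly_overlap(toy_aql, toy_google)
print(overlap)
```

A Jaccard value of 0 means the two top lists of a year share no query, and 1 means they are identical.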