# Googlesearch

googlesearch is a Python library for searching Google, easily. googlesearch uses requests and BeautifulSoup4 to scrape Google. Please read the following website for more information.

https://github.com/Nv7-GitHub/googlesearch


## Step 1: Install googlesearch package

We need to run the following command in the terminal for the **first time use**.

pip install googlesearch-python

## Step 2: Perform google search by providing query

In [1]:
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from googleapiclient.discovery import build

In [2]:
from googleapiclient.discovery import build

# Define the Google Custom Search function
def google_search(query, api_key, cse_id, num_results=100):
    service = build("customsearch", "v1", developerKey=api_key)
    results = []
    start_index = 1

    while start_index < num_results:
        response = service.cse().list(
            q=query,
            cx=cse_id,
            start=start_index,
            num=min(num_results - len(results), 10)
        ).execute()

        # Add all items from the response to the results
        if "items" in response:
            results.extend(response['items'])
        
        # Update the start index for the next page of results
        start_index += 10

        # Break if no more results are available
        if "items" not in response:
            break

    return results

# Parameters
query = "Winter snowstorm"

# Perform the search
raw_results = google_search(query, API_KEY, CSE_ID, num_results=100)

# Print raw results (optional, to debug structure)
print(raw_results[:2])  # Show first 2 raw results

[{'kind': 'customsearch#result', 'title': 'Winter Snowstorm | Loose Watercolor Painting by Sarah Cray ...', 'htmlTitle': '<b>Winter Snowstorm</b> | Loose Watercolor Painting by Sarah Cray ...', 'link': 'https://www.youtube.com/watch?v=K-lltP6ArUQ', 'displayLink': 'www.youtube.com', 'snippet': 'Feb 11, 2024 ... For additional supplies, visit ➝ http://letsmakeart.com The lesson to be learned from this landscape painting is to fall in love with the\xa0...', 'htmlSnippet': 'Feb 11, 2024 <b>...</b> For additional supplies, visit ➝ http://letsmakeart.com The lesson to be learned from this landscape painting is to fall in love with the&nbsp;...', 'formattedUrl': 'https://www.youtube.com/watch?v=K-lltP6ArUQ', 'htmlFormattedUrl': 'https://www.youtube.com/watch?v=K-lltP6ArUQ', 'pagemap': {'cse_thumbnail': [{'src': 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSuU0difHbIM3gMPpg7A9TG38yzkAnKFvW-3lx4CxvxNC6ChwcbzCnb_uq1&s', 'width': '299', 'height': '168'}], 'metatags': [{'apple-itunes-app

## Step 3 Convert SearchResult objects to a list of strings

We can efficiently perform text mining on strings instead of SearchResult.

In [3]:
results_list = []
for item in raw_results:
    results_list.append({
        "URL": item.get("link", "N/A"),
        "Title": item.get("title", "N/A"),
        "Description": item.get("snippet", "N/A") 
    })


print(results_list[:3])

[{'URL': 'https://www.youtube.com/watch?v=K-lltP6ArUQ', 'Title': 'Winter Snowstorm | Loose Watercolor Painting by Sarah Cray ...', 'Description': 'Feb 11, 2024 ... For additional supplies, visit ➝ http://letsmakeart.com The lesson to be learned from this landscape painting is to fall in love with the\xa0...'}, {'URL': 'https://www.reddit.com/r/SkyrimModsXbox/comments/19dbmtd/wintersnowstorm_mod_help/', 'Title': 'Winter/snowstorm Mod help : r/SkyrimModsXbox', 'Description': "Jan 23, 2024 ... Climates of Tamriel - WE converts most of Skyrim's landscape into snow and adjusts the weather to be colder and snowier, which works nicely with\xa0..."}, {'URL': 'https://www.youtube.com/watch?v=fNs7xYkCB7Y', 'Title': 'CAMPING in a BLIZZARD - Winter Snowstorm - The Calm Before the ...', 'Description': 'May 31, 2024 ... Brutal tent camping in a snowstorm with tent and tarp. Join our channel here... https://www.youtube.com/@AbelandVictoria/join Alton 3x3 Tarp\xa0...'}]


## Step 4 Convert a list of strings to a data frame

Sometime, we may need to store information in a data frame instead of a list to build some machine learning models.

In [4]:
import pandas as pd

# Convert the list of dictionaries to a pandas DataFrame
df = pd.DataFrame(results_list)

# Print the DataFrame
print(df)

# Save the DataFrame to a CSV file (optional)
df.to_csv("google_search_results.csv", index=False)

                                                  URL  \
0         https://www.youtube.com/watch?v=K-lltP6ArUQ   
1   https://www.reddit.com/r/SkyrimModsXbox/commen...   
2         https://www.youtube.com/watch?v=fNs7xYkCB7Y   
3   https://charlottegibbblog.com/photography/land...   
4         https://www.youtube.com/watch?v=pIOkZaD0opA   
..                                                ...   
74  http://www.soest.hawaii.edu/MET/Faculty/bwang/...   
75                 https://www.lsce.ipsl.fr/en/513-2/   
76  https://www.researchgate.net/publication/33026...   
77          https://m.youtube.com/watch?v=-JOwgo365EQ   
78  https://www.kktv.com/video/2025/01/07/early-mo...   

                                                Title  \
0   Winter Snowstorm | Loose Watercolor Painting b...   
1        Winter/snowstorm Mod help : r/SkyrimModsXbox   
2   CAMPING in a BLIZZARD - Winter Snowstorm - The...   
3     A Winter Snowstorm in Yosemite - Charlotte Gibb   
4   Winter Snowstorm Watercolo

## Step 5 Remove the hyperlink URL in the search_result using a regular expression.

In [5]:
# Concatenate URL, Title, and Description into a single column
df['search_result'] = df['URL'] + " " + df['Title'] + " " + df['Description']

# Remove the hyperlink URL using a regular expression
df['search_result'] = df['search_result'].str.replace(r'http\S+', '', regex=True)

## Step 6 Remove all words containing at most two characters such as "a", "an", "in", "on", "etc".

In [6]:
# Remove all words with at most two characters
df['search_result'] = df['search_result'].apply(
    lambda x: ' '.join([word for word in x.split() if len(word) > 2])
)

## Step 7 Remove the following five stop words: "are", "but", "very", "since", "could" using regular expression.

In [7]:
# Define stop words to remove
stop_words = {'are', 'but', 'very', 'since', 'could'}

# Remove specific stop words
df['search_result'] = df['search_result'].apply(
    lambda x: ' '.join([word for word in x.split() if word.lower() not in stop_words])
)

## Step 8 Remove all special characters, punctuation using a regular expression.

In [8]:
# Remove special characters and punctuation
df['search_result'] = df['search_result'].str.replace(r'[^\w\s]', '', regex=True)

## Final Result

In [9]:
print(df['search_result'])

0     Winter Snowstorm Loose Watercolor Painting Sar...
1     Wintersnowstorm Mod help rSkyrimModsXbox Jan 2...
2     CAMPING BLIZZARD Winter Snowstorm The Calm Bef...
3     Winter Snowstorm Yosemite Charlotte Gibb Mar 1...
4     Winter Snowstorm Watercolor Art Tutorial YouTu...
                            ...                        
74    MJO Modulation 200910 Winter Snowstorms the Un...
75    LSCE  winter snowstorms that  2019 ESD follow ...
76    EFFECT INCREASING NUMBER ROAD CLOSE DUE  Oct 2...
77    4ft FRESH SNOW WINTER SNOWSTORM the MOUNTAIN  ...
78    Early morning look the roads during Southern C...
Name: search_result, Length: 79, dtype: object
