# Automate Market Research


## Overview

The Automate Market Research agent automates the collection and organization of AI-related news articles. It performs two main functions:

1.   Collecting relevant news articles using the **[News API](https://newsapi.org/)**
2.   Organizing the content into Google Docs for easy access and analysis


## Prerequisites

Before using the Automate Market Research agent, you'll need:

*   News API key

##### **Google Cloud credentials with access to**:
*   Google Sheets API

If you don't know how to get your Google Cloud credentials, just ask any LLM the following prompt:


```
I am new to Google Cloud Console and haven't built a project yet. Could you please walk me through the step-by-step process to
download the credentials file and activate the Google Sheets API?
```



##### **Python environment with the following packages**:
*   newsapi-python
*   google-auth
*   google-api-python-client
*   gspread
*   requests
*   beautifulsoup4


## Step-by-Step Guide


### **1. Setting Up Your Environment**

First, configure your environment variables in Google Colab's "Secrets" section:

*   **news_api**: Your NewsAPI key  
*   **sheet_id**: ID of your Google Sheet
*   **email_address**: Email for sharing documents  
*   **folder_name**: Name for the research folder  
*   **starting_date**: Start date for news collection  
*   **ending_date**: End date for news collection  
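
A minimal sketch of how these secrets might be read and checked before any API call is made. The `load_config` helper and its `get_secret` parameter are hypothetical names, not part of the notebook. In Colab you would pass `userdata.get`; here a plain dictionary with placeholder values stands in so the sketch runs anywhere. It assumes the two dates are stored as `YYYY-MM-DD` strings.

```python
from datetime import date

REQUIRED_SECRETS = ["news_api", "sheet_id", "email_address",
                    "folder_name", "starting_date", "ending_date"]

def load_config(get_secret):
    """Read every required secret and fail early if one is missing or malformed."""
    config = {name: get_secret(name) for name in REQUIRED_SECRETS}
    missing = [name for name, value in config.items() if not value]
    if missing:
        raise ValueError(f"Missing secrets: {', '.join(missing)}")
    # Assumption: dates are stored as YYYY-MM-DD strings
    start = date.fromisoformat(config["starting_date"])
    end = date.fromisoformat(config["ending_date"])
    if start > end:
        raise ValueError("starting_date must not be later than ending_date")
    return config

# Stand-in for Colab's userdata.get, filled with placeholder values
example_secrets = {
    "news_api": "news-api-key-placeholder",
    "sheet_id": "sheet-id-placeholder",
    "email_address": "you@example.com",
    "folder_name": "Market Research",
    "starting_date": "2024-03-01",
    "ending_date": "2024-03-15",
}
config = load_config(example_secrets.get)
```

Checking everything up front means a missing key or a reversed date range surfaces immediately instead of partway through a run.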

Next, install the required Python packages:

In [None]:
%pip install newsapi-python google-auth google-api-python-client gspread beautifulsoup4 requests

#### Import modules

In [None]:
import gspread
from google.oauth2.service_account import Credentials
from newsapi import NewsApiClient
import json

from bs4 import BeautifulSoup
import requests
from googleapiclient.discovery import build
from datetime import datetime

import time

from google.colab import userdata
import os

#### Define the scope and authorize the Google credentials

In [None]:
# Define the scope and authorize the credentials
SCOPES = ["https://www.googleapis.com/auth/spreadsheets", "https://www.googleapis.com/auth/drive"]
credentials_path = 'yourcredentialpath.json'
creds = Credentials.from_service_account_file(credentials_path, scopes=SCOPES)

In [None]:
# Initialize gspread client
client = gspread.authorize(creds)

### **2. Collecting News Articles**

Think of this phase as having a team of researchers who search through news websites for you, but automated. Here's what happens:

#### 1. Setting Your Search Topics
- The script starts with a list of topics you want to research
- For example, topics could be:
  * Apple Inc.
  * Google and Search
  * Meta Ray-Ban
  * Amazon
  * Netflix

#### 2. Defining Your Search Timeline
- You specify two dates:
  * A starting date: When you want to begin looking for news
  * An ending date: The latest date for news articles
- This keeps the search focused on recent, relevant information. If you're on the free tier of NewsAPI, check your plan's limits on how far back articles can be searched.
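
As an illustration, the two dates could be computed rather than typed by hand. The `date_window` helper below is a hypothetical name, not part of the notebook. It returns `YYYY-MM-DD` strings, an ISO 8601 form that NewsAPI accepts for its date parameters.

```python
from datetime import date, timedelta

def date_window(days_back, today=None):
    """Return (starting_date, ending_date) as YYYY-MM-DD strings."""
    end = today or date.today()
    start = end - timedelta(days=days_back)
    return start.isoformat(), end.isoformat()

# A fixed "today" keeps the example reproducible
starting_date, ending_date = date_window(7, today=date(2024, 3, 15))
print(starting_date, ending_date)  # 2024-03-08 2024-03-15
```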

#### 3. The Search Process
- For each topic (like "AI and marketing"):
  * The system connects to NewsAPI (think of it as a huge digital newspaper archive)
  * It looks for English-language articles about that topic
  * It organizes articles by relevance
  * It can search through multiple pages of results (currently set to 5 pages)
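
The paging step above can be sketched independently of the API. `collect_pages` and `fake_fetch` are hypothetical names used only for illustration. The sketch assumes each response is a dictionary with an `articles` list, which is the shape the rest of this notebook relies on.

```python
def collect_pages(fetch_page, max_pages):
    """Call fetch_page(1), fetch_page(2), ... and merge the 'articles' lists."""
    articles = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page).get("articles", [])
        if not batch:
            break  # an empty page means there is nothing further to fetch
        articles.extend(batch)
    return articles

# Stub that imitates a paged response: two pages with data, then empty pages
def fake_fetch(page):
    data = {1: [{"title": "A"}, {"title": "B"}], 2: [{"title": "C"}]}
    return {"articles": data.get(page, [])}

results = collect_pages(fake_fetch, max_pages=5)
print(len(results))  # 3
```

With the real client, `fetch_page` could be something like `lambda p: newsapi.get_everything(q=keyword, language='en', sort_by='relevancy', page=p)`.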

#### 4. Organizing the Results
- For each topic, the system creates a separate sheet in your Google Spreadsheet
- Each article entry includes:
  * The article's title
  * Author's name
  * Source (which news website it's from)
  * Publication date
  * A brief description
  * The article's URL
  * A snippet of the content
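
To make the row layout concrete, here is a small sketch that flattens one article record into a spreadsheet row. The `article_to_row` helper is a hypothetical name and the sample record is made up for illustration. The sketch assumes that fields may be present but empty (`None`), so it falls back to a placeholder whenever a value is falsy.

```python
HEADER = ["Title", "Author", "Source", "Published At", "Description", "URL", "Content"]

def article_to_row(article):
    """Flatten one article record into a list that lines up with HEADER."""
    source = article.get("source") or {}
    return [
        article.get("title") or "No Title",
        article.get("author") or "No Author",
        source.get("name") or "No Source",
        article.get("publishedAt") or "",
        article.get("description") or "No Description",
        article.get("url") or "No URL",
        article.get("content") or "No Content",
    ]

# Made-up record for illustration only
sample = {
    "source": {"id": None, "name": "Example News"},
    "author": None,
    "title": "Example headline",
    "description": "Short summary.",
    "url": "https://example.com/story",
    "publishedAt": "2024-03-15T09:00:00Z",
    "content": None,
}
row = article_to_row(sample)
print(row[1], "|", row[2])  # No Author | Example News
```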

Now let's start!

#### a. Connect NewsAPI

In [None]:
# Init
newsapi = NewsApiClient(api_key = userdata.get('news_api'))

#### b. Build Functions

##### Export to google sheet

In [None]:
def export_to_google_sheet(all_articles, sheet_id, sheet_name):
    # Open the Google Sheet and add a new worksheet or access an existing one
    sheet = client.open_by_key(sheet_id)
    try:
        worksheet = sheet.worksheet(sheet_name)
        worksheet.clear()
    except gspread.exceptions.WorksheetNotFound:
        worksheet = sheet.add_worksheet(title=sheet_name, rows="100", cols="20")

    # Prepare the header
    header = ["Title", "Author", "Source", "Published At", "Description", "URL", "Content"]
    worksheet.append_row(header)

    # Prepare the data rows for batch update
    rows = []
    for article in all_articles['articles']:
        # Fields can be present but set to None, so fall back with `or`
        # rather than relying on the default argument of dict.get
        title = article.get('title') or 'No Title'
        author = article.get('author') or 'No Author'
        source = article['source']['name']
        published_at = article.get('publishedAt')
        description = article.get('description') or 'No Description'
        url = article.get('url') or 'No URL'
        content = article.get('content') or 'No Content'

        rows.append([title, author, source, published_at, description, url, content])

    # Perform batch update
    if rows:
        worksheet.append_rows(rows)

    print(f"Articles exported successfully to the sheet: https://docs.google.com/spreadsheets/d/{sheet_id}/edit#gid={worksheet.id}")


#### c. Query News API

You can adjust the parameters to build more advanced queries as needed. See [the official documentation](https://newsapi.org/docs/client-libraries/python) for the available options.

In [None]:
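# Function to query NewsAPI for a given keyword
def query_news_api(keyword, starting_date, ending_date, pages):
    # Fetch pages 1..pages and merge them; passing page=pages on its own
    # would return only that single page of results
    articles = []
    for page in range(1, pages + 1):
        response = newsapi.get_everything(q=keyword,
                                          sources=None,
                                          domains=None,
                                          from_param=starting_date,
                                          to=ending_date,
                                          language='en',
                                          sort_by='relevancy',
                                          page=page)
        batch = response.get('articles', [])
        if not batch:
            break  # No more results for this keyword
        articles.extend(batch)
    # Keep the same shape that export_to_google_sheet expects
    return {'articles': articles}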

#### d. Process multiple keywords

In [None]:
def process_keywords(keywords, starting_date, ending_date, pages, sheet_id):
    for keyword in keywords:
        print(f"Processing keyword: {keyword}")
        # Query the News API for the current keyword
        all_articles = query_news_api(keyword, starting_date, ending_date, pages)

        # Use the keyword directly as the sheet name
        sheet_name = keyword  # Assuming the tab in the Google Sheet has the exact same name as the keyword

        # Export the articles to the specific sheet/tab named after the keyword
        export_to_google_sheet(all_articles, sheet_id, sheet_name)
        print(f"Finished processing for keyword: {keyword}")
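
In the loop above, an error on one keyword stops the whole run. One possible refinement, sketched here with hypothetical names (`run_for_each`, `demo_handler`), is to record failures per keyword and carry on with the rest. This is an optional pattern, not something the notebook does.

```python
def run_for_each(keywords, handler):
    """Call handler(keyword) for every keyword; record failures instead of stopping."""
    failures = {}
    for keyword in keywords:
        try:
            handler(keyword)
        except Exception as exc:  # broad on purpose: one failure should not end the run
            failures[keyword] = str(exc)
    return failures

# Stand-in handler that fails for one keyword
def demo_handler(keyword):
    if keyword == "Keyword B":
        raise RuntimeError("simulated API error")

failures = run_for_each(["Keyword A", "Keyword B", "Keyword C"], demo_handler)
print(failures)  # {'Keyword B': 'simulated API error'}
```

In the notebook, the handler could wrap the query and export steps for a single keyword.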

#### e. Run Main Function

In [None]:
if __name__ == "__main__":
    keywords = ["Keyword A","Keyword B", "Keyword C", "Keyword D", "Keyword E"]
    starting_date = userdata.get('starting_date')
    ending_date = userdata.get('ending_date')
    pages = 5
    sheet_id = userdata.get('sheet_id')

    process_keywords(keywords, starting_date, ending_date, pages, sheet_id)