# 📰 Investopedia

## 📌 Instructions

1. Enter your **search term** by changing the `q` parameter inside `params` (e.g., `"economy"`, `"sports"`, `"technology"`).  
2. Define the **offset values** to control pagination:  
   - Each page shows 24 results.  
   - Example: `[0, 24, 48]` will scrape the first 3 pages.  
   - Increase the list (e.g., `[0, 24, 48, 72, 96]`) to scrape more pages.  
3. The script retrieves:  
   - Title  
   - Publication date (articles without a "Published" date are skipped)  
   - Link  
   - Full article content  
4. The results are stored in a **pandas DataFrame** and can be exported to CSV:

```python
inv_df.to_csv("data_investopedia_df.csv", index=False)
```

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

base_url = "https://www.investopedia.com/search"
headers = {"User-Agent": "Mozilla/5.0"}

all_articles = []

offsets = [0, 24, 48]

for offset in offsets:
    params = {
        "q": "technology",  # change to "economy", "sports", "technology", etc.
        "offset": offset
    }

    response = requests.get(base_url, headers=headers, params=params, timeout=10)
    response.raise_for_status()  # stop if a search page fails to load
    soup = BeautifulSoup(response.text, "html.parser")

    article_cards = soup.find_all("a", class_="mntl-card-list-card--extendable")

    if not article_cards:
        print(f"No articles found for offset {offset}.")
    else:
        for card in article_cards:
            link = card.get("href")
            title_tag = card.find("span", class_="card__title-text")
            title = title_tag.text.strip() if title_tag else "No Title"
           
            all_articles.append({
                "Title": title,
                "Link": link
            })

    time.sleep(1)  # pause between search-page requests

final_data = []

for article in all_articles:
    try:
        article_response = requests.get(article["Link"], headers=headers, timeout=10)
        article_response.raise_for_status()  # HTTP errors are reported by the except block
        soup = BeautifulSoup(article_response.content, "html.parser")

        # Extract publication date
        date_div = soup.find("div", class_="mntl-attribution__item-date")
        date = None
        if date_div and "Published" in date_div.text:
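            # Join text nodes with a space so "Published" is removed even if the date sits in a child tag
            date = date_div.get_text(" ", strip=True).replace("Published", "", 1).strip()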
        if not date:
            # Skip if no date found
            continue

        # Extract article content
        content = ""
        article_tag = soup.find("article", id="article--sc_1-0")
        if article_tag:
            paragraphs = article_tag.find_all(["p", "h2"])
            content = "\n".join(p.get_text(strip=True) for p in paragraphs)

        # Add the extracted details to the final data list
        final_data.append({
            "Title": article["Title"],
            "Date": date,
            "Link": article["Link"],
            "Content": content
        })
    except Exception as e:
        print(f"Failed to scrape {article['Link']}: {e}")
    finally:
        # Pause between article requests, including skipped or failed ones
        time.sleep(1)

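# Fixed column order; also keeps the "Date" column when no article was scraped
inv_df = pd.DataFrame(final_data, columns=["Title", "Date", "Link", "Content"])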

search_term = params["q"]
print(f"\nInvestopedia Articles (Keyword: '{search_term}'):")

inv_df["Date"] = pd.to_datetime(inv_df["Date"], errors="coerce")
inv_df.head()
```

```
Investopedia Articles (Keyword: 'technology'):
```

|   | Title | Date | Link | Content |
|---|---|---|---|---|
| 0 | Micron Technology Stock Jumps on Price Target ... | 2025-09-11 | https://www.investopedia.com/micron-technology... | Bill McColl has 25+ years of experience as a s... |
| 1 | Top Stock Movers Now: Centene, Micron Technolo... | 2025-09-11 | https://www.investopedia.com/top-stock-movers-... | Bill McColl has 25+ years of experience as a s... |
| 2 | Watch These Marvell Technology Price Levels as... | 2025-08-29 | https://www.investopedia.com/watch-these-marve... | Marvell Technology (MRVL) shares plunged Frida... |
| 3 | From Smart Homes to Health Apps: How Technolog... | 2025-07-25 | https://www.investopedia.com/how-technology-is... | Halfpoint Images / Getty Images\nRetirement li... |
| 4 | S&P 500 Gains and Losses Today: Align Technolo... | 2025-07-31 | https://www.investopedia.com/s-and-p-500-gains... | Michael Bromberg is a finance editor with a de... |

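The `Date` column above comes from stripping the "Published" label off the attribution text and letting `pd.to_datetime` coerce the rest. The same step can be sketched with only the standard library. This assumes the label reads like `Published September 11, 2025` (inferred from the sample dates, not confirmed against Investopedia's markup), and `parse_published_date` is a hypothetical helper, not part of the script above.

```python
from datetime import date, datetime
from typing import Optional


def parse_published_date(label: str) -> Optional[date]:
    """Turn a label such as 'Published September 11, 2025' into a date.

    Returns None when 'Published' is absent (the scraper skips such
    articles) or when the remaining text does not match the assumed
    'Month D, YYYY' format.
    """
    text = " ".join(label.split())  # collapse stray whitespace and newlines
    if "Published" not in text:
        return None
    date_part = text.split("Published", 1)[1].strip()
    try:
        # %B expects an English month name (default C locale)
        return datetime.strptime(date_part, "%B %d, %Y").date()
    except ValueError:
        return None


print(parse_published_date("Published September 11, 2025"))  # 2025-09-11
print(parse_published_date("Updated July 31, 2025"))           # None
```

Returning `None` instead of raising keeps the behavior close to `errors="coerce"`, where unparseable dates become `NaT` rather than stopping the run.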

```python
inv_df.to_csv("data_investopedia_df.csv", index=False)
```
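
Step 2 of the instructions builds the offset list by hand. Assuming each results page still holds 24 entries (the page size stated there), the list can instead be generated from a page count. `make_offsets` is a hypothetical helper, and its result would replace the hard-coded `offsets = [0, 24, 48]`.

```python
from typing import List


def make_offsets(num_pages: int, page_size: int = 24) -> List[int]:
    """Return the starting offset of each search-results page, beginning at 0."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return [page * page_size for page in range(num_pages)]


print(make_offsets(3))  # [0, 24, 48]
print(make_offsets(5))  # [0, 24, 48, 72, 96]
```

For example, `offsets = make_offsets(5)` reproduces the five-page list from the instructions; if the site ever changes its page size, only `page_size` needs updating.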