## Documents Extraction and Processing

In [1]:
%load_ext autoreload
%autoreload 2

Let's first obtain the file that maps tickers to CIKs (Central Index Keys), the identifiers the SEC assigns to each filer.

In [1]:
import requests
import json
from pathlib import Path
import os

# --- Configuration (from previous step) ---
HEADERS = {
    "User-Agent": "EdgarTutorial/1.0 (YourName your.email@domain.com)" 
}
TICKER_CIK_URL = "https://www.sec.gov/files/company_tickers.json"
OUTPUT_FILE = Path("sec_data/company_tickers.json")

# Ensure directory exists
OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
# ----------------------------------------

# 1. Download the JSON data
print("Downloading CIK-Ticker map...")
response = requests.get(TICKER_CIK_URL, headers=HEADERS, timeout=15)
response.raise_for_status()
raw_data = response.json() # Load into Python dictionary

# 2. Open the file and use json.dump() with indent=4
print(f"Saving JSON in readable format to {OUTPUT_FILE.absolute()}...")

# Open in 'w' mode to (over)write the file
with open(OUTPUT_FILE, 'w') as f:
    # indent=4 pretty-prints the JSON: 4 spaces per nesting level,
    # with a line break after each item.
    json.dump(raw_data, f, indent=4)

print("✅ JSON saved successfully with proper line breaks and indentation.")

# --- Optional: Print a Snippet to Console (Also Pretty-Printed) ---
# If you want to print to the console instead of a file, use json.dumps()
print("\n--- Console Snippet (Pretty-Printed) ---")
# Print the first 3 key-value pairs from the dictionary
keys = list(raw_data.keys())
snippet = {k: raw_data[k] for k in keys[:3]}

# Use json.dumps() with indent=2 to format the string output
pretty_string = json.dumps(snippet, indent=2)
print(pretty_string)

Downloading CIK-Ticker map...
Saving JSON in readable format to /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/sec_data/company_tickers.json...
✅ JSON saved successfully with proper line breaks and indentation.

--- Console Snippet (Pretty-Printed) ---
{
  "0": {
    "cik_str": 1045810,
    "ticker": "NVDA",
    "title": "NVIDIA CORP"
  },
  "1": {
    "cik_str": 320193,
    "ticker": "AAPL",
    "title": "Apple Inc."
  },
  "2": {
    "cik_str": 789019,
    "ticker": "MSFT",
    "title": "MICROSOFT CORP"
  }
}
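
Once the map is available, a ticker can be resolved to its CIK by inverting the structure shown in the snippet. The following is a minimal sketch, not part of the original notebook: the helper name `build_ticker_lookup` is hypothetical, and the three records from the console snippet are used as inline sample data in place of the full `raw_data` dictionary. CIKs are zero-padded to 10 digits because that is the form SEC endpoints such as the submissions API use.

```python
def build_ticker_lookup(raw_data: dict) -> dict:
    """Map upper-cased ticker -> 10-digit zero-padded CIK string.

    Hypothetical helper; expects the {"0": {"cik_str": ..., "ticker": ...}} layout
    shown in the console snippet.
    """
    lookup = {}
    for record in raw_data.values():
        lookup[record["ticker"].upper()] = str(record["cik_str"]).zfill(10)
    return lookup


# Inline sample copied from the snippet; in the notebook, pass raw_data instead.
sample = {
    "0": {"cik_str": 1045810, "ticker": "NVDA", "title": "NVIDIA CORP"},
    "1": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "2": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}

ticker_to_cik = build_ticker_lookup(sample)
print(ticker_to_cik["AAPL"])  # 0000320193
```

Keeping the CIK as a string preserves the leading zeros, which would be lost if it were stored as an integer.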


### Vanguard Index Funds

In [18]:
import pandas as pd
from io import StringIO
from edgar import Company, set_identity 
import sys
from pathlib import Path

RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))

from src.simple_rag.extraction.parser import BlackRockFiling


set_identity("luis.alvarez.conde@alumnos.upm.es")

ticker = "VOO"
fund = Company(ticker)
all_filings = fund.get_filings(form="N-CSR")


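# Default to an empty list so the loop below is safe when nothing is found
latest_filings = []

if all_filings:
    # 1. Collect the report dates that are actually set (some filings may lack one)
    report_dates = [f.report_date for f in all_filings if f.report_date]

    if report_dates:
        # 2. Take the year of the most recent report date (YYYY-MM-DD strings sort chronologically)
        target_year = max(report_dates)[:4]

        # 3. Filter: Keep ALL filings where the report_date starts with that year
        # This captures the March, June, and December reports for that fiscal year
        latest_filings = [
            f for f in all_filings
            if f.report_date and f.report_date.startswith(target_year)
        ]
        print("Found filings: ", len(latest_filings), "for year: ", target_year)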


funds_total = []        # every fund / share class parsed across all filings
performance_funds = []  # tickers of the funds that have a performance table
df_performance = []     # one financial-highlights result per filing
for filing in latest_filings:

    print("Processing filing: ", filing.report_date)
    html_content = filing.html()

    parser = BlackRockFiling(html_content)
    funds = parser.get_funds()
    count = 0
    # Use a distinct loop variable so the Company object `fund` is not overwritten
    for parsed_fund in funds:
        if parsed_fund.performance_table is not None:
            performance_funds.append(parsed_fund.ticker)
            count += 1

    df_performance.append(parser.get_financial_highlights())

    print(count)
    print("Adding funds: ", len(funds))

    funds_total.extend(funds)

print(len(performance_funds))
print(performance_funds)
print(len(df_performance))


stamina.retry_scheduled
stamina.retry_scheduled
stamina.retry_scheduled


Found filings:  2 for year:  2024
Processing filing:  2024-12-31
Processing: Vanguard Extended Market Index Fund
Extracting context:  FY2024_C000007779Member
Tag not found:  dei:SecurityExchangeName FY2024_C000007779Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: Vanguard Extended Market Index Fund
Extracting context:  FY2024_C000007782Member
Tag not found:  dei:SecurityExchangeName FY2024_C000007782Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: Vanguard Extended Market Index Fund
Extracting context:  FY2024_C000007780Member
Tag not found:  dei:SecurityExchangeName FY2024_C000007780Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: Vanguard Extended Market Index Fund
Extracting context:  FY2024_C000007781Member
Ta
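
The year-based selection used in the cell above can be checked in isolation. This is a minimal sketch under two assumptions: `report_date` is a `YYYY-MM-DD` string (as the log output suggests) and may be missing on some filings. The function name `filings_for_latest_year` and the `SimpleNamespace` stand-ins are hypothetical and only mimic the one attribute the filter reads.

```python
from types import SimpleNamespace


def filings_for_latest_year(filings):
    """Return (year, filings) for the most recent report year.

    Hypothetical helper: only requires objects with a `report_date` attribute
    holding a 'YYYY-MM-DD' string or None.
    """
    dated = [f for f in filings if f.report_date]
    if not dated:
        return None, []
    # ISO date strings compare chronologically, so max() yields the latest date.
    target_year = max(f.report_date for f in dated)[:4]
    return target_year, [f for f in dated if f.report_date.startswith(target_year)]


# Stand-in objects, not real filings.
sample_filings = [
    SimpleNamespace(report_date="2024-12-31"),
    SimpleNamespace(report_date="2024-06-30"),
    SimpleNamespace(report_date="2023-12-31"),
    SimpleNamespace(report_date=None),
]

year, selected = filings_for_latest_year(sample_filings)
print(year, [f.report_date for f in selected])
```

Filtering out missing dates before calling `max()` avoids a `TypeError` when a `None` is compared with a string.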

In [8]:
parser.print_fund_info(funds_total)

Showing information of 52 funds


### 🏦 Vanguard Extended Market Index Fund

🆔 Context ID:      FY2024_C000007779Member
🎫 Ticker:          VEXMX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 111,156
Expense Ratio       : 0.19
Turnover Rate       : 11
Costs per $10k      : 21
Advisory Fees       : 1,799
Number of Holdings  : 3,485

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark index.   U.S. economic growth hovered around 3% on a year-over-year basis for much of the period,..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,16.76%,9.75%,9.31%
3,S&P Completion Index,16.88%,9.77%,9.33%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,
1,Communication Services,4.3%
2,Consumer Discretionary,12.0%
3,Consumer Staples,3.0%
4,Energy,4.1%
5,Financials,18.0%
6,Health Care,11.4%
7,Industrials,17.4%
8,Information Technology,17.9%
9,Materials,4.7%
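
The returns tables printed above carry their title and period headers as ordinary data rows, which makes them awkward to query. The sketch below shows one possible way to tidy such a table, assuming the flattened comma-separated layout seen in this output. The function name `tidy_returns` is hypothetical, and the sample text copies the first rows of the Investor Shares table (the row whose label is truncated in the output is left out).

```python
import csv
from io import StringIO

# First rows of the "Average Annual Returns" table, copied from the output above.
RAW_TABLE = """\
Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,16.76%,9.75%,9.31%
3,S&P Completion Index,16.88%,9.77%,9.33%
"""


def tidy_returns(raw: str) -> dict:
    """Convert the flattened table into {row label: {period: percent as float}}."""
    rows = list(csv.reader(StringIO(raw)))
    # rows[0] holds positional column labels, rows[1] the table title,
    # rows[2] the period headers; the data starts at rows[3].
    periods = rows[2][2:]
    returns = {}
    for row in rows[3:]:
        name, values = row[1], row[2:]
        returns[name] = {p: float(v.rstrip("%")) for p, v in zip(periods, values)}
    return returns


tidy = tidy_returns(RAW_TABLE)
print(tidy["Investor Shares"]["1 Year"])  # 16.76
```

With the values stored as floats, share classes and benchmarks can be compared directly, for example `tidy["S&P Completion Index"]["1 Year"] - tidy["Investor Shares"]["1 Year"]`.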






### 🏦 Vanguard Extended Market Index Fund

🆔 Context ID:      FY2024_C000007782Member
🎫 Ticker:          VXF
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 111,156
Expense Ratio       : 0.06
Turnover Rate       : 11
Costs per $10k      : 7
Advisory Fees       : 1,799
Number of Holdings  : 3,485

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark index.   U.S. economic growth hovered around 3% on a year-over-year basis for much of the period,..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,16.90%,9.89%,9.45%
3,ETF Shares Market Price,16.89%,9.90%,9.46%
4,S&P Completion Index,16.88%,9.77%,9.33%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,
1,Communication Services,4.3%
2,Consumer Discretionary,12.0%
3,Consumer Staples,3.0%
4,Energy,4.1%
5,Financials,18.0%
6,Health Care,11.4%
7,Industrials,17.4%
8,Information Technology,17.9%
9,Materials,4.7%






### 🏦 Vanguard Extended Market Index Fund

🆔 Context ID:      FY2024_C000007780Member
🎫 Ticker:          VEXAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 111,156
Expense Ratio       : 0.06
Turnover Rate       : 11
Costs per $10k      : 7
Advisory Fees       : 1,799
Number of Holdings  : 3,485

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark index.   U.S. economic growth hovered around 3% on a year-over-year basis for much of the period,..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,16.91%,9.89%,9.45%
3,S&P Completion Index,16.88%,9.77%,9.33%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,
1,Communication Services,4.3%
2,Consumer Discretionary,12.0%
3,Consumer Staples,3.0%
4,Energy,4.1%
5,Financials,18.0%
6,Health Care,11.4%
7,Industrials,17.4%
8,Information Technology,17.9%
9,Materials,4.7%






### 🏦 Vanguard Extended Market Index Fund

🆔 Context ID:      FY2024_C000007781Member
🎫 Ticker:          VIEIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 111,156
Expense Ratio       : 0.05
Turnover Rate       : 11
Costs per $10k      : 5
Advisory Fees       : 1,799
Number of Holdings  : 3,485

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark index.   U.S. economic growth hovered around 3% on a year-over-year basis for much of the period,..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Shares,16.91%,9.90%,9.47%
3,S&P Completion Index,16.88%,9.77%,9.33%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,
1,Communication Services,4.3%
2,Consumer Discretionary,12.0%
3,Consumer Staples,3.0%
4,Energy,4.1%
5,Financials,18.0%
6,Health Care,11.4%
7,Industrials,17.4%
8,Information Technology,17.9%
9,Materials,4.7%






### 🏦 Vanguard Extended Market Index Fund

🆔 Context ID:      FY2024_C000096110Member
🎫 Ticker:          VEMPX
🏷️ Share Class:     Institutional Plus Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 111,156
Expense Ratio       : 0.04
Turnover Rate       : 11
Costs per $10k      : 4
Advisory Fees       : 1,799
Number of Holdings  : 3,485

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark index.   U.S. economic growth hovered around 3% on a year-over-year basis for much of the period,..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Plus Shares,16.94%,9.91%,9.48%
3,S&P Completion Index,16.88%,9.77%,9.33%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,
1,Communication Services,4.3%
2,Consumer Discretionary,12.0%
3,Consumer Staples,3.0%
4,Energy,4.1%
5,Financials,18.0%
6,Health Care,11.4%
7,Industrials,17.4%
8,Information Technology,17.9%
9,Materials,4.7%






### 🏦 Vanguard Extended Market Index Fund

🆔 Context ID:      FY2024_C000170275Member
🎫 Ticker:          VSEMX
🏷️ Share Class:     Institutional Select Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 111,156
Expense Ratio       : 0.02
Turnover Rate       : 11
Costs per $10k      : 2
Advisory Fees       : 1,799
Number of Holdings  : 3,485

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark index.   U.S. economic growth hovered around 3% on a year-over-year basis for much of the period,..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,Since Inception (6/27/2016)
2,Institutional Select Shares,16.96%,9.94%,12.09%
3,S&P Completion Index,16.88%,9.77%,11.91%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,14.95%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,
1,Communication Services,4.3%
2,Consumer Discretionary,12.0%
3,Consumer Staples,3.0%
4,Energy,4.1%
5,Financials,18.0%
6,Health Care,11.4%
7,Industrials,17.4%
8,Information Technology,17.9%
9,Materials,4.7%






### 🏦 Vanguard Mid-Cap Index Fund

🆔 Context ID:      FY2024_C000007791Member
🎫 Ticker:          VIMSX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 176,987
Expense Ratio       : 0.17
Turnover Rate       : 16
Costs per $10k      : 18
Advisory Fees       : 2,958
Number of Holdings  : 327

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,15.09%,9.72%,9.42%
3,CRSP US Mid Cap Index,15.25%,9.86%,9.57%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,2.6%
2,Consumer Discretionary,13.1%
3,Consumer Staples,5.9%
4,Energy,5.5%
5,Financials,13.5%
6,Health Care,8.6%
7,Industrials,19.9%
8,Real Estate,7.6%
9,Technology,13.9%






### 🏦 Vanguard Mid-Cap Index Fund

🆔 Context ID:      FY2024_C000007794Member
🎫 Ticker:          VO
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 176,987
Expense Ratio       : 0.04
Turnover Rate       : 16
Costs per $10k      : 4
Advisory Fees       : 2,958
Number of Holdings  : 327

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,15.23%,9.85%,9.56%
3,ETF Shares Market Price,15.28%,9.87%,9.56%
4,CRSP US Mid Cap Index,15.25%,9.86%,9.57%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,2.6%
2,Consumer Discretionary,13.1%
3,Consumer Staples,5.9%
4,Energy,5.5%
5,Financials,13.5%
6,Health Care,8.6%
7,Industrials,19.9%
8,Real Estate,7.6%
9,Technology,13.9%






### 🏦 Vanguard Mid-Cap Index Fund

🆔 Context ID:      FY2024_C000007792Member
🎫 Ticker:          VIMAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 176,987
Expense Ratio       : 0.05
Turnover Rate       : 16
Costs per $10k      : 5
Advisory Fees       : 2,958
Number of Holdings  : 327

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,15.22%,9.85%,9.55%
3,CRSP US Mid Cap Index,15.25%,9.86%,9.57%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,2.6%
2,Consumer Discretionary,13.1%
3,Consumer Staples,5.9%
4,Energy,5.5%
5,Financials,13.5%
6,Health Care,8.6%
7,Industrials,19.9%
8,Real Estate,7.6%
9,Technology,13.9%






### 🏦 Vanguard Mid-Cap Index Fund

🆔 Context ID:      FY2024_C000007793Member
🎫 Ticker:          VMCIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 176,987
Expense Ratio       : 0.04
Turnover Rate       : 16
Costs per $10k      : 4
Advisory Fees       : 2,958
Number of Holdings  : 327

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Shares,15.23%,9.86%,9.56%
3,CRSP US Mid Cap Index,15.25%,9.86%,9.57%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,2.6%
2,Consumer Discretionary,13.1%
3,Consumer Staples,5.9%
4,Energy,5.5%
5,Financials,13.5%
6,Health Care,8.6%
7,Industrials,19.9%
8,Real Estate,7.6%
9,Technology,13.9%






### 🏦 Vanguard Mid-Cap Index Fund

🆔 Context ID:      FY2024_C000096111Member
🎫 Ticker:          VMCPX
🏷️ Share Class:     Institutional Plus Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 176,987
Expense Ratio       : 0.03
Turnover Rate       : 16
Costs per $10k      : 3
Advisory Fees       : 2,958
Number of Holdings  : 327

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Plus Shares,15.25%,9.87%,9.57%
3,CRSP US Mid Cap Index,15.25%,9.86%,9.57%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,2.6%
2,Consumer Discretionary,13.1%
3,Consumer Staples,5.9%
4,Energy,5.5%
5,Financials,13.5%
6,Health Care,8.6%
7,Industrials,19.9%
8,Real Estate,7.6%
9,Technology,13.9%






### 🏦 Vanguard Mid-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000034427Member
🎫 Ticker:          VMGIX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 27,704
Expense Ratio       : 0.19
Turnover Rate       : 21
Costs per $10k      : 21
Advisory Fees       : 456
Number of Holdings  : 143

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Growth Index.   U.S. economic growth hovered around 3% on a year-over-year b..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,16.27%,10.44%,10.26%
3,CRSP US Mid Cap Growth Index,16.48%,10.62%,10.45%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.1%
2,Consumer Discretionary,15.8%
3,Consumer Staples,1.2%
4,Energy,5.4%
5,Financials,8.5%
6,Health Care,11.7%
7,Industrials,21.3%
8,Real Estate,6.8%
9,Technology,21.7%






### 🏦 Vanguard Mid-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000034428Member
🎫 Ticker:          VOT
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 27,704
Expense Ratio       : 0.07
Turnover Rate       : 21
Costs per $10k      : 8
Advisory Fees       : 456
Number of Holdings  : 143

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Growth Index.   U.S. economic growth hovered around 3% on a year-over-year b..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,16.41%,10.57%,10.40%
3,ETF Shares Market Price,16.30%,10.56%,10.39%
4,CRSP US Mid Cap Growth Index,16.48%,10.62%,10.45%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.1%
2,Consumer Discretionary,15.8%
3,Consumer Staples,1.2%
4,Energy,5.4%
5,Financials,8.5%
6,Health Care,11.7%
7,Industrials,21.3%
8,Real Estate,6.8%
9,Technology,21.7%






### 🏦 Vanguard Mid-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000105306Member
🎫 Ticker:          VMGMX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 27,704
Expense Ratio       : 0.07
Turnover Rate       : 21
Costs per $10k      : 8
Advisory Fees       : 456
Number of Holdings  : 143

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Growth Index.   U.S. economic growth hovered around 3% on a year-over-year b..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,16.41%,10.57%,10.40%
3,CRSP US Mid Cap Growth Index,16.48%,10.62%,10.45%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.1%
2,Consumer Discretionary,15.8%
3,Consumer Staples,1.2%
4,Energy,5.4%
5,Financials,8.5%
6,Health Care,11.7%
7,Industrials,21.3%
8,Real Estate,6.8%
9,Technology,21.7%






### 🏦 Vanguard Mid-Cap Value Index Fund

🆔 Context ID:      FY2024_C000034429Member
🎫 Ticker:          VMVIX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 30,104
Expense Ratio       : 0.19
Turnover Rate       : 19
Costs per $10k      : 20
Advisory Fees       : 532
Number of Holdings  : 195

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Value Index.   U.S. economic growth hovered around 3% on a year-over-year ba..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,13.89%,8.63%,8.37%
3,CRSP US Mid Cap Value Index,14.05%,8.79%,8.53%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.9%
2,Consumer Discretionary,10.8%
3,Consumer Staples,9.9%
4,Energy,5.6%
5,Financials,17.7%
6,Health Care,6.0%
7,Industrials,18.8%
8,Real Estate,8.4%
9,Technology,7.4%






### 🏦 Vanguard Mid-Cap Value Index Fund

🆔 Context ID:      FY2024_C000034430Member
🎫 Ticker:          VOE
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 30,104
Expense Ratio       : 0.07
Turnover Rate       : 19
Costs per $10k      : 7
Advisory Fees       : 532
Number of Holdings  : 195

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Value Index.   U.S. economic growth hovered around 3% on a year-over-year ba..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,14.03%,8.76%,8.49%
3,ETF Shares Market Price,14.00%,8.76%,8.49%
4,CRSP US Mid Cap Value Index,14.05%,8.79%,8.53%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.9%
2,Consumer Discretionary,10.8%
3,Consumer Staples,9.9%
4,Energy,5.6%
5,Financials,17.7%
6,Health Care,6.0%
7,Industrials,18.8%
8,Real Estate,8.4%
9,Technology,7.4%






### 🏦 Vanguard Mid-Cap Value Index Fund

🆔 Context ID:      FY2024_C000105307Member
🎫 Ticker:          VMVAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 30,104
Expense Ratio       : 0.07
Turnover Rate       : 19
Costs per $10k      : 7
Advisory Fees       : 532
Number of Holdings  : 195

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed roughly in line with its benchmark, the CRSP US Mid Cap Value Index.   U.S. economic growth hovered around 3% on a year-over-year ba..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,14.03%,8.76%,8.50%
3,CRSP US Mid Cap Value Index,14.05%,8.79%,8.53%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.9%
2,Consumer Discretionary,10.8%
3,Consumer Staples,9.9%
4,Energy,5.6%
5,Financials,17.7%
6,Health Care,6.0%
7,Industrials,18.8%
8,Real Estate,8.4%
9,Technology,7.4%






### 🏦 Vanguard Small-Cap Index Fund

🆔 Context ID:      FY2024_C000007795Member
🎫 Ticker:          NAESX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 155,233
Expense Ratio       : 0.17
Turnover Rate       : 13
Costs per $10k      : 18
Advisory Fees       : 2,566
Number of Holdings  : 1,377

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis for much..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,14.10%,9.17%,8.96%
3,CRSP US Small Cap Index,14.22%,9.26%,9.06%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.5%
2,Consumer Discretionary,16.0%
3,Consumer Staples,3.6%
4,Energy,4.5%
5,Financials,14.6%
6,Health Care,10.5%
7,Industrials,21.7%
8,Real Estate,7.1%
9,Technology,13.4%






### 🏦 Vanguard Small-Cap Index Fund

🆔 Context ID:      FY2024_C000007798Member
🎫 Ticker:          VB
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 155,233
Expense Ratio       : 0.05
Turnover Rate       : 13
Costs per $10k      : 5
Advisory Fees       : 2,566
Number of Holdings  : 1,377

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis for much..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,14.23%,9.30%,9.09%
3,ETF Shares Market Price,14.13%,9.29%,9.09%
4,CRSP US Small Cap Index,14.22%,9.26%,9.06%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.5%
2,Consumer Discretionary,16.0%
3,Consumer Staples,3.6%
4,Energy,4.5%
5,Financials,14.6%
6,Health Care,10.5%
7,Industrials,21.7%
8,Real Estate,7.1%
9,Technology,13.4%






### 🏦 Vanguard Small-Cap Index Fund

🆔 Context ID:      FY2024_C000007796Member
🎫 Ticker:          VSMAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 155,233
Expense Ratio       : 0.05
Turnover Rate       : 13
Costs per $10k      : 5
Advisory Fees       : 2,566
Number of Holdings  : 1,377

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis for much..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,14.23%,9.30%,9.09%
3,CRSP US Small Cap Index,14.22%,9.26%,9.06%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.5%
2,Consumer Discretionary,16.0%
3,Consumer Staples,3.6%
4,Energy,4.5%
5,Financials,14.6%
6,Health Care,10.5%
7,Industrials,21.7%
8,Real Estate,7.1%
9,Technology,13.4%






### 🏦 Vanguard Small-Cap Index Fund

🆔 Context ID:      FY2024_C000007797Member
🎫 Ticker:          VSCIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 155,233
Expense Ratio       : 0.04
Turnover Rate       : 13
Costs per $10k      : 4
Advisory Fees       : 2,566
Number of Holdings  : 1,377

📝 Commentary: "How did the Fund perform during the reporting period?   For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Index.   U.S. economic growth hovered around 3% on a year-over-year basis for much..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Shares,14.23%,9.31%,9.10%
3,CRSP US Small Cap Index,14.22%,9.26%,9.06%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.5%
2,Consumer Discretionary,16.0%
3,Consumer Staples,3.6%
4,Energy,4.5%
5,Financials,14.6%
6,Health Care,10.5%
7,Industrials,21.7%
8,Real Estate,7.1%
9,Technology,13.4%






### 🏦 Vanguard Small-Cap Index Fund

🆔 Context ID:      FY2024_C000096112Member
🎫 Ticker:          VSCPX
🏷️ Share Class:     Institutional Plus Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 155,233
Expense Ratio       : 0.03
Turnover Rate       : 13
Costs per $10k      : 3
Advisory Fees       : 2,566
Number of Holdings  : 1,377

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Index. U.S. economic growth hovered around 3% on a year-over-year basis for much..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Plus Shares,14.25%,9.32%,9.11%
3,CRSP US Small Cap Index,14.22%,9.26%,9.06%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,3.5%
2,Consumer Discretionary,16.0%
3,Consumer Staples,3.6%
4,Energy,4.5%
5,Financials,14.6%
6,Health Care,10.5%
7,Industrials,21.7%
8,Real Estate,7.1%
9,Technology,13.4%






### 🏦 Vanguard Small-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000007799Member
🎫 Ticker:          VISGX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 38,107
Expense Ratio       : 0.19
Turnover Rate       : 21
Costs per $10k      : 21
Advisory Fees       : 628
Number of Holdings  : 596

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Growth Index. U.S. economic growth hovered around 3% on a year-over-year basis f..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,16.35%,7.56%,8.96%
3,CRSP US Small Cap Growth Index,16.48%,7.66%,9.05%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.7%
2,Consumer Discretionary,16.1%
3,Consumer Staples,3.3%
4,Energy,5.3%
5,Financials,5.8%
6,Health Care,16.2%
7,Industrials,20.3%
8,Real Estate,5.2%
9,Technology,22.7%






### 🏦 Vanguard Small-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000007801Member
🎫 Ticker:          VBK
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 38,107
Expense Ratio       : 0.07
Turnover Rate       : 21
Costs per $10k      : 8
Advisory Fees       : 628
Number of Holdings  : 596

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Growth Index. U.S. economic growth hovered around 3% on a year-over-year basis f..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,16.49%,7.69%,9.09%
3,ETF Shares Market Price,16.49%,7.70%,9.09%
4,CRSP US Small Cap Growth Index,16.48%,7.66%,9.05%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.7%
2,Consumer Discretionary,16.1%
3,Consumer Staples,3.3%
4,Energy,5.3%
5,Financials,5.8%
6,Health Care,16.2%
7,Industrials,20.3%
8,Real Estate,5.2%
9,Technology,22.7%






### 🏦 Vanguard Small-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000105304Member
🎫 Ticker:          VSGAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 38,107
Expense Ratio       : 0.07
Turnover Rate       : 21
Costs per $10k      : 8
Advisory Fees       : 628
Number of Holdings  : 596

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Growth Index. U.S. economic growth hovered around 3% on a year-over-year basis f..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,16.49%,7.69%,9.09%
3,CRSP US Small Cap Growth Index,16.48%,7.66%,9.05%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.7%
2,Consumer Discretionary,16.1%
3,Consumer Staples,3.3%
4,Energy,5.3%
5,Financials,5.8%
6,Health Care,16.2%
7,Industrials,20.3%
8,Real Estate,5.2%
9,Technology,22.7%






### 🏦 Vanguard Small-Cap Growth Index Fund

🆔 Context ID:      FY2024_C000007800Member
🎫 Ticker:          VSGIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 38,107
Expense Ratio       : 0.06
Turnover Rate       : 21
Costs per $10k      : 6
Advisory Fees       : 628
Number of Holdings  : 596

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Growth Index. U.S. economic growth hovered around 3% on a year-over-year basis f..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Shares,16.50%,7.70%,9.10%
3,CRSP US Small Cap Growth Index,16.48%,7.66%,9.05%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,1.7%
2,Consumer Discretionary,16.1%
3,Consumer Staples,3.3%
4,Energy,5.3%
5,Financials,5.8%
6,Health Care,16.2%
7,Industrials,20.3%
8,Real Estate,5.2%
9,Technology,22.7%






### 🏦 Vanguard Small-Cap Value Index Fund

🆔 Context ID:      FY2024_C000007802Member
🎫 Ticker:          VISVX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 58,332
Expense Ratio       : 0.19
Turnover Rate       : 16
Costs per $10k      : 20
Advisory Fees       : 997
Number of Holdings  : 845

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,12.25%,9.77%,8.54%
3,CRSP US Small Cap Value Index,12.42%,9.89%,8.67%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,4.9%
2,Consumer Discretionary,15.8%
3,Consumer Staples,3.9%
4,Energy,3.8%
5,Financials,21.3%
6,Health Care,6.1%
7,Industrials,22.8%
8,Real Estate,8.6%
9,Technology,6.2%






### 🏦 Vanguard Small-Cap Value Index Fund

🆔 Context ID:      FY2024_C000007804Member
🎫 Ticker:          VBR
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 58,332
Expense Ratio       : 0.07
Turnover Rate       : 16
Costs per $10k      : 7
Advisory Fees       : 997
Number of Holdings  : 845

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,12.39%,9.89%,8.67%
3,ETF Shares Market Price,12.30%,9.89%,8.67%
4,CRSP US Small Cap Value Index,12.42%,9.89%,8.67%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,4.9%
2,Consumer Discretionary,15.8%
3,Consumer Staples,3.9%
4,Energy,3.8%
5,Financials,21.3%
6,Health Care,6.1%
7,Industrials,22.8%
8,Real Estate,8.6%
9,Technology,6.2%






### 🏦 Vanguard Small-Cap Value Index Fund

🆔 Context ID:      FY2024_C000105305Member
🎫 Ticker:          VSIAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 58,332
Expense Ratio       : 0.07
Turnover Rate       : 16
Costs per $10k      : 7
Advisory Fees       : 997
Number of Holdings  : 845

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,12.39%,9.90%,8.67%
3,CRSP US Small Cap Value Index,12.42%,9.89%,8.67%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,4.9%
2,Consumer Discretionary,15.8%
3,Consumer Staples,3.9%
4,Energy,3.8%
5,Financials,21.3%
6,Health Care,6.1%
7,Industrials,22.8%
8,Real Estate,8.6%
9,Technology,6.2%






### 🏦 Vanguard Small-Cap Value Index Fund

🆔 Context ID:      FY2024_C000007803Member
🎫 Ticker:          VSIIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 58,332
Expense Ratio       : 0.06
Turnover Rate       : 16
Costs per $10k      : 6
Advisory Fees       : 997
Number of Holdings  : 845

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Small Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis fo..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Shares,12.41%,9.91%,8.68%
3,CRSP US Small Cap Value Index,12.42%,9.89%,8.67%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of ...,Portfolio Composition % of Net Assets (as of ...
1,Basic Materials,4.9%
2,Consumer Discretionary,15.8%
3,Consumer Staples,3.9%
4,Energy,3.8%
5,Financials,21.3%
6,Health Care,6.1%
7,Industrials,22.8%
8,Real Estate,8.6%
9,Technology,6.2%






### 🏦 Vanguard Total Stock Market Index Fund

🆔 Context ID:      FY2024_C000007805Member
🎫 Ticker:          VTSMX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,777,963
Expense Ratio       : 0.14
Turnover Rate       : 2
Costs per $10k      : 16
Advisory Fees       : 33,526
Number of Holdings  : 3,624

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Total Market Index. U.S. economic growth hovered around 3% on a year-over-year basis for m..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Investor Shares,23.61%,13.69%,12.38%
3,CRSP US Total Market Index,23.77%,13.81%,12.50%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of D...,Portfolio Composition % of Net Assets (as of D...
1,Basic Materials,1.4%
2,Consumer Discretionary,15.1%
3,Consumer Staples,3.9%
4,Energy,3.4%
5,Financials,11.3%
6,Health Care,10.0%
7,Industrials,12.5%
8,Real Estate,2.6%
9,Technology,35.0%






### 🏦 Vanguard Total Stock Market Index Fund

🆔 Context ID:      FY2024_C000007808Member
🎫 Ticker:          VTI
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,777,963
Expense Ratio       : 0.03
Turnover Rate       : 2
Costs per $10k      : 3
Advisory Fees       : 33,526
Number of Holdings  : 3,624

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Total Market Index. U.S. economic growth hovered around 3% on a year-over-year basis for m..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,ETF Shares Net Asset Value,23.75%,13.80%,12.50%
3,ETF Shares Market Price,23.71%,13.81%,12.50%
4,CRSP US Total Market Index,23.77%,13.81%,12.50%
5,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of D...,Portfolio Composition % of Net Assets (as of D...
1,Basic Materials,1.4%
2,Consumer Discretionary,15.1%
3,Consumer Staples,3.9%
4,Energy,3.4%
5,Financials,11.3%
6,Health Care,10.0%
7,Industrials,12.5%
8,Real Estate,2.6%
9,Technology,35.0%






### 🏦 Vanguard Total Stock Market Index Fund

🆔 Context ID:      FY2024_C000007806Member
🎫 Ticker:          VTSAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,777,963
Expense Ratio       : 0.04
Turnover Rate       : 2
Costs per $10k      : 4
Advisory Fees       : 33,526
Number of Holdings  : 3,624

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Total Market Index. U.S. economic growth hovered around 3% on a year-over-year basis for m..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Admiral Shares,23.74%,13.80%,12.49%
3,CRSP US Total Market Index,23.77%,13.81%,12.50%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of D...,Portfolio Composition % of Net Assets (as of D...
1,Basic Materials,1.4%
2,Consumer Discretionary,15.1%
3,Consumer Staples,3.9%
4,Energy,3.4%
5,Financials,11.3%
6,Health Care,10.0%
7,Industrials,12.5%
8,Real Estate,2.6%
9,Technology,35.0%






### 🏦 Vanguard Total Stock Market Index Fund

🆔 Context ID:      FY2024_C000007807Member
🎫 Ticker:          VITSX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,777,963
Expense Ratio       : 0.03
Turnover Rate       : 2
Costs per $10k      : 3
Advisory Fees       : 33,526
Number of Holdings  : 3,624

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Total Market Index. U.S. economic growth hovered around 3% on a year-over-year basis for m..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,10 Years
2,Institutional Shares,23.75%,13.81%,12.50%
3,CRSP US Total Market Index,23.77%,13.81%,12.50%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of D...,Portfolio Composition % of Net Assets (as of D...
1,Basic Materials,1.4%
2,Consumer Discretionary,15.1%
3,Consumer Staples,3.9%
4,Energy,3.4%
5,Financials,11.3%
6,Health Care,10.0%
7,Industrials,12.5%
8,Real Estate,2.6%
9,Technology,35.0%






### 🏦 Vanguard Total Stock Market Index Fund

🆔 Context ID:      FY2024_C000155407Member
🎫 Ticker:          VSMPX
🏷️ Share Class:     Institutional Plus Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,777,963
Expense Ratio       : 0.02
Turnover Rate       : 2
Costs per $10k      : 2
Advisory Fees       : 33,526
Number of Holdings  : 3,624

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Total Market Index. U.S. economic growth hovered around 3% on a year-over-year basis for m..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,Since Inception (4/28/2015)
2,Institutional Plus Shares,23.76%,13.82%,12.52%
3,CRSP US Total Market Index,23.77%,13.81%,12.51%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of D...,Portfolio Composition % of Net Assets (as of D...
1,Basic Materials,1.4%
2,Consumer Discretionary,15.1%
3,Consumer Staples,3.9%
4,Energy,3.4%
5,Financials,11.3%
6,Health Care,10.0%
7,Industrials,12.5%
8,Real Estate,2.6%
9,Technology,35.0%






### 🏦 Vanguard Total Stock Market Index Fund

🆔 Context ID:      FY2024_C000170276Member
🎫 Ticker:          VSTSX
🏷️ Share Class:     Institutional Select Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,777,963
Expense Ratio       : 0.01
Turnover Rate       : 2
Costs per $10k      : 1
Advisory Fees       : 33,526
Number of Holdings  : 3,624

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Total Market Index. U.S. economic growth hovered around 3% on a year-over-year basis for m..."


**📊 Average Annual Returns**

Unnamed: 0,0,1,2,3
0,Average Annual Total Returns,,,
1,,1 Year,5 Years,Since Inception (6/27/2016)
2,Institutional Select Shares,23.78%,13.83%,15.00%
3,CRSP US Total Market Index,23.77%,13.81%,14.98%
4,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,14.95%


**🏗️ Sector Allocation**

Unnamed: 0,0,1
0,Portfolio Composition % of Net Assets (as of D...,Portfolio Composition % of Net Assets (as of D...
1,Basic Materials,1.4%
2,Consumer Discretionary,15.1%
3,Consumer Staples,3.9%
4,Energy,3.4%
5,Financials,11.3%
6,Health Care,10.0%
7,Industrials,12.5%
8,Real Estate,2.6%
9,Technology,35.0%






### 🏦 500 Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007773Member
🎫 Ticker:          VFINX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,350,332
Expense Ratio       : 0.14
Turnover Rate       : 2
Costs per $10k      : 16
Advisory Fees       : 20,816
Number of Holdings  : 516

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the Standard & Poor's 500 Index. U.S. economic growth hovered around 3% on a year-over-year basis for much ..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Investor Shares,S&P 500 Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,091","$10,095","$10,180"
2,2015,"$10,117","$10,123","$10,192"
3,2015,"$9,462","$9,471","$9,451"
4,2015,"$10,125","$10,138","$10,044"
5,2016,"$10,258","$10,275","$10,136"
6,2016,"$10,506","$10,527","$10,401"
7,2016,"$10,907","$10,933","$10,862"
8,2016,"$11,321","$11,351","$11,312"
9,2017,"$12,004","$12,039","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Investor Shares,24.84%,14.37%,12.95%
1,S&P 500 Index,25.02%,14.53%,13.10%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Communication Services,9.4%,
1,Consumer Discretionary,11.2%,
2,Consumer Staples,5.5%,
3,Energy,3.2%,
4,Financials,13.6%,
5,Health Care,10.1%,
6,Industrials,8.1%,
7,Information Technology,32.4%,
8,Materials,1.9%,
9,Real Estate,2.1%,






### 🏦 500 Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000092055Member
🎫 Ticker:          VOO
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,350,332
Expense Ratio       : 0.03
Turnover Rate       : 2
Costs per $10k      : 3
Advisory Fees       : 20,816
Number of Holdings  : 516

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the Standard & Poor's 500 Index. U.S. economic growth hovered around 3% on a year-over-year basis for much ..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,ETF Shares Net Asset Value,S&P 500 Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,094","$10,095","$10,180"
2,2015,"$10,123","$10,123","$10,192"
3,2015,"$9,469","$9,471","$9,451"
4,2015,"$10,135","$10,138","$10,044"
5,2016,"$10,271","$10,275","$10,136"
6,2016,"$10,522","$10,527","$10,401"
7,2016,"$10,927","$10,933","$10,862"
8,2016,"$11,345","$11,351","$11,312"
9,2017,"$12,031","$12,039","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,ETF Shares Net Asset Value,24.98%,14.48%,13.06%
1,ETF Shares Market Price,24.94%,14.49%,13.06%
2,S&P 500 Index,25.02%,14.53%,13.10%
3,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Communication Services,9.4%,
1,Consumer Discretionary,11.2%,
2,Consumer Staples,5.5%,
3,Energy,3.2%,
4,Financials,13.6%,
5,Health Care,10.1%,
6,Industrials,8.1%,
7,Information Technology,32.4%,
8,Materials,1.9%,
9,Real Estate,2.1%,






### 🏦 500 Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007774Member
🎫 Ticker:          VFIAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,350,332
Expense Ratio       : 0.04
Turnover Rate       : 2
Costs per $10k      : 4
Advisory Fees       : 20,816
Number of Holdings  : 516

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the Standard & Poor's 500 Index. U.S. economic growth hovered around 3% on a year-over-year basis for much ..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Admiral Shares,S&P 500 Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,094","$10,095","$10,180"
2,2015,"$10,123","$10,123","$10,192"
3,2015,"$9,470","$9,471","$9,451"
4,2015,"$10,136","$10,138","$10,044"
5,2016,"$10,272","$10,275","$10,136"
6,2016,"$10,523","$10,527","$10,401"
7,2016,"$10,928","$10,933","$10,862"
8,2016,"$11,345","$11,351","$11,312"
9,2017,"$12,032","$12,039","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Admiral Shares,24.97%,14.48%,13.06%
1,S&P 500 Index,25.02%,14.53%,13.10%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Communication Services,9.4%,
1,Consumer Discretionary,11.2%,
2,Consumer Staples,5.5%,
3,Energy,3.2%,
4,Financials,13.6%,
5,Health Care,10.1%,
6,Industrials,8.1%,
7,Information Technology,32.4%,
8,Materials,1.9%,
9,Real Estate,2.1%,






### 🏦 500 Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000170274Member
🎫 Ticker:          VFFSX
🏷️ Share Class:     Institutional Select Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 1,350,332
Expense Ratio       : 0.01
Turnover Rate       : 2
Costs per $10k      : 1
Advisory Fees       : 20,816
Number of Holdings  : 516

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the Standard & Poor's 500 Index. U.S. economic growth hovered around 3% on a year-over-year basis for much ..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Institutional Select Share Class,S&P 500 Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,6/24/16,"$5,000,000,000","$5,000,000,000","$5,000,000,000"
1,6/30/16,"$5,152,656,423","$5,152,577,879","$5,147,526,005"
2,9/30/16,"$5,351,187,741","$5,351,059,644","$5,375,816,703"
3,12/31/16,"$5,556,006,018","$5,555,696,904","$5,598,236,535"
4,3/31/17,"$5,892,894,070","$5,892,712,361","$5,922,524,381"
5,6/30/17,"$6,074,753,290","$6,074,688,823","$6,100,664,780"
6,9/30/17,"$6,347,363,071","$6,346,856,796","$6,379,289,163"
7,12/31/17,"$6,769,029,785","$6,768,592,456","$6,782,952,897"
8,3/31/18,"$6,717,278,809","$6,717,209,557","$6,741,940,992"
9,6/30/18,"$6,947,757,117","$6,947,870,304","$7,003,081,023"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,Since Inception 6/24/16
0,Institutional Select Share Class,25.00%,14.52%,15.25%
1,S&P 500 Index,25.02%,14.53%,15.26%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,14.66%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Communication Services,9.4%,
1,Consumer Discretionary,11.2%,
2,Consumer Staples,5.5%,
3,Energy,3.2%,
4,Financials,13.6%,
5,Health Care,10.1%,
6,Industrials,8.1%,
7,Information Technology,32.4%,
8,Materials,1.9%,
9,Real Estate,2.1%,






### 🏦 Value Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007775Member
🎫 Ticker:          VIVAX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 183,569
Expense Ratio       : 0.17
Turnover Rate       : 9
Costs per $10k      : 18
Advisory Fees       : 3,184
Number of Holdings  : 348

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis for muc..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Investor Shares,CRSP US Large Cap Value Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$9,940","$9,945","$10,180"
2,2015,"$9,984","$9,993","$10,192"
3,2015,"$9,257","$9,268","$9,451"
4,2015,"$9,897","$9,914","$10,044"
5,2016,"$10,055","$10,078","$10,136"
6,2016,"$10,434","$10,462","$10,401"
7,2016,"$10,748","$10,781","$10,862"
8,2016,"$11,554","$11,592","$11,312"
9,2017,"$11,926","$11,972","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Investor Shares,15.84%,9.80%,9.86%
1,CRSP US Large Cap Value Index,16.00%,9.93%,10.01%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.8%,
1,Consumer Discretionary,9.2%,
2,Consumer Staples,8.6%,
3,Energy,6.6%,
4,Financials,21.6%,
5,Health Care,15.5%,
6,Industrials,15.6%,
7,Real Estate,3.1%,
8,Technology,8.9%,
9,Telecommunications,3.4%,






### 🏦 Value Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007778Member
🎫 Ticker:          VTV
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    NYSE

--- 💰 Costs & Financials ---
Net Assets          : 183,569
Expense Ratio       : 0.04
Turnover Rate       : 9
Costs per $10k      : 4
Advisory Fees       : 3,184
Number of Holdings  : 348

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis for muc..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,ETF Shares Net Asset Value,CRSP US Large Cap Value Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$9,945","$9,945","$10,180"
2,2015,"$9,993","$9,993","$10,192"
3,2015,"$9,266","$9,268","$9,451"
4,2015,"$9,911","$9,914","$10,044"
5,2016,"$10,074","$10,078","$10,136"
6,2016,"$10,456","$10,462","$10,401"
7,2016,"$10,775","$10,781","$10,862"
8,2016,"$11,585","$11,592","$11,312"
9,2017,"$11,963","$11,972","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,ETF Shares Net Asset Value,16.00%,9.93%,10.00%
1,ETF Shares Market Price,15.94%,9.93%,10.00%
2,CRSP US Large Cap Value Index,16.00%,9.93%,10.01%
3,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.8%,
1,Consumer Discretionary,9.2%,
2,Consumer Staples,8.6%,
3,Energy,6.6%,
4,Financials,21.6%,
5,Health Care,15.5%,
6,Industrials,15.6%,
7,Real Estate,3.1%,
8,Technology,8.9%,
9,Telecommunications,3.4%,






### 🏦 Value Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007776Member
🎫 Ticker:          VVIAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 183,569
Expense Ratio       : 0.05
Turnover Rate       : 9
Costs per $10k      : 5
Advisory Fees       : 3,184
Number of Holdings  : 348

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Value Index. U.S. economic growth hovered around 3% on a year-over-year basis for muc..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Admiral Shares,CRSP US Large Cap Value Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$9,943","$9,945","$10,180"
2,2015,"$9,994","$9,993","$10,192"
3,2015,"$9,266","$9,268","$9,451"
4,2015,"$9,914","$9,914","$10,044"
5,2016,"$10,076","$10,078","$10,136"
6,2016,"$10,459","$10,462","$10,401"
7,2016,"$10,777","$10,781","$10,862"
8,2016,"$11,586","$11,592","$11,312"
9,2017,"$11,963","$11,972","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Admiral Shares,15.99%,9.93%,9.99%
1,CRSP US Large Cap Value Index,16.00%,9.93%,10.01%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.8%,
1,Consumer Discretionary,9.2%,
2,Consumer Staples,8.6%,
3,Energy,6.6%,
4,Financials,21.6%,
5,Health Care,15.5%,
6,Industrials,15.6%,
7,Real Estate,3.1%,
8,Technology,8.9%,
9,Telecommunications,3.4%,






### 🏦 Value Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007777Member
🎫 Ticker:          VIVIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 183,569
Expense Ratio       : 0.04
Turnover Rate       : 9
Costs per $10k      : 4
Advisory Fees       : 3,184
Number of Holdings  : 348

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Value Index.U.S. economic growth hovered around 3% on a year-over-year basis for muc..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Institutional Shares,CRSP US Large Cap Value Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$5,000,000","$5,000,000","$5,000,000"
1,2015,"$4,971,871","$4,972,417","$5,090,054"
2,2015,"$4,997,132","$4,996,694","$5,096,078"
3,2015,"$4,633,467","$4,633,912","$4,725,720"
4,2015,"$4,957,449","$4,957,090","$5,022,045"
5,2016,"$5,038,463","$5,039,202","$5,067,957"
6,2016,"$5,228,516","$5,231,204","$5,200,452"
7,2016,"$5,387,648","$5,390,397","$5,431,090"
8,2016,"$5,793,893","$5,796,215","$5,655,796"
9,2017,"$5,982,618","$5,986,133","$5,983,419"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Institutional Shares,15.98%,9.94%,10.00%
1,CRSP US Large Cap Value Index,16.00%,9.93%,10.01%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.8%,
1,Consumer Discretionary,9.2%,
2,Consumer Staples,8.6%,
3,Energy,6.6%,
4,Financials,21.6%,
5,Health Care,15.5%,
6,Industrials,15.6%,
7,Real Estate,3.1%,
8,Technology,8.9%,
9,Telecommunications,3.4%,






### 🏦 Growth Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007783Member
🎫 Ticker:          VIGRX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 284,976
Expense Ratio       : 0.17
Turnover Rate       : 11
Costs per $10k      : 20
Advisory Fees       : 4,355
Number of Holdings  : 183

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Growth Index.U.S. economic growth hovered around 3% on a year-over-year basis for mu..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Investor Shares,CRSP US Large Cap Growth Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,340","$10,346","$10,180"
2,2015,"$10,314","$10,325","$10,192"
3,2015,"$9,690","$9,705","$9,451"
4,2015,"$10,317","$10,338","$10,044"
5,2016,"$10,349","$10,376","$10,136"
6,2016,"$10,451","$10,482","$10,401"
7,2016,"$10,985","$11,021","$10,862"
8,2016,"$10,936","$10,975","$11,312"
9,2017,"$11,981","$12,031","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Investor Shares,32.50%,18.21%,15.61%
1,CRSP US Large Cap Growth Index,32.73%,18.41%,15.80%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,0.7%,
1,Consumer Discretionary,19.8%,
2,Consumer Staples,0.4%,
3,Energy,0.8%,
4,Financials,2.7%,
5,Health Care,5.7%,
6,Industrials,8.4%,
7,Real Estate,1.3%,
8,Technology,59.0%,
9,Telecommunications,0.9%,






### 🏦 Growth Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007786Member
🎫 Ticker:          VUG
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    NYSE

--- 💰 Costs & Financials ---
Net Assets          : 284,976
Expense Ratio       : 0.04
Turnover Rate       : 11
Costs per $10k      : 5
Advisory Fees       : 4,355
Number of Holdings  : 183

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Growth Index.U.S. economic growth hovered around 3% on a year-over-year basis for mu..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,ETF Shares Net Asset Value,CRSP US Large Cap Growth Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,344","$10,346","$10,180"
2,2015,"$10,321","$10,325","$10,192"
3,2015,"$9,701","$9,705","$9,451"
4,2015,"$10,332","$10,338","$10,044"
5,2016,"$10,367","$10,376","$10,136"
6,2016,"$10,473","$10,482","$10,401"
7,2016,"$11,010","$11,021","$10,862"
8,2016,"$10,965","$10,975","$11,312"
9,2017,"$12,018","$12,031","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,ETF Shares Net Asset Value,32.68%,18.36%,15.76%
1,ETF Shares Market Price,32.64%,18.37%,15.76%
2,CRSP US Large Cap Growth Index,32.73%,18.41%,15.80%
3,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,0.7%,
1,Consumer Discretionary,19.8%,
2,Consumer Staples,0.4%,
3,Energy,0.8%,
4,Financials,2.7%,
5,Health Care,5.7%,
6,Industrials,8.4%,
7,Real Estate,1.3%,
8,Technology,59.0%,
9,Telecommunications,0.9%,






### 🏦 Growth Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007784Member
🎫 Ticker:          VIGAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 284,976
Expense Ratio       : 0.05
Turnover Rate       : 11
Costs per $10k      : 6
Advisory Fees       : 4,355
Number of Holdings  : 183

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Growth Index.U.S. economic growth hovered around 3% on a year-over-year basis for mu..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Admiral Shares,CRSP US Large Cap Growth Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,344","$10,346","$10,180"
2,2015,"$10,320","$10,325","$10,192"
3,2015,"$9,699","$9,705","$9,451"
4,2015,"$10,330","$10,338","$10,044"
5,2016,"$10,365","$10,376","$10,136"
6,2016,"$10,470","$10,482","$10,401"
7,2016,"$11,009","$11,021","$10,862"
8,2016,"$10,963","$10,975","$11,312"
9,2017,"$12,014","$12,031","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Admiral Shares,32.66%,18.36%,15.75%
1,CRSP US Large Cap Growth Index,32.73%,18.41%,15.80%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,0.7%,
1,Consumer Discretionary,19.8%,
2,Consumer Staples,0.4%,
3,Energy,0.8%,
4,Financials,2.7%,
5,Health Care,5.7%,
6,Industrials,8.4%,
7,Real Estate,1.3%,
8,Technology,59.0%,
9,Telecommunications,0.9%,






### 🏦 Growth Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007785Member
🎫 Ticker:          VIGIX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 284,976
Expense Ratio       : 0.04
Turnover Rate       : 11
Costs per $10k      : 5
Advisory Fees       : 4,355
Number of Holdings  : 183

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Growth Index.U.S. economic growth hovered around 3% on a year-over-year basis for mu..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Institutional Shares,CRSP US Large Cap Growth Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$5,000,000","$5,000,000","$5,000,000"
1,2015,"$5,172,206","$5,173,192","$5,090,054"
2,2015,"$5,161,071","$5,162,318","$5,096,078"
3,2015,"$4,850,557","$4,852,558","$4,725,720"
4,2015,"$5,166,364","$5,169,146","$5,022,045"
5,2016,"$5,184,201","$5,187,759","$5,067,957"
6,2016,"$5,237,051","$5,240,785","$5,200,452"
7,2016,"$5,506,240","$5,510,594","$5,431,090"
8,2016,"$5,483,263","$5,487,362","$5,655,796"
9,2017,"$6,010,352","$6,015,412","$5,983,419"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Institutional Shares,32.68%,18.37%,15.76%
1,CRSP US Large Cap Growth Index,32.73%,18.41%,15.80%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,0.7%,
1,Consumer Discretionary,19.8%,
2,Consumer Staples,0.4%,
3,Energy,0.8%,
4,Financials,2.7%,
5,Health Care,5.7%,
6,Industrials,8.4%,
7,Real Estate,1.3%,
8,Technology,59.0%,
9,Telecommunications,0.9%,






### 🏦 Large-Cap Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007787Member
🎫 Ticker:          VLACX
🏷️ Share Class:     Investor Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 56,785
Expense Ratio       : 0.17
Turnover Rate       : 2
Costs per $10k      : 19
Advisory Fees       : 955
Number of Holdings  : 494

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Index.U.S. economic growth hovered around 3% on a year-over-year basis for much of t..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Investor Shares,CRSP US Large Cap Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,125","$10,131","$10,180"
2,2015,"$10,137","$10,147","$10,192"
3,2015,"$9,457","$9,470","$9,451"
4,2015,"$10,093","$10,111","$10,044"
5,2016,"$10,189","$10,217","$10,136"
6,2016,"$10,436","$10,471","$10,401"
7,2016,"$10,855","$10,894","$10,862"
8,2016,"$11,254","$11,298","$11,312"
9,2017,"$11,946","$11,998","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Investor Shares,24.95%,14.34%,12.87%
1,CRSP US Large Cap Index,25.15%,14.51%,13.05%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.2%,
1,Consumer Discretionary,15.1%,
2,Consumer Staples,4.0%,
3,Energy,3.2%,
4,Financials,10.8%,
5,Health Care,9.8%,
6,Industrials,11.3%,
7,Real Estate,2.0%,
8,Technology,38.0%,
9,Telecommunications,2.0%,






### 🏦 Large-Cap Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007790Member
🎫 Ticker:          VV
🏷️ Share Class:     ETF Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    NYSE

--- 💰 Costs & Financials ---
Net Assets          : 56,785
Expense Ratio       : 0.04
Turnover Rate       : 2
Costs per $10k      : 5
Advisory Fees       : 955
Number of Holdings  : 494

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Index.U.S. economic growth hovered around 3% on a year-over-year basis for much of t..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,ETF Shares Net Asset Value,CRSP US Large Cap Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,129","$10,131","$10,180"
2,2015,"$10,145","$10,147","$10,192"
3,2015,"$9,468","$9,470","$9,451"
4,2015,"$10,107","$10,111","$10,044"
5,2016,"$10,207","$10,217","$10,136"
6,2016,"$10,459","$10,471","$10,401"
7,2016,"$10,881","$10,894","$10,862"
8,2016,"$11,284","$11,298","$11,312"
9,2017,"$11,982","$11,998","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,ETF Shares Net Asset Value,25.12%,14.48%,13.02%
1,ETF Shares Market Price,25.05%,14.48%,13.01%
2,CRSP US Large Cap Index,25.15%,14.51%,13.05%
3,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.2%,
1,Consumer Discretionary,15.1%,
2,Consumer Staples,4.0%,
3,Energy,3.2%,
4,Financials,10.8%,
5,Health Care,9.8%,
6,Industrials,11.3%,
7,Real Estate,2.0%,
8,Technology,38.0%,
9,Telecommunications,2.0%,






### 🏦 Large-Cap Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007788Member
🎫 Ticker:          VLCAX
🏷️ Share Class:     Admiral Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 56,785
Expense Ratio       : 0.05
Turnover Rate       : 2
Costs per $10k      : 6
Advisory Fees       : 955
Number of Holdings  : 494

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Index.U.S. economic growth hovered around 3% on a year-over-year basis for much of t..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Admiral Shares,CRSP US Large Cap Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$10,000","$10,000","$10,000"
1,2015,"$10,131","$10,131","$10,180"
2,2015,"$10,146","$10,147","$10,192"
3,2015,"$9,468","$9,470","$9,451"
4,2015,"$10,107","$10,111","$10,044"
5,2016,"$10,206","$10,217","$10,136"
6,2016,"$10,457","$10,471","$10,401"
7,2016,"$10,881","$10,894","$10,862"
8,2016,"$11,284","$11,298","$11,312"
9,2017,"$11,981","$11,998","$11,967"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Admiral Shares,25.10%,14.47%,13.01%
1,CRSP US Large Cap Index,25.15%,14.51%,13.05%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.2%,
1,Consumer Discretionary,15.1%,
2,Consumer Staples,4.0%,
3,Energy,3.2%,
4,Financials,10.8%,
5,Health Care,9.8%,
6,Industrials,11.3%,
7,Real Estate,2.0%,
8,Technology,38.0%,
9,Telecommunications,2.0%,






### 🏦 Large-Cap Index Fund

🆔 Context ID:      From2024-01-01to2024-12-31_C000007789Member
🎫 Ticker:          VLISX
🏷️ Share Class:     Institutional Shares
📅 Report Date:     December 31, 2024
🏛️ Sec Exchange:    N/A

--- 💰 Costs & Financials ---
Net Assets          : 56,785
Expense Ratio       : 0.04
Turnover Rate       : 2
Costs per $10k      : 5
Advisory Fees       : 955
Number of Holdings  : 494

📝 Commentary: "How did the Fund perform during the reporting period? For the 12 months ended December 31, 2024, the Fund performed in line with its benchmark, the CRSP US Large Cap Index.U.S. economic growth hovered around 3% on a year-over-year basis for much of t..."


**📈 Performance History**

Unnamed: 0.1,Unnamed: 0,Institutional Shares,CRSP US Large Cap Index,Dow Jones U.S. Total Stock Market Float Adjusted Index
0,2014,"$5,000,000","$5,000,000","$5,000,000"
1,2015,"$5,064,717","$5,065,516","$5,090,054"
2,2015,"$5,072,527","$5,073,505","$5,096,078"
3,2015,"$4,733,990","$4,735,147","$4,725,720"
4,2015,"$5,053,400","$5,055,301","$5,022,045"
5,2016,"$5,103,277","$5,108,683","$5,067,957"
6,2016,"$5,229,161","$5,235,336","$5,200,452"
7,2016,"$5,440,308","$5,446,752","$5,431,090"
8,2016,"$5,642,490","$5,649,147","$5,655,796"
9,2017,"$5,991,191","$5,998,908","$5,983,419"


**📊 Average Annual Returns**

Unnamed: 0.1,Unnamed: 0,1 Year,5 Years,10 Years
0,Institutional Shares,25.12%,14.49%,13.02%
1,CRSP US Large Cap Index,25.15%,14.51%,13.05%
2,Dow Jones U.S. Total Stock Market Float Adjust...,23.88%,13.78%,12.48%


**🏗️ Sector Allocation**

Unnamed: 0,0,1,2
0,Basic Materials,1.2%,
1,Consumer Discretionary,15.1%,
2,Consumer Staples,4.0%,
3,Energy,3.2%,
4,Financials,10.8%,
5,Health Care,9.8%,
6,Industrials,11.3%,
7,Real Estate,2.0%,
8,Technology,38.0%,
9,Telecommunications,2.0%,






In [19]:
from src.simple_rag.models.fund import FinancialHighlights
import pandas as pd
from IPython.display import display

print(len(df_performance))

# Concatenate every parsed highlights table instead of indexing each one by hand
total_df = pd.concat(df_performance, ignore_index=True)

returns_lookup = total_df.copy()

display(returns_lookup['fund_name'].unique())

numeric_columns = ['portfolio_turnover', 'expense_ratio', 'net_assets', 
                   'nav_beginning', 'nav_end', 'net_income_ratio', 'distribution_shares']
for col in numeric_columns:
    if col in returns_lookup.columns:
        # Strip formatting characters literally (regex=False keeps '$' from being
        # treated as a regex anchor on older pandas), then map placeholders to 0
        returns_lookup[f'{col}_clean'] = (
            returns_lookup[col]
            .astype(str)
            .str.replace('%', '', regex=False)
            .str.replace('$', '', regex=False)
            .str.replace(',', '', regex=False)
            .replace('N/A', '0')
            .replace('', '0')
            .replace('None', '0')
            .astype(float)
        )
count = 0
# Now you can efficiently match and update your funds
for fund_obj in funds_total:
    print(f"\nProcessing fund object: {fund_obj.name} - {fund_obj.share_class}")
    
    # Initialize annual returns
    if not hasattr(fund_obj, 'annual_returns') or fund_obj.annual_returns is None:
        fund_obj.annual_returns = {}
    
    # Clean the name: remove "Vanguard" and strip whitespace
    name = fund_obj.name.replace("Vanguard", "").strip()
    print(f"Cleaned name: '{name}'")
    
    # Find matching rows based on fund name
    name_matches = returns_lookup[returns_lookup['fund_name'].str.strip().str.lower() == name.lower()]
    if len(name_matches) == 0:
        print("  No name matches found for ticker: ", fund_obj.ticker)

        continue
    
    print(f"  Found {len(name_matches)} name matches")
    
    # Clean share class (remove trademark symbol)
    share_class = fund_obj.share_class
    if "™" in share_class:
        share_class = share_class.replace("™", "")
    
    # Now match share class
    share_class_matches = name_matches[
        name_matches['share_class'].str.contains(share_class, case=False, na=False, regex=False)]
    
    if len(share_class_matches) == 0:
        print(f"  No share class matches found for '{share_class}' ticker: ", fund_obj.ticker)
        print(f"  Available share classes: {name_matches['share_class'].unique()}")
        continue
    
    
    print(f"  Found {len(share_class_matches)} matching records")
    count += 1
    # Add all matching returns
    for _, row in share_class_matches.iterrows():
        year = str(row['year'])
        
       
        highlights = FinancialHighlights(
            turnover=row.get('portfolio_turnover_clean', 0),
            expense_ratio=row.get('expense_ratio_clean', 0),
            total_return=row['total_return'],
            net_assets=row.get('net_assets_clean', 0),
            net_assets_value_begining=row.get('nav_beginning_clean', 0),
            net_assets_value_end=row.get('nav_end_clean', 0),
            net_income_ratio=row.get('net_income_ratio_clean', 0.0)
        )
        
        fund_obj.financial_highlights[year] = highlights
        print(f"  {year}: Total Return = {highlights.total_return}%, Expense Ratio = {highlights.expense_ratio}%, Net Assets = {highlights.net_assets}, Net Income Ratio = {highlights.net_income_ratio}, Turnover = {highlights.turnover}, Net Assets Value Begining = {highlights.net_assets_value_begining}, Net Assets Value End = {highlights.net_assets_value_end}")
print("count: ",count)
print("Total funds: ",len(funds_total))
    

2


array(['Small-Cap Index Fund', 'Small-Cap Growth Index Fund',
       'Small-Cap Value Index Fund', 'Extended Market Index Fund',
       'Mid-Cap Index Fund', 'Mid-Cap Growth Index Fund',
       'Mid-Cap Value Index Fund', 'Total Stock Market Index Fund',
       '500 Index Fund', 'Growth Index Fund', 'Value Index Fund',
       'Large-Cap Index Fund'], dtype=object)


Processing fund object: Vanguard Extended Market Index Fund - Investor Shares
Cleaned name: 'Extended Market Index Fund'
  Found 30 name matches
  Found 5 matching records
  2024: Total Return = 16.76%, Expense Ratio = 0.19%, Net Assets = 195.0, Net Income Ratio = 1.09, Turnover = 11.0, Net Assets Value Begining = 124.78, Net Assets Value End = 144.2
  2023: Total Return = 25.22%, Expense Ratio = 0.19%, Net Assets = 232.0, Net Income Ratio = 1.28, Turnover = 11.0, Net Assets Value Begining = 100.93, Net Assets Value End = 124.78
  2022: Total Return = -26.56%, Expense Ratio = 0.19%, Net Assets = 229.0, Net Income Ratio = 1.14, Turnover = 11.0, Net Assets Value Begining = 138.8, Net Assets Value End = 100.93
  2021: Total Return = 12.31%, Expense Ratio = 0.19%, Net Assets = 399.0, Net Income Ratio = 0.87, Turnover = 19.0, Net Assets Value Begining = 124.83, Net Assets Value End = 138.8
  2020: Total Return = 32.04%, Expense Ratio = 0.19%, Net Assets = 454.0, Net Income Ratio = 1.04, Tu
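
The matching step in the cell above (clean the fund name, then narrow by share class) can be sketched as a small reusable helper. This is only an illustration under stated assumptions: `normalize_label`, `match_highlights`, and the `demo` rows are hypothetical and not part of `simple_rag`. It also compares share classes by exact equality after normalization, whereas the notebook uses a substring test (`str.contains`), so it may behave differently on the real tables.

```python
import pandas as pd


def normalize_label(text: str) -> str:
    """Drop the 'Vanguard' prefix and trademark symbols, then lowercase."""
    for token in ("Vanguard", "™", "®"):
        text = text.replace(token, "")
    # Collapse any whitespace left behind by the removals
    return " ".join(text.split()).lower()


def match_highlights(lookup: pd.DataFrame, fund_name: str, share_class: str) -> pd.DataFrame:
    """Return the lookup rows for one fund name and share class."""
    names = lookup["fund_name"].fillna("").map(normalize_label)
    by_name = lookup[names == normalize_label(fund_name)]
    classes = by_name["share_class"].fillna("").map(normalize_label)
    return by_name[classes == normalize_label(share_class)]


# Made-up rows, only to exercise the helper (hypothetical data).
demo = pd.DataFrame(
    {
        "fund_name": ["Growth Index Fund", "Growth Index Fund", "Value Index Fund"],
        "share_class": ["Admiral Shares", "ETF Shares", "Admiral Shares"],
        "year": [2024, 2024, 2024],
    }
)
matched = match_highlights(demo, "Vanguard Growth Index Fund", "Admiral™ Shares")
print(matched)
```

Normalizing both sides with the same function keeps the comparison symmetric, so a trademark symbol on either the fund object or the table does not silently prevent a match.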

In [20]:
import sys
%reload_ext autoreload
from simple_rag.extraction.parser import compute_annual_returns

for fund in funds_total:
    if fund.ticker in performance_funds:
        returns = compute_annual_returns(fund.performance_table)
        print("\nFinal Annual Returns:")
        fund.annual_returns = returns
        print(f"  {fund.ticker}: {returns}")
        for year, return_ in returns.items():
            print(fund.financial_highlights.keys())
            if year not in fund.financial_highlights:
                # No reported highlights for this year: store a placeholder that
                # carries only the computed total return
                new_highlight = FinancialHighlights(
                    year=int(year),
                    total_return=return_,
                    turnover=0.0,
                    expense_ratio=0.0,
                    net_assets=0.0,
                    net_assets_value_begining=0.0,
                    net_assets_value_end=0.0,
                    net_income_ratio=0.0
                )
                fund.financial_highlights[year] = new_highlight
                print(f"    {year}: {new_highlight}")

Detected format: Year (YYYY)
Found years: [np.int64(2014), np.int64(2015), np.int64(2016), np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023), np.int64(2024)]
  2015 Return: $10,000.00 -> $10,125.00 = 1.25%
  2016 Return: $10,125.00 -> $11,321.00 = 11.81%
  2017 Return: $11,321.00 -> $13,774.00 = 21.67%
  2018 Return: $13,774.00 -> $13,151.00 = -4.52%
  2019 Return: $13,151.00 -> $17,271.00 = 31.33%
  2020 Return: $17,271.00 -> $20,423.00 = 18.25%
  2021 Return: $20,423.00 -> $26,250.00 = 28.53%
  2022 Return: $26,250.00 -> $21,465.00 = -18.23%
  2023 Return: $21,465.00 -> $27,069.00 = 26.11%
  2024 Return: $27,069.00 -> $33,794.00 = 24.84%

Final Annual Returns:
  VFINX: {'2015': 1.25, '2016': 11.81, '2017': 21.67, '2018': -4.52, '2019': 31.33, '2020': 18.25, '2021': 28.53, '2022': -18.23, '2023': 26.11, '2024': 24.84}
dict_keys(['2024', '2023', '2022', '2021', '2020'])
    2015: turnover=0.0 expense_ratio=0.0 total_return=1.

  df['parsed_date'] = pd.to_datetime(df[date_col], errors='coerce')
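
The log above suggests that each annual return is the percent change between consecutive year-end values of the growth table. A minimal sketch of that arithmetic follows. It is not the actual `compute_annual_returns` from `simple_rag`, whose implementation is not shown here; `annual_returns_from_growth` is a hypothetical name, and the input values are copied from the first lines of the log.

```python
def annual_returns_from_growth(year_end_values: dict[int, float]) -> dict[str, float]:
    """Percent change between consecutive year-end values, keyed by year."""
    years = sorted(year_end_values)
    returns = {}
    for prev, curr in zip(years, years[1:]):
        change = year_end_values[curr] / year_end_values[prev] - 1.0
        returns[str(curr)] = round(change * 100, 2)
    return returns


# Year-end values taken from the first rows of the log above.
values = {2014: 10_000.0, 2015: 10_125.0, 2016: 11_321.0, 2017: 13_774.0}
print(annual_returns_from_growth(values))
# → {'2015': 1.25, '2016': 11.81, '2017': 21.67}
```

The keys are strings so they line up with the string year keys used in `fund.financial_highlights`.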


In [21]:
for fund in funds_total:
    # Remove thousands separators from fields that were parsed as strings
    if isinstance(fund.net_assets, str):
        fund.net_assets = fund.net_assets.replace(",", "")
    if isinstance(fund.advisory_fees, str):
        fund.advisory_fees = fund.advisory_fees.replace(",", "")
    if isinstance(fund.n_holdings, str):
        fund.n_holdings = fund.n_holdings.replace(",", "")

    # Cast each field to its numeric type
    fund.n_holdings = int(fund.n_holdings)
    fund.turnover_rate = int(fund.turnover_rate)
    fund.expense_ratio = float(fund.expense_ratio)
    fund.net_assets = float(fund.net_assets)
    fund.advisory_fees = float(fund.advisory_fees)
    
    for prop in vars(fund):
        value = getattr(fund, prop)
        print(f"{prop}: {value} (type: {type(value).__name__})")

name: Vanguard Extended Market Index Fund (type: str)
registrant: Vanguard Index Funds (type: str)
context_id: FY2024_C000007779Member (type: str)
share_class: Investor Shares (type: ShareClassType)
ticker: VEXMX (type: str)
security_exchange: N/A (type: str)
costs_per_10k: 21 (type: str)
expense_ratio: 0.19 (type: float)
net_assets: 111156.0 (type: float)
turnover_rate: 11 (type: int)
advisory_fees: 1799.0 (type: float)
n_holdings: 3485 (type: int)
report_date: December 31, 2024 (type: str)
annual_returns: {} (type: dict)
performance: {} (type: dict)
avg_annual_returns:                                                    0       1        2  \
0                       Average Annual Total Returns     NaN      NaN   
1                                                NaN  1 Year  5 Years   
2                                    Investor Shares  16.76%    9.75%   
3                               S&P Completion Index  16.88%    9.77%   
4  Dow Jones U.S. Total Stock Market Float Adjust...  23.
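
The casts in the cell above call `int()` and `float()` directly, which raises `ValueError` if a field holds a placeholder such as `N/A`. One possible way to make the coercion tolerant is sketched below. `parse_number` is a hypothetical helper, not something defined in `simple_rag`, and the sample inputs are illustrative.

```python
from typing import Optional


def parse_number(raw: object) -> Optional[float]:
    """Convert values such as '3,184', '0.05%' or 9 to float; None if not numeric."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    # Drop the formatting characters that appear in the parsed report fields
    cleaned = str(raw).strip().replace(",", "").replace("$", "").replace("%", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_number("111,156"))
print(parse_number("N/A"))
```

Returning `None` rather than `0` keeps a missing value distinguishable from a genuine zero, at the cost of requiring callers to handle `None`.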

## Vanguard World Fund

In [22]:
import pandas as pd
from io import StringIO
import sys
from pathlib import Path
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))
%reload_ext autoreload
from src.simple_rag.extraction.parser import BlackRockFiling
from edgar import set_identity, Company


set_identity("luis.alvarez.conde@alumnos.upm.es")

ticker = "MGK"
fund = Company(ticker)
all_filings = fund.get_filings(form="N-CSR")


if all_filings:
    # 1. Find the most recent report date in the entire history (e.g., "2024-12-31"),
    #    skipping filings that have no report_date
    latest_date_str = max(f.report_date for f in all_filings if f.report_date)
    
    # 2. Extract just the year (e.g., "2024")
    target_year = latest_date_str[:4]
    
    # 3. Keep every filing whose report_date starts with that year.
    #    This captures the March, June, and December reports for that fiscal year
    latest_filings = [
        f for f in all_filings 
        if f.report_date and f.report_date.startswith(target_year)
    ]

    # 4. Additionally append the single most recent filing dated 2024.
    #    Note that target_year is overwritten here, so the message below
    #    reports "2024" even though the list also holds the latest year's filings
    target_year = "2024"
    filings2 = sorted(
        [f for f in all_filings if f.report_date and f.report_date.startswith(target_year)],
        key=lambda f: f.report_date,
        reverse=True
    )

    latest_filings.append(filings2[0])
    print("Found filings: ", len(latest_filings), "for year: ", target_year)



performance_funds = []
df_performance = []
world_funds = set()

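# Set when a filing repeats an already-seen ticker. The flag is never reset,
# so the highlights of every filing after the first repeat are skipped as well.
abort = False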
for filing in latest_filings:

    html_content = filing.html()
    
    parser = BlackRockFiling(html_content)
    funds = parser.get_funds()
    count = 0
    for fund in funds:
        if fund.performance_table is not None:
            if fund.ticker not in performance_funds:
                performance_funds.append(fund.ticker)
                count += 1
        if fund.ticker not in world_funds:
            world_funds.add(fund.ticker)
        else:
            print("Exiting filing, repeated ticker found: ", fund.ticker)
            abort = True
            break
    if not abort:

        df_performance.append(parser.get_financial_highlights())
        print(count)
        print("Adding funds: ", len(funds))
        funds_total.extend(funds)

print("Total world funds added: ", len(world_funds))
print(len(performance_funds))
print(performance_funds)

print(len(df_performance))


Found filings:  4 for year:  2024
Processing: Mega Cap Growth Index Fund
Extracting context:  From2024-10-01to2025-09-30_C000055216Member
Processing: Mega Cap Growth Index Fund
Extracting context:  From2024-10-01to2025-09-30_C000055215Member
Tag not found:  dei:SecurityExchangeName From2024-10-01to2025-09-30_C000055215Member
2
Adding funds:  2
Processing: Vanguard Extended Duration Treasury Index Fund
Extracting context:  FY2025_C000051981Member
Tag not found:  dei:SecurityExchangeName FY2025_C000051981Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: Vanguard Extended Duration Treasury Index Fund
Extracting context:  FY2025_C000051979Member
Tag not found:  dei:SecurityExchangeName FY2025_C000051979Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: Vanguard ESG U.S. Stock ETF
Extracting context:  FY2025_C000
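
The filing selection in the cell above (all filings from the latest report year, plus the newest one from a given earlier year) can be tried in isolation without calling EDGAR. The sketch below uses a hypothetical `FilingStub` dataclass in place of the real filing objects and made-up report dates. It assumes, as the notebook does, that `report_date` is an ISO date string.

```python
from dataclasses import dataclass


@dataclass
class FilingStub:
    """Hypothetical stand-in for a filing; only report_date matters here."""
    accession: str
    report_date: str  # ISO date string, e.g. "2025-08-31"


def filings_for_latest_year(filings: list[FilingStub]) -> list[FilingStub]:
    """Keep every filing whose report_date falls in the most recent year."""
    dated = [f for f in filings if f.report_date]
    if not dated:
        return []
    latest_year = max(f.report_date for f in dated)[:4]
    return [f for f in dated if f.report_date.startswith(latest_year)]


def newest_in_year(filings: list[FilingStub], year: str) -> list[FilingStub]:
    """Return at most one filing: the newest whose report_date is in `year`."""
    in_year = [f for f in filings if f.report_date and f.report_date.startswith(year)]
    return sorted(in_year, key=lambda f: f.report_date, reverse=True)[:1]


# Made-up filings, only to exercise the helpers.
stubs = [
    FilingStub("a", "2025-08-31"),
    FilingStub("b", "2025-09-30"),
    FilingStub("c", "2024-12-31"),
    FilingStub("d", "2024-08-31"),
    FilingStub("e", ""),
]
selected = filings_for_latest_year(stubs) + newest_in_year(stubs, "2024")
print([f.accession for f in selected])
# → ['a', 'b', 'c']
```

Returning a list of at most one element from `newest_in_year` avoids the `IndexError` that indexing `[0]` would raise when no filing exists for the requested year.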

In [23]:
from src.simple_rag.models.fund import FinancialHighlights
import pandas as pd
from IPython.display import display

print(len(df_performance))

# Concatenate every parsed highlights table instead of indexing each one by hand
total_df = pd.concat(df_performance, ignore_index=True)

returns_lookup = total_df.copy()

# Preview the combined table and list the fund names it covers
print(returns_lookup.head())
display(returns_lookup['fund_name'].unique())

numeric_columns = ['portfolio_turnover', 'expense_ratio', 'net_assets', 
                   'nav_beginning', 'nav_end', 'net_income_ratio', 'distribution_shares']


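for col in numeric_columns:
    if col in returns_lookup.columns:
        try:
            # Strip formatting characters literally (regex=False keeps '$' from being
            # treated as a regex anchor on older pandas), then map placeholders to 0
            returns_lookup[f'{col}_clean'] = (
                returns_lookup[col]
                .astype(str)
                .str.replace('%', '', regex=False)
                .str.replace('$', '', regex=False)
                .str.replace(',', '', regex=False)
                .replace('N/A', '0')
                .replace('', '0')
                .replace('None', '0')
                .astype(float)
            )
        except Exception as e:
            # Show the raw column so the offending value can be identified
            print(f"Error cleaning column '{col}': {str(e)}")
            print(returns_lookup[col].to_string())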
count = 0
# Now you can efficiently match and update your funds
for fund_obj in funds_total:
    print(f"\nProcessing fund object: {fund_obj.name} - {fund_obj.share_class}")
    
    # Initialize annual returns
    if not hasattr(fund_obj, 'annual_returns') or fund_obj.annual_returns is None:
        fund_obj.annual_returns = {}
    
    # Clean the name: remove "Vanguard" and any trademark symbols, then strip whitespace.
    # Both symbols are removed unconditionally, so a name containing both is handled
    name = fund_obj.name.replace("Vanguard", "").replace("™", "").replace("®", "").strip()
    print(f"Cleaned name: '{name}'")
    
    # Find matching rows based on fund name
    name_matches = returns_lookup[returns_lookup['fund_name'].str.strip().str.lower() == name.lower()]
    if len(name_matches) == 0:
        print("  No name matches found for ticker: ", fund_obj.ticker)
        continue
    
    print(f"  Found {len(name_matches)} name matches")
    
    # Clean share class (remove trademark symbol)
    share_class = fund_obj.share_class
    if "™" in share_class:
        share_class = share_class.replace("™", "")
    
    # Now match share class
    share_class_matches = name_matches[
        name_matches['share_class'].str.contains(share_class, case=False, na=False, regex=False)]
    
    if len(share_class_matches) == 0:
        print(f"  No share class matches found for '{share_class}' ticker: ", fund_obj.ticker)
        print(f"  Available share classes: {name_matches['share_class'].unique()}")
        continue
    elif len(share_class_matches) > 5:
        print("  More than 5 share class matches found:")
        print(share_class_matches)
    
    
    print(f"  Found {len(share_class_matches)} matching records")
    count += 1
    # Add all matching returns
    for _, row in share_class_matches.iterrows():
        year = str(row['year'])
        
       
        highlights = FinancialHighlights(
            turnover=row.get('portfolio_turnover_clean', 0),
            expense_ratio=row.get('expense_ratio_clean', 0),
            total_return=row['total_return'],
            net_assets=row.get('net_assets_clean', 0),
            net_assets_value_begining=row.get('nav_beginning_clean', 0),
            net_assets_value_end=row.get('nav_end_clean', 0),
            net_income_ratio=row.get('net_income_ratio_clean', 0.0)
        )
        
        fund_obj.financial_highlights[year] = highlights
        print(f"  {year}: Total Return = {highlights.total_return}%, Expense Ratio = {highlights.expense_ratio}%, Net Assets = {highlights.net_assets}, Net Income Ratio = {highlights.net_income_ratio}, Turnover = {highlights.turnover}, Net Assets Value Begining = {highlights.net_assets_value_begining}, Net Assets Value End = {highlights.net_assets_value_end}")
print("count: ",count)
print("Total world funds: ", len(world_funds))
    

3
                    fund_name share_class  year  net_assets  nav_beginning  \
0  Mega Cap Growth Index Fund  ETF Shares  2025     31195.0         321.87   
1  Mega Cap Growth Index Fund  ETF Shares  2024     22954.0         314.83   
2  Mega Cap Growth Index Fund  ETF Shares  2024     21996.0         241.25   
3  Mega Cap Growth Index Fund  ETF Shares  2023     14376.0         195.20   
4  Mega Cap Growth Index Fund  ETF Shares  2022     11168.0         248.50   

   nav_end  total_return  expense_ratio  net_income_ratio  portfolio_turnover  \
0   402.45         25.58           0.07              0.42                14.0   
1   321.87          2.35           0.07              0.40                 6.0   
2   314.83         31.16           0.07              0.51                14.0   
3   241.25         24.39           0.07              0.62                 7.0   
4   195.20        -21.08           0.07              0.51                 5.0   

  distribution_shares  
0                N

array(['Mega Cap Growth Index Fund',
       'Extended Duration Treasury Index Fund', 'ESG U.S. Stock ETF',
       'ESG International Stock ETF', 'Global Wellington Fund',
       'Global Wellesley Income Fund', 'ESG U.S. Corporate Bond ETF',
       'U.S. Growth Fund', 'International Growth Fund',
       'FTSE Social Index Fund', 'Communication Services Index Fund',
       'Consumer Discretionary Index Fund', 'Consumer Staples Index Fund',
       'Energy Index Fund', 'Financials Index Fund',
       'Health Care Index Fund', 'Industrials Index Fund',
       'Information Technology Index Fund', 'Materials Index Fund',
       'Utilities Index Fund', 'Mega Cap Index Fund',
       'Mega Cap Value Index Fund'], dtype=object)


Processing fund object: Vanguard Extended Market Index Fund - Investor Shares
Cleaned name: 'Extended Market Index Fund'
  No name matches found for ticker:  VEXMX

Processing fund object: Vanguard Extended Market Index Fund - ETF Shares
Cleaned name: 'Extended Market Index Fund'
  No name matches found for ticker:  VXF

Processing fund object: Vanguard Extended Market Index Fund - Admiral Shares
Cleaned name: 'Extended Market Index Fund'
  No name matches found for ticker:  VEXAX

Processing fund object: Vanguard Extended Market Index Fund - Institutional Shares
Cleaned name: 'Extended Market Index Fund'
  No name matches found for ticker:  VIEIX

Processing fund object: Vanguard Extended Market Index Fund - Institutional Plus Shares
Cleaned name: 'Extended Market Index Fund'
  No name matches found for ticker:  VEMPX

Processing fund object: Vanguard Extended Market Index Fund - Institutional Select Shares
Cleaned name: 'Extended Market Index Fund'
  No name matches found for ticker

In [24]:
import sys
%reload_ext autoreload
sys.path.append('../src')


from simple_rag.extraction.parser import compute_annual_returns

for fund in funds_total:
    if fund.ticker in performance_funds:
        returns = compute_annual_returns(fund.performance_table)
        print("\nFinal Annual Returns:")
        fund.annual_returns = returns
        print(f"  {fund.ticker}: {returns}")
        for year, return_ in returns.items():
            print(fund.financial_highlights.keys())
            if year not in fund.financial_highlights.keys():
                new_highlight = FinancialHighlights(
                    year=int(year),
                    total_return=return_,
                    turnover=0.0,
                    expense_ratio=0.0,
                    net_assets=0.0,
                    net_assets_value_begining=0.0,
                    net_assets_value_end=0.0,
                    net_income_ratio=0.0
                )
                fund.financial_highlights[year] = new_highlight
                print(f"    {year}: {new_highlight}")

Detected format: Year (YYYY)
Found years: [np.int64(2015), np.int64(2016), np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023), np.int64(2024), np.int64(2025)]
  2016 Return: $10,724.00 -> $11,409.00 = 6.39%
  2017 Return: $11,409.00 -> $14,772.00 = 29.48%
  2018 Return: $14,772.00 -> $14,349.00 = -2.86%
  2019 Return: $14,349.00 -> $19,736.00 = 37.54%
  2020 Return: $19,736.00 -> $27,826.00 = 40.99%
  2021 Return: $27,826.00 -> $35,753.00 = 28.49%
  2022 Return: $35,753.00 -> $23,755.00 = -33.56%
  2023 Return: $23,755.00 -> $36,004.00 = 51.56%
  2024 Return: $36,004.00 -> $47,873.00 = 32.97%
  2025 Return: $47,873.00 -> $56,289.00 = 17.58%

Final Annual Returns:
  MGK: {'2016': 6.39, '2017': 29.48, '2018': -2.86, '2019': 37.54, '2020': 40.99, '2021': 28.49, '2022': -33.56, '2023': 51.56, '2024': 32.97, '2025': 17.58}
dict_keys(['2025', '2024', '2023', '2022', '2021'])
    2016: turnover=0.0 expense_ratio=0.0 total_return=6.39
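
The per-year figures printed above are consistent with a simple ratio of consecutive growth-of-$10,000 values. The following is a minimal sketch of that arithmetic, assuming `compute_annual_returns` works roughly this way; its real implementation lives in `simple_rag.extraction.parser` and is not shown in this notebook. The name `annual_returns_from_growth` and the plain-dict input are hypothetical.

```python
def annual_returns_from_growth(values_by_year):
    """Derive percentage returns from year-end values of a hypothetical
    growth-of-$10,000 series (illustrative helper, not the project's parser)."""
    years = sorted(values_by_year)
    returns = {}
    # Pair each year with the following one: return = end / start - 1
    for prev_year, year in zip(years, years[1:]):
        start = values_by_year[prev_year]
        end = values_by_year[year]
        returns[str(year)] = round((end / start - 1) * 100, 2)
    return returns


# First rows of the MGK output above
growth = {2015: 10724.0, 2016: 11409.0, 2017: 14772.0, 2018: 14349.0}
print(annual_returns_from_growth(growth))
# {'2016': 6.39, '2017': 29.48, '2018': -2.86}
```

The first year in the series only serves as a base value, which is why the notebook output lists 2015 among the detected years but reports returns starting in 2016.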

In [25]:
for fund in funds_total:
    
    if isinstance(fund.net_assets, str):
        fund.net_assets = fund.net_assets.replace(",", "")
    if isinstance(fund.advisory_fees, str):
        fund.advisory_fees = fund.advisory_fees.replace(",", "")
    if isinstance(fund.n_holdings, str):
        fund.n_holdings = fund.n_holdings.replace(",", "")

    fund.n_holdings = int(fund.n_holdings)
    fund.turnover_rate = int(fund.turnover_rate)
    fund.expense_ratio = float(fund.expense_ratio)
    fund.net_assets = float(fund.net_assets) 
    fund.advisory_fees = float(fund.advisory_fees) 
    for prop in vars(fund):
        value = getattr(fund, prop)
        print(f"{prop}: {value} (type: {type(value).__name__})")

name: Vanguard Extended Market Index Fund (type: str)
registrant: Vanguard Index Funds (type: str)
context_id: FY2024_C000007779Member (type: str)
share_class: Investor Shares (type: ShareClassType)
ticker: VEXMX (type: str)
security_exchange: N/A (type: str)
costs_per_10k: 21 (type: str)
expense_ratio: 0.19 (type: float)
net_assets: 111156.0 (type: float)
turnover_rate: 11 (type: int)
advisory_fees: 1799.0 (type: float)
n_holdings: 3485 (type: int)
report_date: December 31, 2024 (type: str)
annual_returns: {} (type: dict)
performance: {} (type: dict)
avg_annual_returns:                                                    0       1        2  \
0                       Average Annual Total Returns     NaN      NaN   
1                                                NaN  1 Year  5 Years   
2                                    Investor Shares  16.76%    9.75%   
3                               S&P Completion Index  16.88%    9.77%   
4  Dow Jones U.S. Total Stock Market Float Adjust...  23.
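
The conversions in the cell above raise a `ValueError` as soon as a field holds something like `N/A` or a value with a percent sign. A more defensive variant could look like the sketch below; `to_number` is a hypothetical helper, and the set of placeholder strings it treats as missing is an assumption rather than something taken from the parser.

```python
def to_number(value, cast=float, default=0.0):
    """Convert values such as '111,156', '0.19%' or 'N/A' to a number.

    Hypothetical helper: the placeholder strings handled here are assumptions.
    """
    if value is None:
        return default
    if isinstance(value, (int, float)):
        return cast(value)
    # Drop thousands separators and unit symbols before parsing
    text = str(value).replace(",", "").replace("%", "").replace("$", "").strip()
    if text in ("", "N/A", "None"):
        return default
    return cast(float(text))


print(to_number("111,156"))          # 111156.0
print(to_number("3,485", cast=int))  # 3485
print(to_number("N/A"))              # 0.0
```

Falling back to `0.0` keeps the loop running, but it also hides missing data, so returning `None` as the default may be the better choice when zeros are meaningful values.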

In [26]:
import pickle
from pathlib import Path
import sys

# Add RAG directory to path
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))

# Define pickle file path
PKL_PATH = Path("./funds_backup.pkl")

print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())

# Save to pickle file
try:
    with PKL_PATH.open("wb") as f:
        pickle.dump(funds_total, f)
    
    print(f"Successfully saved {len(funds_total)} funds to pickle file")
    print(f"File size: {PKL_PATH.stat().st_size / 1024:.2f} KB")
    
except Exception as e:
    print(f"Error saving to pickle file: {e}")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Successfully saved 93 funds to pickle file
File size: 394.20 KB
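
Because the same `funds_backup.pkl` file is overwritten after every fund family, an interrupted write could leave the only backup corrupted. One possible safeguard, sketched here with hypothetical helper names, is to write to a temporary file in the same directory and swap it into place only once the dump has finished.

```python
import os
import pickle
import tempfile
from pathlib import Path


def save_pickle_atomic(obj, path):
    """Write obj to a temp file next to path, then replace path in one step."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        os.replace(tmp_name, path)  # the old backup stays intact until here
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return path.stat().st_size


def load_pickle(path):
    """Read the backup again, for example to verify the round trip."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

In the notebook this would be called as `save_pickle_atomic(funds_total, PKL_PATH)`. Loading the file later still requires the fund classes to be importable, which is what the `sys.path` adjustment in the cell above takes care of.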


## Vanguard Specialized Funds

In [None]:
import pandas as pd
from io import StringIO

import sys
from pathlib import Path
from src.simple_rag.extraction.parser import BlackRockFiling
from edgar import set_identity, Company


set_identity("luis.alvarez.conde@alumnos.upm.es")

ticker = "VDIGX"
fund = Company(ticker)
all_filings = fund.get_filings(form="N-CSR")


if all_filings:
    # 1. Find the most recent date in the entire history (e.g., "2024-12-31")
    # Skip filings without a report_date so max() never compares None with str
    latest_date_str = max(f.report_date for f in all_filings if f.report_date)
    
    # 2. Extract just the YEAR (e.g., "2024")
    target_year = latest_date_str[:4]
    
    # 3. Filter: Keep ALL filings where the report_date starts with that year
    # This captures the March, June, and December reports for that fiscal year
    latest_filings = [
        f for f in all_filings 
        if f.report_date and f.report_date.startswith(target_year)
    ]
    
    print("Found filings: ", len(latest_filings), "for year: ", target_year)



performance_funds = []
specialized_funds = set()
df_performance = []
abort = False
for filing in latest_filings:

    html_content = filing.html()
    
    parser = BlackRockFiling(html_content)
    funds = parser.get_funds()
    count = 0
    for fund in funds:
        if fund.performance_table is not None:
            if fund.ticker not in performance_funds: 
                performance_funds.append(fund.ticker)
                count += 1
        if fund.ticker not in specialized_funds:
            specialized_funds.add(fund.ticker)
        else:
            print("Exiting filing, repeated ticker found: ", fund.ticker)
            abort = True
            break
    if abort:
        break
    df_performance.append(parser.get_financial_highlights())

    print(count)
    print("Adding funds: ", len(funds))
    funds_total.extend(funds)

print("Specialized funds: ", len(specialized_funds))
print(len(performance_funds))
print(performance_funds)
print(len(df_performance))



Found filings:  2 for year:  2025
Processing: Dividend Growth Fund
Extracting context:  From2024-02-01to2025-01-31_C000008004Member
Tag not found:  dei:SecurityExchangeName From2024-02-01to2025-01-31_C000008004Member
Processing: Energy Fund
Extracting context:  From2024-02-01to2025-01-31_C000008005Member
Tag not found:  dei:SecurityExchangeName From2024-02-01to2025-01-31_C000008005Member
Processing: Energy Fund
Extracting context:  From2024-02-01to2025-01-31_C000008006Member
Tag not found:  dei:SecurityExchangeName From2024-02-01to2025-01-31_C000008006Member
Processing: Health Care Fund
Extracting context:  From2024-02-01to2025-01-31_C000008007Member
Tag not found:  dei:SecurityExchangeName From2024-02-01to2025-01-31_C000008007Member
Processing: Health Care Fund
Extracting context:  From2024-02-01to2025-01-31_C000008008Member
Tag not found:  dei:SecurityExchangeName From2024-02-01to2025-01-31_C000008008Member
Processing: Dividend Appreciation Index Fund
Extracting context:  From2024-02
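
The filing selection in the cell above can be isolated and tested without network access. The sketch below assumes that `report_date` is either `None` or an ISO `YYYY-MM-DD` string, as the comments in the cell suggest; `FilingStub` and `filings_for_latest_year` are hypothetical stand-ins and not part of the `edgar` package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilingStub:
    """Stand-in for a filing object; only the attribute used here is modelled."""
    report_date: Optional[str]


def filings_for_latest_year(filings):
    """Return (year, filings) for the most recent report year, ignoring undated filings."""
    dated = [f for f in filings if f.report_date]
    if not dated:
        return None, []
    # ISO dates sort chronologically as strings, so max() finds the latest one
    target_year = max(f.report_date for f in dated)[:4]
    return target_year, [f for f in dated if f.report_date.startswith(target_year)]


filings = [FilingStub("2025-01-31"), FilingStub(None),
           FilingStub("2024-07-31"), FilingStub("2025-07-31")]
year, latest = filings_for_latest_year(filings)
print(year, len(latest))  # 2025 2
```

Returning an empty list when nothing is dated also avoids the `NameError` that the loop over `latest_filings` would hit if the `if all_filings:` branch were skipped.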

In [None]:
from src.simple_rag.models.fund import FinancialHighlights
import pandas as pd
from IPython.display import display

print(len(df_performance))

# Concatenate every extracted table instead of hardcoding the first two
total_df = pd.concat(df_performance, ignore_index=True)

returns_lookup = total_df.copy()

# Preview the combined table and list the fund names it contains
print(returns_lookup.head())
display(returns_lookup['fund_name'].unique())

numeric_columns = ['portfolio_turnover', 'expense_ratio', 'net_assets', 
                   'nav_beginning', 'nav_end', 'net_income_ratio', 'distribution_shares']


for col in numeric_columns:
    if col in returns_lookup.columns:
        if returns_lookup[col] is not None:  
            try:      
                returns_lookup[f'{col}_clean'] = (
                    returns_lookup[col]
                    .astype(str)
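                    # regex=False: '$' must be a literal character, not an end-of-string anchor
                    .str.replace('%', '', regex=False)
                    .str.replace('$', '', regex=False)
                    .str.replace(',', '', regex=False)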
                    .replace('N/A', '0')
                    .replace('', '0')
                    .replace('None', '0')
                    .astype(float)
                )
            except Exception as e:
                print(f"Error cleaning column '{col}': {str(e)}")
                print(returns_lookup[col].to_string())
count = 0
# Now you can efficiently match and update your funds
for fund_obj in funds_total:
    if fund_obj.ticker not in specialized_funds:
        continue
    print(f"\nProcessing fund object: {fund_obj.name} - {fund_obj.share_class}")
    
    # Initialize annual returns
    if not hasattr(fund_obj, 'annual_returns') or fund_obj.annual_returns is None:
        fund_obj.annual_returns = {}
    
    # Clean the name: remove "Vanguard" and strip whitespace
    name = fund_obj.name.replace("Vanguard", "").strip()
    print(f"Cleaned name: '{name}'")
    
    # Strip both trademark symbols unconditionally; the earlier if/elif chain
    # removed only one of them when a name contained both
    name = name.replace("™", "").replace("®", "")
    # Find matching rows based on fund name
    name_matches = returns_lookup[returns_lookup['fund_name'].str.strip().str.lower() == name.lower()]
    if len(name_matches) == 0:
        print("  No name matches found for ticker: ", fund_obj.ticker)
        continue
    
    print(f"  Found {len(name_matches)} name matches")
    
    # Clean share class (remove trademark symbol)
    share_class = fund_obj.share_class
    if "™" in share_class:
        share_class = share_class.replace("™", "")
    
    # Now match share class
    share_class_matches = name_matches[
        name_matches['share_class'].str.contains(share_class, case=False, na=False, regex=False)]
    
    if len(share_class_matches) == 0:
        print(f"  No share class matches found for '{share_class}' ticker: ", fund_obj.ticker)
        print(f"  Available share classes: {name_matches}")
        print(f"  Found {len(share_class_matches)} matching records")
        count += 1
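        # NOTE: share_class_matches is empty in this branch, so the loop below
        # never runs and no highlights are stored; iterating over name_matches
        # instead would fall back to the name-level rows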
        for _, row in share_class_matches.iterrows():
            year = str(row['year'])
            highlights = FinancialHighlights(
                turnover=row.get('portfolio_turnover_clean', 0),
                expense_ratio=row.get('expense_ratio_clean', 0),
                total_return=row['total_return'],
                net_assets=row.get('net_assets_clean', 0),
                net_assets_value_begining=row.get('nav_beginning_clean', 0),
                net_assets_value_end=row.get('nav_end_clean', 0),
                net_income_ratio=row.get('net_income_ratio_clean', 0.0)
            )
            
            fund_obj.financial_highlights[year] = highlights
            print(f"  {year}: Total Return = {highlights.total_return}%, Expense Ratio = {highlights.expense_ratio}%, Net Assets = {highlights.net_assets}, Net Income Ratio = {highlights.net_income_ratio}, Turnover = {highlights.turnover}, Net Assets Value Begining = {highlights.net_assets_value_begining}, Net Assets Value End = {highlights.net_assets_value_end}")
        continue
    elif len(share_class_matches) > 5:
        print("  More than 5 share class matches found:")
        print(share_class_matches)
    
    
    print(f"  Found {len(share_class_matches)} matching records")
    count += 1
    # Add all matching returns
    for _, row in share_class_matches.iterrows():
        year = str(row['year'])
        highlights = FinancialHighlights(
            turnover=row.get('portfolio_turnover_clean', 0),
            expense_ratio=row.get('expense_ratio_clean', 0),
            total_return=row['total_return'],
            net_assets=row.get('net_assets_clean', 0),
            net_assets_value_begining=row.get('nav_beginning_clean', 0),
            net_assets_value_end=row.get('nav_end_clean', 0),
            net_income_ratio=row.get('net_income_ratio_clean', 0.0)
        )
        
        fund_obj.financial_highlights[year] = highlights
        print(f"  {year}: Total Return = {highlights.total_return}%, Expense Ratio = {highlights.expense_ratio}%, Net Assets = {highlights.net_assets}, Net Income Ratio = {highlights.net_income_ratio}, Turnover = {highlights.turnover}, Net Assets Value Begining = {highlights.net_assets_value_begining}, Net Assets Value End = {highlights.net_assets_value_end}")
print("count: ",count)
print("Total funds: ",specialized_funds)
    

2
              fund_name share_class  year  net_assets  nav_beginning  nav_end  \
0  Dividend Growth Fund        None  2025     50424.0          37.76    37.14   
1  Dividend Growth Fund        None  2024     52553.0          35.42    37.76   
2  Dividend Growth Fund        None  2023     53452.0          37.85    35.42   
3  Dividend Growth Fund        None  2022     54186.0          31.82    37.85   
4  Dividend Growth Fund        None  2021     45099.0          30.63    31.82   

   total_return  expense_ratio  net_income_ratio  portfolio_turnover  \
0         10.20           0.22              1.68                16.0   
1          9.11           0.29              1.74                 9.0   
2         -0.76           0.30              1.68                11.0   
3         25.66           0.27              1.56                15.0   
4          7.03           0.26              1.85                15.0   

  distribution_shares  
0                None  
1                None  
2     

array(['Dividend Growth Fund', 'Energy Fund', 'Health Care Fund',
       'Dividend Appreciation Index Fund', 'Real Estate Index Fund',
       'Real Estate II Index Fund', 'Global Capital Cycles Fund',
       'Global ESG Select Stock Fund'], dtype=object)


Processing fund object: Dividend Growth Fund - Investor Shares
Cleaned name: 'Dividend Growth Fund'
  Found 5 name matches
  No share class matches found for 'Investor Shares' ticker:  VDIGX
  Available share classes:               fund_name share_class  year  net_assets  nav_beginning  nav_end  \
0  Dividend Growth Fund        None  2025     50424.0          37.76    37.14   
1  Dividend Growth Fund        None  2024     52553.0          35.42    37.76   
2  Dividend Growth Fund        None  2023     53452.0          37.85    35.42   
3  Dividend Growth Fund        None  2022     54186.0          31.82    37.85   
4  Dividend Growth Fund        None  2021     45099.0          30.63    31.82   

   total_return  expense_ratio  net_income_ratio  portfolio_turnover  \
0         10.20           0.22              1.68                16.0   
1          9.11           0.29              1.74                 9.0   
2         -0.76           0.30              1.68                11.0   
3     
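
The output above shows why the share-class filter finds nothing for VDIGX: the `share_class` column is `None` for single-class funds. A compact way to express "match on name, narrow by share class, fall back to the name-level rows" is sketched below over plain dictionaries rather than a DataFrame. `normalize_name` and `match_rows` are hypothetical helpers, and the share class is assumed to be available as plain text.

```python
def normalize_name(name):
    """Drop the 'Vanguard' prefix and trademark symbols, collapse whitespace, lowercase."""
    for token in ("Vanguard", "™", "®"):
        name = name.replace(token, "")
    # split()/join also turns embedded newlines into single spaces
    return " ".join(name.split()).lower()


def match_rows(rows, fund_name, share_class):
    """Return rows for the fund, narrowed by share class when that is possible."""
    target = normalize_name(fund_name)
    by_name = [r for r in rows if normalize_name(r["fund_name"]) == target]
    wanted = share_class.replace("™", "").strip().lower()
    by_class = [
        r for r in by_name
        if r.get("share_class") and wanted in r["share_class"].lower()
    ]
    # Fall back to the name-level rows when no share class is recorded
    return by_class or by_name


rows = [
    {"fund_name": "Dividend Growth Fund", "share_class": None, "year": 2025},
    {"fund_name": "Energy Fund", "share_class": "Investor Shares", "year": 2025},
    {"fund_name": "Energy Fund", "share_class": "Admiral Shares", "year": 2025},
]
print(len(match_rows(rows, "Vanguard Dividend Growth Fund", "Investor Shares")))  # 1
print(len(match_rows(rows, "Vanguard Energy Fund", "Admiral Shares")))            # 1
```

The fallback is only safe when a fund name maps to a single share class in the table; for multi-class funds it would attach every class's rows to the same object, so a stricter rule may be preferable there.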

In [34]:
import sys
%reload_ext autoreload
sys.path.append('../src')


from simple_rag.extraction.parser import compute_annual_returns

for fund in funds_total:
    if fund.ticker in performance_funds:
        returns = compute_annual_returns(fund.performance_table)
        print("\nFinal Annual Returns:")
        fund.annual_returns = returns
        print(f"  {fund.ticker}: {returns}")
        for year, return_ in returns.items():
            print(fund.financial_highlights.keys())
            if year not in fund.financial_highlights.keys():
                new_highlight = FinancialHighlights(
                    year=int(year),
                    total_return=return_,
                    turnover=0.0,
                    expense_ratio=0.0,
                    net_assets=0.0,
                    net_assets_value_begining=0.0,
                    net_assets_value_end=0.0,
                    net_income_ratio=0.0
                )
                fund.financial_highlights[year] = new_highlight
                print(f"    {year}: {new_highlight}")

Detected format: Year (YYYY)
Found years: [np.int64(2015), np.int64(2016), np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023), np.int64(2024), np.int64(2025)]
  2016 Return: $10,619.00 -> $10,874.00 = 2.40%
  2017 Return: $10,874.00 -> $12,860.00 = 18.26%
  2018 Return: $12,860.00 -> $14,143.00 = 9.98%
  2019 Return: $14,143.00 -> $16,879.00 = 19.35%
  2020 Return: $16,879.00 -> $17,545.00 = 3.95%
  2021 Return: $17,545.00 -> $23,646.00 = 34.77%
  2022 Return: $23,646.00 -> $22,953.00 = -2.93%
  2023 Return: $22,953.00 -> $23,134.00 = 0.79%
  2024 Return: $23,134.00 -> $28,121.00 = 21.56%
  2025 Return: $28,121.00 -> $28,555.00 = 1.54%

Final Annual Returns:
  VDIGX: {'2016': 2.4, '2017': 18.26, '2018': 9.98, '2019': 19.35, '2020': 3.95, '2021': 34.77, '2022': -2.93, '2023': 0.79, '2024': 21.56, '2025': 1.54}
dict_keys([])
    2016: turnover=0.0 expense_ratio=0.0 total_return=2.4 net_assets=0.0 net_assets_value_begining=0.0 ne

In [35]:
for fund in funds_total:
    
    if isinstance(fund.net_assets, str):
        fund.net_assets = fund.net_assets.replace(",", "")
    if isinstance(fund.advisory_fees, str):
        fund.advisory_fees = fund.advisory_fees.replace(",", "")
    if isinstance(fund.n_holdings, str):
        fund.n_holdings = fund.n_holdings.replace(",", "")

    fund.n_holdings = int(fund.n_holdings)
    fund.turnover_rate = int(fund.turnover_rate)
    fund.expense_ratio = float(fund.expense_ratio)
    fund.net_assets = float(fund.net_assets) 
    fund.advisory_fees = float(fund.advisory_fees) 
    for prop in vars(fund):
        value = getattr(fund, prop)
        print(f"{prop}: {value} (type: {type(value).__name__})")

name: Vanguard Extended Market Index Fund (type: str)
registrant: Vanguard Index Funds (type: str)
context_id: FY2024_C000007779Member (type: str)
share_class: Investor Shares (type: ShareClassType)
ticker: VEXMX (type: str)
security_exchange: N/A (type: str)
costs_per_10k: 21 (type: str)
expense_ratio: 0.19 (type: float)
net_assets: 111156.0 (type: float)
turnover_rate: 11 (type: int)
advisory_fees: 1799.0 (type: float)
n_holdings: 3485 (type: int)
report_date: December 31, 2024 (type: str)
annual_returns: {} (type: dict)
performance: {} (type: dict)
avg_annual_returns:                                                    0       1        2  \
0                       Average Annual Total Returns     NaN      NaN   
1                                                NaN  1 Year  5 Years   
2                                    Investor Shares  16.76%    9.75%   
3                               S&P Completion Index  16.88%    9.77%   
4  Dow Jones U.S. Total Stock Market Float Adjust...  23.

In [36]:
import pickle
from pathlib import Path
import sys

# Add RAG directory to path
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))

# Define pickle file path
PKL_PATH = Path("./funds_backup.pkl")

print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())

# Save to pickle file
try:
    with PKL_PATH.open("wb") as f:
        pickle.dump(funds_total, f)
    
    print(f"Successfully saved {len(funds_total)} funds to pickle file")
    print(f"File size: {PKL_PATH.stat().st_size / 1024:.2f} KB")
    
except Exception as e:
    print(f"Error saving to pickle file: {e}")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Successfully saved 107 funds to pickle file
File size: 458.88 KB


In [48]:
import pickle
from pathlib import Path
from dataclasses import is_dataclass, asdict
import pandas as pd
import sys
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))


PKL_PATH = Path("./funds_backup.pkl")
print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())
with PKL_PATH.open("rb") as f:
    funds_total = pickle.load(f)

print(f"Loaded {len(funds_total)} funds from pickle file")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Loaded 107 funds from pickle file


## Vanguard Whitehall Funds

In [49]:
import pandas as pd
from io import StringIO
import sys
from pathlib import Path
from src.simple_rag.extraction.parser import BlackRockFiling
from edgar import set_identity, Company


set_identity("luis.alvarez.conde@alumnos.upm.es")
ticker = "VMGRX"
fund = Company(ticker)
all_filings = fund.get_filings(form="N-CSR")


if all_filings:
    # 1. Find the most recent date in the entire history (e.g., "2024-12-31")
    # Skip filings without a report_date so max() never compares None with str
    latest_date_str = max(f.report_date for f in all_filings if f.report_date)
    
    # 2. Extract just the YEAR (e.g., "2024")
    target_year = latest_date_str[:4]
    
    # 3. Filter: Keep ALL filings where the report_date starts with that year
    # This captures the March, June, and December reports for that fiscal year
    latest_filings = [
        f for f in all_filings 
        if f.report_date and f.report_date.startswith(target_year)
    ]
    
    print("Found filings: ", len(latest_filings), "for year: ", target_year)



performance_funds = []
whitehall_funds = set()
df_performance = []
abort = False

for filing in latest_filings:

    html_content = filing.html()
    
    parser = BlackRockFiling(html_content)
    funds = parser.get_funds()
    count = 0
    for fund in funds:
        if fund.performance_table is not None:
            if fund.ticker not in performance_funds: 
                performance_funds.append(fund.ticker)
                count += 1

        if fund.ticker not in whitehall_funds:
            whitehall_funds.add(fund.ticker)
        else:
            print("Exiting filing, repeated ticker found: ", fund.ticker)
            abort = True
            break
    if abort:
        break

    df_performance.append(parser.get_financial_highlights())

    print(count)
    print("Adding funds: ", len(funds))
    funds_total.extend(funds)

print("Whitehall funds: ", len(whitehall_funds))
print(len(performance_funds))
print(whitehall_funds)
print(len(df_performance))


Found filings:  2 for year:  2025
Processing: Mid-Cap Growth Fund
Extracting context:  From2024-11-01to2025-10-31_C000012166Member
Tag not found:  dei:SecurityExchangeName From2024-11-01to2025-10-31_C000012166Member
Processing: Selected Value Fund
Extracting context:  From2024-11-01to2025-10-31_C000012167Member
Tag not found:  dei:SecurityExchangeName From2024-11-01to2025-10-31_C000012167Member
Processing: Emerging Markets Government Bond Index Fund
Extracting context:  From2024-11-01to2025-10-31_C000126408Member
Processing: Emerging Markets Government Bond Index Fund
Extracting context:  From2024-11-01to2025-10-31_C000126407Member
Tag not found:  dei:SecurityExchangeName From2024-11-01to2025-10-31_C000126407Member
Processing: Emerging Markets Government Bond Index Fund
Extracting context:  From2024-11-01to2025-10-31_C000126409Member
Tag not found:  dei:SecurityExchangeName From2024-11-01to2025-10-31_C000126409Member
Processing: Global Minimum Volatility Fund
Extracting context:  From2

In [56]:
from src.simple_rag.models.fund import FinancialHighlights
import pandas as pd
from IPython.display import display

print(len(df_performance))

# Concatenate every extracted table instead of hardcoding the first two
total_df = pd.concat(df_performance, ignore_index=True)

returns_lookup = total_df.copy()

# Preview the combined table, then flatten line breaks inside fund names
print(returns_lookup.head())
returns_lookup['fund_name'] = (
    returns_lookup['fund_name']
    .str.replace('\n', ' ', regex=False)
)
display(returns_lookup['fund_name'].unique())

numeric_columns = ['portfolio_turnover', 'expense_ratio', 'net_assets', 
                   'nav_beginning', 'nav_end', 'net_income_ratio', 'distribution_shares']


for col in numeric_columns:
    if col in returns_lookup.columns:
        if returns_lookup[col] is not None:  
            try:      
                returns_lookup[f'{col}_clean'] = (
                    returns_lookup[col]
                    .astype(str)
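                    # regex=False: '$' must be a literal character, not an end-of-string anchor
                    .str.replace('%', '', regex=False)
                    .str.replace('$', '', regex=False)
                    .str.replace(',', '', regex=False)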
                    .replace('N/A', '0')
                    .replace('', '0')
                    .replace('None', '0')
                    .astype(float)
                )
            except Exception as e:
                print(f"Error cleaning column '{col}': {str(e)}")
                print(returns_lookup[col].to_string())
count = 0
# Now you can efficiently match and update your funds
for fund_obj in funds_total:
    if fund_obj.ticker not in whitehall_funds:
        continue
    print(f"\nProcessing fund object: {fund_obj.name} - {fund_obj.share_class}")
    
    # Initialize annual returns
    if not hasattr(fund_obj, 'annual_returns') or fund_obj.annual_returns is None:
        fund_obj.annual_returns = {}
    
    # Clean the name: remove "Vanguard" and strip whitespace
    name = fund_obj.name.replace("Vanguard", "").strip()
    print(f"Cleaned name: '{name}'")
    
    # Strip both trademark symbols unconditionally, then flatten line breaks;
    # the earlier if/elif chain applied only the first matching cleanup
    name = name.replace("™", "").replace("®", "")
    if "\n" in name:
        name = name.replace("\n", " ")
        print(name)
    # Find matching rows based on fund name
    name_matches = returns_lookup[returns_lookup['fund_name'].str.strip().str.lower() == name.lower()]
    if len(name_matches) == 0:
        print("  No name matches found for ticker: ", fund_obj.ticker)
        continue
    
    print(f"  Found {len(name_matches)} name matches")
    
    # Clean share class (remove trademark symbol)
    share_class = fund_obj.share_class
    if "™" in share_class:
        share_class = share_class.replace("™", "")
    
    # Now match share class
    share_class_matches = name_matches[
        name_matches['share_class'].str.contains(share_class, case=False, na=False, regex=False)]
    
    if len(share_class_matches) == 0:
        print(f"  No share class matches found for '{share_class}' ticker: ", fund_obj.ticker)
        print(f"  Found {len(name_matches)} name records")
        count += 1
        # Add all matching returns
        for _, row in name_matches.iterrows():
            year = str(row['year'])
            highlights = FinancialHighlights(
                turnover=row.get('portfolio_turnover_clean', 0),
                expense_ratio=row.get('expense_ratio_clean', 0),
                total_return=row['total_return'],
                net_assets=row.get('net_assets_clean', 0),
                net_assets_value_begining=row.get('nav_beginning_clean', 0),
                net_assets_value_end=row.get('nav_end_clean', 0),
                net_income_ratio=row.get('net_income_ratio_clean', 0.0)
            )
            
            fund_obj.financial_highlights[year] = highlights
            print(f"  {year}: Total Return = {highlights.total_return}%, Expense Ratio = {highlights.expense_ratio}%, Net Assets = {highlights.net_assets}, Net Income Ratio = {highlights.net_income_ratio}, Turnover = {highlights.turnover}, Net Assets Value Begining = {highlights.net_assets_value_begining}, Net Assets Value End = {highlights.net_assets_value_end}")
        continue
    elif len(share_class_matches) > 5:
        print("  More than 5 share class matches found:")
        print(share_class_matches)
    
    
    print(f"  Found {len(share_class_matches)} matching records")
    count += 1
    # Add all matching returns
    for _, row in share_class_matches.iterrows():
        year = str(row['year'])
        highlights = FinancialHighlights(
            turnover=row.get('portfolio_turnover_clean', 0),
            expense_ratio=row.get('expense_ratio_clean', 0),
            total_return=row['total_return'],
            net_assets=row.get('net_assets_clean', 0),
            net_assets_value_begining=row.get('nav_beginning_clean', 0),
            net_assets_value_end=row.get('nav_end_clean', 0),
            net_income_ratio=row.get('net_income_ratio_clean', 0.0)
        )
        
        fund_obj.financial_highlights[year] = highlights
        print(f"  {year}: Total Return = {highlights.total_return}%, Expense Ratio = {highlights.expense_ratio}%, Net Assets = {highlights.net_assets}, Net Income Ratio = {highlights.net_income_ratio}, Turnover = {highlights.turnover}, Net Assets Value Begining = {highlights.net_assets_value_begining}, Net Assets Value End = {highlights.net_assets_value_end}")
print("count: ",count)
print("Total funds: ",whitehall_funds)
    

2
             fund_name share_class  year  net_assets  nav_beginning  nav_end  \
0  Mid-Cap Growth Fund        None  2025      3116.0          26.21    29.55   
1  Mid-Cap Growth Fund        None  2024      3042.0          19.38    26.21   
2  Mid-Cap Growth Fund        None  2023      2530.0          19.24    19.38   
3  Mid-Cap Growth Fund        None  2022      2956.0          38.72    19.24   
4  Mid-Cap Growth Fund        None  2021      5290.0          29.89    38.72   

   total_return  expense_ratio  net_income_ratio  portfolio_turnover  \
0         14.77           0.32              0.26              1285.0   
1         35.77           0.33              0.37                69.0   
2          0.99           0.37              0.37                87.0   
3        -32.22           0.35              0.14                71.0   
4         37.68           0.33             -0.04                98.0   

  distribution_shares  
0                None  
1                None  
2           

array(['Mid-Cap Growth Fund', 'Selected Value Fund',
       'Emerging Markets Government Bond Index Fund',
       'Global Minimum Volatility Fund',
       'International Dividend Appreciation Index Fund',
       'International High Dividend Yield Index Fund',
       'International Dividend Growth Fund',
       'Advice Select International Growth Fund',
       'Advice Select Dividend Growth Fund',
       'Advice Select Global Value Fund', 'International Explorer Fund',
       'High Dividend Yield Index Fund'], dtype=object)


Processing fund object: Mid-Cap Growth Fund - Investor Shares
Cleaned name: 'Mid-Cap Growth Fund'
  Found 5 name matches
  No share class matches found for 'Investor Shares' ticker:  VMGRX
  Found 5 name records
  2025: Total Return = 14.77%, Expense Ratio = 0.32%, Net Assets = 3116.0, Net Income Ratio = 0.26, Turnover = 1285.0, Net Assets Value Begining = 26.21, Net Assets Value End = 29.55
  2024: Total Return = 35.77%, Expense Ratio = 0.33%, Net Assets = 3042.0, Net Income Ratio = 0.37, Turnover = 69.0, Net Assets Value Begining = 19.38, Net Assets Value End = 26.21
  2023: Total Return = 0.99%, Expense Ratio = 0.37%, Net Assets = 2530.0, Net Income Ratio = 0.37, Turnover = 87.0, Net Assets Value Begining = 19.24, Net Assets Value End = 19.38
  2022: Total Return = -32.22%, Expense Ratio = 0.35%, Net Assets = 2956.0, Net Income Ratio = 0.14, Turnover = 71.0, Net Assets Value Begining = 38.72, Net Assets Value End = 19.24
  2021: Total Return = 37.68%, Expense Ratio = 0.33%, Net Ass

In [57]:
import sys
%reload_ext autoreload
sys.path.append('../src')


from simple_rag.extraction.parser import compute_annual_returns

for fund in funds_total:
    if fund.ticker in performance_funds:
        returns = compute_annual_returns(fund.performance_table)
        print("\nFinal Annual Returns:")
        fund.annual_returns = returns
        print(f"  {fund.ticker}: {returns}")
        for year, return_ in returns.items():
            print(fund.financial_highlights.keys())
            if year not in fund.financial_highlights.keys():
                new_highlight = FinancialHighlights(
                    year=int(year),
                    total_return=return_,
                    turnover=0.0,
                    expense_ratio=0.0,
                    net_assets=0.0,
                    net_assets_value_begining=0.0,
                    net_assets_value_end=0.0,
                    net_income_ratio=0.0,
                )
                fund.financial_highlights[year] = new_highlight
                print(f"    {year}: {new_highlight}")

Detected format: Year (YYYY)
Found years: [np.int64(2015), np.int64(2016), np.int64(2017), np.int64(2018), np.int64(2019), np.int64(2020), np.int64(2021), np.int64(2022), np.int64(2023), np.int64(2024), np.int64(2025)]
  2016 Return: $10,000.00 -> $9,451.00 = -5.49%
  2017 Return: $9,451.00 -> $11,595.00 = 22.69%
  2018 Return: $11,595.00 -> $12,710.00 = 9.62%
  2019 Return: $12,710.00 -> $14,434.00 = 13.56%
  2020 Return: $14,434.00 -> $17,079.00 = 18.32%
  2021 Return: $17,079.00 -> $23,515.00 = 37.68%
  2022 Return: $23,515.00 -> $15,939.00 = -32.22%
  2023 Return: $15,939.00 -> $16,097.00 = 0.99%
  2024 Return: $16,097.00 -> $21,856.00 = 35.78%
  2025 Return: $21,856.00 -> $25,084.00 = 14.77%

Final Annual Returns:
  VMGRX: {'2016': -5.49, '2017': 22.69, '2018': 9.62, '2019': 13.56, '2020': 18.32, '2021': 37.68, '2022': -32.22, '2023': 0.99, '2024': 35.78, '2025': 14.77}
dict_keys(['2025', '2024', '2023', '2022', '2021'])
    2016: turnover=0.0 expense_ratio=0.0 total_return=-5.49 

  df['parsed_date'] = pd.to_datetime(df[date_col], errors='coerce')
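
The log above derives each annual return from two consecutive year-end values of a hypothetical $10,000 investment. A minimal sketch of that arithmetic follows, assuming the values have already been parsed into a year-to-value mapping. The function name `annual_returns_from_growth` is illustrative only and is not the project's `compute_annual_returns`, whose internals are not shown here.

```python
def annual_returns_from_growth(values_by_year: dict[int, float]) -> dict[str, float]:
    """Percentage change between consecutive year-end values, rounded to 2 decimals."""
    years = sorted(values_by_year)
    returns = {}
    for prev_year, year in zip(years, years[1:]):
        start, end = values_by_year[prev_year], values_by_year[year]
        returns[str(year)] = round((end / start - 1) * 100, 2)
    return returns


# Year-end values copied from the log output above
growth = {2015: 10_000.00, 2016: 9_451.00, 2017: 11_595.00}
print(annual_returns_from_growth(growth))  # {'2016': -5.49, '2017': 22.69}
```

The first year only serves as a base value, which is why the log reports returns starting in 2016 although the table begins in 2015.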


In [58]:
for fund in funds_total:
    
    if isinstance(fund.net_assets, str):
        fund.net_assets = fund.net_assets.replace(",", "")
    if isinstance(fund.advisory_fees, str):
        fund.advisory_fees = fund.advisory_fees.replace(",", "")
    if isinstance(fund.n_holdings, str):
        fund.n_holdings = fund.n_holdings.replace(",", "")

    fund.n_holdings = int(fund.n_holdings)
    fund.turnover_rate = int(fund.turnover_rate)
    fund.expense_ratio = float(fund.expense_ratio)
    fund.net_assets = float(fund.net_assets) 
    fund.advisory_fees = float(fund.advisory_fees) 
    for prop in vars(fund):
        value = getattr(fund, prop)
        print(f"{prop}: {value} (type: {type(value).__name__})")

name: Vanguard Extended Market Index Fund (type: str)
registrant: Vanguard Index Funds (type: str)
context_id: FY2024_C000007779Member (type: str)
share_class: Investor Shares (type: ShareClassType)
ticker: VEXMX (type: str)
security_exchange: N/A (type: str)
costs_per_10k: 21 (type: str)
expense_ratio: 0.19 (type: float)
net_assets: 111156.0 (type: float)
turnover_rate: 11 (type: int)
advisory_fees: 1799.0 (type: float)
n_holdings: 3485 (type: int)
report_date: December 31, 2024 (type: str)
annual_returns: {} (type: dict)
performance: {} (type: dict)
avg_annual_returns:                                                    0       1        2  \
0                       Average Annual Total Returns     NaN      NaN   
1                                                NaN  1 Year  5 Years   
2                                    Investor Shares  16.76%    9.75%   
3                               S&P Completion Index  16.88%    9.77%   
4  Dow Jones U.S. Total Stock Market Float Adjust...  23.
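
The cell above strips thousands separators and then casts each field directly, which raises `ValueError` as soon as a field is an empty string. A small helper along these lines could make the conversion tolerant of blanks. This is a sketch only: `to_number` is a hypothetical name, and returning `None` for missing values is an assumption about how downstream code wants to treat them.

```python
from typing import Optional, Union


def to_number(value: Union[str, int, float, None], as_int: bool = False) -> Optional[Union[int, float]]:
    """Convert values such as '3,485' or 0.19 to a number; return None for blanks."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.replace(",", "").strip()
        if not value:
            return None
    number = float(value)  # still raises ValueError for text such as 'N/A'
    return int(number) if as_int else number


print(to_number("3,485", as_int=True))  # 3485
print(to_number("111,156"))             # 111156.0
print(to_number(""))                    # None
```

Non-numeric text is deliberately left to raise, so unexpected values surface instead of being silently replaced.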

In [59]:
import pickle
from pathlib import Path
import sys

# Add RAG directory to path
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))

# Define pickle file path
PKL_PATH = Path("./funds_backup.pkl")

print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())

# Save to pickle file
try:
    with PKL_PATH.open("wb") as f:
        pickle.dump(funds_total, f)
    
    print(f"Successfully saved {len(funds_total)} funds to pickle file")
    print(f"File size: {PKL_PATH.stat().st_size / 1024:.2f} KB")
    
except Exception as e:
    print(f"Error saving to pickle file: {e}")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Successfully saved 125 funds to pickle file
File size: 546.99 KB


In [2]:
import pickle
import sys
from dataclasses import is_dataclass, asdict
from pathlib import Path

import pandas as pd

# Make the project's src directory importable so that project modules
# (for example the classes of the pickled fund objects) can be resolved
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG/src")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))


PKL_PATH = Path("./funds_backup.pkl")
print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())
with PKL_PATH.open("rb") as f:
    funds_total = pickle.load(f)

print(f"Loaded {len(funds_total)} funds from pickle file")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Loaded 125 funds from pickle file
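
Pickle stores instances by reference to their class, so a backup of custom objects can only be loaded where that class is importable. One way to make a backup independent of the import path is to convert the objects to plain dictionaries first. The sketch below assumes the fund objects are dataclasses, which this notebook does not confirm; `DemoFund` and `to_plain` are hypothetical names used only for illustration.

```python
import pickle
from dataclasses import asdict, dataclass, field, is_dataclass


@dataclass
class DemoFund:  # hypothetical stand-in for the project's fund model
    ticker: str
    annual_returns: dict = field(default_factory=dict)


def to_plain(obj):
    """Return a plain dict for dataclass instances; pass other objects through."""
    return asdict(obj) if is_dataclass(obj) and not isinstance(obj, type) else obj


funds = [DemoFund("VMGRX", {"2025": 14.77}), DemoFund("VEXMX")]
payload = pickle.dumps([to_plain(f) for f in funds])
restored = pickle.loads(payload)
print(restored)
```

The restored list contains only built-in types, so it can be loaded without touching `sys.path`, at the cost of losing the original class and its methods.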


## iShares

In [3]:
from concurrent.futures import ProcessPoolExecutor, as_completed
import pandas as pd
from typing import List
import sys
from pathlib import Path
from tqdm import tqdm
%reload_ext autoreload
from simple_rag.extraction.parser import BlackRockFiling
from edgar import set_identity, Company

set_identity("luis.alvarez.conde@alumnos.upm.es")

ticker = "HEZU"
fund = Company(ticker)
all_filings = fund.get_filings(form="N-CSR")

def process_single_filing_multiprocess(filing_data):
    """
    Process a single filing (for multiprocessing).
    Note: Must pass serializable data, not the filing object directly
    """
    try:
        # Import inside function for multiprocessing
        import sys
        from pathlib import Path
        sys.path.append('../src')
        from simple_rag.extraction.parser import BlackRockFiling
        
        html_content, report_date = filing_data
        parser = BlackRockFiling(html_content)
        funds = parser.get_funds()
        
        performance_funds = []
        df_performance = None        
        count = 0
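        for fund in funds:
            # Record each ticker that has a performance table exactly once
            if fund.performance_table is not None and fund.ticker not in performance_funds:
                performance_funds.append(fund.ticker)
                count += 1
            # Note: do not append to the module-level ishares_funds list here.
            # The worker runs in a separate process, so such changes never
            # reach the parent; the parent collects result['funds'] instead.

        print("Calling get_financial_highlights2")
        df_performance = parser.get_financial_highlights2()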
        
        print(f"Filing {report_date}: Found {count} funds with performance tables, Total funds: {len(funds)}")
        
        return {
            'funds': funds,
            'performance_tickers': performance_funds,
            'df_performance': df_performance,
            'report_date': report_date
        }
    except Exception as e:
        print(f"Error processing filing: {e}")
        return None

if all_filings:
    unique_dates = sorted({f.report_date for f in all_filings if f.report_date})
    print("Unique report dates:", unique_dates)
    
    # Keep filings with a report date after 2024-09-30.
    # report_date is an ISO "YYYY-MM-DD" string, so string comparison orders dates correctly.
    cutoff_date = "2024-10-01"
    latest_filings = [
        f for f in all_filings 
        if f.report_date and f.report_date >= cutoff_date
    ]
    
    print("Found filings: ", len(latest_filings), "from", cutoff_date, "onward")
    
    # Optional: Show the dates of filtered filings
    print("Filtered filing dates:", sorted({f.report_date for f in latest_filings}))

# Prepare data for multiprocessing (fetch HTML first)
filing_data_list = []
failed_filings = []
for filing in latest_filings:
    try:
        html_content = filing.html()
        if html_content:  # Only add if HTML content exists
            filing_data_list.append((html_content, filing.report_date))
        else:
            print(f"⚠️  No HTML content for filing: {filing.report_date}")
            failed_filings.append(filing)
    except ValueError as e:
        print(f"❌ Error processing filing {filing.report_date}: {e}")
        failed_filings.append(filing)
    except Exception as e:
        print(f"❌ Unexpected error for filing {filing.report_date}: {e}")
        failed_filings.append(filing)
print(f"✅ Successfully prepared {len(filing_data_list)} filings")
print(f"❌ Failed to prepare {len(failed_filings)} filings")
# Continue with successful filings only

performance_funds = []
df_performances = []
ishares_funds = []

# Use ProcessPoolExecutor
with ProcessPoolExecutor() as executor:
    future_to_data = {executor.submit(process_single_filing_multiprocess, data): data 
                      for data in filing_data_list}
    
    for future in tqdm(as_completed(future_to_data), total=len(filing_data_list), desc="Processing filings"):
        result = future.result()
        if result:
            ishares_funds.extend(result['funds'])
            funds_total.extend(result['funds'])
            performance_funds.extend(result['performance_tickers'])
            
            if result['df_performance'] is not None:
                df_performances.append(result['df_performance'])

print(len(df_performances))
print(f"Total funds processed: {len(ishares_funds)}")


Unique report dates: ['2003-04-30', '2003-07-31', '2004-02-29', '2004-03-31', '2004-04-30', '2004-07-31', '2005-02-28', '2005-03-31', '2005-04-30', '2005-07-31', '2006-02-28', '2006-03-31', '2006-04-30', '2006-07-31', '2007-02-28', '2007-03-31', '2007-04-30', '2007-07-31', '2008-02-29', '2008-03-31', '2008-04-30', '2008-07-31', '2009-02-28', '2009-03-31', '2009-04-30', '2009-07-31', '2009-08-31', '2010-02-28', '2010-03-31', '2010-04-30', '2010-07-31', '2010-08-31', '2011-02-28', '2011-03-31', '2011-04-30', '2011-07-31', '2011-08-31', '2011-10-31', '2012-02-29', '2012-03-31', '2012-04-30', '2012-07-31', '2012-08-31', '2012-10-31', '2013-02-28', '2013-03-31', '2013-04-30', '2013-07-31', '2013-08-31', '2013-10-31', '2014-02-28', '2014-03-31', '2014-04-30', '2014-07-31', '2014-08-31', '2014-10-31', '2015-02-28', '2015-03-31', '2015-04-30', '2015-07-31', '2015-08-31', '2015-10-31', '2016-02-29', '2016-03-31', '2016-04-30', '2016-07-31', '2016-08-31', '2016-10-31', '2017-02-28', '2017-03-31'

Processing filings:   0%|          | 0/17 [00:00<?, ?it/s]

Processing: iShares Currency Hedged MSCI Eurozone ETF
Extracting context:  FY2025_C000141929Member
Tag not found:  dei:SecurityExchangeName FY2025_C000141929Member
Unknown Table:        0                                                  1
0  ​(a)  The underlying fund is iShares MSCI Eurozone ETF.
1  ​(b)                       Excludes money market funds.
Unknown table type:       0                                                  1
0  ​(a)  The underlying fund is iShares MSCI Eurozone ETF.
1  ​(b)                       Excludes money market funds.
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares Currency Hedged MSCI Japan ETF
Extracting context:  FY2025_C000133234Member
Tag not found:  dei:SecurityExchangeName FY2025_C000133234Member
Unknown Table:        0                                               1
0  ​(a)  The underlying fund is iShares MSCI Japan ETF.
1  ​(b)                    Excludes

Processing filings:   6%|▌         | 1/17 [00:09<02:24,  9.06s/it]

Unknown table type:       Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Tag not found:  dei:SecurityExchangeName FY2025_C000049094Member
Unknown Table:        0                             1
0  ​(a)  Excludes money market funds.Processing: iShares ESG Aware 60/40 Balanced Allocation ETF
Extracting context: 
 Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.From2024-08-01to2025-07-31_C000219701Member

Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Tag not found:  Processing: iShares MSCI India Small-Cap ETFoef:ClassName 
From2024-05-01to2025-04-30_C000012098Member
Extracting context:  FY2025_C000106876Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Calling get_financial_highlights2
Calling get_financial_highlights2
Calling get_financial_highlights2
Ca

Processing filings:  12%|█▏        | 2/17 [00:10<01:06,  4.45s/it]

Unknown Table:        0                             1
0  ​(a)  Excludes money market funds.
Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares MSCI Indonesia ETF
Extracting context:  FY2025_C000087423Member
Tag not found:  oef:ClassName From2024-05-01to2025-04-30_C000012101Member
Unknown Table:        Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Tag not found:  dei:SecurityExchangeNameUnknown table type:       Footnote                   Description
0  Footnote(a)  Excludes money market funds. 
FY2025_C000087423Member
Tag not found:  oef:ClassName From2024-05-01to2025-04-30_C000012060Member
Processing: iShares ESG Aware 80/20 Aggressive Allocation ETF
Extracting context:  From2024-08-01to2025-07-31_C000219702Member
Processing: iShares Core 30/70 Conservative Alloca

Processing filings:  18%|█▊        | 3/17 [00:14<00:59,  4.25s/it]

Unknown Table:        Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Unknown table type:       Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Tag not found:  oef:FactorsAffectingPerfTextBlock From2024-05-01to2025-04-30_C000012060Member
Processing: iShares Core 60/40 Balanced Allocation ETF
Tag not found: Extracting context:   From2024-08-01to2025-07-31_C000069399Memberdei:SecurityExchangeName 
FY2025_C000099165Member
Unknown Table:  Tag not found:       0                             1
0  ​(a)  Excludes money market funds. oef:ClassName
 From2024-05-01to2025-04-30_C000012198Member
Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.
Failed to extract tables from block:  Tag not found: oef:LineGraphTableTextBlock
No tables found for block:   oef:LineGraphTableTextBlockoef:ClassName 
From2024-08-01to2025-07-31_C000153271Member
Processing: iShares Select Dividend ETF
Extract

Processing filings:  24%|██▎       | 4/17 [00:18<00:52,  4.04s/it]

Tag not found:  oef:ClassName From2024-08-01to2025-07-31_C000069400Member
Tag not found:  dei:SecurityExchangeName FY2025_C000235105Member
Unknown Table:        0                             1
0  ​(a)  Excludes money market funds.
Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares Copper and Metals Mining ETF
Extracting context:  FY2025_C000241778Member
Tag not found:  oef:ClassName From2024-08-01to2025-07-31_C000216288Member
Tag not found:  Unknown Table: dei:SecurityExchangeName  FY2025_C000012099Member      0                             1
0  ​(a)  Excludes money market funds.

Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processi

Processing filings:  29%|██▉       | 5/17 [00:20<00:41,  3.48s/it]

 invalid literal for int() with base 10: ''
ValueError: invalid literal for int() with base 10: ''
ValueError: invalid literal for int() with base 10: ''
ValueError: invalid literal for int() with base 10: ''
ValueError: invalid literal for int() with base 10: ''
ValueError: invalid literal for int() with base 10: ''
Processing: iShares Biotechnology ETF
ValueError:Extracting context:   invalid literal for int() with base 10: ''From2024-04-01to2025-03-31_C000012080MemberTag not found: 

 oef:ClassName From2024-08-01to2025-07-31_C000112640MemberValueError:
 invalid literal for int() with base 10: ''
Total funds extracted: 9
  iShares Core U.S. REIT ETF - ETF Shares: 10 years
  iShares Core Dividend ETF - ETF Shares: 10 years
  iShares Core Dividend Growth ETF - ETF Shares: 10 years
  iShares Core High Dividend ETF - ETF Shares: 10 years
  iShares Select Dividend ETF - ETF Shares: 10 years
  iShares Morningstar Mid-Cap Value ETF - ETF Shares: 5 years
  iShares Morningstar Small-Cap ETF -

Processing filings:  35%|███▌      | 6/17 [00:21<00:27,  2.52s/it]

Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlockUnknown Table: 
       Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Unknown table type:       Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Processing: iShares Future AI & Tech ETF
Extracting context:  FY2025_C000201209Member
Processing: iShares U.S. Broker-Dealers & Securities Exchanges ETF
Extracting context:  From2024-04-01to2025-03-31_C000025774Member
Processing: iShares ESG Optimized MSCI USA Min Vol Factor ETF
Extracting context:  From2024-08-01to2025-07-31_C000231047Member
Tag not found:  oef:FactorsAffectingPerfTextBlock From2024-05-01to2025-04-30_C000038163Member
Tag not found:  oef:ClassName From2024-04-01to2025-03-31_C000012080Member
Tag not found:  dei:SecurityExchangeName FY2025_C000201209Member
Unknown Table:        0                             1
0  ​(a)  Excludes money market

Processing filings:  41%|████      | 7/17 [00:59<02:23, 14.36s/it]

Processing: iShares North American Natural Resources ETF
Extracting context:  From2024-04-01to2025-03-31_C000012086Member
Tag not found:  Unknown Table: oef:FactorsAffectingPerfTextBlock        Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.From2024-05-01to2025-04-30_C000012051Member

Unknown table type:       Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
Processing: iShares iBonds Dec 2028 Term Muni Bond ETF
Extracting context:  From2023-11-01to2024-10-31_C000210858Member
Unknown Table:        Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Unknown table type:       Footnote                   Description
0  Footnote(a)  Excludes money market funds.
Tag not found:  

Processing filings:  47%|████▋     | 8/17 [01:08<01:51, 12.40s/it]

Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares Core MSCI Total International Stock ETF
Extracting context:  FY2025_C000119716Member
Tag not found:  oef:FactorsAffectingPerfTextBlock From2024-04-01to2025-03-31_C000025771Member
Processing: iShares U.S. Financials ETF
Extracting context:  From2024-05-01to2025-04-30_C000012053Member
Tag not found:  oef:ClassName From2024-04-01to2025-03-31_C000012073Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares 0-5 Year Investment Grade Corporate Bond ETF
Extracting context:  FY2024_C000131292Member
Unknown Table:        0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
Unknown table type:       0                                                

Processing filings:  53%|█████▎    | 9/17 [01:31<02:06, 15.79s/it]

Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares Future Cloud 5G and Tech ETF
Extracting context:  FY2025_C000210492Member
Tag not found:  oef:ClassName From2024-04-01to2025-03-31_C000012075Member
Tag not found:  oef:ClassName From2024-04-01to2025-03-31_C000012039Member
Tag not found:  dei:SecurityExchangeName FY2024_C000236812Member
Unknown Table:        0                                                  1
0  ​(a)  The underlying fund is iShares 20+ Year Treasu...
1  ​(b)                       Excludes money market funds.
Unknown table type:       0                                                  1
0  ​(a)  The underlying fund is iShares 20+ Year Treasu...
1  ​(b)                       Excludes money market funds.
Tag not found:  oef:ClassName From2023-11-01to2024-10-31_C000217188Member
Tag not found:  dei:SecurityExchangeName FY2024_C000194633Member
Unknown Table:        0                   

Processing filings:  59%|█████▉    | 10/17 [01:41<01:37, 13.99s/it]

Tag not found:  oef:ClassName From2024-04-01to2025-03-31_C000254701Member
Tag not found:  dei:SecurityExchangeName FY2024_C000153287Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares International Equity Factor ETF
Extracting context:  FY2025_C000154547Member
Unknown Table:        0                             1
0  ​(a)  Excludes money market funds.
Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares Core 1-5 Year USD Bond ETF
Extracting context:  FY2024_C000119711Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares 1-5 Year Investment Grade Corporate Bond ETF
Extracting context:  Unknown Table: FY2025_C000037539

Processing filings:  65%|██████▍   | 11/17 [01:43<01:02, 10.50s/it]

Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares International Small-Cap Equity Factor ETF
Extracting context:  FY2025_C000154548Member
Unknown Table:        Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
Unknown table type:       Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
Processing: iShares iBonds Dec 2032 Term Treasury ETF
Extracting context:  From2023-11-01to2024-10-31_C000236733Member
Processing: iShares U.S. Digital Infrastructure and Real Estate ETF
Extracting context:  From2024-04-01to2025-03-31_C000012083Member
Tag not found:  dei:SecurityExchangeName FY2024_C000161648Member
Unknown Table:        

Processing filings:  71%|███████   | 12/17 [01:46<00:40,  8.19s/it]

Tag not found:  oef:ClassName From2023-11-01to2024-10-31_C000243160Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares MSCI ACWI Low Carbon Target ETF
Extracting context:  FY2025_C000149539Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares Floating Rate Bond ETF
Extracting context:  FY2024_C000102031Member
Unknown Table:        Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
Unknown table type:       Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
Processing: iShares iBonds 2028 Term High Yield and Income ETF
Ex

Processing filings:  76%|███████▋  | 13/17 [02:19<01:02, 15.53s/it]

Unknown Table:        Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
2  Footnote(b)                          Rounds to less than 0.1%.
Unknown table type:       Footnote                                        Description
0    Footnote*  Credit quality ratings shown reflect the ratin...
1  Footnote(a)                       Excludes money market funds.
2  Footnote(b)                          Rounds to less than 0.1%.
Tag not found:  dei:SecurityExchangeName FY2024_C000152179Member
Unknown Table:        0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
Unknown table type:       0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown re

Processing filings:  82%|████████▏ | 14/17 [02:21<00:34, 11.39s/it]

Tag not found:  dei:SecurityExchangeName FY2024_C000152180Member
Tag not found:  dei:SecurityExchangeName FY2024_C000204503MemberUnknown Table: 
       0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
Unknown table type:       0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares iBonds Dec 2026 Term Corporate ETF
Extracting context:  FY2024_C000173141Member
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares iBoxx $ Investment Grade Corporate Bond ETF
Extracting context:  FY2025_C000012091Member
Unknown Table:        0    

Processing filings:  88%|████████▊ | 15/17 [02:38<00:26, 13.18s/it]

No data obtained
Tag not found:  dei:SecurityExchangeName FY2024_C000249961Member
Tag not found:  oef:FactorsAffectingPerfTextBlock FY2024_C000249961Member
Unknown Table:        0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
Unknown table type:       0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
No data obtained
Processing: iShares iBonds Dec 2031 Term Corporate ETF
Extracting context:  FY2024_C000228040Member
Tag not found:  dei:SecurityExchangeName FY2024_C000228040Member
Unknown Table:        0                                                  1
0  ​(a)                       Excludes money market funds.
1    ​*  Credit quality ratings shown reflect the ratin...
Unknown table type:       0                                                  1


Processing filings:  94%|█████████▍| 16/17 [03:36<00:26, 26.62s/it]

Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Failed to extract tables from block:  oef:AvgAnnlRtrTableTextBlock
No tables found for block:  oef:AvgAnnlRtrTableTextBlock
Processing: iShares Treasury Floating Rate Bond ETF
Extracting context:  FY2024_C000137225Member
Tag not found:  dei:SecurityExchangeName FY2024_C000137225Member
Unknown Table:        0                             1
0  ​(a)  Excludes money market funds.
Unknown table type:       0                             1
0  ​(a)  Excludes money market funds.
Failed to extract tables from block:  oef:LineGraphTableTextBlock
No tables found for block:  oef:LineGraphTableTextBlock
Processing: iShares U.S. Fixed Income Balanced Risk Systematic ETF
Extracting context:  FY2024_C000196720Member
Tag not found:  dei:SecurityExchangeName FY2024_C000196720Member
Unknown Table:        0                                                  1
0  ​(a)                       

Processing filings: 100%|██████████| 17/17 [03:44<00:00, 13.21s/it]


17
Total funds processed: 349
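
The cell above follows a common `ProcessPoolExecutor` pattern: each worker receives picklable input, returns a picklable result, and the parent merges the results as they complete. Lists appended to inside a worker live in that worker's process and are not visible to the parent. The sketch below isolates that pattern under simplified assumptions; `summarise_filing` and `run_all` are hypothetical names, and counting `<table` markers is only a placeholder for the real parsing.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def summarise_filing(filing_data):
    """Worker: receives picklable input and returns a picklable result."""
    html_content, report_date = filing_data
    # Placeholder for real parsing: count a marker string in the HTML
    return {"report_date": report_date, "n_tables": html_content.count("<table")}


def run_all(filing_data_list):
    """Parent: submit every item and merge the returned results."""
    results = []
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(summarise_filing, data) for data in filing_data_list]
        for future in as_completed(futures):
            results.append(future.result())
    # Completion order is arbitrary, so sort for a stable result
    return sorted(results, key=lambda r: r["report_date"])


if __name__ == "__main__":
    demo = [("<table></table><table></table>", "2025-03-31"), ("<p>no tables</p>", "2024-10-31")]
    print(run_all(demo))
```

Because workers write to the same console concurrently, their `print` output can interleave, as it does in the log above; returning messages as part of the result and printing them in the parent is one way to keep the log readable.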


In [20]:
failed_filings[0].url

'https://www.sec.gov/Archives/edgar/data/1100663/0001100663-25-000026-index.html'

In [4]:
import pandas as pd
import re
%reload_ext autoreload
from simple_rag.models.fund import FinancialHighlights

if df_performances:
    df_performance = pd.concat(df_performances, ignore_index=True)
else:
    df_performance = pd.DataFrame() # Empty fallback
    print("No performance data found.")

print(df_performance.head())

def clean_financial_number(val):
    """
    Parses financial strings like '23.19 %(b)' or '(24.82 )%'.
    - Extracts the numerical value.
    - Handles (12.34) as negative -12.34.
    - Ignores footnote markers like (a), (b).
    - Removes %, $, and commas.
    """
    if pd.isna(val) or val is None:
        return None
    
    # Convert to string and strip whitespace
    s = str(val).strip()
    
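    # 1. Regex to find the number (handles decimals and commas)
    # Alternatives, in order: comma-grouped (1,234.56), plain (1234.56), leading dot (.56)
    match = re.search(r'\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d*)?|\.\d+', s)
    
    if not match:
        return None
        
    # Get the raw number string (e.g., "24.82" or "1,234.56")
    num_str = match.group(0)
    
    # 2. Check for negative indication: "(" or "-" directly before the number
    # Accounting format wraps negatives in parentheses: (24.82)% or $(24.82).
    # Inspecting only the text just before the number keeps a leading footnote
    # marker such as "(a)" from being mistaken for a negative sign.
    prefix = s[:match.start()].replace('$', '').strip()
    is_negative = prefix.endswith(('(', '-'))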
    
    try:
        # Remove commas and convert to float
        clean_num = float(num_str.replace(',', ''))
        
        # Apply negative sign if detected
        return -clean_num if is_negative else clean_num
        
    except ValueError:
        return None


returns_lookup = df_performance.copy()


# Apply to all financial columns
financial_cols = ['total_return', 'expense_ratio', 'net_income_ratio', 'portfolio_turnover', 'nav_end', 'nav_beginning', 'net_assets']
for col in financial_cols:
    if col in returns_lookup.columns:
        returns_lookup[f'{col}_clean'] = returns_lookup[col].apply(clean_financial_number)

ishares_tickers = [fund_obj.ticker for fund_obj in ishares_funds]
print("Tickers in ishares_funds:", ishares_tickers)


# Now you can efficiently match and update your funds
for fund_obj in funds_total:
    
    if fund_obj.ticker not in ishares_tickers:
        continue

    print(f"\nProcessing fund object: {fund_obj.name} - {fund_obj.share_class}")
    # Initialize annual returns
    if not hasattr(fund_obj, 'annual_returns') or fund_obj.annual_returns is None:
        fund_obj.annual_returns = {}

    if not hasattr(fund_obj, 'financial_highlights') or fund_obj.financial_highlights is None:
        fund_obj.financial_highlights = {}
    
    # Clean the name: remove "Vanguard" and strip whitespace
    name = fund_obj.name.replace("Vanguard", "").strip()
    print(f"Cleaned name: '{name}'")
    
    if "®" in name:
        name = name.replace("®", "")
    if "™" in name:
        name = name.replace("™", "")
        
    # Find matching rows based on fund name
    name_matches = returns_lookup[returns_lookup['fund_name'].str.contains(name, case=False, na=False, regex=False)]
    
    if len(name_matches) == 0:
        print("  No name matches found")
        continue
    
    print(f"  Found {len(name_matches)} name matches")
    
    # All funds handled here are iShares ETFs, so set the share class explicitly
    fund_obj.share_class = "ETF Shares"
    share_class = fund_obj.share_class
    
    if "™" in share_class:
        share_class = share_class.replace("™", "")
    
    # Now match share class
    share_class_matches = name_matches[
        name_matches['share_class'].str.contains(share_class, case=False, na=False, regex=False)]
    
    if name_matches['share_class'].isna().all():
        fund_obj.annual_returns = dict(zip(name_matches['year'], name_matches['total_return_clean']))
        print("Annual return: ", fund_obj.annual_returns)
        continue
        
    if len(share_class_matches) == 0:
        print(f"  No share class matches found for '{share_class}'")
        print(f"  Available share classes: {name_matches['share_class'].unique()}")
        continue
    
    print(f"  Found {len(share_class_matches)} matching records")
    
    # Add all matching returns
    for _, row in share_class_matches.iterrows():
        year = str(row['year'])
        
        # Store annual return
        fund_obj.annual_returns[year] = row['total_return_clean']
        
        # Store full financial highlights snapshot
        fund_obj.financial_highlights[year] = FinancialHighlights(
            turnover=row.get('portfolio_turnover_clean'),
            expense_ratio=row.get('expense_ratio_clean'),
            total_return=row.get('total_return_clean'),
            net_assets=row.get('net_assets_clean'),  # Cleaned above via financial_cols
            net_assets_value_begining=row.get('nav_beginning_clean'),
            net_assets_value_end=row.get('nav_end_clean') ,
            net_income_ratio=row.get('net_income_ratio_clean')
        )
    
    print(f"  Annual returns: {fund_obj.annual_returns}")
    print(f"  Financial highlights years: {list(fund_obj.financial_highlights.keys())}")
    for key, value in fund_obj.financial_highlights.items():
        print(f"    {key}: {value}")

                                   fund_name share_class  year   net_assets  \
0  iShares Large Cap Accelerated Outcome ETF  ETF Shares  2025          0.0   
1  iShares Large Cap Accelerated Outcome ETF  ETF Shares  2025   13473000.0   
2       iShares Large Cap Max Buffer Mar ETF  ETF Shares  2025   38203000.0   
3       iShares Large Cap Max Buffer Jun ETF  ETF Shares  2025  161134000.0   
4       iShares Large Cap Max Buffer Jun ETF  ETF Shares  2024   72687000.0   

   nav_beginning  nav_end  total_return  expense_ratio  net_income_ratio  \
0           0.00     0.00          0.00           0.00              0.00   
1          25.00    25.91          3.64           0.47              0.69   
2          25.00    25.81          3.25           0.47              0.65   
3          25.24    27.59         10.21           0.47              0.96   
4          25.00    25.24          0.95           0.47              0.00   

   portfolio_turnover distribution_shares  
0                 0.0   
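
The parsing rules described in the `clean_financial_number` docstring can be checked on a few literal strings. The block below is a minimal, self-contained sketch: `parse_accounting_number` is a hypothetical name used only for this illustration, and it assumes that a negative value is signalled by an opening parenthesis directly before the number.

```python
import re
from typing import Optional

# Comma-grouped numbers first, then plain numbers, then a bare decimal part.
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d*)?|\.\d+")


def parse_accounting_number(text: str) -> Optional[float]:
    """Hypothetical helper: extract one number from an accounting-style string."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    value = float(match.group(0).replace(",", ""))
    # Text before the number, ignoring currency symbols and whitespace.
    prefix = text[: match.start()].replace("$", "").strip()
    # An opening parenthesis right before the number marks a negative value.
    return -value if prefix.endswith("(") else value


examples = ["23.19 %(b)", "(24.82 )%", "$1,234.56", "n/a"]
parsed = [parse_accounting_number(e) for e in examples]
print(parsed)
```

Trailing footnote markers such as `(b)` are ignored because only the text before the number is inspected for the sign.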

In [45]:
import re
from collections import defaultdict
import pandas as pd

def infer_first_col_format(value: object) -> str:
    if value is None or (isinstance(value, float) and pd.isna(value)):
        return "EMPTY"

    s = str(value).strip()
    if s == "" or s.lower() == "nan":
        return "EMPTY"

    # Jan 23, Aug 15
    if re.match(r"^[A-Za-z]{3}\s+\d{2}$", s):
        return "MON_YY"

    # 2015
    if re.match(r"^\d{4}$", s):
        return "YYYY"

    # 2024-08-31
    if re.match(r"^\d{4}-\d{2}-\d{2}$", s):
        return "YYYY_MM_DD"

    # 08/31/24 or 8/31/2024
    if re.match(r"^\d{1,2}/\d{1,2}/\d{2,4}$", s):
        return "MM_DD_YY(YY)"

    # 31-08-24 or 08-31-24 (dash-separated; day/month order is ambiguous)
    if re.match(r"^\d{1,2}-\d{1,2}-\d{2,4}$", s):
        return "DD_MM_YY(YY)_or_MM_DD_YY(YY)_DASH"

    # Fallbacks
    if re.search(r"\d", s):
        return "OTHER_HAS_DIGITS"

    return "OTHER_TEXT"


def describe_first_column_formats(
    dfs,
    names=None,
    samples_per_df=3,
    max_groups_to_show=50,
    max_dfs_per_group_to_print=5,
):
    if names is None:
        names = [f"df[{i}]" for i in range(len(dfs))]

    groups = defaultdict(list)

    for name, df in zip(names, dfs):
        if df is None or not isinstance(df, pd.DataFrame) or df.empty:
            groups["EMPTY_DF"].append((name, df))
            continue

        first_col = df.columns[0]
        # take first non-empty sample from first column
        series = df[first_col].astype(str)
        sample_vals = [v for v in series.head(20).tolist() if str(v).strip() and str(v).lower() != "nan"]

        fmt = infer_first_col_format(sample_vals[0]) if sample_vals else "EMPTY_FIRST_COL"
        groups[fmt].append((name, df))

    sorted_groups = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)

    print(f"Total dataframes: {len(dfs)}")
    print(f"Unique first-column formats: {len(sorted_groups)}\n")

    for gi, (fmt, members) in enumerate(sorted_groups[:max_groups_to_show], start=1):
        print("=" * 100)
        print(f"Group #{gi}: {fmt}")
        print(f"Count: {len(members)}")

        example_shapes = [m[1].shape for m in members if isinstance(m[1], pd.DataFrame)]
        print(f"Example shapes (first 10): {example_shapes[:10]}")

        # Print a few examples per group
        for ex_i, (name, df) in enumerate(members[:max_dfs_per_group_to_print], start=1):
            if df is None or not isinstance(df, pd.DataFrame) or df.empty:
                print(f"  [Example {ex_i}] {name}: EMPTY/None")
                continue

            first_col = df.columns[0]
            vals = [v for v in df[first_col].head(20).tolist() if str(v).strip() and str(v).lower() != "nan"]
            vals = vals[:samples_per_df]

            print(f"  [Example {ex_i}] {name}")
            print(f"    first_col: {first_col!r}")
            print(f"    columns: {list(df.columns)[:12]}{' ...' if len(df.columns) > 12 else ''}")
            print(f"    first_col_samples: {vals}")

        print()


# Example usage with your list of performance tables
performances = []
perf_names = []
for i, fund in enumerate(funds_total):
    if fund.ticker in performance_funds and fund.performance_table is not None:
        performances.append(fund.performance_table)
        perf_names.append(f"{fund.ticker} | {fund.name} | {fund.share_class}")

describe_first_column_formats(performances, names=perf_names)

Total dataframes: 119
Unique first-column formats: 2

Group #1: MON_YY
Count: 118
Example shapes (first 10): [(62, 6), (62, 6), (62, 6), (62, 6), (107, 4), (120, 4), (120, 4), (67, 4), (120, 4), (120, 4)]
  [Example 1] EAOK | iShares ESG Aware 30/70 Conservative Allocation ETF | ETF Shares
    first_col: 'Unnamed: 0'
    columns: ['Unnamed: 0', 'Fund', 'Bloomberg U.S. Universal Index', 'MSCI All Country World Index (Net)', 'BlackRock ESG Aware Conservative Allocation Index', 'S&P Target Risk Conservative Index']
    first_col_samples: ['Jun 20', 'Jul 20', 'Aug 20']
  [Example 2] EAOM | iShares ESG Aware 40/60 Moderate Allocation ETF | ETF Shares
    first_col: 'Unnamed: 0'
    columns: ['Unnamed: 0', 'Fund', 'Bloomberg U.S. Universal Index', 'MSCI All Country World Index (Net)', 'BlackRock ESG Aware Moderate Allocation Index', 'S&P Target Risk Moderate Index']
    first_col_samples: ['Jun 20', 'Jul 20', 'Aug 20']
  [Example 3] EAOR | iShares ESG Aware 60/40 Balanced Allocation ETF | ET

In [None]:
import sys
from pathlib import Path
%reload_ext autoreload
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))


from src.simple_rag.extraction.parser import compute_annual_returns

for fund in funds_total:
    if fund.ticker in performance_funds:
        print(fund.performance_table)
        returns = compute_annual_returns(fund.performance_table)
        print("\nFinal Annual Returns:")
        fund.annual_returns = returns
        print(f"  {fund.ticker}: {returns}")
        print("---")

   Unnamed: 0     Fund Bloomberg U.S. Universal Index  \
0      Jun 20  $10,091                        $10,039   
1      Jul 20  $10,350                        $10,216   
2      Aug 20  $10,498                        $10,157   
3      Sep 20  $10,374                        $10,139   
4      Oct 20  $10,269                        $10,102   
..        ...      ...                            ...   
57     Mar 25  $11,423                         $9,828   
58     Apr 25  $11,453                         $9,863   
59     May 25  $11,606                         $9,813   
60     Jun 25  $11,902                         $9,966   
61     Jul 25  $11,920                         $9,951   

   MSCI All Country World Index (Net)  \
0                             $10,171   
1                             $10,708   
2                             $11,364   
3                             $10,998   
4                             $10,730   
..                                ...   
57                          
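
The source of `compute_annual_returns` is not shown in this notebook. As a rough sketch of how annual returns could be derived from a monthly growth-of-$10,000 table like the one printed above, the block below parses the `Mon YY` labels, keeps the last value of each calendar year, and takes the year-over-year percentage change. The function name `annual_returns_from_growth` and the demo values are assumptions made for illustration, not the project's implementation.

```python
import pandas as pd


def annual_returns_from_growth(table: pd.DataFrame, value_col: str = "Fund") -> dict:
    """Hypothetical sketch: calendar-year returns (%) from a monthly growth table."""
    # Parse labels such as 'Jun 20' (abbreviated month, two-digit year).
    dates = pd.to_datetime(table.iloc[:, 0], format="%b %y")
    # Strip '$' and thousands separators before converting to float.
    values = (
        table[value_col].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    series = pd.Series(values.to_numpy(), index=pd.DatetimeIndex(dates)).sort_index()
    # Keep the last available value in each calendar year.
    year_end = series.groupby(series.index.year).last()
    # Year-over-year change; the first year has no prior year-end and is dropped.
    returns = year_end.pct_change().dropna() * 100
    return {str(year): round(float(r), 2) for year, r in returns.items()}


# Made-up demo data in the same layout as the performance tables.
demo = pd.DataFrame(
    {
        "Unnamed: 0": ["Dec 20", "Jun 21", "Dec 21", "Dec 22"],
        "Fund": ["$10,000", "$10,500", "$11,000", "$9,900"],
    }
)
result = annual_returns_from_growth(demo)
print(result)
```

With this approach the final year in a table is a year-to-date figure whenever the table ends before December, so it should be labelled accordingly before being stored.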

In [7]:
tickers = []
unique_funds = []
duplicates = 0

for fund in funds_total:
    if fund.ticker in tickers:
        print(f"DUPLICATE: {fund.name} ({fund.ticker})")
        duplicates += 1
    else:
        tickers.append(fund.ticker)
        unique_funds.append(fund)

# Replace the original list
funds_total = unique_funds
print(f"Removed {duplicates} duplicates")
print(f"Remaining funds: {len(funds_total)}")

DUPLICATE: iShares 0-5 Year High Yield Corporate Bond ETF (SHYG)
DUPLICATE: iShares 0-5 Year Investment Grade Corporate Bond ETF (SLQD)
DUPLICATE: iShares 1-3 Year International Treasury Bond ETF (ISHG)
DUPLICATE: iShares 20+ Year Treasury Bond BuyWrite Strategy ETF (TLTW)
DUPLICATE: iShares Aaa - A Rated Corporate Bond ETF (QLTA)
DUPLICATE: iShares BB Rated Corporate Bond ETF (HYBB)
DUPLICATE: iShares Broad USD High Yield Corporate Bond ETF (USHY)
DUPLICATE: iShares CMBS ETF (CMBS)
DUPLICATE: iShares Convertible Bond ETF (ICVT)
DUPLICATE: iShares Core 1-5 Year USD Bond ETF (ISTB)
DUPLICATE: iShares Core International Aggregate Bond ETF (IAGG)
DUPLICATE: iShares ESG Advanced High Yield Corporate Bond ETF (HYXF)
DUPLICATE: iShares Fallen Angels USD Bond ETF (FALN)
DUPLICATE: iShares Floating Rate Bond ETF (FLOT)
DUPLICATE: iShares GNMA Bond ETF (GNMA)
DUPLICATE: iShares High Yield Corporate Bond BuyWrite Strategy ETF (HYGW)
DUPLICATE: iShares iBonds 2024 Term High Yield and Income ETF (
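
The same order-preserving de-duplication can be written with a set for membership checks, which avoids rescanning a list for every fund. This is a minimal sketch; the `Fund` dataclass is a hypothetical stand-in for the notebook's fund objects and carries only the fields needed here.

```python
from dataclasses import dataclass


@dataclass
class Fund:
    """Hypothetical stand-in for the notebook's fund objects."""
    ticker: str
    name: str


def dedupe_by_ticker(funds):
    """Keep the first fund seen for each ticker, preserving input order."""
    seen = set()
    unique = []
    for fund in funds:
        if fund.ticker in seen:
            continue  # a fund with this ticker was already kept
        seen.add(fund.ticker)
        unique.append(fund)
    return unique


funds = [
    Fund("SHYG", "first SHYG entry"),
    Fund("SLQD", "first SLQD entry"),
    Fund("SHYG", "second SHYG entry"),
]
unique = dedupe_by_ticker(funds)
print([f.ticker for f in unique])
```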

In [8]:
import pickle
from pathlib import Path
import sys

# Add RAG directory to path
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))

# Define pickle file path
PKL_PATH = Path("./funds_backup.pkl")

print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())

# Save to pickle file
try:
    with PKL_PATH.open("wb") as f:
        pickle.dump(funds_total, f)
    
    print(f"Successfully saved {len(funds_total)} funds to pickle file")
    print(f"File size: {PKL_PATH.stat().st_size / 1024:.2f} KB")
    
except Exception as e:
    print(f"Error saving to pickle file: {e}")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Successfully saved 430 funds to pickle file
File size: 4104.13 KB


In [1]:
import pickle
from pathlib import Path
from dataclasses import is_dataclass, asdict
import pandas as pd
import sys
RAG_DIR = Path("/home/alvar/CascadeProjects/windsurf-project/RAG")
if str(RAG_DIR) not in sys.path:
    sys.path.insert(0, str(RAG_DIR))


PKL_PATH = Path("./funds_backup.pkl")
print("Current working directory:", Path.cwd())
print("PKL_PATH resolves to:", PKL_PATH.resolve())
with PKL_PATH.open("rb") as f:
    funds_total = pickle.load(f)

print(f"Loaded {len(funds_total)} funds from pickle file")

Current working directory: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks
PKL_PATH resolves to: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
Loaded 383 funds from pickle file


## Summary Prospectus

In [None]:
from edgar import Company, set_identity
import pandas as pd
from typing import List, Dict
import sys
from tqdm import tqdm
from IPython.display import display, Markdown
from src.simple_rag.extraction.general_info import FundInfoExtractor
from pathlib import Path

set_identity('luis.alvarez.conde@alumnos.upm.es')
tickers = ["VOO", "MGK", "HEZU", "VMGRX", "VDIGX"]

for ticker in tickers:
    company = Company(ticker)
    processed_funds = []
    filings = company.get_filings(form="497K")

    for filing in filings:
        text = filing.text()
        extractor = FundInfoExtractor(text, ticker=ticker)
        fund_data = extractor.get_structured_data()
        if fund_data['ticker'] in processed_funds:
            print("First duplicate: ", fund_data['ticker'])
            break
        
        processed_funds.append(fund_data['ticker'])
        md = extractor.get_clean_markdown()
        for fund in funds_total:
            if fund.ticker == fund_data['ticker']:
                fund.summary_prospectus = md
                fund.managers = fund_data['managers']
                fund.strategies = fund_data['strategies']
                fund.risks = fund_data['risks']
                fund.objective = fund_data['objective']
                break
        
    print("Processed funds: ", len(processed_funds), "for ticker: ", ticker)





AttributeError: 'FundData' object has no attribute 'ticker'

## NPORT (Portfolio Composition)

In [2]:
from edgar import Company, set_identity
import pandas as pd
from typing import List, Dict
import sys
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed
import multiprocessing as mp
from threading import Lock
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
from src.simple_rag.extraction.nport import NPortProcessor
from src.simple_rag.models.fund import PortfolioHolding, Derivatives, NonDerivatives
from pathlib import Path

company_json_path = Path("/home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/sec_data/company_tickers.json")

set_identity('luis.alvarez.conde@alumnos.upm.es')

def process_single_filing(filing, ticker, company_json_path):
    """Process a single filing - can be parallelized"""
    try:
        xml_data = filing.obj()
        fund_name = xml_data.get_fund_series().name
        reporting_period = xml_data.reporting_period
        portfolio_list = xml_data.investments
        derivatives = xml_data.derivatives
        # Process holdings
        proc = NPortProcessor(company_tickers_json_path=company_json_path, min_similarity=0.74)
        holdings = proc.process_holdings(portfolio_list)
        result = proc.enrich_tickers(holdings, verbose=False)  # Set verbose=False to reduce I/O
        
        not_matches = result[result['matched_ticker'].isna() | (result['matched_ticker'] == '')]
        
        return {
            'fund_name': fund_name,
            'reporting_period': reporting_period,
            'holdings': holdings,
            'result': result,
            'derivatives': derivatives,
            'not_matches': not_matches,
            'ticker': ticker,
            'report_date': filing.report_date
        }
    except Exception as e:
        print(f"Error processing filing for {ticker}: {e}")
        return None

def process_ticker(ticker, company_json_path):
    """Process all filings for a single ticker - SEQUENTIAL within ticker"""
    try:
        nport_file = Company(ticker)
        filings = sorted(nport_file.get_filings(form="NPORT-P"), 
                        key=lambda x: x.report_date, reverse=True)
        
        if not filings:
            print(f"No filings found for {ticker}")
            return None
            
        print(f"Processing ticker: {ticker}, most recent filing date: {filings[0].report_date}")
        
        funds_processed_set = set()
        ticker_results = []
        
        # Process filings SEQUENTIALLY for this ticker (to respect the stop condition)
        for filing in filings:
            result = process_single_filing(filing, ticker, company_json_path)
            
            if result is not None:
                # Check if we've already processed this fund
                if result['fund_name'].lower() in funds_processed_set:
                    print(f"Stopping - already processed fund: {result['fund_name']}")
                    break
                
                funds_processed_set.add(result['fund_name'].lower())
                ticker_results.append(result)
                
                print(f"{ticker} - Fund: {result['fund_name']}, Holdings: {len(result['holdings'])}, Unmatched: {len(result['not_matches'])}")
        
        return {
            'ticker': ticker,
            'results': ticker_results,
            'funds_processed': list(funds_processed_set)
        }
    
    except Exception as e:
        print(f"Error processing ticker {ticker}: {e}")
        return None

# Main execution - PARALLEL across tickers only
tickers = ["VOO", "MGK", "HEZU", "VMGRX", "VDIGX"]

# Use fewer workers to avoid overwhelming the system
max_workers = min(5, len(tickers))  # At most 5 workers, one per ticker
print(f"Using {max_workers} workers for tickers")

all_results = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_ticker = {
        executor.submit(process_ticker, ticker, company_json_path): ticker 
        for ticker in tickers
    }
    
    for future in tqdm(as_completed(future_to_ticker), total=len(tickers), desc="Processing tickers"):
        ticker = future_to_ticker[future]
        try:
            result = future.result()
            if result:
                all_results.append(result)
                print(f"\nCompleted {ticker}: {len(result['funds_processed'])} funds processed")
        except Exception as e:
            print(f"Error with ticker {ticker}: {e}")

# Update funds_total object with the results
print("\n=== Updating funds_total ===")
for ticker_result in all_results:
    for filing_result in ticker_result['results']:
        fund_name = filing_result['fund_name']
        reporting_period = filing_result['reporting_period']
        holdings = filing_result['holdings']
        derivatives = filing_result['derivatives']
        
        # Update your funds_total structure
        for fund in funds_total:
            if fund_name.lower() == fund.name.lower():
                print(f"Updating fund: {fund.name}")
                
                fund.non_derivatives = NonDerivatives(
                    date=reporting_period,
                    holdings_df=holdings
                )
                fund.derivatives = Derivatives(
                    date=reporting_period,
                    derivatives_df=derivatives
                )
                break

print("\n=== Processing Complete ===")
print(f"Total tickers processed: {len(all_results)}")
for result in all_results:
    print(f"{result['ticker']}: {len(result['funds_processed'])} funds")

Using 5 workers for tickers


Processing tickers:   0%|          | 0/5 [00:00<?, ?it/s]

Processing ticker: MGK, most recent filing date: 2025-09-30
MGK - Fund: VANGUARD MEGA CAP GROWTH INDEX FUND, Holdings: 70, Unmatched: 3
Processing ticker: VMGRX, most recent filing date: 2025-10-31
Processing ticker: VOO, most recent filing date: 2025-09-30
Processing ticker: VDIGX, most recent filing date: 2025-10-31
Processing ticker: HEZU, most recent filing date: 2025-10-31
VDIGX - Fund: VANGUARD DIVIDEND GROWTH FUND, Holdings: 56, Unmatched: 3
VMGRX - Fund: VANGUARD SELECTED VALUE FUND, Holdings: 130, Unmatched: 2
VDIGX - Fund: VANGUARD ENERGY FUND, Holdings: 42, Unmatched: 6
VOO - Fund: VANGUARD MID-CAP VALUE INDEX FUND, Holdings: 186, Unmatched: 3
VDIGX - Fund: VANGUARD GLOBAL CAPITAL CYCLES FUND, Holdings: 75, Unmatched: 16
HEZU - Fund: iShares MSCI EAFE Min Vol Factor ETF, Holdings: 240, Unmatched: 115
VDIGX - Fund: VANGUARD HEALTH CARE FUND, Holdings: 99, Unmatched: 5
MGK - Fund: VANGUARD FTSE SOCIAL INDEX FUND, Holdings: 417, Unmatched: 5
MGK - Fund: VANGUARD COMMUNICATION S

Processing tickers:  20%|██        | 1/5 [00:35<02:22, 35.56s/it]

Stopping - already processed fund: VANGUARD ENERGY FUND

Completed VDIGX: 7 funds processed
MGK - Fund: VANGUARD MEGA CAP VALUE INDEX FUND, Holdings: 128, Unmatched: 4
MGK - Fund: VANGUARD EXTENDED DURATION TREASURY INDEX FUND, Holdings: 83, Unmatched: 1
MGK - Fund: VANGUARD INDUSTRIALS INDEX FUND, Holdings: 391, Unmatched: 6
MGK - Fund: VANGUARD CONSUMER STAPLES INDEX FUND, Holdings: 113, Unmatched: 5
MGK - Fund: VANGUARD UTILITIES INDEX FUND, Holdings: 73, Unmatched: 3
VMGRX - Fund: VANGUARD EMERGING MARKETS GOVERNMENT BOND INDEX FUND, Holdings: 841, Unmatched: 129
VMGRX - Fund: VANGUARD ADVICE SELECT GLOBAL VALUE FUND, Holdings: 107, Unmatched: 17
HEZU - Fund: iShares MSCI EAFE Small-Cap ETF, Holdings: 2029, Unmatched: 1161
VMGRX - Fund: VANGUARD INTERNATIONAL DIVIDEND APPRECIATION INDEX FUND, Holdings: 349, Unmatched: 157
VMGRX - Fund: VANGUARD INTERNATIONAL EXPLORER FUND, Holdings: 340, Unmatched: 157
HEZU - Fund: iShares CMBS ETF, Holdings: 521, Unmatched: 206
HEZU - Fund: iShare

Processing tickers:  40%|████      | 2/5 [02:43<04:29, 89.68s/it]

Stopping - already processed fund: VANGUARD ADVICE SELECT INTERNATIONAL GROWTH FUND

Completed VMGRX: 12 funds processed
VOO - Fund: VANGUARD SMALL-CAP INDEX FUND, Holdings: 1335, Unmatched: 15
HEZU - Fund: iShares iBonds Dec 2028 Term Corporate ETF, Holdings: 698, Unmatched: 56
HEZU - Fund: iShares Treasury Floating Rate Bond ETF, Holdings: 10, Unmatched: 0
MGK - Fund: VANGUARD ESG U.S. CORPORATE BOND ETF, Holdings: 2772, Unmatched: 105
VOO - Fund: VANGUARD LARGE-CAP INDEX FUND, Holdings: 458, Unmatched: 6
HEZU - Fund: iShares iBonds Dec 2034 Term Corporate ETF, Holdings: 375, Unmatched: 23
MGK - Fund: VANGUARD FINANCIALS INDEX FUND, Holdings: 418, Unmatched: 4
HEZU - Fund: iShares International Equity Factor ETF, Holdings: 472, Unmatched: 224
VOO - Fund: VANGUARD SMALL-CAP VALUE INDEX FUND, Holdings: 849, Unmatched: 10
HEZU - Fund: iShares 1-3 Year International Treasury Bond ETF, Holdings: 167, Unmatched: 155
HEZU - Fund: iShares iBonds Dec 2034 Term Treasury ETF, Holdings: 5, Unmat

Processing tickers:  60%|██████    | 3/5 [06:23<04:58, 149.36s/it]

Stopping - already processed fund: VANGUARD EXTENDED MARKET INDEX FUND

Completed VOO: 12 funds processed
MGK - Fund: VANGUARD ESG U.S. STOCK ETF, Holdings: 1330, Unmatched: 14
MGK - Fund: VANGUARD ENERGY INDEX FUND, Holdings: 117, Unmatched: 3
MGK - Fund: VANGUARD INTERNATIONAL GROWTH FUND, Holdings: 128, Unmatched: 38


Processing tickers:  80%|████████  | 4/5 [06:33<01:34, 94.47s/it] 

Stopping - already processed fund: VANGUARD MEGA CAP GROWTH INDEX FUND

Completed MGK: 22 funds processed
HEZU - Fund: iShares Core 1-5 Year USD Bond ETF, Holdings: 7008, Unmatched: 1674
HEZU - Fund: iShares Core MSCI Pacific ETF, Holdings: 1367, Unmatched: 677
HEZU - Fund: iShares Environmentally Aware Real Estate ETF, Holdings: 356, Unmatched: 109
HEZU - Fund: iShares Floating Rate Bond ETF, Holdings: 476, Unmatched: 108
HEZU - Fund: iShares Core International Aggregate Bond ETF, Holdings: 7102, Unmatched: 4371
HEZU - Fund: iShares Aaa - A Rated Corporate Bond ETF, Holdings: 3361, Unmatched: 210
HEZU - Fund: iShares Russell 2000 BuyWrite ETF, Holdings: 3, Unmatched: 2
HEZU - Fund: iShares MSCI ACWI Low Carbon Target ETF, Holdings: 981, Unmatched: 233
HEZU - Fund: iShares Global Equity Factor ETF, Holdings: 630, Unmatched: 184
HEZU - Fund: iShares iBonds Dec 2030 Term Corporate ETF, Holdings: 714, Unmatched: 47
HEZU - Fund: iShares Core MSCI International Developed Markets ETF, Holdin

Processing tickers: 100%|██████████| 5/5 [21:03<00:00, 252.79s/it]

Stopping - already processed fund: iShares iBonds 2032 Term High Yield and Income ETF

Completed HEZU: 356 funds processed

=== Updating funds_total ===
Updating fund: Vanguard Dividend Growth Fund
Updating fund: Vanguard Energy Fund
Updating fund: Vanguard Global Capital Cycles Fund
Updating fund: Vanguard Health Care Fund
Updating fund: Vanguard Real Estate Index Fund
Updating fund: Vanguard Global ESG Select Stock Fund
Updating fund: Vanguard Dividend Appreciation Index Fund
Updating fund: Vanguard Selected Value Fund
Updating fund: Vanguard High Dividend Yield Index Fund
Updating fund: Vanguard Mid-Cap Growth Fund
Updating fund: Vanguard International Dividend Growth Fund
Updating fund: Vanguard Advice Select Dividend Growth Fund
Updating fund: Vanguard Emerging Markets Government Bond Index Fund
Updating fund: Vanguard Advice Select Global Value Fund
Updating fund: Vanguard International Dividend Appreciation Index Fund
Updating fund: Vanguard International Explorer Fund
Updating 




In [None]:
from edgar import Company, set_identity
import pandas as pd
from typing import List, Dict
import sys
from tqdm import tqdm

%load_ext autoreload
%autoreload 2
%reload_ext autoreload
from src.simple_rag.extraction.nport import NPortProcessor
from src.simple_rag.models.fund import PortfolioHolding, Derivatives, NonDerivatives
from pathlib import Path

company_json_path = Path("/home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/sec_data/company_tickers.json")

# 1. Initialize the Fund (can use Ticker or CIK)

set_identity('luis.alvarez.conde@alumnos.upm.es')
tickers = ["VOO", "MGK", "HEZU", "VMGRX", "VDIGX"]
for ticker in tickers:

    nport_file = Company(ticker)
    filings = sorted(nport_file.get_filings(form="NPORT-P"), key=lambda x: x.report_date, reverse=True)
    print(f"Processing ticker: {ticker}, most recent filing date: {filings[0].report_date}")
    funds_processed = []
    for filing in filings:

        print("Processing filing with date:", filing.report_date)
        xml_data = filing.obj() 
        # Show all attributes (filtering out internal python methods starting with __)
        #print([attr for attr in dir(xml_data) if not attr.startswith('__')])
        
        fund_name = xml_data.get_fund_series().name
        if fund_name.lower() in funds_processed:
            print("Last fund processed: ", fund_name)
            break
        print("Fund name:", fund_name)
        
        reporting_period = xml_data.reporting_period
        print("Reporting period:", reporting_period)
        
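        portfolio_list = xml_data.investments
        derivatives = xml_data.derivatives  # needed below; otherwise a stale value from an earlier cell is used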
        
        proc = NPortProcessor(company_tickers_json_path=company_json_path, min_similarity=0.74)
        holdings = proc.process_holdings(portfolio_list)
        result = proc.enrich_tickers(holdings, verbose=True)
        
        print("Number of holdings:", len(holdings))
        # This method maps the title of the company to the ticker
        
        for fund in funds_total:
            if fund_name.lower() == fund.name.lower():
                print(f"Found fund: {fund.name}")
                funds_processed.append(fund.name.lower())
                fund.derivatives = Derivatives(
                    date=reporting_period,
                    derivatives_df=derivatives
                )
                fund.non_derivatives = NonDerivatives(
                    date=reporting_period,
                    holdings_df=holdings
                )
                break
        
        not_matches = result[result['matched_ticker'].isna() | (result['matched_ticker'] == '')]
        print(f"Number of unmatched holdings: {len(not_matches)}")
        print(not_matches.head())

    print(len(funds_processed))


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Processing ticker: VOO, most recent filing date: 2025-09-30
Processing filing with date: 2025-09-30
Fund name: VANGUARD MID-CAP VALUE INDEX FUND
Reporting period: 2025-09-30
Number of matched holdings: 181
Number of holdings: 186
Found fund: Vanguard Mid-Cap Value Index Fund
Number of unmatched holdings: 3
                                          holding_name ticker_before  \
147                                    Schlumberger NV          None   
173  Vanguard Cmt Funds-Vanguard Market Liquidity Fund          None   
183  Vanguard Cmt Funds-Vanguard Market Liquidity Fund          None   

    ticker_after matched_ticker matched_title  similarity  updated  
147         None           None          None    0.608696    False  
173         None           None          None    0.297297    False  
183         None           None          None    0.297297    False  
Processing filing with date: 2025-09-30
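
The `similarity` column and the `min_similarity=0.74` threshold suggest that holding names are matched to company titles by a string-similarity score. How `NPortProcessor` computes that score is not shown here, so the block below is only a sketch of one common approach, using `difflib.SequenceMatcher` from the standard library. The names `normalize`, `best_match`, `SUFFIXES`, and the small title map are assumptions made for this illustration.

```python
import re
from difflib import SequenceMatcher

# Legal-form suffixes dropped before comparison (illustrative, not exhaustive).
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "limited", "nv", "plc"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(holding_name, title_to_ticker, min_similarity=0.74):
    """Return (ticker, score) for the closest title, or (None, score) below the threshold."""
    target = normalize(holding_name)
    best_ticker, best_score = None, 0.0
    for title, ticker in title_to_ticker.items():
        candidate = normalize(title)
        if not target or not candidate:
            continue  # nothing left to compare after normalization
        score = SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:
            best_ticker, best_score = ticker, score
    if best_score < min_similarity:
        return None, best_score
    return best_ticker, best_score


# Hypothetical title map in the spirit of company_tickers.json.
titles = {"Apple Inc.": "AAPL", "Microsoft Corp": "MSFT", "Schlumberger Limited": "SLB"}
print(best_match("APPLE INC", titles))
print(best_match("Schlumberger NV", titles))
print(best_match("Vanguard Market Liquidity Fund", titles))
```

Normalizing legal-form suffixes before scoring is one way to help names that differ only in their legal form; internal cash vehicles such as money market funds have no listed counterpart and are expected to stay unmatched.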

In [4]:
count = 0
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
from src.simple_rag.extraction.nport import NPortProcessor

processor = NPortProcessor()
df = processor.to_df(holdings)

for fund in funds_total:
    if fund.non_derivatives is not None:
        print(fund.derivatives)
        df = processor.to_df(fund.non_derivatives.holdings_df)
        fund.non_derivatives.holdings_df = df
        count += 1
        print(df.head())
        break

        
print(f"Found {count} funds with non-derivatives data")
print(f"Total funds processed: {len(funds_total)}")

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Derivatives(date='2025-09-30', derivatives_df=[InvestmentOrSecurity(name='N/A', lei='N/A', title='S&P MID 400 EMINI Dec25', cusip='N/A', identifiers=Identifiers(ticker='FAZ5', isin=None, other={}), balance=Decimal('39.00000000'), units='NC', desc_other_units=None, currency_code='USD', currency_conditional_code=None, exchange_rate=None, value_usd=Decimal('-71676.18000000'), pct_value=Decimal('-0.00008568564'), payoff_profile='N/A', asset_category='DE', issuer_category='OTHER', investment_country='N/A', is_restricted_security=False, fair_value_level='1', debt_security=None, security_lending=SecurityLending(is_cash_collateral='N', is_non_cash_collateral='N', is_loan_by_fund='N'), derivative_info=DerivativeInfo(derivative_category='FUT', forward_derivative=None, swap_derivative=None, future_derivative=FutureDerivative(counterparty_name='MORGAN STANLEY & CO LLC', counterparty_lei='9R7GPTSO7KV3UQJZQ078', 

In [4]:
import pickle
from pathlib import Path

PKL_PATH = Path("./funds_backup.pkl")
TMP_PATH = PKL_PATH.with_suffix(PKL_PATH.suffix + ".tmp")

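# Write to a temporary file first so that a failed or interrupted dump
# does not overwrite an existing backup with a truncated pickle.
with TMP_PATH.open("wb") as f:
    pickle.dump(funds_total, f, protocol=pickle.HIGHEST_PROTOCOL)

# Move the finished temporary file over the final path.
TMP_PATH.replace(PKL_PATH)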

print(f"Saved {len(funds_total)} funds to pickle file: {PKL_PATH.resolve()}")

Saved 383 funds to pickle file: /home/alvar/CascadeProjects/windsurf-project/RAG/notebooks/funds_backup.pkl
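
To pick the work up in a later session, the backup can be read back with `pickle.load`. The snippet below is a minimal sketch, not part of the original notebook: `load_funds` is a hypothetical helper name, and it assumes the classes stored in the pickle (the fund objects from `src.simple_rag`) are importable in the session that loads it. Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_funds(pkl_path: Path) -> list:
    """Read back the object that was written with pickle.dump (hypothetical helper)."""
    with pkl_path.open("rb") as f:
        return pickle.load(f)


# Example usage, assuming the backup written by the cell above exists:
# funds_total = load_funds(Path("./funds_backup.pkl"))
# print(f"Loaded {len(funds_total)} funds")
```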


## Processing Phase

### General information about the fund

In [None]:
# Normalize the share class labels so that equivalent classes share one name.
for fund in funds_total:
    if "ETF" in fund.name:
        fund.share_class = "ETF Shares"
    elif "Institutional Select Share Class" in fund.share_class:
        fund.share_class = "Institutional Select Shares"
    elif "™" in fund.share_class:
        fund.share_class = fund.share_class.replace("™", "")

# List the distinct labels after normalization.
unique_share_classes = {fund.share_class for fund in funds_total}
print("Unique Share Classes:")
for share_class in unique_share_classes:
    print(share_class)

Unique Share Classes:
Institutional Shares
Institutional Select Shares
ETF Shares
Admiral Shares
Institutional Plus Shares
Investor Shares
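
The normalization rules in the cell above can also be written as a pure function, which makes them easy to check on a few inputs before touching `funds_total`. This is a sketch under assumptions: `normalize_share_class` is a hypothetical name, the sample inputs are invented for illustration, and the extra `.strip()` is an addition that the original loop does not perform.

```python
def normalize_share_class(fund_name: str, share_class: str) -> str:
    """Return a cleaned share class label (hypothetical helper mirroring the loop above)."""
    if "ETF" in fund_name:
        return "ETF Shares"
    if "Institutional Select Share Class" in share_class:
        return "Institutional Select Shares"
    # Remove the trademark symbol; .strip() is an assumption added here
    # to drop any leading or trailing whitespace.
    return share_class.replace("™", "").strip()


# Invented inputs, for illustration only.
print(normalize_share_class("Example Index Fund", "Admiral™ Shares"))
print(normalize_share_class("Example Growth ETF", "ETF Shares"))
print(normalize_share_class("Example Fund", "Institutional Select Share Class"))
```

Applied to the notebook data, this would be `fund.share_class = normalize_share_class(fund.name, fund.share_class)` inside the loop.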


Next, I build a list of dictionaries that holds a description and additional information for each share class.

In [None]:
share_classes_data = [
    {
        "name": "Admiral Shares",
        "clean_name": "Admiral Shares",
        "description": (
            "Admiral Shares are Vanguard’s main retail mutual fund share class and typically "
            "have lower expense ratios than legacy Investor Shares. They are intended for "
            "long-term individual investors and usually require a few thousand dollars as a "
            "minimum investment (often around $3,000, though it varies by fund)."
        ),
    },
    {
        "name": "Investor Shares",
        "clean_name": "Investor Shares",
        "description": (
            "Investor Shares are Vanguard’s legacy entry-level mutual fund share class. They "
            "generally have higher expense ratios than Admiral Shares and historically had "
            "lower minimum investments (often around $1,000–$3,000, depending on the fund). "
            "In many cases they are closed to new investors and may be automatically converted "
            "to Admiral Shares once the account balance meets the Admiral minimum."
        ),
    },
    {
        "name": "ETF Shares",
        "clean_name": "ETF Shares",
        "description": (
            "ETF (Exchange-Traded Fund) Shares trade on stock exchanges throughout the day like "
            "a stock. They generally provide the same portfolio exposure as a corresponding "
            "mutual fund share class but offer intraday liquidity. The minimum investment is "
            "the market price of one share (often tens to a few hundred dollars). ETFs are "
            "often more tax-efficient than mutual funds due to the in-kind creation/redemption "
            "mechanism, though investors may face bid–ask spreads."
        ),
    },
    {
        "name": "Institutional Shares",
        "clean_name": "Institutional Shares",
        "description": (
            "Institutional Shares are designed for large investors such as retirement plans, "
            "advisors, endowments, and other institutions. They typically have very low expense "
            "ratios and usually require large minimum investments, commonly in the millions of "
            "dollars (the exact amount varies by fund). Most individuals cannot buy them directly "
            "unless they access them through an employer plan or an institutional platform."
        ),
    },
    {
        "name": "Institutional Plus Shares",
        "clean_name": "Institutional Plus Shares",
        "description": (
            "Institutional Plus Shares are a higher tier of institutional pricing with even lower "
            "expense ratios, generally available only to very large investors. Minimums are typically "
            "in the tens to hundreds of millions of dollars, depending on the fund and access channel."
        ),
    },
    {
        "name": "Institutional Select Shares",
        "clean_name": "Institutional Select Shares",
        "description": (
            "Institutional Select Shares are among the lowest-cost share classes and are typically "
            "available only through very large institutional relationships (often aggregate or negotiated "
            "minimums, potentially in the hundreds of millions to billions). Pricing and availability are "
            "often bespoke and not consistently offered or publicly listed across all funds."
        ),
    },
]
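
One possible way to use this list is to index it by `name` so that each fund's `share_class` can be mapped to its description. The sketch below is an assumption about how the data might be consumed, not the notebook's own method: `build_share_class_index` and `describe_share_class` are hypothetical helpers, and `example_records` is a shortened stand-in for `share_classes_data` with placeholder descriptions.

```python
from typing import Dict, List, Optional


def build_share_class_index(records: List[dict]) -> Dict[str, dict]:
    """Index the share class records by their 'name' field (hypothetical helper)."""
    return {record["name"]: record for record in records}


def describe_share_class(index: Dict[str, dict], share_class: str) -> Optional[str]:
    """Return the description for a share class, or None if it is not listed."""
    record = index.get(share_class)
    return record["description"] if record is not None else None


# Shortened stand-in for share_classes_data, with placeholder descriptions.
example_records = [
    {"name": "ETF Shares", "clean_name": "ETF Shares",
     "description": "Placeholder description for ETF Shares."},
    {"name": "Admiral Shares", "clean_name": "Admiral Shares",
     "description": "Placeholder description for Admiral Shares."},
]

index = build_share_class_index(example_records)
print(describe_share_class(index, "ETF Shares"))
print(describe_share_class(index, "Unknown Shares"))
```

Returning `None` for an unlisted class keeps the lookup from raising when a fund carries a label that was not normalized.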


### Annual Returns

### Geographic Allocation

### Top Holdings


### Sector Allocation