<a href="https://colab.research.google.com/github/NK-Mikey/Data_Analysis/blob/main/Project_AR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project: Automate Report Process**

This project automates the end-to-end creation and delivery of a financial analytics report using Python. The system pulls recent market data from Yahoo Finance, processes it through a reproducible data pipeline, generates professional visualizations and risk metrics, and automatically emails a PDF report on a schedule.

## 1. GDrive Connection

In [5]:
# Connect google drive for storage
from google.colab import drive
import os

if not os.path.exists('/content/drive'):
    drive.mount('/content/drive')

Google Drive is mounted so that the PDF report produced by this notebook can be saved there. The mount is skipped if `/content/drive` already exists.



## 2. Configuration

In [6]:
# Configure Tickers & Weights
Tickers = ["AAPL", "MSFT", "SPY"] # Portfolio + Benchmark
Weights = {"AAPL": 0.4, "MSFT": 0.4, "SPY": 0.2} # Ensuring all the weights sum to 1
Lookback_days = 365 * 3 # Length of data retrieval window, in days
Report_dir = "/content/drive/MyDrive/Portfolio_reports"

1. We choose Apple (AAPL), Microsoft (MSFT), and SPY as the portfolio holdings; SPY also serves as the benchmark.

    > SPY is an exchange-traded fund (ETF) that tracks the S&P 500 index, which represents 500 of the largest publicly traded companies in the U.S.

2. We assign 40% to Apple, 40% to Microsoft, and 20% to SPY so the script can calculate how much each stock contributes to our portfolio’s total performance.

3. We limit how many days of stock prices we download. In this case, we only retrieve the last 3 years of data.

4. We save the final PDF report from this project into Google Drive.
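
The configuration above relies on the weights summing to 1 and on every ticker having a weight. A minimal sketch of how that could be checked up front is shown below; the helper name `check_weights` and the tolerance are assumptions for illustration and are not part of the original notebook.

```python
import math

# Same values as the configuration cell above
Tickers = ["AAPL", "MSFT", "SPY"]
Weights = {"AAPL": 0.4, "MSFT": 0.4, "SPY": 0.2}

def check_weights(tickers, weights, tol=1e-9):
    """Hypothetical helper: verify every ticker has a weight and the weights sum to 1."""
    missing = [t for t in tickers if t not in weights]
    if missing:
        raise ValueError(f"No weight configured for: {missing}")
    total = sum(weights[t] for t in tickers)
    # Compare with a tolerance because floating-point sums are rarely exact
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"Weights sum to {total}, expected 1.0")
    return total

total_weight = check_weights(Tickers, Weights)
```

Running the check right after the configuration cell would surface a mistyped weight before any data is downloaded.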

In [8]:
# Email Configuration (Colab Testing Only)
from getpass import getpass
SMTP_Server = "smtp.gmail.com"
SMTP_Port = 587
Email_User = "your_email@gmail.com"
Email_Pass = getpass("Email App Password: ")
Receiver_Email = "receiver_email@gmail.com"

Email App Password: ··········


1. Simple Mail Transfer Protocol (SMTP) is the standard protocol used to send emails either from a client to an email server or from one server to another.

    > Under the hood, Transmission Control Protocol (TCP) and Transport Layer Security (TLS) work together: TCP ensures the email is delivered reliably and in the correct order, while TLS encrypts the connection so the data remains protected from eavesdropping.

2. A port is a virtual “door” on a server that handles specific types of network traffic. Each port is assigned to a particular protocol or service, ensuring data reaches the correct destination.
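
To make the TLS step concrete, the sketch below builds a certificate-verifying TLS context with the standard-library `ssl` module and passes it to `starttls()`. The helper name `open_smtp_session` is hypothetical; it is only defined here, not called, so no network connection is made.

```python
import smtplib
import ssl

# Default context: verifies the server certificate and checks its hostname
tls_context = ssl.create_default_context()

def open_smtp_session(host, port, user, password, context=tls_context):
    """Hypothetical helper: connect over TCP, upgrade to TLS, then authenticate."""
    server = smtplib.SMTP(host, port, timeout=30)
    try:
        server.starttls(context=context)  # encrypt before any credentials are sent
        server.login(user, password)
    except Exception:
        server.close()  # do not leave the socket open on failure
        raise
    return server
```

Passing an explicit context is optional, but it makes the certificate-verification behavior visible in the code rather than implicit.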

In [9]:
# Testing to see if the email configuration works
import smtplib
from email.message import EmailMessage

# Create the Email
Msg = EmailMessage()
Msg["Subject"] = "Colab SMTP Test"
Msg["From"] = Email_User
Msg["To"] = Receiver_Email
Msg.set_content("This is a test email sent from Google Colab.")

# Send the Email
try:
  # The context manager closes the connection even if a step fails
  with smtplib.SMTP(SMTP_Server, SMTP_Port) as server:
    server.starttls()  # Upgrade to TLS before logging in
    server.login(Email_User, Email_Pass)
    server.send_message(Msg)
  print("Email sent successfully.")
except Exception as e:
  print("Error sending email:", e)

Email sent successfully.


1. Using the `EmailMessage` class, we build a test email in this environment before integrating the step into GitHub Actions for full automation.

    > Note: Always call `starttls()` before `login()` so that the credentials are sent over an encrypted connection.

2. Any exception raised while connecting, authenticating, or sending is caught by `except Exception as e` and printed to help diagnose the problem.
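
When this step moves from Colab to an automated runner, an interactive `getpass` prompt is no longer available. One possible approach, sketched below under assumptions, is to read the settings from environment variables. The variable names `EMAIL_USER`, `EMAIL_PASS`, and `RECEIVER_EMAIL` and the helper `load_email_settings` are hypothetical and do not come from the original notebook.

```python
import os

def load_email_settings(env=os.environ):
    """Hypothetical helper: read email settings from environment variables."""
    required = ["EMAIL_USER", "EMAIL_PASS", "RECEIVER_EMAIL"]  # assumed names
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a later SMTP login error
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}

# Example with an explicit mapping instead of the real environment
example_settings = load_email_settings({
    "EMAIL_USER": "your_email@gmail.com",
    "EMAIL_PASS": "app-password-placeholder",
    "RECEIVER_EMAIL": "receiver_email@gmail.com",
})
```

Accepting the mapping as a parameter keeps the helper easy to test without touching the real environment.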



## 3. Data Ingestion

In [10]:
# Import Libraries
import yfinance as yf
import pandas as pd
import numpy as np
import os
from datetime import datetime, timedelta

# Create a directory for reports
os.makedirs(Report_dir, exist_ok=True)

# Define the date range
end = datetime.now()
start = end - timedelta(days = Lookback_days)

# Define a function to fetch stock prices
def fetch_prices(tickers, start, end):
  data = {}
  for t in tickers:
    df = yf.download(t, start = start.strftime("%Y-%m-%d"),
                     end = end.strftime("%Y-%m-%d"),
                     progress = False, auto_adjust = True)
    if df.empty:
      print(f"Warning: no data for {t}")
    else:
      data[t] = df
  return data

# Fetch prices using the function
prices = fetch_prices(Tickers, start, end)

# Quick sanity check
len(prices), list(prices.keys())

(3, ['AAPL', 'MSFT', 'SPY'])

1. The financial stock price data is retrieved from Yahoo Finance using the Python library `yfinance`.

2. The `os` module is used to create a folder for storing outputs (such as reports and charts), only if the folder does not already exist.

3. A date range is created by calculating the start date and end date using `datetime` and `timedelta`.

   > `datetime` provides the current date and time, while `timedelta` is used to perform date arithmetic, such as calculating the difference between two dates.

4. A function is defined to fetch historical stock price data from Yahoo Finance for the predefined tickers within the specified date range. For any ticker that returns no data, the function prints a warning and skips it; otherwise the ticker's DataFrame is stored in a dictionary named `data`, keyed by ticker symbol.

   > `auto_adjust=True` ensures that the fetched price data is automatically adjusted for stock splits and dividends, which is suitable for accurately calculating financial KPIs over time.

5. The function is then called, and a quick validation is performed by checking the number of tickers retrieved and listing their keys to ensure they match the predefined tickers.
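
The date arithmetic and the sanity check described above can be sketched without any network access. The helper names `date_window` and `missing_tickers` are assumptions for illustration, and the fixed end date is used only so the example is reproducible.

```python
from datetime import datetime, timedelta

def date_window(end, lookback_days):
    """Hypothetical helper: return (start, end) as YYYY-MM-DD strings."""
    start = end - timedelta(days=lookback_days)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")

def missing_tickers(requested, fetched):
    """Hypothetical helper: list requested tickers absent from the fetched dict."""
    return [t for t in requested if t not in fetched]

# Fixed end date for reproducibility; the notebook uses datetime.now()
start_str, end_str = date_window(datetime(2025, 12, 19), 365 * 3)

# Stand-in for the dictionary returned by fetch_prices (values omitted)
fetched_example = {"AAPL": None, "SPY": None}
not_fetched = missing_tickers(["AAPL", "MSFT", "SPY"], fetched_example)
```

Because `365 * 3` counts days rather than calendar years, the start date lands one day after the three-year anniversary whenever the window contains a leap day.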



## 4. Validation & Processing

In [11]:
# Define a function to extract only Close price for all tickers
def validate_and_align(prices_dict):

    # Create a dictionary of Close price series for each ticker
    close_dict = {}
    for ticker, df in prices_dict.items():

        # Extract Close column based on column type
        if isinstance(df.columns, pd.MultiIndex):
            # MultiIndex columns
            if "Close" in df.columns.get_level_values(0):
                close_dict[ticker] = df.xs("Close", level=0, axis=1)[ticker]
            else:
                print(f"{ticker} has no 'Close' in MultiIndex columns")
        else:
            # Single-level columns
            if "Close" in df.columns:
                close_dict[ticker] = df["Close"]
            else:
                print(f"{ticker} has no 'Close' column")

    # Convert dictionary to a single DataFrame
    close_df = pd.DataFrame(close_dict)

    # Drop rows where all values are NaN/empty
    close_df = close_df.dropna(how="all")

    # Filling small gaps using forward and backward fills
    close_df = close_df.ffill().bfill()

    return close_df

# Calling the function to extract only the Close prices for all tickers
close_prices = validate_and_align(prices)

# Display the dataframe
close_prices.tail()

| Date | AAPL | MSFT | SPY |
|---|---|---|---|
| 2025-12-12 | 278.279999 | 478.529999 | 681.76001 |
| 2025-12-15 | 274.109985 | 474.820007 | 680.72998 |
| 2025-12-16 | 274.609985 | 476.390015 | 678.869995 |
| 2025-12-17 | 271.839996 | 476.119995 | 671.400024 |
| 2025-12-18 | 272.190002 | 483.980011 | 676.469971 |


1. A function is defined to extract the “Close” price for each ticker from the `prices` dictionary and combine them into a single DataFrame.
    > The function is designed to handle both MultiIndex and single-level columns returned by `yfinance`. When the data uses a MultiIndex, the `.xs()` (cross-section) method is used to extract the `"Close"` price from the appropriate index level.
2. Rows where all ticker values are missing are removed using `dropna(how="all")`.
3. Small gaps in the data are filled using forward and backward fills.
    > Forward fill (`ffill`) propagates the last available value forward, while backward fill (`bfill`) fills any values still missing at the start of the series with the next available value.
4. Finally, the last five observations are displayed to validate the output.
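
The steps above can be illustrated on a small synthetic frame. The data below is made up purely for demonstration and only mimics a MultiIndex column layout with the field name at level 0 and the ticker at level 1.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=5, freq="D")

# Synthetic frame with MultiIndex columns: level 0 = field, level 1 = ticker
cols = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL"]])
raw_aapl = pd.DataFrame(
    [[np.nan, 10], [101.0, 11], [np.nan, 12], [102.0, 13], [103.0, 14]],
    index=dates, columns=cols,
)

# .xs() selects the "Close" field at level 0, leaving one column per ticker
aapl_close = raw_aapl.xs("Close", level=0, axis=1)["AAPL"]
msft_close = pd.Series([199.0, 200.0, np.nan, 201.0, np.nan], index=dates)

toy = pd.DataFrame({"AAPL": aapl_close, "MSFT": msft_close})

# Drop the row where every ticker is missing (2024-01-03)
toy = toy.dropna(how="all")

# Forward fill covers MSFT's trailing gap; backward fill covers AAPL's leading gap
filled = toy.ffill().bfill()
```

After these steps `filled` has four rows and no missing values: the trailing MSFT gap repeats 201.0 and the leading AAPL gap takes the next available value, 101.0.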