# Assessment Problems

# Problem 1: Data from yfinance

https://github.com/ranaroussi/yfinance


In [15]:
# Dates and times.
import datetime as dt

# Data frames.
import pandas as pd

# Operating system.
import os

# Yahoo finance data.
import yfinance as yf


In [16]:
# Tickers:
# A list of stock symbols used to find data from yfinance

# Get data:
# get_data is a small wrapper (defined below) around yf.download.
# period is how far back to fetch (e.g. "5d"); interval is the spacing between data points (e.g. "1h", "1d").

# Download data:
# yf.download fetches historical price data for one or more tickers.
# See: https://medium.com/%40anjalivemuri97/day-4-fetching-historical-stock-data-with-yfinance-f45f3bd8b9c6
# I pass auto_adjust=True explicitly to avoid the FutureWarning that yfinance emits when the argument is omitted.
# See: https://github.com/ranaroussi/yfinance/blob/0713d9386769b168926d3959efd8310b56a33096/yfinance/utils.py#L445-L462

# DataFrame:
# A DataFrame is a two-dimensional labelled table. It is widely used for data analysis, cleaning, and visualization, and supports filtering, sorting, and aggregation.
# See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

In [17]:
# Get historical data for multiple tickers at once:
tickers = ["META", "AAPL", "AMZN", "NFLX", "GOOGL"]

# Get data:
def get_data(tickers, period="5d", interval="1h"):
    # Download prices for all tickers in one call, grouped by ticker symbol.
    data = yf.download(tickers, period=period, interval=interval, group_by="ticker", auto_adjust=True)
    return data


df = get_data(tickers, period="5d", interval="1d")

[*********************100%***********************]  5 of 5 completed


In [18]:
# Saving data into csv file:
# See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html 

# Date time:
# Used to build a timestamp (date and time of the download) for the file name
# See: https://docs.python.org/3/library/datetime.html

In [19]:
from datetime import datetime

def save_data(df):
    folder = "data"
    os.makedirs(folder, exist_ok=True)

    # Generate timestamp filename
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S") 
    filename = f"{timestamp}.csv"

    # Full path
    filepath = os.path.join(folder, filename)

    # Save dataframe, keeping the Date index so the file can be read back with index_col=0
    df.to_csv(filepath)

    print(f"Saved file: {filepath}")
    return filepath
save_data(df)

Saved file: data\20251215-094702.csv


'data\\20251215-094702.csv'
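
To check that a file written this way can be read back, here is a minimal round-trip sketch. It uses a small made-up DataFrame with the same kind of (Ticker, Price) column MultiIndex instead of a real download, and the file name `example.csv` is only an illustration. It assumes the CSV is written with its Date index.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the downloaded data: two tickers, two price fields.
columns = pd.MultiIndex.from_product(
    [["AAPL", "META"], ["Open", "Close"]], names=["Ticker", "Price"]
)
index = pd.DatetimeIndex(["2025-12-08", "2025-12-09"], name="Date")
original = pd.DataFrame(
    [[278.13, 277.89, 669.34, 666.80],
     [278.16, 277.18, 663.77, 656.96]],
    index=index,
    columns=columns,
)

with tempfile.TemporaryDirectory() as folder:
    filepath = os.path.join(folder, "example.csv")  # illustrative name

    # Write the file with the Date index included.
    original.to_csv(filepath)

    # header=[0, 1] rebuilds the two column levels; index_col=0 restores the dates.
    restored = pd.read_csv(filepath, header=[0, 1], index_col=0, parse_dates=True)

print(restored)
```

If the index were left out of the file, `index_col=0` would treat the first price column as the row labels instead of the dates.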

# Problem 2: Plotting Data

In [20]:
import datetime as dt
import os
import matplotlib
matplotlib.use('Agg')  # Agg (Anti-Grain Geometry) is a non-interactive backend: figures are saved to files instead of being shown on screen
import matplotlib.pyplot as plt

In [21]:
# Display the first few rows of the DataFrame
# head() is a DataFrame method that returns the first n rows (5 by default).
# See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html

df.head()  

Ticker,AMZN,AMZN,AMZN,AMZN,AMZN,GOOGL,GOOGL,GOOGL,GOOGL,GOOGL,...,AAPL,AAPL,AAPL,AAPL,AAPL,META,META,META,META,META
Price,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume,...,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume
Date
2025-12-08,229.589996,230.830002,226.270004,226.889999,35019200,320.049988,320.440002,311.220001,313.720001,33909400,...,278.130005,279.670013,276.149994,277.890015,38211800,669.340027,676.710022,665.070007,666.799988,13161000
2025-12-09,226.839996,228.570007,225.110001,227.919998,25841700,312.369995,317.98999,311.899994,317.079987,30194000,...,278.160004,280.029999,276.920013,277.179993,32193300,663.77002,664.47998,653.340027,656.960022,12997100
2025-12-10,228.809998,232.419998,228.460007,231.779999,38790700,315.829987,321.309998,314.679993,320.209991,33428900,...,277.75,279.75,276.440002,278.779999,33038300,649.950012,654.51001,643.400024,650.130005,16910900
2025-12-11,230.710007,232.110001,228.690002,230.279999,28249600,320.079987,321.119995,308.600006,312.429993,42353700,...,279.100006,279.589996,273.809998,278.029999,33248000,643.289978,655.280029,640.799988,652.710022,13056700
2025-12-12,230.020004,230.080002,225.119995,226.190002,28000162,313.720001,314.850006,305.559998,309.290009,27122970,...,277.795013,279.220001,276.820007,278.279999,39532887,650.210022,711.0,638.609985,644.22998,10143568


In [22]:
# Display columns of the DataFrame
# columns is a property of the DataFrame class.
# It returns the column labels as a pandas Index (here a MultiIndex of (Ticker, Price) pairs).
# See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.columns.html

df.columns

MultiIndex([( 'AMZN',   'Open'),
            ( 'AMZN',   'High'),
            ( 'AMZN',    'Low'),
            ( 'AMZN',  'Close'),
            ( 'AMZN', 'Volume'),
            ('GOOGL',   'Open'),
            ('GOOGL',   'High'),
            ('GOOGL',    'Low'),
            ('GOOGL',  'Close'),
            ('GOOGL', 'Volume'),
            ( 'NFLX',   'Open'),
            ( 'NFLX',   'High'),
            ( 'NFLX',    'Low'),
            ( 'NFLX',  'Close'),
            ( 'NFLX', 'Volume'),
            ( 'AAPL',   'Open'),
            ( 'AAPL',   'High'),
            ( 'AAPL',    'Low'),
            ( 'AAPL',  'Close'),
            ( 'AAPL', 'Volume'),
            ( 'META',   'Open'),
            ( 'META',   'High'),
            ( 'META',    'Low'),
            ( 'META',  'Close'),
            ( 'META', 'Volume')],
           names=['Ticker', 'Price'])
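
Because the columns are (Ticker, Price) pairs, a single series is selected with a tuple, and one price field for all tickers can be taken with a cross-section. The sketch below uses a small made-up DataFrame (not the downloaded data) with the same column layout; the variable names are only illustrative.

```python
import pandas as pd

# Made-up prices with the same (Ticker, Price) column layout as above.
columns = pd.MultiIndex.from_product(
    [["AMZN", "META"], ["Open", "Close"]], names=["Ticker", "Price"]
)
index = pd.DatetimeIndex(["2025-12-08", "2025-12-09"], name="Date")
prices = pd.DataFrame(
    [[229.59, 226.89, 669.34, 666.80],
     [226.84, 227.92, 663.77, 656.96]],
    index=index,
    columns=columns,
)

# One column: index with the full (ticker, price) tuple.
amzn_close = prices[("AMZN", "Close")]

# Every ticker's Close: cross-section on the "Price" level of the columns.
close = prices.xs("Close", axis=1, level="Price")

print(close)
```

After the cross-section the "Price" level is dropped, so the remaining columns are just the ticker symbols.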

In [23]:
# Display index of the DataFrame
# index is a property of the DataFrame class.
# It holds the row labels of the DataFrame (here the trading dates).
# See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.index.html

df.index

DatetimeIndex(['2025-12-08', '2025-12-09', '2025-12-10', '2025-12-11',
               '2025-12-12'],
              dtype='datetime64[ns]', name='Date', freq=None)

In [24]:
# Plotting stock closing prices
# plot() is a DataFrame method that draws the selected columns using matplotlib.
# See: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html
df[[('AMZN', 'Close'),
    ('META', 'Close'),
    ('GOOGL', 'Close'),
    ('AAPL', 'Close'),
    ('NFLX', 'Close')]].plot(figsize=(12,6))


<Axes: xlabel='Date'>

In [28]:
# Find and load the latest data file
# See: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

def plot_data():
    data_folder = "data"
    plots_folder = "plots"

    # Ensure plots folder exists
    os.makedirs(plots_folder, exist_ok=True)

    # Get list of files in data folder
    files = [os.path.join(data_folder, f) for f in os.listdir(data_folder)]

    # Find latest file by modification time
    latest_file = max(files, key=os.path.getmtime)

    print(f"Opening latest data file: {latest_file}")

    # Read file into DataFrame (supports CSV only here)
    if latest_file.endswith(".csv"):
        df = pd.read_csv(latest_file, header=[0, 1], index_col=0)
    else:
        print("Unsupported file type!")
        return
    

In [26]:
# Ensure df has MultiIndex columns like ('AAPL', 'Close')
tickers = ["AAPL", "AMZN", "META", "NFLX", "GOOGL"]

plt.figure(figsize=(12, 6))

for ticker in tickers:  # Iterate over tickers to plot each
    try:
        plt.plot(df.index, df[(ticker, "Close")], label=ticker)  # Access MultiIndex column
    except KeyError:
        print(f"Close price for {ticker} not found")

# Use the current timestamp for the plot title and file name
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")

plt.title(f"Closing Prices - {timestamp[:8]}")  # YYYYMMDD
plt.xlabel("Date")
plt.ylabel("Closing Price (USD)")
plt.legend()
plt.tight_layout()


In [27]:
# Save plot
# The savefig() function is part of the matplotlib library and is used to save the current figure to a file.
# See: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html

plots_folder = "plots"
os.makedirs(plots_folder, exist_ok=True)  # Make sure the folder exists before saving
plot_path = os.path.join(plots_folder, f"{timestamp}.png")  # Full path
plt.savefig(plot_path, dpi=300)
plt.close()  # Close the figure to free memory

print(f"Saved plot: {plot_path}")  # Print saved plot path

if __name__ == "__main__":  # Entry point for script execution
    plot_data()


Saved plot: plots\20251215-094702.png
Opening latest data file: data\20251215-094702.csv
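
The cells above spread the work over several steps. As a rough sketch, they could be combined into one function like the one below. The name `plot_latest_csv` and its two parameters are my own illustration, not part of the notebook, and the sketch assumes the CSV was saved with its Date index and two header rows (Ticker, Price).

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd


def plot_latest_csv(data_folder, plots_folder):
    """Plot the Close prices from the newest CSV in data_folder.

    Returns the path of the saved PNG, or None if no CSV file is found.
    """
    csv_files = [
        os.path.join(data_folder, name)
        for name in os.listdir(data_folder)
        if name.endswith(".csv")
    ]
    if not csv_files:
        return None

    # Newest file by modification time.
    latest_file = max(csv_files, key=os.path.getmtime)
    df = pd.read_csv(latest_file, header=[0, 1], index_col=0, parse_dates=True)

    # Keep only the Close column of every ticker (second column level).
    close = df.xs("Close", axis=1, level=1)

    os.makedirs(plots_folder, exist_ok=True)
    ax = close.plot(figsize=(12, 6), title="Closing Prices")
    ax.set_xlabel("Date")
    ax.set_ylabel("Closing Price (USD)")

    # Reuse the data file's timestamp as the plot file name.
    stem = os.path.splitext(os.path.basename(latest_file))[0]
    plot_path = os.path.join(plots_folder, f"{stem}.png")
    ax.figure.savefig(plot_path, dpi=300)
    plt.close(ax.figure)  # free the figure's memory
    return plot_path
```

Calling `plot_latest_csv("data", "plots")` would then do the loading, plotting, and saving in one step, and naming the plot after the data file keeps the two easy to match.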


# Problem 3: Script

## Steps Taken to Create and Run faang.py

I created a new Python file called faang.py inside the root folder of my repository.

I copied the previously written functions:

- download_data()

- plot_data()

I added a shebang line at the very top of the script:

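- #! C:/Users/jmnic/anaconda3/python.exe


On Linux and macOS, the shebang tells the system which interpreter should run the file. This one points to a Windows path, so it is specific to my machine; a portable alternative for Linux is #!/usr/bin/env python3.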

At the bottom of the script I added:

- if __name__ == "__main__":
    download_data()
    plot_data()

This makes sure both functions run when the script is executed directly, but not when it is imported as a module.
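
As a minimal sketch of that layout, the skeleton below uses a portable shebang and placeholder function bodies. The placeholders and the `main()` helper are only for illustration; the real functions download the data and create the plot.

```python
#!/usr/bin/env python3
"""Skeleton of a script laid out like faang.py (bodies are placeholders)."""


def download_data():
    # Placeholder: the real function downloads prices and writes a CSV.
    return "downloaded"


def plot_data():
    # Placeholder: the real function reads the latest CSV and saves a plot.
    return "plotted"


def main():
    # Run the two steps in order and report what ran.
    return [download_data(), plot_data()]


if __name__ == "__main__":
    # True only when the file is executed directly, not when it is imported.
    main()
```

Importing this file from another script or notebook gives access to the functions without triggering a download.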

To allow execution using ./faang.py I used:

- chmod +x faang.py


Now I can run the script using either:

./faang.py or python faang.py

When executed:

- data is downloaded using yfinance

- it is stored in the data/ folder

- the latest file is read

- a plot is created and saved in the plots/ folder, named with a timestamp


# Problem 4: Automation

## Explanation of my workflow 

### Workflow Name
- name: Weekly FAANG Script Run

This is the name that will appear in the GitHub Actions tab. It helps identify which workflow is running.

###   Run Label
- run-name: ${{ github.actor }} created a FAANG workflow run

This sets a dynamic label visible inside workflow history.
${{ github.actor }} expands to the username of whoever triggered the workflow, manually or by a commit.

### Triggers (on:)
- on:

This section controls when GitHub starts the workflow.

### Scheduled runs     
- 0 9 * * SAT     - Run every Saturday at 09:00 UTC

The workflow runs automatically every week. The cron fields are, in order: minute, hour, day of month, month, day of week.
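
To make the schedule concrete, here is a small sketch that labels the five cron fields and works out the next Saturday 09:00 UTC after a given moment. It is not a cron parser: the helper `next_saturday_0900_utc` is my own illustration and only handles this one schedule.

```python
from datetime import datetime, timedelta, timezone

# Label the five fields of the schedule used in the workflow.
field_names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
fields = dict(zip(field_names, "0 9 * * SAT".split()))


def next_saturday_0900_utc(now):
    """Return the first Saturday 09:00 UTC strictly after `now`."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, Saturday is 5.
    days_ahead = (5 - candidate.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # Already past this week's slot, so move to next week.
        candidate += timedelta(days=7)
    return candidate


now = datetime(2025, 12, 15, 9, 47, tzinfo=timezone.utc)  # a Monday
print(fields)
print(next_saturday_0900_utc(now))
```

GitHub does not guarantee that a scheduled run starts at the exact minute, so the computed time is the earliest moment the run is due.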

###  Manual Trigger
- workflow_dispatch:

This adds a Run workflow button to the repository's Actions tab on GitHub.
Useful for testing without waiting until next Saturday.

### Permissions
The workflow is allowed to write back to the repository, so it can commit changes, push commits, create or update files, and create tags or releases.

### Jobs Section
- jobs:
  run_faang_script:

Jobs are the units of work that run on a GitHub-hosted machine (a runner).

### Machine Configuration
- runs-on: ubuntu-latest

GitHub allocates a clean cloud machine with:
✔ Linux OS
✔ Python preinstalled
✔ Git tools
✔ Permissions to run workflows

The runner is Linux even though my laptop runs Windows, because the workflow asks for ubuntu-latest. Linux runners are generally faster to set up and cheaper to run than Windows runners, and they provide a standard environment.

## Steps Inside the Job

### Checkout repository
- name: Checkout repository
  uses: actions/checkout@v6

This downloads the GitHub repository onto the runner and gives the job access to its files (e.g. faang.py, requirements.txt).
Without it, the runner has nothing to execute.

### Configure Python
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: '3.12'

This ensures that the job uses the requested Python version and is not affected by whichever Python the runner image ships with.

### Install dependencies
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt

Explanation:
The first command upgrades pip, which helps avoid installation errors; the second installs everything listed in requirements.txt.

This includes libraries like:

✔ yfinance
✔ pandas
✔ matplotlib

### Run Python script
- name: Run FAANG script
  run: |
    python faang.py

This executes the script just as running it locally would. As a result, every week the workflow automatically downloads the stock data, generates the plot, and saves the output into plots/.

### Final Summary 
The workflow:

- runs automatically at 09:00 UTC every Saturday

- can be started manually

- starts from a clean Linux environment

- installs the necessary dependencies

- runs faang.py

This ensures that a fresh financial review is produced every week without my laptop having to be open.
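
Putting the pieces described above together, the workflow file would look roughly like the sketch below. The `permissions` block is an assumption based on the Permissions section (the exact setting is not shown there), and no commit or push step is included because none is described.

```yaml
name: Weekly FAANG Script Run
run-name: ${{ github.actor }} created a FAANG workflow run

on:
  schedule:
    - cron: '0 9 * * SAT'   # every Saturday at 09:00 UTC
  workflow_dispatch:

# Assumed from the Permissions section: allow writing to the repository.
permissions:
  contents: write

jobs:
  run_faang_script:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run FAANG script
        run: |
          python faang.py
```

Files created on the runner disappear when the job ends unless a later step commits them or uploads them as artifacts.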


## END