# Computer Infrastructure Problems

This Juypter Notebook contains the problems covered in the [ATU module Computer Infrastructure](https://github.com/ianmcloughlin/computer-infrastructure/blob/main/assessment/problems.md)

## Problem 1: Data from yfinance

Using the yfinance Python package, write a function called get_data() that downloads all hourly data for the previous five days for the five FAANG stocks:

Facebook (META)
Apple (AAPL)
Amazon (AMZN)
Netflix (NFLX)
Google (GOOG)
The function should save the data into a folder called data in the root of your repository using a filename with the format YYYYMMDD-HHmmss.csv where YYYYMMDD is the four-digit year (e.g. 2025), followed by the two-digit month (e.g. 09 for September), followed by the two digit day, and HHmmss is hour, minutes, seconds. Create the data folder if you don't already have one.

## Install packages

Install all packages and libraries required for this assessment. Date time will handle dates and times, pandas will create dataframes and yfinance will give access the Yohoo Finance API.  

In [9]:
# Dates and times.
# https://docs.python.org/3/library/datetime.html
import datetime as dt

# Data frames.
# https://pandas.pydata.org/docs/
import pandas as pd


# Yahoo Finance data.
# https://ranaroussi.github.io/yfinance/reference/index.html
import yfinance as yf

# os module for file system operations.
# https://docs.python.org/3/library/os.html
import os


## Define the function.

This function will download hourly data for the previous 5 days for FAANG stocks and saves it as a CSV file. 

The FAANG stocks are: 

Facebook (META)

Apple (AAPL)

Amazon (AMZN)

Netflix (NFLX)

Google (GOOG)

## Create a Tickers object to work with multiple tickers at once.

https://github.com/ranaroussi/yfinance#tickers

Tickers are the identifiers for the function to pull the stock data from the desired company.
 
 
## Download the data using yfinance.

https://github.com/ranaroussi/yfinance#download

Uses yf.download to retrieve historical data for multiple tickers at once.

This will returns a pandas DataFrame with a MultiIndex columns (attribute, ticker).


## Save the data to a CSV file with a timestamped filename

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

The filename format is YYYYMMDD_HHMMSS.csv

In [10]:
# Create a function to get the data from Yahoo Finance and save it as a CSV file
def get_data():
    # Download data using yfinance (hourly interval, past 5 days)
    # Define FAANG tickers
    data = yf.download(["META", "AAPL", "AMZN", "NFLX", "GOOG"], period="5d", interval="1h")


 # Create data folder directory in the repository.
    data_dir = os.path.join(os.getcwd(), "data")
    # exist ok=True to avoid error if the folder already exists. If rerunning the code to test, it won't fail or make duplicate folders.
    os.makedirs(data_dir, exist_ok=True)
    
    data.to_csv(f"data/{dt.datetime.now().strftime('%Y%m%d_%H%M%S')}.csv")
    return data

get_data()


  data = yf.download(["META", "AAPL", "AMZN", "NFLX", "GOOG"], period="5d", interval="1h")
[                       0%                       ]

[*********************100%***********************]  5 of 5 completed


Price,Close,Close,Close,Close,Close,High,High,High,High,High,...,Open,Open,Open,Open,Open,Volume,Volume,Volume,Volume,Volume
Ticker,AAPL,AMZN,GOOG,META,NFLX,AAPL,AMZN,GOOG,META,NFLX,...,AAPL,AMZN,GOOG,META,NFLX,AAPL,AMZN,GOOG,META,NFLX
Datetime,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2025-10-16 13:30:00+00:00,247.509995,217.574997,255.429993,719.930115,1200.319946,248.380005,218.589996,257.579987,725.48999,1216.709961,...,248.270004,215.669998,252.470001,717.559998,1212.150024,9564057,9257748,5904503,2035894,600363
2025-10-16 14:30:00+00:00,248.235001,216.759995,254.850006,719.125,1195.280029,249.039993,218.535004,256.910004,722.11499,1208.660034,...,247.490005,217.597,255.541306,719.835022,1199.819946,4826932,3851239,1906922,786962,338437
2025-10-16 15:30:00+00:00,247.720001,215.5,252.914993,713.150024,1189.060059,248.663101,217.039993,255.429993,720.398987,1196.349976,...,248.169998,216.75,254.809998,719.070007,1195.0,3305994,3656254,1403542,868860,347924
2025-10-16 16:30:00+00:00,247.220001,215.529999,253.5,711.950012,1188.77002,247.772095,215.860001,253.75,713.780029,1189.969971,...,247.700104,215.5,252.889999,712.5,1188.51001,4045208,5533686,1467532,979445,336606
2025-10-16 17:30:00+00:00,245.660004,213.179993,251.641006,705.436523,1180.157349,247.419998,215.759995,253.729996,712.72998,1189.589844,...,247.220001,215.509995,253.460007,711.820007,1188.77002,3995746,4659136,1237440,1075991,239168
2025-10-16 18:30:00+00:00,246.550003,214.070007,252.110001,708.559998,1179.400024,247.020004,214.869995,252.860001,710.5,1182.5,...,245.639999,213.199997,251.644394,705.434998,1179.410034,3603265,5234988,1346462,888484,239063
2025-10-16 19:30:00+00:00,247.429993,214.470001,251.880005,712.059998,1183.224976,247.910004,215.0,252.740005,713.0,1185.125,...,246.559998,214.100006,252.150497,708.559998,1179.630005,3734949,4346792,2758143,963254,406051
2025-10-17 13:30:00+00:00,249.560104,213.960007,252.130005,713.820007,1194.619995,250.320007,214.800003,252.889999,715.73999,1196.0,...,248.020004,214.559998,251.345001,707.075012,1183.599976,11142993,10392685,4548744,2995068,659310
2025-10-17 14:30:00+00:00,248.479996,211.589996,251.039993,708.960388,1189.930054,249.860001,213.964996,252.479996,713.98999,1197.041016,...,249.643997,213.964996,252.080002,713.705017,1194.47998,3459331,7630123,2490852,1305341,471388
2025-10-17 15:30:00+00:00,250.296494,213.059998,253.550003,712.690002,1197.25,250.350006,213.240005,253.679993,713.590027,1199.131958,...,248.479996,211.595001,251.059998,708.894775,1189.920044,4277714,6111493,1602856,1121503,383715


# Problem 2: Plotting Data

Write a function called plot_data() that opens the latest data file in the data folder and, on one plot, plots the Close prices for each of the five stocks. The plot should include axis labels, a legend, and the date as a title. The function should save the plot into a plots folder in the root of your repository using a filename in the format YYYYMMDD-HHmmss.png. Create the plots folder if you don't already have one.

## Install packages.

Install packages and libraries for the next assignment. 

In [11]:
# Dates and times
# https://docs.python.org/3/library/datetime.html
import datetime as dt

# Data frames
# https://pandas.pydata.org/docs/
import pandas as pd

# Plotting
# https://matplotlib.org/stable/api/pyplot_api.html
import matplotlib.pyplot as plt

# File system operations
import os


 ## Define the function.
 
 This function opens the latest CSV file in the 'data' folder, plots Close prices for META, AAPL, AMZN, NFLX, and GOOG, and saves the plot in a 'plots' folder.

In [12]:
# Define function to plot data.

def plot_data():
    

    # Set the path to the 'data' folder.
    data_dir = "data"

    # Get all CSV files in the data folder.
    csv_files = [f for f in os.listdir(data_dir) if f.endswith(".csv")]
    # use if not to check if the list is empty.
    if not csv_files:
        # raise an error if there are no cvs files found in the folder.
        raise FileNotFoundError("No CSV files found in the 'data' folder. Run get_data() first.")

    # Find the latest CSV file.
    # key lambda function will get the creation time of the file so the function can find the latest one.
    # https://docs.python.org/3/library/functions.html#max
    # https://docs.python.org/3/library/os.path.html#os.path.getctime
    # https://docs.python.org/3/library/functions.html#lambda
    # os.path.join is used to create the full path to the file.
    latest_file = max(csv_files, key=lambda f: os.path.getctime(os.path.join(data_dir, f)))

    # Create the full path to the latest file.
    latest_path = os.path.join(data_dir, latest_file)
    # print a message indicating which file is being loaded.
    print(f"Loading latest data file: {latest_path}")

    # Load the CSV into a pandas DataFrame.
    # https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
    data = pd.read_csv(latest_path, index_col=0, parse_dates=True)

    # Extract Close columns for each FAANG stock
    tickers = ["META", "AAPL", "AMZN", "NFLX", "GOOG"]
    # Create a dataframe.
    close_data = pd.DataFrame()

    # Create a loop to pull the close data for eacg ticker.
    # https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
    # https://www.geeksforgeeks.org/python/python-if-else/

    for ticker in tickers:
        # Create the column name for the close prices.
        close_column = f"{ticker}_Close"
        if close_column in data.columns:
            close_data[ticker] = data[close_column]
        # Create a warning if the column is not found.
        else:
            print(f"Warning: {close_column} not found in data file.")

    # Create a plot.
    plt.figure(figsize=(12, 8))
    for ticker in close_data.columns:
        plt.plot(close_data.index, close_data[ticker], label=ticker)

    # Add axis labels, title, and legend
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html

    plt.xlabel("Date and Time")
    plt.ylabel("Close Price (USD)")
    plt.title(f"FAANG Stock Close Prices - {dt.datetime.now().strftime('%Y-%m-%d')}")
    plt.legend()

    # Create a 'plots' folder if it doesn't exist
    plots_dir = "plots"

    # exist ok=True to avoid error if the folder already exists. If rerunning the code to test, it won't fail or make duplicate folders.
    os.makedirs(plots_dir, exist_ok=True)

    # Save the plot with timestamped filename
    timestamp = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
    plot_path = os.path.join(plots_dir, f"{timestamp}.png")
    # tight layout to prevent clipping of labels
    plt.tight_layout()
    plt.savefig(plot_path)
    # Close the plot to free memory
    # https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.close.html
    plt.close()

    print(f"Plot successfully saved to: {plot_path}")
    return plot_path


# Run the plotting function
plot_data()

# show plots in the notebook
plt.show()


  data = pd.read_csv(latest_path, index_col=0, parse_dates=True)
  plt.legend()


Loading latest data file: data/20251023_095827.csv
Plot successfully saved to: plots/20251023-095827.png


need to look into these errors 