<a href="https://colab.research.google.com/github/biniwollo/Fall24-DSA-5900/blob/main/SetUp_TFF_for_FiveCompanies_V02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step-by-Step Guide to Setting Up Federated Learning with TensorFlow Federated:
1. Install TensorFlow Federated

You first need to install the tensorflow-federated library.

In Google Colab, or your local Python environment, run:

Install TensorFlow Federated

In [5]:
#import TensorFlow
import tensorflow as tf
print(tf.__version__)

2.14.1


In [5]:
#install tensorflow federated
!pip install tensorflow-federated




Collecting tensorflow-federated
  Downloading tensorflow_federated-0.87.0-py3-none-manylinux_2_31_x86_64.whl.metadata (19 kB)
Collecting attrs~=23.1 (from tensorflow-federated)
  Downloading attrs-23.2.0-py3-none-any.whl.metadata (9.5 kB)
Collecting dp-accounting==0.4.3 (from tensorflow-federated)
  Downloading dp_accounting-0.4.3-py3-none-any.whl.metadata (1.8 kB)
Collecting google-vizier==0.1.11 (from tensorflow-federated)
  Downloading google_vizier-0.1.11-py3-none-any.whl.metadata (10 kB)
Collecting jaxlib==0.4.14 (from tensorflow-federated)
  Downloading jaxlib-0.4.14-cp310-cp310-manylinux2014_x86_64.whl.metadata (2.0 kB)
Collecting jax==0.4.14 (from tensorflow-federated)
  Downloading jax-0.4.14.tar.gz (1.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m21.7 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml

2. Import TensorFlow Federated

In [1]:
import tensorflow_federated as tff

# Verify TensorFlow Federated version
print(f"TensorFlow Federated Version: {tff.__version__}")



TensorFlow Federated Version: 0.87.0


Retrieve Stock Data from Yahoo Finance

In [2]:
!pip install yfinance

Collecting yfinance
  Downloading yfinance-0.2.44-py2.py3-none-any.whl.metadata (13 kB)
Collecting multitasking>=0.0.7 (from yfinance)
  Downloading multitasking-0.0.11-py3-none-any.whl.metadata (5.5 kB)
Collecting lxml>=4.9.1 (from yfinance)
  Downloading lxml-5.3.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Collecting frozendict>=2.3.4 (from yfinance)
  Downloading frozendict-2.4.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (23 kB)
Collecting peewee>=3.16.2 (from yfinance)
  Downloading peewee-3.17.6.tar.gz (3.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.0/3.0 MB[0m [31m38.5 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting html5lib>=1.1 (from yfinance)
  Downloading html5lib-1.1-py2.py3-none-any.whl.metadata (16 kB)
Downloading yfinance-0.2.44-py2.py3-

In [3]:
import yfinance as yf
import os
import pandas as pd

# Define the companies and their ticker symbols
companies = {
    'John Deere': 'DE',
    'Archer-Daniels-Midland': 'ADM',
    'Bunge Ltd': 'BG',
    'The Mosaic Company': 'MOS',
    'Corteva': 'CTVA'
}

# Set up directory in the default Colab environment
base_dir = '/content/FinancialData'
os.makedirs(base_dir, exist_ok=True)

# Loop through each company and download the stock data
for company, ticker in companies.items():
    print(f"Downloading data for {company} ({ticker})...")
    stock_data = yf.download(ticker, start='2019-09-16', end='2024-09-13')
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    stock_data.to_csv(file_path)
    print(f"Data for {company} ({ticker}) saved successfully at {file_path}")

# Combine all data into a single CSV (optional)
combined_file_path = os.path.join(base_dir, "combined_stock_data.csv")
combined_stock_data = pd.concat([pd.read_csv(os.path.join(base_dir, f"{ticker}_stock_data.csv")) for ticker in companies.values()])
combined_stock_data.to_csv(combined_file_path, index=False)
print(f"Combined stock data saved at: {combined_file_path}")


Downloading data for John Deere (DE)...


[*********************100%***********************]  1 of 1 completed


Data for John Deere (DE) saved successfully at /content/FinancialData/DE_stock_data.csv
Downloading data for Archer-Daniels-Midland (ADM)...


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed

Data for Archer-Daniels-Midland (ADM) saved successfully at /content/FinancialData/ADM_stock_data.csv
Downloading data for Bunge Ltd (BG)...





Data for Bunge Ltd (BG) saved successfully at /content/FinancialData/BG_stock_data.csv
Downloading data for The Mosaic Company (MOS)...


[*********************100%***********************]  1 of 1 completed


Data for The Mosaic Company (MOS) saved successfully at /content/FinancialData/MOS_stock_data.csv
Downloading data for Corteva (CTVA)...


[*********************100%***********************]  1 of 1 completed

Data for Corteva (CTVA) saved successfully at /content/FinancialData/CTVA_stock_data.csv
Combined stock data saved at: /content/FinancialData/combined_stock_data.csv





Now that both TensorFlow (2.14.0) and TensorFlow Federated (0.87.0) are successfully installed and ready to use, we're all set to start working on your project. Let's begin our financial analysis using TensorFlow Federated Learning with the stock data, here's a simple structure to get started:
Steps for Using TensorFlow Federated Learning:

1. Prepare the Dataset: Organize our stock data for each company into client datasets for federated learning.
2. Define the Model: Build a machine learning model (e.g., for stock price prediction or volatility analysis).
3. Federated Training: Set up the federated learning process where each company acts as a client in the federated environment.
4. Evaluate the Model: Assess the performance of the federated model across all clients.

Here's an outline of how you can structure the code:

In [6]:
import pandas as pd
import os

# Path to the saved stock data in Colab
base_dir = '/content/FinancialData'

# Load each company's stock data from the saved CSV files
def load_company_data(ticker):
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    return pd.read_csv(file_path)

# Define a function to load stock data for all companies and convert it to tf.data.Dataset
def create_client_data():
    companies = ['DE', 'ADM', 'BG', 'MOS', 'CTVA']  # List of ticker symbols
    client_data = []

    for ticker in companies:
        data = load_company_data(ticker)
        # Debug: Check if data is loaded correctly
        #print(f"Data for {ticker} loaded. Sample:\n", data.head())

        # Use relevant features for the model (e.g., Close price, Volume)
        features = data[['Open', 'Volume']].fillna(0).values.astype('float32')  # Changed to float32
        labels = data[['Close']].fillna(0).values.astype('float32')  # Predicting Close price, changed to float32

        # Debug: Check features and labels
        #print(f"Features shape for {ticker}: {features.shape}")
        #print(f"Labels shape for {ticker}: {labels.shape}")

        # Create dataset and check batches
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(32)  # Batch the dataset
        client_data.append(dataset)

        # Debug: Check batched dataset
        #print(f"Batched dataset for {ticker}:")
        #for batch in dataset:
            #print(batch)

        return client_data


clientData = create_client_data()

for batch in clientData[0]:
  print(batch)
  break


(<tf.Tensor: shape=(32, 2), dtype=float32, numpy=
array([[1.6414e+02, 1.1647e+06],
       [1.6299e+02, 1.3478e+06],
       [1.6347e+02, 1.2159e+06],
       [1.6515e+02, 9.8600e+05],
       [1.6483e+02, 2.9427e+06],
       [1.6289e+02, 1.4123e+06],
       [1.6561e+02, 3.1248e+06],
       [1.6568e+02, 2.8140e+06],
       [1.6517e+02, 1.8088e+06],
       [1.6650e+02, 1.5912e+06],
       [1.6678e+02, 1.8287e+06],
       [1.6960e+02, 2.2995e+06],
       [1.6400e+02, 2.6463e+06],
       [1.6321e+02, 1.3675e+06],
       [1.6480e+02, 1.2173e+06],
       [1.6794e+02, 1.2216e+06],
       [1.6429e+02, 1.8486e+06],
       [1.6640e+02, 2.0891e+06],
       [1.6651e+02, 1.9968e+06],
       [1.7000e+02, 5.0068e+06],
       [1.7080e+02, 1.7690e+06],
       [1.6956e+02, 1.8985e+06],
       [1.7206e+02, 1.8241e+06],
       [1.7203e+02, 1.2034e+06],
       [1.7203e+02, 1.6162e+06],
       [1.7466e+02, 1.3752e+06],
       [1.7475e+02, 1.4866e+06],
       [1.7368e+02, 1.2271e+06],
       [1.7500e+02, 2.4787

In [48]:
# Create a simple model for stock price prediction (e.g., linear regression)
def create_model():
    model = tf.keras.models.Sequential([
      tf.keras.layers.InputLayer(input_shape=(2,)),
      tf.keras.layers.Dense(10, kernel_initializer='zeros'),
      tf.keras.layers.Dense(10, kernel_initializer='zeros')
  ])

    # Debug: Check model summary
    #print(model.summary())
    return model

create_model()

<keras.src.engine.sequential.Sequential at 0x7d9dec4dfe50>

In [53]:
# Define a TFF model
def model_fn():
    try:
        # Create the Keras model using the function defined earlier
        keras_model = create_model()

        # Ensure input_spec matches the dataset structure (features, labels)
        input_spec = (
            tf.TensorSpec(shape=[None, 2], dtype=tf.float32),  # Features (2: Close price, Volume)
            tf.TensorSpec(shape=[None, 1], dtype=tf.float32)   # Labels (1: Close price prediction)
        )

        # Create a TFF model from the Keras model
        tff_model = tff.learning.models.from_keras_model(
            keras_model=keras_model,
            input_spec=input_spec,
            loss=tf.keras.losses.Huber(),  # Loss function (regression task)
            metrics=[tf.keras.metrics.MeanSquaredError()]  # Metrics to track
        )

        print("TFF model created successfully.")

        return tff_model

    except Exception as e:
        print(f"Error while creating TFF model: {e}")
        raise e

# Federated learning process setup
def federated_training():
    client_data = create_client_data()

    # Use TFF's internal sgdm optimizer instead of Keras's optimizer
    client_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=0.1)
    server_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=0.9)

    # Build federated averaging process using TFF's internal optimizers
    iterative_process = tff.learning.algorithms.build_weighted_fed_avg(
        model_fn=model_fn,
        client_optimizer_fn=client_optimizer_fn,  # Pass the optimizer directly
        server_optimizer_fn=server_optimizer_fn   # Pass the optimizer directly
    )

    print(iterative_process.initialize.type_signature.formatted_representation())

    # Initialize the model state
    state = iterative_process.initialize()

    print('Initial State....')
    print(state.distributor)

    # Simulate training across clients (5 companies)
    for round_num in range(1, 30):
        state, metrics = iterative_process.next(state, client_data)
        if round_num==29:
          print(state)
        print(f'Round {round_num}, Metrics: {metrics}')

# Start federated training
federated_training()

TFF model created successfully.
TFF model created successfully.
TFF model created successfully.
( -> <
  global_model_weights=<
    trainable=<
      float32[2,10],
      float32[10],
      float32[10,10],
      float32[10]
    >,
    non_trainable=<>
  >,
  distributor=<>,
  client_work=<
    learning_rate=float32
  >,
  aggregator=<
    value_sum_process=<>,
    weight_sum_process=<>
  >,
  finalizer=<
    learning_rate=float32
  >
>@SERVER)
Initial State....
()
Round 1, Metrics: OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('mean_squared_error', 113929.445), ('loss', 324.53735), ('num_examples', 1257), ('num_batches', 40)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 0)]))])
Round 2, Metrics: OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('mean_squared_error', 113695.555), ('loss', 324.17737), ('num_examples', 1257), ('num