
Step-by-Step Guide to Setting Up Federated Learning with TensorFlow Federated

1. Install TensorFlow Federated

First, install the tensorflow-federated library. In Google Colab or a local Python environment, check the TensorFlow version and then run the pip command below.

In [4]:
#import TensorFlow
import tensorflow as tf
print(tf.__version__)

2.15.0


In [None]:
#install tensorflow federated
!pip install tensorflow-federated
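
After installing, it can help to confirm which versions pip actually placed in the environment, without importing the libraries themselves. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names as published on PyPI
for name in ("tensorflow", "tensorflow-federated"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either line reports "not installed", rerun the pip cell before continuing.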




2. Import TensorFlow Federated

In [1]:
import tensorflow_federated as tff

# Verify TensorFlow Federated version
print(f"TensorFlow Federated Version: {tff.__version__}")



TensorFlow Federated Version: 0.87.0


Retrieve Stock Data from Yahoo Finance

In [None]:
!pip install yfinance



In [2]:
import yfinance as yf
import os
import pandas as pd

# Define the companies and their ticker symbols
companies = {
    'John Deere': 'DE',
    'Archer-Daniels-Midland': 'ADM',
    'Bunge Ltd': 'BG',
    'The Mosaic Company': 'MOS',
    'Corteva': 'CTVA'
}

# Set up directory in the default Colab environment
base_dir = '/content/FinancialData'
os.makedirs(base_dir, exist_ok=True)

# Loop through each company and download the stock data
for company, ticker in companies.items():
    print(f"Downloading data for {company} ({ticker})...")
    stock_data = yf.download(ticker, start='2019-09-16', end='2024-09-13')
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    stock_data.to_csv(file_path)
    print(f"Data for {company} ({ticker}) saved successfully at {file_path}")

# Combine all data into a single CSV (optional); a Ticker column keeps the companies distinguishable
combined_file_path = os.path.join(base_dir, "combined_stock_data.csv")
combined_stock_data = pd.concat(
    [pd.read_csv(os.path.join(base_dir, f"{ticker}_stock_data.csv")).assign(Ticker=ticker)
     for ticker in companies.values()],
    ignore_index=True
)
combined_stock_data.to_csv(combined_file_path, index=False)
print(f"Combined stock data saved at: {combined_file_path}")


Downloading data for John Deere (DE)...


[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


Data for John Deere (DE) saved successfully at /content/FinancialData/DE_stock_data.csv
Downloading data for Archer-Daniels-Midland (ADM)...
Data for Archer-Daniels-Midland (ADM) saved successfully at /content/FinancialData/ADM_stock_data.csv
Downloading data for Bunge Ltd (BG)...


[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


Data for Bunge Ltd (BG) saved successfully at /content/FinancialData/BG_stock_data.csv
Downloading data for The Mosaic Company (MOS)...
Data for The Mosaic Company (MOS) saved successfully at /content/FinancialData/MOS_stock_data.csv
Downloading data for Corteva (CTVA)...


[*********************100%%**********************]  1 of 1 completed

Data for Corteva (CTVA) saved successfully at /content/FinancialData/CTVA_stock_data.csv
Combined stock data saved at: /content/FinancialData/combined_stock_data.csv
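
The training cells later in this notebook read the Close and Volume columns from each saved CSV, so it may be worth checking that those columns are present before going further. The sketch below does this with the standard csv module; the helper name missing_columns and the REQUIRED_COLUMNS constant are illustrative, not part of the notebook.

```python
import csv
import os

REQUIRED_COLUMNS = {"Close", "Volume"}  # columns the training cells read


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return sorted(set(required) - set(header))


base_dir = "/content/FinancialData"  # same directory as in the download cell
for ticker in ["DE", "ADM", "BG", "MOS", "CTVA"]:
    path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    if not os.path.exists(path):
        print(f"{ticker}: file not found")
    else:
        missing = missing_columns(path)
        print(f"{ticker}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```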





Now that TensorFlow (2.15.0 in the version check above) and TensorFlow Federated (0.87.0) are installed, we can begin the financial analysis: training a model on the stock data with federated learning. Here is a simple structure to get started.

Steps for Using TensorFlow Federated Learning:

1. Prepare the Dataset: Organize our stock data for each company into client datasets for federated learning.
2. Define the Model: Build a machine learning model (e.g., for stock price prediction or volatility analysis).
3. Federated Training: Set up the federated learning process where each company acts as a client in the federated environment.
4. Evaluate the Model: Assess the performance of the federated model across all clients.
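
Step 3 rests on federated averaging: each client trains locally, and the server combines the resulting parameters. As a rough illustration of that arithmetic only (not TFF's implementation), the sketch below averages hypothetical per-client parameter vectors, weighting each client by its number of training examples. The function name and the sample numbers are made up for illustration.

```python
def weighted_fed_avg(client_weights, client_num_examples):
    """Average per-client parameter vectors, weighting each by its example count."""
    total = sum(client_num_examples)
    if total == 0:
        raise ValueError("No client reported any examples; the average is undefined.")
    num_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_num_examples)) / total
        for i in range(num_params)
    ]


# Three hypothetical clients, each holding a 2-parameter model
client_weights = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
client_num_examples = [10, 10, 20]
print(weighted_fed_avg(client_weights, client_num_examples))  # [3.5, 2.5]
```

The third client holds half of all examples, so it pulls the average toward its own parameters. The sketch also shows why a round in which every client reports zero examples has no well-defined average.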

Here's an outline of how you can structure the code:

In [3]:
#prepare the dataset:
import tensorflow as tf
import tensorflow_federated as tff
import pandas as pd
import os

# Path to the saved stock data in Colab
base_dir = '/content/FinancialData'

# Load each company's stock data from the saved CSV files
def load_company_data(ticker):
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    return pd.read_csv(file_path)

# Define a function to load stock data for all companies and convert it to tf.data.Dataset
def create_client_data():
    companies = ['DE', 'ADM', 'BG', 'MOS', 'CTVA']  # List of ticker symbols
    client_data = []

    for ticker in companies:
        data = load_company_data(ticker)
        # Features: Close and Volume on day t; label: Close on day t+1 (the next price)
        values = data[['Close', 'Volume']].fillna(0).values.astype('float32')
        features = values[:-1]
        labels = values[1:, :1]
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(32)  # Batch the dataset
        client_data.append(dataset)
    return client_data

# Create a simple model for stock price prediction (a small network with one hidden ReLU layer)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu', input_shape=(2,)),  # 2 features: Close price, Volume
        tf.keras.layers.Dense(1)  # Output layer: one regression value (the predicted Close price)
    ])
    return model

# Federated learning process setup
def federated_training():
    client_data = create_client_data()

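    # Define input specification as a (features, labels) tuple, matching the dataset elements
    input_spec = (
        tf.TensorSpec(shape=[None, 2], dtype=tf.float32),  # Features: Close price, Volume
        tf.TensorSpec(shape=[None, 1], dtype=tf.float32)   # Labels: Close price
    )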

    # Create a TFF model
    def model_fn():
        keras_model = create_model()
        return tff.learning.models.from_keras_model(
            keras_model=keras_model,
            input_spec=input_spec,
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.MeanSquaredError()]
        )

    # Use TFF's internal sgdm optimizer instead of Keras's optimizer
    client_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=0.01)
    server_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=1.0)

    # Build federated averaging process using TFF's internal optimizers
    iterative_process = tff.learning.algorithms.build_weighted_fed_avg(
        model_fn=model_fn,
        client_optimizer_fn=client_optimizer_fn,  # Pass the optimizer directly
        server_optimizer_fn=server_optimizer_fn   # Pass the optimizer directly
    )

    # Initialize the model state
    state = iterative_process.initialize()

    # Simulate training across clients (5 companies)
    for round_num in range(1, 9):
        state, metrics = iterative_process.next(state, client_data)
        print(f'Round {round_num}, Metrics: {metrics}')

# Start federated training
federated_training()




Round 1, Metrics: OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('mean_squared_error', 0.0), ('loss', 0.0), ('num_examples', 0), ('num_batches', 0)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 1)]))])
Round 2, Metrics: OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('mean_squared_error', 0.0), ('loss', 0.0), ('num_examples', 0), ('num_batches', 0)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 1)]))])
Round 3, Metrics: OrderedDict([('distributor', ()), ('client_work', OrderedDict([('train', OrderedDict([('mean_squared_error', 0.0), ('loss', 0.0), ('num_examples', 0), ('num_batches', 0)]))])), ('aggregator', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('finalizer', OrderedDict([('update_non_finite', 1)]))])
Round 4, Metrics: OrderedDic
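
In the output above, every round reports num_examples of 0 and the finalizer reports update_non_finite of 1, which suggests that the aggregated update contained non-finite values and the model did not train. One possible cause, which the notebook does not confirm, is that the raw Volume column (on the order of millions) is passed to SGD without scaling. The toy example below uses plain Python and made-up numbers to show how unscaled inputs of that magnitude can drive SGD to a non-finite weight, while standardized inputs stay stable.

```python
import math
import statistics


def sgd_fit(xs, ys, learning_rate=0.01, epochs=50):
    """Fit y ~ w * x with per-example SGD on squared error; return the final w."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = 2.0 * (w * x - y) * x
            w -= learning_rate * grad
    return w


def standardize(values):
    """Rescale values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up volumes in the millions, with a target that depends on them linearly
volumes = [1.0e6, 2.0e6, 3.0e6, 4.0e6]
targets = [1.0, 2.0, 3.0, 4.0]

w_raw = sgd_fit(volumes, targets)
w_scaled = sgd_fit(standardize(volumes), standardize(targets))

print("raw features finite:   ", math.isfinite(w_raw))     # False
print("scaled features finite:", math.isfinite(w_scaled))  # True
```

If scaling is indeed the issue here, standardizing each client's Close and Volume columns before building the tf.data.Dataset would be one thing to try.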

Enhance the Model Architecture

In [None]:
import tensorflow as tf
import tensorflow_federated as tff
import pandas as pd
import os

# Path to the saved stock data in Colab
base_dir = '/content/FinancialData'

# Load each company's stock data from the saved CSV files
def load_company_data(ticker):
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    return pd.read_csv(file_path)

# Define a function to load stock data for all companies and convert it to tf.data.Dataset
def create_client_data():
    companies = ['DE', 'ADM', 'BG', 'MOS', 'CTVA']  # List of ticker symbols
    client_data = []

    for ticker in companies:
        data = load_company_data(ticker)
        # Debug: Check if data is loaded correctly
        #print(f"Data for {ticker} loaded. Sample:\n", data.head())

        # Features: Close and Volume on day t; label: Close on day t+1 (the next price)
        values = data[['Close', 'Volume']].fillna(0).values.astype('float32')
        features = values[:-1]
        labels = values[1:, :1]

        # Debug: Check features and labels
        #print(f"Features shape for {ticker}: {features.shape}")
        #print(f"Labels shape for {ticker}: {labels.shape}")

        # Create dataset and check batches
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(32)  # Batch the dataset
        client_data.append(dataset)

        # Debug: inspect the first batch (uncomment the print to display it)
        for batch in dataset.take(1):
            pass  # print(batch)

    return client_data

# Create a simple model for stock price prediction (a small network with one hidden ReLU layer)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu', input_shape=(2,)),  # 2 features: Close price, Volume
        tf.keras.layers.Dense(1)  # Output layer: one regression value (the predicted Close price)
    ])

    # Debug: Check model summary
    model.summary()
    return model

# Define a TFF model
def model_fn():
    try:
        # Create the Keras model using the function defined earlier
        keras_model = create_model()

        # Ensure input_spec matches the dataset structure (features, labels)
        input_spec = (
            tf.TensorSpec(shape=[None, 2], dtype=tf.float32),  # Features (2: Close price, Volume)
            tf.TensorSpec(shape=[None, 1], dtype=tf.float32)   # Labels (1: Close price prediction)
        )

        # Create a TFF model from the Keras model
        tff_model = tff.learning.models.from_keras_model(
            keras_model=keras_model,
            input_spec=input_spec,
            loss=tf.keras.losses.MeanSquaredError(),  # Loss function (regression task)
            metrics=[tf.keras.metrics.MeanSquaredError()]  # Metrics to track
        )

        print("TFF model created successfully.")

        return tff_model

    except Exception as e:
        print(f"Error while creating TFF model: {e}")
        raise  # re-raise with the original traceback

# Federated learning process setup
def federated_training():
    client_data = create_client_data()

    # Use TFF's internal sgdm optimizer instead of Keras's optimizer
    client_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=0.01)
    server_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=1.0)

    # Build federated averaging process using TFF's internal optimizers
    iterative_process = tff.learning.algorithms.build_weighted_fed_avg(
        model_fn=model_fn,
        client_optimizer_fn=client_optimizer_fn,  # Pass the optimizer directly
        server_optimizer_fn=server_optimizer_fn   # Pass the optimizer directly
    )

    # Initialize the model state
    state = iterative_process.initialize()

    # Simulate training across clients (5 companies)
    for round_num in range(1, 11):
        state, metrics = iterative_process.next(state, client_data)
        print(f'Round {round_num}, Metrics: {metrics}')

# Start federated training
federated_training()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                30        
                                                                 
 dense_1 (Dense)             (None, 1)                 11        
                                                                 
Total params: 41 (164.00 Byte)
Trainable params: 41 (164.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
TFF model created successfully.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                30        
                                                                 
 dense_1 (Dense)             (None, 1)                 11        
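
The parameter counts in the summary above follow directly from the layer sizes: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers; the helper name dense_param_count is illustrative, and the byte total assumes 4-byte float32 parameters.

```python
def dense_param_count(num_inputs, num_units):
    """Parameters in a fully connected layer: weights (inputs x units) plus one bias per unit."""
    return num_inputs * num_units + num_units


hidden = dense_param_count(2, 10)   # Dense(10) on the 2 input features
output = dense_param_count(10, 1)   # Dense(1) on the 10 hidden units
total = hidden + output
print(hidden, output, total, total * 4)  # 30 11 41 164
```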
                                  

In [5]:
import tensorflow as tf
import tensorflow_federated as tff
import pandas as pd
import os

# Path to the saved stock data in Colab
base_dir = '/content/FinancialData'

# Load each company's stock data from the saved CSV files
def load_company_data(ticker):
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")
    return pd.read_csv(file_path)

# Define a function to load stock data for all companies and convert it to tf.data.Dataset
def create_client_data():
    companies = ['DE', 'ADM', 'BG', 'MOS', 'CTVA']  # List of ticker symbols
    client_data = []

    for ticker in companies:
        data = load_company_data(ticker)
        # Debug: Check if data is loaded correctly
        #print(f"Data for {ticker} loaded. Sample:\n", data.head())

        # Features: Close and Volume on day t; label: Close on day t+1 (the next price)
        values = data[['Close', 'Volume']].fillna(0).values.astype('float32')
        features = values[:-1]
        labels = values[1:, :1]

        # Debug: Check features and labels
        #print(f"Features shape for {ticker}: {features.shape}")
        #print(f"Labels shape for {ticker}: {labels.shape}")

        # Create dataset and check batches
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(32)  # Batch the dataset
        client_data.append(dataset)

        # Debug: inspect the first batch (uncomment the print to display it)
        for batch in dataset.take(1):
            pass  # print(batch)

    return client_data

# Create a simple model for stock price prediction (a small network with one hidden ReLU layer)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu', input_shape=(2,)),  # 2 features: Close price, Volume
        tf.keras.layers.Dense(1)  # Output layer: one regression value (the predicted Close price)
    ])

    # Debug: Check model summary
    model.summary()
    return model

# Define a TFF model
def model_fn():
    try:
        # Create the Keras model using the function defined earlier
        keras_model = create_model()

        # Ensure input_spec matches the dataset structure (features, labels)
        input_spec = (
            tf.TensorSpec(shape=[None, 2], dtype=tf.float32),  # Features (2: Close price, Volume)
            tf.TensorSpec(shape=[None, 1], dtype=tf.float32)   # Labels (1: Close price prediction)
        )

        # Create a TFF model from the Keras model
        tff_model = tff.learning.models.from_keras_model(
            keras_model=keras_model,
            input_spec=input_spec,
            loss=tf.keras.losses.MeanSquaredError(),  # Loss function (regression task)
            metrics=[tf.keras.metrics.MeanSquaredError()]  # Metrics to track
        )

        #print("TFF model created successfully.")

        return tff_model

    except Exception:
        # Re-raise unchanged; add a print here to log the error first if needed
        raise

# Federated learning process setup
def federated_training():
    client_data = create_client_data()

    # Use TFF's internal sgdm optimizer instead of Keras's optimizer
    client_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=0.01)
    server_optimizer_fn = tff.learning.optimizers.build_sgdm(learning_rate=1.0)

    # Build federated averaging process using TFF's internal optimizers
    iterative_process = tff.learning.algorithms.build_weighted_fed_avg(
        model_fn=model_fn,
        client_optimizer_fn=client_optimizer_fn,  # Pass the optimizer directly
        server_optimizer_fn=server_optimizer_fn   # Pass the optimizer directly
    )

    # Initialize the model state
    state = iterative_process.initialize()

    # Simulate training across clients (5 companies)
    for round_num in range(1, 11):
        state, metrics = iterative_process.next(state, client_data)
        print(f'Round {round_num}, Metrics: {metrics}')

# Start federated training
federated_training()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                30        
                                                                 
 dense_1 (Dense)             (None, 1)                 11        
                                                                 
Total params: 41 (164.00 Byte)
Trainable params: 41 (164.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                30        
                                                                 
 dense_1 (Dense)             (None, 1)                 11        
                                                                 


In [None]:
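# data and ticker are local to create_client_data(), so reload each file here
for ticker in ['DE', 'ADM', 'BG', 'MOS', 'CTVA']:
    data = load_company_data(ticker)
    if not data.empty:
        print(f"Data loaded for {ticker}")
    else:
        print(f"No data for {ticker}")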

In [None]:
import os
import pandas as pd
import tensorflow as tf

# Path to the saved stock data in Colab
base_dir = '/content/FinancialData'

# Load each company's stock data from the saved CSV files
def load_company_data(ticker):
    file_path = os.path.join(base_dir, f"{ticker}_stock_data.csv")

    # Check if file exists
    if not os.path.exists(file_path):
        print(f"File {file_path} does not exist.")
        return pd.DataFrame()  # Return empty dataframe if file does not exist

    try:
        data = pd.read_csv(file_path)

        # Check if the data is empty or not
        if data.empty:
            print(f"No data found for {ticker}")
        else:
            print(f"Data loaded for {ticker}: {len(data)} rows.")

        return data

    except Exception as e:
        print(f"Error loading data for {ticker}: {e}")
        return pd.DataFrame()  # Return empty dataframe in case of error

# Define a function to load stock data for all companies and convert it to tf.data.Dataset
def create_client_data():
    companies = ['DE', 'ADM', 'BG', 'MOS', 'CTVA']  # List of ticker symbols
    client_data = []

    for ticker in companies:
        print(f"Loading data for {ticker}...")  # Debug statement
        data = load_company_data(ticker)

        if data.empty:
            print(f"Skipping {ticker} due to empty or missing data.")
            continue

        # Features: Close and Volume on day t; label: Close on day t+1 (the next price)
        values = data[['Close', 'Volume']].fillna(0).values.astype('float32')
        features = values[:-1]
        labels = values[1:, :1]

        # Create dataset and check batches
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(32)  # Batch the dataset
        client_data.append(dataset)

        # Debug: Check batched dataset
        num_batches = len(list(dataset))
        print(f"Batched dataset for {ticker} contains {num_batches} batches.")

    return client_data

# Start the client data loading process
client_data = create_client_data()



Loading data for DE...
Data loaded for DE: 1257 rows.
Batched dataset for DE contains 40 batches.
Loading data for ADM...
Data loaded for ADM: 1257 rows.
Batched dataset for ADM contains 40 batches.
Loading data for BG...
Data loaded for BG: 1257 rows.
Batched dataset for BG contains 40 batches.
Loading data for MOS...
Data loaded for MOS: 1257 rows.
Batched dataset for MOS contains 40 batches.
Loading data for CTVA...
Data loaded for CTVA: 1257 rows.
Batched dataset for CTVA contains 40 batches.
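
The batch counts above follow from the row counts: 1257 rows split into batches of 32 give 39 full batches plus one partial batch, because the batching call used here keeps the remainder. A small sketch of that arithmetic, with an illustrative helper name:

```python
import math


def batch_layout(num_rows, batch_size):
    """Return (number of batches, size of the last batch) when the remainder is kept."""
    num_batches = math.ceil(num_rows / batch_size)
    last_batch = num_rows - (num_batches - 1) * batch_size if num_rows else 0
    return num_batches, last_batch


print(batch_layout(1257, 32))  # (40, 9)
```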


In [None]:
# client_data is the list returned by create_client_data() in the previous cell
for index, dataset in enumerate(client_data):
    if len(list(dataset)) == 0:
        print(f"No batches for client {index}")

In [None]:
from google.colab import drive
drive.mount('/content/drive')


NotImplementedError: Mounting drive is unsupported in this environment. Use PyDrive instead. See examples at https://colab.research.google.com/notebooks/io.ipynb#scrollTo=7taylj9wpsA2.
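
Since Drive cannot be mounted in this environment, one alternative for keeping the downloaded CSV files is to bundle the data directory into a zip archive that can be saved elsewhere. The sketch below uses only the standard library; the function name archive_directory and the backup path are assumptions made for illustration.

```python
import os
import shutil


def archive_directory(source_dir, archive_base):
    """Zip source_dir into archive_base + '.zip' and return the archive path."""
    if not os.path.isdir(source_dir):
        raise FileNotFoundError(f"{source_dir} is not a directory")
    return shutil.make_archive(archive_base, "zip", root_dir=source_dir)


source = "/content/FinancialData"  # directory used by the download cell
if os.path.isdir(source):
    print("Archive written to", archive_directory(source, "/content/FinancialData_backup"))
else:
    print(f"{source} not found; run the download cell first")
```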
