<a href="https://colab.research.google.com/github/basugautam/Reproducibility-Challenge-Project/blob/Architecture-Files/19_Pre_existing_Implementation_or_Custom_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Mount Google Drive to access the file
from google.colab import drive
drive.mount('/content/drive')

# Import necessary library
import pandas as pd

# Provide the path to the file in Google Drive
# (the file name ends in ".csv.csv"; change it if your copy is simply "timeseries_data.csv")
file_path = '/content/drive/My Drive/timeseries_data.csv.csv'

# Read the CSV file
df = pd.read_csv(file_path)

# Display the first few rows of the data
df.head()
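
The columns of the real time-series file are not shown here, so a quick structural check right after loading can catch parsing problems early. The sketch below is illustrative only: `summarize_frame` is a hypothetical helper (not a pandas function), and the small in-memory CSV with made-up `date`, `value` and `label` columns stands in for the real file. `parse_dates` asks pandas to read the date column as datetimes rather than strings.

```python
import io

import pandas as pd


def summarize_frame(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: basic facts worth checking right after pd.read_csv()."""
    return {
        "n_rows": frame.shape[0],
        "n_cols": frame.shape[1],
        "dtypes": frame.dtypes.astype(str).to_dict(),  # column -> dtype name
        "missing": frame.isna().sum().to_dict(),       # column -> number of NaN cells
    }


# Made-up stand-in for timeseries_data.csv; the real columns will differ.
csv_text = """date,value,label
2021-01-01,1.5,0
2021-01-02,,1
2021-01-03,2.0,0
"""

# parse_dates converts the 'date' column to datetime64 instead of leaving it as text.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
summary = summarize_frame(sample)
print(summary)
```

A non-zero entry under `missing`, or a column that should be numeric showing up as `object`, is a sign the file needs cleaning before modelling.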


In [None]:
# Extract the data from the loaded DataFrame
data = df.values  # The whole DataFrame as a NumPy array (pandas now recommends df.to_numpy() instead)

# Separate the input features from the target variable.
# 'target_column_name' is a placeholder: replace it with the name of the column you want to predict.
features = df.drop(columns=['target_column_name'])  # New DataFrame without the target column; df itself is unchanged
target = df['target_column_name']  # The column the model should predict

# Display the first few rows of the features and the target
features.head(), target.head()
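
The feature/target split can be tried on a toy DataFrame before pointing it at the real data. In this sketch the columns are made up (only `target_column_name` mirrors the placeholder above). It shows that `drop()` leaves the original frame untouched, and uses `select_dtypes` as one possible way to set aside non-numeric columns if the dataset has any.

```python
import pandas as pd

# Toy data: only 'target_column_name' mirrors the placeholder; the other columns are made up.
df_demo = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3],
    "category": ["x", "y", "x"],  # a non-numeric column
    "target_column_name": [0, 1, 0],
})

# drop() returns a new DataFrame; df_demo still has all three columns.
features_demo = df_demo.drop(columns=["target_column_name"])
target_demo = df_demo["target_column_name"]

# Keep only numeric feature columns, e.g. before converting them to a NumPy array.
numeric_features = features_demo.select_dtypes(include="number")

print(list(df_demo.columns))           # ['feature_a', 'category', 'target_column_name']
print(list(features_demo.columns))     # ['feature_a', 'category']
print(list(numeric_features.columns))  # ['feature_a']
```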


In [None]:
# Explanation of the operations above

# a) Why we use this strategy:
# Loading the CSV file from Google Drive is a convenient way to access external data inside Google Colab.
# Once Drive is mounted, files stored there can be read like local files, so the analysis and
# model-building steps that follow can run on the real dataset.

# b) How these functions and operations serve our purpose:
# - `drive.mount()` connects the Colab runtime to Google Drive, making its files available under /content/drive.
# - `pd.read_csv()` reads the CSV file into a pandas DataFrame, a flexible structure for data manipulation
#   and analysis.
# - `df.head()` displays the first few rows of the dataset (five by default), letting us confirm the data loaded correctly.
# - `df.values` extracts the raw data as a NumPy array for further processing and model training.
# - `df.drop(columns=['target_column_name'])` separates the features from the target variable in preparation for training.

# c) Explanation of the terms used:
# - `df`: The DataFrame returned by pandas when loading the CSV file. A DataFrame is a table-like structure
#   whose columns can have different data types.
# - `pd.read_csv()`: Reads a CSV file and returns its contents as a DataFrame.
# - `df.head()`: Returns the first rows of the dataset for quick inspection.
# - `df.drop(columns=['target_column_name'])`: Returns a new DataFrame without the target column (the column we
#   want to predict), leaving only the features used for prediction. The original `df` is not modified.
# - `df.values`: Converts the DataFrame into a NumPy array, the format most machine learning libraries expect.

# d) What we achieve with these operations:
# The data is loaded and split into features and a target, which is the structure needed to train
# predictive models.
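
One detail behind the `df.values` bullet: the dtype of the resulting array depends on the columns. The sketch below uses made-up columns and `to_numpy()`, which the pandas documentation recommends over `.values`, to show that a single text column turns the whole array into `dtype=object`, which most numeric ML code cannot consume directly.

```python
import numpy as np
import pandas as pd

# Made-up columns: one numeric, one text.
mixed = pd.DataFrame({"x": [1.0, 2.0], "name": ["a", "b"]})
numeric_only = mixed[["x"]]

mixed_array = mixed.to_numpy()           # the text column forces dtype=object
numeric_array = numeric_only.to_numpy()  # all-numeric columns keep a numeric dtype

print(mixed_array.dtype)    # object
print(numeric_array.dtype)  # float64
```

This is why it usually pays to drop or encode non-numeric columns before converting features to an array.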


In [None]:
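# Adapting a pre-existing implementation of a transformer model (e.g., from Hugging Face Transformers)

import matplotlib.pyplot as plt
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import Adam
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

# Load a pre-trained model and tokenizer from Hugging Face
model_name = 'bert-base-uncased'  # You can choose another checkpoint to suit your task
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the text. 'text_column_name' is a placeholder for the column holding raw text;
# `target` must contain integer class labels (0 or 1 for num_labels=2).
inputs = tokenizer(list(df['text_column_name']), padding=True, truncation=True, return_tensors='tf')

# The model outputs raw logits, so the loss needs from_logits=True.
# Fine-tuning BERT typically uses a small learning rate; the Keras default for Adam (1e-3) is often too high.
model.compile(optimizer=Adam(learning_rate=2e-5),
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train on all tokenizer outputs (input_ids, attention_mask, ...) rather than input_ids alone,
# so that padding tokens are masked out
history = model.fit(dict(inputs), target.to_numpy(), epochs=3, batch_size=32)

# Show training history plot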

plt.plot(history.history['loss'], label='Train Loss')
plt.legend()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Model Loss')
plt.show()
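
Hugging Face's TensorFlow sequence-classification models return logits (unnormalized scores), which is why the loss above is built with `from_logits=True`. The NumPy sketch below, independent of TensorFlow and using illustrative numbers rather than real model output, shows what that flag means: cross-entropy computed from logits via a stable log-softmax equals cross-entropy computed after applying softmax explicitly, and the logits themselves are not probabilities.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_ce_from_probs(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean sparse categorical cross-entropy when the inputs are probabilities."""
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())


def sparse_ce_from_logits(logits: np.ndarray, labels: np.ndarray) -> float:
    """The same loss computed directly from logits via a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Illustrative raw scores for two examples and two classes (not real model output).
logits = np.array([[2.0, -0.5], [0.5, 1.5]])
labels = np.array([0, 1])

print(logits.sum(axis=1))  # [1.5 2. ]: rows do not sum to 1, so these are not probabilities
loss_logits = sparse_ce_from_logits(logits, labels)
loss_probs = sparse_ce_from_probs(softmax(logits), labels)
print(loss_logits, loss_probs)  # both ≈ 0.196
```

Computing the loss from logits directly, as `from_logits=True` does, avoids taking the log of probabilities that may have underflowed to zero.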


In [None]:
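# Building a custom transformer-based model using TensorFlow/Keras

import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN = 100                      # Fixed sequence length the model expects
VOCAB_SIZE = tokenizer.vocab_size  # Must cover every token ID the BERT tokenizer can emit (30522 for bert-base-uncased)

# Define a simple custom transformer-based model: embeddings, one self-attention layer, pooling, dense head.
# Note: there is no positional encoding, so the model does not see token order.
def build_transformer_model(input_shape, vocab_size):
    input_layer = layers.Input(shape=input_shape)
    x = layers.Embedding(input_dim=vocab_size, output_dim=64)(input_layer)  # Token embeddings (64 dimensions)
    x = layers.MultiHeadAttention(num_heads=4, key_dim=64)(x, x)  # Self-attention: query and value are both x
    x = layers.GlobalAveragePooling1D()(x)  # Average over the sequence dimension
    x = layers.Dense(64, activation='relu')(x)
    x = layers.Dense(1, activation='sigmoid')(x)  # Output layer for binary classification

    model = keras.Model(inputs=input_layer, outputs=x)
    return model

# Re-tokenize to exactly MAX_LEN tokens so the data matches input_shape
# ('text_column_name' is the same placeholder as in the previous cell)
fixed_inputs = tokenizer(list(df['text_column_name']), padding='max_length', truncation=True,
                         max_length=MAX_LEN, return_tensors='tf')

# Build and compile the model
custom_model = build_transformer_model((MAX_LEN,), VOCAB_SIZE)
custom_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model (binary_crossentropy expects 0/1 labels in `target`)
history = custom_model.fit(fixed_inputs['input_ids'], target.to_numpy(), epochs=3, batch_size=32)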

# Show training history plot
plt.plot(history.history['loss'], label='Train Loss')
plt.legend()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Custom Model Loss')
plt.show()
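
Keras `layers.MultiHeadAttention` runs several attention heads in parallel, each computing scaled dot-product attention, and combines them with learned projections. The NumPy sketch below shows a single head only, with random matrices standing in for learned weights (all sizes are arbitrary). It also illustrates a consequence of the custom model having no positional encoding: shuffling the input tokens merely shuffles the attention output rows, so after average pooling the result is identical.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k).
    Returns (output, weights) with shapes (seq_len, d_k) and (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])       # similarity of every query to every key
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability before exp()
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys: each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))  # stand-in for embedded tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)

# Without positional information, permuting the tokens only permutes the output rows,
# so mean pooling over the sequence gives the same vector.
perm = rng.permutation(seq_len)
out_perm, _ = self_attention(x[perm], w_q, w_k, w_v)
print(np.allclose(out_perm, out[perm]))                      # True
print(np.allclose(out_perm.mean(axis=0), out.mean(axis=0)))  # True
```

Adding positional embeddings to the token embeddings before attention is the usual way to break this symmetry when word order matters.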
