<div style="text-align:center; border-radius:15px; padding:15px; color:white; margin:0; font-family: 'Orbitron', sans-serif; background: #2E0249; background: #11001C; box-shadow: 0px 4px 8px rgba(0, 0, 0, 0.3); overflow:hidden; margin-bottom: 1em;">    <div style="font-size:150%; color:#FEE100"><b>Foreign Currency Exchange Analysis</b></div>    <div>This notebook was created with the help of <a href="https://devra.ai/ref/kaggle" style="color:#6666FF">Devra AI</a></div></div>

## Table of Contents

1. [Data Loading and Preliminary Exploration](#Data-Loading-and-Preliminary-Exploration)
2. [Data Cleaning and Preprocessing](#Data-Cleaning-and-Preprocessing)
3. [Exploratory Data Analysis](#Exploratory-Data-Analysis)
4. [Correlation Analysis](#Correlation-Analysis)
5. [Prediction Modeling with Regression](#Prediction-Modeling-with-Regression)
6. [Conclusion and Future Analysis](#Conclusion-and-Future-Analysis)

If you find this notebook useful, please upvote it.

In [None]:
# Import required libraries and suppress warnings
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
# Render figures inline in the notebook
%matplotlib inline

import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

from sklearn.inspection import permutation_importance

# Set a style for seaborn plots
sns.set(style="whitegrid")

# For reproducibility
np.random.seed(42)

# Print a message indicating successful imports
print('Libraries imported successfully.')

## Data Loading and Preliminary Exploration

This dataset lets us analyze exchange rate dynamics across several currency pairs. Each row records a base currency, a quote currency, a year, the average, high, and low exchange rates, and a volatility index. Let us load the dataset and look at its first rows, column types, and summary statistics.
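
Before exploring, it can help to confirm that the file actually contains the columns described above. The following is a minimal sketch, not part of the original workflow: the column names are taken from the later cells of this notebook, while `missing_columns` and the toy frame are hypothetical names and made-up values used only for illustration.

```python
import pandas as pd

# Column names taken from the later cells of this notebook.
EXPECTED_COLUMNS = [
    "Base_Currency", "Quote_Currency", "Year",
    "Average_Exchange_Rate", "High_Rate", "Low_Rate", "Volatility_Index",
]

def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]

# Toy frame standing in for the real CSV (values are made up for illustration).
toy = pd.DataFrame({
    "Base_Currency": ["USD", "EUR"],
    "Quote_Currency": ["JPY", "GBP"],
    "Year": [2020, 2021],
    "Average_Exchange_Rate": [107.0, 0.86],
    "High_Rate": [112.0, 0.90],
    "Low_Rate": [102.0, 0.83],
})

absent = missing_columns(toy)
print(absent)
```

On the real data, the same helper could be called as `missing_columns(df)` right after `pd.read_csv`.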

In [None]:
# Load the dataset
file_path = 'foreign_currency_exchange_synthetic.csv'
# The default UTF-8 decoding also handles plain ASCII; ',' is the default delimiter
df = pd.read_csv(file_path)

# Display the first few rows
print('First 5 rows of the dataset:')
display(df.head())

# Display a summary of the dataframe (info() prints directly and returns None)
print('\nDataset info:')
df.info()

# Display summary statistics for numeric columns
print('\nSummary statistics:')
display(df.describe())

## Data Cleaning and Preprocessing

In this section we make sure the data is in a usable format. A CSV file does not store column types, so even for a synthetic dataset it is worth checking for missing values and confirming that the numeric columns were parsed as numbers. Note that the 'Year' column is a plain integer, not a full date. If a more detailed date is required, further date parsing will be needed.
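
Two of the checks mentioned above can be sketched on a toy frame. This is an illustrative sketch under assumptions: it assumes that a well-formed row satisfies `Low_Rate <= Average_Exchange_Rate <= High_Rate`, which a synthetic dataset is not guaranteed to honor, and it treats each year as January 1 of that year when converting to a timestamp. The values below are made up.

```python
import pandas as pd

# Toy data; the second row is deliberately inconsistent (average above high).
toy = pd.DataFrame({
    "Year": [2019, 2020, 2021],
    "Average_Exchange_Rate": [1.10, 1.25, 1.18],
    "High_Rate": [1.20, 1.22, 1.30],
    "Low_Rate": [1.00, 1.15, 1.05],
})

# A row is treated as consistent when Low_Rate <= Average_Exchange_Rate <= High_Rate.
consistent = toy["Average_Exchange_Rate"].between(toy["Low_Rate"], toy["High_Rate"])
inconsistent_rows = toy[~consistent]
print(inconsistent_rows)

# Turn the integer year into a timestamp at January 1 of that year.
year_start = pd.to_datetime(toy["Year"].astype(str), format="%Y")
print(year_start)
```

Whether inconsistent rows should be dropped, corrected, or kept depends on how the data was generated, so the sketch only flags them.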

In [None]:
# Checking for missing values
missing_values = df.isnull().sum()
print('Missing values in each column:')
display(missing_values)

# If missing values are encountered, you might use different strategies like fillna or dropna.
if missing_values.sum() > 0:
    print('Missing values found. Consider imputation methods or dropping rows/columns with missing data.')
else:
    print('No missing values found.')

# Coerce the numeric columns (including 'Year') to numeric dtypes.
# errors='coerce' turns unparseable entries into NaN, so count any NaNs it introduces.
numeric_columns = ['Year', 'Average_Exchange_Rate', 'High_Rate', 'Low_Rate', 'Volatility_Index']
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce')
coerced_nans = df[numeric_columns].isnull().sum().sum() - missing_values[numeric_columns].sum()
if coerced_nans > 0:
    print(f'{coerced_nans} value(s) could not be parsed as numbers and are now NaN.')

print('Data cleaning and preprocessing complete.')

## Exploratory Data Analysis

We now look more closely at individual features and the relationships between them. Count plots show how often each base and quote currency appears, histograms show the distribution of each numeric column, and a pair plot shows pairwise relationships among the numeric variables.
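
Alongside the plots, a per-pair table can make differences between currency pairs easier to read. The sketch below is an optional addition, not part of the original analysis: it uses a toy frame with made-up values, and `pair_summary`, `mean_rate`, `mean_volatility`, and `n_rows` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Toy data with two currency pairs (values are made up for illustration).
toy = pd.DataFrame({
    "Base_Currency": ["USD", "USD", "EUR", "EUR"],
    "Quote_Currency": ["JPY", "JPY", "GBP", "GBP"],
    "Average_Exchange_Rate": [100.0, 110.0, 0.80, 0.90],
    "Volatility_Index": [0.10, 0.20, 0.05, 0.15],
})

# One row per (base, quote) pair with the mean rate, mean volatility, and row count.
pair_summary = (
    toy.groupby(["Base_Currency", "Quote_Currency"])
       .agg(mean_rate=("Average_Exchange_Rate", "mean"),
            mean_volatility=("Volatility_Index", "mean"),
            n_rows=("Average_Exchange_Rate", "size"))
       .reset_index()
)
print(pair_summary)
```

Because exchange rates for different pairs live on very different scales, a grouped view like this can be easier to interpret than a single histogram over all pairs.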

In [None]:
# Plot count plots for Base_Currency and Quote_Currency
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
sns.countplot(data=df, x='Base_Currency', palette='viridis')
plt.title('Count of Base Currency')

plt.subplot(1, 2, 2)
sns.countplot(data=df, x='Quote_Currency', palette='magma')
plt.title('Count of Quote Currency')

plt.tight_layout()
plt.show()

# Plot histogram for numeric values
numeric_df = df.select_dtypes(include=[np.number])
numeric_df.hist(bins=15, figsize=(15, 10), color='cornflowerblue', edgecolor='black')
plt.suptitle('Histograms for Numeric Columns', fontsize=16)
plt.show()

In [None]:
# Create a pair plot to explore relationships among numerical features
# (no palette argument: it has no effect without a hue variable)
sns.pairplot(numeric_df, diag_kind='hist')
plt.suptitle('Pair Plot of Numeric Variables', y=1.02)
plt.show()

## Correlation Analysis

To better understand how the numerical features relate to each other, we plot a heatmap of their pairwise Pearson correlation matrix. With five numeric columns, a heatmap is easier to scan than a table of coefficients. Strong correlations between predictors also hint at multicollinearity, which can make the coefficients of a linear model unstable.
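
Pairwise correlations only hint at multicollinearity. One common way to quantify it is the variance inflation factor (VIF), which regresses each feature on the others and reports 1 / (1 - R²). The sketch below is an illustrative addition, not part of the original analysis: `variance_inflation_factors` is a hypothetical helper, and the toy data is generated so that `High_Rate` closely tracks `Low_Rate`, which is an assumption about how such columns might behave, not a finding about this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def variance_inflation_factors(frame):
    """VIF for each column: 1 / (1 - R^2) from regressing it on the other columns."""
    vifs = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        vifs[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")

# Toy data: High_Rate is Low_Rate plus a small spread; Volatility_Index is independent.
rng = np.random.default_rng(0)
low = rng.uniform(1.0, 2.0, size=200)
toy = pd.DataFrame({
    "Low_Rate": low,
    "High_Rate": low + rng.uniform(0.05, 0.15, size=200),
    "Volatility_Index": rng.uniform(0.0, 1.0, size=200),
})

vif = variance_inflation_factors(toy)
print(vif.round(2))
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity, although the threshold is a convention rather than a hard rule.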

In [None]:
# Compute correlation matrix
corr_matrix = numeric_df.corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Numeric Features')
plt.show()

## Prediction Modeling with Regression

Building on the exploration above, we fit a regression model to predict the Average Exchange Rate from Year, High Rate, Low Rate, and Volatility Index. Linear regression works well when the relationship between the predictors and the target is approximately linear. We split the data into training and test sets, train a simple Linear Regression model, evaluate it on the test set with R-squared and Mean Absolute Error, and estimate feature importance with permutation importance.
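
Permutation importance measures how much a model's score drops when one feature's values are shuffled, which breaks that feature's link to the target. The sketch below re-implements the idea by hand to show the mechanics. It is a simplified illustration on toy data: `permutation_drop` is a hypothetical helper, and for brevity the score is computed on the same data the model was fitted on, whereas the notebook's modeling cell uses scikit-learn's `permutation_importance` on the held-out test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Toy data: the target depends on column 0 only; column 1 is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

model = LinearRegression().fit(X, y)

def permutation_drop(model, X, y, column, n_repeats=10, rng=rng):
    """Mean drop in R^2 after shuffling one column, averaged over n_repeats shuffles."""
    baseline = r2_score(y, model.predict(X))
    drops = []
    for _ in range(n_repeats):
        X_shuffled = X.copy()
        X_shuffled[:, column] = rng.permutation(X_shuffled[:, column])
        drops.append(baseline - r2_score(y, model.predict(X_shuffled)))
    return float(np.mean(drops))

importances = [permutation_drop(model, X, y, c) for c in range(X.shape[1])]
print(importances)
```

Shuffling the informative column should produce a large drop in R², while shuffling the irrelevant column should leave the score almost unchanged.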

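Note: Feature selection is tricky when the features are interrelated. Here the high and low rates are closely tied to the average, so they are likely to be strongly correlated with each other and with the target. This can make individual coefficients unstable, and permutation importance can be misleading for correlated features. Future analysis can explore alternative models such as tree-based methods.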
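
As a rough sketch of the comparison suggested in the note, the code below scores a linear model and a random forest with 5-fold cross-validation. It runs on toy data only: the target is generated as the midpoint of the high and low rates plus a little noise, which is an assumption made for illustration and not a property of the real dataset. The hyperparameters are arbitrary defaults, not tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data: the average is assumed to sit near the midpoint of high and low.
rng = np.random.default_rng(42)
n = 300
low = rng.uniform(1.0, 2.0, size=n)
high = low + rng.uniform(0.05, 0.25, size=n)
volatility = rng.uniform(0.0, 1.0, size=n)
average = (low + high) / 2 + rng.normal(scale=0.01, size=n)
X = np.column_stack([high, low, volatility])

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Mean R^2 over 5 cross-validation folds for each model.
cv_scores = {
    name: cross_val_score(model, X, average, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
print(cv_scores)
```

Cross-validation gives a more stable comparison than a single train-test split, because each model is scored on several different held-out folds.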


In [None]:
# Prepare features and target variable
features = ['Year', 'High_Rate', 'Low_Rate', 'Volatility_Index']
target = 'Average_Exchange_Rate'

X = df[features]
y = df[target]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Linear Regression model
lr = LinearRegression()
lr.fit(X_train, y_train)

# Predict on the test data
y_pred = lr.predict(X_test)

# Evaluate the model
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f'R-squared Score: {r2:.3f}')
print(f'Mean Absolute Error: {mae:.3f}')

# Permutation importance to assess feature importances
result = permutation_importance(lr, X_test, y_test, n_repeats=10, random_state=42, scoring='r2')
importance_df = pd.DataFrame({
    'feature': features,
    'importance_mean': result.importances_mean
}).sort_values(by='importance_mean', ascending=True)

plt.figure(figsize=(8, 6))
plt.barh(importance_df['feature'], importance_df['importance_mean'], color='teal')
plt.title('Permutation Importance of Features')
plt.xlabel('Mean Importance')
plt.ylabel('Feature')
plt.show()

## Conclusion and Future Analysis

This notebook analyzed a synthetic foreign currency exchange dataset. We loaded and cleaned the data, explored it with count plots, histograms, a pair plot, and a correlation heatmap, and then fitted a linear regression model to predict the Average Exchange Rate. The model was evaluated on a held-out test set using R-squared and Mean Absolute Error, and permutation importance was used to rank the features by their impact on the predictions.

The linear model is a simple baseline and a starting point for more complex strategies, such as adding polynomial features or using ensemble methods. Future analysis may also consider temporal trends across years or incorporate more granular date information into the model.