In the world of finance, data is king. With the right analysis, it can reveal trends and help inform expectations about market movements. Today, we dive into a financial dataset that spans multiple indices, commodities, and economic indicators. Let's see what insights we can uncover and how far a simple model can go toward explaining market behavior.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer
import warnings
warnings.filterwarnings('ignore')
```

## Load the Data
Let's start by loading the dataset and taking a quick look at its structure.

```python
file_path = '/kaggle/input/financial-data/financial_regression.csv'
df = pd.read_csv(file_path)
df.head()
```

## Data Preprocessing
Next, we convert the `date` column to datetime objects and check each column for missing values.

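```python
df['date'] = pd.to_datetime(df['date'])
# Sort chronologically so plots and any time-based split follow date order
df = df.sort_values('date').reset_index(drop=True)
df.info()
```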

```python
df.isnull().sum()
```
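Raw counts hide how sparse a column is relative to the table. Macro indicators such as GDP or CPI are usually released monthly or quarterly, so in a daily table they may be mostly empty. A minimal sketch of a per-column summary with both counts and percentages, assuming a hypothetical helper `missing_summary` and a tiny made-up frame standing in for `df`:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing count and percentage per column, largest first."""
    counts = frame.isnull().sum()
    return (
        pd.DataFrame({
            "missing": counts,
            "percent": (counts / len(frame) * 100).round(1),
        })
        .sort_values("missing", ascending=False)
    )


# Synthetic stand-in for the real dataset; the values are made up.
demo = pd.DataFrame({
    "sp500 close": [4100.0, np.nan, 4120.0, 4135.0],
    "GDP": [np.nan, np.nan, np.nan, 26000.0],
})
summary = missing_summary(demo)
print(summary)
```

On the notebook's data, `missing_summary(df)` would show which columns are sparse enough that mean imputation may distort them.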

## Exploratory Data Analysis
Let's explore the data with some visualizations to understand the relationships between different financial indicators.

```python
# Plotting S&P 500 close prices over time
plt.figure(figsize=(14, 7))
plt.plot(df['date'], df['sp500 close'], label='S&P 500 Close')
plt.title('S&P 500 Close Prices Over Time')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```

```python
# Correlation heatmap
numeric_df = df.select_dtypes(include=[np.number])
plt.figure(figsize=(16, 12))
sns.heatmap(numeric_df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
```
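Correlations between price levels can be inflated by shared long-term trends: two unrelated series that both drift over the years may still look strongly correlated in a heatmap. Correlating daily percentage changes (returns) is a common alternative for price-like columns. A minimal sketch on two independent synthetic price series (the names `asset_a` and `asset_b` are made up); the level correlation can land far from zero, while the returns correlation should stay close to it:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
dates = pd.bdate_range("2015-01-01", periods=n)

# Two independent synthetic price series (geometric random walks).
prices = pd.DataFrame(
    {
        "asset_a": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
        "asset_b": 50 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
    },
    index=dates,
)

level_corr = prices.corr().loc["asset_a", "asset_b"]

# Daily percentage changes remove the shared-trend effect; drop the first (NaN) row.
returns = prices.pct_change().dropna()
return_corr = returns.corr().loc["asset_a", "asset_b"]

print(f"correlation of price levels:  {level_corr:.2f}")
print(f"correlation of daily returns: {return_corr:.2f}")
```

For the notebook, the same idea would apply to the price columns of `numeric_df`; rates and indices such as CPI may call for differences rather than percentage changes.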

## Predictive Modeling
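Let's fit a simple linear regression model to the S&P 500 closing price. Because the features include the same day's open, high, and low prices, which are only final once the session ends, this is a same-day fit rather than a forecast of future prices.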
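To see why same-day features make the fit look better than it is, here is a minimal sketch on synthetic daily bars (all values made up): the close follows a random walk, the open sits near the previous close, and the high and low bracket both. Because the close is pinned between values the model already sees, a linear fit scores a very high R-squared without any forecasting ability:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 1000

# Synthetic daily bars: close is a random walk, open is near the prior close,
# and high/low bracket the open and close.
close = 4000 + np.cumsum(rng.normal(0, 20, n))
open_ = np.concatenate([[4000.0], close[:-1]]) + rng.normal(0, 2, n)
high = np.maximum(open_, close) + np.abs(rng.normal(0, 5, n))
low = np.minimum(open_, close) - np.abs(rng.normal(0, 5, n))

# Same-day open/high/low as features, same-day close as target.
X_same_day = np.column_stack([open_, high, low])
model = LinearRegression().fit(X_same_day, close)
r2_same_day = r2_score(close, model.predict(X_same_day))
print(f"in-sample R^2 using same-day open/high/low: {r2_same_day:.4f}")
```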

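```python
from sklearn.pipeline import make_pipeline

# Define features and target.
# Note: sp500 open/high/low are same-day values that bracket the close,
# so this fits the close from the same session's prices; it is not a forecast.
features = ['sp500 open', 'sp500 high', 'sp500 low', 'sp500 volume', 'us_rates_%', 'CPI', 'usd_chf', 'eur_usd', 'GDP']
target = 'sp500 close'

# Sort chronologically and drop rows without a target instead of imputing it
data = df.sort_values('date').dropna(subset=[target])
X = data[features]
y = data[target]

# Split chronologically: train on earlier dates, test on later ones (no shuffling)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Impute missing feature values using training-set means only, then fit the regression
model = make_pipeline(SimpleImputer(strategy='mean'), LinearRegression())
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mse, r2
```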
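A model that actually predicts future market behavior should only use information available before its target. One simple setup is to shift the target one trading day ahead, split chronologically, and compare against a naive persistence baseline ("tomorrow's close equals today's close"). A minimal sketch on a synthetic random walk; the column names `close`, `target_next_close`, and `return_1d` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 500
frame = pd.DataFrame(
    {"close": 4000 + np.cumsum(rng.normal(0, 20, n))},
    index=pd.bdate_range("2020-01-01", periods=n),
)

# Target: the NEXT trading day's close; features use only information known today.
frame["target_next_close"] = frame["close"].shift(-1)
frame["return_1d"] = frame["close"].pct_change()
frame = frame.dropna()  # drops the first row (no return) and the last row (no target)

X = frame[["close", "return_1d"]]
y = frame["target_next_close"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
mse_model = mean_squared_error(y_test, model.predict(X_test))

# Naive persistence baseline: tomorrow's close equals today's close.
mse_naive = mean_squared_error(y_test, X_test["close"])

print(f"linear model MSE:     {mse_model:.1f}")
print(f"persistence baseline: {mse_naive:.1f}")
```

On a pure random walk there is nothing to learn, so the model should not be expected to beat the baseline by much. On the real data, the same comparison could be run with `df['sp500 close']` as the price series.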

## Conclusion
In this notebook, we explored a financial dataset spanning multiple indices, commodities, and economic indicators, visualized key trends, and built a simple linear regression model for the S&P 500 closing price. Because the features include same-day open, high, and low prices, which bracket the close, any high R-squared here is likely to reflect that overlap rather than genuine predictive power. A natural next step is to predict a future value, such as the next day's close, from information available beforehand and compare it against a naive baseline; more sophisticated models, feature engineering, or additional data sources could then be evaluated on that footing.

## Credits
This notebook was created with the help of [Devra AI data science assistant](https://devra.ai/ref/kaggle)