# USD rate EDA
Quick exploratory data analysis of `datasets/usd_rates.csv`, covering data diagnostics, summary statistics, and core visuals.


If visualization packages are missing, install with `pip install seaborn matplotlib` before running the notebook.
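
The cells below use `Path`, `pd`, `plt`, and `sns`, but nothing in the notebook imports them. Here is a minimal setup cell to run first. It assumes the conventional aliases that the later code already relies on.

```python
# Setup: imports used by the cells below (conventional aliases assumed)
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
```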


## Load data
Parse dates, sort chronologically, and take a first look.


In [None]:
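data_path = Path("datasets/usd_rates.csv")
df = pd.read_csv(data_path, parse_dates=["Date"])  # parse Date as datetime64 so .dt works later
df = df.sort_values("Date").reset_index(drop=True)  # chronological order
df.head()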
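
Exchange-rate series often have no observations on weekends and holidays. A quick coverage check therefore helps separate routine gaps from genuinely missing stretches. This is a minimal sketch that assumes `Date` holds datetime values. `date_coverage` is a hypothetical helper name.

```python
import pandas as pd


def date_coverage(frame: pd.DataFrame, date_col: str = "Date") -> pd.Series:
    """Summarize the span and spacing of a datetime column (hypothetical helper)."""
    dates = frame[date_col].sort_values()
    # Calendar days between consecutive observations (first diff is NaT, dropped)
    gaps = dates.diff().dt.days.dropna()
    return pd.Series(
        {
            "rows": len(dates),
            "first": dates.iloc[0],
            "last": dates.iloc[-1],
            "median_gap_days": gaps.median(),
            "max_gap_days": gaps.max(),
        }
    )
```

In the notebook, `date_coverage(df)` returns one small summary. A `max_gap_days` far above the median points to a hole worth inspecting.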



## Data structure and integrity
Basic dtype overview plus missing and duplicate checks.
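
Here is a minimal sketch of those checks. It builds a per-column table of dtypes and missing values, plus counts of fully duplicated rows and of repeated dates. The sketch assumes `Date` is the natural key (one rate per date). `integrity_report` is a hypothetical helper name.

```python
import pandas as pd


def integrity_report(frame: pd.DataFrame, key: str = "Date"):
    """Dtypes, missing counts, and duplicate counts (hypothetical helper)."""
    per_column = pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing": frame.isna().sum(),
            "missing_pct": (frame.isna().mean() * 100).round(2),
        }
    )
    duplicates = {
        # Rows identical across every column
        "duplicate_rows": int(frame.duplicated().sum()),
        # Repeated key values (e.g. two rates recorded for the same date)
        "duplicate_keys": int(frame[key].duplicated().sum()),
    }
    return per_column, duplicates
```

Usage: `per_column, duplicates = integrity_report(df)`. `df.info()` gives a complementary view that includes memory usage.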


## Summary statistics
Numeric summary with the 5th and 95th percentiles added, plus the interquartile range (IQR = 75th minus 25th percentile) as a robust measure of spread.


In [None]:
num_cols = df.select_dtypes(include=["number"]).columns
summary = df[num_cols].describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95]).T
summary["iqr"] = (summary["75%"] - summary["25%"]).round(2)
summary
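
The IQR also yields a simple outlier rule called Tukey's fences. Values below `Q1 - k * IQR` or above `Q3 + k * IQR` are flagged, with `k = 1.5` by convention. This is the same default rule that sets the whisker length in the boxplots further down. Here is a minimal sketch. `iqr_outliers` is a hypothetical helper name.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = values.dropna()
    # Same linear-interpolation quantiles that describe() reports
    q1, q3 = clean.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return clean[(clean < lower) | (clean > upper)]
```

For example, `iqr_outliers(df["Diff"])` lists the day-to-day moves that fall outside the boxplot whiskers.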


## Rate trends by year
Quick roll-up to see how the USD rate evolved by calendar year.


In [None]:
yearly = (
    df.assign(Year=df["Date"].dt.year)
    .groupby("Year")["Rate"]
    .agg(["count", "mean", "std", "min", "max"])
)
yearly
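
The yearly table shows levels. A year-over-year percent change in the yearly mean makes the direction of the trend easier to read. This is a minimal sketch that assumes a table shaped like `yearly` (indexed by `Year`, with a `mean` column). `add_yoy_change` and the `mean_yoy_pct` column are hypothetical names.

```python
import pandas as pd


def add_yoy_change(yearly: pd.DataFrame) -> pd.DataFrame:
    """Append the percent change of the yearly mean versus the previous row."""
    out = yearly.sort_index().copy()
    prev = out["mean"].shift(1)  # previous year's mean; NaN for the first year
    out["mean_yoy_pct"] = ((out["mean"] / prev - 1) * 100).round(2)
    return out
```

Because it compares adjacent rows, a calendar year missing from the data would make one change span two years.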


## Distribution views
Histograms with KDE and boxplots for `Rate` and `Diff`.


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
sns.histplot(df["Rate"], kde=True, ax=axes[0, 0])
axes[0, 0].set_title("Rate distribution")
sns.histplot(df["Diff"].dropna(), kde=True, ax=axes[0, 1])
axes[0, 1].set_title("Diff distribution")
sns.boxplot(x=df["Rate"], ax=axes[1, 0])
axes[1, 0].set_title("Rate boxplot")
sns.boxplot(x=df["Diff"].dropna(), ax=axes[1, 1])
axes[1, 1].set_title("Diff boxplot")
plt.tight_layout()
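
The histograms are easier to compare with a couple of shape statistics alongside them. This minimal sketch uses pandas' `DataFrame.skew` and `DataFrame.kurt`, which give bias-corrected sample skewness and excess kurtosis. A normal distribution scores roughly 0 on both. `shape_stats` is a hypothetical helper name.

```python
import pandas as pd


def shape_stats(frame: pd.DataFrame, cols=("Rate", "Diff")) -> pd.DataFrame:
    """Skewness and excess kurtosis per column; NaNs are skipped by default."""
    subset = frame[list(cols)]
    return pd.DataFrame(
        {"skew": subset.skew(), "excess_kurtosis": subset.kurt()}
    ).round(3)
```

A large positive excess kurtosis for `Diff` would match heavy tails, meaning occasional large moves, in the histogram and boxplot.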


## Time-series view
Line plot to track the USD rate over time.


In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
sns.lineplot(data=df, x="Date", y="Rate", ax=ax, linewidth=1.5)
ax.set_title("USD rate over time")
ax.set_xlabel("Date")
ax.set_ylabel("Rate")
ax.tick_params(axis="x", labelrotation=45)
plt.tight_layout()
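
A trailing rolling mean smooths day-to-day noise so that the underlying trend stands out. This minimal sketch counts the window in observations, not calendar days. The default of 30 is an arbitrary illustrative choice, not one taken from the data. `add_rolling_mean` is a hypothetical helper name.

```python
import pandas as pd


def add_rolling_mean(frame: pd.DataFrame, window: int = 30, col: str = "Rate") -> pd.DataFrame:
    """Return a copy with a trailing rolling mean of `col` over `window` rows."""
    out = frame.sort_values("Date").copy()
    # min_periods=window leaves the first window-1 rows NaN instead of partial means
    out[f"{col}_roll{window}"] = out[col].rolling(window=window, min_periods=window).mean()
    return out
```

To overlay it on the time-series plot, call `smoothed = add_rolling_mean(df)` and then `sns.lineplot(data=smoothed, x="Date", y="Rate_roll30", ax=ax)` before `plt.tight_layout()`.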


## Relationships
Scatter for `Rate` vs `Diff` and a numeric correlation heatmap.


In [None]:
fig, ax = plt.subplots(figsize=(8, 5))
sns.scatterplot(data=df, x="Rate", y="Diff", alpha=0.4, ax=ax)
ax.set_title("Rate vs Diff")
plt.tight_layout()

corr = df.select_dtypes(include=["number"]).corr()
fig, ax = plt.subplots(figsize=(6, 4))
# Diverging colormap so negative and positive correlations are visually distinct around 0
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Numeric correlation")
plt.tight_layout()
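
`DataFrame.corr` defaults to Pearson correlation, which measures linear association and is sensitive to outliers. Comparing it with Spearman's rank correlation is a cheap robustness check. Here is a minimal sketch. `compare_correlations` is a hypothetical helper name. It uses `DataFrame.corr(method="spearman")`, which pandas computes without SciPy.

```python
import pandas as pd


def compare_correlations(frame: pd.DataFrame, x: str = "Rate", y: str = "Diff") -> pd.Series:
    """Pearson vs Spearman correlation for one column pair (NaNs excluded pairwise)."""
    pair = frame[[x, y]]
    return pd.Series(
        {
            "pearson": pair.corr(method="pearson").loc[x, y],
            "spearman": pair.corr(method="spearman").loc[x, y],
        }
    ).round(3)
```

A large gap between the two values suggests that the relationship is monotonic but nonlinear, or that a few extreme points drive it. Passing `method="spearman"` to the `corr` call above produces the rank-based heatmap directly.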
