},
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check for missing values and data quality\n",
    "print(\"Data Quality Assessment:\")\n",
    "print(\"\\nMissing values per column:\")\n",
    "missing_data = stock_data.isnull().sum()\n",
    "print(missing_data[missing_data > 0] if missing_data.sum() > 0 else \"No missing values found!\")\n",
    "\n",
    "# Check for duplicate dates\n",
    "duplicate_dates = stock_data.index.duplicated().sum()\n",
    "print(f\"\\nDuplicate dates: {duplicate_dates}\")\n",
    "\n",
    "# Check OHLC relationships\n",
    "ohlc_violations = (\n",
    "    (stock_data['High'] < stock_data['Low']) |\n",
    "    (stock_data['High'] < stock_data['Open']) |\n",
    "    (stock_data['High'] < stock_data['Close']) |\n",
    "    (stock_data['Low'] > stock_data['Open']) |\n",
    "    (stock_data['Low'] > stock_data['Close'])\n",
    ").sum()\n",
    "\n",
    "print(f\"OHLC relationship violations: {ohlc_violations}\")\n",
    "\n",
    "# Check for extreme price movements (>20% in one day)\n",
    "extreme_moves = (abs(stock_data['Daily_Return']) > 0.20).sum()\n",
    "print(f\"Days with >20% price movement: {extreme_moves}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Next Steps and Recommendations\n",
    "\n",
    "Based on this initial data exploration, here are the recommended next steps:\n",
    "\n",
    "### Key Findings:\n",
    "1. **Data Quality**: The dataset appears clean with no missing values or OHLC violations\n",
    "2. **Market Behaviour**: The stock shows typical financial time-series characteristics with volatility clustering\n",
    "3. **Technical Indicators**: RSI, MACD, and Bollinger Bands provide complementary signals\n",
    "4. **Market Regimes**: Clear distinction between trending and ranging periods\n",
    "\n",
    "### Recommendations for Next Analysis:\n",
    "1. **Feature Engineering** (`02_feature_engineering.ipynb`):\n",
    "   - Create lagged features for time-series dependencies\n",
    "   - Engineer trend strength and momentum indicators\n",
    "   - Calculate rolling statistics and ratios\n",
    "   - Add market sentiment and fundamental indicators\n",
    "\n",
    "2. **Model Development** (`03_model_development.ipynb`):\n",
    "   - Implement classification models for trend direction prediction\n",
    "   - Develop regression models for price forecasting\n",
    "   - Use time-series aware validation techniques\n",
    "   - Compare multiple algorithms (Logistic, RF, XGBoost, LSTM)\n",
    "\n",
    "3. **Performance Analysis** (`04_results_analysis.ipynb`):\n",
    "   - Evaluate model performance across different market regimes\n",
    "   - Analyse feature importance and model interpretability\n",
    "   - Conduct walk-forward validation\n",
    "   - Calculate financial metrics (Sharpe ratio, maximum drawdown)\n",
    "\n",
    "### Important Considerations:\n",
    "- **Overfitting Risk**: Use proper temporal validation to avoid lookahead bias\n",
    "- **Market Changes**: Models may need retraining as market conditions evolve\n",
    "- **Transaction Costs**: Real-world trading involves fees and slippage\n",
    "- **Risk Management**: Always consider position sizing and stop-loss strategies\n",
    "\n",
    "### Disclaimer:\n",
    "This analysis is for educational purposes only and should not be considered financial advice. Always consult qualified professionals before making investment decisions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Final data validation check\n",
    "print(\"Final Data Validation:\")\n",
    "print(f\"✓ Dataset shape: {enhanced_data.shape}\")\n",
    "print(f\"✓ Date range: {enhanced_data.index.min().date()} to {enhanced_data.index.max().date()}\")\n",
    "print(f\"✓ Missing values: {enhanced_data.isnull().sum().sum()}\")\n",
    "print(f\"✓ Technical indicators: {len([col for col in enhanced_data.columns if col not in ['Open', 'High', 'Low', 'Close', 'Volume', 'Symbol', 'Date']])}\")\n",
    "print(f\"✓ Data types: {enhanced_data.dtypes.value_counts().to_dict()}\")\n",
    "\n",
    "print(\"\\n🎉 Data exploration completed successfully!\")\n",
    "print(\"Ready to proceed with feature engineering and model development.\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
},
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Price Trend Visualisation\n",
    "\n",
    "Create visualisations to understand price movements and patterns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create comprehensive price chart\n",
    "fig, axes = plt.subplots(2, 2, figsize=(15, 12))\n",
    "fig.suptitle(f'{symbol} Stock Analysis - Last 2 Years', fontsize=16, fontweight='bold')\n",
    "\n",
    "# 1. Closing price over time\n",
    "axes[0, 0].plot(stock_data.index, stock_data['Close'], linewidth=1.5, color='navy')\n",
    "axes[0, 0].set_title('Closing Price Over Time')\n",
    "axes[0, 0].set_ylabel('Price (£)')\n",
    "axes[0, 0].grid(True, alpha=0.3)\n",
    "axes[0, 0].tick_params(axis='x', rotation=45)\n",
    "\n",
    "# 2. Daily returns distribution\n",
    "axes[0, 1].hist(stock_data['Daily_Return'].dropna(), bins=50, alpha=0.7, color='darkgreen', edgecolor='black')\n",
    "axes[0, 1].axvline(stock_data['Daily_Return'].mean(), color='red', linestyle='--', \n",
    "                   label=f'Mean: {stock_data[\"Daily_Return\"].mean()*100:.2f}%')\n",
    "axes[0, 1].set_title('Daily Returns Distribution')\n",
    "axes[0, 1].set_xlabel('Daily Return')\n",
    "axes[0, 1].set_ylabel('Frequency')\n",
    "axes[0, 1].legend()\n",
    "axes[0, 1].grid(True, alpha=0.3)\n",
    "\n",
    "# 3. Volume over time\n",
    "axes[1, 0].bar(stock_data.index, stock_data['Volume'], width=1, alpha=0.6, color='orange')\n",
    "axes[1, 0].set_title('Trading Volume Over Time')\n",
    "axes[1, 0].set_ylabel('Volume')\n",
    "axes[1, 0].tick_params(axis='x', rotation=45)\n",
    "axes[1, 0].grid(True, alpha=0.3)\n",
    "\n",
    "# 4. Price volatility (30-day rolling)\n",
    "rolling_vol = stock_data['Daily_Return'].rolling(window=30).std() * np.sqrt(252) * 100  # Annualised volatility\n",
    "axes[1, 1].plot(stock_data.index, rolling_vol, color='purple', linewidth=1.5)\n",
    "axes[1, 1].set_title('30-Day Rolling Volatility (Annualised)')\n",
    "axes[1, 1].set_ylabel('Volatility (%)')\n",
    "axes[1, 1].tick_params(axis='x', rotation=45)\n",
    "axes[1, 1].grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create OHLC candlestick-style visualisation\n",
    "# Note: We'll use matplotlib's basic plotting since plotly isn't imported yet\n",
    "\n",
    "# Sample recent data for better visibility\n",
    "recent_data = stock_data.tail(60)  # Last 60 trading days\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10), height_ratios=[3, 1])\n",
    "\n",
    "# Price chart with OHLC information\n",
    "for idx, (date, row) in enumerate(recent_data.iterrows()):\n",
    "    # Determine colour based on close vs open\n",
    "    color = 'green' if row['Close'] >= row['Open'] else 'red'\n",
    "    \n",
    "    # Draw high-low line\n",
    "    ax1.plot([idx, idx], [row['Low'], row['High']], color='black', linewidth=1)\n",
    "    \n",
    "    # Draw open-close rectangle\n",
    "    height = abs(row['Close'] - row['Open'])\n",
    "    bottom = min(row['Close'], row['Open'])\n",
    "    ax1.bar(idx, height, bottom=bottom, color=color, alpha=0.7, width=0.8)\n",
    "\n",
    "ax1.set_title(f'{symbol} - Last 60 Trading Days (OHLC)')\n",
    "ax1.set_ylabel('Price (£)')\n",
    "ax1.grid(True, alpha=0.3)\n",
    "\n",
    "# Volume chart\n",
    "colors = ['green' if close >= open_price else 'red' \n",
    "          for close, open_price in zip(recent_data['Close'], recent_data['Open'])]\n",
    "ax2.bar(range(len(recent_data)), recent_data['Volume'], color=colors, alpha=0.7)\n",
    "ax2.set_title('Trading Volume')\n",
    "ax2.set_ylabel('Volume')\n",
    "ax2.set_xlabel('Days (Most Recent 60)')\n",
    "\n",
    "# Set x-axis labels for dates (every 10 days)\n",
    "date_labels = [date.strftime('%Y-%m-%d') for date in recent_data.index[::10]]\n",
    "ax1.set_xticks(range(0, len(recent_data), 10))\n",
    "ax1.set_xticklabels(date_labels, rotation=45)\n",
    "ax2.set_xticks(range(0, len(recent_data), 10))\n",
    "ax2.set_xticklabels(date_labels, rotation=45)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Technical Indicators Preview\n",
    "\n",
    "Calculate and visualise basic technical indicators to understand market signals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialise technical indicators calculator\n",
    "indicators = TechnicalIndicators()\n",
    "\n",
    "# Add basic indicators to our data\n",
    "print(\"Calculating technical indicators...\")\n",
    "enhanced_data = indicators.add_all_indicators(stock_data.copy())\n",
    "\n",
    "print(f\"Enhanced dataset now has {len(enhanced_data.columns)} columns\")\n",
    "print(\"\\nNew indicator columns added:\")\n",
    "new_columns = [col for col in enhanced_data.columns if col not in stock_data.columns]\n",
    "for i, col in enumerate(new_columns, 1):\n",
    "    print(f\"{i:2d}. {col}\")\n",
    "    if i % 10 == 0:  # Print in groups of 10 for readability\n",
    "        print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualise key technical indicators\n",
    "fig, axes = plt.subplots(2, 2, figsize=(15, 12))\n",
    "fig.suptitle(f'{symbol} Technical Indicators Analysis', fontsize=16, fontweight='bold')\n",
    "\n",
    "# Recent data for better visibility\n",
    "recent_enhanced = enhanced_data.tail(120)  # Last 120 days\n",
    "\n",
    "# 1. Price with Moving Averages\n",
    "axes[0, 0].plot(recent_enhanced.index, recent_enhanced['Close'], label='Close Price', linewidth=2)\n",
    "axes[0, 0].plot(recent_enhanced.index, recent_enhanced['SMA_20'], label='SMA 20', linestyle='--')\n",
    "axes[0, 0].plot(recent_enhanced.index, recent_enhanced['SMA_50'], label='SMA 50', linestyle='-.')\n",
    "axes[0, 0].plot(recent_enhanced.index, recent_enhanced['EMA_12'], label='EMA 12', alpha=0.8)\n",
    "axes[0, 0].set_title('Price with Moving Averages')\n",
    "axes[0, 0].set_ylabel('Price (£)')\n",
    "axes[0, 0].legend()\n",
    "axes[0, 0].grid(True, alpha=0.3)\n",
    "axes[0, 0].tick_params(axis='x', rotation=45)\n",
    "\n",
    "# 2. RSI\n",
    "axes[0, 1].plot(recent_enhanced.index, recent_enhanced['RSI'], color='purple', linewidth=2)\n",
    "axes[0, 1].axhline(y=70, color='red', linestyle='--', alpha=0.7, label='Overbought (70)')\n",
    "axes[0, 1].axhline(y=30, color='green', linestyle='--', alpha=0.7, label='Oversold (30)')\n",
    "axes[0, 1].axhline(y=50, color='gray', linestyle='-', alpha=0.5, label='Neutral (50)')\n",
    "axes[0, 1].set_title('Relative Strength Index (RSI)')\n",
    "axes[0, 1].set_ylabel('RSI')\n",
    "axes[0, 1].set_ylim(0, 100)\n",
    "axes[0, 1].legend()\n",
    "axes[0, 1].grid(True, alpha=0.3)\n",
    "axes[0, 1].tick_params(axis='x', rotation=45)\n",
    "\n",
    "# 3. MACD\n",
    "axes[1, 0].plot(recent_enhanced.index, recent_enhanced['MACD'], label='MACD', linewidth=2)\n",
    "axes[1, 0].plot(recent_enhanced.index, recent_enhanced['MACD_Signal'], label='Signal', linewidth=2)\n",
    "axes[1, 0].bar(recent_enhanced.index, recent_enhanced['MACD_Histogram'], \n",
    "               label='Histogram', alpha=0.6, width=1)\n",
    "axes[1, 0].axhline(y=0, color='black', linestyle='-', alpha=0.3)\n",
    "axes[1, 0].set_title('MACD')\n",
    "axes[1, 0].legend()\n",
    "axes[1, 0].grid(True, alpha=0.3)\n",
    "axes[1, 0].tick_params(axis='x', rotation=45)\n",
    "\n",
    "# 4. Bollinger Bands\n",
    "axes[1, 1].plot(recent_enhanced.index, recent_enhanced['Close'], label='Close Price', linewidth=2, color='black')\n",
    "axes[1, 1].plot(recent_enhanced.index, recent_enhanced['BB_Upper'], label='Upper Band', alpha=0.7, color='red')\n",
    "axes[1, 1].plot(recent_enhanced.index, recent_enhanced['BB_Middle'], label='Middle Band', alpha=0.7, color='blue')\n",
    "axes[1, 1].plot(recent_enhanced.index, recent_enhanced['BB_Lower'], label='Lower Band', alpha=0.7, color='green')\n",
    "axes[1, 1].fill_between(recent_enhanced.index, recent_enhanced['BB_Upper'], recent_enhanced['BB_Lower'], \n",
    "                        alpha=0.1, color='gray')\n",
    "axes[1, 1].set_title('Bollinger Bands')\n",
    "axes[1, 1].set_ylabel('Price (£)')\n",
    "axes[1, 1].legend()\n",
    "axes[1, 1].grid(True, alpha=0.3)\n",
    "axes[1, 1].tick_params(axis='x', rotation=45)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Correlation Analysis\n",
    "\n",
    "Examine relationships between different indicators and price movements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select key indicators for correlation analysis\n",
    "correlation_columns = [\n",
    "    'Close', 'Volume', 'Daily_Return', 'SMA_20', 'SMA_50', \n",
    "    'EMA_12', 'RSI', 'MACD', 'BB_Width', 'ATR', 'Volatility'\n",
    "]\n",
    "\n",
    "# Calculate correlation matrix\n",
    "corr_data = enhanced_data[correlation_columns].copy()\n",
    "correlation_matrix = corr_data.corr()\n",
    "\n",
    "# Create correlation heatmap\n",
    "plt.figure(figsize=(12, 10))\n",
    "mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))  # Mask upper triangle\n",
    "\n",
    "sns.heatmap(correlation_matrix, \n",
    "            mask=mask,\n",
    "            annot=True, \n",
    "            cmap='coolwarm', \n",
    "            center=0,\n",
    "            square=True,\n",
    "            fmt='.2f',\n",
    "            cbar_kws={'label': 'Correlation Coefficient'})\n",
    "\n",
    "plt.title(f'{symbol} - Correlation Matrix of Key Indicators', fontsize=14, fontweight='bold')\n",
    "plt.xticks(rotation=45, ha='right')\n",
    "plt.yticks(rotation=0)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Print strongest correlations\n",
    "print(\"Strongest positive correlations (>0.7):\")\n",
    "strong_positive = correlation_matrix.where(np.triu(np.ones(correlation_matrix.shape), k=1).astype(bool))\n",
    "strong_positive = strong_positive.stack().sort_values(ascending=False)\n",
    "print(strong_positive[strong_positive > 0.7])\n",
    "\n",
    "print(\"\\nStrongest negative correlations (<-0.7):\")\n",
    "strong_negative = strong_positive[strong_positive < -0.7]\n",
    "print(strong_negative if len(strong_negative) > 0 else \"None found\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Market Regime Analysis\n",
    "\n",
    "Identify different market conditions (trending vs. ranging, bull vs. bear markets)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define market regimes based on moving averages and volatility\n",
    "regime_data = enhanced_data.copy()\n",
    "\n",
    "# Trend direction: Price vs SMA_50\n",
    "regime_data['Trend_Direction'] = np.where(\n",
    "    regime_data['Close'] > regime_data['SMA_50'], 'Uptrend', 'Downtrend'\n",
    ")\n",
    "\n",
    "# Volatility regime: High/Low based on median ATR\n",
    "median_atr = regime_data['ATR'].median()\n",
    "regime_data['Volatility_Regime'] = np.where(\n",
    "    regime_data['ATR'] > median_atr, 'High_Volatility', 'Low_Volatility'\n",
    ")\n",
    "\n",
    "# Market strength: RSI-based\n",
    "regime_data['Market_Strength'] = pd.cut(\n",
    "    regime_data['RSI'], \n",
    "    bins=[0, 30, 70, 100], \n",
    "    labels=['Oversold', 'Neutral', 'Overbought']\n",
    ")\n",
    "\n",
    "# Analyse regime distributions\n",
    "print(\"Market Regime Analysis:\")\n",
    "print(\"\\n1. Trend Direction Distribution:\")\n",
    "trend_dist = regime_data['Trend_Direction'].value_counts(normalize=True) * 100\n",
    "print(trend_dist.round(1))\n",
    "\n",
    "print(\"\\n2. Volatility Regime Distribution:\")\n",
    "vol_dist = regime_data['Volatility_Regime'].value_counts(normalize=True) * 100\n",
    "print(vol_dist.round(1))\n",
    "\n",
    "print(\"\\n3. Market Strength Distribution:\")\n",
    "strength_dist = regime_data['Market_Strength'].value_counts(normalize=True) * 100\n",
    "print(strength_dist.round(1))\n",
    "\n",
    "# Performance by regime\n",
    "print(\"\\n4. Average Daily Returns by Regime:\")\n",
    "regime_performance = regime_data.groupby('Trend_Direction')['Daily_Return'].agg(['mean', 'std'])\n",
    "regime_performance['mean'] *= 100  # Convert to percentage\n",
    "regime_performance['std'] *= 100\n",
    "print(regime_performance.round(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualise market regimes over time\n",
    "fig, axes = plt.subplots(3, 1, figsize=(15, 12))\n",
    "fig.suptitle(f'{symbol} - Market Regime Analysis Over Time', fontsize=16, fontweight='bold')\n",
    "\n",
    "# Recent data for clarity\n",
    "recent_regime = regime_data.tail(180)  # Last 180 days\n",
    "\n",
    "# 1. Price with trend regime colouring\n",
    "uptrend_mask = recent_regime['Trend_Direction'] == 'Uptrend'\n",
    "axes[0].plot(recent_regime.index, recent_regime['Close'], color='gray', alpha=0.3, linewidth=1)\n",
    "axes[0].scatter(recent_regime[uptrend_mask].index, recent_regime[uptrend_mask]['Close'], \n",
    "               color='green', alpha=0.6, s=10, label='Uptrend')\n",
    "axes[0].scatter(recent_regime[~uptrend_mask].index, recent_regime[~uptrend_mask]['Close'], \n",
    "               color='red', alpha=0.6, s=10, label='Downtrend')\n",
    "axes[0].plot(recent_regime.index, recent_regime['SMA_50'], color='blue', linestyle='--', \n",
    "            label='SMA 50', alpha=0.8)\n",
    "axes[0].set_title('Price Trend Regimes')\n",
    "axes[0].set_ylabel('Price (£)')\n",
    "axes[0].legend()\n",
    "axes[0].grid(True, alpha=0.3)\n",
    "\n",
    "# 2. Volatility regime\n",
    "high_vol_mask = recent_regime['Volatility_Regime'] == 'High_Volatility'\n",
    "axes[1].plot(recent_regime.index, recent_regime['ATR'], color='gray', linewidth=1)\n",
    "axes[1].fill_between(recent_regime[high_vol_mask].index, \n",
    "                    0, recent_regime[high_vol_mask]['ATR'],\n",
    "                    color='red', alpha=0.3, label='High Volatility')\n",
    "axes[1].fill_between(recent_regime[~high_vol_mask].index, \n",
    "                    0, recent_regime[~high_vol_mask]['ATR'],\n",
    "                    color='green', alpha=0.3, label='Low Volatility')\n",
    "axes[1].axhline(y=median_atr, color='blue', linestyle='--', alpha=0.7, label=f'Median ATR: {median_atr:.2f}')\n",
    "axes[1].set_title('Volatility Regimes (Average True Range)')\n",
    "axes[1].set_ylabel('ATR')\n",
    "axes[1].legend()\n",
    "axes[1].grid(True, alpha=0.3)\n",
    "\n",
    "# 3. Market strength (RSI) with regime colouring\n",
    "axes[2].plot(recent_regime.index, recent_regime['RSI'], color='purple', linewidth=2)\n",
    "axes[2].fill_between(recent_regime.index, 0, 30, color='green', alpha=0.2, label='Oversold Zone')\n",
    "axes[2].fill_between(recent_regime.index, 70, 100, color='red', alpha=0.2, label='Overbought Zone')\n",
    "axes[2].axhline(y=70, color='red', linestyle='--', alpha=0.7)\n",
    "axes[2].axhline(y=30, color='green', linestyle='--', alpha=0.7)\n",
    "axes[2].axhline(y=50, color='gray', linestyle='-', alpha=0.5)\n",
    "axes[2].set_title('Market Strength (RSI)')\n",
    "axes[2].set_ylabel('RSI')\n",
    "axes[2].set_xlabel('Date')\n",
    "axes[2].set_ylim(0, 100)\n",
    "axes[2].legend()\n",
    "axes[2].grid(True, alpha=0.3)\n",
    "\n",
    "# Rotate x-axis labels for all subplots\n",
    "for ax in axes:\n",
    "    ax.tick_params(axis='x', rotation=45)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Data Export and Summary\n",
    "\n",
    "Save the processed data and provide a summary of findings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save enhanced dataset for future analysis\n",
    "output_path = Path('../data/processed')\n",
    "output_path.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# Save as both CSV and pickle for different use cases\n",
    "enhanced_data.to_csv(output_path / f'{symbol}_enhanced_data.csv')\n",
    "enhanced_data.to_pickle(output_path / f'{symbol}_enhanced_data.pkl')\n",
    "\n",
    "print(f\"Enhanced dataset saved to:\")\n",
    "print(f\"- CSV: {output_path / f'{symbol}_enhanced_data.csv'}\")\n",
    "print(f\"- Pickle: {output_path / f'{symbol}_enhanced_data.pkl'}\")\n",
    "\n",
    "# Generate summary statistics\n",
    "summary_stats = {\n",
    "    'Symbol': symbol,\n",
    "    'Analysis_Date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),\n",
    "    'Data_Points': len(enhanced_data),\n",
    "    'Date_Range': f\"{enhanced_data.index.min().date()} to {enhanced_data.index.max().date()}\",\n",
    "    'Price_Range': f\"£{enhanced_data['Low'].min():.2f} - £{enhanced_data['High'].max():.2f}\",\n",
    "    'Current_Price': f\"£{enhanced_data['Close'].iloc[-1]:.2f}\",\n",
    "    'Total_Return': f\"{((enhanced_data['Close'].iloc[-1] / enhanced_data['Close'].iloc[0]) - 1) * 100:.2f}%\",\n",
    "    'Average_Daily_Return': f\"{enhanced_data['Daily_Return'].mean() * 100:.3f}%\",\n",
    "    'Volatility': f\"{enhanced_data['Daily_Return'].std() * 100:.3f}%\",\n",
    "    'Sharpe_Ratio': f\"{(enhanced_data['Daily_Return'].mean() / enhanced_data['Daily_Return'].std()) * np.sqrt(252):.3f}\",\n",
    "    'Max_Drawdown': f\"{((enhanced_data['Close'] / enhanced_data['Close'].expanding().max()) - 1).min() * 100:.2f}%\",\n",
    "    'Current_RSI': f\"{enhanced_data['RSI'].iloc[-1]:.1f}\",\n",
    "    'Trend_Direction': regime_data['Trend_Direction'].iloc[-1],\n",
    "    'Volatility_Regime': regime_data['Volatility_Regime'].iloc[-1],\n",
    "}\n",
    "\n",
    "# Display summary\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(\"ANALYSIS SUMMARY\")\n",
    "print(\"=\"*50)\n",
    "for key, value in summary_stats.items():\n",
    {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Stock Market Data Exploration\n",
    "\n",
    "This notebook provides an initial exploration of stock market data, including:\n",
    "- Data loading and validation\n",
    "- Basic statistical analysis\n",
    "- Visualisation of price trends\n",
    "- Data quality assessment\n",
    "\n",
    "**Note**: This analysis is for educational purposes only and should not be considered financial advice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required libraries\n",
    "import sys\n",
    "import warnings\n",
    "from pathlib import Path\n",
    "\n",
    "# Add src directory to path for imports\n",
    "sys.path.append(str(Path.cwd().parent / 'src'))\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from datetime import datetime, timedelta\n",
    "\n",
    "# Configure visualisation settings\n",
    "plt.style.use('seaborn-v0_8')\n",
    "sns.set_palette(\"husl\")\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Set display options\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('display.width', 1000)\n",
    "\n",
    "print(\"Libraries imported successfully!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import project modules\n",
    "from src.data.data_loader import StockDataLoader\n",
    "from src.features.technical_indicators import TechnicalIndicators\n",
    "from src.utils.config import DEFAULT_SYMBOLS, DATA_CONFIG\n",
    "\n",
    "print(\"Project modules imported successfully!\")\n",
    "print(f\"Default symbols for analysis: {DEFAULT_SYMBOLS}\")\n",
    "print(f\"Data configuration: {DATA_CONFIG}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Data Loading\n",
    "\n",
    "Load historical stock data for analysis. We'll start with a single symbol and then expand to multiple symbols."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialise data loader\n",
    "loader = StockDataLoader()\n",
    "\n",
    "# Load data for Apple Inc. (AAPL) as our primary example\n",
    "symbol = \"AAPL\"\n",
    "print(f\"Loading data for {symbol}...\")\n",
    "\n",
    "try:\n",
    "    stock_data = loader.get_stock_data(symbol, period=\"2y\")\n",
    "    print(f\"Successfully loaded {len(stock_data)} days of data\")\n",
    "    print(f\"Date range: {stock_data.index.min().date()} to {stock_data.index.max().date()}\")\n",
    "    \n",
    "    # Display basic information\n",
    "    print(\"\\nDataFrame Info:\")\n",
    "    print(stock_data.info())\n",
    "    \n",
    "except Exception as e:\n",
    "    print(f\"Error loading data: {e}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display first few rows of data\n",
    "print(\"First 5 rows of stock data:\")\n",
    "display(stock_data.head())\n",
    "\n",
    "print(\"\\nLast 5 rows of stock data:\")\n",
    "display(stock_data.tail())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Basic Statistical Analysis\n",
    "\n",
    "Examine the statistical properties of the stock data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic statistical summary\n",
    "print(\"Statistical Summary:\")\n",
    "display(stock_data[['Open', 'High', 'Low', 'Close', 'Volume']].describe())\n",
    "\n",
    "# Calculate additional statistics\n",