# Notebook 01 – Data Collection
## Real Historical Stock Prices (Egyptian Stock Market)

This notebook downloads **real historical stock price data** for:
- Commercial International Bank (COMI / CIB)
- Alexandria Mineral Oils Company (AMOC)
- Elsewedy Electric (SWDY)

Period: **January 2020 – December 2025**

The downloaded data is saved to **Google Drive** so that it persists across Colab sessions.


In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Install required libraries
!pip install yfinance pandas numpy



In [3]:
import os
import yfinance as yf
import pandas as pd

# Base directory on Google Drive
BASE_DIR = '/content/drive/MyDrive/finrl-egx-multimodal'

# Create directory structure
folders = [
    'data/stocks',
    'data/news',
    'data/sentiment',
    'notebooks',
    'results'
]

for folder in folders:
    os.makedirs(os.path.join(BASE_DIR, folder), exist_ok=True)

print('Project directory structure created successfully.')

Project directory structure created successfully.


In [4]:
# Egyptian Exchange (EGX) tickers as listed on Yahoo Finance
# (the '.CA' suffix marks Cairo listings)
stocks = {
    'COMI': 'COMI.CA',
    'AMOC': 'AMOC.CA',
    'SWDY': 'SWDY.CA'
}

# Note: yfinance treats `end` as exclusive, so 2025-12-31 itself is not downloaded
start_date = '2020-01-01'
end_date = '2025-12-31'
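
Because yfinance documents the `end` argument as exclusive, a request ending on `'2025-12-31'` stops at the previous trading day. If the last calendar day of the period should be included, one option is to pass the following day instead. The helper below is a minimal sketch; `inclusive_end` is a hypothetical name and is not part of this notebook or of yfinance.

```python
import pandas as pd


def inclusive_end(end_date: str) -> str:
    """Return the day after `end_date` as 'YYYY-MM-DD'.

    Hypothetical helper: useful for APIs whose end bound is exclusive.
    """
    next_day = pd.Timestamp(end_date) + pd.Timedelta(days=1)
    return next_day.strftime('%Y-%m-%d')


# Passing this value as `end` would make 2025-12-31 part of the request.
print(inclusive_end('2025-12-31'))
```

Whether the extra day returns a row still depends on the exchange calendar, since no data exists for non-trading days.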

In [5]:
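# Download and save historical stock data
for stock_name, ticker in stocks.items():
    print(f'Downloading data for {stock_name} ({ticker})...')

    # auto_adjust=False keeps both the raw 'Close' and the 'Adj Close' columns
    df = yf.download(
        ticker,
        start=start_date,
        end=end_date,
        auto_adjust=False,
        progress=False
    )

    # A failed download may come back as an empty frame; skip it
    # rather than writing an empty CSV
    if df.empty:
        print(f'No data returned for {ticker}; skipping.')
        continue

    # Move the Date index into a regular column before saving.
    # Recent yfinance versions return two-level columns (field, ticker),
    # so the CSV gets a second header row containing the ticker symbol.
    df.reset_index(inplace=True)
    file_path = os.path.join(BASE_DIR, 'data/stocks', f'{stock_name}.csv')
    df.to_csv(file_path, index=False)

    print(f'Saved: {file_path}')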

Downloading data for COMI (COMI.CA)...
Saved: /content/drive/MyDrive/finrl-egx-multimodal/data/stocks/COMI.csv
Downloading data for AMOC (AMOC.CA)...
Saved: /content/drive/MyDrive/finrl-egx-multimodal/data/stocks/AMOC.csv
Downloading data for SWDY (SWDY.CA)...
Saved: /content/drive/MyDrive/finrl-egx-multimodal/data/stocks/SWDY.csv
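
The CSV files written above carry a second header row with the ticker symbol, because the downloaded frame has two-level columns (field, ticker). One way to avoid this is to flatten the columns before saving. The sketch below uses only pandas and a small synthetic frame standing in for the `yf.download()` result; `flatten_price_columns` is a hypothetical helper name, and the level names `Price` and `Ticker` are assumptions about how the columns are labelled.

```python
import pandas as pd


def flatten_price_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with single-level columns (hypothetical helper)."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Keep the field level ('Close', 'Volume', ...) and drop the ticker level.
        out.columns = out.columns.get_level_values(0)
    out.columns.name = None
    return out


# Synthetic stand-in for a single-ticker download (assumed layout).
dates = pd.Index(pd.to_datetime(['2020-01-02', '2020-01-05']), name='Date')
columns = pd.MultiIndex.from_product(
    [['Close', 'Volume'], ['COMI.CA']], names=['Price', 'Ticker']
)
raw = pd.DataFrame(
    [[41.428204, 570963], [40.311489, 1679811]], index=dates, columns=columns
)

flat = flatten_price_columns(raw).reset_index()
print(flat.columns.tolist())
```

Applied inside the download loop just before `reset_index`, this would produce a CSV with a single header row. It only suits single-ticker downloads, since dropping the ticker level would make column names collide when several tickers share one frame.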


In [6]:
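# Quick data quality check (COMI)
# Note: row 0 of the preview is the ticker header row left in the CSV
# by the two-level column index; it is not price data.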
sample_df = pd.read_csv(os.path.join(BASE_DIR, 'data/stocks', 'COMI.csv'))
sample_df.head()

,Date,Adj Close,Close,High,Low,Open,Volume
0,,COMI.CA,COMI.CA,COMI.CA,COMI.CA,COMI.CA,COMI.CA
1,2020-01-02,37.81723403930664,41.42820358276367,41.55283737182617,41.129085540771484,41.38832092285156,570963
2,2020-01-05,36.7978515625,40.31148910522461,40.879817962646484,40.132015228271484,41.42820358276367,1679811
3,2020-01-06,36.838809967041016,40.35635757446289,40.630550384521484,39.93260192871094,40.31148910522461,1865728
4,2020-01-08,37.77172088623047,41.37834930419922,41.727325439453125,39.94755935668945,40.35635757446289,4145684
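
Row 0 of the preview is the ticker header line, and its presence makes pandas read every price column as text. A minimal sketch of one way to load such a file cleanly is shown below: `skiprows=[1]` skips the second physical line of the file and `parse_dates` converts the `Date` column. The inline CSV text is an assumption about the on-disk layout, reconstructed from the preview, and includes only the first two data rows.

```python
import io

import pandas as pd

# Assumed on-disk layout: field header, ticker header, then the data rows.
csv_text = """Date,Adj Close,Close,High,Low,Open,Volume
,COMI.CA,COMI.CA,COMI.CA,COMI.CA,COMI.CA,COMI.CA
2020-01-02,37.81723403930664,41.42820358276367,41.55283737182617,41.129085540771484,41.38832092285156,570963
2020-01-05,36.7978515625,40.31148910522461,40.879817962646484,40.132015228271484,41.42820358276367,1679811
"""

# Naive read: the ticker row forces the price columns to a text dtype.
naive = pd.read_csv(io.StringIO(csv_text))

# Skipping line 1 (0-based) removes the ticker row, so dtypes are inferred correctly.
clean = pd.read_csv(io.StringIO(csv_text), skiprows=[1], parse_dates=['Date'])
print(clean.dtypes)
```

With a real file, the same arguments can be passed to `pd.read_csv` along with the file path in place of the `io.StringIO` object.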


In [7]:
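# Date range verification
# The ticker header row has no date, so it becomes NaT and is dropped here.
# The price columns are still stored as strings at this point.
sample_df['Date'] = pd.to_datetime(sample_df['Date'], errors='coerce')
sample_df = sample_df.dropna(subset=['Date'])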
print(sample_df['Date'].min(), sample_df['Date'].max())

2020-01-02 00:00:00 2025-12-30 00:00:00
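
The minimum and maximum dates confirm the overall span but not what lies in between. The sketch below extends the check to duplicate dates, ordering, missing values, and the largest gap between consecutive rows. `check_price_frame` is a hypothetical helper, and the sample frame reuses the first four dates and rounded `Close` values from the COMI preview.

```python
import pandas as pd


def check_price_frame(df: pd.DataFrame, date_col: str = 'Date') -> dict:
    """Summarise basic integrity properties of a price table (hypothetical helper)."""
    dates = pd.to_datetime(df[date_col])
    # Differences between consecutive dates after sorting; the first diff is NaT.
    gaps = dates.sort_values().diff().dropna()
    return {
        'rows': len(df),
        'duplicate_dates': int(dates.duplicated().sum()),
        'is_sorted': bool(dates.is_monotonic_increasing),
        'missing_values': int(df.isna().sum().sum()),
        'largest_gap_days': int(gaps.max().days) if not gaps.empty else 0,
    }


sample = pd.DataFrame({
    'Date': ['2020-01-02', '2020-01-05', '2020-01-06', '2020-01-08'],
    'Close': [41.428204, 40.311489, 40.356358, 41.378349],
})

report = check_price_frame(sample)
print(report)
```

A large `largest_gap_days` value is not necessarily an error, since weekends and exchange holidays produce gaps, but an unusually long one is worth inspecting before the data is used downstream.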
