<a href="https://colab.research.google.com/github/Milsy18/m18-model2/blob/main/notebooks_01_data_ingestion_and_alignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%bash
# ───────────────────────────────────────────────────────────────
# 1) Clone (or update) the repo into /content/m18-model2
# ───────────────────────────────────────────────────────────────
if [ ! -d "/content/m18-model2/.git" ]; then
  echo "🔄 Cloning m18-model2…"
  git clone https://github.com/Milsy18/m18-model2.git /content/m18-model2
else
  echo "🔄 Repo already exists; pulling latest changes…"
  cd /content/m18-model2
  git pull
fi

# ───────────────────────────────────────────────────────────────
# 2) Enter the project folder
# ───────────────────────────────────────────────────────────────
cd /content/m18-model2

# ───────────────────────────────────────────────────────────────
# 3) Quick sanity‐check: list everything recursively
# ───────────────────────────────────────────────────────────────
echo
echo "Project root (/content/m18-model2) contents:"
ls -R .

# ───────────────────────────────────────────────────────────────
# End of setup cell
# ───────────────────────────────────────────────────────────────


🔄 Repo already exists; pulling latest changes…
Already up to date.

Project root (/content/m18-model2) contents:
.:
data
modules
notebooks

./data:
processed
raw

./data/processed:

./data/raw:

./modules:

./notebooks:


In [11]:
%%bash
cd /content/m18-model2

# 1) Stage the ingestion notebook
git add notebooks/01_data_ingestion_and_alignment.ipynb

# 2) Commit with a clear message
git commit -m "Phase 1: data ingestion & alignment notebook"

# 3) Push to GitHub
git push origin main

echo "✅ Notebook saved to GitHub"


On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	data/raw/btc_d.csv
	data/raw/total.csv
	data/raw/total3.csv
	data/raw/usdt_d.csv

nothing added to commit but untracked files present (use "git add" to track)
✅ Notebook saved to GitHub


fatal: pathspec 'notebooks/01_data_ingestion_and_alignment.ipynb' did not match any files
fatal: could not read Username for 'https://github.com': No such device or address


In [12]:
%%bash
cd /content/m18-model2/data/raw

mv "CRYPTOCAP_BTC.D, 1D (1).csv"   btc_d.csv
mv "CRYPTOCAP_TOTAL, 1D (1).csv"   total.csv
mv "CRYPTOCAP_TOTAL3, 1D (1).csv"  total3.csv
mv "CRYPTOCAP_USDT.D, 1D.csv"      usdt_d.csv

echo "Renamed files in data/raw:"
ls -1


Renamed files in data/raw:
btc_d.csv
total3.csv
total.csv
usdt_d.csv


mv: cannot stat 'CRYPTOCAP_BTC.D, 1D (1).csv': No such file or directory
mv: cannot stat 'CRYPTOCAP_TOTAL, 1D (1).csv': No such file or directory
mv: cannot stat 'CRYPTOCAP_TOTAL3, 1D (1).csv': No such file or directory
mv: cannot stat 'CRYPTOCAP_USDT.D, 1D.csv': No such file or directory


In [13]:
# 1) Upload your raw CSVs from your local machine
from google.colab import files
import os, shutil

# this will pop up a file picker—select all 4 of your CSVs:
uploaded = files.upload()

# 2) Make sure data/raw exists
os.makedirs("data/raw", exist_ok=True)

# 3) Move each uploaded file into data/raw/
for fn in uploaded:
    shutil.move(fn, os.path.join("data/raw", fn))

# 4) Confirm
print("Files in data/raw/:", os.listdir("data/raw"))


Saving total.csv to total.csv
Saving total3.csv to total3.csv
Saving usdt_d.csv to usdt_d.csv
Saving btc_d.csv to btc_d.csv
Files in data/raw/: ['btc_d.csv', 'usdt_d.csv', 'total.csv', 'total3.csv']


In [14]:
import pandas as pd
from pathlib import Path

# 1) Point at your raw directory & define the four filenames
RAW_DIR = Path("data/raw")
csv_map = {
    "BTC_D":    "btc_d.csv",
    "TOTAL":    "total.csv",
    "TOTAL3":   "total3.csv",
    "USDT_D":   "usdt_d.csv",
}

# 2) Load each into a dict of DataFrames
market_dfs = {}
for name, fname in csv_map.items():
    path = RAW_DIR / fname
    print(f"Loading {name} from {path}…")
    df = pd.read_csv(path, index_col="time")               # use the epoch seconds column
    df.index = pd.to_datetime(df.index, unit="s")         # convert seconds → datetime
    df.index.name = "Date"                                # name the index
    # keep only the 5 columns we need
    market_dfs[name] = df[["open", "high", "low", "close", "Volume"]]


# 3) Align on the intersection of dates, dropping any rows with missing data
aligned = (
    pd.concat(market_dfs.values(), axis=1, keys=market_dfs.keys())
      .dropna()
)
print(f"\nAligned DataFrame shape: {aligned.shape}")

# 4) Write the aligned regimes out
PROC_DIR = Path("data/processed")
PROC_DIR.mkdir(parents=True, exist_ok=True)
out_path = PROC_DIR / "market_regimes_aligned.csv"
aligned.to_csv(out_path, index_label="Date")
print(f"Wrote aligned regimes to → {out_path}")


Loading BTC_D from data/raw/btc_d.csv…
Loading TOTAL from data/raw/total.csv…
Loading TOTAL3 from data/raw/total3.csv…
Loading USDT_D from data/raw/usdt_d.csv…

Aligned DataFrame shape: (1985, 20)
Wrote aligned regimes to → data/processed/market_regimes_aligned.csv


In [18]:
%%bash
cd /content/m18-model2
echo "Looking for any .ipynb in the tree…"
find . -type f -name '*.ipynb'


Looking for any .ipynb in the tree…


In [15]:
%%bash
cd /content/m18-model2

# Move the ingestion notebook into the notebooks/ directory
git mv 01_data_ingestion_and_alignment.ipynb notebooks/

# Stage, commit & push
git add notebooks/01_data_ingestion_and_alignment.ipynb
git commit -m "Move ingestion notebook into notebooks/ folder"
git push origin main

echo "✅ Notebook is now in notebooks/ on GitHub"


On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	data/raw/btc_d.csv
	data/raw/total.csv
	data/raw/total3.csv
	data/raw/usdt_d.csv

nothing added to commit but untracked files present (use "git add" to track)
✅ Notebook is now in notebooks/ on GitHub


fatal: bad source, source=01_data_ingestion_and_alignment.ipynb, destination=notebooks/01_data_ingestion_and_alignment.ipynb
fatal: pathspec 'notebooks/01_data_ingestion_and_alignment.ipynb' did not match any files
fatal: could not read Username for 'https://github.com': No such device or address


In [20]:
%%bash
cd /content/m18-model2

# Move the file into notebooks/ and strip the extra “notebooks_” prefix
git mv notebooks_01_data_ingestion_and_alignment.ipynb notebooks/01_data_ingestion_and_alignment.ipynb


In [21]:
%%bash
cd /content/m18-model2
git commit -m "Fix: place ingestion notebook in notebooks/ folder"


[main c8128bc] Fix: place ingestion notebook in notebooks/ folder
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename notebooks_01_data_ingestion_and_alignment.ipynb => notebooks/01_data_ingestion_and_alignment.ipynb (100%)
