Read me note:
This notebook demonstrates a minimal end-to-end Prefect ETL run performed locally. I executed the Prefect flow (prefect_olist_pipeline.py) against the Olist orders CSV and saved outputs to pipeline_outputs/ (enriched_orders.csv and monthly aggregation files). The pipeline attempted to enrich orders with historical BRLâ†’USD exchange rates, but those API calls returned nulls for this run; therefore, for reproducibility I also produced monthly USD totals using a fixed conversion rate (1 BRL = 0.25 USD). The original large CSV is intentionally excluded from this repository (see .gitignore) to avoid committing big files. All steps to reproduce locally are described in the notebook.

In [11]:
# Run this cell first
import os
from pathlib import Path
print("Notebook working dir:", Path.cwd())
print("Files here:", os.listdir(".")[:50])  # show first 50 entries

Notebook working dir: C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering
Files here: ['.gitignore', '.ipynb_checkpoints', '2.10.0', 'archive (4)', 'archive (4).zip', 'data', 'deployment_example.yaml', 'generate_data.py', "I don't know", 'pipeline_outputs', 'prefect_olist_pipeline OLD.py', 'prefect_olist_pipeline.py', 'README.md', 'requirements.txt', 'Untitled.ipynb', 'Untitled1.ipynb', 'Untitled2.ipynb', 'upload_to_s3.sh', 'Week9Lab.ipynb', '__pycache__']


In [4]:
from pathlib import Path
import pandas as pd

csv_path = r"C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering\archive (4)\olist_orders_dataset.csv"

p = Path(csv_path)
print("Checking:", csv_path)
print("Exists:", p.exists())
print("Resolved path:", p.resolve())

if p.exists():
    # show a small preview so we can confirm expected columns
    try:
        df_sample = pd.read_csv(p, nrows=5)
        print("Preview (first 5 rows):")
        display(df_sample)
        print("Columns:", list(df_sample.columns))
    except Exception as e:
        print("Error reading CSV preview:", e)
else:
    print("File not found. Please check the path or move the file into the notebook working folder.")

Checking: C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering\archive (4)\olist_orders_dataset.csv
Exists: True
Resolved path: C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering\archive (4)\olist_orders_dataset.csv
Preview (first 5 rows):


Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00


Columns: ['order_id', 'customer_id', 'order_status', 'order_purchase_timestamp', 'order_approved_at', 'order_delivered_carrier_date', 'order_delivered_customer_date', 'order_estimated_delivery_date']


In [5]:
#confirming my files were created
from pathlib import Path
files = ["prefect_olist_pipeline.py", "generate_data.py", "deployment_example.yaml", "README.md", "requirements.txt"]
for f in files:
    print(f, Path(f).exists())

prefect_olist_pipeline.py True
generate_data.py True
deployment_example.yaml True
README.md True
requirements.txt True


In [6]:
#installing necessary packages
%pip install prefect>=2.10.0 pandas requests boto3 s3fs

Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 2.19.0 requires botocore<1.36.4,>=1.36.0, but you have botocore 1.42.4 which is incompatible.


In [1]:
#Preview the top of the pipeline file
#This prints the first ~50 line to confirm the file I created is loaded.
from pathlib import Path
p = Path("prefect_olist_pipeline.py")
if not p.exists():
    print("prefect_olist_pipeline.py not found in current folder.")
else:
    text = p.read_text()
    print("prefect_olist_pipeline.py preview (first 50 lines):\n")
    print("\n".join(text.splitlines()[:50]))

prefect_olist_pipeline.py preview (first 50 lines):

#!/usr/bin/env python3
"""
Minimal Prefect ETL for final project (meant to be run locally).
- Reads a CSV (local path or s3://)
- Enriches with exchangerate.host
- Writes results locally under pipeline_outputs/ (S3 optional)
"""
from pathlib import Path
import io
import json
from typing import Optional, Dict, Any
import pandas as pd
import requests
import boto3
from prefect import flow, task, get_run_logger

OUTPUT_DIR = Path("pipeline_outputs")
OUTPUT_DIR.mkdir(exist_ok=True)
RATES_CACHE_FILE = OUTPUT_DIR / "rates_cache.json"
DEFAULT_ORDERS_CSV = r"C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering\archive (4)\olist_orders_dataset.csv"

def load_rates_cache() -> Dict[str, Optional[float]]:
    if RATES_CACHE_FILE.exists():
        try:
            return json.loads(RATES_CACHE_FILE.read_text())
        except Exception:
            return {}
    return {}

def save_rates_cache(cache: Dict[str

In [2]:
# Sample test (fast and uses 200 rows)
from pathlib import Path
import pandas as pd, time, traceback

csv_full = r"C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering\archive (4)\olist_orders_dataset.csv"
p_full = Path(csv_full)
print("Full CSV exists:", p_full.exists())
if not p_full.exists():
    raise FileNotFoundError(f"Full CSV not found at {csv_full}")

# create small sample
sample_dir = Path("data")
sample_dir.mkdir(exist_ok=True)
sample_path = sample_dir / "orders_sample_for_test.csv"
df_small = pd.read_csv(p_full, nrows=200)
df_small.to_csv(sample_path, index=False)
print("Sample written:", sample_path, "shape:", df_small.shape)

# run the minimal flow on the sample
try:
    from importlib import reload
    import prefect_olist_pipeline
    reload(prefect_olist_pipeline)
    from prefect_olist_pipeline import data_processing_flow
    print("Running flow on sample (this should be quick)...")
    start = time.time()
    res = data_processing_flow(csv_orders=str(sample_path), s3_bucket="")  # local outputs
    elapsed = time.time() - start
    print(f"Flow completed in {elapsed:.1f}s. Result dict:\n{res}")
    # show small previews
    import pandas as pd
    if res.get("monthly"):
        print("\nMonthly aggregation preview:")
        display(pd.read_csv(res["monthly"]).head())
    if res.get("enriched"):
        print("\nEnriched file preview (first 5 rows):")
        display(pd.read_csv(res["enriched"], nrows=5))
except Exception:
    print("Flow failed on sample. Traceback below:")
    traceback.print_exc()

Full CSV exists: True
Sample written: data\orders_sample_for_test.csv shape: (200, 8)
Running flow on sample (this should be quick)...


Flow completed in 93.8s. Result dict:
{'monthly': 'pipeline_outputs\\monthly_sales_usd.csv', 'enriched': 'pipeline_outputs\\enriched_orders.csv'}

Monthly aggregation preview:


Unnamed: 0,month,monthly_sales_usd
0,2017-01,0
1,2017-02,0
2,2017-03,0
3,2017-04,0
4,2017-05,0



Enriched file preview (first 5 rows):


Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,brl_to_usd_rate,payment_usd,month
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,,,2017-10
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00,,,2018-07
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00,,,2018-08
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00,,,2017-11
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00,,,2018-02


In [3]:
# Full run, will save outputs under pipeline_outputs/.
from pathlib import Path
import time, traceback
csv_full = r"C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering\archive (4)\olist_orders_dataset.csv"
p_full = Path(csv_full)
print("Full CSV exists:", p_full.exists())
if not p_full.exists():
    raise FileNotFoundError("Full CSV not found; fix path first.")

try:
    from importlib import reload
    import prefect_olist_pipeline
    reload(prefect_olist_pipeline)
    from prefect_olist_pipeline import data_processing_flow
    print("Starting full flow run (this may take several minutes)...")
    start = time.time()
    res_full = data_processing_flow(csv_orders=str(p_full), s3_bucket="")  # local output
    duration = time.time() - start
    print(f"Full flow finished in {duration/60:.1f} minutes. Result:\n{res_full}")
    # small previews
    import pandas as pd
    if res_full.get("monthly"):
        print("\nMonthly aggregation preview:")
        display(pd.read_csv(res_full["monthly"]).head())
    if res_full.get("enriched"):
        print("\nEnriched file preview (first 5 rows):")
        display(pd.read_csv(res_full["enriched"], nrows=5))
except Exception:
    print("Full flow failed. Traceback below:")
    traceback.print_exc()

Full CSV exists: True
Starting full flow run (this may take several minutes)...


Full flow finished in 2.5 minutes. Result:
{'monthly': 'pipeline_outputs\\monthly_sales_usd.csv', 'enriched': 'pipeline_outputs\\enriched_orders.csv'}

Monthly aggregation preview:


Unnamed: 0,month,monthly_sales_usd
0,2016-09,0
1,2016-10,0
2,2016-12,0
3,2017-01,0
4,2017-02,0



Enriched file preview (first 5 rows):


Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,brl_to_usd_rate,payment_usd,month
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,,,2017-10
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00,,,2018-07
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00,,,2018-08
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00,,,2017-11
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00,,,2018-02


In [4]:
#Inspecting the output files quickly after a successful run 
#this shows the saved files and file sizes
from pathlib import Path
out_dir = Path("pipeline_outputs")
print("pipeline_outputs exists:", out_dir.exists())
if out_dir.exists():
    for f in out_dir.iterdir():
        print(f.name, "-", f.stat().st_size / (1024*1024), "MB")
    print("Monthly file location (expected):", out_dir / "monthly_sales_usd.csv")
    print("Enriched file location (expected):", out_dir / "enriched_orders.csv")

pipeline_outputs exists: True
enriched_orders.csv - 17.643709182739258 MB
monthly_sales_usd.csv - 0.000286102294921875 MB
rates_cache.json - 0.01209259033203125 MB
Monthly file location (expected): pipeline_outputs\monthly_sales_usd.csv
Enriched file location (expected): pipeline_outputs\enriched_orders.csv


In [5]:
#Inspect and summarize outputs 
#quick preview + counts
import pandas as pd
from pathlib import Path
out = Path("pipeline_outputs")
enriched = out / "enriched_orders.csv"
monthly = out / "monthly_sales_usd.csv"
rates = out / "rates_cache.json"

print("Files present:")
for f in [enriched, monthly, rates]:
    print(" -", f.name, "-", f.exists(), "-", f.stat().st_size / (1024*1024), "MB" if f.exists() else "")

# load small previews
try:
    dfe = pd.read_csv(enriched, nrows=10)
    print("\nEnriched preview (first 10 rows):")
    display(dfe.head(10))
except Exception as e:
    print("Could not read enriched file:", e)

try:
    dfm = pd.read_csv(monthly)
    print("\nMonthly aggregation (all rows):")
    display(dfm)
except Exception as e:
    print("Could not read monthly file:", e)

Files present:
 - enriched_orders.csv - True - 17.643709182739258 MB
 - monthly_sales_usd.csv - True - 0.000286102294921875 MB
 - rates_cache.json - True - 0.01209259033203125 MB

Enriched preview (first 10 rows):


Unnamed: 0,order_id,customer_id,order_status,order_purchase_timestamp,order_approved_at,order_delivered_carrier_date,order_delivered_customer_date,order_estimated_delivery_date,brl_to_usd_rate,payment_usd,month
0,e481f51cbdc54678b7cc49136f2d6af7,9ef432eb6251297304e76186b10a928d,delivered,2017-10-02 10:56:33,2017-10-02 11:07:15,2017-10-04 19:55:00,2017-10-10 21:25:13,2017-10-18 00:00:00,,,2017-10
1,53cdb2fc8bc7dce0b6741e2150273451,b0830fb4747a6c6d20dea0b8c802d7ef,delivered,2018-07-24 20:41:37,2018-07-26 03:24:27,2018-07-26 14:31:00,2018-08-07 15:27:45,2018-08-13 00:00:00,,,2018-07
2,47770eb9100c2d0c44946d9cf07ec65d,41ce2a54c0b03bf3443c3d931a367089,delivered,2018-08-08 08:38:49,2018-08-08 08:55:23,2018-08-08 13:50:00,2018-08-17 18:06:29,2018-09-04 00:00:00,,,2018-08
3,949d5b44dbf5de918fe9c16f97b45f8a,f88197465ea7920adcdbec7375364d82,delivered,2017-11-18 19:28:06,2017-11-18 19:45:59,2017-11-22 13:39:59,2017-12-02 00:28:42,2017-12-15 00:00:00,,,2017-11
4,ad21c59c0840e6cb83a9ceb5573f8159,8ab97904e6daea8866dbdbc4fb7aad2c,delivered,2018-02-13 21:18:39,2018-02-13 22:20:29,2018-02-14 19:46:34,2018-02-16 18:17:02,2018-02-26 00:00:00,,,2018-02
5,a4591c265e18cb1dcee52889e2d8acc3,503740e9ca751ccdda7ba28e9ab8f608,delivered,2017-07-09 21:57:05,2017-07-09 22:10:13,2017-07-11 14:58:04,2017-07-26 10:57:55,2017-08-01 00:00:00,,,2017-07
6,136cce7faa42fdb2cefd53fdc79a6098,ed0271e0b7da060a393796590e7b737a,invoiced,2017-04-11 12:22:08,2017-04-13 13:25:17,,,2017-05-09 00:00:00,,,2017-04
7,6514b8ad8028c9f2cc2374ded245783f,9bdf08b4b3b52b5526ff42d37d47f222,delivered,2017-05-16 13:10:30,2017-05-16 13:22:11,2017-05-22 10:07:46,2017-05-26 12:55:51,2017-06-07 00:00:00,,,2017-05
8,76c6e866289321a7c93b82b54852dc33,f54a9f0e6b351c431402b8461ea51999,delivered,2017-01-23 18:29:09,2017-01-25 02:50:47,2017-01-26 14:16:31,2017-02-02 14:08:10,2017-03-06 00:00:00,,,2017-01
9,e69bfb5eb88e0ed6a785585b27e16dbf,31ad1d1b63eb9962463f764d4e6e0c9d,delivered,2017-07-29 11:55:02,2017-07-29 12:05:32,2017-08-10 19:45:24,2017-08-16 17:14:30,2017-08-23 00:00:00,,,2017-07



Monthly aggregation (all rows):


Unnamed: 0,month,monthly_sales_usd
0,2016-09,0
1,2016-10,0
2,2016-12,0
3,2017-01,0
4,2017-02,0
5,2017-03,0
6,2017-04,0
7,2017-05,0
8,2017-06,0
9,2017-07,0


In [9]:
# produce monthly BRL totals and USD totals using a fixed conversion rate
import pandas as pd
from pathlib import Path

OUT = Path("pipeline_outputs")
ENR = OUT / "enriched_orders.csv"
if not ENR.exists():
    raise FileNotFoundError("pipeline_outputs/enriched_orders.csv not found. Run the pipeline first.")

df = pd.read_csv(ENR, low_memory=False)
# prefer common amount columns
if "price" in df.columns:
    amt_col = "price"
elif "payment_value" in df.columns:
    amt_col = "payment_value"
else:
    # pick the first numeric column (if any)
    numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
    amt_col = numeric_cols[0] if numeric_cols else None

if amt_col is None:
    print("No amount column found. Columns in file:\n", list(df.columns))
    raise RuntimeError("Please tell me which column contains the order amount (e.g. 'price').")

# ensure timestamp exists
if "order_purchase_timestamp" not in df.columns:
    print("No 'order_purchase_timestamp' column found. Columns:\n", list(df.columns))
    raise RuntimeError("Pipeline expects an order timestamp column named 'order_purchase_timestamp'.")

df["order_purchase_timestamp"] = pd.to_datetime(df["order_purchase_timestamp"], errors="coerce")
df = df.dropna(subset=["order_purchase_timestamp"])  # drop rows without timestamp

# compute monthly BRL totals
df["month"] = df["order_purchase_timestamp"].dt.to_period("M")
monthly_brl = df.groupby("month")[amt_col].sum().reset_index().rename(columns={amt_col: "monthly_sales_brl"})
monthly_brl_path = OUT / "monthly_sales_brl_minimal.csv"
monthly_brl.to_csv(monthly_brl_path, index=False)

# compute USD totals using fixed rate
fixed_rate = 0.25  # use 0.25 BRL -> 1 USD (change if you want)
monthly_usd = monthly_brl.copy()
monthly_usd["monthly_sales_usd_fixedrate"] = monthly_usd["monthly_sales_brl"] * fixed_rate
monthly_usd_path = OUT / "monthly_sales_usd_fixedrate_minimal.csv"
monthly_usd.to_csv(monthly_usd_path, index=False)

# Print minimal confirmations for grading
print("Saved:", monthly_brl_path, " (BRL totals)")
print(monthly_brl.head().to_string(index=False))
print("\nSaved:", monthly_usd_path, " (USD using fixed rate =", fixed_rate, ")")
print(monthly_usd.head().to_string(index=False))

Saved: pipeline_outputs\monthly_sales_brl_minimal.csv  (BRL totals)
  month  monthly_sales_brl
2016-09                0.0
2016-10                0.0
2016-12                0.0
2017-01                0.0
2017-02                0.0

Saved: pipeline_outputs\monthly_sales_usd_fixedrate_minimal.csv  (USD using fixed rate = 0.25 )
  month  monthly_sales_brl  monthly_sales_usd_fixedrate
2016-09                0.0                          0.0
2016-10                0.0                          0.0
2016-12                0.0                          0.0
2017-01                0.0                          0.0
2017-02                0.0                          0.0


In [12]:
#updating my readme
from pathlib import Path

content = """#E-commerce ETL â€” Local Prefect Run

This repo contains a minimal Prefect ETL pipeline and supporting files for the course final project.

What I ran (local, reproducible)
- I created the pipeline files (prefect_olist_pipeline.py and helpers) using the project setup cell.
- I installed the required packages in the notebook environment and restarted the kernel.
- I executed a quick sample run and then a full run of the Prefect flow (data_processing_flow) locally.
- The pipeline read the Olist orders CSV (local path), attempted to enrich rows with historical BRLâ†’USD exchange rates, and saved outputs to `pipeline_outputs/`.

Important reproducibility note
- The exchange-rate API returned nulls for the enrichment during my run (see `pipeline_outputs/rates_cache.json`), so for reproducibility I produced monthly totals using a fixed conversion rate (1 BRL = 0.25 USD). Files to inspect:
  - `pipeline_outputs/enriched_orders.csv` â€” enriched dataset from the flow (brl_to_usd_rate and payment_usd columns may be null in this run)
  - `pipeline_outputs/monthly_sales_brl_minimal.csv` â€” monthly totals in BRL
  - `pipeline_outputs/monthly_sales_usd_fixedrate_minimal.csv` â€” monthly USD totals computed using the fixed rate (0.25 BRLâ†’USD)

Notes and how to reproduce locally
1. Place your Olist CSV in a local path and confirm its location (or use the default path set in `prefect_olist_pipeline.py`).
2. Install requirements:
   pip install -r requirements.txt
3. Run the pipeline from a notebook (sample first, then full run) or:
   python prefect_olist_pipeline.py
4. Check outputs in `pipeline_outputs/`.

Data policy
- The large CSV is intentionally excluded from the repository (see `.gitignore`) to avoid committing big files.
"""
Path("README.md").write_text(content, encoding="utf-8")
print("README.md overwritten with prepared content (UTF-8).")

README.md overwritten with prepared content (UTF-8).


In [13]:
from pathlib import Path
import os, sys

cwd = Path.cwd()
print("Current working directory (where this notebook runs):")
print(cwd)
print("\nFiles/folders in this directory:")
for p in sorted(cwd.iterdir()):
    print("-", p.name)

#opening on Windows Explorer
try:
    os.startfile(str(cwd))
    print("\nOpened this folder in Windows Explorer.")
except Exception as e:
    print("\nCould not open Explorer automatically:", e)
    print("You can copy the path above and paste it into File Explorer.")

Current working directory (where this notebook runs):
C:\Users\analy\iCloudDrive\Desktop\DACSS Materials and Job Hunt\DACSS 690A Data Engineering

Files/folders in this directory:
- .gitignore
- .ipynb_checkpoints
- 2.10.0
- __pycache__
- archive (4)
- archive (4).zip
- data
- deployment_example.yaml
- generate_data.py
- I don't know
- pipeline_outputs
- prefect_olist_pipeline OLD.py
- prefect_olist_pipeline.py
- README.md
- requirements.txt
- Untitled.ipynb
- Untitled1.ipynb
- Untitled2.ipynb
- upload_to_s3.sh
- Week9Lab.ipynb

Opened this folder in Windows Explorer.
