## 2026 EY AI & Data Challenge - Landsat Data Extraction Notebook

This notebook demonstrates how to extract Landsat data and create the output file used by the benchmark notebook. The baseline data is [Landsat Collection 2 Level 2](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2) data from the Microsoft Planetary Computer catalog.

**Caution:** This notebook takes a long time to run because it extracts Landsat data for 9,319 data points (unique locations and times). On a typical laptop with a typical internet connection, a full run takes about 7 hours. Optimizing the extraction process or using cloud computing services would likely reduce this.


### Load In Dependencies
The following code installs the required Python libraries (listed in the requirements.txt file) in the Snowflake environment so that the rest of the notebook can run. After running this code for the first time, you must restart the kernel so the Python libraries become available in the environment. To do so, open the “Connected” menu above the notebook (next to “Run all”) and select the “restart kernel” link. Subsequent runs of the notebook do not require a restart.

In [17]:
# !pip install uv
# !uv pip install  -r ../requirements.txt

In [18]:
import time
# import snowflake
# from snowflake.snowpark.context import get_active_session
# session = get_active_session()

import warnings
warnings.filterwarnings("ignore")

# Data manipulation and analysis
import numpy as np
import pandas as pd

# Planetary Computer tools for STAC API access and authentication
import pystac_client
import planetary_computer as pc
from odc.stac import stac_load
from pystac.extensions.eo import EOExtension as eo

from datetime import date
from tqdm import tqdm
import os

### Extracting Landsat Data Using API Calls

The API-based method allows us to efficiently access **Landsat** data for specific coordinates and time periods, ensuring scalability and reproducibility of the process.

Through the API, we can query individual bands or compute indices like **NDMI** on the fly. This approach reduces storage requirements and simplifies data preprocessing, making it ideal for large-scale environmental and water quality analysis.

The **compute_Landsat_values** function extracts Landsat Collection 2 Level-2 band values for each sampling location, using a bounding box about 100 m across centered on the point. For each location:

- A bounding box (bbox) is created around the latitude and longitude coordinates.
- The Microsoft Planetary Computer API is queried for `landsat-c2-l2` scenes that intersect the bbox, fall within a fixed 2011-2015 search window, and have less than 10% cloud cover.
- The scene closest in time to the sample date is selected, and every band listed in `bands_of_interest` that the scene provides is loaded (including **green**, **nir08**, **swir16**, and **swir22**).
- The median of the pixels within the bounding box is computed for each band to reduce the effect of noise or outliers. A median of exactly 0 is treated as missing and replaced with NaN.
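
The scene-selection step can be illustrated without calling the API. The sketch below is a minimal stand-in, not the notebook's implementation: the `pick_nearest_scene` helper and the `scenes` list with its `id` and `datetime` values are hypothetical, and plain dictionaries take the place of STAC items.

```python
import pandas as pd

def pick_nearest_scene(scenes, sample_date):
    """Return the scene whose 'datetime' is closest to a day-first sample date."""
    target = pd.to_datetime(sample_date, dayfirst=True)
    # Compare everything in UTC, as the extraction function does
    target = target.tz_localize("UTC") if target.tzinfo is None else target.tz_convert("UTC")
    return min(
        scenes,
        key=lambda s: abs(pd.to_datetime(s["datetime"], utc=True) - target),
    )

# Hypothetical acquisition timestamps standing in for STAC item properties
scenes = [
    {"id": "scene_a", "datetime": "2010-12-20T08:30:00Z"},
    {"id": "scene_b", "datetime": "2011-01-05T08:30:00Z"},
    {"id": "scene_c", "datetime": "2011-02-06T08:30:00Z"},
]

# "02-01-2011" is parsed day-first, i.e. 2 January 2011
nearest = pick_nearest_scene(scenes, "02-01-2011")
print(nearest["id"])
```

Because the search window is fixed rather than centered on the sample date, the selected scene can be weeks or months away from the sampling time; storing the time difference alongside the band values is one way to keep that visible.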

**Why the buffer value is 0.00089831**

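We want a box roughly 100 m across around each point.  
At the equator, 1 degree ≈ 111.32 km.

Therefore, the degree equivalent of 100 m is:

*buffer_deg ≈ 100 m / 111,320 m per degree ≈ 0.00089831*

The code applies half of this value on each side of the point, so the bounding box is about 100 m on a side, or roughly three 30 m Landsat pixels across. The conversion is exact only for latitude; a degree of longitude shrinks with the cosine of the latitude, so the box is somewhat narrower east-west away from the equator.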
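
The arithmetic above can be checked with a short sketch. The constant `METERS_PER_DEGREE`, the helpers `meters_to_degrees` and `make_bbox`, and the `correct_longitude` flag are illustrative assumptions and do not appear in the notebook; the longitude correction is an optional refinement that the notebook's function does not apply.

```python
import math

# Assumed length of one degree at the equator, in metres
METERS_PER_DEGREE = 111_320.0

def meters_to_degrees(meters):
    """Convert a ground distance to degrees using the equatorial approximation."""
    return meters / METERS_PER_DEGREE

def make_bbox(lat, lon, width_m=100.0, correct_longitude=False):
    """Build [min_lon, min_lat, max_lon, max_lat] for a box about width_m across."""
    half_lat = meters_to_degrees(width_m) / 2
    half_lon = half_lat
    if correct_longitude:
        # A degree of longitude is shorter away from the equator
        half_lon = half_lat / math.cos(math.radians(lat))
    return [lon - half_lon, lat - half_lat, lon + half_lon, lat + half_lat]

buffer_deg = meters_to_degrees(100.0)
print(round(buffer_deg, 8))

# First sampling location from the training data
bbox_plain = make_bbox(-28.760833, 17.730278)
bbox_corrected = make_bbox(-28.760833, 17.730278, correct_longitude=True)
```

With the correction switched on, the box is wider in degrees of longitude, which keeps its east-west extent close to 100 m at this latitude.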


In [32]:
# Setup
tqdm.pandas()

catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=pc.sign_inplace,
    )

bands_of_interest = ['qa', 'red', 'blue', 'drad', 'emis', 'emsd', 'lwir', 'trad', 'urad', 'atran', 'cdist', 'green', 'nir08', 'lwir', 'swir16', 'swir22', 'cloud_qa', 'qa_pixel', 'qa_radsat', 'atmos_opacity']

def compute_Landsat_values(row):
    lat = row['Latitude']
    lon = row['Longitude']
    date = pd.to_datetime(row['Sample Date'], dayfirst=True, errors='coerce')


    # Buffer size for ~100m 
    bbox_size = 0.00089831  
    bbox = [
        lon - bbox_size / 2,
        lat - bbox_size / 2,
        lon + bbox_size / 2,
        lat + bbox_size / 2
    ]

    # Wider search range, we'll filter to nearest date later
    search = catalog.search(
        collections=["landsat-c2-l2"],
        bbox=bbox,
        datetime="2011-01-01/2015-12-31",
        query={"eo:cloud_cover": {"lt": 10}},
    )
    
    items = search.item_collection()

    if not items:
        # No scene matched the search: return a row of NaNs, one per band
        print("No items found")
        return pd.Series([np.nan] * len(bands_of_interest), index=bands_of_interest)

    try:
        # Convert sample date to UTC
        sample_date_utc = date.tz_localize("UTC") if date.tzinfo is None else date.tz_convert("UTC")

        # Pick the item closest to the sample date
        items = sorted(
            items,
            key=lambda x: abs(pd.to_datetime(x.properties["datetime"]).tz_convert("UTC") - sample_date_utc)
        )
        selected_item = pc.sign(items[0])

        # max_key_length = len(max(selected_item.assets, key=len))
        # for key, asset in selected_item.assets.items():
        #     print(f"{key.rjust(max_key_length)}: {asset.title}")

        # Load required bands
        # bands_of_interest = ["green", "nir08", "swir16", "swir22", "red", "blue"]
        bands_to_extract = [band for band in bands_of_interest if band in selected_item.assets.keys()]
        data = stac_load([selected_item], bands=bands_to_extract, bbox=bbox).isel(time=0)


        medians = []
        for band_name in bands_of_interest:
            if band_name in selected_item.assets.keys():
                band = data[band_name].astype("float")
                # Compute medians
                median_band = float(band.median(skipna=True).values)
                # Replace 0 with NaN
                median_band = median_band if median_band != 0 else np.nan
                medians.append(median_band)
            else:
                medians.append(np.nan)
        # print(pd.Series(medians, index=bands_of_interest))
        return pd.Series(medians, index=bands_of_interest)


    except Exception as e:
        # On any failure (e.g. a request error or an unparsable date), log it and return NaNs
        print(e)
        return pd.Series([np.nan] * len(bands_of_interest), index=bands_of_interest)

### Extracting features for the training dataset

In [20]:
Water_Quality_df=pd.read_csv('../water_quality_training_dataset.csv')
display(Water_Quality_df.head())

Unnamed: 0,Latitude,Longitude,Sample Date,Total Alkalinity,Electrical Conductance,Dissolved Reactive Phosphorus
0,-28.760833,17.730278,02-01-2011,128.912,555.0,10.0
1,-26.861111,28.884722,03-01-2011,74.72,162.9,163.0
2,-26.45,28.085833,03-01-2011,89.254,573.0,80.0
3,-27.671111,27.236944,03-01-2011,82.0,203.6,101.0
4,-27.356667,27.286389,03-01-2011,56.1,145.1,151.0


In [21]:
# Water_Quality_df.shape

In [22]:
# Water_Quality_df_200 = Water_Quality_df.loc[0:9318]
# Water_Quality_df_200.shape

In [23]:
# batch_size = 100
# for batch_min in range(0,len(Water_Quality_df),batch_size):
#     batch_max = min(batch_min + batch_size - 1, len(Water_Quality_df) - 1)
#     print(f"Processing {batch_min} --> {batch_max}")
#     # Extract band values from Landsat for training dataset
#     print("🚀 Running Landsat feature extraction for training data...")
#     landsat_train_features = Water_Quality_df.loc[batch_min:batch_max].progress_apply(compute_Landsat_values, axis=1)
#
#     train_features_path = f"all_bands_extraction/landsat_features_training_{batch_min}_{batch_max}.csv"
#     landsat_train_features.to_csv(train_features_path, index=False)

In [24]:
import pandas as pd
from pathlib import Path

directory = Path("all_bands_extraction")

def extract_end_index(path: Path) -> int:
    # Filename without extension → split on "_" → take the last piece as an int
    return int(path.stem.split("_")[-1])

# Collect all CSV files and sort them by end index
csv_files = sorted(
    [f for f in directory.iterdir() if f.suffix.lower() == ".csv"],
    key=extract_end_index
)

all_rows = []

for filepath in csv_files:
    print(f"Loading: {filepath}")
    df = pd.read_csv(filepath)
    all_rows.append(df)

combined_df = pd.concat(all_rows, ignore_index=True)

Loading: all_bands_extraction\landsat_features_training_0_99.csv
Loading: all_bands_extraction\landsat_features_training_100_199.csv
Loading: all_bands_extraction\landsat_features_training_200_299.csv
Loading: all_bands_extraction\landsat_features_training_300_399.csv
Loading: all_bands_extraction\landsat_features_training_400_499.csv
Loading: all_bands_extraction\landsat_features_training_500_599.csv
Loading: all_bands_extraction\landsat_features_training_600_699.csv
Loading: all_bands_extraction\landsat_features_training_700_799.csv
Loading: all_bands_extraction\landsat_features_training_800_899.csv
Loading: all_bands_extraction\landsat_features_training_900_999.csv
Loading: all_bands_extraction\landsat_features_training_1000_1099.csv
Loading: all_bands_extraction\landsat_features_training_1100_1199.csv
Loading: all_bands_extraction\landsat_features_training_1200_1299.csv
Loading: all_bands_extraction\landsat_features_training_1300_1399.csv
Loading: all_bands_extraction\landsat_featu

In [25]:
len(combined_df)

9319

In [26]:
landsat_train_features_combined = pd.concat([Water_Quality_df, combined_df], axis=1)
landsat_train_features_combined.to_csv("landsat_features_training_all_bands.csv")

In [35]:
landsat_train_features_combined_filling_nas = landsat_train_features_combined[landsat_train_features_combined[bands_of_interest].isna().all(axis=1)].progress_apply(compute_Landsat_values, axis=1)

100%|██████████| 3163/3163 [10:56:52<00:00, 12.46s/it]


In [37]:
landsat_train_features_combined_filling_nas.to_csv("landsat_features_training_all_bands_nas.csv")

In [40]:
landsat_train_features_combined.update(landsat_train_features_combined_filling_nas)

In [45]:
landsat_train_features_combined.to_csv("landsat_features_training_all_bands.csv")

In [38]:
landsat_train_features_combined_filling_nas

Unnamed: 0,qa,red,blue,drad,emis,emsd,lwir,trad,urad,atran,cdist,green,nir08,lwir.1,swir16,swir22,cloud_qa,qa_pixel,qa_radsat,atmos_opacity
508,349.0,9254.5,8333.5,541.0,9904.0,,,8802.5,1057.0,8624.0,67.5,9372.5,8642.5,,8140.0,7985.0,,21952.0,,
509,349.0,9254.5,8333.5,541.0,9904.0,,,8802.5,1057.0,8624.0,67.5,9372.5,8642.5,,8140.0,7985.0,,21952.0,,
755,655.0,12142.0,9431.0,1160.0,9764.0,82.0,,8616.0,2488.0,6770.0,,11877.0,18287.0,,18208.0,15101.0,,22018.0,,
756,655.0,12142.0,9431.0,1160.0,9764.0,82.0,,8616.0,2488.0,6770.0,,11877.0,18287.0,,18208.0,15101.0,,22018.0,,
757,655.0,12142.0,9431.0,1160.0,9764.0,82.0,,8616.0,2488.0,6770.0,,11877.0,18287.0,,18208.0,15101.0,,22018.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9310,192.0,8946.5,8152.0,803.0,9833.0,24.5,,9873.0,1571.5,7914.5,1156.5,8991.5,13016.0,,10373.0,8802.5,,21824.0,,
9311,252.0,10282.5,8865.0,1212.0,9621.0,176.0,,11084.5,2516.0,6828.0,1048.0,9917.0,16178.0,,15580.5,13168.5,,21824.0,,
9313,209.5,10152.0,9109.0,1136.5,9610.0,106.0,,10730.5,2337.0,7019.0,1859.0,9750.5,14352.0,,15688.0,13176.0,,21824.0,,
9314,155.0,10596.5,8983.0,385.5,9700.0,63.0,,10747.5,698.5,9048.5,3531.5,10043.0,15296.5,,16381.0,14443.0,,21824.0,,


In [29]:
landsat_train_features_combined[landsat_train_features_combined[bands_of_interest].isna().all(axis=1)]

Unnamed: 0,Latitude,Longitude,Sample Date,Total Alkalinity,Electrical Conductance,Dissolved Reactive Phosphorus,qa,red,blue,drad,...,cdist,green,nir08,lwir.1,swir16,swir22,cloud_qa,qa_pixel,qa_radsat,atmos_opacity
508,-22.225556,29.990556,30-07-2015,86.300,1216.00,50.0,,,,,...,,,,,,,,,,
509,-22.225556,29.990556,03-08-2015,94.200,1243.00,50.0,,,,,...,,,,,,,,,,
755,-34.065833,20.404167,21-01-2014,111.736,318.52,10.0,,,,,...,,,,,,,,,,
756,-34.065833,20.404167,11-02-2014,124.722,822.00,10.0,,,,,...,,,,,,,,,,
757,-34.065833,20.404167,11-03-2014,106.098,822.00,20.0,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9310,-27.273889,28.490000,22-12-2015,46.300,113.00,30.0,,,,,...,,,,,,,,,,
9311,-25.459639,28.264306,22-12-2015,206.745,682.00,168.0,,,,,...,,,,,,,,,,
9313,-25.810483,27.909552,23-12-2015,102.736,505.00,100.0,,,,,...,,,,,,,,,,
9314,-27.527500,30.858056,23-12-2015,38.900,134.00,20.0,,,,,...,,,,,,,,,,


### Note

The Landsat data extraction process for all 9,319 locations typically requires more than 7 hours when executed in a single run. During long executions, you may occasionally encounter API limits, timeout errors, or request failures. To avoid these interruptions, we recommend running the extraction in smaller batches.

In this notebook, we provide a sample code snippet demonstrating how to extract data for the first 200 locations. Participants are encouraged to follow the same batching approach to extract data for all 9,319 locations safely and efficiently.

We have already executed the full extraction for all 9,319 locations and saved the output to **landsat_features_training.csv**, which will be used in the benchmark notebook.  
Similarly, participants can extract Landsat features in batches, combine the batch outputs, and save the final merged dataset as **landsat_features_training.csv** to ensure the benchmark notebook runs smoothly.
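
The batching approach described in this note could be sketched as follows. This is a hedged illustration, not the notebook's code: `extract_in_batches`, `fake_extract`, and `demo_df` are hypothetical names, and a dummy row function replaces `compute_Landsat_values` so the example runs offline. The file naming follows the `landsat_features_training_{start}_{end}.csv` pattern used earlier in the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd

def extract_in_batches(df, row_func, out_dir, batch_size=100,
                       prefix="landsat_features_training"):
    """Apply row_func batch by batch, writing one CSV per batch.

    Batches whose CSV already exists are skipped, so an interrupted run
    can be restarted without repeating finished work.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(df), batch_size):
        end = min(start + batch_size, len(df)) - 1
        path = out_dir / f"{prefix}_{start}_{end}.csv"
        if path.exists():
            continue  # finished in an earlier run
        features = df.iloc[start:end + 1].apply(row_func, axis=1)
        features.to_csv(path, index=False)
        written.append(path)
    return written

# Dummy stand-in for compute_Landsat_values (no network access needed)
def fake_extract(row):
    return pd.Series({"green": row["Latitude"] * 2.0})

demo_df = pd.DataFrame({"Latitude": range(250)})
demo_dir = tempfile.mkdtemp()

first_run = extract_in_batches(demo_df, fake_extract, demo_dir)
second_run = extract_in_batches(demo_df, fake_extract, demo_dir)  # nothing left to do
print(len(first_run), len(second_run))
```

In the real notebook, passing `compute_Landsat_values` as `row_func` and `Water_Quality_df` as `df` would follow the same pattern; after a timeout or API failure, rerunning the call resumes from the first batch that has no output file.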


In [135]:
# Extract band values from Landsat for training dataset
train_features_path = "landsat_features_training_10_linebyline.csv"

print("🚀 Running Landsat feature extraction for training data...")
landsat_train_features = Water_Quality_df_200.progress_apply(compute_Landsat_values, axis=1)
landsat_train_features.to_csv(train_features_path, index=False)

🚀 Running Landsat feature extraction for training data...


100%|██████████| 3/3 [01:10<00:00, 23.55s/it]


In [119]:
landsat_train_features

Unnamed: 0,qa,red,blue,drad,emis,emsd,lwir,trad,urad,atran,cdist,green,nir08,lwir.1,swir16,swir22,cloud_qa,qa_pixel,qa_radsat,atmos_opacity
0,172.0,12802.0,9557.0,890.0,9880.0,,44344.0,8935.0,1835.0,7669.0,2447.5,11426.0,11190.0,44344.0,7687.5,7645.0,32.0,5504.0,,8.0
1,213.0,9241.5,8691.0,747.0,9570.5,,45396.5,9097.0,1477.5,7957.5,476.5,9550.0,17658.5,45396.5,13746.5,10574.0,,5440.0,,46.0
2,298.0,12540.0,9502.5,329.0,9670.5,89.5,41467.5,7739.0,590.0,9051.0,178.5,10720.0,15210.0,41467.5,17974.0,14201.0,2.0,5440.0,,9.0
3,284.0,11237.5,9693.5,1198.0,9756.5,96.5,46121.0,8990.0,2550.0,6476.0,381.5,10943.0,14887.0,46121.0,13522.0,11403.0,,5440.0,,48.0
4,260.0,9290.0,8718.0,1187.0,9829.5,95.0,44381.0,8547.0,2526.0,6521.0,930.0,9502.5,16828.5,44381.0,12665.5,9643.0,,5440.0,,106.5
5,274.0,10728.5,9344.0,1245.0,9804.5,67.0,44339.0,8492.0,2668.0,6332.0,592.0,10433.5,12433.5,44339.0,9579.5,8531.5,1.0,5440.0,,101.0
6,-4740.0,5122.0,4984.0,-4376.0,-117.0,-4978.5,22583.0,-497.5,-3682.5,-1702.5,-4990.5,5189.5,7814.0,22583.0,6664.0,5222.0,,2720.5,,-4690.0
7,633.5,9985.0,10039.0,1244.5,9619.0,54.0,44583.0,8744.0,2629.0,6600.5,1.5,10466.5,14137.5,44583.0,10315.5,8536.0,,5568.0,,690.5
8,650.5,10524.0,9921.0,1284.0,9529.0,80.0,45279.0,8948.5,2725.0,6481.0,3.0,10647.0,15543.0,45279.0,11919.5,9642.5,,5568.0,,622.5
9,303.5,10262.5,9414.0,1237.0,9693.0,77.0,46784.0,9156.0,2652.0,6366.0,274.5,10207.0,13683.0,46784.0,14011.5,11850.5,,5440.0,,123.0


In [63]:
landsat_train_features

Unnamed: 0,blue,green,red,nir08,swir16,lwir,swir22,blue.1,green.1,red.1,...,lwir.1,swir22.1,coastal,blue.2,green.2,red.2,nir08.1,swir16.1,swir22.2,lwir11
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
6,,,,,,,,,,,...,,,,,,,,,,
7,,,,,,,,,,,...,,,,,,,,,,
8,,,,,,,,,,,...,,,,,,,,,,
9,,,,,,,,,,,...,,,,,,,,,,


In [None]:
landsat_train_features = pd.read_csv("landsat_features_training_all_bands.csv")

**NDMI, MNDWI, and NDTI Indices**

In this notebook, we compute three commonly used water-related indices from the extracted Landsat bands:

- **NDMI (Normalized Difference Moisture Index):**  
  Measures vegetation water content and surface moisture.  
  Computed as *(NIR - SWIR16) / (NIR + SWIR16)*.

- **MNDWI (Modified Normalized Difference Water Index):**  
  Highlights open water features by enhancing water reflectance and suppressing built-up areas.  
  Computed as *(Green - SWIR16) / (Green + SWIR16)*.

- **NDTI (Normalized Difference Turbidity Index):**  
  Assesses water clarity by responding to suspended sediment concentrations.  
  Computed as *(Red - Green) / (Red + Green)*.

An **epsilon value** (*eps = 1e-10*) is added to the denominators to avoid division by zero.  
These indices are widely used in hydrological and water quality analyses for detecting water presence, vegetation moisture levels, and turbidity.
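
All three indices share the same normalized-difference form, so they can be checked with one small helper. The sketch below is illustrative: `normalized_difference` and `sample` are hypothetical names, and the band values are those of the first training row shown in the extraction output earlier in this notebook.

```python
import pandas as pd

EPS = 1e-10  # same epsilon as in the notebook

def normalized_difference(a, b, eps=EPS):
    """Return (a - b) / (a + b + eps), element-wise for Series or scalars."""
    return (a - b) / (a + b + eps)

# Band medians for the first training location
sample = pd.DataFrame(
    {"red": [12802.0], "green": [11426.0], "nir08": [11190.0], "swir16": [7687.5]}
)

sample["NDMI"] = normalized_difference(sample["nir08"], sample["swir16"])
sample["MNDWI"] = normalized_difference(sample["green"], sample["swir16"])
sample["NDTI"] = normalized_difference(sample["red"], sample["green"])

print(sample[["NDMI", "MNDWI", "NDTI"]].round(6))
```

The results agree with the first row of the training feature table (NDMI 0.185538, MNDWI 0.195595, NDTI 0.056794). Note that the inputs here are the raw stored band values, exactly as the notebook uses them.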


In [24]:
# Create indices: NDMI, MNDWI, and NDTI
eps = 1e-10
landsat_train_features['NDMI'] = (landsat_train_features['nir08'] - landsat_train_features['swir16']) / (landsat_train_features['nir08'] + landsat_train_features['swir16'] + eps)
landsat_train_features['MNDWI'] = (landsat_train_features['green'] - landsat_train_features['swir16']) / (landsat_train_features['green'] + landsat_train_features['swir16'] + eps)
landsat_train_features['NDTI'] = (landsat_train_features['red'] - landsat_train_features['green']) / (landsat_train_features['red'] + landsat_train_features['green'] + eps)

In [25]:
landsat_train_features['Latitude'] = Water_Quality_df['Latitude']
landsat_train_features['Longitude'] = Water_Quality_df['Longitude']
landsat_train_features['Sample Date'] = Water_Quality_df['Sample Date']
landsat_train_features = landsat_train_features[['Latitude', 'Longitude', 'Sample Date', 'nir08', 'green', 'swir16', 'swir22', 'NDMI', 'MNDWI', 'NDTI']]

Unnamed: 0,Latitude,Longitude,Sample Date,nir,green,swir16,swir22,NDMI,MNDWI,NDTI
0,-28.760833,17.730278,02-01-2011,11190.0,11426.0,7687.5,7645.0,0.185538,0.195595,0.056794
1,-26.861111,28.884722,03-01-2011,17658.5,9550.0,13746.5,10574.0,0.124566,-0.180134,-0.016417
2,-26.45,28.085833,03-01-2011,15210.0,10720.0,17974.0,14201.0,-0.083293,-0.252805,0.078246
3,-27.671111,27.236944,03-01-2011,14887.0,10943.0,13522.0,11403.0,0.048048,-0.105416,0.013277
4,-27.356667,27.286389,03-01-2011,16828.5,9502.5,12665.5,9643.0,0.141147,-0.142683,-0.011308


In [27]:
# Preview File
landsat_train_features.head()

In [None]:
landsat_train_features.to_csv("landsat_features_training_all_bands_with_index.csv",index = False)

In [None]:
session.sql("""
    PUT file:///tmp/landsat_features_training_200.csv
    'snow://workspace/USER$.PUBLIC."EY-AI-and-Data-Challenge"/versions/live/'
    AUTO_COMPRESS=FALSE
    OVERWRITE=TRUE
""").collect()

print("File saved! Refresh the browser to see the files in the sidebar")

**Note:** If you're using your own workspace, remember to replace "EY-AI-and-Data-Challenge" with your workspace name in the file path.

### Extracting features for the validation dataset

In [12]:
Validation_df=pd.read_csv('submission_template.csv')
display(Validation_df.head())

In [None]:
Validation_df.shape

In [14]:
# Extract band values from Landsat for submission dataset
val_features_path = "landsat_features_validation.csv"

print("🚀 Running Landsat feature extraction for validation data...")
landsat_val_features = Validation_df.progress_apply(compute_Landsat_values, axis=1)
landsat_val_features.to_csv(val_features_path, index=False)

In [15]:
# Create indices: NDMI, MNDWI, and NDTI (same definitions as for the training data)
# The extraction function names the near-infrared column 'nir08', not 'nir'
eps = 1e-10
landsat_val_features['NDMI'] = (landsat_val_features['nir08'] - landsat_val_features['swir16']) / (landsat_val_features['nir08'] + landsat_val_features['swir16'] + eps)
landsat_val_features['MNDWI'] = (landsat_val_features['green'] - landsat_val_features['swir16']) / (landsat_val_features['green'] + landsat_val_features['swir16'] + eps)
landsat_val_features['NDTI'] = (landsat_val_features['red'] - landsat_val_features['green']) / (landsat_val_features['red'] + landsat_val_features['green'] + eps)

In [None]:
landsat_val_features['Latitude'] = Validation_df['Latitude']
landsat_val_features['Longitude'] = Validation_df['Longitude']
landsat_val_features['Sample Date'] = Validation_df['Sample Date']
landsat_val_features = landsat_val_features[['Latitude', 'Longitude', 'Sample Date', 'nir08', 'green', 'swir16', 'swir22', 'NDMI', 'MNDWI', 'NDTI']]

In [None]:
# Preview File
landsat_val_features.head()

In [None]:
landsat_val_features.to_csv("/tmp/landsat_features_validation.csv",index = False)

In [None]:
session.sql("""
    PUT file:///tmp/landsat_features_validation.csv
    'snow://workspace/USER$.PUBLIC."EY-AI-and-Data-Challenge"/versions/live/'
    AUTO_COMPRESS=FALSE
    OVERWRITE=TRUE
""").collect()

print("File saved! Refresh the browser to see the files in the sidebar")

**Note:** If you're using your own workspace, remember to replace "EY-AI-and-Data-Challenge" with your workspace name in the file path.