In [0]:
# Required each time the cluster is restarted. Because the notebooks run in order, this should only be needed in the first notebook.


tiers = ['bronze', 'silver', 'gold']
# Use a separate loop variable (tier) so the tiers list is not shadowed inside the comprehension
adls_path = {tier: f"abfss://{tier}@dbprojectearthquack.dfs.core.windows.net/" for tier in tiers}


#Accessing paths 

bronze_adls = adls_path['bronze']
silver_adls = adls_path['silver']
gold_adls = adls_path['gold']

dbutils.fs.ls(bronze_adls)
dbutils.fs.ls(silver_adls)
dbutils.fs.ls(gold_adls)


[]

What the code is doing in simple terms

You have three storage areas called Bronze, Silver, and Gold.

Think of them like three different folders in a big online storage drive.

Bronze might hold raw, unprocessed data.

Silver might hold cleaned and organized data.

Gold might hold final, ready-to-use data.

You’re creating a map (dictionary) that links each storage area’s name to its location (an abfss:// path) in Azure Data Lake Storage (ADLS).

The code automatically builds these paths using the names "bronze", "silver", and "gold".

Example: "bronze" gets linked to something like abfss://bronze@dbprojectearthquack.dfs.core.windows.net/.

You save each path into a variable so you can easily refer to it later.

bronze_adls = web address for Bronze folder.

silver_adls = web address for Silver folder.

gold_adls = web address for Gold folder.

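You check what’s inside each folder using dbutils.fs.ls().

This command is like opening the folder in File Explorer or Google Drive to see what files are inside. A notebook cell displays only its last result, so the [] shown above means the Gold folder is still empty.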
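
As a rough illustration of the path map described above, the sketch below builds the same dictionary without touching storage, so it also runs outside Databricks. The build_adls_paths helper and its account parameter are hypothetical names introduced here; the account name is the one used in this notebook.

```python
def build_adls_paths(tiers, account):
    """Map each tier name to its abfss:// container root (trailing slash included)."""
    return {tier: f"abfss://{tier}@{account}.dfs.core.windows.net/" for tier in tiers}


adls_path = build_adls_paths(["bronze", "silver", "gold"], "dbprojectearthquack")

# Print one line per tier to check the generated paths by eye.
for tier, path in adls_path.items():
    print(f"{tier:>6} -> {path}")
```

Keeping the account name in one place means a change of storage account touches a single argument instead of three strings.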

In [0]:
import requests, json
from datetime import date, timedelta

In [0]:
start_date = date.today() - timedelta(1)
end_date = date.today()

requests

A popular Python library used to send HTTP requests (like visiting websites, downloading data, or talking to APIs).

Example: If you want to get weather data from a web service, requests is the tool that makes the call and gets the response.

json

A module in Python’s standard library for working with JSON (JavaScript Object Notation) data.

JSON is a way to store and send structured data. It looks like a Python dictionary, but in text form.

You use json to convert data between Python objects and JSON text.

from datetime import date, timedelta

You’re importing date (to work with calendar dates like today’s date).

And timedelta (to represent a difference in time, for example "5 days ago" or "next 7 days").
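
A small sketch of the two standard-library pieces described above. It uses a fixed date instead of date.today() so the result is reproducible, and the sample record is a trimmed copy of one field set from this notebook's output.

```python
import json
from datetime import date, timedelta

# A fixed date keeps the example reproducible; the notebook uses date.today().
today = date(2025, 9, 17)
yesterday = today - timedelta(days=1)
print(yesterday.isoformat(), today.isoformat())

# Python dict -> JSON text -> Python dict again.
record = {"mag": 4.4, "place": "33 km NNE of Mejillones, Chile", "tz": None}
text = json.dumps(record)      # None becomes null in the JSON text
restored = json.loads(text)    # null becomes None again
print(text)
```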

In [0]:
# API URL
url = f"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime={start_date}&endtime={end_date}"
headers = {
    "User-Agent": "Mozilla/5.0"
}

try:
    # Make the GET request to fetch data
    response = requests.get(url, headers=headers)

    # Check if the request was successful
    response.raise_for_status() # Raise an exception if the request was unsuccessful

    try:
        data = response.json().get("features", [])
        if not data:
            print("No earthquake data found.")
        else:
            json_data = json.dumps(data, indent=4)
            file_path = f"{bronze_adls}/{start_date}_earthquake_data.json"
            dbutils.fs.put(file_path, json_data, overwrite=True)
            print(f"Saved {len(data)} records to {file_path}")
    except json.JSONDecodeError:
        print("Failed to parse JSON")
        print("Response:\n", response.text[:500])
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")


Wrote 303549 bytes.
Saved 244 records to abfss://bronze@dbprojectearthquack.dfs.core.windows.net//2025-09-16_earthquake_data.json


In [0]:
data[1]

{'type': 'Feature',
 'properties': {'mag': 4.4,
  'place': '33 km NNE of Mejillones, Chile',
  'time': 1758066789194,
  'updated': 1758068087040,
  'tz': None,
  'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/us7000qwky',
  'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us7000qwky&format=geojson',
  'felt': None,
  'cdi': None,
  'mmi': None,
  'alert': None,
  'status': 'reviewed',
  'tsunami': 0,
  'sig': 298,
  'net': 'us',
  'code': '7000qwky',
  'ids': ',us7000qwky,',
  'sources': ',us,',
  'types': ',origin,phase-data,',
  'nst': 19,
  'dmin': 0.739,
  'rms': 0.32,
  'gap': 153,
  'magType': 'mb',
  'type': 'earthquake',
  'title': 'M 4.4 - 33 km NNE of Mejillones, Chile'},
 'geometry': {'type': 'Point', 'coordinates': [-70.3643, -22.8115, 35]},
 'id': 'us7000qwky'}

What this code does

It connects to the US Geological Survey (USGS) earthquake API to get earthquake data for yesterday up to today, and then stores it in your Bronze layer in Azure Data Lake.

Step-by-step breakdown

Build the API link

url = f"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime={start_date}&endtime={end_date}"


This creates a website link (API URL) that says:

“Give me earthquake data in GeoJSON format from {start_date} to {end_date}.”

start_date = yesterday, end_date = today.
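
The same link can be assembled from a dictionary of parameters, which some readers find easier to extend than one long f-string. This is only a sketch: build_query_url is a hypothetical helper, and the dates are fixed for reproducibility. Formatting a date inside an f-string already yields the ISO form (YYYY-MM-DD), so both approaches produce the same text.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(start_date, end_date):
    """Return the query URL for events between two dates (hypothetical helper)."""
    params = {
        "format": "geojson",
        "starttime": start_date.isoformat(),
        "endtime": end_date.isoformat(),
    }
    # urlencode joins the pairs with "&" and escapes any unsafe characters.
    return f"{BASE_URL}?{urlencode(params)}"


end = date(2025, 9, 17)
start = end - timedelta(days=1)
url = build_query_url(start, end)
print(url)
```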

Set request headers

headers = {"User-Agent": "Mozilla/5.0"}


Sends a browser-like User-Agent so the server is less likely to reject the request as coming from a script.

Make the request to get earthquake data

response = requests.get(url, headers=headers)


This sends the request to the API and waits for the earthquake data.

Check if the request worked

response.raise_for_status()


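If the server answered with an error status (a 4xx code such as a bad query, or a 5xx code when the server is having trouble), this raises an exception. Connection problems such as no internet fail one step earlier, inside requests.get() itself.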

Process the data

data = response.json().get("features", [])


Reads the API’s JSON response and takes the "features" section (which contains the earthquake records).

If there’s no "features" key, it just gives an empty list.
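
To make the shape of a record concrete, here is a sketch that works on a hand-written payload modelled on the record shown in this notebook's output (trimmed to a few fields). It shows the fallback to an empty list and how the main fields could be read; the variable names are illustrative. In USGS GeoJSON the coordinates are ordered longitude, latitude, depth, and time is given in milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Trimmed-down response shaped like the record shown in this notebook's output.
payload = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "mag": 4.4,
                "place": "33 km NNE of Mejillones, Chile",
                "time": 1758066789194,  # milliseconds since the Unix epoch
            },
            "geometry": {"type": "Point", "coordinates": [-70.3643, -22.8115, 35]},
            "id": "us7000qwky",
        }
    ],
}

features = payload.get("features", [])  # list of earthquake records
missing = {}.get("features", [])        # key absent -> falls back to []

first = features[0]
lon, lat, depth_km = first["geometry"]["coordinates"]
# Convert milliseconds to seconds before building a timezone-aware datetime.
event_time = datetime.fromtimestamp(first["properties"]["time"] / 1000, tz=timezone.utc)
print(first["id"], first["properties"]["mag"], lat, lon, depth_km, event_time.isoformat())
```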

If no data, show message

if not data:
    print("No earthquake data found.")


Lets you know if there were no earthquakes in that date range.

If there’s data, save it to the Bronze layer

json_data = json.dumps(data, indent=4)
file_path = f"{bronze_adls}/{start_date}_earthquake_data.json"
dbutils.fs.put(file_path, json_data, overwrite=True)


Converts the earthquake data into nicely formatted JSON text.

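Creates a file path like:

abfss://bronze@dbprojectearthquack.dfs.core.windows.net//2025-09-16_earthquake_data.json

(The double slash appears because bronze_adls already ends with "/" and the f-string adds another one.)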


Saves the file in your Bronze area in ADLS.
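
One possible way to avoid the doubled slash is to strip the trailing "/" before joining. The sketch below assumes a hypothetical join_path helper and, because dbutils only exists inside Databricks, writes to a local temporary folder as a stand-in for dbutils.fs.put.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def join_path(base, name):
    """Join a base path and a file name with exactly one slash between them."""
    return f"{base.rstrip('/')}/{name}"


bronze_adls = "abfss://bronze@dbprojectearthquack.dfs.core.windows.net/"
start_date = date(2025, 9, 16)
file_path = join_path(bronze_adls, f"{start_date}_earthquake_data.json")
print(file_path)

# Local stand-in for dbutils.fs.put: write the formatted JSON text to a temp folder.
records = [{"id": "us7000qwky", "mag": 4.4}]
json_data = json.dumps(records, indent=4)
local_file = Path(tempfile.mkdtemp()) / f"{start_date}_earthquake_data.json"
local_file.write_text(json_data, encoding="utf-8")

# Read the file back to confirm nothing was lost on the way.
round_trip = json.loads(local_file.read_text(encoding="utf-8"))
```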

If JSON parsing fails

except json.JSONDecodeError:
    print("Failed to parse JSON")


If the response body isn’t valid JSON, it prints an error message. The full cell above also prints the first 500 characters of the response to help with debugging.
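
The failure case can be reproduced offline with the standard json module. In this sketch, parse_features is a hypothetical helper that mirrors the cell's logic on plain text instead of a live response; it assumes the top-level JSON value is an object.

```python
import json


def parse_features(text):
    """Parse JSON text and return its "features" list, or None if the text is not JSON."""
    try:
        return json.loads(text).get("features", [])
    except json.JSONDecodeError as err:
        # Show the reason plus the start of the body, as the notebook cell does.
        print(f"Failed to parse JSON: {err}")
        print("Response:\n", text[:500])
        return None


good = parse_features('{"features": [{"id": "us7000qwky"}]}')
bad = parse_features("<html>Service unavailable</html>")
```

An HTML error page is a typical example of a body that arrives with a successful status but cannot be parsed as JSON.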

If network request fails

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")


Shows an error if the request failed: the API couldn’t be reached, the connection dropped, or raise_for_status() reported an error status code.

Plain-English Summary

This script:

Figures out yesterday’s and today’s dates.

Asks the official US earthquake service for earthquake events between those dates.

If found, it saves them into your Bronze storage layer in Azure Data Lake as a JSON file.

If there are no quakes or an error occurs, it prints a message.

In [0]:
# define your variables 

output_data = {
    "start_date": start_date.isoformat(),
    "end_date": end_date.isoformat(),
    "bronze_adls": bronze_adls,
    "silver_adls": silver_adls,
    "gold_adls": gold_adls
}

# Store the dictionary directly as a task value for downstream tasks

dbutils.jobs.taskValues.set(key="bronze_output", value=output_data)

print("Bronze task values stored:", output_data)

Bronze task values stored: {'start_date': '2025-09-16', 'end_date': '2025-09-17', 'bronze_adls': 'abfss://bronze@dbprojectearthquack.dfs.core.windows.net/', 'silver_adls': 'abfss://silver@dbprojectearthquack.dfs.core.windows.net/', 'gold_adls': 'abfss://gold@dbprojectearthquack.dfs.core.windows.net/'}


Create a dictionary of important information

output_data = {
    "start_date": start_date.isoformat(),
    "end_date": end_date.isoformat(),
    "bronze_adls": bronze_adls,
    "silver_adls": silver_adls,
    "gold_adls": gold_adls
}


This builds a small package of variables (like putting them all in a labeled box).

start_date.isoformat() → turns the date into a text string like "2025-08-13".

Same for end_date.

It also stores the paths for Bronze, Silver, and Gold ADLS folders.
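
The reason for converting the dates to text can be shown with plain Python. The sketch assumes, as the Databricks documentation describes, that task values must be representable as JSON; a raw date object is not, while its ISO string is and can be turned back into a date later.

```python
import json
from datetime import date

start_date = date(2025, 9, 16)

try:
    json.dumps({"start_date": start_date})  # raw date object
    raw_ok = True
except TypeError:
    raw_ok = False                          # date is not JSON-serializable

# The ISO string serializes cleanly and can be parsed back into a date.
as_text = json.dumps({"start_date": start_date.isoformat()})
restored = date.fromisoformat(json.loads(as_text)["start_date"])
print(raw_ok, as_text, restored)
```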

Share the dictionary with another Databricks task

dbutils.jobs.taskValues.set(key="bronze_output", value=output_data)


This saves the whole dictionary into Databricks job task values under the name "bronze_output".

Think of it like handing off these values so another task later in the same job can use them.

This is especially useful if:

Task 1 gets the data

Task 2 processes it

Task 3 moves it to another location

Plain-English Summary

You’re basically saying:

“Here are my start and end dates, and my storage folder paths. I’m packaging them up and passing them to the next task in this Databricks job so it can use them without recalculating everything.”
