## Data Storage Tiers and ADLS Paths

In this notebook, we define the tiers for data storage and create a dictionary to store the Azure Data Lake Storage (ADLS) paths for each tier. We then list the contents of each ADLS path to verify access.

### Mount ADLS Gen2

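This cell sets up access to Azure Data Lake Storage (ADLS) Gen2 and must be rerun each time the cluster restarts. Because the notebooks run in order, run it only in the first notebook. Note that the code below addresses each container directly through an `abfss://` URI and does not call `dbutils.fs.mount`.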

In [0]:
# Define the tiers for data storage
tiers = ["bronze", "silver", "gold"]

# Create a dictionary to store the ADLS paths for each tier
adls_paths = {
    tier: f"abfss://{tier}@earthquakedatadb.dfs.core.windows.net/" 
    for tier in tiers
}

# Accessing paths for each tier
bronze_adls = adls_paths["bronze"]
silver_adls = adls_paths["silver"]
gold_adls = adls_paths["gold"] 

# List the contents of each ADLS path to verify access
dbutils.fs.ls(bronze_adls)
dbutils.fs.ls(silver_adls)
dbutils.fs.ls(gold_adls)

[FileInfo(path='abfss://gold@earthquakedatadb.dfs.core.windows.net/earthquake_events_gold/', name='earthquake_events_gold/', size=0, modificationTime=1745057122000)]
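Each path built in the cell above ends with `/`, so appending a file name with another `/` produces a double slash, as in the saved bronze path later in this notebook. The following is a minimal sketch of one way to avoid that. The helper names `build_adls_paths` and `join_adls_path` are hypothetical, and the storage account name is copied from the cell above. It uses plain string handling only, so it runs outside Databricks.

```python
from typing import Dict, List

# Storage account name copied from the notebook cell; tiers as defined there.
STORAGE_ACCOUNT = "earthquakedatadb"
TIERS = ["bronze", "silver", "gold"]


def build_adls_paths(account: str, tiers: List[str]) -> Dict[str, str]:
    """Return one abfss:// root URI per tier, without a trailing slash."""
    return {
        tier: f"abfss://{tier}@{account}.dfs.core.windows.net"
        for tier in tiers
    }


def join_adls_path(base: str, *parts: str) -> str:
    """Join path segments with exactly one '/' between them."""
    segments = [base.rstrip("/")]
    # Drop empty segments and stray slashes around each part.
    segments += [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(segments)


adls_paths = build_adls_paths(STORAGE_ACCOUNT, TIERS)
bronze_file = join_adls_path(adls_paths["bronze"], "2025-04-19_earthquake_data.json")
print(bronze_file)
```

Stripping slashes at the join point means the helper behaves the same whether or not the base path carries a trailing `/`.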

## Import Necessary Libraries

In this cell, we import the necessary libraries for making HTTP requests and handling JSON data. We also import date and timedelta for date manipulations.

In [0]:
import requests
import json
from datetime import date, timedelta

## Define Date Range

Here, we define the start and end dates for the data extraction. This is currently set to extract data from the previous day.

In [0]:
# Remove this before running Data Factory Pipeline
start_date = date.today() - timedelta(days=1)
end_date = date.today()

In [0]:
start_date, end_date

(datetime.date(2025, 4, 19), datetime.date(2025, 4, 20))
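The comment in the cell above suggests the dates are supplied from elsewhere when the notebook runs in a pipeline. As a hedged sketch, the function below accepts optional ISO date strings and falls back to the previous-day window when none are given. The name `resolve_window` and its parameters are hypothetical and not part of the notebook. How the strings reach the notebook is not shown here.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_window(start: Optional[str] = None,
                   end: Optional[str] = None,
                   today: Optional[date] = None) -> Tuple[date, date]:
    """Return (start_date, end_date), defaulting to the previous day."""
    today = today or date.today()
    # Parse "YYYY-MM-DD" strings when supplied, otherwise fall back.
    end_date = date.fromisoformat(end) if end else today
    start_date = date.fromisoformat(start) if start else end_date - timedelta(days=1)
    if start_date >= end_date:
        raise ValueError("start must be earlier than end")
    return start_date, end_date


# Default window, pinned to a fixed "today" so the result is reproducible.
default_window = resolve_window(today=date(2025, 4, 20))
# Explicit window, as a caller might pass it in.
explicit_window = resolve_window("2025-04-01", "2025-04-08")
print(default_window, explicit_window)
```

Passing `today` explicitly keeps the function deterministic, which makes the fallback easy to check without depending on the current date.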

## Access Earthquake Data from USGS API

In this cell, we access the earthquake data from the USGS API using the defined date range.

In [0]:
# Accessing earthquake.usgs.gov API
url = "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime={}&endtime={}".format(start_date, end_date)

# Option 2: the same URL built with an f-string
# url = f"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime={start_date}&endtime={end_date}"
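A third way to build the same URL is to pass the query parameters through `urllib.parse.urlencode`, which handles escaping if other parameters are added later. This is a minimal sketch. The function name `build_usgs_url` and the constant `USGS_QUERY_ENDPOINT` are hypothetical, and only the three parameters already used in the cell above are included.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the URL in the cell above.
USGS_QUERY_ENDPOINT = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_usgs_url(start: date, end: date) -> str:
    """Build the GeoJSON query URL for the given date range."""
    params = {
        "format": "geojson",
        "starttime": start.isoformat(),
        "endtime": end.isoformat(),
    }
    # urlencode joins the pairs with '&' and percent-encodes values as needed.
    return f"{USGS_QUERY_ENDPOINT}?{urlencode(params)}"


example_url = build_usgs_url(date(2025, 4, 19), date(2025, 4, 20))
print(example_url)
```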

## Fetch, Load, and Save Data to Bronze Container

This cell fetches the data from the API, extracts the list of `features` from the JSON response, and writes that list as a JSON file to the bronze container.

In [0]:
try:
    # Make the GET request; the timeout keeps the cell from hanging indefinitely
    response = requests.get(url, timeout=30)

    # Raise an HTTPError for 4xx/5xx responses
    response.raise_for_status()
    data = response.json().get('features', [])

    if not data:
        print("No earthquake data found for the specified date range.")
    else:
        # Specify the ADLS Gen2 path to write the data
        adls_path = f"{bronze_adls}/{start_date}_earthquake_data.json"

        # Save the JSON data to ADLS Gen2
        json_data = json.dumps(data, indent=4)
        dbutils.fs.put(adls_path, json_data, overwrite=True)
        print(f"Data saved to {adls_path}")
except requests.exceptions.RequestException as err:
    # Covers HTTP errors as well as connection failures and timeouts
    print(f"Error fetching data from API: {err}")

Wrote 369205 bytes.
Data saved to abfss://bronze@earthquakedatadb.dfs.core.windows.net//2025-04-19_earthquake_data.json
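The parse-and-serialize step in the cell above can be separated from the network call and the `dbutils.fs.put` write, which makes it possible to check locally. The sketch below assumes only that the payload is a dictionary with an optional `features` list, as the cell does. The function name `features_to_json` and the sample payload are hypothetical.

```python
import json
from typing import Optional


def features_to_json(payload: dict) -> Optional[str]:
    """Return the 'features' list as indented JSON, or None when it is empty."""
    features = payload.get("features", [])
    if not features:
        return None
    return json.dumps(features, indent=4)


# Tiny stand-in for an API response; not real data.
sample_payload = {"type": "FeatureCollection", "features": [{"id": "example"}]}

serialized = features_to_json(sample_payload)
empty = features_to_json({"type": "FeatureCollection", "features": []})
print(serialized)
```

Returning `None` for an empty result lets the caller decide whether to skip the write, mirroring the `if not data` branch in the notebook cell.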


In [0]:
data[0]

{'type': 'Feature',
 'properties': {'mag': 0.3,
  'place': '33 km NW of Indian Springs, Nevada',
  'time': 1745107130799,
  'updated': 1745108597444,
  'tz': None,
  'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/nn00896479',
  'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=nn00896479&format=geojson',
  'felt': None,
  'cdi': None,
  'mmi': None,
  'alert': None,
  'status': 'reviewed',
  'tsunami': 0,
  'sig': 1,
  'net': 'nn',
  'code': '00896479',
  'ids': ',nn00896479,',
  'sources': ',nn,',
  'types': ',origin,phase-data,',
  'nst': 12,
  'dmin': 0.017,
  'rms': 0.0905,
  'gap': 199.50000000000006,
  'magType': 'ml',
  'type': 'earthquake',
  'title': 'M 0.3 - 33 km NW of Indian Springs, Nevada'},
 'geometry': {'type': 'Point', 'coordinates': [-115.9286, 36.7856, 7.2]},
 'id': 'nn00896479'}
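The record above nests the useful fields under `properties` and `geometry`, and stores `time` as milliseconds since the Unix epoch. As a hedged sketch of how such a record could be flattened for later tiers, the function below pulls out a few fields and converts the timestamp to UTC. The name `flatten_feature` and the output column names are hypothetical. The coordinate order (longitude, latitude, depth) follows the GeoJSON convention used by the feed.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def flatten_feature(feature: dict) -> dict:
    """Flatten one GeoJSON feature into a single-level dictionary."""
    props = feature.get("properties", {})
    longitude, latitude, depth = feature["geometry"]["coordinates"]
    return {
        "id": feature["id"],
        "longitude": longitude,
        "latitude": latitude,
        "depth": depth,
        "mag": props.get("mag"),
        "place": props.get("place"),
        # Integer milliseconds added to the epoch avoid float rounding.
        "time_utc": EPOCH + timedelta(milliseconds=props["time"]),
    }


# Trimmed copy of the record shown above.
sample_feature = {
    "type": "Feature",
    "properties": {
        "mag": 0.3,
        "place": "33 km NW of Indian Springs, Nevada",
        "time": 1745107130799,
    },
    "geometry": {"type": "Point", "coordinates": [-115.9286, 36.7856, 7.2]},
    "id": "nn00896479",
}

row = flatten_feature(sample_feature)
print(row["time_utc"].isoformat())  # 2025-04-19T23:58:50.799000+00:00
```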