**Copyright: © NexStream Technical Education, LLC**.  
All rights reserved


# USGS Earthquake Scraper Introduction
In this project, you will create a 'web scraper' that retrieves real-time data from the U.S. Geological Survey (USGS) on the latest earthquakes around the world whose magnitude is equal to or above a user-supplied value.

The data is in JSON format so you'll need to convert the output into a user-readable (friendly) format.

The feed is from the USGS database here:  https://earthquake.usgs.gov/earthquakes/feed/.  You should become familiar with this site.

The format of the feed summary is here: https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php.  You should become familiar with the fields for the JSON data.  
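
To make the field layout concrete, here is a small sketch that parses a hand-written feed fragment and pulls out a few fields. The sample values are invented for illustration and are not real USGS data, and `summarize_feature` is a hypothetical helper name. The key names (`features`, `properties`, `mag`, `place`, `time`, `url`, `geometry`, `coordinates`) follow the feed summary linked above, where coordinates are ordered longitude, latitude, depth.

```python
import json

# Hand-written sample shaped like the USGS GeoJSON summary feed (values are made up).
SAMPLE_FEED = """
{
  "type": "FeatureCollection",
  "metadata": {"title": "Sample feed", "count": 1},
  "features": [
    {
      "type": "Feature",
      "properties": {
        "mag": 4.6,
        "place": "10 km N of Sampletown",
        "time": 1700000000000,
        "url": "https://example.com/event/sample"
      },
      "geometry": {"type": "Point", "coordinates": [-122.5, 38.1, 7.2]},
      "id": "sample0001"
    }
  ]
}
"""

def summarize_feature(feature):
    """Return (magnitude, place, longitude, latitude, depth_km) for one feature."""
    props = feature["properties"]
    lon, lat, depth = feature["geometry"]["coordinates"]
    return props["mag"], props["place"], lon, lat, depth

feed = json.loads(SAMPLE_FEED)
summaries = [summarize_feature(f) for f in feed["features"]]
for mag, place, lon, lat, depth in summaries:
    print(f"M{mag} at {place} (lat {lat}, lon {lon}, depth {depth} km)")
```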

Note you can use a JSON viewer for a more readable format of the data.  






# Part 1a:  Set up the environment and script, and prompt the user for input.
Set up the script imports and prompt the user for the minimum magnitude used to query the USGS data.  That is, any earthquake with a magnitude greater than or equal to the input value will be retrieved from the feed.  
You'll need to import the urllib.request library to access the web site.  
You can also import the json library to use the functions in that library.  
Check out both APIs for reference.
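
One way to handle the prompt is to keep the validation in a small function that can be checked without typing anything. The sketch below is only an illustration: the name `parse_magnitude` and the default 0.0 to 10.0 range are assumptions, not part of the assignment.

```python
def parse_magnitude(text, low=0.0, high=10.0):
    """Convert user text to a float magnitude, or return None if it is unusable."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # float() also accepts 'nan' and 'inf'; the range check rejects both
    if not (low <= value <= high):
        return None
    return value

# Quick checks with fixed strings instead of interactive input
for sample in ["4.5", " 2 ", "abc", "11"]:
    print(repr(sample), "->", parse_magnitude(sample))
```

In a script, the function could wrap the prompt, for example `mag = parse_magnitude(input("Enter minimum magnitude: "))`, repeating the prompt while the result is `None`.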


In [1]:
import urllib.request
import json
import csv
from datetime import datetime

# Part 1b:  Write the printResults function.  
In this function, you should print the output of the data you retrieved from the site:  http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson      
See the code comments for guided instruction.


Note you can use a JSON viewer for a more readable format of the data if you want to view it before processing it with your function.



In [2]:
def printResults(data):
    """
    Part 1b: Print earthquake results
    - data: JSON string of USGS earthquake feed
    """
    try:
        parsed = json.loads(data)
        events = parsed.get('features', [])
        print(f"Number of earthquakes found: {len(events)}\n")
        for feature in events:
            props = feature['properties']
            mag = props.get('mag')
            place = props.get('place')
            time_ms = props.get('time')
            time_str = datetime.utcfromtimestamp(time_ms/1000).strftime('%Y-%m-%d %H:%M:%S UTC')
            print(f"Mag {mag}, Location: {place}, Time: {time_str}")
    except Exception as e:
        print(f"Error processing data: {e}")
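
The `time` property is given in milliseconds since the Unix epoch. `datetime.utcfromtimestamp` is deprecated as of Python 3.12, so a timezone-aware conversion may be preferable. The sketch below shows one such alternative; `format_quake_time` is a hypothetical helper name, and returning `'unknown'` for a missing timestamp is an assumption about how you might want to handle that case.

```python
from datetime import datetime, timezone

def format_quake_time(time_ms):
    """Format epoch milliseconds as a UTC string; return 'unknown' when missing."""
    if time_ms is None:
        return "unknown"
    # Divide by 1000 because fromtimestamp() expects seconds, not milliseconds
    moment = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")

print(format_quake_time(0))
print(format_quake_time(None))
```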

# Part 1c:  Write the runner
In this code (either in main or in a function), you should set up the URL from the USGS site, open the URL, read the data, and call the printResults function.
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson  
See the code comments for guided instruction.  
 
Note you can use a JSON viewer for a more readable format of the data if you want to view it before processing it with your function.

In [3]:
def runner():
    """
    Part 1c: Prompt user, fetch data, call printResults, and loop
    """
    while True:
        try:
            mag = float(input("Enter minimum magnitude (0.0 to 10.0): "))
            if not (0.0 <= mag <= 10.0):
                print("Magnitude out of realistic range. Please try again.\n")
                continue
        except ValueError:
            print("Invalid input. Please enter a number.\n")
            continue

        # Fetch all earthquakes in the past day
        url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
        try:
            with urllib.request.urlopen(url) as response:
                raw_data = response.read().decode('utf-8')
        except Exception as e:
            print(f"Error fetching data: {e}\n")
            continue

        # Filter by magnitude threshold
        parsed = json.loads(raw_data)
        filtered_events = {
            "type": parsed.get("type"),
            "metadata": parsed.get("metadata"),
            "features": [
                f for f in parsed.get("features", [])
                # 'mag' can be null in the feed, so skip events without a magnitude
                if f['properties'].get('mag') is not None and f['properties']['mag'] >= mag
            ]
        }

        data_str = json.dumps(filtered_events)
        printResults(data_str)

        # Prompt to output to CSV
        save = input("\nSave results to CSV? (y/n): ").strip().lower()
        if save == 'y':
            printResults2(data_str)

        # Prompt for another run or exit
        cont = input("\nRun again? (y/n): ").strip().lower()
        if cont != 'y':
            print("Exiting program.")
            break
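
The filtering step in the runner can also be pulled into its own function so that it can be tested without a network connection. This is a minimal sketch, assuming a feature's `mag` may be missing or null; `filter_by_magnitude` is a hypothetical name, and the sample features are made up.

```python
def filter_by_magnitude(features, threshold):
    """Keep features whose 'mag' is present and at least `threshold`."""
    kept = []
    for feature in features:
        mag = feature.get("properties", {}).get("mag")
        # Comparing None with a float raises TypeError, so check for it first
        if mag is not None and mag >= threshold:
            kept.append(feature)
    return kept

# Made-up features: one strong, one with a null magnitude, one weak
sample_features = [
    {"properties": {"mag": 5.2, "place": "A"}},
    {"properties": {"mag": None, "place": "B"}},
    {"properties": {"mag": 1.0, "place": "C"}},
]
strong = filter_by_magnitude(sample_features, 2.5)
print([f["properties"]["place"] for f in strong])
```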

# Part 2:  Output data to spreadsheet
Convert output to CSV format.  

Rewrite the printResults function as printResults2(data), which returns a list or dictionary (your choice) to the runner; the data is then converted to CSV format and saved to a file.

Change your runner to assign the returned data from your printResults2 function to a variable that you then convert to CSV format and save to a file.

Include at least the 4 fields retrieved from the database in Part 1.  
Include exception handling in your file IO processing.   
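
The split described above (one step that returns the data, another that writes the file) might look like the following sketch. The names `extract_records`, `write_records_csv`, and `FIELDNAMES` are hypothetical, and the choice of columns is an assumption. For brevity, `time` is left as raw epoch milliseconds here.

```python
import csv

FIELDNAMES = ["mag", "place", "time", "url"]

def extract_records(features):
    """Build one flat dict per feature, keeping only the FIELDNAMES columns."""
    return [
        {name: feature.get("properties", {}).get(name) for name in FIELDNAMES}
        for feature in features
    ]

def write_records_csv(records, path):
    """Write records to `path`; return True on success, False on an I/O error."""
    try:
        with open(path, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
            writer.writeheader()  # header is written even when records is empty
            writer.writerows(records)
    except OSError as exc:
        print(f"Could not write {path}: {exc}")
        return False
    return True
```

Passing the column names explicitly, rather than reading them from the first record, keeps the header row valid when no earthquakes match the search.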

In [4]:
def printResults2(data):
    """
    Part 2: Convert data to CSV and save to file
    Returns list of record dicts
    """
    try:
        parsed = json.loads(data)
        events = parsed.get('features', [])
        records = []
        for feature in events:
            props = feature['properties']
            records.append({
                'mag': props.get('mag'),
                'place': props.get('place'),
                'time': datetime.utcfromtimestamp(props.get('time')/1000).strftime('%Y-%m-%d %H:%M:%S UTC'),
                'url': props.get('url')
            })
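        # Write to CSV; explicit field names avoid an IndexError when no events match
        fieldnames = ['mag', 'place', 'time', 'url']
        with open('earthquakes.csv', 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)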
            writer.writeheader()
            writer.writerows(records)
        print("Data saved to 'earthquakes.csv'\n")
        return records
    except Exception as e:
        print(f"Error writing CSV: {e}")
        return []

# Part 3:  Search on another field
Create a new printResults function called printResults3(data, searchField) where:  
'data' is the 'scraped' data from the USGS site, as in the previous parts, and  
'searchField' is a field defined on the geojson.php page linked below. 

The search field may be input from a selection provided to the user or may be fixed (programmer's choice).  Use a meaningful field that you can glean some information from (think about how a data scientist may want to analyze certain types of data from the set).  

Change your runner to search the database for the different field and print out the results based on that field.  For example you might want to search for all the earthquakes that occurred within a particular latitude and longitude bounding box.   

See https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php for the list of parameters that can be retrieved.
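
The bounding-box example mentioned above could be sketched as follows. The function names are hypothetical and the sample coordinates are made up. The sketch relies on the GeoJSON convention that `coordinates` is ordered longitude, latitude, depth, and it assumes a box that does not cross the antimeridian (±180° longitude).

```python
def in_bounding_box(feature, min_lat, max_lat, min_lon, max_lon):
    """True when the feature's epicenter lies inside the box (inclusive edges)."""
    coords = (feature.get("geometry") or {}).get("coordinates")
    if not coords or len(coords) < 2:
        return False  # no usable location for this feature
    lon, lat = coords[0], coords[1]  # GeoJSON order is longitude, then latitude
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def filter_by_box(features, min_lat, max_lat, min_lon, max_lon):
    """Return only the features located inside the bounding box."""
    return [
        f for f in features
        if in_bounding_box(f, min_lat, max_lat, min_lon, max_lon)
    ]

# Made-up features: one inside the box, one outside, one without geometry
quakes = [
    {"geometry": {"coordinates": [-122.0, 37.5, 8.0]}},
    {"geometry": {"coordinates": [140.0, 36.0, 30.0]}},
    {"geometry": None},
]
inside = filter_by_box(quakes, 32.0, 42.0, -125.0, -114.0)
print(len(inside))
```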


In [5]:
def printResults3(data, searchField, searchValue):
    """
    Part 3: Filter on another field in the properties
    - searchField: property name to filter on
    - searchValue: value to match
    """
    try:
        parsed = json.loads(data)
        events = parsed.get('features', [])
        filtered = [f for f in events if f['properties'].get(searchField) == searchValue]
        print(f"Found {len(filtered)} events where {searchField} == {searchValue}\n")
        for feature in filtered:
            props = feature['properties']
            print(f"Mag {props.get('mag')}, Location: {props.get('place')}")
    except Exception as e:
        print(f"Error filtering data: {e}")
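
Because printResults3 matches on equality only, numeric fields such as `mag` will rarely match an exact value. One possible generalization, sketched here with hypothetical names (`COMPARATORS`, `filter_by_field`), lets the caller choose the comparison. It assumes the field's values and the search value are of comparable types.

```python
import operator

# Map a comparison symbol to the function that implements it
COMPARATORS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

def filter_by_field(features, field, op, value):
    """Keep features whose properties[field] compares true against value."""
    compare = COMPARATORS[op]
    kept = []
    for feature in features:
        actual = feature.get("properties", {}).get(field)
        # Skip missing or null values rather than comparing None with a number
        if actual is not None and compare(actual, value):
            kept.append(feature)
    return kept

# Made-up features for a quick check
sample = [
    {"properties": {"mag": 5.0, "tsunami": 1}},
    {"properties": {"mag": 6.1, "tsunami": 0}},
    {"properties": {"mag": None}},
]
print(len(filter_by_field(sample, "mag", ">=", 5.0)))
```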

# Example run to verify each part

Below are example calls for testing the functions without interactive input.

In [6]:
# Part 1b: Print results (example)
url = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson"
raw_data = urllib.request.urlopen(url).read().decode('utf-8')
printResults(raw_data)

# Part 2: Save the CSV and show the first rows
records = printResults2(raw_data)
import pandas as pd
df = pd.DataFrame(records)
df.head()

# Part 3: Filter on the 'mag' field (goal: greater than or equal to 5.0)
# Because printResults3 tests for equality, we use an exact mag of 5.0 here (adjust as needed)
print("Events with magnitude exactly 5.0:")
printResults3(raw_data, 'mag', 5.0)

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1020)>
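
The `CERTIFICATE_VERIFY_FAILED` error above usually indicates that Python could not locate a trusted CA bundle on the local machine, rather than a problem with the USGS feed. One possible approach, sketched below, is to pass an explicit `ssl.SSLContext` to `urlopen`. The helper names `build_ssl_context` and `fetch_text` are hypothetical, and the `cafile` argument is an assumption: it would point to a CA bundle available on your system (for example, one provided by the third-party `certifi` package, if installed). Certificate verification stays enabled throughout.

```python
import ssl
import urllib.request

def build_ssl_context(cafile=None):
    """Return a verifying SSL context, optionally using an explicit CA bundle file."""
    # With cafile=None, Python falls back to the system's default CA certificates
    return ssl.create_default_context(cafile=cafile)

def fetch_text(url, context, timeout=10):
    """Download `url` with the given SSL context and return the decoded body."""
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")

context = build_ssl_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

With a working CA bundle, the feed could then be read with `fetch_text(url, context)` in place of the bare `urlopen(url)` call.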