**Copyright: © NexStream Technical Education, LLC**.  
All rights reserved


# USGS Earthquake Scraper Introduction
In this project, you will create a 'web scraper' that retrieves real-time data from the US Geological Survey (USGS) on the latest earthquakes around the world whose magnitude is equal to or above a magnitude entered by the user.

The data is in JSON format so you'll need to convert the output into a user-readable (friendly) format.

The feed is from the USGS database here:  https://earthquake.usgs.gov/earthquakes/feed/.  You should become familiar with this site.

The format of the feed summary is here: https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php.  You should become familiar with the fields for the JSON data.  

Note you can use a JSON viewer for a more readable format of the data.  
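
If no JSON viewer is at hand, Python's own `json` module can pretty-print the data. The sketch below uses a small hand-written sample, not real USGS output: the keys follow the `metadata` and `features` layout of the feed, and the values are invented for illustration.

```python
import json

# Hand-written sample in the shape of the USGS feed; the values are invented.
sample_text = (
    '{"metadata": {"title": "Sample feed", "count": 1}, '
    '"features": [{"properties": {"mag": 4.4, "place": "Sample place"}}]}'
)

# json.loads turns the JSON string into nested dicts and lists
sample = json.loads(sample_text)

# json.dumps with indent re-serializes the data in an indented, readable layout
pretty = json.dumps(sample, indent=2)
print(pretty)
```

The same two calls should work on the text of a real feed once it has been downloaded.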






# Part 1a:  Set up the environment and script, and prompt the user for input.
Set up the script imports and prompt the user for the minimum magnitude used to filter the USGS data.  That is, any earthquake with a magnitude greater than or equal to the input magnitude will be retrieved from the database.  
You'll need to import the urllib.request library to access the web site.
You can also import the json library to use the functions in that library.
Check out both APIs for reference.
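
One possible way to structure the input step is to separate validation from prompting, so the validation can be tested without typing anything. This is a sketch only: the helper names `parse_magnitude` and `prompt_for_magnitude`, the `q` shortcut for quitting, and the range of greater than 0 and less than 10 are all assumptions, not requirements of the assignment.

```python
def parse_magnitude(text, low=0.0, high=10.0):
    """Return text as a float, or None if it is not a number strictly
    between low and high (the bounds are an assumed 'realistic' range)."""
    try:
        value = float(text)
    except ValueError:
        return None
    if low < value < high:
        return value
    return None


def prompt_for_magnitude(read=input):
    """Prompt until the user enters a valid magnitude, or 'q' to quit.

    `read` defaults to the built-in input; passing another callable is a
    hypothetical hook that makes the loop testable.
    """
    while True:
        text = read("Enter a magnitude (or q to quit): ").strip()
        if text.lower() == "q":
            return None
        value = parse_magnitude(text)
        if value is not None:
            return value
        print("Please enter a number greater than 0 and less than 10.")
```

A caller would treat a return value of `None` as a request to end the program, and any other value as the magnitude to search for.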


In [81]:
#Import the urllib.request and json libraries

import urllib.request 
import json 

####Your code here....

#Prompt the user to input a magnitude parameter of type floating point.  
#Limit the range that user can input to realistic magnitudes (check the magnitude entered and if it doesn't fall within a range, print out a message and prompt again.)
#Provide a prompt to the user to end the program or input another magnitude number (this code can be in a later cell).

####Your code here....



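while True:
  try:
    user_input = float(input("Enter the minimum earthquake magnitude (greater than 0, less than 10): "))
  except ValueError:
    # Non-numeric input: report it and prompt again
    print("Please enter a number.")
    continue

  if 0 < user_input < 10:
    break
  # Out-of-range input: report it and prompt again
  print("The magnitude must be greater than 0 and less than 10.")

print(user_input)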


2.4


# Part 1b:  Write the printResults function.  
In this function, you should print the output of the data you retrieved from the site:  http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson      
See the code comments for guided instruction.


Note you can use a JSON viewer for a more readable format of the data if you want to view it before processing it with your function.
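
The filtering step can be tried without network access. The sketch below assumes the function receives the already-decoded JSON string, and it runs on a hand-written sample whose values are invented. The helper name `events_at_or_above` is hypothetical, and the check for a missing magnitude is a defensive assumption rather than something the assignment asks for.

```python
import json


def events_at_or_above(json_text, min_magnitude):
    """Return the titles of events whose magnitude is >= min_magnitude."""
    feed = json.loads(json_text)
    titles = []
    for feature in feed["features"]:
        properties = feature["properties"]
        mag = properties.get("mag")
        # Skip events with no magnitude rather than comparing None with a number
        if mag is not None and mag >= min_magnitude:
            titles.append(properties["title"])
    return titles


# Hand-written sample in the shape of the feed; the values are invented.
sample_text = json.dumps({
    "metadata": {"title": "Sample feed", "count": 3},
    "features": [
        {"properties": {"mag": 4.4, "place": "Place A", "title": "M 4.4 - Place A"}},
        {"properties": {"mag": 2.6, "place": "Place B", "title": "M 2.6 - Place B"}},
        {"properties": {"mag": None, "place": "Place C", "title": "M ? - Place C"}},
    ],
})

for title in events_at_or_above(sample_text, 3.0):
    print(title)
```

Returning a list instead of printing inside the function keeps the filtering separate from the output, which also makes the result easy to reuse later.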



In [82]:
#Function printResults(data)
#In Python 3.x we need to explicitly decode the response to a string 
#i.e. data is output from data.decode("utf-8") 
import requests 



def printResults(data):

  # 1.  Use the json "loads" api  to load the string data into a dictionary
  ####Your code here....
  # Note: in this solution `data` is the feed URL, so the text is downloaded here first
  response  = requests.get(data)
  data_json = response.text

  data = json.loads(data_json)

  #print(data)
  # 2.  Access the contents of the JSON data
  #     and print out the metadata title
####Your code here....
  print("Step 2")
  print(data['metadata'])
  #3.  Output the number of events
####Your code here....
  
  metadata_json = data['metadata']
  features_json = data['features']
  
  # Printing the number of events
  print("Step 3")
  print("Count of events :", metadata_json['count'])
  #4.  For each event, print the place where it occurred
####Your code here....

  # print(features_json[0])

  # features_0 = features_json[0]

  # magnitude = features_0['properties']['mag']
  # print(magnitude)
  print("Step 4")
  for feature in features_json:
    print(feature['properties']['mag'], feature['properties']['place'])

    
  print("Step 5")
  #5 For each event, if the magnitude is greater than or equal to the user input
  #  print both the magnitude and the place it occurred. 
  #  HINT: use the "title" field that each feature has.

  for feature in features_json:
    mag = feature['properties']['mag']
    # Skip events with no magnitude; comparing None with a number raises TypeError
    if mag is not None and mag >= user_input:
      print(feature['properties']['title'])

####Your code here....


printResults("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson")

Step 2
{'generated': 1745905445000, 'url': 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson', 'title': 'USGS Magnitude 2.5+ Earthquakes, Past Day', 'status': 200, 'api': '1.14.1', 'count': 25}
Step 3
Count of events : 25
Step 4
5.2 47 km E of Kandrian, Papua New Guinea
3.19 6 km N of Lluveras, Puerto Rico
4.4 36 km NE of Samaná, Dominican Republic
4.5 15 km WSW of Burica, Panama
3 37 km ESE of Whittier, Alaska
4.2 43 km S of Tocopilla, Chile
4 Celebes Sea
4.9 191 km ESE of Kokopo, Papua New Guinea
4.6 204 km NW of Neiafu, Tonga
4.9 191 km ESE of Kimbe, Papua New Guinea
2.5 8 km ENE of Soda Springs, Idaho
2.6 7 km NE of Soda Springs, Idaho
4.5 76 km N of Shikotan, Russia
2.6 18 km N of Chignik Lagoon, Alaska
2.9 49 km SE of Egegik, Alaska
4.2 231 km E of Levuka, Fiji
2.9 21 km NNE of Yerington, Nevada
4.4 12 km WNW of Sola, Vanuatu
4.2 55 km SSW of Ollagüe, Chile
4.4 148 km E of Saga, China
2.73 66 km WNW of Petrolia, CA
4.5 115 km WNW of Ternate, Indonesia
3 1

# Part 1c:  Write the runner
In this code (either main or in a function), you should set up the URL from the USGS site, open the URL and read the data, and call the printResults function.
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson  
See the code comments for guided instruction.  
 
Note you can use a JSON viewer for a more readable format of the data if you want to view it before processing it with your function.
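
The runner steps (open the URL, check the status code, read and decode the data, handle errors) could be sketched as follows. The function name `fetch_feed` and its `opener` parameter are hypothetical; `opener` exists only so the function can be exercised without network access and defaults to `urllib.request.urlopen`.

```python
import urllib.error
import urllib.request

FEED_URL = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson"


def fetch_feed(url, opener=urllib.request.urlopen):
    """Return the response body as a string, or None if the request fails."""
    try:
        # The with statement closes the response automatically
        with opener(url) as response:
            code = response.getcode()
            print("HTTP status code:", code)
            if code != 200:
                print("Unexpected status code:", code)
                return None
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses
        print("HTTP error:", err.code)
    except urllib.error.URLError as err:
        # DNS failures, refused connections, and similar problems
        print("Could not reach the server:", err.reason)
    return None
```

A runner would then call `fetch_feed(FEED_URL)` and pass the returned string to the results function only when the value is not `None`.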

In [50]:
# Define a variable to hold the source URL (see the notes for the URL)
# This feed lists all earthquakes for the last day larger than Mag 2.5 (this is your minimum input)
####Your code here....
import urllib.request
import ssl

url_usgs = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson"


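# NOTE: this disables TLS certificate verification. Prefer ssl.create_default_context()
# unless certificate checks fail in your environment.
context = ssl._create_unverified_context()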

# Open the URL and read the data
# See urllib.request.urlopen API
####Your code here....

with urllib.request.urlopen(url_usgs, context=context) as f:
  # The with statement closes the response, so no explicit close() is needed
  code = f.getcode()
  print("HTTP status code:", code)
  if code == 200:
    print(f.read().decode("utf-8"))
  
# Print the HTTP status code of the response (200 is a valid response)
# See urllib.request.urlopen API
####Your code here....


# If the HTTP status code of the response is valid (hint: 200) 
#    then read the data (hint: .read API) and convert to a string (hint: .decode("utf-8") API), 
#    and print the results using your printResults function from step 1b
# Make sure your code handles an error condition (i.e. non-valid status code) 
#    and print out the error code in that case.
####Your code here....

# code was read from the response inside the with block above
if code == 200:
  printResults(url_usgs)
else:
  print("Error: unexpected HTTP status code", code)

HTTP status code: 200
{'generated': 1745898725000, 'url': 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson', 'title': 'USGS Magnitude 2.5+ Earthquakes, Past Day', 'status': 200, 'api': '1.14.1', 'count': 25}
Count of events : 25
Step 5
M 4.4 - 36 km NE of Samaná, Dominican Republic
M 3.0 - 37 km ESE of Whittier, Alaska
M 4.2 - 43 km S of Tocopilla, Chile
M 4.0 - Celebes Sea
M 4.9 - 191 km ESE of Kokopo, Papua New Guinea
M 4.6 - 204 km NW of Neiafu, Tonga
M 4.9 - 191 km ESE of Kimbe, Papua New Guinea
M 2.6 - 7 km NE of Soda Springs, Idaho
M 4.5 - 76 km N of Shikotan, Russia
M 2.6 - 18 km N of Chignik Lagoon, Alaska
M 2.9 - 49 km SE of Egegik, Alaska
M 4.2 - 231 km E of Levuka, Fiji
M 2.9 - 21 km NNE of Yerington, Nevada
M 4.4 - 12 km WNW of Sola, Vanuatu
M 4.2 - 55 km SSW of Ollagüe, Chile
M 4.4 - 148 km E of Saga, China
M 2.7 - 66 km WNW of Petrolia, CA
M 4.5 - 115 km WNW of Ternate, Indonesia
M 3.0 - 108 km NW of Yakutat, Alaska
M 3.3 - 15 km WSW of Johannesb

# Part 2:  Output data to spreadsheet
Convert output to CSV format.  

Rewrite the printResults function as printResults2(data), which returns a list or dictionary (your choice) to the runner.  The runner then converts the data to CSV format and saves it to a file.

Change your runner to assign the returned data from your printResults2 function to a variable that you then convert to CSV format and save to a file.

Include at least the 4 fields retrieved from the database in Part 1.  
Include exception handling in your file IO processing.   
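
A minimal sketch of this split, using the standard `csv` module: one function returns the rows and another saves them. The function names, the choice of the four columns in `FIELDS`, and the assumption that the input is the decoded JSON string are all illustrative, not prescribed by the assignment.

```python
import csv
import json

FIELDS = ["mag", "place", "time", "title"]  # assumed choice of columns


def extract_rows(json_text, fields=FIELDS):
    """Return one dict per event, holding only the requested properties."""
    feed = json.loads(json_text)
    return [
        {name: feature["properties"].get(name) for name in fields}
        for feature in feed["features"]
    ]


def save_csv(rows, path, fields=FIELDS):
    """Write rows to path as CSV; return True on success, False on an I/O error."""
    try:
        # newline="" lets the csv module control the line endings
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)
    except OSError as err:
        print("Could not write", path, "-", err)
        return False
    return True
```

The `csv` module quotes values that contain commas. That matters here because place names in the feed, such as "36 km NE of Samaná, Dominican Republic", contain them.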

In [84]:
####Your code here....
import pandas as pd 
import numpy as np


url_usgs2 = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson"
url_usgs3 = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson"
url_usgs4 = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_day.geojson"
def printResults2(data):
  response  = requests.get(data)
  data_json = response.text

  data = json.loads(data_json)
  features_json = data['features']
  results_print = []
  for i in range(len(features_json)):
    print(features_json[i]['properties']['mag'])
    results_print.append(features_json[i]['properties']['mag'])

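  try:
    np.savetxt("output_numpy3.csv", [results_print], delimiter=",", fmt="%s")
  except Exception as e:
    print("Could not write output_numpy3.csv:", e)

  # Return the collected magnitudes so the runner can use them, as the task asks
  return results_print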

 
  

# printResults2(url_usgs)
# printResults2(url_usgs2)
# printResults2(url_usgs3)
printResults2(url_usgs4)

  
  

1.2
1.59
1.37
1.75
5.2
3.19
1.65
1.5
0.95
4.4
4.5
1.77
3
1.9
1.92
1.22
1.47
1.56
1.86
1.24
1.1
4.2
1.58
1.29
1.7
1.41
1.33
1.3
1.5
1.33
1.8
1.06
1.7
1.83
1.07
1.2
1.2
1.31
1.7
1.6
1.4
1.9
1.38
1.6
1.6
2.3
1.41
1.6
1.56
1.26
1.7
4
1.49
1.75
1.4
1.5
4.9
1.5
1.77
1.5
1.7
0.97
4.6
4.9
1.08
1.6
1.3
1.2
1.45
1.99
1.09
1.44
1.35
1.2
1.5
1.6
1.7
1.1
1.8
1.91
0.99
1.6
1.4
2.5
1.07
1.7
2.6
1.3
1.32
1.3
1.4
1.5
1.6
4.5
2.2
1.34
1.38
1.76
2.2
2.6
2.9
1.32
1.47
1.57
1.78
1.5
4.2
2.9
1.6
4.4
1
1.8
1.5
1.28
1.3
1.79
4.2
1.08
4.4
1.47
1.58
2.3
2.73
1.62
1.11
4.5
2.1
1.7
1.6
3
1.29
1.33
1.05
2.05
2
1.9
1.9
2.2
1.05
1.14
2.2
2
2.29
3.34
1.8
4.6
1.4
1.2


# Part 3:  Search on another field
Create a new printResults function called printResults3(data, searchField) where:  
'data' is the 'scraped' data from the USGS site as in the previous parts and  
'searchField' is a field defined at the geojson.php site below. 

The search field may be input from a selection provided to the user or may be fixed (programmer's choice).  Use a meaningful field that you can glean some information from (think about how a data scientist may want to analyze certain types of data from the set).  

Change your runner to search the database for the different field and print out the results based on that field.  For example you might want to search for all the earthquakes that occurred within a particular latitude and longitude bounding box.   

See https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php for the list of parameters that can be retrieved.
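
The bounding-box search mentioned above could look like the sketch below. It relies on the GeoJSON convention that a point's `coordinates` are ordered longitude, latitude, then depth. The function names are hypothetical, and the sample features and box limits are invented for illustration.

```python
def in_bounding_box(feature, min_lat, max_lat, min_lon, max_lon):
    """Return True if the event lies inside the box (edges included)."""
    # GeoJSON points are ordered longitude, latitude, then depth
    lon, lat = feature["geometry"]["coordinates"][:2]
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def events_in_box(features, min_lat, max_lat, min_lon, max_lon):
    """Return the titles of the events that fall inside the box."""
    return [
        feature["properties"]["title"]
        for feature in features
        if in_bounding_box(feature, min_lat, max_lat, min_lon, max_lon)
    ]


# Hand-written sample features; the coordinates are invented.
sample_features = [
    {"properties": {"title": "Event A"},
     "geometry": {"coordinates": [-150.0, 61.0, 10.0]}},
    {"properties": {"title": "Event B"},
     "geometry": {"coordinates": [120.0, 5.0, 30.0]}},
]

# Assumed box limits, for illustration only
print(events_in_box(sample_features, 50.0, 72.0, -170.0, -130.0))
```

This simple comparison does not handle a box that crosses the 180° meridian; that case would need the longitude test split into two ranges.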


In [78]:
####Your code here....

def printResults3(data, searchfield):
  response  = requests.get(data)
  data_json = response.text

  data = json.loads(data_json)
  features_json = data['features']
  
  features_value_list = ['mag', 'place', 'time', 'updated', 'tz', 'url', 'detail', 'felt', 'cdi', 'mmi', 'alert','status', 'tsunami', 'sig', 'net', 'code', 'ids', 'sources', 'types', 'nst', 'dmin','rms', 'gap', 'magType', 'type', 'title']


  if searchfield in features_value_list:
    for feature in features_json:
      print(feature['properties'][searchfield])
  else:
    print("Enter one of the following property names:")
    print(features_value_list)


printResults3("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson","code")

0255gw3zw6
41139096
41139088
75173496
75173486
75173481
74662367
7000pvqh
2025119000
41139072
00896930
75173471
7000pvq8
7000pvq9
75173466
41139032
0255gufrih
0255gufojn
41139024
00896929
41139016
75173456
41139008
75173451
41139000
41138992
41138984
41138976
0255gtt8lk
7000pvpy
41138944
75173431
41138936
0255gtjsry
41138920
41138912
00896926
75173421
75173416
0255gt57uq
0255gt1zoe
41138880
62090032
0255fjfsw3
75173401
75173396
0255fje3wd
75173386
75173381
41138848
0255fj6lor
0255fj4883
75173361
0255fit7kl
0255fiq1c3
0255fioz4u
62090012
75173356
00896912
41138744
0255fijusw
0255fi8pcm
0255fi715e
75173346
90080673
2025ihka
41138640
75173326
0255fhlit9
7000pvne
0255fhfbyg
75173311
41138528
90080668
0255fgsz9s
2025ihfu
7000pvmt
2025iheo
74662092
0255fga0um
75173296
00896903
41138352
80106466
7000pvl5
7000pvl4
41138296
75173286
0255ff8gr1
0255ff80w4
75173276
62089877
61504278
75173271
75173261
41138224
41138232
00896898
0255fennnu
0255fel9z6
75173251
0255feiozh
00896901
74661982
75173236
2

