<a href="https://colab.research.google.com/github/clayton-aldern/nicar-ghactions/blob/main/usgs_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Libraries

In [1]:
import pandas as pd # import pandas library for data manipulation and analysis

# Import and clean data from GitHub
This code chunk imports `usgs_main.csv` from the repo. It then cleans the data by parsing the `time` column into a new `date_time` column and deriving the following columns from it:

  - date: year-month-day format
  - time: the time of the earthquake in 12-hour format
  - military_time: the time of the earthquake in 24-hour format



In [8]:


# Read in data
df_main = pd.read_csv('https://raw.githubusercontent.com/clayton-aldern/nicar-ghactions/main/usgs_main.csv', index_col=None) # Enter the raw url from your repository

# Clean data
df_main["date_time"] = pd.to_datetime(df_main["time"]) # Parse the raw time strings into a datetime column called date_time
# No explicit drop is needed: assign() below overwrites the old time column in place

df_main = df_main.assign(   
    date = df_main["date_time"].dt.date, # Make new column with date in the format year-month-day
    time = df_main["date_time"].dt.strftime('%I:%M %p'), # Make new column with 12 hour format
    military_time = df_main["date_time"].dt.time # Make new column with 24 hour format
    )

df_main.head() # Take a look at the first five rows

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,horizontalError,depthError,magError,magNst,status,locationSource,magSource,date_time,date,military_time
0,09:25 PM,35.596668,-120.271332,11.57,2.31,md,5.0,178.0,0.159,0.01,...,3.14,3.51,0.76,3.0,automatic,nc,nc,2022-03-04 21:25:05.130000+00:00,2022-03-04,21:25:05.130000
1,09:20 PM,35.929167,-117.660833,3.25,0.88,ml,9.0,73.0,0.02053,0.13,...,0.33,0.74,0.055,10.0,automatic,ci,ci,2022-03-04 21:20:43.590000+00:00,2022-03-04,21:20:43.590000
2,09:19 PM,62.3602,-149.6345,9.8,1.4,ml,,,,0.52,...,,0.5,,,automatic,ak,ak,2022-03-04 21:19:08.215000+00:00,2022-03-04,21:19:08.215000
3,09:05 PM,17.961333,-66.848833,13.23,2.37,md,7.0,207.0,,0.14,...,0.79,0.46,0.018123,3.0,reviewed,pr,pr,2022-03-04 21:05:59.100000+00:00,2022-03-04,21:05:59.100000
4,08:56 PM,19.183666,-155.483002,30.709999,1.83,md,33.0,77.0,,0.13,...,0.66,0.91,0.91,8.0,automatic,hv,hv,2022-03-04 20:56:56.870000+00:00,2022-03-04,20:56:56.870000
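
To make the three derived columns easier to inspect, here is a minimal sketch on a toy dataframe. The two timestamp strings are hypothetical values chosen for illustration, not rows from `usgs_main.csv`, and the names `toy`, `stamps` and `parts` are made up for this example.

```python
import pandas as pd

# Toy data: two hypothetical ISO 8601 timestamps in UTC
toy = pd.DataFrame({"time": ["2022-03-04T21:25:05.130Z", "2022-03-04T08:05:00.000Z"]})

# Parse the strings into a timezone-aware datetime column
stamps = pd.to_datetime(toy["time"])

parts = pd.DataFrame({
    "date": stamps.dt.date,                  # datetime.date objects (year-month-day)
    "time": stamps.dt.strftime("%I:%M %p"),  # strings on a 12-hour clock
    "military_time": stamps.dt.time,         # datetime.time objects on a 24-hour clock
})
print(parts)
```

Note that `date` and `military_time` hold Python objects while `time` holds plain strings, which is why only `date` supports a further `.strftime()` call on a single value.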


`DataFrame.shape` (here `df_main.shape`) is an attribute that holds a tuple: the first element is the number of rows and the second is the number of columns.

In [9]:
df_main.shape

(241, 25)
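
Because `shape` is a plain tuple, it can be unpacked directly. The sketch below uses a small made-up dataframe (`toy` is a hypothetical name) rather than the earthquake data.

```python
import pandas as pd

# Hypothetical three-row, two-column dataframe
toy = pd.DataFrame({"mag": [1.2, 3.4, 2.0], "depth": [5.0, 10.0, 7.5]})

# shape is an attribute, so there are no parentheses after it
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# len() on a dataframe also gives the number of rows
print(len(toy))
```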

We can filter a dataframe's rows on a specific condition. The following line returns the row with the most recent timestamp in the `date_time` column.

In [10]:
latest = df_main[df_main["date_time"] == df_main["date_time"].max()]
latest

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,horizontalError,depthError,magError,magNst,status,locationSource,magSource,date_time,date,military_time
239,09:28 PM,38.759666,-122.719666,1.61,1.24,md,14.0,115.0,0.004494,0.04,...,0.3,0.36,0.1,5.0,automatic,nc,nc,2022-03-04 21:28:02.440000+00:00,2022-03-04,21:28:02.440000
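
The filter works in two steps: the comparison produces a Series of `True`/`False` values, and indexing the dataframe with that Series keeps only the `True` rows. A minimal sketch on hypothetical data (the names `toy`, `mask` and `latest_toy` are illustrative):

```python
import pandas as pd

# Hypothetical timestamps and magnitudes
toy = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2022-03-04 20:56:56", "2022-03-04 21:28:02", "2022-03-04 21:05:59"], utc=True
    ),
    "mag": [1.83, 1.24, 2.37],
})

# Step 1: compare every row to the maximum, giving one boolean per row
mask = toy["date_time"] == toy["date_time"].max()
print(mask.tolist())

# Step 2: keep only the rows where the mask is True
latest_toy = toy[mask]
print(latest_toy)
```

The surviving row keeps its original index label (1 in this toy data, 239 in the output above), so positional access with `.iloc[0]` is the reliable way to reach it.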


Now that we have the most recent earthquake, we can grab values that we want from it, like the magnitude. 

In [11]:
latest.iloc[0]["mag"]

1.24
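
As a small sketch of why `.iloc[0]` is used here instead of a label lookup, the toy dataframe below mimics a filtered result whose only row carries the index label 239. The `place` value and the names `toy_latest`, `by_row` and `by_col` are hypothetical.

```python
import pandas as pd

# A one-row dataframe that kept its original index label after filtering
toy_latest = pd.DataFrame({"mag": [1.24], "place": ["hypothetical place"]}, index=[239])

# Take the first row by position, then read its "mag" field
by_row = toy_latest.iloc[0]["mag"]

# Equivalent: take the "mag" column, then its first value by position
by_col = toy_latest["mag"].iloc[0]

print(by_row, by_col)

# A label-based lookup such as toy_latest.loc[0, "mag"] would raise a KeyError,
# because the label 0 does not exist in this index
print(0 in toy_latest.index)
```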

# Create variables to reference for the analysis sentence
This code chunk queries the `usgs_main.csv` data to find:
- the number of earthquakes in the dataframe
- the earliest earthquake that occurred in the dataframe
- the latest earthquake that occurred in the dataframe
- the strongest earthquake that occurred in the dataframe

In [12]:
# Query the dataframe to isolate types of earthquakes, to write a sentence about
number_earthquakes = df_main.shape[0] # Return number of rows of dataframe
earliest = df_main[df_main["date_time"] == df_main["date_time"].min()] # Return the row with the earliest earthquake since you started recording
latest = df_main[df_main["date_time"] == df_main["date_time"].max()] # Return the row with the latest earthquake since you started recording
strongest = df_main[df_main["mag"] == df_main["mag"].max()] # Return the row(s) with the strongest earthquake since you started recording
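
Filtering on equality with the maximum can return more than one row when values tie. The sketch below, on made-up magnitudes, shows the tie and two ways to settle on a single row. Using `idxmax()` is an alternative to the notebook's approach, not something the notebook itself does.

```python
import pandas as pd

# Hypothetical magnitudes with a tie for the maximum
toy = pd.DataFrame({"mag": [2.1, 5.4, 5.4, 0.9]})

# The equality filter keeps every row that shares the maximum
strongest_toy = toy[toy["mag"] == toy["mag"].max()]
print(len(strongest_toy))

# Option 1 (as in the notebook): take the first matching row by position
first_strongest = strongest_toy.iloc[0]

# Option 2 (alternative): idxmax() returns the label of the first maximum
single = toy.loc[toy["mag"].idxmax()]
print(first_strongest.name, single.name)
```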

# Write a sentence that dynamically updates

This code chunk injects the variables constructed above into a string, so the sentence updates whenever the data does.

In [7]:
# Paste the values into a sentence. If several earthquakes share the earliest time, the latest time or the top magnitude, we take the first row
first, last, top = earliest.iloc[0], latest.iloc[0], strongest.iloc[0]
print(
    f'Since {first["time"]} on {first["date"].strftime("%m/%d/%Y")} there have been {number_earthquakes} recorded earthquakes.\n'
    f'The most recent earthquake was {last["mag"]} in magnitude and occurred in/near {last["place"]} on {last["date"]} at {last["time"]}.\n'
    f'The strongest earthquake since the start of this webscraper was {top["mag"]} magnitude and occurred in/near {top["place"]} on {top["date"]} at {top["time"]}.'
)

Since 09:37 PM on 03/03/2022 there have been 241 recorded earthquakes.
The most recent earthquake was 1.24 in magnitude and occurred in/near 3km SW of Anderson Springs, CA on 2022-03-04 at 09:28 PM.
The strongest earthquake since the start of this webscraper was 5.4 magnitude and occurred in/near 70 km ENE of Kimbe, Papua New Guinea on 2022-03-04 at 07:47 AM.
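
If the sentence needs to be reused, for example written to a file by a scheduled job, one option is to wrap it in a function that returns the string instead of printing it. This is a sketch under assumptions: `build_summary` is a hypothetical helper, the toy rows and place names are invented, and it picks rows with `idxmin()`/`idxmax()` rather than the filter-then-`iloc[0]` pattern used above.

```python
import pandas as pd

def build_summary(df):
    """Return a three-line summary; assumes columns date_time, date, time, mag, place."""
    first = df.loc[df["date_time"].idxmin()]  # earliest earthquake
    last = df.loc[df["date_time"].idxmax()]   # most recent earthquake
    top = df.loc[df["mag"].idxmax()]          # strongest earthquake (first one on ties)
    return "\n".join([
        f'Since {first["time"]} on {first["date"]} there have been {len(df)} recorded earthquakes.',
        f'The most recent earthquake was {last["mag"]} in magnitude and occurred in/near {last["place"]} on {last["date"]} at {last["time"]}.',
        f'The strongest earthquake was {top["mag"]} magnitude and occurred in/near {top["place"]} on {top["date"]} at {top["time"]}.',
    ])

# Hypothetical rows shaped like the cleaned dataframe
toy = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2022-03-03 21:37:00", "2022-03-04 07:47:00", "2022-03-04 21:28:00"], utc=True
    ),
    "date": ["2022-03-03", "2022-03-04", "2022-03-04"],
    "time": ["09:37 PM", "07:47 AM", "09:28 PM"],
    "mag": [2.0, 5.4, 1.24],
    "place": ["Place A", "Place B", "Place C"],
})

summary = build_summary(toy)
print(summary)
```

Returning the string keeps the formatting logic separate from where the text ends up, which makes it easier to check the sentence against known input.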
