<a href="https://colab.research.google.com/github/aadittambe/actions-pipeline/blob/main/usgs_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Libraries

In [13]:
import pandas as pd # import pandas library for data manipulation and analysis

# Import and clean data from GitHub
This code chunk imports `usgs_main.csv` from the repo. It then cleans the data by parsing the original `time` column into a `date_time` column and splitting that into the following columns:

  - date: year-month-day format
  - time: the time of the earthquake in 12-hour format
  - military_time: the time of the earthquake in 24-hour format



In [14]:


# Read in data
df_main = pd.read_csv('https://raw.githubusercontent.com/aadittambe/actions-pipeline/main/usgs_main.csv', index_col=None) # Enter the raw url from your repository

# Clean data
df_main["date_time"] = pd.to_datetime(df_main["time"]) # Convert time to a column called date_time
# No separate drop is needed: assign() below overwrites the original "time" column

df_main = df_main.assign(   
    date = df_main["date_time"].dt.date, # Make new column with date in the format year-month-day
    time = df_main["date_time"].dt.strftime('%I:%M %p'), # Make new column with 12 hour format
    military_time = df_main["date_time"].dt.time # Make new column with 24 hour format
    )

df_main.head() # Take a look at the first five rows

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,horizontalError,depthError,magError,magNst,status,locationSource,magSource,date_time,date,military_time
0,03:22 PM,63.7196,-150.6752,3.5,1.3,ml,,,,0.47,...,,0.4,,,automatic,ak,ak,2022-03-02 15:22:18.576000+00:00,2022-03-02,15:22:18.576000
1,03:07 PM,38.772499,-122.879837,2.03,1.43,md,28.0,148.0,0.02633,0.06,...,0.43,0.47,0.05,5.0,automatic,nc,nc,2022-03-02 15:07:07.550000+00:00,2022-03-02,15:07:07.550000
2,03:03 PM,-30.1295,-177.6868,35.0,5.5,mb,,55.0,0.888,1.22,...,9.8,1.9,0.034,296.0,reviewed,us,us,2022-03-02 15:03:00.177000+00:00,2022-03-02,15:03:00.177000
3,02:57 PM,38.832832,-122.793167,6.56,0.36,md,7.0,78.0,0.005989,0.04,...,1.14,3.4,,1.0,automatic,nc,nc,2022-03-02 14:57:37.540000+00:00,2022-03-02,14:57:37.540000
4,02:52 PM,63.0052,-150.5116,94.4,1.6,ml,,,,0.54,...,,0.8,,,automatic,ak,ak,2022-03-02 14:52:25.459000+00:00,2022-03-02,14:52:25.459000
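To see what each derived column holds, here is a minimal sketch on two made-up timestamps. It assumes the raw `time` values are ISO 8601 strings ending in `Z`, which is consistent with the `+00:00` offset shown in the output above; the `sample` dataframe is hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the ISO 8601 "Z" format is an assumption about the raw CSV
sample = pd.DataFrame({"time": ["2022-03-02T15:22:18.576Z", "2022-03-02T03:07:07.550Z"]})

# Parse the strings into timezone-aware (UTC) datetimes
sample["date_time"] = pd.to_datetime(sample["time"])

sample = sample.assign(
    date=sample["date_time"].dt.date,                   # datetime.date objects
    time=sample["date_time"].dt.strftime("%I:%M %p"),   # 12-hour clock with AM/PM, as strings
    military_time=sample["date_time"].dt.time,          # datetime.time objects (24-hour clock)
)

print(sample[["date", "time", "military_time"]])
```

Note that `%p` is locale-dependent, so the AM/PM marker may look different on systems that are not set to an English locale.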


`DataFrame.shape` returns a tuple where the first element is the number of rows and the second element is the number of columns.

In [25]:
df_main.shape

(799, 25)
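Because `shape` is a plain tuple, it can be unpacked directly. A small sketch on a hypothetical `toy` dataframe:

```python
import pandas as pd

# Hypothetical three-row dataframe, used only for illustration
toy = pd.DataFrame({"mag": [1.3, 1.43, 5.5], "depth": [3.5, 2.03, 35.0]})

# Unpack the (rows, columns) tuple into two variables
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")
```

`len(toy)` gives the same row count as `toy.shape[0]`.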

We can filter a dataframe on a specific condition. The following line returns the row (or rows, if several share the same timestamp) with the most recent value in the `date_time` column.

In [16]:
latest = df_main[df_main["date_time"] == df_main["date_time"].max()]
latest

Unnamed: 0,time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,...,horizontalError,depthError,magError,magNst,status,locationSource,magSource,date_time,date,military_time
793,04:09 AM,38.822334,-122.808166,2.69,1.13,md,26.0,37.0,0.001828,0.02,...,0.21,0.43,0.1,4.0,automatic,nc,nc,2022-03-04 04:09:44.780000+00:00,2022-03-04,04:09:44.780000
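The boolean-mask filter above returns a dataframe, which may hold more than one row if timestamps tie. As a possible alternative, `idxmax()` returns the index label of the first maximum, so `.loc` gives back a single row as a Series. The sketch below compares the two on a hypothetical `toy` dataframe; it is not the approach this notebook uses.

```python
import pandas as pd

# Hypothetical data for illustration only
toy = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2022-03-01 15:31:00", "2022-03-04 04:09:44", "2022-03-02 12:52:00"], utc=True
    ),
    "mag": [2.1, 1.13, 6.6],
})

# Boolean mask: keeps every row whose timestamp equals the maximum (a DataFrame)
latest_rows = toy[toy["date_time"] == toy["date_time"].max()]

# idxmax: index label of the first maximum, then .loc returns that one row (a Series)
latest_row = toy.loc[toy["date_time"].idxmax()]

print(latest_rows.iloc[0]["mag"], latest_row["mag"])
```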


Now that we have the most recent earthquake, we can grab values that we want from it, like the magnitude. 

In [17]:
latest.iloc[0]["mag"]

1.13

# Create variables to reference for the analysis sentence
This code chunk filters the data from usgs_main.csv to find:
- the number of earthquakes in the dataframe
- the earliest earthquake that occurred in the dataframe
- the latest earthquake that occurred in the dataframe
- the strongest earthquake that occurred in the dataframe

In [18]:
# Query the dataframe to isolate the earthquakes we want to write a sentence about
number_earthquakes = df_main.shape[0] # Return number of rows of dataframe
earliest = df_main[df_main["date_time"] == df_main["date_time"].min()] # Return the row with the earliest earthquake since you started recording
latest = df_main[df_main["date_time"] == df_main["date_time"].max()] # Return the row with the most recent earthquake
strongest = df_main[df_main["mag"] == df_main["mag"].max()] # Return the row(s) with the strongest earthquake since you started recording

# Write a sentence that dynamically updates

This code chunk injects the variables constructed above into a string that updates whenever the underlying data changes.

In [33]:
# Paste the values into a sentence. If several earthquakes share the earliest time, the latest time, or the highest magnitude, we take the first matching row
print(f'Since {earliest.iloc[0]["time"]} on {earliest.iloc[0]["date"].strftime("%m/%d/%Y")} there have been {number_earthquakes} recorded earthquakes. {chr(10)} The most recent earthquake was {latest.iloc[0]["mag"]} in magnitude and occurred in/near {latest.iloc[0]["place"]} on {latest.iloc[0]["date"]} at {latest.iloc[0]["time"]}.{chr(10)} The strongest earthquake since the start of this web scraper was {strongest.iloc[0]["mag"]} in magnitude and occurred in/near {strongest.iloc[0]["place"]} on {strongest.iloc[0]["date"]} at {strongest.iloc[0]["time"]}.')

Since 03:31 PM on 03/01/2022 there have been 799 recorded earthquakes. 
 The most recent earthquake was 1.13 in magnitude and occurred in/near 7km NW of The Geysers, CA on 2022-03-04 at 04:09 AM.
 The strongest earthquake since the start of this web scraper was 6.6 in magnitude and occurred in/near Kermadec Islands, New Zealand on 2022-03-02 at 12:52 PM.
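If the sentence needs to be rebuilt on every pipeline run, it may help to wrap the logic in a function. The sketch below is one possible arrangement, not the notebook's own code: `describe_quake`, `build_summary`, and the `toy` dataframe with its place names are hypothetical, and the function assumes the same `date_time`, `date`, `time`, `mag`, and `place` columns used above. It also returns a fallback message when the dataframe is empty, a case in which `.iloc[0]` would raise an error.

```python
import pandas as pd

def describe_quake(row: pd.Series) -> str:
    """Format one earthquake row as a sentence fragment (hypothetical helper)."""
    return f'{row["mag"]} in magnitude and occurred in/near {row["place"]} on {row["date"]} at {row["time"]}'

def build_summary(df: pd.DataFrame) -> str:
    """Build the summary text, or a fallback message if there is no data yet."""
    if df.empty:
        return "No earthquakes have been recorded yet."
    earliest = df.loc[df["date_time"].idxmin()]  # first row with the earliest timestamp
    latest = df.loc[df["date_time"].idxmax()]    # first row with the latest timestamp
    strongest = df.loc[df["mag"].idxmax()]       # first row with the highest magnitude
    lines = [
        f'Since {earliest["time"]} on {earliest["date"]:%m/%d/%Y} there have been {len(df)} recorded earthquakes.',
        f"The most recent earthquake was {describe_quake(latest)}.",
        f"The strongest earthquake was {describe_quake(strongest)}.",
    ]
    return "\n".join(lines)

# Hypothetical two-row dataframe with placeholder place names
toy = pd.DataFrame({
    "date_time": pd.to_datetime(["2022-03-01 15:31:00", "2022-03-02 12:52:00"], utc=True),
    "mag": [2.1, 6.6],
    "place": ["Example Place A", "Example Place B"],
})
toy["date"] = toy["date_time"].dt.date
toy["time"] = toy["date_time"].dt.strftime("%I:%M %p")

print(build_summary(toy))
```

Joining the lines with `"\n"` outside the f-string expressions avoids the `chr(10)` workaround and the stray leading space on each line.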

