# Project Title
### CPTR 141 Final Project

Krittanatt (Mickey) Bhummabhuti
WWU ID: 2251206

## Introduction
The dataset contains a list of earthquakes recorded in Romania from 2012 to 2024. For each earthquake, it includes the datetime, magnitude, magnitude type, depth, latitude, longitude, and zone. I am interested in this dataset because I want to study where in the country earthquakes originate and what protective measures could keep people safe and buildings intact. Past data from the dataset can also be used to try to predict when and where earthquakes might occur in the future.

## Dataset
With this dataset, I plan to extract the following information: the earliest and latest earthquakes, the 10 highest- and lowest-magnitude earthquakes, the 10 deepest and shallowest earthquakes, the average and median latitude and longitude of all earthquakes, the 10 zones where earthquakes occur most often, and the correlation between earthquake magnitude and depth. The code also accepts user input so that users can search for earthquakes within a specified date range, with a certain magnitude, or at a certain depth.

By using Python to sort through and analyze the dataset, I aim to gain experience in file handling, dataset handling, and the pandas module. In file handling, I want to learn how to work with uploaded files in Python. I also want to learn how to use and manipulate data from datasets so that I can analyze and display it in a specific format. Learning how to handle datasets is a skill I can use in my career to build software that extracts and analyzes data from important datasets such as earthquakes, global temperatures, and sea levels. Learning the pandas module is also very useful because it has plenty of built-in functions for handling data. Since the digital world is developing rapidly and the importance of datasets has risen significantly, I think that dataset-handling skills will strongly support my future career.

Link to Dataset: [Earthquakes in Romania 2012-2024](https://www.kaggle.com/datasets/stefancomanita/earthquakes-in-romania-from-2012-to-2024?resource=download)

List of columns in dataset:

* dateTime
* magnitude
* magnitude type
* depth
* latitude
* longitude
* zone description
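
As a minimal sketch of how these columns might be checked right after loading, the snippet below reads a few rows into pandas and confirms that every expected column is present. The rows are copied from the dataset preview shown later in this notebook, and the names `SAMPLE_CSV`, `EXPECTED_COLUMNS`, and `load_earthquakes` are illustrative assumptions, not part of the project code.

```python
import io

import pandas as pd

# A few rows copied from the dataset preview. The project itself reads
# 'romanianearthquakes.csv'; this in-memory sample is only for illustration.
SAMPLE_CSV = """dateTime,magnitude,magnitude type,depth,latitude,longitude,zone description
2024-10-31 00:00:17,2.1,ML,19.7,45.2831,27.2761,MUNTENIA BUZAU
2024-10-26 10:25:16,2.0,ML,6.1,45.7058,21.8558,BANAT TIMIS
2012-10-01 20:26:18,4.2,ML,80.0,45.7455,26.7238,ZONA SEISMICA VRANCEA VRANCEA
"""

# Column names taken from the list above.
EXPECTED_COLUMNS = [
    "dateTime",
    "magnitude",
    "magnitude type",
    "depth",
    "latitude",
    "longitude",
    "zone description",
]


def load_earthquakes(source):
    """Read the CSV, parse dateTime, and verify the expected columns exist."""
    df = pd.read_csv(source, parse_dates=["dateTime"])
    missing = [name for name in EXPECTED_COLUMNS if name not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    return df


quakes = load_earthquakes(io.StringIO(SAMPLE_CSV))
print(quakes.dtypes)
```

Parsing `dateTime` at load time means later sorting and date filtering operate on real timestamps instead of strings.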


In [2]:
import pandas as pd
import datetime

# Load the dataset and handle the case where the file might not be found.
try:
    file = pd.read_csv('romanianearthquakes.csv')
except FileNotFoundError:
    print("Error: The file 'romanianearthquakes.csv' was not found.")
    # Stop the program with a non-zero status; exit() is meant for interactive use.
    raise SystemExit(1)

# Function to display the top N sorted rows of a given column with a custom label.
def display_sorted_data(column, ascending=True, top_n=10, label="results"):
    try:
        sorted_data = file.sort_values(by=column, ascending=ascending).head(top_n)
        print(f"\nTop {top_n} {label} ({'Ascending' if ascending else 'Descending'}):")
        print(sorted_data.to_string(index=False))
    except KeyError:
        print(f"Column '{column}' not found in the dataset.")

# Function to display statistical analysis for a specific column.
def display_stat(column, stat_func, stat_name):
    try:
        result = stat_func(file[column])
        print(f"\n{stat_name} {column}: {result}")
    except KeyError:
        print(f"Column '{column}' not found in the dataset.")
    except Exception as e:
        print(f"Error calculating {stat_name}: {e}")

# Display the top 10 earthquakes by magnitude in descending order.
def greatest10magnitude():
    display_sorted_data('magnitude', ascending=False, label="earthquakes by magnitude")

# Display the lowest 10 earthquakes by magnitude in ascending order.
def lowest10magnitude():
    display_sorted_data('magnitude', ascending=True, label="earthquakes by magnitude")

# Search for earthquakes based on a user-provided magnitude.
def usermagnitude():
    try:
        usermag = float(input("Enter a magnitude: "))
        result = file[file['magnitude'] == usermag]
        if result.empty:
            print(f"No earthquakes found with magnitude == {usermag}.")
        else:
            print(f"\nEarthquakes with magnitude == {usermag}:")
            print(result.to_string(index=False))
    except ValueError:
        print("Invalid magnitude input. Please enter a numeric value.")

# Display the top 10 zones with the most earthquake occurrences.
def top10zone():
    print("\nTop 10 zones by earthquake count:")
    print(file['zone description'].value_counts().head(10).to_string())

# Display zones with the fewest earthquake occurrences.
def lowest10zone():
    print("\nZones with the fewest earthquake occurrences:")
    print(file['zone description'].value_counts(ascending=True).head(10).to_string())

# Display the top 10 earthquakes by depth in descending order.
def top10depth():
    display_sorted_data('depth', ascending=False, label="earthquakes by depth")

# Display the lowest 10 earthquakes by depth in ascending order.
def lowest10depth():
    display_sorted_data('depth', ascending=True, label="earthquakes by depth")

# Display the earliest 10 earthquakes by date.
def earliest10():
    display_sorted_data('dateTime', ascending=True, label="earthquakes by date")

# Display the latest 10 earthquakes by date.
def latest10():
    display_sorted_data('dateTime', ascending=False, label="earthquakes by date")

# Display the average of the earthquake magnitude.
def average():
    display_stat('magnitude', pd.Series.mean, "Average")

# Display the average latitude.
def averagelatitude():
    display_stat('latitude', pd.Series.mean, "Average")

# Display the average longitude.
def averagelongitude():
    display_stat('longitude', pd.Series.mean, "Average")

# Display the median of the latitude values.
def medianlatitude():
    display_stat('latitude', pd.Series.median, "Median")

# Display the median of the longitude values.
def medianlongitude():
    display_stat('longitude', pd.Series.median, "Median")

# Filter earthquakes based on a date range provided by the user.
def filter_by_date():
    start_date = input("Enter the start date (YYYY-MM-DD): ")
    end_date = input("Enter the end date (YYYY-MM-DD): ")
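    try:
        # Parse the inputs strictly so malformed dates raise an error that the except clause reports.
        start = pd.to_datetime(start_date, format="%Y-%m-%d")
        # Add one day so the end date covers the whole day, not just midnight.
        end = pd.to_datetime(end_date, format="%Y-%m-%d") + pd.Timedelta(days=1)
        file['dateTime'] = pd.to_datetime(file['dateTime'])
        filtered = file[(file['dateTime'] >= start) & (file['dateTime'] < end)]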
        if filtered.empty:
            print(f"No earthquakes found between {start_date} and {end_date}.")
        else:
            print(f"\nEarthquakes between {start_date} and {end_date}:")
            print(filtered.to_string(index=False))
    except Exception as e:
        print("Error filtering by date. Ensure the dates are in YYYY-MM-DD format.", e)

# Search for earthquakes based on a specific depth.
def searchfordepth():
    try:
        userinput = float(input("Enter a depth: "))
        result = file[file['depth'] == userinput]
        if result.empty:
            print(f"No earthquakes found with depth == {userinput}.")
        else:
            print(f"\nEarthquakes with depth == {userinput}:")
            print(result.to_string(index=False))
    except ValueError:
        print("Invalid depth input. Please enter a numeric value.")

# Analyze the correlation between magnitude and depth.
def correlation_magnitude_depth():
    try:
        correlation = file['magnitude'].corr(file['depth'])
        print(f"\nCorrelation between magnitude and depth: {correlation}")
        if correlation > 0:
            print("A positive correlation indicates that earthquakes with greater depth tend to have higher magnitudes.")
        elif correlation < 0:
            print("A negative correlation indicates that deeper earthquakes tend to have lower magnitudes.")
        else:
            print("No correlation between magnitude and depth.")
    except KeyError:
        print("Columns 'magnitude' or 'depth' not found in the dataset.")
    except Exception as e:
        print(f"Error calculating correlation: {e}")

# Main function to display the menu and execute user-selected options.
def main():
    # Menu options
    options = [
        "Greatest 10 Magnitude",
        "Lowest 10 Magnitude",
        "Search for a Magnitude",
        "Greatest 10 Zones",
        "Lowest 10 Zones",
        "Greatest 10 Depth",
        "Lowest 10 Depth",
        "Earliest 10",
        "Latest 10",
        "Average Magnitude",
        "Average Latitude",
        "Average Longitude",
        "Median Latitude",
        "Median Longitude",
        "Filter by Date Range",
        "Search for Depth",
        "Correlation Between Magnitude and Depth",
        "Exit"]

    while True:
        print("\nWelcome to Romania Earthquake Analysis")
        for i, option in enumerate(options, 1):
            print(f"{i}. {option}")

        try:
            choice = int(input("Enter your choice: "))
            if 1 <= choice <= len(options):
                if choice == 1:
                    greatest10magnitude()
                elif choice == 2:
                    lowest10magnitude()
                elif choice == 3:
                    usermagnitude()
                elif choice == 4:
                    top10zone()
                elif choice == 5:
                    lowest10zone()
                elif choice == 6:
                    top10depth()
                elif choice == 7:
                    lowest10depth()
                elif choice == 8:
                    earliest10()
                elif choice == 9:
                    latest10()
                elif choice == 10:
                    average()
                elif choice == 11:
                    averagelatitude()
                elif choice == 12:
                    averagelongitude()
                elif choice == 13:
                    medianlatitude()
                elif choice == 14:
                    medianlongitude()
                elif choice == 15:
                    filter_by_date()
                elif choice == 16:
                    searchfordepth()
                elif choice == 17:
                    correlation_magnitude_depth()
                elif choice == 18:
                    print("Thank you for using Romania Earthquake Analysis!")
                    break
            else:
                print("Invalid choice. Please select a valid option.")
        except ValueError:
            print("Invalid input. Please enter a number from 1 to 18.")

if __name__ == "__main__":
    main()




Welcome to Romania Earthquake Analysis
1. Greatest 10 Magnitude
2. Lowest 10 Magnitude
3. Search for a Magnitude
4. Greatest 10 Zones
5. Lowest 10 Zones
6. Greatest 10 Depth
7. Lowest 10 Depth
8. Earliest 10
9. Latest 10
10. Average Magnitude
11. Average Latitude
12. Average Longitude
13. Median Latitude
14. Median Longitude
15. Filter by Date Range
16. Search for Depth
17. Correlation Between Magnitude and Depth
18. Exit
Enter your choice: 17

Correlation between magnitude and depth: 0.48067617878453545
A positive correlation indicates that earthquakes with greater depth tend to have higher magnitudes.


In [None]:
file

Unnamed: 0,dateTime,magnitude,magnitude type,depth,latitude,longitude,zone description
0,2024-10-31 00:00:17,2.1,ML,19.7,45.2831,27.2761,MUNTENIA BUZAU
1,2024-10-30 12:15:29,2.0,ML,14.7,46.1699,25.6946,TRANSILVANIA COVASNA
2,2024-10-28 09:37:37,2.0,ML,13.5,45.2636,27.2377,MUNTENIA BUZAU
3,2024-10-28 07:41:42,2.1,ML,12.5,45.2416,27.2101,MUNTENIA BUZAU
4,2024-10-26 10:25:16,2.0,ML,6.1,45.7058,21.8558,BANAT TIMIS
...,...,...,...,...,...,...,...
3709,2012-10-01 20:26:18,4.2,ML,80.0,45.7455,26.7238,ZONA SEISMICA VRANCEA VRANCEA
3710,2012-10-01 05:43:39,3.6,ML,150.0,45.7225,26.5750,ZONA SEISMICA VRANCEA VRANCEA
3711,2012-09-27 19:01:27,3.9,ML,110.0,45.7939,26.7744,ZONA SEISMICA VRANCEA VRANCEA
3712,2012-09-23 23:35:48,3.5,ML,160.0,45.6498,26.4749,ZONA SEISMICA VRANCEA BUZAU


## Conclusion

The highest-magnitude earthquake recorded in Romania since 2012 occurred on October 28, 2018, with a magnitude of 5.9. Earthquakes occurred most often in the ZONA SEISMICA VRANCEA VRANCEA zone, where a total of 1248 earthquakes were recorded. The deepest earthquakes reached a depth of 200 km. On average, the earthquakes occurred near a latitude of 45.6° and a longitude of 26°. In addition, I noticed that the dataset contains no earthquakes below a magnitude of 2.0. I hypothesize that earthquakes below magnitude 2.0 are deemed insignificant and are sometimes measured inaccurately.
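
The figures above come from the menu options in the program. As a rough sketch of how the same kind of summary could be gathered in one pass, the snippet below computes the strongest earthquake, the busiest zone, the maximum depth, and the mean coordinates. It runs on five rows copied from the dataset preview, so its results do not reproduce the full-dataset numbers quoted above, and the `summarize` helper is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Five rows copied from the dataset preview, not the full dataset.
quakes = pd.DataFrame({
    "dateTime": [
        "2024-10-31 00:00:17",
        "2024-10-26 10:25:16",
        "2012-10-01 20:26:18",
        "2012-10-01 05:43:39",
        "2012-09-23 23:35:48",
    ],
    "magnitude": [2.1, 2.0, 4.2, 3.6, 3.5],
    "depth": [19.7, 6.1, 80.0, 150.0, 160.0],
    "latitude": [45.2831, 45.7058, 45.7455, 45.7225, 45.6498],
    "longitude": [27.2761, 21.8558, 26.7238, 26.5750, 26.4749],
    "zone description": [
        "MUNTENIA BUZAU",
        "BANAT TIMIS",
        "ZONA SEISMICA VRANCEA VRANCEA",
        "ZONA SEISMICA VRANCEA VRANCEA",
        "ZONA SEISMICA VRANCEA BUZAU",
    ],
})


def summarize(df):
    """Collect the headline statistics discussed in the conclusion."""
    # Row of the earthquake with the largest magnitude.
    strongest = df.loc[df["magnitude"].idxmax()]
    # Number of earthquakes per zone.
    zone_counts = df["zone description"].value_counts()
    return {
        "strongest_date": strongest["dateTime"],
        "strongest_magnitude": float(strongest["magnitude"]),
        "busiest_zone": zone_counts.idxmax(),
        "busiest_zone_count": int(zone_counts.max()),
        "max_depth": float(df["depth"].max()),
        "mean_latitude": float(df["latitude"].mean()),
        "mean_longitude": float(df["longitude"].mean()),
    }


summary = summarize(quakes)
for name, value in summary.items():
    print(f"{name}: {value}")
```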

## Reflection

In the future, I can confidently use Python to analyze datasets and extract specific information, which will be important for data-driven projects. I struggled the most with extracting data from the dataset by a date range based on the user's input. However, by converting the dates with pandas' `to_datetime` function, I learned how to use the user's input to search for data within a specified timeframe. I also learned how to use the pandas library, which helped me handle and display data from the dataset efficiently. Through this project, I gained valuable experience in managing and processing data from datasets. Additionally, I learned how to use try-except statements to validate user input and handle errors (looping back so that users can re-enter their input), making my program more robust and user-friendly. These skills will help me with similar challenges in future projects and in my career.
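
The two techniques mentioned here, re-prompting on invalid input and filtering by a date range, can be sketched together as follows. This is one possible arrangement and not the project's exact code: the helper names `parse_date`, `prompt_for_date`, and `filter_by_date_range` are hypothetical, the strict parsing uses the standard library's `datetime.strptime`, and the sample rows are copied from the dataset preview.

```python
from datetime import datetime

import pandas as pd


def parse_date(text):
    """Return a Timestamp for a YYYY-MM-DD string, or None if it is invalid."""
    try:
        return pd.Timestamp(datetime.strptime(text.strip(), "%Y-%m-%d"))
    except ValueError:
        return None


def prompt_for_date(prompt, read=input):
    """Keep asking until the user enters a valid date.

    `read` defaults to input() and can be swapped out for testing.
    """
    while True:
        date = parse_date(read(prompt))
        if date is not None:
            return date
        print("Invalid date. Please use the YYYY-MM-DD format.")


def filter_by_date_range(df, start, end):
    """Return rows from start through the end of the end day."""
    times = pd.to_datetime(df["dateTime"])
    # Use an exclusive upper bound one day later so the whole end day counts.
    mask = (times >= start) & (times < end + pd.Timedelta(days=1))
    return df[mask]


# Rows copied from the dataset preview.
sample = pd.DataFrame({
    "dateTime": [
        "2024-10-31 00:00:17",
        "2024-10-30 12:15:29",
        "2024-10-28 09:37:37",
        "2024-10-28 07:41:42",
        "2024-10-26 10:25:16",
    ],
    "magnitude": [2.1, 2.0, 2.0, 2.1, 2.0],
})

in_range = filter_by_date_range(
    sample, parse_date("2024-10-28"), parse_date("2024-10-30")
)
print(in_range.to_string(index=False))
```

In an interactive session, `prompt_for_date("Enter the start date (YYYY-MM-DD): ")` would supply the `start` argument. Treating the end date as covering the whole day avoids silently dropping earthquakes that occurred after midnight on that date.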
