# NASA APOD Data Retrieval and Analysis Project

## Steps covered in this section:
1. Getting API Key
2. Connecting to NASA APOD API
3. The Accumulation, Storage, and Processing of Data
4. The Interpretation and Visualisation of Data


In [4]:
import os
from dotenv import load_dotenv

# Load variables from the .env file
load_dotenv()

# get api key
api_key = os.getenv("API_KEY")
if not api_key:
    print("API key is not found")
else:
    print("API key successfully retrieved")


API key successfully retrieved


# Problem 1: NASA APOD Data Retrieval and JSON File Processing (33 marks)
**The API key can be set as an environment variable with the commands below, but this did not work on my computer, so I load it from a `.env` file with the `python-dotenv` library instead.**

- **Windows:** `setx API_KEY "your_api_key"`
- **MacOS/Linux:** `export API_KEY="your_api_key"`
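
Whichever way the variable is set, the lookup itself can be sketched as below. This is a minimal sketch, assuming the key is stored under the name `API_KEY` as above. The helper name `load_api_key` is hypothetical, and the fallback `DEMO_KEY` is NASA's public demo key, which has much lower rate limits than a personal key.

```python
import os


def load_api_key(env_var="API_KEY", fallback="DEMO_KEY"):
    """Return the key stored in env_var, or the fallback if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    return key if key else fallback
```

Calling `load_api_key()` after `load_dotenv()` would return the key from the `.env` file when one is present.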


### 1.1 API CALL - `get_apod_data` Function (10 marks)

#### Purpose?
- Pull visual information for a specific date using NASA's APOD (Astronomy Picture of the Day) API.

#### Definitions of Functions Used:

- **Function Name**: `get_apod_data`
- **Parameters**:
  - `api_key` (str): NASA APOD API key
  - `date` (str): The date to fetch data for, in the `"YYYY-MM-DD"` format given in the API documentation
    
- **Return**: Data retrieved from the API will be returned as a dictionary
  - `date`: The date of the data
  - `title`: The title of the data
  - `media_type`: The type of media (image or video)
  - `url`: URL of the media
  - `explanation`: Explanation of the media

#### Error Handling:
- The function is designed using the `try-except` block, so that it handles errors that may occur when connecting to the API and displays the error message caught to the user.
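
Some errors can be caught before any request is sent. The sketch below checks the date string locally. The helper name `is_valid_apod_date` is hypothetical, and the lower bound assumes that the APOD archive starts on 1995-06-16.

```python
from datetime import date, datetime

# Assumption: 1995-06-16 is the first date the APOD archive serves.
FIRST_APOD_DATE = date(1995, 6, 16)


def is_valid_apod_date(date_str, today=None):
    """Return True if date_str parses as "YYYY-MM-DD" and lies in the APOD range."""
    try:
        parsed = datetime.strptime(date_str, "%Y-%m-%d").date()
    except (ValueError, TypeError):
        return False  # wrong format or not a string
    today = today or date.today()
    return FIRST_APOD_DATE <= parsed <= today
```

A caller could skip the API request and print a message whenever this check returns `False`.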




In [None]:
import requests


def get_apod_data(api_key, date):
    """
    Retrieves the APOD entry for a specific date.
    
    Args:
    - api_key (str): NASA APOD API key
    - date (str): The date to fetch, in the format "YYYY-MM-DD"
    
    Returns:
    - dict: A dictionary containing the data (date, title, media type, URL, explanation)
    - None: If the request fails (an error message is printed)
    """
    url = "https://api.nasa.gov/planetary/apod"  # API endpoint
    params = {
        'api_key': api_key,  # Add the API key to the parameters
        'date': date  # Add the date to the parameters
    }

    try:
        # Send a GET request to the API
        response = requests.get(url, params=params)
        response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
        data = response.json()  # Retrieves the response in JSON format as a dictionary

        # Return the data as a dictionary
        return {
            'date': data.get('date'),
            'title': data.get('title'),
            'media_type': data.get('media_type'),
            'url': data.get('url'),
            'explanation': data.get('explanation')
        }
    except requests.exceptions.RequestException as e:
        # In case of error, the user is informed
        print(f"An error occurred during the API request: {e}")
        return None


In [None]:
result = get_apod_data(api_key, "1999-11-01")  # My birthday :)))
print(result)

### 1.2 Fetching Data for a Date Range (15 marks) - `fetch_multiple_apod_data` Function

#### Purpose?

This function will pull NASA APOD data for each day within a **specific date range** (e.g. one year). The captured data will be saved in a JSON file for later analysis. This step involves the process of extracting and saving bulk data.

#### Definitions of Functions Used:

- **Function Name**: `fetch_multiple_apod_data`
- **Parameters**:
  - `api_key` (str): NASA APOD API key
  - `start_date` (str): Start date. Format: `"YYYY-MM-DD"`.
  - `end_date` (str): End date. Format: `"YYYY-MM-DD"`.
- **Operation**:
  - Calls the `get_apod_data` function once for each day in the specified date range.
  - A **1 second delay** is added after each request to stay within the API rate limit.
  - The data is written to a JSON file in **append mode**.
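
The day-by-day traversal can be sketched on its own, without any API calls. The generator name `iter_dates` is hypothetical; it only illustrates how the inclusive date range is walked.

```python
from datetime import datetime, timedelta


def iter_dates(start_date, end_date):
    """Yield every date from start_date to end_date (inclusive) as "YYYY-MM-DD"."""
    current = datetime.strptime(start_date, "%Y-%m-%d")
    last = datetime.strptime(end_date, "%Y-%m-%d")
    while current <= last:
        yield current.strftime("%Y-%m-%d")
        current += timedelta(days=1)
```

For example, `list(iter_dates("2020-02-27", "2020-03-01"))` covers the leap day, and a start date after the end date yields nothing.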

#### Error Handling:

- Network errors that may occur when making requests within the date range are checked.
- Any errors that may occur during file writing (for example, permission errors) are checked and a meaningful message is displayed to the user.

In [None]:
import json
import time
from datetime import datetime, timedelta


def fetch_multiple_apod_data(api_key, start_date, end_date):
    """
    Pulls NASA APOD data for each day in the specified date range and saves it to a JSON file.
    
    Args:
    - api_key (str): NASA APOD API key
    - start_date (str): Start date (in the format "YYYY-MM-DD")
    - end_date (str): End date (in the format "YYYY-MM-DD")
    
    Returns:
    - None
    """
    current_date = datetime.strptime(start_date, "%Y-%m-%d")
    end_date = datetime.strptime(end_date, "%Y-%m-%d")

    # File name
    file_name = "apod_data.json"

    # The try-except block catches I/O errors; append mode creates the file if it does not exist
    try:
        with open(f"data/{file_name}", "a") as file:
            while current_date <= end_date:
                # We format the date as string because the API receives the date in this format
                date_str = current_date.strftime("%Y-%m-%d")

                # Get data from the API
                data = get_apod_data(api_key, date_str)

                if data:
                    # I add the data to the file in JSON format
                    json.dump(data, file)
                    file.write("\n")  # Adds new line after each entry
                    print(f"Data added for {date_str}")

                # Moving to the next day because we are traversing the date range
                current_date += timedelta(days=1)
                # Add a 1-second delay to comply with the API rate limit
                time.sleep(1)

    except IOError as e:
        print(f"An error occurred while writing the file: {e}")


#### Explanation

This function saves the data to the JSON file by calling the `get_apod_data` function for each day in the given date range. We comply with the API rate limit by adding a 1-second delay after each request.

- **File Operation**: If the `apod_data.json` file does not exist, it will be created as a new file. Each piece of data is added and a new line is passed for the next entry.
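
Because each record sits on its own line, the file as a whole is not a single JSON document, so it has to be read back line by line rather than with one `json.load` call. A minimal sketch of this layout, using made-up records and an in-memory buffer in place of the real file:

```python
import io
import json

# Made-up records, for illustration only
records = [
    {"date": "2020-01-01", "media_type": "image"},
    {"date": "2020-01-02", "media_type": "video"},
]

buffer = io.StringIO()  # stands in for the file opened in append mode
for record in records:
    buffer.write(json.dumps(record) + "\n")  # one JSON object per line

buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
```

After the round trip, `loaded` holds the same dictionaries that were written.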


In [None]:
# Test run
fetch_multiple_apod_data(api_key, "2020-01-01", "2020-01-25")

### 1.3 Saving Data to a JSON File (8 marks) - `fetch_multiple_apod_data` Function (updated)

#### Purpose?
Now, we will edit the `fetch_multiple_apod_data` function and add the captured data to a file named `apod_data.json`. If the file does not exist, the function will create the file and save the new data.


#### Requirements:

- If the file already exists, we must add the new data without overwriting it.
- We must handle errors that may occur during file operations (for example, file permission errors).
- To make the JSON file large enough for analysis, at least 200-300 days of data should be fetched.
- Note: 200-300 records cannot be fetched in a single day. Based on my reading of the documentation and on testing in Postman, the daily limit was no more than 35 requests.
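
One way to work within a daily cap is to split the full date range into batches and fetch one batch per day. The sketch below only does the splitting. The helper name `split_date_range` is hypothetical and the batch size is whatever limit applies to the key in use. The APOD documentation also lists `start_date` and `end_date` query parameters that return a whole range in one request, which may reduce the number of requests needed, but that is not tested here.

```python
from datetime import datetime, timedelta


def split_date_range(start_date, end_date, max_days):
    """Split an inclusive date range into chunks of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    fmt = "%Y-%m-%d"
    start = datetime.strptime(start_date, fmt)
    end = datetime.strptime(end_date, fmt)
    chunks = []
    while start <= end:
        chunk_end = min(start + timedelta(days=max_days - 1), end)
        chunks.append((start.strftime(fmt), chunk_end.strftime(fmt)))
        start = chunk_end + timedelta(days=1)
    return chunks
```

Each `(start, end)` tuple can then be passed to `fetch_multiple_apod_data` on a different day.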

#### Definitions of Functions Used:

- Checks whether the file exists; if it does, opens it in append mode, otherwise in write mode.
- Pulls data from the API for each day and adds it to the file.
- Handles errors that occur during file writing.

In [None]:
import json
import os
import time
from datetime import datetime, timedelta


def fetch_multiple_apod_data(api_key, start_date, end_date, file_name="apod_data.json"):
    """
    Pulls NASA APOD data for each day in the specified date range and saves it to a JSON file. If the file does not exist, it creates it and handles errors that may occur during file writing
    
    Args:
    - api_key (str): NASA APOD API key
    - start_date (str): Start date (in the format "YYYY-MM-DD")
    - end_date (str): End date (in the format "YYYY-MM-DD")
    - file_name (str): The name of the JSON file. Defaults to "apod_data.json".
    
    Returns:
    - None
    """
    current_date = datetime.strptime(start_date, "%Y-%m-%d")
    end_date = datetime.strptime(end_date, "%Y-%m-%d")

    try:
        # If the file exists, open it in append mode; otherwise, open it in create and write mode.
        with open(f"data/{file_name}", "a" if os.path.exists(f"data/{file_name}") else "w") as file:
            while current_date <= end_date:
                date_str = current_date.strftime("%Y-%m-%d")

                # Get data from the API
                data = get_apod_data(api_key, date_str)

                if data:
                    # Add the data to the file in JSON format
                    json.dump(data, file)
                    file.write("\n")  # Add a new line after each entry
                    print(f"Data added for {date_str}.")
                else:
                    print(f"No data for {date_str}.")

                # Moving to the next day because we are traversing the date range
                current_date += timedelta(days=1)

                # Delay
                time.sleep(1)

    except IOError as e:
        print(f"An error occurred while writing the file: {e}")

#### Explanation

- **File Mode**: `"a" if os.path.exists(...) else "w"` checks whether the file exists and opens it in append mode (`"a"`) if it does, or in write mode (`"w"`) if it does not.
- **Error Management**: Catches `IOError` raised during file operations with a `try-except` block and prints a notice.
- **Insert New Line**: `file.write("\n")` inserts a new line after each JSON record.
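
Append mode never overwrites, so running the function twice over the same dates stores those records twice. A possible guard, sketched below with made-up records and an in-memory buffer instead of the real file, is to collect the dates already stored and skip them. The helper name `existing_dates` is hypothetical.

```python
import io
import json


def existing_dates(lines):
    """Collect the 'date' value of every JSON record in an iterable of lines."""
    dates = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            dates.add(json.loads(line).get("date"))
    return dates


# Stand-in for the contents of apod_data.json
stored = io.StringIO('{"date": "2020-01-01"}\n{"date": "2020-01-02"}\n')
already_saved = existing_dates(stored)
to_fetch = [d for d in ["2020-01-02", "2020-01-03"] if d not in already_saved]
```

Only the dates left in `to_fetch` would then need an API request.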



In [None]:
fetch_multiple_apod_data(api_key, "2020-01-26", "2020-01-30")

# Problem 2: JSON Data Reading, Looping, and Processing (27 Marks)
### 2.1 Reading and Loading JSON Data with Exception Handling (10 marks) - `read_apod_data` Function
#### Purpose
Here, we will read the `apod_data.json` file created earlier and load its contents into a Python list. The function also handles errors that may occur while reading the file.


#### Definitions of Functions Used:

- **Function Name**: `read_apod_data`
- **Operation**:
  - Checks whether the file exists and is readable.
  - Reads the JSON file line by line and loads each line as a JSON object.
  - Handles error conditions (`FileNotFoundError`, `PermissionError`, `JSONDecodeError`) and prints a meaningful message to the user.
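
Reading line by line also makes it possible to keep the valid records when a single line is damaged, instead of giving up on the whole file. The sketch below is one possible variant, not the function used in this notebook. The helper name `parse_json_lines` and the sample text are made up.

```python
import io
import json


def parse_json_lines(lines):
    """Return (records, bad_line_numbers) for an iterable of JSON lines."""
    records, bad_lines = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(number)  # remember where the damage is
    return records, bad_lines


sample = io.StringIO('{"date": "2020-01-01"}\n\n{broken\n{"date": "2020-01-03"}\n')
records, bad_lines = parse_json_lines(sample)
```

Here the two valid records are kept and line 3 is reported as unreadable.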


In [None]:
import json


def read_apod_data(file_name="apod_data.json"):
    """
    Reads the JSON file and loads the data into a list.
    
    Args:
    - file_name (str): The name of the JSON file to read. Defaults to "apod_data.json".
    
    Returns:
    - list: A list containing the JSON data from the file.
    - None: If an error occurs, it returns None.
    """
    try:
        with open(f"data/{file_name}", "r") as file:
            # Load the data from the file
            data = [json.loads(line) for line in file]
            print(f"{file_name} file loaded successfully.")
            return data
    except FileNotFoundError:
        print(f"{file_name} file not found.")
    except PermissionError:
        print(f"No permission to read {file_name}.")
    except json.JSONDecodeError:
        print(f"Error decoding JSON from {file_name}.")
    return None


In [None]:
apod_data = read_apod_data()
if apod_data:
    print("Total number of data:", len(apod_data))
    print("First Data:", apod_data[0])

### 2.2 Processing and Summarizing Data Using Loops (10 marks) - `analyze_apod_media` Function

#### Purpose?

The function does the following:
- Counts the total number of **images** and **videos** in the JSON file.
- Finds the date with the longest description.

#### Definitions of Functions Used:

- **Function Name**: `analyze_apod_media`
- **How it works**:
  - Counts the media types (image and video) in the data.
  - Finds the date with the longest description and reports the length of that description.
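
The same two results can also be obtained with standard-library helpers instead of manual counters. This is a sketch on made-up entries, shown only as an alternative to the explicit loop required by the task.

```python
from collections import Counter

# Made-up entries, for illustration only
entries = [
    {"date": "2020-01-01", "media_type": "image", "explanation": "Short."},
    {"date": "2020-01-02", "media_type": "video", "explanation": "A much longer text."},
    {"date": "2020-01-03", "media_type": "image"},  # explanation missing
]

# Count how often each media type occurs
media_counts = Counter(entry.get("media_type", "") for entry in entries)

# Pick the entry whose explanation has the most characters
longest = max(entries, key=lambda entry: len(entry.get("explanation", "")))
```

`media_counts["image"]` and `media_counts["video"]` give the totals, and `longest["date"]` is the date with the longest explanation.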



In [None]:
def analyze_apod_media(file_name="apod_data.json"):
    """
    Counts the images and videos in the JSON file and finds the date with the longest explanation.

    Args:
    - file_name (str): The name of the JSON file to read. Defaults to "apod_data.json".
    
    Returns:
    - dict: A dictionary containing the results of the analysis.
    - None: If no data could be loaded.
    """
    data = read_apod_data(file_name)
    if not data:
        print("No data found.")
        return None

    image_count = 0
    video_count = 0
    longest_explanation_date = ""
    max_explanation_length = 0

    for entry in data:
        media_type = entry.get('media_type', '')
        explanation = entry.get('explanation', '')

        # Count by media type
        if media_type == 'image':
            image_count += 1
        elif media_type == 'video':
            video_count += 1

        # Find the longest explanation
        if len(explanation) > max_explanation_length:
            max_explanation_length = len(explanation)
            longest_explanation_date = entry.get('date', '')

    return {
        "image_count": image_count,
        "video_count": video_count,
        "longest_explanation_date": longest_explanation_date,
        "max_explanation_length": max_explanation_length
    }


In [None]:
result = analyze_apod_media()
if result:
    print("Total image count:", result["image_count"])
    print("Total video count:", result["video_count"])
    print("Date of the longest description:", result["longest_explanation_date"])
    print("Length of the longest description:", result["max_explanation_length"])

### 2.3 Extracting and Writing Data to a CSV File (7 marks) - `write_apod_summary_to_csv` Function

#### Purpose?

- This function will read the data from the JSON file and write it to a CSV file containing certain fields (`date`, `title`, `media_type`, `url`)

#### Definitions:

- **Function name**: `write_apod_summary_to_csv`
- **What does it do?**:
  - Reads JSON data and writes the specified fields (date, title, media_type, url) to a CSV file.
  - Appends the new data to the CSV file if it already exists.
  - Keeps the record format consistent and handles file I/O errors.
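
As an alternative to building each row by hand, `csv.DictWriter` can pick the wanted fields directly from each dictionary. The sketch below uses made-up entries and an in-memory buffer in place of the CSV file. `extrasaction="ignore"` drops keys that are not exported and `restval=""` fills in missing ones.

```python
import csv
import io

FIELDS = ["date", "title", "media_type", "url"]

# Made-up entries, for illustration only
entries = [
    {"date": "2020-01-01", "title": "Sample title A", "media_type": "image",
     "url": "https://example.com/a.jpg", "explanation": "not exported"},
    {"date": "2020-01-02", "title": "Sample title B", "media_type": "video"},
]

output = io.StringIO()  # stands in for the CSV file
writer = csv.DictWriter(output, fieldnames=FIELDS, extrasaction="ignore", restval="")
writer.writeheader()
writer.writerows(entries)
rows = output.getvalue().splitlines()
```

The first element of `rows` is the header line, followed by one line per entry.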



In [None]:
import csv  # part of the standard library, no installation needed
import os


def write_apod_summary_to_csv(json_file="apod_data.json", csv_file="apod_summary.csv"):
    """
    Reads data from a JSON file and writes it to a CSV file containing specific fields.
    
    Args:
    - json_file (str): The name of the JSON file to read. Defaults to "apod_data.json".
    - csv_file (str): The name of the CSV file to write. Defaults to "apod_summary.csv".
    
    Returns:
    - None
    """
    data = read_apod_data(json_file)
    if not data:
        print("No data available to write to CSV.")
        return

    try:
        # Check whether the CSV file already exists
        file_exists = os.path.exists(f"data/{csv_file}")

        with open(f"data/{csv_file}", "a", newline='', encoding='utf-8') as file:
            writer = csv.writer(file)

            # Write the header row only when the file is being created
            if not file_exists:
                writer.writerow(["date", "title", "media_type", "url"])

            for entry in data:
                # Checks required fields and writes to CSV file
                writer.writerow([
                    entry.get('date', ''),
                    entry.get('title', '').replace('\n', ' '),  # Cleans new lines in the title
                    entry.get('media_type', ''),
                    entry.get('url', '')
                ])

        print(f"Data written to {csv_file} successfully.")

    except IOError as e:
        print(f"An error occurred while writing the file: {e}")



In [None]:
write_apod_summary_to_csv()

# Problem 3 - Numpy Array Manipulation and Statistical Functions (18 marks)
### 3.1 NumPy Array Creation and Compliance with Conditions - 2D NumPy Array Creation (7 marks)

#### Aim

Create a 2D NumPy array of random integers that satisfies two conditions:
- The sum of each row must be even.
- The sum of all values in the array must be a multiple of 5.

#### Operation

- First, I will create a NumPy array of random integers. Then I will check the conditions and adjust individual elements where necessary.
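
Both conditions are easy to verify after the adjustments. A minimal sketch of such a check, with the hypothetical helper name `satisfies_conditions` and two small hand-made arrays:

```python
import numpy as np


def satisfies_conditions(array):
    """Return True if every row sum is even and the total is a multiple of 5."""
    rows_even = bool(np.all(array.sum(axis=1) % 2 == 0))
    total_multiple_of_5 = bool(array.sum() % 5 == 0)
    return rows_even and total_multiple_of_5


good = np.array([[10, 20], [13, 17]])  # row sums 30 and 30, total 60
bad = np.array([[10, 21], [13, 16]])   # row sums 31 and 29, total 60
```

The second array shows why both checks are needed: its total is a multiple of 5, but its row sums are odd.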

In [None]:
import numpy as np


# Function to create and condition a 2D NumPy array
def create_array_with_conditions(rows=20, cols=5, min_val=10, max_val=100):
    """
    Creates a 2D NumPy array of random integers between min_val and max_val that satisfies two conditions:
    - The sum of each row must be even.
    - The sum of all values in the array must be a multiple of 5.
    
    Args:
    - rows (int): Number of rows. Defaults to 20.
    - cols (int): Number of columns. Defaults to 5.
    - min_val (int): Minimum integer value. Defaults to 10.
    - max_val (int): Maximum integer value. Defaults to 100.
    
    Returns:
    - np.ndarray: A NumPy array that satisfies the conditions.
    """
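    # randint excludes max_val, which leaves room for the +1 parity fix below
    array = np.random.randint(min_val, max_val, (rows, cols))

    # Make the sum of each row even
    for i in range(rows):
        if array[i].sum() % 2 != 0:  # the row sum is odd
            array[i, 0] += 1  # adding 1 to one element makes the sum even

    # Make the total a multiple of 5 by shifting one element in steps of 2.
    # A step of 2 keeps every row sum even, and because 2 and 5 are coprime
    # at most four steps are needed. Step downward if stepping upward could
    # push the value past max_val (with the default range the value stays in bounds).
    step = 2 if array[0, 0] + 8 <= max_val else -2
    while array.sum() % 5 != 0:
        array[0, 0] += step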

    return array

In [None]:
# Create a 2D NumPy array with conditions
array_with_conditions = create_array_with_conditions()
print("2D NumPy array with conditions:")
print(array_with_conditions)

### 3.2 NumPy Array Manipulation - Array Selection and Replacement (6 marks)

#### Target:

Using the 2D NumPy array created above:
- We will select elements from the array that are divisible by both 3 and 5 and print them on the screen.
- We will replace all elements in the array that are greater than 75 with the average of all elements of the array.

#### Definitions:

- We will select the elements in the array using `numpy`'s conditional operations.
- We will use NumPy's indexing feature to manipulate elements.
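
Both operations rely on boolean masks. The sketch below shows them on a small hand-made array. It also converts the array to floats first, because assigning a fractional mean into an integer array would silently drop the decimals.

```python
import numpy as np

# Hand-made values, for illustration only
values = np.array([[15, 30, 80], [45, 90, 11]])

# Elements divisible by both 3 and 5 (equivalently, by 15)
selected = values[(values % 3 == 0) & (values % 5 == 0)]

# Casting to float first keeps the fractional part of the mean
as_float = values.astype(float)
mean_value = as_float.mean()
as_float[as_float > 75] = mean_value
```

Here `selected` contains 15, 30, 45 and 90, and the two values above 75 are replaced by the mean of all six elements.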


In [None]:
# Select elements divisible by 3 and 5
divisible_by_3_and_5 = array_with_conditions[(array_with_conditions % 3 == 0) & (array_with_conditions % 5 == 0)]
print("Elements divisible by 3 and 5:")
print(divisible_by_3_and_5)

# Calculate the mean value of the array
mean_value = array_with_conditions.mean()

# Replace elements greater than 75 with the mean value
# (the array has an integer dtype, so the mean is truncated to an integer on assignment)
array_with_conditions[array_with_conditions > 75] = mean_value
print("\nElements greater than 75 are replaced with the average of the array:")
print(array_with_conditions)


### 3.3 Perform some statistical operations on the array - Statistical Analysis (6 marks)
#### Aim
In this step, we will perform statistical analysis on the 2D NumPy array we created. We will perform the following actions:
- We will calculate the average of all elements of the array.
- We will calculate the standard deviation of all elements of the array.
- We will find the median value of the array.
- We will calculate the variance of each column.

#### Definitions:

We will perform these operations using the statistical functions of the NumPy library.
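
One detail worth knowing is that `np.std` and `np.var` divide by N by default (population statistics). Passing `ddof=1` divides by N - 1 instead (sample statistics). A small sketch on made-up numbers:

```python
import numpy as np

# Made-up values, for illustration only
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(sample)      # ddof=0: divide by N
sample_std = np.std(sample, ddof=1)  # ddof=1: divide by N - 1

# axis=0 computes one variance per column of a 2D array
column_var = np.var(sample.reshape(4, 2), axis=0)
```

The sample standard deviation is always slightly larger than the population one for the same data.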


In [None]:
# mean of the array
mean_value = np.mean(array_with_conditions)
print("Mean of the array:", mean_value)

# Standard deviation of the array
std_dev = np.std(array_with_conditions)
print("Standard deviation of the array:", std_dev)

# Median value of the array
median_value = np.median(array_with_conditions)
print("Median value of the array:", median_value)

# Variance of each column
column_variances = np.var(array_with_conditions, axis=0)
print("Variance of each column:")
print(column_variances)


# Problem 4 - Working with Pandas DataFrames (22 marks)
### 4.1 Basic Analysis of the Iris Data Set (4 marks)

#### Aim

- This stage involves reading the `iris.csv` file into Python as a `pandas` DataFrame and answering the following questions:
  - How many data points does the data set have?
  - What are the column data types?
  - What are the column names?
  - How many flower species are in the data set?


#### Operation

Basic analysis will be performed on the DataFrame after reading the CSV file using the `pandas` library.


In [None]:
import pandas as pd

# Read the iris.csv file
iris_df = pd.read_csv("data/iris.csv")

# 1. The number of data points
total_data_points = iris_df.shape[0]
print("Total number of data points:", total_data_points)

# 2. Column data types
print("\nColumn data types:")
print(iris_df.dtypes)

# 3. Column names
print("\nColumn names:")
print(iris_df.columns)

# 4. Number of flower kinds in the data set
unique_species = iris_df['Species'].nunique()
print("\nNumber of flower kinds in the data set:", unique_species)


### 4.2 Correcting Incorrect Data in the Iris Data Set (3 marks)

#### Aim

As stated in the documentation, the iris data set contains incorrect rows, which we will fix here.
The rows with errors are:
- Line 35: Values will be corrected to `4.9, 3.1, 1.5, 0.2, "setosa"`.
- Line 38: Values will be corrected to `4.9, 3.6, 1.4, 0.1, "setosa"`.

#### Operation

- Data will be read using `pandas` DataFrame.
- Defective rows will be replaced with proper values.
- Printing lines 35 and 38 to the screen will verify modifications.
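
A variant that avoids rewriting the whole row is to select the row by its `Id` and assign only the measurement columns. The sketch below uses a two-row stand-in DataFrame with made-up values. The column names are the ones used in this notebook's code, and the `Id` value 2 is only an example.

```python
import pandas as pd

# Small stand-in frame with the same column names the notebook's code uses
df = pd.DataFrame({
    "Id": [1, 2],
    "SepalLengthCm": [5.1, 4.9],
    "SepalWidthCm": [3.5, 3.0],
    "PetalLengthCm": [1.4, 1.4],
    "PetalWidthCm": [0.2, 0.2],
    "Species": ["Iris-setosa", "Iris-setosa"],
})

measurement_cols = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Overwrite only the measurements of the row whose Id is 2
df.loc[df["Id"] == 2, measurement_cols] = [4.9, 3.1, 1.5, 0.2]
corrected = df.loc[df["Id"] == 2, measurement_cols].iloc[0].tolist()
```

This leaves the `Id` and `Species` columns untouched and does not depend on the row's position in the file.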



In [None]:
import pandas as pd

# Read the iris.csv file
iris_df = pd.read_csv("data/iris.csv")

# Fix line 35 (based on 1-indexing, index 34 in Python)
iris_df.loc[34] = [35, 4.9, 3.1, 1.5, 0.2, "Iris-setosa"]

# Fix line 38 
iris_df.loc[37] = [38, 4.9, 3.6, 1.4, 0.1, "Iris-setosa"]

# Print lines 35 and 38 to confirm changes
print("Corrected 35th row:")
print(iris_df.loc[34])

print("\nCorrected 38th row:")
print(iris_df.loc[37])


### 4.3 Adding New Features to the Iris Data Set (2 marks)

#### Purpose

We will add two new features:
- Petal Ratio: the ratio of petal length to petal width.
- Sepal Ratio: the ratio of sepal length to sepal width.

#### Definitions

After adding these properties to the `pandas` DataFrame, we will save the updated DataFrame in the `iris_corrected.csv` file.

In [None]:
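import pandas as pd

# Read iris.csv and re-apply the corrections from 4.2; they were never saved
# to disk, so re-reading the original file alone would drop them
iris_df = pd.read_csv("data/iris.csv")
iris_df.loc[34] = [35, 4.9, 3.1, 1.5, 0.2, "Iris-setosa"]
iris_df.loc[37] = [38, 4.9, 3.6, 1.4, 0.1, "Iris-setosa"]

# Add new features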
iris_df['Petal Ratio'] = iris_df['PetalLengthCm'] / iris_df['PetalWidthCm']
iris_df['Sepal Ratio'] = iris_df['SepalLengthCm'] / iris_df['SepalWidthCm']

# Save the updated DataFrame to a new CSV file
iris_df.to_csv("data/iris_corrected.csv", index=False)

print("Updated DataFrame saved to 'iris_corrected.csv'.")


### 4.4 Calculating Correlation Between Numerical Features in the Iris Data Set (4 marks)

#### Purpose

In this section, I will calculate the pairwise correlation between all numerical columns in the iris dataset:
- We will identify the pair of features with the highest positive correlation.
- We will identify the pair of features with the strongest negative correlation.

#### Definitions

- We will read the updated dataset using `pandas`.
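
A correlation matrix is symmetric, so every pair of features appears twice, and the diagonal holds each feature's correlation with itself. One way to look at each pair exactly once is to keep only the entries above the diagonal. The sketch below does this on a made-up three-column DataFrame; it is an illustration of the idea, not the exact method used in the cell that follows.

```python
import numpy as np
import pandas as pd

# Made-up data, for illustration only
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.5, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})
corr = df.corr()

# Keep only the entries strictly above the diagonal, so each pair appears once
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

most_positive = pairs.idxmax()  # (row label, column label) of the largest value
most_negative = pairs.idxmin()  # (row label, column label) of the smallest value
```

With three columns this leaves exactly three pairs, and `idxmax` / `idxmin` return the labels of the strongest positive and negative pair.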

In [None]:
import pandas as pd

# Read iris_corrected.csv
iris_df = pd.read_csv("data/iris_corrected.csv")

# Select only the numerical columns and drop Id, which is an identifier rather than a measurement
numeric_columns = iris_df.select_dtypes(include=['float64', 'int64']).drop(columns=['Id'])

# Calculating the correlation matrix
correlation_matrix = numeric_columns.corr()

# Printing the correlation matrix
print("Correlation matrix:")
print(correlation_matrix)

# Mask the self-correlations (values of exactly 1.0 on the diagonal),
# then stack the matrix into a Series indexed by (feature, feature) pairs
corr_unstacked = correlation_matrix.where(~correlation_matrix.isin([1.0])).stack()
max_positive_corr = corr_unstacked.idxmax()
min_negative_corr = corr_unstacked.idxmin()

print("\nHighest positive correlation:", max_positive_corr, correlation_matrix.loc[max_positive_corr])
print("\nStrongest negative correlation:", min_negative_corr, correlation_matrix.loc[min_negative_corr])


### 4.5 Creating a Scatter Plot with Regression Lines (5 marks)

#### Purpose

In this step:
- We will create a scatter plot with **Sepal Ratio** on the x-axis and **Petal Ratio** on the y-axis.
- We will color the points according to species.
- We will add a linear regression line for each species.
- We will save the chart in the `iris_scatter_with_regression.pdf` file.

#### Definitions
- We will create graphics using `seaborn` and `matplotlib` libraries and save them in pdf format.
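
For a single predictor, a straight-line fit does not strictly need scikit-learn. The sketch below uses `np.polyfit` on made-up points as a lighter alternative; it is not the method used in the cell that follows, but it computes the same kind of least-squares line.

```python
import numpy as np

# Made-up points that lie close to y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line

# Evaluate the line on evenly spaced x values so it is drawn left to right
x_line = np.linspace(x.min(), x.max(), 50)
y_line = slope * x_line + intercept
```

Passing `x_line` and `y_line` to `plt.plot` would draw the regression line for one species.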


In [None]:
import pandas as pd
import seaborn as sns  # !pip install seaborn
import matplotlib.pyplot as plt  # !pip install matplotlib
from sklearn.linear_model import LinearRegression  # !pip install scikit-learn
import numpy as np

# Read iris_corrected.csv
iris_df = pd.read_csv("data/iris_corrected.csv")

# Scatter plot creation
plt.figure(figsize=(12, 8))
sns.scatterplot(data=iris_df, x='Sepal Ratio', y='Petal Ratio', hue='Species', palette='viridis')

# Linear regression for each species
species_list = iris_df['Species'].unique()
for species in species_list:
    subset = iris_df[iris_df['Species'] == species]
    X = subset['Sepal Ratio'].values.reshape(-1, 1)
    y = subset['Petal Ratio'].values

    # Linear regression model fitting
    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)

    # Plotting the regression line
    plt.plot(X, y_pred, label=f"{species} Regression Line", linewidth=2)

# Adding labels and legend
plt.title("Sepal Ratio & Petal Ratio Scatter Plot")
plt.xlabel("Sepal Ratio")
plt.ylabel("Petal Ratio")
plt.legend()
plt.grid(True)

# Save the plot as a PDF file
plt.savefig("data/iris_scatter_with_regression.pdf")
plt.show()

print("Graph saved as 'iris_scatter_with_regression.pdf'")


### 4.6 Creating a Pair Plot with New Features (4 marks)

#### Purpose:
- We will create a pair plot for the four original numerical features in the iris dataset (`SepalLengthCm`, `SepalWidthCm`, `PetalLengthCm`, `PetalWidthCm`) and the two new features we added (`Petal Ratio`, `Sepal Ratio`).
- We will color the dots according to the species.

#### Definitions:

- We will use the `seaborn` library to create the pair plot.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

iris_df = pd.read_csv("data/iris_corrected.csv")

# Create the pair plot (pairplot builds its own figure, so no plt.figure call is needed)
sns.pairplot(iris_df, hue='Species',
             vars=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Petal Ratio', 'Sepal Ratio'],
             palette='viridis')

plt.show()

print("Pair plot created and shown")
