*Section 1*



## How to Access the Data

To find and download the original data:

First, visit the Open Access to Ocean Data Network Portal:   
[Open Access to Ocean Data Network Portal](https://portal.aodn.org.au/search)

Next, use the search filters to select the following variables (or substrates):  
   - Air Temperature  
   - Atmospheric Pressure  
   - Dew-point Temperature  
   - Earth-relative Wind Direction  
   - Earth-relative Wind Speed  
   - Sea Temperature  
   - Wet-bulb Temperature

Then download the data as a CSV file:
#####The CSV file includes extra summary rows labeled "data points." You can remove these either by deleting the rows manually in Excel or by filtering them out when importing the file in Colab.
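
If you would rather filter while importing, here is a minimal sketch of one way to do it. The sample text below is made up, and it assumes the summary rows can be recognized because their TIME field is not a real timestamp; check your own download to see how its summary rows actually look before relying on this.

```python
import io
import pandas as pd

# Hypothetical sample: two measurement rows plus one summary row.
raw_text = """TIME,LATITUDE,LONGITUDE,PSAL
2009-10-19T13:19:00Z,-34.2681,163.7185,35.49
data points,2,,
2009-10-19T13:21:00Z,-34.2682,163.7276,35.50
"""

raw = pd.read_csv(io.StringIO(raw_text))

# Values that cannot be parsed as timestamps become NaT (missing).
parsed_time = pd.to_datetime(raw['TIME'], errors='coerce')

# Keep only the rows with a valid timestamp, then renumber the index.
clean = raw[parsed_time.notna()].reset_index(drop=True)

print(clean)
```

With a real file you would replace `io.StringIO(raw_text)` with the path to your CSV.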


>If you're unable to retrieve the CSV file, I've provided a cleaned version below, which I manually formatted using Excel:
[Cleaned up: Sea Surface Temperature (SST) Sub facility-Near real-time_data](https://drive.google.com/file/d/1glfq1W8HE9AocZ13BTP6_OYtqzy7V_h0/view?usp=sharing)
>> If you would rather work with the raw data, it is available here:
>>>[Uncleaned raw: Sea Surface Temperature (SST) Sub facility-Near real-time_data](https://drive.google.com/file/d/1xg128LrtRZrEC2HOyM30F7yMSZe1g50K/view?usp=sharing)

*Section 2*

## Uploading data from Drive into Colab
#####Create a folder in your Google Drive and name it whatever you prefer (in this example, the folder is named "ENGL 105"), then upload the CSV file from above into that folder.

To begin, import the drive module from Google Colab, which lets you access and interact with files stored in your Google Drive.
- "from google.colab import drive"

The second line of code mounts your Google Drive to the Colab notebook at the specified path:
- "drive.mount('/content/gdrive')"

#####This process connects your Drive to the notebook, making it possible to read from and write to files directly from your Google Drive.


#####When you run this code, a prompt will appear asking for permission to access your Google Drive.

#####Once permission is granted, your Drive files will be available within the notebook, and you can easily manage them in the Colab environment.









In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


The following code imports Python libraries commonly used for data analysis and visualization. First, numpy is imported as np; this library provides fast mathematical operations on arrays and other numerical data. Next, pandas is imported as pd; it is a package for handling structured data such as tables and spreadsheets. Finally, matplotlib.pyplot is imported as plt; this module lets you create visualizations such as line plots and histograms to better understand your data.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The next line uses the pandas library to read a CSV file from a specific location in your Google Drive and loads it into a DataFrame called df. The file path points to a CSV stored inside the folder labeled "ENGL 105", or whichever folder you chose.
> The "skiprows=1" portion tells pandas to skip the first row of the file. In the cleaned file provided above, that row contains the title, which is not part of the actual data.
>>Change the number in this line to skip the summary information that explains what each label means.
>>> For example "skiprows=10"
---
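
To see what "skiprows" does, and how you might work out the right number for your own file, here is a small sketch using made-up file contents held in a string. The title line and the column names are only assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents: one title line, then the real header and data.
text = """Sea surface temperature export
TIME,TEMP,PSAL
2009-10-19T13:19:00Z,18.1,35.49
2009-10-19T13:21:00Z,18.2,35.50
"""

# Find the line where the real header starts (here, the line beginning with "TIME,").
lines = text.splitlines()
header_row = next(i for i, line in enumerate(lines) if line.startswith('TIME,'))
print(header_row)  # 1, so one line must be skipped

# Skip everything above the header when reading.
df_demo = pd.read_csv(io.StringIO(text), skiprows=header_row)
print(df_demo.columns.tolist())  # ['TIME', 'TEMP', 'PSAL']
print(df_demo.shape)             # (2, 3)
```

If your raw download has more descriptive lines above the header, the same search tells you how many rows to skip.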


In [3]:
df = pd.read_csv('gdrive/My Drive/ENGL 105/Sea_Surface_Temperature_(SST)_Sub-facility_-_Near_real-time_data2.csv', skiprows=1)

To confirm that the data loaded correctly and is connected to the notebook, run:
>"df.head()"

This line of code displays the first five rows of the DataFrame "df", giving you a quick preview of its columns and first data entries.

In [4]:
df.head()

Unnamed: 0,FID,trajectory_id,measurement_id,platform_code,vessel_name,voyage_number,TIME,TIME_quality_control,LATITUDE,LATITUDE_quality_control,...,PL_SPD_b,PL_WDIR_b,PL_WSPD_b,PSAL_b,RELH_b,TEMP_b,WDIR_b,WETT_b,WSPD_b,geom
0,soop_sst_nrt_trajectory_data.fid-3b25e662_1963...,12323,2634519,VRZN9,Pacific Celebes,,2009-10-19T13:19:00Z,Z,-34.2681,Z,...,True,True,True,True,True,True,True,False,True,POINT (163.718505859375 -34.26809310913086)
1,soop_sst_nrt_trajectory_data.fid-3b25e662_1963...,12323,2634520,VRZN9,Pacific Celebes,,2009-10-19T13:21:00Z,Z,-34.2682,Z,...,True,True,True,True,True,True,True,False,True,POINT (163.72764587402344 -34.268157958984375)
2,soop_sst_nrt_trajectory_data.fid-3b25e662_1963...,12323,2634521,VRZN9,Pacific Celebes,,2009-10-19T13:23:00Z,Z,-34.2682,Z,...,True,True,True,True,True,True,True,False,True,POINT (163.73719787597656 -34.26819610595703)
3,soop_sst_nrt_trajectory_data.fid-3b25e662_1963...,12323,2634522,VRZN9,Pacific Celebes,,2009-10-19T13:25:00Z,Z,-34.2683,Z,...,True,True,True,True,True,True,True,False,True,POINT (163.74623107910156 -34.26826858520508)
4,soop_sst_nrt_trajectory_data.fid-3b25e662_1963...,12323,2634523,VRZN9,Pacific Celebes,,2009-10-19T13:27:00Z,Z,-34.2683,Z,...,True,True,True,True,True,True,True,False,True,POINT (163.75534057617188 -34.26834487915039)


*Section 3*

## Creating the Subset of Salinity and Temperature Averages per Region
The following line takes the values in the TIME column (ISO 8601 timestamps such as 2009-10-19T13:19:00Z) and converts each one into a datetime object.

> pd.to_datetime(df['TIME'])

Assigning the result to "df['YEAR']" creates a new column called YEAR that holds the converted datetime values. Despite its name, this column stores the full timestamp, not only the year.

Converting the raw timestamps this way makes them easier to analyze.











In [5]:
df['YEAR'] = pd.to_datetime(df['TIME'])
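
Once a column holds datetime values, you can pull out individual parts such as the year or month with the ".dt" accessor. The short sketch below uses two made-up timestamps in the same format as the TIME column; the names "times" and "stamps" are only for illustration.

```python
import pandas as pd

# Two sample timestamps in the same ISO 8601 style as the TIME column.
times = pd.Series(['2009-10-19T13:19:00Z', '2010-01-05T02:45:00Z'])

# Convert the text into timezone-aware datetime values (the trailing Z means UTC).
stamps = pd.to_datetime(times)

# Extract parts of each timestamp.
years = stamps.dt.year
months = stamps.dt.month

print(years.tolist())   # [2009, 2010]
print(months.tolist())  # [10, 1]
```

If you only need the calendar year for grouping, something like "df['YEAR'].dt.year" would give you that.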

This code creates two subsets of your DataFrame: sw, which stands for South-West, and se, which stands for South-East. Both keep only the rows whose latitude lies between -40° and -30° (the southern study band). For sw, the longitude must fall in the "west" portion (160° up to, but not including, 179°); for se, it must fall in the "east" portion (162.5° to 179°). These two longitude ranges overlap between 162.5° and 179°, so many rows appear in both subsets.

In [6]:
sw = df[(df['LATITUDE'] <= -30) & (df['LATITUDE'] >= -40) &
        (df['LONGITUDE'] >= 160) & (df['LONGITUDE'] < 179)]

se = df[(df['LATITUDE'] <= -30) & (df['LATITUDE'] >= -40) &
        (df['LONGITUDE'] >= 162.5) & (df['LONGITUDE'] <= 179)]
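
Because the longitude ranges used for sw and se share most of their span, it can be worth checking how many rows land in both subsets. Here is a minimal sketch on a tiny made-up table; the "_demo" names and the sample coordinates are assumptions for illustration only.

```python
import pandas as pd

# Four made-up positions; the last one lies outside the latitude band.
points = pd.DataFrame({
    'LATITUDE':  [-34.0, -35.0, -36.0, -25.0],
    'LONGITUDE': [161.0, 165.0, 170.0, 165.0],
})

# Rows inside the -40 to -30 latitude band (endpoints included).
in_band = points['LATITUDE'].between(-40, -30)

sw_demo = points[in_band & (points['LONGITUDE'] >= 160) & (points['LONGITUDE'] < 179)]
se_demo = points[in_band & (points['LONGITUDE'] >= 162.5) & (points['LONGITUDE'] <= 179)]

# Row labels that appear in both subsets.
shared = sw_demo.index.intersection(se_demo.index)

print(len(sw_demo), len(se_demo), len(shared))  # 3 2 2
```

When nearly every row is shared, the two regional averages will come out almost the same.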

This code checks the structure of the PSAL (salinity) column in your DataFrame df by:

>print(df['PSAL'].shape): showing the number of entries or rows in the salinity column.

>print(df['PSAL'].dtypes): displaying the data type of the salinity values, for instance float64 if they are decimal numbers.

This helps confirm the size of the data and whether it's in the right format for analysis.










In [7]:
print(df['PSAL'].shape)
print(df['PSAL'].dtypes)

(24092,)
float64
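
Keep in mind that ".shape" counts every row, including rows where the value is missing. If you also want to know how many real measurements there are, a sketch like the following may help; the sample values are made up.

```python
import numpy as np
import pandas as pd

# Five sample salinity readings, two of them missing.
psal = pd.Series([35.4, np.nan, 35.6, np.nan, 35.5])

print(psal.shape)         # (5,) counts all rows, missing or not
print(psal.count())       # 3 non-missing values
print(psal.isna().sum())  # 2 missing values
```

On the real data you would run the same calls on "df['PSAL']".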


This code calculates the average salinity ('PSAL') for two different regions:

>sw_avg = sw['PSAL'].mean(): computes the mean or average salinity in the southwest region.

>se_avg = se['PSAL'].mean(): computes the mean or average salinity in the southeast region.

In [8]:
sw_avg = sw['PSAL'].mean()
se_avg = se['PSAL'].mean()

This code creates a simple table using a DataFrame to compare average salinity between two regions:

######The first line makes a table with two columns: Region and Average Salinity (PSU).
######In the second line, West and East are the labels for the two regions (from the southwest and southeast parts of Australia).
######The sw_avg and se_avg portion are the average salinity values calculated earlier.
######The final line (table_df) displays the table.

In [9]:
table_df = pd.DataFrame({'Region': ['West', 'East'],
    'Average Salinity (PSU)': [sw_avg, se_avg]})
table_df

Unnamed: 0,Region,Average Salinity (PSU)
0,West,35.492585
1,East,35.492583


This code gives you a quick overview of the data in the sw region (southwest):

>print(sw.head())
######This line shows the first 5 rows of the sw DataFrame so you can see what the data looks like.

>print(sw.shape)
######This line tells you how many rows and columns are in the sw DataFrame, which helps you understand the size of the dataset for that region.

In [10]:
print(sw.head())
print(sw.shape)

                                                 FID  trajectory_id  \
0  soop_sst_nrt_trajectory_data.fid-3b25e662_1963...          12323   
1  soop_sst_nrt_trajectory_data.fid-3b25e662_1963...          12323   
2  soop_sst_nrt_trajectory_data.fid-3b25e662_1963...          12323   
3  soop_sst_nrt_trajectory_data.fid-3b25e662_1963...          12323   
4  soop_sst_nrt_trajectory_data.fid-3b25e662_1963...          12323   

   measurement_id platform_code      vessel_name  voyage_number  \
0         2634519         VRZN9  Pacific Celebes            NaN   
1         2634520         VRZN9  Pacific Celebes            NaN   
2         2634521         VRZN9  Pacific Celebes            NaN   
3         2634522         VRZN9  Pacific Celebes            NaN   
4         2634523         VRZN9  Pacific Celebes            NaN   

                   TIME TIME_quality_control  LATITUDE  \
0  2009-10-19T13:19:00Z                    Z  -34.2681   
1  2009-10-19T13:21:00Z                    Z  -34.2682

The following line creates a new column called LON360. It converts any negative longitude values (which normally run from -180° to +180°) into the 0°–360° range, while values that are already positive stay the same. This makes all longitudes positive and easier to work with when mapping or comparing locations around the globe.

In [11]:
df['LON360'] = df['LONGITUDE'] % 360
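
To see how the "% 360" step behaves, here is a small sketch with a few made-up longitudes. In Python and pandas, the result of "%" takes the sign of the divisor, which is why negative inputs wrap around to positive values.

```python
import pandas as pd

# Sample longitudes in the -180 to +180 convention.
lon = pd.Series([-170.0, -0.5, 0.0, 163.7, 179.9])

# Wrap into the 0 to 360 range; positive values are unchanged.
lon360 = lon % 360

print(lon360.tolist())  # [190.0, 359.5, 0.0, 163.7, 179.9]
```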

This line creates a new DataFrame called southern by filtering the original df for rows where:

- the latitude is between -40 and -30 (covering southern coastal areas), and
- the longitude (in 0–360 format, LON360) is between 160 and 179.

Both ".between()" checks include their endpoints. The ".copy()" method makes a separate copy of the filtered data, so any changes to southern won’t affect the original df.









In [12]:
southern = df[df['LATITUDE'].between(-40, -30) & df['LON360'].between(160, 179)].copy()

This code calculates the midpoint (or central value) of both longitude and latitude for the southern region:

"mid_lon" is the average of the minimum and maximum values in the LON360 column—this gives the central longitude.

"mid_lat" is the average of the minimum and maximum values in the LATITUDE column—this gives the central latitude.

Basically, it finds the geographic center of the southern area you filtered earlier.

In [13]:
mid_lon = (southern['LON360'].min() + southern['LON360'].max()) / 2
mid_lat = (southern['LATITUDE'].min() + southern['LATITUDE'].max()) / 2

This code creates a new column called Region in the southern DataFrame by assigning a label to each row based on its latitude and longitude:

It uses the list of conditions "conds" to split the area into three parts:

If the latitude is less than the middle latitude, the row is labeled 'South coastal'.

If the latitude is greater than or equal to the middle latitude and the longitude is less than the middle longitude, the row is labeled 'West coastal'.

If the latitude is greater than or equal to the middle latitude and the longitude is also greater than or equal to the middle longitude, the row is labeled 'East coastal'.

Then, "np.select()" applies those labels ("choices") row by row, using the first condition that matches. If none of the conditions match, it assigns "unknown" by default.

In [14]:
conds = [southern['LATITUDE'] < mid_lat, (southern['LATITUDE'] >= mid_lat) & (southern['LON360'] <  mid_lon), (southern['LATITUDE'] >= mid_lat) & (southern['LON360'] >= mid_lon),]
choices = ['South coastal', 'West coastal', 'East coastal']
southern['Region'] = np.select(conds, choices, default='unknown')
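
Here is a small sketch of how "np.select()" assigns labels, including the default. The table, the midpoints, and the short labels are all made up for illustration; the last row has a missing latitude so that no condition matches it.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'LATITUDE': [-38.0, -32.0, -32.0, np.nan],
    'LON360':   [165.0, 162.0, 175.0, 170.0],
})
mid_lat_demo, mid_lon_demo = -35.0, 170.0  # hypothetical midpoints

# One True/False column per label, checked in order.
conds_demo = [
    demo['LATITUDE'] < mid_lat_demo,
    (demo['LATITUDE'] >= mid_lat_demo) & (demo['LON360'] < mid_lon_demo),
    (demo['LATITUDE'] >= mid_lat_demo) & (demo['LON360'] >= mid_lon_demo),
]
labels_demo = ['South', 'West', 'East']

# Rows where every condition is False receive the default label.
demo['Region'] = np.select(conds_demo, labels_demo, default='unknown')

print(demo['Region'].tolist())  # ['South', 'West', 'East', 'unknown']
```

Comparisons against a missing value are always False, which is why the last row falls through to 'unknown'.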

This code calculates the average salinity for each region in the southern DataFrame:

.groupby('Region') groups the data by the 'Region' column.

['PSAL'].mean() calculates the average salinity (in PSU) for each region.

.reset_index() turns the grouped result back into a regular DataFrame.

.rename(...) renames the PSAL column to average_salinity_psu for clarity.


In [15]:
result = (
    southern
      .groupby('Region')['PSAL']
      .mean()
      .reset_index()
      .rename(columns={'PSAL':'average_salinity_psu'}))
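
If the chained calls feel like a lot at once, it can help to look at the intermediate result. The sketch below runs the same kind of chain on a tiny made-up table; the names "obs", "grouped_mean", and "summary" are only for illustration.

```python
import pandas as pd

obs = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'PSAL':   [34.0, 35.0, 35.25, 35.75],
})

# After groupby + mean you get a Series whose index is the region name.
grouped_mean = obs.groupby('Region')['PSAL'].mean()
print(grouped_mean.index.tolist())  # ['East', 'West']

# reset_index moves the region names back into an ordinary column,
# and rename gives the value column a more descriptive name.
summary = (
    grouped_mean
      .reset_index()
      .rename(columns={'PSAL': 'average_salinity_psu'})
)

print(summary)
```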

To see the output, you could just do "result" or "print(result)". The final result DataFrame shows the average salinity for each region (South coastal, West coastal, East coastal).

In [16]:
result

Unnamed: 0,Region,average_salinity_psu
0,East coastal,35.485716
1,South coastal,35.39732
2,West coastal,35.569645


This code works with the southern dataset to organize the data by latitude and region.
#####First, it creates a new column called "LAT_INT" by rounding the latitude values down to the nearest whole number and converting them to integers. Because the latitudes here are negative, rounding down moves each value further south (for example, -34.27 becomes -35). Then, it gets the midpoint of the longitudes in the "LON360" column. Finally, the code uses this midpoint to divide the data into two regions, replacing the three-way Region labels assigned earlier.
#####If the longitude is less than the midpoint, it’s labeled as "SW" (South-West); otherwise, it’s labeled as "SE" (South-East).

In [17]:
southern['LAT_INT'] = np.floor(southern['LATITUDE']).astype(int)

mid_lon = (southern['LON360'].min() + southern['LON360'].max()) / 2
southern['Region'] = np.where(southern['LON360'] < mid_lon, 'SW', 'SE')
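
Rounding down can be surprising with negative numbers, so here is a short sketch comparing "np.floor" with plain integer conversion, followed by a two-way split with "np.where". The sample values and the cutoff of 170.0 are made up.

```python
import numpy as np
import pandas as pd

lat = pd.Series([-34.27, -34.99, -30.0, 12.6])

# floor always rounds toward the smaller number (further south for negatives).
lat_floor = np.floor(lat).astype(int)
# Converting directly to int drops the decimals, rounding toward zero instead.
lat_trunc = lat.astype(int)

print(lat_floor.tolist())  # [-35, -35, -30, 12]
print(lat_trunc.tolist())  # [-34, -34, -30, 12]

# np.where picks the first label where the condition is True, else the second.
lon = pd.Series([161.0, 170.0, 178.0])
side = np.where(lon < 170.0, 'SW', 'SE')

print(side.tolist())  # ['SW', 'SE', 'SE']
```

Note that a longitude exactly equal to the cutoff is labeled 'SE', because the condition uses a strict "less than".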

This code creates a summary table that shows how average salinity changes by region and latitude.
#####It groups the data by region (South-East or South-West) and by rounded latitude values.
#####Then, it calculates the average salinity (PSAL) for each group and renames the columns to make the table easier to understand.

In [18]:
result_2 = (
    southern
      .groupby(['Region', 'LAT_INT'])[['PSAL']]
      .mean()
      .reset_index()
      .rename(columns={'PSAL': 'average_salinity_psu', 'LAT_INT': 'latitude'})
)

To see the output, you could just do "result_2" or "print(result_2)". The final result DataFrame shows the average salinity for each combination of region (SW or SE) and latitude.

In [19]:
result_2

Unnamed: 0,Region,latitude,average_salinity_psu
0,SE,-40,35.235017
1,SE,-39,35.279658
2,SE,-38,34.957379
3,SE,-37,35.415749
4,SE,-36,35.42762
5,SE,-35,35.477406
6,SE,-34,35.603805
7,SE,-33,35.552977
8,SE,-32,35.674019
9,SW,-40,35.194756


This line of code is used to filter and clean a DataFrame, `df`, selecting only two columns and removing any rows with missing values (NaN) in those columns.

>>df[['TEMP', 'PSAL']]
#####Selects two columns, 'TEMP' and 'PSAL', from the DataFrame df, producing a new DataFrame that contains only these two columns.

>>.dropna()
#####This method is called on the new DataFrame to remove any rows where one or both of the columns ('TEMP' or 'PSAL') have missing values (NaN).


So, the code creates a new DataFrame, filtered, that only contains the 'TEMP' and 'PSAL' columns from the original DataFrame and excludes any rows with NaN values in those columns.

In [20]:
filtered = df[['TEMP', 'PSAL']].dropna()
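
To see exactly which rows ".dropna()" removes, here is a small sketch on a made-up table. The extra "WSPD" column is an assumption added to show that missing values in columns you did not select are ignored.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'TEMP': [18.1, np.nan, 17.9, 18.3],
    'PSAL': [35.4, 35.5, np.nan, 35.6],
    'WSPD': [np.nan, 5.0, 6.0, 7.0],
})

# Select two columns first, then drop rows missing either value.
pairs = sample[['TEMP', 'PSAL']].dropna()
print(pairs.index.tolist())  # [0, 3]

# Alternative: keep every column but only check TEMP and PSAL for gaps.
kept = sample.dropna(subset=['TEMP', 'PSAL'])
print(kept.shape)  # (2, 3)
```

Row 0 survives even though its WSPD value is missing, because only TEMP and PSAL are checked.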

>>southern.groupby(['Region', 'LAT_INT'])[['TEMP', 'PSAL']]  
#####This line of code groups the southern DataFrame by Region and LAT_INT (latitude), and selects only the 'TEMP' and 'PSAL' columns.

>.mean()
#####This code calculates the mean temperature and salinity for each group.

>.reset_index()
#####This line turns the grouped result back into a regular DataFrame with Region and LAT_INT as ordinary columns.

>.rename(...)  
#####Renames the columns: LAT_INT becomes latitude, TEMP becomes average_temperature_c, and PSAL becomes average_salinity_psu.

The final result, temp_salinity_by_lat, is a table showing the average temperature and salinity at each latitude interval within each region.


In [21]:
temp_salinity_by_lat = (
    southern
      .groupby(['Region', 'LAT_INT'])[['TEMP', 'PSAL']]
      .mean()
      .reset_index()
      .rename(columns={
          'LAT_INT': 'latitude',
          'TEMP': 'average_temperature_c',
          'PSAL': 'average_salinity_psu'
      })
)

To see the output, you could just do "temp_salinity_by_lat" or "print(temp_salinity_by_lat)". The final result DataFrame shows the average temperature and salinity for each latitude, along with whether that latitude band belongs to the South-West (SW) or South-East (SE) region.

In [22]:
temp_salinity_by_lat

Unnamed: 0,Region,latitude,average_temperature_c,average_salinity_psu
0,SE,-40,18.063425,35.235017
1,SE,-39,18.708997,35.279658
2,SE,-38,17.667356,34.957379
3,SE,-37,18.483619,35.415749
4,SE,-36,17.984764,35.42762
5,SE,-35,17.726302,35.477406
6,SE,-34,18.826426,35.603805
7,SE,-33,18.431123,35.552977
8,SE,-32,18.742568,35.674019
9,SW,-40,17.826274,35.194756


This code saves the temp_salinity_by_lat DataFrame to a CSV file named "Temp_salinity_by_lat.csv" without including the index column. Each row of the file represents one region and latitude band, together with its average temperature and average salinity.



In [23]:
temp_salinity_by_lat.to_csv("Temp_salinity_by_lat.csv", index=False)
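
If you want to double-check what "index=False" does, one option is to write a small table to a temporary file and read it back. This is only a sketch: the table values and the file name "demo_table.csv" are made up.

```python
import os
import tempfile
import pandas as pd

table = pd.DataFrame({
    'Region': ['SE', 'SW'],
    'latitude': [-40, -40],
    'average_salinity_psu': [35.2, 35.1],
})

# Write to a throwaway folder, then load the file again.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo_table.csv')
    table.to_csv(path, index=False)
    restored = pd.read_csv(path)

# Only the three real columns come back; no extra index column was saved.
print(restored.columns.tolist())
```

In Colab, a file saved with a plain name like this lands in the notebook's current working directory unless you give a full path into your mounted Drive.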