# Climate Coding Challenge

Climate change is impacting the way people live around the world.

## There are more Earth Observation data online than any one person could ever look at

[NASA’s Earth Observing System Data and Information System (EOSDIS)
alone manages over 9 PB of
data](https://www.earthdata.nasa.gov/learn/articles/getting-petabytes-people-how-eosdis-facilitates-earth-observing-data-discovery-and-use).
1 PB is roughly 100 times the entire Library of Congress (a good
approximation of all the books available in the US). It’s all available
to **you** once you learn how to download what you want.

Here we’re using the NOAA National Centers for Environmental Information
(NCEI) [Access Data
Service](https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation)
application programming interface (API) to request data from their web
servers. We will be using data collected as part of the Global
Historical Climatology Network daily (GHCNd), which is available through
NOAA’s [Climate Data
Online library](https://www.ncdc.noaa.gov/cdo-web/datasets).

For this example we’re requesting [daily summary data in Karachi,
Pakistan (station ID
PKM00041780)](https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:PKM00041780/detail).

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-response"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Research and cite your data</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Research the <a
href="https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html"><strong>Global
Historical Climatology Network - Daily</strong></a> data source.</li>
<li>In the cell below, write a 2-3 sentence description of the data
source.</li>
<li>Include a citation of the data (<strong>HINT:</strong> See the ‘Data
Citation’ tab on the GHCNd overview page).</li>
</ol>
<p>Your description should include:</p>
<ul>
<li>who takes the data</li>
<li>where the data were taken</li>
<li>what the maximum temperature units are</li>
<li>how the data are collected</li>
</ul></div></div>

**YOUR DATA DESCRIPTION AND CITATION HERE** 🛎️

## Access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️

The cell below contains the URL for the data you will use in this part
of the notebook. We created this URL by generating what is called an
**API endpoint** using the NCEI [API
documentation](https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation).

> **What’s an API?**
>
> An **application programming interface** (API) is a way for two or
> more computer programs or components to communicate with each other.
> It is a type of software interface, offering a service to other pieces
> of software ([Wikipedia](https://en.wikipedia.org/wiki/API)).

First things first – you will need to import the `earthpy` library to
help with data management and the `pandas` library to work with tabular
data:

In [43]:
# Import required packages
import holoviews as hv
import hvplot.pandas
import pandas as pd
import earthpy
import earthpy.spatial as es
import earthpy.plot as ep
import panel as pn

The cell below contains the URL you will use to download climate data.
There are two things to notice about the URL code:

1.  It is surrounded by quotes – that means Python will interpret it as
    a `string`, or text, type, which makes sense for a URL.
2.  The URL is too long to display as one line on most screens. We’ve
    put parentheses around it so that we can easily split it into
    multiple lines by writing two strings – one on each line.
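
Point 2 relies on a Python feature: string literals placed next to each other inside parentheses are joined into a single string. A minimal illustration follows; the host `example.com` and the parameter names are placeholders chosen for this sketch, not a real API.

```python
# Adjacent string literals inside parentheses are concatenated
# automatically; no `+` operator is needed.
example_url = (
    'https://example.com/data?'  # placeholder host, not a real API
    'dataset=demo&'
    'units=metric'
)
print(example_url)
```

Because the pieces are joined exactly as written, each piece except the last has to end with its own `?` or `&` separator.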

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Format your URL for readability</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Pick an expressive variable name for the URL.</li>
<li>Reformat the URL so that it adheres to the <a
href="https://peps.python.org/pep-0008/#maximum-line-length">79-character
PEP-8 line limit</a>, and so that it is <strong>easy to read</strong>.
If you are using GitHub Codespaces, you should see two vertical lines in
each cell – don’t let your code go past the second line.</li>
<li>Replace ‘DATATYPE’, ‘STATION’, and the start and end dates
‘YYYY-MM-DD’, with the values for the data you want to download.</li>
</ol></div></div>

In [44]:
ncei_karachi_url = ('https://www.ncei.noaa.gov/access/services/data/v1?'
           'dataset=daily-summaries&'
           'dataTypes=TAVG,TMIN,TMAX,PRCP&'
           'stations=PKM00041780&'
           'startDate=1942-05-06&'
           'endDate=2025-07-18&'
           'units=metric')
ncei_karachi_url

'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG,TMIN,TMAX,PRCP&stations=PKM00041780&startDate=1942-05-06&endDate=2025-07-18&units=metric'
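
As an alternative to typing the query string by hand, the same URL can be assembled from a dictionary of parameters with the standard library. This is only a sketch of one possible approach; the names `base_url`, `query_params`, and `built_url` are introduced here for illustration and are not used elsewhere in the notebook.

```python
from urllib.parse import urlencode

base_url = 'https://www.ncei.noaa.gov/access/services/data/v1'
query_params = {
    'dataset': 'daily-summaries',
    'dataTypes': ','.join(['TAVG', 'TMIN', 'TMAX', 'PRCP']),
    'stations': 'PKM00041780',
    'startDate': '1942-05-06',
    'endDate': '2025-07-18',
    'units': 'metric',
}

# safe=',' keeps the commas in dataTypes from being percent-encoded
built_url = f"{base_url}?{urlencode(query_params, safe=',')}"
print(built_url)
```

Keeping the parameters in a dictionary makes it easier to swap in a different station or date range without retyping the separators.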

In [45]:
project_dirname = 'ncei_karachi_data'  # Folder where the data will be saved
ncei_filename = 'karachi_climate_1942_2025.csv'

## Download and get started working with NCEI data

Go ahead and use `earthpy` to download data from your API URL. You could
also pass the URL straight to `pd.read_csv`, but using `earthpy` saves a
file and lets you work offline later on. If you didn’t already, you
should import the `earthpy` library **at the top of this notebook** so
that others who want to use your code can find it easily.
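
The idea of saving a download so that later runs can work offline can be sketched with the standard library alone. This is not how `earthpy` is implemented; `fetch_bytes` and `download_if_missing` are hypothetical helper names used only to show the pattern.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the raw response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def download_if_missing(url, path, fetch=fetch_bytes):
    """Save the URL's content to `path` unless the file already exists."""
    path = Path(path)
    if not path.exists():
        # Create the parent folder on first use, then write the download
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(url))
    return path
```

Calling `download_if_missing(url, 'some_folder/data.csv')` a second time returns the existing path without contacting the server, which is the behavior that makes offline work possible.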

In [46]:
project = earthpy.Project(dirname=project_dirname)
project.get_data(url=ncei_karachi_url, filename=ncei_filename)
ncei_path = project.project_dir / ncei_filename

# Load the data into a DataFrame
climate_df = pd.read_csv(ncei_path, parse_dates=['DATE'])


**Final Configuration Loaded:**
{}
Found 'data_home' in environment variables.


[('https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG,TMIN,TMAX,PRCP&stations=PKM00041780&startDate=1942-05-06&endDate=2025-07-18&units=metric', 'karachi_climate_1942_2025.csv', 'file')]
karachi_climate_1942_2025.csv
[PosixPath('/workspaces/data/ncei_karachi_data/karachi_climate_1942_2025.csv')]


In [47]:
# Load the CSV, parsing the DATE column as datetimes
climate_df = pd.read_csv(
    ncei_path,
    parse_dates=['DATE'],
    na_values=['NaN']
)

# The request used units=metric, so temperatures are already in °C and
# precipitation is already in mm. Copy them to unit-labeled columns
# without rescaling.
climate_df['TAVG_C'] = climate_df['TAVG']
climate_df['TMIN_C'] = climate_df['TMIN']
climate_df['TMAX_C'] = climate_df['TMAX']

climate_df['PRCP_mm'] = climate_df['PRCP']

climate_df.head()

Unnamed: 0,STATION,DATE,PRCP,TAVG,TMAX,TMIN,TAVG_C,TMIN_C,TMAX_C,PRCP_mm
0,PKM00041780,1942-05-06,,32.1,,,32.1,,,
1,PKM00041780,1942-05-07,0.0,29.6,32.4,,29.6,,32.4,0.0
2,PKM00041780,1942-05-08,0.0,30.5,33.0,28.0,30.5,28.0,33.0,0.0
3,PKM00041780,1942-05-09,0.0,30.2,32.4,28.0,30.2,28.0,32.4,0.0
4,PKM00041780,1942-05-10,0.0,29.9,31.9,28.0,29.9,28.0,31.9,0.0


# STEP -1: Wrap up

Don’t forget to store your variables so you can use them in other
notebooks! The cell below uses the `%store` magic to save the DataFrame,
the file path, and the directory and file names, one variable per line.

In [48]:
%store climate_df
%store ncei_path
%store project_dirname
%store ncei_filename

Stored 'climate_df' (DataFrame)
Stored 'ncei_path' (PosixPath)
Stored 'project_dirname' (str)
Stored 'ncei_filename' (str)


In [49]:
# print(climate_df.columns)

In [50]:
import pandas as pd
import hvplot.pandas
import panel as pn

pn.extension()

# Make sure the DATE column is in datetime format
climate_df['DATE'] = pd.to_datetime(climate_df['DATE'])

# Keep only rows with a valid average temperature
climate_df_clean = climate_df[climate_df['TAVG_C'].notna()].copy()
climate_df_clean['YEAR'] = climate_df_clean['DATE'].dt.year

# Compute the annual mean only for years that have data
annual_avg = climate_df_clean.groupby('YEAR')['TAVG_C'].mean().reset_index()

# Build a complete range of years from the first to the last
year_range = pd.DataFrame({
    'YEAR': range(climate_df_clean['YEAR'].min(),
                  climate_df_clean['YEAR'].max() + 1)
})

# Merge so that years without data show up as NaN
annual_avg_full = year_range.merge(annual_avg, on='YEAR', how='left')

# Plot
temp_plot = annual_avg_full.hvplot.line(
    x='YEAR',
    y='TAVG_C',
    title='Average Annual Temperature in Karachi (°C)',
    xlabel='Year',
    ylabel='Temperature (°C)',
    width=800,
    height=400,
    line_width=3,
    color='crimson',
    grid=True,
    hover=True
)
temp_plot
# (optional) Save as an HTML file
# hvplot.save(temp_plot, 'karachi_avg_temp_with_gaps.html')

### Observations on the Average Annual Temperature in Karachi, Pakistan

- **There is a general upward trend in average annual temperatures from the 1980s to 2023**  
  This may be linked to global climate change, as well as the urban expansion of Karachi, which contributes to the urban heat island effect and overall temperature increases.

- **There are clear gaps and abrupt jumps in the data, particularly between 1950 and 1970**  
  These anomalies are likely caused by missing or incomplete records, errors in historical data collection, or inconsistencies in temperature recording methods. The sharp drop to around 18 °C followed by a sudden rise around 1950 may also come from a year with only partial coverage (for example, mostly cooler months), which biases the annual mean without any real change in climate.
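
One way to probe whether a suspicious annual value comes from sparse records is to count how many daily observations each year contributes. The sketch below uses a small made-up series in place of the Karachi data; `demo_df`, the column `mean_if_covered`, and the 300-day threshold are assumptions made for illustration.

```python
import pandas as pd

# Toy daily series: 2001 is complete, 2002 has only January (cooler)
dates = pd.date_range('2001-01-01', '2001-12-31').append(
    pd.date_range('2002-01-01', '2002-01-31'))
demo_df = pd.DataFrame({'DATE': dates, 'TAVG_C': 25.0})
demo_df.loc[demo_df['DATE'].dt.year == 2002, 'TAVG_C'] = 18.0

# Annual mean alongside the number of days that went into it
yearly = demo_df.groupby(demo_df['DATE'].dt.year)['TAVG_C'].agg(
    mean_c='mean', n_days='count')

# Blank out means for years with too few observations
min_days = 300  # assumed threshold; choose one that suits your analysis
yearly['mean_if_covered'] = yearly['mean_c'].where(
    yearly['n_days'] >= min_days)
print(yearly)
```

In this toy case the 2002 mean of 18 °C rests on 31 days, so it is masked instead of being plotted next to a full-year value.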


In [51]:
# Years without data
all_years = pd.Series(range(climate_df_clean['YEAR'].min(), climate_df_clean['YEAR'].max()+1))
years_with_data = climate_df_clean['YEAR'].unique()
years_without_data = all_years[~all_years.isin(years_with_data)]
print("Years without data:", years_without_data.tolist())


Years without data: [1947, 1948, 1949, 1950, 1951, 1954, 1955, 1956, 1959, 1960, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 2013]
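
A long list of years is easier to read when consecutive years are collapsed into runs. The helper below is a small sketch for that; `group_consecutive` is a hypothetical name, and the `missing` list is copied from the output above.

```python
def group_consecutive(years):
    """Collapse a sorted list of years into (start, end) runs."""
    runs = []
    for year in years:
        if runs and year == runs[-1][1] + 1:
            # Extend the current run by one year
            runs[-1] = (runs[-1][0], year)
        else:
            # Start a new run
            runs.append((year, year))
    return runs


missing = [1947, 1948, 1949, 1950, 1951, 1954, 1955, 1956, 1959, 1960,
           1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972,
           2013]
print(group_consecutive(missing))
# [(1947, 1951), (1954, 1956), (1959, 1960), (1963, 1972), (2013, 2013)]
```

Seen this way, the record has four multi-year gaps between 1947 and 1972 plus a single missing year in 2013.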


In [52]:
%%capture
%%bash
jupyter nbconvert climate-Karachi-marangunic.ipynb --to markdown
jupyter nbconvert climate-Karachi-marangunic.ipynb --to html

Finally, be sure to `Restart` and `Run all` to make sure your notebook
works all the way through!