# Climate coding challenge, Part 6

Getting your own data

## There are more Earth Observation data online than any one person could ever look at

[NASA’s Earth Observing System Data and Information System (EOSDIS)
alone manages over 9PB of
data](https://www.earthdata.nasa.gov/learn/articles/getting-petabytes-people-how-eosdis-facilitates-earth-observing-data-discovery-and-use).
1 PB is roughly 100 times the entire Library of Congress (a good
approximation of all the books available in the US). It’s all available
to **you** once you learn how to download what you want.

Here we’re using the NOAA National Centers for Environmental Information
(NCEI) [Access Data
Service](https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation)
application programming interface (API) to request data from their web
servers. We will be using data collected as part of the Global
Historical Climatology Network daily (GHCNd) program, which is available
through NOAA’s [Climate Data
Online library](https://www.ncdc.noaa.gov/cdo-web/datasets).

For this example we’re requesting [daily summary data in
**Boulder, CO** (station ID
**USC00050848**)](https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USC00050848/detail).

> **Your task:**
>
> 1.  Research the [**Global Historical Climatology Network -
>     Daily**](https://www.ncei.noaa.gov/metadata/geoportal/rest/metadata/item/gov.noaa.ncdc:C00861/html)
>     data source.
> 2.  In the cell below, write a 2-3 sentence description of the data
>     source.
> 3.  Include a citation of the data (**HINT:** See the ‘Data Citation’
>     tab on the GHCNd overview page).
>
> Your description should include:
>
> -   who takes the data
> -   where the data were taken
> -   what the maximum temperature units are
> -   how the data are collected

### The Global Historical Climatology Network - Daily

The GHCN is a global network of various meteorological measurements made on land, mostly precipitation. In this case, the observations are daily, and U.S.-based observations are updated daily in most cases. While the GHCN provides an abundance of historical meteorological data, its direct use for understanding climate change is limited because the data as collected do not meet climate monitoring standards. Therefore, some data cleaning is required before interpretation.

*Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeill, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset used]. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ [2024 Sep 17].*

## Access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️

The cell below contains the URL for the data you will use in this part
of the notebook. We created this URL by generating what is called an
**API endpoint** using the NCEI [API
documentation](https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation).

> **Note**
>
> An **application programming interface** (API) is a way for two or
> more computer programs or components to communicate with each other.
> It is a type of software interface, offering a service to other pieces
> of software ([Wikipedia](https://en.wikipedia.org/wiki/API)).

First things first – you will need to import the `pandas` library to
access NCEI data through its URL:

In [4]:
# Import required packages
import pandas as pd

> **Your task:**
>
> 1.  Pick an expressive variable name for the URL.
> 2.  Reformat the URL so that it adheres to the [79-character PEP-8
>     line
>     limit](https://peps.python.org/pep-0008/#maximum-line-length). You
>     should see two vertical lines in each cell - don’t let your code
>     go past the second line.
> 3.  At the end of the cell where you define your URL variable, **call
>     your variable (type out its name)** so it can be tested.

In [11]:
GHCN_URL = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries&dataTypes=TOBS,PRCP'
    '&stations=USC00050848'
    '&startDate=1893-10-01&endDate=2023-09-30')
GHCN_URL

'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00050848&startDate=1893-10-01&endDate=2023-09-30'
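
The URL above is a base address followed by a `?` and a series of `key=value` query parameters joined with `&`. As a minimal sketch of an alternative way to assemble it, the cell below builds the same string from a dictionary using the standard library's `urllib.parse.urlencode`. The names `BASE_URL`, `params`, and `built_url` are illustrative and not part of the assignment; the parameter values are copied from the URL above.

```python
from urllib.parse import urlencode

# Base address of the endpoint (everything before the '?' in the URL above)
BASE_URL = 'https://www.ncei.noaa.gov/access/services/data/v1'

# Query parameters, copied from the URL above
params = {
    'dataset': 'daily-summaries',
    'dataTypes': 'TOBS,PRCP',
    'stations': 'USC00050848',
    'startDate': '1893-10-01',
    'endDate': '2023-09-30',
}

# safe=',' keeps the comma in 'TOBS,PRCP' from being percent-encoded
built_url = BASE_URL + '?' + urlencode(params, safe=',')
built_url
```

Keeping the parameters in a dictionary can make it easier to swap in a different station ID or date range later without editing one long string.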

------------------------------------------------------------------------

## Download and get started working with NCEI data

Just like you did with the practice data, go ahead and use pandas to
import data from your API URL into Python. If you didn’t do it already,
you should import the pandas library **at the top of this notebook** so
that others who want to use your code can find it easily.

In [13]:
# Import data into Python from NCEI API
climate_df = pd.read_csv(
    GHCN_URL,
    na_values=['NaN'])

climate_df

|  | STATION | DATE | PRCP | TOBS |
|---|---|---|---|---|
| 0 | USC00050848 | 1893-10-01 | 239.0 |  |
| 1 | USC00050848 | 1893-10-02 | 0.0 |  |
| 2 | USC00050848 | 1893-10-03 | 0.0 |  |
| 3 | USC00050848 | 1893-10-04 | 10.0 |  |
| 4 | USC00050848 | 1893-10-05 | 0.0 |  |
| ... | ... | ... | ... | ... |
| 45966 | USC00050848 | 2023-09-26 | 0.0 | 233.0 |
| 45967 | USC00050848 | 2023-09-27 | 0.0 | 206.0 |
| 45968 | USC00050848 | 2023-09-28 | 0.0 | 228.0 |
| 45969 | USC00050848 | 2023-09-29 | 0.0 | 189.0 |
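
In the output above, the dates are an ordinary column and the row labels are just integers. One possible next step, sketched below, is to have `pd.read_csv` parse the `DATE` column and use it as the index. To keep the sketch runnable without a network connection, it reads a few rows copied from the output above out of an in-memory string instead of `GHCN_URL`. The unit conversion assumes the values are reported in tenths of millimeters (`PRCP`) and tenths of degrees Celsius (`TOBS`); check the GHCNd documentation before relying on that. The names `sample_csv`, `sample_df`, `PRCP_MM`, and `TOBS_C` are hypothetical.

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for the API response
sample_csv = io.StringIO(
    'STATION,DATE,PRCP,TOBS\n'
    'USC00050848,1893-10-01,239.0,\n'
    'USC00050848,1893-10-04,10.0,\n'
    'USC00050848,2023-09-26,0.0,233.0\n'
    'USC00050848,2023-09-29,0.0,189.0\n')

# Use the DATE column as the index and parse it into datetime values
sample_df = pd.read_csv(
    sample_csv,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])

# Assumption: PRCP is in tenths of mm and TOBS is in tenths of degrees C
sample_df['PRCP_MM'] = sample_df['PRCP'] / 10
sample_df['TOBS_C'] = sample_df['TOBS'] / 10
sample_df
```

With the real data, the same `index_col` and `parse_dates` arguments could be passed alongside `GHCN_URL` in place of `sample_csv`. The empty `TOBS` fields in the early rows are read as missing values (`NaN`).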
