# Climate Coding Challenge

Climate change is impacting the way people live around the world.

# STEP 2: Wrangle your data

# STEP 0: Set up

To get started on this notebook, you’ll need to restore any variables
from previous notebooks to your workspace. To save time and memory, make
sure to specify which variables you want to load.

In [1]:
%store -r

You will also need to import any libraries you are using in this
notebook, since they won’t carry over from the previous notebook:

In [2]:
# Import libraries

## Python **packages** let you use code written by experts around the world

Because Python is open source, lots of different people and
organizations can contribute (including you!). Many contributions are in
the form of **packages** which do not come with a standard Python
download.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-read"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Read More: Packages need to be installed and imported.</div></div><div class="callout-body-container callout-body"><p>Learn more about using Python packages. How do you find and use
packages? What is the difference between installing and importing
packages? When do you need to do each one? <a
href="https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/">This
article on Python packages</a> will walk you through the basics.</p></div></div>

In the cell below, someone was trying to import the **pandas package**,
which helps us to work with [**tabular data** such as comma-separated
value or csv
files](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-text-files/).

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Import packages</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Correct the typo below to properly import the pandas package under
its <strong>alias</strong> pd.</li>
<li>Run the cell to import the libraries you’ll need for this
workflow.</li>
</ol></div></div>

> **Warning**
>
> Make sure to run your code in the right **environment** to avoid
> import errors!
>
> We’ve created a coding **environment** for you to use that already has
> all the software and packages you will need! When you try to run some
> code, you may be prompted to select a **kernel**. The **kernel**
> refers to the version of Python you are using. You should use the
> **base** kernel, which should be the default option for you.

In [3]:
# Import libraries
import holoviews as hv
import hvplot.pandas
import pandas as pd
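
If aliases are new to you, here is a minimal sketch of what `import pandas as pd` does. It only assumes that `pandas` is installed in your environment; the alias `pd` is a shorter name for the same package.

```python
import pandas as pd

# The alias pd points at the pandas package itself,
# so pd.read_csv is the same function as pandas.read_csv
print(pd.__name__)
```

Using the conventional alias `pd` makes your code easier for other pandas users to read.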

## Download the practice data

Next, let’s download some climate data from Karachi, Pakistan to practice
with. We keep our practice data on a website called
[figshare](https://figshare.com), so that we can check that it still
works and make sure it looks just like the data you would download from
the original source. Later, you’ll learn how to use our `earthpy`
package to download and manage data, and also how to download raw data
using **APIs**…but for now we’ll keep things simple and use a URL.

The cell below contains the starting point for the URL for the data you will
use in this part of the notebook. There are three things to notice about
the URL code:

1.  It is surrounded by quotes – that means Python will interpret it as
    a `string`, or text, type, which makes sense for a URL.
2.  The URL is too long to display as one line on most screens. We’ve
    put parentheses around it so that we can easily split it into
    multiple lines by writing two strings – one on each line.
3.  We replaced the figshare identifier for this dataset with
    `'FIGSHARE_ID_HERE'`. You’ll have to replace that with the real
    identifier, 55245161.

However, we still have a problem - we can’t get the URL back later on
because it isn’t saved in a **variable**. In other words, we need to
give the URL a **name** so that we can request it from Python later
(sadly, Python has no ‘hey what was that thingy I typed yesterday?’
function).

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-read"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Read More: Names/variables in Python</div></div><div class="callout-body-container callout-body"><p>One of the most common challenges for new programmers is making sure
that your results are stored so you can use them again. In Python, this
is called <strong>naming</strong>, or saving a
<strong>variable</strong>. Learn more in this <a
href="https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/get-started-using-python/variables/">hands-on
activity on using variables</a> from our learning portal.</p></div></div>

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Save the URL for later</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Replace <code>FIGSHARE_ID_HERE</code> with the figshare id,
<code>55245161</code>.</li>
<li>Pick an expressive variable name for the URL, and then save the URL
using the equals sign <code>=</code>. For example, you could save the
value <code>1</code> to the name <code>a_number</code> using the code
<code>a_number = 1</code>.</li>
<li>At the end of the cell where you define your url variable,
<strong>call your variable (type out its name)</strong> so you can see
what it is.</li>
</ol></div></div>

In [4]:
climate_url = 'https://figshare.com/ndownloader/files/55245161'
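
The parentheses trick described above can be sketched like this. The name `climate_url_split` is only an illustration; the point is that Python joins string literals that sit next to each other inside parentheses into one string.

```python
# Two adjacent strings inside parentheses become a single string
climate_url_split = (
    'https://figshare.com/ndownloader/files/'
    '55245161'
)

# Typing the variable name (or printing it) shows the joined URL
print(climate_url_split)
```

Note that there is no comma between the two strings. A comma would create a tuple of two strings instead of one URL.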

The `pandas` library you imported can download data from the internet
directly into a type of Python **object** called a `DataFrame`. In the
code cell below, you can see an attempt to do just this. But there are
some problems…

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Fix some code!</div></div><div class="callout-body-container callout-body"><ol type="1">
<li><p>Leave a space between the <code>#</code> and text in the comment,
capitalize it, and try to make it more informative</p></li>
<li><p>Make any changes needed to get this code to run. HINT: The
<code>my_url</code> variable doesn’t exist - you need to replace it with
the variable name <strong>you</strong> chose.</p></li>
<li><p>Modify the <code>.read_csv()</code> function call to include the
following parameters:</p>
<ul>
<li><code>index_col='DATE'</code> – this sets the <code>DATE</code>
column as the index, which is needed for subsetting and resampling later on</li>
<li><code>parse_dates=True</code> – this tells <code>pandas</code> to
parse the values in the index column as <strong>date time objects</strong>,
which is what you want for time-series data</li>
<li><code>na_values=['NaN']</code> – this tells <code>pandas</code>
which values in the file to treat as missing</li>
</ul></li>
<li><p>Clean up the code by using <strong>expressive variable
names</strong>, <strong>expressive column names</strong>, <strong>PEP-8
compliant code</strong>, and <strong>descriptive
comments</strong></p></li>
</ol></div></div>

In [5]:
# Download the climate data into a DataFrame
climate_df = pd.read_csv(
    climate_url,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])
climate_df

| DATE | STATION | TAVG |
|---|---|---|
| 1942-10-01 | PKM00041780 | 81 |
| 1942-10-02 | PKM00041780 | 81 |
| 1942-10-03 | PKM00041780 | 84 |
| 1942-10-04 | PKM00041780 | 84 |
| 1942-10-05 | PKM00041780 | 84 |
| ... | ... | ... |
| 2024-09-26 | PKM00041780 | 87 |
| 2024-09-27 | PKM00041780 | 87 |
| 2024-09-28 | PKM00041780 | 86 |
| 2024-09-29 | PKM00041780 | 87 |
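
To see what the three `read_csv` parameters do without downloading anything, here is a small sketch that reads from text held in memory. The sample rows, the column order, and the names `sample_csv` and `sample_df` are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample that mimics the columns of the practice data
sample_csv = io.StringIO(
    "STATION,DATE,TAVG\n"
    "PKM00041780,1942-10-01,81\n"
    "PKM00041780,1942-10-02,NaN\n"
    "PKM00041780,1942-10-03,84\n"
)

sample_df = pd.read_csv(
    sample_csv,
    index_col='DATE',    # use the DATE column as the row labels
    parse_dates=True,    # parse the index values as dates
    na_values=['NaN'])   # treat the text 'NaN' as a missing value

# The index holds dates, and the missing temperature is counted as missing
print(type(sample_df.index).__name__)
print(sample_df['TAVG'].isna().sum())
```

Because the index holds real dates rather than text, you can select by date later on, for example `sample_df.loc['1942-10-03']`.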


> **Tip**
>
> Check out the `type()` function below - you can use it to check that
> your data is now a `DataFrame` object.

In [6]:
# Check that the data was imported into a pandas DataFrame
type(climate_df)

pandas.core.frame.DataFrame
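
If you want a yes-or-no answer instead of reading the type name yourself, one option is Python's built-in `isinstance()` function. This is a minimal sketch; `example_df` is a made-up one-row table used only so there is something to check.

```python
import pandas as pd

# Hypothetical one-row table, used only to have something to inspect
example_df = pd.DataFrame({'TAVG': [81]})

# isinstance() returns True when the object is of the given type
is_dataframe = isinstance(example_df, pd.DataFrame)
print(is_dataframe)
```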

## Clean up your `DataFrame`

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Get rid of unwanted columns</div></div><div class="callout-body-container callout-body"><p>You can use <strong>double brackets</strong> (<code>[[</code> and
<code>]]</code>) to select only the columns that you want from your
<code>DataFrame</code>:</p>
<ol type="1">
<li>Change <code>some_column_name</code> to the Temperature column
name.</li>
<li>Give the <code>DataFrame</code> a more descriptive name.</li>
<li>Add a properly formatted comment to describe what this code is
doing.</li>
</ol></div></div>

> **Warning**
>
> Column names are text values, not variable names, so you need to put
> them in quotes!

In [7]:
# Keep only the temperature column
climate_df = climate_df[['TAVG']]
climate_df

| DATE | TAVG |
|---|---|
| 1942-10-01 | 81 |
| 1942-10-02 | 81 |
| 1942-10-03 | 84 |
| 1942-10-04 | 84 |
| 1942-10-05 | 84 |
| ... | ... |
| 2024-09-26 | 87 |
| 2024-09-27 | 87 |
| 2024-09-28 | 86 |
| 2024-09-29 | 87 |
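
You might wonder why the brackets are doubled. The sketch below compares double and single brackets on a tiny made-up table; `example_df` and its two rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-row table standing in for the climate data
example_df = pd.DataFrame(
    {'STATION': ['PKM00041780', 'PKM00041780'], 'TAVG': [81, 84]},
    index=pd.to_datetime(['1942-10-01', '1942-10-03']))

# Double brackets: a list of column names, which returns a DataFrame
temperature_df = example_df[['TAVG']]

# Single brackets: one column name, which returns a Series
temperature_series = example_df['TAVG']

print(type(temperature_df).__name__)
print(type(temperature_series).__name__)
```

The inner brackets make a Python list of column names, so you can select more than one column by adding names to that list.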


Great work! You finished the data cleaning section of this coding
challenge. You can go on to the next notebook – but first, make sure to
store the climate `DataFrame` you made so that you can use it in the
next notebooks. We’ll do this using the [IPython `store` **line
magic**](https://ipython.readthedocs.io/en/stable/config/extensions/storemagic.html).
[Magic
commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html)
aren’t part of your regular code; they’re there to help you with coding
in Jupyter Notebooks. You can tell it’s a line magic because it
starts with a single `%`:

In [12]:
%store climate_df

# STEP -1: Wrap up

Don’t forget to store your variables so you can use them in other
notebooks! Replace `var1` and `var2` with the variables you want to save,
separated by spaces.

In [8]:
%store climate_url climate_df

Stored 'climate_url' (str)
Stored 'climate_df' (DataFrame)


Finally, be sure to `Restart` and `Run all` to make sure your notebook
works all the way through!