# Exercise 0: Learning the Jupyter Environment for Python Programming

---



Welcome to the first exercise for the Data-Mining for Environmental Science (DMES) course! The goal of this exercise is to get you comfortable with the Jupyter Notebook environment. Jupyter is a powerful tool that allows you to write and execute Python code, add explanatory text, and visualize data all in one place. 👩‍💻

## Task 1: Working with Jupyter Notebooks

Jupyter notebooks are made up of **cells**. There are two main types of cells: **Code** cells and **Markdown** cells.

A **Code** cell contains Python code that you can execute. To run a code cell, you can either click the "Run" button in the toolbar above or press `Shift + Enter` on your keyboard.

A **Markdown** cell contains text, like this one. It's used for explanations, headings, and instructions. You can format text using Markdown syntax (e.g., `#` for headings, `**` for bold text). After editing a Markdown cell, run it (`Shift + Enter`) to see the formatted text.

### 1.1 Your First Code Cell

Let's start with a classic "Hello, World!" example. The cell below is a code cell. Click on it and press `Shift + Enter` to run it.

In [None]:
print("Hello, World! This is my first Python command.")

### 1.2 Basic Calculations

You can use code cells as a calculator. Try running the following cells to perform some basic arithmetic.

In [None]:
# Addition and Subtraction
result = 10 + 5 - 3
print(result)

In [None]:
# Multiplication and Division
product = 7 * 6
quotient = 100 / 8
print("Product:", product)
print("Quotient:", quotient)
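
Beyond the four basic operations, Python has a few more arithmetic operators worth knowing. The short sketch below shows floor division, the remainder, and exponentiation, and that `/` always returns a float. The variable names are illustrative and not part of the exercise.

```python
# Illustrative values, not part of the original exercise
total = 100
parts = 8

true_division = total / parts    # "/" always returns a float
floor_division = total // parts  # "//" rounds down to a whole number
remainder = total % parts        # "%" gives what is left over
power = 2 ** 10                  # "**" is exponentiation

print("True division:", true_division)
print("Floor division:", floor_division)
print("Remainder:", remainder)
print("Power:", power)
```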

## Task 2: Loading Data

A major part of data mining is, of course, working with data! We will use the **pandas** library, which is the standard tool for data manipulation and analysis in Python. We'll also use **NumPy** for numerical operations.

First, we need to import these libraries. It's a convention to import `pandas` as `pd` and `numpy` as `np`.

In [None]:
import pandas as pd
import numpy as np

### 2.1 Loading a CSV file

For this exercise, we'll use a sample dataset of air quality measurements. The data is stored in a CSV (Comma-Separated Values) file. We'll load it from a URL into a pandas DataFrame.

A **DataFrame** is a 2-dimensional labeled data structure with columns of potentially different types, similar to a spreadsheet.

In [None]:
# URL of the raw CSV file from a GitHub repository
url = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/air_quality_no2.csv'

# Load the data into a pandas DataFrame
air_quality = pd.read_csv(url, index_col=0, parse_dates=True)

# Display the first 5 rows of the DataFrame
print("First 5 rows of the dataset:")
air_quality.head()
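
To get a feel for what a DataFrame is without depending on the download, you can also build a small one by hand. The sketch below uses made-up values and hypothetical column names (`station_a`, `station_b`); it is not the air quality dataset.

```python
import pandas as pd

# Made-up measurements for two hypothetical stations (not real data)
dates = pd.date_range("2024-01-01", periods=3, freq="D")
toy = pd.DataFrame(
    {"station_a": [20.0, 25.5, 23.0], "station_b": [30.0, 28.5, 31.0]},
    index=dates,  # the dates become the row labels
)

print(toy)
print("Shape (rows, columns):", toy.shape)
print("Column names:", list(toy.columns))
```

Each column has a name, each row has a label (here a date), and every column can hold its own data type.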

### 2.2 Inspecting the Data

Once the data is loaded, it's good practice to inspect it to understand its structure.
- `.head()` shows the first few rows.
- `.tail()` shows the last few rows.
- `.info()` provides a concise summary of the DataFrame.
- `.describe()` shows summary statistics for the numerical columns.

In [None]:
# Display the last 5 rows
print("Last 5 rows of the dataset:")
air_quality.tail()

In [None]:
# Get information about the DataFrame
print("Information about the dataset:")
air_quality.info()

In [None]:
# Get summary statistics
print("Summary statistics of the dataset:")
air_quality.describe()
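
Two further checks are often useful at this stage: the dimensions of the table and the number of missing values per column. The sketch below demonstrates them on a small made-up DataFrame (the `toy` frame and its column names are hypothetical); the same calls should work on `air_quality`.

```python
import numpy as np
import pandas as pd

# Made-up values with one gap (NaN) to illustrate missing data
toy = pd.DataFrame(
    {
        "station_a": [20.0, np.nan, 23.0, 26.0],
        "station_b": [30.0, 28.5, 31.0, 29.5],
    }
)

n_rows, n_cols = toy.shape  # dimensions of the table
missing = toy.isna().sum()  # number of NaN values per column
means = toy.mean()          # NaN values are skipped by default

print("Rows:", n_rows, "Columns:", n_cols)
print("Missing values per column:")
print(missing)
print("Column means:")
print(means)
```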

## Task 3: Basic Plotting

Visualizing data is crucial for understanding it. We will use the **matplotlib** library, which is the most common plotting library in Python. By convention, its `pyplot` module is imported as `plt`.

In [None]:
import matplotlib.pyplot as plt

### 3.1 Creating a Simple Plot

Let's create a line plot of the data to see how the NO2 concentration changes over time. Calling `.plot()` on the whole DataFrame draws one line per column, that is, one line per station.

In [None]:
# Create a plot
air_quality.plot(figsize=(12, 6)) # figsize sets the size of the plot

# Add title and labels
plt.title('Air Quality in Europe')
plt.xlabel('Date')
plt.ylabel('NO2 Concentration (µg/m³)')

# Show the plot
plt.show()
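
If you want to look at one station only, you can select a single column before plotting. The sketch below uses a small made-up DataFrame so that it runs on its own (the values and the name `toy` are hypothetical); with the real data you would select `air_quality["station_antwerp"]` instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily values for illustration only (not real measurements)
index = pd.date_range("2024-01-01", periods=5, freq="D")
toy = pd.DataFrame({"station_antwerp": [20.0, 22.5, 25.0, 23.5, 21.0]}, index=index)

# Selecting one column gives a Series; its plot method returns a matplotlib Axes
ax = toy["station_antwerp"].plot(figsize=(12, 6))
ax.set_title("NO2 at one station (made-up values)")
ax.set_xlabel("Date")
ax.set_ylabel("NO2 Concentration (µg/m³)")

plt.show()
```

Working with the returned `ax` object instead of the `plt.title(...)` style calls is a matter of preference; both approaches label the same plot.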

### 3.2 Creating a Scatter Plot

A scatter plot is useful for visualizing the relationship between two variables. Let's see if there's a relationship between the measurements at the Paris and Antwerp stations.

In [None]:
# Create a scatter plot
air_quality.plot.scatter(x="station_paris", y="station_antwerp", figsize=(8, 8))

# Add title and labels
plt.title('NO2 Concentration: Paris vs. Antwerp')
plt.xlabel('Paris (µg/m³)')
plt.ylabel('Antwerp (µg/m³)')

# Show the plot
plt.show()
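
A scatter plot shows a relationship visually; the Pearson correlation coefficient summarizes its linear part in a single number between -1 and 1. The sketch below uses made-up values (the `toy` frame is hypothetical) and pandas' `Series.corr`, which ignores rows where either value is missing. With the real data, the equivalent call would be `air_quality["station_paris"].corr(air_quality["station_antwerp"])`.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; one Paris value is missing
toy = pd.DataFrame(
    {
        "station_paris": [10.0, 20.0, 30.0, np.nan, 50.0],
        "station_antwerp": [12.0, 25.0, 29.0, 40.0, 55.0],
    }
)

# Pearson correlation, computed only on rows where both values are present
r = toy["station_paris"].corr(toy["station_antwerp"])
print("Pearson correlation:", round(r, 3))
```

A value close to 1 means the two stations tend to rise and fall together; a value near 0 means there is no clear linear relationship.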

## Congratulations! 🎉

You have completed Exercise 0. You now know how to:
- Work with code and markdown cells in a Jupyter notebook.
- Load data from a CSV file (here, from a URL) into a pandas DataFrame.
- Inspect the basic properties of your data.
- Create simple line and scatter plots.

You are now ready to move on to the next exercises where we will dive deeper into data analysis and mining techniques.