# Using the Python module `csv` with an existing CSV dataset

## Introduction to CSV in Python

For an introduction to the `csv` module, see the [Python Docs](https://docs.python.org/3/library/csv.html#module-csv).

This notebook guides you through working with CSV files using `csv.DictReader`.
We will use a publicly available dataset: "Global Temperature Data."

We start by importing the necessary libraries. Because the dataset is fetched from a URL, we need a few modules in addition to `csv`:

- `csv`: Used to handle reading and writing CSV files.
- `requests`: A third-party package used to fetch a CSV file from an external URL.
- `StringIO` (from the `io` module): Allows treating a string as a file-like object so it can be read as if it were a local file.

In [1]:
import csv
import requests
from io import StringIO
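
To make the role of `StringIO` concrete, here is a minimal sketch that uses only the standard library. The sample text is illustrative and is not downloaded from anywhere.

```python
from io import StringIO

# Illustrative text only; the real data is downloaded in the next section
text = "Source,Year,Mean\ngcag,1850-01,-0.6746\n"

buffer = StringIO(text)        # wrap the string in a file-like object
header = buffer.readline()     # read one line, just as with an open file
first_row = buffer.readline()

buffer.seek(0)                 # rewind to the start, as with a real file
all_lines = buffer.readlines()

print(repr(header))
print(repr(first_row))
print(len(all_lines))
```

Because the object supports the usual file methods, anything that expects an open text file, including the `csv` readers, can work with it.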

## Downloading a CSV File from a URL

- A CSV dataset is downloaded from the internet using the `requests` library.
- The `requests.get(url)` function returns a response object; its `text` attribute holds the file contents as a string.
- The first 500 characters of the CSV file are displayed to check that the data was retrieved correctly.

In [2]:
# Download a CSV file (Global Temperature Data) from an external URL
url = 'https://raw.githubusercontent.com/datasets/global-temp/master/data/monthly.csv'

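# Send a GET request to fetch the raw CSV data (give up after 30 seconds)
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop here if the server returned an error status
csv_data = response.text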

# Display the first 500 characters of the CSV data to ensure we fetched it
csv_data[:500]

'Source,Year,Mean\r\ngcag,1850-01,-0.6746\r\ngcag,1850-02,-0.3334\r\ngcag,1850-03,-0.5913\r\ngcag,1850-04,-0.5887\r\ngcag,1850-05,-0.5088\r\ngcag,1850-06,-0.3442\r\ngcag,1850-07,-0.1598\r\ngcag,1850-08,-0.2077\r\ngcag,1850-09,-0.3847\r\ngcag,1850-10,-0.5331\r\ngcag,1850-11,-0.2825\r\ngcag,1850-12,-0.4037\r\ngcag,1851-01,-0.2007\r\ngcag,1851-02,-0.4693\r\ngcag,1851-03,-0.6461\r\ngcag,1851-04,-0.5421\r\ngcag,1851-05,-0.1976\r\ngcag,1851-06,-0.1367\r\ngcag,1851-07,-0.0968\r\ngcag,1851-08,-0.1018\r\ngcag,1851-09,-0.0912\r\ngcag,1851-10,-0.0084'
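
The raw string above is hard to read because of the `\r\n` line endings. As a small sketch, the same check can be done line by line. The `csv_data` value here is a short sample copied from the output above, not a fresh download.

```python
# Short sample copied from the start of the output above
csv_data = 'Source,Year,Mean\r\ngcag,1850-01,-0.6746\r\ngcag,1850-02,-0.3334\r\n'

# splitlines() splits on both '\n' and '\r\n' and drops the line endings
preview = csv_data.splitlines()[:3]

for line in preview:
    print(line)
```

In the notebook you would apply `splitlines()` to the downloaded `csv_data` instead of the sample.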

## Using `csv.DictReader` to Read CSV Data

- The downloaded text is wrapped in a `StringIO` object so that the `csv` module can read it like a file.
- `csv.DictReader` reads the CSV data and returns each row as a dictionary where:
  - The keys are the column headers (`Source`, `Year`, `Mean`). Note that the `Year` field contains both the year and the month (e.g., `1850-01`).
  - The values are the corresponding row values, always as strings.
- The first 5 rows of the dataset (as dictionaries) are displayed for verification.

In [3]:
# Use `csv.DictReader` to read the CSV data directly from the text
data = StringIO(csv_data)  # Convert the text data to a file-like object

# DictReader takes the column names from the first line and yields each following row as a dictionary
reader = csv.DictReader(data)

# Convert to a list of dictionaries (each dictionary represents a row)
rows = list(reader)

# Display the first 5 rows of the dataset
rows[:5]

[{'Source': 'gcag', 'Year': '1850-01', 'Mean': '-0.6746'},
 {'Source': 'gcag', 'Year': '1850-02', 'Mean': '-0.3334'},
 {'Source': 'gcag', 'Year': '1850-03', 'Mean': '-0.5913'},
 {'Source': 'gcag', 'Year': '1850-04', 'Mean': '-0.5887'},
 {'Source': 'gcag', 'Year': '1850-05', 'Mean': '-0.5088'}]
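
Calling `list(reader)` loads every row into memory at once. For a quick look at the data, a reader can also be consumed lazily. The sketch below assumes a small inline sample copied from the output above and uses `itertools.islice` to take only the first two rows.

```python
import csv
from io import StringIO
from itertools import islice

# Small sample copied from the output above; the real dataset is much longer
sample = (
    "Source,Year,Mean\r\n"
    "gcag,1850-01,-0.6746\r\n"
    "gcag,1850-02,-0.3334\r\n"
    "gcag,1850-03,-0.5913\r\n"
)

reader = csv.DictReader(StringIO(sample))

# The header row is available as reader.fieldnames
column_names = reader.fieldnames
print(column_names)

# islice stops after two rows instead of reading the whole input
first_two = list(islice(reader, 2))
print(first_two)
```

The reader keeps its position, so iterating over it again would continue with the third row rather than start over.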

## Accessing Specific Data

- This section builds a list of tuples, each containing:
  - A year and month (e.g., "2000-01").
  - The mean value recorded for that month.
- The first 5 entries of the extracted data are displayed.

In [5]:
# Let's access specific columns from the dataset, for example, 'Year' and 'Mean'
dates_and_temperatures = [(row['Year'], row['Mean']) for row in rows]

# Display the first 5 entries of the extracted data
dates_and_temperatures[:5]

[('1850-01', '-0.6746'),
 ('1850-02', '-0.3334'),
 ('1850-03', '-0.5913'),
 ('1850-04', '-0.5887'),
 ('1850-05', '-0.5088')]
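
Note that both elements of each tuple are strings, because the `csv` module does not convert types. The following sketch shows one possible way to turn them into numbers. The helper `parse_entry` is a hypothetical name introduced here for illustration, and the input is a short sample copied from the output above.

```python
# Two (year-month, mean) string pairs copied from the output above
dates_and_temperatures = [('1850-01', '-0.6746'), ('1850-02', '-0.3334')]


def parse_entry(year_month, mean):
    """Hypothetical helper: split 'YYYY-MM' and convert every part to a number."""
    year, month = year_month.split('-')
    return int(year), int(month), float(mean)


parsed = [parse_entry(year_month, mean) for year_month, mean in dates_and_temperatures]
print(parsed)
```

With numeric values, comparisons and arithmetic such as averages behave as expected, which is not the case for the original strings.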

## Filtering Data

In this example we will extract the data for a specific year.

- The dataset is filtered to include only rows where the `Year` field starts with "2000" (i.e., data from the year 2000).
- This is done with a list comprehension that checks `row['Year'].startswith('2000')`.
- The first 5 rows from the filtered data are displayed.

In [6]:
# Filter the rows to get only data from the year 2000
filtered_data_2000 = [row for row in rows if row['Year'].startswith('2000')]

# Display the first 5 rows from 2000
filtered_data_2000[:5]

[{'Source': 'GISTEMP', 'Year': '2000-01', 'Mean': '0.24'},
 {'Source': 'gcag', 'Year': '2000-01', 'Mean': '0.1813'},
 {'Source': 'GISTEMP', 'Year': '2000-02', 'Mean': '0.56'},
 {'Source': 'gcag', 'Year': '2000-02', 'Mean': '0.4992'},
 {'Source': 'GISTEMP', 'Year': '2000-03', 'Mean': '0.55'}]
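
The output shows that the dataset contains two sources (`GISTEMP` and `gcag`) for the same month. As a sketch of a second filter condition, the rows can be narrowed to one source and their values averaged. The rows below are copied from the output above, so the average covers only these sample rows, not the whole year.

```python
from statistics import mean

# Rows copied from the filtered output above (a sample, not the full year)
filtered_data_2000 = [
    {'Source': 'GISTEMP', 'Year': '2000-01', 'Mean': '0.24'},
    {'Source': 'gcag', 'Year': '2000-01', 'Mean': '0.1813'},
    {'Source': 'GISTEMP', 'Year': '2000-02', 'Mean': '0.56'},
    {'Source': 'gcag', 'Year': '2000-02', 'Mean': '0.4992'},
    {'Source': 'GISTEMP', 'Year': '2000-03', 'Mean': '0.55'},
]

# Keep only the rows that come from one source
gistemp_2000 = [row for row in filtered_data_2000 if row['Source'] == 'GISTEMP']

# Convert the string values to floats before doing arithmetic
gistemp_values = [float(row['Mean']) for row in gistemp_2000]

print(len(gistemp_2000))
print(round(mean(gistemp_values), 2))
```

Both conditions could also be combined in a single list comprehension with `and`.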

## Writing Data

We will write the data to a new CSV file using `csv.DictWriter`.

- The filtered data from the year 2000 is written to a new CSV file (`filtered_2000_temperatures.csv`).
- `csv.DictWriter` is used to write the data:

   - The column names (`'Year'` and `'Mean'`) are defined.
   - The header row is written to the file.
   - Each row from the filtered dataset is written to the file, without its `Source` field.

- After writing, the directory listing is checked to verify that the new CSV file was created.

In [10]:
# Let's write the filtered data from 2000 to a new CSV file using `csv.DictWriter`
outputfile = 'filtered_2000_temperatures.csv'

# Write the filtered data to a new CSV file

with open(outputfile, mode='w', newline='') as file:
    fieldnames = ['Year', 'Mean']
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    # Write the header (fieldnames)
    writer.writeheader()

    # Write the rows
    for row in filtered_data_2000:
        writer.writerow({'Year': row['Year'], 'Mean': row['Mean']})

# Verify the new file was created
import os
print(os.listdir())

# Or, using the os.path.exists() function
os.path.exists(outputfile)

['.config', 'filtered_2000_temperatures.csv', 'sample_data']


True
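
Checking that the file exists does not show what it contains. The sketch below is one possible variation: it lets `csv.DictWriter` drop the unwanted `Source` column through `extrasaction='ignore'`, writes all rows with `writerows`, and reads the file back. The file name `example_2000.csv` and the two sample rows are assumptions made for this example, and a temporary directory is used so that no notebook files are touched.

```python
import csv
import os
import tempfile

# Two sample rows copied from the filtered output earlier in this notebook
rows_2000 = [
    {'Source': 'GISTEMP', 'Year': '2000-01', 'Mean': '0.24'},
    {'Source': 'gcag', 'Year': '2000-01', 'Mean': '0.1813'},
]

# A temporary directory keeps this sketch from touching the notebook's files
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example_2000.csv')  # hypothetical file name

    with open(path, mode='w', newline='') as file:
        # extrasaction='ignore' skips keys that are not in fieldnames ('Source' here)
        writer = csv.DictWriter(file, fieldnames=['Year', 'Mean'], extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows_2000)

    # Read the file back to confirm what was actually written
    with open(path, newline='') as file:
        written = list(csv.DictReader(file))

print(written)
```

Without `extrasaction='ignore'`, `DictWriter` raises a `ValueError` when a row contains a key that is not listed in `fieldnames`, which is why the cell above builds a new dictionary for each row.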

## Conclusion and Further Exploration

In this notebook, we've learned how to:
- Download a CSV file from a URL with `requests`, read it with `csv.DictReader`, and convert it to a list of dictionaries.
- Access and filter data based on specific conditions.
- Write filtered data to a new CSV file using `csv.DictWriter`.

You can further explore advanced CSV operations such as merging CSV files, data cleaning, and processing larger datasets.

For simply reading and writing Excel files you may want to explore `openpyxl`, see [Openpyxl](https://pypi.org/project/openpyxl/).

However, pandas can also read and write CSV files and Excel spreadsheets. In the next section we will explore pandas.

### Learning Goals:
- **Using `csv.DictReader`**: Read CSV data as dictionaries, where each row is represented as a dictionary with column names as keys.
- **Filtering Data**: Filter rows based on specific conditions (e.g., extracting data for the year 2000).
- **Writing CSV Files**: Write the filtered data to a new CSV file using `csv.DictWriter`.

### Estimated Time: 2 Hours Max

- **1st Hour**: Understand the basics of `csv.DictReader`, and practice reading and filtering the data.
- **2nd Hour**: Learn to manipulate and write data to a new CSV file.

This notebook uses a real-world dataset and demonstrates practical use cases for working with CSV files using the **`csv`** module in Python.