![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdata-dunkers%2Flessons&branch=main&subPath=data-from-csv.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>
<a href="https://colab.research.google.com/github/data-dunkers/lessons/blob/main/data-from-csv.ipynb" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"/></a>

# Getting Data From a CSV File

The corresponding Activity Notebook for this Lesson Notebook can be found [here](https://github.com/Data-Dunkers/activities/blob/main/data-from-csv.ipynb).

## Objectives

By the end of this lesson, students will be able to:
- Load and view data from a CSV (comma-separated values) file. *(Example: Load and view a dataset of X and Y values from a CSV file to understand its structure.)*
- Understand the importance of correctly identifying and using data columns. *(Example: Identify the 'Season' and 'Points' columns to visualize a player’s performance over time.)*
- Visualize data effectively using simple line plots. *(Example: Create a line plot showing the relationship between X and Y values from the CSV data.)*

## Introduction

There are many ways we can import data, but the most common sources are the code itself, a CSV (comma-separated values) file, an Excel spreadsheet, a Google Sheet, or a webpage.

In this demo, we will show how to get data from a CSV file.

## IPO: Setup & Input

In the previous lesson, we got our data from within the Jupyter Notebook itself.

A more common method is to get the data from *outside* the code, often from a **CSV** file.

In this example code, we first import the **Pandas** library using `import pandas as pd` (we still need `plotly.express` for plotting, so we import it as well). We then use the `pd.read_csv()` function to read the [CSV file](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv) into a **Pandas DataFrame**.

Note that this time we store the web address in a variable called `url`. This often makes the code easier to read.

In [None]:
# Import necessary libraries
import plotly.express as px
import pandas as pd

# Read the CSV file into a DataFrame named df
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv'
df = pd.read_csv(url)
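`pd.read_csv()` also accepts a local file path or any file-like object, not just a URL. The sketch below is handy if you are offline: it wraps a few lines of CSV text in `io.StringIO` so `read_csv` can treat it like a file. The rows are made-up sample values, not the contents of `x-y-data.csv`, and `sample_df` is a hypothetical name used only for this illustration.

```python
import io
import pandas as pd

# Hypothetical sample: a few made-up rows in CSV format (NOT the real x-y-data.csv)
csv_text = """X,Y
1,3
2,5
3,4
4,8
"""

# read_csv can read from a URL, a file path, or a file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df)
```

The first line of the CSV text becomes the column names, and each following line becomes one row of the DataFrame.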

## IPO: Process

After importing data, it's important to review it to make sure everything looks as expected. One of the first things we might do is use the DataFrame's `head()` method to quickly look at the first few rows of the data.

In [None]:
# Display the first 5 rows of the data
print(df.head())
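By default, `head()` shows at most 5 rows, but you can pass a number to choose how many. To find out how big the whole dataset is, the `shape` attribute gives the number of rows and columns. This sketch uses a small made-up DataFrame (`sample_df` is a hypothetical name) rather than the lesson's data:

```python
import io
import pandas as pd

# Hypothetical sample with 8 made-up rows, just to show how head() and shape behave
csv_text = "X,Y\n" + "\n".join(f"{x},{x * 10}" for x in range(8))
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())    # at most 5 rows by default
print(sample_df.head(3))   # pass a number to choose how many rows to show
print(sample_df.shape)     # (number of rows, number of columns)
```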

What about the bottom rows? The `tail()` method works the same way; here we look at just the last 2 rows.

In [None]:
# Display the last 2 rows of the data
print(df.tail(2))

You'll notice that Pandas displays an index (0, 1, 2, ...) to the left of the data. It isn't one of the columns from the CSV file, and it won't affect our plot, so we won't worry about it for now.
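If you're curious, you can check that the index is kept separately from the columns. This is a minimal sketch using a tiny made-up DataFrame (`sample_df` is a hypothetical name):

```python
import io
import pandas as pd

# Hypothetical three-row sample, not the lesson's data
sample_df = pd.read_csv(io.StringIO("X,Y\n1,3\n2,5\n3,4\n"))

print(sample_df.index)          # row labels that Pandas adds automatically
print(list(sample_df.columns))  # the index does not appear among the columns
```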

Besides using `head()` for a quick look at the data, data scientists also often check which columns are included in the data file. To do that, we use the `df.columns` attribute. Here's how:

In [None]:
# Display the column names
print(df.columns)

It tells us there are two columns: `'X'` and `'Y'`. Column names are case-sensitive, so `'x'` and `'X'` are different names; always match the capitalization exactly.
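Here is what happens when the capitalization doesn't match. This sketch uses a small made-up DataFrame (`sample_df` is a hypothetical name): asking for `'X'` works, while asking for `'x'` raises a `KeyError` because no column has that exact name.

```python
import io
import pandas as pd

# Hypothetical three-row sample, not the lesson's data
sample_df = pd.read_csv(io.StringIO("X,Y\n1,3\n2,5\n3,4\n"))

print(sample_df['X'].tolist())  # [1, 2, 3]

try:
    sample_df['x']  # lowercase 'x' does not match the column 'X'
except KeyError as err:
    print('KeyError:', err)
```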

## IPO: Output

And now let's plot it. Notice that this is the exact same code as when we plotted the [internal data](where-can-we-get-data-from-internal.ipynb).

In [None]:
# Create the plot
fig = px.line(data_frame=df,
              x='X',
              y='Y',
              title='Data from a CSV file')

# Show the plot
fig.show()

The important difference from using internal list data is that we first tell `px.line()` which DataFrame to use (`data_frame=df`) and then refer to the columns we want by name (`x='X'`, `y='Y'`).
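To see the connection to the internal-data lesson, you can pull a column out of a DataFrame by name and turn it into an ordinary Python list. This sketch uses a small made-up DataFrame (`sample_df`, `x_values`, and `y_values` are hypothetical names):

```python
import io
import pandas as pd

# Hypothetical three-row sample, not the lesson's data
sample_df = pd.read_csv(io.StringIO("X,Y\n1,3\n2,5\n3,4\n"))

# Selecting a column by name gives a Pandas Series; .tolist() converts it to a plain list
x_values = sample_df['X'].tolist()
y_values = sample_df['Y'].tolist()

print(x_values)  # [1, 2, 3]
print(y_values)  # [3, 5, 4]
```

Lists like these are the same kind of data we typed directly into the code before, which is why `px.line()` can plot either form.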

Putting it all together, we have:

In [None]:
# Setup
import plotly.express as px
import pandas as pd

# Input
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv'
df = pd.read_csv(url)

# Process
fig = px.line(data_frame=df,
              x='X',
              y='Y',
              title='Data from a CSV file')

# Output
fig.show()

## What If We Want to Change the Names of the Columns?

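If there are just a few columns, you can replace all of the names at once by assigning a new list to `df.columns`. The list needs one name for every column, in the same order as the columns: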

In [None]:
# Changing column names
df.columns = ['X Value', 'X^2']
display(df.columns)
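Because this replaces every name at once, Pandas checks that the list has the right length. In this sketch (using a small made-up DataFrame called `sample_df`, a hypothetical name), giving only one name for two columns raises a `ValueError`, and the existing column names are left unchanged.

```python
import io
import pandas as pd

# Hypothetical three-row sample, not the lesson's data
sample_df = pd.read_csv(io.StringIO("X,Y\n1,3\n2,5\n3,4\n"))

sample_df.columns = ['X Value', 'X^2']  # one new name per column, in order
print(list(sample_df.columns))

try:
    sample_df.columns = ['Only One Name']  # too few names for two columns
except ValueError as err:
    print('ValueError:', err)
```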

### Supplemental

If your DataFrame contains many columns and you only need to rename a few, you can do so efficiently using Python [*dictionaries*](https://www.w3schools.com/python/python_dictionaries.asp). This method allows you to specify only the columns you want to rename without affecting the others.

In [None]:
# Rename the columns of the DataFrame
df = df.rename(columns={'X Value': 'X',  # Renaming 'X Value' to 'X'
                        'X^2': 'Y'})     # Renaming 'X^2' to 'Y'

# Display the columns of the DataFrame to verify the renaming
display(df.columns)
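Two details about `rename()` are worth knowing: it returns a *new* DataFrame (which is why the code above assigns the result back to `df`), and any column not mentioned in the dictionary keeps its name. This minimal sketch uses a made-up three-column DataFrame; `sample_df` and `renamed_df` are hypothetical names.

```python
import io
import pandas as pd

# Hypothetical three-column sample, not the lesson's data
sample_df = pd.read_csv(io.StringIO("A,B,C\n1,2,3\n4,5,6\n"))

# Only the key listed in the dictionary ('B') is renamed; 'A' and 'C' stay the same
renamed_df = sample_df.rename(columns={'B': 'Middle'})

print(list(sample_df.columns))   # ['A', 'B', 'C']  (the original is unchanged)
print(list(renamed_df.columns))  # ['A', 'Middle', 'C']
```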

## Exercise

Using the code above as an example, use the data below to plot Pascal Siakam's field goals made over his Raptors career. 

In [None]:
url = 'https://raw.githubusercontent.com/Data-Dunkers/data-dunkers-modules/main/data-dunkers/Data/example.csv'



---
In our next demonstration, we will get our data from an [Excel](data-from-excel.ipynb) file.

---
Back to [Lessons](https://github.com/Data-Dunkers/lessons/blob/main/lessons.ipynb)