![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdata-dunkers%2Flessons&branch=main&subPath=data-from-csv.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>
<a href="https://colab.research.google.com/github/data-dunkers/lessons/blob/main/data-from-csv.ipynb" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"/></a>

# Data Dunkers Lesson: Data From a CSV File

The corresponding Activity Notebook for this Lesson Notebook can be found [here](https://github.com/Data-Dunkers/activities/blob/main/data-from-csv.ipynb).

## Objectives

By the end of this lesson, students will be able to:
- Load and view data from a CSV (comma-separated values) file. *(Example: Load and view a dataset of X and Y values from a CSV file to understand its structure.)*
- Correctly identify and use data columns. *(Example: Identify the 'Season' and 'Points' columns to visualize a player’s performance over time.)*
- Visualize data effectively using simple line plots. *(Example: Create a line plot showing the relationship between X and Y values from the CSV data.)*

## IPO: Setup & Input

A common method is to get the data from *outside* the code, often from a [CSV (comma-separated values)](https://en.wikipedia.org/wiki/Comma-separated_values) file.

We will import the [code library](https://en.wikipedia.org/wiki/Library_(computing)) called [pandas](https://pandas.pydata.org/) to read data from [this CSV file](https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv) into a [DataFrame](https://www.w3schools.com/python/pandas/pandas_dataframes.asp).
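
To make the idea concrete, here is a minimal sketch of what CSV text looks like and how pandas turns it into a DataFrame. The sample values are made up for illustration (they are not copied from the linked file), and `io.StringIO` is used only so the example runs without an internet connection.

```python
import io
import pandas as pd

# Hypothetical CSV text: the first line holds the column names,
# and each following line is one row of comma-separated values.
csv_text = """X,Y
1,1
2,4
3,9
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df)
```

Reading from a URL, as in the next cell, works the same way: only the first argument to `pd.read_csv` changes.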

In [None]:
import plotly.express as px
import pandas as pd

# Read the CSV file into a DataFrame named df
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv'
df = pd.read_csv(url)

## IPO: Process

After importing data, it's important to review it to ensure everything looks as expected. One of the first things we might do is use the pandas `head()` method to quickly look at the top few rows of the data (five by default).

In [None]:
df.head()

What about the bottom rows? Let's look at only the last 2 rows.

In [None]:
df.tail(2)

You'll see that pandas has added an index (the unlabelled row numbers on the far left) in front of the data. We'll use that later.
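
As a rough illustration of that index, the sketch below builds a small DataFrame with made-up values (an assumption, not the lesson's dataset) and shows that the index is kept separately from the data columns.

```python
import pandas as pd

# Made-up values for illustration; not the lesson's actual dataset
sample_df = pd.DataFrame({'X': [1, 2, 3], 'Y': [1, 4, 9]})

# The index holds the row labels, which start at 0 by default
print(sample_df.index)

# The index is not one of the data columns
print(list(sample_df.columns))

# A row can be looked up by its index label
first_row = sample_df.loc[0]
print(first_row['Y'])
```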

Besides using `head()` to have a quick look at the data, data scientists often use `df.columns` to show which columns are included.

In [None]:
df.columns

The output tells us there are two columns: 'X' and 'Y'. Column names are case-sensitive, so always pay attention to uppercase and lowercase letters.
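
Here is a small sketch of why letter case matters, using a made-up DataFrame (the values and the name `df_demo` are assumptions for illustration). Asking for `'x'` when the column is named `'X'` raises a `KeyError`.

```python
import pandas as pd

# Made-up values for illustration
df_demo = pd.DataFrame({'X': [1, 2, 3], 'Y': [1, 4, 9]})

# The exact column name works
print(df_demo['X'].tolist())

# A lowercase 'x' is treated as a different, missing column
try:
    df_demo['x']
except KeyError as err:
    lookup_failed = True
    print('KeyError:', err)
else:
    lookup_failed = False
```

If you see a `KeyError` when plotting, checking `df.columns` for the exact spelling is a good first step.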

## IPO: Output

Now let's plot the data using Plotly Express (`px`). This works just like plotting data defined inside the code, except that we also pass the DataFrame (`df`) that holds the data.

In [None]:
px.line(df, x='X', y='Y', title='Data from a CSV file')

Another way to create a plot is by storing it in a variable and then showing it.

In [None]:
fig = px.line(df, x='X', y='Y', title='Data from a CSV file')
fig.show()

Putting it all together, we have:

In [None]:
# Setup
import plotly.express as px
import pandas as pd

# Input
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv'
df = pd.read_csv(url)

# Process
fig = px.line(data_frame=df, x='X', y='Y', title='Data from a CSV file')

# Output
fig.show()

## What If We Want to Change the Names of the Columns?

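If there are just a few columns, you can simply reassign all of the names at once. The list must contain one name for every column, in the same order as the existing columns: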

In [None]:
df.columns = ['X Value', 'X^2']

df.columns
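
One thing to watch for with this approach: the list needs exactly one name per column. The sketch below uses a made-up DataFrame (`df_demo` and its values are assumptions for illustration) to show that pandas raises a `ValueError` when the number of names does not match.

```python
import pandas as pd

# Made-up values for illustration
df_demo = pd.DataFrame({'X': [1, 2, 3], 'Y': [1, 4, 9]})

# Two columns, two new names: this works
df_demo.columns = ['X Value', 'X^2']
names_after_rename = list(df_demo.columns)
print(names_after_rename)

# Two columns, one new name: pandas refuses the assignment
try:
    df_demo.columns = ['Only One Name']
except ValueError as err:
    length_mismatch = True
    print('ValueError:', err)
else:
    length_mismatch = False
```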

### Supplemental

If your DataFrame contains many columns and you only need to rename a few, you can do so efficiently using Python [*dictionaries*](https://www.w3schools.com/python/python_dictionaries.asp). This method allows you to specify only the columns you want to rename without affecting the others.

In [None]:
df = df.rename(columns={'X Value': 'X', 'X^2': 'Y'})

df.columns
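
To illustrate renaming only some columns, here is a minimal sketch with a hypothetical three-column DataFrame. The column names `PTS` and `AST`, the variable names, and all values are assumptions made up for this example. Only the column listed in the dictionary changes.

```python
import pandas as pd

# Hypothetical stats table; names and values are made up for illustration
stats = pd.DataFrame({
    'Season': ['2021-22', '2022-23'],
    'PTS': [100, 120],
    'AST': [30, 45],
})

# Rename just one column; the dictionary maps old name -> new name
renamed = stats.rename(columns={'PTS': 'Points'})

print(list(renamed.columns))
print(list(stats.columns))  # the original DataFrame is left unchanged
```

Because `rename` returns a new DataFrame, the result has to be stored in a variable (or assigned back to `df`, as in the cell above) for the change to stick.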

## Exercise

Using the code above as an example, use the data below to plot Pascal Siakam's field goals made over his Raptors career. 

In [None]:
url = 'https://raw.githubusercontent.com/Data-Dunkers/data-dunkers-modules/main/data-dunkers/Data/example.csv'



---
In the next lesson we will get our data from an [Excel](data-from-excel.ipynb) file.

---
Back to [Lessons](https://github.com/Data-Dunkers/lessons/blob/main/lessons.ipynb)