Open this notebook in Callysto [here](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/pbeens/Data-Analysis&branch=main&subPath=Demos/where-can-we-get-data-from-csv.ipynb&depth=1) or in Colab [here](https://githubtocolab.com/pbeens/Data-Analysis/blob/main/Demos/where-can-we-get-data-from-csv.ipynb).

## Introduction

There are many ways to import data, but the most common sources are the program itself, a CSV (comma-separated values) file, an Excel spreadsheet, a Google Sheet, or a webpage.

In this demo, we will show how to get data from a CSV file.

# Data from a CSV file

In our first example, we got our data from [within](where-can-we-get-data-from-internal.ipynb) the Jupyter Notebook itself. That method works, but it is not very common.

It is more usual to get the data from outside the program, and the **CSV** file format is one of the most widely used ways to do that.

In this example program, we first import the **Pandas** library using `import pandas as pd` (we still need `plotly.express` so that's imported as well). We then use the `pd.read_csv()` function to read the CSV file into a **Pandas DataFrame**. 

In [None]:
# import plotly.express and pandas
import plotly.express as px
import pandas as pd

# Read the CSV file into a DataFrame named df
df = pd.read_csv('https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/x-y-data.csv')
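
To see what `pd.read_csv()` is working with, here is a minimal sketch that reads CSV text held in a string instead of a file at a URL. The values are made up for illustration and are not the contents of `x-y-data.csv`; the names `csv_text` and `sample_df` are also just assumptions for this sketch.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file.
# The first line is the header row; each later line is one row of data.
csv_text = """X,Y
1,2
2,4
3,6
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df)
```

The header row becomes the column names, and every following line becomes one row of the DataFrame.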

Just for fun, let's look at the top few lines of the data we just imported. We use the DataFrame's `head()` method for this:

In [None]:
# Display the first 5 rows of the data
print(df.head())

What about the bottom rows?

In [None]:
# Display the last 5 rows of the data
print(df.tail())
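
Both `head()` and `tail()` show 5 rows by default, but they also accept a number. The sketch below uses a small made-up DataFrame (the name `demo_df` and its values are assumptions, not the data from the CSV file) so that it runs on its own.

```python
import pandas as pd

# Hypothetical data standing in for the CSV file used above
demo_df = pd.DataFrame({'X': range(1, 11),
                        'Y': [n * n for n in range(1, 11)]})

# Pass a number to choose how many rows to show (the default is 5)
print(demo_df.head(3))  # first 3 rows
print(demo_df.tail(2))  # last 2 rows

# shape gives (number of rows, number of columns)
print(demo_df.shape)
```

Checking `shape` alongside `head()` is a quick way to confirm how much data was actually loaded.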

You'll see that Pandas has inserted an index column before the data. We won't worry about that at this time because it won't affect us here.
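
If you are curious about that index, the following sketch (again with made-up data and a hypothetical name, `index_df`) suggests one way to confirm that it is a set of row labels added by Pandas and not a column that came from the file.

```python
import io
import pandas as pd

# Hypothetical two-column CSV text; note that it contains no index column
csv_text = "X,Y\n10,100\n20,200\n30,300\n"
index_df = pd.read_csv(io.StringIO(csv_text))

# The index holds row labels created by Pandas: a RangeIndex covering 0, 1, 2
print(index_df.index)

# Only the columns from the file are listed here
print(list(index_df.columns))

# Print the data without the index
print(index_df.to_string(index=False))
```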

Besides using `head()` to have a quick look at the data, data scientists also often look at what columns are included in the data file. To do that, we use the `df.columns` attribute. Here's how:

In [None]:
print(df.columns)

The output tells us there are two columns: 'X' and 'Y'. Column names are case-sensitive, so always pay attention to the case of the letters.
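
Here is a short sketch of what happens when the case is wrong. It uses a made-up DataFrame (the name `case_df` and its values are assumptions) with the same column names as our example.

```python
import pandas as pd

# Hypothetical DataFrame with the same column names as the example
case_df = pd.DataFrame({'X': [1, 2, 3], 'Y': [2, 4, 6]})

has_upper = 'X' in case_df.columns
has_lower = 'x' in case_df.columns
print(has_upper)  # True
print(has_lower)  # False

# Asking for a column with the wrong case raises a KeyError
try:
    case_df['x']
except KeyError as err:
    print('No such column:', err)
```

Checking `df.columns` before plotting is an easy way to avoid this kind of error.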

And now let's plot it. Try to identify the differences from the previous program.

In [None]:
# Create the plot
fig = px.line(data_frame=df, 
              x='X', 
              y='Y', 
              title='Data from a CSV file')

# Show the plot
fig.show()

The important differences are that we first had to identify the DataFrame we want to use, and then we had to tell Plotly the names of the columns to use for the x-data and the y-data:

    data_frame=df,
    x='X', 
    y='Y'

Putting it all together, we have:

In [None]:
# import plotly.express and pandas
import plotly.express as px
import pandas as pd

# Read the CSV file into a DataFrame named df
df = pd.read_csv('https://raw.githubusercontent.com/pbeens/Data-Analysis/main/Data/x-y-data.csv')

# Create the plot
fig = px.line(data_frame=df, 
              x='X', 
              y='Y', 
              title='Data from a CSV file')

# Show the plot
fig.show()

In our next demonstration we will get our data from an [Excel](where-can-we-get-data-from-excel.ipynb) file.