![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FData-Dunkers%2Flessons&branch=main&subPath=comparing-input-sources.ipynb&depth=1" target="_blank"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>
<a href="https://colab.research.google.com/github/Data-Dunkers/lessons/blob/main/comparing-input-sources.ipynb" target="_blank"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"></a>

# Data Dunkers Lesson: Comparing Data Input Sources

## Lesson Objectives

By the end of this lesson, students will be able to:
- Import and manage data in a Jupyter Notebook from various sources using pandas.

## Introduction

We'll explore five different ways to input data into a Jupyter Notebook:

- direct entry
- CSV files
- Excel files
- webpages
- Google Sheets

## Getting Data From Within the Notebook

Entering data directly in the Jupyter Notebook is useful for small datasets or for testing. Here, we define lists of x and y values and use them to create a pandas DataFrame.

In [None]:
import pandas as pd

# Define the data
x_data = [0, 1, 2, 3, 4, 5]
y_data = [0, 1, 4, 9, 16, 25]

# Create a DataFrame from the data
df = pd.DataFrame({'X': x_data, 'Y': y_data})

# Display the DataFrame
df
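
As a variation on direct entry, the same table can be built one row at a time. This is a minimal sketch: the name `df_rows` is introduced only for illustration, and it assumes the Y values are the squares of the X values, which matches the numbers above.

```python
import pandas as pd

# Build one dictionary per row; each key becomes a column name.
# Assumption: Y is X squared, matching the lists used in the lesson.
rows = [{'X': x, 'Y': x ** 2} for x in range(6)]

# pandas turns a list of row dictionaries into a DataFrame
df_rows = pd.DataFrame(rows)

print(df_rows)
```

Entering columns as lists is usually shorter, while row dictionaries can be easier to read when each row represents one observation.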

## Getting Data from a CSV File

Next we'll import data from a CSV (comma-separated values) file, a common format for data storage because of its simplicity and compatibility with most tools.

In [None]:
import pandas as pd

# Read the CSV file into a DataFrame named df
url = 'https://raw.githubusercontent.com/pbeens/Data-Dunkers/main/Data/x-y-data.csv'
df = pd.read_csv(url)

# Display the DataFrame
df
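
To see what `read_csv` actually parses, it can help to read a small piece of CSV text held in memory instead of a file. This is a sketch under assumptions: the text below is made up for illustration (the column names `X` and `Y` are not taken from the real file), and `io.StringIO` simply makes a string behave like a file.

```python
import io
import pandas as pd

# Hypothetical CSV content: first line is the header, then one row per line
csv_text = """X,Y
0,0
1,1
2,4
3,9
"""

# StringIO wraps the string so read_csv can treat it like a file
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo)
print(df_demo.dtypes)  # pandas infers a data type for each column
```

The same call works with a local file path or a URL in place of the `StringIO` object.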

## Getting Data from an Excel Spreadsheet

Excel is a popular tool for data management, and pandas can read Excel files with `read_excel`. Reading `.xlsx` files requires the `openpyxl` package, which is not always installed alongside pandas.

In [None]:
import pandas as pd

# Read the Excel file into a DataFrame named df
url = 'https://raw.githubusercontent.com/Data-Dunkers/data-dunkers-modules/main/data/x-y-data.xlsx'
df = pd.read_excel(url)

# Display the DataFrame
df

## Getting Data from a Webpage

Web scraping is a useful skill for gathering data that's publicly available online. In this example, we'll use the pandas `read_html` function to load a table directly from a webpage.

With this method, we can import tabular data from webpages that present it in HTML `<table>` elements, provided we correctly identify the table we want to use. `read_html` returns a list of DataFrames, one per table found, so we select the one we need by its index.

In [None]:
import pandas as pd

# Read the HTML table into a DataFrame named df
url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/demo/x-y-data.html'
df = pd.read_html(url)[0]  # Index 0 is the first table

# Display the DataFrame
df
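
When a page contains more than one table, picking the right one matters. The sketch below uses a made-up HTML snippet (not the lesson's page) with two tables, and shows two ways to choose: by position in the returned list, or with the `match` parameter, which keeps only tables containing the given text. It assumes an HTML parser such as `lxml` is installed, as `read_html` requires.

```python
import io
import pandas as pd

# Hypothetical page content with two tables, for illustration only
html = """
<table>
  <thead><tr><th>Team</th><th>Wins</th></tr></thead>
  <tbody><tr><td>A</td><td>3</td></tr></tbody>
</table>
<table>
  <thead><tr><th>X</th><th>Y</th></tr></thead>
  <tbody>
    <tr><td>0</td><td>0</td></tr>
    <tr><td>1</td><td>1</td></tr>
    <tr><td>2</td><td>4</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per table found
tables = pd.read_html(io.StringIO(html))
print(len(tables))

# Option 1: select by position (index 1 is the second table)
df_by_index = tables[1]

# Option 2: keep only tables whose text matches "Y"
df_by_match = pd.read_html(io.StringIO(html), match="Y")[0]

print(df_by_match)
```

Selecting by `match` can be more robust than an index if the page later adds or reorders tables.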

## Getting Data from a Google Sheet

This method is particularly useful for collaborative projects, as Google Sheets allows multiple users to view and edit the data in real time.

Ensure your Google Sheet is set to be viewable by anyone with the link, replace the end of the URL (everything after the sheet ID, typically starting with `/edit`) with `/export?format=csv`, and then treat it like any other CSV file.

In [None]:
import pandas as pd

# Google Sheet URL variable, with modified /export?format=csv ending
url = 'https://docs.google.com/spreadsheets/d/1ZULKhYzsMd4eYwiprsyGgE9Df3gaVtO8WRalUQDn-xE/export?format=csv'

# Read the Google Sheet into a DataFrame named df
df = pd.read_csv(url)

# Display the DataFrame
df
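
The URL change described above can also be done in code. This is a minimal sketch: `sheet_csv_url` is a hypothetical helper written for this lesson, not part of pandas, and the `/edit?usp=sharing` ending on the example link is an assumption about what a typical share link looks like.

```python
import re

def sheet_csv_url(share_url: str) -> str:
    """Turn a Google Sheets share link into a CSV export link."""
    # The sheet ID is the long string that follows /spreadsheets/d/
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", share_url)
    if match is None:
        raise ValueError("Not a recognizable Google Sheets URL")
    sheet_id = match.group(1)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"

# Assumed form of a share link for the sheet used in this lesson
share_url = 'https://docs.google.com/spreadsheets/d/1ZULKhYzsMd4eYwiprsyGgE9Df3gaVtO8WRalUQDn-xE/edit?usp=sharing'

csv_url = sheet_csv_url(share_url)
print(csv_url)
```

The resulting `csv_url` can be passed to `pd.read_csv` exactly as in the cell above.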