# 2.1 Loading Data with Pandas

Now that we have covered some basic NumPy and pandas concepts, it's time to load some data and explore it to get more familiar with pandas DataFrames.

Pandas supports many ways of loading data into dataframes.

You can load file-like data sources with methods like `pd.read_csv()` and the more generic `pd.read_table()`.

You can load appropriately structured JSON with `pd.read_json()`, HTML tables with `pd.read_html()`, and a variety of other formats such as Parquet, Avro, and HDF5, either natively with pandas or, in some cases, with the help of extra libraries.

You can load data from SQL database connections with `pd.read_sql()`.

And of course, most formats that can be read with `pd.read_x()` can also be written with the matching DataFrame method `df.to_x()`, for example `df.to_csv()`.
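As a minimal sketch of that read/write pairing, the example below writes a small DataFrame to CSV text and reads it back. The sample data and the names `df_sample`, `buffer`, and `df_roundtrip` are made up for illustration, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for this illustration.
df_sample = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write the DataFrame as CSV text into an in-memory buffer.
# index=False leaves the row index out of the output.
buffer = io.StringIO()
df_sample.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back into a new DataFrame.
buffer.seek(0)
df_roundtrip = pd.read_csv(buffer)
```

Passing a file path instead of the buffer, in both calls, would write to and read from disk in the same way.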

## Load Some Data!


In [None]:
import pandas as pd
import numpy as np

## CSV Files

Probably the most common format for exchanging data files, a CSV file is just a text file with one row per line and each value separated by a comma (CSV stands for Comma-Separated Values).

In [None]:
# Here we load a dataset of measurements from plant species in the genus Iris,
# a famous early dataset in the field of statistics and machine learning.
# https://en.wikipedia.org/wiki/Iris_flower_data_set
df_iris = pd.read_csv("data/iris.csv")
df_iris
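To make the format concrete without relying on `data/iris.csv`, here is a small sketch that parses CSV text held in a string. The column names and values are invented for illustration and are not taken from the iris file. It also shows that `pd.read_table()` is the same kind of reader with a tab as its default separator.

```python
import io

import pandas as pd

# Hypothetical CSV text: a header line, then one row per line.
csv_text = "species,sepal_length\nsetosa,5.1\nvirginica,6.3\n"

# The same data with tabs instead of commas.
tsv_text = csv_text.replace(",", "\t")

# read_csv splits on commas by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default...
df_tsv = pd.read_table(io.StringIO(tsv_text))

# ...which matches read_csv with an explicit tab separator.
df_tsv_via_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")
```

All three DataFrames should hold the same two rows and two columns.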

## Excel Files

Loading from Excel files is fairly straightforward. If we just specify a file, pandas reads the first worksheet in the workbook, but we can pass extra parameters to tell it more specifically what to load.

In [None]:
df_quarterly_sales = pd.read_excel(
    "data/quarterly-sales.xlsx",
    sheet_name="Quarterly Report",  # The name of the sheet to load data from.
    skiprows=3,  # The number of rows to skip from the top of the sheet.
)
df_quarterly_sales

## JSON Files

Pandas supports loading data from JSON files, a popular format for storing data.

In [None]:
df_users = pd.read_json('data/users.json')
df_users
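The structure of `data/users.json` is not shown here, so as a sketch of what "appropriately structured" JSON can look like, the example below parses a list of records, where each object becomes a row and each key becomes a column. The field names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical JSON text: a list of objects that share the same keys.
json_text = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]'

# Wrapping the string in StringIO gives read_json a file-like object,
# so no file on disk is needed.
df_records = pd.read_json(io.StringIO(json_text))
```

Deeply nested JSON usually needs flattening first, for example with `pd.json_normalize()`.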

## Loading from a Database

We can load data from a SQL database connection with pandas too. Databases come in many forms, and some are more complex to connect to than others. Here we connect to one of the simplest, a file-based database format called SQLite.

In [None]:
import sqlite3
conn = sqlite3.connect('data/example.db')

# We pass a SQL query along with an object representing our database connection.
df_items = pd.read_sql("""SELECT * FROM items""", conn)

df_items
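For a version that runs without `data/example.db`, the sketch below builds a throwaway in-memory SQLite database and queries it. The `items` schema and rows here are assumptions made up for the example and may not match the real table. It also passes a query parameter through `params` rather than formatting it into the SQL string.

```python
import sqlite3

import pandas as pd

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, for illustration only.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("bag", 25.0)],
)
conn.commit()

# The "?" placeholder is filled in from params by the database driver.
df_expensive = pd.read_sql(
    "SELECT name, price FROM items WHERE price > ? ORDER BY price",
    conn,
    params=(2.0,),
)

# Close the connection once the data is loaded into the DataFrame.
conn.close()
```

Placeholders keep values out of the SQL text itself, which avoids quoting mistakes and SQL injection when the values come from user input.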