# Reading Tabular Data into Data Frames

## Overview

**Teaching:** 10 min

**Exercises:** 10 min

## Questions
- How can I read tabular data?

## Objectives
- Import the Pandas library.

- Use Pandas to load a simple CSV data set.

- Get some basic information about a Pandas DataFrame.

## Use the Pandas library to do statistics on tabular data.

- [Pandas](https://pandas.pydata.org/) is a widely-used Python library for statistics, particularly on tabular data.
- Borrows many features from R’s dataframes.
  - A 2-dimensional table whose columns have names and potentially have different data types.
- Load it with `import pandas as pd`. The alias `pd` is commonly used for Pandas.
- Read a Comma Separated Values (CSV) data file with `pd.read_csv`.
  - Argument is the name of the file to be read.
  - Assign result to a variable to store the data that was read.

In [None]:
import pandas as pd

data = pd.read_csv('../../data/gapminder_gdp_oceania.csv')
print(data)

- The columns in a dataframe are the observed variables, and the rows are the observations.
- Pandas uses backslash `\` to show wrapped lines when output is too wide to fit the screen.
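
If the Gapminder file is not to hand, the same steps can be tried on a tiny in-memory table. This is a minimal sketch: the country names and numbers below are made up for illustration and are not Gapminder data, and `io.StringIO` merely stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up miniature CSV (illustrative values only, not Gapminder data).
csv_text = """country,gdpPercap_1952,gdpPercap_2007
Atlantis,1000.0,2000.0
Erewhon,1500.0,2500.0
"""

# read_csv accepts a file name or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.shape)          # (rows, columns) -> (2, 3)
print(list(sample.columns))  # column names come from the header line
```

Each line after the header becomes one row (an observation), and each comma-separated field becomes one column (a variable).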

### File Not Found

Our lessons store their data files in a `data` sub-directory, which is two levels up from this file. That's why the path to the file is `../../data/gapminder_gdp_oceania.csv`. Each `..` takes you up one level in the directory tree. If you forget to include `../../data/`, or if you include it but your copy of the file is somewhere else, you will get a runtime error that ends with a line like this:

```
FileNotFoundError: [Errno 2] No such file or directory: '../../data/gapminder_gdp_oceania.csv'
```
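
One way to make this failure easier to diagnose is to catch the error and report where Python was looking. The sketch below is only a suggestion: `load_or_explain` is a hypothetical helper written for this example, not part of Pandas.

```python
from pathlib import Path

import pandas as pd

# Path used in this lesson; adjust it to wherever your copy of the file lives.
csv_path = Path('../../data/gapminder_gdp_oceania.csv')


def load_or_explain(path):
    """Return a DataFrame, or None (with a message) if the file is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        # Relative paths are resolved against the current working directory.
        print(f"Could not find {path} (current directory: {Path.cwd()})")
        return None


loaded = load_or_explain(csv_path)
```

Printing the current working directory shows which folder the `..` steps start from.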


## Use `index_col` to specify that a column’s values should be used as row headings.

- By default, row headings are numbers (`0` and `1` in this case).
- We would rather index the rows by country.
- Pass the name of the column to `read_csv` as its `index_col` parameter to do this.


In [None]:
data = pd.read_csv('../../data/gapminder_gdp_oceania.csv', index_col='country')
data
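
The effect of `index_col` is easiest to see side by side. This is a small sketch on made-up values (not Gapminder data) that reads the same text twice, once with the default numeric index and once indexed by `country`.

```python
import io

import pandas as pd

# Made-up miniature CSV (illustrative values only).
csv_text = """country,gdpPercap_1952,gdpPercap_2007
Atlantis,1000.0,2000.0
Erewhon,1500.0,2500.0
"""

default_index = pd.read_csv(io.StringIO(csv_text))
by_country = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(list(default_index.index))  # [0, 1]
print(list(by_country.index))     # ['Atlantis', 'Erewhon']

# With country names as row headings, rows can be looked up by name.
print(by_country.loc['Erewhon', 'gdpPercap_2007'])  # 2500.0
```

Note that `country` is no longer an ordinary column once it is used as the index.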

## Use the `DataFrame.info()` method to find out more about a dataframe.

In [None]:
data.info()

- This is a `DataFrame`.
- It has two rows, named `'Australia'` and `'New Zealand'`.
- It has twelve columns, each of which holds two non-null 64-bit floating point values.
  - We will talk later about null values, which are used to represent missing observations.
- Uses 208 bytes of memory.
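
The individual facts that `info()` reports can also be requested one at a time. The sketch below uses a made-up two-row table with one deliberately missing value, so that the non-null count differs between columns.

```python
import pandas as pd

# Made-up table (illustrative values only); None becomes a missing value (NaN).
sample = pd.DataFrame(
    {'gdpPercap_1952': [1000.0, 1500.0], 'gdpPercap_2007': [2000.0, None]},
    index=['Atlantis', 'Erewhon'],
)

sample.info()                # prints the full summary

print(sample.shape)          # (number of rows, number of columns)
print(sample.dtypes)         # data type of each column
print(sample.notna().sum())  # count of non-null values per column
```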

## The `DataFrame.columns` variable stores information about the dataframe’s columns.

- Note that this is data, *not* a method. (It doesn’t have parentheses.)
  - Like `math.pi`.
  - So do not use `()` to try to call it.
- Called a *member variable*, or just *member*. (Python's own documentation usually calls it an *attribute*.)

In [None]:
print(data.columns)
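
To see why the parentheses matter, here is a brief sketch on a made-up two-column table. Trying to call `columns` as if it were a method raises a `TypeError`, because the value stored there is an `Index` object rather than a function.

```python
import pandas as pd

# Made-up table (illustrative values only).
sample = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})

print(sample.columns)        # an Index object holding the column names
print(list(sample.columns))  # ['a', 'b']

call_failed = False
try:
    sample.columns()         # wrong: columns is data, not a method
except TypeError as err:
    call_failed = True
    print('TypeError:', err)
```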

## Use `DataFrame.T` to transpose a dataframe.

- Sometimes we want to treat columns as rows and vice versa.
- Transpose (written `.T`) doesn’t copy the data, just changes the program’s view of it.
- Like `columns`, it is a member variable.

In [None]:
data.T
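
A quick way to check what transposing does is to compare shapes and labels before and after. This is a minimal sketch on made-up values.

```python
import pandas as pd

# Made-up table (illustrative values only): 2 rows, 3 columns.
sample = pd.DataFrame(
    {
        'gdpPercap_1952': [1000.0, 1500.0],
        'gdpPercap_1977': [1400.0, 1900.0],
        'gdpPercap_2007': [2000.0, 2500.0],
    },
    index=['Atlantis', 'Erewhon'],
)

flipped = sample.T

print(sample.shape)           # (2, 3)
print(flipped.shape)          # (3, 2)
print(list(flipped.index))    # the old column names are now row labels
print(list(flipped.columns))  # the old row labels are now column names
```

The same value is reachable either way: `sample.loc['Erewhon', 'gdpPercap_2007']` equals `flipped.loc['gdpPercap_2007', 'Erewhon']`.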

## Use `DataFrame.describe()` to get summary statistics about data.

`DataFrame.describe()` gets the summary statistics of only the columns that have numerical data. All other columns are ignored, unless you use the argument `include='all'`.

In [None]:
data.describe()

- Not particularly useful with just two records, but very helpful when there are thousands.
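
The difference that `include='all'` makes only shows up when a table has a non-numeric column, which the Oceania data lacks once `country` becomes the index. The sketch below therefore uses a made-up table with one text column and one numeric column.

```python
import pandas as pd

# Made-up table (illustrative values only) with a text and a numeric column.
sample = pd.DataFrame(
    {
        'continent': ['Oceania', 'Oceania', 'Oceania'],
        'gdpPercap': [1000.0, 2000.0, 3000.0],
    }
)

numeric_only = sample.describe()             # default: numeric columns only
everything = sample.describe(include='all')  # text columns are summarized too

print(list(numeric_only.columns))             # ['gdpPercap']
print(list(everything.columns))               # ['continent', 'gdpPercap']
print(numeric_only.loc['mean', 'gdpPercap'])  # 2000.0
print(everything)
```

In the `include='all'` output, statistics that do not apply to a column (such as the mean of a text column) appear as `NaN`.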

## Exercises

See `../exercises/03-reading-tabular-data_exercises.ipynb`

## Key Points

- Use the Pandas library to get basic statistics out of tabular data.

- Use `index_col` to specify that a column’s values should be used as row headings.

- Use `DataFrame.info()` to find out more about a dataframe.

- The `DataFrame.columns` variable stores information about the dataframe’s columns.

- Use `DataFrame.T` to transpose a dataframe.

- Use `DataFrame.describe()` to get summary statistics about data.

Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular/index.html) 2018–2023 by [The Carpentries](https://carpentries.org/)

Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular/index.html) 2016–2018 by [Software Carpentry Foundation](https://software-carpentry.org/)