# Descriptive statistics with a dataframe  
*Pandas* is a Python module (i.e., a library of functions) for working with tabular data, similar to a spreadsheet. One advantage is automation: an analysis written as code can be rerun as-is, while a spreadsheet requires clicking and typing to repeat it. Python can also handle significantly larger data sets than most spreadsheet software.  

First, you'll need to *import* the pandas module to use its functions. If you forget, the code below will output errors when it doesn't recognize a function it's trying to access (because it wasn't imported). 

In [None]:
# you only need to run this once, but it doesn't break anything to run it multiple times
import pandas as pd

The notebook uses [Allison Horst](https://www.allisonhorst.com/)'s penguin data, but you can use these examples with any pandas dataframe.  

In [None]:
# this reads in a data file
penguins = pd.read_csv('https://github.com/mwaskom/seaborn-data/raw/master/penguins.csv')

# this shows the first few rows of the dataframe
penguins.head(3)

Now that there's a dataframe called "penguins", you can see some descriptive statistics of one column by referencing the **dataframe** followed by the **column heading** in single quotes, inside square brackets. All of that is case-sensitive, too. Here's an example of how to reference the data in a column of a dataframe:  

penguins['flipper_length_mm']  
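
If you'd like to try the bracket notation without loading any data, here is a minimal sketch. It assumes a small made-up dataframe called "toy" (a hypothetical name with invented values, not the real penguin measurements) and shows what happens when the capitalization of a column heading doesn't match:

```python
import pandas as pd

# "toy" is a hypothetical three-row dataframe with made-up values (not the penguin data)
toy = pd.DataFrame({
    'species': ['Adelie', 'Adelie', 'Adelie'],
    'flipper_length_mm': [181.0, 186.0, 195.0],
})

# square brackets with the exact column heading return that column
flippers = toy['flipper_length_mm']
print(flippers)

# column headings are case-sensitive, so different capitalization raises a KeyError
try:
    toy['Flipper_Length_mm']
    wrong_case_worked = True
except KeyError:
    wrong_case_worked = False

print(wrong_case_worked)
```

A KeyError usually means the text inside the brackets doesn't exactly match a column heading, so that's the first thing to check.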

Pandas can determine statistical quantities when you add a period and the function name, followed by parentheses, to the end of the column reference. Here's an example:

In [None]:
penguins['flipper_length_mm'].mean()

Did that output a number around 200? If it didn't, or if you got an error, here are some things to check:  
- Did you import pandas first?  
- Did you read in the data?  
- Did you edit the code and lose a single quote, bracket, period, or something else?  

Replace .mean() in the example above with .count() to count the number of values in the column.  
The standard deviation is found with .std()

In [None]:
penguins['flipper_length_mm'].std()
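
One detail worth knowing: .count() only counts the cells that actually contain a value, so it can be smaller than the number of rows when a column has blanks. The sketch below illustrates that with a hypothetical dataframe called "toy" holding made-up numbers and one missing entry. It isn't the penguin data.

```python
import pandas as pd

# "toy" is a hypothetical dataframe with one missing value (None), made up for illustration
toy = pd.DataFrame({'flipper_length_mm': [181.0, 186.0, None, 195.0]})

# len() counts every row, including the one with the missing value
n_rows = len(toy)

# .count() skips the missing value
n_values = toy['flipper_length_mm'].count()

# .mean() also skips the missing value, so it averages the three remaining numbers
average = toy['flipper_length_mm'].mean()

print(n_rows)    # 4
print(n_values)  # 3
print(average)
```

If the two counts differ for your own data, that's a sign the column has missing entries.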

Python can also do math with these quantities using +, -, *, and /.

In [None]:
# twice the average, if you needed to calculate that
2 * penguins['flipper_length_mm'].mean()

In [None]:
# double asterisk does an exponent (not ^ like in spreadsheets)
penguins['flipper_length_mm'].std() ** 2
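
Squaring the standard deviation gives the variance, which pandas can also compute directly with .var(). The sketch below compares the two on a hypothetical column of made-up numbers (the "toy" name and its values are assumptions for illustration). By default, both .std() and .var() use the sample formula, which divides by n - 1.

```python
import pandas as pd

# "toy" is a hypothetical dataframe with made-up values, small enough to check by hand
toy = pd.DataFrame({'length': [1.0, 2.0, 3.0, 4.0]})

# standard deviation squared, using ** for the exponent
std_squared = toy['length'].std() ** 2

# variance computed directly by pandas
variance = toy['length'].var()

# the two should agree, apart from tiny rounding differences
print(std_squared)
print(variance)
```

Working it out by hand: the mean is 2.5, the squared distances from the mean add up to 5, and dividing by n - 1 = 3 gives a variance of about 1.67.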

You can quickly end up with long lines of code for calculations. A way to simplify that is to store a value as a variable. Then, you can reference that variable instead of writing a longer line of code. Here's an example of squaring the number of values in a column:

In [None]:
# save the number of values as a variable called "n"
n = penguins['flipper_length_mm'].count()

# outputs n squared
n**2
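
Variables are most useful when a calculation combines several quantities. As one possible illustration, the sketch below stores the count and standard deviation of a hypothetical "toy" column (made-up numbers, not the penguin data) and combines them into the standard error of the mean, which is the standard deviation divided by the square root of n. The variable names are just suggestions.

```python
import pandas as pd

# "toy" is a hypothetical dataframe with made-up values
toy = pd.DataFrame({'flipper_length_mm': [180.0, 190.0, 200.0, 210.0]})

# store each quantity as a variable
n = toy['flipper_length_mm'].count()
spread = toy['flipper_length_mm'].std()

# raising to the 0.5 power takes a square root
standard_error = spread / n ** 0.5

print(standard_error)
```

Each line stays short and readable, and changing the column name in one place updates everything that depends on it.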

## Credits
This notebook was designed by [Adam LaMee](https://adamlamee.github.io/). The penguin data came from [Allison Horst](https://www.allisonhorst.com/) in R format and was made into a csv for Seaborn use by [Michael Waskom](https://github.com/mwaskom/seaborn). Thanks to the great folks at [Binder](https://mybinder.org/) and [Google Colaboratory](https://colab.research.google.com/notebooks/intro.ipynb) for making this notebook interactive without you needing to download it or install [Jupyter](https://jupyter.org/) on your own device.