# Exploring Pandas with `pandas`

`pandas` is a Python library for working with data in tables.  "pandas" stands for "**pan**el **da**ta", with "panel data" being a term from econometrics for tables that record observations of the same subjects over time.  Plus, pandas are cute (although they can be nasty).

In this worksheet, we'll learn how to use `pandas` to build and explore tables.

## Getting started

We normally tell Python to use `pandas` with the following command.  (The `as pd` says that we can refer to `pandas` with just `pd`.  That's a way to save typing.)

In [None]:
import pandas as pd

## Building a row of data
If we're building a table by hand, it's sometimes easiest to name each row.  Here's information on a famous giant panda.

In [None]:
baiyun = { "name": "Bai Yun", "sex": "Female", "age": 26, "home": "San Diego Zoo", "country": "US"}

Let's see what this row looks like.

In [None]:
baiyun

If you had to explain to someone how to enter data for a row, what would you say?  
*Write notes in your notebook and be prepared to tell a counselor.*
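
As a small illustration of what a row is made of, the sketch below uses a shortened copy of the Bai Yun row (only three of the entries, to keep it brief). It assumes nothing beyond plain Python: each row is a dictionary whose keys are column names and whose values are that panda's entries.

```python
# A row is a Python dictionary: "column name": value pairs inside curly braces.
# This is a shortened copy of the Bai Yun row, used only for illustration.
baiyun_short = {"name": "Bai Yun", "sex": "Female", "age": 26}

# The keys are the column names, in the order we typed them.
column_names = list(baiyun_short.keys())
print(column_names)

# We look up one entry by putting its column name in square brackets.
print(baiyun_short["age"])
```

Strings go in quotation marks, while numbers such as the age do not.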

## Adding a few more rows
Here are a few more famous giant pandas.

In [None]:
baobao = { "name": "Bao Bao", "sex": "Female", "age": 4, "home":"Wolong", "country": "China"}

In [None]:
beibei = { "name": "Bei Bei", "sex": "Male", "age": 2, "home":"National Zoo", "country": "US"}

In [None]:
chuang = { "name": "Chuang Chuang", "sex": "Male", "age": 17, "home":"Chiang Mai Zoo", "country": "Thailand"}

In [None]:
damao = { "name": "Da Mao", "sex": "Male", "age": 9, "home":"Calgary Zoo", "country": "Canada"}

In the following cell, check to see if the entries seem okay.  (Make sure to execute each expression above.)

## Building a table
We build a table with `pd.DataFrame(columns=[...],data=[...])`.  `columns` is a list of strings that name the columns.  `data` is a list of the rows.

In [None]:
giant_pandas = pd.DataFrame(columns=["name","sex","age"],
                            data=[baiyun,baobao,beibei,chuang,damao])

Let's print out the table.  Make sure to execute the previous command as well as the next command.

In [None]:
giant_pandas

Pretty cool, isn't it?
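
If you want to double-check a table without reading every cell, one option is to ask for its size and its column names. The sketch below builds a two-row table from shortened copies of two rows (the name `small_table` is just for this example) and uses the `shape` and `columns` attributes of a data frame.

```python
import pandas as pd

# Shortened copies of two rows, used only for this illustration.
baiyun = {"name": "Bai Yun", "sex": "Female", "age": 26}
beibei = {"name": "Bei Bei", "sex": "Male", "age": 2}

small_table = pd.DataFrame(columns=["name", "sex", "age"],
                           data=[baiyun, beibei])

# shape is a pair: (number of rows, number of columns).
print(small_table.shape)

# columns holds the column names, in table order.
print(list(small_table.columns))
```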

## Some experiments
Let's see what else we can do to build the table.

### Add columns
You'll note that we had a "home" and a "country" for each panda, but these don't appear in the output.  Update the instruction to add those columns.

In [None]:
giant_pandas = pd.DataFrame(columns=["name","sex","age"],
                            data=[baiyun,baobao,beibei,chuang,damao])
giant_pandas

### Rearrange columns
We can control the order in which values appear in the table.  In the command below, change the order of "sex" and "age".  Change the order of "home" and "country".  *What effect do you expect that change to have?*

In [None]:
giant_pandas = pd.DataFrame(columns=["name","sex","age","home","country"],
                            data=[baiyun,baobao,beibei,chuang,damao])
giant_pandas

### Rearrange rows
*What do you think happens if we change the order of the rows?*  Rearrange the list of pandas in the `data` below and see what happens.

### Add another row
You're now ready to add your own data to the table.  Pick one of the giant pandas 
mentioned on [the Wikipedia page](https://en.wikipedia.org/wiki/List_of_giant_pandas) 
and add it to the table.

In [None]:
new_panda = { "name": "name" }
giant_pandas = pd.DataFrame(columns=["name","sex","age","home","country"],
                            data=[baiyun,baobao,beibei,chuang,damao])
giant_pandas

### Add another column
Suppose we add a column for favorite food.  What do you expect to happen?  Give it a try and find out.

In [None]:
giant_pandas = pd.DataFrame(columns=["name","sex","age","home","country","favefood"],
                            data=[baiyun,baobao,beibei,chuang,damao])
giant_pandas

As you've probably discovered, we get "NaN" for any missing data, and we don't have a favorite food for any of the pandas.  Go back and add one to each.  Since Wikipedia doesn't tell you, you can make it up.  (I used "Eucalyptus" and "Bamboo".) *Make sure to run the cells again after you enter the data.*
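
To see which entries are still missing, one approach is the `isna()` method on a column, which marks each missing value with `True`. This is a minimal sketch with two shortened rows; the favorite food for Da Mao is made up for the example, and the name `table` is only used here.

```python
import pandas as pd

# Shortened rows for illustration. Only one of them has a "favefood" entry,
# and that entry is invented for this example.
beibei = {"name": "Bei Bei", "age": 2}
damao = {"name": "Da Mao", "age": 9, "favefood": "Bamboo"}

table = pd.DataFrame(columns=["name", "age", "favefood"],
                     data=[beibei, damao])

# isna() gives True where a value is missing (NaN) and False otherwise.
missing = table["favefood"].isna()
print(missing)

# Adding up the True values counts how many entries are missing.
print(missing.sum())
```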

## Selecting rows
What if we want only part of the data set, such as only the pandas that are in China?  Data frames provide a way to do so, but it's a bit weird.  Try the following.

In [None]:
giant_pandas.loc[giant_pandas["country"] == "China"]

We might want to name the table for just those entries.

In [None]:
pandas_in_china = giant_pandas.loc[giant_pandas["country"] == "China"]
pandas_in_china
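
Why does that work? The part inside the square brackets is a comparison, and comparing a column to a value produces a column of `True` and `False` values, one per row. `loc` then keeps only the rows marked `True`. The sketch below shows the two steps separately on a different column; the cutoff of 10 years and the names `is_older` and `older_pandas` are arbitrary choices for this example.

```python
import pandas as pd

# A small table with just names and ages, for illustration.
table = pd.DataFrame(columns=["name", "age"],
                     data=[{"name": "Bai Yun", "age": 26},
                           {"name": "Bei Bei", "age": 2},
                           {"name": "Chuang Chuang", "age": 17}])

# Step 1: the comparison gives one True/False value per row.
is_older = table["age"] >= 10   # 10 is an arbitrary cutoff for this example
print(is_older)

# Step 2: loc keeps only the rows where the value is True.
older_pandas = table.loc[is_older]
print(older_pandas)
```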

### Your turn
Now it's your turn.  Write a command to select just the male pandas.

In [None]:
male_pandas = []
male_pandas

### A challenge
Write a command to select the male pandas in the U.S.

In [None]:
male_pandas_in_us = []

## Selecting columns
We can also select just a column of data to get something like a Python list, but more powerful.  You write `data_frame["column_name"]` to get a particular column. Let's select just the "sex" column.

In [None]:
giant_pandas["sex"]

It may be useful to name that column.  Try the following command.

In [None]:
panda_sexes = giant_pandas["sex"]
panda_sexes
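
To get a feel for how a column resembles a Python list, here is a short sketch using an "age" column from a small table built just for this example. It relies on `len`, the `tolist()` method, and the `max()` method, all of which work on a pandas column.

```python
import pandas as pd

# A small table with just names and ages, for illustration.
table = pd.DataFrame(columns=["name", "age"],
                     data=[{"name": "Bai Yun", "age": 26},
                           {"name": "Bei Bei", "age": 2},
                           {"name": "Chuang Chuang", "age": 17}])

ages = table["age"]

# Like a list, a column has a length.
print(len(ages))

# tolist() converts the column into an ordinary Python list.
print(ages.tolist())

# Unlike a plain list, a column has built-in methods such as max().
print(ages.max())
```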

What can we do once we have a column?  If it's a column of numbers, we can ask for things like the mean, median, and mode.  If it's a column of words, we can count how many times each word appears with `column.value_counts()`.  Try the following command.