# Data Analysis and Visualization in Python
## Starting With Data
Questions
* How can I import data in Python?
* What is Pandas?
* Why should I use Pandas to work with data?

Objectives
* Load the Python Data Analysis Library (Pandas).
* Use `read_csv` to read tabular data into Python.
* Describe what a DataFrame is in Python.
* Access and summarize data stored in a DataFrame.
* Perform basic mathematical operations and summary statistics on data in a Pandas DataFrame.
* Create simple plots.

## Working With Pandas DataFrames in Python

### Pandas in Python
One of the best options for working with tabular data in Python is the Python Data Analysis Library (a.k.a. Pandas). The Pandas library provides data structures, produces high-quality plots with matplotlib, and integrates well with other libraries that use NumPy arrays (NumPy is another Python library).
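
As a rough illustration of how Pandas data structures sit on top of NumPy arrays, the sketch below builds a tiny DataFrame by hand. The names `toy_df`, `animal`, and `mass_g` and all the values are made up for this example; they are not part of the surveys data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration (not from surveys.csv)
toy_df = pd.DataFrame({
    "animal": ["mouse", "rat", "vole"],
    "mass_g": [21.0, 250.0, 35.5],
})

# A DataFrame is a table: rows are records, columns are named Series
print(toy_df.shape)

# A numeric column can be handed to NumPy as a plain array
mass_array = toy_df["mass_g"].to_numpy()
print(type(mass_array))
print(np.log10(mass_array).round(2))
```

The same pattern applies to any numeric column, which is why libraries built on NumPy can usually consume Pandas columns directly.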

In [None]:
import ### as ###

## Reading CSV Data Using Pandas
### So What’s a DataFrame?

In [None]:
# Note that pd.read_csv is used because we imported pandas as pd
###("###/surveys.csv")

In [None]:
### = pd.read_csv("data/surveys.csv")

In [None]:
surveys_df### # Displays the first several rows of the DataFrame

In [None]:
surveys_df.###
# In what format is the shape of the DataFrame returned?

In [None]:
surveys_df.###

In [None]:
# Compute descriptive statistics per column
print("Count:    ", surveys_df['weight'].###())
print("Mean:     ", surveys_df['weight'].###())
print("Std Dev.: ", surveys_df['weight'].###())
print("Min:      ", surveys_df['weight'].###())
print("Max:      ", surveys_df['weight'].###())

In [None]:
# New column - Convert all weights from grams to kilograms
surveys_df[###] = surveys_df[###] ###
surveys_df.###

## Types of Data
### Checking the format of our data

In [None]:
surveys_df###

In [None]:
surveys_df['month']###

Native Python Type | Pandas Type | Description
-------------------|-------------|------------
`str`              | `object`    | The most general dtype. Assigned to a column that has mixed types (numbers and strings).
`int`              | `int64`     | 64-bit integer.
`float`            | `float64`   | Numbers with decimals. If a column contains numbers and NaNs (see below), pandas defaults to float64.
N/A                | `datetime64`| Values meant to hold time data.
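
The rows of this table can be checked on small hand-made Series. This is only a sketch: the variable names and values below are hypothetical, and the exact dtype labels printed (for example the time unit shown for `datetime64`) can vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical values, invented to illustrate the dtype table
ints = pd.Series([1, 2, 3])             # whole numbers only
with_nan = pd.Series([1, 2, np.nan])    # a missing value forces a float dtype
mixed = pd.Series([1, "two", 3.0])      # numbers and strings together
dates = pd.to_datetime(pd.Series(["2020-01-01", "2020-02-15"]))

for name, series in [("ints", ints), ("with_nan", with_nan),
                     ("mixed", mixed), ("dates", dates)]:
    print(name, series.dtype)
```

Note how a single NaN is enough to turn an otherwise integer column into `float64`.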

### Working With Our Survey Data

In [None]:
# Summary of descriptive statistics
surveys_df###

In [None]:
surveys_df['month'] = surveys_df['month'].###('###')
surveys_df['month'].###

In [None]:
surveys_df['month'].###

In [None]:
surveys_df['month'].###

In [None]:
surveys_df[###].unique()

### Exercise - Calculating Statistics

`1`. What happens if we try to convert `weight` values to `int64` integers?

In [None]:
surveys_df['weight'].###('int64')

`2`. Try converting the column `plot_id` to the native Python `float` data type.

In [None]:
surveys_df['plot_id'] ### surveys_df['###'].###("###")
surveys_df['plot_id'].dtype

`3`. Create a list of unique site IDs (`plot_id`) found in the surveys data. Call it `site_names`. How many unique sites are there in the data?

In [None]:
site_names = surveys_df[###]###
site_names.###

`4`. What is the difference between `len(site_names)` and `surveys_df['plot_id'].nunique()`?

In [None]:
# print(len(site_names))
# print(surveys_df['plot_id'].nunique())
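
One subtlety worth keeping in mind when comparing the two approaches is how missing values are counted. The sketch below uses a hypothetical Series, `toy_ids`, that contains a NaN; whether this matters for `plot_id` depends on whether that column has any missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs with one missing value (not from surveys.csv)
toy_ids = pd.Series([1, 2, 2, np.nan, 3])

unique_vals = toy_ids.unique()         # array of distinct values, NaN included
print(len(unique_vals))                # 4
print(toy_ids.nunique())               # 3, NaN is dropped by default
print(toy_ids.nunique(dropna=False))   # 4, NaN counted as a value
```

When a column has no missing values, the two counts agree.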

## Groups in Pandas

In [None]:
# Group data by sex
### = surveys_df.###('###')

In [None]:
# Summary statistics for all numeric columns by sex
by_sex.###

In [None]:
# Provide the mean for each numeric column by sex
by_sex.###

### Exercise - Grouping
`1`. How many recorded individuals are female `F`, and how many male `M`?

In [None]:
by_sex###

`2`. What happens when you group by two columns using the following syntax and then compute mean values?

In [None]:
by_site_sex = surveys_df.groupby(['plot_id','sex'])
by_site_sex.###()

`3`. Summarize `weight` values for each site (`plot_id`) in your data. HINT: it is possible to select a column once the data has been grouped.

In [None]:
by_site = surveys_df.###(['###'])
by_site['###'].###

### Getting the Number of Records of One Species

In [None]:
surveys_df['species_id'].###

In [None]:
surveys_df.groupby('species_id')['record_id']#.count()###

## Quick & Easy Plotting Data Using Pandas

In [None]:
by_site['record_id'].###().###(kind='bar')

### Exercise - Plotting Challenge
Create a `line` plot of the median `weight` per month.

In [None]:
# Why is the following line necessary?
# surveys_df['month'] = surveys_df['month'].astype('int64')
surveys_df.###('###')['###'].###().plot(kind='###')

## Summary Example

In [None]:
site_sex_totalw = by_site_sex['weight'].###()
site_sex_totalw###

In [None]:
# Change the right-most categorical values into columns
sst = site_sex_totalw.unstack()
sst
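
To see what `unstack()` does to the shape of the result, it can help to try it on a table small enough to check by hand. The sketch below uses a hypothetical `toy_df` with made-up `site`, `sex`, and `mass_g` columns standing in for the survey columns.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not from surveys.csv)
toy_df = pd.DataFrame({
    "site": [1, 1, 1, 2, 2],
    "sex": ["F", "M", "F", "F", "M"],
    "mass_g": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Grouping by two keys gives a Series with a two-level (site, sex) index
totals = toy_df.groupby(["site", "sex"])["mass_g"].sum()
print(totals)

# unstack() moves the inner index level (sex) into the columns
wide = totals.unstack()
print(wide)
```

The wide table has one row per site and one column per sex, which is the layout a grouped bar plot expects.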

In [None]:
s_plot = sst.plot(kind='bar', ###,
                  title="Total weight by site and sex")
s_plot.set_xlabel("Site")
s_plot.set_ylabel("Weight (g)")