# Overview

# Introduction to the Python Programming Language

## The Python Interpreter

### Jupyter Notebook

## Python Data Types

### Strings, Integers, and Floats

### Operators

## Sequences

### Tuples

### Challenge: Tuples and Lists

1. What happens when you type `a_tuple[2] = 5` versus `a_list[1] = 5`? And why?
2. Type `type(a_tuple)` into Python; what is the object's type?
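
As a minimal sketch of the difference this challenge probes, the snippet below assigns into a list and into a tuple. The contents of `a_list` and `a_tuple` are assumed here for illustration; they may differ from the values used in the lesson.

```python
# Hypothetical values; the lesson's own a_list and a_tuple may differ.
a_list = [1, 2, 3]
a_tuple = (1, 2, 3)

# Lists are mutable: item assignment changes the list in place.
a_list[1] = 5

# Tuples are immutable: item assignment raises a TypeError.
try:
    a_tuple[2] = 5
    tuple_error = None
except TypeError as err:
    tuple_error = err

print(a_list)                  # the list now holds the new value
print(type(a_tuple).__name__)  # tuple
print(tuple_error)             # the error message raised by the assignment
```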

## Dictionaries

### Challenge: Reassignment in a Dictionary

## Functions

### Challenge: Writing Your First Function

### Lambda Functions

# Managing Data in Python

## About the Data

## About Libraries

## Reading CSV Data in Pandas

### Data Frames

## Manipulating Data in Python

### Challenge: Viewing DataFrames in Python

Try executing each code sample below and see what is returned.

- `surveys.columns`
- `surveys.head()`
- `surveys.head(15)`
- `surveys.tail()`
- `surveys.shape`

Take note of the output of `surveys.shape`; what format does it return?

**Finally, what is the difference between the code samples that end in parentheses and those that do not?**

Each of the code samples above has us calling some **attribute** or **method** on the surveys DataFrame.

- **Methods** are functions that belong to an object in Python, like a DataFrame. Just like the functions we saw earlier, methods take zero or more arguments inside the parentheses. Even if we have no arguments to provide, we still need the parentheses to call the method.
- **Attributes** are a more general concept; an attribute is anything that belongs to an object in Python, including methods. Attributes that are not methods, however, don't need to be called with parentheses.

If we think of a person, an attribute is something that belongs to that person or describes that person, like hair color or number of siblings.
A method is something that person does, like bake a pie or go for a run.
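
The distinction can be sketched on a small DataFrame. The three rows below reuse the first records shown later in this lesson; the variable names other than `surveys` are introduced only for illustration.

```python
import pandas as pd

# A tiny stand-in for the surveys DataFrame (first three records only).
surveys = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["NL", "NL", "DM"],
    "hindfoot_length": [32.0, 33.0, 37.0],
})

# Attributes are looked up, not called, so they take no parentheses.
n_rows, n_cols = surveys.shape        # shape is a tuple: (rows, columns)
column_names = list(surveys.columns)

# Methods are functions attached to the object, so they are called.
first_two = surveys.head(2)           # an argument goes inside the parentheses
default_head = surveys.head()         # no argument, but parentheses are still required

# Without parentheses we get the method object itself, not its result.
print(callable(surveys.head))         # True
print(callable(surveys.shape))        # False
```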

## Calculating Statistics in a Pandas DataFrame

### Challenge: Unique Levels for a Column

1. Create a list of unique plot IDs found in the survey data; assign the list of unique IDs to a variable called `plot_names`. **How many unique plots are there in the data? How many unique species are in the data?**
2. What is the difference between `len(plot_names)` and `plot_names.shape`?
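
One way to explore the second question is sketched below, assuming the unique values are obtained with the Series method `unique()`, which returns a NumPy array rather than a plain list. The `plot_id` values are invented for illustration.

```python
import pandas as pd

# Hypothetical plot IDs; the real survey data has many more rows.
surveys = pd.DataFrame({"plot_id": [2, 3, 2, 7, 3]})

# unique() returns a NumPy array of the distinct values.
plot_names = surveys["plot_id"].unique()

n_plots = len(plot_names)       # a plain integer
plot_shape = plot_names.shape   # a tuple with one entry per dimension

print(n_plots)
print(plot_shape)
```

A plain Python list supports `len()` but has no `shape` attribute, so `plot_names.shape` only works because the result is an array.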

### Groups in Pandas

### Quickly Creating Summary Counts in Pandas

### Challenge: Understanding Grouped DataFrames

1. In that last command, we asked for the `record_id` column. Try asking for a different column in the square brackets. Do you get a different result? Why or why not?
2. How can we get a count of just the records with `species_id` set to `DO`? *Hint: You can build on the last command we executed; think about dictionaries and key-value pairs.*
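
The sketch below illustrates both ideas on invented data. It assumes the "last command" was a grouped count along the lines of `surveys.groupby('species_id')['record_id'].count()`, which is not shown in this outline.

```python
import pandas as pd

# Hypothetical records; weight is missing (NaN) for some of them.
surveys = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "species_id": ["DO", "DM", "DO", "NL", "DO"],
    "weight": [50.0, None, 48.0, None, 52.0],
})

# count() tallies non-missing values, so the chosen column matters.
counts_by_id = surveys.groupby("species_id")["record_id"].count()
counts_by_weight = surveys.groupby("species_id")["weight"].count()

# The result is a Series indexed by species_id, so a single group
# can be looked up by its label, much like a key in a dictionary.
do_count = counts_by_id["DO"]

print(counts_by_id)
print(counts_by_weight)
print(do_count)
```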

### Basic Math on DataFrame Columns

## Basic Plotting with Pandas

### Challenge: Plotting

1. Create a plot of the average weight for each species.
2. Create a plot of total males and total females across the entire dataset.

*Note:* Some of the species have no weight measurements; they are entered as `NaN`, which stands for "not a number" and refers to missing values.
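
To see how missing weights affect a per-species average, consider this small sketch with invented values. It stops short of plotting so that it runs without a plotting backend.

```python
import pandas as pd

# Hypothetical data: species NL has no weight measurements at all.
surveys = pd.DataFrame({
    "species_id": ["DM", "DM", "NL", "NL"],
    "weight": [40.0, None, None, None],
})

# mean() skips NaN values within each group; a group with only NaN
# values has no data to average, so its mean is itself NaN.
mean_weight = surveys.groupby("species_id")["weight"].mean()

# Dropping the NaN groups leaves only species that can be plotted.
plottable = mean_weight.dropna()

print(mean_weight)
print(plottable)
# plottable.plot(kind="bar") would then draw one bar per remaining species.
```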

### Multiple Grouping in Pandas

# Indexing and Slicing Python DataFrames

## Indexing and Slicing in Python

### Selecting Data Using Labels (Column Headings)

### Extracting a Range of Data with Slicing

![](./slicing-indexing.svg)

![](./slicing-slicing.svg)

### Challenge: Slicing

What does each of these lines of code return?

1. `grades[0]`
2. `grades[len(grades)]`
3. `grades[4]`

Why do (2) and (3) return errors?
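
The following sketch assumes `grades` is a four-element list; the actual values in the lesson may differ. It shows why the largest valid index is one less than the length.

```python
# Hypothetical list of four grades.
grades = [88, 72, 93, 94]

# Indexing starts at 0, so valid positions run from 0 to len(grades) - 1.
last_valid = len(grades) - 1
first = grades[0]
last = grades[last_valid]
also_last = grades[-1]          # negative indices count from the end

# Position len(grades) is one past the end, so it raises an IndexError.
try:
    grades[len(grades)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, also_last, out_of_range)
```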

### Slicing Subsets of Rows in Python

### Oops: Referencing versus Copying Objects in Python

### Slicing Subsets of Rows and Columns with Pandas

### Challenge: Slicing Rows and Columns

What happens when you type:

1. `surveys[0:3]`
2. `surveys[:5]`
3. `surveys[-1:]`

To review...

**To index by rows in Pandas:**

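- `surveys[0:3]`
- `surveys.iloc[0:3]`
- `surveys.iloc[0:3,:]`

| | record_id | month | day | year | plot_id | species_id | sex | hindfoot_length | weight |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 7 | 16 | 1977 | 2 | NL | M | 32.0 | NaN |
| 1 | 2 | 7 | 16 | 1977 | 3 | NL | M | 33.0 | NaN |
| 2 | 3 | 7 | 16 | 1977 | 2 | DM | F | 37.0 | NaN |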


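**To index by columns (and rows) in Pandas:**

- `surveys[['month', 'day', 'year']]`
- `surveys.loc[0:3, ['month', 'day', 'year']]`
- `surveys.iloc[0:3, 1:4]`

The first expression returns every row, and `.loc[0:3, ...]` includes the row labeled `3`, so it returns four rows. The output below is from the `.iloc` expression.

| | month | day | year |
|---|---|---|---|
| 0 | 7 | 16 | 1977 |
| 1 | 7 | 16 | 1977 |
| 2 | 7 | 16 | 1977 |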
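
A short sketch comparing label-based and position-based slicing follows. The first three rows match the records above; rows 4 and 5 are invented so that the difference in row counts is visible.

```python
import pandas as pd

# Rows 1-3 follow the records above; rows 4-5 are hypothetical.
surveys = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "month": [7, 7, 7, 7, 7],
    "day": [16, 16, 16, 16, 16],
    "year": [1977, 1977, 1977, 1977, 1977],
})

# iloc slices by integer position and excludes the end position.
by_position = surveys.iloc[0:3, 1:4]

# loc slices by label and includes the end label.
by_label = surveys.loc[0:3, ["month", "day", "year"]]

print(by_position.shape)   # (3, 3)
print(by_label.shape)      # (4, 3)
```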


## Subsetting Data Using Criteria

### Challenge: Filtering Data

1. Filter the `surveys` table to observations of female members of the `DO` species. How many are there? What is their average weight?
2. Look at the help documentation for the `isin()` method (Hint: `?surveys.year.isin`). Use this method to filter the `surveys` DataFrame to the rows whose `species_id` is one of `OL`, `OT`, or `OX`.
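
As a sketch of the two filtering patterns involved, the example below uses invented rows and deliberately different species IDs from the ones in the challenge.

```python
import pandas as pd

# Hypothetical records for illustration only.
surveys = pd.DataFrame({
    "species_id": ["NL", "NL", "DM", "DO", "DO"],
    "sex": ["M", "M", "F", "F", "M"],
    "weight": [None, None, None, 50.0, 48.0],
})

# isin() returns a boolean Series: True where the value is in the list.
in_list = surveys["species_id"].isin(["NL", "DM"])
subset = surveys[in_list]

# Several criteria can be combined with & (and) or | (or);
# each comparison needs its own parentheses.
both = surveys[(surveys["species_id"] == "DO") & (surveys["sex"] == "M")]

print(len(subset))
print(both["weight"].mean())
```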

## Using Masks

## Dealing with Missing Data

# Transforming Data

## Converting Units

## Transforming Values

# Answering Questions with Data

## Sorting on Values

# Combining Multiple Datasets

## Concatenating DataFrames

## Writing Data to a CSV File

## Joining DataFrames

### Identifying Join Keys

### Inner Joins

### Left Joins

### Other Joins

### Challenge: Joins

Create a new DataFrame by joining the contents of `surveys` and `species`. Then, calculate and plot the distribution of `taxa` by `plot_id`.
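
One possible approach is sketched below on invented data. It assumes the two tables share a `species_id` column to join on; the rows, and the choice of an inner join, are illustrative rather than prescribed by the lesson.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
surveys = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "plot_id": [2, 3, 2, 3],
    "species_id": ["NL", "DM", "DM", "AB"],
})
species = pd.DataFrame({
    "species_id": ["NL", "DM", "AB"],
    "taxa": ["Rodent", "Rodent", "Bird"],
})

# Join on the shared key; an inner join keeps only matching rows.
merged = pd.merge(surveys, species, on="species_id", how="inner")

# Count records per plot and taxa, then pivot taxa into columns.
taxa_by_plot = (
    merged.groupby(["plot_id", "taxa"])["record_id"]
    .count()
    .unstack(fill_value=0)
)

print(taxa_by_plot)
# taxa_by_plot.plot(kind="bar", stacked=True) would chart the result.
```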

# Automating Data Workflows with Python

## Automating Data Processing with For Loops

### Challenge: Automation with For Loops

1. Some of the surveys we saved are missing data; they have `NaN` values in one or more columns. **Modify our `for` loop so that the entries with null values are not included in the yearly files.** 
2. **What happens if there are no data for a year in the sequence?** You can generate a list of years for the `for` loop to use with, e.g., `range(1970, 1980)`.
3. Let's say you only want to look at data from a given multiple of years. **How would you modify your loop in order to generate a data file for only every 5th year, starting from 1977?**
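
The loop below is a minimal sketch of the ideas in this challenge, not the loop from the lesson. The data, the `rows_per_year` dictionary, and the output file name in the final comment are assumptions made for illustration, and nothing is written to disk.

```python
import pandas as pd

# Hypothetical records; some weights are missing.
surveys = pd.DataFrame({
    "year": [1977, 1977, 1982, 1987],
    "weight": [40.0, None, 35.0, None],
})

rows_per_year = {}

# The third argument to range() is the step: every 5th year from 1977.
for year in range(1977, 1993, 5):
    # Keep this year's rows, then drop any row containing a null value.
    yearly = surveys[surveys["year"] == year].dropna()

    # Skip years with no remaining data instead of saving an empty file.
    if yearly.empty:
        continue

    rows_per_year[year] = len(yearly)
    # yearly.to_csv(f"surveys{year}.csv") could write the file here.

print(rows_per_year)
```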

## Building Reusable Code with Functions

### Challenge: Writing Reusable Functions

1. What type of object corresponds to a variable declared as `None`? (Hint: Create a variable set to `None` and use the function `type()`.)
2. What happens if you call `multiple_years_to_csv()` with only `all_data` and an `end_year` (that is, without providing a `start_year`)? Can you write the function call with only a value for `end_year`?
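
The pattern behind this challenge can be sketched with a stand-in function. `resolve_years` below is hypothetical and is not the lesson's `multiple_years_to_csv()`, whose exact signature is not shown here; it only illustrates `None` defaults and keyword arguments.

```python
def resolve_years(all_years, start_year=None, end_year=None):
    """Hypothetical helper: fill in missing bounds from the data."""
    # None signals that the caller did not supply a value.
    if start_year is None:
        start_year = min(all_years)
    if end_year is None:
        end_year = max(all_years)
    return start_year, end_year


years = [1977, 1978, 1979, 1980]

# None has its own type.
none_type_name = type(None).__name__

# Naming the argument lets us skip start_year and pass only end_year.
bounds = resolve_years(years, end_year=1979)

print(none_type_name)   # NoneType
print(bounds)           # (1977, 1979)
```

Passing the second value positionally, as in `resolve_years(years, 1979)`, would assign it to `start_year` instead, which is why the keyword form matters.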

# Visualizing Data in Python

## Plotting with ggplot

## Boxplots