# pandas

This section of the workshop covers data ingestion, cleaning,
manipulation, analysis, and visualization in Python.

We build on the skills learned in the [Python
fundamentals](../python_fundamentals/index.ipynb) section and teach the
[pandas](https://pandas.pydata.org) library.

At the end of this section, you will be able to:

- Access data stored in a variety of formats
- Combine multiple datasets based on observations that link them
  together
- Perform custom operations on tables of data
- Use the split-apply-combine method for analyzing sub-groups of data
- Automate static analysis on changing data
- Produce publication-quality visualizations
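
As a small preview of two of these skills, the sketch below links two tables on a shared column with `pd.merge`, then applies split-apply-combine with `groupby`. The state labels, regions, and rates are hypothetical toy values made up for illustration only. They are not real data.

```python
import pandas as pd

# Hypothetical toy data: states "A", "B", "C" and their rates in two years
rates = pd.DataFrame({
    "state": ["A", "A", "B", "B", "C", "C"],
    "year": [2019, 2020, 2019, 2020, 2019, 2020],
    "rate": [4.0, 6.0, 3.0, 5.0, 2.0, 8.0],
})

# Hypothetical lookup table that links each state to a region
regions = pd.DataFrame({
    "state": ["A", "B", "C"],
    "region": ["east", "east", "west"],
})

# Combine: attach each state's region using the shared "state" column
combined = pd.merge(rates, regions, on="state", how="left")

# Split-apply-combine: split rows by (region, year), apply the mean,
# and combine the per-group results into one Series
by_region = combined.groupby(["region", "year"])["rate"].mean()
print(by_region)
```

Later notebooks in this section cover `merge` and `groupby` in much more detail, including their many optional arguments.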


Our goal with this section is to give you the skills to, at a minimum,
**immediately** replicate your current data analysis workflow in Python
with no increase in total (computer plus human) time.

This is a lower bound on the benefits you should expect from studying
this section.

The expression “practice makes perfect” is especially true here.

As you work with these tools, both the time to write and the time to run
your programs will fall dramatically.

<div class="toctree">

- [Introduction](intro.ipynb)
  - [pandas](intro.ipynb#pandas)
  - [Series](intro.ipynb#series)
  - [DataFrame](intro.ipynb#dataframe)
  - [Data Types](intro.ipynb#data-types)
  - [Changing DataFrames](intro.ipynb#changing-dataframes)
  - [Exercises](intro.ipynb#exercises)
- [Basic Functionality](basics.ipynb)
  - [State Unemployment Data](basics.ipynb#state-unemployment-data)
  - [Dates in pandas](basics.ipynb#dates-in-pandas)
  - [DataFrame Aggregations](basics.ipynb#dataframe-aggregations)
  - [Transforms](basics.ipynb#transforms)
  - [Boolean Selection](basics.ipynb#boolean-selection)
  - [Exercises](basics.ipynb#exercises)
- [The Index](the_index.ipynb)
  - [So What is this Index?](the_index.ipynb#so-what-is-this-index)
  - [Setting the Index](the_index.ipynb#setting-the-index)
  - [Re-setting the Index](the_index.ipynb#re-setting-the-index)
  - [Choose the Index Carefully](the_index.ipynb#choose-the-index-carefully)
  - [Exercises](the_index.ipynb#exercises)
- [Storage Formats](storage_formats.ipynb)
  - [File Formats](storage_formats.ipynb#file-formats)
  - [Writing DataFrames](storage_formats.ipynb#writing-dataframes)
  - [Reading Files into DataFrames](storage_formats.ipynb#reading-files-into-dataframes)
  - [Practice](storage_formats.ipynb#practice)
- [Cleaning Data](data_clean.ipynb)
  - [Cleaning Data](data_clean.ipynb#id1)
  - [String Methods](data_clean.ipynb#string-methods)
  - [Type Conversions](data_clean.ipynb#type-conversions)
  - [Missing Data](data_clean.ipynb#missing-data)
  - [Case Study](data_clean.ipynb#case-study)
  - [Appendix: Performance of `.str` Methods](data_clean.ipynb#appendix-performance-of-str-methods)
  - [Exercises](data_clean.ipynb#exercises)
- [Reshape](reshape.ipynb)
  - [Tidy Data](reshape.ipynb#tidy-data)
  - [Reshaping your Data](reshape.ipynb#reshaping-your-data)
  - [Long vs Wide](reshape.ipynb#long-vs-wide)
  - [`set_index`, `reset_index`, and Transpose](reshape.ipynb#set-index-reset-index-and-transpose)
  - [`stack` and `unstack`](reshape.ipynb#stack-and-unstack)
  - [`melt`](reshape.ipynb#melt)
  - [`pivot` and `pivot_table`](reshape.ipynb#pivot-and-pivot-table)
  - [Visualizing Reshaping](reshape.ipynb#visualizing-reshaping)
  - [Exercises](reshape.ipynb#exercises)
- [Merge](merge.ipynb)
  - [Combining Datasets](merge.ipynb#combining-datasets)
  - [`pd.concat`](merge.ipynb#pd-concat)
  - [`pd.merge`](merge.ipynb#pd-merge)
  - [Arguments to `merge`](merge.ipynb#arguments-to-merge)
  - [`df.join`](merge.ipynb#df-join)
  - [Case Study](merge.ipynb#case-study)
  - [Extra Example: Airline Delays](merge.ipynb#extra-example-airline-delays)
  - [Visualizing Merge Operations](merge.ipynb#visualizing-merge-operations)
  - [Exercises](merge.ipynb#exercises)
- [GroupBy](groupby.ipynb)
  - [Split-Apply-Combine](groupby.ipynb#split-apply-combine)
  - [Case Study: Airline Delays](groupby.ipynb#case-study-airline-delays)
  - [Exercise: Cohort Analysis using Shopify Data](groupby.ipynb#exercise-cohort-analysis-using-shopify-data)
  - [Exercises](groupby.ipynb#exercises)
- [Time Series](timeseries.ipynb)
  - [Intro](timeseries.ipynb#intro)
  - [Parsing Strings as Dates](timeseries.ipynb#parsing-strings-as-dates)
  - [Date Formatting](timeseries.ipynb#date-formatting)
  - [Extracting Data](timeseries.ipynb#extracting-data)
  - [Accessing Date Properties](timeseries.ipynb#accessing-date-properties)
  - [Leads and Lags: `df.shift`](timeseries.ipynb#leads-and-lags-df-shift)
  - [Rolling Computations: `.rolling`](timeseries.ipynb#rolling-computations-rolling)
  - [Changing Frequencies: `.resample`](timeseries.ipynb#changing-frequencies-resample)
  - [Optional: API keys](timeseries.ipynb#optional-api-keys)
  - [Exercises](timeseries.ipynb#exercises)
- [Intermediate Plotting](matplotlib.ipynb)
  - [Introduction](matplotlib.ipynb#introduction)
  - [The Want Operator: Replicate a Professional Figure](matplotlib.ipynb#the-want-operator-replicate-a-professional-figure)
  - [Data](matplotlib.ipynb#data)
  - [Warmup](matplotlib.ipynb#warmup)
  - [Data Cleaning](matplotlib.ipynb#data-cleaning)
  - [Constructing the Plot](matplotlib.ipynb#constructing-the-plot)
  - [Saving the Figure](matplotlib.ipynb#saving-the-figure)
  - [Exercises](matplotlib.ipynb#exercises)