# [02. Package overview](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html)

## Package overview

**pandas** is well suited for many different kinds of data:

1. **Tabular data** with heterogeneously-typed columns, as in an **SQL table or Excel spreadsheet**

2. **Ordered and unordered (not necessarily fixed-frequency) time series data.**

3. Arbitrary **matrix data** (homogeneously typed or heterogeneous) with row and column labels

4. **Any other form** of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

The **two primary data structures of pandas** are:

1. **Series (1-dimensional)** and
2. **DataFrame (2-dimensional)**

For R users, DataFrame provides everything that **R’s data.frame** provides and much more. **pandas is built on top of NumPy and** is intended to integrate well within a scientific computing environment with **many other 3rd party libraries**.

Here are a few of the things that pandas does well:

1. Easy handling of **missing data** (represented as **NaN**) in floating point as well as non-floating point data

2. **Size mutability**: columns can be inserted and deleted from DataFrame and higher dimensional objects

3. **Automatic and explicit data alignment**: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations

4. Powerful, flexible group by functionality to perform **split-apply-combine operations** on data sets, **for both aggregating and transforming data**

5. Make it **easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects**

6. Intelligent label-based **slicing, fancy indexing, and subsetting** of large data sets

7. Intuitive **merging and joining data sets**

8. Flexible **reshaping and pivoting of data sets**

9. **Hierarchical labeling of axes** (possible to have multiple labels per tick)

10. **Robust IO tools** for:
    1. loading data from **flat files (CSV and delimited), Excel files, and databases**, and
    2. saving / loading data in the ultrafast **HDF5 format**

11. **Time series**-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging.

12. Support for the multiple stages of working with data **for data scientists:**
    1. **munging and cleaning data**,
    2. **analyzing / modeling** it, then
    3. **organizing the results** of the analysis into a form suitable for plotting or tabular display.

    pandas is the ideal tool for all of these tasks.
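
A few of the items above (automatic alignment, missing data, and split-apply-combine) can be illustrated with a minimal sketch. The labels, column names, and values below are made up for illustration and are not part of the original overview.

```python
import pandas as pd

# Two Series whose labels only partially overlap (illustrative data).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Automatic alignment: values are matched by label, not by position.
# Labels that appear in only one operand ("a" and "d") become NaN.
total = s1 + s2

# Missing data can then be filled or dropped explicitly.
filled = total.fillna(0.0)
dropped = total.dropna()

# Split-apply-combine on a small, hypothetical table.
df = pd.DataFrame({"group": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})

# Aggregating: one result per group.
sums = df.groupby("group")["value"].sum()

# Transforming: one result per original row (here, the group mean).
group_means = df.groupby("group")["value"].transform("mean")
demeaned = df["value"] - group_means
```

Here `sums` has one entry per group, while `demeaned` keeps the shape of the original column, which is the difference between aggregating and transforming.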

Some other notes:

**pandas is fast.** Many of the low-level algorithmic bits have been extensively tweaked in **[Cython](https://cython.org/)** code. However, as with anything else, generalization usually sacrifices performance. If your application depends on a single feature, you may be able to create a faster specialized tool.

**pandas is a dependency of [statsmodels](https://www.statsmodels.org/stable/index.html), making it an important part of the statistical computing ecosystem in Python.**

**pandas has been used extensively in production in financial applications.**

## Data structures
1. 1-Dimension: **Series**
    + 1D labeled homogeneously-typed array
2. 2-Dimension: **DataFrame**
    + General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns
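
As a rough sketch of the two structures, the snippet below builds one of each. The labels, column names, and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Series: a 1D labeled array holding a single dtype.
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# DataFrame: a 2D labeled table; each column may have its own dtype.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],   # text column
        "age": [31, 45],          # integer column
        "score": [88.5, 92.0],    # floating-point column
    }
)

# Selecting a single column of a DataFrame gives back a Series.
ages = df["age"]
```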

## Why more than one data structure?
+ **The best way to think about the pandas data structures is as flexible containers for lower dimensional data.** For example, **DataFrame is a container for Series, and Series is a container for scalars**. We would like to be able to **insert and remove objects from these containers in a dictionary-like fashion**.
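
The dictionary-like behavior can be sketched as follows, assuming a small made-up DataFrame. The column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Insert a new Series into the container, as with a dict key.
df["c"] = df["a"] + df["b"]

# Remove a column and get it back as a Series, like dict.pop.
col_b = df.pop("b")

# Delete a column outright, like `del` on a dict entry.
del df["a"]

# Membership tests check the column labels, like dict keys.
has_c = "c" in df

# A Series is in turn a container for scalars, accessed by label.
first_value = df["c"].loc[0]
```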

For example, with **tabular data (DataFrame)** it is more semantically helpful to think of the **index** (the rows) and the **columns** rather than axis 0 and axis 1. **Iterating through the columns of the DataFrame** thus results in more readable code.
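
A minimal sketch of such a loop, using a hypothetical DataFrame with made-up column names and values:

```python
import pandas as pd

df = pd.DataFrame({"height": [150.0, 170.0], "weight": [60.0, 80.0]})

column_means = {}
for col in df.columns:
    # Each column is retrieved by its label and comes back as a Series.
    series = df[col]
    column_means[col] = series.mean()
```

The loop refers to columns by name rather than by axis number, which is the readability gain described above.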

## Mutability and copying of data
+ All pandas data structures are **value-mutable** (the values they contain can be altered) **but not always size-mutable. The length of a Series cannot be changed, but,** for example, **columns can be inserted into a DataFrame.** However, the vast majority of methods produce new objects and leave the input data untouched. In general we like to **favor immutability** where sensible.
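
The distinction can be sketched as below, assuming a small illustrative DataFrame. `sort_values` stands in here as one example of a method that returns a new object.

```python
import pandas as pd

df = pd.DataFrame({"b": [3, 1, 2]})

# Size-mutable: a column can be inserted into the existing DataFrame.
df["a"] = [30, 10, 20]

# Value-mutable: an individual value can be altered in place.
df.loc[0, "a"] = 99

# Most methods return a new object and leave the input untouched.
sorted_df = df.sort_values("b")
```

After this runs, `df` keeps its original row order, while `sorted_df` is a separate object ordered by column `b`.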