# Pandas

pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. In this notebook we will cover the following topics:

* Series
* DataFrame
* Dropping Entries
* Indexing, Selecting, Filtering
* Arithmetic and Data Alignment
* Function Application and Mapping
* Sorting
* Axis Indices with Duplicate Values
* Summarising and Computing Descriptive Statistics
* Cleaning Data
* Input and Output

For help, please refer to [the official documentation](https://pandas.pydata.org/pandas-docs/stable/).

## Imports
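The import cell is not included in this export. A conventional setup, assuming the usual `np` and `pd` aliases, would be:

```python
# Conventional aliases (an assumption; the original cell is not shown).
import numpy as np
import pandas as pd
```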

## Series
A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels. The data can be any NumPy data type and the labels are the Series' indices.
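A minimal sketch of creating a Series from a plain list, assuming pandas is imported as `pd`; the name `ser_1` and the values are illustrative:

```python
import pandas as pd

# With no explicit index, pandas assigns a default integer index 0..n-1.
ser_1 = pd.Series([1, 1, 2, -3, -5, 8, 13])
print(ser_1)
```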

Note that each element of the list now has an index when it's converted to a series.

Get the array representation of a Series:
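One way to do this, sketched with illustrative values, is `to_numpy()`; the older `.values` attribute behaves similarly:

```python
import pandas as pd

ser_1 = pd.Series([1, 1, 2, -3, -5, 8, 13])
# to_numpy() returns the data as a NumPy ndarray, without the index.
arr = ser_1.to_numpy()
print(arr)
```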

Get the index of the Series:
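For a Series built without an explicit index (illustrative values below), the `index` attribute is a `RangeIndex`:

```python
import pandas as pd

ser_1 = pd.Series([1, 1, 2, -3, -5, 8, 13])
idx = ser_1.index
print(idx)
```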

Index objects are immutable and hold the axis labels and metadata such as names and axis names. Now let's create a series with a custom index:
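A sketch with hypothetical labels `a` to `e`, which also shows that assigning to a single element of the index is rejected:

```python
import pandas as pd

# Pass the labels through the index argument.
ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
print(ser_2)

# Index objects do not support item assignment.
try:
    ser_2.index[0] = "z"
    index_is_immutable = False
except TypeError:
    index_is_immutable = True
print(index_is_immutable)
```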

Get a value from a Series:
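Assuming a small Series with string labels (illustrative data), a single value can be looked up by its label:

```python
import pandas as pd

ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
# Label-based lookup, similar to a dictionary.
value = ser_2["c"]
print(value)
```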

Verify the index number against the index name:
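One way to check this, using illustrative data, is to compare positional access through `.iloc` with label access:

```python
import pandas as pd

ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
# Position 4 and label "e" refer to the same element.
same = ser_2.iloc[4] == ser_2["e"]
print(same)
```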

Get a set of values from a Series by passing in a list of indices:
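A short sketch with illustrative data; the result follows the order of the requested labels:

```python
import pandas as pd

ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
# A list of labels returns a new Series in the requested order.
subset = ser_2[["c", "a", "b"]]
print(subset)
```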

Get values greater than 1:
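This can be sketched with a boolean mask (illustrative data):

```python
import pandas as pd

ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
# The comparison yields a boolean Series that is then used as a filter.
greater = ser_2[ser_2 > 1]
print(greater)
```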

Multiply by a scalar:
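A minimal example with illustrative data; the operation is element-wise and the index is preserved:

```python
import pandas as pd

ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
doubled = ser_2 * 2
print(doubled)
```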

Apply a function:
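The function used in the original cell is not shown; as an assumed example, a NumPy ufunc such as `np.exp` can be applied directly:

```python
import numpy as np
import pandas as pd

ser_2 = pd.Series([1, 1, 2, -3, -5], index=["a", "b", "c", "d", "e"])
# NumPy ufuncs operate element-wise and keep the index labels.
exp_ser = np.exp(ser_2)
print(exp_ser)
```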

A Series is like a fixed-length, ordered dictionary. We can create a series from dictionaries:
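A sketch using a hypothetical dictionary `dict_1`:

```python
import pandas as pd

dict_1 = {"foo": 100, "bar": 200, "baz": 300}
ser_3 = pd.Series(dict_1)
print(ser_3)
```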

Note that the keys have become the indices in the Series.

We can also re-order a Series by passing in an index list when creating it from a dictionary (labels in the list that are not keys of the dictionary get the value `NaN`):
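In this sketch the label `qux` is hypothetical and deliberately absent from the dictionary:

```python
import pandas as pd

dict_1 = {"foo": 100, "bar": 200, "baz": 300}
# "qux" is not a key of dict_1, so its value becomes NaN.
ser_4 = pd.Series(dict_1, index=["baz", "foo", "bar", "qux"])
print(ser_4)
```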

We can also check for `NaN`s:
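A sketch with the top-level function `pd.isna`, using illustrative data:

```python
import pandas as pd

ser_4 = pd.Series({"foo": 100, "bar": 200}, index=["foo", "bar", "qux"])
# Returns a boolean Series, True where the value is missing.
missing = pd.isna(ser_4)
print(missing)
```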

Or:
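The same check is available as a method on the Series (illustrative data):

```python
import pandas as pd

ser_4 = pd.Series({"foo": 100, "bar": 200}, index=["foo", "bar", "qux"])
missing = ser_4.isna()
print(missing)
```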

Series automatically aligns differently indexed data in arithmetic operations:
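A sketch with two illustrative Series whose indices only partly overlap; labels present in only one operand produce `NaN`:

```python
import pandas as pd

ser_3 = pd.Series({"foo": 100, "bar": 200, "baz": 300})
ser_4 = pd.Series({"foo": 1, "bar": 2, "qux": 3})
# Values are matched by label, not by position.
total = ser_3 + ser_4
print(total)
```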

We can also name a Series and its index:
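Both names are plain attributes; the names `totals` and `label` below are hypothetical:

```python
import pandas as pd

ser_4 = pd.Series({"foo": 100, "bar": 200, "baz": 300})
ser_4.name = "totals"
ser_4.index.name = "label"
print(ser_4)
```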

We can rename a Series' index in place:
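One way to do this, sketched with hypothetical labels, is to assign a new list of the same length to `index`:

```python
import pandas as pd

ser_4 = pd.Series({"foo": 100, "bar": 200, "baz": 300})
# The new list must have the same length as the Series.
ser_4.index = ["fo", "br", "bz"]
print(ser_4)
```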

---

## DataFrame

A DataFrame is a tabular data structure containing an ordered collection of columns. Each column can have a different type. DataFrames have both row and column indices, and row and column operations are treated roughly symmetrically. Depending on the pandas version and its copy-on-write settings, a column returned when indexing a DataFrame may be a view of the underlying data rather than a copy. To obtain an explicit copy, use the `copy()` method.

Pandas can create DataFrames in different ways (e.g., reading in a file (txt, json, csv), or from a dictionary). Let's start by creating a DataFrame from a dictionary:
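A minimal sketch; the dictionary `data_1` and its values are illustrative, not taken from the original notebook:

```python
import pandas as pd

# Each key becomes a column; all lists must have the same length.
data_1 = {
    "state": ["VA", "VA", "VA", "MD", "MD"],
    "year": [2012, 2013, 2014, 2014, 2015],
    "population": [5.0, 5.1, 5.2, 4.0, 4.1],
}
df_1 = pd.DataFrame(data_1)
print(df_1)
```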

Create a DataFrame specifying a sequence of columns:
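The `columns` argument fixes the column order (illustrative data):

```python
import pandas as pd

data_1 = {
    "state": ["VA", "VA", "MD"],
    "year": [2012, 2013, 2014],
    "population": [5.0, 5.1, 4.0],
}
df_2 = pd.DataFrame(data_1, columns=["year", "state", "population"])
print(df_2)
```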

Like Series, columns that are not present in the data are `NaN`:
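In this sketch the column name `unempl` is hypothetical and has no matching key in the data:

```python
import pandas as pd

data_1 = {"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]}
# "unempl" is not a key of data_1, so the whole column is NaN.
df_3 = pd.DataFrame(data_1, columns=["year", "state", "unempl"])
print(df_3)
```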

We can retrieve a column by the column name, returning a Series:
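For example, with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
states = df["state"]
print(states)
```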

We can retrieve a column by attribute, returning a Series:
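Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
years = df.year
print(years)
```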

We can retrieve a row by position:
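A sketch using `.iloc`, the positional indexer (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
# The row comes back as a Series indexed by the column names.
row = df.iloc[0]
print(row)
```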

We can update a column by assignment:
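Here a hypothetical `unempl` column is filled from a NumPy array of matching length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
df["unempl"] = np.arange(3)
print(df)
```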

We can assign a Series to a column (note if assigning a list or array, the length must match the DataFrame, unlike a Series):
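A sketch with a hypothetical Series that covers only some of the row labels; the remaining rows are filled with `NaN`:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD", "MD"],
                   "year": [2012, 2013, 2014, 2015]})
# The Series is aligned on the DataFrame's index; labels 0 and 1 are missing.
unempl = pd.Series([6.0, 6.1], index=[2, 3])
df["unempl"] = unempl
print(df)
```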

Assigning an existing column to a column name that does not yet exist creates a new column (a copy):
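In this sketch the new column name `state_dup` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
df["state_dup"] = df["state"]
print(df)
```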

We can also delete the column:
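A sketch using `del` on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "MD"], "state_dup": ["VA", "MD"]})
# del removes the column from the DataFrame itself.
del df["state_dup"]
print(df)
```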

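We can create a DataFrame from a nested dictionary of dicts. The outer keys become the columns, and the keys of the inner dicts are unioned to form the index of the result, unless an explicit index is specified (whether the unioned keys are also sorted depends on the pandas version):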
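A sketch with a hypothetical nested dictionary `pop`; cells with no matching inner key are `NaN`:

```python
import pandas as pd

pop = {"VA": {2013: 5.1, 2014: 5.2}, "MD": {2014: 4.0, 2015: 4.1}}
# Outer keys become columns, inner keys become row labels.
df_4 = pd.DataFrame(pop)
print(df_4)
```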

We can transpose a DataFrame:
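The `T` attribute swaps rows and columns (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"VA": [5.1, 5.2, 5.3], "MD": [4.0, 4.1, 4.2]},
                  index=[2013, 2014, 2015])
transposed = df.T
print(transposed)
```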

We can set an index name for the DataFrame:
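A sketch with the hypothetical index name `year`:

```python
import pandas as pd

df = pd.DataFrame({"VA": [5.1, 5.2], "MD": [4.0, 4.1]}, index=[2013, 2014])
df.index.name = "year"
print(df)
```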

We can also set a name for the DataFrame columns:
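A sketch with the hypothetical columns name `state`:

```python
import pandas as pd

df = pd.DataFrame({"VA": [5.1, 5.2], "MD": [4.0, 4.1]}, index=[2013, 2014])
df.columns.name = "state"
print(df)
```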

Return the data contained in a DataFrame as a 2D ndarray:
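One option, sketched on illustrative data, is `to_numpy()`; the older `.values` attribute behaves similarly:

```python
import pandas as pd

df = pd.DataFrame({"VA": [5.1, 5.2], "MD": [4.0, 4.1]}, index=[2013, 2014])
arr = df.to_numpy()
print(arr)
```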

## Dropping Entries

Drop rows from a Series or DataFrame:
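A sketch on illustrative data. `drop` returns a new object and leaves the original unchanged:

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})

ser_dropped = ser.drop("a")      # drop one label from a Series
df_dropped = df.drop([0, 1])     # drop rows 0 and 1 from a DataFrame
print(ser_dropped)
print(df_dropped)
```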

Drop columns from a DataFrame:
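Columns can be dropped with `axis=1` or the equivalent `columns` keyword (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "MD"], "year": [2013, 2014],
                   "population": [5.1, 4.0]})
without_year = df.drop("year", axis=1)
without_two = df.drop(columns=["year", "population"])
print(without_year)
print(without_two)
```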

## Indexing, Selecting, Filtering in DataFrames

Select specified columns from a DataFrame:
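Passing a list of column names returns a DataFrame with just those columns (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "MD"], "year": [2013, 2014],
                   "population": [5.1, 4.0]})
subset = df[["state", "population"]]
print(subset)
```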

Select a slice from a DataFrame:
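A sketch assuming the slice in question is a row slice; plain slicing with `[]` selects rows by position:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
first_two = df[:2]
print(first_two)
```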

Or:
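The alternative used in the original cell is not shown; one assumed equivalent is the positional indexer `.iloc`, which excludes the end of the range:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"], "year": [2012, 2013, 2014]})
first_two = df.iloc[0:2]
print(first_two)
```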

Select from a DataFrame based on a filter:
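A sketch with a boolean mask on a hypothetical `population` column:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"],
                   "population": [5.0, 5.1, 4.0]})
# Keep only the rows where the condition is True.
large = df[df["population"] > 5]
print(large)
```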

Or:
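The alternative in the original cell is not shown; one assumed equivalent passes the same boolean mask to `.loc`:

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "MD"],
                   "population": [5.0, 5.1, 4.0]})
large = df.loc[df["population"] > 5]
print(large)
```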

Select a slice of rows from a specific column of a DataFrame:
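A sketch using `.loc` with a row slice and a column label. Note that label-based slices include both endpoints (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"state": ["VA", "VA", "VA", "MD"],
                   "population": [5.0, 5.1, 5.2, 4.0]})
# Rows labelled 0 through 2 (inclusive) of the "population" column.
part = df.loc[0:2, "population"]
print(part)
```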

## Arithmetic and Data Alignment

Adding two DataFrames aligns them on both row and column labels. The result covers the union of the row labels and the union of the column labels, with `NaN` wherever a row and column pair is not present in both operands:
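A sketch with two hypothetical frames that share only row `r2` and column `b`:

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame(np.arange(4.0).reshape(2, 2),
                    index=["r1", "r2"], columns=["a", "b"])
df_b = pd.DataFrame(np.ones((2, 2)),
                    index=["r2", "r3"], columns=["b", "c"])
# Only the cell (r2, b) exists in both frames; every other cell is NaN.
total = df_a + df_b
print(total)
```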

## Function Application and Mapping

Apply a function on 1D arrays to each column:
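A sketch with `apply` and an assumed range function (maximum minus minimum); each column is passed to the function as a Series:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 9], "b": [10, 40, 20]})
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range)
```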

Apply a function on 1D arrays to each row:
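With `axis=1`, the same assumed range function receives one row at a time:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 9], "b": [10, 40, 20]})
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)
print(row_range)
```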

## Sorting

Sort a DataFrame by its index:
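A sketch with an illustrative, unsorted string index:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 1, 2]}, index=["c", "a", "b"])
# sort_index returns a new DataFrame ordered by row label.
by_index = df.sort_index()
print(by_index)
```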

Sort a DataFrame by columns in descending order:
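The original cell is not shown, so this sketch covers two possible readings: ordering the column labels themselves, and ordering the rows by the values in one column.

```python
import pandas as pd

df = pd.DataFrame({"b": [2, 3, 1], "a": [1, 2, 3], "c": [9, 8, 7]})

# Reading 1: order the column labels in descending order.
by_labels = df.sort_index(axis=1, ascending=False)
# Reading 2: order the rows by the values in column "b", descending.
by_values = df.sort_values(by="b", ascending=False)
print(by_labels)
print(by_values)
```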

## Axis Indices with Duplicate Values

Labels do not have to be unique in Pandas:
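A sketch with a Series whose label `a` appears twice (illustrative data):

```python
import pandas as pd

ser = pd.Series([0, 1, 2], index=["a", "a", "b"])
print(ser.index.is_unique)
# Selecting a duplicated label returns every matching element.
print(ser["a"])
```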

Select DataFrame elements:
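With duplicate row labels, `.loc` returns every matching row. A sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "a", "b"])
dup_rows = df.loc["a"]        # two rows share the label "a"
single = df.loc["b", "x"]     # a unique label and column give one value
print(dup_rows)
print(single)
```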

## Summarising and Computing Descriptive Statistics

Unlike NumPy array reductions, Pandas descriptive statistics skip missing data by default. If an entire row or column is NA, the result depends on the reduction: for example, `mean` returns `NaN`, while `sum` returns 0 in current pandas versions.

Sum over the rows:
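A sketch assuming the default `axis=0`, which adds values down the rows and yields one total per column; `NaN` entries are skipped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
totals = df.sum()
print(totals)
```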

Account for NaNs:
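Passing `skipna=False` makes any `NaN` propagate into the result (illustrative data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
totals = df.sum(skipna=False)
print(totals)
```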

## Cleaning Data
* Replace
* Drop
* Concatenate

### Replace

Replace all occurrences of a string with another string, in place (no copy):
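A sketch with hypothetical `State` and `Population` columns; `inplace=True` modifies `df` directly:

```python
import pandas as pd

df = pd.DataFrame({"State": ["VA", "VA", "MD"],
                   "Population": [5.0, 5.1, 4.0]})
# Replaces exact matches of "VA" anywhere in the DataFrame.
df.replace("VA", "VIRGINIA", inplace=True)
print(df)
```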

In a specified column, replace all occurrences of a string with another string, in place (no copy):
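One way to limit the replacement to a single column is a nested dictionary of the form `{column: {old: new}}`. The column names and values here are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"State": ["VA", "MD"], "Capital": ["MD", "Annapolis"]})
# Only the "State" column is searched; "Capital" is left untouched.
df.replace({"State": {"MD": "MARYLAND"}}, inplace=True)
print(df)
```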

### Drop

Drop the 'Population' column and return a copy of the DataFrame:
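A sketch on illustrative data; the original DataFrame keeps its `Population` column:

```python
import pandas as pd

df = pd.DataFrame({"State": ["VA", "MD"], "Population": [5.1, 4.0]})
df_no_pop = df.drop("Population", axis=1)
print(df_no_pop)
```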

### Concatenate

Concatenate rows of two DataFrames:
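A sketch with `pd.concat` on two hypothetical frames; `ignore_index=True` renumbers the rows in the result:

```python
import pandas as pd

df_a = pd.DataFrame({"State": ["VA"], "Population": [5.1]})
df_b = pd.DataFrame({"State": ["MD"], "Population": [4.0]})
rows = pd.concat([df_a, df_b], ignore_index=True)
print(rows)
```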

Concatenate columns of two DataFrames:
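With `axis=1`, `pd.concat` places the frames side by side, aligned on the row index (illustrative data):

```python
import pandas as pd

states = pd.DataFrame({"State": ["VA", "MD"]})
pops = pd.DataFrame({"Population": [5.1, 4.0]})
cols = pd.concat([states, pops], axis=1)
print(cols)
```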

## Input and Output
* Reading
* Writing

### Reading
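The original reading cells are not shown. As a minimal sketch, `pd.read_csv` can parse CSV text from an in-memory buffer; in practice a file path would usually be passed instead. The CSV content here is illustrative:

```python
import io

import pandas as pd

csv_text = "State,Population\nVA,5.1\nMD,4.0\n"
# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```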

### Writing
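The original writing cells are not shown. As a minimal sketch, `to_csv` returns the CSV text as a string when no path is given; passing a file path writes to disk instead. The data is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"State": ["VA", "MD"], "Population": [5.1, 4.0]})
# index=False omits the row labels from the output.
csv_text = df.to_csv(index=False)
print(csv_text)
```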