## Data Analysis with Pandas

In this section of the course, we will learn how to use pandas for data analysis. We will cover:

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output

## What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for **analyzing**, **cleaning**, **exploring**, and **manipulating data**.

The name "**Pandas**" refers to both "**Panel Data**" and "**Python Data Analysis**". The library was created by Wes McKinney in 2008.

## What Can Pandas Do?
Pandas can answer questions about your data, such as:

- Is there a correlation between two or more columns?
- What is the average value?
- What is the maximum value?
- What is the minimum value?
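Each of these questions maps onto a single pandas method. A minimal sketch, assuming a small made-up DataFrame with hypothetical columns `hours` and `score`:

```python
import pandas as pd

# Hypothetical example data: study hours and exam scores
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 60, 65, 71, 80],
})

# Correlation between every pair of columns (Pearson by default)
print(df.corr())

# Average, maximum, and minimum of each column
print(df.mean())
print(df.max())
print(df.min())
```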
  
Pandas are also able to **delete rows** that are not relevant, or **contains wrong values**, like empty or **NULL values**. This is called **cleaning the data**.
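A minimal sketch of that cleaning step, assuming a small made-up DataFrame: `dropna()` removes every row that contains a missing value (`NaN`). Detecting values that are present but *wrong* usually needs a filter you write yourself, which this sketch does not cover.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (NaN) in the "age" column
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [28, np.nan, 35],
})

# dropna() returns a new DataFrame without the rows that contain any missing value;
# the original df is left unchanged
cleaned = df.dropna()
print(cleaned)
```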



## Series

The first main data type we will learn about for pandas is the **Series** data type.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object).
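To see the NumPy connection, a Series can hand back its values as a plain NumPy array with `to_numpy()`, and NumPy functions can be applied to a Series directly. A short sketch:

```python
import numpy as np
import pandas as pd

ser = pd.Series([10, 20, 30])

# The values are stored in a NumPy array underneath
values = ser.to_numpy()
print(type(values))   # <class 'numpy.ndarray'>

# NumPy ufuncs work directly on a Series (the result is again a Series)
roots = np.sqrt(ser)
print(roots)
```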


What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by label.
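A short sketch of the difference: a NumPy array is accessed by integer position only, while a Series can also be accessed by its labels (the labels `'a'`, `'b'`, `'c'` here are arbitrary):

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(arr[1])        # positional access only -> 20
print(ser["b"])      # label-based access -> 20
print(ser.iloc[1])   # positional access is still available -> 20
```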

Let's explore this concept through some examples:

```py
import numpy as np
import pandas as pd
from pandas import Series
```

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series. For reference, here is the constructor's signature and the beginning of its docstring:

```
pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool' = False,
    fastpath: 'bool' = False,
) -> 'None'
Docstring:
One-dimensional ndarray with axis labels (including time series).

Parameters
----------
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. If data is a dict, argument order is
    maintained.
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
    and index is None, then the keys in the data are used as the index. If the
    index is not None, the resulting Series is reindexed with the index values.
```

```py
labels = ['a', 'b', 'c']        # list
my_list = [10, 20, 30]          # list
arr = np.array([10, 20, 30])    # array
d = {'a': 10, 'b': 20, 'c': 30} # dictionary
```

**Using Lists**

```py
pd.Series(data=labels, index=my_list)
```

```
10    a
20    b
30    c
dtype: object
```
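The call above uses `labels` as the data and `my_list` as the index. More commonly the numbers are the data and the strings are the labels, and if `index` is omitted, pandas falls back to a default `RangeIndex` (0, 1, 2, ...). A small sketch that redefines the same lists so it runs on its own:

```python
import pandas as pd

labels = ['a', 'b', 'c']
my_list = [10, 20, 30]

# Numbers as data, strings as index labels
s1 = pd.Series(data=my_list, index=labels)
print(s1)

# Without an index, the labels default to 0, 1, 2
s2 = pd.Series(my_list)
print(s2)
```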

**Using NumPy Arrays**

```py
# More examples with lists
```
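Passing a NumPy array works the same way as passing a list. A minimal sketch using the `arr` and `labels` values defined earlier (redefined here so the block runs standalone):

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
arr = np.array([10, 20, 30])

# Default integer index
s_default = pd.Series(arr)
print(s_default)

# Index passed positionally: data first, then index
s_labeled = pd.Series(arr, labels)
print(s_labeled)
```

The Series keeps the array's dtype, so no conversion is needed on the way in.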

**Using Dictionaries**
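With a dictionary, the keys become the index labels and the values become the data. As the docstring notes, passing an explicit `index` reindexes the result, so a label with no matching key gets `NaN`. A sketch using the `d` dictionary from above (redefined so the block is standalone; the label `'x'` is arbitrary):

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Keys -> index labels, values -> data (insertion order is kept)
ser = pd.Series(d)
print(ser)

# Explicit index: keys not in d are dropped, labels not in d become NaN
ser2 = pd.Series(d, index=['b', 'c', 'x'])
print(ser2)
```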

### Data in a Series

A pandas Series can hold a variety of object types:

```py
pd.Series(data=labels)
```

```py
# Even functions (although you are unlikely to use this)
pd.Series([sum, print, len])
```
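pandas infers a `dtype` for each Series: numeric data gets a numeric dtype such as `int64` or `float64`, while mixed types or arbitrary Python objects such as functions fall back to the generic `object` dtype. A small sketch:

```python
import pandas as pd

ints = pd.Series([10, 20, 30])
floats = pd.Series([1.5, 2.5])
mixed = pd.Series([1, 'a', 2.5])       # int, str, and float together
funcs = pd.Series([sum, print, len])   # built-in functions as values

print(ints.dtype)     # int64
print(floats.dtype)   # float64
print(mixed.dtype)    # object
print(funcs.dtype)    # object
```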

### Using an Index


```py
labels2 = ['USA', 'Germany', 'USSR', 'Japan']
ser1 = pd.Series([1, 2, 3, 4], index=labels2)
ser2 = pd.Series([1, 2, 5, 4], index=labels2)
```
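Once a Series has labels, values can be looked up by label, much like keys in a dictionary. A short sketch reusing the `ser1` definition above (redefined so the block runs on its own):

```python
import pandas as pd

labels2 = ['USA', 'Germany', 'USSR', 'Japan']
ser1 = pd.Series([1, 2, 3, 4], index=labels2)

print(ser1['USA'])               # single label -> 1
print(ser1[['USSR', 'Japan']])   # list of labels -> a smaller Series
```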

Arithmetic operations between Series are aligned on the index, so values with the same label are combined:

```py
ser1 + ser2
```

```
USA        2
Germany    4
USSR       8
Japan      8
dtype: int64
```
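Alignment matters when the two indexes differ: a label present in only one Series produces `NaN`, and the result is upcast to `float64` to hold it. `Series.add()` with `fill_value` treats a missing label as that value instead. A minimal sketch with a hypothetical `ser3` whose labels only partly overlap with `ser1`:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
# Hypothetical Series: 'Italy' replaces 'USSR'
ser3 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# 'Italy' and 'USSR' appear in only one Series, so their sums are NaN
total = ser1 + ser3
print(total)

# fill_value=0 treats a label missing from one side as 0
filled = ser1.add(ser3, fill_value=0)
print(filled)
```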