## Data Analysis with Pandas

In this section of the course we will learn how to use pandas for data analysis.

* Series
* DataFrames
* Missing Data
* GroupBy
* Merging, Joining, and Concatenating
* Operations
* Data Input and Output

## Series

The first main data type we will learn about in pandas is the **Series** data type. Let's import pandas and explore the Series object.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
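As a minimal sketch of that difference (the names `arr_demo` and `ser_demo` and their values are made up for illustration), the same data can be stored in a NumPy array, which is addressed by position only, and in a Series, which can also be addressed by label:

```python
import numpy as np
import pandas as pd

# Illustrative values and labels, not taken from the course data
arr_demo = np.array([10, 20, 30])
ser_demo = pd.Series(arr_demo, index=['a', 'b', 'c'])

print(arr_demo[1])       # NumPy array: lookup by position only
print(ser_demo['b'])     # Series: lookup by label
print(ser_demo.iloc[1])  # Series: lookup by position via .iloc
```

All three lookups return the same element; only the way it is addressed differs.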

Let's explore this concept through some examples:

In [1]:
import numpy as np
import pandas as pd

In [None]:
# from pandas import Series

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

```py
pd.Series(
    data=None,
    index=None,
    dtype: 'Dtype | None' = None,
    name=None,
    copy: 'bool' = False,
    fastpath: 'bool' = False,
) -> 'None'
Docstring:     
One-dimensional ndarray with axis labels (including time series).

Parameters
----------
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. If data is a dict, argument order is
    maintained.
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
    and index is None, then the keys in the data are used as the index. If the
    index is not None, the resulting Series is reindexed with the index values.


```
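The last two sentences of the `index` description are easy to miss, so here is a small sketch of what they mean in practice. The label `'z'` is a made-up key that does not exist in the dictionary:

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# No index given: the dict keys become the index, in insertion order
from_dict = pd.Series(d)

# Index given: the result is reindexed with those labels.
# 'z' is not a key in d, so its value is filled with NaN.
reindexed = pd.Series(d, index=['c', 'a', 'z'])

print(from_dict)
print(reindexed)
```

Because NaN is a floating-point value, the reindexed Series is stored as `float64` rather than as an integer type.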

In [None]:
pd.Series()

In [12]:
labels = ['a', 'b', 'c']          # list
my_list = [10, 20, 30]            # list
arr = np.array([10, 20, 30])      # NumPy array
d = {'a': 10, 'b': 20, 'c': 30}   # dictionary

**Using Lists**

In [16]:
pd.Series(data=labels, index=my_list)

10    a
20    b
30    c
dtype: object

In [15]:
# Keyword arguments can be passed in any order
pd.Series(index=my_list, data=labels)

10    a
20    b
30    c
dtype: object

In [7]:
# Positional arguments: data first, then index
pd.Series(labels, my_list)

10    a
20    b
30    c
dtype: object

**NumPy Arrays**

In [17]:
pd.Series(data=arr, index=my_list)

10    10
20    20
30    30
dtype: int32

In [None]:
# More examples with lists

**Dictionary**

In [18]:
d

{'a': 10, 'b': 20, 'c': 30}

In [19]:
pd.Series(d)

a    10
b    20
c    30
dtype: int64

In [20]:
df = pd.read_csv('Ecommerce.csv')

In [23]:
df['Address'].index

RangeIndex(start=0, stop=10000, step=1)
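The `RangeIndex` shown above is the default that pandas assigns when no index is supplied. A small sketch with in-memory Series (illustrative values, since `Ecommerce.csv` is not reproduced here) shows the default next to an explicit index:

```python
import pandas as pd

# No index supplied: pandas falls back to a RangeIndex starting at 0
no_index = pd.Series(['x', 'y', 'z'])
print(no_index.index)

# Explicit index supplied: the labels are stored as given
with_index = pd.Series(['x', 'y', 'z'], index=[10, 20, 30])
print(with_index.index)
```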

### Data in a Series

A pandas Series can hold a variety of object types:

In [None]:
pd.Series(data=labels)

In [None]:
# Even functions (although you are unlikely to need this)
pd.Series([sum,print,len])
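To see how pandas records what a Series holds, it can help to inspect the `dtype` attribute. The sketch below uses made-up values; the names `numbers` and `mixed` are only for illustration:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30])   # homogeneous integers
mixed = pd.Series([10, 'b', len])   # an int, a str, and a function

print(numbers.dtype)  # an integer dtype
print(mixed.dtype)    # falls back to the generic 'object' dtype

# Each element keeps its original Python type,
# so the stored function is still callable
stored_len = mixed.iloc[2]
print(stored_len('abc'))
```

A Series with the `object` dtype is flexible, but numeric operations on it are generally slower than on a Series with a numeric dtype.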

### Using an Index


In [24]:
labels2 = ['USA', 'Germany', 'USSR', 'Japan']

In [28]:
ser1 = pd.Series([1, 2, 3, 4], index=labels2)

In [29]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [30]:
ser2 = pd.Series([1, 2, 5, 4], index=labels2)

In [31]:
ser1['USA']

1

In [32]:
ser1['Germany']

2

In [34]:
ser1['Japan']

4

Operations between Series are also aligned on the index, so values are matched by label rather than by position:

In [35]:
ser1 + ser2

USA        2
Germany    4
USSR       8
Japan      8
dtype: int64
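In the addition above both Series share the same labels, so the alignment is not visible. The following sketch uses a hypothetical second Series (`ser_b`, with a different label order and `'Italy'` in place of `'USSR'`) to show what happens when the labels do not match:

```python
import pandas as pd

ser_a = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
# Hypothetical second Series: different order, and 'Italy' instead of 'USSR'
ser_b = pd.Series([1, 2, 5, 4], index=['Japan', 'Germany', 'Italy', 'USA'])

# Values are paired by label; labels present on only one side give NaN
total = ser_a + ser_b
print(total)

# fill_value=0 treats a label missing from one side as 0 instead
filled = ser_a.add(ser_b, fill_value=0)
print(filled)
```

Labels found in only one Series (`'Italy'` and `'USSR'`) produce NaN in `total`, and the result is converted to `float64` so that NaN can be represented.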