# Introduction to Pandas

```python
import numpy as np
import pandas as pd
```

`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the **index**.
```
s = pd.Series(data, index=index)
```
`data` can be many different things:
- a Python dictionary
- an ndarray
- a scalar value

The passed index is a list of axis labels.

## From `ndarray`

If `data` is an ndarray, the index must be the same length as `data`. If no index is passed, one will be created with the values `[0, ..., len(data) - 1]`.

```python
# Here, we specify the index
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s, '\n')
print(s.index)
```
```
a   -0.594161
b    0.138052
c    0.678388
d    1.264257
e    0.149400
dtype: float64 

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
```


```python
# Here, we let pandas create a default index
d = pd.Series(np.random.randn(5))
print(d, '\n')
print(d.index)
```
```
0   -0.278788
1    1.380488
2    1.834854
3   -0.923532
4    1.842207
dtype: float64 

RangeIndex(start=0, stop=5, step=1)
```
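As a minimal sketch of the length rule, the snippet below passes three values but only two labels. In the pandas versions we are aware of, the constructor rejects this with a `ValueError`; the exact message text may differ between versions. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = np.arange(3)  # three values

try:
    # Only two labels for three values: the lengths do not match
    pd.Series(values, index=['a', 'b'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)

# Matching lengths work as expected
ok = pd.Series(values, index=['a', 'b', 'c'])
print(ok)
```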


> `Pandas` supports non-unique index values. If you attempt an operation that does not support duplicate index values, an exception will be raised at that time.
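To illustrate both halves of that note, here is a small sketch with a hypothetical `dup` Series whose label `'a'` appears twice. Label lookup still works and returns every matching row, while `reindex`, which needs unique labels on the source axis, raises a `ValueError` (the message wording has changed across pandas versions).

```python
import pandas as pd

# 'a' appears twice in the index
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(dup.index.is_unique)   # False

# Looking up a duplicated label returns all matching rows
print(dup['a'])

try:
    # Reindexing requires unique labels on the source axis
    dup.reindex(['a', 'b'])
    reindex_failed = False
except ValueError as err:
    reindex_failed = True
    print('ValueError:', err)
```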

## From `dict`

When `data` is a dict and an index is not passed, the **Series** index will follow the dict's insertion order. There is no sorting if you have:
- Python version >= 3.6
- Pandas version >= 0.23

```python
# Create a dictionary
d = {
    'b': 1,
    'a': 0,
    'c': 2
}

# Create a Series from the dictionary
pd.Series(d)
```
```
b    1
a    0
c    2
dtype: int64
```
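The example above covers the case where no index is passed. As a further sketch, based on the behaviour described in the pandas documentation for dict input: when an index *is* passed, values are looked up by label, and any label that is not a key in the dict becomes `NaN`. The name `data` is used here only to avoid overwriting the notebook's `d`.

```python
import pandas as pd

data = {'b': 1, 'a': 0, 'c': 2}

# No index: keys keep their insertion order
print(pd.Series(data).index.tolist())  # ['b', 'a', 'c']

# With an index: values are pulled out by label; 'd' is not a key
by_label = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(by_label)
```

Because `NaN` is a float, the resulting Series is promoted to `float64`.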

## From `scalar` value

If `data` is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

```python
n = pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
print(n)
```
```
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64
```
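A short sketch of how the scalar's type carries over: an integer scalar produces an integer Series and `5.` produces a `float64` one, with one copy of the value per label. The labels and variable names are illustrative.

```python
import pandas as pd

labels = ['a', 'b', 'c']

# The scalar is repeated once per label; its type drives the dtype
ints = pd.Series(7, index=labels)
floats = pd.Series(5., index=labels)

print(ints)
print(floats)
```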


## Series is `ndarray`-like

A `Series` acts very similarly to a NumPy `ndarray` and is a valid argument to most NumPy functions. Operations such as slicing will also slice the index.

```python
# Use .iloc for position-based access: plain s[0] on a labeled index
# is deprecated in recent pandas versions
print(s.iloc[0], '\n')
print(s.iloc[:3], '\n')
print(s[s > s.median()], '\n')
print(s.iloc[[4, 3, 1]], '\n')
print(np.exp(s))
```
```
-0.5941611206209565 

a   -0.594161
b    0.138052
c    0.678388
dtype: float64 

c    0.678388
d    1.264257
dtype: float64 

e    0.149400
d    1.264257
b    0.138052
dtype: float64 

a    0.552025
b    1.148035
c    1.970698
d    3.540463
e    1.161138
dtype: float64
```


```python
s.dtype
```
```
dtype('float64')
```

```python
s.to_numpy()
```
```
array([-0.59416112,  0.13805197,  0.6783876 ,  1.26425739,  0.14940031])
```
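The following sketch makes the "slicing also slices the index" point concrete with a small, deterministic Series (named `t` so it does not overwrite the notebook's `s`). Positional slices and boolean masks keep their labels, and a NumPy ufunc such as `np.sqrt` returns a `Series` with the same index rather than a bare array.

```python
import numpy as np
import pandas as pd

t = pd.Series([1.0, 4.0, 9.0, 16.0], index=['a', 'b', 'c', 'd'])

# Positional slicing: the labels travel with the values
middle = t.iloc[1:3]
print(middle.index.tolist())  # ['b', 'c']

# A NumPy ufunc returns a Series with the same index
roots = np.sqrt(t)
print(type(roots).__name__, roots.tolist())  # Series [1.0, 2.0, 3.0, 4.0]

# Boolean masks select by value and also keep the labels
big = t[t > 5]
print(big.index.tolist())  # ['c', 'd']
```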

## Series is `dict`-like

A `Series` is like a fixed-size dict in which you can get and set values by index label.

```python
print(s['a'])
print(s['e'])
s['e'] = 12
print(s['e'])
print('e' in s)
print('f' in s)
```
```
-0.5941611206209565
0.14940030629934656
12.0
True
False
```
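The dict analogy extends to missing keys. In this sketch (the `prices` Series is a made-up example), square-bracket lookup of an absent label raises `KeyError` just as a dict would, while `Series.get` returns a default instead, mirroring `dict.get`.

```python
import pandas as pd

prices = pd.Series([1.5, 2.0], index=['apple', 'pear'])

# Square-bracket lookup of a missing label raises KeyError, like a dict
try:
    prices['plum']
    missing_raised = False
except KeyError:
    missing_raised = True

# .get() returns a default instead, mirroring dict.get()
fallback = prices.get('plum', 0.0)
print(missing_raised, fallback)  # True 0.0
```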


## Vectorized operations

When working with raw `NumPy` arrays, looping through the values one by one is usually not necessary. The same is true when working with a `Series` in pandas. A `Series` can also be passed into most NumPy functions expecting an `ndarray`.

```python
s + s
```
```
a    -1.188322
b     0.276104
c     1.356775
d     2.528515
e    24.000000
dtype: float64
```

```python
s * 2
```
```
a    -1.188322
b     0.276104
c     1.356775
d     2.528515
e    24.000000
dtype: float64
```

```python
np.exp(s)
```
```
a         0.552025
b         1.148035
c         1.970698
d         3.540463
e    162754.791419
dtype: float64
```
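To see that the explicit loop and the vectorized form compute the same thing, this sketch builds the doubled Series both ways from a small deterministic `v` (a hypothetical name) and compares them with `Series.equals`. The vectorized expression is shorter and runs in optimized code instead of a Python-level loop.

```python
import numpy as np
import pandas as pd

v = pd.Series(np.arange(5, dtype=float), index=list('abcde'))

# Explicit Python loop, value by value
looped = pd.Series([x * 2 for x in v], index=v.index)

# Vectorized: one expression over the whole Series
vectorized = v * 2

print(looped.equals(vectorized))  # True
```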

> The result of an operation between unaligned `Series` will involve the union of the indexes. If a label is not found in one `Series` or the other, the result will be marked as missing (`NaN`).
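A minimal sketch of alignment with two hypothetical Series that share only the labels `'b'` and `'c'`. With `+`, the labels present on only one side come out as `NaN`. If that is not what you want, `Series.add` accepts a `fill_value` that stands in for the missing side; this is one option among several, shown here as an assumption about your use case.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# '+' aligns on labels; 'a' and 'd' exist on only one side, so they become NaN
total = left + right
print(total)

# Series.add with fill_value treats a label missing on one side as 0
filled = left.add(right, fill_value=0)
print(filled)
```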

## Name attribute

`Series` can also have a `name` attribute.

```python
s = pd.Series(np.random.randn(5), name='something')
```

```python
s
```
```
0    0.178773
1   -0.251875
2   -1.202340
3   -0.341223
4   -0.554626
Name: something, dtype: float64
```

```python
s.name
```
```
'something'
```
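As a sketch of where the name matters, `Series.rename` called with a scalar returns a new Series carrying a different name, and `to_frame()` uses the name as the column label. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

named = pd.Series(np.arange(3), name='something')

# rename() with a scalar returns a new Series with a different name
renamed = named.rename('different')
print(named.name, renamed.name)  # something different

# The name becomes the column label when converting to a DataFrame
frame = renamed.to_frame()
print(frame.columns.tolist())  # ['different']
```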

# Pandas DataFrame

For this activity, we can continue in the notebook from the previous activity. If you decide to create a new one, don't forget to import the packages.

`DataFrame` is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a `dictionary` of `Series` objects. It is generally the most commonly used pandas object.
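To make the "dictionary of `Series`" picture concrete, here is a small sketch with made-up column names. Each column is its own Series with its own dtype, and the row index is the union of the Series' indexes, so a label missing from one column shows up as `NaN` there.

```python
import pandas as pd

# Each column is its own Series, so each can hold a different type
df = pd.DataFrame({
    'count': pd.Series([1, 2, 3], index=['x', 'y', 'z']),
    'ratio': pd.Series([0.5, 0.25], index=['x', 'y']),
    'label': pd.Series(['p', 'q', 'r'], index=['x', 'y', 'z']),
})
print(df)
print(df.dtypes)
```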

`DataFrame` accepts many different kinds of input:
- dictionary of 1-D `ndarrays`, `lists`, `dictionaries`, or `Series`
- 2-D `ndarray`
- `Series`
- `DataFrame`
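The sketch below builds a DataFrame from several of the input kinds listed above. The data and names are illustrative; the calls themselves are the standard `pd.DataFrame` constructor.

```python
import numpy as np
import pandas as pd

# Dictionary of lists: keys become column labels
from_lists = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# 2-D ndarray: rows x columns, with column labels supplied separately
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])

# A single Series: becomes a one-column DataFrame named after the Series
ser = pd.Series([1.0, 2.0], name='values')
from_series = pd.DataFrame(ser)

# An existing DataFrame: gives a DataFrame with the same labels and data
from_frame = pd.DataFrame(from_array)

print(from_lists.shape, from_array.shape)  # (3, 2) (3, 2)
print(from_series.columns.tolist())        # ['values']
print(from_frame.equals(from_array))       # True
```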