# 01 Pandas - Series

In [4]:
import numpy as np
import pandas as pd

In [5]:
pd.__version__

'2.1.4'


----


At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices. Pandas is built on top of NumPy to make data processing on relational data easier.

## Limitations of NumPy

Before we start learning Pandas, it is good to understand the limitations of NumPy:

* No way to attach labels to data
* No pre-built methods to fill missing values
* No way to group data
* No way to pivot data
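
As a rough illustration of what Pandas adds for the first three points, here is a minimal sketch using a small set of made-up temperature readings (the city names and values are hypothetical, and pivoting is left out). The objects used here are introduced properly in the sections that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; np.nan marks a missing value
temps = pd.Series([21.0, np.nan, 25.0, 19.0],
                  index=['Austin', 'Boston', 'Austin', 'Boston'])

# Labels: select every reading recorded for one city
austin = temps.loc['Austin']

# Missing values: replace NaN with a chosen default
filled = temps.fillna(0.0)

# Grouping: average the readings per label (NaN is skipped)
means = temps.groupby(level=0).mean()

print(austin)
print(filled)
print(means)
```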

With that in mind, let's introduce the first of the three fundamental Pandas data structures:

The ``Series``



---



# Pandas - Series


### Creating Series objects

A `Series` is a one-dimensional ndarray with axis labels (including time series).

**Pandas Series & Index Syntax:**

```python
pandas.Series(data=None, index=None, dtype=None, 
              name=None, copy=False, fastpath=False)
```
```Python
pandas.Index(data=None, dtype=None, copy=False, 
    name=None, tupleize_cols=True, **kwargs)
```
Documentation link for [pandas.Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#)

A Pandas ``Series`` is a one-dimensional array-like object containing a sequence of values (of
similar types to NumPy types) and an associated array of data labels, called its `index`.
The simplest Series is formed from only an array of data. It can be created from a list or array as follows:

In [6]:
# Create Series using a list

mylist = [0.25, 0.5, 0.75, 1.0]
data1 = pd.Series(mylist)
print(data1)

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64


As we see in the output, the ``Series`` wraps both a sequence of values and a sequence of indices, which we can access with the ``values`` and ``index`` attributes. Since we did not specify an index for the data, a default one consisting of the integers 0 through `N - 1` (where N is the length of the data) is created.

The ``values`` are simply a familiar NumPy array:

In [7]:
# Display the values of the Series and the type of the underlying array

print(data1.values)
print(type(data1.values))


[0.25 0.5  0.75 1.  ]
<class 'numpy.ndarray'>


The ``index`` is an array-like object of type ``pd.Index``. Evaluating ``<object>.index`` shows its details. Here the index starts at ``0``, stops at ``4`` (exclusive), and increments in steps of ``1``.

In [8]:
data1.index

RangeIndex(start=0, stop=4, step=1)

Often it is desirable to create a Series with an index that identifies each data point with a label, which makes the values easier to identify and more meaningful. A Series can also be given a `name`, which helps describe what the data represents.

In [9]:
# Example of a Series whose index identifies each data point
# We also give the Series a name: Planet_Mercury

mercury = pd.Series([0.33, 57.9, 4222.6], 
                 index=['Mass','Diameter','Daylength'],
                   name="Planet_Mercury")

print("Series Name:", mercury.name)
print(mercury)


Series Name: Planet_Mercury
Mass            0.33
Diameter       57.90
Daylength    4222.60
Name: Planet_Mercury, dtype: float64


***

### Different Ways to Create Series

* Using a List - **already shown above**
* Using Numpy Array
* Using Dictionary 

In [10]:
# Create a Series using NumPy arrays for both data and index

arr = np.random.randint(1, 50, 10)          # 10 random integers in [1, 50)
index_arr = np.random.randint(60, 100, 10)  # 10 random integers in [60, 100)
np_series = pd.Series(arr, index_arr)       # First array is the data, second is the index

print(np_series)

72    21
74    24
70    15
88    36
65    34
72     5
81    17
88     6
70    47
64    11
dtype: int32
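
Note that the randomly generated index above contains repeated labels (for example `72` and `88`). A `Series` does not require unique labels. The sketch below uses a small hand-written Series (the values are chosen only for illustration) to show how a lookup behaves in that case:

```python
import pandas as pd

# Hypothetical data with the label 72 appearing twice
dup = pd.Series([21, 24, 5], index=[72, 74, 72])

print(dup.index.is_unique)   # labels are not required to be unique
print(dup.loc[74])           # a unique label returns a single value
print(dup.loc[72])           # a repeated label returns a Series of all matches
```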


When you pass only a dictionary, the index of the resulting Series is taken from the dictionary's keys.

In [11]:
# Create Series using a dictionary

population_dict = {'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135,
                   'California': 38332521}
print("Dictionary Keys:", population_dict.keys())
print("Dictionary Values:", population_dict.values())
population = pd.Series(population_dict)
print(population)

Dictionary Keys: dict_keys(['Texas', 'New York', 'Florida', 'Illinois', 'California'])
Dictionary Values: dict_values([26448193, 19651127, 19552860, 12882135, 38332521])
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
California    38332521
dtype: int64


You can override the order of the keys by passing the index separately. The dictionary keys are matched against the passed index values: a dictionary key that is missing from the index is left out of the `Series`, and an index value that is not a dictionary key is added to the `Series` with a `NaN` value.

In [12]:
# Create a Series using a dictionary and an explicit index
# 'Texas' is not in the passed index, so it is skipped
# 'Florida' is in the index but not in the dictionary, so it is added with NaN

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['Ohio', 'Oregon','Florida','Utah']
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=states)
print(obj3)
print("\n")
print(obj4)

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64


Ohio       35000.0
Oregon     16000.0
Florida        NaN
Utah        5000.0
dtype: float64
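
The `NaN` entry is also why the dtype changed from `int64` to `float64`: `NaN` is a floating-point value. As a small sketch, rebuilding the same `obj4` as above, missing entries can be detected, counted, or dropped like this:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['Ohio', 'Oregon', 'Florida', 'Utah'])

print(obj4.isna())          # boolean Series, True where the value is missing
print(obj4.isna().sum())    # number of missing entries
print(obj4.dropna())        # a new Series without the missing entries
```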


When the source is a dictionary and the ``index`` covers only some of its keys, the ``Series`` is populated only with the explicitly listed index values:

In [13]:
pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])

3    c
2    a
dtype: object


---


## ``Series`` as generalized NumPy array

From what we've seen so far, it may look like the ``Series`` object is basically interchangeable with a one-dimensional NumPy array.
The essential difference is the presence of the index: while the NumPy array has an *implicitly defined* integer index used to access the values, the Pandas ``Series`` has an *explicitly defined* index associated with the values.

This explicit index definition gives the ``Series`` object additional capabilities. For example, the index need not be an integer, but can consist of values of any desired type.

If we wish, we can use strings as the index:

In [14]:
data2 = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data2

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [15]:
data2.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [16]:
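# Positional (implicit) indexing still works. For a single position use .iloc:
# data2[0] on a non-integer index is deprecated in recent pandas versions

print(data2.iloc[0])
print(data2.iloc[2])
print("{0}".format(data2[1:3]))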

0.25
0.75
b    0.50
c    0.75
dtype: float64



In [17]:
# Explicit (label-based) indexing
# Note: with label-based slicing the end label is included

print(data2['c'])
print("{0}".format(data2['b':'c']))

0.75
b    0.50
c    0.75
dtype: float64


We can even use non-contiguous or non-sequential indices:

In [18]:
data3 = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 4, 6, 3])
data3

2    0.25
4    0.50
6    0.75
3    1.00
dtype: float64

In [19]:
data3.index

Index([2, 4, 6, 3], dtype='int64')

In [20]:
data3[0:3]

2    0.25
4    0.50
6    0.75
dtype: float64
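
With an integer index like this one, a plain integer inside `[]` can be read either as a label or as a position, which is easy to get wrong. A minimal sketch of the explicit alternatives, `.loc` for labels and `.iloc` for positions, using the same `data3` as above:

```python
import pandas as pd

data3 = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 4, 6, 3])

print(data3.loc[2])      # value stored under the label 2
print(data3.iloc[2])     # value stored at position 2 (the third element)
print(data3.iloc[0:3])   # positions 0, 1, 2 (end position excluded)
```

Using `.loc` and `.iloc` states the intent directly, so the code does not depend on how `[]` resolves an integer key.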


---


### Series as specialized dictionary

In this way, you can think of a Pandas ``Series`` a bit like a specialization of a Python dictionary.

A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a ``Series`` is a structure which maps typed keys to a set of typed values.

This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas ``Series`` makes it much more efficient than Python dictionaries for certain operations.
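
To make the contrast concrete, here is a small sketch (a style comparison, not a benchmark) using a subset of the population figures from this notebook. The dictionary needs an explicit Python-level loop, while the `Series` applies one expression to the whole typed array:

```python
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127}
population = pd.Series(population_dict)

# Dictionary: loop over the items one by one
millions_dict = {state: count / 1e6 for state, count in population_dict.items()}

# Series: a single vectorized expression
millions = population / 1e6

# Series: boolean mask selecting states above 20 million
large = population[population > 20_000_000]

print(millions)
print(large)
```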

The ``Series``-as-dictionary analogy can be made even clearer by constructing a ``Series`` object directly from a Python dictionary:

In [21]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}

population = pd.Series(population_dict)
print(population)

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64


By default, the index of the resulting ``Series`` follows the insertion order of the dictionary's keys (the keys are not sorted, as the output above shows). From here, typical dictionary-style item access can be performed:

In [22]:
population['California']

38332521

In [23]:
# Notice that all the elements from start to end are listed:
# with label-based slicing the end label is not excluded

population['California':'Florida']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
dtype: int64

In [24]:
# Slicing with integers uses the implicit (positional) index
# Florida is at position 3, so the slice 0:4 includes it

population[0:4]

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
dtype: int64

In [25]:
population.index

Index(['California', 'Texas', 'New York', 'Florida', 'Illinois'], dtype='object')
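
The dictionary analogy extends to a few other familiar operations. The following sketch, using a shortened version of the population data, shows membership tests, `keys()`, `items()`, and assignment to a new label:

```python
import pandas as pd

population = pd.Series({'California': 38332521, 'Texas': 26448193})

print('Texas' in population)       # membership is tested against the index labels
print(list(population.keys()))     # keys() returns the index
print(list(population.items()))    # (label, value) pairs

# Assigning to a new label extends the Series, much like a dictionary
population['Florida'] = 19552860
print(population)
```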


---


## The Pandas Index Object

We have seen here that ``Series`` objects contain an explicit *index* that lets you reference and modify data.

```Python
pandas.Index(data=None, dtype=None, copy=False, 
    name=None, tupleize_cols=True, **kwargs)
```
Documentation link for [pandas.Index](https://pandas.pydata.org/docs/reference/api/pandas.Index.html)

This ``Index`` object is an interesting structure in itself, and it can be thought of either as an *immutable array* or as an *ordered set* (technically a multi-set, as ``Index`` objects may contain repeated values).

These two views have some interesting consequences for the operations available on ``Index`` objects.

As a simple example, let's construct an ``Index`` from a list of integers:

In [26]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

Index([2, 3, 5, 7, 11], dtype='int64')
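
The multi-set remark above can be checked directly. As a small sketch with a hypothetical index that repeats one value, an ``Index`` reports whether its labels are unique and can return the distinct ones:

```python
import pandas as pd

dup_ind = pd.Index([2, 3, 3, 5])

print(dup_ind.is_unique)      # whether every label occurs only once
print(dup_ind.unique())       # distinct labels, in order of first appearance
print(dup_ind.duplicated())   # boolean mask marking repeated occurrences
```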

### Index as immutable array

The ``Index`` in many ways operates like an array.
For example, we can use standard Python indexing notation to retrieve values or slices:

In [27]:
ind[1]

3

In [28]:
ind[::2]

Index([2, 5, 11], dtype='int64')

``Index`` objects also have many of the attributes familiar from NumPy arrays:

In [29]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64


One difference between ``Index`` objects and NumPy arrays is that indices are immutable: they cannot be modified via the normal means.

This immutability makes it safer to share indices between multiple ``DataFrames`` and arrays, without the potential for side effects from inadvertent index modification.
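
A minimal sketch of what immutability means in practice: item assignment on an ``Index`` raises a `TypeError`, and operations that look like modifications return a new object instead. The names `assignment_failed` and `longer` are introduced here only for illustration.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

assignment_failed = False
try:
    ind[1] = 0                      # item assignment is not supported
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# append() leaves `ind` untouched and returns a new Index
longer = ind.append(pd.Index([13]))
print(ind)
print(longer)
```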

In [30]:
# Position of the largest value (note the parentheses: argmax is a method)
ind.argmax()

4

### Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.
The ``Index`` object follows many of the conventions used by Python's built-in ``set`` data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:

In [31]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [32]:
# Intersection

indA.intersection(indB)

Index([3, 5, 7], dtype='int64')

In [33]:
# Union

indA.union(indB)

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [34]:
# Difference

indA.difference(indB)

Index([1, 9], dtype='int64')

In [35]:
# Symmetric Difference

indA.symmetric_difference(indB)

Index([1, 2, 9, 11], dtype='int64')
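
These set operations are what make label alignment possible. As a sketch with two small hypothetical Series, adding them aligns the values on the union of the two indexes, and labels present in only one operand produce `NaN`:

```python
import pandas as pd

a = pd.Series([10, 30, 50], index=[1, 3, 5])
b = pd.Series([1, 2, 3], index=[3, 5, 7])

total = a + b      # values are matched by label, not by position
print(total)
```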


---
