## Pandas Data Structures

Series, DataFrame and Index

### The Pandas `Series` Object

This file can also be found <a href="https://github.com/amarnag/Pandas-Objects" target="_blank">here</a>.

In [1]:
import pandas as pd
pd.__version__

'1.3.4'

In [2]:
import numpy as np
np.__version__

'1.20.3'

A Pandas `Series` is a one-dimensional array of indexed data. It can be created from a list or array as follows:

In [3]:
data = pd.Series([0.5, 1.5, 2.5, 3.5, 5])
data

0    0.5
1    1.5
2    2.5
3    3.5
4    5.0
dtype: float64

As the output shows, the `Series` wraps both a sequence of values and a sequence of indices, which we can access with the `values` and `index` attributes. The `values` are simply a familiar NumPy array:

In [4]:
data.values

array([0.5, 1.5, 2.5, 3.5, 5. ])

The `index` is an array-like object of type `pd.Index`, which we'll discuss in more detail momentarily.

In [5]:
data.index

RangeIndex(start=0, stop=5, step=1)
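As a minimal sketch of the two attributes just described, the snippet below rebuilds the same `Series` and checks that its values are a plain NumPy array and that its default index is the integer range 0 to 4. Using `to_numpy()` is a choice made for this illustration; for this numeric `Series` it returns the same data as the `values` attribute.

```python
import numpy as np
import pandas as pd

data = pd.Series([0.5, 1.5, 2.5, 3.5, 5])

# The underlying values are an ordinary NumPy array
values = data.to_numpy()
print(type(values).__name__)   # ndarray
print(values.dtype)            # float64 (the integer 5 is upcast to 5.0)

# The default index is the integer sequence 0..len-1
index_labels = list(data.index)
print(index_labels)            # [0, 1, 2, 3, 4]
```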

As with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:

In [6]:
data[3]

3.5

In [7]:
data[2:5]

2    2.5
3    3.5
4    5.0
dtype: float64

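A NumPy array has an implicitly defined integer index used to access its values, whereas a Pandas `Series` has an explicitly defined index associated with the values. The index therefore need not be an integer; it can consist of values of any desired type, such as strings: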

In [8]:
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
data

a    1
b    2
c    3
d    4
e    5
dtype: int64

And the item access works as expected:

In [9]:
data['d']

4

The index need not start at zero, and it may even be non-contiguous or non-sequential. Here it runs from 1 to 4:

In [10]:
data = pd.Series([4, 6, 8, 2], index=[1, 2, 3, 4])
data

1    4
2    6
3    8
4    2
dtype: int64

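### Series as a specialized dictionary

A dictionary maps arbitrary keys to arbitrary values; a `Series` maps typed keys to typed values, so it can be built directly from a Python dictionary: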

In [11]:
# Avoid naming the variable `dict`, which would shadow the built-in type
zipcode_dict = {'Gurazala': 522415,
                'Rentachintala': 522421,
                'Kerampudi': 522615,
                'Macherla': 522426,
                'Dachepalli': 522414,
                'Piduguralla': 522413}
zipcode = pd.Series(zipcode_dict)
zipcode

Gurazala         522415
Rentachintala    522421
Kerampudi        522615
Macherla         522426
Dachepalli       522414
Piduguralla      522413
dtype: int64

In [12]:
zipcode['Gurazala']

522415

In [13]:
zipcode['Kerampudi':'Piduguralla']

Kerampudi      522615
Macherla       522426
Dachepalli     522414
Piduguralla    522413
dtype: int64
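The following sketch illustrates where a `Series` behaves like a dictionary and where it goes beyond one. It reuses three of the entries above; the variable names `has_key`, `has_value`, and `subset` are introduced only for this illustration.

```python
import pandas as pd

zipcode = pd.Series({'Gurazala': 522415,
                     'Rentachintala': 522421,
                     'Kerampudi': 522615})

# Dictionary-like: membership tests look at the keys (index labels), not the values
has_key = 'Gurazala' in zipcode
has_value = 522415 in zipcode
print(has_key, has_value)    # True False

# Array-like: unlike a plain dict, a Series supports slicing between two labels
subset = zipcode['Gurazala':'Rentachintala']
print(list(subset.index))    # ['Gurazala', 'Rentachintala']
```

Note that the label-based slice includes its final label, which differs from ordinary Python slicing.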

### Series Objects

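A `Series` is constructed with `pd.Series(data, index=index)`, where `index` is optional and `data` can be one of several kinds of object. For example, `data` can be a list or NumPy array, in which case `index` defaults to an integer sequence: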

In [14]:
data = pd.Series([2, 4, 6, 8])
data

0    2
1    4
2    6
3    8
dtype: int64

`data` can be a scalar, which is repeated to fill the specified index:

In [15]:
data = pd.Series(8, index=[100, 400, 600])
data

100    8
400    8
600    8
dtype: int64

`data` can be a dictionary, in which case `index` defaults to the dictionary keys in their insertion order (they are not sorted in current versions of Pandas):

In [16]:
data = pd.Series({1: 'a', 2: 'b', 3: 'c', 4: 'd'})
data

1    a
2    b
3    c
4    d
dtype: object
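The dictionary case can be made more concrete with a small sketch. It assumes a recent Pandas (such as the 1.3.4 used here), where dictionary keys keep their insertion order, and it shows that passing `index` explicitly selects and orders only the requested keys. The names `letters` and `subset` are hypothetical.

```python
import pandas as pd

# Keys are deliberately out of order to show that they are not sorted
letters = pd.Series({2: 'a', 1: 'b', 3: 'c'})
print(list(letters.index))   # [2, 1, 3]

# An explicit index keeps only the listed keys, in the listed order
subset = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])
print(subset.to_dict())      # {3: 'c', 2: 'a'}
```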

### Data selection in Series

In [17]:
Swachh_City = {'Indoor': 1,
               'Surat': 2,
               'Vijayawada': 3,
               'Nvai Mumbai': 4,
               'Pune': 5,
               'Raipur': 6,
               'Bhopal': 7,
               'Vadodara': 8,
               'Vishakapatnam': 9,
               'Ahmedhabad': 10}
data = pd.Series(Swachh_City)
data

Indoor            1
Surat             2
Vijayawada        3
Nvai Mumbai       4
Pune              5
Raipur            6
Bhopal            7
Vadodara          8
Vishakapatnam     9
Ahmedhabad       10
dtype: int64

In [18]:
data['Ahmedhabad']

10

In [19]:
# Slicing by implicit integer index
data[3:7]

Nvai Mumbai    4
Pune           5
Raipur         6
Bhopal         7
dtype: int64

In [20]:
# Slicing by explicit index
data['Vijayawada':'Ahmedhabad']

Vijayawada        3
Nvai Mumbai       4
Pune              5
Raipur            6
Bhopal            7
Vadodara          8
Vishakapatnam     9
Ahmedhabad       10
dtype: int64
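The two slices above differ in how they treat the end of the range: slicing by implicit integer position excludes the stop position, while slicing by explicit label includes the final label. A minimal sketch on a small, made-up `Series` (the values and labels are purely illustrative):

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Implicit (positional) slice: positions 1 and 2; position 3 is excluded
by_position = data[1:3]
print(list(by_position.index))   # ['b', 'c']

# Explicit (label) slice: 'b' through 'd'; the final label is included
by_label = data['b':'d']
print(list(by_label.index))      # ['b', 'c', 'd']
```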

In [21]:
'Guntur' in data

False

In [22]:
data.keys()

Index(['Indoor', 'Surat', 'Vijayawada', 'Nvai Mumbai', 'Pune', 'Raipur',
       'Bhopal', 'Vadodara', 'Vishakapatnam', 'Ahmedhabad'],
      dtype='object')

In [23]:
data.values

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=int64)

In [24]:
data.items()

<zip at 0x1e2a711c080>

In [25]:
list(data.items())

[('Indoor', 1),
 ('Surat', 2),
 ('Vijayawada', 3),
 ('Nvai Mumbai', 4),
 ('Pune', 5),
 ('Raipur', 6),
 ('Bhopal', 7),
 ('Vadodara', 8),
 ('Vishakapatnam', 9),
 ('Ahmedhabad', 10)]

In [26]:
# Masking: keep entries whose value is greater than 3 and less than 9
data[(data>3) & (data<9)]

Nvai Mumbai    4
Pune           5
Raipur         6
Bhopal         7
Vadodara       8
dtype: int64

In [27]:
# Fancy indexing: select several labels at once with a list
data[['Vijayawada', 'Vishakapatnam']]

Vijayawada       3
Vishakapatnam    9
dtype: int64
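To make the masking step less opaque, the sketch below splits it into two parts on a three-entry subset of the data above: the comparison first produces a boolean `Series`, which is then used to select rows. The parentheses around each comparison are needed because `&` binds more tightly than `>` and `<` in Python.

```python
import pandas as pd

data = pd.Series({'Surat': 2, 'Pune': 5, 'Bhopal': 7})

# Step 1: the comparison yields a boolean Series aligned with the index
mask = (data > 3) & (data < 9)
print(mask.tolist())             # [False, True, True]

# Step 2: indexing with the mask keeps only the rows marked True
selected = data[mask]
print(list(selected.index))      # ['Pune', 'Bhopal']
```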

### Indexers: loc and iloc

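With an integer index, square-bracket access can be confusing: an indexing operation such as `data[1]` uses the explicit index, while a slicing operation such as `data[1:3]` uses the implicit positional index. Pandas provides indexer attributes that make the choice explicit. First, the `loc` attribute allows indexing and slicing that always references the explicit index: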

In [28]:
data.loc['Vijayawada']

3

In [29]:
data.loc['Bhopal':'Vishakapatnam']

Bhopal           7
Vadodara         8
Vishakapatnam    9
dtype: int64

The `iloc` attribute allows indexing and slicing that always references the implicit Python-style index:

In [30]:
data.iloc[1:8]

Surat          2
Vijayawada     3
Nvai Mumbai    4
Pune           5
Raipur         6
Bhopal         7
Vadodara       8
dtype: int64
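The difference between the two indexers is easiest to see when the explicit index is itself made of integers. The sketch below reuses the earlier `Series` whose index runs from 1 to 4; the variable names holding the results are introduced only for this illustration.

```python
import pandas as pd

data = pd.Series([4, 6, 8, 2], index=[1, 2, 3, 4])

# loc uses the explicit labels: label 1 is the first element
by_label = data.loc[1]
# iloc uses zero-based positions: position 1 is the second element
by_position = data.iloc[1]
print(by_label, by_position)     # 4 6

# loc slices include the final label; iloc slices exclude the stop position
label_slice = data.loc[1:3].tolist()
position_slice = data.iloc[1:3].tolist()
print(label_slice)               # [4, 6, 8]
print(position_slice)            # [6, 8]
```

Using `loc` and `iloc` explicitly tends to make code easier to read and less prone to subtle bugs caused by mixing the two conventions.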