# Intro to data structures

In [1]:
import pandas as pd 
import numpy as np 

# Series

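A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call `pd.Series(data, index=index)`, where `data` can be a NumPy array, a dict, or a scalar value, and `index` is a list of axis labels.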

In [23]:
data = np.array([1,2,3,4,5,6,7,8])
data

array([1, 2, 3, 4, 5, 6, 7, 8])

In [24]:
s = pd.Series(data, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])
s

A    1
B    2
C    3
D    4
E    5
F    6
G    7
H    8
dtype: int64
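The `index` argument is optional. The sketch below, using made-up values, shows two other creation patterns: with no index, pandas assigns a default integer index running from 0 to `len(data) - 1`, and with scalar data the value is repeated once per index label.

```python
import numpy as np
import pandas as pd

# No index argument: pandas builds a default integer index 0..len(data)-1
s_default = pd.Series(np.array([10, 20, 30]))
print(list(s_default.index))  # [0, 1, 2]

# Scalar data: the single value is repeated for every label in the index
s_scalar = pd.Series(5.0, index=["a", "b", "c"])
print(s_scalar.tolist())  # [5.0, 5.0, 5.0]
```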

To check the type:

In [4]:
type(s)

pandas.core.series.Series

In [5]:
series = pd.Series(np.random.rand(5), index=['A', 'B', 'C', 'D', 'E'])
series

A    0.538982
B    0.898065
C    0.918703
D    0.213179
E    0.691304
dtype: float64

To access the underlying data of a Series, use `.values`:

In [6]:
series.values

array([0.53898163, 0.89806536, 0.91870332, 0.21317852, 0.69130427])
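`.values` works, but recent pandas documentation recommends `Series.to_numpy()` when you specifically want a NumPy array. A minimal sketch with fixed, made-up values so the result is reproducible:

```python
import numpy as np
import pandas as pd

series = pd.Series([0.5, 0.9, 0.1], index=["A", "B", "C"])

# to_numpy() returns the data as a plain NumPy ndarray (labels are dropped)
arr = series.to_numpy()
print(isinstance(arr, np.ndarray))  # True

# For this simple float64 Series it holds the same numbers as .values
print(np.array_equal(arr, series.values))  # True
```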

To access the index:

In [7]:
series.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

# From dict

In [8]:
d = {'a': 1, 'b': 2, 'c': 3}

In [9]:
series2 = pd.Series(d)
series2

a    1
b    2
c    3
dtype: int64
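When an `index` is also passed, pandas pulls the dict values in the order of that index, and any label that is not a key of the dict gets `NaN`. Because `NaN` is a float, the integer values are upcast to `float64`. A short sketch reusing the dict from above:

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}

# Values are taken in index order; "d" is not a key in the dict, so it becomes NaN
series3 = pd.Series(d, index=["b", "c", "d", "a"])
print(series3.tolist())  # [2.0, 3.0, nan, 1.0]
print(series3.dtype)     # float64
```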

# Series is ndarray-like

A Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index, so the labels travel with the selected values.

In [26]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a   -0.727063
b   -1.791956
c    1.022638
d    2.771892
e   -0.417474
dtype: float64

In [28]:
s.iloc[0].round(2)

np.float64(-0.73)

In [29]:
s.iloc[:3]

a   -0.727063
b   -1.791956
c    1.022638
dtype: float64

In [30]:
s[s > s.median()]

c    1.022638
d    2.771892
dtype: float64
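To see that the index is sliced along with the values, here is a small sketch on fixed numbers (not the random values above). Each selection keeps exactly the labels of the values it returns:

```python
import pandas as pd

s = pd.Series([-0.7, -1.8, 1.0, 2.8, -0.4], index=["a", "b", "c", "d", "e"])

# Positional slicing keeps the matching labels
print(s.iloc[:3].index.tolist())  # ['a', 'b', 'c']

# A boolean mask keeps only the labels where the condition is True
mask = s > s.median()  # the median here is -0.4
print(mask.tolist())              # [False, False, True, True, False]
print(s[mask].index.tolist())     # ['c', 'd']

# Selecting a list of positions reorders values and labels together
print(s.iloc[[4, 3, 1]].index.tolist())  # ['e', 'd', 'b']
```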

# Vectorized operations and label alignment with Series

When working with raw NumPy arrays, looping through the values one by one is usually not necessary, and the same is true for a pandas Series. A Series can also be passed to most NumPy functions that expect an ndarray.

In [31]:
s + s

a   -1.454126
b   -3.583911
c    2.045275
d    5.543784
e   -0.834948
dtype: float64

In [32]:
s * 2

a   -1.454126
b   -3.583911
c    2.045275
d    5.543784
e   -0.834948
dtype: float64
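Both claims in the paragraph above can be checked directly. The sketch below, using made-up values, applies the NumPy ufunc `np.exp` to a Series, which returns a Series with the same index, and confirms that the vectorized `s * 2` gives the same result as an explicit Python loop:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])

# A NumPy ufunc applied to a Series returns a Series with the same labels
result = np.exp(s)
print(type(result).__name__)   # Series
print(result.index.tolist())   # ['a', 'b', 'c']

# The vectorized expression matches an element-by-element loop
looped = pd.Series([v * 2 for v in s], index=s.index)
print((s * 2).equals(looped))  # True
```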

In [33]:
s.iloc[1:] + s.iloc[:-1]

a         NaN
b   -3.583911
c    2.045275
d    5.543784
e         NaN
dtype: float64
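The `NaN` values come from label alignment: `s.iloc[1:]` has labels `b` to `e`, `s.iloc[:-1]` has labels `a` to `d`, and arithmetic matches values by label rather than by position. The result index is the union of both, and a label present on only one side produces `NaN`. A sketch with fixed values, plus two ways of handling the unmatched labels (`Series.add` with `fill_value`, and `dropna`):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "b", "c", "d", "e"])

# Labels "a" and "e" exist on only one side, so they become NaN
aligned = s.iloc[1:] + s.iloc[:-1]
print(aligned.tolist())  # [nan, 4.0, 6.0, 8.0, nan]

# fill_value=0 treats a label missing on one side as 0 instead of NaN
filled = s.iloc[1:].add(s.iloc[:-1], fill_value=0)
print(filled.tolist())   # [1.0, 4.0, 6.0, 8.0, 5.0]

# Alternatively, drop the unmatched labels after the operation
print(aligned.dropna().index.tolist())  # ['b', 'c', 'd']
```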

# Name attribute

In [17]:
s = pd.Series(np.random.randn(5), name="raghav")

In [18]:
s

0   -0.305569
1   -1.876293
2   -0.636330
3    0.022954
4   -0.449773
Name: raghav, dtype: float64

In [19]:
s.name

'raghav'

In [20]:
s2 = s.rename("Updated Name")

In [21]:
s2.name

'Updated Name'
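`rename` with a scalar returns a new Series and leaves the original's name unchanged. The name can also be assigned directly, and it becomes the column label when the Series is converted to a DataFrame. A small sketch with fixed values:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="raghav")

# rename returns a new Series; the original keeps its old name
s2 = s.rename("Updated Name")
print(s.name)    # raghav
print(s2.name)   # Updated Name
print(s2 is s)   # False

# The name can also be set in place on the existing object
s.name = "direct"

# to_frame() uses the Series name as the column label
print(s.to_frame().columns.tolist())  # ['direct']
```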