### Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.

A pandas Series can be created using the following constructor:

pandas.Series( data, index, dtype, copy)

- data takes various forms, such as an ndarray, a list, a dict, or a scalar constant.
- index values must be hashable, and the index must have the same length as data. Labels do not have to be unique. Defaults to np.arange(n) if no index is passed.
- dtype is the data type. If None, the data type is inferred.
- copy controls whether the input data is copied. Default False.
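
As a rough illustration of the parameters listed above, the sketch below passes `index`, `dtype`, and `copy` explicitly. The variable names (`s`, `arr`, `copied`) are made up for this example.

```python
import numpy as np
import pandas as pd

# Explicit index and dtype: the integers are stored as 64-bit floats.
s = pd.Series([1, 2, 3], index=["x", "y", "z"], dtype="float64")
print(s.dtype)         # float64
print(list(s.index))   # ['x', 'y', 'z']

# copy=True gives the Series its own data, so a later change to the
# source array is not visible through the Series.
arr = np.array([1, 2, 3])
copied = pd.Series(arr, copy=True)
arr[0] = 99
print(copied.iloc[0])  # 1
```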

In [8]:
import pandas as pd
import numpy as np

In [28]:
# create empty series

# The dtype is given explicitly because the default for an empty Series differs across pandas versions
s = pd.Series(dtype="float64")
print(s)
print()

Series([], dtype: float64)





In [29]:
# Create a Series from ndarray

data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
print()

data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

0    a
1    b
2    c
3    d
dtype: object

100    a
101    b
102    c
103    d
dtype: object


In [30]:
# Create a Series from dict
# A dict can be passed as input. If no index is specified, the dictionary keys become the index
# (in insertion order in current pandas; older versions sorted the keys).
# If an index is passed, the values in data corresponding to the labels in the index are pulled out;
# labels with no matching key get NaN.

data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)
print()

data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
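
Passing an index together with a dict behaves like building the Series first and then reindexing it. The following is a minimal sketch of that equivalence; the names `labels`, `from_constructor`, and `from_reindex` are illustrative only.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
labels = ["b", "c", "d", "a"]

# Values are looked up by label; 'd' has no key in the dict, so it becomes NaN.
from_constructor = pd.Series(data, index=labels)
from_reindex = pd.Series(data).reindex(labels)

print(from_constructor.equals(from_reindex))  # True
print(int(from_constructor.isna().sum()))     # 1
```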


In [31]:
# Create a Series from Scalar
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

0    5
1    5
2    5
3    5
dtype: int64


In [32]:
# Accessing Data from Series with Position
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the element at position 3 (the fourth element)
print(s.iloc[3])

# retrieve the first three elements
print(s[:3])

# retrieve the last three elements
print(s[-3:])

4
a    1
b    2
c    3
dtype: int64
c    3
d    4
e    5
dtype: int64
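
When a Series has string labels, an integer key can be read either as a position or as a label, so it may help to state the intent explicitly. The sketch below uses `.iloc` for positions and `.loc` for labels on the same data as the cell above.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Positional access: positions start at 0 and the slice end is excluded.
print(s.iloc[3])               # 4
print(s.iloc[-1])              # 5
print(s.iloc[:3].tolist())     # [1, 2, 3]

# Label access: a label slice includes both endpoints.
print(s.loc["a":"c"].tolist())  # [1, 2, 3]
```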


In [33]:
# Retrieve Data Using Label (Index)

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element
print(s['a'])

#retrieve multiple elements
print(s[['a','c','d']])

# error : If a label is not contained, an exception is raised.
# print(s['f']) -> KeyError: 'f'

1
a    1
c    3
d    4
dtype: int64
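
If a label might be absent, the KeyError can be avoided or handled. This is one possible sketch, using a membership test, `Series.get` with a fallback value, and an explicit `try`/`except`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# 'in' tests membership in the index, not in the values.
print("f" in s)       # False

# get() returns None (or the given default) instead of raising.
print(s.get("f"))     # None
print(s.get("f", 0))  # 0

# Catch the exception when a missing label should be reported.
try:
    s["f"]
except KeyError as exc:
    print("missing label:", exc)
```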


### DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

pandas.DataFrame( data, index, columns, dtype, copy)

- data takes various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
- index gives the row labels of the resulting frame. It is optional and defaults to np.arange(n) if no index is passed.
- columns gives the column labels. It is optional and defaults to np.arange(n) if no column labels are passed.
- dtype is the data type to apply to the columns. If None, it is inferred.
- copy controls whether the input data is copied. Default False.
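
To make the parameter list above concrete, here is a small sketch that builds the same array once with default labels and once with explicit `index`, `columns`, and `dtype`. The labels `r1`..`r3`, `x`, and `y` are invented for the example.

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]

# No labels passed: rows and columns are both numbered from 0.
df_default = pd.DataFrame(values)
print(list(df_default.index))    # [0, 1, 2]
print(list(df_default.columns))  # [0, 1]

# Explicit row labels, column labels, and data type.
df_labeled = pd.DataFrame(
    values,
    index=["r1", "r2", "r3"],
    columns=["x", "y"],
    dtype="float64",
)
print(df_labeled.loc["r2", "y"])  # 3.0
```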

In [34]:
# Create an Empty DataFrame
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


In [44]:
# Create a DataFrame from Lists
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

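# Convert only the numeric column: dtype=float for the whole frame is rejected
# by recent pandas versions because 'Name' holds strings
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})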
print(df)

   0
0  1
1  2
2  3
3  4
4  5
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0


In [45]:
# Create a DataFrame from Dict of ndarrays / Lists
# All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
# If no index is passed, the index defaults to range(n), where n is the array length.

data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42


In [46]:
# Create a DataFrame from List of Dicts
# The dictionary keys are by default taken as column names.

data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

df = pd.DataFrame(data, index=['first', 'second'])
print(df)

# Two columns whose names match the dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

# Two columns, one of which ('b1') is not a dictionary key, so it is filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

   a   b     c
0  1   2   NaN
1  5  10  20.0
        a   b     c
first   1   2   NaN
second  5  10  20.0
        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN


In [48]:
# Create a DataFrame from Dict of Series

d = {'one' : pd.Series([1, 2, 3]),
   'two' : pd.Series([1, 2, 3, 4])}

df = pd.DataFrame(d)
print(df)

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

   one  two
0  1.0    1
1  2.0    2
2  3.0    3
3  NaN    4
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
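
The NaN in column `one` comes from index alignment: the frame's index is the union of the Series indexes, and a Series with no value for a label gets NaN there, which also turns that column into floats. A minimal sketch that checks this:

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)

# The row labels are the union of both Series indexes.
print(list(df.index))              # ['a', 'b', 'c', 'd']

# 'one' has no value for label 'd', so that cell is NaN.
print(df["one"].isna().tolist())   # [False, False, False, True]

# The missing value forces 'one' to a float dtype; 'two' stays integer.
print(df["one"].dtype.kind, df["two"].dtype.kind)  # f i
```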


In [50]:
# Column Selection

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64


In [51]:
# Column Addition

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label by passing new series

print("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)

print("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print(df)

Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
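
Assigning with `df['name'] = ...` modifies the frame in place. When the original should stay untouched, `DataFrame.assign` is one alternative, since it returns a new frame. The sketch below uses a reduced frame and the illustrative name `with_sum`.

```python
import pandas as pd

df = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "three": pd.Series([10, 20, 30], index=["a", "b", "c"]),
})

# assign() returns a copy with the extra column; df itself is unchanged.
with_sum = df.assign(four=lambda frame: frame["one"] + frame["three"])

print(list(df.columns))           # ['one', 'three']
print(list(with_sum.columns))     # ['one', 'three', 'four']
print(with_sum["four"].tolist())  # [11, 22, 33]
```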


In [52]:
# Column Deletion
# Columns can be deleted or popped

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
   'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:")
print (df)

# using the del statement
print ("Deleting the first column using DEL function:")
del df['one']
print (df)

# using the pop method, which also returns the removed column
print ("Deleting another column using POP function:")
df.pop('two')
print (df)

Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using POP function:
   three
a   10.0
b   20.0
c   30.0
d    NaN
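
Both `del` and `pop` change the frame in place. As a sketch of a non-destructive alternative, `DataFrame.drop(columns=...)` returns a new frame, while `pop` hands back the removed column as a Series. The small frame and the names `trimmed` and `popped` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# drop() returns a new frame and leaves df as it was.
trimmed = df.drop(columns=["one"])
print(list(trimmed.columns))  # ['two', 'three']
print(list(df.columns))       # ['one', 'two', 'three']

# pop() removes the column from df and returns it.
popped = df.pop("two")
print(popped.tolist())        # [3, 4]
print(list(df.columns))       # ['one', 'three']
```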


In [71]:
# Row Selection, Addition, and Deletion

# Selection by Label
# Rows can be selected by passing a row label to the loc indexer.
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.loc['b'])
print(df.loc['d'])

print()
# Selection by integer location
# Rows can be selected by passing an integer position to the iloc indexer.

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print (df.iloc[2])
print()

# Slice Rows
# Multiple rows can be selected with the ':' slice operator; integer slices are positional and exclude the end.
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df[2:4])
print()

# Addition of Rows
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])
print(df)
print(df.loc[0]) # will print two rows, as label is 0 for two rows

print()
# Deletion of Rows
# Use index label to delete or drop rows from a DataFrame. If label is duplicated, then multiple rows will be dropped

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)

print (df)

one    2.0
two    2.0
Name: b, dtype: float64
one    NaN
two    4.0
Name: d, dtype: float64

one    3.0
two    3.0
Name: c, dtype: float64

   one  two
c  3.0    3
d  NaN    4

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
   a  b
0  1  2
0  5  6

   a  b
1  3  4
1  7  8
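
The repeated labels in the output above arise because each frame keeps its own row labels when the two are stacked. If unique labels are wanted, one option is `pd.concat` with `ignore_index=True`, sketched here with the illustrative names `stacked` and `renumbered`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# Default: original labels are kept, so 0 and 1 each appear twice.
stacked = pd.concat([df, df2])
print(list(stacked.index))     # [0, 1, 0, 1]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([df, df2], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]

# With unique labels, drop(0) removes exactly one row.
print(renumbered.drop(0)["a"].tolist())  # [3, 5, 7]
```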
