## Pandas Examples

### Create an Empty Series:
The simplest Series you can create is an empty one.

In [3]:
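# import the pandas library under the alias pd
import pandas as pd
# pass dtype explicitly; recent pandas versions no longer default an empty Series to float64
s = pd.Series(dtype=float)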
print(s)

Series([], dtype: float64)


### Create a Series from ndarray
A Series can also be built from a NumPy array.

In [5]:
# import the pandas library under the alias pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
dtype: object


Because no index was passed, pandas assigned a default index ranging from 0 to len(data)-1, i.e., 0 to 3.

In [6]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

100    a
101    b
102    c
103    d
dtype: object


Here we passed explicit index values, and they appear as the labels in the output.
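
The two indexing behaviours above can be checked side by side. The following is a minimal sketch, assuming a recent pandas release; the variable names are illustrative only. It also shows that an index whose length differs from the data is rejected.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

# No index given: pandas builds a default index 0..len(data)-1.
default_indexed = pd.Series(data)
print(list(default_indexed.index))   # [0, 1, 2, 3]

# Explicit index: the labels replace the default positions.
custom_indexed = pd.Series(data, index=[100, 101, 102, 103])
print(custom_indexed.loc[102])       # c

# An index of the wrong length raises a ValueError.
try:
    pd.Series(data, index=[100, 101])
    length_mismatch_raised = False
except ValueError:
    length_mismatch_raised = True
print(length_mismatch_raised)        # True
```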

### Create a Series from dict
A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index, in insertion order on current pandas versions (older releases sorted them). If an index is passed, the values in data corresponding to the labels in the index are pulled out.

In [7]:
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64


Dictionary keys are used to construct index.

In [8]:
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64


Index order is persisted and the missing element is filled with NaN (Not a Number).
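
Labels that receive NaN can be located and, if desired, replaced. This is a small sketch of one way to do that, assuming **isna()** and **fillna()** suit the use case; the fill value -1.0 is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# Boolean mask of missing entries, used to pick out their labels.
missing_labels = s[s.isna()].index.tolist()
print(missing_labels)        # ['d']

# Replace missing entries with a placeholder (arbitrary value).
filled = s.fillna(-1.0)
print(filled['d'])           # -1.0
```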

### Create a Series from Scalar
If data is a scalar value, an index must be provided. The value is repeated to match the length of the index.

In [11]:
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

0    5
1    5
2    5
3    5
dtype: int64


### Accessing Data from Series with Position
Data in the Series can be accessed by position, much like the elements of an ndarray.

**Example-1:** 
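Retrieve the first element. Positions are counted from zero, so the first element is stored at position 0, the second at position 1, and so on. Because this Series has string labels, positional access goes through **iloc**; a plain s[0] is deprecated for this case in recent pandas releases.

In [20]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first element by position
print(s.iloc[0])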

1


**Example-2:** 
Retrieve the first three elements in the Series. A slice of the form :n returns the items from the start up to, but not including, position n. A slice n: returns all items from position n onwards, and m:n returns the items from position m up to, but not including, position n.

In [13]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the first three elements
print(s[:3])

a    1
b    2
c    3
dtype: int64


**Example-3:**
Retrieve the last three elements.

In [14]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# retrieve the last three elements
print(s[-3:])

c    3
d    4
e    5
dtype: int64
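
The three positional lookups in this section can also be written with **iloc**, which always interprets its argument as a position regardless of the index labels. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]           # single element at position 0
first_three = s.iloc[:3]    # positions 0, 1, 2
last_three = s.iloc[-3:]    # the final three positions

print(first)                          # 1
print(first_three.tolist())           # [1, 2, 3]
print(last_three.index.tolist())      # ['c', 'd', 'e']
```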


### Retrieve Data Using Label (Index)
A Series is like a fixed-size dict in that you can get and set values by index label.

**Example-1:**
Retrieve a single element using index label value.

In [22]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element
print(s['a'])

1


**Example-2:**
Retrieve multiple elements using a list of index label values.

In [23]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve multiple elements
print(s[['a','c','d']])

a    1
c    3
d    4
dtype: int64


**Example-3:**
If a label is not contained, an exception is raised.

In [24]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# look up a label that does not exist in the index
print(s['f'])

KeyError: 'f'
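
When a label may be absent, the KeyError can be avoided by testing membership first or by using **get()** with a default. The sketch below assumes -1 is an acceptable placeholder; that value is only an illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Membership tests on a Series check the index labels, not the values.
has_f = 'f' in s
print(has_f)                        # False

# get() returns the default instead of raising KeyError.
fallback = s.get('f', default=-1)
present = s.get('a', default=-1)
print(fallback, present)            # -1 1
```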

### DataFrame
A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns.

### Create an Empty DataFrame

The simplest DataFrame you can create is an empty one.



In [26]:
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


### Create a DataFrame from Lists
The DataFrame can be created using a single list or a list of lists.

**Example-1:**

In [27]:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5


**Example-2:**

In [28]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13


**Example-3:**

In [29]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
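# cast only the numeric column; dtype=float on the whole frame fails on the 'Name' strings in recent pandas
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})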
print(df)

     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0


### Create a DataFrame from Dict of ndarrays / Lists
All the ndarrays (or lists) must be of the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, then by default, index will be range(n), where n is the array length.

**Example-1:**

In [30]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42


**Example-2:**
Let us now create an indexed DataFrame using arrays.

In [31]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
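
The equal-length requirement stated at the start of this section can be demonstrated directly. This is a small sketch, assuming a recent pandas release; the exact wording of the error message is not relied upon.

```python
import pandas as pd

# Equal-length lists: a default index range(n) is created.
ok = pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
                   'Age': [28, 34, 29, 42]})
print(list(ok.index))        # [0, 1, 2, 3]

# Unequal lengths: construction fails with a ValueError.
try:
    pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve'], 'Age': [28, 34]})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message is not None)   # True
```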


### Create a DataFrame from List of Dicts
A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are taken as column names.

**Example-1:**
The following example shows how to create a DataFrame by passing a list of dictionaries.

In [32]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

   a   b     c
0  1   2   NaN
1  5  10  20.0


**Example-2:**
The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.

In [33]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0


**Example-3:**
The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.

In [34]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

# two columns, both matching dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

# two columns, one of which ('b1') matches no dictionary key and is therefore filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN


### Create a DataFrame from Dict of Series
A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of all the Series indexes passed.

In [35]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
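
The union behaviour and its side effect on dtypes can be inspected programmatically. A minimal sketch using the same data as above:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row index is the union of both Series indexes.
print(df.index.tolist())            # ['a', 'b', 'c', 'd']

# 'one' has no value for label 'd', so it holds one NaN
# and is stored as float; 'two' is complete and stays integer.
missing_in_one = int(df['one'].isna().sum())
print(missing_in_one)               # 1
print(df['one'].dtype, df['two'].dtype)
```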


### Column Selection

In [37]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64


### Column Addition

In [38]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label by passing new series

print ("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)

print ("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print(df)

Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
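
Assigning with df['name'] = ... modifies the DataFrame in place. When the original should stay untouched, **assign()** is one alternative that returns a new DataFrame. The sketch below uses a reduced version of the data above; the column name 'four' follows the example.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0],
                   'three': [10.0, 20.0, 30.0]},
                  index=['a', 'b', 'c'])

# assign() returns a copy with the extra column; df itself is unchanged.
with_four = df.assign(four=lambda frame: frame['one'] + frame['three'])

print(with_four['four'].tolist())   # [11.0, 22.0, 33.0]
print('four' in df.columns)         # False
```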


### Column Deletion

In [39]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
   'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print ("Our dataframe is:")
print(df)

# using the del statement
print("Deleting the first column using DEL function:")
del df['one']
print(df)

# using the pop method, which also returns the removed column
print("Deleting another column using POP function:")
df.pop('two')
print(df)

Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using POP function:
   three
a   10.0
b   20.0
c   30.0
d    NaN
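
Both techniques above change the DataFrame in place. As a sketch of a non-mutating alternative, **drop(columns=...)** returns a new DataFrame, and the return value of **pop()** can be kept if the removed column is still needed. The small frame here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

# drop() returns a new frame; df still has all three columns afterwards.
without_one = df.drop(columns=['one'])
print(list(without_one.columns))    # ['two', 'three']

# pop() removes the column from df and hands it back as a Series.
popped = df.pop('two')
print(popped.tolist())              # [4, 5, 6]
print(list(df.columns))             # ['one', 'three']
```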


### Row Selection, Addition, and Deletion
We can perform row selection, addition and deletion.

#### Selection by Label
Rows can be selected by passing a row label to the **loc** indexer.

In [1]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.loc['b'])

one    2.0
two    2.0
Name: b, dtype: float64


The result is a Series whose labels are the column names of the DataFrame, and whose Name is the row label used to retrieve it.

#### Selection by integer location
Rows can be selected by passing an integer position to the **iloc** indexer.

In [2]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.iloc[2])

one    3.0
two    3.0
Name: c, dtype: float64


#### Slice Rows
Multiple rows can be selected with the slice operator **:**. The stop position is not included.

In [3]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df[2:4])

   one  two
c  3.0    3
d  NaN    4
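
Slices behave differently depending on whether they use labels or positions: a label slice through **loc** includes both endpoints, while a position slice through **iloc** excludes the stop. A minimal sketch with a made-up single-column frame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints are included.
by_label = df.loc['b':'d']
print(by_label.index.tolist())      # ['b', 'c', 'd']

# Position slice: the stop position is excluded.
by_position = df.iloc[1:3]
print(by_position.index.tolist())   # ['b', 'c']
```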


#### Addition of Rows
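Add new rows to a DataFrame by concatenating it with another DataFrame using **pd.concat**, which places the new rows at the end. (The older DataFrame.append method was removed in pandas 2.0.)

In [6]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# concatenate row-wise; the original row labels are kept
df = pd.concat([df, df2])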
print(df)

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
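
Note that the row labels 0 and 1 each appear twice in the output above. If unique labels are wanted, one option is to combine the frames with **pd.concat** and ignore_index=True, which renumbers the rows. A short sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and assigns 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined.index.tolist())      # [0, 1, 2, 3]
print(combined['a'].tolist())       # [1, 3, 5, 7]
```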


#### Deletion of Rows

Use the **drop** method with an index label to delete rows from a DataFrame. If the label is duplicated, all rows carrying that label are dropped.

In [16]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)

print(df)

   a  b
1  3  4
1  7  8


### Series Basic Functionality

#### Example:

In [28]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)

0   -1.218789
1    0.640912
2    1.288269
3   -0.121283
dtype: float64


#### axes
Returns a list containing the row axis labels (the index) of the Series.

In [29]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print ("The axes are:")
print(s.axes)

The axes are:
[RangeIndex(start=0, stop=4, step=1)]


#### empty
Returns a Boolean indicating whether the object is empty. True means the object is empty.

In [30]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print ("Is the Object empty?")
print(s.empty)

Is the Object empty?
False


#### ndim
Returns the number of dimensions of the object. By definition, a Series is a 1D data structure, so it returns 1.

In [32]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)

print ("The dimensions of the object:")
print(s.ndim)

0    1.645474
1   -0.957666
2    0.149339
3    0.172092
dtype: float64
The dimensions of the object:
1


#### size
Returns the size (length) of the Series.

In [35]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print ("The size of the object:")
print(s.size)

0    0.409396
1    0.442617
2   -0.546573
3    0.502226
dtype: float64
The size of the object:
4


#### values
Returns the actual data in the series as an array.

In [38]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)

print ("The actual data series is:")
print(s.values)

0    1.144504
1   -0.558468
2   -0.769027
3   -0.899805
dtype: float64
The actual data series is:
[ 1.1445041  -0.55846845 -0.76902688 -0.89980496]
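
Besides the **values** attribute, pandas also provides **to_numpy()** for the same purpose. The sketch below uses fixed numbers instead of random ones so that the result is reproducible.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, -0.5, 0.25, 2.0])

# to_numpy() hands back the underlying data as a NumPy array.
as_array = s.to_numpy()
print(type(as_array).__name__)      # ndarray
print(as_array.shape)               # (4,)
print(as_array.sum())               # 3.25
```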


#### Head & Tail
To view a small sample of a Series or the DataFrame object, use the **head()** and the **tail()** methods.

**head()** returns the first n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.

In [42]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print ("The original series is:")
print(s)

print ("The first two rows of the data series:")
print(s.head(2))

The original series is:
0   -0.995230
1    0.355260
2    0.365966
3   -0.346365
dtype: float64
The first two rows of the data series:
0   -0.99523
1    0.35526
dtype: float64


**tail()** returns the last n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.

In [40]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print ("The original series is:")
print(s)

print ("The last two rows of the data series:")
print(s.tail(2))


The original series is:
0   -0.872644
1    0.704659
2   -0.496864
3    1.323745
dtype: float64
The last two rows of the data series:
2   -0.496864
3    1.323745
dtype: float64


### DataFrame Basic Functionality

#### Example

In [43]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our DataFrame is:")
print(df)

Our DataFrame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80


#### T (Transpose)
Returns the transpose of the DataFrame. The rows and columns will interchange.

In [44]:
import pandas as pd
import numpy as np
 
# Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the DataFrame is:")
print(df.T)

The transpose of the DataFrame is:
           0      1      2     3      4      5     6
Name     Tom  James  Ricky   Vin  Steve  Smith  Jack
Age       25     26     25    23     30     29    23
Rating  4.23   3.24   3.98  2.56    3.2    4.6   3.8


#### shape
Returns a tuple (a, b) representing the dimensionality of the DataFrame, where a is the number of rows and b is the number of columns.

In [45]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
 
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print(df)
print ("The shape of the object is:")
print(df.shape)

Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The shape of the object is:
(7, 3)
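
The shape tuple relates directly to **len()**, **size**, and the transpose. A minimal sketch on a smaller, made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 25]})

rows, cols = df.shape
print(rows, cols)           # 3 2
print(len(df))              # 3  (number of rows)
print(df.size)              # 6  (rows * columns)
print(df.T.shape)           # (2, 3)  (transpose swaps the two)
```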


#### Head & Tail
To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the first n rows and tail() returns the last n rows (observe the index values). The default number of rows to display is five, but you may pass a custom number.

In [47]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print(df)
print ("The first two rows of the data frame is:")
print(df.head(2))
print ("The last two rows of the data frame is:")
print(df.tail(2))

Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The first two rows of the data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
The last two rows of the data frame is:
    Name  Age  Rating
5  Smith   29     4.6
6   Jack   23     3.8


### Descriptive Statistics on DataFrame
A large number of methods collectively compute descriptive statistics and other related operations on DataFrame.

**sum()**
Returns the sum of the values for the requested axis. By default, the sum runs down the index (axis=0), giving one total per column.

In [48]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df)
print(df.sum())

      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65
Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object


In [54]:
# sum across each row; numeric_only=True skips the string column 'Name'
print(df.sum(axis=1, numeric_only=True))

0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64
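
The meaning of the axis argument is easiest to see on a tiny all-numeric frame. This sketch uses two made-up rows: axis=0 collapses the rows and yields one total per column, while axis=1 collapses the columns and yields one total per row.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 26], 'Rating': [4.0, 3.5]})

column_totals = df.sum(axis=0)      # one value per column
row_totals = df.sum(axis=1)         # one value per row

print(column_totals['Age'], column_totals['Rating'])   # 51.0 7.5
print(row_totals.tolist())                             # [29.0, 29.5]
```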


**mean()**
Returns the average value of each numeric column.

In [55]:
# numeric_only=True skips the string column 'Name'
print(df.mean(numeric_only=True))

Age       31.833333
Rating     3.743333
dtype: float64


**std()**
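Returns the sample standard deviation of the numerical columns, computed with Bessel's correction (division by n-1).

In [56]:
# numeric_only=True skips the string column 'Name'
print(df.std(numeric_only=True))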

Age       9.232682
Rating    0.661628
dtype: float64
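
The effect of Bessel's correction can be verified on the Age column. In this sketch the default **std()** divides by n-1, while ddof=0 gives the population standard deviation that divides by n; the comparison against NumPy is included only as a cross-check.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()              # ddof=1 by default (divide by n-1)
population_std = ages.std(ddof=0)    # divide by n

# Cross-check against NumPy using the same degrees of freedom.
matches_numpy = np.isclose(sample_std, np.std(ages.to_numpy(), ddof=1))

print(round(sample_std, 6))          # 9.232682
print(population_std < sample_std)   # True
print(bool(matches_numpy))           # True
```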


The **describe()** function computes summary statistics for the DataFrame columns.

In [57]:
print(df.describe())

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000


In [60]:
print(df.describe(include=['object']))   # 'object' summarizes string columns

          Name
count       12
unique      12
top     Gasper
freq         1


In [61]:
print(df.describe(include='all'))

          Name        Age     Rating
count       12  12.000000  12.000000
unique      12        NaN        NaN
top     Gasper        NaN        NaN
freq         1        NaN        NaN
mean       NaN  31.833333   3.743333
std        NaN   9.232682   0.661628
min        NaN  23.000000   2.560000
25%        NaN  25.000000   3.230000
50%        NaN  29.500000   3.790000
75%        NaN  35.500000   4.132500
max        NaN  51.000000   4.800000
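
The quartile rows shown by **describe()** are defaults. As a sketch, the percentiles parameter accepts a list of fractions between 0 and 1 to report other cut points; the choice of 10% and 90% here is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]})

# Request the 10th and 90th percentiles instead of the quartiles.
summary = df.describe(percentiles=[0.1, 0.9])

print('10%' in summary.index, '90%' in summary.index)   # True True
print('25%' in summary.index)                           # False
print(summary.loc['count', 'Age'])                      # 12.0
```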
