## Introduction to Pandas

- Pandas is one of the most popular Python libraries for data analysis.
- It is optimized for performance: much of its back-end code is written in Cython or C, with the rest in Python.
- Pandas provides two main data structures for analyzing data:
    1. **Series**
    2. **DataFrames**
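
As a minimal sketch of how the two structures relate, the snippet below builds a small Series and then uses it as one column of a DataFrame. The variable names `ages` and `people` and the sample values are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([25, 26, 23], name="Age")

# A DataFrame is a two-dimensional table; each column is a Series.
people = pd.DataFrame({"Name": ["Tom", "James", "Vin"], "Age": ages})

print(ages.ndim)    # 1
print(people.ndim)  # 2

# Selecting a single column from a DataFrame gives back a Series.
print(type(people["Age"]).__name__)  # Series
```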

## Series Basic Functionality

- **Series**: A Series is a one-dimensional (1-D) labeled array defined in pandas that can store any data type.
    - A Series has attributes and methods such as:
        - axes --> Returns a list of the row axis labels.
        - dtype --> Returns the dtype of the object.
        - empty --> Returns True if the Series is empty.
        - ndim --> Returns the number of dimensions of the underlying data, by definition 1.
        - size --> Returns the number of elements in the underlying data.
        - values --> Returns the Series as an ndarray.
        - head() --> Returns the first n rows (five by default).
        - tail() --> Returns the last n rows (five by default).
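
The attributes listed above can be checked together on a small Series with fixed values, which makes the results reproducible (the cells that follow use random numbers instead). This is only a sketch; the values in `s` are made up for illustration.

```python
import pandas as pd

# Seven fixed values, so the results are the same on every run.
s = pd.Series([1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5])

print(s.axes)   # [RangeIndex(start=0, stop=7, step=1)]
print(s.dtype)  # float64
print(s.empty)  # False
print(s.ndim)   # 1
print(s.size)   # 7

# With no argument, head() and tail() return five rows.
print(len(s.head()))  # 5

# tail(2) keeps the original index labels of the last two rows.
print(list(s.tail(2).index))  # [5, 6]

# An empty Series reports empty == True.
print(pd.Series([], dtype="float64").empty)  # True
```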

In [2]:
# Example
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)

0   -1.653261
1   -0.861048
2    1.188003
3    1.010887
dtype: float64


In [3]:
#axes --> Returns the list of the labels of the series.
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The axes are:")
print(s.axes)

The axes are:
[RangeIndex(start=0, stop=4, step=1)]


In [4]:
# empty --> Returns the Boolean value saying whether the Object is empty or not.
       # --> True indicates that the object is empty.
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("Is the Object empty?")
print(s.empty)


Is the Object empty?
False


In [6]:
# ndim --> Returns the number of dimensions of the object.
       # --> By definition, a Series is a 1D data structure, so it returns 1.
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print (s)

print ("The dimensions of the object:")
print (s.ndim)

0   -0.895025
1    1.789423
2    0.633876
3    0.794108
dtype: float64
The dimensions of the object:
1


In [7]:
# size --> Returns the size(length) of the series.
import pandas as pd
import numpy as np

#Create a series with 2 random numbers
s = pd.Series(np.random.randn(2))
print (s)
print ("The size of the object:")
print (s.size)


0   -0.755705
1    0.986157
dtype: float64
The size of the object:
2


In [8]:
# values --> Returns the actual data in the series as an array.
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print (s)

print ("The actual data series is:")
print (s.values)

0   -0.084411
1    0.349575
2   -1.252986
3    1.086016
dtype: float64
The actual data series is:
[-0.08441059  0.34957465 -1.2529856   1.0860158 ]


In [11]:
# head() --> Returns the first n rows (observe the index values).
# By default five rows are returned, but you may pass a custom number.
import pandas as pd
import numpy as np

#Create a series with 5 random numbers
s = pd.Series(np.random.randn(5))
print ("The original series is:")
print (s)

print ("The first two rows of the data series:")
print (s.head(2))

The original series is:
0   -1.566048
1   -0.215874
2   -0.184635
3   -0.225078
4    1.110255
dtype: float64
The first two rows of the data series:
0   -1.566048
1   -0.215874
dtype: float64


In [12]:
# tail() --> Returns the last n rows (observe the index values).
# By default five rows are returned, but you may pass a custom number.
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print ("The original series is:")
print (s)

print ("The last two rows of the data series:")
print (s.tail(2))

The original series is:
0   -1.169983
1    0.728894
2   -0.899715
3   -0.067026
dtype: float64
The last two rows of the data series:
2   -0.899715
3   -0.067026
dtype: float64


## DataFrame Basic Functionality

- **DataFrames** --> The following list covers the important attributes and methods that provide basic DataFrame functionality.
    - T --> Transposes rows and columns.
    - axes --> Returns a list with the row axis labels and column axis labels as the only members.
    - dtypes --> Returns the dtypes in this object.
    - empty --> True if the DataFrame is entirely empty (no items), that is, if any of the axes has length 0.
    - ndim --> Number of axes / array dimensions.
    - shape --> Returns a tuple representing the dimensionality of the DataFrame.
    - size --> Number of elements in the DataFrame.
    - values --> NumPy representation of the DataFrame.
    - head() --> Returns the first n rows.
    - tail() --> Returns the last n rows.
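
The `shape` attribute is listed above but not shown in the cells that follow, so here is a short sketch of it. The data is the same sample table used throughout this section, built from plain lists for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack"],
    "Age": [25, 26, 25, 23, 30, 29, 23],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(df.shape)  # (7, 3)

# size is the total number of cells, i.e. rows * columns.
print(df.size == n_rows * n_cols)  # True
```

Unpacking `shape` into `n_rows` and `n_cols` is a common way to get the row and column counts separately.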

In [13]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our DataFrame is:")
print (df)

Our DataFrame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80


#### T (Transpose) --> Returns the transpose of the DataFrame. The rows and columns are interchanged.

In [None]:
import pandas as pd
import numpy as np
 
# Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the DataFrame is:")
print (df.T)
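
The cell above has no recorded output, so the following sketch checks the effect of `T` on a two-row subset of the same data. The smaller table and the variable name `t` are assumptions chosen to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "James"],
    "Age": [25, 26],
    "Rating": [4.23, 3.24],
})

t = df.T

# A 2 x 3 frame becomes a 3 x 2 frame.
print(df.shape)  # (2, 3)
print(t.shape)   # (3, 2)

# The old column names become the row labels, and vice versa.
print(list(t.index))    # ['Name', 'Age', 'Rating']
print(list(t.columns))  # [0, 1]

# Transposing twice restores the original layout.
print(t.T.shape == df.shape)  # True
```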

#### axes --> Returns the list of row axis labels and column axis labels.

In [14]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Row axis labels and column axis labels are:")
print (df.axes)

Row axis labels and column axis labels are:
[RangeIndex(start=0, stop=7, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]


#### dtypes --> Returns the data type of each column.

In [15]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("The data types of each column are:")
print (df.dtypes)

The data types of each column are:
Name       object
Age         int64
Rating    float64
dtype: object


#### empty --> Returns a Boolean value indicating whether the object is empty. True indicates that the object is empty.

In [16]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
 
#Create a DataFrame
df = pd.DataFrame(d)
print ("Is the object empty?")
print (df.empty)

Is the object empty?
False


#### ndim --> Returns the number of dimensions of the object. By definition, DataFrame is a 2D object.

In [18]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)

Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The dimension of the object is:
2


#### size --> Returns the number of elements in the DataFrame.

In [19]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
 
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)

Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The total number of elements in our object is:
21


#### values --> Returns the actual data in the DataFrame as a NumPy ndarray.

In [20]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
 
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The actual data in our data frame is:")
print (df.values)

Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The actual data in our data frame is:
[['Tom' 25 4.23]
 ['James' 26 3.24]
 ['Ricky' 25 3.98]
 ['Vin' 23 2.56]
 ['Steve' 30 3.2]
 ['Smith' 29 4.6]
 ['Jack' 23 3.8]]
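
Because an ndarray holds a single dtype, the array returned by `values` for this mixed table uses the general `object` dtype, while a purely numeric selection is upcast to a common numeric dtype. The sketch below illustrates this on a two-row subset; the column selection is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "James"],
    "Age": [25, 26],
    "Rating": [4.23, 3.24],
})

# Strings, integers, and floats together fall back to the object dtype.
print(df.values.dtype)  # object

# Integer and float columns alone are upcast to float64.
numeric = df[["Age", "Rating"]].values
print(numeric.dtype)  # float64
print(numeric.shape)  # (2, 2)
```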


#### head() --> Returns the first n rows (observe the index values). By default five rows are returned, but you may pass a custom number.

In [22]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The first two rows of the data frame are:")
print (df.head(2))

Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The first two rows of the data frame are:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24


#### tail() --> Returns the last n rows (observe the index values). By default five rows are returned, but you may pass a custom number.

In [23]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]), 
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
 
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The last two rows of the data frame are:")
print (df.tail(2))

Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The last two rows of the data frame are:
    Name  Age  Rating
5  Smith   29     4.6
6   Jack   23     3.8
