# Pandas

# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the data frame of the R programming language. We can think of a DataFrame as a collection of Series objects that share the same index, with each column being one Series. Let's use pandas to explore this topic!
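To make the "Series sharing an index" picture concrete, here is a minimal sketch. The `Height` values are made-up illustration data, not taken from the examples below.

```python
import pandas as pd

# Two Series that share the same index labels (heights are made-up sample values)
ages = pd.Series([10, 15, 14], index=['tom', 'nick', 'juli'])
heights = pd.Series([140, 165, 158], index=['tom', 'nick', 'juli'])

# Placing them side by side as columns gives a DataFrame aligned on that index
df = pd.DataFrame({'Age': ages, 'Height': heights})

# Selecting one column hands back a Series that carries the shared index
age_col = df['Age']
print(isinstance(age_col, pd.Series))  # True
print(age_col.index.equals(df.index))  # True
```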

### Different ways of creating a DataFrame

#### Creating from a list of lists

```python
import pandas as pd

# Each inner list becomes one row of the DataFrame
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df
```

|   | Name | Age |
|---|------|-----|
| 0 | tom  | 10  |
| 1 | nick | 15  |
| 2 | juli | 14  |
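Each inner list becomes one row. As a small sketch of what happens when the `columns` argument is left out, pandas falls back to integer column labels, just as it falls back to an integer row index:

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]

# No column names supplied: pandas labels the columns 0, 1, ... by position
df_default = pd.DataFrame(data)
print(list(df_default.columns))  # [0, 1]
print(df_default.shape)          # (3, 2): three inner lists give three rows

# Column names can also be assigned after construction
df_default.columns = ['Name', 'Age']
print(df_default.loc[1, 'Name'])  # nick
```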


#### Creating a DataFrame from a dict of lists
To create a DataFrame from a dict of lists, all the lists must have the same length. If an index is passed, its length must equal the length of the lists. If no index is passed, the index defaults to `range(n)`, where `n` is the list length.

```python
# Initialize a dict whose values are equal-length lists
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Create the DataFrame: each key becomes a column
df = pd.DataFrame(data)

# Display the result
df
```

|   | Name  | Age |
|---|-------|-----|
| 0 | Tom   | 20  |
| 1 | nick  | 21  |
| 2 | krish | 19  |
| 3 | jack  | 18  |
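The length rules from the paragraph above can be checked directly. This sketch shows the default integer index and what happens when the lists disagree in length; the exact wording of the error message varies between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}
df = pd.DataFrame(data)

# Without an explicit index, the rows are labelled 0 .. n-1
print(list(df.index))  # [0, 1, 2, 3]

# Lists of different lengths cannot be lined up as columns
bad = {'Name': ['Tom', 'nick'], 'Age': [20, 21, 19]}
try:
    pd.DataFrame(bad)
except ValueError as err:
    print('ValueError:', err)
```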


Creating an indexed DataFrame from a dict of lists:

```python
# Initialize a dict of lists
data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}

# Create the DataFrame with custom row labels
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Display the result
df
```

|       | Name | marks |
|-------|------|-------|
| rank1 | Tom  | 99    |
| rank2 | Jack | 98    |
| rank3 | nick | 95    |
| rank4 | juli | 90    |
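The passed labels replace the default integer index, so rows can then be looked up by label. A minimal sketch, which also shows that an index whose length differs from the lists is rejected (again relying only on the exception type, not the message):

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'nick', 'juli'],
        'marks': [99, 98, 95, 90]}
ranks = ['rank1', 'rank2', 'rank3', 'rank4']
df = pd.DataFrame(data, index=ranks)

# Rows are now addressed by the passed labels
print(df.loc['rank2', 'Name'])  # Jack

# Two index labels for four-element lists: pandas raises ValueError
try:
    pd.DataFrame(data, index=['rank1', 'rank2'])
except ValueError as err:
    print('ValueError:', err)
```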


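#### Creating a DataFrame from a list of dicts
A pandas DataFrame can be created by passing a list of dictionaries as input data. Each dictionary becomes one row, and by default the dictionary keys are taken as column names; where a dictionary lacks a key, that cell is filled with `NaN`.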

```python
# Initialize a list of dicts; only the second dict has key 'd'
data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30, 'd': 40}]

# Create the DataFrame: each dict becomes one row
df = pd.DataFrame(data)

# Display the result
df
```


|   | a  | b  | c  | d    |
|---|----|----|----|------|
| 0 | 1  | 2  | 3  | NaN  |
| 1 | 10 | 20 | 30 | 40.0 |
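The `40.0` in the output is not a typo: because `NaN` is a floating-point value, column `d` is upcast to float while the complete columns keep an integer dtype. A short sketch confirming both effects:

```python
import pandas as pd

data = [{'a': 1, 'b': 2, 'c': 3},
        {'a': 10, 'b': 20, 'c': 30, 'd': 40}]
df = pd.DataFrame(data)

# The first dict has no 'd' key, so that cell is missing (NaN)
print(df['d'].isna().tolist())  # [True, False]

# NaN forces column 'd' to float64, while column 'a' stays an integer dtype
print(df['d'].dtype)                              # float64
print(pd.api.types.is_integer_dtype(df['a']))     # True
```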


#### Creating a DataFrame from a list of dictionaries and row indexes

```python
# Initialize a list of dicts; the first dict has no 'a' key
data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]

# Create the DataFrame, labelling the rows explicitly
df = pd.DataFrame(data, index=['first', 'second'])

# Display the result
df
```

|        | b  | c  | a    |
|--------|----|----|------|
| first  | 2  | 3  | NaN  |
| second | 20 | 30 | 10.0 |
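In the output above, column `a` comes last because, with recent pandas versions, columns are ordered by the first time each key is seen. A minimal sketch of that behaviour, and of fixing the order explicitly with the `columns` argument:

```python
import pandas as pd

data = [{'b': 2, 'c': 3}, {'a': 10, 'b': 20, 'c': 30}]

df = pd.DataFrame(data, index=['first', 'second'])
# Columns follow the order in which keys first appear: 'b', 'c', then 'a'
print(list(df.columns))  # ['b', 'c', 'a']

# Passing columns= sets the order explicitly
df_sorted = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b', 'c'])
print(list(df_sorted.columns))        # ['a', 'b', 'c']
print(df_sorted.loc['second', 'a'])   # 10.0
```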


#### Creating a DataFrame from a list of dictionaries with both row and column indexes

```python
# Create DataFrames from a list of dicts, specifying both
# the row index and the column labels.

# Initialize a list of dicts
data = [{'a': 1, 'b': 2},
        {'a': 5, 'b': 10, 'c': 20}]

# Column labels that match dictionary keys; key 'c' is not requested, so it is dropped
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

# 'b1' matches no dictionary key, so that column is filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])

# Print the first DataFrame
print(df1, "\n")

# Print the second DataFrame
print(df2)
```

```
        a   b
first   1   2
second  5  10

        a  b1
first   1 NaN
second  5 NaN
```
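In other words, `columns` acts as a selector over the dictionary keys rather than a renaming. A short sketch that checks both directions of that behaviour on the same data:

```python
import pandas as pd

data = [{'a': 1, 'b': 2},
        {'a': 5, 'b': 10, 'c': 20}]
idx = ['first', 'second']

df1 = pd.DataFrame(data, index=idx, columns=['a', 'b'])
df2 = pd.DataFrame(data, index=idx, columns=['a', 'b1'])

# Keys not listed in columns= (here 'c') are dropped
print('c' in df1.columns)  # False

# A requested column that matches no key (here 'b1') is created and filled with NaN
print(df2['b1'].isna().all())  # True
```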


#### Creating a DataFrame using the zip() function
Two lists can be combined element by element into a list of tuples with `list(zip(...))`. Passing that list to `pd.DataFrame()` then turns each tuple into one row.

```python
# Create a pandas DataFrame from two lists using zip()

# First list: names
Name = ['tom', 'krish', 'nick', 'juli']

# Second list: ages
Age = [25, 30, 26, 22]

# zip() pairs the lists element by element and returns a lazy zip object,
# so list() is used to materialize it as a list of tuples
list_of_tuples = list(zip(Name, Age))

# Inspect the list of tuples
print(list_of_tuples)

# Convert the list of tuples into a DataFrame; each tuple becomes a row
df = pd.DataFrame(list_of_tuples, columns=['Name', 'Age'])

# Display the result
df
```

```
[('tom', 25), ('krish', 30), ('nick', 26), ('juli', 22)]
```

|   | Name  | Age |
|---|-------|-----|
| 0 | tom   | 25  |
| 1 | krish | 30  |
| 2 | nick  | 26  |
| 3 | juli  | 22  |
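Passing the same two lists as a dict of lists, keyed by column name, builds an identical frame without the intermediate list of tuples. A minimal sketch comparing the two routes with `DataFrame.equals`:

```python
import pandas as pd

Name = ['tom', 'krish', 'nick', 'juli']
Age = [25, 30, 26, 22]

# Route 1: zip the lists into row tuples
df_zip = pd.DataFrame(list(zip(Name, Age)), columns=['Name', 'Age'])

# Route 2: hand the lists over as whole columns
df_dict = pd.DataFrame({'Name': Name, 'Age': Age})

print(df_zip.equals(df_dict))  # True
```

The zip route is row-oriented, which is convenient when data arrives as records; the dict route is column-oriented and skips building the tuples.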


#### Creating a DataFrame from a dict of Series
A dict of Series can be passed directly to form a DataFrame. The resulting index is the union of the indexes of all the Series passed, and any label missing from a Series is filled with `NaN` in that column.

```python
# Create a pandas DataFrame from a dict of Series

# Initialize a dict of Series with different indexes
d = {'one': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30, 40, 50],
                      index=['a', 'b', 'c', 'd', 'e'])}

# Create the DataFrame; rows are aligned on the index labels
df = pd.DataFrame(d)

# Display the result
df
```

|   | one  | two |
|---|------|-----|
| a | 10.0 | 10  |
| b | 20.0 | 20  |
| c | 30.0 | 30  |
| d | 40.0 | 40  |
| e | NaN  | 50  |
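A short sketch verifying the alignment: the row labels are the union of both indexes, and column `one` is upcast to float because it gains a `NaN`. Passing `index=` instead picks out (and orders) specific labels rather than taking the union.

```python
import pandas as pd

d = {'one': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(d)

# The row index is the union of both Series' indexes
print(sorted(df.index))  # ['a', 'b', 'c', 'd', 'e']

# 'one' has no value for label 'e', so it gets NaN and becomes float64
print(df.loc['e', 'one'])  # nan
print(df['one'].dtype)     # float64

# An explicit index selects only those labels, in the given order
df_sub = pd.DataFrame(d, index=['d', 'b'])
print(df_sub['one'].tolist())  # [40, 20]
```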


#### Creating a DataFrame from NumPy arrays

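```python
import numpy as np

# Fix the seed so the random numbers are reproducible
np.random.seed(101)
data = np.random.randn(5, 4)  # 5x4 array of standard-normal samples
print(data)

# Label the 5 rows A-E and the 4 columns W-Z
df = pd.DataFrame(data, index='A B C D E'.split(), columns='W X Y Z'.split())
df
```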

```
[[ 2.70684984  0.62813271  0.90796945  0.50382575]
 [ 0.65111795 -0.31931804 -0.84807698  0.60596535]
 [-2.01816824  0.74012206  0.52881349 -0.58900053]
 [ 0.18869531 -0.75887206 -0.93323722  0.95505651]
 [ 0.19079432  1.97875732  2.60596728  0.68350889]]
```

|   | W         | X         | Y         | Z         |
|---|-----------|-----------|-----------|-----------|
| A | 2.706850  | 0.628133  | 0.907969  | 0.503826  |
| B | 0.651118  | -0.319318 | -0.848077 | 0.605965  |
| C | -2.018168 | 0.740122  | 0.528813  | -0.589001 |
| D | 0.188695  | -0.758872 | -0.933237 | 0.955057  |
| E | 0.190794  | 1.978757  | 2.605967  | 0.683509  |
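The array's shape has to agree with the labels: 5 rows need 5 index labels and 4 columns need 4 column labels. A small sketch of that check; the zero-filled array is only a placeholder.

```python
import numpy as np
import pandas as pd

data = np.zeros((5, 4))

# 'A B C D E'.split() is shorthand for ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(data, index='A B C D E'.split(), columns='W X Y Z'.split())
print(df.shape)  # (5, 4)

# Only three column labels for four array columns: pandas raises ValueError
try:
    pd.DataFrame(data, index='A B C D E'.split(), columns='W X Y'.split())
except ValueError as err:
    print('ValueError:', err)
```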


```python
# df.values exposes the underlying NumPy array.
# Note: .values is an attribute, not a method, so it takes no parentheses.
df.values
```

```
array([[ 2.70684984,  0.62813271,  0.90796945,  0.50382575],
       [ 0.65111795, -0.31931804, -0.84807698,  0.60596535],
       [-2.01816824,  0.74012206,  0.52881349, -0.58900053],
       [ 0.18869531, -0.75887206, -0.93323722,  0.95505651],
       [ 0.19079432,  1.97875732,  2.60596728,  0.68350889]])
```
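The pandas documentation recommends the `DataFrame.to_numpy()` method over the `.values` attribute in newer code. Both return the same array here. This brief sketch also shows that a frame with mixed column types comes back as a single `object` array, because a NumPy array holds only one dtype.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(), columns='W X Y Z'.split())

# Same data via the attribute and via the method
arr = df.to_numpy()
print(np.array_equal(arr, df.values))  # True
print(arr.shape, arr.dtype)            # (5, 4) float64

# Mixed column types collapse to one common dtype: object
mixed = pd.DataFrame({'Name': ['tom', 'nick'], 'Age': [10, 15]})
print(mixed.to_numpy().dtype)          # object
```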