A DataFrame can be created from a dictionary, from Series, or from a combination of the two. When you pass a dictionary, its keys become the column labels of the DataFrame.

In [1]:
import pandas as pd
import numpy as np

# create a DataFrame from a dictionary of 3 Series
s1 = pd.Series(np.arange(1,6))
s2 = pd.Series(np.arange(6,11))
s3 = pd.Series(np.arange(11,16))
df2 = pd.DataFrame({'C1':s1, 'C2':s2, 'C3':s3})

df2

# Note how the dictionary keys become the column labels

,C1,C2,C3
0,1,6,11
1,2,7,12
2,3,8,13
3,4,9,14
4,5,10,15


In [2]:
# create a DataFrame from a Dictionary with List values

data_dict = {'product': ['ball','pen','pencil','paper','mug'],
            'color': ['blue','green','yellow','red','white'],
            'price': [1.2,1.0,0.6,0.9,1.7],
            'price_discount': [1.1,0.8, 0.5,0.8,1.5]}

df3 = pd.DataFrame(data_dict)
df3

# Again, note how the keys become the column labels and the values become the data

,product,color,price,price_discount
0,ball,blue,1.2,1.1
1,pen,green,1.0,0.8
2,pencil,yellow,0.6,0.5
3,paper,red,0.9,0.8
4,mug,white,1.7,1.5
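
The two approaches above can also be mixed in a single dictionary. The following is a minimal sketch with made-up data (the names `quantity` and `mixed` are hypothetical): one value is a Series and the other is a plain list. The row index comes from the Series, so the list has to have the same number of elements.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column is a Series, the other a plain list.
quantity = pd.Series(np.arange(1, 4))   # labelled 0, 1, 2

# The Series supplies the row index; the list must match its length.
mixed = pd.DataFrame({'quantity': quantity,
                      'item': ['ball', 'pen', 'mug']})
print(mixed)
```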


If you do not specify labels for the index (rows), pandas assigns a numeric sequence starting from 0 by default. As you saw earlier, to assign your own row labels, pass the index parameter when you create the DataFrame.

In [3]:
# assign labels to the index (rows)
df3 = pd.DataFrame(data_dict, index=['one','two','three','four','five'])
df3

,product,color,price,price_discount
one,ball,blue,1.2,1.1
two,pen,green,1.0,0.8
three,pencil,yellow,0.6,0.5
four,paper,red,0.9,0.8
five,mug,white,1.7,1.5
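
One thing to watch for: the index parameter behaves differently depending on what the dictionary holds. The sketch below uses a small hypothetical `prices` dictionary to illustrate this. With list values, index simply labels the rows. With Series values, index selects rows by label, so labels that do not exist in the Series produce NaN.

```python
import pandas as pd

prices = {'price': [1.2, 1.0, 0.6]}   # hypothetical data
labels = ['one', 'two', 'three']

# List values: index= labels the existing rows in order.
from_lists = pd.DataFrame(prices, index=labels)

# Series values: index= looks rows up by label. This Series is labelled
# 0, 1, 2, so none of the requested labels match and every value is NaN.
from_series = pd.DataFrame({'price': pd.Series([1.2, 1.0, 0.6])}, index=labels)

# Labelling the Series itself keeps the data.
labelled = pd.DataFrame({'price': pd.Series([1.2, 1.0, 0.6], index=labels)})

print(from_lists)
print(from_series)
print(labelled)
```

Once rows are labelled, a single value can be read with `from_lists.loc['two', 'price']`.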


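A DataFrame automatically aligns the Series in the dictionary by their index labels. This means that each value appears in the row that matches its label, under the column named by its key. When every Series has the same index, the columns line up with no gaps.

What happens if your Series don't have the same size? No worries! pandas fills the missing positions with NaN. This applies to Series only: plain lists or arrays of different lengths have no labels to align on, and pandas raises a ValueError. A column that receives NaN is also converted to floats, which is why C2 and C4 in the output below show 6.0 and 16.0.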

In [4]:
# create a DataFrame using 4 Series of different sizes and a dictionary for column labels
s1 = pd.Series(np.arange(1,6)) # 5 elements
s2 = pd.Series(np.arange(6,9)) # 3 elements
s3 = pd.Series(np.arange(11,16)) # 5 elements
s4 = pd.Series(np.arange(16,20)) # 4 elements
df2 = pd.DataFrame({'C1':s1, 'C2': s2, 'C3':s3, 'C4':s4})
df2

,C1,C2,C3,C4
0,1,6.0,11,16.0
1,2,7.0,12,17.0
2,3,8.0,13,18.0
3,4,,14,19.0
4,5,,15,
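
To make the alignment more visible, here is a small sketch with hypothetical Series `a` and `b` whose labels are in a different order and only partly overlap. It also shows what happens with plain lists of unequal length, which cannot be aligned.

```python
import pandas as pd

# Hypothetical Series with partly overlapping labels.
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['z', 'x'])   # different order, 'y' missing

aligned = pd.DataFrame({'a': a, 'b': b})
print(aligned)
# Values are matched by label, not by position: b's 10 lands in row 'z'
# and 20 in row 'x'. Row 'y' has no value in b, so it gets NaN.

# Plain lists carry no labels, so unequal lengths raise an error.
try:
    pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20]})
except ValueError as err:
    print('ValueError:', err)
```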


## Practise Data Frames

In [9]:
# Create a Data Frame using a two-dimensional array 5x4 (5 rows and 4 columns)

import pandas as pd
import numpy as np

# 5x4 array
# could also use lists rather than arange, but arange saves typing :)

df1 = pd.DataFrame(np.array([np.arange(1,5),np.arange(5,9), 
                             np.arange(9,13),np.arange(13,17),
                             np.arange(17,21)]))
print('Data Frame for 5x4 array:')
print(df1)
print()

Data Frame for 5x4 array:
    0   1   2   3
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
4  17  18  19  20
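
As an alternative to stacking five ranges, the same 5x4 values can be built by reshaping a single range. This is only a sketch of another way to get the same array; the name `df1_alt` and the column labels are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same values 1..20, laid out as 5 rows and 4 columns.
values = np.arange(1, 21).reshape(5, 4)

# The columns parameter replaces the default 0..3 column labels.
df1_alt = pd.DataFrame(values, columns=['A', 'B', 'C', 'D'])
print(df1_alt)
```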



In [6]:
# Create a Data Frame using 3 Series with the same size.
s1 = pd.Series(np.arange(1,5))
s2 = pd.Series(np.arange(5,9))
s3 = pd.Series(np.arange(9,13))
df2 = pd.DataFrame([s1,s2,s3])
print('Data Frame from 3 equal size Series')
print(df2)


Data Frame from 3 equal size Series
   0   1   2   3
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12


In [7]:
# Create a Data Frame using 5 Series with different sizes
s1 = pd.Series(np.arange(1,5))
s2 = pd.Series(np.arange(5,6))
s3 = pd.Series(np.arange(9,13))
s4 = pd.Series(np.arange(8,13))
s5 = pd.Series(np.arange(1,12))
df3 = pd.DataFrame([s1,s2,s3,s4,s5])
print('Data Frame from 5 different sized Series')
print(df3)

Data Frame from 5 different sized Series
     0     1     2     3     4    5    6    7    8     9    10
0  1.0   2.0   3.0   4.0   NaN  NaN  NaN  NaN  NaN   NaN   NaN
1  5.0   NaN   NaN   NaN   NaN  NaN  NaN  NaN  NaN   NaN   NaN
2  9.0  10.0  11.0  12.0   NaN  NaN  NaN  NaN  NaN   NaN   NaN
3  8.0   9.0  10.0  11.0  12.0  NaN  NaN  NaN  NaN   NaN   NaN
4  1.0   2.0   3.0   4.0   5.0  6.0  7.0  8.0  9.0  10.0  11.0
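
Note that a list of Series and a dictionary of Series give different layouts. The sketch below, which reuses two of the Series from the cell above under hypothetical column names, puts the two side by side: in a list each Series becomes a row, and in a dictionary each Series becomes a column.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 5))   # 4 elements
s2 = pd.Series(np.arange(5, 6))   # 1 element

as_rows = pd.DataFrame([s1, s2])                 # each Series is a row
as_columns = pd.DataFrame({'s1': s1, 's2': s2})  # each Series is a column

print(as_rows.shape)     # (2, 4)
print(as_columns.shape)  # (4, 2)
```

In both layouts the three positions that `s2` does not cover are filled with NaN.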


In [8]:
# Create a Data Frame with labelled rows (index) and columns using a dictionary.
df4 = pd.DataFrame({'col1':[1,2,3,4],'col2':[5,6,7,8], 'col3':[9,10,11,12]}, 
                   index=['row1','row2','row3','row4'])
print('Data Frame from Dictionary with index (row) and column labels')
print(df4)

Data Frame from Dictionary with index (row) and column labels
      col1  col2  col3
row1     1     5     9
row2     2     6    10
row3     3     7    11
row4     4     8    12
