In [1]:
import pandas as pd

###### Creating a DataFrame from a Dictionary

In [2]:
pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']})

Unnamed: 0,color,id
0,red,100
1,blue,101
2,red,102


We see that the keys of the dict become the column names and the values of the dict become the column values.

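The columns did not show up in the order that I wrote them. This is because older versions (before Python 3.6 or pandas 0.23) did not keep the order of the dict keys, so pandas sorted the column names alphabetically. Newer versions keep the order in which the keys were written.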
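Whether the columns keep the order of the keys depends on the versions you have installed, so it is worth checking rather than assuming. The cell below is a minimal sketch of that check; the names `data` and `frame` are mine and are only used for illustration.

```python
import pandas as pd

data = {'id': [100, 101, 102], 'color': ['red', 'blue', 'red']}
frame = pd.DataFrame(data)

# Compare the key order of the dict with the column order pandas produced.
# On recent Python and pandas versions the two lists should match.
print(list(data))
print(list(frame.columns))
```

If the two lists differ on your setup, the columns argument shown in the next section gives you the order you want regardless of version.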

###### To specify the columns in a particular order

In [3]:
pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']}, columns = ['id', 'color'])

Unnamed: 0,id,color
0,100,red
1,101,blue
2,102,red


To specify the columns in a particular order, use the columns argument and list the names in the order that you want them to appear.
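The columns argument also selects which keys are used, not just their order. Here is a small sketch of that behaviour. The column name `size` is hypothetical and deliberately not a key of the dict, and `subset` and `extra` are illustrative names.

```python
import pandas as pd

data = {'id': [100, 101, 102], 'color': ['red', 'blue', 'red']}

# Only 'color' is requested, so 'id' is left out of the result.
subset = pd.DataFrame(data, columns=['color'])

# 'size' is a made-up name that is not a key in the dict,
# so pandas creates the column and fills it with missing values.
extra = pd.DataFrame(data, columns=['id', 'color', 'size'])

print(subset)
print(extra)
```

So a misspelled column name does not raise an error here; it quietly gives you a column of missing values, which is worth watching out for.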

###### The Index

If you do not specify an index, the df constructor will use the default integer index (0, 1, 2, ...). However, you can specify an index if you have one.

In [4]:
pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']}, columns = ['id', 'color'], index = ['a', 'b', 'c'])

Unnamed: 0,id,color
a,100,red
b,101,blue
c,102,red


Here we are passing the index argument a list of strings, but you can also pass it integers.
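To illustrate the integer case, here is a sketch with an integer index. The labels 10, 20 and 30 are arbitrary values that I picked, and `frame` is just an illustrative name.

```python
import pandas as pd

frame = pd.DataFrame(
    {'id': [100, 101, 102], 'color': ['red', 'blue', 'red']},
    columns=['id', 'color'],
    index=[10, 20, 30],  # arbitrary integer labels chosen for illustration
)

# .loc looks a row up by its label, .iloc by its position.
print(frame.loc[20, 'color'])
print(frame.iloc[0, 0])
```

With an integer index the label and the position of a row are no longer the same thing, which is why it helps to be explicit about `.loc` versus `.iloc`.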

This is the most common way to create a df

In [5]:
df = pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']}, columns = ['id', 'color'], index = ['a', 'b', 'c'])

###### Create a df from a list of lists

In [7]:
pd.DataFrame([[100, 'red'], [101, 'blue'], [102, 'red']])

Unnamed: 0,0,1
0,100,red
1,101,blue
2,102,red


Each inner list is treated as a row, and the rows are stacked on top of each other in the order given.

In [8]:
pd.DataFrame([[100, 'red'], [101, 'blue'], [102, 'red']], columns=['id', 'color'])

Unnamed: 0,id,color
0,100,red
1,101,blue
2,102,red


Here we specify the columns so that we can move away from the default column names (0 and 1) that pandas generates.
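Because each inner list is a row, data that you hold as separate columns has to be turned into rows first. One way to do that, sketched below, is the built-in `zip`. The names `ids`, `colors`, `rows` and `frame` are illustrative.

```python
import pandas as pd

ids = [100, 101, 102]
colors = ['red', 'blue', 'red']

# zip pairs the i-th id with the i-th color, giving one tuple per row.
rows = list(zip(ids, colors))

frame = pd.DataFrame(rows, columns=['id', 'color'])
print(frame)
```

This gives the same three rows as the list of lists above. If the data already sits in separate lists, passing a dict as in the first section is usually the shorter route.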

###### Converting a numpy array to a DataFrame

In [9]:
import numpy as np

In [10]:
arr = np.random.rand(4,2)

This command creates a 4 by 2 (4 rows by 2 columns) numpy array of random numbers drawn from the uniform distribution on the interval from zero (inclusive) to one (exclusive).
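The numbers will be different every time the cell runs. If you want the same array on every run, one option is numpy's generator interface with a fixed seed. This is a sketch, not what the notebook itself uses: the seed value 0 and the name `arr_seeded` are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # the seed value 0 is arbitrary
arr_seeded = rng.random((4, 2))  # uniform samples from [0, 1), shape given as a tuple

print(arr_seeded.shape)
print(arr_seeded.min() >= 0, arr_seeded.max() < 1)
```

Note that `rng.random` takes the shape as a single tuple, whereas `np.random.rand` takes the dimensions as separate arguments.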

In [11]:
arr # Here is our numpy array

array([[0.79455832, 0.78715819],
       [0.40731938, 0.07616222],
       [0.40748625, 0.26655695],
       [0.41876625, 0.34100022]])

In [12]:
pd.DataFrame(arr) # So to create the df we simply pass the arr to the df constructor

Unnamed: 0,0,1
0,0.794558,0.787158
1,0.407319,0.076162
2,0.407486,0.266557
3,0.418766,0.341


In [13]:
pd.DataFrame(arr, columns = ['one', 'two']) # To add column names

Unnamed: 0,one,two
0,0.794558,0.787158
1,0.407319,0.076162
2,0.407486,0.266557
3,0.418766,0.341


###### Creating a larger dataframe

We are going to create a dataframe of 10 rows and two columns

In [14]:
pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60, 101, 10)})

Unnamed: 0,student,test
0,100,81
1,101,63
2,102,98
3,103,86
4,104,67
5,105,92
6,106,93
7,107,88
8,108,92
9,109,88
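Two details of that cell are easy to miss: `np.arange` stops before its end value, and `np.random.randint` excludes its upper bound, which is why 101 is passed to allow a score of 100. The sketch below checks both. The names `scores` and `frame` are illustrative.

```python
import numpy as np
import pandas as pd

# high=101 is exclusive, so the possible scores are 60 through 100.
scores = np.random.randint(60, 101, 10)

# arange(100, 110, 1) yields 100 through 109, i.e. ten student ids.
frame = pd.DataFrame({'student': np.arange(100, 110, 1), 'test': scores})

print(len(frame))
print(frame['test'].min() >= 60, frame['test'].max() <= 100)
```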


###### Bonus Tip - How to create a new series and attach it to an existing df

In [15]:
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')
s

c     round
b    square
Name: shape, dtype: object

In [16]:
df # This is the df that we are going to attach our new series to

Unnamed: 0,id,color
a,100,red
b,101,blue
c,102,red


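To combine them, we are going to concatenate them. The rows are matched on their index labels, not on their position, so a row of the df with no matching label in the series gets a missing value.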

In [18]:
pd.concat([df, s], axis=1) # We use axis=1 to concatenate by columns

Unnamed: 0,id,color,shape
a,100,red,
b,101,blue,square
c,102,red,round
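The series was written in the order c, b, yet each value ended up on the correct row, and row a was left empty. The sketch below rebuilds the same df and series so that it runs on its own, then shows `df.join` as one alternative that I believe gives the same alignment. The names `combined` and `joined` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'id': [100, 101, 102], 'color': ['red', 'blue', 'red']},
    columns=['id', 'color'],
    index=['a', 'b', 'c'],
)
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')

# Both calls align on the index labels, not on the order of the values.
combined = pd.concat([df, s], axis=1)
joined = df.join(s)  # the series name becomes the new column name

print(combined)
print(joined)
```

Neither call changes `df` itself, so assign the result to a name if you want to keep it.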
