# Creating Pandas DataFrame

There are several ways to create a Pandas DataFrame. Usually, I start a data analysis by reading a CSV or spreadsheet file into a DataFrame. However, we can also create a DataFrame directly from Python lists and dictionaries, which is handy for small datasets, quick experiments, or data generated in code.

In [1]:
import pandas as pd

## DataFrame from List

First, let's look at how to create a DataFrame from lists.

In [2]:
# ------------------
# With a zipped list
# ------------------

# Prepare the lists (one per column, all the same length)
episode = ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10', '#11', '#12']
source_size = [477, 489, 504, 443, 524, 507, 747, 746, 746, 746, 746, 747]
output_size = [95, 114, 111, 99, 120, 103, 103, 105, 87, 92, 106, 101]

# zip() pairs the items up by position, giving one tuple per row
zippedlist = list(zip(episode, source_size, output_size))

# Turn the zipped list into a Pandas DataFrame, naming the columns
df = pd.DataFrame(zippedlist, columns=['Episode', 'Source', 'Output'])

# Show DataFrame
df

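|    | Episode | Source | Output |
|----|---------|--------|--------|
| 0  | #1      | 477    | 95     |
| 1  | #2      | 489    | 114    |
| 2  | #3      | 504    | 111    |
| 3  | #4      | 443    | 99     |
| 4  | #5      | 524    | 120    |
| 5  | #6      | 507    | 103    |
| 6  | #7      | 747    | 103    |
| 7  | #8      | 746    | 105    |
| 8  | #9      | 746    | 87     |
| 9  | #10     | 746    | 92     |
| 10 | #11     | 746    | 106    |
| 11 | #12     | 747    | 101    |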
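One thing to watch with the zipped-list approach: `zip()` stops at the shortest input, so if one list is accidentally shorter, the extra items are dropped without any error. The sketch below uses hypothetical, shortened lists to show this. The `strict=True` argument of `zip()` only exists in Python 3.10 and later, so that part is guarded by a version check.

```python
import sys

import pandas as pd

# Hypothetical lists of unequal length: output_size is one item short
episode = ['#1', '#2', '#3']
source_size = [477, 489, 504]
output_size = [95, 114]

# zip() stops at the shortest input, so episode #3 is silently dropped
df = pd.DataFrame(list(zip(episode, source_size, output_size)),
                  columns=['Episode', 'Source', 'Output'])
print(df.shape)  # (2, 3)

# On Python 3.10+, zip(..., strict=True) raises ValueError instead of truncating
strict_error = None
if sys.version_info >= (3, 10):
    try:
        list(zip(episode, source_size, output_size, strict=True))
    except ValueError as err:
        strict_error = err
        print('Length mismatch:', err)
```

Building the DataFrame from a dictionary of lists (covered below) avoids the silent truncation, because pandas raises an error when the lists have different lengths.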


In [3]:
# -----------------------------
# With a multi-dimensional list
# -----------------------------

# Prepare the multi-dimensional (nested) list: one inner list per row
my_list = [['Leto', 8], ['Oscar', 4], ['Lilianne', 12], ['Beatrice', 11]]

# Create a DataFrame from the nested list, naming the columns
df = pd.DataFrame(my_list, columns=['Pet', 'Age'])

# Show DataFrame
df

|   | Pet      | Age |
|---|----------|-----|
| 0 | Leto     | 8   |
| 1 | Oscar    | 4   |
| 2 | Lilianne | 12  |
| 3 | Beatrice | 11  |


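When constructing a `DataFrame`, pay attention to the `columns` and `index` arguments: `columns` labels the columns and `index` labels the rows, so the number of labels you pass must match the shape of the data. Also note that when constructing a `DataFrame` from a nested list, each inner list becomes a row and each item within it is assigned to a column.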

In [4]:
# ---------------------
# Interesting behaviors
# ---------------------

# Create a nested list of 2 lists, 3 items in each list
my_list = [['a', 'b', 'c'], ['A', 'B', 'C']]

# Each inner list becomes a row, so `columns` takes 3 labels (one per item)
df_columns = pd.DataFrame(my_list, columns=['Column_A', 'Column_B', 'Column_C'])
print(df_columns)

# `index` labels the 2 rows instead; the columns fall back to 0, 1, 2
df_index = pd.DataFrame(my_list, index=['Row_lowercase', 'Row_uppercase'])
print(df_index)
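To see how `columns` and `index` relate to the shape, here is a small sketch that repeats `my_list` from the cell above. It checks `df.shape`, shows the default integer labels when neither argument is given, and collects the `ValueError` pandas raises when the number of labels does not match the data. The names `df_default` and `mismatch_errors` are only illustrative.

```python
import pandas as pd

# Same nested list as above: 2 inner lists (rows) of 3 items (columns)
my_list = [['a', 'b', 'c'], ['A', 'B', 'C']]

df = pd.DataFrame(my_list, columns=['Column_A', 'Column_B', 'Column_C'])
print(df.shape)  # (2, 3): rows from the outer list, columns from the items

# With neither argument, both axes get default integer labels 0, 1, ...
df_default = pd.DataFrame(my_list)
print(list(df_default.index), list(df_default.columns))  # [0, 1] [0, 1, 2]

# A label count that does not match the data raises ValueError
mismatch_errors = {}
for arg, labels in [('columns', ['Column_A', 'Column_B']), ('index', ['only_row'])]:
    try:
        pd.DataFrame(my_list, **{arg: labels})
    except ValueError as err:
        mismatch_errors[arg] = err
        print(f'{arg}: {err}')
```

The exact wording of the error messages varies between pandas versions, so it is safer to rely on the exception type than on the text.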

## DataFrame from Dictionary

A `DataFrame` can also be created from a Python dictionary. With a dictionary of lists, each key becomes a column name and each list becomes that column's values.

In [5]:
# DataFrame from a dictionary of lists, an alternative to the zipped list

# Create the lists
students = ['Chizuru', 'Mai', 'Rio', 'Erina', 'Alice', 'Asuna']
degree = ['Literature', 'Law', 'Physics', 'Gastrology', 'Engineering', 'IT']
gpa = [3.4, 4.0, 3.7, 3.6, 3.5, 3.3]  # numbers, not strings, so the column is numeric

# Turn the lists into a dictionary: each key becomes a column name
dictionary = {'Name': students, 'Degree': degree, 'Pointer': gpa}

# Create DataFrame from dictionary
df = pd.DataFrame(dictionary)

# Show DataFrame
df

|   | Name    | Degree      | Pointer |
|---|---------|-------------|---------|
| 0 | Chizuru | Literature  | 3.4     |
| 1 | Mai     | Law         | 4.0     |
| 2 | Rio     | Physics     | 3.7     |
| 3 | Erina   | Gastrology  | 3.6     |
| 4 | Alice   | Engineering | 3.5     |
| 5 | Asuna   | IT          | 3.3     |
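A dictionary of lists is column-oriented. Two other dictionary-based layouts are also common: a list of dictionaries (one per row, often called records) and a dictionary keyed by row label, read with `DataFrame.from_dict(..., orient='index')`. This is a small sketch with shortened, illustrative data based on the example above; the second entry in `records` deliberately omits `Pointer` to show that a missing key becomes `NaN`.

```python
import pandas as pd

# Records layout: a list of dictionaries, one dictionary per row.
# Keys become column names; a key missing from a row becomes NaN.
records = [
    {'Name': 'Chizuru', 'Degree': 'Literature', 'Pointer': 3.4},
    {'Name': 'Mai', 'Degree': 'Law'},  # no 'Pointer' key for this row
]
df_records = pd.DataFrame(records)
print(df_records)

# Row-oriented dictionary: with orient='index', each key becomes a row label
# and `columns` names the items inside each list
by_name = {'Chizuru': ['Literature', 3.4], 'Mai': ['Law', 4.0]}
df_rows = pd.DataFrame.from_dict(by_name, orient='index', columns=['Degree', 'Pointer'])
print(df_rows)
```

The records layout is convenient when rows arrive one at a time (for example, parsed JSON objects), while `orient='index'` suits data that is already grouped by a natural row key.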
