# From Excel to Python

## Creating Pandas DataFrames from Lists and Dictionaries

Whenever I am doing analysis with pandas, my first goal is to get data into a pandas DataFrame using one of the many available options. For the vast majority of instances, I use **read_excel**, **read_csv**, or **read_sql**.

However, there are instances when I just have a few lines of data or some calculations that I want to include in my analysis. In these cases, it is helpful to know how to create DataFrames from standard Python lists or dictionaries. The basic process is not difficult, but because there are several different options, it is helpful to understand how each works. I can never remember whether I should use **from_dict**, **from_records**, **from_items** or the default **DataFrame** constructor. Normally, through some trial and error, I figure it out. Since it is still confusing to me, I thought I would walk through several examples below to clarify the different approaches. At the end of the article, I briefly show how this can be useful when generating Excel reports.

![image](http://pbpython.com/images/pandas-dataframe-shadow.png)

In [5]:
import os
os.environ["http_proxy"] = '10.48.209.10:8080'
os.environ["https_proxy"] = '10.48.209.10:8080'

!pip install --user XlsxWriter

# install() is a helper that is not shown here; this call fails with the NameError below
install('XlsxWriter')

Collecting XlsxWriter
  Downloading https://files.pythonhosted.org/packages/92/07/79ae72179714feaaaac5bd4155a35195e0986384396bd6f7cba3e7952072/XlsxWriter-1.1.0-py2.py3-none-any.whl (141kB)
Installing collected packages: XlsxWriter
Successfully installed XlsxWriter-1.1.0


NameError: name 'main' is not defined

In [34]:
import pandas as pd
from collections import OrderedDict
from datetime import date
# import xlsxwriter

#### Dictionaries

The “default” way to create a DataFrame from Python is to use a list of dictionaries. In this case, the keys of each dictionary are used for the column headings, and a default index is created automatically. As you can see, this approach is very “row oriented”.

In [9]:
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }]
df = pd.DataFrame(sales)
print(df)

   Feb  Jan  Mar    account
0  200  150  140  Jones LLC
1  210  200  215   Alpha Co
2   90   50   95   Blue Inc
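One detail of the row-oriented approach is worth a quick sketch: the dictionaries do not all need the same keys. The rows below are hypothetical (the second one deliberately leaves out `Mar`), and the variable names are mine rather than part of the walkthrough above.

```python
import pandas as pd

# Hypothetical rows: the second record has no 'Mar' entry.
sales_partial = [
    {'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
    {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210},
]
df_partial = pd.DataFrame(sales_partial)

# The columns are the union of all keys; the missing value becomes NaN.
print(df_partial)
print(df_partial['Mar'].isna().tolist())
```

This is handy when a few hand-entered rows are incomplete, but keep in mind that a column containing NaN is stored as floating point rather than integer.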


If you would like to create a DataFrame in a “column oriented” manner, you would use from_dict. Using this approach, you get the same results as above. The key point to consider is which method is easier to understand in your unique situation. 

In [10]:
sales = {'account': ['Jones LLC', 'Alpha Co', 'Blue Inc'],
         'Jan': [150, 200, 50],
         'Feb': [200, 210, 90],
         'Mar': [140, 215, 95]}
df1 = pd.DataFrame.from_dict(sales)
print(df1)

   Feb  Jan  Mar    account
0  200  150  140  Jones LLC
1  210  200  215   Alpha Co
2   90   50   95   Blue Inc
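from_dict also accepts an `orient` argument, which is one place where it differs from the plain constructor. The sketch below assumes pandas 0.23 or later (where `columns` can be combined with `orient='index'`) and uses a hypothetical layout with one dictionary entry per account.

```python
import pandas as pd

# Hypothetical layout: one dictionary entry per account (row), not per column.
sales_by_account = {
    'Jones LLC': [150, 200, 140],
    'Alpha Co': [200, 210, 215],
    'Blue Inc': [50, 90, 95],
}

# orient='index' turns the keys into the index; columns names the list positions.
df_by_account = pd.DataFrame.from_dict(
    sales_by_account, orient='index', columns=['Jan', 'Feb', 'Mar']
)
print(df_by_account)
```

Here the account names end up in the index rather than in an `account` column, so this is a different shape from the examples above, not a drop-in replacement.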


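Most of you will notice that the order of the columns looks wrong. The issue is that, before Python 3.7, the standard Python dictionary was not guaranteed to preserve the order of its keys. On Python 3.7 and later with a recent version of pandas, the columns follow the insertion order of the keys. If you want to control column order explicitly, there are two options.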

In [13]:
# manually
df = df[['account', 'Jan', 'Feb', 'Mar']]

# using OrderedDict
sales = OrderedDict([('account', ['Jones LLC', 'Alpha Co', 'Blue Inc']),
                     ('Jan', [150, 200, 50]),
                     ('Feb', [200, 210, 90]),
                     ('Mar', [140, 215, 95])])
df3 = pd.DataFrame.from_dict(sales)
print(df3)

     account  Jan  Feb  Mar
0  Jones LLC  150  200  140
1   Alpha Co  200  210  215
2   Blue Inc   50   90   95
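A further possibility, sketched here as an aside, is to pass the desired order through the `columns` argument of the DataFrame constructor. The variable names are my own.

```python
import pandas as pd

sales_cols = {'account': ['Jones LLC', 'Alpha Co', 'Blue Inc'],
              'Jan': [150, 200, 50],
              'Feb': [200, 210, 90],
              'Mar': [140, 215, 95]}

# For dictionary input, columns= selects the keys and fixes their order.
df_ordered = pd.DataFrame(sales_cols, columns=['account', 'Jan', 'Feb', 'Mar'])
print(df_ordered)
```

This keeps the ordering next to the data definition, which I find easier to read than reindexing afterwards.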


#### Lists

The other option for creating your DataFrames from Python is to include the data in a list structure.

A row-oriented approach uses pandas from_records. This approach is similar to the dictionary approach, but you need to explicitly call out the column labels.

In [15]:
sales = [('Jones LLC', 150, 200, 50),
         ('Alpha Co', 200, 210, 90),
         ('Blue Inc', 140, 215, 95)]
labels = ['account', 'Jan', 'Feb', 'Mar']
df4 = pd.DataFrame.from_records(sales, columns=labels)
print(df4)

     account  Jan  Feb  Mar
0  Jones LLC  150  200   50
1   Alpha Co  200  210   90
2   Blue Inc  140  215   95
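If the lists are organized by column instead of by row, one way to get the same frame is to pair each label with its list and hand the result to the constructor. This is only a sketch using `dict(zip(...))`, with variable names of my own choosing; it is not one of the dedicated pandas constructors discussed in this article.

```python
import pandas as pd

column_labels = ['account', 'Jan', 'Feb', 'Mar']
# One list per column, holding the same values as the row tuples above.
column_values = [
    ['Jones LLC', 'Alpha Co', 'Blue Inc'],
    [150, 200, 140],
    [200, 210, 215],
    [50, 90, 95],
]

# Pair each label with its list; columns= pins the order explicitly.
df_cols = pd.DataFrame(dict(zip(column_labels, column_values)),
                       columns=column_labels)
print(df_cols)
```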


#### Example

In [22]:
sales = [('Jones LLC', 150, 200, 50),
         ('Alpha Co', 200, 210, 90),
         ('Blue Inc', 140, 215, 95)]
labels = ['account', 'Jan', 'Feb', 'Mar']
df6 = pd.DataFrame.from_records(sales, columns=labels)

# build a footer
from datetime import date

created_date = "{:%m-%d-%Y}".format(date.today())
created_by = "GP"
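footer = [('Created by', [created_by]),
          ('created_on', [created_date]),
          ('version', [1.1])]

# from_items was deprecated in pandas 0.23 and removed in 1.0;
# from_dict with an OrderedDict builds the same one-row frame
df_footer1 = pd.DataFrame.from_dict(OrderedDict(footer))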
df_footer2 = pd.DataFrame.from_records(footer)
print(df_footer1, '\n', df_footer2)

  Created by  created_on  version
0         GP  08-29-2018      1.1 
             0             1
0  Created by          [GP]
1  created_on  [08-29-2018]
2     version         [1.1]




In [24]:
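# Does not work here because xlsxwriter could not be installed

writer = pd.ExcelWriter("simple-report.xlsx", engine='xlsxwriter')
df6.to_excel(writer, index=False)
# df_footer1 is the one-row footer frame built above
df_footer1.to_excel(writer, startrow=6, index=False)
# close() saves the file; writer.save() was removed in pandas 2.0
writer.close()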

ImportError: No module named 'xlsxwriter'

The secret sauce here is to use startrow to write the footer DataFrame below the sales DataFrame. There is also a corresponding startcol so you can control the column layout as well. This allows for a lot of flexibility with the basic to_excel function.
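Rather than hard-coding the start row, it can be derived from the length of the first DataFrame. The sketch below is an illustration under a few assumptions: the helper `next_start_row`, the sample frames and the output file name are hypothetical, and no engine is specified, so pandas uses whichever Excel writer is installed. If none is available, the ImportError is caught and reported.

```python
import os
import tempfile

import pandas as pd

body = pd.DataFrame({'account': ['Jones LLC', 'Alpha Co'], 'Jan': [150, 200]})
note = pd.DataFrame({'Created by': ['GP'], 'version': [1.1]})


def next_start_row(df, gap=2):
    """Row just below df (written at row 0 with a header), plus `gap` blank rows."""
    return len(df) + 1 + gap


footer_row = next_start_row(body)  # 2 data rows + 1 header row + 2 blank rows

out_path = os.path.join(tempfile.gettempdir(), 'startrow-sketch.xlsx')
try:
    with pd.ExcelWriter(out_path) as writer:
        body.to_excel(writer, index=False)
        # startrow moves the second frame below the first one
        note.to_excel(writer, startrow=footer_row, index=False)
        # startcol moves a frame to the right, leaving one empty column
        note.to_excel(writer, startcol=len(body.columns) + 1, index=False)
except ImportError as exc:
    print('No Excel engine available:', exc)
```

Computing the offset this way means the footer stays clear of the data even when the number of rows changes.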