# Chapter 6: Data Loading, Storage, and File Formats

Most data needs to be read from disk or from the web before we can operate on it.

Likewise, data we want to keep for future use has to be written back to disk.

pandas provides several helper functions that automatically parse common
file formats.

I/O typically falls into a few main categories:
- text files
- binary data formats
- databases
- network sources (such as web APIs)

In [1]:
import pandas as pd
import numpy as np

## 6.1: Reading and Writing Data in Text Format

Parsing functions in `pandas`:
- read_csv: Load delimited data from a file, URL, or file-like object; use comma
    as the default delimiter
- read_table: Load delimited data from a file, URL, or file-like object; use tab
    ('\t') as the default delimiter
- read_fwf: Read data in fixed-width column format (no delimiters)
- read_clipboard: Version of read_table that reads data from the clipboard
- read_excel: Read tabular data from an XLS or XLSX file
- read_hdf: Read HDF5 files written by pandas
- read_html: Read all tables found in the given HTML document
- read_json: Read data from a JSON string representation, file, or URL
- read_msgpack: Read pandas data encoded using the MessagePack binary format
    (deprecated in pandas 0.25 and removed in 1.0)
- read_pickle: Read an arbitrary object stored in Python's pickle format
- read_sas: Read a SAS dataset stored in one of the SAS system's custom formats
- read_sql: Read the results of a SQL query (using SQLAlchemy) as a DataFrame
- read_stata: Read a dataset from Stata file format
- read_feather: Read the Feather binary file format

The optional arguments for these functions fall into a few categories:

*Indexing*

Can treat one or more columns as the index of the returned DataFrame, and
choose whether to get column names from the file, from the user, or not at all.

*Type inference and data conversion*

This includes user-defined value conversions and custom lists of missing value
markers.

*Datetime parsing*

Includes the ability to combine date and time information spread over multiple
columns into a single column in the result.

*Iterating*

Support for iterating over chunks of very large files.

*Unclean data issues*

Skipping rows, a footer, or comments, and handling other minor issues such as
numeric data with thousands separated by commas.
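As a rough illustration of how these categories map onto actual arguments, here is a minimal sketch that reads a small pipe-delimited table from an in-memory string. The column names and values are hypothetical, chosen only to exercise one argument from each category:

```python
from io import StringIO

import pandas as pd

# Hypothetical pipe-delimited data; "1,250" uses ',' as a thousands separator
raw = StringIO(
    'id|date|amount|status\n'
    '1|2023-01-05|1,250|ok\n'
    '2|2023-01-06|300|missing\n'
    '3|2023-01-07|12,000|ok\n'
)

df = pd.read_csv(
    raw,
    sep='|',
    index_col='id',          # indexing: use the id column as the row index
    na_values=['missing'],   # type conversion: treat 'missing' as NaN
    parse_dates=['date'],    # datetime parsing: convert to datetime64
    thousands=',',           # unclean data: strip the thousands separators
)
print(df)
```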

In [2]:
!cat examples/ex1.csv

a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [3]:
pd.read_csv('examples/ex1.csv')

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo


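You can also parse headerless CSVs, or CSVs delimited by something other than a
comma. For a file without a header row, either let pandas assign default column
names with header=None or specify the names yourself: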

In [4]:
!cat examples/ex2.csv

1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [5]:
pd.read_csv('examples/ex2.csv', sep=',', header=None)

   0   1   2   3      4
0  1   2   3   4  hello
1  5   6   7   8  world
2  9  10  11  12    foo


In [6]:
pd.read_csv('examples/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
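The same options work on any text source, not just files. A minimal sketch, assuming a hypothetical semicolon-delimited, headerless string in place of ex2.csv:

```python
from io import StringIO

import pandas as pd

# Hypothetical headerless, semicolon-delimited data
text = '1;2;3;4;hello\n5;6;7;8;world\n9;10;11;12;foo\n'

# header=None tells pandas the first line is data; columns are numbered 0..4
numbered = pd.read_csv(StringIO(text), sep=';', header=None)

# Passing names also implies there is no header row in the file
named = pd.read_csv(StringIO(text), sep=';',
                    names=['a', 'b', 'c', 'd', 'message'])
print(named)
```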


To set one of the columns as the index of the DataFrame, use index_col:

In [7]:
pd.read_csv('examples/ex2.csv', names=['a', 'b', 'c', 'd', 'message'], 
             index_col='message')

         a   b   c   d
message
hello    1   2   3   4
world    5   6   7   8
foo      9  10  11  12


To form a hierarchical index from multiple columns, pass a list of column
numbers or names to index_col. Consider this file:

In [8]:
pd.read_csv('examples/csv_mindex.csv')

  key1 key2  value1  value2
0  one    a       1       2
1  one    b       3       4
2  one    c       5       6
3  one    d       7       8
4  two    a       9      10
5  two    b      11      12
6  two    c      13      14
7  two    d      15      16


In [9]:
pd.read_csv('examples/csv_mindex.csv', index_col=['key1', 'key2'])

           value1  value2
key1 key2
one  a          1       2
     b          3       4
     c          5       6
     d          7       8
two  a          9      10
     b         11      12
     c         13      14
     d         15      16


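The delimiter can also be a regular expression. This is useful when a file has
no fixed delimiter and uses whitespace or some other pattern to separate fields,
as in this file, where fields are separated by a variable amount of whitespace: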

In [10]:
!cat examples/ex3.csv

            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491

In [11]:
pd.read_table('examples/ex3.csv', sep=r'\s+')  # raw string avoids an invalid-escape warning

            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491


Because there was one fewer column name than the number of data columns,
read_table infers that the first column should be the DataFrame's index.

The parser functions also have many arguments for handling exceptional file
layouts. For example, skiprows skips specific lines in a file (here, the comment
lines numbered 0, 2, and 3):

In [12]:
!cat examples/ex4.csv

# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo

In [13]:
pd.read_csv('examples/ex4.csv', skiprows=[0,2,3])

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
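Instead of listing line numbers in skiprows, the comment argument can drop comment lines wherever they appear. A sketch that rebuilds the ex4.csv contents as an in-memory string (an assumption so the example runs without the file):

```python
from io import StringIO

import pandas as pd

# Same layout as ex4.csv: comment lines interleaved with the header and data
text = (
    '# hey!\n'
    'a,b,c,d,message\n'
    '# just wanted to make things more difficult for you\n'
    '# who reads CSV files with computers, anyway?\n'
    '1,2,3,4,hello\n'
    '5,6,7,8,world\n'
    '9,10,11,12,foo\n'
)

# Lines that start with '#' are ignored entirely, so the header is found
# on the first non-comment line
df = pd.read_csv(StringIO(text), comment='#')
print(df)
```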


These parsers can also handle missing data, whether it is marked by an empty
string or by a *sentinel* value such as NA or NULL. By default, pandas
recognizes a set of common sentinels, including NA, NULL, and NaN:

In [14]:
!cat examples/ex5.csv

something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo

In [15]:
pd.read_csv('examples/ex5.csv')

  something  a   b     c   d message
0       one  1   2   3.0   4     NaN
1       two  5   6   NaN   8   world
2     three  9  10  11.0  12     foo


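The na_values option takes a list of additional strings to treat as missing
(ex5.csv contains no NULL values, so the result here is unchanged):

In [16]:
pd.read_csv('examples/ex5.csv', na_values=['NULL'])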

  something  a   b     c   d message
0       one  1   2   3.0   4     NaN
1       two  5   6   NaN   8   world
2     three  9  10  11.0  12     foo


Different NA sentinels can be specified for each column in a dict:

In [17]:
sentinels = {
    'message': ['foo', 'NA'],
    'something': ['two']
}

pd.read_csv('examples/ex5.csv', na_values=sentinels)

  something  a   b     c   d message
0       one  1   2   3.0   4     NaN
1       NaN  5   6   NaN   8   world
2     three  9  10  11.0  12     NaN
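If only your own sentinels should count as missing, keep_default_na=False turns off pandas' built-in list. A minimal sketch using the ex5.csv contents inlined as a string:

```python
from io import StringIO

import pandas as pd

text = ('something,a,b,c,d,message\n'
        'one,1,2,3,4,NA\n'
        'two,5,6,,8,world\n'
        'three,9,10,11,12,foo\n')

# By default the string 'NA' is one of pandas' built-in missing-value markers
default = pd.read_csv(StringIO(text))

# keep_default_na=False disables the built-in list, so only the values
# passed in na_values are treated as missing
custom = pd.read_csv(StringIO(text), keep_default_na=False, na_values=['foo'])
print(custom['message'])
```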


Some read_csv / read_table function arguments:
- path: String indicating filesystem location, URL, or file-like object
- sep, delimiter: Character sequence or regular expression used to split fields
    in each row
- header: Row number to use as column names; defaults to 0 (first row), use
    None if there is no header row
- index_col: Column numbers or names to use as the row index in the result
- names: List of column names for the result
- skiprows: Number of rows at the beginning of the file to ignore, or list of
    row numbers (starting from 0) to skip
- na_values: Sequence of additional values to replace with NA
- comment: Character used to split comments off the end of lines
- parse_dates: Attempt to parse data to datetime objects; if True, attempts to
    parse all columns. If a list of column names/numbers, only parse those
- keep_date_col: If joining columns to parse date, keep the joined columns
- converters: Dict mapping column numbers or names to functions; the function
    is applied to every value in that column
- dayfirst: When parsing potentially ambiguous dates, treat them as
    international format (e.g., 7/6/2012 -> June 7, 2012); defaults to False
- date_parser: Function to use to parse dates (deprecated in pandas 2.0 in
    favor of date_format)
- nrows: Number of rows to read from the beginning of the file
- iterator: Return a TextFileReader object for reading the file piecemeal
- chunksize: For iteration, size of file chunks
- skipfooter: Number of lines to ignore at the end of the file
- verbose: Print various parser output information
- encoding: Text encoding for Unicode (e.g., 'utf-8')
- squeeze: If the parsed data contains only one column, return a Series
    (deprecated in pandas 1.4 and removed in 2.0)
- thousands: Separator for thousands (e.g., ',' or '.')
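Several of these arguments can be combined in one call. The following sketch uses a made-up export (hypothetical column names and values) with day-first dates, a percentage column cleaned by a converter, and a trailing summary line dropped with skipfooter:

```python
from io import StringIO

import pandas as pd

# Hypothetical export: day-first dates, percentages, and a footer line
text = (
    'date,region,share\n'
    '07/06/2012,north,12%\n'
    '08/06/2012,south,30%\n'
    'Total rows: 2\n'
)

df = pd.read_csv(
    StringIO(text),
    parse_dates=['date'],
    dayfirst=True,                 # 07/06/2012 -> 7 June 2012
    converters={'share': lambda s: float(s.rstrip('%')) / 100},
    skipfooter=1,                  # drop the trailing summary line
    engine='python',               # skipfooter requires the Python engine
)
print(df)
```

The Python engine is requested explicitly because the default C engine does not support skipfooter.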

### Reading Text Files in Pieces

When processing very large files, you may want to read only a small piece of a
file or iterate through smaller chunks of it.

To read only a small number of rows (avoiding reading the entire file), specify
that with nrows:

In [18]:
pd.read_csv('examples/ex6.csv', nrows=5)

        one       two     three      four key
0  0.467976 -0.038649 -0.295344 -1.824726   L
1 -0.358893  1.404453  0.704965 -0.200638   B
2 -0.501840  0.659254 -0.421691 -0.057688   G
3  0.204886  1.074134  1.388361 -0.982404   R
4  0.354628 -0.133116  0.283763 -0.837063   Q


To read a file in pieces, specify a chunksize as a number of rows. read_csv then
returns a TextFileReader object that iterates over the file in chunks of that size:

In [19]:
chunker = pd.read_csv('examples/ex6.csv', chunksize=1000)
chunker

<pandas.io.parsers.readers.TextFileReader at 0x7f2890fa93a0>

The TextFileReader returned by read_csv lets us iterate over the parts of the
file according to the chunksize. For example, we can aggregate the value counts
in the 'key' column across all chunks:

In [20]:
tot = pd.Series([], dtype='float64')  # explicit dtype avoids an empty-Series warning
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)
tot.head()


E    368.0
X    364.0
L    346.0
O    343.0
Q    340.0
dtype: float64
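The chunked aggregation can be checked against a single full read. A self-contained sketch, assuming randomly generated keys in place of ex6.csv:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 random keys standing in for ex6.csv's 'key' column
rng = np.random.default_rng(0)
keys = rng.choice(list('ABCDE'), size=10_000)
text = 'key\n' + '\n'.join(keys) + '\n'

# Accumulate counts chunk by chunk, 1,000 rows at a time
tot = pd.Series(dtype='float64')
for piece in pd.read_csv(StringIO(text), chunksize=1000):
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

# For comparison, count over the whole file in one pass
full = pd.read_csv(StringIO(text))['key'].value_counts()
print(tot.sort_index())
```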

### Writing Data to Text Format

Data can also be exported to a delimited format. Using DataFrame's to_csv method,
we can write the data out to a comma-separated file:

In [21]:
data = pd.read_csv('examples/ex5.csv')
data

  something  a   b     c   d message
0       one  1   2   3.0   4     NaN
1       two  5   6   NaN   8   world
2     three  9  10  11.0  12     foo


In [22]:
data.to_csv('examples/out.csv')
!cat examples/out.csv

,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


Other delimiters can be used, missing values can be written with a sentinel of
your choice, and the output can even go to sys.stdout instead of a file:

In [23]:
import sys

In [24]:
data.to_csv(sys.stdout, sep='|', na_rep='NULL')

|something|a|b|c|d|message
0|one|1|2|3.0|4|NULL
1|two|5|6|NULL|8|world
2|three|9|10|11.0|12|foo


You can also choose not to write the row and column labels:

In [25]:
data.to_csv(sys.stdout, index=False, header=False)

one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo


You can specify which columns to save

In [26]:
data.to_csv(sys.stdout, index=False, columns=['a', 'b', 'c'])

a,b,c
1,2,3.0
5,6,
9,10,11.0


Series objects also have a to_csv method that works similarly.
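For example, a minimal sketch (with a hypothetical Series) that writes a Series to an in-memory buffer and reads it back:

```python
from io import StringIO

import pandas as pd

s = pd.Series([10, 20, 30], index=['x', 'y', 'z'], name='value')

buf = StringIO()
s.to_csv(buf)            # writes the index plus a header row with the Series name
text = buf.getvalue()
print(text)

# Reading it back: use the first column as the index, then select the column
restored = pd.read_csv(StringIO(text), index_col=0)['value']
```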

### Working with Delimited Formats

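For most forms of tabular data the pandas reading functions are enough, but
sometimes some manual processing may be necessary.

It's not uncommon to receive a file with one or more malformed lines that trip
up read_table or read_csv. To illustrate the basic tools, consider a small CSV
file. For any file with a single-character delimiter, you can use Python's
built-in csv module: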

In [27]:
!cat examples/ex7.csv

"a","b","c"
"1","2","3"
"1","2","3"


In [28]:
import csv
f = open('examples/ex7.csv')

# csv.reader yields each line as a list of strings, with the quotes removed
reader = csv.reader(f)

In [29]:
for line in reader:
    print(line)
f.close()

['a', 'b', 'c']
['1', '2', '3']
['1', '2', '3']


In [30]:
with open('examples/ex7.csv') as f:
    lines = list(csv.reader(f))

In [31]:
header, values = lines[0], lines[1:]

In [32]:
# zip(*values) transposes the data rows into columns
data_dict = {h: v for h, v in zip(header, zip(*values))}
data_dict

{'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}

CSV files come in many different flavors. To define a new format with a
different delimiter, string quoting convention, or line terminator, we can
define a simple subclass of csv.Dialect:

In [33]:
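class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL

# The file is closed when the with-block exits, so consume the reader inside it
# (ex7.csv is actually comma-delimited; this dialect only illustrates the syntax)
with open('examples/ex7.csv') as f:
    reader = csv.reader(f, dialect=my_dialect)
    rows = list(reader)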

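CSV Dialect options (defaults are those of the standard 'excel' dialect):
- delimiter: One-character string used to separate fields; defaults to ','
- lineterminator: Line terminator used when writing; defaults to '\r\n' (the
    reader ignores this and recognizes cross-platform line terminators)
- quotechar: Quote character for fields with special characters (like a
    delimiter); defaults to '"'
- quoting: Quoting convention; one of csv.QUOTE_ALL (quote all fields),
    csv.QUOTE_MINIMAL (only fields with special characters like the delimiter),
    csv.QUOTE_NONNUMERIC, or csv.QUOTE_NONE (no quoting); defaults to
    QUOTE_MINIMAL
- skipinitialspace: Ignore whitespace after each delimiter; defaults to False
- doublequote: How to handle the quoting character inside a field; if True, it
    is doubled
- escapechar: String used to escape the delimiter if quoting is set to
    csv.QUOTE_NONE; disabled by default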

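For files with more complicated or fixed multicharacter delimiters, the csv
module will not be enough. In those cases you have to do the line splitting and
other cleanup yourself, using string methods such as split or the `re` module's
re.split.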
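As a sketch of the manual approach, assuming hypothetical lines that use `::` with uneven spacing as the delimiter:

```python
import re

import pandas as pd

# Hypothetical lines using '::' as a delimiter with inconsistent spacing;
# csv.reader cannot handle this because its delimiter must be one character
lines = ['a :: b::c', '1 ::2 :: 3', '4::5 ::6']

# Split each line on '::' plus any surrounding whitespace
rows = [re.split(r'\s*::\s*', line.strip()) for line in lines]

# The first row holds the column names; the values remain strings
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```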

To write delimited files manually, you can use csv.writer which accepts an open
writable file object and the same dialect / format options as csv.reader
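A minimal sketch of csv.writer, writing to an in-memory buffer instead of a file (the rows are made up for illustration, and the format options are passed as keywords rather than through a Dialect subclass):

```python
import csv
from io import StringIO

buf = StringIO()  # stands in for a file opened with open(path, 'w', newline='')

writer = csv.writer(buf, delimiter=';', quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
writer.writerow(('one', 'two', 'three'))
writer.writerow(('1', '2', '3'))
writer.writerow(('4;5', '6', '7'))  # contains the delimiter, so it gets quoted

output = buf.getvalue()
print(output)
```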

### JSON Data

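JSON (JavaScript Object Notation) has become one of the standard formats for
sending data by HTTP request between web browsers and other applications. It is
a much more free-form data format than a tabular text format like CSV.

JSON is very nearly valid Python code, with the exception of its null value
`null` and some other nuances (such as disallowing trailing commas at the end of
lists).

There are several Python libraries for reading and writing JSON data, including
the built-in `json` module.

To convert a JSON string to a Python object, use json.loads; json.dumps converts
a Python object back to JSON.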

In [34]:
obj = """
{"name": "Wes",
 "places_lived": ["United States", "Spain", "Germany"],
 "pet": null,
 "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]},
              {"name": "Katie", "age": 38, "pets": ["Sixes", "Stache", "Cisco"]}
             ]
}
"""

In [35]:
import json

In [36]:
result = json.loads(obj)
result

{'name': 'Wes',
 'places_lived': ['United States', 'Spain', 'Germany'],
 'pet': None,
 'siblings': [{'name': 'Scott', 'age': 30, 'pets': ['Zeus', 'Zuko']},
  {'name': 'Katie', 'age': 38, 'pets': ['Sixes', 'Stache', 'Cisco']}]}

In [37]:
asjson = json.dumps(result)
asjson

'{"name": "Wes", "places_lived": ["United States", "Spain", "Germany"], "pet": null, "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]}, {"name": "Katie", "age": 38, "pets": ["Sixes", "Stache", "Cisco"]}]}'

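How you convert a JSON object or list of objects to a DataFrame is up to you.
Conveniently, you can pass a list of dicts (which were previously JSON objects)
to the DataFrame constructor and select a subset of the data fields: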

In [38]:
siblings = pd.DataFrame(result['siblings'], columns=['name', 'age'])
siblings

    name  age
0  Scott   30
1  Katie   38


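pandas also includes a read_json function that can convert JSON datasets in
specific arrangements directly into a Series or DataFrame, such as this array
of objects, where each object becomes a row: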

In [39]:
!cat examples/example.json

[{"a": 1, "b": 2, "c": 3},
 {"a": 4, "b": 5, "c": 6},
 {"a": 7, "b": 8, "c": 9}]


In [40]:
data = pd.read_json('examples/example.json')
data

   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9


To export a DataFrame or Series to JSON, use the to_json method

In [41]:
print(data.to_json())

{"a":{"0":1,"1":4,"2":7},"b":{"0":2,"1":5,"2":8},"c":{"0":3,"1":6,"2":9}}
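to_json also accepts an orient argument that controls the JSON layout. A minimal sketch that writes the records layout used by example.json and reads it back (the frame is rebuilt inline so the example is self-contained):

```python
from io import StringIO

import pandas as pd

data = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})

# orient='records' produces a list of row objects, the layout of example.json
records = data.to_json(orient='records')
print(records)

# Wrap the string in StringIO: newer pandas versions deprecate passing
# literal JSON text to read_json
roundtrip = pd.read_json(StringIO(records), orient='records')
```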


### XML and HTML Data

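Python has many libraries for reading and writing data in HTML and XML formats,
including lxml, Beautiful Soup, and html5lib.

lxml is generally the fastest, but the other libraries can better handle
malformed HTML or XML files.

pandas.read_html uses these libraries to parse every table in an HTML file into
a DataFrame; it returns a list of DataFrames, one per table found: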

In [42]:
tables = pd.read_html("examples/fdic_failed_bank_list.html")
failures = tables[0]
failures

                             Bank Name             City  ST   CERT                Acquiring Institution        Closing Date       Updated Date
0                          Allied Bank         Mulberry  AR     91                         Today's Bank  September 23, 2016  November 17, 2016
1         The Woodbury Banking Company         Woodbury  GA  11297                          United Bank     August 19, 2016  November 17, 2016
2               First CornerStone Bank  King of Prussia  PA  35312  First-Citizens Bank & Trust Company         May 6, 2016  September 6, 2016
3                   Trust Company Bank          Memphis  TN   9956           The Bank of Fayette County      April 29, 2016  September 6, 2016
4           North Milwaukee State Bank        Milwaukee  WI  20364  First-Citizens Bank & Trust Company      March 11, 2016      June 16, 2016
..                                 ...              ...  ..    ...                                  ...                 ...                ...
542                 Superior Bank, FSB         Hinsdale  IL  32646                Superior Federal, FSB       July 27, 2001    August 19, 2014
543                Malta National Bank            Malta  OH   6629                    North Valley Bank         May 3, 2001  November 18, 2002
544    First Alliance Bank & Trust Co.       Manchester  NH  34264  Southern New Hampshire Bank & Trust    February 2, 2001  February 18, 2003
545  National State Bank of Metropolis       Metropolis  IL   3815              Banterra Bank of Marion   December 14, 2000     March 17, 2005


In [43]:
len(tables)

1

In [44]:
close_timestamps = pd.to_datetime(failures["Closing Date"])

In [45]:
close_timestamps.dt.year.value_counts()

2010    157
2009    140
2011     92
2012     51
2008     25
2013     24
2014     18
2002     11
2015      8
2016      5
2004      4
2001      4
2007      3
2003      3
2000      2
Name: Closing Date, dtype: int64
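When the date strings follow a known pattern, passing an explicit format to pd.to_datetime avoids per-element format inference. A minimal sketch using a few strings in the style of the Closing Date column:

```python
import pandas as pd

# Strings in the same style as the 'Closing Date' column
dates = pd.Series(['September 23, 2016', 'May 6, 2016', 'December 14, 2000'])

# %B is the full month name, %d the day of the month, %Y the four-digit year
parsed = pd.to_datetime(dates, format='%B %d, %Y')
counts = parsed.dt.year.value_counts()
print(counts)
```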

### Parsing XML with lxml.objectify

`pandas.read_html` uses either lxml or Beautiful Soup under the hood to parse
HTML data.

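XML and HTML are similar, but XML is more general: it supports hierarchical,
nested data with metadata. Here we parse an XML file of Metro-North Railroad
performance data with lxml.objectify: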

In [46]:
from lxml import objectify

In [47]:
path = 'datasets/mta_perf/Performance_MNR.xml'
parsed = objectify.parse(path)  # accepts a filename, so no open file is left dangling
root = parsed.getroot()

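`root` is the root node of the XML file. `root.INDICATOR` refers to the first
INDICATOR child element, and iterating over it yields every INDICATOR element.
For each one, we build a dict mapping tag names (such as YTD_ACTUAL) to data
values, excluding a few tags: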

In [48]:
data = []
skip_fields = ['PARENT_SEQ', 'INDICATOR_SEQ', 
               'DESIRED_CHANGE', 'DECIMAL_PLACES']

for elt in root.INDICATOR:
    el_data = {}
    for child in elt.getchildren():
        if child.tag in skip_fields:
            continue
        el_data[child.tag] = child.pyval
    data.append(el_data)

perf = pd.DataFrame(data)
perf.head()

            AGENCY_NAME                        INDICATOR_NAME                                        DESCRIPTION  PERIOD_YEAR  PERIOD_MONTH            CATEGORY FREQUENCY INDICATOR_UNIT  YTD_TARGET  YTD_ACTUAL  MONTHLY_TARGET  MONTHLY_ACTUAL
0  Metro-North Railroad  On-Time Performance (West of Hudson)  Percent of commuter trains that arrive at thei...         2008             1  Service Indicators         M              %        95.0        96.9            95.0            96.9
1  Metro-North Railroad  On-Time Performance (West of Hudson)  Percent of commuter trains that arrive at thei...         2008             2  Service Indicators         M              %        95.0        96.0            95.0            95.0
2  Metro-North Railroad  On-Time Performance (West of Hudson)  Percent of commuter trains that arrive at thei...         2008             3  Service Indicators         M              %        95.0        96.3            95.0            96.9
3  Metro-North Railroad  On-Time Performance (West of Hudson)  Percent of commuter trains that arrive at thei...         2008             4  Service Indicators         M              %        95.0        96.8            95.0            98.3
4  Metro-North Railroad  On-Time Performance (West of Hudson)  Percent of commuter trains that arrive at thei...         2008             5  Service Indicators         M              %        95.0        96.6            95.0            95.8
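pandas 1.3 and later also provide pandas.read_xml, which can replace the manual loop above for flat records like these. A minimal sketch on a tiny hypothetical XML string that only mimics the file's structure (it is not the real MTA data):

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of the MTA file: repeated INDICATOR records
xml = """<PERFORMANCE>
  <INDICATOR>
    <INDICATOR_SEQ>1</INDICATOR_SEQ>
    <AGENCY_NAME>Metro-North Railroad</AGENCY_NAME>
    <PERIOD_YEAR>2008</PERIOD_YEAR>
    <YTD_ACTUAL>96.9</YTD_ACTUAL>
  </INDICATOR>
  <INDICATOR>
    <INDICATOR_SEQ>2</INDICATOR_SEQ>
    <AGENCY_NAME>Metro-North Railroad</AGENCY_NAME>
    <PERIOD_YEAR>2008</PERIOD_YEAR>
    <YTD_ACTUAL>96.0</YTD_ACTUAL>
  </INDICATOR>
</PERFORMANCE>"""

# parser='etree' uses the standard library, so lxml is not required here;
# each child of the root element becomes one row
perf_small = pd.read_xml(StringIO(xml), parser='etree')
perf_small = perf_small.drop(columns=['INDICATOR_SEQ'])  # like skip_fields above
print(perf_small)
```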


XML data can get much more complicated than this example, since each tag can
also carry metadata in the form of attributes. Consider an HTML link tag, which
is also valid XML:

In [49]:
from io import StringIO
tag = '<a href="http://www.google.com">Google</a>'
root = objectify.parse(StringIO(tag)).getroot()
root

<Element a at 0x7f2890c535c0>

In [50]:
root.get('href')

'http://www.google.com'

In [51]:
root.text

'Google'
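The standard library's xml.etree.ElementTree, a different parser from lxml, exposes attributes and text in much the same way. A minimal sketch on the same tag:

```python
import xml.etree.ElementTree as ET

# .get() reads an attribute; .text holds the element's text content
root = ET.fromstring('<a href="http://www.google.com">Google</a>')
href = root.get('href')
text = root.text
print(root.tag, href, text)
```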

## 6.2: Binary Data Formats

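One of the easiest ways to store data efficiently in binary format (also known
as serialization) is Python's built-in `pickle` module.

All `pandas` objects have a `to_pickle` method that writes the data to disk in
pickle format, and pd.read_pickle reads it back: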

In [52]:
frame = pd.read_csv('examples/ex1.csv')
frame

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo


In [53]:
frame.to_pickle('examples/frame_pickle')

In [54]:
pd.read_pickle('examples/frame_pickle')

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo


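Note: pickle is only recommended as a short-term storage format.

It is hard to guarantee that the format will be stable over time: an object
pickled today may not unpickle with a later version of a library. pandas tries
to maintain backward compatibility, but at some point it may be necessary to
"break" the pickle format.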
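A self-contained sketch of the pickle round trip, writing to a temporary directory; the frame is a hypothetical stand-in for ex1.csv, and the `.gz` suffix is an assumption used to show that pandas infers gzip compression from the file name:

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({'a': [1, 5, 9], 'message': ['hello', 'world', 'foo']})

with tempfile.TemporaryDirectory() as tmpdir:
    # compression='infer' (the default) picks gzip from the '.gz' suffix
    path = os.path.join(tmpdir, 'frame.pkl.gz')
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored)
```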

### Using HDF5 Format

HDF5 is a widely used format for storing large quantities of scientific array
data. 

It is available as a C library, and it has interfaces in many other languages,
including Java, Julia, MATLAB, and Python.

"HDF" stands for Hierarchical Data Format.

Additionally, HDF5 can be a good choice for working with very large datasets
that don't fit into memory, as you can efficiently read and write small sections
of much larger arrays.

In [55]:
frame = pd.DataFrame({'a': np.random.randn(100)})
frame

           a
0  -0.227105
1  -0.889501
2   0.930681
3  -0.626299
4  -0.910480
..       ...
95 -1.875969
96  0.942542
97  0.805593
98  0.064067


In [56]:
store = pd.HDFStore('examples/mydata.h5')
store['obj1'] = frame
store['obj1_col'] = frame['a']
store

<class 'pandas.io.pytables.HDFStore'>
File path: examples/mydata.h5

In [57]:
store['obj1']

           a
0  -0.227105
1  -0.889501
2   0.930681
3  -0.626299
4  -0.910480
..       ...
95 -1.875969
96  0.942542
97  0.805593
98  0.064067


HDFStore supports two storage schemas, "fixed" and "table" (the default is
"fixed").

The latter is generally slower, but it supports query operations using a special
syntax:

In [58]:
store.put('obj2', frame, format='table')
store.select('obj2', where=['index >= 10 and index <= 15'])

           a
10  1.091206
11  2.147374
12 -0.980263
13 -0.112213
14  1.190130
15  0.447255


In [59]:
store.close()

### Reading Microsoft Excel Files

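pandas also supports reading tabular data stored in Excel 2003 (and higher)
files using either the ExcelFile class or the pandas.read_excel function.
Internally, these rely on add-on packages such as openpyxl (for XLSX) and xlrd
(for the older XLS format), which must be installed separately.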

In [60]:
xlsx = pd.ExcelFile('examples/ex1.xlsx')
pd.read_excel(xlsx, 'Sheet1')

   Unnamed: 0  a   b   c   d message
0           0  1   2   3   4   hello
1           1  5   6   7   8   world
2           2  9  10  11  12     foo


In [61]:
frame = pd.read_excel('examples/ex1.xlsx', 'Sheet1')
frame

   Unnamed: 0  a   b   c   d message
0           0  1   2   3   4   hello
1           1  5   6   7   8   world
2           2  9  10  11  12     foo


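The Unnamed: 0 column holds the index that was written into the file; passing
index_col=0 to read_excel would use it as the row index instead.

To write pandas data to Excel format, you can create an ExcelWriter and then
write data to it with the to_excel method (you can also pass a file path
directly to to_excel):

In [62]:
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('examples/ex2.xlsx') as writer:
    frame.to_excel(writer, sheet_name='Sheet1')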

## 6.3: Interacting with Web APIs

Many websites have public APIs that provide data feeds via JSON or some other
format.

One easy-to-use way to access these APIs from Python is the third-party
`requests` package:

In [63]:
import requests

In [65]:
url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
resp = requests.get(url)
resp.raise_for_status()  # raise an HTTPError if the request failed
resp

<Response [200]>

In [66]:
data = resp.json()
data[0]['title']



Each element in data is a dictionary containing all of the data found on a 
GitHub issue page.

In [67]:
issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])
issues

   number                                              title                                             labels state
0   47055  BUG: Reindexing `pd.Float64Dtype()` series giv...  [{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...  open
1   47053  BUG: Regression: `Styler.to_html` and `to_late...  [{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...  open
2   47051  BUG: `DataFrame.shift` shows different behavio...                                                 []  open
3   47050                CLN/TST: Remove tm.makeUnicodeIndex  [{'id': 127685, 'node_id': 'MDU6TGFiZWwxMjc2OD...  open
4   47049              CI: Debug CI Windows recurssion error                                                 []  open
5   47048                   DEPS: Bump optional dependencies  [{'id': 527603109, 'node_id': 'MDU6TGFiZWw1Mjc...  open
6   47047  ENH: move DataError from core/base.py to error...                                                 []  open
7   47046  Update mangle_dupe_cols documentation to refle...                                                 []  open
8   47045                 TYP: Mypy workaround for NoDefault  [{'id': 1280988427, 'node_id': 'MDU6TGFiZWwxMj...  open
9   47043    API: DatetimeIndex.get_loc(date) partial-slice?  [{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...  open
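The labels column holds lists of dicts. A minimal sketch of flattening it to label names, using two hand-made records that only mimic the shape of the GitHub response (real label objects carry more fields than the `id` and `name` shown here):

```python
import pandas as pd

# Hypothetical records shaped like the GitHub issues JSON
data = [
    {'number': 1, 'title': 'BUG: example', 'state': 'open',
     'labels': [{'id': 1, 'name': 'Bug'}, {'id': 2, 'name': 'Regression'}]},
    {'number': 2, 'title': 'DOC: example', 'state': 'open', 'labels': []},
]

issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])

# Replace each list of label dicts with a comma-separated string of names
issues['labels'] = issues['labels'].map(
    lambda labels: ', '.join(label['name'] for label in labels))
print(issues)
```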


## 6.4: Interacting with Databases

In a business setting, much data may not be stored in text, Excel, or binary
files.

SQL-based relational databases are widely used for this purpose.

Loading data from SQL into a DataFrame is fairly straightforward, and pandas
has some functions to simplify the process.

As an example, we create a SQLite database using Python's built-in sqlite3
driver.

In [68]:
import sqlite3

In [69]:
query = """
CREATE TABLE test
(a VARCHAR(20), b VARCHAR(20),
 c REAL,        d INTEGER);
"""

con = sqlite3.connect('examples/mydata.sqlite')
con.execute(query)

<sqlite3.Cursor at 0x7f28823b8e30>

In [70]:
data = [('Atlanta', 'Georgia', 1.25, 6),
        ('Tallahassee', 'Florida', 2.6, 3),
        ('Sacramento', 'California', 1.7, 5)]
stmt = "INSERT INTO test VALUES(?, ?, ?, ?)"
con.executemany(stmt, data)
con.commit()

In [71]:
cursor = con.execute('select * from test')
rows = cursor.fetchall()
rows

[('Atlanta', 'Georgia', 1.25, 6),
 ('Tallahassee', 'Florida', 2.6, 3),
 ('Sacramento', 'California', 1.7, 5)]

In [72]:
cursor.description

(('a', None, None, None, None, None, None),
 ('b', None, None, None, None, None, None),
 ('c', None, None, None, None, None, None),
 ('d', None, None, None, None, None, None))

In [73]:
pd.DataFrame(rows, columns=[x[0] for x in cursor.description])

             a           b     c  d
0      Atlanta     Georgia  1.25  6
1  Tallahassee     Florida  2.60  3
2   Sacramento  California  1.70  5
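Rather than building the DataFrame by hand from cursor.description, pandas.read_sql can run the query directly. It accepts a plain sqlite3 connection; for other databases it is generally used with a SQLAlchemy connectable. A minimal sketch against an in-memory copy of the same table (the in-memory database is an assumption so nothing is written to disk):

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)')
con.executemany('INSERT INTO test VALUES (?, ?, ?, ?)',
                [('Atlanta', 'Georgia', 1.25, 6),
                 ('Tallahassee', 'Florida', 2.6, 3),
                 ('Sacramento', 'California', 1.7, 5)])
con.commit()

# read_sql executes the query and takes column names from the result set
df = pd.read_sql('SELECT * FROM test', con)
con.close()
print(df)
```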
