---
# Creating and Persisting DataFrames
---

In [1]:
import numpy as np
import pandas as pd

Create parallel lists with data in them. Each of these lists will be a column in the
DataFrame, so the values within a list should share one type, and all lists must have the same length:

In [2]:
fname = ["Paul", "John", "Richard", "George"]
lname = ["McCartney", "Lennon", "Starkey", "Harrison"]
birth = [1942, 1940, 1940, 1943]
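
As a quick illustration of why the lists need equal lengths, the sketch below builds a DataFrame from two lists where one is deliberately a value short. The shortened `birth` list is made up for this example; pandas raises a `ValueError` in that case, and the exact wording of the message varies between pandas versions.

```python
import pandas as pd

fname = ["Paul", "John", "Richard", "George"]
birth = [1942, 1940, 1940]  # deliberately one value short (illustration only)

try:
    pd.DataFrame({"first": fname, "birth": birth})
    error_message = None
except ValueError as exc:
    # pandas refuses to build columns of unequal length
    error_message = str(exc)

print(error_message)
```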

Create a dictionary from the lists, mapping the column name to the list:

In [3]:
people = dict(first=fname, last=lname, birth=birth)

Create a DataFrame from the dictionary:

In [4]:
beatles = pd.DataFrame(people)

In [5]:
beatles

,first,last,birth
0,Paul,McCartney,1942
1,John,Lennon,1940
2,Richard,Starkey,1940
3,George,Harrison,1943
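
One way to confirm that the dictionary keys became column names and that each column picked up a single type is to inspect `shape`, `columns`, and `dtypes`. This is a minimal sketch that rebuilds the same DataFrame; the exact dtype names printed depend on the pandas version.

```python
import pandas as pd

people = dict(
    first=["Paul", "John", "Richard", "George"],
    last=["McCartney", "Lennon", "Starkey", "Harrison"],
    birth=[1942, 1940, 1940, 1943],
)
beatles = pd.DataFrame(people)

print(beatles.shape)          # (4, 3): four rows, three columns
print(list(beatles.columns))  # ['first', 'last', 'birth']
print(beatles.dtypes)         # one dtype per column
```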


In [6]:
beatles.index

RangeIndex(start=0, stop=4, step=1)

In [11]:
# supply custom index labels instead of the default RangeIndex
pd.DataFrame(data=people, index=list('abcd'))

,first,last,birth
a,Paul,McCartney,1942
b,John,Lennon,1940
c,Richard,Starkey,1940
d,George,Harrison,1943
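
With custom labels in place, rows can be looked up by label as well as by position. The sketch below assumes the same `people` data and uses a hypothetical variable name, `lettered`, for the relabeled DataFrame.

```python
import pandas as pd

people = dict(
    first=["Paul", "John", "Richard", "George"],
    last=["McCartney", "Lennon", "Starkey", "Harrison"],
    birth=[1942, 1940, 1940, 1943],
)
lettered = pd.DataFrame(data=people, index=list("abcd"))

row_by_label = lettered.loc["c"]     # label-based lookup
row_by_position = lettered.iloc[2]   # position-based lookup, same row

print(row_by_label["first"])     # Richard
print(row_by_position["last"])   # Starkey
```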


## Writing CSV

Write the DataFrame to a CSV file. Here an in-memory `StringIO` buffer stands in for a file on disk. First, recall the DataFrame:

In [12]:
beatles

,first,last,birth
0,Paul,McCartney,1942
1,John,Lennon,1940
2,Richard,Starkey,1940
3,George,Harrison,1943


In [13]:
from io import StringIO

In [21]:
beatles_file = StringIO()
beatles.to_csv(beatles_file)

Look at the file contents:

In [22]:
print(beatles_file.getvalue())

,first,last,birth
0,Paul,McCartney,1942
1,John,Lennon,1940
2,Richard,Starkey,1940
3,George,Harrison,1943
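
The same call works with a path on disk instead of a `StringIO` buffer. The following is a minimal sketch that writes to a temporary directory; the file name `beatles.csv` is a placeholder chosen for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

beatles = pd.DataFrame(
    dict(
        first=["Paul", "John", "Richard", "George"],
        last=["McCartney", "Lennon", "Starkey", "Harrison"],
        birth=[1942, 1940, 1940, 1943],
    )
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "beatles.csv"  # hypothetical file name
    beatles.to_csv(csv_path)                  # index is written as the first column
    text = csv_path.read_text()

print(text)
```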



In [24]:
_ = beatles_file.seek(0)
pd.read_csv(beatles_file)

,Unnamed: 0,first,last,birth
0,0,Paul,McCartney,1942
1,1,John,Lennon,1940
2,2,Richard,Starkey,1940
3,3,George,Harrison,1943


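Read back with default arguments, the saved index comes in as an ordinary column named `Unnamed: 0`,
and a fresh index is added on top of it. The `read_csv` function has an `index_col` parameter that
you can use to specify which column holds the index: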

In [25]:
_ = beatles_file.seek(0)
pd.read_csv(beatles_file, index_col=0)

,first,last,birth
0,Paul,McCartney,1942
1,John,Lennon,1940
2,Richard,Starkey,1940
3,George,Harrison,1943
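
To see the difference side by side, the sketch below reads the same buffer twice, once with default arguments and once with `index_col=0`, and compares the resulting column names. The variable names `with_extra` and `restored` are made up for this example.

```python
from io import StringIO

import pandas as pd

beatles = pd.DataFrame(
    dict(
        first=["Paul", "John", "Richard", "George"],
        last=["McCartney", "Lennon", "Starkey", "Harrison"],
        birth=[1942, 1940, 1940, 1943],
    )
)

buffer = StringIO()
beatles.to_csv(buffer)

buffer.seek(0)
with_extra = pd.read_csv(buffer)             # saved index becomes a data column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # first column is used as the index

print(list(with_extra.columns))  # ['Unnamed: 0', 'first', 'last', 'birth']
print(list(restored.columns))    # ['first', 'last', 'birth']
```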


Alternatively, if we don't want to include the index when writing the CSV file, we can set the
`index` parameter to `False`:

In [26]:
beatles_file = StringIO()
beatles.to_csv(beatles_file, index=False)

In [28]:
_ = beatles_file.seek(0)
pd.read_csv(beatles_file)

,first,last,birth
0,Paul,McCartney,1942
1,John,Lennon,1940
2,Richard,Starkey,1940
3,George,Harrison,1943
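
Dropping the index is harmless for a default `RangeIndex`, but it discards custom labels. The sketch below assumes a relabeled copy of the data (the name `lettered` is hypothetical) and shows that the letters do not survive a round trip written with `index=False`.

```python
from io import StringIO

import pandas as pd

people = dict(
    first=["Paul", "John", "Richard", "George"],
    last=["McCartney", "Lennon", "Starkey", "Harrison"],
    birth=[1942, 1940, 1940, 1943],
)
lettered = pd.DataFrame(data=people, index=list("abcd"))

buffer = StringIO()
lettered.to_csv(buffer, index=False)  # labels a-d are not written

buffer.seek(0)
restored = pd.read_csv(buffer)

print(list(lettered.index))   # ['a', 'b', 'c', 'd']
print(list(restored.index))   # [0, 1, 2, 3]
```

When the index carries meaning, keeping it in the file and reading it back with `index_col` is the safer choice.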
