# Creating, Reading and Writing
There are two core objects in pandas: the **DataFrame** and the **Series**.

In [3]:
import pandas as pd

## 1.1 Creating data

In [3]:
# A DataFrame is a table.
# constructor: pd.DataFrame()
# Pass a dict: keys are column names, values are lists of entries
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]}) 
df

,Yes,No
0,50,131
1,21,2


In [4]:
# Another way to create a DataFrame: pass a list of rows and name the columns with the columns parameter
pd.DataFrame([[30, 21]], columns=['Apples', 'Bananas'])

,Apples,Bananas
0,30,21


In [7]:
# Assign row labels using the index parameter
df = pd.DataFrame({"Bod": ["Like", "Awful"],
                   "Sue": ["Good", "Bland"]},
                  index=["Product A", "Product B"])
df

,Bod,Sue
Product A,Like,Good
Product B,Awful,Bland


In [8]:
# A Series is a sequence of data values: if a DataFrame is a table, a Series is a list.
# Equivalently, a Series is a single column of a DataFrame.
series = pd.Series([1, 2, 3, 4, 5])
series

0    1
1    2
2    3
3    4
4    5
dtype: int64
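
To make the "single column of a DataFrame" point concrete, here is a minimal sketch. It assumes the Yes/No table created earlier in this section; the variable name `yes` is only illustrative.

```python
import pandas as pd

# Same Yes/No table as in section 1.1
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# Selecting one column with [] returns a Series, not a DataFrame
yes = df['Yes']

print(type(yes).__name__)  # the class of the selected column
print(yes.name)            # the Series takes the column label as its name
print(yes.tolist())        # the values, as a plain Python list
```

The selected Series keeps the same row labels (index) as the DataFrame it came from.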

In [12]:
# A Series has no column names; it has only one overall name.
# Assign row labels using the index parameter
# and an overall name using the name parameter
pd.Series([1, 2, 3],
          index = ["2015 sales", "2016 sales", "2017 sales"],
          name = "Product A")

2015 sales    1
2016 sales    2
2017 sales    3
Name: Product A, dtype: int64

## 1.2 Reading data files
**CSV files:** a table of values separated by commas, hence the name "Comma-Separated Values" (CSV).

In [17]:
# Read a CSV file into a DataFrame
# and check its shape
atl_addr = pd.read_csv("./atl-address-1.csv")
atl_addr.shape    # 40 rows, 5 columns

(40, 5)

In [18]:
# Get the first 5 rows to examine the contents
atl_addr.head()

,Title,Price,Beds,Baths,Area
0,"34 The Prado NE, Atlanta, GA 30309","$1,495,000",4 bds,4 ba,"3,644 sqft"
1,"2060 Shirley St SW, Atlanta, GA 30311","$225,000",3 bds,2 ba,"1,300 sqft"
2,"300 Peachtree St NE APT 11G, Atlanta, GA 30308","$259,000",2 bds,2 ba,890 sqft
3,"1690 Memorial Dr SE, Atlanta, GA 30317","$320,000",2 bds,1 ba,"1,163 sqft"
4,"6253 Old Kingston Dr, South Fulton, GA","$349,275",5 bds,3 ba,-- sqft


The **pd.read_csv()** function is highly configurable, with over 30 optional parameters you can specify.
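
As a rough illustration of a few of those optional parameters, the sketch below uses `sep`, `usecols`, and `nrows`. The CSV text is made up for the example and is read from an in-memory string through `io.StringIO` rather than from a file on disk.

```python
import io
import pandas as pd

# Made-up, semicolon-separated CSV text standing in for a real file
csv_text = "name;qty;price\napple;3;1.5\nbanana;5;0.25\ncherry;7;4.0\n"

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                  # field delimiter (the default is ",")
    usecols=["name", "qty"],  # load only these columns
    nrows=2,                  # read only the first 2 data rows
)

print(sample)
print(sample.shape)
```

Limiting columns and rows this way can be handy for a quick look at a large file before loading all of it.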

In [12]:
# Use a column of the CSV file as the row labels (index_col=0 selects the first column)
atl_addr = pd.read_csv("./atl-address-1.csv", index_col = 0)
atl_addr.head()

Title,Price,Beds,Baths,Area
"34 The Prado NE, Atlanta, GA 30309","$1,495,000",4 bds,4 ba,"3,644 sqft"
"2060 Shirley St SW, Atlanta, GA 30311","$225,000",3 bds,2 ba,"1,300 sqft"
"300 Peachtree St NE APT 11G, Atlanta, GA 30308","$259,000",2 bds,2 ba,890 sqft
"1690 Memorial Dr SE, Atlanta, GA 30317","$320,000",2 bds,1 ba,"1,163 sqft"
"6253 Old Kingston Dr, South Fulton, GA","$349,275",5 bds,3 ba,-- sqft


In [6]:
# Save a DataFrame to a CSV file
animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])
animals.to_csv("cow_and_goat.csv")
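
To check that the saved file can be read back, here is a minimal round-trip sketch. Writing to a temporary directory is an assumption made so that the example leaves no files behind; the name `restored` is only illustrative.

```python
import os
import tempfile
import pandas as pd

animals = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]},
                       index=['Year 1', 'Year 2'])

# Write to a temporary directory so nothing is left on disk afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cow_and_goat.csv")
    animals.to_csv(path)                       # row labels are written as the first column
    restored = pd.read_csv(path, index_col=0)  # read that first column back as row labels

print(restored)
```

Without `index_col=0`, the row labels would come back as an ordinary data column and pandas would add a fresh 0, 1, ... index.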