## Comma-separated values

Example csv as plain text:

Title, Year Released, Director
The Godfather, 1972, Francis Ford Coppola The Shawshank Redemption, 1994, Frank Darabont
Citizen Kane, 1941, 0rson Welles
Gone with the wind, 1939, victor Fleming The Sound of Music, 1965, Robert wise Spirited Away, 2001, Hayao Miyazaki
"The Good, the Bad and the Ugly"
', 1966, Sergio Leone
It's a Wonderful Life, 1946, Frank Capra Amadeus, 1984, Milos Forman
The Lord of the Rings: The Return of the King, 2003, Peter Jackson Saving Private Ryan, 1998, Steven Spielberg Rear window, ,Alfred Hitchcock Rocky, 1976, John G. Avildsen

The first row of a csv file is usually a header (column names). Whenever there is no value between commas, the 'cell' is null.

In [4]:
path = 'google_stock.csv'

file = open(path)

In [5]:
# iterate over lines in a file
# for line in file[]:
    # print(line)

## List comprehension

List comprehension may be useful in constructing csv contents into useful data

In [6]:
lines = [line for line in open(path)]
lines[0]

'Date,Open,High,Low,Close,Volume,Adj Close\n'

In [7]:
lines[0].strip()

'Date,Open,High,Low,Close,Volume,Adj Close'

In [8]:
lines[0].strip().split(',')

['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

In [9]:
dataset = [line.strip().split(',') for line in open(path)]
dataset[1]

['8/19/2014',
 '585.002622',
 '587.342658',
 '584.002627',
 '586.862643',
 '978600',
 '586.862643']

## CSV module

In [10]:
import csv
dir(csv)

['Dialect',
 'DictReader',
 'DictWriter',
 'Error',
 'QUOTE_ALL',
 'QUOTE_MINIMAL',
 'QUOTE_NONE',
 'QUOTE_NONNUMERIC',
 'QUOTE_NOTNULL',
 'QUOTE_STRINGS',
 'Sniffer',
 'StringIO',
 '_Dialect',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__version__',
 'excel',
 'excel_tab',
 'field_size_limit',
 'get_dialect',
 'list_dialects',
 're',
 'reader',
 'register_dialect',
 'types',
 'unix_dialect',
 'unregister_dialect',
 'writer']

In [11]:
new_file = open(path, newline='') # empty string new line arg ensures a clean line start

### Reader object

In [15]:
reader = csv.reader(new_file) # reader object is an iterator
header = next(reader) # iterator starts at the beginning of the file, subsequent calls return subsequent lines

In [16]:
from datetime import datetime

data = []

for row in reader:
    date = datetime.strptime(row[0], '%m/%d/%Y') # string-parse time
    open_price = float(row[1])
    high_price = float(row[2])
    low_price = float(row[3])
    close_price = float(row[4])
    volume = int(row[5])
    adj_close = float(row[6])

    data.append([date, open_price, high_price, low_price, close_price, volume, adj_close])

print(data[0])

[datetime.datetime(2014, 8, 15, 0, 0), 577.862619, 579.382595, 570.522603, 573.482626, 1519100, 573.482626]


### Writer object

In [27]:
returns_path = 'google_returns.csv'

returns_file = open(returns_path, 'w') # w for 'write' mode
writer = csv.writer(returns_file)
header = ['Date', 'Return']

writer.writerow(header)

for i in range(len(data) - 1):
    todays_row = data[i]
    todays_date = todays_row[0]
    todays_price = todays_row[-1]
    yesterdays_row = data[+1] # data is arranged by date descending
    yesterdays_price = yesterdays_row[-1]

    daily_return = (todays_price - yesterdays_price) / yesterdays_price
    
    formatted_date = todays_date.strftime('%m/%d/%y') # string format time
    writer.writerow([formatted_date, daily_return])



In [28]:
import pandas as pd
df = pd.read_csv('google_returns.csv')
df.tail()

Unnamed: 0,Date,Return
2446,11/26/04,-0.84407
2447,11/24/04,-0.848095
2448,11/23/04,-0.854388
2449,11/22/04,-0.856491
2450,11/19/04,-0.852754
