**Data Sourcing**

Finding data: \
    1. Exporting a CSV file \
    2. Querying a database with SQL \
    3. Scraping a website \
    4. Consuming an API

# CSV

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. \
Source: Wikipedia (https://en.wikipedia.org/wiki/Comma-separated_values)
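The definition above leaves out one detail that matters in practice: a field that itself contains a comma is normally wrapped in double quotes. A minimal sketch with Python's standard `csv` module, parsing an in-memory string (the sample data is made up for illustration):

```python
import csv
import io

# Made-up CSV text: a header record plus two data records.
# The last field contains a comma, so it is wrapped in double quotes.
text = 'item,note\napple,red\nbanana,"yellow, long"\n'

# csv.reader accepts any iterable of lines; io.StringIO stands in for a file.
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['item', 'note'], ['apple', 'red'], ['banana', 'yellow, long']]
```

Splitting each line with `str.split(",")` would break the quoted field in two, which is why the `csv` module is used below.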

*Example:* \
https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html \
file: homes.csv

```bash
# Shell commands: run in a Jupyter cell that starts with the %%bash magic, or in a terminal
mkdir -p raw_data
curl -s https://people.sc.fsu.edu/~jburkardt/data/csv/homes.csv > raw_data/homes.csv
cat raw_data/homes.csv
```

```
"Sell", "List", "Living", "Rooms", "Beds", "Baths", "Age", "Acres", "Taxes"
142, 160, 28, 10, 5, 3,  60, 0.28,  3167
175, 180, 18,  8, 4, 1,  12, 0.43,  4033
129, 132, 13,  6, 3, 1,  41, 0.33,  1471
138, 140, 17,  7, 3, 1,  22, 0.46,  3204
232, 240, 25,  8, 4, 3,   5, 2.05,  3613
135, 140, 18,  7, 4, 3,   9, 0.57,  3028
150, 160, 20,  8, 4, 3,  18, 4.00,  3131
207, 225, 22,  8, 4, 2,  16, 2.22,  5158
271, 285, 30, 10, 5, 2,  30, 0.53,  5702
 89,  90, 10,  5, 3, 1,  43, 0.30,  2054
153, 157, 22,  8, 3, 3,  18, 0.38,  4127
 87,  90, 16,  7, 3, 1,  50, 0.65,  1445
234, 238, 25,  8, 4, 2,   2, 1.61,  2087
106, 116, 20,  8, 4, 1,  13, 0.22,  2818
175, 180, 22,  8, 4, 2,  15, 2.06,  3917
165, 170, 17,  8, 4, 2,  33, 0.46,  2220
166, 170, 23,  9, 4, 2,  37, 0.27,  3498
136, 140, 19,  7, 3, 1,  22, 0.63,  3607
148, 160, 17,  7, 3, 2,  13, 0.36,  3648
151, 153, 19,  8, 4, 2,  24, 0.34,  3561
180, 190, 24,  9, 4, 2,  10, 1.55,  4681
293, 305, 26,  8, 4, 3,   6, 0.46,  7088
167, 170, 20,  9, 4, 2
```
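If you would rather stay in Python than shell out to `curl`, the download step can be sketched with the standard library's `urllib.request`. The helper name `download` is hypothetical; the URL and the `raw_data/homes.csv` path come from the cell above.

```python
import os
import urllib.request


def download(url: str, dest: str) -> str:
    """Hypothetical helper: save the resource at `url` to `dest` and return `dest`."""
    # Create the parent directory if needed (like `mkdir -p raw_data`).
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        out.write(response.read())
    return dest


# Usage (needs network access):
# download("https://people.sc.fsu.edu/~jburkardt/data/csv/homes.csv", "raw_data/homes.csv")
```

One difference from `curl -s`: a failed HTTP request raises an exception instead of silently writing the server's error page into the file.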

## CSV Reading

```python
import csv
with open('raw_data/homes.csv') as csvfile:
    reader = csv.reader(csvfile, skipinitialspace=True)
    for row in reader:
        # row is a `list`
        print(row)
```

```
['Sell', 'List', 'Living', 'Rooms', 'Beds', 'Baths', 'Age', 'Acres', 'Taxes']
['142', '160', '28', '10', '5', '3', '60', '0.28', '3167']
['175', '180', '18', '8', '4', '1', '12', '0.43', '4033']
['129', '132', '13', '6', '3', '1', '41', '0.33', '1471']
['138', '140', '17', '7', '3', '1', '22', '0.46', '3204']
['232', '240', '25', '8', '4', '3', '5', '2.05', '3613']
['135', '140', '18', '7', '4', '3', '9', '0.57', '3028']
['150', '160', '20', '8', '4', '3', '18', '4.00', '3131']
['207', '225', '22', '8', '4', '2', '16', '2.22', '5158']
['271', '285', '30', '10', '5', '2', '30', '0.53', '5702']
['89', '90', '10', '5', '3', '1', '43', '0.30', '2054']
['153', '157', '22', '8', '3', '3', '18', '0.38', '4127']
['87', '90', '16', '7', '3', '1', '50', '0.65', '1445']
['234', '238', '25', '8', '4', '2', '2', '1.61', '2087']
['106', '116', '20', '8', '4', '1', '13', '0.22', '2818']
['175', '180', '22', '8', '4', '2', '15', '2.06', '3917']
['165', '170', '17', '8', '4', '2', '33', '0.46', '2220']
```
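Why pass `skipinitialspace=True`? In `homes.csv` every comma is followed by a space. A minimal comparison on two made-up lines in the same style:

```python
import csv
import io

# Two lines imitating homes.csv: a space follows every comma.
sample = '"Sell", "List"\n142, 160\n'

default_rows = list(csv.reader(io.StringIO(sample)))
trimmed_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(default_rows)  # [['Sell', ' "List"'], ['142', ' 160']]
print(trimmed_rows)  # [['Sell', 'List'], ['142', '160']]
```

Without the option, the leading space is kept, and because the quote is no longer the first character of the field it is treated as a literal character, so the header name becomes ` "List"` rather than `List`.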

## CSV with Headers

```python
import csv
with open('raw_data/homes.csv') as csvfile:
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    for row in reader:
        # row is a `dict` keyed by the header names
        print(row['Sell'], row['Beds'])
```

```
142 5
175 4
129 3
138 3
232 4
135 4
150 4
207 4
271 5
89 3
153 3
87 3
234 4
106 4
175 4
165 4
166 4
136 3
148 3
151 4
180 4
293 4
167 4
190 5
184 5
157 4
110 4
135 4
567 4
180 4
183 3
185 3
152 4
148 3
152 3
146 3
170 3
127 4
265 6
157 4
128 4
110 4
123 4
212 5
145 4
129 3
143 4
247 4
111 3
133 3
 None
```
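The last output line, ` None`, appears to come from a trailing record with no values: `DictReader` fills any field missing from a short row with `restval`, which defaults to `None`. Every other value is still a string. A sketch that skips such incomplete records and converts the columns it needs, run on a hypothetical sample that imitates the layout of `homes.csv`:

```python
import csv
import io

# Hypothetical sample: the final line holds only spaces, so csv.reader yields ['']
# for it and DictReader fills the remaining keys with restval (None by default).
sample = '"Sell", "Beds"\n142, 5\n175, 4\n   \n'

records = []
for row in csv.DictReader(io.StringIO(sample), skipinitialspace=True):
    # Skip records where any field is missing (None) or blank.
    if any(value is None or value == "" for value in row.values()):
        continue
    # Values arrive as strings; convert them (use float() for columns like Acres).
    records.append({"Sell": int(row["Sell"]), "Beds": int(row["Beds"])})

print(records)  # [{'Sell': 142, 'Beds': 5}, {'Sell': 175, 'Beds': 4}]
```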


# API