# Working with CSV Files

Welcome back! Let's discuss how to work with CSV files in Python. A file with the `.csv` extension is a Comma Separated Values file. CSV files are plain text and structure their data in tabular form: typically one record per line, with fields separated by a delimiter (usually a comma). Don't confuse Excel files with CSV files. Although CSV files are laid out very similarly to Excel files, their values have no data types: everything is a string, with no font or color. They also don't have worksheets the way an Excel file does. Python does have several libraries for working with Excel files; you can check them out [here](http://www.python-excel.org/) and [here](https://www.xlwings.org/).
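To make the "everything is a string" point concrete, here is a minimal sketch. The in-memory text and the `price` column are made up for illustration; they are not part of the course's data.

```python
import csv
import io

# Hypothetical in-memory CSV text standing in for a real file.
raw_text = "id,price\n1,9.99\n2,12.50\n"

rows = list(csv.reader(io.StringIO(raw_text)))
header, records = rows[0], rows[1:]

# Every field comes back as a str, so numeric columns need explicit conversion.
prices = [float(record[1]) for record in records]
total = sum(prices)

print(rows)
print(total)
```

Without the `float()` call, adding the prices would fail, because the values are strings, not numbers.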

Files in the CSV format are generally used to exchange data, usually in large amounts, between different applications. Database programs, analytical software, and other applications that store massive amounts of information (like contacts and customer data) will usually support the CSV format.

Let's explore how we can open a CSV file with Python's built-in `csv` module.

In [1]:
pwd

'/Users/clcx/Documents/GitHub/My-Python-Learning/Study Stuffs/Udemy Study/Python Bootcamp/Working with PDF and CSV'

____
## Reading CSV Files

In [2]:
import csv

In [3]:
# There are 3 steps

# 1. Open the file
data = open('example.csv')
# 2. Wrap it in a csv.reader
csv_data = csv.reader(data)
# 3. Convert it into a Python list of lists
data_lines = list(csv_data)
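The three steps above leave the file open. A common alternative is a `with` block, which closes the file for you. This is only a sketch: it first writes a small stand-in file (the path and contents are hypothetical, not the course's `example.csv`) so that it runs on its own.

```python
import csv
import os
import tempfile

# Create a small stand-in file so the sketch is self-contained
# (hypothetical path and contents, not the course's example.csv).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as f:
    f.write("id,first_name\n1,Joseph\n2,Freida\n")

# The with block closes the file automatically, even if an error occurs.
# newline='' is what the csv documentation recommends for file objects.
with open(sample_path, newline="", encoding="utf-8") as f:
    sample_lines = list(csv.reader(f))

print(sample_lines)
```

Note that the rows must be read into a list inside the `with` block; once the file is closed, the reader can no longer pull lines from it.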

### Encoding

CSV files often contain characters that Python can't decode with the platform's default encoding, such as accented letters or other non-English characters. Reading such a file can raise a `UnicodeDecodeError` ([it's pretty common, so it's important to go over](https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character)).

Previously, the default encoding reportedly couldn't read the '@' symbol in this file, so you had to switch to UTF-8 to avoid an error; these days it works out of the box. Still, if your CSV file contains other symbols, such as Chinese hanzi or text in other languages, find out the file's encoding first and then open the CSV with that encoding. For example:

In [None]:
data = open('example.csv',encoding="utf-8")
csv_data = csv.reader(data)
data_lines = list(csv_data)
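To see the kind of failure being described, here is a small sketch that writes a UTF-8 file containing a non-ASCII city name and then tries to read it with the wrong codec. The file name and contents are hypothetical, and `ascii` is used only to force the error reliably on any platform.

```python
import csv
import os
import tempfile

# Hypothetical stand-in file with a non-ASCII character, saved as UTF-8.
sample_path = os.path.join(tempfile.mkdtemp(), "cities.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as f:
    f.write("id,city\n10,Złoty Stok\n")

# Decoding with the wrong codec fails on the non-ASCII bytes.
try:
    with open(sample_path, newline="", encoding="ascii") as f:
        list(csv.reader(f))
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the encoding the file was actually saved in works.
with open(sample_path, newline="", encoding="utf-8") as f:
    city_rows = list(csv.reader(f))

print(decode_failed)
print(city_rows)
```

Passing `encoding=` explicitly also makes the code behave the same on every machine, instead of depending on the platform default.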

In [5]:
# let's display the first 3 rows
data_lines[:3]

[['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city'],
 ['1',
  'Joseph',
  'Zaniolini',
  'jzaniolini0@simplemachines.org',
  'Male',
  '163.168.68.132',
  'Pedro Leopoldo'],
 ['2',
  'Freida',
  'Drillingcourt',
  'fdrillingcourt1@umich.edu',
  'Female',
  '97.212.102.79',
  'Buri']]

In [6]:
data_lines[0]  # the first row holds the column names

['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city']
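Since the first row holds the column names, one common pattern is to peel it off with `next()` before reading the rest. The sketch below uses a tiny in-memory sample (a shortened, hypothetical version of the columns above) rather than the real file.

```python
import csv
import io

# Hypothetical, shortened sample with the same kind of header row.
raw_text = (
    "id,first_name,email\n"
    "1,Joseph,jzaniolini0@simplemachines.org\n"
    "2,Freida,fdrillingcourt1@umich.edu\n"
)

reader = csv.reader(io.StringIO(raw_text))
header = next(reader)    # consume the header row
records = list(reader)   # everything that is left is data

# Look up a column position by name instead of hard-coding the index.
email_index = header.index("email")

print(header)
print(len(records))
print(email_index)
```

Looking the index up by name keeps the code working even if the column order changes.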

In [7]:
len(data_lines)  # 1000 data rows + 1 header row

1001

In [8]:
for line in data_lines[:5]:
    print(line)

['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city']
['1', 'Joseph', 'Zaniolini', 'jzaniolini0@simplemachines.org', 'Male', '163.168.68.132', 'Pedro Leopoldo']
['2', 'Freida', 'Drillingcourt', 'fdrillingcourt1@umich.edu', 'Female', '97.212.102.79', 'Buri']
['3', 'Nanni', 'Herity', 'nherity2@statcounter.com', 'Female', '145.151.178.98', 'Claver']
['4', 'Orazio', 'Frayling', 'ofrayling3@economist.com', 'Male', '25.199.143.143', 'Kungur']


In [9]:
# extract row number 10
data_lines[10]

['10',
 'Hyatt',
 'Gasquoine',
 'hgasquoine9@google.ru',
 'Male',
 '221.155.106.39',
 'Złoty Stok']

In [12]:
# extract the email from row 10
# check which position the email has within the row: index 3
data_lines[10][3]

'hgasquoine9@google.ru'

In [13]:
# now let's extract a whole column
# e.g. collect the emails (here only rows 1 to 14; use data_lines[1:] for everyone)
all_emails = []
for line in data_lines[1:15]:
    all_emails.append(line[3])

In [14]:
all_emails

['jzaniolini0@simplemachines.org',
 'fdrillingcourt1@umich.edu',
 'nherity2@statcounter.com',
 'ofrayling3@economist.com',
 'jmurrison4@cbslocal.com',
 'lgamet5@list-manage.com',
 'dhowatt6@amazon.com',
 'kherion7@amazon.com',
 'chedworth8@china.com.cn',
 'hgasquoine9@google.ru',
 'ftarra@shareasale.com',
 'abathb@umn.edu',
 'lchastangc@goo.gl',
 'cceried@yale.edu']
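The same column extraction can be written as a list comprehension. This is just an equivalent sketch; `data_rows` is a hypothetical, shortened stand-in for `data_lines` so the snippet runs by itself.

```python
# Hypothetical, shortened stand-in for data_lines.
data_rows = [
    ['id', 'first_name', 'last_name', 'email'],
    ['1', 'Joseph', 'Zaniolini', 'jzaniolini0@simplemachines.org'],
    ['2', 'Freida', 'Drillingcourt', 'fdrillingcourt1@umich.edu'],
]

# Skip the header row with [1:], then take index 3 (the email) from each row.
emails = [row[3] for row in data_rows[1:]]

print(emails)
```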

In [15]:
# extract the full name, i.e. columns 1 and 2 joined with a space
full_names = []
for line in data_lines[1:15]:
    full_names.append(line[1]+' '+line[2])

In [16]:
full_names

['Joseph Zaniolini',
 'Freida Drillingcourt',
 'Nanni Herity',
 'Orazio Frayling',
 'Julianne Murrison',
 'Lucy Gamet',
 'Dyana Howatt',
 'Kassey Herion',
 'Chrissy Hedworth',
 'Hyatt Gasquoine',
 'Felicdad Tarr',
 'Andrew Bath',
 'Lucais Chastang',
 'Car Cerie']
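If remembering that the first name is index 1 and the last name is index 2 feels fragile, `csv.DictReader` lets you address fields by column name instead. A minimal sketch on a hypothetical in-memory sample:

```python
import csv
import io

# Hypothetical, shortened sample using the same column names.
raw_text = (
    "id,first_name,last_name\n"
    "1,Joseph,Zaniolini\n"
    "2,Freida,Drillingcourt\n"
)

# DictReader uses the first row as keys and yields one dict per data row.
names = [
    f"{row['first_name']} {row['last_name']}"
    for row in csv.DictReader(io.StringIO(raw_text))
]

print(names)
```

The header row is consumed automatically here, so there is no need to slice it off with `[1:]`.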

## Writing to CSV Files

We can also write CSV files, either creating new ones or appending to existing ones.

### New File
**This will also overwrite any existing file with the same name, so be careful with this!**

In [19]:
# first we create the file
# if a file with the same name already exists, it will be overwritten

# newline controls how universal newlines works (it only applies to text
# mode). It can be None, '', '\n', '\r', and '\r\n'.
# The csv docs recommend newline='' so the writer controls line endings itself.
file_to_output = open('to_save_file.csv','w',newline='')

In [20]:
# the delimiter is the separator between columns
# it could also be a tab ('\t')
csv_writer = csv.writer(file_to_output,delimiter=',')
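As a quick aside on the `delimiter` argument, here is a sketch of a tab-separated writer. It writes to an in-memory buffer (an assumption made to keep the example self-contained) and uses its own names, so it doesn't touch `csv_writer` or the file opened above.

```python
import csv
import io

buffer = io.StringIO()
tab_writer = csv.writer(buffer, delimiter='\t')
tab_writer.writerow(['a', 'b', 'c'])
# A comma inside a field needs no quoting when the delimiter is a tab.
tab_writer.writerow(['1', '2, with comma', '3'])

tsv_text = buffer.getvalue()

# Reading it back requires the same delimiter.
rows_back = list(csv.reader(io.StringIO(tsv_text), delimiter='\t'))

print(repr(tsv_text))
print(rows_back)
```

Whatever delimiter you write with, you need to pass the same one to `csv.reader` when reading the file again.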

In [21]:
csv_writer.writerow(['a','b','c'])

7

In [22]:
csv_writer.writerows([['1','2','3'],['4','5','6']])

In [23]:
file_to_output.close()

Open the file to see what was written.
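One way to check the result without leaving Python is to read the file straight back. The sketch below repeats the writing steps inside `with` blocks and uses a temporary path (an assumption, so it doesn't clobber your real `to_save_file.csv`). It also shows where the `7` printed above comes from: `writerow` returns whatever the file's `write` method returns, which for a text file is the number of characters written.

```python
import csv
import os
import tempfile

# Temporary path so the sketch doesn't overwrite a real file.
out_path = os.path.join(tempfile.mkdtemp(), "to_save_file.csv")

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    # 'a,b,c' plus the '\r\n' line terminator is 7 characters.
    chars_written = writer.writerow(['a', 'b', 'c'])
    writer.writerows([['1', '2', '3'], ['4', '5', '6']])

# Read the file back to confirm its contents.
with open(out_path, newline="") as f:
    saved_rows = list(csv.reader(f))

print(chars_written)
print(saved_rows)
```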

____
### Existing File

Now let's write to an existing file. To do that, change the mode to `'a'`, which means append. Don't use `'w'`, or the existing contents will be overwritten.

In [24]:
f = open('to_save_file.csv','a',newline='')

In [25]:
csv_writer = csv.writer(f)

In [26]:
csv_writer.writerow(['new','new','new'])

13

In [27]:
f.close()
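To see the difference between the two modes side by side, here is a small sketch. The helper names `write_row` and `read_rows` and the temporary file are hypothetical, introduced only for this illustration.

```python
import csv
import os
import tempfile

# Temporary path so the sketch doesn't touch any real file.
log_path = os.path.join(tempfile.mkdtemp(), "log.csv")

def write_row(path, mode, row):
    """Write a single row using the given file mode ('w' or 'a')."""
    with open(path, mode, newline="") as f:
        csv.writer(f).writerow(row)

def read_rows(path):
    """Return the file's contents as a list of lists."""
    with open(path, newline="") as f:
        return list(csv.reader(f))

write_row(log_path, 'w', ['a', 'b', 'c'])        # creates the file
write_row(log_path, 'a', ['new', 'new', 'new'])  # adds a row at the end
after_append = read_rows(log_path)

write_row(log_path, 'w', ['x', 'y', 'z'])        # wipes the earlier rows
after_overwrite = read_rows(log_path)

print(after_append)
print(after_overwrite)
```

After the append, both rows are present; after reopening with `'w'`, only the last row written remains.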