In [None]:
pwd

____
## Reading CSV Files

In [None]:
import csv

When passing in the file path, make sure to include the extension if the file has one. You should be able to Tab-autocomplete the file name; if you can't, that is a good indicator that your file is not in the same location as your notebook. You can always type in the entire file path (it will look similar in formatting to the output of **pwd**).

In [None]:
data = open('example.csv')

In [None]:
data

### Encoding

CSV files often contain characters that Python cannot decode with the platform's default encoding, such as accented letters or other non-ASCII characters. Let's view an example of this sort of error ([it's pretty common, so it's important to go over](https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character)).
Read more about encodings and their importance here:
- [Detect Encoding of CSV File in Python](https://www.geeksforgeeks.org/python/detect-encoding-of-csv-file-in-python/)
- [Stack Overflow - Fix Encoding Errors in CSV with Mixed Encodings](https://stackoverflow.com/questions/75747269/fix-encoding-errors-in-csv-with-mixed-encodings)
- [Microsoft Excel UTF-8 CSV Issues - User Forum](https://answers.microsoft.com/en-us/msoffice/forum/all/opening-utf-8-encoded-csv-in-excel-that-appears-ok/94e1a881-b184-449d-8a59-27997d5e97ef)




In [None]:
csv_data = csv.reader(data)

Casting to a list may raise an error. Note the **can't decode** line in the error message: it is a giveaway that we have an encoding problem!

In [None]:
data_lines = list(csv_data)

Let's now try reading it with a "utf-8" encoding.

In [None]:
data = open('example.csv',encoding="utf-8")
csv_data = csv.reader(data)
data_lines = list(csv_data)

In [None]:
# Looks like it worked!
data_lines[:3]
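If "utf-8" also fails, one option is to try a short list of candidate encodings in order. The sketch below is only an illustration: the helper name `read_csv_with_fallback`, the candidate list, and the temporary demo file are assumptions and not part of the original notebook. Note that an encoding can decode a file without error and still produce the wrong characters, so the result is worth checking by eye.

```python
import csv
import os
import tempfile


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return (rows, encoding) for the first encoding that decodes the file.

    Hypothetical helper: the candidate encodings are an assumption and
    should be adjusted to wherever your files come from.
    """
    for enc in encodings:
        try:
            with open(path, newline="", encoding=enc) as f:
                # Decoding happens lazily, so read everything inside the try
                return list(csv.reader(f)), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"None of {encodings} could decode {path!r}")


# Build a small demo file saved as cp1252 (not valid utf-8 because of the é)
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,José\n".encode("cp1252"))
    demo_path = tmp.name

rows, used = read_csv_with_fallback(demo_path)
os.remove(demo_path)
print(used, rows)
```

Here the utf-8 attempt raises `UnicodeDecodeError`, so the helper falls through to the second candidate.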

Note that the first item in the list is the header line, which describes what each column represents. Let's format our printing just a bit:

In [None]:
for line in data_lines[:5]:
    print(line)

Let's imagine we wanted a list of all the emails. Since there are 1000 items plus the header, we will only process the first few rows for demonstration.

In [None]:
len(data_lines)

In [None]:
all_emails = []
for line in data_lines[1:15]:
    all_emails.append(line[3])

In [None]:
print(all_emails)
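Indexing with `line[3]` depends on remembering the column position. As an alternative sketch, `csv.DictReader` uses the header row as keys so columns can be looked up by name. The in-memory sample and its column names below are assumptions made for illustration; the real header of `example.csv` may differ.

```python
import csv
import io

# Hypothetical sample data standing in for the real file
sample = io.StringIO(
    "id,first_name,last_name,email\n"
    "1,Ada,Lovelace,ada@example.com\n"
    "2,Alan,Turing,alan@example.com\n"
)

# DictReader consumes the header row and yields one dict per data row
reader = csv.DictReader(sample)
emails = [row["email"] for row in reader]
print(emails)
```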

What if we wanted a list of full names?

In [None]:
full_names = []

for line in data_lines[1:15]:
    full_names.append(line[1]+' '+line[2])

In [None]:
full_names
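The same loop can be written as a list comprehension. This is a minimal sketch using a small made-up `sample_lines` list in place of the real `data_lines`; it assumes the first and last names sit in columns 1 and 2, as in the loop above.

```python
# Hypothetical stand-in for data_lines: a header row followed by data rows
sample_lines = [
    ["id", "first_name", "last_name", "email"],
    ["1", "Ada", "Lovelace", "ada@example.com"],
    ["2", "Alan", "Turing", "alan@example.com"],
]

# Skip the header with [1:], then join first and last name for each row
sample_full_names = [f"{line[1]} {line[2]}" for line in sample_lines[1:]]
print(sample_full_names)
```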

## Writing to CSV Files

We can also write CSV files, either by creating new ones or by adding to existing ones.

### New File 
**This will also overwrite any existing file with the same name, so be careful with this!**

In [None]:
# newline controls how universal newlines mode works (it only applies to
# text mode). It can be None, '', '\n', '\r', or '\r\n'. The csv module
# documentation recommends newline='' so the writer controls line endings.
file_to_output = open('to_save_file.csv','w',newline='')

In [None]:
csv_writer = csv.writer(file_to_output,delimiter=',')

In [None]:
csv_writer.writerow(['a','b','c'])

In [None]:
csv_writer.writerows([['1','2','3'],['4','5','6']])

In [None]:
file_to_output.close()
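Forgetting to call `close()` can leave rows unwritten. A common alternative is a `with` block, which closes the file automatically even if an error occurs. The sketch below writes the same rows as above and reads them back; the temporary directory is an assumption made so the example does not overwrite any real file.

```python
import csv
import os
import tempfile

# Write into a throwaway directory so no real file is overwritten
out_path = os.path.join(tempfile.mkdtemp(), "to_save_file.csv")

with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerow(["a", "b", "c"])
    writer.writerows([["1", "2", "3"], ["4", "5", "6"]])
# The file is closed here, so everything has been flushed to disk

# Read the file back to confirm what was written
with open(out_path, newline="", encoding="utf-8") as f:
    written = list(csv.reader(f))
print(written)
```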

____
### Existing File 

In [None]:
f = open('to_save_file.csv','a',newline='')

In [None]:
csv_writer = csv.writer(f)

In [None]:
csv_writer.writerow(['new','new','new'])

In [None]:
f.close()
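To see the difference between the two modes, the following sketch writes a file with `'w'`, appends to it with `'a'`, and reads the result back. The temporary path is an assumption used to keep the demo away from real files.

```python
import csv
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.csv")

# 'w' creates the file (or truncates an existing one)
with open(demo_path, "w", newline="") as f:
    csv.writer(f).writerow(["a", "b", "c"])

# 'a' keeps the existing rows and adds new ones at the end
with open(demo_path, "a", newline="") as f:
    csv.writer(f).writerow(["new", "new", "new"])

with open(demo_path, newline="") as f:
    appended = list(csv.reader(f))
print(appended)
```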

That is all for the basics! If you believe you will be working with CSV files often, you may want to check out the powerful [pandas library](https://pandas.pydata.org/).

### Error handling


#### Check for file

In [None]:
import os 

filename = "example2.csv"

try:
    if not os.path.exists(filename):
        raise FileNotFoundError(f"File '{filename}' does not exist")
    
    with open(filename, 'r') as file:
        content = file.read()

except FileNotFoundError as e:
    print(f"Error: {e}")
except PermissionError:
    print(f"Error: Permission denied to read '{filename}'")
except Exception as e:
    print(f"Unexpected error: {e}")
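Since `open()` already raises `FileNotFoundError` for a missing file, the explicit `os.path.exists` check is optional. A possible variation is to wrap the attempt in a small helper; the name `read_text_or_none` and the choice to return `None` on failure are assumptions made for this sketch.

```python
import os
import tempfile


def read_text_or_none(path):
    """Return the file's text, or None if it cannot be read.

    Hypothetical helper: returning None on failure is a design choice
    for this sketch, not the only reasonable one.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"Error: File '{path}' does not exist")
    except PermissionError:
        print(f"Error: Permission denied to read '{path}'")
    return None


# A path inside a fresh, empty temporary directory is guaranteed to be missing
missing_path = os.path.join(tempfile.mkdtemp(), "example2.csv")
result = read_text_or_none(missing_path)
print(result)
```

Attempting the operation and handling the failure also avoids the small window in which a file could disappear between the existence check and the `open()` call.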




#### Add malformed rows to the CSV

In [None]:
%%writefile example2.csv
id,first_name,last_name,email,gender,ip_address,city
1001,John,Smith,john.smith@email.com,Male,192.168.1.100,New York
1002,Jane,Doe,jane.doe@email.com,Female,10.0.0.50,Los Angeles
1003,Michael,Johnson,michael.johnson@email.com,Male,172.16.0.25,Chicago
1004,Sarah,Williams,sarah.williams@email.com,192.168.0.75,Houston
1005,David,Brown,david.brown@email.com,Male,10.1.1.200

In [None]:
import csv

try:
    with open('example2.csv', 'r') as file:
        reader = csv.reader(file)
        next(reader)  # Skip the header row
        for row in reader:
            if len(row) < 7:
                print(f"Skipping row with missing data: {row}")
                continue
            # row_id avoids shadowing the built-in id() function
            row_id, first_name, last_name, email, gender, ip_address, city = row
            print(f"Name: {first_name}, Email: {email}, City: {city}")
except FileNotFoundError:
    print("Error: The CSV file could not be found.")
except csv.Error as e:
    print(f"Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
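Rather than hard-coding the expected column count, one option is to compare each row against the length of the header and record where the bad rows occur. The sketch below uses an in-memory copy of the sample data; the `good_rows` and `bad_rows` names are assumptions. `reader.line_num` is the number of source lines read so far, which matches the row's line number here because no field spans multiple lines.

```python
import csv
import io

# In-memory copy of the sample data, including the two short rows
sample = io.StringIO(
    "id,first_name,last_name,email,gender,ip_address,city\n"
    "1001,John,Smith,john.smith@email.com,Male,192.168.1.100,New York\n"
    "1002,Jane,Doe,jane.doe@email.com,Female,10.0.0.50,Los Angeles\n"
    "1003,Michael,Johnson,michael.johnson@email.com,Male,172.16.0.25,Chicago\n"
    "1004,Sarah,Williams,sarah.williams@email.com,192.168.0.75,Houston\n"
    "1005,David,Brown,david.brown@email.com,Male,10.1.1.200\n"
)

reader = csv.reader(sample)
header = next(reader)

good_rows, bad_rows = [], []
for row in reader:
    if len(row) != len(header):
        # Keep the line number so the problem can be located in the file
        bad_rows.append((reader.line_num, row))
    else:
        good_rows.append(dict(zip(header, row)))

print(f"{len(good_rows)} valid rows")
for line_num, row in bad_rows:
    print(f"Line {line_num}: expected {len(header)} fields, got {len(row)}")
```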

#### Exercise
Add a log file to the example2.csv example above so that all errors are recorded.
- Ref: [Robust error handling in Python CSV](https://labex.io/tutorials/python-how-to-implement-robust-error-handling-in-python-csv-processing-398214)